Welcome to Systems Programming and Computer Architecture. This isn’t just another programming course. We’re going to peel back the layers of abstraction that you’re used to and look at what’s really happening inside the machine. The goal is to change the way you think about code and computers.
This course is about getting your hands dirty with the metal. We’ll explore, in depth:
- How to write fast and correct code. It’s not enough for code to work; in systems, it needs to be efficient and robust.
- How to write good systems code. This is a different beast from application code. The best programmers in the world are often systems programmers, even when they’re writing applications.
- What makes programs go fast (and slow). We’ll dive into the mechanics of performance, moving beyond simple algorithmic complexity.
- Programming in C. It’s an old language, but it’s still the undisputed king for systems programming. It gives you the power and control you need.
- Programming in Assembly Language. You’ll rarely write it, but you must understand it. It’s the language of the machine, and knowing it helps you write better code in any language.
- Programs as more than mathematical objects. A program isn’t just a function to be evaluated. It’s an actor in the world. It interacts with hardware, with networks, with people. Think about Facebook—it’s not a mathematical object; it’s a system with a profound (good or bad) effect on the world.
- How programs interact with the hardware. This is the core of the course: understanding the dialogue between the software you write and the silicon that executes it.
Who are we?
Your guides on this journey are Prof. Timothy Roscoe and Prof. Ana Klimovic.
We haven’t spent our entire careers in academia. We come from industry, and that perspective shapes how we teach this course.
Prof. Roscoe was at Intel.
Prof. Klimovic was a Research Scientist at Google Brain.
Acknowledgements
We stand on the shoulders of giants. Much of this course material has a rich history:
- About half of the material is heavily based on the legendary CS 15-213 course at Carnegie Mellon University (CMU). This is also the basis for the main textbook. Our thanks go to Dave O’Hallaron, Randal Bryant, & Brian Railing at CMU.
- Some of the C programming material is adapted from CSE333 at the University of Washington, with thanks to (ex-)Prof. Steve Gribble.
- We’ve also added a lot of new material covering multicore systems, devices, and more. This part is mostly our fault. 😉
Logistics
Lectures
- Physical:
- 10:00-12:00 on Tuesdays (HG E 7) and Wednesdays (ETA F 5).
- Recordings:
- Recordings will be available a few days after the lecture on the ETH Video Portal: https://video.ethz.ch/lectures/d-infk/2024/autumn.html
Quote
The slides are not intended to be understood without the lectures… They are a guide, not a replacement for the discussion and context provided in person.
Moodle
The Moodle page is your central hub for the course: https://moodle-app2.let.ethz.ch/course/view.php?id=23115 (or https://moodle-app2.let.ethz.ch/course/view.php?id=26295 for 2025)
- This is the first place to look!
- All links and lecture materials will be posted here.
- Ask questions in the forum! TAs and Professors monitor it regularly. This is the best way to get help, as everyone can benefit from the question and answer.
- We will not answer questions on Discord. The signal-to-noise ratio is too low, and the advice can be unreliable. Stick to the Moodle forum for official help.
Tutorial sessions
These are very important. This is where you’ll build the practical skills that are essential for the exam and for being a real systems programmer.
- Logistics:
- Wednesday, 12:00-14:00 or 14:00-16:00.
- Check myStudies for your specific room and stream.
- Content:
- You’ll learn the tools and skills needed for the lab exercises.
- This material is examinable but won’t be covered in the lectures!
- There will be a session this Wednesday (tomorrow).
Language
- We’ll teach in English (and C…).
- If we speak too fast or something is unclear, please raise your hand and ask! Questions are always welcome.
- The assistants speak German, English, Italian, French, and more.
Asking Questions
There’s a hierarchy for getting help. Please try to follow it.
- Ask during the lectures.
- Ask on the Moodle forum outside of lectures. (Best for most questions!)
- Ask your friends.
- Check the web (Stack Overflow is often good, but be critical).
- Ask your teaching assistant.
- Ask another teaching assistant.
- Email us (troscoe@inf.ethz.ch or aklimovic@ethz.ch). This should be a last resort, as there are over 400 of you!
What is Systems Programming?
Systems vs. Application Programming
There’s a fundamental difference between a systems programmer and an application programmer.
- Systems programmers can, and do, write applications.
- They are often the most productive programmers because they have a deep understanding of the entire stack.
- Their code is more efficient and robust.
- The performance difference can be dramatic.
- Pure applications programmers should not be trusted to write system software. The mindset is different; the assumptions are different.
So, what is this system software?
“Systems” as a Field
It’s the foundational layer of software that makes everything else possible. It encompasses:
- Operating systems
- Database systems
- Networking protocols and routing
- Compiler design and implementation
- Distributed systems
- Cloud computing & online services
- Big Data and machine learning frameworks
Essentially, it’s everything on and above the hardware/software boundary.
Why Systems Matters
You can’t build a skyscraper on a shaky foundation. Systems software is that foundation.
- It provides the nice abstractions that application programmers love and rely on:
- Processes, threads, virtual memory, garbage collection, files, network sockets, cloud services, databases, parallel frameworks, containers, etc.
- It provides the security properties we depend on:
- Authentication, authorization, protection, access control, etc.
- It’s everything that makes a computer more than a glorified abacus:
- I/O devices, graphical output, multiprocessing, accelerators for ML and AI, etc.
…all of this is provided by system software. And someone has to write it.
Quote
“In designing an operating system one needs both theoretical insight and horse sense. Without the former, one designs an ad hoc mess; without the latter one designs an elephant in best Carrara marble (white, perfect, and immobile).” — Roger Needham and David Hartley, 1968
This field is a beautiful, messy, and fascinating blend of deep theory and brutal pragmatism.
Why Systems Programming is Different
It’s fundamentally a different way of thinking about computing.
- Most CS courses emphasize abstraction:
- Abstract data types (objects, contracts).
- Programs as pure mathematical objects with well-defined behavior.
- Performance as asymptotic analysis (worst-case complexity).
- These abstractions have limitations:
- They often don’t survive contact with reality, especially in the presence of bugs.
- Systems Programmers understand the details of the underlying implementations. They know when the abstractions hold and, more importantly, when they leak.
Motivation – Some Inconvenient Truths About Computers
Let’s get concrete. Here are some truths about computers that challenge the clean, abstract models you may be used to.
Inconvenient Truth: Computers don’t really deal with numbers.
Computers don’t deal with integers
In mathematics, the set of integers is infinite and closed under addition: the sum of any list of integers is a well-defined integer.
In reality, on a computer, things are… different. Consider this simple C program that sums integers from standard input:
#include <stdio.h>
#include <stdlib.h>
#define BUFFER_LENGTH 80
int main(int argc, char *argv[])
{
    char buffer[BUFFER_LENGTH];
    int total = 0;

    while (fgets(buffer, BUFFER_LENGTH, stdin)) {
        total += atoi(buffer);
    }
    printf("Total is %d\n", total);
    return 0;
}
If we feed it a file with 0, 1, 2, 3, 4 (one value per line), it correctly outputs 10. But what if we feed it this file?
0
1
2
10
30
4294967295
The program outputs Total is 42.
Important
This is not a bug – it is correct behavior!
The int type in C is not a mathematical integer. It’s a fixed-size container (usually 32 bits) whose arithmetic wraps around modulo 2^32. The last input line, 4294967295, is read back as -1 in 32-bit two’s complement, so the sum is 0 + 1 + 2 + 10 + 30 - 1 = 42. What you are seeing is integer overflow, and it is entirely predictable once you know the representation.
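To see the wraparound concretely, here is a minimal standalone sketch (an illustration assuming a 32-bit int, as on the course machines; it is not part of the original program):

#include <stdio.h>

int main(void)
{
    unsigned int u = 4294967295u;   /* 2^32 - 1, the last line of the input */
    int i = (int)u;                 /* converting an out-of-range unsigned value to int is
                                       implementation-defined; with 32-bit two's-complement
                                       ints the usual mapping gives -1 */
    printf("%d\n", i);                          /* prints -1 */
    printf("%d\n", 0 + 1 + 2 + 10 + 30 + i);    /* prints 42 */
    return 0;
}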
Computers don’t deal with reals either
The same goes for real numbers. In math, addition is associative:

(x + y) + z = x + (y + z)

But in the world of floating-point numbers, this property breaks down. Consider this code:
#include <stdio.h>
int main(int argc, char *argv[])
{
    float x = 1e20;
    float y = -1e20;
    float z = 3.14;

    printf("(x + y) + z = %f\n", (x + y) + z);
    printf("x + (y + z) = %f\n", x + (y + z));
    return 0;
}
Running this gives two different answers: (x + y) + z prints 3.140000, because x and y cancel exactly before z is added, while x + (y + z) prints 0.000000, because adding 3.14 to -1e20 changes nothing at single precision, and x then cancels y exactly.
Important
This is not a bug – it is correct behavior!
Computer Arithmetic
- It does not generate random values. The results are deterministic and follow specific rules.
- You cannot assume all “usual” mathematical properties. This is due to the finiteness of representations.
- Integer operations satisfy “ring” properties (commutativity, associativity, distributivity): they operate in a finite ring, i.e., modulo 2^w for a w-bit word (e.g., modulo 2^32 for a 32-bit int).
- Floating-point operations satisfy “ordering” properties (monotonicity, preservation of sign), but, as we just saw, not associativity.
- Observation: You need to understand which abstractions apply in which contexts. This is critical for compiler writers and any serious programmer who cares about correctness and performance.
Inconvenient Truth: The best programmers know assembly.
You’ve got to know assembly
- Chances are, you’ll never write a program in assembly. Compilers are much better and more patient at it than you are.
- But: understanding assembly is key to the machine-level execution model.
- Behavior of programs in the presence of bugs: When high-level models break down, the assembly tells you what’s really happening.
- Tuning program performance: You need to understand what optimizations the compiler is (and isn’t) doing, and where inefficiencies come from.
- Implementing system software: The compiler’s target is machine code. The OS must manage the machine’s state at this level.
- Creating / fighting malware: Malware operates at the machine level. To fight it, you must understand its language.
Assembly example: measuring cycles
The Time Stamp Counter (TSC) is a special 64-bit register on Intel-compatible machines that increments with every clock cycle. It’s the most precise timer you can get. You can’t read it from a high-level language; you need a tiny piece of assembly.
Here’s a C function that uses an asm block to read the TSC:
#include <stdint.h>

uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    // This inline assembly reads the TSC into the edx:eax register pair,
    // then moves the two halves into the C variables 'hi' and 'lo'.
    asm volatile("rdtsc; movl %%edx,%0; movl %%eax,%1"
                 : "=r" (hi), "=r" (lo)
                 :
                 : "%edx", "%eax");
    return ((uint64_t)lo | (((uint64_t)hi) << 32));
}
We can use this to measure precisely how many cycles a computation takes.
int main(int argc, char *argv[])
{
    uint64_t start, overhead;
    unsigned long result;

    // Measure the overhead of calling rdtsc itself.
    start = rdtsc();
    overhead = rdtsc() - start;
    printf("Counter overhead is %lu cycles\n", overhead);

    // Time the function (calc() stands for whatever computation we want to measure).
    start = rdtsc();
    result = calc();
    printf("Time = %lu cycles\n", rdtsc() - start - overhead);
    printf("Result = %lu\n", result);
    return 0;
}
Of course, proper performance measurement is a subtle art, but this is the fundamental tool.
Inconvenient Truth: Performance is about much more than asymptotic complexity.
Asymptotic complexity is a powerful theoretical tool, but in the real world:
- Constant factors matter too – often more. An algorithm with better asymptotic complexity but a huge constant factor can be slower than a simpler algorithm for any realistic input size.
- Even the exact operation count does not predict performance. You can easily see a 10x performance difference for the same algorithm depending on how the code is written.
- You must optimize at multiple levels: algorithm, data representations, procedures, and loops.
- To do this, you must understand the system: how programs are compiled and executed, how to measure performance and find bottlenecks, and how to improve it without sacrificing modularity.
Example: Matrix-Matrix Multiplication
This is a fundamental operation in ML, graphics, and scientific computing. The naive algorithm requires about 2n³ operations for n × n matrices.
How complicated can it be? Let’s look at the performance of a simple triple-loop implementation on an old Core 2 Duo machine.
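For reference, the triple loop in question looks roughly like this (an illustrative sketch: it assumes n × n matrices of doubles stored row-major in flat arrays, and the function name is ours):

/* Naive matrix-matrix multiply: c = a * b, all n x n, row-major. */
void mmm_naive(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i*n + k] * b[k*n + j];   /* n multiplies + n adds per entry */
            c[i*n + j] = sum;
        }
    }
}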
The y-axis is Gflop/s (billions of floating-point operations per second), so higher is better. Notice how performance decreases as the matrix size grows and no longer fits in the processor’s caches.
Now, let’s compare that naive loop to the best-known, hand-optimized code for that same processor (by Kazushige Goto).
It’s 160 times faster.
Crucially, both implementations perform exactly the same number of floating-point operations (about 2n³). The difference isn’t the algorithm’s complexity; it’s the implementation’s awareness of the underlying system.
What’s going on? The speedup comes from multiple sources:
- Multiple threads: 4x. Using multiple cores.
- Vector instructions: 4x. Using SIMD (Single Instruction, Multiple Data) units to perform multiple operations at once.
- Memory hierarchy and other optimizations: 20x. This is the biggest factor!
This 20x comes from techniques like blocking/tiling, loop unrolling, and instruction scheduling. The effect is profound: fewer register spills, fewer L1/L2 cache misses, fewer TLB misses. This is pure systems knowledge in action; a sketch of the blocking idea follows below.
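To give a flavor of what blocking/tiling means, here is a minimal sketch (an illustration of the idea only, not the hand-optimized kernel; the block size BS, the function name, and the layout assumptions are ours):

/* Blocked matrix-matrix multiply: same ~2n^3 flops as the naive loop,
 * but each BS x BS tile is reused while it is still in cache.
 * Assumes n is a multiple of BS and that c is zero-initialized. */
#define BS 64

void mmm_blocked(int n, const double *a, const double *b, double *c)
{
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                /* Multiply one pair of tiles. */
                for (int i = ii; i < ii + BS; i++)
                    for (int k = kk; k < kk + BS; k++) {
                        double aik = a[i*n + k];
                        for (int j = jj; j < jj + BS; j++)
                            c[i*n + j] += aik * b[k*n + j];
                    }
}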
Inconvenient Truth: Memory is not a nice array that stores your data.
The details of memory
- Memory is not unbounded. It must be allocated and managed. Many applications are memory-dominated.
- Memory performance is not uniform. Accessing memory can be fast or slow. Cache and virtual memory effects can have a huge impact. Adapting your program to the memory system can lead to major speed improvements.
- Memory is typed. Different kinds of memory behave differently.
- Sometimes memory isn’t even memory.
Memory-related bugs are still a nightmare.
Consider this code with a potential out-of-bounds write:
typedef struct {
    int a[2];
    double d;
} struct_t;

double fun(int i) {
    volatile struct_t s;
    s.d = 3.14;
    s.a[i] = 1073741824;   /* Possibly out of bounds */
    return s.d;
}
The results are bizarre and system-specific:
- fun(0) → 3.14 (correct)
- fun(1) → 3.14 (correct)
- fun(2) → 3.13999… (close, but wrong!)
- fun(3) → 2.00000… (very wrong!)
- fun(4) → 3.14 (correct again?!)
- fun(6) → Segmentation fault (crash)
This happens because of how struct_t is laid out on the stack. Writing to a[2] or a[3] actually overwrites the bytes of the double d that lives next to it in memory, as the sketch below illustrates.
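Here is a small standalone sketch (assuming the typical x86-64 layout: 4-byte int, 8-byte double, with d aligned to 8 bytes) that makes the layout visible:

#include <stdio.h>
#include <stddef.h>

typedef struct {
    int a[2];
    double d;
} struct_t;

int main(void)
{
    printf("offsetof(a) = %zu\n", offsetof(struct_t, a));   /* typically 0  */
    printf("offsetof(d) = %zu\n", offsetof(struct_t, d));   /* typically 8  */
    printf("sizeof      = %zu\n", sizeof(struct_t));        /* typically 16 */
    /* So a[2] would occupy bytes 8..11 and a[3] bytes 12..15 -- exactly the
     * storage of d -- while a[6] is past the end of the struct entirely. */
    return 0;
}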
Memory system performance
The pattern of memory access is critical. Consider two ways to copy a 2D array:
// Accesses memory sequentially (good)
void copyij(int src[2048][2048], int dst[2048][2048]) {
    for (int i = 0; i < 2048; i++)
        for (int j = 0; j < 2048; j++)
            dst[i][j] = src[i][j];
}

// Jumps through memory (bad)
void copyji(int src[2048][2048], int dst[2048][2048]) {
    for (int j = 0; j < 2048; j++)
        for (int i = 0; i < 2048; i++)
            dst[i][j] = src[i][j];
}
On an Intel Core i7, the performance difference is staggering: 5.2 ms vs 162 ms. C stores 2D arrays row by row, so copyij walks through memory linearly and uses every byte of each cache line it fetches, while copyji strides a whole 8 KiB row between consecutive accesses, causing constant cache misses.
This effect is visualized by the “Memory Mountain,” which plots memory throughput against access stride and data size. copyij lives on the high peaks, while copyji is down in the deep valleys.
What does *p = v do?
To an application programmer, this C statement stores the value v at the memory location pointed to by p. The compiler translates it into a single machine instruction.
// C code
void store(uint64_t *p, uint64_t v) {
    *p = v;
}
// x86 Assembly
movq %rsi, (%rdi)
// ARMv8-A Assembly
str x1, [x0]
// RISC-V RV64I Assembly
sd a1, 0(a0)
To a systems programmer, that single instruction could trigger a cascade of events:
- Cache hit, set a dirty bit in the cache
- Cache hit, global invalidate
- Cache miss, write allocate, and evicts another line
- Cache miss, write-through
- Cache miss, fetch from another cache in exclusive mode
- TLB hit, mark page dirty
- TLB miss and hardware fill
- TLB miss, software exception
- Page fault, page not present
- Page fault, protection fault
- Write page table entry, change virtual memory mappings
- …and so on.
Or it might not even be memory! It could be:
- A device register, initiating an I/O operation.
- A trigger to send an inter-processor interrupt.
- A write that sends a cache transaction to an I/O device.
The point is not to be overwhelmed, but to be aware. A systems programmer knows these possibilities exist and can reason about them when debugging a tricky performance issue or a bizarre bug.
Inconvenient Truth: Computers don’t just execute programs; Programs don’t just calculate values.
Computers are not isolated calculators.
- They need to get data in and out. I/O is critical to reliability and performance.
- They sense the physical world (mouse clicks, sensor data) and act in it (displaying graphics, controlling motors).
- They communicate over networks, which introduces a host of systems-level problems: concurrency, unreliability, interoperability, and complex performance issues.
The simplified diagrams in introductory textbooks are useful starting points, but they are lies-to-children. The reality is far more complex.
This complexity has exploded because the old ways of making computers faster have hit physical walls.
Processors are not getting faster. The free lunch of ever-rising clock speeds is over. Moore’s Law still delivers more transistors, which architects use to build new, complex, and heterogeneous kinds of computers (multi-core, SoCs, accelerators, FPGAs).
A modern System-on-Chip (SoC) looks less like the simple diagram and more like a city map of specialized components.
And research systems like Enzian push this even further, tightly integrating CPUs with FPGAs.
How can these be programmed? This is a systems software problem.
Inconvenient Truth: Programs are not semantic specifications.
The role of “standards”
- Language standards like Java’s aim to specify unambiguously what a program does. This gives us “write once, run anywhere.”
- The C standard is different. Behavior is frequently described as “implementation-defined”.
What does “implementation defined” mean? The standard says it’s “unspecified behavior where each implementation documents how the choice is made.” This has led to two interpretations:
- The modern compiler view: The compiler is allowed to do anything (including optimizing the code away completely), as long as it’s documented somewhere.
- The classic systems view: The compiler implements the most natural mapping to the target hardware and documents this. This is what you want for systems programming, as it gives you predictable control over the machine.
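A couple of small, illustrative examples of what “implementation-defined” looks like in practice (both cases are left to the implementation by ISO C, and a systems compiler documents the natural mapping for its target):

#include <stdio.h>

int main(void)
{
    int  x = -8;
    char c = (char)200;

    printf("%d\n", x >> 1);        /* right-shifting a negative value is
                                      implementation-defined; most targets use an
                                      arithmetic shift and print -4 */
    printf("%d\n", c);             /* whether plain char is signed is
                                      implementation-defined: prints -56 or 200 */
    printf("%zu\n", sizeof(long)); /* 4 on some ABIs, 8 on others */
    return 0;
}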
This leads to a crucial shift in mindset for this course:
Success
A C program is not a semantic specification. A program is a set of instructions to a compiler, telling it what assembly language to generate.
What to expect in this course?
Course Goals
- Think like systems programmers. Build a mental model of the entire machine, from the silicon up.
- Become more effective application programmers. Find and eliminate bugs efficiently, understand and tune for performance, and write better code in any language.
- Prepare for later systems classes at ETHZ like Compilers, Operating Systems, Networks, and Computer Architecture.
What we’ll assume you know
We build on concepts from previous courses. You don’t need to be an expert, but you should be familiar with:
- 1: The Basics
- Memory, addresses, bytes, words (binary, hex, byte-ordering)
- Boolean algebra (and, or, not, xor)
- How to write programs in some language (e.g., Java, C++)
- 2: The Concepts
- Processor architecture (registers, addressing modes, instruction formats)
- Basic memory systems (cache, virtual memory, I/O)
- Software engineering (object-orientation, design-by-contract)
- Concurrency and parallelism (threads, locks, mutexes)
You’ll be writing a lot of C
- You learn systems programming by doing it. You’ll see a lot of code, and you’ll write a lot of code. Exam performance will depend on it.
- You’ll need to be comfortable with the command line (Unix shell: ls, make, gcc, gdb, etc.).
- Why? It’s closer to reality, and it makes it clearer what is really going on under the hood.
Programming environment
- The course targets CodeExpert and Linux Ubuntu 24.04 LTS on 64-bit x86 PC hardware.
- You have options:
- Native install, or use the lab machines.
- Windows Subsystem for Linux (WSL).
- Virtual machine (e.g., VirtualBox).
- We’ll help, but there’s a limit to how weird we’re prepared to get.
C in the exam
- The exam will be online, using CodeExpert.
- It will cover C knowledge, systems programming skills, and course concepts, often combined in the same question.
- For example, instead of asking “What is the largest positive signed integer representable in a two’s complement format using n bits?”, we will ask you to:
Fill in the body of the following C function, which calculates and returns the largest positive signed integer representable in two’s complement format using sz bits:

/*
 * Return the largest signed integer representable
 * in two's-complement notation using sz bits.
 */
int64_t maxint(unsigned int sz)
{
    // Your answer goes here
}
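One possible answer, as a hedged sketch (it assumes 1 ≤ sz ≤ 63 so the shift stays within int64_t; sz = 64 would need INT64_MAX as a special case):

int64_t maxint(unsigned int sz)
{
    /* The largest positive sz-bit two's-complement value is 2^(sz-1) - 1. */
    return (((int64_t)1) << (sz - 1)) - 1;
}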
A word on testing & submitting
- Testing: Devising and writing good tests is an essential systems programming skill. In many exercises and exam questions, you won’t get a test suite. This is what happens in reality. (A small example of such quick checks follows after this list.)
- Submitting: Reasoning about the correctness of your code is what makes a good programmer. In the exam, the last submitted version is what counts. We strongly advise getting good at writing C during the course and not submitting untested answers at the last moment!
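For instance, a few hand-written checks for the maxint function above might look like this (an illustrative sketch, not a provided test suite):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    assert(maxint(1)  == 0);             /* 1-bit two's complement holds only -1 and 0 */
    assert(maxint(8)  == 127);
    assert(maxint(16) == 32767);
    assert(maxint(32) == 2147483647LL);
    return 0;
}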
Textbooks & Recommendations
- Main Textbook:
- Randal E. Bryant and David R. O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition (CS:APP3e). http://csapp.cs.cmu.edu
- Books on C:
- Brian Kernighan and Dennis Ritchie, The C Programming Language, 2nd Edition. (The classic “K&R”)
- Samuel Harbison and Guy Steele, C: A Reference Manual, 5th edition.
- Other Recommendations:
- Brian Kernighan and Rob Pike, The Practice of Programming. (An excellent book on the philosophy of writing good, clean, simple code).
- Peter van der Linden, Expert C Programming: Deep C Secrets. (For when you want to explore the weird corners of the language).