Lecture from: 16.04.2024 | Video: Video ETHZ

Learning Goals for Today

Recap of Foundational Concepts (From Previous Lectures):

  • We’ve seen simple proofs of correctness for concurrent algorithms and encountered unexpected failures when running them on real computers.
  • We introduced the concept of memory models as the crucial contract between the programmer, compiler, runtime, and hardware architecture.
  • We saw Java’s volatile and synchronized constructs, which provide specific guarantees within this contract.

Today’s New Topics:

  • Implementing Mutexes with Atomic Registers: Exploring how the fundamental mutual exclusion property (provided by locks) can be built from the ground up using only atomic read/write operations on registers.
    • Dekker’s Algorithm
    • Peterson’s Algorithm
  • Implementing N-thread Locks with Atomic Registers: Extending these ideas beyond two threads.
    • Filter Lock
    • Bakery Lock
  • Locks using Atomic Operations: Brief look at locks built using hardware-level atomic instructions (TAS, TATAS, exponential backoff).

Context: Remember, you will rarely implement these low-level locks yourself in practice; you’ll use library functions (like Java’s synchronized or ReentrantLock). However, building them helps understand the core principles and challenges, reinforcing the learning process – “involve me and I learn.”

Behind Locks: Implementation of Mutual Exclusion

We’ve used locks (synchronized, Lock) as a fundamental tool. But how do they work underneath? How can we guarantee that only one thread enters a critical section at a time, using only basic memory operations? This section explores the implementation of mutual exclusion.

Assumptions (for Algorithm Analysis)

To analyze the correctness of the upcoming mutual exclusion algorithms, we’ll initially make some simplifying assumptions. It’s crucial to remember these are not always true in real systems without explicit guarantees (like volatile or memory fences).

  1. Atomic Reads and Writes: We assume that reading or writing a primitive variable (like boolean or int) happens atomically – as a single, indivisible operation. (We will define “atomic” more precisely later).
  2. No Reordering: We assume, for the purpose of understanding the algorithms’ logic, that the compiler and hardware do not reorder the sequence of reads and writes as written in the code. (This is FALSE in practice! Real implementations need mechanisms like volatile or memory barriers to prevent harmful reordering).
  3. Eventual Exit: A thread that enters its critical section will eventually leave it (i.e., eventually execute the unlock/postprotocol part). We don’t worry about infinite loops inside the critical section itself.

We also assume the standard multithreaded environment where thread execution can interleave arbitrarily. We make no assumptions about progress outside the critical sections (a thread might stall indefinitely in its non-critical section).

Critical Sections

The requirements for a correct critical section implementation:

  1. Mutual Exclusion: No two processes execute statements from their critical sections simultaneously (no interleaving within the critical sections).
  2. Freedom from Deadlock: If multiple processes are trying to enter their critical sections, at least one must eventually succeed.
  3. Freedom from Starvation: If any process attempts to enter its critical section, it must eventually succeed (it cannot be blocked indefinitely while others proceed).

The Critical Section Problem Structure

The general setup involves two or more processes (or threads) competing for access to a critical section, using shared global variables to coordinate.

Process P                              Process Q
  local variables                        local variables
  loop                                   loop
    non-critical section                   non-critical section
    preprotocol       <-- Try to enter     preprotocol       <-- Try to enter
    critical section  <-- Access           critical section  <-- Access
    postprotocol      <-- Release          postprotocol      <-- Release

On a simple single-core machine, a rudimentary (and generally bad) approach is to disable interrupts (IRQs) before the critical section and re-enable them after. This prevents the OS from preempting (switching) the thread during the critical section. However, this doesn’t work on multi-core systems and has many other drawbacks.

Attempt 1: Using “Want” Flags

Let’s try to solve mutual exclusion for two processes, P and Q, using shared boolean flags indicating intent.

  • Shared: volatile boolean wantp = false, wantq = false;
  • Process P:
    loop {
        // p1: non-critical section
        p2: while(wantq) {} // Wait if Q wants in
        p3: wantp = true;   // Indicate I want in
        p4: critical section
        p5: wantp = false;  // Indicate I'm done
    }
  • Process Q: Symmetric code, swapping p/q.

Problem: Does this guarantee mutual exclusion? No.

Failure Scenario (State Space Analysis): Consider the state [p, q, wantp, wantq].

  1. Start: [p1, q1, false, false]
  2. P runs non-critical, reaches p2: [p2, q1, false, false]
  3. Q runs non-critical, reaches q2: [p2, q2, false, false]
  4. P executes while(wantq) (false), continues to p3: [p3, q2, false, false]
  5. Q executes while(wantp) (false), continues to q3: [p3, q3, false, false] (Interleaving!)
  6. P executes wantp = true: [p4, q3, true, false]. P enters its critical section.
  7. Q executes wantq = true: [p4, q4, true, true]. Q also enters its critical section!

Mutual Exclusion VIOLATED! Both processes can be in the critical section (state [p4, q4, ...]) simultaneously. The core issue is checking the other’s flag before setting one’s own.

Attempt 2: Flags First, Then Wait

Let’s reverse the order: set our flag first, then check the other’s.

  • Shared: volatile boolean wantp = false, wantq = false;
  • Process P:
    loop {
        // p1: non-critical section
        p2: wantp = true;   // Indicate I want in FIRST
        p3: while(wantq) {} // Wait if Q wants in
        p4: critical section
        p5: wantp = false;  // Indicate I'm done
    }
  • Process Q: Symmetric.

Problem: Does this work? No.

Failure Scenario (Deadlock):

  1. Start: [p1, q1, false, false]
  2. P runs non-critical, reaches p2: [p2, q1, false, false]
  3. Q runs non-critical, reaches q2: [p2, q2, false, false]
  4. P executes wantp = true: [p3, q2, true, false]
  5. Q executes wantq = true: [p3, q3, true, true] (Interleaving!)
  6. P executes while(wantq) (condition is true): P waits.
  7. Q executes while(wantp) (condition is true): Q waits.

DEADLOCK! Both processes are stuck in state [p3, q3, true, true], each waiting for the other’s flag to become false, which will never happen.

Attempt 3: Turn Variable (Strict Alternation)

Let’s try using a turn variable to explicitly alternate access.

  • Shared: volatile int turn = 1; (Indicates whose turn it is)
  • Process P:
    loop {
        // p1: non-critical section
        p2: while(turn != 1) {} // Wait until it's my turn
        p3: critical section
        p4: turn = 2;           // Give turn to Q
    }
  • Process Q: Symmetric (waits for turn != 2, sets turn = 1).

Problem: Does this work? It guarantees mutual exclusion, but…

Failure Scenario (Starvation / Lack of Progress): This enforces strict alternation. If Process P takes its turn, enters the critical section (p3), and exits (p4, setting turn=2), it cannot re-enter (p2 blocks) until Process Q has taken its turn. If Q stays in its non-critical section for a long time (or crashes!), P can be starved, unable to enter the critical section even though Q is not using it. This violates the freedom from starvation requirement.

Combination: Dekker’s Algorithm

Dekker’s algorithm (historically significant as one of the first correct solutions) combines Attempt 2 (flags) and Attempt 3 (turn variable) to resolve conflicts.

  • Shared: volatile boolean wantp=false, wantq=false; volatile int turn=1;
  • Process P:
    loop {
      non_critical_section;
      wantp = true;
      while (wantq) {          // Q is also interested
        if (turn == 2) {     // If it's Q's turn to break the tie
          wantp = false;     //   Retract interest temporarily
          while (turn != 1); //   Wait for my turn
          wantp = true;      //   Re-assert interest
        }
      }
      critical_section;
      turn = 2;                // Give turn to Q
      wantp = false;           // I'm done
    }
  • Process Q: Symmetric.

Logic: Uses flags to signal intent. If both want access, the turn variable acts as a tie-breaker. The process whose turn it isn’t temporarily backs off (wantp = false), waits for its turn, and then tries again. This ensures mutual exclusion and prevents deadlock and starvation (though the proof is more involved).
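For concreteness, here is a minimal Java sketch of Dekker’s algorithm for two threads with ids 0 and 1, using atomic classes to provide the atomicity and visibility that the pseudocode assumes; the class and method names are illustrative, not from the lecture.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch: Dekker's algorithm for two threads (ids 0 and 1).
class DekkerLock {
    private final AtomicIntegerArray want = new AtomicIntegerArray(2); // 1 = interested
    private final AtomicInteger turn = new AtomicInteger(0);           // whose turn to win a tie

    void lock(int me) {
        int other = 1 - me;
        want.set(me, 1);                     // signal interest
        while (want.get(other) == 1) {       // the other thread is also interested
            if (turn.get() == other) {       // it is the other thread's turn to win
                want.set(me, 0);             // retract interest temporarily
                while (turn.get() != me) {}  // wait for my turn
                want.set(me, 1);             // re-assert interest
            }
        }
    }

    void unlock(int me) {
        turn.set(1 - me);                    // give the turn to the other thread
        want.set(me, 0);                     // no longer interested
    }
}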

More Concise: Peterson’s Lock (1981)

Peterson found a simpler and more elegant 2-process solution.

  • Shared:
    • volatile boolean flag[1..2] = {false, false}; // Process i is interested
    • volatile int victim; // Which process yields priority
  • Process P (id=P, other=Q):
    loop {
      non_critical_section;
      flag[P] = true;      // I am interested
      victim = P;          // I yield priority (let the other go first if contested)
      // Wait ONLY if Q is interested AND Q was the last one to yield (it's Q's "turn")
      while (flag[Q] && victim == P) {};
      critical_section;
      flag[P] = false;     // I am no longer interested
    }
  • Process Q (id=Q, other=P): Symmetric.

Key Idea: The victim variable ensures that if both processes try to enter simultaneously, only the one who isn’t the victim can proceed immediately. The victim must wait until the other process either loses interest (flag[other] == false) or yields priority by setting victim to the other process ID.

Proving Peterson’s Lock Correctness

We want to formally prove that Peterson’s Lock satisfies:

  1. Mutual Exclusion
  2. Freedom from Starvation

This requires a more rigorous way to talk about concurrent events.

Formalism: Events, Precedence, Intervals

  • Events: Atomic points in the execution (e.g., p_1 = "flag[P]=true"). p_i^j denotes the j-th occurrence of event p_i in process P.
  • Precedence (→): a → b means that event a occurs before event b; within a single thread, → totally orders that thread’s events.
  • Interval: I_A = (a_0, a_1) is the sequence of events between a start event a_0 and an end event a_1, where a_0 → a_1. An interval I_A = (a_0, a_1) precedes an interval I_B = (b_0, b_1), written I_A → I_B, if a_1 → b_0.
  • Concurrency: Two intervals are concurrent if neither precedes the other.

Atomic Registers

To model shared variables like flag and victim, we use the concept of an Atomic Register (not to be confused with CPU registers).

  • Operations: read() and write(v).
  • Effect Time (τ(J)): Each operation J (read or write) takes effect at a single, instantaneous point in time τ(J), which occurs between the start and end of the operation J.
  • Distinctness: Operations J and K on the same register have different effect times: τ(J) ≠ τ(K).
  • Read Value: A read J returns the value v written by the write K whose effect time τ(K) is the latest one that precedes τ(J).

Example Timeline: This shows operations by threads A, B, C on a shared register.

  • Read K by A returns 1 (written by J).
  • Read N by C returns 8 (written by M).
  • Read O by A returns 4 (written by L, which happened after J and M).

Treating operations on atomic registers as instantaneous events simplifies proofs, but remember that the order of concurrent events is non-deterministic.

Proof: Mutual Exclusion (Peterson)

Proof sketch: Suppose, for contradiction, that P and Q are both in the critical section. Without loss of generality, assume P’s write victim = P is the later of the two writes to victim. Then Q’s write flag[Q] = true precedes Q’s write victim = Q, which precedes P’s write victim = P, which precedes P’s subsequent reads in the while loop. So P reads flag[Q] == true and victim == P, meaning P is still spinning and cannot be in the critical section; contradiction.

Proof: Freedom from Starvation (Peterson)

Proof sketch: Suppose P spins forever in its while loop, i.e., it keeps reading flag[Q] == true and victim == P. Q cannot spin forever as well, since victim cannot equal both P and Q. If Q finishes and stays in its non-critical section, flag[Q] becomes false and P proceeds. If Q re-enters the lock, it sets victim = Q, which also lets P proceed. Either way P eventually enters the critical section; contradiction.

Peterson’s Lock in Java

class PetersonLock {
    // Need atomicity and visibility guarantees
    volatile boolean flag[] = new boolean[2];
    volatile int victim;
 
    // id = 0 or 1
    public void Acquire(int id) {
        int other = 1 - id;
        flag[id] = true;
        victim = id;
        // Spin-wait
        while (flag[other] && victim == id) {};
    }
 
    public void Release(int id) {
        flag[id] = false;
    }
}

Caveat: As noted before, volatile on the array reference flag isn’t enough. For guaranteed correctness in Java, use AtomicIntegerArray for flag and AtomicInteger for victim.
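A minimal sketch of that safer variant, representing the boolean flags as 0/1 ints in an AtomicIntegerArray; the class and method names are illustrative, not from the lecture.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch: Peterson's lock with atomic classes instead of plain volatile fields.
class AtomicPetersonLock {
    private final AtomicIntegerArray flag = new AtomicIntegerArray(2); // 1 = interested
    private final AtomicInteger victim = new AtomicInteger();

    public void acquire(int id) {            // id = 0 or 1
        int other = 1 - id;
        flag.set(id, 1);                     // I am interested
        victim.set(id);                      // I yield priority
        while (flag.get(other) == 1 && victim.get() == id) {} // spin-wait
    }

    public void release(int id) {
        flag.set(id, 0);                     // I am no longer interested
    }
}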

Extending to N Processes

Peterson’s lock is elegant but limited to two processes. How do we handle N processes?

The Filter Lock

The Filter Lock extends Peterson’s idea to N processes using levels.

  • Imagine N-1 “waiting rooms” or levels before the final critical section (level N).
  • Each thread t has a level[t] indicating the highest level it’s trying to enter (initially 0, non-critical).
  • Each level l (from 1 to N-1) has a victim[l] variable, similar to Peterson’s.
  • To enter CS (Acquire):
    • A thread me must pass through levels i = 1 to N-1.
    • To enter level i, it sets level[me] = i and victim[i] = me.
    • It then waits (while) until there is no other thread k at a level greater than or equal to i (level[k] >= i) OR until me is no longer the victim for level i (victim[i] != me).
  • To leave CS (Release): Set level[me] = 0.
int[] level; // size n, init 0
int[] victim; // size n, init undefined
 
lock(me) { // me is thread ID 0..n-1
  for (int i = 1; i < n; ++i) { // Try levels 1 to n-1
    level[me] = i;
    victim[i] = me;
    // Wait while some OTHER thread k is at level i or higher
    // AND I am the victim for this level
    while ( (exists k != me such that level[k] >= i) &&
            (victim[i] == me) ) {};
  }
  // Passed all levels, enter Critical Section (level n)
}
 
unlock(me) {
  level[me] = 0; // Exit CS and all levels
}

Intuition: At each level i, the victim mechanism ensures that at most N-i threads can be at level i or beyond concurrently. By level N-1, at most one thread remains, and it can then enter the critical section.

Filter Lock in Java: Requires AtomicIntegerArray for level and victim for correctness. The “exists another thread at my level or higher” check involves iterating through all other threads, as in the sketch below.
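A minimal sketch of such a Java version, with both arrays as AtomicIntegerArrays; the class and helper names (FilterLock, othersAtOrAbove) are illustrative, not from the lecture.

import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch: Filter lock for n threads (ids 0..n-1).
class FilterLock {
    private final int n;
    private final AtomicIntegerArray level;   // level[t]: highest level thread t is trying to enter
    private final AtomicIntegerArray victim;  // victim[l]: last thread to enter level l

    FilterLock(int n) {
        this.n = n;
        level = new AtomicIntegerArray(n);    // all 0 = non-critical
        victim = new AtomicIntegerArray(n);
    }

    void lock(int me) {
        for (int i = 1; i < n; ++i) {         // try levels 1 to n-1
            level.set(me, i);
            victim.set(i, me);
            // spin while I am the victim for level i and some other thread is at level i or higher
            while (victim.get(i) == me && othersAtOrAbove(me, i)) {}
        }
    }

    void unlock(int me) {
        level.set(me, 0);                     // exit CS and all levels
    }

    private boolean othersAtOrAbove(int me, int i) {
        for (int k = 0; k < n; ++k) {
            if (k != me && level.get(k) >= i) return true;
        }
        return false;
    }
}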

Properties:

  • Satisfies Mutual Exclusion.
  • Is Deadlock-free.
  • Is Starvation-free (a thread can only be blocked by threads entering after it reached a certain level).

Fairness:

  • We can divide the lock preprotocol into a doorway section, which completes in a bounded (finite) number of steps, and a waiting section, which may take unboundedly long.

  • A lock is first-come-first-served (FCFS) (and is then, by definition, fair) if, whenever thread A completes its doorway before thread B starts its doorway (Doorway_A → Doorway_B), thread A enters the critical section before thread B (CS_A → CS_B).

  • Filter Lock Fairness: The Filter Lock is not FCFS. A thread can pass through all levels quickly while another thread, which started earlier, might be repeatedly made the victim at lower levels.

  • Other Issues: Reads N-1 shared variables in the while loop (busy-waiting), which can be inefficient (cache contention).

A Small Detour: Safe and Regular Registers

Can we build mutual exclusion even if reads/writes aren’t fully atomic? Surprisingly, yes, using weaker register types.

  • Safe Register: A read not concurrent with a write returns the correct current value. A read concurrent with a write can return any possible value from the register’s domain (garbage!).
  • Regular Register: Stronger than safe. A read not concurrent with a write returns the correct value. A read concurrent with a write can return either the old value or the new value being written (but not garbage).
  • Atomic Register: Strongest. Read returns value from the unique preceding write based on effect time τ.

(SWMR = Single-Writer Multiple-Reader. The name “safe” is a bit misleading, since a safe register isn’t really “safe” in the everyday sense.)

Example: If B writes 4 while C reads, a safe register read by C could return anything. A regular register read by C could return 1 (old value) or 4 (new value).

Lamport’s Bakery Algorithm (1974)

Provides mutual exclusion for N processes using only regular registers (a surprising result!). Inspired by bakery ticket systems.

  • Idea: Processes take a “ticket number” before entering. The number taken is higher than any existing number. Processes wait until their number is the lowest among those wanting entry.
  • Uses process IDs to break ties if numbers are equal.

Simplified 2-Process Version:

  • Shared: volatile int np = 0, nq = 0; (Ticket numbers)
  • Process P:
    loop {
      non_critical_section;
      np = nq + 1; // Take ticket > Q's
      // Wait if Q wants access (nq!=0) AND Q has smaller ticket (nq < np)
      // OR if tickets are equal and Q has priority (np==nq and PID_Q < PID_P)
      // Simplified here: just wait if Q interested and has earlier/equal ticket
      while (nq != 0 && nq < np); // Simplified check for illustration
      critical_section;
      np = 0; // Release ticket
    }
  • Process Q: Symmetric logic: nq = np + 1; while (np != 0 && np <= nq); The asymmetric comparison (strict < for P, <= for Q) breaks ties in P’s favor when both end up with equal ticket numbers, playing the role of the PID tie-break.

N-Process Version:

  • Shared:
    • boolean flag[0..n-1] = {false}; // SWMR “I want the lock”
    • integer label[0..n-1] = {0}; // SWMR “ticket number”
  • Lexicographical Order: (label[k], k) < (label[me], me) if label[k] < label[me] OR (label[k] == label[me] and k < me).
  • Acquire (lock(me)):
    1. Set flag[me] = true.
    2. Take ticket: label[me] = max(label[0..n-1]) + 1.
    3. Wait (while) until, for every other process k, either flag[k] is false OR (label[me], me) < (label[k], k) in the lexicographical order above.
  • Release (unlock(me)):
    1. Set flag[me] = false.

Properties: Guarantees Mutual Exclusion, is FCFS (and thus starvation-free), and works using only SWMR regular registers (reads concurrent with the single writer might return old or new value).

Bakery Lock in Java: Again, needs AtomicIntegerArray for flag and label for practical correctness under the Java Memory Model. The conflict check iterates through all other threads, comparing labels and IDs; MaxLabel finds the current maximum label. A sketch follows below.
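A minimal sketch under those assumptions; the class and helper names (BakeryLock, maxLabel, conflict) are illustrative, not from the lecture.

import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch: Bakery lock for n threads (ids 0..n-1).
class BakeryLock {
    private final int n;
    private final AtomicIntegerArray flag;   // flag[k] == 1: thread k wants the lock
    private final AtomicIntegerArray label;  // label[k]: ticket number of thread k

    BakeryLock(int n) {
        this.n = n;
        flag = new AtomicIntegerArray(n);
        label = new AtomicIntegerArray(n);
    }

    void lock(int me) {
        flag.set(me, 1);
        label.set(me, maxLabel() + 1);       // take a ticket larger than all current labels
        while (conflict(me)) {}              // spin until no interested thread is ahead of me
    }

    void unlock(int me) {
        flag.set(me, 0);
    }

    private int maxLabel() {
        int max = 0;
        for (int k = 0; k < n; ++k) max = Math.max(max, label.get(k));
        return max;
    }

    // true if some other interested thread k has a smaller (label, id) pair than me
    private boolean conflict(int me) {
        int myLabel = label.get(me);
        for (int k = 0; k < n; ++k) {
            if (k == me || flag.get(k) == 0) continue;
            int lk = label.get(k);
            if (lk < myLabel || (lk == myLabel && k < me)) return true;
        }
        return false;
    }
}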

Problem: Although correct and FCFS, it requires reading N shared variables (flag and label) in the while loop, which is inefficient.

This concludes our exploration of implementing mutual exclusion using basic atomic/regular registers. Next time, we’ll look at hardware primitives that make implementing efficient locks much easier.

Continue: 17 Read-Modify-Write Operations, TAS, CAS, TATAS Locks, Deadlocks