Lecture from: 16.04.2024 | Video: Video ETHZ
Learning Goals for Today
Recap of Foundational Concepts (From Previous Lectures):
- We’ve seen simple proofs of correctness for concurrent algorithms and encountered unexpected failures when running them on real computers.
- We introduced the concept of memory models as the crucial contract between the programmer, compiler, runtime, and hardware architecture.
- We saw Java’s
volatile
andsynchronized
constructs, which provide specific guarantees within this contract.
Today’s New Topics:
- Implementing Mutexes with Atomic Registers: Exploring how the fundamental mutual exclusion property (provided by locks) can be built from the ground up using only atomic read/write operations on registers.
- Dekker’s Algorithm
- Peterson’s Algorithm
- Implementing N-thread Locks with Atomic Registers: Extending these ideas beyond two threads.
- Filter Lock
- Bakery Lock
- Locks using Atomic Operations: Brief look at locks built using hardware-level atomic instructions (TAS, TATAS, exponential backoff).
Context: Remember, you will rarely implement these low-level locks yourself in practice; you’ll use library functions (like Java’s synchronized
or ReentrantLock
). However, building them helps understand the core principles and challenges, reinforcing the learning process – “involve me and I learn.”
Behind Locks: Implementation of Mutual Exclusion
We’ve used locks (synchronized
, Lock
) as a fundamental tool. But how do they work underneath? How can we guarantee that only one thread enters a critical section at a time, using only basic memory operations? This section explores the implementation of mutual exclusion.
Assumptions (for Algorithm Analysis)
To analyze the correctness of the upcoming mutual exclusion algorithms, we’ll initially make some simplifying assumptions. It’s crucial to remember these are not always true in real systems without explicit guarantees (like volatile
or memory fences).
- Atomic Reads and Writes: We assume that reading or writing a primitive variable (like
boolean
orint
) happens atomically – as a single, indivisible operation. (We will define “atomic” more precisely later). - No Reordering: We assume, for the purpose of understanding the algorithms’ logic, that the compiler and hardware do not reorder the sequence of reads and writes as written in the code. (This is FALSE in practice! Real implementations need mechanisms like
volatile
or memory barriers to prevent harmful reordering). - Eventual Exit: A thread attempting to enter a critical section will eventually leave it (i.e., call the unlock/postprotocol part). We don’t worry about infinite loops inside the critical section itself.
We also assume the standard multithreaded environment where thread execution can interleave arbitrarily. We make no assumptions about progress outside the critical sections (a thread might stall indefinitely in its non-critical section).
Critical Sections
The requirements for a correct critical section implementation:
- Mutual Exclusion: No two processes execute statements from their critical sections simultaneously (no interleaving within the critical sections).
- Freedom from Deadlock: If multiple processes are trying to enter their critical sections, at least one must eventually succeed.
- Freedom from Starvation: If any process attempts to enter its critical section, it must eventually succeed (it cannot be blocked indefinitely while others proceed).
The Critical Section Problem Structure
The general setup involves two or more processes (or threads) competing for access to a critical section, using shared global variables to coordinate.
Process P Process Q
local variables local variables
loop loop
non-critical section non-critical section
preprotocol <-- Try to preprotocol <-- Try to
critical section <-- Access critical section <-- Access
postprotocol <-- Release postprotocol <-- Release
On a simple single-core machine, a rudimentary (and generally bad) approach is to disable interrupts (IRQs) before the critical section and re-enable them after. This prevents the OS from preempting (switching) the thread during the critical section. However, this doesn’t work on multi-core systems and has many other drawbacks.
Attempt 1: Using “Want” Flags
Let’s try to solve mutual exclusion for two processes, P and Q, using shared boolean flags indicating intent.
- Shared:
volatile boolean wantp = false, wantq = false;
- Process P:
loop { // p1: non-critical section p2: while(wantq) {} // Wait if Q wants in p3: wantp = true; // Indicate I want in p4: critical section p5: wantp = false; // Indicate I'm done }
- Process Q: Symmetric code, swapping p/q.
Problem: Does this guarantee mutual exclusion? No.
Failure Scenario (State Space Analysis): Consider the state [p, q, wantp, wantq]
.
- Start:
[p1, q1, false, false]
- P runs non-critical, reaches p2:
[p2, q1, false, false]
- Q runs non-critical, reaches q2:
[p2, q2, false, false]
- P executes
while(wantq)
(false), continues to p3:[p3, q2, false, false]
- Q executes
while(wantp)
(false), continues to q3:[p3, q3, false, false]
(Interleaving!) - P executes
wantp = true
:[p4, q3, true, false]
→ P enters CS. - Q executes
wantq = true
:[p4, q4, true, true]
→ Q also enters CS!
Mutual Exclusion VIOLATED! Both processes can be in the critical section (state [p4, q4, ...]
) simultaneously. The core issue is checking the other’s flag before setting one’s own.
Attempt 2: Flags First, Then Wait
Let’s reverse the order: set our flag first, then check the other’s.
- Shared:
volatile boolean wantp = false, wantq = false;
- Process P:
loop { // p1: non-critical section p2: wantp = true; // Indicate I want in FIRST p3: while(wantq) {} // Wait if Q wants in p4: critical section p5: wantp = false; // Indicate I'm done }
- Process Q: Symmetric.
Problem: Does this work? No.
Failure Scenario (Deadlock):
- Start:
[p1, q1, false, false]
- P runs non-critical, reaches p2:
[p2, q1, false, false]
- Q runs non-critical, reaches q2:
[p2, q2, false, false]
- P executes
wantp = true
:[p3, q2, true, false]
- Q executes
wantq = true
:[p3, q3, true, true]
(Interleaving!) - P executes
while(wantq)
(true) → P waits. - Q executes
while(wantp)
(true) → Q waits.
DEADLOCK! Both processes are stuck in state [p3, q3, true, true]
, each waiting for the other’s flag to become false, which will never happen.
Attempt 3: Turn Variable (Strict Alternation)
Let’s try using a turn variable to explicitly alternate access.
- Shared:
volatile int turn = 1;
(Indicates whose turn it is) - Process P:
loop { // p1: non-critical section p2: while(turn != 1) {} // Wait until it's my turn p3: critical section p4: turn = 2; // Give turn to Q }
- Process Q: Symmetric (waits for
turn != 2
, setsturn = 1
).
Problem: Does this work? It guarantees mutual exclusion, but…
Failure Scenario (Starvation / Lack of Progress): This enforces strict alternation. If Process P takes its turn, enters the critical section (p3
), and exits (p4
, setting turn=2
), it cannot re-enter (p2
blocks) until Process Q has taken its turn. If Q stays in its non-critical section for a long time (or crashes!), P can be starved, unable to enter the critical section even though Q is not using it. This violates the freedom from starvation requirement.
Combination: Decker’s Algorithm
Decker’s algorithm (historically significant as one of the first correct solutions) combines Try 2 (flags) and Try 3 (turn variable) to resolve conflicts.
- Shared:
volatile boolean wantp=false, wantq=false; volatile int turn=1;
- Process P:
loop { non_critical_section; wantp = true; while (wantq) { // Q is also interested if (turn == 2) { // If it's Q's turn to break the tie wantp = false; // Retract interest temporarily while (turn != 1); // Wait for my turn wantp = true; // Re-assert interest } } critical_section; turn = 2; // Give turn to Q wantp = false; // I'm done }
- Process Q: Symmetric.
Logic: Uses flags to signal intent. If both want access, the turn
variable acts as a tie-breaker. The process whose turn it isn’t temporarily backs off (wantp = false
), waits for its turn, and then tries again. This ensures mutual exclusion and prevents deadlock and starvation (though the proof is more involved).
More Concise: Peterson’s Lock (1981)
Peterson found a simpler and more elegant 2-process solution.
- Shared:
volatile boolean flag[1..2] = {false, false};
// Process i is interestedvolatile int victim;
// Which process yields priority
- Process P (id=P, other=Q):
loop { non_critical_section; flag[P] = true; // I am interested victim = P; // I yield priority (let the other go first if contested) // Wait ONLY if Q is interested AND Q was the last one to yield (it's Q's "turn") while (flag[Q] && victim == P) {}; critical_section; flag[P] = false; // I am no longer interested }
- Process Q (id=Q, other=P): Symmetric.
Key Idea: The victim
variable ensures that if both processes try to enter simultaneously, only the one who isn’t the victim can proceed immediately. The victim must wait until the other process either loses interest (flag[other] == false
) or yields priority by setting victim
to the other process ID.
Proving Peterson’s Lock Correctness
We want to formally prove that Peterson’s Lock satisfies:
- Mutual Exclusion
- Freedom from Starvation
This requires a more rigorous way to talk about concurrent events.
Formalism: Events, Precedence, Intervals
- Events: Atomic points in the execution (e.g.,
p_1 = "flag[P]=true"
). is the j-th occurrence of eventi
in process P. - Precedence (
→
): Defines the total order of events within a single thread’s execution.
- Interval: A sequence of events between a start event and an end event where .
- Concurrency: Two intervals are concurrent if neither precedes the other.
Atomic Registers
To model shared variables like flag
and victim
, we use the concept of an Atomic Register (is not related to CPU registers).
- Operations:
read()
andwrite(v)
. - Effect Time (
τ(J)
): Each operationJ
(read or write) takes effect at a single, instantaneous point in timeτ(J)
, which occurs between the start and end of the operationJ
. - Distinctness: Operations
J
andK
on the same register have different effect times:τ(J) ≠ τ(K)
. - Read Value: A read
J
returns the valuev
written by the writeK
whose effect timeτ(K)
is the latest one that precedesτ(J)
.
Example Timeline: This shows operations by threads A, B, C on a shared register.
- Read K by A returns 1 (written by J).
- Read N by C returns 8 (written by M, which happened after L).
- Read O by A returns 4 (written by L, which happened after J and M).
Treating operations on atomic registers as instantaneous events simplifies proofs, but remember that the order of concurrent events is non-deterministic.
Proof: Mutual Exclusion (Peterson)
Proof: Freedom from Starvation (Peterson)
Peterson’s Lock in Java
class PetersonLock {
// Need atomicity and visibility guarantees
volatile boolean flag[] = new boolean[2];
volatile int victim;
// id = 0 or 1
public void Acquire(int id) {
int other = 1 - id;
flag[id] = true;
victim = id;
// Spin-wait
while (flag[other] && victim == id) {};
}
public void Release(int id) {
flag[id] = false;
}
}
Caveat: As noted before, volatile
on the array reference flag
isn’t enough. For guaranteed correctness in Java, use AtomicIntegerArray
for flag
and AtomicInteger
for victim
.
Extending to N Processes
Peterson’s lock is elegant but limited to two processes. How do we handle N processes?
The Filter Lock
The Filter Lock extends Peterson’s idea to N processes using levels.
- Imagine N-1 “waiting rooms” or levels before the final critical section (level N).
- Each thread
t
has alevel[t]
indicating the highest level it’s trying to enter (initially 0, non-critical). - Each level
l
(from 1 to N-1) has avictim[l]
variable, similar to Peterson’s. - To enter CS (Acquire):
- A thread
me
must pass through levelsi = 1
toN-1
. - To enter level
i
, it setslevel[me] = i
andvictim[i] = me
. - It then waits (
while
) until there is no other threadk
at a level greater than or equal toi
(level[k] >= i
) OR untilme
is no longer the victim for leveli
(victim[i] != me
).
- A thread
- To leave CS (Release): Set
level[me] = 0
.
int[] level; // size n, init 0
int[] victim; // size n, init undefined
lock(me) { // me is thread ID 0..n-1
for (int i = 1; i < n; ++i) { // Try levels 1 to n-1
level[me] = i;
victim[i] = me;
// Wait while some OTHER thread k is at level i or higher
// AND I am the victim for this level
while ( (exists k != me such that level[k] >= i) &&
(victim[i] == me) ) {};
}
// Passed all levels, enter Critical Section (level n)
}
unlock(me) {
level[me] = 0; // Exit CS and all levels
}
Intuition: At each level i
, the victim
mechanism ensures at most N-i threads can pass. By level N-1, at most one thread remains, which can then enter the critical section.
Filter Lock in Java: Requires AtomicIntegerArray
for level
and victim
for correctness. The Others
check involves iterating through all other threads.
Properties:
- Satisfies Mutual Exclusion.
- Is Deadlock-free.
- Is Starvation-free (a thread can only be blocked by threads entering after it reached a certain level).
Fairness:
-
We can divide the lock preprotocol into a finite doorway section and an unbounded waiting section.
-
A lock is first-come-first-served (FCFS) (and thus defined to be fair) if
Doorway_A → Doorway_B
impliesCS_A → CS_B
. -
Filter Lock Fairness: The Filter Lock is not FCFS. A thread can pass through all levels quickly while another thread, which started earlier, might be repeatedly made the victim at lower levels.
-
Other Issues: Reads N-1 shared variables in the
while
loop (busy-waiting), which can be inefficient (cache contention).
A Small Detour: Safe and Regular Registers
Can we build mutual exclusion even if reads/writes aren’t fully atomic? Surprisingly, yes, using weaker register types.
- Safe Register: A read not concurrent with a write returns the correct current value. A read concurrent with a write can return any possible value from the register’s domain (garbage!).
- Regular Register: Stronger than safe. A read not concurrent with a write returns the correct value. A read concurrent with a write can return either the old value or the new value being written (but not garbage).
- Atomic Register: Strongest. Read returns value from the unique preceding write based on effect time
τ
.
(SWMR = Single Writer Multiple Reader, the name is a bit misleading as it isn’t really “safe”)
Example: If B writes 4 while C reads, a safe register read by C could return anything. A regular register read by C could return 1 (old value) or 4 (new value).
Lamport’s Bakery Algorithm (1974)
Provides mutual exclusion for N processes using only regular registers (a surprising result!). Inspired by bakery ticket systems.
- Idea: Processes take a “ticket number” before entering. The number taken is higher than any existing number. Processes wait until their number is the lowest among those wanting entry.
- Uses process IDs to break ties if numbers are equal.
Simplified 2-Process Version:
- Shared:
volatile int np = 0, nq = 0;
(Ticket numbers) - Process P:
loop { non_critical_section; np = nq + 1; // Take ticket > Q's // Wait if Q wants access (nq!=0) AND Q has smaller ticket (nq < np) // OR if tickets are equal and Q has priority (np==nq and PID_Q < PID_P) // Simplified here: just wait if Q interested and has earlier/equal ticket while (nq != 0 && nq < np); // Simplified check for illustration critical_section; np = 0; // Release ticket }
- Process Q: Symmetric logic
nq = np + 1; while (np != 0 && np <= nq);
N-Process Version:
- Shared:
boolean flag[0..n-1] = {false};
// SWMR “I want the lock”integer label[0..n-1] = {0};
// SWMR “ticket number”
- Lexicographical Order: if OR ( and ).
- Acquire (lock(me)):
- Set
flag[me] = true
. - Take ticket:
label[me] = max(label[0..n-1]) + 1
. - Wait (
while
) until for all other processesk
, eitherflag[k]
is false OR I have the lower ticket according to .
- Set
- Release (unlock(me)):
- Set
flag[me] = false
.
- Set
Properties: Guarantees Mutual Exclusion, is FCFS (and thus starvation-free), and works using only SWMR regular registers (reads concurrent with the single writer might return old or new value).
Bakery Lock in Java: Again, needs AtomicIntegerArray
for flag
and label
for practical correctness due to Java Memory Model guarantees. Conflict
check iterates through others, comparing labels and IDs. MaxLabel
finds the current max label.
Problem: Although correct and FCFS, it requires reading N shared variables (flag
and label
) in the while
loop, which is inefficient.
This concludes our exploration of implementing mutual exclusion using basic atomic/regular registers. Next time, we’ll look at hardware primitives that make implementing efficient locks much easier.
Continue: 17 Read-Modify-Write Operations, TAS, CAS, TATAS Locks, Deadlocks