Lecture from 06.05.2025 | Video: Video ETHZ
We’ve seen the immense computational demands, particularly in AI, driving the need for massive parallelism. Understanding how to manage concurrency correctly and efficiently is crucial, whether dealing with multi-core CPUs or specialized hardware like Tesla’s FSD chip, which, interestingly, aims for high fault tolerance without traditional locks according to Elon Musk (though the specifics of how they achieve concurrency control are complex).
Last Week’s Lecture Recap
Let’s briefly summarize the key points from our previous lecture:
- Explicit/Interface Locks (`java.util.concurrent.locks`): We saw that explicit `Lock` and `Condition` objects offer more flexibility and power than intrinsic monitors (`synchronized`/`wait`/`notify`) but are also more complex and error-prone.
- Sleeping Barber Example: This illustrated a technique using counters alongside condition variables to avoid unnecessary notifications when no threads are actually waiting, preventing "lost signals". The core idea is to keep counts of active/waiting threads.
- Reader/Writer Locks: We introduced `ReentrantReadWriteLock` as a mechanism to allow concurrent reads while ensuring exclusive writes, discussing various implementations and fairness considerations (e.g., reader vs. writer priority).
Learning Goals Today
Today, we’ll focus on:
- Lock Granularity in Practice: Applying different locking strategies (fine-grained, optimistic, lazy) to concrete data structures like linked lists and stacks.
- Various Locking Modes: Understanding the trade-offs and implementation details of these different approaches through examples.
Fine-Grained Locking Revisited
Recall the core idea of fine-grained locking:
- Goal: Increase concurrency by splitting a data structure into smaller pieces, each protected by its own separate lock.
- Benefit: Allows operations on disjoint pieces to proceed in parallel without mutual exclusion.
- Challenge: Often significantly more intricate than coarse-grained locking, requiring careful analysis of special cases and potential interactions between threads operating on adjacent pieces.
Example: Fine-Grained Locking on a Sorted Linked List Set
We previously introduced the concept of hand-over-hand locking (or lock coupling) for traversing and modifying a linked list, where a thread always holds locks on two adjacent nodes (`pred` and `curr`) before proceeding or modifying pointers.

Let's reconsider removing `c`. Locking just `b` is insufficient: concurrent removes of `b` and `c` can interfere, leaving `c` undeleted.

The problem is that modifying `b.next` requires knowing `c.next` safely, and preventing interference with `b` itself. Locking both `pred` and `curr` is necessary. (Note that `d` does not need to be locked: as long as `a -> d` is satisfied, we don't care how `d.next` changes. Deletes of `d` happen atomically under these locks, so that isn't a problem.)

The hand-over-hand locking protocol for `remove` ensures safety by locking `curr` before unlocking `pred` during traversal, and by holding locks on both `pred` and `curr` during the actual pointer update (`pred.next = curr.next`).
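The protocol just described can be sketched in Java. This is a minimal illustration, not the lecture's reference implementation; the `Node` class, sentinel keys, and method names are assumptions made for the sketch:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hand-over-hand (lock coupling) on a sorted linked list set of ints.
// Sentinel head/tail nodes (MIN_VALUE / MAX_VALUE) avoid edge cases.
public class HandOverHandList {
    private static final class Node {
        final int key;
        Node next;
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    public boolean add(int key) {
        Node pred = head;
        pred.lock.lock();
        Node curr = pred.next;
        curr.lock.lock();
        try {
            while (curr.key < key) {           // advance, always holding two locks:
                pred.lock.unlock();            // release the trailing lock only
                pred = curr;                   // after the next one is held
                curr = curr.next;
                curr.lock.lock();
            }
            if (curr.key == key) return false; // already present
            pred.next = new Node(key, curr);   // splice in while both are locked
            return true;
        } finally {
            curr.lock.unlock();
            pred.lock.unlock();
        }
    }

    public boolean remove(int key) {
        Node pred = head;
        pred.lock.lock();
        Node curr = pred.next;
        curr.lock.lock();
        try {
            while (curr.key < key) {           // lock curr before unlocking pred
                pred.lock.unlock();
                pred = curr;
                curr = curr.next;
                curr.lock.lock();
            }
            if (curr.key != key) return false;
            pred.next = curr.next;             // both pred and curr are locked here
            return true;
        } finally {
            curr.lock.unlock();
            pred.lock.unlock();
        }
    }
}
```

Note how the update `pred.next = curr.next` happens only while both locks are held, exactly as the protocol requires.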
Disadvantages of Hand-Over-Hand Locking:
- Overhead: Operations require a sequence of lock acquire/release operations during traversal, adding overhead.
- Blocking: A slow thread holding a lock on an early node can block other threads wanting to access nodes further down the list, even if the operations are on logically disjoint parts. Concurrency is limited by the sequential nature of the traversal locking.
Can we do better for list-based structures?
Optimistic Synchronization (Locking)
Idea
Assume conflicts (interference from other threads) are rare. Perform the operation in stages:
- Find Nodes (Lock-Free): Traverse the list without acquiring any locks to find the relevant nodes (e.g., `pred` and `curr` for an add/remove operation).
- Lock Nodes: Once the target nodes are found, acquire locks only on those specific nodes.
- Validate: After acquiring locks, check if the situation is still valid. Did another thread modify the list structure (e.g., delete `pred` or `curr`, insert between them) after stage 1 but before we acquired the locks in stage 2?
- Perform Operation: If validation succeeds, perform the actual modification (e.g., update pointers).
- Unlock: Release the locks. If validation fails, release locks and retry (or signal failure).
Example: `add(c)`

- Traverse without locks, finding that `c` should go between `b` and `d`.
- Lock `b` and `d`.
- Validate: Is `b` still reachable? Is `d` still the successor of `b`?
- If valid, set `c.next = d`, then `b.next = c`.
- Unlock `b` and `d`.
Validation Failure Example 1: Deletion

- Thread A: `add(c)`. Finds insertion point between `b` and `d`.
- Thread B: `remove(b)`. Runs concurrently. Finds `a` and `b`, locks them, changes `a.next` to `d`, unlocks. `b` is now removed.
- Thread A: Tries to lock `b` and `d`. Let's say it succeeds (if `b` wasn't fully reclaimed).
- Thread A: Validates. Rescans from the head. Finds `a` points to `d`: node `b` is no longer reachable. Validation fails.
- Thread A: Unlocks `b` and `d`, returns `false` (or retries).
Validation Failure Example 2: Insertion

- Thread A: `add(c)`. Finds insertion point between `b` and `d`.
- Thread B: `insert(b')`. Runs concurrently. Finds insertion point between `b` and `d`, locks `b` and `d`, inserts `b'` (`b.next = b'`, `b'.next = d`), unlocks.
- Thread A: Tries to lock `b` and `d`. Succeeds.
- Thread A: Validates. Rescans or checks `b.next`. Finds `b.next` is now `b'`, not `d`. Validation fails because `d != succ(b)`.
- Thread A: Unlocks `b` and `d`, returns `false` (or retries).
Validation Summary: The `validate(pred, curr)` method needs to:

- Lock `pred` and `curr`.
- Check if `pred` is still reachable from the `head`.
- Check if `curr` is still the immediate successor of `pred` (`pred.next == curr`).
- (Implicitly) Check that `pred` and `curr` are not marked deleted (relevant for lazy lists).
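The optimistic scheme, including a rescan-based `validate`, can be sketched as follows. This is an illustrative sketch under the assumptions above; names like `OptimisticList` and the sentinel-node layout are not from the lecture's code:

```java
import java.util.concurrent.locks.ReentrantLock;

// Optimistic synchronization: traverse without locks, lock pred/curr,
// then validate by rescanning from the head; retry on validation failure.
public class OptimisticList {
    private static final class Node {
        final int key;
        volatile Node next;
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    // Valid iff pred is still reachable from head and pred.next == curr.
    private boolean validate(Node pred, Node curr) {
        Node node = head;
        while (node.key <= pred.key) {
            if (node == pred) return pred.next == curr;
            node = node.next;
        }
        return false; // pred no longer reachable
    }

    public boolean add(int key) {
        while (true) {
            Node pred = head, curr = head.next;   // 1. find nodes, lock-free
            while (curr.key < key) { pred = curr; curr = curr.next; }
            pred.lock.lock();
            curr.lock.lock();                     // 2. lock target nodes
            try {
                if (validate(pred, curr)) {       // 3. validate
                    if (curr.key == key) return false;
                    pred.next = new Node(key, curr); // 4. perform operation
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();               // 5. unlock (retry if invalid)
            }
        }
    }

    public boolean remove(int key) {
        while (true) {
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; }
            pred.lock.lock();
            curr.lock.lock();
            try {
                if (validate(pred, curr)) {
                    if (curr.key != key) return false;
                    pred.next = curr.next;
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();
            }
        }
    }
}
```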
Correctness relies on Validation

- For `remove(c)` (where `b` is pred, `c` is curr): If validation (`b` reachable, `c == b.next`, neither marked) passes while holding locks on `b` and `c`, then no other thread can be concurrently deleting `b` or `c`, nor inserting between them. It is safe to proceed.
- For `remove(c)` (where `c` is not found, `b` is pred, `d` is curr): If validation (`b` reachable, `d == b.next`, neither marked) passes while holding locks on `b` and `d`, then no other thread can have inserted `c` between `b` and `d` after our initial scan but before we locked `b` and `d`. It is safe to return `false`.
Optimistic List Trade-offs

- Good:
  - No Contention on Traversals: The initial scan is lock-free, allowing many concurrent readers/traversers.
  - Wait-Free Traversals: Finding the nodes doesn't involve waiting for locks (assuming `contains` doesn't lock). A wait-free operation guarantees completion in a finite number of its own steps, regardless of other threads' speeds or pausing.
  - Fewer Lock Acquisitions: Locks are only acquired briefly on the target nodes during the modify phase.
- Bad:
  - Double Traversal: Need to traverse (or at least rescan for validation) twice in the worst case (scan, lock, validate-scan, modify).
  - `contains()` Needs Locks?: A simple `contains` that just scans might return true for a node that's concurrently being deleted (after scan, before validation). A correct `contains` might also need to lock/validate, losing the wait-free benefit.
  - Not Starvation-Free: Threads might repeatedly fail validation due to high contention and have to retry indefinitely.
Lazy Synchronization (Locking)
Idea: Improve upon optimistic locking, particularly for `remove`. Removing nodes often requires locking three nodes (pred, curr, succ) in fine-grained schemes, or careful validation in optimistic schemes. Lazy synchronization aims to simplify `remove`.

Lazy List Approach:

- Similar to the optimistic list, but modifies the `remove` operation.
- Scan Once: Traverse the list once (optimistically or using hand-over-hand).
- `contains()` Never Locks: A simple scan is sufficient.
- Logical Deletion (Marking): Instead of physically removing a node immediately, `remove` first marks the node as "logically deleted".
- Physical Deletion (Lazy): The actual removal (updating the predecessor's `next` pointer) is done later, potentially by the same `remove` operation or even by subsequent `add` or `remove` operations that happen to traverse over the marked node.
- Key Invariant: Every unmarked node in the list must always be reachable from the head. `contains` only returns true for unmarked nodes.
Lazy List Remove Steps:

- Scan: Find `pred` and `curr` (where `curr` is the node to be removed).
- Lock: Lock `pred` and `curr`.
- Validate: Check that `pred` is reachable and points to `curr`, AND, crucially, that `curr` is not already marked as deleted.
- Logical Delete: If validation passes, mark `curr` as deleted.
- Physical Delete: Change `pred.next` to point to `curr.next`.
- Unlock: Release locks on `pred` and `curr`.

Key Invariant for Lazy List: If a node is not marked as deleted, it must be reachable from the head via a path of unmarked nodes. `contains` checks this invariant by ignoring marked nodes.
Lazy Remove Code Sketch:
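A minimal sketch of the steps above, assuming a `Node` with a volatile `marked` flag (class and field names are illustrative, not the lecture's reference code). Note that `validate` no longer rescans from the head: checking the mark bits and `pred.next == curr` suffices. Here physical deletion happens in the same `remove` call, though it could equally be deferred:

```java
import java.util.concurrent.locks.ReentrantLock;

// Lazy list: remove marks the node first (logical delete), then unlinks
// it (physical delete). Validation checks marks instead of rescanning.
public class LazyList {
    static final class Node {
        final int key;
        volatile Node next;
        volatile boolean marked = false;   // logically deleted?
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    private static boolean validate(Node pred, Node curr) {
        return !pred.marked && !curr.marked && pred.next == curr;
    }

    public boolean remove(int key) {
        while (true) {
            Node pred = head, curr = head.next;   // 1. scan, no locks
            while (curr.key < key) { pred = curr; curr = curr.next; }
            pred.lock.lock();
            curr.lock.lock();                     // 2. lock
            try {
                if (validate(pred, curr)) {       // 3. validate (incl. marks)
                    if (curr.key != key) return false;
                    curr.marked = true;           // 4. logical delete
                    pred.next = curr.next;        // 5. physical delete
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();               // 6. unlock (retry if invalid)
            }
        }
    }

    // add uses the same scan/lock/validate pattern as the optimistic list.
    public boolean add(int key) {
        while (true) {
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; }
            pred.lock.lock();
            curr.lock.lock();
            try {
                if (validate(pred, curr)) {
                    if (curr.key == key) return false;
                    pred.next = new Node(key, curr);
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();
            }
        }
    }
}
```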
Wait-Free `contains()`:

Since `contains` only performs reads and doesn't acquire locks, it is wait-free. The `marked` flag ensures it doesn't report true for nodes that have been logically removed.
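A sketch of this wait-free `contains`, assuming the node layout described for the lazy list (repeated here so the snippet stands alone; `insertSorted` is only a single-threaded helper for demonstration):

```java
// Wait-free contains on a lazy list: one read-only scan, no locks.
public class LazyContains {
    static final class Node {
        final int key;
        volatile Node next;
        volatile boolean marked = false;   // logically deleted?
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    public boolean contains(int key) {
        Node curr = head;
        while (curr.key < key) curr = curr.next;  // read-only traversal
        return curr.key == key && !curr.marked;   // ignore marked nodes
    }

    // Single-threaded helper to build a list for demonstration only.
    void insertSorted(int key) {
        Node pred = head;
        while (pred.next.key < key) pred = pred.next;
        pred.next = new Node(key, pred.next);
    }
}
```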
Lazy List Trade-offs:

- Pros: `contains` is wait-free and fast (single scan, no locks). `remove` can be simpler, since physical deletion is separated from logical deletion.
- Cons: `add` still needs validation similar to the optimistic list. Memory isn't reclaimed until marked nodes are physically unlinked (which may be delayed).
More Practical: Lazy Skip Lists
While linked lists illustrate locking concepts well, they aren't typically the best structure for concurrent sets/maps due to O(n) traversal. Skip lists, invented by Bill Pugh, offer a more practical alternative.
Skip List: A probabilistic data structure representing a set (sorted elements without duplicates).
- Interface: `add(x)`, `remove(x)`, `find(x)` (or `contains(x)`).
- Typical Workload: Assumes many calls to `find`, fewer `add`, and even fewer `remove`.
Why not balanced trees (AVL, Red-Black)?
- Rebalancing after add/remove can be expensive.
- Rebalancing often requires global restructuring (locking large parts or the whole tree), hindering concurrency.
- Implementing lock-free or highly concurrent balanced trees is extremely complex.
Skip Lists
A probabilistic approach (Las Vegas style - always correct, probably fast) to achieve logarithmic performance on average without complex rebalancing.
Structure
- A sorted multi-level linked list.
- Each node has a probabilistic height (e.g., P(height=n) = 0.5^n). Higher nodes act as express lanes.
- No rebalancing is needed.
Property: Nodes present at higher levels are always present in all lower levels. Level 0 contains all elements.
Operations
We can apply the lazy synchronization strategy to skip lists.
Searching
- Start at the highest level list at the head (`-∞`).
- Traverse forward until the next node is greater than the target (8).
- Drop down one level.
- Repeat traversing and dropping down until level 0 is reached.
- Expected search time is O(log n).

Similarly for finding 6 (which doesn't exist)…

The `find` operation typically returns the predecessors (`pred`) and successors (`succ`) at each level to facilitate a subsequent `add`/`remove`.
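A sequential sketch of this `find`, filling per-level predecessor/successor arrays (node layout, `MAX_LEVEL`, and helper names are illustrative; the concurrent version would add per-node locks and marks on top of this skeleton):

```java
import java.util.Random;

// Sequential skip-list skeleton: find records pred/succ at every level;
// node heights are random with P(height >= n) ~ 0.5^n ("express lanes").
public class SkipListFind {
    static final int MAX_LEVEL = 8;

    static final class Node {
        final int key;
        final Node[] next;
        Node(int key, int height) { this.key = key; this.next = new Node[height]; }
    }

    final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    final Node tail = new Node(Integer.MAX_VALUE, MAX_LEVEL);
    final Random rng = new Random(42);

    SkipListFind() {
        for (int i = 0; i < MAX_LEVEL; i++) head.next[i] = tail;
    }

    int randomHeight() {
        int h = 1;
        while (h < MAX_LEVEL && rng.nextBoolean()) h++;  // coin flips
        return h;
    }

    // Fills preds/succs at every level; returns the highest level at
    // which key was found, or -1 if absent.
    int find(int key, Node[] preds, Node[] succs) {
        Node pred = head;
        int found = -1;
        for (int level = MAX_LEVEL - 1; level >= 0; level--) {
            Node curr = pred.next[level];
            while (curr.key < key) {       // move right along this level
                pred = curr;
                curr = pred.next[level];
            }
            if (found == -1 && curr.key == key) found = level;
            preds[level] = pred;           // then drop down one level
            succs[level] = curr;
        }
        return found;
    }

    // Sequential add built on find (the concurrent version locks preds).
    void add(int key) {
        Node[] preds = new Node[MAX_LEVEL], succs = new Node[MAX_LEVEL];
        find(key, preds, succs);
        Node node = new Node(key, randomHeight());
        for (int level = 0; level < node.next.length; level++) {
            node.next[level] = succs[level];
            preds[level].next[level] = node;
        }
    }

    boolean contains(int key) {
        Node[] preds = new Node[MAX_LEVEL], succs = new Node[MAX_LEVEL];
        return find(key, preds, succs) != -1;
    }
}
```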
Adding
`add(6)` Example:

- Find Predecessors (Lock-Free): Traverse lock-free to find the predecessors at each level where the new node (6) should be inserted (nodes 5 and `-∞` in this example).
- Lock Predecessors: Acquire locks on the predecessor nodes found (lock node 5 at levels 0, 1, 2; lock the head at level 3).
- Validate: Rescan from the predecessors to ensure they are still reachable and their successors haven't changed (i.e., no concurrent add/remove interfered).
- Splice: If validation succeeds, create the new node (6) with its random height (say, 3 levels), and update the `next` pointers of the locked predecessors at the relevant levels. Mark the node as fully linked.
- Unlock: Release locks on the predecessors.
Removing
`remove(5)` Example (Lazy):

- Find Predecessors: Traverse lock-free to find the predecessors of node 5 at all levels.
- Lock Victim: Acquire the lock on the node to be removed (node 5).
- Logical Remove (Mark): Mark node 5 as logically deleted at all its levels. This prevents `contains(5)` from returning true.
- Lock Predecessors & Validate: Acquire locks on the predecessors. Validate that the predecessors are still linked correctly and point to the (now marked) victim node 5.
- Physical Remove: If validation passes, update the predecessors' `next` pointers to bypass node 5.
- Unlock: Release locks on the predecessors and the victim.
Contains
`contains(8)` Example:

- Perform a standard skip list find operation (traverse high levels, drop down). No locks are acquired.
- If node 8 is found, check whether it is marked for deletion and whether it is fully linked (relevant while a concurrent `add` is in progress).
- Return true only if it is found, not marked, and fully linked.
- This operation is wait-free because it involves no locking and the traversal is guaranteed to finish (assuming no infinite loops in the list structure itself, which marking prevents for removals).
Skip lists provide a practical concurrent set/map data structure. The full implementation involves handling many details carefully (e.g., node marking, validation logic).
Motivating Lock-Free Programming
We’ve explored various locking techniques (spinlocks, locks with scheduling/waiting, different granularities). Now, let’s consider why we might want to move beyond locks entirely.
Reminder: Problems with Spinlocks
- Poor performance under contention (bus traffic, cache invalidation).
- Waste CPU resources spinning.
- No scheduling fairness (can lead to starvation).
- No notification mechanism for condition waiting.
Locks with Waiting/Scheduling (Mutexes, Semaphores, Monitors)
- Avoid wasteful spinning by suspending threads (yielding CPU).
- Require support from OS/runtime scheduler.
- Internal waiting queues need protection (often short-term spinlocks!).
- Higher latency for wakeup (involves scheduler).
- Hybrid “competitive spinning” tries spinning for a short time before blocking.
Performance: Locks have minimal overhead in the uncontended case (often just the cost of one atomic RMW). However, in the contended case, performance degrades significantly due to blocking, context switching, and scheduler overhead. Starvation is possible depending on the implementation.
Fundamental Disadvantages of Locking
- Pessimistic: Locks assume the worst (a conflict might happen) and enforce mutual exclusion always, even if no conflict actually occurs in a particular execution.
- Performance: Overhead even when uncontended; significant degradation under contention. Bottlenecks (Amdahl’s Law!).
- Blocking Semantics:
- Priority Inversion: A high-priority thread can be blocked waiting for a low-priority thread holding a lock.
- Convoying: If a thread holding a lock is descheduled (e.g., due to time slice end, page fault), all other threads waiting for that lock are blocked, even if they are ready to run.
- Fault Tolerance: If a thread dies while holding a lock, the lock might never be released, blocking other threads permanently.
- Deadlocks/Livelocks: Prone to these issues.
- Interrupt Handlers: Cannot typically acquire blocking locks within interrupt handlers.
Lock-Free Programming
Can we achieve synchronization without using locks (mutual exclusion)? Yes, using lock-free techniques.
Recap: Blocking Synchronization Issues
- Deadlock: Cycle of processes waiting for resources held by others.
- Livelock: Processes repeatedly change state in response to others but make no overall progress (e.g., two people trying politely to get out of each other’s way in a corridor).
- Starvation: A process is indefinitely prevented from making progress, even though the system might be progressing overall.
Definitions for Non-Blocking Synchronization
- Lock-Freedom: Guarantees that at least one thread always makes progress (completes its operation) in a finite number of steps, regardless of the state of other threads (even if some are slow or paused). Implies system-wide progress but does not guarantee freedom from starvation (a specific thread might repeatedly fail or be preempted just before completing).
- Wait-Freedom: Stronger guarantee. Every thread is guaranteed to make progress (complete its operation) in a finite number of its own steps, regardless of other threads. Implies freedom from starvation.
Progress Conditions Compared
|  | Non-blocking (No Locks) | Blocking (Locks) |
| --- | --- | --- |
| Everyone Progresses | Wait-free | Starvation-free |
| Someone Progresses | Lock-free | Deadlock-free |
Non-Blocking Algorithms
- Key Property: The failure or suspension of one thread cannot cause the failure or indefinite suspension of another thread. This is fundamentally different from locks, where one thread holding a lock can indefinitely block others.
- Implementation: Rely on atomic hardware primitives like Compare-and-Swap (CAS) to make progress safely without acquiring locks.
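As a minimal illustration of the CAS pattern, here is a lock-free counter using Java's `AtomicInteger` (a sketch of the retry-loop idiom, not code from the lecture): read the current value, compute the new one, and attempt the CAS; if another thread won the race, retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free increment via compare-and-swap: at any moment, at least one
// competing thread's CAS succeeds, so the system as a whole progresses,
// even though an individual thread may retry (lock-free, not wait-free).
public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        while (true) {
            int current = value.get();       // read
            int next = current + 1;          // compute
            if (value.compareAndSet(current, next)) {
                return next;                 // our CAS won
            }
            // CAS failed: another thread updated value; retry.
        }
    }

    public int get() { return value.get(); }
}
```

A failed CAS here is never wasted globally: it means some other thread's operation completed, which is exactly the lock-freedom guarantee from the definitions above.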
Continue here: 22 Lock-Free Stack, Lock-Free List-Based Set, Lock-Free Queue