Lecture from 23.04.2025 | Video: Video ETHZ
Deadlock
We saw a potential deadlock scenario last time with the bank account `transferTo` method.
Definition: A deadlock occurs when two or more processes (or threads) are mutually blocked, each waiting for a resource held by another process in the set, preventing any of them from proceeding. Think of traffic gridlock or a four-way intersection standoff.
Threads and Resources Graphically
To analyze deadlocks, we can use a resource allocation graph:
- Circles represent threads (e.g., A).
- Squares with locks represent resources (e.g., lock x).
- Arrow from Thread to Resource (P → a): Thread P is requesting (and blocked on) resource a.
- Arrow from Resource to Thread (b → Q): Resource b is held by thread Q.
Deadlocks – More Formally
A deadlock exists if and only if the resource allocation graph (showing threads and resources, with edges for "wants" and "has") contains a cycle. For example, if thread A holds x and wants y while thread B holds y and wants x, the edges x → A, A → y, y → B, B → x form a cycle.
Techniques for Handling Deadlocks
- Deadlock Detection and Recovery:
  - Some systems (mainly databases) periodically run algorithms to find cycles in the dependency graph.
  - Recovery is difficult: it usually involves aborting one or more processes/transactions in the cycle, releasing their resources.
  - Not generally feasible for locks in parallel programming: forcibly releasing a lock held by a thread typically leaves the shared data in an inconsistent state.
- Deadlock Avoidance/Prevention: Design the system or program so that cycles can never occur. This is the standard approach in parallel programming. Common techniques include:
  - Two-Phase Locking with Retry: Acquire all needed locks; if any acquisition fails, release all held locks and retry later. Common in databases, where transactions can be aborted cleanly; less common with basic mutexes.
  - Resource Ordering (Lock Ordering): Define a global, arbitrary total order among all resources (locks) and require all threads to acquire locks only in ascending order according to this global order. This prevents cycles.
Back to Our Bank Account Example
Recall the `transferTo` method causing deadlock:
```java
class BankAccount {
    // ...
    synchronized void transferTo(int amount, BankAccount a) {
        this.withdraw(amount); // 1. Holds lock on 'this' (via synchronized method)
        a.deposit(amount);     // 2. Acquires lock on 'a' inside deposit
    }
}
```
Deadlock occurs if Thread 1 calls `x.transferTo(y)` (locks x, tries to lock y) and Thread 2 calls `y.transferTo(x)` (locks y, tries to lock x).
How can we avoid this deadlock?
Option 1: Smaller Critical Sections (Usually Wrong)
```java
// Make transferTo NOT synchronized
void transferTo(int amount, BankAccount a) {
    this.withdraw(amount); // locks/unlocks 'this' internally
    a.deposit(amount);     // locks/unlocks 'a' internally
}
```
Problem: This avoids deadlock because the two locks are never held simultaneously. However, it introduces a race condition / bad interleaving: money is withdrawn from `this`, but before it is deposited into `a`, another thread could observe the system in an inconsistent state (the money has temporarily "disappeared"). This is often unacceptable; the atomicity of the transfer is lost.
Option 2: One Lock For All Accounts (Inefficient)
```java
class BankAccount {
    static Object globalLock = new Object(); // ONE lock for ALL accounts
    // withdraw & deposit use globalLock internally (not shown)

    void transferTo(int amount, BankAccount to) {
        synchronized (globalLock) { // acquire the single global lock
            withdraw(amount);       // assumes withdraw now uses globalLock too
            to.deposit(amount);     // assumes deposit now uses globalLock too
        }
    }
}
```
Problem: This avoids deadlock because only one lock exists. However, it destroys concurrency! Only one transfer (or withdraw/deposit) can happen anywhere in the system at a time, even if the transfers involve completely different pairs of accounts. Very inefficient.
Option 3: Resource Ordering (The Preferred Way)
Impose a global order on acquiring the locks. We need a way to uniquely order `BankAccount` objects. Let's assume each account has a unique `accountNr`.
```java
class BankAccount {
    // ... accountNr field ...
    // withdraw & deposit remain synchronized on 'this'

    void transferTo(int amount, BankAccount to) {
        // Ensure locks are always acquired in ascending order of accountNr
        if (this.accountNr < to.accountNr) {
            synchronized (this) {     // 1. lock 'this' (lower number)
                synchronized (to) {   // 2. lock 'to' (higher number)
                    this.withdraw(amount);
                    to.deposit(amount);
                }
            }
        } else if (this.accountNr > to.accountNr) {
            synchronized (to) {       // 1. lock 'to' (lower number)
                synchronized (this) { // 2. lock 'this' (higher number)
                    this.withdraw(amount);
                    to.deposit(amount);
                }
            }
        } else {
            // Special case: transfer to self. Handle appropriately
            // (e.g., do nothing or throw an exception). Acquiring the same
            // lock twice would be fine anyway, since synchronized is
            // re-entrant, but this branch sidesteps the question entirely.
        }
    }
}
```
How it works: By always acquiring the lock on the account with the smaller account number first, we break the potential for a circular wait. If Thread 1 wants x then y, and Thread 2 wants y then x, one of them will acquire the lower-numbered account first, and the other will block on that first lock, preventing the deadlock cycle.
Programming Trick: Generating an Order
What if objects don’t have a natural unique ID for ordering? We can generate one!
```java
import java.util.concurrent.atomic.AtomicLong;

class BankAccount {
    private static final AtomicLong counter = new AtomicLong(0);
    private final long uniqueId = counter.incrementAndGet(); // assign a unique ID on creation

    void transferTo(int amount, BankAccount to) {
        // Use uniqueId instead of accountNr for ordering locks
        if (this.uniqueId < to.uniqueId) {
            synchronized (this) { synchronized (to) { /* ... */ } }
        } else if (this.uniqueId > to.uniqueId) {
            synchronized (to) { synchronized (this) { /* ... */ } }
        } else {
            /* transfer to self */
        }
    }
}
```
Using an `AtomicLong` ensures each account gets a unique sequential ID upon creation, providing a consistent global order.
Another Deadlock Example: StringBuffer
The (older, now largely replaced by `StringBuilder`) `StringBuffer` class in Java has synchronized methods. Consider `append(StringBuffer sb)`:
```java
synchronized void append(StringBuffer sb) {
    int len = sb.length(); // calls sb.length() - requires the lock on sb?
    // ... potentially expand 'this.value' ...
    sb.getChars(0, len, this.value, this.count); // calls sb.getChars() - requires the lock on sb?
}
```
Problem #1 (Race Condition): The lock on `this` is held, but the code calls methods (`length`, `getChars`) on the other `StringBuffer` (`sb`). If `sb`'s methods are also synchronized, each call acquires and releases `sb`'s lock individually; `append` does not hold `sb`'s lock between the `sb.length()` call and the `sb.getChars()` call. Another thread could modify `sb` in between, causing `append` to read a stale length or copy inconsistent data.
Problem #2 (Deadlock): If two threads execute `x.append(y)` and `y.append(x)` concurrently:

- Thread 1 locks `x` (entering `x.append`).
- Thread 2 locks `y` (entering `y.append`).
- Thread 1 tries to call `y.length()` (or `y.getChars()`) and blocks waiting for the lock on `y`.
- Thread 2 tries to call `x.length()` (or `x.getChars()`) and blocks waiting for the lock on `x`.

Deadlock! This is the exact same pattern as the bank transfer.
Fixes?

- Fixing both problems requires careful thought (e.g., lock ordering using unique IDs, or a single global lock for all `StringBuffer`s, both with drawbacks).
- The original Java library initially didn't fix it, relying on users to avoid such patterns.
- Eventually, the non-synchronized `StringBuilder` was introduced for single-threaded use cases, while `StringBuffer` remains (mostly) for legacy compatibility, still carrying the potential for these issues if used improperly.
Avoiding Deadlocks: Programming Patterns
Deadlocks involving acquiring multiple locks are hard to debug. Use patterns to avoid them:
- Different Object Types: If operations involve locks on objects of different types (e.g., a Hashtable and a WorkQueue), establish and document a fixed acquisition order (e.g., “Always lock Hashtable before WorkQueue”).
- Acyclic Structures: If objects are part of an acyclic structure (like a tree), use the structure to define the locking order (e.g., "Never acquire a lock on a non-child node while holding a lock on a parent node", i.e., lock order follows tree edges from parent to child); see the sketch below.
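A minimal sketch of the tree rule, with a hypothetical `Node` class: every thread acquires locks only along tree edges from parent to child, so all lock-acquisition paths point "downward" and no cycle can form.

```java
// Hypothetical example: lock order follows the tree structure.
class Node {
    int value;
    Node left, right;

    // Moves this node's value down into one of its children.
    void pushDown(Node child) {
        synchronized (this) {       // 1. lock the parent first
            synchronized (child) {  // 2. then the child; never the reverse
                child.value += this.value;
                this.value = 0;
            }
        }
    }
}
```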
Significance of Deadlocks
- While data races are fundamental errors violating the memory model, deadlock is often the more dominant practical problem in complex concurrent systems using locks.
- Deadlocks can be very hard to anticipate and debug because they depend on specific timing and interleaving.
- Starvation (a related issue where a thread is repeatedly denied access even if not strictly deadlocked) is also a concern.
Semaphores: Beyond Simple Mutual Exclusion
Locks provide mutual exclusion – ensuring only one thread executes a critical section at a time. But sometimes we need more generalized synchronization.
Why Need More Than Locks?
- Communication: Locks don't provide a direct way for threads to communicate about changes in state (e.g., "data is ready," "buffer is full/empty"). Threads might acquire a lock only to find the condition they need isn't met, forcing them to release the lock and try again (inefficient busy-waiting or complex `wait`/`notify`).
- Ordering: Locks don't enforce a specific order of entry if multiple threads are waiting.
- Resource Counting: Locks are binary (held or not held). What if we want to allow up to N threads to access a resource simultaneously?
Example Need: Producer-consumer queues, where producers add items and consumers remove them. Consumers need to wait if the queue is empty, and producers need to wait if the queue is full.
Semaphores (Edsger Dijkstra, 1965):
Named after railway signals, semaphores are a fundamental synchronization primitive.
Semantics:

- A semaphore `S` is an abstract data type holding a non-negative integer value, initialized to some value.
- It supports two atomic operations:
  - `acquire(S)` (or `P(S)`, `wait(S)`):

    ```
    // ---- Start Atomic ----
    wait until S > 0; // block while S is zero
    S = S - 1;
    // ---- End Atomic ----
    ```

  - `release(S)` (or `V(S)`, `signal(S)`):

    ```
    // ---- Start Atomic ----
    S = S + 1;
    // ---- End Atomic ----
    ```
Intuition: `acquire` tries to decrement the count, waiting while it is zero. `release` increments the count (potentially waking up a waiting thread).
Building a Lock with a Semaphore
A standard lock (mutex) can be implemented using a semaphore initialized to 1 (a “binary semaphore”):
- `mutex = new Semaphore(1);` (initial value 1 means "unlocked")
- `lock()` operation: `mutex.acquire();`
  - If `S` is 1 (unlocked), it becomes 0 (locked), and the thread proceeds.
  - If `S` is 0 (locked), the thread blocks until another thread releases.
- `unlock()` operation: `mutex.release();`
  - Sets `S` from 0 back to 1 (unlocked). If another thread was waiting on `acquire`, it can now proceed.
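In Java this maps directly onto `java.util.concurrent.Semaphore`. A minimal sketch, wrapped in a class of our own (`SemaphoreLock` is not a library type):

```java
import java.util.concurrent.Semaphore;

// A mutex built from a binary semaphore (sketch, not production code).
class SemaphoreLock {
    private final Semaphore mutex = new Semaphore(1); // 1 = "unlocked"

    void lock() throws InterruptedException {
        mutex.acquire(); // S: 1 -> 0, or block while S == 0
    }

    void unlock() {
        mutex.release(); // S: 0 -> 1, possibly waking a waiting thread
    }
}
```

Note one design difference from a real lock: a semaphore has no notion of an owner, so nothing stops a thread other than the holder from calling `unlock()`.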
Example: Scaled Dot Product Synchronization
Recall synchronizing two threads computing partial dot products.

- With Locks: requires careful locking/unlocking around the shared sum `x`.
- With Semaphores:
  - Initialize `Semaphore S = new Semaphore(0);`
  - Thread A: `xA = ...; x = x + xA; S.release();`
  - Thread B: `xB = ...; S.acquire(); x = x + xB;`

Thread B waits until A releases, ensuring `xA`'s contribution is included before B adds `xB`. (Note: this simple example assumes the update `x = x + xA` is atomic or protected separately if needed.)
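A runnable sketch of this ordering with `java.util.concurrent.Semaphore` (class name, variable names, and the placeholder partial results are ours):

```java
import java.util.concurrent.Semaphore;

class OrderedSum {
    static double x = 0;                         // shared sum
    static final Semaphore s = new Semaphore(0); // 0 = "A has not contributed yet"

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> {
            double xA = 1.5;   // A's partial result (placeholder)
            x = x + xA;
            s.release();       // signal: A's contribution is in
        });
        Thread b = new Thread(() -> {
            double xB = 2.5;   // B's partial result (placeholder)
            try {
                s.acquire();   // wait until A has released
            } catch (InterruptedException e) {
                return;
            }
            x = x + xB;        // runs strictly after A's update
        });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(x); // always 4.0
    }
}
```

The `release`/`acquire` pair also establishes a happens-before edge in the Java memory model, so B is guaranteed to see A's write to `x`.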
Example: Rendezvous
Problem: Two processes, P and Q, need to wait for each other at a specific point (rendezvous) before proceeding.
Solution using Semaphores: Use two semaphores, `P_Arrived` and `Q_Arrived`, both initialized to 0.
Attempt 1: P signals its own arrival (`release(P_Arrived)`), and Q waits for that signal (`acquire(P_Arrived)`). This only makes Q wait for P, not the other way around.
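A sketch of this one-sided attempt (our reconstruction, using `java.util.concurrent.Semaphore`):

```java
import java.util.concurrent.Semaphore;

// Attempt 1: only Q waits for P; the rendezvous is one-sided.
class RendezvousAttempt1 {
    static final Semaphore pArrived = new Semaphore(0);

    static void processP() {
        // ... P's work before the rendezvous ...
        pArrived.release(); // signal "P is here", then proceed immediately
        // Flaw: P never waits for Q.
    }

    static void processQ() throws InterruptedException {
        // ... Q's work before the rendezvous ...
        pArrived.acquire(); // block until P has arrived
    }
}
```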
Attempt 2 (Deadlock!): each process first waits for the other's signal and only then signals its own arrival.

Problem: If P executes `acquire(Q_Arrived)` and blocks, and Q executes `acquire(P_Arrived)` and blocks, neither can execute its `release` to signal the other. Deadlock!
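A sketch of this deadlocking attempt (again our reconstruction):

```java
import java.util.concurrent.Semaphore;

// Attempt 2: wait first, signal second. Both block in step 1. Deadlock!
class RendezvousAttempt2 {
    static final Semaphore pArrived = new Semaphore(0);
    static final Semaphore qArrived = new Semaphore(0);

    static void processP() throws InterruptedException {
        qArrived.acquire(); // 1. wait for Q ... but Q is also waiting
        pArrived.release(); // 2. never reached if both block in step 1
    }

    static void processQ() throws InterruptedException {
        pArrived.acquire(); // 1. wait for P
        qArrived.release(); // 2. never reached
    }
}
```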
Correct Solution (Rendezvous): Separate the signaling and waiting. Each process first signals its own arrival (`release`) and then waits for the other's signal (`acquire`). This avoids the deadlock.
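A sketch of the correct protocol:

```java
import java.util.concurrent.Semaphore;

// Correct rendezvous: signal your own arrival first, then wait for the other.
class Rendezvous {
    static final Semaphore pArrived = new Semaphore(0);
    static final Semaphore qArrived = new Semaphore(0);

    static void processP() throws InterruptedException {
        pArrived.release(); // "I (P) have arrived"
        qArrived.acquire(); // wait for Q's arrival
    }

    static void processQ() throws InterruptedException {
        qArrived.release(); // "I (Q) have arrived"
        pArrived.acquire(); // wait for P's arrival
    }
}
```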
Implementing Semaphores (Digression)
How can `acquire` wait without spinning? Semaphores are typically implemented by the OS or language runtime using blocking queues:

- Each semaphore `S` has an associated wait queue `QS`.
- `acquire(S)`: if `S > 0`, decrement `S`. If `S == 0`, add the current thread (`self`) to `QS` and block (yield the CPU).
- `release(S)`: if `QS` is empty, increment `S`. If `QS` is not empty, remove a waiting process `p` from `QS` and `unblock(p)` (make it runnable again).

Note: if `QS` is a FIFO queue, waiting threads are woken in arrival order, which prevents starvation of individual waiters.
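A conceptual sketch in Java, using the built-in monitor (`wait`/`notify`) only to model the block/unblock mechanism. (This is a simplification: unlike the scheme above, `release` here always increments `S` and wakes some waiter, and `notify` does not guarantee FIFO order.)

```java
// Sketch of a blocking semaphore; real ones are provided by the OS/runtime.
class SimpleSemaphore {
    private int s; // the semaphore value (non-negative)

    SimpleSemaphore(int initial) {
        s = initial;
    }

    synchronized void acquire() throws InterruptedException {
        while (s == 0) {
            wait();   // block: join the wait queue, yield the CPU
        }
        s--;
    }

    synchronized void release() {
        s++;
        notify();     // unblock one waiting thread, if any
    }
}
```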
Back to Dot Product at Scale
Imagine computing a dot product of million-element vectors using 10,000 threads. Using individual semaphores or locks for pairwise synchronization becomes incredibly complex and inefficient. We need higher-level abstractions for coordinating large groups of threads.
Bulk-Synchronous Parallel (BSP): A model where computation proceeds in phases (“supersteps”). All threads perform computation locally, then communicate, and then synchronize at a barrier before proceeding to the next superstep.
Barriers
Barrier: A synchronization primitive used to make a number of threads wait until all of them have reached a particular point in the code before any of them are allowed to proceed.
How can we implement a barrier using semaphores?
Barrier Attempt 1 (Flawed)
Goal: Synchronize `n` processes (threads). We want all `n` threads to wait at a specific point (the barrier) until every single one has arrived, after which they can all proceed.

Idea: Use a semaphore, let's call it `barrier`, to act as a gate, and a shared counter, `count`, to track how many threads have arrived.
Shared Variables:

```java
Semaphore barrier = new Semaphore(0); // initialized to 0 (gate closed)
volatile int count = 0;               // counts arriving threads (volatile for visibility, not atomicity!)
final int n = num_threads;            // total number of threads
```

Code for Each Thread Pi:

```
// init phase: barrier = 0; count = 0; (done once globally)
// ... pre-barrier work ...
// Arrive at barrier:
count++;            // increment arrival counter
if (count == n) {   // last thread to arrive "opens" the gate
    release(barrier);
}
acquire(barrier);   // wait for the gate to open
// ... post-barrier work ...
```
Analysis of Flaws:

- Race Condition on `count`: The line `count++` is not atomic. If multiple threads execute it concurrently, the increments can interleave so that `count` ends up less than `n` even after all `n` threads have executed the line. For example, two threads might both read `count` (say, its value is `k`), both compute `k+1`, and both write `k+1` back; the counter increases by 1 instead of 2. Consequently, the condition `if (count == n)` may never become true for any thread, and all threads block forever.

- Deadlock/Stall (Incorrect Gate Mechanism): Even if `count++` were atomic (e.g., using `AtomicInteger.incrementAndGet()`), this logic is still flawed. The `release(barrier)` operation increments the semaphore's internal value only once (from 0 to 1), and the subsequent `acquire(barrier)` operations decrement it. Only the first thread to execute `acquire(barrier)` after the `release` will succeed; all other `n-1` threads block indefinitely, because the semaphore's count immediately becomes 0 again and is never incremented further. The gate opens for one thread and then immediately closes.
Invariant Violations: This implementation violates key invariants we'd expect from a barrier:

- "count provides the number of processes that have passed the barrier": violated due to the race condition on `count++`.
- "when all processes have reached the barrier then all waiting processes can continue": violated because only one process passes the `acquire`.
Barrier with Mutex (Turnstile - Still Wrong for Reuse)
To fix the race condition on `count`, we can protect it with a mutex (which, as we know, can be implemented with a `Semaphore(1)`). This ensures only one thread modifies `count` at a time. Let's call this mutex `mutex`.
Shared Variables:

```java
Semaphore barrier = new Semaphore(0); // the main barrier gate
Semaphore mutex = new Semaphore(1);   // lock protecting count
int count = 0;                        // arrival counter (no volatile needed if always accessed under mutex)
final int n = num_threads;
```

Code for Each Thread Pi:

```
// ... pre-barrier work ...
// Arrive at barrier:
acquire(mutex);            // lock before accessing count
count++;
int current_count = count; // read count while holding the lock
release(mutex);            // unlock after accessing count

if (current_count == n) {
    release(barrier);      // last thread opens the gate
}
acquire(barrier);          // wait for the gate
release(barrier);          // <-- PROBLEM FOR REUSE!
// ... post-barrier work ...
```
Analysis:

- Mutual Exclusion on Count: The `mutex` correctly prevents the race condition on `count`. The last thread arriving will reliably detect `count == n`.
- Gate Opening: The last thread successfully releases (`V`) the `barrier` semaphore, incrementing its count to 1.
- First Thread Passes: The first thread (whether the last arriver or another waiting thread) that executes `acquire(barrier)` will succeed, decrementing the count back to 0.
- The "Turnstile" Problem: The crucial issue lies in the `release(barrier)` call after the `acquire`. The intent is to let the next waiting thread through. This creates a "turnstile" effect: one thread acquires, then releases, allowing the next thread to acquire, which then releases, and so on. All `n` threads will eventually pass the barrier.
- Not Reusable: This barrier works correctly exactly once. Consider what happens if the threads loop and reach the barrier a second time. The `barrier` semaphore's count may have been left at a value greater than 0 by the last thread passing through in the previous iteration (because the `release(barrier)` calls propagate). A fast thread arriving for the second time might execute `acquire(barrier)` and pass immediately, even before all other threads have finished the first barrier synchronization, because the semaphore count was never reset to 0.
Reusable Barrier Attempts (Illustrating the Difficulty)
The challenge is resetting the barrier correctly so it can be used multiple times within a loop.
Attempt 1 (Flawed Reuse Logic)
Let’s try adding a “reset” phase.
Code for Each Thread Pi:

```
// === Phase 1: Arrival ===
acquire(mutex);
count++;
release(mutex);
if (count == n) release(barrier); // open gate
acquire(barrier);                 // wait for gate
release(barrier);                 // let next thread through (turnstile)

// === Phase 2: Reset (Attempt) ===
acquire(mutex);
count--;
release(mutex);
// PROBLEM: if count becomes 0, should we 'close' the barrier?
if (count == 0) acquire(barrier); // try to reset the barrier count? WRONG!
```
Problem: A fast thread might complete Phase 1, execute Phase 2 (decrementing `count`), and loop back around to Phase 1 again (incrementing `count`) before slower threads have even completed Phase 1 of the first barrier. This completely breaks the logic for detecting `count == n` and `count == 0` correctly. Furthermore, the `acquire(barrier)` in Phase 2 is problematic: which thread should execute it, and what happens if the semaphore's count is already 0?
Invariant Violations:

- "When all processes have run through the barrier then count = 0": could be violated if fast threads increment `count` again before slow threads decrement it.
- "When all processes have run through the barrier then barrier = 0": violated because the turnstile `release` calls leave the barrier count > 0.
Attempt 2 (Still Flawed Reuse Logic)
Maybe releasing the mutex earlier helps?
Code for Each Thread Pi:

```
// === Phase 1: Arrival ===
acquire(mutex);
count++;
if (count == n) release(barrier); // open gate if last
release(mutex);                   // <-- release the mutex EARLIER
acquire(barrier);                 // wait for gate
release(barrier);                 // let next thread through (turnstile)

// === Phase 2: Reset (Attempt) ===
acquire(mutex);
count--;
// PROBLEM: if count becomes 0, should we 'close' the barrier?
if (count == 0) acquire(barrier); // still WRONG!
release(mutex);
```
Problem: This doesn't solve the fundamental problem. Releasing the mutex earlier changes the timing slightly but doesn't prevent fast threads from looping around and interfering with the state (`count` and `barrier`) before slower threads have completed the synchronization phases. The invariants listed above are still violated for the same reasons.
Solution: Two-Phase Reusable Barrier
The standard solution involves two phases and two barriers (or a barrier and a second semaphore acting as a gate). The core idea is to ensure all threads have finished phase 1 (arriving) before any thread can start phase 2 (resetting/leaving), and all threads must finish phase 2 before any thread can start the next phase 1.
This two-phase approach correctly ensures that the barrier can be reused safely, preventing the race conditions seen in the simpler attempts.
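A sketch of this construction in Java, following the classic "double turnstile" scheme (class and field names are ours):

```java
import java.util.concurrent.Semaphore;

// Reusable two-phase barrier: turnstile1 gates entry into the barrier,
// turnstile2 gates the exit; they are never open at the same time.
class ReusableBarrier {
    private final int n;
    private int count = 0;
    private final Semaphore mutex = new Semaphore(1);
    private final Semaphore turnstile1 = new Semaphore(0); // closed
    private final Semaphore turnstile2 = new Semaphore(1); // open

    ReusableBarrier(int n) { this.n = n; }

    void await() throws InterruptedException {
        // Phase 1: wait until all n threads have arrived.
        mutex.acquire();
        count++;
        if (count == n) {
            turnstile2.acquire(); // close the exit gate
            turnstile1.release(); // open the entry gate
        }
        mutex.release();
        turnstile1.acquire();
        turnstile1.release();     // turnstile: pass threads one by one

        // Phase 2: wait until all n threads have left Phase 1.
        mutex.acquire();
        count--;
        if (count == 0) {
            turnstile1.acquire(); // close the entry gate again
            turnstile2.release(); // open the exit gate
        }
        mutex.release();
        turnstile2.acquire();
        turnstile2.release();     // second turnstile
    }
}
```

Because the exit gate is closed while the entry gate is open (and vice versa), a fast thread looping back around cannot slip into the next round before everyone has left the current one.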
Lesson Learned
- Implementing synchronization primitives like semaphores, rendezvous, and barriers correctly is hard and prone to subtle errors (race conditions, deadlocks).
- Naive "trial and error" approaches are very unlikely to succeed.
- Ways Out:
  - Clearly identify the invariants your concurrent system must maintain.
  - Use established patterns and algorithms designed for specific synchronization needs.
  - Leverage known good libraries (like `java.util.concurrent`) which provide robust implementations of locks, semaphores, barriers, etc.
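For barriers specifically, the library route is `java.util.concurrent.CyclicBarrier`, which is reusable by design. A small usage sketch (worker count and loop body are placeholders):

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class BspWorkers {
    public static void main(String[] args) {
        final int n = 4;
        final CyclicBarrier barrier = new CyclicBarrier(n);
        for (int i = 0; i < n; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    for (int step = 0; step < 3; step++) {
                        // ... superstep computation for worker 'id' ...
                        System.out.println("worker " + id + " finished step " + step);
                        barrier.await(); // wait for all n workers, then reuse the barrier
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```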
Summary
- Locks are not enough: While essential for mutual exclusion, locks lack mechanisms for threads to efficiently wait for specific events or state changes, and they don't impose ordering.
- Semaphores: Provide a more general mechanism based on atomic counter manipulation (`acquire`/`release`), useful for controlling access to a finite number of resources, for signaling, and for building other primitives like locks.
- Rendezvous and Barriers: Common synchronization patterns for coordinating multiple threads, implementable (carefully!) using semaphores. Deadlock is a key risk if implemented incorrectly.
- Next Steps: We'll explore solutions to classic concurrency problems like the Producer-Consumer problem, introducing Monitors and Condition Variables as higher-level constructs often built upon locks and semaphores.
Continue here: 19 Producer Consumer Pattern, Queue, Monitors