Lecture from 23.04.2025 | Video: Video ETHZ
Deadlock
We saw a potential deadlock scenario last time with the bank account `transferTo` method.
Definition: A deadlock occurs when two or more processes (or threads) are mutually blocked, each waiting for a resource held by another process in the set, preventing any of them from proceeding. Think of traffic gridlock or a four-way intersection standoff.
Threads and Resources Graphically
To analyze deadlocks, we can use a resource allocation graph:
- Circles represent threads (e.g., A).
- Squares with locks represent resources (e.g., lock x).
- Arrow from Thread to Resource (P → a): Thread P is requesting (and blocked on) resource a.
- Arrow from Resource to Thread (b → Q): Resource b is held by thread Q.
Deadlocks – More Formally
A deadlock exists if and only if the resource allocation graph (showing threads and resources, with edges for "wants" and "has") contains a cycle. For example, if thread A holds x and wants y while thread B holds y and wants x, the edges x → A, A → y, y → B, B → x form a cycle.
Techniques for Handling Deadlocks
- Deadlock Detection and Recovery:
  - Some systems (mainly databases) periodically run algorithms to find cycles in the dependency graph.
  - Recovery is difficult: it usually involves aborting one or more processes/transactions in the cycle, releasing their resources.
  - Not generally feasible for locks in parallel programming: forcibly releasing a lock held by a thread typically leaves the shared data in an inconsistent state.
- Deadlock Avoidance/Prevention: Design the system or program so that cycles can never occur. This is the standard approach in parallel programming. Common techniques include:
  - Two-Phase Locking with Retry: Acquire all needed locks; if any acquisition fails, release all held locks and retry later. Common in databases, where transactions can be aborted cleanly; less common with basic mutexes.
  - Resource Ordering (Lock Ordering): Define a global, arbitrary total order among all resources (locks) and require all threads to acquire locks only in ascending order according to this global order. This prevents cycles.
Back to Our Bank Account Example
Recall the `transferTo` method causing deadlock:
```java
class BankAccount {
    // ...
    synchronized void transferTo(int amount, BankAccount a) {
        this.withdraw(amount); // 1. Holds lock on 'this' (via synchronized method)
        a.deposit(amount);     // 2. Acquires lock on 'a' inside deposit
    }
}
```
Deadlock occurs if Thread 1 calls `x.transferTo(y)` (locks x, tries to lock y) and Thread 2 calls `y.transferTo(x)` (locks y, tries to lock x).
How can we avoid this deadlock?
Option 1: Smaller Critical Sections (Usually Wrong)
```java
// Make transferTo NOT synchronized
void transferTo(int amount, BankAccount a) {
    this.withdraw(amount); // locks/unlocks 'this' internally
    a.deposit(amount);     // locks/unlocks 'a' internally
}
```
Problem: This avoids deadlock because the two locks are never held simultaneously. However, it introduces a race condition / bad interleaving: money is withdrawn from `this`, but before it is deposited into `a`, another thread could observe the system in an inconsistent state (the money has temporarily "disappeared"). This is often unacceptable; the atomicity of the transfer is lost.
Option 2: One Lock For All Accounts (Inefficient)
```java
class BankAccount {
    static Object globalLock = new Object(); // ONE lock for ALL accounts
    // withdraw & deposit use globalLock internally (not shown)

    void transferTo(int amount, BankAccount to) {
        synchronized (globalLock) { // acquire the single global lock
            withdraw(amount);       // assumes withdraw now uses globalLock too
            to.deposit(amount);     // assumes deposit now uses globalLock too
        }
    }
}
```
Problem: This avoids deadlock because only one lock exists. However, it destroys concurrency! Only one transfer (or withdraw/deposit) can happen anywhere in the system at a time, even if the transfers involve completely different pairs of accounts. Very inefficient.
Option 3: Resource Ordering (The Preferred Way)
Impose a global order on acquiring the locks. We need a way to uniquely order `BankAccount` objects. Let's assume each account has a unique `accountNr`.
```java
class BankAccount {
    // ... accountNr field ...
    // withdraw & deposit remain synchronized on 'this'

    void transferTo(int amount, BankAccount to) {
        // Ensure locks are always acquired in ascending order of accountNr
        if (this.accountNr < to.accountNr) {
            synchronized (this) {     // 1. lock 'this' (lower number)
                synchronized (to) {   // 2. lock 'to' (higher number)
                    this.withdraw(amount);
                    to.deposit(amount);
                }
            }
        } else if (this.accountNr > to.accountNr) {
            synchronized (to) {       // 1. lock 'to' (lower number)
                synchronized (this) { // 2. lock 'this' (higher number)
                    this.withdraw(amount);
                    to.deposit(amount);
                }
            }
        } else {
            // Special case: transfer to self. Handle appropriately
            // (e.g., do nothing or throw an exception). Acquiring the same
            // lock twice would be fine anyway, since synchronized is
            // re-entrant, but this branch sidesteps the question entirely.
        }
    }
}
```
How it works: By always acquiring the lock on the account with the smaller account number first, we break the potential for a circular wait. If Thread 1 wants x then y, and Thread 2 wants y then x, one of them will acquire the lower-numbered account first, and the other will block on that first lock, preventing the deadlock cycle.
Programming Trick: Generating an Order
What if objects don’t have a natural unique ID for ordering? We can generate one!
```java
import java.util.concurrent.atomic.AtomicLong;

class BankAccount {
    private static final AtomicLong counter = new AtomicLong(0);
    private final long uniqueId = counter.incrementAndGet(); // assign a unique ID on creation

    void transferTo(int amount, BankAccount to) {
        // Use uniqueId instead of accountNr for ordering locks
        if (this.uniqueId < to.uniqueId) {
            synchronized (this) { synchronized (to) { /* ... */ } }
        } else if (this.uniqueId > to.uniqueId) {
            synchronized (to) { synchronized (this) { /* ... */ } }
        } else {
            /* transfer to self */
        }
    }
}
```
Using an `AtomicLong` ensures each account gets a unique sequential ID upon creation, providing a consistent global order.
Another Deadlock Example: StringBuffer
The (older, now largely replaced by `StringBuilder`) `StringBuffer` class in Java has synchronized methods. Consider `append(StringBuffer sb)`:
```java
synchronized void append(StringBuffer sb) {
    int len = sb.length(); // calls sb.length() - requires the lock on sb?
    // ... potentially expand 'this.value' ...
    sb.getChars(0, len, this.value, this.count); // calls sb.getChars() - requires the lock on sb?
}
```
Problem #1 (Race Condition): The lock on `this` is held, but the code calls methods (`length`, `getChars`) on the other `StringBuffer` (`sb`). If `sb`'s methods are also synchronized, each call acquires and releases `sb`'s lock individually; `append` does not hold `sb`'s lock between the `sb.length()` call and the `sb.getChars()` call. Another thread could modify `sb` in between, causing `append` to read a stale length or copy inconsistent data.
Problem #2 (Deadlock): If two threads execute `x.append(y)` and `y.append(x)` concurrently:

- Thread 1 locks `x` (entering `x.append`).
- Thread 2 locks `y` (entering `y.append`).
- Thread 1 tries to call `y.length()` (or `y.getChars()`) and blocks waiting for the lock on `y`.
- Thread 2 tries to call `x.length()` (or `x.getChars()`) and blocks waiting for the lock on `x`.

Deadlock! This is the exact same pattern as the bank transfer.
Fixes?

- Fixing both problems requires careful thought (e.g., lock ordering using unique IDs, or a single global lock for all `StringBuffer`s, both with drawbacks).
- The original Java library initially didn't fix it, relying on users to avoid such patterns.
- Eventually, the non-synchronized `StringBuilder` was introduced for single-threaded use cases, while `StringBuffer` remains (mostly) for legacy compatibility, still carrying the potential for these issues if used improperly.
Avoiding Deadlocks: Programming Patterns
Deadlocks involving acquiring multiple locks are hard to debug. Use patterns to avoid them:
- Different Object Types: If operations involve locks on objects of different types (e.g., a Hashtable and a WorkQueue), establish and document a fixed acquisition order (e.g., “Always lock Hashtable before WorkQueue”).
- Acyclic Structures: If objects are part of an acyclic structure (like a tree), use the structure to define the locking order (e.g., "Never acquire a lock on a non-child node while holding a lock on a parent node", i.e., lock order follows tree edges from parent to child); see the sketch below.
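A minimal sketch of the tree rule, with a hypothetical `Node` class: every thread acquires locks only along tree edges from parent to child, so all lock-acquisition paths point "downward" and no cycle can form.

```java
// Hypothetical example: lock order follows the tree structure.
class Node {
    int value;
    Node left, right;

    // Moves this node's value down into one of its children.
    void pushDown(Node child) {
        synchronized (this) {       // 1. lock the parent first
            synchronized (child) {  // 2. then the child; never the reverse
                child.value += this.value;
                this.value = 0;
            }
        }
    }
}
```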
Significance of Deadlocks
- While data races are fundamental errors violating the memory model, deadlock is often the more dominant practical problem in complex concurrent systems using locks.
- Deadlocks can be very hard to anticipate and debug because they depend on specific timing and interleaving.
- Starvation (a related issue where a thread is repeatedly denied access even if not strictly deadlocked) is also a concern.
Semaphores: Beyond Simple Mutual Exclusion
Locks provide mutual exclusion – ensuring only one thread executes a critical section at a time. But sometimes we need more generalized synchronization.
Why Need More Than Locks?
- Communication: Locks don't provide a direct way for threads to communicate about changes in state (e.g., "data is ready," "buffer is full/empty"). Threads might acquire a lock only to find the condition they need isn't met, forcing them to release the lock and try again (inefficient busy-waiting or complex `wait`/`notify`).
- Ordering: Locks don't enforce a specific order of entry if multiple threads are waiting.
- Resource Counting: Locks are binary (held or not held). What if we want to allow up to N threads to access a resource simultaneously?
Example Need: Producer-consumer queues, where producers add items and consumers remove them. Consumers need to wait if the queue is empty, and producers need to wait if the queue is full.
Semaphores (Edsger Dijkstra, 1965):
Named after railway signals, semaphores are a fundamental synchronization primitive.
Semantics:

- A semaphore `S` is an abstract data type holding a non-negative integer value, initialized to some value.
- It supports two atomic operations:
  - `acquire(S)` (or `P(S)`, `wait(S)`):

    ```
    // ---- Start Atomic ----
    wait until S > 0; // block while S is zero
    S = S - 1;
    // ---- End Atomic ----
    ```

  - `release(S)` (or `V(S)`, `signal(S)`):

    ```
    // ---- Start Atomic ----
    S = S + 1;
    // ---- End Atomic ----
    ```
Intuition: `acquire` tries to decrement the count, waiting while it is zero. `release` increments the count (potentially waking up a waiting thread).
Building a Lock with a Semaphore
A standard lock (mutex) can be implemented using a semaphore initialized to 1 (a “binary semaphore”):
- `mutex = new Semaphore(1);` (initial value 1 means "unlocked")
- `lock()` operation: `mutex.acquire();`
  - If `S` is 1 (unlocked), it becomes 0 (locked), and the thread proceeds.
  - If `S` is 0 (locked), the thread blocks until another thread releases.
- `unlock()` operation: `mutex.release();`
  - Sets `S` from 0 back to 1 (unlocked). If another thread was waiting on `acquire`, it can now proceed.
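In Java this maps directly onto `java.util.concurrent.Semaphore`. A minimal sketch, wrapped in a class of our own (`SemaphoreLock` is not a library type):

```java
import java.util.concurrent.Semaphore;

// A mutex built from a binary semaphore (sketch, not production code).
class SemaphoreLock {
    private final Semaphore mutex = new Semaphore(1); // 1 = "unlocked"

    void lock() throws InterruptedException {
        mutex.acquire(); // S: 1 -> 0, or block while S == 0
    }

    void unlock() {
        mutex.release(); // S: 0 -> 1, possibly waking a waiting thread
    }
}
```

Note one design difference from a real lock: a semaphore has no notion of an owner, so nothing stops a thread other than the holder from calling `unlock()`.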
Example: Scaled Dot Product Synchronization
Recall synchronizing two threads computing partial dot products.

- With Locks: requires careful locking/unlocking around the shared sum `x`.
- With Semaphores:
  - Initialize `Semaphore S = new Semaphore(0);`
  - Thread A: `xA = ...; x = x + xA; S.release();`
  - Thread B: `xB = ...; S.acquire(); x = x + xB;`

Thread B waits until A releases, ensuring `xA`'s contribution is included before B adds `xB`. (Note: this simple example assumes the update `x = x + xA` is atomic or protected separately if needed.)
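A runnable sketch of this ordering with `java.util.concurrent.Semaphore` (class name, variable names, and the placeholder partial results are ours):

```java
import java.util.concurrent.Semaphore;

class OrderedSum {
    static double x = 0;                         // shared sum
    static final Semaphore s = new Semaphore(0); // 0 = "A has not contributed yet"

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> {
            double xA = 1.5;   // A's partial result (placeholder)
            x = x + xA;
            s.release();       // signal: A's contribution is in
        });
        Thread b = new Thread(() -> {
            double xB = 2.5;   // B's partial result (placeholder)
            try {
                s.acquire();   // wait until A has released
            } catch (InterruptedException e) {
                return;
            }
            x = x + xB;        // runs strictly after A's update
        });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(x); // always 4.0
    }
}
```

The `release`/`acquire` pair also establishes a happens-before edge in the Java memory model, so B is guaranteed to see A's write to `x`.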
Example: Rendezvous
Problem: Two processes, P and Q, need to wait for each other at a specific point (rendezvous) before proceeding.
Solution using Semaphores: Use two semaphores, `P_Arrived` and `Q_Arrived`, both initialized to 0.
Attempt 1: P signals its own arrival (`release(P_Arrived)`), and Q waits for that signal (`acquire(P_Arrived)`). This only makes Q wait for P, not the other way around.
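A sketch of this one-sided attempt (our reconstruction, using `java.util.concurrent.Semaphore`):

```java
import java.util.concurrent.Semaphore;

// Attempt 1: only Q waits for P; the rendezvous is one-sided.
class RendezvousAttempt1 {
    static final Semaphore pArrived = new Semaphore(0);

    static void processP() {
        // ... P's work before the rendezvous ...
        pArrived.release(); // signal "P is here", then proceed immediately
        // Flaw: P never waits for Q.
    }

    static void processQ() throws InterruptedException {
        // ... Q's work before the rendezvous ...
        pArrived.acquire(); // block until P has arrived
    }
}
```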
Attempt 2 (Deadlock!): each process first waits for the other's signal and only then signals its own arrival.

Problem: If P executes `acquire(Q_Arrived)` and blocks, and Q executes `acquire(P_Arrived)` and blocks, neither can execute its `release` to signal the other. Deadlock!
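A sketch of this deadlocking attempt (again our reconstruction):

```java
import java.util.concurrent.Semaphore;

// Attempt 2: wait first, signal second. Both block in step 1. Deadlock!
class RendezvousAttempt2 {
    static final Semaphore pArrived = new Semaphore(0);
    static final Semaphore qArrived = new Semaphore(0);

    static void processP() throws InterruptedException {
        qArrived.acquire(); // 1. wait for Q ... but Q is also waiting
        pArrived.release(); // 2. never reached if both block in step 1
    }

    static void processQ() throws InterruptedException {
        pArrived.acquire(); // 1. wait for P
        qArrived.release(); // 2. never reached
    }
}
```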
Correct Solution (Rendezvous): Separate the signaling and waiting. Each process first signals its own arrival (`release`) and then waits for the other's signal (`acquire`). This avoids the deadlock.
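A sketch of the correct protocol:

```java
import java.util.concurrent.Semaphore;

// Correct rendezvous: signal your own arrival first, then wait for the other.
class Rendezvous {
    static final Semaphore pArrived = new Semaphore(0);
    static final Semaphore qArrived = new Semaphore(0);

    static void processP() throws InterruptedException {
        pArrived.release(); // "I (P) have arrived"
        qArrived.acquire(); // wait for Q's arrival
    }

    static void processQ() throws InterruptedException {
        qArrived.release(); // "I (Q) have arrived"
        pArrived.acquire(); // wait for P's arrival
    }
}
```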
Implementing Semaphores (Digression)
How can `acquire` wait without spinning? Semaphores are typically implemented by the OS or language runtime using blocking queues:

- Each semaphore `S` has an associated wait queue `QS`.
- `acquire(S)`: if `S > 0`, decrement `S`. If `S == 0`, add the current thread (`self`) to `QS` and block (yield the CPU).
- `release(S)`: if `QS` is empty, increment `S`. If `QS` is not empty, remove a waiting process `p` from `QS` and `unblock(p)` (make it runnable again).

Note: if `QS` is a FIFO queue, waiting threads are woken in arrival order, which prevents starvation of individual waiters.
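A conceptual sketch in Java, using the built-in monitor (`wait`/`notify`) only to model the block/unblock mechanism. (This is a simplification: unlike the scheme above, `release` here always increments `S` and wakes some waiter, and `notify` does not guarantee FIFO order.)

```java
// Sketch of a blocking semaphore; real ones are provided by the OS/runtime.
class SimpleSemaphore {
    private int s; // the semaphore value (non-negative)

    SimpleSemaphore(int initial) {
        s = initial;
    }

    synchronized void acquire() throws InterruptedException {
        while (s == 0) {
            wait();   // block: join the wait queue, yield the CPU
        }
        s--;
    }

    synchronized void release() {
        s++;
        notify();     // unblock one waiting thread, if any
    }
}
```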
Back to Dot Product at Scale
Imagine computing a dot product of million-element vectors using 10,000 threads. Using individual semaphores or locks for pairwise synchronization becomes incredibly complex and inefficient. We need higher-level abstractions for coordinating large groups of threads.
Bulk-Synchronous Parallel (BSP): A model where computation proceeds in phases (“supersteps”). All threads perform computation locally, then communicate, and then synchronize at a barrier before proceeding to the next superstep.
Barriers
Barrier: A synchronization primitive used to make a number of threads wait until all of them have reached a particular point in the code before any of them are allowed to proceed.
How can we implement a barrier using semaphores?
Barrier Attempt 1 (Flawed)
Goal: Synchronize `n` processes (threads). We want all `n` threads to wait at a specific point (the barrier) until every single one has arrived, after which they can all proceed.

Idea: Use a semaphore, let's call it `barrier`, to act as a gate, and a shared counter, `count`, to track how many threads have arrived.
Shared Variables:

```java
Semaphore barrier = new Semaphore(0); // initialized to 0 (gate closed)
volatile int count = 0;               // counts arriving threads (volatile for visibility, not atomicity!)
final int n = num_threads;            // total number of threads
```

Code for Each Thread Pi:

```
// init phase: barrier = 0; count = 0; (done once globally)
// ... pre-barrier work ...
// Arrive at barrier:
count++;            // increment arrival counter
if (count == n) {   // last thread to arrive "opens" the gate
    release(barrier);
}
acquire(barrier);   // wait for the gate to open
// ... post-barrier work ...
```
Analysis of Flaws:

- Race Condition on `count`: The line `count++` is not atomic. If multiple threads execute it concurrently, the increments can interleave so that `count` ends up less than `n` even after all `n` threads have executed the line. For example, two threads might both read `count` (say, its value is `k`), both compute `k+1`, and both write `k+1` back; the counter increases by 1 instead of 2. Consequently, the condition `if (count == n)` may never become true for any thread, and all threads block forever.

- Deadlock/Stall (Incorrect Gate Mechanism): Even if `count++` were atomic (e.g., using `AtomicInteger.incrementAndGet()`), this logic is still flawed. The `release(barrier)` operation increments the semaphore's internal value only once (from 0 to 1), and the subsequent `acquire(barrier)` operations decrement it. Only the first thread to execute `acquire(barrier)` after the `release` will succeed; all other `n-1` threads block indefinitely, because the semaphore's count immediately becomes 0 again and is never incremented further. The gate opens for one thread and then immediately closes.
Invariant Violations: This implementation violates key invariants we'd expect from a barrier:

- "count provides the number of processes that have passed the barrier": violated due to the race condition on `count++`.
- "when all processes have reached the barrier then all waiting processes can continue": violated because only one process passes the `acquire`.
Barrier with Mutex (Turnstile - Still Wrong for Reuse)
To fix the race condition on `count`, we can protect it with a mutex (which, as we know, can be implemented with a `Semaphore(1)`). This ensures only one thread modifies `count` at a time. Let's call this mutex `mutex`.
Shared Variables:

```java
Semaphore barrier = new Semaphore(0); // the main barrier gate
Semaphore mutex = new Semaphore(1);   // lock protecting count
int count = 0;                        // arrival counter (no volatile needed if always accessed under mutex)
final int n = num_threads;
```

Code for Each Thread Pi:

```
// ... pre-barrier work ...
// Arrive at barrier:
acquire(mutex);            // lock before accessing count
count++;
int current_count = count; // read count while holding the lock
release(mutex);            // unlock after accessing count

if (current_count == n) {
    release(barrier);      // last thread opens the gate
}
acquire(barrier);          // wait for the gate
release(barrier);          // <-- PROBLEM FOR REUSE!
// ... post-barrier work ...
```
Analysis:

- Mutual Exclusion on Count: The `mutex` correctly prevents the race condition on `count`. The last thread arriving will reliably detect `count == n`.
- Gate Opening: The last thread successfully releases (`V`) the `barrier` semaphore, incrementing its count to 1.
- First Thread Passes: The first thread (whether the last arriver or another waiting thread) that executes `acquire(barrier)` will succeed, decrementing the count back to 0.
- The "Turnstile" Problem: The crucial issue lies in the `release(barrier)` call after the `acquire`. The intent is to let the next waiting thread through. This creates a "turnstile" effect: one thread acquires, then releases, allowing the next thread to acquire, which then releases, and so on. All `n` threads will eventually pass the barrier.
- Not Reusable: This barrier works correctly exactly once. Consider what happens if the threads loop and reach the barrier a second time. The `barrier` semaphore's count may have been left at a value greater than 0 by the last thread passing through in the previous iteration (because the `release(barrier)` calls propagate). A fast thread arriving for the second time might execute `acquire(barrier)` and pass immediately, even before all other threads have finished the first barrier synchronization, because the semaphore count was never reset to 0.
Reusable Barrier Attempts (Illustrating the Difficulty)
The challenge is resetting the barrier correctly so it can be used multiple times within a loop.
Attempt 1 (Flawed Reuse Logic)
Let’s try adding a “reset” phase.
Code for Each Thread Pi:

```
// === Phase 1: Arrival ===
acquire(mutex);
count++;
release(mutex);
if (count == n) release(barrier); // open gate
acquire(barrier);                 // wait for gate
release(barrier);                 // let next thread through (turnstile)

// === Phase 2: Reset (Attempt) ===
acquire(mutex);
count--;
release(mutex);
// PROBLEM: if count becomes 0, should we 'close' the barrier?
if (count == 0) acquire(barrier); // try to reset the barrier count? WRONG!
```
Problem: A fast thread might complete Phase 1, execute Phase 2 (decrementing `count`), and loop back around to Phase 1 again (incrementing `count`) before slower threads have even completed Phase 1 of the first barrier. This completely breaks the logic for detecting `count == n` and `count == 0` correctly. Furthermore, the `acquire(barrier)` in Phase 2 is problematic: which thread should execute it, and what happens if the semaphore's count is already 0?
Invariant Violations:

- "When all processes have run through the barrier then count = 0": could be violated if fast threads increment `count` again before slow threads decrement it.
- "When all processes have run through the barrier then barrier = 0": violated because the turnstile `release` calls leave the barrier count > 0.
Attempt 2 (Still Flawed Reuse Logic)
Maybe releasing the mutex earlier helps?
Code for Each Thread Pi:

```
// === Phase 1: Arrival ===
acquire(mutex);
count++;
if (count == n) release(barrier); // open gate if last
release(mutex);                   // <-- release the mutex EARLIER
acquire(barrier);                 // wait for gate
release(barrier);                 // let next thread through (turnstile)

// === Phase 2: Reset (Attempt) ===
acquire(mutex);
count--;
// PROBLEM: if count becomes 0, should we 'close' the barrier?
if (count == 0) acquire(barrier); // still WRONG!
release(mutex);
```
Problem: This doesn't solve the fundamental problem. Releasing the mutex earlier changes the timing slightly but doesn't prevent fast threads from looping around and interfering with the state (`count` and `barrier`) before slower threads have completed the synchronization phases. The invariants listed above are still violated for the same reasons.
Solution: Two-Phase Reusable Barrier
The standard solution involves two phases and two barriers (or a barrier and a second semaphore acting as a gate). The core idea is to ensure all threads have finished phase 1 (arriving) before any thread can start phase 2 (resetting/leaving), and all threads must finish phase 2 before any thread can start the next phase 1.
This two-phase approach correctly ensures that the barrier can be reused safely, preventing the race conditions seen in the simpler attempts.
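A sketch of this construction in Java, following the classic "double turnstile" scheme (class and field names are ours):

```java
import java.util.concurrent.Semaphore;

// Reusable two-phase barrier: turnstile1 gates entry into the barrier,
// turnstile2 gates the exit; they are never open at the same time.
class ReusableBarrier {
    private final int n;
    private int count = 0;
    private final Semaphore mutex = new Semaphore(1);
    private final Semaphore turnstile1 = new Semaphore(0); // closed
    private final Semaphore turnstile2 = new Semaphore(1); // open

    ReusableBarrier(int n) { this.n = n; }

    void await() throws InterruptedException {
        // Phase 1: wait until all n threads have arrived.
        mutex.acquire();
        count++;
        if (count == n) {
            turnstile2.acquire(); // close the exit gate
            turnstile1.release(); // open the entry gate
        }
        mutex.release();
        turnstile1.acquire();
        turnstile1.release();     // turnstile: pass threads one by one

        // Phase 2: wait until all n threads have left Phase 1.
        mutex.acquire();
        count--;
        if (count == 0) {
            turnstile1.acquire(); // close the entry gate again
            turnstile2.release(); // open the exit gate
        }
        mutex.release();
        turnstile2.acquire();
        turnstile2.release();     // second turnstile
    }
}
```

Because the exit gate is closed while the entry gate is open (and vice versa), a fast thread looping back around cannot slip into the next round before everyone has left the current one.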
Lesson Learned
- Implementing synchronization primitives like semaphores, rendezvous, and barriers correctly is hard and prone to subtle errors (race conditions, deadlocks).
- Naive "trial and error" approaches are very unlikely to succeed.
- Ways Out:
  - Clearly identify the invariants your concurrent system must maintain.
  - Use established patterns and algorithms designed for specific synchronization needs.
  - Leverage known good libraries (like `java.util.concurrent`) which provide robust implementations of locks, semaphores, barriers, etc.
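For barriers specifically, the library route is `java.util.concurrent.CyclicBarrier`, which is reusable by design. A small usage sketch (worker count and loop body are placeholders):

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class BspWorkers {
    public static void main(String[] args) {
        final int n = 4;
        final CyclicBarrier barrier = new CyclicBarrier(n);
        for (int i = 0; i < n; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    for (int step = 0; step < 3; step++) {
                        // ... superstep computation for worker 'id' ...
                        System.out.println("worker " + id + " finished step " + step);
                        barrier.await(); // wait for all n workers, then reuse the barrier
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```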
Summary
- Locks are not enough: While essential for mutual exclusion, locks lack mechanisms for threads to efficiently wait for specific events or state changes, and they don't impose ordering.
- Semaphores: Provide a more general mechanism based on atomic counter manipulation (`acquire`/`release`), useful for controlling access to a finite number of resources, for signaling, and for building other primitives like locks.
- Rendezvous and Barriers: Common synchronization patterns for coordinating multiple threads, implementable (carefully!) using semaphores. Deadlock is a key risk if implemented incorrectly.
- Next Steps: We'll explore solutions to classic concurrency problems like the Producer-Consumer problem, introducing Monitors and Condition Variables as higher-level constructs often built upon locks and semaphores.
Continue here: 19 Producer Consumer Pattern, Queue, Monitors