Lecture from 21.05.2025 | Video: Video ETHZ
As we venture into increasingly complex concurrent algorithms, especially non-blocking ones, the need for both robust implementation techniques and rigorous theoretical frameworks becomes critical. The sheer scale of modern computing, exemplified by hardware like the NVIDIA H100 GPU and the massive data centers underpinning cloud services, demands concurrency. However, this scale also magnifies the potential impact of subtle errors, as highlighted by incidents like the significant data loss reported by Google Cloud’s pension fund client. Today, we will confront one such subtle error, the ABA problem, which plagues lock-free algorithms that reuse memory. We will examine practical solutions like pointer tagging and hazard pointers. Then, equipped with an appreciation for these practical complexities, we will introduce Linearizability, a cornerstone theory for formally defining and proving the correctness of concurrent objects.
Summary/Recap Last Lecture
In our previous lecture, we laid the groundwork for today’s discussion:
- Memory (Node) Pools: We introduced the concept of using pre-allocated pools of memory nodes. This is a common and powerful technique, particularly in performance-sensitive environments like operating system kernels or applications using unmanaged languages (like C/C++), to avoid the overhead associated with frequent dynamic memory allocation and deallocation from the OS.
- Memory Reuse and the ABA Problem: We discovered a critical flaw tied to memory reuse: the ABA problem. This issue specifically affects lock-free algorithms relying on Compare-and-Swap (CAS). We noted that other atomic primitives like Load-Linked/Store-Conditional (LL/SC) are generally immune because they detect any store to the monitored address, not just value changes. Despite this vulnerability, CAS remains the most prevalent and versatile atomic operation available on many architectures (we’ll touch upon why it’s considered so powerful when we discuss Consensus later).
- Remedies: We briefly introduced two primary techniques for mitigating the ABA problem: Pointer Tagging and Hazard Pointers.
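To make the pointer-tagging remedy concrete, here is a minimal Python sketch. The class name `TaggedAtomicRef` and the lock-based simulation of the hardware atomic are my own illustrative choices; in Java the analogous facility is `java.util.concurrent.atomic.AtomicStampedReference`, which pairs a reference with an integer stamp.

```python
import threading

class TaggedAtomicRef:
    """Simulated CAS on a (value, tag) pair: the tag is bumped on every
    successful store, so a reused value A no longer matches an old
    snapshot -- the essence of pointer tagging."""

    def __init__(self, value):
        self._lock = threading.Lock()  # stands in for the hardware atomic
        self._value = value
        self._tag = 0

    def load(self):
        with self._lock:
            return self._value, self._tag

    def compare_and_swap(self, expected_value, expected_tag, new_value):
        """Succeeds only if BOTH value and tag still match the snapshot."""
        with self._lock:
            if self._value == expected_value and self._tag == expected_tag:
                self._value = new_value
                self._tag += 1          # the tag change is what defeats ABA
                return True
            return False

# ABA scenario: thread 1 takes a snapshot of A; meanwhile A -> B -> A happens.
ref = TaggedAtomicRef("A")
val, tag = ref.load()                       # thread 1's snapshot: ("A", 0)
assert ref.compare_and_swap("A", 0, "B")    # interfering thread: A -> B
assert ref.compare_and_swap("B", 1, "A")    # interfering thread: B -> A (reuse!)
# A plain CAS on the value alone would now wrongly succeed; the tagged CAS fails:
assert not ref.compare_and_swap(val, tag, "C")
```

Note that a real implementation packs the tag into the same machine word as the pointer (or uses a double-width CAS) so that the comparison is a single atomic instruction; the lock here only emulates that atomicity.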
Learning Goals Today
Our objectives for today are twofold, reflecting the synergy between practice and theory:
- Introducing Formal Correctness Theory:
- Define Linearizability rigorously as a correctness condition for concurrent objects.
- Understand the concept of Histories to formally represent concurrent executions.
- Appreciate the significance of Composability in Linearizability.
- Briefly contrast Linearizability with the weaker model of Sequential Consistency, noting why the latter is not composable.
- (Briefly) Consensus Motivation: Consider the relative power of atomic operations like CAS versus simpler ones like Test-and-Set (TAS), setting the stage for a later discussion on Consensus.
As the quote from Susser emphasizes, navigating the complexities of concurrent programming requires both practical techniques and a solid theoretical understanding. Today, we aim to strengthen both.
Formal Models: Linearizability
Linearizability = Atomic consistency
The Core Question: When multiple threads operate concurrently on a shared object (like a queue or stack), what constitutes correct behavior?
Linearizability (proposed by Herlihy and Wing) is a widely adopted correctness condition that provides an intuitive yet formal answer.
- The Intuition: Despite the concurrent execution and overlapping method calls, the object’s behavior should be understandable as if each method call took effect atomically at a single, instantaneous point in time somewhere between its start (invocation) and end (response).
- Linearization Point: This conceptual instant of atomic effect.
- Definition: An object (or an execution history) is linearizable if every possible concurrent execution can be mapped to at least one valid sequential execution that respects the object’s sequential specification (e.g., FIFO for a queue) and, crucially, respects the real-time ordering of non-overlapping operations in the original concurrent execution.
Sequential vs. Concurrent Views
Sequential programming assumes a clear state between method calls, and methods can be analyzed based on pre/post conditions in isolation. Let us see if the following examples are linearizable.
Example 1:
Example 2:
Example 3:
Example 4:
Example 5:
Linearization Points and Composability
It’s important to note that the linearization point – the conceptual instant an operation takes effect – isn’t always tied to a fixed line of source code. It often depends on the state of the object during execution. For example, a `dequeue` operation on an empty queue might linearize at the point where it checks and determines the queue is empty (perhaps throwing an exception), while a `dequeue` on a non-empty queue linearizes later, when it successfully retrieves the item and updates the `head` pointer.
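The state-dependent linearization point can be made concrete with a small sketch. This is a lock-based queue of my own (the class name `LinearizableQueue` is hypothetical, not from the lecture); the comments mark where each execution path linearizes inside the critical section.

```python
from collections import deque
from threading import Lock

class LinearizableQueue:
    """Lock-based FIFO queue; comments mark where each call linearizes."""

    def __init__(self):
        self._lock = Lock()
        self._items = deque()

    def enq(self, x):
        with self._lock:
            self._items.append(x)   # linearization point: the append

    def deq(self):
        with self._lock:
            if not self._items:
                # linearization point for the empty case: the emptiness
                # check itself -- the call "takes effect" here, as a failure
                raise IndexError("empty queue")
            # linearization point for the non-empty case: removing the head
            return self._items.popleft()

q = LinearizableQueue()
q.enq(1); q.enq(2)
assert q.deq() == 1   # FIFO order
```

With a single lock every operation trivially linearizes somewhere inside its critical section; the interesting point is that *which* statement plays that role differs between the empty and non-empty paths.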
But what actually is linearizability?
- Linearizability is a correctness condition for concurrent systems where operations appear to occur instantaneously at some point between their start and end times.
- This ensures the system behaves as if all operations were executed in a legal sequential order that respects real-time constraints.
- Linearizable systems are easier to reason about and offer strong consistency guarantees but may come with performance trade-offs.
- Correctness Guarantee: The system behaves as if all operations were executed in some legal sequential order that respects the real-time order of non-overlapping operations.
- Intuitive Behavior: From the point of view of a client (e.g. a user thread or process), operations appear to happen instantly and sequentially. This makes reasoning about the system easier.
- Stronger Than Serializability: Linearizability is stronger than serializability (a term often used in databases): it not only requires a valid sequential ordering but also that this order respects real-time constraints.
- Composability: If individual operations or components are linearizable, the system as a whole is also linearizable. This makes modular design and reasoning possible.
- Fault Tolerance (sometimes): In distributed systems, linearizability can imply strong guarantees about the visibility of updates, useful in things like replicated state machines (think: Raft, Paxos, etc.).
Formalizing Histories
To reason precisely, we model concurrent executions as histories.
We represent method calls as two distinct events:
- Invocation: Denoted like `A q.enq(x)`, representing thread `A` invoking method `enq` with argument `x` on object `q`. It includes the thread, object, method, and arguments.
- Response: Denoted like `A q: void`, representing thread `A` receiving a response (in this case, `void`) from object `q`. It includes the thread, object, and result.
- A History (H) is formally defined as a finite sequence of invocation and response events.
- An invocation and response match if they belong to the same thread and the same object instance.
- An invocation is pending if it exists in the history without a matching response.
- A history (or subhistory) is complete if it contains no pending invocations.
To analyze specific aspects of a history, we use projections:
- Object Projection (`H|q`): The subsequence of H containing only events related to object `q`.
- Thread Projection (`H|A`): The subsequence of H containing only events initiated by thread `A`.
The `complete(H)` operation yields the subhistory of H containing only complete method calls (i.e., matching invocation-response pairs), effectively removing any pending invocations.
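The event-based vocabulary above can be sketched directly in code. This is a small Python model of my own design (the `Event` encoding and the example history are illustrative, not the lecture's notation); it implements the two projections and `complete(H)` for well-formed histories.

```python
from collections import namedtuple

# An event is either an invocation (kind="inv") or a response (kind="res").
Event = namedtuple("Event", "kind thread obj op")

H = [
    Event("inv", "A", "q", "enq(x)"),
    Event("inv", "B", "q", "enq(y)"),
    Event("res", "B", "q", "void"),
    Event("res", "A", "q", "void"),
    Event("inv", "B", "q", "deq()"),   # pending: no matching response
]

def project_obj(h, obj):        # H|q
    return [e for e in h if e.obj == obj]

def project_thread(h, thread):  # H|A
    return [e for e in h if e.thread == thread]

def complete(h):
    """Drop invocations that never received a matching response.
    Assumes well-formedness: at most one open call per (thread, obj)."""
    out, open_calls = [], {}
    for e in h:
        if e.kind == "inv":
            open_calls[(e.thread, e.obj)] = len(out)
        else:
            open_calls.pop((e.thread, e.obj), None)
        out.append(e)
    for i in sorted(open_calls.values(), reverse=True):
        del out[i]
    return out

assert len(complete(H)) == 4                                  # pending deq() removed
assert [e.thread for e in project_thread(H, "A")] == ["A", "A"]
```

Matching an invocation to a response by (thread, object) works precisely because well-formedness makes each thread's subhistory sequential.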
A Sequential History is a history where method calls do not interleave. That is, for any method call (except potentially the very last one in the sequence if it’s pending), the response event immediately follows its invocation event.
A history `H` is Well-formed if the projection onto each individual thread (`H|A` for all threads `A`) is sequential. This captures the requirement that a single thread executes its operations one after another.
Two histories, `H` and `G`, are considered Equivalent if they are indistinguishable from the perspective of every thread. Formally, `H|A = G|A` for all threads `A`.
A sequential history `H` is Legal if, for every object `x` involved in the history, the object projection `H|x` adheres to the object’s defined sequential specification (e.g., satisfies pre/post conditions, obeys FIFO rules for a queue).
We define a precedence relation between method calls within a history `H`. A method call `m0` precedes another method call `m1`, denoted `m0 →H m1`, if the response event of `m0` occurs chronologically before the invocation event of `m1`. If neither `m0 →H m1` nor `m1 →H m0` holds, the method calls overlap. The precedence relation `→H` establishes a partial order over the non-overlapping method calls in the history `H`. This order becomes total if `H` is sequential.
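Viewing each completed method call as an interval from invocation to response, the precedence relation reduces to a simple interval comparison. A sketch (the `(inv_time, res_time)` tuple encoding is my own):

```python
def precedes(m0, m1):
    """m0 ->H m1: m0's response occurs before m1's invocation.
    Method calls are modeled as (inv_time, res_time) intervals."""
    return m0[1] < m1[0]

def overlap(m0, m1):
    """Neither call precedes the other: their intervals intersect."""
    return not precedes(m0, m1) and not precedes(m1, m0)

a = (0, 2)   # invoked at t=0, responded at t=2
b = (3, 5)
c = (1, 4)   # overlaps both a and b

assert precedes(a, b)                 # a ->H b: ordered by ->H
assert overlap(a, c) and overlap(b, c)  # c is ordered with neither
```

Since `overlap` is possible, `precedes` yields only a partial order; it becomes total exactly when no two intervals intersect, i.e. when the history is sequential.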
Formal Definition of Linearizability
A history `H` is linearizable if it can be extended (by appending responses to some pending invocations that “took effect” and discarding others that did not) to a history `G` such that:

- `complete(G)` (the history `G` without any remaining pending invocations) is equivalent to some legal sequential history `S`.
- The real-time precedence order of `G` is preserved in `S`: `→G ⊆ →S`. This means that if operation `m0` finished before operation `m1` started in the actual concurrent execution `G`, then `m0` must appear before `m1` in the equivalent sequential execution `S`.
Determining which pending operations “took effect” is part of constructing the equivalent sequential history.
This visual emphasizes that the sequential history `S` must be chosen such that it respects the non-overlapping intervals observed in `G`. If `a` finishes before `b` starts in `G`, then `a` must come before `b` in `S`. Note that `→G` is a partial order, while `→S` is a total order.
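The definition suggests a direct (if exponential) decision procedure for small histories: search for a total order that extends `→G` and is legal for the object's sequential specification. A sketch of my own, specialized to a FIFO queue (only usable for tiny histories, since it enumerates all permutations):

```python
from itertools import permutations

def fifo_legal(seq):
    """Simulate a FIFO queue over a sequence of ('enq', v) / ('deq', v)."""
    q = []
    for kind, v in seq:
        if kind == "enq":
            q.append(v)
        elif not q or q.pop(0) != v:
            return False
    return True

def linearizable(ops):
    """ops: list of (inv_time, res_time, ('enq'|'deq', value)) for completed
    calls. Brute force: look for a total order that extends the real-time
    partial order ->G and is FIFO-legal."""
    for perm in permutations(ops):
        # real-time constraint: an op that responded before another was
        # invoked must also come first in the candidate sequential history
        if any(perm[j][1] < perm[i][0]
               for i in range(len(perm)) for j in range(i + 1, len(perm))):
            continue
        if fifo_legal([p[2] for p in perm]):
            return True
    return False

# enq(x) finishes before enq(y) starts, yet deq() returns y: not linearizable
bad = [(0, 1, ("enq", "x")), (2, 3, ("enq", "y")), (4, 5, ("deq", "y"))]
assert not linearizable(bad)

# same calls, but the two enqueues overlap: now linearizable
ok = [(0, 3, ("enq", "x")), (1, 2, ("enq", "y")), (4, 5, ("deq", "y"))]
assert linearizable(ok)
```

The two test histories mirror the lecture's point: overlapping operations may be ordered either way in `S`, but non-overlapping ones must keep their real-time order.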
Properties and Reasoning Strategy
Composability: Linearizability is a composable property. If individual components (objects) in a system are proven linearizable in isolation, the entire system composed of these components is also linearizable. This modularity is essential for building and verifying complex concurrent systems.
Atomic registers (the basic building blocks) are inherently linearizable.
Recall: Linearization Points: The specific point where an operation linearizes (meaning the operation is complete and visible to others as if it happened at that exact point in time; i.e. it takes effect) might depend on the execution path (e.g., whether a queue dequeue finds the queue empty or not). It’s not always a fixed line of code but rather a dynamic point during execution.
Identifying Points:
- Locking: Often the release of the lock, or the key update within the critical section.
- Wait-Free/Lock-Free: Typically the single successful atomic instruction (like CAS or an atomic increment) that achieves the operation’s core effect.
- Strategy: To prove linearizability, identify a linearization point for each operation such that the resulting sequential execution is always legal and respects the real-time partial order.
- Summary: Linearizability is a powerful, composable correctness condition that formally captures the intuitive notion of atomic operations in a concurrent setting.
Sequential Consistency (A Weaker Alternative)
While linearizability is often the desired model for software components, it’s important to be aware of weaker consistency models, particularly Sequential Consistency (SC), which is often discussed in relation to hardware memory models.
- Definition: A history H is sequentially consistent if it can be extended to G such that `complete(G)` is equivalent to a legal sequential history S.
- Key Omission: The real-time precedence constraint (`→G ⊆ →S`) is dropped.
- Requirements: The equivalent sequential history `S` must be legal and must preserve the program order of operations executed by the same thread.
- Allowed Reordering: SC allows non-overlapping operations performed by different threads to be reordered in `S` compared to their actual execution order in `G`.
Comparing SC and Linearizability
The queue execution where `enq(x)` precedes `enq(y)` in real time, but `deq()` returns `y`, is not linearizable. However, it is sequentially consistent, because we can construct a legal sequential history `S = enq(y), enq(x), deq()->y`. This `S` is FIFO-legal and respects the per-thread program orders (A: `enq(x)` then `deq()`; B: `enq(y)`), even though it reorders the non-overlapping `enq(x)` and `enq(y)` compared to their real-time execution.
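The FIFO-legality of the reordered candidate `S` is easy to confirm by replaying it on a model queue. A small self-contained check (the `run_fifo` helper and the tuple encoding are mine):

```python
def run_fifo(seq):
    """Replay a candidate sequential history on a model FIFO queue; return
    the list of dequeued values, or None if a dequeue hits an empty queue."""
    q, out = [], []
    for kind, v in seq:
        if kind == "enq":
            q.append(v)
        elif not q:
            return None
        else:
            out.append(q.pop(0))
    return out

# Candidate S for the SC argument: enq(y) reordered before enq(x)
S = [("enq", "y"), ("enq", "x"), ("deq", None)]
assert run_fifo(S) == ["y"]   # deq() returning y is FIFO-legal in this order
```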
Non-Composability: Unlike linearizability, Sequential Consistency is not composable. Proving individual components are SC does not guarantee the combined system is SC.
This example demonstrates non-composability. Analyzing queue `q` requires reordering `B: q.enq(y)` before `A: q.enq(x)`. Analyzing queue `p` requires `A: p.enq(x)` before `B: p.enq(y)`. These required orderings, combined with each thread’s program order, form a cycle, so the overall history H cannot be represented by a single legal sequential history, even though each projection H|q and H|p can be.
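The non-composability argument can be checked exhaustively for a small two-queue history of this shape. The concrete per-thread programs below are my reconstruction of the example (A: `q.enq(x); p.enq(x); q.deq()->y`, B: `p.enq(y); q.enq(y); p.deq()->x`); the code enumerates every interleaving that respects both program orders:

```python
# Per-thread programs as (object, operation, value) triples.
A = [("q", "enq", "x"), ("p", "enq", "x"), ("q", "deq", "y")]
B = [("p", "enq", "y"), ("q", "enq", "y"), ("p", "deq", "x")]

def legal(seq):
    """FIFO-simulate every queue named in seq at once."""
    queues = {}
    for obj, kind, v in seq:
        q = queues.setdefault(obj, [])
        if kind == "enq":
            q.append(v)
        elif not q or q.pop(0) != v:
            return False
    return True

def interleavings(a, b):
    """All merges of a and b that preserve each thread's program order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Each queue in isolation has a legal sequential witness...
assert legal([B[1], A[0], A[2]])   # q: enq(y), enq(x), deq->y
assert legal([A[1], B[0], B[2]])   # p: enq(x), enq(y), deq->x
# ...but no single order respecting both program orders is legal for both:
assert not any(legal(s) for s in interleavings(A, B))
```

The exhaustive search confirms the cycle argument: each projection is sequentially consistent on its own, yet no common witness exists.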
The Flags example (where threads write 1 to their flag but read 0 from the other’s flag) also shows non-composability. Each flag register is SC individually, but the combined behavior is inconsistent with any sequential execution of the operations.
Peterson’s algorithm relies on the property guaranteed by SC (or similar weak models) that writes eventually become visible and that there’s some consistent ordering observed by reads, ensuring at least one process sees the other’s flag set if both attempt entry.
Side Remark: Quiescent Consistency
Sounds more complicated than it actually is… basically we define a partial ordering between periods where nothing happens…
Linearizability demands that the equivalent sequential order respects the real-time order of all non-overlapping operations. Sequential Consistency drops the real-time requirement between different threads entirely. Quiescent Consistency (QC) offers a middle ground.
- Idea: Perhaps we only care about preserving the real-time order of operations when the system has momentarily settled down.
- Quiescence: A period during an execution where no method invocations are pending (i.e., no operations are currently in progress).
- Definition: Quiescent Consistency requires that the equivalent sequential history respects the real-time order of any two operations (`m0 →H m1`) only if they are separated by a period of quiescence in the original history H. If operations overlap, or if they are non-overlapping but occur without a quiescent period between them, QC imposes no constraint on their relative order in the sequential history (beyond per-thread program order and sequential legality).
In other words, quiescent consistency requires non-overlapping methods to take effect in their real-time order only if the system was quiet between the first one finishing and the second one starting.
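With method calls modeled as `(invocation, response)` time intervals as before, detecting whether a given gap is quiescent is a one-liner. A sketch (the helper name and encoding are mine):

```python
def quiescent_between(ops, t0, t1):
    """True if no operation interval overlaps the open period (t0, t1):
    every op either responded by t0 or is invoked at t1 or later."""
    return all(res <= t0 or inv >= t1 for inv, res in ops)

ops = [(0, 2), (1, 3), (5, 6)]            # (invocation, response) times
assert quiescent_between(ops, 3, 5)       # quiet gap before the last op
assert not quiescent_between(ops, 2, 5)   # (1, 3) is still in flight at t=2
```

Under QC, only pairs of operations separated by such a quiet gap must keep their real-time order in the equivalent sequential history.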
TLDR
Quiescent Consistency (QC) ensures operations appear in a legal order between points of quiescence — times when no operations are in progress.
It’s weaker than Linearizability (LC), which enforces real-time ordering, and different from Sequential Consistency (SC), which only preserves program order without real-time constraints.

🔹 QC: Correctness is checked between “quiet moments” (no operations in flight).
🔹 LC: Every operation must look instant and respect real-time order.
🔹 SC: Looks sequential per thread, but real-time order can be violated.

QC is easier to implement than LC but gives stronger guarantees than SC in certain cases.
Quiescent Consistency vs. Sequential Consistency
QC and SC are incomparable – neither is strictly stronger than the other, meaning there are histories permissible under one model but not the other.
Relevance to Hardware
Why SC? While intuitive (“program order execution”), SC is often considered too strong and performance-limiting for hardware architects to guarantee by default.
Modern processors use caches, store buffers, and lazy write-backs. A write by one core is not instantly visible to others. This violates SC by default.
Stronger consistency (like SC, or the even stricter guarantees needed for linearizability) must be explicitly requested using memory barrier/fence instructions, or implicitly via synchronization primitives (locks, atomics, the `volatile` keyword in Java/C#) which insert the necessary fences.
Real-World Hardware: Provides memory models weaker than SC. SC can usually be achieved programmatically, but at the cost of performance (pipeline stalls, cache coherence traffic caused by fences).
Summary Comparison:
- Linearizability: Strong, composable, intuitive “atomic” view. Best for reasoning about concurrent software objects.
- Sequential Consistency: Weaker, non-composable, simpler definition. Sometimes used as an idealized hardware model, but often still stronger than actual hardware guarantees.
Continue here: 26 Consensus, Consensus Hierarchy, Transactional Memory, Atomic Blocks, Scala-STM, Clock Based STM