Lecture from: 19.02.2024 | Video: Video ETHZ
Course Administration
Why Take This Course?
Understanding parallel programming is increasingly vital in modern computing.
- Necessity Driven by Hardware: Since the early 2000s, significant performance gains no longer come from faster single cores but from using multiple cores in parallel. Effectively harnessing modern hardware requires parallel programming techniques.
- A Different Mindset: Parallelism challenges the sequential way we often think about computation. It requires reasoning about tasks that don’t have a fixed, total order of execution.
- Intellectually Engaging: Exploring the concepts and solving the challenges of parallel programming can be a stimulating and rewarding experience.
Course Goals
This course aims to equip you with:
- Practical Skills: The ability to design and implement parallel programs, primarily using Java (though concepts apply broadly, and C examples may be used).
- Fundamental Understanding: A solid grasp of the core principles underlying parallel computing. Specific tools and languages evolve, but fundamental concepts (like decomposition, synchronization, and efficiency analysis) endure and enable learning new paradigms.
Course Overview
The curriculum is structured into four main thematic blocks:
- (Parallel) Programming: Foundational concepts, tools, and techniques.
- Parallelism: Exploring task and data parallelism, strategies for decomposing problems, and representing dependencies using task graphs.
- Concurrency: Managing shared resources, understanding threads, locks, mutual exclusion, and preventing common issues like race conditions and deadlocks.
- Parallel Algorithms: Studying common patterns for designing parallel algorithms and analyzing their performance characteristics.
Part 1 Schedule:
Part 2 Schedule:
Learning Objectives
Upon successful completion of this course, you will be able to:
- Understand and apply fundamental concepts in parallelism and concurrency.
- Design and construct parallel algorithms using various paradigms (task/data parallelism) and mechanisms (threads, tasks, locks, communication channels).
- Reason about the correctness of parallel programs, identifying potential issues like race conditions and deadlocks, and ensuring properties like starvation-freedom.
- Analyze the performance of parallel algorithms, considering metrics like speedup and efficiency.
- Implement parallel solutions for practical, real-world problems (e.g., searching large datasets, image processing).
Core Concepts: Parallelism and Concurrency
While often related, parallelism and concurrency address distinct aspects of computation:
Parallelism: The concept of executing multiple computations simultaneously to solve a larger problem faster. It focuses on dividing work and utilizing multiple processing units at the same time.
Analogy
Searching for a needle in a haystack. If one person searches, it takes time T. If ten people search different sections of the haystack at the same time, the total time is significantly reduced (ideally close to T/10). Finding the maximum value in a large array is another task amenable to parallelism.
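To make this concrete, here is a minimal Java sketch (class and variable names are illustrative) of the array-maximum example, using a parallel stream to split the scan across the available cores:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Minimal sketch: finding the maximum of a large array, sequentially and in parallel.
public class ParallelMax {
    public static void main(String[] args) {
        int[] data = ThreadLocalRandom.current().ints(10_000_000).toArray();

        // Sequential: one "searcher" scans the entire haystack.
        int seqMax = IntStream.of(data).max().getAsInt();

        // Parallel: the range is split among worker threads, each finds a
        // local maximum, and the local results are combined.
        int parMax = IntStream.of(data).parallel().max().getAsInt();

        System.out.println(seqMax == parMax); // true: same result either way
    }
}
```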
Concurrency: The challenge of managing access to shared resources by multiple tasks or threads whose executions may overlap in time (even if not truly simultaneously on different cores). It focuses on coordination and preventing interference.
Analogy
Two chefs working in the same kitchen need to share the stove and oven. Concurrency management ensures they can both make progress without getting in each other’s way or spoiling each other’s dishes (e.g., ensuring only one uses the stove at a time if required).
Key Concepts in Concurrency:
- Mutual Exclusion: A fundamental property ensuring that only one thread or process can access a specific shared resource (the “critical section”) at any given time. This prevents data corruption caused by simultaneous modifications (race conditions); a minimal Java sketch follows this list.
- Starvation: A situation where a thread or process is perpetually denied access to a resource it needs to make progress, even though the resource may become available. Systems should ideally be starvation-free.
- Communication Protocol: A defined set of rules or mechanisms enabling threads/processes to coordinate and exchange information correctly and efficiently.
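To see why mutual exclusion matters, consider this minimal Java sketch (names are illustrative): two threads each increment a shared counter a million times. `counter++` is a read-modify-write sequence, so unprotected updates can be lost.

```java
// Two threads increment shared counters, with and without mutual exclusion.
public class RaceDemo {
    static int unsafeCount = 0;
    static int safeCount = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                unsafeCount++;            // race condition: updates can interleave and get lost
                synchronized (lock) {     // critical section: one thread at a time
                    safeCount++;
                }
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("unsafe: " + unsafeCount); // typically less than 2,000,000
        System.out.println("safe:   " + safeCount);   // always 2,000,000
    }
}
```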
The Driving Force: Why Parallelism Now?
Historically, performance improvements largely followed Moore’s Law, which observed that the number of transistors on integrated circuits doubled roughly every two years. This density increase enabled faster clock speeds and thus better single-core performance.
The Clock Speed Plateau: Around 2005, increasing clock speeds further became impractical.
- Reason: The primary limitations became power consumption and heat dissipation. Faster, smaller transistors generate more heat in a smaller area. Cooling became a major bottleneck.
The Shift to Multi-Core: The industry responded by shifting focus from making single cores faster to putting multiple processing cores on a single chip.
- Strategy: Keep individual cores relatively power-efficient but increase overall throughput by executing multiple tasks in parallel across these cores. This necessitates parallel programming to leverage the available hardware effectively.
The Challenge of Mutual Exclusion
Mutual exclusion is critical when multiple entities need access to a resource that cannot be shared simultaneously.
Example: Alice (Cat) and Bob (Dog) Sharing a Garden
Alice and Bob cannot be in the garden at the same time.
Requirements:
- Mutual Exclusion: Only Alice or Bob can be in the garden at once.
- Progress (No Lockout): If the garden is empty and a pet wants to enter, it should eventually be allowed to do so. It shouldn’t be blocked indefinitely by the mechanism itself.
How can we enforce these rules?
Attempt 1: Shared Signal (Alternating Turns)
Use a single light signal. If the light is ‘On’, it’s Alice’s turn; if ‘Off’, it’s Bob’s. When leaving, the pet pulls a rope to toggle the light.
- Problem: Starvation through forced alternation. The signal only indicates whose turn it is, not whether the garden is free. If it is Alice’s turn but she never wants to enter (or never leaves), Bob is locked out indefinitely, even while the garden sits empty. This violates the progress requirement.
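A rough Java sketch of this attempt (names invented for illustration; the `volatile` keyword is used so each pet sees the other’s update, a subtlety revisited at the end of this lecture):

```java
// Attempt 1: a single shared signal enforcing alternating turns.
class AlternatingTurns {
    private volatile boolean alicesTurn = true; // the light: true = Alice's turn

    void aliceEnter() { while (!alicesTurn) { /* busy-wait */ } }
    void aliceExit()  { alicesTurn = false; }   // toggle the light on the way out

    void bobEnter()   { while (alicesTurn) { /* busy-wait */ } }
    void bobExit()    { alicesTurn = true; }
}
// Flaw: strict alternation. After Alice exits, she cannot re-enter until Bob
// has taken his turn. If Bob never wants the garden, Alice waits forever
// while the garden sits empty.
```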
Attempt 2: Notification Flags
Each pet has a flag they can raise to signal their intent to enter.
Access Protocol 2a (Check-Then-Set)
- Alice checks if Bob’s flag is down.
- If down, Alice raises her flag and enters.
- Problem 1: Mutual Exclusion Failure. Alice and Bob could check simultaneously, both see the other’s flag down, both raise their flags, and both enter the garden.
- Problem 2: Deadlock (in the waiting variant). Suppose that instead of backing off, a pet raises its flag and then waits while the other’s flag is up. If Alice and Bob raise their flags at nearly the same time, Alice sees Bob’s flag up and waits, Bob sees Alice’s flag up and waits, and neither can ever proceed.
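The mutual-exclusion failure (Problem 1) is easy to see in a Java sketch of protocol 2a (illustrative names; `volatile` only makes the flags’ values visible, it does not make check-then-set one indivisible step):

```java
// Protocol 2a: check the other flag, then raise your own and enter.
class CheckThenSet {
    volatile boolean aliceFlag = false;
    volatile boolean bobFlag = false;

    void aliceEnter() {
        while (bobFlag) { /* wait */ }  // step 1: check that Bob's flag is down
        // <-- race window: Bob may pass his own check right here
        aliceFlag = true;               // step 2: raise own flag and enter
    }
    void aliceExit() { aliceFlag = false; }
    // Bob's methods are symmetric. If both pets pass their checks before
    // either raises a flag, both enter: mutual exclusion is violated.
}
```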
Access Protocol 2b (Set-Then-Check)
- Alice raises her flag (signaling intent).
- Alice checks if Bob’s flag is up.
- If Bob’s flag is down, Alice enters. (If up, she presumably lowers her flag and tries again later).
- Problem: Livelock. Alice raises flag. Bob raises flag. Alice sees Bob’s flag, lowers hers. Bob sees Alice’s flag, lowers his. They both immediately try again, raise flags, see the other’s flag, lower flags… They are continuously active but make no progress in entering the garden. This is a form of starvation where processes are busy but achieve nothing.
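The corresponding sketch for protocol 2b (same illustrative setting) shows how the back-off loop can cycle forever:

```java
// Protocol 2b: raise your own flag first, then check the other's and back off.
class SetThenCheck {
    volatile boolean aliceFlag = false;
    volatile boolean bobFlag = false;

    void aliceEnter() {
        while (true) {
            aliceFlag = true;       // signal intent first
            if (!bobFlag) return;   // Bob's flag is down: enter
            aliceFlag = false;      // back off, then retry
        }
    }
    void aliceExit() { aliceFlag = false; }
    // Bob's methods are symmetric. If the pets run in lockstep (raise, see
    // the other's flag, lower, retry), neither ever enters: livelock.
}
```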
Attempt 3: A Combined Protocol (Flags + Signal)
Let’s combine the flags (for intent) with the binary signal light (for arbitration/turn-taking).
Protocol:
- Signal Intent: Pet raises its flag.
- Check Other Pet: While own flag is raised, check the other pet’s flag.
- Check Signal (If Contention): If both flags are up, check the light signal. The pet whose turn it is according to the signal has priority.
- Defer (If No Priority): The pet without priority lowers its flag and waits (goes back to step 2 or waits longer).
- Enter (If Priority or No Contention): The pet with priority (or if the other flag was down initially) keeps its flag up and enters the garden.
- Exit and Switch: When leaving, the pet lowers its flag and then switches the light signal to give the turn to the other pet.
How this solves the issues:
- Mutual Exclusion: A pet only enters if its flag is up and either the other flag is down OR it has priority via the signal. Ensures only one enters.
- Deadlock: The light signal breaks the symmetry when both want to enter simultaneously; one is given priority.
- Livelock: The fixed priority determined by the signal prevents endless deferral loops.
- Starvation: Because the signal alternates after each exit, a waiting pet will eventually get priority.
Remaining Issue (Fairness): This protocol prevents indefinite starvation, but one pet could still potentially “hog” the garden by entering, leaving quickly, and immediately trying to re-enter before the other pet can react, possibly getting priority again depending on timing. True fairness often requires more sophisticated mechanisms (like counters or queues).
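This combined protocol is essentially Peterson’s classic two-process lock: intent flags plus a turn variable for arbitration. Peterson’s version hands priority to the other party on entry rather than switching the signal on exit, but the ingredients are the same. A minimal Java sketch (relying on `volatile` for the visibility the protocol needs):

```java
// Flags + turn signal, in the style of Peterson's two-process lock.
class PetersonLock {
    private volatile boolean flag0 = false, flag1 = false; // intent flags
    private volatile int turn = 0;                          // the light signal

    void enter(int id) {            // id: 0 = Alice, 1 = Bob
        if (id == 0) {
            flag0 = true;           // raise own flag
            turn = 1;               // give priority to the other pet
            while (flag1 && turn == 1) { /* other pet has priority: wait */ }
        } else {
            flag1 = true;
            turn = 0;
            while (flag0 && turn == 0) { /* wait */ }
        }
    }

    void exit(int id) {
        if (id == 0) flag0 = false; else flag1 = false; // lower flag on exit
    }
}
```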
The Producer-Consumer Pattern
This is a classic concurrency pattern involving coordination between entities that produce data and entities that consume it, using a shared buffer.
Example: Bob (Zookeeper/Producer) and Alice (Tiger/Consumer)
- Bob (Producer): Fills plates with food.
- Alice (Consumer): Eats food from plates.
- Shared Resource: The plates (buffer).
- Constraints:
  - Alice can only eat if there is food (plates not empty).
  - Bob can only refill the plates when Alice isn’t currently eating (a limited form of mutual exclusion; the emphasis here is on the buffer state).
Unlike pure mutual exclusion (two competing entities), this is cooperative. Synchronization ensures the consumer doesn’t access an empty buffer and the producer doesn’t overflow a full buffer (or interfere).
Simple Flag-Based Protocol (Single Plate/Slot Buffer)
Assume a single flag signals the state of the plates:
- Flag UP = Plates EMPTY; Flag DOWN = Plates FULL.
- Bob (Producer) Action:
  - Waits until the Flag is UP.
  - Enters, fills the plates.
  - Exits, sets the Flag to DOWN (Plates FULL).
- Alice (Consumer) Action:
  - Waits until the Flag is DOWN.
  - Enters, eats the food.
  - Exits, sets the Flag to UP (Plates EMPTY).
This simple protocol works for a single-slot buffer because the actions are inherently sequential and cooperative. The flag clearly communicates the state relevant to the other party.
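A minimal Java sketch of the single-plate protocol (illustrative names; the `volatile` flag provides the visibility guarantees that one producer and one consumer rely on here):

```java
// Producer-consumer with a single buffer slot and one state flag.
class SinglePlate {
    private volatile boolean empty = true; // flag UP = plate empty
    private String plate;                  // the single buffer slot

    void produce(String food) {            // Bob, the zookeeper
        while (!empty) { /* wait until Alice has eaten */ }
        plate = food;                      // fill the plate
        empty = false;                     // flag DOWN: plate full
    }

    String consume() {                     // Alice, the tiger
        while (empty) { /* wait until there is food */ }
        String food = plate;               // eat
        empty = true;                      // flag UP: plate empty again
        return food;
    }
}
```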
The Readers-Writers Problem
Another fundamental concurrency scenario involving access to a shared resource where access patterns differ:
- Readers: Only inspect the data. Multiple readers can access the resource concurrently without issue.
- Writers: Modify the data. A writer requires exclusive access – no other reader or writer should be active simultaneously.
This pattern is common in databases, file systems, and shared configuration objects.
Example: Alice (Writer) and Bob (Reader) Sharing a Message
Alice updates a shared message piece by piece, while Bob reads it.
Scenario:
- Alice writes “Pet”. Bob reads “Pet”. (OK)
- Alice writes “Loves”. Bob reads “Loves”. (OK)
- Alice writes “Ham”. Bob is temporarily delayed (e.g., thread scheduling) and doesn’t read yet.
- Alice overwrites “Ham” with “I”. Bob still hasn’t read.
- Alice overwrites “I” with “Hate”. Bob still hasn’t read.
- Alice overwrites “Hate” with “Salad”. Bob finally reads “Salad”.
Result: Bob’s combined message is “Pet Loves Salad” - an inconsistent, partially updated view of the data.
What went wrong? The reader observed the shared data in an intermediate, inconsistent state because the writer’s multi-step update was interleaved with the reader’s access (or lack thereof due to timing).
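One standard remedy, sketched here in Java with invented names, is to treat the whole message as the unit of access and guard it with a read-write lock: any number of readers may hold the read lock together, but a writer needs the write lock exclusively, so no reader ever observes a half-finished update.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Readers-writers: concurrent reads, exclusive writes.
class SharedMessage {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private String[] words = {"", "", ""};

    void write(String w1, String w2, String w3) {  // Alice: multi-step update
        lock.writeLock().lock();
        try {
            words[0] = w1; words[1] = w2; words[2] = w3; // appears atomic to readers
        } finally {
            lock.writeLock().unlock();
        }
    }

    String read() {                                // Bob: consistent snapshot
        lock.readLock().lock();
        try {
            return words[0] + " " + words[1] + " " + words[2];
        } finally {
            lock.readLock().unlock();
        }
    }
}
```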
Why This Matters: The Complexity of Reality
These simple analogies illustrate fundamental synchronization problems, but they hide deeper complexities.
- The Bad News:
  - Real-world parallel computing is vastly more intricate than these pet-and-zookeeper examples suggest.
  - Memory Models & Reordering: Critically, the effects of an action by one thread (like setting a flag: `flag = true`) are not guaranteed to be immediately visible to other threads, nor are actions guaranteed to appear to other threads in the same order as they were executed in the original thread’s code. Compilers and hardware perform optimizations (like instruction reordering and caching) that can lead to unexpected behaviors if not handled carefully. The simple flag protocols discussed earlier are often broken in practice due to these effects.
  - Understanding why this happens requires delving into computer architecture (caches, memory barriers) and memory consistency models – topics for deeper study.
- The Good News:
  - Hardware Support: Modern CPUs provide special atomic instructions (like Compare-and-Swap and Fetch-and-Add) that guarantee certain operations complete indivisibly, helping to build reliable low-level synchronization. They also provide memory barriers/fences to control the visibility and ordering of memory operations.
  - High-Level Abstractions: Programming languages (like Java) and libraries offer higher-level concurrency primitives (e.g., `synchronized` blocks, `ReentrantLock`, `Semaphore`, `BlockingQueue`, `AtomicInteger`) that encapsulate these complexities and provide safer, easier-to-use mechanisms for managing concurrency.
  - Focus: While we won’t always wrestle with the lowest-level hardware details in this course, understanding the existence and nature of these challenges (like reordering and visibility) is crucial for correctly using the higher-level abstractions and for debugging tricky concurrency bugs. This course provides the foundational knowledge needed.
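As a small taste of these tools, here is a sketch using the standard `java.util.concurrent.atomic` API: an atomic increment, and the compare-and-swap retry loop that such lock-free operations are built from.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);

        // High level: an atomic increment (implemented with CAS internally).
        counter.incrementAndGet();

        // Low level: the classic CAS retry loop.
        int old;
        do {
            old = counter.get();                        // read the current value
        } while (!counter.compareAndSet(old, old + 1)); // retry if another thread got in between

        System.out.println(counter.get()); // 2
    }
}
```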
Continue here: 02 Java Recap and JVM Overview