Lecture from: 28.11.2024 | Rui Zhang's Notes | Video: Videos ETHZ
The Cheapest Path Problem
Recall from last time…
The cheapest path problem aims to find the path with the minimum total cost between two vertices in a graph. Costs are associated with edges, representing distances, time, or other metrics. We restrict ourselves to non-negative edge costs in this part, which allows for efficient algorithms. This problem has numerous applications in areas like navigation, network routing, and transportation planning.
Definitions and Assumptions
- Weighted Directed Graph: We work with a weighted directed graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of directed edges. Each edge $(u, v) \in E$ has a non-negative weight (cost) denoted by $c(u, v)$.
- Cheapest Path Cost: The cost of the cheapest path from vertex $s$ to vertex $v$ is denoted by $d(s, v)$. If no path exists, $d(s, v) = \infty$.
Recurrence Relation for Cheapest Paths
A fundamental concept for solving the cheapest path problem is the recurrence relation, which expresses the cheapest path cost recursively:

$$d(s, v) = \begin{cases} 0 & \text{if } v = s \\ \min_{(u, v) \in E} \bigl( d(s, u) + c(u, v) \bigr) & \text{if } v \neq s \text{ and } v \text{ is reachable from } s \\ \infty & \text{otherwise} \end{cases}$$

This states:
- The cheapest path from a vertex to itself has cost 0.
- The cheapest path from $s$ to $v$ (if $v \neq s$) is found by considering all in-neighbors $u$ of $v$, and choosing the one that minimizes the sum of the cheapest path cost $d(s, u)$ and the edge cost $c(u, v)$.
- If no path exists, the cost is infinite.
Implicit Ordering and Avoiding Circularity
This recurrence appears circular, as it requires knowing $d(s, u)$ to compute $d(s, v)$. However, algorithms like Dijkstra's address this by implicitly considering vertices in increasing order of their distances from the source vertex. This ordering ensures that by the time we compute $d(s, v)$, the values for relevant predecessors have already been correctly determined.

To make this clearer, let's redefine the recurrence relation in terms of a set $S$ containing vertices whose shortest distances from the source are known. We use $d_S(v)$ to denote the shortest distance from $s$ to $v$, considering only paths whose intermediate vertices lie in $S$:

$$d_S(v) = \min_{u \in S,\ (u, v) \in E} \bigl( d(s, u) + c(u, v) \bigr)$$

This revised recurrence solves the circularity issue. The algorithm's strategy is to iteratively expand $S$ by adding the vertex $v$ with the smallest $d_S(v)$ among those not yet in $S$. This ensures that when computing $d_S(v)$, all relevant $d(s, u)$ values are already known.
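To make the implicit ordering concrete, here is a minimal Dijkstra-style sketch (our own illustration, not code from the lecture), assuming the graph is given as a dict of dicts `graph[u] = {v: c(u, v)}` with non-negative costs:

```python
import heapq

def cheapest_paths(graph, s):
    """Dijkstra: grow the set S of settled vertices in increasing
    distance order; graph[u] = {v: c(u, v)} with c(u, v) >= 0."""
    d = {v: float('inf') for v in graph}
    d[s] = 0
    settled = set()                    # the set S from the text
    pq = [(0, s)]                      # (tentative distance, vertex)
    while pq:
        dist, u = heapq.heappop(pq)
        if u in settled:
            continue                   # stale queue entry, skip
        settled.add(u)                 # d[u] is now final
        for v, cost in graph[u].items():
            if dist + cost < d[v]:
                d[v] = dist + cost
                heapq.heappush(pq, (d[v], v))
    return d
```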
Proof of Correctness of the Recurrence with Implicit Ordering (Non-Negative Costs)
Recall from last time…
We prove that by maintaining the set $S$ and adding vertices in non-decreasing order of their tentative distances, the algorithm correctly computes the shortest path distances.

Theorem: When the algorithm selects a vertex $v$ with the minimum $d_S(v)$ to add to $S$, the tentative distance $d_S(v)$ equals the true shortest path distance $d(s, v)$.
Proof by Contradiction:

- Assumption: Assume there exists a cheaper path $P$ from $s$ to $v$ with cost $c(P) < d_S(v)$.
- Path Decomposition: Since $s \in S$ and $v \notin S$, path $P$ must leave $S$ at some point. Let $x$ be the last vertex in $S$ along $P$ before it leaves, and let $y$ be the first vertex outside $S$ along $P$. (Possibly $y = v$.)
- Cost Analysis: The prefix of $P$ from $s$ to $y$ uses only intermediate vertices in $S$, so it costs at least $d_S(y)$. Due to non-negative edge costs, the remainder of $P$ costs at least 0. Thus, $c(P) \ge d_S(y)$.
- Contradiction: Because $v$ was selected with the minimal tentative distance among vertices not in $S$, we know $d_S(y) \ge d_S(v)$. Combined with the assumed cheaper path, we get $d_S(v) > c(P) \ge d_S(y) \ge d_S(v)$, a contradiction.

Therefore, our assumption must be false, proving that $d_S(v) = d(s, v)$.
Negative Costs and Bellman-Ford Algorithm
As discussed, negative edge costs pose significant challenges for shortest path algorithms. Dijkstra’s algorithm fails in their presence. In this section, we delve deeper into the problem of general edge costs, exploring a dynamic programming approach.
Core Idea: Path Length as Ordering Criterion
The key idea is to organize the computation based on the length (number of edges) of the paths. We’ll iteratively compute the shortest path distances considering paths of increasing lengths. This approach avoids the pitfalls of negative cycles by explicitly limiting the path length. Note that this dynamic programming approach will compute the correct shortest path if one exists. It will, however, not terminate if there exists a negative cycle reachable from the source.
Bellman-Ford (MIT)
For an intuitive and simple explanation watch this…
l-Gute Schranken (l-Good Bounds)
The values $d_l(v)$ represent upper bounds on the true shortest path distances. After the $l$-th iteration, $d_l(v)$ is the cost of the cheapest path from the source vertex $s$ to vertex $v$ using at most $l$ edges. These are called $l$-good bounds. They are "good" because they are guaranteed to be valid upper bounds: $d_l(v)$ may not be the cost of the overall shortest path, but the true shortest path can never be more expensive than it.
Let's illustrate this with our example graph containing negative edge costs.

- $l = 1$: We consider paths of length 1 (i.e., direct edges from the source). For the top-most node, we find a path of cost 10. This is an $l$-good bound (an upper bound) because a cheaper path with more than one edge might still exist. For the bottom-middle node, we also find a path, with cost 100. Since there is no other path to this node using only one edge, this $l$-good bound of 100 is also the cost of the cheapest path using at most one edge. After this first iteration ($l = 1$), our set would contain the source node, the top-most node with $d_1 = 10$, and the bottom-middle node with $d_1 = 100$.
Improving Bounds: From $l$-Good to $(l+1)$-Good

We iteratively improve the $l$-good bounds $d_l(v)$, representing the cheapest path from $s$ to $v$ using at most $l$ edges.

A shortest path to $v$ with at most $l + 1$ edges either:
- Uses at most $l$ edges: then $d_{l+1}(v) = d_l(v)$.
- Uses exactly $l + 1$ edges: it consists of a shortest path with at most $l$ edges to an in-neighbor $u$, plus the edge $(u, v)$.
Formalization

For each vertex $v$:

$$d_{l+1}(v) = \min\Bigl( d_l(v),\ \min_{(u, v) \in E} \bigl( d_l(u) + c(u, v) \bigr) \Bigr)$$

We choose the cheaper option: the existing bound $d_l(v)$, or a path through an in-neighbor $u$ using $l + 1$ edges.
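For a quick numerical illustration (numbers made up for this note): suppose after $l = 1$ we have $d_1(v) = 10$ via a direct edge, while an in-neighbor $u$ has $d_1(u) = 2$ and $c(u, v) = 3$. The update then gives $d_2(v) = \min(10,\ 2 + 3) = 5$.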
Starting with the 0-good bounds (only the source at distance 0), we iterate for $l = 1, 2, \ldots, |V| - 1$, applying the equation. Each step refines our bounds. After $|V| - 1$ iterations, we have the true shortest path distances, assuming no negative cycles. This works due to the principle of optimality: optimal solutions are built from optimal subproblems. If a negative cycle is reachable, iterating until convergence won't terminate, as distances can decrease indefinitely. A further iteration after the $|V| - 1$ steps would reveal such a cycle.
Example: Iteratively Building Shortest Paths
Let’s illustrate the iterative improvement of bounds with a concrete example.
0-Good Bounds (Initialization)
Initially, we have 0-good bounds. Only the source node (A) has a distance of 0; all others are initialized to infinity, as they are not reachable with 0 edges.
Visually:
1-Good Bounds (First Iteration)
In the first iteration, we calculate 1-good bounds, considering paths of length at most 1. We examine direct neighbors of the source (A).
In tabular form:
2-Good Bounds and Beyond (Subsequent Iterations)
In the second iteration (calculating 2-good bounds), we consider paths of length at most 2. In this specific example, no further improvements are possible. The shortest paths to all reachable nodes have already been found in the previous iteration.
In general, we would continue iterating until $l = |V| - 1$ (where $|V|$ is the number of vertices), as a simple path without cycles can have at most $|V| - 1$ edges. This ensures that, in the absence of negative cycles, the algorithm has explored all possible shortest paths.
Bellman-Ford Algorithm
The Bellman-Ford algorithm computes shortest paths in a weighted directed graph, even with negative edge weights. It also detects negative cycles. If a negative cycle is reachable from the source, the algorithm reports it, as shortest paths are undefined in that case.
Algorithm Description
- Initialization:
  - Set the distance from the source vertex to itself to 0: $d[s] = 0$.
  - Set the distance from the source to all other vertices to infinity: $d[v] = \infty$ for all $v \neq s$. This indicates that initially, we don't know any paths to these vertices.
- Iteration (Relaxation):
  - Relax all edges in the graph $|V| - 1$ times, where $|V|$ is the number of vertices.
  - To "relax" an edge $(u, v)$ means to check if going from $s$ to $u$ and then directly to $v$ is cheaper than the current known path to $v$. If it is, update $d[v]$:
  - If $d[u] + c(u, v) < d[v]$, update $d[v] = d[u] + c(u, v)$.
- Negative Cycle Detection:
  - Perform one more iteration of relaxations.
  - If any distance can still be reduced, a negative cycle reachable from $s$ exists.
Pseudocode
```
BellmanFord(G, s):
    Initialize distances: d[s] = 0, d[v] = ∞ for all v ≠ s
    for i = 1 to |V| - 1:                  // Main relaxation loop
        for each edge (u, v) in E:
            if d[u] + c(u, v) < d[v]:
                d[v] = d[u] + c(u, v)
    for each edge (u, v) in E:             // Negative cycle detection
        if d[u] + c(u, v) < d[v]:
            return "Negative cycle detected"
    return d                               // Shortest path distances
```
Example:
```python
def bellman_ford(graph, source):
    distances = {vertex: float('infinity') for vertex in graph}
    distances[source] = 0
    for _ in range(len(graph) - 1):        # |V| - 1 relaxation rounds
        for u in graph:
            for v, weight in graph[u].items():
                if distances[u] != float('infinity') and distances[u] + weight < distances[v]:
                    distances[v] = distances[u] + weight
    for u in graph:                        # one extra round: negative cycle check
        for v, weight in graph[u].items():
            if distances[u] != float('infinity') and distances[u] + weight < distances[v]:
                print("Graph contains negative weight cycle")
                return None
    return distances
```
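A quick usage sketch with a small made-up graph (adjacency dicts as above), including one negative edge but no negative cycle:

```python
graph = {
    'A': {'B': 4, 'C': 2},
    'B': {'C': -3, 'D': 2},   # one negative edge, no negative cycle
    'C': {'D': 4},
    'D': {},
}
print(bellman_ford(graph, 'A'))
# {'A': 0, 'B': 4, 'C': 1, 'D': 5}
```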
Runtime Analysis: Deriving $O(|V| \cdot |E|)$

Let $|V|$ be the number of vertices (nodes) and $|E|$ be the number of edges.

- Initialization: Setting initial distances takes $O(|V|)$ time. We visit each vertex once.
- Relaxation: The outer loop runs $|V| - 1$ times. The inner loop iterates over all $|E|$ edges. Relaxing an edge takes constant time ($O(1)$). Therefore, the relaxation phase has a time complexity of $O((|V| - 1) \cdot |E|) = O(|V| \cdot |E|)$, since $|V| - 1$ has the same order of growth as $|V|$.
- Negative Cycle Detection: This step iterates over all edges once, taking $O(|E|)$ time.
- Total Runtime: The dominant term is $O(|V| \cdot |E|)$. The initialization and negative cycle detection steps are less significant in terms of asymptotic complexity.
- Considering Sparse Graphs ($|E| \in O(|V|)$): If the number of edges is approximately equal to the number of vertices minus 1 (which is the case in, for example, trees or minimally connected graphs), then $|E| \in O(|V|)$. Substituting this into the runtime complexity, we get $O(|V| \cdot |V|) = O(|V|^2)$: in this specific case where $|E|$ is close to $|V|$ (a sparse graph), the time complexity becomes quadratic.

In summary, the overall time complexity of the Bellman-Ford algorithm is $O(|V| \cdot |E|)$. In the specific case of a sparse graph where $|E| \in O(|V|)$, the runtime simplifies to $O(|V|^2)$.
Minimum Spanning Trees
The Elektrifizierung Problem (1926)
In 1926, Otakar Borůvka tackled the challenge of designing a cost-efficient electrical network for Moravia. The problem involved connecting cities with minimal wiring costs while ensuring all cities were connected. This is known as the minimum spanning tree (MST) problem in graph theory.
Modeling the Problem as an Undirected Graph:
The electrical network can be represented as an undirected, weighted graph:
- Vertices (Nodes): Represent the cities.
- Edges: Represent potential power lines between cities.
- Weights: Represent the cost of constructing a power line between two cities (e.g., distance, materials).
Crucially, the problem can be simplified. The source of electricity is irrelevant since we seek to connect all cities. We can conceptually treat all power sources as a single source. This allows us to model the network with an undirected graph, as the direction of power flow doesn’t affect the overall cost of connecting the cities.
Goal: Minimum Spanning Tree
The objective is to find a connected graph of minimal total edge weight that spans all cities. This, by definition, is a minimum spanning tree (MST): a subset of edges that connects all vertices without any cycles and with the minimum possible total edge weight. This tree represents the most cost-effective way to electrify the region, ensuring every city is connected to the power grid.
Minimum Spanning Trees (MSTs)
Greedy Algorithms: Minimum Spanning Tree (MIT)
A good explanation…
Given a connected, undirected graph $G = (V, E)$ with non-negative edge weights, a minimum spanning tree (MST) is a subgraph $T$ that fulfills the following conditions:

- Connects all vertices (Spanning): Every vertex in $V$ must be included in the MST. This ensures that all nodes in the original graph are reachable within the tree.
- Is acyclic: $T$ must not contain any cycles. If a cycle existed, we could remove any edge from the cycle, and the graph would remain connected but with a lower total weight, contradicting the minimality requirement of an MST. Therefore, acyclicity is essential for minimality.
- Has minimum total weight: The sum of the weights of the edges in $T$ must be the smallest possible among all spanning trees of $G$. Formally, if $w(e)$ represents the weight of edge $e$, then
$$w(T) = \sum_{e \in T} w(e)$$
must be minimized. This condition ensures that the chosen set of edges provides the least costly way to connect all vertices.

In summary, a subset $T \subseteq E$ forms an MST if the graph $(V, T)$ is connected and acyclic, and $w(T)$ is minimal among all possible connected, acyclic subgraphs that span $V$. Any subgraph satisfying these properties represents a valid solution to the minimum spanning tree problem.
Motivation: Clustering
Finding a Minimum Spanning Tree (MST) has a direct application in clustering. An MST inherently groups vertices into a single connected component based on proximity, represented by edge weights. This creates a cluster where all vertices are linked with the minimal possible total cost. Furthermore, strategically removing edges from the MST, starting with the heaviest ones, allows us to divide the graph into smaller clusters, effectively creating a hierarchical clustering structure.
Hierarchical Clustering with MSTs
The process of cutting an MST to form clusters is based on a simple yet powerful idea: the heavier an edge in the MST, the less “similar” the two vertices it connects are likely to be. Therefore, by removing heavy edges, we disconnect less similar vertices and partition the graph into more cohesive clusters.
Procedure:
-
Compute the MST: Start by finding the MST of the graph using an algorithm like Borůvka’s or Prim’s.
-
Edge Removal: Remove edges from the MST one at a time, in decreasing order of weight. After each removal, the MST will be split into one more connected component. Each connected component represents a cluster.
-
Hierarchy: The order in which edges are removed creates a hierarchy of clusters. Initially, all vertices are in a single cluster (the MST). As edges are removed, this cluster is successively divided into smaller and smaller clusters. This process can be visualized as a dendrogram, a tree-like diagram that illustrates the hierarchical relationship between clusters.
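As a sketch of this procedure (our own illustration: it assumes the MST is already given as a list of `(weight, u, v)` edges over vertices `0..n-1`), dropping the $k - 1$ heaviest MST edges yields $k$ clusters, which a simple traversal can collect:

```python
from collections import defaultdict

def mst_clusters(n, mst_edges, k):
    """Split an MST on vertices 0..n-1 into k clusters by dropping
    the k-1 heaviest edges. mst_edges: list of (weight, u, v)."""
    kept = sorted(mst_edges)[: len(mst_edges) - (k - 1)]  # drop heaviest k-1
    adj = defaultdict(list)
    for _, u, v in kept:
        adj[u].append(v)
        adj[v].append(u)
    clusters, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:                      # DFS to collect one component
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        clusters.append(comp)
    return clusters
```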
Example: Applying to the Elektrifizierung Problem
Let's revisit the electrification problem discussed above. After computing the MST, imagine that the most expensive edge connects cities that are geographically far apart or separated by difficult terrain. Removing this edge would divide the network into two sub-networks, each representing a more localized cluster of cities. Further edge removals would continue to break down the network into smaller, more tightly connected clusters. This could inform decisions about building more localized power distribution infrastructure. For example, if a link to a certain city is extremely costly, it might be more reasonable to build a separate local power plant there.
Conclusion
Using MSTs for clustering provides a simple and intuitive method for grouping similar data points. By leveraging the structure of the MST and the weights of its edges, we can create a hierarchical representation of the data, revealing relationships at different levels of granularity. This technique is particularly useful when the data has a natural notion of distance or similarity that can be encoded in edge weights. The connection between MSTs and clustering further highlights the versatility of graph algorithms in solving a wide range of data analysis problems.
Now let us develop the algorithm step by step…
Distinct Edge Weights: A Simplifying Assumption
Before diving into Borůvka's algorithm, we introduce a simplifying assumption: all edge weights in the graph are distinct. This assumption simplifies the analysis and correctness proofs of MST algorithms. While real-world scenarios might not always satisfy this assumption, algorithms designed under this constraint often work correctly in the general case or can be adapted with tie-breaking rules. If edge weights are not distinct, a deterministic tie-breaking rule (e.g., based on vertex IDs) can be used to ensure that the algorithm behaves as if the edge weights were distinct.
Safe Edges: Guaranteed Inclusion in MST
A safe edge is an edge that is guaranteed to be part of some Minimum Spanning Tree (MST). Identifying safe edges allows us to build the MST incrementally by adding these edges without the risk of creating cycles or violating the MST properties.
Identifying Safe Edges: Building Blocks of MST Algorithms
The core idea behind efficient MST algorithms is to identify and add “safe edges”—edges guaranteed to be part of some MST—iteratively. Let’s explore two crucial concepts that help us pinpoint these edges:
The Cut Property: Dividing to Conquer
Imagine dividing the graph’s vertices into two disjoint sets, creating a “cut.” The Cut Property states that the cheapest edge crossing this cut must belong to at least one MST.
Intuition: Think of it like building bridges between two islands. To minimize cost, you’d always choose the cheapest bridge available, regardless of how other bridges connect within each island. This cheapest bridge represents a safe edge. No matter how you connect the cities on each side of the cut, including this cheapest cross-cut edge will always be part of an overall cheapest solution.
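A tiny made-up example: if a cut separates $S = \{a, b\}$ from $V \setminus S = \{c, d\}$, and the crossing edges $\{a, c\}$, $\{b, c\}$, $\{b, d\}$ have weights 5, 7, and 9, then the weight-5 edge $\{a, c\}$ is safe: any spanning tree that crosses the cut only via a heavier edge could be improved by swapping that edge for $\{a, c\}$.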
Locally Minimal Edges: A Refinement of the Cut Property
A locally minimal edge is an edge that’s the cheapest connection between a vertex and any vertex outside its immediate “neighborhood” (defined by a specific set of vertices). It’s a more localized version of the Cut Property.
Definition: An edge $e = \{u, v\}$ is locally minimal if there is a set $S$ of vertices containing $u$ but not $v$ (or vice versa), such that $e$ is the cheapest edge connecting $S$ to the rest of the graph.
Theorem: Every locally minimal edge is a safe edge.
Proof (by contradiction):

- Assume a locally minimal edge $e$ is not in any MST.
- Consider any MST $T$. Adding $e$ to $T$ would create a cycle.
- This cycle must contain another edge $e'$ crossing the cut defined by the set $S$ that made $e$ locally minimal.
- Since $e$ is locally minimal, $w(e) < w(e')$.
- Replacing $e'$ with $e$ in $T$ would create a new spanning tree with a lower total weight, contradicting the assumption that $T$ was an MST.

Therefore, $e$ must be in some MST, making it a safe edge.
Building the MST: An Iterative Approach
Armed with the concept of safe edges, we can construct an MST iteratively:
- Initialization: Begin with an empty set of edges.
- Iteration: Repeatedly find and add safe edges to this set until a spanning tree is formed.
This simple framework underpins several specific MST algorithms, each differing in how they efficiently identify safe edges. Borůvka’s algorithm, which we’ll explore next, provides a concrete example of this iterative approach. By focusing on finding and adding safe edges, we guarantee that the final set of edges will form a minimum spanning tree. The different algorithms essentially provide efficient strategies for discovering these safe edges.
Borůvka’s Algorithm: Connecting Components Efficiently
Borůvka’s algorithm, one of the earliest MST algorithms, provides an elegant and inherently parallel approach to constructing a minimum spanning tree. It operates by iteratively merging connected components through the selection of safe edges.
Algorithm Description
- Initialization:
  - Begin with an empty set $F$ of edges to represent the future MST.
  - Initially, each vertex in the graph is treated as its own separate connected component. Think of these as isolated islands that we want to connect with bridges.
- Iteration: As long as there's more than one component:
  - Find Cheapest Outgoing Edges: For each component, identify the cheapest edge connecting it to any other component. This is like finding the most affordable bridge from each island to a neighboring island. Use a deterministic tiebreaker (e.g., lowest vertex ID) if edge weights are not unique.
  - Merge Components: Add these cheapest edges to the set $F$. Merge the connected components joined by these edges. Note that some edges might be chosen by multiple components; only add each edge once to avoid duplicates. This step is like building the selected bridges, thus merging islands into larger landmasses.
- Termination: The algorithm stops when all vertices belong to a single connected component, a single "super-island." The set $F$ of edges now constitutes the MST.
Pseudocode
```
Boruvka(G):
    F = ∅                                  # Set of MST edges
    Components = {{v} for v in V}          # Initially, each vertex is its own component
    while |Components| > 1:
        SafeEdges = ∅
        for each component C in Components:
            cheapestEdge = findCheapestEdge(C, G)   # edge with minimum weight connecting C
                                                    # to another component; None if none exists
            if cheapestEdge is not None:
                SafeEdges.add(cheapestEdge)
        for edge (u, v) in SafeEdges:
            Components = mergeComponents(Components, u, v)   # merge components containing u and v
            if (v, u) not in F:            # prevent duplicate edges
                F = F ∪ {(u, v)}
    return F
```
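The helper routines `findCheapestEdge` and `mergeComponents` are left abstract above. Below is a compact, runnable Python sketch of the same idea (our own illustration, not the lecture's code): the graph is an edge list of `(weight, u, v)` tuples with distinct weights, and components are tracked with a plain label array instead of a full Union-Find.

```python
def boruvka(n, edges):
    """Borůvka's algorithm. n: number of vertices (0..n-1);
    edges: list of (weight, u, v) with distinct weights.
    Returns the set of MST edges. Assumes the graph is connected."""
    comp = list(range(n))              # comp[v] = label of v's component
    mst = set()
    num_components = n
    while num_components > 1:
        cheapest = {}                  # cheapest outgoing edge per component
        for w, u, v in edges:
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue               # edge inside one component; ignore
            for c in (cu, cv):
                if c not in cheapest or w < cheapest[c][0]:
                    cheapest[c] = (w, u, v)
        for w, u, v in cheapest.values():
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue               # already merged via another safe edge
            mst.add((w, u, v))
            for x in range(n):         # simple O(n) relabel of one component
                if comp[x] == cv:
                    comp[x] = cu
            num_components -= 1
    return mst

# Example with made-up weights:
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(boruvka(4, edges))   # {(1, 0, 1), (2, 1, 2), (4, 2, 3)}
```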
Runtime Analysis:

Per Iteration: Each iteration requires examining all edges to find the cheapest for each component and merging components, taking $O(|E|)$ time with efficient data structures (e.g., using a Union-Find data structure for merging).

Number of Iterations: The number of connected components at least halves with each iteration. This leads to a maximum of $O(\log |V|)$ iterations.

Total Runtime: Combining these, Borůvka's algorithm has a time complexity of $O(|E| \log |V|)$. This signifies that the runtime grows proportionally to the product of the number of edges and the logarithm of the number of vertices. It's important to note that this analysis assumes the use of efficient data structures for managing connected components and finding minimum edges. Without these optimizations, the runtime could be higher.
Prim’s Algorithm: Growing an MST from a Seed
Prim’s algorithm offers a different perspective on constructing MSTs. Instead of merging components, it starts with a single vertex and expands outward, like a growing tree, always maintaining a single connected component. This approach leverages a priority queue for efficiency.
Algorithm Description
- Initialization:
  - Select an arbitrary starting vertex $s$ as the initial "seed" for the MST.
  - Create an empty set to store the MST edges.
  - Initialize a set to track vertices currently in the MST.
  - Assign a key value `key[v]` to each vertex $v$: `∞` if no edge connects `s` to `v`, or `w(s, v)` if the edge `(s, v)` exists. `key[v]` represents the cheapest known connection cost from `v` to the growing MST.
  - Use a priority queue $Q$ to store all vertices, prioritized by their `key` values (lowest cost at the front).
- Iteration: While the priority queue is not empty:
  - Select and Add: Extract the vertex $u$ with the minimum `key` value from $Q$. This vertex is the cheapest to connect to the current MST. Add $u$ to the MST set.
  - Update Neighbors: For each neighbor $v$ of $u$ that's not in the MST set: if `w(u, v)` (the weight of the edge from $u$ to $v$) is less than `key[v]`, update `key[v]` to `w(u, v)` and update $v$'s priority in $Q$. This step essentially discovers potentially cheaper connections to vertices outside the current MST. Crucially, if a cheaper edge to `v` is found, it implies that the previously known cheapest edge to `v` cannot be part of the MST.
- Termination: When $Q$ is empty, all vertices are in the MST, and the implicitly chosen edges (tracked via updates to the `key` values and a `parent` dictionary in implementations) form the MST.
Python Implementation: Prim's Algorithm Using a Min-Heap
```python
import heapq

def prim(graph, start):
    mst_set = set()
    key = {v: float('inf') for v in graph}
    parent = {v: None for v in graph}   # stores the MST as parent pointers
    key[start] = 0
    priority_queue = [(0, start)]
    while priority_queue:
        _, u = heapq.heappop(priority_queue)
        if u in mst_set:                # already processed, skip stale entry
            continue
        mst_set.add(u)
        for v, weight in graph[u].items():
            if v not in mst_set and weight < key[v]:
                key[v] = weight
                parent[v] = u
                heapq.heappush(priority_queue, (key[v], v))
    return parent
```
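A quick usage sketch with a small made-up undirected graph (stored as symmetric adjacency dicts); the returned `parent` dictionary encodes the MST edges:

```python
graph = {
    'A': {'B': 2, 'C': 3},
    'B': {'A': 2, 'C': 1, 'D': 4},
    'C': {'A': 3, 'B': 1, 'D': 5},
    'D': {'B': 4, 'C': 5},
}
print(prim(graph, 'A'))
# {'A': None, 'B': 'A', 'C': 'B', 'D': 'B'}  -> MST edges A-B, B-C, B-D
```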
Runtime Analysis: Efficient with Priority Queues
Using a binary heap for the priority queue, Prim's algorithm has a runtime of $O((|V| + |E|) \log |V|)$, equivalent to Dijkstra's algorithm. A Fibonacci heap could theoretically achieve $O(|E| + |V| \log |V|)$, but binary heaps are often more practical.
Multithreaded example
Priority Queue (e.g. Min Heap)
A priority queue is an abstract data type that operates like a regular queue, but where each element has a priority associated with it. Elements are dequeued (removed) based on their priority, with the highest priority element always removed first. In a min-heap priority queue, the element with the smallest value (or key) has the highest priority. This makes it ideal for algorithms like Prim’s that need to repeatedly find and extract the element with the minimum key.
See Heap Sort
Heaps and Heap Sort (MIT)
Key Operations
- `make(nodes, key)`: Initializes all nodes with the key `key`.
- `insert(item, priority)`: Adds an `item` with the given `priority` to the queue.
- `extract_min()`: Removes and returns the item with the highest priority (lowest key).
- `decrease_key(item, new_priority)`: Decreases the priority (key) of a specific item. This is crucial for Prim's algorithm to update the cost of reaching a vertex when a cheaper path is found. It's important that this operation efficiently updates the internal structure of the priority queue to maintain the heap property.
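Python's `heapq` offers no native `decrease_key`; a common workaround (used in the Prim sketch above) is lazy deletion: push a new entry with the smaller key and skip outdated entries during extraction. A minimal wrapper illustrating the operations above might look like this (class and method names are our own, not a standard API):

```python
import heapq

class MinPQ:
    def __init__(self):
        self.heap = []          # entries: (key, item)
        self.best = {}          # current best key per item

    # make(nodes, key) would just be: for v in nodes: pq.insert(v, key)
    def insert(self, item, key):
        self.best[item] = key
        heapq.heappush(self.heap, (key, item))

    def decrease_key(self, item, new_key):
        if new_key < self.best.get(item, float('inf')):
            self.best[item] = new_key
            heapq.heappush(self.heap, (new_key, item))  # lazy: old entry stays

    def extract_min(self):
        while self.heap:
            key, item = heapq.heappop(self.heap)
            if self.best.get(item) == key:   # skip outdated (lazy-deleted) entries
                del self.best[item]
                return item
        raise IndexError("extract_min from empty queue")
```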
Common Implementations
- Binary Heap: A binary tree that satisfies the heap property (the value of each node is less than or equal to the value of its children). Binary heaps provide $O(\log n)$ time complexity for `insert`, `extract_min`, and `decrease_key` operations, where $n$ is the number of elements in the queue. This logarithmic performance is crucial for the overall efficiency of algorithms like Prim's. In practical implementations, `decrease_key` often involves knowing the location of the element in the heap; this can be achieved by storing a mapping (e.g., a dictionary) from elements to their positions in the heap.
- Fibonacci Heap: A more complex data structure that offers theoretically better performance, particularly for `decrease_key`, which takes $O(1)$ amortized time. However, the constant factors associated with Fibonacci heaps often make binary heaps a more practical choice for most applications.
Importance in Prim’s Algorithm
The priority queue is essential for Prim's algorithm to efficiently select the vertex with the minimum key in each iteration. The `extract_min` operation allows quick retrieval of the next vertex to add to the MST, and `decrease_key` enables efficient updates of the connection costs as the MST grows. Without a priority queue, finding the minimum key would require scanning all vertices, significantly increasing the algorithm's runtime. The efficient logarithmic operations of a binary heap priority queue are key to achieving the overall $O((|V| + |E|) \log |V|)$ time complexity of Prim's algorithm.
Continue here: 13 Kruskal’s Algorithm, Union Find, Self-Adapting Data Structures and Amortized Analysis