Lecture from: 12.12.2024 | Video: Videos ETHZ
Recap: Single Source Shortest Paths
We’ve recently explored algorithms for finding shortest paths in a graph $G = (V, E)$ with a cost function $c: E \to \mathbb{R}$, focusing on finding the shortest paths from a source vertex $s$ to all other vertices $v \in V$.
Let’s recap these single-source shortest path algorithms:
| Cost Function | Algorithm | Runtime | Core Idea |
|---|---|---|---|
| $c(e) = 1$ for all $e \in E$ | BFS | $O(n + m)$ | Explore the graph layer by layer. |
| $c(e) \ge 0$ | Dijkstra’s | $O((n + m) \log n)$ | Greedily explore vertices in order of increasing distance from the source. |
| general $c$ | Bellman-Ford | $O(n \cdot m)$ | Iteratively relax edges to improve distance estimates; detects negative cycles. |
Where $n = |V|$ (number of vertices) and $m = |E|$ (number of edges).
The All-Pairs Shortest Path Problem:
A limitation of the algorithms above is that they solve the single-source problem. Today, we address the all-pairs shortest path problem: finding the shortest path between all pairs of vertices in the graph. This problem has broad applications, from navigation and network routing to social network analysis (e.g., identifying central nodes).
A naive approach would be to run a single-source algorithm for each vertex. This would result in the following runtimes:
| Cost Function | Algorithm | Runtime |
|---|---|---|
| $c(e) = 1$ for all $e \in E$ | BFS repeated | $O(n(n + m))$ |
| $c(e) \ge 0$ | Dijkstra’s repeated | $O(n(n + m) \log n)$ |
| general $c$ | Bellman-Ford repeated | $O(n^2 \cdot m)$ |
Can we do better? Yes! We’ll explore two more efficient algorithms:
- Floyd-Warshall: $O(n^3)$ runtime.
- Johnson’s Algorithm: $O(n \cdot m + n^2 \log n)$ runtime.
These algorithms provide significant improvements for dense graphs (where $m$ is close to $n^2$) and enable solving the all-pairs shortest paths problem more efficiently.
Floyd-Warshall: All-Pairs Shortest Paths
The Floyd-Warshall algorithm employs dynamic programming to solve the all-pairs shortest path problem. It leverages a clever subproblem definition to efficiently compute the shortest paths between all pairs of vertices.
Subproblem Definition
Let $V = \{1, 2, \ldots, n\}$ be the set of vertices in the graph. We define the following subproblem:

$$d_{uv}^{(i)} = \text{length of a shortest } u\text{-}v\text{-path that uses only intermediate vertices from } \{1, \ldots, i\}$$
Example:
In this example, $d_{13}^{(2)}$ would represent the shortest path from vertex 1 to vertex 3 using only vertices 1 and 2 as possible intermediate vertices.
Recursive Formulation: Building Solutions from Subproblems
We can construct a recursive relation for $d_{uv}^{(i)}$ based on whether vertex $i$ is used as an intermediate vertex in the shortest path from $u$ to $v$.
Case 1: Vertex $i$ is not an intermediate vertex

If the shortest path from $u$ to $v$ doesn’t use vertex $i$, then the path only uses intermediate vertices from the set $\{1, \ldots, i-1\}$. Thus:

$$d_{uv}^{(i)} = d_{uv}^{(i-1)}$$
Case 2: Vertex $i$ is an intermediate vertex (exactly once)

If the shortest path from $u$ to $v$ does use vertex $i$, it can be broken down into two subpaths:

- The shortest path from $u$ to $i$ using intermediate vertices in $\{1, \ldots, i-1\}$: $d_{ui}^{(i-1)}$. Note that in this subproblem the path from $u$ to $i$ cannot use $i$ itself as an intermediate node, since it is only allowed to use nodes $1, \ldots, i-1$.
- The shortest path from $i$ to $v$ using intermediate vertices in $\{1, \ldots, i-1\}$: $d_{iv}^{(i-1)}$. The same applies here: the path itself cannot use $i$ as an intermediate node.

Combining these subpaths:

$$d_{uv}^{(i)} = d_{ui}^{(i-1)} + d_{iv}^{(i-1)}$$
Case 3: Vertex $i$ appears multiple times

If vertex $i$ appears multiple times in a shortest path, the path contains a cycle through $i$. If the total cost of that cycle is negative, we have a negative cycle; in this case the shortest path is not well-defined, as we could repeatedly traverse the negative cycle to decrease the path length indefinitely. If the cycle has non-negative cost, removing it never makes the path longer, so this case can be ignored.
Combining Cases (Assuming No Negative Cycles)
Taking the minimum of the two valid cases, we get the recursive relation:

$$d_{uv}^{(i)} = \min\left(d_{uv}^{(i-1)},\; d_{ui}^{(i-1)} + d_{iv}^{(i-1)}\right)$$
Base Case:
The base case for this recursion is when no intermediate vertices are allowed ($i = 0$):

$$d_{uv}^{(0)} = \begin{cases} 0 & \text{if } u = v \\ c(u, v) & \text{if } (u, v) \in E \\ \infty & \text{otherwise} \end{cases}$$

This initializes the shortest path distances to the direct edge costs, or infinity if no direct edge exists.
Algorithm: Floyd-Warshall in Pseudocode
Building upon the recursive formulation, we can develop a concise pseudocode implementation of the Floyd-Warshall algorithm.
Pseudocode:
We assume:
- No self-loops
- No negative cycles
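The lecture’s pseudocode is not reproduced here; the following is a minimal Python sketch of the same idea (function and variable names are my own):

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest paths on vertices 0..n-1.

    edges: list of (u, v, cost) triples; costs may be negative,
    but the graph is assumed to contain no negative cycles.
    """
    # Base case d^(0): direct edge costs, 0 on the diagonal, infinity otherwise.
    d = [[0 if u == v else math.inf for v in range(n)] for u in range(n)]
    for u, v, cost in edges:
        d[u][v] = min(d[u][v], cost)

    # d^(i) from d^(i-1): allow vertex i as an additional intermediate vertex.
    for i in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][i] + d[i][v] < d[u][v]:
                    d[u][v] = d[u][i] + d[i][v]

    # Negative cycle detection: a negative diagonal entry after all iterations.
    if any(d[u][u] < 0 for u in range(n)):
        raise ValueError("graph contains a negative cycle")
    return d
```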
This pseudocode initializes the distance matrix with direct edge costs and infinity for other pairs. Then, it iteratively considers each vertex as a potential intermediate vertex, updating shortest path distances according to the recursive relation. A final loop is included for negative cycle detection.
Runtime Analysis: Cubic Complexity
The runtime of the Floyd-Warshall algorithm is determined by the three nested loops in the main dynamic programming phase. Each loop iterates over all $n$ vertices, resulting in a time complexity of $O(n^3)$.
Space Complexity: Quadratic with Optimization
The basic implementation uses $O(n^3)$ space due to the three-dimensional distance matrix. However, by observing that we only need the results from the previous iteration ($i-1$) to compute the current iteration ($i$), we can reduce the space complexity to $O(n^2)$ by using a single two-dimensional matrix and overwriting it in place. This in-place update significantly reduces memory usage without affecting the time complexity.
Reconstructing Shortest Paths: Backtracking
To find the actual shortest paths, not just the distances, we can maintain a separate “predecessor” matrix `pred`. During the main loop, whenever we update `d[i][j]`, we also update `pred[i][j]` to store the intermediate vertex `k` that led to the improvement. By backtracking through the `pred` matrix, we can reconstruct the sequence of vertices along the shortest path between any two vertices. This backtracking process takes linear time, $O(n)$, for each path.
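In the sketch above, the update `d[u][v] = d[u][i] + d[i][v]` is where one would also record `pred[u][v] = i`. Given such a matrix, a path can be reconstructed recursively; a possible (hypothetical) helper:

```python
def reconstruct_path(pred, u, v):
    """Vertices of a shortest u-v path, assuming pred[u][v] stores the
    intermediate vertex i of the last improving update (or None if the
    direct edge u -> v was never improved)."""
    i = pred[u][v]
    if i is None:
        return [u, v]  # direct edge; caller should check d[u][v] < inf
    # Expand both halves around i; drop the duplicated i from the first half.
    return reconstruct_path(pred, u, i)[:-1] + reconstruct_path(pred, i, v)
```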
Detecting Negative Cycles
A crucial aspect of the Floyd-Warshall algorithm is its ability to detect negative cycles. While it might seem intuitive to check for negative values on the diagonal of the distance matrix (i.e., $d_{uu}^{(i)} < 0$ for some $u$) after each iteration, this is not always sufficient. Let’s explore why.
Counterexample:
Consider these two graphs with identical structure and weights but different vertex numbering:
Graph 1:
Here, the diagonal entry $d_{11}$ becomes negative (path: 1 → 2 → 3 → 2 → 1), correctly indicating a negative cycle.
Graph 2:
However, in this graph the corresponding diagonal entry is still non-negative at the same point, even though the same negative cycle exists. With this numbering we go through the nodes in a different order, so the negative cycle is only found in a later iteration.
A Reliable Criterion for Negative Cycle Detection:
While checking the diagonal after each iteration isn’t foolproof, the following statement is guaranteed to be true:
Theorem: A negative cycle exists in the graph if and only if there exists a vertex $u \in V$ such that $d_{uu}^{(n)} < 0$ after the algorithm completes all $n$ iterations.
Intuition: If we can reach a vertex from itself with a negative cost after considering all possible intermediate vertices, it means we’ve traversed a negative cycle.
Implication for Floyd-Warshall:
This theorem provides a simple and efficient way to detect negative cycles after running the Floyd-Warshall algorithm:
- Run the Algorithm: Execute the Floyd-Warshall algorithm to compute the final distance matrix $D^{(n)}$.
- Check the Diagonal: Inspect the diagonal elements of the final distance matrix ($d_{uu}^{(n)}$ for all $u \in V$). If any diagonal element is negative, a negative cycle exists.
- Output: If no diagonal elements are negative, the algorithm has correctly computed all-pairs shortest paths. Otherwise, report the presence of a negative cycle.
Corrected Negative Cycle Check in Code:
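The corrected check, as a small Python sketch against the distance matrix `d` computed above (naming is my own):

```python
def has_negative_cycle(d):
    """True iff some diagonal entry of the final distance matrix is negative."""
    return any(d[u][u] < 0 for u in range(len(d)))
```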
Proof of the Theorem
Let’s rigorously prove the theorem connecting negative cycles to the diagonal elements of the distance matrix in Floyd-Warshall.
Theorem: A negative cycle exists in the graph if and only if there exists a vertex $u \in V$ such that $d_{uu}^{(n)} < 0$ after the algorithm completes.
($\Leftarrow$) Direction (Negative Diagonal Element Implies Negative Cycle):

If $d_{uu}^{(n)} < 0$ for some vertex $u$ after the algorithm completes (meaning all vertices have been considered as intermediate vertices), then there exists a walk from $u$ back to itself with negative total weight. This closed walk contains a negative cycle. This direction of the proof is relatively straightforward.
($\Rightarrow$) Direction (Negative Cycle Implies Negative Diagonal Element):
This direction is more subtle. We need to show that if a negative cycle exists, the algorithm will inevitably produce a negative diagonal element in the distance matrix.
- Assume a Negative Cycle: Suppose the graph contains a closed walk of negative total weight.
- Decomposition into Simple Cycles: Any such closed walk (even one with repeated vertices) can be decomposed into one or more simple cycles. If the original walk has a negative weight, at least one of these simple cycles must also have a negative weight (otherwise, a sum of non-negative weights couldn’t be negative).
- Consider a Negative Simple Cycle: Let $C$ be a simple cycle of negative weight, meaning it contains no repeated vertices except for the start and end node.
- Select Maximal Index: Let $j$ be the vertex in cycle $C$ with the highest index, and let $u$ be any other vertex in $C$ ($u \neq j$). This ensures that when the algorithm reaches iteration $j$, all other nodes of the cycle have already been considered as intermediate nodes.
- Paths within the Cycle: Denote the path from $u$ to $j$ along cycle $C$ as $P_1$ and the path from $j$ back to $u$ along $C$ as $P_2$. Since $P_1$ and $P_2$ are parts of a simple cycle, they do not contain the vertex $j$ as an internal node (it can only appear as a start or end node). They may, however, contain other cycle vertices as internal nodes, all of which have index at most $j - 1$.
- Considering Subproblems: The paths $P_1$ and $P_2$ are therefore considered in the subproblems $d_{uj}^{(j-1)}$ and $d_{ju}^{(j-1)}$, respectively. This is because these subproblems consider paths using only intermediate vertices with indices up to $j - 1$, and neither $P_1$ nor $P_2$ uses $j$ as an intermediate node.
- Bounding Path Costs: Due to the optimality principle of dynamic programming, we know that
  $$d_{uj}^{(j-1)} \le c(P_1) \quad \text{and} \quad d_{ju}^{(j-1)} \le c(P_2),$$
  where $c(P)$ denotes the cost of a path $P$. (The shortest path can’t be longer than one specific path.)
- Recursion and Negative Cycle: In iteration $j$, the algorithm computes
  $$d_{uu}^{(j)} \le d_{uj}^{(j-1)} + d_{ju}^{(j-1)} \le c(P_1) + c(P_2) = c(C).$$
  Since $c(C) < 0$ (because $C$ is a negative cycle), we have $d_{uu}^{(j)} < 0$. Thus, at least one diagonal element becomes negative by iteration $j$ at the latest (possibly even earlier), indicating the presence of a negative cycle. Because the entries never increase in later iterations, this negative value is propagated to $d_{uu}^{(n)} < 0$.
This completes the proof, showing that the final diagonal elements of the distance matrix reliably indicate the presence or absence of negative cycles in the graph.
Johnson’s Algorithm
We’ve seen that Floyd-Warshall solves the all-pairs shortest paths problem in $O(n^3)$ time. Johnson’s algorithm offers an alternative approach with a runtime of $O(n \cdot m + n^2 \log n)$. While the complexities might seem similar, the key difference lies in their dependence on the number of edges ($m$).
Dense vs. Sparse Graphs:
- Dense Graphs: In dense graphs, where $m$ approaches $n^2$ (the maximum possible number of edges), Floyd-Warshall is generally the better choice: Johnson’s runtime $O(n \cdot m + n^2 \log n)$ degenerates to roughly $O(n^3)$, so its advantage disappears as the graph becomes denser.
- Sparse Graphs: In sparse graphs, where $m$ is much smaller than $n^2$ (e.g., linear in $n$, as in trees), Johnson’s algorithm shines. Its runtime becomes close to $O(n^2 \log n)$, significantly outperforming Floyd-Warshall’s cubic complexity. Sparse graphs are common in real-world applications.
TLDR: For dense graphs use Floyd-Warshall; for sparse graphs use Johnson’s algorithm.
APSP and Johnson (MIT)
Johnson’s Strategy: Leveraging Dijkstra’s and Bellman-Ford
Johnson’s algorithm cleverly combines Dijkstra’s algorithm (efficient for non-negative edge weights) with Bellman-Ford (handles negative edge weights but slower) to achieve its efficiency:
- Reweighting: It uses Bellman-Ford to compute a set of new edge weights that are non-negative and preserve the shortest paths. This process, called reweighting, eliminates negative weights while ensuring that running Dijkstra’s on the reweighted graph yields the correct shortest paths in the original graph.
- Dijkstra’s for All Pairs: After reweighting, it applies Dijkstra’s algorithm from each vertex, efficiently solving the all-pairs shortest paths problem on the reweighted (and now non-negative) graph.
Reweighting in Johnson’s Algorithm: Handling Negative Edges
Johnson’s algorithm relies on a crucial reweighting step to transform the graph into one with non-negative edge weights while preserving the shortest paths. This allows the use of Dijkstra’s algorithm, which is more efficient than Bellman-Ford for non-negative weights.
A Tempting but Incorrect Approach: Adding a Constant
A seemingly simple solution would be to add a large enough constant to all edge weights, making them non-negative.
However, this approach does not work. Why? Because adding a constant to all edge weights changes the shortest paths.
The Problem: Adding a constant to edge weights disproportionately penalizes paths with many edges. Each edge in a path contributes the added constant, favoring paths with fewer edges, even if they weren’t originally the shortest. For example, if a direct edge costs $3$ and a two-edge path costs $1 + 1 = 2$, the two-edge path is shorter; after adding $5$ to every edge, the direct edge costs $8$ while the two-edge path costs $12$, and the order flips.
We need a more sophisticated approach that ensures non-negative weights while maintaining the relative lengths of paths.
Reweighting: A Telescoping Sum Approach
To achieve a valid reweighting, we need a method that adds the same value to the weight of all paths between any two vertices $u$ and $v$. This can be accomplished using a technique based on telescoping sums and a potential function.
Potential Function
We introduce a potential function $h: V \to \mathbb{R}$, assigning a real-valued “height” $h(v)$ to each vertex $v$. We then define the reweighted edge cost as:

$$\tilde{c}(u, v) = c(u, v) + h(u) - h(v)$$
Effect on Path Weights
Let’s consider a path $P$ from $u$ to $v$: $P = (u = v_0, v_1, \ldots, v_k = v)$.

Original Path Cost

$$c(P) = \sum_{i=1}^{k} c(v_{i-1}, v_i)$$

Reweighted Path Cost

$$\tilde{c}(P) = \sum_{i=1}^{k} \left( c(v_{i-1}, v_i) + h(v_{i-1}) - h(v_i) \right)$$

Notice that most of the $h$-terms cancel out (this is the telescoping sum). We are left with:

$$\tilde{c}(P) = c(P) + h(u) - h(v)$$
Key Observation
The reweighted path cost differs from the original cost by a constant amount ($h(u) - h(v)$) that depends only on the start and end vertices, not on the path itself.
This reweighting scheme, using the potential function, guarantees that the shortest path between any two vertices $u$ and $v$ remains the same under the original costs ($c$) and the reweighted costs ($\tilde{c}$).
Finding a Suitable Potential Function
Our goal is to determine a height function $h$ such that all reweighted edge costs are non-negative: $\tilde{c}(u, v) \ge 0$ for all $(u, v) \in E$. This ensures that we can apply Dijkstra’s algorithm after reweighting.
Johnson’s Insight: Introducing a New Vertex
Johnson’s key idea is to introduce a new auxiliary vertex $z$ connected to all original vertices with zero-weight edges:

- Add a new vertex $z$ to the graph.
- Add directed edges $(z, v)$ with weight $0$ for all $v \in V$.

Now, define the height function as the length of the shortest path from the new vertex $z$ to vertex $v$ in this augmented graph:

$$h(v) = \operatorname{dist}(z, v)$$
Why is this a good idea?
Let’s analyze the implication of this definition for the reweighted edge costs.

We want to show that $\tilde{c}(u, v) = c(u, v) + h(u) - h(v) \ge 0$ for every edge $(u, v) \in E$.
1. Interpreting $h(v)$

$h(v) = \operatorname{dist}(z, v)$ represents the shortest path distance from $z$ to $v$.
2. Shortest Path Property

By the definition of shortest paths (the triangle inequality), for any edge $(u, v)$ in the original graph the shortest path from $z$ to $v$ must be shorter than or equal to the shortest path from $z$ to $u$ followed by the edge $(u, v)$. Formally:

$$h(v) \le h(u) + c(u, v)$$

- The left side, $h(v)$, represents the cost of the shortest path from $z$ to $v$.
- The right side, $h(u) + c(u, v)$, represents the cost of going from $z$ to $u$ via the shortest path and then directly to $v$.
3. Rewriting the Inequality

Rearranging the inequality, we get:

$$c(u, v) + h(u) - h(v) \ge 0$$
4. Connecting to the Reweighted Cost

Notice that the left-hand side of this inequality is exactly our definition of the reweighted edge cost $\tilde{c}(u, v)$. Therefore:

$$\tilde{c}(u, v) \ge 0$$
This demonstrates that by defining the height function via shortest paths from the new vertex $z$, we guarantee that all reweighted edge costs are non-negative.
Computing $h$ with Bellman-Ford

To compute $h$ efficiently, we can use the Bellman-Ford algorithm on the augmented graph with $z$ as the source vertex. Bellman-Ford is crucial here because the original graph might contain negative edge weights.

After running Bellman-Ford, $h(v)$ is the computed shortest distance from $z$ to $v$. Bellman-Ford will also detect any negative weight cycles in the original graph (if a negative cycle exists, we cannot guarantee non-negative reweighted costs, and shortest paths are not well-defined anyway).

If no negative cycles are detected, we proceed with Dijkstra’s algorithm on the reweighted graph. This combination of Bellman-Ford for reweighting and Dijkstra’s for efficient shortest path computation is what makes Johnson’s algorithm so effective.
Example:
Summary and Runtime Analysis
Johnson’s algorithm provides an efficient solution to the all-pairs shortest paths problem, especially for sparse graphs. Let’s summarize the steps and analyze the runtime:
| Step | Runtime | Description |
|---|---|---|
| 1. Augment Graph | $O(n)$ | Add a new vertex $z$ and connect it to all existing vertices with zero-weight edges. |
| 2. Compute Heights | $O(n \cdot m)$ | Run Bellman-Ford from $z$ to calculate the height function $h(v)$ (shortest distance from $z$ to $v$). Detect negative cycles; if one is found, report it and terminate. |
| 3. Reweight Edges | $O(m)$ | Compute reweighted edge costs $\tilde{c}(u, v) = c(u, v) + h(u) - h(v)$ for all edges $(u, v) \in E$. |
| 4. Run Dijkstra’s | $O(n \cdot m + n^2 \log n)$ | Run Dijkstra’s algorithm from each vertex in the reweighted graph to compute all-pairs shortest paths. |
Total Runtime
The total runtime is dominated by steps 2 and 4:

$$O(n \cdot m) + O(n \cdot m + n^2 \log n) = O(n \cdot m + n^2 \log n)$$
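To make the pipeline concrete, here is a compact Python sketch of Johnson’s algorithm under the assumptions above (function names and the adjacency-list format are my own; Dijkstra uses a binary heap here, so this sketch does not achieve the Fibonacci-heap bound):

```python
import heapq
import math

def johnson(n, edges):
    """All-pairs shortest paths for a digraph on vertices 0..n-1.

    edges: list of (u, v, cost); costs may be negative, but no negative cycles.
    Returns dist with dist[u][v] = shortest-path cost from u to v.
    """
    # Step 1: augment with a virtual source z (index n) and zero-weight edges z -> v.
    aug = edges + [(n, v, 0) for v in range(n)]

    # Step 2: Bellman-Ford from z computes the heights h(v) and detects negative cycles.
    h = [math.inf] * (n + 1)
    h[n] = 0
    for _ in range(n):  # n+1 vertices, so n relaxation rounds
        for u, v, c in aug:
            if h[u] + c < h[v]:
                h[v] = h[u] + c
    if any(h[u] + c < h[v] for u, v, c in aug):
        raise ValueError("graph contains a negative cycle")

    # Step 3: reweight; by construction all reweighted costs are non-negative.
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append((v, c + h[u] - h[v]))

    # Step 4: Dijkstra from every vertex on the reweighted graph,
    # then undo the reweighting to recover the original distances.
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        d = [math.inf] * n
        d[s] = 0
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:
                continue  # stale heap entry
            for v, c in adj[u]:
                if du + c < d[v]:
                    d[v] = du + c
                    heapq.heappush(pq, (d[v], v))
        for v in range(n):
            if d[v] < math.inf:
                dist[s][v] = d[v] - h[s] + h[v]
    return dist
```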
Matrices and Graphs
We can represent walks in a graph with $n$ vertices using matrices. Specifically, powers of the adjacency matrix capture information about the existence of walks of different lengths between vertices.

Let’s consider walks from vertex $u$ to vertex $v$ with exactly $k$ edges. These walks have a recursive structure: a walk of length $k$ from $u$ to $v$ can be broken down into a walk of length $k-1$ from $u$ to some intermediate vertex $x$, followed by a single edge from $x$ to $v$.
This recursive structure allows us to model various graph problems using matrix operations.
Examples: Modeling Graph Problems with Matrices
We can define different problems based on the properties of the walks we’re interested in.
Example 1: Existence of Walks
Let $A$ be the adjacency matrix of graph $G$, where $a_{uv} = 1$ if there’s an edge from $u$ to $v$, and $a_{uv} = 0$ otherwise. Define:

$$w_{uv}^{(k)} = \begin{cases} 1 & \text{if there is a walk of length } k \text{ from } u \text{ to } v \\ 0 & \text{otherwise} \end{cases}$$
Recursive Formulation
A walk of length $k$ from $u$ to $v$ exists if there’s an intermediate vertex $x$ such that there’s a walk of length $k-1$ from $u$ to $x$ AND an edge from $x$ to $v$. This can be expressed using logical OR and AND:

$$w_{uv}^{(k)} = \bigvee_{x \in V} \left( w_{ux}^{(k-1)} \wedge a_{xv} \right)$$
Example 2: Minimal Cost of a Walk
Now, let’s consider weighted graphs. Let $c(u, v)$ be the cost of the edge from $u$ to $v$ (or $\infty$ if no edge exists). We want to find the minimum cost of a walk of length $k$ from $u$ to $v$. Define:

$$m_{uv}^{(k)} = \text{minimum cost of a walk from } u \text{ to } v \text{ that uses exactly } k \text{ edges}$$
Recursive Formulation
The minimum cost walk of length $k$ from $u$ to $v$ can be found by considering all possible intermediate vertices $x$. We take the minimum over the costs of reaching an intermediate vertex $x$ in $k-1$ steps and then adding the cost of the edge from $x$ to $v$:

$$m_{uv}^{(k)} = \min_{x \in V} \left( m_{ux}^{(k-1)} + c(x, v) \right)$$

where we use the following conventions:

- $m_{uu}^{(0)} = 0$ (the cost of a zero-length walk from a vertex to itself is 0)
- $m_{uv}^{(0)} = \infty$ for $u \neq v$ (no walk of length 0 exists between distinct vertices)

This formulation captures the principle of optimality: a minimum cost walk of length $k$ must be composed of a minimum cost walk of length $k-1$ followed by a single edge.
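This recursion is exactly a matrix “multiplication” in which addition is replaced by min and multiplication by +, the so-called min-plus product. A small Python sketch of one such step (notation as above; names are my own):

```python
import math

def min_plus_product(M, C):
    """One step of the recursion: given M[u][x] = m_{ux}^{(k-1)} and the
    edge-cost matrix C[x][v] = c(x, v), return the matrix m^{(k)}."""
    n = len(M)
    R = [[math.inf] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            R[u][v] = min(M[u][x] + C[x][v] for x in range(n))
    return R
```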
Example 3: Number of Walks
Finally, let’s count the number of distinct walks of length $k$ from $u$ to $v$. Define:

$$N_{uv}^{(k)} = \text{number of distinct walks of length } k \text{ from } u \text{ to } v$$
Recursive Formulation
To count the number of walks of length $k$, we sum the number of walks of length $k-1$ to each intermediate vertex $x$ that has a direct edge to $v$:

$$N_{uv}^{(k)} = \sum_{x \in V} N_{ux}^{(k-1)} \cdot a_{xv}$$

where $a_{xv}$ is an indicator that is 1 if there is an edge from $x$ to $v$ and 0 otherwise.

Base Case: $N^{(1)} = A$ (the adjacency matrix itself, as each edge represents a walk of length 1). This means that $N_{uv}^{(1)} = 1$ if there is a walk (an edge) of length one from $u$ to $v$.
Matrix Representation of Walk Counts
Let’s connect the recursive formulations from the previous examples to matrix operations. Notice that the summation in the recursive definition of the number of walks closely resembles matrix multiplication.
Matrix Formulation for Number of Walks
Recall that the number of walks of length $k$ from $u$ to $v$ is given by:

$$N_{uv}^{(k)} = \sum_{x \in V} N_{ux}^{(k-1)} \cdot a_{xv}$$

This is precisely the formula for matrix multiplication. Therefore, we can express the number of walks of length $k$ using the $k$-th power of the adjacency matrix:

$$N^{(k)} = A^k$$

where $A$ is the adjacency matrix and $A^k$ is the matrix whose entry $(A^k)_{uv}$ represents the number of walks of length $k$ from $u$ to $v$. Note that standard arithmetic addition and multiplication are used here.
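A quick way to check this on a small graph, e.g. with NumPy (a hypothetical 4-vertex example of my own):

```python
import numpy as np

# Directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]])

A4 = np.linalg.matrix_power(A, 4)
print(A4[0][0])  # 1: exactly one walk of length 4 from 0 back to 0
```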
Theorem: Walks and Matrix Powers
The element $(A^k)_{uv}$ in $A^k$ (the $k$-th power of the adjacency matrix) is equal to the number of walks of length $k$ from vertex $u$ to vertex $v$. Let us prove this by induction.
Base Case ($k = 1$)

$A^1 = A$. By definition, the adjacency matrix has a 1 in entry $(u, v)$ if there is an edge (a walk of length 1) from $u$ to $v$, and 0 otherwise. This matches our definition of $N^{(1)}$.
Inductive Hypothesis

Assume the theorem holds for some $k \ge 1$. That is, assume that $(A^k)_{uv}$ represents the number of walks of length $k$ from $u$ to $v$.
Inductive Step ($k \to k+1$)

We want to show that $(A^{k+1})_{uv}$ represents the number of walks of length $k+1$ from $u$ to $v$.

By the definition of matrix multiplication:

$$(A^{k+1})_{uv} = \sum_{x \in V} (A^k)_{ux} \cdot a_{xv}$$

By the inductive hypothesis, $(A^k)_{ux}$ is the number of walks of length $k$ from $u$ to $x$. $a_{xv}$ is 1 if there’s an edge from $x$ to $v$ (a walk of length 1), and 0 otherwise.

Therefore, the sum represents the total number of walks that can be formed by taking a walk of length $k$ from $u$ to some intermediate vertex $x$, followed by a single edge from $x$ to $v$. This is precisely the number of walks of length $k+1$ from $u$ to $v$.
Applications of Matrix Powers in Graph Analysis
The connection between matrix powers and walk counts has several important applications in graph theory and algorithms. Let’s explore two of them:
Application 1: Counting Triangles in a Directed Graph
Let $G = (V, E)$ be a directed graph without self-loops. A directed triangle is a set of three vertices $\{u, v, w\}$ such that the directed edges $(u, v)$, $(v, w)$, and $(w, u)$ exist in the graph.
Counting Triangles with Matrix Trace:
The number of directed triangles in $G$ can be calculated using the trace of the cubed adjacency matrix:

$$\#\text{triangles} = \frac{\operatorname{tr}(A^3)}{3}$$

where $\operatorname{tr}(M)$ denotes the trace of $M$ (the sum of its diagonal elements).
Why does this work?
$(A^3)_{uu}$ represents the number of walks of length 3 from vertex $u$ back to itself. Since the graph has no self-loops, each such walk corresponds to a triangle containing vertex $u$. However, each triangle is counted three times (once for each of its vertices as the starting point). Dividing by 3 corrects for this overcounting.
Generalization to Cycles of Length $k$: By calculating $\operatorname{tr}(A^k)$ we can count closed walks of length $k$, dividing by $k$ to adjust for the choice of starting vertex. Note, however, that for $k \ge 4$ a closed walk of length $k$ is not necessarily a simple cycle, so counting simple cycles requires additional corrections.
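A short sketch of the triangle count via the trace (NumPy, with my own naming):

```python
import numpy as np

def count_directed_triangles(A):
    """Number of directed triangles in a digraph without self-loops,
    given its 0/1 adjacency matrix A."""
    A = np.asarray(A)
    closed_walks_len3 = np.trace(np.linalg.matrix_power(A, 3))
    return closed_walks_len3 // 3  # each triangle counted once per starting vertex
```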
Application 2: All-Pairs Reachability and Transitive Closure
Reachability: The reachability problem asks: given two vertices $u$ and $v$, does there exist a directed path from $u$ to $v$? This is equivalent to finding the transitive closure of the graph.
Undirected Graphs: In undirected graphs, reachability is determined by connected components. Algorithms like Depth-First Search (DFS) or Breadth-First Search (BFS) can efficiently find connected components.
Directed Graphs: Reachability in directed graphs is more complex.
Adding Self-Loops: A common trick is to add a self-loop to each vertex of the graph (i.e., work with $A + I$). This ensures that if a path from $u$ to $v$ exists, there also exists a walk of length exactly $n-1$ from $u$ to $v$, because self-loops can be used to pad shorter paths to the required length. Without self-loops this does not hold in general: there is a path from $u$ to $v$ iff there is a walk of length smaller than $n$, so one would have to consider all powers $A^1, \ldots, A^{n-1}$.
Using Matrix Powers for Reachability: We can check reachability by computing $B = A + A^2 + \dots + A^{n-1}$, or, even better, the single power $B = (A + I)^{n-1}$ of the graph with the added self-loops. There is a path from $u$ to $v$ iff $B_{uv} > 0$.

This matrix $B$, obtained by summing powers of the adjacency matrix (up to $n-1$) or by taking the single power $(A + I)^{n-1}$, effectively represents the transitive closure of the graph. If $B_{uv} > 0$, there’s a path from $u$ to $v$; otherwise, there isn’t. The question we still need to answer, however, is: how do we compute matrix powers efficiently?
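As a sketch, reachability via the $(n-1)$-th power of $A + I$ over 0/1 entries (NumPy, my own naming; the exponentiation here is still naive and is exactly what the next section speeds up):

```python
import numpy as np

def transitive_closure(A):
    """Reachability matrix: result[u][v] is True iff there is a path u -> v
    (of length >= 0), computed as the (n-1)-th power of A + I clamped to 0/1."""
    A = np.asarray(A, dtype=np.int64)
    n = A.shape[0]
    M = ((A + np.eye(n, dtype=np.int64)) > 0).astype(np.int64)  # A + I as 0/1 matrix
    B = M.copy()
    for _ in range(max(0, n - 2)):          # B = (A + I)^(n-1)
        B = ((B @ M) > 0).astype(np.int64)  # keep entries 0/1 to avoid overflow
    return B > 0
```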
Calculating Matrix Powers Efficiently: Iterative Squaring
Using powers of the adjacency matrix to solve graph problems requires efficient matrix exponentiation. Naive repeated multiplication is too slow for large exponents.
Iterative Squaring: A Logarithmic Speedup
Iterative squaring, also known as repeated squaring, drastically reduces the number of multiplications needed to compute $A^k$. It leverages the following observation:

$$A^k = \begin{cases} \left(A^{k/2}\right)^2 & \text{if } k \text{ is even} \\ A \cdot A^{k-1} & \text{if } k \text{ is odd} \end{cases}$$
Algorithm:
- Initialize: `result = I` (identity matrix), `temp = A`, `k' = k`.
- Iterate: While `k' > 0`:
  - If `k'` is odd: `result = result * temp` (using the appropriate multiplication, arithmetic or boolean) and set `k' = k' - 1`.
  - `temp = temp * temp`
  - `k' = k' / 2`
- Return: `result`
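The same procedure as a Python sketch (using NumPy’s arithmetic matrix product; for boolean reachability one would clamp entries to 0/1 after each product, as in the sketch above):

```python
import numpy as np

def matrix_power(A, k):
    """Compute A^k with O(log k) matrix multiplications (iterative squaring)."""
    A = np.asarray(A)
    result = np.eye(A.shape[0], dtype=A.dtype)  # identity matrix
    temp = A.copy()
    while k > 0:
        if k % 2 == 1:          # odd exponent: multiply the current factor in
            result = result @ temp
            k -= 1
        temp = temp @ temp      # square
        k //= 2
    return result
```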
Runtime Analysis: From Linear to Logarithmic
The number of iterations is proportional to $\log k$, as we halve `k'` in each iteration. Each iteration involves a constant number of matrix multiplications. With standard matrix multiplication ($O(n^3)$), the total runtime becomes $O(n^3 \log k)$, a significant improvement over the naive $O(n^3 \cdot k)$. This logarithmic complexity makes iterative squaring practical for even very large exponents.
Efficient Matrix Multiplication: Strassen’s Algorithm
Standard matrix multiplication of two $n \times n$ matrices takes $O(n^3)$ time. Strassen’s algorithm provides a faster approach by cleverly reducing the number of subproblems in a divide-and-conquer strategy.
Divide and Conquer:
- Divide: Split each $n \times n$ matrix into four $\frac{n}{2} \times \frac{n}{2}$ submatrices.
- Conquer: Recursively compute products of submatrices. Strassen’s key insight is that the product of two $2 \times 2$ block matrices can be computed using only 7 multiplications (instead of the usual 8) and 18 additions/subtractions. This reduction in multiplications at each level of recursion leads to the asymptotic improvement.
Note: While not shown here, the specific way these 7 multiplications are performed is crucial to Strassen’s algorithm and involves clever combinations of the submatrices.
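For reference, here is a sketch using one commonly cited set of the 7 products (the lecture does not spell them out, so this particular combination is an assumption of the sketch). The recursion assumes $n$ is a power of two for simplicity:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply two n x n matrices (n a power of two) with Strassen's recursion.
    Below the cutoff, fall back to the standard product."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # The 7 recursive products.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    # Reassemble the four quadrants of the result.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```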
Runtime Analysis: Master Theorem
The recurrence relation for Strassen’s algorithm is:

$$T(n) = 7 \cdot T\left(\frac{n}{2}\right) + O(n^2)$$

- $7$: Number of recursive subproblems.
- $\frac{n}{2} \times \frac{n}{2}$: Size of each subproblem.
- $O(n^2)$: Time for combining subproblem results (additions/subtractions).

Applying the Master Theorem (Case 1), we get a runtime of $O(n^{\log_2 7}) \approx O(n^{2.81})$. This is a significant improvement over the standard $O(n^3)$ matrix multiplication. Strassen’s algorithm demonstrates that matrix multiplication can be performed faster than the straightforward cubic approach.
Can we do better?

Yes! Further optimizations and more complex algorithms have pushed this theoretical bound even lower, though often with practical trade-offs. The Coppersmith-Winograd algorithm and slight improvements of it are currently the most asymptotically efficient matrix multiplication algorithms, but due to large constant factors they are not used in practice. There do, however, exist other algorithms that are faster than Strassen’s and still practical.
Final “side questing” Boss: If you want to, you can try figuring out a similar matrix computation for the previous two recursions using what we’ve learned in LinAlg, DiskMat and then implement it using concepts from AnD and EProg…