Lecture from: 12.12.2024 | Video: Videos ETHZ
Recap: Single Source Shortest Paths
We’ve recently explored algorithms for finding shortest paths in a graph $G = (V, E)$ with a cost function $c: E \to \mathbb{R}$, focusing on finding the shortest paths from a source vertex $s$ to all other vertices $v \in V$.
Let’s recap these single-source shortest path algorithms:
| Cost Function | Algorithm | Runtime | Core Idea |
|---|---|---|---|
| $c(e) = 1$ for all $e \in E$ | BFS | $O(n + m)$ | Explore the graph layer by layer. |
| $c(e) \ge 0$ | Dijkstra’s | $O((n + m) \log n)$ | Greedily explore vertices in order of increasing distance from the source. |
| general $c$ | Bellman-Ford | $O(n \cdot m)$ | Iteratively relax edges to improve distance estimates; detects negative cycles. |
Where $n = |V|$ (number of vertices) and $m = |E|$ (number of edges).
The All-Pairs Shortest Path Problem:
A limitation of the algorithms above is that they solve the single-source problem. Today, we address the all-pairs shortest path problem: finding the shortest path between all pairs of vertices in the graph. This problem has broad applications, from navigation and network routing to social network analysis (e.g., identifying central nodes).
A naive approach would be to run a single-source algorithm for each vertex. This would result in the following runtimes:
| Cost Function | Algorithm | Runtime |
|---|---|---|
| $c(e) = 1$ for all $e \in E$ | BFS repeated | $O(n(n + m))$ |
| $c(e) \ge 0$ | Dijkstra’s repeated | $O(n(n + m) \log n)$ |
| general $c$ | Bellman-Ford repeated | $O(n^2 \cdot m)$ |
Can we do better? Yes! We’ll explore two more efficient algorithms:
- Floyd-Warshall: $O(n^3)$ runtime.
- Johnson’s Algorithm: $O(n \cdot m + n^2 \log n)$ runtime.
These algorithms provide significant improvements for dense graphs (where $m$ is close to $n^2$) and enable solving the all-pairs shortest paths problem more efficiently.
Floyd-Warshall: All-Pairs Shortest Paths
The Floyd-Warshall algorithm employs dynamic programming to solve the all-pairs shortest path problem. It leverages a clever subproblem definition to efficiently compute the shortest paths between all pairs of vertices.
Subproblem Definition
Let $V = \{1, 2, \ldots, n\}$ be the set of vertices in the graph. We define the following subproblem:

$$d_{uv}^{(i)} = \text{length of a shortest } u\text{-}v\text{-path that uses only intermediate vertices from } \{1, \ldots, i\}$$
Example:
In this example, $d_{13}^{(2)}$ would represent the shortest path from vertex 1 to vertex 3 using only vertices 1 and 2 as possible intermediate vertices.
Recursive Formulation: Building Solutions from Subproblems
We can construct a recursive relation for $d_{uv}^{(i)}$ based on whether vertex $i$ is used as an intermediate vertex in the shortest path from $u$ to $v$.
Case 1: Vertex $i$ is not an intermediate vertex

If the shortest path from $u$ to $v$ doesn’t use vertex $i$, then the path only uses intermediate vertices from the set $\{1, \ldots, i-1\}$. Thus:

$$d_{uv}^{(i)} = d_{uv}^{(i-1)}$$
Case 2: Vertex $i$ is an intermediate vertex (exactly once)

If the shortest path from $u$ to $v$ does use vertex $i$, it can be broken down into two subpaths:

- The shortest path from $u$ to $i$ using intermediate vertices in $\{1, \ldots, i-1\}$: $d_{ui}^{(i-1)}$. Note that in this subproblem the path from $u$ to $i$ cannot use $i$ itself as an intermediate node, since it is only allowed to use nodes $1, \ldots, i-1$.
- The shortest path from $i$ to $v$ using intermediate vertices in $\{1, \ldots, i-1\}$: $d_{iv}^{(i-1)}$. The same applies here: the path itself cannot use $i$ as an intermediate node.

Combining these subpaths:

$$d_{uv}^{(i)} = d_{ui}^{(i-1)} + d_{iv}^{(i-1)}$$
Case 3: Vertex $i$ appears multiple times

If vertex $i$ appears multiple times in a shortest path, the path contains a cycle through $i$. If the total cost of that cycle is negative, we have a negative cycle; in this case the shortest path is not well-defined, as we could repeatedly traverse the negative cycle to decrease the path length indefinitely. If the cycle has non-negative cost, removing it never makes the path longer, so this case can be ignored.
Combining Cases (Assuming No Negative Cycles)
Taking the minimum of the two valid cases, we get the recursive relation:

$$d_{uv}^{(i)} = \min\left(d_{uv}^{(i-1)},\; d_{ui}^{(i-1)} + d_{iv}^{(i-1)}\right)$$
Base Case:
The base case for this recursion is when no intermediate vertices are allowed ($i = 0$):

$$d_{uv}^{(0)} = \begin{cases} 0 & \text{if } u = v \\ c(u, v) & \text{if } (u, v) \in E \\ \infty & \text{otherwise} \end{cases}$$

This initializes the shortest path distances to the direct edge costs, or infinity if no direct edge exists.
Algorithm: Floyd-Warshall in Pseudocode
Building upon the recursive formulation, we can develop a concise pseudocode implementation of the Floyd-Warshall algorithm.
Pseudocode:
We assume:
- No self-loops
- No negative cycles
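The lecture’s pseudocode is not reproduced here; the following is a minimal Python sketch of the same idea (function and variable names are my own):

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest paths on vertices 0..n-1.

    edges: list of (u, v, cost) triples; costs may be negative,
    but the graph is assumed to contain no negative cycles.
    """
    # Base case d^(0): direct edge costs, 0 on the diagonal, infinity otherwise.
    d = [[0 if u == v else math.inf for v in range(n)] for u in range(n)]
    for u, v, cost in edges:
        d[u][v] = min(d[u][v], cost)

    # d^(i) from d^(i-1): allow vertex i as an additional intermediate vertex.
    for i in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][i] + d[i][v] < d[u][v]:
                    d[u][v] = d[u][i] + d[i][v]

    # Negative cycle detection: a negative diagonal entry after all iterations.
    if any(d[u][u] < 0 for u in range(n)):
        raise ValueError("graph contains a negative cycle")
    return d
```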
This pseudocode initializes the distance matrix with direct edge costs and infinity for other pairs. Then, it iteratively considers each vertex as a potential intermediate vertex, updating shortest path distances according to the recursive relation. A final loop is included for negative cycle detection.
Runtime Analysis: Cubic Complexity
The runtime of the Floyd-Warshall algorithm is determined by the three nested loops in the main dynamic programming phase. Each loop iterates over all $n$ vertices, resulting in a time complexity of $O(n^3)$.
Space Complexity: Quadratic with Optimization
The basic implementation uses $O(n^3)$ space due to the three-dimensional distance matrix. However, by observing that we only need the results from the previous iteration ($i-1$) to compute the current iteration ($i$), we can reduce the space complexity to $O(n^2)$ by using a single two-dimensional matrix and overwriting it in place. This in-place update significantly reduces memory usage without affecting the time complexity.
Reconstructing Shortest Paths: Backtracking
To find the actual shortest paths, not just the distances, we can maintain a separate “predecessor” matrix `pred`. During the main loop, whenever we update `d[i][j]`, we also update `pred[i][j]` to store the intermediate vertex `k` that led to the improvement. By backtracking through the `pred` matrix, we can reconstruct the sequence of vertices along the shortest path between any two vertices. This backtracking process takes linear time, $O(n)$, for each path.
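In the sketch above, the update `d[u][v] = d[u][i] + d[i][v]` is where one would also record `pred[u][v] = i`. Given such a matrix, a path can be reconstructed recursively; a possible (hypothetical) helper:

```python
def reconstruct_path(pred, u, v):
    """Vertices of a shortest u-v path, assuming pred[u][v] stores the
    intermediate vertex i of the last improving update (or None if the
    direct edge u -> v was never improved)."""
    i = pred[u][v]
    if i is None:
        return [u, v]  # direct edge; caller should check d[u][v] < inf
    # Expand both halves around i; drop the duplicated i from the first half.
    return reconstruct_path(pred, u, i)[:-1] + reconstruct_path(pred, i, v)
```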
Detecting Negative Cycles
A crucial aspect of the Floyd-Warshall algorithm is its ability to detect negative cycles. While it might seem intuitive to check for negative values on the diagonal of the distance matrix (i.e., $d_{uu}^{(i)} < 0$ for some $u$) after each iteration, this is not always sufficient. Let’s explore why.
Counterexample:
Consider these two graphs with identical structure and weights but different vertex numbering:
Graph 1:
Here, the diagonal entry $d_{11}$ becomes negative (path: 1 → 2 → 3 → 2 → 1), correctly indicating a negative cycle.
Graph 2:
However, in this graph the corresponding diagonal entry is still non-negative at the same point, even though the same negative cycle exists. With this numbering we go through the nodes in a different order, so the negative cycle is only found in a later iteration.
A Reliable Criterion for Negative Cycle Detection:
While checking the diagonal after each iteration isn’t foolproof, the following statement is guaranteed to be true:
Theorem: A negative cycle exists in the graph if and only if there exists a vertex $u \in V$ such that $d_{uu}^{(n)} < 0$ after the algorithm completes all $n$ iterations.
Intuition: If we can reach a vertex from itself with a negative cost after considering all possible intermediate vertices, it means we’ve traversed a negative cycle.
Implication for Floyd-Warshall:
This theorem provides a simple and efficient way to detect negative cycles after running the Floyd-Warshall algorithm:
- Run the Algorithm: Execute the Floyd-Warshall algorithm to compute the final distance matrix $D^{(n)}$.
- Check the Diagonal: Inspect the diagonal elements of the final distance matrix ($d_{uu}^{(n)}$ for all $u \in V$). If any diagonal element is negative, a negative cycle exists.
- Output: If no diagonal elements are negative, the algorithm has correctly computed all-pairs shortest paths. Otherwise, report the presence of a negative cycle.
Corrected Negative Cycle Check in Code:
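The corrected check, as a small Python sketch against the distance matrix `d` computed above (naming is my own):

```python
def has_negative_cycle(d):
    """True iff some diagonal entry of the final distance matrix is negative."""
    return any(d[u][u] < 0 for u in range(len(d)))
```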
Proof of the Theorem
Let’s rigorously prove the theorem connecting negative cycles to the diagonal elements of the distance matrix in Floyd-Warshall.
Theorem: A negative cycle exists in the graph if and only if there exists a vertex $u \in V$ such that $d_{uu}^{(n)} < 0$ after the algorithm completes.
($\Leftarrow$) Direction (Negative Diagonal Element Implies Negative Cycle):

If $d_{uu}^{(n)} < 0$ for some vertex $u$ after the algorithm completes (meaning all vertices have been considered as intermediate vertices), then there exists a walk from $u$ back to itself with negative total weight. This closed walk contains a negative cycle. This direction of the proof is relatively straightforward.
($\Rightarrow$) Direction (Negative Cycle Implies Negative Diagonal Element):
This direction is more subtle. We need to show that if a negative cycle exists, the algorithm will inevitably produce a negative diagonal element in the distance matrix.
- Assume a Negative Cycle: Suppose the graph contains a closed walk of negative total weight.
- Decomposition into Simple Cycles: Any such closed walk (even one with repeated vertices) can be decomposed into one or more simple cycles. If the original walk has a negative weight, at least one of these simple cycles must also have a negative weight (otherwise, a sum of non-negative weights couldn’t be negative).
- Consider a Negative Simple Cycle: Let $C$ be a simple cycle of negative weight, meaning it contains no repeated vertices except for the start and end node.
- Select Maximal Index: Let $j$ be the vertex in cycle $C$ with the highest index, and let $u$ be any other vertex in $C$ ($u \neq j$). This ensures that when the algorithm reaches iteration $j$, all other nodes of the cycle have already been considered as intermediate nodes.
- Paths within the Cycle: Denote the path from $u$ to $j$ along cycle $C$ as $P_1$ and the path from $j$ back to $u$ along $C$ as $P_2$. Since $P_1$ and $P_2$ are parts of a simple cycle, they do not contain the vertex $j$ as an internal node (it can only appear as a start or end node). They may, however, contain other cycle vertices as internal nodes, all of which have index at most $j - 1$.
- Considering Subproblems: The paths $P_1$ and $P_2$ are therefore considered in the subproblems $d_{uj}^{(j-1)}$ and $d_{ju}^{(j-1)}$, respectively. This is because these subproblems consider paths using only intermediate vertices with indices up to $j - 1$, and neither $P_1$ nor $P_2$ uses $j$ as an intermediate node.
- Bounding Path Costs: Due to the optimality principle of dynamic programming, we know that
  $$d_{uj}^{(j-1)} \le c(P_1) \quad \text{and} \quad d_{ju}^{(j-1)} \le c(P_2),$$
  where $c(P)$ denotes the cost of a path $P$. (The shortest path can’t be longer than one specific path.)
- Recursion and Negative Cycle: In iteration $j$, the algorithm computes
  $$d_{uu}^{(j)} \le d_{uj}^{(j-1)} + d_{ju}^{(j-1)} \le c(P_1) + c(P_2) = c(C).$$
  Since $c(C) < 0$ (because $C$ is a negative cycle), we have $d_{uu}^{(j)} < 0$. Thus, at least one diagonal element becomes negative by iteration $j$ at the latest (possibly even earlier), indicating the presence of a negative cycle. Because the entries never increase in later iterations, this negative value is propagated to $d_{uu}^{(n)} < 0$.
This completes the proof, showing that the final diagonal elements of the distance matrix reliably indicate the presence or absence of negative cycles in the graph.
Johnson’s Algorithm
We’ve seen that Floyd-Warshall solves the all-pairs shortest paths problem in $O(n^3)$ time. Johnson’s algorithm offers an alternative approach with a runtime of $O(n \cdot m + n^2 \log n)$. While the complexities might seem similar, the key difference lies in their dependence on the number of edges ($m$).
Dense vs. Sparse Graphs:
- Dense Graphs: In dense graphs, where $m$ approaches $n^2$ (the maximum possible number of edges), Floyd-Warshall is generally the better choice: Johnson’s runtime $O(n \cdot m + n^2 \log n)$ degenerates to roughly $O(n^3)$, so its advantage disappears as the graph becomes denser.
- Sparse Graphs: In sparse graphs, where $m$ is much smaller than $n^2$ (e.g., linear in $n$, as in trees), Johnson’s algorithm shines. Its runtime becomes close to $O(n^2 \log n)$, significantly outperforming Floyd-Warshall’s cubic complexity. Sparse graphs are common in real-world applications.
TLDR: For dense graphs use Floyd-Warshall; for sparse graphs use Johnson’s algorithm.
APSP and Johnson (MIT)
Johnson’s Strategy: Leveraging Dijkstra’s and Bellman-Ford
Johnson’s algorithm cleverly combines Dijkstra’s algorithm (efficient for non-negative edge weights) with Bellman-Ford (handles negative edge weights but slower) to achieve its efficiency:
- Reweighting: It uses Bellman-Ford to compute a set of new edge weights that are non-negative and preserve the shortest paths. This process, called reweighting, eliminates negative weights while ensuring that running Dijkstra’s on the reweighted graph yields the correct shortest paths in the original graph.
- Dijkstra’s for All Pairs: After reweighting, it applies Dijkstra’s algorithm from each vertex, efficiently solving the all-pairs shortest paths problem on the reweighted (and now non-negative) graph.
Reweighting in Johnson’s Algorithm: Handling Negative Edges
Johnson’s algorithm relies on a crucial reweighting step to transform the graph into one with non-negative edge weights while preserving the shortest paths. This allows the use of Dijkstra’s algorithm, which is more efficient than Bellman-Ford for non-negative weights.
A Tempting but Incorrect Approach: Adding a Constant
A seemingly simple solution would be to add a large enough constant to all edge weights, making them non-negative.
However, this approach does not work. Why? Because adding a constant to all edge weights changes the shortest paths.
The Problem: Adding a constant to edge weights disproportionately penalizes paths with many edges. Each edge in a path contributes the added constant, favoring paths with fewer edges, even if they weren’t originally the shortest. For example, if a direct edge costs $3$ and a two-edge path costs $1 + 1 = 2$, the two-edge path is shorter; after adding $5$ to every edge, the direct edge costs $8$ while the two-edge path costs $12$, and the order flips.
We need a more sophisticated approach that ensures non-negative weights while maintaining the relative lengths of paths.
Reweighting: A Telescoping Sum Approach
To achieve a valid reweighting, we need a method that adds the same value to the weight of all paths between any two vertices $u$ and $v$. This can be accomplished using a technique based on telescoping sums and a potential function.
Potential Function
We introduce a potential function $h: V \to \mathbb{R}$, assigning a real-valued “height” $h(v)$ to each vertex $v$. We then define the reweighted edge cost as:

$$\tilde{c}(u, v) = c(u, v) + h(u) - h(v)$$
Effect on Path Weights
Let’s consider a path $P$ from $u$ to $v$: $P = (u = v_0, v_1, \ldots, v_k = v)$.

Original Path Cost

$$c(P) = \sum_{i=1}^{k} c(v_{i-1}, v_i)$$

Reweighted Path Cost

$$\tilde{c}(P) = \sum_{i=1}^{k} \left( c(v_{i-1}, v_i) + h(v_{i-1}) - h(v_i) \right)$$

Notice that most of the $h$-terms cancel out (this is the telescoping sum). We are left with:

$$\tilde{c}(P) = c(P) + h(u) - h(v)$$
Key Observation
The reweighted path cost differs from the original cost by a constant amount ($h(u) - h(v)$) that depends only on the start and end vertices, not on the path itself.
This reweighting scheme, using the potential function, guarantees that the shortest path between any two vertices $u$ and $v$ remains the same under the original costs ($c$) and the reweighted costs ($\tilde{c}$).
Finding a Suitable Potential Function
Our goal is to determine a height function $h$ such that all reweighted edge costs are non-negative: $\tilde{c}(u, v) \ge 0$ for all $(u, v) \in E$. This ensures that we can apply Dijkstra’s algorithm after reweighting.
Johnson’s Insight: Introducing a New Vertex
Johnson’s key idea is to introduce a new auxiliary vertex $z$ connected to all original vertices with zero-weight edges:

- Add a new vertex $z$ to the graph.
- Add directed edges $(z, v)$ with weight $0$ for all $v \in V$.

Now, define the height function as the length of the shortest path from the new vertex $z$ to vertex $v$ in this augmented graph:

$$h(v) = \operatorname{dist}(z, v)$$
Why is this a good idea?
Let’s analyze the implication of this definition for the reweighted edge costs.

We want to show that $\tilde{c}(u, v) = c(u, v) + h(u) - h(v) \ge 0$ for every edge $(u, v) \in E$.
1. Interpreting $h(v)$

$h(v) = \operatorname{dist}(z, v)$ represents the shortest path distance from $z$ to $v$.
2. Shortest Path Property

By the definition of shortest paths (the triangle inequality), for any edge $(u, v)$ in the original graph the shortest path from $z$ to $v$ must be shorter than or equal to the shortest path from $z$ to $u$ followed by the edge $(u, v)$. Formally:

$$h(v) \le h(u) + c(u, v)$$

- The left side, $h(v)$, represents the cost of the shortest path from $z$ to $v$.
- The right side, $h(u) + c(u, v)$, represents the cost of going from $z$ to $u$ via the shortest path and then directly to $v$.
3. Rewriting the Inequality

Rearranging the inequality, we get:

$$c(u, v) + h(u) - h(v) \ge 0$$
4. Connecting to the Reweighted Cost

Notice that the left-hand side of this inequality is exactly our definition of the reweighted edge cost $\tilde{c}(u, v)$. Therefore:

$$\tilde{c}(u, v) \ge 0$$
This demonstrates that by defining the height function via shortest paths from the new vertex $z$, we guarantee that all reweighted edge costs are non-negative.
Computing $h$ with Bellman-Ford

To compute $h$ efficiently, we can use the Bellman-Ford algorithm on the augmented graph with $z$ as the source vertex. Bellman-Ford is crucial here because the original graph might contain negative edge weights.

After running Bellman-Ford, $h(v)$ is the computed shortest distance from $z$ to $v$. Bellman-Ford will also detect any negative weight cycles in the original graph (if a negative cycle exists, we cannot guarantee non-negative reweighted costs, and shortest paths are not well-defined anyway).

If no negative cycles are detected, we proceed with Dijkstra’s algorithm on the reweighted graph. This combination of Bellman-Ford for reweighting and Dijkstra’s for efficient shortest path computation is what makes Johnson’s algorithm so effective.
Example:
Summary and Runtime Analysis
Johnson’s algorithm provides an efficient solution to the all-pairs shortest paths problem, especially for sparse graphs. Let’s summarize the steps and analyze the runtime:
| Step | Runtime | Description |
|---|---|---|
| 1. Augment Graph | $O(n)$ | Add a new vertex $z$ and connect it to all existing vertices with zero-weight edges. |
| 2. Compute Heights | $O(n \cdot m)$ | Run Bellman-Ford from $z$ to calculate the height function $h(v)$ (shortest distance from $z$ to $v$). Detect negative cycles; if one is found, report it and terminate. |
| 3. Reweight Edges | $O(m)$ | Compute reweighted edge costs $\tilde{c}(u, v) = c(u, v) + h(u) - h(v)$ for all edges $(u, v) \in E$. |
| 4. Run Dijkstra’s | $O(n \cdot m + n^2 \log n)$ | Run Dijkstra’s algorithm from each vertex in the reweighted graph to compute all-pairs shortest paths. |
Total Runtime
The total runtime is dominated by steps 2 and 4:

$$O(n \cdot m) + O(n \cdot m + n^2 \log n) = O(n \cdot m + n^2 \log n)$$
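To make the pipeline concrete, here is a compact Python sketch of Johnson’s algorithm under the assumptions above (function names and the adjacency-list format are my own; Dijkstra uses a binary heap here, so this sketch does not achieve the Fibonacci-heap bound):

```python
import heapq
import math

def johnson(n, edges):
    """All-pairs shortest paths for a digraph on vertices 0..n-1.

    edges: list of (u, v, cost); costs may be negative, but no negative cycles.
    Returns dist with dist[u][v] = shortest-path cost from u to v.
    """
    # Step 1: augment with a virtual source z (index n) and zero-weight edges z -> v.
    aug = edges + [(n, v, 0) for v in range(n)]

    # Step 2: Bellman-Ford from z computes the heights h(v) and detects negative cycles.
    h = [math.inf] * (n + 1)
    h[n] = 0
    for _ in range(n):  # n+1 vertices, so n relaxation rounds
        for u, v, c in aug:
            if h[u] + c < h[v]:
                h[v] = h[u] + c
    if any(h[u] + c < h[v] for u, v, c in aug):
        raise ValueError("graph contains a negative cycle")

    # Step 3: reweight; by construction all reweighted costs are non-negative.
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append((v, c + h[u] - h[v]))

    # Step 4: Dijkstra from every vertex on the reweighted graph,
    # then undo the reweighting to recover the original distances.
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        d = [math.inf] * n
        d[s] = 0
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:
                continue  # stale heap entry
            for v, c in adj[u]:
                if du + c < d[v]:
                    d[v] = du + c
                    heapq.heappush(pq, (d[v], v))
        for v in range(n):
            if d[v] < math.inf:
                dist[s][v] = d[v] - h[s] + h[v]
    return dist
```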
Matrices and Graphs
We can represent walks in a graph with $n$ vertices using matrices. Specifically, powers of the adjacency matrix capture information about the existence of walks of different lengths between vertices.

Let’s consider walks from vertex $u$ to vertex $v$ with exactly $k$ edges. These walks have a recursive structure: a walk of length $k$ from $u$ to $v$ can be broken down into a walk of length $k-1$ from $u$ to some intermediate vertex $x$, followed by a single edge from $x$ to $v$.
This recursive structure allows us to model various graph problems using matrix operations.
Examples: Modeling Graph Problems with Matrices
We can define different problems based on the properties of the walks we’re interested in.
Example 1: Existence of Walks
Let $A$ be the adjacency matrix of graph $G$, where $a_{uv} = 1$ if there’s an edge from $u$ to $v$, and $a_{uv} = 0$ otherwise. Define:

$$w_{uv}^{(k)} = \begin{cases} 1 & \text{if there is a walk of length } k \text{ from } u \text{ to } v \\ 0 & \text{otherwise} \end{cases}$$
Recursive Formulation
A walk of length $k$ from $u$ to $v$ exists if there’s an intermediate vertex $x$ such that there’s a walk of length $k-1$ from $u$ to $x$ AND an edge from $x$ to $v$. This can be expressed using logical OR and AND:

$$w_{uv}^{(k)} = \bigvee_{x \in V} \left( w_{ux}^{(k-1)} \wedge a_{xv} \right)$$
Example 2: Minimal Cost of a Walk
Now, let’s consider weighted graphs. Let $c(u, v)$ be the cost of the edge from $u$ to $v$ (or $\infty$ if no edge exists). We want to find the minimum cost of a walk of length $k$ from $u$ to $v$. Define:

$$m_{uv}^{(k)} = \text{minimum cost of a walk from } u \text{ to } v \text{ that uses exactly } k \text{ edges}$$
Recursive Formulation
The minimum cost walk of length $k$ from $u$ to $v$ can be found by considering all possible intermediate vertices $x$. We take the minimum over the costs of reaching an intermediate vertex $x$ in $k-1$ steps and then adding the cost of the edge from $x$ to $v$:

$$m_{uv}^{(k)} = \min_{x \in V} \left( m_{ux}^{(k-1)} + c(x, v) \right)$$

where we use the following conventions:

- $m_{uu}^{(0)} = 0$ (the cost of a zero-length walk from a vertex to itself is 0)
- $m_{uv}^{(0)} = \infty$ for $u \neq v$ (no walk of length 0 exists between distinct vertices)

This formulation captures the principle of optimality: a minimum cost walk of length $k$ must be composed of a minimum cost walk of length $k-1$ followed by a single edge.
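This recursion is exactly a matrix “multiplication” in which addition is replaced by min and multiplication by +, the so-called min-plus product. A small Python sketch of one such step (notation as above; names are my own):

```python
import math

def min_plus_product(M, C):
    """One step of the recursion: given M[u][x] = m_{ux}^{(k-1)} and the
    edge-cost matrix C[x][v] = c(x, v), return the matrix m^{(k)}."""
    n = len(M)
    R = [[math.inf] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            R[u][v] = min(M[u][x] + C[x][v] for x in range(n))
    return R
```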
Example 3: Number of Walks
Finally, let’s count the number of distinct walks of length $k$ from $u$ to $v$. Define:

$$N_{uv}^{(k)} = \text{number of distinct walks of length } k \text{ from } u \text{ to } v$$
Recursive Formulation
To count the number of walks of length $k$, we sum the number of walks of length $k-1$ to each intermediate vertex $x$ that has a direct edge to $v$:

$$N_{uv}^{(k)} = \sum_{x \in V} N_{ux}^{(k-1)} \cdot a_{xv}$$

where $a_{xv}$ is an indicator that is 1 if there is an edge from $x$ to $v$ and 0 otherwise.

Base Case: $N^{(1)} = A$ (the adjacency matrix itself, as each edge represents a walk of length 1). This means that $N_{uv}^{(1)} = 1$ if there is a walk (an edge) of length one from $u$ to $v$.
Matrix Representation of Walk Counts
Let’s connect the recursive formulations from the previous examples to matrix operations. Notice that the summation in the recursive definition of the number of walks closely resembles matrix multiplication.
Matrix Formulation for Number of Walks
Recall that the number of walks of length $k$ from $u$ to $v$ is given by:

$$N_{uv}^{(k)} = \sum_{x \in V} N_{ux}^{(k-1)} \cdot a_{xv}$$

This is precisely the formula for matrix multiplication. Therefore, we can express the number of walks of length $k$ using the $k$-th power of the adjacency matrix:

$$N^{(k)} = A^k$$

where $A$ is the adjacency matrix and $A^k$ is the matrix whose entry $(A^k)_{uv}$ represents the number of walks of length $k$ from $u$ to $v$. Note that standard arithmetic addition and multiplication are used here.
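A quick way to check this on a small graph, e.g. with NumPy (a hypothetical 4-vertex example of my own):

```python
import numpy as np

# Directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]])

A4 = np.linalg.matrix_power(A, 4)
print(A4[0][0])  # 1: exactly one walk of length 4 from 0 back to 0
```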
Theorem: Walks and Matrix Powers
The element $(A^k)_{uv}$ in $A^k$ (the $k$-th power of the adjacency matrix) is equal to the number of walks of length $k$ from vertex $u$ to vertex $v$. Let us prove this by induction.
Base Case ($k = 1$)

$A^1 = A$. By definition, the adjacency matrix has a 1 in entry $(u, v)$ if there is an edge (a walk of length 1) from $u$ to $v$, and 0 otherwise. This matches our definition of $N^{(1)}$.
Inductive Hypothesis

Assume the theorem holds for some $k \ge 1$. That is, assume that $(A^k)_{uv}$ represents the number of walks of length $k$ from $u$ to $v$.
Inductive Step ($k \to k+1$)

We want to show that $(A^{k+1})_{uv}$ represents the number of walks of length $k+1$ from $u$ to $v$.

By the definition of matrix multiplication:

$$(A^{k+1})_{uv} = \sum_{x \in V} (A^k)_{ux} \cdot a_{xv}$$

By the inductive hypothesis, $(A^k)_{ux}$ is the number of walks of length $k$ from $u$ to $x$. $a_{xv}$ is 1 if there’s an edge from $x$ to $v$ (a walk of length 1), and 0 otherwise.

Therefore, the sum represents the total number of walks that can be formed by taking a walk of length $k$ from $u$ to some intermediate vertex $x$, followed by a single edge from $x$ to $v$. This is precisely the number of walks of length $k+1$ from $u$ to $v$.
Applications of Matrix Powers in Graph Analysis
The connection between matrix powers and walk counts has several important applications in graph theory and algorithms. Let’s explore two of them:
Application 1: Counting Triangles in a Directed Graph
Let $G = (V, E)$ be a directed graph without self-loops. A directed triangle is a set of three vertices $\{u, v, w\}$ such that the directed edges $(u, v)$, $(v, w)$, and $(w, u)$ exist in the graph.
Counting Triangles with Matrix Trace:
The number of directed triangles in $G$ can be calculated using the trace of the cubed adjacency matrix:

$$\#\text{triangles} = \frac{\operatorname{tr}(A^3)}{3}$$

where $\operatorname{tr}(M)$ denotes the trace of $M$ (the sum of its diagonal elements).
Why does this work?
$(A^3)_{uu}$ represents the number of walks of length 3 from vertex $u$ back to itself. Since the graph has no self-loops, each such walk corresponds to a triangle containing vertex $u$. However, each triangle is counted three times (once for each of its vertices as the starting point). Dividing by 3 corrects for this overcounting.
Generalization to Cycles of Length $k$: By calculating $\operatorname{tr}(A^k)$ we can count closed walks of length $k$, dividing by $k$ to adjust for the choice of starting vertex. Note, however, that for $k \ge 4$ a closed walk of length $k$ is not necessarily a simple cycle, so counting simple cycles requires additional corrections.
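A short sketch of the triangle count via the trace (NumPy, with my own naming):

```python
import numpy as np

def count_directed_triangles(A):
    """Number of directed triangles in a digraph without self-loops,
    given its 0/1 adjacency matrix A."""
    A = np.asarray(A)
    closed_walks_len3 = np.trace(np.linalg.matrix_power(A, 3))
    return closed_walks_len3 // 3  # each triangle counted once per starting vertex
```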
Application 2: All-Pairs Reachability and Transitive Closure
Reachability: The reachability problem asks: given two vertices $u$ and $v$, does there exist a directed path from $u$ to $v$? This is equivalent to finding the transitive closure of the graph.
Undirected Graphs: In undirected graphs, reachability is determined by connected components. Algorithms like Depth-First Search (DFS) or Breadth-First Search (BFS) can efficiently find connected components.
Directed Graphs: Reachability in directed graphs is more complex.
Adding Self-Loops: A common trick is to add a self-loop to each vertex of the graph (i.e., work with $A + I$). This ensures that if a path from $u$ to $v$ exists, there also exists a walk of length exactly $n-1$ from $u$ to $v$, because self-loops can be used to pad shorter paths to the required length. Without self-loops this does not hold in general: there is a path from $u$ to $v$ iff there is a walk of length smaller than $n$, so one would have to consider all powers $A^1, \ldots, A^{n-1}$.
Using Matrix Powers for Reachability: We can check reachability by computing $B = A + A^2 + \dots + A^{n-1}$, or, even better, the single power $B = (A + I)^{n-1}$ of the graph with the added self-loops. There is a path from $u$ to $v$ iff $B_{uv} > 0$.

This matrix $B$, obtained by summing powers of the adjacency matrix (up to $n-1$) or by taking the single power $(A + I)^{n-1}$, effectively represents the transitive closure of the graph. If $B_{uv} > 0$, there’s a path from $u$ to $v$; otherwise, there isn’t. The question we still need to answer, however, is: how do we compute matrix powers efficiently?
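As a sketch, reachability via the $(n-1)$-th power of $A + I$ over 0/1 entries (NumPy, my own naming; the exponentiation here is still naive and is exactly what the next section speeds up):

```python
import numpy as np

def transitive_closure(A):
    """Reachability matrix: result[u][v] is True iff there is a path u -> v
    (of length >= 0), computed as the (n-1)-th power of A + I clamped to 0/1."""
    A = np.asarray(A, dtype=np.int64)
    n = A.shape[0]
    M = ((A + np.eye(n, dtype=np.int64)) > 0).astype(np.int64)  # A + I as 0/1 matrix
    B = M.copy()
    for _ in range(max(0, n - 2)):          # B = (A + I)^(n-1)
        B = ((B @ M) > 0).astype(np.int64)  # keep entries 0/1 to avoid overflow
    return B > 0
```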
Calculating Matrix Powers Efficiently: Iterative Squaring
Using powers of the adjacency matrix to solve graph problems requires efficient matrix exponentiation. Naive repeated multiplication is too slow for large exponents.
Iterative Squaring: A Logarithmic Speedup
Iterative squaring, also known as repeated squaring, drastically reduces the number of multiplications needed to compute $A^k$. It leverages the following observation:

$$A^k = \begin{cases} \left(A^{k/2}\right)^2 & \text{if } k \text{ is even} \\ A \cdot A^{k-1} & \text{if } k \text{ is odd} \end{cases}$$
Algorithm:
- Initialize: `result = I` (identity matrix), `temp = A`, `k' = k`.
- Iterate: While `k' > 0`:
  - If `k'` is odd: `result = result * temp` (using the appropriate multiplication, arithmetic or boolean) and set `k' = k' - 1`.
  - `temp = temp * temp`
  - `k' = k' / 2`
- Return: `result`
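The same procedure as a Python sketch (using NumPy’s arithmetic matrix product; for boolean reachability one would clamp entries to 0/1 after each product, as in the sketch above):

```python
import numpy as np

def matrix_power(A, k):
    """Compute A^k with O(log k) matrix multiplications (iterative squaring)."""
    A = np.asarray(A)
    result = np.eye(A.shape[0], dtype=A.dtype)  # identity matrix
    temp = A.copy()
    while k > 0:
        if k % 2 == 1:          # odd exponent: multiply the current factor in
            result = result @ temp
            k -= 1
        temp = temp @ temp      # square
        k //= 2
    return result
```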
Runtime Analysis: From Linear to Logarithmic
The number of iterations is proportional to $\log k$, as we halve `k'` in each iteration. Each iteration involves a constant number of matrix multiplications. With standard matrix multiplication ($O(n^3)$), the total runtime becomes $O(n^3 \log k)$, a significant improvement over the naive $O(n^3 \cdot k)$. This logarithmic complexity makes iterative squaring practical for even very large exponents.
Efficient Matrix Multiplication: Strassen’s Algorithm
Standard matrix multiplication of two $n \times n$ matrices takes $O(n^3)$ time. Strassen’s algorithm provides a faster approach by cleverly reducing the number of subproblems in a divide-and-conquer strategy.
Divide and Conquer:
- Divide: Split each $n \times n$ matrix into four $\frac{n}{2} \times \frac{n}{2}$ submatrices.
- Conquer: Recursively compute products of submatrices. Strassen’s key insight is that the product of two $2 \times 2$ block matrices can be computed using only 7 multiplications (instead of the usual 8) and 18 additions/subtractions. This reduction in multiplications at each level of recursion leads to the asymptotic improvement.
Note: While not shown here, the specific way these 7 multiplications are performed is crucial to Strassen’s algorithm and involves clever combinations of the submatrices.
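For reference, here is a sketch using one commonly cited set of the 7 products (the lecture does not spell them out, so this particular combination is an assumption of the sketch). The recursion assumes $n$ is a power of two for simplicity:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply two n x n matrices (n a power of two) with Strassen's recursion.
    Below the cutoff, fall back to the standard product."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # The 7 recursive products.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    # Reassemble the four quadrants of the result.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```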
Runtime Analysis: Master Theorem
The recurrence relation for Strassen’s algorithm is:

$$T(n) = 7 \cdot T\left(\frac{n}{2}\right) + O(n^2)$$

- $7$: Number of recursive subproblems.
- $\frac{n}{2} \times \frac{n}{2}$: Size of each subproblem.
- $O(n^2)$: Time for combining subproblem results (additions/subtractions).

Applying the Master Theorem (Case 1), we get a runtime of $O(n^{\log_2 7}) \approx O(n^{2.81})$. This is a significant improvement over the standard $O(n^3)$ matrix multiplication. Strassen’s algorithm demonstrates that matrix multiplication can be performed faster than the straightforward cubic approach.
Can we do better?

Yes! Further optimizations and more complex algorithms have pushed this theoretical bound even lower, though often with practical trade-offs. The Coppersmith-Winograd algorithm and slight improvements of it are currently the most asymptotically efficient matrix multiplication algorithms, but due to large constant factors they are not used in practice. There do, however, exist other algorithms that are faster than Strassen’s and still practical.
Final “side questing” Boss: If you want to, you can try figuring out a similar matrix computation for the previous two recursions using what we’ve learned in LinAlg, DiskMat and then implement it using concepts from AnD and EProg…