Video from: 06.03.2025 | Video: Homelab
Recall: Augmenting Paths and Maximum Matchings
Let’s begin by revisiting the fundamental concepts that underpin the Hopcroft-Karp algorithm. These ideas were introduced previously and are crucial for understanding the algorithm’s mechanics and correctness.
Augmenting Paths
Given a matching $M$ in a bipartite graph $G$, an augmenting path with respect to $M$ (an $M$-augmenting path) is a path $P$ in $G$ that satisfies the following conditions:
- Alternating Path: The edges of $P$ are alternately not in $M$ and in $M$.
- Unmatched Endpoints: Both the starting and ending vertices of $P$ are unmatched with respect to $M$, meaning they are not incident to any edge in $M$.
Augmenting a Matching
If we find an $M$-augmenting path $P$, we can increase the size of the matching by performing an augmentation. This is achieved by taking the symmetric difference of the current matching $M$ and the edges of the augmenting path $P$:
$$M' = M \oplus P = (M \setminus P) \cup (P \setminus M)$$
This operation effectively “flips” the status of edges along $P$: edges in $P \cap M$ are removed from the matching, and edges in $P \setminus M$ are added to the matching. The size of the matching increases by exactly one, i.e., $|M'| = |M| + 1$.
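As a concrete illustration, the augmentation step is just a symmetric difference of edge sets. The following minimal sketch uses made-up vertex labels (a1, a2, b1, b2) and represents each undirected edge as a frozenset:

```python
# Augmenting a matching via symmetric difference (illustrative sketch;
# the vertex labels a1, a2, b1, b2 are hypothetical).
def augment(matching, path_edges):
    """Return M XOR P: flip the matched/unmatched status of the path's edges."""
    return matching ^ path_edges

# Current matching M = {a1-b1}; augmenting path P = a2 - b1 - a1 - b2
# (edges alternate unmatched, matched, unmatched; endpoints a2, b2 unmatched).
M = {frozenset({"a1", "b1"})}
P = {frozenset({"a2", "b1"}), frozenset({"a1", "b1"}), frozenset({"a1", "b2"})}

M_new = augment(M, P)
print(sorted(sorted(e) for e in M_new))  # the two new matched edges
print(len(M_new) == len(M) + 1)          # size grew by exactly one
```

The shared edge a1-b1 leaves the matching, while the two outer path edges enter it, so the matching grows from one edge to two.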
Berge’s Theorem
A cornerstone result in matching theory is Berge’s Theorem, which provides a necessary and sufficient condition for a matching to be maximum.
Berge’s Theorem: A matching $M$ in a graph $G$ is a maximum matching (i.e., has the largest possible cardinality) if and only if there is no $M$-augmenting path in $G$.
This theorem is critical because it gives us an algorithmic approach to finding maximum matchings. If our current matching is not maximum, Berge’s Theorem guarantees the existence of an augmenting path, which we can use to increase the matching size. We can repeat this process until no more augmenting paths can be found, at which point we have reached a maximum matching.
Basic Algorithm Sketch
Based on Berge’s Theorem, we can outline a basic algorithm for finding a maximum matching:
Algorithm Sketch:
Input: Bipartite Graph G = (V, E)
Output: Maximum Matching M

Initialize M = ∅ (empty matching)
while True:
    Search for an M-augmenting path P in G
    if no M-augmenting path P exists:
        return M (M is a maximum matching)
    else:
        M = M ⊕ P (augment the matching)
Finding Augmenting Paths in Bipartite Graphs using BFS
For bipartite graphs, we can efficiently search for augmenting paths using Breadth-First Search (BFS). The idea is to explore paths that alternate between edges not in the matching and edges in the matching.
BFS Approach to Find Augmenting Paths:
- Start from Unmatched Vertices in Partition A: Begin the BFS from all vertices in partition $A$ that are currently unmatched. These vertices form the first layer (layer 0) of our BFS.
- Alternating Layers:
  - Odd Layers (from A to B): From vertices in an even layer (say layer $2i$, which lies in partition $A$), explore edges that are not in the current matching $M$. Newly reached vertices in partition $B$ form the next layer (layer $2i+1$, an odd layer).
  - Even Layers (from B to A): From vertices in an odd layer (say layer $2i+1$, in partition $B$), explore edges that are in the current matching $M$. Newly reached vertices in partition $A$ form the next layer (layer $2i+2$, an even layer).
- Check for Unmatched Vertices in Partition B: During the BFS, if we reach an unmatched vertex in partition $B$ (necessarily in an odd layer), we have found an augmenting path. This path can be traced back from this unmatched vertex to an unmatched vertex in partition $A$ (in layer 0) by following the predecessor pointers in the BFS tree.
- Augment and Repeat: If an augmenting path is found, augment the matching and repeat the BFS process from step 1 with the new, larger matching.
- Termination: If the BFS completes without finding any augmenting path (i.e., no unmatched vertex in $B$ is reached), then the current matching is maximum, and we terminate.
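The steps above can be turned into a compact implementation. The following Python sketch (function and variable names are our own) finds one augmenting path per BFS and flips its edges by walking the predecessor pointers back to layer 0:

```python
from collections import deque

def max_matching_bfs(adj, n_left, n_right):
    """Maximum bipartite matching by repeatedly finding a single augmenting
    path with BFS. adj[a] lists the B-neighbours of A-vertex a.
    Returns (size, match_left)."""
    match_left = [-1] * n_left    # B-partner of each A-vertex, or -1
    match_right = [-1] * n_right  # A-partner of each B-vertex, or -1

    def find_and_augment():
        # Layer 0: all unmatched A-vertices.
        queue = deque(a for a in range(n_left) if match_left[a] == -1)
        parent = {}        # B-vertex -> A-vertex from which we reached it
        visited = set()
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b in visited:
                    continue
                visited.add(b)
                parent[b] = a
                if match_right[b] == -1:
                    # Unmatched B-vertex found: flip edges back to layer 0.
                    while True:
                        a = parent[b]
                        prev_b = match_left[a]        # a's old partner (or -1)
                        match_left[a], match_right[b] = b, a
                        if prev_b == -1:
                            return True               # reached an unmatched A-vertex
                        b = prev_b
                else:
                    # Matched B-vertex: follow its matched edge back into A.
                    queue.append(match_right[b])
        return False

    size = 0
    while find_and_augment():
        size += 1
    return size, match_left
```

For example, `max_matching_bfs([[0, 1], [0]], 2, 2)` forces the second BFS to reroute the first vertex's partner and returns a matching of size 2.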
Inefficiency of Simple BFS Approach and Motivation for Hopcroft-Karp
While the BFS approach works, a naive implementation might perform redundant work in each iteration: it finds only one augmenting path and increases the matching size by just one. Since a matching in a bipartite graph has at most $n/2$ edges, we might need up to $n/2$ iterations in the worst case.
The inefficiency arises from the fact that we find and augment along only one augmenting path at a time, even if multiple disjoint augmenting paths are available. This leads to duplicate work in subsequent BFS searches, as we re-explore parts of the graph that could have been used to find other augmenting paths in the same iteration.
The Hopcroft-Karp algorithm addresses this inefficiency by finding a maximal set of shortest, vertex-disjoint augmenting paths in each iteration. By augmenting along all paths in this set simultaneously, it achieves a significantly faster runtime.
Hopcroft-Karp Algorithm: Finding Maximum Matching Efficiently
The Hopcroft-Karp algorithm is a more sophisticated approach to finding maximum matchings in bipartite graphs. It builds upon the idea of augmenting paths but optimizes the process by finding and augmenting along multiple augmenting paths in each iteration.
Algorithm Steps
Hopcroft-Karp Algorithm:
Input: Bipartite Graph G = (A ∪ B, E)
Output: Maximum Matching M

Initialize M = ∅ (empty matching)
repeat:
    1. BFS Phase:
       - Perform a BFS starting from all unmatched vertices in A.
       - Construct alternating paths, layer by layer.
       - Keep track of distances from unmatched vertices in A.
       - Let 'k' be the length of the shortest augmenting path found.
       - If no augmenting path is found, break the loop (M is maximum).
    2. DFS Phase:
       - Initialize a set S = ∅.
       - For each unmatched vertex 'u' in A:
           - Perform a DFS starting from 'u'.
           - If the DFS finds an augmenting path of length 'k' that is
             vertex-disjoint from all paths in S, add it to S.
           - Continue from 'u' until no more such paths can be found.
    3. Augmentation:
       - For each augmenting path P in the set S:
           - M = M ⊕ P (augment the matching along path P)
until no augmenting paths are found in BFS phase
return M
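The pseudocode above can be realized in a few dozen lines of Python. The following sketch uses a standard trick: instead of collecting the set S explicitly, the DFS phase flips each augmenting path as soon as it is found and marks dead-end vertices via the distance array, which enforces vertex-disjointness within a phase (names are our own):

```python
from collections import deque

INF = float("inf")

def hopcroft_karp(adj, n_left, n_right):
    """Hopcroft-Karp maximum bipartite matching sketch.
    adj[a] lists the B-neighbours of A-vertex a; returns the matching size."""
    match_l = [-1] * n_left
    match_r = [-1] * n_right
    dist = [0] * n_left

    def bfs():
        # BFS phase: layer the graph from all unmatched A-vertices.
        queue = deque()
        for a in range(n_left):
            if match_l[a] == -1:
                dist[a] = 0
                queue.append(a)
            else:
                dist[a] = INF
        found = False
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                a2 = match_r[b]
                if a2 == -1:
                    found = True              # reached an unmatched B-vertex
                elif dist[a2] == INF:
                    dist[a2] = dist[a] + 1    # follow matched edge to next layer
                    queue.append(a2)
        return found

    def dfs(a):
        # DFS phase: extend a shortest alternating path; flip edges on success.
        for b in adj[a]:
            a2 = match_r[b]
            if a2 == -1 or (dist[a2] == dist[a] + 1 and dfs(a2)):
                match_l[a] = b
                match_r[b] = a
                return True
        dist[a] = INF   # dead end: exclude a for the rest of this phase
        return False

    size = 0
    while bfs():
        for a in range(n_left):
            if match_l[a] == -1 and dfs(a):
                size += 1
    return size
```

Because the DFS only steps from distance $d$ to distance $d+1$, every path it flips is a shortest augmenting path, and the dead-end marking guarantees the flipped paths are vertex-disjoint, exactly as the phase description requires.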
Key Ideas in Hopcroft-Karp
- Shortest Augmenting Paths: The algorithm focuses on finding shortest augmenting paths in each iteration. This is crucial for bounding the number of iterations.
- Maximal Set of Vertex-Disjoint Paths: In each iteration, the algorithm finds a maximal set $S$ of vertex-disjoint augmenting paths of the shortest length. “Maximal” here means that we cannot add any more augmenting paths of the shortest length to $S$ while maintaining vertex-disjointness. It is an inclusion-maximal set, not necessarily one of maximum cardinality.
- Simultaneous Augmentation: The algorithm augments the matching along all paths in the set $S$ in a single iteration. This is the key optimization that leads to the faster runtime.
Finding a Maximal Set of Vertex-Disjoint Augmenting Paths using DFS
The DFS phase is responsible for finding a maximal set of vertex-disjoint shortest augmenting paths. The process is as follows:
- Iterate through Unmatched Vertices in A: For each unmatched vertex $u \in A$, initiate a DFS.
- DFS for Shortest Paths: The DFS explores paths starting from $u$, respecting the alternating structure of augmenting paths and only considering paths of the shortest length $k$ (determined by the BFS phase).
- Vertex-Disjointness Enforcement: When a DFS finds an augmenting path $P$, we add $P$ to our set $S$. To ensure vertex-disjointness, once a vertex is included in a path in $S$, it must not be used in any other path added to $S$ in the same iteration. This is managed by marking vertices already used in paths in $S$ during the DFS.
- Maximality: The DFS process, iterated through all unmatched vertices in $A$, finds a maximal set of vertex-disjoint shortest augmenting paths. While not necessarily a maximum set (in terms of the cardinality of $S$), it is maximal in the sense that no further shortest augmenting path is vertex-disjoint from all paths already in $S$.
Runtime of Hopcroft-Karp Algorithm
The Hopcroft-Karp algorithm achieves a significantly better runtime than the simple augmenting path algorithm. The key result is that the number of iterations of the main loop is bounded by $O(\sqrt{n})$. Since each iteration (BFS and DFS phases) takes $O(m + n)$ time, the overall runtime of the Hopcroft-Karp algorithm is:
Runtime: $O((m + n)\sqrt{n})$
In this lecture, we will focus on understanding and proving why the number of iterations is $O(\sqrt{n})$.
Proof of $O(\sqrt{n})$ Iterations in Hopcroft-Karp Algorithm
To understand why the Hopcroft-Karp algorithm is efficient, we need to prove that the number of iterations in the main loop is indeed bounded by $O(\sqrt{n})$. The proof is structured in three parts.
Part 1: Increasing the Length of Shortest Augmenting Paths
Claim: With each iteration of the Hopcroft-Karp algorithm, the length of the shortest augmenting path increases strictly. More specifically, if $k$ is the length of the shortest augmenting path in iteration $i$, then the length of the shortest augmenting path in iteration $i+1$ will be at least $k + 2$.
Why $k+2$ and not $k+1$? Augmenting paths in bipartite graphs always have odd length, because they alternate between partitions $A$ and $B$ and their unmatched endpoints lie in different partitions. Therefore, the length of an augmenting path is always an odd integer, and if the shortest augmenting path length is $k$, the next possible length is at least $k + 2$.
Proof of Claim 1:
Let $M$ be the matching at the beginning of iteration $i$, and let $S = \{P_1, \dots, P_t\}$ be the maximal set of vertex-disjoint shortest $M$-augmenting paths of length $k$. Let $M' = M \oplus (P_1 \cup \dots \cup P_t)$ be the matching after augmentation in iteration $i$. We want to show that any $M'$-augmenting path must have length at least $k + 2$.
Consider an $M'$-augmenting path $P$. We analyze how $P$ can interact with the paths in $S$.
Case 1: $P$ is Vertex-Disjoint from All Paths in $S$.
If $P$ is vertex-disjoint from all paths in $S$, then $P$ was also an $M$-augmenting path in iteration $i$. Since $S$ was chosen to be a maximal set of vertex-disjoint shortest $M$-augmenting paths of length $k$, and $P$ is vertex-disjoint from all paths in $S$, $P$ cannot be of length $k$: if it were, we could have added it to $S$, contradicting the maximality of $S$. Therefore, in this case, $|P| > k$, and since augmenting path lengths are odd, $|P| \ge k + 2$.
Case 2: $P$ Shares Vertices with at Least One Path in $S$.
Let $P_0 \in S$ be a path that shares at least one vertex with $P$. To analyze this case, we need a helper lemma (stated and proved below) that relates the lengths of intersecting augmenting paths.
Using the Lemma to Prove the Claim:
Let $P_0 \in S$ be a path that intersects $P$. We know $|P_0| = k$. By the lemma, $|P| \ge |P_0| + 2|P \cap P_0|$, where $|P \cap P_0|$ counts shared edges. A shared vertex forces a shared edge: after augmentation, every vertex of $P_0$ is matched in $M'$ by an edge of $P_0$, and an $M'$-augmenting path passing through such a vertex must use its matched edge. Hence $|P \cap P_0| \ge 1$, and thus $|P| \ge k + 2$.
Therefore, in both Case 1 and Case 2, any $M'$-augmenting path must have length at least $k + 2$. This proves that with each iteration, the length of the shortest augmenting path increases strictly (by at least 2).
Lemma
Let $M$ be a matching, $P_0$ a shortest $M$-augmenting path, and $P$ an $M'$-augmenting path, where $M' = M \oplus P_0$. Then, if $P$ and $P_0$ share vertices, $|P| \ge |P_0| + 2|P \cap P_0|$.
Proof of the Lemma:
Let $M' = M \oplus P_0$ and $M'' = M' \oplus P$. Then $M'' = M \oplus P_0 \oplus P$. The size of the matching increases by 1 with each augmentation. Thus, $|M'| = |M| + 1$ and $|M''| = |M| + 2$. This implies that $|M''| - |M| = 2$.
Consider the symmetric difference $M \oplus M''$. This edge set consists of paths and cycles whose edges alternate between $M$ and $M''$. Since $|M''|$ is larger than $|M|$ by 2, $M \oplus M''$ must contain at least two more edges from $M''$ than from $M$. Cycles and even-length paths contribute equally many edges from each matching, so this imbalance can only come from at least two vertex-disjoint $M$-augmenting paths contained in $M \oplus M''$.
Each of these $M$-augmenting paths has length at least $|P_0|$, because $P_0$ is a shortest $M$-augmenting path. Therefore, $|M \oplus M''| \ge 2|P_0|$.
On the other hand, $M \oplus M'' = P_0 \oplus P$, and by inclusion-exclusion for the symmetric difference, $|P_0 \oplus P| = |P_0| + |P| - 2|P_0 \cap P|$.
Combining the two bounds, $|P_0| + |P| - 2|P_0 \cap P| \ge 2|P_0|$. Rearranging this inequality, we get $|P| \ge |P_0| + 2|P_0 \cap P|$. This proves the lemma.
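The lemma's proof condenses into the following chain, where $M'' = M \oplus P_0 \oplus P$ denotes the matching after both augmentations:

```latex
\begin{align*}
M \oplus M'' &= P_0 \oplus P, \qquad |M''| = |M| + 2 \\
|P_0 \oplus P| = |M \oplus M''| &\ge 2\,|P_0|
  \quad \text{(two disjoint $M$-augmenting paths, each of length $\ge |P_0|$)} \\
|P_0| + |P| - 2\,|P_0 \cap P| &\ge 2\,|P_0|
  \quad \text{(inclusion-exclusion for the symmetric difference)} \\
|P| &\ge |P_0| + 2\,|P_0 \cap P|.
\end{align*}
```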
Part 2: Bounding the Number of Iterations based on Path Length
Claim: Let $M$ be a matching, and let $k$ be the length of a shortest $M$-augmenting path. If $M^*$ is a maximum matching, then $|M^*| - |M| \le \frac{n}{k+1}$.
Proof of Claim:
Assume $M$ is not maximum, so $|M^*| > |M|$ (otherwise the claim is trivial). Consider the symmetric difference $M \oplus M^*$. As we discussed earlier, it decomposes into vertex-disjoint paths and cycles. Since $|M^*| - |M| > 0$, this decomposition must contain at least $|M^*| - |M|$ vertex-disjoint $M$-augmenting paths. Let $Q$ be a set of exactly $|M^*| - |M|$ such vertex-disjoint $M$-augmenting paths.
Since $k$ is the length of the shortest $M$-augmenting path, each path in $Q$ has length at least $k$, and a path of length at least $k$ contains at least $k + 1$ vertices. Because the paths in $Q$ are vertex-disjoint, the total number of vertices involved in these paths is at least $(|M^*| - |M|)(k+1)$. This number cannot exceed the total number of vertices in the graph, $n$.
Therefore, we have: $(|M^*| - |M|)(k+1) \le n$.
Dividing both sides by $k+1$, we get: $|M^*| - |M| \le \frac{n}{k+1}$. This proves Claim 2.
Part 3: Bounding the Total Number of Iterations - $O(\sqrt{n})$
We now demonstrate that the Hopcroft-Karp algorithm requires at most $O(\sqrt{n})$ iterations to find a maximum matching. We divide the algorithm’s execution into two phases based on the length $k$ of the shortest augmenting path.
A) First Phase: Iterations with Short Augmenting Paths ($k < \sqrt{n}$)
Consider the initial iterations where the length of the shortest augmenting path, $k$, is less than $\sqrt{n}$. We know from Part 1 that in each iteration, the length of the shortest augmenting path strictly increases by at least 2. Since the shortest augmenting path length starts at no less than 1 and must stay below $\sqrt{n}$ in this phase, the number of iterations in this phase is limited.
In the most conservative scenario, the shortest augmenting path length starts at 1 and increases by exactly 2 in each iteration. Therefore, the length reaches $\sqrt{n}$ after roughly $\sqrt{n}/2$ iterations. Thus, the number of iterations in this first phase is at most $O(\sqrt{n})$.
B) Second Phase: Iterations with Long Augmenting Paths ($k \ge \sqrt{n}$)
Now consider the iterations where the length of the shortest augmenting path is at least $\sqrt{n}$, i.e., $k \ge \sqrt{n}$. From Part 2, Claim 2, we have the bound:
$$|M^*| - |M| \le \frac{n}{k+1}$$
where $M$ is the matching at the beginning of the current iteration and $M^*$ is a maximum matching. Since we are in the second phase where $k \ge \sqrt{n}$, it follows that, for any iteration in this phase,
$$|M^*| - |M| \le \frac{n}{\sqrt{n}+1} < \frac{n}{\sqrt{n}} = \sqrt{n}$$
This inequality tells us that when we are in the second phase (long augmenting paths), the size of our current matching is already within $\sqrt{n}$ of the size of a maximum matching $M^*$. Since in each iteration the size of the matching increases by at least 1, there can be at most $\sqrt{n}$ remaining iterations to reach a maximum matching from this point.
Total Iterations:
The total number of iterations of the Hopcroft-Karp algorithm is the sum of the iterations in the first phase and the second phase. Both phases are bounded by $O(\sqrt{n})$ iterations. Therefore, the total number of iterations is $O(\sqrt{n})$.
This concludes the proof that the Hopcroft-Karp algorithm finds a maximum matching in $O(\sqrt{n})$ iterations, leading to an overall time complexity of $O((m+n)\sqrt{n})$.
Practical Considerations: Achieving Near-Maximum Matchings
In practice, if achieving a perfectly maximum matching is not strictly necessary, we can consider stopping the Hopcroft-Karp algorithm after a certain number of iterations or when the length of the shortest augmenting path becomes sufficiently large. This can provide a good approximation of a maximum matching with reduced computational effort. For instance, stopping after a fixed number of iterations or when the shortest path length exceeds a threshold can often yield a matching that is very close to maximum size in real-world scenarios.
Conclusion
The Hopcroft-Karp algorithm is guaranteed to find a maximum matching in a bipartite graph in $O(\sqrt{n})$ iterations. Since each iteration (BFS and DFS phases) takes linear time $O(m + n)$, the overall time complexity is $O((m + n)\sqrt{n})$.
This concludes the proof that the Hopcroft-Karp algorithm is indeed very efficient for finding maximum matchings in bipartite graphs, with a runtime significantly better than the simpler augmenting path algorithms.
Overview of Matching Algorithms (Beyond Bipartite)
As we discussed matching algorithms in the context of Christofides’ algorithm (finding a minimum weight perfect matching), let’s briefly mention the complexity landscape for matching problems in general graphs (beyond bipartite graphs).
For bipartite graphs:
- Hopcroft-Karp Algorithm (Unweighted): Achieves a time complexity of $O((m + n)\sqrt{n})$.
- More Advanced Algorithms (Weighted): For weighted bipartite matching, algorithms exist with complexities close to linear in the number of edges, especially when the weights are polynomially bounded.
For general graphs (non-bipartite):
- Micali-Vazirani Algorithm (Unweighted): A complex algorithm that achieves a time complexity of $O(m\sqrt{n})$, matching Hopcroft-Karp for bipartite graphs. The Gabow-Tarjan algorithm achieves a similar complexity.
- Matrix Multiplication Based Algorithms (Unweighted): Algorithms using fast matrix multiplication achieve theoretical complexities of $O(n^{\omega})$ for unweighted maximum matching, where $\omega < 2.38$ is the matrix multiplication exponent.
- Gabow’s Algorithms (Weighted): For weighted matching in general graphs, Gabow’s algorithms achieve complexities around $O(n(m + n \log n))$ for finding maximum weight or minimum weight perfect matchings.
Finding minimum weight perfect matchings, as required in Christofides’ Algorithm, can be solved in polynomial time, although the algorithms are more involved than MST algorithms.
Metric TSP and Approximation Algorithms
In our previous lecture, we discussed the Traveling Salesperson Problem (TSP), a classic NP-complete problem. We highlighted that while finding an optimal solution for the general TSP is computationally intractable, the Metric TSP, a special case where distances satisfy the triangle inequality, admits efficient approximation algorithms.
We explored a 2-approximation algorithm for the Metric TSP based on Minimum Spanning Trees (MST). While a 2-approximation is valuable, we aim for even better solutions. Today, we will delve into Christofides’ Algorithm, a significant advancement that provides a 1.5-approximation for the Metric TSP, a long-standing benchmark in this area.
Christofides Algorithm: Achieving a 1.5-Approximation
Christofides’ Algorithm, developed by Nicos Christofides in 1976, is a polynomial-time algorithm that guarantees a tour with a length at most 1.5 times the optimal tour length for the Metric TSP. It elegantly combines the concept of Minimum Spanning Trees (MSTs) with perfect matchings to achieve this improved approximation ratio.
Steps of Christofides Algorithm
Let $G = (V, E)$ be a complete graph with $n$ vertices representing cities and edge weights $w$ representing distances satisfying the triangle inequality. Christofides’ Algorithm proceeds in the following steps:
- Compute a Minimum Spanning Tree (MST) $T$ of $G$.
  - We begin by finding a Minimum Spanning Tree $T$ of the complete graph $G$. Algorithms like Kruskal’s or Prim’s can efficiently compute an MST in polynomial time, typically $O(m \log n)$ or $O(n^2)$.
  - Key Insight: The total weight of the MST, $w(T)$, serves as a lower bound for the optimal TSP tour length $\mathrm{OPT}$. This is because removing any edge from an optimal tour yields a spanning tree of $G$, and the MST is the minimum weight spanning tree. Hence, $w(T) \le \mathrm{OPT}$.
- Identify Vertices with Odd Degree in $T$.
  - Let $U$ be the set of vertices in the MST $T$ that have an odd degree.
  - Property of Graph Degrees: In any graph, the number of vertices with odd degree is always even, because the degrees sum to twice the number of edges. Therefore, $|U|$ is even.
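The degree-parity property is easy to check mechanically. The following small Python sketch (the example tree edges are made up) counts odd-degree vertices of a graph given as an edge list:

```python
# Handshake lemma check: in any graph, the number of odd-degree vertices
# is even, because the degree sum equals 2|E|. Example edge list is made up.
from collections import Counter

tree_edges = [(0, 1), (1, 2), (1, 3), (3, 4)]  # a small spanning tree

deg = Counter()
for u, v in tree_edges:
    deg[u] += 1
    deg[v] += 1

odd = [v for v, d in deg.items() if d % 2 == 1]
print(sorted(odd))        # the odd-degree vertices U
print(len(odd) % 2 == 0)  # |U| is even
```

Here the leaves 0, 2, 4 and the degree-3 vertex 1 form $U$, a set of even size 4, as the lemma guarantees.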
- Find a Minimum Weight Perfect Matching $M$ in the subgraph induced by $U$.
  - Consider the subgraph of $G$ induced by the vertices in $U$. Let’s call this subgraph $G[U]$.
  - Find a minimum weight perfect matching $M$ in $G[U]$. Since $|U|$ is even and $G[U]$ is complete (because $G$ is complete), a perfect matching exists. Efficient algorithms exist to find minimum weight perfect matchings in polynomial time, although they are more complex than MST algorithms.
  - Key Insight: We are focusing on the vertices with odd degree in the MST. By adding a perfect matching on these vertices, we aim to make all vertex degrees even, paving the way for an Eulerian tour construction.
- Construct a Multigraph $H = T \cup M$.
  - Create a multigraph $H$ by taking the union of the edges of the MST $T$ and the edges of the minimum weight perfect matching $M$. Note that $H$ might be a multigraph because $T$ and $M$ could contain the same edge, and thus $H$ might contain multiple edges between the same pair of vertices.
  - Degree Parity in $H$: In $H$, every vertex has an even degree. Why? Consider any vertex $v$. If $v \notin U$, it had an even degree in $T$. Since $M$ only involves vertices in $U$, the degree of $v$ remains even in $H$. If $v \in U$, it had an odd degree in $T$. Since $M$ is a perfect matching on $U$, exactly one edge of $M$ is incident to $v$. Adding this edge makes the degree of $v$ even (odd + 1 = even).
- Find an Eulerian Tour $W$ in $H$.
  - Since every vertex in $H$ has an even degree and $H$ is connected (because $T$ is spanning), $H$ contains an Eulerian tour $W$. We can find such a tour efficiently using Hierholzer’s algorithm.
  - Eulerian Tour Properties: The Eulerian tour $W$ traverses every edge of $H$ exactly once.
- Convert the Eulerian Tour to a Hamiltonian Cycle by Shortcutting.
  - Traverse the Eulerian tour $W$. As we traverse, create a Hamiltonian cycle $C$ by “shortcutting” repeated vertices: whenever we are about to visit a vertex that has already been visited in $W$, we skip ahead to the next unvisited vertex in $W$.
  - Triangle Inequality Guarantee: Due to the triangle inequality, shortcutting does not increase the tour length. By skipping repeated vertices, we obtain a valid Hamiltonian cycle (visiting each vertex exactly once).
The resulting cycle $C$ is the tour produced by Christofides’ Algorithm.
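The six steps can be sketched end to end in Python. This is a self-contained toy implementation (all names are our own): it uses Prim's algorithm for the MST, a brute-force recursion for the minimum weight perfect matching (so it only scales to tiny instances; real implementations use the Blossom algorithm), Hierholzer's algorithm for the Eulerian tour, and shortcutting at the end:

```python
from collections import defaultdict

def christofides(dist):
    """Christofides sketch for a symmetric metric distance matrix `dist`.
    Brute-forces the matching step; only suitable for small instances."""
    n = len(dist)
    # Step 1: minimum spanning tree T via Prim's algorithm.
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0
    tree_edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            tree_edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    # Step 2: odd-degree vertices U of T.
    deg = defaultdict(int)
    for u, v in tree_edges:
        deg[u] += 1
        deg[v] += 1
    odd = [v for v in range(n) if deg[v] % 2 == 1]
    # Step 3: minimum weight perfect matching M on U (brute-force recursion).
    def min_matching(vs):
        if not vs:
            return 0, []
        u = vs[0]
        best_w, best_m = float("inf"), None
        for i in range(1, len(vs)):
            w_rest, m_rest = min_matching(vs[1:i] + vs[i + 1:])
            w = dist[u][vs[i]] + w_rest
            if w < best_w:
                best_w, best_m = w, [(u, vs[i])] + m_rest
        return best_w, best_m
    _, matching = min_matching(odd)
    # Step 4: multigraph H = T + M (parallel edges kept as duplicates).
    mult = defaultdict(list)
    for u, v in tree_edges + matching:
        mult[u].append(v)
        mult[v].append(u)
    # Step 5: Eulerian tour of H via Hierholzer's algorithm.
    stack, tour = [0], []
    while stack:
        u = stack[-1]
        if mult[u]:
            v = mult[u].pop()
            mult[v].remove(u)
            stack.append(v)
        else:
            tour.append(stack.pop())
    # Step 6: shortcut repeated vertices to obtain a Hamiltonian cycle.
    seen, cycle = set(), []
    for v in tour:
        if v not in seen:
            seen.add(v)
            cycle.append(v)
    cycle.append(cycle[0])
    length = sum(dist[cycle[i]][cycle[i + 1]] for i in range(len(cycle) - 1))
    return cycle, length
```

For example, for four points on a line with `dist[i][j] = abs(i - j)`, the MST is the path 0-1-2-3, the odd-degree set is {0, 3}, and the sketch returns a closed tour over all four cities.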
Justification and Intuition Behind Christofides Algorithm
Christofides’ Algorithm builds upon the 2-approximation MST algorithm by strategically adding a minimum weight perfect matching to the MST. Let’s understand the rationale behind each step:
- MST as a Backbone: The MST provides a low-cost backbone connecting all cities. Its weight is a lower bound on the optimal TSP tour. However, simply traversing the MST (as in the 2-approximation algorithm, by doubling its edges) leads to revisiting vertices.
- Addressing Odd-Degree Vertices: The problem with the doubled-MST approach is that it forces us to traverse every MST edge twice to create a closed tour. Christofides’ Algorithm cleverly addresses this by focusing on the odd-degree vertices of the MST. These vertices are the “problem points” hindering the direct construction of an Eulerian tour from the MST itself.
- Minimum Weight Perfect Matching: By finding a minimum weight perfect matching $M$ on the odd-degree vertices $U$, we are essentially “repairing” the MST to make all degrees even. We choose a minimum weight matching to keep the added cost as low as possible. The edges in $M$ can be thought of as “shortcuts” that efficiently connect the odd-degree vertices in pairs, making their degrees even without adding too much extra length.
- Eulerian Tour and Shortcutting: After adding the matching $M$ to the MST $T$, we obtain a multigraph $H$ in which all vertices have even degree. This guarantees the existence of an Eulerian tour. By shortcutting this Eulerian tour, we transform it into a Hamiltonian cycle, ensuring each city is visited exactly once, while leveraging the low-cost structure built from the MST and the matching.
Approximation Ratio Proof: Why 1.5-Approximation?
The crucial aspect of Christofides’ Algorithm is that it achieves a 1.5-approximation ratio. Let’s outline the proof that the length of the tour produced by Christofides’ Algorithm is guaranteed to be no more than 1.5 times the optimal TSP tour length $\mathrm{OPT}$.
- MST Lower Bound: We know that the weight of the MST is a lower bound on the optimal TSP tour length: $w(T) \le \mathrm{OPT}$.
- Bounding the Minimum Weight Perfect Matching $M$: Consider an optimal TSP tour. Let $U$ be the set of odd-degree vertices of $T$, and let $u_1, u_2, \dots, u_{2t}$ be the vertices of $U$ in the order in which they appear along the optimal tour (recall that $|U| = 2t$ is even). From this cyclic order we can form two perfect matchings on $U$:
  - $M_1 = \{u_1u_2,\; u_3u_4,\; \dots,\; u_{2t-1}u_{2t}\}$
  - $M_2 = \{u_2u_3,\; u_4u_5,\; \dots,\; u_{2t}u_1\}$ (wrap around)
  Both $M_1$ and $M_2$ are perfect matchings on $U$. By the triangle inequality, the weight of each matching edge $u_iu_{i+1}$ is at most the length of the segment of the optimal tour between $u_i$ and $u_{i+1}$. The tour segments corresponding to $M_1$ and those corresponding to $M_2$ are edge-disjoint and together cover the entire optimal tour.
  Therefore, $w(M_1) + w(M_2) \le \mathrm{OPT}$, so at least one of the two matchings has weight at most $\frac{1}{2}\mathrm{OPT}$. Since $M$ is a minimum weight perfect matching on $U$, its weight is at most the weight of either matching: $w(M) \le \min\{w(M_1), w(M_2)\} \le \frac{1}{2}\mathrm{OPT}$.
  Hence, we have the bound: $w(M) \le \frac{1}{2}\mathrm{OPT}$.
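In symbols, the matching bound amounts to the short chain:

```latex
\begin{align*}
w(M_1) + w(M_2) &\le \mathrm{OPT}
  \quad \text{(triangle inequality: matching edges shortcut tour segments)} \\
w(M) \le \min\{w(M_1), w(M_2)\}
  &\le \tfrac{1}{2}\bigl(w(M_1) + w(M_2)\bigr)
  \le \tfrac{1}{2}\,\mathrm{OPT}.
\end{align*}
```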
- Length of the Eulerian Tour $W$: The multigraph $H$ is formed by $T \cup M$. The length of the Eulerian tour $W$ in $H$ is the sum of the weights of the edges of $T$ and $M$: $w(W) = w(T) + w(M)$.
- Approximation Ratio: Combining the bounds for $w(T)$ and $w(M)$: $w(W) = w(T) + w(M) \le \mathrm{OPT} + \frac{1}{2}\mathrm{OPT} = \frac{3}{2}\mathrm{OPT}$.
- Hamiltonian Cycle via Shortcutting: Since shortcutting does not increase the length (due to the triangle inequality), the length of the Hamiltonian cycle $C$ obtained by shortcutting is no longer than $w(W)$: $w(C) \le w(W)$.
Combining all inequalities, we have: $w(C) \le w(W) = w(T) + w(M) \le \frac{3}{2}\mathrm{OPT}$.
This proves that Christofides’ Algorithm provides a 1.5-approximation for the Metric TSP.
Recent Advances and Limits of Approximation
For decades, Christofides’ Algorithm remained the best known polynomial-time approximation algorithm for the Metric TSP. It was a major open question whether a better approximation ratio could be achieved.
- Long-Standing Open Problem: For a long time, achieving an approximation ratio better than 1.5 for the Metric TSP was a significant challenge.
- Breakthrough in 2020: In September 2020, a breakthrough was achieved by Karlin, Klein, and Oveis Gharan. They presented a randomized polynomial-time algorithm with an approximation ratio of $\frac{3}{2} - \varepsilon$ for a tiny constant $\varepsilon > 0$ (on the order of $10^{-36}$). This is a very slight improvement over 1.5, but it was a major theoretical advancement, breaking the 1.5 barrier for the first time.
- Current Research: Research continues to explore whether even better approximation ratios are possible. However, there are also limitations.
- NP-Hardness of Better Approximation: It is known to be NP-hard to approximate the Metric TSP within a ratio better than roughly 1.008. This means that we cannot expect polynomial-time algorithms that get arbitrarily close to the optimal solution unless P = NP.
Therefore, while Christofides’ Algorithm is not optimal, it provides a practically useful and theoretically significant approximation guarantee for the Metric TSP, and further substantial improvements are likely to be very challenging to achieve due to complexity-theoretic barriers.
Continue here: 07 Graph Coloring, Greedy Coloring, Smallest-Last Heuristics, Block Graph Colorings