Lecture from: 05.12.2024 | Rui Zhang's Notes | Video: Videos ETHZ

Last time, we covered Minimum Spanning Trees (MSTs) and discussed Borůvka’s and Prim’s algorithms. Today, we’ll move on to Kruskal’s algorithm.

Brief Recap of Prim’s Algorithm

Prim’s algorithm constructs an MST by starting with a single vertex and iteratively adding the cheapest edge connecting the current MST to a vertex not yet included. It uses a priority queue to efficiently find the vertex with the minimum key (cheapest connection cost) and an array d[] to store these key values for direct access.

Execution:

  1. Initialization: An arbitrary vertex is added to the MST. All other vertices have their keys initialized to infinity (or the direct connection cost if an edge exists).

  2. Iteration: The vertex $u$ with the smallest key is extracted from the priority queue and added to the MST. For each neighbor $v$ of $u$ outside the MST, if the edge $(u, v)$ offers a cheaper connection ($w(u, v) < d[v]$), then $d[v]$ and $v$’s priority in the queue are updated (using decrease_key). This ensures we always consider the cheapest connection to the growing MST.

The d values in the example represent the smallest edge weight connecting each node to the growing MST. These values, along with the edge information updated during the algorithm, allow for MST reconstruction. The priority queue’s efficient extract_min and decrease_key operations enable Prim’s $O((n + m) \log n)$ runtime (with a binary heap).

Priority Queue Invariant

The priority queue maintains the following invariant: it contains precisely the vertices not yet in the MST, i.e., $V \setminus S$, where $V$ is the set of all vertices and $S$ is the set of vertices currently in the MST. This ensures that the algorithm always has access to the vertices eligible to be added to the growing MST.

While Loop Invariant: The Meaning of d[v]

The while loop in Prim’s algorithm maintains an invariant concerning the values stored in d[v]:

d[v] represents the weight of the cheapest edge connecting vertex $v$ to the current MST ($S$). More formally:

$$d[v] = \min\{\, w(u, v) \mid u \in S,\ \{u, v\} \in E \,\}, \qquad d[v] = \infty \text{ if no such edge exists.}$$

This invariant ensures that d[v] always reflects the minimum cost to reach vertex v from the current MST. The use of infinity for unreachable vertices ensures that they are not prematurely selected by the algorithm. The invariant is needed for the algorithm’s correctness, as it guarantees that we are always adding the cheapest possible edge to expand the MST, ultimately leading to a minimum spanning tree. Each time a vertex is added to the MST, this invariant is maintained by updating the d[v] values of its neighbors that are not yet in the MST.
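To make the recap concrete, here is a minimal Python sketch of Prim's algorithm (an illustration, not the lecture's pseudocode). It assumes an adjacency-dict representation graph[u] = [(v, w), ...]; since Python's heapq has no decrease_key, the sketch re-pushes updated entries and skips stale ones, a standard workaround with the same asymptotics.

import heapq

def prim(graph, start):
    # Assumed representation: graph[u] = list of (v, w) neighbor pairs.
    d = {v: float('inf') for v in graph}   # d[v]: cheapest known edge weight into the MST
    parent = {v: None for v in graph}      # edge information for MST reconstruction
    d[start] = 0
    pq = [(0, start)]                      # priority queue of (key, vertex) pairs
    S = set()                              # vertices already in the MST
    while pq:
        key, u = heapq.heappop(pq)         # extract_min
        if u in S:
            continue                       # stale entry (stands in for decrease_key)
        S.add(u)
        for v, w in graph[u]:
            if v not in S and w < d[v]:    # cheaper connection to the MST found
                d[v] = w                   # maintain the invariant on d[v]
                parent[v] = u
                heapq.heappush(pq, (w, v)) # re-push instead of decrease_key
    return {(parent[v], v) for v in graph if parent[v] is not None}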

Kruskal’s Algorithm: Building an MST with Sorted Edges

Kruskal’s algorithm offers a different approach to constructing MSTs compared to Prim’s. It focuses on globally sorting edges by weight and iteratively adding safe edges that don’t create cycles. This global perspective contrasts with Prim’s locally optimal approach.

Core Idea: Kruskal’s algorithm leverages the fact that the cheapest edge in the entire graph is always a safe edge. It extends this by iteratively considering edges in increasing order of weight, adding them to the MST as long as they don’t introduce cycles.

Algorithm Description:

  1. Initialization: Start with an empty set $F$ to represent the MST edges. Initially, each vertex exists as its own separate connected component.

  2. Iteration:

    • Sort all edges in the graph by weight in non-decreasing order.
    • For each edge $\{u, v\}$ in sorted order:
      • If adding $\{u, v\}$ to $F$ does not create a cycle (i.e., if $u$ and $v$ are in different connected components):
        • Add $\{u, v\}$ to $F$.
        • Merge the connected components containing $u$ and $v$.

How to Detect Cycles Efficiently: To determine if adding an edge creates a cycle, we need to check whether the two vertices it connects belong to the same connected component. This can be done efficiently using a Union-Find data structure (also known as a disjoint-set data structure), which we’ll look at in detail later on.

Union-Find supports two key operations:

  • find(v): Returns a representative element for the connected component containing vertex v.
  • union(u, v): Merges the connected components containing vertices u and v.

By using find, we can check if two vertices belong to the same component: find(u) == find(v). If they do, adding the edge would create a cycle.

Pseudocode:

def kruskal(G):
    make(G.V)  # Union-Find: each vertex starts as its own component
    F = set()  # F collects the MST edges
    for (u, v, weight) in sorted(G.edges, key=lambda e: e[2]):  # sort edges by weight
        if find(u) != find(v):  # different components: adding (u, v) creates no cycle
            union(u, v)
            F.add((u, v))
    return F

Proving Correctness

We can demonstrate the correctness of Kruskal’s algorithm using two approaches: induction and the Cut Property (Schnittprinzip).

Proof by Induction

Base Case: With no edges selected, $F = \emptyset$ is trivially a subset of every MST.

Inductive Hypothesis: Assume that after the first $k$ edges have been considered, the set $F$ of selected edges is a subset of some MST.

Inductive Step: Let $e = \{u, v\}$ be the next edge considered (the $(k+1)$-th edge). If adding $e$ creates a cycle, it’s discarded, and the hypothesis holds. If it doesn’t create a cycle:

  1. $e$ connects two different connected components (by the algorithm’s design).
  2. Consider a cut that separates these two components. $e$ is the cheapest edge crossing this cut (because Kruskal’s algorithm considers edges in sorted order, every cheaper edge has already been processed).
  3. By the Cut Property, $e$ belongs to some MST. Therefore, adding it to $F$ maintains the inductive hypothesis: $F \cup \{e\}$ remains a subset of an MST.

Conclusion: By induction, when all edges have been considered, the final set $F$ is a subset of some MST; since $F$ also connects all vertices, it is itself a minimum spanning tree.

Proof using the Cut Property (Schnittprinzip)

Let’s analyze the moment Kruskal’s algorithm considers adding an edge $e = \{u, v\}$.

  1. Disjoint Components: If $u$ and $v$ are in different connected components, adding $e$ does not create a cycle. Consider the cut that separates these components. Since Kruskal’s algorithm considers edges in increasing weight order, $e$ must be the cheapest edge crossing this cut.

  2. Cut Property Guarantees Safety: The Cut Property states that the cheapest edge crossing any cut must belong to some MST. Therefore, $e$ is a safe edge to add.

  3. Contradiction Argument: Suppose a cheaper edge $e'$ crossed this cut. Kruskal’s algorithm would have considered $e'$ before $e$ due to the sorting. If $e'$ didn’t create a cycle when it was considered, it would already have been added, and $u$ and $v$ would be in the same component, contradicting our initial assumption. If it did create a cycle at that time, it would have correctly been discarded.

Conclusion: Each edge added by Kruskal’s algorithm is guaranteed by the Cut Property to be part of an MST. Since the algorithm continues until all vertices are connected, the final result is a minimum spanning tree. It’s important to emphasize that the sorted order of edge consideration is crucial for this argument to hold.

Runtime Analysis of Kruskal’s Algorithm

Outer Loop Iterations: The outer loop iterates at most $m$ times (once for each edge), where $m$ is the number of edges. In the worst case, all edges must be examined.

Inner Loop (without Union-Find): Without using a Union-Find data structure to efficiently manage connected components, checking for cycles would require a graph traversal (e.g., DFS or BFS over the forest built so far) for each edge, potentially taking $O(n)$ time per edge. This would lead to a total runtime of $O(m \cdot n)$, which is not optimal. Remember that $n$ represents the number of vertices.

Optimization with Union-Find: Employing a Union-Find data structure dramatically improves efficiency. Using path compression and union by rank, find and union operations take nearly constant amortized time (almost $O(1)$). This reduces the time complexity of the inner loop significantly. We will analyze the Union-Find data structure in more detail later. For now, it’s sufficient to understand that it allows for near-constant-time cycle detection. This optimization is crucial for making Kruskal’s algorithm practical for large graphs.

Overall Runtime (with Union-Find):

The dominant factor in the runtime becomes the initial sorting of the edges, which takes $O(m \log m)$ time. With the efficient Union-Find structure, the remaining operations take nearly linear time in the number of edges.

Therefore, the overall time complexity of Kruskal’s algorithm with Union-Find is $O(m \log m)$. Since $m \le n^2$, we also have $\log m \le 2 \log n$, so we can state the bound as $O(m \log n)$, which is the same complexity as Prim’s algorithm.

Using Union-Find is essential for making Kruskal’s algorithm competitive with other MST algorithms.

Union-Find Data Structure

The Union-Find data structure, also known as a disjoint-set data structure, is crucial for efficiently managing connected components in algorithms like Kruskal’s. It enables fast operations to determine if two elements belong to the same set (connected component) and to merge two sets.

Operations

  • make(V): Initializes the data structure with each vertex v in V as a separate connected component.
  • same(u, v): Checks if vertices u and v belong to the same connected component.
  • union(u, v): Merges the connected components containing vertices u and v.

Implementation and Storage: Representing Connected Components

Representative-Based Approach

We can represent connected components by assigning a unique representative to each component. Let rep[v] denote the representative of the component containing vertex v. If two vertices have the same representative, they are in the same connected component.

In this illustration, circled nodes are representatives. An arrow from a node to a circled node indicates that the node’s rep value is that representative: rep[x] = r for an arrow from x to the circled node r.

Invariant: rep[u] = rep[v] if and only if u and v are in the same connected component ($\mathrm{CON}(u) = \mathrm{CON}(v)$). This allows for constant-time (O(1)) checking of component membership using array lookups.

Implementing the Operations

make(V): Initialization sets each vertex as its own representative (rep[v] = v for all v in V). This takes $O(n)$ time.

same(u, v): Checks if rep[u] == rep[v], which takes $O(1)$ time.
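As a quick illustration of the representative-based storage, here is a minimal sketch (a Python dict stands in for the rep array):

def make(V):
    return {v: v for v in V}    # rep[v] = v: every vertex is its own representative

def same(rep, u, v):
    return rep[u] == rep[v]     # O(1): just compare the two representatives

rep = make({'a', 'b', 'c'})
print(same(rep, 'a', 'b'))      # False: after make(V), all components are separate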

union(u, v): Merging Components

Variant 1: Iterating Through All Vertices - Inefficient

Iterate through all vertices and update the representative of those in the same component as u to rep[v].

r = rep[u]  # save u's representative first: rep[u] itself changes during the loop
For every x ∈ V:
    if rep[x] == r:
        rep[x] = rep[v]

This approach is inefficient, taking $\Theta(n)$ time for each union operation.

Variant 2: Using Membership Lists

Introduce members[r], a list storing all members of the component with representative r.

Invariant: members[rep[u]] lists all vertices in CON(u).

Union Operation:

ru = rep[u]; rv = rep[v]  # save the representatives: rep[u] changes during the loop
For x in members[ru]:
    rep[x] = rv
    members[rv] = members[rv] ∪ {x}  // using set union

This takes $O(|\mathrm{CON}(u)|)$ time, proportional to the size of u’s component.

Worst-Case Scenario:

In a linear graph, the $i$-th union could update $i - 1$ elements, leading to a total time of $\sum_{i=1}^{n-1} (i-1) = \Theta(n^2)$ for a sequence of $n - 1$ unions. This is still quadratic.

Towards a More Efficient Solution:

While Variant 2 isn’t optimal, it suggests improvements. We’ll explore these optimizations next, leading to a much more efficient Union-Find implementation. A key idea will be to minimize the size of the members list that needs updating during the union operation.

Variant 3: Union by Size (Rank)

To improve upon Variant 2, we introduce a simple but powerful optimization: union by size (or rank). Instead of arbitrarily merging the members lists, we always attach the smaller connected component to the larger one. This minimizes the number of rep values that need updating. In practice, we often use “rank,” an upper bound on the tree height, as a proxy for size, but the principle is the same.

Modification: Store the size (or rank) of each component in an array size[], where size[r] represents the number of elements in the component with representative r.

Union Operation with Union by Size:

ru = rep[u]; rv = rep[v]              # save both representatives before the merge
if size[ru] < size[rv]:
    for x in members[ru]:             # iterate over the smaller component
        rep[x] = rv
    members[rv] = members[rv] ∪ members[ru]
    size[rv] = size[rv] + size[ru]
else:
    for x in members[rv]:
        rep[x] = ru
    members[ru] = members[ru] ∪ members[rv]
    size[ru] = size[ru] + size[rv]

This modification ensures that we always iterate over the smaller members list, minimizing the number of updates.
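Putting Variants 2 and 3 together, here is a minimal runnable Python sketch (the class wrapper and dict-based storage are illustrative choices; the fields follow the rep, members, and size names from the text), together with the Kruskal loop from earlier:

class UnionFind:
    def __init__(self, V):                      # make(V)
        self.rep = {v: v for v in V}            # representative of each vertex
        self.members = {v: [v] for v in V}      # membership list per representative
        self.size = {v: 1 for v in V}           # component sizes

    def find(self, v):
        return self.rep[v]                      # O(1) lookup

    def union(self, u, v):
        ru, rv = self.rep[u], self.rep[v]
        if ru == rv:
            return                              # already in the same component
        if self.size[ru] > self.size[rv]:
            ru, rv = rv, ru                     # make ru the smaller component
        for x in self.members[ru]:              # relabel only the smaller side
            self.rep[x] = rv
        self.members[rv].extend(self.members[ru])
        self.size[rv] += self.size[ru]

# Usage with Kruskal's loop on a toy graph (edges given as (u, v, weight)):
edges = [('a', 'b', 1), ('b', 'c', 2), ('a', 'c', 3)]
uf = UnionFind({'a', 'b', 'c'})
mst = []
for u, v, w in sorted(edges, key=lambda e: e[2]):
    if uf.find(u) != uf.find(v):                # different components: no cycle
        uf.union(u, v)
        mst.append((u, v))
# mst is now [('a', 'b'), ('b', 'c')]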

Worst-Case Analysis

In the worst-case scenario, a single union operation can still take $\Theta(n)$ time, even with union by size. This happens when the smaller set in the union itself has a large number of elements, causing a large number of representative updates.

In the linear graph example, the individual runtime of a single union call is indeed $\Theta(n/2)$, which simplifies to $\Theta(n)$, because in the worst case we might need to update the representatives of roughly half the nodes in the graph. While union by size improves the overall performance over a sequence of operations, it doesn’t eliminate the possibility of a linear-time single union in specifically constructed cases.

Amortized Analysis: Aiming for $O(n \log n)$ Total Runtime

We want to analyze the total runtime of all union operations required to build the MST using amortized analysis. Our goal is to show that this total runtime is $O(n \log n)$, even though individual union operations can be $\Theta(n)$ in the worst case.

Proof:

Let $k_u$ be the number of times the representative of node $u$ changes throughout the algorithm. Since the work of each union operation is proportional to the number of representative updates it performs, the total runtime of all union operations can be bounded by:

$$O\Big(\sum_{u \in V} k_u\Big)$$

Bounding $k_u$: How many times can a node’s representative change? With union by size, every time a node’s representative changes, the size of its connected component at least doubles. This is because we always merge the smaller component into the larger one.

Using this insight:

  • Component Size Doubling: Each time rep[u] changes, u becomes part of a component at least twice as large.
  • Maximum Changes: Since the maximum size of a component is $n$, rep[u] can change at most $\log_2 n$ times. A component can double in size at most a logarithmic number of times before reaching the maximum size.

Therefore, $k_u \le \log_2 n$ for every node $u$.

Combining this crucial bound on $k_u$ with the bound above, the total runtime of all union operations is $O\big(\sum_{u \in V} k_u\big) = O(n \log n)$, the desired amortized time complexity.

Runtime of Kruskal’s Algorithm: A Summary

Kruskal’s algorithm builds an MST by iteratively adding the cheapest edges that don’t create cycles. Its runtime is primarily determined by two factors:

  1. Sorting the Edges: Initially, all edges must be sorted by weight, which takes $O(m \log m)$ time, or equivalently $O(m \log n)$, since $m \le n^2$ implies $\log m \le 2 \log n$.

  2. Union and Find Operations: For each edge, we check if adding it creates a cycle using the Union-Find data structure. With efficient path compression and union by rank (as will be explained later), the amortized time complexity of each union and find operation becomes nearly constant, $O(\alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function, which grows extremely slowly and can be considered practically constant for all realistic input sizes.

Overall Runtime:

Combining these, Kruskal’s algorithm achieves a time complexity of:

$$O(m \log m + m \cdot \alpha(n)) = O(m \log n)$$

The edge sorting dominates the runtime. This is the same complexity as Prim’s algorithm, making Kruskal’s a competitive choice for finding MSTs. The efficiency of Kruskal’s algorithm heavily relies on the optimized Union-Find data structure. Without these optimizations in Union-Find, the runtime would be significantly worse.

Self-Adapting Data Structures and Amortized Analysis

Traditional worst-case analysis can be pessimistic for data structures that adapt to usage patterns. Amortized analysis provides a more realistic performance measure by averaging the cost of operations over a sequence. A self-adapting data structure modifies its structure based on past requests to improve future performance.

Linear Search in a Linked List: A Motivating Example

Consider searching for elements in a linked list. The cost of a search is proportional to the element’s position in the list. Given a sequence of search requests (Anfragen) $s_1, s_2, \ldots, s_m$, the optimal list arrangement would place frequently accessed elements near the front.

Let $h_i$ be the number of times the $i$-th element is requested, with $h_1 \ge h_2 \ge \cdots \ge h_n$, where $n$ is the number of distinct elements. The total cost for an optimal static list (elements ordered by decreasing request frequency) would be:

$$\mathrm{cost}_{\mathrm{OPT}} = \sum_{i=1}^{n} i \cdot h_i$$

However, we typically don’t know the frequencies in advance. This is where self-adapting data structures come in.

Self-Adapting Data Structures: Dynamically Adjusting for Efficiency

Self-adapting data structures address the limitation of needing prior knowledge by rearranging elements based on usage. One simple strategy is Move-to-Front (MTF).

Moving an accessed element to the front of the list takes constant time, $O(1)$, as it only involves pointer manipulation.
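Here is a small Python sketch of the Move-to-Front strategy (the class name is illustrative; a plain Python list stands in for the linked list, so the move itself costs O(n) here rather than the O(1) pointer update described above):

class MoveToFrontList:
    def __init__(self, elements):
        self.items = list(elements)

    def access(self, x):
        i = self.items.index(x)                   # linear search: cost = position + 1
        self.items.insert(0, self.items.pop(i))   # move the accessed element to the front
        return i + 1                              # report the search cost

lst = MoveToFrontList(['a', 'b', 'c', 'd'])
print([lst.access('d'), lst.access('d'), lst.access('a')])   # costs: [4, 1, 2]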

Move-to-Front (MTF): Competitive Analysis

Note for readers: The lecturer kinda rushed through so idk how correct the following is…

MTF’s efficiency is analyzed by comparing it to the optimal static list arrangement—a theoretical benchmark where element frequencies are known in advance.

Claim: The total cost of MTF on any sequence of requests is at most twice the cost of the optimal static list:

$$\mathrm{cost}_{\mathrm{MTF}} \le 2 \cdot \mathrm{cost}_{\mathrm{OPT}}$$

Proof:

Let $t_{ij}$ be the number of times element $i$ is traversed while searching for element $j$ (for $i \neq j$).

Key Insight:

$$t_{ij} + t_{ji} \le 2 \cdot \min(h_i, h_j)$$

This holds because for $i$ to be encountered before $j$ in a search for $j$, $i$ must have been requested at least once since $j$ was last requested (MTF moved it to the front, past $j$). The number of times this can happen is limited by the smaller of the two frequencies.

Total Cost of MTF: The total cost of MTF is the sum of the costs of all searches, where a search for $j$ costs one step for $j$ itself plus one step for every element traversed in front of it:

$$\mathrm{cost}_{\mathrm{MTF}} = \sum_{j} h_j + \sum_{j} \sum_{i \neq j} t_{ij}$$

We can bound this using the key insight. To simplify, consider the pairs $(i, j)$ and $(j, i)$ together:

$$\sum_{j} \sum_{i \neq j} t_{ij} = \sum_{i < j} (t_{ij} + t_{ji}) \le \sum_{i < j} 2 \min(h_i, h_j)$$

Then, since the optimal static cost can be rewritten as $\mathrm{cost}_{\mathrm{OPT}} = \sum_i i \cdot h_i = \sum_i h_i + \sum_{i < j} \min(h_i, h_j)$,

$$\mathrm{cost}_{\mathrm{MTF}} \le \sum_i h_i + 2 \sum_{i < j} \min(h_i, h_j) \le 2 \Big( \sum_i h_i + \sum_{i < j} \min(h_i, h_j) \Big) = 2 \cdot \mathrm{cost}_{\mathrm{OPT}},$$

which is twice the cost of the optimal static list, proving the claim. This demonstrates that even a simple self-adjusting strategy like MTF can achieve a competitive performance compared to the theoretically optimal static arrangement. This analysis highlights the power of amortized analysis in assessing the effectiveness of self-adapting data structures.

Continue here: 14 All Source Shortest Paths, Floyd-Warshall, Johnson’s Algorithm, Matrices and Graphs, Connected Components in Digraphs, Efficient Matrix Operations