Lecture from: 07.11.2024 | Rui Zhangs Notes | Video: Videos ETHZ
Introduction to Graph Theory
The origins of graph theory can be traced back to Leonhard Euler’s groundbreaking solution to the “Königsberger Brückenproblem,” a classic puzzle posed in the 18th century. This problem laid the foundation for a new branch of mathematics that has become instrumental in understanding and modeling various complex systems.
The Königsberg Bridge Problem
Königsberg (now Kaliningrad) was a city with a river flowing through it, dividing the city into four distinct landmasses connected by seven bridges. The problem challenged people to find a route through the city that crossed each bridge exactly once and only once.
Euler recognized that the problem’s essence wasn’t the specific shape of the landmasses or the exact lengths of the bridges, but rather the connections between them. This led him to abstract the problem into a graph, a mathematical structure consisting of vertices (representing the landmasses) and edges (representing the bridges).
Euler’s solution to the Königsberg Bridge Problem hinged on the concept of an Eulerian path, also known as an Eulerian trail. This is a path in a graph that traverses each edge exactly once.
Eulerian Paths
An Eulerian path is a path in a graph that visits every edge of the graph exactly once. It can start and end at different vertices.
Example:
Degree of a Vertex
The degree of a vertex is the number of edges incident to it.
Example:
Euler’s Theorem
Euler proved a fundamental theorem that relates the existence of Eulerian paths to the degrees of vertices in a graph:
A graph has an Eulerian path if and only if it has at most two vertices with an odd degree.
Proof:
-
Necessity: If a graph has an Eulerian path, we can traverse the path, entering and leaving each vertex (except for the start and end vertices) using an even number of edges. Therefore, all vertices except possibly the start and end have even degrees.
-
Sufficiency: Assume a graph has at most two vertices with odd degrees.
-
Case 1: Zero odd-degree vertices: We can start at any vertex and construct an Eulerian circuit (a closed Eulerian path) by traversing the graph until we return to the starting vertex.
-
Case 2: Two odd-degree vertices: We can start at one of the vertices with odd degree, traverse the graph, and end at the other vertex with odd degree. This creates an Eulerian path.
-
Haus of Nikolaus
The “Haus of Nikolaus” puzzle is a classic graph theory problem that asks you to trace every edge of the house diagram exactly once without lifting your pen from the paper. This is equivalent to finding an Eulerian path (or an Eulerian circuit if the start and end points are the same).
Applying Euler’s Theorem:
To determine if an Eulerian path exists, we examine the degrees of the vertices in the “Haus of Nikolaus” graph. Notice that only the two bottom vertices have an odd degree (degree 3). All other vertices have an even degree (degree 2 or 4).
According to Euler’s theorem, a graph has an Eulerian path if and only if it has at most two vertices with odd degree. Since the “Haus of Nikolaus” graph satisfies this condition (it has exactly two vertices with odd degree), we can conclude that an Eulerian path exists. Furthermore, the path must start at one of the odd-degree vertices and end at the other.
Finding an Eulerian Path:
Because the two bottom vertices have odd degrees, any Eulerian path must start at one of these vertices and end at the other. There are multiple valid Eulerian paths for this graph. Finding one is often a matter of trial and error, systematically exploring possible routes until a solution is found.
Hamiltonian Path
A Hamiltonian path is a path in a graph that visits each vertex exactly once. Unlike an Eulerian path, which focuses on traversing edges, a Hamiltonian path focuses on visiting vertices. Finding a Hamiltonian path is generally a much harder computational problem than finding an Eulerian path. There’s no simple, efficient algorithm to determine if a graph has a Hamiltonian path.
Hamiltonian Cycle (or Circuit):
A Hamiltonian cycle (or Hamiltonian circuit) is a Hamiltonian path that is also a cycle, meaning it starts and ends at the same vertex.
The Challenge of Finding Hamiltonian Paths:
Determining whether a graph has a Hamiltonian path or cycle is a classic NP-complete problem. This means there is no known algorithm that can solve this problem in polynomial time for all possible graphs. As the number of vertices in a graph increases, the number of possible paths to check grows exponentially, making exhaustive search impractical for large graphs.
Algorithms for Eulerian/Hamiltonian Paths
Let us denote as the number of vertices or nodes, as the number of edges. Now suppose we wanted to find a Eulerian or Hamiltonian Path, a naive approach would be to brute force.
For the Eulerian Path, we’d check all permutations of the edges. There are permutations of edges. For each permutation, we’d check if it forms a valid Eulerian Path.
For the Hamiltonian Path, we’d check all permutations of the vertices. There are permutations of vertices. For each permutation, we’d check if it forms a valid Hamiltonian Path.
Brute-Force Complexity:
- Eulerian Path: The complexity of this brute-force approach for an Eulerian path is .
- Hamiltonian Path: The complexity of this brute-force approach for a Hamiltonian path is .
These brute-force algorithms are computationally extremely expensive, especially for graphs with a large number of vertices or edges. The factorial growth of the complexity makes these approaches impractical for even moderately sized graphs.
The natural question which follows is: Geht es besser? (Can we do better?)
For Eulerian paths, the answer is a resounding yes! There exist efficient algorithms that can find Eulerian paths (or circuits) in time, where is the number of vertices and is the number of edges. This is a significant improvement over the factorial complexity of brute-force approaches.
For Hamiltonian paths, however, the current consensus is that a polynomial-time algorithm is impossible. This is closely related to the famous problem.
Summary:
- Eulerian Paths: Efficient algorithms exist, allowing us to find Eulerian paths in polynomial time.
- Hamiltonian Paths: No efficient general algorithm is known, and it’s likely that no polynomial-time algorithm exists (assuming ).
Graphs
Graphs provide a powerful mathematical model for representing and analyzing networks and relationships within various systems. Examples of their applications include:
- Computer Networks: Modeling the structure of the internet, routing protocols, and data flow.
- Social Networks: Representing connections between individuals, analyzing social interactions, and studying information diffusion.
- Transportation Networks: Modeling road systems, airline routes, and public transportation networks for optimization and planning.
- Neural Networks: Representing the connections between neurons in artificial intelligence and machine learning models.
Mathematical Definition
A graph, denoted by , is formally defined as a pair , where:
- : A finite, non-empty set of vertices (also called nodes). Vertices represent the individual entities within the network.
- : A set of edges. Each edge represents a connection between two vertices.
Edges can be represented in different ways depending on the type of graph:
-
Undirected Graph: An edge is an unordered pair of distinct vertices, denoted as , where and . This means the connection is bidirectional (goes both ways). Vertices and are said to be adjacent or neighbors. The edge is said to be incident to both and .
-
Directed Graph (Digraph): An edge is an ordered pair of vertices, denoted as , where . This represents a directed connection from to . We say there is an edge from to .
Degree of a Vertex
- Undirected Graph: The degree of a vertex , denoted as , is the number of edges incident to .
- Directed Graph: We distinguish between in-degree (number of edges coming into ) and out-degree (number of edges going out of ).
Handshaking Lemma
The Handshaking Lemma states that the sum of the degrees of all vertices in a graph is equal to twice the number of edges.
Proof:
The Handshaking Lemma is a consequence of the fact that each edge connects exactly two vertices. Consider any edge in the graph:
- Counting Degree: This edge contributes 1 to the degree of each of the two vertices it connects.
- Double Counting: Therefore, when we sum up the degrees of all vertices, we are effectively counting each edge twice.
Types of Paths
TLDR
- Walk: Most general, allows repetition of vertices and edges.
- Path: No vertex repetition allowed, each vertex visited exactly once.
- Closed Walk/Cycle: Starts and ends at the same vertex, allowing repetition of vertices but not edges (except for the first and last edge, which are the same).
Walk (Weg)
A walk is a sequence of vertices where each consecutive pair is connected by an edge. You can visit the same vertex multiple times.
Formal Definition: A walk is a sequence , where:
- are vertices (nodes) in the graph.
- are edges connecting the vertices.
- is the starting vertex.
- is the ending vertex.
Path (Pfad)
A path is a walk where no vertex is visited more than once.
Formal Definition: A path is a walk where for all .
Closed Walk (Zyklus or Cycle)
A closed walk is a walk that starts and ends at the same vertex. This is also known as a cycle.
Formal Definition: A closed walk is a walk where and .
Observation: A walk is a closed walk if and only if the end vertex has an even degree in the walk.
- Closed walk: A walk is closed if and only if . This means the end vertex is visited twice in the walk (once at the start and once at the end).
- Even degree in the walk: The degree of the end vertex in the walk is the number of edges incident to it in the sequence. Since , every time we enter we must also exit it in the walk. Therefore, the degree of must be even.
Reaching Relation (Connectivity)
The reaching relation in a graph describes the ability to get from one vertex to another. Formally, we say that vertex reaches vertex if there exists a walk (a sequence of alternating vertices and edges) from to .
Properties of the Reaching Relation
This relation has important properties:
- Symmetric: If can reach , then can reach . This is because we can simply reverse the walk from to to obtain a walk from to . Note: This assumes an undirected graph. In a directed graph, the reaching relation is not necessarily symmetric.
- Reflexive: Every vertex can reach itself. The trivial walk consisting of just the vertex (with no edges) constitutes a walk from to .
- Transitive: If can reach , and can reach , then can reach . We can concatenate the walk from to with the walk from to to obtain a walk from to .
Equivalence Relation and Connected Components
Because the reaching relation is symmetric, reflexive, and transitive, it is an equivalence relation (see 09 Equivalency Relation and Classes, Partitions, Partially Ordered Sets). This has a significant consequence: it partitions the vertices of the graph into equivalence classes. These equivalence classes are called connected components (or sometimes “Zusammenhangskomponenten,” abbreviated as ZHK).
- Within a Connected Component: Every vertex within a connected component can reach every other vertex within the same component.
- Between Connected Components: No vertex in one connected component can reach a vertex in a different connected component.
Connected Graphs
A graph is said to be connected if there is a path between any two vertices. In terms of connected components, this means that a connected graph has exactly one connected component. All vertices belong to the same equivalence class, and they can all reach each other. Conversely, a disconnected graph has two or more connected components.
Eulerian Cycle (or Circuit)
An Eulerian cycle, also known as an Eulerian circuit, is a closed Eulerian path. This means it’s a trail that traverses every edge of a graph exactly once and returns to the starting vertex. The existence of Eulerian cycles, just like Eulerian paths, depends on the degrees of the vertices in the graph.
Cycle vs. Path:
An Eulerian cycle is a specific type of Eulerian path. Every Eulerian cycle is an Eulerian path, but the converse is not true—not every Eulerian path is an Eulerian cycle. A graph can possess an Eulerian path but lack an Eulerian cycle if it has precisely two vertices with odd degrees.
Euler’s Theorem for Eulerian Cycles
A connected graph has an Eulerian cycle if and only if every vertex in has an even degree.
Proof:
Part 1: Why even degrees are necessary (If there’s an Eulerian cycle, all degrees must be even)
Imagine you’re walking along the Eulerian cycle. Every time you enter a vertex, you have to leave it (since you can’t reuse edges). So, for every “incoming” edge to a vertex, there must be an “outgoing” edge. This means the edges at each vertex are paired up, resulting in an even degree. The only possible exception is the starting/ending vertex, but since the cycle is closed (starts and ends at the same place), even that vertex has an even degree.
Part 2: Why even degrees are sufficient (If all degrees are even, an Eulerian cycle exists)
This part is trickier, but the core idea is to keep building cycles and merging them until you’ve used all the edges.
- Start anywhere and wander: Begin at any vertex and walk along unused edges. Because every vertex has an even degree, you’ll never get stuck at a vertex without an exit (except when you return to your starting point). This creates a closed cycle.
- If you missed some edges, find a detour: If the cycle you made doesn’t include all the edges, find a vertex on your cycle that still has unused edges connected to it. Start a new walk from that vertex, again using only unused edges. This will create another cycle.
- Merge the detours: Combine the new cycle with your original cycle. Imagine “splicing” the new cycle into the original one at the vertex where you started the detour. Now you have a bigger cycle.
- Repeat until you’ve covered everything: Keep finding detours and merging them into your growing cycle until all edges have been used. Because all degrees are even, you’re guaranteed to eventually use up all the edges, and the final result will be an Eulerian cycle.
Finding Eulerian Cycles: A Step-by-Step Approach
Our goal is to find an Eulerian cycle in a graph, meaning a path that traverses every edge exactly once and returns to the starting vertex. We’ll achieve this by breaking down the problem into smaller, more manageable pieces:
- Subcycles: First, we’ll find individual cycles within the graph, ensuring that each cycle doesn’t repeat vertices.
- Merging: We’ll then connect these subcycles together to form a larger cycle.
Example:
We start by finding individual cycles:
Then, we connect these cycles to obtain a complete Eulerian cycle:
The Algorithm
1. walk
Function:
We define a walk
function that uses Depth-First Search (DFS) to find a cycle within the graph:
u.edges()
: Returns a list of edges connected to vertexu
.uv.marked()
: Checks if an edgeuv
has been marked as used in a cycle.- Recursive Call: If an unmarked edge
uv
is found, it is marked, and thewalk
function is recursively called on the other vertex (uv.v
) to continue exploring along the path.
Example:
The walk
function, starting from a vertex, will trace out a cycle and mark the edges:
Then, starting from a vertex with unmarked edges, we find another subcycle:
2. Finding and Merging Subcycles:
- Iteration: We repeatedly call
walk(u)
for vertices with unmarked edges until all edges are marked. This finds all subcycles. - Splicing: To combine the subcycles, we identify a common vertex in two cycles and “splice” them together at that point.
3. Eulerian Cycle:
By iteratively finding subcycles and splicing them together, we obtain a complete Eulerian cycle (if the graph is connected).
Properties and Correctness
Let’s rigorously establish the properties and correctness of the walk(u)
function for finding Eulerian cycles.
Properties:
walk(u)
marks a walk/trail starting at vertex . This is by construction. The function follows edges, marking them as it goes, creating a trail.- Every edge is marked at most once. The function only marks unmarked edges, ensuring no edge is marked twice.
- The final vertex of a walk has all its incident edges marked. The function only terminates when it reaches a vertex where all incident edges are already marked. Because all vertices have even degrees, this implies the final vertex must be the starting vertex, , making a cycle.
Invariant:
For all (all vertices in the graph), the number of unmarked edges incident to is even.
Assumptions (to be proven):
- Invariant Preservation: The
walk(u)
function preserves the invariant. In other words, if the invariant is true before callingwalk(u)
, it remains true after the function completes. - Cycle Guarantee: If the invariant is valid before calling
walk(u)
, then the walk produced by the function is a cycle.
If we can prove these assumptions, then we can prove the correctness of our Eulerian cycle algorithm.
Proof of Assumption 2 (Cycle Guarantee)
We’ll prove this by contradiction. Assume that the invariant holds before walk(u)
, but the walk produced is not a cycle. This means that the final vertex of , let’s call it , is different from the starting vertex .
-
Even Degree at : Since the invariant holds, has an even number of unmarked edges.
-
ends at : The
walk
function terminated at , meaning all edges incident to that were originally unmarked within the component containing are now marked. Therefore, must now have zero unmarked edges within the currently considered connected component. This must be the case because the walk terminates when there are no more unmarked adjacent edges to follow. -
Contradiction: If and the
walk
function terminated at , there must have been no unmarked edge to continue the walk from . However, since the walk started at and followed a series of unmarked edges to , an odd number of edges incident to would have been marked (because we arrived at through one edge), leaving an odd number of unmarked edges. This contradicts our previous observation that after the execution ofwalk
, has zero unmarked edges. Thus, our initial assumption that isn’t a cycle must be false, meaning is indeed a cycle if has an even number of unmarked incident edges at the start ofwalk(u)
.
Therefore, if the invariant holds before calling walk(u)
, the walk produced by the function must be a cycle.
In simpler terms: If every vertex has an even number of unused “exits,” and you start walking along these exits, you’ll eventually come back to where you started. You can’t get stuck anywhere else because every time you enter a vertex, there’s always another exit available. The only way the walk stops is if you return to your starting point, completing a cycle.
Proof of Assumption 1 (Invariant Preservation)
To prove that the walk(u)
function preserves the invariant. That is, if every vertex has an even number of unmarked edges before walk(u)
is called, then every vertex will still have an even number of unmarked edges after walk(u)
completes.
Proof:
-
Consider an arbitrary vertex
v
: Let’s focus on any vertexv
in the graph. -
Two Cases:
-
Case 1:
v
is not part of the walkW
: Ifv
is not part of the walkW
generated bywalk(u)
, then none of its incident edges are marked during the walk. Therefore, the number of unmarked edges incident tov
remains unchanged, and since it was even initially, it remains even. -
Case 2:
v
is part of the walkW
: Ifv
is part of the walkW
, then the walk enters and leavesv
a certain number of times. Each time the walk passes throughv
, two of its incident edges are marked (one incoming edge and one outgoing edge). Note: We are considering the version of thewalk
algorithm which continues looking for another available edge even if it returns to . This does not change the correctness of the final algorithm, it merely makes it easier to argue why the invariant is maintained since now it is more obvious that at the point of termination, all edges must be marked. The cycle will simply contain more than once.
-
-
Net Change in Unmarked Edges: In Case 2, let’s say the walk passes through
v
a total ofk
times. This means2k
edges incident tov
are marked. Since the number of unmarked edges was even initially, and we’ve marked an even number (2k
) of additional edges, the number of unmarked edges atv
remains even after the walk. The single exception is the starting node for which an odd number of incident edges will be marked after the first step ofwalk
. However, is revisited and another incident edge is marked when we discover there are no unmarked adjacent edges left to follow at . This way an even number of edges will be marked at as well. -
Conclusion: In both cases, the number of unmarked edges incident to
v
remains even after the execution ofwalk(u)
. Sincev
was an arbitrary vertex, this holds true for all vertices. Therefore,walk(u)
preserves the invariant.
Eulerweg Pseudocode:
Explanation and Improvements:
EulerWalk(G)
: This is the main function. It initializes edges as unmarked, finds subcycles usingwalk(v)
, and then merges them.walk(u)
: This function now returns an explicit list of edges. Alternatively, we could define it to store the discovered cycle in some implicit global storage and then just return the starting vertex which would suffice to then deduce the found cycle. This would avoid copying the list of edges when returning from the function and could save some runtime in practice.merge_cycles
: This function takes two cycles and merges them at a common vertex using array slicing or linked lists for better performance in practice.find_common_vertex
: Uses a simple linear scan in worst-case to find a common vertex but mentions optimization potential using a hashmap. This operation is likely very fast in practice since the length of cycles likely only amounts to a small fraction of . Consider also that each merge reduces the number of cycles to merge by one, implying a triangular number complexity , but with a very small constant factor.- Edge Marking: Efficient edge marking is crucial. The
marked
property could be implemented using a hash table or a bitset for constant-time () checking and marking. This contributes significantly to the overall linear time complexity. - Cycle Representation: How cycles are represented (
cycles
list) matters for efficiency. Using linked lists or other dynamic data structures could improve the performance of merging cycles. - Connected Components: The code assumes the graph is connected. Handling disconnected graphs requires some adjustments. You would need to call
EulerWalk
on each connected component and then, if necessary, return a list of Eulerian cycles (one for each component).
Runtime Analysis
Let be the number of vertices and be the number of edges.
walk(u)
: Traverses a set of edges at most once, marking them. Complexity: .- Finding Subcycles: Each call to
walk(u)
finds a cycle and marks a set of previously unmarked edges. Since each edge is visited exactly once, the total complexity of finding all subcycles is . - Merging Subcycles: Finding connecting vertices and splicing cycles can be done efficiently using appropriate data structures. Overall merging complexity: in the worst-case.
The dominant steps are finding and merging subcycles. Thus, the overall time complexity of the Eulerian cycle algorithm is , which is linear in the size of the graph.
Continue here: 10 Topological Ordering, Directed Graphs, Representing Graphs on Computers, Topological Sort Algorithm