Lecture from: 14.11.2024 | Rui Zhang's Notes | Video: Videos ETHZ
Topological Ordering
Imagine a directed graph representing a set of tasks and their dependencies. Vertices represent tasks, and a directed edge $(u, v)$ indicates that task $u$ must be completed before task $v$ can begin. A topological ordering (or topological sort) of this graph is a linear ordering of the vertices such that for every directed edge $(u, v)$, vertex $u$ comes before vertex $v$ in the ordering. Essentially, it's a valid sequence in which to perform the tasks.
Existence of Topological Orderings
Not all directed graphs have a topological ordering. The presence of directed cycles prevents a valid topological sort.
Example of a graph with no topological ordering:
In this circular dependency, it’s impossible to start any task because each task depends on another.
General Rule
A directed graph has a topological ordering if and only if it is a Directed Acyclic Graph (DAG), meaning it contains no directed cycles.
Proof
Let's prove that a directed graph has a topological sort if and only if it contains no directed cycles. We can express this formally as:

$G$ has a topological ordering $\iff$ $G$ is a DAG (Directed Acyclic Graph).
1. ($\Rightarrow$) If $G$ has a topological ordering, then $G$ is a DAG:

We will prove this direction by contraposition. Suppose $G$ is not a DAG, meaning it contains a directed cycle. Let the cycle be $v_1 \to v_2 \to \dots \to v_k \to v_1$. Assume, for the sake of contradiction, that a topological ordering exists. In this ordering, there must be some vertex of the cycle, say $v_i$, that comes first. However, since $v_i$ is part of a cycle, there's an edge $(v_{i-1}, v_i)$ (where $v_{i-1}$ is the preceding vertex in the cycle, or $v_k$ if $i = 1$). This means $v_{i-1}$ must come before $v_i$ in any topological ordering, contradicting our assumption that $v_i$ comes first among the cycle's vertices. Hence, a graph with a directed cycle cannot have a topological ordering.
TLDR: Look at the example above…
2. ($\Leftarrow$) If $G$ is a DAG, then $G$ has a topological ordering:
We will prove this direction constructively by outlining an algorithm to produce a topological ordering for any DAG. This will be covered in the following sections.
Examples of Topological Sorting
Topological sorting finds applications in various algorithms and scenarios where dependencies between tasks or operations exist. Here are a few examples:
- Karatsuba’s Multiplication Algorithm: The recursive nature of Karatsuba’s algorithm can be visualized as a dependency graph, where the computation of smaller products must precede the calculation of the final product.
- Fibonacci Number Calculation: Computing Fibonacci numbers recursively also exhibits dependencies, as the $n$-th Fibonacci number depends on the $(n-1)$-th and $(n-2)$-th numbers.
- Edit Distance Problem: Calculating the edit distance between two strings involves a dynamic programming approach, where the value of a cell in the table depends on the values of its neighboring cells.
Directed Graphs
A directed graph (or digraph) is similar to an undirected graph, but its edges are ordered pairs.
- Edges as Ordered Pairs: An edge in a digraph is represented as an ordered pair $(u, v)$, where $u, v \in V$ and $u \neq v$. This signifies a directed edge from $u$ to $v$.
- No Parallel Edges (in this definition): The provided definition disallows parallel edges, meaning there can be at most one edge from $u$ to $v$.
- Predecessors and Successors: If $(u, v)$ is an edge in the digraph, we say $u$ is a direct predecessor (or parent) of $v$, and $v$ is a direct successor (or child) of $u$.
Degree of a Vertex
In directed graphs, we distinguish between in-degree and out-degree:
- In-degree ($\deg_{\text{in}}(v)$): The number of edges directed into vertex $v$.
- Out-degree ($\deg_{\text{out}}(v)$): The number of edges directed out of vertex $v$.
Special Cases:
- Source (or Quelle): A vertex $v$ with $\deg_{\text{in}}(v) = 0$ is called a source. It has no incoming edges.
- Sink (or Senke): A vertex $v$ with $\deg_{\text{out}}(v) = 0$ is called a sink. It has no outgoing edges.
Walks, Paths, and Cycles
Similar to undirected graphs, we define walks, paths, and cycles in digraphs, but with the added constraint of direction:
- Directed Walk: A sequence of alternating vertices and edges $v_0, e_1, v_1, \dots, e_k, v_k$ where each $e_i = (v_{i-1}, v_i)$ is a directed edge.
- Directed Path: A directed walk in which all vertices are distinct.
- Directed Cycle: A directed walk that starts and ends at the same vertex and in which all other vertices are distinct.
Representing Graphs on a Computer
Storing graphs efficiently on a computer is crucial for implementing graph algorithms. Let's consider a graph with $n$ vertices and $m$ edges. We'll examine two common representations: adjacency matrices and adjacency lists.
Suppose we want to store this graph:
Adjacency Matrix
An adjacency matrix represents the graph as an $n \times n$ matrix $A$, where:
- $A[u][v] = 1$ if there's an edge from $u$ to $v$.
- $A[u][v] = 0$ otherwise.
Example:
Operations and Runtime
| Operation | Runtime | Intuition |
| --- | --- | --- |
| Check edge $(u, v)$ | $O(1)$ | Direct access to matrix element `A[u][v]`. |
| Find neighbors of $u$ | $O(n)$ | Iterate through row `A[u]`. |
| Add/remove edge $(u, v)$ | $O(1)$ | Change matrix element `A[u][v]`. |
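These operations can be sketched in a few lines of Python. The 4-vertex graph and its edge set below are illustrative placeholders, not the graph from the figure:

```python
# Adjacency matrix for a hypothetical 4-vertex digraph with edges
# (0,1), (0,2), (1,2), (2,3). A[u][v] == 1 iff there is an edge u -> v.
n = 4
A = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[u][v] = 1

def has_edge(A, u, v):
    """O(1) edge lookup: direct access to the matrix entry."""
    return A[u][v] == 1

def successors(A, u):
    """O(n) neighbor enumeration: scan the whole row u."""
    return [v for v in range(len(A)) if A[u][v] == 1]
```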
Pros and Cons
| Pros | Cons |
| --- | --- |
| Simple implementation. | Space-inefficient ($O(n^2)$), especially for sparse graphs ($m \ll n^2$). |
| Constant-time edge lookups. | Linear time to find neighbors. |
| Suitable for dense graphs ($m$ close to $n^2$). | Can be wasteful for graphs with few edges. |
Adjacency List
An adjacency list represents a graph as an array of lists. Each list corresponds to a vertex $u$ and stores the vertices $v$ such that $(u, v)$ is an edge. Often, these lists are implemented using linked lists for efficient insertion and deletion of edges.
Example:
Operations and Runtime
| Operation | Runtime | Intuition |
| --- | --- | --- |
| Check edge $(u, v)$ | $O(\deg_{\text{out}}(u))$ | Traverse the adjacency list of $u$. |
| Find neighbors of $u$ | $O(1)$ to access the list | Directly access the adjacency list of $u$. |
| Add edge $(u, v)$ | $O(1)$ | Add $v$ to $u$'s linked list. |
| Remove edge $(u, v)$ | $O(\deg_{\text{out}}(u))$ | Search and remove $v$ from $u$'s list. |
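A minimal Python sketch of the same operations with an adjacency list (again a hypothetical 4-vertex graph; plain Python lists stand in for the linked lists):

```python
# Adjacency list for a hypothetical digraph: adj[u] holds every v
# with an edge u -> v.
n = 4
adj = [[] for _ in range(n)]
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    adj[u].append(v)          # O(1) edge insertion

def has_edge(adj, u, v):
    """O(deg_out(u)) lookup: traverse u's neighbor list."""
    return v in adj[u]

def successors(adj, u):
    """Direct access to the whole neighbor list of u."""
    return adj[u]
```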
Pros and Cons
| Pros | Cons |
| --- | --- |
| Space-efficient for sparse graphs ($O(n + m)$). | Edge lookup can take up to $O(n)$ time. |
| Efficient for iterating over neighbors. | Slightly more complex implementation than adjacency matrix. |
| Suitable for most graph algorithms. | Less efficient for very dense graphs. |
Choosing the Right Representation
- Adjacency Matrix: Simpler, constant-time edge lookup, suitable for dense graphs.
- Adjacency List: More space-efficient for sparse graphs, efficient neighbor iteration, more commonly used in practice.

The choice between the two depends on the specific application and the characteristics of the graph being represented.
Topological Sort Algorithm
Let’s explore a topological sort algorithm and its connection to the absence of directed cycles.
Core Idea: The algorithm repeatedly finds and removes sinks (vertices with no outgoing edges) from the graph.
- Find a Sink: Locate a vertex $v$ with $\deg_{\text{out}}(v) = 0$.
- Append to List: Add the sink to the end of the topological ordering list.
- Remove and Recurse: Remove the sink and all its incoming edges from the graph. Repeat the process on the remaining subgraph.
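The three steps above can be sketched as a naive Python implementation (function and variable names are my own; the lecture's pseudocode may differ):

```python
def topo_sort_by_sinks(n, edges):
    """Repeatedly find a sink (out-degree 0) among the remaining
    vertices, prepend it to the ordering, and delete it together with
    its incoming edges. Returns None if no sink exists at some step,
    which happens exactly when the graph contains a directed cycle."""
    edges = set(edges)
    remaining = set(range(n))
    order = []
    while remaining:
        # a sink has no outgoing edge among the remaining vertices
        sink = next((u for u in remaining
                     if all(a != u for (a, b) in edges)), None)
        if sink is None:
            return None              # every vertex has an outgoing edge
        order.insert(0, sink)        # prepend the sink to the ordering
        remaining.discard(sink)
        edges = {(a, b) for (a, b) in edges if b != sink}
    return order
```

This repeats a linear sink search per removal, so it is slow; it is meant only to make the correctness argument concrete.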
Initial Approach (using `path`)

The `path` function aims to find a sink by exploring a path from a given vertex:
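The following is one plausible Python sketch of such a `path` procedure, assuming vertices are integers and `adj[u]` lists the successors of `u` (a reconstruction, not the lecture's exact listing):

```python
def path(u, adj, marked):
    """Starting at u, repeatedly move to an arbitrary unmarked
    successor, marking each vertex along the way. Stops at a vertex
    whose successors are all marked or that has none at all; in a
    DAG this end vertex is a sink of the unmarked subgraph.
    Returns the list of vertices on the marked path."""
    walk = [u]
    marked.add(u)
    while True:
        last = walk[-1]
        nxt = next((v for v in adj[last] if v not in marked), None)
        if nxt is None:              # all successors marked, or none
            return walk
        marked.add(nxt)
        walk.append(nxt)
```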
Example:
Calling `path(A)` might produce the marked path A, B, E, F, G. The topological order would be initialized with G, F, E, B, A (built by prepending).
Properties of `path`:

- Marks a Path: `path(u)` marks a directed path starting at $u$.
- End Node Properties: The end vertex $w$ of the path either has all its successors already marked, or has no successors at all (it is a sink).
Claim: If there are no directed cycles, the end vertex $w$ found by `path(u)` is a sink.
Proof (by Contraposition): If $w$ is not a sink, the graph contains a directed cycle.

- Assume $w$ is not a sink: then $w$ has a successor $x$. By `path`'s properties, $x$ must already be marked.
- $x$ is on the path: Since $x$ is marked, it must lie on the marked path from $u$ to $w$.
- Cycle Exists: The edge $(w, x)$ and the path segment from $x$ to $w$ form a cycle.
Improved Approach (using `visit` and DFS)

The `path` function is inefficient because it restarts the search after finding each sink. The `visit` function improves this:
Example:
`visit` processes all vertices reachable from a given vertex before adding that vertex to the order. This makes it more efficient: no work is discarded.
Depth-First Search (DFS):

`dfs_topological_sort` uses `visit` to perform a depth-first traversal, ensuring all vertices are visited and added to the topological order.
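A compact Python sketch of this `visit`-based DFS (the names follow the lecture's `dfs_topological_sort` and `visit`, but the details are my reconstruction):

```python
def dfs_topological_sort(adj):
    """Topological sort of a DAG given as adjacency lists. visit(u)
    first recursively visits all unmarked successors of u and only
    then prepends u, so u ends up before everything it points to.
    Cycles are not detected; on a cyclic graph the output is not a
    valid topological order."""
    n = len(adj)
    marked = [False] * n
    topological_order = []

    def visit(u):
        marked[u] = True
        for v in adj[u]:
            if not marked[v]:
                visit(v)
        topological_order.insert(0, u)   # prepend once successors finish

    for u in range(n):
        if not marked[u]:
            visit(u)
    return topological_order
```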
DFS Tree:
The DFS creates a tree structure that captures the dependencies between vertices, enabling the topological sort. By adding nodes after all descendants have been visited, we create the correct ordering.
Let's analyze the topological sort procedure more deeply by augmenting the `visit` function with pre- and post-visit timestamps.
We've introduced `pre` and `post` dictionaries (or arrays) to store timestamps. `pre[u]` records the time when vertex `u` is first visited, and `post[u]` records the time when the `visit(u)` call completes. `T` is a global counter (or a parameter passed by reference) that keeps track of the current time. Note how `T` is incremented and passed along through the recursive calls.
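A sketch of the augmented `visit`, with `T` kept as a shared counter in a closure instead of a global (a reconstruction, not the lecture's exact listing):

```python
def dfs_with_times(adj):
    """visit augmented with timestamps: pre[u] is set when visit(u)
    starts, post[u] when it completes. T is the shared time counter."""
    n = len(adj)
    pre = [None] * n
    post = [None] * n
    T = 0

    def visit(u):
        nonlocal T
        T += 1
        pre[u] = T                   # first time u is reached
        for v in adj[u]:
            if pre[v] is None:       # v not yet visited
                visit(v)
        T += 1
        post[u] = T                  # visit(u) completes

    for u in range(n):
        if pre[u] is None:
            visit(u)
    return pre, post
```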
Example with Pre- and Post-visit Times:
Starting the DFS from vertex A, we obtain the pre- and post-visit times shown in the image.
Pre- and Post-order Sort:

- Pre-order: The pre-order is the order in which the recursive `visit` calls begin. It is not necessarily a topological sort; note that vertices are prepended to `topological_order`, not appended.
- Post-order: The post-order is the order in which the recursive `visit` calls complete, and its reverse is a topological sort. By definition, when a `visit(u)` call completes, all of `u`'s successors have also completed their `visit` calls. Thus, it's safe to place `u` before its successors in the topological order.
Interval Pattern (Nesting):
The intervals $[\text{pre}[u], \text{post}[u]]$ exhibit a nesting pattern: any two intervals are either disjoint, or one completely contains the other. This nesting directly corresponds to the recursion tree of the DFS.
Edge Classifications based on Intervals
Note for readers: I suggest watching this less confusing explanation
Given an edge $(u, v)$, we can classify it based on the intervals $[\text{pre}[u], \text{post}[u]]$ and $[\text{pre}[v], \text{post}[v]]$:
- Tree Edge: If $\text{pre}[u] < \text{pre}[v] < \text{post}[v] < \text{post}[u]$ and the edge belongs to the DFS tree, then $(u, v)$ is a tree edge. This means $v$ is a descendant of $u$ in the DFS tree: the `visit(u)` call directly leads to the `visit(v)` call. Examples: (A, B), (B, E).
- Back Edge: If $\text{pre}[v] < \text{pre}[u] < \text{post}[u] < \text{post}[v]$, then $(u, v)$ is a back edge. This indicates a cycle in the graph: $u$ is a descendant of $v$, and the edge goes back to an ancestor. Examples: (F, B), (D, A).
- Forward Edge: If $\text{pre}[u] < \text{pre}[v] < \text{post}[v] < \text{post}[u]$ but the edge is not part of the DFS tree, then $(u, v)$ is a forward edge. $v$ is a descendant of $u$: there is already a path from $u$ to $v$ in the tree, but a direct edge as well. Examples: (A, F), (B, G).
- Cross Edge: If $\text{post}[v] < \text{pre}[u]$, then $(u, v)$ is a cross edge. The intervals are disjoint: $u$ and $v$ are in different branches of the DFS tree, with no ancestor-descendant relationship between them. They belong to subtrees that don't overlap. Examples: (H, G), (D, H).
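These interval tests can be turned into a small classifier. Note that tree and forward edges share the same nesting pattern, so the intervals alone can only report them jointly; the function below is an illustrative addition, not from the lecture:

```python
def classify_edge(u, v, pre, post):
    """Classify a directed edge (u, v) from DFS timestamps alone.
    Tree and forward edges share the nesting
    pre[u] < pre[v] < post[v] < post[u], so they are reported jointly;
    telling them apart requires the DFS tree itself."""
    if pre[u] < pre[v] < post[v] < post[u]:
        return "tree/forward"
    if pre[v] < pre[u] < post[u] < post[v]:
        return "back"                # (u, v) points back to an ancestor
    if post[v] < pre[u]:
        return "cross"               # v's interval lies before u's
    raise ValueError("impossible interval pattern for an edge (u, v)")
```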
Impossible Cases:

- Overlapping but not Nested Intervals: Due to the recursive nature of DFS, two intervals cannot partially overlap without one completely containing the other.
- $\text{post}[u] < \text{pre}[v]$: This case is impossible for an edge $(u, v)$, because `visit(u)` would call `visit(v)` if `v` was not yet marked. So `v`'s interval either nests inside `u`'s or begins before it; it cannot start only after `u`'s interval has ended.
Observations and Topological Sort Justification
Let’s connect the edge classifications to the existence of directed cycles and justify why the reverse post-order provides a topological sort.
Observations:
- Back Edge Implies Cycle: If a back edge $(u, v)$ exists, then there is a directed cycle. This is because a back edge connects a vertex $u$ to an ancestor $v$ in the DFS tree. The path from $v$ to $u$ in the DFS tree, combined with the back edge $(u, v)$, forms a cycle.
- Non-Back Edges and Post-Order: For all non-back edges $(u, v)$, we have $\text{post}[v] < \text{post}[u]$. Let's justify this for each non-back edge type:
  - Tree Edge: By definition, $\text{post}[v] < \text{post}[u]$ for a tree edge $(u, v)$.
  - Forward Edge: Similarly, for a forward edge $(u, v)$, $\text{post}[v] < \text{post}[u]$.
  - Cross Edge: For a cross edge $(u, v)$, the DFS finishes visiting $v$ and its descendants before starting to visit $u$. Therefore, $\text{post}[v] < \text{pre}[u] < \text{post}[u]$.
Connecting to Topological Sort:

- No Cycles Imply No Back Edges: If there are no directed cycles in the graph, then observation 1 tells us there can be no back edges.
- Reverse Post-Order is a Topological Sort: If there are no back edges, then for every edge $(u, v)$ in the graph (all of which are now non-back edges), $\text{post}[v] < \text{post}[u]$ (observation 2). Sorting the vertices in decreasing post-order therefore places $u$ before $v$ for every edge $(u, v)$, which is precisely the definition of a topological sort.
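Putting both observations together, a reverse post-order sort with back-edge detection might look like this (an illustrative sketch; the three-state marking is a standard technique, not necessarily the lecture's formulation):

```python
def topo_sort_or_report_cycle(adj):
    """Reverse post-order topological sort with back-edge detection.
    States: 0 = unvisited, 1 = on the current DFS stack, 2 = done.
    An edge into a state-1 vertex is a back edge, hence a cycle."""
    n = len(adj)
    state = [0] * n
    postorder = []

    def visit(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1:
                raise ValueError("back edge: graph contains a cycle")
            if state[v] == 0:
                visit(v)
        state[u] = 2
        postorder.append(u)          # u finishes after its successors

    for u in range(n):
        if state[u] == 0:
            visit(u)
    return postorder[::-1]           # reverse post-order
```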
Runtime of Topological Sort
The runtime of the topological sort algorithm depends on the graph representation. Let $n$ be the number of vertices and $m$ be the number of edges.
Adjacency Matrix
- `visit` calls: The `visit` function is called at most once for each vertex. Therefore, there are at most $n$ calls to `visit`.
- Work per `visit` call: Inside `visit`, the main work is iterating through the successors of a vertex $u$. With an adjacency matrix, this requires checking all $n$ entries in the row corresponding to $u$ to determine its successors.
- Total Runtime: Consequently, the total runtime using an adjacency matrix is $O(n^2)$.
Adjacency List
- `visit` calls: Similar to the adjacency matrix case, the `visit` function is called at most $n$ times.
- Work per `visit` call: For a vertex $u$, the work inside `visit` (excluding the recursive calls) involves iterating through the successors of $u$. With an adjacency list, this takes $O(1 + \deg_{\text{out}}(u))$ time.
- Total Runtime: Summing up the work across all `visit` calls gives us
  $$\sum_{u \in V} O(1 + \deg_{\text{out}}(u)) = O\Big(n + \sum_{u \in V} \deg_{\text{out}}(u)\Big) = O(n + m),$$
  since $\sum_{u \in V} \deg_{\text{out}}(u) = m$ (each edge is counted exactly once at its tail; this is the directed analogue of the handshaking lemma).
TLDR:

- Adjacency Matrix: $O(n^2)$
- Adjacency List: $O(n + m)$
For sparse graphs where $m \ll n^2$, the adjacency list representation leads to a significantly faster topological sort. For dense graphs where $m$ is close to $n^2$, the difference in performance is less pronounced. In practice, the adjacency list representation generally offers better overall performance, especially for large or sparse graphs.
Continue here: 11 Shortest Path Algorithms, Cheapest Walks in Weighted Graphs, Acyclical Graphs and Topological Sort, Dijkstra’s Algorithm