Lecture from: 14.11.2024 | Rui Zhangs Notes | Video: Videos ETHZ

Topological Ordering

Imagine a directed graph representing a set of tasks and their dependencies. Vertices represent tasks, and a directed edge (u, v) indicates that task u must be completed before task v can begin. A topological ordering (or topological sort) of this graph is a linear ordering of the vertices such that for every directed edge (u, v), vertex u comes before vertex v in the ordering. Essentially, it’s a valid sequence in which to perform the tasks.

Existence of Topological Orderings

Not all directed graphs have a topological ordering. The presence of directed cycles prevents a valid topological sort.

Example of a graph with no topological ordering:

In this circular dependency, it’s impossible to start any task because each task depends on another.

General Rule

A directed graph has a topological ordering if and only if it is a Directed Acyclic Graph (DAG), meaning it contains no directed cycles.

Proof

Let’s prove that a directed graph has a topological sort if and only if it contains no directed cycles. We can express this formally as:

A directed graph G has a topological ordering ⟺ G is a DAG (Directed Acyclic Graph).

1. “⟹” (If G has a topological ordering, then G is a DAG):

We will prove this direction by contraposition. Suppose G is not a DAG, meaning it contains a directed cycle. Let the cycle be v_1 → v_2 → … → v_k → v_1. Assume, for the sake of contradiction, that a topological ordering exists. In this ordering, there must be some vertex of the cycle, say v_i, that comes first. However, since v_i is part of the cycle, there is an edge (v_{i-1}, v_i) (where v_{i-1} is the preceding vertex in the cycle, or v_k if i = 1). This means v_{i-1} must come before v_i in any topological ordering, contradicting our assumption that v_i comes first among the cycle’s vertices. Hence, a graph with a directed cycle cannot have a topological ordering.

TLDR: Look at the example above…

2. “⟸” (If G is a DAG, then G has a topological ordering):

We will prove this direction constructively by outlining an algorithm to produce a topological ordering for any DAG. This will be covered in the following sections.

Examples of Topological Sorting

Topological sorting finds applications in various algorithms and scenarios where dependencies between tasks or operations exist. Here are a few examples:

  • Karatsuba’s Multiplication Algorithm: The recursive nature of Karatsuba’s algorithm can be visualized as a dependency graph, where the computation of smaller products must precede the calculation of the final product.

  • Fibonacci Number Calculation: Computing Fibonacci numbers recursively also exhibits dependencies, as the n-th Fibonacci number depends on the (n-1)-th and (n-2)-th numbers.

  • Edit Distance Problem: Calculating the edit distance between two strings involves a dynamic programming approach, where the value of a cell in the table depends on the values of its neighboring cells.

Directed Graphs

A directed graph (or digraph) is similar to an undirected graph, but its edges are ordered pairs.

  • Edges as Ordered Pairs: An edge in a digraph is represented as an ordered pair (u, v), where u, v ∈ V and u ≠ v. This signifies a directed edge from u to v.
  • No Parallel Edges (in this definition): The provided definition disallows parallel edges, meaning there can be at most one edge from u to v.
  • Predecessors and Successors: If (u, v) is an edge in the digraph, we say u is a direct predecessor (or parent) of v, and v is a direct successor (or child) of u.

Degree of a Vertex

In directed graphs, we distinguish between in-degree and out-degree:

  • In-degree (deg_in(v)): The number of edges directed into vertex v.
  • Out-degree (deg_out(v)): The number of edges directed out of vertex v.

Special Cases:

  • Source (or Quelle): A vertex v with deg_in(v) = 0 is called a source. It has no incoming edges.
  • Sink (or Senke): A vertex v with deg_out(v) = 0 is called a sink. It has no outgoing edges.
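As a small illustration (the edge set below is made up for this sketch), in-/out-degrees, sources, and sinks can be computed directly from an edge list:

```python
# Illustrative digraph: A -> B, A -> C, B -> C
edges = [("A", "B"), ("A", "C"), ("B", "C")]
vertices = {"A", "B", "C"}

deg_in = {v: 0 for v in vertices}
deg_out = {v: 0 for v in vertices}
for u, v in edges:
    deg_out[u] += 1   # edge leaves u
    deg_in[v] += 1    # edge enters v

sources = {v for v in vertices if deg_in[v] == 0}   # no incoming edges
sinks = {v for v in vertices if deg_out[v] == 0}    # no outgoing edges
```

Here A is the unique source and C the unique sink.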

Walks, Paths, and Cycles

Similar to undirected graphs, we define walks, paths, and cycles in digraphs, but with the added constraint of direction:

  • Directed Walk: A sequence of alternating vertices and edges v_0, e_1, v_1, …, e_k, v_k where each e_i = (v_{i-1}, v_i) is a directed edge.
  • Directed Path: A directed walk with distinct vertices.
  • Directed Cycle: A directed walk that starts and ends at the same vertex, with all other vertices distinct.

Representing Graphs on a Computer

Storing graphs efficiently on a computer is crucial for implementing graph algorithms. Let’s consider a graph with n vertices and m edges. We’ll examine two common representations: adjacency matrices and adjacency lists.

Suppose we want to store this graph:

Adjacency Matrix

An adjacency matrix represents the graph as an n × n matrix A, where:

  • A[u][v] = 1 if there’s an edge from u to v.
  • A[u][v] = 0 otherwise.

Example:

Operations and Runtime

| Operation | Runtime | Intuition |
| --- | --- | --- |
| Check edge (u, v) | O(1) | Direct access to matrix element A[u][v]. |
| Find neighbors of u | O(n) | Iterate through row A[u]. |
| Add/remove edge | O(1) | Change matrix element A[u][v]. |
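A minimal sketch of these operations in Python (vertices are indices 0..n-1; the helper names `add_edge`, `has_edge`, and `neighbors` are illustrative):

```python
n = 4
A = [[0] * n for _ in range(n)]  # n x n matrix of 0/1 entries

def add_edge(u, v):   # O(1): set one matrix entry
    A[u][v] = 1

def has_edge(u, v):   # O(1): read one matrix entry
    return A[u][v] == 1

def neighbors(u):     # O(n): must scan the entire row
    return [v for v in range(n) if A[u][v] == 1]

add_edge(0, 1)
add_edge(0, 2)
```

Note that `neighbors` costs O(n) even if u has only a handful of successors — this is the weakness for sparse graphs.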

Pros and Cons

| Pros | Cons |
| --- | --- |
| Simple implementation. | Space-inefficient (Θ(n²)), especially for sparse graphs (m ≪ n²). |
| Constant-time edge lookups. | Linear time to find neighbors. |
| Suitable for dense graphs (m close to n²). | Can be wasteful for graphs with few edges. |

Adjacency List

An adjacency list represents a graph as an array of lists. Each list corresponds to a vertex u and stores the vertices v such that (u, v) is an edge. Often, these lists are implemented using linked lists for efficient insertion and deletion of edges.

Example:

Operations and Runtime

| Operation | Runtime | Intuition |
| --- | --- | --- |
| Check edge (u, v) | O(deg_out(u)) | Traverse the adjacency list of u. |
| Find neighbors of u | O(deg_out(u)) | Directly access the adjacency list of u. |
| Add edge (u, v) | O(1) | Add v to u’s linked list. |
| Remove edge (u, v) | O(deg_out(u)) | Search for and remove v from u’s list. |
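The same operations sketched with Python lists standing in for linked lists (helper names are illustrative; a true linked list would additionally make removal O(1) once the node is located):

```python
from collections import defaultdict

adj = defaultdict(list)  # vertex -> list of direct successors

def add_edge(u, v):      # O(1): append to u's list
    adj[u].append(v)

def has_edge(u, v):      # O(deg_out(u)): scan u's list
    return v in adj[u]

def remove_edge(u, v):   # O(deg_out(u)): search and delete
    adj[u].remove(v)

add_edge("A", "B")
add_edge("A", "C")
remove_edge("A", "B")
```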

Pros and Cons

| Pros | Cons |
| --- | --- |
| Space-efficient for sparse graphs. | Edge lookup can take up to O(n) time. |
| Efficient for iterating over neighbors. | Slightly more complex implementation than adjacency matrix. |
| Suitable for most graph algorithms. | Less efficient for very dense graphs. |

Choosing the Right Representation

  • Adjacency Matrix: Simpler, constant-time edge lookup, suitable for dense graphs.
  • Adjacency List: More space-efficient for sparse graphs, efficient neighbor iteration, more commonly used in practice. The choice between the two depends on the specific application and the characteristics of the graph being represented.

Topological Sort Algorithm

Let’s explore a topological sort algorithm and its connection to the absence of directed cycles.

Core Idea: The algorithm repeatedly finds and removes sinks (vertices with no outgoing edges) from the graph.

  1. Find a Sink: Locate a vertex v with deg_out(v) = 0.
  2. Append to List: Add the sink to the end of the topological ordering list.
  3. Remove and Recurse: Remove the sink and all its incoming edges from the graph. Repeat the process on the remaining subgraph.
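The three steps above can be sketched directly (a hedged sketch — function and variable names are made up; each removed sink is prepended so that sinks end up at the back of the ordering, matching the description, although `insert(0, …)` costs O(n) per call):

```python
def topo_sort_by_sinks(vertices, edges):
    remaining = set(vertices)
    out_deg = {v: 0 for v in vertices}
    preds = {v: set() for v in vertices}   # direct predecessors
    for u, v in edges:
        out_deg[u] += 1
        preds[v].add(u)

    order = []
    while remaining:
        # 1. Find a sink (in a DAG, one always exists among the remaining vertices).
        sink = next(v for v in remaining if out_deg[v] == 0)
        # 2. Sinks must come last, so prepend to the ordering built so far.
        order.insert(0, sink)
        # 3. Remove the sink and its incoming edges, then repeat.
        remaining.remove(sink)
        for p in preds[sink]:
            if p in remaining:
                out_deg[p] -= 1            # edge (p, sink) disappears
    return order
```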

Initial Approach (using path)

The path function aims to find a sink by exploring a path from a given vertex:

def path(u):
	u.mark()
	for v in u.successors():
		if not v.isMarked():
			path(v)
			return  # follow only a single unmarked successor

Example:

Calling path(A) might produce the marked path A, B, E, F, G. The topological order would be initialized with G, F, E, B, A (built by prepending).

Properties of path:

  1. Marks a Path: path(u) marks a directed path starting at u.
  2. End Node Properties: The end vertex w of the path either has all its successors already marked, or has no successors at all (it is a sink).

Claim: If there are no directed cycles, the end vertex w found by path(u) is a sink.

Proof (by Contrapositive): If w is not a sink, the graph contains a directed cycle.

  1. Assume w is not a sink: w has a successor v. By path’s properties, v must already be marked.
  2. v is on the path: Since v is marked, it must lie on the marked path from u to w.
  3. Cycle Exists: The edge (w, v) and the path segment from v to w form a directed cycle.

Improved Approach (using visit and DFS)

The path function is inefficient because it restarts the search after finding each sink. The visit function improves this:

def visit(u, topological_order):
	u.mark()
	for v in u.successors():
		if not v.isMarked():
			visit(v, topological_order)
	topological_order.insert(0, u)   # Prepend u to maintain order

Example:

visit processes all reachable vertices from a given vertex before adding it to the order. This means it’s more efficient and doesn’t discard work.

Depth-First Search (DFS):

def dfs_topological_sort(G):
	G.unmarkAll()
	topological_order = []
	for u in G.nodes():
		if not u.isMarked():
			visit(u, topological_order)
	return topological_order

dfs_topological_sort uses visit to perform a depth-first traversal, ensuring all vertices are visited and added to the topological order.
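Since visit and dfs_topological_sort assume node objects with mark()/successors(), here is the same algorithm as a self-contained sketch over a plain successor dictionary (the graph and all names are illustrative):

```python
def topo_sort(succ):
    """succ maps each vertex to a list of its direct successors."""
    marked = set()
    order = []

    def visit(u):
        marked.add(u)
        for v in succ.get(u, []):
            if v not in marked:
                visit(v)
        order.insert(0, u)   # prepend u after all its descendants are done

    for u in succ:           # restart DFS from every unmarked vertex
        if u not in marked:
            visit(u)
    return order

g = {"A": ["B", "C"], "B": ["C"], "C": []}
print(topo_sort(g))   # ['A', 'B', 'C']
```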

DFS Tree:

The DFS creates a tree structure that captures the dependencies between vertices, enabling the topological sort. By adding nodes after all descendants have been visited, we create the correct ordering.


Let’s analyze the topological sort procedure more deeply by augmenting the visit function with pre- and post-visit timestamps.

def visit(u, topological_order, pre, post, T):
    pre[u] = T
    T += 1
    u.mark()
    for v in u.successors():
        if not v.isMarked():
            T = visit(v, topological_order, pre, post, T) # T is modified recursively.
    topological_order.insert(0, u)
    post[u] = T
    T += 1
    return T

We’ve introduced pre and post dictionaries (or arrays) to store timestamps. pre[u] records the time when vertex u is first visited, and post[u] records the time when the visit(u) call completes. T is a global counter (or a parameter passed by reference) that keeps track of the current time. Note how T is recursively updated and passed down.

Example with Pre- and Post-visit Times:

Starting the DFS from vertex A, we obtain the pre- and post-visit times shown in the image.

Pre- and Post-order Sort:

  • Pre-order: The pre-order is not necessarily a topological sort. Note that vertices are prepended to topological_order, not appended. The pre-order is, however, the order in which the recursive visit calls begin.

  • Post-order: The reverse post-order is a topological sort — and prepending in visit builds exactly this order. The post-order is the order in which the recursive visit calls complete. By definition, when a visit(u) call completes, all of u’s successors have also completed their visit calls. Thus, it’s safe to place u before its successors in the topological order.
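An equivalent and slightly cheaper formulation of the same idea (a sketch with illustrative names): record the post-order by appending, which is O(1), and reverse it once at the end, instead of prepending with insert(0, u), which costs O(n) per call.

```python
def topo_sort_reverse_postorder(succ):
    """succ maps each vertex to a list of its direct successors."""
    marked = set()
    post = []

    def visit(u):
        marked.add(u)
        for v in succ.get(u, []):
            if v not in marked:
                visit(v)
        post.append(u)        # record u in post-order

    for u in succ:
        if u not in marked:
            visit(u)
    return post[::-1]         # reverse post-order = topological order
```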

Interval Pattern (Nesting):

The intervals defined by [pre[u], post[u]] exhibit a nesting pattern. This nesting directly corresponds to the recursion tree of the DFS.

Edge Classifications based on Intervals

Note for readers: I suggest watching this less confusing explanation

Given an edge (u, v), we can classify it based on the intervals [pre[u], post[u]] and [pre[v], post[v]]:

  1. Tree Edge: If pre[u] < pre[v] < post[v] < post[u] and visit(u) directly calls visit(v), then (u, v) is a tree edge. This means v is a child of u in the DFS tree. Example: (A, B), (B, E).

  2. Back Edge: If pre[v] < pre[u] < post[u] < post[v], then (u, v) is a back edge: u’s interval is nested inside v’s. This indicates a cycle in the graph. u is a descendant of v, and the edge goes back to an ancestor. Examples: (F, B), (D, A).

  3. Forward Edge: If pre[u] < pre[v] < post[v] < post[u] but visit(v) is not called directly from visit(u), then (u, v) is a forward edge. This means v is a descendant of u, but the edge is not part of the DFS tree: there is a path from u to v in the tree and a direct edge as well. Examples: (A, F), (B, G).

  4. Cross Edge: If post[v] < pre[u], then (u, v) is a cross edge. This means u and v are in different branches of the DFS tree, and there is no ancestor-descendant relationship between them. They belong to different subtrees that don’t overlap. Examples: (H, G), (D, H).

Impossible Cases:

  • Overlapping but not Nested Intervals: Due to the recursive nature of DFS, two intervals cannot partially overlap without one completely containing the other.

  • post[u] < pre[v]: This case is impossible for an edge (u, v). If v were still unmarked during visit(u), then visit(v) would be called and v’s interval would be nested inside u’s; if v were already marked, then pre[v] < pre[u]. Either way, v’s interval cannot begin after u’s interval ends.

Observations and Topological Sort Justification

Let’s connect the edge classifications to the existence of directed cycles and justify why the reverse post-order provides a topological sort.

Observations:

  1. Back Edge Implies Cycle: If a back edge (u, v) exists, then there is a directed cycle. This is because a back edge connects a vertex u to one of its ancestors v in the DFS tree. The path from v to u in the DFS tree, combined with the back edge (u, v), forms a cycle.

  2. Non-Back Edges and Post-Order: For all non-back edges (u, v), we have post[v] < post[u]. Let’s justify this for each non-back edge type:

    • Tree Edge: By definition, post[v] < post[u] for a tree edge (u, v), since visit(v) completes inside visit(u).
    • Forward Edge: Similarly, for a forward edge (u, v), post[v] < post[u], because v’s interval is nested inside u’s.
    • Cross Edge: For a cross edge (u, v), the DFS finishes visiting v and its descendants before starting to visit u. Therefore, post[v] < pre[u] < post[u].

Connecting to Topological Sort:

  • No Cycles Imply No Back Edges: If there are no directed cycles in the graph, then observation 1 tells us there can be no back edges.

  • Reverse Post-Order is Topological Sort: If there are no back edges, then for every edge (u, v) in the graph (all of which are now non-back edges), post[v] < post[u] (observation 2). This means that if we sort the vertices in decreasing post-order, for every edge (u, v), u will come before v in the sorted list. This is precisely the definition of a topological sort.
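These observations can be turned into a small cycle test (a sketch with made-up names): compute the post times with a DFS, then look for an edge whose endpoints violate post[v] < post[u] — such an edge is a back edge, hence there is a cycle.

```python
def pre_post_times(succ):
    """DFS over a successor dict, returning pre- and post-visit times."""
    pre, post, t = {}, {}, [0]   # t in a list so nested calls can mutate it

    def visit(u):
        pre[u] = t[0]; t[0] += 1
        for v in succ.get(u, []):
            if v not in pre:     # "unmarked" = not yet assigned a pre time
                visit(v)
        post[u] = t[0]; t[0] += 1

    for u in succ:
        if u not in pre:
            visit(u)
    return pre, post

def has_cycle(succ):
    # A digraph has a cycle iff some edge (u, v) has post[v] >= post[u].
    _, post = pre_post_times(succ)
    return any(post[v] >= post[u] for u in succ for v in succ[u])
```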

Runtime of Topological Sort

The runtime of the topological sort algorithm depends on the graph representation. Let n be the number of vertices and m be the number of edges.

Adjacency Matrix

  • visit calls: The visit function is called at most once for each vertex. Therefore, there are at most n calls to visit.

  • Work per visit call: Inside visit, the main work is iterating through the successors of a vertex u. With an adjacency matrix, this requires checking all n entries in the row corresponding to u to determine its successors.

  • Total Runtime: Consequently, the total runtime using an adjacency matrix is O(n²).

Adjacency List

  • visit calls: Similar to the adjacency matrix case, the visit function is called at most n times.

  • Work per visit call: For a vertex u, the work inside visit involves iterating through the successors of u. With an adjacency list, this takes O(deg_out(u)) time, where deg_out(u) is the out-degree of u. Additionally, the recursive calls of visit do their own work.

  • Total Runtime: Summing up the work across all visit calls gives us:

    ∑_u (1 + deg_out(u)) = n + ∑_u deg_out(u)

    Since ∑_u deg_out(u) = m (the total number of edges — the handshake lemma for directed graphs), the runtime simplifies to O(n + m).

TLDR:

  • Adjacency Matrix: O(n²)
  • Adjacency List: O(n + m)

For sparse graphs where m ≪ n², the adjacency list representation leads to a significantly faster topological sort algorithm. For dense graphs where m is closer to n², the difference in performance might be less pronounced. However, the adjacency list representation generally offers better overall performance, especially for large or sparse graphs.

Continue here: 11 Shortest Path Algorithms, Cheapest Walks in Weighted Graphs, Acyclical Graphs and Topological Sort, Dijkstra’s Algorithm