Topological Sorting - Learning Module

Loading content...

0/276

Kahn's Algorithm (BFS-Based)

An Algorithm Born from Intuition

In 1962, Arthur B. Kahn published a remarkably elegant algorithm for topological sorting. The genius of Kahn's approach lies in its simplicity: it directly implements the intuitive observation we made when proving that DAGs always have topological orderings.

The Core Insight:

Every DAG has at least one source—a vertex with no incoming edges. A source has no dependencies, so it can go first in any valid ordering. After placing a source in our output, we can conceptually 'remove' it from the graph. The remaining graph is still a DAG, so it also has a source. Repeat until all vertices are processed.

This greedy, iterative process naturally produces a valid topological ordering. If we ever reach a state where no sources remain but unprocessed vertices exist, we've detected a cycle.

What You Will Learn

By the end of this page, you will fully understand Kahn's algorithm: its mechanism, implementation details, correctness proof, time and space complexity analysis, and practical considerations including cycle detection and handling of multiple valid orderings.

Algorithm Overview

Let's state Kahn's algorithm precisely:

Kahn's Algorithm:

Initialize: Compute the in-degree (number of incoming edges) for every vertex. Collect all vertices with in-degree 0 into a queue (or set) of sources.
Process Sources: While the queue is non-empty:
- Remove a vertex u from the queue
- Append u to the output ordering
- For each neighbor v of u:
  - Decrease v's in-degree by 1 (conceptually removing the edge u → v)
  - If v's in-degree becomes 0, add v to the queue
Check Completion:
- If the output contains all vertices, return it as a valid topological ordering
- If some vertices remain unprocessed, the graph has a cycle (no valid ordering exists)

Why This Works:

When we add u to the output and decrement neighbors' in-degrees, we're simulating the removal of u from the graph. A neighbor v becomes a source (in-degree 0) exactly when all its dependencies have been processed. Since we only process sources, we never violate any edge constraint.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Graph:
    A ──► C ──► E
    │     ▲     │
    │     │     ▼
    └──►  B ──► D
 
Step 0: Compute in-degrees
  in_degree = {A: 0, B: 1, C: 2, D: 1, E: 1}
  sources_queue = [A]  (only A has in-degree 0)
 
Step 1: Process A
  output = [A]
  Neighbors of A: B, C
  Decrement: in_degree[B] = 0, in_degree[C] = 1
  B becomes source: sources_queue = [B]
 
Step 2: Process B
  output = [A, B]
  Neighbors of B: C, D
  Decrement: in_degree[C] = 0, in_degree[D] = 0
  Both become sources: sources_queue = [C, D]
 
Step 3: Process C (or D - order may vary)
  output = [A, B, C]
  Neighbors of C: E
  Decrement: in_degree[E] = 0
  E becomes source: sources_queue = [D, E]
 
Step 4: Process D
  output = [A, B, C, D]
  Neighbors of D: (none shown going out)
  sources_queue = [E]
 
Step 5: Process E
  output = [A, B, C, D, E]
  sources_queue = []
 
Result: [A, B, C, D, E] - Valid topological order!
All 5 vertices processed, so graph is a DAG.

Complete Implementation

Let's implement Kahn's algorithm in detail, handling all edge cases and providing clear structure:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
from collections import deque
 
def kahns_topological_sort(graph):
    """
    Perform topological sort using Kahn's algorithm.
    
    Args:
        graph: Dictionary mapping each vertex to list of its neighbors
               Example: {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
    
    Returns:
        List of vertices in topological order, or None if cycle exists
    """
    # Step 1: Initialize in-degree for all vertices
    in_degree = {v: 0 for v in graph}
    
    # Count incoming edges for each vertex
    for vertex in graph:
        for neighbor in graph[vertex]:
            in_degree[neighbor] += 1
    
    # Step 2: Collect all initial sources (in-degree = 0)
    # Using deque for O(1) append/popleft operations
    sources = deque()
    for vertex in graph:
        if in_degree[vertex] == 0:
            sources.append(vertex)
    
    # Step 3: Process sources iteratively
    topological_order = []
    
    while sources:
        # Remove a source from queue
        current = sources.popleft()
        topological_order.append(current)
        
        # "Remove" current from graph by decrementing neighbors' in-degrees
        for neighbor in graph[current]:
            in_degree[neighbor] -= 1
            
            # If neighbor becomes a source, add to queue
            if in_degree[neighbor] == 0:
                sources.append(neighbor)
    
    # Step 4: Check if all vertices were processed
    if len(topological_order) == len(graph):
        return topological_order
    else:
        # Some vertices remain with positive in-degree
        # This means the graph has a cycle
        return None
 
 
# Example usage and testing
def main():
    # Example 1: Valid DAG
    dag = {
        'A': ['C'],
        'B': ['C', 'D'],
        'C': ['E'],
        'D': ['E'],
        'E': []
    }
    result = kahns_topological_sort(dag)
    print(f"DAG topological order: {result}")
    # Possible outputs: ['A', 'B', 'C', 'D', 'E'] or ['B', 'A', 'D', 'C', 'E'] etc.
    
    # Example 2: Graph with cycle
    cyclic = {
        'A': ['B'],
        'B': ['C'],
        'C': ['A']  # Creates cycle
    }
    result = kahns_topological_sort(cyclic)
    print(f"Cyclic graph result: {result}")  # None - cycle detected
    
    # Example 3: Disconnected DAG
    disconnected = {
        'A': ['B'],
        'B': [],
        'C': ['D'],
        'D': []
    }
    result = kahns_topological_sort(disconnected)
    print(f"Disconnected DAG: {result}")  # Some valid order like ['A', 'C', 'B', 'D']
 
if __name__ == "__main__":
    main()

Implementation Note: Queue vs Set

We use a queue for sources, which gives us FIFO (first-in, first-out) ordering among equally valid sources. Using a set or priority queue instead would give different valid topological orderings. For lexicographically smallest ordering, use a min-heap (priority queue) of sources.

TypeScript Implementation

For TypeScript/JavaScript developers, here's an idiomatic implementation with strong typing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
type Graph<T> = Map<T, T[]>;
 
interface TopologicalSortResult<T> {
    success: boolean;
    order: T[] | null;
    cycleVertices: T[] | null;  // Vertices involved in cycle if detected
}
 
function kahnsTopologicalSort<T>(graph: Graph<T>): TopologicalSortResult<T> {
    // Step 1: Compute in-degrees
    const inDegree = new Map<T, number>();
    
    // Initialize all vertices with in-degree 0
    for (const vertex of graph.keys()) {
        inDegree.set(vertex, 0);
    }
    
    // Count incoming edges
    for (const [vertex, neighbors] of graph.entries()) {
        for (const neighbor of neighbors) {
            inDegree.set(neighbor, (inDegree.get(neighbor) ?? 0) + 1);
        }
    }
    
    // Step 2: Initialize queue with all sources
    const sources: T[] = [];
    for (const [vertex, degree] of inDegree.entries()) {
        if (degree === 0) {
            sources.push(vertex);
        }
    }
    
    // Step 3: Process sources
    const topologicalOrder: T[] = [];
    
    while (sources.length > 0) {
        const current = sources.shift()!;  // Remove from front of queue
        topologicalOrder.push(current);
        
        // Process all neighbors
        const neighbors = graph.get(current) ?? [];
        for (const neighbor of neighbors) {
            const newDegree = (inDegree.get(neighbor) ?? 0) - 1;
            inDegree.set(neighbor, newDegree);
            
            if (newDegree === 0) {
                sources.push(neighbor);
            }
        }
    }
    
    // Step 4: Check result
    if (topologicalOrder.length === graph.size) {
        return {
            success: true,
            order: topologicalOrder,
            cycleVertices: null
        };
    } else {
        // Find vertices with remaining positive in-degree (part of cycles)
        const cycleVertices: T[] = [];
        for (const [vertex, degree] of inDegree.entries()) {
            if (degree > 0) {
                cycleVertices.push(vertex);
            }
        }
        return {
            success: false,
            order: null,
            cycleVertices
        };
    }
}
 
// Usage example
const courseGraph: Graph<string> = new Map([
    ['Intro', ['DataStructures']],
    ['DataStructures', ['Algorithms', 'Databases']],
    ['Algorithms', ['MachineLearning']],
    ['Databases', ['MachineLearning']],
    ['MachineLearning', []]
]);
 
const result = kahnsTopologicalSort(courseGraph);
if (result.success) {
    console.log('Course order:', result.order);
} else {
    console.log('Cannot schedule! Cycle involves:', result.cycleVertices);
}

Correctness Proof

Let's rigorously prove that Kahn's algorithm is correct—that is, it produces a valid topological ordering for DAGs and correctly detects cycles.

Theorem: Kahn's algorithm produces a valid topological ordering if and only if the input graph is a DAG.

Proof of Soundness (Output is Valid):

We must show that for every edge (u, v) in the original graph, u appears before v in the output.

Consider any edge (u → v). When does v get added to the output?

v is added when its in-degree becomes 0
v's in-degree decreases by 1 each time a predecessor is processed
v's in-degree becomes 0 only after ALL its predecessors (including u) are processed
Therefore, u is in the output before v

Proof of Completeness (Detects All Cycles):

We must show that if the algorithm fails to process all vertices, a cycle exists.

Suppose the algorithm terminates with k < n vertices processed, where n = |V|. Let S be the set of unprocessed vertices. Every vertex in S has positive in-degree (otherwise it would be in the source queue). Since no vertex in S was ever added to the queue, and the algorithm finished, every in-edge to vertices in S must come from other vertices in S. This creates a cycle: start at any vertex in S, follow an incoming edge (which leads to another vertex in S), and continue. Since S is finite, we must eventually revisit a vertex, completing a cycle.

The Invariant

Throughout the algorithm, we maintain the invariant: in_degree[v] equals the number of unprocessed predecessors of v. When this reaches 0, all of v's dependencies have been processed, so v can safely be added to the output next.

Time and Space Complexity

Let's analyze the efficiency of Kahn's algorithm in detail.

Time Complexity: O(V + E)

Breaking down each phase:

Time Complexity Breakdown
Phase	Operations	Complexity
Initialize in-degrees	Set all to 0	O(V)
Count in-degrees	Traverse all edges once	O(E)
Find initial sources	Check each vertex	O(V)
Main loop iterations	Each vertex processed once	O(V)
Edge processing	Each edge decrements in-degree once	O(E)
Queue operations	Each vertex enqueued/dequeued once	O(V)

Total: O(V + E) — This is optimal! We must examine every vertex and every edge at least once to determine if the graph is a DAG and compute the ordering.

Space Complexity: O(V)

In-degree array/map: O(V)
Source queue: O(V) in the worst case (all vertices could be sources in a graph with no edges)
Output array: O(V)
No additional per-edge storage needed (we don't modify the graph)

Total: O(V) — Linear in the number of vertices, independent of edge count.

Practical Considerations:

•Cache Performance — The algorithm processes vertices in a cache-friendly manner if the graph adjacency list is stored contiguously.
•Early Termination — If you only need to detect whether a topological order exists (not compute it), you can stop as soon as the source queue becomes empty with unprocessed vertices remaining.
•Parallel Potential — All sources in the queue at any moment are independent—they could be processed in parallel if needed.

Cycle Detection in Detail

Kahn's algorithm naturally detects cycles, but for practical applications, you often want to report the cycle, not just detect it. Here's how:

Finding the Actual Cycle:

When the algorithm terminates with unprocessed vertices:

All unprocessed vertices have positive in-degree
They form one or more cycles (or are reachable only from cycles)
To find an actual cycle, start from any unprocessed vertex and follow edges within the unprocessed set until you revisit a vertex

Enhanced Algorithm with Cycle Reporting:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
from collections import deque
 
def kahns_with_cycle_detection(graph):
    """
    Topological sort with detailed cycle detection.
    
    Returns:
        (order, cycle_path) where:
        - order is the topological order (or partial order if cycle)
        - cycle_path is None if DAG, or a list showing the cycle
    """
    in_degree = {v: 0 for v in graph}
    for vertex in graph:
        for neighbor in graph[vertex]:
            in_degree[neighbor] += 1
    
    sources = deque(v for v in graph if in_degree[v] == 0)
    order = []
    
    while sources:
        current = sources.popleft()
        order.append(current)
        for neighbor in graph[current]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                sources.append(neighbor)
    
    if len(order) == len(graph):
        return order, None  # No cycle
    
    # Cycle exists - find and report it
    unprocessed = {v for v in graph if in_degree[v] > 0}
    
    # Find a cycle by walking from any unprocessed vertex
    def find_cycle_from(start):
        visited = set()
        path = []
        current = start
        
        while current not in visited:
            visited.add(current)
            path.append(current)
            # Move to any unprocessed neighbor
            for neighbor in graph[current]:
                if neighbor in unprocessed:
                    current = neighbor
                    break
        
        # Extract the cycle portion
        cycle_start_idx = path.index(current)
        cycle = path[cycle_start_idx:] + [current]  # Close the loop
        return cycle
    
    cycle_path = find_cycle_from(next(iter(unprocessed)))
    return order, cycle_path
 
# Example
cyclic_graph = {
    'A': ['B'],
    'B': ['C'],
    'C': ['D', 'A'],  # C→A creates cycle
    'D': []
}
 
order, cycle = kahns_with_cycle_detection(cyclic_graph)
if cycle:
    print(f"Cycle detected: {' → '.join(cycle)}")
    # Output: Cycle detected: A → B → C → A
else:
    print(f"Topological order: {order}")

Multiple Cycles

A graph can have multiple independent cycles. The unprocessed set might contain vertices from different cycles. For complete diagnostics, you might need to find all cycles or at least identify all strongly connected components.

Variations and Extensions

Kahn's algorithm can be adapted for various specialized requirements:

Variation 1: Lexicographically Smallest Order

Replace the queue with a min-heap (priority queue). Always process the smallest available source first.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
import heapq
 
def lexicographic_topological_sort(graph):
    """
    Returns lexicographically smallest topological ordering.
    """
    in_degree = {v: 0 for v in graph}
    for vertex in graph:
        for neighbor in graph[vertex]:
            in_degree[neighbor] += 1
    
    # Use min-heap instead of queue
    min_heap = [v for v in graph if in_degree[v] == 0]
    heapq.heapify(min_heap)
    
    order = []
    while min_heap:
        # Always take the smallest available source
        current = heapq.heappop(min_heap)
        order.append(current)
        
        for neighbor in graph[current]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                heapq.heappush(min_heap, neighbor)
    
    return order if len(order) == len(graph) else None
 
# With vertices 'A', 'B', 'C' all being valid first choices,
# this always picks 'A' first.

Variation 2: All Topological Orders

Generate all valid topological orderings using backtracking. At each step, try all current sources as the next element.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def all_topological_sorts(graph):
    """
    Generate all valid topological orderings of a DAG.
    Warning: Can be exponentially many orderings!
    """
    in_degree = {v: 0 for v in graph}
    for vertex in graph:
        for neighbor in graph[vertex]:
            in_degree[neighbor] += 1
    
    all_orders = []
    current_order = []
    
    def backtrack():
        # Find all current sources
        sources = [v for v in graph if in_degree[v] == 0 
                   and v not in current_order]
        
        if not sources:
            if len(current_order) == len(graph):
                all_orders.append(current_order.copy())
            return
        
        for source in sources:
            # Choose: add source to current order
            current_order.append(source)
            for neighbor in graph[source]:
                in_degree[neighbor] -= 1
            
            # Explore
            backtrack()
            
            # Unchoose: remove source, restore in-degrees
            current_order.pop()
            for neighbor in graph[source]:
                in_degree[neighbor] += 1
    
    backtrack()
    return all_orders
 
# Example: Diamond DAG {A→B, A→C, B→D, C→D}
# Has 2 valid orderings: [A,B,C,D] and [A,C,B,D]

Other Extensions

•Parallel Processing — Process all current sources in parallel batches. This is the basis for parallel build systems that compile independent modules simultaneously.
•Weighted Critical Path — Add weights to edges or vertices. Track the longest weighted path to each vertex to find the critical path in project scheduling (PERT/CPM).
•Dynamic Updates — Maintain topological order as edges are added/removed. This requires more complex data structures but avoids recomputing from scratch.
•Counting Dependencies — Track how many predecessors each vertex has in total (not just immediate), useful for bottleneck analysis.

Comparison with DFS Approach

Kahn's algorithm is one of two classical approaches to topological sorting. The other uses DFS post-ordering, which we'll study on the next page. Here's a preview comparison:

Kahn's Algorithm vs DFS Approach
Aspect	Kahn's (BFS)	DFS Approach
Time Complexity	O(V + E)	O(V + E)
Space Complexity	O(V) for queue and in-degrees	O(V) for recursion stack and visited
Output Order	Forward (sources first)	Reverse post-order (sinks first, then reverse)
Parallelization	Natural batching of sources	Harder to parallelize
Cycle Detection	Implicit (count mismatch)	Explicit (back edge detection)
Intuition	Process sources iteratively	Finish deepest first, reverse
Implementation	Iterative with queue	Recursive with stack
Lex. Smallest	Use min-heap for sources	More complex to achieve

When to Choose Kahn's

Prefer Kahn's algorithm when: (1) you want the forward topological order directly, (2) you need parallel batch processing of independent vertices, (3) you want lexicographically smallest order with minimal modification, or (4) iterative code is preferred over recursion (e.g., very deep graphs that might overflow the stack).

Summary: Kahn's Algorithm Mastery

Key Takeaways

•Core Idea — Repeatedly find and output sources (vertices with no remaining dependencies), simulating their removal from the graph.
•Data Structure — Maintain in-degree counts and a queue of vertices with in-degree 0. Update counts as vertices are processed.
•Cycle Detection — If the queue empties before all vertices are processed, a cycle exists. The unprocessed vertices with positive in-degree are in or reachable from cycles.
•Complexity — O(V + E) time, O(V) space. Optimal for the problem.
•Correctness — Vertices are output only after all predecessors, ensuring every edge constraint is satisfied.
•Variations — Use min-heap for lexicographic ordering. Use backtracking to enumerate all orderings. Process source batches for parallelism.
•Practical Use — Build systems, task schedulers, and dependency resolvers often use Kahn's approach due to its intuitive layer-by-layer processing.

What's Next:

We'll now study the alternative approach: DFS-based topological sorting. This elegant algorithm uses the post-order traversal of depth-first search, reversed, to produce a topological ordering. It offers different intuitions and is sometimes simpler to implement.

Page Complete

You now have a complete understanding of Kahn's algorithm for topological sorting. You can implement it, prove its correctness, analyze its complexity, detect cycles, and adapt it for specialized requirements. This is one of the two fundamental tools for ordering dependencies.