Loading content...
In 1962, Arthur B. Kahn published a remarkably elegant algorithm for topological sorting. The genius of Kahn's approach lies in its simplicity: it directly implements the intuitive observation we made when proving that DAGs always have topological orderings.
The Core Insight:
Every DAG has at least one source—a vertex with no incoming edges. A source has no dependencies, so it can go first in any valid ordering. After placing a source in our output, we can conceptually 'remove' it from the graph. The remaining graph is still a DAG, so it also has a source. Repeat until all vertices are processed.
This greedy, iterative process naturally produces a valid topological ordering. If we ever reach a state where no sources remain but unprocessed vertices exist, we've detected a cycle.
By the end of this page, you will fully understand Kahn's algorithm: its mechanism, implementation details, correctness proof, time and space complexity analysis, and practical considerations including cycle detection and handling of multiple valid orderings.
Let's state Kahn's algorithm precisely:
Kahn's Algorithm:
Initialize: Compute the in-degree (number of incoming edges) for every vertex. Collect all vertices with in-degree 0 into a queue (or set) of sources.
Process Sources: While the queue is non-empty:
u from the queueu to the output orderingv of u:
v's in-degree by 1 (conceptually removing the edge u → v)v's in-degree becomes 0, add v to the queueCheck Completion:
Why This Works:
When we add u to the output and decrement neighbors' in-degrees, we're simulating the removal of u from the graph. A neighbor v becomes a source (in-degree 0) exactly when all its dependencies have been processed. Since we only process sources, we never violate any edge constraint.
123456789101112131415161718192021222324252627282930313233343536373839
Graph: A ──► C ──► E │ ▲ │ │ │ ▼ └──► B ──► D Step 0: Compute in-degrees in_degree = {A: 0, B: 1, C: 2, D: 1, E: 1} sources_queue = [A] (only A has in-degree 0) Step 1: Process A output = [A] Neighbors of A: B, C Decrement: in_degree[B] = 0, in_degree[C] = 1 B becomes source: sources_queue = [B] Step 2: Process B output = [A, B] Neighbors of B: C, D Decrement: in_degree[C] = 0, in_degree[D] = 0 Both become sources: sources_queue = [C, D] Step 3: Process C (or D - order may vary) output = [A, B, C] Neighbors of C: E Decrement: in_degree[E] = 0 E becomes source: sources_queue = [D, E] Step 4: Process D output = [A, B, C, D] Neighbors of D: (none shown going out) sources_queue = [E] Step 5: Process E output = [A, B, C, D, E] sources_queue = [] Result: [A, B, C, D, E] - Valid topological order!All 5 vertices processed, so graph is a DAG.Let's implement Kahn's algorithm in detail, handling all edge cases and providing clear structure:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788
from collections import deque def kahns_topological_sort(graph): """ Perform topological sort using Kahn's algorithm. Args: graph: Dictionary mapping each vertex to list of its neighbors Example: {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []} Returns: List of vertices in topological order, or None if cycle exists """ # Step 1: Initialize in-degree for all vertices in_degree = {v: 0 for v in graph} # Count incoming edges for each vertex for vertex in graph: for neighbor in graph[vertex]: in_degree[neighbor] += 1 # Step 2: Collect all initial sources (in-degree = 0) # Using deque for O(1) append/popleft operations sources = deque() for vertex in graph: if in_degree[vertex] == 0: sources.append(vertex) # Step 3: Process sources iteratively topological_order = [] while sources: # Remove a source from queue current = sources.popleft() topological_order.append(current) # "Remove" current from graph by decrementing neighbors' in-degrees for neighbor in graph[current]: in_degree[neighbor] -= 1 # If neighbor becomes a source, add to queue if in_degree[neighbor] == 0: sources.append(neighbor) # Step 4: Check if all vertices were processed if len(topological_order) == len(graph): return topological_order else: # Some vertices remain with positive in-degree # This means the graph has a cycle return None # Example usage and testingdef main(): # Example 1: Valid DAG dag = { 'A': ['C'], 'B': ['C', 'D'], 'C': ['E'], 'D': ['E'], 'E': [] } result = kahns_topological_sort(dag) print(f"DAG topological order: {result}") # Possible outputs: ['A', 'B', 'C', 'D', 'E'] or ['B', 'A', 'D', 'C', 'E'] etc. # Example 2: Graph with cycle cyclic = { 'A': ['B'], 'B': ['C'], 'C': ['A'] # Creates cycle } result = kahns_topological_sort(cyclic) print(f"Cyclic graph result: {result}") # None - cycle detected # Example 3: Disconnected DAG disconnected = { 'A': ['B'], 'B': [], 'C': ['D'], 'D': [] } result = kahns_topological_sort(disconnected) print(f"Disconnected DAG: {result}") # Some valid order like ['A', 'C', 'B', 'D'] if __name__ == "__main__": main()We use a queue for sources, which gives us FIFO (first-in, first-out) ordering among equally valid sources. Using a set or priority queue instead would give different valid topological orderings. For lexicographically smallest ordering, use a min-heap (priority queue) of sources.
For TypeScript/JavaScript developers, here's an idiomatic implementation with strong typing:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889
type Graph<T> = Map<T, T[]>; interface TopologicalSortResult<T> { success: boolean; order: T[] | null; cycleVertices: T[] | null; // Vertices involved in cycle if detected} function kahnsTopologicalSort<T>(graph: Graph<T>): TopologicalSortResult<T> { // Step 1: Compute in-degrees const inDegree = new Map<T, number>(); // Initialize all vertices with in-degree 0 for (const vertex of graph.keys()) { inDegree.set(vertex, 0); } // Count incoming edges for (const [vertex, neighbors] of graph.entries()) { for (const neighbor of neighbors) { inDegree.set(neighbor, (inDegree.get(neighbor) ?? 0) + 1); } } // Step 2: Initialize queue with all sources const sources: T[] = []; for (const [vertex, degree] of inDegree.entries()) { if (degree === 0) { sources.push(vertex); } } // Step 3: Process sources const topologicalOrder: T[] = []; while (sources.length > 0) { const current = sources.shift()!; // Remove from front of queue topologicalOrder.push(current); // Process all neighbors const neighbors = graph.get(current) ?? []; for (const neighbor of neighbors) { const newDegree = (inDegree.get(neighbor) ?? 0) - 1; inDegree.set(neighbor, newDegree); if (newDegree === 0) { sources.push(neighbor); } } } // Step 4: Check result if (topologicalOrder.length === graph.size) { return { success: true, order: topologicalOrder, cycleVertices: null }; } else { // Find vertices with remaining positive in-degree (part of cycles) const cycleVertices: T[] = []; for (const [vertex, degree] of inDegree.entries()) { if (degree > 0) { cycleVertices.push(vertex); } } return { success: false, order: null, cycleVertices }; }} // Usage exampleconst courseGraph: Graph<string> = new Map([ ['Intro', ['DataStructures']], ['DataStructures', ['Algorithms', 'Databases']], ['Algorithms', ['MachineLearning']], ['Databases', ['MachineLearning']], ['MachineLearning', []]]); const result = kahnsTopologicalSort(courseGraph);if (result.success) { console.log('Course order:', result.order);} else { console.log('Cannot schedule! Cycle involves:', result.cycleVertices);}Let's rigorously prove that Kahn's algorithm is correct—that is, it produces a valid topological ordering for DAGs and correctly detects cycles.
Theorem: Kahn's algorithm produces a valid topological ordering if and only if the input graph is a DAG.
Proof of Soundness (Output is Valid):
We must show that for every edge (u, v) in the original graph, u appears before v in the output.
Consider any edge (u → v). When does v get added to the output?
Proof of Completeness (Detects All Cycles):
We must show that if the algorithm fails to process all vertices, a cycle exists.
Suppose the algorithm terminates with k < n vertices processed, where n = |V|. Let S be the set of unprocessed vertices. Every vertex in S has positive in-degree (otherwise it would be in the source queue). Since no vertex in S was ever added to the queue, and the algorithm finished, every in-edge to vertices in S must come from other vertices in S. This creates a cycle: start at any vertex in S, follow an incoming edge (which leads to another vertex in S), and continue. Since S is finite, we must eventually revisit a vertex, completing a cycle.
Throughout the algorithm, we maintain the invariant: in_degree[v] equals the number of unprocessed predecessors of v. When this reaches 0, all of v's dependencies have been processed, so v can safely be added to the output next.
Let's analyze the efficiency of Kahn's algorithm in detail.
Time Complexity: O(V + E)
Breaking down each phase:
| Phase | Operations | Complexity |
|---|---|---|
| Initialize in-degrees | Set all to 0 | O(V) |
| Count in-degrees | Traverse all edges once | O(E) |
| Find initial sources | Check each vertex | O(V) |
| Main loop iterations | Each vertex processed once | O(V) |
| Edge processing | Each edge decrements in-degree once | O(E) |
| Queue operations | Each vertex enqueued/dequeued once | O(V) |
Total: O(V + E) — This is optimal! We must examine every vertex and every edge at least once to determine if the graph is a DAG and compute the ordering.
Space Complexity: O(V)
Total: O(V) — Linear in the number of vertices, independent of edge count.
Practical Considerations:
Kahn's algorithm naturally detects cycles, but for practical applications, you often want to report the cycle, not just detect it. Here's how:
Finding the Actual Cycle:
When the algorithm terminates with unprocessed vertices:
Enhanced Algorithm with Cycle Reporting:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970
from collections import deque def kahns_with_cycle_detection(graph): """ Topological sort with detailed cycle detection. Returns: (order, cycle_path) where: - order is the topological order (or partial order if cycle) - cycle_path is None if DAG, or a list showing the cycle """ in_degree = {v: 0 for v in graph} for vertex in graph: for neighbor in graph[vertex]: in_degree[neighbor] += 1 sources = deque(v for v in graph if in_degree[v] == 0) order = [] while sources: current = sources.popleft() order.append(current) for neighbor in graph[current]: in_degree[neighbor] -= 1 if in_degree[neighbor] == 0: sources.append(neighbor) if len(order) == len(graph): return order, None # No cycle # Cycle exists - find and report it unprocessed = {v for v in graph if in_degree[v] > 0} # Find a cycle by walking from any unprocessed vertex def find_cycle_from(start): visited = set() path = [] current = start while current not in visited: visited.add(current) path.append(current) # Move to any unprocessed neighbor for neighbor in graph[current]: if neighbor in unprocessed: current = neighbor break # Extract the cycle portion cycle_start_idx = path.index(current) cycle = path[cycle_start_idx:] + [current] # Close the loop return cycle cycle_path = find_cycle_from(next(iter(unprocessed))) return order, cycle_path # Examplecyclic_graph = { 'A': ['B'], 'B': ['C'], 'C': ['D', 'A'], # C→A creates cycle 'D': []} order, cycle = kahns_with_cycle_detection(cyclic_graph)if cycle: print(f"Cycle detected: {' → '.join(cycle)}") # Output: Cycle detected: A → B → C → Aelse: print(f"Topological order: {order}")A graph can have multiple independent cycles. The unprocessed set might contain vertices from different cycles. For complete diagnostics, you might need to find all cycles or at least identify all strongly connected components.
Kahn's algorithm can be adapted for various specialized requirements:
Variation 1: Lexicographically Smallest Order
Replace the queue with a min-heap (priority queue). Always process the smallest available source first.
123456789101112131415161718192021222324252627282930
import heapq def lexicographic_topological_sort(graph): """ Returns lexicographically smallest topological ordering. """ in_degree = {v: 0 for v in graph} for vertex in graph: for neighbor in graph[vertex]: in_degree[neighbor] += 1 # Use min-heap instead of queue min_heap = [v for v in graph if in_degree[v] == 0] heapq.heapify(min_heap) order = [] while min_heap: # Always take the smallest available source current = heapq.heappop(min_heap) order.append(current) for neighbor in graph[current]: in_degree[neighbor] -= 1 if in_degree[neighbor] == 0: heapq.heappush(min_heap, neighbor) return order if len(order) == len(graph) else None # With vertices 'A', 'B', 'C' all being valid first choices,# this always picks 'A' first.Variation 2: All Topological Orders
Generate all valid topological orderings using backtracking. At each step, try all current sources as the next element.
123456789101112131415161718192021222324252627282930313233343536373839404142
def all_topological_sorts(graph): """ Generate all valid topological orderings of a DAG. Warning: Can be exponentially many orderings! """ in_degree = {v: 0 for v in graph} for vertex in graph: for neighbor in graph[vertex]: in_degree[neighbor] += 1 all_orders = [] current_order = [] def backtrack(): # Find all current sources sources = [v for v in graph if in_degree[v] == 0 and v not in current_order] if not sources: if len(current_order) == len(graph): all_orders.append(current_order.copy()) return for source in sources: # Choose: add source to current order current_order.append(source) for neighbor in graph[source]: in_degree[neighbor] -= 1 # Explore backtrack() # Unchoose: remove source, restore in-degrees current_order.pop() for neighbor in graph[source]: in_degree[neighbor] += 1 backtrack() return all_orders # Example: Diamond DAG {A→B, A→C, B→D, C→D}# Has 2 valid orderings: [A,B,C,D] and [A,C,B,D]Kahn's algorithm is one of two classical approaches to topological sorting. The other uses DFS post-ordering, which we'll study on the next page. Here's a preview comparison:
| Aspect | Kahn's (BFS) | DFS Approach |
|---|---|---|
| Time Complexity | O(V + E) | O(V + E) |
| Space Complexity | O(V) for queue and in-degrees | O(V) for recursion stack and visited |
| Output Order | Forward (sources first) | Reverse post-order (sinks first, then reverse) |
| Parallelization | Natural batching of sources | Harder to parallelize |
| Cycle Detection | Implicit (count mismatch) | Explicit (back edge detection) |
| Intuition | Process sources iteratively | Finish deepest first, reverse |
| Implementation | Iterative with queue | Recursive with stack |
| Lex. Smallest | Use min-heap for sources | More complex to achieve |
Prefer Kahn's algorithm when: (1) you want the forward topological order directly, (2) you need parallel batch processing of independent vertices, (3) you want lexicographically smallest order with minimal modification, or (4) iterative code is preferred over recursion (e.g., very deep graphs that might overflow the stack).
What's Next:
We'll now study the alternative approach: DFS-based topological sorting. This elegant algorithm uses the post-order traversal of depth-first search, reversed, to produce a topological ordering. It offers different intuitions and is sometimes simpler to implement.
You now have a complete understanding of Kahn's algorithm for topological sorting. You can implement it, prove its correctness, analyze its complexity, detect cycles, and adapt it for specialized requirements. This is one of the two fundamental tools for ordering dependencies.