Loading learning content...
While Kosaraju's algorithm elegantly finds SCCs in two DFS passes, Robert Tarjan—one of the giants of algorithm design—developed an alternative approach in 1972 that accomplishes the same task in a single DFS traversal.
Tarjan's algorithm introduces the concept of low-link values—a way to track, during DFS, the earliest discovered vertex reachable from the current subtree. This information, combined with a stack that tracks the current path, allows us to identify SCCs as we discover them, without needing a second pass.
Though conceptually more subtle than Kosaraju's algorithm, Tarjan's approach is more space-efficient and forms the foundation for many other graph algorithms, including finding bridges and articulation points.
By the end of this page, you will understand DFS discovery times and their role in graph analysis, the low-link value concept and how it tracks reachability to ancestors, why we need an explicit stack to track the current DFS path, the complete Tarjan's algorithm with detailed pseudocode and implementation, correctness proof and complexity analysis, and comparison with Kosaraju's approach.
To understand Tarjan's algorithm, we must first deeply understand DFS discovery times.
Discovery Time (disc[v]):
When running DFS, we assign each vertex v a discovery time disc[v]—the order in which vertices are first visited. The first vertex visited has disc = 0, the second has disc = 1, and so on.
Properties of Discovery Times:
Uniqueness: Each vertex has a unique discovery time (assuming we visit each vertex exactly once).
Ancestor Relationship: If u is an ancestor of v in the DFS tree, then disc[u] < disc[v]. We discovered u before descending to find v.
Subtree Range: All vertices in the subtree rooted at v have discovery times in a contiguous range [disc[v], disc[v] + subtree_size - 1].
Back Edge Detection: If (v, u) is an edge and disc[u] < disc[v], and u is an ancestor of v in the DFS tree, then (v, u) is a back edge—it points from a descendant back to an ancestor.
123456789101112131415161718192021222324252627
Graph: A → B → C ↑ ↓ └───────┘ (C → A back edge) DFS from A: Visit A: disc[A] = 0 → Visit B: disc[B] = 1 → Visit C: disc[C] = 2 → Edge C → A: A already visited (disc[A] = 0 < disc[C] = 2) This is a BACK EDGE (C is descendant, A is ancestor) Finish C Finish B Finish A DFS Tree: A (disc=0) └── B (disc=1) └── C (disc=2) └── back edge to A Discovery times: {A: 0, B: 1, C: 2} The back edge C → A indicates a cycle containing A, B, C.This cycle means they're all in the same SCC.Back edges are the key to finding SCCs. A back edge from v to ancestor u means there's a cycle containing all vertices on the path from u to v. These vertices are all in the same SCC. Tarjan's algorithm tracks these back edges implicitly through low-link values.
The central concept in Tarjan's algorithm is the low-link value.
Definition: Low-Link Value
The low-link value of a vertex v, denoted low[v], is the smallest discovery time of any vertex reachable from the subtree rooted at v, including v itself, using:
Intuition:
low[v] answers the question: "Starting from v and exploring its subtree, what's the earliest-discovered vertex I can reach, possibly via a single back edge?"
If low[v] < disc[v], it means the subtree rooted at v can reach back to an ancestor of v—indicating v is not the root of an SCC.
If low[v] = disc[v], it means v is the "root" of an SCC—no vertex in its subtree can reach an earlier vertex, so this subtree (potentially with some exclusions) forms a complete SCC.
1234567891011121314151617181920212223242526272829303132
Graph: A → B → C → D ↑ ↓ └───────────┘ DFS from A (computing low-link values): Visit A: disc[A] = 0, low[A] = 0 (initially)→ Visit B: disc[B] = 1, low[B] = 1 → Visit C: disc[C] = 2, low[C] = 2 → Visit D: disc[D] = 3, low[D] = 3 → Edge D → A: A is on stack, disc[A] = 0 Update low[D] = min(low[D], disc[A]) = min(3, 0) = 0 Finish D, return low[D] = 0 Update low[C] = min(low[C], low[D]) = min(2, 0) = 0 Finish C, return low[C] = 0 Update low[B] = min(low[B], low[C]) = min(1, 0) = 0 Finish B, return low[B] = 0Update low[A] = min(low[A], low[B]) = min(0, 0) = 0Finish A Final values: disc: {A: 0, B: 1, C: 2, D: 3} low: {A: 0, B: 0, C: 0, D: 0} Interpretation:- low[A] = disc[A] = 0 → A is root of an SCC- low[B] = 0 < disc[B] = 1 → B can reach earlier vertex (A)- low[C] = 0 < disc[C] = 2 → C can reach earlier vertex (A)- low[D] = 0 < disc[D] = 3 → D can reach earlier vertex (A) All vertices have the same low-link value, indicating one SCC: {A, B, C, D}Low-link values propagate upward in the DFS tree. When we finish exploring a child, we update the parent's low-link to incorporate what the child discovered. A low low-link value 'bubbles up' through the tree, indicating reachability to an early ancestor.
A subtle but crucial aspect of Tarjan's algorithm is distinguishing between edges that should update low-link values and those that shouldn't.
The Problem with Cross Edges:
During DFS, we encounter different types of edges:
For SCC detection, back edges matter (they indicate cycles within the current exploration path). Cross edges to vertices in already-completed SCCs should NOT update low-link values—those vertices are "locked in" to their own SCCs.
The Stack Solution:
Maintain a stack of vertices currently being explored (on the current DFS path). When we encounter an edge v → u:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172
Graph with cross edge: A → B → C E → F ↑ ↓ ↓ ↑ ↓ └───D └──────┴───┘ Let me simplify: A → B → C ↑ ↓ └── D C → E → F ↑ ↓ └───┘ Two SCCs: {A, B, D} and {E, F}Cross edge: C → E DFS from A: Visit A (disc=0), push A to stack → Visit B (disc=1), push B → Visit C (disc=2), push C → Edge C → E: E unvisited → Visit E (disc=3), push E → Visit F (disc=4), push F → Edge F → E: E on stack Update low[F] = min(4, 3) = 3 Finish F (low[F]=3) disc[E] = low[E]? 3 = 3, YES! Pop until E: SCC {E, F} found! Finish E, return low[E] = 3 C → E: E is NOT on stack anymore! Do NOT update low[C] (E is in completed SCC) → Edge C → ... (no more edges from C) Finish C, disc[C]=2, low[C]=2 disc[C] = low[C]? YES → SCC {C} found! Pop C Finish B, ... Wait, let me reconsider the graph. Let me use a cleaner example: Graph: SCC₁: {0, 1, 2} with 0→1→2→0 SCC₂: {3, 4} with 3→4→3 Cross edge: 2→3 DFS from 0: Visit 0 (disc=0, stack=[0]) → Visit 1 (disc=1, stack=[0,1]) → Visit 2 (disc=2, stack=[0,1,2]) → Edge 2→0: 0 is ON STACK Update low[2] = min(2, 0) = 0 → Edge 2→3: 3 unvisited → Visit 3 (disc=3, stack=[0,1,2,3]) → Visit 4 (disc=4, stack=[0,1,2,3,4]) → Edge 4→3: 3 is ON STACK Update low[4] = min(4, 3) = 3 Finish 4, low[4]=3 ≠ disc[4]=4, not SCC root low[3] = min(low[3], low[4]) = min(3, 3) = 3 Finish 3, low[3]=3 = disc[3]=3, SCC ROOT! Pop until 3: SCC = {3, 4} Back to 2: 3 is NOT ON STACK Do NOT update low[2]! (Correct behavior) Finish 2, low[2]=0, not SCC root low[1] = min(1, 0) = 0 Finish 1, low[1]=0, not SCC root low[0] = min(0, 0) = 0 Finish 0, low[0]=0 = disc[0]=0, SCC ROOT! Pop until 0: SCC = {0, 1, 2} RESULT: Two SCCs correctly identified!If we updated low[v] based on edges to any visited vertex (not just on-stack vertices), we would incorrectly 'merge' SCCs. The on-stack check ensures we only consider back edges within the current exploration path, not cross edges to already-completed components.
Now we can present the complete algorithm.
Data Structures:
disc[v]: Discovery time of vertex v (-1 if unvisited)low[v]: Low-link value of vertex vonStack[v]: Boolean, whether v is currently on the exploration stackstack: The exploration stack containing vertices on the current DFS pathtimer: Global counter for assigning discovery timessccs: List of discovered SCCsAlgorithm Steps:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364
function tarjanSCC(n: number, edges: [number, number][]): number[][] { // Build adjacency list const graph: number[][] = Array.from({ length: n }, () => []); for (const [u, v] of edges) { graph[u].push(v); } // State variables const disc: number[] = new Array(n).fill(-1); // Discovery time const low: number[] = new Array(n).fill(-1); // Low-link value const onStack: boolean[] = new Array(n).fill(false); const stack: number[] = []; let timer = 0; const sccs: number[][] = []; function dfs(v: number): void { // Initialize discovery time and low-link value disc[v] = low[v] = timer++; stack.push(v); onStack[v] = true; // Explore all neighbors for (const u of graph[v]) { if (disc[u] === -1) { // Case 1: u is unvisited - tree edge dfs(u); // After returning, update low-link with child's low-link low[v] = Math.min(low[v], low[u]); } else if (onStack[u]) { // Case 2: u is on stack - back edge // Update low-link with u's discovery time low[v] = Math.min(low[v], disc[u]); } // Case 3: u is visited but not on stack - cross edge // Do nothing! u is in a completed SCC } // If v is the root of an SCC if (disc[v] === low[v]) { const scc: number[] = []; // Pop all vertices until we get v while (true) { const u = stack.pop()!; onStack[u] = false; scc.push(u); if (u === v) break; } sccs.push(scc); } } // Run DFS from all vertices to handle disconnected graphs for (let v = 0; v < n; v++) { if (disc[v] === -1) { dfs(v); } } return sccs;} // Example usage:// const sccs = tarjanSCC(5, [[0,1], [1,2], [2,0], [1,3], [3,4], [4,3]]);// Output: [[2, 1, 0], [4, 3]] — two SCCsThe condition disc[v] == low[v] identifies v as an SCC root. It means: 'The earliest vertex reachable from my subtree is myself.' No path from this subtree can escape to an earlier vertex, so everything on the stack from v onward forms a complete SCC.
Let's trace through a complete example to fully understand the algorithm's execution.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116
Graph with 8 vertices: 0 → 1 → 2 → 3 ↑ ↓ ↓ └───────┘ └→ 4 → 5 ↑ ↓ └───┘ 6 → 7 → 6 (separate component) Edges: 0→1, 1→2, 2→0, 2→3, 3→4, 4→5, 5→4, 6→7, 7→6 Expected SCCs: {0,1,2}, {3}, {4,5}, {6,7} ═══════════════════════════════════════════════════════════════ TARJAN'S ALGORITHM TRACE═══════════════════════════════════════════════════════════════ Start DFS from vertex 0: ┌─ Visit 0: disc[0]=0, low[0]=0, stack=[0]│ ├─ Edge 0→1: 1 unvisited, recurse│ ││ └─ Visit 1: disc[1]=1, low[1]=1, stack=[0,1]│ ├─ Edge 1→2: 2 unvisited, recurse│ ││ └─ Visit 2: disc[2]=2, low[2]=2, stack=[0,1,2]│ ├─ Edge 2→0: 0 on stack!│ │ low[2] = min(2, disc[0]) = min(2, 0) = 0│ ││ ├─ Edge 2→3: 3 unvisited, recurse│ ││ └─ Visit 3: disc[3]=3, low[3]=3, stack=[0,1,2,3]│ ├─ Edge 3→4: 4 unvisited, recurse│ ││ └─ Visit 4: disc[4]=4, low[4]=4, stack=[0,1,2,3,4]│ ├─ Edge 4→5: 5 unvisited, recurse│ ││ └─ Visit 5: disc[5]=5, low[5]=5, stack=[0,1,2,3,4,5]│ ├─ Edge 5→4: 4 on stack!│ │ low[5] = min(5, disc[4]) = min(5, 4) = 4│ ││ └─ Return from 5:│ disc[5]=5, low[5]=4│ 5 ≠ 4, NOT SCC root│ │ low[4] = min(low[4], low[5]) = min(4, 4) = 4│ Return from 4:│ disc[4]=4, low[4]=4│ 4 = 4, IS SCC ROOT!│ Pop until 4: [5, 4]│ ★ SCC found: {4, 5}│ stack now: [0,1,2,3]│ │ low[3] = min(low[3], low[4]) = min(3, 4) = 3│ (wait, 4 is popped, but we use the low value before pop)│ Actually: low[3] = min(3, ?) │ After recursion returns, 4 is no longer on stack.│ We don't update from popped vertices.│ low[3] stays 3.│ │ Return from 3:│ disc[3]=3, low[3]=3│ 3 = 3, IS SCC ROOT!│ Pop until 3: [3]│ ★ SCC found: {3}│ stack now: [0,1,2]││ low[2] = min(low[2], low[3]?) │ Actually low[2] was already 0 from the back edge.│ After returning from 3, we would do:│ low[2] = min(low[2], low[3]) = min(0, 3) = 0│ (This doesn't change anything since low[2] was already 0)││ Return from 2:│ disc[2]=2, low[2]=0│ 2 ≠ 0, NOT SCC root││ low[1] = min(low[1], low[2]) = min(1, 0) = 0│ Return from 1:│ disc[1]=1, low[1]=0│ 1 ≠ 0, NOT SCC root││ low[0] = min(low[0], low[1]) = min(0, 0) = 0│ Return from 0:│ disc[0]=0, low[0]=0│ 0 = 0, IS SCC ROOT!│ Pop until 0: [2, 1, 0]│ ★ SCC found: {0, 1, 2}│ stack now: []└────────────────────────────────────── Next unvisited: vertex 6 ┌─ Visit 6: disc[6]=6, low[6]=6, stack=[6]│ ├─ Edge 6→7: 7 unvisited, recurse│ ││ └─ Visit 7: disc[7]=7, low[7]=7, stack=[6,7]│ ├─ Edge 7→6: 6 on stack!│ │ low[7] = min(7, disc[6]) = min(7, 6) = 6│ ││ └─ Return from 7:│ disc[7]=7, low[7]=6│ 7 ≠ 6, NOT SCC root││ low[6] = min(low[6], low[7]) = min(6, 6) = 6│ Return from 6:│ disc[6]=6, low[6]=6│ 6 = 6, IS SCC ROOT!│ Pop until 6: [7, 6]│ ★ SCC found: {6, 7}└────────────────────────────────────── ═══════════════════════════════════════════════════════════════FINAL RESULT: SCCs = [{4,5}, {3}, {0,1,2}, {6,7}]═══════════════════════════════════════════════════════════════SCCs are discovered in reverse topological order of the component graph. Sink SCCs (no outgoing edges) are discovered first, source SCCs last. This is because we pop SCCs when we can no longer reach 'earlier' vertices—which happens for sinks first.
Let's prove that Tarjan's algorithm correctly identifies all SCCs.
Claim 1: All vertices in a popped SCC are mutually reachable.
Proof: When we pop an SCC at root r (where disc[r] = low[r]), we pop all vertices from r to the top of the stack. These vertices were pushed after r and before r finished processing.
Every vertex v in this set can reach r:
r can reach every such v:
Thus all popped vertices are mutually reachable. ∎
Claim 2: No vertex from a different SCC is included.
Proof: Vertices from earlier SCCs (earlier in DFS) were already popped—they're not on the stack.
Vertices from later SCCs haven't been pushed yet—they'll be processed in separate DFS calls.
The on-stack check prevents updating low-link from vertices in already-completed SCCs, so different SCCs remain distinct. ∎
Claim 3: Every SCC is found.
Proof: Every vertex is visited exactly once during DFS. When a vertex's processing completes and disc[v] = low[v], it's identified as an SCC root. Since low[v] captures reachability to ancestors:
Every strongly connected component has exactly one root—the first vertex discovered in that SCC during DFS. This root will satisfy disc = low because its low-link cannot improve (no reachable ancestors), so the SCC will be detected and popped.
Since every vertex belongs to exactly one SCC, and every SCC has a unique root that triggers a pop, all SCCs are found. ∎
Claim 4: No SCC is reported twice.
Proof: Once an SCC is popped, all its vertices are removed from the stack and have onStack = false. They cannot be part of any future SCC pop. ∎
Combining claims 1-4: Tarjan's algorithm is correct. ∎
The key invariant is: vertices on the stack form a chain of ancestors in the current DFS tree, plus any vertices we've descended to but not fully processed. When disc[v] = low[v], we've found a 'closed system'—all vertices from v to the top of the stack form a complete SCC.
Time Complexity: O(V + E)
DFS Traversal:
Stack Operations:
Low-link Updates:
Total Time: O(V + E)
The algorithm achieves optimal time complexity—same as a single DFS traversal.
Space Complexity: O(V)
Arrays:
Stack:
Recursion Stack:
Total Space: O(V)
Comparison with Kosaraju:
Kosaraju requires O(V + E) space because it builds the transpose graph explicitly. Tarjan only needs O(V) space (assuming the original graph is already given). This makes Tarjan more space-efficient for large graphs.
| Aspect | Tarjan | Kosaraju |
|---|---|---|
| Time Complexity | O(V + E) | O(V + E) |
| Space Complexity | O(V) | O(V + E) |
| DFS Passes | 1 | 2 |
| Extra Graph Storage | None | Transpose graph |
| SCC Order | Reverse topological | Topological |
| Implementation | More complex | Simpler |
For graphs with many more edges than vertices (dense graphs), Tarjan's O(V) space is significantly better than Kosaraju's O(V + E). For sparse graphs (E ≈ V), the difference is minor. Choose based on your specific constraints.
There are several practical considerations when implementing Tarjan's algorithm.
Alternative Low-Link Update:
Some implementations use low[v] = min(low[v], low[u]) instead of low[v] = min(low[v], disc[u]) for back edges. Both work, but the disc version is conceptually cleaner and matches the original paper.
Iterative Implementation:
For very large graphs, recursion may cause stack overflow. An iterative version using an explicit call stack is more robust:
Edge Cases:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374
function tarjanIterative(n: number, edges: [number, number][]): number[][] { const graph: number[][] = Array.from({ length: n }, () => []); for (const [u, v] of edges) { graph[u].push(v); } const disc: number[] = new Array(n).fill(-1); const low: number[] = new Array(n).fill(-1); const onStack: boolean[] = new Array(n).fill(false); const sccStack: number[] = []; let timer = 0; const sccs: number[][] = []; // Call stack: [vertex, neighborIndex, isProcessing] type Frame = { v: number; idx: number; }; for (let start = 0; start < n; start++) { if (disc[start] !== -1) continue; const callStack: Frame[] = [{ v: start, idx: 0 }]; while (callStack.length > 0) { const frame = callStack[callStack.length - 1]; const v = frame.v; // First time visiting this vertex if (frame.idx === 0 && disc[v] === -1) { disc[v] = low[v] = timer++; sccStack.push(v); onStack[v] = true; } // Process neighbors let foundUnvisited = false; while (frame.idx < graph[v].length) { const u = graph[v][frame.idx]; frame.idx++; if (disc[u] === -1) { // Unvisited neighbor - recurse callStack.push({ v: u, idx: 0 }); foundUnvisited = true; break; } else if (onStack[u]) { // Back edge low[v] = Math.min(low[v], disc[u]); } } if (foundUnvisited) continue; // All neighbors processed - check if SCC root if (disc[v] === low[v]) { const scc: number[] = []; while (true) { const u = sccStack.pop()!; onStack[u] = false; scc.push(u); if (u === v) break; } sccs.push(scc); } // Pop frame and update parent's low-link callStack.pop(); if (callStack.length > 0) { const parent = callStack[callStack.length - 1].v; low[parent] = Math.min(low[parent], low[v]); } } } return sccs;}The iterative version is more complex and harder to debug. Use the recursive version unless you specifically need to handle graphs with millions of vertices where recursion would overflow. Many languages/runtimes support configuring larger stack sizes as an alternative.
Tarjan's algorithm is a masterpiece of algorithmic design—achieving optimal time complexity in a single pass through clever use of low-link values. Let's consolidate the key concepts:
What's Next:
With both Kosaraju's and Tarjan's algorithms understood, we now turn to the practical applications of SCCs. The next page explores how SCC decomposition is used in real-world systems: analyzing dependencies in software, detecting deadlocks, optimizing compilers, understanding web structure, and more.
You now understand Tarjan's algorithm in depth: the low-link value concept, the on-stack check for handling cross edges, the SCC root condition, complete implementation (both recursive and iterative), correctness proof, and complexity analysis. You can implement either Kosaraju's or Tarjan's algorithm and explain the trade-offs between them.