Cycle Detection - Learning Module


Using DFS with Color/State Marking

Beyond Simple Traversal: DFS as a Discovery Engine

Depth-First Search is far more powerful than a simple graph traversal algorithm. When augmented with state marking and timestamp recording, DFS becomes a discovery engine that can classify every edge in a graph, compute vertex ordering properties, and solve a wide variety of graph problems.

The three-color marking system we introduced for cycle detection is just the beginning. By tracking when each vertex is discovered and when it's finished, we unlock capabilities that serve as the foundation for algorithms including topological sorting, strongly connected component detection, and articulation point identification.

On this page, we'll master the complete state-marking paradigm, understanding not just how it works but why each component is essential.

What You Will Learn

By the end of this page, you will understand the complete DFS state-marking framework including discovery and finish times, be able to classify every edge in any graph, and know how these concepts enable advanced algorithms. You'll implement production-ready DFS with full instrumentation.

The Three Colors Revisited: Deep Understanding

Let's build a complete mental model of the three-color system by understanding what each color truly represents in terms of the DFS traversal state.

WHITE (Undiscovered)

The vertex has never been encountered during the DFS
No information is available about the vertex or its descendants
The vertex has not been reached by any DFS call yet, so it belongs to no DFS tree

GRAY (In Progress / Active)

We have discovered this vertex and started exploring it
We are currently somewhere in the subtree rooted at this vertex
Exploration of some descendants may be complete, but NOT all
The vertex is on the current recursion stack (the path from root to current position)

BLACK (Finished / Complete)

This vertex and ALL of its descendants have been fully explored
No more DFS work will be done from this vertex
The vertex has been removed from the recursion stack
All outgoing edges have been examined

Color Semantics in DFS Execution
Color	Stack Status	Exploration Status	Time Events
WHITE	Never on stack	Not started	No timestamps yet
GRAY	Currently on stack	In progress	Has discovery time only
BLACK	Was on stack, now removed	Complete	Has both discovery and finish time

The Path Invariant

At any moment during DFS, the GRAY vertices form a path from the root of the current DFS tree to the vertex currently being explored. This path is exactly the current recursion stack. Understanding this invariant is key to reasoning about DFS properties.
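The invariant can be checked mechanically. The sketch below is a debugging aid, not part of the module's later code: `check_path_invariant` is a hypothetical helper that keeps an explicit list mirroring the recursion stack and asserts, at every discovery, that the set of GRAY vertices equals that list. It assumes vertices are numbered 0..n-1 and the graph is an adjacency dict.

```python
def check_path_invariant(adj: dict[int, list[int]], n: int) -> int:
    """Run DFS and assert, at every discovery, that GRAY == recursion path.

    Returns the number of discoveries checked (one per vertex).
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n
    path: list[int] = []  # explicit mirror of the recursion stack
    checks = 0

    def visit(u: int) -> None:
        nonlocal checks
        color[u] = GRAY
        path.append(u)
        # O(V) scan per discovery: fine for debugging, too slow for production
        gray = {v for v in range(n) if color[v] == GRAY}
        assert gray == set(path), "GRAY set diverged from the recursion path"
        checks += 1
        for v in adj.get(u, []):
            if color[v] == WHITE:
                visit(v)
        path.pop()
        color[u] = BLACK

    for s in range(n):
        if color[s] == WHITE:
            visit(s)
    return checks


# 0->1, 0->2, 1->3, 2->3, 3->1 (contains the cycle 1->3->1)
print(check_path_invariant({0: [1, 2], 1: [3], 2: [3], 3: [1]}, 4))  # 4
```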

The Color Transition Machine:

Every vertex follows exactly one trajectory through colors:

WHITE ──discover──> GRAY ──finish──> BLACK

No vertex ever goes backwards. Once a vertex is BLACK, it stays BLACK forever. This monotonicity is what makes reasoning about DFS properties tractable.
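While debugging, one way to make this monotonicity explicit is to route every color change through a guard that rejects anything other than the two legal moves. This is a minimal sketch with hypothetical names (`transition`, `ALLOWED`); it also rejects skipping GRAY, i.e. WHITE directly to BLACK.

```python
from enum import Enum


class Color(Enum):
    WHITE = 0
    GRAY = 1
    BLACK = 2


# The only two legal moves in the color state machine.
ALLOWED = {(Color.WHITE, Color.GRAY), (Color.GRAY, Color.BLACK)}


def transition(color: list[Color], v: int, new: Color) -> None:
    """Move vertex v to `new`, refusing any move outside WHITE -> GRAY -> BLACK."""
    old = color[v]
    if (old, new) not in ALLOWED:
        raise ValueError(f"illegal transition for vertex {v}: {old.name} -> {new.name}")
    color[v] = new


colors = [Color.WHITE] * 3
transition(colors, 0, Color.GRAY)   # discover
transition(colors, 0, Color.BLACK)  # finish
try:
    transition(colors, 0, Color.GRAY)  # going backwards is rejected
except ValueError as e:
    print(e)  # illegal transition for vertex 0: BLACK -> GRAY
```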

Discovery and Finish Times: Temporal Footprints

The three colors capture state, but we can capture even more information by recording when each transition happens. This gives us timestamps for each vertex:

Discovery Time (d[v]): The clock value when vertex v is first encountered (turns GRAY)

Finish Time (f[v]): The clock value when vertex v is completely processed (turns BLACK)

We maintain a global clock that increments with each event. For a graph with V vertices, the clock runs from 1 to 2V (one discovery and one finish event per vertex).

The Parenthesis Theorem:

For any two distinct vertices u and v, exactly one of the following holds:

[d[u], f[u]] and [d[v], f[v]] are disjoint: neither is an ancestor of the other
[d[u], f[u]] is contained within [d[v], f[v]]: u is a descendant of v
[d[v], f[v]] is contained within [d[u], f[u]]: v is a descendant of u

The intervals behave like properly nested parentheses—they either don't overlap at all, or one is completely inside the other. This is why it's called the Parenthesis Theorem.
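The three cases translate directly into a comparison function. In this sketch, `dfs_times` and `interval_relation` are hypothetical helpers: the first is a plain recursive DFS using the 1..2V clock described above, and the second reports which case of the theorem holds for two distinct vertices, raising if it ever sees a partial overlap, which the theorem rules out.

```python
def dfs_times(adj: dict[int, list[int]], n: int) -> tuple[list[int], list[int]]:
    """Plain recursive DFS; the clock starts at 0 and ticks before every assignment."""
    d, f = [0] * n, [0] * n
    time = 0

    def visit(u: int) -> None:
        nonlocal time
        time += 1
        d[u] = time
        for v in adj.get(u, []):
            if d[v] == 0:  # d[v] == 0 means v is still WHITE
                visit(v)
        time += 1
        f[u] = time

    for s in range(n):
        if d[s] == 0:
            visit(s)
    return d, f


def interval_relation(d: list[int], f: list[int], u: int, v: int) -> str:
    """Which Parenthesis Theorem case holds for distinct vertices u and v."""
    if f[u] < d[v] or f[v] < d[u]:
        return "disjoint"      # neither is an ancestor of the other
    if d[v] < d[u] and f[u] < f[v]:
        return "u inside v"    # u is a descendant of v
    if d[u] < d[v] and f[v] < f[u]:
        return "v inside u"    # v is a descendant of u
    raise ValueError("partial overlap: these are not valid DFS timestamps")


d, f = dfs_times({0: [1, 2], 1: [3]}, 4)
print(d, f)                           # [1, 2, 6, 3] [8, 5, 7, 4]
print(interval_relation(d, f, 0, 3))  # v inside u
print(interval_relation(d, f, 1, 2))  # disjoint
```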

dfs_timestamps.py
from collections import defaultdict
from enum import Enum
from dataclasses import dataclass
from typing import Optional
 
class Color(Enum):
    WHITE = 0
    GRAY = 1
    BLACK = 2
 
 
@dataclass
class VertexInfo:
    """Complete DFS information for a vertex."""
    color: Color = Color.WHITE
    discovery_time: int = 0
    finish_time: int = 0
    parent: int = -1  # Parent in DFS tree
 
 
class DFSWithTimestamps:
    """
    Full DFS implementation with color marking and timestamps.
    Captures complete traversal information.
    """
    
    def __init__(self, vertices: int):
        self.V = vertices
        self.adj: dict[int, list[int]] = defaultdict(list)
        self.info = [VertexInfo() for _ in range(vertices)]
        self.time = 0
        
        # Edge classification results
        self.tree_edges: list[tuple[int, int]] = []
        self.back_edges: list[tuple[int, int]] = []
        self.forward_edges: list[tuple[int, int]] = []
        self.cross_edges: list[tuple[int, int]] = []
    
    def add_edge(self, u: int, v: int) -> None:
        """Add directed edge from u to v."""
        self.adj[u].append(v)
    
    def run_dfs(self) -> None:
        """
        Execute DFS on entire graph, recording all information.
        
        After calling this method, you can access:
        - self.info[v].discovery_time, finish_time, parent
        - self.tree_edges, back_edges, forward_edges, cross_edges
        """
        self.time = 0
        
        # Reset all vertices
        for v in range(self.V):
            self.info[v] = VertexInfo()
        
        # Clear edge classifications
        self.tree_edges.clear()
        self.back_edges.clear()
        self.forward_edges.clear()
        self.cross_edges.clear()
        
        # Run DFS from each unvisited vertex
        for v in range(self.V):
            if self.info[v].color == Color.WHITE:
                self._dfs_visit(v)
    
    def _dfs_visit(self, u: int) -> None:
        """
        DFS visit with full timestamp recording and edge classification.
        """
        # Discover vertex u
        self.time += 1
        self.info[u].discovery_time = self.time
        self.info[u].color = Color.GRAY
        
        # Explore all neighbors
        for v in self.adj[u]:
            self._classify_edge(u, v)
            
            if self.info[v].color == Color.WHITE:
                # Tree edge: leads to undiscovered vertex
                self.info[v].parent = u
                self._dfs_visit(v)
        
        # Finish vertex u
        self.info[u].color = Color.BLACK
        self.time += 1
        self.info[u].finish_time = self.time
    
    def _classify_edge(self, u: int, v: int) -> None:
        """
        Classify edge (u, v) based on colors and timestamps.
        
        Called when exploring edge from u to v.
        At this point, u is GRAY (we're processing it).
        """
        v_info = self.info[v]
        
        if v_info.color == Color.WHITE:
            self.tree_edges.append((u, v))
        elif v_info.color == Color.GRAY:
            # v is being processed, so v is ancestor of u
            self.back_edges.append((u, v))
        else:  # BLACK
            u_info = self.info[u]
            if u_info.discovery_time < v_info.discovery_time:
                # v was discovered after u started but finished before u
                # So v is a descendant of u in DFS tree
                self.forward_edges.append((u, v))
            else:
                # v was completely processed before u was discovered
                # Different subtree
                self.cross_edges.append((u, v))
    
    def print_results(self) -> None:
        """Print complete DFS analysis."""
        print("\nVertex Information:")
        print("-" * 50)
        for v in range(self.V):
            info = self.info[v]
            print(f"Vertex {v}: d={info.discovery_time}, "
                  f"f={info.finish_time}, parent={info.parent}")
        
        print("\nEdge Classification:")
        print(f"  Tree edges:    {self.tree_edges}")
        print(f"  Back edges:    {self.back_edges}")
        print(f"  Forward edges: {self.forward_edges}")
        print(f"  Cross edges:   {self.cross_edges}")
        
        if self.back_edges:
            print("\n⚠️  Graph contains cycles (back edges exist)")
        else:
            print("\n✓ Graph is acyclic (DAG)")
    
    def is_ancestor(self, u: int, v: int) -> bool:
        """
        Check if u is an ancestor of v in DFS tree.
        Uses the parenthesis theorem.
        """
        u_info, v_info = self.info[u], self.info[v]
        return (u_info.discovery_time <= v_info.discovery_time and
                v_info.finish_time <= u_info.finish_time)
    
    def get_topological_order(self) -> Optional[list[int]]:
        """
        Get topological ordering if graph is acyclic.
        Vertices sorted by decreasing finish time.
        """
        if self.back_edges:
            return None  # Has cycles
        
        vertices = list(range(self.V))
        vertices.sort(key=lambda v: -self.info[v].finish_time)
        return vertices
 
 
# Example usage
if __name__ == "__main__":
    g = DFSWithTimestamps(6)
    
    # Create a graph with various edge types
    g.add_edge(0, 1)
    g.add_edge(0, 2)
    g.add_edge(1, 3)
    g.add_edge(2, 3)  # Cross edge here: 3 finishes before 2 is discovered
    g.add_edge(3, 4)
    g.add_edge(4, 5)
    g.add_edge(5, 3)  # Back edge - creates cycle
    
    g.run_dfs()
    g.print_results()
    
    # Check ancestry relationships
    print(f"\nIs 0 ancestor of 4? {g.is_ancestor(0, 4)}")
    print(f"Is 4 ancestor of 0? {g.is_ancestor(4, 0)}")

Edge Classification: A Complete Taxonomy

Every edge in a directed graph falls into exactly one of four categories during DFS traversal. Understanding these categories is crucial for reasoning about graph algorithms.

1. Tree Edges

Definition: Edges (u, v) where v is WHITE when we explore from u
Meaning: These edges form the DFS tree (or forest)
Detection: v.color == WHITE when processing edge

2. Back Edges

Definition: Edges (u, v) where v is an ancestor of u in the DFS tree
Meaning: These edges create cycles; they "go back" to vertices we're still processing
Detection: v.color == GRAY when processing edge
Key Property: A directed graph contains a cycle if and only if DFS finds at least one back edge; tree, forward, and cross edges alone can never close a cycle

3. Forward Edges

Definition: Edges (u, v) where v is a descendant of u, but not via a tree edge
Meaning: A shortcut to a descendant (we already reached v through tree edges)
Detection: v.color == BLACK AND d[u] < d[v]

4. Cross Edges

Definition: All other edges—those between vertices with no ancestor/descendant relationship
Meaning: Connect vertices from different subtrees or different DFS trees
Detection: v.color == BLACK AND d[u] > d[v]

Edge Classification Summary
Edge Type	v's Color	Timing Condition	Creates Cycle?	DFS Tree Relation
Tree	WHITE	—	No	v becomes child of u
Back	GRAY	d[v] < d[u] < f[u] < f[v]	YES	v is ancestor of u
Forward	BLACK	d[u] < d[v]	No	v is descendant of u
Cross	BLACK	d[v] < f[v] < d[u]	No	No relation
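Because the timing conditions in this table only compare timestamps, edges can also be classified after DFS has finished, as long as the parent array is kept to tell tree edges from forward edges. A minimal sketch under that assumption; the function names are illustrative, and with parallel edges every copy of a tree edge would be labeled "tree" here, whereas online classification would label the later copies as forward edges.

```python
def dfs_with_parents(adj: dict[int, list[int]], n: int):
    """Recursive DFS returning discovery times, finish times, and DFS-tree parents."""
    d, f, parent = [0] * n, [0] * n, [-1] * n
    time = 0

    def visit(u: int) -> None:
        nonlocal time
        time += 1
        d[u] = time
        for v in adj.get(u, []):
            if d[v] == 0:          # WHITE: (u, v) becomes a tree edge
                parent[v] = u
                visit(v)
        time += 1
        f[u] = time

    for s in range(n):
        if d[s] == 0:
            visit(s)
    return d, f, parent


def classify_after_dfs(u: int, v: int, d, f, parent) -> str:
    """Classify edge (u, v) from final timestamps alone (assumes no parallel edges)."""
    if d[v] <= d[u] and f[u] <= f[v]:
        return "back"              # v is an ancestor of u, or u == v (self-loop)
    if d[u] < d[v] and f[v] < f[u]:
        return "tree" if parent[v] == u else "forward"
    return "cross"                 # disjoint intervals, and necessarily f[v] < d[u]


adj = {0: [1, 2], 1: [2], 2: [0], 3: [1]}
d, f, parent = dfs_with_parents(adj, 4)
for u, targets in adj.items():
    for v in targets:
        print(f"{u}->{v}: {classify_after_dfs(u, v, d, f, parent)}")
# 0->1: tree, 0->2: forward, 1->2: tree, 2->0: back, 3->1: cross
```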

Undirected Graph Simplification

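In undirected graphs, forward and cross edges don't exist: every edge is either a tree edge or a back edge. Each undirected edge appears in both endpoints' adjacency lists, so the edge leading back to a vertex's own DFS parent must be ignored; any other GRAY neighbor signals a back edge and therefore a cycle. If you reach a neighbor v that is already BLACK while exploring u, then v was discovered and finished while u was GRAY, so v is a descendant of u reached by another path; the edge {u, v} is a back edge already seen from v's side, and the graph again has a cycle.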
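A minimal sketch of cycle detection for a simple undirected graph (no self-loops, no parallel edges; both are assumptions, and parallel edges would need edge IDs instead of a parent-vertex check). The only change from the directed version is that the edge back to the DFS parent is skipped. The function name and edge-list input are illustrative.

```python
def undirected_has_cycle(n: int, edges: list[tuple[int, int]]) -> bool:
    """Cycle check for a simple undirected graph (no self-loops or parallel edges)."""
    adj: dict[int, list[int]] = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)   # store each undirected edge in both directions
        adj[b].append(a)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def visit(u: int, parent: int) -> bool:
        color[u] = GRAY
        for v in adj[u]:
            if v == parent:
                continue      # the tree edge we arrived by, seen from the child side
            if color[v] != WHITE:
                return True   # any other discovered neighbor closes a cycle
            if visit(v, u):
                return True
        color[u] = BLACK
        return False

    # any() short-circuits, so DFS stops at the first cycle found
    return any(color[s] == WHITE and visit(s, -1) for s in range(n))


print(undirected_has_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True
print(undirected_has_cycle(4, [(0, 1), (1, 2), (2, 3)]))  # False
```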

Visual Example:

Consider DFS traversal of the graph with edges: 0→1, 0→3, 1→2, 2→3, 3→1

If DFS starts at 0 and follows 0→1 before 0→3, the vertices are discovered in the order 0, 1, 2, 3:

DFS Tree:        Edge Types:
    0            0→1: Tree
    |            0→3: Forward (3 is a descendant of 0, d[0] < d[3])
    1            1→2: Tree
    |            2→3: Tree
    2            3→1: Back (1 is GRAY, an ancestor of 3)
    |
    3

The back edge 3→1 creates the cycle 1→2→3→1.
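Because a back edge always points to an ancestor, the cycle it closes can be read off the DFS tree by following parent pointers from u up to v. A small sketch, assuming a `parent` array filled in as tree edges are taken; the helper name is hypothetical.

```python
def cycle_from_back_edge(u: int, v: int, parent: list[int]) -> list[int]:
    """Given back edge u -> v (v an ancestor of u), return the cycle v -> ... -> u -> v."""
    path = [u]
    while path[-1] != v:
        path.append(parent[path[-1]])  # climb tree edges toward the ancestor v
    path.reverse()
    return path + [v]


# Parent pointers for the example above: tree edges 0->1, 1->2, 2->3
parent = [-1, 0, 1, 2]
print(cycle_from_back_edge(3, 1, parent))  # [1, 2, 3, 1]
```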

Timestamps as Ancestry Oracles

Once DFS completes and we have discovery/finish times for all vertices, we gain the ability to answer ancestry queries in O(1) time. This is remarkably powerful for many algorithms.

The Key Insight:

Vertex u is an ancestor of vertex v in the DFS tree if and only if:

d[u] ≤ d[v] AND f[v] ≤ f[u]

In other words, u was discovered before v, and v finished before u. This makes sense: if u is an ancestor, we must have started exploring u before we could reach v, and v must complete before we can complete u (we're waiting for the entire subtree).

Applications:

LCA Queries: Combined with other techniques, timestamps help find lowest common ancestors
Subtree Queries: Check if vertex v is in the subtree rooted at u
Edge Classification: Distinguish forward edges from cross edges
Ordering Validation: Verify topological ordering correctness

ancestry_queries.py
class AncestryOracle:
    """
    Preprocesses a graph for O(1) ancestry queries.
    """
    
    def __init__(self, vertices: int, edges: list[tuple[int, int]]):
        self.V = vertices
        self.adj: dict[int, list[int]] = {i: [] for i in range(vertices)}
        for u, v in edges:
            self.adj[u].append(v)
        
        self.discovery = [0] * vertices
        self.finish = [0] * vertices
        self.time = 0
        
        self._run_dfs()
    
    def _run_dfs(self) -> None:
        """Precompute all timestamps."""
        visited = [False] * self.V
        for v in range(self.V):
            if not visited[v]:
                self._dfs(v, visited)
    
    def _dfs(self, u: int, visited: list[bool]) -> None:
        visited[u] = True
        self.time += 1
        self.discovery[u] = self.time
        
        for v in self.adj[u]:
            if not visited[v]:
                self._dfs(v, visited)
        
        self.time += 1
        self.finish[u] = self.time
    
    def is_ancestor(self, u: int, v: int) -> bool:
        """
        O(1) check: Is u an ancestor of v in DFS tree?
        """
        return (self.discovery[u] <= self.discovery[v] and
                self.finish[v] <= self.finish[u])
    
    def is_descendant(self, u: int, v: int) -> bool:
        """Is u a descendant of v?"""
        return self.is_ancestor(v, u)
    
    def are_related(self, u: int, v: int) -> bool:
        """Is either an ancestor of the other?"""
        return self.is_ancestor(u, v) or self.is_ancestor(v, u)
    
    def subtree_size(self, u: int) -> int:
        """
        Number of vertices in subtree rooted at u.
        Uses the observation that descendants have
        discovery times in interval [d[u], f[u]].
        """
        count = 0
        for v in range(self.V):
            if (self.discovery[u] <= self.discovery[v] <= self.finish[u]):
                count += 1
        return count
    
    def get_interval(self, v: int) -> tuple[int, int]:
        """Get the [discovery, finish] interval for vertex v."""
        return (self.discovery[v], self.finish[v])
 
 
# Example: Building a tree-like structure
#      0
#     /|\
#    1 2 3
#    |   |
#    4   5
 
oracle = AncestryOracle(6, [
    (0, 1), (0, 2), (0, 3),
    (1, 4), (3, 5)
])
 
print("Intervals (discovery, finish):")
for v in range(6):
    print(f"  Vertex {v}: {oracle.get_interval(v)}")
 
print(f"\nIs 0 ancestor of 4? {oracle.is_ancestor(0, 4)}")  # True
print(f"Is 0 ancestor of 5? {oracle.is_ancestor(0, 5)}")  # True  
print(f"Is 1 ancestor of 5? {oracle.is_ancestor(1, 5)}")  # False (different subtree)
print(f"Is 4 ancestor of 1? {oracle.is_ancestor(4, 1)}")  # False (reverse)
print(f"Are 4 and 5 related? {oracle.are_related(4, 5)}")  # False (cousins)
print(f"Subtree size of 0: {oracle.subtree_size(0)}")  # 6

The Euler Tour Technique

The discovery/finish times are closely related to the Euler Tour technique for tree problems. Each vertex appears twice in the timeline: at discovery and at finish. This creates a sequence where the interval [d[u], f[u]] exactly captures the subtree rooted at u, enabling many advanced tree algorithms.
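A minimal sketch of the idea on an explicit rooted tree given as child lists (an assumption; the `AncestryOracle` above derives its tree from a general graph). Each vertex in u's subtree contributes exactly two events inside [d[u], f[u]], so the subtree's vertices are the "enter" events in that slice, and the subtree size is available in O(1) as (f[u] - d[u] + 1) / 2. The helper names are hypothetical.

```python
def euler_tour(children: dict[int, list[int]], root: int):
    """Record enter/exit events; d[u] and f[u] are the 1-based positions of u's events."""
    events: list[tuple[str, int]] = []
    d: dict[int, int] = {}
    f: dict[int, int] = {}

    def visit(u: int) -> None:
        events.append(("enter", u))
        d[u] = len(events)
        for c in children.get(u, []):
            visit(c)
        events.append(("exit", u))
        f[u] = len(events)

    visit(root)
    return events, d, f


def subtree_vertices(events, d, f, u: int) -> list[int]:
    """Every vertex entered inside u's interval [d[u], f[u]] is in u's subtree."""
    return [v for kind, v in events[d[u] - 1 : f[u]] if kind == "enter"]


# Same shape as the AncestryOracle example: 0 -> {1, 2, 3}, 1 -> 4, 3 -> 5
events, d, f = euler_tour({0: [1, 2, 3], 1: [4], 3: [5]}, 0)
print(subtree_vertices(events, d, f, 3))  # [3, 5]
print((f[0] - d[0] + 1) // 2)             # 6, the subtree size of 0 in O(1)
```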

Applications of Edge Classification

The ability to classify edges enables a wide variety of graph algorithms. Here are the key applications:

1. Cycle Detection (Back Edges)

As we've seen, a directed graph has a cycle if and only if DFS discovers at least one back edge. This is the most direct application.

2. Topological Sorting (Absence of Back Edges)

A DAG can be topologically sorted by outputting vertices in decreasing order of finish time. Since there are no back edges, no vertex points to one that finishes later.

3. Strongly Connected Components (Using Tree + Back Edges)

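Tarjan's algorithm uses a single DFS and tracks, through low-link values, the earliest vertex still on its stack that each subtree can reach via back edges (and cross edges into the current stack). Kosaraju's algorithm instead runs DFS twice: the decreasing finish-time order from the first pass drives a second DFS on the transposed graph.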
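As a concrete illustration of the Kosaraju side, here is a minimal sketch of the textbook two-pass scheme: one DFS records vertices in increasing finish time, then a second DFS on the transposed graph, started from vertices in decreasing finish time, collects one component per tree. It is written recursively for brevity and assumes vertices 0..n-1 and small inputs.

```python
def kosaraju_scc(n: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    adj: list[list[int]] = [[] for _ in range(n)]
    radj: list[list[int]] = [[] for _ in range(n)]  # transposed graph
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: record vertices in order of increasing finish time.
    seen = [False] * n
    order: list[int] = []

    def dfs1(u: int) -> None:
        seen[u] = True
        for v in adj[u]:
            if not seen[v]:
                dfs1(v)
        order.append(u)  # u finishes now

    for s in range(n):
        if not seen[s]:
            dfs1(s)

    # Pass 2: DFS on the transpose, latest-finishing vertex first.
    comp = [-1] * n
    sccs: list[list[int]] = []

    def dfs2(u: int, label: int) -> None:
        comp[u] = label
        sccs[label].append(u)
        for v in radj[u]:
            if comp[v] == -1:
                dfs2(v, label)

    for s in reversed(order):
        if comp[s] == -1:
            sccs.append([])
            dfs2(s, len(sccs) - 1)
    return sccs


# The graph from dfs_timestamps.py: 3 -> 4 -> 5 -> 3 is the only cycle
print(kosaraju_scc(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 3)]))
# [[0], [2], [1], [3, 5, 4]]
```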

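4. Control Flow Analysis (Forward + Cross Edges)

In control flow graphs, forward and cross edges mark alternative paths that reconverge, such as the two branches of an if/else meeting at a join block. Unreachable code is found separately: it consists of every block that DFS from the entry block never discovers.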
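A minimal sketch of the unreachable-code check, assuming a control flow graph given as a dict from block name to successor names (the block names are invented for illustration). The search marks blocks when they are pushed, so its visiting order is not strictly DFS order, but the set of reached blocks is the same, which is all this check needs.

```python
def unreachable_blocks(cfg: dict[str, list[str]], entry: str) -> set[str]:
    """Blocks never reached from `entry`; an explicit stack avoids deep recursion."""
    seen = {entry}
    stack = [entry]
    while stack:
        block = stack.pop()
        for succ in cfg.get(block, []):
            if succ not in seen:      # mark on push so each block is queued once
                seen.add(succ)
                stack.append(succ)
    return set(cfg) - seen


cfg = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],   # with 'then' explored first, else->join is a cross edge
    "join": ["exit"],
    "dead": ["join"],   # no path from entry reaches this block
    "exit": [],
}
print(unreachable_blocks(cfg, "entry"))  # {'dead'}
```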

edge_applications.py
class EdgeApplications:
    """
    Demonstrates practical applications of DFS edge classification.
    """
    
    def __init__(self, vertices: int):
        self.V = vertices
        self.adj: dict[int, list[int]] = {i: [] for i in range(vertices)}
    
    def add_edge(self, u: int, v: int):
        self.adj[u].append(v)
    
    def topological_sort(self) -> list[int] | None:
        """
        Return topological order if DAG, None if has cycles.
        Uses the fact that in a DAG, processing vertices by
        decreasing finish time gives valid topological order.
        """
        WHITE, GRAY, BLACK = 0, 1, 2
        color = [WHITE] * self.V
        result = []
        
        def dfs(u: int) -> bool:
            color[u] = GRAY
            for v in self.adj[u]:
                if color[v] == GRAY:
                    return False  # Back edge - cycle!
                if color[v] == WHITE:
                    if not dfs(v):
                        return False
            color[u] = BLACK
            result.append(u)  # Add on FINISH (will reverse later)
            return True
        
        for v in range(self.V):
            if color[v] == WHITE:
                if not dfs(v):
                    return None  # Has cycle
        
        return result[::-1]  # Reverse to get decreasing finish order
    
    def find_all_cycles(self) -> list[list[int]]:
        """
        Report one cycle per back edge found by a single DFS.
        Note: this is NOT every simple cycle in the graph (a graph can
        have exponentially many); enumerating all of them needs a
        dedicated algorithm such as Johnson's.
        """
        WHITE, GRAY, BLACK = 0, 1, 2
        color = [WHITE] * self.V
        cycles = []
        
        def dfs(u: int, path: list[int]):
            color[u] = GRAY
            path.append(u)  # path mirrors the GRAY vertices (recursion stack)
            
            for v in self.adj[u]:
                if color[v] == GRAY:
                    # Back edge! v is on the current path; slice out the cycle
                    idx = path.index(v)
                    cycle = path[idx:] + [v]
                    cycles.append(cycle)
                elif color[v] == WHITE:
                    dfs(v, path)
            
            path.pop()
            color[u] = BLACK
        
        for start in range(self.V):
            if color[start] == WHITE:
                dfs(start, [])
        
        return cycles
    
    def reachability_analysis(self) -> dict:
        """
        Analyze reachability patterns using edge classification.
        Returns statistics about the graph structure.
        """
        WHITE, GRAY, BLACK = 0, 1, 2
        color = [WHITE] * self.V
        discovery = [0] * self.V
        finish = [0] * self.V
        time = 0
        
        stats = {
            'tree_edges': 0,
            'back_edges': 0,
            'forward_edges': 0,
            'cross_edges': 0,
            'root_vertices': 0,  # DFS tree roots
        }
        
        def dfs(u: int):
            nonlocal time
            time += 1
            discovery[u] = time
            color[u] = GRAY
            
            for v in self.adj[u]:
                if color[v] == WHITE:
                    stats['tree_edges'] += 1
                    dfs(v)
                elif color[v] == GRAY:
                    stats['back_edges'] += 1
                else:  # BLACK
                    if discovery[u] < discovery[v]:
                        stats['forward_edges'] += 1
                    else:
                        stats['cross_edges'] += 1
            
            color[u] = BLACK
            time += 1
            finish[u] = time
        
        for v in range(self.V):
            if color[v] == WHITE:
                stats['root_vertices'] += 1
                dfs(v)
        
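        # Derive additional insights
        stats['is_dag'] = stats['back_edges'] == 0
        # Every edge is a tree edge and there is a single DFS root:
        # the graph is one out-tree (arborescence) rooted at vertex 0.
        stats['is_tree'] = (stats['root_vertices'] == 1
                           and stats['back_edges'] == 0
                           and stats['forward_edges'] == 0
                           and stats['cross_edges'] == 0)
        # One root means every vertex is reachable from vertex 0. In a
        # directed graph this is NOT the same as (weak or strong) connectivity.
        stats['all_reachable_from_0'] = stats['root_vertices'] == 1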
        
        return stats
 
 
# Example usage
g = EdgeApplications(6)
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
for u, v in edges:
    g.add_edge(u, v)
 
print("Topological sort:", g.topological_sort())
print("Reachability analysis:", g.reachability_analysis())
 
# Add a cycle
g.add_edge(5, 3)
print("\nAfter adding cycle 5→3:")
print("Topological sort:", g.topological_sort())
print("Cycles found:", g.find_all_cycles())
print("Reachability analysis:", g.reachability_analysis())

Iterative DFS: Simulating the Color State Machine

For very deep graphs, recursive DFS can overflow the call stack. We can implement the same algorithm iteratively using an explicit stack, but we need to be careful to maintain the same state transitions.

The Challenge:

In recursive DFS, a vertex naturally turns BLACK when we return from the function. In iterative DFS, we must explicitly handle both the "discover" and "finish" events.
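To see the problem concretely, the sketch below runs a recursive DFS down a path whose length is chosen to exceed CPython's recursion limit (`sys.getrecursionlimit()`, 1000 frames by default), then performs the same walk with an explicit stack of (vertex, remaining-neighbor iterator) pairs, making the GRAY and BLACK transitions exactly where the recursive version would enter and return. The helper names are hypothetical.

```python
import sys


def path_graph(n: int) -> dict[int, list[int]]:
    """Hypothetical test input: the path 0 -> 1 -> ... -> n-1."""
    return {i: [i + 1] for i in range(n - 1)}


def recursive_visits_all(adj: dict[int, list[int]], n: int) -> bool:
    seen: set[int] = set()

    def visit(u: int) -> None:
        seen.add(u)
        for v in adj.get(u, []):
            if v not in seen:
                visit(v)

    try:
        visit(0)
    except RecursionError:
        return False  # Python refused to go deeper than its recursion limit
    return len(seen) == n


def iterative_finish_order(adj: dict[int, list[int]], n: int) -> list[int]:
    """Explicit stack of (vertex, remaining-neighbor iterator); returns finish order."""
    color = ["WHITE"] * n
    finished: list[int] = []
    for s in range(n):
        if color[s] != "WHITE":
            continue
        color[s] = "GRAY"                        # discover s
        stack = [(s, iter(adj.get(s, [])))]
        while stack:
            u, neighbors = stack[-1]
            v = next(neighbors, None)
            if v is None:                        # neighbors exhausted: finish u
                stack.pop()
                color[u] = "BLACK"
                finished.append(u)
            elif color[v] == "WHITE":            # tree edge: discover v
                color[v] = "GRAY"
                stack.append((v, iter(adj.get(v, []))))
    return finished


n = sys.getrecursionlimit() + 500  # deliberately deeper than the limit
adj = path_graph(n)
print(recursive_visits_all(adj, n))        # False
print(iterative_finish_order(adj, n)[:3])  # the deepest vertices finish first
```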

Two Approaches:

Iterator-based: Store an iterator over neighbors on the stack, resume where we left off
Event-based: Push both "discover" and "finish" events to the stack

Let's implement the event-based approach for maximum clarity:

iterative_dfs_colors.py
from enum import Enum
from dataclasses import dataclass
 
class Color(Enum):
    WHITE = 0
    GRAY = 1
    BLACK = 2
 
class EventType(Enum):
    DISCOVER = 0  # Start processing vertex
    FINISH = 1    # Done processing vertex
 
 
@dataclass
class StackEvent:
    """Event to process from the stack."""
    event_type: EventType
    vertex: int
 
 
class IterativeDFSColors:
    """
    Iterative DFS with full three-color state tracking.
    Avoids recursion stack overflow for deep graphs.
    """
    
    def __init__(self, vertices: int):
        self.V = vertices
        self.adj: dict[int, list[int]] = {i: [] for i in range(vertices)}
        self.color = [Color.WHITE] * vertices
        self.discovery = [0] * vertices
        self.finish = [0] * vertices
        self.parent = [-1] * vertices
        self.time = 0
        self.has_cycle = False
    
    def add_edge(self, u: int, v: int):
        self.adj[u].append(v)
    
    def run_dfs(self) -> None:
        """
        Execute complete DFS with iterative implementation.
        """
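        self.time = 0
        # Reset every per-vertex array so a repeated run never sees stale data
        self.color = [Color.WHITE] * self.V
        self.discovery = [0] * self.V
        self.finish = [0] * self.V
        self.parent = [-1] * self.V
        self.has_cycle = False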
        
        for source in range(self.V):
            if self.color[source] == Color.WHITE:
                self._dfs_from(source)
    
    def _dfs_from(self, source: int) -> None:
        """
        DFS from single source using explicit stack.
        
        Key insight: when a vertex is discovered, push its FINISH event
        first and then DISCOVER events for its WHITE neighbors. Because
        the stack is LIFO, every neighbor (and its whole subtree) is
        processed before the FINISH event is popped.
        """
        stack = [StackEvent(EventType.DISCOVER, source)]
        
        while stack:
            event = stack.pop()
            vertex = event.vertex
            
            if event.event_type == EventType.DISCOVER:
                if self.color[vertex] != Color.WHITE:
                    # Already discovered through another path
                    continue
                
                # Discover vertex (WHITE → GRAY)
                self.time += 1
                self.discovery[vertex] = self.time
                self.color[vertex] = Color.GRAY
                
                # Push FINISH event (will execute after all descendants)
                stack.append(StackEvent(EventType.FINISH, vertex))
                
                # Push neighbor explorations (in reverse for natural order)
                for neighbor in reversed(self.adj[vertex]):
                    if self.color[neighbor] == Color.WHITE:
                        self.parent[neighbor] = vertex
                        stack.append(StackEvent(EventType.DISCOVER, neighbor))
                    elif self.color[neighbor] == Color.GRAY:
                        # Back edge detected - cycle!
                        self.has_cycle = True
            
            else:  # FINISH event
                # Finish vertex (GRAY → BLACK)
                if self.color[vertex] == Color.GRAY:  # Guard against duplicates
                    self.time += 1
                    self.finish[vertex] = self.time
                    self.color[vertex] = Color.BLACK
    
    def print_state(self) -> None:
        print("\nDFS Results (Iterative):")
        print("-" * 40)
        for v in range(self.V):
            print(f"Vertex {v}: d={self.discovery[v]:2d}, "
                  f"f={self.finish[v]:2d}, parent={self.parent[v]}")
        print(f"\nHas cycle: {self.has_cycle}")
 
 
class IterativeDFSIterator:
    """
    Alternative: Iterator-based approach.
    More memory efficient: the stack holds one entry per vertex on the
    current GRAY path (O(V)) instead of pending events per edge (O(E)).
    """
    
    def __init__(self, vertices: int):
        self.V = vertices
        self.adj: dict[int, list[int]] = {i: [] for i in range(vertices)}
    
    def add_edge(self, u: int, v: int):
        self.adj[u].append(v)
    
    def has_cycle(self) -> bool:
        """
        Check for cycle using iterator-based iterative DFS.
        """
        WHITE, GRAY, BLACK = 0, 1, 2
        color = [WHITE] * self.V
        
        for source in range(self.V):
            if color[source] != WHITE:
                continue
            
            # Stack: (vertex, iterator over remaining neighbors)
            stack = [(source, iter(self.adj[source]))]
            color[source] = GRAY
            
            while stack:
                vertex, neighbors = stack[-1]
                
                try:
                    neighbor = next(neighbors)
                    
                    if color[neighbor] == GRAY:
                        return True  # Back edge!
                    elif color[neighbor] == WHITE:
                        color[neighbor] = GRAY
                        stack.append((neighbor, iter(self.adj[neighbor])))
                
                except StopIteration:
                    # Done with all neighbors
                    stack.pop()
                    color[vertex] = BLACK
        
        return False
 
 
# Demonstration
g = IterativeDFSColors(6)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
for u, v in edges:
    g.add_edge(u, v)
 
g.run_dfs()
g.print_state()
 
# Add a back edge
g.add_edge(5, 2)
g.run_dfs()
g.print_state()

Event-Based vs Iterator-Based

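The event-based approach is conceptually cleaner and easier to extend with additional processing at discovery or finish time. The iterator-based approach needs less memory: its stack holds only the vertices on the current GRAY path (O(V)), whereas the event-based stack can hold a pending DISCOVER entry per edge (O(E)), because a vertex may be pushed once by each predecessor that sees it as WHITE. For simple tasks such as cycle detection, the iterator-based version is usually the better default.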
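The memory difference is easy to measure. This sketch (a hypothetical helper that runs from vertex 0 only) records the peak stack length of both styles on a complete directed graph. On K_n the event-based peak works out to n + (n-2)(n-1)/2, while the iterator-based stack never exceeds the length of the GRAY path, which here is n.

```python
def peak_stack_sizes(adj: list[list[int]]) -> tuple[int, int]:
    """Peak stack length of (event-based, iterator-based) DFS from vertex 0."""
    n = len(adj)
    WHITE, GRAY, BLACK = 0, 1, 2

    # Event-based: on DISCOVER, push FINISH then one DISCOVER per WHITE neighbor.
    color = [WHITE] * n
    stack = [("discover", 0)]
    event_peak = 1
    while stack:
        kind, u = stack.pop()
        if kind == "discover":
            if color[u] != WHITE:
                continue                   # stale entry: u was reached another way
            color[u] = GRAY
            stack.append(("finish", u))
            for v in reversed(adj[u]):
                if color[v] == WHITE:
                    stack.append(("discover", v))
        else:
            color[u] = BLACK
        event_peak = max(event_peak, len(stack))

    # Iterator-based: exactly one entry per GRAY vertex on the current path.
    color = [WHITE] * n
    color[0] = GRAY
    frames = [(0, iter(adj[0]))]
    iterator_peak = 1
    while frames:
        u, neighbors = frames[-1]
        v = next(neighbors, None)
        if v is None:
            frames.pop()
            color[u] = BLACK
        elif color[v] == WHITE:
            color[v] = GRAY
            frames.append((v, iter(adj[v])))
        iterator_peak = max(iterator_peak, len(frames))
    return event_peak, iterator_peak


n = 50
complete = [[v for v in range(n) if v != u] for u in range(n)]  # every u -> v edge
print(peak_stack_sizes(complete))  # (1226, 50)
```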

Common Mistakes and Debugging Strategies

State-based DFS is conceptually elegant but has several subtle pitfalls. Here are the most common mistakes and how to avoid them:

Common Mistakes

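•Forgetting to mark BLACK: If a finished vertex stays GRAY, any later edge into it looks like a back edge, so an acyclic diamond such as 0→1, 0→2, 1→3, 2→3 gets reported as cyclic. In recursive DFS, the BLACK transition belongs at the end of the function; in iterative DFS, it must happen explicitly when a vertex's neighbors are exhausted, and it is easy to forget.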
•Wrong order of neighbor processing in iterative DFS: If you push neighbors directly onto the stack, they're processed in reverse order (LIFO). For consistent behavior with recursive DFS, reverse the neighbor list when pushing.
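•Inconsistent clock updates: With the convention used here, the clock starts at 0 and is incremented immediately before it is assigned to discovery_time and again immediately before it is assigned to finish_time, so timestamps run from 1 to 2V. A common bug is to increment on only one of the two events, which produces duplicate timestamps and breaks the Parenthesis Theorem.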
•Not handling disconnected graphs: Always loop over all vertices as potential starting points, not just vertex 0.
•Checking color incorrectly for cycle detection: For directed graphs, only GRAY→GRAY edges are back edges. BLACK means the vertex is done, not that it's an ancestor.
•Modifying the graph during traversal: Adding edges while DFS is running can cause unpredictable behavior. Build the complete graph first.
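A cheap way to catch several of these mistakes at once (a missing BLACK transition, an inconsistent clock, a wrong neighbor order, a skipped component) is differential testing: run a straightforward recursive DFS and your iterative version on many small random graphs and require identical timestamps. A sketch with hypothetical function names and a fixed seed so any failure is reproducible:

```python
import random


def recursive_times(n: int, adj: list[list[int]]):
    d, f, time = [0] * n, [0] * n, 0

    def visit(u: int) -> None:
        nonlocal time
        time += 1
        d[u] = time
        for v in adj[u]:
            if d[v] == 0:  # WHITE
                visit(v)
        time += 1
        f[u] = time

    for s in range(n):
        if d[s] == 0:
            visit(s)
    return d, f


def iterative_times(n: int, adj: list[list[int]]):
    """Iterator-based DFS; should reproduce recursive_times exactly."""
    d, f, time = [0] * n, [0] * n, 0
    for s in range(n):
        if d[s]:
            continue
        time += 1
        d[s] = time
        stack = [(s, iter(adj[s]))]
        while stack:
            u, neighbors = stack[-1]
            v = next(neighbors, None)
            if v is None:          # all neighbors examined: finish u
                stack.pop()
                time += 1
                f[u] = time
            elif d[v] == 0:        # tree edge: discover v
                time += 1
                d[v] = time
                stack.append((v, iter(adj[v])))
    return d, f


def differential_test(trials: int = 200, n: int = 8, p: float = 0.3, seed: int = 0) -> int:
    rng = random.Random(seed)
    for _ in range(trials):
        adj = [[v for v in range(n) if rng.random() < p] for _ in range(n)]
        assert recursive_times(n, adj) == iterative_times(n, adj), adj
    return trials


print(differential_test())  # 200
```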

debugging_helpers.py
def verify_dfs_timestamps(vertices: int, discovery: list[int], finish: list[int]) -> list[str]:
    """
    Verify that DFS timestamps are valid.

    Assumes the clock starts at 0 and is incremented BEFORE every assignment,
    so a complete DFS uses each time in 1..2*vertices exactly once.
    Returns a list of any violations found (empty if the timestamps are valid).
    """
    violations = []

    # Check basic bounds and that every vertex is discovered before it finishes
    for v in range(vertices):
        if discovery[v] < 1 or discovery[v] > 2 * vertices:
            violations.append(f"Vertex {v}: invalid discovery time {discovery[v]}")
        if finish[v] < 1 or finish[v] > 2 * vertices:
            violations.append(f"Vertex {v}: invalid finish time {finish[v]}")
        if discovery[v] >= finish[v]:
            violations.append(f"Vertex {v}: discovery ({discovery[v]}) >= finish ({finish[v]})")

    # Check that all 2 * vertices timestamps are unique
    all_times = discovery + finish
    if len(all_times) != len(set(all_times)):
        violations.append("Duplicate timestamps found!")

    # Check parenthesis property: any two intervals are disjoint or nested
    for u in range(vertices):
        for v in range(u + 1, vertices):
            du, fu = discovery[u], finish[u]
            dv, fv = discovery[v], finish[v]

            # Improper (partial) overlap: one interval starts inside the other
            # but ends outside it
            if du < dv < fu < fv or dv < du < fv < fu:
                violations.append(f"Vertices {u} and {v} have improperly overlapping intervals")

    return violations


def trace_dfs_execution(adj: dict[int, list[int]], source: int = 0) -> None:
    """
    Trace DFS execution step by step for debugging.

    Assumes vertices are labeled 0..len(adj)-1 and every vertex is a key of adj.
    """
    vertices = len(adj)
    color = ['W'] * vertices  # 'W' = WHITE, 'G' = GRAY, 'B' = BLACK
    discovery = [0] * vertices
    finish = [0] * vertices
    time = 0

    def dfs(u: int, indent: str = "") -> None:
        nonlocal time
        time += 1                # increment BEFORE assigning the discovery time
        discovery[u] = time
        color[u] = 'G'           # u is now on the recursion stack

        print(f"{indent}DISCOVER {u} at time {time}")
        print(f"{indent}  Colors: {color}")

        for v in adj.get(u, []):
            # Classify each edge before recursing so the trace prints in order
            if color[v] == 'W':
                print(f"{indent}  Edge {u}→{v}: TREE")
                dfs(v, indent + "    ")
            elif color[v] == 'G':
                print(f"{indent}  Edge {u}→{v}: BACK (cycle!)")
            elif discovery[u] < discovery[v]:
                # v is BLACK and was discovered after u: a finished descendant
                print(f"{indent}  Edge {u}→{v}: FORWARD")
            else:
                # v is BLACK and was discovered before u: another subtree or tree
                print(f"{indent}  Edge {u}→{v}: CROSS")

        color[u] = 'B'           # u and all of its descendants are finished
        time += 1                # increment BEFORE assigning the finish time
        finish[u] = time
        print(f"{indent}FINISH {u} at time {time}")

    print(f"\nDFS Trace from source {source}")
    print("=" * 50)

    # Start at the requested source, then restart from every vertex that is
    # still WHITE so disconnected parts of the graph are also explored
    start_order = [source] + [v for v in range(vertices) if v != source]
    for v in start_order:
        if color[v] == 'W':
            if v != source:
                print(f"\nNew tree starting at {v}")
            dfs(v)

    print("\nFinal timestamps:")
    for v in range(vertices):
        print(f"  Vertex {v}: d={discovery[v]}, f={finish[v]}")

    errors = verify_dfs_timestamps(vertices, discovery, finish)
    if errors:
        print("\nValidation FAILED:")
        for e in errors:
            print(f"  - {e}")
    else:
        print("\nValidation PASSED ✓")


# Example trace
trace_dfs_execution({
    0: [1, 2],
    1: [3],
    2: [3],
    3: [4],
    4: []
})
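When a trace reports a BACK edge, the next debugging question is usually which vertices form the cycle. A minimal sketch, assuming the same 0..n-1 vertex labeling as `trace_dfs_execution` (the name `find_cycle` is hypothetical): record each vertex's tree parent, and when a back edge u→v is found, walk the parent pointers from u up to the GRAY ancestor v.

```python
from typing import Optional

WHITE, GRAY, BLACK = 0, 1, 2


def find_cycle(adj: dict[int, list[int]]) -> Optional[list[int]]:
    """
    Return one directed cycle as [v, ..., u, v], or None if the graph is acyclic.

    Assumes vertices are labeled 0..len(adj)-1.
    """
    n = len(adj)
    color = [WHITE] * n
    parent = [-1] * n  # tree parent of each vertex in the DFS forest

    def dfs(u: int) -> Optional[list[int]]:
        color[u] = GRAY
        for v in adj.get(u, []):
            if color[v] == WHITE:
                parent[v] = u
                cycle = dfs(v)
                if cycle is not None:
                    return cycle  # stop as soon as one cycle is found
            elif color[v] == GRAY:
                # Back edge u→v: v is an ancestor of u on the current path.
                # Walk parent pointers from u up to v to recover the cycle.
                path = [u]
                while path[-1] != v:
                    path.append(parent[path[-1]])
                path.reverse()       # now ordered v, ..., u
                return path + [v]    # repeat v to close the cycle
            # BLACK neighbor: forward or cross edge, which cannot close a cycle
        color[u] = BLACK
        return None

    for s in range(n):  # cover disconnected parts of the graph
        if color[s] == WHITE:
            cycle = dfs(s)
            if cycle is not None:
                return cycle
    return None


print(find_cycle({0: [1], 1: [2], 2: [0]}))
print(find_cycle({0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}))
```

For example, it returns [0, 1, 2, 0] for the three-vertex cycle and None for the acyclic example graph traced above.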

Summary: Mastering DFS State Marking

The three-color DFS framework with timestamps is one of the most powerful tools in graph algorithms. Let's consolidate what we've learned:

Key Takeaways

•Three colors capture complete state: WHITE (undiscovered), GRAY (in progress on stack), BLACK (finished). Every vertex transitions WHITE→GRAY→BLACK exactly once.
•Timestamps provide temporal order: Discovery time marks when we start; finish time marks when we complete. The interval [d[v], f[v]] captures the complete exploration window for v.
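•Parenthesis Theorem enables O(1) ancestry queries: Any two vertices' intervals are either disjoint or one contains the other, and u is an ancestor of v in the DFS forest exactly when [d[v], f[v]] lies inside [d[u], f[u]]. Once a single DFS has recorded the timestamps, each ancestry check is two comparisons.
•Edge classification reveals structure: Tree edges build the DFS forest, back edges indicate cycles, forward edges are shortcuts to descendants already reached through tree edges, and cross edges connect vertices in different subtrees or different DFS trees.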
•Back edges are the key to cycle detection: In directed graphs, a cycle exists iff we find at least one back edge (GRAY→GRAY).
•Iterative implementation requires explicit events: Use an event-based or iterator-based approach to avoid recursion stack overflow while maintaining correct state transitions.
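The last takeaway and the Parenthesis Theorem can be combined in one sketch. It assumes vertices labeled 0..n-1 and uses hypothetical names (`iterative_dfs_timestamps`, `is_ancestor`). It illustrates the iterator-based variant: each stack frame stores a vertex together with an iterator over its remaining neighbors, so a vertex turns BLACK and receives its finish time only when that iterator is exhausted, just as a recursive call would return.

```python
from typing import Iterator


def iterative_dfs_timestamps(adj: dict[int, list[int]]) -> tuple[list[int], list[int], bool]:
    """
    Iterator-based DFS without recursion. Returns (discovery, finish, has_cycle).

    Assumes vertices are labeled 0..len(adj)-1.
    """
    n = len(adj)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n
    discovery = [0] * n
    finish = [0] * n
    time = 0
    has_cycle = False

    for s in range(n):  # restart from every WHITE vertex
        if color[s] != WHITE:
            continue
        time += 1
        discovery[s] = time
        color[s] = GRAY
        # Each frame: (vertex, iterator over its not-yet-examined neighbors)
        stack: list[tuple[int, Iterator[int]]] = [(s, iter(adj.get(s, [])))]
        while stack:
            u, neighbors = stack[-1]
            v = next(neighbors, None)  # examine u's next neighbor, if any
            if v is None:
                # All neighbors examined: u finishes (GRAY -> BLACK)
                stack.pop()
                color[u] = BLACK
                time += 1
                finish[u] = time
            elif color[v] == WHITE:
                # Tree edge: discover v and "recurse" by pushing its frame
                time += 1
                discovery[v] = time
                color[v] = GRAY
                stack.append((v, iter(adj.get(v, []))))
            elif color[v] == GRAY:
                has_cycle = True  # back edge: v is on the current path
            # BLACK neighbor: forward or cross edge, nothing to do
    return discovery, finish, has_cycle


def is_ancestor(u: int, v: int, discovery: list[int], finish: list[int]) -> bool:
    """Parenthesis Theorem: u is a proper ancestor of v iff v's interval nests in u's."""
    return discovery[u] < discovery[v] and finish[v] < finish[u]


d, f, cyclic = iterative_dfs_timestamps({0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []})
print(d, f, cyclic)
print(is_ancestor(0, 3, d, f), is_ancestor(2, 3, d, f))
```

On the example graph used by the trace helper this yields d = [1, 2, 8, 3, 4] and f = [10, 7, 9, 6, 5] with no cycle. is_ancestor(0, 3, d, f) is True, while is_ancestor(2, 3, d, f) is False: vertex 3 is reachable from 2, but 2→3 is a cross edge because 3 finished before 2 was discovered.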

Coming Up Next:

In the final page of this module, we'll explore the practical applications of cycle detection, from deadlock detection in operating systems to dependency resolution in package managers. We'll see how these theoretical concepts translate into real-world engineering solutions.

Page Complete

You now have complete mastery of the DFS state-marking paradigm. You understand colors, timestamps, edge classification, and both recursive and iterative implementations. This knowledge is foundational for many advanced graph algorithms including topological sort, SCC detection, and articulation point finding.