When each resource type in the system has exactly one instance—one printer, one database connection, one exclusive file lock—deadlock detection becomes remarkably elegant. In this scenario, the wait-for graph provides a complete and direct representation of the system's deadlock potential.
Why is single-instance detection simpler? Because resource allocation is binary: a resource is either held by exactly one process or available. There's no ambiguity about "which instance" a process might receive. When process P waits for resource R, and R is held by process Q, we know exactly that P is waiting for Q. This directness translates into a graph where:

- each node is a process,
- an edge P → Q means "P is waiting for a resource held by Q," and
- a cycle exists if and only if a deadlock exists.
This equivalence between cycles and deadlocks makes single-instance detection both conceptually clear and computationally efficient.
By the end of this page, you will understand why cycles in the wait-for graph precisely indicate deadlock for single-instance resources, master multiple algorithms for cycle detection (DFS, topological sort), implement production-quality detection code, and analyze the time and space complexity of these approaches.
The wait-for graph (WFG) is a directed graph that captures the essential waiting relationships between processes.
Formal Definition:

Given:

- a set of processes P = {P₁, P₂, …, Pₙ}, and
- a set of single-instance resources R = {R₁, R₂, …, Rₘ},

the wait-for graph G = (V, E) is defined as:

- V = P (one vertex per process)
- E = the set of wait-for edges

An edge Pᵢ → Pⱼ exists if and only if Pᵢ is blocked waiting for a resource that is currently held by Pⱼ.
```python
from dataclasses import dataclass, field
from typing import Dict, Set, List, Optional, Tuple
from enum import Enum


class ResourceState(Enum):
    """State of a single-instance resource."""
    FREE = 0
    HELD = 1


@dataclass
class Resource:
    """A single-instance resource."""
    id: int
    name: str
    state: ResourceState = ResourceState.FREE
    holder: Optional[int] = None  # Process ID holding this resource


@dataclass
class Process:
    """A process that may hold or request resources."""
    pid: int
    name: str
    holding: Set[int] = field(default_factory=set)  # Resource IDs held
    waiting_for: Optional[int] = None  # Resource ID waiting for (if any)


@dataclass
class WaitForGraph:
    """
    Wait-for graph for single-instance resource deadlock detection.

    The graph is derived from the current state of processes and
    resources. Cycles in this graph indicate definite deadlock.
    """
    adjacency: Dict[int, Set[int]] = field(default_factory=dict)  # pid -> set of pids
    processes: Dict[int, Process] = field(default_factory=dict)
    resources: Dict[int, Resource] = field(default_factory=dict)

    def add_process(self, process: Process):
        """Register a process in the system."""
        self.processes[process.pid] = process
        self.adjacency[process.pid] = set()

    def add_resource(self, resource: Resource):
        """Register a single-instance resource."""
        self.resources[resource.id] = resource

    def allocate(self, pid: int, resource_id: int) -> bool:
        """
        Attempt to allocate a resource to a process.

        Returns True if successful, False if resource is held by another.
        """
        resource = self.resources[resource_id]
        process = self.processes[pid]

        if resource.state == ResourceState.FREE:
            # Grant the resource
            resource.state = ResourceState.HELD
            resource.holder = pid
            process.holding.add(resource_id)
            process.waiting_for = None  # No longer waiting
            self._rebuild_edges_for(pid)
            return True
        else:
            # Resource is held - process must wait
            process.waiting_for = resource_id
            self._rebuild_edges_for(pid)
            return False

    def release(self, pid: int, resource_id: int):
        """Release a resource held by a process."""
        resource = self.resources[resource_id]
        process = self.processes[pid]

        if resource.holder == pid:
            resource.state = ResourceState.FREE
            resource.holder = None
            process.holding.discard(resource_id)

            # Rebuild edges for all processes that might have been waiting for this
            for p in self.processes.values():
                if p.waiting_for == resource_id:
                    self._rebuild_edges_for(p.pid)

    def _rebuild_edges_for(self, pid: int):
        """Rebuild wait-for edges for a specific process."""
        process = self.processes[pid]
        self.adjacency[pid].clear()

        if process.waiting_for is not None:
            resource = self.resources[process.waiting_for]
            if resource.holder is not None and resource.holder != pid:
                # This process waits for the holder
                self.adjacency[pid].add(resource.holder)

    def get_edges(self) -> List[Tuple[int, int]]:
        """Get all edges as (waiting_pid, holding_pid) tuples."""
        edges = []
        for pid, waiting_for in self.adjacency.items():
            for holder_pid in waiting_for:
                edges.append((pid, holder_pid))
        return edges

    def print_graph(self):
        """Print the current wait-for graph."""
        print("Wait-For Graph:")
        print("  Processes:", [p.name for p in self.processes.values()])
        print("  Edges:")
        for pid, waiting_for in self.adjacency.items():
            if waiting_for:
                p = self.processes[pid]
                for holder_pid in waiting_for:
                    h = self.processes[holder_pid]
                    r = self.resources.get(p.waiting_for)
                    print(f"    {p.name} --({r.name if r else '?'})--> {h.name}")


# Demonstration
def demonstrate_wait_for_graph():
    wfg = WaitForGraph()

    # Create processes
    for i, name in enumerate(["Web Server", "Database", "Cache", "Logger"]):
        wfg.add_process(Process(pid=i, name=name))

    # Create single-instance resources
    for i, name in enumerate(["Mutex A", "Mutex B", "Mutex C"]):
        wfg.add_resource(Resource(id=i, name=name))

    # Simulate a deadlock scenario:
    #   P0 acquires R0, then requests R1
    #   P1 acquires R1, then requests R2
    #   P2 acquires R2, then requests R0
    print("=== Initial allocations ===")
    wfg.allocate(0, 0)  # P0 gets R0 (Mutex A)
    wfg.allocate(1, 1)  # P1 gets R1 (Mutex B)
    wfg.allocate(2, 2)  # P2 gets R2 (Mutex C)
    wfg.print_graph()

    print("=== Creating deadlock ===")
    wfg.allocate(0, 1)  # P0 wants R1 (held by P1) -> P0 waits for P1
    wfg.allocate(1, 2)  # P1 wants R2 (held by P2) -> P1 waits for P2
    wfg.allocate(2, 0)  # P2 wants R0 (held by P0) -> P2 waits for P0
    wfg.print_graph()
    print("Cycle: P0 -> P1 -> P2 -> P0 [DEADLOCK!]")


if __name__ == "__main__":
    demonstrate_wait_for_graph()
```

Why Cycles Mean Deadlock (for Single Instances):
Suppose we have a cycle: P₀ → P₁ → P₂ → ... → Pₖ → P₀
This means: P₀ is waiting for a resource held by P₁, P₁ is waiting for a resource held by P₂, and so on, until Pₖ, which is waiting for a resource held by P₀.

No process in this cycle can make progress: each one is blocked until its successor releases a resource, and every successor is itself blocked waiting on the next.
Since each resource has only one instance, there's no chance of another instance becoming available. The cycle IS the deadlock.
This perfect correspondence between cycles and deadlocks holds ONLY for single-instance resources. With multiple instances, a cycle is necessary but not sufficient—the allocation of specific instances matters. We'll explore this distinction when we cover multiple-instance detection.
The most intuitive approach to finding cycles is depth-first search (DFS). The algorithm colors nodes to track their exploration state and identifies cycles via "back edges"—edges pointing to an ancestor in the current DFS tree.
The Three-Color Algorithm:

- WHITE — the node has not been visited yet.
- GRAY — the node is on the current DFS path (exploration in progress).
- BLACK — the node and all of its descendants are fully explored.
A back edge exists when we encounter a GRAY node during DFS—meaning we've found a path back to an ancestor, completing a cycle.
```python
from enum import IntEnum
from typing import Dict, Set, List, Optional
from dataclasses import dataclass


class Color(IntEnum):
    """DFS node coloring for cycle detection."""
    WHITE = 0  # Not yet visited
    GRAY = 1   # Currently in DFS stack (being explored)
    BLACK = 2  # Fully explored


@dataclass
class CycleDetectionResult:
    """Result of cycle detection algorithm."""
    has_cycle: bool
    cycle: List[int]      # Nodes in the cycle (if found)
    deadlocked: Set[int]  # All nodes involved in some cycle


def detect_cycles_dfs(adjacency: Dict[int, Set[int]]) -> CycleDetectionResult:
    """
    Detect cycles in a directed graph using DFS with three-color marking.

    Time Complexity: O(V + E) where V = vertices, E = edges
    Space Complexity: O(V) for color array and recursion stack

    Args:
        adjacency: Mapping from node ID to set of neighbor node IDs

    Returns:
        CycleDetectionResult with cycle information
    """
    if not adjacency:
        return CycleDetectionResult(has_cycle=False, cycle=[], deadlocked=set())

    color: Dict[int, Color] = {node: Color.WHITE for node in adjacency}
    parent: Dict[int, Optional[int]] = {node: None for node in adjacency}
    cycle_found: List[int] = []
    all_in_cycles: Set[int] = set()

    def dfs(node: int) -> bool:
        """DFS from a node, returning True if a cycle is found."""
        color[node] = Color.GRAY

        for neighbor in adjacency.get(node, set()):
            if color[neighbor] == Color.GRAY:
                # Back edge found! This is a cycle.
                # Reconstruct the cycle: neighbor -> ... -> node -> neighbor
                nonlocal cycle_found
                cycle_found = reconstruct_cycle(node, neighbor, parent)
                all_in_cycles.update(cycle_found)
                return True
            elif color[neighbor] == Color.WHITE:
                parent[neighbor] = node
                if dfs(neighbor):
                    return True

        color[node] = Color.BLACK
        return False

    def reconstruct_cycle(current: int, cycle_start: int,
                          parent_map: Dict[int, Optional[int]]) -> List[int]:
        """
        Reconstruct the cycle path from cycle_start back to cycle_start.

        When we find a back edge current -> cycle_start, the cycle is:
        cycle_start -> ... -> current -> cycle_start
        """
        cycle = [current]
        node = current
        while parent_map[node] != cycle_start and parent_map[node] is not None:
            node = parent_map[node]
            cycle.append(node)
        cycle.append(cycle_start)
        cycle.reverse()  # Start from cycle_start
        return cycle

    # Run DFS from each unvisited node
    for start_node in adjacency:
        if color[start_node] == Color.WHITE:
            if dfs(start_node):
                # Found a cycle starting from this component
                pass  # Continue to find all cycles

    return CycleDetectionResult(
        has_cycle=len(cycle_found) > 0,
        cycle=cycle_found,
        deadlocked=all_in_cycles
    )


def find_all_cycles(adjacency: Dict[int, Set[int]]) -> List[List[int]]:
    """
    Find ALL cycles in the graph, not just one.

    Uses Johnson's algorithm concept: for each node, find all cycles
    that include that node as the smallest-indexed node. This is more
    expensive but provides complete information.
    """
    all_cycles: List[List[int]] = []
    nodes = sorted(adjacency.keys())

    for start in nodes:
        # Find cycles starting from 'start' that only go through
        # nodes >= start (to avoid counting the same cycle multiple times)
        blocked: Set[int] = set()
        path: List[int] = []

        def circuit(v: int) -> bool:
            found_cycle = False
            path.append(v)
            blocked.add(v)

            for w in adjacency.get(v, set()):
                if w == start:
                    # Found a cycle
                    all_cycles.append(path.copy())
                    found_cycle = True
                elif w not in blocked and w > start:
                    if circuit(w):
                        found_cycle = True

            if found_cycle:
                blocked.discard(v)
            path.pop()
            return found_cycle

        circuit(start)

    return all_cycles


# Demonstration
def demo_cycle_detection():
    print("=== Single Cycle ===")
    # P0 -> P1 -> P2 -> P0
    graph1 = {0: {1}, 1: {2}, 2: {0}}
    result1 = detect_cycles_dfs(graph1)
    print(f"Has cycle: {result1.has_cycle}")
    print(f"Cycle: {result1.cycle}")
    print(f"Deadlocked processes: {result1.deadlocked}")

    print("=== No Cycle ===")
    # P0 -> P1 -> P2 (no back edge)
    graph2 = {0: {1}, 1: {2}, 2: set()}
    result2 = detect_cycles_dfs(graph2)
    print(f"Has cycle: {result2.has_cycle}")

    print("=== Multiple Cycles ===")
    # P0 -> P1 -> P0 (cycle 1)
    # P2 -> P3 -> P2 (cycle 2, disconnected)
    graph3 = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
    all_cycles = find_all_cycles(graph3)
    print(f"All cycles found: {all_cycles}")

    print("=== Complex Graph ===")
    # 0 -> 1 -> 2
    # ^    |    |
    # |    v    |
    # 4 <- 3 <--+
    graph4 = {0: {1}, 1: {2, 3}, 2: {3}, 3: {4}, 4: {0}}
    result4 = detect_cycles_dfs(graph4)
    print(f"Has cycle: {result4.has_cycle}")
    print(f"One cycle: {result4.cycle}")


if __name__ == "__main__":
    demo_cycle_detection()
```

Algorithm Correctness Proof:
We prove that a GRAY node encounter indicates a cycle:
Gray nodes are ancestors: When DFS is exploring node u (colored GRAY), every node on the path from the DFS root to u is also GRAY—they are all still on the recursion stack.
Back edge creates cycle: If we're at node v, exploring edge v → w, and w is GRAY, then w is an ancestor of v. The path w → ... → v plus edge v → w forms a cycle.
Completeness: Any cycle in the graph will be detected because DFS explores all edges. When we traverse an edge that's part of a cycle, we'll find a back edge.
Black nodes are safe: If w is BLACK, we've fully explored w and found no cycle through it. Edge v → w doesn't create a new cycle because any cycle containing this edge would have been detected during w's exploration.
For production systems with potentially deep recursion, convert the recursive DFS to an iterative version using an explicit stack. This prevents stack overflow for graphs with thousands of nodes or long chains. The logic remains identical—we just manage the stack manually.
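A minimal sketch of that conversion, using the same adjacency-dict representation as above: each stack frame pairs a node with an iterator over its not-yet-explored neighbors, so a frame resumes exactly where it left off after a descent.

```python
from enum import IntEnum
from typing import Dict, Set


class Color(IntEnum):
    WHITE = 0  # Not yet visited
    GRAY = 1   # On the current DFS path
    BLACK = 2  # Fully explored


def has_cycle_iterative(adjacency: Dict[int, Set[int]]) -> bool:
    """Three-color cycle detection without recursion (no stack-overflow risk)."""
    color = {node: Color.WHITE for node in adjacency}

    for start in adjacency:
        if color[start] != Color.WHITE:
            continue
        # Each frame is (node, iterator over its unexplored neighbors)
        stack = [(start, iter(adjacency.get(start, ())))]
        color[start] = Color.GRAY
        while stack:
            node, neighbors = stack[-1]
            advanced = False
            for neighbor in neighbors:
                if color[neighbor] == Color.GRAY:
                    return True  # Back edge: cycle found
                if color[neighbor] == Color.WHITE:
                    color[neighbor] = Color.GRAY
                    stack.append((neighbor, iter(adjacency.get(neighbor, ()))))
                    advanced = True
                    break  # Descend before finishing this frame
            if not advanced:
                color[node] = Color.BLACK  # All neighbors explored
                stack.pop()
    return False
```

Because the explicit stack lives on the heap, this version handles wait chains far deeper than Python's default recursion limit (~1000 frames) without any tuning.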
An alternative to DFS is the topological sort approach using Kahn's algorithm. The insight: a directed acyclic graph (DAG) can be topologically sorted, but a graph with cycles cannot. If topological sort fails to include all nodes, cycles exist.
Kahn's Algorithm Insight:
Topological sort works by repeatedly removing nodes with no incoming edges (in-degree = 0). In a DAG, we can always find such a node until the graph is empty. In a graph with cycles, we'll reach a point where all remaining nodes have incoming edges—they're in cycles.
```python
from collections import deque
from typing import Dict, Set, List
from dataclasses import dataclass


@dataclass
class KahnDetectionResult:
    """Result of Kahn's algorithm-based deadlock detection."""
    has_deadlock: bool
    deadlocked: Set[int]   # Processes in deadlock
    safe_order: List[int]  # Safe processes in completion order


def detect_deadlock_kahn(adjacency: Dict[int, Set[int]]) -> KahnDetectionResult:
    """
    Detect deadlock using Kahn's topological sort algorithm.

    Key insight for wait-for graphs: a process with out-degree 0 is not
    waiting for anyone, so it can run to completion. When it completes
    and releases its resources, processes waiting for it may unblock.
    We therefore repeatedly remove nodes with out-degree 0; any node
    that can never be removed is stuck.

    Time Complexity: O(V + E)
    Space Complexity: O(V)
    """
    if not adjacency:
        return KahnDetectionResult(has_deadlock=False, deadlocked=set(), safe_order=[])

    # Out-degree = number of processes this process is waiting for
    out_degree = {node: len(neighbors) for node, neighbors in adjacency.items()}

    # Processes with out-degree 0 are not waiting - they can complete
    queue = deque(node for node in adjacency if out_degree[node] == 0)

    safe_order: List[int] = []
    removed: Set[int] = set()

    while queue:
        # This process can complete
        process = queue.popleft()
        safe_order.append(process)
        removed.add(process)

        # When this process completes, anyone waiting for it might now proceed
        for waiting_process, waits_for in adjacency.items():
            if process in waits_for and waiting_process not in removed:
                out_degree[waiting_process] -= 1
                if out_degree[waiting_process] == 0:
                    queue.append(waiting_process)

    # Any node not removed is either in a cycle or transitively waiting
    # on one - either way, it can never proceed
    deadlocked = set(adjacency.keys()) - removed

    return KahnDetectionResult(
        has_deadlock=len(deadlocked) > 0,
        deadlocked=deadlocked,
        safe_order=safe_order
    )


def detect_deadlock_reduction(adjacency: Dict[int, Set[int]]) -> KahnDetectionResult:
    """
    Alternative formulation: graph reduction algorithm.

    1. Find all nodes with out-degree 0 (not waiting)
    2. Remove them and all edges to them
    3. Repeat until no more removals are possible
    4. Remaining nodes are deadlocked

    This is conceptually identical to Kahn's but framed as reduction.
    """
    # Make a working copy
    remaining = {node: neighbors.copy() for node, neighbors in adjacency.items()}
    safe_order: List[int] = []

    changed = True
    while changed:
        changed = False
        # Find nodes with out-degree 0
        removable = [node for node, neighbors in remaining.items()
                     if len(neighbors) == 0]
        for node in removable:
            safe_order.append(node)
            del remaining[node]
            changed = True
            # Remove this node from all neighbor sets
            for neighbors in remaining.values():
                neighbors.discard(node)

    return KahnDetectionResult(
        has_deadlock=len(remaining) > 0,
        deadlocked=set(remaining.keys()),
        safe_order=safe_order
    )


# Demonstration
def demo_kahn_detection():
    print("=== Deadlock Scenario ===")
    # P0 -> P1 -> P2 -> P0 (cycle)
    g1 = {0: {1}, 1: {2}, 2: {0}}
    result1 = detect_deadlock_kahn(g1)
    print(f"Has deadlock: {result1.has_deadlock}")
    print(f"Deadlocked: {result1.deadlocked}")
    print(f"Safe processes: {result1.safe_order}")

    print("=== Some Safe, Some Deadlocked ===")
    # P1 -> P2 -> P1 (P1, P2 in cycle)
    # P0 -> P1 and P3 -> P1 (both depend on deadlocked P1)
    # P4 not waiting (P4 is safe)
    g2 = {0: {1}, 1: {2}, 2: {1}, 3: {1}, 4: set()}
    result2 = detect_deadlock_kahn(g2)
    print(f"Has deadlock: {result2.has_deadlock}")
    print(f"Deadlocked: {result2.deadlocked}")
    print(f"Safe processes: {result2.safe_order}")

    print("=== Reduction Algorithm ===")
    result3 = detect_deadlock_reduction(g2)
    print(f"Has deadlock: {result3.has_deadlock}")
    print(f"Deadlocked: {result3.deadlocked}")
    print(f"Safe processes: {result3.safe_order}")


if __name__ == "__main__":
    demo_kahn_detection()
```

Comparison: DFS vs Kahn's Algorithm:
| Aspect | DFS (Three-Color) | Kahn's Algorithm |
|---|---|---|
| Time Complexity | O(V + E) | O(V + E) |
| Space Complexity | O(V) + recursion stack | O(V) + queue |
| Output | One cycle path | All deadlocked nodes |
| Early termination | Stops at first cycle | Processes entire graph |
| Recursive? | Yes (or explicit stack) | No (iterative) |
| Additional info | Cycle path for debugging | Safe execution order |
| Best for | Quick detection | Full analysis |
DFS is better when you just need to know IF there's a deadlock and want to stop early. Kahn's algorithm is better when you need to identify ALL deadlocked processes and want the safe completion order for non-deadlocked processes. Most production systems use DFS for its simplicity and early termination.
A production deadlock detector must handle real-world concerns: thread safety, efficient updates, minimal locking, and detailed reporting. Here's a production-quality implementation:
```cpp
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <mutex>
#include <shared_mutex>
#include <optional>
#include <chrono>
#include <atomic>
#include <memory>      // std::shared_ptr
#include <functional>  // std::function
#include <cstdint>     // uint64_t

/**
 * Production-grade single-instance deadlock detector.
 *
 * Features:
 * - Lock-free graph reads via copy-on-write
 * - Concurrent reads during detection
 * - Detailed cycle reporting
 * - Performance metrics
 */
class DeadlockDetector {
public:
    using ProcessId = uint64_t;
    using ResourceId = uint64_t;

    struct DetectionResult {
        bool has_deadlock;
        std::vector<ProcessId> cycle;           // Cycle path
        std::vector<ProcessId> all_deadlocked;  // All affected processes
        std::chrono::nanoseconds detection_time;
    };

    struct ResourceHold {
        ResourceId resource;
        ProcessId holder;
    };

private:
    // Graph state (copy-on-write for lock-free reads)
    struct GraphState {
        std::unordered_map<ProcessId, std::unordered_set<ProcessId>> wait_for;
        std::unordered_map<ResourceId, ProcessId> resource_holders;
        std::unordered_map<ProcessId, std::optional<ResourceId>> process_waiting;
        uint64_t version = 0;
    };

    std::shared_ptr<GraphState> current_state_;
    mutable std::shared_mutex state_mutex_;

    // Metrics (mutable: updated from the const detect() method)
    mutable std::atomic<uint64_t> detection_count_{0};
    mutable std::atomic<uint64_t> deadlock_count_{0};
    mutable std::atomic<uint64_t> total_detection_ns_{0};

public:
    DeadlockDetector() : current_state_(std::make_shared<GraphState>()) {}

    /**
     * Record that a process now holds a resource.
     * Thread-safe.
     */
    void record_acquire(ProcessId process, ResourceId resource) {
        std::unique_lock lock(state_mutex_);

        // Copy-on-write
        auto new_state = std::make_shared<GraphState>(*current_state_);
        new_state->resource_holders[resource] = process;
        new_state->process_waiting[process] = std::nullopt;
        new_state->version++;

        rebuild_edges_for(process, *new_state);
        current_state_ = new_state;
    }

    /**
     * Record that a process released a resource.
     * Thread-safe.
     */
    void record_release(ProcessId process, ResourceId resource) {
        std::unique_lock lock(state_mutex_);

        auto new_state = std::make_shared<GraphState>(*current_state_);
        new_state->resource_holders.erase(resource);
        new_state->version++;

        // Rebuild edges for all processes waiting for this resource
        for (auto& [pid, waiting_for] : new_state->process_waiting) {
            if (waiting_for && *waiting_for == resource) {
                rebuild_edges_for(pid, *new_state);
            }
        }
        current_state_ = new_state;
    }

    /**
     * Record that a process is blocked waiting for a resource.
     * Thread-safe.
     */
    void record_block(ProcessId process, ResourceId resource) {
        std::unique_lock lock(state_mutex_);

        auto new_state = std::make_shared<GraphState>(*current_state_);
        new_state->process_waiting[process] = resource;
        new_state->version++;

        rebuild_edges_for(process, *new_state);
        current_state_ = new_state;
    }

    /**
     * Detect deadlock in the current graph state.
     * Thread-safe; does not block writers while traversing.
     */
    DetectionResult detect() const {
        auto start = std::chrono::high_resolution_clock::now();

        // Get snapshot (lock-free read after this)
        std::shared_ptr<GraphState> snapshot;
        {
            std::shared_lock lock(state_mutex_);
            snapshot = current_state_;
        }

        DetectionResult result;
        result.has_deadlock = false;

        // DFS-based cycle detection
        enum Color { WHITE, GRAY, BLACK };
        std::unordered_map<ProcessId, Color> color;
        std::unordered_map<ProcessId, ProcessId> parent;

        for (const auto& [pid, _] : snapshot->wait_for) {
            color[pid] = WHITE;
        }

        std::function<bool(ProcessId)> dfs = [&](ProcessId node) -> bool {
            color[node] = GRAY;

            auto it = snapshot->wait_for.find(node);
            if (it != snapshot->wait_for.end()) {
                for (ProcessId neighbor : it->second) {
                    if (color[neighbor] == GRAY) {
                        // Back edge - cycle found!
                        result.has_deadlock = true;
                        // Reconstruct cycle: walk parents from node back
                        // to neighbor (the cycle entry point)
                        result.cycle.push_back(node);
                        ProcessId curr = node;
                        while (curr != neighbor) {
                            curr = parent[curr];
                            result.cycle.push_back(curr);
                        }
                        return true;
                    }
                    if (color[neighbor] == WHITE) {
                        parent[neighbor] = node;
                        if (dfs(neighbor)) return true;
                    }
                }
            }

            color[node] = BLACK;
            return false;
        };

        for (const auto& [pid, _] : snapshot->wait_for) {
            if (color[pid] == WHITE) {
                if (dfs(pid)) break;
            }
        }

        // Collect all deadlocked (still GRAY when detection stopped)
        if (result.has_deadlock) {
            for (const auto& [pid, c] : color) {
                if (c == GRAY) {
                    result.all_deadlocked.push_back(pid);
                }
            }
        }

        auto end = std::chrono::high_resolution_clock::now();
        result.detection_time =
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);

        // Update metrics
        detection_count_++;
        if (result.has_deadlock) deadlock_count_++;
        total_detection_ns_ += result.detection_time.count();

        return result;
    }

    /**
     * Get detection statistics.
     */
    struct Stats {
        uint64_t total_detections;
        uint64_t deadlocks_found;
        double avg_detection_us;
    };

    Stats get_stats() const {
        Stats s;
        s.total_detections = detection_count_.load();
        s.deadlocks_found = deadlock_count_.load();
        s.avg_detection_us = s.total_detections > 0
            ? (total_detection_ns_.load() / 1000.0) / s.total_detections
            : 0.0;
        return s;
    }

private:
    void rebuild_edges_for(ProcessId pid, GraphState& state) {
        state.wait_for[pid].clear();

        auto waiting_it = state.process_waiting.find(pid);
        if (waiting_it != state.process_waiting.end() && waiting_it->second) {
            ResourceId resource = *waiting_it->second;
            auto holder_it = state.resource_holders.find(resource);
            if (holder_it != state.resource_holders.end() &&
                holder_it->second != pid) {
                state.wait_for[pid].insert(holder_it->second);
            }
        }
    }
};
```

Key Production Features:
Copy-on-Write State: Graph updates create a new state object. Readers always see a consistent snapshot without blocking writers.
Shared Mutex: Multiple detection threads can run concurrently (shared lock). Updates take the exclusive lock, copy the state, and publish it with a pointer swap—readers never observe a half-updated graph, at the cost of an O(graph) copy per update.
Metrics Collection: Track detection count, deadlock count, and average detection time for monitoring and alerting.
Detailed Results: Return not just "has deadlock" but the cycle path and all affected processes for debugging and recovery.
Nanosecond Timing: Precise timing allows tuning detection frequency based on actual overhead.
In highly concurrent systems, be careful of ABA problems where state changes back to an earlier configuration between your check and action. The version number in our GraphState helps detect such issues. Always validate that the state you're acting on is still current.
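A minimal sketch of that validate-before-acting pattern. The `VersionedGraph` class and the victim-selection policy here are illustrative (not part of the detector above); the point is the version comparison between snapshot and action:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class VersionedGraph:
    """Toy wait-for graph with a version counter bumped on every mutation."""
    adjacency: Dict[int, Set[int]] = field(default_factory=dict)
    version: int = 0

    def snapshot(self) -> Tuple[Dict[int, Set[int]], int]:
        # Copy so later mutations don't affect the detector's view
        return ({n: s.copy() for n, s in self.adjacency.items()}, self.version)

    def add_edge(self, waiter: int, holder: int) -> None:
        self.adjacency.setdefault(waiter, set()).add(holder)
        self.adjacency.setdefault(holder, set())
        self.version += 1


def find_cycle(adjacency: Dict[int, Set[int]]) -> Optional[List[int]]:
    """Return one cycle as a list of pids, or None (simple recursive DFS)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in adjacency}

    def dfs(n: int, path: List[int]) -> Optional[List[int]]:
        color[n] = GRAY
        path.append(n)
        for m in adjacency.get(n, ()):
            if color.get(m) == GRAY:
                return path[path.index(m):]  # Slice out just the cycle
            if color.get(m, WHITE) == WHITE and (c := dfs(m, path)):
                return c
        path.pop()
        color[n] = BLACK
        return None

    for n in list(adjacency):
        if color[n] == WHITE and (c := dfs(n, [])):
            return c
    return None


def pick_victim_if_current(graph: VersionedGraph) -> Optional[int]:
    """Detect on a snapshot; act only if the graph hasn't changed since."""
    adjacency, seen_version = graph.snapshot()
    cycle = find_cycle(adjacency)
    if cycle is None:
        return None
    if graph.version != seen_version:
        return None  # Stale snapshot: the cycle may already be gone; retry
    return max(cycle)  # Victim policy (example): highest pid
```

Returning `None` on a stale snapshot rather than acting anyway is the conservative choice: a missed detection is caught on the next pass, but aborting a victim for a cycle that no longer exists is unrecoverable.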
Let's examine how single-instance deadlock detection applies to common scenarios.
Scenario: Database transactions acquiring exclusive locks on individual rows.
Each row lock is a single-instance resource (only one transaction can hold it). This is a perfect fit for wait-for graph detection.
```sql
-- Transaction T1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Acquires lock on row 1
-- ... some work ...
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- Wants lock on row 2
-- BLOCKED if T2 holds lock on row 2

-- Transaction T2 (concurrent)
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- Acquires lock on row 2
-- ... some work ...
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- Wants lock on row 1
-- BLOCKED if T1 holds lock on row 1

-- Wait-For Graph:
--   T1 -> T2  (T1 waiting for row 2, held by T2)
--   T2 -> T1  (T2 waiting for row 1, held by T1)
--   CYCLE: T1 -> T2 -> T1  [DEADLOCK]

-- Database detects this and aborts one transaction:
-- ERROR:  deadlock detected
-- DETAIL: Process 12345 waits for ShareLock on transaction 67890
--         Process 67890 waits for ShareLock on transaction 12345
-- HINT:   See server log for query details.
```

Most modern databases (PostgreSQL, MySQL, SQL Server) implement wait-for graph deadlock detection for row locks. When deadlock is detected, one transaction is chosen as the 'victim' and aborted. The application receives an error and can retry. This is the detection + recovery pattern in action.
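On the application side, the detection + recovery pattern implies a retry loop around each transaction. This is a sketch: `DeadlockError` stands in for whatever specific exception your database driver raises when a transaction is chosen as the victim.

```python
import random
import time


class DeadlockError(Exception):
    """Stand-in for the driver-specific deadlock/victim error (hypothetical)."""


def run_with_deadlock_retry(run_transaction, max_attempts: int = 5):
    """
    Retry a transaction that may be aborted as a deadlock victim.

    `run_transaction` is a callable performing one full attempt
    (BEGIN ... COMMIT) and raising DeadlockError if the database
    aborted it to break a cycle.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # Give up: surface the error to the caller
            # Exponential backoff with jitter, so the two competing
            # transactions don't re-run in lockstep and deadlock again
            time.sleep((2 ** attempt) * 0.01 * (0.5 + random.random()))
```

The jitter matters: if both victims retry after identical delays, they tend to re-acquire locks in the same interleaving and deadlock repeatedly.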
Understanding the computational complexity helps tune detection frequency and resource budgets.
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Build wait-for graph | O(P × R) | O(P + E) | P=processes, R=resources, E=edges |
| DFS cycle detection | O(P + E) | O(P) | Linear in graph size |
| Kahn's algorithm | O(P + E) | O(P) | Same as DFS |
| Find ALL cycles | O((P+E)(C+1)) | O(P + E) | C = number of cycles (Johnson's) |
| Edge insertion (incremental) | O(P) | O(1) | Check for cycle through new edge |
| Edge deletion (incremental) | O(E) | O(P) | May need re-validation |
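The O(P) incremental insertion check exploits a single-instance invariant: a blocked process waits on at most one resource, so every node has at most one outgoing edge, and adding edge u → v can only create a cycle if the unique wait chain starting at v leads back to u. A sketch, assuming `waits_for` maps each blocked pid to the single pid it waits on:

```python
from typing import Dict, Optional


def creates_cycle(waits_for: Dict[int, int], u: int, v: int) -> bool:
    """
    Would adding wait edge u -> v close a cycle?

    Because each node has at most one outgoing edge, we simply follow
    the chain from v; if it reaches u, the new edge completes a loop.
    O(P) worst case, no graph rebuild or full traversal needed.
    """
    node: Optional[int] = v
    while node is not None:
        if node == u:
            return True  # Chain from v leads back to u: cycle
        node = waits_for.get(node)
    return False
```

This is why single-instance detectors can afford to check on every block operation rather than on a timer: the check touches only the chain the new edge extends.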
Practical Performance:

For typical systems, the wait-for graph is small and sparse: each blocked process contributes at most one outgoing edge, so a graph of a few thousand processes has at most a few thousand edges, and a full detection pass completes in well under a millisecond.
Scaling Considerations:

For very large systems (100,000+ processes), consider:

- Incremental detection: when a new wait edge is added, check only whether it closes a cycle instead of rescanning the whole graph.
- Partitioning: shard the graph by lock domain or subsystem—cycles can only form among processes that share resources.
- Background detection: run full scans on a dedicated thread at an adaptive interval, increasing frequency only when waits grow long.
Don't overthink detection overhead. Modern CPUs can traverse graphs with millions of edges per second. For a system with 1000 processes and sparse wait-for relationships, detection takes ~10-100 microseconds. The cost of NOT detecting (processes stuck forever) far exceeds detection overhead.
Production implementations must handle edge cases that simple algorithms overlook.
```python
import time


def detect_deadlock_robust(adjacency: dict,
                           blocked_since: dict,
                           timeout_threshold: float) -> dict:
    """
    Robust detection handling common edge cases.

    Args:
        adjacency: Wait-for graph (pid -> set of pids waited for)
        blocked_since: When each waiting process started waiting (pid -> timestamp)
        timeout_threshold: Seconds before considering a wait pathological

    Returns:
        Dict with: has_deadlock, cycle, pending_timeouts, recommendations
    """
    result = {
        'has_deadlock': False,
        'cycle': [],
        'pending_timeouts': [],  # Processes about to time out
        'recommendations': []
    }

    # Edge case 1: Filter out processes not actually waiting
    active_waiters = {pid: waits for pid, waits in adjacency.items()
                      if waits}  # Non-empty wait set

    # Edge case 2: Self-loops (shouldn't happen, but catch it)
    for pid, waits in active_waiters.items():
        if pid in waits:
            result['has_deadlock'] = True
            result['cycle'] = [pid, pid]
            result['recommendations'].append(
                f"CRITICAL: Process {pid} waiting for itself. Bug in application."
            )
            return result

    # Edge case 3: Check for processes close to timeout
    now = time.time()
    for pid in active_waiters:
        blocked_duration = now - blocked_since.get(pid, now)
        if blocked_duration > timeout_threshold * 0.8:  # 80% of timeout
            result['pending_timeouts'].append(pid)

    # Main detection (DFS)
    # ... standard DFS cycle detection ...

    # Edge case 4: Distinguish timeout candidates from true deadlock
    if result['cycle']:
        # Check if any cycle participant will soon time out
        cycle_set = set(result['cycle'])
        timeout_set = set(result['pending_timeouts'])
        if cycle_set & timeout_set:
            result['recommendations'].append(
                f"Deadlock may self-resolve: processes {cycle_set & timeout_set} "
                f"near timeout. Consider waiting briefly before recovery."
            )

    return result
```

Every edge case listed above has caused production outages. Unit test each scenario. Inject faults (crashed processes, slow responses) into integration tests. A detection algorithm that works on simple cases but fails on edge cases is worse than no algorithm—it gives false confidence.
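As a sketch of that testing advice, the edge-case tests can be direct assertions against whichever detector you deploy. Here a small Kahn-style reduction stands in as the test subject:

```python
from typing import Dict, Set


def deadlocked_pids(adjacency: Dict[int, Set[int]]) -> Set[int]:
    """Reference detector (graph reduction) used as the test subject here."""
    remaining = {n: s.copy() for n, s in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for n in [n for n, s in remaining.items() if not s]:
            del remaining[n]
            for s in remaining.values():
                s.discard(n)
            changed = True
    return set(remaining)


def test_empty_graph():
    assert deadlocked_pids({}) == set()


def test_self_loop():
    # A process "waiting for itself" is an application bug, but the
    # detector must still flag it rather than loop or crash
    assert deadlocked_pids({0: {0}}) == {0}


def test_chain_is_not_deadlock():
    assert deadlocked_pids({0: {1}, 1: {2}, 2: set()}) == set()


def test_waiter_on_cycle_is_stuck_too():
    # P3 isn't in the cycle, but it waits on it, so it never completes either
    assert deadlocked_pids({1: {2}, 2: {1}, 3: {1}}) == {1, 2, 3}


if __name__ == "__main__":
    for t in (test_empty_graph, test_self_loop,
              test_chain_is_not_deadlock, test_waiter_on_cycle_is_stuck_too):
        t()
    print("all edge-case tests passed")
```

The same four cases translate directly to pytest functions or to fault-injection scenarios in an integration suite.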
We've comprehensively covered deadlock detection for single-instance resources—the foundational case that underpins more complex scenarios.
What's Next:
The next page addresses the more complex case: multiple-instance detection. When resources have multiple instances (e.g., 5 printers, 10 database connections), a cycle in the wait-for graph is necessary but not sufficient for deadlock. We'll learn the Banker's-style reduction algorithm that handles this case.
You now have complete mastery of single-instance deadlock detection. You can implement production-quality detectors, understand their complexity, handle edge cases, and apply them to real systems like databases and multi-threaded applications. This is the foundational skill for all deadlock detection work.