Imagine being a traffic controller with no view of the roads—only reports from drivers saying "I'm stuck and waiting for the car ahead of me." From these fragmented reports, you must determine whether traffic is experiencing temporary congestion or a true gridlock that requires intervention. This is precisely the challenge facing database deadlock detection algorithms.
Deadlock detection is a critical capability that enables databases to identify when transactions have formed circular wait patterns that cannot resolve naturally. Unlike prevention strategies that sacrifice concurrency, detection allows the system to operate at full speed and intervene only when necessary—making it the preferred approach in most production database systems.
By the end of this page, you will master the theory and practice of deadlock detection, including detection algorithms, timing strategies, cost-benefit analysis, and how major database systems implement detection in production environments.
Before diving into detection mechanisms, it's essential to understand why database systems choose detection over prevention as their primary deadlock management strategy.
The Fundamental Tradeoff:
Database systems face a choice between two philosophies:
- Prevention: restrict how every transaction acquires locks (ordering, timeouts, pre-declaration) so that deadlocks can never form, at the cost of reduced concurrency.
- Detection: let transactions lock freely, then identify deadlocks after they occur and resolve them by aborting a victim, at the cost of occasional rollbacks.
Most production systems favor detection because the performance cost of guaranteed prevention is typically higher than the occasional cost of rolling back a deadlocked transaction.
In most systems, deadlocks are rare events. If 0.1% of transactions deadlock, it's more efficient to occasionally roll back that 0.1% than to impose restrictions on 100% of transactions. Detection becomes the economically optimal choice when the cost of occasional rollbacks is less than the cost of continuous prevention overhead.
A critical design decision in deadlock detection is when to check for deadlocks. Different timing strategies offer different tradeoffs between detection latency and computational overhead:
Key Question: Should we check continuously, periodically, or only when certain conditions are met?
| Strategy | Description | Latency | Overhead | Best For |
|---|---|---|---|---|
| Continuous Detection | Check for deadlock on every lock request that blocks | Minimal (immediate) | High (many checks) | Low-volume, latency-critical systems |
| Periodic Detection | Run detection algorithm at fixed intervals (e.g., every 5 seconds) | Variable (0 to interval) | Moderate (predictable) | General-purpose OLTP systems |
| Timeout-Triggered | Start detection when a transaction waits beyond threshold | Threshold + detection time | Low (targeted checks) | Systems with varied transaction lengths |
| Lazy Detection | Only detect when resources are scarce or system appears stuck | High (reactive) | Minimal | Read-heavy workloads with rare deadlocks |
| Hybrid Approach | Combine periodic with timeout-triggered escalation | Balanced | Adaptive | Enterprise production systems |
Continuous Detection (Immediate)
In this approach, every blocking lock request triggers a deadlock check. When transaction T requests a lock held by transaction T', the system immediately checks whether granting this request would complete a cycle.
LOCK_REQUEST(T, resource R):
IF R is available:
GRANT lock to T
ELSE:
holder = current_lock_holder(R)
IF would_create_cycle(T, holder):
// Deadlock detected before waiting!
SELECT_VICTIM_AND_ABORT()
ELSE:
ADD T to wait queue for R
Advantage: Deadlocks are detected instantly—transactions never actually enter a deadlock state.
Disadvantage: The cycle check occurs on every blocking request, which can be expensive in high-contention workloads. Each check might traverse a significant portion of the wait graph.
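The pseudocode's would_create_cycle step is, in effect, a reachability query. Here is a minimal Python sketch, assuming the wait-for relationships are kept in a dict mapping each waiting transaction to the set of transactions it waits on; the names are illustrative, not any engine's API:

```python
def would_create_cycle(edges, requester, holder):
    """Would adding the edge requester -> holder close a cycle?

    `edges` maps each waiting transaction to the set of transactions it waits on.
    The check asks: can we already reach `requester` starting from `holder`?
    """
    stack, seen = [holder], set()
    while stack:
        current = stack.pop()
        if current == requester:
            return True               # holder (transitively) waits on the requester
        if current in seen:
            continue
        seen.add(current)
        stack.extend(edges.get(current, ()))
    return False
```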
Periodic Detection
The system runs a complete cycle detection algorithm at regular intervals. Between runs, deadlocks may form but go undetected for up to one interval period.
DETECTION_THREAD:
WHILE system_running:
SLEEP(detection_interval)
BUILD wait_for_graph from current lock table
cycles = FIND_CYCLES(wait_for_graph)
FOR EACH cycle in cycles:
victim = SELECT_VICTIM(cycle)
ABORT(victim)
Typical intervals in production systems:
- PostgreSQL: detection runs only after a transaction has waited for deadlock_timeout, which defaults to 1 second.
- Oracle: a background process checks roughly every 3 seconds.
- SQL Server: the lock monitor runs every 5 seconds, adapting down to 100 ms when deadlocks are being found.
Setting the detection interval involves a tradeoff: short intervals detect deadlocks faster but consume more CPU. Long intervals reduce overhead but leave transactions blocked longer. In high-value transaction systems (financial trading), subsecond detection is often required. In analytical workloads, longer intervals are acceptable.
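One way to soften this tradeoff is the adaptive variant listed as a hybrid approach in the table above: shorten the interval while deadlocks are being found and back off when they are not. Below is a rough sketch, where find_cycles and resolve stand in for the engine's detector and victim-abort routines and the bounds loosely mirror SQL Server's 5-second-to-100-ms range; all of these names are illustrative assumptions.

```python
import time

def detection_loop(find_cycles, resolve, min_interval=0.1, max_interval=5.0):
    """Periodic deadlock detection whose interval adapts to recent history."""
    interval = max_interval
    while True:
        time.sleep(interval)
        cycles = find_cycles()            # build wait-for graph and search for cycles
        if cycles:
            for cycle in cycles:
                resolve(cycle)            # select a victim and abort it
            interval = min_interval       # deadlocks are occurring: check frequently
        else:
            interval = min(max_interval, interval * 2)  # quiet period: back off
```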
At its core, deadlock detection reduces to a graph cycle detection problem. The system maintains (explicitly or implicitly) a graph representing wait relationships between transactions, then searches for cycles in that graph.
The Abstract Process:
1. Build (or maintain) a wait-for graph: one node per transaction, with an edge T1 -> T2 whenever T1 is blocked on a lock held by T2.
2. Search the graph for cycles.
3. If a cycle exists, select a victim transaction and abort it to break the cycle.
4. Release the victim's locks so the remaining transactions can proceed.
The specific algorithms used depend on system architecture (centralized vs. distributed) and performance requirements.
Algorithm Complexity Analysis:
| Algorithm | Time Complexity | Space Complexity | Suitable For |
|---|---|---|---|
| DFS Cycle Detection | O(V + E) | O(V) | General purpose, centralized |
| Tarjan's SCC | O(V + E) | O(V) | Finding all cycles efficiently |
| Timeout-based | O(1) per check | O(blocked transactions) | Simple implementation |
| Path-pushing | O(network latency × path length) | O(path information) | Distributed databases |
| Edge-chasing | O(network latency × cycle length) | O(1) per probe | Distributed databases |
In practice, the Wait-For Graph with DFS is the workhorse algorithm for single-database systems due to its optimal complexity and straightforward implementation.
Depth-First Search (DFS) forms the backbone of most deadlock detection implementations. The algorithm explores the wait-for graph, marking nodes as it proceeds, and detects cycles when it encounters a node already in the current traversal path.
The Three-Color Marking Scheme:
To correctly detect cycles, each transaction (node) is assigned one of three states:
- WHITE: not yet visited
- GRAY: currently being explored (on the active DFS path)
- BLACK: fully explored, along with all of its descendants
Critical Insight: A cycle exists if and only if we encounter a GRAY node during DFS traversal. Finding a BLACK node means we've reached a subtree we've already fully explored—no cycle there.
```python
class DeadlockDetector:
    """
    Deadlock detection using DFS with three-color marking.
    Finds all cycles in the wait-for graph.
    """

    def __init__(self):
        self.wait_for_graph = {}  # transaction_id -> set of waiting_for_ids
        self.color = {}           # WHITE=0, GRAY=1, BLACK=2
        self.parent = {}          # For reconstructing cycle path
        self.cycles = []          # All detected cycles

    def build_wait_graph(self, lock_table):
        """
        Construct wait-for graph from current lock state.
        Edge T1 -> T2 means T1 is waiting for a lock held by T2.
        """
        self.wait_for_graph.clear()
        for resource, info in lock_table.items():
            holder = info['holder']
            waiters = info['waiting_queue']
            for waiter in waiters:
                if waiter not in self.wait_for_graph:
                    self.wait_for_graph[waiter] = set()
                self.wait_for_graph[waiter].add(holder)

    def detect_cycles(self):
        """
        Run DFS from each unvisited node to find all cycles.
        Returns list of cycles, where each cycle is a list of transaction IDs.
        """
        self.cycles = []
        self.color = {t: 0 for t in self.wait_for_graph}  # All WHITE
        self.parent = {}

        for transaction in self.wait_for_graph:
            if self.color.get(transaction, 0) == 0:  # WHITE
                self._dfs_visit(transaction)

        return self.cycles

    def _dfs_visit(self, transaction):
        """
        DFS visit with cycle detection and path reconstruction.
        """
        self.color[transaction] = 1  # Mark GRAY (in progress)

        for neighbor in self.wait_for_graph.get(transaction, []):
            if self.color.get(neighbor, 0) == 0:  # WHITE - unvisited
                self.parent[neighbor] = transaction
                self._dfs_visit(neighbor)
            elif self.color.get(neighbor, 0) == 1:  # GRAY - cycle found!
                # Reconstruct the cycle
                cycle = [neighbor]
                current = transaction
                while current != neighbor:
                    cycle.append(current)
                    current = self.parent.get(current)
                cycle.append(neighbor)               # Complete the cycle
                self.cycles.append(cycle[::-1])      # Reverse for readability

        self.color[transaction] = 2  # Mark BLACK (complete)

    def get_deadlock_info(self, lock_table):
        """
        Full deadlock detection with diagnostic information.
        """
        self.build_wait_graph(lock_table)
        cycles = self.detect_cycles()

        if not cycles:
            return {"deadlock_detected": False}

        return {
            "deadlock_detected": True,
            "cycle_count": len(cycles),
            "cycles": cycles,
            "involved_transactions": set().union(*[set(c) for c in cycles])
        }
```

Two colors (visited/unvisited) are insufficient for cycle detection. With only two colors, we can't distinguish between "currently exploring this path" and "completely finished with this subtree." The GRAY state specifically marks nodes on the current DFS path, enabling accurate cycle identification.
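To see the detector in action, here is a small usage sketch. The lock table contents are hypothetical, but the field names ('holder', 'waiting_queue') match what build_wait_graph above expects, and the two resources reproduce the classic two-transaction deadlock:

```python
# Hypothetical lock table: T1 holds row_A and waits for row_B; T2 holds row_B and waits for row_A.
lock_table = {
    "row_A": {"holder": "T1", "waiting_queue": ["T2"]},  # edge T2 -> T1
    "row_B": {"holder": "T2", "waiting_queue": ["T1"]},  # edge T1 -> T2
}

detector = DeadlockDetector()
report = detector.get_deadlock_info(lock_table)
print(report["deadlock_detected"])   # True
print(report["cycles"])              # e.g. [['T2', 'T1', 'T2']] -- one 2-cycle, endpoint repeated
```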
While basic DFS is O(V + E), production systems employ various optimizations to reduce the practical cost of detection, especially in high-throughput environments:
Incremental Graph Maintenance:
Rather than rebuilding the wait-for graph from scratch each detection cycle, maintain it incrementally (see the sketch after this list):
- Add an edge when a transaction starts waiting on a lock.
- Remove the transaction's outgoing edges when its request is granted or it gives up.
- Remove a transaction's incoming and outgoing edges when it commits or aborts.
This reduces the per-cycle cost from O(all locks) to O(recent changes).
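Here is a minimal sketch of that incremental approach, assuming the lock manager invokes two hooks as transactions block and unblock; the hook names and graph shape are illustrative, not a specific engine's interface:

```python
class IncrementalWaitForGraph:
    """Wait-for graph kept current by lock-manager hooks instead of periodic rebuilds."""

    def __init__(self):
        self.edges = {}  # waiter_id -> set of holder_ids

    def on_block(self, waiter, holder):
        # Called when `waiter` starts waiting on a lock held by `holder`.
        self.edges.setdefault(waiter, set()).add(holder)

    def on_unblock(self, waiter, holder=None):
        # Called when the wait ends (lock granted, or the waiter/holder finishes).
        if holder is None:
            self.edges.pop(waiter, None)            # drop all outgoing edges
        elif waiter in self.edges:
            self.edges[waiter].discard(holder)
            if not self.edges[waiter]:
                del self.edges[waiter]
```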
Real-World Performance Numbers:
| System | Detection Strategy | Typical Detection Time | Graph Size Limit |
|---|---|---|---|
| PostgreSQL | Timeout + DFS | ~1-10ms | No practical limit |
| MySQL InnoDB | Continuous + optimized DFS | <1ms for 2-cycles | Bounded recursion depth |
| Oracle | Background thread | ~10-100ms | Adaptive |
| SQL Server | Background monitor | ~5-50ms | Configurable |
These times are for detection algorithm execution only—not including the time until detection is triggered.
Since ~95% of deadlocks involve exactly 2 transactions, many systems implement a fast path: when T₁ blocks on T₂, immediately check if T₂ is waiting for T₁. This O(1) check catches the vast majority of deadlocks without full graph traversal.
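A sketch of that fast path, using the same waiter-to-holders mapping as the detector above (the function name is illustrative):

```python
def two_cycle_fast_path(edges, waiter, holder):
    """O(1) check: is the transaction we are about to wait on already waiting on us?"""
    return waiter in edges.get(holder, set())

# Only if this fast path misses does the system fall back to a full DFS traversal.
```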
When transactions span multiple database nodes, deadlock detection becomes significantly more challenging. No single node has complete visibility of the system state, making centralized approaches infeasible.
The Distributed Detection Challenge:
No single node sees the complete cycle. Detection requires coordination across nodes.
Centralized Coordinator Approach:
Designate one node as the deadlock coordinator:
1. Each node periodically sends its local wait-for edges to the coordinator.
2. The coordinator merges them into a global wait-for graph.
3. The coordinator runs cycle detection on the global graph and tells the owning nodes which victims to abort.
Pros: Simple algorithm, consistent detection
Cons: Coordinator becomes bottleneck and single point of failure; communication overhead
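For illustration, the coordinator's merge step might look like the sketch below, assuming each node reports its local wait-for edges as (waiter, holder) pairs; the message format is an assumption, not any particular system's protocol:

```python
def merge_local_graphs(node_reports):
    """Combine per-node wait-for edges into one global graph: waiter -> set of holders.

    `node_reports` maps node_id -> iterable of (waiter, holder) pairs.
    """
    global_edges = {}
    for node_id, edges in node_reports.items():
        for waiter, holder in edges:
            global_edges.setdefault(waiter, set()).add(holder)
    return global_edges  # feed this into the same DFS cycle detection used locally
```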
Edge-Chasing Algorithm (Obermarck's):
No coordinator needed—detection is fully distributed:
1. When a transaction blocks on a transaction at another node, its node sends a probe message identifying the blocked (initiating) transaction.
2. Each node that receives a probe forwards it along its local wait-for edges toward whichever transactions those waits point to.
3. If a probe ever arrives back at the transaction that initiated it, a cycle exists and a victim is selected.
Pros: No coordinator, scales well
Cons: Complex implementation, possible phantom deadlocks
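To make the probe mechanics concrete, here is a single-process simulation of edge-chasing; in a real system each forwarded probe is a network message between nodes, and the data shapes here are illustrative assumptions:

```python
from collections import deque, namedtuple

# A probe carries the transaction that started detection plus the hop it travels.
Probe = namedtuple("Probe", ["initiator", "from_txn", "to_txn"])

def edge_chasing(wait_for, blocked_txn):
    """Forward probes along wait-for edges until one returns to its initiator.

    `wait_for` maps each waiting transaction to the set of transactions it waits on
    (possibly located on other nodes).
    """
    queue = deque(Probe(blocked_txn, blocked_txn, t) for t in wait_for.get(blocked_txn, ()))
    forwarded = set()
    while queue:
        probe = queue.popleft()
        if probe.to_txn == probe.initiator:
            return True                               # probe came back: deadlock
        if (probe.initiator, probe.to_txn) in forwarded:
            continue                                  # already forwarded through this txn
        forwarded.add((probe.initiator, probe.to_txn))
        for nxt in wait_for.get(probe.to_txn, ()):
            queue.append(Probe(probe.initiator, probe.to_txn, nxt))
    return False
```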
The Phantom Deadlock Problem:
In distributed systems, detection messages travel with non-zero latency. By the time a probe message completes its circuit, the original wait situation may have changed:
Time t₀: T₁ waits for T₂ (probe sent)
Time t₁: T₂ releases lock, T₁ proceeds
Time t₂: Probe arrives, T₂ now waits for T₃
Time t₃: Probe continues to T₃
Time t₄: T₃ waits for T₁ (current)
Time t₅: Probe detects 'cycle' — but T₁ is no longer waiting!
This phantom deadlock appears real to the detection algorithm but has already resolved. Resolution: Before aborting, verify the deadlock still exists with a synchronous check.
Aborting a transaction due to a phantom deadlock wastes the work that transaction has completed. Production distributed databases implement confirmation phases before victim selection to minimize false positive aborts. The tradeoff is slightly longer detection latency for higher accuracy.
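A sketch of such a confirmation phase, assuming a synchronous lookup that asks the owning node whether one transaction is still blocked on another (the lookup function is an illustrative assumption):

```python
def cycle_still_exists(cycle, is_still_waiting_on):
    """Re-verify a detected cycle before selecting a victim.

    `cycle` lists each transaction once, each waiting on the next, with the last
    wrapping around to the first. `is_still_waiting_on(a, b)` synchronously asks
    the owning node whether transaction a is still blocked on transaction b.
    """
    return all(
        is_still_waiting_on(cycle[i], cycle[(i + 1) % len(cycle)])
        for i in range(len(cycle))
    )

# Only abort a victim if cycle_still_exists(...) returns True; otherwise the
# detected 'deadlock' was a phantom and the transactions can proceed on their own.
```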
Understanding how major databases implement deadlock detection helps you configure and troubleshoot production systems effectively:
MySQL InnoDB:
InnoDB implements continuous deadlock detection using a wait-for graph. When a transaction requests a lock and must wait, InnoDB immediately checks for a cycle involving the requester.
-- InnoDB configuration options
innodb_deadlock_detect = ON -- Enable/disable detection
innodb_lock_wait_timeout = 50 -- Fallback timeout (seconds)
innodb_print_all_deadlocks = ON -- Log all deadlocks to error log
-- View current lock waits
SELECT * FROM performance_schema.data_lock_waits;
InnoDB's detection is optimized for the common 2-transaction case and limits recursion depth to prevent excessive CPU usage during detection.
PostgreSQL:
PostgreSQL uses a timeout-triggered approach. Transactions wait for deadlock_timeout (default: 1s) before deadlock detection runs. This reduces CPU overhead but introduces detection latency.
-- PostgreSQL configuration
SET deadlock_timeout = '1s'; -- Time before detection triggers
SET lock_timeout = '10s'; -- Absolute lock wait limit
-- Monitor lock dependencies
SELECT * FROM pg_locks WHERE NOT granted;
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';
-- View deadlock logs (in postgresql.log)
-- LOG: detected deadlock while waiting for ShareLock on transaction 12345
SQL Server:
SQL Server runs a background lock monitor thread that periodically checks for deadlocks. The default interval is 5 seconds, but it adapts—if deadlocks are detected, the interval decreases to as low as 100ms.
-- SQL Server deadlock analysis
-- Enable trace flag for deadlock information in error log
DBCC TRACEON(1222, -1); -- Detailed XML deadlock graphs
DBCC TRACEON(1204, -1); -- Text-based deadlock info
-- Using Extended Events (recommended approach)
CREATE EVENT SESSION DeadlockCapture ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file (SET filename = 'Deadlocks.xel');
| Database | Detection Trigger | Default Timing | Configurable? |
|---|---|---|---|
| MySQL InnoDB | Immediate on block | Instant | Can be disabled (innodb_deadlock_detect=OFF) |
| PostgreSQL | Timeout threshold | 1 second | deadlock_timeout parameter |
| Oracle | Background process | ~3 seconds | Not directly configurable |
| SQL Server | Background monitor | 5 seconds (adaptive) | Adjusts automatically |
| MariaDB | Same as MySQL | Instant | Same as MySQL InnoDB options |
We've comprehensively explored deadlock detection—from theoretical algorithms to practical implementations in production databases. Here are the essential takeaways:
- Detection is usually preferred over prevention because deadlocks are rare, and occasionally rolling back a victim costs less than restricting every transaction.
- Detection timing is a tradeoff: continuous checks minimize latency, while periodic and timeout-triggered checks minimize overhead.
- At its core, detection is cycle finding on a wait-for graph; DFS with three-color marking does this in O(V + E).
- Most deadlocks involve only two transactions, so a constant-time fast path catches the common case cheaply.
- Distributed detection relies on coordinators or edge-chasing probes and must verify suspected cycles to avoid phantom deadlocks.
- MySQL InnoDB detects immediately on block, PostgreSQL after a 1-second timeout, and SQL Server and Oracle via background monitors.
What's Next:
Now that we understand how databases detect deadlocks, we'll examine the Wait-For Graph in depth—the data structure at the heart of detection. The next page explores graph construction, maintenance, and advanced cycle-finding algorithms used in enterprise systems.
You now understand the complete lifecycle of deadlock detection—from the theoretical foundation of cycle detection to the practical implementations in MySQL, PostgreSQL, SQL Server, and distributed systems. This knowledge enables you to configure, monitor, and troubleshoot deadlock detection in production environments.