When dozens of transactions compete for thousands of resources in a complex database system, how does the DBMS keep track of who's waiting for whom? The answer lies in an elegant data structure called the Wait-For Graph (WFG)—a directed graph that captures the web of dependencies between blocked transactions.
The Wait-For Graph is the essential tool for deadlock detection. By maintaining a live map of transaction dependencies, database systems can quickly identify circular waits that signal deadlock. Understanding this structure is fundamental to grasping how modern databases manage concurrent access.
By the end of this page, you will master the Wait-For Graph—its formal definition, construction methods, maintenance strategies, cycle detection algorithms, and how it's implemented in production database systems. You'll be able to visualize, analyze, and troubleshoot transaction dependencies.
The Wait-For Graph (WFG) is a directed graph G = (V, E) where:
- V is the set of currently active transactions (one node per transaction that holds or waits for a lock)
- E is the set of directed edges; an edge Tᵢ → Tⱼ means transaction Tᵢ is blocked waiting for a lock held by Tⱼ
Formal Properties:
- Nodes represent transactions, not resources
- Edges point from the waiting transaction to the holding transaction
- The graph changes dynamically as locks are requested, granted, and released
The Fundamental Theorem:
A system is in a deadlocked state if and only if its Wait-For Graph contains one or more cycles.
The direction of edges captures the asymmetric nature of waiting. If T₁ holds a lock that T₂ needs, only T₂ is blocked—T₁ can proceed freely. This asymmetry means T₁ → T₂ would be incorrect; the correct edge is T₂ → T₁ (T₂ waits for T₁).
Mathematical Notation:
Let's formalize the Wait-For Graph construction:
Given:
L = set of all locks in the system
T = set of all active transactions
holder(l) = transaction currently holding lock l
waiting(l) = set of transactions waiting for lock l
Wait-For Graph Construction:
V = { t ∈ T | ∃l ∈ L : t ∈ waiting(l) } ∪ { t ∈ T | ∃l ∈ L : t = holder(l) }
E = { (tᵢ, tⱼ) | ∃l ∈ L : tᵢ ∈ waiting(l) ∧ tⱼ = holder(l) }
In plain language: the vertices are all transactions that currently hold or wait for at least one lock, and there is an edge from every waiting transaction to every transaction holding the lock it is waiting for.
Understanding Wait-For Graphs becomes intuitive through visualization. Let's examine several scenarios with varying complexity:
Example 1: No Deadlock
Consider a scenario where T₁ is blocked waiting for a lock held by T₂, and T₃ is blocked waiting for a lock held by T₁.
The wait-for relationships are:
T₃ ──→ T₁ ──→ T₂
This is a chain, not a cycle. No deadlock. When T₂ commits, T₁ can proceed, and then T₃ can proceed.
Example 2: Simple Deadlock (2 Transactions)
T₁ is blocked waiting for a lock held by T₂, while T₂ is blocked waiting for a lock held by T₁.
Edges:
T₁ ──→ T₂
T₂ ──→ T₁
Cycle detected: T₁ → T₂ → T₁. DEADLOCK!
Example 3: Complex Deadlock (Multiple Transactions)
Edges:
T₁ ──→ T₄
 ↑      ↓
T₂ ←── T₃
Cycle: T₁ → T₄ → T₃ → T₂ → T₁. DEADLOCK involving 4 transactions!
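To make these examples concrete, here is a minimal sketch (the adjacency-dictionary encoding and the has_cycle helper are illustrative, not a specific DBMS API) that represents each wait-for relationship and checks for a cycle with a small depth-first search:

def has_cycle(graph):
    """Return True if the directed graph (adjacency dict) contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for neighbor in graph.get(node, []):
            state = color.get(neighbor, WHITE)
            if state == GRAY:          # back edge -> cycle
                return True
            if state == WHITE and visit(neighbor):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in graph if color.get(n, WHITE) == WHITE)

example1 = {"T3": ["T1"], "T1": ["T2"]}                              # chain, no deadlock
example2 = {"T1": ["T2"], "T2": ["T1"]}                              # 2-cycle deadlock
example3 = {"T1": ["T4"], "T4": ["T3"], "T3": ["T2"], "T2": ["T1"]}  # 4-cycle deadlock

print(has_cycle(example1))  # False
print(has_cycle(example2))  # True
print(has_cycle(example3))  # True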
A transaction can have multiple outgoing edges when the lock it is waiting for is held by several transactions at once (for example, multiple shared-lock holders), or in systems where a transaction can block on more than one resource. This creates more complex graphs but doesn't change the cycle detection logic—any cycle indicates deadlock.
Constructing the Wait-For Graph requires maintaining the relationship between locks and transactions. There are two primary approaches:
Approach 1: On-Demand Construction
Build the graph only when detection is needed. Traverse the lock table and identify all wait relationships:
def build_wait_for_graph(lock_table):
    """
    Construct the WFG from the current lock table state.
    Called before each detection cycle.
    """
    graph = {}  # adjacency list: waiter -> [holders]
    for resource, lock_info in lock_table.items():
        holders = lock_info.holders      # transactions holding locks on this resource
        waiters = lock_info.wait_queue   # transactions waiting for this resource
        for waiter in waiters:
            if waiter not in graph:
                graph[waiter] = []
            # Edge from waiter to each holder
            for holder in holders:
                if holder != waiter:     # a transaction can't wait for itself
                    graph[waiter].append(holder)
    return graph
Pros: Simple implementation, no continuous overhead
Cons: O(locks × waiters) construction cost per detection
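A quick usage sketch, assuming a minimal LockInfo structure with holders and wait_queue attributes (the structure and transaction names are illustrative):

from dataclasses import dataclass, field

@dataclass
class LockInfo:
    holders: list = field(default_factory=list)     # transactions holding locks on this resource
    wait_queue: list = field(default_factory=list)  # transactions waiting for this resource

# T1 holds A and waits for B; T2 holds B and waits for A -> 2-cycle deadlock
lock_table = {
    "A": LockInfo(holders=["T1"], wait_queue=["T2"]),
    "B": LockInfo(holders=["T2"], wait_queue=["T1"]),
}

print(build_wait_for_graph(lock_table))
# {'T2': ['T1'], 'T1': ['T2']}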
Approach 2: Incremental Maintenance
Maintain the graph continuously as lock operations occur:
class IncrementalWaitForGraph:
    """
    Incrementally maintained Wait-For Graph.
    Updated on each lock operation for O(1) detection queries.
    """

    def __init__(self):
        self.edges = {}          # waiter -> set of holders
        self.reverse_edges = {}  # holder -> set of waiters (for fast removal)

    def on_lock_blocked(self, waiter, holder):
        """
        Called when 'waiter' blocks waiting for 'holder'.
        Adds edge: waiter -> holder
        """
        if waiter not in self.edges:
            self.edges[waiter] = set()
        self.edges[waiter].add(holder)
        if holder not in self.reverse_edges:
            self.reverse_edges[holder] = set()
        self.reverse_edges[holder].add(waiter)

    def on_lock_granted(self, waiter):
        """
        Called when 'waiter' acquires its lock.
        Removes all outgoing edges from waiter.
        """
        if waiter in self.edges:
            for holder in self.edges[waiter]:
                if holder in self.reverse_edges:
                    self.reverse_edges[holder].discard(waiter)
            del self.edges[waiter]

    def on_transaction_complete(self, txn):
        """
        Called when transaction commits or aborts.
        Removes all edges involving this transaction.
        """
        # Remove outgoing edges
        self.on_lock_granted(txn)
        # Remove incoming edges (others waiting for this txn)
        if txn in self.reverse_edges:
            for waiter in self.reverse_edges[txn]:
                if waiter in self.edges:
                    self.edges[waiter].discard(txn)
            del self.reverse_edges[txn]
Pros: O(1) graph operations, always current
Cons: Continuous memory overhead, complexity in edge maintenance
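A short usage sketch (transaction names illustrative) showing how the incremental graph evolves during the two-transaction deadlock from Example 2:

wfg = IncrementalWaitForGraph()

wfg.on_lock_blocked("T1", "T2")    # T1 blocks on a lock held by T2
wfg.on_lock_blocked("T2", "T1")    # T2 blocks on a lock held by T1
print(wfg.edges)                   # {'T1': {'T2'}, 'T2': {'T1'}} -> cycle

wfg.on_transaction_complete("T2")  # victim aborted: all T2 edges disappear
print(wfg.edges)                   # {'T1': set()}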
Most production databases use a hybrid: maintain some metadata incrementally (like blocked transaction lists) but construct the full graph on-demand for detection. This balances memory usage with detection efficiency.
Real database systems support multiple lock modes (Shared, Exclusive, Update, Intent locks) that complicate wait-for relationship construction:
Shared Lock Complications:
Multiple transactions can hold shared locks simultaneously. When a transaction waits for exclusive access, it waits for ALL shared lock holders:
Scenario:
T₁ holds S-lock on X
T₂ holds S-lock on X
T₃ wants X-lock on X (waits)
Wait-For Edges:
T₃ → T₁
T₃ → T₂
T₃ creates edges to BOTH T₁ and T₂ because T₃ cannot proceed until both release.
| Requester Wants | Currently Held By | Wait Edges Created |
|---|---|---|
| X-lock (one txn) | X-lock by T₁ | Requester → T₁ |
| X-lock (one txn) | S-lock by T₁, T₂, T₃ | Requester → T₁, T₂, T₃ |
| S-lock (one txn) | X-lock by T₁ | Requester → T₁ |
| S-lock (one txn) | S-lock by T₁, T₂ | No wait (granted immediately) |
| X-lock upgrade | S-lock by T₁ (upgrading) + T₂ | T₁ → T₂ (upgrade waits for other readers) |
| IX-lock (intent) | X-lock by T₁ | Requester → T₁ |
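The edge-creation rule behind this table can be sketched as a small helper covering only the basic S and X modes (the function and compatibility set here are a simplification, not a full lock-manager API):

# Lock mode compatibility for the two basic modes: S is compatible only with S.
COMPATIBLE = {("S", "S")}

def wait_edges(requester, requested_mode, current_holders):
    """
    Given the requesting transaction, its requested mode, and the current
    holders as a dict {txn: mode}, return the wait-for edges to create.
    An empty list means the request can be granted immediately.
    """
    edges = []
    for holder, held_mode in current_holders.items():
        if holder == requester:
            continue  # own lock (e.g., an upgrade) never creates a self-edge
        if (requested_mode, held_mode) not in COMPATIBLE:
            edges.append((requester, holder))
    return edges

print(wait_edges("T3", "X", {"T1": "S", "T2": "S"}))  # [('T3', 'T1'), ('T3', 'T2')]
print(wait_edges("T4", "S", {"T1": "S", "T2": "S"}))  # [] -> granted immediately
print(wait_edges("T1", "X", {"T1": "S", "T2": "S"}))  # [('T1', 'T2')] -> upgrade waits for T2

Intent and update modes extend the compatibility table, but the edge-creation logic stays the same: one edge per incompatible holder.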
Lock Upgrade Scenarios:
A particularly subtle case occurs when a transaction holding a shared lock wants to upgrade to exclusive:
Scenario:
T₁ holds S-lock on X, wants to upgrade to X-lock
T₂ holds S-lock on X
Analysis:
T₁ cannot upgrade until T₂ releases its S-lock
Wait-For Edge:
T₁ → T₂
Danger Zone:
If T₂ also wants to upgrade:
T₂ → T₁ (T₂ waits for T₁'s S-lock to release)
DEADLOCK: Both have S-locks, both want X-locks, each waits for other.
This conversion deadlock is common in read-then-update patterns. Many systems detect this specific pattern and prevent it or handle it specially.
Lock upgrade deadlocks are insidious because they can occur between transactions accessing the SAME resource. Solution: use UPDATE locks (U-locks) or SELECT...FOR UPDATE syntax that acquires exclusive access from the start when updates are anticipated.
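Using the wait_edges sketch above, the conversion deadlock is simply two mutually incompatible upgrade requests:

holders = {"T1": "S", "T2": "S"}

print(wait_edges("T1", "X", holders))  # [('T1', 'T2')]  T1's upgrade waits for T2
print(wait_edges("T2", "X", holders))  # [('T2', 'T1')]  T2's upgrade waits for T1
# Together these edges form the cycle T1 -> T2 -> T1: a conversion deadlock.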
Finding cycles in the Wait-For Graph is the core algorithmic challenge. Several approaches exist, each with different performance characteristics:
Algorithm 1: Standard DFS with Recursion Stack
The classic approach using three-color marking (WHITE, GRAY, BLACK):
def find_cycles_dfs(graph):
    """
    Find cycles in a directed graph using DFS.
    Returns a list of cycles (each cycle is a list of nodes).
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    path = []
    cycles = []

    def dfs(node):
        color[node] = GRAY
        path.append(node)
        for neighbor in graph.get(node, []):
            if color.get(neighbor, WHITE) == WHITE:
                dfs(neighbor)
            elif color.get(neighbor, WHITE) == GRAY:
                # Cycle found! Extract it from the current path
                cycle_start = path.index(neighbor)
                cycle = path[cycle_start:] + [neighbor]
                cycles.append(cycle)
        path.pop()
        color[node] = BLACK

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            dfs(node)
    return cycles

Algorithm 2: Tarjan's Strongly Connected Components
Strongly connected components (SCCs) with more than one node indicate cycles:
def tarjan_scc(graph):
    """
    Find SCCs using Tarjan's algorithm.
    Any SCC with |SCC| > 1 contains a cycle.
    """
    index_counter = [0]
    index = {}
    lowlink = {}
    on_stack = {}
    stack = []
    sccs = []

    def strongconnect(node):
        index[node] = index_counter[0]
        lowlink[node] = index_counter[0]
        index_counter[0] += 1
        on_stack[node] = True
        stack.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in index:
                strongconnect(neighbor)
                lowlink[node] = min(lowlink[node], lowlink[neighbor])
            elif on_stack.get(neighbor, False):
                lowlink[node] = min(lowlink[node], index[neighbor])
        if lowlink[node] == index[node]:
            scc = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                scc.append(w)
                if w == node:
                    break
            sccs.append(scc)

    for node in graph:
        if node not in index:
            strongconnect(node)

    # SCCs with >1 node contain cycles
    return [scc for scc in sccs if len(scc) > 1]
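A quick check of both detectors on the Example 3 graph (adjacency dict with illustrative transaction names):

graph = {"T1": ["T4"], "T4": ["T3"], "T3": ["T2"], "T2": ["T1"]}

print(find_cycles_dfs(graph))  # [['T1', 'T4', 'T3', 'T2', 'T1']]
print(tarjan_scc(graph))       # [['T2', 'T3', 'T4', 'T1']] -> one SCC of size 4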
Complexity Comparison:
| Algorithm | Time Complexity | Space Complexity | Best For |
|---|---|---|---|
| DFS with colors | O(V + E) | O(V) | General detection |
| Tarjan's SCC | O(V + E) | O(V) | Finding all cycles efficiently |
| Simple 2-cycle check | O(E) | O(1) | Optimized common case |
| Kahn's topological sort | O(V + E) | O(V) | Checking if any cycle exists |
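The "simple 2-cycle check" row refers to the common special case of two transactions blocking each other; a sketch of that fast path, assuming adjacency sets so membership tests are O(1):

def has_two_cycle(edges):
    """
    edges: dict mapping waiter -> set of holders.
    Returns a (t1, t2) pair if two transactions wait on each other, else None.
    """
    for waiter, holders in edges.items():
        for holder in holders:
            if waiter in edges.get(holder, set()):
                return (waiter, holder)
    return None

print(has_two_cycle({"T1": {"T2"}, "T2": {"T1"}}))  # ('T1', 'T2')
print(has_two_cycle({"T3": {"T1"}, "T1": {"T2"}}))  # None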
Production database systems implement numerous optimizations to make Wait-For Graph operations efficient at scale:
Memory-Efficient Representation:
For large-scale systems, memory efficiency matters. Compare representations:
# Approach 1: Hash map of sets (flexible, ~60 bytes per edge)
adjacency = {tx1: {tx2, tx3}, tx4: {tx1}}
# Approach 2: Adjacency array with offset table (compact, ~12 bytes per edge)
offsets = [0, 2, 4, ...] # Index into edges array
edges = [tx2, tx3, tx1, ...] # Flattened edge list
# Node tx1 has edges[offsets[0]:offsets[1]]
# Approach 3: Bit matrix (ultra-compact for small graphs)
# Matrix[i][j] = 1 if Ti → Tj
# Only practical for < 1000 concurrent transactions
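A small sketch of how the offset-table layout from Approach 2 can be built from an adjacency dict (illustrative; a real engine would typically build these arrays directly from its lock table):

def to_offset_arrays(adjacency, nodes):
    """Flatten an adjacency dict into (offsets, edges) arrays, CSR-style."""
    offsets, edges = [0], []
    for node in nodes:
        edges.extend(adjacency.get(node, []))
        offsets.append(len(edges))
    return offsets, edges

nodes = ["T1", "T2", "T3", "T4"]
adjacency = {"T1": ["T4"], "T4": ["T3"], "T3": ["T2"], "T2": ["T1"]}
offsets, edges = to_offset_arrays(adjacency, nodes)
print(offsets)  # [0, 1, 2, 3, 4]
print(edges)    # ['T4', 'T1', 'T2', 'T3']
# Outgoing edges of nodes[i] are edges[offsets[i]:offsets[i+1]]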
Dynamic Graph Pruning:
Remove resolved wait relationships eagerly:
def prune_resolved_waits(graph, lock_table):
    """Remove edges where waits have been resolved."""
    for waiter, holders in list(graph.items()):
        active_waits = []
        for holder in holders:
            # Check if waiter still blocks on holder
            if is_still_blocked(waiter, holder, lock_table):
                active_waits.append(holder)
        if active_waits:
            graph[waiter] = active_waits
        else:
            del graph[waiter]
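The is_still_blocked helper is left abstract above; a minimal sketch against the same LockInfo structure used earlier (illustrative, not a specific engine's API):

def is_still_blocked(waiter, holder, lock_table):
    """True if 'waiter' is still queued behind a lock that 'holder' holds."""
    for lock_info in lock_table.values():
        if waiter in lock_info.wait_queue and holder in lock_info.holders:
            return True
    return False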
In distributed databases, no single node has the complete Wait-For Graph. Constructing a global view requires coordination:
Local Wait-For Graphs (LWFG):
Each node maintains its own LWFG for transactions and locks it manages. Cross-node waits create partial edges:
Node A LWFG:
T₁ → T₂ # Both on Node A
T₁ → Texternal # T₁ waits for transaction on another node
Node B LWFG:
T₃ → T₄ # Both on Node B
T₃ → Texternal # Cross-node wait indicator
The challenge: detect when these partial graphs form a global cycle.
Global Wait-For Graph (GWFG):
Construct a combined graph from all LWFGs: each node periodically ships its local wait-for edges (including the cross-node "external" indicators) to a coordinator, which merges the edge sets into one graph and runs cycle detection on the result.
Challenge: Consistency—the global graph represents a snapshot in time, but local graphs change continuously.
Path-Pushing Protocol:
Avoid building the full global graph: each node finds local wait-for paths that end at an external (remote) transaction and pushes those paths to the node that manages that transaction; the receiving node extends the path with its own local edges and forwards it again, declaring deadlock if a path ever cycles back to a transaction it already contains.
Advantage: No coordinator, better scalability
Challenge: Message overhead, phantom deadlocks
Example: Distributed WFG Construction
Cluster: Node A, Node B, Node C
Transaction Locations:
T₁: Started on A, accessing resources on A and B
T₂: Started on B, accessing resources on B and C
T₃: Started on C, accessing resources on C and A
Lock Situation:
T₁ holds lock on A:Resource1, wants B:Resource2 (held by T₂)
T₂ holds lock on B:Resource2, wants C:Resource3 (held by T₃)
T₃ holds lock on C:Resource3, wants A:Resource1 (held by T₁)
Local WFGs:
Node A: {T₃ → T₁, T₁ → External_B}
Node B: {T₁ → T₂, T₂ → External_C}
Node C: {T₂ → T₃, T₃ → External_A}
Global WFG (merged):
T₁ → T₂ → T₃ → T₁ ← CYCLE DETECTED!
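A toy sketch of the centralized merge for this example (node names, the External_* placeholder convention, and the dictionaries are illustrative): the local graphs are merged after dropping the external placeholders, since the concrete cross-node edge is already recorded by the node that manages the contested lock, and then any cycle detector from earlier on this page can run on the result.

# Local WFGs as reported by each node; External_* marks a wait whose holder
# lives on another node (the concrete edge is recorded by that node).
local_wfgs = {
    "A": {"T3": ["T1"], "T1": ["External_B"]},
    "B": {"T1": ["T2"], "T2": ["External_C"]},
    "C": {"T2": ["T3"], "T3": ["External_A"]},
}

def merge_global_wfg(local_wfgs):
    """Union all local edges, dropping External_* placeholders."""
    global_graph = {}
    for edges in local_wfgs.values():
        for waiter, holders in edges.items():
            concrete = [h for h in holders if not h.startswith("External_")]
            global_graph.setdefault(waiter, []).extend(concrete)
    return global_graph

gwfg = merge_global_wfg(local_wfgs)
print(gwfg)                   # {'T3': ['T1'], 'T1': ['T2'], 'T2': ['T3']}
print(find_cycles_dfs(gwfg))  # [['T3', 'T1', 'T2', 'T3']] -> same cycle, reported starting from T3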
Distributed databases like Google Spanner, CockroachDB, and YugabyteDB implement distributed deadlock detection using variations of these approaches. Most use timeout-based initial detection with distributed verification before aborting to minimize phantom deadlock false positives.
The Wait-For Graph is the cornerstone data structure for deadlock detection. Here's what you've learned:
- The formal definition: nodes are transactions, directed edges run from waiter to holder, and a cycle exists if and only if the system is deadlocked
- How to construct the graph on demand from the lock table or maintain it incrementally as lock operations occur
- How shared locks and lock upgrades create multiple or mutual wait edges
- Cycle detection with DFS, Tarjan's SCC algorithm, and fast-path checks
- Distributed deadlock detection by merging local Wait-For Graphs or pushing wait paths between nodes
What's Next:
Now that we understand how deadlocks are detected, we'll explore deadlock prevention—techniques that design systems to ensure deadlocks cannot occur in the first place. The next page examines prevention strategies that break the Coffman conditions before circular waits can form.
You now possess deep understanding of the Wait-For Graph—from formal definition through construction, maintenance, cycle detection, and distributed considerations. This knowledge enables you to understand and troubleshoot deadlock detection in any database system.