The detection algorithm has identified a deadlock. A set of processes is caught in a circular wait, unable to make progress. Now what?
Recovery is the hard part. Detection is algorithmic and deterministic—given the system state, we can precisely identify deadlocked processes. But recovery requires choosing victims, breaking invariants, and dealing with side effects. It's an engineering decision as much as a computer science problem.
The fundamental insight is that deadlock is a cycle of dependencies. To break the cycle, we must remove at least one edge—either by terminating a process (removing the node entirely) or by preempting resources (forcing a process to release what it holds). Each approach has costs, and choosing wisely requires understanding those costs.
This page explores the complete landscape of recovery options: when each is appropriate, how to choose victims fairly, and how to minimize the damage of recovery.
By the end of this page, you will understand the full spectrum of recovery strategies from operator intervention to automatic preemption, the criteria for selecting victims when breaking deadlock, how to minimize recovery cost while ensuring progress, and how real systems implement recovery in databases, operating systems, and distributed applications.
Recovery options span a spectrum from manual intervention to fully automatic resolution. Each point on this spectrum trades off human judgment against automation speed.
| Strategy | Automation | Damage | Speed | When Appropriate |
|---|---|---|---|---|
| Operator intervention | None | Minimal (human choice) | Slow (minutes) | Critical systems, rare deadlocks |
| Process termination (all) | Full | Maximum | Fast | Simple systems, expendable processes |
| Process termination (one) | Full | Moderate | Fast | General purpose, restart exists |
| Checkpoint/rollback | Full | Low (lost work) | Medium | Databases, transactions |
| Resource preemption | Full | Variable | Medium | Preemptable resources only |
| Timeout + retry | Full | Low | Variable | Distributed systems, idempotent ops |
The Fundamental Trade-off:
Every recovery option makes a trade-off between damage (how much work is lost), speed (how quickly the deadlock is resolved), and automation (how much human judgment is required).
No single strategy wins on all dimensions. The right choice depends on workload characteristics, system requirements, and operational constraints.
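Of the strategies in the table, timeout-and-retry has no dedicated example later on this page. Here is a minimal sketch using the timeout support built into Python's `threading.Lock` (the lock names and the transfer function are illustrative, not from any particular system):

```python
import random
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer_with_timeout(max_retries: int = 5) -> bool:
    """Acquire two locks with timeouts; on failure, back off and retry.

    A timed-out acquisition is treated as a *suspected* deadlock:
    we release everything we hold and start over, which breaks any
    hold-and-wait cycle we were part of.
    """
    for attempt in range(max_retries):
        if lock_a.acquire(timeout=0.5):
            try:
                if lock_b.acquire(timeout=0.5):
                    try:
                        return True  # Both locks held: do the work here
                    finally:
                        lock_b.release()
            finally:
                lock_a.release()
        # Suspected deadlock: back off with jitter before retrying
        time.sleep(random.uniform(0, 0.05 * (attempt + 1)))
    return False

print(transfer_with_timeout())  # True when uncontended
```

Note that timeouts cannot distinguish deadlock from a slow lock holder, which is why this pattern suits distributed systems with idempotent operations: a spurious retry does no harm.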
Naive recovery can make things worse. Killing a process holding a lock may leave shared data in an inconsistent state. Preempting resources at the wrong time may cause cascading failures. Always consider the full impact of recovery actions, not just whether they break the deadlock.
The simplest recovery strategy is to alert a human operator and let them decide what to do. This may seem primitive, but for critical systems it's often the safest choice.
When Operator Intervention Makes Sense:
```python
import logging
from datetime import datetime
from dataclasses import dataclass
from typing import List


@dataclass
class DeadlockAlert:
    """Alert structure for operator notification."""
    timestamp: datetime
    deadlocked_processes: List[int]
    resource_state: dict
    affected_users: List[str]
    suggested_victims: List[dict]  # Pre-computed victim candidates
    severity: str                  # 'low', 'medium', 'high', 'critical'


class OperatorAlertSystem:
    """
    Deadlock alert system for operator intervention.

    Provides:
    - Multi-channel alerting (email, PagerDuty, Slack)
    - Escalation based on duration and severity
    - Detailed diagnostic information
    - Suggested recovery actions
    """

    def __init__(self, config: dict):
        self.config = config
        self.logger = logging.getLogger('deadlock_alert')
        self.escalation_states = {}  # deadlock_id -> escalation state

    def alert_deadlock(self, alert: DeadlockAlert) -> str:
        """
        Alert operators to a detected deadlock.
        Returns a deadlock_id for tracking.
        """
        deadlock_id = f"DL-{alert.timestamp.strftime('%Y%m%d%H%M%S')}"

        # Determine initial alerting channels based on severity
        if alert.severity == 'critical':
            self._send_pagerduty(alert, deadlock_id)
            self._send_slack_alert(alert, deadlock_id, channel='#critical')
        elif alert.severity == 'high':
            self._send_slack_alert(alert, deadlock_id, channel='#ops')
            self._send_email(alert, deadlock_id)
        else:
            self._send_slack_alert(alert, deadlock_id, channel='#ops')

        self.escalation_states[deadlock_id] = {
            'level': 1,
            'started': alert.timestamp,
            'alert': alert
        }

        self.logger.warning(
            f"Deadlock alert {deadlock_id}: "
            f"{len(alert.deadlocked_processes)} processes deadlocked"
        )
        return deadlock_id

    def escalate(self, deadlock_id: str):
        """
        Escalate an unresolved deadlock.
        Called periodically while the deadlock persists.
        """
        state = self.escalation_states.get(deadlock_id)
        if not state:
            return

        state['level'] += 1
        level = state['level']

        if level == 2:
            # 5 minutes unresolved: page on-call
            self._send_pagerduty(state['alert'], deadlock_id)
        elif level == 3:
            # 15 minutes: escalate to senior on-call
            self._send_pagerduty(
                state['alert'], deadlock_id, escalation='senior'
            )
        elif level == 4:
            # 30 minutes: emergency, consider automatic recovery
            self.logger.critical(
                f"Deadlock {deadlock_id} unresolved for 30+ minutes. "
                f"Consider enabling automatic recovery."
            )

    def format_alert_message(self, alert: DeadlockAlert,
                             deadlock_id: str) -> str:
        """Format a human-readable alert message."""
        msg = []
        msg.append(f"🔴 DEADLOCK DETECTED: {deadlock_id}")
        msg.append(f"Time: {alert.timestamp.isoformat()}")
        msg.append(f"Severity: {alert.severity.upper()}")
        msg.append("")
        msg.append(f"Affected Processes: {alert.deadlocked_processes}")
        msg.append(f"Affected Users: {alert.affected_users or 'None identifiable'}")
        msg.append("")
        msg.append("Suggested Recovery Actions:")
        for i, victim in enumerate(alert.suggested_victims[:3], 1):
            msg.append(
                f"  {i}. Kill process {victim['pid']} "
                f"(cost: {victim['cost']}, reason: {victim['reason']})"
            )
        msg.append("")
        msg.append(f"To recover, run: kill-process --deadlock {deadlock_id} --victim <pid>")
        msg.append(f"To dismiss: dismiss-deadlock {deadlock_id} --reason <reason>")
        return "\n".join(msg)

    def _send_pagerduty(self, alert, deadlock_id, escalation=None):
        # PagerDuty integration
        pass

    def _send_slack_alert(self, alert, deadlock_id, channel):
        # Slack webhook integration
        pass

    def _send_email(self, alert, deadlock_id):
        # Email alert
        pass
```

If using operator intervention, make it easy. Pre-compute suggested victims with cost analysis. Provide one-click recovery actions. Show the deadlock graph visually. Log enough context to diagnose the root cause. The goal is to reduce the mental load on the operator so they can make good decisions quickly.
The most direct recovery method is to terminate one or more processes in the deadlock cycle. This breaks the cycle by removing nodes from the wait-for graph, freeing their resources for other processes.
Two Termination Strategies:
Terminate all deadlocked processes: Guaranteed to break deadlock, but maximizes damage.
Terminate one at a time: Terminate one process, re-check for deadlock, repeat if necessary. Minimizes damage but requires multiple detection passes.
Most systems use the second approach, carefully selecting which process to terminate.
```python
import os
import signal
import time
from dataclasses import dataclass
from typing import List, Callable


@dataclass
class TerminationResult:
    """Result of a termination-based recovery attempt."""
    success: bool
    terminated_processes: List[int]
    remaining_deadlock: List[int]
    total_cost: float


class TerminationRecovery:
    """Deadlock recovery through process termination."""

    def __init__(self,
                 cost_function: Callable[[int], float],
                 detect_deadlock: Callable[[], List[int]],
                 max_victims: int = 10):
        """
        Args:
            cost_function: Returns termination cost for a process ID
            detect_deadlock: Returns list of currently deadlocked PIDs
            max_victims: Maximum processes to terminate before giving up
        """
        self.cost_function = cost_function
        self.detect_deadlock = detect_deadlock
        self.max_victims = max_victims

    def recover_terminate_all(self, deadlocked: List[int]) -> TerminationResult:
        """
        Terminate all deadlocked processes.

        Pros: Guaranteed to work, simple, fast
        Cons: Maximum collateral damage
        """
        total_cost = 0.0
        terminated = []

        for pid in deadlocked:
            try:
                cost = self.cost_function(pid)
                self._terminate(pid)
                terminated.append(pid)
                total_cost += cost
            except ProcessLookupError:
                # Already dead, that's fine
                pass

        # Verify deadlock is broken
        remaining = self.detect_deadlock()
        return TerminationResult(
            success=len(remaining) == 0,
            terminated_processes=terminated,
            remaining_deadlock=remaining,
            total_cost=total_cost
        )

    def recover_terminate_selective(self, deadlocked: List[int]) -> TerminationResult:
        """
        Terminate processes one at a time, starting with lowest cost.

        Pros: Minimizes damage
        Cons: Slower, requires repeated detection
        """
        terminated = []
        total_cost = 0.0

        for _ in range(self.max_victims):
            # Re-detect (previous termination may have broken deadlock)
            current_deadlock = self.detect_deadlock()
            if not current_deadlock:
                # Deadlock broken!
                return TerminationResult(
                    success=True,
                    terminated_processes=terminated,
                    remaining_deadlock=[],
                    total_cost=total_cost
                )

            # Choose victim: lowest cost among currently deadlocked
            victim = self._select_victim(current_deadlock)
            try:
                cost = self.cost_function(victim)
                self._terminate(victim)
                terminated.append(victim)
                total_cost += cost
            except ProcessLookupError:
                continue  # Try another

        # Check if we succeeded
        remaining = self.detect_deadlock()
        return TerminationResult(
            success=len(remaining) == 0,
            terminated_processes=terminated,
            remaining_deadlock=remaining,
            total_cost=total_cost
        )

    def _select_victim(self, deadlocked: List[int]) -> int:
        """Select which process to terminate. Default: lowest-cost process."""
        return min(deadlocked, key=self.cost_function)

    def _terminate(self, pid: int):
        """Terminate a process: graceful SIGTERM first, SIGKILL as fallback."""
        os.kill(pid, signal.SIGTERM)
        # Wait briefly for graceful shutdown
        time.sleep(0.1)
        # Force kill if still alive
        try:
            os.kill(pid, 0)  # Signal 0 just checks if the process exists
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # Already dead
```

Termination Methods:
| Method | Signal | Behavior | When to Use |
|---|---|---|---|
| Graceful | SIGTERM | Process can catch, cleanup, exit cleanly | Preferred, allows cleanup handlers |
| Forceful | SIGKILL | Immediate termination, no cleanup | Process not responding to SIGTERM |
| Abort | SIGABRT | Core dump + terminate | Need debug information |
Always try graceful termination first. Force only after a timeout (typically 5-30 seconds).
Terminating a process holding locks, open files, or network connections can leave the system in an inconsistent state. Ensure the process has proper signal handlers to release resources, or that the system can recover orphaned resources (e.g., lock timeouts, transaction rollback, connection pool cleanup).
When terminating processes selectively, choosing the right victim is crucial. The goal is to minimize the total cost of recovery while guaranteeing the deadlock is broken.
Factors to Consider:
```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Dict


@dataclass
class ProcessInfo:
    """Information about a process for victim selection."""
    pid: int
    priority: int                # Higher = more important
    start_time: datetime         # When process started
    resources_held: int          # Number of resources held
    progress_percent: float      # Estimated progress 0-100
    user_facing: bool            # True if interactive/user session
    times_victimized: int        # Times killed for deadlock before
    restart_cost_seconds: float  # Estimated restart time


class VictimSelector:
    """
    Select an optimal victim for deadlock termination.

    Combines multiple selection criteria into a weighted cost function.
    """

    # Weight factors (tune based on system priorities)
    WEIGHTS = {
        'priority': 10.0,      # w1: Higher priority = higher cost
        'progress': 2.0,       # w2: More progress = higher cost
        'resources': -0.5,     # w3: More resources = lower cost (frees more)
        'user_facing': 50.0,   # w4: User-facing = much higher cost
        'victimization': 5.0,  # w5: Previously victimized = higher cost
        'restart': 1.0,        # w6: Restart cost directly
    }

    def __init__(self, weights: Dict[str, float] = None):
        self.weights = weights or self.WEIGHTS

    def compute_cost(self, proc: ProcessInfo) -> float:
        """
        Compute the cost of terminating a process.
        Lower cost = better victim.
        """
        w = self.weights
        cost = (
            w['priority'] * proc.priority +
            w['progress'] * proc.progress_percent +
            w['resources'] * proc.resources_held +
            w['user_facing'] * (100.0 if proc.user_facing else 0) +
            w['victimization'] * proc.times_victimized +
            w['restart'] * proc.restart_cost_seconds
        )
        return cost

    def select_victim(self, deadlocked_pids: List[int],
                      process_info: Dict[int, ProcessInfo]) -> int:
        """
        Select the best victim from deadlocked processes.

        Returns: PID of selected victim (lowest cost)
        """
        candidates = [
            (pid, self.compute_cost(process_info[pid]))
            for pid in deadlocked_pids
            if pid in process_info
        ]
        if not candidates:
            # Fallback: just pick the first one
            return deadlocked_pids[0]

        # Select lowest cost
        victim, cost = min(candidates, key=lambda x: x[1])
        return victim

    def explain_selection(self, pid: int, proc: ProcessInfo) -> str:
        """
        Explain why a process was selected as victim.
        Useful for logging and operator review.
        """
        factors = []

        if proc.priority < 5:
            factors.append(f"Low priority ({proc.priority})")
        if proc.progress_percent < 20:
            factors.append(f"Low progress ({proc.progress_percent:.0f}%)")
        if proc.resources_held > 5:
            factors.append(f"Holds many resources ({proc.resources_held})")
        if not proc.user_facing:
            factors.append("Background process")
        if proc.times_victimized == 0:
            factors.append("Never victimized before")
        if proc.restart_cost_seconds < 10:
            factors.append(f"Quick restart ({proc.restart_cost_seconds:.0f}s)")

        cost = self.compute_cost(proc)
        return (
            f"Selected P{pid} as victim (cost: {cost:.1f}). "
            f"Factors: {', '.join(factors) or 'No strong factors - default choice'}"
        )


# Example usage
def demo_victim_selection():
    selector = VictimSelector()
    processes = {
        100: ProcessInfo(100, priority=1,
                         start_time=datetime.now() - timedelta(seconds=5),
                         resources_held=2, progress_percent=10,
                         user_facing=False, times_victimized=0,
                         restart_cost_seconds=1),
        101: ProcessInfo(101, priority=8,
                         start_time=datetime.now() - timedelta(minutes=30),
                         resources_held=1, progress_percent=85,
                         user_facing=True, times_victimized=0,
                         restart_cost_seconds=60),
        102: ProcessInfo(102, priority=5,
                         start_time=datetime.now() - timedelta(minutes=5),
                         resources_held=3, progress_percent=50,
                         user_facing=False, times_victimized=3,
                         restart_cost_seconds=10),
    }
    deadlocked = [100, 101, 102]

    print("Process costs:")
    for pid, proc in processes.items():
        cost = selector.compute_cost(proc)
        print(f"  P{pid}: cost = {cost:.1f}")

    victim = selector.select_victim(deadlocked, processes)
    print(f"Selected victim: P{victim}")
    print(selector.explain_selection(victim, processes[victim]))
```

A naive cost function might always select the same process type as victim (e.g., always the background batch job). Track victimization history and increase cost for frequently-killed processes. This ensures short-lived 'victim' processes eventually get to complete, even if they're always the cheapest choice.
The most elegant recovery strategy is checkpoint and rollback: instead of terminating a process entirely, roll it back to a previous checkpoint where it didn't hold the resources causing deadlock. The process can then retry, potentially taking a different path that avoids deadlock.
This is how databases handle deadlock. When a deadlock is detected, the database aborts one transaction (the victim), rolls back its changes, and lets it retry. No process actually dies—just temporary work is undone.
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import List, Dict, Optional, Any


@dataclass
class Checkpoint:
    """A saved process state for rollback."""
    checkpoint_id: int
    timestamp: datetime
    held_resources: List[int]  # Resources held at checkpoint
    process_state: Any         # Opaque process state
    parent_checkpoint: Optional[int] = None  # For nested checkpoints


class CheckpointManager(ABC):
    """Abstract interface for checkpoint/rollback support."""

    @abstractmethod
    def create_checkpoint(self, process_id: int) -> int:
        """Create a checkpoint, return checkpoint ID."""
        pass

    @abstractmethod
    def rollback(self, process_id: int, checkpoint_id: int) -> bool:
        """Roll back a process to a checkpoint."""
        pass

    @abstractmethod
    def discard_checkpoint(self, checkpoint_id: int):
        """Discard a checkpoint (no longer needed)."""
        pass


class TransactionCheckpointManager(CheckpointManager):
    """
    Checkpoint manager using database transaction semantics.

    This is how real database systems implement rollback
    for deadlock recovery.
    """

    def __init__(self, transaction_log, lock_manager):
        self.transaction_log = transaction_log
        self.lock_manager = lock_manager
        self.checkpoints: Dict[int, Checkpoint] = {}
        self.next_checkpoint_id = 1

    def create_checkpoint(self, process_id: int) -> int:
        """
        Create a savepoint in the transaction.
        All operations after this can be rolled back.
        """
        checkpoint_id = self.next_checkpoint_id
        self.next_checkpoint_id += 1

        # Record current state
        checkpoint = Checkpoint(
            checkpoint_id=checkpoint_id,
            timestamp=datetime.now(),
            held_resources=list(self.lock_manager.get_locks(process_id)),
            process_state=self.transaction_log.get_savepoint(process_id)
        )
        self.checkpoints[checkpoint_id] = checkpoint
        return checkpoint_id

    def rollback(self, process_id: int, checkpoint_id: int) -> bool:
        """
        Roll back to a checkpoint.

        1. Undo all operations since checkpoint
        2. Release locks acquired since checkpoint
        3. Restore process to checkpoint state
        """
        checkpoint = self.checkpoints.get(checkpoint_id)
        if not checkpoint:
            return False

        # Get operations to undo (in reverse order)
        operations = self.transaction_log.get_operations_since(
            process_id, checkpoint.process_state
        )

        # Undo each operation
        for op in reversed(operations):
            self._undo_operation(op)

        # Release locks acquired after checkpoint
        current_locks = set(self.lock_manager.get_locks(process_id))
        checkpoint_locks = set(checkpoint.held_resources)
        locks_to_release = current_locks - checkpoint_locks
        for resource in locks_to_release:
            self.lock_manager.release(process_id, resource)

        # Restore transaction log position
        self.transaction_log.restore_savepoint(
            process_id, checkpoint.process_state
        )
        return True

    def _undo_operation(self, operation):
        """
        Undo a single operation using its undo record.

        Common patterns:
        - INSERT: delete the inserted row
        - DELETE: re-insert the deleted row
        - UPDATE: restore the old value
        """
        if operation.type == 'INSERT':
            self._delete_row(operation.table, operation.row_id)
        elif operation.type == 'DELETE':
            self._insert_row(operation.table, operation.old_data)
        elif operation.type == 'UPDATE':
            self._update_row(operation.table, operation.row_id,
                             operation.old_data)

    def discard_checkpoint(self, checkpoint_id: int):
        """Discard checkpoint on successful completion."""
        self.checkpoints.pop(checkpoint_id, None)


class DeadlockRollbackRecovery:
    """
    Deadlock recovery using transaction rollback.

    Instead of killing processes, we roll back the victim
    transaction and let it retry. This is the standard
    approach in databases.
    """

    def __init__(self, checkpoint_mgr: CheckpointManager,
                 detect_deadlock, victim_selector):
        self.checkpoint_mgr = checkpoint_mgr
        self.detect_deadlock = detect_deadlock
        self.select_victim = victim_selector

    def recover(self, deadlocked: List[int]) -> Dict:
        """Recover from deadlock by rolling back victim transactions."""
        result = {
            'victims': [],
            'rolled_back': 0,
            'success': False
        }

        while deadlocked:
            # Select victim
            victim = self.select_victim(deadlocked)
            result['victims'].append(victim)

            # Find earliest checkpoint for victim
            earliest = self._get_earliest_checkpoint(victim)
            if earliest:
                # Roll back to that checkpoint
                success = self.checkpoint_mgr.rollback(victim, earliest)
                if success:
                    result['rolled_back'] += 1
            else:
                # No checkpoint - must abort entirely
                self._abort_transaction(victim)

            # Re-check for deadlock
            deadlocked = self.detect_deadlock()

        result['success'] = True
        return result

    def _get_earliest_checkpoint(self, process_id: int) -> Optional[int]:
        """Get the earliest valid checkpoint for a process."""
        # Implementation depends on checkpoint storage
        pass

    def _abort_transaction(self, process_id: int):
        """Fully abort a transaction with no checkpoint."""
        # Roll back to transaction start
        pass
```

Advantages of Rollback over Termination:
| Aspect | Termination | Rollback |
|---|---|---|
| Work lost | All process work | Work since checkpoint |
| Process state | Must restart from scratch | Continues from checkpoint |
| External effects | May have caused side effects | Side effects undone |
| Recovery time | Full restart time | Resume from rollback point |
| Data consistency | Manual cleanup needed | Automatic via undo log |
After rolling back a victim, consider adding a small random delay before retry. This makes it less likely that the same processes will collide again in the same lock acquisition order. Adding jitter helps break symmetry and reduces repeated deadlocking.
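That retry-with-jitter pattern can be sketched as follows (a minimal illustration; `DeadlockError` and `flaky_txn` are hypothetical stand-ins for your system's victim-abort signal and transaction body):

```python
import random
import time

class DeadlockError(Exception):
    """Raised when this transaction was chosen as the deadlock victim."""

def retry_with_jitter(run_transaction, max_retries: int = 5):
    """Retry a rolled-back victim with exponential backoff plus jitter.

    The random component breaks the symmetry that caused the original
    collision: retrying transactions are unlikely to reacquire locks
    in the same interleaving again.
    """
    for attempt in range(max_retries):
        try:
            return run_transaction()
        except DeadlockError:
            if attempt == max_retries - 1:
                raise  # Give up: surface the deadlock to the caller
            # Base backoff doubles each attempt; jitter randomizes it
            base = 0.01 * (2 ** attempt)
            time.sleep(random.uniform(0, base))

# Usage: a transaction that loses the deadlock twice, then succeeds
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError()
    return "committed"

print(retry_with_jitter(flaky_txn))  # committed (after two retries)
```

Full jitter (sleeping a uniformly random fraction of the backoff window, as above) spreads retries more evenly than a fixed delay, which matters when many victims are released at once.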
Instead of terminating processes, we can try to preempt resources—forcibly taking resources from one process to give to another. This is less destructive than termination if the resource supports preemption.
Preemptable vs Non-Preemptable Resources:
| Resource Type | Preemptable? | Mechanism | Cost |
|---|---|---|---|
| CPU | ✅ Yes | Context switch, scheduler | Very low (microseconds) |
| Memory pages | ✅ Yes | Swap to disk | Medium (I/O cost) |
| Printer (mid-job) | ❌ No | Would produce garbage output | N/A |
| Database row lock | ✅ Partial | Rollback transaction, release lock | Lost transaction work |
| File handle | ❌ No | Unclear consistent state | N/A |
| Network connection | ❌ No | Breaks protocol state | N/A |
| GPU memory | ✅ Yes | Swap to host memory | High (transfer cost) |
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import List, Dict, Optional, Callable


class PreemptionResult(Enum):
    SUCCESS = 1          # Resource successfully preempted
    NOT_PREEMPTABLE = 2  # Resource type doesn't support preemption
    IN_USE = 3           # Resource in critical use, can't preempt now
    FAILED = 4           # Preemption attempted but failed


@dataclass
class Resource:
    id: int
    resource_type: str
    holder: Optional[int]  # PID of current holder
    preemptable: bool
    preempt_cost: float    # Cost to preempt this resource


class ResourcePreemptor(ABC):
    """Abstract base for resource-specific preemption logic."""

    @abstractmethod
    def can_preempt(self, resource: Resource) -> bool:
        """Check if resource can be safely preempted now."""
        pass

    @abstractmethod
    def preempt(self, resource: Resource,
                from_pid: int, to_pid: int) -> PreemptionResult:
        """Preempt resource from one process to another."""
        pass


class MemoryPagePreemptor(ResourcePreemptor):
    """Preemption handler for memory pages."""

    def can_preempt(self, resource: Resource) -> bool:
        # Can always preempt memory pages (swap to disk)
        return True

    def preempt(self, resource: Resource,
                from_pid: int, to_pid: int) -> PreemptionResult:
        # 1. Save page to swap
        # 2. Invalidate page in from_pid's page table
        # 3. Allocate page to to_pid
        # 4. When from_pid accesses, page fault will reload
        return PreemptionResult.SUCCESS


class LockPreemptor(ResourcePreemptor):
    """Preemption handler for locks (via rollback)."""

    def __init__(self, checkpoint_mgr):
        self.checkpoint_mgr = checkpoint_mgr

    def can_preempt(self, resource: Resource) -> bool:
        # Can preempt if the holder has a checkpoint to roll back to
        holder = resource.holder
        return holder is not None and self.checkpoint_mgr.has_checkpoint(holder)

    def preempt(self, resource: Resource,
                from_pid: int, to_pid: int) -> PreemptionResult:
        # Roll back from_pid to release the lock, then grant to to_pid
        checkpoint = self.checkpoint_mgr.get_latest_checkpoint(from_pid)
        if not checkpoint:
            return PreemptionResult.NOT_PREEMPTABLE

        success = self.checkpoint_mgr.rollback(from_pid, checkpoint)
        if success:
            # Lock should now be released
            return PreemptionResult.SUCCESS
        else:
            return PreemptionResult.FAILED


class PreemptionRecovery:
    """
    Deadlock recovery via resource preemption.

    Attempts to break deadlock by preempting resources from
    victim processes rather than terminating them.
    """

    def __init__(self, resources: Dict[int, Resource],
                 preemptors: Dict[str, ResourcePreemptor],
                 detect_deadlock: Callable[[], List[int]],
                 victim_selector: Callable[[List[int]], int]):
        self.resources = resources
        self.preemptors = preemptors
        self.detect_deadlock = detect_deadlock
        self.select_victim = victim_selector

    def recover(self, deadlocked: List[int]) -> Dict:
        """
        Attempt to recover from deadlock via preemption.
        Falls back to termination if preemption is not possible.
        """
        result = {
            'preempted_resources': [],
            'terminated_processes': [],
            'success': False
        }

        while deadlocked:
            # Select victim
            victim = self.select_victim(deadlocked)

            # Get resources held by victim
            victim_resources = self._get_resources_held_by(victim)

            # Try to preempt resources from victim
            preempted_any = False
            for resource in victim_resources:
                if self._try_preempt(resource, victim):
                    result['preempted_resources'].append(resource.id)
                    preempted_any = True
                    break  # Try one at a time

            if not preempted_any:
                # Can't preempt - must terminate
                self._terminate(victim)
                result['terminated_processes'].append(victim)

            # Re-check for deadlock
            deadlocked = self.detect_deadlock()

        result['success'] = True
        return result

    def _try_preempt(self, resource: Resource, from_pid: int) -> bool:
        """Attempt to preempt a resource."""
        preemptor = self.preemptors.get(resource.resource_type)
        if not preemptor or not preemptor.can_preempt(resource):
            return False

        # Find a waiting process that needs this resource
        waiter = self._find_waiter_for(resource.id)
        if waiter:
            result = preemptor.preempt(resource, from_pid, waiter)
            return result == PreemptionResult.SUCCESS
        return False

    def _get_resources_held_by(self, pid: int) -> List[Resource]:
        return [r for r in self.resources.values() if r.holder == pid]

    def _find_waiter_for(self, resource_id: int) -> Optional[int]:
        # Find a process waiting for this resource
        pass

    def _terminate(self, pid: int):
        # Fallback termination
        import os
        import signal
        os.kill(pid, signal.SIGTERM)
```

Resource preemption sounds elegant but is hard to implement safely. The preempted process must be able to handle suddenly losing its resource. For locks, this typically means rollback. For memory, the OS handles it. For many other resources, preemption simply isn't practical. Carefully evaluate whether your resource types support safe preemption.
Let's examine how actual production systems implement deadlock recovery.
PostgreSQL Deadlock Recovery:
PostgreSQL runs its deadlock check only after a process has waited on a lock for longer than `deadlock_timeout` (default: 1 second). When a deadlock is found, it aborts one of the transactions in the cycle.
```sql
-- PostgreSQL deadlock configuration
-- In postgresql.conf:

-- Time to wait on a lock before running the deadlock check
deadlock_timeout = 1s

-- Log level for deadlock reports
log_lock_waits = on
log_min_messages = warning

-- When deadlock occurs, PostgreSQL:
-- 1. Aborts one transaction (victim selected by internal heuristics)
-- 2. Releases all locks held by the aborted transaction
-- 3. Sends ERROR to the client:
--    ERROR:  deadlock detected
--    DETAIL: Process 12345 waits for ShareLock on transaction 67890;
--            blocked by process 67890.
--            Process 67890 waits for ShareLock on transaction 12345;
--            blocked by process 12345.
--    HINT:   See server log for query details.

-- Application code should handle this (sketch; in practice the
-- retry loop usually lives in the client application):
DO $$
DECLARE
    retry_count INT := 0;
    max_retries INT := 3;
BEGIN
    LOOP
        BEGIN
            -- Your transaction logic here
            UPDATE accounts SET balance = balance - 100 WHERE id = 1;
            UPDATE accounts SET balance = balance + 100 WHERE id = 2;
            EXIT;  -- Success, exit loop (work commits with the block)
        EXCEPTION
            WHEN SQLSTATE '40P01' THEN  -- deadlock_detected
                retry_count := retry_count + 1;
                IF retry_count >= max_retries THEN
                    RAISE;  -- Give up after max retries
                END IF;
                -- Random backoff before retry
                PERFORM pg_sleep(random() * 0.1);
                -- Loop continues for retry
        END;
    END LOOP;
END;
$$;
```

By contrast, the Linux kernel uses prevention (lockdep, its lock-ordering validator) rather than runtime recovery. This is a common philosophy: for systems where correctness is paramount, prevent deadlock by design. Use detection and recovery only when prevention is too expensive or impractical.
We've explored the full landscape of deadlock recovery options, from human intervention to automatic preemption.
Decision Guide:
| System Type | Recommended Recovery |
|---|---|
| OLTP Database | Abort/rollback youngest transaction, retry |
| Batch Processing | Terminate victim, restart job |
| Operating System | Prevent via design; timeout as fallback |
| Distributed System | Timeout with retry, idempotent operations |
| Real-time System | Prevention is mandatory; recovery too slow |
What's Next:
The next pages dive deeper into the two main recovery mechanisms: process termination (how to do it safely, handling cleanup) and resource preemption (when it's possible, how to implement it).
You now understand the complete landscape of deadlock recovery options. You can evaluate trade-offs between different strategies, implement victim selection policies, and choose appropriate recovery mechanisms for different system types. This knowledge is essential for designing robust systems that handle deadlock gracefully.