A deadlock has been detected. Multiple transactions are locked in a circular embrace, each waiting for resources held by another. The system knows the problem exists—now it must solve it. But how?
Deadlock recovery is the final phase of deadlock management—the moment when detection's diagnosis becomes action. The system must make critical decisions: Which transaction should be sacrificed? How much work should be undone? How do we minimize the cascade of effects?
These decisions have real consequences. Choosing the wrong victim can waste hours of completed work. Improper rollback can leave data inconsistent. Poor recovery strategies can cause the same deadlock to recur immediately. Mastering deadlock recovery is essential for maintaining database health under concurrent load.
By the end of this page, you will master the complete deadlock recovery lifecycle—victim selection algorithms, rollback strategies, cascading abort prevention, retry mechanisms, and best practices for minimizing recovery overhead in production systems.
When a deadlock is detected, the database faces a fundamental problem: the circular wait must be broken, but the only way to break it is to forcibly abort a transaction that has done nothing wrong. Let's understand the full scope of this challenge:
The Irrevocable Decision:
To break a deadlock cycle, at least one transaction must be aborted. This transaction becomes the victim. Aborting means:

- All of the victim's changes are rolled back
- All of the victim's locks are released, unblocking the other transactions in the cycle
- The application receives a deadlock error and must decide whether to retry
The challenge is that ANY transaction in the cycle could be chosen as the victim. Different choices have vastly different consequences.
A deadlock victim loses ALL its work. If a transaction has been running for 10 minutes, processed 100,000 rows, and is 99% complete—being selected as a victim means starting over. Good victim selection can save hours of wasted computation.
Choosing which transaction to abort is perhaps the most critical decision in deadlock recovery. Various algorithms exist, each optimizing for different goals:
Algorithm 1: Minimum Work Done (Cost-Based)
Select the transaction that has performed the least amount of work. This minimizes wasted effort.
```python
def select_victim_minimum_work(deadlock_cycle):
    """
    Select victim based on minimum work already performed.
    Work can be measured by rows modified, log bytes generated, CPU time, etc.
    """
    min_work = float('inf')
    victim = None
    for transaction in deadlock_cycle:
        work = calculate_work_done(transaction)
        if work < min_work:
            min_work = work
            victim = transaction
    return victim

def calculate_work_done(transaction):
    """Calculate approximate work done by transaction."""
    return (
        transaction.rows_modified * 1.0 +
        transaction.rows_read * 0.1 +
        transaction.log_bytes_written * 0.01 +
        transaction.cpu_time_ms * 0.001
    )
```
Pros: Minimizes wasted work.
Cons: May repeatedly victimize young transactions, causing starvation.
Algorithm 2: Youngest Transaction (Timestamp-Based)
Always abort the transaction that started most recently. Simple and predictable.
```python
def select_victim_youngest(deadlock_cycle):
    """Select the most recently started transaction."""
    return max(deadlock_cycle, key=lambda t: t.start_timestamp)
```
Algorithm 3: Minimum Locks Held
Select the transaction holding the fewest locks, minimizing disruption to other waiters.
```python
def select_victim_minimum_locks(deadlock_cycle):
    """Select transaction holding fewest locks."""
    return min(deadlock_cycle, key=lambda t: len(t.held_locks))
```
Algorithm 4: Priority-Based
Assign explicit priorities to transactions; always abort lowest priority.
```python
def select_victim_priority(deadlock_cycle):
    """Select lowest priority transaction."""
    return min(deadlock_cycle, key=lambda t: t.priority)
```
| Algorithm | Optimization Goal | Starvation Risk | Implementation Complexity | Best For |
|---|---|---|---|---|
| Minimum Work | Minimize wasted effort | High (small txns always chosen) | Medium | Varied transaction sizes |
| Youngest | Simple, predictable | Medium (new txns disadvantaged) | Low | General purpose |
| Minimum Locks | Minimize blocking | Low | Low | High-contention workloads |
| Priority-Based | Business requirements | None (explicit control) | Medium | Mixed criticality workloads |
| Composite Score | Balanced optimization | Low | High | Production systems |
Production databases typically use a composite cost function that weighs multiple factors to select victims fairly and efficiently:
The Composite Approach:
Instead of optimizing for a single factor, assign a cost score to each transaction considering multiple dimensions:
```python
import logging
import time

class VictimSelector:
    """
    Production-grade victim selection using composite scoring.
    """
    # Weight configuration (tunable per workload)
    WEIGHT_LOG_SIZE = 1.0        # Log records generated
    WEIGHT_LOCKS_HELD = 0.5      # Number of locks held
    WEIGHT_CPU_TIME = 0.3        # CPU cycles consumed
    WEIGHT_AGE = 0.2             # Transaction age (prevent starvation)
    WEIGHT_ROLLBACK_COUNT = 2.0  # Previous rollbacks (prevent repeat victims)
    WEIGHT_PRIORITY = 5.0        # Explicit priority (business logic)

    def calculate_victim_cost(self, transaction):
        """
        Higher cost = less desirable to abort.
        We select the transaction with LOWEST cost.
        """
        cost = 0.0

        # Factor 1: Work done (log records are the best proxy)
        cost += transaction.log_records_count * self.WEIGHT_LOG_SIZE

        # Factor 2: Locks held (aborting releases these)
        cost += len(transaction.held_locks) * self.WEIGHT_LOCKS_HELD

        # Factor 3: CPU time invested
        cost += transaction.cpu_time_ms * self.WEIGHT_CPU_TIME

        # Factor 4: Age (older transactions should complete)
        age_seconds = time.time() - transaction.start_timestamp
        cost += age_seconds * self.WEIGHT_AGE

        # Factor 5: Starvation prevention (penalize repeat victims)
        cost += transaction.rollback_count * self.WEIGHT_ROLLBACK_COUNT

        # Factor 6: Business priority
        # Higher priority = higher cost = less likely to abort
        cost += transaction.priority * self.WEIGHT_PRIORITY

        return cost

    def select_victim(self, deadlock_cycle):
        """Select the transaction with minimum cost to abort."""
        if not deadlock_cycle:
            return None

        victim = min(deadlock_cycle, key=self.calculate_victim_cost)

        # Log selection rationale for debugging
        self.log_selection_rationale(deadlock_cycle, victim)

        return victim

    def log_selection_rationale(self, cycle, victim):
        """Log why this victim was selected for debugging."""
        scores = [(t.id, self.calculate_victim_cost(t)) for t in cycle]
        logging.info(
            f"Deadlock victim selection: {victim.id} "
            f"(cost: {self.calculate_victim_cost(victim):.2f}). "
            f"All scores: {scores}"
        )
```

The weights in composite scoring should be tuned for your workload. A data warehouse with long-running analytical queries should weight CPU_TIME highly. An OLTP system with many small transactions might weight ROLLBACK_COUNT heavily to prevent repeat failures.
Once a victim is selected, the transaction must be rolled back. The rollback strategy determines how much work is undone:
Strategy 1: Total Rollback
Rollback the entire transaction to its beginning. Simple and safe, but maximizes work lost.
Transaction Timeline:
[START] → Op1 → Op2 → Op3 → [DEADLOCK] → [ROLLBACK TO START]
(all work from Op1 through Op3 is lost)
Implementation: walk the victim's undo log backward from its most recent record to its first, reversing every change; then release all of the victim's locks and return a deadlock error to the application.
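A minimal sketch of that loop, assuming hypothetical `undo_log`, `apply_undo`, and `lock_manager` structures (real engines do this inside the recovery and lock managers):

```python
class DeadlockException(Exception):
    """Raised to tell the application its transaction was the deadlock victim."""

def total_rollback(victim, lock_manager):
    """Undo every change the victim made, newest first, then release its locks."""
    # Walk the undo log in reverse chronological order, restoring before-images
    for undo_record in reversed(victim.undo_log):
        undo_record.apply_undo()

    # With all changes reversed, the victim's locks can be released,
    # unblocking the other transactions in the cycle
    lock_manager.release_all(victim)

    # Surface the abort so the application (or a retry handler) can react
    raise DeadlockException(f"Transaction {victim.id} chosen as deadlock victim")
```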
Strategy 2: Partial Rollback (Savepoint-Based)
Rollback only to the most recent savepoint that releases the blocking lock. Preserves work done before the deadlock.
Transaction with Savepoints:
[START] → Op1 → [SAVEPOINT A] → Op2 → [SAVEPOINT B] → Op3 → [DEADLOCK]
Partial Rollback Options:
1. Rollback to B: Undo Op3, keep Op1 and Op2 → works only if Op3 acquired the blocking lock
2. Rollback to A: Undo Op2 and Op3, keep Op1 → needed if the blocking lock was acquired after savepoint A
Benefits:

- Work completed before the savepoint is preserved
- The application can resume from the savepoint instead of restarting the whole transaction

Limitations:

- Requires the application to create savepoints in the first place
- Rolling back to a savepoint does not always release the lock that is actually blocking the cycle
- Leaves the application responsible for managing a partially rolled-back state
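A rough sketch of the selection step, assuming hypothetical `savepoints`, `created_at`, and `acquired_at` attributes: the engine looks for the most recent savepoint taken before the blocking lock was acquired, so rolling back to it releases that lock while preserving as much earlier work as possible.

```python
def choose_rollback_savepoint(victim, blocking_lock):
    """
    Pick the most recent savepoint whose rollback would release the lock
    blocking the cycle. Returns None when only a total rollback will do.
    """
    # Only savepoints created BEFORE the blocking lock was acquired qualify:
    # rolling back to one of them undoes the acquisition but keeps earlier work.
    candidates = [sp for sp in victim.savepoints
                  if sp.created_at < blocking_lock.acquired_at]
    if not candidates:
        return None  # no useful savepoint: fall back to total rollback

    # Among the qualifying savepoints, the latest one preserves the most work
    return max(candidates, key=lambda sp: sp.created_at)
```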
| Strategy | Work Preserved | Complexity | Lock Release | Application Impact |
|---|---|---|---|---|
| Total Rollback | None | Low | All locks released | Full restart required |
| Partial (to savepoint) | Work before savepoint | Medium | Only affected locks | Resume from savepoint |
| Partial (minimum) | Maximum possible | High | Only blocking lock | Complex state management |
Most databases default to total rollback because partial rollback requires application cooperation (savepoints) and doesn't always release the needed lock. The simplicity and reliability of total rollback make it the practical choice.
A dangerous phenomenon in deadlock recovery is the cascading abort—where aborting one transaction forces the abort of others that read its uncommitted data:
Cascade Scenario:
T₁: Write X = 100
T₂: Read X (sees 100, uncommitted from T₁)
T₂: Based on X, modifies Y and Z
T₃: Read Y (sees uncommitted value from T₂)
[DEADLOCK DETECTED: T₁ selected as victim]
T₁ aborted → X value 100 never committed
T₂ read dirty data → T₂ must abort (cascade)
T₃ read T₂'s dirty data → T₃ must abort (cascade)
A single abort has cascaded into three aborts!
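To make the cascade concrete, here is a small self-contained sketch (illustrative structures, not a real engine's API) that records which transactions read another's uncommitted writes and then computes everything that must abort along with the victim:

```python
from collections import defaultdict, deque

class DirtyReadTracker:
    """Track which transactions read uncommitted data written by others."""
    def __init__(self):
        # writer -> set of readers that saw its uncommitted writes
        self.readers_of = defaultdict(set)

    def record_dirty_read(self, reader, writer):
        self.readers_of[writer].add(reader)

    def cascade_set(self, victim):
        """Return every transaction that must abort if `victim` aborts."""
        doomed, queue = {victim}, deque([victim])
        while queue:
            txn = queue.popleft()
            for reader in self.readers_of[txn]:
                if reader not in doomed:
                    doomed.add(reader)
                    queue.append(reader)
        return doomed

# The scenario above: T2 read T1's dirty write, T3 read T2's dirty write
tracker = DirtyReadTracker()
tracker.record_dirty_read("T2", "T1")
tracker.record_dirty_read("T3", "T2")
print(tracker.cascade_set("T1"))  # {'T1', 'T2', 'T3'}: one abort becomes three
```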
Why Modern Databases Avoid Cascades:
Most production databases use Strict 2PL or MVCC, which inherently prevent cascading aborts:
With Strict 2PL:
T₁: Write X = 100 (holds X-lock until commit)
T₂: Read X → BLOCKED (T₁ holds lock)
[DEADLOCK DETECTED: T₁ selected as victim]
T₁ aborted → X never read by anyone
T₂ unblocked → Reads original X value
No cascade! T₂ never saw T₁'s uncommitted write.
This is why isolation level configuration matters for deadlock recovery—higher isolation levels prevent cascades but may increase deadlock frequency (more blocking = more cycles possible).
If using READ UNCOMMITTED isolation, cascading aborts ARE possible. This isolation level is rarely appropriate for transactional workloads precisely because of this risk. Use READ COMMITTED or higher for production OLTP.
After a transaction is aborted due to deadlock, it should typically be retried. Proper retry mechanisms are essential for transparent deadlock handling:
Retry Design Principles:

- Cap the number of attempts so a persistent conflict eventually surfaces as an error
- Back off exponentially between attempts
- Add random jitter so concurrent victims do not retry in lockstep
- Retry only on deadlock errors; other failures should propagate immediately
- Log every retry so recurring deadlocks are visible in monitoring
```python
import logging
import random
import time

class DeadlockRetryHandler:
    """
    Robust retry handler for deadlock victims.
    """
    def __init__(
        self,
        max_retries: int = 5,
        base_delay_ms: int = 100,
        max_delay_ms: int = 5000,
        jitter_factor: float = 0.3
    ):
        self.max_retries = max_retries
        self.base_delay_ms = base_delay_ms
        self.max_delay_ms = max_delay_ms
        self.jitter_factor = jitter_factor

    def execute_with_retry(self, transaction_func, *args, **kwargs):
        """Execute a transaction with automatic deadlock retry."""
        last_exception = None

        for attempt in range(self.max_retries):
            try:
                return transaction_func(*args, **kwargs)
            except DeadlockException as e:
                last_exception = e

                if attempt == self.max_retries - 1:
                    # Final attempt failed
                    logging.error(
                        f"Transaction failed after {self.max_retries} "
                        f"deadlock retries: {e}"
                    )
                    raise

                # Calculate backoff delay with jitter
                delay = self._calculate_delay(attempt)
                logging.warning(
                    f"Deadlock on attempt {attempt + 1}, "
                    f"retrying in {delay}ms"
                )
                time.sleep(delay / 1000.0)

        raise last_exception

    def _calculate_delay(self, attempt: int) -> int:
        """
        Calculate delay with exponential backoff and jitter.
        Jitter prevents synchronized retry storms.
        """
        # Exponential backoff, capped at max_delay_ms
        delay = self.base_delay_ms * (2 ** attempt)
        delay = min(delay, self.max_delay_ms)

        # Add random jitter (± jitter_factor)
        jitter_range = delay * self.jitter_factor
        jitter = random.uniform(-jitter_range, jitter_range)
        delay = int(delay + jitter)

        return max(delay, 1)  # Minimum 1ms


# Usage example (`database` and `db` stand in for your data-access layer)
retry_handler = DeadlockRetryHandler(max_retries=5)

def transfer_funds(from_acc, to_acc, amount):
    with database.transaction():
        # Lock accounts (could deadlock)
        from_balance = db.get_balance(from_acc)
        to_balance = db.get_balance(to_acc)
        db.set_balance(from_acc, from_balance - amount)
        db.set_balance(to_acc, to_balance + amount)

# Execute with automatic retry
retry_handler.execute_with_retry(
    transfer_funds,
    from_acc=1000,
    to_acc=2000,
    amount=100.00
)
```

Understanding how specific databases handle deadlock recovery helps you configure and troubleshoot production systems:
MySQL InnoDB:
InnoDB automatically detects and resolves deadlocks. The victim is rolled back and receives error 1213.
```sql
-- MySQL deadlock handling example
-- Application should catch and retry on error 1213

-- Check deadlock information
SHOW ENGINE INNODB STATUS\G
-- Look for the 'LATEST DETECTED DEADLOCK' section

-- Victim selection: InnoDB picks the transaction that has inserted,
-- updated, or deleted the fewest rows (the smaller transaction is
-- cheaper to roll back)

-- innodb_rollback_on_timeout is a startup option (not settable with SET);
-- when enabled, a lock wait timeout rolls back the whole transaction
-- instead of only the last statement
```
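As a hedged application-side example using the mysql-connector-python driver (the exception class varies by driver, and the `accounts` table is illustrative; the key idea is checking for error number 1213 and retrying):

```python
import mysql.connector

def transfer(conn, from_acc, to_acc, amount, max_retries=3):
    """Retry the transaction when MySQL reports deadlock error 1213."""
    for attempt in range(max_retries):
        try:
            cur = conn.cursor()
            cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                        (amount, from_acc))
            cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                        (amount, to_acc))
            conn.commit()
            return
        except mysql.connector.Error as err:
            conn.rollback()  # the victim's work is already gone; clean up the session
            if err.errno == 1213 and attempt < max_retries - 1:
                continue  # deadlock victim: safe to retry
            raise
```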
PostgreSQL:
PostgreSQL aborts the transaction that completed the cycle (the one whose lock request created the deadlock). Error code is 40P01.
```sql
-- PostgreSQL returns SQLSTATE 40P01 for deadlocks

-- Logging configuration for deadlocks
SET log_lock_waits = on;      -- Log when waiting for locks
SET deadlock_timeout = '1s';  -- Detection delay

-- View locks and pending requests
SELECT * FROM pg_locks WHERE NOT granted;

-- Detailed deadlock info in server logs:
-- ERROR:  deadlock detected
-- DETAIL: Process 12345 waits for ShareLock on transaction 67890;
--         blocked by process 23456.
--         Process 23456 waits for ShareLock on transaction 12345;
--         blocked by process 12345.
-- HINT:   See server log for query details.
```
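A comparable hedged sketch with psycopg2, which maps SQLSTATE 40P01 to the errors.DeadlockDetected exception class (in older psycopg2 versions, compare exc.pgcode against '40P01' instead):

```python
import psycopg2
from psycopg2 import errors

def run_with_deadlock_retry(conn, statements, max_retries=3):
    """Run the statements in one transaction, retrying if PostgreSQL
    aborts it with SQLSTATE 40P01 (deadlock_detected)."""
    for attempt in range(max_retries):
        try:
            with conn.cursor() as cur:
                for sql, params in statements:
                    cur.execute(sql, params)
            conn.commit()
            return
        except errors.DeadlockDetected:
            conn.rollback()
            if attempt == max_retries - 1:
                raise  # out of retries: surface the deadlock to the caller
```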
SQL Server:
SQL Server uses sophisticated cost-based victim selection with the DEADLOCK_PRIORITY setting allowing explicit control.
```sql
-- SQL Server deadlock priority (LOW, NORMAL, HIGH)
SET DEADLOCK_PRIORITY LOW;   -- This session more likely to be chosen as victim
SET DEADLOCK_PRIORITY HIGH;  -- This session less likely to be chosen as victim

-- Numeric priority (-10 to 10)
SET DEADLOCK_PRIORITY 5;     -- Higher = less likely victim

-- Capture deadlock graphs with Extended Events
CREATE EVENT SESSION [DeadlockCapture] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file(SET filename = N'Deadlocks');

-- Error number for deadlock is 1205
-- Applications should catch and retry on this error
```
| Database | Error Code | Victim Selection | Retry Responsibility | Priority Control |
|---|---|---|---|---|
| MySQL | 1213 | Fewest rows modified | Application layer | Limited (no explicit setting) |
| PostgreSQL | 40P01 | Cycle-completing transaction | Application layer | None |
| SQL Server | 1205 | Cost-based composite | Application layer | DEADLOCK_PRIORITY |
| Oracle | ORA-00060 | Cycle-completing, oldest | Application layer | Limited |
Deadlock recovery is the critical final phase of deadlock management—where detection becomes action. Here are the essential takeaways:

- Victim selection matters: production engines use composite cost functions that weigh work done, locks held, age, prior rollbacks, and business priority.
- Total rollback is the practical default; partial rollback needs application-provided savepoints and does not always release the blocking lock.
- Strict 2PL and MVCC keep a single abort from cascading into many.
- Applications must treat deadlock errors as retryable and back off exponentially with jitter before retrying.
Module Complete:
Congratulations! You have now mastered the complete lifecycle of deadlock handling, from detection through victim selection, rollback, and retry.
This knowledge equips you to design, configure, and troubleshoot concurrent database systems at the highest level.
You now possess comprehensive expertise in deadlock handling—from theoretical foundations through practical implementation. This knowledge is essential for any database professional working with concurrent transaction systems and is directly applicable to production database administration and application development.