Loading learning content...
Imagine a banking system where Transaction T₁ transfers $1,000 from Account A to Account B. Before T₁ completes, Transaction T₂ reads the updated balance of Account B and uses it to calculate interest. T₂ commits successfully. Then disaster strikes—T₁ encounters an error and must be aborted.
The question becomes: What happens to T₂?
T₂ has already committed based on data that was never actually finalized. The database is now in an inconsistent state—T₂'s calculations are based on a phantom value that never existed in any consistent database state. Worse, we cannot roll back T₂ because committed transactions are supposed to be permanent.
This scenario illustrates a non-recoverable schedule—a transaction execution order that makes proper recovery impossible. Understanding recoverability is fundamental to designing database systems that can survive failures without compromising data integrity.
By the end of this page, you will understand the formal definition of recoverable schedules, why commit ordering is crucial for recovery, how to identify recoverable vs. non-recoverable schedules, and the relationship between recoverability and the ACID properties.
Database systems must handle failures gracefully. When a transaction fails—whether due to system crash, constraint violation, deadlock, or explicit abort—the database must restore itself to a consistent state. This restoration process is called recovery.
Recovery relies on a fundamental principle: we can undo uncommitted transactions by reversing their changes. This works because uncommitted transactions haven't made any guarantees to the outside world—their effects are tentative and can be erased.
However, this principle creates a potential conflict with concurrent execution:
This conflict represents an irrecoverable situation. The database cannot undo T₁ without affecting T₂, but T₂ is already committed. The durability guarantee (the 'D' in ACID) says committed transactions must persist. Yet consistency (the 'C') says the database must be in a valid state.
We cannot satisfy both requirements. This is why preventing irrecoverable schedules is essential—they create impossible recovery scenarios.
Once a transaction commits based on uncommitted data from another transaction that later aborts, the database enters an irrecoverable state. No recovery algorithm can correctly restore consistency without violating the durability guarantee. Prevention is the only solution.
To prevent irrecoverable situations, we need precise rules governing when transactions can commit. This leads to the formal definition of a recoverable schedule.
Definition: A schedule S is recoverable if and only if, for every pair of transactions Tᵢ and Tⱼ in S where Tⱼ reads a data item written by Tᵢ:
commit(Tᵢ) < commit(Tⱼ) or Tᵢ aborts
In plain terms: If transaction Tⱼ reads data written by transaction Tᵢ, then Tⱼ must not commit until after Tᵢ commits (or Tᵢ aborts, in which case Tⱼ should also abort).
This definition ensures that when a transaction commits, all the data it read has been finalized by committed transactions. No committed transaction ever depends on uncommitted (potentially aborted) data.
| Scenario | T₁ Status | T₂ Reads from T₁ | T₂ Commits Before T₁ | Recoverable? |
|---|---|---|---|---|
| Proper ordering | Commits | Yes | No | ✓ Yes |
| Early commit | Eventually commits | Yes | Yes (violates order) | ✗ No |
| T₁ aborts first | Aborts | Yes | T₂ should abort | ✓ Yes (if T₂ aborts) |
| No dependency | Any | No | Any | ✓ Yes (irrelevant) |
| T₂ commits, T₁ later aborts | Aborts | Yes | Yes | ✗ No (irrecoverable) |
Understanding the "reads from" relationship:
Transaction Tⱼ reads from transaction Tᵢ when:
This creates a data dependency from Tᵢ to Tⱼ. The correctness of Tⱼ's execution depends on the correctness of Tᵢ's write.
If T₃ reads from T₂, and T₂ reads from T₁, then T₁ must commit before T₂, and T₂ must commit before T₃. Dependencies are transitive—a chain of reads creates a chain of required commit orderings.
Let's examine concrete schedule examples to build intuition for identifying recoverable and non-recoverable schedules. We'll use the notation:
A more complex example:
Consider this schedule with three transactions:
T₁: R(A) W(A) C
T₂: R(A) W(B) C
T₃: R(B) C
Dependency Analysis:
Required commit order: T₁ → T₂ → T₃
Actual commit order in schedule: T₁ (line shows C first), then T₂ (C), then T₃ (C)
This satisfies the constraints, so the schedule is recoverable.
To check if a schedule is recoverable: (1) Identify all read-write dependencies (which transaction reads from which), (2) For each dependency Tⱼ reads from Tᵢ, verify that if both commit, Tᵢ commits first, (3) If all dependencies satisfy this ordering, the schedule is recoverable.
Detecting non-recoverable schedules requires tracking read-write dependencies and commit orderings. Here's a systematic approach:
Algorithm for Recoverability Check:
Build the dependency graph: Create a directed graph where each node is a transaction. Add an edge from Tᵢ to Tⱼ if Tⱼ reads a data item last written by Tᵢ.
Record commit times: Note when each transaction commits (or if it aborts).
Verify ordering: For each edge (Tᵢ → Tⱼ) in the dependency graph:
Conclusion: If all edges satisfy the ordering constraint, the schedule is recoverable.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135
from typing import Dict, List, Tuple, Setfrom enum import Enumfrom dataclasses import dataclass, field class TransactionStatus(Enum): ACTIVE = "active" COMMITTED = "committed" ABORTED = "aborted" @dataclassclass Transaction: id: str status: TransactionStatus = TransactionStatus.ACTIVE commit_order: int = -1 # -1 means not yet committed @dataclassclass ScheduleAnalyzer: """ Analyzes transaction schedules for recoverability. Recoverability Definition: A schedule is recoverable iff for every pair of transactions Ti and Tj where Tj reads from Ti: - If both commit: commit(Ti) < commit(Tj) - If Ti aborts: Tj must not commit """ transactions: Dict[str, Transaction] = field(default_factory=dict) last_writer: Dict[str, str] = field(default_factory=dict) # item -> transaction_id dependencies: List[Tuple[str, str]] = field(default_factory=list) # (writer, reader) commit_counter: int = 0 def write(self, txn_id: str, item: str) -> None: """Record a write operation.""" if txn_id not in self.transactions: self.transactions[txn_id] = Transaction(id=txn_id) self.last_writer[item] = txn_id print(f" {txn_id}: W({item})") def read(self, txn_id: str, item: str) -> None: """Record a read operation and track dependency.""" if txn_id not in self.transactions: self.transactions[txn_id] = Transaction(id=txn_id) # If another transaction wrote this item, create dependency if item in self.last_writer: writer_id = self.last_writer[item] if writer_id != txn_id: self.dependencies.append((writer_id, txn_id)) print(f" {txn_id}: R({item}) -- reads from {writer_id}") else: print(f" {txn_id}: R({item}) -- reads own write") else: print(f" {txn_id}: R({item}) -- reads initial value") def commit(self, txn_id: str) -> None: """Record a commit operation.""" if txn_id not in self.transactions: self.transactions[txn_id] = Transaction(id=txn_id) self.transactions[txn_id].status = TransactionStatus.COMMITTED self.transactions[txn_id].commit_order = self.commit_counter self.commit_counter += 1 print(f" {txn_id}: COMMIT (order: {self.transactions[txn_id].commit_order})") def abort(self, txn_id: str) -> None: """Record an abort operation.""" if txn_id not in self.transactions: self.transactions[txn_id] = Transaction(id=txn_id) self.transactions[txn_id].status = TransactionStatus.ABORTED print(f" {txn_id}: ABORT") def is_recoverable(self) -> Tuple[bool, str]: """ Check if the schedule is recoverable. Returns: Tuple of (is_recoverable, explanation) """ print("Analyzing recoverability...") print(f"Dependencies (writer → reader): {self.dependencies}") for writer_id, reader_id in self.dependencies: writer = self.transactions.get(writer_id) reader = self.transactions.get(reader_id) if not writer or not reader: continue # Case 1: Reader committed, writer aborted if (reader.status == TransactionStatus.COMMITTED and writer.status == TransactionStatus.ABORTED): return False, ( f"Non-recoverable: {reader_id} committed but " f"depends on {writer_id} which aborted" ) # Case 2: Both committed, check order if (reader.status == TransactionStatus.COMMITTED and writer.status == TransactionStatus.COMMITTED): if reader.commit_order < writer.commit_order: return False, ( f"Non-recoverable: {reader_id} (order {reader.commit_order}) " f"committed before {writer_id} (order {writer.commit_order}), " f"but {reader_id} reads from {writer_id}" ) return True, "Schedule is recoverable: all commit orderings satisfy dependencies" # Example: Non-Recoverable Scheduleprint("=" * 60)print("Example 1: NON-RECOVERABLE SCHEDULE")print("=" * 60)analyzer1 = ScheduleAnalyzer()analyzer1.write("T1", "X")analyzer1.read("T2", "X") # T2 reads from T1analyzer1.commit("T2") # T2 commits first (violation!)analyzer1.abort("T1") # T1 abortsresult1, explanation1 = analyzer1.is_recoverable()print(f"Result: {explanation1}") # Example: Recoverable Scheduleprint("=" * 60)print("Example 2: RECOVERABLE SCHEDULE")print("=" * 60)analyzer2 = ScheduleAnalyzer()analyzer2.write("T1", "X")analyzer2.read("T2", "X") # T2 reads from T1analyzer2.commit("T1") # T1 commits first (correct!)analyzer2.commit("T2") # T2 commits afterresult2, explanation2 = analyzer2.is_recoverable()print(f"Result: {explanation2}")Practical Implementation Considerations:
In real database systems, ensuring recoverability is typically enforced through:
Strict Two-Phase Locking (Strict 2PL): Holds all exclusive locks until commit, preventing other transactions from reading uncommitted data.
Multi-Version Concurrency Control (MVCC): Transactions read committed versions, never uncommitted data.
Commit protocols: The system delays a transaction's commit until all transactions it read from have committed.
These mechanisms guarantee recoverability by construction rather than by checking after the fact.
A dirty read occurs when a transaction reads data written by another transaction that has not yet committed. Dirty reads are fundamentally connected to recoverability issues.
The relationship:
This distinction is important: dirty reads are a necessary but not sufficient condition for non-recoverability.
| Scenario | Dirty Read? | Recoverable? | Explanation |
|---|---|---|---|
| T₂ reads T₁'s uncommitted write; T₁ commits; T₂ commits | Yes | Yes | Dirty read occurred, but commit order is correct |
| T₂ reads T₁'s uncommitted write; T₂ commits; T₁ commits | Yes | No | Dirty read + wrong commit order → irrecoverable |
| T₂ reads T₁'s uncommitted write; T₁ aborts; T₂ aborts | Yes | Yes | Dirty read cascaded to abort, but recoverable |
| T₂ reads T₁'s uncommitted write; T₂ commits; T₁ aborts | Yes | No | Committed transaction depends on aborted data |
| T₂ reads only committed data | No | Yes | No dirty reads = guaranteed recoverable |
Preventing dirty reads entirely (through locking or MVCC) is the most common approach because it eliminates the possibility of non-recoverable schedules by construction. However, this comes with performance costs. Some systems allow dirty reads for performance but use additional mechanisms to ensure recoverability.
SQL Isolation Levels and Dirty Reads:
| Isolation Level | Allows Dirty Reads | Recoverable by Design |
|---|---|---|
| READ UNCOMMITTED | Yes | Must track dependencies |
| READ COMMITTED | No | Yes (by construction) |
| REPEATABLE READ | No | Yes (by construction) |
| SERIALIZABLE | No | Yes (by construction) |
The READ UNCOMMITTED isolation level explicitly allows dirty reads, placing the burden of ensuring recoverability on the application or other mechanisms.
Ensuring recoverable schedules requires controlling when transactions can commit. There are several strategies database systems employ:
Strategy 1: Deferred Commit
When a transaction Tⱼ reads from an uncommitted transaction Tᵢ, the system defers Tⱼ's commit until after Tᵢ commits (or aborts, causing Tⱼ to abort too).
Tⱼ issues COMMIT → System checks dependencies →
If Tᵢ still active: wait for Tᵢ
If Tᵢ committed: proceed with Tⱼ commit
If Tᵢ aborted: abort Tⱼ too
Strategy 2: Prevent Dirty Reads (Blocking)
Prevent transactions from reading uncommitted data entirely using locks or versioning:
Tⱼ requests READ(X) → X was written by uncommitted Tᵢ →
Block Tⱼ until Tᵢ commits or aborts
Strategy 3: Multi-Version Reads
Maintain multiple versions of data. Transactions always read the most recent committed version:
Tⱼ requests READ(X) → System returns last committed version of X
(Even if uncommitted Tᵢ has written a newer version)
Most modern OLTP databases use Multi-Version Concurrency Control (MVCC) because it provides excellent read concurrency without dirty reads. Writers don't block readers, and readers always see consistent committed data. This makes recoverability automatic while maintaining high performance.
Understanding recoverability has direct implications for database administration, application development, and system design:
For Database Administrators:
For Application Developers:
| Use Case | Recommended Approach | Justification |
|---|---|---|
| Financial transactions | Strict schedules (covered later) | Cannot tolerate any data inconsistency |
| Analytics/reporting | MVCC with snapshot isolation | Read consistency without blocking writers |
| High-throughput logging | May accept some risk | Volume makes strict approaches costly |
| Master-slave replication | Serializable at master | Replication depends on consistent commit order |
| Distributed transactions | Two-phase commit | Coordinates commits across systems |
In production systems, non-recoverable schedules are especially dangerous because they may not manifest as immediate failures. The database might appear to work correctly until a specific sequence of events (abort after dependent commit) creates an irrecoverable state. Testing may not catch this because it requires specific failure timing.
Recoverable schedules are a fundamental requirement for database systems that must survive failures. Let's consolidate the key concepts:
What's next:
Recoverable schedules guarantee that recovery is possible, but they don't address the cost of recovery. When a transaction aborts, other transactions that read its data may also need to abort. This can trigger a chain reaction called a cascading rollback. The next page explores this phenomenon and its implications for system performance and availability.
You now understand recoverable schedules—the fundamental requirement that commit ordering must respect read-write dependencies. This ensures that database recovery is always possible. Next, we'll explore what happens when recovery does occur and how cascading rollbacks can amplify the impact of a single transaction failure.