Throughout this module, we've explored the hierarchy of schedule properties: recoverable, cascadeless, and strict schedules. Each property adds constraints on how transactions can interact, with profound implications for database recovery.
Now we connect the dots. When a database system crashes in the middle of executing transactions, when a transaction encounters an error and must abort, when the system must restore itself to a consistent state—these are the moments when recoverability properties determine success or failure.
This page examines how real database recovery systems leverage these schedule properties, the algorithms they use, and the trade-offs they make. Understanding this connection transforms abstract schedule properties into concrete engineering decisions.
By the end of this page, you will understand:
- how schedule properties affect recovery algorithms
- the role of Write-Ahead Logging (WAL) in recovery
- undo and redo recovery operations
- how ARIES recovery handles different schedule types
- practical guidelines for choosing schedule properties in system design
Database recovery addresses one fundamental question: After a failure, how do we restore the database to a consistent state where all committed transaction effects are preserved and all uncommitted transaction effects are removed?
This involves two complementary operations:
UNDO (Rollback): remove the effects of every transaction that had not committed at the time of failure, typically by restoring before-images recorded in the log.
REDO (Roll-forward): reapply the effects of committed transactions whose updates had not yet reached disk, using after-images recorded in the log.
Schedule properties directly impact how these operations work:
| Schedule Property | UNDO Complexity | REDO Complexity | Recovery Algorithm Impact |
|---|---|---|---|
| Non-recoverable | Impossible | Impossible | Cannot recover—committed transactions depend on aborted data |
| Recoverable only | Complex (cascades) | Standard | Must identify and abort all cascade victims |
| Cascadeless | Moderate | Standard | No cascade tracking, but write-write dependencies may exist |
| Strict | Simple | Standard | Before-images are committed; straightforward undo |
Unlike other database features that can be traded off for performance, recovery correctness is absolute. A database that cannot correctly recover from failures is fundamentally unusable for any application requiring data durability.
Write-Ahead Logging (WAL) is the foundation of recovery in virtually all modern databases. The WAL protocol ensures that recovery information is always available:
The WAL Rule:
Before any data modification is written to the database, the corresponding log record MUST be written to stable storage (log file on disk).
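The rule amounts to a check in the buffer manager: before a dirty page is flushed, the log must be durable at least up to that page's last LSN. A minimal sketch (the `LogManager`/`BufferManager` classes and the `flushed_lsn` field are illustrative names, not any specific system's API):

```python
class LogManager:
    """Tracks which log records have reached stable storage."""
    def __init__(self):
        self.records = []      # in-memory log tail (LSNs)
        self.flushed_lsn = 0   # highest LSN durably on disk

    def append(self, lsn: int) -> None:
        self.records.append(lsn)

    def flush_to(self, lsn: int) -> None:
        # Force the log to disk up to and including `lsn`
        self.flushed_lsn = max(self.flushed_lsn, lsn)


class BufferManager:
    """Enforces the WAL rule when writing pages back to disk."""
    def __init__(self, log_mgr: LogManager):
        self.log_mgr = log_mgr

    def flush_page(self, page_id: str, page_lsn: int) -> None:
        # WAL rule: the log record describing the page's latest
        # update must be durable BEFORE the page itself is written.
        if self.log_mgr.flushed_lsn < page_lsn:
            self.log_mgr.flush_to(page_lsn)
        assert self.log_mgr.flushed_lsn >= page_lsn
        print(f"Page {page_id} flushed (log durable through LSN "
              f"{self.log_mgr.flushed_lsn})")


log_mgr = LogManager()
log_mgr.append(101)
log_mgr.append(102)
buf = BufferManager(log_mgr)
buf.flush_page("P1", page_lsn=102)  # forces a log flush to LSN 102 first
```

Because the check runs on every page write, no data modification can ever reach disk ahead of its log record.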
This ensures that if a failure occurs, the log contains all the information needed to recover:
Log Record Structure:
```python
from dataclasses import dataclass
from typing import Any, Optional
from enum import Enum

class LogRecordType(Enum):
    UPDATE = "UPDATE"          # Data modification
    COMMIT = "COMMIT"          # Transaction commit
    ABORT = "ABORT"            # Transaction abort
    CHECKPOINT = "CHECKPOINT"  # Consistency point
    BEGIN = "BEGIN"            # Transaction start
    CLR = "CLR"                # Compensation Log Record (for undo)

@dataclass
class WALLogRecord:
    """
    Standard Write-Ahead Log record structure.

    The log contains all information needed for recovery:
    - Before-images for UNDO
    - After-images for REDO
    - Transaction state for determining what to do
    """
    lsn: int                    # Log Sequence Number (unique, increasing)
    txn_id: str                 # Transaction ID
    record_type: LogRecordType  # Type of log record
    # For UPDATE records:
    page_id: Optional[str] = None       # Which page was modified
    offset: Optional[int] = None        # Position within page
    before_image: Optional[Any] = None  # Value before update (for UNDO)
    after_image: Optional[Any] = None   # Value after update (for REDO)
    # For linking:
    prev_lsn: Optional[int] = None       # Previous log record for this transaction
    undo_next_lsn: Optional[int] = None  # For CLRs: next record to undo

    def can_undo(self) -> bool:
        """Check if this record can be undone."""
        return self.record_type == LogRecordType.UPDATE and self.before_image is not None

    def can_redo(self) -> bool:
        """Check if this record can be redone."""
        return self.record_type == LogRecordType.UPDATE and self.after_image is not None

# Example log sequence demonstrating WAL
print("=" * 70)
print("WRITE-AHEAD LOG EXAMPLE")
print("=" * 70)

log = [
    WALLogRecord(100, "T1", LogRecordType.BEGIN),
    WALLogRecord(101, "T1", LogRecordType.UPDATE, "P1", 0,
                 before_image="A=50", after_image="A=100"),
    WALLogRecord(102, "T2", LogRecordType.BEGIN),
    WALLogRecord(103, "T2", LogRecordType.UPDATE, "P2", 0,
                 before_image="B=200", after_image="B=150"),
    WALLogRecord(104, "T1", LogRecordType.UPDATE, "P1", 4,
                 before_image="C=30", after_image="C=80"),
    WALLogRecord(105, "T1", LogRecordType.COMMIT),  # T1 committed
    # ---- CRASH OCCURS HERE ----
]

print("\nLog records at crash time:")
for record in log:
    if record.record_type == LogRecordType.UPDATE:
        print(f"  LSN {record.lsn}: {record.txn_id} {record.record_type.value} "
              f"{record.page_id}:{record.before_image} → {record.after_image}")
    else:
        print(f"  LSN {record.lsn}: {record.txn_id} {record.record_type.value}")

print("\nRecovery analysis:")
print("  T1: COMMITTED (LSN 105) - REDO its updates if needed")
print("  T2: No COMMIT record - UNDO its updates (restore before-images)")
```

Why schedule properties matter for WAL:
| Property | Before-Image Reliability | Recovery Requirement |
|---|---|---|
| Recoverable | May contain uncommitted data | Must track dependencies |
| Cascadeless | May contain uncommitted writes | Simpler but still need care |
| Strict | Always committed data | Simple, independent undo |
With strict schedules, before-images in the log are always committed values. This means UNDO can simply restore the before-image without checking if it came from an uncommitted transaction.
UNDO recovery reverses the effects of uncommitted transactions. The complexity of UNDO directly depends on schedule properties:
UNDO in Non-Recoverable Schedules:
UNDO in Recoverable (not cascadeless) Schedules:
```python
from dataclasses import dataclass, field
from typing import List, Dict, Set, Any, Optional
from enum import Enum

class TxnState(Enum):
    ACTIVE = "ACTIVE"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"

@dataclass
class UndoRecoveryComplex:
    """
    UNDO recovery for recoverable (but not cascadeless) schedules.
    Must handle cascading aborts during recovery.
    """
    committed_before_crash: Set[str] = field(default_factory=set)
    active_at_crash: Set[str] = field(default_factory=set)
    read_from: Dict[str, Set[str]] = field(default_factory=dict)  # txn -> set of txns it read from

    def identify_cascade_victims(self) -> Set[str]:
        """
        In a recoverable (not cascadeless) schedule, we must find all
        transactions that read from transactions we're going to undo.

        This is complex because:
        1. Start with active (uncommitted) transactions
        2. Find all transactions that read from them
        3. If any of those read from another active one, add them
        4. Repeat until no new victims found
        """
        to_abort = self.active_at_crash.copy()
        changed = True
        iterations = 0

        while changed:
            changed = False
            iterations += 1
            new_victims = set()

            for txn, read_sources in self.read_from.items():
                if txn not in to_abort and txn not in self.committed_before_crash:
                    # Check if this transaction read from any transaction we're aborting
                    if read_sources & to_abort:
                        new_victims.add(txn)
                        changed = True

            to_abort.update(new_victims)
            print(f"  Iteration {iterations}: Found {len(new_victims)} new cascade victims")

        return to_abort

    def perform_undo(self, to_abort: Set[str]) -> None:
        """
        Undo all transactions in the abort set.
        Order matters here for correctness!
        """
        print(f"\n  Transactions to UNDO: {to_abort}")
        print("  Must undo in reverse chronological order of their writes")
        print("  Must also consider that some transactions may have read from each other")
        print("  COMPLEX: Recovery manager needs full dependency information")

@dataclass
class UndoRecoverySimple:
    """
    UNDO recovery for strict schedules.
    Much simpler because before-images are reliable.
    """
    active_at_crash: Set[str] = field(default_factory=set)

    def perform_undo(self, log_records: List[Any]) -> None:
        """
        Simple UNDO for strict schedules:
        1. Find all active transactions
        2. For each, restore before-images
        3. No need to track dependencies—before-images are committed values
        """
        print(f"\n  Transactions to UNDO: {self.active_at_crash}")
        print("  SIMPLE: Just restore before-images for each")
        print("  No cascade analysis needed")
        print("  Order doesn't matter (before-images are independent)")

# Comparison demonstration
print("=" * 70)
print("UNDO RECOVERY COMPLEXITY COMPARISON")
print("=" * 70)

# Complex case (recoverable but not cascadeless)
print("\n--- Recoverable (with dirty reads) Schedule ---")
complex_recovery = UndoRecoveryComplex(
    committed_before_crash={"T4", "T5"},
    active_at_crash={"T1", "T3"},
    read_from={
        "T2": {"T1"},  # T2 read from T1 (dirty read)
        "T3": {"T2"},  # T3 read from T2
        "T4": set(),   # T4 read from committed data only
        "T5": {"T4"},  # T5 read from T4 (committed)
    })
cascade_victims = complex_recovery.identify_cascade_victims()
complex_recovery.perform_undo(cascade_victims)

# Simple case (strict schedule)
print("\n--- Strict Schedule ---")
simple_recovery = UndoRecoverySimple(active_at_crash={"T1", "T3"})
simple_recovery.perform_undo([])

print("\n" + "=" * 70)
print("KEY INSIGHT: Strict schedules make UNDO trivially simple.")
print("=" * 70)
```

For non-cascadeless schedules, the recovery system must identify cascading aborts during recovery. This requires either maintaining dependency graphs during execution or re-analyzing the log to determine which transactions read uncommitted data. Both approaches add significant complexity.
REDO recovery ensures that committed transaction effects are durable—even if they weren't written to disk before the crash. Interestingly, REDO is largely independent of schedule properties.
Why REDO is simpler:
- Only committed transactions are redone, and recoverability guarantees that committed transactions never depend on uncommitted data.
- REDO is a single forward pass through the log, applying after-images in order.
- Page LSNs tell the recovery manager whether an update already reached disk, so no dependency analysis is needed.
The REDO Process:
```python
from dataclasses import dataclass, field
from typing import List, Dict, Any, Set
from enum import Enum

class RecordType(Enum):
    UPDATE = "UPDATE"
    COMMIT = "COMMIT"
    BEGIN = "BEGIN"

@dataclass
class LogRecord:
    lsn: int
    txn_id: str
    record_type: RecordType
    page_id: str = ""
    after_image: Any = None
    page_lsn: int = 0  # LSN of last update applied to page

@dataclass
class Page:
    page_id: str
    data: Dict[str, Any]
    page_lsn: int = 0  # Last LSN applied to this page

class RedoRecovery:
    """
    REDO recovery is relatively simple regardless of schedule type:

    For each log record of a committed transaction:
        If the page's LSN < log record's LSN:
            Apply the after-image (redo the update)

    The page_lsn comparison ensures we don't redo updates
    that were already written to disk.
    """

    def __init__(self, log: List[LogRecord], pages: Dict[str, Page]):
        self.log = log
        self.pages = pages

    def identify_committed_transactions(self) -> Set[str]:
        """Find all transactions that committed before crash."""
        committed = set()
        for record in self.log:
            if record.record_type == RecordType.COMMIT:
                committed.add(record.txn_id)
        return committed

    def redo_pass(self) -> None:
        """
        Redo all committed transaction updates.

        This is a single forward pass through the log.
        For each UPDATE by a committed transaction,
        redo if the page doesn't have this update yet.
        """
        committed = self.identify_committed_transactions()
        print(f"Committed transactions: {committed}")
        print("\nStarting REDO pass (forward through log)...")

        redo_count = 0
        skip_count = 0

        for record in self.log:
            if record.record_type != RecordType.UPDATE:
                continue
            if record.txn_id not in committed:
                # Skip uncommitted transaction updates
                continue

            page = self.pages.get(record.page_id)
            if page is None:
                # Page might not be loaded; would need to fetch from disk
                print(f"  LSN {record.lsn}: Page {record.page_id} not in buffer")
                continue

            if page.page_lsn < record.lsn:
                # This update wasn't applied to disk; redo it
                print(f"  LSN {record.lsn}: REDO {record.txn_id}'s update to {record.page_id}")
                print(f"    Applying: {record.after_image}")
                page.data = record.after_image
                page.page_lsn = record.lsn
                redo_count += 1
            else:
                # Update already on disk; skip
                print(f"  LSN {record.lsn}: SKIP (page_lsn {page.page_lsn} >= {record.lsn})")
                skip_count += 1

        print(f"\nREDO complete: {redo_count} redone, {skip_count} skipped")

# Demonstration
print("=" * 70)
print("REDO RECOVERY DEMONSTRATION")
print("=" * 70)

log = [
    LogRecord(100, "T1", RecordType.BEGIN),
    LogRecord(101, "T1", RecordType.UPDATE, "P1", after_image={"A": 100}),
    LogRecord(102, "T2", RecordType.BEGIN),
    LogRecord(103, "T1", RecordType.UPDATE, "P2", after_image={"B": 200}),
    LogRecord(104, "T2", RecordType.UPDATE, "P3", after_image={"C": 300}),
    LogRecord(105, "T1", RecordType.COMMIT),  # T1 committed
    LogRecord(106, "T2", RecordType.UPDATE, "P1", after_image={"A": 150}),
    # CRASH - T2 never committed
]

# Simulate that P1 has LSN 101 on disk, P2 has LSN 0 (update wasn't flushed)
pages = {
    "P1": Page("P1", {"A": 100}, page_lsn=101),  # T1's first update was flushed
    "P2": Page("P2", {"B": 0}, page_lsn=0),      # T1's second update wasn't flushed
    "P3": Page("P3", {"C": 0}, page_lsn=0),      # T2's update wasn't flushed
}

print("\nInitial page states:")
for pid, page in pages.items():
    print(f"  {pid}: {page.data}, page_lsn={page.page_lsn}")

print()
recovery = RedoRecovery(log, pages)
recovery.redo_pass()

print("\nFinal page states:")
for pid, page in pages.items():
    print(f"  {pid}: {page.data}, page_lsn={page.page_lsn}")

print("\nNote: T2's update to P3 was NOT redone (T2 didn't commit)")
```

Unlike UNDO, REDO recovery works the same way regardless of schedule type. The key insight is that we only REDO committed transactions, and recoverability guarantees that committed transactions don't depend on uncommitted data. This makes REDO straightforward.
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the most widely used recovery algorithm in commercial databases. It is used by IBM DB2 and Microsoft SQL Server, and it influenced the recovery designs of PostgreSQL and MySQL's InnoDB.
ARIES Key Principles:
- Write-Ahead Logging: every change is logged to stable storage before the corresponding data page is written.
- Repeating history during REDO: recovery first replays all logged updates to restore the exact state at the time of the crash.
- Logging changes during UNDO: rollback actions are themselves logged as Compensation Log Records (CLRs), so recovery can make progress even if it is interrupted by another crash.
Three Phases of ARIES Recovery:
1. Analysis: scan the log from the last checkpoint to determine which transactions were active and which pages were dirty at the crash.
2. REDO: replay history forward from the earliest required LSN to restore the pre-crash state.
3. UNDO: roll back the transactions that were still active at the crash, writing CLRs as it goes.
How Schedule Properties Affect ARIES:
| Phase | Recoverable | Cascadeless | Strict |
|---|---|---|---|
| Analysis | Must track dependencies | Standard | Standard |
| REDO | Same (repeat history) | Same | Same |
| UNDO | Complex (cascades) | Moderate | Simple |
ARIES assumes strict schedules in its standard form. For non-strict schedules, the UNDO phase would need modification to handle dependencies.
Why databases use strict schedules with ARIES:
- Before-images in the log are always committed values, so the UNDO phase can restore them independently.
- No cascade analysis is needed: only the transactions active at the crash are rolled back.
- Strict two-phase locking, which most systems already use for serializability, yields strict schedules as a by-product.
When your PostgreSQL or SQL Server database recovers after a crash, it's running a variant of ARIES. The database reads the WAL, determines what transactions were in progress, redoes committed work, and undoes uncommitted work—all leveraging the guarantees provided by strict schedules.
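The three-pass structure can be sketched at a toy granularity (a simplified outline in the spirit of ARIES, not the full algorithm: checkpoints, the dirty page table, and CLRs are omitted, and the record layout is an assumption for this sketch):

```python
# Each log record is (lsn, txn, op, key, before, after);
# op is "BEGIN", "UPDATE", or "COMMIT".

def recover(log, db):
    # --- Phase 1: Analysis ---
    # Determine which transactions committed and which were active.
    committed, active = set(), set()
    for lsn, txn, op, *rest in log:
        if op == "BEGIN":
            active.add(txn)
        elif op == "COMMIT":
            active.discard(txn)
            committed.add(txn)

    # --- Phase 2: REDO (repeat history, forward pass) ---
    # Reapply every logged update to restore the pre-crash state.
    for lsn, txn, op, *rest in log:
        if op == "UPDATE":
            key, before, after = rest
            db[key] = after

    # --- Phase 3: UNDO (backward pass) ---
    # Roll back transactions still active at the crash by restoring
    # before-images. With strict schedules these before-images are
    # committed values, so restoring them is always safe.
    for lsn, txn, op, *rest in reversed(log):
        if op == "UPDATE" and txn in active:
            key, before, after = rest
            db[key] = before

    return db

log = [
    (1, "T1", "BEGIN"),
    (2, "T1", "UPDATE", "A", 50, 100),
    (3, "T2", "BEGIN"),
    (4, "T2", "UPDATE", "B", 200, 150),
    (5, "T1", "COMMIT"),
    # crash: T2 never committed
]
print(recover(log, {}))  # T1's update is kept; T2's is rolled back
```

Running the sketch leaves A at T1's committed value (100) and restores B to its before-image (200), mirroring the "redo committed, undo uncommitted" behavior described above.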
Checkpoints are periodic snapshots that reduce recovery time. Without checkpoints, recovery would need to process the entire log from the beginning of time. Checkpoints provide a "safe starting point."
Types of Checkpoints:
1. Consistent Checkpoint (Quiesce): stop admitting new transactions, wait for all active transactions to finish, flush every dirty page to disk, then write the checkpoint record.
Simple but causes a pause in processing.
2. Fuzzy Checkpoint: record the set of active transactions and the dirty page table without halting transaction processing; dirty pages are flushed to disk in the background.
Used by most production systems for minimal interruption.
| Checkpoint Type | Processing Impact | Recovery Start Point | Log Needed |
|---|---|---|---|
| None | None | Beginning of log | Entire log |
| Consistent | Pause during checkpoint | Last checkpoint | Log since checkpoint |
| Fuzzy | Minimal pause | Oldest dirty page or active txn | Log since oldest active |
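A fuzzy checkpoint can be sketched as a snapshot of recovery metadata rather than of the data itself (simplified; the class and field names are assumptions for this sketch):

```python
from dataclasses import dataclass
from typing import Dict, Set

@dataclass
class FuzzyCheckpoint:
    """
    A fuzzy checkpoint records recovery *metadata*, not data pages:
    the active transactions and the dirty page table (each dirty
    page's earliest unflushed update, its recovery LSN). Normal
    processing continues while the record is written.
    """
    checkpoint_lsn: int
    active_txns: Set[str]
    dirty_pages: Dict[str, int]  # page_id -> recovery LSN (rec_lsn)

    def redo_start_lsn(self) -> int:
        # REDO must start at the oldest update not yet on disk;
        # everything before that point is already durable.
        if not self.dirty_pages:
            return self.checkpoint_lsn
        return min(self.dirty_pages.values())

ckpt = FuzzyCheckpoint(
    checkpoint_lsn=500,
    active_txns={"T7", "T9"},
    dirty_pages={"P1": 420, "P4": 455, "P9": 480},
)
print(f"REDO can begin at LSN {ckpt.redo_start_lsn()}")  # 420, not the log start
```

This is why the table above lists the fuzzy checkpoint's recovery start point as "oldest dirty page or active txn" rather than the checkpoint record itself.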
Recovery time formula:
Recovery Time ≈ Analysis Time + REDO Time + UNDO Time
Analysis Time ∝ Log records since checkpoint
REDO Time ∝ Dirty pages × Log records
UNDO Time ∝ Active transactions × Their log records
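As a rough back-of-the-envelope illustration of the formula, one can plug in per-record costs (all counts and the per-record cost below are made-up numbers for the example, and REDO is simplified to a record count):

```python
def estimate_recovery_time(log_records_since_ckpt: int,
                           redo_records: int,
                           undo_records: int,
                           ms_per_record: float = 0.01) -> float:
    """Rough recovery-time estimate (milliseconds) per the formula above."""
    analysis = log_records_since_ckpt * ms_per_record  # scan since checkpoint
    redo = redo_records * ms_per_record                # updates to reapply
    undo = undo_records * ms_per_record                # active txns' records
    return analysis + redo + undo

# 100k records since checkpoint, 40k needing redo, 5k needing undo
total_ms = estimate_recovery_time(100_000, 40_000, 5_000)
print(f"Estimated recovery time: {total_ms / 1000:.2f} s")
```

Halving the checkpoint interval roughly halves the first two terms, which is the lever tuned against an RTO in the next paragraphs.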
For strict schedules, UNDO is simpler: only the transactions active at the crash are rolled back, their before-images are committed values, and no cascade analysis is required, so UNDO time depends only on those transactions' own log records.
Checkpoint frequency trade-off: frequent checkpoints shorten recovery but add I/O overhead during normal processing; infrequent checkpoints minimize that overhead but lengthen recovery.
Production systems often have Recovery Time Objectives (RTOs). Checkpoint frequency is tuned to meet these SLAs. For a 1-minute RTO, you might checkpoint every 30 seconds. Strict schedules help because UNDO phase is faster and more predictable.
Understanding recoverability properties helps make informed decisions about database configuration and application design:
Guideline 1: Default to Strict Schedules
Unless you have specific requirements that demand otherwise:
- Use your database's default isolation level (typically READ COMMITTED or stronger), which holds write locks until commit and yields strict behavior for writes.
- Avoid READ UNCOMMITTED unless you can tolerate both dirty reads and the complex recovery that comes with them.
Guideline 2: Understand Your Isolation Level
Know what your isolation level provides:
| Use Case | Recommended Level | Recovery Implications |
|---|---|---|
| Financial transactions | SERIALIZABLE | Strictest guarantees, simple recovery |
| General OLTP | READ COMMITTED | Cascadeless + Strict for writes, fast recovery |
| Reporting queries | READ COMMITTED SNAPSHOT | Snapshot isolation, no blocking, simple |
| Approximate analytics | READ UNCOMMITTED (rare) | Not cascadeless, complex recovery if crashes occur |
It's tempting to weaken isolation for performance. Remember: you're trading simple, predictable recovery for marginal throughput gains. When the crash happens—and it will—you'll want the simple recovery path.
We've completed our exploration of recoverability in transaction processing. This final summary brings together all the concepts from the module:
The Big Picture:
Recoverability is not an abstract theoretical concept—it's the foundation that makes databases reliable. When you use a database with confidence that your committed data will survive crashes, you're relying on:
- schedule properties (recoverable, cascadeless, strict) that constrain how transactions interact;
- Write-Ahead Logging, which records every change before it reaches the data files;
- recovery algorithms such as ARIES that redo committed work and undo uncommitted work.
Understanding these concepts helps you choose appropriate isolation levels, design robust applications, and debug issues when things go wrong.
Congratulations! You've mastered the hierarchy of recoverability properties—from basic recoverable schedules through cascadeless and strict schedules. You understand how these properties enable database recovery, how they're implemented in practice, and how to make informed decisions about isolation levels. This knowledge forms the foundation for understanding transaction management and database reliability.