Loading learning content...
When a database crashes, how does it come back to life with all committed data intact and no trace of incomplete transactions? The answer lies in recovery algorithms—carefully designed procedures that analyze what happened before the crash and restore the database to a consistent state.
Recovery algorithms represent decades of computer science research. From the early, simple approaches that traded performance for simplicity, to ARIES—the sophisticated algorithm that enables high-performance databases to recover correctly from any failure scenario—this evolution tells the story of how database theory became practice.
Understanding recovery algorithms at a conceptual level prepares you to appreciate the detailed algorithms in subsequent chapters and to make informed decisions about database selection and configuration.
By the end of this page, you will understand the major families of recovery algorithms, their tradeoffs between performance and recovery simplicity, the key design decisions that differentiate approaches, and why ARIES became the dominant algorithm in modern databases. This overview prepares you for deep dives into specific algorithms.
Recovery algorithms can be classified along several dimensions. Understanding these dimensions clarifies why different algorithms exist and which scenarios each addresses best:
Key Classification Dimensions:
| Dimension | Options | Tradeoff |
|---|---|---|
| Update Timing | Deferred update vs Immediate update | Simpler recovery vs Better performance |
| Buffer Policy | NO-STEAL vs STEAL | Simpler undo vs Buffer flexibility |
| Flush Policy | FORCE vs NO-FORCE | Simpler redo vs Faster commits |
| Log Content | Physical vs Logical vs Physiological | Recovery simplicity vs Log size |
| Redo Scope | Page-oriented vs Transaction-oriented | Parallelism vs Simplicity |
The Fundamental Recovery Requirements:
All recovery algorithms must provide:
Undo and Redo Capabilities:
| Buffer/Flush Policy | Requires Undo? | Requires Redo? | Explanation |
|---|---|---|---|
| NO-STEAL / FORCE | No | No | All committed work on disk; no uncommitted work on disk |
| NO-STEAL / NO-FORCE | No | Yes | Committed work may not be on disk; need redo |
| STEAL / FORCE | Yes | No | Uncommitted work may be on disk; need undo |
| STEAL / NO-FORCE | Yes | Yes | Both situations possible; need both |
Most modern databases use STEAL/NO-FORCE because it provides maximum flexibility for the buffer manager, even though it requires the most sophisticated recovery.
STEAL allows the buffer manager to evict any page when memory is needed, preventing long-running transactions from monopolizing the buffer pool. NO-FORCE allows transactions to commit without waiting for all modified pages to be written. Together, these policies enable high-performance operation at the cost of complex recovery.
Deferred update is conceptually the simplest recovery approach. Under this scheme, modifications are not written to the database until after the transaction commits. All modifications are kept in memory and written to a log, but the actual database pages remain unchanged until commit time.
Deferred Update Algorithm:
Where Deferred Update is Used:
Deferred update is rarely used in production OLTP databases due to its limitations. However, it appears in:
The fundamental limitation is that NO-STEAL policy prevents the buffer manager from evicting dirty pages from uncommitted transactions, which can exhaust memory under concurrent load.
Deferred update is excellent for learning recovery concepts because it separates redo from undo. Understanding why its limitations led to immediate update helps appreciate the complexity of STEAL/NO-FORCE systems.
Immediate update allows modifications to be written to the database before the transaction commits. This is the STEAL policy—dirty pages can be 'stolen' from the buffer pool and written to disk even if their transaction is still active.
Immediate Update Algorithm:
The UNDO/REDO Challenge:
Immediate update recovery is more complex because:
Log Record Requirements:
┌─────────────────────────────────────────────────────────┐
│ Immediate Update Log Record │
├─────────────────────────────────────────────────────────┤
│ LSN: Unique log sequence number │
│ Transaction ID: Which transaction │
│ Previous LSN: Previous record for this transaction │
│ Page ID: Which page was modified │
│ Offset: Where on the page │
│ Before Image: Old value (for UNDO) │
│ After Image: New value (for REDO) │
│ Operation Type: INSERT/UPDATE/DELETE │
└─────────────────────────────────────────────────────────┘
Before images enable undo; after images enable redo. Both are required.
Immediate update with STEAL/NO-FORCE is used by virtually all production OLTP databases because the performance benefits outweigh the recovery complexity. The ARIES algorithm, developed specifically for this model, has become the de facto standard for recovery implementation.
Shadow paging takes a completely different approach to recovery by never overwriting existing data. Instead, modifications create new pages, and commit atomically switches from old pages to new pages.
Shadow Paging Concept:
BEFORE TRANSACTION:
Page Table (Current) Data Pages
┌─────────────────┐ ┌──────────────┐
│ Page 0 → Addr A │───────▶│ Original │ (Address A)
│ Page 1 → Addr B │───────▶│ Original │ (Address B)
│ Page 2 → Addr C │───────▶│ Original │ (Address C)
└─────────────────┘ └──────────────┘
│
└── Shadow (Copy)
┌─────────────────┐
│ Page 0 → Addr A │
│ Page 1 → Addr B │ (Identical at start)
│ Page 2 → Addr C │
└─────────────────┘
DURING TRANSACTION (modify Page 1):
Current Page Table Shadow Page Table Data Pages
┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐
│ Page 0 → Addr A │ │ Page 0 → Addr A │───▶│ Original │ (A)
│ Page 1 → Addr B │ │ Page 1 → Addr D │───▶│ Modified! │ (D) NEW
│ Page 2 → Addr C │ │ Page 2 → Addr C │───▶│ Original │ (C)
└─────────────────┘ └─────────────────┘ └──────────────┘
COMMIT: Atomically replace Current Page Table pointer with Shadow Page Table
Now Address D is the official Page 1; Address B can be reclaimed
Where Shadow Paging is Used:
Shadow paging's main weakness—difficulty with concurrent transactions—limits its use in high-concurrency OLTP systems. However, its simplicity makes it attractive for embedded databases and copy-on-write file systems.
Shadow paging concepts have seen renewed interest through copy-on-write file systems (ZFS, Btrfs) and newer databases optimized for SSDs. Since SSDs prefer sequential writes to new locations over random overwrites anyway, the 'write to new location' model aligns with SSD performance characteristics.
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the dominant recovery algorithm in modern database systems. Developed at IBM Research in the early 1990s, ARIES provides a comprehensive framework for recovery that handles all the complexities of STEAL/NO-FORCE buffer management.
ARIES Design Principles:
ARIES Three-Phase Recovery:
┌─────────────────────────────────────────────────────────────────┐
│ ARIES RECOVERY │
└─────────────────────────────────────────────────────────────────┘
│
┌──────────────────────────────┼──────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ ANALYSIS │ │ REDO │ │ UNDO │
│ PHASE │ │ PHASE │ │ PHASE │
├──────────┤ ├──────────┤ ├──────────┤
│ Scan log │ │ Repeat │ │ Rollback │
│ forward │ │ history: │ │ loser │
│ from │ │ redo all │ │ trans- │
│ last │────────────────▶│ logged │─────────────────────▶│ actions │
│ check- │ │ actions │ │ using │
│ point │ │ since │ │ log │
│ │ │ redo │ │ │
│ Build: │ │ point │ │ Write │
│ • ATT │ │ │ │ CLRs │
│ • DPT │ │ Compare │ │ │
│ │ │ LSNs │ │ │
└──────────┘ └──────────┘ └──────────┘
│ │ │
│ │ │
└────────────────────────────┴─────────────────────────────────┘
│
▼
Database Consistent!
Phase Purposes:
| Phase | Direction | Output | Purpose |
|---|---|---|---|
| Analysis | Forward from checkpoint | ATT, DPT, Redo point | Determine what needs redo/undo |
| Redo | Forward from redo point | Restored pages | Repeat history to crash-point state |
| Undo | Backward through loser transactions | CLRs, Clean pages | Remove uncommitted modifications |
ARIES redoes everything—even uncommitted transactions—because this restores the exact pre-crash page states. If pages aren't in their pre-crash state, the undo phase might try to reverse changes that weren't actually on the page. Repeating history ensures correctness of subsequent undo.
Recovery algorithms differ in what information they log. This choice affects log size, recovery complexity, and concurrency:
Logging Granularity Options:
| Type | What's Logged | Log Size | Redo/Undo Complexity | Concurrency |
|---|---|---|---|---|
| Physical | Exact byte changes to specific disk locations | Large | Simple (byte copy) | Low (conflicts at byte level) |
| Logical | High-level operation (e.g., 'INSERT row X') | Small | Complex (re-execute operation) | High (operation-level conflicts) |
| Physiological | Operation on specific page (physical location + logical operation) | Medium | Moderate | High (ARIES approach) |
Physical Logging:
Log Record: "At page 42, offset 128, replace bytes [A,B,C,D] with [W,X,Y,Z]"
Logical Logging:
Log Record: "INSERT into Employee table: (id=42, name='Alice', dept=5)"
Physiological Logging (ARIES Approach):
Log Record: "On page 42: INSERT row at slot 7, content = [...]"
Physiological logging combines physical page identification (enabling parallel redo across pages) with logical operations (enabling concurrent modifications within pages). This is why ARIES achieves both correctness and high performance.
Let's bring together all the recovery algorithms into a comprehensive comparison:
| Characteristic | Deferred Update | Immediate Update | Shadow Paging | ARIES |
|---|---|---|---|---|
| Buffer Policy | NO-STEAL | STEAL | N/A (copy-on-write) | STEAL |
| Flush Policy | FORCE | Varies | N/A | NO-FORCE |
| Undo Required? | No | Yes | No | Yes |
| Redo Required? | Yes | Yes | No | Yes |
| Log Content | After images | Before + after | None (or minimal) | Physiological |
| Commit Speed | Slow (FORCE) | Fast (NO-FORCE) | Fast (pointer swap) | Fast (NO-FORCE) |
| Memory Requirement | High (hold all changes) | Low | Medium | Low |
| Recovery Complexity | Simple (redo only) | Complex (undo + redo) | Simple (no redo/undo) | Complex (3 phases) |
| Concurrency Support | Limited | Good | Difficult | Excellent |
| Industry Adoption | Rare (special cases) | Moderate | Niche (SQLite, LMDB) | Dominant (PostgreSQL, MySQL, Oracle, SQL Server) |
Why ARIES Dominates:
ARIES became the industry standard because it achieves the best combination of:
The Complexity Cost:
ARIES's sophistication comes with implementation complexity. The algorithm has many subtle details:
This complexity is why database recovery code is among the most carefully reviewed and tested in any software system.
While ARIES is the conceptual foundation, each database implements its own variation. PostgreSQL, MySQL/InnoDB, Oracle, and SQL Server all have ARIES-inspired recovery systems with vendor-specific optimizations and extensions. Understanding core ARIES translates well to understanding any of them.
Recovery algorithms represent the practical realization of ACID guarantees. From simple deferred update to sophisticated ARIES, the evolution reflects the industry's drive toward higher performance without sacrificing correctness. Let's consolidate the key insights:
What's Next:
With a broad understanding of recovery algorithms, we're ready to explore how ACID properties and recovery are fundamentally interconnected. The next page examines how each ACID property—Atomicity, Consistency, Isolation, and Durability—depends on recovery mechanisms, completing our conceptual foundation for the detailed algorithm chapters ahead.
You now have a comprehensive mental map of recovery algorithm families—their designs, tradeoffs, and use cases. This overview prepares you to dive deep into ARIES and other specific algorithms, understanding not just how they work but why they're designed as they are.