Database Management SystemsRecovery Concepts

Recovery Concepts

LevelIntermediate

Duration75 mins

TopicRecovery Concepts

4 / 5

Recovery Algorithms Overview

Algorithms That Resurrect Databases

When a database crashes, how does it come back to life with all committed data intact and no trace of incomplete transactions? The answer lies in recovery algorithms—carefully designed procedures that analyze what happened before the crash and restore the database to a consistent state.

Recovery algorithms represent decades of computer science research. From the early, simple approaches that traded performance for simplicity, to ARIES—the sophisticated algorithm that enables high-performance databases to recover correctly from any failure scenario—this evolution tells the story of how database theory became practice.

Understanding recovery algorithms at a conceptual level prepares you to appreciate the detailed algorithms in subsequent chapters and to make informed decisions about database selection and configuration.

What You Will Learn

By the end of this page, you will understand the major families of recovery algorithms, their tradeoffs between performance and recovery simplicity, the key design decisions that differentiate approaches, and why ARIES became the dominant algorithm in modern databases. This overview prepares you for deep dives into specific algorithms.

Classifying Recovery Algorithms

Recovery algorithms can be classified along several dimensions. Understanding these dimensions clarifies why different algorithms exist and which scenarios each addresses best:

Key Classification Dimensions:

Recovery Algorithm Classification Dimensions
Dimension	Options	Tradeoff
Update Timing	Deferred update vs Immediate update	Simpler recovery vs Better performance
Buffer Policy	NO-STEAL vs STEAL	Simpler undo vs Buffer flexibility
Flush Policy	FORCE vs NO-FORCE	Simpler redo vs Faster commits
Log Content	Physical vs Logical vs Physiological	Recovery simplicity vs Log size
Redo Scope	Page-oriented vs Transaction-oriented	Parallelism vs Simplicity

The Fundamental Recovery Requirements:

All recovery algorithms must provide:

Atomicity: Uncommitted transactions must leave no trace after recovery
Durability: Committed transactions must be fully present after recovery
Correctness: The recovered database must be in a consistent state
Restartability: Recovery must succeed even if the system crashes during recovery

Undo and Redo Capabilities:

Buffer/Flush Policy	Requires Undo?	Requires Redo?	Explanation
NO-STEAL / FORCE	No	No	All committed work on disk; no uncommitted work on disk
NO-STEAL / NO-FORCE	No	Yes	Committed work may not be on disk; need redo
STEAL / FORCE	Yes	No	Uncommitted work may be on disk; need undo
STEAL / NO-FORCE	Yes	Yes	Both situations possible; need both

Most modern databases use STEAL/NO-FORCE because it provides maximum flexibility for the buffer manager, even though it requires the most sophisticated recovery.

The Performance Motivation

STEAL allows the buffer manager to evict any page when memory is needed, preventing long-running transactions from monopolizing the buffer pool. NO-FORCE allows transactions to commit without waiting for all modified pages to be written. Together, these policies enable high-performance operation at the cost of complex recovery.

Deferred Update Recovery (NO-UNDO/REDO)

Deferred update is conceptually the simplest recovery approach. Under this scheme, modifications are not written to the database until after the transaction commits. All modifications are kept in memory and written to a log, but the actual database pages remain unchanged until commit time.

Deferred Update Algorithm:

Deferred Update Process

•During transaction execution: Record all modifications in the log (with new values only—no old values needed) and in memory buffers
•At commit time: Force log records to stable storage, then write all modified pages to the database
•On abort: Simply discard the in-memory modifications; the database pages were never modified
•On crash recovery: Scan the log forward; redo all committed transactions' modifications; ignore uncommitted transactions (they never touched the database)

Advantages

•No UNDO needed—uncommitted transactions never modify database pages
•Simpler log records—only new values required (no before images)
•Simpler recovery—just redo committed transactions
•Abort is instant—just discard memory

Disadvantages

•All modifications must fit in memory until commit—limits transaction size
•Long-running transactions consume memory throughout their lifetime
•Commit is slow—must write all pages at commit time (FORCE policy)
•Buffer manager cannot evict uncommitted pages—limits buffer flexibility (NO-STEAL)

Where Deferred Update is Used:

Deferred update is rarely used in production OLTP databases due to its limitations. However, it appears in:

In-memory databases: Where 'writing to disk' happens only at checkpoint time
Transaction-isolated analytics: Where long-running analytical transactions build result sets in memory
Simple embedded databases: Where implementation simplicity is valued over performance

The fundamental limitation is that NO-STEAL policy prevents the buffer manager from evicting dirty pages from uncommitted transactions, which can exhaust memory under concurrent load.

Educational Value

Deferred update is excellent for learning recovery concepts because it separates redo from undo. Understanding why its limitations led to immediate update helps appreciate the complexity of STEAL/NO-FORCE systems.

Immediate Update Recovery (UNDO/REDO)

Immediate update allows modifications to be written to the database before the transaction commits. This is the STEAL policy—dirty pages can be 'stolen' from the buffer pool and written to disk even if their transaction is still active.

Immediate Update Algorithm:

Immediate Update Process

•During transaction execution: Record modifications in the log (both old and new values) and in buffer pages. Pages may be written to disk anytime, following WAL protocol.
•At commit time: Force log records to stable storage. Pages may or may not have been written yet.
•On abort: Use log records to undo all modifications (restore old values)
•On crash recovery: First REDO all committed transactions to ensure durability, then UNDO all uncommitted transactions to ensure atomicity

The UNDO/REDO Challenge:

Immediate update recovery is more complex because:

Pages on disk may contain uncommitted modifications — These must be undone
Pages on disk may be missing committed modifications — These must be redone
Order matters — Redo must happen before undo to establish correct page state
Crash during recovery — Must handle crashes during undo (CLRs needed)

Log Record Requirements:

┌─────────────────────────────────────────────────────────┐
│  Immediate Update Log Record                            │
├─────────────────────────────────────────────────────────┤
│  LSN: Unique log sequence number                        │
│  Transaction ID: Which transaction                       │
│  Previous LSN: Previous record for this transaction     │
│  Page ID: Which page was modified                       │
│  Offset: Where on the page                              │
│  Before Image: Old value (for UNDO)                     │
│  After Image: New value (for REDO)                      │
│  Operation Type: INSERT/UPDATE/DELETE                   │
└─────────────────────────────────────────────────────────┘

Before images enable undo; after images enable redo. Both are required.

Advantages

•Transactions can be arbitrarily large—not limited by memory
•Buffer manager has full flexibility to evict any page
•Commit can be fast (NO-FORCE)—just write log records
•Better memory utilization under concurrent load

Disadvantages

•More complex recovery with both redo and undo phases
•Larger log records (both before and after images)
•Abort is slower—must undo all modifications
•Need CLRs to handle crashes during recovery

The Industry Standard

Immediate update with STEAL/NO-FORCE is used by virtually all production OLTP databases because the performance benefits outweigh the recovery complexity. The ARIES algorithm, developed specifically for this model, has become the de facto standard for recovery implementation.

Shadow Paging (NO-UNDO/NO-REDO)

Shadow paging takes a completely different approach to recovery by never overwriting existing data. Instead, modifications create new pages, and commit atomically switches from old pages to new pages.

Shadow Paging Concept:

BEFORE TRANSACTION:

  Page Table (Current)        Data Pages
  ┌─────────────────┐        ┌──────────────┐
  │ Page 0 → Addr A │───────▶│  Original    │ (Address A)
  │ Page 1 → Addr B │───────▶│  Original    │ (Address B)
  │ Page 2 → Addr C │───────▶│  Original    │ (Address C)
  └─────────────────┘        └──────────────┘
        │
        └── Shadow (Copy)
            ┌─────────────────┐
            │ Page 0 → Addr A │
            │ Page 1 → Addr B │  (Identical at start)
            │ Page 2 → Addr C │
            └─────────────────┘

DURING TRANSACTION (modify Page 1):

  Current Page Table         Shadow Page Table      Data Pages
  ┌─────────────────┐       ┌─────────────────┐    ┌──────────────┐
  │ Page 0 → Addr A │       │ Page 0 → Addr A │───▶│  Original    │ (A)
  │ Page 1 → Addr B │       │ Page 1 → Addr D │───▶│  Modified!   │ (D) NEW
  │ Page 2 → Addr C │       │ Page 2 → Addr C │───▶│  Original    │ (C)
  └─────────────────┘       └─────────────────┘    └──────────────┘

COMMIT: Atomically replace Current Page Table pointer with Shadow Page Table
        Now Address D is the official Page 1; Address B can be reclaimed

Shadow Paging Properties

•No redo needed — Modified pages are written to new locations before commit; after commit, they're the official pages
•No undo needed — If transaction aborts, simply discard the shadow page table; original pages are unchanged
•Atomic commit — Commit is a single atomic pointer swap (changing which page table is 'current')
•No logging required — The data pages themselves serve as the recovery mechanism

Advantages

•Extremely simple recovery—no redo or undo logic
•Commit is atomic and fast (pointer swap)
•No WAL overhead during transaction execution
•Natural support for time-travel/versioning

Disadvantages

•Data fragmentation—modified pages scattered randomly
•Garbage collection needed to reclaim old page versions
•Concurrent transactions are complex—separate shadow tables conflicting
•Does not scale well to large numbers of concurrent transactions

Where Shadow Paging is Used:

SQLite: Uses a simplified form of shadow paging (rollback journal) for single-writer transactions
LMDB: Memory-mapped database using copy-on-write B-trees (shadow paging variant)
File system journaling: Many journaling file systems use shadow-like techniques

Shadow paging's main weakness—difficulty with concurrent transactions—limits its use in high-concurrency OLTP systems. However, its simplicity makes it attractive for embedded databases and copy-on-write file systems.

Copy-on-Write Modern Renaissance

Shadow paging concepts have seen renewed interest through copy-on-write file systems (ZFS, Btrfs) and newer databases optimized for SSDs. Since SSDs prefer sequential writes to new locations over random overwrites anyway, the 'write to new location' model aligns with SSD performance characteristics.

ARIES: The Industry Standard

ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the dominant recovery algorithm in modern database systems. Developed at IBM Research in the early 1990s, ARIES provides a comprehensive framework for recovery that handles all the complexities of STEAL/NO-FORCE buffer management.

ARIES Design Principles:

ARIES Core Principles

•Write-Ahead Logging (WAL) — Log records must be forced to stable storage before corresponding data pages are written. This ensures recoverability.
•Repeating History During Redo — During redo, reapply all logged operations, even for uncommitted transactions. This restores the database to its exact pre-crash state.
•Logging Changes During Undo — When undoing operations, write compensation log records (CLRs). This ensures recovery can be restarted if it crashes.

ARIES Three-Phase Recovery:

    ┌─────────────────────────────────────────────────────────────────┐
    │                       ARIES RECOVERY                            │
    └─────────────────────────────────────────────────────────────────┘
                                   │
    ┌──────────────────────────────┼──────────────────────────────────┐
    │                              │                                  │
    ▼                              ▼                                  ▼
┌──────────┐                 ┌──────────┐                      ┌──────────┐
│ ANALYSIS │                 │   REDO   │                      │   UNDO   │
│  PHASE   │                 │  PHASE   │                      │  PHASE   │
├──────────┤                 ├──────────┤                      ├──────────┤
│ Scan log │                 │ Repeat   │                      │ Rollback │
│ forward  │                 │ history: │                      │ loser    │
│ from     │                 │ redo all │                      │ trans-   │
│ last     │────────────────▶│ logged   │─────────────────────▶│ actions  │
│ check-   │                 │ actions  │                      │ using    │
│ point    │                 │ since    │                      │ log      │
│          │                 │ redo     │                      │          │
│ Build:   │                 │ point    │                      │ Write    │
│ • ATT    │                 │          │                      │ CLRs     │
│ • DPT    │                 │ Compare  │                      │          │
│          │                 │ LSNs     │                      │          │
└──────────┘                 └──────────┘                      └──────────┘
      │                            │                                 │
      │                            │                                 │
      └────────────────────────────┴─────────────────────────────────┘
                                   │
                                   ▼
                        Database Consistent!

Phase Purposes:

Phase	Direction	Output	Purpose
Analysis	Forward from checkpoint	ATT, DPT, Redo point	Determine what needs redo/undo
Redo	Forward from redo point	Restored pages	Repeat history to crash-point state
Undo	Backward through loser transactions	CLRs, Clean pages	Remove uncommitted modifications

Why 'Repeat History'?

ARIES redoes everything—even uncommitted transactions—because this restores the exact pre-crash page states. If pages aren't in their pre-crash state, the undo phase might try to reverse changes that weren't actually on the page. Repeating history ensures correctness of subsequent undo.

Physical, Logical, and Physiological Logging

Recovery algorithms differ in what information they log. This choice affects log size, recovery complexity, and concurrency:

Logging Granularity Options:

Logging Granularity Comparison
Type	What's Logged	Log Size	Redo/Undo Complexity	Concurrency
Physical	Exact byte changes to specific disk locations	Large	Simple (byte copy)	Low (conflicts at byte level)
Logical	High-level operation (e.g., 'INSERT row X')	Small	Complex (re-execute operation)	High (operation-level conflicts)
Physiological	Operation on specific page (physical location + logical operation)	Medium	Moderate	High (ARIES approach)

Physical Logging:

Log Record: "At page 42, offset 128, replace bytes [A,B,C,D] with [W,X,Y,Z]"

Exact byte location and content
Simple redo/undo: just copy bytes
Large logs: every byte change recorded
Parallelism limited: concurrent changes to same page conflict

Logical Logging:

Log Record: "INSERT into Employee table: (id=42, name='Alice', dept=5)"

High-level operation
Compact logs: just operation and parameters
Complex redo: must re-execute the insert (find free space, update indexes...)
Concurrent logical operations can be serialized differently

Physiological Logging (ARIES Approach):

Log Record: "On page 42: INSERT row at slot 7, content = [...]"

Physical location (page 42) + logical operation (insert at slot)
Middle ground: more compact than physical, simpler than pure logical
Redo is 'local' to a single page (doesn't need to find the page)
ARIES uses this to enable efficient parallel redo

Physiological: Best of Both Worlds

Physiological logging combines physical page identification (enabling parallel redo across pages) with logical operations (enabling concurrent modifications within pages). This is why ARIES achieves both correctness and high performance.

Recovery Algorithm Comparison

Let's bring together all the recovery algorithms into a comprehensive comparison:

Comprehensive Recovery Algorithm Comparison
Characteristic	Deferred Update	Immediate Update	Shadow Paging	ARIES
Buffer Policy	NO-STEAL	STEAL	N/A (copy-on-write)	STEAL
Flush Policy	FORCE	Varies	N/A	NO-FORCE
Undo Required?	No	Yes	No	Yes
Redo Required?	Yes	Yes	No	Yes
Log Content	After images	Before + after	None (or minimal)	Physiological
Commit Speed	Slow (FORCE)	Fast (NO-FORCE)	Fast (pointer swap)	Fast (NO-FORCE)
Memory Requirement	High (hold all changes)	Low	Medium	Low
Recovery Complexity	Simple (redo only)	Complex (undo + redo)	Simple (no redo/undo)	Complex (3 phases)
Concurrency Support	Limited	Good	Difficult	Excellent
Industry Adoption	Rare (special cases)	Moderate	Niche (SQLite, LMDB)	Dominant (PostgreSQL, MySQL, Oracle, SQL Server)

Why ARIES Dominates:

ARIES became the industry standard because it achieves the best combination of:

Maximum buffer manager flexibility — STEAL/NO-FORCE allows optimal buffer utilization
Fast commits — NO-FORCE means commit only requires log force, not data page force
Large transaction support — STEAL allows arbitrary transaction size
High concurrency — Physiological logging and fine-grained locking
Recovery restartability — CLRs ensure recovery survives crashes during recovery
Theoretical correctness — Proven correct in academic literature
Battle-tested implementation — Decades of production experience

The Complexity Cost:

ARIES's sophistication comes with implementation complexity. The algorithm has many subtle details:

Checkpoint and restart logic
CLR generation and processing
LSN management and comparison
Fuzzy checkpoint coordination
Parallel redo optimization

This complexity is why database recovery code is among the most carefully reviewed and tested in any software system.

ARIES Variations

While ARIES is the conceptual foundation, each database implements its own variation. PostgreSQL, MySQL/InnoDB, Oracle, and SQL Server all have ARIES-inspired recovery systems with vendor-specific optimizations and extensions. Understanding core ARIES translates well to understanding any of them.

Summary: The Recovery Algorithm Landscape

Recovery algorithms represent the practical realization of ACID guarantees. From simple deferred update to sophisticated ARIES, the evolution reflects the industry's drive toward higher performance without sacrificing correctness. Let's consolidate the key insights:

Key Takeaways

•Buffer and flush policies determine recovery needs — NO-STEAL/FORCE is simple but slow; STEAL/NO-FORCE is fast but requires both undo and redo.
•Deferred update trades performance for simplicity — Good for learning, limited in production due to memory constraints.
•Immediate update enables high performance — STEAL allows flexible buffer management; requires sophisticated recovery.
•Shadow paging avoids logging entirely — Elegant but struggles with concurrency; niche use cases.
•ARIES is the industry standard — Three-phase recovery (Analysis, Redo, Undo) with physiological logging and CLRs.
•Logging granularity affects performance — Physiological (page + operation) balances log size with redo simplicity.
•All correct algorithms achieve atomicity and durability — They differ in how they achieve these properties, not whether.

What's Next:

With a broad understanding of recovery algorithms, we're ready to explore how ACID properties and recovery are fundamentally interconnected. The next page examines how each ACID property—Atomicity, Consistency, Isolation, and Durability—depends on recovery mechanisms, completing our conceptual foundation for the detailed algorithm chapters ahead.

Page Complete

You now have a comprehensive mental map of recovery algorithm families—their designs, tradeoffs, and use cases. This overview prepares you to dive deep into ARIES and other specific algorithms, understanding not just how they work but why they're designed as they are.

4 / 5

Loading learning content...

Database Management SystemsRecovery Concepts

Recovery Concepts

LevelIntermediate

Duration75 mins

TopicRecovery Concepts

4 / 5

Recovery Algorithms Overview

Algorithms That Resurrect Databases

What You Will Learn

Classifying Recovery Algorithms

Recovery algorithms can be classified along several dimensions. Understanding these dimensions clarifies why different algorithms exist and which scenarios each addresses best:

Key Classification Dimensions:

Recovery Algorithm Classification Dimensions
Dimension	Options	Tradeoff
Update Timing	Deferred update vs Immediate update	Simpler recovery vs Better performance
Buffer Policy	NO-STEAL vs STEAL	Simpler undo vs Buffer flexibility
Flush Policy	FORCE vs NO-FORCE	Simpler redo vs Faster commits
Log Content	Physical vs Logical vs Physiological	Recovery simplicity vs Log size
Redo Scope	Page-oriented vs Transaction-oriented	Parallelism vs Simplicity

The Fundamental Recovery Requirements:

All recovery algorithms must provide:

Atomicity: Uncommitted transactions must leave no trace after recovery
Durability: Committed transactions must be fully present after recovery
Correctness: The recovered database must be in a consistent state
Restartability: Recovery must succeed even if the system crashes during recovery

Undo and Redo Capabilities:

Buffer/Flush Policy	Requires Undo?	Requires Redo?	Explanation
NO-STEAL / FORCE	No	No	All committed work on disk; no uncommitted work on disk
NO-STEAL / NO-FORCE	No	Yes	Committed work may not be on disk; need redo
STEAL / FORCE	Yes	No	Uncommitted work may be on disk; need undo
STEAL / NO-FORCE	Yes	Yes	Both situations possible; need both

Most modern databases use STEAL/NO-FORCE because it provides maximum flexibility for the buffer manager, even though it requires the most sophisticated recovery.

The Performance Motivation

Deferred Update Recovery (NO-UNDO/REDO)

Deferred Update Algorithm:

Deferred Update Process

•During transaction execution: Record all modifications in the log (with new values only—no old values needed) and in memory buffers
•At commit time: Force log records to stable storage, then write all modified pages to the database
•On abort: Simply discard the in-memory modifications; the database pages were never modified
•On crash recovery: Scan the log forward; redo all committed transactions' modifications; ignore uncommitted transactions (they never touched the database)

Advantages

•No UNDO needed—uncommitted transactions never modify database pages
•Simpler log records—only new values required (no before images)
•Simpler recovery—just redo committed transactions
•Abort is instant—just discard memory

Disadvantages

•All modifications must fit in memory until commit—limits transaction size
•Long-running transactions consume memory throughout their lifetime
•Commit is slow—must write all pages at commit time (FORCE policy)
•Buffer manager cannot evict uncommitted pages—limits buffer flexibility (NO-STEAL)

Where Deferred Update is Used:

Deferred update is rarely used in production OLTP databases due to its limitations. However, it appears in:

In-memory databases: Where 'writing to disk' happens only at checkpoint time
Transaction-isolated analytics: Where long-running analytical transactions build result sets in memory
Simple embedded databases: Where implementation simplicity is valued over performance

The fundamental limitation is that NO-STEAL policy prevents the buffer manager from evicting dirty pages from uncommitted transactions, which can exhaust memory under concurrent load.

Educational Value

Immediate Update Recovery (UNDO/REDO)

Immediate Update Algorithm:

Immediate Update Process

•During transaction execution: Record modifications in the log (both old and new values) and in buffer pages. Pages may be written to disk anytime, following WAL protocol.
•At commit time: Force log records to stable storage. Pages may or may not have been written yet.
•On abort: Use log records to undo all modifications (restore old values)
•On crash recovery: First REDO all committed transactions to ensure durability, then UNDO all uncommitted transactions to ensure atomicity

The UNDO/REDO Challenge:

Immediate update recovery is more complex because:

Pages on disk may contain uncommitted modifications — These must be undone
Pages on disk may be missing committed modifications — These must be redone
Order matters — Redo must happen before undo to establish correct page state
Crash during recovery — Must handle crashes during undo (CLRs needed)

Log Record Requirements:

┌─────────────────────────────────────────────────────────┐
│  Immediate Update Log Record                            │
├─────────────────────────────────────────────────────────┤
│  LSN: Unique log sequence number                        │
│  Transaction ID: Which transaction                       │
│  Previous LSN: Previous record for this transaction     │
│  Page ID: Which page was modified                       │
│  Offset: Where on the page                              │
│  Before Image: Old value (for UNDO)                     │
│  After Image: New value (for REDO)                      │
│  Operation Type: INSERT/UPDATE/DELETE                   │
└─────────────────────────────────────────────────────────┘

Before images enable undo; after images enable redo. Both are required.

Advantages

•Transactions can be arbitrarily large—not limited by memory
•Buffer manager has full flexibility to evict any page
•Commit can be fast (NO-FORCE)—just write log records
•Better memory utilization under concurrent load

Disadvantages

•More complex recovery with both redo and undo phases
•Larger log records (both before and after images)
•Abort is slower—must undo all modifications
•Need CLRs to handle crashes during recovery

The Industry Standard

Shadow Paging (NO-UNDO/NO-REDO)

Shadow Paging Concept:

BEFORE TRANSACTION:

  Page Table (Current)        Data Pages
  ┌─────────────────┐        ┌──────────────┐
  │ Page 0 → Addr A │───────▶│  Original    │ (Address A)
  │ Page 1 → Addr B │───────▶│  Original    │ (Address B)
  │ Page 2 → Addr C │───────▶│  Original    │ (Address C)
  └─────────────────┘        └──────────────┘
        │
        └── Shadow (Copy)
            ┌─────────────────┐
            │ Page 0 → Addr A │
            │ Page 1 → Addr B │  (Identical at start)
            │ Page 2 → Addr C │
            └─────────────────┘

DURING TRANSACTION (modify Page 1):

  Current Page Table         Shadow Page Table      Data Pages
  ┌─────────────────┐       ┌─────────────────┐    ┌──────────────┐
  │ Page 0 → Addr A │       │ Page 0 → Addr A │───▶│  Original    │ (A)
  │ Page 1 → Addr B │       │ Page 1 → Addr D │───▶│  Modified!   │ (D) NEW
  │ Page 2 → Addr C │       │ Page 2 → Addr C │───▶│  Original    │ (C)
  └─────────────────┘       └─────────────────┘    └──────────────┘

COMMIT: Atomically replace Current Page Table pointer with Shadow Page Table
        Now Address D is the official Page 1; Address B can be reclaimed

Shadow Paging Properties

•No redo needed — Modified pages are written to new locations before commit; after commit, they're the official pages
•No undo needed — If transaction aborts, simply discard the shadow page table; original pages are unchanged
•Atomic commit — Commit is a single atomic pointer swap (changing which page table is 'current')
•No logging required — The data pages themselves serve as the recovery mechanism

Advantages

•Extremely simple recovery—no redo or undo logic
•Commit is atomic and fast (pointer swap)
•No WAL overhead during transaction execution
•Natural support for time-travel/versioning

Disadvantages

•Data fragmentation—modified pages scattered randomly
•Garbage collection needed to reclaim old page versions
•Concurrent transactions are complex—separate shadow tables conflicting
•Does not scale well to large numbers of concurrent transactions

Where Shadow Paging is Used:

SQLite: Uses a simplified form of shadow paging (rollback journal) for single-writer transactions
LMDB: Memory-mapped database using copy-on-write B-trees (shadow paging variant)
File system journaling: Many journaling file systems use shadow-like techniques

Copy-on-Write Modern Renaissance

ARIES: The Industry Standard

ARIES Design Principles:

ARIES Core Principles

•Write-Ahead Logging (WAL) — Log records must be forced to stable storage before corresponding data pages are written. This ensures recoverability.
•Repeating History During Redo — During redo, reapply all logged operations, even for uncommitted transactions. This restores the database to its exact pre-crash state.
•Logging Changes During Undo — When undoing operations, write compensation log records (CLRs). This ensures recovery can be restarted if it crashes.

ARIES Three-Phase Recovery:

    ┌─────────────────────────────────────────────────────────────────┐
    │                       ARIES RECOVERY                            │
    └─────────────────────────────────────────────────────────────────┘
                                   │
    ┌──────────────────────────────┼──────────────────────────────────┐
    │                              │                                  │
    ▼                              ▼                                  ▼
┌──────────┐                 ┌──────────┐                      ┌──────────┐
│ ANALYSIS │                 │   REDO   │                      │   UNDO   │
│  PHASE   │                 │  PHASE   │                      │  PHASE   │
├──────────┤                 ├──────────┤                      ├──────────┤
│ Scan log │                 │ Repeat   │                      │ Rollback │
│ forward  │                 │ history: │                      │ loser    │
│ from     │                 │ redo all │                      │ trans-   │
│ last     │────────────────▶│ logged   │─────────────────────▶│ actions  │
│ check-   │                 │ actions  │                      │ using    │
│ point    │                 │ since    │                      │ log      │
│          │                 │ redo     │                      │          │
│ Build:   │                 │ point    │                      │ Write    │
│ • ATT    │                 │          │                      │ CLRs     │
│ • DPT    │                 │ Compare  │                      │          │
│          │                 │ LSNs     │                      │          │
└──────────┘                 └──────────┘                      └──────────┘
      │                            │                                 │
      │                            │                                 │
      └────────────────────────────┴─────────────────────────────────┘
                                   │
                                   ▼
                        Database Consistent!

Phase Purposes:

Phase	Direction	Output	Purpose
Analysis	Forward from checkpoint	ATT, DPT, Redo point	Determine what needs redo/undo
Redo	Forward from redo point	Restored pages	Repeat history to crash-point state
Undo	Backward through loser transactions	CLRs, Clean pages	Remove uncommitted modifications

Why 'Repeat History'?

Physical, Logical, and Physiological Logging

Recovery algorithms differ in what information they log. This choice affects log size, recovery complexity, and concurrency:

Logging Granularity Options:

Logging Granularity Comparison
Type	What's Logged	Log Size	Redo/Undo Complexity	Concurrency
Physical	Exact byte changes to specific disk locations	Large	Simple (byte copy)	Low (conflicts at byte level)
Logical	High-level operation (e.g., 'INSERT row X')	Small	Complex (re-execute operation)	High (operation-level conflicts)
Physiological	Operation on specific page (physical location + logical operation)	Medium	Moderate	High (ARIES approach)

Physical Logging:

Log Record: "At page 42, offset 128, replace bytes [A,B,C,D] with [W,X,Y,Z]"

Exact byte location and content
Simple redo/undo: just copy bytes
Large logs: every byte change recorded
Parallelism limited: concurrent changes to same page conflict

Logical Logging:

Log Record: "INSERT into Employee table: (id=42, name='Alice', dept=5)"

High-level operation
Compact logs: just operation and parameters
Complex redo: must re-execute the insert (find free space, update indexes...)
Concurrent logical operations can be serialized differently

Physiological Logging (ARIES Approach):

Log Record: "On page 42: INSERT row at slot 7, content = [...]"

Physical location (page 42) + logical operation (insert at slot)
Middle ground: more compact than physical, simpler than pure logical
Redo is 'local' to a single page (doesn't need to find the page)
ARIES uses this to enable efficient parallel redo

Physiological: Best of Both Worlds

Recovery Algorithm Comparison

Let's bring together all the recovery algorithms into a comprehensive comparison:

Comprehensive Recovery Algorithm Comparison
Characteristic	Deferred Update	Immediate Update	Shadow Paging	ARIES
Buffer Policy	NO-STEAL	STEAL	N/A (copy-on-write)	STEAL
Flush Policy	FORCE	Varies	N/A	NO-FORCE
Undo Required?	No	Yes	No	Yes
Redo Required?	Yes	Yes	No	Yes
Log Content	After images	Before + after	None (or minimal)	Physiological
Commit Speed	Slow (FORCE)	Fast (NO-FORCE)	Fast (pointer swap)	Fast (NO-FORCE)
Memory Requirement	High (hold all changes)	Low	Medium	Low
Recovery Complexity	Simple (redo only)	Complex (undo + redo)	Simple (no redo/undo)	Complex (3 phases)
Concurrency Support	Limited	Good	Difficult	Excellent
Industry Adoption	Rare (special cases)	Moderate	Niche (SQLite, LMDB)	Dominant (PostgreSQL, MySQL, Oracle, SQL Server)

Why ARIES Dominates:

ARIES became the industry standard because it achieves the best combination of:

Maximum buffer manager flexibility — STEAL/NO-FORCE allows optimal buffer utilization
Fast commits — NO-FORCE means commit only requires log force, not data page force
Large transaction support — STEAL allows arbitrary transaction size
High concurrency — Physiological logging and fine-grained locking
Recovery restartability — CLRs ensure recovery survives crashes during recovery
Theoretical correctness — Proven correct in academic literature
Battle-tested implementation — Decades of production experience

The Complexity Cost:

ARIES's sophistication comes with implementation complexity. The algorithm has many subtle details:

Checkpoint and restart logic
CLR generation and processing
LSN management and comparison
Fuzzy checkpoint coordination
Parallel redo optimization

This complexity is why database recovery code is among the most carefully reviewed and tested in any software system.

ARIES Variations

Summary: The Recovery Algorithm Landscape

Key Takeaways

•Buffer and flush policies determine recovery needs — NO-STEAL/FORCE is simple but slow; STEAL/NO-FORCE is fast but requires both undo and redo.
•Deferred update trades performance for simplicity — Good for learning, limited in production due to memory constraints.
•Immediate update enables high performance — STEAL allows flexible buffer management; requires sophisticated recovery.
•Shadow paging avoids logging entirely — Elegant but struggles with concurrency; niche use cases.
•ARIES is the industry standard — Three-phase recovery (Analysis, Redo, Undo) with physiological logging and CLRs.
•Logging granularity affects performance — Physiological (page + operation) balances log size with redo simplicity.
•All correct algorithms achieve atomicity and durability — They differ in how they achieve these properties, not whether.

What's Next:

Page Complete

4 / 5