Every second, databases around the world commit millions of transactions. Users expect their data to persist forever, yet systems crash, power fails, and disks corrupt. Something must stand between chaos and data loss—that something is the Recovery Manager.
The recovery manager is the unsung hero of database systems. It operates silently during normal execution, ensuring every committed transaction is safely recorded. When disaster strikes, it springs into action, meticulously reconstructing the database to a consistent state.
Understanding the recovery manager means understanding how databases deliver on their most fundamental promise: your committed data will survive.
By the end of this page, you will understand the recovery manager's architecture, responsibilities, and integration with other database components. You'll comprehend how it coordinates logging, checkpointing, and recovery operations, and appreciate why its design decisions profoundly impact both normal operation performance and recovery time after failures.
The recovery manager is a specialized component of the database management system responsible for ensuring the atomicity and durability properties of transactions. Let's establish its definition and scope:
Definition: The Recovery Manager is the DBMS component that implements transaction atomicity and durability by managing logging, coordinating with the buffer manager, and executing recovery procedures after failures.
The recovery manager's mission can be summarized as two complementary goals: guarantee that every committed transaction's effects survive any failure (durability), and guarantee that no effects of aborted or incomplete transactions persist (atomicity).
The Recovery Manager's Position in the DBMS:
┌──────────────────────────────────────────────────────────────────┐
│ Application / SQL Engine │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Transaction Manager │
│ Coordinates begin, commit, abort decisions │
└───────────────────────────────┬──────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ Lock Manager │ │ Recovery Manager│ │ Concurrency Control│
│ (Isolation) │ │ (A + D in ACID) │ │ (Serialization) │
└─────────────────┘ └────────┬────────┘ └─────────────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ Log Manager │ │ Buffer Manager │ │ Storage Manager │
│ (Write log file)│ │ (Page caching) │ │ (Disk I/O) │
└─────────────────┘ └─────────────────┘ └─────────────────────┘
The recovery manager is central—it coordinates with the transaction manager (for commit/abort decisions), the buffer manager (for page writes), and the log manager (for log persistence).
The recovery manager implements both 'A' (Atomicity) and 'D' (Durability) of ACID. Atomicity means all-or-nothing: if a transaction aborts, all its effects must be undone. Durability means committed effects persist. These two properties require coordinated mechanisms—logging supports both.
The recovery manager orchestrates multiple critical operations. Each responsibility requires careful design to balance correctness and performance:
Log Record Anatomy:
A typical log record contains:
| Field | Purpose |
|---|---|
| LSN (Log Sequence Number) | Unique, monotonically increasing identifier |
| Transaction ID | Which transaction generated this record |
| Previous LSN | Previous log record for this transaction (linked list) |
| Page ID | Which page was modified (for redo/undo) |
| Offset | Position within the page |
| Before Image | Data before modification (for undo) |
| After Image | Data after modification (for redo) |
| Operation Type | Insert, update, delete, commit, abort, CLR, etc. |
This structure enables both forward (redo) and backward (undo) traversal, supporting all recovery scenarios.
The Log Sequence Number (LSN) provides a total ordering of all database modifications. Every page stores the LSN of the last modification applied to it. During recovery, comparing page LSNs with log record LSNs determines what needs to be redone—if pageLSN < logRecordLSN, the modification must be reapplied.
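The pageLSN comparison can be sketched in a few lines of Python; the field names here are illustrative, not any particular system's on-disk format:

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int        # Log Sequence Number: unique, monotonically increasing
    txn_id: int     # transaction that generated this record
    prev_lsn: int   # previous record for this transaction (backward chain)
    page_id: int    # page modified
    before: bytes   # before image (for undo)
    after: bytes    # after image (for redo)

def needs_redo(page_lsn: int, record: LogRecord) -> bool:
    """A logged modification must be reapplied only if the page does not
    already reflect it: pageLSN < logRecordLSN."""
    return page_lsn < record.lsn
```

If the page's stored LSN is 95 and the log record's LSN is 100, the page missed that update and redo reapplies it; if the page LSN is 100 or greater, the update is already there and redo skips it.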
The recovery manager and buffer manager have an intimate, carefully choreographed relationship. The buffer manager caches database pages in memory for performance, but this caching creates recovery challenges:
The Tension: The buffer manager wants to write pages whenever it is convenient, to batch I/O and free memory, while recovery needs guarantees about what is on disk. A dirty page flushed before commit puts uncommitted data on disk; a page held in memory past commit means committed data could vanish in a crash.
The WAL Solution:
Write-Ahead Logging resolves this tension elegantly. The buffer manager can write dirty pages whenever convenient for performance, as long as it follows the WAL protocol:
WAL Rule: Before writing any dirty page to disk, all log records describing modifications to that page must already be on stable storage.
This ensures that if a crash occurs during or after a page write, the log contains enough information to redo the modification if needed.
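A minimal sketch of how a buffer manager might enforce the WAL rule before a page write; the class and method names are hypothetical:

```python
class LogManager:
    def __init__(self):
        self.flushed_lsn = 0   # highest LSN durably on stable storage

    def flush_to(self, lsn: int) -> None:
        # Force all log records up to `lsn` to disk (an fsync in a real system).
        if lsn > self.flushed_lsn:
            self.flushed_lsn = lsn

class BufferManager:
    def __init__(self, log: LogManager):
        self.log = log

    def write_page(self, page_id: int, page_lsn: int) -> None:
        # WAL rule: the log describing this page's latest change must be
        # durable before the page itself reaches disk.
        if self.log.flushed_lsn < page_lsn:
            self.log.flush_to(page_lsn)
        # ... now safe to issue the disk write for page_id ...
```

The check is cheap when the log is already flushed far enough, which is the common case; only an eviction racing ahead of the log forces a synchronous log flush.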
| Policy | Description | Recovery Impact |
|---|---|---|
| STEAL | Uncommitted pages can be written to disk | Requires UNDO capability—must be able to reverse uncommitted changes |
| NO-STEAL | Only committed pages can be written to disk | No UNDO needed, but limits buffer manager flexibility |
| FORCE | All modified pages written at commit | No REDO needed, but commit latency increases significantly |
| NO-FORCE | Pages may remain dirty after commit | Requires REDO capability—must be able to reapply committed changes |
Modern Databases: STEAL/NO-FORCE
Most production databases use STEAL/NO-FORCE policies because:
STEAL allows flexible buffer management — The buffer manager can evict any page when memory is needed, even if it contains uncommitted modifications. This prevents uncommitted transactions from monopolizing memory.
NO-FORCE enables fast commits — Transactions commit when their log records are durable, without waiting for all modified pages to be written. This dramatically reduces commit latency.
The cost of STEAL/NO-FORCE is that recovery must handle both REDO (committed work not yet on data pages) and UNDO (uncommitted work that reached data pages). ARIES, the dominant recovery algorithm, was designed specifically for this policy combination.
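The NO-FORCE commit path can be sketched as follows; the only forced I/O is the log flush. The `Log` and `Txn` classes are toy stand-ins, not a real system's API:

```python
class Log:
    def __init__(self):
        self.records, self.flushed_lsn = [], 0
    def append(self, rec) -> int:
        self.records.append(rec)
        return len(self.records)                       # LSN = position in log
    def flush_to(self, lsn: int) -> None:
        self.flushed_lsn = max(self.flushed_lsn, lsn)  # fsync in reality

class Txn:
    def __init__(self, tid: int):
        self.id, self.status = tid, "active"

def commit(txn: Txn, log: Log) -> None:
    # NO-FORCE: durability comes from the log, not from forcing data pages.
    commit_lsn = log.append(("COMMIT", txn.id))
    log.flush_to(commit_lsn)   # the only forced write on the commit path
    txn.status = "committed"
```

Dirty data pages stay in the buffer pool and are written back later at the buffer manager's convenience; the commit latency is one sequential log write.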
The Recovery Manager's Contract with the Buffer Manager: no dirty page may reach disk before the log records describing its changes are on stable storage (the WAL rule), and no commit may be acknowledged before the transaction's log records, including its commit record, are durable (the commit rule).
STEAL/NO-FORCE provides maximum performance flexibility but requires the most sophisticated recovery logic. Simpler policies (NO-STEAL/FORCE) would simplify recovery but create unacceptable performance constraints. The recovery manager's complexity is the price of efficient normal operation.
Checkpoints are the recovery manager's mechanism for bounding recovery time. Without checkpoints, recovery would need to replay the entire transaction log from database creation—potentially gigabytes or terabytes of log records.
Checkpoint Purpose:
A checkpoint records a snapshot of the database's state at a point in time:
After a crash, recovery only needs to process log records from the last checkpoint forward—dramatically reducing recovery time.
Fuzzy Checkpoint Details:
Modern databases use fuzzy checkpoints because they impose minimal disruption:
Time ──────────────────────────────────────────────────────────────▶
│ │
▼ ▼
CHECKPOINT_START CHECKPOINT_END
│ │
│ Normal transaction processing │
│ continues throughout │
│ │
│◀────────────────────────────────────────▶│
│ Capture ATT (Active Transaction Table)
│ Capture DPT (Dirty Page Table)
Active Transaction Table (ATT): one entry per in-flight transaction, recording its transaction ID, status, and the LSN of its most recent log record (lastLSN).
Dirty Page Table (DPT): one entry per dirty buffer page, recording its page ID and RecoveryLSN, the LSN of the first log record that dirtied the page since it was last flushed.
The minimum of all RecoveryLSNs in the DPT determines the earliest point from which redo must start—the 'redo point.'
Frequent checkpoints reduce recovery time (less log to replay) but increase normal operation overhead (checkpoint processing). Infrequent checkpoints minimize overhead but extend recovery time. Production systems tune checkpoint intervals based on acceptable recovery time objectives (RTO).
When the database restarts after a failure, the recovery manager executes a systematic recovery process. Most modern databases follow the ARIES algorithm's three-phase approach:
Phase 1: Analysis
Scan forward from the most recent checkpoint to reconstruct the ATT and DPT, determining which transactions must be undone and where redo must begin.
Phase 2: Redo
Scan forward from the redo point, reapplying every logged modification whose effects did not reach disk ("repeating history") to restore the exact crash-time state.
Phase 3: Undo
Scan backward through the log, reversing the operations of all transactions that never committed and writing a CLR for each reversal.
| Phase | Direction | Purpose | Key Operation |
|---|---|---|---|
| Analysis | Forward (from checkpoint) | Determine what needs redo/undo | Reconstruct ATT and DPT |
| Redo | Forward (from redo point) | Restore crash-time state | Reapply log records to pages where pageLSN < logLSN |
| Undo | Backward (uncommitted transactions) | Remove uncommitted work | Reverse operations, write CLRs |
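The three phases above can be sketched as a toy restart routine. The record shapes and page model are simplified assumptions, and the undo phase here only collects the LSNs a real system would reverse and compensate:

```python
def recover(records, checkpoint_att, checkpoint_dpt, pages):
    # Phase 1 -- Analysis: scan forward from the checkpoint to rebuild
    # the Active Transaction Table (ATT) and Dirty Page Table (DPT).
    att = dict(checkpoint_att)              # txn_id -> last LSN
    dpt = dict(checkpoint_dpt)              # page_id -> RecoveryLSN
    for r in records:
        if r["type"] == "update":
            att[r["txn"]] = r["lsn"]
            dpt.setdefault(r["page"], r["lsn"])
        elif r["type"] == "commit":
            att.pop(r["txn"], None)         # winner: nothing to undo

    # Phase 2 -- Redo: repeat history from the redo point, skipping
    # records whose effects already reached the page (pageLSN check).
    start = min(dpt.values(), default=None)
    if start is not None:
        for r in records:
            if r["type"] == "update" and r["lsn"] >= start:
                if pages.get(r["page"], 0) < r["lsn"]:
                    pages[r["page"]] = r["lsn"]   # reapply; stamp pageLSN

    # Phase 3 -- Undo: walk backward over losers' updates (a real system
    # would restore before images and write CLRs at each step).
    undo_lsns = [r["lsn"] for r in reversed(records)
                 if r["type"] == "update" and r["txn"] in att]
    return att, pages, undo_lsns
```

With a log where T1 updates page 1 and commits while T2 updates page 2 and never commits, analysis leaves only T2 in the ATT, redo stamps both pages, and undo lists T2's update for reversal.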
Why Redo Before Undo?
A subtle but critical question: why not undo first? The answer lies in what undo assumes about page state. Undo relies on before images, which are only meaningful if the page actually contains the modification being reversed. If we undid first, pages that were never flushed before the crash would not reflect the modifications we are trying to undo. Redoing first restores every page to its exact crash-time state, making undo safe.
Recovery Time:
Recovery time depends on: the volume of log written since the last checkpoint (analysis and redo scanning), the number of dirty pages at crash time (redo I/O), and the amount of uncommitted work that must be reversed (undo).
Production systems design checkpoint intervals and recovery parallelism to meet Recovery Time Objectives (RTOs)—often single-digit minutes even for large databases.
ARIES's 'repeat history' approach (redoing all logged operations) may seem wasteful—why redo work that was already on disk? But it greatly simplifies the recovery algorithm: after redo, we know the exact state of every page, enabling correct undo. The performance cost is mitigated by smart redo optimization using pageLSN comparisons.
Compensation Log Records are a critical innovation in recovery algorithms. They solve a subtle but dangerous problem: what happens if the system crashes during recovery?
The Problem:
Imagine a transaction T1 made modifications M1, M2, M3 and then crashed before commit. During recovery, the undo phase reverses M3, M2... and then another crash occurs. Without CLRs, the next recovery would need to undo M3 again—but M3 might have already been undone. How do we avoid redoing or re-undoing work on recovery restart?
The CLR Solution:
When we undo an operation, we log the undo as a Compensation Log Record. CLRs have special properties:
CLR Example:
Original log sequence for transaction T1:
LSN 100: T1, Update page P1 (old=A, new=B)
LSN 150: T1, Update page P2 (old=X, new=Y)
LSN 200: T1, Update page P3 (old=M, new=N)
[CRASH - T1 never committed]
During first recovery, undo phase generates CLRs:
LSN 300: CLR for LSN 200 (undid P3: N→M), UndoNxtLSN=150
LSN 350: CLR for LSN 150 (undid P2: Y→X), UndoNxtLSN=100
[CRASH DURING RECOVERY]
During second recovery:
- Analysis finds T1 is uncommitted, last CLR is LSN 350
- Redo replays CLRs 300 and 350 (if not already on pages)
- Undo follows UndoNxtLSN pointer from LSN 350 → LSN 100
- Generates CLR for LSN 100, completes undo
The already-undone operations (LSN 200, 150) are skipped via UndoNxtLSN pointers.
This mechanism guarantees that recovery is idempotent—running recovery multiple times produces the same correct result, regardless of how many crashes occur during recovery.
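The UndoNxtLSN mechanism can be sketched as a small undo loop. The record dictionaries are a simplification; a real CLR would also carry the restored data:

```python
def undo_transaction(records_by_lsn, last_lsn, log_append):
    """Undo one loser transaction. `records_by_lsn` maps LSN -> record;
    each update record has 'prev_lsn', each CLR has 'undo_next_lsn'.
    These shapes are hypothetical, for illustration only."""
    lsn = last_lsn
    while lsn is not None:
        rec = records_by_lsn[lsn]
        if rec["type"] == "clr":
            # Already-undone work: jump straight past it.
            lsn = rec["undo_next_lsn"]
        else:
            # Reverse the update and log a CLR, so a crash mid-undo
            # never repeats this step (recovery stays idempotent).
            clr = {"type": "clr", "undoes": lsn,
                   "undo_next_lsn": rec["prev_lsn"]}
            log_append(clr)
            lsn = rec["prev_lsn"]
```

Run against the example above (undo resuming at the CLR with LSN 350), the loop skips the already-undone updates at LSNs 200 and 150 and emits exactly one new CLR, for LSN 100.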
Real-world systems do crash during recovery. Hardware problems, bugs in recovery code, or resource exhaustion can interrupt recovery. CLRs ensure that each recovery attempt makes forward progress and eventually completes. Without CLRs, recovery could not be resumed safely after interruption.
Implementing a robust recovery manager involves numerous practical challenges beyond the algorithm itself: partial log writes, torn data pages, corrupted log records, inconsistent checkpoints, and hardware-specific I/O behaviors.
Real-World Recovery Manager Examples:
| Database | Recovery System | Notable Features |
|---|---|---|
| PostgreSQL | Write-Ahead Log (WAL) | Physiological logging, streaming replication, archive recovery |
| MySQL InnoDB | Redo/Undo logs | Double-write buffer for torn page protection, purge threads for undo cleanup |
| Oracle | Redo logs + Undo tablespace | Fast-start checkpointing, flashback features, parallel recovery |
| SQL Server | Transaction log + Checkpoint | Indirect checkpoints, accelerated database recovery (ADR) |
Double-Write Buffer (InnoDB):
InnoDB addresses the torn page problem (crash during partial page write) with a double-write buffer. Before writing a page to its actual location, InnoDB first writes it to a contiguous double-write area. If a crash causes a torn page write, recovery can restore the page from the double-write buffer. This provides additional protection beyond standard WAL.
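A minimal sketch of the double-write idea, assuming simple one-file-per-page storage; real InnoDB writes batches of pages to a dedicated double-write area, not per-page files:

```python
import os

def doublewrite(page_path: str, dw_path: str, page: bytes) -> None:
    """Write `page` durably to a scratch location first, then in place.
    If the in-place write is torn by a crash, recovery can restore the
    page from the intact copy at dw_path. Paths are hypothetical."""
    with open(dw_path, "wb") as f:        # 1. write to double-write area
        f.write(page)
        f.flush()
        os.fsync(f.fileno())              # durable before the real write
    with open(page_path, "wb") as f:      # 2. now write the real location
        f.write(page)
        f.flush()
        os.fsync(f.fileno())
```

The cost is writing every page twice, but the double-write area is written sequentially, so the overhead in practice is far less than 2x.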
Production recovery managers contain thousands of lines of carefully audited code addressing edge cases: partial log writes, corrupted log records, inconsistent checkpoints, and hardware-specific behaviors. The core ARIES algorithm is just the beginning—production implementations layer significant additional complexity.
The recovery manager is the guarantor of atomicity and durability—two of the four pillars that make transactional databases trustworthy. Let's consolidate the key insights:
- Write-ahead logging guarantees the log describes every modification before that modification can reach disk.
- The STEAL/NO-FORCE buffer policy maximizes normal-operation performance but requires both redo and undo at recovery.
- Checkpoints bound recovery time by limiting how much log must be processed.
- ARIES recovers in three phases: analysis rebuilds recovery state, redo repeats history, and undo reverses uncommitted work.
- Compensation log records make recovery idempotent, so crashes during recovery are safe.
What's Next:
The recovery manager depends on a critical abstraction: stable storage—storage that survives failures. But what makes storage 'stable'? The next page explores stable storage concepts, including how databases use disk redundancy, replication, and carefully designed I/O patterns to create the reliable storage foundation that recovery depends upon.
You now understand the recovery manager's architecture, responsibilities, and mechanisms. This central component orchestrates the complex dance between performance (keeping data in memory, batching writes) and safety (ensuring recoverability at every moment). Next, we'll explore the stable storage foundation it relies upon.