Database Management SystemsRecovery Algorithms

Recovery Algorithm Types

LevelAdvanced

Duration90 mins

TopicRecovery Algorithms

4 / 5

ARIES: The Industry Standard

The Algorithm Behind Every Major Database

When your bank processes a transfer, when your favorite app saves your data, when an airline books your seat—chances are extremely high that ARIES is the recovery algorithm ensuring those transactions survive power failures, crashes, and hardware errors.

ARIES—Algorithms for Recovery and Isolation Exploiting Semantics—was developed at IBM Research in the late 1980s and early 1990s by C. Mohan and colleagues. It has become the de facto standard for database recovery, implemented with variations in Oracle, DB2, SQL Server, PostgreSQL, MySQL InnoDB, and virtually every production-grade DBMS.

What You Will Learn

By the end of this page, you will understand ARIES's design philosophy, its three recovery phases (Analysis, Redo, Undo), the key data structures it maintains, and why its 'repeating history' approach provides both flexibility and correctness.

Design Philosophy

ARIES is designed around three fundamental principles that distinguish it from simpler recovery algorithms:

1. Write-Ahead Logging (WAL):

Before any change to a database page is written to disk, the corresponding log record must first be written to stable storage.

This ensures recoverability regardless of when pages are flushed.

2. Repeating History During Redo:

During recovery, ARIES replays ALL logged actions (not just committed ones) to restore the database to its exact pre-crash state before undoing incomplete transactions.

This counterintuitive approach simplifies recovery logic and supports complex features like fine-grained locking.

3. Logging Changes During Undo:

When ARIES undoes an incomplete transaction's actions, it writes Compensation Log Records (CLRs) that record the undo operations.

This makes undo operations idempotent and ensures crash-safe recovery of recovery itself.

ARIES Design Principles
Principle	Policy	Benefit
STEAL	Uncommitted changes may be written to disk	No transaction size limit; flexible buffer management
NO-FORCE	Committed pages needn't be forced to disk	Fast commits (force only log)
Repeat History	Redo all actions, then undo losers	Consistent recovery; supports physiological logging
CLRs for Undo	Log undo actions as they happen	Crash-safe undo; prevents repeated undo work
Fuzzy Checkpoints	Checkpoint without quiescing system	Continuous operation; bounded recovery

Why STEAL/NO-FORCE?

This seemingly strange combination is actually optimal for general-purpose databases:

STEAL allows the buffer manager to evict any page when memory is needed, even if it contains uncommitted changes. This removes transaction size limits.
NO-FORCE means committed transactions don't wait for all their pages to be written—only the log records are forced. This dramatically reduces commit latency.

The cost is requiring both UNDO (for uncommitted changes that reached disk) and REDO (for committed changes that didn't). ARIES handles both efficiently.

The ARIES Philosophy

ARIES prioritizes flexibility and performance during normal operation, accepting additional complexity during recovery. Since systems spend >99.9% of time in normal operation, this trade-off is overwhelmingly positive.

Key Data Structures

ARIES maintains several data structures that enable efficient recovery. Understanding these is essential to understanding how ARIES works.

1. Log Sequence Number (LSN):

Every log record receives a unique, monotonically increasing LSN. LSNs serve as the universal timeline for the entire system:

Log records are ordered by LSN
Each page stores the LSN of the most recent log record that modified it (pageLSN)
During recovery, LSNs determine whether a REDO action is necessary

2. Transaction Table:

Maintains the state of all active transactions:

Transaction ID: Unique identifier
lastLSN: LSN of the most recent log record for this transaction
Status: RUNNING, COMMITTED, ABORTED
UndoNextLSN: For crash recovery, the next LSN to undo

aries_data_structures.pseudo

Pseudocode

// Transaction Table Entry
STRUCT TransactionTableEntry:
    txn_id          : TransactionID
    status          : {RUNNING, COMMITTED, ABORTED}
    lastLSN         : LSN      // Most recent log record for this txn
    undoNextLSN     : LSN      // Next record to undo (during recovery)
 
// Dirty Page Table Entry  
STRUCT DirtyPageTableEntry:
    page_id         : PageID
    recLSN          : LSN      // LSN of first log record that dirtied page
                               // (since last checkpoint or flush)
 
// Log Record Structure
STRUCT LogRecord:
    LSN             : LSN      // This record's unique sequence number
    prevLSN         : LSN      // Previous LSN for same transaction
    txn_id          : TransactionID
    type            : {BEGIN, UPDATE, COMMIT, ABORT, CLR, END, CHECKPOINT}
    page_id         : PageID   // For UPDATE and CLR
    undo_next_LSN   : LSN      // For CLR: next record to undo
    before_image    : BYTES    // For UPDATE: value before change
    after_image     : BYTES    // For UPDATE: value after change
 
// Page Header (maintained on each data page)
STRUCT PageHeader:
    page_id         : PageID
    pageLSN         : LSN      // LSN of last update to this page

3. Dirty Page Table (DPT):

Tracks pages that have been modified in the buffer pool but may not have been flushed to disk:

Page ID: Identifies the dirty page
recLSN (Recovery LSN): The LSN of the first log record that dirtied this page since it was last written to disk

The recLSN is critical: during REDO, we know we can skip log records with LSN < recLSN for a given page (the page was clean at that point).

4. prevLSN Chain:

Each log record contains a prevLSN pointing to the previous log record for the same transaction. This creates per-transaction chains enabling efficient traversal:

T1: BEGIN(100) <-- UPDATE(105) <-- UPDATE(112) <-- COMMIT(118)
         ^             ^              ^              ^
         |             |              |              |
     prevLSN=NULL  prevLSN=100   prevLSN=105   prevLSN=112

During UNDO, we follow these chains backwards to undo transaction operations in reverse order.

The Power of LSNs

LSNs unify time across the entire system. By comparing a page's pageLSN with a log record's LSN, we instantly know if that record's update is already reflected in the page. This comparison is the foundation of ARIES REDO efficiency.

The Three Phases of Recovery

ARIES recovery proceeds through three distinct phases, each with a specific purpose:

Phase 1: ANALYSIS

Scans the log from the last checkpoint to the end, reconstructing:

The transaction table (who was active at crash time)
The dirty page table (which pages might need redo)
The set of transactions that need to be undone

Phase 2: REDO

Scans forward through the log from the minimum recLSN in the dirty page table, replaying ALL actions to restore the database to its exact pre-crash state—including uncommitted changes.

Phase 3: UNDO

Scans backward, undoing the actions of transactions that were active at crash time and never committed.

ARIES Recovery Phases Overview
Phase	Direction	Purpose	Output
Analysis	Forward from checkpoint	Reconstruct system state at crash	Transaction table, Dirty page table, Redo start point
Redo	Forward from min(recLSN)	Repeat history to restore pre-crash state	Database restored to crash-time state
Undo	Backward (per-transaction)	Remove effects of incomplete transactions	Consistent database with only committed data

Why Three Phases?

The separation serves important purposes:

Analysis avoids redundant work—we learn exactly what needs REDO and UNDO before doing any I/O.
REDO restoring exact pre-crash state enables:
- Correct UNDO (we undo from a known, consistent starting point)
- Recovery of uncommitted changes that were written to disk (STEAL)
- Support for physiological logging (operations that depend on current page state)
UNDO as a separate phase allows:
- Parallel UNDO of independent transactions
- CLR generation (crash-safe progress)
- Efficient processing (REDO touched all pages, UNDO benefits from cached pages)

Repeat History: The Non-Obvious Decision

ARIES redoes ALL operations—even those from transactions that will be undone. This seems wasteful but is essential for correctness with physiological logging. If we skipped uncommitted changes during REDO, the page state wouldn't match what the UNDO log records expect.

Analysis Phase in Detail

The Analysis phase reconstructs two critical tables by scanning the log from the last checkpoint to the end. This phase does no I/O to data pages—it only reads the log.

Starting Point:

Read the last checkpoint record from the log
Initialize the transaction table from the checkpoint's saved state
Initialize the dirty page table from the checkpoint's saved state

Processing Each Log Record:

As we scan forward through the log, we update our tables based on each record type:

aries_analysis_phase.pseudo

Pseudocode

PROCEDURE Analysis_Phase():
    // Start from last checkpoint
    checkpoint := find_last_checkpoint_record()
    
    // Initialize tables from checkpoint
    transaction_table := checkpoint.active_transactions
    dirty_page_table := checkpoint.dirty_pages
    
    // Scan log from checkpoint to end
    lsn := checkpoint.LSN
    
    WHILE more_log_records():
        record := read_log_record(lsn)
        
        CASE record.type OF:
        
            BEGIN:
                // New transaction started
                transaction_table.add({
                    txn_id: record.txn_id,
                    status: RUNNING,
                    lastLSN: record.LSN,
                    undoNextLSN: NULL
                })
                
            UPDATE, CLR:
                // Transaction modified a page
                transaction_table[record.txn_id].lastLSN := record.LSN
                
                // Track dirty page
                IF record.page_id NOT IN dirty_page_table THEN
                    dirty_page_table.add({
                        page_id: record.page_id,
                        recLSN: record.LSN  // First modification since flush
                    })
                END IF
                
            COMMIT:
                // Transaction committed
                transaction_table[record.txn_id].status := COMMITTED
                
            ABORT:
                // Transaction aborted (may require undo)
                transaction_table[record.txn_id].status := ABORTED
                
            END:
                // Transaction fully complete (committed or aborted + undone)
                transaction_table.remove(record.txn_id)
                
        END CASE
        
        lsn := next_LSN()
    END WHILE
    
    // Analysis complete. Returns:
    // - transaction_table: All transactions requiring UNDO
    // - dirty_page_table: Defines REDO starting point
    RETURN (transaction_table, dirty_page_table)

After Analysis:

The transaction table now contains:

All transactions that were active at crash time (need UNDO)
Their lastLSN (starting point for backward UNDO traversal)

The dirty page table tells us:

The minimum recLSN across all dirty pages = starting point for REDO
Which pages potentially need REDO (any page in the table)

Analysis Example:

Checkpoint saved: T1 active (lastLSN=100), pages P1(recLSN=95), P2(recLSN=98)

Log after checkpoint:

LSN=110: [UPDATE, T1, P3, ...]    // T1 updates P3
LSN=112: [BEGIN, T2]              // T2 starts
LSN=115: [UPDATE, T2, P1, ...]    // T2 updates P1
LSN=118: [COMMIT, T1]             // T1 commits
LSN=120: [UPDATE, T2, P2, ...]    // T2 updates P2
--- CRASH ---

After Analysis:

Transaction table: T2 (RUNNING, lastLSN=120)
Dirty page table: P1(95), P2(98), P3(110)
REDO starts at LSN 95 (min recLSN)

Redo Phase in Detail

The REDO phase restores the database to its exact pre-crash state by replaying all logged operations. This is ARIES's 'repeat history' paradigm.

Starting Point: The minimum recLSN from the dirty page table (the earliest point where a page might need updates)

Key Principle: REDO only if necessary. For each UPDATE or CLR record, check if the update is already reflected in the page.

aries_redo_phase.pseudo

Pseudocode

PROCEDURE Redo_Phase(dirty_page_table):
    // Find starting point for REDO
    redo_lsn := MIN(recLSN for each page in dirty_page_table)
    
    // Scan forward from redo_lsn to end of log
    FOR EACH log_record WHERE record.LSN >= redo_lsn:
        
        // Only process UPDATE and CLR records (they modify pages)
        IF record.type NOT IN {UPDATE, CLR} THEN
            CONTINUE
        END IF
        
        // Optimization 1: Skip if page not in dirty page table
        IF record.page_id NOT IN dirty_page_table THEN
            CONTINUE  // Page wasn't dirty, no redo needed
        END IF
        
        // Optimization 2: Skip if record is before page became dirty
        IF record.LSN < dirty_page_table[record.page_id].recLSN THEN
            CONTINUE  // Record preceded the dirty state
        END IF
        
        // Now we must examine the actual page
        page := fetch_page(record.page_id)
        
        // Key check: Is this update already on the page?
        IF page.pageLSN >= record.LSN THEN
            CONTINUE  // Already applied (page was written after this log record)
        END IF
        
        // Apply the redo operation
        apply_redo(record, page)
        
        // Update page's LSN to reflect this operation
        page.pageLSN := record.LSN
        
    END FOR
    
    // Database now reflects pre-crash state (including uncommitted changes)

The pageLSN Check:

The comparison page.pageLSN >= record.LSN is the heart of REDO efficiency:

If pageLSN < record.LSN: The page doesn't have this update → must REDO
If pageLSN >= record.LSN: The page already has this (or later) update → skip

This check allows ARIES to handle:

Pages that were flushed after modification (no redo needed)
Pages that were never flushed (full redo needed)
Partial flushes (some updates present, others missing)

Idempotency:

Each REDO operation is idempotent—applying it multiple times has the same effect as applying it once. Combined with the pageLSN check, this means:

We can crash during REDO and restart safely
The same log record is never applied twice to the same page
Recovery of recovery is correct

Redo Efficiency

In practice, REDO skips most log records: pages flushed during checkpointing already have updates; only pages dirty at crash need work. Hot pages (frequently updated) may need more redo, but they're also likely in the buffer pool from recent activity.

Undo Phase in Detail

The UNDO phase removes the effects of transactions that were active at crash time. After REDO, the database contains both committed and uncommitted changes—UNDO restores consistency by reversing the uncommitted ones.

Key Concept: Compensation Log Records (CLRs)

When ARIES undoes an operation, it writes a CLR that:

Records the undo action (for crash safety)
Contains an undoNextLSN pointing to the next record to undo
Is itself REDO-only (never undone)

CLRs ensure that if we crash during UNDO, we don't redo the same undo work.

aries_undo_phase.pseudo

Pseudocode

PROCEDURE Undo_Phase(transaction_table):
    // Collect all LSNs that need to be undone
    // For each active transaction, we'll undo from its lastLSN backward
    
    undo_list := new MaxHeap()  // Process highest LSN first
    
    FOR EACH txn IN transaction_table WHERE txn.status != COMMITTED:
        undo_list.add(txn.lastLSN)
    END FOR
    
    // Process undo list in descending LSN order
    WHILE undo_list.not_empty():
        lsn := undo_list.pop_max()
        record := read_log_record(lsn)
        
        IF record.type = CLR THEN
            // CLR's undoNextLSN tells us what to undo next
            IF record.undoNextLSN != NULL THEN
                undo_list.add(record.undoNextLSN)
            END IF
            // CLRs are never undone themselves
            
        ELSE IF record.type = UPDATE THEN
            // Undo this update
            page := fetch_page(record.page_id)
            
            // Apply the undo using before-image
            apply_undo(record, page)
            
            // Write CLR to record the undo
            clr := {
                type: CLR,
                txn_id: record.txn_id,
                page_id: record.page_id,
                LSN: next_LSN(),
                undoNextLSN: record.prevLSN,  // Points to next record to undo
                redo_info: record.before_image  // CLR redoes the undo
            }
            write_log(clr)
            page.pageLSN := clr.LSN
            
            // Queue the previous operation for undo
            IF record.prevLSN != NULL THEN
                undo_list.add(record.prevLSN)
            END IF
            
        END IF
    END WHILE
    
    // Write END records for all undone transactions
    FOR EACH txn IN transaction_table WHERE txn.status != COMMITTED:
        write_log({type: END, txn_id: txn.txn_id})
    END FOR

The undoNextLSN Chain:

CLRs contain an undoNextLSN field that creates a 'skip chain' for efficient traversal:

Original: UPDATE(100) <-- UPDATE(110) <-- UPDATE(120)
                  ^           |              |
                  |           v              |
                  +----  CLR(125) ----> prevLSN=100
                             |          undoNextLSN=100
                             v
                        CLR(130) ----> undoNextLSN=100
                             |
                             v
               Undo of 100: CLR(135) ----> undoNextLSN=NULL

If we crash after writing CLR(125), recovery's UNDO phase:

Sees CLR(125) during log scan
Follows undoNextLSN to 100 (skipping already-undone 110)
Undoes record 100
Transaction fully undone

Why Process Highest LSN First?

Using a max-heap ensures we undo operations in reverse chronological order:

Operations are undone in the reverse order they were done
Multiple transactions are interleaved correctly
The database returns to a consistent state

Crash-Safe Undo

CLRs make UNDO restartable. If we crash during UNDO, recovery's REDO phase replays the CLRs (applying the undo operations), then UNDO continues from where it left off using undoNextLSN pointers.

Fuzzy Checkpointing

Checkpoints bound recovery time by saving system state periodically. ARIES uses fuzzy checkpoints that don't require stopping the system.

The Checkpointing Process:

Write a BEGIN_CHECKPOINT record
Write the current transaction table to the log
Write the current dirty page table to the log
Write an END_CHECKPOINT record with pointer to BEGIN
Update the master record to point to this checkpoint

No quiescing is required. Transactions continue throughout.

Why 'Fuzzy'?

Between BEGIN_CHECKPOINT and END_CHECKPOINT:

New transactions may start
Old transactions may commit
Pages may be modified or flushed

The checkpoint captures a 'fuzzy' snapshot—not a precise moment, but a valid starting point for recovery. Analysis will correct any discrepancies by scanning beyond the checkpoint.

aries_checkpoint.pseudo

Pseudocode

PROCEDURE Fuzzy_Checkpoint():
    // Step 1: Write begin marker
    begin_lsn := write_log({type: BEGIN_CHECKPOINT})
    
    // Step 2: Capture current state (snapshot, may be slightly stale)
    txn_table_copy := copy(transaction_table)
    dpt_copy := copy(dirty_page_table)
    
    // Step 3: Write state to log
    write_log({
        type: END_CHECKPOINT,
        begin_lsn: begin_lsn,
        transaction_table: txn_table_copy,
        dirty_page_table: dpt_copy
    })
    
    // Step 4: Update master record to point to new checkpoint
    // This is the last action - if we crash before this,
    // the previous checkpoint is still valid
    update_master_record(begin_lsn)
    
    // No transactions were blocked during this process
 
 
// The master record stores the LSN of the last valid checkpoint
// On restart, recovery reads the master record first

Checkpoint Impact on Recovery:

Without Checkpoints	With Checkpoints
Analysis scans entire log	Analysis starts at checkpoint
REDO starts at log beginning	REDO starts at min(recLSN)
UNDO processes all uncommitted	UNDO unchanged
Recovery O(log size)	Recovery O(checkpoint interval)

Checkpoint Frequency Trade-off:

Frequent checkpoints → shorter recovery time
Frequent checkpoints → more overhead during operation
Typical interval: seconds to minutes depending on workload and recovery time requirements

Dirty Page Writeback

Some systems combine checkpointing with background dirty page writeback. Writing dirty pages to disk allows removing them from the DPT, which advances the REDO starting point and reduces recovery work. This is separate from the checkpoint record itself.

Summary: ARIES

ARIES represents the pinnacle of database recovery algorithm design. Let's consolidate the key concepts:

Key Takeaways

•STEAL/NO-FORCE: Enables unlimited transaction sizes and fast commits, at the cost of requiring UNDO + REDO
•WAL is mandatory: Log records must reach disk before corresponding data changes—the foundation of recoverability
•Three phases: Analysis (reconstruct state), REDO (repeat history), UNDO (remove uncommitted)
•LSN as universal timeline: pageLSN vs record LSN determines whether REDO is needed
•Repeat History: REDO all operations (even for losers) to establish consistent starting point for UNDO
•CLRs make UNDO crash-safe: Compensation records ensure no operation is undone twice
•Fuzzy checkpoints: Bound recovery time without stopping the system
•Industry standard: Implemented in Oracle, SQL Server, PostgreSQL, MySQL InnoDB, DB2, and more

What's Next:

In the final page of this module, we'll directly compare all the recovery algorithms we've studied—deferred update, immediate update, shadow paging, and ARIES. We'll create a decision framework for choosing among them and discuss how these concepts apply in modern distributed systems.

Page Complete

You now understand ARIES—the industry-standard recovery algorithm. You can explain its three phases, the role of LSNs and data structures, the 'repeat history' philosophy, and how fuzzy checkpoints bound recovery time. This knowledge applies directly to understanding any production DBMS's recovery behavior.

4 / 5

Loading learning content...

Database Management SystemsRecovery Algorithms

Recovery Algorithm Types

LevelAdvanced

Duration90 mins

TopicRecovery Algorithms

4 / 5

ARIES: The Industry Standard

The Algorithm Behind Every Major Database

What You Will Learn

Design Philosophy

ARIES is designed around three fundamental principles that distinguish it from simpler recovery algorithms:

1. Write-Ahead Logging (WAL):

Before any change to a database page is written to disk, the corresponding log record must first be written to stable storage.

This ensures recoverability regardless of when pages are flushed.

2. Repeating History During Redo:

During recovery, ARIES replays ALL logged actions (not just committed ones) to restore the database to its exact pre-crash state before undoing incomplete transactions.

This counterintuitive approach simplifies recovery logic and supports complex features like fine-grained locking.

3. Logging Changes During Undo:

When ARIES undoes an incomplete transaction's actions, it writes Compensation Log Records (CLRs) that record the undo operations.

This makes undo operations idempotent and ensures crash-safe recovery of recovery itself.

ARIES Design Principles
Principle	Policy	Benefit
STEAL	Uncommitted changes may be written to disk	No transaction size limit; flexible buffer management
NO-FORCE	Committed pages needn't be forced to disk	Fast commits (force only log)
Repeat History	Redo all actions, then undo losers	Consistent recovery; supports physiological logging
CLRs for Undo	Log undo actions as they happen	Crash-safe undo; prevents repeated undo work
Fuzzy Checkpoints	Checkpoint without quiescing system	Continuous operation; bounded recovery

Why STEAL/NO-FORCE?

This seemingly strange combination is actually optimal for general-purpose databases:

STEAL allows the buffer manager to evict any page when memory is needed, even if it contains uncommitted changes. This removes transaction size limits.
NO-FORCE means committed transactions don't wait for all their pages to be written—only the log records are forced. This dramatically reduces commit latency.

The cost is requiring both UNDO (for uncommitted changes that reached disk) and REDO (for committed changes that didn't). ARIES handles both efficiently.

The ARIES Philosophy

Key Data Structures

ARIES maintains several data structures that enable efficient recovery. Understanding these is essential to understanding how ARIES works.

1. Log Sequence Number (LSN):

Every log record receives a unique, monotonically increasing LSN. LSNs serve as the universal timeline for the entire system:

Log records are ordered by LSN
Each page stores the LSN of the most recent log record that modified it (pageLSN)
During recovery, LSNs determine whether a REDO action is necessary

2. Transaction Table:

Maintains the state of all active transactions:

Transaction ID: Unique identifier
lastLSN: LSN of the most recent log record for this transaction
Status: RUNNING, COMMITTED, ABORTED
UndoNextLSN: For crash recovery, the next LSN to undo

aries_data_structures.pseudo

Pseudocode

// Transaction Table Entry
STRUCT TransactionTableEntry:
    txn_id          : TransactionID
    status          : {RUNNING, COMMITTED, ABORTED}
    lastLSN         : LSN      // Most recent log record for this txn
    undoNextLSN     : LSN      // Next record to undo (during recovery)
 
// Dirty Page Table Entry  
STRUCT DirtyPageTableEntry:
    page_id         : PageID
    recLSN          : LSN      // LSN of first log record that dirtied page
                               // (since last checkpoint or flush)
 
// Log Record Structure
STRUCT LogRecord:
    LSN             : LSN      // This record's unique sequence number
    prevLSN         : LSN      // Previous LSN for same transaction
    txn_id          : TransactionID
    type            : {BEGIN, UPDATE, COMMIT, ABORT, CLR, END, CHECKPOINT}
    page_id         : PageID   // For UPDATE and CLR
    undo_next_LSN   : LSN      // For CLR: next record to undo
    before_image    : BYTES    // For UPDATE: value before change
    after_image     : BYTES    // For UPDATE: value after change
 
// Page Header (maintained on each data page)
STRUCT PageHeader:
    page_id         : PageID
    pageLSN         : LSN      // LSN of last update to this page

3. Dirty Page Table (DPT):

Tracks pages that have been modified in the buffer pool but may not have been flushed to disk:

Page ID: Identifies the dirty page
recLSN (Recovery LSN): The LSN of the first log record that dirtied this page since it was last written to disk

The recLSN is critical: during REDO, we know we can skip log records with LSN < recLSN for a given page (the page was clean at that point).

4. prevLSN Chain:

Each log record contains a prevLSN pointing to the previous log record for the same transaction. This creates per-transaction chains enabling efficient traversal:

T1: BEGIN(100) <-- UPDATE(105) <-- UPDATE(112) <-- COMMIT(118)
         ^             ^              ^              ^
         |             |              |              |
     prevLSN=NULL  prevLSN=100   prevLSN=105   prevLSN=112

During UNDO, we follow these chains backwards to undo transaction operations in reverse order.

The Power of LSNs

The Three Phases of Recovery

ARIES recovery proceeds through three distinct phases, each with a specific purpose:

Phase 1: ANALYSIS

Scans the log from the last checkpoint to the end, reconstructing:

The transaction table (who was active at crash time)
The dirty page table (which pages might need redo)
The set of transactions that need to be undone

Phase 2: REDO

Scans forward through the log from the minimum recLSN in the dirty page table, replaying ALL actions to restore the database to its exact pre-crash state—including uncommitted changes.

Phase 3: UNDO

Scans backward, undoing the actions of transactions that were active at crash time and never committed.

ARIES Recovery Phases Overview
Phase	Direction	Purpose	Output
Analysis	Forward from checkpoint	Reconstruct system state at crash	Transaction table, Dirty page table, Redo start point
Redo	Forward from min(recLSN)	Repeat history to restore pre-crash state	Database restored to crash-time state
Undo	Backward (per-transaction)	Remove effects of incomplete transactions	Consistent database with only committed data

Why Three Phases?

The separation serves important purposes:

Analysis avoids redundant work—we learn exactly what needs REDO and UNDO before doing any I/O.
REDO restoring exact pre-crash state enables:
- Correct UNDO (we undo from a known, consistent starting point)
- Recovery of uncommitted changes that were written to disk (STEAL)
- Support for physiological logging (operations that depend on current page state)
UNDO as a separate phase allows:
- Parallel UNDO of independent transactions
- CLR generation (crash-safe progress)
- Efficient processing (REDO touched all pages, UNDO benefits from cached pages)

Repeat History: The Non-Obvious Decision

Analysis Phase in Detail

The Analysis phase reconstructs two critical tables by scanning the log from the last checkpoint to the end. This phase does no I/O to data pages—it only reads the log.

Starting Point:

Read the last checkpoint record from the log
Initialize the transaction table from the checkpoint's saved state
Initialize the dirty page table from the checkpoint's saved state

Processing Each Log Record:

As we scan forward through the log, we update our tables based on each record type:

aries_analysis_phase.pseudo

Pseudocode

PROCEDURE Analysis_Phase():
    // Start from last checkpoint
    checkpoint := find_last_checkpoint_record()
    
    // Initialize tables from checkpoint
    transaction_table := checkpoint.active_transactions
    dirty_page_table := checkpoint.dirty_pages
    
    // Scan log from checkpoint to end
    lsn := checkpoint.LSN
    
    WHILE more_log_records():
        record := read_log_record(lsn)
        
        CASE record.type OF:
        
            BEGIN:
                // New transaction started
                transaction_table.add({
                    txn_id: record.txn_id,
                    status: RUNNING,
                    lastLSN: record.LSN,
                    undoNextLSN: NULL
                })
                
            UPDATE, CLR:
                // Transaction modified a page
                transaction_table[record.txn_id].lastLSN := record.LSN
                
                // Track dirty page
                IF record.page_id NOT IN dirty_page_table THEN
                    dirty_page_table.add({
                        page_id: record.page_id,
                        recLSN: record.LSN  // First modification since flush
                    })
                END IF
                
            COMMIT:
                // Transaction committed
                transaction_table[record.txn_id].status := COMMITTED
                
            ABORT:
                // Transaction aborted (may require undo)
                transaction_table[record.txn_id].status := ABORTED
                
            END:
                // Transaction fully complete (committed or aborted + undone)
                transaction_table.remove(record.txn_id)
                
        END CASE
        
        lsn := next_LSN()
    END WHILE
    
    // Analysis complete. Returns:
    // - transaction_table: All transactions requiring UNDO
    // - dirty_page_table: Defines REDO starting point
    RETURN (transaction_table, dirty_page_table)

After Analysis:

The transaction table now contains:

All transactions that were active at crash time (need UNDO)
Their lastLSN (starting point for backward UNDO traversal)

The dirty page table tells us:

The minimum recLSN across all dirty pages = starting point for REDO
Which pages potentially need REDO (any page in the table)

Analysis Example:

Checkpoint saved: T1 active (lastLSN=100), pages P1(recLSN=95), P2(recLSN=98)

Log after checkpoint:

LSN=110: [UPDATE, T1, P3, ...]    // T1 updates P3
LSN=112: [BEGIN, T2]              // T2 starts
LSN=115: [UPDATE, T2, P1, ...]    // T2 updates P1
LSN=118: [COMMIT, T1]             // T1 commits
LSN=120: [UPDATE, T2, P2, ...]    // T2 updates P2
--- CRASH ---

After Analysis:

Transaction table: T2 (RUNNING, lastLSN=120)
Dirty page table: P1(95), P2(98), P3(110)
REDO starts at LSN 95 (min recLSN)

Redo Phase in Detail

The REDO phase restores the database to its exact pre-crash state by replaying all logged operations. This is ARIES's 'repeat history' paradigm.

Starting Point: The minimum recLSN from the dirty page table (the earliest point where a page might need updates)

Key Principle: REDO only if necessary. For each UPDATE or CLR record, check if the update is already reflected in the page.

aries_redo_phase.pseudo

Pseudocode

PROCEDURE Redo_Phase(dirty_page_table):
    // Find starting point for REDO
    redo_lsn := MIN(recLSN for each page in dirty_page_table)
    
    // Scan forward from redo_lsn to end of log
    FOR EACH log_record WHERE record.LSN >= redo_lsn:
        
        // Only process UPDATE and CLR records (they modify pages)
        IF record.type NOT IN {UPDATE, CLR} THEN
            CONTINUE
        END IF
        
        // Optimization 1: Skip if page not in dirty page table
        IF record.page_id NOT IN dirty_page_table THEN
            CONTINUE  // Page wasn't dirty, no redo needed
        END IF
        
        // Optimization 2: Skip if record is before page became dirty
        IF record.LSN < dirty_page_table[record.page_id].recLSN THEN
            CONTINUE  // Record preceded the dirty state
        END IF
        
        // Now we must examine the actual page
        page := fetch_page(record.page_id)
        
        // Key check: Is this update already on the page?
        IF page.pageLSN >= record.LSN THEN
            CONTINUE  // Already applied (page was written after this log record)
        END IF
        
        // Apply the redo operation
        apply_redo(record, page)
        
        // Update page's LSN to reflect this operation
        page.pageLSN := record.LSN
        
    END FOR
    
    // Database now reflects pre-crash state (including uncommitted changes)

The pageLSN Check:

The comparison page.pageLSN >= record.LSN is the heart of REDO efficiency:

If pageLSN < record.LSN: The page doesn't have this update → must REDO
If pageLSN >= record.LSN: The page already has this (or later) update → skip

This check allows ARIES to handle:

Pages that were flushed after modification (no redo needed)
Pages that were never flushed (full redo needed)
Partial flushes (some updates present, others missing)

Idempotency:

Each REDO operation is idempotent—applying it multiple times has the same effect as applying it once. Combined with the pageLSN check, this means:

We can crash during REDO and restart safely
The same log record is never applied twice to the same page
Recovery of recovery is correct

Redo Efficiency

Undo Phase in Detail

Key Concept: Compensation Log Records (CLRs)

When ARIES undoes an operation, it writes a CLR that:

Records the undo action (for crash safety)
Contains an undoNextLSN pointing to the next record to undo
Is itself REDO-only (never undone)

CLRs ensure that if we crash during UNDO, we don't redo the same undo work.

aries_undo_phase.pseudo

Pseudocode

PROCEDURE Undo_Phase(transaction_table):
    // Collect all LSNs that need to be undone
    // For each active transaction, we'll undo from its lastLSN backward
    
    undo_list := new MaxHeap()  // Process highest LSN first
    
    FOR EACH txn IN transaction_table WHERE txn.status != COMMITTED:
        undo_list.add(txn.lastLSN)
    END FOR
    
    // Process undo list in descending LSN order
    WHILE undo_list.not_empty():
        lsn := undo_list.pop_max()
        record := read_log_record(lsn)
        
        IF record.type = CLR THEN
            // CLR's undoNextLSN tells us what to undo next
            IF record.undoNextLSN != NULL THEN
                undo_list.add(record.undoNextLSN)
            END IF
            // CLRs are never undone themselves
            
        ELSE IF record.type = UPDATE THEN
            // Undo this update
            page := fetch_page(record.page_id)
            
            // Apply the undo using before-image
            apply_undo(record, page)
            
            // Write CLR to record the undo
            clr := {
                type: CLR,
                txn_id: record.txn_id,
                page_id: record.page_id,
                LSN: next_LSN(),
                undoNextLSN: record.prevLSN,  // Points to next record to undo
                redo_info: record.before_image  // CLR redoes the undo
            }
            write_log(clr)
            page.pageLSN := clr.LSN
            
            // Queue the previous operation for undo
            IF record.prevLSN != NULL THEN
                undo_list.add(record.prevLSN)
            END IF
            
        END IF
    END WHILE
    
    // Write END records for all undone transactions
    FOR EACH txn IN transaction_table WHERE txn.status != COMMITTED:
        write_log({type: END, txn_id: txn.txn_id})
    END FOR

The undoNextLSN Chain:

CLRs contain an undoNextLSN field that creates a 'skip chain' for efficient traversal:

Original: UPDATE(100) <-- UPDATE(110) <-- UPDATE(120)
                  ^           |              |
                  |           v              |
                  +----  CLR(125) ----> prevLSN=100
                             |          undoNextLSN=100
                             v
                        CLR(130) ----> undoNextLSN=100
                             |
                             v
               Undo of 100: CLR(135) ----> undoNextLSN=NULL

If we crash after writing CLR(125), recovery's UNDO phase:

Sees CLR(125) during log scan
Follows undoNextLSN to 100 (skipping already-undone 110)
Undoes record 100
Transaction fully undone

Why Process Highest LSN First?

Using a max-heap ensures we undo operations in reverse chronological order:

Operations are undone in the reverse order they were done
Multiple transactions are interleaved correctly
The database returns to a consistent state

Crash-Safe Undo

CLRs make UNDO restartable. If we crash during UNDO, recovery's REDO phase replays the CLRs (applying the undo operations), then UNDO continues from where it left off using undoNextLSN pointers.

Fuzzy Checkpointing

Checkpoints bound recovery time by saving system state periodically. ARIES uses fuzzy checkpoints that don't require stopping the system.

The Checkpointing Process:

Write a BEGIN_CHECKPOINT record
Write the current transaction table to the log
Write the current dirty page table to the log
Write an END_CHECKPOINT record with pointer to BEGIN
Update the master record to point to this checkpoint

No quiescing is required. Transactions continue throughout.

Why 'Fuzzy'?

Between BEGIN_CHECKPOINT and END_CHECKPOINT:

New transactions may start
Old transactions may commit
Pages may be modified or flushed

The checkpoint captures a 'fuzzy' snapshot—not a precise moment, but a valid starting point for recovery. Analysis will correct any discrepancies by scanning beyond the checkpoint.

aries_checkpoint.pseudo

Pseudocode

PROCEDURE Fuzzy_Checkpoint():
    // Step 1: Write begin marker
    begin_lsn := write_log({type: BEGIN_CHECKPOINT})
    
    // Step 2: Capture current state (snapshot, may be slightly stale)
    txn_table_copy := copy(transaction_table)
    dpt_copy := copy(dirty_page_table)
    
    // Step 3: Write state to log
    write_log({
        type: END_CHECKPOINT,
        begin_lsn: begin_lsn,
        transaction_table: txn_table_copy,
        dirty_page_table: dpt_copy
    })
    
    // Step 4: Update master record to point to new checkpoint
    // This is the last action - if we crash before this,
    // the previous checkpoint is still valid
    update_master_record(begin_lsn)
    
    // No transactions were blocked during this process
 
 
// The master record stores the LSN of the last valid checkpoint
// On restart, recovery reads the master record first

Checkpoint Impact on Recovery:

Without Checkpoints	With Checkpoints
Analysis scans entire log	Analysis starts at checkpoint
REDO starts at log beginning	REDO starts at min(recLSN)
UNDO processes all uncommitted	UNDO unchanged
Recovery O(log size)	Recovery O(checkpoint interval)

Checkpoint Frequency Trade-off:

Frequent checkpoints → shorter recovery time
Frequent checkpoints → more overhead during operation
Typical interval: seconds to minutes depending on workload and recovery time requirements

Dirty Page Writeback

Summary: ARIES

ARIES represents the pinnacle of database recovery algorithm design. Let's consolidate the key concepts:

Key Takeaways

•STEAL/NO-FORCE: Enables unlimited transaction sizes and fast commits, at the cost of requiring UNDO + REDO
•WAL is mandatory: Log records must reach disk before corresponding data changes—the foundation of recoverability
•Three phases: Analysis (reconstruct state), REDO (repeat history), UNDO (remove uncommitted)
•LSN as universal timeline: pageLSN vs record LSN determines whether REDO is needed
•Repeat History: REDO all operations (even for losers) to establish consistent starting point for UNDO
•CLRs make UNDO crash-safe: Compensation records ensure no operation is undone twice
•Fuzzy checkpoints: Bound recovery time without stopping the system
•Industry standard: Implemented in Oracle, SQL Server, PostgreSQL, MySQL InnoDB, DB2, and more

What's Next:

Page Complete

4 / 5