When a committed transaction's changes haven't reached disk before a crash, those changes are lost from the data files. But they're not lost from the database—because they were logged first. The system can redo those changes, replaying them from the log to restore the committed state.
Redo information is the complement to undo. Where undo provides atomicity (abort leaves no trace), redo provides durability (commit survives any failure). Together, they form the complete recovery capability of WAL.
This page explores redo in depth: what it contains, how it's structured, and how the redo phase of recovery reconstructs committed transactions that the crash interrupted.
You will understand redo information (after-images), the mechanics of the redo phase during recovery, idempotence requirements for redo operations, and how databases achieve durability through log-based replay. This completes your understanding of WAL's recovery guarantees.
Redo information, also called after-images or new values, represents the values that a modification introduces. It provides the ability to recreate any committed change.
Formal Definition:
For any modification M that changes data item X from value V₁ to value V₂, the redo information is the after-image V₂ (together with X's location). Given the log record, the system can set X to V₂ regardless of X's current state.
Contrast with Undo:
| Aspect | Undo Information | Redo Information |
|---|---|---|
| Contains | Before-image (old value) | After-image (new value) |
| Purpose | Reverse uncommitted changes | Replay committed changes |
| Guarantees | Atomicity | Durability |
| When Applied | Transaction abort / Crash recovery undo phase | Crash recovery redo phase |
| Direction | Moves backward in time | Moves forward in time |
What Redo Information Contains:
| Component | Description | Example |
|---|---|---|
| LSN | Log Sequence Number for ordering | 12847 |
| Transaction ID | Which transaction made this change | T1024 |
| Page ID | Which page was modified | Page#4521 |
| Offset | Position within the page | Byte 312 |
| Length | Size of the modified region | 8 bytes |
| After-Image | The new value to apply | 500 |
Notice that redo doesn't strictly need the before-image—only the new value matters for replay. However, most systems store both in a single log record, serving dual purposes.
Even if the data page was completely reinitialized (lost entirely), redo can reconstruct it by replaying all modifications from the log. The log is the source of truth—data pages are merely a cache that can be rebuilt.
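To make the "pages are a cache" idea concrete, here is a minimal TypeScript sketch of rebuilding a lost page purely from its log records. The `LogRecord` and `Page` shapes and the 8 KB page size are illustrative assumptions, not any particular engine's format:

```typescript
interface LogRecord {
  lsn: number;
  pageID: number;
  offset: number;
  afterImage: Uint8Array; // redo needs only the new bytes
}

class Page {
  pageLSN = 0;
  data = new Uint8Array(8192); // assume 8 KB pages

  apply(rec: LogRecord): void {
    this.data.set(rec.afterImage, rec.offset);
    this.pageLSN = rec.lsn; // stamp the last applied LSN
  }
}

// Rebuild a page that was lost entirely: start from a zeroed page
// and replay every logged modification for it, in LSN order.
function rebuildPage(pageID: number, records: LogRecord[]): Page {
  const page = new Page();
  const sorted = [...records].sort((a, b) => a.lsn - b.lsn);
  for (const rec of sorted) {
    page.apply(rec); // idempotent: writing the same bytes twice is harmless
  }
  return page;
}
```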
The redo component of WAL has a specific timing constraint:
The Redo Rule: Before a transaction commits (returns success to the client), all log records for that transaction—including the commit record—must be on stable storage.
This rule enables the NO-FORCE policy:
NO-FORCE Explanation:
When a transaction commits, we don't force its dirty pages to disk. We only force its log records. This is dramatically faster:
A transaction with 100 modified pages might require 100 random I/Os to force data. But its log records are sequential and might fit in a single I/O.
```
function commitTransaction(transaction: Transaction):
    // STEP 1: Create commit record
    commitRecord = new CommitLogRecord(
        LSN: log.nextLSN(),
        TransactionID: transaction.id,
        PrevLSN: transaction.lastLSN,
        CommitTime: now()
    )

    // STEP 2: Append commit record to log buffer
    log.append(commitRecord)

    // STEP 3: Force log to stable storage
    // This is the ONLY I/O barrier at commit time
    log.flushUpTo(commitRecord.LSN)

    // STEP 4: Transaction is now durable!
    // Even if crash happens right now, recovery can redo all changes

    // STEP 5: Release locks (strict 2PL releases at commit)
    lockManager.releaseAll(transaction)

    // STEP 6: Return success to client
    // Data pages may still be dirty in buffer - that's fine
    return COMMIT_SUCCESS

// Note what we did NOT do:
// - We did NOT force dirty pages to disk
// - We did NOT wait for data page writes
// - We ONLY waited for log flush
// This is NO-FORCE: data pages written lazily, log written eagerly
```

Why is NO-FORCE Safe?
Consider what happens if a crash occurs immediately after commit returns: the dirty data pages in the buffer pool are lost, but the log—including the commit record—is already on stable storage. The redo phase of recovery replays those log records and restores every committed change.
The key insight: The log IS the transaction. Data pages are just a performance optimization—a cached materialization of what the log already contains.
Flushing the log for every commit would be expensive. Systems use 'group commit': batch multiple transactions' commit records and flush once. Each transaction waits until the batch flush completes. This dramatically reduces I/O while maintaining full durability.
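As a rough illustration of group commit, here is a minimal TypeScript sketch; the 1 ms batching window and the `appendToBuffer`/`flushToDisk` helpers are assumptions, not a real engine's API. Many transactions commit concurrently, but only one log flush is paid per batch:

```typescript
class GroupCommitLog {
  private waiters: Array<() => void> = []; // commits waiting on the next flush
  private flushScheduled = false;

  // Each committing transaction appends its records, then awaits the flush.
  commit(commitRecord: string): Promise<void> {
    this.appendToBuffer(commitRecord);
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
      if (!this.flushScheduled) {
        this.flushScheduled = true;
        // Wait briefly so concurrent commits can join this batch.
        setTimeout(() => this.flushBatch(), 1);
      }
    });
  }

  private flushBatch(): void {
    const batch = this.waiters;
    this.waiters = [];
    this.flushScheduled = false;
    this.flushToDisk(); // ONE log flush makes the whole batch durable
    for (const wake of batch) wake(); // every transaction in the batch returns
  }

  private appendToBuffer(_record: string): void { /* append to in-memory log */ }
  private flushToDisk(): void { /* fsync the log file (assumed) */ }
}
```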
After a crash, the redo phase replays logged operations to ensure all committed changes are present in the data pages. This is the second phase of ARIES-style recovery (after Analysis):
Redo Phase Goals:
- Ensure every committed change is present in the data pages
- Reconstruct the exact state at the moment of the crash—including changes made by uncommitted transactions
Wait—why redo uncommitted transactions? Because we need to restore the exact crash-time state before undo can work correctly. We 'repeat history' then undo the losers.
```
function recoveryRedoPhase(redoStartLSN: LSN, dirtyPageTable: Map<PageID, RecoveryLSN>):
    print("Starting REDO phase from LSN: " + redoStartLSN)

    // Scan log forward from the redo start point
    currentLSN = redoStartLSN

    while currentLSN < log.endOfLog():
        logRecord = log.read(currentLSN)

        // Only consider records that affect data pages
        if logRecord.type in [UPDATE, CLR, FULL_PAGE_IMAGE]:
            pageID = logRecord.pageID

            // Check if redo is needed using dirty page table
            if pageID in dirtyPageTable:
                recoveryLSN = dirtyPageTable[pageID]

                // Redo only if log LSN >= recovery LSN
                // (modification might not have reached disk)
                if currentLSN >= recoveryLSN:
                    // Fetch the page
                    page = bufferPool.getPage(pageID)

                    // Compare page LSN with log record LSN
                    if page.pageLSN < currentLSN:
                        // THIS is the key redo check
                        // Page LSN < Log LSN means this update was NOT applied
                        // Must apply the redo
                        applyRedo(page, logRecord)
                        page.pageLSN = currentLSN
                        page.markDirty()
                        print("  REDO: Applied LSN " + currentLSN + " to Page " + pageID)
                    else:
                        // Page already contains this or later update
                        // No redo needed
                        print("  SKIP: Page " + pageID + " already has LSN " + page.pageLSN)

        currentLSN = log.nextRecord(currentLSN)

    print("REDO phase complete")

function applyRedo(page: Page, logRecord: LogRecord):
    // Apply the after-image to the page
    page.setData(
        logRecord.offset,
        logRecord.afterImage
    )
```

The Page LSN Check:
The critical optimization in redo is comparing page.pageLSN with log.LSN:
- `page.pageLSN >= log.LSN`: this update already reached disk → skip
- `page.pageLSN < log.LSN`: the update was lost → apply redo

This avoids re-applying modifications that survived the crash, making redo efficient even when most data was already persistent.
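A tiny worked example of the check, with made-up LSNs:

```typescript
// Hypothetical values, for illustration only.
const pageLSN = 12840;   // LSN stamped on the on-disk page
const recordLSN = 12847; // LSN of the log record being replayed

if (pageLSN < recordLSN) {
  // The page predates this record: its update was lost in the crash.
  console.log(`apply redo, then set pageLSN = ${recordLSN}`);
} else {
  // The page already reflects this record (or a later one): skip.
  console.log("skip redo");
}
```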
Why Redo Everything (Including Uncommitted)?
The redo phase doesn't distinguish committed from uncommitted transactions. It replays ALL logged modifications. This 'repeating history' approach has a crucial benefit: undo then operates on the exact state that existed at the moment of the crash, so rollback is always applied to pages in a known condition.
The order is significant: redo first, then undo. Redo reconstructs the exact crash state. Undo then removes effects of losers. If we undid before redoing, we might undo things that were never applied, or miss undoing things that were partially applied.
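To make the ordering explicit, here is a minimal TypeScript sketch of an ARIES-style restart driver; the phase functions are simplified stand-ins for the routines described on this page, not a real implementation:

```typescript
type LSN = number;

interface AnalysisResult {
  redoStartLSN: LSN;              // min RecoveryLSN from the dirty page table
  loserTransactions: Set<string>; // transactions with no commit record
}

// Stub phases standing in for the routines described above.
function analysisPhase(): AnalysisResult {
  // Scan forward from the last checkpoint to rebuild the dirty page
  // table and transaction table (details omitted in this sketch).
  return { redoStartLSN: 0, loserTransactions: new Set<string>() };
}

function redoPhase(_startLSN: LSN): void {
  // Repeat history: replay ALL updates from startLSN, committed or not.
}

function undoPhase(_losers: Set<string>): void {
  // Roll back only the loser transactions, writing CLRs as we go.
}

function restartRecovery(): void {
  const result = analysisPhase();      // 1. where redo starts, who the losers are
  redoPhase(result.redoStartLSN);      // 2. reconstruct the exact crash state
  undoPhase(result.loserTransactions); // 3. only then remove the losers
}
```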
For recovery to be reliable, redo operations must be idempotent—applying the same redo multiple times produces the same result as applying it once.
Why Idempotence Matters:
Consider this scenario: recovery is partway through replaying the log when the system crashes again. On the next restart, recovery begins redo from the start and replays some of the same log records a second time.
Without idempotence, we might apply the same operation twice:
`balance = balance + 100` → applied twice = wrong result!

With idempotence:

`balance = 600` → applied twice = same correct result

```
// NON-IDEMPOTENT: Dangerous for recovery!
// Applying twice gives wrong result

operation_increment:
    balance = balance + 100
    // First apply: balance goes from 500 to 600
    // Second apply: balance goes from 600 to 700 (WRONG!)

// IDEMPOTENT: Safe for recovery
// Applying once or many times gives same result

operation_set:
    balance = 600
    // First apply: balance becomes 600
    // Second apply: balance is still 600 (CORRECT!)

// WAL logging approach ensuring idempotence:
// Instead of logging: "increment balance by 100"
// Log: "set balance to 600" (the final value)

LogRecord {
    LSN: 12345,
    PageID: accounts_page_42,
    Offset: byte_312,
    BeforeImage: 500,  // What it was
    AfterImage: 600    // What it should be (not "add 100")
}

// Physical redo is naturally idempotent:
// "Write bytes 0x00000258 at offset 312"
// Doing this once or multiple times = same final state
```

Achieving Idempotence:
1. Physical Logging (Byte-Level)
Store the actual bytes to write, not the operation:
`{offset: 312, bytes: 0x00000258}` (the value 600)

2. Physiological Logging
Physical to the page, logical within the page:
`{pageID: 42, operation: 'insert row with key 100'}`

3. LSN-Based Skip
Use page LSN to detect already-applied operations:
If `page.pageLSN >= log.LSN`, skip the redo.

Any operation that depends on current state ('add 5', 'append to list', 'increment counter') is non-idempotent and dangerous for recovery. Always log the RESULT, not the DELTA. The log record should contain what the value IS, not what was done to it.
Without checkpoints, redo would need to start from the beginning of the log—potentially processing days or weeks of operations. Checkpoints limit the redo scope:
How Checkpoints Limit Redo: the checkpoint's dirty page table records, for each page, its RecoveryLSN—the first LSN that dirtied it since it was last flushed. Everything older than the minimum RecoveryLSN is guaranteed to be on disk, so redo never needs to look before it.
The Redo Start Point (ARIES RedoLSN):
```
// During checkpoint, we record the dirty page table state
function performCheckpoint():
    // Find all dirty pages in buffer pool
    dirtyPages = bufferPool.getDirtyPages()

    // For each dirty page, record its RecoveryLSN
    // (the first LSN that dirtied this page since last flush)
    dirtyPageTable = {}
    for page in dirtyPages:
        dirtyPageTable[page.id] = page.recoveryLSN

    // Also record active transactions
    activeTransactions = transactionManager.getActive()

    // Write checkpoint record
    checkpointRecord = new CheckpointRecord(
        LSN: log.nextLSN(),
        DirtyPageTable: dirtyPageTable,
        ActiveTransactions: activeTransactions
    )
    log.append(checkpointRecord)
    log.flush()

    // Update master record to point to this checkpoint
    writeMasterRecord(checkpointRecord.LSN)

// During recovery, determine where to start redo
function determineRedoStartLSN(checkpoint: CheckpointRecord) -> LSN:
    // The redo start point is the MINIMUM of:
    // 1. All RecoveryLSNs in the dirty page table
    //    This is the oldest modification that might not be on disk

    if checkpoint.dirtyPageTable.isEmpty():
        // No dirty pages at checkpoint time
        // Start redo from the checkpoint
        return checkpoint.LSN

    minRecoveryLSN = MIN(checkpoint.dirtyPageTable.values())

    // Could also need to consider active transactions' first LSNs
    // in case a transaction started before any current dirty page
    for txn in checkpoint.activeTransactions:
        if txn.firstLSN < minRecoveryLSN:
            minRecoveryLSN = txn.firstLSN

    return minRecoveryLSN
```

Checkpoint Trade-offs:
| Checkpoint Frequency | Checkpoint Cost | Recovery Time | Log Size |
|---|---|---|---|
| Very frequent | High (flush overhead) | Very short | Small |
| Moderate | Moderate | Moderate | Moderate |
| Infrequent | Low | Long redo phase | Large |
Fuzzy Checkpoints:
Modern systems use fuzzy checkpoints that don't require stopping all activity: the checkpoint records the dirty page table and the list of active transactions without flushing the dirty pages themselves, while normal transaction processing continues alongside.
This reduces checkpoint impact on normal operations.
More checkpoints = faster recovery but higher runtime cost. Fewer checkpoints = slower recovery but better throughput. Tune based on your recovery time objective (RTO). If you need recovery under 1 minute, checkpoint frequently. If 10 minutes is acceptable, checkpoint less often.
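As a back-of-envelope sketch of RTO tuning—all numbers are assumptions you would measure on your own system—worst-case redo time is roughly the log written since the last checkpoint divided by the replay rate:

```typescript
// Rough worst-case redo time estimate. All inputs are assumed values.
function estimateRedoSeconds(
  logWriteRateMBps: number,      // how fast the workload generates log
  checkpointIntervalSec: number, // time between checkpoints
  redoReplayRateMBps: number     // how fast recovery can replay log
): number {
  // Log that may need replaying: everything since the last checkpoint.
  const logToReplayMB = logWriteRateMBps * checkpointIntervalSec;
  return logToReplayMB / redoReplayRateMBps;
}

// Example: 20 MB/s of log, checkpoint every 10 minutes, replay at 100 MB/s
// => 20 * 600 / 100 = 120 seconds of redo, worst case.
console.log(estimateRedoSeconds(20, 600, 100)); // 120
```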
Database systems choose between different levels of abstraction for redo logging:
Physical Redo (Byte-Level)
Log the exact bytes modified at their exact location:
`{page: 42, offset: 312, oldBytes: 0x000001F4, newBytes: 0x00000258}`

Pros:
- Redo is trivial: copy the new bytes into place
- Naturally idempotent—writing the same bytes twice leaves the same state
Cons:
- Poor space efficiency: the log duplicates every modified byte (full data duplication)
| Type | Log Content | Redo Complexity | Space Efficiency |
|---|---|---|---|
| Physical | Byte changes at byte offsets | Trivial (copy bytes) | Poor (full data duplication) |
| Logical | High-level SQL operation | Complex (re-execute query) | Excellent (compact) |
| Physiological | Physical to page, logical within page | Moderate (interpret operation) | Good (compact but localized) |
Logical Redo (Operation-Level)
Log the SQL operation that was executed:
`{INSERT INTO accounts VALUES (42, 'Alice', 1000)}`

Pros:
- Excellent space efficiency: one compact record describes the whole operation
Cons:
- Complex redo: the operation must be re-executed, which assumes the database is already in a consistent state
- Not naturally idempotent—replaying an INSERT twice could insert the row twice
Physiological Redo (ARIES approach)
Physical to the page level, logical within the page:
`{page: 42, operation: INSERT_ROW, rowData: ...}`

This is the sweet spot:
- Redo is confined to a single page—no query re-execution
- Records stay compact compared to full byte images
- The page LSN check keeps replay idempotent
```
// Physical logging: log exact bytes
physicalLog = {
    LSN: 12345,
    PageID: 42,
    Offset: 512,
    Length: 64,
    OldImage: [0x00, 0x01, 0x02, ...],  // 64 bytes
    NewImage: [0x10, 0x11, 0x12, ...]   // 64 bytes
}

// Logical logging: log the SQL
logicalLog = {
    LSN: 12345,
    Query: "INSERT INTO users (id, name, balance) VALUES (100, 'Alice', 500)",
    TransactionID: T1
}

// Physiological logging: physical page, logical operation
physiologicalLog = {
    LSN: 12345,
    PageID: 42,               // Physical: which page
    Operation: "INSERT_ROW",  // Logical: what operation
    RowData: {                // Logical: operation data
        id: 100,
        name: "Alice",
        balance: 500
    },
    SlotHint: 7               // Optimization: where to insert
}

// Redo for physiological:
function redoPhysiological(page: Page, log: PhysiologicalLog):
    if log.operation == "INSERT_ROW":
        if not page.hasRowWithKey(log.rowData.id):
            page.insertRow(log.rowData, slotHint: log.slotHint)
    else if log.operation == "UPDATE_ROW":
        page.updateRow(log.rowData.id, log.rowData.changes)
    else if log.operation == "DELETE_ROW":
        if page.hasRowWithKey(log.rowData.id):
            page.deleteRow(log.rowData.id)
```

Long redo phases can extend database downtime during recovery. Several optimizations reduce redo time:
1. Parallel Redo
Redo operations on different pages are independent—they can execute in parallel:
```
function parallelRedoPhase(logRecords: List<LogRecord>, numWorkers: int):
    // Group log records by page
    recordsByPage = groupBy(logRecords, r -> r.pageID)

    // Create per-page redo chains
    // Each chain must execute in LSN order, but chains are independent
    redoChains = []
    for pageID, records in recordsByPage:
        chain = new RedoChain(pageID, sortByLSN(records))
        redoChains.append(chain)

    // Distribute chains to workers
    // Worker i handles chains where hash(pageID) % numWorkers == i
    workers = []
    for i in range(numWorkers):
        workerChains = redoChains.filter(c -> hash(c.pageID) % numWorkers == i)
        workers.append(startWorker(workerChains))

    // Wait for all workers to complete
    for worker in workers:
        worker.join()

class RedoWorker:
    function run(chains: List<RedoChain>):
        for chain in chains:
            page = bufferPool.getPage(chain.pageID)
            for record in chain.records:
                if page.pageLSN < record.LSN:
                    applyRedo(page, record)
                    page.pageLSN = record.LSN
                    page.markDirty()
```

2. Prefetching
During redo, we know which pages we'll need—the log tells us in advance. Prefetch them asynchronously ahead of the redo cursor so page reads overlap with replay instead of stalling it.
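A minimal sketch of the idea, with the prefetch and apply callbacks left as injected assumptions:

```typescript
// Keep a window of asynchronous page reads running ahead of the redo
// cursor, so pages are memory-resident by the time redo reaches them.
async function redoWithPrefetch(
  records: Array<{ lsn: number; pageID: number }>,
  prefetch: (pageID: number) => Promise<void>,     // assumed async page read
  applyRecord: (rec: { lsn: number; pageID: number }) => void,
  windowSize = 64
): Promise<void> {
  const inFlight = new Map<number, Promise<void>>();

  // Issue reads for the first window up front.
  for (const rec of records.slice(0, windowSize)) {
    if (!inFlight.has(rec.pageID)) inFlight.set(rec.pageID, prefetch(rec.pageID));
  }

  for (let i = 0; i < records.length; i++) {
    const rec = records[i];
    await inFlight.get(rec.pageID); // usually already resolved
    applyRecord(rec);

    // Keep the prefetch window ahead of the redo cursor.
    const ahead = records[i + windowSize];
    if (ahead && !inFlight.has(ahead.pageID)) {
      inFlight.set(ahead.pageID, prefetch(ahead.pageID));
    }
  }
}
```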
3. Batched Redo
Instead of applying one log record at a time, group consecutive records that target the same page and apply them in a single pass, paying the page fetch and latch cost once per batch.
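A sketch of the grouping step, with assumed record and page shapes:

```typescript
interface RedoRecord {
  lsn: number;
  pageID: number;
  apply: (page: object) => void; // applies this record's after-image
}

// Group consecutive same-page records so each page is fetched once per run.
function batchRedo(records: RedoRecord[], getPage: (id: number) => object): void {
  let i = 0;
  while (i < records.length) {
    const pageID = records[i].pageID;
    const page = getPage(pageID); // fetch + latch once for the whole run
    while (i < records.length && records[i].pageID === pageID) {
      records[i].apply(page);     // still applied in LSN order
      i++;
    }
  }
}
```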
4. Recovery Checkpoints
During very long redo, write periodic progress markers (recovery checkpoints) so that if the system crashes again mid-recovery, the next restart resumes redo from the last marker instead of starting over.
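A sketch of that idea, assuming a durable `saveProgress` write exists:

```typescript
// Persist redo progress every N records so a crash during recovery
// resumes from the marker instead of the original redo start point.
function redoWithProgressMarkers(
  records: Array<{ lsn: number }>,
  applyRecord: (rec: { lsn: number }) => void,
  saveProgress: (lsn: number) => void, // assumed durable write
  every = 10_000
): void {
  let sinceMarker = 0;
  for (const rec of records) {
    applyRecord(rec);
    if (++sinceMarker >= every) {
      saveProgress(rec.lsn); // next restart can resume after this LSN
      sinceMarker = 0;
    }
  }
}
```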
Production systems define a Recovery Time Objective (RTO)—maximum acceptable recovery time. Design checkpoint frequency, log size limits, and redo parallelism to meet your RTO. A 5-minute RTO requires different tuning than a 60-minute RTO.
We've explored redo information—the after-images that guarantee durability. Let's consolidate the key concepts:

- Redo information is the after-image: the new value a modification introduces
- The Redo Rule: all of a transaction's log records, including the commit record, must reach stable storage before commit returns
- NO-FORCE: commit flushes the log, not the data pages—the log is the source of truth
- The redo phase 'repeats history', replaying all logged modifications before undo removes the losers
- Redo must be idempotent; the page LSN check makes replay safe and efficient
- Checkpoints bound how far back the redo scan must start
What's Next:
Now that we understand both undo and redo information, the final page explores why WAL matters in practice—examining the critical importance of this protocol for database reliability, performance, and replication in production systems.
You now understand redo information—the after-images that guarantee durability. With undo and redo together, WAL provides complete recovery capability: atomicity (undo) and durability (redo). Next, we'll explore why WAL is critical for real-world database systems.