The log in a database system can span gigabytes or even terabytes of data, containing millions of log records. During the undo phase, we need to reverse the effects of all loser transactions—but how do we navigate this massive sequential structure efficiently?
A naive approach would process each loser transaction independently: walk backward through T1's records, then walk backward through T2's records, then T3's, and so on. But this would result in multiple passes over the log, each requiring expensive disk I/O.
ARIES's backward scan solves this problem elegantly: it processes ALL loser transactions in a single backward pass through the log, undoing operations in reverse chronological order regardless of which transaction performed them. This minimizes disk seeks and ensures optimal I/O efficiency.
By the end of this page, you will understand why backward scanning is essential for efficient undo, how the ToUndo set orchestrates multi-transaction undo, the algorithm for selecting and processing the next record to undo, and the I/O optimizations that make backward scanning practical.
Before exploring the mechanics, let's understand why undo must proceed backward through the log rather than forward.
The Ordering Problem:
Consider a transaction that makes two updates to the same field on a page (the values here are illustrative):

- LSN 10: UPDATE salary: 40000 → 50000 (before-image: 40000)
- LSN 20: UPDATE salary: 50000 → 55000, applying a 10% bonus (before-image: 50000)
If we tried to undo forward:

- Undo LSN 10 first: restore its before-image, setting the salary to 40000, even though the page currently holds 55000 (the before-image no longer matches the page state).
- Undo LSN 20 next: restore its before-image, setting the salary back to 50000.
- Final value: 50000, not the original 40000.
The problem is that undoing in forward order destroys the context needed for later undos. The bonus calculation assumed the salary was 50000; undoing in the wrong order leaves 50000 on the page instead of the original 40000.
Backward Order Preserves Context:
Undoing backward:

- Undo LSN 20 first: the page holds 55000, exactly matching LSN 20's after-image, so restoring its before-image yields 50000.
- Undo LSN 10 next: the page holds 50000, matching LSN 10's after-image, so restoring its before-image yields 40000.
- Final value: 40000, the original.
Each undo sees the correct "current" value and restores the correct "previous" value.
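Here is a minimal TypeScript sketch of the same scenario, using the hypothetical salary values above, that checks the before-image contract at each step:

```typescript
type UpdateRecord = { lsn: number; before: number; after: number };

const updates: UpdateRecord[] = [
  { lsn: 10, before: 40000, after: 50000 }, // set salary
  { lsn: 20, before: 50000, after: 55000 }, // apply 10% bonus
];

let salary = 55000; // page state at crash time

// Backward undo: process the highest LSN first, restoring before-images
for (const rec of [...updates].sort((a, b) => b.lsn - a.lsn)) {
  console.assert(salary === rec.after, 'before-image contract holds');
  salary = rec.before;
}
console.log(salary); // 40000 - the original value restored
```

Running the loop in forward LSN order instead fails the assertion on the very first step, because LSN 10's before-image no longer matches the page's current value.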
| Aspect | Forward Undo (Wrong) | Backward Undo (Correct) |
|---|---|---|
| Logical correctness | Breaks for dependent operations | Maintains correct dependencies |
| Before-image validity | May not match current page state | Always matches current page state |
| Multi-page transactions | Complex dependency tracking needed | Naturally handles all cases |
| Implementation complexity | High (need dependency analysis) | Low (simple reverse traversal) |
| Recovery from partial undo | Very complex | Simple with CLRs and undoNextLSN |
The Before-Image Contract:
Log records store before-images that capture the data's state at the time of the update. When we undo backward, the record being undone is always the most recent not-yet-undone change to its data, so its before-image exactly matches the page's current contents, and applying it restores the correct previous state. This invariant holds at each step only when processing in reverse order.
ARIES uses physical undo (restore exact byte values from before-images). Some systems use logical undo (execute inverse operations). Logical undo can be more flexible but has its own ordering complexities. Physical undo's backward requirement is more straightforward conceptually.
The ToUndo set is the central data structure driving the backward scan. It tracks where we are in the undo process for each loser transaction and determines which log record to process next.
Structure:
The ToUndo set contains (LSN, TransactionID) pairs, where each LSN is the next record that needs to be undone for that transaction. It's typically implemented as a max-heap (priority queue) ordered by LSN, allowing efficient extraction of the highest LSN.
Initialization:
At the start of undo, for each loser transaction T in the transaction table, we insert the pair (T.undoNextLSN, T) into the ToUndo set.
The undoNextLSN field comes from the analysis phase. For transactions that were actively running at crash time, it equals lastLSN. For transactions that were already aborting, it reflects partial undo progress.
```typescript
/**
 * The ToUndo set implementation using a max-heap.
 * Efficiently manages the undo frontier across all loser transactions.
 */
class ToUndoSet {
  // Max-heap: largest LSN at the top
  private heap: MaxHeap<{ lsn: LogSequenceNumber; txnId: TransactionId }>;

  constructor() {
    this.heap = new MaxHeap((a, b) => a.lsn.compareTo(b.lsn));
  }

  /**
   * Initialize from the transaction table at start of undo phase.
   */
  static fromTransactionTable(txnTable: TransactionTable): ToUndoSet {
    const toUndo = new ToUndoSet();
    for (const [txnId, txnEntry] of txnTable.entries()) {
      // Only process losers (active or aborting transactions)
      if (txnEntry.status === 'ACTIVE' || txnEntry.status === 'ABORTING') {
        if (txnEntry.undoNextLSN !== null) {
          toUndo.insert(txnEntry.undoNextLSN, txnId);
        } else {
          // Transaction has no updates - just log END
          Log.write({ type: 'END', transactionId: txnId });
          txnTable.remove(txnId);
        }
      }
    }
    return toUndo;
  }

  /**
   * Insert a new (LSN, txnId) pair into the set.
   */
  insert(lsn: LogSequenceNumber, txnId: TransactionId): void {
    this.heap.insert({ lsn, txnId });
  }

  /**
   * Extract and return the entry with the highest LSN.
   * Returns null if the set is empty.
   */
  extractMax(): { lsn: LogSequenceNumber; txnId: TransactionId } | null {
    return this.heap.extractMax();
  }

  /**
   * Check if there are more records to undo.
   */
  isEmpty(): boolean {
    return this.heap.isEmpty();
  }

  /**
   * Peek at the highest LSN without removing it.
   * Useful for I/O prefetching decisions.
   */
  peekMax(): LogSequenceNumber | null {
    const max = this.heap.peekMax();
    return max ? max.lsn : null;
  }

  /**
   * Number of transactions still being undone.
   */
  size(): number {
    return this.heap.size();
  }
}

// Example usage during undo phase:
function exampleUndoPhase(txnTable: TransactionTable) {
  const toUndo = ToUndoSet.fromTransactionTable(txnTable);
  console.log(`Starting undo with ${toUndo.size()} loser transactions`);

  while (!toUndo.isEmpty()) {
    const { lsn, txnId } = toUndo.extractMax()!;
    console.log(`Processing LSN ${lsn} for transaction ${txnId}`);
    // ... undo logic here ...
  }
}
```

Why a Max-Heap?
The max-heap ensures we always process the highest LSN next. This is crucial for two things: the log is read in strictly decreasing LSN order, keeping the backward scan sequential, and operations are undone in reverse chronological order across all loser transactions.
For systems with few concurrent transactions, even a simple sorted list would work. But max-heaps scale better if there are many losers.
The ToUndo set contains at most one entry per loser transaction, not one per log record. If we have 10 loser transactions, the set has at most 10 entries. This keeps the overhead minimal even for transactions with many updates.
Now let's walk through the complete backward scan algorithm. The core loop is deceptively simple, but understanding each step is essential.
Main Loop Structure:
```
PROCEDURE BackwardScanUndo(transactionTable):
    // =============================================
    // PHASE 1: Initialize ToUndo set
    // =============================================
    ToUndo = MaxHeap()  // Priority queue by LSN

    FOR EACH (txnId, entry) IN transactionTable:
        IF entry.status IN {ACTIVE, ABORTING}:
            IF entry.undoNextLSN != NULL:
                ToUndo.Insert(entry.undoNextLSN, txnId)
            ELSE:
                // No work to do for this transaction
                Log.Write(END_RECORD, transactionId: txnId)
                Log.Info("Transaction {txnId} had no updates to undo")

    Log.Info("Starting backward scan with {ToUndo.Size()} losers")

    // =============================================
    // PHASE 2: Main backward scan loop
    // =============================================
    WHILE NOT ToUndo.IsEmpty():
        // Get the highest LSN (latest operation to undo)
        (currentLSN, txnId) = ToUndo.ExtractMax()

        // Read the log record at this LSN
        logRecord = Log.Read(currentLSN)
        ASSERT logRecord.transactionId == txnId

        // =========================================
        // CASE A: This is an UPDATE record
        // =========================================
        IF logRecord.type == UPDATE:
            // Step A1: Apply the before-image to undo this update
            page = BufferPool.Fetch(logRecord.pageId)
            page.ApplyBeforeImage(logRecord.beforeImage)

            // Step A2: Write a CLR documenting this undo
            clr = WriteCLR(
                transactionId: txnId,
                pageId: logRecord.pageId,
                redoInfo: logRecord.beforeImage,
                undoNextLSN: logRecord.prevLSN,  // Key: skip to next
                undoLSN: currentLSN
            )

            // Step A3: Update page's LSN to the CLR
            page.SetPageLSN(clr.lsn)
            BufferPool.MarkDirty(logRecord.pageId)

            // Step A4: Update ToUndo with next record to process
            IF logRecord.prevLSN != NULL:
                ToUndo.Insert(logRecord.prevLSN, txnId)
            ELSE:
                // Reached the BEGIN - transaction fully undone
                Log.Write(END_RECORD, transactionId: txnId)
                transactionTable.Remove(txnId)
                Log.Info("Transaction {txnId} fully rolled back")

        // =========================================
        // CASE B: This is a CLR (from previous partial undo)
        // =========================================
        ELSE IF logRecord.type == CLR:
            // Don't undo the CLR! Just follow undoNextLSN
            IF logRecord.undoNextLSN != NULL:
                ToUndo.Insert(logRecord.undoNextLSN, txnId)
            ELSE:
                // CLR says undo is complete
                Log.Write(END_RECORD, transactionId: txnId)
                transactionTable.Remove(txnId)
                Log.Info("Transaction {txnId} fully rolled back (resumed)")

        // =========================================
        // CASE C: This is a BEGIN record
        // =========================================
        ELSE IF logRecord.type == BEGIN:
            // Transaction fully undone
            Log.Write(END_RECORD, transactionId: txnId)
            transactionTable.Remove(txnId)

    // =============================================
    // PHASE 3: Undo complete
    // =============================================
    ASSERT transactionTable.IsEmpty() OR ALL entries are COMMITTED
    Log.Info("Backward scan complete - all losers rolled back")
```

Critical Observations:
One read per undo: We read exactly one log record per iteration, and we undo at most one operation per iteration. No wasted reads.
CLRs are skipped, not undone: When we encounter a CLR, we don't redo or undo it - we just follow its undoNextLSN to continue where we left off.
The loop terminates when ToUndo is empty: This happens when every loser has reached its BEGIN record (or a CLR with null undoNextLSN).
END records finalize cleanup: The END record marks that the transaction is completely rolled back, enabling log truncation.
Notice that we only process log records from loser transactions. The ToUndo set only contains entries for losers, so we never "see" records from committed transactions during undo. Those records were already correctly handled by redo - their effects remain in the database.
A key insight of the backward scan is that it processes multiple transactions interleaved based on their LSN ordering, not sequentially by transaction. Let's see this in action.
Example Scenario:
Three loser transactions were active at crash:

- T1, with updates at LSNs 100, 250, and 400
- T2, with updates at LSNs 150 and 300
- T3, with updates at LSNs 200, 350, and 450
The backward scan processes them in this order:
| Step | Process LSN | Transaction | Action | Next for Txn | ToUndo After Step |
|---|---|---|---|---|---|
| 0 | - | - | Initialize | - | {450→T3, 400→T1, 300→T2} |
| 1 | 450 | T3 | Undo, CLR@501 | 350 | {400→T1, 350→T3, 300→T2} |
| 2 | 400 | T1 | Undo, CLR@502 | 250 | {350→T3, 300→T2, 250→T1} |
| 3 | 350 | T3 | Undo, CLR@503 | 200 | {300→T2, 250→T1, 200→T3} |
| 4 | 300 | T2 | Undo, CLR@504 | 150 | {250→T1, 200→T3, 150→T2} |
| 5 | 250 | T1 | Undo, CLR@505 | 100 | {200→T3, 150→T2, 100→T1} |
| 6 | 200 | T3 | Undo, CLR@506 | null→END | {150→T2, 100→T1} |
| 7 | 150 | T2 | Undo, CLR@507 | null→END | {100→T1} |
| 8 | 100 | T1 | Undo, CLR@508 | null→END | {} |
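The schedule in this table can be reproduced with a small TypeScript simulation. A sorted array stands in for the max-heap, and the LSN chains are the hypothetical ones from the example above:

```typescript
// Each loser's update LSNs, oldest to newest (hypothetical example data)
const chains: Record<string, number[]> = {
  T1: [100, 250, 400],
  T2: [150, 300],
  T3: [200, 350, 450],
};

// Initialize ToUndo with each loser's last LSN
const toUndo: { lsn: number; txn: string }[] = Object.entries(chains).map(
  ([txn, lsns]) => ({ lsn: lsns[lsns.length - 1], txn })
);

while (toUndo.length > 0) {
  toUndo.sort((a, b) => b.lsn - a.lsn);
  const { lsn, txn } = toUndo.shift()!; // extract max
  console.log(`undo LSN ${lsn} (${txn})`);

  const chain = chains[txn];
  const prev = chain[chain.indexOf(lsn) - 1]; // PrevLSN, or undefined
  if (prev !== undefined) toUndo.push({ lsn: prev, txn });
  else console.log(`${txn} fully undone -> END`);
}
// Prints LSNs 450, 400, 350, 300, 250, 200, 150, 100 - the table's order
```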
Key Observations:
Processing jumps between transactions: We don't finish T1 before starting T2. We process whichever transaction has the highest remaining LSN.
T3 finishes first: Completion order follows each loser's earliest LSN. T3's chain bottoms out at LSN 200, so the backward scan reaches its BEGIN before T2's (earliest LSN 150) and T1's (earliest LSN 100).
Log access is sequential backward: We read LSNs 450, 400, 350, 300, 250, 200, 150, 100 - a monotonically decreasing sequence.
CLRs are written in order: CLR@501, 502, 503... form a chronologically ordered sequence in the log.
Why This Order?
The interleaved order ensures we read the log sequentially backward. If we processed T1 completely before T2 and T3, we would read LSNs 400, 250, 100, then jump back up to 300 and down to 150 for T2, then up again to 450, 350, 200 for T3. This would mean eight scattered reads with large seeks instead of one smooth backward scan.
The backward scan reads LSNs in strictly decreasing order (450 → 400 → 350 → 300 → 250 → 200 → 150 → 100), undoing operations in that order regardless of which transaction they belong to.
Sequential backward read is extremely efficient on modern storage. The disk head (or SSD controller) reads in one direction without seeking. Databases often prefetch log pages backward during undo to maximize throughput.
While the ToUndo set determines which LSN to process next, the PrevLSN chain determines where each transaction goes after processing a record. Every log record contains a PrevLSN field pointing to the previous log record for the same transaction.
Chain Structure:
For a transaction with updates at LSNs 100, 250, and 400, the chain runs 400 → 250 → 100: record@400.PrevLSN = 250, record@250.PrevLSN = 100, and record@100.PrevLSN = null, marking it as the transaction's first update.
When we undo record@400, we add 250 to ToUndo. When we undo record@250, we add 100. When we undo record@100, PrevLSN is null, so the transaction is complete.
```
// After undoing a record, determine next step for this transaction

PROCEDURE DetermineNextUndo(logRecord, txnId, ToUndo, transactionTable):
    IF logRecord.type == UPDATE:
        // For updates, the next record to undo is found via PrevLSN
        nextLSN = logRecord.prevLSN
    ELSE IF logRecord.type == CLR:
        // For CLRs, jump using undoNextLSN (skip what's already done)
        nextLSN = logRecord.undoNextLSN

    // Determine what to do with nextLSN
    IF nextLSN == NULL:
        // No more records to undo for this transaction
        // This means we've reached the BEGIN record

        // Write END record to mark transaction as fully rolled back
        Log.Write(END_RECORD, transactionId: txnId)

        // Remove from transaction table
        transactionTable.Remove(txnId)

        // Don't add anything to ToUndo
        Log.Debug("Transaction {txnId} undo complete")
    ELSE:
        // More records remain to undo
        ToUndo.Insert(nextLSN, txnId)

        // Also update transaction table's tracking
        transactionTable[txnId].undoNextLSN = nextLSN

        Log.Debug("Transaction {txnId} next undo: LSN {nextLSN}")
```

PrevLSN vs UndoNextLSN:
It's important to distinguish these two pointers:
| Field | Stored In | Points To | Purpose |
|---|---|---|---|
| PrevLSN | Every log record | Previous record for same txn | Chronological chain |
| UndoNextLSN | CLRs only | Next record to undo | Skip completed undos |
For UPDATE records, we use PrevLSN to find the next undo target. For CLR records, we use UndoNextLSN instead (which may skip many records).
Why CLRs Skip Ahead:
When we undo UPDATE@300 and write CLR@450, the CLR's undoNextLSN is set to UPDATE@300's PrevLSN (say, 200). This means the CLR effectively "encapsulates" the undo of 300 and provides a shortcut to 200. If we crash and restart, we don't need to undo 300 again—we just follow the CLR's pointer.
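A compact TypeScript sketch of this dispatch, using the hypothetical LSNs from the paragraph above:

```typescript
type LogRec =
  | { type: 'UPDATE'; lsn: number; prevLSN: number | null }
  | { type: 'CLR'; lsn: number; undoNextLSN: number | null };

// UPDATEs chain via prevLSN; CLRs skip ahead via undoNextLSN
function nextUndoTarget(rec: LogRec): number | null {
  return rec.type === 'UPDATE' ? rec.prevLSN : rec.undoNextLSN;
}

// CLR@450 recorded the undo of UPDATE@300, whose PrevLSN was 200
const clr: LogRec = { type: 'CLR', lsn: 450, undoNextLSN: 200 };
console.log(nextUndoTarget(clr)); // 200 - UPDATE@300 is never undone twice
```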
The PrevLSN chain is constructed during normal transaction processing, not during recovery. Each time a transaction writes a log record, PrevLSN is set to that transaction's last log record LSN. This is why recovery can efficiently navigate the chain without rebuilding it.
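A sketch of that forward-processing bookkeeping; appendToLog is a hypothetical stand-in for the real log manager's append call:

```typescript
class TxnLogState {
  lastLSN: number | null = null; // LSN of this txn's most recent record
}

// Stand-in for the log manager's append: assigns LSNs 100, 250, 400
const assigned = [100, 250, 400];
let next = 0;
function appendToLog(rec: { prevLSN: number | null }): number {
  const lsn = assigned[next++];
  console.log(`record@${lsn} prevLSN=${rec.prevLSN}`);
  return lsn;
}

function logUpdate(txn: TxnLogState): void {
  // Each new record points back at the transaction's previous record,
  // so the PrevLSN chain is built as a by-product of normal logging.
  txn.lastLSN = appendToLog({ prevLSN: txn.lastLSN });
}

const txn = new TxnLogState();
logUpdate(txn); // record@100 prevLSN=null
logUpdate(txn); // record@250 prevLSN=100
logUpdate(txn); // record@400 prevLSN=250
```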
While the backward scan is inherently I/O-efficient, production systems employ additional optimizations to minimize recovery time.
1. Log Buffer Reuse:
During redo, we've already read most of the log. Smart implementations cache log pages in memory during redo and reuse them during undo, eliminating redundant disk reads.
2. Prefetching:
Knowing that we're scanning backward, the system can prefetch log pages ahead of the current position:
```
// If current position is page 1000
// Prefetch pages 999, 998, 997... in background
```
This hides disk latency by reading pages before they're needed.
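A minimal async sketch of this idea; readLogPage and PREFETCH_DEPTH are assumptions for illustration, not part of any particular engine's API:

```typescript
const PREFETCH_DEPTH = 4; // assumed tuning knob

// readLogPage is a hypothetical async page read returning raw bytes
async function prefetchBackward(
  currentPage: number,
  readLogPage: (pageNo: number) => Promise<Uint8Array>
): Promise<void> {
  const reads: Promise<Uint8Array>[] = [];
  // Issue reads for the pages just below the current scan position
  for (let p = currentPage - 1; p >= Math.max(0, currentPage - PREFETCH_DEPTH); p--) {
    reads.push(readLogPage(p));
  }
  await Promise.all(reads); // populate the cache before the scan arrives
}
```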
3. Batch Log Writes:
CLRs written during undo can be batched in the log buffer. The buffer is flushed when it fills, at periodic intervals, or when the WAL rule requires it: before any page whose undo is described by a buffered CLR is written to disk.
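A sketch of such a buffer, with a hypothetical writeToDisk callback that performs one sequential append per batch:

```typescript
class LogBuffer {
  private pending: Uint8Array[] = [];
  private bytes = 0;

  constructor(
    private flushThreshold: number,
    private writeToDisk: (batch: Uint8Array[]) => void
  ) {}

  append(record: Uint8Array): void {
    this.pending.push(record);
    this.bytes += record.length;
    if (this.bytes >= this.flushThreshold) this.flush();
  }

  // Must also be called before any page covered by a buffered CLR is
  // written out (the WAL rule), and at the end of the undo phase.
  flush(): void {
    if (this.pending.length === 0) return;
    this.writeToDisk(this.pending); // many CLRs, one sequential I/O
    this.pending = [];
    this.bytes = 0;
  }
}
```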
4. Parallel Undo (Advanced):
Some systems parallelize undo when transactions access non-overlapping pages:
```
// Worker 1 handles T1 (pages P1, P2, P3)
// Worker 2 handles T2 (pages P4, P5)
// Worker 3 handles T3 (pages P6, P7, P8)
```
This requires coordination for pages touched by more than one loser, for serializing CLR appends to the shared log, and for deciding when each transaction's END record can safely be written.
5. Log Record Locality:
Database systems try to keep related log records physically close: because the log is append-only, records written around the same time land on the same or adjacent log pages, so a backward scan tends to touch long contiguous runs.
This improves cache behavior during backward scan.
Many production databases aim for sub-minute recovery times. This is achieved through aggressive checkpointing (limiting redo work), efficient buffer pool tuning (minimizing losers), and these I/O optimizations. The backward scan's efficiency is a key contributor to fast recovery.
The backward scan must handle several edge cases correctly. Let's examine the most important ones.
Case 1: Empty Loser Set
If no transactions were active at crash time, the undo phase has nothing to do:
```
IF ToUndo.IsEmpty() at start:
    Log.Info("No losers to undo - undo phase complete")
    RETURN
```
This can happen after a clean shutdown followed by a crash before any new transactions began.
Case 2: Read-Only Transactions
Read-only transactions don't write update log records. If one was "active" at crash, its undoNextLSN is null, so initialization simply writes an END record for it and removes it from the transaction table; there is nothing to undo.
```
// Case 3: Very Long Transactions
// Some transactions have thousands of updates to undo

PROCEDURE HandleLongTransactionUndo(txnId, ToUndo):
    recordsUndone = 0

    WHILE ToUndo.ContainsTransaction(txnId):
        (lsn, txn) = ToUndo.ExtractMax()  // may belong to any loser
        UndoRecord(lsn)                   // undoes it; reinserts txn's next LSN
        IF txn == txnId:
            recordsUndone++

            // Optional: Checkpoint periodically during long undo
            IF recordsUndone % 10000 == 0:
                WriteCheckpoint()  // Saves progress
                Log.Info("Undo progress: {recordsUndone} records for {txnId}")

// Case 4: Partial CLR Chain (crash during previous undo)
// The CLR chain may be incomplete - handle gracefully

PROCEDURE HandleIncompleteCLRChain(logRecord):
    IF logRecord.type == CLR:
        // CLR's undoNextLSN tells us where to continue
        // This correctly skips the records that were already undone
        nextLSN = logRecord.undoNextLSN

        // The CLR's redo info is already applied during redo phase
        RETURN nextLSN

// Case 5: Transaction with Nested Savepoints
// Savepoints create internal markers but don't change undo logic

PROCEDURE HandleSavepointRecords(logRecord):
    IF logRecord.type == SAVEPOINT:
        // Savepoint records are not undone
        // Just follow PrevLSN to continue
        RETURN logRecord.prevLSN

    IF logRecord.type == ROLLBACK_TO_SAVEPOINT:
        // Already handled any partial rollback
        // Continue with PrevLSN from before the savepoint
        RETURN logRecord.prevLSN
```

Case 6: Mixed Record Types
The ToUndo set might extract LSNs pointing to different record types: UPDATE records are undone and covered by a CLR, CLRs are skipped by following their undoNextLSN, and BEGIN records mean the transaction is fully undone and gets an END record.
Case 7: Page No Longer Exists
Rarely, a page might not exist during undo, for example because it belonged to a table that was later dropped:
Recovery must handle this:
```
TRY:
    page = BufferPool.Fetch(pageId)
CATCH PageNotFoundException:
    Log.Warn("Page {pageId} not found - may be from dropped table")
    // Write CLR anyway to track progress
    // The undo effect is implicitly done if page doesn't exist
```
If a log record is corrupted (checksum failure), recovery faces a difficult decision. Options include: stopping and requiring manual intervention, skipping the record (risky), or attempting repair from redundant copies. Most production systems halt and alert operators for corruption during recovery.
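A tiny sketch of the halt-on-corruption policy; the record layout (a 4-byte checksum prefix) and the additive checksum are assumptions for illustration only:

```typescript
// Assumed layout: a 4-byte big-endian checksum precedes the payload
function checksum(data: Uint8Array): number {
  let sum = 0;
  for (const b of data) sum = (sum + b) >>> 0; // simple additive checksum
  return sum;
}

function readLogRecordChecked(lsn: number, raw: Uint8Array): Uint8Array {
  const stored = new DataView(raw.buffer, raw.byteOffset, raw.byteLength).getUint32(0);
  const payload = raw.subarray(4);
  if (checksum(payload) !== stored) {
    // Halt and alert operators rather than guess at the record's contents
    throw new Error(`Checksum mismatch at LSN ${lsn} - halting recovery`);
  }
  return payload;
}
```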
The backward scan is the engine that drives ARIES's undo phase, processing all loser transactions efficiently in a single pass. Let's consolidate the key insights:

- Undo must proceed backward so that each record's before-image matches the page's current state when it is applied.
- The ToUndo set, a max-heap with at most one entry per loser, always selects the highest remaining LSN, keeping log reads sequential.
- PrevLSN chains tell each transaction where to go next; a CLR's undoNextLSN skips work that was already undone before a crash.
- Each loser is finalized with an END record once its BEGIN record (or a CLR with a null undoNextLSN) is reached.
- Buffer reuse, backward prefetching, and batched CLR writes keep the undo phase's I/O cost low.
What's Next:
Some transactions don't just need simple undo—they require nested rollback when different parts of the transaction need different handling. The next page explores nested rollback scenarios, including savepoints, subtransactions, and the complexities they introduce to the undo phase.
You now understand backward scanning—the efficient mechanism that processes all loser transactions in a single pass through the log. The ToUndo set, PrevLSN chains, and CLR shortcuts work together to minimize I/O while guaranteeing correct transaction rollback.