Undo Phase in ARIES Recovery

Level: Advanced

Duration: 75 mins

Topic: Recovery Algorithms

Compensation Log Records (CLR)

Making Undo Operations Durable

Consider a troubling scenario: the database system crashes during the undo phase of recovery. You were halfway through rolling back three incomplete transactions when power failed. What happens when the system restarts?

Without special precautions, you'd face a dilemma: some undo operations succeeded, some pages have been reverted, but there's no record of this progress. Do you restart undo from the beginning? That would repeat undos that were already applied, which corrupts data whenever an undo is not idempotent (for example, a logical "subtract 5" rather than a byte-for-byte overwrite). Do you skip what was already done? But how do you know what was done?

Compensation Log Records (CLRs) solve this problem elegantly. They are special log records that document undo operations, making them durable and allowing recovery to "repeat history" for undos just as it does for regular operations.

What You Will Learn

By the end of this page, you will understand the structure and purpose of CLRs, why they are "redo-only" records, how the undoNextLSN field enables skipping during repeated recovery, and how CLRs form a chain that tracks undo progress through any number of crash-recovery cycles.

The Fundamental Problem CLRs Solve

Before diving into CLR structure, let's thoroughly understand the problem they solve. Imagine we're undoing a transaction T1 that made three updates:

T1's Log Records:

LSN 100: T1 updates page P1 (A→B)
LSN 200: T1 updates page P2 (X→Y)
LSN 300: T1 updates page P3 (M→N)
[CRASH - no commit record]

During recovery's undo phase:

We undo LSN 300: restore P3 to M
We undo LSN 200: restore P2 to X
[CRASH DURING UNDO]

Now the system restarts again. What state are we in?

Without CLRs

•No record that LSN 300 and 200 were undone
•Analysis phase still sees T1 as uncommitted
•Undo phase would try to undo LSN 300 again
•Repeating an undo corrupts data unless the undo is idempotent
•If M→N was logged as a delta, undoing once gives M; undoing again gives a value that was never stored
•Recovery would produce incorrect results

With CLRs

•CLR written for each completed undo
•CLR for 300 and 200 are in the log
•Redo phase replays the CLRs (safe)
•Undo phase sees CLRs and skips them
•Undo continues from LSN 100 only
•Recovery produces correct results
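
The contrast between the two lists is easiest to see with an undo that is not a plain overwrite. The following is a minimal sketch, not ARIES itself: it assumes a hypothetical counter whose updates are logged as deltas, so undoing means subtracting the delta. The names `DeltaUpdate`, `undoBlindly`, and `undoWithClrs` are invented for illustration, and a simple set of LSNs stands in for the CLRs found in the log.

```typescript
// Hypothetical delta-logged update: "add `delta` to the counter".
interface DeltaUpdate {
  lsn: number;
  delta: number;
}

// Undo every update, with no record of which ones were already undone.
function undoBlindly(value: number, updates: DeltaUpdate[]): number {
  return updates.reduce((v, u) => v - u.delta, value);
}

// Undo only the updates that have no CLR yet, newest first.
function undoWithClrs(
  value: number,
  updates: DeltaUpdate[],
  compensated: Set<number>, // LSNs already covered by a CLR
): number {
  let v = value;
  for (const u of updates.slice().reverse()) {
    if (compensated.has(u.lsn)) continue; // undone before the crash
    v -= u.delta;
    compensated.add(u.lsn); // stands in for writing a CLR
  }
  return v;
}

const updates: DeltaUpdate[] = [
  { lsn: 100, delta: 5 },
  { lsn: 200, delta: 7 },
];
const originalValue = 10;
const afterUpdates = originalValue + 5 + 7; // 22

// First recovery undoes LSN 200 only, then crashes: the counter is 15.
const afterPartialUndo = afterUpdates - 7;

// Second recovery without CLRs: LSN 200 is undone a second time.
const withoutClrs = undoBlindly(afterPartialUndo, updates); // 3, never a valid state

// Second recovery with a CLR for LSN 200: only LSN 100 is undone.
const withClrs = undoWithClrs(afterPartialUndo, updates, new Set([200])); // 10
```

With a byte-for-byte before-image the blind version would happen to produce the right value; the sketch uses deltas because that is where losing track of undo progress becomes visibly wrong.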

The Key Insight:

CLRs turn undo operations into regular logged operations that follow the same rules as everything else:

They're written to the log before the page modification is considered durable
They're replayed during redo just like regular updates
They contain enough information to re-apply the undo if needed

This reuses the existing redo machinery to handle crash-during-undo scenarios, without requiring any special-case logic.

Repeating History

ARIES follows a "repeating history" paradigm: during redo, recreate the exact state at crash time, including any partial undo progress. This means redo treats CLRs just like regular log records—it replays them. Then undo can simply continue from where it left off, knowing that all previous undo progress has been restored.

CLR Structure in Detail

A Compensation Log Record has a specific structure designed to support both redo (during recovery) and undo-chain management. Let's examine each field:

Standard Log Record Fields: Like all log records, CLRs have an LSN, transaction ID, and PrevLSN. These enable the CLR to be part of the transaction's log chain.

CLR-Specific Fields:

UndoNextLSN: Points to the next log record that needs to be undone (crucial for skipping)
Redo Information: Contains the action that the CLR records (the undo operation itself)
Type identifier: Marks this as a CLR rather than a regular update

clr_structure.ts
TypeScript
/**
 * Compensation Log Record Structure
 * 
 * A CLR documents an undo operation, making it durable
 * and enabling proper handling during repeated recovery.
 */
interface CompensationLogRecord {
    // ===== Standard Log Record Fields =====
    
    /** Unique Log Sequence Number for this CLR */
    lsn: LogSequenceNumber;
    
    /** Which transaction this CLR belongs to */
    transactionId: TransactionId;
    
    /** 
     * Previous log record for this transaction.
     * For CLRs, this points to the previous log record
     * (could be another CLR or an update record)
     */
    prevLSN: LogSequenceNumber | null;
    
    /** Type identifier */
    recordType: LogRecordType.CLR;
    
    // ===== CLR-Specific Fields =====
    
    /**
     * THE KEY FIELD: Points to the next record to undo.
     * This is the PrevLSN of the record that was just undone.
     * 
     * Example: If we undid record at LSN 200, and that record's
     * PrevLSN was 100, then undoNextLSN = 100.
     * 
     * If the undone record was the transaction's first update
     * (nothing earlier to undo), undoNextLSN = null (undo is complete).
     */
    undoNextLSN: LogSequenceNumber | null;
    
    /**
     * The page that was modified by this undo operation.
     * Needed during redo to re-apply the CLR.
     */
    pageId: PageId;
    
    /**
     * The redo information: what was written to the page.
     * This is the before-image from the original update
     * that we are undoing.
     */
    redoInfo: {
        /** Offset within the page */
        offset: number;
        /** Length of the data */
        length: number;
        /** The data that was written (the before-image) */
        data: Buffer;
    };
    
    /**
     * LSN of the original update record that was undone.
     * Useful for debugging and log analysis.
     */
    undoLSN: LogSequenceNumber;
}
 
// Example CLR:
const exampleCLR: CompensationLogRecord = {
    lsn: 500,                    // New LSN for this CLR
    transactionId: 'T1',
    prevLSN: 450,                // T1's previous record: the CLR that undid LSN 300
    recordType: LogRecordType.CLR,
    undoNextLSN: 100,            // Next record to undo for T1
    pageId: 'P2',
    redoInfo: {
        offset: 42,
        length: 1,               // Matches the one-byte before-image below
        data: Buffer.from('X')   // The before-image we restored
    },
    undoLSN: 200                 // We undid the record at LSN 200
};

The UndoNextLSN Field Explained:

This is the most important field in a CLR. Understanding it is key to understanding how ARIES handles crash-during-undo:

When we undo a log record at LSN X, that record has a PrevLSN field
The CLR's undoNextLSN = that PrevLSN value
This tells future undo phases: "The next record to undo is at this LSN"
If the original record had PrevLSN = null, undoNextLSN = null, meaning "undo complete"

This creates a shortcut chain: during undo, when we encounter a CLR, we jump directly to undoNextLSN instead of walking through all the intermediate records.
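
The rule for filling in undoNextLSN can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions: the record shapes are trimmed-down stand-ins for the interface above, and `makeClr` and `nextToUndo` are hypothetical helper names. In a real system the new LSN would come from the log manager and the transaction's last LSN from the transaction table.

```typescript
type LSN = number;

interface UpdateRecord {
  kind: "UPDATE";
  lsn: LSN;
  txnId: string;
  prevLSN: LSN | null;
  pageId: string;
  beforeImage: string;
  afterImage: string;
}

interface ClrRecord {
  kind: "CLR";
  lsn: LSN;
  txnId: string;
  prevLSN: LSN | null;
  undoNextLSN: LSN | null;
  pageId: string;
  redoImage: string; // the before-image that the undo wrote back
}

// Build the CLR that documents undoing `undone`.
function makeClr(undone: UpdateRecord, newLsn: LSN, lastLsnOfTxn: LSN): ClrRecord {
  return {
    kind: "CLR",
    lsn: newLsn,
    txnId: undone.txnId,
    prevLSN: lastLsnOfTxn,       // chronological chain
    undoNextLSN: undone.prevLSN, // shortcut to the next record still to undo
    pageId: undone.pageId,
    redoImage: undone.beforeImage,
  };
}

// Where should undo look after encountering this record?
function nextToUndo(rec: UpdateRecord | ClrRecord): LSN | null {
  return rec.kind === "CLR" ? rec.undoNextLSN : rec.lsn;
}

const upd200: UpdateRecord = {
  kind: "UPDATE", lsn: 200, txnId: "T1", prevLSN: 100,
  pageId: "P2", beforeImage: "X", afterImage: "Y",
};
// T1's last record so far is the CLR at 450 (which undid LSN 300).
const clr500 = makeClr(upd200, 500, 450);
```

Here `clr500.undoNextLSN` is 100, so a later undo pass that starts at the CLR goes straight to LSN 100 without revisiting LSN 200 or 300.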

Why Not Store the Before-Image Twice?

The CLR stores redo information (what to write during redo), which happens to be the before-image from the original update. It doesn't need to store an "undo" image because CLRs are never undone—they're redo-only. This reduces log space and simplifies the design.

Why CLRs are Redo-Only

A fundamental property of CLRs is that they are never undone. This might seem strange at first—if we can undo regular updates, why can't we undo CLRs? The answer lies in what it would mean to undo an undo.

The Logical Contradiction:

Suppose transaction T1 updated page P, changing value A to B:

Original update: A → B (logged at LSN 100)
Undo (CLR): B → A (logged at LSN 500, undoNextLSN = null)

If we "undid" the CLR, we would be:

Reversing the CLR: A → B ???

But wait—this would re-apply the original change! We'd be putting T1's uncommitted change back into the database. This completely defeats the purpose of undo.

The ARIES Solution:

ARIES makes a clean design choice: CLRs are redo-only. They have no "undo" information because the concept doesn't make sense. During undo processing:

Regular updates: undo them, write CLR
CLRs: follow undoNextLSN, don't undo
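
These two rules are enough to roll back a single transaction. The sketch below is a simplified illustration, not a full ARIES undo phase: it handles one transaction, keeps the log and pages in in-memory maps, and allocates LSNs in steps of 10. The function name `rollback` and the record shapes are assumptions made for the example.

```typescript
type LSN = number;

type LogRecord =
  | { kind: "UPDATE"; lsn: LSN; prevLSN: LSN | null; pageId: string; before: string; after: string }
  | { kind: "CLR"; lsn: LSN; prevLSN: LSN | null; undoNextLSN: LSN | null; pageId: string; redo: string };

// Roll back one transaction starting from its last log record.
// Appends CLRs to `log`, writes before-images into `pages`,
// and returns the LSNs of the updates it undid.
function rollback(
  log: Map<LSN, LogRecord>,
  pages: Map<string, string>,
  lastLSN: LSN,
  nextFreeLsn: LSN,
): LSN[] {
  const undone: LSN[] = [];
  let cursor: LSN | null = lastLSN;
  let tail = lastLSN; // last record written for this transaction
  while (cursor !== null) {
    const rec: LogRecord | undefined = log.get(cursor);
    if (rec === undefined) throw new Error(`missing log record ${cursor}`);
    if (rec.kind === "CLR") {
      cursor = rec.undoNextLSN; // never undo a CLR; just follow the pointer
      continue;
    }
    pages.set(rec.pageId, rec.before); // apply the before-image
    const clr: LogRecord = {
      kind: "CLR", lsn: nextFreeLsn, prevLSN: tail,
      undoNextLSN: rec.prevLSN, pageId: rec.pageId, redo: rec.before,
    };
    log.set(clr.lsn, clr);
    tail = clr.lsn;
    nextFreeLsn += 10;
    undone.push(rec.lsn);
    cursor = rec.prevLSN;
  }
  return undone;
}

// State after a crash that interrupted undo right after LSN 300 was compensated.
const log = new Map<LSN, LogRecord>([
  [100, { kind: "UPDATE", lsn: 100, prevLSN: null, pageId: "P1", before: "A", after: "B" }],
  [200, { kind: "UPDATE", lsn: 200, prevLSN: 100, pageId: "P2", before: "X", after: "Y" }],
  [300, { kind: "UPDATE", lsn: 300, prevLSN: 200, pageId: "P3", before: "M", after: "N" }],
  [450, { kind: "CLR", lsn: 450, prevLSN: 300, undoNextLSN: 200, pageId: "P3", redo: "M" }],
]);
const pages = new Map<string, string>([["P1", "B"], ["P2", "Y"], ["P3", "M"]]);

const undoneLsns = rollback(log, pages, 450, 500); // [200, 100]; LSN 300 is skipped
```

The CLR branch contains no page access at all, which is the code-level meaning of "redo-only": the undo pass reads a CLR only to learn where to go next.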

Log Record Types and Their Undo/Redo Behavior
Record Type	Has Undo Info?	Has Redo Info?	During Redo Phase	During Undo Phase
UPDATE	Yes (before-image)	Yes (after-image)	Apply after-image if needed	Apply before-image, write CLR
CLR	No	Yes (before-image of undone op)	Apply redo info if needed	Do NOT undo; follow undoNextLSN
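COMMIT	N/A	N/A	Nothing to redo (txn status is set during analysis)	Should never encounter (txn is winner)
BEGIN	N/A	N/A	Nothing to redo (txn state is set up during analysis)	Write END record, remove from ToUndo
END	N/A	N/A	Nothing to redo (txn is removed from the table during analysis)	Should never encounter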

Mathematical Perspective:

If we denote the original operation as O (with inverse O⁻¹), then:

The update log record represents O
The CLR represents O⁻¹

Undoing the CLR would be (O⁻¹)⁻¹ = O, which brings us back to the original operation. But we're trying to remove the effects of O! The only sensible approach is to treat CLRs as facts about the undo process, not as reversible operations.

The Practical Implication:

Because CLRs are redo-only, they're simpler than regular update records. They don't need a "before-before-image" (what the data was before the undo). They only need enough information to redo the undo if there's another crash. This is why CLRs can be smaller than the original update records in some cases.

Never Try to Undo a CLR

Any recovery algorithm that attempts to undo CLRs is fundamentally broken. Undoing an undo produces the wrong result and violates atomicity. The undoNextLSN pointer exists precisely so that the undo phase can step past CLRs, and past the updates they already compensated, while still tracking progress.

The CLR Chain and Progress Tracking

CLRs form a chain through their undoNextLSN pointers that provides efficient navigation during undo. This chain is separate from (but connected to) the regular PrevLSN chain that links all records for a transaction.

Two Interleaved Chains:

PrevLSN Chain: Links all log records for a transaction chronologically (updates and CLRs)
UndoNextLSN Chain: Provides shortcuts through the undo progress, skipping what's already done

Let's visualize this with a concrete example:

[Diagram: T1's log records, showing the PrevLSN chain and the undoNextLSN chain across two crashes]

In the diagram above:

Teal boxes are original UPDATE records from T1
Yellow boxes are CLRs written during undo
Red markers indicate crash points
Solid arrows show the PrevLSN chain (chronological order)
Dotted arrows show the undoNextLSN chain (undo progress)

Walking Through the Scenario:

T1 makes three updates (LSN 100, 200, 300), then crashes
Recovery #1: Undo starts with LSN 300, writes CLR at 450 (undoNextLSN=200)
Undo continues with LSN 200, writes CLR at 500 (undoNextLSN=100)
Crash #2 occurs before completing
Recovery #2: Redo replays CLR 450 and 500
Undo sees T1 in table with lastLSN=500 (the last CLR)
Read CLR at 500, follow undoNextLSN to 100
Undo LSN 100, write CLR at 600 (undoNextLSN=null)
Write END record—T1 is fully rolled back
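
The two chains can be compared directly by walking them over the log as it stands at crash #2. This is a small illustrative sketch; the `Rec` shape and the helper names `prevChain` and `undoPath` are assumptions made for the example.

```typescript
type LSN = number;

interface Rec {
  lsn: LSN;
  kind: "UPDATE" | "CLR";
  prevLSN: LSN | null;
  undoNextLSN?: LSN | null; // present only on CLRs
}

// Log contents at the moment of crash #2 in the walkthrough.
const records: Rec[] = [
  { lsn: 100, kind: "UPDATE", prevLSN: null },
  { lsn: 200, kind: "UPDATE", prevLSN: 100 },
  { lsn: 300, kind: "UPDATE", prevLSN: 200 },
  { lsn: 450, kind: "CLR", prevLSN: 300, undoNextLSN: 200 },
  { lsn: 500, kind: "CLR", prevLSN: 450, undoNextLSN: 100 },
];
const byLsn = new Map<LSN, Rec>(records.map((r): [LSN, Rec] => [r.lsn, r]));

// Follow every PrevLSN pointer: the full chronological history.
function prevChain(from: LSN): LSN[] {
  const path: LSN[] = [];
  let cur: LSN | null = from;
  while (cur !== null) {
    path.push(cur);
    const rec: Rec | undefined = byLsn.get(cur);
    cur = rec ? rec.prevLSN : null;
  }
  return path;
}

// Follow the path undo actually takes: CLRs jump via undoNextLSN.
function undoPath(from: LSN): LSN[] {
  const path: LSN[] = [];
  let cur: LSN | null = from;
  while (cur !== null) {
    path.push(cur);
    const rec: Rec | undefined = byLsn.get(cur);
    if (rec === undefined) break;
    cur = rec.kind === "CLR" ? (rec.undoNextLSN ?? null) : rec.prevLSN;
  }
  return path;
}

const chronological = prevChain(500); // [500, 450, 300, 200, 100]
const visitedByUndo = undoPath(500);  // [500, 100]
```

Recovery #2 starts at lastLSN=500 and touches only two records, even though the PrevLSN chain behind it has five.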

The Beauty of the Design

Notice how the second recovery doesn't need special logic. It simply processes the CLRs like any other log record during redo, then follows undoNextLSN during undo. The system naturally resumes from where it left off. This uniformity is a hallmark of ARIES's elegant design.

CLRs and Page LSN Updates

When a CLR is written for an undo operation, the affected page's PageLSN is updated to the CLR's LSN. This has important implications for the redo phase.

Why Update PageLSN?

The PageLSN on a page indicates: "This page reflects every logged change to it up to and including this LSN." By setting PageLSN to the CLR's LSN:

Future redo phases know the undo is already applied to this page
The page is correctly marked as modified (dirty)
Crash during undo won't cause the undo operation to be lost

The Redo Phase Check:

During redo, for each log record (including CLRs), we check:

if (logRecord.lsn > page.pageLSN) {
    // Redo this operation - it wasn't persisted
    applyOperation(logRecord);
    page.pageLSN = logRecord.lsn;
} else {
    // Page already reflects this operation - skip
}
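
To see why this check makes replaying CLRs safe, here is a minimal self-contained sketch of the same comparison applied to a tiny log. The `Page` and `RedoableRecord` shapes and the `redo` function are illustrative assumptions; values are single strings rather than byte ranges.

```typescript
interface Page {
  pageLSN: number;
  value: string;
}

// An UPDATE carries its after-image, a CLR carries the before-image it
// restored. Redo treats both the same way.
interface RedoableRecord {
  lsn: number;
  kind: "UPDATE" | "CLR";
  pageId: string;
  redoValue: string;
}

// Replay the log against the pages; returns how many records were applied.
function redo(logRecords: RedoableRecord[], pages: Map<string, Page>): number {
  let applied = 0;
  for (const rec of logRecords) {
    const page: Page | undefined = pages.get(rec.pageId);
    if (page === undefined) continue; // sketch: ignore pages we do not track
    if (rec.lsn > page.pageLSN) {
      page.value = rec.redoValue;
      page.pageLSN = rec.lsn;
      applied += 1;
    }
  }
  return applied;
}

const logRecords: RedoableRecord[] = [
  { lsn: 200, kind: "UPDATE", pageId: "P2", redoValue: "Y" },
  { lsn: 500, kind: "CLR", pageId: "P2", redoValue: "X" },
];
// The on-disk copy of P2 reflects the update but not the CLR.
const diskPages = new Map<string, Page>([["P2", { pageLSN: 200, value: "Y" }]]);

const firstPass = redo(logRecords, diskPages);  // applies only the CLR
const secondPass = redo(logRecords, diskPages); // applies nothing: page is current
```

Running redo a second time changes nothing, which is the property that lets recovery be restarted after a crash at any point.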

clr_page_update.pseudo

CLR and Page LSN Logic

PROCEDURE WriteCLRAndUpdatePage(txnId, undoingLSN, beforeImage, pageId):
    // Step 1: Fetch the page (may already be in buffer)
    page = bufferPool.Fetch(pageId)
    page.AcquireExclusiveLatch()
    
    TRY:
        // Step 2: Apply the before-image (the undo operation)
        page.ApplyBeforeImage(beforeImage)
        
        // Step 3: Construct the CLR
        clr = {
            type: CLR,
            transactionId: txnId,
            prevLSN: transactionTable[txnId].lastLSN,
            undoNextLSN: log[undoingLSN].prevLSN,  // Skip to next
            pageId: pageId,
            redoInfo: beforeImage,  // What redo should apply
            undoLSN: undoingLSN     // What we're undoing
        }
        
        // Step 4: Append CLR to the log buffer; Append assigns and returns its LSN
        clr.lsn = log.Append(clr)
        
        // Step 5: No per-CLR force is needed here. WAL only requires that the
        // log be flushed up to the page's PageLSN before the page itself is
        // written to disk; the buffer manager enforces that at write time.
        
        // Step 6: Update page's PageLSN to the CLR's LSN
        page.SetPageLSN(clr.lsn)
        
        // Step 7: Update transaction table
        transactionTable[txnId].lastLSN = clr.lsn
        transactionTable[txnId].undoNextLSN = clr.undoNextLSN
        
        // Step 8: Mark page as dirty (if not already)
        bufferPool.MarkDirty(pageId)
    
    FINALLY:
        page.ReleaseExclusiveLatch()
    
    RETURN clr.lsn

Timing Considerations:

The order of operations matters:

Latch the page first (exclusive access)
Apply the change to the page in memory
Write the CLR to the log (WAL protocol)
Update PageLSN to the CLR's LSN
Release latch (page may remain in buffer as dirty)

The WAL protocol requirement is critical: the CLR must reach stable storage before the page it describes does. Systems usually batch log writes, so the CLR may sit in the log buffer briefly, but before the page can be written to disk the log must be forced at least up to that CLR's LSN.
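
The flush rule can be sketched as a check performed when a dirty page is about to be written out. This is a hypothetical, heavily simplified model: `LogManager`, `BufferedPage`, and `writePageToDisk` are invented names, LSNs are simple counters, and raising `flushedLSN` stands in for a real log flush.

```typescript
// Hypothetical log manager: tracks how far the log is durable.
class LogManager {
  private nextLsn = 1;
  flushedLSN = 0;

  // Append a record to the log buffer; it is not yet durable.
  append(): number {
    return this.nextLsn++;
  }

  // Make the log durable up to `lsn` (stands in for a real flush).
  forceTo(lsn: number): void {
    if (lsn > this.flushedLSN) this.flushedLSN = lsn;
  }
}

interface BufferedPage {
  pageLSN: number;
  dirty: boolean;
}

// WAL rule: the log must be durable up to pageLSN before the page is written.
function writePageToDisk(page: BufferedPage, logMgr: LogManager): void {
  if (page.pageLSN > logMgr.flushedLSN) {
    logMgr.forceTo(page.pageLSN);
  }
  page.dirty = false; // the actual disk write would happen here
}

const logMgr = new LogManager();
const bufferedPage: BufferedPage = { pageLSN: 0, dirty: false };

// Undo three updates on the same page: three CLRs, no force per CLR.
for (let i = 0; i < 3; i++) {
  bufferedPage.pageLSN = logMgr.append();
  bufferedPage.dirty = true;
}
const flushedBeforeWrite = logMgr.flushedLSN; // 0: CLRs are still buffered

writePageToDisk(bufferedPage, logMgr);
const flushedAfterWrite = logMgr.flushedLSN; // 3: forced up to the page's pageLSN
```

Putting the check in the page-write path, rather than forcing after every CLR, is what allows log writes during undo to be batched.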

Force at Commit vs Force at CLR

During normal transaction abort, the system might not force each CLR individually—it can batch them. But the final END record must be forced, and WAL ensures any dirty pages with CLR-updated PageLSNs can't be written until those CLRs are stable. During recovery, there's no commit to force, but the log writes during undo still follow WAL semantics.

Handling Multiple Crashes

One of ARIES's most impressive properties is its ability to handle any number of crashes during recovery without losing progress or corrupting data. This is entirely due to the CLR mechanism.

Invariant: Progress is Always Preserved

No matter when a crash occurs:

All completed undos are recorded as CLRs
Redo will replay those CLRs, restoring the undo work
Undo will see the CLRs and skip to undoNextLSN
Only remaining work is done, never repeated work

Let's trace through a worst-case scenario with many crashes:

multiple_crash_scenario.pseudo

Multi-Crash Recovery Trace

SCENARIO: Transaction T1 with 5 updates, followed by many crashes
 
Initial State:
  Log: [BEGIN@10, UPD@100, UPD@200, UPD@300, UPD@400, UPD@500]
  T1 is uncommitted (no COMMIT record)
 
=== CRASH #1 ===
 
Recovery #1 - Analysis:
  T1 is loser, lastLSN=500, undoNextLSN=500
  
Recovery #1 - Redo:
  Replays all updates (nothing new here)
  
Recovery #1 - Undo:
  Undo LSN 500 → write CLR@601 (undoNextLSN=400)
  Undo LSN 400 → write CLR@602 (undoNextLSN=300)
  [CRASH #2 DURING UNDO]
 
=== CRASH #2 ===
 
Recovery #2 - Analysis:
  Sees T1 still active (no COMMIT or END)
  lastLSN=602 (the latest CLR)
  
Recovery #2 - Redo:
  Replays UPD@100-500 (pages already current, skipped)
  Replays CLR@601 and CLR@602 (applies them if needed)
  
Recovery #2 - Undo:
  T1's undoNextLSN from CLR@602 is 300
  Start from LSN 300 (not 500!)
  Undo LSN 300 → write CLR@701 (undoNextLSN=200)
  [CRASH #3 DURING UNDO]
 
=== CRASH #3 ===
 
Recovery #3 - Analysis:
  T1 still active, lastLSN=701
  
Recovery #3 - Redo:
  Replays CLR@601, CLR@602, CLR@701
  
Recovery #3 - Undo:
  undoNextLSN from CLR@701 is 200
  Undo LSN 200 → write CLR@801 (undoNextLSN=100)
  Undo LSN 100 → write CLR@802 (undoNextLSN=null)
  undoNextLSN=null means done!
  Write END@803 for T1
  
=== RECOVERY COMPLETE ===
 
Final Log: [BEGIN@10, UPD@100, UPD@200, UPD@300, UPD@400, UPD@500,
            CLR@601, CLR@602, CLR@701, CLR@801, CLR@802, END@803]
 
Database state: All T1's changes have been undone.
                T1 is fully rolled back.

Key Observations:

Each crash loses zero progress: Every completed undo remains as a CLR in the log
Redo recreates undo state: The CLRs are replayed, putting pages in the right state
Undo skips completed work: undoNextLSN jumps past what's already done
Bounded work per recovery: We only undo records that weren't undone before

Theoretical Guarantee:

If a transaction has N updates to undo, the total undo work across any number of recoveries is still O(N): each update is compensated by at most one durable CLR (an undo whose CLR never reached stable storage is simply performed again after the crash). Those CLRs may be redone on each later recovery, but redo is cheap compared to undo.
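
A small simulation makes the bound concrete. The sketch below is illustrative only and rests on several simplifications: one loser transaction, no page contents, every CLR treated as durable the moment it is appended, and a "crash" modeled as the undo pass returning early. The names `undoPass` and `txnLog` are assumptions for the example.

```typescript
type LSN = number;
type Rec =
  | { kind: "UPDATE"; lsn: LSN; prevLSN: LSN | null }
  | { kind: "CLR"; lsn: LSN; prevLSN: LSN | null; undoNextLSN: LSN | null };

// One undo pass over a single loser transaction. Stops early ("crashes")
// after writing `crashAfter` CLRs. Returns true if undo finished.
function undoPass(txnRecords: Rec[], crashAfter: number): boolean {
  const byLsn = new Map<LSN, Rec>();
  for (const r of txnRecords) byLsn.set(r.lsn, r);

  const last: Rec | undefined = txnRecords[txnRecords.length - 1];
  if (last === undefined) return true;
  let tail: LSN = last.lsn;
  // Resume point: a CLR says where to continue; an update is undone itself.
  let cursor: LSN | null = last.kind === "CLR" ? last.undoNextLSN : last.lsn;
  let written = 0;

  while (cursor !== null) {
    if (written === crashAfter) return false; // simulated crash
    const rec: Rec | undefined = byLsn.get(cursor);
    if (rec === undefined || rec.kind !== "UPDATE") throw new Error("corrupt chain");
    const clr: Rec = { kind: "CLR", lsn: tail + 1, prevLSN: tail, undoNextLSN: rec.prevLSN };
    txnRecords.push(clr);
    tail = clr.lsn;
    written += 1;
    cursor = rec.prevLSN;
  }
  return true;
}

// Five updates at LSNs 100..500 (BEGIN and END records omitted).
const txnLog: Rec[] = [100, 200, 300, 400, 500].map(
  (lsn, i): Rec => ({ kind: "UPDATE", lsn, prevLSN: i === 0 ? null : lsn - 100 }),
);

let recoveries = 0;
let finished = false;
while (!finished) {
  recoveries += 1;
  finished = undoPass(txnLog, 2); // every recovery crashes after two CLRs
}
const clrCount = txnLog.filter((r) => r.kind === "CLR").length;
```

Under these assumptions the transaction needs three recovery attempts, yet exactly five CLRs are written in total, one per update.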

Crash Resistance

ARIES can survive any number of crashes during recovery. The CLR mechanism ensures progress is never lost, work is never repeated incorrectly, and the system eventually completes recovery. This property is essential for systems that must recover from power failures, hardware issues, or software crashes.

CLR Space Considerations

CLRs add overhead to the log. For every update that gets undone, there's a corresponding CLR. This has implications for log space and performance.

Space Analysis:

Each CLR is roughly the same size as the original update record
In the worst case, rolling back a transaction doubles its log footprint
For transactions that abort frequently, this can be significant
However, log space is generally cheaper than the alternatives
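
A rough back-of-the-envelope model puts numbers on these points. It is only a sketch and assumes that aborted and committed transactions log about the same volume on average and that every update of an aborted transaction is undone; `clrLogOverhead` is a hypothetical helper name.

```typescript
// Fraction by which total log volume grows because of CLRs.
//   abortRate:    fraction of transactions that roll back (0..1)
//   clrSizeRatio: CLR size relative to the update it compensates (about 1)
function clrLogOverhead(abortRate: number, clrSizeRatio = 1): number {
  if (abortRate < 0 || abortRate > 1) {
    throw new RangeError("abortRate must be in [0, 1]");
  }
  return abortRate * clrSizeRatio;
}

// A single aborted transaction: its own log footprint doubles.
const footprintOfAbortedTxn = 1 + clrLogOverhead(1); // 2

// A workload where 1% of transactions abort: about 1% more log overall.
const overallAtOnePercent = clrLogOverhead(0.01); // 0.01
```

The overhead scales linearly with the abort rate, so it only becomes significant for workloads that roll back a large share of their work.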

Why Accept This Overhead?

The alternative approaches have their own costs:

Approaches to Undo Logging
Approach	Space Cost	Time Cost	Crash Resilience
CLRs (ARIES)	2× log for aborted txns	Efficient undo	Excellent (any #crashes)
No CLRs, restart from scratch	1× log	Potentially unbounded redo/undo	Poor (wasted work)
Checkpoint after each undo	1× log + checkpoint overhead	Very slow undo	Good but expensive
Force pages after undo	1× log + I/O per undo	Very slow undo	Good but expensive

Optimizations for CLR Size:

Some implementations reduce CLR overhead:

Compressed CLRs: Store only essential information; derive redo info from original record
CLR chaining: Batch multiple undos into fewer CLRs when safe
Logical undos: For some operations, a smaller logical representation suffices
Early log truncation: Once a transaction's END is written, its CLRs can be removed

Log Truncation:

CLRs are candidates for log truncation once:

The transaction's END record is written and forced
The CLR's LSN is less than the oldest checkpoint's begin LSN
No active transaction needs to see these records

In practice, aggressive log truncation keeps log size manageable despite CLR overhead.

Transaction Abort Frequency

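Well-designed applications have low abort rates. Because each aborted transaction roughly doubles its own log footprint, the extra log volume is about equal to the abort rate: if 1% of transactions abort, and they log about as much as committed ones, CLRs add roughly 1% to total log size. The overhead is proportional to abort rate, so CLRs have minimal impact in well-behaved systems.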

Summary: Compensation Log Records

Compensation Log Records are the mechanism that makes ARIES's undo phase crash-recoverable. They represent an elegant solution to a subtle problem. Let's consolidate the key concepts:

Key Takeaways

•CLRs document undo operations — They're log records that say "I undid this operation" with enough information to redo that undo.
•CLRs are redo-only — They're never undone because undoing an undo would restore the original uncommitted change.
•UndoNextLSN provides shortcuts — This field lets the undo phase skip past completed undos to find remaining work.
•Multiple crashes are handled gracefully — Redo replays CLRs, undo follows undoNextLSN, and progress is preserved.
•PageLSN is updated to CLR's LSN — This ensures redo correctly recognizes which pages already have the undo applied.
•CLRs form a chain within the transaction — Following undoNextLSN efficiently navigates undo progress.
•Space overhead is acceptable — CLRs roughly double log size for aborted transactions, but this is outweighed by the crash resilience they provide.

What's Next:

With CLRs understood, we can explore the backward scan mechanism in detail on the next page. We'll see how ARIES efficiently processes the log in reverse order, using the ToUndo set and undoNextLSN pointers to minimize I/O while undoing multiple transactions in parallel.

Page Complete

You now understand Compensation Log Records—their structure, purpose, and critical role in crash-resistant recovery. CLRs are what allow ARIES to guarantee progress through any number of crashes, making undo operations as durable as the original operations they reverse.

2 / 5

Loading learning content...

Database Management SystemsRecovery Algorithms

Undo Phase in ARIES Recovery

LevelAdvanced

Duration75 mins

TopicRecovery Algorithms

2 / 5

Compensation Log Records (CLR)

Making Undo Operations Durable

What You Will Learn

The Fundamental Problem CLRs Solve

Before diving into CLR structure, let's thoroughly understand the problem they solve. Imagine we're undoing a transaction T1 that made three updates:

T1's Log Records:

LSN 100: T1 updates page P1 (A→B)
LSN 200: T1 updates page P2 (X→Y)
LSN 300: T1 updates page P3 (M→N)
[CRASH - no commit record]

During recovery's undo phase:

We undo LSN 300: restore P3 to M
We undo LSN 200: restore P2 to X
[CRASH DURING UNDO]

Now the system restarts again. What state are we in?

Without CLRs

•No record that LSN 300 and 200 were undone
•Analysis phase still sees T1 as uncommitted
•Undo phase would try to undo LSN 300 again
•Applying before-image twice corrupts data
•If P3 was M→N, undoing gives M. Undoing again gives ???
•Recovery would produce incorrect results

With CLRs

•CLR written for each completed undo
•CLR for 300 and 200 are in the log
•Redo phase replays the CLRs (safe)
•Undo phase sees CLRs and skips them
•Undo continues from LSN 100 only
•Recovery produces correct results

The Key Insight:

CLRs turn undo operations into regular logged operations that follow the same rules as everything else:

They're written to the log before the page modification is considered durable
They're replayed during redo just like regular updates
They contain enough information to re-apply the undo if needed

This elegantly reuses the existing redo machinery to handle crash-during-undo scenarios, without requiring any special case logic.

Repeating History

CLR Structure in Detail

A Compensation Log Record has a specific structure designed to support both redo (during recovery) and undo-chain management. Let's examine each field:

Standard Log Record Fields: Like all log records, CLRs have an LSN, transaction ID, and PrevLSN. These enable the CLR to be part of the transaction's log chain.

CLR-Specific Fields:

UndoNextLSN: Points to the next log record that needs to be undone (crucial for skipping)
Redo Information: Contains the action that the CLR records (the undo operation itself)
Type identifier: Marks this as a CLR rather than a regular update

clr_structure.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
/**
 * Compensation Log Record Structure
 * 
 * A CLR documents an undo operation, making it durable
 * and enabling proper handling during repeated recovery.
 */
interface CompensationLogRecord {
    // ===== Standard Log Record Fields =====
    
    /** Unique Log Sequence Number for this CLR */
    lsn: LogSequenceNumber;
    
    /** Which transaction this CLR belongs to */
    transactionId: TransactionId;
    
    /** 
     * Previous log record for this transaction.
     * For CLRs, this points to the previous log record
     * (could be another CLR or an update record)
     */
    prevLSN: LogSequenceNumber | null;
    
    /** Type identifier */
    recordType: LogRecordType.CLR;
    
    // ===== CLR-Specific Fields =====
    
    /**
     * THE KEY FIELD: Points to the next record to undo.
     * This is the PrevLSN of the record that was just undone.
     * 
     * Example: If we undid record at LSN 200, and that record's
     * PrevLSN was 100, then undoNextLSN = 100.
     * 
     * If the undone record had no PrevLSN (was the BEGIN),
     * undoNextLSN = null (indicating undo is complete).
     */
    undoNextLSN: LogSequenceNumber | null;
    
    /**
     * The page that was modified by this undo operation.
     * Needed during redo to re-apply the CLR.
     */
    pageId: PageId;
    
    /**
     * The redo information: what was written to the page.
     * This is the before-image from the original update
     * that we are undoing.
     */
    redoInfo: {
        /** Offset within the page */
        offset: number;
        /** Length of the data */
        length: number;
        /** The data that was written (the before-image) */
        data: Buffer;
    };
    
    /**
     * LSN of the original update record that was undone.
     * Useful for debugging and log analysis.
     */
    undoLSN: LogSequenceNumber;
}
 
// Example CLR:
const exampleCLR: CompensationLogRecord = {
    lsn: 450,                    // New LSN for this CLR
    transactionId: 'T1',
    prevLSN: 300,                // Points to the last CLR or update by T1
    recordType: LogRecordType.CLR,
    undoNextLSN: 100,            // Next record to undo for T1
    pageId: 'P2',
    redoInfo: {
        offset: 42,
        length: 8,
        data: Buffer.from('X')   // The before-image we restored
    },
    undoLSN: 200                 // We undid the record at LSN 200
};

The UndoNextLSN Field Explained:

This is the most important field in a CLR. Understanding it is key to understanding how ARIES handles crash-during-undo:

When we undo a log record at LSN X, that record has a PrevLSN field
The CLR's undoNextLSN = that PrevLSN value
This tells future undo phases: "The next record to undo is at this LSN"
If the original record had PrevLSN = null, undoNextLSN = null, meaning "undo complete"

This creates a shortcut chain: during undo, when we encounter a CLR, we jump directly to undoNextLSN instead of walking through all the intermediate records.

Why Not Store the Before-Image Twice?

Why CLRs are Redo-Only

The Logical Contradiction:

Suppose transaction T1 updated page P, changing value A to B:

Original update: A → B (logged at LSN 100)
Undo (CLR): B → A (logged at LSN 500, undoNextLSN = null)

If we "undid" the CLR, we would be:

Reversing the CLR: A → B ???

But wait—this would re-apply the original change! We'd be putting T1's uncommitted change back into the database. This completely defeats the purpose of undo.

The ARIES Solution:

ARIES makes a clean design choice: CLRs are redo-only. They have no "undo" information because the concept doesn't make sense. During undo processing:

Regular updates: undo them, write CLR
CLRs: follow undoNextLSN, don't undo

Log Record Types and Their Undo/Redo Behavior
Record Type	Has Undo Info?	Has Redo Info?	During Redo Phase	During Undo Phase
UPDATE	Yes (before-image)	Yes (after-image)	Apply after-image if needed	Apply before-image, write CLR
CLR	No	Yes (before-image of undone op)	Apply redo info if needed	Do NOT undo; follow undoNextLSN
COMMIT	N/A	N/A	Mark txn as committed	Should never encounter (txn is winner)
BEGIN	N/A	N/A	Initialize txn state	Write END record, remove from ToUndo
END	N/A	N/A	Remove txn from table	Should never encounter

Mathematical Perspective:

If we denote the original operation as O (with inverse O⁻¹), then:

The update log record represents O
The CLR represents O⁻¹

The Practical Implication:

Never Try to Undo a CLR

The CLR Chain and Progress Tracking

Two Interleaved Chains:

PrevLSN Chain: Links all log records for a transaction chronologically (updates and CLRs)
UndoNextLSN Chain: Provides shortcuts through the undo progress, skipping what's already done

Let's visualize this with a concrete example:

Converting Mermaid diagram...

In the diagram above:

Teal boxes are original UPDATE records from T1
Yellow boxes are CLRs written during undo
Red markers indicate crash points
Solid arrows show the PrevLSN chain (chronological order)
Dotted arrows show the undoNextLSN chain (undo progress)

Walking Through the Scenario:

T1 makes three updates (LSN 100, 200, 300), then crashes
Recovery #1: Undo starts with LSN 300, writes CLR at 450 (undoNextLSN=200)
Undo continues with LSN 200, writes CLR at 500 (undoNextLSN=100)
Crash #2 occurs before completing
Recovery #2: Redo replays CLR 450 and 500
Undo sees T1 in table with lastLSN=500 (the last CLR)
Read CLR at 500, follow undoNextLSN to 100
Undo LSN 100, write CLR at 600 (undoNextLSN=null)
Write END record—T1 is fully rolled back

The Beauty of the Design

CLRs and Page LSN Updates

When a CLR is written for an undo operation, the affected page's PageLSN is updated to the CLR's LSN. This has important implications for the redo phase.

Why Update PageLSN?

The PageLSN on a page indicates: "This page reflects all log records up to and including this LSN." By setting PageLSN to the CLR's LSN:

Future redo phases know the undo is already applied to this page
The page is correctly marked as modified (dirty)
Crash during undo won't cause the undo operation to be lost

The Redo Phase Check:

During redo, for each log record (including CLRs), we check:

if (logRecord.lsn > page.pageLSN) {
    // Redo this operation - it wasn't persisted
    applyOperation(logRecord);
    page.pageLSN = logRecord.lsn;
} else {
    // Page already reflects this operation - skip
}

clr_page_update.pseudo

CLR and Page LSN Logic

PROCEDURE WriteCLRAndUpdatePage(undoingLSN, beforeImage, pageId):
    // Step 1: Fetch the page (may already be in buffer)
    page = bufferPool.Fetch(pageId)
    page.AcquireExclusiveLatch()
    
    TRY:
        // Step 2: Apply the before-image (the undo operation)
        page.ApplyBeforeImage(beforeImage)
        
        // Step 3: Construct the CLR
        clr = {
            type: CLR,
            transactionId: getCurrentTransaction(),
            prevLSN: transactionTable[txnId].lastLSN,
            undoNextLSN: log[undoingLSN].prevLSN,  // Skip to next
            pageId: pageId,
            redoInfo: beforeImage,  // What redo should apply
            undoLSN: undoingLSN     // What we're undoing
        }
        
        // Step 4: Write CLR to log buffer (WAL: log before data)
        clr.lsn = log.Append(clr)
        
        // Step 5: Force CLR to stable storage
        log.ForceToLSN(clr.lsn)  // Usually done asynchronously in batches
        
        // Step 6: Update page's PageLSN to the CLR's LSN
        page.SetPageLSN(clr.lsn)
        
        // Step 7: Update transaction table
        transactionTable[txnId].lastLSN = clr.lsn
        transactionTable[txnId].undoNextLSN = clr.undoNextLSN
        
        // Step 8: Mark page as dirty (if not already)
        bufferPool.MarkDirty(pageId)
    
    FINALLY:
        page.ReleaseExclusiveLatch()
    
    RETURN clr.lsn

Timing Considerations:

The order of operations matters:

Latch the page first (exclusive access)
Apply the change to the page in memory
Write the CLR to the log (WAL protocol)
Update PageLSN to the CLR's LSN
Release latch (page may remain in buffer as dirty)

Force at Commit vs Force at CLR

Handling Multiple Crashes

One of ARIES's most impressive properties is its ability to handle any number of crashes during recovery without losing progress or corrupting data. This is entirely due to the CLR mechanism.

Invariant: Progress is Always Preserved

No matter when a crash occurs:

All completed undos are recorded as CLRs
Redo will replay those CLRs, restoring the undo work
Undo will see the CLRs and skip to undoNextLSN
Only remaining work is done, never repeated work
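
One way to picture how undo "sees the CLRs" is to derive each loser transaction's undoNextLSN with a forward scan of the log. The sketch below is a simplified Python illustration, not the full ARIES analysis pass: the record layout (`type`, `txn`, `lsn`, `undo_next_lsn`) and the `analyze` function are assumptions made for this example, and checkpoints are ignored.

```python
def analyze(log):
    """Return {txn: undo_next_lsn} for every transaction without an END record.

    The log is assumed to be a list of dicts ordered by LSN.
    """
    undo_next = {}
    for rec in log:
        txn = rec["txn"]
        if rec["type"] == "BEGIN":
            undo_next[txn] = None               # nothing to undo yet
        elif rec["type"] == "UPDATE":
            undo_next[txn] = rec["lsn"]         # this update is the next to undo
        elif rec["type"] == "CLR":
            undo_next[txn] = rec["undo_next_lsn"]  # skip what the CLR compensated
        elif rec["type"] == "END":
            undo_next.pop(txn, None)            # fully finished, not a loser
    return undo_next


# Hypothetical log: five updates, then two CLRs written before a crash.
log = [{"type": "BEGIN", "txn": "T1", "lsn": 10}]
log += [{"type": "UPDATE", "txn": "T1", "lsn": lsn} for lsn in (100, 200, 300, 400, 500)]
log += [
    {"type": "CLR", "txn": "T1", "lsn": 601, "undo_next_lsn": 400},
    {"type": "CLR", "txn": "T1", "lsn": 602, "undo_next_lsn": 300},
]

losers = analyze(log)
print(losers)
```

The last CLR wins: undo resumes at LSN 300, and the updates at 500 and 400 are never visited again.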

Let's trace through a worst-case scenario with many crashes:

multiple_crash_scenario.pseudo

Multi-Crash Recovery Trace

SCENARIO: Transaction T1 with 5 updates, followed by many crashes
 
Initial State:
  Log: [BEGIN@10, UPD@100, UPD@200, UPD@300, UPD@400, UPD@500]
  T1 is uncommitted (no COMMIT record)
 
=== CRASH #1 ===
 
Recovery #1 - Analysis:
  T1 is loser, lastLSN=500, undoNextLSN=500
  
Recovery #1 - Redo:
  Replays all updates (nothing new here)
  
Recovery #1 - Undo:
  Undo LSN 500 → write CLR@601 (undoNextLSN=400)
  Undo LSN 400 → write CLR@602 (undoNextLSN=300)
  [CRASH #2 DURING UNDO]
 
=== CRASH #2 ===
 
Recovery #2 - Analysis:
  Sees T1 still active (no COMMIT or END)
  lastLSN=602 (the latest CLR)
  
Recovery #2 - Redo:
  Replays UPD@100-500 (skipped for any page whose PageLSN >= the record's LSN)
  Replays CLR@601 and CLR@602 (applied only where PageLSN < the CLR's LSN)
  
Recovery #2 - Undo:
  T1's undoNextLSN from CLR@602 is 300
  Start from LSN 300 (not 500!)
  Undo LSN 300 → write CLR@701 (undoNextLSN=200)
  [CRASH #3 DURING UNDO]
 
=== CRASH #3 ===
 
Recovery #3 - Analysis:
  T1 still active, lastLSN=701
  
Recovery #3 - Redo:
  Replays CLR@601, CLR@602, CLR@701
  
Recovery #3 - Undo:
  undoNextLSN from CLR@701 is 200
  Undo LSN 200 → write CLR@801 (undoNextLSN=100)
  Undo LSN 100 → write CLR@802 (undoNextLSN=null)
  undoNextLSN=null means done!
  Write END@803 for T1
  
=== RECOVERY COMPLETE ===
 
Final Log: [BEGIN@10, UPD@100, UPD@200, UPD@300, UPD@400, UPD@500,
            CLR@601, CLR@602, CLR@701, CLR@801, CLR@802, END@803]
 
Database state: All T1's changes have been undone.
                T1 is fully rolled back.

Key Observations:

Each crash loses zero progress: Every completed undo remains as a CLR in the log
Redo recreates undo state: The CLRs are replayed, putting pages in the right state
Undo skips completed work: undoNextLSN jumps past what's already done
Bounded work per recovery: We only undo records that weren't undone before
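
These observations can be checked with a small simulation. The Python sketch below is a toy model under loud assumptions: one transaction, one record per page, every log record is durable as soon as it is appended, and no dirty page is ever flushed, so each restart rebuilds pages from the original on-disk values. The `recover` function and the record fields are hypothetical, and new LSNs are simply the previous LSN plus one rather than the 601/701 values in the trace.

```python
def recover(log, disk, max_undos):
    """Run one restart; "crash" after max_undos undo steps.

    Returns (pages, finished). Appends CLRs and END to `log` in place.
    """
    # Redo: repeat history, CLRs included, guarded by the PageLSN check.
    pages = {pid: dict(p) for pid, p in disk.items()}
    undo_next = None
    for rec in log:
        if rec["type"] == "END":
            return pages, True
        page = pages[rec["page"]]
        if page["lsn"] < rec["lsn"]:
            page["value"], page["lsn"] = rec["redo"], rec["lsn"]
        # Track where undo must resume (the single transaction is the loser).
        undo_next = rec["lsn"] if rec["type"] == "UPDATE" else rec["undo_next"]

    # Undo: follow undo_next, writing one CLR per undone update.
    by_lsn = {r["lsn"]: r for r in log}
    while undo_next is not None:
        if max_undos == 0:
            return pages, False  # simulated crash: in-memory pages are lost
        target = by_lsn[undo_next]
        lsn = log[-1]["lsn"] + 1
        log.append({"lsn": lsn, "type": "CLR", "page": target["page"],
                    "redo": target["before"], "undo_next": target["prev"]})
        pages[target["page"]].update(value=target["before"], lsn=lsn)
        undo_next = target["prev"]
        max_undos -= 1

    log.append({"lsn": log[-1]["lsn"] + 1, "type": "END"})
    return pages, True


# Five updates by one uncommitted transaction (page, before, after).
changes = [("P1", "A", "B"), ("P2", "X", "Y"), ("P3", "M", "N"),
           ("P4", "Q", "R"), ("P5", "U", "V")]
log, prev = [], None
for i, (pid, before, after) in enumerate(changes, start=1):
    log.append({"lsn": i * 100, "type": "UPDATE", "page": pid,
                "before": before, "redo": after, "prev": prev})
    prev = i * 100
disk = {pid: {"value": before, "lsn": 0} for pid, before, _ in changes}

_, done1 = recover(log, disk, max_undos=2)       # crash after two undos
_, done2 = recover(log, disk, max_undos=1)       # crash after one more
pages, done3 = recover(log, disk, max_undos=10)  # runs to completion

clrs = [r for r in log if r["type"] == "CLR"]
```

Across the three restarts the log gains exactly five CLRs, one per original update, which is the sense in which total undo work stays bounded no matter how many crashes intervene.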

Theoretical Guarantee:

Crash Resistance

CLR Space Considerations

CLRs add overhead to the log. For every update that gets undone, there's a corresponding CLR. This has implications for log space and performance.

Space Analysis:

Each CLR is roughly the same size as the original update record
In the worst case, rolling back a transaction roughly doubles its log footprint (one CLR per update)
For workloads in which transactions abort frequently, this can be significant
However, log space is generally cheaper than the alternatives
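
As a rough worked example of the arithmetic, the Python sketch below totals the log bytes for a fully rolled-back transaction. The record sizes and the `clr_ratio` parameter are illustrative assumptions, not measurements from any system; a ratio below 1.0 models a CLR that carries only redo information.

```python
def rollback_log_bytes(update_sizes, clr_ratio=1.0):
    """Log bytes for a fully rolled-back transaction: updates plus one CLR each."""
    updates = sum(update_sizes)
    clrs = sum(size * clr_ratio for size in update_sizes)
    return updates + clrs


sizes = [120, 80, 200]                                # hypothetical record sizes in bytes
same_size = rollback_log_bytes(sizes)                 # CLR as large as its update
redo_only = rollback_log_bytes(sizes, clr_ratio=0.5)  # CLR half the size
print(same_size / sum(sizes), redo_only / sum(sizes))
```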

Why Accept This Overhead?

The alternative approaches have their own costs:

Approaches to Undo Logging
Approach	Space Cost	Time Cost	Crash Resilience
CLRs (ARIES)	2× log for aborted txns	Efficient undo	Excellent (any #crashes)
No CLRs, restart from scratch	1× log	Potentially unbounded redo/undo	Poor (wasted work)
Checkpoint after each undo	1× log + checkpoint overhead	Very slow undo	Good but expensive
Force pages after undo	1× log + I/O per undo	Very slow undo	Good but expensive

Optimizations for CLR Size:

Some implementations reduce CLR overhead:

Compressed CLRs: Store only essential information; derive redo info from original record
CLR chaining: Batch multiple undos into fewer CLRs when safe
Logical undos: For some operations, a smaller logical representation suffices
Early log truncation: Once a transaction's END is written, its CLRs become eligible for truncation

Log Truncation:

CLRs are candidates for log truncation once:

The transaction's END record is written and forced
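The CLR's LSN is below the point where redo would start after a crash (before the most recent completed checkpoint's begin LSN, and below the smallest recLSN of any dirty page)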
No active transaction needs to see these records
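
Taken together, these conditions amount to taking a minimum over a few LSNs. The Python sketch below shows one conservative way to compute a truncation point. The `truncation_lsn` function and its parameter names are hypothetical, and a real system may apply further constraints (for example, retaining log for archival or media recovery).

```python
def truncation_lsn(checkpoint_begin_lsn, dirty_page_rec_lsns, active_txn_first_lsns):
    """Records with an LSN strictly below the returned value are safe to drop.

    checkpoint_begin_lsn:   begin LSN of the most recent completed checkpoint
    dirty_page_rec_lsns:    recLSN of each page still dirty in the buffer pool
    active_txn_first_lsns:  first LSN of each transaction without an END record
    """
    return min([checkpoint_begin_lsn, *dirty_page_rec_lsns, *active_txn_first_lsns])


# Hypothetical state: checkpoint began at 900, one page was first dirtied by
# CLR@701 and has not been flushed, and no transaction is still active.
cutoff = truncation_lsn(900, dirty_page_rec_lsns=[701, 950], active_txn_first_lsns=[])
removable = [lsn for lsn in (601, 602, 701, 801, 802) if lsn < cutoff]
```

In this example CLR@601 and CLR@602 can be dropped, while CLR@701 must stay because a page that depends on it has not yet reached disk.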

In practice, aggressive log truncation keeps log size manageable despite CLR overhead.

Transaction Abort Frequency

Summary: Compensation Log Records

Compensation Log Records are the mechanism that makes ARIES's undo phase crash-recoverable. They represent an elegant solution to a subtle problem. Let's consolidate the key concepts:

Key Takeaways

•CLRs document undo operations — They're log records that say "I undid this operation" with enough information to redo that undo.
•CLRs are redo-only — They're never undone because undoing an undo would restore the original uncommitted change.
•UndoNextLSN provides shortcuts — This field lets the undo phase skip past completed undos to find remaining work.
•Multiple crashes are handled gracefully — Redo replays CLRs, undo follows undoNextLSN, and progress is preserved.
•PageLSN is updated to CLR's LSN — This ensures redo correctly recognizes which pages already have the undo applied.
•CLRs form a chain within the transaction — Following undoNextLSN efficiently navigates undo progress.
•Space overhead is acceptable — CLRs roughly double log size for aborted transactions, but this is outweighed by the crash resilience they provide.
