Loading learning content...
Consider a troubling scenario: the database system crashes during the undo phase of recovery. You were halfway through rolling back three incomplete transactions when power failed. What happens when the system restarts?
Without special precautions, you'd face a dilemma: some undo operations succeeded, some pages have been reverted, but there's no record of this progress. Do you restart undo from the beginning? That could apply before-images twice, corrupting data. Do you skip what was already done? But how do you know what was done?
Compensation Log Records (CLRs) solve this problem elegantly. They are special log records that document undo operations, making them durable and allowing recovery to "repeat history" for undos just as it does for regular operations.
By the end of this page, you will understand the structure and purpose of CLRs, why they are "redo-only" records, how the undoNextLSN field enables skipping during repeated recovery, and how CLRs form a chain that tracks undo progress through any number of crash-recovery cycles.
Before diving into CLR structure, let's thoroughly understand the problem they solve. Imagine we're undoing a transaction T1 that made three updates:
T1's Log Records:
During recovery's undo phase:
Now the system restarts again. What state are we in?
The Key Insight:
CLRs turn undo operations into regular logged operations that follow the same rules as everything else:
This elegantly reuses the existing redo machinery to handle crash-during-undo scenarios, without requiring any special case logic.
ARIES follows a "repeating history" paradigm: during redo, recreate the exact state at crash time, including any partial undo progress. This means redo treats CLRs just like regular log records—it replays them. Then undo can simply continue from where it left off, knowing that all previous undo progress has been restored.
A Compensation Log Record has a specific structure designed to support both redo (during recovery) and undo-chain management. Let's examine each field:
Standard Log Record Fields: Like all log records, CLRs have an LSN, transaction ID, and PrevLSN. These enable the CLR to be part of the transaction's log chain.
CLR-Specific Fields:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081
/** * Compensation Log Record Structure * * A CLR documents an undo operation, making it durable * and enabling proper handling during repeated recovery. */interface CompensationLogRecord { // ===== Standard Log Record Fields ===== /** Unique Log Sequence Number for this CLR */ lsn: LogSequenceNumber; /** Which transaction this CLR belongs to */ transactionId: TransactionId; /** * Previous log record for this transaction. * For CLRs, this points to the previous log record * (could be another CLR or an update record) */ prevLSN: LogSequenceNumber | null; /** Type identifier */ recordType: LogRecordType.CLR; // ===== CLR-Specific Fields ===== /** * THE KEY FIELD: Points to the next record to undo. * This is the PrevLSN of the record that was just undone. * * Example: If we undid record at LSN 200, and that record's * PrevLSN was 100, then undoNextLSN = 100. * * If the undone record had no PrevLSN (was the BEGIN), * undoNextLSN = null (indicating undo is complete). */ undoNextLSN: LogSequenceNumber | null; /** * The page that was modified by this undo operation. * Needed during redo to re-apply the CLR. */ pageId: PageId; /** * The redo information: what was written to the page. * This is the before-image from the original update * that we are undoing. */ redoInfo: { /** Offset within the page */ offset: number; /** Length of the data */ length: number; /** The data that was written (the before-image) */ data: Buffer; }; /** * LSN of the original update record that was undone. * Useful for debugging and log analysis. */ undoLSN: LogSequenceNumber;} // Example CLR:const exampleCLR: CompensationLogRecord = { lsn: 450, // New LSN for this CLR transactionId: 'T1', prevLSN: 300, // Points to the last CLR or update by T1 recordType: LogRecordType.CLR, undoNextLSN: 100, // Next record to undo for T1 pageId: 'P2', redoInfo: { offset: 42, length: 8, data: Buffer.from('X') // The before-image we restored }, undoLSN: 200 // We undid the record at LSN 200};The UndoNextLSN Field Explained:
This is the most important field in a CLR. Understanding it is key to understanding how ARIES handles crash-during-undo:
This creates a shortcut chain: during undo, when we encounter a CLR, we jump directly to undoNextLSN instead of walking through all the intermediate records.
The CLR stores redo information (what to write during redo), which happens to be the before-image from the original update. It doesn't need to store an "undo" image because CLRs are never undone—they're redo-only. This reduces log space and simplifies the design.
A fundamental property of CLRs is that they are never undone. This might seem strange at first—if we can undo regular updates, why can't we undo CLRs? The answer lies in what it would mean to undo an undo.
The Logical Contradiction:
Suppose transaction T1 updated page P, changing value A to B:
If we "undid" the CLR, we would be:
But wait—this would re-apply the original change! We'd be putting T1's uncommitted change back into the database. This completely defeats the purpose of undo.
The ARIES Solution:
ARIES makes a clean design choice: CLRs are redo-only. They have no "undo" information because the concept doesn't make sense. During undo processing:
| Record Type | Has Undo Info? | Has Redo Info? | During Redo Phase | During Undo Phase |
|---|---|---|---|---|
| UPDATE | Yes (before-image) | Yes (after-image) | Apply after-image if needed | Apply before-image, write CLR |
| CLR | No | Yes (before-image of undone op) | Apply redo info if needed | Do NOT undo; follow undoNextLSN |
| COMMIT | N/A | N/A | Mark txn as committed | Should never encounter (txn is winner) |
| BEGIN | N/A | N/A | Initialize txn state | Write END record, remove from ToUndo |
| END | N/A | N/A | Remove txn from table | Should never encounter |
Mathematical Perspective:
If we denote the original operation as O (with inverse O⁻¹), then:
Undoing the CLR would be (O⁻¹)⁻¹ = O, which brings us back to the original operation. But we're trying to remove the effects of O! The only sensible approach is to treat CLRs as facts about the undo process, not as reversible operations.
The Practical Implication:
Because CLRs are redo-only, they're simpler than regular update records. They don't need a "before-before-image" (what the data was before the undo). They only need enough information to redo the undo if there's another crash. This is why CLRs can be smaller than the original update records in some cases.
Any recovery algorithm that attempts to undo CLRs is fundamentally broken. Undoing an undo produces the wrong result and violates atomicity. The undoNextLSN pointer exists precisely to skip CLRs during the undo phase while still tracking progress.
CLRs form a chain through their undoNextLSN pointers that provides efficient navigation during undo. This chain is separate from (but connected to) the regular PrevLSN chain that links all records for a transaction.
Two Interleaved Chains:
Let's visualize this with a concrete example:
In the diagram above:
Walking Through the Scenario:
Notice how the second recovery doesn't need special logic. It simply processes the CLRs like any other log record during redo, then follows undoNextLSN during undo. The system naturally resumes from where it left off. This uniformity is a hallmark of ARIES's elegant design.
When a CLR is written for an undo operation, the affected page's PageLSN is updated to the CLR's LSN. This has important implications for the redo phase.
Why Update PageLSN?
The PageLSN on a page indicates: "This page reflects all log records up to and including this LSN." By setting PageLSN to the CLR's LSN:
The Redo Phase Check:
During redo, for each log record (including CLRs), we check:
if (logRecord.lsn > page.pageLSN) {
// Redo this operation - it wasn't persisted
applyOperation(logRecord);
page.pageLSN = logRecord.lsn;
} else {
// Page already reflects this operation - skip
}
12345678910111213141516171819202122232425262728293031323334353637383940
PROCEDURE WriteCLRAndUpdatePage(undoingLSN, beforeImage, pageId): // Step 1: Fetch the page (may already be in buffer) page = bufferPool.Fetch(pageId) page.AcquireExclusiveLatch() TRY: // Step 2: Apply the before-image (the undo operation) page.ApplyBeforeImage(beforeImage) // Step 3: Construct the CLR clr = { type: CLR, transactionId: getCurrentTransaction(), prevLSN: transactionTable[txnId].lastLSN, undoNextLSN: log[undoingLSN].prevLSN, // Skip to next pageId: pageId, redoInfo: beforeImage, // What redo should apply undoLSN: undoingLSN // What we're undoing } // Step 4: Write CLR to log buffer (WAL: log before data) clr.lsn = log.Append(clr) // Step 5: Force CLR to stable storage log.ForceToLSN(clr.lsn) // Usually done asynchronously in batches // Step 6: Update page's PageLSN to the CLR's LSN page.SetPageLSN(clr.lsn) // Step 7: Update transaction table transactionTable[txnId].lastLSN = clr.lsn transactionTable[txnId].undoNextLSN = clr.undoNextLSN // Step 8: Mark page as dirty (if not already) bufferPool.MarkDirty(pageId) FINALLY: page.ReleaseExclusiveLatch() RETURN clr.lsnTiming Considerations:
The order of operations matters:
The WAL protocol requirement is critical: the CLR must be on stable storage before we consider the undo complete. In practice, systems often batch log writes, so the CLR might sit in the log buffer briefly. But before the page can be written to disk, the log must be forced to at least that CLR's LSN.
During normal transaction abort, the system might not force each CLR individually—it can batch them. But the final END record must be forced, and WAL ensures any dirty pages with CLR-updated PageLSNs can't be written until those CLRs are stable. During recovery, there's no commit to force, but the log writes during undo still follow WAL semantics.
One of ARIES's most impressive properties is its ability to handle any number of crashes during recovery without losing progress or corrupting data. This is entirely due to the CLR mechanism.
Invariant: Progress is Always Preserved
No matter when a crash occurs:
Let's trace through a worst-case scenario with many crashes:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657
SCENARIO: Transaction T1 with 5 updates, followed by many crashes Initial State: Log: [BEGIN@10, UPD@100, UPD@200, UPD@300, UPD@400, UPD@500] T1 is uncommitted (no COMMIT record) === CRASH #1 === Recovery #1 - Analysis: T1 is loser, lastLSN=500, undoNextLSN=500 Recovery #1 - Redo: Replays all updates (nothing new here) Recovery #1 - Undo: Undo LSN 500 → write CLR@601 (undoNextLSN=400) Undo LSN 400 → write CLR@602 (undoNextLSN=300) [CRASH #2 DURING UNDO] === CRASH #2 === Recovery #2 - Analysis: Sees T1 still active (no COMMIT or END) lastLSN=602 (the latest CLR) Recovery #2 - Redo: Replays UPD@100-500 (pages already current, skipped) Replays CLR@601 and CLR@602 (applies them if needed) Recovery #2 - Undo: T1's undoNextLSN from CLR@602 is 300 Start from LSN 300 (not 500!) Undo LSN 300 → write CLR@701 (undoNextLSN=200) [CRASH #3 DURING UNDO] === CRASH #3 === Recovery #3 - Analysis: T1 still active, lastLSN=701 Recovery #3 - Redo: Replays CLR@601, CLR@602, CLR@701 Recovery #3 - Undo: undoNextLSN from CLR@701 is 200 Undo LSN 200 → write CLR@801 (undoNextLSN=100) Undo LSN 100 → write CLR@802 (undoNextLSN=null) undoNextLSN=null means done! Write END@803 for T1 === RECOVERY COMPLETE === Final Log: [BEGIN@10, UPD@100, UPD@200, UPD@300, UPD@400, UPD@500, CLR@601, CLR@602, CLR@701, CLR@801, CLR@802, END@803] Database state: All T1's changes have been undone. T1 is fully rolled back.Key Observations:
Theoretical Guarantee:
If a transaction has N updates to undo, the total work across any number of recoveries is still O(N). Each update is undone exactly once, and its CLR is potentially redone multiple times (on each recovery), but redo is cheap compared to undo.
ARIES can survive any number of crashes during recovery. The CLR mechanism ensures progress is never lost, work is never repeated incorrectly, and the system eventually completes recovery. This property is essential for systems that must recover from power failures, hardware issues, or software crashes.
CLRs add overhead to the log. For every update that gets undone, there's a corresponding CLR. This has implications for log space and performance.
Space Analysis:
Why Accept This Overhead?
The alternative approaches have their own costs:
| Approach | Space Cost | Time Cost | Crash Resilience |
|---|---|---|---|
| CLRs (ARIES) | 2× log for aborted txns | Efficient undo | Excellent (any #crashes) |
| No CLRs, restart from scratch | 1× log | Potentially unbounded redo/undo | Poor (wasted work) |
| Checkpoint after each undo | 1× log + checkpoint overhead | Very slow undo | Good but expensive |
| Force pages after undo | 1× log + I/O per undo | Very slow undo | Good but expensive |
Optimizations for CLR Size:
Some implementations reduce CLR overhead:
Log Truncation:
CLRs are candidates for log truncation once:
In practice, aggressive log truncation keeps log size manageable despite CLR overhead.
Well-designed applications have low abort rates. If 1% of transactions abort, CLR overhead is only 2% of that 1% = 0.02% increase in log size. The overhead is proportional to abort rate, so CLRs have minimal impact in well-behaved systems.
Compensation Log Records are the mechanism that makes ARIES's undo phase crash-recoverable. They represent a elegant solution to a subtle problem. Let's consolidate the key concepts:
What's Next:
With CLRs understood, we can explore the backward scan mechanism in detail on the next page. We'll see how ARIES efficiently processes the log in reverse order, using the ToUndo set and undoNextLSN pointers to minimize I/O while undoing multiple transactions in parallel.
You now understand Compensation Log Records—their structure, purpose, and critical role in crash-resistant recovery. CLRs are what allow ARIES to guarantee progress through any number of crashes, making undo operations as durable as the original operations they reverse.