Database Management SystemsWrite-Ahead Logging (WAL)

Write-Ahead Logging (WAL)

LevelIntermediate

Duration60 mins

TopicWrite-Ahead Logging (WAL)

3 / 5

Undo Information

Reversing the Irreversible

When a transaction aborts—whether due to a constraint violation, deadlock, explicit rollback, or system crash—the database must undo all changes made by that transaction. The data must return to exactly the state before the transaction started, as if the transaction never happened.

But how do you reverse a change once it's been made? How do you restore a balance of $1000 after it's been overwritten with $500? The answer lies in undo information—the before-images carefully preserved before any modification.

Undo information is one half of the logging equation. It ensures atomicity: uncommitted transactions can be completely reversed, maintaining database consistency after any failure.

What You Will Learn

This page covers undo information in depth: what it contains, how it's structured, when it's written, how it's applied during recovery, and advanced optimizations like undo segments. You'll understand the complete lifecycle of undo data from capture to application.

What is Undo Information?

Undo information, also called before-images or rollback data, is the values that existed before a modification was made. It provides the ability to reverse any change.

Formal Definition:

For any modification M that changes data item X from value V₁ to value V₂:

Undo information = V₁ (the original value)
Purpose = If we need to reverse M, we replace the current value with V₁

What Undo Information Contains:

Components of Undo Information
Component	Description	Example
Transaction ID	Which transaction made this modification	`T1024`
Object ID	What data item was modified (table, row, column)	`accounts.row_42.balance`
Operation Type	INSERT, UPDATE, or DELETE	`UPDATE`
Before-Image	The value before modification	`1000`
Undo Action	How to reverse (implicit or explicit)	Set balance = 1000

Undo Actions for Different Operations:

Original Operation	Undo Action	Before-Image Contains
INSERT new row	DELETE that row	The row's location/ID
DELETE existing row	INSERT that row back	Complete row data
UPDATE column value	UPDATE to old value	Original column value
UPDATE multiple columns	Restore all old values	All original values

Notice that DELETE is the 'most expensive' to undo—it requires storing the complete row that was deleted. UPDATE only needs the changed columns.

Before-Image vs Inverse Operation

Storing the before-image (V₁) is simpler than storing the inverse operation. Consider 'increment by 5'—the inverse is 'decrement by 5', but that's only correct if no other transaction modified the value. Storing V₁ directly is safe regardless of what else happened.

When Undo Information is Written

The timing of undo information logging is critical for the STEAL policy to work safely. Remember:

STEAL Policy: The buffer manager can write uncommitted pages to disk (to free buffer space)

For STEAL to be safe, we need the ability to undo any uncommitted change that reaches disk. This creates a strict timing requirement:

The Undo Rule:

Before any uncommitted modification can be written to disk, the log record containing that modification's undo information must be on stable storage.

Undo Information Timing

Pseudocode

function performModification(transaction: Transaction, 
                               page: Page, 
                               modification: Modification):
    
    // STEP 1: Capture before-image BEFORE making any change
    beforeImage = page.extractData(modification.location, modification.length)
    
    // STEP 2: Create log record with undo information
    logRecord = new UpdateLogRecord(
        LSN: log.nextLSN(),
        TransactionID: transaction.id,
        PrevLSN: transaction.lastLSN,
        PageID: page.id,
        Offset: modification.location,
        BeforeImage: beforeImage,      // UNDO information
        AfterImage: modification.newValue  // REDO information
    )
    
    // STEP 3: Write log record to log buffer
    log.append(logRecord)
    
    // STEP 4: Update page LSN (marks the latest modification)
    page.pageLSN = logRecord.LSN
    
    // STEP 5: NOW we can safely modify the page in memory
    page.apply(modification)
    
    // Mark page as dirty - eligible for flush to disk
    // But buffer manager will ensure log is flushed first (WAL)
    page.markDirty()
    
    // Track for transaction's undo chain
    transaction.lastLSN = logRecord.LSN

The Critical Sequence:

Read the current value — Capture what's there now
Create log record — Include the before-image
Update page LSN — Mark the page with this modification's LSN
Modify the page — Apply the actual change
Mark page dirty — Flag for eventual disk write

The page cannot be flushed to disk until the log record with the before-image reaches stable storage. The buffer manager enforces this using the page LSN.

Common Mistake

Modifying the page before capturing the before-image is a critical bug. Once you overwrite the value, the original is gone forever. You cannot reconstruct it. Always read-then-log-then-modify.

Undo During Transaction Abort

When a transaction aborts (either explicitly via ROLLBACK or implicitly due to errors), all its modifications must be undone. This happens in reverse order of the operations:

Why Reverse Order?

Consider a transaction that:

Sets X = 10 (was 5)
Sets X = 20 (was 10)
Sets X = 30 (was 20)

To undo correctly:

First undo step 3: X = 20
Then undo step 2: X = 10
Finally undo step 1: X = 5

If we undid in forward order (1, 2, 3), we'd end up with X = 20, not 5!

Transaction Abort Procedure

Pseudocode

function abortTransaction(transaction: Transaction):
    // Start from transaction's most recent log record
    currentLSN = transaction.lastLSN
    
    while currentLSN != NULL:
        logRecord = log.read(currentLSN)
        
        if logRecord.type == UPDATE:
            // Undo this modification
            page = bufferPool.getPage(logRecord.pageID)
            
            // Apply the before-image (restore old value)
            page.setData(
                logRecord.offset, 
                logRecord.beforeImage
            )
            
            // IMPORTANT: Log the undo action (Compensation Log Record)
            clr = new CLRLogRecord(
                LSN: log.nextLSN(),
                Type: "CLR",
                TransactionID: transaction.id,
                PrevLSN: currentLSN,
                UndoNextLSN: logRecord.prevLSN,  // Skip to next undo target
                PageID: logRecord.pageID,
                AfterImage: logRecord.beforeImage  // What we restored
            )
            log.append(clr)
            page.pageLSN = clr.LSN
            page.markDirty()
        
        // Move to previous operation in this transaction
        currentLSN = logRecord.prevLSN
    
    // Write ABORT record
    abortRecord = new AbortLogRecord(
        LSN: log.nextLSN(),
        TransactionID: transaction.id,
        PrevLSN: transaction.lastLSN
    )
    log.append(abortRecord)
    log.flush()  // Force ABORT to stable storage
    
    // Release all locks
    lockManager.releaseAll(transaction)

Compensation Log Records (CLRs):

Notice that we log each undo action as a CLR. Why log the undo?

Crash during abort — If the system crashes while aborting T, recovery must continue the abort. CLRs show which undos already completed.
No double-undo — The UndoNextLSN field in CLR points to the next record to undo, skipping the ones already undone. This prevents re-undoing the same operation.
Idempotence — CLRs make the abort process recoverable and restartable from any point.

CLRs themselves are never undone—they represent completed undo actions that must persist.

CLRs are Redo-Only

CLRs don't have before-images because they're never undone. They only have after-images (what the undo restored). During recovery, CLRs are redone if needed (to ensure the undo took effect) but never undone. The UndoNextLSN field allows recovery to skip over already-undone work efficiently.

Undo During Crash Recovery

After a crash, the system must identify and undo all uncommitted transactions. This is more complex than runtime abort because:

Multiple transactions may need undo simultaneously
Some transactions may have been mid-abort when crash occurred
Log may contain incomplete transaction chains

The Recovery Undo Phase (from ARIES):

During recovery, after the Analysis and Redo phases, the Undo phase processes uncommitted transactions:

Recovery Undo Phase

Pseudocode

function recoveryUndoPhase(loserTransactions: Set<TransactionID>,
                            lastLSN: Map<TransactionID, LSN>):
    
    // Build initial undo list: last LSN of each loser transaction
    // Priority queue ordered by LSN (highest first = reverse order)
    undoList = new MaxHeap<LSN>()
    
    for txn in loserTransactions:
        if lastLSN[txn] != NULL:
            undoList.insert(lastLSN[txn])
    
    while not undoList.isEmpty():
        // Get the highest LSN (most recent operation to undo)
        currentLSN = undoList.extractMax()
        logRecord = log.read(currentLSN)
        
        if logRecord.type == CLR:
            // This is a compensation record (already undone action)
            // Skip to what it says should be undone next
            if logRecord.undoNextLSN != NULL:
                undoList.insert(logRecord.undoNextLSN)
            else:
                // Transaction fully undone - write END record
                writeEndRecord(logRecord.transactionID)
        
        else if logRecord.type == UPDATE:
            // Undo this modification
            page = bufferPool.getPage(logRecord.pageID)
            
            // Only undo if the modification actually reached this page
            if page.pageLSN >= logRecord.LSN:
                // Apply before-image
                page.setData(logRecord.offset, logRecord.beforeImage)
                
                // Write CLR
                clr = new CLRLogRecord(
                    LSN: log.nextLSN(),
                    TransactionID: logRecord.transactionID,
                    UndoNextLSN: logRecord.prevLSN,
                    PageID: logRecord.pageID,
                    AfterImage: logRecord.beforeImage
                )
                log.append(clr)
                page.pageLSN = clr.LSN
                page.markDirty()
            
            // Continue with previous operation
            if logRecord.prevLSN != NULL:
                undoList.insert(logRecord.prevLSN)
            else:
                // Transaction start reached - fully undone
                writeEndRecord(logRecord.transactionID)
    
    log.flush()
    print("Undo phase complete")

Key Points About Recovery Undo:

Process all losers together — Using a max-heap naturally processes highest LSNs first, achieving reverse chronological order across all transactions.
CLRs guide skipping — When we encounter a CLR, we know its corresponding update was already undone. The UndoNextLSN tells us where to continue.
Conditional undo — We check page.pageLSN >= log.LSN because the modification might not have reached disk. No undo needed if it didn't.
END records — After fully undoing a transaction, we write an END record so future recoveries know this transaction is complete.

Efficiency of the Max-Heap

Processing all loser transactions together with a max-heap is more efficient than undoing each completely before starting the next. It minimizes the total number of page accesses since operations on the same page (from different transactions) may be adjacent in LSN order.

Undo Segments and Rollback Segments

Some database systems (notably Oracle and MySQL/InnoDB) use a separate structure for undo information called undo segments or rollback segments. This offers advantages over storing undo directly in the log:

The Undo Segment Architecture:

Instead of embedding before-images in the redo log:

Undo information is stored in dedicated undo tablespace
The redo log contains pointers to undo records
Undo segments are themselves logged for recovery

Undo Segment vs Log-Embedded Undo
Aspect	Log-Embedded Undo	Separate Undo Segments
Storage	Undo in redo log records	Undo in dedicated tablespace
Log Size	Larger (contains all before-images)	Smaller (just pointers)
Read Consistency	Complex to implement	Natural support via undo segments
Long Transactions	Log grows significantly	Undo space managed separately
MVCC	Requires additional structures	Undo segments provide old versions
Complexity	Simpler implementation	More complex, more flexible

Undo Segment Structure (InnoDB-style)

Pseudocode

struct UndoSegment:
    // Circular buffer of undo pages
    pages: List<UndoPage>
    
    // Currently active page for new undo records
    currentPage: UndoPage
    
    // Transaction that owns this segment (if single-transaction mode)
    owningTransaction: TransactionID | null
    
    // State: ACTIVE, CACHED, TO_FREE, TO_PURGE
    state: SegmentState
 
struct UndoPage:
    // List of undo records in this page
    records: List<UndoRecord>
    
    // Previous page in this transaction's chain
    prevPage: PagePointer | null
 
struct UndoRecord:
    // Transaction that created this undo
    transactionID: TransactionID
    
    // Type: INSERT_UNDO, UPDATE_UNDO, or DELETE_UNDO
    type: UndoType
    
    // Table and row identification
    tableID: uint32
    primaryKey: bytes
    
    // Previous version of the row (or row data for delete)
    beforeImage: bytes
    
    // Pointer to next older version (for MVCC chain)
    rollPointer: UndoPointer | null
    
    // Transaction ID that last modified before this
    prevTrxID: TransactionID
 
// Undo record creation example
function createUndoForUpdate(transaction: Transaction,
                            table: Table,
                            row: Row,
                            oldValues: bytes):
    
    // Allocate undo record in transaction's undo segment
    undoRecord = transaction.undoSegment.allocate(
        size: sizeof(UndoRecord) + len(oldValues)
    )
    
    undoRecord.transactionID = transaction.id
    undoRecord.type = UPDATE_UNDO
    undoRecord.tableID = table.id
    undoRecord.primaryKey = row.primaryKey
    undoRecord.beforeImage = oldValues
    undoRecord.rollPointer = row.currentRollPointer
    undoRecord.prevTrxID = row.currentTrxID
    
    // Update row's roll pointer to this new undo version
    row.rollPointer = address(undoRecord)
    row.trxID = transaction.id
    
    return undoRecord

Key Concepts:

Roll Pointer Chain — Each row has a pointer to its most recent undo record. That undo record points to the previous version. This creates a version chain for MVCC.
Undo Segment Reuse — After a transaction commits and no active readers need its old versions, undo space can be reclaimed. A purge process cleans old undo records.
Read Consistency — When a reader queries with an older snapshot, it follows the roll pointer chain to find the version visible to its snapshot.
Undo Logs are Logged — Changes to undo segments are themselves logged in the redo log. This allows recovery to reconstruct undo segments after a crash.

Undo for Inserts

For INSERTs, undo is special: the 'undo' of an insert is to delete the row. Insert undo records can be purged as soon as the transaction commits—no other transaction can have seen a row that didn't exist yet. Delete and update undos must wait until no reader needs them.

Undo and MVCC Integration

In systems with Multi-Version Concurrency Control (MVCC), undo information serves a dual purpose:

Transaction Rollback — Restore previous values when a transaction aborts
Read Consistency — Provide old versions to readers who need a historical snapshot

How MVCC Uses Undo:

When a reader queries with snapshot at time T₁, and the current row version was modified by a later transaction T₂:

Reader sees that current row.trxID = T₂ > T₁
Reader follows row.rollPointer to the undo record
Undo record contains the version before T₂'s modification
Reader checks if this version is visible to T₁
If not, continues down the roll pointer chain until finding visible version

MVCC Version Visibility

Pseudocode

function readRow(reader: Transaction, row: Row) -> RowVersion | null:
    // Start with current version in the data page
    currentVersion = row.currentData
    currentTrxID = row.trxID
    rollPointer = row.rollPointer
    
    while true:
        // Check if this version is visible to reader
        if isVisible(reader, currentTrxID):
            return currentVersion
        
        // Not visible - need an older version
        if rollPointer == null:
            // No older version exists - row didn't exist in reader's snapshot
            return null
        
        // Follow roll pointer to undo segment
        undoRecord = readUndoRecord(rollPointer)
        
        // Extract version data from undo record
        currentVersion = undoRecord.beforeImage
        currentTrxID = undoRecord.prevTrxID
        rollPointer = undoRecord.rollPointer
 
function isVisible(reader: Transaction, writerTrxID: TransactionID) -> bool:
    // A version is visible if:
    // 1. It was written by reader itself
    if writerTrxID == reader.id:
        return true
    
    // 2. Writer committed before reader's snapshot started
    if writerTrxID < reader.snapshotID:
        if isCommitted(writerTrxID):
            return true
    
    // 3. Writer is in reader's snapshot's "active list" -> not visible
    if writerTrxID in reader.activeList:
        return false
    
    // Default: not visible
    return false

Undo Purging with Active Readers:

Because undo records serve readers, they cannot be purged while any reader might need them. The system tracks:

Oldest Active Snapshot — The earliest snapshot timestamp still in use
Purge Threshold — Undo records from transactions committed before the oldest snapshot can be purged
Purge Process — Background thread that removes purgeable undo records

This creates a trade-off: long-running read transactions prevent undo purging, consuming storage. 'Long transaction undo bloat' is a common operational issue in PostgreSQL and MySQL.

Undo Bloat from Long Reads

A long-running query (or forgotten open transaction) can prevent undo purging for hours. Meanwhile, write transactions keep generating new undo records. This can exhaust undo tablespace. Monitor transaction age and set appropriate timeouts.

Optimizations and Advanced Topics

Production database systems employ several optimizations to make undo handling efficient:

1. Logical Undo

Instead of storing physical byte changes, store the logical operation and its inverse:

Logical Operation	Logical Undo
`INSERT INTO t VALUES (...)`	`DELETE FROM t WHERE pk = ...`
`DELETE FROM t WHERE pk = ...`	`INSERT INTO t VALUES (saved_row)`
`UPDATE t SET x = 10 WHERE pk = ...`	`UPDATE t SET x = old_value WHERE pk = ...`

Advantages: Smaller undo records, more flexible. Challenges: Must handle lost updates, concurrent modifications.

2. Partial Rollback (Savepoints)

Transactions can set savepoints and rollback to them, undoing only work after the savepoint:

Savepoint Implementation
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
BEGIN TRANSACTION;
 
INSERT INTO orders VALUES (1001, 'pending');
 
SAVEPOINT order_created;
 
INSERT INTO order_items VALUES (1001, 'item_a', 2);  -- Error here!
INSERT INTO order_items VALUES (1001, 'item_b', 1);
 
-- Oops, item_a was wrong!
ROLLBACK TO SAVEPOINT order_created;
 
-- Try again
INSERT INTO order_items VALUES (1001, 'item_c', 2);
 
COMMIT;

Savepoints work by recording the transaction's lastLSN at savepoint creation. Partial rollback undoes records back to that LSN, then continues the transaction.

3. In-Page Undo

Some systems store undo information within the data page itself, not in separate undo segments:

Row contains link to its previous version (within same page if possible)
Reduces I/O for most rollbacks
Complicates page layout

4. Group Undo

When many rows in the same table are undone:

Batch the undo operations
Single pass through affected pages
Reduces buffer pool pressure

Performance Considerations

•Undo segment contention — In high-concurrency systems, many transactions may compete for undo segment allocation. Use multiple undo segments.
•Undo I/O during rollback — Large rollbacks require reading many undo pages. Consider abort cost when designing transactions.
•Undo during recovery — Recovery undo can be the longest phase. More uncommitted work at crash time = longer recovery.
•Undo tablespace sizing — Size undo storage for worst-case: longest expected transaction × number of concurrent transactions × average undo per second.

Summary: Undo Information

We've explored undo information comprehensively—from basic concepts to advanced optimizations. Let's consolidate the key points:

Key Takeaways

•Undo information is before-images — The values that existed before modification, enabling reversal of changes.
•Undo must be logged before data can reach disk — This is the 'undo rule' of WAL, enabling the STEAL policy.
•Abort processes operations in reverse order — Ensures correct restoration of original state when multiple operations affect the same data.
•CLRs make undo restartable — Compensation Log Records track undo progress, enabling recovery to continue after crash-during-abort.
•Undo segments serve dual purpose — Rollback for aborted transactions AND old versions for MVCC readers.
•Undo retention depends on readers — Long-running reads prevent undo purging, potentially causing storage bloat.

What's Next:

Now that we understand undo information for reversing uncommitted changes, the next page explores redo information—the after-images that enable replaying committed transactions. Redo is the complement to undo, ensuring durability rather than atomicity.

Page Complete

You now understand undo information—the before-images that enable transaction rollback and MVCC versioning. Undo ensures atomicity: transactions that don't commit leave no trace. Next, we'll explore redo information, which ensures durability: committed transactions survive any failure.

3 / 5

Loading learning content...

Database Management SystemsWrite-Ahead Logging (WAL)

Write-Ahead Logging (WAL)

LevelIntermediate

Duration60 mins

TopicWrite-Ahead Logging (WAL)

3 / 5

Undo Information

Reversing the Irreversible

Undo information is one half of the logging equation. It ensures atomicity: uncommitted transactions can be completely reversed, maintaining database consistency after any failure.

What You Will Learn

What is Undo Information?

Undo information, also called before-images or rollback data, is the values that existed before a modification was made. It provides the ability to reverse any change.

Formal Definition:

For any modification M that changes data item X from value V₁ to value V₂:

Undo information = V₁ (the original value)
Purpose = If we need to reverse M, we replace the current value with V₁

What Undo Information Contains:

Components of Undo Information
Component	Description	Example
Transaction ID	Which transaction made this modification	`T1024`
Object ID	What data item was modified (table, row, column)	`accounts.row_42.balance`
Operation Type	INSERT, UPDATE, or DELETE	`UPDATE`
Before-Image	The value before modification	`1000`
Undo Action	How to reverse (implicit or explicit)	Set balance = 1000

Undo Actions for Different Operations:

Original Operation	Undo Action	Before-Image Contains
INSERT new row	DELETE that row	The row's location/ID
DELETE existing row	INSERT that row back	Complete row data
UPDATE column value	UPDATE to old value	Original column value
UPDATE multiple columns	Restore all old values	All original values

Notice that DELETE is the 'most expensive' to undo—it requires storing the complete row that was deleted. UPDATE only needs the changed columns.

Before-Image vs Inverse Operation

When Undo Information is Written

The timing of undo information logging is critical for the STEAL policy to work safely. Remember:

STEAL Policy: The buffer manager can write uncommitted pages to disk (to free buffer space)

For STEAL to be safe, we need the ability to undo any uncommitted change that reaches disk. This creates a strict timing requirement:

The Undo Rule:

Before any uncommitted modification can be written to disk, the log record containing that modification's undo information must be on stable storage.

Undo Information Timing

Pseudocode

function performModification(transaction: Transaction, 
                               page: Page, 
                               modification: Modification):
    
    // STEP 1: Capture before-image BEFORE making any change
    beforeImage = page.extractData(modification.location, modification.length)
    
    // STEP 2: Create log record with undo information
    logRecord = new UpdateLogRecord(
        LSN: log.nextLSN(),
        TransactionID: transaction.id,
        PrevLSN: transaction.lastLSN,
        PageID: page.id,
        Offset: modification.location,
        BeforeImage: beforeImage,      // UNDO information
        AfterImage: modification.newValue  // REDO information
    )
    
    // STEP 3: Write log record to log buffer
    log.append(logRecord)
    
    // STEP 4: Update page LSN (marks the latest modification)
    page.pageLSN = logRecord.LSN
    
    // STEP 5: NOW we can safely modify the page in memory
    page.apply(modification)
    
    // Mark page as dirty - eligible for flush to disk
    // But buffer manager will ensure log is flushed first (WAL)
    page.markDirty()
    
    // Track for transaction's undo chain
    transaction.lastLSN = logRecord.LSN

The Critical Sequence:

Read the current value — Capture what's there now
Create log record — Include the before-image
Update page LSN — Mark the page with this modification's LSN
Modify the page — Apply the actual change
Mark page dirty — Flag for eventual disk write

The page cannot be flushed to disk until the log record with the before-image reaches stable storage. The buffer manager enforces this using the page LSN.

Common Mistake

Modifying the page before capturing the before-image is a critical bug. Once you overwrite the value, the original is gone forever. You cannot reconstruct it. Always read-then-log-then-modify.

Undo During Transaction Abort

When a transaction aborts (either explicitly via ROLLBACK or implicitly due to errors), all its modifications must be undone. This happens in reverse order of the operations:

Why Reverse Order?

Consider a transaction that:

Sets X = 10 (was 5)
Sets X = 20 (was 10)
Sets X = 30 (was 20)

To undo correctly:

First undo step 3: X = 20
Then undo step 2: X = 10
Finally undo step 1: X = 5

If we undid in forward order (1, 2, 3), we'd end up with X = 20, not 5!

Transaction Abort Procedure

Pseudocode

function abortTransaction(transaction: Transaction):
    // Start from transaction's most recent log record
    currentLSN = transaction.lastLSN
    
    while currentLSN != NULL:
        logRecord = log.read(currentLSN)
        
        if logRecord.type == UPDATE:
            // Undo this modification
            page = bufferPool.getPage(logRecord.pageID)
            
            // Apply the before-image (restore old value)
            page.setData(
                logRecord.offset, 
                logRecord.beforeImage
            )
            
            // IMPORTANT: Log the undo action (Compensation Log Record)
            clr = new CLRLogRecord(
                LSN: log.nextLSN(),
                Type: "CLR",
                TransactionID: transaction.id,
                PrevLSN: currentLSN,
                UndoNextLSN: logRecord.prevLSN,  // Skip to next undo target
                PageID: logRecord.pageID,
                AfterImage: logRecord.beforeImage  // What we restored
            )
            log.append(clr)
            page.pageLSN = clr.LSN
            page.markDirty()
        
        // Move to previous operation in this transaction
        currentLSN = logRecord.prevLSN
    
    // Write ABORT record
    abortRecord = new AbortLogRecord(
        LSN: log.nextLSN(),
        TransactionID: transaction.id,
        PrevLSN: transaction.lastLSN
    )
    log.append(abortRecord)
    log.flush()  // Force ABORT to stable storage
    
    // Release all locks
    lockManager.releaseAll(transaction)

Compensation Log Records (CLRs):

Notice that we log each undo action as a CLR. Why log the undo?

Crash during abort — If the system crashes while aborting T, recovery must continue the abort. CLRs show which undos already completed.
No double-undo — The UndoNextLSN field in CLR points to the next record to undo, skipping the ones already undone. This prevents re-undoing the same operation.
Idempotence — CLRs make the abort process recoverable and restartable from any point.

CLRs themselves are never undone—they represent completed undo actions that must persist.

CLRs are Redo-Only

Undo During Crash Recovery

After a crash, the system must identify and undo all uncommitted transactions. This is more complex than runtime abort because:

Multiple transactions may need undo simultaneously
Some transactions may have been mid-abort when crash occurred
Log may contain incomplete transaction chains

The Recovery Undo Phase (from ARIES):

During recovery, after the Analysis and Redo phases, the Undo phase processes uncommitted transactions:

Recovery Undo Phase

Pseudocode

function recoveryUndoPhase(loserTransactions: Set<TransactionID>,
                            lastLSN: Map<TransactionID, LSN>):
    
    // Build initial undo list: last LSN of each loser transaction
    // Priority queue ordered by LSN (highest first = reverse order)
    undoList = new MaxHeap<LSN>()
    
    for txn in loserTransactions:
        if lastLSN[txn] != NULL:
            undoList.insert(lastLSN[txn])
    
    while not undoList.isEmpty():
        // Get the highest LSN (most recent operation to undo)
        currentLSN = undoList.extractMax()
        logRecord = log.read(currentLSN)
        
        if logRecord.type == CLR:
            // This is a compensation record (already undone action)
            // Skip to what it says should be undone next
            if logRecord.undoNextLSN != NULL:
                undoList.insert(logRecord.undoNextLSN)
            else:
                // Transaction fully undone - write END record
                writeEndRecord(logRecord.transactionID)
        
        else if logRecord.type == UPDATE:
            // Undo this modification
            page = bufferPool.getPage(logRecord.pageID)
            
            // Only undo if the modification actually reached this page
            if page.pageLSN >= logRecord.LSN:
                // Apply before-image
                page.setData(logRecord.offset, logRecord.beforeImage)
                
                // Write CLR
                clr = new CLRLogRecord(
                    LSN: log.nextLSN(),
                    TransactionID: logRecord.transactionID,
                    UndoNextLSN: logRecord.prevLSN,
                    PageID: logRecord.pageID,
                    AfterImage: logRecord.beforeImage
                )
                log.append(clr)
                page.pageLSN = clr.LSN
                page.markDirty()
            
            // Continue with previous operation
            if logRecord.prevLSN != NULL:
                undoList.insert(logRecord.prevLSN)
            else:
                // Transaction start reached - fully undone
                writeEndRecord(logRecord.transactionID)
    
    log.flush()
    print("Undo phase complete")

Key Points About Recovery Undo:

Process all losers together — Using a max-heap naturally processes highest LSNs first, achieving reverse chronological order across all transactions.
CLRs guide skipping — When we encounter a CLR, we know its corresponding update was already undone. The UndoNextLSN tells us where to continue.
Conditional undo — We check page.pageLSN >= log.LSN because the modification might not have reached disk. No undo needed if it didn't.
END records — After fully undoing a transaction, we write an END record so future recoveries know this transaction is complete.

Efficiency of the Max-Heap

Undo Segments and Rollback Segments

The Undo Segment Architecture:

Instead of embedding before-images in the redo log:

Undo information is stored in dedicated undo tablespace
The redo log contains pointers to undo records
Undo segments are themselves logged for recovery

Undo Segment vs Log-Embedded Undo
Aspect	Log-Embedded Undo	Separate Undo Segments
Storage	Undo in redo log records	Undo in dedicated tablespace
Log Size	Larger (contains all before-images)	Smaller (just pointers)
Read Consistency	Complex to implement	Natural support via undo segments
Long Transactions	Log grows significantly	Undo space managed separately
MVCC	Requires additional structures	Undo segments provide old versions
Complexity	Simpler implementation	More complex, more flexible

Undo Segment Structure (InnoDB-style)

Pseudocode

struct UndoSegment:
    // Circular buffer of undo pages
    pages: List<UndoPage>
    
    // Currently active page for new undo records
    currentPage: UndoPage
    
    // Transaction that owns this segment (if single-transaction mode)
    owningTransaction: TransactionID | null
    
    // State: ACTIVE, CACHED, TO_FREE, TO_PURGE
    state: SegmentState
 
struct UndoPage:
    // List of undo records in this page
    records: List<UndoRecord>
    
    // Previous page in this transaction's chain
    prevPage: PagePointer | null
 
struct UndoRecord:
    // Transaction that created this undo
    transactionID: TransactionID
    
    // Type: INSERT_UNDO, UPDATE_UNDO, or DELETE_UNDO
    type: UndoType
    
    // Table and row identification
    tableID: uint32
    primaryKey: bytes
    
    // Previous version of the row (or row data for delete)
    beforeImage: bytes
    
    // Pointer to next older version (for MVCC chain)
    rollPointer: UndoPointer | null
    
    // Transaction ID that last modified before this
    prevTrxID: TransactionID
 
// Undo record creation example
function createUndoForUpdate(transaction: Transaction,
                            table: Table,
                            row: Row,
                            oldValues: bytes):
    
    // Allocate undo record in transaction's undo segment
    undoRecord = transaction.undoSegment.allocate(
        size: sizeof(UndoRecord) + len(oldValues)
    )
    
    undoRecord.transactionID = transaction.id
    undoRecord.type = UPDATE_UNDO
    undoRecord.tableID = table.id
    undoRecord.primaryKey = row.primaryKey
    undoRecord.beforeImage = oldValues
    undoRecord.rollPointer = row.currentRollPointer
    undoRecord.prevTrxID = row.currentTrxID
    
    // Update row's roll pointer to this new undo version
    row.rollPointer = address(undoRecord)
    row.trxID = transaction.id
    
    return undoRecord

Key Concepts:

Roll Pointer Chain — Each row has a pointer to its most recent undo record. That undo record points to the previous version. This creates a version chain for MVCC.
Undo Segment Reuse — After a transaction commits and no active readers need its old versions, undo space can be reclaimed. A purge process cleans old undo records.
Read Consistency — When a reader queries with an older snapshot, it follows the roll pointer chain to find the version visible to its snapshot.
Undo Logs are Logged — Changes to undo segments are themselves logged in the redo log. This allows recovery to reconstruct undo segments after a crash.

Undo for Inserts

Undo and MVCC Integration

In systems with Multi-Version Concurrency Control (MVCC), undo information serves a dual purpose:

Transaction Rollback — Restore previous values when a transaction aborts
Read Consistency — Provide old versions to readers who need a historical snapshot

How MVCC Uses Undo:

When a reader queries with snapshot at time T₁, and the current row version was modified by a later transaction T₂:

Reader sees that current row.trxID = T₂ > T₁
Reader follows row.rollPointer to the undo record
Undo record contains the version before T₂'s modification
Reader checks if this version is visible to T₁
If not, continues down the roll pointer chain until finding visible version

MVCC Version Visibility

Pseudocode

function readRow(reader: Transaction, row: Row) -> RowVersion | null:
    // Start with current version in the data page
    currentVersion = row.currentData
    currentTrxID = row.trxID
    rollPointer = row.rollPointer
    
    while true:
        // Check if this version is visible to reader
        if isVisible(reader, currentTrxID):
            return currentVersion
        
        // Not visible - need an older version
        if rollPointer == null:
            // No older version exists - row didn't exist in reader's snapshot
            return null
        
        // Follow roll pointer to undo segment
        undoRecord = readUndoRecord(rollPointer)
        
        // Extract version data from undo record
        currentVersion = undoRecord.beforeImage
        currentTrxID = undoRecord.prevTrxID
        rollPointer = undoRecord.rollPointer
 
function isVisible(reader: Transaction, writerTrxID: TransactionID) -> bool:
    // A version is visible if:
    // 1. It was written by reader itself
    if writerTrxID == reader.id:
        return true
    
    // 2. Writer committed before reader's snapshot started
    if writerTrxID < reader.snapshotID:
        if isCommitted(writerTrxID):
            return true
    
    // 3. Writer is in reader's snapshot's "active list" -> not visible
    if writerTrxID in reader.activeList:
        return false
    
    // Default: not visible
    return false

Undo Purging with Active Readers:

Because undo records serve readers, they cannot be purged while any reader might need them. The system tracks:

Oldest Active Snapshot — The earliest snapshot timestamp still in use
Purge Threshold — Undo records from transactions committed before the oldest snapshot can be purged
Purge Process — Background thread that removes purgeable undo records

This creates a trade-off: long-running read transactions prevent undo purging, consuming storage. 'Long transaction undo bloat' is a common operational issue in PostgreSQL and MySQL.

Undo Bloat from Long Reads

Optimizations and Advanced Topics

Production database systems employ several optimizations to make undo handling efficient:

1. Logical Undo

Instead of storing physical byte changes, store the logical operation and its inverse:

Logical Operation	Logical Undo
`INSERT INTO t VALUES (...)`	`DELETE FROM t WHERE pk = ...`
`DELETE FROM t WHERE pk = ...`	`INSERT INTO t VALUES (saved_row)`
`UPDATE t SET x = 10 WHERE pk = ...`	`UPDATE t SET x = old_value WHERE pk = ...`

Advantages: Smaller undo records, more flexible. Challenges: Must handle lost updates, concurrent modifications.

2. Partial Rollback (Savepoints)

Transactions can set savepoints and rollback to them, undoing only work after the savepoint:

Savepoint Implementation
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
BEGIN TRANSACTION;
 
INSERT INTO orders VALUES (1001, 'pending');
 
SAVEPOINT order_created;
 
INSERT INTO order_items VALUES (1001, 'item_a', 2);  -- Error here!
INSERT INTO order_items VALUES (1001, 'item_b', 1);
 
-- Oops, item_a was wrong!
ROLLBACK TO SAVEPOINT order_created;
 
-- Try again
INSERT INTO order_items VALUES (1001, 'item_c', 2);
 
COMMIT;

Savepoints work by recording the transaction's lastLSN at savepoint creation. Partial rollback undoes records back to that LSN, then continues the transaction.

3. In-Page Undo

Some systems store undo information within the data page itself, not in separate undo segments:

Row contains link to its previous version (within same page if possible)
Reduces I/O for most rollbacks
Complicates page layout

4. Group Undo

When many rows in the same table are undone:

Batch the undo operations
Single pass through affected pages
Reduces buffer pool pressure

Performance Considerations

•Undo segment contention — In high-concurrency systems, many transactions may compete for undo segment allocation. Use multiple undo segments.
•Undo I/O during rollback — Large rollbacks require reading many undo pages. Consider abort cost when designing transactions.
•Undo during recovery — Recovery undo can be the longest phase. More uncommitted work at crash time = longer recovery.
•Undo tablespace sizing — Size undo storage for worst-case: longest expected transaction × number of concurrent transactions × average undo per second.

Summary: Undo Information

We've explored undo information comprehensively—from basic concepts to advanced optimizations. Let's consolidate the key points:

Key Takeaways

•Undo information is before-images — The values that existed before modification, enabling reversal of changes.
•Undo must be logged before data can reach disk — This is the 'undo rule' of WAL, enabling the STEAL policy.
•Abort processes operations in reverse order — Ensures correct restoration of original state when multiple operations affect the same data.
•CLRs make undo restartable — Compensation Log Records track undo progress, enabling recovery to continue after crash-during-abort.
•Undo segments serve dual purpose — Rollback for aborted transactions AND old versions for MVCC readers.
•Undo retention depends on readers — Long-running reads prevent undo purging, potentially causing storage bloat.

What's Next:

Page Complete

3 / 5