Loading learning content...
When a transaction aborts—whether due to a constraint violation, deadlock, explicit rollback, or system crash—the database must undo all changes made by that transaction. The data must return to exactly the state before the transaction started, as if the transaction never happened.
But how do you reverse a change once it's been made? How do you restore a balance of $1000 after it's been overwritten with $500? The answer lies in undo information—the before-images carefully preserved before any modification.
Undo information is one half of the logging equation. It ensures atomicity: uncommitted transactions can be completely reversed, maintaining database consistency after any failure.
This page covers undo information in depth: what it contains, how it's structured, when it's written, how it's applied during recovery, and advanced optimizations like undo segments. You'll understand the complete lifecycle of undo data from capture to application.
Undo information, also called before-images or rollback data, is the values that existed before a modification was made. It provides the ability to reverse any change.
Formal Definition:
For any modification M that changes data item X from value V₁ to value V₂:
What Undo Information Contains:
| Component | Description | Example |
|---|---|---|
| Transaction ID | Which transaction made this modification | T1024 |
| Object ID | What data item was modified (table, row, column) | accounts.row_42.balance |
| Operation Type | INSERT, UPDATE, or DELETE | UPDATE |
| Before-Image | The value before modification | 1000 |
| Undo Action | How to reverse (implicit or explicit) | Set balance = 1000 |
Undo Actions for Different Operations:
| Original Operation | Undo Action | Before-Image Contains |
|---|---|---|
| INSERT new row | DELETE that row | The row's location/ID |
| DELETE existing row | INSERT that row back | Complete row data |
| UPDATE column value | UPDATE to old value | Original column value |
| UPDATE multiple columns | Restore all old values | All original values |
Notice that DELETE is the 'most expensive' to undo—it requires storing the complete row that was deleted. UPDATE only needs the changed columns.
Storing the before-image (V₁) is simpler than storing the inverse operation. Consider 'increment by 5'—the inverse is 'decrement by 5', but that's only correct if no other transaction modified the value. Storing V₁ directly is safe regardless of what else happened.
The timing of undo information logging is critical for the STEAL policy to work safely. Remember:
STEAL Policy: The buffer manager can write uncommitted pages to disk (to free buffer space)
For STEAL to be safe, we need the ability to undo any uncommitted change that reaches disk. This creates a strict timing requirement:
The Undo Rule:
Before any uncommitted modification can be written to disk, the log record containing that modification's undo information must be on stable storage.
123456789101112131415161718192021222324252627282930313233
function performModification(transaction: Transaction, page: Page, modification: Modification): // STEP 1: Capture before-image BEFORE making any change beforeImage = page.extractData(modification.location, modification.length) // STEP 2: Create log record with undo information logRecord = new UpdateLogRecord( LSN: log.nextLSN(), TransactionID: transaction.id, PrevLSN: transaction.lastLSN, PageID: page.id, Offset: modification.location, BeforeImage: beforeImage, // UNDO information AfterImage: modification.newValue // REDO information ) // STEP 3: Write log record to log buffer log.append(logRecord) // STEP 4: Update page LSN (marks the latest modification) page.pageLSN = logRecord.LSN // STEP 5: NOW we can safely modify the page in memory page.apply(modification) // Mark page as dirty - eligible for flush to disk // But buffer manager will ensure log is flushed first (WAL) page.markDirty() // Track for transaction's undo chain transaction.lastLSN = logRecord.LSNThe Critical Sequence:
The page cannot be flushed to disk until the log record with the before-image reaches stable storage. The buffer manager enforces this using the page LSN.
Modifying the page before capturing the before-image is a critical bug. Once you overwrite the value, the original is gone forever. You cannot reconstruct it. Always read-then-log-then-modify.
When a transaction aborts (either explicitly via ROLLBACK or implicitly due to errors), all its modifications must be undone. This happens in reverse order of the operations:
Why Reverse Order?
Consider a transaction that:
To undo correctly:
If we undid in forward order (1, 2, 3), we'd end up with X = 20, not 5!
123456789101112131415161718192021222324252627282930313233343536373839404142434445
function abortTransaction(transaction: Transaction): // Start from transaction's most recent log record currentLSN = transaction.lastLSN while currentLSN != NULL: logRecord = log.read(currentLSN) if logRecord.type == UPDATE: // Undo this modification page = bufferPool.getPage(logRecord.pageID) // Apply the before-image (restore old value) page.setData( logRecord.offset, logRecord.beforeImage ) // IMPORTANT: Log the undo action (Compensation Log Record) clr = new CLRLogRecord( LSN: log.nextLSN(), Type: "CLR", TransactionID: transaction.id, PrevLSN: currentLSN, UndoNextLSN: logRecord.prevLSN, // Skip to next undo target PageID: logRecord.pageID, AfterImage: logRecord.beforeImage // What we restored ) log.append(clr) page.pageLSN = clr.LSN page.markDirty() // Move to previous operation in this transaction currentLSN = logRecord.prevLSN // Write ABORT record abortRecord = new AbortLogRecord( LSN: log.nextLSN(), TransactionID: transaction.id, PrevLSN: transaction.lastLSN ) log.append(abortRecord) log.flush() // Force ABORT to stable storage // Release all locks lockManager.releaseAll(transaction)Compensation Log Records (CLRs):
Notice that we log each undo action as a CLR. Why log the undo?
Crash during abort — If the system crashes while aborting T, recovery must continue the abort. CLRs show which undos already completed.
No double-undo — The UndoNextLSN field in CLR points to the next record to undo, skipping the ones already undone. This prevents re-undoing the same operation.
Idempotence — CLRs make the abort process recoverable and restartable from any point.
CLRs themselves are never undone—they represent completed undo actions that must persist.
CLRs don't have before-images because they're never undone. They only have after-images (what the undo restored). During recovery, CLRs are redone if needed (to ensure the undo took effect) but never undone. The UndoNextLSN field allows recovery to skip over already-undone work efficiently.
After a crash, the system must identify and undo all uncommitted transactions. This is more complex than runtime abort because:
The Recovery Undo Phase (from ARIES):
During recovery, after the Analysis and Redo phases, the Undo phase processes uncommitted transactions:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455
function recoveryUndoPhase(loserTransactions: Set<TransactionID>, lastLSN: Map<TransactionID, LSN>): // Build initial undo list: last LSN of each loser transaction // Priority queue ordered by LSN (highest first = reverse order) undoList = new MaxHeap<LSN>() for txn in loserTransactions: if lastLSN[txn] != NULL: undoList.insert(lastLSN[txn]) while not undoList.isEmpty(): // Get the highest LSN (most recent operation to undo) currentLSN = undoList.extractMax() logRecord = log.read(currentLSN) if logRecord.type == CLR: // This is a compensation record (already undone action) // Skip to what it says should be undone next if logRecord.undoNextLSN != NULL: undoList.insert(logRecord.undoNextLSN) else: // Transaction fully undone - write END record writeEndRecord(logRecord.transactionID) else if logRecord.type == UPDATE: // Undo this modification page = bufferPool.getPage(logRecord.pageID) // Only undo if the modification actually reached this page if page.pageLSN >= logRecord.LSN: // Apply before-image page.setData(logRecord.offset, logRecord.beforeImage) // Write CLR clr = new CLRLogRecord( LSN: log.nextLSN(), TransactionID: logRecord.transactionID, UndoNextLSN: logRecord.prevLSN, PageID: logRecord.pageID, AfterImage: logRecord.beforeImage ) log.append(clr) page.pageLSN = clr.LSN page.markDirty() // Continue with previous operation if logRecord.prevLSN != NULL: undoList.insert(logRecord.prevLSN) else: // Transaction start reached - fully undone writeEndRecord(logRecord.transactionID) log.flush() print("Undo phase complete")Key Points About Recovery Undo:
Process all losers together — Using a max-heap naturally processes highest LSNs first, achieving reverse chronological order across all transactions.
CLRs guide skipping — When we encounter a CLR, we know its corresponding update was already undone. The UndoNextLSN tells us where to continue.
Conditional undo — We check page.pageLSN >= log.LSN because the modification might not have reached disk. No undo needed if it didn't.
END records — After fully undoing a transaction, we write an END record so future recoveries know this transaction is complete.
Processing all loser transactions together with a max-heap is more efficient than undoing each completely before starting the next. It minimizes the total number of page accesses since operations on the same page (from different transactions) may be adjacent in LSN order.
Some database systems (notably Oracle and MySQL/InnoDB) use a separate structure for undo information called undo segments or rollback segments. This offers advantages over storing undo directly in the log:
The Undo Segment Architecture:
Instead of embedding before-images in the redo log:
| Aspect | Log-Embedded Undo | Separate Undo Segments |
|---|---|---|
| Storage | Undo in redo log records | Undo in dedicated tablespace |
| Log Size | Larger (contains all before-images) | Smaller (just pointers) |
| Read Consistency | Complex to implement | Natural support via undo segments |
| Long Transactions | Log grows significantly | Undo space managed separately |
| MVCC | Requires additional structures | Undo segments provide old versions |
| Complexity | Simpler implementation | More complex, more flexible |
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364
struct UndoSegment: // Circular buffer of undo pages pages: List<UndoPage> // Currently active page for new undo records currentPage: UndoPage // Transaction that owns this segment (if single-transaction mode) owningTransaction: TransactionID | null // State: ACTIVE, CACHED, TO_FREE, TO_PURGE state: SegmentState struct UndoPage: // List of undo records in this page records: List<UndoRecord> // Previous page in this transaction's chain prevPage: PagePointer | null struct UndoRecord: // Transaction that created this undo transactionID: TransactionID // Type: INSERT_UNDO, UPDATE_UNDO, or DELETE_UNDO type: UndoType // Table and row identification tableID: uint32 primaryKey: bytes // Previous version of the row (or row data for delete) beforeImage: bytes // Pointer to next older version (for MVCC chain) rollPointer: UndoPointer | null // Transaction ID that last modified before this prevTrxID: TransactionID // Undo record creation examplefunction createUndoForUpdate(transaction: Transaction, table: Table, row: Row, oldValues: bytes): // Allocate undo record in transaction's undo segment undoRecord = transaction.undoSegment.allocate( size: sizeof(UndoRecord) + len(oldValues) ) undoRecord.transactionID = transaction.id undoRecord.type = UPDATE_UNDO undoRecord.tableID = table.id undoRecord.primaryKey = row.primaryKey undoRecord.beforeImage = oldValues undoRecord.rollPointer = row.currentRollPointer undoRecord.prevTrxID = row.currentTrxID // Update row's roll pointer to this new undo version row.rollPointer = address(undoRecord) row.trxID = transaction.id return undoRecordKey Concepts:
Roll Pointer Chain — Each row has a pointer to its most recent undo record. That undo record points to the previous version. This creates a version chain for MVCC.
Undo Segment Reuse — After a transaction commits and no active readers need its old versions, undo space can be reclaimed. A purge process cleans old undo records.
Read Consistency — When a reader queries with an older snapshot, it follows the roll pointer chain to find the version visible to its snapshot.
Undo Logs are Logged — Changes to undo segments are themselves logged in the redo log. This allows recovery to reconstruct undo segments after a crash.
For INSERTs, undo is special: the 'undo' of an insert is to delete the row. Insert undo records can be purged as soon as the transaction commits—no other transaction can have seen a row that didn't exist yet. Delete and update undos must wait until no reader needs them.
In systems with Multi-Version Concurrency Control (MVCC), undo information serves a dual purpose:
How MVCC Uses Undo:
When a reader queries with snapshot at time T₁, and the current row version was modified by a later transaction T₂:
row.trxID = T₂ > T₁row.rollPointer to the undo record1234567891011121314151617181920212223242526272829303132333435363738394041
function readRow(reader: Transaction, row: Row) -> RowVersion | null: // Start with current version in the data page currentVersion = row.currentData currentTrxID = row.trxID rollPointer = row.rollPointer while true: // Check if this version is visible to reader if isVisible(reader, currentTrxID): return currentVersion // Not visible - need an older version if rollPointer == null: // No older version exists - row didn't exist in reader's snapshot return null // Follow roll pointer to undo segment undoRecord = readUndoRecord(rollPointer) // Extract version data from undo record currentVersion = undoRecord.beforeImage currentTrxID = undoRecord.prevTrxID rollPointer = undoRecord.rollPointer function isVisible(reader: Transaction, writerTrxID: TransactionID) -> bool: // A version is visible if: // 1. It was written by reader itself if writerTrxID == reader.id: return true // 2. Writer committed before reader's snapshot started if writerTrxID < reader.snapshotID: if isCommitted(writerTrxID): return true // 3. Writer is in reader's snapshot's "active list" -> not visible if writerTrxID in reader.activeList: return false // Default: not visible return falseUndo Purging with Active Readers:
Because undo records serve readers, they cannot be purged while any reader might need them. The system tracks:
This creates a trade-off: long-running read transactions prevent undo purging, consuming storage. 'Long transaction undo bloat' is a common operational issue in PostgreSQL and MySQL.
A long-running query (or forgotten open transaction) can prevent undo purging for hours. Meanwhile, write transactions keep generating new undo records. This can exhaust undo tablespace. Monitor transaction age and set appropriate timeouts.
Production database systems employ several optimizations to make undo handling efficient:
1. Logical Undo
Instead of storing physical byte changes, store the logical operation and its inverse:
| Logical Operation | Logical Undo |
|---|---|
INSERT INTO t VALUES (...) | DELETE FROM t WHERE pk = ... |
DELETE FROM t WHERE pk = ... | INSERT INTO t VALUES (saved_row) |
UPDATE t SET x = 10 WHERE pk = ... | UPDATE t SET x = old_value WHERE pk = ... |
Advantages: Smaller undo records, more flexible. Challenges: Must handle lost updates, concurrent modifications.
2. Partial Rollback (Savepoints)
Transactions can set savepoints and rollback to them, undoing only work after the savepoint:
12345678910111213141516
BEGIN TRANSACTION; INSERT INTO orders VALUES (1001, 'pending'); SAVEPOINT order_created; INSERT INTO order_items VALUES (1001, 'item_a', 2); -- Error here!INSERT INTO order_items VALUES (1001, 'item_b', 1); -- Oops, item_a was wrong!ROLLBACK TO SAVEPOINT order_created; -- Try againINSERT INTO order_items VALUES (1001, 'item_c', 2); COMMIT;Savepoints work by recording the transaction's lastLSN at savepoint creation. Partial rollback undoes records back to that LSN, then continues the transaction.
3. In-Page Undo
Some systems store undo information within the data page itself, not in separate undo segments:
4. Group Undo
When many rows in the same table are undone:
We've explored undo information comprehensively—from basic concepts to advanced optimizations. Let's consolidate the key points:
What's Next:
Now that we understand undo information for reversing uncommitted changes, the next page explores redo information—the after-images that enable replaying committed transactions. Redo is the complement to undo, ensuring durability rather than atomicity.
You now understand undo information—the before-images that enable transaction rollback and MVCC versioning. Undo ensures atomicity: transactions that don't commit leave no trace. Next, we'll explore redo information, which ensures durability: committed transactions survive any failure.