The log in a database system can span gigabytes or even terabytes of data, containing millions of log records. During the undo phase, we need to reverse the effects of all loser transactions—but how do we navigate this massive sequential structure efficiently?
A naive approach would process each loser transaction independently: walk backward through T1's records, then walk backward through T2's records, then T3's, and so on. But this would result in multiple passes over the log, each requiring expensive disk I/O.
ARIES's backward scan solves this problem elegantly: it processes ALL loser transactions in a single backward pass through the log, undoing operations in reverse chronological order regardless of which transaction performed them. This minimizes disk seeks and ensures optimal I/O efficiency.
By the end of this page, you will understand why backward scanning is essential for efficient undo, how the ToUndo set orchestrates multi-transaction undo, the algorithm for selecting and processing the next record to undo, and the I/O optimizations that make backward scanning practical.
Before exploring the mechanics, let's understand why undo must proceed backward through the log rather than forward.
The Ordering Problem:
Consider a transaction that makes two updates to the same field on a page (the values here are illustrative):

- LSN 10: UPDATE salary: 40000 → 50000 (before-image: 40000)
- LSN 20: UPDATE salary: 50000 → 55000, applying a 10% bonus (before-image: 50000)
If we tried to undo forward:

- Undo LSN 10 first: restore its before-image, setting the salary to 40000, even though the page currently holds 55000 (the before-image no longer matches the page state).
- Undo LSN 20 next: restore its before-image, setting the salary back to 50000.
- Final value: 50000, not the original 40000.
The problem is that undoing in forward order destroys the context needed for later undos. The bonus calculation assumed the salary was 50000; undoing in the wrong order leaves 50000 on the page instead of the original 40000.
Backward Order Preserves Context:
Undoing backward:

- Undo LSN 20 first: the page holds 55000, exactly matching LSN 20's after-image, so restoring its before-image yields 50000.
- Undo LSN 10 next: the page holds 50000, matching LSN 10's after-image, so restoring its before-image yields 40000.
- Final value: 40000, the original.
Each undo sees the correct "current" value and restores the correct "previous" value.
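Here is a minimal TypeScript sketch of the same scenario, using the hypothetical salary values above, that checks the before-image contract at each step:

```typescript
type UpdateRecord = { lsn: number; before: number; after: number };

const updates: UpdateRecord[] = [
  { lsn: 10, before: 40000, after: 50000 }, // set salary
  { lsn: 20, before: 50000, after: 55000 }, // apply 10% bonus
];

let salary = 55000; // page state at crash time

// Backward undo: process the highest LSN first, restoring before-images
for (const rec of [...updates].sort((a, b) => b.lsn - a.lsn)) {
  console.assert(salary === rec.after, 'before-image contract holds');
  salary = rec.before;
}
console.log(salary); // 40000 - the original value restored
```

Running the loop in forward LSN order instead fails the assertion on the very first step, because LSN 10's before-image no longer matches the page's current value.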
| Aspect | Forward Undo (Wrong) | Backward Undo (Correct) |
|---|---|---|
| Logical correctness | Breaks for dependent operations | Maintains correct dependencies |
| Before-image validity | May not match current page state | Always matches current page state |
| Multi-page transactions | Complex dependency tracking needed | Naturally handles all cases |
| Implementation complexity | High (need dependency analysis) | Low (simple reverse traversal) |
| Recovery from partial undo | Very complex | Simple with CLRs and undoNextLSN |
The Before-Image Contract:
Log records store before-images that capture the data's state at the time of the update. When we undo backward, the record being undone is always the most recent not-yet-undone change to its data, so its before-image exactly matches the page's current contents, and applying it restores the correct previous state. This invariant holds at each step only when processing in reverse order.
ARIES uses physical undo (restore exact byte values from before-images). Some systems use logical undo (execute inverse operations). Logical undo can be more flexible but has its own ordering complexities. Physical undo's backward requirement is more straightforward conceptually.
The ToUndo set is the central data structure driving the backward scan. It tracks where we are in the undo process for each loser transaction and determines which log record to process next.
Structure:
The ToUndo set contains (LSN, TransactionID) pairs, where each LSN is the next record that needs to be undone for that transaction. It's typically implemented as a max-heap (priority queue) ordered by LSN, allowing efficient extraction of the highest LSN.
Initialization:
At the start of undo, for each loser transaction T in the transaction table, we insert the pair (T.undoNextLSN, T) into the ToUndo set.
The undoNextLSN field comes from the analysis phase. For transactions that were actively running at crash time, it equals lastLSN. For transactions that were already aborting, it reflects partial undo progress.
```typescript
/**
 * The ToUndo set implementation using a max-heap.
 * Efficiently manages the undo frontier across all loser transactions.
 */
class ToUndoSet {
  // Max-heap: largest LSN at the top
  private heap: MaxHeap<{ lsn: LogSequenceNumber; txnId: TransactionId }>;

  constructor() {
    this.heap = new MaxHeap((a, b) => a.lsn.compareTo(b.lsn));
  }

  /**
   * Initialize from the transaction table at start of undo phase.
   */
  static fromTransactionTable(txnTable: TransactionTable): ToUndoSet {
    const toUndo = new ToUndoSet();
    for (const [txnId, txnEntry] of txnTable.entries()) {
      // Only process losers (active or aborting transactions)
      if (txnEntry.status === 'ACTIVE' || txnEntry.status === 'ABORTING') {
        if (txnEntry.undoNextLSN !== null) {
          toUndo.insert(txnEntry.undoNextLSN, txnId);
        } else {
          // Transaction has no updates - just log END
          Log.write({ type: 'END', transactionId: txnId });
          txnTable.remove(txnId);
        }
      }
    }
    return toUndo;
  }

  /**
   * Insert a new (LSN, txnId) pair into the set.
   */
  insert(lsn: LogSequenceNumber, txnId: TransactionId): void {
    this.heap.insert({ lsn, txnId });
  }

  /**
   * Extract and return the entry with the highest LSN.
   * Returns null if the set is empty.
   */
  extractMax(): { lsn: LogSequenceNumber; txnId: TransactionId } | null {
    return this.heap.extractMax();
  }

  /**
   * Check if there are more records to undo.
   */
  isEmpty(): boolean {
    return this.heap.isEmpty();
  }

  /**
   * Peek at the highest LSN without removing it.
   * Useful for I/O prefetching decisions.
   */
  peekMax(): LogSequenceNumber | null {
    const max = this.heap.peekMax();
    return max ? max.lsn : null;
  }

  /**
   * Number of transactions still being undone.
   */
  size(): number {
    return this.heap.size();
  }
}

// Example usage during undo phase:
function exampleUndoPhase(txnTable: TransactionTable) {
  const toUndo = ToUndoSet.fromTransactionTable(txnTable);
  console.log(`Starting undo with ${toUndo.size()} loser transactions`);

  while (!toUndo.isEmpty()) {
    const { lsn, txnId } = toUndo.extractMax()!;
    console.log(`Processing LSN ${lsn} for transaction ${txnId}`);
    // ... undo logic here ...
  }
}
```

Why a Max-Heap?
The max-heap ensures we always process the highest LSN next. This is crucial for two things: the log is read in strictly decreasing LSN order, keeping the backward scan sequential, and operations are undone in reverse chronological order across all loser transactions.
For systems with few concurrent transactions, even a simple sorted list would work. But max-heaps scale better if there are many losers.
The ToUndo set contains at most one entry per loser transaction, not one per log record. If we have 10 loser transactions, the set has at most 10 entries. This keeps the overhead minimal even for transactions with many updates.
Now let's walk through the complete backward scan algorithm. The core loop is deceptively simple, but understanding each step is essential.
Main Loop Structure:
```
PROCEDURE BackwardScanUndo(transactionTable):
    // =============================================
    // PHASE 1: Initialize ToUndo set
    // =============================================
    ToUndo = MaxHeap()  // Priority queue by LSN

    FOR EACH (txnId, entry) IN transactionTable:
        IF entry.status IN {ACTIVE, ABORTING}:
            IF entry.undoNextLSN != NULL:
                ToUndo.Insert(entry.undoNextLSN, txnId)
            ELSE:
                // No work to do for this transaction
                Log.Write(END_RECORD, transactionId: txnId)
                Log.Info("Transaction {txnId} had no updates to undo")

    Log.Info("Starting backward scan with {ToUndo.Size()} losers")

    // =============================================
    // PHASE 2: Main backward scan loop
    // =============================================
    WHILE NOT ToUndo.IsEmpty():
        // Get the highest LSN (latest operation to undo)
        (currentLSN, txnId) = ToUndo.ExtractMax()

        // Read the log record at this LSN
        logRecord = Log.Read(currentLSN)
        ASSERT logRecord.transactionId == txnId

        // =========================================
        // CASE A: This is an UPDATE record
        // =========================================
        IF logRecord.type == UPDATE:
            // Step A1: Apply the before-image to undo this update
            page = BufferPool.Fetch(logRecord.pageId)
            page.ApplyBeforeImage(logRecord.beforeImage)

            // Step A2: Write a CLR documenting this undo
            clr = WriteCLR(
                transactionId: txnId,
                pageId: logRecord.pageId,
                redoInfo: logRecord.beforeImage,
                undoNextLSN: logRecord.prevLSN,  // Key: skip to next
                undoLSN: currentLSN
            )

            // Step A3: Update page's LSN to the CLR
            page.SetPageLSN(clr.lsn)
            BufferPool.MarkDirty(logRecord.pageId)

            // Step A4: Update ToUndo with next record to process
            IF logRecord.prevLSN != NULL:
                ToUndo.Insert(logRecord.prevLSN, txnId)
            ELSE:
                // Reached the BEGIN - transaction fully undone
                Log.Write(END_RECORD, transactionId: txnId)
                transactionTable.Remove(txnId)
                Log.Info("Transaction {txnId} fully rolled back")

        // =========================================
        // CASE B: This is a CLR (from previous partial undo)
        // =========================================
        ELSE IF logRecord.type == CLR:
            // Don't undo the CLR! Just follow undoNextLSN
            IF logRecord.undoNextLSN != NULL:
                ToUndo.Insert(logRecord.undoNextLSN, txnId)
            ELSE:
                // CLR says undo is complete
                Log.Write(END_RECORD, transactionId: txnId)
                transactionTable.Remove(txnId)
                Log.Info("Transaction {txnId} fully rolled back (resumed)")

        // =========================================
        // CASE C: This is a BEGIN record
        // =========================================
        ELSE IF logRecord.type == BEGIN:
            // Transaction fully undone
            Log.Write(END_RECORD, transactionId: txnId)
            transactionTable.Remove(txnId)

    // =============================================
    // PHASE 3: Undo complete
    // =============================================
    ASSERT transactionTable.IsEmpty() OR ALL entries are COMMITTED
    Log.Info("Backward scan complete - all losers rolled back")
```

Critical Observations:
One read per undo: We read exactly one log record per iteration, and we undo at most one operation per iteration. No wasted reads.
CLRs are skipped, not undone: When we encounter a CLR, we don't redo or undo it - we just follow its undoNextLSN to continue where we left off.
The loop terminates when ToUndo is empty: This happens when every loser has reached its BEGIN record (or a CLR with null undoNextLSN).
END records finalize cleanup: The END record marks that the transaction is completely rolled back, enabling log truncation.
Notice that we only process log records from loser transactions. The ToUndo set only contains entries for losers, so we never "see" records from committed transactions during undo. Those records were already correctly handled by redo - their effects remain in the database.
A key insight of the backward scan is that it processes multiple transactions interleaved based on their LSN ordering, not sequentially by transaction. Let's see this in action.
Example Scenario:
Three loser transactions were active at crash:

- T1, with updates at LSNs 100, 250, and 400
- T2, with updates at LSNs 150 and 300
- T3, with updates at LSNs 200, 350, and 450
The backward scan processes them in this order:
| Step | Process LSN | Transaction | Action | Next for Txn | ToUndo After Step |
|---|---|---|---|---|---|
| 0 | - | - | Initialize | - | {450→T3, 400→T1, 300→T2} |
| 1 | 450 | T3 | Undo, CLR@501 | 350 | {400→T1, 350→T3, 300→T2} |
| 2 | 400 | T1 | Undo, CLR@502 | 250 | {350→T3, 300→T2, 250→T1} |
| 3 | 350 | T3 | Undo, CLR@503 | 200 | {300→T2, 250→T1, 200→T3} |
| 4 | 300 | T2 | Undo, CLR@504 | 150 | {250→T1, 200→T3, 150→T2} |
| 5 | 250 | T1 | Undo, CLR@505 | 100 | {200→T3, 150→T2, 100→T1} |
| 6 | 200 | T3 | Undo, CLR@506 | null→END | {150→T2, 100→T1} |
| 7 | 150 | T2 | Undo, CLR@507 | null→END | {100→T1} |
| 8 | 100 | T1 | Undo, CLR@508 | null→END | {} |
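The schedule in this table can be reproduced with a small TypeScript simulation. A sorted array stands in for the max-heap, and the LSN chains are the hypothetical ones from the example above:

```typescript
// Each loser's update LSNs, oldest to newest (hypothetical example data)
const chains: Record<string, number[]> = {
  T1: [100, 250, 400],
  T2: [150, 300],
  T3: [200, 350, 450],
};

// Initialize ToUndo with each loser's last LSN
const toUndo: { lsn: number; txn: string }[] = Object.entries(chains).map(
  ([txn, lsns]) => ({ lsn: lsns[lsns.length - 1], txn })
);

while (toUndo.length > 0) {
  toUndo.sort((a, b) => b.lsn - a.lsn);
  const { lsn, txn } = toUndo.shift()!; // extract max
  console.log(`undo LSN ${lsn} (${txn})`);

  const chain = chains[txn];
  const prev = chain[chain.indexOf(lsn) - 1]; // PrevLSN, or undefined
  if (prev !== undefined) toUndo.push({ lsn: prev, txn });
  else console.log(`${txn} fully undone -> END`);
}
// Prints LSNs 450, 400, 350, 300, 250, 200, 150, 100 - the table's order
```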
Key Observations:
Processing jumps between transactions: We don't finish T1 before starting T2. We process whichever transaction has the highest remaining LSN.
T3 finishes first: Completion order follows each loser's earliest LSN. T3's chain bottoms out at LSN 200, so the backward scan reaches its BEGIN before T2's (earliest LSN 150) and T1's (earliest LSN 100).
Log access is sequential backward: We read LSNs 450, 400, 350, 300, 250, 200, 150, 100 - a monotonically decreasing sequence.
CLRs are written in order: CLR@501, 502, 503... form a chronologically ordered sequence in the log.
Why This Order?
The interleaved order ensures we read the log sequentially backward. If we processed T1 completely before T2 and T3, we would read LSNs 400, 250, 100, then jump back up to 300 and down to 150 for T2, then up again to 450, 350, 200 for T3. This would mean eight scattered reads with large seeks instead of one smooth backward scan.
The backward scan reads LSNs in strictly decreasing order (450 → 400 → 350 → 300 → 250 → 200 → 150 → 100), undoing operations in that order regardless of which transaction they belong to.
Sequential backward read is extremely efficient on modern storage. The disk head (or SSD controller) reads in one direction without seeking. Databases often prefetch log pages backward during undo to maximize throughput.
While the ToUndo set determines which LSN to process next, the PrevLSN chain determines where each transaction goes after processing a record. Every log record contains a PrevLSN field pointing to the previous log record for the same transaction.
Chain Structure:
For a transaction with updates at LSNs 100, 250, and 400, the chain runs 400 → 250 → 100: record@400.PrevLSN = 250, record@250.PrevLSN = 100, and record@100.PrevLSN = null, marking it as the transaction's first update.
When we undo record@400, we add 250 to ToUndo. When we undo record@250, we add 100. When we undo record@100, PrevLSN is null, so the transaction is complete.
```
// After undoing a record, determine next step for this transaction

PROCEDURE DetermineNextUndo(logRecord, txnId, ToUndo, transactionTable):
    IF logRecord.type == UPDATE:
        // For updates, the next record to undo is found via PrevLSN
        nextLSN = logRecord.prevLSN
    ELSE IF logRecord.type == CLR:
        // For CLRs, jump using undoNextLSN (skip what's already done)
        nextLSN = logRecord.undoNextLSN

    // Determine what to do with nextLSN
    IF nextLSN == NULL:
        // No more records to undo for this transaction
        // This means we've reached the BEGIN record

        // Write END record to mark transaction as fully rolled back
        Log.Write(END_RECORD, transactionId: txnId)

        // Remove from transaction table
        transactionTable.Remove(txnId)

        // Don't add anything to ToUndo
        Log.Debug("Transaction {txnId} undo complete")
    ELSE:
        // More records remain to undo
        ToUndo.Insert(nextLSN, txnId)

        // Also update transaction table's tracking
        transactionTable[txnId].undoNextLSN = nextLSN

        Log.Debug("Transaction {txnId} next undo: LSN {nextLSN}")
```

PrevLSN vs UndoNextLSN:
It's important to distinguish these two pointers:
| Field | Stored In | Points To | Purpose |
|---|---|---|---|
| PrevLSN | Every log record | Previous record for same txn | Chronological chain |
| UndoNextLSN | CLRs only | Next record to undo | Skip completed undos |
For UPDATE records, we use PrevLSN to find the next undo target. For CLR records, we use UndoNextLSN instead (which may skip many records).
Why CLRs Skip Ahead:
When we undo UPDATE@300 and write CLR@450, the CLR's undoNextLSN is set to UPDATE@300's PrevLSN (say, 200). This means the CLR effectively "encapsulates" the undo of 300 and provides a shortcut to 200. If we crash and restart, we don't need to undo 300 again—we just follow the CLR's pointer.
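A compact TypeScript sketch of this dispatch, using the hypothetical LSNs from the paragraph above:

```typescript
type LogRec =
  | { type: 'UPDATE'; lsn: number; prevLSN: number | null }
  | { type: 'CLR'; lsn: number; undoNextLSN: number | null };

// UPDATEs chain via prevLSN; CLRs skip ahead via undoNextLSN
function nextUndoTarget(rec: LogRec): number | null {
  return rec.type === 'UPDATE' ? rec.prevLSN : rec.undoNextLSN;
}

// CLR@450 recorded the undo of UPDATE@300, whose PrevLSN was 200
const clr: LogRec = { type: 'CLR', lsn: 450, undoNextLSN: 200 };
console.log(nextUndoTarget(clr)); // 200 - UPDATE@300 is never undone twice
```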
The PrevLSN chain is constructed during normal transaction processing, not during recovery. Each time a transaction writes a log record, PrevLSN is set to that transaction's last log record LSN. This is why recovery can efficiently navigate the chain without rebuilding it.
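A sketch of that forward-processing bookkeeping; appendToLog is a hypothetical stand-in for the real log manager's append call:

```typescript
class TxnLogState {
  lastLSN: number | null = null; // LSN of this txn's most recent record
}

// Stand-in for the log manager's append: assigns LSNs 100, 250, 400
const assigned = [100, 250, 400];
let next = 0;
function appendToLog(rec: { prevLSN: number | null }): number {
  const lsn = assigned[next++];
  console.log(`record@${lsn} prevLSN=${rec.prevLSN}`);
  return lsn;
}

function logUpdate(txn: TxnLogState): void {
  // Each new record points back at the transaction's previous record,
  // so the PrevLSN chain is built as a by-product of normal logging.
  txn.lastLSN = appendToLog({ prevLSN: txn.lastLSN });
}

const txn = new TxnLogState();
logUpdate(txn); // record@100 prevLSN=null
logUpdate(txn); // record@250 prevLSN=100
logUpdate(txn); // record@400 prevLSN=250
```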
While the backward scan is inherently I/O-efficient, production systems employ additional optimizations to minimize recovery time.
1. Log Buffer Reuse:
During redo, we've already read most of the log. Smart implementations cache log pages in memory during redo and reuse them during undo, eliminating redundant disk reads.
2. Prefetching:
Knowing that we're scanning backward, the system can prefetch log pages ahead of the current position:
```
// If current position is page 1000
// Prefetch pages 999, 998, 997... in background
```
This hides disk latency by reading pages before they're needed.
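A minimal async sketch of this idea; readLogPage and PREFETCH_DEPTH are assumptions for illustration, not part of any particular engine's API:

```typescript
const PREFETCH_DEPTH = 4; // assumed tuning knob

// readLogPage is a hypothetical async page read returning raw bytes
async function prefetchBackward(
  currentPage: number,
  readLogPage: (pageNo: number) => Promise<Uint8Array>
): Promise<void> {
  const reads: Promise<Uint8Array>[] = [];
  // Issue reads for the pages just below the current scan position
  for (let p = currentPage - 1; p >= Math.max(0, currentPage - PREFETCH_DEPTH); p--) {
    reads.push(readLogPage(p));
  }
  await Promise.all(reads); // populate the cache before the scan arrives
}
```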
3. Batch Log Writes:
CLRs written during undo can be batched in the log buffer. The buffer is flushed when it fills, at periodic intervals, or when the WAL rule requires it: before any page whose undo is described by a buffered CLR is written to disk.
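A sketch of such a buffer, with a hypothetical writeToDisk callback that performs one sequential append per batch:

```typescript
class LogBuffer {
  private pending: Uint8Array[] = [];
  private bytes = 0;

  constructor(
    private flushThreshold: number,
    private writeToDisk: (batch: Uint8Array[]) => void
  ) {}

  append(record: Uint8Array): void {
    this.pending.push(record);
    this.bytes += record.length;
    if (this.bytes >= this.flushThreshold) this.flush();
  }

  // Must also be called before any page covered by a buffered CLR is
  // written out (the WAL rule), and at the end of the undo phase.
  flush(): void {
    if (this.pending.length === 0) return;
    this.writeToDisk(this.pending); // many CLRs, one sequential I/O
    this.pending = [];
    this.bytes = 0;
  }
}
```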
4. Parallel Undo (Advanced):
Some systems parallelize undo when transactions access non-overlapping pages:
```
// Worker 1 handles T1 (pages P1, P2, P3)
// Worker 2 handles T2 (pages P4, P5)
// Worker 3 handles T3 (pages P6, P7, P8)
```
This requires coordination for pages touched by more than one loser, for serializing CLR appends to the shared log, and for deciding when each transaction's END record can safely be written.
5. Log Record Locality:
Database systems try to keep related log records physically close: because the log is append-only, records written around the same time land on the same or adjacent log pages, so a backward scan tends to touch long contiguous runs.
This improves cache behavior during backward scan.
Many production databases aim for sub-minute recovery times. This is achieved through aggressive checkpointing (limiting redo work), efficient buffer pool tuning (minimizing losers), and these I/O optimizations. The backward scan's efficiency is a key contributor to fast recovery.
The backward scan must handle several edge cases correctly. Let's examine the most important ones.
Case 1: Empty Loser Set
If no transactions were active at crash time, the undo phase has nothing to do:
```
IF ToUndo.IsEmpty() at start:
    Log.Info("No losers to undo - undo phase complete")
    RETURN
```
This can happen after a clean shutdown followed by a crash before any new transactions began.
Case 2: Read-Only Transactions
Read-only transactions don't write update log records. If one was "active" at crash, its undoNextLSN is null, so initialization simply writes an END record for it and removes it from the transaction table; there is nothing to undo.
```
// Case 3: Very Long Transactions
// Some transactions have thousands of updates to undo

PROCEDURE HandleLongTransactionUndo(txnId, ToUndo):
    recordsUndone = 0

    WHILE ToUndo.ContainsTransaction(txnId):
        (lsn, txn) = ToUndo.ExtractMax()  // may belong to any loser
        UndoRecord(lsn)                   // undoes it; reinserts txn's next LSN
        IF txn == txnId:
            recordsUndone++

            // Optional: Checkpoint periodically during long undo
            IF recordsUndone % 10000 == 0:
                WriteCheckpoint()  // Saves progress
                Log.Info("Undo progress: {recordsUndone} records for {txnId}")

// Case 4: Partial CLR Chain (crash during previous undo)
// The CLR chain may be incomplete - handle gracefully

PROCEDURE HandleIncompleteCLRChain(logRecord):
    IF logRecord.type == CLR:
        // CLR's undoNextLSN tells us where to continue
        // This correctly skips the records that were already undone
        nextLSN = logRecord.undoNextLSN

        // The CLR's redo info is already applied during redo phase
        RETURN nextLSN

// Case 5: Transaction with Nested Savepoints
// Savepoints create internal markers but don't change undo logic

PROCEDURE HandleSavepointRecords(logRecord):
    IF logRecord.type == SAVEPOINT:
        // Savepoint records are not undone
        // Just follow PrevLSN to continue
        RETURN logRecord.prevLSN

    IF logRecord.type == ROLLBACK_TO_SAVEPOINT:
        // Already handled any partial rollback
        // Continue with PrevLSN from before the savepoint
        RETURN logRecord.prevLSN
```

Case 6: Mixed Record Types
The ToUndo set might extract LSNs pointing to different record types: UPDATE records are undone and covered by a CLR, CLRs are skipped by following their undoNextLSN, and BEGIN records mean the transaction is fully undone and gets an END record.
Case 7: Page No Longer Exists
Rarely, a page might not exist during undo, for example because it belonged to a table that was later dropped:
Recovery must handle this:
```
TRY:
    page = BufferPool.Fetch(pageId)
CATCH PageNotFoundException:
    Log.Warn("Page {pageId} not found - may be from dropped table")
    // Write CLR anyway to track progress
    // The undo effect is implicitly done if page doesn't exist
```
If a log record is corrupted (checksum failure), recovery faces a difficult decision. Options include: stopping and requiring manual intervention, skipping the record (risky), or attempting repair from redundant copies. Most production systems halt and alert operators for corruption during recovery.
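A tiny sketch of the halt-on-corruption policy; the record layout (a 4-byte checksum prefix) and the additive checksum are assumptions for illustration only:

```typescript
// Assumed layout: a 4-byte big-endian checksum precedes the payload
function checksum(data: Uint8Array): number {
  let sum = 0;
  for (const b of data) sum = (sum + b) >>> 0; // simple additive checksum
  return sum;
}

function readLogRecordChecked(lsn: number, raw: Uint8Array): Uint8Array {
  const stored = new DataView(raw.buffer, raw.byteOffset, raw.byteLength).getUint32(0);
  const payload = raw.subarray(4);
  if (checksum(payload) !== stored) {
    // Halt and alert operators rather than guess at the record's contents
    throw new Error(`Checksum mismatch at LSN ${lsn} - halting recovery`);
  }
  return payload;
}
```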
The backward scan is the engine that drives ARIES's undo phase, processing all loser transactions efficiently in a single pass. Let's consolidate the key insights:

- Undo must proceed backward so that each record's before-image matches the page's current state when it is applied.
- The ToUndo set, a max-heap with at most one entry per loser, always selects the highest remaining LSN, keeping log reads sequential.
- PrevLSN chains tell each transaction where to go next; a CLR's undoNextLSN skips work that was already undone before a crash.
- Each loser is finalized with an END record once its BEGIN record (or a CLR with a null undoNextLSN) is reached.
- Buffer reuse, backward prefetching, and batched CLR writes keep the undo phase's I/O cost low.
What's Next:
Some transactions don't just need simple undo—they require nested rollback when different parts of the transaction need different handling. The next page explores nested rollback scenarios, including savepoints, subtransactions, and the complexities they introduce to the undo phase.
You now understand backward scanning—the efficient mechanism that processes all loser transactions in a single pass through the log. The ToUndo set, PrevLSN chains, and CLR shortcuts work together to minimize I/O while guaranteeing correct transaction rollback.