Loading learning content...
After a system crash, the ARIES recovery algorithm has completed two critical phases: the Analysis Phase has reconstructed the system state at the time of failure, and the Redo Phase has replayed all logged operations to bring the database to its pre-crash state. But there's a fundamental problem remaining: some transactions were still active when the crash occurred.
These incomplete transactions—sometimes called "loser transactions"—represent a serious threat to database consistency. They may have modified data that they never committed, violating the atomicity guarantee: either all of a transaction's changes should be visible, or none of them should be. The Undo Phase exists to solve this problem by systematically reversing every change made by transactions that never reached the commit point.
By the end of this page, you will understand why incomplete transactions must be undone, how the undo phase identifies which transactions require reversal, the relationship between undo and the ACID atomicity property, and the fundamental mechanics of reversing changes through backward log processing.
To understand why the undo phase is essential, we must first understand what happens during normal database operation and what state the database may be in after a crash.
The Steal Policy Problem:
ARIES uses a steal buffer management policy, which means dirty pages (pages modified by uncommitted transactions) can be written to disk before the transaction commits. This is done for performance reasons—waiting to flush all dirty pages until commit would create significant I/O bottlenecks.
However, this policy creates a critical challenge: after a crash, the disk may contain modifications from transactions that never committed. These "uncommitted changes on disk" must be removed to maintain atomicity.
| Transaction State at Crash | What Redo Does | What Undo Must Do |
|---|---|---|
| Committed before crash | Ensures all changes are on disk | Nothing (skip) |
| In progress at crash | Replays all logged changes | Must reverse ALL changes |
| Aborted before crash | Replays changes AND undo CLRs | Nothing (already handled) |
| Prepared (2PC) at crash | Replays all changes | Depends on coordinator decision |
The Chain of Reasoning:
Redo is blind: The redo phase replays ALL logged operations, regardless of transaction outcome. This ensures no committed work is lost, but it also means uncommitted changes get replayed.
Uncommitted changes violate atomicity: After redo, the database may contain partial results from transactions that never finished. These partial results could be seen by future transactions, violating isolation and consistency.
Undo restores atomicity: By reversing all changes from uncommitted transactions, the undo phase ensures the database state reflects only complete, committed transactions.
This is the fundamental contract of undo: erase all evidence that an uncommitted transaction ever existed.
You might wonder: why replay uncommitted changes during redo just to undo them afterward? The answer is that redo must reconstruct the exact database state at crash time to correctly handle page-level recovery. Some committed transactions may have modified the same pages as uncommitted ones, and reconstructing the correct page state requires replaying all operations in order. Only after this state is reconstructed can undo safely reverse the uncommitted changes.
The term "loser transaction" refers to any transaction that was active at the time of the crash and did not have a commit record in the log. Identifying these losers is straightforward because the analysis phase has already built the Transaction Table (TT) that tracks the state of all transactions.
How Losers are Identified:
During the analysis phase, ARIES scans the log from the most recent checkpoint to the end, updating the transaction table as it encounters:
At the end of analysis, any transaction remaining in the table with status ACTIVE or ABORTING is a loser that needs to be undone.
123456789101112131415161718192021
-- Transaction Table after Analysis Phase-- This table was reconstructed by scanning the log -- Sample Transaction Table Contents:/*TransactionID | Status | LastLSN | FirstLSN | UndoNextLSN--------------+------------+---------+----------+------------- T1 | COMMITTED | 1050 | 1000 | NULL -- Winner: will be removed T2 | ACTIVE | 1045 | 1010 | 1045 -- Loser: must undo T3 | ACTIVE | 1060 | 1025 | 1060 -- Loser: must undo T4 | ABORTING | 1055 | 1030 | 1048 -- Loser: continue undo T5 | COMMITTED | 1040 | 1005 | NULL -- Winner: will be removed*/ -- After filtering, Undo Phase works with:-- Loser Set = {T2, T3, T4} -- For each loser, we need:-- 1. UndoNextLSN: The next log record to undo-- 2. LastLSN: Pointer to most recent log record-- 3. The ability to walk backward through the chainThe UndoNextLSN Field:
This critical field in the transaction table tracks undo progress. For a transaction that was simply active (never started aborting), UndoNextLSN equals LastLSN—we start from the most recent operation and work backward.
For a transaction that was already aborting when the crash occurred, UndoNextLSN may point somewhere in the middle of the log record chain, indicating how far the undo had progressed before the crash. This is essential for avoiding redundant work and ensuring correctness.
When the undo phase begins, it constructs a loser set from the transaction table—the set of all transactions requiring undo. But how should these transactions be processed? ARIES makes a crucial design decision here: rather than processing each loser transaction sequentially, it processes them in parallel through a single backward pass of the log.
Why Parallel Processing Matters:
Consider a scenario with three loser transactions that each modified different pages at different times. Sequential processing would require:
Instead, ARIES processes ALL losers in a single backward scan, undoing operations in reverse LSN order regardless of which transaction performed them.
In the diagram above:
Single Backward Pass Efficiency:
ARIES maintains a ToUndo set containing the UndoNextLSN for each loser. It repeatedly:
This approach minimizes random I/O by processing the log in order.
The ToUndo set is typically implemented as a max-heap (priority queue) ordered by LSN. This allows O(log n) extraction of the highest LSN where n is the number of active losers. For small numbers of concurrent transactions, even a simple sorted list works well.
When the undo phase encounters an update log record from a loser transaction, it must reverse the effect of that update. This process involves several coordinated steps:
Step 1: Read the Log Record
Each update log record contains:
Step 2: Apply the Before Image
To undo an update, we restore the data to its before-image value. This is conceptually simple but requires care:
123456789101112131415161718192021222324252627282930313233343536
PROCEDURE UndoSingleRecord(logRecord): // Step 1: Extract information from log record txnId = logRecord.transactionId pageId = logRecord.pageId beforeImage = logRecord.beforeImage afterImage = logRecord.afterImage recordOffset = logRecord.offset // Step 2: Fetch and latch the page page = bufferPool.FetchPage(pageId) page.AcquireExclusiveLatch() TRY: // Step 3: Apply the before-image (reverse the change) page.WriteAt(recordOffset, beforeImage) // Step 4: Write a Compensation Log Record (CLR) clrLSN = writeCompensationLogRecord( transactionId: txnId, pageId: pageId, undoNextLSN: logRecord.prevLSN, // Skip over what we just undid redoInfo: beforeImage // So redo can re-apply this undo ) // Step 5: Update the page's LSN to the CLR's LSN page.SetPageLSN(clrLSN) // Step 6: Update transaction table transactionTable[txnId].undoNextLSN = logRecord.prevLSN transactionTable[txnId].lastLSN = clrLSN FINALLY: // Step 7: Release latch (page may still be dirty in buffer) page.ReleaseExclusiveLatch() RETURN logRecord.prevLSN // Next record to undo for this txnStep 3: Write a Compensation Log Record (CLR)
Critically, every undo operation is itself logged. This CLR serves multiple purposes:
We'll explore CLRs in depth on the next page.
Step 4: Update Page and Transaction Metadata
After the undo:
ARIES uses physiological logging, so undo is physical within a page but logical across pages. The before-image captures the exact byte changes needed to reverse an update. Some systems use logical undo (re-executing inverse operations), but this is more complex and can fail if the database structure has changed.
Not all log records encountered during the backward scan require the same handling. The undo phase must distinguish between several types:
Update Records (Need Undo): These are the primary targets of the undo phase. Each update record from a loser transaction must be reversed by applying the before-image and writing a CLR.
CLR Records (Do Not Re-Undo): Compensation log records represent undos that already happened. When encountered, we don't undo them again—that would undo the undo! Instead, we follow the CLR's undoNextLSN to continue the undo chain from where it left off.
Begin Records (Transaction Complete): When we reach a transaction's begin record, that transaction's undo is complete. We write an END record and remove the transaction from the loser set.
| Record Type | Action in Undo Phase | What Happens to ToUndo |
|---|---|---|
| UPDATE (loser txn) | Apply before-image, write CLR | Replace with PrevLSN (or remove if NULL) |
| CLR | Do NOT undo again | Replace with UndoNextLSN from CLR |
| BEGIN (loser txn) | Write END record for transaction | Remove transaction from ToUndo set |
| COMMIT | Should not appear (txn would be winner) | N/A |
| ABORT | Continue processing (txn was aborting) | No change (continue processing records) |
| CHECKPOINT | Ignore | No change |
123456789101112131415161718192021222324252627282930313233343536373839
PROCEDURE UndoPhase(transactionTable): // Initialize ToUndo with UndoNextLSN for all losers ToUndo = MaxHeap() // Ordered by LSN (highest first) FOR EACH txn IN transactionTable: IF txn.status IN {ACTIVE, ABORTING}: IF txn.undoNextLSN != NULL: ToUndo.Insert(txn.undoNextLSN, txn.transactionId) // Main undo loop: process in reverse LSN order WHILE NOT ToUndo.IsEmpty(): (currentLSN, txnId) = ToUndo.ExtractMax() logRecord = Log.Read(currentLSN) IF logRecord.type == UPDATE: // Undo this update nextLSN = UndoSingleRecord(logRecord) IF nextLSN != NULL: ToUndo.Insert(nextLSN, txnId) ELSE: // Reached BEGIN - transaction fully undone Log.Write(END_RECORD, txnId) transactionTable.Remove(txnId) ELSE IF logRecord.type == CLR: // Don't re-undo; follow the undoNextLSN IF logRecord.undoNextLSN != NULL: ToUndo.Insert(logRecord.undoNextLSN, txnId) ELSE: // CLR points to nothing - transaction fully undone Log.Write(END_RECORD, txnId) transactionTable.Remove(txnId) ELSE IF logRecord.type == BEGIN: // Transaction fully undone Log.Write(END_RECORD, txnId) transactionTable.Remove(txnId) // Undo phase complete - all losers rolled backCLRs are redo-only log records. They exist to record the fact that an undo occurred, allowing that undo to be redone if there's a subsequent crash. If we "undid" a CLR, we would be reversing a reversal—essentially re-applying the original change that we were trying to undo. The undoNextLSN pointer in CLRs allows us to skip them correctly during undo.
The undo phase operates while the database is being recovered—it's not yet open for normal operations. However, maintaining consistency during this phase is still critical because crashes can occur during recovery itself.
The Crash-During-Recovery Problem:
Imagine we're halfway through undoing a transaction when power fails again. What happens?
The system correctly continues from where it left off, without re-undoing operations that were already undone.
The Role of the END Record:
When a transaction's undo is complete—either by reaching its BEGIN record or by processing a CLR with null undoNextLSN—an END record is written to the log. This serves as definitive proof that the transaction has been fully rolled back.
The END record is important for log truncation: once a transaction's END record is written, all of its log records (including CLRs) are eligible for eventual removal. Without END records, the system couldn't determine which portions of the log are safe to reclaim.
Concurrent Undo Considerations:
While the basic ARIES undo is single-threaded (one backward scan), some implementations parallelize undo by assigning different loser transactions to different worker threads. This requires careful coordination:
The undo phase terminates when the ToUndo set becomes empty, meaning every loser transaction has been fully rolled back. At this point:
Database State:
Log State:
System State:
123456789101112131415161718192021222324
PROCEDURE CompleteRecovery(): // At this point, all three phases are done: // - Analysis: Reconstructed DPT and TT // - Redo: Replayed all logged operations // - Undo: Rolled back all loser transactions // Verify completion ASSERT transactionTable.IsEmpty() OR ALL(txn IN transactionTable: txn.status == COMMITTED) // Optional: force a checkpoint now // This reduces future recovery time IF config.CHECKPOINT_AFTER_RECOVERY: WriteCheckpoint() // Enable normal operations SetSystemStatus(ONLINE) // Begin accepting new connections OpenForTransactions() Log.Info("Recovery complete. Database is online.") Log.Info("Undo phase rolled back {count} loser transactions.") Log.Info("Recovery time: {elapsed} seconds.")After ARIES recovery completes, the database is in a state where: (1) Every committed transaction's effects are fully present, honoring durability. (2) Every uncommitted transaction's effects are completely absent, honoring atomicity. (3) The database satisfies all integrity constraints, honoring consistency. This is the fundamental promise of the recovery system.
This page established the foundational concepts for the undo phase of ARIES recovery. Let's consolidate the key insights:
What's Next:
Now that we understand the fundamentals of undoing incomplete transactions, the next page dives deep into Compensation Log Records (CLRs)—the mechanism that makes undo operations themselves durable and crash-proof. We'll explore their structure, why they're "redo-only," and how they form a chain that tracks undo progress.
You now understand why incomplete transactions must be undone, how loser transactions are identified, and the fundamental mechanics of the undo phase. The next page explores Compensation Log Records (CLRs) in detail—the key mechanism that makes undo operations themselves recoverable.