ARIES Recovery Algorithm Overview

Level: Advanced

Duration: 75 mins

Topic: ARIES Overview

The Three Phases of ARIES Recovery

A Symphony in Three Movements

When a database system restarts after a crash, ARIES orchestrates recovery through three carefully choreographed phases: Analysis, Redo, and Undo. Each phase has a specific responsibility, operates on distinct data structures, and must complete before the next can begin. Together, they transform a potentially corrupted database into a consistent one—and they do so correctly even if the system crashes again during recovery.

This three-phase structure is the architectural backbone of ARIES. Understanding what each phase does, why it does it, and how the phases interact is essential for mastering database recovery.

What You Will Learn

By the end of this page, you will understand: (1) how the Analysis phase reconstructs the system state from the log, (2) how the Redo phase restores the exact crash-time database state, (3) how the Undo phase rolls back incomplete transactions while protecting against nested crashes, and (4) how these phases interconnect to guarantee correct recovery.

Overview of the Three Phases

Before diving into each phase, let's understand their relationships and responsibilities at a high level.

Sequential Dependency

The three phases execute in strict sequence:

Analysis must complete before Redo can begin (Redo needs the Dirty Page Table and RedoLSN; the Transaction Table is used later by Undo)
Redo must complete before Undo can begin (Undo requires the database to be in crash-time state)
Undo must complete before the database can accept new transactions

Each phase depends on outputs from the previous phase. Skipping or reordering phases would break correctness.
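The hand-off between phases can be sketched as a small driver. This is an illustrative outline only: `recoverSketch`, `PhaseHooks`, `AnalysisOut`, and their field names are hypothetical, and the phase bodies are supplied by the caller rather than implemented here.

```typescript
// Hypothetical shapes for what each phase hands to the next.
type AnalysisOut = {
  redoLSN: number;                 // where Redo starts scanning
  dirtyPages: Map<string, number>; // pageId -> recLSN (the DPT)
  losers: string[];                // transactions that Undo must roll back
};

type PhaseHooks = {
  analysis: () => AnalysisOut;
  redo: (redoLSN: number, dirtyPages: Map<string, number>) => void;
  undo: (losers: string[]) => void;
};

// Runs the phases in the only valid order and returns the order observed.
function recoverSketch(hooks: PhaseHooks): string[] {
  const order: string[] = [];

  const out = hooks.analysis();            // 1. rebuild TT, DPT, RedoLSN
  order.push("analysis");

  hooks.redo(out.redoLSN, out.dirtyPages); // 2. repeat history
  order.push("redo");

  hooks.undo(out.losers);                  // 3. roll back losers
  order.push("undo");

  return order; // only now may new transactions be admitted
}
```

Because Redo and Undo take their inputs from the value Analysis returns, the ordering is enforced by data flow rather than by convention.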

Distinct Responsibilities

The Three Phases at a Glance
Phase	Question Answered	Primary Actions	Key Outputs
Analysis	What was the state at crash time?	Scan log forward from checkpoint	Transaction Table, Dirty Page Table, RedoLSN
Redo	How do we restore crash-time state?	Re-apply logged operations	Database restored to crash-time state
Undo	How do we roll back incomplete work?	Reverse uncommitted operations	Consistent database (only committed effects remain)

Time Complexity Characteristics

Each phase has different time characteristics:

Analysis: Time proportional to log size since last checkpoint (typically fast)
Redo: Time proportional to log size since oldest dirty page, modified by how many operations actually need redo
Undo: Time proportional to the amount of work done by uncommitted transactions (highly variable)

In practice, Analysis is usually the fastest phase, Redo is predictably bounded, and Undo can be the wildcard—a single long-running uncommitted transaction can mean extended undo time.

Why This Order?

Analysis → Redo → Undo isn't arbitrary. Analysis determines WHAT to redo. Redo restores the state needed for correct undo. Undo requires the exact crash-time state because logical undo operations (like decrementing a counter) only work correctly if the page is in the expected state. Reordering would break these dependencies.

Phase 1: Analysis

The Analysis phase answers a fundamental question: What was the state of the database system when it crashed?

At the moment of crash, we have no in-memory structures—all volatile state is lost. But the log on stable storage contains a complete record of operations. The Analysis phase reconstructs:

Transaction Table (TT): Which transactions were active, and what was their last logged operation?
Dirty Page Table (DPT): Which pages had been modified but potentially not written to disk?
RedoLSN: Where should the Redo phase begin scanning?

Starting Point: The Last Checkpoint

Analysis doesn't scan from the beginning of the log—that could be enormous. Instead, it starts from the last checkpoint record. A checkpoint contains snapshots of the Transaction Table and Dirty Page Table at checkpoint time. Analysis uses these as a starting point, then updates them by scanning forward through the log.

Analysis Phase Algorithm
// ARIES Analysis Phase Algorithm
function AnalysisPhase(log: Log, lastCheckpoint: CheckpointRecord): AnalysisResult {
    
    // Initialize from checkpoint
    let TransactionTable = clone(lastCheckpoint.TransactionTable);
    let DirtyPageTable = clone(lastCheckpoint.DirtyPageTable);
    
    // Start scanning from checkpoint's LSN
    let currentLSN = lastCheckpoint.LSN;
    
    // Scan forward through the log to end
    while (hasMoreRecords(log, currentLSN)) {
        let record = log.read(currentLSN);
        
        if (record.type === "END") {
            // Transaction completed (committed or rolled back)
            // Remove from transaction table
            TransactionTable.remove(record.transactionId);
        }
        else if (record.transactionId !== null) {
            // Any record associated with a transaction
            
            // Ensure transaction is in the table
            if (!TransactionTable.has(record.transactionId)) {
                TransactionTable.add(record.transactionId, {
                    state: "UNDO",  // Assume needs undo until we see COMMIT
                    lastLSN: record.LSN
                });
            } else {
                // Update last LSN for this transaction
                TransactionTable.get(record.transactionId).lastLSN = record.LSN;
            }
            
            // Handle COMMIT specially
            if (record.type === "COMMIT") {
                TransactionTable.get(record.transactionId).state = "COMMIT";
            }
        }
        
        // Track dirty pages
        if (record.type === "UPDATE" || record.type === "CLR") {
            let pageId = record.pageId;
            
            // Add to DPT if not already present
            // RecLSN is the FIRST log record that dirtied the page
            if (!DirtyPageTable.has(pageId)) {
                DirtyPageTable.add(pageId, {
                    recLSN: record.LSN  // First LSN that dirtied this page
                });
            }
            // Note: we don't update recLSN if page already in DPT
            // because we want the OLDEST dirtying LSN
        }
        
        currentLSN = nextLSN(log, currentLSN);
    }
    
    // Calculate RedoLSN: earliest recLSN in DPT
    // This is where redo phase will start
    let recLSNs = DirtyPageTable.values().map(e => e.recLSN);
    let RedoLSN = recLSNs.length > 0
        ? min(recLSNs)
        : lastCheckpoint.LSN;  // No dirty pages: nothing to redo, start from checkpoint
    
    return {
        TransactionTable,
        DirtyPageTable,
        RedoLSN
    };
}

Key Observations About Analysis

Forward scan only: Analysis reads the log from checkpoint to end, never backward.
Conservative dirty page tracking: If a page appears in an UPDATE or CLR record, it's added to the DPT. The page might actually be clean (already flushed to disk), but we don't know—we'll discover this during redo.
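Transaction state inference: Any transaction whose COMMIT record did not reach the log is treated as a loser and will be undone, even if it was in the middle of committing when the crash occurred. A transaction with a COMMIT record but no END record is a winner; recovery only needs to write its END record.
RecLSN captures first modification: The DPT's recLSN for each page is the LSN of the first log record that dirtied it since it was last flushed. For pages inherited from the checkpoint's DPT, this can be older than the checkpoint itself. This determines where redo must start for that page.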
RedoLSN is the minimum recLSN: Redo must start early enough to capture all necessary operations. The earliest recLSN in the DPT ensures this.

Analysis is Conservative

Analysis errs on the side of caution. It may include pages in the DPT that are actually clean (flushed to disk). It may start redo earlier than strictly necessary. This conservatism is intentional: considering an extra record for redo is harmless because the Redo phase skips already-applied operations via the PageLSN comparison, whereas missing a record could cause corruption.

Analysis Data Structures in Detail

Let's examine the two critical data structures that Analysis produces:

Transaction Table (TT)

The Transaction Table tracks every transaction that was active at crash time. For each transaction, it records:

Transaction Table Structure
Field	Type	Description	Example Values
TransactionID	integer	Unique identifier for the transaction	T1, T2, T3...
State	enum	Current transaction state	UNDO (active), COMMIT (committed but not ended)
LastLSN	LSN	Most recent log record written by this transaction	LSN 5000, 7250, etc.
UndoNextLSN	LSN	Next record to undo for this transaction	Initially same as LastLSN

State Transitions During Analysis

As Analysis scans forward:

New transaction entries are created when we see a log record for an unknown transaction
State changes to COMMIT when we see a COMMIT record
Entries are removed entirely when we see an END record

After Analysis, only transactions that need undo (state = UNDO) or are in the process of committing (state = COMMIT, but no END yet) remain.
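The three transitions can be sketched on a toy log. The record shape (`TxnLogRec`) and the function `scanTransactions` are assumptions made for illustration; page tracking is left out so that only the Transaction Table logic is visible.

```typescript
type TxnState = "UNDO" | "COMMIT";
type TxnEntry = { state: TxnState; lastLSN: number };
type TxnLogRec = { lsn: number; txn: string; kind: "UPDATE" | "COMMIT" | "END" };

// Applies the three transitions described above to a starting table
// (for example, the table stored in the last checkpoint).
function scanTransactions(
  start: Map<string, TxnEntry>,
  records: TxnLogRec[]
): Map<string, TxnEntry> {
  const table = new Map<string, TxnEntry>();
  start.forEach((e, id) => table.set(id, { state: e.state, lastLSN: e.lastLSN }));

  for (const rec of records) {
    if (rec.kind === "END") {
      table.delete(rec.txn); // finished: nothing left to recover
      continue;
    }
    const entry = table.get(rec.txn);
    if (entry === undefined) {
      // First record seen for this transaction
      table.set(rec.txn, {
        state: rec.kind === "COMMIT" ? "COMMIT" : "UNDO",
        lastLSN: rec.lsn,
      });
    } else {
      entry.lastLSN = rec.lsn;
      if (rec.kind === "COMMIT") entry.state = "COMMIT";
    }
  }
  return table;
}
```

Feeding it a log in which T1 commits and ends, T2 only updates, and T3 commits without an END record leaves T2 in state UNDO and T3 in state COMMIT, with T1 absent.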

Dirty Page Table (DPT)

The Dirty Page Table identifies which pages may contain unflushed modifications:

Dirty Page Table Structure
Field	Type	Description	Example Values
PageID	identifier	Unique identifier for the database page	Page 42, Page 1023, etc.
RecLSN	LSN	LSN of the first log record that dirtied this page since it was last flushed	LSN 3500, 4200, etc.

Why RecLSN Matters

The RecLSN is the crucial value in the DPT. It represents the earliest point from which redo might be necessary for this page. Consider:

At checkpoint time, Page 42 was either clean or had LSN X as its PageLSN
After the checkpoint, log record at LSN 3500 modified Page 42
Even if Page 42 was later modified by LSN 4200, 5000, etc., we must start redo from LSN 3500
If we started from LSN 5000, we'd miss the changes from LSN 3500 and 4200

The RecLSN answers: "What's the earliest log record that might not be reflected on this page's disk version?"
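The "first writer wins" rule is small enough to sketch directly. Here a plain `Map` from page ID to recLSN stands in for the DPT, and `noteDirty` and `recLSNAfter` are hypothetical helper names.

```typescript
// pageId -> recLSN. A plain Map stands in for the DPT in this sketch.
function noteDirty(dpt: Map<string, number>, pageId: string, lsn: number): void {
  // First writer wins: later updates must not overwrite the recLSN.
  if (!dpt.has(pageId)) {
    dpt.set(pageId, lsn);
  }
}

// Replays a list of (pageId, LSN) updates and returns the resulting DPT.
function recLSNAfter(updates: Array<[string, number]>): Map<string, number> {
  const dpt = new Map<string, number>();
  for (const [pageId, lsn] of updates) {
    noteDirty(dpt, pageId, lsn);
  }
  return dpt;
}
```

With updates to Page 42 at LSNs 3500, 4200, and 5000, the entry for Page 42 stays at 3500, which matches the reasoning in the list above.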

Computing RedoLSN

The RedoLSN is simply the minimum RecLSN across all entries in the DPT:

RedoLSN = min(RecLSN for all pages in DPT)

This ensures we start redo early enough to capture all necessary operations. Even if most pages need redo from LSN 5000, if one page might need it from LSN 3500, we start from 3500.

If the DPT is empty (no dirty pages), RedoLSN falls back to the checkpoint's LSN. In that case redo skips every record, because no page passes the DPT membership check.
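A minimal sketch of this computation, including the empty-DPT fallback. It assumes LSNs are plain numbers and that the DPT is a `Map` from page ID to recLSN; `computeRedoLSN` is a hypothetical name.

```typescript
// Returns the smallest recLSN in the DPT, or the checkpoint LSN if the DPT is empty.
function computeRedoLSN(dpt: Map<string, number>, checkpointLSN: number): number {
  const recLSNs: number[] = [];
  dpt.forEach((recLSN) => recLSNs.push(recLSN));

  if (recLSNs.length === 0) {
    return checkpointLSN; // no dirty pages: redo will have nothing to apply
  }
  return Math.min(...recLSNs);
}
```

The explicit emptiness check matters in JavaScript and TypeScript, where `Math.min()` with no arguments returns `Infinity` rather than signalling an error.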

Checkpoint Contents

The checkpoint record contains the Transaction Table and Dirty Page Table at checkpoint time. Analysis starts with these as initial values, then updates them by scanning forward. This is why checkpoints accelerate recovery—we don't reconstruct from the beginning of time, just from the last checkpoint.

Phase 2: Redo

The Redo phase implements ARIES's signature principle: Repeat History. Starting from RedoLSN, it re-applies logged operations to restore the database to its exact crash-time state. This includes operations from transactions that will be rolled back—we redo everything, then undo the losers.

Why Repeat History?

This approach might seem wasteful. Why redo work that we'll undo immediately after? The answer lies in page-level consistency:

Physical state restoration: A page might have been modified by multiple transactions. To correctly undo one transaction, the page must be in its crash-time state, including changes from other transactions.
CLR handling: If the system crashed during rollback, CLRs in the log record partial undo progress. Redo must apply these CLRs to restore the interrupted rollback state.
Simplicity and correctness: Redo doesn't need to track transaction status. It mechanically re-applies operations. This separation of concerns makes the algorithm easier to verify.

Redo Algorithm

Redo Phase Algorithm
// ARIES Redo Phase Algorithm
function RedoPhase(
    log: Log, 
    RedoLSN: LSN, 
    DirtyPageTable: Map<PageID, DPTEntry>,
    Buffer: BufferManager
): void {
    
    let currentLSN = RedoLSN;
    
    // Scan forward from RedoLSN to end of log
    while (hasMoreRecords(log, currentLSN)) {
        let record = log.read(currentLSN);
        
        // Only UPDATE and CLR records modify pages
        // (COMMIT, ABORT, END, CHECKPOINT records are skipped)
        if (record.type === "UPDATE" || record.type === "CLR") {
            
            let pageId = record.pageId;
            let dptEntry = DirtyPageTable.get(pageId);
            
            // Filter 1: Skip if the page is not in the DPT.
            // Every change to such a page reached disk before the crash.
            // Filter 2: Skip if record LSN < page's recLSN.
            // The change was flushed before the page became dirty again.
            if (dptEntry !== undefined && record.LSN >= dptEntry.recLSN) {
                
                // Filter 3: Compare with the PageLSN of the fetched page.
                // If PageLSN >= record LSN, this change is already on the page.
                // The page is fetched once and reused for the redo itself.
                let page = Buffer.fetchPage(pageId);
                
                if (page.PageLSN < record.LSN) {
                    // Apply the redo information from the log record
                    applyRedo(page, record.redoInfo);
                    
                    // Update the page's LSN to this record's LSN
                    page.PageLSN = record.LSN;
                    
                    // Note: Page is NOT forced to disk here
                    // Buffer manager will flush later per normal operation
                }
            }
        }
        
        currentLSN = nextLSN(log, currentLSN);
    }
    
    // After redo: database is in exact crash-time state
}
 
// Helper: Apply redo information to a page
function applyRedo(page: Page, redoInfo: RedoData): void {
    // For physical logging: copy bytes directly
    // For physiological: re-execute operation
    // For logical: evaluate operation in current context
    // (Implementation depends on logging type)
}

The LSN Comparison Trick

The most elegant optimization in ARIES redo is the PageLSN comparison. Every database page carries a PageLSN in its header—the LSN of the last log record that modified it. During redo:

If PageLSN >= RecordLSN, the modification is already on the page (was flushed to disk before crash)
If PageLSN < RecordLSN, the modification was lost (page had old version on disk)

This single comparison replaces complex bookkeeping. We don't maintain which operations were applied—the page's LSN tells us directly.
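To see the comparison at work, the sketch below uses a deliberately non-idempotent operation ("add a delta to a counter"). The `CounterPage` and `AddRec` shapes and both function names are assumptions for illustration, not part of any real engine.

```typescript
type CounterPage = { pageLSN: number; value: number };
type AddRec = { lsn: number; delta: number }; // "add delta" is not idempotent

// Applies rec only if the page has not seen it yet; returns true when applied.
function redoIfNeeded(page: CounterPage, rec: AddRec): boolean {
  if (page.pageLSN >= rec.lsn) {
    return false; // already reflected on the page
  }
  page.value += rec.delta;
  page.pageLSN = rec.lsn; // stamp the page so a repeated pass skips this record
  return true;
}

// Replays a whole log against one page and reports which LSNs were applied.
function replayLog(page: CounterPage, log: AddRec[]): number[] {
  const applied: number[] = [];
  for (const rec of log) {
    if (redoIfNeeded(page, rec)) {
      applied.push(rec.lsn);
    }
  }
  return applied;
}
```

If the on-disk page already reflects LSNs 10 and 20, a replay of LSNs 10, 20, 30 applies only LSN 30, and a second replay of the same log applies nothing, so the counter is never incremented twice.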

Three-Level Redo Filtering

Redo uses three levels of filtering to minimize unnecessary work:

Redo Filtering Optimizations

•DPT membership: If the page isn't in the Dirty Page Table, it was clean at crash—skip all redo for this page. (Level 1: cheapest check)
•RecLSN comparison: If the record's LSN is older than the page's RecLSN in the DPT, this operation predates the page becoming dirty—skip it. (Level 2: DPT lookup)
•PageLSN comparison: Fetch the page and compare its PageLSN to the record's LSN. If already applied, skip. (Level 3: requires I/O but definitive)

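PageLSN Makes Redo Idempotent

The DPT membership and RecLSN checks are pure optimizations: without them, redo would still be correct, just slower, because every page would have to be fetched and compared. The PageLSN comparison is different. It is what makes redo idempotent, so that a record is applied to a page at most once even if recovery itself is restarted. Without it, re-applying a non-idempotent operation (such as a physiological increment) would corrupt the page.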
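The three filters can be sketched as one decision function. The point of the ordering is that the page is fetched only when the two cheap checks fail to rule the record out. `redoDecision`, `RedoVerdict`, and the `fetchPageLSN` callback are hypothetical names, and LSNs are assumed to be plain numbers.

```typescript
type RedoVerdict =
  | "skip: not in DPT"
  | "skip: before recLSN"
  | "skip: already on page"
  | "redo";

function redoDecision(
  dpt: Map<string, number>,                 // pageId -> recLSN
  pageId: string,
  recordLSN: number,
  fetchPageLSN: (pageId: string) => number  // may cost an I/O
): RedoVerdict {
  const recLSN = dpt.get(pageId);
  if (recLSN === undefined) return "skip: not in DPT";     // level 1
  if (recordLSN < recLSN) return "skip: before recLSN";    // level 2
  if (fetchPageLSN(pageId) >= recordLSN) {
    return "skip: already on page";                        // level 3
  }
  return "redo";
}
```

Passing the page fetch as a callback makes it easy to count how often level 3 is actually reached in a test.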

Redo Phase: Worked Example

Let's trace through a redo scenario to solidify understanding. Consider this initial state:

Scenario: State Before Crash
Item	Value
Last Checkpoint LSN	1000
Crash occurred at LSN	5000
RedoLSN (from Analysis)	2000

Dirty Page Table (from Analysis)
PageID	RecLSN
Page A	2000
Page B	2800
Page C	3500

Page States on Disk (Before Recovery)
PageID	PageLSN on Disk	Interpretation
Page A	2500	Has changes up to LSN 2500; missing LSNs 3000 and 5000
Page B	4000	Has changes up to LSN 4000; missing LSN 4500
Page C	1800	Last written before RedoLSN; missing LSN 3500

Log Records from LSN 2000 to 5000:

Log Records to Redo
LSN	Type	Page	Action	Redo Decision
2000	UPDATE	Page A	Set X = 10	RecLSN = 2000, PageLSN = 2500 ≥ 2000 → Skip
2500	UPDATE	Page A	Set Y = 20	RecLSN = 2000, PageLSN = 2500 ≥ 2500 → Skip
2800	UPDATE	Page B	Set Z = 30	RecLSN = 2800, PageLSN = 4000 ≥ 2800 → Skip
3000	UPDATE	Page A	Set X = 15	RecLSN = 2000, PageLSN = 2500 < 3000 → REDO
3500	UPDATE	Page C	Set W = 40	RecLSN = 3500, PageLSN = 1800 < 3500 → REDO
4000	UPDATE	Page B	Set Z = 35	RecLSN = 2800, PageLSN = 4000 ≥ 4000 → Skip
4500	UPDATE	Page B	Set Z = 40	RecLSN = 2800, PageLSN = 4000 < 4500 → REDO
5000	UPDATE	Page A	Set Y = 25	RecLSN = 2000, PageLSN now = 3000 < 5000 → REDO

Redo Trace:

LSN 2000, 2500 (Page A): Skipped, because PageLSN 2500 shows these are already on disk
LSN 2800 (Page B): Skipped, because PageLSN 4000 shows this is already on disk
LSN 3000 (Page A): REDO, because PageLSN 2500 < 3000 (modification lost in crash)
- After redo: Page A's PageLSN = 3000
LSN 3500 (Page C): REDO, because PageLSN 1800 < 3500 (modification lost)
- After redo: Page C's PageLSN = 3500
LSN 4000 (Page B): Skipped, because PageLSN 4000 ≥ 4000
LSN 4500 (Page B): REDO, because PageLSN 4000 < 4500
- After redo: Page B's PageLSN = 4500
LSN 5000 (Page A): REDO, because PageLSN 3000 < 5000
- After redo: Page A's PageLSN = 5000

After Redo:

Page A: PageLSN = 5000 (fully recovered)
Page B: PageLSN = 4500 (fully recovered)
Page C: PageLSN = 3500 (fully recovered)

The database is now in exactly the state it was at the moment of crash.

Redo Restored Crash-Time State

After redo, every page reflects every operation that was logged before the crash. This includes uncommitted transactions—we haven't judged what to keep yet. That's the Undo phase's job. Redo just repeats history faithfully.

Phase 3: Undo

The Undo phase completes recovery by rolling back all transactions that were active at crash time but hadn't committed. These are the loser transactions—their effects must be removed to achieve a consistent final state.

Identifying Loser Transactions

After Analysis, the Transaction Table contains entries for all transactions that were active at crash. Any transaction with state = UNDO (no COMMIT record) is a loser. These transactions' modifications are currently in the database (restored by redo) but must be removed.

Undo Direction: Backward

Unlike redo (which scans forward chronologically), undo proceeds backward—from most recent to oldest. We use the LastLSN from each transaction's TT entry as the starting point, then follow PrevLSN pointers backward through that transaction's log records.

Compensation Log Records (CLRs)

Here's where ARIES becomes truly resilient: when we undo an operation, we write a Compensation Log Record (CLR) describing the undo action. CLRs are crucial for crash-during-recovery robustness:

If we crash during undo, the CLRs recorded so far will be redone (since redo repeats all history)
After redo, we resume undo from where we left off, not from the beginning
CLRs have an UndoNextLSN pointer that tells us where undo should continue

Undo Phase Algorithm
// ARIES Undo Phase Algorithm
function UndoPhase(
    log: Log,
    TransactionTable: Map<TransactionID, TTEntry>,
    Buffer: BufferManager
): void {
    
    // Collect all LSNs that need to be undone
    // These form the "ToUndo" set
    let ToUndo: MaxHeap<LSN> = new MaxHeap();
    
    for (let [txnId, entry] of TransactionTable.entries()) {
        if (entry.state === "UNDO") {
            // Add the last LSN of each loser transaction
            ToUndo.insert(entry.lastLSN);
        }
    }
    
    // Process in reverse LSN order (largest first)
    while (!ToUndo.isEmpty()) {
        let lsnToUndo = ToUndo.extractMax();
        let record = log.read(lsnToUndo);
        
        if (record.type === "CLR") {
            // CLR records are never undone themselves!
            // Just follow the UndoNextLSN pointer
            if (record.UndoNextLSN !== null) {
                ToUndo.insert(record.UndoNextLSN);
            }
            // If UndoNextLSN is null, this transaction's undo is complete
        }
        else if (record.type === "UPDATE") {
            // Undo this update
            let entry = TransactionTable.get(record.transactionId);
            
            // 1. Read the page
            let page = Buffer.fetchPage(record.pageId);
            
            // 2. Apply the undo operation
            applyUndo(page, record.undoInfo);
            
            // 3. Write a CLR for this undo action
            let clr = {
                type: "CLR",
                transactionId: record.transactionId,
                pageId: record.pageId,
                redoInfo: computeRedoForUndo(record), // How to redo this undo
                UndoNextLSN: record.PrevLSN, // Where to continue undoing
                PrevLSN: entry.lastLSN // Previous record written by this transaction
            };
            let clrLSN = log.append(clr);
            
            // 4. Update the page LSN and the transaction's lastLSN
            page.PageLSN = clrLSN;
            entry.lastLSN = clrLSN;
            
            // 5. Add the PrevLSN to continue undoing this transaction
            if (record.PrevLSN !== null) {
                ToUndo.insert(record.PrevLSN);
            }
        }
        else if (record.PrevLSN !== null) {
            // Other record types (ABORT, BEGIN, etc.) need no undo themselves,
            // but the transaction's earlier records still do: keep following the chain
            ToUndo.insert(record.PrevLSN);
        }
    }
    
    // Write END records for all loser transactions
    for (let [txnId, entry] of TransactionTable.entries()) {
        if (entry.state === "UNDO") {
            log.append({
                type: "END",
                transactionId: txnId
            });
        }
    }
    
    // Recovery complete!
}

The ToUndo Set and Processing Order

Undo maintains a set (typically a max-heap) of LSNs that need processing. Processing proceeds in descending LSN order—we undo the most recent operations first. This is important because:

Dependency ordering: Later operations might depend on earlier ones. Undoing in reverse order respects these dependencies.
Page state: An UPDATE might assume the page is in a certain state. That state was produced by earlier operations. Undoing the latest first maintains consistency.
CLR chaining: When we undo an operation and write a CLR, the CLR's UndoNextLSN points to the next record to undo. Processing in order ensures we can follow this chain correctly.

What About CLR Records?

When undo encounters a CLR record, it does not undo it. CLRs are redo-only records. Instead, undo follows the CLR's UndoNextLSN pointer to find the next record that needs undoing. This is how ARIES handles crash-during-recovery: CLRs bookmark our undo progress.

CLRs Are Never Undone

A critical invariant: we never undo a CLR. CLRs record that we've undone something—undoing them would mean re-doing the original operation, which contradicts the rollback. Instead, CLRs are redone if we crash during recovery, then we resume undo from where we left off via UndoNextLSN.

Undo Phase: Worked Example

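Now let's turn to undo. After redo, the database is in crash-time state, and we must undo the loser transactions. To keep the trace short, this example uses its own small log: its LSNs and page assignments are independent of the redo trace above.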

Transaction Table (After Analysis)
TransactionID	State	LastLSN
T1	COMMIT	4000
T2	UNDO	5000
T3	UNDO	4500

T1 is committed (we saw its COMMIT record), so it's a winner. T2 and T3 are losers—they must be undone.

Log Records for Loser Transactions:

Log Records to Process
LSN	TxnID	Type	Page	PrevLSN	Action
2000	T2	UPDATE	Page A	null	Set X from 0 to 10
2500	T3	UPDATE	Page B	null	Set Y from 0 to 20
3000	T2	UPDATE	Page A	2000	Set X from 10 to 15
4500	T3	UPDATE	Page B	2500	Set Y from 20 to 25
5000	T2	UPDATE	Page A	3000	Set X from 15 to 18

Undo Processing (Descending LSN Order):

Step 1: Process LSN 5000 (T2)

Undo: Set X from 18 back to 15
Write CLR at LSN 5001: "Set X = 15", UndoNextLSN = 3000
Add LSN 3000 to ToUndo set

Step 2: Process LSN 4500 (T3)

Undo: Set Y from 25 back to 20
Write CLR at LSN 5002: "Set Y = 20", UndoNextLSN = 2500
Add LSN 2500 to ToUndo set

Step 3: Process LSN 3000 (T2)

Undo: Set X from 15 back to 10
Write CLR at LSN 5003: "Set X = 10", UndoNextLSN = 2000
Add LSN 2000 to ToUndo set

Step 4: Process LSN 2500 (T3)

Undo: Set Y from 20 back to 0
Write CLR at LSN 5004: "Set Y = 0", UndoNextLSN = null (end of T3's chain)
T3 undo is complete

Step 5: Process LSN 2000 (T2)

Undo: Set X from 10 back to 0
Write CLR at LSN 5005: "Set X = 0", UndoNextLSN = null (end of T2's chain)
T2 undo is complete

Step 6: Write END Records

Write END record for T2
Write END record for T3

Final State:

Page A: X = 0 (T2's changes undone)
Page B: Y = 0 (T3's changes undone)
T1's committed changes remain in the database

Recovery Complete

After undo, the database is consistent. Only T1's committed effects remain. T2 and T3's modifications have been rolled back. CLRs ensure that if we crash during this undo process, we can resume correctly without re-undoing already-undone work.
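The trace above can be reproduced with a short simulation. This is a sketch under simplifying assumptions: a sorted array stands in for the max-heap, the "database" is a `Map` of named values rather than pages, CLRs omit the PrevLSN field, and `UndoLogRec` and `undoLosers` are hypothetical names.

```typescript
type UndoLogRec =
  | { kind: "UPDATE"; lsn: number; txn: string; prevLSN: number | null; key: string; before: number }
  | { kind: "CLR"; lsn: number; txn: string; undoNextLSN: number | null; key: string; restored: number };

// Rolls back the given loser transactions. Appends CLRs to `log`, mutates `db`,
// and returns the CLRs written, in order.
function undoLosers(
  log: UndoLogRec[],
  loserLastLSN: Map<string, number>, // loser transaction -> lastLSN
  db: Map<string, number>
): UndoLogRec[] {
  const byLSN = new Map<number, UndoLogRec>();
  let maxLSN = 0;
  for (const rec of log) {
    byLSN.set(rec.lsn, rec);
    if (rec.lsn > maxLSN) maxLSN = rec.lsn;
  }

  const toUndo: number[] = [];
  loserLastLSN.forEach((lsn) => toUndo.push(lsn));
  const written: UndoLogRec[] = [];

  while (toUndo.length > 0) {
    // Take the largest LSN first (a real implementation would use a heap).
    toUndo.sort((a, b) => a - b);
    const lsn = toUndo.pop() as number;
    const rec = byLSN.get(lsn);
    if (rec === undefined) throw new Error("no log record at LSN " + lsn);

    if (rec.kind === "CLR") {
      // Never undone: just continue where the CLR points.
      if (rec.undoNextLSN !== null) toUndo.push(rec.undoNextLSN);
      continue;
    }

    db.set(rec.key, rec.before); // apply the undo
    maxLSN += 1;                 // next free LSN in this toy log
    const clr: UndoLogRec = {
      kind: "CLR", lsn: maxLSN, txn: rec.txn,
      undoNextLSN: rec.prevLSN, key: rec.key, restored: rec.before,
    };
    log.push(clr);
    byLSN.set(clr.lsn, clr);
    written.push(clr);
    if (rec.prevLSN !== null) toUndo.push(rec.prevLSN);
  }
  return written;
}
```

Run on the five UPDATE records from the table, with X = 18 and Y = 25 as the crash-time values, this sketch writes CLRs at LSNs 5001 through 5005 with the same UndoNextLSN values as Steps 1 to 5 and leaves X = 0 and Y = 0.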

Handling Crash During Recovery

One of ARIES's defining strengths is nested crash handling—the ability to recover correctly even if the system crashes during recovery. Let's trace how this works.

Scenario: Crash During Undo

Suppose in our previous example, the system crashes after Step 3 (after writing CLR at LSN 5003, before completing T3's undo).

State at Second Crash:

CLRs at LSN 5001, 5002, 5003 are on stable storage (written before crash)
T2 has been partially undone (LSN 5000 and 3000 undone)
T3 has been partially undone (LSN 4500 undone, 2500 not yet)

Recovery from Second Crash:

Analysis Phase:

Scans from last checkpoint, sees all original records plus CLRs
Transaction Table: T2 (state=UNDO, lastLSN=5003), T3 (state=UNDO, lastLSN=5002)
DPT includes pages modified by CLRs

Redo Phase:

Redoes all operations including CLRs
After redo: database is in state at moment of second crash
This includes the undo work that was already done (CLRs are redone)

Undo Phase:

ToUndo set initialized with: T2's lastLSN = 5003, T3's lastLSN = 5002
Process LSN 5003 (CLR for T2): Follow UndoNextLSN = 2000
Process LSN 5002 (CLR for T3): Follow UndoNextLSN = 2500
Process LSN 2500 (T3): Undo, write CLR 5004
Process LSN 2000 (T2): Undo, write CLR 5005
Write END records

The Key Insight: CLRs as Undo Progress Markers

Notice what happened with CLRs during recovery from the nested crash:

We encountered CLR 5003 (for T2's undo of LSN 3000)
We did not try to undo the CLR
Instead, we followed UndoNextLSN to LSN 2000
This is the next record T2 needs to undo—we didn't re-undo LSN 5000 or 3000

CLRs act as undo progress markers. After a crash-during-recovery:

Redo re-applies all CLRs, restoring the partial undo state
Undo continues from where it left off via UndoNextLSN
No work is repeated; no work is missed
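The resume behaviour can be sketched by walking one transaction's chain backward from its lastLSN. This is a per-transaction view (the full algorithm merges all losers into one ToUndo set), and `ChainRec` and `remainingUndo` are hypothetical names.

```typescript
type ChainRec =
  | { kind: "UPDATE"; lsn: number; prevLSN: number | null }
  | { kind: "CLR"; lsn: number; undoNextLSN: number | null };

// Walks backward from one transaction's lastLSN and lists the UPDATEs still to undo.
function remainingUndo(log: ChainRec[], lastLSN: number): number[] {
  const byLSN = new Map<number, ChainRec>();
  for (const rec of log) byLSN.set(rec.lsn, rec);

  const pending: number[] = [];
  let cursor: number | null = lastLSN;
  while (cursor !== null) {
    const rec = byLSN.get(cursor);
    if (rec === undefined) throw new Error("no log record at LSN " + cursor);
    if (rec.kind === "CLR") {
      cursor = rec.undoNextLSN; // jump over everything already undone
    } else {
      pending.push(rec.lsn);    // still needs undo
      cursor = rec.prevLSN;
    }
  }
  return pending;
}
```

Starting from T2's last CLR at LSN 5003, the walk reaches only LSN 2000; starting from T3's CLR at LSN 5002, it reaches only LSN 2500. Starting from LSN 5000, as in the first recovery, it would visit 5000, 3000, and 2000.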

Why CLRs Can't Be Undone

If we tried to undo a CLR, we'd be undoing an undo—which means redoing the original operation. This would violate our goal of rolling back the transaction. The rule is absolute: CLRs are redo-only, never undone.

Bounded Recovery Time

Because CLRs are redone and undo progress is preserved, nested crashes don't cause exponential work. Each recovery attempt repeats Analysis and Redo, but an update that already has a CLR is never undone again. The total number of CLRs written across all recovery attempts is therefore bounded by the number of loser updates in the original log.

Summary: The Three Phases

We've explored the three phases of ARIES recovery in depth. Let's consolidate the key concepts:

Key Takeaways

•Analysis reconstructs crash-time state by scanning from the last checkpoint, building the Transaction Table and Dirty Page Table, and determining RedoLSN.
•Redo repeats history by re-applying all logged operations from RedoLSN forward, using PageLSN comparison to skip already-applied operations.
•Undo rolls back loser transactions by processing their operations in reverse order, writing CLRs to record progress.
•CLRs enable crash-during-recovery resilience by serving as undo progress markers that are redone (not undone) in subsequent recoveries.
•The phases are sequential and dependent: Analysis feeds Redo, Redo prepares state for Undo, Undo achieves consistency.
•LSN comparison is the key optimization, enabling efficient redo by checking if operations are already reflected on pages.

What's Next

Now that we understand the three phases, we'll dive deeper into the Log Sequence Number (LSN) concept—the backbone identifier system that makes ARIES work efficiently. LSNs enable everything from redo skipping to CLR chaining to checkpoint optimization.

Page Complete

You now understand how ARIES's three phases work together to recover from crashes. Analysis determines state, Redo restores it, and Undo cleans up uncommitted work—all while handling nested crashes gracefully through CLRs.

2 / 5

Loading learning content...

Database Management SystemsARIES Overview

ARIES Recovery Algorithm Overview

LevelAdvanced

Duration75 mins

TopicARIES Overview

2 / 5

The Three Phases of ARIES Recovery

A Symphony in Three Movements

This three-phase structure is the architectural backbone of ARIES. Understanding what each phase does, why it does it, and how the phases interact is essential for mastering database recovery.

What You Will Learn

Overview of the Three Phases

Before diving into each phase, let's understand their relationships and responsibilities at a high level.

Sequential Dependency

The three phases execute in strict sequence:

Analysis must complete before Redo can begin (Redo needs the Dirty Page Table and Transaction Table)
Redo must complete before Undo can begin (Undo requires the database to be in crash-time state)
Undo must complete before the database can accept new transactions

Each phase depends on outputs from the previous phase. Skipping or reordering phases would break correctness.

Distinct Responsibilities

The Three Phases at a Glance
Phase	Question Answered	Primary Actions	Key Outputs
Analysis	What was the state at crash time?	Scan log forward from checkpoint	Transaction Table, Dirty Page Table, RedoLSN
Redo	How do we restore crash-time state?	Re-apply logged operations	Database restored to crash-time state
Undo	How do we roll back incomplete work?	Reverse uncommitted operations	Consistent database (only committed effects remain)

Time Complexity Characteristics

Each phase has different time characteristics:

Analysis: Time proportional to log size since last checkpoint (typically fast)
Redo: Time proportional to log size since oldest dirty page, modified by how many operations actually need redo
Undo: Time proportional to the amount of work done by uncommitted transactions (highly variable)

In practice, Analysis is usually the fastest phase, Redo is predictably bounded, and Undo can be the wildcard—a single long-running uncommitted transaction can mean extended undo time.

Why This Order?

Phase 1: Analysis

The Analysis phase answers a fundamental question: What was the state of the database system when it crashed?

At the moment of crash, we have no in-memory structures—all volatile state is lost. But the log on stable storage contains a complete record of operations. The Analysis phase reconstructs:

Transaction Table (TT): Which transactions were active, and what was their last logged operation?
Dirty Page Table (DPT): Which pages had been modified but potentially not written to disk?
RedoLSN: Where should the Redo phase begin scanning?

Starting Point: The Last Checkpoint

Analysis Phase Algorithm
// ARIES Analysis Phase Algorithm
function AnalysisPhase(log: Log, lastCheckpoint: CheckpointRecord): AnalysisResult {
    
    // Initialize from checkpoint
    let TransactionTable = clone(lastCheckpoint.TransactionTable);
    let DirtyPageTable = clone(lastCheckpoint.DirtyPageTable);
    
    // Start scanning from checkpoint's LSN
    let currentLSN = lastCheckpoint.LSN;
    
    // Scan forward through the log to end
    while (hasMoreRecords(log, currentLSN)) {
        let record = log.read(currentLSN);
        
        if (record.type === "END") {
            // Transaction completed (committed or rolled back)
            // Remove from transaction table
            TransactionTable.remove(record.transactionId);
        }
        else if (record.transactionId !== null) {
            // Any record associated with a transaction
            
            // Ensure transaction is in the table
            if (!TransactionTable.has(record.transactionId)) {
                TransactionTable.add(record.transactionId, {
                    state: "UNDO",  // Assume needs undo until we see COMMIT
                    lastLSN: record.LSN
                });
            } else {
                // Update last LSN for this transaction
                TransactionTable.get(record.transactionId).lastLSN = record.LSN;
            }
            
            // Handle COMMIT specially
            if (record.type === "COMMIT") {
                TransactionTable.get(record.transactionId).state = "COMMIT";
            }
        }
        
        // Track dirty pages
        if (record.type === "UPDATE" || record.type === "CLR") {
            let pageId = record.pageId;
            
            // Add to DPT if not already present
            // RecLSN is the FIRST log record that dirtied the page
            if (!DirtyPageTable.has(pageId)) {
                DirtyPageTable.add(pageId, {
                    recLSN: record.LSN  // First LSN that dirtied this page
                });
            }
            // Note: we don't update recLSN if page already in DPT
            // because we want the OLDEST dirtying LSN
        }
        
        currentLSN = nextLSN(log, currentLSN);
    }
    
    // Calculate RedoLSN: earliest recLSN in DPT
    // This is where redo phase will start
    let RedoLSN = min(DirtyPageTable.values().map(e => e.recLSN));
    if (RedoLSN === undefined) {
        RedoLSN = lastCheckpoint.LSN;  // No dirty pages, start from checkpoint
    }
    
    return {
        TransactionTable,
        DirtyPageTable,
        RedoLSN
    };
}

Key Observations About Analysis

Forward scan only: Analysis reads the log from checkpoint to end, never backward.
Conservative dirty page tracking: If a page appears in an UPDATE or CLR record, it's added to the DPT. The page might actually be clean (already flushed to disk), but we don't know—we'll discover this during redo.
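Transaction state inference: Any transaction with no COMMIT record in the log is assumed to need undo. Even if the transaction was in the middle of committing when the crash occurred, if its COMMIT record never reached the log, we treat it as uncommitted. A transaction with a COMMIT but no END record is a winner; it only needs its END record written.
RecLSN captures first modification: The DPT's recLSN for each page is the first log record that dirtied it since the page was last clean. For pages inherited from the checkpoint's DPT, that record can precede the checkpoint. This determines where redo must start for that page.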
RedoLSN is the minimum recLSN: Redo must start early enough to capture all necessary operations. The earliest recLSN in the DPT ensures this.

Analysis is Conservative

Analysis Data Structures in Detail

Let's examine the two critical data structures that Analysis produces:

Transaction Table (TT)

The Transaction Table tracks every transaction that was active at crash time. For each transaction, it records:

Transaction Table Structure
Field	Type	Description	Example Values
TransactionID	integer	Unique identifier for the transaction	T1, T2, T3...
State	enum	Current transaction state	UNDO (active), COMMIT (committed but not ended)
LastLSN	LSN	Most recent log record written by this transaction	LSN 5000, 7250, etc.
UndoNextLSN	LSN	Next record to undo for this transaction	Initially same as LastLSN

State Transitions During Analysis

As Analysis scans forward:

New transaction entries are created when we see a log record for an unknown transaction
State changes to COMMIT when we see a COMMIT record
Entries are removed entirely when we see an END record

After Analysis, only transactions that need undo (state = UNDO) or are in the process of committing (state = COMMIT, but no END yet) remain.
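The transitions above can be traced on a tiny log. The sketch below is a minimal illustration rather than the full Analysis routine; the `LogRecord` shape, the `trackTransactions` helper, and the sample LSNs are all assumptions made for this example.

```typescript
type TxnState = "UNDO" | "COMMIT";

interface LogRecord {
  lsn: number;
  txnId: string;
  type: "UPDATE" | "CLR" | "COMMIT" | "ABORT" | "END";
}

interface TTEntry {
  state: TxnState;
  lastLSN: number;
}

// Apply the Transaction Table rules to each record, in log order.
function trackTransactions(log: LogRecord[]): Map<string, TTEntry> {
  const table = new Map<string, TTEntry>();
  log.forEach((rec) => {
    if (rec.type === "END") {
      table.delete(rec.txnId); // finished: nothing left to recover
      return;
    }
    const entry = table.get(rec.txnId);
    if (entry === undefined) {
      // First record seen for this transaction: assume it needs undo
      // unless that record is itself a COMMIT.
      table.set(rec.txnId, {
        state: rec.type === "COMMIT" ? "COMMIT" : "UNDO",
        lastLSN: rec.lsn,
      });
    } else {
      entry.lastLSN = rec.lsn;
      if (rec.type === "COMMIT") {
        entry.state = "COMMIT";
      }
    }
  });
  return table;
}

// Hypothetical log: T1 commits and ends, T2 commits without END, T3 stays active.
const sampleLog: LogRecord[] = [
  { lsn: 10, txnId: "T1", type: "UPDATE" },
  { lsn: 20, txnId: "T2", type: "UPDATE" },
  { lsn: 30, txnId: "T1", type: "COMMIT" },
  { lsn: 40, txnId: "T3", type: "UPDATE" },
  { lsn: 50, txnId: "T1", type: "END" },
  { lsn: 60, txnId: "T2", type: "COMMIT" },
];

const tableAtCrash = trackTransactions(sampleLog);
```

In this run T1 disappears from the table, T2 remains with state COMMIT, and T3 remains with state UNDO, which is exactly the split between winners and losers that Undo relies on.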

Dirty Page Table (DPT)

The Dirty Page Table identifies which pages may contain unflushed modifications:

Dirty Page Table Structure
Field	Type	Description	Example Values
PageID	identifier	Unique identifier for the database page	Page 42, Page 1023, etc.
RecLSN	LSN	LSN of the first log record that dirtied this page since it was last clean (may precede the checkpoint)	LSN 3500, 4200, etc.

Why RecLSN Matters

The RecLSN is the crucial value in the DPT. It represents the earliest point from which redo might be necessary for this page. Consider:

Suppose Page 42 was clean at checkpoint time, so it does not appear in the checkpoint's DPT
After the checkpoint, log record at LSN 3500 modified Page 42
Even if Page 42 was later modified by LSN 4200, 5000, etc., we must start redo from LSN 3500
If we started from LSN 5000, we'd miss the changes from LSN 3500 and 4200

The RecLSN answers: "What's the earliest log record that might not be reflected on this page's disk version?"
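To see why only the first dirtying LSN is kept, here is a minimal sketch using the Page 42 numbers above. The `noteUpdate` helper and the choice to model the DPT as a plain `Map` from page name to recLSN are assumptions made for this illustration.

```typescript
// Dirty Page Table modelled as: page name -> recLSN.
type DirtyPages = Map<string, number>;

// Record that the log record at `lsn` modified `pageId`.
// Only the FIRST dirtying LSN is stored; later updates leave it untouched.
function noteUpdate(dirtyPages: DirtyPages, pageId: string, lsn: number): void {
  if (!dirtyPages.has(pageId)) {
    dirtyPages.set(pageId, lsn);
  }
}

const page42Example: DirtyPages = new Map<string, number>();
noteUpdate(page42Example, "Page 42", 3500);
noteUpdate(page42Example, "Page 42", 4200);
noteUpdate(page42Example, "Page 42", 5000);

const recLSNForPage42 = page42Example.get("Page 42"); // 3500
```

Overwriting the entry on every update would leave 5000 in the table, and redo would then start too late for this page.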

Computing RedoLSN

The RedoLSN is simply the minimum RecLSN across all entries in the DPT:

RedoLSN = min(RecLSN for all pages in DPT)

This ensures we start redo early enough to capture all necessary operations. Even if most pages need redo from LSN 5000, if one page might need it from LSN 3500, we start from 3500.

If the DPT is empty (no dirty pages), RedoLSN is the checkpoint's LSN, but redo will skip every record at the DPT membership check, so no pages are touched.
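The RedoLSN rule, including the empty-DPT fallback, fits in a few lines. This is a sketch under the same simplifying assumption as before (the DPT as a `Map` from page name to recLSN); `computeRedoLSN` and the page names are hypothetical.

```typescript
// Smallest recLSN in the DPT, or the checkpoint LSN when no page is dirty.
function computeRedoLSN(dirtyPages: Map<string, number>, checkpointLSN: number): number {
  const recLSNs = Array.from(dirtyPages.values());
  if (recLSNs.length === 0) {
    return checkpointLSN;
  }
  return recLSNs.reduce((smallest, lsn) => Math.min(smallest, lsn));
}

// Most pages became dirty around LSN 5000, but one became dirty at 3500.
const mostlyRecent = new Map<string, number>([
  ["Page 7", 5000],
  ["Page 42", 3500],
  ["Page 99", 5200],
]);

const redoStart = computeRedoLSN(mostlyRecent, 1000); // 3500
const redoStartNoDirty = computeRedoLSN(new Map<string, number>(), 1000); // 1000
```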

Checkpoint Contents

Phase 2: Redo

Why Repeat History?

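Redo re-applies every logged change, including changes made by transactions that Undo will roll back moments later. This might seem wasteful: why redo work that we'll undo immediately after? The answer lies in page-level consistency: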

Physical state restoration: A page might have been modified by multiple transactions. To correctly undo one transaction, the page must be in its crash-time state, including changes from other transactions.
CLR handling: If the system crashed during rollback, CLRs in the log record partial undo progress. Redo must apply these CLRs to restore the interrupted rollback state.
Simplicity and correctness: Redo doesn't need to track transaction status. It mechanically re-applies operations. This separation of concerns makes the algorithm easier to verify.

Redo Algorithm

Redo Phase Algorithm
// ARIES Redo Phase Algorithm
function RedoPhase(
    log: Log, 
    RedoLSN: LSN, 
    DirtyPageTable: Map<PageID, DPTEntry>,
    Buffer: BufferManager
): void {
    
    let currentLSN = RedoLSN;
    
    // Scan forward from RedoLSN to end of log
    while (hasMoreRecords(log, currentLSN)) {
        let record = log.read(currentLSN);
        
        // Only redo UPDATE and CLR records
        // (COMMIT, ABORT, END, CHECKPOINT don't modify pages)
        if (record.type === "UPDATE" || record.type === "CLR") {
            
            let pageId = record.pageId;
            let shouldRedo = true;
            
            // Optimization 1: Skip if page not in DPT
            // If page isn't in DPT, it was clean at checkpoint
            // and not modified after, so no redo needed
            if (!DirtyPageTable.has(pageId)) {
                shouldRedo = false;
            }
            
            // Optimization 2: Check if record LSN < page's RecLSN
            // RecLSN is the first record that dirtied the page since it
            // was last clean, so every older record for this page
            // already reached disk
            else if (record.LSN < DirtyPageTable.get(pageId).recLSN) {
                shouldRedo = false;
            }
            
            // Optimization 3: Compare with PageLSN on disk
            // If PageLSN >= record LSN, this change is already on disk
            else {
                let page = Buffer.fetchPage(pageId);
                if (page.PageLSN >= record.LSN) {
                    // Already applied, skip redo
                    shouldRedo = false;
                }
            }
            
            // Actually perform redo if needed
            if (shouldRedo) {
                let page = Buffer.fetchPage(pageId);
                
                // Apply the redo information from the log record
                applyRedo(page, record.redoInfo);
                
                // Update the page's LSN to this record's LSN
                page.PageLSN = record.LSN;
                
                // Note: Page is NOT forced to disk here
                // Buffer manager will flush later per normal operation
            }
        }
        
        currentLSN = nextLSN(log, currentLSN);
    }
    
    // After redo: database is in exact crash-time state
}
 
// Helper: Apply redo information to a page
function applyRedo(page: Page, redoInfo: RedoData): void {
    // For physical logging: copy bytes directly
    // For physiological: re-execute operation
    // For logical: evaluate operation in current context
    // (Implementation depends on logging type)
}

The LSN Comparison Trick

The most elegant optimization in ARIES redo is the PageLSN comparison. Every database page carries a PageLSN in its header—the LSN of the last log record that modified it. During redo:

If PageLSN >= RecordLSN, the modification is already on the page (was flushed to disk before crash)
If PageLSN < RecordLSN, the modification was lost (page had old version on disk)

This single comparison replaces complex bookkeeping. We don't maintain which operations were applied—the page's LSN tells us directly.

Three-Level Redo Filtering

Redo uses three levels of filtering to minimize unnecessary work:

Redo Filtering Optimizations

•DPT membership: If the page isn't in the Dirty Page Table, it was clean at crash—skip all redo for this page. (Level 1: cheapest check)
•RecLSN comparison: If the record's LSN is older than the page's RecLSN in the DPT, this operation predates the page becoming dirty—skip it. (Level 2: DPT lookup)
•PageLSN comparison: Fetch the page and compare its PageLSN to the record's LSN. If already applied, skip. (Level 3: requires I/O but definitive)
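The three filters can be written as one decision function that reports which level rejected a record. This is a sketch, not the page's full Redo routine: `decideRedo`, the `readPageLSN` callback (standing in for a buffer-manager fetch), and the page names and LSNs are assumptions for illustration.

```typescript
type RedoDecision =
  | "skip: not in DPT"
  | "skip: before recLSN"
  | "skip: already on page"
  | "redo";

interface PageRecord {
  lsn: number;
  pageId: string;
}

function decideRedo(
  record: PageRecord,
  dirtyPages: Map<string, number>,          // page name -> recLSN
  readPageLSN: (pageId: string) => number   // the only step that costs I/O
): RedoDecision {
  const recLSN = dirtyPages.get(record.pageId);
  if (recLSN === undefined) return "skip: not in DPT";      // level 1
  if (record.lsn < recLSN) return "skip: before recLSN";     // level 2
  if (readPageLSN(record.pageId) >= record.lsn) {
    return "skip: already on page";                          // level 3
  }
  return "redo";
}

// Count page fetches to show that levels 1 and 2 never touch a page.
let pageFetches = 0;
const onDiskPageLSN = new Map<string, number>([["P1", 120], ["P2", 105]]);
const fetchPageLSN = (pageId: string): number => {
  pageFetches += 1;
  return onDiskPageLSN.get(pageId) ?? 0;
};
const filterDpt = new Map<string, number>([["P1", 100], ["P2", 110]]);

const decisions: RedoDecision[] = [
  decideRedo({ lsn: 105, pageId: "P9" }, filterDpt, fetchPageLSN), // P9 not in DPT
  decideRedo({ lsn: 105, pageId: "P2" }, filterDpt, fetchPageLSN), // 105 < recLSN 110
  decideRedo({ lsn: 115, pageId: "P1" }, filterDpt, fetchPageLSN), // PageLSN 120 >= 115
  decideRedo({ lsn: 130, pageId: "P1" }, filterDpt, fetchPageLSN), // PageLSN 120 < 130
];
```

Four records are examined but only two page fetches happen, because the first two records are rejected by the in-memory checks.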

Redo is Idempotent

Redo Phase: Worked Example

Let's trace through a redo scenario to solidify understanding. Consider this initial state:

Scenario: State Before Crash
Item	Value
Last Checkpoint LSN	1000
Crash occurred at LSN	5000
RedoLSN (from Analysis)	2000

Dirty Page Table (from Analysis)
PageID	RecLSN
Page A	2000
Page B	2800
Page C	3500

Page States on Disk (Before Recovery)
PageID	PageLSN on Disk	Interpretation
Page A	2500	Has changes up to LSN 2500, missing LSN 3000 and 5000
Page B	4000	Has changes up to LSN 4000, missing LSN 4500
Page C	900	Last written before the checkpoint, missing LSN 3500

Log Records from LSN 2000 to 5000:

Log Records to Redo
LSN	Type	Page	Action	Redo Decision
2000	UPDATE	Page A	Set X = 10	RecLSN = 2000, PageLSN = 2500 ≥ 2000 → Skip
2500	UPDATE	Page A	Set Y = 20	RecLSN = 2000, PageLSN = 2500 ≥ 2500 → Skip
2800	UPDATE	Page B	Set Z = 30	RecLSN = 2800, PageLSN = 4000 ≥ 2800 → Skip
3000	UPDATE	Page A	Set X = 15	RecLSN = 2000, PageLSN = 2500 < 3000 → REDO
3500	UPDATE	Page C	Set W = 40	RecLSN = 3500, PageLSN = 900 < 3500 → REDO
4000	UPDATE	Page B	Set Z = 35	RecLSN = 2800, PageLSN = 4000 ≥ 4000 → Skip
4500	UPDATE	Page B	Set Z = 40	RecLSN = 2800, PageLSN = 4000 < 4500 → REDO
5000	UPDATE	Page A	Set Y = 25	RecLSN = 2000, PageLSN now = 3000 < 5000 → REDO

Redo Trace:

LSN 2000, 2500 (Page A): Skipped, because PageLSN 2500 shows these are already on disk
LSN 2800 (Page B): Skipped, because PageLSN 4000 shows this is already on disk
LSN 3000 (Page A): REDO, because PageLSN 2500 < 3000 means the modification was lost in the crash
- After redo: Page A's PageLSN = 3000
LSN 3500 (Page C): REDO, because PageLSN 900 < 3500 means the modification was lost
- After redo: Page C's PageLSN = 3500
LSN 4000 (Page B): Skipped, because PageLSN 4000 ≥ 4000
LSN 4500 (Page B): REDO, because PageLSN 4000 < 4500
- After redo: Page B's PageLSN = 4500
LSN 5000 (Page A): REDO, because PageLSN 3000 < 5000
- After redo: Page A's PageLSN = 5000

After Redo:

Page A: PageLSN = 5000 (fully recovered)
Page B: PageLSN = 4500 (fully recovered)
Page C: PageLSN = 3500 (fully recovered)

The database is now in exactly the state it was at the moment of crash.
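Page A's part of the trace can be replayed in code, which also shows that running redo a second time changes nothing. This is a simplified sketch: a page is reduced to a PageLSN plus a few named values, and `redoPage` is a hypothetical helper rather than the Redo routine shown earlier.

```typescript
interface Page {
  pageLSN: number;
  values: Record<string, number>;
}

interface PageUpdate {
  lsn: number;
  key: string;
  value: number;
}

// Re-apply updates to one page, skipping any whose effect the page already carries.
// Returns the LSNs that were actually redone.
function redoPage(page: Page, recLSN: number, updates: PageUpdate[]): number[] {
  const redone: number[] = [];
  for (const u of updates) {
    if (u.lsn < recLSN) continue;        // predates the page becoming dirty
    if (page.pageLSN >= u.lsn) continue; // already reflected on the page
    page.values[u.key] = u.value;
    page.pageLSN = u.lsn;
    redone.push(u.lsn);
  }
  return redone;
}

// Page A as found on disk: it already carries LSN 2000 and LSN 2500.
const pageA: Page = { pageLSN: 2500, values: { X: 10, Y: 20 } };
const pageAUpdates: PageUpdate[] = [
  { lsn: 2000, key: "X", value: 10 },
  { lsn: 2500, key: "Y", value: 20 },
  { lsn: 3000, key: "X", value: 15 },
  { lsn: 5000, key: "Y", value: 25 },
];

const firstPass = redoPage(pageA, 2000, pageAUpdates);  // [3000, 5000]
const secondPass = redoPage(pageA, 2000, pageAUpdates); // [] (nothing left to redo)
```

The second pass finds PageLSN 5000 on the page and skips every record, which is what makes it safe to restart redo after a crash during recovery.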

Redo Restored Crash-Time State

Phase 3: Undo

Identifying Loser Transactions

Undo Direction: Backward

Compensation Log Records (CLRs)

Here's where ARIES becomes truly resilient: when we undo an operation, we write a Compensation Log Record (CLR) describing the undo action. CLRs are crucial for crash-during-recovery robustness:

If we crash during undo, the CLRs recorded so far will be redone (since redo repeats all history)
After redo, we resume undo from where we left off, not from the beginning
CLRs have an UndoNextLSN pointer that tells us where undo should continue
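A CLR can be pictured as a redo-only record whose UndoNextLSN skips past the update it compensates. The sketch below builds one; the field names, the `buildClr` helper, and the single-number before/after values are assumptions chosen to keep the example small.

```typescript
interface UpdateRec {
  type: "UPDATE";
  lsn: number;
  txnId: string;
  pageId: string;
  prevLSN: number | null; // previous record of the same transaction
  before: number;         // undo information: value before the update
  after: number;          // redo information: value after the update
}

interface ClrRec {
  type: "CLR";
  lsn: number;
  txnId: string;
  pageId: string;
  prevLSN: number;            // the transaction's most recent record so far
  undoNextLSN: number | null; // next record of this transaction still to undo
  after: number;              // redo information: the value the undo restored
}

// Build the CLR that describes undoing `undone`.
function buildClr(undone: UpdateRec, clrLSN: number, txnLastLSN: number): ClrRec {
  return {
    type: "CLR",
    lsn: clrLSN,
    txnId: undone.txnId,
    pageId: undone.pageId,
    prevLSN: txnLastLSN,
    undoNextLSN: undone.prevLSN, // skip past the record just undone
    after: undone.before,        // redoing the CLR re-installs the old value
  };
}

// Illustrative numbers: T2's update at LSN 5000 changed X from 15 to 18.
const update5000: UpdateRec = {
  type: "UPDATE", lsn: 5000, txnId: "T2", pageId: "Page A",
  prevLSN: 3000, before: 15, after: 18,
};
const clr5001 = buildClr(update5000, 5001, 5000);
```

Note that the CLR carries no undo information at all, which is one way to see that it can only ever be redone.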

Undo Phase Algorithm
// ARIES Undo Phase Algorithm
function UndoPhase(
    log: Log,
    TransactionTable: Map<TransactionID, TTEntry>,
    Buffer: BufferManager
): void {
    
    // Collect all LSNs that need to be undone
    // These form the "ToUndo" set
    let ToUndo: MaxHeap<LSN> = new MaxHeap();
    
    for (let [txnId, entry] of TransactionTable.entries()) {
        if (entry.state === "UNDO") {
            // Add the last LSN of each loser transaction
            ToUndo.insert(entry.lastLSN);
        }
    }
    
    // Process in reverse LSN order (largest first)
    while (!ToUndo.isEmpty()) {
        let lsnToUndo = ToUndo.extractMax();
        let record = log.read(lsnToUndo);
        
        if (record.type === "CLR") {
            // CLR records are never undone themselves!
            // Just follow the UndoNextLSN pointer
            if (record.UndoNextLSN !== null) {
                ToUndo.insert(record.UndoNextLSN);
            }
            // If UndoNextLSN is null, this transaction's undo is complete
        }
        else if (record.type === "UPDATE") {
            // Undo this update
            
            // 1. Read the page
            let page = Buffer.fetchPage(record.pageId);
            
            // 2. Apply the undo operation
            applyUndo(page, record.undoInfo);
            
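            // 3. Write a CLR for this undo action
            let txnEntry = TransactionTable.get(record.transactionId);
            let clr = {
                type: "CLR",
                transactionId: record.transactionId,
                pageId: record.pageId,
                redoInfo: computeRedoForUndo(record), // How to redo this undo
                UndoNextLSN: record.PrevLSN, // Where to continue undoing
                PrevLSN: txnEntry.lastLSN    // Previous record written by this transaction
            };
            let clrLSN = log.append(clr);
            
            // The CLR is now the transaction's most recent log record
            txnEntry.lastLSN = clrLSN;
            
            // 4. Update page LSN
            page.PageLSN = clrLSN;
            
            // 5. Add the PrevLSN to continue undoing this transaction
            if (record.PrevLSN !== null) {
                ToUndo.insert(record.PrevLSN);
            }
        }
        else if (record.PrevLSN !== null) {
            // Other record types (ABORT, BEGIN, etc.) have nothing to undo,
            // but the transaction's earlier records still do: keep following the chain
            ToUndo.insert(record.PrevLSN);
        }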
    }
    
    // Write END records for all loser transactions
    for (let [txnId, entry] of TransactionTable.entries()) {
        if (entry.state === "UNDO") {
            log.append({
                type: "END",
                transactionId: txnId
            });
        }
    }
    
    // Recovery complete!
}

The ToUndo Set and Processing Order

Undo maintains a set (typically a max-heap) of LSNs that need processing. Processing proceeds in descending LSN order—we undo the most recent operations first. This is important because:

Dependency ordering: Later operations might depend on earlier ones. Undoing in reverse order respects these dependencies.
Page state: An UPDATE might assume the page is in a certain state. That state was produced by earlier operations. Undoing the latest first maintains consistency.
CLR chaining: When we undo an operation and write a CLR, the CLR's UndoNextLSN points to the next record to undo. Processing in order ensures we can follow this chain correctly.
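The processing order falls out of two ingredients: each record's PrevLSN pointer and "always take the largest pending LSN". The sketch below uses a plain array instead of a real max-heap to stay short; `undoOrder` and the two hypothetical transaction chains are assumptions for illustration.

```typescript
// PrevLSN pointers for two interleaved loser transactions (null = first record).
const prevLSNOf = new Map<number, number | null>([
  [5000, 3000], [3000, 2000], [2000, null], // transaction T2
  [4500, 2500], [2500, null],               // transaction T3
]);

// Returns the order in which undo would visit the records.
function undoOrder(lastLSNs: number[], prev: Map<number, number | null>): number[] {
  const toUndo = lastLSNs.slice();
  const order: number[] = [];
  while (toUndo.length > 0) {
    // Pick the largest pending LSN (a max-heap would do this in O(log n)).
    const maxIndex = toUndo.indexOf(Math.max(...toUndo));
    const lsn = toUndo.splice(maxIndex, 1)[0];
    order.push(lsn);
    const previous = prev.get(lsn);
    if (previous !== undefined && previous !== null) {
      toUndo.push(previous);
    }
  }
  return order;
}

const visitOrder = undoOrder([5000, 4500], prevLSNOf); // [5000, 4500, 3000, 2500, 2000]
```

The result is a single backward sweep over the log even though two transactions are being rolled back at once.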

What About CLR Records?

CLRs Are Never Undone

Undo Phase: Worked Example

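Let's stay with the same recovery (checkpoint at LSN 1000, crash at LSN 5000) and follow the loser transactions. The log below is a simplified, transaction-centric view rather than a row-for-row copy of the redo table. After redo, the database is in crash-time state. Now we must undo the loser transactions.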

Transaction Table (After Analysis)
TransactionID	State	LastLSN
T1	COMMIT	4000
T2	UNDO	5000
T3	UNDO	4500

T1 is committed (we saw its COMMIT record), so it's a winner. T2 and T3 are losers—they must be undone.

Log Records for Loser Transactions:

Log Records to Process
LSN	TxnID	Type	Page	PrevLSN	Action
2000	T2	UPDATE	Page A	null	Set X from 0 to 10
2500	T3	UPDATE	Page B	null	Set Y from 0 to 20
3000	T2	UPDATE	Page A	2000	Set X from 10 to 15
4500	T3	UPDATE	Page B	2500	Set Y from 20 to 25
5000	T2	UPDATE	Page A	3000	Set X from 15 to 18

Undo Processing (Descending LSN Order):

Step 1: Process LSN 5000 (T2)

Undo: Set X from 18 back to 15
Write CLR at LSN 5001: "Set X = 15", UndoNextLSN = 3000
Add LSN 3000 to ToUndo set

Step 2: Process LSN 4500 (T3)

Undo: Set Y from 25 back to 20
Write CLR at LSN 5002: "Set Y = 20", UndoNextLSN = 2500
Add LSN 2500 to ToUndo set

Step 3: Process LSN 3000 (T2)

Undo: Set X from 15 back to 10
Write CLR at LSN 5003: "Set X = 10", UndoNextLSN = 2000
Add LSN 2000 to ToUndo set

Step 4: Process LSN 2500 (T3)

Undo: Set Y from 20 back to 0
Write CLR at LSN 5004: "Set Y = 0", UndoNextLSN = null (end of T3's chain)
T3 undo is complete

Step 5: Process LSN 2000 (T2)

Undo: Set X from 10 back to 0
Write CLR at LSN 5005: "Set X = 0", UndoNextLSN = null (end of T2's chain)
T2 undo is complete

Step 6: Write END Records

Write END record for T2
Write END record for T3

Final State:

Page A: X = 0 (T2's changes undone)
Page B: Y = 0 (T3's changes undone)
T1's committed changes remain in the database
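The five steps above can be replayed with a small simulation. It is deliberately simplified and rests on several assumptions: pages are collapsed into a key-value object, PageLSNs and END records are left out, the ToUndo set is a sorted array, and `runUndo` and the record shapes are names invented for this sketch.

```typescript
type Rec =
  | { type: "UPDATE"; lsn: number; txn: string; key: string; before: number; after: number; prevLSN: number | null }
  | { type: "CLR"; lsn: number; txn: string; key: string; restored: number; undoNextLSN: number | null };

// Undo the loser transactions, appending one CLR to `log` per undone update.
function runUndo(log: Rec[], loserLastLSNs: number[], db: Record<string, number>): void {
  const byLSN = new Map<number, Rec>();
  log.forEach((r) => byLSN.set(r.lsn, r));
  let nextLSN = Math.max(...log.map((r) => r.lsn)) + 1;
  const toUndo = loserLastLSNs.slice();

  while (toUndo.length > 0) {
    toUndo.sort((a, b) => a - b);
    const lsn = toUndo.pop() as number; // largest pending LSN
    const rec = byLSN.get(lsn);
    if (rec === undefined) throw new Error("missing log record " + lsn);

    let continueAt: number | null;
    if (rec.type === "CLR") {
      continueAt = rec.undoNextLSN; // never undo a CLR, just follow its pointer
    } else {
      db[rec.key] = rec.before;     // apply the undo
      const clr: Rec = {
        type: "CLR", lsn: nextLSN, txn: rec.txn, key: rec.key,
        restored: rec.before, undoNextLSN: rec.prevLSN,
      };
      nextLSN += 1;
      log.push(clr);
      byLSN.set(clr.lsn, clr);
      continueAt = rec.prevLSN;
    }
    if (continueAt !== null) toUndo.push(continueAt);
  }
}

const undoLog: Rec[] = [
  { type: "UPDATE", lsn: 2000, txn: "T2", key: "X", before: 0, after: 10, prevLSN: null },
  { type: "UPDATE", lsn: 2500, txn: "T3", key: "Y", before: 0, after: 20, prevLSN: null },
  { type: "UPDATE", lsn: 3000, txn: "T2", key: "X", before: 10, after: 15, prevLSN: 2000 },
  { type: "UPDATE", lsn: 4500, txn: "T3", key: "Y", before: 20, after: 25, prevLSN: 2500 },
  { type: "UPDATE", lsn: 5000, txn: "T2", key: "X", before: 15, after: 18, prevLSN: 3000 },
];
const database: Record<string, number> = { X: 18, Y: 25 }; // crash-time values after redo

runUndo(undoLog, [5000, 4500], database);
```

After the call, `database` holds X = 0 and Y = 0, and `undoLog` has gained five CLRs at LSN 5001 through 5005 whose UndoNextLSN values are 3000, 2500, 2000, null, and null, matching the steps above.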

Recovery Complete

Handling Crash During Recovery

One of ARIES's defining strengths is nested crash handling—the ability to recover correctly even if the system crashes during recovery. Let's trace how this works.

Scenario: Crash During Undo

Suppose that in our previous example the system crashes after Step 3 (after writing the CLR at LSN 5003, before either T2's or T3's undo has finished).

State at Second Crash:

CLRs at LSN 5001, 5002, 5003 are on stable storage (written before crash)
T2 has been partially undone (LSN 5000 and 3000 undone)
T3 has been partially undone (LSN 4500 undone, 2500 not yet)

Recovery from Second Crash:

Analysis Phase:

Scans from last checkpoint, sees all original records plus CLRs
Transaction Table: T2 (state=UNDO, lastLSN=5003), T3 (state=UNDO, lastLSN=5002)
DPT includes pages modified by CLRs

Redo Phase:

Redoes all operations including CLRs
After redo: database is in state at moment of second crash
This includes the undo work that was already done (CLRs are redone)

Undo Phase:

ToUndo set initialized with: T2's lastLSN = 5003, T3's lastLSN = 5002
Process LSN 5003 (CLR for T2): Follow UndoNextLSN = 2000
Process LSN 5002 (CLR for T3): Follow UndoNextLSN = 2500
Process LSN 2500 (T3): Undo, write CLR 5006
Process LSN 2000 (T2): Undo, write CLR 5007
Write END records

The Key Insight: CLRs as Undo Progress Markers

Notice what happened with CLRs during recovery from the nested crash:

We encountered CLR 5003 (for T2's undo of LSN 3000)
We did not try to undo the CLR
Instead, we followed UndoNextLSN to LSN 2000
This is the next record T2 needs to undo—we didn't re-undo LSN 5000 or 3000

CLRs act as undo progress markers. After a crash-during-recovery:

Redo re-applies all CLRs, restoring the partial undo state
Undo continues from where it left off via UndoNextLSN
No work is repeated; no work is missed
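The resume behaviour can be checked by asking which UPDATE records a restarted Undo phase would still compensate. The sketch below uses the log as it stood at the second crash; `remainingUndoWork` and the record shape are assumptions for illustration, and a sorted array again stands in for the max-heap.

```typescript
interface LogRec {
  lsn: number;
  type: "UPDATE" | "CLR";
  prevLSN: number | null;     // previous record of the same transaction
  undoNextLSN: number | null; // only meaningful for CLRs
}

// LSNs of the UPDATE records that a restarted undo would still compensate, in order.
function remainingUndoWork(log: LogRec[], loserLastLSNs: number[]): number[] {
  const byLSN = new Map<number, LogRec>();
  log.forEach((r) => byLSN.set(r.lsn, r));
  const toUndo = loserLastLSNs.slice();
  const stillToUndo: number[] = [];
  while (toUndo.length > 0) {
    toUndo.sort((a, b) => a - b);
    const lsn = toUndo.pop() as number; // largest pending LSN
    const rec = byLSN.get(lsn);
    if (rec === undefined) throw new Error("missing log record " + lsn);
    if (rec.type === "UPDATE") stillToUndo.push(rec.lsn);
    // CLRs jump via UndoNextLSN; updates step back via PrevLSN.
    const next = rec.type === "CLR" ? rec.undoNextLSN : rec.prevLSN;
    if (next !== null) toUndo.push(next);
  }
  return stillToUndo;
}

// Log at the second crash: five updates plus the CLRs at 5001, 5002, and 5003.
const logAtSecondCrash: LogRec[] = [
  { lsn: 2000, type: "UPDATE", prevLSN: null, undoNextLSN: null }, // T2
  { lsn: 2500, type: "UPDATE", prevLSN: null, undoNextLSN: null }, // T3
  { lsn: 3000, type: "UPDATE", prevLSN: 2000, undoNextLSN: null }, // T2
  { lsn: 4500, type: "UPDATE", prevLSN: 2500, undoNextLSN: null }, // T3
  { lsn: 5000, type: "UPDATE", prevLSN: 3000, undoNextLSN: null }, // T2
  { lsn: 5001, type: "CLR", prevLSN: 5000, undoNextLSN: 3000 },    // T2, undid 5000
  { lsn: 5002, type: "CLR", prevLSN: 4500, undoNextLSN: 2500 },    // T3, undid 4500
  { lsn: 5003, type: "CLR", prevLSN: 5001, undoNextLSN: 2000 },    // T2, undid 3000
];

// Analysis after the second crash reports lastLSN 5003 for T2 and 5002 for T3.
const leftToUndo = remainingUndoWork(logAtSecondCrash, [5003, 5002]); // [2500, 2000]
```

Only LSN 2500 and LSN 2000 remain; the updates at 5000, 4500, and 3000 are never visited again because the CLRs jump straight past them.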

Why CLRs Can't Be Undone


Bounded Recovery Time

Summary: The Three Phases

We've explored the three phases of ARIES recovery in depth. Let's consolidate the key concepts:

Key Takeaways

•Analysis reconstructs crash-time state by scanning from the last checkpoint, building the Transaction Table and Dirty Page Table, and determining RedoLSN.
•Redo repeats history by re-applying all logged operations from RedoLSN forward, using PageLSN comparison to skip already-applied operations.
•Undo rolls back loser transactions by processing their operations in reverse order, writing CLRs to record progress.
•CLRs enable crash-during-recovery resilience by serving as undo progress markers that are redone (not undone) in subsequent recoveries.
•The phases are sequential and dependent: Analysis feeds Redo, Redo prepares state for Undo, Undo achieves consistency.
•LSN comparison is the key optimization, enabling efficient redo by checking if operations are already reflected on pages.

What's Next
