A production database might have a log spanning years of operation—billions of records representing countless transactions. When the system crashes, every one of these historical transactions has already completed; their effects are safely on disk. Only a tiny fraction of log records are relevant: those describing changes that might not have reached stable storage.
The RedoLSN—also called the recovery start point—is the earliest log position that could possibly require redo operations. Everything before this point is guaranteed to be safely on disk. Finding this boundary efficiently is crucial: starting redo too early wastes time replaying already-durable changes; starting too late risks missing updates, leaving the database inconsistent.
This page explores the algorithm for determining the RedoLSN, its correctness proof, the interaction between the Dirty Page Table and Transaction Table in this calculation, and practical considerations in production systems.
By the end of this page, you will understand exactly how the RedoLSN is calculated, why it's correct and complete, how it bounds recovery work, and how different system configurations affect this calculation. You'll be able to compute the RedoLSN for any given system state.
The RedoLSN marks the boundary between "definitely on disk" and "might need redo." Understanding what this means requires understanding how the database maintains durability.
The Durability Problem:
Modern databases use a no-force policy: when a transaction commits, its modified pages are not forced to disk immediately. Instead, the commit log record is flushed to the durable log, and the buffer manager writes the modified pages back lazily, whenever convenient.
This creates a window where committed data exists only in the log, not in the data files. If the system crashes during this window, redo must reconstruct these changes from log records.
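To make the no-force window concrete, here is a minimal sketch (all names — `Log`, `BufferPool`, `commit` — are illustrative, not from a real system): commit durably flushes the log, but deliberately leaves the transaction's pages dirty in memory.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using LSN = uint64_t;

// Sketch of a write-ahead log: appending assigns the next LSN, and
// flush() marks everything appended so far as durable.
struct Log {
    std::vector<std::string> records;
    LSN flushed_up_to = 0;
    LSN append(const std::string& rec) { records.push_back(rec); return records.size(); }
    void flush() { flushed_up_to = records.size(); }  // WAL: the log hits disk first
};

// Sketch of a buffer pool that only tracks which pages are dirty in memory.
struct BufferPool {
    std::map<int, bool> dirty;  // page_id -> still dirty in memory?
    void markDirty(int page) { dirty[page] = true; }
    bool anyDirty() const {
        for (const auto& [p, d] : dirty) if (d) return true;
        return false;
    }
};

// Commit under no-force: durability comes from the flushed log alone.
LSN commit(Log& log, BufferPool& pool, int txn_id) {
    LSN commit_lsn = log.append("COMMIT T" + std::to_string(txn_id));
    log.flush();   // the commit record must be durable...
    // ...but we deliberately do NOT flush pool's dirty pages here.
    return commit_lsn;
}
```

After `commit` returns, the dirty pages still exist only in memory — exactly the window that redo must be able to close from the log.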
Formal Definition:
RedoLSN = The smallest LSN such that all database pages are guaranteed to reflect all log records with LSN < RedoLSN.
Equivalently:
Starting redo from RedoLSN guarantees that we don't miss any updates that might not be on disk.
The key insight is that we need to consider two sources of information: the Dirty Page Table, whose recLSN entries bound the earliest page update that might not be on disk, and the Transaction Table, whose firstLSN entries bound the earliest activity of in-flight transactions.
The RedoLSN might be earlier than strictly necessary (some records processed during redo may already be on disk). This is acceptable—redo checks each page's LSN before applying operations. The critical property is completeness: we must never skip a record that needs redo.
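The "redo checks each page's LSN" safeguard can be sketched as follows (the `Page`/`LogRecord` shapes here are hypothetical simplifications): each page carries the LSN of the last update it reflects, and a record is replayed only if the page has not already seen it, which makes redo idempotent even when started earlier than necessary.

```cpp
#include <cstdint>

using LSN = uint64_t;

// A page stores the LSN of the last update reflected in its contents.
struct Page {
    LSN page_lsn = 0;
    int value = 0;
};

// A redo log record: which LSN it carries and what it writes.
struct LogRecord {
    LSN lsn;
    int new_value;
};

// Apply the record only if the page does not already reflect it.
// Starting redo "too early" is therefore harmless: stale records are skipped.
bool redoIfNeeded(Page& page, const LogRecord& rec) {
    if (page.page_lsn >= rec.lsn) {
        return false;            // effect already on the page: skip
    }
    page.value = rec.new_value;  // replay the change
    page.page_lsn = rec.lsn;     // the page now reflects this LSN
    return true;
}
```

This is why an overly conservative RedoLSN costs only wasted comparisons, while an overly aggressive one silently loses updates.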
After the Analysis Phase completes its log scan, calculating the RedoLSN is straightforward. It requires finding the minimum across two sets of LSNs.
Algorithm:
Implementation:
```cpp
LSN AnalysisPhase::calculateRedoLSN() {
    // Component 1: minimum recLSN from the Dirty Page Table.
    // This is the LSN of the earliest operation that might not be on disk.
    // If the DPT is empty, min_rec_lsn stays MAX_LSN (no redo needed from
    // the DPT's perspective).
    LSN min_rec_lsn = MAX_LSN;
    for (const auto& [page_id, entry] : dirty_page_table) {
        if (entry.recovery_lsn < min_rec_lsn) {
            min_rec_lsn = entry.recovery_lsn;
        }
    }

    // Component 2: minimum firstLSN from the Transaction Table (optional but
    // safer). In standard ARIES the DPT alone suffices, but some
    // implementations use this as an additional safety check.
    LSN min_txn_lsn = MAX_LSN;
    for (const auto& [txn_id, entry] : transaction_table) {
        if (entry.first_lsn < min_txn_lsn) {
            min_txn_lsn = entry.first_lsn;
        }
    }

    // The RedoLSN is the minimum of both components.
    LSN redo_lsn;
    if (min_rec_lsn == MAX_LSN && min_txn_lsn == MAX_LSN) {
        // No dirty pages and no active transactions:
        // nothing needs redo, so start from the end of the log.
        redo_lsn = log_manager.getLastLSN();
        log_info("RedoLSN: No redo needed (no dirty pages or active txns)");
    } else {
        redo_lsn = min(min_rec_lsn, min_txn_lsn);
        log_info("RedoLSN: %d (min_rec: %d, min_txn: %d)",
                 redo_lsn, min_rec_lsn, min_txn_lsn);
    }
    return redo_lsn;
}
```

| Scenario | Min DPT recLSN | Min Txn firstLSN | RedoLSN |
|---|---|---|---|
| Normal operation | 5000 | 5500 | 5000 |
| Long-running transaction | 6000 | 4000 | 4000 |
| No dirty pages | ∞ (empty) | 3000 | 3000 |
| No active transactions | 2500 | ∞ (empty) | 2500 |
| Fresh checkpoint, nothing dirty | ∞ | ∞ | End of log (no redo) |
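The scenarios in the table above reduce to a single minimum once empty tables are modeled as a sentinel. A small sketch (treating ∞ as `MAX_LSN`, as the implementation does):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

using LSN = uint64_t;
const LSN MAX_LSN = std::numeric_limits<LSN>::max();

// The RedoLSN as a pure function of the two components: an empty DPT or
// Transaction Table contributes MAX_LSN (the "infinity" rows in the table).
// A result of MAX_LSN means "nothing to redo: start at the end of the log".
LSN redoLSN(LSN min_dpt_reclsn, LSN min_txn_firstlsn) {
    return std::min(min_dpt_reclsn, min_txn_firstlsn);
}
```

Each table row is one call: `redoLSN(5000, 5500)` is 5000 (normal operation), `redoLSN(6000, 4000)` is 4000 (a long-running transaction drags the RedoLSN back), and `redoLSN(MAX_LSN, MAX_LSN)` signals that redo can be skipped.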
Understanding why the minimum of recLSNs is the correct starting point requires tracing through what the recLSN represents and how pages reach disk.
Theorem: All log records with LSN < RedoLSN have their effects reflected on disk.
Proof:
Consider any log record R with LSN(R) < RedoLSN. We need to show that R's effects are on the database pages on disk.
Case 1: The page modified by R is not in the DPT
If a page is not in the DPT, one of two things is true:
In either case, the page on disk is current. Since the page isn't dirty, record R's effects (if R modified this page) must already be on disk.
Case 2: The page modified by R is in the DPT with recLSN > LSN(R)
The recLSN is the LSN of the first modification since the page was last flushed. Since LSN(R) < recLSN, R's modification happened before that first unflushed modification — that is, before the flush that preceded it. That flush wrote the page including R's change, so R's effects are on disk.
Case 3: The page modified by R is in the DPT with recLSN = LSN(R)
This means R was the first unflushed modification to the page. But RedoLSN ≤ recLSN for every DPT entry (it is the minimum), so LSN(R) = recLSN ≥ RedoLSN, contradicting LSN(R) < RedoLSN. This case cannot arise.
Case 4: The page modified by R is in the DPT with recLSN < LSN(R)
This would mean the page's first unflushed modification precedes R, so R's effects might not be on disk — the dangerous case. But it cannot arise: RedoLSN is the minimum recLSN over all DPT entries, so RedoLSN ≤ recLSN, and combining this with the assumption LSN(R) < RedoLSN gives LSN(R) < RedoLSN ≤ recLSN < LSN(R), a contradiction. ∎
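The theorem can also be checked executably. Below is a toy simulation (all names are invented, and it simplifies reality: one page per update, and a flush writes the whole page): recLSN is fixed when a clean page is first dirtied, cleared on flush, and the invariant asserts that no unflushed update ever sits below the RedoLSN.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

using LSN = uint64_t;
const LSN MAX_LSN = std::numeric_limits<LSN>::max();

// Per-page state: every update LSN ever applied, and how many of them
// the on-disk copy reflects (a flush writes the whole in-memory page).
struct SimPage {
    std::vector<LSN> updates;
    size_t flushed_count = 0;
    bool dirty() const { return flushed_count < updates.size(); }
    LSN recLSN() const { return updates[flushed_count]; }  // first unflushed update
};

struct Sim {
    std::map<int, SimPage> pages;
    LSN next_lsn = 1;

    void modify(int id) { pages[id].updates.push_back(next_lsn++); }
    void flush(int id)  { pages[id].flushed_count = pages[id].updates.size(); }

    // RedoLSN = minimum recLSN over all dirty pages.
    LSN redoLSN() const {
        LSN m = MAX_LSN;
        for (const auto& [id, p] : pages)
            if (p.dirty()) m = std::min(m, p.recLSN());
        return m;
    }

    // The theorem: every update with LSN < RedoLSN is reflected on disk,
    // i.e. no page has an unflushed update below the RedoLSN.
    bool theoremHolds() const {
        LSN r = redoLSN();
        for (const auto& [id, p] : pages)
            for (size_t i = p.flushed_count; i < p.updates.size(); ++i)
                if (p.updates[i] < r) return false;
        return true;
    }
};
```

Any interleaving of `modify` and `flush` calls preserves `theoremHolds()`, because each page's recLSN is by construction its smallest unflushed LSN and the RedoLSN is the minimum of those.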
Think of recLSN as a 'high water mark' for each page—everything below it has been flushed. The minimum recLSN across all pages is the global high water mark. Below this level, all pages are on disk. Above it, some pages might be dirty.
```
Log Timeline (LSN increasing rightward):
═══════════════════════════════════════════════════════════════
0      1000    2000    3000    4000    5000    6000    7000
|       |       |       |       |       |       |       |

Page 100:      [recLSN=2000]───────────────────────────────────
                ↑ First dirty since last flush

Page 200:                      [recLSN=3500]───────────────────
                                ↑ First dirty since last flush

Page 300:                              [recLSN=4500]───────────
                                        ↑ First dirty

RedoLSN = min(2000, 3500, 4500) = 2000

Everything at LSN < 2000 is GUARANTEED on disk:
- Page 100's recLSN is 2000: its last flush covered everything before 2000
- Page 200's recLSN is 3500 > 2000: it was flushed even more recently
- Page 300's recLSN is 4500 > 2000: same logic

Therefore, redo safely starts at LSN 2000.
```

Checkpoints indirectly affect the RedoLSN by influencing the Dirty Page Table state. Understanding this relationship reveals why checkpoint frequency directly impacts recovery time.
How Checkpoints Affect RedoLSN: a checkpoint never sets the RedoLSN directly. Its effect is indirect: a checkpoint that flushes dirty pages empties the DPT and pulls the RedoLSN forward to recent LSNs, while a fuzzy checkpoint merely snapshots the DPT, so the RedoLSN can lie well before the checkpoint record itself.
Checkpoint Types and Their Impact:
| Checkpoint Type | Dirty Page Flushing | RedoLSN Impact | Recovery Time |
|---|---|---|---|
| Quiescent | All pages flushed | RedoLSN = checkpoint LSN | Minimal redo |
| Transaction-Consistent | All pages flushed | RedoLSN = checkpoint LSN | Minimal redo |
| Fuzzy (standard ARIES) | No flushing required | RedoLSN may be before checkpoint | More redo possible |
| Aggressive Fuzzy | Background flushing encouraged | RedoLSN closer to checkpoint | Moderate redo |
```cpp
/*
 * Scenario: Fuzzy Checkpoint Without Aggressive Flushing
 *
 * Time 0: Page 100 dirtied at LSN 1000
 * Time 1: Page 200 dirtied at LSN 1500
 * Time 2: Checkpoint at LSN 2000
 *         DPT snapshot: {P100: recLSN=1000, P200: recLSN=1500}
 * Time 3: Page 300 dirtied at LSN 2500
 * Time 4: CRASH at LSN 3000
 *
 * Analysis Result:
 *   DPT: {P100: recLSN=1000, P200: recLSN=1500, P300: recLSN=2500}
 *   RedoLSN = 1000 (older than the checkpoint!)
 *
 * Redo must process 2000 log records (1000 to 3000)
 */

/*
 * Scenario: Checkpoint With Aggressive Flushing
 *
 * Time 0: Page 100 dirtied at LSN 1000
 * Time 1: Page 200 dirtied at LSN 1500
 * Time 2: Checkpoint at LSN 2000
 *         - The checkpoint also flushes all dirty pages!
 *         - DPT snapshot after flush: {} (empty)
 * Time 3: Page 300 dirtied at LSN 2500
 * Time 4: CRASH at LSN 3000
 *
 * Analysis Result:
 *   DPT: {P300: recLSN=2500}
 *   RedoLSN = 2500 (after the checkpoint!)
 *
 * Redo must process only 500 log records (2500 to 3000)
 */

void RecoveryManager::performAggressiveCheckpoint() {
    // Standard checkpoint steps plus page flushing
    LSN begin_lsn = log_manager.appendLog(BEGIN_CHECKPOINT);

    // Snapshot both tables first (for consistency)
    auto dpt_snapshot = dirty_page_table.snapshot();
    auto txn_snapshot = transaction_table.snapshot();

    // Flush ALL dirty pages to disk
    buffer_manager.flushAllDirtyPages();

    // The DPT can now be cleared (the pages are clean). The snapshot was
    // taken before the flush, so the checkpoint records the pre-flush state.
    log_manager.appendLog(END_CHECKPOINT, dpt_snapshot, txn_snapshot);
    log_manager.flush();

    // Update the master record
    updateMasterRecord(begin_lsn);
}
```

Aggressive flushing during checkpoints advances the RedoLSN (faster recovery) but increases checkpoint I/O cost and may impact online transaction performance. Systems must balance recovery time requirements against operational impact. Many production databases offer tunable checkpoint aggressiveness.
In standard ARIES, the Dirty Page Table is the primary source for RedoLSN calculation. However, considering the Transaction Table provides an additional safety margin and can be necessary in certain edge cases.
When Transaction Table Matters:
Incomplete DPT Information: If a log record modified a page but the page was evicted from the buffer pool before the DPT was fully updated (race condition in some implementations)
Distributed Transactions: In 2PC scenarios, a transaction may have prepared but not committed, affecting pages on remote nodes that aren't in the local DPT
Logical Logging: If the system uses logical logging (operations described abstractly rather than by exact page changes), the DPT may not capture all affected pages
```cpp
/*
 * Edge Case: Transaction Table Required
 *
 * Consider this scenario with physiological logging:
 *
 * LSN 1000: T1 begins
 * LSN 1100: T1 INSERT into table (affects B-tree pages 10, 11, 12)
 *           - Log record: INSERT key=42 into table
 *           - This is a logical operation, not physical page updates
 *           - The DPT might only track the leaf page, not internal pages
 * LSN 1200: Buffer manager evicts page 11 (writes it to disk)
 *           - Page 11 is removed from the DPT
 * LSN 1300: CRASH
 *
 * DPT at crash might only have:
 *   Page 10: recLSN = 1100
 *   Page 12: recLSN = 1100
 *   (Page 11 was evicted clean)
 *
 * Transaction Table:
 *   T1: firstLSN = 1000, lastLSN = 1100, state = ACTIVE
 *
 * If we use only the DPT:         RedoLSN = 1100
 * If we consider the Txn Table:   RedoLSN = min(1100, 1000) = 1000
 *
 * The safer choice is 1000, ensuring we don't miss any operations
 * that might have affected pages we don't know about.
 */

LSN AnalysisPhase::calculateRedoLSNSafe() {
    LSN from_dpt = getMinRecLSNFromDPT();
    LSN from_txn = getMinFirstLSNFromTransactionTable();

    // Take the minimum for safety
    LSN redo_lsn = min(from_dpt, from_txn);

    // Log if there's a significant difference (might indicate a problem)
    if (from_dpt != MAX_LSN && from_txn != MAX_LSN) {
        LSN diff = (from_dpt > from_txn) ? from_dpt - from_txn
                                         : from_txn - from_dpt;
        if (diff > 10000) {  // threshold for concern
            log_warn("RedoLSN sources differ significantly: DPT=%d, TxnTable=%d",
                     from_dpt, from_txn);
        }
    }
    return redo_lsn;
}
```

Using both the DPT and the Transaction Table for the RedoLSN provides defense in depth. The additional overhead (processing a few extra log records during redo) is negligible compared to the cost of missing a record that needs redo. Production systems often include such safety margins.
Several special cases affect the RedoLSN calculation. Understanding these helps debug recovery issues and optimize checkpoint strategies.
Case 1: Empty DPT and No Active Transactions
If both tables are empty after analysis, no redo is needed. This happens when the crash struck at a quiet moment: the last checkpoint (or a subsequent flush) had written every dirty page to disk, and no transactions were active between that point and the crash.
```cpp
LSN AnalysisPhase::calculateRedoLSN() {
    if (dirty_page_table.empty() && transaction_table.empty()) {
        // No redo needed - the database is consistent
        log_info("Database clean: no redo required");
        return END_OF_LOG;  // Special marker meaning "skip redo"
    }
    // ... normal calculation
}

bool RedoPhase::shouldRun(LSN redo_lsn) {
    if (redo_lsn == END_OF_LOG) {
        log_info("Redo phase skipped: nothing to redo");
        return false;
    }
    return true;
}
```

Case 2: Very Old Dirty Pages
If a page has been dirty for a long time (perhaps due to a stuck transaction or locking issue), RedoLSN might be very old, causing excessive redo work.
```cpp
void AnalysisPhase::detectProblematicDirtyPages() {
    LSN current_lsn = log_manager.getLastLSN();
    LSN checkpoint_lsn = getLastCheckpointLSN();

    for (const auto& [page_id, entry] : dirty_page_table) {
        LSN age = current_lsn - entry.recovery_lsn;

        // Flag pages dirty since before the last checkpoint
        if (entry.recovery_lsn < checkpoint_lsn) {
            log_warn("Page %d dirty since before checkpoint! recLSN=%d, ckpt=%d",
                     page_id, entry.recovery_lsn, checkpoint_lsn);
            // This might indicate:
            // 1. A long-running transaction holding the page
            // 2. Buffer manager thrashing (pages not being flushed)
            // 3. A checkpoint that doesn't flush aggressively enough
        }

        // Flag excessively old dirty pages
        if (age > CONFIG_MAX_DIRTY_PAGE_AGE) {
            log_error("Page %d has been dirty for %d LSN units - investigate!",
                      page_id, age);
        }
    }
}

/*
 * Optimization: Proactive Flushing
 *
 * Some systems periodically flush pages whose recLSN is getting too old,
 * preventing the RedoLSN from drifting too far into the past.
 */
void BufferManager::proactiveFlush() {
    LSN threshold = log_manager.getLastLSN() - CONFIG_FLUSH_AGE_THRESHOLD;

    for (Page& page : buffer_pool) {
        if (page.isDirty() && page.getRecLSN() < threshold) {
            flushPage(page);
            log_debug("Proactively flushed page %d (old recLSN)", page.id);
        }
    }
}
```

Case 3: Distributed Databases
In distributed systems, the RedoLSN concept extends across nodes. Each node has its own DPT and log, but global consistency requires coordinating recovery.
In distributed databases, each node determines its local RedoLSN independently. Global recovery may require waiting for all nodes to complete redo and undo before distributed transactions can be resolved. The 2PC protocol's PREPARE records help coordinate this process.
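As a hypothetical sketch of that coordination rule (the `NodeStatus` shape and `globalRedoComplete` are invented for illustration, not a real protocol): each node carries its own local RedoLSN, and the coordinator treats global recovery as redo-complete only when every node has finished replaying from its own start point.

```cpp
#include <vector>
#include <cstdint>

using LSN = uint64_t;

// Per-node recovery status reported to a hypothetical coordinator:
// the node's locally computed RedoLSN and whether its redo pass is done.
struct NodeStatus {
    LSN local_redo_lsn;
    bool redo_done;
};

// Distributed transactions (e.g. 2PC participants) cannot be resolved
// until every node has completed redo from its own local RedoLSN.
bool globalRedoComplete(const std::vector<NodeStatus>& nodes) {
    for (const auto& n : nodes) {
        if (!n.redo_done) return false;  // at least one node is still replaying
    }
    return true;
}
```

Note there is no single global RedoLSN: each node's start point is determined by its own DPT and log, and only the completion signal is coordinated.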
Let's work through several realistic examples to solidify understanding of RedoLSN calculation.
Scenario: An e-commerce database crashes during a busy shopping period.
State at Crash:
| Page | Type | recLSN |
|---|---|---|
| Customers_42 | Data page | 8,450,000 |
| Orders_idx_7 | Index page | 8,490,000 |
| Inventory_15 | Data page | 8,460,000 |
| Cart_temp_99 | Temp page | 8,500,000 |
| Products_2 | Data page | 8,400,000 |
Transaction Table:
Calculation: the minimum recLSN in the Dirty Page Table is 8,400,000, belonging to Products_2. Assuming no active transaction has a firstLSN earlier than that, RedoLSN = 8,400,000.
Redo will start from LSN 8,400,000 and process approximately 100,000 log records.
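The worked example above reduces to a minimum over the Dirty Page Table entries, computable directly:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

using LSN = uint64_t;

// Minimum recLSN across the Dirty Page Table; an empty table yields
// MAX_LSN, meaning "no redo needed from the DPT's perspective".
LSN minRecLSN(const std::map<std::string, LSN>& dpt) {
    LSN m = std::numeric_limits<LSN>::max();
    for (const auto& [page, rec_lsn] : dpt) m = std::min(m, rec_lsn);
    return m;
}
```

Feeding in the five pages from the crash-state table, Products_2 (recLSN 8,400,000) anchors the RedoLSN.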
The RedoLSN is the crucial boundary that separates "definitely on disk" from "might need redo." To consolidate: it is computed as the minimum recLSN in the Dirty Page Table (optionally also the minimum firstLSN in the Transaction Table); everything before it is guaranteed durable, so redo can safely begin there; and checkpoint aggressiveness determines how far back it reaches.
What's Next:
With the RedoLSN determined, we've completed the Analysis Phase's primary outputs: the reconstructed DPT, the Transaction Table with loser transactions identified, and the starting point for redo. The final page examines how all this information comes together to reconstruct the crash-time database state, preparing for the Redo and Undo phases.
You now understand how the Analysis Phase determines the RedoLSN—the algorithm, its correctness, the role of checkpoints, and practical implications. This knowledge is essential for understanding recovery performance and tuning checkpoint strategies.