The moment of truth for any checkpoint system arrives when a crash occurs. Everything we've discussed—checkpoint concepts, processes, fuzziness, and frequency—exists to serve this moment: recovery.
When a database restarts after a crash, the recovery system must determine what was happening at the moment of failure, reapply logged changes that may never have reached the data files, and roll back the work of transactions that never committed.
This page traces the complete recovery process, showing exactly how checkpoint information guides each phase and how the fuzziness of checkpoints is resolved into a consistent database state.
By the end of this page, you will understand the complete recovery flow from crash to operational database, how the checkpoint's ATT and DPT are used in each phase, how recovery resolves the fuzziness of checkpoints, and practical considerations for ensuring fast, correct recovery.
When a database system starts after an unclean shutdown, the recovery manager takes control before any user activity is allowed.
The first critical task: finding the checkpoint.
Recovery must locate the most recent valid checkpoint. This is done through the master record—a fixed-location pointer that always indicates where to find the current checkpoint.
Recovery startup sequence:
```
-- Recovery startup: locating and loading checkpoint
FUNCTION recovery_startup():
    -- Step 1: Read control file
    control = read_control_file()

    -- Step 2: Check if recovery is needed
    IF control.shutdown_flag == CLEAN:
        LOG("Clean shutdown detected; no recovery needed")
        RETURN SUCCESS

    LOG("Unclean shutdown detected; starting recovery")

    -- Step 3: Read master record
    master = read_master_record()

    IF master.checkpoint_lsn IS NULL:
        LOG("No checkpoint found; recovering from log start")
        checkpoint_lsn = LOG_START
        att = EMPTY_TABLE
        dpt = EMPTY_TABLE
        redo_lsn = LOG_START
    ELSE:
        -- Step 4: Read checkpoint record
        checkpoint = read_log_record(master.checkpoint_lsn)

        -- Step 5: Validate checkpoint
        IF checkpoint.type != 'END_CHECKPOINT':
            LOG("Invalid checkpoint; searching for previous")
            checkpoint = scan_for_valid_checkpoint(before = master.checkpoint_lsn)

        -- Step 6: Extract tables
        att = checkpoint.active_transactions
        dpt = checkpoint.dirty_pages
        redo_lsn = checkpoint.redo_lsn

    LOG("Checkpoint loaded: " + master.checkpoint_lsn)
    LOG("Active transactions: " + COUNT(att))
    LOG("Dirty pages: " + COUNT(dpt))
    LOG("Redo starting point: " + redo_lsn)

    -- Step 7: Begin analysis phase
    RETURN begin_analysis_phase(redo_lsn, att, dpt)
```

Handling checkpoint corruption:
If the checkpoint record at the master's location is corrupted or incomplete (perhaps the crash occurred during checkpoint writing), recovery falls back to the previous checkpoint.
In extreme cases (no valid checkpoint anywhere), recovery must scan from the beginning of the log. This is why systems keep the previous checkpoint until a new one completes successfully, and why log segments are not recycled while a surviving checkpoint still needs them.
The redo_lsn from checkpoint:
The redo_lsn is the minimum recovery_lsn from the Dirty Page Table—the earliest point where any dirty page might need redo. This is where the analysis and redo phases begin their log scan.
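As a small illustration (the LSN values and page names below are made up), the redo point is simply the smallest recovery_lsn in the checkpoint's DPT, and it can sit well before the checkpoint record itself:

```python
# Hypothetical checkpoint DPT: page_id -> recovery_lsn (first LSN that might need redo)
checkpoint_dpt = {
    "page_17": 980000,   # dirtied long ago and never flushed since
    "page_42": 995000,
    "page_93": 999500,
}
checkpoint_lsn = 1000000

# Empty DPT would mean nothing older than the checkpoint needs redo
redo_lsn = min(checkpoint_dpt.values()) if checkpoint_dpt else checkpoint_lsn
print(redo_lsn)  # 980000 -- the log scan starts here, well before the checkpoint record
```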
During recovery, the database is not available for user connections. Only after recovery completes successfully does the system open for normal operation. This is why recovery time directly impacts availability—every second of recovery is a second of downtime.
The analysis phase scans the log from the checkpoint's redo_lsn to the end, updating the ATT and DPT to reflect what happened after the checkpoint.
Purpose of analysis:
The checkpoint's ATT and DPT are snapshots from checkpoint time. Since then, new transactions have started, previously active transactions have committed or aborted, additional pages have been dirtied, and some dirty pages have been flushed to disk.
Analysis reconstructs the true ATT and DPT as of crash time, which redo and undo will then use.
Analysis phase operations:
```
-- Analysis phase: updating ATT and DPT from checkpoint to crash
FUNCTION analysis_phase(redo_lsn, att, dpt):
    -- Scan log from redo_lsn to end
    FOR each log_record FROM redo_lsn TO end_of_log:
        txn_id = log_record.txn_id

        -- Handle different record types
        CASE log_record.type:

            WHEN 'BEGIN':
                -- New transaction started after checkpoint
                att[txn_id] = ATTEntry {
                    state: ACTIVE,
                    first_lsn: log_record.lsn,
                    last_lsn: log_record.lsn,
                    undo_nxt_lsn: NULL
                }

            WHEN 'UPDATE', 'INSERT', 'DELETE':
                -- Data modification
                page_id = log_record.page_id

                -- Update ATT: track last LSN
                att[txn_id].last_lsn = log_record.lsn

                -- Update DPT: add page if not already present
                IF page_id NOT IN dpt:
                    dpt[page_id] = DPTEntry {
                        recovery_lsn: log_record.lsn  -- First log record that might need redo for this page
                    }
                -- If already in DPT, keep original recovery_lsn
                -- (we want the FIRST modification, not the latest)

            WHEN 'COMMIT':
                -- Transaction committed; remove from ATT
                -- (won't need undo)
                DELETE att[txn_id]

            WHEN 'ABORT':
                -- Transaction aborted; mark for undo
                att[txn_id].state = ABORTED
                att[txn_id].undo_nxt_lsn = log_record.prev_lsn

            WHEN 'CLR':
                -- Compensation Log Record
                -- Part of ongoing rollback; update undo position
                att[txn_id].undo_nxt_lsn = log_record.undo_nxt_lsn

            WHEN 'END':
                -- Transaction fully complete; remove from ATT
                DELETE att[txn_id]

    -- Compute redo start point
    IF dpt IS EMPTY:
        final_redo_lsn = end_of_log  -- Nothing to redo
    ELSE:
        final_redo_lsn = MIN(entry.recovery_lsn FOR entry IN dpt)

    RETURN att, dpt, final_redo_lsn
```

Key analysis outcomes:
Final ATT contains "loser" transactions: All transactions still active at crash time. These need undo.
Final DPT contains all potentially dirty pages: Both from checkpoint and from post-checkpoint modifications.
Redo LSN is the earliest point requiring replay: Minimum recovery_lsn across all dirty pages.
Why not just use checkpoint values?
Because of checkpoint fuzziness: transactions listed in the checkpoint ATT may have committed or aborted afterward, new transactions may have started, additional pages may have been dirtied after the checkpoint, and pages listed in the DPT may have since been flushed.
Analysis resolves the fuzziness for ATT. DPT remains pessimistic—we include everything that might need redo; redo phase determines what actually needs redo.
Analysis only reads log records—it doesn't touch data pages. This makes it very fast: typically limited only by log read speed. Even a gigabyte of log can be analyzed in seconds with modern storage.
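For a rough sense of scale, the sketch below works through that claim; the throughput figure is an assumption, not a measurement from any particular system:

```python
log_size_gb = 1.0
sequential_read_mb_s = 500.0  # assumed sequential log read throughput on modern SSDs

analysis_seconds = (log_size_gb * 1024) / sequential_read_mb_s
print(f"~{analysis_seconds:.1f} s to scan {log_size_gb} GB of log")  # ~2.0 s plus CPU overhead
```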
The redo phase replays log records to bring the database to its crash-time state. This includes all modifications—even those by uncommitted transactions—because undo needs the complete crash-time state to work backwards.
Redo philosophy: "Repeat History"
The redo phase operates on the principle of repeating history. It reconstructs exactly what the database looked like at the moment of the crash, including changes by committed transactions whose pages never reached disk and changes by in-flight transactions that the undo phase will later roll back.
Redo phase operations:
```
-- Redo phase: replaying history from redo_lsn to crash
FUNCTION redo_phase(redo_lsn, dpt):
    LOG("Beginning redo phase from LSN: " + redo_lsn)

    records_redone = 0
    records_skipped = 0

    -- Scan log forward from redo_lsn
    FOR each log_record FROM redo_lsn TO end_of_log:

        -- Only redo data-modifying records
        IF log_record.type NOT IN ('UPDATE', 'INSERT', 'DELETE', 'CLR'):
            CONTINUE  -- Skip non-data records

        page_id = log_record.page_id

        -- Optimization: check if page is in DPT
        -- If not in DPT, page was clean at checkpoint and no newer records affect it
        -- (This check is optional but speeds up recovery)
        IF page_id NOT IN dpt:
            -- Page wasn't dirty at checkpoint and wasn't modified after
            records_skipped += 1
            CONTINUE

        -- Check if redo is needed for this specific record
        IF log_record.lsn < dpt[page_id].recovery_lsn:
            -- Record is before this page's recovery_lsn
            -- Page was flushed after this record; no redo needed
            records_skipped += 1
            CONTINUE

        -- Read the actual page from disk
        page = read_page(page_id)

        -- Critical check: compare LSNs
        IF log_record.lsn <= page.page_lsn:
            -- Page's LSN is at or after this record
            -- The modification already reached disk; skip redo
            records_skipped += 1
            CONTINUE

        -- Redo is needed: apply the log record
        page = apply_log_record(log_record, page)
        page.page_lsn = log_record.lsn  -- Update page LSN
        write_page(page)                -- Write to disk (or buffer pool)

        records_redone += 1

    LOG("Redo complete: " + records_redone + " redone, " + records_skipped + " skipped")
    RETURN SUCCESS
```

The page_lsn check is crucial:
The page_lsn (stored on each page) tells us the LSN of the last log record that modified this page and was flushed to disk. During redo:
This is how the "fuzziness" is resolved: we don't know from the checkpoint which pages were actually flushed, but we can tell by reading each page and checking its LSN.
| Page in checkpoint DPT? | log_record.lsn vs page.page_lsn | Action | Reason |
|---|---|---|---|
| Yes | log_record.lsn > page_lsn | REDO | Page on disk is older; apply change |
| Yes | log_record.lsn ≤ page_lsn | SKIP | Page was flushed after this log record |
| No, but dirtied after checkpoint (added to DPT by analysis) | log_record.lsn > page_lsn | REDO | Page modified after checkpoint; apply change |
| No, and never dirtied after checkpoint | — | SKIP (optimized) | Page was clean at checkpoint, stayed clean, and is absent from the final DPT |
Redo operations are idempotent—applying them twice produces the same result as applying once. This is essential because recovery might crash and restart. The page_lsn check ensures we don't re-apply operations that already succeeded in a previous recovery attempt.
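A minimal sketch of how the page_lsn guard makes redo idempotent; this is an illustration, not any engine's actual code, and the Page/LogRecord types and the balance field are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_lsn: int   # LSN of the last change this page reflects
    balance: int    # example payload

@dataclass
class LogRecord:
    lsn: int
    delta: int      # example: amount to add to balance

def redo_record(page: Page, rec: LogRecord) -> None:
    """Apply rec to page only if the page has not already seen it."""
    if rec.lsn <= page.page_lsn:
        return                    # change already reflected; skipping keeps redo idempotent
    page.balance += rec.delta     # replay the change
    page.page_lsn = rec.lsn       # record that the page now reflects rec

page = Page(page_lsn=90, balance=100)
rec = LogRecord(lsn=120, delta=50)

redo_record(page, rec)   # applies: balance -> 150, page_lsn -> 120
redo_record(page, rec)   # no-op: rec.lsn <= page_lsn, so nothing changes
assert page.balance == 150 and page.page_lsn == 120
```

Running the same record twice (as a restarted recovery might) leaves the page unchanged the second time, which is exactly the property the paragraph above describes.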
The undo phase rolls back all transactions that were active at crash time and never committed—the "loser" transactions identified by the final ATT from analysis.
Why undo is necessary:
After redo, the database reflects the crash-time state, including partial work by uncommitted transactions. This violates atomicity—uncommitted transactions should leave no trace. Undo removes these partial modifications.
Undo phase approach:
Undo works backward through the log, undoing each modification by loser transactions. As each modification is undone, a Compensation Log Record (CLR) is written, recording the undo operation.
```
-- Undo phase: rolling back loser transactions
FUNCTION undo_phase(att):
    -- Build list of loser transactions (still in ATT after analysis)
    losers = [txn for txn in att WHERE txn.state IN (ACTIVE, ABORTED)]

    LOG("Beginning undo phase for " + COUNT(losers) + " loser transactions")

    -- Build priority queue of log records to undo
    -- Priority: highest LSN first (undo in reverse order)
    undo_queue = PriorityQueue()  -- Max-heap by LSN

    FOR txn IN losers:
        IF txn.undo_nxt_lsn IS NOT NULL:
            undo_queue.insert(txn.undo_nxt_lsn, txn.txn_id)
        ELSE:
            undo_queue.insert(txn.last_lsn, txn.txn_id)

    -- Process undo in reverse LSN order
    WHILE undo_queue IS NOT EMPTY:
        lsn, txn_id = undo_queue.pop()
        log_record = read_log_record(lsn)

        IF log_record.type == 'CLR':
            -- This is a compensation log record from previous rollback
            -- Skip to the undo_nxt_lsn recorded in the CLR
            IF log_record.undo_nxt_lsn IS NOT NULL:
                undo_queue.insert(log_record.undo_nxt_lsn, txn_id)
            -- If undo_nxt_lsn is NULL, this txn is fully undone
            CONTINUE

        IF log_record.type IN ('UPDATE', 'INSERT', 'DELETE'):
            -- Undo this modification
            page = read_page(log_record.page_id)
            page = apply_undo(log_record, page)

            -- Write CLR to log
            clr = write_clr(
                txn_id: txn_id,
                undo_of: lsn,
                undo_nxt_lsn: log_record.prev_lsn,  -- Next to undo
                page_id: log_record.page_id
            )

            page.page_lsn = clr.lsn
            write_page(page)

            -- Continue with previous log record of this txn
            IF log_record.prev_lsn IS NOT NULL:
                undo_queue.insert(log_record.prev_lsn, txn_id)
            ELSE:
                -- Reached beginning of txn; write END record
                write_end_record(txn_id)
                LOG("Transaction " + txn_id + " fully undone")

    LOG("Undo phase complete")
    RETURN SUCCESS
```

Why write CLRs during undo?
Compensation Log Records serve two purposes:
Durability of undo operations: CLRs are log records, so they're durable. If the system crashes during undo, recovery can continue undo from where it left off by reading the CLRs.
Prevent repeating undone work: each CLR records that a specific modification has already been undone. If recovery itself crashes and restarts, redo replays the CLR (repeating history, including the undo), and the undo phase follows the CLR's undo_nxt_lsn so the original operation is never undone a second time.
The undo_nxt_lsn chain:
Each CLR contains an undo_nxt_lsn pointing to the next log record that needs undoing. This chain enables recovery to resume a partially completed rollback exactly where it left off: after a crash during undo, the undo phase follows the undo_nxt_lsn in the most recent CLR instead of re-undoing finished work, which also bounds the amount of log written across repeated recoveries. The sketch after this paragraph walks through a concrete chain.
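As a concrete (hypothetical) illustration: suppose loser transaction T1 wrote updates at LSNs 10, 20, and 30, and the system crashed after undo had already reversed LSN 30 and logged a CLR at LSN 40 with undo_nxt_lsn = 20. Following the chain resumes rollback at LSN 20 without touching LSN 30 again:

```python
# Hypothetical log for loser transaction T1 (prev_lsn links records of the same txn).
log = {
    10: {"type": "UPDATE", "prev_lsn": None, "undo_nxt_lsn": None},
    20: {"type": "UPDATE", "prev_lsn": 10,   "undo_nxt_lsn": None},
    30: {"type": "UPDATE", "prev_lsn": 20,   "undo_nxt_lsn": None},
    40: {"type": "CLR",    "prev_lsn": 30,   "undo_nxt_lsn": 20},  # undo of LSN 30
}

def records_to_undo(start_lsn):
    """Walk the undo chain from the transaction's last log record."""
    lsn, remaining = start_lsn, []
    while lsn is not None:
        rec = log[lsn]
        if rec["type"] == "CLR":
            lsn = rec["undo_nxt_lsn"]   # skip past work that was already undone
        else:
            remaining.append(lsn)       # this update still needs undoing
            lsn = rec["prev_lsn"]
    return remaining

# Restart undo from T1's last log record (the CLR at 40):
print(records_to_undo(40))  # [20, 10] -- LSN 30 is never undone a second time
```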
In typical systems, most transactions commit quickly, so there are few losers. Even if a transaction ran for an hour, undo only needs to reverse its actual modifications—often seconds of work. Undo time rarely dominates recovery.
After undo completes, the database is in a transaction-consistent state. But before opening to users, a few finishing steps are required.
Finalization steps:
```
-- Recovery completion and database opening
FUNCTION complete_recovery():
    -- Ensure all recovery work is durable
    force_log_to_stable_storage()

    LOG("Recovery complete. Summary:")
    LOG("  Redo: " + stats.redo_records + " records processed")
    LOG("  Undo: " + stats.losers_count + " transactions rolled back")
    LOG("  Duration: " + stats.duration_seconds + " seconds")

    -- Optional: take immediate checkpoint to protect recovery work
    IF config.checkpoint_after_recovery:
        perform_checkpoint(reason = "POST_RECOVERY")

    -- Update control file
    update_control_file(
        state: CLEAN,
        last_recovery_time: NOW(),
        recovery_lsn: end_of_log
    )

    -- Open database for connections
    set_database_state(OPEN)
    LOG("Database open for user connections")

    -- Start background processes
    start_background_writer()
    start_checkpoint_scheduler()
    start_archiver()

    -- Some systems continue background work
    IF deferred_work IS NOT EMPTY:
        schedule_background_cleanup(deferred_work)
        LOG("Background cleanup scheduled for " + COUNT(deferred_work) + " items")

-- Timeline of recovery phases
RECOVERY_TIMELINE = {
    startup:  "~1-5 seconds (read control files)",
    analysis: "~10-30 seconds per GB of log (log scan only)",
    redo:     "~1-5 minutes per GB of data (I/O bound)",
    undo:     "~seconds to minutes (depends on loser work)",
    finalize: "~1-5 seconds (log force, open)"
}
```

Recovery time breakdown:
In most recoveries, startup and finalization are negligible, analysis takes seconds, redo dominates the total time, and undo adds only a small tail of work for the few loser transactions.
Why checkpoint after recovery?
If the system crashes again soon after recovery, it would need to redo the same work. A post-recovery checkpoint records the just-recovered state, so a second crash starts recovery from this point instead of replaying the same log again.
Some systems make this optional (immediate availability is valued more than crash protection).
Transactions in PREPARING state (2-phase commit) cannot be committed or aborted unilaterally. After basic recovery, these transactions remain 'in-doubt' until the coordinator is contacted. Some systems open for normal operation while waiting for in-doubt resolution; others block until resolved.
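A minimal sketch of that distinction, using hypothetical names and a simplified ATT shape; real handling depends on the system's 2-phase-commit implementation:

```python
def classify_crash_time_transactions(att):
    """Split crash-time transactions into losers to undo and in-doubt ones to hold."""
    losers, in_doubt = [], []
    for txn_id, entry in att.items():
        if entry["state"] == "PREPARING":
            in_doubt.append(txn_id)   # keep its locks; wait for the coordinator's verdict
        else:
            losers.append(txn_id)     # never prepared or committed; roll back during undo
    return losers, in_doubt

att = {
    "T1": {"state": "ACTIVE"},     # loser: undone during recovery
    "T2": {"state": "PREPARING"},  # in-doubt: resolved later by the coordinator
}
print(classify_crash_time_transactions(att))  # (['T1'], ['T2'])
```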
Not all checkpoints are equal. The quality of a checkpoint—how recent its redo_lsn is, how accurate its tables are—directly affects recovery speed.
Factors that improve checkpoint quality:
| Quality Factor | Good State | Poor State | Recovery Impact |
|---|---|---|---|
| Dirty page age | Most pages recently flushed | Many old dirty pages | Redo_lsn close to checkpoint vs. far behind |
| Checkpoint frequency | Checkpoints every 5 min | Checkpoints every hour | 5 min of log replay vs. 1 hour |
| Active transaction count | Few long-running txns | Many long-running txns | Small ATT vs. large ATT in undo |
| Log generation rate | Steady, moderate rate | High burst rate | Predictable redo time vs. variable |
| Background flushing | Aggressive, continuous | Minimal, lazy | Short redo vs. long redo |
The redo_lsn is the key metric:
The most important checkpoint quality indicator is the distance from redo_lsn to the checkpoint LSN:
When redo_lsn is old, it means dirty pages haven't been flushed. Even though the checkpoint is recent, recovery must go back further into the log.
```sql
-- Monitoring checkpoint quality for recovery prediction

-- PostgreSQL: Estimate recovery time from current state
SELECT
    pg_current_wal_lsn() AS current_lsn,
    (SELECT redo_lsn FROM pg_control_checkpoint()) AS checkpoint_redo_lsn,
    pg_wal_lsn_diff(
        pg_current_wal_lsn(),
        (SELECT redo_lsn FROM pg_control_checkpoint())
    ) / 1024 / 1024 AS log_distance_mb,
    -- Rough estimate: assume 100 MB/s redo rate
    pg_wal_lsn_diff(
        pg_current_wal_lsn(),
        (SELECT redo_lsn FROM pg_control_checkpoint())
    ) / 1024 / 1024 / 100 AS estimated_redo_seconds;

-- MySQL/InnoDB: Check checkpoint age
SELECT
    (@@innodb_log_file_size * @@innodb_log_files_in_group) AS log_space_bytes,
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_checkpoint_age') AS checkpoint_age,
    -- checkpoint_age / log_space = how full the log is
    -- Higher = potentially longer recovery
    ROUND(
        (SELECT VARIABLE_VALUE FROM performance_schema.global_status
         WHERE VARIABLE_NAME = 'Innodb_checkpoint_age')
        / (@@innodb_log_file_size * @@innodb_log_files_in_group) * 100,
        2
    ) AS log_fill_percentage;

-- Alert if recovery estimate exceeds threshold
-- Example alert condition:
--   IF estimated_redo_seconds > target_recovery_time * 0.8:
--       ALERT("Recovery time approaching SLA threshold")
```

Predictions are estimates. The only way to know true recovery time is to crash a test system and measure. Do this quarterly: create a replica, crash it, time recovery, compare to SLA. This validates your checkpoint and flushing configuration.
Let's examine recovery in concrete scenarios, tracing through the phases with example data.
Scenario: E-commerce Database Crash
An e-commerce database crashes during peak traffic. At crash time, the last checkpoint (completed at 14:32:00, about three minutes earlier) recorded 150 active transactions and 50,000 dirty pages, and roughly 550 MB of log separated the checkpoint's redo point from the end of the log.
Recovery trace:
```
-- Recovery scenario walkthrough

-- STARTUP (0-2 seconds)
[14:35:00] Database startup initiated
[14:35:00] Reading control file...
[14:35:00] Detected unclean shutdown at 14:34:55
[14:35:01] Master record points to checkpoint at LSN 1000000
[14:35:01] Reading checkpoint record...
[14:35:01] Checkpoint loaded:
           - ATT entries: 150 transactions
           - DPT entries: 50,000 pages
           - Redo_lsn: 980000 (2.5 min before checkpoint)
           - Checkpoint taken at: 14:32:00

-- ANALYSIS PHASE (2-8 seconds)
[14:35:02] Starting analysis from LSN 980000
[14:35:02] Scanning 500 MB of log records...
[14:35:05] Analysis progress: 250 MB scanned (50%)
[14:35:08] Analysis complete:
           - Final ATT: 35 transactions (losers)
           - 115 transactions committed after checkpoint
           - Final DPT: 52,000 pages
           - 2,000 pages dirtied after checkpoint
           - Redo_lsn confirmed: 980000

-- REDO PHASE (8-60 seconds)
[14:35:08] Starting redo from LSN 980000
[14:35:08] Processing 650,000 log records...
[14:35:15] Redo progress: 100 MB / 550 MB total
[14:35:25] Redo progress: 250 MB / 550 MB total
[14:35:40] Redo progress: 400 MB / 550 MB total
[14:35:55] Redo progress: 550 MB / 550 MB total
[14:36:00] Redo complete:
           - Records processed: 650,000
           - Pages actually redone: 48,500
           - Pages skipped (already flushed): 3,500

-- UNDO PHASE (60-68 seconds)
[14:36:00] Starting undo for 35 loser transactions
[14:36:02] Undoing transaction 1001 (1,200 modifications)
[14:36:04] Undoing transaction 1005 (450 modifications)
           ... (33 more transactions)
[14:36:08] Undo complete:
           - Transactions undone: 35
           - Total modifications reversed: 12,500
           - CLRs written: 12,500

-- FINALIZATION (68-70 seconds)
[14:36:08] Forcing log to stable storage
[14:36:09] Performing post-recovery checkpoint
[14:36:10] Updating control file
[14:36:10] Database open for connections

-- SUMMARY
[14:36:10] Recovery complete
           Total time: 70 seconds
           - Startup: 2 seconds
           - Analysis: 6 seconds
           - Redo: 52 seconds
           - Undo: 8 seconds
           - Finalization: 2 seconds
```

Observations from this scenario:
Redo dominated: 52 of 70 seconds (74%) was redo. This is typical.
DPT fuzziness resolved: 52,000 pages in the final DPT, but only 48,500 actually needed redo. The other 3,500 had been flushed between checkpoint and crash.
ATT fuzziness resolved: 150 transactions in checkpoint ATT, but only 35 were losers. The rest committed after checkpoint.
3,500 pages skipped: Background flushing between checkpoint and crash saved redo work.
35 losers were small: Even 35 active transactions only needed 12,500 undo operations—8 seconds of work.
This 70-second recovery meets a 2-minute RTO easily. If the RTO were 30 seconds, adjustments would be needed: more frequent checkpoints, more aggressive flushing, or faster storage.
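A rough way to turn an RTO target into a log-distance budget, using the redo rate observed in this scenario (about 550 MB in 52 seconds); the helper function and overhead figure are illustrative assumptions, not a standard formula:

```python
def max_log_distance_mb(rto_seconds: float,
                        redo_rate_mb_per_s: float,
                        fixed_overhead_s: float = 10.0) -> float:
    """Largest redo_lsn-to-crash log distance (MB) that still fits the RTO.

    fixed_overhead_s approximates startup + analysis + undo + finalization.
    """
    return max(0.0, (rto_seconds - fixed_overhead_s) * redo_rate_mb_per_s)

observed_rate = 550 / 52                         # ~10.6 MB/s of redo in the trace above
print(max_log_distance_mb(120, observed_rate))   # ~1160 MB: a 2-minute RTO has plenty of room
print(max_log_distance_mb(30, observed_rate))    # ~210 MB: a 30-second RTO needs tighter checkpoints
```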
Recovery is where checkpoint theory meets reality. The checkpoint provides the starting point; the recovery algorithm uses that information to restore a consistent database state. To consolidate the key concepts: the master record, ATT, DPT, and redo_lsn tell recovery where to start; analysis rebuilds the crash-time ATT and DPT from the log; redo repeats history, using the page_lsn check to stay idempotent; and undo rolls back the losers while writing CLRs so its own work is durable and never repeated.
Module conclusion:
With this page, we've completed our deep dive into checkpoints. You now understand what checkpoints record and why they exist, how the checkpoint process runs alongside normal work, why checkpoints are fuzzy and how recovery resolves that fuzziness, how checkpoint frequency trades runtime overhead against recovery time, and how recovery turns checkpoint information into a consistent, open database.
Checkpoints are the bridge between the theoretical guarantees of transaction durability and the practical reality of systems that must recover quickly after failures. Every production database depends on well-tuned checkpointing to meet availability requirements.
Congratulations! You have completed the Checkpoints module. You now possess deep, practical knowledge of checkpoint mechanisms—understanding that will serve you whether you're tuning production databases, debugging recovery issues, or designing new database systems. You understand not just how checkpoints work, but why they work that way.