The Analysis Phase's core activity is a forward scan through the log, starting from the checkpoint and proceeding to the log's end. This single pass must accomplish multiple critical tasks: rebuilding the Dirty Page Table, rebuilding the Transaction Table, determining where the Redo Phase must begin, and identifying which transactions the Undo Phase must roll back.
This page examines the log scanning algorithm in detail: which log records are processed, how each record type updates the DPT and Transaction Table, and how the scan establishes the foundation for efficient redo and undo. By the end, you'll understand exactly how ARIES transforms a raw log stream into precise recovery instructions, and you'll be able to trace through concrete log sequences and predict the resulting recovery state.
Before scanning can begin, the recovery system must locate the most recent valid checkpoint. This is the point from which all analysis proceeds.
The Master Record:
ARIES maintains a master record in a well-known location on stable storage (often the first sector of the database file or a dedicated recovery file). This record contains the LSN of the most recent successful checkpoint's BEGIN_CHECKPOINT record.
Recovery Startup Sequence:
```cpp
struct MasterRecord {
    LSN    last_checkpoint_lsn;  // Points to BEGIN_CHECKPOINT
    uint64 checkpoint_number;    // Monotonically increasing
    uint32 checksum;             // Validates integrity
};

LSN RecoveryManager::findLastCheckpoint() {
    // Step 1: Read master record from known location
    MasterRecord master = readMasterRecord();

    // Step 2: Validate checksum
    if (!validateChecksum(master)) {
        // Master record corrupted - need backup strategy
        log_warn("Master record checksum failed, searching for checkpoint");
        return findCheckpointByScanning();
    }

    // Step 3: Verify the checkpoint record exists and is valid
    LSN ckpt_lsn = master.last_checkpoint_lsn;
    LogRecord ckpt_record = log_manager.readLog(ckpt_lsn);
    if (ckpt_record.type != LogType::BEGIN_CHECKPOINT) {
        log_error("Master record points to non-checkpoint LSN %d", ckpt_lsn);
        return findCheckpointByScanning();
    }

    // Step 4: Find the corresponding END_CHECKPOINT
    // It contains the actual DPT and Transaction Table snapshots
    LSN end_ckpt_lsn = findEndCheckpoint(ckpt_lsn);
    if (end_ckpt_lsn == INVALID_LSN) {
        // Crash occurred during checkpoint - use previous checkpoint
        log_warn("Incomplete checkpoint at %d, seeking earlier", ckpt_lsn);
        return findPreviousCheckpoint(ckpt_lsn);
    }

    log_info("Found valid checkpoint: BEGIN=%d, END=%d", ckpt_lsn, end_ckpt_lsn);
    return ckpt_lsn;
}

LSN RecoveryManager::findEndCheckpoint(LSN begin_lsn) {
    // Scan forward from BEGIN_CHECKPOINT to find matching END_CHECKPOINT
    LSN current = begin_lsn;
    while (current != END_OF_LOG) {
        LogRecord record = log_manager.readLog(current);
        if (record.type == LogType::END_CHECKPOINT) {
            // Verify it matches our BEGIN_CHECKPOINT
            if (record.begin_checkpoint_lsn == begin_lsn) {
                return current;
            }
        }
        current = record.next_lsn;
    }
    return INVALID_LSN;  // No matching END found
}
```

If the system crashed during checkpointing, the END_CHECKPOINT record may not exist. In this case, recovery must use the previous complete checkpoint. This is why the master record is only updated AFTER the checkpoint is fully written and flushed to disk.
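The safety of this startup sequence depends on the order in which the checkpoint writer updates state. The following is a minimal sketch of that write side, not the actual implementation; the helpers (`appendLog`, `flushLog`, `snapshotDirtyPages`, `snapshotActiveTransactions`, `computeChecksum`, `writeMasterRecord`) and the `checkpoint_counter` member are assumptions for illustration:

```cpp
void CheckpointManager::completeCheckpoint(LSN begin_ckpt_lsn) {
    // 1. Write END_CHECKPOINT carrying the DPT and Transaction Table snapshots
    EndCheckpointRecord end_ckpt;
    end_ckpt.begin_checkpoint_lsn = begin_ckpt_lsn;
    end_ckpt.dirty_page_table     = buffer_manager.snapshotDirtyPages();
    end_ckpt.transaction_table    = txn_manager.snapshotActiveTransactions();
    LSN end_lsn = log_manager.appendLog(end_ckpt);

    // 2. Force the log to disk - the checkpoint is now complete and durable
    log_manager.flushLog(end_lsn);

    // 3. Only now point the master record at the new checkpoint. A crash
    //    before this step leaves the master naming the previous complete
    //    checkpoint, which findLastCheckpoint() handles correctly.
    MasterRecord master;
    master.last_checkpoint_lsn = begin_ckpt_lsn;
    master.checkpoint_number   = ++checkpoint_counter;
    master.checksum            = computeChecksum(master);
    writeMasterRecord(master);
}
```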
Once the checkpoint is located, the Analysis Phase initializes both the Dirty Page Table and Transaction Table from the checkpoint's snapshot data. This gives us a starting point that reflects system state at checkpoint time.
Loading Checkpoint Tables:
```cpp
void AnalysisPhase::initializeFromCheckpoint(LSN checkpoint_lsn) {
    // Read the END_CHECKPOINT record which contains the snapshots
    LSN end_ckpt_lsn = findEndCheckpoint(checkpoint_lsn);
    EndCheckpointRecord end_ckpt = log_manager.readLog(end_ckpt_lsn);

    // Initialize Dirty Page Table
    dirty_page_table.clear();
    for (const auto& dpt_entry : end_ckpt.dirty_page_table) {
        DirtyPageEntry entry;
        entry.page_id = dpt_entry.page_id;
        entry.recovery_lsn = dpt_entry.recovery_lsn;
        dirty_page_table.insert(dpt_entry.page_id, entry);
    }
    log_info("Loaded DPT from checkpoint: %d dirty pages",
             end_ckpt.dirty_page_table.size());

    // Initialize Transaction Table
    transaction_table.clear();
    for (const auto& txn_entry : end_ckpt.transaction_table) {
        TransactionEntry entry;
        entry.txn_id = txn_entry.txn_id;
        entry.state = txn_entry.state;
        entry.last_lsn = txn_entry.last_lsn;
        entry.undo_next_lsn = txn_entry.undo_next_lsn;
        transaction_table.insert(txn_entry.txn_id, entry);
    }
    log_info("Loaded Transaction Table: %d active transactions",
             end_ckpt.transaction_table.size());

    // Set the scan starting point
    // Start from the BEGIN_CHECKPOINT LSN
    // (Some implementations start from END_CHECKPOINT instead)
    scan_start_lsn = checkpoint_lsn;
}

/*
 * Example Checkpoint State:
 *
 * BEGIN_CHECKPOINT at LSN 5000
 * END_CHECKPOINT at LSN 5050 contains:
 *
 * Dirty Page Table:
 *   Page 100: recLSN = 4500
 *   Page 200: recLSN = 4800
 *   Page 300: recLSN = 5010 (dirtied between BEGIN and END checkpoint)
 *
 * Transaction Table:
 *   T1: state=ACTIVE, lastLSN=4900, undoNextLSN=4900
 *   T2: state=ACTIVE, lastLSN=5020, undoNextLSN=5020
 *   T3: state=ABORTING, lastLSN=5040, undoNextLSN=4600
 *
 * These become the initial state for the log scan.
 */
```

| Structure | Content | Purpose |
|---|---|---|
| Dirty Page Table | Page IDs + recLSNs from checkpoint | Tracks pages needing potential redo |
| Transaction Table | Txn IDs + states + lastLSNs | Tracks transactions needing potential undo |
| Scan Start LSN | BEGIN_CHECKPOINT LSN | Where log scanning begins |
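For concreteness, the entries in these two structures, as used by the code throughout this page, might be declared as follows. This is a minimal sketch; the `HashMap` alias is an assumption for illustration, not the page's actual type:

```cpp
struct DirtyPageEntry {
    PageID page_id;
    LSN    recovery_lsn;  // recLSN: earliest LSN that may have dirtied the page
};

struct TransactionEntry {
    TransactionID    txn_id;
    TransactionState state;          // UNDO, COMMITTING, ...
    LSN              first_lsn;      // First log record seen for this txn
    LSN              last_lsn;       // Most recent log record for this txn
    LSN              undo_next_lsn;  // Next record to undo during rollback
};

// Both tables are hash maps keyed by ID, giving O(1) updates per log record
// during the scan (see the performance table later in this page).
using DirtyPageTable   = HashMap<PageID, DirtyPageEntry>;
using TransactionTable = HashMap<TransactionID, TransactionEntry>;
```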
With initialization complete, the Analysis Phase performs a single forward pass through the log from the checkpoint to the end. Each log record is processed according to its type, updating the DPT and Transaction Table as needed.
The Master Algorithm:
```cpp
struct AnalysisResult {
    LSN redo_lsn;                      // Where redo should start
    vector<UndoTask> undo_list;        // Transactions needing undo
    DirtyPageTable final_dpt;          // Crash-time DPT state
    TransactionTable final_txn_table;  // Crash-time transaction state
};

AnalysisResult AnalysisPhase::run() {
    // Step 1: Find and load checkpoint
    LSN checkpoint_lsn = findLastCheckpoint();
    initializeFromCheckpoint(checkpoint_lsn);

    // Step 2: Scan log from checkpoint to end
    LSN current_lsn = checkpoint_lsn;
    while (true) {
        // Read next log record
        LogRecord record = log_manager.readLog(current_lsn);
        if (record.type == LogType::END_OF_LOG) {
            break;  // Reached end of log
        }

        // Process this record
        processLogRecord(record, current_lsn);

        // Move to next record
        current_lsn = record.next_lsn;
        if (current_lsn == INVALID_LSN) {
            break;  // Reached end
        }
    }

    // Step 3: Determine RedoLSN
    LSN redo_lsn = computeRedoLSN();

    // Step 4: Build undo list from transaction table
    vector<UndoTask> undo_list = buildUndoList();

    // Step 5: Return results
    AnalysisResult result;
    result.redo_lsn = redo_lsn;
    result.undo_list = undo_list;
    result.final_dpt = dirty_page_table;
    result.final_txn_table = transaction_table;

    log_info("Analysis complete: RedoLSN=%d, %d transactions to undo",
             redo_lsn, undo_list.size());
    return result;
}
```

The Analysis Phase accomplishes all its goals in a single forward scan. This is crucial for recovery performance—logs can be enormous, and multiple passes would significantly increase recovery time. Each record is read exactly once.
Different log record types require different processing during analysis. Let's examine each category in detail.
The Record Processing Function:
```cpp
void AnalysisPhase::processLogRecord(const LogRecord& record, LSN lsn) {
    TransactionID tid = record.txn_id;

    switch (record.type) {
        // ===== Data Modification Records =====
        case LogType::UPDATE:
        case LogType::INSERT:
        case LogType::DELETE:
            processDataModification(record, lsn);
            break;

        // ===== Compensation Log Records =====
        case LogType::CLR:
            processCLR(record, lsn);
            break;

        // ===== Transaction Control Records =====
        case LogType::BEGIN:
            processBegin(record, lsn);
            break;
        case LogType::COMMIT:
            processCommit(record, lsn);
            break;
        case LogType::ABORT:
            processAbort(record, lsn);
            break;
        case LogType::END:
            processEnd(record, lsn);
            break;

        // ===== Checkpoint Records =====
        case LogType::BEGIN_CHECKPOINT:
        case LogType::END_CHECKPOINT:
            // Already processed during initialization - skip
            break;

        // ===== Other Records =====
        case LogType::PREPARE:  // For 2PC
            processPrepare(record, lsn);
            break;

        default:
            log_warn("Unknown log record type %d at LSN %d", record.type, lsn);
    }
}
```
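The transaction-control handlers dispatched above (`processBegin`, `processCommit`, `processAbort`, `processEnd`) are not shown in full on this page. Here is a minimal sketch of what they do, consistent with the trace later in this page; the exact state names and table helpers (`getOrCreate`, `remove`) are assumptions:

```cpp
void AnalysisPhase::processBegin(const LogRecord& record, LSN lsn) {
    // Register the transaction; it stays in UNDO state until COMMIT is seen
    TransactionEntry entry;
    entry.txn_id = record.txn_id;
    entry.state = TransactionState::UNDO;
    entry.first_lsn = lsn;
    entry.last_lsn = lsn;
    entry.undo_next_lsn = INVALID_LSN;  // No undoable work logged yet
    transaction_table.insert(record.txn_id, entry);
}

void AnalysisPhase::processCommit(const LogRecord& record, LSN lsn) {
    // COMMIT is durable once logged: mark COMMITTING so buildUndoList()
    // excludes this transaction (see Edge Case 1 below)
    TransactionEntry& entry = transaction_table.getOrCreate(record.txn_id);
    entry.state = TransactionState::COMMITTING;
    entry.last_lsn = lsn;
}

void AnalysisPhase::processAbort(const LogRecord& record, LSN lsn) {
    // Rollback is underway; the transaction remains in UNDO state.
    // undoNextLSN is NOT advanced - it still names the next record to undo.
    TransactionEntry& entry = transaction_table.getOrCreate(record.txn_id);
    entry.state = TransactionState::UNDO;
    entry.last_lsn = lsn;
}

void AnalysisPhase::processEnd(const LogRecord& record, LSN lsn) {
    // END means the transaction is fully finished (committed or rolled
    // back); it needs neither redo bookkeeping nor undo - drop it
    transaction_table.remove(record.txn_id);
}
```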
UPDATE, INSERT, DELETE Records:

These records indicate data modifications and affect both the DPT (a page was modified) and the Transaction Table (the transaction logged an operation).
```cpp
void AnalysisPhase::processDataModification(const LogRecord& record, LSN lsn) {
    TransactionID tid = record.txn_id;
    PageID pid = record.page_id;

    // === Update Transaction Table ===
    if (!transaction_table.contains(tid)) {
        // New transaction not seen before (started after checkpoint)
        TransactionEntry entry;
        entry.txn_id = tid;
        entry.state = TransactionState::UNDO;  // Assume uncommitted
        entry.first_lsn = lsn;
        entry.last_lsn = lsn;
        entry.undo_next_lsn = lsn;
        transaction_table.insert(tid, entry);
        log_debug("New transaction %d discovered at LSN %d", tid, lsn);
    } else {
        // Update existing transaction
        TransactionEntry& entry = transaction_table.get(tid);
        entry.last_lsn = lsn;
        entry.undo_next_lsn = lsn;  // Most recent operation to undo
    }

    // === Update Dirty Page Table ===
    if (!dirty_page_table.contains(pid)) {
        // Page not in DPT - add it with this LSN as recLSN
        DirtyPageEntry dpt_entry;
        dpt_entry.page_id = pid;
        dpt_entry.recovery_lsn = lsn;
        dirty_page_table.insert(pid, dpt_entry);
        log_debug("Page %d added to DPT at LSN %d", pid, lsn);
    }
    // If page already in DPT, do NOT update recLSN
    // The original recLSN is the first dirty point
}
```

Real-world recovery encounters numerous edge cases that the Analysis Phase must handle correctly. Let's examine the most important ones.
Edge Case 1: Transaction Committed But No END Record
If the system crashed after writing COMMIT but before writing END, the transaction IS committed and should NOT be undone.
```cpp
/*
 * Scenario Timeline:
 *
 * LSN 1000: T1 UPDATE page 10
 * LSN 1050: T1 UPDATE page 20
 * LSN 1100: T1 COMMIT
 * LSN 1110: Log flushed to disk (commit is durable)
 * --- CRASH before END record written ---
 *
 * Analysis sees: COMMIT at 1100, no END
 * Transaction state: COMMITTING
 *
 * Correct behavior: Do NOT undo T1's operations
 * During Redo: Replay T1's operations if needed
 * After Redo: Write END record for T1 to clean up
 */

vector<UndoTask> AnalysisPhase::buildUndoList() {
    vector<UndoTask> undo_list;

    for (const auto& [tid, entry] : transaction_table) {
        if (entry.state == TransactionState::COMMITTING) {
            // COMMIT seen but no END - transaction IS committed
            // Will write END during redo phase cleanup
            log_info("Transaction %d committed but incomplete - will write END", tid);
            continue;  // Do NOT add to undo list
        }

        if (entry.state == TransactionState::UNDO) {
            // Needs undo
            UndoTask task;
            task.txn_id = tid;
            task.undo_next_lsn = entry.undo_next_lsn;
            undo_list.push_back(task);
        }
    }

    return undo_list;
}
```

Edge Case 2: Crash During Rollback
If the system was rolling back a transaction when it crashed, CLRs may have been written for some (but not all) undo operations. The undoNextLSN field ensures we don't redo already-completed undo work.
```cpp
/*
 * Scenario Timeline:
 *
 * LSN 100: T1 UPDATE page 10
 * LSN 150: T1 UPDATE page 20
 * LSN 200: T1 UPDATE page 30
 * LSN 250: T1 ABORT (user requested rollback)
 * LSN 260: T1 CLR undoing LSN 200 (undoNextLSN = 150)
 * --- CRASH ---
 *
 * Analysis finds:
 *   T1: state=UNDO, lastLSN=260, undoNextLSN=150
 *
 * Undo phase will:
 *   1. Start from undoNextLSN=150 (NOT from 260)
 *   2. Undo LSN 150 (write CLR with undoNextLSN=100)
 *   3. Undo LSN 100 (write CLR with undoNextLSN=null)
 *   4. Write END for T1
 *
 * LSN 200's undo is skipped because CLR 260 already handled it
 */

void AnalysisPhase::processCLR(const LogRecord& record, LSN lsn) {
    TransactionID tid = record.txn_id;
    TransactionEntry& entry = transaction_table.getOrCreate(tid);
    entry.last_lsn = lsn;

    // CLR's undoNextLSN tells us where rollback will resume
    // This is the KEY to avoiding repeated undo work
    entry.undo_next_lsn = record.undo_next_lsn;

    // A CLR is itself a page modification (see LSN 1450 in the trace below),
    // so its page must be tracked in the DPT like any other update
    if (!dirty_page_table.contains(record.page_id)) {
        DirtyPageEntry dpt_entry;
        dpt_entry.page_id = record.page_id;
        dpt_entry.recovery_lsn = lsn;
        dirty_page_table.insert(record.page_id, dpt_entry);
    }

    log_debug("T%d CLR: undoNextLSN advances to %d", tid, record.undo_next_lsn);
}
```

The CLR's undoNextLSN is vital for crash resilience during rollback. No matter how many times the system crashes during rollback, each restart continues from where it left off. Undo work is never repeated because each CLR advances the undoNextLSN pointer.
Edge Case 3: Page Written to Disk After Dirtying
The DPT may contain pages that have actually been written to disk (the page is now clean). This is conservative but safe—redo will check page LSNs and skip operations already applied.
```cpp
/*
 * Why the DPT might be conservative:
 *
 * 1. Checkpoint captures DPT with Page 100 (recLSN=500)
 * 2. Background flush writes Page 100 to disk at LSN 600
 * 3. Page 100 is now clean, but Analysis doesn't know this
 * 4. Crash occurs
 * 5. Analysis still has Page 100 in DPT
 *
 * Is this a problem? NO!
 *
 * During Redo:
 *   - For each log record at LSN >= recLSN
 *   - Read the page from disk
 *   - Check page's pageLSN
 *   - If pageLSN >= log record's LSN, skip (already applied)
 *   - If pageLSN < log record's LSN, apply redo
 *
 * So even if Page 100 was flushed at LSN 600:
 *   - Page's on-disk pageLSN = 600
 *   - Any log record with LSN <= 600 will be skipped
 *   - Only records with LSN > 600 will be applied (if any)
 *
 * The redo phase is IDEMPOTENT - applying the same operation
 * multiple times produces the same result as applying it once.
 */

bool RedoPhase::shouldApplyRedo(const LogRecord& record, Page* page) {
    // Check if this operation is already reflected on disk
    LSN page_lsn = page->getPageLSN();

    if (page_lsn >= record.lsn) {
        // Page already has this (and possibly later) changes
        log_debug("Skip redo of LSN %d: page LSN is %d", record.lsn, page_lsn);
        return false;
    }

    return true;  // Need to apply this redo
}
```

Let's trace through a complete log scanning scenario to see how the DPT and Transaction Table evolve during analysis.
| LSN | Type | Txn | Page | Details |
|---|---|---|---|---|
| 1000 | BEGIN_CHECKPOINT | | | Checkpoint starts |
| 1050 | END_CHECKPOINT | | | DPT: {P10:800}, TxnTable: {T1:ACTIVE, lastLSN:900} |
| 1100 | UPDATE | T1 | P10 | prevLSN=900 |
| 1150 | UPDATE | T2 | P20 | prevLSN=null (new txn) |
| 1200 | UPDATE | T1 | P30 | prevLSN=1100 |
| 1250 | COMMIT | T1 | | prevLSN=1200 |
| 1300 | END | T1 | | T1 complete |
| 1350 | UPDATE | T2 | P20 | prevLSN=1150 |
| 1400 | ABORT | T2 | | prevLSN=1350 |
| 1450 | CLR | T2 | P20 | undoes 1350, undoNextLSN=1150 |
| | CRASH | | | System failure |
Step-by-Step Analysis:
**Initial State from END_CHECKPOINT (LSN 1050):**

```
DPT:
  P10: recLSN = 800

Transaction Table:
  T1: ACTIVE, lastLSN = 900, undoNextLSN = 900
```
**LSN 1100 (UPDATE):** T1 modifies page P10

```
DPT:
  P10: recLSN = 800   (unchanged - already tracked)

Transaction Table:
  T1: ACTIVE, lastLSN = 1100, undoNextLSN = 1100
```
**LSN 1150 (UPDATE):** New transaction T2 modifies P20

```
DPT:
  P10: recLSN = 800
  P20: recLSN = 1150   (NEW)

Transaction Table:
  T1: ACTIVE, lastLSN = 1100
  T2: UNDO, lastLSN = 1150, undoNextLSN = 1150   (NEW)
```
**LSN 1200 (UPDATE):** T1 modifies new page P30

```
DPT:
  P10: recLSN = 800
  P20: recLSN = 1150
  P30: recLSN = 1200   (NEW)

Transaction Table:
  T1: ACTIVE, lastLSN = 1200, undoNextLSN = 1200
  T2: UNDO, lastLSN = 1150
```
**LSN 1250 (COMMIT):** T1 commits

```
DPT: (unchanged)

Transaction Table:
  T1: COMMITTING, lastLSN = 1250
  T2: UNDO, lastLSN = 1150
```
**LSN 1300 (END):** T1 fully completes

```
DPT: (unchanged)

Transaction Table:
  T1: REMOVED
  T2: UNDO, lastLSN = 1150
```
**LSN 1350 (UPDATE):** T2 modifies P20 again

```
DPT:
  P10: recLSN = 800
  P20: recLSN = 1150   (unchanged)
  P30: recLSN = 1200

Transaction Table:
  T2: UNDO, lastLSN = 1350, undoNextLSN = 1350
```
**LSN 1400 (ABORT):** T2 begins abort

```
DPT: (unchanged)

Transaction Table:
  T2: UNDO, lastLSN = 1400, undoNextLSN = 1350
```
**LSN 1450 (CLR):** CLR records undo of LSN 1350

```
DPT:
  P10: recLSN = 800
  P20: recLSN = 1150   (CLR is a modification)
  P30: recLSN = 1200

Transaction Table:
  T2: UNDO, lastLSN = 1450, undoNextLSN = 1150   (from CLR)
```
**Analysis Complete:**

```
RedoLSN = min(800, 1150, 1200) = 800

Undo List:
  T2: undoNextLSN = 1150   (continue rollback from here)
```

Redo will start at LSN 800. Undo will process T2 starting at LSN 1150.
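The RedoLSN used above comes from `computeRedoLSN()`, which the master algorithm referenced but did not show. The next page covers redo-point determination in depth; for now, here is a minimal sketch, assuming the same iteration interface over `dirty_page_table` used elsewhere on this page:

```cpp
LSN AnalysisPhase::computeRedoLSN() {
    // RedoLSN = smallest recLSN in the crash-time DPT. No update older than
    // a page's recLSN can be missing from disk, so redo can safely start here.
    LSN redo_lsn = END_OF_LOG;  // Sentinel: nothing to redo if DPT is empty
    for (const auto& [pid, entry] : dirty_page_table) {
        if (entry.recovery_lsn < redo_lsn) {
            redo_lsn = entry.recovery_lsn;
        }
    }
    return redo_lsn;
}
```

For the trace above, the minimum over {800, 1150, 1200} yields a RedoLSN of 800, exactly as shown.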
The Analysis Phase must be fast to minimize recovery time. Several factors affect its performance:
Log Read Performance:
The primary cost of analysis is reading log records from disk. The table below summarizes the main factors and their mitigations (a read-ahead sketch follows the table):
| Factor | Impact | Mitigation |
|---|---|---|
| Log size since checkpoint | Linear | Frequent checkpoints |
| Log storage I/O speed | Direct | SSD storage, RAID |
| Hash table operations | O(1) per record | Good hash function, sizing |
| Checkpoint loading | Fixed overhead | Compact checkpoint format |
| CLR processing | Slightly more complex | Pre-allocated structures |
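Because the scan reads the log strictly in order, a buffered sequential reader captures most of the available I/O win. The following is a minimal sketch under stated assumptions: LSNs are byte offsets into the log file, and `LogFile::readAt` and `LogRecord::parse` are hypothetical interfaces, not part of the code above:

```cpp
#include <vector>

class BufferedLogReader {
public:
    explicit BufferedLogReader(LogFile& file, size_t buffer_size = 1 << 20)
        : file_(file), buffer_(buffer_size) {}

    // Read the record at `lsn`, refilling the buffer with one large
    // sequential read whenever the requested record falls outside it.
    LogRecord readLog(LSN lsn) {
        if (lsn < buffer_start_ || lsn + kMaxRecordSize > buffer_end_) {
            size_t n = file_.readAt(lsn, buffer_.data(), buffer_.size());
            buffer_start_ = lsn;
            buffer_end_ = lsn + n;
        }
        return LogRecord::parse(buffer_.data() + (lsn - buffer_start_));
    }

private:
    static constexpr size_t kMaxRecordSize = 64 * 1024;  // Assumed upper bound

    LogFile& file_;
    std::vector<char> buffer_;
    LSN buffer_start_ = 0;
    LSN buffer_end_ = 0;
};
```

With a 1 MB buffer, the reader issues one large sequential read per megabyte of log instead of one small read per record, which is the dominant optimization on both SSDs and spinning disks.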
In practice, the Analysis Phase is usually fast compared to the Redo and Undo phases. Analysis only reads the log and updates in-memory tables, while Redo and Undo must read and write actual database pages, which involves far more I/O. Analysis typically completes in seconds even for large logs, while redo may take minutes to hours.
The log scanning process is the heart of the Analysis Phase, transforming a raw log stream into precise recovery instructions. The essential concepts:

- A single forward pass runs from the last complete checkpoint to the end of the log, reading each record exactly once.
- Data modifications and CLRs update both tables: pages are added to the DPT with the current LSN as recLSN, and the transaction's lastLSN and undoNextLSN advance.
- Transaction control records drive state: COMMIT marks a transaction COMMITTING, END removes it, and a COMMIT without an END must never trigger undo.
- The output is the crash-time DPT and Transaction Table, the RedoLSN, and the undo list that the next two phases consume.
What's Next:
With log scanning complete and both tables reconstructed, we'll examine how the Analysis Phase determines the redo point—the exact LSN where the Redo Phase must begin processing. This determination involves synthesizing information from both the DPT and Transaction Table to find the earliest possible relevant log record.
You now understand the complete log scanning algorithm—how the Analysis Phase processes each record type, handles edge cases, and produces the reconstructed state needed for redo and undo. This single-pass algorithm is the foundation of efficient ARIES recovery.