The Analysis Phase transforms raw log data into a precise reconstruction of system state at crash time. This reconstructed state answers every question the Redo and Undo phases will ask: which pages might hold updates not yet on disk, where in the log replay must begin, and which transactions must be rolled back and from where.
This final page of the Analysis Phase module synthesizes everything we've learned. We examine how the Dirty Page Table, Transaction Table, and RedoLSN combine into a coherent recovery plan. We explore the invariants that must hold, the validation checks that ensure correctness, and the handoff to subsequent phases.
By the end of this page, you will understand the complete output of the Analysis Phase, how to validate reconstructed state for correctness, how this state integrates with Redo and Undo phases, and how to diagnose recovery issues by examining Analysis Phase output.
When the Analysis Phase completes, it produces a structured output that completely characterizes the recovery work needed. Let's examine each component.
```cpp
/**
 * Complete output of the Analysis Phase
 * This structure contains everything needed for Redo and Undo phases
 */
struct AnalysisPhaseResult {
    // === Dirty Page Table (Final State) ===
    // Maps PageID -> DirtyPageEntry
    // Contains all pages that might need redo
    HashMap<PageID, DirtyPageEntry> dirty_page_table;

    // === Transaction Table (Final State) ===
    // Maps TransactionID -> TransactionEntry
    // Contains all transactions that were active at crash
    HashMap<TransactionID, TransactionEntry> transaction_table;

    // === Redo Control ===
    LSN redo_lsn;        // Where Redo Phase starts scanning
    LSN end_of_log_lsn;  // Last valid LSN in the log (where analysis stopped)

    // === Undo Control ===
    vector<UndoTask> undo_list;   // Transactions requiring rollback, sorted
    set<TransactionID> winners;   // Transactions that committed (for verification)

    // === Metadata ===
    LSN checkpoint_lsn;            // The checkpoint we started from
    uint64 log_records_scanned;    // Number of records processed
    uint64 analysis_duration_ms;   // Time taken for analysis

    // === Validation ===
    bool is_valid;          // Did analysis complete successfully?
    string error_message;   // If not valid, what went wrong
};

/**
 * Each UndoTask specifies a transaction to undo and where to start
 */
struct UndoTask {
    TransactionID txn_id;
    LSN undo_next_lsn;       // Start undoing from this LSN
    int estimated_records;   // Approximate log records to process
};
```

| Component | Purpose | Used By |
|---|---|---|
| Dirty Page Table | Pages potentially needing redo | Redo Phase |
| Transaction Table | Active transactions at crash | Undo Phase |
| RedoLSN | Starting point for log replay | Redo Phase |
| Undo List | Transactions to roll back, ordered | Undo Phase |
| Winners Set | Committed transactions for verification | Validation, logging |
| Metadata | Performance metrics and diagnostics | Monitoring, tuning |
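The relationship between RedoLSN and the Dirty Page Table can be made concrete. The sketch below is a minimal illustration, not the production structure above: the DPT is reduced to a map from page ID to recLSN, and the function name `computeRedoLSN` is an assumption for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using LSN = std::uint64_t;
using PageID = std::uint64_t;

// RedoLSN is the minimum recLSN across all dirty pages: no update to any
// dirty page can predate it, so the Redo Phase may safely begin its
// forward scan there. An empty DPT means there is nothing to redo, so
// the scan would start (and immediately end) at the end of the log.
LSN computeRedoLSN(const std::unordered_map<PageID, LSN>& dpt, LSN end_of_log) {
    LSN redo_lsn = end_of_log;
    for (const auto& [pid, rec_lsn] : dpt) {
        (void)pid;  // only the recLSN matters here
        if (rec_lsn < redo_lsn) redo_lsn = rec_lsn;
    }
    return redo_lsn;
}
```

This is exactly why Invariant 1 below (RedoLSN ≤ every recLSN) must hold: the value is defined as the minimum, so any larger value signals a bug.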
The reconstructed state must satisfy certain invariants for recovery to proceed correctly. Validating these invariants catches bugs in both the logging subsystem and the recovery implementation.
Critical Invariants:

1. RedoLSN ≤ the minimum recLSN in the Dirty Page Table.
2. Each undo task's undoNextLSN ≤ that transaction's lastLSN.
3. The undo list and the winners set are disjoint.
4. RedoLSN ≤ the end of the log whenever the Dirty Page Table is non-empty.
5. Every transaction with logged work has a valid undoNextLSN.
```cpp
bool AnalysisPhaseResult::validate() {
    is_valid = true;
    error_message = "";

    // Invariant 1: RedoLSN <= min(recLSN)
    for (const auto& [pid, entry] : dirty_page_table) {
        if (redo_lsn > entry.recovery_lsn) {
            is_valid = false;
            error_message = format(
                "Invariant 1 violated: RedoLSN %d > recLSN %d for page %d",
                redo_lsn, entry.recovery_lsn, pid);
            return false;
        }
    }

    // Invariant 2: undoNextLSN <= lastLSN for all undo tasks
    for (const auto& task : undo_list) {
        auto it = transaction_table.find(task.txn_id);
        if (it != transaction_table.end()) {
            if (task.undo_next_lsn > it->second.last_lsn) {
                is_valid = false;
                error_message = format(
                    "Invariant 2 violated: T%d undoNextLSN %d > lastLSN %d",
                    task.txn_id, task.undo_next_lsn, it->second.last_lsn);
                return false;
            }
        }
    }

    // Invariant 3: Undo list and winners are disjoint
    for (const auto& task : undo_list) {
        if (winners.contains(task.txn_id)) {
            is_valid = false;
            error_message = format(
                "Invariant 3 violated: T%d in both undo list and winners",
                task.txn_id);
            return false;
        }
    }

    // Invariant 4: RedoLSN <= end of log
    if (redo_lsn > end_of_log_lsn && !dirty_page_table.empty()) {
        is_valid = false;
        error_message = format(
            "Invariant 4 violated: RedoLSN %d > end_of_log %d",
            redo_lsn, end_of_log_lsn);
        return false;
    }

    // Invariant 5: UndoNextLSN valid for all undo tasks
    for (const auto& task : undo_list) {
        auto it = transaction_table.find(task.txn_id);
        if (it != transaction_table.end() &&
            it->second.log_records > 0 &&
            task.undo_next_lsn == INVALID_LSN) {
            is_valid = false;
            error_message = format(
                "Invariant 5 violated: T%d has %d records but invalid undoNextLSN",
                task.txn_id, it->second.log_records);
            return false;
        }
    }

    log_info("Analysis result validation passed");
    return true;
}
```

If validation fails, the database MUST NOT proceed with recovery using this state. Invariant violations indicate serious bugs in logging, checkpointing, or analysis—proceeding could cause data corruption.
The DBA must investigate using log dumps and potentially restore from backup.
The reconstructed DPT at the end of Analysis Phase tells us everything about page-level recovery needs. Let's examine what we can learn from it.
DPT Characteristics:
```cpp
void AnalysisPhaseResult::analyzeDPT() {
    log_info("=== Dirty Page Table Analysis ===");
    log_info("Total dirty pages: %d", dirty_page_table.size());

    // Find LSN range
    LSN min_rec_lsn = MAX_LSN;
    LSN max_rec_lsn = 0;
    for (const auto& [pid, entry] : dirty_page_table) {
        min_rec_lsn = min(min_rec_lsn, entry.recovery_lsn);
        max_rec_lsn = max(max_rec_lsn, entry.recovery_lsn);
    }
    log_info("RecLSN range: [%d, %d]", min_rec_lsn, max_rec_lsn);
    log_info("Span: %d LSN units", max_rec_lsn - min_rec_lsn);

    // Categorize pages by age
    LSN quarter = (max_rec_lsn - min_rec_lsn) / 4;
    int old_pages = 0, medium_pages = 0, new_pages = 0, very_new_pages = 0;
    for (const auto& [pid, entry] : dirty_page_table) {
        LSN age = entry.recovery_lsn - min_rec_lsn;
        if (age < quarter)          old_pages++;
        else if (age < 2 * quarter) medium_pages++;
        else if (age < 3 * quarter) new_pages++;
        else                        very_new_pages++;
    }
    log_info("Page age distribution:");
    log_info("  Oldest quarter: %d pages", old_pages);
    log_info("  2nd quarter:    %d pages", medium_pages);
    log_info("  3rd quarter:    %d pages", new_pages);
    log_info("  Newest quarter: %d pages", very_new_pages);

    // Check for problematic patterns
    if (old_pages > dirty_page_table.size() / 2) {
        log_warn("Many old dirty pages - consider more aggressive flushing");
    }

    // Estimate redo work
    LSN redo_span = end_of_log_lsn - redo_lsn;
    log_info("Redo will scan %d LSN units", redo_span);
    log_info("Estimated redo time: %.1f seconds",
             estimateRedoTime(redo_span, dirty_page_table.size()));
}
```

| Pattern | Meaning | Action |
|---|---|---|
| Many old recLSNs | Long-dirty pages, poor flushing | Investigate flushing policy |
| All recent recLSNs | Aggressive checkpoint/flushing | Good configuration |
| Clustered pages | Hot tables/indexes | Consider partitioning |
| Scattered pages | Random workload | Normal for OLTP |
| Very large DPT | Large buffer pool, infrequent flush | May need tuning |
The Transaction Table at the end of Analysis contains only transactions that need attention—those that didn't fully complete before the crash. Let's analyze what this tells us.
Transaction Classifications:
```cpp
void AnalysisPhaseResult::analyzeTransactionTable() {
    log_info("=== Transaction Table Analysis ===");
    log_info("Total transactions requiring attention: %d", transaction_table.size());

    int active_count = 0;
    int aborting_count = 0;
    int committing_count = 0;

    for (const auto& [tid, entry] : transaction_table) {
        switch (entry.state) {
            case TransactionState::UNDO:
            case TransactionState::ACTIVE:
                active_count++;
                break;
            case TransactionState::ABORTING:
                aborting_count++;
                break;
            case TransactionState::COMMITTING:
                committing_count++;
                break;
            default:
                log_warn("Unexpected state %d for T%d", entry.state, tid);
        }
    }

    log_info("State distribution:");
    log_info("  Active (need undo):       %d", active_count);
    log_info("  Aborting (continue undo): %d", aborting_count);
    log_info("  Committing (need END):    %d", committing_count);

    // Analyze undo work
    log_info("\nUndo work estimation:");
    long total_undo_records = 0;
    for (const auto& task : undo_list) {
        auto it = transaction_table.find(task.txn_id);
        if (it != transaction_table.end()) {
            long records = it->second.log_records;
            total_undo_records += records;
            if (records > 1000) {
                log_info("  T%d: ~%d records (large transaction)",
                         task.txn_id, records);
            }
        }
    }
    log_info("Total estimated undo records: %d", total_undo_records);
    log_info("Estimated undo time: %.1f seconds",
             estimateUndoTime(total_undo_records));

    // Check for concerning patterns
    for (const auto& [tid, entry] : transaction_table) {
        // Very long-running transaction
        LSN age = end_of_log_lsn - entry.first_lsn;
        if (age > 1000000) {  // Arbitrary threshold
            log_warn("T%d is very old (span: %d LSN units) - may slow recovery",
                     tid, age);
        }
        // Transaction with many operations
        if (entry.log_records > 100000) {
            log_warn("T%d has %d log records - undo will be slow",
                     tid, entry.log_records);
        }
    }
}
```

Transactions are called 'winners' if they committed (a COMMIT record was seen) and 'losers' if they didn't. Only losers require undo.
The winners set isn't strictly needed for recovery but is useful for logging, auditing, and understanding what happened at crash time.
The undo list is the ordered set of transactions that must be rolled back. The ordering is critical for efficiency—ARIES processes undo in a specific manner to minimize log reads.
Undo Processing Order:
ARIES uses a single backward pass through the log for all undo operations. To enable this, the undo list is ordered by undoNextLSN in descending order. As undo proceeds, it encounters each transaction's next-to-undo record in sequence.
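The ordering can be demonstrated with a small self-contained sketch. This is an illustration only; the LSN values and the helper name `undoVisitOrder` are made up for this example, and the set-with-descending-comparator mirrors the technique the Undo Phase uses.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

using LSN = std::uint64_t;

// Visit undoNextLSNs in descending order, as the Undo Phase's single
// backward scan would. Returns the order in which LSNs are processed.
std::vector<LSN> undoVisitOrder(const std::vector<LSN>& undo_next_lsns) {
    // std::greater makes the set iterate largest-first
    std::set<LSN, std::greater<LSN>> to_undo(undo_next_lsns.begin(),
                                             undo_next_lsns.end());
    std::vector<LSN> order;
    while (!to_undo.empty()) {
        order.push_back(*to_undo.begin());   // largest remaining LSN
        to_undo.erase(to_undo.begin());
    }
    return order;
}
```

With undo tasks at LSNs 4500, 5000, and 4000, the visit order is 5000, 4500, 4000: each step moves strictly backward through the log, never forward.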
```cpp
vector<UndoTask> AnalysisPhaseResult::buildUndoList() {
    vector<UndoTask> undo_list;

    // Collect all transactions needing undo
    for (const auto& [tid, entry] : transaction_table) {
        bool needs_undo = false;

        switch (entry.state) {
            case TransactionState::ACTIVE:
            case TransactionState::UNDO:
                // Active transaction that never committed
                needs_undo = true;
                break;
            case TransactionState::ABORTING:
                // Transaction was in the middle of rollback
                needs_undo = true;
                break;
            case TransactionState::COMMITTING:
                // Transaction committed - just needs an END record,
                // not undo
                needs_undo = false;
                winners.insert(tid);  // Track as winner
                break;
            default:
                log_warn("Unexpected transaction state for T%d", tid);
        }

        if (needs_undo && entry.undo_next_lsn != INVALID_LSN) {
            UndoTask task;
            task.txn_id = tid;
            task.undo_next_lsn = entry.undo_next_lsn;
            task.estimated_records = entry.log_records;
            undo_list.push_back(task);
        }
    }

    // Sort by undoNextLSN DESCENDING
    // This enables an efficient single-pass backward log scan
    sort(undo_list.begin(), undo_list.end(),
         [](const UndoTask& a, const UndoTask& b) {
             return a.undo_next_lsn > b.undo_next_lsn;
         });

    log_info("Undo list built: %d transactions, max undoNextLSN=%d",
             undo_list.size(),
             undo_list.empty() ? 0 : undo_list[0].undo_next_lsn);
    return undo_list;
}
```

Processing in descending undoNextLSN order means we naturally encounter records in decreasing LSN order during a single backward scan. When we process T3's record at LSN 5000, we're positioned to then process T1's at 4500, then T5's at 4000, and so on. No random seeks are needed—just continuous backward reading.
With Analysis complete, the system transitions to the Redo Phase. The Analysis output provides everything Redo needs to efficiently reconstruct crash-time page states.
What Redo Receives:
```cpp
class RedoPhase {
public:
    void initialize(const AnalysisPhaseResult& analysis_result) {
        // Store references to analysis output
        this->dirty_page_table = analysis_result.dirty_page_table;
        this->redo_lsn = analysis_result.redo_lsn;
        this->end_of_log_lsn = analysis_result.end_of_log_lsn;

        // Pre-calculate redo scope
        this->redo_scope = end_of_log_lsn - redo_lsn;

        log_info("Redo Phase initialized:");
        log_info("  Start LSN: %d", redo_lsn);
        log_info("  End LSN:   %d", end_of_log_lsn);
        log_info("  Scope:     %d LSN units", redo_scope);
        log_info("  Dirty pages to consider: %d", dirty_page_table.size());
    }

    void run() {
        log_info("Starting Redo Phase...");

        LSN current_lsn = redo_lsn;
        int records_processed = 0;
        int records_applied = 0;

        while (current_lsn < end_of_log_lsn) {
            LogRecord record = log_manager.readLog(current_lsn);
            records_processed++;

            if (shouldRedo(record)) {
                applyRedo(record);
                records_applied++;
            }

            current_lsn = record.next_lsn;

            // Progress logging
            if (records_processed % 100000 == 0) {
                log_info("Redo progress: %d records processed, %d applied",
                         records_processed, records_applied);
            }
        }

        log_info("Redo Phase complete: %d records processed, %d applied",
                 records_processed, records_applied);
    }

private:
    bool shouldRedo(const LogRecord& record) {
        // Only consider redoable record types
        if (!record.isRedoable()) return false;

        PageID pid = record.page_id;

        // Check 1: Is this page in the DPT?
        if (!dirty_page_table.contains(pid)) {
            // Page not dirty - already on disk
            return false;
        }

        // Check 2: Is record LSN >= page's recLSN?
        const DirtyPageEntry& entry = dirty_page_table.get(pid);
        if (record.lsn < entry.recovery_lsn) {
            // Record predates this page becoming dirty - already on disk
            return false;
        }

        // Check 3: Read the page and check its pageLSN
        Page* page = buffer_manager.getPage(pid);
        if (page->getPageLSN() >= record.lsn) {
            // Page already contains this change
            return false;
        }

        // All checks passed - this record needs to be redone
        return true;
    }
};
```

Notice how the DPT enables two levels of filtering before even reading the page: records for clean pages are skipped entirely, and records before a page's recLSN are skipped. The final pageLSN check handles cases where the page was actually flushed but remained in the conservative DPT.
After Redo completes, the Undo Phase uses the transaction table and undo list to roll back incomplete transactions. The Analysis output ensures Undo knows exactly what work to perform.
What Undo Receives:
```cpp
class UndoPhase {
public:
    void initialize(const AnalysisPhaseResult& analysis_result) {
        this->transaction_table = analysis_result.transaction_table;
        this->undo_list = analysis_result.undo_list;

        // Calculate total undo work
        int total_records = 0;
        for (const auto& task : undo_list) {
            total_records += task.estimated_records;
        }

        log_info("Undo Phase initialized:");
        log_info("  Transactions to undo: %d", undo_list.size());
        log_info("  Estimated records:    %d", total_records);
    }

    void run() {
        log_info("Starting Undo Phase...");

        // Build "ToUndo" set of LSNs from all undo tasks
        // This enables an efficient single backward pass
        set<LSN, greater<LSN>> to_undo;  // Descending order
        for (const auto& task : undo_list) {
            if (task.undo_next_lsn != INVALID_LSN) {
                to_undo.insert(task.undo_next_lsn);
            }
        }

        int records_undone = 0;

        // Process in descending LSN order (single backward pass)
        while (!to_undo.empty()) {
            LSN lsn = *to_undo.begin();
            to_undo.erase(to_undo.begin());

            LogRecord record = log_manager.readLog(lsn);
            TransactionID tid = record.txn_id;

            // Determine the next LSN to undo for this transaction:
            // a CLR points past work already undone; any other record
            // chains back through prevLSN
            LSN next_lsn;
            if (record.type == LogType::CLR) {
                next_lsn = record.undo_next_lsn;
            } else if (record.isUndoable()) {
                // Undoable record: perform undo and log a CLR
                performUndo(record);

                LogRecord clr = createCLR(record);
                LSN clr_lsn = log_manager.appendLog(clr);

                // Update transaction's lastLSN
                transaction_table.get(tid).last_lsn = clr_lsn;
                records_undone++;
                next_lsn = record.prev_lsn;
            } else {
                // Non-undoable record (e.g., BEGIN): just follow the chain
                next_lsn = record.prev_lsn;
            }

            if (next_lsn != INVALID_LSN) {
                to_undo.insert(next_lsn);
            } else {
                // Transaction fully undone - write END
                writeEndRecord(tid);
            }
        }

        log_info("Undo Phase complete: %d records undone", records_undone);
    }
};
```

| Record Type | Action | Next LSN to Process |
|---|---|---|
| UPDATE | Apply inverse operation, log CLR | record.prevLSN |
| INSERT | Delete the inserted record, log CLR | record.prevLSN |
| DELETE | Re-insert the deleted record, log CLR | record.prevLSN |
| CLR | Nothing (already re-done if needed) | clr.undoNextLSN |
| BEGIN | Write END record, remove from undo list | Done |
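The CLR row in the table deserves emphasis: a CLR's undoNextLSN points at the record *before* the one it compensates, which is what makes undo restartable after a second crash. The sketch below illustrates that construction; the simplified `LogRecord` fields and the helper name `makeCLR` are assumptions for this example, not the module's exact definitions.

```cpp
#include <cassert>
#include <cstdint>

using LSN = std::uint64_t;
const LSN INVALID_LSN = 0;

enum class LogType { UPDATE, CLR };

struct LogRecord {
    LogType type;
    std::uint64_t txn_id;
    LSN lsn;
    LSN prev_lsn;       // previous record of the same transaction
    LSN undo_next_lsn;  // meaningful for CLRs only
};

// When undoing `rec`, the CLR's undoNextLSN is rec.prev_lsn: if we crash
// after writing the CLR, restart undo resumes at prev_lsn and never
// re-undoes rec. This bounds undo work to one CLR per original record,
// even across repeated crashes. The CLR's own LSN is assigned later,
// when the log manager appends it.
LogRecord makeCLR(const LogRecord& rec) {
    LogRecord clr{};
    clr.type = LogType::CLR;
    clr.txn_id = rec.txn_id;
    clr.undo_next_lsn = rec.prev_lsn;
    return clr;
}
```

Note the boundary case: undoing a transaction's first update (whose prevLSN is invalid) produces a CLR with an invalid undoNextLSN, which is exactly the signal to write the END record.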
Let's visualize how the Analysis Phase output flows through the complete ARIES recovery process.
```cpp
void RecoveryManager::performRecovery() {
    log_info("========================================");
    log_info("DATABASE RECOVERY STARTED");
    log_info("========================================");

    auto start_time = chrono::high_resolution_clock::now();

    // Phase 1: Analysis
    log_info("\n=== ANALYSIS PHASE ===");
    AnalysisPhase analysis;
    AnalysisPhaseResult result = analysis.run();

    if (!result.validate()) {
        log_error("Analysis validation failed: %s", result.error_message.c_str());
        throw RecoveryError("Invalid analysis result - cannot proceed");
    }

    auto analysis_end = chrono::high_resolution_clock::now();
    log_info("Analysis completed in %d ms",
             chrono::duration_cast<chrono::milliseconds>(analysis_end - start_time).count());

    // Phase 2: Redo
    log_info("\n=== REDO PHASE ===");
    if (result.redo_lsn == END_OF_LOG || result.dirty_page_table.empty()) {
        log_info("No redo required - database pages are current");
    } else {
        RedoPhase redo;
        redo.initialize(result);
        redo.run();
    }
    auto redo_end = chrono::high_resolution_clock::now();

    // Phase 3: Undo
    log_info("\n=== UNDO PHASE ===");
    if (result.undo_list.empty()) {
        log_info("No undo required - all transactions committed");
    } else {
        UndoPhase undo;
        undo.initialize(result);
        undo.run();
    }
    auto undo_end = chrono::high_resolution_clock::now();

    // Summary
    log_info("\n========================================");
    log_info("RECOVERY COMPLETE");
    log_info("========================================");
    log_info("Analysis time: %d ms",
             chrono::duration_cast<chrono::milliseconds>(analysis_end - start_time).count());
    log_info("Redo time:     %d ms",
             chrono::duration_cast<chrono::milliseconds>(redo_end - analysis_end).count());
    log_info("Undo time:     %d ms",
             chrono::duration_cast<chrono::milliseconds>(undo_end - redo_end).count());
    log_info("Total time:    %d ms",
             chrono::duration_cast<chrono::milliseconds>(undo_end - start_time).count());
    log_info("Transactions recovered: %d (winners), %d (losers)",
             result.winners.size(), result.undo_list.size());
}
```

The Analysis Phase reconstructs a complete picture of the database system at crash time. This information forms the foundation for efficient, correct recovery. Let's consolidate everything we've learned.
Module Complete:
You have now mastered the ARIES Analysis Phase—the critical first step of crash recovery that transforms raw log data into precise recovery instructions. With this knowledge, you understand how the log is scanned forward from the last checkpoint, how the Dirty Page Table and Transaction Table are reconstructed, how the RedoLSN bounds the Redo Phase's work, how the ordered undo list prepares the Undo Phase, and how invariant checks validate the result before recovery proceeds.
The next modules cover the Redo Phase (repeating history) and Undo Phase (rolling back losers), completing the ARIES recovery algorithm.
Congratulations! You have completed the Analysis Phase module of ARIES recovery. You now understand how database systems reconstruct their exact state after a crash, enabling the subsequent Redo and Undo phases to restore consistency efficiently. This knowledge is fundamental for database engineers, DBAs, and anyone working with production database systems.