Over the previous pages, we've explored the individual components of the ARIES redo phase: the repeating history philosophy, the mechanics of processing all operations, LSN comparison for correctness, and idempotence for safety. Now it's time to synthesize these concepts into the complete redo process.
This page presents the redo phase as a practitioner would implement and optimize it. We'll trace through a complete recovery scenario, analyze performance characteristics, examine real-world implementation details, and understand how the redo phase interacts with the analysis phase that precedes it and the undo phase that follows.
By the end, you'll have a complete mental model of how ARIES redo restores a crashed database to its pre-crash state, ready for the undo phase to remove uncommitted changes.
By the end of this page, you will:

1. Understand the complete redo phase algorithm step-by-step
2. Trace through a realistic crash recovery scenario
3. Analyze redo phase performance and optimization strategies
4. See how redo connects to the analysis and undo phases
5. Examine production implementation considerations
Let's present the complete redo phase algorithm with all its components integrated:
```python
import time


class ARIESRedoPhase:
    """
    Complete ARIES Redo Phase Implementation

    This class encapsulates the entire redo phase, from initialization
    using analysis phase results to completion and handoff to undo.
    """

    def __init__(self, log_manager, buffer_manager, analysis_result):
        """
        Initialize redo phase with analysis phase outputs.

        Args:
            log_manager: Access to the write-ahead log
            buffer_manager: Buffer pool for page access
            analysis_result: Output from analysis phase containing:
                - dirty_page_table: Pages that may need redo
                - transaction_table: Active transactions at crash
                - redo_lsn: Starting point for redo scan
        """
        self.log = log_manager
        self.buffer = buffer_manager
        self.dpt = analysis_result.dirty_page_table
        self.txn_table = analysis_result.transaction_table
        self.redo_lsn = analysis_result.redo_lsn

        # Statistics tracking
        self.stats = {
            'records_scanned': 0,
            'records_skipped_type': 0,     # Non-redoable record types
            'records_skipped_dpt': 0,      # Not in dirty page table
            'records_skipped_reclsn': 0,   # LSN < recLSN
            'records_skipped_pagelsn': 0,  # Page LSN >= record LSN
            'records_redone': 0,           # Actually applied
            'pages_read': 0,
            'start_time': None,
            'end_time': None,
        }

    def execute(self):
        """
        Execute the complete redo phase.
        Returns updated transaction table for undo phase.
        """
        self.stats['start_time'] = time.time()

        print(f"{'='*60}")
        print("REDO PHASE: Starting")
        print(f"{'='*60}")
        print(f"RedoLSN: {self.redo_lsn}")
        print(f"Dirty pages in DPT: {len(self.dpt)}")
        print(f"Active transactions: {len(self.txn_table)}")
        print()

        # Main redo loop
        self._process_log()

        self.stats['end_time'] = time.time()

        # Report statistics
        self._report_statistics()

        # Return updated state for undo phase
        return {
            'transaction_table': self.txn_table,
            'pages_recovered': len(self.dpt)
        }

    def _process_log(self):
        """Scan log from RedoLSN forward, processing each record."""
        # Open forward scan starting at RedoLSN
        log_scan = self.log.open_scan(
            start_lsn=self.redo_lsn,
            direction='forward'
        )

        for record in log_scan:
            self.stats['records_scanned'] += 1
            self._process_record(record)

        log_scan.close()

    def _process_record(self, record):
        """
        Process a single log record during redo.
        Implements the complete decision tree.
        """
        # STEP 1: Filter by record type
        if not self._is_redoable_type(record):
            self.stats['records_skipped_type'] += 1
            return

        page_id = record.page_id
        record_lsn = record.lsn

        # STEP 2: Check Dirty Page Table presence
        if page_id not in self.dpt:
            self.stats['records_skipped_dpt'] += 1
            return

        # STEP 3: Check recLSN
        rec_lsn = self.dpt[page_id].rec_lsn
        if record_lsn < rec_lsn:
            self.stats['records_skipped_reclsn'] += 1
            return

        # STEP 4: Read page and check page LSN
        page = self.buffer.fetch_page(page_id)
        self.stats['pages_read'] += 1

        page_lsn = page.get_page_lsn()
        if page_lsn >= record_lsn:
            self.stats['records_skipped_pagelsn'] += 1
            return

        # STEP 5: Apply the redo
        self._apply_redo(page, record)
        self.stats['records_redone'] += 1

    def _is_redoable_type(self, record):
        """Determine if record type requires redo action."""
        REDOABLE_TYPES = {
            'UPDATE',            # Normal data modification
            'CLR',               # Compensation log record
            'NESTED_TOP_ACTION'  # Nested top action (e.g., space mgmt)
        }
        return record.type in REDOABLE_TYPES

    def _apply_redo(self, page, record):
        """Apply a log record's redo information to a page."""
        redo_info = record.redo_info

        # Apply based on operation type
        if redo_info['operation'] == 'INSERT_ROW':
            self._redo_insert(page, redo_info)
        elif redo_info['operation'] == 'DELETE_ROW':
            self._redo_delete(page, redo_info)
        elif redo_info['operation'] == 'UPDATE_FIELD':
            self._redo_update(page, redo_info)
        elif redo_info['operation'] == 'PAGE_FORMAT':
            self._redo_format(page, redo_info)
        elif redo_info['operation'] == 'BTREE_SPLIT':
            self._redo_btree_split(page, redo_info)
        # ... other operation types

        # CRITICAL: Update page LSN and mark dirty
        page.set_page_lsn(record.lsn)
        self.buffer.mark_dirty(page)

    def _redo_insert(self, page, redo_info):
        """Redo a row insertion."""
        page.insert_at_slot(
            slot=redo_info['slot'],
            data=redo_info['data']
        )

    def _redo_delete(self, page, redo_info):
        """Redo a row deletion."""
        page.delete_at_slot(slot=redo_info['slot'])

    def _redo_update(self, page, redo_info):
        """Redo a field update."""
        page.update_field(
            slot=redo_info['slot'],
            offset=redo_info['offset'],
            value=redo_info['value']
        )

    def _redo_format(self, page, redo_info):
        """Redo page formatting (e.g., new page initialization)."""
        page.format(page_type=redo_info['page_type'])

    def _redo_btree_split(self, page, redo_info):
        """Redo B-tree node split."""
        page.apply_split_changes(
            remaining_keys=redo_info['remaining_keys'],
            sibling_pointer=redo_info['sibling_page_id']
        )

    def _report_statistics(self):
        """Print comprehensive redo phase statistics."""
        s = self.stats
        duration = s['end_time'] - s['start_time']

        print(f"{'='*60}")
        print("REDO PHASE: Complete")
        print(f"{'='*60}")
        print(f"Duration: {duration:.3f} seconds")
        print("Log Records Processed:")
        print(f"  Total scanned:        {s['records_scanned']:,}")
        print(f"  Skipped (wrong type): {s['records_skipped_type']:,}")
        print(f"  Skipped (not in DPT): {s['records_skipped_dpt']:,}")
        print(f"  Skipped (recLSN):     {s['records_skipped_reclsn']:,}")
        print(f"  Skipped (page LSN):   {s['records_skipped_pagelsn']:,}")
        print(f"  Actually redone:      {s['records_redone']:,}")
        print("I/O Statistics:")
        print(f"  Pages read: {s['pages_read']:,}")
        if s['records_scanned'] > 0:
            redo_rate = s['records_redone'] / s['records_scanned'] * 100
            print(f"  Redo rate: {redo_rate:.1f}%")
        if duration > 0:
            throughput = s['records_scanned'] / duration
            print(f"  Throughput: {throughput:,.0f} records/sec")
```

Algorithm Summary:
Let's trace through a complete, realistic crash recovery scenario to see the redo phase in action:
```python
class CrashRecoveryScenario:
    """
    Complete trace of a realistic crash recovery scenario.

    This scenario involves:
    - Multiple concurrent transactions
    - Mixed committed and uncommitted transactions
    - B-tree operations
    - Checkpoint taken mid-way
    - Partial page flushes
    """

    def setup_scenario(self):
        """Set up the scenario timeline."""
        self.timeline = """
        TIME  ACTION
        ─────────────────────────────────────────────────────────
        T1    T1: BEGIN
        T2    T1: UPDATE accounts SET balance=100 WHERE id=1
              (page 100, LSN 1000)
        T3    T2: BEGIN
        T4    T2: INSERT INTO orders (id, total) VALUES (99, 500)
              (page 200, LSN 1001)
        T5    T1: UPDATE accounts SET balance=150 WHERE id=2
              (page 101, LSN 1002)
        T6    CHECKPOINT
              - DPT: {page100: recLSN=1000, page200: recLSN=1001,
                      page101: recLSN=1002}
              - Active: {T1: lastLSN=1002, T2: lastLSN=1001}
        T7    Page 100 flushed to disk (pageLSN=1000)
        T8    T2: UPDATE orders SET status='processing' WHERE id=99
              (page 200, LSN 1003)
        T9    T2: COMMIT (LSN 1004)
        T10   T3: BEGIN
        T11   T3: INSERT INTO products (id, name) VALUES (50, 'Widget')
              (page 300, LSN 1005)
        T12   T1: UPDATE accounts SET balance=200 WHERE id=1
              (page 100, LSN 1006)
        T13   💥 CRASH! Power failure, no orderly shutdown

        STATE AT CRASH:
        ─────────────────────────────────────────────────────────
        Committed:   T2
        Uncommitted: T1, T3

        Disk State:
        - page 100: pageLSN=1000, balance=100 (flushed at T7)
        - page 101: pageLSN=0 (never flushed)
        - page 200: pageLSN=0 (never flushed)
        - page 300: pageLSN=0 (never flushed)
        """
        return self.timeline

    def analysis_phase_result(self):
        """What the analysis phase produces (summarized)."""
        return {
            'redo_lsn': 1000,  # min(recLSN) from DPT at checkpoint
            'dirty_page_table': {
                100: {'rec_lsn': 1000},  # From checkpoint, updated by scan
                101: {'rec_lsn': 1002},
                200: {'rec_lsn': 1001},
                300: {'rec_lsn': 1005},  # Added during log scan
            },
            'transaction_table': {
                'T1': {'status': 'active', 'last_lsn': 1006, 'undo_next': 1006},
                'T2': {'status': 'committed'},  # Found commit record
                'T3': {'status': 'active', 'last_lsn': 1005, 'undo_next': 1005},
            },
        }

    def trace_redo_phase(self):
        """Detailed trace of redo phase execution."""
        trace = []

        # Start from RedoLSN = 1000
        trace.append("="*60)
        trace.append("REDO PHASE TRACE")
        trace.append("="*60)
        trace.append("Starting from RedoLSN = 1000")
        trace.append("")

        # LSN 1000: T1 UPDATE page 100
        trace.append("LSN 1000: UPDATE page 100")
        trace.append("  - Page in DPT? Yes (recLSN=1000)")
        trace.append("  - 1000 >= recLSN=1000? Yes, continue")
        trace.append("  - Read page 100 from disk: pageLSN=1000")
        trace.append("  - pageLSN=1000 >= 1000? Yes, SKIP")
        trace.append("  → Skipped: already on disk")
        trace.append("")

        # LSN 1001: T2 INSERT page 200
        trace.append("LSN 1001: INSERT page 200")
        trace.append("  - Page in DPT? Yes (recLSN=1001)")
        trace.append("  - 1001 >= recLSN=1001? Yes, continue")
        trace.append("  - Read page 200 from disk: pageLSN=0")
        trace.append("  - pageLSN=0 >= 1001? No, must REDO")
        trace.append("  → REDO: Insert row into page 200, set pageLSN=1001")
        trace.append("")

        # LSN 1002: T1 UPDATE page 101
        trace.append("LSN 1002: UPDATE page 101")
        trace.append("  - Page in DPT? Yes (recLSN=1002)")
        trace.append("  - 1002 >= recLSN=1002? Yes, continue")
        trace.append("  - Read page 101 from disk: pageLSN=0")
        trace.append("  - pageLSN=0 >= 1002? No, must REDO")
        trace.append("  → REDO: Update page 101, set pageLSN=1002")
        trace.append("")

        # LSN 1003: T2 UPDATE page 200
        trace.append("LSN 1003: UPDATE page 200")
        trace.append("  - Page in DPT? Yes")
        trace.append("  - 1003 >= recLSN=1001? Yes, continue")
        trace.append("  - Page 200 now in buffer: pageLSN=1001")
        trace.append("  - pageLSN=1001 >= 1003? No, must REDO")
        trace.append("  → REDO: Update page 200, set pageLSN=1003")
        trace.append("")

        # LSN 1004: T2 COMMIT
        trace.append("LSN 1004: COMMIT record (T2)")
        trace.append("  - Record type is COMMIT, not redoable")
        trace.append("  → Skipped: lifecycle record")
        trace.append("")

        # LSN 1005: T3 INSERT page 300
        trace.append("LSN 1005: INSERT page 300")
        trace.append("  - Page in DPT? Yes (recLSN=1005)")
        trace.append("  - 1005 >= recLSN=1005? Yes, continue")
        trace.append("  - Read page 300 from disk: pageLSN=0")
        trace.append("  - pageLSN=0 >= 1005? No, must REDO")
        trace.append("  → REDO: Insert row into page 300, set pageLSN=1005")
        trace.append("")

        # LSN 1006: T1 UPDATE page 100 again
        trace.append("LSN 1006: UPDATE page 100")
        trace.append("  - Page in DPT? Yes")
        trace.append("  - 1006 >= recLSN=1000? Yes, continue")
        trace.append("  - Page 100 in buffer: pageLSN=1000")
        trace.append("  - pageLSN=1000 >= 1006? No, must REDO")
        trace.append("  → REDO: Update page 100, set pageLSN=1006")
        trace.append("")

        # End of log
        trace.append("-"*60)
        trace.append("REDO PHASE COMPLETE")
        trace.append("-"*60)
        trace.append("Records scanned: 7")
        trace.append("Records redone: 5")
        trace.append("Records skipped: 2")
        trace.append("")
        trace.append("All pages now reflect pre-crash state:")
        trace.append("  - Page 100: balance=200 (T1's second update)")
        trace.append("  - Page 101: balance=150 (T1's update)")
        trace.append("  - Page 200: order 99 with status='processing' (T2)")
        trace.append("  - Page 300: product 50 (T3)")
        trace.append("")
        trace.append("Ready for UNDO phase to roll back T1 and T3")

        return "\n".join(trace)


# Run the scenario
scenario = CrashRecoveryScenario()
print(scenario.setup_scenario())
print(scenario.trace_redo_phase())
```

Notice how: (1) T2's changes are redone even though it committed—redo doesn't care about commit status. (2) T3's insert is redone even though uncommitted—it will be undone later. (3) Page 100's first update is skipped because it was already flushed, but its second update is redone. (4) The redo phase processes everything; the undo phase will handle T1 and T3.
The redo phase doesn't exist in isolation—it's the middle act in a three-phase recovery process. Understanding how it connects to the analysis and undo phases is crucial.
| Phase | Purpose | Input | Output |
|---|---|---|---|
| Analysis | Determine recovery scope | Checkpoint record, log | DPT, TxnTable, RedoLSN |
| Redo | Restore crash-time state | DPT, RedoLSN, log, disk pages | Recovered pages, updated TxnTable |
| Undo | Remove uncommitted changes | TxnTable (active txns), log | Consistent database |
```python
class ARIESRecoveryManager:
    """
    Complete ARIES recovery orchestration.
    Coordinates analysis, redo, and undo phases.
    """

    def __init__(self, log_manager, buffer_manager):
        self.log = log_manager
        self.buffer = buffer_manager

    def recover(self):
        """Execute complete crash recovery."""
        print("="*60)
        print("ARIES CRASH RECOVERY INITIATED")
        print("="*60)

        # PHASE 1: ANALYSIS
        print("[PHASE 1: ANALYSIS]")
        print("-"*40)

        analysis = AnalysisPhase(self.log)
        analysis_result = analysis.execute()

        print("Analysis complete:")
        print(f"  - RedoLSN: {analysis_result.redo_lsn}")
        print(f"  - Dirty pages: {len(analysis_result.dirty_page_table)}")
        print(f"  - Active transactions: "
              f"{len(analysis_result.transaction_table)}")

        # Check if redo is even needed
        if analysis_result.redo_lsn == float('inf'):
            print("No redo needed - database is consistent!")
            return

        # PHASE 2: REDO
        print("[PHASE 2: REDO]")
        print("-"*40)

        redo = RedoPhase(self.log, self.buffer, analysis_result)
        redo_result = redo.execute()

        print("Redo complete:")
        print(f"  - Records redone: {redo.stats['records_redone']}")
        print(f"  - Pages recovered: {redo_result['pages_recovered']}")

        # PHASE 3: UNDO
        print("[PHASE 3: UNDO]")
        print("-"*40)

        # Identify transactions needing undo
        loser_txns = [
            txn_id
            for txn_id, info in redo_result['transaction_table'].items()
            if info.get('status') == 'active'  # Not committed
        ]

        if not loser_txns:
            print("No transactions to undo - recovery complete!")
            return

        print(f"Transactions to undo: {loser_txns}")

        undo = UndoPhase(
            self.log,
            self.buffer,
            loser_txns,
            redo_result['transaction_table']
        )
        undo.execute()

        print("Undo complete - database is now consistent!")

        # POST-RECOVERY
        print("\n" + "="*60)
        print("RECOVERY COMPLETE")
        print("="*60)

        # Take a checkpoint to record clean state
        self._post_recovery_checkpoint()

    def _post_recovery_checkpoint(self):
        """
        Take a checkpoint after recovery completes.
        This ensures quick recovery if another crash happens soon.
        """
        print("Taking post-recovery checkpoint...")
        # Write END records for all undone transactions
        # Flush dirty pages
        # Write checkpoint record
        print("Checkpoint complete - system ready for normal operation")


class DataFlowBetweenPhases:
    """
    Illustrates the data flow between recovery phases.
    """

    def show_data_flow(self):
        """Visualize information passed between phases."""
        flow = """
        ┌──────────────┐
        │   ANALYSIS   │
        └──────┬───────┘
               │
               │ Produces:
               │ • Dirty Page Table (DPT)
               │ • Transaction Table
               │ • RedoLSN (starting point)
               ▼
        ┌──────────────┐
        │     REDO     │
        └──────┬───────┘
               │
               │ Uses: DPT, RedoLSN
               │ Produces:
               │ • Restored pages (crash-time state)
               │ • Finalized Transaction Table
               │   (which txns are active/committed)
               ▼
        ┌──────────────┐
        │     UNDO     │
        └──────────────┘
               │
               │ Uses: Transaction Table (losers)
               │ Produces:
               │ • Consistent database state
               │ • CLRs logged for undone operations
               │ • END records for completed rollbacks
        """
        return flow
```

Critical Interactions:
Analysis → Redo: The DPT tells redo which pages might need work. The RedoLSN tells redo where to start scanning.
Redo → Undo: After redo, pages are in crash-time state with both committed and uncommitted changes. The transaction table tells undo which transactions are "losers" (didn't commit).
Redo updates page LSNs: This is important—if the system crashes during recovery, the next recovery will see the new page LSNs and correctly skip already-redone operations.
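This crash-during-recovery safety can be shown in a few lines. The sketch below uses a hypothetical `Page` class and a single-field "apply" (not the chapter's classes) to demonstrate why updating the page LSN makes redo safe to re-run:

```python
# Minimal sketch (hypothetical structures): re-running redo after a crash
# during recovery is a no-op, because the pageLSN check detects that the
# record was already applied.

class Page:
    def __init__(self):
        self.page_lsn = 0
        self.value = None

def redo_if_needed(page, record_lsn, new_value):
    """Apply the record only if the page has not seen it yet."""
    if page.page_lsn >= record_lsn:
        return False              # already applied (or newer) -- skip
    page.value = new_value        # apply the redo
    page.page_lsn = record_lsn    # stamp the page with the record's LSN
    return True

page = Page()
assert redo_if_needed(page, 1001, "balance=150") is True   # first recovery run
# Crash mid-recovery; a second recovery replays the same record:
assert redo_if_needed(page, 1001, "balance=150") is False  # skipped, idempotent
assert page.value == "balance=150" and page.page_lsn == 1001
```

The second call changes nothing precisely because the first call stamped the page LSN, which is why nested recoveries converge instead of corrupting state.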
Redo phase performance directly impacts database availability during recovery. Let's analyze the key performance factors:
| Factor | Impact | Optimization Strategy |
|---|---|---|
| Log scan distance | Time = LogBytes / DiskSequentialRead | More frequent checkpoints → smaller log to scan |
| Page read count | Dominant I/O cost | Three-level filtering, prefetching |
| Unique pages touched | Limits parallelism benefit | Depends on workload locality |
| Redo operations/page | CPU time per page | Physiological logging efficiency |
| Buffer pool efficiency | Repeated page access cost | Larger buffer pool during recovery |
```python
class RedoPerformanceModel:
    """
    Analytical model for redo phase performance.
    """

    def estimate_redo_time(self,
                           log_bytes: int,
                           log_read_rate_mbps: int,
                           dirty_pages: int,
                           page_read_time_ms: float,
                           avg_redos_per_page: float,
                           redo_cpu_time_ms: float):
        """
        Estimate total redo phase duration.

        Args:
            log_bytes: Amount of log to scan (from checkpoint)
            log_read_rate_mbps: Sequential log read speed (MB/s)
            dirty_pages: Number of pages in DPT
            page_read_time_ms: Time to read one page from disk
            avg_redos_per_page: Average redo operations per dirty page
            redo_cpu_time_ms: CPU time per redo operation
        """
        # Log scan time (sequential I/O, very fast)
        log_scan_seconds = log_bytes / (log_read_rate_mbps * 1024 * 1024)

        # Page read time (random I/O, the bottleneck)
        page_read_seconds = (dirty_pages * page_read_time_ms) / 1000

        # CPU processing time
        total_redos = dirty_pages * avg_redos_per_page
        cpu_seconds = (total_redos * redo_cpu_time_ms) / 1000

        # Total (assuming some overlap between log scan and page reads)
        total_seconds = log_scan_seconds + page_read_seconds + cpu_seconds

        return {
            'log_scan': log_scan_seconds,
            'page_reads': page_read_seconds,
            'cpu_processing': cpu_seconds,
            'total': total_seconds,
            'bottleneck': self._identify_bottleneck(
                log_scan_seconds, page_read_seconds, cpu_seconds
            )
        }

    def _identify_bottleneck(self, log_s, page_s, cpu_s):
        if page_s > log_s and page_s > cpu_s:
            return "Page I/O"
        elif log_s > cpu_s:
            return "Log scan"
        else:
            return "CPU"


def analyze_recovery_scenarios():
    """Compare recovery times under different scenarios."""
    model = RedoPerformanceModel()

    scenarios = [
        {
            'name': 'Small DB, frequent checkpoint',
            'log_bytes': 10 * 1024 * 1024,    # 10 MB
            'dirty_pages': 500,
            'storage': 'SSD'
        },
        {
            'name': 'Medium DB, normal checkpoint',
            'log_bytes': 100 * 1024 * 1024,   # 100 MB
            'dirty_pages': 5000,
            'storage': 'SSD'
        },
        {
            'name': 'Large DB, infrequent checkpoint',
            'log_bytes': 1024 * 1024 * 1024,  # 1 GB
            'dirty_pages': 50000,
            'storage': 'SSD'
        },
        {
            'name': 'Large DB on HDD',
            'log_bytes': 1024 * 1024 * 1024,  # 1 GB
            'dirty_pages': 50000,
            'storage': 'HDD'
        },
    ]

    # Storage characteristics
    storage_params = {
        'SSD': {'log_rate': 500, 'page_time': 0.1},
        'HDD': {'log_rate': 150, 'page_time': 8.0},
    }

    print("="*70)
    print("REDO PHASE PERFORMANCE ANALYSIS")
    print("="*70)

    for s in scenarios:
        params = storage_params[s['storage']]
        result = model.estimate_redo_time(
            log_bytes=s['log_bytes'],
            log_read_rate_mbps=params['log_rate'],
            dirty_pages=s['dirty_pages'],
            page_read_time_ms=params['page_time'],
            avg_redos_per_page=3.0,
            redo_cpu_time_ms=0.01
        )

        print(f"{s['name']}:")
        print(f"  Log to scan: {s['log_bytes'] / (1024*1024):.0f} MB")
        print(f"  Dirty pages: {s['dirty_pages']:,}")
        print(f"  Storage: {s['storage']}")
        print("  ---")
        print(f"  Log scan time:  {result['log_scan']:.2f}s")
        print(f"  Page read time: {result['page_reads']:.2f}s")
        print(f"  CPU time:       {result['cpu_processing']:.2f}s")
        print(f"  Total:          {result['total']:.2f}s")
        print(f"  Bottleneck:     {result['bottleneck']}")


analyze_recovery_scenarios()
```

Production databases use several techniques to speed up redo: (1) parallel redo across different pages, (2) prefetching pages based on log lookahead, (3) a larger buffer pool during recovery, (4) background page flushing during normal operation to reduce DPT size, and (5) adaptive checkpoint frequency based on workload.
Real database systems implement several enhancements beyond the basic ARIES algorithm. Let's examine key production considerations:
```python
import time
from queue import Queue


class ProductionRedoFeatures:
    """
    Production-grade redo phase features.
    """

    class ProgressReporting:
        """Track and report recovery progress."""

        def __init__(self, total_log_bytes, redo_lsn):
            self.total_bytes = total_log_bytes
            self.start_lsn = redo_lsn
            self.current_lsn = redo_lsn
            self.start_time = time.time()

        def update(self, current_lsn):
            self.current_lsn = current_lsn

        def get_progress(self):
            bytes_processed = self.current_lsn - self.start_lsn
            if self.total_bytes == 0:
                return 100.0
            return (bytes_processed / self.total_bytes) * 100

        def estimate_remaining_seconds(self):
            elapsed = time.time() - self.start_time
            progress = self.get_progress()
            if progress == 0:
                return None
            return (elapsed / progress) * (100 - progress)

        def report(self):
            progress = self.get_progress()
            remaining = self.estimate_remaining_seconds()
            if remaining is not None:
                print(f"Redo progress: {progress:.1f}% | "
                      f"Est. remaining: {remaining:.0f}s")
            else:
                print(f"Redo progress: {progress:.1f}%")

    class ParallelRedoCoordinator:
        """Coordinate parallel redo workers."""

        def __init__(self, num_workers=4):
            self.num_workers = num_workers
            self.page_worker_map = {}  # page_id -> worker_id
            self.worker_queues = [Queue() for _ in range(num_workers)]

        def dispatch_record(self, record):
            """
            Assign a log record to a worker based on page_id.
            Same page always goes to same worker for ordering.
            """
            page_id = record.page_id
            worker_id = hash(page_id) % self.num_workers
            self.worker_queues[worker_id].put(record)

        def worker_thread(self, worker_id):
            """Worker thread processing its queue."""
            queue = self.worker_queues[worker_id]
            while True:
                record = queue.get()
                if record is None:  # Shutdown signal
                    break
                self._process_redo(record)

    class CrashDuringRecoveryHandler:
        """Handle a crash that occurs during recovery itself."""

        def detect_incomplete_recovery(self):
            """Check if previous recovery was incomplete."""
            # Look for end-of-recovery marker in log
            # If absent, previous recovery crashed
            last_record = self.log.read_last_record()
            return last_record.type != 'RECOVERY_COMPLETE'

        def handle_nested_recovery(self):
            """
            Recovery after a crash during recovery.

            Key insight: due to idempotence, we just re-run recovery
            from the beginning. Already-applied operations will be
            skipped via LSN comparison.
            """
            print("Previous recovery was incomplete. Restarting...")
            # Simply run normal recovery
            # Idempotence ensures correctness
            self.run_complete_recovery()

    class HotStandbyIntegration:
        """Allow read queries during the undo phase."""

        def can_serve_reads(self):
            """
            After redo, pages are consistent (have crash-time state).
            We can serve reads that don't touch uncommitted data.
            """
            return self.redo_complete and self.snapshot_isolation_ready

        def handle_page_request(self, page_id, query_snapshot):
            """
            Serve a page to a read query.
            May need to hide uncommitted rows from loser transactions.
            """
            page = self.buffer.get_page(page_id)

            # Filter out rows from uncommitted transactions
            visible_rows = [
                row for row in page.rows
                if self._is_visible(row, query_snapshot)
            ]
            return visible_rows

        def _is_visible(self, row, snapshot):
            """
            Check if row is visible to this snapshot.
            Rows from transactions still being undone are hidden.
            """
            if row.txn_id in self.active_undo_transactions:
                return False  # Transaction being undone
            return True       # Otherwise visible
```

Even well-implemented recovery systems can encounter issues. Understanding common problems aids in debugging and verification:
```python
class RedoPhaseDebugger:
    """
    Debugging utilities for redo phase issues.
    """

    def verify_log_integrity(self):
        """Scan the log and verify all records are valid."""
        issues = []

        for record in self.log.scan_all():
            # Check checksum
            if not record.verify_checksum():
                issues.append(f"LSN {record.lsn}: checksum mismatch")

            # Check prev_lsn chain
            if record.prev_lsn and record.prev_lsn >= record.lsn:
                issues.append(
                    f"LSN {record.lsn}: invalid prev_lsn {record.prev_lsn}")

            # Check page_id validity
            if record.page_id and not self.is_valid_page(record.page_id):
                issues.append(
                    f"LSN {record.lsn}: invalid page_id {record.page_id}")

        return issues

    def verify_page_lsn_consistency(self):
        """After redo, verify page LSNs make sense."""
        issues = []
        max_lsn = self.log.get_max_lsn()

        for page_id in self.get_all_pages():
            page = self.buffer.read_page(page_id)
            page_lsn = page.get_page_lsn()

            if page_lsn > max_lsn:
                issues.append(
                    f"Page {page_id}: LSN {page_lsn} > max log LSN {max_lsn}")
            if page_lsn < 0:
                issues.append(f"Page {page_id}: negative LSN {page_lsn}")

        return issues

    def trace_page_history(self, page_id, start_lsn=0):
        """
        Trace all log records affecting a specific page.
        Useful for debugging page corruption.
        """
        records = []
        for record in self.log.scan_all():
            if record.lsn < start_lsn:
                continue
            if record.page_id == page_id:
                records.append({
                    'lsn': record.lsn,
                    'type': record.type,
                    'txn_id': record.txn_id,
                    'operation': (record.redo_info.get('operation')
                                  if record.redo_info else None)
                })
        return records

    def simulate_redo(self, dry_run=True):
        """
        Simulate redo without actually modifying pages.
        Useful for testing and verification.
        """
        decisions = []

        for record in self.log.scan_from(self.redo_lsn):
            if not self._is_redoable(record):
                decisions.append((record.lsn, 'SKIP', 'wrong type'))
                continue

            if record.page_id not in self.dpt:
                decisions.append((record.lsn, 'SKIP', 'not in DPT'))
                continue

            rec_lsn = self.dpt[record.page_id].rec_lsn
            if record.lsn < rec_lsn:
                decisions.append(
                    (record.lsn, 'SKIP', f'LSN < recLSN ({rec_lsn})'))
                continue

            page_lsn = self.get_page_lsn(record.page_id)
            if page_lsn >= record.lsn:
                decisions.append(
                    (record.lsn, 'SKIP', f'pageLSN ({page_lsn}) >= LSN'))
            else:
                decisions.append(
                    (record.lsn, 'REDO', f'pageLSN ({page_lsn}) < LSN'))

        return decisions
```

We've now covered every aspect of the ARIES redo phase. Let's bring together the complete picture:
The Redo Phase in One Sentence:
The redo phase scans the log from RedoLSN forward, restoring each dirty page to its crash-time state by applying any log records whose LSN exceeds the page's current LSN, thereby reconstructing exactly the state that existed when the system crashed.
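That one-sentence summary can be condensed into a single pure function. This is an illustrative sketch, not the chapter's classes: records are plain dicts, the DPT maps page IDs to recLSNs, and `page_lsn_of` stands in for a buffer-manager lookup:

```python
# Compact sketch of the full redo decision tree for one log record.
# Returns 'REDO' or 'SKIP:<reason>'; the page is only touched at step 4.

def redo_decision(record, dpt, page_lsn_of):
    """Decide whether a single log record must be redone."""
    if record['type'] not in ('UPDATE', 'CLR', 'NESTED_TOP_ACTION'):
        return 'SKIP:type'                 # step 1: lifecycle records etc.
    pid = record['page_id']
    if pid not in dpt:
        return 'SKIP:not-in-DPT'           # step 2: page was never dirty
    if record['lsn'] < dpt[pid]:           # dpt maps page_id -> recLSN
        return 'SKIP:recLSN'               # step 3: flushed before recLSN
    if page_lsn_of(pid) >= record['lsn']:  # step 4: page already has it
        return 'SKIP:pageLSN'
    return 'REDO'                          # step 5: apply the record

dpt = {100: 1000}
assert redo_decision({'type': 'COMMIT', 'page_id': 100, 'lsn': 1004},
                     dpt, lambda p: 0) == 'SKIP:type'
assert redo_decision({'type': 'UPDATE', 'page_id': 100, 'lsn': 1006},
                     dpt, lambda p: 1000) == 'REDO'
```

Note the ordering: the two cheap table checks run before the page is ever read, which is what keeps the page I/O count low.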
What's Next:
With the database restored to crash-time state, the undo phase can now roll back any transactions that were active at crash time. The undo phase uses the transaction table (finalized during redo) to identify "loser" transactions and walks their log record chains backward, writing CLRs for each undone operation.
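As a preview of that handoff, here is a hedged sketch of how undo might order its backward walk: each loser transaction contributes its `undo_next` LSN, and undo always processes the largest outstanding LSN first, following `prev_lsn` pointers backward. The record and table layouts are illustrative, not a real system's format:

```python
# Sketch (illustrative data shapes): order in which undo visits log records,
# walking each loser transaction's chain backward via prev_lsn pointers.

def undo_order(txn_table, log):
    """Yield (txn, lsn) pairs, always undoing the largest pending LSN first."""
    pending = {t: info['undo_next']
               for t, info in txn_table.items()
               if info['status'] == 'active'}      # losers only
    while pending:
        txn = max(pending, key=pending.get)        # highest LSN next
        lsn = pending[txn]
        yield txn, lsn                             # (a CLR would be logged here)
        prev = log[lsn]['prev_lsn']                # follow backward chain
        if prev is None:
            del pending[txn]                       # rollback complete
        else:
            pending[txn] = prev

# Losers T1 and T3 from the scenario above; T2 committed and is untouched.
log = {1006: {'prev_lsn': 1002}, 1005: {'prev_lsn': None},
       1002: {'prev_lsn': 1000}, 1000: {'prev_lsn': None}}
txns = {'T1': {'status': 'active', 'undo_next': 1006},
        'T3': {'status': 'active', 'undo_next': 1005},
        'T2': {'status': 'committed'}}
assert [lsn for _, lsn in undo_order(txns, log)] == [1006, 1005, 1002, 1000]
```

Processing the globally largest LSN first lets undo interleave multiple losers in one backward sweep of the log rather than rolling back each transaction separately.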
Congratulations! You now have a complete understanding of the ARIES redo phase—from the philosophical foundation of repeating history to production implementation details. This knowledge forms a critical part of understanding how databases maintain durability and recover from failures.