On the previous page, we established why ARIES repeats history. Now we confront the practical question: what exactly does 'redo all operations' mean?
The answer is deceptively simple in principle but rich in detail. During the redo phase, ARIES scans the log from the RedoLSN forward, and for each log record that represents a data modification, it considers whether that modification needs to be reapplied to the corresponding page.
This page explores the complete mechanics: which log records are processed, how they're applied, and what special considerations exist for different record types. By the end, you'll understand the precise algorithm that transforms a potentially inconsistent disk state back to the exact pre-crash database state.
By the end of this page, you will understand: (1) Which log record types are processed during redo, (2) How each type is handled differently, (3) The page read and modification process during redo, and (4) Performance considerations for efficient redo processing.
Not all log records require action during redo. ARIES defines several log record types, each with specific redo behavior:
Records That Require Redo Action:
| Record Type | Contains | Redo Action | Purpose |
|---|---|---|---|
| UPDATE | Page ID, Redo Info, Undo Info | Apply redo information to page | Normal data modifications |
| CLR (Compensation Log Record) | Page ID, Redo Info, UndoNextLSN | Apply redo information to page | Record of undo operation during rollback |
| UPDATE (nested top action) | Page ID, Redo Info only | Apply redo information to page | Non-undoable operations (e.g., space mgmt) |
Records That Are Skipped During Redo:
| Record Type | Why Skipped | Redo Consideration |
|---|---|---|
| BEGIN | No data modification | None—just marks transaction start |
| COMMIT | No data modification | None—just marks successful commit |
| ABORT | No data modification | None—marks intent to roll back |
| END | No data modification | None—cleanup marker after rollback |
| CHECKPOINT | Meta-information only | Used by analysis phase, not redo |
The redo phase only cares about records that modified page contents. Transaction lifecycle records (BEGIN, COMMIT, ABORT, END) are important for analysis and recovery logic, but they don't change data pages and thus require no redo action.
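These rules reduce to a small predicate. Here is a minimal sketch, with type names taken from the tables above rather than from any particular engine:

```python
# Record types that modify page contents and therefore need redo
REDOABLE_TYPES = {'UPDATE', 'CLR', 'NESTED_TOP_ACTION'}

# Lifecycle and metadata records that redo skips entirely
SKIPPED_TYPES = {'BEGIN', 'COMMIT', 'ABORT', 'END', 'CHECKPOINT'}


def requires_redo_action(record_type: str) -> bool:
    """True if a record of this type must be considered during redo."""
    return record_type in REDOABLE_TYPES


assert requires_redo_action('CLR')
assert not requires_redo_action('COMMIT')
```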
The core of the redo phase is a sequential scan through the log, processing each record according to precise rules. Here is the complete algorithm:
```python
class RedoPhaseProcessor:
    """
    Complete implementation of the ARIES redo phase processing loop.
    This demonstrates the exact algorithm used in production databases.
    """

    def __init__(self, log_manager, buffer_manager, dirty_page_table):
        self.log = log_manager
        self.buffer = buffer_manager
        self.dirty_page_table = dirty_page_table  # From analysis phase
        self.redo_lsn = self._calculate_redo_lsn()
        self.pages_redone = 0
        self.pages_skipped = 0

    def _calculate_redo_lsn(self):
        """Calculate starting point from Dirty Page Table"""
        if not self.dirty_page_table:
            return float('inf')  # Nothing to redo
        return min(entry.rec_lsn for entry in self.dirty_page_table.values())

    def execute_redo_phase(self):
        """
        Main redo phase execution.
        Scans log from RedoLSN forward, applying operations as needed.
        """
        print(f"Starting redo phase from LSN {self.redo_lsn}")

        # Open log scan at RedoLSN
        log_scan = self.log.open_forward_scan(start_lsn=self.redo_lsn)

        for record in log_scan:
            self._process_log_record(record)

        print("Redo phase complete:")
        print(f"  - Records processed: {self.pages_redone + self.pages_skipped}")
        print(f"  - Pages redone: {self.pages_redone}")
        print(f"  - Pages skipped: {self.pages_skipped}")

    def _process_log_record(self, record):
        """
        Process a single log record during redo.
        This is the heart of the redo algorithm.
        """
        # Step 1: Filter non-redoable records
        if not self._is_redoable(record):
            return

        page_id = record.page_id

        # Step 2: Check if page is in Dirty Page Table.
        # If page is not in DPT, it was flushed after all its
        # modifications and doesn't need redo.
        if page_id not in self.dirty_page_table:
            self.pages_skipped += 1
            return

        # Step 3: Check recLSN - if record LSN < recLSN, the modification
        # was flushed before this page became dirty again.
        rec_lsn = self.dirty_page_table[page_id].rec_lsn
        if record.lsn < rec_lsn:
            self.pages_skipped += 1
            return

        # Step 4: Read page from disk and check pageLSN
        page = self.buffer.fetch_page(page_id)
        page_lsn = page.get_page_lsn()

        # Step 5: The critical comparison.
        # If pageLSN >= record.lsn, this operation is already on disk.
        if page_lsn >= record.lsn:
            self.pages_skipped += 1
            self._log_skip_reason(record, page_lsn, "pageLSN >= record LSN")
            return

        # Step 6: Apply the redo
        self._apply_redo(page, record)
        self.pages_redone += 1

    def _is_redoable(self, record):
        """Determine if a log record requires redo action"""
        redoable_types = {'UPDATE', 'CLR', 'NESTED_TOP_ACTION'}
        return record.type in redoable_types

    def _apply_redo(self, page, record):
        """
        Apply the redo information from the log record to the page.
        This is where the actual modification happens.
        """
        # Extract redo information from log record
        redo_info = record.redo_info

        # Apply the operation based on its type
        if record.type == 'UPDATE':
            # Physiological redo: apply operation to specific location
            self._apply_physiological_redo(page, redo_info)
        elif record.type == 'CLR':
            # CLR redo is the same as UPDATE redo - just apply the change
            self._apply_physiological_redo(page, redo_info)

        # Update the page's LSN to reflect this redo
        page.set_page_lsn(record.lsn)

        # Mark page as dirty (it will eventually be flushed)
        self.buffer.mark_dirty(page)

    def _apply_physiological_redo(self, page, redo_info):
        """
        Apply physiological redo operation.

        Physiological logging means:
        - Physical: identifies the specific page
        - Logical: describes the operation within the page

        This allows the operation to work correctly even if the
        page's internal structure has changed.
        """
        operation_type = redo_info['operation']

        if operation_type == 'INSERT_RECORD':
            slot = redo_info['slot']
            data = redo_info['data']
            page.insert_at_slot(slot, data)
        elif operation_type == 'DELETE_RECORD':
            slot = redo_info['slot']
            page.delete_at_slot(slot)
        elif operation_type == 'UPDATE_FIELD':
            slot = redo_info['slot']
            offset = redo_info['offset']
            new_value = redo_info['new_value']
            page.update_field(slot, offset, new_value)
        elif operation_type == 'PAGE_SPLIT':
            # B-tree page splits are logged and redone
            new_page_id = redo_info['new_page_id']
            split_key = redo_info['split_key']
            page.apply_split(new_page_id, split_key)
        # ... other operation types as needed

    def _log_skip_reason(self, record, page_lsn, reason):
        """Log why a redo was skipped (for debugging/monitoring)"""
        # In production, this might update metrics or detailed logs
        pass
```

Key Observations About the Algorithm:
Three-Level Filtering: Before actually reading a page, we filter using the Dirty Page Table (two checks: presence and recLSN). Only if both pass do we incur the I/O cost of reading the page.
Final LSN Check: Even after passing DPT checks, we still compare pageLSN. This handles the case where the DPT is conservative (pages can be flushed between checkpoint and crash).
Redo is Write-Intensive: Each redone operation marks the page dirty. This is intentional—we want the recovered state to eventually be flushed.
Sequential Log Access: The log is scanned forward exactly once, which is optimal for disk-based logs and excellent for modern SSDs.
The most common redo operation involves UPDATE log records. These represent normal data modifications made by transactions. Understanding their structure is essential for understanding redo.
UPDATE Log Record Structure:
```python
class UpdateLogRecord:
    """
    Structure of an UPDATE log record in ARIES.
    This record contains all information needed for both redo and undo.
    """

    def __init__(self):
        # Header fields (present in all log records)
        self.lsn = None       # Log Sequence Number - unique identifier
        self.type = 'UPDATE'  # Record type
        self.txn_id = None    # Transaction that made this change
        self.prev_lsn = None  # Previous LSN for this transaction

        # Page identification
        self.page_id = None   # Which page was modified

        # Redo information - enough to re-apply the change
        self.redo_info = {
            'operation': None,  # Type of operation (INSERT, DELETE, UPDATE)
            'offset': None,     # Position within page (for byte-level ops)
            'length': None,     # Length of modification
            'new_value': None,  # The value to write (after-image)
        }

        # Undo information - enough to reverse the change
        self.undo_info = {
            'operation': None,  # Inverse operation
            'offset': None,     # Same position
            'length': None,     # Same length
            'old_value': None,  # The value to restore (before-image)
        }


# Example: A row update operation
def example_row_update():
    """
    Scenario: Update employee salary from 50000 to 55000
    in row at slot 5 of page 1234
    """
    record = UpdateLogRecord()
    record.lsn = 1000
    record.txn_id = 'T1'
    record.prev_lsn = 995
    record.page_id = 1234

    # For redo: we need to know what to write
    record.redo_info = {
        'operation': 'UPDATE_FIELD',
        'slot': 5,           # Row position in page
        'field_offset': 32,  # Salary field offset within row
        'length': 4,         # 4 bytes for integer
        'new_value': 55000   # New salary
    }

    # For undo: we need to know the original value
    record.undo_info = {
        'operation': 'UPDATE_FIELD',
        'slot': 5,
        'field_offset': 32,
        'length': 4,
        'old_value': 50000   # Original salary
    }

    return record
```

The Redo Process for UPDATE Records:

For an UPDATE record, redo follows the algorithm from the processing loop above:
1. Confirm the page appears in the Dirty Page Table with recLSN <= the record's LSN.
2. Fetch the page and compare its pageLSN to the record's LSN.
3. If pageLSN < record LSN, apply redo_info (the after-image) to the page.
4. Set pageLSN to the record's LSN and mark the page dirty.
Physiological Logging Nuance:
ARIES uses physiological logging: physical identification of the page, but logical description of the operation within the page. This is important because:
- The page is identified physically (by page ID), so redo can go straight to the page without traversing indexes or catalogs.
- The operation is described logically within the page (by slot number rather than raw byte offset), so redo still works even if the page's internal layout has changed, for example after free-space compaction.
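To make this concrete, here is a minimal sketch using a hypothetical SlottedPage class (not any real engine's API) showing why slot-based redo survives internal page reorganization:

```python
class SlottedPage:
    """A slot directory maps slot numbers to records, so records can
    move within the page without changing their slot number."""

    def __init__(self):
        self.slots = {}  # slot number -> record dict

    def update_field(self, slot, field, new_value):
        self.slots[slot][field] = new_value

    def compact(self):
        # Physically rearranges record bytes inside the page to reclaim
        # fragmented free space; slot numbers are unchanged.
        pass


page = SlottedPage()
page.slots[5] = {'name': 'Alice', 'salary': 50000}

# The page may have been compacted since the log record was written,
# moving record bytes to different offsets...
page.compact()

# ...but physiological redo addresses the record by slot, not by raw
# byte offset, so it still lands on the right record.
page.update_field(slot=5, field='salary', new_value=55000)
assert page.slots[5]['salary'] == 55000
```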
You might wonder why an UPDATE record stores both redo and undo information. During redo, we only use redo_info. The undo_info is there for the undo phase or for rollback during normal operation. This dual-purpose design means the same log record works for crash recovery and live transaction rollback.
One of the more subtle aspects of ARIES redo is handling Compensation Log Records (CLRs). These records are created when a transaction is being rolled back—each undo operation generates a CLR.
Why CLRs Exist:
When rolling back a transaction, the undo operations themselves must be logged. If the system crashes during a rollback, recovery must not re-undo operations that were already undone. CLRs prevent this.
CLR Structure:
```python
class CLRLogRecord:
    """
    Compensation Log Record (CLR) - records an undo operation.

    Key insight: A CLR is redo-only! It records what was done to
    undo a previous operation. If we redo a CLR, we're re-applying
    the undo operation, which is exactly what we want.
    """

    def __init__(self):
        # Standard header
        self.lsn = None
        self.type = 'CLR'
        self.txn_id = None
        self.prev_lsn = None

        # Page being modified by the undo
        self.page_id = None

        # The undo operation (which we need to redo during recovery)
        self.redo_info = None  # Same structure as UPDATE redo_info

        # The critical CLR field: points to the NEXT record to undo
        self.undo_next_lsn = None

        # NOTE: CLRs have NO undo_info field!
        # This is crucial: CLRs themselves are never undone


def create_clr_for_undo(original_record, undo_result):
    """
    Create a CLR when undoing an UPDATE record.

    The CLR captures what we did (the undo) and where to continue
    if we need to keep undoing (undo_next_lsn).
    """
    clr = CLRLogRecord()
    clr.txn_id = original_record.txn_id
    clr.page_id = original_record.page_id

    # The redo_info for a CLR is the UNDO of the original.
    # So during redo, we re-apply the undo operation.
    clr.redo_info = {
        'operation': original_record.undo_info['operation'],
        'slot': original_record.undo_info.get('slot'),
        'offset': original_record.undo_info.get('offset'),
        'new_value': original_record.undo_info['old_value']  # Restore old!
    }

    # Point to the record we should undo next
    clr.undo_next_lsn = original_record.prev_lsn

    return clr


# Example scenario demonstrating CLR creation and redo
def clr_redo_scenario():
    """
    Scenario:
    1. T1 updates salary 50000 -> 55000 (LSN 100)
    2. T1 is rolled back, salary restored 55000 -> 50000 (CLR at LSN 150)
    3. System crashes
    4. During redo, we redo the CLR (restoring salary to 50000)
    """
    # Original update
    original = UpdateLogRecord()
    original.lsn = 100
    original.txn_id = 'T1'
    original.prev_lsn = 90  # Previous operation by T1
    original.page_id = 1234
    original.redo_info = {'operation': 'UPDATE_FIELD', 'new_value': 55000}
    original.undo_info = {'operation': 'UPDATE_FIELD', 'old_value': 50000}

    # CLR created when undoing the above
    clr = CLRLogRecord()
    clr.lsn = 150
    clr.txn_id = 'T1'
    clr.prev_lsn = 100
    clr.page_id = 1234
    clr.redo_info = {'operation': 'UPDATE_FIELD', 'new_value': 50000}  # Restoring!
    clr.undo_next_lsn = 90  # Skip LSN 100, go to 90 next

    print("During redo phase:")
    print("  LSN 100 UPDATE: Would set salary to 55000")
    print("  LSN 150 CLR: Would set salary back to 50000")
    print()
    print("After redo, salary = 50000 (the undo was 'redone')")
    print("During undo phase, T1 is seen as still needing undo,")
    print("but undo_next_lsn=90 tells us to skip LSN 100!")


clr_redo_scenario()
```

The Elegant Design of CLRs:
CLRs are redo-only: They have no undo information because CLRs themselves should never be undone. The undo_next_lsn ensures we skip the original operation during any subsequent undo.
Redo of CLR = Re-apply the undo: If we redo a CLR, we're just re-applying the undo operation it recorded. This is exactly correct.
Crash during rollback is handled: If the system crashes while rolling back T1, recovery will:
1. Redo any CLRs that were written before the crash, re-establishing the partially rolled-back state.
2. During the undo phase, resume T1's rollback from the last CLR's undo_next_lsn, so operations that were already undone are never undone again.
No double-undo: The undo_next_lsn pointers form a chain that skips already-undone operations.
Redoing a CLR doesn't 'undo the undo'—it re-applies the undo! The CLR's redo_info contains the undo operation (restoring the old value). During redo, we treat CLRs just like UPDATE records: apply their redo_info to the page.
The redo phase processes log records in strict LSN order. This isn't arbitrary—it's essential for correctness.
Why LSN Order Matters:
- Later operations on a page can depend on earlier ones: an UPDATE_FIELD at slot 5 only makes sense after the INSERT_RECORD that created slot 5 has been applied.
- The pageLSN check assumes records arrive in increasing LSN order. Once pageLSN is set to some value, every record with a smaller LSN is skipped, so applying records out of order would silently drop modifications, as the sketch below shows.
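A minimal illustration, using a stripped-down hypothetical Page class, of how out-of-order application breaks the pageLSN protocol:

```python
class Page:
    def __init__(self):
        self.page_lsn = 0
        self.value = None


def apply_if_needed(page, record):
    # The standard redo check: skip if already applied
    if page.page_lsn >= record['lsn']:
        return
    page.value = record['new_value']
    page.page_lsn = record['lsn']


rec_100 = {'lsn': 100, 'new_value': 'A'}
rec_200 = {'lsn': 200, 'new_value': 'B'}

page = Page()
apply_if_needed(page, rec_200)  # Applied out of order!
apply_if_needed(page, rec_100)  # Skipped: pageLSN (200) >= 100

# rec_100 was never applied. Here the final value happens to be right,
# but if rec_200 had depended on rec_100 (say, inserting into a slot
# that rec_100 created), the page would now be corrupt.
assert page.value == 'B' and page.page_lsn == 200
```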
Parallelization Opportunities:
While the log must be scanned sequentially, there are opportunities for parallelism:
```python
import threading
from queue import Queue


class ParallelRedoProcessor:
    """
    Optimized redo processor that parallelizes where safe.

    Strategy:
    - A single thread scans the log and dispatches records to workers
    - Each page is assigned to exactly one worker (by hashing its
      page ID), so operations on the same page stay in LSN order and
      never run concurrently
    - Pages assigned to different workers are redone in parallel
    """

    def __init__(self, num_workers=4):
        self.num_workers = num_workers
        self.worker_queues = [Queue() for _ in range(num_workers)]
        self.workers = []

    def execute_parallel_redo(self, log_records, redo_lsn):
        """Execute redo phase with page-level parallelism."""
        # Start worker threads
        for i in range(self.num_workers):
            worker = threading.Thread(target=self._page_worker, args=(i,))
            worker.start()
            self.workers.append(worker)

        # Main thread: scan log and dispatch. Hashing the page ID sends
        # every record for a given page to the same worker, preserving
        # per-page LSN order.
        for record in log_records:
            if record.lsn < redo_lsn:
                continue
            if not self._is_redoable(record):
                continue
            worker_id = hash(record.page_id) % self.num_workers
            self.worker_queues[worker_id].put(record)

        # Signal completion with one sentinel per worker, then wait
        for queue in self.worker_queues:
            queue.put(None)
        for worker in self.workers:
            worker.join()

    def _page_worker(self, worker_id):
        """
        Worker thread that processes operations for its assigned pages.
        Records arrive in LSN order per page, so processing the queue
        in order preserves the required per-page ordering.
        """
        queue = self.worker_queues[worker_id]
        while True:
            record = queue.get()
            if record is None:
                break  # Sentinel: no more work for this worker
            self._process_redo(record)

    def _process_redo(self, record):
        """Apply redo - same LSN checks, page fetch, and application
        as the sequential version."""
        pass

    def _is_redoable(self, record):
        return record.type in {'UPDATE', 'CLR', 'NESTED_TOP_ACTION'}
```

Modern databases use variations of parallel redo, typically partitioning work by block or page so that each page is processed by exactly one worker. This maintains the critical per-page ordering while achieving significant speedup on multi-core systems.
Not all operations are simple row updates. The redo phase must handle several special categories correctly:
1. Nested Top Actions:
Some operations, like space management or index structure changes, need to persist even if the containing transaction aborts. These are implemented as nested top actions:
```python
def nested_top_action_redo():
    """
    Nested top actions are mini-transactions that commit independently.

    Example: Allocating a new page for a B-tree split
    - The allocation itself must persist even if the insert fails
    - Otherwise, we'd leak pages or corrupt free space management
    """
    # Log record structure for nested top action
    nested_action_log = {
        'type': 'NESTED_TOP_ACTION',
        'lsn': 200,
        'txn_id': 'T1',
        'page_id': 'free_space_map',

        # Only redo info - NO undo info!
        'redo_info': {
            'operation': 'ALLOCATE_PAGE',
            'allocated_page_id': 5678
        },

        # Links that make this a "nested" action
        'dummy_clr_lsn': 199,  # Points before the nested action start
        # This ensures undo skips over this action
    }

    print("During redo: Apply ALLOCATE_PAGE normally")
    print("During undo: The dummy CLR linkage causes this to be skipped")
    print("Result: Page allocation persists even if T1 is rolled back")
```

2. Multi-Page Operations:
Some operations span multiple pages and must appear atomic: either the whole structural change survives or none of it does. ARIES achieves this through nested top actions combined with ordinary per-record redo; each page is brought up to date independently, and the LSN checks ensure the structure as a whole converges to a consistent state. A B-tree split is the classic example:
```python
def multi_page_operation_redo():
    """
    B-tree split logged as multiple records with dependencies.
    """
    # Step 1: Allocate new sibling page
    log_record_1 = {
        'lsn': 300,
        'type': 'NESTED_TOP_ACTION',  # Page allocation never undone
        'page_id': 'free_space_map',
        'redo_info': {'operation': 'ALLOCATE_PAGE', 'new_page_id': 999}
    }

    # Step 2: Initialize new page with half the keys
    log_record_2 = {
        'lsn': 301,
        'type': 'UPDATE',
        'page_id': 999,  # The newly allocated page
        'redo_info': {'operation': 'INIT_BTREE_LEAF', 'keys': [50, 51, 52]}
    }

    # Step 3: Update original page to remove moved keys
    log_record_3 = {
        'lsn': 302,
        'type': 'UPDATE',
        'page_id': 888,  # The original page being split
        'redo_info': {'operation': 'REMOVE_KEYS', 'keys': [50, 51, 52]}
    }

    # Step 4: Update parent with new separator key
    log_record_4 = {
        'lsn': 303,
        'type': 'UPDATE',
        'page_id': 777,  # Parent page
        'redo_info': {'operation': 'INSERT_SEPARATOR', 'key': 50, 'child_ptr': 999}
    }

    print("Redo processes these in LSN order (300, 301, 302, 303)")
    print("Even if some pages were flushed, the sequential redo")
    print("ensures the B-tree ends up in a consistent state.")
```

3. External File Operations:
Operations involving external resources (like BLOB storage) require careful redo. External files are not protected by pageLSN checks, so the usual skip logic does not apply; systems typically log an intent record and design the external operation itself to be safe to re-apply, for example by checking the file's current state before acting.
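As an illustration, here is a hedged sketch of an idempotent redo handler for an external BLOB write; the helper name and file layout are hypothetical, not from the ARIES paper:

```python
import os


def redo_blob_write(blob_dir, blob_id, data):
    path = os.path.join(blob_dir, f"{blob_id}.blob")

    # If the file already holds the full contents, this redo already
    # happened (before the crash, or in an earlier recovery attempt).
    if os.path.exists(path) and os.path.getsize(path) == len(data):
        return

    # Write to a temp file and rename: the rename is atomic, so a crash
    # mid-redo never leaves a partially written blob visible.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)
```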
All redo operations must be idempotent—applying them multiple times produces the same result as applying them once. This is critical because a crash during recovery means redo will be restarted, potentially re-applying the same operations. We'll explore this requirement in depth on the next page.
Redo phase performance directly impacts recovery time, which affects database availability. Several techniques optimize redo:
1. Prefetching:
The redo phase can look ahead in the log to identify upcoming page accesses and issue prefetch requests:
```python
class RedoWithPrefetch:
    """
    Redo processor with page prefetching for improved I/O.
    """

    def __init__(self, buffer_manager, prefetch_distance=100):
        self.buffer = buffer_manager
        self.prefetch_distance = prefetch_distance
        self.pending_prefetch = set()

    def execute_redo_with_prefetch(self, log_records, redo_lsn):
        """
        Process redo while prefetching upcoming pages.
        """
        # Build lookahead buffer
        lookahead = []

        for record in log_records:
            if record.lsn < redo_lsn:
                continue
            if not self._is_redoable(record):
                continue

            lookahead.append(record)

            # When we have enough lookahead, start prefetching
            if len(lookahead) >= self.prefetch_distance:
                self._issue_prefetch(lookahead[-1].page_id)

            # Process the oldest record in lookahead; by now its page's
            # prefetch was issued ~prefetch_distance records ago
            if len(lookahead) > self.prefetch_distance:
                oldest = lookahead.pop(0)
                self._process_redo(oldest)

        # Drain remaining records
        for record in lookahead:
            self._process_redo(record)

    def _issue_prefetch(self, page_id):
        """Request async page read"""
        if page_id not in self.pending_prefetch:
            self.buffer.async_prefetch(page_id)
            self.pending_prefetch.add(page_id)

    def _process_redo(self, record):
        """Apply redo - same LSN checks and application as the
        sequential version."""
        pass

    def _is_redoable(self, record):
        return record.type in {'UPDATE', 'CLR', 'NESTED_TOP_ACTION'}
```
Multiple redo operations on the same page can be applied before flushing, reducing I/O:
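A minimal sketch of the idea, reusing the hypothetical buffer-manager and page interfaces from the earlier examples (page.apply_redo is an assumed helper): apply every record for a page while it is in memory, then write the page once.

```python
def redo_with_write_combining(buffer, redo_records):
    touched_pages = {}

    for record in redo_records:
        page = touched_pages.get(record.page_id)
        if page is None:
            page = buffer.fetch_page(record.page_id)  # One read per page
            touched_pages[record.page_id] = page
        # Standard pageLSN check before applying
        if page.get_page_lsn() < record.lsn:
            page.apply_redo(record.redo_info)
            page.set_page_lsn(record.lsn)

    # One write per page, no matter how many records touched it
    for page in touched_pages.values():
        buffer.flush_page(page)
```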
3. Checkpoint Frequency Tuning:
More frequent checkpoints mean less log to scan during redo, because each checkpoint's Dirty Page Table advances the RedoLSN. The tradeoff is extra I/O during normal operation, so systems tune the checkpoint interval against a target recovery time.
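A back-of-the-envelope sketch of this tradeoff, with purely illustrative numbers:

```python
# Worst-case redo time grows with the log written since the last
# checkpoint (illustrative rates, not measurements).

log_rate_mb_s = 50         # log generated during normal operation
redo_rate_mb_s = 100       # how fast redo can scan and apply the log
checkpoint_interval_s = 60

# Worst case: crash just before the next checkpoint
log_to_replay_mb = log_rate_mb_s * checkpoint_interval_s  # 3000 MB
redo_time_s = log_to_replay_mb / redo_rate_mb_s           # ~30 s

print(f"Worst-case redo scan: {log_to_replay_mb} MB, ~{redo_time_s:.0f} s")
```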
4. Log Record Sizing:
Smaller log records mean faster log scanning, since redo time is dominated by how much log must be read and parsed. The table below summarizes the main redo performance factors:
| Metric | Impact | Optimization |
|---|---|---|
| Log scan rate | Time proportional to log length | Faster storage, parallel parsing |
| Page read latency | Each unique page needs at least one read | Prefetching, SSD storage |
| Pages to redo | More dirty pages = more work | Aggressive flushing, smaller DPT |
| Redo operations/page | More ops = more CPU time | Batched application |
| Write amplification | Each page written at least once | Write combining, delayed flush |
We've explored the complete mechanics of how ARIES processes every relevant log record during the redo phase. The key points:
- Only page-modifying records (UPDATE, CLR, nested top actions) are considered; transaction lifecycle records are skipped.
- Three levels of filtering (DPT presence, recLSN, pageLSN) avoid unnecessary page reads and re-application.
- Redoing a CLR re-applies an undo operation; CLRs carry no undo information and are never undone.
- Records are applied in strict LSN order, though independent pages can be redone in parallel.
- Prefetching, write combining, and checkpoint tuning keep recovery time short.
What's Next:
The redo phase relies heavily on LSN comparison to decide whether to apply each log record. The next page dives deep into this mechanism—how page LSNs are maintained, compared, and used to achieve the critical property of idempotent redo.
You now understand the complete mechanics of processing all log records during ARIES redo. The universal application principle—consider every record, filter intelligently, apply what's needed—ensures correct reconstruction of the pre-crash database state.