Loading learning content...
Systems fail. Power goes out. Disks crash. Software bugs cause segmentation faults. The D in ACID—Durability—promises that committed transactions survive any such failure.
Recovery scenarios test your understanding of how database systems use logging to survive failures and restore consistency. In interviews, you'll analyze log sequences, determine which operations to redo and undo after a crash, and reason about checkpoint optimization.
This page gives you complete mastery of recovery concepts, from fundamental logging principles to the industry-standard ARIES algorithm.
By completing this page, you will be able to: (1) Understand Write-Ahead Logging (WAL) principles; (2) Analyze log records and determine transaction states after crash; (3) Apply the ARIES recovery algorithm (Analysis, Redo, Undo); (4) Reason about checkpoint mechanisms and their impact on recovery; (5) Solve recovery-focused interview problems systematically.
Database systems use volatile memory (RAM) for performance—buffer pools, lock tables, and transaction state all reside in memory. A crash loses this volatile state. Recovery must:
Transaction Failure: A single transaction aborts (due to error, deadlock, etc.)
System Failure (Crash): Entire system fails (power outage, OS crash)
Media Failure: Disk failure loses data
| Storage Type | Survives Crash? | Survives Disk Failure? | Examples |
|---|---|---|---|
| Volatile (RAM) | No | No | Buffer pool, lock table |
| Non-volatile (Disk) | Yes | No | Database files, log files |
| Stable (Replicated) | Yes | Yes | Replicated logs, backups |
Database systems face a choice:
Force policy: Write all changes to disk before commit
No-Force policy: Allow commit before all changes are on disk
Steal policy: Allow uncommitted changes to be written to disk
No-Steal policy: Never write uncommitted changes to disk
Most systems use Steal/No-Force for performance, accepting complex recovery.
The ARIES recovery algorithm (used by most modern DBMS) is designed for Steal/No-Force policies. This means recovery must handle both redo (because of No-Force) and undo (because of Steal).
Write-Ahead Logging (WAL) is the fundamental mechanism enabling recovery. The principle is simple but critical:
Rule 1: Undo Logging Before a page is flushed to disk (stolen), all log records for updates to that page must be flushed to stable storage.
Rule 2: Redo Logging Before a transaction commits, all log records for that transaction must be flushed to stable storage.
In essence: Always write the log BEFORE the data. If we crash:
Log writes are sequential (append-only), which is much faster than random I/O:
By logging changes sequentially and lazily writing data pages, we get the best of both worlds.
A typical log record contains:
| Field | Description |
|---|---|
| LSN | Log Sequence Number (unique, increasing) |
| TransID | Transaction that performed the operation |
| Type | Type of record (UPDATE, COMMIT, ABORT, etc.) |
| PageID | Page modified (for UPDATE records) |
| Offset | Position within page |
| Before Image | Value before the update (for undo) |
| After Image | Value after the update (for redo) |
| PrevLSN | LSN of previous log record for this transaction |
123456789101112131415161718
LSN TransID Type PageID Before After PrevLSN--- ------- ---- ------ ------ ----- -------001 T1 UPDATE P1 10 20 null002 T2 UPDATE P2 50 55 null003 T1 UPDATE P3 100 120 001004 T2 UPDATE P1 20 25 002005 T1 COMMIT - - - 003006 T2 UPDATE P4 30 35 004007 CHECKPOINT (active: T2)008 T2 UPDATE P2 55 60 006009 T3 UPDATE P5 0 10 null010 T2 COMMIT - - - 008--- CRASH --- --- --- --- --- Analysis:- T1: Committed at LSN 005 → REDO all T1 updates (001, 003)- T2: Committed at LSN 010 → REDO all T2 updates (002, 004, 006, 008)- T3: No commit → UNDO T3 changes (009)Before images are used for UNDO (restoring old values). After images are used for REDO (reapplying changes). Some interview problems give only one direction—make sure you know which operations are possible with available data.
ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) is the industry-standard recovery algorithm, used by IBM DB2, Microsoft SQL Server, PostgreSQL, and others.
| Structure | Purpose | Contents |
|---|---|---|
| Transaction Table | Track active transactions | TransID, Status, LastLSN, UndoNextLSN |
| Dirty Page Table (DPT) | Track dirty pages in buffer | PageID, RecLSN (first LSN that dirtied it) |
| Log | Record all changes | Sequence of log records (UPDATE, COMMIT, CLR, etc.) |
When undoing an operation, ARIES writes a CLR to the log. The CLR has:
Why CLRs matter: If the system crashes during recovery (during the undo phase), we don't want to undo the same operation again. CLRs ensure idempotence—we can crash and restart recovery any number of times and get the same result.
A common interview question: 'Why redo uncommitted transactions in ARIES?' Answer: Repeating history exactly recreates the system state at crash, including all locks. This enables proper undo and handles complex dependencies. It's simpler than trying to selectively redo.
Checkpoints reduce recovery time by limiting how much log must be scanned. Without checkpoints, recovery must scan the entire log from the beginning—potentially hours of work for a long-running system.
Naive Checkpoint (System Halt):
Pro: Simple, recovery starts from checkpoint Con: System unavailable during checkpoint (unacceptable for production)
Fuzzy Checkpoint (ARIES-style):
Pro: No system pause Con: Recovery must process some log before checkpoint
An ARIES fuzzy checkpoint records:
Analysis Phase starts at: Last checkpoint's BEGIN_CHECKPOINT record
Redo Phase starts at: Minimum of:
This is called the RedoLSN—the earliest point from which we might need to redo.
Frequent checkpoints:
Infrequent checkpoints:
Checkpoint at LSN 100:
- Active Transactions: {T1: LastLSN=95, T2: LastLSN=80}
- Dirty Page Table: {P1: RecLSN=60, P2: RecLSN=90}
Subsequent log:
LSN 101: T1 UPDATE P3
LSN 102: T3 UPDATE P4
LSN 103: T2 COMMIT
LSN 104: T1 COMMIT
--- CRASH ---Analysis Phase:
- Start from LSN 100 (checkpoint)
- At LSN 103: Remove T2 from active set (committed)
- At LSN 104: Remove T1 from active set (committed)
- Add T3 to active set at LSN 102
- Losers at crash: {T3}
Redo Phase starts at:
- Min(RecLSN) = min(60, 90) = 60
- Or: RedoLSN = 60
Redo from LSN 60 forward, reapplying all updates.
Undo Phase:
- Undo T3's changes (LSN 102)
- Write CLR for the undoARIES maintains a 'master record' on disk pointing to the last checkpoint. On recovery, the system reads this to find where to start. Without it, you'd need to scan the entire log to find checkpoints.
Let's work through a complete ARIES-style recovery scenario step by step.
1234567891011121314151617181920
Log before crash: LSN Trans Type Page Before After PrevLSN--- ----- ---- ---- ------ ----- -------10 T1 UPDATE P1 A B null20 T2 UPDATE P2 C D null30 T1 UPDATE P3 E F 1040 CHECKPOINT: ActiveTxns={T1:30, T2:20}, DirtyPages={P1:10, P2:20, P3:30}50 T3 UPDATE P4 G H null60 T1 UPDATE P2 D I 3070 T2 COMMIT - - - 2080 T1 UPDATE P1 B J 6090 T3 UPDATE P5 K L 50--- CRASH --- Post-crash disk state (due to steal policy, some pages flushed):- P1 contains 'B' (T1's first update at LSN 10, not second at 80)- P2 contains 'D' (T2's update at LSN 20, not T1's at 60)- P3 contains 'F' (T1's update at LSN 30)- P4, P5 in original state (no flushes)Start at checkpoint (LSN 40)
Initial state from checkpoint:
Scan forward from LSN 50:
LSN 50: T3 UPDATE P4
LSN 60: T1 UPDATE P2
LSN 70: T2 COMMIT
LSN 80: T1 UPDATE P1
LSN 90: T3 UPDATE P5
End of Analysis:
Scan forward from RedoLSN (LSN 10):
For each UPDATE, check if redo is needed:
LSN 10: T1 UPDATE P1 (A→B)
LSN 20: T2 UPDATE P2 (C→D)
LSN 30: T1 UPDATE P3 (E→F)
LSN 50: T3 UPDATE P4 (G→H)
LSN 60: T1 UPDATE P2 (D→I)
LSN 70: COMMIT (no redo for commits)
LSN 80: T1 UPDATE P1 (B→J)
LSN 90: T3 UPDATE P5 (K→L)
After Redo: Database reflects exact state at crash (including uncommitted changes)
Losers: {T1, T3}
Build ToUndo list from lastLSN of each loser:
ToUndo = {90, 80} (process in reverse order)
Undo LSN 90: T3 UPDATE P5 (K→L)
Undo LSN 80: T1 UPDATE P1 (B→J)
Undo LSN 60: T1 UPDATE P2 (D→I)
Undo LSN 50: T3 UPDATE P4 (G→H)
Undo LSN 30: T1 UPDATE P3 (E→F)
Undo LSN 10: T1 UPDATE P1 (A→B)
Recovery Complete!
Recovery problems in interviews follow predictable patterns. Here's how to recognize and solve each type.
Pattern: Given a log, identify which transactions committed (winners) and which didn't (losers).
Method:
Quick Check: Transactions with COMMIT record = winners. All others = losers.
Tricky case: ABORT is also "finished" — but the undo may not be complete if crash happened during abort's undo phase.
'Why does ARIES redo uncommitted transactions?' — To repeat history and recreate exact crash state. 'Why write CLRs during undo?' — To prevent re-undoing if we crash during recovery. 'Why are checkpoints needed?' — To limit recovery time by providing a known-good starting point.
Understanding transaction state transitions helps with recovery analysis.
ACTIVE: Transaction is executing PARTIALLY COMMITTED: Transaction has finished final statement, but commit not yet logged COMMITTED: Commit record written to log (durable) FAILED: Error occurred, transaction will abort ABORTED: Abort complete, changes undone
On recovery, determine each transaction's state:
| Last Record Seen | Recovery Action |
|---|---|
| COMMIT | No action (winner) |
| ABORT followed by END | No action (already undone) |
| ABORT without END | Complete the undo |
| UPDATE (no COMMIT/ABORT) | Full undo needed (loser) |
| END | Transaction fully complete |
The END record marks complete transaction termination:
A transaction is 'partially committed' after finishing execution but BEFORE the commit record is durable. If system crashes in this state, the transaction is a loser—it must be undone. Only after the COMMIT record is on stable storage is the transaction truly committed.
Recovery is the mechanism that makes the ACID durability guarantee possible. Let's consolidate the essential skills:
Module Complete!
You have now completed the Transaction Problems module. You've mastered:
These skills prepare you for the most challenging transaction-related interview questions at any level.
Congratulations! You now have comprehensive mastery of transaction problem-solving for DBMS interviews. From schedule analysis through serializability, locking, deadlocks, and recovery—you can tackle any transaction-related interview question with confidence and precision.