Recovery Concepts - Learning Module

ACID and Recovery

Where Theory Meets Implementation

ACID properties—Atomicity, Consistency, Isolation, and Durability—define what it means for a database to be reliable. But properties are just promises; they mean nothing without implementation. The recovery system is where ACID properties become reality.

Every ACID property depends on mechanisms we've studied: logging, checkpoints, stable storage, and recovery algorithms. Understanding these connections reveals why recovery is not just one subsystem among many—it's the foundation that makes transactional databases trustworthy.

This page ties together everything we've learned by showing how each ACID property is implemented and maintained through recovery-related mechanisms.

What You Will Learn

By the end of this page, you will understand how Atomicity depends on undo logging, how Durability depends on write-ahead logging and stable storage, how Consistency is preserved through recovery, and how Isolation interacts with recovery during abort and crash scenarios. You'll see ACID as an integrated system, not four independent properties.

ACID Properties: A Recovery Perspective

Before diving into the connections, let's precisely define each ACID property with an eye toward implementation:

ACID Properties and Their Implementation Requirements
Property	Definition	Implementation Requirement	Primary Mechanism
Atomicity	All operations of a transaction complete, or none do	Ability to undo partial transactions	Undo logging, rollback
Consistency	Transactions take the database from one consistent state to another	Preserve invariants even after failures	Constraints + complete recovery
Isolation	Concurrent transactions appear to execute serially	Prevent interference, handle concurrent aborts	Locking, undo for cascading aborts
Durability	Committed transactions survive any subsequent failure	Committed data reaches stable storage	Redo logging, WAL, stable storage

The Recovery System's Role in Each:

              ┌───────────────────────────────────────────────────────────────┐
              │                    ACID PROPERTIES                            │
              └───────────────────────────────────────────────────────────────┘
                    │             │              │              │
                    ▼             ▼              ▼              ▼
              ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
              │ Atomicity│  │Consistency│ │ Isolation│  │Durability│
              └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘
                   │             │             │              │
                   │             │             │              │
              ┌────▼─────┐  ┌────▼─────┐  ┌────▼─────┐  ┌────▼─────┐
              │   UNDO   │  │ Complete │  │  Abort   │  │   REDO   │
              │ Logging  │  │ Recovery │  │ Cascading│  │ Logging  │
              │          │  │ + Checks │  │ Aborts   │  │ + Stable │
              │ Rollback │  │          │  │          │  │ Storage  │
              └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘
                   │             │             │              │
                   └─────────────┴─────────────┴──────────────┘
                                         │
                                         ▼
                          ┌─────────────────────────────┐
                          │     RECOVERY SYSTEM         │
                          │   (Logging, Checkpoints,    │
                          │    Stable Storage, ARIES)   │
                          └─────────────────────────────┘

Notice that all four properties feed into the recovery system. This isn't coincidence—transactional guarantees are fundamentally about surviving failures, and failure survival is the recovery system's job.

ACID as an Integrated System

While ACID is presented as four separate properties, they're deeply interconnected. Atomicity without durability means completed transactions might vanish. Isolation without atomicity means concurrent readers might see partial states. The recovery system implements these interdependencies.

Atomicity: All-or-Nothing Through Undo

Atomicity guarantees that transactions are indivisible—either all operations complete, or none do. This seems simple until you consider that:

Transactions may execute many operations over significant time
Some operations may be written to disk before the transaction completes
The system may crash mid-transaction
An application may request abort at any point

How does atomicity survive these scenarios?

Atomicity Implementation Mechanisms

•Undo Logging — Before any modification, log the previous value. If the transaction must be rolled back, use these 'before images' to restore original values.
•Transaction Rollback — When an application requests abort, traverse the transaction's log records backward, applying undo operations to reverse each modification.
•Crash Recovery Undo Phase — After a crash, the recovery system identifies uncommitted transactions (via Analysis phase) and undoes all their modifications (via Undo phase).
•Compensation Log Records (CLRs) — During undo (either explicit rollback or recovery), write CLRs so that if a crash occurs during undo, recovery can continue correctly.

Atomicity Timeline:

Transaction T1: Insert A, Update B, Delete C

Time ────────────────────────────────────────────────────────────────▶

|─────── T1 Active ───────|
│                         │
│ Insert A logged         │
│ (before: ∅, after: A)   │
│         │               │
│         Update B logged │
│         (before: b, after: B) │
│                   │     │
│                   Delete C logged
│                   (before: c, after: ∅)
│                         │
│                         │
▼                         ▼

SCENARIO 1: Application requests COMMIT
 → Write commit record to log
 → Force log (through the commit record) to disk
 → Ack to application
 → T1 is atomic (all operations committed)

SCENARIO 2: Application requests ABORT
 → Write abort record
 → Undo: Restore C (reinsert c), logging a CLR
 → Undo: Restore B (update back to b), logging a CLR
 → Undo: Remove A (delete A), logging a CLR
 → Write end record
 → T1 is atomic (no operations visible)

SCENARIO 3: CRASH before commit
 → On recovery, T1 is uncommitted
 → Undo phase reverses all T1 modifications
 → T1 is atomic (no operations visible)

In all scenarios, the transaction's effects are all-or-nothing. The undo log makes 'nothing' possible even after partial execution.
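The rollback in Scenario 2 can be sketched in a few lines of Python. This is a minimal illustration, not a real storage engine: the `UndoLoggedStore` class and its in-memory dictionary are assumptions made for the example, and CLRs and disk I/O are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    txn: str
    key: str
    before: Optional[str]  # None means the key did not exist before
    after: Optional[str]   # None means the key was deleted


class UndoLoggedStore:
    """Hypothetical key-value store that logs a before image for each write."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.log = []

    def write(self, txn, key, value):
        # Log the before image first, then apply the change.
        self.log.append(LogRecord(txn, key, self.data.get(key), value))
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value

    def rollback(self, txn):
        # Walk the transaction's records newest-first, restoring before images.
        for rec in reversed([r for r in self.log if r.txn == txn]):
            if rec.before is None:
                self.data.pop(rec.key, None)
            else:
                self.data[rec.key] = rec.before


store = UndoLoggedStore({"B": "b", "C": "c"})
store.write("T1", "A", "A")    # Insert A
store.write("T1", "B", "B")    # Update B
store.write("T1", "C", None)   # Delete C
store.rollback("T1")
print(store.data)
```

After the rollback the store holds exactly its initial contents, which is the "nothing" half of all-or-nothing.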

STEAL Policy Complication

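Under a STEAL policy, the buffer manager may write pages containing uncommitted modifications to disk before the transaction commits. This is precisely why undo is necessary at crash recovery: those uncommitted changes are on disk and must be reversed. Under NO-STEAL, crash recovery would need no undo, but the buffer pool would have to hold every page dirtied by an active transaction until commit, which limits transaction size and concurrency.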

Durability: Permanent Through Redo

Durability guarantees that committed transactions survive all subsequent failures. This requires:

A record of every committed modification must reach stable storage before commit is acknowledged
The guarantee must hold even if the system crashes immediately after commit acknowledgment
Recovery must restore any committed work not yet on data pages

The WAL + Redo Solution:

Durability Implementation Mechanisms

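•Write-Ahead Logging (WAL): The log record describing a modification must reach stable storage before the modified data page is written to disk, and all of a transaction's log records must reach stable storage before it commits. The log is the primary durability mechanism.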
•Force-at-Commit — Before acknowledging commit, force all log records for the transaction to stable storage. Once ack'd, the transaction is durable.
•Redo Logging — Log records contain 'after images' enabling reapplication of modifications. If committed modifications aren't on data pages after a crash, redo reconstructs them.
•Stable Storage — Logs and data reside on stable storage (RAID, replication) that survives single-point failures.

Durability Timeline:

Transaction T1: Update X from 100 to 200

Time ────────────────────────────────────────────────────────────────▶

│ Log record written to buffer:
│ "T1: Page P, Offset O, Before=100, After=200"
│         │
│         Modification applied to buffer page P
│         (Page P now has value 200 in memory)
│                   │
│                   Application requests COMMIT
│                         │
│                         LOG FORCED TO STABLE STORAGE
│                         (fsync completes)
│                               │
│                               Commit record written and forced
│                                     │
│                                     Commit acknowledged to application
│                                           │
▼                                           ▼

AT THIS POINT: Durability is guaranteed!

SCENARIO A: No crash
 → Buffer manager eventually writes page P to disk
 → System operates normally

SCENARIO B: Crash before page P written to disk
 → On recovery, log shows T1 committed
 → Page P on disk still has old value (100)
 → Redo phase applies: "Set Page P, Offset O = 200"
 → Page P now has value 200
 → T1's durability is confirmed

SCENARIO C: Crash after page P written to disk
 → On recovery, redo phase checks page
 → PageLSN shows modification already applied
 → Redo skips (optimization)
 → T1's durability is confirmed

The key insight: the log, not the data pages, is the source of durability. As long as log records reach stable storage before commit, recovery can reconstruct any missing modifications.
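Scenarios B and C can be sketched as a redo pass that compares each log record's LSN with the page's PageLSN. This is a simplified illustration under stated assumptions: the `Page` and `Update` classes and the `redo` function are hypothetical names, pages are plain integers, and there is no real disk.

```python
from dataclasses import dataclass


@dataclass
class Page:
    value: int
    page_lsn: int = 0   # LSN of the last log record applied to this page


@dataclass
class Update:
    lsn: int
    txn: str
    page_id: str
    before: int
    after: int


def redo(log, disk_pages):
    """Reapply logged updates that the on-disk page has not yet seen."""
    applied = []
    for rec in log:
        page = disk_pages[rec.page_id]
        if page.page_lsn >= rec.lsn:
            continue  # Scenario C: the page already reflects this update
        page.value = rec.after
        page.page_lsn = rec.lsn
        applied.append(rec.lsn)
    return applied


# The log reached stable storage before commit was acknowledged.
stable_log = [Update(lsn=1, txn="T1", page_id="P", before=100, after=200)]

# Scenario B: crash before page P was written, so disk still holds 100.
stale = {"P": Page(value=100, page_lsn=0)}
applied_b = redo(stable_log, stale)

# Scenario C: crash after page P was written, so disk already holds 200.
fresh = {"P": Page(value=200, page_lsn=1)}
applied_c = redo(stable_log, fresh)

print(stale["P"].value, applied_b, applied_c)
```

In both cases the page ends at 200; the only difference is whether redo had any work to do.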

NO-FORCE Enables Fast Commits

Under NO-FORCE policy, commit doesn't wait for data pages to be written—only log records. Since log writes are sequential (fast) while data page writes are random (slow), NO-FORCE dramatically reduces commit latency. Redo makes this possible by guaranteeing committed work can be recovered from the log.
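The relationship between buffer policies and recovery work fits in a two-line function. The function name and return shape below are illustrative only; the mapping itself follows from the STEAL and NO-FORCE discussions in this page.

```python
def recovery_requirements(steal: bool, force: bool) -> dict:
    """Which recovery passes a buffer policy combination requires."""
    return {
        "undo": steal,       # STEAL lets uncommitted changes reach disk
        "redo": not force,   # NO-FORCE lets committed changes miss disk
    }


# STEAL + NO-FORCE, the combination ARIES is designed for.
print(recovery_requirements(steal=True, force=False))
# NO-STEAL + FORCE: no recovery passes, at the cost of runtime performance.
print(recovery_requirements(steal=False, force=True))
```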

Consistency: Preserved Through Complete Recovery

Consistency is the most nuanced ACID property. It encompasses:

Schema constraints (primary keys, foreign keys, check constraints)
User-defined invariants (business rules, application logic)
Database internal consistency (index structures match data, no corruption)

Recovery's Role in Consistency:

Consistency Preservation Mechanisms

•Complete Transaction Execution — By guaranteeing atomicity (complete or nothing) and durability (committed = permanent), recovery ensures only complete, valid transactions affect the database. Partial transactions that might violate consistency are rolled back.
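•Constraint Checking Before Commit: Constraints are checked when each statement executes or, for deferred constraints, at commit. If a violation exists, the statement fails or the transaction is aborted rather than committed. Recovery preserves only transactions that committed, and those already passed these checks.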
•Index and Metadata Consistency — Recovery applies redo/undo to indexes and metadata just as it does to data pages. After recovery, indexes correctly reflect data content.
•Database Integrity After Crash — The recovery process itself is designed to produce a consistent state. After ARIES recovery, all committed transactions are present, all uncommitted transactions are absent, and all internal structures are valid.

Consistency Levels:

Consistency Aspect	Pre-Crash State	Post-Recovery State	Mechanism
Schema constraints	Valid (enforced at commit)	Valid	Atomic commit/recovery
Referential integrity	Valid	Valid	Same—partial commits impossible
Index consistency	May be in-flight updates	Valid	Redo/undo applied to indexes too
Physical page structure	May have partial writes	Valid	Torn page protection + recovery
Transaction state	May be indeterminate	Committed or aborted	Recovery resolves all transactions

The Consistency Guarantee:

If the database was consistent before the crash (and transactions maintained consistency during execution), then recovery produces a consistent database. Recovery doesn't validate consistency—it assumes consistent transactions and ensures only complete transactions take effect.

Responsibility Split:

Application: Ensure transactions maintain logical consistency (business rules)
Database: Enforce schema constraints, provide atomic execution, recover correctly
Recovery: Ensure only complete, committed transactions affect final state

Consistency is Partly Application Responsibility

The database cannot verify arbitrary business rules. If an application creates a transaction that leaves data in a logically inconsistent state (e.g., negative bank balance without triggering a constraint), the database will commit it. Consistency requires correct application logic plus database mechanism support.
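The split between declared constraints and unstated business rules can be sketched as follows. Everything here is hypothetical: `try_commit`, the `constraints` dictionary, and the account names are invented for illustration and do not correspond to any real database API.

```python
class ConstraintViolation(Exception):
    pass


def try_commit(db, pending_writes, constraints):
    """Return the new state if every declared constraint holds, else raise."""
    candidate = {**db, **pending_writes}
    for name, holds in constraints.items():
        if not holds(candidate):
            raise ConstraintViolation(name)  # transaction aborts, db unchanged
    return candidate


# The only rule the database knows about: balances are non-negative.
constraints = {
    "non_negative_balance": lambda state: all(v >= 0 for v in state.values()),
}

db = {"alice": 100, "bob": 50}

# Violates a declared constraint, so the commit is rejected.
try:
    db = try_commit(db, {"alice": -20, "bob": 170}, constraints)
    rejected = False
except ConstraintViolation:
    rejected = True

# Breaks an undeclared business rule (money appears from nowhere),
# but passes every declared constraint, so it commits.
db_after = try_commit(db, {"bob": 500}, constraints)
print(rejected, db, db_after)
```

The second transaction changes the total balance, yet nothing the database was told about forbids it. Only application logic, or an additional declared constraint, could catch it.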

Isolation: Interactions with Recovery

Isolation is primarily implemented by concurrency control (locking, MVCC), but recovery has important interactions with isolation, especially during abort scenarios:

Scenario: Cascading Abort

With some isolation levels and locking schemes, one transaction aborting can force other transactions to abort:

T1: Update X = 100 → 200
T2: Read X (sees 200)    ← T2 read uncommitted value from T1
T2: Continue processing based on X=200
T1: ABORT
    → T1's change is undone: X = 100
    → But T2 has already seen X=200!
    → T2 has a 'dirty read'
    → T2 must also abort to maintain isolation
    → This is a 'cascading abort'

Recovery's Role:

Undo supports cascading abort — When T1 aborts, recovery can use undo logs to restore X, and T2's dependency triggers its abort
Strict 2PL prevents cascading: By holding write locks until commit or abort, strict 2PL prevents other transactions from reading uncommitted data
MVCC avoids the problem — By reading committed snapshots, readers never see uncommitted data, eliminating cascading abort risk
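To make the dependency chain concrete, here is a small sketch that computes which transactions must abort once one of them does. The `cascading_aborts` function and the `(reader, writer)` pair format are assumptions for this example; a real system would track these dependencies inside its concurrency control layer.

```python
from collections import defaultdict


def cascading_aborts(dirty_reads, first_abort):
    """dirty_reads: (reader, writer) pairs where reader saw writer's uncommitted data."""
    readers_of = defaultdict(set)
    for reader, writer in dirty_reads:
        readers_of[writer].add(reader)

    to_abort, stack = set(), [first_abort]
    while stack:
        txn = stack.pop()
        if txn in to_abort:
            continue
        to_abort.add(txn)
        stack.extend(readers_of[txn])  # everyone who read txn's dirty data
    return to_abort


# T2 read from T1, T3 read from T2, and T5 read from an unrelated T4.
deps = [("T2", "T1"), ("T3", "T2"), ("T5", "T4")]
print(sorted(cascading_aborts(deps, "T1")))
```

Aborting T1 takes T2 and T3 with it, while T4 and T5 are untouched. With strict 2PL or snapshot reads the dependency list is always empty, so the result is just the aborting transaction itself.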

Isolation Levels and Recovery Interaction
Isolation Level	Dirty Read Possible?	Cascading Abort Risk	Recovery Complexity
Read Uncommitted	Yes	High—readers may see uncommitted data that gets rolled back	Must handle cascading aborts
Read Committed	No	None: readers only see committed data	Standard recovery
Repeatable Read	No	None: dirty reads are not possible	Standard recovery
Serializable	No	None—full isolation prevents all anomalies	Standard recovery
MVCC-based levels	No	None—readers use snapshots, never see uncommitted	Garbage collection needed for old versions

Crash During Concurrent Execution:

When a crash occurs with multiple active transactions:

All uncommitted transactions are rolled back — ARIES undo phase reverses all 'loser' transactions
Committed transactions are fully recovered — ARIES redo phase ensures committed work persists
Locks are released — After recovery, no remnant locks from crashed transactions
Isolation is maintained: The post-recovery database shows only the effects of committed transactions, which concurrency control had already isolated from one another before the crash

Recovery and Strict 2PL:

Strict 2PL (holding write locks until commit) actually simplifies recovery:

Uncommitted modifications are locked, so no other transaction can have seen them
Undo of T1 cannot affect T2 (T2 couldn't read T1's uncommitted data)
No cascading aborts needed during recovery

This is why strict 2PL is so widely used in traditional lock-based databases: it provides both serializability and clean recovery semantics.
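The locking rule can be sketched with a toy lock table. This is a deliberately reduced model: `StrictLockTable` is a hypothetical class, shared (read) locks and waiting are omitted, and a refused request simply returns `False` instead of blocking.

```python
class StrictLockTable:
    """Toy lock table: write locks are held until the transaction ends."""

    def __init__(self):
        self.exclusive = {}  # item -> transaction holding the write lock

    def acquire_write(self, txn, item):
        holder = self.exclusive.get(item)
        if holder not in (None, txn):
            return False
        self.exclusive[item] = txn
        return True

    def can_read(self, txn, item):
        # Readers are refused while another transaction holds the write lock.
        return self.exclusive.get(item) in (None, txn)

    def release_all(self, txn):
        # Called only at commit, or after rollback has finished.
        self.exclusive = {i: t for i, t in self.exclusive.items() if t != txn}


locks = StrictLockTable()
got_lock = locks.acquire_write("T1", "X")
read_during = locks.can_read("T2", "X")   # T1 is still uncommitted
locks.release_all("T1")                   # T1 commits or finishes aborting
read_after = locks.can_read("T2", "X")
print(got_lock, read_during, read_after)
```

Because T2 cannot read X until T1 has ended, undoing T1 never invalidates anything T2 has seen.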

MVCC and Recovery Simplification

MVCC (Multi-Version Concurrency Control) maintains multiple versions of data items. Readers access old, committed versions while writers create new versions. This eliminates read-write conflicts, but adds complexity: old versions must be garbage collected, and recovery must account for version chains. The tradeoff is often worthwhile for read-heavy workloads.
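A minimal sketch of snapshot reads and version garbage collection follows. The version-chain format (a list of `(commit_ts, value)` pairs with `None` for an uncommitted writer) and both function names are assumptions chosen for the example, not any particular engine's layout.

```python
def visible_value(versions, snapshot_ts):
    """Return the newest committed value at or before snapshot_ts, else None."""
    committed = [(ts, v) for ts, v in versions if ts is not None and ts <= snapshot_ts]
    return max(committed, key=lambda p: p[0])[1] if committed else None


def garbage_collect(versions, oldest_snapshot_ts):
    """Drop versions that no current or future snapshot can read."""
    old = [p for p in versions if p[0] is not None and p[0] <= oldest_snapshot_ts]
    keep_old = max(old, key=lambda p: p[0]) if old else None
    return [
        p for p in versions
        if p[0] is None or p[0] > oldest_snapshot_ts or p == keep_old
    ]


# X was committed as 90 at ts=5 and as 100 at ts=10; a writer has an
# uncommitted version 200 in flight.
x_versions = [(5, 90), (10, 100), (None, 200)]

print(visible_value(x_versions, 12))   # reader never sees the uncommitted 200
print(visible_value(x_versions, 7))    # older snapshot sees the older version
print(garbage_collect(x_versions, 12)) # the ts=5 version is no longer reachable
```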

ACID as an Integrated Recovery System

Having examined each property individually, let's appreciate how they form an integrated system. The log is the unifying mechanism:

The Log as ACID Foundation:

┌─────────────────────────────────────────────────────────────────────────┐
│                        TRANSACTION LOG                                  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   LSN 100: T1 BEGIN                                                     │
│   LSN 101: T1 Update P1 (before=A, after=B)     ← Atomicity (undo)     │
│   LSN 102: T2 BEGIN                                                     │
│   LSN 103: T2 Update P2 (before=X, after=Y)     ← Durability (redo)    │
│   LSN 104: T1 Update P3 (before=M, after=N)     ← Both                 │
│   LSN 105: T2 COMMIT                            ← Durability           │
│   LSN 106: T1 ABORT                             ← Atomicity            │
│   LSN 107: CLR for LSN 104 (undo P3: N→M)       ← Atomicity            │
│   LSN 108: CLR for LSN 101 (undo P1: B→A)       ← Atomicity            │
│   LSN 109: T1 END                                                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

After this sequence:
- T2 is committed and durable (DURABILITY)
- T1 is fully rolled back, no effects remain in the data (ATOMICITY)
- Database is consistent (CONSISTENCY via complete tx only)
- T2 didn't see T1's uncommitted changes (ISOLATION via locks/MVCC)

Log Record Dual Purpose

•Before Image (Old Value) — Enables UNDO for atomicity. If transaction must abort, restore old values.
•After Image (New Value) — Enables REDO for durability. If committed transaction's changes aren't on disk, reapply them.
•Transaction ID — Groups operations for atomicity (all T1 ops together) and recovery (identify uncommitted transactions).
•Commit Record: The definitive marker of durability. If the commit record is on stable storage, the transaction is durable. If it is not, recovery treats the transaction as uncommitted and undoes it.

The Recovery Algorithm Unifies All Properties:

ARIES Phase	Atomicity Support	Durability Support	Consistency Support	Isolation Support
Analysis	Identifies uncommitted transactions	Identifies redo point	N/A	N/A
Redo	N/A	Reapplies all logged changes (repeating history), restoring committed work missing from data pages	Restores complete tx state	Rebuilds pre-crash state
Undo	Reverses uncommitted changes	N/A	Removes partial tx effects	Cleans up for future tx

Every phase serves multiple properties. The recovery algorithm is the implementation of ACID.
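The three phases can be sketched against the example log above, assuming the crash happens right after LSN 105 so that T1 never gets to abort on its own. This is a heavily simplified illustration, not ARIES itself: the `Rec` class and `recover` function are hypothetical, and checkpoints, the dirty page table, and the undo-next pointers that let real CLRs survive a crash during undo are all omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    lsn: int
    txn: str
    kind: str                     # "BEGIN", "UPDATE", "COMMIT", "CLR", "END"
    page: Optional[str] = None
    before: Optional[str] = None
    after: Optional[str] = None


def recover(log, pages, page_lsn):
    # Analysis: transactions with a BEGIN but no COMMIT or END are losers.
    finished = {r.txn for r in log if r.kind in ("COMMIT", "END")}
    losers = {r.txn for r in log if r.kind == "BEGIN"} - finished

    # Redo: repeat history for every logged change, committed or not,
    # skipping pages whose PageLSN shows the change is already present.
    for r in log:
        if r.kind in ("UPDATE", "CLR") and page_lsn.get(r.page, 0) < r.lsn:
            pages[r.page] = r.after
            page_lsn[r.page] = r.lsn

    # Undo: reverse loser updates newest-first, logging a CLR for each.
    next_lsn = log[-1].lsn + 1
    for r in reversed(list(log)):
        if r.kind == "UPDATE" and r.txn in losers:
            pages[r.page] = r.before
            page_lsn[r.page] = next_lsn
            log.append(Rec(next_lsn, r.txn, "CLR", r.page, before=r.after, after=r.before))
            next_lsn += 1
    for txn in sorted(losers):
        log.append(Rec(next_lsn, txn, "END"))
        next_lsn += 1
    return losers


log = [
    Rec(100, "T1", "BEGIN"),
    Rec(101, "T1", "UPDATE", "P1", "A", "B"),
    Rec(102, "T2", "BEGIN"),
    Rec(103, "T2", "UPDATE", "P2", "X", "Y"),
    Rec(104, "T1", "UPDATE", "P3", "M", "N"),
    Rec(105, "T2", "COMMIT"),
]
# Disk at crash time: P1 was stolen (holds T1's uncommitted B),
# while P2 and P3 were never flushed.
pages = {"P1": "B", "P2": "X", "P3": "M"}
page_lsn = {"P1": 101}

losers = recover(log, pages, page_lsn)
print(losers, pages)
```

After recovery, P2 carries T2's committed value Y (durability), while P1 and P3 are back to A and M (atomicity), and the log ends with two CLRs and an END record for T1.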

Think 'Log-Centric'

When reasoning about ACID, think log-centric: The log is the source of truth. Data pages are an optimization for fast random access. During recovery, the log reconstructs the guaranteed-correct state. This mental model clarifies why logging is so fundamental to transactional databases.

Practical Implications for Database Operation

Understanding the ACID-recovery connection has practical implications for database configuration, monitoring, and troubleshooting:

Configuration Implications

•Log placement matters critically — Logs must be on the fastest, most reliable storage. Log latency directly affects commit latency. Log loss means data loss.
•Synchronous commit settings trade durability for speed — Setting synchronous_commit=off (PostgreSQL) or innodb_flush_log_at_trx_commit=2 (MySQL) reduces commit latency but creates a window of potential data loss.
•Checkpoint frequency trades recovery time for I/O load — Frequent checkpoints mean faster recovery but more I/O during normal operation.
•Long-running transactions complicate everything — They prevent log truncation, increase undo work on abort, and extend recovery time.
•Replication adds durability layers — Synchronous replication provides durability beyond local stable storage at the cost of latency.
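The synchronous-commit tradeoff comes down to whether the commit path waits for an fsync. The sketch below uses Python's standard `os.write` and `os.fsync` on a temporary file; the `append_commit` helper and the record format are hypothetical, and this is not how PostgreSQL or MySQL implement their settings internally.

```python
import os
import tempfile


def append_commit(fd, record: bytes, synchronous: bool) -> None:
    """Append a commit record; optionally wait until it is on the device."""
    os.write(fd, record)
    if synchronous:
        # Block until the OS reports the data has been flushed to storage.
        os.fsync(fd)
    # Without fsync the record may sit in the OS cache and be lost on a crash.


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "wal.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        append_commit(fd, b"T1 COMMIT\n", synchronous=True)    # durable once it returns
        append_commit(fd, b"T2 COMMIT\n", synchronous=False)   # faster, but at risk
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        contents = f.read()

print(contents)
```

Both records are readable afterwards in a normal run. The difference only shows up if power fails between the write and the eventual flush, which is exactly the data loss window the relaxed settings accept.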

ACID Strength vs Performance Tradeoffs
Setting	Strong ACID	Reduced ACID	Risk
Log sync at commit	Every commit syncs log	Batched/delayed sync	Recent commits may be lost on crash
Replication mode	Synchronous (wait for standby)	Asynchronous	Committed data may be lost if primary fails
Checkpoint interval	Frequent (2-5 min)	Infrequent (30 min)	Longer recovery time after crash (affects availability, not ACID correctness)
Isolation level	Serializable	Read Committed	Possible anomalies (non-repeatable reads)
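For the checkpoint row, a back-of-envelope model shows why the interval matters. This is a rough estimate under stated assumptions: it supposes redo must replay roughly all log written since the last checkpoint, and the rates used below are hypothetical figures, not measurements from any system.

```python
def estimated_redo_seconds(log_rate_mb_s, checkpoint_interval_s, replay_rate_mb_s):
    """Worst-case redo time if a crash hits just before the next checkpoint."""
    log_to_replay_mb = log_rate_mb_s * checkpoint_interval_s
    return log_to_replay_mb / replay_rate_mb_s


# Hypothetical workload: 5 MB/s of log generated, 50 MB/s replay speed.
frequent = estimated_redo_seconds(5.0, 5 * 60, 50.0)     # 5-minute checkpoints
infrequent = estimated_redo_seconds(5.0, 30 * 60, 50.0)  # 30-minute checkpoints
print(frequent, infrequent)
```

Under these assumed rates the estimate grows linearly with the interval, from 30 seconds to 180 seconds of redo. Undo of long-running transactions comes on top of that.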

Monitoring for ACID Health:

Operators should monitor:

Log disk I/O latency — High latency means slow commits
Log disk space — Full log disk stops all writes
Checkpoint duration and frequency — Long checkpoints may indicate I/O issues
Oldest active transaction — Long-running transactions block log truncation
Replication lag — In async replication, lag represents potential data loss window

Troubleshooting Recovery Issues:

Long recovery time — Too infrequent checkpoints, too much log to replay
Undo during recovery taking too long — Long-running transactions at crash time
Recovery crashes — Possible disk corruption, check storage health
Data lost after crash — Incorrect durability settings (async commit, no replication)

Know Your Configuration

Some databases and deployment templates ship with settings that favor performance over strict durability. Before deploying, explicitly verify: Are commits synchronous? Is the log on reliable storage? Is replication synchronous? Document your durability posture; discovering a misconfiguration through data loss is far more costly than checking up front.

Summary: ACID Through Recovery

ACID properties are the promises databases make; the recovery system is how those promises are kept. Every aspect of recovery—logging, checkpoints, stable storage, and recovery algorithms—exists to implement one or more ACID guarantees. Let's consolidate the key insights:

Key Takeaways

•Atomicity requires undo — Before images in log records enable rollback of partial transactions.
•Durability requires redo — After images in log records enable reconstruction of committed work from persistent log.
•Consistency follows from complete recovery — Only complete, committed transactions affect the final database state.
•Isolation interacts with recovery — Strict 2PL simplifies recovery; MVCC requires version garbage collection.
•The log is the unifying mechanism — Before images, after images, and commit records implement all four properties.
•ARIES integrates all properties — Analysis, Redo, and Undo phases together implement atomic, durable, consistent recovery.
•Configuration choices affect ACID strength — Operators can trade durability/isolation for performance, but must understand the risks.

Module Complete:

This concludes Module 2: Recovery Concepts. You now have a comprehensive understanding of:

Durability guarantee — What it means and how it's implemented
Recovery manager — Its responsibilities and architecture
Stable storage — How reliability is built from redundancy
Recovery algorithms — From deferred update to ARIES
ACID and recovery — How recovery implements transactional guarantees

With this conceptual foundation, you're prepared for the detailed chapters ahead on specific recovery mechanisms: Write-Ahead Logging (WAL), Checkpoints, and the ARIES algorithm in depth.
