Loading learning content...
You've now studied four distinct approaches to database recovery: Deferred Update, Immediate Update, Shadow Paging, and ARIES. Each embodies different design philosophies and trade-offs. But how do you choose? When is one approach superior to another?
This page synthesizes everything we've learned into a comprehensive comparison, providing clear decision frameworks and real-world guidance. By understanding the trade-offs, you'll be equipped to evaluate recovery mechanisms in any database system—whether selecting software, debugging performance issues, or designing your own storage engine.
By the end of this page, you will understand how each recovery algorithm compares across key dimensions, which scenarios favor which approach, and how modern systems combine these techniques. You'll also gain insight into recovery in distributed systems.
Let's begin with a detailed comparison across the most important dimensions:
| Characteristic | Deferred Update | Immediate Update | Shadow Paging | ARIES |
|---|---|---|---|---|
| Alternative name | NO-UNDO/REDO | UNDO/REDO | Copy-on-Write | Advanced UNDO/REDO |
| Buffer policy | NO-STEAL | STEAL | N/A (page-based) | STEAL |
| Force policy | FORCE | NO-FORCE | FORCE | NO-FORCE |
| Logging required? | Yes (REDO info) | Yes (UNDO + REDO) | No | Yes (UNDO + REDO) |
| Recovery type | REDO only | UNDO + REDO | None | UNDO + REDO |
| Transaction size limit | Buffer pool size | Unlimited | Page table size | Unlimited |
| Commit latency | Moderate (log + data) | Low (log only) | High (all pages) | Low (log only) |
| Recovery time | Moderate | Slow | Instant | Fast (optimized) |
| Implementation complexity | Low | Medium | Low | High |
| Industry adoption | Limited | Some systems | Limited (file systems) | Universal |
Reading the Matrix:
Buffer policy (STEAL vs NO-STEAL) determines whether uncommitted changes can reach disk. STEAL enables larger transactions but requires UNDO capability.
Force policy (FORCE vs NO-FORCE) determines whether committed changes must be on disk before commit returns. NO-FORCE means faster commits but requires REDO capability.
Logging captures before/after images. More logging means more I/O but enables recovery.
Transaction size limit is a critical practical constraint—some workloads require very large transactions.
Recovery time and implementation complexity often trade off—simpler recovery may mean slower recovery.
Notice that ARIES combines the most desirable properties: STEAL (no size limit), NO-FORCE (fast commits), and optimized recovery. This explains its universal adoption in production DBMS implementations.
Performance varies dramatically across algorithms. Let's examine key operational scenarios:
Normal Operation: Transaction Execution
| Metric | Deferred Update | Immediate Update | Shadow Paging | ARIES |
|---|---|---|---|---|
| Write latency | Low (buffer only) | Low (buffer only) | Moderate (allocate shadow) | Low (buffer only) |
| Memory pressure handling | Poor (pinned buffers) | Good (can evict) | Good (no pinning) | Good (can evict) |
| Log I/O per write | Small (after-image) | Larger (both images) | None | Larger (both images) |
| Bulk operation support | Limited | Excellent | Limited | Excellent |
Normal Operation: Transaction Commit
| Aspect | Deferred Update | Immediate Update | Shadow Paging | ARIES |
|---|---|---|---|---|
| Must force log? | Yes | Yes | No (no log) | Yes |
| Must force data pages? | Yes (all modified) | No | Yes (all modified) | No |
| I/O operations | Log + all pages | Log only (sequential) | All pages (random) | Log only (sequential) |
| Commit latency | Medium-High | Low | High | Low |
| Latency scaling | O(pages modified) | O(log records) | O(pages modified) | O(log records) |
Transaction Abort (Rollback)
| Aspect | Deferred Update | Immediate Update | Shadow Paging | ARIES |
|---|---|---|---|---|
| Abort cost | Near zero | O(changes made) | Near zero | O(changes made) |
| Pages to restore | None (never written) | All modified | None (discard shadows) | All modified |
| Log records to process | None | All for transaction | None | All for transaction |
| Best for high abort rates | Yes | No | Yes | No |
Optimistic concurrency control schemes have high abort rates (validation failures). Deferred update and shadow paging shine here because aborts are nearly free. ARIES/immediate update can be expensive when many transactions abort after substantial work.
Recovery performance—how quickly the database becomes available after a crash—is often the deciding factor in algorithm selection for mission-critical systems.
| Factor | Deferred Update | Immediate Update | Shadow Paging | ARIES |
|---|---|---|---|---|
| Log scan required | Start to checkpoint | Start to end (forward + backward) | None | Checkpoint to end |
| Phase complexity | 1 phase (REDO) | 2 phases (REDO + UNDO) | 0 phases | 3 phases (optimized) |
| Pages to read | Committed pages | All dirty pages | None | Only pages needing redo |
| Checkpoint benefit | Moderate | Moderate | N/A | High (bounds all phases) |
| Parallelizable? | Yes (REDO) | Partially | N/A | Yes (REDO especially) |
| Worst case | O(log since checkpoint) | O(log size) | O(1) | O(checkpoint interval) |
Recovery Time Estimates (Illustrative):
Assume: 10 GB dirty data, 100 MB/s I/O, 1 GB log since checkpoint
| Algorithm | Estimated Recovery Time |
|---|---|
| Shadow Paging | ~10 ms (read superblock) |
| Deferred Update | ~10-30 seconds (REDO committed) |
| ARIES | ~30-60 seconds (Analysis + REDO + limited UNDO) |
| Simple Immediate Update | ~2-5 minutes (full REDO + UNDO) |
Note: These are rough estimates. Actual times depend heavily on checkpoint frequency, transaction mix, and hardware.
ARIES Optimization Impact:
ARIES's optimizations dramatically reduce recovery time:
Many organizations have RTOs (Recovery Time Objectives) of seconds to minutes. Shadow paging trivially meets this; ARIES can meet it with frequent checkpoints. Simple immediate update may struggle for databases with high write rates or infrequent checkpoints.
Understanding the ideal use cases for each algorithm helps in system selection and design:
For general-purpose OLTP databases, ARIES is almost always the right choice. It handles the widest range of workloads with the best balance of normal-operation performance and recovery characteristics. The other algorithms serve specialized niches.
Let's examine how major systems implement recovery:
ARIES-Based Systems:
| System | Recovery Approach | Notable Features |
|---|---|---|
| PostgreSQL | ARIES (WAL) | Point-in-time recovery, streaming replication |
| MySQL InnoDB | ARIES variant | Doublewrite buffer for torn page protection |
| SQL Server | ARIES | Accelerated database recovery (ADR) option |
| Oracle | ARIES-like | Fast-start checkpointing, flashback |
| DB2 | ARIES (origin) | Original ARIES implementation |
| SQLite | Rollback journal / WAL | Two modes; WAL mode is ARIES-like |
Shadow Paging / Copy-on-Write Systems:
| System | Type | Notes |
|---|---|---|
| LMDB | Database | Single-writer, copy-on-write, instant recovery |
| CouchDB | Database | Append-only B-tree with multi-version |
| ZFS | File System | Full copy-on-write, snapshots, self-healing |
| Btrfs | File System | Copy-on-write, snapshots, subvolumes |
| APFS | File System | Apple's copy-on-write file system |
Hybrid Approaches:
Many modern systems combine techniques:
Log-structured storage (LSM-trees): All writes go to log/memtable first, later compacted. This is essentially deferred update at the storage level.
Copy-on-write with logging: Some systems use copy-on-write for data pages but still maintain a WAL for recovery of in-memory structures.
MVCC with recovery: Multi-version concurrency control creates versions like shadow paging, but uses logging for durability.
SQL Server's Accelerated Database Recovery (ADR) is an interesting evolution: it uses a persistent version store to enable near-instant recovery while maintaining ARIES-style logging. This hybrid approach shows how techniques can be combined for specific benefits.
When data spans multiple nodes, recovery becomes significantly more complex. The algorithms we've studied apply per-node, but coordination is required for distributed transactions.
Two-Phase Commit (2PC) Integration:
For distributed transactions, 2PC coordinates prepare and commit across nodes:
The local recovery algorithm (typically ARIES) handles:
Replication and Recovery:
Replicated systems have additional recovery considerations:
| Replication Type | Recovery Approach |
|---|---|
| Synchronous | All replicas have same state; recover any |
| Asynchronous | Primary may be ahead; catch up from log |
| Quorum-based | Reconstruct from quorum; may need voting |
Log Shipping / Streaming Replication:
Many systems send WAL records to replicas:
This leverages the log-based recovery infrastructure for replication—replicas are essentially in continuous REDO mode.
Consensus-Based Systems:
Modern distributed databases (CockroachDB, TiDB, YugabyteDB) use consensus protocols (Raft, Paxos):
Think of distributed recovery as layered: the local recovery algorithm (ARIES) handles single-node crash recovery, while distributed protocols (2PC, consensus) handle coordination. Each layer builds on the guarantees provided by the layer below.
When evaluating or designing a recovery system, consider these key questions:
1. What are your transaction characteristics?
2. What are your performance priorities?
3. What is your abort pattern?
4. What are your operational requirements?
5. What is your complexity budget?
| Primary Requirement | Recommended Algorithm |
|---|---|
| General-purpose OLTP | ARIES |
| Instant recovery, limited writes | Shadow paging |
| Small transactions, simple system | Deferred update |
| Educational/prototype system | Simple immediate update |
| Embedded database, single writer | Shadow paging (LMDB-style) |
| Log-structured storage | Deferred update concepts |
| Maximum flexibility | ARIES |
If you're unsure which algorithm to use or recommend, ARIES is the safest choice. It's proven, widely understood, handles edge cases, and doesn't impose unexpected limitations. The other algorithms are optimizations for specific scenarios.
We've completed our exploration of recovery algorithm types. Let's consolidate the key takeaways from this entire module:
The Big Picture:
Database recovery solves a fundamental problem: ensuring atomicity and durability despite failures. The algorithms we've studied represent decades of research and engineering to solve this problem efficiently.
Understanding recovery algorithms equips you to:
What's Next in Chapter 27:
In the following modules, we'll dive deeper into ARIES's specific phases (Analysis, REDO, UNDO) and features (physiological logging, nested top actions, etc.), exploring the nuances that make it the industry standard.
Congratulations! You've mastered the fundamental recovery algorithm types. You can now compare deferred update, immediate update, shadow paging, and ARIES across multiple dimensions, understand their trade-offs, and make informed decisions about when to use each approach.