Database Management SystemsARIES Features

ARIES Advanced Features and Optimizations

LevelAdvanced

Duration90 mins

TopicARIES Features

1 / 5

Steal/No-Force Policy

The Buffer Management Dilemma

Every database system faces a fundamental tension: the buffer manager wants maximum flexibility to manage memory efficiently, while the recovery manager demands guarantees about what data reaches persistent storage and when. How these two subsystems negotiate their competing requirements determines both the performance ceiling and the correctness guarantees of the entire system.

The steal/no-force policy combination represents ARIES's elegant resolution to this tension—a design that grants the buffer manager complete freedom while still providing full ACID guarantees. Understanding this policy is essential to grasping why ARIES became the dominant recovery paradigm in commercial database systems.

What You Will Learn

By the end of this page, you will understand the four possible combinations of buffer management policies (steal/no-steal, force/no-force), why ARIES chooses steal/no-force, the recovery implications of each choice, and how this design enables maximum I/O flexibility without sacrificing durability or atomicity.

Understanding Buffer Management Policies

Before diving into steal/no-force specifically, we must establish the conceptual framework of buffer management policies. These policies govern two critical decisions:

When can dirty pages be written to disk? — This determines whether uncommitted changes can appear on disk.
When must dirty pages be written to disk? — This determines whether committed changes must reach disk before commit returns.

The answers to these questions define four orthogonal policies that combine to create four possible configurations.

The Four Buffer Management Policies
Policy	Definition	Question Answered	Constraint Imposed
Steal	Allow flushing uncommitted pages	Can we write uncommitted data?	None—buffer manager has full freedom
No-Steal	Forbid flushing uncommitted pages	Can we write uncommitted data?	Dirty pages pinned until commit
Force	Require flushing at commit	Must committed data be on disk?	Commit waits for all pages to flush
No-Force	Allow delayed flushing	Must committed data be on disk?	None—commit can return immediately

The Steal Policy:

When the buffer pool is under memory pressure and needs to evict a page, the steal policy asks: "Can I evict this dirty page even if the transaction that modified it hasn't committed yet?"

Steal = Yes: The buffer manager can write any dirty page to disk at any time, regardless of transaction state. This maximizes flexibility but creates a problem: if the transaction later aborts, we've written uncommitted data to the database. Recovery must be prepared to undo these changes.
No-Steal = No: Dirty pages from uncommitted transactions are "pinned" in the buffer pool. They cannot be evicted until the transaction commits. This simplifies recovery (no undo needed for in-place updates) but can cause memory exhaustion if transactions modify many pages before committing.

The Force Policy:

At commit time, the force policy asks: "Must all pages modified by this transaction be flushed to disk before we acknowledge the commit?"

Force = Yes: Every dirty page touched by the transaction must be written to disk synchronously before commit returns. This guarantees durability trivially (data is on disk!) but imposes severe I/O overhead—every commit triggers potentially many random disk writes.
No-Force = No: The transaction can commit as soon as its log records are on stable storage. The actual data pages may remain in the buffer pool and be written later. This dramatically improves commit latency but requires the recovery system to redo committed changes that weren't yet on disk when a crash occurred.

The Key Insight

Steal requires UNDO capability in recovery (to reverse uncommitted changes that reached disk). No-Force requires REDO capability (to reinstall committed changes that didn't reach disk). ARIES supports both, enabling the optimal steal/no-force combination.

The Four Policy Combinations

The steal/no-steal and force/no-force policies are orthogonal—they can be combined in any of four ways. Each combination has distinct characteristics, trade-offs, and recovery requirements. Understanding all four illuminates why ARIES chose steal/no-force.

Policy Combinations and Their Characteristics
Combination	UNDO Needed?	REDO Needed?	Performance	Memory Pressure	Practical Assessment
No-Steal/Force	No	No	Poor	High	Simple recovery, unusable performance
No-Steal/No-Force	No	Yes	Moderate	High	Better performance, still memory-limited
Steal/Force	Yes	No	Moderate	Low	Better memory, still I/O-limited
Steal/No-Force	Yes	Yes	Excellent	Low	Maximum flexibility, most complex recovery

No-Steal/Force (Simplest, Slowest):

This is the naive approach that requires no recovery logic beyond "don't crash." Because we never write uncommitted data (no-steal) and always write committed data before returning (force):

Abort is trivial: just discard buffer pool pages
Crash recovery is trivial: database on disk is always in a consistent committed state

Unfortunately, this is practically unusable:

Large transactions can exhaust the buffer pool (no-steal pins too many pages)
Every commit incurs synchronous I/O to every modified page (force is expensive)
Random write patterns make matters worse on spinning disks

No production database uses this combination.

No-Steal/No-Force:

Relaxing the force requirement improves commit latency—we only need to flush log records, not data pages. However, we still can't evict uncommitted pages, so:

Long-running transactions with large write sets can still exhaust memory
We need REDO logic because committed changes may not be on disk at crash time
No UNDO needed because uncommitted changes never reach disk

This was used by some early systems but the memory limitation proved severely restricting for OLTP workloads.

Steal/Force:

Relaxing no-steal solves the memory problem—we can evict any page when needed. But forcing at commit still imposes severe I/O overhead:

We need UNDO logic because uncommitted changes may reach disk
No REDO needed because all committed changes are on disk before commit returns
Commit latency remains high due to synchronous page flushes

This combination is viable but suboptimal; commit remains I/O-bound.

Steal/No-Force (ARIES Choice):

By allowing both policies to maximize flexibility:

Buffer manager can evict any page at any time (steal → low memory pressure)
Commits only require log flush, not data page flush (no-force → low commit latency)
Recovery needs both UNDO (for uncommitted steals) and REDO (for uncommitted flushes)

This combination offers optimal buffer management but demands sophisticated recovery. ARIES provides exactly that sophistication.

Why Complexity Is Worth It

The complexity of UNDO+REDO recovery is a one-time engineering investment. The performance benefits of steal/no-force accrue on every transaction, every second, for the lifetime of the system. This is a classic engineering trade-off: upfront complexity for continuous benefit.

Deep Dive into the Steal Policy

The steal policy deserves detailed examination because its implications are subtle and its benefits are profound. When we say "steal," we mean the buffer manager can steal a page frame from a transaction that hasn't committed yet.

The Mechanics of Stealing:

Consider this scenario:

Transaction T₁ modifies page P, making it dirty
T₁ has not yet committed
The buffer pool is full, and transaction T₂ needs a page frame
The buffer manager's replacement policy (e.g., LRU) selects P for eviction

With no-steal, the buffer manager cannot evict P. It must find another victim or wait. If T₁ is a long-running transaction with many modifications, this can deadlock or exhaust memory.

With steal, the buffer manager writes P to disk and reuses the frame. T₁'s uncommitted changes are now on the physical database.

steal_policy_visualization.md
Timeline with Steal Policy:
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  T₁: BEGIN ──► MODIFY(P) ──► ... (long running) ...                │
│                    │                                                │
│                    ▼                                                │
│              Page P is dirty                                        │
│              (uncommitted)                                          │
│                    │                                                │
│   Buffer Pool Full │◄── T₂ requests new page                       │
│                    │                                                │
│                    ▼                                                │
│            ┌───────────────┐                                        │
│            │ STEAL PAGE P  │  ◄── Buffer manager evicts P          │
│            │ Write to disk │      despite T₁ uncommitted           │
│            └───────────────┘                                        │
│                    │                                                │
│                    ▼                                                │
│            Database on disk now                                     │
│            contains UNCOMMITTED data                                │
│                    │                                                │
│  Two possible outcomes:                                             │
│                    │                                                │
│         ┌─────────┴─────────┐                                       │
│         ▼                   ▼                                       │
│    T₁ COMMITS          T₁ ABORTS (or CRASH)                        │
│    (data stays)        (MUST UNDO changes)                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

The Recovery Implication:

If T₁ aborts after a steal, or if the system crashes before T₁ commits, the database on disk contains changes that should never have been visible. ARIES must undo these changes:

For abort: ARIES reads T₁'s log records in reverse order, applying the undo information to restore original values. This works because the log contains the before images (original values before modification).
For crash recovery: During the UNDO phase, ARIES identifies all transactions that were active at crash time (from the analysis phase) and rolls them back using the same logic.

The key requirement is that the log must reach stable storage before any stolen page. This is the Write-Ahead Logging (WAL) protocol, which ARIES rigorously enforces.

WAL Enables Steal

Without WAL, the steal policy would be unsafe. If a dirty page were written to disk before its log record, and a crash occurred, we'd have no way to undo the change. WAL guarantees that undo information is always available before the change reaches disk.

Benefits of the Steal Policy:

Memory Efficiency: The buffer pool never becomes hostage to long-running transactions. Any page can be evicted when memory is needed.
Predictable Performance: Memory pressure triggers page eviction, not transaction failure. The system degrades gracefully under load.
Flexibility in Transaction Size: Transactions can modify more pages than fit in the buffer pool. This is essential for bulk operations.
Simpler Buffer Manager: The replacement algorithm doesn't need to understand transaction boundaries. It just picks victims based on access patterns.
Reduced Lock Contention: Pages aren't held in memory artificially, reducing the window where one transaction blocks another waiting for buffer space.

Deep Dive into the No-Force Policy

The no-force policy provides equally dramatic benefits, this time for commit latency. Understanding no-force requires examining what happens at commit time under each strategy.

Force Policy Commit:

When a transaction commits under a force policy:

Identify all pages modified by this transaction
For each dirty page: perform synchronous write to disk
Wait for all writes to complete
Only then return success to the application

If a transaction modified 100 pages scattered across the disk, commit requires 100 random I/O operations. On a spinning disk with 10ms seek time, that's one second just for the commit. Under heavy concurrency with multiplexed transactions, this becomes catastrophic.

No-Force Policy Commit:

When a transaction commits under no-force:

Append commit record to the log
Flush log to stable storage (typically a sequential write)
Return success to the application immediately
Dirty pages remain in buffer pool, eventually written by background I/O

The critical insight: logging is sequential, data pages are random. Sequential I/O is orders of magnitude faster than random I/O on spinning disks, and still significantly faster on SSDs due to reduced write amplification.

A commit that took 1 second under force might take 1ms under no-force (the time to flush a few log pages).

force_vs_noforce_comparison.md
Commit Latency Comparison:
 
┌─────────────────────────────────────────────────────────────────────┐
│                      FORCE POLICY                                   │
├─────────────────────────────────────────────────────────────────────┤
│  Transaction T modifies pages: P1, P2, P3, ..., P100                │
│                                                                     │
│  COMMIT:                                                            │
│    ├──► Write P1 to disk (10ms)                                     │
│    ├──► Write P2 to disk (10ms)                                     │
│    ├──► Write P3 to disk (10ms)                                     │
│    │        ...                                                     │
│    └──► Write P100 to disk (10ms)                                   │
│                                                                     │
│  Total: ~1000ms (1 second) for random I/O                           │
│  Throughput: 1 transaction/second per disk                          │
└─────────────────────────────────────────────────────────────────────┘
 
┌─────────────────────────────────────────────────────────────────────┐
│                     NO-FORCE POLICY                                 │
├─────────────────────────────────────────────────────────────────────┤
│  Transaction T modifies pages: P1, P2, P3, ..., P100                │
│                                                                     │
│  COMMIT:                                                            │
│    └──► Flush log records to disk (sequential, ~1ms)                │
│                                                                     │
│  Background (asynchronous, batched):                                │
│    └──► Write dirty pages as buffer manager sees fit                │
│                                                                     │
│  Total: ~1ms for commit response                                    │
│  Throughput: 1000+ transactions/second per disk                     │
└─────────────────────────────────────────────────────────────────────┘
 
Performance improvement: 1000x in this scenario

The Recovery Implication:

With no-force, committed transactions may have their changes only in the log, not on data pages. If the system crashes:

The commit record is in the log (that's why the commit was acknowledged)
Some or all of the transaction's data page changes may not be on disk
Recovery must redo these changes to restore the committed state

ARIES's REDO phase handles this by replaying all logged operations from the earliest relevant point, bringing the database forward to its state at crash time.

Log Is the Source of Truth

Under no-force, the log becomes the authoritative record of committed state. The database files are merely a cache that eventually catches up. This inversion of the traditional view—database as primary, log as backup—is fundamental to ARIES's design.

Benefits of the No-Force Policy:

Low Commit Latency: Commits complete in milliseconds regardless of how many pages were modified.
Sequential Log Writes: Group commit can batch multiple transactions' log records into a single I/O, amortizing disk access across many transactions.
Reduced Random I/O: Data pages are written when convenient, not when transactions demand it. The buffer manager can batch and optimize writes.
Better Write Patterns: Rather than writing pages immediately after each modification, pages accumulate multiple modifications before being written, reducing total I/O.
Higher Throughput: Systems can sustain thousands of transactions per second on hardware that would support only tens under force.

Combined Benefits and Synergies

The steal/no-force combination creates synergistic benefits beyond what each policy provides individually. The two policies complement each other, and ARIES exploits this synergy masterfully.

Synergistic Benefits of Steal/No-Force

•Unified I/O Optimization: Both stealing and no-force allow the buffer manager to batch I/O intelligently. Multiple transactions' changes to the same page are written once, not once per transaction.
•Decoupled Transaction Progress: Transaction execution, commit, and I/O are fully decoupled. Transactions can proceed without waiting for I/O; commits don't block on writes; I/O proceeds based on system-optimal scheduling.
•Memory-I/O Balance: Steal handles memory pressure; no-force handles I/O pressure. Even under combined memory and I/O stress, the system degrades gracefully.
•Write Absorption: A "hot" page modified by many transactions is written to disk only periodically, not once per transaction. This dramatically reduces total I/O for update-heavy workloads.
•Checkpoint Efficiency: Checkpoints can proceed incrementally (fuzzy checkpoints) because there's no assumption that all pages are forced to disk at checkpoint time.

Write Absorption in Detail:

Consider a counter page that's incremented by 1000 transactions per second:

With Force: 1000 writes per second to this page (each commit forces the page)
With No-Force + Steal: Perhaps 1 write per second (page is updated in memory, written when evicted or checkpointed)

This is a 1000x reduction in I/O for hot pages. In real workloads with many hot pages, the total I/O savings are enormous.

The formula for I/O reduction is approximately:

I/O Reduction Factor ≈ (transactions modifying page P) / (evictions of page P)

For frequently-accessed pages that remain in the buffer pool, this factor can be thousands or millions.

Real-World Impact

Modern databases like PostgreSQL, MySQL/InnoDB, Oracle, and SQL Server all use steal/no-force policies. This isn't academic theory—it's the only policy combination that can achieve production-grade transaction throughput on commodity hardware.

Implementation Considerations

Implementing steal/no-force requires careful coordination between the buffer manager, log manager, and recovery subsystem. Here are the critical implementation requirements:

Critical Implementation Requirements

•LSN Tracking on Pages: Every data page must record the LSN of the last log record that modified it. This pageLSN allows REDO to skip already-applied operations.
•WAL Protocol Enforcement: Before any dirty page is written to disk, all log records up to its pageLSN must be flushed. The buffer manager must coordinate with the log manager.
•Dirty Page Table: The system must track which pages are dirty and their recLSNs (the LSN when the page first became dirty). This determines the REDO starting point.
•Transaction Table: Active transactions and their lastLSNs must be tracked for UNDO purposes.
•Log Record Format: Each log record must contain both redo and undo information—the 'after image' and 'before image' of modifications.
•Compensation Logging: When undoing operations, Compensation Log Records (CLRs) must be written to ensure undo operations are not repeated after a crash during recovery.

wal_coordination.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Before writing a dirty page to disk (steal operation):
function prepareToEvictPage(page: Page) {
    // WAL Protocol: Log must reach disk before data
    logManager.flushUpToLSN(page.pageLSN);
    
    // Now safe to write the page
    diskManager.writePage(page);
}
 
// At commit time (no-force operation):
function commitTransaction(txn: Transaction) {
    // Write commit record to log
    commitLSN = logManager.appendCommitRecord(txn.id);
    
    // Flush log to stable storage
    // This is the ONLY synchronous I/O required!
    logManager.flushUpToLSN(commitLSN);
    
    // Transaction is now durably committed
    // Data pages may still be in buffer pool - that's OK
    return COMMIT_SUCCESS;
    
    // Note: Dirty pages will be written later by:
    // - Buffer manager eviction (steal)
    // - Background checkpoint process  
    // - Natural buffer pool turnover
}

The PageLSN Invariant:

The pageLSN recorded on each page serves multiple purposes:

Recovery Optimization: During REDO, if a page's on-disk pageLSN ≥ the log record's LSN, that log record's effect is already on disk and can be skipped.
WAL Enforcement: The buffer manager checks that logManager.flushedLSN ≥ page.pageLSN before writing any page.
Consistency Verification: If a page's pageLSN is greater than expected, it indicates corruption or a recovery bug.

This single value—the LSN of the last modification—provides all necessary coordination between the buffer manager and recovery system.

The LSN is Everything

In ARIES, the Log Sequence Number (LSN) is the fundamental ordering mechanism. It provides total ordering of all database modifications, enables comparison between log and disk state, and coordinates all subsystems. Master the LSN and you master ARIES.

Trade-offs and Design Considerations

While steal/no-force is overwhelmingly advantageous, it's important to understand the trade-offs and edge cases where other policies might be considered.

Advantages

•Maximum I/O flexibility
•Minimal commit latency
•Efficient memory utilization
•Graceful degradation under load
•Supports very large transactions
•Enables group commit optimization
•Compatible with fuzzy checkpoints

Considerations

•Complex recovery implementation
•Requires both UNDO and REDO logic
•Log becomes critical path (must not fail)
•Recovery time proportional to work since last checkpoint
•CLRs add logging overhead during aborts
•Requires careful LSN coordination
•More difficult to understand and debug

When Simpler Policies Might Apply:

Read-only or rarely-updated data: If updates are rare, the cost of force might be acceptable for simpler recovery.
Small buffer pools relative to data size: If eviction is constant anyway, the steal benefit diminishes slightly (but no-force still helps).
Embedded systems with limited code space: The simpler no-steal/force logic might fit where ARIES wouldn't.
Append-only systems: Some designs avoid in-place updates entirely, making the question moot.

However, for any general-purpose OLTP database, steal/no-force is essentially mandatory for acceptable performance.

Recovery Time Trade-off:

One genuine trade-off: with no-force, committed changes may only be in the log for extended periods. If a crash occurs:

More REDO work may be needed to bring pages up to date
Recovery time increases with time since the last checkpoint

This is mitigated by:

Frequent checkpoints: Bound the recovery window
Fuzzy checkpoints: Minimize checkpoint overhead (discussed in later pages)
SSD storage: Fast random reads make REDO quick
Parallel recovery: Modern systems parallelize REDO across multiple threads

Optimize for the Common Case

Crashes are rare; normal operation is constant. Steal/no-force optimizes for the common case (every transaction, every second) at the cost of slightly longer handling for the rare case (crash recovery, hopefully never). This is sound engineering.

Summary: The Steal/No-Force Foundation

We've established the conceptual foundation for understanding ARIES's buffer management philosophy. The steal/no-force policy combination is not an arbitrary choice—it's the result of careful analysis of the trade-offs between runtime performance and recovery complexity.

Key Takeaways

•Steal allows uncommitted pages to be written to disk, requiring UNDO capability but freeing the buffer manager from memory pressure.
•No-Force allows commits without synchronous page writes, requiring REDO capability but achieving millisecond commit latency.
•Combined, these policies deliver maximum I/O flexibility and throughput at the cost of more sophisticated recovery logic.
•WAL (Write-Ahead Logging) is the enabling protocol that makes steal safe—log records always reach disk before data pages.
•ARIES provides the UNDO and REDO mechanisms that make steal/no-force practical and robust.
•Modern databases universally adopt steal/no-force because no other combination can achieve adequate performance.

What's Next:

The steal/no-force policy requires sophisticated logging to track both undo and redo information. The next page examines physiological logging—ARIES's innovative approach that combines physical and logical logging to achieve both efficiency and simplicity in recording database modifications.

Page Complete

You now understand the steal/no-force buffer management policies that form the foundation of ARIES's design. This knowledge is essential for understanding how ARIES achieves both maximum performance and correct recovery. Next, we'll explore how ARIES logs modifications through physiological logging.

1 / 5

Loading learning content...

Database Management SystemsARIES Features

ARIES Advanced Features and Optimizations

LevelAdvanced

Duration90 mins

TopicARIES Features

1 / 5

Steal/No-Force Policy

The Buffer Management Dilemma

What You Will Learn

Understanding Buffer Management Policies

Before diving into steal/no-force specifically, we must establish the conceptual framework of buffer management policies. These policies govern two critical decisions:

When can dirty pages be written to disk? — This determines whether uncommitted changes can appear on disk.
When must dirty pages be written to disk? — This determines whether committed changes must reach disk before commit returns.

The answers to these questions define four orthogonal policies that combine to create four possible configurations.

The Four Buffer Management Policies
Policy	Definition	Question Answered	Constraint Imposed
Steal	Allow flushing uncommitted pages	Can we write uncommitted data?	None—buffer manager has full freedom
No-Steal	Forbid flushing uncommitted pages	Can we write uncommitted data?	Dirty pages pinned until commit
Force	Require flushing at commit	Must committed data be on disk?	Commit waits for all pages to flush
No-Force	Allow delayed flushing	Must committed data be on disk?	None—commit can return immediately

The Steal Policy:

When the buffer pool is under memory pressure and needs to evict a page, the steal policy asks: "Can I evict this dirty page even if the transaction that modified it hasn't committed yet?"

Steal = Yes: The buffer manager can write any dirty page to disk at any time, regardless of transaction state. This maximizes flexibility but creates a problem: if the transaction later aborts, we've written uncommitted data to the database. Recovery must be prepared to undo these changes.
No-Steal = No: Dirty pages from uncommitted transactions are "pinned" in the buffer pool. They cannot be evicted until the transaction commits. This simplifies recovery (no undo needed for in-place updates) but can cause memory exhaustion if transactions modify many pages before committing.

The Force Policy:

At commit time, the force policy asks: "Must all pages modified by this transaction be flushed to disk before we acknowledge the commit?"

Force = Yes: Every dirty page touched by the transaction must be written to disk synchronously before commit returns. This guarantees durability trivially (data is on disk!) but imposes severe I/O overhead—every commit triggers potentially many random disk writes.
No-Force = No: The transaction can commit as soon as its log records are on stable storage. The actual data pages may remain in the buffer pool and be written later. This dramatically improves commit latency but requires the recovery system to redo committed changes that weren't yet on disk when a crash occurred.

The Key Insight

The Four Policy Combinations

Policy Combinations and Their Characteristics
Combination	UNDO Needed?	REDO Needed?	Performance	Memory Pressure	Practical Assessment
No-Steal/Force	No	No	Poor	High	Simple recovery, unusable performance
No-Steal/No-Force	No	Yes	Moderate	High	Better performance, still memory-limited
Steal/Force	Yes	No	Moderate	Low	Better memory, still I/O-limited
Steal/No-Force	Yes	Yes	Excellent	Low	Maximum flexibility, most complex recovery

No-Steal/Force (Simplest, Slowest):

This is the naive approach that requires no recovery logic beyond "don't crash." Because we never write uncommitted data (no-steal) and always write committed data before returning (force):

Abort is trivial: just discard buffer pool pages
Crash recovery is trivial: database on disk is always in a consistent committed state

Unfortunately, this is practically unusable:

Large transactions can exhaust the buffer pool (no-steal pins too many pages)
Every commit incurs synchronous I/O to every modified page (force is expensive)
Random write patterns make matters worse on spinning disks

No production database uses this combination.

No-Steal/No-Force:

Relaxing the force requirement improves commit latency—we only need to flush log records, not data pages. However, we still can't evict uncommitted pages, so:

Long-running transactions with large write sets can still exhaust memory
We need REDO logic because committed changes may not be on disk at crash time
No UNDO needed because uncommitted changes never reach disk

This was used by some early systems but the memory limitation proved severely restricting for OLTP workloads.

Steal/Force:

Relaxing no-steal solves the memory problem—we can evict any page when needed. But forcing at commit still imposes severe I/O overhead:

We need UNDO logic because uncommitted changes may reach disk
No REDO needed because all committed changes are on disk before commit returns
Commit latency remains high due to synchronous page flushes

This combination is viable but suboptimal; commit remains I/O-bound.

Steal/No-Force (ARIES Choice):

By allowing both policies to maximize flexibility:

Buffer manager can evict any page at any time (steal → low memory pressure)
Commits only require log flush, not data page flush (no-force → low commit latency)
Recovery needs both UNDO (for uncommitted steals) and REDO (for uncommitted flushes)

This combination offers optimal buffer management but demands sophisticated recovery. ARIES provides exactly that sophistication.

Why Complexity Is Worth It

Deep Dive into the Steal Policy

The Mechanics of Stealing:

Consider this scenario:

Transaction T₁ modifies page P, making it dirty
T₁ has not yet committed
The buffer pool is full, and transaction T₂ needs a page frame
The buffer manager's replacement policy (e.g., LRU) selects P for eviction

With no-steal, the buffer manager cannot evict P. It must find another victim or wait. If T₁ is a long-running transaction with many modifications, this can deadlock or exhaust memory.

With steal, the buffer manager writes P to disk and reuses the frame. T₁'s uncommitted changes are now on the physical database.

steal_policy_visualization.md
Timeline with Steal Policy:
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  T₁: BEGIN ──► MODIFY(P) ──► ... (long running) ...                │
│                    │                                                │
│                    ▼                                                │
│              Page P is dirty                                        │
│              (uncommitted)                                          │
│                    │                                                │
│   Buffer Pool Full │◄── T₂ requests new page                       │
│                    │                                                │
│                    ▼                                                │
│            ┌───────────────┐                                        │
│            │ STEAL PAGE P  │  ◄── Buffer manager evicts P          │
│            │ Write to disk │      despite T₁ uncommitted           │
│            └───────────────┘                                        │
│                    │                                                │
│                    ▼                                                │
│            Database on disk now                                     │
│            contains UNCOMMITTED data                                │
│                    │                                                │
│  Two possible outcomes:                                             │
│                    │                                                │
│         ┌─────────┴─────────┐                                       │
│         ▼                   ▼                                       │
│    T₁ COMMITS          T₁ ABORTS (or CRASH)                        │
│    (data stays)        (MUST UNDO changes)                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

The Recovery Implication:

If T₁ aborts after a steal, or if the system crashes before T₁ commits, the database on disk contains changes that should never have been visible. ARIES must undo these changes:

For abort: ARIES reads T₁'s log records in reverse order, applying the undo information to restore original values. This works because the log contains the before images (original values before modification).
For crash recovery: During the UNDO phase, ARIES identifies all transactions that were active at crash time (from the analysis phase) and rolls them back using the same logic.

The key requirement is that the log must reach stable storage before any stolen page. This is the Write-Ahead Logging (WAL) protocol, which ARIES rigorously enforces.

WAL Enables Steal

Benefits of the Steal Policy:

Memory Efficiency: The buffer pool never becomes hostage to long-running transactions. Any page can be evicted when memory is needed.
Predictable Performance: Memory pressure triggers page eviction, not transaction failure. The system degrades gracefully under load.
Flexibility in Transaction Size: Transactions can modify more pages than fit in the buffer pool. This is essential for bulk operations.
Simpler Buffer Manager: The replacement algorithm doesn't need to understand transaction boundaries. It just picks victims based on access patterns.
Reduced Lock Contention: Pages aren't held in memory artificially, reducing the window where one transaction blocks another waiting for buffer space.

Deep Dive into the No-Force Policy

The no-force policy provides equally dramatic benefits, this time for commit latency. Understanding no-force requires examining what happens at commit time under each strategy.

Force Policy Commit:

When a transaction commits under a force policy:

Identify all pages modified by this transaction
For each dirty page: perform synchronous write to disk
Wait for all writes to complete
Only then return success to the application

No-Force Policy Commit:

When a transaction commits under no-force:

Append commit record to the log
Flush log to stable storage (typically a sequential write)
Return success to the application immediately
Dirty pages remain in buffer pool, eventually written by background I/O

A commit that took 1 second under force might take 1ms under no-force (the time to flush a few log pages).

force_vs_noforce_comparison.md
Commit Latency Comparison:
 
┌─────────────────────────────────────────────────────────────────────┐
│                      FORCE POLICY                                   │
├─────────────────────────────────────────────────────────────────────┤
│  Transaction T modifies pages: P1, P2, P3, ..., P100                │
│                                                                     │
│  COMMIT:                                                            │
│    ├──► Write P1 to disk (10ms)                                     │
│    ├──► Write P2 to disk (10ms)                                     │
│    ├──► Write P3 to disk (10ms)                                     │
│    │        ...                                                     │
│    └──► Write P100 to disk (10ms)                                   │
│                                                                     │
│  Total: ~1000ms (1 second) for random I/O                           │
│  Throughput: 1 transaction/second per disk                          │
└─────────────────────────────────────────────────────────────────────┘
 
┌─────────────────────────────────────────────────────────────────────┐
│                     NO-FORCE POLICY                                 │
├─────────────────────────────────────────────────────────────────────┤
│  Transaction T modifies pages: P1, P2, P3, ..., P100                │
│                                                                     │
│  COMMIT:                                                            │
│    └──► Flush log records to disk (sequential, ~1ms)                │
│                                                                     │
│  Background (asynchronous, batched):                                │
│    └──► Write dirty pages as buffer manager sees fit                │
│                                                                     │
│  Total: ~1ms for commit response                                    │
│  Throughput: 1000+ transactions/second per disk                     │
└─────────────────────────────────────────────────────────────────────┘
 
Performance improvement: 1000x in this scenario

The Recovery Implication:

With no-force, committed transactions may have their changes only in the log, not on data pages. If the system crashes:

The commit record is in the log (that's why the commit was acknowledged)
Some or all of the transaction's data page changes may not be on disk
Recovery must redo these changes to restore the committed state

ARIES's REDO phase handles this by replaying all logged operations from the earliest relevant point, bringing the database forward to its state at crash time.

Log Is the Source of Truth

Benefits of the No-Force Policy:

Low Commit Latency: Commits complete in milliseconds regardless of how many pages were modified.
Sequential Log Writes: Group commit can batch multiple transactions' log records into a single I/O, amortizing disk access across many transactions.
Reduced Random I/O: Data pages are written when convenient, not when transactions demand it. The buffer manager can batch and optimize writes.
Better Write Patterns: Rather than writing pages immediately after each modification, pages accumulate multiple modifications before being written, reducing total I/O.
Higher Throughput: Systems can sustain thousands of transactions per second on hardware that would support only tens under force.

Combined Benefits and Synergies

The steal/no-force combination creates synergistic benefits beyond what each policy provides individually. The two policies complement each other, and ARIES exploits this synergy masterfully.

Synergistic Benefits of Steal/No-Force

•Unified I/O Optimization: Both stealing and no-force allow the buffer manager to batch I/O intelligently. Multiple transactions' changes to the same page are written once, not once per transaction.
•Decoupled Transaction Progress: Transaction execution, commit, and I/O are fully decoupled. Transactions can proceed without waiting for I/O; commits don't block on writes; I/O proceeds based on system-optimal scheduling.
•Memory-I/O Balance: Steal handles memory pressure; no-force handles I/O pressure. Even under combined memory and I/O stress, the system degrades gracefully.
•Write Absorption: A "hot" page modified by many transactions is written to disk only periodically, not once per transaction. This dramatically reduces total I/O for update-heavy workloads.
•Checkpoint Efficiency: Checkpoints can proceed incrementally (fuzzy checkpoints) because there's no assumption that all pages are forced to disk at checkpoint time.

Write Absorption in Detail:

Consider a counter page that's incremented by 1000 transactions per second:

With Force: 1000 writes per second to this page (each commit forces the page)
With No-Force + Steal: Perhaps 1 write per second (page is updated in memory, written when evicted or checkpointed)

This is a 1000x reduction in I/O for hot pages. In real workloads with many hot pages, the total I/O savings are enormous.

The formula for I/O reduction is approximately:

I/O Reduction Factor ≈ (transactions modifying page P) / (evictions of page P)

For frequently-accessed pages that remain in the buffer pool, this factor can be thousands or millions.

Real-World Impact

Implementation Considerations

Implementing steal/no-force requires careful coordination between the buffer manager, log manager, and recovery subsystem. Here are the critical implementation requirements:

Critical Implementation Requirements

•LSN Tracking on Pages: Every data page must record the LSN of the last log record that modified it. This pageLSN allows REDO to skip already-applied operations.
•WAL Protocol Enforcement: Before any dirty page is written to disk, all log records up to its pageLSN must be flushed. The buffer manager must coordinate with the log manager.
•Dirty Page Table: The system must track which pages are dirty and their recLSNs (the LSN when the page first became dirty). This determines the REDO starting point.
•Transaction Table: Active transactions and their lastLSNs must be tracked for UNDO purposes.
•Log Record Format: Each log record must contain both redo and undo information—the 'after image' and 'before image' of modifications.
•Compensation Logging: When undoing operations, Compensation Log Records (CLRs) must be written to ensure undo operations are not repeated after a crash during recovery.

wal_coordination.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Before writing a dirty page to disk (steal operation):
function prepareToEvictPage(page: Page) {
    // WAL Protocol: Log must reach disk before data
    logManager.flushUpToLSN(page.pageLSN);
    
    // Now safe to write the page
    diskManager.writePage(page);
}
 
// At commit time (no-force operation):
function commitTransaction(txn: Transaction) {
    // Write commit record to log
    commitLSN = logManager.appendCommitRecord(txn.id);
    
    // Flush log to stable storage
    // This is the ONLY synchronous I/O required!
    logManager.flushUpToLSN(commitLSN);
    
    // Transaction is now durably committed
    // Data pages may still be in buffer pool - that's OK
    return COMMIT_SUCCESS;
    
    // Note: Dirty pages will be written later by:
    // - Buffer manager eviction (steal)
    // - Background checkpoint process  
    // - Natural buffer pool turnover
}

The PageLSN Invariant:

The pageLSN recorded on each page serves multiple purposes:

Recovery Optimization: During REDO, if a page's on-disk pageLSN ≥ the log record's LSN, that log record's effect is already on disk and can be skipped.
WAL Enforcement: The buffer manager checks that logManager.flushedLSN ≥ page.pageLSN before writing any page.
Consistency Verification: If a page's pageLSN is greater than expected, it indicates corruption or a recovery bug.

This single value—the LSN of the last modification—provides all necessary coordination between the buffer manager and recovery system.

The LSN is Everything

Trade-offs and Design Considerations

While steal/no-force is overwhelmingly advantageous, it's important to understand the trade-offs and edge cases where other policies might be considered.

Advantages

•Maximum I/O flexibility
•Minimal commit latency
•Efficient memory utilization
•Graceful degradation under load
•Supports very large transactions
•Enables group commit optimization
•Compatible with fuzzy checkpoints

Considerations

•Complex recovery implementation
•Requires both UNDO and REDO logic
•Log becomes critical path (must not fail)
•Recovery time proportional to work since last checkpoint
•CLRs add logging overhead during aborts
•Requires careful LSN coordination
•More difficult to understand and debug

When Simpler Policies Might Apply:

Read-only or rarely-updated data: If updates are rare, the cost of force might be acceptable for simpler recovery.
Small buffer pools relative to data size: If eviction is constant anyway, the steal benefit diminishes slightly (but no-force still helps).
Embedded systems with limited code space: The simpler no-steal/force logic might fit where ARIES wouldn't.
Append-only systems: Some designs avoid in-place updates entirely, making the question moot.

However, for any general-purpose OLTP database, steal/no-force is essentially mandatory for acceptable performance.

Recovery Time Trade-off:

One genuine trade-off: with no-force, committed changes may only be in the log for extended periods. If a crash occurs:

More REDO work may be needed to bring pages up to date
Recovery time increases with time since the last checkpoint

This is mitigated by:

Frequent checkpoints: Bound the recovery window
Fuzzy checkpoints: Minimize checkpoint overhead (discussed in later pages)
SSD storage: Fast random reads make REDO quick
Parallel recovery: Modern systems parallelize REDO across multiple threads

Optimize for the Common Case

Summary: The Steal/No-Force Foundation

Key Takeaways

•Steal allows uncommitted pages to be written to disk, requiring UNDO capability but freeing the buffer manager from memory pressure.
•No-Force allows commits without synchronous page writes, requiring REDO capability but achieving millisecond commit latency.
•Combined, these policies deliver maximum I/O flexibility and throughput at the cost of more sophisticated recovery logic.
•WAL (Write-Ahead Logging) is the enabling protocol that makes steal safe—log records always reach disk before data pages.
•ARIES provides the UNDO and REDO mechanisms that make steal/no-force practical and robust.
•Modern databases universally adopt steal/no-force because no other combination can achieve adequate performance.

What's Next:

Page Complete

1 / 5