Database Management SystemsLog-Based Recovery

Log-Based Recovery

LevelIntermediate

Duration60 mins

TopicLog-Based Recovery

3 / 5

Log Types

The Choice That Shapes Recovery

Not all transaction logs are created equal. While every log records database modifications, the way modifications are logged and when data can be written to disk varies dramatically between systems. These choices define the logging type or logging strategy used by a database system.

The logging type isn't just an implementation detail—it fundamentally affects:

When transactions can commit
When dirty pages can be written to disk
How recovery works after a crash
The overall system performance

Understanding the three primary logging types—Undo-Only, Redo-Only, and Undo-Redo—is essential for understanding how different database systems approach the recovery problem and why certain performance trade-offs exist.

What You Will Learn

By the end of this page, you will understand the three primary logging strategies (undo-only, redo-only, undo-redo), the constraints each places on when data can be written to disk, how recovery differs for each approach, and why most modern systems use the undo-redo approach.

The Fundamental Trade-off: Steal vs. No-Steal, Force vs. No-Force

Before diving into specific log types, we must understand two fundamental policy choices that every database must make regarding when modified data pages can be written to disk:

Policy 1: STEAL vs. NO-STEAL

STEAL: The buffer manager may write a dirty page to disk before the transaction that modified it commits. This "steals" the buffer frame for other uses.
NO-STEAL: The buffer manager must not write a dirty page until all transactions that modified it have committed.

Policy 2: FORCE vs. NO-FORCE

FORCE: When a transaction commits, all pages it modified must be written to disk before the commit completes.
NO-FORCE: A transaction can commit even if its modified pages haven't been written to disk yet (the log guarantees durability).

These policies directly determine what kind of logging is required:

Buffer Management Policies and Logging Requirements
	FORCE	NO-FORCE
NO-STEAL	No logging needed (impractical)	Redo-only logging required
STEAL	Undo-only logging required	Undo-redo logging required

Why These Policies Matter:

STEAL is essential for memory efficiency: Without STEAL, all modified pages must remain in memory until their transactions commit. For long-running transactions, this could exhaust the buffer pool.
NO-FORCE is essential for performance: With FORCE, every commit requires random disk I/O to write all modified pages—potentially dozens of pages scattered across the disk. This makes commits extremely slow.

The ideal combination for performance is STEAL + NO-FORCE, which minimizes memory pressure and commit latency. But this combination requires the most sophisticated logging: undo-redo logging. Let's examine each approach in detail.

The Practical Reality

Modern production databases (PostgreSQL, MySQL/InnoDB, Oracle, SQL Server) all use STEAL + NO-FORCE with undo-redo logging. The alternatives are either impractical (NO-STEAL, which exhausts memory) or too slow (FORCE, which makes commits expensive).

Undo-Only Logging

Undo-only logging is the simplest logging strategy. Each log record contains only the before image—the value of data before modification. This is sufficient to undo uncommitted changes, but provides no ability to redo committed changes.

The Rules of Undo-Only Logging:

Before modifying data: Write the undo log record (containing the before image) to disk
Before commit: Force ALL modified pages to disk (FORCE policy required)
On recovery: Undo all changes from uncommitted transactions using before images

Why FORCE is Required:

With only before images, we can only undo—we cannot redo. If a transaction commits but its data pages weren't on disk, and then we crash, we have no way to restore those committed changes. Therefore, commits must wait for all data pages to be flushed (FORCE).

Undo-Only Log Record
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// Undo-Only log records contain only before images
 
struct UndoOnlyLogRecord {
    uint64_t    lsn;
    uint32_t    transaction_id;
    uint64_t    prev_lsn;
    uint8_t     record_type;        // UPDATE, INSERT, DELETE
    
    // === Location ===
    uint32_t    page_id;
    uint16_t    offset;
    
    // === Only Before Image (for undo) ===
    uint16_t    before_length;
    byte[]      before_image;       // Old value - enables undo
    
    // NO after_image - cannot redo!
}
 
// The Undo-Only Commit Protocol:
// 
// 1. T1 modifies Page A: Log(before_image), then modify in buffer
// 2. T1 modifies Page B: Log(before_image), then modify in buffer
// 3. T1 modifies Page C: Log(before_image), then modify in buffer
// 4. T1 requests COMMIT
// 5. FORCE: Flush Page A, Page B, Page C to disk (WAIT!)
// 6. Write COMMIT record
// 7. Acknowledge commit to application
//
// If crash after step 5 but before step 6:
//   - Pages are on disk, but no COMMIT
//   - Recovery will UNDO all T1's changes using before images
//
// If crash after step 6:
//   - COMMIT record on disk, pages on disk
//   - Recovery: T1 is committed, no action needed

Advantages

•Simple to implement
•Smaller log records (no after images)
•Simple recovery logic (only undo)
•Committed data guaranteed on disk

Disadvantages

•Slow commits (must flush all pages)
•Random I/O at commit time
•Transaction response time suffers
•Impractical for high throughput

The Commit Bottleneck

In undo-only logging, the FORCE requirement makes every commit wait for potentially dozens of random disk writes. Consider a transaction modifying 100 pages across the disk—the commit must wait for all 100 to flush. This is why undo-only logging is rarely used in production systems.

Redo-Only Logging

Redo-only logging takes the opposite approach: log records contain only the after image—the new value of data after modification. This enables redo of committed transactions but provides no ability to undo uncommitted changes.

The Rules of Redo-Only Logging:

Before commit: Write all redo log records (containing after images) to disk
Restriction: Do NOT write modified pages to disk until the transaction commits (NO-STEAL)
After commit: Pages may be written to disk at any time
On recovery: Redo changes from committed transactions using after images

Why NO-STEAL is Required:

With only after images, we can redo but cannot undo. If an uncommitted transaction's dirty page is written to disk and then we crash, that uncommitted change is permanently in the database with no way to undo it. Therefore, uncommitted pages must never reach disk (NO-STEAL).

Redo-Only Log Record
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Redo-Only log records contain only after images
 
struct RedoOnlyLogRecord {
    uint64_t    lsn;
    uint32_t    transaction_id;
    uint64_t    prev_lsn;
    uint8_t     record_type;        // UPDATE, INSERT, DELETE
    
    // === Location ===
    uint32_t    page_id;
    uint16_t    offset;
    
    // === Only After Image (for redo) ===
    uint16_t    after_length;
    byte[]      after_image;        // New value - enables redo
    
    // NO before_image - cannot undo!
}
 
// The Redo-Only Commit Protocol:
// 
// 1. T1 modifies Page A in buffer only (DO NOT flush!)
// 2. T1 modifies Page B in buffer only (DO NOT flush!)
// 3. T1 modifies Page C in buffer only (DO NOT flush!)
// 4. T1 requests COMMIT
// 5. Flush all T1's log records to disk (including COMMIT)
// 6. Acknowledge commit to application
// 7. Now pages A, B, C may be flushed (at any time)
//
// If crash after step 3 but before step 5:
//   - Pages not on disk (NO-STEAL enforced)
//   - No COMMIT record, T1 treated as never happened
//   - Recovery: Nothing to do for T1
//
// If crash after step 5 but before step 7:
//   - COMMIT on disk, but pages might not be
//   - Recovery: REDO T1's changes using after images

Advantages

•Fast commits (sequential log writes only)
•No page flushes required at commit
•Simple recovery (only redo, no undo)
•Pages can be written lazily

Disadvantages

•NO-STEAL exhausts buffer memory
•Long transactions lock buffer frames
•Buffer pool pressure for large workloads
•Memory crisis with many concurrent transactions

The Memory Crisis

Consider a transaction that modifies 10,000 pages before committing. With NO-STEAL, all 10,000 pages must remain in the buffer pool. If the buffer pool is only 8,000 frames, the transaction cannot complete. Worse, multiple concurrent transactions compound this pressure, making NO-STEAL impractical for most workloads.

Undo-Redo Logging: The Best of Both Worlds

Undo-redo logging (also called ARIES-style logging or simply WAL in practice) combines both before and after images in log records. This enables complete flexibility: uncommitted pages can be written to disk at any time (STEAL), and committed transactions don't require page flushes (NO-FORCE).

The Rules of Undo-Redo Logging:

Before modifying data in buffer: Write the log record (with both before AND after images) to disk
Modified pages: May be written to disk at any time, before or after commit
Before commit: Ensure all log records (including COMMIT) are on disk
On recovery: Both redo and undo are possible using the full information in the log

Undo-Redo Log Record
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Undo-Redo log records contain BOTH images
 
struct UndoRedoLogRecord {
    uint64_t    lsn;
    uint32_t    transaction_id;
    uint64_t    prev_lsn;
    uint8_t     record_type;        // UPDATE, INSERT, DELETE
    
    // === Location ===
    uint32_t    page_id;
    uint16_t    offset;
    
    // === Before Image (for undo) ===
    uint16_t    before_length;
    byte[]      before_image;       // Enables undo of uncommitted changes
    
    // === After Image (for redo) ===
    uint16_t    after_length;
    byte[]      after_image;        // Enables redo of committed changes
}
 
// The Undo-Redo Protocol (Maximum Flexibility):
// 
// 1. T1 modifies Page A: Log(before, after), then modify in buffer
//    - Page A CAN be flushed immediately (STEAL OK)
// 2. T1 modifies Page B: Log(before, after), then modify in buffer
//    - Page B CAN be flushed immediately (STEAL OK)
// 3. T1 requests COMMIT
// 4. Flush log through COMMIT record (pages need NOT be flushed)
// 5. Acknowledge commit immediately (NO-FORCE)
// 6. Pages A, B flushed whenever convenient
//
// Recovery handles ALL scenarios:
// - Crash with committed T1, pages flushed: No action needed
// - Crash with committed T1, pages not flushed: REDO using after images
// - Crash with uncommitted T1, pages flushed: UNDO using before images
// - Crash with uncommitted T1, pages not flushed: No action needed

Why Undo-Redo Wins:

With both before and after images, the recovery system can handle any possible state:

Transaction State	Page State	Recovery Action
Committed	On disk	Nothing (already correct)
Committed	Not on disk	REDO using after image
Not committed	On disk	UNDO using before image
Not committed	Not on disk	Nothing (changes lost, as expected)

This flexibility enables:

STEAL: Free buffer frames by writing dirty pages whenever needed
NO-FORCE: Commit with just a log flush (fast!)
Maximum concurrency: No blocking for page flushes
Optimal I/O: Pages written in batches, reducing random I/O

The Industry Standard

Virtually all modern relational databases use undo-redo logging: PostgreSQL, MySQL/InnoDB, Oracle, SQL Server, DB2, and more. The slightly larger log records are a small price for the flexibility and performance benefits.

Detailed Comparison of Logging Types

Let's systematically compare all three logging approaches across multiple dimensions to fully understand their trade-offs:

Comprehensive Logging Type Comparison
Dimension	Undo-Only	Redo-Only	Undo-Redo
Stored Images	Before only	After only	Both before and after
Buffer Policy	STEAL + FORCE	NO-STEAL + NO-FORCE	STEAL + NO-FORCE
Commit Speed	Slow (page flushes)	Fast (log only)	Fast (log only)
Buffer Pressure	Low	High	Low
Log Record Size	Smaller	Smaller	Larger
Recovery Actions	Undo only	Redo only	Redo then Undo
Recovery Complexity	Simple	Simple	Complex
Production Use	Rare	Rare	Universal

Performance Implications:

Commit Latency:

Undo-only: Worst (must flush all modified pages)
Redo-only: Best (only log records)
Undo-redo: Best (only log records)

Memory Efficiency:

Undo-only: Good (can write pages freely)
Redo-only: Poor (must retain uncommitted pages)
Undo-redo: Good (can write pages freely)

I/O Pattern:

Undo-only: Random writes at commit (bad)
Redo-only: Opportunistic writes (good)
Undo-redo: Opportunistic writes (good)

Log Space:

Undo-only: Smallest
Redo-only: Small
Undo-redo: Largest (both images)

The Log Space Trade-off

Undo-redo logging uses approximately 2x the log space of undo-only or redo-only approaches (storing both images). However, disk space is cheap, while commit latency and buffer pressure are expensive. Modern systems happily trade log space for performance.

Recovery Procedures by Log Type

Each logging type requires a different recovery procedure. Understanding these procedures clarifies why the logging rules exist:

Undo-Only Recovery:

Undo-Only Recovery Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Undo-Only Recovery
// Since FORCE is used, all committed data is on disk
// Only need to undo uncommitted transactions
 
function recoverUndoOnly():
    // Step 1: Scan forward to find all uncommitted transactions
    uncommitted = new Set()
    for record in log.from_beginning():
        if record.type == BEGIN:
            uncommitted.add(record.transaction_id)
        else if record.type == COMMIT or ABORT:
            uncommitted.remove(record.transaction_id)
    
    // Step 2: Scan backward, undoing uncommitted work
    for record in log.from_end_to_beginning():
        if record.transaction_id in uncommitted:
            if record.type in [UPDATE, INSERT, DELETE]:
                // Apply before image (undo)
                page = buffer.getPage(record.page_id)
                page.applyBeforeImage(record.before_image)
                page.flush()
    
    // Step 3: Write ABORT records for uncommitted transactions
    for txn_id in uncommitted:
        log.writeAbort(txn_id)

Redo-Only Recovery:

Redo-Only Recovery Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Redo-Only Recovery
// Since NO-STEAL is used, uncommitted data is NOT on disk
// Only need to redo committed transactions
 
function recoverRedoOnly():
    // Step 1: Find all committed transactions
    committed = new Set()
    for record in log.from_beginning():
        if record.type == COMMIT:
            committed.add(record.transaction_id)
    
    // Step 2: Scan forward, redoing committed work
    for record in log.from_beginning():
        if record.transaction_id in committed:
            if record.type in [UPDATE, INSERT, DELETE]:
                page = buffer.getPage(record.page_id)
                // Check if redo needed (page may already be current)
                if page.lsn < record.lsn:
                    // Apply after image (redo)
                    page.applyAfterImage(record.after_image)
                    page.setLSN(record.lsn)
    
    // Uncommitted transactions: no action needed
    // (their pages were never written due to NO-STEAL)

Undo-Redo Recovery (ARIES-style):

Undo-Redo Recovery Algorithm (Simplified ARIES)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
// Undo-Redo Recovery (Three-Phase ARIES-style)
// Must handle all combinations: committed/uncommitted × on disk/not on disk
 
function recoverUndoRedo():
    // PHASE 1: ANALYSIS
    // Determine: dirty pages, active transactions at crash
    active_txns = {}
    dirty_pages = {}
    
    for record in log.from_checkpoint():
        if record.type == BEGIN:
            active_txns[record.txn_id] = {state: ACTIVE, lastLSN: record.lsn}
        else if record.type == COMMIT:
            active_txns[record.txn_id].state = COMMITTED
        else if record.type == ABORT:
            active_txns[record.txn_id].state = ABORTED
        else if record.type in [UPDATE, INSERT, DELETE]:
            active_txns[record.txn_id].lastLSN = record.lsn
            if record.page_id not in dirty_pages:
                dirty_pages[record.page_id] = record.lsn  // first dirtying LSN
    
    // PHASE 2: REDO (repeat history)
    // Apply ALL logged operations (committed or not) if page is behind
    redo_start = min(dirty_pages.values())
    
    for record in log.from(redo_start):
        if record.type in [UPDATE, INSERT, DELETE, CLR]:
            page = buffer.getPage(record.page_id)
            if page.lsn < record.lsn:
                page.applyAfterImage(record.after_image)
                page.setLSN(record.lsn)
    
    // PHASE 3: UNDO (rollback losers)
    // Undo all transactions that were active (not committed) at crash
    losers = [txn for txn in active_txns if txn.state == ACTIVE]
    
    undo_list = PriorityQueue()  // ordered by LSN descending
    for txn in losers:
        undo_list.add(txn.lastLSN)
    
    while not undo_list.empty():
        lsn = undo_list.pop()
        record = log.getRecord(lsn)
        
        if record.type in [UPDATE, INSERT, DELETE]:
            // Undo this operation
            page = buffer.getPage(record.page_id)
            page.applyBeforeImage(record.before_image)
            
            // Write CLR so we don't repeat this undo
            clr = writeCLR(record, undoNextLSN=record.prev_lsn)
            
            if record.prev_lsn != NULL:
                undo_list.add(record.prev_lsn)
        
        else if record.type == CLR:
            // This undo was already done, skip to next
            if record.undo_next_lsn != NULL:
                undo_list.add(record.undo_next_lsn)
    
    // Write ABORT records for all loser transactions
    for txn in losers:
        log.writeAbort(txn.id)

Why Redo Before Undo?

ARIES-style recovery redoes ALL operations first (even uncommitted ones) to restore the exact database state at crash time. Then it undoes uncommitted work. This 'repeat history' approach ensures that physiological logging and complex operations work correctly—the database state must be exactly as it was for before-images to be meaningful.

Physical vs. Logical Logging

Orthogonal to the undo/redo type distinction is another important classification: physical vs. logical logging.

Physical Logging:

Records the actual byte-level changes to pages
Before image: exact bytes being replaced
After image: exact bytes replacing them
Can be applied mechanically without understanding the data
Larger log records, but simpler recovery

Logical Logging:

Records the operation being performed
Example: "INSERT INTO accounts VALUES (101, 'Smith', 5000)"
Smaller log records, but requires more context to replay
Recovery must re-execute the operation, not just copy bytes

Physiological Logging (The Hybrid):

Most modern systems use physiological logging: physically identify the affected page, but logically describe the change within that page.

Logging Granularity Comparison
Approach	Identifies Page	Describes Change	Record Size	Complexity
Pure Physical	By disk block address	Byte-level diff	Large	Simple redo/undo
Pure Logical	By table/key	SQL-like operation	Small	Complex re-execution
Physiological	By page ID	Slot-level change	Medium	Balanced

Example: UPDATE accounts SET balance = 500 WHERE id = 101

Pure Physical Log:

[Page 42, offset 1724-1732: replace 0x000003E8 with 0x000001F4]

Pure Logical Log:

[UPDATE accounts SET balance = 500 WHERE id = 101]

Physiological Log:

[Page 42, slot 15, column 'balance': before=1000, after=500]

Physiological logging provides the best trade-off: page-level identification (simple addressing), but logical change description (compact, meaningful). This approach also handles page reorganizations gracefully—the slot number identifies the record even if it moves within the page.

B-Tree and Index Logging

Index operations often use pure physical logging because their structure-level changes (splits, merges) are difficult to express logically. The log might simply record 'page 847 replaced by pages 847+1023' during a B-tree split, with full before/after page images.

Summary: Choosing the Right Logging Strategy

We've examined the three primary logging strategies and understood why undo-redo logging dominates in production systems. Let's consolidate the key insights:

Key Takeaways

•Logging type is determined by buffer management policies — STEAL/NO-STEAL and FORCE/NO-FORCE determine what logging is required.
•Undo-only logging requires FORCE — All pages flushed at commit; simple but slow commits.
•Redo-only logging requires NO-STEAL — Uncommitted pages stay in memory; fast commits but memory pressure.
•Undo-redo logging enables STEAL + NO-FORCE — Maximum flexibility with fast commits and low memory pressure.
•Modern databases universally use undo-redo — The larger log records are worth the performance benefits.
•Recovery procedures differ by logging type — Undo-only undoes, redo-only redoes, undo-redo does both.
•Physiological logging is the practical choice — Balances log size and recovery simplicity.

What's Next:

Now that we understand the types of logging and why undo-redo dominates, we'll examine log storage—how logs are physically organized on disk, how they're managed for space efficiency, and how archival and truncation work in production systems.

Page Complete

You now understand the three logging strategies, their constraints, and why undo-redo with STEAL/NO-FORCE is the universal choice for production databases. Next, we'll explore how these logs are physically stored and managed.

3 / 5

Loading learning content...

Database Management SystemsLog-Based Recovery

Log-Based Recovery

LevelIntermediate

Duration60 mins

TopicLog-Based Recovery

3 / 5

Log Types

The Choice That Shapes Recovery

The logging type isn't just an implementation detail—it fundamentally affects:

When transactions can commit
When dirty pages can be written to disk
How recovery works after a crash
The overall system performance

What You Will Learn

The Fundamental Trade-off: Steal vs. No-Steal, Force vs. No-Force

Before diving into specific log types, we must understand two fundamental policy choices that every database must make regarding when modified data pages can be written to disk:

Policy 1: STEAL vs. NO-STEAL

STEAL: The buffer manager may write a dirty page to disk before the transaction that modified it commits. This "steals" the buffer frame for other uses.
NO-STEAL: The buffer manager must not write a dirty page until all transactions that modified it have committed.

Policy 2: FORCE vs. NO-FORCE

FORCE: When a transaction commits, all pages it modified must be written to disk before the commit completes.
NO-FORCE: A transaction can commit even if its modified pages haven't been written to disk yet (the log guarantees durability).

These policies directly determine what kind of logging is required:

Buffer Management Policies and Logging Requirements
	FORCE	NO-FORCE
NO-STEAL	No logging needed (impractical)	Redo-only logging required
STEAL	Undo-only logging required	Undo-redo logging required

Why These Policies Matter:

STEAL is essential for memory efficiency: Without STEAL, all modified pages must remain in memory until their transactions commit. For long-running transactions, this could exhaust the buffer pool.
NO-FORCE is essential for performance: With FORCE, every commit requires random disk I/O to write all modified pages—potentially dozens of pages scattered across the disk. This makes commits extremely slow.

The Practical Reality

Undo-Only Logging

The Rules of Undo-Only Logging:

Before modifying data: Write the undo log record (containing the before image) to disk
Before commit: Force ALL modified pages to disk (FORCE policy required)
On recovery: Undo all changes from uncommitted transactions using before images

Why FORCE is Required:

Undo-Only Log Record
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// Undo-Only log records contain only before images
 
struct UndoOnlyLogRecord {
    uint64_t    lsn;
    uint32_t    transaction_id;
    uint64_t    prev_lsn;
    uint8_t     record_type;        // UPDATE, INSERT, DELETE
    
    // === Location ===
    uint32_t    page_id;
    uint16_t    offset;
    
    // === Only Before Image (for undo) ===
    uint16_t    before_length;
    byte[]      before_image;       // Old value - enables undo
    
    // NO after_image - cannot redo!
}
 
// The Undo-Only Commit Protocol:
// 
// 1. T1 modifies Page A: Log(before_image), then modify in buffer
// 2. T1 modifies Page B: Log(before_image), then modify in buffer
// 3. T1 modifies Page C: Log(before_image), then modify in buffer
// 4. T1 requests COMMIT
// 5. FORCE: Flush Page A, Page B, Page C to disk (WAIT!)
// 6. Write COMMIT record
// 7. Acknowledge commit to application
//
// If crash after step 5 but before step 6:
//   - Pages are on disk, but no COMMIT
//   - Recovery will UNDO all T1's changes using before images
//
// If crash after step 6:
//   - COMMIT record on disk, pages on disk
//   - Recovery: T1 is committed, no action needed

Advantages

•Simple to implement
•Smaller log records (no after images)
•Simple recovery logic (only undo)
•Committed data guaranteed on disk

Disadvantages

•Slow commits (must flush all pages)
•Random I/O at commit time
•Transaction response time suffers
•Impractical for high throughput

The Commit Bottleneck

Redo-Only Logging

The Rules of Redo-Only Logging:

Before commit: Write all redo log records (containing after images) to disk
Restriction: Do NOT write modified pages to disk until the transaction commits (NO-STEAL)
After commit: Pages may be written to disk at any time
On recovery: Redo changes from committed transactions using after images

Why NO-STEAL is Required:

Redo-Only Log Record
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Redo-Only log records contain only after images
 
struct RedoOnlyLogRecord {
    uint64_t    lsn;
    uint32_t    transaction_id;
    uint64_t    prev_lsn;
    uint8_t     record_type;        // UPDATE, INSERT, DELETE
    
    // === Location ===
    uint32_t    page_id;
    uint16_t    offset;
    
    // === Only After Image (for redo) ===
    uint16_t    after_length;
    byte[]      after_image;        // New value - enables redo
    
    // NO before_image - cannot undo!
}
 
// The Redo-Only Commit Protocol:
// 
// 1. T1 modifies Page A in buffer only (DO NOT flush!)
// 2. T1 modifies Page B in buffer only (DO NOT flush!)
// 3. T1 modifies Page C in buffer only (DO NOT flush!)
// 4. T1 requests COMMIT
// 5. Flush all T1's log records to disk (including COMMIT)
// 6. Acknowledge commit to application
// 7. Now pages A, B, C may be flushed (at any time)
//
// If crash after step 3 but before step 5:
//   - Pages not on disk (NO-STEAL enforced)
//   - No COMMIT record, T1 treated as never happened
//   - Recovery: Nothing to do for T1
//
// If crash after step 5 but before step 7:
//   - COMMIT on disk, but pages might not be
//   - Recovery: REDO T1's changes using after images

Advantages

•Fast commits (sequential log writes only)
•No page flushes required at commit
•Simple recovery (only redo, no undo)
•Pages can be written lazily

Disadvantages

•NO-STEAL exhausts buffer memory
•Long transactions lock buffer frames
•Buffer pool pressure for large workloads
•Memory crisis with many concurrent transactions

The Memory Crisis

Undo-Redo Logging: The Best of Both Worlds

The Rules of Undo-Redo Logging:

Before modifying data in buffer: Write the log record (with both before AND after images) to disk
Modified pages: May be written to disk at any time, before or after commit
Before commit: Ensure all log records (including COMMIT) are on disk
On recovery: Both redo and undo are possible using the full information in the log

Undo-Redo Log Record
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Undo-Redo log records contain BOTH images
 
struct UndoRedoLogRecord {
    uint64_t    lsn;
    uint32_t    transaction_id;
    uint64_t    prev_lsn;
    uint8_t     record_type;        // UPDATE, INSERT, DELETE
    
    // === Location ===
    uint32_t    page_id;
    uint16_t    offset;
    
    // === Before Image (for undo) ===
    uint16_t    before_length;
    byte[]      before_image;       // Enables undo of uncommitted changes
    
    // === After Image (for redo) ===
    uint16_t    after_length;
    byte[]      after_image;        // Enables redo of committed changes
}
 
// The Undo-Redo Protocol (Maximum Flexibility):
// 
// 1. T1 modifies Page A: Log(before, after), then modify in buffer
//    - Page A CAN be flushed immediately (STEAL OK)
// 2. T1 modifies Page B: Log(before, after), then modify in buffer
//    - Page B CAN be flushed immediately (STEAL OK)
// 3. T1 requests COMMIT
// 4. Flush log through COMMIT record (pages need NOT be flushed)
// 5. Acknowledge commit immediately (NO-FORCE)
// 6. Pages A, B flushed whenever convenient
//
// Recovery handles ALL scenarios:
// - Crash with committed T1, pages flushed: No action needed
// - Crash with committed T1, pages not flushed: REDO using after images
// - Crash with uncommitted T1, pages flushed: UNDO using before images
// - Crash with uncommitted T1, pages not flushed: No action needed

Why Undo-Redo Wins:

With both before and after images, the recovery system can handle any possible state:

Transaction State	Page State	Recovery Action
Committed	On disk	Nothing (already correct)
Committed	Not on disk	REDO using after image
Not committed	On disk	UNDO using before image
Not committed	Not on disk	Nothing (changes lost, as expected)

This flexibility enables:

STEAL: Free buffer frames by writing dirty pages whenever needed
NO-FORCE: Commit with just a log flush (fast!)
Maximum concurrency: No blocking for page flushes
Optimal I/O: Pages written in batches, reducing random I/O

The Industry Standard

Detailed Comparison of Logging Types

Let's systematically compare all three logging approaches across multiple dimensions to fully understand their trade-offs:

Comprehensive Logging Type Comparison
Dimension	Undo-Only	Redo-Only	Undo-Redo
Stored Images	Before only	After only	Both before and after
Buffer Policy	STEAL + FORCE	NO-STEAL + NO-FORCE	STEAL + NO-FORCE
Commit Speed	Slow (page flushes)	Fast (log only)	Fast (log only)
Buffer Pressure	Low	High	Low
Log Record Size	Smaller	Smaller	Larger
Recovery Actions	Undo only	Redo only	Redo then Undo
Recovery Complexity	Simple	Simple	Complex
Production Use	Rare	Rare	Universal

Performance Implications:

Commit Latency:

Undo-only: Worst (must flush all modified pages)
Redo-only: Best (only log records)
Undo-redo: Best (only log records)

Memory Efficiency:

Undo-only: Good (can write pages freely)
Redo-only: Poor (must retain uncommitted pages)
Undo-redo: Good (can write pages freely)

I/O Pattern:

Undo-only: Random writes at commit (bad)
Redo-only: Opportunistic writes (good)
Undo-redo: Opportunistic writes (good)

Log Space:

Undo-only: Smallest
Redo-only: Small
Undo-redo: Largest (both images)

The Log Space Trade-off

Recovery Procedures by Log Type

Each logging type requires a different recovery procedure. Understanding these procedures clarifies why the logging rules exist:

Undo-Only Recovery:

Undo-Only Recovery Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Undo-Only Recovery
// Since FORCE is used, all committed data is on disk
// Only need to undo uncommitted transactions
 
function recoverUndoOnly():
    // Step 1: Scan forward to find all uncommitted transactions
    uncommitted = new Set()
    for record in log.from_beginning():
        if record.type == BEGIN:
            uncommitted.add(record.transaction_id)
        else if record.type == COMMIT or ABORT:
            uncommitted.remove(record.transaction_id)
    
    // Step 2: Scan backward, undoing uncommitted work
    for record in log.from_end_to_beginning():
        if record.transaction_id in uncommitted:
            if record.type in [UPDATE, INSERT, DELETE]:
                // Apply before image (undo)
                page = buffer.getPage(record.page_id)
                page.applyBeforeImage(record.before_image)
                page.flush()
    
    // Step 3: Write ABORT records for uncommitted transactions
    for txn_id in uncommitted:
        log.writeAbort(txn_id)

Redo-Only Recovery:

Redo-Only Recovery Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Redo-Only Recovery
// Since NO-STEAL is used, uncommitted data is NOT on disk
// Only need to redo committed transactions
 
function recoverRedoOnly():
    // Step 1: Find all committed transactions
    committed = new Set()
    for record in log.from_beginning():
        if record.type == COMMIT:
            committed.add(record.transaction_id)
    
    // Step 2: Scan forward, redoing committed work
    for record in log.from_beginning():
        if record.transaction_id in committed:
            if record.type in [UPDATE, INSERT, DELETE]:
                page = buffer.getPage(record.page_id)
                // Check if redo needed (page may already be current)
                if page.lsn < record.lsn:
                    // Apply after image (redo)
                    page.applyAfterImage(record.after_image)
                    page.setLSN(record.lsn)
    
    // Uncommitted transactions: no action needed
    // (their pages were never written due to NO-STEAL)

Undo-Redo Recovery (ARIES-style):

Undo-Redo Recovery Algorithm (Simplified ARIES)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
// Undo-Redo Recovery (Three-Phase ARIES-style)
// Must handle all combinations: committed/uncommitted × on disk/not on disk
 
function recoverUndoRedo():
    // PHASE 1: ANALYSIS
    // Determine: dirty pages, active transactions at crash
    active_txns = {}
    dirty_pages = {}
    
    for record in log.from_checkpoint():
        if record.type == BEGIN:
            active_txns[record.txn_id] = {state: ACTIVE, lastLSN: record.lsn}
        else if record.type == COMMIT:
            active_txns[record.txn_id].state = COMMITTED
        else if record.type == ABORT:
            active_txns[record.txn_id].state = ABORTED
        else if record.type in [UPDATE, INSERT, DELETE]:
            active_txns[record.txn_id].lastLSN = record.lsn
            if record.page_id not in dirty_pages:
                dirty_pages[record.page_id] = record.lsn  // first dirtying LSN
    
    // PHASE 2: REDO (repeat history)
    // Apply ALL logged operations (committed or not) if page is behind
    redo_start = min(dirty_pages.values())
    
    for record in log.from(redo_start):
        if record.type in [UPDATE, INSERT, DELETE, CLR]:
            page = buffer.getPage(record.page_id)
            if page.lsn < record.lsn:
                page.applyAfterImage(record.after_image)
                page.setLSN(record.lsn)
    
    // PHASE 3: UNDO (rollback losers)
    // Undo all transactions that were active (not committed) at crash
    losers = [txn for txn in active_txns if txn.state == ACTIVE]
    
    undo_list = PriorityQueue()  // ordered by LSN descending
    for txn in losers:
        undo_list.add(txn.lastLSN)
    
    while not undo_list.empty():
        lsn = undo_list.pop()
        record = log.getRecord(lsn)
        
        if record.type in [UPDATE, INSERT, DELETE]:
            // Undo this operation
            page = buffer.getPage(record.page_id)
            page.applyBeforeImage(record.before_image)
            
            // Write CLR so we don't repeat this undo
            clr = writeCLR(record, undoNextLSN=record.prev_lsn)
            
            if record.prev_lsn != NULL:
                undo_list.add(record.prev_lsn)
        
        else if record.type == CLR:
            // This undo was already done, skip to next
            if record.undo_next_lsn != NULL:
                undo_list.add(record.undo_next_lsn)
    
    // Write ABORT records for all loser transactions
    for txn in losers:
        log.writeAbort(txn.id)

Why Redo Before Undo?

Physical vs. Logical Logging

Orthogonal to the undo/redo type distinction is another important classification: physical vs. logical logging.

Physical Logging:

Records the actual byte-level changes to pages
Before image: exact bytes being replaced
After image: exact bytes replacing them
Can be applied mechanically without understanding the data
Larger log records, but simpler recovery

Logical Logging:

Records the operation being performed
Example: "INSERT INTO accounts VALUES (101, 'Smith', 5000)"
Smaller log records, but requires more context to replay
Recovery must re-execute the operation, not just copy bytes

Physiological Logging (The Hybrid):

Most modern systems use physiological logging: physically identify the affected page, but logically describe the change within that page.

Logging Granularity Comparison
Approach	Identifies Page	Describes Change	Record Size	Complexity
Pure Physical	By disk block address	Byte-level diff	Large	Simple redo/undo
Pure Logical	By table/key	SQL-like operation	Small	Complex re-execution
Physiological	By page ID	Slot-level change	Medium	Balanced

Example: UPDATE accounts SET balance = 500 WHERE id = 101

Pure Physical Log:

[Page 42, offset 1724-1732: replace 0x000003E8 with 0x000001F4]

Pure Logical Log:

[UPDATE accounts SET balance = 500 WHERE id = 101]

Physiological Log:

[Page 42, slot 15, column 'balance': before=1000, after=500]

B-Tree and Index Logging

Summary: Choosing the Right Logging Strategy

We've examined the three primary logging strategies and understood why undo-redo logging dominates in production systems. Let's consolidate the key insights:

Key Takeaways

•Logging type is determined by buffer management policies — STEAL/NO-STEAL and FORCE/NO-FORCE determine what logging is required.
•Undo-only logging requires FORCE — All pages flushed at commit; simple but slow commits.
•Redo-only logging requires NO-STEAL — Uncommitted pages stay in memory; fast commits but memory pressure.
•Undo-redo logging enables STEAL + NO-FORCE — Maximum flexibility with fast commits and low memory pressure.
•Modern databases universally use undo-redo — The larger log records are worth the performance benefits.
•Recovery procedures differ by logging type — Undo-only undoes, redo-only redoes, undo-redo does both.
•Physiological logging is the practical choice — Balances log size and recovery simplicity.

What's Next:

Page Complete

3 / 5