Loading learning content...
Not all transaction logs are created equal. While every log records database modifications, the way modifications are logged and when data can be written to disk varies dramatically between systems. These choices define the logging type or logging strategy used by a database system.
The logging type isn't just an implementation detail—it fundamentally affects:
Understanding the three primary logging types—Undo-Only, Redo-Only, and Undo-Redo—is essential for understanding how different database systems approach the recovery problem and why certain performance trade-offs exist.
By the end of this page, you will understand the three primary logging strategies (undo-only, redo-only, undo-redo), the constraints each places on when data can be written to disk, how recovery differs for each approach, and why most modern systems use the undo-redo approach.
Before diving into specific log types, we must understand two fundamental policy choices that every database must make regarding when modified data pages can be written to disk:
Policy 1: STEAL vs. NO-STEAL
Policy 2: FORCE vs. NO-FORCE
These policies directly determine what kind of logging is required:
| FORCE | NO-FORCE | |
|---|---|---|
| NO-STEAL | No logging needed (impractical) | Redo-only logging required |
| STEAL | Undo-only logging required | Undo-redo logging required |
Why These Policies Matter:
STEAL is essential for memory efficiency: Without STEAL, all modified pages must remain in memory until their transactions commit. For long-running transactions, this could exhaust the buffer pool.
NO-FORCE is essential for performance: With FORCE, every commit requires random disk I/O to write all modified pages—potentially dozens of pages scattered across the disk. This makes commits extremely slow.
The ideal combination for performance is STEAL + NO-FORCE, which minimizes memory pressure and commit latency. But this combination requires the most sophisticated logging: undo-redo logging. Let's examine each approach in detail.
Modern production databases (PostgreSQL, MySQL/InnoDB, Oracle, SQL Server) all use STEAL + NO-FORCE with undo-redo logging. The alternatives are either impractical (NO-STEAL, which exhausts memory) or too slow (FORCE, which makes commits expensive).
Undo-only logging is the simplest logging strategy. Each log record contains only the before image—the value of data before modification. This is sufficient to undo uncommitted changes, but provides no ability to redo committed changes.
The Rules of Undo-Only Logging:
Why FORCE is Required:
With only before images, we can only undo—we cannot redo. If a transaction commits but its data pages weren't on disk, and then we crash, we have no way to restore those committed changes. Therefore, commits must wait for all data pages to be flushed (FORCE).
123456789101112131415161718192021222324252627282930313233343536
// Undo-Only log records contain only before images struct UndoOnlyLogRecord { uint64_t lsn; uint32_t transaction_id; uint64_t prev_lsn; uint8_t record_type; // UPDATE, INSERT, DELETE // === Location === uint32_t page_id; uint16_t offset; // === Only Before Image (for undo) === uint16_t before_length; byte[] before_image; // Old value - enables undo // NO after_image - cannot redo!} // The Undo-Only Commit Protocol:// // 1. T1 modifies Page A: Log(before_image), then modify in buffer// 2. T1 modifies Page B: Log(before_image), then modify in buffer// 3. T1 modifies Page C: Log(before_image), then modify in buffer// 4. T1 requests COMMIT// 5. FORCE: Flush Page A, Page B, Page C to disk (WAIT!)// 6. Write COMMIT record// 7. Acknowledge commit to application//// If crash after step 5 but before step 6:// - Pages are on disk, but no COMMIT// - Recovery will UNDO all T1's changes using before images//// If crash after step 6:// - COMMIT record on disk, pages on disk// - Recovery: T1 is committed, no action neededIn undo-only logging, the FORCE requirement makes every commit wait for potentially dozens of random disk writes. Consider a transaction modifying 100 pages across the disk—the commit must wait for all 100 to flush. This is why undo-only logging is rarely used in production systems.
Redo-only logging takes the opposite approach: log records contain only the after image—the new value of data after modification. This enables redo of committed transactions but provides no ability to undo uncommitted changes.
The Rules of Redo-Only Logging:
Why NO-STEAL is Required:
With only after images, we can redo but cannot undo. If an uncommitted transaction's dirty page is written to disk and then we crash, that uncommitted change is permanently in the database with no way to undo it. Therefore, uncommitted pages must never reach disk (NO-STEAL).
12345678910111213141516171819202122232425262728293031323334353637
// Redo-Only log records contain only after images struct RedoOnlyLogRecord { uint64_t lsn; uint32_t transaction_id; uint64_t prev_lsn; uint8_t record_type; // UPDATE, INSERT, DELETE // === Location === uint32_t page_id; uint16_t offset; // === Only After Image (for redo) === uint16_t after_length; byte[] after_image; // New value - enables redo // NO before_image - cannot undo!} // The Redo-Only Commit Protocol:// // 1. T1 modifies Page A in buffer only (DO NOT flush!)// 2. T1 modifies Page B in buffer only (DO NOT flush!)// 3. T1 modifies Page C in buffer only (DO NOT flush!)// 4. T1 requests COMMIT// 5. Flush all T1's log records to disk (including COMMIT)// 6. Acknowledge commit to application// 7. Now pages A, B, C may be flushed (at any time)//// If crash after step 3 but before step 5:// - Pages not on disk (NO-STEAL enforced)// - No COMMIT record, T1 treated as never happened// - Recovery: Nothing to do for T1//// If crash after step 5 but before step 7:// - COMMIT on disk, but pages might not be// - Recovery: REDO T1's changes using after imagesConsider a transaction that modifies 10,000 pages before committing. With NO-STEAL, all 10,000 pages must remain in the buffer pool. If the buffer pool is only 8,000 frames, the transaction cannot complete. Worse, multiple concurrent transactions compound this pressure, making NO-STEAL impractical for most workloads.
Undo-redo logging (also called ARIES-style logging or simply WAL in practice) combines both before and after images in log records. This enables complete flexibility: uncommitted pages can be written to disk at any time (STEAL), and committed transactions don't require page flushes (NO-FORCE).
The Rules of Undo-Redo Logging:
12345678910111213141516171819202122232425262728293031323334353637
// Undo-Redo log records contain BOTH images struct UndoRedoLogRecord { uint64_t lsn; uint32_t transaction_id; uint64_t prev_lsn; uint8_t record_type; // UPDATE, INSERT, DELETE // === Location === uint32_t page_id; uint16_t offset; // === Before Image (for undo) === uint16_t before_length; byte[] before_image; // Enables undo of uncommitted changes // === After Image (for redo) === uint16_t after_length; byte[] after_image; // Enables redo of committed changes} // The Undo-Redo Protocol (Maximum Flexibility):// // 1. T1 modifies Page A: Log(before, after), then modify in buffer// - Page A CAN be flushed immediately (STEAL OK)// 2. T1 modifies Page B: Log(before, after), then modify in buffer// - Page B CAN be flushed immediately (STEAL OK)// 3. T1 requests COMMIT// 4. Flush log through COMMIT record (pages need NOT be flushed)// 5. Acknowledge commit immediately (NO-FORCE)// 6. Pages A, B flushed whenever convenient//// Recovery handles ALL scenarios:// - Crash with committed T1, pages flushed: No action needed// - Crash with committed T1, pages not flushed: REDO using after images// - Crash with uncommitted T1, pages flushed: UNDO using before images// - Crash with uncommitted T1, pages not flushed: No action neededWhy Undo-Redo Wins:
With both before and after images, the recovery system can handle any possible state:
| Transaction State | Page State | Recovery Action |
|---|---|---|
| Committed | On disk | Nothing (already correct) |
| Committed | Not on disk | REDO using after image |
| Not committed | On disk | UNDO using before image |
| Not committed | Not on disk | Nothing (changes lost, as expected) |
This flexibility enables:
Virtually all modern relational databases use undo-redo logging: PostgreSQL, MySQL/InnoDB, Oracle, SQL Server, DB2, and more. The slightly larger log records are a small price for the flexibility and performance benefits.
Let's systematically compare all three logging approaches across multiple dimensions to fully understand their trade-offs:
| Dimension | Undo-Only | Redo-Only | Undo-Redo |
|---|---|---|---|
| Stored Images | Before only | After only | Both before and after |
| Buffer Policy | STEAL + FORCE | NO-STEAL + NO-FORCE | STEAL + NO-FORCE |
| Commit Speed | Slow (page flushes) | Fast (log only) | Fast (log only) |
| Buffer Pressure | Low | High | Low |
| Log Record Size | Smaller | Smaller | Larger |
| Recovery Actions | Undo only | Redo only | Redo then Undo |
| Recovery Complexity | Simple | Simple | Complex |
| Production Use | Rare | Rare | Universal |
Performance Implications:
Commit Latency:
Memory Efficiency:
I/O Pattern:
Log Space:
Undo-redo logging uses approximately 2x the log space of undo-only or redo-only approaches (storing both images). However, disk space is cheap, while commit latency and buffer pressure are expensive. Modern systems happily trade log space for performance.
Each logging type requires a different recovery procedure. Understanding these procedures clarifies why the logging rules exist:
Undo-Only Recovery:
12345678910111213141516171819202122232425
// Undo-Only Recovery// Since FORCE is used, all committed data is on disk// Only need to undo uncommitted transactions function recoverUndoOnly(): // Step 1: Scan forward to find all uncommitted transactions uncommitted = new Set() for record in log.from_beginning(): if record.type == BEGIN: uncommitted.add(record.transaction_id) else if record.type == COMMIT or ABORT: uncommitted.remove(record.transaction_id) // Step 2: Scan backward, undoing uncommitted work for record in log.from_end_to_beginning(): if record.transaction_id in uncommitted: if record.type in [UPDATE, INSERT, DELETE]: // Apply before image (undo) page = buffer.getPage(record.page_id) page.applyBeforeImage(record.before_image) page.flush() // Step 3: Write ABORT records for uncommitted transactions for txn_id in uncommitted: log.writeAbort(txn_id)Redo-Only Recovery:
123456789101112131415161718192021222324
// Redo-Only Recovery// Since NO-STEAL is used, uncommitted data is NOT on disk// Only need to redo committed transactions function recoverRedoOnly(): // Step 1: Find all committed transactions committed = new Set() for record in log.from_beginning(): if record.type == COMMIT: committed.add(record.transaction_id) // Step 2: Scan forward, redoing committed work for record in log.from_beginning(): if record.transaction_id in committed: if record.type in [UPDATE, INSERT, DELETE]: page = buffer.getPage(record.page_id) // Check if redo needed (page may already be current) if page.lsn < record.lsn: // Apply after image (redo) page.applyAfterImage(record.after_image) page.setLSN(record.lsn) // Uncommitted transactions: no action needed // (their pages were never written due to NO-STEAL)Undo-Redo Recovery (ARIES-style):
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263
// Undo-Redo Recovery (Three-Phase ARIES-style)// Must handle all combinations: committed/uncommitted × on disk/not on disk function recoverUndoRedo(): // PHASE 1: ANALYSIS // Determine: dirty pages, active transactions at crash active_txns = {} dirty_pages = {} for record in log.from_checkpoint(): if record.type == BEGIN: active_txns[record.txn_id] = {state: ACTIVE, lastLSN: record.lsn} else if record.type == COMMIT: active_txns[record.txn_id].state = COMMITTED else if record.type == ABORT: active_txns[record.txn_id].state = ABORTED else if record.type in [UPDATE, INSERT, DELETE]: active_txns[record.txn_id].lastLSN = record.lsn if record.page_id not in dirty_pages: dirty_pages[record.page_id] = record.lsn // first dirtying LSN // PHASE 2: REDO (repeat history) // Apply ALL logged operations (committed or not) if page is behind redo_start = min(dirty_pages.values()) for record in log.from(redo_start): if record.type in [UPDATE, INSERT, DELETE, CLR]: page = buffer.getPage(record.page_id) if page.lsn < record.lsn: page.applyAfterImage(record.after_image) page.setLSN(record.lsn) // PHASE 3: UNDO (rollback losers) // Undo all transactions that were active (not committed) at crash losers = [txn for txn in active_txns if txn.state == ACTIVE] undo_list = PriorityQueue() // ordered by LSN descending for txn in losers: undo_list.add(txn.lastLSN) while not undo_list.empty(): lsn = undo_list.pop() record = log.getRecord(lsn) if record.type in [UPDATE, INSERT, DELETE]: // Undo this operation page = buffer.getPage(record.page_id) page.applyBeforeImage(record.before_image) // Write CLR so we don't repeat this undo clr = writeCLR(record, undoNextLSN=record.prev_lsn) if record.prev_lsn != NULL: undo_list.add(record.prev_lsn) else if record.type == CLR: // This undo was already done, skip to next if record.undo_next_lsn != NULL: undo_list.add(record.undo_next_lsn) // Write ABORT records for all loser transactions for txn in losers: log.writeAbort(txn.id)ARIES-style recovery redoes ALL operations first (even uncommitted ones) to restore the exact database state at crash time. Then it undoes uncommitted work. This 'repeat history' approach ensures that physiological logging and complex operations work correctly—the database state must be exactly as it was for before-images to be meaningful.
Orthogonal to the undo/redo type distinction is another important classification: physical vs. logical logging.
Physical Logging:
Logical Logging:
Physiological Logging (The Hybrid):
Most modern systems use physiological logging: physically identify the affected page, but logically describe the change within that page.
| Approach | Identifies Page | Describes Change | Record Size | Complexity |
|---|---|---|---|---|
| Pure Physical | By disk block address | Byte-level diff | Large | Simple redo/undo |
| Pure Logical | By table/key | SQL-like operation | Small | Complex re-execution |
| Physiological | By page ID | Slot-level change | Medium | Balanced |
Example: UPDATE accounts SET balance = 500 WHERE id = 101
Pure Physical Log:
[Page 42, offset 1724-1732: replace 0x000003E8 with 0x000001F4]
Pure Logical Log:
[UPDATE accounts SET balance = 500 WHERE id = 101]
Physiological Log:
[Page 42, slot 15, column 'balance': before=1000, after=500]
Physiological logging provides the best trade-off: page-level identification (simple addressing), but logical change description (compact, meaningful). This approach also handles page reorganizations gracefully—the slot number identifies the record even if it moves within the page.
Index operations often use pure physical logging because their structure-level changes (splits, merges) are difficult to express logically. The log might simply record 'page 847 replaced by pages 847+1023' during a B-tree split, with full before/after page images.
We've examined the three primary logging strategies and understood why undo-redo logging dominates in production systems. Let's consolidate the key insights:
What's Next:
Now that we understand the types of logging and why undo-redo dominates, we'll examine log storage—how logs are physically organized on disk, how they're managed for space efficiency, and how archival and truncation work in production systems.
You now understand the three logging strategies, their constraints, and why undo-redo with STEAL/NO-FORCE is the universal choice for production databases. Next, we'll explore how these logs are physically stored and managed.