The log is ARIES's source of truth—a persistent record of every modification that enables both undo and redo. But the log is not merely a sequence of bytes; it's a carefully structured repository of recovery information. Each log record type serves a specific purpose, contains particular fields, and relates to other records in defined ways.
Understanding log structure is essential for several reasons: it reveals what information ARIES needs and why, it clarifies how different record types support different recovery operations, and it explains the space and I/O tradeoffs inherent in logging strategies.
By the end of this page, you will understand: (1) the complete taxonomy of log record types in ARIES, (2) the structure and fields of each record type, (3) physical vs. logical vs. physiological logging, and (4) how log records interconnect to support recovery.
ARIES uses several types of log records, each serving a distinct purpose in the recovery ecosystem. Understanding these types is fundamental to grasping how recovery works.
Core Record Types
| Record Type | Purpose | Generated When | Recovery Use |
|---|---|---|---|
| UPDATE | Records a data modification | Data operation executes | Redo and undo operations |
| COMMIT | Marks transaction as committed | Transaction commits | Identify winner transactions |
| ABORT | Marks transaction as aborting | Transaction aborts | Trigger undo processing |
| CLR (Compensation) | Records an undo action | During rollback/recovery | Redo the undo; track undo progress |
| END | Marks transaction completion | After commit or full rollback | Remove from transaction table |
| BEGIN | Marks transaction start | Transaction starts (optional) | Debugging/auditing |
| CHECKPOINT | Captures system state snapshot | Periodically by system | Bound recovery scan |
Record Type Categories
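We can categorize these records by their role: data records (UPDATE, CLR), transaction state records (BEGIN, COMMIT, ABORT, END), and system records (CHECKPOINT).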
During normal forward processing, we primarily write UPDATE, COMMIT, and END records. CLRs appear only during rollback or recovery. CHECKPOINT records are written periodically by a background process.
Many ARIES implementations don't write explicit BEGIN records. The first UPDATE record for a transaction implicitly marks its start. This reduces log volume. However, explicit BEGIN records can aid debugging and provide transaction timing information.
The UPDATE record is the most important and most frequent log record type. It captures all the information needed to redo or undo a data modification.
UPDATE Record Fields
| Field | Type | Description | Example |
|---|---|---|---|
| LSN | Log Sequence Number | Unique identifier for this record | 5000 |
| Type | Enum | Record type identifier | UPDATE |
| TransactionID | Identifier | Transaction that made this change | T42 |
| PrevLSN | LSN | Previous log record by this transaction | 4500 |
| PageID | Identifier | Database page being modified | Page_1023 |
| Length | Integer | Size of this log record | 256 bytes |
| UndoInfo | Bytes/Operation | Information to reverse this change | Old row, or inverse operation |
| RedoInfo | Bytes/Operation | Information to re-apply this change | New row, or operation details |
```typescript
// Detailed UPDATE record structure

// Minimal supporting types so the examples type-check (illustrative choices).
// Buffer is the Node.js byte-array type.
type LSN = number;
type TransactionID = string;
type PageID = string;
interface UndoData { offset: number; length: number; beforeValue: Buffer; }
interface RedoData { offset: number; length: number; afterValue: Buffer; }

interface UpdateLogRecord {
  // Header fields (common to all record types)
  lsn: LSN;                      // Assigned when record is written
  type: "UPDATE";                // Record type identifier
  transactionId: TransactionID;  // Owning transaction
  prevLSN: LSN | null;           // Previous record by this transaction

  // Page identification
  pageId: PageID;                // Which page is being modified

  // Modification information (depends on logging type)
  undoInfo: UndoData;            // How to reverse this change
  redoInfo: RedoData;            // How to re-apply this change
}

// Example concrete UPDATE record:
const exampleUpdate: UpdateLogRecord = {
  lsn: 5000,
  type: "UPDATE",
  transactionId: "T42",
  prevLSN: 4500,  // T42's previous operation was at LSN 4500
  pageId: "Page_1023",

  // For a row update: "SET salary = 75000 WHERE id = 5"
  // Physical logging: store before/after byte values
  undoInfo: {
    offset: 120,  // Where on the page
    length: 4,    // How many bytes
    beforeValue: Buffer.from([0x00, 0x00, 0xC3, 0x50])  // 50000 as big-endian bytes
  },
  redoInfo: {
    offset: 120,
    length: 4,
    afterValue: Buffer.from([0x00, 0x01, 0x24, 0xF8])   // 75000 as big-endian bytes
  }
};

// Physiological logging alternative:
const physiologicalUpdate = {
  lsn: 5000,
  type: "UPDATE",
  transactionId: "T42",
  prevLSN: 4500,
  pageId: "Page_1023",

  // Operation-based description
  undoInfo: {
    operation: "SET_COLUMN",
    rowSlot: 5,
    columnId: "salary",
    value: 50000  // Old value to restore
  },
  redoInfo: {
    operation: "SET_COLUMN",
    rowSlot: 5,
    columnId: "salary",
    value: 75000  // New value to apply
  }
};
```
Understanding UndoInfo and RedoInfo
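The undo and redo information are the payload of an UPDATE record. What they contain depends on the logging strategy: physical logging stores before and after byte images at a page offset, logical logging stores the high-level operation and its inverse, and physiological logging stores an operation scoped to one page, such as setting a column in a row slot.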
The choice of logging strategy affects log size, redo/undo complexity, and recovery flexibility.
For most operations, undo and redo are symmetric inverses. If redo is 'set X to 75000' and the original value was 50000, undo is 'set X to 50000'. But this isn't always true—some operations like 'allocate page' need different undo logic than just 'deallocate page' due to metadata updates.
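The symmetric case can be sketched in a few lines. This is a minimal illustration under physical logging, not ARIES's actual page code: the `PhysicalUpdate` shape and the `redo`, `undo`, and `readInt32` helpers are hypothetical names chosen for this example, and the page is modeled as a plain byte array.

```typescript
// Hypothetical physical-logging payload: byte images at a page offset.
interface PhysicalUpdate {
  lsn: number;
  offset: number;      // byte offset within the page
  before: Uint8Array;  // undo information (before-image)
  after: Uint8Array;   // redo information (after-image)
}

// Redo: copy the after-image onto the page.
function redo(page: Uint8Array, rec: PhysicalUpdate): void {
  page.set(rec.after, rec.offset);
}

// Undo: copy the before-image back onto the page.
function undo(page: Uint8Array, rec: PhysicalUpdate): void {
  page.set(rec.before, rec.offset);
}

// Read a big-endian 32-bit integer from the page.
function readInt32(page: Uint8Array, offset: number): number {
  const view = new DataView(page.buffer, page.byteOffset, page.byteLength);
  return view.getInt32(offset, false);
}

const page = new Uint8Array(256);
const rec: PhysicalUpdate = {
  lsn: 5000,
  offset: 120,
  before: Uint8Array.from([0x00, 0x00, 0xc3, 0x50]), // 50000
  after: Uint8Array.from([0x00, 0x01, 0x24, 0xf8]),  // 75000
};

page.set(rec.before, rec.offset); // the page starts out holding the old salary
redo(page, rec);
const afterRedo = readInt32(page, 120);
undo(page, rec);
const afterUndo = readInt32(page, 120);
```

Here `afterRedo` is 75000 and `afterUndo` is 50000 again. Operations whose undo is not a plain inverse (the page allocation case mentioned above) would need a dedicated undo routine rather than a byte copy.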
Compensation Log Records (CLRs) are written during transaction rollback or recovery undo. They record the undo action, enabling ARIES to handle crashes during rollback without re-undoing already-undone operations.
CLR Design Philosophy
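A CLR describes the inverse of an UPDATE. When we undo an UPDATE at LSN X, we write a CLR that records the undo action as redo information, sets its UndoNextLSN to the PrevLSN of the record at LSN X, and carries no undo information of its own.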
CLRs are redo-only records—they are never undone themselves. If we crash and recover, CLRs are redone just like UPDATEs, restoring our undo progress.
CLR Record Fields
| Field | Type | Description | Key Difference from UPDATE |
|---|---|---|---|
| LSN | LSN | This CLR's unique identifier | Same as UPDATE |
| Type | Enum | Record type = CLR | Different type indicator |
| TransactionID | ID | Transaction being rolled back | Same as UPDATE |
| PrevLSN | LSN | Previous record by this transaction | Points to prior CLR or original UPDATE |
| PageID | ID | Page where undo was applied | Same as UPDATE |
| RedoInfo | Data | How to redo this undo action | Contains the undo operation |
| UndoNextLSN | LSN | Next record to undo for this txn | NOT in UPDATE records |
```typescript
// CLR Record Structure
interface CLRLogRecord {
  // Common header fields
  lsn: LSN;
  type: "CLR";
  transactionId: TransactionID;
  prevLSN: LSN;  // Points to previous log record (could be CLR or UPDATE)

  // Page identification
  pageId: PageID;  // Page where the undo was applied

  // The undo action (for redo)
  redoInfo: RedoData;  // Describes what undo did to the page

  // CLR-specific: where to continue undoing
  undoNextLSN: LSN | null;  // Next record to undo (or null if undo complete)

  // Note: NO undoInfo field! CLRs are never undone.
}

// Example: Undoing the UPDATE from previous example
// Original UPDATE at LSN 5000 was: SET salary from 50000 to 75000
// Undo means: SET salary from 75000 to 50000
// The original UPDATE's PrevLSN was 4500

const exampleCLR: CLRLogRecord = {
  lsn: 6000,  // New LSN for this CLR
  type: "CLR",
  transactionId: "T42",
  prevLSN: 5000,  // Points back to the UPDATE we're undoing
                  // (or previous CLR if multiple undos occurred)
  pageId: "Page_1023",

  // How to redo the undo (if we crash and recover)
  redoInfo: {
    offset: 120,
    length: 4,
    afterValue: Buffer.from([0x00, 0x00, 0xC3, 0x50])  // Restore to 50000
  },

  // Continue undoing at LSN 4500 (the original UPDATE's PrevLSN)
  undoNextLSN: 4500
};

// Key insight:
// The CLR's UndoNextLSN = The undone record's PrevLSN
// This skips over the record we just undid when continuing undo
```
CLR Chaining During Rollback
As a transaction is rolled back, CLRs form their own chain:
[CLR for last op] → [CLR for second-to-last] → ... → [CLR for first op]
Each CLR's PrevLSN points to the previous CLR (or the original UPDATE if it's the first CLR). The UndoNextLSN points to where undo should continue—skipping past what's already been undone.
CLRs are never undone. This is a fundamental ARIES invariant: during undo, when we encounter a CLR, we do not undo it; we just read its UndoNextLSN and continue from there. Undoing a CLR would mean redoing the original operation, which contradicts the rollback. CLRs are designed to be redo-only records.
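The rollback loop implied by these rules can be sketched as follows. This is a simplified model under stated assumptions: the log is an in-memory `Map`, the actual page modification is omitted, and `rollback`, `Rec`, and the `nextLSN` callback are hypothetical names used only for illustration. The example resumes a rollback that was interrupted after one CLR had already been written.

```typescript
type LSN = number;

interface UpdateRec { kind: "UPDATE"; lsn: LSN; txn: string; prevLSN: LSN | null; }
interface ClrRec { kind: "CLR"; lsn: LSN; txn: string; prevLSN: LSN | null; undoNextLSN: LSN | null; }
type Rec = UpdateRec | ClrRec;

// Roll back one transaction, starting from its last log record.
// Appends one CLR per undone UPDATE and returns the LSNs that were undone.
function rollback(log: Map<LSN, Rec>, txn: string, lastLSN: LSN, nextLSN: () => LSN): LSN[] {
  const undone: LSN[] = [];
  let cursor: LSN | null = lastLSN; // next record to examine
  let tail: LSN = lastLSN;          // most recent record written for this transaction
  while (cursor !== null) {
    const rec: Rec | undefined = log.get(cursor);
    if (rec === undefined) throw new Error(`missing log record ${cursor}`);
    if (rec.kind === "CLR") {
      // Never undo a CLR: jump to where it says undo should continue.
      cursor = rec.undoNextLSN;
    } else {
      // (Applying the undo information to the page is omitted here.)
      const clr: ClrRec = {
        kind: "CLR", lsn: nextLSN(), txn,
        prevLSN: tail,             // chains to the transaction's previous record
        undoNextLSN: rec.prevLSN,  // skips the record just undone
      };
      log.set(clr.lsn, clr);
      tail = clr.lsn;
      undone.push(rec.lsn);
      cursor = rec.prevLSN;
    }
  }
  return undone;
}

// T42 wrote UPDATEs at 4500 and 5000; a CLR at 6000 already undid 5000 before a crash.
const log = new Map<LSN, Rec>([
  [4500, { kind: "UPDATE", lsn: 4500, txn: "T42", prevLSN: null }],
  [5000, { kind: "UPDATE", lsn: 5000, txn: "T42", prevLSN: 4500 }],
  [6000, { kind: "CLR", lsn: 6000, txn: "T42", prevLSN: 5000, undoNextLSN: 4500 }],
]);
let lsnCounter = 7000;
const undone = rollback(log, "T42", 6000, () => lsnCounter++);
```

Resuming from the CLR at 6000 undoes only the UPDATE at 4500; the UPDATE at 5000 is skipped because the CLR's UndoNextLSN already points past it. The new CLR at 7000 has an UndoNextLSN of `null`, which signals that rollback is complete.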
Transaction state records track the lifecycle of transactions without modifying data. They're simpler than UPDATE records but essential for recovery correctness.
COMMIT Record
The COMMIT record marks that a transaction has successfully completed and its effects should persist. It's the point of no return.
```typescript
// COMMIT Record Structure
interface CommitLogRecord {
  lsn: LSN;
  type: "COMMIT";
  transactionId: TransactionID;
  prevLSN: LSN;      // Last UPDATE/CLR by this transaction
  timestamp?: Date;  // Optional: when commit occurred
}

// ABORT Record Structure
interface AbortLogRecord {
  lsn: LSN;
  type: "ABORT";
  transactionId: TransactionID;
  prevLSN: LSN;
  // Records that transaction is aborting (rollback will follow)
}

// END Record Structure
interface EndLogRecord {
  lsn: LSN;
  type: "END";
  transactionId: TransactionID;
  prevLSN: LSN;  // Points to COMMIT or last CLR
  // Marks transaction completely finished (can remove from TT)
}

// BEGIN Record Structure (optional in many systems)
interface BeginLogRecord {
  lsn: LSN;
  type: "BEGIN";
  transactionId: TransactionID;
  prevLSN: null;  // First record for this transaction
  timestamp?: Date;
}

// Transaction lifecycle examples:

// Successful commit:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → COMMIT(T1) → END(T1)

// Aborted transaction:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → ABORT(T1)
//   → CLR(T1) → CLR(T1) → END(T1)

// Crash during transaction:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → [CRASH]
// Recovery will: undo all T1 updates, write CLRs, write END
```
The Role of Each Transaction Record
| Record | When Written | Recovery Implication | Can Be Omitted? |
|---|---|---|---|
| BEGIN | Transaction start | Provides start time, txn ID assignment | Yes (implicit from first UPDATE) |
| COMMIT | Before returning success to app | Mark as winner (don't undo) | No—required for durability |
| ABORT | When rollback requested | Explicitly start rollback | Yes (absence implies abort if no COMMIT) |
| END | After commit or complete rollback | Remove from transaction table | Yes (but slows recovery) |
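To make the "Recovery Implication" column concrete, here is a minimal sketch of how a forward scan could maintain a transaction table from these records. It is an illustration only: `buildTransactionTable`, `TxnRecord`, and `TxnEntry` are hypothetical names, the state values mirror the ones used in checkpoint snapshots, and a real analysis pass would also maintain the dirty page table.

```typescript
type TxnState = "ACTIVE" | "COMMITTING" | "ABORTING";

interface TxnRecord {
  lsn: number;
  type: "BEGIN" | "UPDATE" | "CLR" | "COMMIT" | "ABORT" | "END";
  txn: string;
}

interface TxnEntry { state: TxnState; lastLSN: number; }

// Scan records in LSN order and track each transaction's state.
function buildTransactionTable(records: TxnRecord[]): Map<string, TxnEntry> {
  const table = new Map<string, TxnEntry>();
  for (const r of records) {
    if (r.type === "END") {
      table.delete(r.txn); // fully finished: nothing left for recovery to do
      continue;
    }
    // The first record seen for a transaction implicitly begins it.
    const entry: TxnEntry = table.get(r.txn) ?? { state: "ACTIVE", lastLSN: r.lsn };
    entry.lastLSN = r.lsn;
    if (r.type === "COMMIT") entry.state = "COMMITTING";
    if (r.type === "ABORT") entry.state = "ABORTING";
    table.set(r.txn, entry);
  }
  return table;
}

const scanned: TxnRecord[] = [
  { lsn: 10, type: "UPDATE", txn: "T1" },
  { lsn: 20, type: "UPDATE", txn: "T2" },
  { lsn: 30, type: "COMMIT", txn: "T1" },
  { lsn: 40, type: "UPDATE", txn: "T3" },
  { lsn: 50, type: "END", txn: "T1" },
  { lsn: 60, type: "COMMIT", txn: "T3" },
  { lsn: 70, type: "UPDATE", txn: "T2" },
];

const table = buildTransactionTable(scanned);
// Losers are the transactions that never reached COMMIT.
const losers = [...table].filter(([, e]) => e.state !== "COMMITTING").map(([id]) => id);
```

After the scan, T1 is gone (its END was seen), T3 is a winner that still needs an END record, and T2 is the only loser, with `lastLSN` 70 as the starting point for its undo.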
COMMIT vs END: Why Both?
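COMMIT and END are separate because there is work between them: once the COMMIT record is durable, the system still has to finish post-commit cleanup, such as releasing the transaction's locks, before it writes END.
If we crash after COMMIT but before END, analysis still finds the COMMIT record, so the transaction is treated as a winner: nothing is undone, and recovery writes the missing END record.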
Without END records, the Transaction Table might accumulate completed transactions, making analysis slower.
The COMMIT record MUST be on stable storage before returning success to the application. This is why commit latency includes a log flush. The END record doesn't need immediate flushing—it's just for cleanup. This is a key distinction for performance.
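The difference in flushing requirements can be sketched with a toy log manager. Everything here is a simplified stand-in: `LogManager`, `commit`, and the counters are hypothetical names, and "stable storage" is simulated by dropping records from an in-memory buffer where a real system would write and sync a file.

```typescript
interface BufferedRecord { lsn: number; type: string; txn: string; }

class LogManager {
  private buffer: BufferedRecord[] = [];
  private nextLSN = 1;
  flushedLSN = 0;  // highest LSN known to be on "stable storage"
  flushCount = 0;  // number of (simulated) device flushes

  // Append to the in-memory log buffer only.
  append(type: string, txn: string): number {
    const lsn = this.nextLSN++;
    this.buffer.push({ lsn, type, txn });
    return lsn;
  }

  // Force every buffered record with LSN <= upTo to stable storage.
  flush(upTo: number): void {
    const ready = this.buffer.filter((r) => r.lsn <= upTo);
    if (ready.length === 0) return;
    // A real implementation would write `ready` to disk and sync here.
    this.buffer = this.buffer.filter((r) => r.lsn > upTo);
    this.flushedLSN = Math.max(this.flushedLSN, ...ready.map((r) => r.lsn));
    this.flushCount++;
  }

  pendingCount(): number {
    return this.buffer.length;
  }
}

// Commit: force the log through the COMMIT record, then buffer END lazily.
function commit(log: LogManager, txn: string): number {
  const commitLSN = log.append("COMMIT", txn);
  log.flush(commitLSN);   // must finish before success is reported
  log.append("END", txn); // cleanup marker: rides along with a later flush
  return commitLSN;
}

const wal = new LogManager();
wal.append("UPDATE", "T1");
wal.append("UPDATE", "T1");
const commitLSN = commit(wal, "T1");
```

One flush covers both UPDATEs and the COMMIT, so `wal.flushedLSN` equals `commitLSN` when `commit` returns, while the END record is still sitting in the buffer. Batching several transactions' COMMIT records into one flush is a natural extension of the same idea.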
CHECKPOINT records capture system state at a point in time, bounding how far back recovery must scan. Without checkpoints, recovery would start from the beginning of the log—potentially enormous.
Checkpoint Contents
A checkpoint record contains snapshots of the critical recovery data structures:
```typescript
// Checkpoint Record Structure
interface CheckpointLogRecord {
  lsn: LSN;
  type: "CHECKPOINT";

  // Snapshot of Transaction Table
  transactionTable: Array<{
    transactionId: TransactionID;
    state: "ACTIVE" | "COMMITTING" | "ABORTING";
    lastLSN: LSN;
  }>;

  // Snapshot of Dirty Page Table
  dirtyPageTable: Array<{
    pageId: PageID;
    recLSN: LSN;
  }>;

  // Additional metadata
  timestamp: Date;

  // Master record location (for finding this checkpoint)
  // Usually stored separately in a known location
}

// Example checkpoint record:
const exampleCheckpoint: CheckpointLogRecord = {
  lsn: 10000,
  type: "CHECKPOINT",

  transactionTable: [
    { transactionId: "T1", state: "ACTIVE", lastLSN: 9500 },
    { transactionId: "T2", state: "ACTIVE", lastLSN: 8700 },
    { transactionId: "T3", state: "COMMITTING", lastLSN: 9900 }
  ],

  dirtyPageTable: [
    { pageId: "Page_100", recLSN: 8000 },
    { pageId: "Page_250", recLSN: 9200 },
    { pageId: "Page_375", recLSN: 9800 }
  ],

  timestamp: new Date("2024-03-15T10:30:00Z")
};

// How recovery uses this:
// 1. Find the master record pointing to this checkpoint
// 2. Read checkpoint at LSN 10000
// 3. Initialize TT and DPT from checkpoint contents
// 4. Analysis scans the log forward from LSN 10000, updating TT and DPT
//    based on subsequent records
// 5. Redo then starts from the earliest recLSN in the DPT (8000 here)
```
Types of Checkpoints
ARIES supports different checkpoint strategies:
1. Quiescent (Sharp) Checkpoint
Advantage: Simple recovery. Disadvantage: System pause is unacceptable for production.
2. Non-Quiescent (Transaction-Consistent) Checkpoint
Advantage: No pause. Disadvantage: More complex, more data in checkpoint.
3. Fuzzy Checkpoint (ARIES default)
Advantage: Minimal overhead. Disadvantage: More redo work potentially needed.
| Type | Blocks Operations? | Forces Pages? | Recovery Work | Production Suitable? |
|---|---|---|---|---|
| Quiescent | Yes | Yes | Minimal | No |
| Transaction-Consistent | Briefly | Yes | Low | Sometimes |
| Fuzzy | No | No | More | Yes |
Most systems store a 'master record' at a known location that points to the most recent checkpoint. On restart, recovery reads the master record to find the checkpoint, then starts analysis from there. The master record must be updated atomically after the checkpoint is fully written.
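As a rough sketch of that restart sequence: the master record yields the checkpoint's LSN, analysis starts at the checkpoint, and the dirty page table gives a lower bound for where redo must begin. The `MasterRecord` and `Checkpoint` shapes and the `restartPoints` function are hypothetical names for this example, and the redo bound computed here uses only the checkpoint's snapshot; a real analysis pass may add dirty pages before redo starts.

```typescript
interface MasterRecord { checkpointLSN: number; }

interface DirtyPageEntry { pageId: string; recLSN: number; }

interface Checkpoint {
  lsn: number;
  dirtyPageTable: DirtyPageEntry[];
}

interface RestartPoints {
  analysisStart: number; // where the forward analysis scan begins
  redoStart: number;     // earliest LSN redo may need to re-apply
}

// Locate the latest checkpoint via the master record and derive scan start points.
function restartPoints(master: MasterRecord, checkpoints: Map<number, Checkpoint>): RestartPoints {
  const cp = checkpoints.get(master.checkpointLSN);
  if (cp === undefined) {
    throw new Error(`master record points to missing checkpoint ${master.checkpointLSN}`);
  }
  // Redo can never need to start later than the oldest recLSN of a dirty page.
  const redoStart = cp.dirtyPageTable.reduce(
    (min, entry) => Math.min(min, entry.recLSN),
    cp.lsn,
  );
  return { analysisStart: cp.lsn, redoStart };
}

const checkpoints = new Map<number, Checkpoint>([
  [10000, {
    lsn: 10000,
    dirtyPageTable: [
      { pageId: "Page_100", recLSN: 8000 },
      { pageId: "Page_250", recLSN: 9200 },
      { pageId: "Page_375", recLSN: 9800 },
    ],
  }],
]);

const points = restartPoints({ checkpointLSN: 10000 }, checkpoints);
```

With the example checkpoint, analysis starts at LSN 10000 and redo starts at LSN 8000. Updating the master record only after the checkpoint is fully written is what guarantees the lookup in `restartPoints` never finds a half-written checkpoint.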
The contents of UPDATE records' redo and undo information depend on the logging strategy. ARIES supports multiple approaches, each with distinct tradeoffs.
Physical Logging
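Physical logging records the actual bytes changed: the before-image and after-image of the modified region.
Advantages: Recovery is simple, because redo and undo just copy bytes back onto the page.
Disadvantages: Logs are large, since every change stores both images (about 2KB of log for a 1KB row).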
```typescript
// Physical Logging Example
// Updating a 1KB row requires storing 2KB (before + after)

const physicalUpdate = {
  redoInfo: {
    pageId: "Page_100",
    offset: 0,
    data: Buffer.alloc(1024)  // Full row after-image
  },
  undoInfo: {
    pageId: "Page_100",
    offset: 0,
    data: Buffer.alloc(1024)  // Full row before-image
  }
  // Total: ~2KB per row modification!
};

// Logical Logging Example
// Compact but requires execution context

const logicalUpdate = {
  redoInfo: {
    operation: "UPDATE employees SET salary = salary * 1.1 WHERE dept = 'ENG'"
  },
  undoInfo: {
    operation: "UPDATE employees SET salary = salary / 1.1 WHERE dept = 'ENG'"
  }
  // Total: ~100 bytes, but requires SQL execution!
};

// Physiological Logging (Physical to a Page, Logical Within)
// ARIES's preferred approach

const physiologicalUpdate = {
  redoInfo: {
    pageId: "Page_100",     // Specific page (physical)
    operation: "SET_SLOT",  // Logical operation
    slotNumber: 5,
    columnId: "salary",
    newValue: 75000
  },
  undoInfo: {
    pageId: "Page_100",
    operation: "SET_SLOT",
    slotNumber: 5,
    columnId: "salary",
    oldValue: 50000
  }
  // Compact logs + deterministic page-level operations
};
```
Logical Logging
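Logical logging records the operation that was performed, such as a SQL statement, rather than the bytes it changed.
Advantages: Logs are small, since one short record can describe a change to many rows.
Disadvantages: Recovery is complex, because redo and undo must re-execute the operation and need the execution context to do so.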
Physiological Logging (ARIES's Choice)
Physiological = Physical-to-a-Page, Logical-within-a-Page.
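Advantages: Logs stay compact (no full before/after images), and each record applies deterministically to a single, named page.
Disadvantages: Log size and recovery complexity sit between the two extremes rather than matching the best of either.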
| Strategy | Log Size | Recovery Complexity | Concurrency Support | ARIES Default |
|---|---|---|---|---|
| Physical | Large | Simple | Limited | No |
| Logical | Small | Complex | Limited | No |
| Physiological | Medium | Medium | Good | Yes |
Physiological logging hits the sweet spot: logs are compact (no full-page images), recovery is deterministic (same page, same operation), and because each record is confined to a single page, it remains compatible with fine-grained locking. This is why ARIES standardized on physiological logging.
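A minimal sketch of redoing a physiological record may help show why it is deterministic. The page model here (an array of slots holding numeric columns, plus a `pageLSN`) and the names `Page`, `SetSlotRedo`, and `redoSetSlot` are assumptions made for this example; the PageLSN comparison is the standard guard that keeps redo from applying a change the page already reflects.

```typescript
// Hypothetical slotted page: each slot maps column names to numeric values.
interface Page {
  pageId: string;
  pageLSN: number; // LSN of the last record applied to this page
  slots: Array<Record<string, number>>;
}

interface SetSlotRedo {
  lsn: number;
  pageId: string;        // physical part: exactly one page
  operation: "SET_SLOT"; // logical part: an operation within that page
  slotNumber: number;
  columnId: string;
  newValue: number;
}

// Apply the record unless the page already reflects it. Returns true if applied.
function redoSetSlot(page: Page, rec: SetSlotRedo): boolean {
  if (rec.pageId !== page.pageId) throw new Error("record addresses a different page");
  if (page.pageLSN >= rec.lsn) return false; // change is already on the page
  const slot = page.slots[rec.slotNumber];
  if (slot === undefined) throw new Error(`no slot ${rec.slotNumber} on ${page.pageId}`);
  slot[rec.columnId] = rec.newValue;
  page.pageLSN = rec.lsn;
  return true;
}

const dataPage: Page = {
  pageId: "Page_100",
  pageLSN: 4000,
  slots: Array.from({ length: 6 }, (): Record<string, number> => ({ salary: 50000 })),
};

const setSalary: SetSlotRedo = {
  lsn: 5000, pageId: "Page_100", operation: "SET_SLOT",
  slotNumber: 5, columnId: "salary", newValue: 75000,
};

const firstApply = redoSetSlot(dataPage, setSalary);  // applies the change
const secondApply = redoSetSlot(dataPage, setSalary); // skipped: pageLSN is already 5000
```

The record never needs to know the row's byte offset, only the page and slot, which is what keeps it smaller than a physical image while still being replayable without a query engine.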
Log records don't exist in isolation—they form a web of interconnected pointers that enable efficient recovery operations.
Forward Log Sequence
Records are written sequentially with monotonically increasing LSNs. This sequence is the primary structure for redo: scan forward, apply operations.
Transaction Backward Chains (PrevLSN)
Each transaction's records form a backward chain via PrevLSN. Undo follows these chains:
T1: UPDATE(1000) ← UPDATE(3000) ← CLR(6000) ← UPDATE(8000)
[first] ← [second] ← [undo] ← [third]
CLR UndoNextLSN Pointers
CLRs have additional UndoNextLSN pointers that skip over undone operations:
CLR(6000).UndoNextLSN = 1000 (skip UPDATE at 3000, already undone)
Page Modification Chains
While not explicit pointers, all records modifying a given page form an implicit chain ordered by LSN. PageLSN comparison uses this ordering.
Navigating the Structure
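Recovery uses different navigations: analysis and redo scan forward in LSN order, while undo walks each transaction's PrevLSN chain backward and follows UndoNextLSN to skip work that has already been undone.
Each navigation is optimized for its purpose: the forward scan reads the log sequentially, and the backward chains let undo visit only the records of the transactions being rolled back.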
Conceptually, the log is a directed graph: nodes are records, forward edges follow LSN sequence, backward edges follow PrevLSN, and skip edges follow UndoNextLSN. Recovery algorithms traverse this graph in different ways depending on their needs.
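The difference between backward edges and skip edges can be seen by walking the T1 chain from the example above in two ways. This is an illustrative sketch: `ChainRecord`, `backwardChain`, and `recordsStillToUndo` are hypothetical names, and the log is again modeled as an in-memory `Map` keyed by LSN.

```typescript
interface ChainRecord {
  lsn: number;
  type: "UPDATE" | "CLR";
  prevLSN: number | null;
  undoNextLSN?: number | null; // present on CLRs only
}

// Follow PrevLSN edges only: every record the transaction wrote, newest first.
function backwardChain(log: Map<number, ChainRecord>, lastLSN: number): number[] {
  const visited: number[] = [];
  let cursor: number | null = lastLSN;
  while (cursor !== null) {
    const rec: ChainRecord | undefined = log.get(cursor);
    if (rec === undefined) throw new Error(`missing log record ${cursor}`);
    visited.push(rec.lsn);
    cursor = rec.prevLSN;
  }
  return visited;
}

// Follow PrevLSN for UPDATEs but UndoNextLSN for CLRs: only work not yet undone.
function recordsStillToUndo(log: Map<number, ChainRecord>, lastLSN: number): number[] {
  const pending: number[] = [];
  let cursor: number | null = lastLSN;
  while (cursor !== null) {
    const rec: ChainRecord | undefined = log.get(cursor);
    if (rec === undefined) throw new Error(`missing log record ${cursor}`);
    if (rec.type === "CLR") {
      cursor = rec.undoNextLSN ?? null; // skip edge
    } else {
      pending.push(rec.lsn);
      cursor = rec.prevLSN;             // backward edge
    }
  }
  return pending;
}

// T1: UPDATE(1000) <- UPDATE(3000) <- CLR(6000) <- UPDATE(8000)
const t1Log = new Map<number, ChainRecord>([
  [1000, { lsn: 1000, type: "UPDATE", prevLSN: null }],
  [3000, { lsn: 3000, type: "UPDATE", prevLSN: 1000 }],
  [6000, { lsn: 6000, type: "CLR", prevLSN: 3000, undoNextLSN: 1000 }],
  [8000, { lsn: 8000, type: "UPDATE", prevLSN: 6000 }],
]);

const fullChain = backwardChain(t1Log, 8000);
const toUndo = recordsStillToUndo(t1Log, 8000);
```

`fullChain` visits 8000, 6000, 3000, 1000, while `toUndo` contains only 8000 and 1000: the CLR at 6000 redirects the walk past the UPDATE at 3000, which it already compensated.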
The physical storage and management of the log presents its own engineering challenges.
Log Buffer Management
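Log records are first written to an in-memory log buffer before being flushed to stable storage. A flush is forced when a transaction commits, and typically also happens on a timer or when the buffer fills.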
Circular Log with Archiving
Physical logs are often organized as circular buffers with archiving:
```typescript
// Log Storage Configuration
interface LogConfiguration {
  // Log buffer (in memory)
  bufferSize: number;             // e.g., 64MB
  flushIntervalMs: number;        // e.g., 1000ms (1 second)
  forceOnCommit: boolean;         // Must be true for durability!

  // Active log files
  logFileSize: number;            // e.g., 1GB per file
  numLogFiles: number;            // e.g., 4 files (circular)
  logDirectory: string;           // e.g., "/var/db/wal/"

  // Archive settings
  archiveEnabled: boolean;
  archiveDirectory: string;       // e.g., "/backup/wal_archive/"
  compressionEnabled: boolean;

  // Checkpoint settings
  checkpointIntervalSec: number;  // e.g., 300 (5 minutes)
  checkpointOnLogFull: boolean;
}

// Log file layout:
//
// log_001.wal (1GB)
// ├── Record at LSN 0
// ├── Record at LSN 256
// ├── ...
// └── Record at LSN 1073741568
//
// log_002.wal (1GB)
// ├── Record at LSN 1073741824
// └── ...
//
// When log_001 is no longer needed (all txns committed,
// checkpoint past its records), it can be archived or recycled.

// Minimal shape assumed for a log file descriptor (illustrative)
interface LogFile {
  maxLSN: LSN;  // Highest LSN stored in this file
}

// Determining if log file can be freed:
function canFreeLogFile(
  file: LogFile,
  redoLSN: LSN,
  oldestActiveTxnLSN: LSN | null  // First LSN of the oldest active txn, if any
): boolean {
  // File can be freed if all its records precede the redo starting point
  // AND no active transaction still has records in it (needed for undo)
  const neededForRedo = file.maxLSN >= redoLSN;
  const neededForUndo =
    oldestActiveTxnLSN !== null && file.maxLSN >= oldestActiveTxnLSN;
  return !neededForRedo && !neededForUndo;
}
```
Log Truncation
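The log can't grow forever. Log truncation removes old records that are no longer needed for recovery: a record is safe to drop once it precedes both the redo starting point and the first record of the oldest active transaction.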
Write-Ahead Log Performance
The log's performance characteristics are crucial, because every commit waits for a log flush.
Modern systems use NVMe SSDs or battery-backed write caches to make log flushes fast without sacrificing durability.
Production databases typically keep the log on a separate physical device from data files. Log appends are sequential, while data page and checkpoint writes land at scattered locations, so on spinning disks a shared device forces the head to seek back and forth between the two. SSDs change this calculus, but dedicated log devices remain common for latency predictability.
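The log structure is the information foundation of ARIES recovery. Let's consolidate the key concepts: UPDATE records carry both redo and undo information; CLRs are redo-only and use UndoNextLSN to track undo progress; COMMIT, ABORT, and END records track transaction state; checkpoints bound the recovery scan; and physiological logging balances log size against recovery complexity.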
What's Next
With log structure understood, we'll explore the core ARIES principles—the theoretical foundations that ensure recovery is always correct: Write-Ahead Logging, Repeat History, and the invariants that tie everything together.
You now understand the complete structure of ARIES logs: the record types, their fields, how they interconnect, and the design decisions behind physiological logging. This structural knowledge provides the foundation for understanding ARIES's correctness principles.