ARIES Recovery Algorithm Overview

Level: Advanced

Duration: 75 mins

Topic: ARIES Overview

ARIES Log Structure

The Recovery Information Repository

The log is ARIES's source of truth—a persistent record of every modification that enables both undo and redo. But the log is not merely a sequence of bytes; it's a carefully structured repository of recovery information. Each log record type serves a specific purpose, contains particular fields, and relates to other records in defined ways.

Understanding log structure is essential for several reasons: it reveals what information ARIES needs and why, it clarifies how different record types support different recovery operations, and it explains the space and I/O tradeoffs inherent in logging strategies.

What You Will Learn

By the end of this page, you will understand: (1) the complete taxonomy of log record types in ARIES, (2) the structure and fields of each record type, (3) physical vs. logical vs. physiological logging, and (4) how log records interconnect to support recovery.

Log Record Types

ARIES uses several types of log records, each serving a distinct purpose in the recovery ecosystem. Understanding these types is fundamental to grasping how recovery works.

Core Record Types

ARIES Log Record Types
Record Type	Purpose	Generated When	Recovery Use
UPDATE	Records a data modification	Data operation executes	Redo and undo operations
COMMIT	Marks transaction as committed	Transaction commits	Identify winner transactions
ABORT	Marks transaction as aborting	Transaction aborts	Trigger undo processing
CLR (Compensation)	Records an undo action	During rollback/recovery	Redo the undo; track undo progress
END	Marks transaction completion	After commit or full rollback	Remove from transaction table
BEGIN	Marks transaction start	Transaction starts (optional)	Debugging/auditing
CHECKPOINT	Captures system state snapshot	Periodically by system	Bound recovery scan

Record Type Categories

We can categorize these records by their role:

Data Records (UPDATE, CLR): Describe modifications to database pages
Transaction Records (BEGIN, COMMIT, ABORT, END): Track transaction lifecycle
System Records (CHECKPOINT): Capture system-wide state

During normal forward processing, we primarily write UPDATE, COMMIT, and END records. CLRs appear only during rollback or recovery. CHECKPOINT records are written periodically by a background process.
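
The three categories can be written down as a small lookup. This is only a sketch: the names `LogRecordType`, `RecordCategory`, `categorize`, and `touchesPage` are hypothetical and exist just to illustrate the grouping above.

```typescript
// Hypothetical names, used only to illustrate the grouping above.
type LogRecordType =
    | "UPDATE" | "CLR"
    | "BEGIN" | "COMMIT" | "ABORT" | "END"
    | "CHECKPOINT";

type RecordCategory = "data" | "transaction" | "system";

function categorize(type: LogRecordType): RecordCategory {
    switch (type) {
        case "UPDATE":
        case "CLR":
            return "data";          // describe a change to a database page
        case "BEGIN":
        case "COMMIT":
        case "ABORT":
        case "END":
            return "transaction";   // track the transaction lifecycle
        case "CHECKPOINT":
            return "system";        // capture system-wide state
    }
}

// Only data records carry a page change that redo may need to re-apply.
function touchesPage(type: LogRecordType): boolean {
    return categorize(type) === "data";
}
```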

BEGIN Records Are Optional

Many ARIES implementations don't write explicit BEGIN records. The first UPDATE record for a transaction implicitly marks its start. This reduces log volume. However, explicit BEGIN records can aid debugging and provide transaction timing information.

UPDATE Record: The Workhorse

The UPDATE record is the most important and most frequent log record type. It captures all the information needed to redo or undo a data modification.

UPDATE Record Fields

UPDATE Log Record Structure
Field	Type	Description	Example
LSN	Log Sequence Number	Unique identifier for this record	5000
Type	Enum	Record type identifier	UPDATE
TransactionID	Identifier	Transaction that made this change	T42
PrevLSN	LSN	Previous log record by this transaction	4500
PageID	Identifier	Database page being modified	Page_1023
Length	Integer	Size of this log record	256 bytes
UndoInfo	Bytes/Operation	Information to reverse this change	Old row, or inverse operation
RedoInfo	Bytes/Operation	Information to re-apply this change	New row, or operation details

UPDATE Record Example
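// Detailed UPDATE record structure

// Illustrative type aliases so the snippet type-checks (assumed, not prescribed by ARIES)
type LSN = number;
type TransactionID = string;
type PageID = string;
type UndoData = Record<string, unknown>;
type RedoData = Record<string, unknown>;
 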
interface UpdateLogRecord {
    // Header fields (common to all record types)
    lsn: LSN;                    // Assigned when record is written
    type: "UPDATE";              // Record type identifier
    transactionId: TransactionID; // Owning transaction
    prevLSN: LSN | null;         // Previous record by this transaction
    
    // Page identification
    pageId: PageID;              // Which page is being modified
    
    // Modification information (depends on logging type)
    undoInfo: UndoData;          // How to reverse this change
    redoInfo: RedoData;          // How to re-apply this change
}
 
// Example concrete UPDATE record:
const exampleUpdate: UpdateLogRecord = {
    lsn: 5000,
    type: "UPDATE",
    transactionId: "T42",
    prevLSN: 4500,  // T42's previous operation was at LSN 4500
    
    pageId: "Page_1023",
    
    // For a row update: "SET salary = 75000 WHERE id = 5"
    // Physical logging: store before/after byte values
    undoInfo: {
        offset: 120,      // Where on the page
        length: 4,        // How many bytes
        beforeValue: Buffer.from([0x00, 0x00, 0xC3, 0x50])  // 50000 in binary
    },
    redoInfo: {
        offset: 120,
        length: 4,
        afterValue: Buffer.from([0x00, 0x01, 0x24, 0xF8])   // 75000 in binary
    }
};
 
// Physiological logging alternative:
const physiologicalUpdate = {
    lsn: 5000,
    type: "UPDATE",
    transactionId: "T42",
    prevLSN: 4500,
    pageId: "Page_1023",
    
    // Operation-based description
    undoInfo: {
        operation: "SET_COLUMN",
        rowSlot: 5,
        columnId: "salary",
        value: 50000  // Old value to restore
    },
    redoInfo: {
        operation: "SET_COLUMN",
        rowSlot: 5,
        columnId: "salary", 
        value: 75000  // New value to apply
    }
};

Understanding UndoInfo and RedoInfo

The undo and redo information are the payload of an UPDATE record. What they contain depends on the logging strategy:

Physical logging: Raw bytes—before-image for undo, after-image for redo
Logical logging: Operation description—"decrement X by 10" for redo, "increment X by 10" for undo
Physiological logging: Operation on a specific page—"set slot 5, column 3 to value V"

The choice of logging strategy affects log size, redo/undo complexity, and recovery flexibility.

Undo and Redo Are Often Symmetric

For most operations, undo and redo are symmetric inverses. If redo is 'set X to 75000' and the original value was 50000, undo is 'set X to 50000'. But this isn't always true—some operations like 'allocate page' need different undo logic than just 'deallocate page' due to metadata updates.
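
For the symmetric case, a minimal sketch of physical redo and undo might look like the following. It assumes a page is a plain byte array and that a change stores both images. `PhysicalChange`, `applyRedo`, `applyUndo`, and `readUint32BE` are hypothetical names, not part of any real system.

```typescript
// Hypothetical shapes for illustration; a real system defines its own.
interface PhysicalChange {
    offset: number;      // byte offset within the page
    before: Uint8Array;  // before-image, used by undo
    after: Uint8Array;   // after-image, used by redo
}

function applyRedo(page: Uint8Array, change: PhysicalChange): void {
    page.set(change.after, change.offset);
}

function applyUndo(page: Uint8Array, change: PhysicalChange): void {
    page.set(change.before, change.offset);
}

// Helper for inspecting the 4-byte big-endian salary field.
function readUint32BE(page: Uint8Array, offset: number): number {
    return new DataView(page.buffer, page.byteOffset, page.byteLength)
        .getUint32(offset, false);
}

const demoPage = new Uint8Array(256);
const salaryChange: PhysicalChange = {
    offset: 120,
    before: Uint8Array.from([0x00, 0x00, 0xc3, 0x50]), // 50000, big-endian
    after: Uint8Array.from([0x00, 0x01, 0x24, 0xf8]),  // 75000, big-endian
};
demoPage.set(salaryChange.before, salaryChange.offset); // page starts at 50000
```

Calling `applyRedo(demoPage, salaryChange)` and then `applyUndo(demoPage, salaryChange)` returns the field to its starting bytes, which is the symmetry described above.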

CLR Record: Compensation Logging

Compensation Log Records (CLRs) are written during transaction rollback or recovery undo. They record the undo action, enabling ARIES to handle crashes during rollback without re-undoing already-undone operations.

CLR Design Philosophy

A CLR describes the inverse of an UPDATE. When we undo an UPDATE at LSN X, we write a CLR that:

Describes what we did to reverse the UPDATE (for redo if we crash)
Contains UndoNextLSN pointing to the next record to undo

CLRs are redo-only records—they are never undone themselves. If we crash and recover, CLRs are redone just like UPDATEs, restoring our undo progress.

CLR Record Fields

CLR Log Record Structure
Field	Type	Description	Key Difference from UPDATE
LSN	LSN	This CLR's unique identifier	Same as UPDATE
Type	Enum	Record type = CLR	Different type indicator
TransactionID	ID	Transaction being rolled back	Same as UPDATE
PrevLSN	LSN	Previous record by this transaction	Points to prior CLR or original UPDATE
PageID	ID	Page where undo was applied	Same as UPDATE
RedoInfo	Data	How to redo this undo action	Contains the undo operation
UndoNextLSN	LSN	Next record to undo for this txn	NOT in UPDATE records

CLR Record Example
// CLR Record Structure
 
interface CLRLogRecord {
    // Common header fields
    lsn: LSN;
    type: "CLR";
    transactionId: TransactionID;
    prevLSN: LSN;              // Points to previous log record (could be CLR or UPDATE)
    
    // Page identification
    pageId: PageID;            // Page where the undo was applied
    
    // The undo action (for redo)
    redoInfo: RedoData;        // Describes what undo did to the page
    
    // CLR-specific: where to continue undoing
    undoNextLSN: LSN | null;   // Next record to undo (or null if undo complete)
    
    // Note: NO undoInfo field! CLRs are never undone.
}
 
// Example: Undoing the UPDATE from previous example
 
// Original UPDATE at LSN 5000 was: SET salary from 50000 to 75000
// Undo means: SET salary from 75000 to 50000
// The original UPDATE's PrevLSN was 4500
 
const exampleCLR: CLRLogRecord = {
    lsn: 6000,                  // New LSN for this CLR
    type: "CLR",
    transactionId: "T42",
    prevLSN: 5000,              // Points back to the UPDATE we're undoing
                                 // (or previous CLR if multiple undos occurred)
    
    pageId: "Page_1023",
    
    // How to redo the undo (if we crash and recover)
    redoInfo: {
        offset: 120,
        length: 4,
        afterValue: Buffer.from([0x00, 0x00, 0xC3, 0x50])  // Restore to 50000
    },
    
    // Continue undoing at LSN 4500 (the original UPDATE's PrevLSN)
    undoNextLSN: 4500
};
 
// Key insight:
// The CLR's UndoNextLSN = The undone record's PrevLSN
// This skips over the record we just undid when continuing undo

CLR Chaining During Rollback

As a transaction is rolled back, CLRs form their own chain:

[CLR for last op] → [CLR for second-to-last] → ... → [CLR for first op]

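Each CLR's PrevLSN points to the transaction's most recent log record: the previous CLR, or, for the first CLR, the record that preceded the rollback (the ABORT record if one was written, otherwise the last UPDATE). The UndoNextLSN points to where undo should continue, skipping past what has already been undone.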
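
As a rough sketch of how a CLR could be derived from the UPDATE it compensates, consider the helper below. The record shapes are simplified (payloads are plain strings), and `UpdateRec`, `ClrRec`, and `makeClr` are hypothetical names introduced only for this illustration.

```typescript
type Lsn = number;

// Simplified, hypothetical record shapes; payloads are strings for brevity.
interface UpdateRec {
    lsn: Lsn; type: "UPDATE"; txnId: string; prevLSN: Lsn | null;
    pageId: string; undoInfo: string; redoInfo: string;
}
interface ClrRec {
    lsn: Lsn; type: "CLR"; txnId: string; prevLSN: Lsn;
    pageId: string; redoInfo: string; undoNextLSN: Lsn | null;
}

// txnLastLSN: LSN of the most recent record this transaction has written.
// newLSN: LSN assigned to the CLR being created.
function makeClr(undone: UpdateRec, txnLastLSN: Lsn, newLSN: Lsn): ClrRec {
    return {
        lsn: newLSN,
        type: "CLR",
        txnId: undone.txnId,
        prevLSN: txnLastLSN,          // keeps the backward chain intact
        pageId: undone.pageId,
        redoInfo: undone.undoInfo,    // redoing the CLR re-applies the undo
        undoNextLSN: undone.prevLSN,  // skip past the record just undone
    };
}
```

Note that the CLR has no undo payload of its own: the only thing it inherits from the UPDATE's undo side is what redo needs in order to repeat the compensation.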

CLRs Are Never Undone

This is a fundamental ARIES invariant. During undo, when we encounter a CLR, we do NOT undo it—we just read its UndoNextLSN and continue from there. Undoing a CLR would mean redoing the original operation, which contradicts the rollback. CLRs are designed to be redo-only records.
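
The traversal rule can be sketched as a loop over one transaction's backward chain. This is an illustrative simplification that assumes the log is available as an in-memory map keyed by LSN. `ChainRecord` and `lsnsToUndo` are hypothetical names.

```typescript
type UndoLsn = number;

interface ChainRecord {
    lsn: UndoLsn;
    type: "UPDATE" | "CLR" | "ABORT";
    prevLSN: UndoLsn | null;
    undoNextLSN?: UndoLsn | null;  // present only on CLRs
}

// Returns the LSNs of the UPDATE records that still need to be undone,
// in the order undo would visit them.
function lsnsToUndo(log: Map<UndoLsn, ChainRecord>, lastLSN: UndoLsn): UndoLsn[] {
    const result: UndoLsn[] = [];
    let cursor: UndoLsn | null = lastLSN;
    while (cursor !== null) {
        const rec = log.get(cursor);
        if (rec === undefined) throw new Error(`missing log record at LSN ${cursor}`);
        if (rec.type === "CLR") {
            cursor = rec.undoNextLSN ?? null;  // never undo a CLR, just jump
        } else {
            if (rec.type === "UPDATE") result.push(rec.lsn);
            cursor = rec.prevLSN;
        }
    }
    return result;
}

// Three updates; the newest (5000) was already undone by CLR(6000) before a crash.
const chainLog = new Map<UndoLsn, ChainRecord>([
    [1000, { lsn: 1000, type: "UPDATE", prevLSN: null }],
    [3000, { lsn: 3000, type: "UPDATE", prevLSN: 1000 }],
    [5000, { lsn: 5000, type: "UPDATE", prevLSN: 3000 }],
    [6000, { lsn: 6000, type: "CLR", prevLSN: 5000, undoNextLSN: 3000 }],
]);
const remainingUndo = lsnsToUndo(chainLog, 6000);
```

Starting from the CLR at 6000, the loop jumps straight to 3000, so the update at 5000 is not undone a second time.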

Transaction State Records

Transaction state records track the lifecycle of transactions without modifying data. They're simpler than UPDATE records but essential for recovery correctness.

COMMIT Record

The COMMIT record marks that a transaction has successfully completed and its effects should persist. It's the point of no return.

Transaction State Records
// COMMIT Record Structure
interface CommitLogRecord {
    lsn: LSN;
    type: "COMMIT";
    transactionId: TransactionID;
    prevLSN: LSN;              // Last UPDATE/CLR by this transaction
    timestamp?: Date;          // Optional: when commit occurred
}
 
// ABORT Record Structure
interface AbortLogRecord {
    lsn: LSN;
    type: "ABORT";
    transactionId: TransactionID;
    prevLSN: LSN;
    // Records that transaction is aborting (rollback will follow)
}
 
// END Record Structure
interface EndLogRecord {
    lsn: LSN;
    type: "END";
    transactionId: TransactionID;
    prevLSN: LSN;              // Points to COMMIT or last CLR
    // Marks transaction completely finished (can remove from TT)
}
 
// BEGIN Record Structure (optional in many systems)
interface BeginLogRecord {
    lsn: LSN;
    type: "BEGIN";
    transactionId: TransactionID;
    prevLSN: null;             // First record for this transaction
    timestamp?: Date;
}
 
// Transaction lifecycle examples:
 
// Successful commit:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → COMMIT(T1) → END(T1)
 
// Aborted transaction:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → ABORT(T1) 
//           → CLR(T1) → CLR(T1) → END(T1)
 
// Crash during transaction:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → [CRASH]
// Recovery will: undo all T1 updates, write CLRs, write END

The Role of Each Transaction Record

Transaction Record Purposes
Record	When Written	Recovery Implication	Can Be Omitted?
BEGIN	Transaction start	Provides start time, txn ID assignment	Yes (implicit from first UPDATE)
COMMIT	Before returning success to app	Mark as winner (don't undo)	No—required for durability
ABORT	When rollback requested	Explicitly start rollback	Yes (absence implies abort if no COMMIT)
END	After commit or complete rollback	Remove from transaction table	Yes (but slows recovery)

COMMIT vs END: Why Both?

COMMIT and END are separate because there's work between them:

COMMIT is written → transaction is now durable
Release locks → other transactions can proceed
Clean up resources → buffers, temporary structures
END is written → transaction is fully done, can remove from TT

If we crash after COMMIT but before END:

Transaction is committed (COMMIT record exists)
But it's still in the Transaction Table (no END)
Recovery treats it as a winner (has COMMIT), writes END, done

Without END records, the Transaction Table might accumulate completed transactions, making analysis slower.
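
A simplified sketch of how an analysis-style scan could rebuild the Transaction Table from these records is shown below. It starts from an empty table and ignores checkpoints. `TxnRecord`, `TxnEntry`, and `analyzeTransactions` are hypothetical names.

```typescript
type TxnStatus = "ACTIVE" | "COMMITTED" | "ABORTING";

interface TxnRecord {
    lsn: number;
    txnId: string;
    type: "BEGIN" | "UPDATE" | "CLR" | "COMMIT" | "ABORT" | "END";
}

interface TxnEntry { status: TxnStatus; lastLSN: number; }

function analyzeTransactions(records: TxnRecord[]): Map<string, TxnEntry> {
    const table = new Map<string, TxnEntry>();
    for (const rec of records) {
        if (rec.type === "END") {
            table.delete(rec.txnId);  // fully finished, nothing left to track
            continue;
        }
        const existing = table.get(rec.txnId);
        // The first record seen for a transaction implicitly starts it.
        const entry: TxnEntry = existing ?? { status: "ACTIVE", lastLSN: rec.lsn };
        entry.lastLSN = rec.lsn;
        if (rec.type === "COMMIT") entry.status = "COMMITTED";
        if (rec.type === "ABORT") entry.status = "ABORTING";
        table.set(rec.txnId, entry);
    }
    return table;
}
```

After the scan, entries still marked `COMMITTED` are winners that only need an END record, while `ACTIVE` and `ABORTING` entries are losers whose updates must be undone.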

COMMIT Must Be Force-Written

The COMMIT record MUST be on stable storage before returning success to the application. This is why commit latency includes a log flush. The END record doesn't need immediate flushing—it's just for cleanup. This is a key distinction for performance.
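
The difference between the forced COMMIT and the lazy END can be sketched with a toy in-memory log. `TinyLog` and `commitTransaction` are hypothetical, and the "flush" here merely stands in for a real write-and-sync to stable storage.

```typescript
// Toy log: LSNs are simple counters and a flush is simulated.
class TinyLog {
    private nextLSN = 1;
    private buffered: string[] = [];
    flushedLSN = 0;
    flushCount = 0;

    append(entry: string): number {
        this.buffered.push(entry);
        return this.nextLSN++;
    }

    // Make every record up to `lsn` durable (no-op if already flushed).
    flushUpTo(lsn: number): void {
        if (lsn <= this.flushedLSN) return;
        this.buffered = [];                  // stand-in for write + fsync
        this.flushedLSN = this.nextLSN - 1;  // everything buffered so far
        this.flushCount++;
    }
}

function commitTransaction(log: TinyLog, txnId: string): number {
    const commitLSN = log.append(`COMMIT ${txnId}`);
    log.flushUpTo(commitLSN);    // must finish before success is reported
    log.append(`END ${txnId}`);  // not forced; goes out with a later flush
    return commitLSN;
}
```

In this sketch the caller may report success as soon as `commitTransaction` returns, because `flushedLSN` already covers the COMMIT record even though the END record is still in the buffer.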

CHECKPOINT Records

CHECKPOINT records capture system state at a point in time, bounding how far back recovery must scan. Without checkpoints, recovery would start from the beginning of the log—potentially enormous.

Checkpoint Contents

A checkpoint record contains snapshots of the critical recovery data structures:

Checkpoint Record Structure
// Checkpoint Record Structure
 
interface CheckpointLogRecord {
    lsn: LSN;
    type: "CHECKPOINT";
    
    // Snapshot of Transaction Table
    transactionTable: Array<{
        transactionId: TransactionID;
        state: "ACTIVE" | "COMMITTING" | "ABORTING";
        lastLSN: LSN;
    }>;
    
    // Snapshot of Dirty Page Table
    dirtyPageTable: Array<{
        pageId: PageID;
        recLSN: LSN;
    }>;
    
    // Additional metadata
    timestamp: Date;
    
    // Master record location (for finding this checkpoint)
    // Usually stored separately in a known location
}
 
// Example checkpoint record:
const exampleCheckpoint: CheckpointLogRecord = {
    lsn: 10000,
    type: "CHECKPOINT",
    
    transactionTable: [
        { transactionId: "T1", state: "ACTIVE", lastLSN: 9500 },
        { transactionId: "T2", state: "ACTIVE", lastLSN: 8700 },
        { transactionId: "T3", state: "COMMITTING", lastLSN: 9900 }
    ],
    
    dirtyPageTable: [
        { pageId: "Page_100", recLSN: 8000 },
        { pageId: "Page_250", recLSN: 9200 },
        { pageId: "Page_375", recLSN: 9800 }
    ],
    
    timestamp: new Date("2024-03-15T10:30:00Z")
};
 
// How recovery uses this:
// 1. Find the master record pointing to this checkpoint
// 2. Read checkpoint at LSN 10000
// 3. Initialize TT and DPT from checkpoint contents
// 4. Analysis scans the log forward from the checkpoint (LSN 10000),
//    updating TT and DPT based on subsequent records
// 5. Redo then starts from the smallest recLSN in the DPT
//    (8000 in this example), which can precede the checkpoint

Types of Checkpoints

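Checkpoint strategies differ in how much they disturb normal processing. ARIES itself uses fuzzy checkpoints; the other two are shown for comparison: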

1. Quiescent (Sharp) Checkpoint

Stop all transactions
Flush all dirty pages
Write checkpoint with empty DPT
Resume transactions

Advantage: Simple recovery. Disadvantage: System pause is unacceptable for production.

2. Non-Quiescent (Transaction-Consistent) Checkpoint

Don't stop transactions
Flush dirty pages in background
Checkpoint captures current TT and DPT

Advantage: No long pause (at most a brief stall while the tables are captured). Disadvantage: More complex, more data in checkpoint.

3. Fuzzy Checkpoint (ARIES default)

Don't stop transactions
Don't force page writes
Just record current TT and DPT
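The snapshot may already be stale: pages can be dirtied or flushed while it is being written, so analysis corrects the tables from later log records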

Advantage: Minimal overhead. Disadvantage: More redo work potentially needed.

Checkpoint Type Comparison
Type	Blocks Operations?	Forces Pages?	Recovery Work	Production Suitable?
Quiescent	Yes	Yes	Minimal	No
Transaction-Consistent	Briefly	Yes	Low	Sometimes
Fuzzy	No	No	More	Yes

Master Record

Most systems store a 'master record' at a known location that points to the most recent checkpoint. On restart, recovery reads the master record to find the checkpoint, then starts analysis from there. The master record must be updated atomically after the checkpoint is fully written.
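
Once the master record has led recovery to a checkpoint, the two scan starting points follow from its contents. The sketch below uses only the checkpoint snapshot, which is a simplification (a full analysis pass would also account for pages dirtied after the checkpoint). `CheckpointSnapshot`, `RestartPoints`, and `restartPoints` are hypothetical names.

```typescript
interface CheckpointSnapshot {
    lsn: number;
    dirtyPageTable: Array<{ pageId: string; recLSN: number }>;
}

interface RestartPoints {
    analysisStart: number;  // where the analysis scan begins
    redoStart: number;      // earliest LSN redo may need to re-apply
}

function restartPoints(cp: CheckpointSnapshot): RestartPoints {
    const recLSNs = cp.dirtyPageTable.map(entry => entry.recLSN);
    // With no dirty pages at checkpoint time, nothing before the
    // checkpoint needs redo, so the checkpoint LSN is a safe start.
    const redoStart = recLSNs.length > 0 ? Math.min(...recLSNs) : cp.lsn;
    return { analysisStart: cp.lsn, redoStart };
}

const points = restartPoints({
    lsn: 10000,
    dirtyPageTable: [
        { pageId: "Page_100", recLSN: 8000 },
        { pageId: "Page_250", recLSN: 9200 },
        { pageId: "Page_375", recLSN: 9800 },
    ],
});
```

With the example checkpoint from earlier on this page, analysis starts at LSN 10000 while redo has to reach back to LSN 8000, the oldest recLSN in the Dirty Page Table.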

Physical, Logical, and Physiological Logging

The contents of UPDATE records' redo and undo information depend on the logging strategy. ARIES supports multiple approaches, each with distinct tradeoffs.

Physical Logging

Physical logging records the actual bytes changed:

RedoInfo: "At offset 120, write bytes 0x00 0x01 0x24 0xF8"
UndoInfo: "At offset 120, write bytes 0x00 0x00 0xC3 0x50"

Advantages:

Simple: Just copy bytes
Deterministic: Same bytes give same result
No dependencies: Redo/undo doesn't need to understand semantics

Disadvantages:

Large records: Before/after images can be huge (whole rows or pages)
Fragile: Physical layout must match exactly
No semantic optimization possible

Logging Strategy Examples
// Physical Logging Example
// Updating a 1KB row requires storing 2KB (before + after)
 
const physicalUpdate = {
    redoInfo: {
        pageId: "Page_100",
        offset: 0,
        data: Buffer.alloc(1024)  // Full row after-image
    },
    undoInfo: {
        pageId: "Page_100", 
        offset: 0,
        data: Buffer.alloc(1024)  // Full row before-image
    }
    // Total: ~2KB per row modification!
};
 
// Logical Logging Example
// Compact but requires execution context
 
const logicalUpdate = {
    redoInfo: {
        operation: "UPDATE employees SET salary = salary * 1.1 WHERE dept = 'ENG'"
    },
    undoInfo: {
        operation: "UPDATE employees SET salary = salary / 1.1 WHERE dept = 'ENG'"
    }
    // Total: ~100 bytes, but requires SQL execution, and dividing by 1.1
    // restores the old salaries exactly only if no rounding occurred
};
 
// Physiological Logging (Physical to a Page, Logical Within)
// ARIES's preferred approach
 
const physiologicalUpdate = {
    redoInfo: {
        pageId: "Page_100",      // Specific page (physical)
        operation: "SET_SLOT",   // Logical operation
        slotNumber: 5,
        columnId: "salary",
        newValue: 75000
    },
    undoInfo: {
        pageId: "Page_100",
        operation: "SET_SLOT",
        slotNumber: 5,
        columnId: "salary",
        oldValue: 50000
    }
    // Compact logs + deterministic page-level operations
};

Logical Logging

Logical logging records the operation that was performed:

RedoInfo: "Execute SQL: UPDATE t SET x = x + 10 WHERE id = 5"
UndoInfo: "Execute SQL: UPDATE t SET x = x - 10 WHERE id = 5"

Advantages:

Compact: Just the operation description
Flexible: Works even if physical layout changes
Natural for high-level operations

Disadvantages:

Non-deterministic: Result might differ if context changed
Complex recovery: Needs full execution environment
Concurrency issues: Row might have moved, locking needed

Physiological Logging (ARIES's Choice)

Physiological = Physical-to-a-Page, Logical-within-a-Page.

Identify a specific page (physical targeting)
Describe operation on that page (logical description)
Example: "On Page 100, set slot 5's salary column to 75000"

Advantages:

Compact: No full-page images
Deterministic at page level: Same page, same slot, same result
Supports fine-grained locking
Page-oriented: Natural for buffer pool interaction

Disadvantages:

More complex than pure physical
Requires page-format understanding
Still tied to specific page

Logging Strategy Comparison
Strategy	Log Size	Recovery Complexity	Concurrency Support	ARIES Default
Physical	Large	Simple	Limited	No
Logical	Small	Complex	Limited	No
Physiological	Medium	Medium	Good	Yes

Why Physiological Wins

Physiological logging hits the sweet spot: logs are compact (no full-page images), recovery is deterministic (same page, same operation), and it supports fine-grained concurrency (each record describes a change to one item within one page, so it works with record-level locking). This is why ARIES standardized on physiological logging.
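
To make "physical to a page, logical within a page" concrete, here is a minimal sketch that applies a slot-level operation to a toy page model. The page is modeled as an array of row objects, which is an assumption for illustration only. `SlottedPage`, `SetColumnOp`, and `applySetColumn` are hypothetical names.

```typescript
// Toy page model: one plain object per row slot.
interface SlottedPage {
    pageId: string;
    slots: Array<Record<string, number>>;
}

interface SetColumnOp {
    pageId: string;      // physical part: exactly which page
    slotNumber: number;  // logical part: which row within the page
    columnId: string;
    value: number;
}

function applySetColumn(page: SlottedPage, op: SetColumnOp): void {
    if (page.pageId !== op.pageId) {
        throw new Error(`operation targets ${op.pageId}, not ${page.pageId}`);
    }
    const row = page.slots[op.slotNumber];
    if (row === undefined) {
        throw new Error(`slot ${op.slotNumber} does not exist on ${page.pageId}`);
    }
    row[op.columnId] = op.value;  // no byte offsets involved
}
```

The same function serves redo (with the new value) and undo (with the old value). Because the operation names a slot rather than a byte offset, it stays valid even if the row has been moved within the page.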

Log Record Interconnections

Log records don't exist in isolation—they form a web of interconnected pointers that enable efficient recovery operations.

Forward Log Sequence

Records are written sequentially with monotonically increasing LSNs. This sequence is the primary structure for redo: scan forward, apply operations.

Transaction Backward Chains (PrevLSN)

Each transaction's records form a backward chain via PrevLSN. Undo follows these chains:

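T1: UPDATE(1000) ← UPDATE(3000) ← CLR(6000) ← UPDATE(8000)
    [first]        [second]       [undo]      [third]

CLR UndoNextLSN Pointers

CLRs have additional UndoNextLSN pointers that skip over undone operations. In the chain above, CLR(6000) compensates UPDATE(3000) as part of a partial rollback (for example, to a savepoint), after which T1 continued with UPDATE(8000):

CLR(6000).UndoNextLSN = 1000  (the PrevLSN of UPDATE(3000), which is already undone)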

Page Modification Chains

While not explicit pointers, all records modifying a given page form an implicit chain ordered by LSN. PageLSN comparison uses this ordering.
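
The PageLSN comparison can be sketched as a guard around redo. The page here is reduced to a single value plus its PageLSN, purely for illustration. `PageState`, `RedoRecord`, and `redoIfNeeded` are hypothetical names.

```typescript
interface PageState { pageLSN: number; value: number; }
interface RedoRecord { lsn: number; newValue: number; }

// Re-applies a logged change only if the page has not already seen it.
function redoIfNeeded(page: PageState, rec: RedoRecord): boolean {
    if (page.pageLSN >= rec.lsn) {
        return false;         // change is already reflected on the page
    }
    page.value = rec.newValue;
    page.pageLSN = rec.lsn;   // stamp the page with the record just applied
    return true;
}
```

Because the page is stamped with the LSN of every applied record, running redo twice over the same log leaves the page unchanged the second time.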

Navigating the Structure

Recovery uses different navigations:

Analysis: Forward scan from checkpoint, builds TT and DPT
Redo: Forward scan from RedoLSN, follows log sequence
Undo: Backward traversal via PrevLSN and UndoNextLSN

Each navigation is optimized for its purpose:

Forward is sequential I/O (fast)
Backward traversal jumps directly to each record via LSN pointers (random access, but only the records that need undoing are read)

Log as a Graph

Conceptually, the log is a directed graph: nodes are records, forward edges follow LSN sequence, backward edges follow PrevLSN, and skip edges follow UndoNextLSN. Recovery algorithms traverse this graph in different ways depending on their needs.

Log Storage and Management

The physical storage and management of the log presents its own engineering challenges.

Log Buffer Management

Log records are first written to an in-memory log buffer before being flushed to stable storage:

Transaction writes log record → goes to log buffer
Buffer accumulates records
Buffer flushed to disk on: commit, buffer full, periodic interval, or before a dirty page is written whose PageLSN is beyond FlushedLSN (the write-ahead rule)
FlushedLSN updated after flush
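
A toy version of this buffer is sketched below. It assumes LSNs are byte offsets into the log and that a flush always succeeds. `LogBuffer` and its methods are hypothetical names; the periodic-interval trigger is omitted for brevity.

```typescript
class LogBuffer {
    private pending: Array<{ lsn: number; bytes: number }> = [];
    private pendingBytes = 0;
    private nextLSN = 0;
    private capacityBytes: number;
    flushedLSN = -1;  // LSN of the last record known to be on stable storage

    constructor(capacityBytes: number) {
        this.capacityBytes = capacityBytes;
    }

    // Adds a record of the given size and returns its LSN.
    append(bytes: number): number {
        if (this.pendingBytes + bytes > this.capacityBytes) {
            this.flush();  // "buffer full" trigger
        }
        const lsn = this.nextLSN;
        this.nextLSN += bytes;
        this.pending.push({ lsn, bytes });
        this.pendingBytes += bytes;
        return lsn;
    }

    // Stand-in for writing the buffer to disk and syncing.
    flush(): void {
        const last = this.pending[this.pending.length - 1];
        if (last === undefined) return;  // nothing buffered
        this.flushedLSN = last.lsn;
        this.pending = [];
        this.pendingBytes = 0;
    }

    // A dirty page may go to disk only once its latest log record is durable.
    canWritePage(pageLSN: number): boolean {
        return pageLSN <= this.flushedLSN;
    }
}
```

A buffer manager built on this sketch would call `canWritePage(page.pageLSN)` before writing a dirty page and trigger `flush()` first if it returns false.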

Circular Log with Archiving

Physical logs are often organized as circular buffers with archiving:

Active log: Most recent records, used for recovery
Archive log: Older records, used for point-in-time recovery
When active log fills, oldest portions are archived
Archived portions can be to tape, remote storage, etc.

Log Storage Organization
// Log Storage Configuration
 
interface LogConfiguration {
    // Log buffer (in memory)
    bufferSize: number;           // e.g., 64MB
    flushIntervalMs: number;      // e.g., 1000ms (1 second)
    forceOnCommit: boolean;       // Must be true for durability!
    
    // Active log files
    logFileSize: number;          // e.g., 1GB per file
    numLogFiles: number;          // e.g., 4 files (circular)
    logDirectory: string;         // e.g., "/var/db/wal/"
    
    // Archive settings
    archiveEnabled: boolean;
    archiveDirectory: string;     // e.g., "/backup/wal_archive/"
    compressionEnabled: boolean;
    
    // Checkpoint settings  
    checkpointIntervalSec: number; // e.g., 300 (5 minutes)
    checkpointOnLogFull: boolean;
}
 
// Log file layout:
//
// log_001.wal (1GB)
// ├── Record at LSN 0
// ├── Record at LSN 256
// ├── ...
// └── Record at LSN 1073741568
//
// log_002.wal (1GB)  
// ├── Record at LSN 1073741824
// └── ...
//
// When log_001 is no longer needed (all txns committed, 
// checkpoint past its records), it can be archived or recycled.
 
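// Determining if log file can be freed.
// LogFile is a hypothetical descriptor; only maxLSN is needed here.
interface LogFile { maxLSN: LSN; }

function canFreeLogFile(
    file: LogFile,
    redoLSN: LSN,
    oldestActiveTxnFirstLSN: LSN | null  // null when no transaction is active
): boolean {
    // Needed for redo if any record is at or after the redo starting point
    const neededForRedo = file.maxLSN >= redoLSN;
    // Needed for undo if an active transaction's chain may reach into this file
    const neededForUndo =
        oldestActiveTxnFirstLSN !== null && file.maxLSN >= oldestActiveTxnFirstLSN;
    return !neededForRedo && !neededForUndo;
}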

Log Truncation

The log can't grow forever. Log truncation removes old records that are no longer needed for recovery:

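A record can be truncated only when it precedes both the oldest active transaction's first record (it may be needed for undo) and the checkpoint's RedoLSN (it may be needed for redo)
The safe truncation point is therefore the smaller of those two LSNs
Records needed for replication or point-in-time recovery must also be kept, typically by archiving them first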
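
The rules above reduce to taking a minimum over every LSN that something still depends on. A minimal sketch follows, in which `truncationPoint` and its parameter names are hypothetical and `retentionLSN` stands for whatever replication or point-in-time recovery still requires.

```typescript
// Returns the LSN below which log records are no longer needed.
// Pass null for a bound that does not currently apply.
function truncationPoint(
    redoLSN: number,
    oldestActiveTxnFirstLSN: number | null,
    retentionLSN: number | null,
): number {
    const bounds: number[] = [redoLSN];
    if (oldestActiveTxnFirstLSN !== null) bounds.push(oldestActiveTxnFirstLSN);
    if (retentionLSN !== null) bounds.push(retentionLSN);
    return Math.min(...bounds);  // the most conservative bound wins
}
```

For example, with a RedoLSN of 8000 and an active transaction that began at LSN 7500, only records below 7500 are safe to drop.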

Write-Ahead Log Performance

The log's performance characteristics are crucial:

Sequential writes: Log appends are sequential, which is fast
Synchronous flushes: Commits require sync to stable storage
Group commit: Batching multiple commits in one flush
Dedicated disk: Often the log is on a separate disk/SSD from data

Modern systems use NVMe SSDs or battery-backed write caches to make log flushes fast without sacrificing durability.
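
Group commit in particular is easy to sketch: commits queue up, and one flush acknowledges all of them. The class below is a synchronous toy model in which the flush is simulated. `GroupCommitter`, `requestCommit`, and `flushBatch` are hypothetical names.

```typescript
class GroupCommitter {
    private waiting: Array<{ txnId: string; commitLSN: number }> = [];
    flushCount = 0;
    flushedLSN = 0;

    // Called after a transaction's COMMIT record has been appended.
    requestCommit(txnId: string, commitLSN: number): void {
        this.waiting.push({ txnId, commitLSN });
    }

    // One (simulated) flush covers every commit queued so far.
    // Returns the transactions that may now be acknowledged.
    flushBatch(): string[] {
        if (this.waiting.length === 0) return [];
        const highest = Math.max(...this.waiting.map(w => w.commitLSN));
        this.flushedLSN = Math.max(this.flushedLSN, highest);
        this.flushCount++;
        const acknowledged = this.waiting.map(w => w.txnId);
        this.waiting = [];
        return acknowledged;
    }
}
```

Three commits queued before a single `flushBatch()` call cost one flush instead of three, trading a little latency for each transaction against much higher throughput.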

Log Disk Separation

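Production databases typically keep the log on a separate physical device from data files. Log appends are sequential, while data-page writes (including checkpoint flushes) are scattered across the data files; on a shared spinning disk the head keeps seeking between the two, which breaks up the log's sequential pattern. Separating them removes that contention. SSDs change this calculus, but dedicated log devices remain common for latency predictability.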

Summary: Log Structure

The log structure is the information foundation of ARIES recovery. Let's consolidate the key concepts:

Key Takeaways

• UPDATE records capture redo and undo information for data modifications, using physiological logging in ARIES.
• CLR records log undo actions with UndoNextLSN pointers, enabling crash-during-recovery resilience.
• Transaction records (COMMIT, ABORT, END) track lifecycle; COMMIT must be force-written for durability.
• CHECKPOINT records capture TT and DPT snapshots, bounding recovery scan length.
• Physiological logging balances log compactness with recovery determinism and concurrency support.
• Log records interconnect via PrevLSN chains and UndoNextLSN pointers, enabling efficient traversal.

What's Next

With log structure understood, we'll explore the core ARIES principles—the theoretical foundations that ensure recovery is always correct: Write-Ahead Logging, Repeat History, and the invariants that tie everything together.

Page Complete

You now understand the complete structure of ARIES logs: the record types, their fields, how they interconnect, and the design decisions behind physiological logging. This structural knowledge provides the foundation for understanding ARIES's correctness principles.

4 / 5

Loading learning content...

Database Management SystemsARIES Overview

ARIES Recovery Algorithm Overview

LevelAdvanced

Duration75 mins

TopicARIES Overview

4 / 5

ARIES Log Structure

The Recovery Information Repository

What You Will Learn

Log Record Types

ARIES uses several types of log records, each serving a distinct purpose in the recovery ecosystem. Understanding these types is fundamental to grasping how recovery works.

Core Record Types

ARIES Log Record Types
Record Type	Purpose	Generated When	Recovery Use
UPDATE	Records a data modification	Data operation executes	Redo and undo operations
COMMIT	Marks transaction as committed	Transaction commits	Identify winner transactions
ABORT	Marks transaction as aborting	Transaction aborts	Trigger undo processing
CLR (Compensation)	Records an undo action	During rollback/recovery	Redo the undo; track undo progress
END	Marks transaction completion	After commit or full rollback	Remove from transaction table
BEGIN	Marks transaction start	Transaction starts (optional)	Debugging/auditing
CHECKPOINT	Captures system state snapshot	Periodically by system	Bound recovery scan

Record Type Categories

We can categorize these records by their role:

Data Records (UPDATE, CLR): Describe modifications to database pages
Transaction Records (BEGIN, COMMIT, ABORT, END): Track transaction lifecycle
System Records (CHECKPOINT): Capture system-wide state

BEGIN Records Are Optional

UPDATE Record: The Workhorse

The UPDATE record is the most important and most frequent log record type. It captures all the information needed to redo or undo a data modification.

UPDATE Record Fields

UPDATE Log Record Structure
Field	Type	Description	Example
LSN	Log Sequence Number	Unique identifier for this record	5000
Type	Enum	Record type identifier	UPDATE
TransactionID	Identifier	Transaction that made this change	T42
PrevLSN	LSN	Previous log record by this transaction	4500
PageID	Identifier	Database page being modified	Page_1023
Length	Integer	Size of this log record	256 bytes
UndoInfo	Bytes/Operation	Information to reverse this change	Old row, or inverse operation
RedoInfo	Bytes/Operation	Information to re-apply this change	New row, or operation details

UPDATE Record Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
// Detailed UPDATE record structure
 
interface UpdateLogRecord {
    // Header fields (common to all record types)
    lsn: LSN;                    // Assigned when record is written
    type: "UPDATE";              // Record type identifier
    transactionId: TransactionID; // Owning transaction
    prevLSN: LSN | null;         // Previous record by this transaction
    
    // Page identification
    pageId: PageID;              // Which page is being modified
    
    // Modification information (depends on logging type)
    undoInfo: UndoData;          // How to reverse this change
    redoInfo: RedoData;          // How to re-apply this change
}
 
// Example concrete UPDATE record:
const exampleUpdate: UpdateLogRecord = {
    lsn: 5000,
    type: "UPDATE",
    transactionId: "T42",
    prevLSN: 4500,  // T42's previous operation was at LSN 4500
    
    pageId: "Page_1023",
    
    // For a row update: "SET salary = 75000 WHERE id = 5"
    // Physical logging: store before/after byte values
    undoInfo: {
        offset: 120,      // Where on the page
        length: 4,        // How many bytes
        beforeValue: Buffer.from([0x00, 0x00, 0xC3, 0x50])  // 50000 in binary
    },
    redoInfo: {
        offset: 120,
        length: 4,
        afterValue: Buffer.from([0x00, 0x01, 0x24, 0xF8])   // 75000 in binary
    }
};
 
// Physiological logging alternative:
const physiologicalUpdate = {
    lsn: 5000,
    type: "UPDATE",
    transactionId: "T42",
    prevLSN: 4500,
    pageId: "Page_1023",
    
    // Operation-based description
    undoInfo: {
        operation: "SET_COLUMN",
        rowSlot: 5,
        columnId: "salary",
        value: 50000  // Old value to restore
    },
    redoInfo: {
        operation: "SET_COLUMN",
        rowSlot: 5,
        columnId: "salary", 
        value: 75000  // New value to apply
    }
};

Understanding UndoInfo and RedoInfo

The undo and redo information are the payload of an UPDATE record. What they contain depends on the logging strategy:

Physical logging: Raw bytes—before-image for undo, after-image for redo
Logical logging: Operation description—"decrement X by 10" for redo, "increment X by 10" for undo
Physiological logging: Operation on a specific page—"set slot 5, column 3 to value V"

The choice of logging strategy affects log size, redo/undo complexity, and recovery flexibility.

Undo and Redo Are Often Symmetric

CLR Record: Compensation Logging

CLR Design Philosophy

A CLR describes the inverse of an UPDATE. When we undo an UPDATE at LSN X, we write a CLR that:

Describes what we did to reverse the UPDATE (for redo if we crash)
Contains UndoNextLSN pointing to the next record to undo

CLRs are redo-only records—they are never undone themselves. If we crash and recover, CLRs are redone just like UPDATEs, restoring our undo progress.

CLR Record Fields

CLR Log Record Structure
Field	Type	Description	Key Difference from UPDATE
LSN	LSN	This CLR's unique identifier	Same as UPDATE
Type	Enum	Record type = CLR	Different type indicator
TransactionID	ID	Transaction being rolled back	Same as UPDATE
PrevLSN	LSN	Previous record by this transaction	Points to the transaction's most recent prior record (a CLR, the ABORT record, or an UPDATE)
PageID	ID	Page where undo was applied	Same as UPDATE
RedoInfo	Data	How to redo this undo action	Contains the undo operation
UndoNextLSN	LSN	Next record to undo for this txn	NOT in UPDATE records

CLR Record Example
// CLR Record Structure
 
interface CLRLogRecord {
    // Common header fields
    lsn: LSN;
    type: "CLR";
    transactionId: TransactionID;
    prevLSN: LSN;              // Points to previous log record (could be CLR or UPDATE)
    
    // Page identification
    pageId: PageID;            // Page where the undo was applied
    
    // The undo action (for redo)
    redoInfo: RedoData;        // Describes what undo did to the page
    
    // CLR-specific: where to continue undoing
    undoNextLSN: LSN | null;   // Next record to undo (or null if undo complete)
    
    // Note: NO undoInfo field! CLRs are never undone.
}
 
// Example: Undoing the UPDATE from previous example
 
// Original UPDATE at LSN 5000 was: SET salary from 50000 to 75000
// Undo means: SET salary from 75000 to 50000
// The original UPDATE's PrevLSN was 4500
 
const exampleCLR: CLRLogRecord = {
    lsn: 6000,                  // New LSN for this CLR
    type: "CLR",
    transactionId: "T42",
    prevLSN: 5000,              // Transaction's previous record: here the UPDATE being undone
                                 // (an ABORT record or earlier CLR if one was written in between)
    
    pageId: "Page_1023",
    
    // How to redo the undo (if we crash and recover)
    redoInfo: {
        offset: 120,
        length: 4,
        afterValue: Buffer.from([0x00, 0x00, 0xC3, 0x50])  // Restore to 50000
    },
    
    // Continue undoing at LSN 4500 (the original UPDATE's PrevLSN)
    undoNextLSN: 4500
};
 
// Key insight:
// The CLR's UndoNextLSN = The undone record's PrevLSN
// This skips over the record we just undid when continuing undo

CLR Chaining During Rollback

As a transaction is rolled back, CLRs form their own chain:

[CLR for last op] → [CLR for second-to-last] → ... → [CLR for first op]

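Each CLR's PrevLSN points to the transaction's previous log record: the prior CLR, or, for the first CLR, the last record written before undo began (the ABORT record if one was logged, otherwise the last UPDATE). The UndoNextLSN points to where undo should continue, skipping past what has already been undone.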
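
The chaining rule can be sketched as a small rollback loop. This is a simplified illustration, not the full ARIES undo pass: the record shapes, the `rollback` function, and the fixed LSN increment of 100 are assumptions made for the example, and the actual page modification is omitted.

```typescript
type LSN = number;

interface UpdateRec { type: "UPDATE"; lsn: LSN; txn: string; prevLSN: LSN | null; }
interface ClrRec { type: "CLR"; lsn: LSN; txn: string; prevLSN: LSN | null; undoNextLSN: LSN | null; }
type Rec = UpdateRec | ClrRec;

// Roll back one transaction, appending a CLR for every UPDATE that is undone.
// `log` maps LSN to record; `lastLSN` is the transaction's most recent record.
function rollback(log: Map<LSN, Rec>, txn: string, lastLSN: LSN, nextLSN: LSN): ClrRec[] {
    const written: ClrRec[] = [];
    let toUndo: LSN | null = lastLSN;
    let prev: LSN | null = lastLSN;
    while (toUndo !== null) {
        const rec = log.get(toUndo);
        if (rec === undefined) throw new Error(`missing log record ${toUndo}`);
        if (rec.type === "CLR") {
            // CLRs are never undone: jump straight to the record they point at.
            toUndo = rec.undoNextLSN;
            continue;
        }
        // (Applying the inverse change to the page is omitted in this sketch.)
        const clr: ClrRec = { type: "CLR", lsn: nextLSN, txn, prevLSN: prev, undoNextLSN: rec.prevLSN };
        log.set(clr.lsn, clr);
        written.push(clr);
        prev = clr.lsn;
        nextLSN += 100; // arbitrary LSN spacing, assumed for the example
        toUndo = rec.prevLSN;
    }
    return written;
}

const log = new Map<LSN, Rec>([
    [4500, { type: "UPDATE", lsn: 4500, txn: "T42", prevLSN: null }],
    [5000, { type: "UPDATE", lsn: 5000, txn: "T42", prevLSN: 4500 }],
]);
const clrs = rollback(log, "T42", 5000, 6000);
// clrs[0]: lsn 6000, prevLSN 5000, undoNextLSN 4500
// clrs[1]: lsn 6100, prevLSN 6000, undoNextLSN null (rollback complete)
```

If a crash interrupts the rollback after the first CLR, calling `rollback` again with `lastLSN` set to that CLR resumes at LSN 4500 instead of undoing LSN 5000 a second time.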


CLRs Are Never Undone

Transaction State Records

Transaction state records track the lifecycle of transactions without modifying data. They're simpler than UPDATE records but essential for recovery correctness.

COMMIT Record

The COMMIT record marks that a transaction has successfully completed and its effects should persist. It's the point of no return.

Transaction State Records
// COMMIT Record Structure
interface CommitLogRecord {
    lsn: LSN;
    type: "COMMIT";
    transactionId: TransactionID;
    prevLSN: LSN;              // Last UPDATE/CLR by this transaction
    timestamp?: Date;          // Optional: when commit occurred
}
 
// ABORT Record Structure
interface AbortLogRecord {
    lsn: LSN;
    type: "ABORT";
    transactionId: TransactionID;
    prevLSN: LSN;
    // Records that transaction is aborting (rollback will follow)
}
 
// END Record Structure
interface EndLogRecord {
    lsn: LSN;
    type: "END";
    transactionId: TransactionID;
    prevLSN: LSN;              // Points to COMMIT or last CLR
    // Marks transaction completely finished (can remove from TT)
}
 
// BEGIN Record Structure (optional in many systems)
interface BeginLogRecord {
    lsn: LSN;
    type: "BEGIN";
    transactionId: TransactionID;
    prevLSN: null;             // First record for this transaction
    timestamp?: Date;
}
 
// Transaction lifecycle examples:
 
// Successful commit:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → COMMIT(T1) → END(T1)
 
// Aborted transaction:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → ABORT(T1) 
//           → CLR(T1) → CLR(T1) → END(T1)
 
// Crash during transaction:
// BEGIN(T1) → UPDATE(T1) → UPDATE(T1) → [CRASH]
// Recovery will: undo all T1 updates, write CLRs, write END

The Role of Each Transaction Record

Transaction Record Purposes
Record	When Written	Recovery Implication	Can Be Omitted?
BEGIN	Transaction start	Provides start time, txn ID assignment	Yes (implicit from first UPDATE)
COMMIT	Before returning success to app	Mark as winner (don't undo)	No—required for durability
ABORT	When rollback requested	Explicitly start rollback	Yes (absence implies abort if no COMMIT)
END	After commit or complete rollback	Remove from transaction table	Yes (but slows recovery)

COMMIT vs END: Why Both?

COMMIT and END are separate because there's work between them:

COMMIT is written → transaction is now durable
Release locks → other transactions can proceed
Clean up resources → buffers, temporary structures
END is written → transaction is fully done, can remove from TT

If we crash after COMMIT but before END:

Transaction is committed (COMMIT record exists)
But it's still in the Transaction Table (no END)
Recovery treats it as a winner (has COMMIT), writes END, done

Without END records, the Transaction Table might accumulate completed transactions, making analysis slower.
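
The interplay of COMMIT and END during recovery can be sketched as a forward scan that rebuilds the Transaction Table. This is a simplified illustration under assumed names (`TxnRecord`, `TxnEntry`, `analyze`); it ignores checkpoints and the Dirty Page Table.

```typescript
type TxnState = "ACTIVE" | "COMMITTING" | "ABORTING";
interface TxnEntry { state: TxnState; lastLSN: number; }
interface TxnRecord { lsn: number; type: "UPDATE" | "CLR" | "COMMIT" | "ABORT" | "END"; txn: string; }

// Rebuild the Transaction Table (TT) from a forward scan of the log.
function analyze(records: TxnRecord[]): Map<string, TxnEntry> {
    const tt = new Map<string, TxnEntry>();
    for (const r of records) {
        if (r.type === "END") {
            tt.delete(r.txn); // fully finished: nothing left to track
            continue;
        }
        const entry: TxnEntry = tt.get(r.txn) ?? { state: "ACTIVE", lastLSN: r.lsn };
        entry.lastLSN = r.lsn;
        if (r.type === "COMMIT") entry.state = "COMMITTING";
        if (r.type === "ABORT") entry.state = "ABORTING";
        tt.set(r.txn, entry);
    }
    return tt;
}

const logRecords: TxnRecord[] = [
    { lsn: 100, type: "UPDATE", txn: "T1" },
    { lsn: 200, type: "UPDATE", txn: "T2" },
    { lsn: 300, type: "COMMIT", txn: "T1" },
    { lsn: 400, type: "END", txn: "T1" },
    { lsn: 500, type: "UPDATE", txn: "T3" },
    { lsn: 600, type: "COMMIT", txn: "T2" },
    // crash here: T2 has COMMIT but no END, T3 has neither
];
const tt = analyze(logRecords);
```

After the scan, T1 is gone from the table, T2 remains as `COMMITTING` (a winner that only needs its END record written), and T3 remains as `ACTIVE` (a loser that must be undone).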

COMMIT Must Be Force-Written

CHECKPOINT Records

Checkpoint Contents

A checkpoint record contains snapshots of the critical recovery data structures:

Checkpoint Record Structure
// Checkpoint Record Structure
 
interface CheckpointLogRecord {
    lsn: LSN;
    type: "CHECKPOINT";
    
    // Snapshot of Transaction Table
    transactionTable: Array<{
        transactionId: TransactionID;
        state: "ACTIVE" | "COMMITTING" | "ABORTING";
        lastLSN: LSN;
    }>;
    
    // Snapshot of Dirty Page Table
    dirtyPageTable: Array<{
        pageId: PageID;
        recLSN: LSN;
    }>;
    
    // Additional metadata
    timestamp: Date;
    
    // Master record location (for finding this checkpoint)
    // Usually stored separately in a known location
}
 
// Example checkpoint record:
const exampleCheckpoint: CheckpointLogRecord = {
    lsn: 10000,
    type: "CHECKPOINT",
    
    transactionTable: [
        { transactionId: "T1", state: "ACTIVE", lastLSN: 9500 },
        { transactionId: "T2", state: "ACTIVE", lastLSN: 8700 },
        { transactionId: "T3", state: "COMMITTING", lastLSN: 9900 }
    ],
    
    dirtyPageTable: [
        { pageId: "Page_100", recLSN: 8000 },
        { pageId: "Page_250", recLSN: 9200 },
        { pageId: "Page_375", recLSN: 9800 }
    ],
    
    timestamp: new Date("2024-03-15T10:30:00Z")
};
 
// How recovery uses this:
// 1. Find the master record pointing to this checkpoint
// 2. Read checkpoint at LSN 10000
// 3. Initialize TT and DPT from checkpoint contents
// 4. Analysis scans the log forward from LSN 10000,
//    updating TT and DPT based on subsequent records
// 5. Redo then starts from the smallest recLSN in the DPT
//    (8000 in this example)
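
The redo starting point follows directly from the Dirty Page Table. The sketch below uses the DPT values from the checkpoint example; `redoStartLSN` is a hypothetical helper, and in a real recovery the DPT would first be brought up to date by the Analysis pass.

```typescript
interface DirtyPage { pageId: string; recLSN: number; }

// Redo must begin at the oldest change that might not have reached disk,
// which is the smallest recLSN in the Dirty Page Table.
// Returns null when no page is dirty, meaning there is nothing to redo.
function redoStartLSN(dpt: DirtyPage[]): number | null {
    if (dpt.length === 0) return null;
    return Math.min(...dpt.map(p => p.recLSN));
}

const dirtyPageTable: DirtyPage[] = [
    { pageId: "Page_100", recLSN: 8000 },
    { pageId: "Page_250", recLSN: 9200 },
    { pageId: "Page_375", recLSN: 9800 },
];
const redoFrom = redoStartLSN(dirtyPageTable); // 8000, earlier than the checkpoint at 10000
```

This is why the redo scan can begin before the checkpoint record itself: the checkpoint bounds Analysis, while the oldest recLSN bounds Redo.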

Types of Checkpoints

ARIES supports different checkpoint strategies:

1. Quiescent (Sharp) Checkpoint

Stop all transactions
Flush all dirty pages
Write checkpoint with empty DPT
Resume transactions

Advantage: Simple recovery. Disadvantage: System pause is unacceptable for production.

2. Non-Quiescent (Transaction-Consistent) Checkpoint

Don't stop transactions
Flush dirty pages in background
Checkpoint captures current TT and DPT

Advantage: No long pause (at most a brief block, as the comparison table notes). Disadvantage: More complex, more data in checkpoint.

3. Fuzzy Checkpoint (ARIES default)

Don't stop transactions
Don't force page writes
Just record current TT and DPT
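The snapshot may be slightly stale: pages can be dirtied or flushed while it is taken, and Analysis reconciles this from the log records that follow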

Advantage: Minimal overhead. Disadvantage: More redo work potentially needed.

Checkpoint Type Comparison
Type	Blocks Operations?	Forces Pages?	Recovery Work	Production Suitable?
Quiescent	Yes	Yes	Minimal	No
Transaction-Consistent	Briefly	Yes	Low	Sometimes
Fuzzy	No	No	More	Yes
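
A fuzzy checkpoint amounts to copying the two in-memory tables into a log record without touching any data page. The sketch below illustrates that idea only; `takeFuzzyCheckpoint` and its types are assumptions for the example, and writing the record to the log and updating the master record are omitted.

```typescript
type TxnState = "ACTIVE" | "COMMITTING" | "ABORTING";
interface TxnEntry { state: TxnState; lastLSN: number; }

interface Checkpoint {
    lsn: number;
    transactionTable: { transactionId: string; state: TxnState; lastLSN: number }[];
    dirtyPageTable: { pageId: string; recLSN: number }[];
}

// Snapshot the in-memory tables. No transaction is stopped and no page is flushed.
function takeFuzzyCheckpoint(
    lsn: number,
    tt: Map<string, TxnEntry>,
    dpt: Map<string, number> // pageId -> recLSN
): Checkpoint {
    return {
        lsn,
        transactionTable: Array.from(tt, ([transactionId, e]) => ({
            transactionId,
            state: e.state,
            lastLSN: e.lastLSN,
        })),
        dirtyPageTable: Array.from(dpt, ([pageId, recLSN]) => ({ pageId, recLSN })),
    };
}

const liveTT = new Map<string, TxnEntry>([["T1", { state: "ACTIVE", lastLSN: 9500 }]]);
const liveDPT = new Map<string, number>([["Page_100", 8000]]);
const checkpoint = takeFuzzyCheckpoint(10000, liveTT, liveDPT);

// Work continues after the checkpoint; the snapshot is unaffected.
liveTT.set("T1", { state: "ACTIVE", lastLSN: 10200 });
liveDPT.set("Page_250", 10200);
```

The checkpoint still lists one dirty page and T1 at LSN 9500; the later activity is recovered from the log records written after the checkpoint.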

Master Record

Physical, Logical, and Physiological Logging

The contents of UPDATE records' redo and undo information depend on the logging strategy. ARIES supports multiple approaches, each with distinct tradeoffs.

Physical Logging

Physical logging records the actual bytes changed:

RedoInfo: "At offset 120, write bytes 0x00 0x01 0x24 0xF8"
UndoInfo: "At offset 120, write bytes 0x00 0x00 0xC3 0x50"

Advantages:

Simple: Just copy bytes
Deterministic: Same bytes give same result
No dependencies: Redo/undo doesn't need to understand semantics

Disadvantages:

Large records: Before/after images can be huge (whole rows or pages)
Fragile: Physical layout must match exactly
No semantic optimization possible

Logging Strategy Examples
// Physical Logging Example
// Updating a 1KB row requires storing 2KB (before + after)
 
const physicalUpdate = {
    redoInfo: {
        pageId: "Page_100",
        offset: 0,
        data: Buffer.alloc(1024)  // Full row after-image
    },
    undoInfo: {
        pageId: "Page_100", 
        offset: 0,
        data: Buffer.alloc(1024)  // Full row before-image
    }
    // Total: ~2KB per row modification!
};
 
// Logical Logging Example
// Compact but requires execution context
 
const logicalUpdate = {
    redoInfo: {
        operation: "UPDATE employees SET salary = salary * 1.1 WHERE dept = 'ENG'"
    },
    undoInfo: {
        operation: "UPDATE employees SET salary = salary / 1.1 WHERE dept = 'ENG'"
    }
    // Total: ~100 bytes, but requires SQL execution!
};
 
// Physiological Logging (Physical to a Page, Logical Within)
// ARIES's preferred approach
 
const physiologicalUpdate = {
    redoInfo: {
        pageId: "Page_100",      // Specific page (physical)
        operation: "SET_SLOT",   // Logical operation
        slotNumber: 5,
        columnId: "salary",
        newValue: 75000
    },
    undoInfo: {
        pageId: "Page_100",
        operation: "SET_SLOT",
        slotNumber: 5,
        columnId: "salary",
        oldValue: 50000
    }
    // Compact logs + deterministic page-level operations
};

Logical Logging

Logical logging records the operation that was performed:

RedoInfo: "Execute SQL: UPDATE t SET x = x + 10 WHERE id = 5"
UndoInfo: "Execute SQL: UPDATE t SET x = x - 10 WHERE id = 5"

Advantages:

Compact: Just the operation description
Flexible: Works even if physical layout changes
Natural for high-level operations

Disadvantages:

Non-deterministic: Result might differ if context changed
Complex recovery: Needs full execution environment
Concurrency issues: Row might have moved, locking needed

Physiological Logging (ARIES's Choice)

Physiological = Physical-to-a-Page, Logical-within-a-Page.

Identify a specific page (physical targeting)
Describe operation on that page (logical description)
Example: "On Page 100, set slot 5's salary column to 75000"

Advantages:

Compact: No full-page images
Deterministic at page level: Same page, same slot, same result
Supports fine-grained locking
Page-oriented: Natural for buffer pool interaction

Disadvantages:

More complex than pure physical
Requires page-format understanding
Still tied to specific page

Logging Strategy Comparison
Strategy	Log Size	Recovery Complexity	Concurrency Support	ARIES Default
Physical	Large	Simple	Limited	No
Logical	Small	Complex	Limited	No
Physiological	Medium	Medium	Good	Yes
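
To make the physiological row of the table concrete, here is a minimal sketch of applying a `SET_COLUMN` operation to an in-memory page. The `SlottedPage` model and `applyOp` helper are assumptions for illustration; a real page would store rows as bytes behind a slot directory.

```typescript
// Hypothetical page model: rows are addressed by slot number, not by byte offset.
type Row = Record<string, number>;
interface SlottedPage { pageId: string; slots: Map<number, Row>; }

interface SetColumnOp {
    operation: "SET_COLUMN";
    rowSlot: number;
    columnId: string;
    value: number;
}

// Apply a page-local logical operation. The same function serves redo and undo;
// only the value carried by the operation differs.
function applyOp(page: SlottedPage, op: SetColumnOp): void {
    const row = page.slots.get(op.rowSlot);
    if (row === undefined) throw new Error(`no row in slot ${op.rowSlot} of ${page.pageId}`);
    row[op.columnId] = op.value;
}

const salaryPage: SlottedPage = {
    pageId: "Page_1023",
    slots: new Map<number, Row>([[5, { salary: 50000 }]]),
};
const redoOp: SetColumnOp = { operation: "SET_COLUMN", rowSlot: 5, columnId: "salary", value: 75000 };
const undoOp: SetColumnOp = { operation: "SET_COLUMN", rowSlot: 5, columnId: "salary", value: 50000 };

applyOp(salaryPage, redoOp);
const salaryAfterRedo = salaryPage.slots.get(5)?.salary; // 75000

applyOp(salaryPage, undoOp);
const salaryAfterUndo = salaryPage.slots.get(5)?.salary; // 50000
```

The operation names a page and a slot but never a byte offset, so it stays valid even if rows are rearranged within the page.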

Why Physiological Wins

Log Record Interconnections

Log records don't exist in isolation—they form a web of interconnected pointers that enable efficient recovery operations.

Forward Log Sequence

Records are written sequentially with monotonically increasing LSNs. This sequence is the primary structure for redo: scan forward, apply operations.

Transaction Backward Chains (PrevLSN)

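Each transaction's records form a backward chain via PrevLSN, and undo follows these chains. In this example, T1 partially rolled back (undoing its second update) and then continued with a third: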

T1: UPDATE(1000) ← UPDATE(3000) ← CLR(6000) ← UPDATE(8000)
     [first]  ←    [second]   ←   [undo]   ←   [third]

CLR UndoNextLSN Pointers

CLRs have additional UndoNextLSN pointers that skip over undone operations:

CLR(6000).UndoNextLSN = 1000  (skip UPDATE at 3000, already undone)

Page Modification Chains

While not explicit pointers, all records modifying a given page form an implicit chain ordered by LSN. PageLSN comparison uses this ordering.
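
The PageLSN comparison can be sketched as the check that redo performs for each record. This follows the usual ARIES redo conditions, but `needsRedo` and its parameters are names assumed for the example; in a real system the page's PageLSN is only known after the page is read from disk.

```typescript
interface RedoCandidate { lsn: number; pageId: string; }

// Decide whether a logged change must be reapplied to its page.
// `dpt` maps pageId -> recLSN; `pageLSN` is the LSN stamped on the page as read from disk.
function needsRedo(rec: RedoCandidate, dpt: Map<string, number>, pageLSN: number): boolean {
    const recLSN = dpt.get(rec.pageId);
    if (recLSN === undefined) return false; // page was not dirty: the change is on disk
    if (rec.lsn < recLSN) return false;     // change predates the page's first unflushed update
    return pageLSN < rec.lsn;               // reapply only if the page has not seen this record
}

const dpt = new Map<string, number>([["Page_100", 8000]]);

const tooOld = needsRedo({ lsn: 7000, pageId: "Page_100" }, dpt, 6000);        // false
const mustApply = needsRedo({ lsn: 8500, pageId: "Page_100" }, dpt, 8000);     // true
const alreadyThere = needsRedo({ lsn: 8500, pageId: "Page_100" }, dpt, 9000);  // false
const cleanPage = needsRedo({ lsn: 8500, pageId: "Page_999" }, dpt, 0);        // false
```

Because every page's records are ordered by LSN, a single comparison against PageLSN is enough to tell which of them the page already reflects.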


Navigating the Structure

Recovery uses different navigations:

Analysis: Forward scan from checkpoint, builds TT and DPT
Redo: Forward scan from RedoLSN, follows log sequence
Undo: Backward traversal via PrevLSN and UndoNextLSN

Each navigation is optimized for its purpose:

Forward is sequential I/O (fast)
Backward uses direct seeking via LSN pointers (minimal I/O)

Log as a Graph

Log Storage and Management

The physical storage and management of the log presents its own engineering challenges.

Log Buffer Management

Log records are first written to an in-memory log buffer before being flushed to stable storage:

Transaction writes log record → goes to log buffer
Buffer accumulates records
Buffer flushed to disk on: commit, buffer full, periodic interval, or before a dirty page with a newer PageLSN is written out (the write-ahead rule)
FlushedLSN updated after flush
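
The steps above can be sketched as a small in-memory log buffer. This is an illustration under simplifying assumptions: `LogBuffer` is a hypothetical class, LSNs are simple counters rather than byte offsets, and the "stable" array stands in for an actual synced write to disk.

```typescript
interface BufferedRecord { lsn: number; payload: string; }

class LogBuffer {
    private pending: BufferedRecord[] = [];
    private stable: BufferedRecord[] = []; // stands in for the on-disk log
    private nextLSN = 1;
    private capacity: number;
    flushedLSN = 0;

    constructor(capacity: number) {
        this.capacity = capacity;
    }

    // Append a record to the buffer; flush only when the buffer is full.
    append(payload: string): number {
        const lsn = this.nextLSN++;
        this.pending.push({ lsn, payload });
        if (this.pending.length >= this.capacity) this.flush();
        return lsn;
    }

    // Move everything buffered so far to stable storage and advance flushedLSN.
    flush(): void {
        if (this.pending.length === 0) return;
        this.stable.push(...this.pending);
        this.pending = [];
        this.flushedLSN = this.nextLSN - 1;
    }

    // A commit is not acknowledged until its record is on stable storage.
    commit(payload: string): number {
        const lsn = this.append(payload);
        if (this.flushedLSN < lsn) this.flush();
        return lsn;
    }

    stableCount(): number {
        return this.stable.length;
    }
}

const buf = new LogBuffer(8);
buf.append("UPDATE T1");
const flushedBeforeCommit = buf.flushedLSN; // 0: the update is still only in memory
const commitLSN = buf.commit("COMMIT T1");
const flushedAfterCommit = buf.flushedLSN;  // 2: commit forced both records out
```

Note that the commit flush also carries the earlier UPDATE to stable storage, since the log is flushed in LSN order.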

Circular Log with Archiving

Physical logs are often organized as circular buffers with archiving:

Active log: Most recent records, used for recovery
Archive log: Older records, used for point-in-time recovery
When active log fills, oldest portions are archived
Archived portions can be moved to tape, remote storage, etc.

Log Storage Organization
// Log Storage Configuration
 
interface LogConfiguration {
    // Log buffer (in memory)
    bufferSize: number;           // e.g., 64MB
    flushIntervalMs: number;      // e.g., 1000ms (1 second)
    forceOnCommit: boolean;       // Must be true for durability!
    
    // Active log files
    logFileSize: number;          // e.g., 1GB per file
    numLogFiles: number;          // e.g., 4 files (circular)
    logDirectory: string;         // e.g., "/var/db/wal/"
    
    // Archive settings
    archiveEnabled: boolean;
    archiveDirectory: string;     // e.g., "/backup/wal_archive/"
    compressionEnabled: boolean;
    
    // Checkpoint settings  
    checkpointIntervalSec: number; // e.g., 300 (5 minutes)
    checkpointOnLogFull: boolean;
}
 
// Log file layout:
//
// log_001.wal (1GB)
// ├── Record at LSN 0
// ├── Record at LSN 256
// ├── ...
// └── Record at LSN 1073741568
//
// log_002.wal (1GB)  
// ├── Record at LSN 1073741824
// └── ...
//
// When log_001 is no longer needed (all txns committed, 
// checkpoint past its records), it can be archived or recycled.
 
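// Determining if log file can be freed:
function canFreeLogFile(
    file: LogFile,
    redoLSN: LSN,
    oldestActiveTxnFirstLSN: LSN | null  // null when no transaction is active
): boolean {
    // Redo needs nothing before redoLSN; undo needs nothing before the
    // first record of the oldest active transaction
    const neededForUndo =
        oldestActiveTxnFirstLSN !== null && file.maxLSN >= oldestActiveTxnFirstLSN;
    return file.maxLSN < redoLSN && !neededForUndo;
}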

Log Truncation

The log can't grow forever. Log truncation removes old records that are no longer needed for recovery:

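Records before the oldest active transaction's first record are no longer needed for undo
Records before the checkpoint's RedoLSN are no longer needed for redo
Only records that satisfy both conditions can be truncated, i.e. those before the smaller of the two LSNs
But we must keep records needed for replication or point-in-time recovery!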

Write-Ahead Log Performance

The log's performance characteristics are crucial:

Sequential writes: Log appends are sequential, which is fast
Synchronous flushes: Commits require sync to stable storage
Group commit: Batching multiple commits in one flush
Dedicated disk: Often the log is on a separate disk/SSD from data

Modern systems use NVMe SSDs or battery-backed write caches to make log flushes fast without sacrificing durability.
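
Group commit is the easiest of these techniques to show in code. The sketch below is a deliberately simplified, synchronous model: `GroupCommitLog` is a hypothetical class, and incrementing `syncCount` stands in for one expensive sync to stable storage.

```typescript
class GroupCommitLog {
    private pendingCommits: string[] = [];
    syncCount = 0; // number of (simulated) syncs to stable storage

    // A committing transaction queues its COMMIT record instead of syncing on its own.
    requestCommit(txn: string): void {
        this.pendingCommits.push(txn);
    }

    // One sync makes every queued COMMIT durable at once.
    // Returns the transactions that can now be acknowledged.
    flush(): string[] {
        if (this.pendingCommits.length === 0) return [];
        const batch = this.pendingCommits;
        this.pendingCommits = [];
        this.syncCount++;
        return batch;
    }
}

const groupLog = new GroupCommitLog();
groupLog.requestCommit("T1");
groupLog.requestCommit("T2");
groupLog.requestCommit("T3");
const acknowledged = groupLog.flush(); // ["T1", "T2", "T3"] made durable by a single sync
```

Three commits cost one sync instead of three, at the price of each transaction waiting briefly for the batch to be flushed.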

Log Disk Separation

Summary: Log Structure

The log structure is the information foundation of ARIES recovery. Let's consolidate the key concepts:

Key Takeaways

•UPDATE records capture redo and undo information for data modifications, using physiological logging in ARIES.
•CLR records log undo actions with UndoNextLSN pointers, enabling crash-during-recovery resilience.
•Transaction records (COMMIT, ABORT, END) track lifecycle; COMMIT must be force-written for durability.
•CHECKPOINT records capture TT and DPT snapshots, bounding recovery scan length.
•Physiological logging balances log compactness with recovery determinism and concurrency support.
•Log records interconnect via PrevLSN chains and UndoNextLSN pointers, enabling efficient traversal.

What's Next
