Loading learning content...
If ARIES is the body of database recovery, Log Sequence Numbers (LSNs) are its nervous system. Every log record, every database page, every recovery decision ultimately traces back to LSN comparisons. Without LSNs, ARIES would need complex bookkeeping to track which operations have been applied. With LSNs, a single number comparison answers the question.
Understanding LSNs is not just about knowing what they are—it's about understanding how they're used throughout ARIES to enable efficient, correct recovery. This page explores LSN generation, storage, comparison, and the various LSN types that make ARIES work.
By the end of this page, you will understand: (1) how LSNs are generated and what properties they must have, (2) the different types of LSNs (PageLSN, RecLSN, PrevLSN, UndoNextLSN, FlushedLSN), (3) how LSN comparison enables efficient redo, and (4) the role of LSNs in maintaining recovery invariants.
A Log Sequence Number (LSN) is a unique identifier assigned to each log record when it's created. LSNs form the chronological ordering of all operations in the database system.
Fundamental Properties of LSNs
LSNs must satisfy several critical properties:
Uniqueness: Every log record has a distinct LSN. No two records share the same LSN.
Monotonicity: LSNs strictly increase. If record A is created before record B, then LSN(A) < LSN(B).
Persistence: LSNs survive crashes. The LSN assignment is part of the log record on stable storage.
Comparability: Two LSNs can be compared to determine chronological order.
Physical Implementation
In practice, LSNs are typically implemented as one of:
The byte-offset approach is most common because it provides both a unique identifier AND a physical location—no separate index needed.
| Approach | Format | Advantages | Disadvantages |
|---|---|---|---|
| Byte Offset | File position (e.g., 104857600) | Also serves as physical address | Gaps if records vary in size |
| Counter | Incrementing integer (1, 2, 3...) | Dense, no gaps | Requires index to find records |
| Composite | <FileNum, Offset> | Supports multi-file logs | More complex comparison |
| Timestamp | System clock value | Natural ordering | Clock skew issues, not guaranteed unique |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051
// Example: Byte-offset LSN implementation class LogManager { private logFile: File; private currentPosition: number = 0; // This IS the LSN private logBuffer: Buffer; /** * Appends a log record and returns its LSN * The LSN is simply the byte position where the record starts */ appendLogRecord(record: LogRecord): LSN { // The LSN is the current position (before writing) const lsn: LSN = this.currentPosition; // Serialize the record const serialized = serialize(record); // Write to buffer (not yet on stable storage) this.logBuffer.append(serialized); // Update position for next record this.currentPosition += serialized.length; // Return the LSN assigned to this record return lsn; } /** * Read a log record given its LSN * With byte-offset LSN, we can seek directly */ readLogRecord(lsn: LSN): LogRecord { // Seek to the byte position this.logFile.seek(lsn); // Read and deserialize return deserialize(this.logFile.read()); } /** * Flush log buffer to stable storage * Records WAL invariant: flushedLSN tracks what's durable */ flushLog(upToLSN: LSN): void { // Force all records up to and including upToLSN to disk this.logFile.write(this.logBuffer.getUpTo(upToLSN)); this.logFile.sync(); // Ensure actually on stable storage this.flushedLSN = upToLSN; }}The elegance of byte-offset LSNs is that they serve dual purposes: unique identifiers for comparison AND physical addresses for retrieval. When the undo phase needs to process a record at LSN 5000, it can seek directly to byte 5000 in the log file. No separate index required.
ARIES uses LSNs in multiple contexts, each with a specific purpose. Understanding these different LSN uses is crucial for grasping how ARIES operates.
1. Record LSN (The Log Record's Own LSN)
Every log record has an LSN—the unique identifier assigned when the record was created. This is what 'LSN' typically refers to in discussions of log records.
2. PageLSN (On Database Pages)
Every database page stores a PageLSN in its header—the LSN of the most recent log record that modified this page. PageLSN is the key to efficient redo: if PageLSN >= RecordLSN, the modification is already applied.
3. RecLSN (In Dirty Page Table)
The Dirty Page Table tracks, for each dirty page, the RecLSN—the LSN of the first log record that dirtied the page since the last checkpoint. RecLSN determines where redo must start for each page.
4. PrevLSN (In Log Records)
Each log record contains a PrevLSN pointer—the LSN of the previous log record written by the same transaction. PrevLSN enables backward traversal through a transaction's operations.
5. UndoNextLSN (In CLRs)
Compensation Log Records contain an UndoNextLSN—the LSN of the next record to undo for this transaction. UndoNextLSN enables skipping already-undone operations during recovery.
6. FlushedLSN (Log Manager State)
The log manager tracks FlushedLSN—the highest LSN that has been forced to stable storage. This enforces the Write-Ahead Logging rule.
| LSN Type | Stored In | Purpose | Updated When |
|---|---|---|---|
| Record LSN | Log record itself | Unique identifier | Record creation |
| PageLSN | Database page header | Track latest modification | Page is modified |
| RecLSN | Dirty Page Table entry | First dirtying operation | Page first modified after checkpoint |
| PrevLSN | Log record field | Chain transaction's records | Record creation (points backward) |
| UndoNextLSN | CLR field | Undo progress marker | CLR creation during rollback |
| FlushedLSN | Log manager memory | Track durable log position | Log buffer flushed to disk |
Notice how PrevLSN creates a backward chain for each transaction: T1's records (4000 → 3000 → 1000) and T2's records (5000 → 2000). This chain is what undo follows when rolling back a transaction—no need to scan the entire log.
The PageLSN is perhaps the most important LSN type for redo efficiency. Let's explore how it works in detail.
What PageLSN Represents
A page's PageLSN answers: "What is the most recent log record whose effects are reflected on this page?"
Whenever a page is modified:
The Redo Comparison
During redo, for each log record with LSN = R that modifies page P:
if (PageLSN of P on disk) >= R:
skip redo // Already applied
else:
apply redo // Need to re-apply
This comparison is the heart of ARIES's efficiency. Instead of tracking applied/not-applied in complex data structures, we piggyback on the page itself.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556
// How PageLSN is maintained during normal operation class BufferManager { private pages: Map<PageID, Page> = new Map(); private logManager: LogManager; /** * Modify a page - updates PageLSN and logs the change */ modifyPage( pageId: PageID, transactionId: TransactionID, modification: Modification ): void { // 1. Get the page (from buffer or disk) const page = this.fetchPage(pageId); // 2. Create log record with undo and redo information const logRecord: UpdateLogRecord = { type: "UPDATE", transactionId: transactionId, pageId: pageId, prevLSN: this.getLastLSN(transactionId), undoInfo: captureBeforeState(page, modification), redoInfo: modification }; // 3. Append to log and get the assigned LSN const recordLSN = this.logManager.appendLogRecord(logRecord); // 4. Apply the modification to the page applyModification(page, modification); // 5. UPDATE THE PAGE'S LSN to this record's LSN page.PageLSN = recordLSN; // 6. Mark page as dirty in buffer pool this.markDirty(pageId); // Note: Page is NOT written to disk yet! // Buffer manager will flush later based on its policies } /** * Before flushing a page to disk, must ensure WAL */ flushPage(pageId: PageID): void { const page = this.pages.get(pageId); // WAL Rule: Log must be flushed up to PageLSN before page flush this.logManager.flushLog(page.PageLSN); // Now safe to write the page this.storage.writePage(pageId, page); }}PageLSN and Recovery Scenarios
Let's trace through how PageLSN helps in different scenarios:
| Scenario | PageLSN on Disk | Log Record LSN | Action | Reason |
|---|---|---|---|---|
| Page flushed after modification | 5000 | 5000 | Skip redo | PageLSN shows change is on disk |
| Page not flushed | 4000 | 5000 | Apply redo | PageLSN < RecordLSN means change lost |
| Multiple modifications, partial flush | 3000 | 3000, 4000, 5000 | Skip 3000, redo 4000 & 5000 | Only changes after PageLSN need redo |
| Page flushed multiple times | 5000 | 2000 | Skip redo | Old record, definitely applied |
| Crash during page flush | ??? | 4000 | Redo if needed | Torn page may need redo; PageLSN tells us |
If a crash occurs during a page flush, the page on disk might be partially written (torn). The PageLSN might be corrupted or reflect the pre-modification state. ARIES handles this by treating torn pages as needing redo—the redo operation will write the correct values AND update PageLSN. Some systems also use checksums or double-write buffers for additional protection.
The PrevLSN field in each log record creates a backward-linked chain of all operations by a single transaction. This chain is essential for efficient undo.
Why Chain Matters
Consider a transaction T1 that performs 100 operations. To roll it back, we need to undo all 100 in reverse order. Without chaining, we'd have to scan the entire log looking for T1's records. With PrevLSN chaining, we follow pointers directly:
[Last Operation] → [Second-to-Last] → ... → [First Operation] → null
The Chain Structure
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647
// Transaction T1 executes three operations // Operation 1: Insert rowLogRecord_1000 = { LSN: 1000, TransactionID: T1, Type: UPDATE, PrevLSN: null, // First record for T1, no previous PageID: Page_A, UndoInfo: "Delete row 5", RedoInfo: "Insert row 5 with values (...)"} // Operation 2: Update row LogRecord_3000 = { LSN: 3000, TransactionID: T1, Type: UPDATE, PrevLSN: 1000, // Points to previous T1 record PageID: Page_B, UndoInfo: "Set salary to 50000", RedoInfo: "Set salary to 75000"} // Operation 3: Delete rowLogRecord_4500 = { LSN: 4500, TransactionID: T1, Type: UPDATE, PrevLSN: 3000, // Points to previous T1 record PageID: Page_C, UndoInfo: "Insert row 10 with values (...)", RedoInfo: "Delete row 10"} // Transaction Table entry for T1:TT_Entry_T1 = { TransactionID: T1, State: UNDO, // Needs rollback LastLSN: 4500 // Entry point to the chain} // Undo traversal:// 1. Start at LastLSN = 4500// 2. Undo delete, follow PrevLSN to 3000// 3. Undo update, follow PrevLSN to 1000// 4. Undo insert, PrevLSN is null, done!Maintaining PrevLSN
The log manager maintains a lastLSN for each active transaction. When writing a new log record:
This simple bookkeeping creates the chain automatically.
Efficiency Implications
PrevLSN chaining provides O(k) undo time where k is the number of operations by that transaction, rather than O(n) where n is the total log size. For a system with millions of log records and a transaction that did 10 operations, this is a massive efficiency gain.
PrevLSN is not just an optimization—it's essential for practical recovery. Consider a 100GB log with a transaction doing 10 updates. Without PrevLSN, undo would scan 100GB. With PrevLSN, it reads ~10 records (maybe 10KB). The difference is five orders of magnitude.
The UndoNextLSN field in Compensation Log Records (CLRs) solves a critical problem: handling crashes during recovery. When we undo an operation and write a CLR, the CLR's UndoNextLSN points to where undo should continue.
The Problem
Suppose we're undoing transaction T1 with operations at LSNs 1000, 2000, 3000. We undo 3000 (write CLR at 4001), then crash. After recovery and redo:
The Solution: UndoNextLSN
The CLR at LSN 4001 contains UndoNextLSN = 2000. During recovery's undo phase:
The chain is preserved, but we skip the already-undone operation.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
// Scenario: T1 crash during rollback// T1's original log records:// LSN 1000: Insert record// LSN 2000: Update record // LSN 3000: Delete record // During first recovery, undo phase processes: // Step 1: Undo LSN 3000 (delete → insert back)CLR_4001 = { LSN: 4001, TransactionID: T1, Type: CLR, PageID: Page_C, RedoInfo: "Insert record back", // The undo action UndoNextLSN: 2000, // Next record to undo for T1 PrevLSN: 3000} // CRASH before undoing LSN 2000! // ===========================================// Second recovery begins... // Redo phase: re-applies CLR 4001// This restores the "un-deleted" state // Undo phase: processes T1// T1's LastLSN in Transaction Table = 4001 (the CLR) // Processing LSN 4001:// It's a CLR, don't undo it!// Read UndoNextLSN = 2000// Add 2000 to ToUndo set, NOT 3000 // Processing LSN 2000:// Undo the update// Write CLR at 4002, UndoNextLSN = 1000 // Processing LSN 1000:// Undo the insert// Write CLR at 4003, UndoNextLSN = null // T1 undo complete! /* Key insight: Thanks to UndoNextLSN, we never re-undo LSN 3000 CLR 4001 acts as a "resume marker" */The UndoNextLSN Invariant
For proper operation, UndoNextLSN must satisfy:
CLR.UndoNextLSN = UndoneRecord.PrevLSN
In other words, after undoing record R, the CLR's UndoNextLSN points to where R's PrevLSN points—the next record in the backward chain that needs undoing.
Chaining Example with Multiple Crashes
Even with multiple crashes during recovery, UndoNextLSN preserves progress:
| Event | Log State | T1's Progress |
|---|---|---|
| Initial | 1000 → 2000 → 3000 | Not started |
| Undo 3000 | ... → CLR 4001 (UndoNext=2000) | Undone: 3000 |
| CRASH #1 | — | — |
| Redo CLR 4001 | Restored undo of 3000 | Resume from 2000 |
| Undo 2000 | ... → CLR 4002 (UndoNext=1000) | Undone: 3000, 2000 |
| CRASH #2 | — | — |
| Redo CLRs | Restored undo of 3000, 2000 | Resume from 1000 |
| Undo 1000 | ... → CLR 4003 (UndoNext=null) | Complete |
No matter how many times the system crashes during recovery, each recovery attempt makes progress. CLRs are redone (restoring already-completed undo work), then undo resumes from UndoNextLSN. The total work is bounded: original undo work plus redo of CLRs. We never undo the same record twice.
The RecLSN (Recovery LSN) in the Dirty Page Table tracks the earliest point from which a page might need redo. This is crucial for determining where the redo phase should start.
Why Not Just Use PageLSN?
PageLSN tells us the latest modification applied to a page. RecLSN tells us the first modification since the page became dirty after a checkpoint. These are different:
During normal operation, a page might be:
RecLSN Assignment
RecLSN is assigned when a page first enters the Dirty Page Table after a checkpoint (or after being flushed):
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364
// How RecLSN is managed in the Dirty Page Table class DirtyPageTable { private entries: Map<PageID, DPTEntry> = new Map(); /** * Called when a page is modified * Sets RecLSN only if page is newly dirty */ recordModification(pageId: PageID, lsn: LSN): void { if (!this.entries.has(pageId)) { // Page is newly dirty - set RecLSN this.entries.set(pageId, { pageId: pageId, recLSN: lsn // FIRST log record that dirtied this page }); } // If page already in DPT, don't update recLSN // We want the OLDEST/EARLIEST dirtying LSN } /** * Called when a page is flushed to disk * Removes the page from DPT (no longer dirty) */ recordFlush(pageId: PageID): void { this.entries.delete(pageId); // Next modification will re-add with new recLSN } /** * Compute the RedoLSN - where redo should start */ computeRedoLSN(): LSN { let minRecLSN = Infinity; for (const entry of this.entries.values()) { if (entry.recLSN < minRecLSN) { minRecLSN = entry.recLSN; } } return minRecLSN; }} // Example timeline:// // Time 1: Checkpoint at LSN 1000// DPT from checkpoint: {Page A: recLSN=800}// // Time 2: Log record LSN 2000 modifies Page B// DPT: {Page A: recLSN=800, Page B: recLSN=2000}// // Time 3: Log record LSN 3000 modifies Page A// DPT: {Page A: recLSN=800, Page B: recLSN=2000}// Note: Page A's recLSN still 800 (already dirty)// // Time 4: Page A flushed to disk// DPT: {Page B: recLSN=2000}// // Time 5: Log record LSN 4000 modifies Page A// DPT: {Page A: recLSN=4000, Page B: recLSN=2000}// Note: Page A re-entered DPT with new recLSN//// If crash now, RedoLSN = min(4000, 2000) = 2000RecLSN vs RedoLSN
RedoLSN = min(all RecLSNs in DPT)
This ensures we start redo early enough to not miss any necessary operations, even if we could start later for most pages.
RecLSN tracking is conservative—it might indicate a page needs redo from an earlier point than strictly necessary (if the page was flushed but re-dirtied). That's fine: the PageLSN comparison during redo will skip operations that are already applied. Being conservative costs some extra redo-phase checks but never misses needed work.
The FlushedLSN tracks how much of the log has been written to stable storage. This is essential for enforcing the Write-Ahead Logging (WAL) rule—the cornerstone of ARIES correctness.
The WAL Rule
Write-Ahead Logging requires:
Before any database page is written to stable storage, all log records up to and including the page's PageLSN must be on stable storage.
In equation form:
BeforeFlush(Page): FlushedLSN >= Page.PageLSN
Why WAL Matters
Imagine WAL is violated: we write a page with PageLSN=5000 to disk while the log is only flushed to LSN 4000. If we crash:
WAL ensures the log always has the information needed to undo or redo any change that might be on a page.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869
// How FlushedLSN enforces WAL class LogManager { private flushedLSN: LSN = 0; // Highest LSN on stable storage private buffer: LogBuffer; /** * Flush log to stable storage up to the specified LSN */ flushToLSN(targetLSN: LSN): void { if (targetLSN <= this.flushedLSN) { return; // Already flushed this far } // Get all log records in buffer up to targetLSN const recordsToFlush = this.buffer.getUpTo(targetLSN); // Write to disk this.logFile.append(recordsToFlush); // Force to stable storage (not just OS buffer) this.logFile.sync(); // Update flushedLSN this.flushedLSN = targetLSN; } /** * Called before flushing any page to disk */ ensureWAL(pageLSN: LSN): void { if (this.flushedLSN < pageLSN) { // Must flush log first! this.flushToLSN(pageLSN); } // Now safe to flush the page }} class BufferManager { private logManager: LogManager; /** * Flush a dirty page to disk */ flushPage(pageId: PageID): void { const page = this.getPage(pageId); // ENFORCE WAL RULE this.logManager.ensureWAL(page.PageLSN); // Write page to storage this.storage.write(pageId, page); // Mark as clean this.cleanPage(pageId); }} // Example:// Page A has PageLSN = 5000// FlushedLSN = 4500// // Attempt to flush Page A:// 1. ensureWAL(5000) called// 2. FlushedLSN (4500) < PageLSN (5000)// 3. Must flush log to 5000 first!// 4. After log flush: FlushedLSN = 5000// 5. Now safe to write Page A to diskCommit and FlushedLSN
WAL also governs transaction commit. A transaction cannot be considered committed until its COMMIT log record is on stable storage:
Commit(Transaction):
1. Write COMMIT log record, get commitLSN
2. Flush log to commitLSN
3. Only THEN return success to application
This ensures that if a crash occurs after we tell the application "committed," the commit is recoverable. If we crash before flushing, the transaction will be rolled back on recovery—which is correct since we never confirmed commit.
Group Commit Optimization
Flushing the log is expensive (disk sync). Group commit batches multiple transactions' commits:
This amortizes the flush cost across many transactions.
The WAL rule is not an optimization—it's a correctness requirement. Violating WAL risks irrecoverable data corruption. Every database system that uses logging strictly enforces WAL. The performance costs (sequential log writes, commit flushes) are the price of durability.
Here's a comprehensive reference for all LSN types and their comparisons:
| LSN Type | Location | Set When | Used For |
|---|---|---|---|
| Record.LSN | Log record | Record creation | Unique identifier, ordering |
| Page.PageLSN | Page header | Page modification | Redo skipping via comparison |
| DPT.RecLSN | Dirty Page Table | Page first dirtied | Compute RedoLSN |
| RedoLSN | Recovery state | End of analysis | Redo phase starting point |
| Record.PrevLSN | Log record | Record creation | Transaction backward chain |
| TT.LastLSN | Transaction Table | Each log write | Undo starting point |
| CLR.UndoNextLSN | CLR record | CLR creation | Resume undo after crash |
| FlushedLSN | Log manager | Log flush | Enforce WAL rule |
Key LSN Comparisons in ARIES
| Comparison | Where Used | If True | If False |
|---|---|---|---|
| PageLSN >= RecordLSN | Redo phase | Skip (already applied) | Apply redo |
| RecordLSN < RecLSN | Redo phase | Skip (not relevant to dirty state) | Check PageLSN |
| FlushedLSN >= PageLSN | Page flush check | Safe to flush page | Flush log first |
| FlushedLSN >= CommitLSN | Commit check | Safe to confirm commit | Flush log first |
| UndoNextLSN == null | Undo phase | Transaction undo complete | Continue undoing |
ARIES's genius is that complex recovery decisions reduce to simple LSN comparisons. Is redo needed? Compare PageLSN. Is undo complete? Check UndoNextLSN. Is page flush safe? Compare FlushedLSN. The LSN system transforms recovery from complex bookkeeping into straightforward arithmetic.
Log Sequence Numbers are the foundation upon which ARIES is built. Let's consolidate what we've learned:
What's Next
With LSNs understood, we'll explore the complete log structure—the different types of log records, their formats, and how they work together to provide the information ARIES needs for recovery.
You now understand how Log Sequence Numbers enable every aspect of ARIES recovery—from efficient redo to crash-resilient undo to WAL enforcement. These simple integers are the key to making database recovery both correct and efficient.