Loading content...
When a database crashes, the recovery system faces two fundamental questions about every transaction that was running at failure time:
The Transaction Table answers both questions. It maintains runtime state about every active transaction—its current status, the log records it generated, and the information needed to undo its effects. This page explores the Transaction Table in exhaustive technical detail, examining its structure, lifecycle management, checkpoint persistence, and indispensable role in the ARIES Analysis Phase.
By the end of this page, you will understand the complete architecture of the Transaction Table, how entries are created and updated throughout transaction lifecycles, how the table survives checkpoints, and how the Analysis Phase uses it to identify transactions requiring undo. You'll be able to trace transaction state transitions and explain how the table enables correct crash recovery.
The Transaction Table serves as the runtime registry of all active transactions in the database system. Its primary purposes are:
During Normal Operation:
During Recovery:
| Aspect | Dirty Page Table | Transaction Table |
|---|---|---|
| Tracks | Modified database pages | Active transactions |
| Key Field | recLSN (first dirty LSN) | lastLSN (most recent operation) |
| Guides Recovery | Redo phase starting point | Undo phase targets |
| Entry Lifetime | From first dirty to disk write | From BEGIN to COMMIT/ABORT |
| Checkpoint Role | Snapshots dirty pages | Snapshots active transactions |
The DPT tells recovery 'these pages might need redo.' The Transaction Table tells recovery 'these transactions need undo.' Together, they provide complete information for reconstructing a consistent database state from any crash point.
The Transaction Table is an in-memory data structure mapping transaction identifiers to their runtime state. Each entry contains all information necessary for both normal operation and crash recovery.
Core Structure:
The table is typically implemented as a hash table for O(1) access, with each entry containing transaction metadata and log navigation information.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
// Transaction states in the lifecycleenum class TransactionState { ACTIVE, // Transaction is running, executing operations PREPARING, // 2PC: prepare phase initiated COMMITTING, // Commit initiated, log records being written COMMITTED, // Commit complete (entry may be removed soon) ABORTING, // Rollback in progress ABORTED // Rollback complete (entry may be removed soon)}; // Transaction Table entry structurestruct TransactionEntry { TransactionID txn_id; // Unique transaction identifier TransactionState state; // Current lifecycle state LSN first_lsn; // LSN of first log record for this txn LSN last_lsn; // LSN of most recent log record for this txn LSN undo_next_lsn; // Next LSN to process during undo (follows prevLSN chain) Timestamp start_time; // When transaction began UserID user_id; // Owner of the transaction int lock_count; // Number of locks held (for monitoring) int log_records; // Count of log records generated}; // The Transaction Table itselfclass TransactionTable {private: HashMap<TransactionID, TransactionEntry> entries; Mutex table_lock; public: // Create entry when transaction begins void beginTransaction(TransactionID tid); // Update lastLSN when transaction logs an operation void recordLogEntry(TransactionID tid, LSN log_lsn); // Update state during commit/abort void setCommitting(TransactionID tid); void setCommitted(TransactionID tid); void setAborting(TransactionID tid); void setAborted(TransactionID tid); // Remove entry when transaction completes void removeTransaction(TransactionID tid); // Recovery support vector<TransactionEntry> snapshot(); TransactionEntry* getEntry(TransactionID tid); vector<TransactionID> getUncommittedTransactions();};Key Fields Explained:
Transaction ID (txn_id): A globally unique identifier assigned when the transaction begins. Typically a monotonically increasing 64-bit integer, possibly incorporating a timestamp or server ID for distributed systems.
State: The current position in the transaction lifecycle. This state machine governs what operations are legal and what recovery actions are needed.
First LSN (firstLSN): The Log Sequence Number of the first log record generated by this transaction. Used for:
Last LSN (lastLSN): The LSN of the most recent log record for this transaction. This is the starting point for undo—rollback follows the prevLSN chain backward from here.
Undo Next LSN (undoNextLSN): During rollback (either normal abort or recovery undo), this field tracks progress. It points to the next log record that must be undone, following the prevLSN chain.
Each log record contains a 'prevLSN' field pointing to the previous log record of the same transaction. This forms a backward-linked chain enabling efficient rollback without scanning the entire log. The lastLSN in the Transaction Table is the head of this chain.
The Transaction Table is updated at every significant point in a transaction's lifecycle. Understanding these updates is essential for tracing how the table reaches its crash-time state.
Lifecycle State Machine:
Event 1: Transaction Begin
When a transaction starts, a new entry is created with initial state:
12345678910111213141516171819202122
void TransactionTable::beginTransaction(TransactionID tid) { lock_guard<Mutex> lock(table_lock); // Create new entry TransactionEntry entry; entry.txn_id = tid; entry.state = TransactionState::ACTIVE; entry.first_lsn = INVALID_LSN; // No log records yet entry.last_lsn = INVALID_LSN; entry.undo_next_lsn = INVALID_LSN; entry.start_time = getCurrentTime(); entry.user_id = getCurrentUser(); entry.lock_count = 0; entry.log_records = 0; entries[tid] = entry; log_debug("Transaction %d started", tid);} // Note: The BEGIN transaction itself may or may not be logged.// ARIES does not require logging BEGIN, but some systems do for auditing.Event 2: Recording Operations
Every data modification generates a log record. The Transaction Table must be updated to track the latest LSN:
12345678910111213141516171819202122232425262728293031
void TransactionTable::recordLogEntry(TransactionID tid, LSN log_lsn) { lock_guard<Mutex> lock(table_lock); auto it = entries.find(tid); if (it == entries.end()) { throw InvalidTransactionError("Transaction not found"); } TransactionEntry& entry = it->second; // Update firstLSN only if this is the first log record if (entry.first_lsn == INVALID_LSN) { entry.first_lsn = log_lsn; } // Always update lastLSN to the most recent record entry.last_lsn = log_lsn; entry.log_records++; log_debug("Transaction %d: logged LSN %d (total records: %d)", tid, log_lsn, entry.log_records);} // The log record itself contains:// - LSN (assigned by log manager)// - TransactionID// - prevLSN (the PREVIOUS lastLSN of this transaction - forms undo chain)// - Type (UPDATE, INSERT, DELETE, etc.)// - PageID// - Undo data (old value)// - Redo data (new value)Event 3: Commit Processing
Committing involves writing a COMMIT log record and transitioning through states:
123456789101112131415161718192021222324252627282930313233343536373839
TransactionResult TransactionManager::commitTransaction(TransactionID tid) { // Step 1: Transition to COMMITTING state txn_table.setCommitting(tid); // Step 2: Write COMMIT log record // This record indicates the transaction's decision to commit LogRecord commit_record; commit_record.type = LogType::COMMIT; commit_record.txn_id = tid; commit_record.prev_lsn = txn_table.getEntry(tid)->last_lsn; LSN commit_lsn = log_manager.appendLog(commit_record); txn_table.recordLogEntry(tid, commit_lsn); // Step 3: Force log to disk (CRUCIAL for durability!) // All log records up to commit_lsn must be on stable storage log_manager.flushTo(commit_lsn); // Step 4: Transition to COMMITTED // At this point, the transaction is durably committed txn_table.setCommitted(tid); // Step 5: Release locks lock_manager.releaseAllLocks(tid); // Step 6: Write END log record (optional but useful) // Indicates that commit processing is complete LogRecord end_record; end_record.type = LogType::END; end_record.txn_id = tid; end_record.prev_lsn = commit_lsn; log_manager.appendLog(end_record); // No need to force - just informational // Step 7: Remove from transaction table txn_table.removeTransaction(tid); return TransactionResult::SUCCESS;}The log flush in step 3 is the moment of truth for durability. Before this point, a crash would cause the transaction to be rolled back during recovery. After this point, the transaction is committed regardless of subsequent crashes. This is why write-ahead logging is fundamental—the COMMIT record must reach stable storage before we tell the application 'commit successful.'
When a transaction must be aborted—whether due to user request, constraint violation, deadlock, or error—the system must undo all its effects. The Transaction Table provides the starting point for this rollback.
Rollback Process:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
TransactionResult TransactionManager::abortTransaction(TransactionID tid) { TransactionEntry* entry = txn_table.getEntry(tid); if (!entry) { throw InvalidTransactionError("Transaction not found"); } // Step 1: Transition to ABORTING state txn_table.setAborting(tid); // Step 2: Write ABORT log record LogRecord abort_record; abort_record.type = LogType::ABORT; abort_record.txn_id = tid; abort_record.prev_lsn = entry->last_lsn; LSN abort_lsn = log_manager.appendLog(abort_record); txn_table.recordLogEntry(tid, abort_lsn); // Step 3: Perform rollback - follow prevLSN chain entry->undo_next_lsn = entry->last_lsn; // Start from most recent while (entry->undo_next_lsn != INVALID_LSN) { LogRecord record = log_manager.readLog(entry->undo_next_lsn); if (record.isUndoable()) { // Perform the compensating action undoOperation(record); // Write CLR (Compensation Log Record) LogRecord clr; clr.type = LogType::CLR; clr.txn_id = tid; clr.prev_lsn = entry->last_lsn; clr.undo_next_lsn = record.prev_lsn; // Points to next record to undo clr.page_id = record.page_id; clr.redo_data = record.undo_data; // CLR's redo = original undo LSN clr_lsn = log_manager.appendLog(clr); txn_table.recordLogEntry(tid, clr_lsn); } // Move to previous log record in the chain entry->undo_next_lsn = record.prev_lsn; } // Step 4: Write END record LogRecord end_record; end_record.type = LogType::END; end_record.txn_id = tid; log_manager.appendLog(end_record); // Step 5: Release locks lock_manager.releaseAllLocks(tid); // Step 6: Transition to ABORTED and remove txn_table.setAborted(tid); txn_table.removeTransaction(tid); return TransactionResult::ABORTED;}Compensation Log Records (CLRs) are crucial for crash recovery during rollback. If the system crashes mid-rollback, recovery can continue the rollback by following CLRs. CLRs are never undone—they represent completed undo work. Their undoNextLSN field enables recovery to skip already-undone records.
During checkpointing, the Transaction Table snapshot is written to the log alongside the Dirty Page Table. This enables recovery to reconstruct the transaction state at checkpoint time.
What Gets Persisted:
For each active transaction at checkpoint time, the checkpoint record stores:
The firstLSN is technically derivable from the log but is often included for convenience and correctness verification.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455
struct CheckpointTransactionEntry { TransactionID txn_id; TransactionState state; LSN last_lsn; LSN undo_next_lsn;}; vector<CheckpointTransactionEntry> TransactionTable::snapshotForCheckpoint() { lock_guard<Mutex> lock(table_lock); vector<CheckpointTransactionEntry> snapshot; for (const auto& [tid, entry] : entries) { // Include all non-completed transactions if (entry.state != TransactionState::COMMITTED && entry.state != TransactionState::ABORTED) { CheckpointTransactionEntry ckpt_entry; ckpt_entry.txn_id = entry.txn_id; ckpt_entry.state = entry.state; ckpt_entry.last_lsn = entry.last_lsn; ckpt_entry.undo_next_lsn = entry.undo_next_lsn; snapshot.push_back(ckpt_entry); } } log_info("Transaction table snapshot: %d active transactions", snapshot.size()); return snapshot;} void RecoveryManager::performFuzzyCheckpoint() { // Step 1: Write BEGIN_CHECKPOINT LSN begin_ckpt_lsn = log_manager.appendLog(BEGIN_CHECKPOINT, {}); // Step 2: Snapshot both tables (order doesn't matter for correctness) vector<DirtyPageEntry> dpt_snapshot = dirty_page_table.snapshot(); vector<CheckpointTransactionEntry> txn_snapshot = txn_table.snapshotForCheckpoint(); // Step 3: Write END_CHECKPOINT with both tables EndCheckpointRecord end_ckpt; end_ckpt.dirty_page_table = dpt_snapshot; end_ckpt.transaction_table = txn_snapshot; LSN end_ckpt_lsn = log_manager.appendLog(END_CHECKPOINT, end_ckpt); // Step 4: Force checkpoint records to disk log_manager.flushTo(end_ckpt_lsn); // Step 5: Update master record updateMasterRecord(begin_ckpt_lsn); log_info("Checkpoint complete: DPT=%d entries, TxnTable=%d entries", dpt_snapshot.size(), txn_snapshot.size());}Because fuzzy checkpoints don't stop transactions, the checkpoint snapshot represents state at a specific instant. Transactions may have progressed between BEGIN_CHECKPOINT and END_CHECKPOINT. The Analysis Phase accounts for this by scanning log records after the checkpoint to update both tables.
The Analysis Phase reconstructs the Transaction Table to reflect its crash-time state. This involves loading the checkpoint snapshot and then processing all log records after the checkpoint.
Analysis Phase Transaction Table Processing:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091
void AnalysisPhase::reconstructTransactionTable() { // Step 1: Load transaction table from checkpoint CheckpointRecord ckpt = readLastCheckpoint(); for (const auto& entry : ckpt.transaction_table) { TransactionEntry txn; txn.txn_id = entry.txn_id; txn.state = entry.state; txn.last_lsn = entry.last_lsn; txn.undo_next_lsn = entry.undo_next_lsn; transaction_table.insert(entry.txn_id, txn); } log_info("Loaded %d transactions from checkpoint", ckpt.transaction_table.size()); // Step 2: Scan log from checkpoint to end LSN current_lsn = ckpt.checkpoint_lsn; while (current_lsn != END_OF_LOG) { LogRecord record = log_manager.readLog(current_lsn); TransactionID tid = record.txn_id; // Handle different log record types switch (record.type) { case LogType::UPDATE: case LogType::INSERT: case LogType::DELETE: case LogType::CLR: // Regular operations: update or create transaction entry if (!transaction_table.contains(tid)) { // Transaction started after checkpoint TransactionEntry entry; entry.txn_id = tid; entry.state = TransactionState::UNDO; // Mark for undo entry.last_lsn = current_lsn; entry.undo_next_lsn = current_lsn; transaction_table.insert(tid, entry); log_debug("Analysis: New transaction %d found at LSN %d", tid, current_lsn); } else { // Update existing entry TransactionEntry& entry = transaction_table.get(tid); entry.last_lsn = current_lsn; // For CLR, update undoNextLSN if (record.type == LogType::CLR) { entry.undo_next_lsn = record.undo_next_lsn; } else { entry.undo_next_lsn = current_lsn; } } break; case LogType::COMMIT: // Transaction committed - will be removed later if (transaction_table.contains(tid)) { transaction_table.get(tid).state = TransactionState::COMMITTING; } break; case LogType::ABORT: // Transaction was aborting at crash if (transaction_table.contains(tid)) { transaction_table.get(tid).state = TransactionState::UNDO; } break; case LogType::END: // Transaction fully completed - remove from table transaction_table.remove(tid); log_debug("Analysis: Transaction %d completed, removed", tid); break; } current_lsn = record.next_lsn; } // Step 3: Mark all remaining transactions as needing UNDO // Any transaction still in the table without an END record didn't complete for (auto& [tid, entry] : transaction_table) { if (entry.state != TransactionState::COMMITTING) { entry.state = TransactionState::UNDO; } } log_info("Analysis complete: %d transactions require undo", transaction_table.getUndoCount());}A COMMIT record indicates the transaction decided to commit, but an END record indicates commit processing completed. If the system crashed between COMMIT and END, the transaction IS committed (its effects must survive), but it won't have an END record. The Analysis Phase must correctly identify this case—the transaction should NOT be in the undo list.
Handling the COMMIT-but-no-END Case:
This subtle scenario deserves special attention. Consider:
T1 should NOT be undone—it committed. The state should be COMMITTING, and it should be removed from the undo list during the redo/undo phases (after redo replays any effects and confirms durability).
After the Analysis Phase completes, the Transaction Table contains all transactions that require undo processing. The Undo Phase uses this information to roll back uncommitted work.
Identifying Losers (Transactions to Undo):
A transaction is a 'loser' requiring undo if:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677
struct UndoTask { TransactionID txn_id; LSN undo_next_lsn; // Where to start undoing}; vector<UndoTask> AnalysisPhase::determineUndoList() { vector<UndoTask> undo_list; for (const auto& [tid, entry] : transaction_table) { bool needs_undo = false; switch (entry.state) { case TransactionState::ACTIVE: // Was running, never committed needs_undo = true; break; case TransactionState::UNDO: // Explicitly marked for undo (started after checkpoint, no commit) needs_undo = true; break; case TransactionState::ABORTING: // Was in the middle of rollback needs_undo = true; break; case TransactionState::COMMITTING: // Had COMMIT record - do NOT undo // Will write END during recovery to clean up needs_undo = false; log_debug("Transaction %d committed - will write END", tid); break; case TransactionState::COMMITTED: case TransactionState::ABORTED: // Fully completed - should have been removed // If still here, something is wrong log_error("Unexpected completed transaction %d in table", tid); break; } if (needs_undo) { UndoTask task; task.txn_id = tid; task.undo_next_lsn = entry.undo_next_lsn; undo_list.push_back(task); log_info("Transaction %d added to undo list, starting at LSN %d", tid, entry.undo_next_lsn); } } // Sort by undoNextLSN descending for efficient undo // Start with most recent operations first sort(undo_list.begin(), undo_list.end(), [](const UndoTask& a, const UndoTask& b) { return a.undo_next_lsn > b.undo_next_lsn; }); return undo_list;} /* * Example After Analysis: * * Transaction Table: * T1: state=UNDO, lastLSN=1050, undoNextLSN=1050 * T2: state=COMMITTING, lastLSN=1100 * T3: state=UNDO, lastLSN=1200, undoNextLSN=1150 (CLR advanced it) * * Undo List (sorted by undoNextLSN descending): * T3: undoNextLSN=1150 * T1: undoNextLSN=1050 * * Undo will process: LSN 1150 (T3), then 1050 (T1), then their prevLSN chains */Processing undo tasks from highest to lowest LSN enables an efficient single backward pass through the log. As each transaction's prevLSN chain is followed, the log pointer naturally moves backward. This is more I/O efficient than jumping around the log randomly.
The Transaction Table is the complement to the Dirty Page Table—together they provide complete information for crash recovery. Let's consolidate the essential concepts:
What's Next:
With both the Dirty Page Table and Transaction Table understood, we'll examine how the Analysis Phase scans the log to update these structures. The log scanning process is where raw log records transform into actionable recovery information, determining exactly what redo and undo work must occur.
You now have a comprehensive understanding of the Transaction Table—its structure, lifecycle updates, checkpoint integration, and critical role in identifying transactions requiring undo. Combined with DPT knowledge, you understand the two core data structures that make ARIES recovery intelligent and efficient.