Acid Properties - Learning Module

Loading content...

0/241

Atomicity: All or Nothing

The Indivisible Transaction

Imagine transferring $1,000 from your savings account to your checking account. The operation involves two distinct steps: debit $1,000 from savings and credit $1,000 to checking. Now imagine a power failure occurs precisely between these two operations—your savings has been debited, but your checking never received the credit. You've just lost $1,000 into the digital void.

This nightmare scenario represents the fundamental problem that Atomicity solves. In the world of database transactions, atomicity ensures that a sequence of operations is treated as a single, indivisible unit—either all operations succeed together, or none of them take effect. There is no partial execution, no half-completed state, no limbo.

What You Will Learn

This page provides a complete exploration of atomicity: from its theoretical foundations and the 'all-or-nothing' guarantee, to the sophisticated implementation mechanisms databases use to achieve it. You'll understand write-ahead logging, undo/redo recovery, checkpoint strategies, and how atomicity behaves in distributed contexts. By the end, you'll be able to reason precisely about transactional boundaries in system design.

Defining Atomicity: The Foundation

The term atomicity derives from the Greek word atomos, meaning 'indivisible' or 'uncuttable.' In ancient philosophy, an atom was the smallest possible unit of matter—something that could not be broken down further. Database atomicity captures this same essence: a transaction is the smallest logical unit of work that cannot be broken into smaller pieces from the perspective of the outside world.

The Formal Definition:

Atomicity guarantees that a transaction is treated as a single, indivisible operation that either completes in its entirety (commit) or has no effect whatsoever (abort/rollback). If any part of the transaction fails—whether due to a constraint violation, system crash, power failure, or explicit rollback request—the database must return to precisely the state it was in before the transaction began.

Key Properties of Atomicity

•Indivisibility — External observers see only the before-state or after-state of a transaction, never any intermediate state. The internal steps are completely hidden.
•All-or-Nothing Semantics — Either every operation within a transaction is persisted, or none of them are. There is no partial execution visible to the database.
•Failure Transparency — If anything goes wrong during transaction execution—application error, constraint violation, hardware failure—the system automatically undoes all changes.
•Clean Rollback — When a transaction is aborted, all traces of its operations are completely erased. The database state is indistinguishable from if the transaction never started.
•Commit Point — A transaction becomes permanent only when it reaches the commit point. Before commit, everything is provisional and reversible.

Atomicity Is Not About Speed

A common misconception is that 'atomic' means 'instantaneous.' In reality, a transaction can take seconds or minutes to execute. Atomicity doesn't guarantee speed—it guarantees that the execution appears as a single logical moment, regardless of actual duration. The transaction's effects become visible all at once when it commits, even if the work took considerable time.

The Problem Atomicity Solves

To appreciate atomicity, we must understand the chaos that would reign without it. Consider the following real-world scenarios where multiple database operations must succeed or fail together:

Scenario 1: Bank Transfer

1. Debit $500 from Account A
2. Credit $500 to Account B

Without atomicity: If step 1 succeeds but step 2 fails, $500 vanishes from the system. Money has been debited but never credited.

Scenario 2: E-commerce Order

1. Decrement inventory count
2. Create order record
3. Charge customer payment
4. Send confirmation email trigger

Without atomicity: Inventory might be decremented, but if payment fails, you've sold items that aren't properly accounted for. The inventory is corrupted.

Scenario 3: User Registration

1. Insert user record
2. Create user profile
3. Initialize user settings
4. Grant default permissions

Without atomicity: Partial user creation leads to orphaned records, broken foreign keys, and accounts that exist in a half-configured state.

Failure Modes Without Atomicity
Failure Point	System State Without Atomicity	Business Impact
Between operation 1 and 2	Partial changes persisted, data inconsistent	Data corruption, lost transactions, audit failures
After last operation, before commit	All changes persisted but not marked complete	Duplicate processing on retry, phantom records
Hardware failure mid-transaction	Random subset of changes persisted	Unpredictable state, potential data loss
Application exception during processing	Changes up to exception point persisted	Orphaned records, referential integrity violations

The Cascade of Partial Failures

Partial transaction failures don't just cause immediate problems—they cascade. Corrupt data propagates to reports, analytics, downstream systems, and backups. What starts as one failed transaction can poison entire data pipelines. Atomicity is the firewall that prevents this cascade.

The Transaction Lifecycle

Understanding atomicity requires understanding how transactions flow through a database system. Every transaction passes through distinct states, each with specific guarantees and behaviors:

The Five States of a Transaction:

Active — The transaction is currently executing. All operations are being processed, and changes are being made to working memory or buffer pools.
Partially Committed — All operations have completed successfully. The transaction is ready to commit but hasn't yet been durably persisted.
Committed — The transaction has been permanently recorded. All changes are guaranteed to survive failures. This is the point of no return.
Failed — An error occurred during execution. The transaction cannot proceed and must be rolled back.
Aborted — The transaction has been rolled back. All changes have been undone, and the database is restored to its pre-transaction state.

Transaction State Machine

                    ┌──────────────────────────────────────────┐
                    │              TRANSACTION                 │
                    │              LIFECYCLE                   │
                    └──────────────────────────────────────────┘
                    
                              ┌─────────────┐
                              │   BEGIN     │
                              │ TRANSACTION │
                              └──────┬──────┘
                                     │
                                     ▼
                    ┌─────────────────────────────────────┐
                    │              ACTIVE                 │
                    │    (Executing operations)           │
                    └───────────────┬─────────┬───────────┘
                                    │         │
                        Success     │         │  Error/Failure
                                    ▼         ▼
                    ┌────────────────────┐   ┌─────────────┐
                    │ PARTIALLY COMMITTED│   │   FAILED    │
                    │ (Ready to persist) │   │ (Must abort)│
                    └─────────┬──────────┘   └──────┬──────┘
                              │                     │
                  Durably     │                     │
                  Persisted   ▼                     ▼
                    ┌─────────────────┐   ┌─────────────────┐
                    │    COMMITTED    │   │     ABORTED     │
                    │   (Permanent)   │   │  (Rolled back)  │
                    └─────────────────┘   └─────────────────┘

The Critical Transition: Partially Committed → Committed

The most important transition in this lifecycle is from 'partially committed' to 'committed.' This is where atomicity's guarantee is actually enforced. Before this point, the transaction can still be rolled back. After this point, the changes are permanent and will survive any subsequent failure.

Databases accomplish this transition through careful coordination of logging and buffer management, which we'll explore in detail in the implementation section.

Implementation: Write-Ahead Logging (WAL)

The primary mechanism for implementing atomicity in modern databases is Write-Ahead Logging (WAL), also known as 'journaling' or 'transaction logging.' WAL is one of the most elegant solutions in computer science—a simple principle with profound implications.

The WAL Protocol:

Before any change is made to the actual database, a description of that change is first written to a sequential log file. This log entry contains enough information to both redo the change (if the transaction commits) and undo the change (if the transaction aborts or the system crashes).

WAL Log Entry Structure

•Transaction ID (TxID) — Unique identifier for the transaction that made this change.
•Log Sequence Number (LSN) — Monotonically increasing number ordering all log entries. This establishes the precise sequence of operations.
•Operation Type — INSERT, UPDATE, DELETE, or control records (BEGIN, COMMIT, ABORT).
•Page/Record Identifier — The specific database page and record being modified.
•Before Image (Undo Data) — The previous value of the data, used for rollback.
•After Image (Redo Data) — The new value of the data, used for recovery.
•Previous LSN — Pointer to the previous log entry for this transaction (for efficient rollback).

WAL Entry Example: Bank Transfer
1
2
3
4
5
6
7
8
9
10
11
12
LSN: 1001 | TxID: 7 | BEGIN TRANSACTION
LSN: 1002 | TxID: 7 | UPDATE accounts SET balance = 4500 WHERE id = 'A'
         | Before: balance = 5000 | After: balance = 4500
         | PrevLSN: 1001
LSN: 1003 | TxID: 7 | UPDATE accounts SET balance = 3500 WHERE id = 'B'  
         | Before: balance = 3000 | After: balance = 3500
         | PrevLSN: 1002
LSN: 1004 | TxID: 7 | COMMIT
 
───────────────────────────────────────────────────────────
If crash BEFORE LSN 1004: UNDO using before-images
If crash AFTER  LSN 1004: REDO using after-images

Why 'Write-Ahead'?

The 'ahead' in Write-Ahead Logging is crucial. The log entry MUST be written and persisted to stable storage BEFORE the actual data page is modified. This ordering guarantee is what makes crash recovery possible. If data were written first and we crashed before logging, we'd have no record of what happened.

The Undo/Redo Recovery Algorithm

When a database system crashes and restarts, it must recover to a consistent state. The WAL log provides all the information needed to do this through a systematic Undo/Redo recovery process. This algorithm, known as ARIES (Algorithms for Recovery and Isolation Exploiting Semantics), is the gold standard used by most production databases.

Recovery consists of three phases:

Phase 1: Analysis

Scan the log from the last checkpoint forward
Identify which transactions were active at crash time
Determine which pages might need recovery
Build a 'dirty page table' tracking pages that might differ from disk

Phase 2: Redo (History Repeating)

Replay all logged operations from the checkpoint forward
This brings the database to its exact state at crash time
All modifications are reapplied, even for transactions that will be undone
Why? Because some committed transactions' pages might not have been flushed to disk

Phase 3: Undo (Rolling Back Losers)

Identify 'loser' transactions: those that were active but not committed at crash time
Walk their log entries backward using the PrevLSN chain
Apply the before-images to reverse their changes
Write 'Compensation Log Records' (CLRs) to log the undos themselves

ARIES Recovery Illustrated
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
Timeline:
                                                                    
  Checkpoint          Operations           Crash
      │                   │                  │
      ▼                   ▼                  ▼
 ─────┼───────────────────┼──────────────────┼─────▶ time
      │                   │                  │
      │   Tx1: BEGIN      │                  │
      │   Tx1: UPDATE A   │                  │
      │   Tx1: COMMIT  ◄──┼── Committed      │
      │                   │                  │
      │   Tx2: BEGIN      │                  │
      │   Tx2: UPDATE B   │                  │
      │   Tx2: UPDATE C   │  ◄── Active      │
      │                   │      (No Commit) │
                                             X Crash!
 
Recovery:
─────────────────────────────────────────────────────────
Phase 1 (Analysis):
  - Tx1 committed before crash ✓
  - Tx2 was active at crash (no commit record) → Loser
 
Phase 2 (Redo): 
  - Redo Tx1: UPDATE A           (ensure committed work persists)
  - Redo Tx2: UPDATE B, UPDATE C (bring to crash state)
 
Phase 3 (Undo):
  - Undo Tx2: Reverse UPDATE C using before-image
  - Undo Tx2: Reverse UPDATE B using before-image
  - Log CLRs for each undo operation
 
Result: Tx1's changes present, Tx2's changes gone = Atomicity!

Why Redo Everything?

You might wonder why Phase 2 redoes operations for transactions that will be undone in Phase 3. The answer is that redo brings the database to a known, consistent state (the crash point), from which undo can safely work backward. This 'repeating history' approach simplifies reasoning about recovery correctness.

Checkpointing: Limiting Recovery Time

Without limits, the WAL log would grow forever, and recovery would need to replay the entire history of the database from its creation. Checkpointing solves this problem by creating periodic 'save points' that limit how far back recovery must scan.

What a Checkpoint Records:

All active transactions and their last LSN
All dirty pages in the buffer pool and their oldest dirty LSN
The checkpoint LSN itself

Types of Checkpoints:

Sharp (Consistent) Checkpoint

•Quiesce all transactions (pause new work)
•Flush ALL dirty pages to disk
•Write checkpoint record
•Resume transactions
•Pro: Recovery is simple—start from checkpoint
•Con: Requires blocking all work (downtime)
•Use: Rare in production; sometimes during maintenance

Fuzzy (Non-Blocking) Checkpoint

•Record current state without stopping work
•Transactions continue uninterrupted
•Dirty pages flushed gradually in background
•Pro: No blocking, continuous operation
•Con: Recovery must scan from checkpoint + redo more
•Use: Default in most production databases

Checkpoint Frequency Trade-offs:

Frequent checkpoints:

Shorter recovery time — Less log to replay after crash
More I/O overhead — Constant page flushing impacts throughput
Log can be truncated sooner — Disk space savings

Infrequent checkpoints:

Longer recovery time — More log to replay
Better throughput — Less background I/O
Larger log files — More disk space needed

Most databases default to checkpoint intervals of 5-15 minutes, balancing recovery time against operational overhead. Critical systems may checkpoint more frequently to minimize worst-case recovery time.

Recovery Time Objective (RTO)

When designing systems, consider your RTO—how quickly must the database recover after a crash? If you need recovery in under 30 seconds, you'll need frequent checkpoints. If several minutes is acceptable, you can checkpoint less often and gain throughput. This is a tunable trade-off, not a fixed constraint.

Atomicity in Application Code

While databases provide atomicity guarantees, application developers must correctly use these mechanisms. Misusing transaction boundaries is a common source of bugs that can defeat atomicity's protections.

Correct Transaction Boundaries:

Correct: Single Transaction Boundary
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// ✅ CORRECT: All related operations in one transaction
async function transferMoney(fromId: string, toId: string, amount: number) {
  await prisma.$transaction(async (tx) => {
    // Both operations are atomic - succeed or fail together
    await tx.account.update({
      where: { id: fromId },
      data: { balance: { decrement: amount } }
    });
    
    await tx.account.update({
      where: { id: toId },
      data: { balance: { increment: amount } }
    });
    
    await tx.transfer.create({
      data: { fromId, toId, amount, timestamp: new Date() }
    });
  });
  // If we reach here, ALL three operations committed
  // If any failed, ALL three were rolled back
}

Incorrect: Separate Transactions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// ❌ WRONG: Operations in separate transactions
async function transferMoney(fromId: string, toId: string, amount: number) {
  // First transaction
  await prisma.account.update({
    where: { id: fromId },
    data: { balance: { decrement: amount } }
  });
  
  // If we crash here, money is gone forever!
  
  // Second transaction
  await prisma.account.update({
    where: { id: toId },
    data: { balance: { increment: amount } }
  });
  
  // Third transaction
  await prisma.transfer.create({
    data: { fromId, toId, amount, timestamp: new Date() }
  });
}

Common Atomicity Anti-Patterns

•Auto-commit mode abuse — Each statement in its own transaction, breaking logical groupings.
•Side effects in transactions — Sending emails or HTTP requests that can't be rolled back.
•Long-running transactions — Holding locks for minutes while waiting on external systems.
•Nested transaction confusion — Misunderstanding savepoint semantics vs. true nested transactions.
•Silent swallowing of errors — Catching exceptions without re-throwing, allowing partial commits.

Atomicity Across Distributed Systems

Single-database atomicity is well-understood, but modern systems often span multiple databases, services, and data stores. Distributed atomicity—ensuring all-or-nothing guarantees across multiple systems—is significantly more challenging.

The Two-Phase Commit Protocol (2PC):

2PC is the classic protocol for distributed transactions. A coordinator orchestrates multiple participants to agree on commit or abort:

Phase 1: Prepare

Coordinator sends 'PREPARE' to all participants
Each participant validates locally—can it commit?
Participants write their vote to durable storage
Participants reply 'YES' (prepared to commit) or 'NO' (must abort)

Phase 2: Commit/Abort

If ALL participants voted YES: Coordinator sends 'COMMIT'
If ANY participant voted NO: Coordinator sends 'ABORT'
Participants execute the decision and acknowledge
Coordinator records completion

Two-Phase Commit Flow

Coordinator           Participant A        Participant B
     │                        │                    │
     │──── PREPARE ──────────►│                    │
     │─────────── PREPARE ────┼───────────────────►│
     │                        │                    │
     │                   [Validate]            [Validate]
     │                   [Log PREP]            [Log PREP]
     │                        │                    │
     │◄─────── YES ───────────│                    │
     │◄────────────── YES ────┼────────────────────│
     │                        │                    │
  [All YES?]                  │                    │
     │                        │                    │
     │──── COMMIT ───────────►│                    │
     │─────────── COMMIT ─────┼───────────────────►│
     │                        │                    │
     │                   [Commit]              [Commit]
     │                        │                    │
     │◄────── ACK ────────────│                    │
     │◄─────────── ACK ───────┼────────────────────│
     │                        │                    │
  [Done]                      │                    │

2PC's Fatal Flaw

2PC has a critical weakness: if the coordinator crashes after participants have voted YES but before sending COMMIT/ABORT, participants are BLOCKED. They've promised to commit and cannot unilaterally abort, yet they have no decision. This blocking problem makes 2PC unsuitable for many distributed systems.

Modern Alternatives to 2PC:

Sagas — Break transactions into compensatable steps. If step 3 fails, execute compensating transactions for steps 1 and 2.
Choreography — Services react to events asynchronously. Eventual consistency replaces strong atomicity.
Outbox Pattern — Write to database and event queue atomically (same database transaction), then process events idempotently.
Three-Phase Commit (3PC) — Adds a 'pre-commit' phase to reduce blocking, but still vulnerable to network partitions.

In distributed systems, pure atomicity often gives way to 'eventual atomicity' or compensating transactions—accepting temporary inconsistency in exchange for availability and partition tolerance.

Summary: Atomicity Mastery

Atomicity is the shield against partial failure—the guarantee that our operations leave the database in a clean, predictable state regardless of what goes wrong during execution.

Key Takeaways

•Atomicity means indivisibility — Transactions execute completely or not at all. There is no partial execution visible to observers.
•Write-Ahead Logging enables atomicity — By logging changes before applying them, databases can always recover to a consistent state.
•Recovery uses Undo/Redo — The ARIES algorithm replays committed work and reverses uncommitted work after a crash.
•Checkpoints limit recovery time — Periodic checkpoints prevent unbounded log replay during recovery.
•Application code must respect boundaries — Correct transaction boundaries are essential; the database can't fix application-level mistakes.
•Distributed atomicity is hard — Two-phase commit provides guarantees but has blocking problems; modern systems often use sagas or eventual consistency.

Page Complete

You now understand Atomicity in depth—from its fundamental guarantee through its implementation via WAL and ARIES recovery, to its challenges in distributed systems. Next, we'll explore Consistency: how databases ensure that transactions move the system from one valid state to another.