We've explored the mechanics of Write-Ahead Logging—the rule, the undo information, the redo information. But why has WAL become the universal standard in database systems? Why is it implemented in PostgreSQL, MySQL, Oracle, SQL Server, SQLite, and virtually every database you'll ever use?
The answer extends beyond crash recovery. WAL is the foundation upon which databases build their most critical guarantees: ACID compliance, point-in-time recovery, streaming replication, and high availability. Understanding WAL's importance reveals why it's ubiquitous and why alternatives have failed.
This final page synthesizes everything we've learned, examining WAL's role in production systems and its critical importance to modern data infrastructure.
By the end of this page, you will understand why WAL is essential for ACID guarantees, how it enables replication and high availability, its performance implications, and why it remains the dominant approach despite decades of alternatives being proposed.
The four ACID properties—Atomicity, Consistency, Isolation, Durability—are the foundation of reliable database operations. WAL directly provides two of them (atomicity and durability) and supports the other two:
Atomicity Through Undo
Atomicity requires that transactions either complete entirely or have no effect. When a transaction aborts (due to error, deadlock, or explicit rollback), all its changes must vanish.
WAL provides this through undo information: each log record carries a before-image of the data it changed, so a rollback (or crash recovery) can walk the transaction's records in reverse and restore every page to its prior state.
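As a minimal sketch of that idea (hypothetical names, not any particular engine's API): each undo record carries the before-image of the bytes it overwrote, and rollback restores those images in reverse order.

```python
# Illustrative sketch only; names and structures are hypothetical.
from dataclasses import dataclass

@dataclass
class UndoRecord:
    page_id: int
    offset: int
    before_image: bytes   # the bytes this change overwrote

def rollback(txn_records: list[UndoRecord], pages: dict[int, bytearray]) -> None:
    """Undo a transaction by restoring before-images in reverse order."""
    for rec in reversed(txn_records):
        end = rec.offset + len(rec.before_image)
        pages[rec.page_id][rec.offset:end] = rec.before_image

# Example: a transaction changes a balance from 100 to 250, then aborts.
pages = {7: bytearray(b"balance=100")}
undo_log = [UndoRecord(page_id=7, offset=8, before_image=b"100")]
pages[7][8:11] = b"250"      # the in-place update
rollback(undo_log, pages)    # page 7 is back to b"balance=100"
```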
Durability Through Redo
Durability requires that committed transactions survive any failure—power loss, crashes, hardware faults.
WAL provides this through redo information: each record also carries an after-image, and because log records reach stable storage before the commit is acknowledged, recovery can replay them to reconstruct any committed change that never made it to the data pages.
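The redo side, again as an illustrative sketch with invented names: recovery replays after-images in log order, and a per-page LSN prevents re-applying changes that already reached the page.

```python
# Illustrative sketch only; names and structures are hypothetical.
from dataclasses import dataclass

@dataclass
class RedoRecord:
    lsn: int              # position of this record in the log
    page_id: int
    offset: int
    after_image: bytes    # the bytes the committed change wrote

def redo_pass(log: list[RedoRecord], pages: dict[int, bytearray],
              page_lsn: dict[int, int]) -> None:
    """Replay committed work forward, skipping pages that already contain it."""
    for rec in log:                                  # log order = execution order
        if page_lsn.get(rec.page_id, 0) >= rec.lsn:
            continue                                 # page already reflects this record
        end = rec.offset + len(rec.after_image)
        pages[rec.page_id][rec.offset:end] = rec.after_image
        page_lsn[rec.page_id] = rec.lsn              # mark how far this page is updated
```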
| ACID Property | WAL Component | Mechanism |
|---|---|---|
| Atomicity | Undo information (before-images) | Rollback reverses all changes; crash recovery undoes uncommitted work |
| Consistency | Complete recovery | Database always returns to consistent state after any failure |
| Isolation | Undo segments (MVCC) | Undo provides old versions for snapshot isolation |
| Durability | Redo information (after-images) | Committed transactions survive crashes via log replay |
Supporting Consistency and Isolation:
While consistency and isolation are primarily enforced by constraints and concurrency control, WAL supports them:
Consistency: If a crash interrupts a constraint-validating operation, WAL ensures the database returns to a constraint-satisfying state. Partial constraint updates don't persist.
Isolation (MVCC): Many systems use undo information to provide old versions for MVCC. Readers access before-images to see consistent snapshots without blocking writers.
Without WAL (or an equivalent mechanism), databases cannot provide both atomicity and durability efficiently. You would have to choose: skip forcing data to disk at commit and risk losing committed work in a crash, or write pages to disk as the transaction runs and risk leaving partial transactions behind. WAL provides both guarantees at once.
It might seem that WAL adds overhead—writing to both log and data. But WAL actually improves performance dramatically compared to alternatives:
The Performance Insight:
Without WAL, every transaction commit would require forcing every modified data page to disk before returning, paying one random I/O per page.
With WAL, a commit forces only the transaction's log records as a single sequential append; the data pages are flushed later in the background:
```
// WITHOUT WAL (FORCE policy):
// A transaction modifying 100 pages

function commitWithoutWAL(transaction):
    // Must force all modified pages to disk
    for page in transaction.modifiedPages:   // 100 pages
        disk.writePage(page)                 // Random I/O each!
    disk.fsync()
    // Commit complete
    // Cost: 100 random I/Os × 10ms = 1000ms = 1 second

// WITH WAL (NO-FORCE policy):
// Same transaction modifying 100 pages

function commitWithWAL(transaction):
    // Only force log records
    for record in transaction.logRecords:    // Sequential buffer
        logBuffer.append(record)
    disk.appendToLog(logBuffer)              // Single sequential write
    disk.fsync()                             // Single fsync
    // Commit complete
    // Cost: 1 sequential I/O × ~2ms = 2ms

// Performance difference: 500x faster commits!

// Data pages written later in background:
function backgroundFlush():
    dirtyPages = bufferPool.getDirtyPages()
    sortByPhysicalLocation(dirtyPages)       // Optimize disk scheduling
    for page in dirtyPages:
        disk.writePage(page)
    // No fsync needed per page - WAL provides durability
```
Why Sequential Beats Random:
| Write Type | HDD Time | SSD Time | Why |
|---|---|---|---|
| Random 4KB | 8-12 ms | 0.1-0.5 ms | Seek + positioning or flash page lookup |
| Sequential 4KB | 0.04 ms | 0.02 ms | Append at head position, write combining |
| Random 100 pages | 800-1200 ms | 10-50 ms | Sum of random accesses |
| Sequential 100 pages | 4 ms | 2 ms | Streamed write |
Even with SSDs (which have fast random access), sequential writes are faster due to write combining and simpler flash management.
Additional Performance Benefits: a single log flush can commit many concurrent transactions at once (group commit), and dirty data pages can be written back lazily in the background, sorted by physical location for efficient disk scheduling.
Counterintuitively, adding logging makes databases FASTER. The overhead of writing log records is vastly outweighed by eliminating forced random writes at commit. This is why all high-performance databases use WAL—it's not just for recovery, it's for speed.
One of WAL's most important modern uses is database replication—maintaining synchronized copies across multiple servers. The log becomes a complete, ordered record of all changes:
Log-Based Replication:
```
// PRIMARY SERVER
class PrimaryWALStreamer:
    replicas: Map<string, ReplicaConnection>

    function onWALWrite(logRecords: List<LogRecord>):
        // After writing to the local log, stream to replicas
        for replica in self.replicas.values():
            replica.stream.send(logRecords)

    function handleReplicaAck(replicaId: string, ackedLSN: LSN):
        // Replica confirmed receipt up to ackedLSN
        self.replicas[replicaId].confirmedLSN = ackedLSN

        // For synchronous replication: unblock waiting transactions
        for transaction in waitingForReplication:
            if allReplicasAcked(transaction.commitLSN):
                transaction.confirmCommit()

// REPLICA SERVER
class ReplicaWALReceiver:
    function onWALReceived(logRecords: List<LogRecord>):
        // Write received records to the local WAL first
        for record in logRecords:
            localLog.append(record)
        localLog.flush()

        // Apply records to the local database (redo)
        for record in logRecords:
            page = bufferPool.getPage(record.pageID)
            if page.pageLSN < record.LSN:
                applyRedo(page, record)
                page.pageLSN = record.LSN
                page.markDirty()

        // Acknowledge receipt to the primary
        primary.send(Ack(lastAppliedLSN))

// The replica's database always reflects the primary's log
// up to its last applied LSN
```
Replication Modes:
| Mode | Behavior | Durability | Latency |
|---|---|---|---|
| Asynchronous | Primary doesn't wait for replicas | Committed data may be lost if primary fails before replica catches up | Lowest |
| Synchronous | Primary waits for ≥1 replica ack | Committed data survives primary failure | Higher |
| Quorum | Primary waits for majority of replicas | Committed data survives minority failures | Moderate |
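To make the table concrete, here is a small sketch (hypothetical names, not any specific engine's implementation) of how the commit acknowledgement decision could differ across the three modes:

```python
# Illustrative sketch only; names and policy details are hypothetical.
import enum

class ReplicationMode(enum.Enum):
    ASYNC = "async"     # acknowledge once the local WAL write is durable
    SYNC = "sync"       # also wait for at least one replica to confirm
    QUORUM = "quorum"   # wait for a majority of replicas

def can_ack_commit(mode: ReplicationMode, commit_lsn: int,
                   replica_acked_lsns: list[int]) -> bool:
    """Return True once the commit may be acknowledged to the client."""
    confirmed = sum(1 for lsn in replica_acked_lsns if lsn >= commit_lsn)
    if mode is ReplicationMode.ASYNC:
        return True                                   # local durability is enough
    if mode is ReplicationMode.SYNC:
        return confirmed >= 1                         # at least one replica has it
    return confirmed > len(replica_acked_lsns) // 2   # QUORUM: majority of replicas
```

The trade-off is exactly the one in the table: the more acknowledgements a commit waits for, the more failures the committed data survives, at the cost of commit latency.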
Why WAL is Ideal for Replication: the log is already a complete, ordered, durable record of every change, so the primary can stream it with no extra bookkeeping, and replicas apply it with the same redo logic used for crash recovery.
Some systems offer 'logical replication' (streaming row-level changes instead of WAL). This allows replicating to different database versions or systems, but WAL-based 'physical replication' is simpler, faster, and guarantees exact copies. Most HA setups use physical WAL streaming.
Beyond crash recovery, WAL enables Point-in-Time Recovery (PITR)—restoring a database to any moment in the past:
The PITR Process: take periodic base backups, continuously archive every WAL segment, and, when recovery is needed, restore the most recent base backup and replay the archived WAL up to the chosen target time.
Use Cases for PITR:
Accidental data deletion: someone runs `DELETE FROM users` without a WHERE clause at 2:30 PM; restore to 2:29 PM to recover the data.
```bash
# 1. Configure WAL archiving (postgresql.conf)
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'

# 2. Take a base backup
pg_basebackup -D /backup/base_2024_01_15 -Ft -z -P

# 3. Later: disaster strikes at 14:30
#    Someone accidentally deletes the users table

# 4. Restore the base backup
cd /var/lib/postgresql
rm -rf data/*
tar xzf /backup/base_2024_01_15/base.tar.gz -C data/

# 5. Configure recovery (PostgreSQL 12+: settings go in postgresql.auto.conf,
#    plus an empty recovery.signal file to request recovery mode)
cat > data/postgresql.auto.conf << EOF
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2024-01-15 14:29:00'
recovery_target_action = 'pause'
EOF
touch data/recovery.signal

# 6. Start PostgreSQL - it will replay WAL up to 14:29
pg_ctl start -D data/

# 7. Database is now in the state just before the deletion
#    Extract the needed data or promote to primary
```
PITR Requirements: a base backup plus a complete, unbroken archive of every WAL segment generated since that backup.
Recovery Time Considerations: replay time grows with the volume of WAL generated since the last base backup, so more frequent base backups shorten worst-case recovery time.
PITR only works if WAL archiving is continuous and unbroken. If archiving fails for any period, you cannot recover to any point past the start of that gap. Monitor archive_command closely and alert immediately on failures.
WAL has evolved beyond traditional single-server databases to become central to modern distributed and cloud-native architectures:
1. Distributed Databases
In distributed databases like CockroachDB, TiDB, and Spanner, each shard (or consensus group) keeps its own WAL, and those log entries are exactly what Raft or Paxos replicates and orders across nodes:
| Architecture | WAL Role | Key Innovation |
|---|---|---|
| Single-Node (PostgreSQL) | Local log file for recovery and replication | Classic WAL—recovery and streaming replication |
| Shared-Disk (Oracle RAC) | WAL coordinated across nodes, stored on shared storage | Cluster-wide recovery coordination |
| Shared-Nothing Distributed | Per-shard WAL replicated via consensus | WAL = unit of consensus in Raft/Paxos |
| Cloud-Native (Aurora) | WAL as the primary data path, pages on demand | Log is the database—pages are just cache |
| Log-Structured (Kafka) | Entire system is a distributed commit log | WAL elevated to first-class data structure |
2. Aurora and the 'Log is the Database' Philosophy
Amazon Aurora takes WAL to its logical extreme: the database instance sends only log records to a distributed storage layer, which applies them to materialize data pages on demand.
This provides: far less write traffic per transaction than shipping full pages, and fast crash recovery, since the storage layer is continuously applying the log rather than waiting for a crash-time replay.
3. Event Sourcing and Change Data Capture
WAL concepts extend to application architecture: event sourcing treats an append-only log of domain events as the source of truth, with current state derived by replay, and change data capture (CDC) streams row-level changes out of the database's WAL to downstream systems.
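As a toy illustration of that idea (invented file format and names, not a production design), an application-level append-only log can follow the same discipline: append and fsync before acknowledging, and treat current state as something you can always rebuild by replay.

```python
# Illustrative sketch only; file format and names are hypothetical.
import json
import os

class EventLog:
    """Append-only event log: the ordered log is the source of truth;
    current state is just a cache derived by replaying it."""

    def __init__(self, path: str):
        self.path = path

    def append(self, event: dict) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())   # durable before acknowledging, same rule as WAL

    def replay(self) -> dict:
        """Rebuild current state (here, account balances) from the log."""
        balances: dict[str, int] = {}
        if not os.path.exists(self.path):
            return balances
        with open(self.path) as f:
            for line in f:
                event = json.loads(line)
                balances[event["account"]] = balances.get(event["account"], 0) + event["amount"]
        return balances
```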
WAL is no longer just a database implementation detail—it's a fundamental architectural pattern. From distributed consensus to event-driven systems, the insight that an ordered, durable log can serve as the source of truth has transformed system design. Learn WAL deeply and you'll see it everywhere.
Over decades, researchers have proposed alternatives to WAL. Understanding why they've failed (or been relegated to niche uses) reinforces WAL's importance:
1. Shadow Paging (No-Overwrite)
Concept: Never overwrite existing pages. Always write modified pages to new locations. Atomically update a 'master pointer' from old to new pages.
Examples: System R (historical), LMDB (specialized)
Problems: every change rewrites whole pages (copy-on-write), so commits are slow; keeping multiple page versions inflates space and memory use; and constantly relocating pages destroys physical clustering.
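For contrast with WAL, here is a toy in-memory sketch of the shadow-paging idea described above (invented names, no durability handling): every write produces a new page version, and commit is a single atomic swap of the page table.

```python
# Illustrative sketch only; an in-memory caricature of shadow paging.
class ShadowPagedStore:
    """Toy shadow paging: committed pages are never overwritten; commit
    swaps the 'master pointer' (here, the page table) to the new versions."""

    def __init__(self) -> None:
        self.page_table: dict[int, bytes] = {}   # the committed view

    def begin(self) -> dict[int, bytes]:
        return dict(self.page_table)             # private shadow copy of the page table

    @staticmethod
    def write(txn_table: dict[int, bytes], page_id: int, data: bytes) -> None:
        txn_table[page_id] = data                # whole-page copy-on-write

    def commit(self, txn_table: dict[int, bytes]) -> None:
        self.page_table = txn_table              # the single atomic switch

# Usage: anything that fails before commit() leaves the committed view untouched.
store = ShadowPagedStore()
txn = store.begin()
ShadowPagedStore.write(txn, 1, b"new version of page 1")
store.commit(txn)                                # old page versions become garbage
```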
2. Force/No-Steal (Simpler Logging)
Concept: Force all pages at commit (no redo needed). Don't allow uncommitted pages to disk (no undo needed).
Problems: forcing every modified page to disk at commit makes commits very slow (many random I/Os), and forbidding dirty pages from being written early means the buffer pool must hold every page touched by every active transaction, which large transactions can exhaust.
| Approach | Commit Speed | Memory Use | Recovery | Adoption |
|---|---|---|---|---|
| WAL (Steal/No-Force) | Fast (log only) | Efficient (steal pages) | Undo + Redo | Universal standard |
| Shadow Paging | Slow (copy-on-write) | High (multiple copies) | Instant (switch pointers) | Niche (LMDB) |
| Force/No-Steal | Very slow (all pages) | High (no steal) | None needed | Embedded only |
| No Recovery | Fastest | Most efficient | None (data loss) | Caches only |
3. Replication Instead of Logging
Concept: Synchronously replicate to multiple nodes. If primary fails, replica takes over.
Problems: replication alone does not survive correlated failures (a site-wide power loss can take out every copy at once), every commit pays a network round trip, and in practice each node still needs a local ordered log to apply and stream changes, which is WAL by another name.
Why WAL Wins:
WAL provides the best balance of: fast commits (only log records are forced, sequentially), efficient memory use (dirty pages can be stolen and flushed lazily), and complete recovery (undo plus redo) after any crash.
WAL has been refined since the 1970s. Every proposed alternative has revealed limitations under real-world conditions. WAL's longevity isn't inertia—it's evidence that the approach is fundamentally sound. It has outcompeted every alternative across all database categories.
Understanding WAL's importance includes knowing how to operate WAL-based systems effectively:
Log Space Management: WAL segments can be recycled only after a checkpoint has made their changes durable in the data files and, where configured, they have been archived and received by replicas; stalled archiving or lagging replicas therefore cause the log to grow until the disk fills.
Key Metrics to Monitor:
| Metric | Why It Matters | Alert Threshold |
|---|---|---|
| WAL write rate | Indicates write load | Unusual spikes |
| Log space used | Prevent disk exhaustion | >80% capacity |
| Checkpoint frequency | Balance recovery time and I/O | Varies by workload |
| Replication lag | Replica health | >10 seconds |
| Oldest transaction age | Undo bloat prevention | >5 minutes |
| Archive success rate | PITR chain integrity | <100% |
```sql
-- Current WAL position and file
SELECT
    pg_current_wal_lsn()                  AS current_wal_lsn,
    pg_walfile_name(pg_current_wal_lsn()) AS current_wal_file;

-- Replication lag in bytes, per replica
SELECT
    pg_wal_lsn_diff(
        pg_current_wal_lsn(),
        pg_stat_replication.sent_lsn
    ) AS replication_lag_bytes
FROM pg_stat_replication;

-- Oldest active transaction (undo bloat risk)
SELECT
    pid,
    usename,
    age(clock_timestamp(), xact_start) AS transaction_age,
    state,
    query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- Checkpoint statistics
SELECT
    checkpoints_timed,
    checkpoints_req,
    checkpoint_write_time,
    checkpoint_sync_time,
    buffers_checkpoint,
    buffers_backend
FROM pg_stat_bgwriter;

-- WAL archiving status
SELECT
    archived_count,
    failed_count,
    last_archived_wal,
    last_archived_time,
    last_failed_wal,
    last_failed_time
FROM pg_stat_archiver;
```
A well-implemented WAL is useless if operations fail. Set up comprehensive monitoring, automate alerts, and test recovery procedures regularly. The best recovery mechanism is one you've practiced using.
We've explored why Write-Ahead Logging is critical to database systems—from ACID guarantees to modern distributed architectures. The key insights: WAL provides atomicity through undo and durability through redo; it makes commits fast by turning forced random page writes into a single sequential log append; and the same log powers replication, point-in-time recovery, and log-centric distributed designs.
Module Complete:
You've now mastered Write-Ahead Logging—from the foundational WAL rule through undo information, redo information, and the critical importance of this protocol. WAL is the invisible infrastructure that makes databases reliable. Every query, every transaction, every commit you've ever made depended on this protocol working correctly.
As you continue your database journey, remember: the log is the source of truth. Data pages are just a cache. Understanding this insight unlocks deep comprehension of how databases really work.
You have completed the Write-Ahead Logging module. You understand the WAL rule, undo and redo information, and why WAL is critical for database reliability and performance. This knowledge is foundational for understanding recovery algorithms (like ARIES), replication systems, and modern distributed databases.