In the previous page, we established that all writes flow through a single leader. But having one authoritative copy of the data isn't enough—we need those changes to propagate to followers reliably, efficiently, and with minimal delay.
Consider what followers must accomplish: receive every change the leader makes, apply those changes in the correct order, and recover gracefully when the network fails or the follower itself goes offline.
This page explores the mechanisms that make follower replication work—the protocols, data formats, and operational patterns that keep copies synchronized across the cluster.
By the end of this page, you will understand how replication logs are structured and transmitted, the differences between physical and logical replication, log streaming versus log shipping, how followers recover from outages, and best practices for maintaining healthy replication.
At the heart of leader-to-follower replication is the replication log—a sequential, append-only record of all changes made to the database. This log is the single source of truth for what the follower must apply.
Different databases call this log by different names: PostgreSQL's write-ahead log (WAL), MySQL's binary log (binlog), and MongoDB's oplog, among others.
Despite the naming variations, the concept is identical: an ordered sequence of records, each describing a change to the database.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                               REPLICATION LOG                               │
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │ LSN: 00000001 │ Transaction: BEGIN │ Timestamp: 2024-01-15 10:00     │   │
│  │ LSN: 00000002 │ INSERT users (id=1, name='Alice')                    │   │
│  │ LSN: 00000003 │ INSERT orders (user_id=1, total=99.99)               │   │
│  │ LSN: 00000004 │ Transaction: COMMIT                                  │   │
│  ├──────────────────────────────────────────────────────────────────────┤   │
│  │ LSN: 00000005 │ Transaction: BEGIN                                   │   │
│  │ LSN: 00000006 │ UPDATE users SET balance=500 WHERE id=1              │   │
│  │ LSN: 00000007 │ Transaction: COMMIT                                  │   │
│  ├──────────────────────────────────────────────────────────────────────┤   │
│  │ LSN: 00000008 │ Transaction: BEGIN                                   │   │
│  │ LSN: 00000009 │ DELETE FROM orders WHERE id=1                        │   │
│  │ LSN: 00000010 │ Transaction: COMMIT                                  │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  LSN = Log Sequence Number (monotonically increasing)                       │
│  The log is append-only: new entries are always added at the end            │
└─────────────────────────────────────────────────────────────────────────────┘
```

LSNs are the heartbeat of replication. Every follower tracks its current LSN—the last log entry it has applied. The difference between the leader's current LSN and the follower's LSN is the 'replication lag' in log-position terms. Recovery after failure means replaying log entries from the last applied LSN.
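The LSN bookkeeping above can be sketched in a few lines of Python. The `ReplicationLog` class and its method names are illustrative, not any real database's API:

```python
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    lsn: int          # monotonically increasing Log Sequence Number
    operation: str    # e.g. "INSERT users (id=1, name='Alice')"

@dataclass
class ReplicationLog:
    entries: list = field(default_factory=list)
    next_lsn: int = 1

    def append(self, operation: str) -> int:
        """Append-only: entries are added at the end, never modified."""
        entry = LogEntry(self.next_lsn, operation)
        self.entries.append(entry)
        self.next_lsn += 1
        return entry.lsn

    def entries_after(self, lsn: int) -> list:
        """Entries a follower still needs, given its last applied LSN."""
        return [e for e in self.entries if e.lsn > lsn]

log = ReplicationLog()
log.append("BEGIN")
log.append("INSERT users (id=1, name='Alice')")
log.append("COMMIT")

follower_lsn = 1                          # follower has applied only the BEGIN
lag = (log.next_lsn - 1) - follower_lsn   # replication lag in log positions
print(lag)                                # 2
```

Catch-up is just `entries_after(follower_lsn)`: the follower replays whatever it has not yet applied, in LSN order.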
Replication logs can encode changes at different levels of abstraction. The two primary approaches are physical replication and logical replication, each with distinct trade-offs.
Physical Replication (Byte-Level)
Physical replication transmits the exact byte-level changes to the database's storage files. The log contains information like "write these 8KB of data to page 42 of file users.dat."
Logical Replication (Operation-Level)
Logical replication transmits the logical operations: "INSERT row (id=1, name='Alice') into table users" or "DELETE rows WHERE age > 100 FROM users." The follower interprets these operations and applies them to its own storage.
| Aspect | Physical Replication | Logical Replication |
|---|---|---|
| Data Format | Byte-level storage changes | Logical SQL-like operations |
| Version Compatibility | Requires identical database versions | Supports different versions |
| Cross-Platform | Same OS, same architecture | Can replicate to different platforms |
| Selective Replication | All or nothing (full database) | Can filter tables/rows/columns |
| Performance | Lower CPU overhead | Higher CPU (must parse/apply) |
| Conflict Detection | Not possible (byte-level) | Possible (understands operations) |
| Use Case | High-availability replicas | Data integration, ETL, CDC |
For high-availability failover replicas (same data center, same purpose as leader), physical replication is typically preferred—it's faster and simpler. For cross-version upgrades, data integration pipelines, or selective replication scenarios, logical replication provides the necessary flexibility.
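To make the difference concrete, here is a sketch of what the two kinds of log record might carry. The record shapes and field names are illustrative, not any database's actual wire format:

```python
from dataclasses import dataclass

# Physical replication: the record describes raw bytes at a storage location.
# Only a follower with a byte-identical storage layout can apply it.
@dataclass
class PhysicalRecord:
    lsn: int
    file: str        # e.g. "users.dat"
    page: int        # page number within the file
    data: bytes      # the new page contents

# Logical replication: the record describes the operation itself.
# Any follower that understands the schema can interpret and apply it,
# which is what enables cross-version and selective replication.
@dataclass
class LogicalRecord:
    lsn: int
    table: str
    op: str          # "INSERT" | "UPDATE" | "DELETE"
    row: dict        # column -> value

phys = PhysicalRecord(lsn=2, file="users.dat", page=42, data=b"\x00" * 8192)
logi = LogicalRecord(lsn=2, table="users", op="INSERT",
                     row={"id": 1, "name": "Alice"})
```

Notice why logical records cost more CPU: the follower must parse the operation and route it through its own storage engine, while a physical record is just a byte copy to a known page.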
Once the replication log exists, how does it get from the leader to followers? Two primary approaches have evolved: log shipping and log streaming.
```
LOG SHIPPING (File-Based)
═══════════════════════════════════════════════════════════════════════════

  LEADER                                        FOLLOWER
  ──────                                        ────────
  ┌─────────┐                                   ┌─────────┐
  │  WAL    │  (1) Archive                      │ Waiting │
  │ Segment │ ────┐                             │   ...   │
  │  001    │     │                             └─────────┘
  └─────────┘     │                                  │
                  ▼                                  │
            ┌──────────┐                             │
            │  Shared  │  (2) Copy                   │
            │ Storage  │ ────────────────────────────┘
            │ (NFS/S3) │                             │
            └──────────┘                             ▼
                                             ┌─────────────┐
                                             │ (3) Apply   │
                                             │ WAL Segment │
                                             │    001      │
                                             └─────────────┘

  ✓ Simple: just file copies           ✗ Lag = segment size (16MB default)
  ✓ Works over unreliable networks     ✗ Minimum latency is segment duration
  ✓ Easy DR to remote locations        ✗ Not suitable for hot standby

LOG STREAMING (Connection-Based)
═══════════════════════════════════════════════════════════════════════════

  LEADER                                        FOLLOWER
  ──────                                        ────────
  ┌─────────────┐                           ┌─────────────┐
  │ WAL Buffer  │       Continuous          │ WAL Receive │
  │             │       TCP Stream          │   Process   │
  │ [New Entry] │──────────────────────────▶│   [Apply]   │
  │ [New Entry] │       (real-time)         │   [Apply]   │
  │ [New Entry] │──────────────────────────▶│   [Apply]   │
  └─────────────┘                           └─────────────┘

  ✓ Real-time: entries sent immediately     ✗ Requires stable network
  ✓ Sub-second lag possible                 ✗ More complex failure handling
  ✓ Follower always near-current
  ✓ Suitable for hot standby
  ✓ Efficient: no file overhead
```

Log Shipping (File-Based)
In log shipping, the leader writes complete log segment files (typically 16MB or 64MB) and archives them to shared storage (NFS, S3, or direct file copy). Followers periodically check for new segments, download them, and apply them.
Use case: Disaster recovery to a remote location over unreliable networks. The follower can be hours or days behind, but it will eventually catch up.
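A file-based catch-up loop can be sketched in Python. The directory layout and the `apply_segment` placeholder are hypothetical; in PostgreSQL, for example, the archive would be populated by the leader's `archive_command` and drained by the follower's `restore_command`:

```python
import os
import shutil

def apply_segment(path: str) -> None:
    """Placeholder: replay every log record in the segment file."""
    pass

def catch_up_once(archive_dir: str, local_wal: str, applied: set) -> list:
    """One polling cycle: find new segments in shared storage,
    copy them locally, and apply them in name (i.e. LSN) order."""
    new = sorted(s for s in os.listdir(archive_dir) if s not in applied)
    for segment in new:
        shutil.copy(os.path.join(archive_dir, segment), local_wal)  # (2) Copy
        apply_segment(os.path.join(local_wal, segment))             # (3) Apply
        applied.add(segment)                                        # track progress
    return new
```

The follower would call `catch_up_once` on a timer; the polling interval plus the segment-fill time is the floor on replication lag, which is why this approach cannot serve a hot standby.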
Log Streaming (Connection-Based)
In log streaming, the follower maintains a persistent TCP connection to the leader. As the leader writes new log entries, it immediately streams them to connected followers. The follower applies entries as they arrive.
Use case: Hot standby replicas that can serve reads and fail over quickly. Lag is typically sub-second under normal conditions.
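The streaming model can be sketched with an in-process queue standing in for the persistent TCP connection (purely illustrative; a real follower reads a binary protocol from a socket):

```python
import queue

def stream_entries(wal_buffer: queue.Queue, applied: list) -> None:
    """Follower side of streaming: apply each entry the moment it
    arrives, instead of waiting for a complete segment file."""
    while True:
        entry = wal_buffer.get()     # blocks on the "connection"
        if entry is None:            # sentinel: connection closed
            break
        applied.append(entry)        # immediate apply -> sub-second lag

buf = queue.Queue()
for e in ["BEGIN", "INSERT users (id=1, name='Alice')", "COMMIT", None]:
    buf.put(e)

applied = []
stream_entries(buf, applied)
print(applied)   # ['BEGIN', "INSERT users (id=1, name='Alice')", 'COMMIT']
```

The contrast with the shipping sketch is the unit of transfer: one log entry instead of one multi-megabyte segment, which is exactly where the latency difference comes from.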
If the leader purges log entries before a follower has consumed them, the follower cannot catch up—it must be rebuilt from scratch. Use replication slots (PostgreSQL) or similar mechanisms to prevent premature log cleanup. But remember: a slot whose follower never reconnects forces the leader to retain logs indefinitely, and the accumulated log can fill the leader's disk.
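In PostgreSQL, for example, a physical replication slot is created and removed with built-in functions; the slot name below is an example:

```sql
-- Create a physical replication slot for a standby. The leader will now
-- retain WAL until this slot's consumer has received it.
SELECT pg_create_physical_replication_slot('standby_1');

-- Drop the slot when the standby is decommissioned; otherwise the
-- retained WAL accumulates indefinitely.
SELECT pg_drop_replication_slot('standby_1');
```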
Receiving log entries is only half the battle—the follower must also apply them to its local data files. This process involves several stages and has its own complexities.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          FOLLOWER APPLY PIPELINE                            │
└─────────────────────────────────────────────────────────────────────────────┘

      FROM LEADER
          │
          ▼
┌───────────────────┐
│ (1) RECEIVE       │  Network buffer receives log entries
│     Buffer        │  Entries queued for processing
└───────────────────┘
          │
          ▼
┌───────────────────┐
│ (2) WRITE TO      │  Entries written to local WAL (durable)
│     Local WAL     │  This is the follower's recovery point
└───────────────────┘
          │
          ▼
┌───────────────────┐
│ (3) PARSE &       │  Decode binary log format
│     Decode        │  Validate entry integrity (checksums)
└───────────────────┘
          │
          ▼
┌───────────────────┐
│ (4) APPLY TO      │  Execute the change against local storage
│     Data Files    │  Update tables, indexes, etc.
└───────────────────┘
          │
          ▼
┌───────────────────┐
│ (5) UPDATE        │  Record the new applied LSN
│     Apply Position│  This position is reported to the leader
└───────────────────┘
          │
          ▼
┌───────────────────┐
│ (6) ACKNOWLEDGE   │  Send acknowledgment to leader (if sync)
│     to Leader     │  Report apply position for monitoring
└───────────────────┘
```

Key Apply Process Considerations:
Crash Recovery: If the follower crashes, it restarts from its last applied position (stored durably). Because entries are written to the local WAL before application, recovery replays any entries that were received but not yet fully applied.
Apply Parallelism: Some databases parallelize the apply process by applying non-conflicting transactions concurrently. MySQL Group Replication and PostgreSQL's parallel apply feature both support this. The key challenge is maintaining transaction ordering for conflicting operations.
Apply Lag: The time between the leader writing an entry and the follower applying it is apply lag. This is distinct from network lag (time to transmit). A follower might receive entries quickly but apply them slowly if its disk or CPU is saturated.
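The received-versus-applied distinction, and the durable apply position that crash recovery depends on, can be sketched as follows. The class name and JSON state file are illustrative; real databases persist this position in their own control files:

```python
import json
import os

class ApplyState:
    """Tracks the two positions that define a follower's health:
    the last LSN received and the last LSN durably applied."""

    def __init__(self, state_file: str):
        self.state_file = state_file
        self.received_lsn = 0
        self.applied_lsn = self._load()   # crash recovery: resume from disk

    def _load(self) -> int:
        if os.path.exists(self.state_file):
            with open(self.state_file) as f:
                return json.load(f)["applied_lsn"]
        return 0

    def receive(self, lsn: int) -> None:
        self.received_lsn = lsn

    def apply(self, lsn: int) -> None:
        self.applied_lsn = lsn
        # Persist BEFORE acknowledging, so a crash replays entries
        # rather than silently skipping them.
        with open(self.state_file, "w") as f:
            json.dump({"applied_lsn": lsn}, f)

    def apply_backlog(self) -> int:
        """Entries received but not yet applied: the symptom of a
        saturated disk or CPU on the follower."""
        return self.received_lsn - self.applied_lsn
```

After a crash, constructing a fresh `ApplyState` from the same file resumes at the last durably applied LSN, and any entries already in the local WAL beyond that point are replayed.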
| Cause | Symptom | Resolution |
|---|---|---|
| Slow Disk I/O | Apply position falls behind receive position | Faster storage (SSD), tune I/O scheduler |
| CPU Saturation | High CPU on follower, low on leader | Scale follower resources, reduce read load |
| Large Transactions | Periodic spikes in lag | Break up large transactions, batch smaller |
| Schema Changes (DDL) | Lag spikes during DDL | DDL locks entire table; run during low traffic |
| Network Saturation | Receive buffer grows | Increase network capacity, enable compression |
Many databases support 'hot standby' mode where followers serve read queries while applying changes. This introduces a potential conflict: long-running read queries might conflict with pending apply operations. PostgreSQL resolves this with configurable behavior: cancel the query, delay the apply, or allow stale reads.
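In PostgreSQL this behavior is controlled by follower-side settings; the values below are illustrative, not recommendations:

```
# postgresql.conf on the follower

# How long to delay applying WAL when it conflicts with a running query.
# 0 cancels conflicting queries immediately; -1 waits forever
# (the query wins, but apply lag grows unboundedly).
max_standby_streaming_delay = 30s

# Ask the leader not to vacuum away row versions that standby queries
# still need, reducing conflicts at the cost of bloat on the leader.
hot_standby_feedback = on
```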
Followers go offline. Networks partition. Hardware fails. When a follower comes back online after an outage, it must catch up to the leader's current state. The strategy depends on how far behind the follower is.
```
                    FOLLOWER COMES ONLINE
                            │
                            ▼
                 ┌────────────────────┐
                 │ Compare follower's │
                 │ last LSN to leader │
                 └────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            │               │               │
            ▼               ▼               ▼
      ┌───────────┐   ┌───────────┐   ┌───────────┐
      │ Log entry │   │ Entry in  │   │ Entry not │
      │ available │   │ archive   │   │ available │
      │ on leader │   │ storage   │   │ anywhere  │
      └───────────┘   └───────────┘   └───────────┘
            │               │               │
            ▼               ▼               ▼
      ┌───────────┐   ┌───────────┐   ┌───────────┐
      │  Stream   │   │  Fetch    │   │  Rebuild  │
      │  from     │   │  from     │   │  from     │
      │  leader   │   │  archive  │   │  backup   │
      └───────────┘   └───────────┘   └───────────┘
            │               │               │
            └───────────────┼───────────────┘
                            │
                            ▼
                 ┌────────────────────┐
                 │ Resume streaming   │
                 │ replication        │
                 └────────────────────┘
```

Rebuild Process (pg_basebackup, xtrabackup, mongodump):
When a follower needs complete reconstruction:
Take Base Backup — Create a consistent snapshot of the leader's data files, either by stopping the database momentarily (cold backup) or using consistent snapshot mechanisms (hot backup).
Transfer to Follower — Copy all data files to the new follower. This can be network-intensive for large databases (terabytes = hours or days of transfer).
Record Start Position — Note the log position at the time of the backup. The follower will resume replication from this point.
Start Follower — Initialize the follower with the backup, configure it to stream from the leader starting at the noted position.
Catch Up — The follower applies all log entries from the backup point to the leader's current position.
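The snapshot-plus-replay idea behind these steps can be sketched in Python, with a dict standing in for the data files and a list of key/value writes standing in for the log (all names illustrative):

```python
import copy

def take_base_backup(leader_data: dict, log: list):
    """Steps 1-3: snapshot the leader's data and record the log
    position the snapshot corresponds to."""
    snapshot = copy.deepcopy(leader_data)   # consistent copy of data files
    start_lsn = len(log)                    # log position at backup time
    return snapshot, start_lsn

def catch_up(follower_data: dict, log: list, start_lsn: int) -> int:
    """Step 5: replay every entry written after the snapshot position."""
    for key, value in log[start_lsn:]:
        follower_data[key] = value
    return len(log)                         # new applied position

# The leader keeps accepting writes while the backup is transferred:
leader, log = {}, []
def write(k, v):
    log.append((k, v))
    leader[k] = v

write("a", 1)
write("b", 2)
snapshot, start_lsn = take_base_backup(leader, log)
write("c", 3)                   # happens during the (slow) transfer
catch_up(snapshot, log, start_lsn)
print(snapshot == leader)       # True: follower has converged
```

The crucial detail is recording `start_lsn` at backup time: without it, the follower cannot know which log entries are already reflected in the snapshot and which still need replaying.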
While a follower rebuilds, you have one fewer replica for redundancy. For large databases, rebuilds can take hours or days. Plan ahead: maintain enough replicas that losing one doesn't threaten availability, and consider techniques like incremental backup or storage-level snapshots to speed up rebuilds.
Healthy replication is invisible—problems are not. Monitoring replication health is critical for catching issues before they become outages. Here's what to watch and how to verify correctness.
```sql
-- View replication status on the leader
SELECT client_addr, state,
       sent_lsn, write_lsn, flush_lsn, replay_lsn,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

-- Check replication slots and their lag
SELECT slot_name, active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS lag_size
FROM pg_replication_slots;

-- On the follower, check if recovery is complete
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn();
```

Data Verification:
Beyond monitoring metrics, you should periodically verify that follower data actually matches the leader. Techniques include comparing per-table row counts, computing checksums over table contents on both sides, and running dedicated tools such as pt-table-checksum for MySQL.
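One such check, an order-independent per-table checksum, can be sketched in Python (the hashing scheme is illustrative; production tools chunk tables and push the hashing into the database):

```python
import hashlib

def table_checksum(rows) -> str:
    """Order-independent checksum of a table's rows, so leader and
    follower can be compared even if physical row order differs."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode())
    return digest.hexdigest()

leader_rows   = [(1, "Alice"), (2, "Bob")]
follower_rows = [(2, "Bob"), (1, "Alice")]   # same data, different order
print(table_checksum(leader_rows) == table_checksum(follower_rows))  # True
```

Run the same computation against leader and follower and compare digests; a mismatch on any table means the follower has diverged and should be investigated or rebuilt.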
Set lag alerts well below your tolerance threshold. If your application can tolerate 30 seconds of lag, alert at 10 seconds. This gives you time to investigate and remediate before users experience stale data.
Years of operational experience have distilled specific practices for maintaining healthy, reliable follower replication. These recommendations apply across database systems.
| Setting | Recommendation | Rationale |
|---|---|---|
| Follower Count | Minimum 2 (prefer 3+) | Survive one failure while maintaining read capacity |
| Synchronous Replicas | At most 1 (for critical data) | Balance durability against write latency |
| Log Retention | Longer than the longest expected follower outage | Avoid full rebuilds after transient outages |
| Read Routing | Load balance across followers | Distribute read load, reduce leader burden |
| Backup Source | Prefer followers over leader | Reduce leader load, followers are already consistent |
Replication protects against hardware failure but not against data corruption or accidental deletion. If you DELETE all rows from a table on the leader, followers replicate the DELETE. Maintain independent backups (periodic snapshots + log archives) for point-in-time recovery.
We've explored the follower side of leader-follower replication—how followers receive, store, and apply changes from the leader to maintain synchronized copies: the replication log and its LSNs, physical versus logical formats, log shipping versus streaming, the apply pipeline, catch-up and rebuild strategies, and the monitoring practices that keep it all healthy.
What's Next:
We've covered how writes flow through the leader and how followers replicate. But we've glossed over a critical detail: when does replication happen relative to commit? The next page dives deep into synchronous versus asynchronous replication—the trade-off between durability guarantees and write performance that shapes every production deployment.