A user updates their profile picture. They click 'Save,' see a success message, and immediately navigate to their profile to admire their new photo. But they see the old picture.
Confused, they refresh. Still the old picture. They refresh again—finally, the new picture appears.
What happened? The user wrote to the leader, but their subsequent read hit a follower that hadn't yet received the update. This is replication lag—the time delay between when a change is committed on the leader and when it's visible on followers.
Replication lag is an inherent property of leader-follower systems. It cannot be eliminated (without synchronous replication's latency costs), only managed. Understanding its causes, measuring it accurately, and designing applications to handle it gracefully are essential for building reliable distributed systems.
By the end of this page, you will understand why replication lag exists, how to measure it accurately, the various causes and how to address them, the impact on application behavior, read-your-writes consistency patterns, and strategies for minimizing lag's impact on user experience.
Replication lag is the delay between a write being committed on the leader and that same write being applied and visible on a follower. It's an inherent consequence of asynchronous replication—by definition, if we don't wait for followers before acknowledging a write, followers will be behind.
Lag can be measured in two ways:

- Time-based lag: how far behind the follower is in wall-clock time, typically the difference between the current time and the commit timestamp of the last transaction the follower has applied.
- Position-based lag: how far behind the follower is in the replication log, measured in bytes or log entries between the leader's current position and the follower's last applied position.

Both perspectives are useful. Time-based lag tells you how stale reads from the follower are. Position-based lag tells you how much data still needs to transfer and apply.
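As a concrete illustration, here is a minimal sketch that computes both measures, assuming PostgreSQL, node-postgres clients connected to the leader and one follower, and reasonably synchronized clocks. The helper name and shape are invented for this example.

```typescript
import { Client } from 'pg';

interface LagSample {
  secondsBehind: number; // time-based lag
  bytesBehind: number;   // position-based lag
}

// Hypothetical helper: measures both kinds of lag for one follower.
async function measureLag(leader: Client, follower: Client): Promise<LagSample> {
  // Time-based: how old is the newest transaction the follower has applied?
  const timeRes = await follower.query(
    `SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS seconds_behind`
  );

  // Position-based: how many WAL bytes separate the leader from the follower?
  const leaderPos = await leader.query(`SELECT pg_current_wal_lsn() AS lsn`);
  const diffRes = await follower.query(
    `SELECT pg_wal_lsn_diff($1::pg_lsn, pg_last_wal_replay_lsn()) AS bytes_behind`,
    [leaderPos.rows[0].lsn]
  );

  return {
    secondsBehind: Number(timeRes.rows[0].seconds_behind),
    bytesBehind: Number(diffRes.rows[0].bytes_behind),
  };
}
```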
```
LEADER TIMELINE:
═══════════════════════════════════════════════════════════════════════════════
 Write Events:  W1    W2    W3    W4    W5    W6    W7    W8    W9    W10
                │     │     │     │     │     │     │     │     │     │
 Leader Time:   0ms   100ms 200ms 300ms 400ms 500ms 600ms 700ms 800ms 900ms
                                                                        ▲
                                                                  CURRENT TIME

FOLLOWER TIMELINE (with ~300ms lag):
═══════════════════════════════════════════════════════════════════════════════
 Apply Events:  W1    W2    W3    W4    W5    W6    W7
                │     │     │     │     │     │     │
 Follower Time: 100ms 200ms 300ms 400ms 500ms 600ms 700ms ─────────────── now
                                                    ▲
                                              LAST APPLIED

 LAG = 900ms (current time) - 600ms (W7's leader timestamp) = 300ms

 Transactions W8, W9, W10 exist on leader but NOT YET on follower
 A read hitting this follower won't see W8, W9, W10
```

Don't confuse replication lag with network latency. Network latency is how long it takes to send data. Replication lag includes network latency PLUS apply time PLUS any queuing. A follower with 10ms network latency might have 5 seconds of replication lag if it's struggling to apply entries fast enough.
Replication lag can spike for many reasons. Understanding the root causes helps you diagnose and resolve lag issues effectively.
| Cause | Mechanism | Typical Symptoms | Resolution |
|---|---|---|---|
| High Write Volume | Leader produces entries faster than follower can apply | Lag grows under load, recovers when load drops | Scale follower hardware, parallelize apply |
| Large Transactions | Single transaction takes long to apply | Periodic lag spikes correlating with big batches | Break up large transactions, use smaller batches |
| Schema Changes (DDL) | DDL locks tables during apply | Lag spike during schema changes | Run DDL during low traffic, use online DDL tools |
| Slow Disk I/O | Follower storage can't keep up with write rate | High disk utilization, apply waits on I/O | Faster storage (SSD), optimize I/O scheduler |
| Network Congestion | Replication stream is throttled | Receive lag high, apply lag normal | Increase bandwidth, prioritize replication traffic |
| Read Query Contention | Heavy read load on follower competes with apply | Lag varies with read traffic patterns | Load balance reads across multiple followers |
| Apply Thread Bottleneck | Single-threaded apply can't parallelize | CPU single core at 100%, others idle | Enable parallel apply (if supported), faster CPU |
| Long-Running Queries | Apply blocked by long read queries (on some DBs) | Lag correlates with slow query duration | Kill long queries, configure query cancellation |
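For the "Large Transactions" cause above, a common remedy is to rewrite one huge statement as a loop of small batches, so each replicated transaction applies quickly on followers. A rough sketch, assuming node-postgres and a hypothetical `events` table:

```typescript
import { Client } from 'pg';

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Instead of one giant `DELETE FROM events WHERE created_at < $1` (a single huge
// transaction that followers must replay in one go), delete in small batches.
async function purgeOldEvents(db: Client, cutoff: string, batchSize = 1000): Promise<number> {
  let totalDeleted = 0;
  for (;;) {
    const res = await db.query(
      `DELETE FROM events
       WHERE id IN (SELECT id FROM events WHERE created_at < $1 LIMIT $2)`,
      [cutoff, batchSize]
    );
    const deleted = res.rowCount ?? 0;
    totalDeleted += deleted;
    if (deleted < batchSize) break; // nothing (or little) left to delete
    await sleep(100);               // brief pause lets followers keep up
  }
  return totalDeleted;
}
```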
Deep Dive: The Apply Bottleneck
Historically, database replication applied changes single-threaded: one transaction at a time, in order. This matches the leader's write order but limits throughput to what one CPU core can handle.
Modern databases increasingly support parallel apply, in which multiple worker threads apply non-conflicting transactions concurrently. Parallel apply dramatically improves throughput but must preserve ordering for conflicting operations (same row/document). Implementing this correctly is complex.
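The core idea can be sketched in a few lines: route each transaction to a worker based on the keys it touches, so transactions on the same row stay ordered while unrelated transactions apply concurrently. This is a deliberately simplified illustration with invented names, not any particular database's implementation (real systems track conflicts far more precisely); the diagram that follows shows the same idea pictorially.

```typescript
type Tx = { id: number; keys: string[]; apply: () => Promise<void> };

// Route a transaction to a worker by hashing the first key it touches, so
// transactions on the same row always land on the same worker (and stay ordered).
function workerFor(tx: Tx, workerCount: number): number {
  const key = tx.keys[0] ?? '';
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % workerCount;
}

async function parallelApply(txs: Tx[], workerCount = 4): Promise<void> {
  // One FIFO queue per worker preserves per-key ordering.
  const queues: Tx[][] = Array.from({ length: workerCount }, () => []);
  for (const tx of txs) queues[workerFor(tx, workerCount)].push(tx);

  // Each worker applies its queue sequentially; the workers run concurrently.
  await Promise.all(
    queues.map(async (queue) => {
      for (const tx of queue) await tx.apply();
    })
  );
}
```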
```
SINGLE-THREADED APPLY:
───────────────────────────────────────────────────────────────────────────────
 Incoming:  [Tx1] [Tx2] [Tx3] [Tx4] [Tx5] [Tx6] [Tx7] [Tx8]
               |
               v
 Apply:     [Tx1]─────[Tx2]─────[Tx3]─────[Tx4]─────[Tx5]...
            ────────────────────────────────────────────────▶ time

 Throughput limited to: (1 transaction / apply_time)

PARALLEL APPLY (4 threads):
───────────────────────────────────────────────────────────────────────────────
 Incoming:  [Tx1] [Tx2] [Tx3] [Tx4] [Tx5] [Tx6] [Tx7] [Tx8]
               |
               v  (distribute non-conflicting transactions)
 Thread 1:  [Tx1]─────────────[Tx5]──────────────...
 Thread 2:  [Tx2]─────────────[Tx6]──────────────...
 Thread 3:  [Tx3]─────────────[Tx7]──────────────...
 Thread 4:  [Tx4]─────────────[Tx8]──────────────...
            ────────────────────────────────────────────────▶ time

 Throughput up to: 4 × (1 transaction / apply_time)

 CONSTRAINT: Tx1 and Tx5 must not conflict (touch same rows)
```

Schema changes (ALTER TABLE, CREATE INDEX) are often the worst lag culprits. They typically require table-level locks during apply and can take minutes or hours on large tables. Plan DDL operations carefully, use online DDL tools when available, and monitor lag closely during schema changes.
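Since DDL and large backfills are common lag culprits, one practical guard is to pause a batched migration whenever follower lag exceeds a budget. A sketch under assumed names, using the PostgreSQL `replay_lag` column from `pg_stat_replication` (covered in the measurement section below):

```typescript
import { Client } from 'pg';

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical guard for a batched backfill or online schema change: before each
// batch, check the worst follower lag on the leader and wait until it drops.
async function waitForLagBelow(leader: Client, maxLagSeconds: number): Promise<void> {
  for (;;) {
    const res = await leader.query(
      `SELECT COALESCE(MAX(EXTRACT(EPOCH FROM replay_lag)), 0) AS worst_lag
       FROM pg_stat_replication`
    );
    if (Number(res.rows[0].worst_lag) < maxLagSeconds) return;
    await sleep(1000); // back off while followers catch up
  }
}

// Usage inside a migration loop (runBatch is an assumed application function):
// while (hasMoreRows) {
//   await waitForLagBelow(leader, 5); // keep followers within ~5 seconds
//   await runBatch();
// }
```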
Accurate lag measurement is essential for monitoring, alerting, and debugging. Different databases expose lag metrics in different ways.
PostgreSQL:

```sql
-- On the leader: view replication status for all followers
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  -- Time-based lag (most useful for applications)
  write_lag,   -- Time since follower wrote to disk
  flush_lag,   -- Time since follower fsynced
  replay_lag,  -- Time since follower applied (most important!)
  -- Position-based lag
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS bytes_behind
FROM pg_stat_replication;

-- On the follower: check if in recovery and last apply time
SELECT
  pg_is_in_recovery() AS is_replica,
  pg_last_wal_receive_lsn() AS last_received,
  pg_last_wal_replay_lsn() AS last_applied,
  pg_last_xact_replay_timestamp() AS last_apply_time,
  NOW() - pg_last_xact_replay_timestamp() AS lag_time;
```
MySQL:

```sql
-- On the replica: check Seconds_Behind_Source
SHOW REPLICA STATUS\G

-- Key fields:
-- Seconds_Behind_Source: Estimated lag in seconds
-- Relay_Log_Space: Size of relay logs (received but not applied)
-- Read_Source_Log_Pos: Position read from source
-- Exec_Source_Log_Pos: Position applied locally

-- Or query performance_schema for more detail
SELECT
  CHANNEL_NAME,
  LAST_APPLIED_TRANSACTION,
  LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
  LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP,
  TIMESTAMPDIFF(SECOND,
    LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
    NOW()) AS lag_seconds
FROM performance_schema.replication_applier_status_by_worker;
```

Don't rely solely on database-exposed metrics. Use external monitoring (Prometheus + Grafana, Datadog, etc.) that can track lag historically, correlate it with other systems, and alert even when the database itself is unreachable. Write synthetic transactions to verify end-to-end replication health.
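The "synthetic transactions" suggestion can be as simple as a heartbeat table: a monitor writes the current timestamp to the leader on a fixed interval and reads it back from each follower; the difference is end-to-end lag. A minimal sketch, assuming a `heartbeats` table you create yourself, node-postgres clients, and reasonably synchronized clocks:

```typescript
import { Client } from 'pg';

// Assumed schema (created once):
//   CREATE TABLE heartbeats (id int PRIMARY KEY, ts timestamptz NOT NULL);

// Writer: runs against the leader every second or so.
async function writeHeartbeat(leader: Client): Promise<void> {
  await leader.query(
    `INSERT INTO heartbeats (id, ts) VALUES (1, now())
     ON CONFLICT (id) DO UPDATE SET ts = EXCLUDED.ts`
  );
}

// Checker: runs against each follower; measures end-to-end replication lag.
async function readHeartbeatLag(follower: Client): Promise<number> {
  const res = await follower.query(
    `SELECT EXTRACT(EPOCH FROM (now() - ts)) AS lag_seconds
     FROM heartbeats WHERE id = 1`
  );
  return Number(res.rows[0]?.lag_seconds ?? Infinity);
}
```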
Replication lag creates consistency anomalies that applications must handle. Understanding these patterns helps you design resilient applications and avoid confusing bugs.
```
USER ACTION                          SYSTEM STATE
─────────────────────────────────────────────────────────────────────────────
 1. User updates profile             LEADER:   profile.bio = "New bio!"
    POST /profile/update             Response: 200 OK
                                     FOLLOWER: profile.bio = "Old bio"
                                               (not yet replicated)

 2. User views profile               LEADER:   profile.bio = "New bio!"
    GET /profile                     FOLLOWER: profile.bio = "Old bio"
    (load balanced to follower)                ← Read returns stale data!

    User sees:   "Old bio"
    User thinks: "My update didn't save!"

 3. User refreshes again             FOLLOWER: profile.bio = "New bio!"
    (finally replicated)                       (after lag catches up)

    User sees: "New bio!"

 USER EXPERIENCE: Confusing! Did my save work? Why did it take multiple refreshes?
```

The Impact on Application Design:
Applications that naively assume reads reflect recent writes will exhibit confusing bugs under replication lag. Common failure patterns include users not seeing their own updates, reads that appear to move backward in time when successive requests hit followers with different amounts of lag, and related records appearing out of order (for example, a reply becoming visible before the comment it responds to).
Development and test environments often have single-node databases with no replication lag. The application 'works' in testing but breaks in production. Always test with realistic replication lag—either use a test cluster with lag or simulate lag at the application level.
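One way to simulate lag at the application level, as the tip suggests, is a test double that delays the visibility of writes on the replica path. A rough sketch with invented interfaces:

```typescript
type Row = Record<string, unknown>;

// Test double: the "leader" view sees writes immediately; the "replica" view
// only exposes writes older than the configured artificial lag.
class LaggingReplicaStore {
  private log: { key: string; value: Row; committedAt: number }[] = [];

  constructor(private lagMs: number) {}

  write(key: string, value: Row): void {
    this.log.push({ key, value, committedAt: Date.now() });
  }

  // Leader read: always returns the latest value.
  read(key: string): Row | undefined {
    return [...this.log].reverse().find((e) => e.key === key)?.value;
  }

  // Replica read: ignores entries committed within the last `lagMs` milliseconds.
  readFromReplica(key: string): Row | undefined {
    const cutoff = Date.now() - this.lagMs;
    return [...this.log]
      .reverse()
      .find((e) => e.key === key && e.committedAt <= cutoff)?.value;
  }
}

// In a test: write a profile, then assert your code copes while
// readFromReplica() still returns the old value for the next few hundred ms.
```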
The most common consistency requirement is read-your-writes: a user should always see the effects of their own writes, even if reading from a follower. Several patterns achieve this guarantee.
```typescript
// Small helper used by the polling loop below
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// After a write, capture the leader's current position
async function updateProfile(userId: string, newBio: string): Promise<WriteResult> {
  await db.leader.query(
    'UPDATE profiles SET bio = $1 WHERE user_id = $2',
    [newBio, userId]
  );

  // Get the current WAL position from the leader
  const result = await db.leader.query('SELECT pg_current_wal_lsn() AS lsn');
  const writePosition = result.rows[0].lsn;

  return { success: true, consistencyToken: writePosition };
}

// On subsequent reads, ensure the follower has caught up
async function getProfile(
  userId: string,
  consistencyToken?: string
): Promise<Profile> {
  const replica = selectReplica();

  if (consistencyToken) {
    try {
      // Wait for the replica to catch up to the write position
      await waitForReplicaPosition(replica, consistencyToken, {
        timeout: 5000,   // Give up after 5 seconds
        pollInterval: 50
      });
    } catch {
      // Replica did not catch up in time: fall back to reading from the leader
      return await db.leader.query(
        'SELECT * FROM profiles WHERE user_id = $1',
        [userId]
      );
    }
  }

  return await replica.query(
    'SELECT * FROM profiles WHERE user_id = $1',
    [userId]
  );
}

async function waitForReplicaPosition(
  replica: Connection,
  targetLsn: string,
  options: { timeout: number, pollInterval: number }
): Promise<void> {
  const startTime = Date.now();

  while (Date.now() - startTime < options.timeout) {
    const result = await replica.query(
      'SELECT pg_last_wal_replay_lsn() >= $1::pg_lsn AS caught_up',
      [targetLsn]
    );

    if (result.rows[0].caught_up) {
      return; // Replica is current enough
    }

    await sleep(options.pollInterval);
  }

  throw new Error('Replica did not catch up in time');
}
```

The consistency token must travel with the user's session. Store it in a cookie, include it in API responses, or maintain it in client-side state. The client passes it back on subsequent requests so the server can enforce consistency.
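For example, a web tier might round-trip the token in a response header or cookie. The sketch below assumes Express-style handlers and the updateProfile/getProfile functions above; the header name 'x-consistency-token' is invented for illustration.

```typescript
import express from 'express';

const app = express();
app.use(express.json());

// Write path: return the consistency token so the client can echo it back later.
app.post('/profile', async (req, res) => {
  const { userId, bio } = req.body;
  const result = await updateProfile(userId, bio);
  // Could equally be set as a cookie; the header name here is made up.
  res.setHeader('x-consistency-token', result.consistencyToken);
  res.json({ ok: true });
});

// Read path: honor the token if the client supplies one.
app.get('/profile/:userId', async (req, res) => {
  const token = req.header('x-consistency-token');
  const profile = await getProfile(req.params.userId, token ?? undefined);
  res.json(profile);
});
```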
While some lag is inherent to asynchronous replication, many causes of excessive lag can be addressed through infrastructure and configuration improvements.
| Strategy | How It Helps | Trade-offs |
|---|---|---|
| Faster Follower Storage | Faster apply I/O reduces apply time | Cost of premium storage (NVMe SSDs) |
| More Follower CPU/Memory | Faster query processing, larger buffer pool | Higher infrastructure costs |
| Parallel Apply | Multiple threads apply non-conflicting transactions | Complexity; not all DBs support; ordering constraints |
| Reduce Follower Read Load | Less competition between apply and reads | Need more followers or adjust read routing |
| Optimize Large Transactions | Smaller batches apply faster | Application changes; more round trips |
| Network Priority/QoS | Ensure replication traffic isn't starved | Network configuration complexity |
| Dedicated Replication Network | Separate replication from client traffic | Additional network infrastructure |
| Compression | Less data to transfer for wide-area replication | CPU overhead; not helpful for local |
Relevant tuning parameters vary by database:

- PostgreSQL: wal_receiver_status_interval, max_standby_streaming_delay, recovery_min_apply_delay
- MySQL: replica_parallel_workers, replica_parallel_type = LOGICAL_CLOCK, replica_preserve_commit_order
- MongoDB: replWriterThreadCount, replBatchLimitBytes, storage engine settings
```
                 IS LAG CONSISTENTLY HIGH?
                           │
                 ┌─────────┴─────────┐
                YES                  NO (spiky)
                 │                    │
                 ▼                    ▼
        Check:                Check:
        - Follower hardware   - Large transactions?
        - Apply rate vs       - DDL operations?
          leader write rate   - Slow queries blocking?
                 │            - Traffic bursts?
                 ▼                    │
        If apply rate <               ▼
        write rate:           Address root cause:
        Follower can't keep   - Break up transactions
        up—scale hardware     - Schedule DDL off-peak
        or enable parallel    - Kill long queries
        apply                 - Scale for bursts
                 │
                 ▼
        If apply rate ≈ write rate
        but still lag:
                 │
                 ▼
        Check network:
        - Bandwidth saturated?
        - Latency spikes?
        - Packet loss?
```

Not all applications require minimal lag. Analytics dashboards can tolerate minutes of lag. Batch processing jobs can use dedicated followers with intentionally delayed replication. Match your lag tolerance to actual requirements—over-engineering for sub-second lag on a reporting replica wastes resources.
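Matching lag tolerance to requirements often turns into a routing rule: user-facing reads go to replicas within a tight lag budget (falling back to the leader otherwise), while reporting queries accept much more. A sketch with invented types; the thresholds are illustrative, not recommendations.

```typescript
interface Replica {
  name: string;
  role: 'user-facing' | 'reporting';
  lagSeconds: number; // fed by your monitoring pipeline
  query<T>(sql: string, params?: unknown[]): Promise<T>;
}

// Per-role lag budgets (example values only).
const LAG_BUDGET_SECONDS: Record<Replica['role'], number> = {
  'user-facing': 1,
  'reporting': 300,
};

// Pick the least-lagged replica of the requested role that is within budget;
// returning undefined signals "fall back to the leader".
function pickReplica(replicas: Replica[], role: Replica['role']): Replica | undefined {
  return replicas
    .filter((r) => r.role === role && r.lagSeconds <= LAG_BUDGET_SECONDS[role])
    .sort((a, b) => a.lagSeconds - b.lagSeconds)[0];
}
```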
While we've focused on minimizing lag, there's a valuable pattern that intentionally maintains lag: delayed replicas (also called time-delayed or lag replicas). These are followers configured to apply changes only after a specified delay.
If someone accidentally runs DELETE FROM users or DROP TABLE orders, the delayed replica still has the data for the configured delay period (e.g., 1 hour). You can fail over to it or extract the missing data.
```
-- PostgreSQL: Configure a standby with intentional delay
-- In recovery.conf or postgresql.auto.conf on the standby:
recovery_min_apply_delay = '1h'  -- 1 hour delay

-- MySQL: Configure replica delay
-- On the replica:
STOP REPLICA;
CHANGE REPLICATION SOURCE TO SOURCE_DELAY = 3600;  -- 1 hour in seconds
START REPLICA;

-- Verify delay is active
SHOW REPLICA STATUS\G
-- Look for: SQL_Delay: 3600

// MongoDB: Use a slaveDelay member
rs.add({
  host: "delayed-replica:27017",
  priority: 0,      // Never become primary
  hidden: true,     // Not visible to clients
  slaveDelay: 3600  // 1 hour delay (deprecated name; use secondaryDelaySecs in newer versions)
});
```

Delayed Replica Design Considerations:
Delayed replicas complement, not replace, regular backups and point-in-time recovery. They provide a faster recovery path for recent accidents but don't protect against all failure modes. Use both strategies together for comprehensive protection.
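To make the recovery path concrete: when a destructive statement is noticed within the delay window, the first step is to pause apply on the delayed replica before it replays the mistake, then copy the affected data out. A hedged sketch for PostgreSQL (pg_wal_replay_pause() and pg_wal_replay_resume() are standby functions; the surrounding orchestration and names are invented, and the table name is assumed to be trusted input):

```typescript
import { Client } from 'pg';

// Freeze the delayed replica so the destructive statement is never applied there,
// then extract the still-intact rows for restoration on the primary.
async function rescueFromDelayedReplica(delayedReplica: Client, table: string) {
  // Stop WAL replay immediately; the replica keeps serving reads of its current state.
  await delayedReplica.query('SELECT pg_wal_replay_pause()');

  // Copy out the data that was deleted upstream. For large tables you would
  // normally use pg_dump or COPY; a plain SELECT keeps the sketch short.
  const rescued = await delayedReplica.query(`SELECT * FROM ${table}`);
  return rescued.rows;

  // Once restoration is complete, resume replay with:
  //   SELECT pg_wal_replay_resume();
}
```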
We've completed our deep exploration of replication lag—the final topic in leader-follower replication. Let's consolidate the key insights from this page and reflect on the entire module.
Across five pages, we've thoroughly examined the leader-follower replication model:
Single Leader Accepts Writes — All writes flow through one node, creating a total ordering and eliminating write conflicts.
Followers Replicate from Leader — Log-based replication (physical or logical) propagates changes to followers, which apply them to maintain synchronized copies.
Synchronous vs. Asynchronous — The fundamental trade-off between durability guarantees (sync) and write latency (async), with semi-synchronous modes offering balance.
Failover Handling — Detecting failures, electing new leaders, preventing split-brain, and managing the transition without data loss.
Replication Lag — Understanding, measuring, and mitigating the delay between leader and follower states.
Leader-follower replication is the dominant model in production databases because it provides a good balance of consistency, availability, and operability. It's not the only model—leaderless and multi-leader approaches offer different trade-offs—but it's the one you'll encounter most often and the foundation for understanding more advanced replication topologies.
Congratulations! You've mastered leader-follower replication—the backbone of most production database systems. You understand how writes flow through the leader, how followers maintain synchronized copies, the trade-offs between synchronous and asynchronous modes, how failover works, and how to manage replication lag. This knowledge applies across PostgreSQL, MySQL, MongoDB, and virtually every major database system.