Read replicas promise scale, but they introduce a subtle yet fundamental challenge: replication lag. The moment you split reads from writes, you accept that replicas may not reflect the most recent state of your data. A user submits a form, refreshes the page, and sees stale data—or worse, no data at all.
This isn't a bug; it's a fundamental characteristic of asynchronous replication. The primary database writes a record at time T₀. The replica receives and applies that write at time T₀ + Δt, where Δt is the replication lag. During that window, any read from the replica returns outdated information.
For many use cases, small delays are imperceptible and acceptable. But when users expect immediate feedback, when business logic depends on current state, or when analytics must be accurate, lag becomes a critical concern. Understanding, measuring, and managing replication lag is essential for building systems that scale reliably.
This page provides a comprehensive treatment of replica lag—the kind senior engineers rely on to build production-grade distributed database architectures.
By the end of this page, you will understand the causes of replication lag across different database systems, how to measure and monitor lag effectively, architectural patterns for minimizing lag, and application-level strategies for handling stale reads gracefully—including the critical 'read-your-writes' consistency pattern.
Replication lag is the temporal delay between when a write commits on the primary database and when that write becomes visible on a replica. While conceptually simple, lag arises from multiple sources and manifests differently across database architectures.
The replication pipeline:
To understand lag, trace the journey of a single committed transaction from primary to replica:

1. Commit and log on the primary: the transaction commits and is recorded in the write-ahead log (PostgreSQL's WAL, MySQL's binlog).
2. Transmission: the log records travel over the network to the replica.
3. Receipt and persistence: the replica writes the incoming records to local storage (MySQL's relay log; PostgreSQL's streamed WAL).
4. Apply: a replay process executes the changes against the replica's data files, at which point they become visible to reads.
Lag accumulates at each stage. Network latency affects transmission. I/O performance affects log writing. Apply thread capacity affects how quickly changes are executed. Under normal conditions, these delays sum to milliseconds. Under stress, they compound to seconds or even minutes.
If you choose asynchronous replication (which most systems do for performance), some lag is guaranteed. The question isn't whether lag exists, but how much you can tolerate and how your application handles it. Zero lag requires synchronous replication, which trades availability and throughput for consistency.
Replication lag doesn't appear randomly—it has specific, diagnosable causes. Understanding these causes enables targeted optimization and accurate capacity planning.
| Category | Specific Cause | Mechanism | Mitigation Strategy |
|---|---|---|---|
| Network | High latency links | Each log record waits for round-trip transmission | Co-locate replicas; use faster network; tune socket buffers |
| Network | Bandwidth saturation | Large transactions queue behind each other | Increase bandwidth; compress log data; reduce transaction size |
| Primary Load | High write throughput | Replicas cannot apply as fast as primary generates | Scale replicas vertically; add more apply workers |
| Primary Load | Large transactions | Single transaction with millions of rows blocks pipeline | Break into smaller batches; avoid bulk operations during peak |
| Replica Resources | CPU saturation | Apply thread cannot process fast enough | Upgrade replica CPU; enable parallel apply (if supported) |
| Replica Resources | I/O bottleneck | Disk cannot write data as fast as it arrives | Use faster storage (NVMe); increase IOPS limits |
| Schema | Missing replica indexes | Apply must maintain indexes that don't exist or are suboptimal | Ensure replica schema matches primary |
| Schema | Expensive triggers | Replica fires triggers during apply | Disable non-essential triggers on replicas |
| Replica Workload | Long-running queries | Long SELECT queries acquire locks that block apply | Set statement timeouts; dedicate separate replicas to long analytics queries |
| Configuration | Single-threaded apply | Older MySQL versions apply replicated transactions in a single thread by default | Enable parallel replication (replica_parallel_workers, formerly slave_parallel_workers) |
The apply bottleneck:
Historically, database replication applied transactions serially—one at a time, in commit order. This design guarantees correctness but creates a severe bottleneck when the primary's write concurrency exceeds what a single apply thread can process.
Modern databases address this with parallel apply:
PostgreSQL: Version 16 adds parallel apply of large in-progress transactions for logical replication. Physical streaming replication replays WAL as low-level block changes in a single recovery process—typically fast, since no SQL is re-executed, but still a potential bottleneck under very heavy write load.
MySQL 8.0+: Supports parallel apply via replica_parallel_workers (formerly slave_parallel_workers). Transactions can be parallelized by schema (DATABASE) or by dependency tracking on the source (LOGICAL_CLOCK, optionally with WRITESET tracking).
Aurora and similar storage-disaggregated cloud engines: Implement storage-level replication that bypasses traditional log apply, achieving near-zero lag for many workloads.
Enabling parallel apply can reduce lag dramatically but requires careful configuration to avoid conflicts.
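As an illustration, here is a minimal MySQL 8.0 sketch of enabling parallel apply (using the replica_* variable names introduced in 8.0.26; earlier versions use the slave_* equivalents):

```sql
-- On the replica: switch the applier to dependency-based parallelism.
STOP REPLICA SQL_THREAD;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL replica_parallel_workers = 4;
-- Preserve commit order so reads on the replica never observe
-- transactions out of their original commit sequence.
SET GLOBAL replica_preserve_commit_order = ON;
START REPLICA SQL_THREAD;

-- On the source: WRITESET dependency tracking lets non-conflicting
-- transactions replicate in parallel even without group commit.
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';
```

Preserving commit order trades a little parallelism for predictable read ordering on the replica, which is usually the right default for user-facing reads.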
When lag spikes occur, systematic diagnosis is essential. Check primary write throughput (is it unusually high?), network metrics (latency, packet loss), replica resource utilization (CPU, I/O wait), and apply thread status. Most databases expose replication-specific metrics that pinpoint the bottleneck.
You cannot manage what you don't measure. Effective lag monitoring requires understanding database-specific measurement mechanisms and integrating them into your observability stack.
PostgreSQL provides several methods for measuring streaming replication lag:
On the replica (standby):
```sql
-- Method 1: Time since last replayed transaction
-- Note: Returns NULL if no transactions have been replayed yet
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

-- Method 2: WAL position comparison (more granular)
SELECT pg_last_wal_receive_lsn() AS received_lsn,
       pg_last_wal_replay_lsn() AS replayed_lsn,
       pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn() AS apply_lag_bytes;

-- Method 3: Query primary for per-replica status
-- Run this on the PRIMARY:
SELECT client_addr,
       state,
       sent_lsn,
       write_lsn,
       flush_lsn,
       replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes,
       (pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024.0 / 1024.0)::numeric(10,2) AS lag_mb
FROM pg_stat_replication;
```

PostgreSQL reports lag in both time (seconds since last transaction) and bytes (WAL position difference). Time-based lag can be misleading during low-write periods—if no writes occur for an hour, time lag shows an hour even though the replica is current. Byte-based lag is more accurate for determining actual delay.
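MySQL exposes comparable information through its replication status commands; a brief sketch (using the 8.0.22+ naming, where SHOW SLAVE STATUS became SHOW REPLICA STATUS):

```sql
-- On the replica: one-stop summary of replication health
SHOW REPLICA STATUS\G
-- Key fields:
--   Seconds_Behind_Source: estimated time lag; an estimate only, and NULL
--     when replication is not running
--   Replica_IO_Running / Replica_SQL_Running: thread health
--   Retrieved_Gtid_Set vs Executed_Gtid_Set: received vs applied work

-- Per-worker apply detail when parallel replication is enabled
SELECT * FROM performance_schema.replication_applier_status_by_worker;
```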
Alerting thresholds:
Establish alerting based on your application's tolerance. Thresholds vary by use case: analytics workloads may tolerate minutes of lag, while user-facing reads may require sub-second freshness.
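As one concrete shape for such an alert, a monitoring agent might poll each PostgreSQL replica with a probe like the following; the 5-second threshold is an illustrative placeholder to adapt to your tolerance:

```sql
-- Lag probe suitable for periodic polling by a monitoring agent.
-- pg_last_xact_replay_timestamp() returns NULL before any transaction
-- has been replayed; alert on NULL separately rather than masking it.
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds,
       EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) > 5 AS breaching_threshold;
```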
While some lag is inevitable in asynchronous systems, architectural decisions can minimize its magnitude and impact.
Semi-synchronous replication:
For applications that cannot tolerate any lag on critical writes, consider semi-synchronous replication. In this mode, the primary waits for at least one replica to acknowledge receipt (but not necessarily application) of the write before confirming the commit to the client.
Trade-offs:

- Commit latency increases by at least one network round trip to the acknowledging replica.
- The acknowledgment confirms receipt of the log, not its application, so replicas can still serve slightly stale reads.
- If no replica acknowledges within the configured timeout, the primary falls back to asynchronous replication (counted by Rpl_semi_sync_source_no_tx in the status output below).
```sql
-- Enable semi-synchronous replication on MySQL Primary
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = ON;
SET GLOBAL rpl_semi_sync_source_timeout = 10000; -- 10 seconds

-- Configure minimum number of replicas that must acknowledge
SET GLOBAL rpl_semi_sync_source_wait_for_replica_count = 1;

-- On Replica
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = ON;

-- Verify status
SHOW STATUS LIKE 'Rpl_semi_sync%';

/*
Important metrics:
 - Rpl_semi_sync_source_status: ON if semi-sync is active
 - Rpl_semi_sync_source_no_tx: Count of transactions that fell back to async
 - Rpl_semi_sync_source_yes_tx: Count of semi-sync acknowledged transactions
*/
```

Amazon Aurora achieves near-zero replica lag through storage-level replication. Instead of shipping transaction logs, Aurora replicates at the storage layer, and replicas read from the same shared storage with minimal delay. Consider Aurora or similar cloud-native architectures when lag requirements are stringent.
Even with optimized infrastructure, applications must be designed to handle replication lag gracefully. The most important pattern is read-your-writes consistency—ensuring users see their own recent changes, even when reading from replicas.
Read-your-writes consistency strategies:
| Strategy | Implementation | Pros | Cons |
|---|---|---|---|
| Primary reads after writes | Route reads to primary for N seconds after user writes | Simple; guaranteed freshness | Increases primary load; requires session affinity or state |
| Timestamp-based routing | Track last write timestamp; route to replica only if lag < staleness threshold | Dynamic; adapts to lag conditions | Requires accurate lag monitoring; clock skew concerns |
| Session stickiness | Pin user session to specific replica that has received their writes | Consistent within session | Complex affinity; replica failure breaks consistency |
| Causal consistency tokens | Return commit LSN/GTID with write response; require replica >= that position (see the sketch after this table) | Precise; minimal overhead | Requires database support; complexity in distributed apps |
| Optimistic client display | Show write locally before confirmation; reconcile if needed | Instant feedback | May show data that doesn't persist; reconciliation complexity |
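To make the causal-token strategy concrete, here is a minimal sketch using MySQL GTIDs; the token value is a placeholder, and PostgreSQL supports the analogous pattern by capturing pg_current_wal_lsn() on the primary and comparing it against pg_last_wal_replay_lsn() on the replica.

```sql
-- Step 1: on the primary, immediately after the user's write commits,
-- capture the executed GTID set and hand it back to the client as a token.
SELECT @@GLOBAL.gtid_executed;

-- Step 2: on the replica, before serving that user's next read, block
-- until the token's GTIDs have been applied (1-second timeout).
-- Returns 0 once applied, 1 on timeout; on timeout, fall back to the primary.
SELECT WAIT_FOR_EXECUTED_GTID_SET('<token from step 1>', 1);
```

The fuller implementation below takes the simpler timestamp-based routing strategy instead.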
```typescript
// Read-your-writes implementation using timestamp-based routing

interface WriteContext {
  userId: string;
  lastWriteTime: number; // Unix timestamp in milliseconds
  writeTables: Set<string>; // Tables modified by user
}

// In-memory store (use Redis in production)
const writeContexts = new Map<string, WriteContext>();

// Configuration
const STALE_READ_TOLERANCE_MS = 5000; // 5 seconds
const WRITE_CONTEXT_TTL_MS = 60000; // 1 minute

// Call after any write operation
function recordWrite(userId: string, tableName: string): void {
  const context = writeContexts.get(userId) ?? {
    userId,
    lastWriteTime: 0,
    writeTables: new Set(),
  };
  context.lastWriteTime = Date.now();
  context.writeTables.add(tableName);
  writeContexts.set(userId, context);

  // Cleanup old contexts
  setTimeout(() => {
    const current = writeContexts.get(userId);
    if (current && Date.now() - current.lastWriteTime > WRITE_CONTEXT_TTL_MS) {
      writeContexts.delete(userId);
    }
  }, WRITE_CONTEXT_TTL_MS);
}

// Determine which connection to use for a read
function shouldReadFromPrimary(userId: string, tableName: string): boolean {
  const context = writeContexts.get(userId);
  if (!context) {
    return false; // No recent writes; replica is safe
  }
  const timeSinceWrite = Date.now() - context.lastWriteTime;
  // If user recently wrote to this table, read from primary
  if (context.writeTables.has(tableName) && timeSinceWrite < STALE_READ_TOLERANCE_MS) {
    return true;
  }
  return false;
}

// Usage example
async function getUserOrders(userId: string): Promise<Order[]> {
  const connection = shouldReadFromPrimary(userId, 'orders')
    ? primaryConnection
    : replicaConnection;
  return connection.query('SELECT * FROM orders WHERE user_id = ?', [userId]);
}

// After creating an order
async function createOrder(userId: string, orderData: OrderData): Promise<Order> {
  const order = await primaryConnection.query(
    'INSERT INTO orders (user_id, ...) VALUES (?, ...) RETURNING *',
    [userId, ...]
  );
  recordWrite(userId, 'orders');
  return order;
}
```

Handling stale reads gracefully:
Not all reads require fresh data. Design your application to categorize reads by freshness requirements:

- Must reflect the user's own writes (order confirmations, profile edits): route to the primary or a verified-current replica.
- Tolerates seconds of staleness (feeds, dashboards, search results): route to any healthy replica.
- Tolerates minutes or more (analytics, reporting, exports): route to any replica, even a lagging one.

Route each category appropriately. This reduces primary load while maintaining user experience.
Consider UI patterns that set user expectations. Loading spinners, 'refreshing...' indicators, and timestamps ('updated 30 seconds ago') help users understand that data may not be instantaneous. This is especially important for dashboards and feeds where absolute freshness is less critical than perceived responsiveness.
When multiple replicas exist, intelligent load balancing can route queries away from lagging replicas, ensuring consistent read performance even when individual replicas experience issues.
```typescript
// Lag-aware load balancer implementation

interface ReplicaStatus {
  id: string;
  connection: DatabaseConnection;
  lagMs: number;
  lastHealthCheck: Date;
  healthy: boolean;
}

class LagAwareLoadBalancer {
  private replicas: ReplicaStatus[] = [];
  private readonly maxLagMs: number;
  private readonly healthCheckIntervalMs: number;

  constructor(config: { maxLagMs: number; healthCheckIntervalMs: number }) {
    this.maxLagMs = config.maxLagMs;
    this.healthCheckIntervalMs = config.healthCheckIntervalMs;
    this.startHealthChecks();
  }

  addReplica(id: string, connection: DatabaseConnection): void {
    this.replicas.push({
      id,
      connection,
      lagMs: 0,
      lastHealthCheck: new Date(),
      healthy: true,
    });
  }

  private async checkReplicaHealth(replica: ReplicaStatus): Promise<void> {
    try {
      // PostgreSQL lag check
      const result = await replica.connection.query(`
        SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000 AS lag_ms
      `);
      const lagMs = result.rows[0]?.lag_ms ?? Infinity;
      replica.lagMs = lagMs;
      replica.healthy = lagMs <= this.maxLagMs;
      replica.lastHealthCheck = new Date();

      if (!replica.healthy) {
        console.warn(`Replica ${replica.id} removed from pool (lag: ${lagMs}ms)`);
      }
    } catch (error) {
      replica.healthy = false;
      replica.lagMs = Infinity;
      console.error(`Replica ${replica.id} health check failed`, error);
    }
  }

  private startHealthChecks(): void {
    setInterval(() => {
      this.replicas.forEach(replica => this.checkReplicaHealth(replica));
    }, this.healthCheckIntervalMs);
  }

  // Get best available replica
  getReplica(): DatabaseConnection | null {
    const healthyReplicas = this.replicas
      .filter(r => r.healthy)
      .sort((a, b) => a.lagMs - b.lagMs);

    if (healthyReplicas.length === 0) {
      return null; // No healthy replicas; caller should fall back to primary
    }

    // Weighted random selection favoring lower-lag replicas
    const weights = healthyReplicas.map(r => Math.max(1, this.maxLagMs - r.lagMs));
    const totalWeight = weights.reduce((sum, w) => sum + w, 0);
    let random = Math.random() * totalWeight;

    for (let i = 0; i < healthyReplicas.length; i++) {
      random -= weights[i];
      if (random <= 0) {
        return healthyReplicas[i].connection;
      }
    }
    return healthyReplicas[0].connection;
  }

  // For queries that require maximum freshness
  getFreshestReplica(): DatabaseConnection | null {
    const healthyReplicas = this.replicas
      .filter(r => r.healthy)
      .sort((a, b) => a.lagMs - b.lagMs);
    return healthyReplicas[0]?.connection ?? null;
  }

  getStatus(): { id: string; lagMs: number; healthy: boolean }[] {
    return this.replicas.map(r => ({
      id: r.id,
      lagMs: r.lagMs,
      healthy: r.healthy,
    }));
  }
}
```

Replication lag is an inherent characteristic of asynchronous database replication. Rather than fighting it, successful systems understand, measure, and design around it.
What's next:
With lag handling strategies established, the next page covers Load Balancing Reads—distributing read traffic efficiently across multiple replicas while handling failures, hotspots, and varying replica capacities.
You now have a comprehensive understanding of replication lag—its causes, measurement approaches, and strategies for handling it at both infrastructure and application levels. This knowledge is essential for building read replica architectures that scale reliably without sacrificing user experience.