Your application becomes successful. Traffic grows. Your single database server, once handling 100 queries per second comfortably, now struggles with 10,000 queries per second. The CPU is pegged at 100%, queries are timing out, and users are complaining.
You could scale vertically—buy a bigger server with more CPU, RAM, and faster SSDs. But vertical scaling has limits, and costs grow superlinearly: a server with 2x the performance often costs 4x as much, and there is a hard ceiling on what hardware you can buy.
Read replicas offer a different path: horizontal scaling for read operations. Instead of one database handling all queries, you deploy multiple copies—replicas—that stay synchronized with the primary. Read traffic is distributed across replicas, multiplying your read throughput while the primary focuses on writes.
This pattern is foundational to scaling databases. Nearly every major web application—from Twitter to Stripe to Netflix—relies on read replicas to handle read-heavy workloads.
By the end of this page, you will understand: how database replication works at a low level; the spectrum from synchronous to asynchronous replication; replication lag and its implications; patterns for routing reads to replicas safely; replica failover and promotion; and production configurations for PostgreSQL and MySQL.
Before diving into implementation, let's understand the fundamental problem read replicas solve and why they're so effective.
The Read/Write Ratio:
Most applications are heavily read-oriented:
| Application Type | Typical Read/Write Ratio |
|---|---|
| Social media feed | 100:1 to 1000:1 |
| E-commerce catalog | 50:1 to 200:1 |
| News/content sites | 500:1 to 5000:1 |
| Analytics dashboards | 100:1 |
| Banking/financial | 10:1 to 50:1 |
If 95% of your database operations are reads, reads outnumber writes roughly 19:1—so scaling reads addresses about 20x more of your traffic than scaling writes would.
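To make that concrete, here is a back-of-envelope capacity model (a sketch with illustrative numbers, not a benchmark; the function names are invented for this example). It assumes reads spread evenly across replicas while all writes stay on the primary.

```typescript
// Back-of-envelope capacity model (illustrative only).
// Assumption: reads are distributed evenly; the primary keeps all writes.
function readQpsPerReplica(totalQps: number, readFraction: number, replicas: number): number {
  return (totalQps * readFraction) / replicas;
}

function writeQpsOnPrimary(totalQps: number, readFraction: number): number {
  return totalQps * (1 - readFraction);
}

// 10,000 qps at 95% reads, spread over 4 replicas:
// each replica sees roughly 2,375 read qps; the primary keeps roughly 500 write qps.
const perReplica = readQpsPerReplica(10_000, 0.95, 4);
const onPrimary = writeQpsOnPrimary(10_000, 0.95);
```

Under these assumptions, adding replicas divides the dominant (read) load, which is why the read/write ratio determines how much leverage replicas give you.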
What replicas provide (and don't):
Read replicas only scale read operations. All writes still go to the primary and are replicated to followers. For write-heavy workloads, you need different strategies: sharding (partitioning data across multiple primaries), or using databases designed for write scaling (like Cassandra or CockroachDB).
Database replication works by shipping changes from the primary to replicas. The mechanism varies by database, but the core concept is consistent: capture changes on primary, transmit to replicas, apply on replicas.
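That capture-transmit-apply loop can be sketched in a few lines of TypeScript (an in-memory toy, not a real database): the primary appends each change to an ordered log, and a replica applies entries in order while tracking how far behind it is.

```typescript
// In-memory toy model of log-shipping replication (illustrative only).
// Capture: the primary appends every change to an ordered log.
// Transmit/apply: the replica pulls unseen entries and replays them in order.
type Change = { key: string; value: string };

class Primary {
  log: Change[] = [];
  data = new Map<string, string>();

  write(key: string, value: string): void {
    this.data.set(key, value);
    this.log.push({ key, value }); // change captured in the log
  }
}

class Replica {
  data = new Map<string, string>();
  private replayed = 0; // position in the primary's log

  catchUp(primary: Primary): void {
    while (this.replayed < primary.log.length) {
      const { key, value } = primary.log[this.replayed++];
      this.data.set(key, value); // change applied locally
    }
  }

  lag(primary: Primary): number {
    return primary.log.length - this.replayed; // unapplied entries
  }
}
```

Between a write on the primary and the replica's next catch-up pass, `lag()` is nonzero: that window is replication lag in miniature.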
PostgreSQL Streaming Replication:
PostgreSQL uses Write-Ahead Log (WAL) shipping:
```
PostgreSQL Streaming Replication Flow

   Application
       │  INSERT INTO orders VALUES (...)
       ▼
┌───────────────────────────────────────────┐
│ PRIMARY                                   │
│                                           │
│  Query Parser                             │
│      │                                    │
│      ▼                                    │
│  WAL Buffer (memory)                      │
│      │                                    │
│      ▼                                    │
│  WAL Writer ──→ WAL Files (pg_wal dir)    │
│                     │                     │
│                     ▼                     │
│                WAL Sender ←── replica conn│
└─────────────────────┬─────────────────────┘
                      │  TCP connection
                      │  WAL stream (replication)
                      ▼
┌───────────────────────────────────────────┐
│ REPLICA                                   │
│                                           │
│  WAL Receiver                             │
│      │                                    │
│      ▼                                    │
│  WAL Files (local copy)                   │
│      │                                    │
│      ▼                                    │
│  Startup Process (replay)                 │
│      │                                    │
│      ▼                                    │
│  Applied to tables                        │
│  (replica now has the INSERT)             │
└───────────────────────────────────────────┘
```

MySQL Replication:
MySQL uses binary log (binlog) replication: the primary records changes in its binlog; each replica's I/O thread streams those events into a local relay log, and an applier (SQL) thread replays them against the replica's tables.
MySQL 8.0+ Group Replication:
Modern MySQL supports group replication where multiple nodes use Paxos-like consensus for synchronous, multi-primary replication—a significant advancement over traditional async replication.
| Database | Mechanism | Log Type | Notes |
|---|---|---|---|
| PostgreSQL | Streaming replication | WAL (Write-Ahead Log) | Physical replication; byte-identical replicas |
| PostgreSQL | Logical replication | Logical decoding | Table-level; cross-version compatible; allows transforms |
| MySQL | Async binlog | Binary Log | Statement or row-based; traditional approach |
| MySQL | Semi-sync | Binary Log | At least one replica acknowledges before commit returns |
| MySQL | Group Replication | Binary Log + Paxos | Multi-primary; automatic failover; MySQL 8.0+ |
| SQL Server | Always On AG | Transaction Log | Synchronous or async; automatic failover |
| MongoDB | Replica Set | Oplog | Automatic failover; majority write concern available |
The spectrum between synchronous and asynchronous replication represents a fundamental trade-off between consistency and performance/availability.
```sql
-- PostgreSQL synchronous replication configuration

-- On PRIMARY: postgresql.conf
-- Require sync commit from at least one replica
synchronous_commit = on   -- 'on' = wait for sync replica, 'off' = async
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'

-- Sync modes available:
-- synchronous_commit = off           -- Full async (fastest, least safe)
-- synchronous_commit = local         -- Wait for local WAL flush only
-- synchronous_commit = remote_write  -- Wait for replica to write WAL (no fsync)
-- synchronous_commit = on            -- Wait for replica to flush WAL to disk
-- synchronous_commit = remote_apply  -- Wait for replica to apply (strongest)

-- Check replication status
SELECT
  client_addr,
  state,
  sync_state,  -- 'sync', 'async', 'potential', 'quorum'
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- Per-transaction sync level (override global setting)
SET synchronous_commit = 'remote_apply';  -- For critical transaction
BEGIN;
  UPDATE accounts SET balance = balance - 1000 WHERE id = 123;
  INSERT INTO audit_log VALUES (...);
COMMIT;  -- Waits for replica to apply

SET synchronous_commit = 'local';  -- For less critical transaction
INSERT INTO page_views VALUES (...);  -- Returns immediately
```

Many production systems use semi-synchronous replication: wait for at least one replica (out of several) to acknowledge. This provides data safety without letting a single slow replica block all writes. Configure synchronous_standby_names = 'FIRST 1 (replica1, replica2, replica3)' — commit proceeds as soon as any one replica confirms.
Replication lag is the delay between when a change is committed on the primary and when it becomes visible on a replica. For asynchronous replication, lag is inevitable—the question is how much and how to handle it.
What causes lag:
| Cause | Description | Mitigation |
|---|---|---|
| Network latency | Time to transmit WAL over network | Co-locate replicas; use faster network |
| Replica CPU | Replay is single-threaded (PostgreSQL) | Faster replica CPU; logical replication |
| Heavy write load | More WAL to ship and replay | More replicas; faster disks |
| Long-running replica queries | Replay pauses rather than cancel conflicting queries on the replica | Set hot_standby_feedback; schedule heavy reads off-peak |
| Replica maintenance | ANALYZE, index builds | Perform during low-traffic periods |
Measuring lag:
```sql
-- On PRIMARY: check replication lag for all replicas
SELECT
  client_addr AS replica,
  state,
  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / 1024 / 1024 AS lag_mb,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_pending_bytes,
  write_lag,   -- Time lag for write
  flush_lag,   -- Time lag for fsync
  replay_lag   -- Time lag for apply (PostgreSQL 10+)
FROM pg_stat_replication;

-- On REPLICA: check how far behind
SELECT
  pg_last_wal_receive_lsn() AS received_lsn,
  pg_last_wal_replay_lsn() AS replayed_lsn,
  pg_wal_lsn_diff(
    pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn()
  ) AS replay_pending_bytes,
  pg_last_xact_replay_timestamp() AS last_replayed_at,
  NOW() - pg_last_xact_replay_timestamp() AS time_since_replay;

-- Alert if lag exceeds threshold
-- (typically run as a monitoring check)
SELECT CASE
    WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) > 1024*1024*100
      THEN 'CRITICAL: Replica lag > 100MB'
    WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) > 1024*1024*10
      THEN 'WARNING: Replica lag > 10MB'
    ELSE 'OK'
  END AS status
FROM pg_stat_replication;
```

Strategies for handling lag:
A user creates a post, then immediately views their profile. If the write goes to primary but the read goes to a lagged replica, the post is missing! This breaks user trust. Always ensure read-your-writes consistency for user-facing operations—route the user's subsequent reads to primary for a few seconds after writes, or use causal consistency tokens.
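A causal token can be as simple as a WAL position (LSN) captured right after the write. The sketch below compares LSNs to decide where a read may go; the helper names are invented for this example, and the LSN strings would come from queries like SELECT pg_current_wal_lsn() on the primary and SELECT pg_last_wal_replay_lsn() on the replica.

```typescript
// A PostgreSQL LSN like '16/B374D848' is two hex numbers: high/low 32 bits.
// Convert to a single integer so positions can be compared.
function lsnToBigInt(lsn: string): bigint {
  const [hi, lo] = lsn.split('/');
  return (BigInt(`0x${hi}`) << 32n) | BigInt(`0x${lo}`);
}

// The replica is safe to read from only if it has replayed past the token
// (the WAL position captured on the primary right after the user's write).
function isCaughtUp(replicaReplayLsn: string, token: string): boolean {
  return lsnToBigInt(replicaReplayLsn) >= lsnToBigInt(token);
}

// Routing decision: read from the replica if caught up, else the primary.
function routeRead(replicaReplayLsn: string, token: string): 'replica' | 'primary' {
  return isCaughtUp(replicaReplayLsn, token) ? 'replica' : 'primary';
}
```

Compared to time-based stickiness, an LSN token is exact: the read falls back to the primary only while the replica genuinely has not replayed that specific write yet.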
Effectively routing reads to replicas requires thoughtful design. Here are production-proven patterns.
The application explicitly decides write vs read destinations. Most flexible but requires careful implementation.
```typescript
// Application-level read/write split
import { Pool } from 'pg';

// Separate connection pools
const primaryPool = new Pool({
  host: 'db-primary.internal',
  database: 'myapp',
  max: 10,
});

const replicaPool = new Pool({
  host: 'db-replica.internal',
  database: 'myapp',
  max: 30, // More replica connections for read load
});

type Intent = 'read' | 'write' | 'read-after-write';

// Minimal profile shape assumed for this example
interface User {
  id: string;
  name: string;
  bio: string;
}

// Track recent writes per user for read-your-writes consistency
const recentWrites = new Map<string, number>();
const WRITE_STICK_DURATION = 5000; // 5 seconds

function getPool(intent: Intent, userId?: string): Pool {
  if (intent === 'write') {
    if (userId) {
      recentWrites.set(userId, Date.now());
    }
    return primaryPool;
  }

  // Check if user recently wrote - ensure read-your-writes
  if (userId) {
    const lastWrite = recentWrites.get(userId);
    if (lastWrite && Date.now() - lastWrite < WRITE_STICK_DURATION) {
      console.log(`Routing ${userId} to primary (recent write)`);
      return primaryPool;
    }
  }

  return replicaPool;
}

// Usage in a repository
class UserRepository {
  async findById(userId: string, requestingUserId?: string) {
    const pool = getPool('read', requestingUserId);
    const result = await pool.query(
      'SELECT * FROM users WHERE id = $1',
      [userId]
    );
    return result.rows[0];
  }

  async updateProfile(userId: string, data: Partial<User>) {
    const pool = getPool('write', userId);
    await pool.query(
      'UPDATE users SET name = $1, bio = $2 WHERE id = $3',
      [data.name, data.bio, userId]
    );
    // Next reads by this user go to primary for 5 seconds
  }
}
```

When the primary fails, a replica must be promoted to become the new primary. This is one of the most critical operations in database administration.
Failover approaches:
| Approach | Description | Recovery Time | Data Loss Risk |
|---|---|---|---|
| Manual failover | DBA manually promotes replica | Minutes to hours | Depends on lag at failure |
| Automated failover | Tools detect failure, auto-promote | Seconds to minutes | Potential for split-brain |
| Managed services | AWS RDS, Cloud SQL handle it | 1-2 minutes typical | Provider's SLA |
PostgreSQL manual promotion:
```bash
# PostgreSQL manual failover procedure

# 1. Verify primary is truly down (avoid split-brain!)
pg_isready -h db-primary.internal -p 5432
# Should show: "could not connect to server"

# 2. Ensure replica has all available data
# On replica: check if caught up
psql -h db-replica1.internal -c "
  SELECT pg_last_wal_receive_lsn(),
         pg_last_wal_replay_lsn(),
         pg_last_xact_replay_timestamp();"

# 3. Stop accepting connections to old primary (if accessible)
# This prevents split-brain if the primary comes back
# On old primary (if accessible):
pg_ctl stop -D /var/lib/postgresql/data -m fast

# 4. Promote replica to primary
# On replica:
pg_ctl promote -D /var/lib/postgresql/data
# Or: SELECT pg_promote();  (PostgreSQL 12+)
# Promotion removes standby.signal and the server exits recovery

# 5. Verify promotion succeeded
psql -h db-replica1.internal -c "SELECT pg_is_in_recovery();"
# Should return 'f' (false = not in recovery = is primary)

# 6. Update application configuration to point to new primary
# (Or update DNS/load balancer to point to new primary)

# 7. Reconfigure other replicas to follow new primary
# On other replicas, update primary_conninfo in postgresql.conf:
# primary_conninfo = 'host=db-replica1.internal port=5432 ...'
# Then restart: pg_ctl restart

# 8. (Later) Add old primary back as replica
# After old primary is recovered, reinitialize it as replica:
pg_basebackup -h new-primary.internal -D /var/lib/postgresql/data -U replicator -P
```

Automated failover tools:
| Tool | Database | Features |
|---|---|---|
| Patroni | PostgreSQL | Distributed consensus (etcd/Consul), auto-failover, REST API |
| repmgr | PostgreSQL | Simpler setup, manual or automatic failover |
| Orchestrator | MySQL | Topology discovery, automated failover, web UI |
| MHA | MySQL | Master High Availability, mature but dated |
| ProxySQL | MySQL | Failover detection with health checks |
Split-brain occurs when two nodes both believe they are the primary and accept writes. This causes data divergence that's extremely difficult to reconcile. Prevention: use STONITH (Shoot The Other Node In The Head) via fencing to ensure old primary can't accept writes; require quorum before promotion; use managed services that handle this correctly.
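The quorum rule is simple to state precisely: a node may initiate promotion only when a strict majority of the cluster agrees the primary is down. A minimal sketch (hypothetical helper; real tools like Patroni delegate this to etcd or Consul):

```typescript
// A strict majority is required so that at most one partition can ever
// promote: two disjoint groups cannot both exceed half the cluster.
function hasQuorum(votesForFailover: number, clusterSize: number): boolean {
  return votesForFailover > Math.floor(clusterSize / 2);
}

hasQuorum(2, 3); // 2 of 3 nodes agree: majority, promotion may proceed
hasQuorum(1, 2); // a lone node in a 2-node cluster is not a majority
hasQuorum(2, 4); // a 2-2 tie is not a majority either
```

This is also why production clusters use an odd number of voters: a 3-node cluster tolerates one failure, while a 2-node cluster can never form a majority after losing a node.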
What's next:
While read replicas scale read throughput, query caching eliminates query execution entirely for repeated requests. The next page explores query caching—from application-level caching with Redis to database query caches and result caching patterns.
You now understand how read replicas work, the spectrum of replication consistency, how to handle replication lag, patterns for routing reads, and the critical process of failover. Next, we'll cover query caching to further reduce database load.