Most database workloads are read-heavy. E-commerce platforms, content management systems, social networks, and analytics dashboards typically see read-to-write ratios of 10:1, 100:1, or even higher. When reads dominate your workload, you have a powerful scaling lever: replicate your data across multiple servers and distribute read traffic among them.
Read replica scaling is often the first horizontal scaling strategy teams adopt—and for good reason. It delivers substantial capacity improvements while preserving the single-primary write model that keeps application logic simple. Understanding how to implement, operate, and troubleshoot read replicas is essential knowledge for any engineer working with SQL databases at scale.
By the end of this page, you will understand how read replicas work at a mechanical level, the consistency trade-offs they introduce, strategies for load balancing reads across replicas, common pitfalls and how to avoid them, and when read replicas are (and aren't) the right solution.
Read replicas are database instances that maintain a copy of the primary database's data through asynchronous replication. The primary handles all write operations and streams changes to replicas, which apply those changes and serve read queries.
At a high level, the replication process follows these steps:
Write occurs on primary: An application executes an INSERT, UPDATE, or DELETE on the primary database.
Transaction logged: The primary writes the change to its Write-Ahead Log (WAL) or binary log, depending on the database engine.
Log streamed to replicas: The primary (or replicas polling the primary) transmits log entries to replica instances.
Replicas apply changes: Each replica reads the incoming log entries and applies them to its local data files.
Replicas serve reads: Clients can query replicas for read operations, receiving data that reflects applied changes.
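To see these steps on a live cluster, you can ask each instance what role it is playing. The sketch below is a minimal check, assuming asyncpg and PostgreSQL's built-in pg_is_in_recovery() function and pg_stat_replication view; the DSNs are placeholders, not real endpoints.

```python
import asyncio
import asyncpg

async def describe_role(dsn: str) -> None:
    """Report whether an instance is a primary or a replica, and what it streams."""
    conn = await asyncpg.connect(dsn)
    try:
        # Replicas run in continuous recovery, replaying the primary's WAL
        in_recovery = await conn.fetchval("SELECT pg_is_in_recovery()")
        if in_recovery:
            print(f"{dsn}: replica (applying streamed WAL)")
        else:
            # On the primary, pg_stat_replication lists every attached replica
            rows = await conn.fetch(
                "SELECT client_addr, state FROM pg_stat_replication"
            )
            print(f"{dsn}: primary, streaming to {len(rows)} replica(s)")
            for r in rows:
                print(f"  -> {r['client_addr']}: {r['state']}")
    finally:
        await conn.close()

async def main() -> None:
    # Placeholder DSNs: replace with your primary and replica endpoints
    for dsn in ("postgres://app@db-primary/app", "postgres://app@db-replica-1/app"):
        await describe_role(dsn)

asyncio.run(main())
```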
Replication Data Flow:

```text
                  APPLICATION TIER
        Writes                      Reads
           │                          │
           ▼                          ▼
   ┌──────────────┐          ┌─────────────────┐
   │   PRIMARY    │          │  LOAD BALANCER  │
   │   (Writer)   │          │   (for reads)   │
   │  ┌────────┐  │          └────────┬────────┘
   │  │  WAL/  │  │                   │ read queries
   │  │ Binlog │──┼───────────┐       │
   │  └────────┘  │   streams │       │
   └──────────────┘   WAL     ▼       ▼
                      ┌─────────────────────────────┐
                      │         REPLICA POOL        │
                      │  ┌───────────────────────┐  │
                      │  │ REPLICA 1 (async WAL) │  │
                      │  ├───────────────────────┤  │
                      │  │ REPLICA 2 (async WAL) │  │
                      │  ├───────────────────────┤  │
                      │  │ REPLICA N (async WAL) │  │
                      │  └───────────────────────┘  │
                      └─────────────────────────────┘
```

The critical distinction in replica configurations is whether replication is asynchronous or synchronous:
Asynchronous Replication (most common for read scaling): the primary commits and acknowledges a write without waiting for any replica, so write latency is unaffected, but every replica trails the primary by some amount of lag.
Synchronous Replication (used for HA, not typically for read scaling): the primary waits for one or more replicas to confirm each commit, which protects against data loss on failover but raises write latency and couples write availability to replica availability.
With asynchronous replication, replicas are always at least slightly behind the primary. Under normal conditions, lag is milliseconds. Under heavy write load or network issues, lag can grow to seconds or even minutes. Your application architecture must account for this reality.
Read replicas introduce eventual consistency into what was previously a strongly consistent system. This has profound implications for application design.
Consider this sequence:
1. A user updates their profile; the write commits on the primary.
2. The application immediately reloads the profile page, and that read is routed to a replica.
3. The replica has not yet applied the update, so the user sees their old data and concludes the save failed.
This read-your-writes violation is jarring for users. They expect immediate consistency for their own actions, even if they can tolerate stale data from others.
Another consistency anomaly occurs when reads go to different replicas with different lag:
1. A user loads a discussion thread; the request hits Replica 1, which is nearly caught up, and shows the newest comments.
2. The user refreshes; this request hits Replica 2, which is lagging several seconds, and the newest comments are missing.
The user sees comments "disappear"—a confusing experience that violates the intuitive expectation that data moves forward in time.
| Anomaly | Description | User Impact | Mitigation Strategy |
|---|---|---|---|
| Read-Your-Writes | User's own writes not visible on subsequent reads | Confusing—'my change didn't save' | Sticky sessions or read-from-primary for user's own data |
| Monotonic Reads | Data appears to go backward in time between requests | Disorienting—items 'disappear' and 'reappear' | Session affinity to single replica or version-based routing |
| Non-monotonic Writes | Observers see writes in different orders | Unlikely with single-primary but possible with cascading replicas | Avoid multi-level replica chains |
| Stale UI State | User sees outdated data that mismatches their mental model | Reduced trust in the application | Real-time updates via WebSocket/SSE for critical data |
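The table's "version-based routing" mitigation can be made concrete by tracking write positions instead of timestamps. The sketch below is one possible approach, assuming asyncpg pools and PostgreSQL's pg_current_wal_lsn() / pg_last_wal_replay_lsn() functions; the in-memory dict standing in for a session store is an illustrative simplification. The TypeScript router shown next takes the simpler, timestamp-based route to the same goal.

```python
import asyncpg

class LsnAwareRouter:
    """Route a session's reads to replicas only once they have replayed
    past the WAL position of that session's last write."""

    def __init__(self, primary: asyncpg.Pool, replicas: list[asyncpg.Pool]):
        self.primary = primary
        self.replicas = replicas
        self.last_write_lsn: dict[str, str] = {}  # session_id -> LSN as text

    async def record_write(self, session_id: str) -> None:
        # Capture the primary's WAL position right after a write commits
        lsn = await self.primary.fetchval("SELECT pg_current_wal_lsn()::text")
        self.last_write_lsn[session_id] = lsn

    async def pool_for_read(self, session_id: str) -> asyncpg.Pool:
        required = self.last_write_lsn.get(session_id)
        if required is None:
            return self.replicas[0]  # no recent write: any replica will do

        for replica in self.replicas:
            # A replica is safe once it has replayed at least up to the write's LSN
            caught_up = await replica.fetchval(
                "SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), $1::pg_lsn) >= 0",
                required,
            )
            if caught_up:
                return replica

        return self.primary  # nothing has caught up yet: fall back to the primary
```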
```typescript
// Pattern: Context-aware read routing for consistency

import { Pool } from 'pg';

interface DatabaseCluster {
  primary: Pool;
  replicas: Pool[];
  replicaIndex: number;
}

interface RequestContext {
  userId: string;
  lastWriteTimestamp?: number;
  requiresStrongConsistency?: boolean;
}

class ConsistencyAwareRouter {
  private cluster: DatabaseCluster;
  // Maximum acceptable replication lag in milliseconds
  private maxAcceptableLag = 1000;

  constructor(cluster: DatabaseCluster) {
    this.cluster = cluster;
  }

  /**
   * Route read queries based on consistency requirements
   */
  async getReadConnection(context: RequestContext): Promise<Pool> {
    // Case 1: Strong consistency explicitly required
    if (context.requiresStrongConsistency) {
      return this.cluster.primary;
    }

    // Case 2: User just wrote data - read-your-writes guarantee
    if (context.lastWriteTimestamp) {
      const timeSinceWrite = Date.now() - context.lastWriteTimestamp;
      // If write was very recent, use primary
      if (timeSinceWrite < this.maxAcceptableLag) {
        return this.cluster.primary;
      }
    }

    // Case 3: Normal read - use replica with round-robin
    return this.getHealthyReplica();
  }

  /**
   * Round-robin selection among healthy replicas
   */
  private getHealthyReplica(): Pool {
    const replicas = this.cluster.replicas;
    // Simple round-robin
    this.cluster.replicaIndex = (this.cluster.replicaIndex + 1) % replicas.length;
    return replicas[this.cluster.replicaIndex];
  }
}

// Usage in API handler
async function getUserProfile(
  router: ConsistencyAwareRouter,
  userId: string,
  context: RequestContext
) {
  // Reading own profile after update? Need consistency
  const isOwnProfile = context.userId === userId;

  const conn = await router.getReadConnection({
    ...context,
    requiresStrongConsistency: isOwnProfile && !!context.lastWriteTimestamp,
  });

  return conn.query('SELECT * FROM users WHERE id = $1', [userId]);
}
```

Distributing read traffic across replicas effectively requires careful consideration of balancing algorithms, health checking, and failure handling.
The simplest approach: rotate through replicas sequentially.
Pros: Simple implementation, even distribution under uniform query patterns
Cons: Doesn't account for replica health, replication lag, or variable query complexity. A slow replica gets the same traffic as a fast one.
Assign weights to replicas based on their capacity. For example, with weights of 3, 2, and 2, Replica 1 receives ~43% of traffic and the other two ~29% each.
Use case: Heterogeneous replica hardware, or replicas that serve other purposes (analytics, backups) and should receive less traffic.
Route to the replica with the fewest active connections/queries.
Pros: Naturally adapts to variable query complexity and replica performance
Cons: Requires real-time connection tracking; may cause thundering herd when a slow replica finishes a batch
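A minimal least-connections selector might look like the sketch below. It assumes the application tracks in-flight queries itself; names such as ReplicaConn are illustrative, not from any particular library.

```python
from dataclasses import dataclass

@dataclass
class ReplicaConn:
    name: str
    in_flight: int = 0  # queries currently executing on this replica

class LeastConnectionsBalancer:
    def __init__(self, replicas: list[ReplicaConn]):
        self.replicas = replicas

    def acquire(self) -> ReplicaConn:
        # Pick the replica with the fewest active queries right now
        replica = min(self.replicas, key=lambda r: r.in_flight)
        replica.in_flight += 1
        return replica

    def release(self, replica: ReplicaConn) -> None:
        # Must be called when the query finishes, or counts drift upward
        replica.in_flight -= 1

# Usage: wrap every replica query in acquire()/release()
balancer = LeastConnectionsBalancer([ReplicaConn("replica-1"), ReplicaConn("replica-2")])
target = balancer.acquire()
try:
    pass  # run the read query against `target` here
finally:
    balancer.release(target)
```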
"""Lag-aware read replica load balancer Monitors replication lag and routes queries accordingly:- Lag < 100ms: Normal round-robin- Lag 100ms - 1s: Reduced weight- Lag > 1s: Removed from pool- Lag > 5s: Alert triggered""" import asynciofrom dataclasses import dataclassfrom typing import Optionalimport asyncpg @dataclassclass ReplicaStatus: host: str port: int pool: asyncpg.Pool lag_ms: float = 0 weight: int = 100 healthy: bool = True last_check: float = 0 class LagAwareBalancer: def __init__(self, replicas: list[ReplicaStatus], max_lag_ms: int = 1000): self.replicas = replicas self.max_lag_ms = max_lag_ms self._current_index = 0 async def start_lag_monitoring(self, interval_seconds: float = 1.0): """Background task to monitor replica lag""" while True: await asyncio.gather(*[ self._check_replica_lag(replica) for replica in self.replicas ]) await asyncio.sleep(interval_seconds) async def _check_replica_lag(self, replica: ReplicaStatus): """Query replica to determine replication lag""" try: # PostgreSQL: Check replication lag lag_result = await replica.pool.fetchval(""" SELECT COALESCE( EXTRACT(MILLISECONDS FROM (now() - pg_last_xact_replay_timestamp())), 0 )::bigint AS lag_ms """) replica.lag_ms = lag_result or 0 replica.healthy = replica.lag_ms < self.max_lag_ms # Adjust weight based on lag if replica.lag_ms < 100: replica.weight = 100 elif replica.lag_ms < 500: replica.weight = 50 elif replica.lag_ms < 1000: replica.weight = 25 else: replica.weight = 0 # Effectively removed from pool except Exception as e: replica.healthy = False replica.weight = 0 print(f"Replica {replica.host} health check failed: {e}") def get_replica(self) -> Optional[ReplicaStatus]: """Get next healthy replica using weighted selection""" healthy_replicas = [r for r in self.replicas if r.healthy and r.weight > 0] if not healthy_replicas: return None # Caller should fall back to primary # Weighted random selection total_weight = sum(r.weight for r in healthy_replicas) # Simple weighted round-robin cumulative = 0 target = self._current_index % total_weight self._current_index += 1 for replica in healthy_replicas: cumulative += replica.weight if cumulative > target: return replica return healthy_replicas[-1]Read replicas scale reads linearly—up to a point. Understanding the scaling dynamics helps you plan capacity and anticipate limitations.
In the ideal case, N replicas provide N× read capacity: if one server sustains R read queries per second, a pool of N replicas behind a balancer sustains roughly N × R, with the primary still handling all writes.
This linear scaling holds when:
- read traffic is spread evenly across replicas,
- replicas run on comparable hardware with the working set in memory,
- write volume stays modest, since every replica must also apply the primary's full write stream,
- and few reads need to be diverted to the primary for consistency reasons.
The sketch below puts rough numbers on the write-apply overhead.
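As a back-of-the-envelope model (the numbers below are illustrative assumptions, not benchmarks), each replica contributes its single-server read throughput minus whatever fraction of its capacity it spends applying the primary's writes:

```python
def estimated_read_capacity(
    single_server_read_qps: float,  # reads one server can sustain alone
    replica_count: int,
    write_apply_overhead: float,    # fraction of each replica spent applying WAL (0.0-1.0)
) -> float:
    """Rough ceiling on total read throughput for a replica pool."""
    per_replica = single_server_read_qps * (1.0 - write_apply_overhead)
    return replica_count * per_replica

# Example: 5,000 read QPS per server, 4 replicas, 10% of each replica
# consumed by replaying the primary's writes -> ~18,000 read QPS total.
print(estimated_read_capacity(5_000, 4, 0.10))
```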
As write volume increases, replication becomes a potential bottleneck:
Primary-side limits: each replica consumes a WAL sender process, CPU, and memory on the primary, and replication slots force the primary to retain WAL until every replica, including slow ones, has consumed it.
Network limits: every replica receives its own full copy of the WAL stream, so outbound replication bandwidth grows linearly with replica count. For example, 10 MB/s of WAL streamed to 15 replicas is 150 MB/s of outbound traffic, more than a 1 Gbps link can carry.
| Write Volume | Typical Max Replicas | Bottleneck | Mitigation |
|---|---|---|---|
| < 1 MB/s WAL | 50+ | Connection limits | Connection pooling, pgbouncer |
| 1-10 MB/s WAL | 20-30 | Primary CPU for WAL sending | Dedicated network interface for replication |
| 10-50 MB/s WAL | 10-15 | Network bandwidth | Cascading replicas, regional clusters |
| 50+ MB/s WAL | 5-10 | Replica apply throughput | Faster replica storage, parallel apply |
| 100+ MB/s WAL | 3-5 | Everything saturated | Consider partitioning writes across primaries |
To reduce load on the primary, you can configure cascading replicas:
Primary → Replica A → Replica A1, A2, A3
→ Replica B → Replica B1, B2, B3
The primary streams WAL only to Replica A and B. Those intermediate replicas then stream to downstream replicas.
Trade-offs: downstream replicas accumulate lag from every hop (the intermediate replica's lag plus their own), and each intermediate replica becomes a single point of failure for its subtree; if Replica A fails, A1-A3 must be re-pointed at another upstream before they can catch up.
Read replicas do nothing for write scalability. All writes still go to a single primary. If your bottleneck is write capacity—high transaction rates, large INSERT volumes, frequent UPDATE storms—read replicas won't help. You'll need functional partitioning or sharding instead.
Operating a read replica fleet introduces operational complexity beyond a single-server deployment. Here are the critical concerns:
You must monitor:
Replication lag: Alert when any replica exceeds acceptable lag thresholds. Different thresholds for warning (lag > 1s) vs. critical (lag > 10s).
Replica health: Connection success, query execution times, error rates.
Capacity utilization: CPU, memory, I/O on each replica. Unbalanced load indicates routing problems.
Primary replication slots: PostgreSQL replication slots retain WAL until all replicas have consumed it. A slow/failed replica causes unbounded WAL growth on primary.
```sql
-- PostgreSQL: Comprehensive replication monitoring queries

-- 1. View replication status from PRIMARY
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replay_lag_bytes,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS send_lag_bytes
FROM pg_stat_replication;

-- 2. Check replication slots for WAL retention issues
SELECT
    slot_name,
    slot_type,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'physical';

-- 3. View replication lag from REPLICA
SELECT
    CASE
        WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
        ELSE EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
    END AS replica_lag_seconds;

-- 4. Create a monitoring/alerting function
CREATE OR REPLACE FUNCTION check_replication_health()
RETURNS TABLE(
    replica_addr inet,
    lag_bytes bigint,
    lag_seconds numeric,
    status text
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        r.client_addr,
        pg_wal_lsn_diff(pg_current_wal_lsn(), r.replay_lsn)::bigint,
        EXTRACT(EPOCH FROM (now() - r.reply_time))::numeric,
        CASE
            WHEN r.state != 'streaming' THEN 'CRITICAL: Not streaming'
            WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), r.replay_lsn) > 100000000  -- >100 MB
                THEN 'WARNING: High lag'
            ELSE 'OK'
        END
    FROM pg_stat_replication r;
END;
$$ LANGUAGE plpgsql;
```

When the primary fails, one replica must be promoted. This process has critical implications:
Promotion Steps:
1. Fence the failed primary so it cannot accept writes after the promotion.
2. Choose the most caught-up replica as the promotion candidate (see the sketch below).
3. Promote it; in PostgreSQL this is pg_promote() or pg_ctl promote, after which it stops replaying WAL and begins accepting writes.
4. Re-point the remaining replicas at the new primary.
5. Redirect application write traffic via connection strings, DNS, or your proxy layer.
6. Rebuild or re-attach the old primary as a replica once it is recovered.
Data Loss Considerations: With asynchronous replication, the promoted replica may be behind. Any transactions committed on the old primary but not yet replicated are lost. This is the fundamental trade-off of async replication.
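One way to implement the "pick the most caught-up replica" step is to compare replay positions across candidates. The following is a minimal sketch, assuming asyncpg and PostgreSQL's pg_last_wal_replay_lsn() function; in practice this logic lives inside your failover tooling, and the DSNs here are placeholders.

```python
import asyncio
import asyncpg

async def replayed_bytes(dsn: str) -> int:
    """How far (in bytes from WAL position 0/0) this replica has replayed."""
    conn = await asyncpg.connect(dsn)
    try:
        return await conn.fetchval(
            "SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/0')::bigint"
        )
    finally:
        await conn.close()

async def pick_promotion_candidate(replica_dsns: list[str]) -> str:
    """Return the DSN of the replica with the most WAL applied."""
    positions = await asyncio.gather(*(replayed_bytes(d) for d in replica_dsns))
    best = max(range(len(replica_dsns)), key=lambda i: positions[i])
    return replica_dsns[best]

# Placeholder DSNs; the promotion itself follows via pg_promote() or pg_ctl promote
print(asyncio.run(pick_promotion_candidate([
    "postgres://admin@db-replica-1/app",
    "postgres://admin@db-replica-2/app",
])))
```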
Schema changes (Data Definition Language—DDL) present unique challenges with read replicas. Understanding these challenges helps you plan migrations that don't disrupt service.
Most DDL operations are logged and replicated like any other statement. However, the impact can be severe:
Full Table Locks: Operations like ALTER TABLE ADD COLUMN with a default can lock (and on older database versions rewrite) the entire table on both primary and replicas. While a replica waits for that lock and replays the operation, it can't apply any subsequent WAL, causing lag to grow.
Serial Replay: DDL on primary must complete before subsequent DML. If DDL takes 30 minutes on primary, replicas will be at least 30 minutes behind during that window.
Cascading Delays: A slow DDL on primary causes all replicas to lag. Applications reading from replicas see increasingly stale data.
To minimize DDL impact:
Use non-blocking DDL when possible: PostgreSQL supports CREATE INDEX CONCURRENTLY, and since version 11 adding a column with a constant default is non-blocking. Many ALTERs can be made non-blocking with the right approach.
pt-online-schema-change / gh-ost (MySQL): These tools create shadow tables and swap atomically, minimizing lock time.
Expand/contract migrations: Instead of modifying columns, add new columns, migrate data, then drop old columns.
Off-peak scheduling: Perform schema changes during low-traffic periods to minimize impact.
```sql
-- Safe migration patterns for replicated databases

-- Pattern 1: Add nullable column (near-instant, non-blocking)
ALTER TABLE orders ADD COLUMN tracking_number VARCHAR(100);

-- Pattern 2: Add column with default (PostgreSQL 11+, non-blocking)
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';

-- Pattern 3: Create index concurrently (non-blocking)
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

-- Pattern 4: Add NOT NULL constraint in stages
-- Step 1: Add check constraint as NOT VALID (fast, no table scan)
ALTER TABLE orders
    ADD CONSTRAINT orders_status_not_null
    CHECK (status IS NOT NULL) NOT VALID;

-- Step 2: Validate (scans the table but does not block normal reads/writes;
-- still best scheduled off-peak)
ALTER TABLE orders VALIDATE CONSTRAINT orders_status_not_null;

-- Step 3: Add NOT NULL using the validated constraint (fast, PostgreSQL 12+)
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;

-- Step 4: Drop the now-redundant check constraint
ALTER TABLE orders DROP CONSTRAINT orders_status_not_null;

-- Pattern 5: Rename column safely using expand/contract
-- (Rename is instant but breaks queries, so use this pattern instead)
-- Phase 1: Add new column
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);

-- Phase 2: Backfill
UPDATE users SET email_address = email WHERE email_address IS NULL;

-- Phase 3: Update application to use new column
-- Phase 4: Drop old column
ALTER TABLE users DROP COLUMN email;
```

Read replicas are powerful but not universal. Recognize when they've reached their limits and a different approach is needed:
- Write throughput is the bottleneck: every write still lands on the single primary, and more replicas only add replication load.
- Replication lag stays above acceptable thresholds even after tuning, so lag-sensitive reads keep falling back to the primary.
- Most reads require strong consistency and end up on the primary anyway.
- The dataset has outgrown what each node can hold or cache, since every replica stores a full copy.
Read replicas fit into a broader scaling progression:
1. Single server
2. Vertical scaling (bigger hardware)
3. Read replicas + caching (this page)
4. Functional partitioning by domain or feature
5. Sharding (horizontal partitioning of a single dataset)
Each step adds complexity but unlocks new scaling dimensions. Read replicas are often sufficient for years before the next step is needed.
Read replicas plus intelligent caching handle the vast majority of read-scaling needs. Before pursuing sharding, ensure you've fully exploited replicas + cache. A replicated PostgreSQL cluster with Redis caching can often handle 100,000+ queries per second—more than most applications will ever need.
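To illustrate that combination, here is a minimal cache-aside read path, assuming redis-py's asyncio client and an asyncpg pool pointed at a replica; the products table, key names, and TTL are hypothetical choices, not prescribed values.

```python
import json
from typing import Optional

import asyncpg
import redis.asyncio as redis

CACHE_TTL_SECONDS = 30  # arbitrary; tune to the data's staleness tolerance

async def get_product(product_id: int, cache: redis.Redis, replica: asyncpg.Pool) -> Optional[dict]:
    """Cache-aside read: Redis first, then a read replica on a miss."""
    key = f"product:{product_id}"

    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)

    row = await replica.fetchrow(
        "SELECT id, name, price_cents FROM products WHERE id = $1", product_id
    )
    if row is None:
        return None

    result = dict(row)
    # Populate the cache so subsequent reads skip the database entirely
    await cache.set(key, json.dumps(result), ex=CACHE_TTL_SECONDS)
    return result
```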
Let's consolidate the key insights from our exploration of read replica scaling:
- Replicas receive the primary's changes via asynchronous WAL/binlog streaming; they multiply read capacity but do nothing for write capacity.
- Asynchronous replication makes the system eventually consistent, producing read-your-writes and monotonic-read anomalies that routing logic must handle, typically by sending consistency-sensitive reads to the primary.
- Load balancing should be health- and lag-aware, with the primary as the fallback when no replica is fit to serve.
- Operating a fleet means monitoring lag, replication slots, and utilization, rehearsing failover (and accepting its potential data loss under async replication), and designing schema changes that keep replica lag bounded.
What's Next:
When read replicas aren't sufficient—often because write patterns or data isolation needs demand it—the next scaling strategy is functional partitioning. We'll explore how to divide your database by domain or feature to scale both reads and writes.
You now understand how read replicas work, their consistency implications, load balancing strategies, and operational considerations. This knowledge enables you to implement and operate replica-based scaling for read-heavy SQL workloads effectively.