Every successful application eventually confronts a fundamental database challenge: reads dramatically outnumber writes. In typical web applications, the read-to-write ratio often exceeds 10:1, and for content-heavy platforms like social media feeds or e-commerce catalogs, ratios of 100:1 or even 1000:1 are common.
This asymmetry creates a critical bottleneck. Your primary database—designed to maintain consistency through ACID guarantees—becomes overwhelmed not by data modifications, but by the sheer volume of queries asking "What is the current state?" Users refreshing feeds, loading product pages, checking notifications, and searching catalogs collectively hammer your database with read requests that compete for the same CPU, memory, and I/O resources needed for writes.
Read replicas represent the foundational solution to this challenge. By creating copies of your database that serve read traffic independently, you multiply your read capacity while preserving the integrity guarantees your application demands.
By the end of this page, you will master the architecture and implementation of read traffic offloading. You'll understand when and why to implement read replicas, how to design effective read/write splitting strategies, the traffic patterns that benefit most from replication, and the trade-offs inherent in every replica architecture decision.
Before implementing read replicas, we must deeply understand why reads become the bottleneck and what specific constraints we're addressing. This understanding shapes every architectural decision.
The anatomy of a database read:
When your application executes a SELECT query, the database performs multiple resource-intensive operations:

- Parsing the SQL and planning the query (CPU)
- Fetching index and table pages from the buffer pool, falling back to disk on a miss (memory, disk I/O)
- Sorting, joining, and aggregating intermediate results (CPU, memory)
- Transmitting the result set back to the client over a pooled connection (network, connection slots)
Each of these stages consumes finite resources. When read volume grows, these resources become contention points.
| Resource | Contention Mechanism | Symptoms Under Load | Scale Limit |
|---|---|---|---|
| CPU | Query planning, sorting, aggregations | High CPU utilization, slow complex queries | Core count × query parallelism |
| Memory (Buffer Pool) | Page caching, sort buffers, join buffers | Increased disk I/O, cache evictions | Physical RAM allocation |
| Disk I/O | Page reads when buffer pool misses | High read latency, I/O wait states | IOPS capacity, throughput bandwidth |
| Connection Pool | Finite connection slots per server | Connection timeouts, queue buildup | max_connections setting |
| Network | Result set transmission | Bandwidth saturation, packet queuing | NIC capacity, network topology |
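You can observe several of these costs directly for a single query. Here is a minimal sketch, assuming PostgreSQL and the node-postgres (`pg`) client, with a hypothetical `orders` table: `EXPLAIN (ANALYZE, BUFFERS)` reports planning and execution time (CPU) along with shared-buffer hits versus reads (buffer pool versus disk I/O).

```typescript
import { Client } from 'pg';

// Run EXPLAIN (ANALYZE, BUFFERS) against a read query and print the plan.
// "Buffers: shared hit=N read=M" lines show buffer-pool hits vs. disk reads;
// the orders table here is a hypothetical example.
async function explainRead(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    const plan = await client.query(
      'EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 123'
    );
    for (const row of plan.rows) {
      console.log(row['QUERY PLAN']);
    }
  } finally {
    await client.end();
  }
}
```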
The single-primary constraint:
In traditional SQL database architectures, a single primary (master) server handles both reads and writes. This design ensures consistency—all modifications flow through one authoritative source—but creates an inherent scalability ceiling.
Consider a PostgreSQL primary with 64GB RAM and 16 CPU cores. Under light load, query latency is sub-millisecond. But as concurrent read queries multiply:

- CPU saturates on query planning, sorting, and aggregation
- The buffer pool thrashes, evicting hot pages and forcing disk reads
- Connection slots fill up, queuing or rejecting new requests
- Latencies climb from sub-millisecond to tens or hundreds of milliseconds
Vertical scaling (adding more CPU/RAM) offers diminishing returns. Beyond a certain point, you cannot buy a single machine powerful enough to handle the load. This is where horizontal scaling through read replicas becomes essential.
Read replicas exemplify the shared-nothing architecture principle: instead of scaling a single resource vertically, we create independent processing units that share no state except through explicit replication. Each replica operates autonomously, with its own CPU, memory, and storage. This design enables near-linear scalability—adding replicas proportionally increases read capacity.
A read replica is a synchronized copy of your primary database that accepts only read queries. The primary database (also called master, leader, or writer) handles all write operations and propagates changes to replicas through a replication mechanism. This architecture separates the concerns of data modification from data retrieval.
Core architectural components:

- **Primary (writer):** the single authoritative node that accepts all write operations
- **Replication stream:** the WAL or binlog feed that carries committed changes from the primary to each replica
- **Replicas (readers):** synchronized copies that continuously apply the stream and serve read-only queries
- **Routing layer:** the application logic, driver, proxy, or endpoint configuration that directs each query to the primary or a replica
Replication mechanics vary by database. Each platform implements the details differently, but the conceptual model remains consistent:
**PostgreSQL Streaming Replication:** The primary streams write-ahead log (WAL) records to each replica over a replication connection; a WAL sender process on the primary feeds a WAL receiver on the replica, which applies changes continuously. Replication slots ensure the primary retains WAL until every replica has consumed it.

**MySQL Binary Log Replication:** The primary records changes in its binary log; a binlog dump thread streams events to each replica, where an I/O thread persists them to a relay log and SQL (applier) threads replay them against the replica's data.

**Cloud-Managed Replication (RDS, Cloud SQL, Azure SQL):** The provider provisions replicas, manages the underlying streaming or binlog replication, and exposes separate reader endpoints. You trade low-level configuration control for operational simplicity.
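Whichever engine you run, the primary's side of the stream is observable. A minimal sketch, assuming PostgreSQL 10+ and the node-postgres (`pg`) client, that reads `pg_stat_replication` on the primary (one row per connected standby):

```typescript
import { Client } from 'pg';

// Inspect WAL shipping from the primary's perspective. pg_stat_replication
// has one row per connected standby; replay_lag (PostgreSQL 10+) shows how
// far behind each replica is in applying the stream.
async function showReplicationStatus(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    const result = await client.query(`
      SELECT application_name, state, sent_lsn, replay_lsn, replay_lag
      FROM pg_stat_replication
    `);
    for (const row of result.rows) {
      console.log(
        `${row.application_name}: state=${row.state}, ` +
          `sent=${row.sent_lsn}, replayed=${row.replay_lsn}, lag=${row.replay_lag}`
      );
    }
  } finally {
    await client.end();
  }
}
```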
The effectiveness of read replicas depends entirely on how your application routes queries. Read/write splitting—directing write operations to the primary and read operations to replicas—can be implemented at multiple architectural layers, each with distinct trade-offs.
Implementation layers from application to infrastructure:

- **Application code:** explicit routing decisions in your own codebase (detailed below)
- **Driver/ORM:** database libraries configured with separate read and write connections that split traffic automatically
- **Proxy:** middleware such as ProxySQL or Pgpool-II that inspects each query and routes it to the appropriate server
- **Infrastructure:** separate reader and writer endpoints exposed through DNS or load balancers, as cloud-managed services do
Application-level splitting places routing logic directly in your codebase. The application explicitly chooses which database connection to use for each query.
Advantages:

- Full, per-query control over routing decisions
- Consistency-critical reads can be explicitly pinned to the primary
- No additional infrastructure to deploy or operate

Disadvantages:

- Routing logic spreads through the codebase and must be applied consistently
- Developers can mistakenly route writes to a replica or lag-sensitive reads to a stale one
- Every service that touches the database must reimplement the same logic
```typescript
// Application-level read/write splitting in TypeScript

// Minimal pool abstraction so the example is self-contained
interface ConnectionPool {
  execute<T>(query: string, params: unknown[]): Promise<T>;
}

interface DatabaseConfig {
  primary: ConnectionPool;
  replicas: ConnectionPool[];
}

class DatabaseRouter {
  private config: DatabaseConfig;
  private replicaIndex = 0;

  constructor(config: DatabaseConfig) {
    this.config = config;
  }

  // Write operations always go to primary
  async write<T>(query: string, params: unknown[]): Promise<T> {
    return this.config.primary.execute(query, params);
  }

  // Read operations round-robin across replicas
  async read<T>(query: string, params: unknown[]): Promise<T> {
    const replica = this.selectReplica();
    return replica.execute(query, params);
  }

  // Read from primary when consistency is critical
  async readFromPrimary<T>(query: string, params: unknown[]): Promise<T> {
    return this.config.primary.execute(query, params);
  }

  private selectReplica(): ConnectionPool {
    const replica = this.config.replicas[this.replicaIndex];
    this.replicaIndex = (this.replicaIndex + 1) % this.config.replicas.length;
    return replica;
  }
}

// Usage in application code
const router = new DatabaseRouter(config);

// Write goes to primary
await router.write('INSERT INTO orders (user_id, total) VALUES ($1, $2)', [userId, total]);

// Read goes to replica
const orders = await router.read('SELECT * FROM orders WHERE user_id = $1', [userId]);

// Read-your-writes: immediately read from primary after write
const justCreated = await router.readFromPrimary('SELECT * FROM orders WHERE id = $1', [orderId]);
```

Production systems often combine multiple layers. A proxy handles basic routing and connection pooling, while application code implements nuanced logic for consistency-critical paths. This layered approach provides both transparency for simple cases and control for complex scenarios.
Not all read workloads benefit equally from replication. Understanding your traffic patterns is essential for sizing replica infrastructure and anticipating when replication will—and won't—solve your problems.
Characterizing read workloads:
Read queries vary dramatically in their resource consumption and latency tolerance. Effective replica deployment requires classifying your queries along several dimensions:
| Query Type | Characteristics | Replica Suitability | Example |
|---|---|---|---|
| Point Lookups | Index seek, single row, <5ms | Excellent | SELECT * FROM users WHERE id = 123 |
| Range Scans | Index range, multiple rows, 5-50ms | Excellent | SELECT * FROM orders WHERE user_id = 123 |
| Aggregations | Table/index scans, computation, 50-500ms | Good (offloads primary) | SELECT COUNT(*), AVG(total) FROM orders WHERE date > '2024-01-01' |
| Complex Joins | Multi-table, optimization-heavy, 100ms-5s | Good (but lag-sensitive) | SELECT ... FROM orders JOIN products JOIN inventory ... |
| Full-Text Search | Specialized indexes, 50-200ms | Good | SELECT * FROM products WHERE name @@ 'search terms' |
| Analytic Queries | Large scans, heavy aggregation, 1s-minutes | Excellent (dedicated replica) | SELECT date, SUM(revenue) ... GROUP BY date |
| Real-Time Dashboards | Frequent, lightweight, <10ms | Moderate (lag consideration) | SELECT COUNT(*) FROM active_sessions |
Workload patterns that maximize replica value:

- High read/write ratios (10:1 and beyond), where most traffic never needs to touch the primary
- Staleness-tolerant reads such as feeds, product catalogs, and search results
- Heavy analytic and reporting queries that can be isolated on a dedicated replica so they never compete with transactional traffic
Before deploying replicas, instrument your application to capture actual query patterns. Use database query logs, APM tools, or built-in statistics (pg_stat_statements in PostgreSQL, performance_schema in MySQL) to understand your read/write ratio, query latency distribution, and connection utilization. This data should drive replica sizing and routing strategy.
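As a sketch of that instrumentation step, assuming PostgreSQL with the pg_stat_statements extension enabled (column names here match PostgreSQL 13+) and the node-postgres (`pg`) client:

```typescript
import { Client } from 'pg';

// Rough workload profile from pg_stat_statements: an approximate read/write
// ratio (classified by leading keyword, a crude but useful heuristic) and
// the reads consuming the most total time.
async function profileWorkload(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    const ratio = await client.query(`
      SELECT sum(calls) FILTER (WHERE query ILIKE 'select%') AS reads,
             sum(calls) FILTER (
               WHERE query ILIKE ANY (ARRAY['insert%', 'update%', 'delete%'])
             ) AS writes
      FROM pg_stat_statements
    `);
    console.log('read/write counts:', ratio.rows[0]);

    const topReads = await client.query(`
      SELECT query, calls, total_exec_time, mean_exec_time
      FROM pg_stat_statements
      WHERE query ILIKE 'select%'
      ORDER BY total_exec_time DESC
      LIMIT 10
    `);
    console.table(topReads.rows);
  } finally {
    await client.end();
  }
}
```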
Read replicas are not an infinitely scalable solution. Several constraints limit how many replicas you can effectively deploy and how much read capacity you can gain.
Replication fan-out limits:
Each replica maintains a connection to the primary and consumes resources on the primary for WAL/binlog transmission. Practical limits include:
PostgreSQL: Typically supports 10-20 streaming replicas before primary overhead becomes significant. Each replica requires a replication slot and WAL sender process.
MySQL: Similar limits, with binlog dump threads consuming resources per replica. MySQL 8.0 improved parallel replication but primary overhead remains.
Cloud-managed services: May have explicit limits (e.g., RDS allows up to 15 read replicas for MySQL/PostgreSQL).
Scaling strategies beyond direct replication:

- **Cascading replication:** replicas replicate from other replicas rather than from the primary, trading extra lag for reduced fan-out load on the primary
- **Caching tiers:** absorb repeated point lookups before they reach any database
- **Sharding:** partition the data so each shard has its own primary and replica set
Capacity planning formula:
A rough capacity model for read replicas:
Required Replicas = (Peak Read Rate × Average Query Time × Safety Factor) / Per-Replica Concurrent Capacity

Peak read rate multiplied by average query time gives the average number of queries in flight (Little's law); the safety factor adds headroom for traffic spikes.

Example calculation, with a peak read rate of 1,000 queries/second, an average query time of 0.05 seconds, a per-replica capacity of 500 concurrent queries, and a safety factor of 1.5:

Required = (1000 × 0.05 × 1.5) / 500 = 0.15, which rounds up to 1; with the redundancy rule below, that means 2 replicas minimum.
Always add at least one additional replica for redundancy—if one fails, remaining replicas must absorb its traffic.
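The same model expressed as code, using the illustrative inputs from the example above:

```typescript
// A minimal sketch of the capacity model above; the inputs are the
// illustrative numbers from the worked example, not benchmarks.
interface CapacityInputs {
  peakReadRatePerSecond: number; // peak read arrival rate (queries/second)
  avgQueryTimeSeconds: number;   // average read query duration
  replicaConcurrency: number;    // concurrent queries one replica sustains
  safetyFactor: number;          // headroom multiplier, e.g. 1.5
}

function requiredReplicas(inputs: CapacityInputs): number {
  // Little's law: arrival rate × service time = average queries in flight
  const concurrencyDemand =
    inputs.peakReadRatePerSecond * inputs.avgQueryTimeSeconds;
  const withHeadroom = concurrencyDemand * inputs.safetyFactor;
  const needed = Math.ceil(withHeadroom / inputs.replicaConcurrency);
  // Always keep one extra replica so a single failure doesn't overload the rest
  return Math.max(needed, 1) + 1;
}

// Matches the worked example: (1000 × 0.05 × 1.5) / 500 → 1, plus redundancy → 2
console.log(requiredReplicas({
  peakReadRatePerSecond: 1000,
  avgQueryTimeSeconds: 0.05,
  replicaConcurrency: 500,
  safetyFactor: 1.5,
}));
```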
Unlike stateless application servers, database replicas cannot be instantly scaled. Creating a new replica requires copying the entire dataset—a process that takes hours for large databases. Plan capacity ahead of demand, and consider maintaining warm standby replicas that can be promoted to active read duty during traffic spikes.
Deploying read replicas effectively requires attention to operational details that determine success or failure in production.
Configure replicas to reject writes at the database level (for example, default_transaction_read_only = on in PostgreSQL). This prevents accidental writes from ever reaching a replica.
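A startup-time guard can enforce this in the router. A minimal sketch, assuming PostgreSQL and the node-postgres (`pg`) client; `pg_is_in_recovery()` reports whether the server is a streaming-replication standby:

```typescript
import { Client } from 'pg';

// Refuse to use a connection as a replica unless it actually points at a
// read-only node: pg_is_in_recovery() is true on a standby, and
// default_transaction_read_only = on marks an explicitly read-only server.
// This catches misconfigured endpoints before they serve traffic.
async function assertReadOnlyReplica(client: Client): Promise<void> {
  const recovery = await client.query('SELECT pg_is_in_recovery() AS standby');
  const setting = await client.query('SHOW default_transaction_read_only');
  const isStandby = recovery.rows[0].standby === true;
  const isReadOnly = setting.rows[0].default_transaction_read_only === 'on';
  if (!isStandby && !isReadOnly) {
    throw new Error('Connection is writable; refusing to route reads to it as a replica');
  }
}
```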
Health checks make replica routing safe: the load balancer should continuously verify connectivity and replication lag, and stop routing to replicas that fall behind:

```typescript
// Replica health check implementation

// Minimal connection abstraction so the example is self-contained
interface DatabaseConnection {
  query(sql: string): Promise<{ rows: Array<Record<string, unknown>> }>;
}

interface ReplicaHealth {
  isHealthy: boolean;
  lagSeconds: number;
  lastChecked: Date;
}

const MAX_ACCEPTABLE_LAG_SECONDS = 30;
const HEALTH_CHECK_INTERVAL_MS = 5000; // poll cadence for a background checker (not shown)

async function checkReplicaHealth(replica: DatabaseConnection): Promise<ReplicaHealth> {
  try {
    // Check connectivity with a simple query
    await replica.query('SELECT 1');

    // Check replication lag (PostgreSQL example)
    const lagResult = await replica.query(`
      SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds
    `);
    // pg returns numerics as strings, so coerce before comparing
    const lagSeconds = Number(lagResult.rows[0]?.lag_seconds ?? Infinity);

    return {
      isHealthy: lagSeconds <= MAX_ACCEPTABLE_LAG_SECONDS,
      lagSeconds,
      lastChecked: new Date(),
    };
  } catch (error) {
    return {
      isHealthy: false,
      lagSeconds: Infinity,
      lastChecked: new Date(),
    };
  }
}

// Load balancer integrates health status
class ReplicaLoadBalancer {
  private replicas = new Map<string, { connection: DatabaseConnection; health: ReplicaHealth }>();

  getHealthyReplica(): DatabaseConnection | null {
    const healthy = Array.from(this.replicas.values())
      .filter(r => r.health.isHealthy)
      .sort((a, b) => a.health.lagSeconds - b.health.lagSeconds); // Prefer lowest lag

    return healthy.length > 0 ? healthy[0].connection : null;
  }
}
```

We've established the foundational concepts for scaling SQL reads through replica architectures. Let's consolidate the essential takeaways:

- Reads typically outnumber writes by 10:1 or more, and it is read volume, not write volume, that overwhelms a single primary's CPU, memory, I/O, and connections
- Read replicas apply the primary's WAL or binlog stream and serve read-only traffic, multiplying read capacity while the primary retains sole authority over writes
- Read/write splitting can live in application code, drivers, proxies, or infrastructure, and production systems often combine these layers
- Staleness-tolerant, read-heavy query patterns benefit most from replicas; consistency-critical reads should be routed to the primary
- Replication fan-out, provisioning time, and lag all bound how far replicas scale, so plan capacity with headroom and redundancy
What's next:
With the architectural foundation established, the next page dives into Replica Lag Handling—the inherent trade-off of asynchronous replication. We'll explore what lag is, why it occurs, how to measure it, and strategies for building applications that gracefully handle stale reads.
You now understand the fundamental architecture of read replicas and how to offload read traffic from your primary database. The concepts covered here—read/write splitting, traffic pattern analysis, capacity planning, and operational best practices—form the foundation for all subsequent topics in this module.