Having multiple read replicas is only half the solution. The other half—arguably the more complex half—is intelligently distributing read traffic across those replicas. A naive approach might send all traffic to the first available replica, leaving others idle. A slightly better approach might round-robin evenly, ignoring differences in replica capacity or current load.
Production systems require more sophistication. Replicas fail, experience lag spikes, or become overloaded. Network paths vary in latency. Different query types stress different resources. The load balancing strategy must account for all these factors while adding minimal latency and maintaining high availability.
This page explores load balancing for database reads in depth—from fundamental algorithms to production-ready patterns that handle the messy realities of distributed systems.
By the end of this page, you will master load balancing algorithms and their trade-offs, understand how to detect and handle replica failures, implement connection pooling strategies that maximize efficiency, and design systems that remain performant even when individual replicas degrade.
The choice of load balancing algorithm significantly impacts how effectively replicas are utilized and how the system responds to failures and varying load conditions. Each algorithm has distinct characteristics suited to different scenarios.
| Algorithm | Mechanism | Best For | Limitations |
|---|---|---|---|
| Round Robin | Cycle through replicas in order | Homogeneous replicas, uniform query cost | Ignores replica health, capacity, current load |
| Weighted Round Robin | Round robin with configurable weights | Heterogeneous replicas (different capacities) | Static weights don't adapt to dynamic conditions |
| Least Connections | Route to replica with fewest active queries | Variable query duration workloads | Connection count ≠ actual load; long queries skew |
| Least Response Time | Route to replica with lowest recent latency | Latency-sensitive applications | Requires continuous latency measurement; cold start issues |
| Random | Select replica randomly | Many replicas; simplicity required | Uneven distribution with small replica counts |
| Weighted Random | Random with probability proportional to weight | Heterogeneous replicas; simple implementation | Less deterministic; harder to debug |
| Consistent Hashing | Hash query/user to specific replica | Session affinity; cache locality | Hotspots with skewed distributions; rebalancing on failures |
| Adaptive/Dynamic | Adjust based on real-time metrics | Production systems with variable load | Most complex; requires comprehensive monitoring |
Algorithm deep dive: Least Connections vs. Least Response Time
These two algorithms are often confused but behave differently under load:
Least Connections routes new queries to the replica with the fewest active connections/queries. This works well when:

- Queries have roughly uniform cost and duration, so each connection represents a similar amount of work
- Replicas have similar capacity, so equal connection counts imply equal load
- Query durations vary enough that a fixed rotation (round robin) would drift out of balance
However, if one replica is processing 10 lightweight queries while another processes 2 complex analytics queries, connection count doesn't reflect true load.
Least Response Time routes to the replica with the lowest observed latency. This better reflects actual performance but requires:

- Continuous latency measurement on every query (or a sampled subset)
- Smoothing, such as an exponential moving average, so a single slow query doesn't dominate the signal
- A cold-start policy for new or recovered replicas that have no latency history yet
```typescript
// Implementation of common load balancing algorithms

interface Replica {
  id: string;
  connection: DatabaseConnection;
  weight: number;
  activeConnections: number;
  avgLatencyMs: number;
  healthy: boolean;
}

// Round Robin
class RoundRobinBalancer {
  private replicas: Replica[];
  private currentIndex = 0;

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;
    const replica = healthy[this.currentIndex % healthy.length];
    this.currentIndex++;
    return replica;
  }
}

// Weighted Round Robin
class WeightedRoundRobinBalancer {
  private replicas: Replica[];
  private weights: number[];
  private currentWeights: number[];

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
    this.weights = replicas.map(r => r.weight);
    this.currentWeights = [...this.weights];
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;

    // Find replica with highest current weight
    let maxWeight = -1;
    let selected: Replica | null = null;
    let selectedIndex = -1;

    for (let i = 0; i < healthy.length; i++) {
      const originalIndex = this.replicas.indexOf(healthy[i]);
      if (this.currentWeights[originalIndex] > maxWeight) {
        maxWeight = this.currentWeights[originalIndex];
        selected = healthy[i];
        selectedIndex = originalIndex;
      }
    }

    if (selected && selectedIndex >= 0) {
      // Decrease selected weight, increase others
      const totalWeight = this.weights.reduce((a, b) => a + b, 0);
      this.currentWeights[selectedIndex] -= totalWeight;
      for (let i = 0; i < this.weights.length; i++) {
        this.currentWeights[i] += this.weights[i];
      }
    }

    return selected;
  }
}

// Least Connections
class LeastConnectionsBalancer {
  private replicas: Replica[];

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;

    // Sort by active connections, then by weight
    // (higher weight = tiebreaker preference)
    healthy.sort((a, b) => {
      if (a.activeConnections !== b.activeConnections) {
        return a.activeConnections - b.activeConnections;
      }
      return b.weight - a.weight;
    });

    return healthy[0];
  }
}

// Least Response Time with Exponential Moving Average
class LeastResponseTimeBalancer {
  private replicas: Replica[];
  private readonly alpha = 0.3; // Smoothing factor

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  recordLatency(replicaId: string, latencyMs: number): void {
    const replica = this.replicas.find(r => r.id === replicaId);
    if (replica) {
      // Exponential moving average
      replica.avgLatencyMs =
        this.alpha * latencyMs + (1 - this.alpha) * replica.avgLatencyMs;
    }
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;

    // Sort by average latency
    healthy.sort((a, b) => a.avgLatencyMs - b.avgLatencyMs);
    return healthy[0];
  }
}
```

Load balancing depends on accurate knowledge of replica health. A load balancer that routes traffic to a failed replica provides the worst of both worlds: queries fail AND healthy replicas are underutilized.
Health check types:
Active health checks probe replicas continuously at fixed intervals, regardless of traffic. This ensures timely detection even during low-traffic periods but consumes resources.
Passive health checks observe actual query results and mark replicas unhealthy upon failures. This is efficient but may be slow to detect problems during low traffic.
Hybrid approaches combine both: active checks run at intervals, passive checks accelerate detection when failures occur during traffic.
```typescript
// Comprehensive health checker for read replicas

interface HealthCheckResult {
  healthy: boolean;
  connectivity: boolean;
  responsive: boolean;
  replicationActive: boolean;
  lagMs: number;
  checkDurationMs: number;
  timestamp: Date;
  error?: string;
}

interface HealthCheckConfig {
  connectTimeoutMs: number;
  queryTimeoutMs: number;
  maxLagMs: number;
  consecutiveFailuresBeforeUnhealthy: number;
  consecutiveSuccessesBeforeHealthy: number;
}

class ReplicaHealthChecker {
  private config: HealthCheckConfig;
  private consecutiveFailures = 0;
  private consecutiveSuccesses = 0;
  private currentlyHealthy = true;

  constructor(config: HealthCheckConfig) {
    this.config = config;
  }

  async check(replica: DatabaseConnection): Promise<HealthCheckResult> {
    const startTime = Date.now();
    const result: HealthCheckResult = {
      healthy: false,
      connectivity: false,
      responsive: false,
      replicationActive: false,
      lagMs: Infinity,
      checkDurationMs: 0,
      timestamp: new Date(),
    };

    try {
      // Test 1: Connectivity
      await this.testConnectivity(replica);
      result.connectivity = true;

      // Test 2: Responsiveness
      await this.testResponsiveness(replica);
      result.responsive = true;

      // Test 3: Replication status and lag
      const replicationStatus = await this.testReplication(replica);
      result.replicationActive = replicationStatus.active;
      result.lagMs = replicationStatus.lagMs;

      // Determine overall health
      result.healthy =
        result.connectivity &&
        result.responsive &&
        result.replicationActive &&
        result.lagMs <= this.config.maxLagMs;
    } catch (error) {
      result.error = error instanceof Error ? error.message : String(error);
      result.healthy = false;
    }

    result.checkDurationMs = Date.now() - startTime;

    // Apply hysteresis (prevent flapping)
    this.updateHealthState(result.healthy);
    result.healthy = this.currentlyHealthy;

    return result;
  }

  private async testConnectivity(replica: DatabaseConnection): Promise<void> {
    // Attempt to ping or run minimal command
    await Promise.race([
      replica.ping(),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Connect timeout')),
          this.config.connectTimeoutMs)
      ),
    ]);
  }

  private async testResponsiveness(replica: DatabaseConnection): Promise<void> {
    await Promise.race([
      replica.query('SELECT 1'),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Query timeout')),
          this.config.queryTimeoutMs)
      ),
    ]);
  }

  private async testReplication(
    replica: DatabaseConnection
  ): Promise<{ active: boolean; lagMs: number }> {
    // PostgreSQL example
    const result = await replica.query(`
      SELECT
        pg_is_in_recovery() AS is_replica,
        CASE
          WHEN pg_last_xact_replay_timestamp() IS NULL THEN -1
          ELSE EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000
        END AS lag_ms
    `);

    const row = result.rows[0];
    return {
      active: row.is_replica === true,
      lagMs: row.lag_ms >= 0 ? row.lag_ms : Infinity,
    };
  }

  private updateHealthState(checkPassed: boolean): void {
    if (checkPassed) {
      this.consecutiveSuccesses++;
      this.consecutiveFailures = 0;
      if (!this.currentlyHealthy &&
          this.consecutiveSuccesses >= this.config.consecutiveSuccessesBeforeHealthy) {
        this.currentlyHealthy = true;
        console.log('Replica marked healthy after recovery');
      }
    } else {
      this.consecutiveFailures++;
      this.consecutiveSuccesses = 0;
      if (this.currentlyHealthy &&
          this.consecutiveFailures >= this.config.consecutiveFailuresBeforeUnhealthy) {
        this.currentlyHealthy = false;
        console.warn('Replica marked unhealthy after consecutive failures');
      }
    }
  }
}
```

Healthy/unhealthy state should not toggle on every check. Require multiple consecutive failures before marking unhealthy, and multiple consecutive successes before marking healthy again. This prevents transient issues (single timeout, brief network hiccup) from causing rapid state changes that destabilize routing.
Database connections are expensive resources. Each connection consumes memory on both client and server, requires TCP state, and (for PostgreSQL) spawns a backend process. Connection pooling—reusing connections across multiple queries—is essential for efficient replica utilization.
Pooling architectures:
Application-level pooling maintains pools within each application instance. Libraries like HikariCP (Java), pgx (Go), and node-postgres provide built-in pooling. This is simple but creates many connections when applications scale horizontally.
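The core mechanics those libraries implement can be sketched in a few lines. This is an illustrative toy (`SimplePool` and `Conn` are invented names), not a substitute for a real pool: production pools add waiting queues with timeouts, connection validation, and eviction of stale connections:

```typescript
// Minimal sketch of application-level pooling mechanics:
// reuse idle connections, create new ones under a cap, report exhaustion.
interface Conn { id: number }

class SimplePool {
  private free: Conn[] = [];
  private total = 0;

  constructor(
    private readonly createConn: () => Conn, // driver factory (assumed)
    private readonly maxSize: number,
  ) {}

  // Hand out an idle connection, create one under the cap, or report
  // exhaustion (a real pool would queue the caller with a timeout).
  acquire(): Conn | null {
    const idle = this.free.pop();
    if (idle) return idle;
    if (this.total < this.maxSize) {
      this.total++;
      return this.createConn();
    }
    return null; // pool exhausted
  }

  release(conn: Conn): void {
    this.free.push(conn); // return to idle list for reuse
  }

  get idleCount(): number {
    return this.free.length;
  }
}
```

The key property is that `createConn` runs at most `maxSize` times regardless of how many queries flow through the pool, which is exactly what keeps per-connection server costs bounded.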
Connection proxy pooling interposes a pooler (PgBouncer, ProxySQL) between applications and databases. Applications connect to the pooler, which maintains far fewer connections to the actual database. This multiplexes many application connections onto fewer database connections.
```ini
; PgBouncer configuration for read replica pooling

[databases]
; Define read replica pool
myapp_read = host=replica1.internal,replica2.internal,replica3.internal port=5432 dbname=myapp
myapp_write = host=primary.internal port=5432 dbname=myapp

[pgbouncer]
; Pool mode determines how connections are shared
; session: connection locked until client disconnects (safest, least efficient)
; transaction: connection returned after each transaction (good balance)
; statement: connection returned after each statement (most efficient, restrictions apply)
pool_mode = transaction

; Connection limits
max_client_conn = 10000        ; Max connections from applications
default_pool_size = 25         ; Default connections per database user
min_pool_size = 5              ; Minimum connections to keep open
reserve_pool_size = 5          ; Extra connections for spike handling
reserve_pool_timeout = 3       ; Seconds before using reserve pool

; Health and validation
server_check_delay = 30        ; Seconds between server checks
server_check_query = SELECT 1  ; Query for health checks
server_lifetime = 3600         ; Max seconds a connection lives
server_idle_timeout = 600      ; Seconds before closing idle connection

; Logging and monitoring
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1

; Security
auth_type = scram-sha-256
max_db_connections = 50        ; Max connections per database

[users]
; User-specific pool configurations
app_user = pool_mode=transaction max_user_connections=100
```

Transaction pooling is the most common choice for read replicas. It returns connections to the pool after each transaction completes, enabling high multiplexing. However, it prevents session-level features like prepared statements or session variables. Statement pooling offers maximum efficiency but requires queries to be completely independent.
Not all queries are equal. Some queries complete in milliseconds; others run for seconds or minutes. Some access hot data cached in memory; others trigger disk I/O. Effective load balancing accounts for these differences.
Preventing hotspots:
Hotspots occur when traffic concentrates on one replica while others remain underutilized. Common causes include:

- Consistent hashing over a skewed key distribution, where a few hot users or tenants map to the same replica
- Session affinity that pins long-lived, chatty clients to one replica
- Static weights that no longer reflect actual replica capacity or current load
Mitigation strategies:

- Spread hot keys across several replicas (e.g., append a small random suffix before hashing) instead of pinning each key to exactly one
- Prefer adaptive algorithms that incorporate real-time load over purely static assignment
- Monitor per-replica traffic distribution and alert on sustained deviation from the mean
```typescript
// Intelligent query routing with workload isolation

type QueryCategory = 'simple' | 'complex' | 'analytics' | 'unknown';

interface ReplicaPool {
  name: string;
  replicas: Replica[];
  balancer: LoadBalancer;
}

class QueryRouter {
  private pools: Map<QueryCategory, ReplicaPool>;
  private queryClassifier: QueryClassifier;

  constructor(pools: Map<QueryCategory, ReplicaPool>, classifier: QueryClassifier) {
    this.pools = pools;
    this.queryClassifier = classifier;
  }

  route(query: string): Replica | null {
    const category = this.classifyQuery(query);
    const pool = this.pools.get(category) ?? this.pools.get('unknown');
    if (!pool) {
      throw new Error('No available replica pool');
    }
    return pool.balancer.next();
  }

  private classifyQuery(query: string): QueryCategory {
    const normalized = query.trim().toUpperCase();

    // Simple heuristics - production systems use more sophisticated analysis
    if (normalized.startsWith('SELECT')) {
      // Check for analytics patterns
      if (this.isAnalyticsQuery(normalized)) {
        return 'analytics';
      }
      // Check for complex patterns
      if (this.isComplexQuery(normalized)) {
        return 'complex';
      }
      return 'simple';
    }
    return 'unknown';
  }

  private isAnalyticsQuery(query: string): boolean {
    const analyticsPatterns = [
      /GROUP BY/i,
      /HAVING/i,
      /\bSUM\b|\bAVG\b|\bCOUNT\b.*\bFROM\b/i,
      /ORDER BY.*LIMIT\s+\d{4,}/i,  // Large LIMIT
      /BETWEEN.*AND.*BETWEEN/i,     // Multiple range conditions
    ];
    return analyticsPatterns.some(pattern => pattern.test(query));
  }

  private isComplexQuery(query: string): boolean {
    const joinCount = (query.match(/\bJOIN\b/gi) || []).length;
    const subqueryCount = (query.match(/\(\s*SELECT/gi) || []).length;
    return joinCount >= 3 || subqueryCount >= 2;
  }
}

// Example pool configuration
const simplePools = new Map<QueryCategory, ReplicaPool>([
  ['simple', {
    name: 'OLTP Pool',
    replicas: [replica1, replica2],
    balancer: new LeastConnectionsBalancer([replica1, replica2]),
  }],
  ['analytics', {
    name: 'Analytics Pool',
    replicas: [replica3],
    balancer: new RoundRobinBalancer([replica3]),
  }],
  ['complex', {
    name: 'Complex Query Pool',
    replicas: [replica2, replica3],
    balancer: new LeastResponseTimeBalancer([replica2, replica3]),
  }],
  ['unknown', {
    name: 'Default Pool',
    replicas: [replica1, replica2, replica3],
    balancer: new RoundRobinBalancer([replica1, replica2, replica3]),
  }],
]);
```

Parsing queries to classify them adds latency. Keep classification logic lightweight—simple pattern matching rather than full SQL parsing. For complex classification needs, consider labeling queries at the application level (e.g., repository methods know their query category) rather than parsing SQL at runtime.
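That application-level labeling idea can be sketched as repository methods that declare their own category, so routing becomes a map lookup instead of SQL parsing. `OrderRepository` and its queries are hypothetical examples:

```typescript
// Sketch: each repository method labels its query with a category chosen
// by the author, who knows the query's cost better than any SQL parser.
type QueryCategory = 'simple' | 'complex' | 'analytics' | 'unknown';

interface LabeledQuery {
  sql: string;
  params: unknown[];
  category: QueryCategory;
}

class OrderRepository {
  // The method author knows this is a cheap point lookup.
  findById(id: number): LabeledQuery {
    return {
      sql: 'SELECT * FROM orders WHERE id = $1',
      params: [id],
      category: 'simple',
    };
  }

  // ...and that this aggregation belongs on the analytics pool.
  revenueByYear(year: number): LabeledQuery {
    return {
      sql: "SELECT date_trunc('month', created_at) AS m, SUM(total) " +
           'FROM orders WHERE EXTRACT(YEAR FROM created_at) = $1 GROUP BY m',
      params: [year],
      category: 'analytics',
    };
  }
}
```

The router then dispatches on `category` directly, paying no per-query parsing cost and avoiding misclassification by heuristics.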
Robust load balancing anticipates failures. Replicas crash, become unreachable, or experience degraded performance. The system must continue operating with reduced capacity rather than failing entirely.
```typescript
// Circuit breaker implementation for replica connections

enum CircuitState {
  CLOSED,     // Normal operation - requests flow through
  OPEN,       // Failures exceeded threshold - requests fail fast
  HALF_OPEN,  // Testing recovery - limited requests allowed
}

interface CircuitBreakerConfig {
  failureThreshold: number;     // Failures before opening circuit
  successThreshold: number;     // Successes in half-open before closing
  openDurationMs: number;       // How long circuit stays open
  halfOpenMaxRequests: number;  // Max requests allowed in half-open
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failures = 0;
  private successes = 0;
  private lastFailureTime = 0;
  private halfOpenRequests = 0;
  private config: CircuitBreakerConfig;

  constructor(config: CircuitBreakerConfig) {
    this.config = config;
  }

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (!this.canExecute()) {
      throw new Error('Circuit breaker is OPEN - failing fast');
    }

    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private canExecute(): boolean {
    switch (this.state) {
      case CircuitState.CLOSED:
        return true;
      case CircuitState.OPEN:
        // Check if enough time has passed to try again
        if (Date.now() - this.lastFailureTime >= this.config.openDurationMs) {
          this.transitionTo(CircuitState.HALF_OPEN);
          return true;
        }
        return false;
      case CircuitState.HALF_OPEN:
        // Allow limited requests in half-open state
        return this.halfOpenRequests < this.config.halfOpenMaxRequests;
    }
  }

  private recordSuccess(): void {
    this.failures = 0;
    if (this.state === CircuitState.HALF_OPEN) {
      this.successes++;
      if (this.successes >= this.config.successThreshold) {
        this.transitionTo(CircuitState.CLOSED);
      }
    }
  }

  private recordFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.state === CircuitState.CLOSED) {
      if (this.failures >= this.config.failureThreshold) {
        this.transitionTo(CircuitState.OPEN);
      }
    } else if (this.state === CircuitState.HALF_OPEN) {
      // Single failure in half-open returns to open
      this.transitionTo(CircuitState.OPEN);
    }
  }

  private transitionTo(newState: CircuitState): void {
    console.log(`Circuit breaker: ${CircuitState[this.state]} -> ${CircuitState[newState]}`);
    this.state = newState;
    this.successes = 0;
    this.halfOpenRequests = 0;
  }

  getState(): CircuitState {
    return this.state;
  }
}

// Usage with replicas
class ReplicaWithCircuitBreaker {
  private replica: Replica;
  private circuitBreaker: CircuitBreaker;

  constructor(replica: Replica, config: CircuitBreakerConfig) {
    this.replica = replica;
    this.circuitBreaker = new CircuitBreaker(config);
  }

  async query<T>(sql: string, params: unknown[]): Promise<T> {
    return this.circuitBreaker.execute(() =>
      this.replica.connection.query(sql, params)
    );
  }

  isAvailable(): boolean {
    return this.circuitBreaker.getState() !== CircuitState.OPEN;
  }
}
```

When falling back to primary reads, monitor primary load carefully. If the primary becomes overwhelmed, you may need to shed load entirely (return errors) rather than cascade failure to the primary. Consider implementing request priority: critical reads get primary access, non-critical reads receive stale data or errors.
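The request-priority idea can be sketched as a small routing decision. The `primaryLoad` gauge and the 0.8 ceiling are illustrative assumptions; a real system would derive both from actual primary metrics:

```typescript
// Sketch of priority-aware fallback when replicas are unavailable:
// critical reads may fall back to the primary while it has headroom;
// non-critical reads are shed rather than risk cascading failure.
type ReadTarget = 'replica' | 'primary' | 'shed';

function chooseReadTarget(
  healthyReplicas: number,
  primaryLoad: number,       // 0 = idle, 1 = saturated (assumed gauge)
  critical: boolean,
  primaryLoadCeiling = 0.8,  // above this, protect the primary
): ReadTarget {
  if (healthyReplicas > 0) return 'replica';
  // No replicas left: only critical reads may touch the primary,
  // and only while it still has headroom.
  if (critical && primaryLoad < primaryLoadCeiling) return 'primary';
  return 'shed'; // serve stale data or an error upstream
}
```

Shedding non-critical reads deliberately trades availability of some traffic for survival of the write path, which is usually the right trade when the alternative is primary saturation.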
Effective load balancing requires comprehensive visibility into replica health, traffic distribution, and performance. Without observability, you're flying blind.
| Metric Category | Specific Metrics | Alert Thresholds | Dashboarding |
|---|---|---|---|
| Traffic Distribution | Queries per replica, QPS per replica | Deviation > 30% from mean | Stacked area chart over time |
| Latency | p50, p95, p99 per replica | p99 > 2x baseline | Heatmap with latency buckets |
| Error Rate | Errors per replica, error types | Error rate > 1% | Time series with error breakdown |
| Connection Pools | Active, idle, waiting connections per pool | Waiting > 0 sustained | Pool utilization gauges |
| Health Status | Healthy replica count, health check latency | Healthy < 2 (redundancy loss) | Status indicator panel |
| Replication Lag | Lag per replica in ms/bytes | Any replica > threshold | Per-replica lag chart |
| Circuit Breakers | Open circuits, trip frequency | Any circuit open | State history timeline |
Tracing distributed queries:
When debugging latency issues, correlate application-level traces with database query execution. Modern observability stacks support distributed tracing that follows a request from load balancer through replica selection to query execution and back.
Essential trace spans for database reads:

- Connection acquisition (time spent waiting for a pooled connection)
- Replica selection (the load balancer's decision, including any health lookups)
- Query execution (time spent on the replica itself)
- Result transfer and deserialization back to the application
High latency in connection acquisition indicates pool exhaustion. High latency in replica selection may indicate a slow health check in the critical path. High query execution time suggests replica capacity issues or query optimization needs.
Define Service Level Objectives (SLOs) for read operations—e.g., '99% of reads complete in under 100ms.' Alert when error budget is being consumed too quickly rather than on individual threshold violations. This prevents alert fatigue while ensuring you catch meaningful degradation.
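A burn-rate alert for such an SLO reduces to simple arithmetic. The numbers and the threshold of 10 below are illustrative; production policies typically combine multiple windows and burn-rate thresholds:

```typescript
// Sketch of burn-rate alerting for a read SLO such as
// "99% of reads complete in under 100ms".
function burnRate(
  badEvents: number,    // reads that violated the SLO in the window
  totalEvents: number,  // all reads in the window
  sloTarget: number,    // e.g. 0.99
): number {
  const errorBudget = 1 - sloTarget;      // allowed failure fraction
  const observedBadRatio = badEvents / totalEvents;
  // 1.0 means the budget is being consumed exactly at the sustainable rate.
  return observedBadRatio / errorBudget;
}

// Alert only when the budget burns much faster than sustainable,
// e.g. more than 10x over a short window (illustrative threshold).
function shouldAlert(rate: number, threshold = 10): boolean {
  return rate > threshold;
}
```

A burn rate of 2 means the monthly budget would be gone in half a month; a rate of 15 means hours. Alerting on the rate, not on individual slow reads, is what keeps the pager quiet during normal variance.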
Effective load balancing transforms a collection of replicas into a cohesive, resilient read tier. The strategies covered here form the foundation for building systems that scale reliably.
What's next:
With load balancing fundamentals established, the next page examines Consistency Considerations—how to reason about data freshness, when strong consistency is required, and patterns for achieving read-your-writes semantics in replica architectures.
You now understand the algorithms, patterns, and operational practices for distributing read traffic across replicas. These concepts apply whether you're building custom routing logic or configuring managed load balancers and database proxies.