Having multiple read replicas is only half the solution. The other half—arguably the more complex half—is intelligently distributing read traffic across those replicas. A naive approach might send all traffic to the first available replica, leaving others idle. A slightly better approach might round-robin evenly, ignoring differences in replica capacity or current load.
Production systems require more sophistication. Replicas fail, experience lag spikes, or become overloaded. Network paths vary in latency. Different query types stress different resources. The load balancing strategy must account for all these factors while adding minimal latency and maintaining high availability.
This page explores load balancing for database reads in depth—from fundamental algorithms to production-ready patterns that handle the messy realities of distributed systems.
By the end of this page, you will master load balancing algorithms and their trade-offs, understand how to detect and handle replica failures, implement connection pooling strategies that maximize efficiency, and design systems that remain performant even when individual replicas degrade.
The choice of load balancing algorithm significantly impacts how effectively replicas are utilized and how the system responds to failures and varying load conditions. Each algorithm has distinct characteristics suited to different scenarios.
| Algorithm | Mechanism | Best For | Limitations |
|---|---|---|---|
| Round Robin | Cycle through replicas in order | Homogeneous replicas, uniform query cost | Ignores replica health, capacity, current load |
| Weighted Round Robin | Round robin with configurable weights | Heterogeneous replicas (different capacities) | Static weights don't adapt to dynamic conditions |
| Least Connections | Route to replica with fewest active queries | Variable query duration workloads | Connection count ≠ actual load; long queries skew |
| Least Response Time | Route to replica with lowest recent latency | Latency-sensitive applications | Requires continuous latency measurement; cold start issues |
| Random | Select replica randomly | Many replicas; simplicity required | Uneven distribution with small replica counts |
| Weighted Random | Random with probability proportional to weight | Heterogeneous replicas; simple implementation | Less deterministic; harder to debug |
| Consistent Hashing | Hash query/user to specific replica | Session affinity; cache locality | Hotspots with skewed distributions; rebalancing on failures |
| Adaptive/Dynamic | Adjust based on real-time metrics | Production systems with variable load | Most complex; requires comprehensive monitoring |
Algorithm deep dive: Least Connections vs. Least Response Time
These two algorithms are often confused but behave differently under load:
Least Connections routes new queries to the replica with the fewest active connections/queries. This works well when:

- Queries have roughly uniform cost and duration, so each connection represents a similar amount of work
- Replicas have similar capacity, so equal connection counts imply equal load
- Query durations vary enough that a fixed rotation (round robin) would drift out of balance
However, if one replica is processing 10 lightweight queries while another processes 2 complex analytics queries, connection count doesn't reflect true load.
Least Response Time routes to the replica with the lowest observed latency. This better reflects actual performance but requires:

- Continuous latency measurement on every query (or a sampled subset)
- Smoothing, such as an exponential moving average, so a single slow query doesn't dominate the signal
- A cold-start policy for new or recovered replicas that have no latency history yet
```typescript
// Implementation of common load balancing algorithms

interface Replica {
  id: string;
  connection: DatabaseConnection;
  weight: number;
  activeConnections: number;
  avgLatencyMs: number;
  healthy: boolean;
}

// Round Robin
class RoundRobinBalancer {
  private replicas: Replica[];
  private currentIndex = 0;

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;
    const replica = healthy[this.currentIndex % healthy.length];
    this.currentIndex++;
    return replica;
  }
}

// Weighted Round Robin
class WeightedRoundRobinBalancer {
  private replicas: Replica[];
  private weights: number[];
  private currentWeights: number[];

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
    this.weights = replicas.map(r => r.weight);
    this.currentWeights = [...this.weights];
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;

    // Find replica with highest current weight
    let maxWeight = -1;
    let selected: Replica | null = null;
    let selectedIndex = -1;

    for (let i = 0; i < healthy.length; i++) {
      const originalIndex = this.replicas.indexOf(healthy[i]);
      if (this.currentWeights[originalIndex] > maxWeight) {
        maxWeight = this.currentWeights[originalIndex];
        selected = healthy[i];
        selectedIndex = originalIndex;
      }
    }

    if (selected && selectedIndex >= 0) {
      // Decrease selected weight, increase others
      const totalWeight = this.weights.reduce((a, b) => a + b, 0);
      this.currentWeights[selectedIndex] -= totalWeight;
      for (let i = 0; i < this.weights.length; i++) {
        this.currentWeights[i] += this.weights[i];
      }
    }

    return selected;
  }
}

// Least Connections
class LeastConnectionsBalancer {
  private replicas: Replica[];

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;

    // Sort by active connections, then by weight
    // (higher weight = tiebreaker preference)
    healthy.sort((a, b) => {
      if (a.activeConnections !== b.activeConnections) {
        return a.activeConnections - b.activeConnections;
      }
      return b.weight - a.weight;
    });

    return healthy[0];
  }
}

// Least Response Time with Exponential Moving Average
class LeastResponseTimeBalancer {
  private replicas: Replica[];
  private readonly alpha = 0.3; // Smoothing factor

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  recordLatency(replicaId: string, latencyMs: number): void {
    const replica = this.replicas.find(r => r.id === replicaId);
    if (replica) {
      // Exponential moving average
      replica.avgLatencyMs =
        this.alpha * latencyMs + (1 - this.alpha) * replica.avgLatencyMs;
    }
  }

  next(): Replica | null {
    const healthy = this.replicas.filter(r => r.healthy);
    if (healthy.length === 0) return null;

    // Sort by average latency
    healthy.sort((a, b) => a.avgLatencyMs - b.avgLatencyMs);
    return healthy[0];
  }
}
```

Load balancing depends on accurate knowledge of replica health. A load balancer that routes traffic to a failed replica provides the worst of both worlds: queries fail AND healthy replicas are underutilized.
Health check types:
Active health checks probe replicas continuously at fixed intervals, regardless of traffic. This ensures timely detection even during low-traffic periods but consumes resources.
Passive health checks observe actual query results and mark replicas unhealthy upon failures. This is efficient but may be slow to detect problems during low traffic.
Hybrid approaches combine both: active checks run at intervals, passive checks accelerate detection when failures occur during traffic.
```typescript
// Comprehensive health checker for read replicas

interface HealthCheckResult {
  healthy: boolean;
  connectivity: boolean;
  responsive: boolean;
  replicationActive: boolean;
  lagMs: number;
  checkDurationMs: number;
  timestamp: Date;
  error?: string;
}

interface HealthCheckConfig {
  connectTimeoutMs: number;
  queryTimeoutMs: number;
  maxLagMs: number;
  consecutiveFailuresBeforeUnhealthy: number;
  consecutiveSuccessesBeforeHealthy: number;
}

class ReplicaHealthChecker {
  private config: HealthCheckConfig;
  private consecutiveFailures = 0;
  private consecutiveSuccesses = 0;
  private currentlyHealthy = true;

  constructor(config: HealthCheckConfig) {
    this.config = config;
  }

  async check(replica: DatabaseConnection): Promise<HealthCheckResult> {
    const startTime = Date.now();
    const result: HealthCheckResult = {
      healthy: false,
      connectivity: false,
      responsive: false,
      replicationActive: false,
      lagMs: Infinity,
      checkDurationMs: 0,
      timestamp: new Date(),
    };

    try {
      // Test 1: Connectivity
      await this.testConnectivity(replica);
      result.connectivity = true;

      // Test 2: Responsiveness
      await this.testResponsiveness(replica);
      result.responsive = true;

      // Test 3: Replication status and lag
      const replicationStatus = await this.testReplication(replica);
      result.replicationActive = replicationStatus.active;
      result.lagMs = replicationStatus.lagMs;

      // Determine overall health
      result.healthy =
        result.connectivity &&
        result.responsive &&
        result.replicationActive &&
        result.lagMs <= this.config.maxLagMs;
    } catch (error) {
      result.error = error instanceof Error ? error.message : String(error);
      result.healthy = false;
    }

    result.checkDurationMs = Date.now() - startTime;

    // Apply hysteresis (prevent flapping)
    this.updateHealthState(result.healthy);
    result.healthy = this.currentlyHealthy;

    return result;
  }

  private async testConnectivity(replica: DatabaseConnection): Promise<void> {
    // Attempt to ping or run minimal command
    await Promise.race([
      replica.ping(),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Connect timeout')),
          this.config.connectTimeoutMs)
      ),
    ]);
  }

  private async testResponsiveness(replica: DatabaseConnection): Promise<void> {
    await Promise.race([
      replica.query('SELECT 1'),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Query timeout')),
          this.config.queryTimeoutMs)
      ),
    ]);
  }

  private async testReplication(
    replica: DatabaseConnection
  ): Promise<{ active: boolean; lagMs: number }> {
    // PostgreSQL example
    const result = await replica.query(`
      SELECT
        pg_is_in_recovery() AS is_replica,
        CASE
          WHEN pg_last_xact_replay_timestamp() IS NULL THEN -1
          ELSE EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000
        END AS lag_ms
    `);

    const row = result.rows[0];
    return {
      active: row.is_replica === true,
      lagMs: row.lag_ms >= 0 ? row.lag_ms : Infinity,
    };
  }

  private updateHealthState(checkPassed: boolean): void {
    if (checkPassed) {
      this.consecutiveSuccesses++;
      this.consecutiveFailures = 0;
      if (!this.currentlyHealthy &&
          this.consecutiveSuccesses >= this.config.consecutiveSuccessesBeforeHealthy) {
        this.currentlyHealthy = true;
        console.log('Replica marked healthy after recovery');
      }
    } else {
      this.consecutiveFailures++;
      this.consecutiveSuccesses = 0;
      if (this.currentlyHealthy &&
          this.consecutiveFailures >= this.config.consecutiveFailuresBeforeUnhealthy) {
        this.currentlyHealthy = false;
        console.warn('Replica marked unhealthy after consecutive failures');
      }
    }
  }
}
```

Healthy/unhealthy state should not toggle on every check. Require multiple consecutive failures before marking unhealthy, and multiple consecutive successes before marking healthy again. This prevents transient issues (single timeout, brief network hiccup) from causing rapid state changes that destabilize routing.
Database connections are expensive resources. Each connection consumes memory on both client and server, requires TCP state, and (for PostgreSQL) spawns a backend process. Connection pooling—reusing connections across multiple queries—is essential for efficient replica utilization.
Pooling architectures:
Application-level pooling maintains pools within each application instance. Libraries like HikariCP (Java), pgx (Go), and node-postgres provide built-in pooling. This is simple but creates many connections when applications scale horizontally.
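The core mechanics those libraries implement can be sketched in a few lines. This is an illustrative toy (`SimplePool` and `Conn` are invented names), not a substitute for a real pool: production pools add waiting queues with timeouts, connection validation, and eviction of stale connections:

```typescript
// Minimal sketch of application-level pooling mechanics:
// reuse idle connections, create new ones under a cap, report exhaustion.
interface Conn { id: number }

class SimplePool {
  private free: Conn[] = [];
  private total = 0;

  constructor(
    private readonly createConn: () => Conn, // driver factory (assumed)
    private readonly maxSize: number,
  ) {}

  // Hand out an idle connection, create one under the cap, or report
  // exhaustion (a real pool would queue the caller with a timeout).
  acquire(): Conn | null {
    const idle = this.free.pop();
    if (idle) return idle;
    if (this.total < this.maxSize) {
      this.total++;
      return this.createConn();
    }
    return null; // pool exhausted
  }

  release(conn: Conn): void {
    this.free.push(conn); // return to idle list for reuse
  }

  get idleCount(): number {
    return this.free.length;
  }
}
```

The key property is that `createConn` runs at most `maxSize` times regardless of how many queries flow through the pool, which is exactly what keeps per-connection server costs bounded.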
Connection proxy pooling interposes a pooler (PgBouncer, ProxySQL) between applications and databases. Applications connect to the pooler, which maintains far fewer connections to the actual database. This multiplexes many application connections onto fewer database connections.
```ini
; PgBouncer configuration for read replica pooling

[databases]
; Define read replica pool
myapp_read = host=replica1.internal,replica2.internal,replica3.internal port=5432 dbname=myapp
myapp_write = host=primary.internal port=5432 dbname=myapp

[pgbouncer]
; Pool mode determines how connections are shared
; session: connection locked until client disconnects (safest, least efficient)
; transaction: connection returned after each transaction (good balance)
; statement: connection returned after each statement (most efficient, restrictions apply)
pool_mode = transaction

; Connection limits
max_client_conn = 10000        ; Max connections from applications
default_pool_size = 25         ; Default connections per database user
min_pool_size = 5              ; Minimum connections to keep open
reserve_pool_size = 5          ; Extra connections for spike handling
reserve_pool_timeout = 3       ; Seconds before using reserve pool

; Health and validation
server_check_delay = 30        ; Seconds between server checks
server_check_query = SELECT 1  ; Query for health checks
server_lifetime = 3600         ; Max seconds a connection lives
server_idle_timeout = 600      ; Seconds before closing idle connection

; Logging and monitoring
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1

; Security
auth_type = scram-sha-256
max_db_connections = 50        ; Max connections per database

[users]
; User-specific pool configurations
app_user = pool_mode=transaction max_user_connections=100
```

Transaction pooling is the most common choice for read replicas. It returns connections to the pool after each transaction completes, enabling high multiplexing. However, it prevents session-level features like prepared statements or session variables. Statement pooling offers maximum efficiency but requires queries to be completely independent.
Not all queries are equal. Some queries complete in milliseconds; others run for seconds or minutes. Some access hot data cached in memory; others trigger disk I/O. Effective load balancing accounts for these differences.
Preventing hotspots:
Hotspots occur when traffic concentrates on one replica while others remain underutilized. Common causes include:

- Consistent hashing over a skewed key distribution, where a few hot users or tenants map to the same replica
- Session affinity that pins long-lived, chatty clients to one replica
- Static weights that no longer reflect actual replica capacity or current load
Mitigation strategies:

- Spread hot keys across several replicas (e.g., append a small random suffix before hashing) instead of pinning each key to exactly one
- Prefer adaptive algorithms that incorporate real-time load over purely static assignment
- Monitor per-replica traffic distribution and alert on sustained deviation from the mean
```typescript
// Intelligent query routing with workload isolation

type QueryCategory = 'simple' | 'complex' | 'analytics' | 'unknown';

interface ReplicaPool {
  name: string;
  replicas: Replica[];
  balancer: LoadBalancer;
}

class QueryRouter {
  private pools: Map<QueryCategory, ReplicaPool>;
  private queryClassifier: QueryClassifier;

  constructor(pools: Map<QueryCategory, ReplicaPool>, classifier: QueryClassifier) {
    this.pools = pools;
    this.queryClassifier = classifier;
  }

  route(query: string): Replica | null {
    const category = this.classifyQuery(query);
    const pool = this.pools.get(category) ?? this.pools.get('unknown');
    if (!pool) {
      throw new Error('No available replica pool');
    }
    return pool.balancer.next();
  }

  private classifyQuery(query: string): QueryCategory {
    const normalized = query.trim().toUpperCase();

    // Simple heuristics - production systems use more sophisticated analysis
    if (normalized.startsWith('SELECT')) {
      // Check for analytics patterns
      if (this.isAnalyticsQuery(normalized)) {
        return 'analytics';
      }
      // Check for complex patterns
      if (this.isComplexQuery(normalized)) {
        return 'complex';
      }
      return 'simple';
    }
    return 'unknown';
  }

  private isAnalyticsQuery(query: string): boolean {
    const analyticsPatterns = [
      /GROUP BY/i,
      /HAVING/i,
      /\bSUM\b|\bAVG\b|\bCOUNT\b.*\bFROM\b/i,
      /ORDER BY.*LIMIT\s+\d{4,}/i,  // Large LIMIT
      /BETWEEN.*AND.*BETWEEN/i,     // Multiple range conditions
    ];
    return analyticsPatterns.some(pattern => pattern.test(query));
  }

  private isComplexQuery(query: string): boolean {
    const joinCount = (query.match(/\bJOIN\b/gi) || []).length;
    const subqueryCount = (query.match(/\(\s*SELECT/gi) || []).length;
    return joinCount >= 3 || subqueryCount >= 2;
  }
}

// Example pool configuration
const simplePools = new Map<QueryCategory, ReplicaPool>([
  ['simple', {
    name: 'OLTP Pool',
    replicas: [replica1, replica2],
    balancer: new LeastConnectionsBalancer([replica1, replica2]),
  }],
  ['analytics', {
    name: 'Analytics Pool',
    replicas: [replica3],
    balancer: new RoundRobinBalancer([replica3]),
  }],
  ['complex', {
    name: 'Complex Query Pool',
    replicas: [replica2, replica3],
    balancer: new LeastResponseTimeBalancer([replica2, replica3]),
  }],
  ['unknown', {
    name: 'Default Pool',
    replicas: [replica1, replica2, replica3],
    balancer: new RoundRobinBalancer([replica1, replica2, replica3]),
  }],
]);
```

Parsing queries to classify them adds latency. Keep classification logic lightweight—simple pattern matching rather than full SQL parsing. For complex classification needs, consider labeling queries at the application level (e.g., repository methods know their query category) rather than parsing SQL at runtime.
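That application-level labeling idea can be sketched as repository methods that declare their own category, so routing becomes a map lookup instead of SQL parsing. `OrderRepository` and its queries are hypothetical examples:

```typescript
// Sketch: each repository method labels its query with a category chosen
// by the author, who knows the query's cost better than any SQL parser.
type QueryCategory = 'simple' | 'complex' | 'analytics' | 'unknown';

interface LabeledQuery {
  sql: string;
  params: unknown[];
  category: QueryCategory;
}

class OrderRepository {
  // The method author knows this is a cheap point lookup.
  findById(id: number): LabeledQuery {
    return {
      sql: 'SELECT * FROM orders WHERE id = $1',
      params: [id],
      category: 'simple',
    };
  }

  // ...and that this aggregation belongs on the analytics pool.
  revenueByYear(year: number): LabeledQuery {
    return {
      sql: "SELECT date_trunc('month', created_at) AS m, SUM(total) " +
           'FROM orders WHERE EXTRACT(YEAR FROM created_at) = $1 GROUP BY m',
      params: [year],
      category: 'analytics',
    };
  }
}
```

The router then dispatches on `category` directly, paying no per-query parsing cost and avoiding misclassification by heuristics.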
Robust load balancing anticipates failures. Replicas crash, become unreachable, or experience degraded performance. The system must continue operating with reduced capacity rather than failing entirely.
```typescript
// Circuit breaker implementation for replica connections

enum CircuitState {
  CLOSED,     // Normal operation - requests flow through
  OPEN,       // Failures exceeded threshold - requests fail fast
  HALF_OPEN,  // Testing recovery - limited requests allowed
}

interface CircuitBreakerConfig {
  failureThreshold: number;     // Failures before opening circuit
  successThreshold: number;     // Successes in half-open before closing
  openDurationMs: number;       // How long circuit stays open
  halfOpenMaxRequests: number;  // Max requests allowed in half-open
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failures = 0;
  private successes = 0;
  private lastFailureTime = 0;
  private halfOpenRequests = 0;
  private config: CircuitBreakerConfig;

  constructor(config: CircuitBreakerConfig) {
    this.config = config;
  }

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (!this.canExecute()) {
      throw new Error('Circuit breaker is OPEN - failing fast');
    }

    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private canExecute(): boolean {
    switch (this.state) {
      case CircuitState.CLOSED:
        return true;
      case CircuitState.OPEN:
        // Check if enough time has passed to try again
        if (Date.now() - this.lastFailureTime >= this.config.openDurationMs) {
          this.transitionTo(CircuitState.HALF_OPEN);
          return true;
        }
        return false;
      case CircuitState.HALF_OPEN:
        // Allow limited requests in half-open state
        return this.halfOpenRequests < this.config.halfOpenMaxRequests;
    }
  }

  private recordSuccess(): void {
    this.failures = 0;
    if (this.state === CircuitState.HALF_OPEN) {
      this.successes++;
      if (this.successes >= this.config.successThreshold) {
        this.transitionTo(CircuitState.CLOSED);
      }
    }
  }

  private recordFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.state === CircuitState.CLOSED) {
      if (this.failures >= this.config.failureThreshold) {
        this.transitionTo(CircuitState.OPEN);
      }
    } else if (this.state === CircuitState.HALF_OPEN) {
      // Single failure in half-open returns to open
      this.transitionTo(CircuitState.OPEN);
    }
  }

  private transitionTo(newState: CircuitState): void {
    console.log(`Circuit breaker: ${CircuitState[this.state]} -> ${CircuitState[newState]}`);
    this.state = newState;
    this.successes = 0;
    this.halfOpenRequests = 0;
  }

  getState(): CircuitState {
    return this.state;
  }
}

// Usage with replicas
class ReplicaWithCircuitBreaker {
  private replica: Replica;
  private circuitBreaker: CircuitBreaker;

  constructor(replica: Replica, config: CircuitBreakerConfig) {
    this.replica = replica;
    this.circuitBreaker = new CircuitBreaker(config);
  }

  async query<T>(sql: string, params: unknown[]): Promise<T> {
    return this.circuitBreaker.execute(() =>
      this.replica.connection.query(sql, params)
    );
  }

  isAvailable(): boolean {
    return this.circuitBreaker.getState() !== CircuitState.OPEN;
  }
}
```

When falling back to primary reads, monitor primary load carefully. If the primary becomes overwhelmed, you may need to shed load entirely (return errors) rather than cascade failure to the primary. Consider implementing request priority: critical reads get primary access, non-critical reads receive stale data or errors.
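The request-priority idea can be sketched as a small routing decision. The `primaryLoad` gauge and the 0.8 ceiling are illustrative assumptions; a real system would derive both from actual primary metrics:

```typescript
// Sketch of priority-aware fallback when replicas are unavailable:
// critical reads may fall back to the primary while it has headroom;
// non-critical reads are shed rather than risk cascading failure.
type ReadTarget = 'replica' | 'primary' | 'shed';

function chooseReadTarget(
  healthyReplicas: number,
  primaryLoad: number,       // 0 = idle, 1 = saturated (assumed gauge)
  critical: boolean,
  primaryLoadCeiling = 0.8,  // above this, protect the primary
): ReadTarget {
  if (healthyReplicas > 0) return 'replica';
  // No replicas left: only critical reads may touch the primary,
  // and only while it still has headroom.
  if (critical && primaryLoad < primaryLoadCeiling) return 'primary';
  return 'shed'; // serve stale data or an error upstream
}
```

Shedding non-critical reads deliberately trades availability of some traffic for survival of the write path, which is usually the right trade when the alternative is primary saturation.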
Effective load balancing requires comprehensive visibility into replica health, traffic distribution, and performance. Without observability, you're flying blind.
| Metric Category | Specific Metrics | Alert Thresholds | Dashboarding |
|---|---|---|---|
| Traffic Distribution | Queries per replica, QPS per replica | Deviation > 30% from mean | Stacked area chart over time |
| Latency | p50, p95, p99 per replica | p99 > 2x baseline | Heatmap with latency buckets |
| Error Rate | Errors per replica, error types | Error rate > 1% | Time series with error breakdown |
| Connection Pools | Active, idle, waiting connections per pool | Waiting > 0 sustained | Pool utilization gauges |
| Health Status | Healthy replica count, health check latency | Healthy < 2 (redundancy loss) | Status indicator panel |
| Replication Lag | Lag per replica in ms/bytes | Any replica > threshold | Per-replica lag chart |
| Circuit Breakers | Open circuits, trip frequency | Any circuit open | State history timeline |
Tracing distributed queries:
When debugging latency issues, correlate application-level traces with database query execution. Modern observability stacks support distributed tracing that follows a request from load balancer through replica selection to query execution and back.
Essential trace spans for database reads:

- Connection acquisition (time spent waiting for a pooled connection)
- Replica selection (the load balancer's decision, including any health lookups)
- Query execution (time spent on the replica itself)
- Result transfer and deserialization back to the application
High latency in connection acquisition indicates pool exhaustion. High latency in replica selection may indicate a slow health check in the critical path. High query execution time suggests replica capacity issues or query optimization needs.
Define Service Level Objectives (SLOs) for read operations—e.g., '99% of reads complete in under 100ms.' Alert when error budget is being consumed too quickly rather than on individual threshold violations. This prevents alert fatigue while ensuring you catch meaningful degradation.
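A burn-rate alert for such an SLO reduces to simple arithmetic. The numbers and the threshold of 10 below are illustrative; production policies typically combine multiple windows and burn-rate thresholds:

```typescript
// Sketch of burn-rate alerting for a read SLO such as
// "99% of reads complete in under 100ms".
function burnRate(
  badEvents: number,    // reads that violated the SLO in the window
  totalEvents: number,  // all reads in the window
  sloTarget: number,    // e.g. 0.99
): number {
  const errorBudget = 1 - sloTarget;      // allowed failure fraction
  const observedBadRatio = badEvents / totalEvents;
  // 1.0 means the budget is being consumed exactly at the sustainable rate.
  return observedBadRatio / errorBudget;
}

// Alert only when the budget burns much faster than sustainable,
// e.g. more than 10x over a short window (illustrative threshold).
function shouldAlert(rate: number, threshold = 10): boolean {
  return rate > threshold;
}
```

A burn rate of 2 means the monthly budget would be gone in half a month; a rate of 15 means hours. Alerting on the rate, not on individual slow reads, is what keeps the pager quiet during normal variance.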
Effective load balancing transforms a collection of replicas into a cohesive, resilient read tier. The strategies covered here form the foundation for building systems that scale reliably.
What's next:
With load balancing fundamentals established, the next page examines Consistency Considerations—how to reason about data freshness, when strong consistency is required, and patterns for achieving read-your-writes semantics in replica architectures.
You now understand the algorithms, patterns, and operational practices for distributing read traffic across replicas. These concepts apply whether you're building custom routing logic or configuring managed load balancers and database proxies.