System Design (HLD)Why Replication?

Why Database Replication Matters

LevelIntermediate

Duration90 mins

TopicWhy Replication?

1 / 4

Read Scalability: The Primary Driver of Replication

The Read-Heavy Reality of Modern Systems

In the vast majority of production systems, reads vastly outnumber writes. Consider Twitter: for every tweet posted (a write), millions of users view their timelines (reads). On Amazon, for every purchase (a write), hundreds of product page views occur (reads). This asymmetry—often 100:1 or even 1000:1—creates a fundamental scaling challenge that shapes how we design database architectures.

A single database server, no matter how powerful, has finite capacity for handling concurrent connections, executing queries, and returning results. When read traffic grows beyond what a single machine can handle, the system buckles: response times spike, queries timeout, and users experience degraded service or complete unavailability.

Database replication directly addresses this challenge by distributing read operations across multiple copies of the data. Rather than funneling every read through a single database, replicas allow us to spread the load horizontally—transforming a vertical scaling problem into a horizontal scaling solution.

What You Will Learn

By the end of this page, you will understand why read scalability is the primary driver behind database replication, how replicas distribute read load, the mathematical foundations of horizontal read scaling, real-world patterns for routing reads to replicas, and the critical trade-offs between consistency and scalability that every architect must navigate.

The Read/Write Asymmetry Problem

Understanding read scalability begins with recognizing the fundamental asymmetry between read and write operations in production systems. This asymmetry exists across virtually every domain and application type.

Why reads dominate:

Most user interactions with software systems are consumptive rather than generative. Users browse, search, view, and read far more often than they create, update, or delete. Even in systems designed around content creation—social media, e-commerce, content management—the ratio of consumers to creators ensures that reads will always dominate.

Consider these real-world read/write ratios from production systems:

Read/Write Ratios in Production Systems
System Type	Read/Write Ratio	Explanation
News/Media Sites	10,000:1 or higher	Millions view articles; few writers publish content
E-commerce Catalog	500:1 to 1,000:1	Product browsing vastly exceeds purchases/inventory updates
Social Media Feeds	100:1 to 500:1	Timeline views dwarf post creation
User Authentication	50:1 to 100:1	Logins far exceed password changes or signups
Analytics Dashboards	1,000:1 or higher	Pre-aggregated data viewed repeatedly; raw data ingested once
Collaborative Editing	10:1 to 50:1	Even in write-heavy apps, reads for syncing dominate

The single-server bottleneck:

A standalone database server—regardless of hardware specifications—faces hard limits on concurrent query execution. These limits stem from:

CPU constraints: Query parsing, optimization, and execution consume CPU cycles
Memory constraints: Active connections, query buffers, and result sets require memory
I/O constraints: Disk reads for non-cached data introduce latency
Connection limits: Each client connection consumes resources; limits are typically 100-1000 for standard databases

When read traffic approaches these limits, the database degrades gracefully at first (slower responses) then catastrophically (connection refused, timeouts). Vertical scaling—adding more CPU, memory, or faster disks—provides temporary relief but has diminishing returns and hard ceilings.

The Vertical Scaling Trap

Upgrading to a more powerful single server is appealing in its simplicity, but it's a trap. Each doubling of capacity typically costs more than 2x (superlinear costs), and physical limits are absolute—you cannot buy a server with unlimited RAM or CPU. Additionally, a single server remains a single point of failure no matter how powerful it is.

How Replication Enables Read Scaling

Database replication fundamentally transforms the read scaling problem by creating multiple identical copies of data across independent servers. Each replica can independently serve read queries, effectively multiplying the system's read capacity.

The core mechanism:

A primary (or leader/master) database handles all write operations
One or more replicas (or followers/slaves) maintain copies of the primary's data
Changes from the primary are propagated to replicas through replication logs
Read queries can be directed to any replica, distributing the load

This architecture leverages a powerful insight: data can be copied, but it doesn't need to be re-computed. Once a write commits to the primary, the resulting state can be shared with replicas at minimal cost—far less than re-executing the business logic that produced the write.

Linear read scaling:

In an idealized model, adding replicas provides near-linear read scaling:

Read Capacity Scaling with Replicas
Configuration	Read Capacity	Relative Improvement
1 Primary (no replicas)	1x (baseline)	—
1 Primary + 1 Replica	~2x	+100%
1 Primary + 2 Replicas	~3x	+200%
1 Primary + 4 Replicas	~5x	+400%
1 Primary + 9 Replicas	~10x	+900%

Reality check—why scaling isn't perfectly linear:

In practice, several factors prevent perfect linear scaling:

Replication overhead: The primary must send data to all replicas, consuming CPU and network bandwidth
Consistency requirements: Strong consistency may require waiting for replica acknowledgment, adding latency
Connection routing overhead: Load balancers or proxy layers introduce additional latency
Hotspot data: If certain data is accessed disproportionately, caching at the application layer may be more effective than adding replicas

Nevertheless, replication typically achieves 70-90% of theoretical linear scaling—a remarkable improvement over single-server architectures.

The Economics of Read Replicas

Read replicas are often the most cost-effective scaling strategy. Adding a replica typically costs the same as the primary (or less, since replicas don't need write-optimized storage). Yet each replica can potentially double your read throughput. Compare this to upgrading the primary to a 2x more powerful instance, which often costs 3-4x more due to superlinear cloud pricing.

Read Replica Architectures

Several architectural patterns exist for implementing read replicas, each with distinct trade-offs in complexity, consistency, and performance.

Streaming replication (physical replication):

In streaming replication, the primary continuously streams its transaction log (WAL in PostgreSQL, binary log in MySQL) to replicas. Replicas apply these log entries in order, maintaining an exact binary copy of the primary's data files.

Advantages:

Low overhead: replicas apply pre-computed changes
Byte-for-byte identical data across all nodes
Relatively simple to set up with modern databases

Disadvantages:

Replicas must use identical database version and architecture
All data is replicated; no selective filtering
Storage requirements scale linearly with replicas

Logical replication:

Logical replication transmits higher-level changes (INSERT, UPDATE, DELETE statements or row-level changes) rather than low-level log entries. This allows more flexibility in how replicas process changes.

Advantages:

Replicas can have different schemas, indexes, or even database versions
Selective replication: replicate only specific tables or columns
Enables heterogeneous replication (e.g., PostgreSQL to MySQL)

Disadvantages:

Higher overhead: changes must be decoded and re-applied
Some operations (DDL, sequences) may require special handling
More complex conflict resolution if multiple primaries exist

Synchronous vs. asynchronous replication:

A critical architectural choice is whether writes wait for replica acknowledgment:

Synchronous Replication

•Writes commit only after replica confirms receipt
•Guarantees: Reading from any replica returns latest committed data
•Latency cost: Write latency includes network round-trip to replica
•Availability risk: If replica is unavailable, writes may block or fail
•Use when: Strong consistency is critical (financial transactions, inventory systems)

Asynchronous Replication

•Writes commit immediately; replicas updated in background
•Trade-off: Replica may lag behind primary (stale reads possible)
•Latency benefit: Write latency unaffected by replica location
•Availability benefit: Writes succeed even if replicas temporarily unavailable
•Use when: Eventual consistency acceptable (social media, analytics, caching)

Semi-synchronous replication:

A hybrid approach where the primary waits for at least one replica (but not all) to acknowledge before committing. This provides a balance: stronger durability guarantees than fully asynchronous, lower latency than fully synchronous.

Most production systems serving high-traffic reads use asynchronous replication due to its performance benefits. The consistency implications are managed through careful application design—understanding which reads can tolerate staleness and which require fresh data from the primary.

Read Routing Strategies

Having replicas is only valuable if read traffic is effectively distributed across them. Several strategies exist for routing reads, each with implications for consistency, performance, and operational complexity.

Connection-level routing:

The simplest approach is to direct different connections to different database endpoints:

Applications connect to a read-only endpoint that load-balances across replicas
Write operations use the primary endpoint
Cloud providers (AWS RDS, Google Cloud SQL, Azure Database) provide these endpoints automatically

Trade-off: All reads from a single connection go to one replica. If certain connections are "hot" (high query volume), load distribution may be uneven.

Query-level routing:

More sophisticated systems route individual queries based on type or content:

Parse each query; send writes to primary, reads to replicas
Implemented via database proxy (ProxySQL, PgBouncer in statement mode, Vitess)
Some frameworks (Rails, Django, Spring) provide built-in read/write splitting

Trade-off: Adds latency for query parsing; requires careful handling of transactions that mix reads and writes.

Consistency-aware routing:

Advanced applications route based on consistency requirements:

Strong reads (user just wrote; must see their own write): Route to primary
Bounded-staleness reads (can tolerate data up to N seconds old): Route to any replica with lag < N
Eventual reads (any recent state acceptable): Route to least-loaded replica

This approach maximizes replica utilization while respecting application consistency needs.

read-routing-example.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
"""
Consistency-aware read routing example.
Demonstrates routing reads based on freshness requirements.
"""
from enum import Enum
from typing import Optional, Any
import time
 
class ReadConsistency(Enum):
    STRONG = "strong"        # Must read latest committed data
    BOUNDED = "bounded"      # Can tolerate up to N seconds of staleness
    EVENTUAL = "eventual"    # Any recent state is acceptable
 
class DatabaseRouter:
    def __init__(self, primary_conn, replica_pool, max_acceptable_lag_seconds=5):
        self.primary = primary_conn
        self.replicas = replica_pool  # List of (connection, current_lag_seconds)
        self.max_lag = max_acceptable_lag_seconds
        self.last_write_timestamp = {}  # user_id -> timestamp
    
    def execute_read(
        self,
        query: str,
        consistency: ReadConsistency = ReadConsistency.EVENTUAL,
        user_id: Optional[str] = None,
        params: tuple = ()
    ) -> Any:
        """
        Route read query based on consistency requirements.
        """
        if consistency == ReadConsistency.STRONG:
            # Strong consistency: always read from primary
            return self._execute_on_primary(query, params)
        
        if consistency == ReadConsistency.BOUNDED:
            # Bounded staleness: use replica within acceptable lag
            replica = self._get_replica_with_lag_under(self.max_lag)
            if replica:
                return self._execute_on_replica(replica, query, params)
            # Fallback to primary if no replica meets lag requirement
            return self._execute_on_primary(query, params)
        
        if consistency == ReadConsistency.EVENTUAL:
            # Read-your-writes: if user recently wrote, route to primary
            if user_id and self._user_recently_wrote(user_id):
                return self._execute_on_primary(query, params)
            
            # Otherwise, use least-loaded replica
            replica = self._get_least_loaded_replica()
            return self._execute_on_replica(replica, query, params)
    
    def execute_write(self, query: str, user_id: str, params: tuple = ()) -> Any:
        """
        Execute write on primary and track write timestamp for user.
        """
        result = self._execute_on_primary(query, params)
        self.last_write_timestamp[user_id] = time.time()
        return result
    
    def _user_recently_wrote(self, user_id: str, window_seconds: int = 10) -> bool:
        """Check if user wrote within the recent window."""
        last_write = self.last_write_timestamp.get(user_id, 0)
        return (time.time() - last_write) < window_seconds
    
    def _get_replica_with_lag_under(self, max_lag: float):
        """Return a replica with replication lag under threshold."""
        eligible = [(conn, lag) for conn, lag in self.replicas if lag < max_lag]
        if eligible:
            # Return replica with lowest lag among eligible
            return min(eligible, key=lambda x: x[1])[0]
        return None
    
    def _get_least_loaded_replica(self):
        """Return the replica with lowest current load."""
        # In production, this would check connection counts, query queues, etc.
        return self.replicas[0][0] if self.replicas else self.primary
    
    def _execute_on_primary(self, query: str, params: tuple) -> Any:
        return self.primary.execute(query, params)
    
    def _execute_on_replica(self, replica, query: str, params: tuple) -> Any:
        return replica.execute(query, params)
 
# Usage example
# router.execute_read("SELECT * FROM products WHERE id = %s", 
#                     consistency=ReadConsistency.EVENTUAL, 
#                     params=(product_id,))
# router.execute_read("SELECT * FROM orders WHERE user_id = %s",
#                     consistency=ReadConsistency.STRONG,
#                     user_id=current_user_id,
#                     params=(current_user_id,))

Read-Your-Writes Consistency

A critical pattern in read routing is ensuring users see their own writes. If a user updates their profile and immediately views it, they must see the update—not stale data from a lagging replica. This is achieved by routing that user's reads to the primary for a short window after any write, then returning to replica reads.

Measuring and Monitoring Read Scaling

Effective read scaling requires continuous measurement and monitoring. Key metrics reveal whether replicas are providing the expected benefits and highlight emerging problems.

Essential metrics:

Key Metrics for Read Scaling

•Queries per second (QPS) per node: Monitor read throughput on primary and each replica. Healthy distribution shows similar QPS across replicas.
•Read latency percentiles (p50, p95, p99): Track response times. Replicas should have comparable latency to the primary for similar queries.
•Replication lag: Time (or transaction count) behind the primary. Lag indicates staleness risk and potential consistency issues.
•Connection counts per node: Ensure connections are distributed. Skewed distribution suggests routing problems.
•CPU and memory utilization per node: Replicas approaching resource limits may need horizontal expansion.
•Cache hit rates: Both buffer pool (database) and application cache. High cache hits reduce replica load but may mask capacity issues.

Replication lag—the critical metric:

Replication lag is the most important metric for read-heavy systems. It measures how far behind replicas are from the primary and directly impacts consistency.

Causes of replication lag:

Large write volume overwhelming replica's ability to apply changes
Long-running transactions on replicas blocking replication
Network latency between primary and replicas (especially cross-region)
Resource contention on replicas (I/O, CPU)
Schema changes or DDL operations

Acceptable lag depends on the application:

0 seconds: Only achievable with synchronous replication; adds write latency
< 1 second: Suitable for most interactive applications
1-5 seconds: Acceptable for analytics, reporting, search indexing
> 5 seconds: May cause user-visible inconsistencies; investigate causes

Alerting thresholds:

Warning: Lag exceeds 5 seconds
Critical: Lag exceeds 30 seconds or stops progressing (replication stalled)

Lag Spikes During Traffic Surges

Replication lag often spikes during high-traffic events (sales, product launches, viral content). This is exactly when read scaling is most critical. Design systems to degrade gracefully: route consistency-sensitive reads to primary during lag spikes, and ensure application UX handles potential staleness.

Real-World Case Studies

Understanding how major platforms implement read scaling provides actionable insights for system design.

Case Study 1: GitHub

GitHub serves billions of read requests daily—repository file views, issue listings, pull request discussions. Their approach:

MySQL primary-replica architecture with dozens of read replicas
Different replica pools for different access patterns (web traffic, API, CI/CD)
ProxySQL for intelligent query routing
Application-level read/write splitting with connection objects explicitly marked as read-only
Aggressive caching layer (Memcached) in front of replicas for hot data

Key insight: Separating replica pools by access pattern allows independent scaling. CI/CD traffic (automated, bursty) doesn't compete with interactive web traffic.

Case Study 2: Shopify

Shopify handles over 500 million unique visitors monthly and must scale reads during flash sales:

Pod-based architecture: each merchant's data in an isolated shard
Each shard has 1 primary + multiple replicas
Replicas categorized: "fast" (low lag, interactive queries) and "slow" (higher lag, acceptable for reports)
Automatic failover with Orchestrator

Key insight: Sharding + replication compound benefits. Each shard scales independently, and replicas within each shard handle read volume.

Case Study 3: Netflix

Netflix serves 200+ million subscribers, each with personalized recommendations requiring massive read throughput:

Services use Cassandra (distributed, eventually consistent) for most data
Cassandra's architecture means every node can handle reads—inherent horizontal scaling
Critical metadata uses MySQL with read replicas across multiple AWS regions
Heavy use of caching (EVCache, based on Memcached) to reduce database load

Key insight: Choosing the right database architecture—one with built-in horizontal read scaling—reduces operational complexity. Cassandra's design eliminates the primary/replica distinction for reads.

Read Scaling Approaches: Summary
Company	Primary Database	Read Scaling Strategy	Special Techniques
GitHub	MySQL	Multi-pool read replicas + ProxySQL	Access-pattern-specific replica pools
Shopify	MySQL	Sharding + per-shard replicas	Fast/slow replica tiers
Netflix	Cassandra + MySQL	Distributed database + regional replicas	Heavy edge caching
Facebook	MySQL + TAO	Custom caching layer (TAO) + replicas	Graph-aware caching
Uber	MySQL + Schemaless	Sharded keyspace + replicas	Cell-based architecture

Trade-offs and Design Decisions

Read scaling through replication is not without costs. Understanding these trade-offs is essential for making informed architectural decisions.

Consistency vs. scalability:

The fundamental trade-off: stronger consistency limits scalability.

Synchronous replication ensures consistency but adds write latency
Asynchronous replication maximizes throughput but introduces stale read risk
Applications must be designed to handle potential inconsistencies

Operational complexity:

Each replica adds operational burden:

Monitoring and alerting for additional nodes
Backup and restore procedures
Software upgrades must be coordinated
Failure handling and replica promotion

Storage costs:

Replicas duplicate data storage. For a 1TB database:

1 primary + 4 replicas = 5TB total storage
Storage costs multiply linearly (though cloud providers often charge less for replica storage)

Write amplification:

Every write to the primary must be replicated to all replicas:

Primary's network egress scales with replica count
Very high write volume may bottleneck primary
Consider whether all data needs replication (logical replication may help)

Cache vs. replica decision:

For some workloads, application-level caching may be more effective than adding replicas:

If 90% of reads hit 10% of data, caching that hot data is cheaper
Caches (Redis, Memcached) are typically faster than database queries
Trade-off: caches require invalidation logic; replicas automatically stay consistent

When to Add Replicas vs. Cache

Add replicas when: read patterns are diverse (no hot data), consistency matters, or cache hit rates are already optimized. Add cache when: a small subset of data is extremely hot, sub-millisecond latency is required, or you want to reduce database load for repeated identical queries.

Design Decision Checklist

•What is the read/write ratio? If < 10:1, replicas may not provide significant benefit.
•What consistency is required? Strong consistency limits routing flexibility.
•What latency is acceptable? Synchronous replication adds write latency equal to network round-trip.
•What is the hot data percentage? If small, caching may be more effective.
•What is the failure domain? Cross-region replicas provide both read scaling and disaster recovery.
•What is the operational capacity? Each replica adds monitoring, upgrade, and failover complexity.

Summary: Read Scalability Through Replication

Read scalability is the most common and often the primary motivation for database replication. Let's consolidate the key concepts:

Key Takeaways

•Read/write asymmetry is universal — Most systems are heavily read-dominated (100:1 or higher), making read scaling the critical bottleneck.
•Replication provides near-linear read scaling — Each replica added roughly doubles read capacity, with 70-90% efficiency.
•Synchronous vs. asynchronous — Choose synchronized replication for strong consistency at the cost of latency; asynchronous for throughput with eventual consistency.
•Routing strategy matters — Connection-level, query-level, or consistency-aware routing each have trade-offs in simplicity, performance, and correctness.
•Monitor replication lag religiously — Lag is the leading indicator of consistency problems and capacity limits.
•Consider the cache/replica decision — Sometimes application-level caching is more effective than additional replicas.
•Trade-offs are unavoidable — Stronger consistency, operational simplicity, and cost savings cannot be maximized simultaneously.

What's next:

Read scalability addresses the performance aspect of replication. But what happens when your primary database fails? The next page explores High Availability—how replication enables systems to survive hardware failures, network partitions, and other disasters without losing data or availability.

Page Complete

You now understand read scalability as a fundamental motivation for database replication. You can evaluate when replication is the right solution, design routing strategies for consistency requirements, and monitor systems for healthy read distribution. Next, we explore how replication enables high availability.

1 / 4

Loading learning content...

System Design (HLD)Why Replication?

Why Database Replication Matters

LevelIntermediate

Duration90 mins

TopicWhy Replication?

1 / 4

Read Scalability: The Primary Driver of Replication

The Read-Heavy Reality of Modern Systems

What You Will Learn

The Read/Write Asymmetry Problem

Why reads dominate:

Consider these real-world read/write ratios from production systems:

Read/Write Ratios in Production Systems
System Type	Read/Write Ratio	Explanation
News/Media Sites	10,000:1 or higher	Millions view articles; few writers publish content
E-commerce Catalog	500:1 to 1,000:1	Product browsing vastly exceeds purchases/inventory updates
Social Media Feeds	100:1 to 500:1	Timeline views dwarf post creation
User Authentication	50:1 to 100:1	Logins far exceed password changes or signups
Analytics Dashboards	1,000:1 or higher	Pre-aggregated data viewed repeatedly; raw data ingested once
Collaborative Editing	10:1 to 50:1	Even in write-heavy apps, reads for syncing dominate

The single-server bottleneck:

A standalone database server—regardless of hardware specifications—faces hard limits on concurrent query execution. These limits stem from:

CPU constraints: Query parsing, optimization, and execution consume CPU cycles
Memory constraints: Active connections, query buffers, and result sets require memory
I/O constraints: Disk reads for non-cached data introduce latency
Connection limits: Each client connection consumes resources; limits are typically 100-1000 for standard databases

The Vertical Scaling Trap

How Replication Enables Read Scaling

The core mechanism:

A primary (or leader/master) database handles all write operations
One or more replicas (or followers/slaves) maintain copies of the primary's data
Changes from the primary are propagated to replicas through replication logs
Read queries can be directed to any replica, distributing the load

Linear read scaling:

In an idealized model, adding replicas provides near-linear read scaling:

Read Capacity Scaling with Replicas
Configuration	Read Capacity	Relative Improvement
1 Primary (no replicas)	1x (baseline)	—
1 Primary + 1 Replica	~2x	+100%
1 Primary + 2 Replicas	~3x	+200%
1 Primary + 4 Replicas	~5x	+400%
1 Primary + 9 Replicas	~10x	+900%

Reality check—why scaling isn't perfectly linear:

In practice, several factors prevent perfect linear scaling:

Replication overhead: The primary must send data to all replicas, consuming CPU and network bandwidth
Consistency requirements: Strong consistency may require waiting for replica acknowledgment, adding latency
Connection routing overhead: Load balancers or proxy layers introduce additional latency
Hotspot data: If certain data is accessed disproportionately, caching at the application layer may be more effective than adding replicas

Nevertheless, replication typically achieves 70-90% of theoretical linear scaling—a remarkable improvement over single-server architectures.

The Economics of Read Replicas

Read Replica Architectures

Several architectural patterns exist for implementing read replicas, each with distinct trade-offs in complexity, consistency, and performance.

Streaming replication (physical replication):

Advantages:

Low overhead: replicas apply pre-computed changes
Byte-for-byte identical data across all nodes
Relatively simple to set up with modern databases

Disadvantages:

Replicas must use identical database version and architecture
All data is replicated; no selective filtering
Storage requirements scale linearly with replicas

Logical replication:

Advantages:

Replicas can have different schemas, indexes, or even database versions
Selective replication: replicate only specific tables or columns
Enables heterogeneous replication (e.g., PostgreSQL to MySQL)

Disadvantages:

Higher overhead: changes must be decoded and re-applied
Some operations (DDL, sequences) may require special handling
More complex conflict resolution if multiple primaries exist

Synchronous vs. asynchronous replication:

A critical architectural choice is whether writes wait for replica acknowledgment:

Synchronous Replication

•Writes commit only after replica confirms receipt
•Guarantees: Reading from any replica returns latest committed data
•Latency cost: Write latency includes network round-trip to replica
•Availability risk: If replica is unavailable, writes may block or fail
•Use when: Strong consistency is critical (financial transactions, inventory systems)

Asynchronous Replication

•Writes commit immediately; replicas updated in background
•Trade-off: Replica may lag behind primary (stale reads possible)
•Latency benefit: Write latency unaffected by replica location
•Availability benefit: Writes succeed even if replicas temporarily unavailable
•Use when: Eventual consistency acceptable (social media, analytics, caching)

Semi-synchronous replication:

Read Routing Strategies

Connection-level routing:

The simplest approach is to direct different connections to different database endpoints:

Applications connect to a read-only endpoint that load-balances across replicas
Write operations use the primary endpoint
Cloud providers (AWS RDS, Google Cloud SQL, Azure Database) provide these endpoints automatically

Trade-off: All reads from a single connection go to one replica. If certain connections are "hot" (high query volume), load distribution may be uneven.

Query-level routing:

More sophisticated systems route individual queries based on type or content:

Parse each query; send writes to primary, reads to replicas
Implemented via database proxy (ProxySQL, PgBouncer in statement mode, Vitess)
Some frameworks (Rails, Django, Spring) provide built-in read/write splitting

Trade-off: Adds latency for query parsing; requires careful handling of transactions that mix reads and writes.

Consistency-aware routing:

Advanced applications route based on consistency requirements:

Strong reads (user just wrote; must see their own write): Route to primary
Bounded-staleness reads (can tolerate data up to N seconds old): Route to any replica with lag < N
Eventual reads (any recent state acceptable): Route to least-loaded replica

This approach maximizes replica utilization while respecting application consistency needs.

read-routing-example.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
"""
Consistency-aware read routing example.
Demonstrates routing reads based on freshness requirements.
"""
from enum import Enum
from typing import Optional, Any
import time
 
class ReadConsistency(Enum):
    STRONG = "strong"        # Must read latest committed data
    BOUNDED = "bounded"      # Can tolerate up to N seconds of staleness
    EVENTUAL = "eventual"    # Any recent state is acceptable
 
class DatabaseRouter:
    def __init__(self, primary_conn, replica_pool, max_acceptable_lag_seconds=5):
        self.primary = primary_conn
        self.replicas = replica_pool  # List of (connection, current_lag_seconds)
        self.max_lag = max_acceptable_lag_seconds
        self.last_write_timestamp = {}  # user_id -> timestamp
    
    def execute_read(
        self,
        query: str,
        consistency: ReadConsistency = ReadConsistency.EVENTUAL,
        user_id: Optional[str] = None,
        params: tuple = ()
    ) -> Any:
        """
        Route read query based on consistency requirements.
        """
        if consistency == ReadConsistency.STRONG:
            # Strong consistency: always read from primary
            return self._execute_on_primary(query, params)
        
        if consistency == ReadConsistency.BOUNDED:
            # Bounded staleness: use replica within acceptable lag
            replica = self._get_replica_with_lag_under(self.max_lag)
            if replica:
                return self._execute_on_replica(replica, query, params)
            # Fallback to primary if no replica meets lag requirement
            return self._execute_on_primary(query, params)
        
        if consistency == ReadConsistency.EVENTUAL:
            # Read-your-writes: if user recently wrote, route to primary
            if user_id and self._user_recently_wrote(user_id):
                return self._execute_on_primary(query, params)
            
            # Otherwise, use least-loaded replica
            replica = self._get_least_loaded_replica()
            return self._execute_on_replica(replica, query, params)
    
    def execute_write(self, query: str, user_id: str, params: tuple = ()) -> Any:
        """
        Execute write on primary and track write timestamp for user.
        """
        result = self._execute_on_primary(query, params)
        self.last_write_timestamp[user_id] = time.time()
        return result
    
    def _user_recently_wrote(self, user_id: str, window_seconds: int = 10) -> bool:
        """Check if user wrote within the recent window."""
        last_write = self.last_write_timestamp.get(user_id, 0)
        return (time.time() - last_write) < window_seconds
    
    def _get_replica_with_lag_under(self, max_lag: float):
        """Return a replica with replication lag under threshold."""
        eligible = [(conn, lag) for conn, lag in self.replicas if lag < max_lag]
        if eligible:
            # Return replica with lowest lag among eligible
            return min(eligible, key=lambda x: x[1])[0]
        return None
    
    def _get_least_loaded_replica(self):
        """Return the replica with lowest current load."""
        # In production, this would check connection counts, query queues, etc.
        return self.replicas[0][0] if self.replicas else self.primary
    
    def _execute_on_primary(self, query: str, params: tuple) -> Any:
        return self.primary.execute(query, params)
    
    def _execute_on_replica(self, replica, query: str, params: tuple) -> Any:
        return replica.execute(query, params)
 
# Usage example
# router.execute_read("SELECT * FROM products WHERE id = %s", 
#                     consistency=ReadConsistency.EVENTUAL, 
#                     params=(product_id,))
# router.execute_read("SELECT * FROM orders WHERE user_id = %s",
#                     consistency=ReadConsistency.STRONG,
#                     user_id=current_user_id,
#                     params=(current_user_id,))

Read-Your-Writes Consistency

Measuring and Monitoring Read Scaling

Effective read scaling requires continuous measurement and monitoring. Key metrics reveal whether replicas are providing the expected benefits and highlight emerging problems.

Essential metrics:

Key Metrics for Read Scaling

•Queries per second (QPS) per node: Monitor read throughput on primary and each replica. Healthy distribution shows similar QPS across replicas.
•Read latency percentiles (p50, p95, p99): Track response times. Replicas should have comparable latency to the primary for similar queries.
•Replication lag: Time (or transaction count) behind the primary. Lag indicates staleness risk and potential consistency issues.
•Connection counts per node: Ensure connections are distributed. Skewed distribution suggests routing problems.
•CPU and memory utilization per node: Replicas approaching resource limits may need horizontal expansion.
•Cache hit rates: Both buffer pool (database) and application cache. High cache hits reduce replica load but may mask capacity issues.

Replication lag—the critical metric:

Replication lag is the most important metric for read-heavy systems. It measures how far behind replicas are from the primary and directly impacts consistency.

Causes of replication lag:

Large write volume overwhelming replica's ability to apply changes
Long-running transactions on replicas blocking replication
Network latency between primary and replicas (especially cross-region)
Resource contention on replicas (I/O, CPU)
Schema changes or DDL operations

Acceptable lag depends on the application:

0 seconds: Only achievable with synchronous replication; adds write latency
< 1 second: Suitable for most interactive applications
1-5 seconds: Acceptable for analytics, reporting, search indexing
> 5 seconds: May cause user-visible inconsistencies; investigate causes

Alerting thresholds:

Warning: Lag exceeds 5 seconds
Critical: Lag exceeds 30 seconds or stops progressing (replication stalled)

Lag Spikes During Traffic Surges

Real-World Case Studies

Understanding how major platforms implement read scaling provides actionable insights for system design.

Case Study 1: GitHub

GitHub serves billions of read requests daily—repository file views, issue listings, pull request discussions. Their approach:

MySQL primary-replica architecture with dozens of read replicas
Different replica pools for different access patterns (web traffic, API, CI/CD)
ProxySQL for intelligent query routing
Application-level read/write splitting with connection objects explicitly marked as read-only
Aggressive caching layer (Memcached) in front of replicas for hot data

Key insight: Separating replica pools by access pattern allows independent scaling. CI/CD traffic (automated, bursty) doesn't compete with interactive web traffic.

Case Study 2: Shopify

Shopify handles over 500 million unique visitors monthly and must scale reads during flash sales:

Pod-based architecture: each merchant's data in an isolated shard
Each shard has 1 primary + multiple replicas
Replicas categorized: "fast" (low lag, interactive queries) and "slow" (higher lag, acceptable for reports)
Automatic failover with Orchestrator

Key insight: Sharding + replication compound benefits. Each shard scales independently, and replicas within each shard handle read volume.

Case Study 3: Netflix

Netflix serves 200+ million subscribers, each with personalized recommendations requiring massive read throughput:

Services use Cassandra (distributed, eventually consistent) for most data
Cassandra's architecture means every node can handle reads—inherent horizontal scaling
Critical metadata uses MySQL with read replicas across multiple AWS regions
Heavy use of caching (EVCache, based on Memcached) to reduce database load

Read Scaling Approaches: Summary
Company	Primary Database	Read Scaling Strategy	Special Techniques
GitHub	MySQL	Multi-pool read replicas + ProxySQL	Access-pattern-specific replica pools
Shopify	MySQL	Sharding + per-shard replicas	Fast/slow replica tiers
Netflix	Cassandra + MySQL	Distributed database + regional replicas	Heavy edge caching
Facebook	MySQL + TAO	Custom caching layer (TAO) + replicas	Graph-aware caching
Uber	MySQL + Schemaless	Sharded keyspace + replicas	Cell-based architecture

Trade-offs and Design Decisions

Read scaling through replication is not without costs. Understanding these trade-offs is essential for making informed architectural decisions.

Consistency vs. scalability:

The fundamental trade-off: stronger consistency limits scalability.

Synchronous replication ensures consistency but adds write latency
Asynchronous replication maximizes throughput but introduces stale read risk
Applications must be designed to handle potential inconsistencies

Operational complexity:

Each replica adds operational burden:

Monitoring and alerting for additional nodes
Backup and restore procedures
Software upgrades must be coordinated
Failure handling and replica promotion

Storage costs:

Replicas duplicate data storage. For a 1TB database:

1 primary + 4 replicas = 5TB total storage
Storage costs multiply linearly (though cloud providers often charge less for replica storage)

Write amplification:

Every write to the primary must be replicated to all replicas:

Primary's network egress scales with replica count
Very high write volume may bottleneck primary
Consider whether all data needs replication (logical replication may help)

Cache vs. replica decision:

For some workloads, application-level caching may be more effective than adding replicas:

If 90% of reads hit 10% of data, caching that hot data is cheaper
Caches (Redis, Memcached) are typically faster than database queries
Trade-off: caches require invalidation logic; replicas automatically stay consistent

When to Add Replicas vs. Cache

Design Decision Checklist

•What is the read/write ratio? If < 10:1, replicas may not provide significant benefit.
•What consistency is required? Strong consistency limits routing flexibility.
•What latency is acceptable? Synchronous replication adds write latency equal to network round-trip.
•What is the hot data percentage? If small, caching may be more effective.
•What is the failure domain? Cross-region replicas provide both read scaling and disaster recovery.
•What is the operational capacity? Each replica adds monitoring, upgrade, and failover complexity.

Summary: Read Scalability Through Replication

Read scalability is the most common and often the primary motivation for database replication. Let's consolidate the key concepts:

Key Takeaways

•Read/write asymmetry is universal — Most systems are heavily read-dominated (100:1 or higher), making read scaling the critical bottleneck.
•Replication provides near-linear read scaling — Each replica added roughly doubles read capacity, with 70-90% efficiency.
•Synchronous vs. asynchronous — Choose synchronized replication for strong consistency at the cost of latency; asynchronous for throughput with eventual consistency.
•Routing strategy matters — Connection-level, query-level, or consistency-aware routing each have trade-offs in simplicity, performance, and correctness.
•Monitor replication lag religiously — Lag is the leading indicator of consistency problems and capacity limits.
•Consider the cache/replica decision — Sometimes application-level caching is more effective than additional replicas.
•Trade-offs are unavoidable — Stronger consistency, operational simplicity, and cost savings cannot be maximized simultaneously.

What's next:

Page Complete

1 / 4