Data replication is the beating heart of geo-distributed systems. Whether you're implementing active-passive for disaster recovery or active-active for global low latency, the ability to keep data synchronized across regions determines your system's consistency, availability, and durability characteristics.
This is also where distributed systems become genuinely difficult. The CAP theorem reminds us that network partitions are inevitable, and we must choose between consistency and availability. The PACELC theorem adds that even without partitions, we trade off between latency and consistency. Data replication is where these theoretical constraints become operational realities.
In this page, we'll explore the full spectrum of replication strategies, understanding not just their mechanics but when to apply each approach.
By the end of this page, you'll understand synchronous vs asynchronous replication trade-offs, how replication lag affects user experience and application design, consistency models and their implications, database-specific replication mechanisms, and strategies for handling replication failures and recovery.
The fundamental choice in data replication is whether writes must be confirmed on remote replicas before being acknowledged to clients (synchronous) or if writes can be acknowledged immediately and replicated in the background (asynchronous).
Mechanism: A write is not considered complete until it has been durably stored on all (or a quorum of) replicas, including those in remote regions.
Client → Primary → [Write to local storage]
                 → [Replicate to Region B, wait for ack]
                 → [Replicate to Region C, wait for ack]
                 ← [All acks received, acknowledge to client]
Characteristics:
Mechanism: A write is acknowledged after local storage, with replication occurring in the background.
Client → Primary → [Write to local storage]
                 ← [Immediately acknowledge to client]
                 → [Background: Replicate to Region B]
                 → [Background: Replicate to Region C]
Characteristics:
| Characteristic | Synchronous | Asynchronous |
|---|---|---|
| Write Latency | High (includes network RTT) | Low (local only) |
| Read Consistency | Strong (always current) | Eventual (may be stale) |
| RPO (Data Loss) | Zero | Seconds to minutes |
| Availability | Lower (writes blocked if replicas unreachable) | Higher (independent of replicas) |
| Throughput | Limited by slowest replica | Limited by local resources |
| Failure Handling | Complex (blocked writes) | Simple (queue and retry) |
| Use Cases | Financial, critical data | Most web applications |
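To make the two acknowledgment paths concrete, here is a minimal Python sketch. It is illustrative only: the replicas are plain dicts standing in for remote regions, and the cross-region round trip is simulated with a fixed 100 ms sleep.

```python
import queue
import threading
import time

class Primary:
    """Toy primary that replicates either synchronously or asynchronously."""

    def __init__(self, replicas, synchronous=True):
        self.storage = {}
        self.replicas = replicas                 # dicts standing in for remote regions
        self.synchronous = synchronous
        self._backlog = queue.Queue()            # pending changes for async replication
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, value):
        self.storage[key] = value                # 1. durable local write
        if self.synchronous:
            for replica in self.replicas:        # 2a. wait for every remote ack,
                self._ship(replica, key, value)  #     paying one simulated RTT each
            return "ack"                         # client waited for all replicas
        self._backlog.put((key, value))          # 2b. hand off to background replication
        return "ack"                             # client only paid local latency

    def _ship(self, replica, key, value):
        time.sleep(0.1)                          # stand-in for ~100 ms cross-region RTT
        replica[key] = value

    def _drain(self):
        while True:                              # background replication loop
            key, value = self._backlog.get()
            for replica in self.replicas:
                self._ship(replica, key, value)
```

With synchronous=True a write pays roughly one simulated round trip per replica before returning; with synchronous=False it returns immediately and the replicas converge a moment later, which is exactly the RPO-versus-latency trade summarized in the table above.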
A hybrid approach that balances latency and durability:
Mechanism: Write acknowledged after confirmation from primary plus at least one remote replica (not all).
Client → Primary → [Write to local storage]
                 → [Replicate to Region B, wait for ack] ← First ack
                 ← [Acknowledge to client]
                 → [Background: Replicate to Region C]
Characteristics:
Mechanism: Write acknowledged after W replicas confirm, reads require R replicas to respond. Configured such that W + R > N (total replicas).
Example (N=5, W=3, R=3):
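A minimal sketch of why these numbers work: with N=5, W=3, R=3, any write set and read set must share at least one replica (3 + 3 > 5), so a read that returns the newest timestamp among its answers always observes the latest acknowledged write. The in-memory replicas and timestamp-based versioning below are illustrative assumptions, not a real quorum protocol.

```python
import random
import time

N, W, R = 5, 3, 3   # W + R = 6 > N = 5, so write and read sets must overlap

# Each "replica" is a dict of key -> (timestamp, value); purely illustrative.
replicas = [{} for _ in range(N)]

def quorum_write(key, value):
    stamped = (time.time(), value)
    for i in random.sample(range(N), W):   # any W replicas acknowledge the write
        replicas[i][key] = stamped

def quorum_read(key):
    chosen = random.sample(range(N), R)    # any R replicas answer the read
    answers = [replicas[i][key] for i in chosen if key in replicas[i]]
    # Because W + R > N, at least one answering replica holds the latest write,
    # so returning the value with the newest timestamp observes that write.
    return max(answers)[1] if answers else None

quorum_write("user:42", "alice@example.com")
print(quorum_read("user:42"))   # sees the write even though only 3 of 5 replicas have it
```

Raising W makes writes slower but lets R shrink; raising R does the reverse. That tension is the essence of the tuning trade-offs listed next.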
Tuning Trade-offs:
For cross-region replication, asynchronous is the default choice for most applications. The latency penalty of waiting for cross-continental round trips (100-300ms per write) is prohibitive for interactive applications. Use synchronous replication only when zero data loss is absolutely required and you can accept the latency and availability implications.
Asynchronous replication introduces replication lag: the delay between a write being committed on the primary and becoming visible on replicas. Understanding lag is crucial for designing systems that behave correctly despite it.
Network Latency: Minimum lag is bound by network round-trip time. Cross-continental links have 100-300ms latency.
Replication Transport: Time to serialize, transmit, and deserialize changes. Depends on change volume and network bandwidth.
Replica Apply Time: Time for replica to apply changes locally. Can spike during complex operations (large transactions, schema changes).
Queue Depth: If the replica falls behind (changes arrive faster than they can be applied), a queue builds up and lag keeps growing until the queue drains.
Same-region replicas:
Cross-region replicas:
| Lag Duration | User Experience Impact | Application Design Implications |
|---|---|---|
| <100ms | Imperceptible | Most operations work normally |
| 100ms-1s | Refreshing sees old data briefly | Read-your-writes consistency needed after writes |
| 1-5s | Noticeable inconsistency | UI must handle stale reads explicitly |
| 5-30s | Significant confusion possible | Operations may need to route to primary |
| >30s | Features may appear broken | Degraded mode, explicit warnings needed |
The most common consistency pattern needed with replication lag is read-your-writes: after a user performs a write, their subsequent reads should see that write.
Implementation Approaches:
1. Sticky Sessions to Primary:
2. Version Tracking (see the sketch after this list):
3. MVCC and Timestamps:
4. Accept Inconsistency:
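A minimal sketch of the version-tracking approach (option 2 above). The primary, replica, and session objects are hypothetical stand-ins; all that matters is that the primary reports a monotonically increasing commit position (an LSN, GTID, or version counter) and the replica reports how far it has applied.

```python
# Hypothetical interfaces: primary.commit() returns the commit position of the
# write, replica.applied_position() reports how far the replica has caught up,
# and session is any per-user store (cookie, token, or server-side session).

def write_and_remember(primary, session, key, value):
    position = primary.commit(key, value)        # position of this write in the log
    session["min_read_position"] = position      # remember it with the user's session
    return position

def read_your_writes(primary, replica, session, key):
    needed = session.get("min_read_position", 0)
    if replica.applied_position() >= needed:     # replica already has the user's write
        return replica.get(key)
    return primary.get(key)                      # otherwise fall back to the primary
```

Some systems offer an equivalent primitive that blocks a replica read until the replica has applied at least a given position, which achieves the same effect without falling back to the primary.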
Replication lag is a critical operational metric:
Key Metrics:
Alerting Thresholds (typical):
Lag spikes often indicate capacity issues, network problems, or expensive operations (large transactions, schema changes).
If replica apply rate is slower than primary write rate, lag grows indefinitely. This happens during: high write loads, expensive transactions, replica hardware issues, network bottlenecks. Monitor lag trends, not just current values. Catching a growing trend early prevents cascading problems.
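As one concrete example of alerting on lag, here is a sketch against PostgreSQL streaming replication, which reports per-replica lag in the pg_stat_replication view on the primary (PostgreSQL 10 and later). The connection parameters and thresholds are placeholders.

```python
import psycopg2  # assumes the psycopg2 PostgreSQL driver is installed

WARN_SECONDS = 10       # example thresholds; align them with your lag tolerance
CRITICAL_SECONDS = 60

# Placeholder connection details for the primary.
conn = psycopg2.connect(host="primary.internal", dbname="app", user="monitor")

with conn.cursor() as cur:
    # replay_lag is an interval column; EXTRACT(EPOCH ...) converts it to seconds.
    cur.execute("""
        SELECT application_name,
               COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) AS lag_seconds
        FROM pg_stat_replication
    """)
    for name, lag_seconds in cur.fetchall():
        if lag_seconds >= CRITICAL_SECONDS:
            print(f"CRITICAL: replica {name} is {lag_seconds:.0f}s behind")
        elif lag_seconds >= WARN_SECONDS:
            print(f"WARNING: replica {name} is {lag_seconds:.0f}s behind")
```

Feeding the same numbers into a time-series system lets you alert on the trend rather than just the instantaneous value, which is what catches a replica that is slowly falling behind.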
Consistency models define what guarantees the system provides about the relationship between writes and subsequent reads. Choosing the right consistency model is a fundamental design decision.
Definition: After a write completes, all subsequent reads (from any client, through any replica) return that write or a more recent one.
Implications:
When to Use:
Definition: If no new writes occur, all replicas will eventually return the same value. No guarantee about when this happens.
Implications:
When to Use:
Definition: If operation B causally depends on operation A (e.g., B reads result of A, or B follows A in same session), then any process that sees B will also see A.
Implications:
When to Use:
| Model | Latency Impact | Availability Impact | Consistency Guarantee | Implementation Complexity |
|---|---|---|---|---|
| Strong | High (sync replication) | Lower (partition sensitive) | All reads see latest write | Medium |
| Bounded Staleness | Medium | Medium | Reads within time/version bound | Medium-High |
| Session (Read-Your-Writes) | Low | High | Session sees own writes | Medium |
| Causal | Low | High | Causally related ops ordered | High |
| Eventual | Lowest | Highest | Eventually converges | Low |
Definition: Reads are guaranteed to be no more than X seconds (or N versions) behind writes.
Implications:
Configuring Staleness Bounds:
Definition: A client will always see their own writes. Other clients may see stale data.
Implications:
Implementation Patterns:
Many databases offer tunable consistency, allowing different operations to use different levels:
Cassandra: ONE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM
DynamoDB: Eventually consistent reads, strongly consistent reads
CosmosDB: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual
CockroachDB: Serializable (default), follower reads (stale allowed)
This allows applications to use strong consistency where needed and eventual consistency where acceptable.
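For example, the Python driver for Cassandra sets the consistency level per statement, so one request can demand LOCAL_QUORUM while another settles for ONE. The contact point, keyspace, and table names below are hypothetical.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])      # hypothetical contact point
session = cluster.connect("shop")    # hypothetical keyspace

# Critical read: require a quorum of replicas in the local datacenter.
balance_read = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
print(session.execute(balance_read, ("acct-42",)).one())

# Non-critical read: any single replica will do; stale data is acceptable.
feed_read = SimpleStatement(
    "SELECT * FROM activity_feed WHERE user_id = %s LIMIT 20",
    consistency_level=ConsistencyLevel.ONE,
)
for row in session.execute(feed_read, ("acct-42",)):
    print(row)
```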
Stronger consistency is not always better—it comes with latency and availability costs. Analyze each operation: Does this read need to see the absolute latest data, or is eventual consistency acceptable? Applying strong consistency everywhere is expensive; applying it only where needed is efficient.
How replicas are connected affects replication latency, bandwidth usage, and failure handling. Understanding topology options helps optimize for your specific needs.
Structure:
Primary → Replica 1
        → Replica 2
        → Replica 3
Characteristics:
Use Cases:
Structure:
Primary A ←→ Primary B ←→ Primary C
Characteristics:
Use Cases:
Structure:
Primary → Replica 1 → Replica 2 → Replica 3
Characteristics:
Use Cases:
| Topology | Write Capacity | Read Scaling | Lag Distribution | Failure Handling |
|---|---|---|---|---|
| Primary-Replica | Single primary | Excellent | Uniform | Simple (failover to replica) |
| Multi-Primary | All nodes | Excellent | Uniform | Complex (conflict resolution) |
| Cascading | Single primary | Excellent | Increases along chain | Medium (chain repair) |
| Ring | All nodes | Excellent | Varies | Complex (ring repair) |
| Mesh | All nodes | Excellent | Low | Most complex (full mesh) |
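The "conflict resolution" burden the table attributes to multi-primary, ring, and mesh topologies can be handled as simply (and as lossily) as last-writer-wins. A toy sketch, assuming every write carries a timestamp plus an originating region for deterministic tie-breaking; production systems often reach for vector clocks or CRDTs instead so that concurrent writes are not silently discarded.

```python
# Last-writer-wins resolution for a multi-primary conflict. Each version is a
# dict like {"value": ..., "ts": epoch_seconds, "region": "us-east"}; the region
# only matters when two writes carry the exact same timestamp.

def resolve(local, remote):
    return max(local, remote, key=lambda v: (v["ts"], v["region"]))

us_write = {"value": "shipped",   "ts": 1714000000.120, "region": "us-east"}
eu_write = {"value": "cancelled", "ts": 1714000000.250, "region": "eu-west"}

print(resolve(us_write, eu_write)["value"])   # "cancelled" (the later write wins)
```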
Two Regions:
Region A (Primary) → Region B (Replica)
Simplest geo-distribution. Active-passive with clear failover target.
Three Regions (Chain):
US-East → EU-West → APAC-Tokyo
Reduces transcontinental replication hops. APAC has highest lag.
Three Regions (Star):
EU-West ← US-East → APAC-Tokyo
US-East replicates to both. More bandwidth from US-East, lower lag globally.
Three Regions (Mesh):
US-East ←→ EU-West
US-East ←→ APAC-Tokyo
EU-West ←→ APAC-Tokyo
Full mesh. Lowest lag, highest complexity and bandwidth. Required for true active-active.
Cross-region replication consumes significant bandwidth:
Typical Consumption:
Cost Implications:
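A back-of-the-envelope sketch for sizing this. Every figure below (write rate, change size, number of cross-region links, and the per-GB transfer price) is an assumption to replace with your own numbers.

```python
# Rough estimate of cross-region replication volume and transfer cost.
writes_per_second = 2_000      # sustained write rate on the primary (assumed)
bytes_per_change = 1_500       # average replicated change incl. overhead (assumed)
cross_region_links = 2         # e.g., US-East -> EU-West and US-East -> APAC (assumed)
egress_cost_per_gb = 0.02      # assumed inter-region transfer price, USD per GB

bytes_per_month = writes_per_second * bytes_per_change * cross_region_links * 86_400 * 30
gb_per_month = bytes_per_month / 1e9

print(f"{gb_per_month:,.0f} GB/month, roughly ${gb_per_month * egress_cost_per_gb:,.0f}/month")
```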
Topology choice cascades through your architecture: failover procedures, consistency guarantees, monitoring requirements, cost structure. Choose topology based on your specific requirements for write distribution, lag tolerance, and operational complexity appetite.
Different database systems implement replication differently. Understanding your database's approach is essential for correct configuration and operation.
Streaming Replication:
Synchronous behavior controlled via synchronous_commit and synchronous_standby_names
Logical Replication:
Cross-Region Considerations:
Set wal_keep_size to buffer WAL during network issues
Binary Log Replication:
Group Replication:
Cross-Region Considerations:
| Database | Replication Type | Cross-Region Suitability | Key Configuration |
|---|---|---|---|
| PostgreSQL | Streaming (physical) | Good | synchronous_commit, recovery_target_timeline |
| PostgreSQL | Logical | Good | publication/subscription |
| MySQL | Binary log | Good | gtid_mode, semi_sync settings |
| MySQL | Group Replication | Medium (latency sensitive) | group_replication_consistency |
| MongoDB | Oplog replication | Good | write concern, read preference |
| Cassandra | Peer-to-peer gossip | Excellent | NetworkTopologyStrategy, DC-aware |
| CockroachDB | Raft consensus | Good (built for geo) | zone configurations, follower reads |
| Spanner | TrueTime + Paxos | Excellent (designed for geo) | Regional/multi-regional configuration |
Replica Set:
Cross-Region Considerations:
Write concern majority with cross-region secondaries means cross-region write latency
Peer-to-Peer:
Cross-Region Considerations:
LOCAL_QUORUM for regional consistency with async cross-region replication
EACH_QUORUM for synchronous cross-region writes (high latency)
Distributed Consensus:
Cross-Region Considerations:
Database replication documentation often uses nuanced language about guarantees. 'Durable' doesn't always mean 'replicated'. 'Committed' may not mean 'globally visible'. Read documentation carefully and test failure scenarios to understand actual behavior.
Replication failures are inevitable in geo-distributed systems. Networks partition, replicas fall behind, and divergent data must be reconciled. Designing for these failures is essential.
Network Partition:
Replica Falling Behind:
Data Corruption:
Schema Divergence:
Preserve Replication Logs:
Detect Problems Early:
Automated Recovery:
Test Failure Scenarios:
During primary failure with asynchronous replication, unreplicated data may be lost:
Option 1: Accept Loss
Option 2: Wait for Primary
Option 3: Attempt Recovery
The right choice depends on your RPO requirements and the nature of the failure.
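A toy sketch of the "attempt recovery" option: after fencing the failed primary, diff whatever data survived on it against the promoted primary and surface writes that never replicated, for operators to re-apply or discard. Real tooling works at the replication-log level (WAL positions, GTID sets) rather than row by row; the key-to-(version, value) snapshots here are purely illustrative.

```python
def find_unreplicated(old_primary_snapshot, new_primary_snapshot):
    """Return writes present on the fenced old primary but missing downstream."""
    orphans = {}
    for key, (version, value) in old_primary_snapshot.items():
        current = new_primary_snapshot.get(key)
        if current is None or current[0] < version:   # this write never made it across
            orphans[key] = value
    return orphans

old = {"order:1": (7, "paid"), "order:2": (3, "shipped")}   # fenced old primary
new = {"order:1": (7, "paid"), "order:2": (2, "packed")}    # promoted new primary

print(find_unreplicated(old, new))   # {'order:2': 'shipped'} was lost in the failover
```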
Replication failures happen at the worst times—peak load, during deployments, when key engineers are unavailable. Document procedures thoroughly. Automate what can be safely automated. Ensure multiple team members can handle recovery. Don't let replication recovery be single-person tribal knowledge.
We've explored data replication strategies for geo-distributed systems in depth. Let's consolidate the key insights:
What's next:
With data replication covered, we now turn to the user-facing side of geo-distribution: latency optimization. The next page explores techniques for minimizing user-perceived latency including edge caching, connection optimization, and geographic traffic routing.
You now understand the fundamentals of data replication for geo-distributed systems: synchronous vs asynchronous approaches, managing replication lag, consistency models, replication topologies, database-specific mechanisms, and failure handling. Next, we'll explore techniques for optimizing latency across your geo-distributed architecture.