Knowing that you need replication is only the beginning. The critical question is: how should replicas be organized and coordinated? Should there be a single node accepting writes? Multiple? Should replicas be daisy-chained for efficiency or fully meshed for resilience? Should consensus protocols govern every write?
These questions define your replication strategy—the architectural pattern that determines how your distributed database behaves under normal operation and during failures. Each strategy makes specific trade-offs between consistency, availability, latency, and operational complexity.
Choosing the wrong strategy for your workload leads to systems that are either too slow (over-synchronized), too fragile (under-replicated), or too complex to operate (needlessly sophisticated). This page equips you to make informed architectural decisions.
By the end of this page, you will understand the major replication strategies—primary-replica, multi-primary, chain replication, and consensus-based replication. You'll learn the mechanics, trade-offs, failure modes, and ideal use cases for each strategy, enabling you to select and configure the right approach for your system's requirements.
The primary-replica strategy (also called leader-follower or master-slave) is the most common replication pattern. A single designated node—the primary—handles all write operations. One or more replica nodes receive copies of these writes and serve read traffic.
How It Works:
Conflict Resolution: None Required
Because only the primary accepts writes, there are no write conflicts. This is the fundamental simplification that makes primary-replica replication operationally manageable.
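The single-writer rule can be sketched in a few lines. This is a toy model (the class names and synchronous replication loop are illustrative, not any particular database's API): all writes flow through the primary, replicas only serve reads, so no two nodes can ever disagree about a write.

```python
# Minimal primary-replica sketch: the primary applies each write and ships
# it to every replica; replicas only serve reads, so write conflicts
# cannot arise by construction.
class Replica:
    def __init__(self):
        self.store = {}

    def read(self, key):
        return self.store.get(key)

class Primary:
    def __init__(self, replicas):
        self.store = {}
        self.replicas = replicas

    def write(self, key, value):
        self.store[key] = value
        for replica in self.replicas:   # replicate (synchronously, for simplicity)
            replica.store[key] = value

replica = Replica()
primary = Primary([replica])
primary.write("x", "1")
assert replica.read("x") == "1"   # read traffic is served by the replica
```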
Failover Mechanics:
When the primary fails, a replica must be promoted:
Automatic vs. Manual Failover:
If a network partition separates the primary from its replicas while both sides remain operational, you risk split-brain: two nodes believe they are the primary and accept different writes. This causes data divergence that is extremely difficult to resolve. Consensus-based leader election (using odd-numbered clusters and majority quorums) prevents split-brain.
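The majority-quorum rule that prevents split-brain reduces to one comparison. A sketch (function name is illustrative): a candidate becomes primary only with a strict majority of votes, and since a partition can give a strict majority to at most one side, two simultaneous primaries are impossible.

```python
def can_elect_leader(votes: int, cluster_size: int) -> bool:
    """A candidate wins election only with a strict majority of votes."""
    return votes > cluster_size // 2

# A partition splits a 5-node cluster 3/2: only the 3-node side can elect
# a leader, so at most one primary exists at any time.
assert can_elect_leader(3, 5) is True
assert can_elect_leader(2, 5) is False
```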
The multi-primary strategy (also called multi-master or active-active) allows multiple nodes to accept writes simultaneously. Changes made on any primary are replicated to all other primaries. This eliminates the write bottleneck of single-primary but introduces significant complexity.
When Multi-Primary Is Needed:
Geographic distribution: Users in multiple regions need low-latency writes. Routing all writes to a single primary in one region adds 100-200ms latency for distant users.
High write availability: If the single primary fails, writes are blocked. Multi-primary allows writes to continue on surviving nodes.
Write throughput: Write-heavy workloads may exceed single-node capacity.
The Central Challenge: Conflict Resolution
When two primaries accept writes to the same data simultaneously, a conflict occurs. Unlike primary-replica, where conflicts are impossible, multi-primary must detect and resolve conflicts.
Conflict Resolution Strategies:
1. Last-Writer-Wins (LWW): Use timestamps to determine which write is "later." The later write overwrites the earlier one.
2. Application-Level Resolution: Present both values to the application and let business logic decide.
3. Merge Functions: Define data-type-specific merge operations (e.g., union sets, merge counters).
4. CRDTs (Conflict-free Replicated Data Types): Data structures mathematically designed to merge without conflicts.
| Strategy | Data Loss Risk | Complexity | Suitable For |
|---|---|---|---|
| Last-Writer-Wins | High (silent) | Low | Logging, caching, non-critical data |
| Application Resolution | None | High | Business-critical data requiring human judgment |
| Merge Functions | None | Medium | Counters, sets, append-only data |
| CRDTs | None | High | Collaborative editing, distributed counters |
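To make the CRDT idea concrete, here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs. Each replica increments only its own slot, and merge takes the per-replica maximum, so concurrent increments combine without any conflict to resolve (class and method names are illustrative):

```python
# G-Counter sketch: a conflict-free replicated counter. Each replica owns
# one slot; merge takes the per-slot maximum, so merging is commutative,
# associative, and idempotent -- concurrent updates never conflict.
class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())

# Two replicas accept increments concurrently, then exchange state:
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5   # both converge, no conflict resolution
```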
The best conflict resolution is no conflict at all. Partition your data so that different records are owned by different primaries. User profile updates always go to the user's 'home' primary. Order processing happens on the primary where the order was created. This 'sticky routing' model gets multi-primary benefits without multi-primary conflicts.
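Sticky routing can be as simple as hashing the record's owner key to a fixed home primary. A sketch under assumed names (the primary list and function are hypothetical): because every write for a given user deterministically lands on the same primary, no two primaries ever accept conflicting writes for that user.

```python
import hashlib

# Hypothetical sticky routing: hash each user's id to a fixed 'home'
# primary, so writes for one user never land on two different primaries.
PRIMARIES = ["primary-us", "primary-eu", "primary-ap"]

def home_primary(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return PRIMARIES[int(digest, 16) % len(PRIMARIES)]

# The same user always routes to the same primary:
assert home_primary("user-42") == home_primary("user-42")
assert home_primary("user-42") in PRIMARIES
```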
Chain replication is a specialized strategy that arranges nodes in a linear chain. Writes enter at the head of the chain and propagate through each node until reaching the tail. Reads are served exclusively by the tail. This design provides strong consistency with high throughput.
How Chain Replication Works:
Why This Works:
The tail is guaranteed to have every committed write. Any write in the chain but not at the tail is "in flight"—not yet committed. By serving reads from the tail exclusively, you get linearizable (strongly consistent) reads without any coordination overhead.
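The head-to-tail propagation can be sketched with a toy three-node chain (illustrative classes, synchronous propagation; a real implementation pipelines writes and acknowledges from the tail backwards): a write entering the head is not visible to readers until it reaches the tail, which is exactly why tail reads are consistent.

```python
# Toy chain-replication sketch: writes enter at the head and propagate
# node by node; a write is committed once it reaches the tail, and reads
# are served only by the tail, so they never observe in-flight writes.
class ChainNode:
    def __init__(self, name: str):
        self.name = name
        self.store = {}
        self.successor = None

    def write(self, key: str, value: str) -> None:
        self.store[key] = value
        if self.successor is not None:   # propagate toward the tail
            self.successor.write(key, value)

head, middle, tail = ChainNode("head"), ChainNode("middle"), ChainNode("tail")
head.successor, middle.successor = middle, tail

head.write("x", "1")              # enters at the head...
assert tail.store["x"] == "1"     # ...and is committed at the tail
```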
Failure Handling in Chain Replication:
Head Failure: The next node in the chain becomes the new head. Pending writes that hadn't propagated past the old head are lost.
Middle Node Failure: The chain is "spliced"—the predecessor of the failed node connects directly to its successor. In-flight writes must be recovered from the predecessor.
Tail Failure: The predecessor of the tail becomes the new tail. Some committed writes (acked by old tail) may need re-acknowledgment.
CRAQ: Chain Replication with Apportioned Queries
A variant called CRAQ (Chain Replication with Apportioned Queries) allows reads from any node, not just the tail. Each node tracks which writes are "clean" (committed) vs. "dirty" (in-flight). Reads of clean data can be served locally; reads of dirty data are forwarded to the tail. This distributes read load while maintaining strong consistency.
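The clean/dirty read path can be sketched as follows. This is a simplification of the real protocol (in CRAQ a node with a dirty key queries the tail for the committed version number rather than the value itself; here the read is simply answered from the tail's committed store, and all names are illustrative):

```python
# Simplified CRAQ read path: a node answers locally when its copy of the
# key is clean (committed); if the key has a dirty (in-flight) version,
# the read is resolved via the tail, which holds only committed data.
class CraqNode:
    def __init__(self):
        self.clean = {}   # committed values
        self.dirty = {}   # in-flight values

    def read(self, key: str, tail: "CraqNode") -> str:
        if key in self.dirty:
            return tail.clean[key]   # resolve dirty reads via the tail
        return self.clean[key]       # clean reads are served locally

tail = CraqNode()
tail.clean["x"] = "committed"
mid = CraqNode()
mid.clean["x"] = "committed"
mid.dirty["x"] = "in-flight"      # a newer write is still propagating

assert mid.read("x", tail) == "committed"   # never returns in-flight data
```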
Chain replication is less common in traditional databases but appears in distributed storage systems. Microsoft Azure Storage uses a variant of chain replication. Apache Kafka's replication model has chain-like properties. Object storage systems benefit from chain replication's bandwidth efficiency.
Consensus-based replication uses distributed consensus protocols (Paxos, Raft, Zab) to ensure that all replicas agree on the order of writes. This provides the strongest consistency guarantees—linearizability—while tolerating failures of a minority of nodes.
The Core Idea:
Before a write is committed, a majority (quorum) of nodes must agree to accept it. This ensures:
Common Consensus Protocols:
Raft
Paxos
Zab (Zookeeper Atomic Broadcast)
Raft Protocol Deep Dive:
Leader Election:
Log Replication:
Safety Guarantees:
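One concrete piece of Raft's log replication is its commit rule: the leader marks a log index committed once a majority of nodes (itself included) have stored it. A sketch (simplified; real Raft additionally requires the entry to be from the leader's current term):

```python
# Simplified Raft commit rule: given each node's highest stored log index
# (the leader's matchIndex view, leader included), the committed index is
# the highest index stored on a majority of nodes.
def commit_index(match_index: list, cluster_size: int) -> int:
    needed = cluster_size // 2 + 1                  # majority size
    return sorted(match_index, reverse=True)[needed - 1]

# 5 nodes storing indices [7, 7, 6, 5, 3]: index 6 is on three nodes
# (a majority), so entries up to index 6 are committed.
assert commit_index([7, 7, 6, 5, 3], 5) == 6
```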
| Characteristic | Description | Impact |
|---|---|---|
| Fault Tolerance | Tolerates (N-1)/2 failures for N nodes | 3 nodes → 1 failure; 5 nodes → 2 failures |
| Write Latency | Requires majority acknowledgment | Higher than async, similar to sync |
| Consistency | Linearizable (strongest) | No stale reads possible |
| Availability | Available if majority is reachable | CAP: CP system (not AP) |
| Complexity | Complex protocol implementation | Use battle-tested libraries (etcd, Consul) |
Consensus requires a majority quorum. With 3 nodes, majority is 2, tolerating 1 failure. With 5 nodes, majority is 3, tolerating 2 failures. Even-numbered clusters (2, 4, 6) provide no benefit: 4 nodes still only tolerate 1 failure (majority is 3), same as 3 nodes but with more overhead. Always use odd numbers.
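The cluster-sizing arithmetic above is worth encoding directly (function names are illustrative):

```python
def majority(n: int) -> int:
    """Quorum size for an n-node cluster."""
    return n // 2 + 1

def failures_tolerated(n: int) -> int:
    """Failures survivable while a majority remains: floor((n - 1) / 2)."""
    return (n - 1) // 2

# Even-sized clusters buy nothing: 4 nodes tolerate the same single
# failure as 3 nodes, but require a larger quorum (3 instead of 2).
assert failures_tolerated(3) == failures_tolerated(4) == 1
assert failures_tolerated(5) == 2
assert majority(3) == 2 and majority(4) == 3
```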
Quorum-based replication (also called leaderless or Dynamo-style replication) eliminates the leader entirely. Clients can write to and read from any node. Consistency is achieved through quorum arithmetic: if you write to W nodes and read from R nodes, and W + R > N (total nodes), at least one node you read from has the latest write.
The Quorum Formula:
If W + R > N, every read quorum overlaps every write quorum in at least one node, so reads can observe the latest write.
Where:
N = Total number of replicas
W = Number of nodes that must acknowledge a write
R = Number of nodes that must respond to a read
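The overlap condition is a single inequality, easily checked for any configuration (function name is illustrative):

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True iff every read quorum shares at least one node with every
    write quorum, guaranteeing some read replica holds the latest write."""
    return w + r > n

assert quorums_overlap(3, 2, 2)        # balanced: overlap guaranteed
assert quorums_overlap(3, 1, 3)        # read-all: overlap guaranteed
assert not quorums_overlap(3, 1, 1)    # eventual: stale reads possible
```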
Common Configurations:
| Config | N | W | R | Write Tolerance | Read Tolerance | Characteristic |
|---|---|---|---|---|---|---|
| Strong | 3 | 2 | 2 | 1 failure | 1 failure | Balanced |
| Write-heavy | 3 | 1 | 3 | 2 failures | 0 failures | Fast writes |
| Read-heavy | 3 | 3 | 1 | 0 failures | 2 failures | Fast reads |
| Eventual | 3 | 1 | 1 | 2 failures | 2 failures | No guarantee |
Read Repair and Anti-Entropy:
Quorum replication requires mechanisms to ensure replicas eventually converge:
Read Repair: When a client reads from R nodes and receives different versions, it writes the newest version back to stale nodes. This piggybacks consistency maintenance on read operations.
Anti-Entropy (Merkle Trees): Background processes compare data across replicas using hash trees. When differences are detected, the stale replica receives updates. This catches inconsistencies that read repair misses (data never read).
Hinted Handoff: If a node is temporarily unavailable during a write, another node stores the write temporarily (a "hint"). When the target node recovers, the hint is delivered. This maintains write availability during transient failures.
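Read repair, the first of these mechanisms, can be sketched with version-tagged replicas (the dict-based replica representation and function name are illustrative): the read returns the newest version found among R replicas and writes it back to any stale one in passing.

```python
# Read-repair sketch: read a key from several replicas (each a dict of
# {key: {"version": int, "value": ...}}), return the newest value, and
# overwrite any stale replica with it as a side effect of the read.
def read_with_repair(replicas: list, key: str):
    versions = [(rep[key]["version"], rep[key]["value"])
                for rep in replicas if key in rep]
    latest_version, latest_value = max(versions, key=lambda v: v[0])
    for rep in replicas:   # repair stale or missing copies
        if key not in rep or rep[key]["version"] < latest_version:
            rep[key] = {"version": latest_version, "value": latest_value}
    return latest_value

fresh = {"k": {"version": 2, "value": "new"}}
stale = {"k": {"version": 1, "value": "old"}}
assert read_with_repair([fresh, stale], "k") == "new"
assert stale["k"]["value"] == "new"   # stale replica repaired in passing
```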
A 'sloppy quorum' allows writes to succeed even if some of the designated N replicas are unavailable, by writing to other nodes instead. This increases write availability but weakens durability guarantees. Amazon's Dynamo uses sloppy quorums—sacrificing strict consistency for high availability.
Selecting the right replication strategy requires matching your workload characteristics and requirements to strategy capabilities. Here's a decision framework:
| Requirement | Best Strategy | Rationale |
|---|---|---|
| Simple operations, strong consistency | Primary-Replica + Sync | Simplest model; sync ensures durability |
| Read scalability, eventual consistency OK | Primary-Replica + Async | Add replicas freely; low write latency |
| Multi-region writes, low latency | Multi-Primary | Local writes in each region |
| Strong consistency, fault tolerance | Consensus (Raft/Paxos) | Linearizable; survives minority failures |
| High write availability, tunable consistency | Quorum-Based | No single point of failure; flexible W/R |
| Simple high throughput, strong consistency | Chain Replication | Efficient network usage; strong guarantees |
Decision Tree:
1. Do you need multi-region writes with low latency?
Yes → Multi-Primary (accept conflict resolution complexity)
No → Continue
2. Is strong (linearizable) consistency required?
Yes → Consensus-Based (Raft/Paxos)
No → Continue
3. Is write availability more important than consistency?
Yes → Quorum-Based (tunable W/R)
No → Continue
4. Is read scalability the primary concern?
Yes → Primary-Replica with Async (many replicas)
No → Primary-Replica with Sync (simple, durable)
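The decision tree above can be expressed as a small function, a sketch with simplified yes/no inputs (the parameter and strategy names are illustrative labels, not a configuration API):

```python
# The four-question decision tree, evaluated in order: the first
# requirement that applies determines the strategy.
def choose_strategy(multi_region_writes: bool,
                    needs_linearizability: bool,
                    availability_over_consistency: bool,
                    read_scaling_primary_concern: bool) -> str:
    if multi_region_writes:
        return "multi-primary"
    if needs_linearizability:
        return "consensus (Raft/Paxos)"
    if availability_over_consistency:
        return "quorum-based"
    if read_scaling_primary_concern:
        return "primary-replica + async"
    return "primary-replica + sync"

assert choose_strategy(False, True, False, False) == "consensus (Raft/Paxos)"
assert choose_strategy(False, False, False, False) == "primary-replica + sync"
```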
Start simple. Primary-replica covers 90% of use cases. Multi-primary and consensus add operational complexity that you must be prepared to manage. Choose the simplest strategy that meets your requirements—complexity is a cost, not a feature.
Real-world systems often combine multiple replication strategies to achieve complex requirements. Understanding these hybrid approaches reveals the flexibility of replication architecture:
Example: CockroachDB Architecture
CockroachDB demonstrates sophisticated hybrid replication:
This combines sharding, consensus, and lease-based optimization into a globally consistent yet locally fast system.
Hybrid architectures are often the result of evolution, not initial design. Start with a simple strategy (primary-replica). As requirements grow, add components: a synchronous replica for durability, a remote async replica for DR, sharding for write scale. Each addition should address a specific, measured need.
We've explored the major replication strategies and their trade-offs. Let's consolidate the key insights:
What's Next:
Now that we understand replication strategies, we'll dive into the most critical aspect of distributed database design: Consistency Trade-offs. We'll explore the spectrum from strong to eventual consistency, the CAP and PACELC theorems, and how to choose the right consistency model for your application's semantics.
You now understand the major replication strategies—primary-replica, multi-primary, chain replication, consensus-based, and quorum-based. Each strategy makes specific trade-offs between consistency, availability, latency, and operational complexity. Select the simplest strategy that meets your requirements, and be prepared to evolve as needs grow. Next, we'll explore consistency trade-offs in depth.