Fragmentation divides data; replication copies it. While fragmentation distributes the workload across nodes, replication ensures that data survives node failures and remains accessible from multiple locations.
Replication is the mechanism by which distributed databases achieve durability, availability, performance, and load distribution.
However, replication introduces the fundamental challenge that defines distributed computing: keeping copies consistent. When data exists in multiple places, modifications must propagate. This propagation takes time, during which replicas disagree. Managing this disagreement—or choosing to tolerate it—is the central problem of replication design.
By the end of this page, you will understand replication fundamentals—why we replicate, how synchronous and asynchronous strategies differ, what consistency trade-offs each implies, and how to choose the right replication strategy for different scenarios. You'll grasp the theoretical underpinnings that make replication both powerful and challenging.
What is a Replica?
A replica is a copy of data maintained at a distinct location. In database terms, a replica typically contains a copy of one or more fragments. The replication factor indicates how many copies exist—a replication factor of 3 means each piece of data exists in three places.
Primary Goals of Replication
1. Durability
Ensure data survives failures. If a disk fails, data on other replicas remains intact. With replication factor 3 and independent failure modes, the probability of losing all copies at once is vanishingly small.
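The arithmetic behind this claim can be sketched directly. As a simplifying assumption, suppose each replica fails independently with the same probability over some window (correlated failures, such as a shared power supply, make the real numbers worse):

```python
# Probability of data loss = probability that ALL replicas fail
# within the window of interest. Assumes independent failures,
# which is an optimistic simplification.

def loss_probability(p_single: float, replication_factor: int) -> float:
    """p_single: failure probability of one replica in the window."""
    return p_single ** replication_factor

# With a 1% per-replica failure probability:
print(loss_probability(0.01, 1))  # one copy: 1 in 100
print(loss_probability(0.01, 3))  # three copies: about 1 in a million
```

Each additional replica multiplies the loss probability by `p_single`, which is why replication factor 3 is such a common default.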
2. Availability
Ensure data remains accessible despite failures. With multiple replicas, services continue operating when individual replicas fail.
3. Performance
Improve read throughput and reduce latency by serving reads from replicas close to the client.
4. Load Distribution
Spread query load across replicas so no single node becomes a bottleneck.
The Fundamental Challenge
Maintaining identical copies sounds simple until you consider: What happens when writes occur?
If data is modified on one replica, other replicas become stale. They contain old data until updates propagate. During this window, reads served by stale replicas return outdated results.
All replication strategies are fundamentally about managing this update propagation—when it happens, how it happens, and what guarantees it provides.
Replication provides real-time redundancy, not protection against logical errors. If application code corrupts data, that corruption replicates instantly to all replicas. Backups—point-in-time snapshots stored separately—protect against logical errors, ransomware, and accidental deletion. A robust data protection strategy requires both replication AND backups.
Synchronous replication ensures that a write is durably stored on multiple replicas before acknowledging success to the client. The transaction doesn't complete until replicas confirm receipt.
How It Works
Consistency Guarantee
Synchronous replication provides strong consistency (or linearizability)—all reads after a successful write are guaranteed to see that write, regardless of which replica serves the read. There is no window where replicas disagree; by the time the client learns the write succeeded, all replicas have it.
```
Timeline: Synchronous Replication (3 replicas)

Client                Primary            Secondary-1       Secondary-2
  |                      |                    |                 |
  |--- BEGIN WRITE ----->|                    |                 |
  |                      |--- REPLICATE ----->|                 |
  |                      |--- REPLICATE --------------------->  |
  |                      |                    |                 |
  |     (waiting...)     |                    |  (writing...)   |
  |                      |                    |                 |
  |                      |<--- ACK -----------|                 |
  |                      |<--- ACK ---------------------------  |
  |                      |                    |                 |
  |<--- COMMIT ACK ------|                    |                 |
  |                      |                    |                 |

Write latency = Network RTT to farthest replica + replica write time
Key property: When client receives COMMIT ACK, all replicas have the data
```

A common optimization is quorum acknowledgment: require W out of N replicas to acknowledge (where W > N/2). For N=3, W=2 means we wait for any 2 replicas. This tolerates one slow/failed replica while still providing strong consistency, assuming reads also use quorum (R replicas where R + W > N).
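The quorum-acknowledged write can be sketched in a few lines. This is a hypothetical, single-threaded simplification (real systems contact replicas in parallel and time out on stragglers); the `Replica` class and `quorum_write` function are illustrative names, not a real API:

```python
# Sketch of a quorum-acknowledged synchronous write: the commit is
# acknowledged to the client only after W of N replicas confirm.

class Replica:
    def __init__(self, up=True):
        self.up, self.store = up, {}

    def apply(self, key, value):
        """Durably store the write and acknowledge (False if node is down)."""
        if self.up:
            self.store[key] = value
            return True
        return False

def quorum_write(replicas, key, value, w):
    """Send the write to all replicas; succeed once w acks arrive."""
    acks = 0
    for replica in replicas:          # real systems do this in parallel
        if replica.apply(key, value):
            acks += 1
        if acks >= w:
            return True               # commit can be acknowledged
    return False                      # quorum unreachable: write fails

# N=3, W=2 tolerates one failed replica:
nodes = [Replica(), Replica(), Replica(up=False)]
print(quorum_write(nodes, "balance", 100, w=2))  # True
```

With two of three replicas down, the same call returns False: the write is rejected rather than silently losing durability.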
Asynchronous replication acknowledges writes after the primary replica commits, without waiting for secondaries. Replication to secondaries happens eventually but not as part of the commit path.
How It Works
Consistency Reality
Asynchronous replication provides eventual consistency—replicas will eventually converge to the same state, but there's a window where they disagree. During this replication lag window:
```
Timeline: Asynchronous Replication (3 replicas)

Client                Primary            Secondary-1       Secondary-2
  |                      |                    |                 |
  |--- BEGIN WRITE ----->|                    |                 |
  |                      |                    |                 |
  |                      | (local commit)     |                 |
  |                      |                    |                 |
  |<--- COMMIT ACK ------|                    |                 |
  |                      |                    |                 |
  | [client continues]   |                    |                 |
  |                      |--- REPLICATE ----->|  (background)   |
  |                      |--- REPLICATE --------------------->  |
  |                      |                    |                 |
  |                      |<--- ACK -----------|                 |
  |                      |<--- ACK ---------------------------  |

Write latency = Primary commit time only (fast!)
Key property: Client receives ACK before secondaries have the data
Risk: Primary failure before replication = data loss
```

Replication Lag
The delay between a write committing on the primary and appearing on secondaries is replication lag. Typical values range from sub-millisecond within a datacenter to seconds across regions, and can spike much higher under load.
Replication lag is a health metric. Excessive lag indicates problems such as an overloaded primary, network congestion, or secondaries too slow to apply incoming changes.
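Monitoring lag typically means comparing log positions: the primary's current write position minus the position each secondary has applied. A minimal sketch with hypothetical positions and thresholds (real systems expose these as WAL/LSN positions or binlog offsets):

```python
# Replication lag as a log-position delta. Positions and the alert
# threshold below are made-up illustrative values.

def replication_lag(primary_pos: int, replica_pos: int) -> int:
    """How many log bytes (or records) the replica is behind."""
    return primary_pos - replica_pos

primary_position = 104_857_600           # primary's current log position
secondary_positions = {
    "secondary-1": 104_857_100,          # nearly caught up
    "secondary-2": 104_000_000,          # far behind
}

for name, pos in secondary_positions.items():
    lag = replication_lag(primary_position, pos)
    status = "OK" if lag < 1_000 else "ALERT: excessive lag"
    print(f"{name}: {lag} behind -- {status}")
```

The same delta also bounds the durability gap: if the primary dies now, up to that many log bytes of acknowledged writes exist nowhere else.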
Durability Risk
With asynchronous replication, if the primary fails immediately after acknowledging a commit, that write may be lost—it existed only on the now-failed primary. This is the durability gap.
The size of this gap equals replication lag. For some applications, losing "the last few seconds of writes" on catastrophic failure is acceptable. For financial transactions, it's not.
| Characteristic | Synchronous | Asynchronous |
|---|---|---|
| Write latency | High (wait for replicas) | Low (local commit only) |
| Consistency | Strong (linearizable) | Eventual |
| Durability on ack | Guaranteed across replicas | Primary only |
| Availability | Requires replica quorum | Primary only needed |
| Data loss on primary failure | Zero | Up to replication lag |
| Geographic suitability | Same region preferred | Works across regions |
You cannot have low latency, strong consistency, and high availability simultaneously for geographically distributed data. Synchronous replication trades latency for consistency. Asynchronous trades consistency for latency. This fundamental trade-off is unavoidable—it's physics, not engineering.
Beyond timing (sync vs. async), replication strategies differ in topology—how replicas relate to each other and which can accept writes.
1. Primary-Secondary (Leader-Follower)
One replica (primary/leader) accepts all writes. Other replicas (secondaries/followers) receive replicated changes and serve reads.
This is the most common strategy for traditional relational databases (PostgreSQL, MySQL, SQL Server).
2. Multi-Primary (Multi-Leader, Multi-Master)
Multiple replicas accept writes. Each write is replicated to other primaries.
Conflict Example: two clients update the same row on different primaries at roughly the same time. Each primary commits its version locally, and replication then delivers two contradictory updates that must somehow be reconciled.
Conflict Resolution Strategies: common approaches include last-write-wins (timestamp-based), application-defined merge logic, and conflict-free replicated data types (CRDTs).
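Last-write-wins is the simplest of these strategies and can be sketched in a few lines. It attaches a timestamp to each write and keeps the newer one—assuming reasonably synchronized clocks, which is itself a strong assumption and the source of LWW's main weakness:

```python
# Last-write-wins (LWW) conflict resolution sketch. The later
# timestamp silently overwrites the earlier one; with clock skew,
# a genuinely later write can lose, and the losing write is dropped.

def resolve_lww(version_a, version_b):
    """Each version is a (timestamp, value) pair; keep the newer one."""
    return version_a if version_a[0] >= version_b[0] else version_b

# Two primaries accepted concurrent writes to the same user's email:
us_write = (1_700_000_005, "alice@new.example")   # committed at t=...05
eu_write = (1_700_000_003, "alice@old.example")   # committed at t=...03

winner = resolve_lww(us_write, eu_write)
print(winner)  # the t=...05 write wins on both replicas
```

Because both replicas apply the same deterministic rule, they converge to the same value without coordinating—at the cost of discarding one of the concurrent writes.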
3. Leaderless (Peer-to-Peer)
No designated leader. Any node can accept writes. Coordination happens through quorum protocols.
Quorum Formula: R + W > N (where N = total replicas)
If you read from R replicas and write to W replicas, and R + W > N, every read set overlaps the most recent write set in at least one replica. The reader can therefore return the latest value (using version numbers to identify it) without a central coordinator.
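The overlap argument can be checked exhaustively for N=3, R=2, W=2. In the sketch below, versions are simple integers and the replica states are made-up values; the point is that no choice of R replicas can miss the latest write:

```python
# Why R + W > N guarantees reads see the latest write: any read set
# of R replicas must intersect the W replicas that hold the newest
# version, and version numbers let the reader pick it out.

from itertools import combinations

N, W, R = 3, 2, 2          # R + W = 4 > N = 3

# replica id -> (version, value); the latest write (version 2)
# reached a W=2 quorum: replicas 0 and 1. Replica 2 is stale.
replicas = {0: (2, "new"), 1: (2, "new"), 2: (1, "old")}

def quorum_read(replicas, read_set):
    """Return the highest-versioned (version, value) among those consulted."""
    return max(replicas[r] for r in read_set)

# Every possible R=2 read set includes at least one up-to-date replica:
for read_set in combinations(replicas, R):
    version, value = quorum_read(replicas, read_set)
    assert value == "new"
print("every quorum read returned the latest value")
```

With R=1 the read set (2,) would return "old"—which is exactly the anomaly the R + W > N condition rules out.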
| Aspect | Primary-Secondary | Multi-Primary | Leaderless |
|---|---|---|---|
| Write availability | Requires primary | Any primary | Any quorum |
| Read scaling | Excellent | Excellent | Excellent |
| Conflict handling | None needed | Required | Required |
| Consistency model | Strong (sync) or eventual | Eventual + conflict resolution | Tunable via quorum |
| Failover complexity | Leader election needed | Automatic | Automatic |
| Use case | Traditional OLTP | Multi-region writes | Always-available systems |
Most production systems use primary-secondary replication within a region (synchronous for durability) and asynchronous replication between regions (for disaster recovery). Multi-primary and leaderless are typically used when write availability across regions is required—accepting the conflict complexity as a trade-off.
Replication introduces a spectrum of consistency models, each with distinct guarantees and costs. Understanding this spectrum is essential for distributed database design.
Strong Consistency (Linearizability)
Every read returns the most recent write. The system behaves as if there's a single copy of data. All clients agree on the order of operations.
Causal Consistency
Operations that are causally related appear in the same order at all replicas. Concurrent operations may appear in different orders.
Session Consistency (Read-Your-Writes)
Within a single session, clients see their own writes. Different clients may see different data.
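One common way to implement read-your-writes is for the session to remember the version (or log position) of its last write and only accept reads from replicas that have caught up to it. The classes and method names below are hypothetical illustrations, not a real client API:

```python
# Read-your-writes sketch: the session tracks the version of its own
# last write and skips replicas that haven't applied it yet.

class Node:
    def __init__(self):
        self.version = 0      # highest write version this node has applied
        self.data = {}        # key -> (version, value)

class Session:
    def __init__(self):
        self.last_written_version = 0

    def write(self, primary, key, value):
        primary.version += 1
        primary.data[key] = (primary.version, value)
        self.last_written_version = primary.version

    def read(self, replicas, key):
        for replica in replicas:
            if replica.version >= self.last_written_version:  # caught up?
                return replica.data[key][1]
        return None   # no suitable replica; retry or fall back to primary

primary, stale_secondary = Node(), Node()
session = Session()
session.write(primary, "profile", "v1")
# The stale secondary (version 0) is skipped; the read finds "v1":
print(session.read([stale_secondary, primary], "profile"))
```

Other clients, with no memory of this write, may still read stale data from the secondary—which is exactly the "different clients may see different data" caveat above.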
Eventual Consistency
If no new writes occur, all replicas will eventually converge to the same value. No guarantees during the convergence period.
The Consistency Hierarchy
Stronger consistency provides more intuitive behavior but costs performance and availability:
Strong (Linearizable)
↓ weaker, faster, more available
Sequential
↓
Causal
↓
Session / Read-Your-Writes
↓
Monotonic Reads
↓
Eventual
Eventual consistency isn't "broken" consistency—it's a valid model with explicit semantics. Applications designed for it handle these anomalies gracefully: UI shows "saving..." during lag, idempotent operations tolerate retries, conflict resolution merges concurrent changes. The key is matching consistency level to application requirements.
How do databases actually replicate data? Several mechanisms exist, each with distinct characteristics.
1. Statement-Based Replication
Replicate the SQL statements themselves. Secondary replicas execute the same statements. This keeps the log compact, but statements containing nondeterministic functions (NOW(), RAND()) can produce different results on each replica.
2. Row-Based Replication
Replicate the effect of statements—the actual row changes (inserts, updates, deletes).
3. Write-Ahead Log (WAL) Shipping
Replicate the database's internal write-ahead log—the physical changes to data pages.
4. Logical Log Replication
Replicate a logical description of changes—independent of internal storage format.
```sql
-- Original statement
UPDATE employees SET salary = salary * 1.1 WHERE department = 'Engineering';

-- Statement-Based log entry
"UPDATE employees SET salary = salary * 1.1 WHERE department = 'Engineering'"

-- Row-Based log entries
{table: employees, key: 1001, before: {salary: 80000}, after: {salary: 88000}}
{table: employees, key: 1002, before: {salary: 95000}, after: {salary: 104500}}
{table: employees, key: 1003, before: {salary: 72000}, after: {salary: 79200}}
... (one entry per affected row)

-- WAL entry (physical level)
[Block 5, Page 23, Offset 192: write bytes 0x4E20 -> 0x5578]
[Block 5, Page 23, Offset 256: write bytes 0x5DC0 -> 0x6672]
... (physical page modifications)

-- Logical log entry
{operation: UPDATE, table: employees, key_columns: [id: 1001], changed_columns: {salary: {old: 80000, new: 88000}}}
```

| Mechanism | Log Size | Deterministic | Version Coupling | Cross-Engine |
|---|---|---|---|---|
| Statement-Based | Minimal | No | Medium | Possible |
| Row-Based | Medium-Large | Yes | Low | Easy |
| WAL Shipping | Large | Yes | High (exact version) | No |
| Logical Replication | Medium | Yes | Low | Yes (CDC) |
CDC systems (Debezium, AWS DMS, Google Datastream) extract logical change logs from databases and stream them to other systems—data warehouses, search indices, event platforms. This extends replication beyond the source database, enabling real-time data integration across heterogeneous systems.
Replication's value is realized during failures. When replicas fail, the system must detect the failure, reconfigure, and continue operating. This process—failover—is among the most critical and complex operations in distributed systems.
Detecting Failures
How do you know a replica has failed versus being slow or experiencing network issues?
The Split-Brain Problem
If network partitions separate replicas but all remain operational, each partition might elect its own primary. Two primaries accepting writes = divergent data = disaster.
Solutions include quorum-based leader election (a node becomes primary only with majority agreement), fencing tokens that invalidate writes from stale primaries, and external witness nodes that break ties during partitions.
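Fencing tokens can be sketched concretely. Each newly elected primary receives a strictly increasing epoch number from the coordination service, and storage rejects writes carrying a stale epoch. The `Storage` class below is a hypothetical illustration of the mechanism:

```python
# Fencing-token sketch: an old primary that "wakes up" after a
# partition still holds its old epoch, so its writes are rejected
# instead of corrupting data.

class Storage:
    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            return False              # stale primary: write fenced off
        self.highest_epoch = epoch    # remember the newest epoch seen
        self.data[key] = value
        return True

storage = Storage()
print(storage.write(epoch=1, key="k", value="from old primary"))  # True
# A partition occurs; a new primary is elected with epoch 2:
print(storage.write(epoch=2, key="k", value="from new primary"))  # True
# The partition heals and the old primary retries with its stale epoch:
print(storage.write(epoch=1, key="k", value="zombie write"))      # False
```

The critical property is that epochs are issued by a single source of truth (the coordination service), so two primaries can never hold the same highest epoch.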
Promoting a Secondary
When the primary fails, a secondary must become the new primary. Typically the most up-to-date secondary is chosen, the remaining secondaries are repointed to replicate from it, and clients are redirected to send writes to it.
Data Loss Considerations
With asynchronous replication, the promoted secondary may be missing recent writes: anything inside the replication lag window at the moment of failure is lost or must be reconciled later.
With synchronous replication, no data loss occurs—promotion is seamless.
Automatic failover reduces recovery time but risks false positives (failing over when not needed) and split-brain. Manual failover is slower but allows human judgment. Many production systems use automatic failover within a region but manual failover across regions—balancing speed with safety.
Data replication is fundamental to distributed database reliability and performance. To consolidate the key concepts: replicas provide durability and availability; timing (synchronous vs. asynchronous) sets the consistency/latency trade-off; topology determines which nodes accept writes; and failover keeps the system running when replicas die.
What's Next
Replication and fragmentation together create distributed data storage. But how do applications interact with this complexity? The next page explores transparency—the mechanisms that hide distribution details from applications, making a distributed database appear as a single logical system.
You now understand how data replication works in distributed databases—from synchronous and asynchronous strategies to consistency trade-offs and failover mechanisms. Next, we'll explore how transparency abstracts away distribution complexity from applications.