Fragmentation divides data; replication copies it. While fragmentation distributes the workload across nodes, replication ensures that data survives node failures and remains accessible from multiple locations.
Replication is the mechanism by which distributed databases achieve durability, availability, performance, and load distribution.
However, replication introduces the fundamental challenge that defines distributed computing: keeping copies consistent. When data exists in multiple places, modifications must propagate. This propagation takes time, during which replicas disagree. Managing this disagreement—or choosing to tolerate it—is the central problem of replication design.
By the end of this page, you will understand replication fundamentals—why we replicate, how synchronous and asynchronous strategies differ, what consistency trade-offs each implies, and how to choose the right replication strategy for different scenarios. You'll grasp the theoretical underpinnings that make replication both powerful and challenging.
What is a Replica?
A replica is a copy of data maintained at a distinct location. In database terms, a replica typically contains a copy of one or more fragments. The replication factor indicates how many copies exist—a replication factor of 3 means each piece of data exists in three places.
Primary Goals of Replication
1. Durability
Ensure data survives failures. If a disk fails, data on other replicas remains intact. With replication factor 3 and independent failure modes, the probability of losing all copies at once is vanishingly small.
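The arithmetic behind this claim can be sketched directly. As a simplifying assumption, suppose each replica fails independently with the same probability over some window (correlated failures, such as a shared power supply, make the real numbers worse):

```python
# Probability of data loss = probability that ALL replicas fail
# within the window of interest. Assumes independent failures,
# which is an optimistic simplification.

def loss_probability(p_single: float, replication_factor: int) -> float:
    """p_single: failure probability of one replica in the window."""
    return p_single ** replication_factor

# With a 1% per-replica failure probability:
print(loss_probability(0.01, 1))  # one copy: 1 in 100
print(loss_probability(0.01, 3))  # three copies: about 1 in a million
```

Each additional replica multiplies the loss probability by `p_single`, which is why replication factor 3 is such a common default.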
2. Availability
Ensure data remains accessible despite failures. With multiple replicas, services continue operating when individual replicas fail.
3. Performance
Improve read throughput and reduce latency by serving reads from replicas close to the client.
4. Load Distribution
Spread query load across replicas so no single node becomes a bottleneck.
The Fundamental Challenge
Maintaining identical copies sounds simple until you consider: What happens when writes occur?
If data is modified on one replica, other replicas become stale. They contain old data until updates propagate. During this window, reads served by stale replicas return outdated results.
All replication strategies are fundamentally about managing this update propagation—when it happens, how it happens, and what guarantees it provides.
Replication provides real-time redundancy, not protection against logical errors. If application code corrupts data, that corruption replicates instantly to all replicas. Backups—point-in-time snapshots stored separately—protect against logical errors, ransomware, and accidental deletion. A robust data protection strategy requires both replication AND backups.
Synchronous replication ensures that a write is durably stored on multiple replicas before acknowledging success to the client. The transaction doesn't complete until replicas confirm receipt.
How It Works
Consistency Guarantee
Synchronous replication provides strong consistency (or linearizability)—all reads after a successful write are guaranteed to see that write, regardless of which replica serves the read. There is no window where replicas disagree; by the time the client learns the write succeeded, all replicas have it.
```
Timeline: Synchronous Replication (3 replicas)

Client                Primary            Secondary-1       Secondary-2
  |                      |                    |                 |
  |--- BEGIN WRITE ----->|                    |                 |
  |                      |--- REPLICATE ----->|                 |
  |                      |--- REPLICATE --------------------->  |
  |                      |                    |                 |
  |     (waiting...)     |                    |  (writing...)   |
  |                      |                    |                 |
  |                      |<--- ACK -----------|                 |
  |                      |<--- ACK ---------------------------  |
  |                      |                    |                 |
  |<--- COMMIT ACK ------|                    |                 |
  |                      |                    |                 |

Write latency = Network RTT to farthest replica + replica write time
Key property: When client receives COMMIT ACK, all replicas have the data
```

A common optimization is quorum acknowledgment: require W out of N replicas to acknowledge (where W > N/2). For N=3, W=2 means we wait for any 2 replicas. This tolerates one slow/failed replica while still providing strong consistency, assuming reads also use quorum (R replicas where R + W > N).
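The quorum-acknowledged write can be sketched in a few lines. This is a hypothetical, single-threaded simplification (real systems contact replicas in parallel and time out on stragglers); the `Replica` class and `quorum_write` function are illustrative names, not a real API:

```python
# Sketch of a quorum-acknowledged synchronous write: the commit is
# acknowledged to the client only after W of N replicas confirm.

class Replica:
    def __init__(self, up=True):
        self.up, self.store = up, {}

    def apply(self, key, value):
        """Durably store the write and acknowledge (False if node is down)."""
        if self.up:
            self.store[key] = value
            return True
        return False

def quorum_write(replicas, key, value, w):
    """Send the write to all replicas; succeed once w acks arrive."""
    acks = 0
    for replica in replicas:          # real systems do this in parallel
        if replica.apply(key, value):
            acks += 1
        if acks >= w:
            return True               # commit can be acknowledged
    return False                      # quorum unreachable: write fails

# N=3, W=2 tolerates one failed replica:
nodes = [Replica(), Replica(), Replica(up=False)]
print(quorum_write(nodes, "balance", 100, w=2))  # True
```

With two of three replicas down, the same call returns False: the write is rejected rather than silently losing durability.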
Asynchronous replication acknowledges writes after the primary replica commits, without waiting for secondaries. Replication to secondaries happens eventually but not as part of the commit path.
How It Works
Consistency Reality
Asynchronous replication provides eventual consistency—replicas will eventually converge to the same state, but there's a window where they disagree. During this replication lag window:
```
Timeline: Asynchronous Replication (3 replicas)

Client                Primary            Secondary-1       Secondary-2
  |                      |                    |                 |
  |--- BEGIN WRITE ----->|                    |                 |
  |                      |                    |                 |
  |                      | (local commit)     |                 |
  |                      |                    |                 |
  |<--- COMMIT ACK ------|                    |                 |
  |                      |                    |                 |
  | [client continues]   |                    |                 |
  |                      |--- REPLICATE ----->|  (background)   |
  |                      |--- REPLICATE --------------------->  |
  |                      |                    |                 |
  |                      |<--- ACK -----------|                 |
  |                      |<--- ACK ---------------------------  |

Write latency = Primary commit time only (fast!)
Key property: Client receives ACK before secondaries have the data
Risk: Primary failure before replication = data loss
```

Replication Lag
The delay between a write committing on the primary and appearing on secondaries is replication lag. Typical values range from sub-millisecond within a datacenter to seconds across regions, and can spike much higher under load.
Replication lag is a health metric. Excessive lag indicates problems such as an overloaded primary, network congestion, or secondaries too slow to apply incoming changes.
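Monitoring lag typically means comparing log positions: the primary's current write position minus the position each secondary has applied. A minimal sketch with hypothetical positions and thresholds (real systems expose these as WAL/LSN positions or binlog offsets):

```python
# Replication lag as a log-position delta. Positions and the alert
# threshold below are made-up illustrative values.

def replication_lag(primary_pos: int, replica_pos: int) -> int:
    """How many log bytes (or records) the replica is behind."""
    return primary_pos - replica_pos

primary_position = 104_857_600           # primary's current log position
secondary_positions = {
    "secondary-1": 104_857_100,          # nearly caught up
    "secondary-2": 104_000_000,          # far behind
}

for name, pos in secondary_positions.items():
    lag = replication_lag(primary_position, pos)
    status = "OK" if lag < 1_000 else "ALERT: excessive lag"
    print(f"{name}: {lag} behind -- {status}")
```

The same delta also bounds the durability gap: if the primary dies now, up to that many log bytes of acknowledged writes exist nowhere else.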
Durability Risk
With asynchronous replication, if the primary fails immediately after acknowledging a commit, that write may be lost—it existed only on the now-failed primary. This is the durability gap.
The size of this gap equals replication lag. For some applications, losing "the last few seconds of writes" on catastrophic failure is acceptable. For financial transactions, it's not.
| Characteristic | Synchronous | Asynchronous |
|---|---|---|
| Write latency | High (wait for replicas) | Low (local commit only) |
| Consistency | Strong (linearizable) | Eventual |
| Durability on ack | Guaranteed across replicas | Primary only |
| Availability | Requires replica quorum | Primary only needed |
| Data loss on primary failure | Zero | Up to replication lag |
| Geographic suitability | Same region preferred | Works across regions |
You cannot have low latency, strong consistency, and high availability simultaneously for geographically distributed data. Synchronous replication trades latency for consistency. Asynchronous trades consistency for latency. This fundamental trade-off is unavoidable—it's physics, not engineering.
Beyond timing (sync vs. async), replication strategies differ in topology—how replicas relate to each other and which can accept writes.
1. Primary-Secondary (Leader-Follower)
One replica (primary/leader) accepts all writes. Other replicas (secondaries/followers) receive replicated changes and serve reads.
This is the most common strategy for traditional relational databases (PostgreSQL, MySQL, SQL Server).
2. Multi-Primary (Multi-Leader, Multi-Master)
Multiple replicas accept writes. Each write is replicated to other primaries.
Conflict Example: two clients update the same row on different primaries at roughly the same time. Each primary commits its version locally, and replication then delivers two contradictory updates that must somehow be reconciled.
Conflict Resolution Strategies: common approaches include last-write-wins (timestamp-based), application-defined merge logic, and conflict-free replicated data types (CRDTs).
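Last-write-wins is the simplest of these strategies and can be sketched in a few lines. It attaches a timestamp to each write and keeps the newer one—assuming reasonably synchronized clocks, which is itself a strong assumption and the source of LWW's main weakness:

```python
# Last-write-wins (LWW) conflict resolution sketch. The later
# timestamp silently overwrites the earlier one; with clock skew,
# a genuinely later write can lose, and the losing write is dropped.

def resolve_lww(version_a, version_b):
    """Each version is a (timestamp, value) pair; keep the newer one."""
    return version_a if version_a[0] >= version_b[0] else version_b

# Two primaries accepted concurrent writes to the same user's email:
us_write = (1_700_000_005, "alice@new.example")   # committed at t=...05
eu_write = (1_700_000_003, "alice@old.example")   # committed at t=...03

winner = resolve_lww(us_write, eu_write)
print(winner)  # the t=...05 write wins on both replicas
```

Because both replicas apply the same deterministic rule, they converge to the same value without coordinating—at the cost of discarding one of the concurrent writes.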
3. Leaderless (Peer-to-Peer)
No designated leader. Any node can accept writes. Coordination happens through quorum protocols.
Quorum Formula: R + W > N (where N = total replicas)
If you read from R replicas and write to W replicas, and R + W > N, every read set overlaps the most recent write set in at least one replica. The reader can therefore return the latest value (using version numbers to identify it) without a central coordinator.
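The overlap argument can be checked exhaustively for N=3, R=2, W=2. In the sketch below, versions are simple integers and the replica states are made-up values; the point is that no choice of R replicas can miss the latest write:

```python
# Why R + W > N guarantees reads see the latest write: any read set
# of R replicas must intersect the W replicas that hold the newest
# version, and version numbers let the reader pick it out.

from itertools import combinations

N, W, R = 3, 2, 2          # R + W = 4 > N = 3

# replica id -> (version, value); the latest write (version 2)
# reached a W=2 quorum: replicas 0 and 1. Replica 2 is stale.
replicas = {0: (2, "new"), 1: (2, "new"), 2: (1, "old")}

def quorum_read(replicas, read_set):
    """Return the highest-versioned (version, value) among those consulted."""
    return max(replicas[r] for r in read_set)

# Every possible R=2 read set includes at least one up-to-date replica:
for read_set in combinations(replicas, R):
    version, value = quorum_read(replicas, read_set)
    assert value == "new"
print("every quorum read returned the latest value")
```

With R=1 the read set (2,) would return "old"—which is exactly the anomaly the R + W > N condition rules out.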
| Aspect | Primary-Secondary | Multi-Primary | Leaderless |
|---|---|---|---|
| Write availability | Requires primary | Any primary | Any quorum |
| Read scaling | Excellent | Excellent | Excellent |
| Conflict handling | None needed | Required | Required |
| Consistency model | Strong (sync) or eventual | Eventual + conflict resolution | Tunable via quorum |
| Failover complexity | Leader election needed | Automatic | Automatic |
| Use case | Traditional OLTP | Multi-region writes | Always-available systems |
Most production systems use primary-secondary replication within a region (synchronous for durability) and asynchronous replication between regions (for disaster recovery). Multi-primary and leaderless are typically used when write availability across regions is required—accepting the conflict complexity as a trade-off.
Replication introduces a spectrum of consistency models, each with distinct guarantees and costs. Understanding this spectrum is essential for distributed database design.
Strong Consistency (Linearizability)
Every read returns the most recent write. The system behaves as if there's a single copy of data. All clients agree on the order of operations.
Causal Consistency
Operations that are causally related appear in the same order at all replicas. Concurrent operations may appear in different orders.
Session Consistency (Read-Your-Writes)
Within a single session, clients see their own writes. Different clients may see different data.
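One common way to implement read-your-writes is for the session to remember the version (or log position) of its last write and only accept reads from replicas that have caught up to it. The classes and method names below are hypothetical illustrations, not a real client API:

```python
# Read-your-writes sketch: the session tracks the version of its own
# last write and skips replicas that haven't applied it yet.

class Node:
    def __init__(self):
        self.version = 0      # highest write version this node has applied
        self.data = {}        # key -> (version, value)

class Session:
    def __init__(self):
        self.last_written_version = 0

    def write(self, primary, key, value):
        primary.version += 1
        primary.data[key] = (primary.version, value)
        self.last_written_version = primary.version

    def read(self, replicas, key):
        for replica in replicas:
            if replica.version >= self.last_written_version:  # caught up?
                return replica.data[key][1]
        return None   # no suitable replica; retry or fall back to primary

primary, stale_secondary = Node(), Node()
session = Session()
session.write(primary, "profile", "v1")
# The stale secondary (version 0) is skipped; the read finds "v1":
print(session.read([stale_secondary, primary], "profile"))
```

Other clients, with no memory of this write, may still read stale data from the secondary—which is exactly the "different clients may see different data" caveat above.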
Eventual Consistency
If no new writes occur, all replicas will eventually converge to the same value. No guarantees during the convergence period.
The Consistency Hierarchy
Stronger consistency provides more intuitive behavior but costs performance and availability:
Strong (Linearizable)
↓ weaker, faster, more available
Sequential
↓
Causal
↓
Session / Read-Your-Writes
↓
Monotonic Reads
↓
Eventual
Eventual consistency isn't "broken" consistency—it's a valid model with explicit semantics. Applications designed for it handle these anomalies gracefully: UI shows "saving..." during lag, idempotent operations tolerate retries, conflict resolution merges concurrent changes. The key is matching consistency level to application requirements.
How do databases actually replicate data? Several mechanisms exist, each with distinct characteristics.
1. Statement-Based Replication
Replicate the SQL statements themselves. Secondary replicas execute the same statements. This keeps the log compact, but statements containing nondeterministic functions (NOW(), RAND()) can produce different results on each replica.
2. Row-Based Replication
Replicate the effect of statements—the actual row changes (inserts, updates, deletes).
3. Write-Ahead Log (WAL) Shipping
Replicate the database's internal write-ahead log—the physical changes to data pages.
4. Logical Log Replication
Replicate a logical description of changes—independent of internal storage format.
```sql
-- Original statement
UPDATE employees SET salary = salary * 1.1 WHERE department = 'Engineering';

-- Statement-Based log entry
"UPDATE employees SET salary = salary * 1.1 WHERE department = 'Engineering'"

-- Row-Based log entries
{table: employees, key: 1001, before: {salary: 80000}, after: {salary: 88000}}
{table: employees, key: 1002, before: {salary: 95000}, after: {salary: 104500}}
{table: employees, key: 1003, before: {salary: 72000}, after: {salary: 79200}}
... (one entry per affected row)

-- WAL entry (physical level)
[Block 5, Page 23, Offset 192: write bytes 0x4E20 -> 0x5578]
[Block 5, Page 23, Offset 256: write bytes 0x5DC0 -> 0x6672]
... (physical page modifications)

-- Logical log entry
{operation: UPDATE, table: employees, key_columns: [id: 1001], changed_columns: {salary: {old: 80000, new: 88000}}}
```

| Mechanism | Log Size | Deterministic | Version Coupling | Cross-Engine |
|---|---|---|---|---|
| Statement-Based | Minimal | No | Medium | Possible |
| Row-Based | Medium-Large | Yes | Low | Easy |
| WAL Shipping | Large | Yes | High (exact version) | No |
| Logical Replication | Medium | Yes | Low | Yes (CDC) |
CDC systems (Debezium, AWS DMS, Google Datastream) extract logical change logs from databases and stream them to other systems—data warehouses, search indices, event platforms. This extends replication beyond the source database, enabling real-time data integration across heterogeneous systems.
Replication's value is realized during failures. When replicas fail, the system must detect the failure, reconfigure, and continue operating. This process—failover—is among the most critical and complex operations in distributed systems.
Detecting Failures
How do you know a replica has failed versus being slow or experiencing network issues?
The Split-Brain Problem
If network partitions separate replicas but all remain operational, each partition might elect its own primary. Two primaries accepting writes = divergent data = disaster.
Solutions include quorum-based leader election (a node becomes primary only with majority agreement), fencing tokens that invalidate writes from stale primaries, and external witness nodes that break ties during partitions.
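Fencing tokens can be sketched concretely. Each newly elected primary receives a strictly increasing epoch number from the coordination service, and storage rejects writes carrying a stale epoch. The `Storage` class below is a hypothetical illustration of the mechanism:

```python
# Fencing-token sketch: an old primary that "wakes up" after a
# partition still holds its old epoch, so its writes are rejected
# instead of corrupting data.

class Storage:
    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            return False              # stale primary: write fenced off
        self.highest_epoch = epoch    # remember the newest epoch seen
        self.data[key] = value
        return True

storage = Storage()
print(storage.write(epoch=1, key="k", value="from old primary"))  # True
# A partition occurs; a new primary is elected with epoch 2:
print(storage.write(epoch=2, key="k", value="from new primary"))  # True
# The partition heals and the old primary retries with its stale epoch:
print(storage.write(epoch=1, key="k", value="zombie write"))      # False
```

The critical property is that epochs are issued by a single source of truth (the coordination service), so two primaries can never hold the same highest epoch.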
Promoting a Secondary
When the primary fails, a secondary must become the new primary. Typically the most up-to-date secondary is chosen, the remaining secondaries are repointed to replicate from it, and clients are redirected to send writes to it.
Data Loss Considerations
With asynchronous replication, the promoted secondary may be missing recent writes: anything inside the replication lag window at the moment of failure is lost or must be reconciled later.
With synchronous replication, no data loss occurs—promotion is seamless.
Automatic failover reduces recovery time but risks false positives (failing over when not needed) and split-brain. Manual failover is slower but allows human judgment. Many production systems use automatic failover within a region but manual failover across regions—balancing speed with safety.
Data replication is fundamental to distributed database reliability and performance. To consolidate the key concepts: replicas provide durability and availability; timing (synchronous vs. asynchronous) sets the consistency/latency trade-off; topology determines which nodes accept writes; and failover keeps the system running when replicas die.
What's Next
Replication and fragmentation together create distributed data storage. But how do applications interact with this complexity? The next page explores transparency—the mechanisms that hide distribution details from applications, making a distributed database appear as a single logical system.
You now understand how data replication works in distributed databases—from synchronous and asynchronous strategies to consistency trade-offs and failover mechanisms. Next, we'll explore how transparency abstracts away distribution complexity from applications.