In leaderless systems, there's no single authority to determine the "correct" value. Instead, correctness emerges from agreement among replicas. But how many replicas need to agree? And what guarantees does that agreement provide?
The answer lies in quorum systems—a concept from distributed computing that provides tunable consistency guarantees based on the number of nodes participating in reads and writes.
Quorums are powerful because they're mathematically grounded. With the right configuration, you can guarantee that reads always see the latest write, even in a system designed for availability. Understanding quorum mathematics is essential for configuring leaderless systems correctly.
By the end of this page, you will understand the N-R-W quorum formula, why quorum intersection guarantees consistency, the trade-offs between different quorum configurations, and practical guidance for choosing quorum settings based on your application's requirements.
Leaderless systems expose three key parameters that control consistency and availability trade-offs: N (the replication factor—how many replicas store each key), W (how many replicas must acknowledge a write before it succeeds), and R (how many replicas must respond to a read).
These parameters are typically configurable per-request, allowing different operations to have different consistency requirements.
| Configuration | N | R | W | Use Case |
|---|---|---|---|---|
| Read-heavy, eventual | 3 | 1 | 1 | Caching, session stores, analytics |
| Balanced, strong | 3 | 2 | 2 | Most production workloads |
| Write-heavy, eventual | 3 | 2 | 1 | Log aggregation, time-series |
| Maximum consistency | 3 | 3 | 3 | Financial data, inventory counts |
| Single-DC common | 3 | 2 | 2 | Local with quorum for safety |
| Multi-DC common | 5 (2+2+1) | 3 | 3 | Global with LOCAL_QUORUM optimization |
The fundamental relationship:
If R + W > N, reads and writes are guaranteed to overlap on at least one replica. This overlap ensures that any read will contact at least one replica that participated in the most recent write, providing strong consistency guarantees.
Mathematical proof:
N = total replicas
W = replicas that acknowledge a write
R = replicas contacted for a read
If W + R > N:
- Write touches W replicas
- Read touches R replicas
- Total "touches" = W + R > N
- By pigeonhole principle: at least one replica is touched by both
- Therefore: read must see the write
R + W > N is the foundation of quorum-based consistency. Memorize it. When someone asks "how do I get consistency in a leaderless system?" the answer starts here. With N=3, the most common setting is R=2, W=2—giving R+W=4 > 3, ensuring overlap.
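The overlap guarantee is small enough to verify by brute force. This sketch (illustrative, not any database's code) enumerates every possible read quorum and write quorum and checks that each pair intersects:

```python
from itertools import combinations

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Exhaustively check that every read quorum of size r
    intersects every write quorum of size w among n replicas."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )

print(quorums_always_overlap(3, 2, 2))  # True:  2 + 2 > 3
print(quorums_always_overlap(3, 1, 1))  # False: 1 + 1 <= 3
print(quorums_always_overlap(5, 3, 3))  # True:  3 + 3 > 5
```

Running this for any (N, R, W) confirms the pigeonhole argument: overlap is guaranteed exactly when R + W > N.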
Let's examine quorum intersection in concrete scenarios to build intuition for how it provides consistency guarantees.
Scenario 1: N=3, W=2, R=2 (Strong consistency)
Replicas: A, B, C
Write: value=100 → acknowledged by A, B (W=2 met)
A: value=100 ✓
B: value=100 ✓
C: value=? (may have old value or 100, depending on timing)
Read: contacts B, C (R=2)
B: returns 100 (was in write quorum)
C: returns ???
Result: At least B has the latest value
Read will see value=100
Why this works: the write quorum was {A, B}, and every possible read quorum of size 2—{A, B}, {A, C}, {B, C}—intersects it. Whichever two replicas the read contacts, at least one holds value=100, and the read returns the newest version it sees.
Scenario 2: N=3, W=1, R=1 (Eventual consistency)
Replicas: A, B, C
Write: value=100 → acknowledged by A only (W=1 met)
A: value=100 ✓
B: value=old
C: value=old
Read: contacts C only (R=1)
C: returns old value
Result: Read misses the write!
R + W = 1 + 1 = 2, but N = 3
No guaranteed overlap
This configuration provides no consistency guarantee—you might get stale data. It maximizes availability and throughput but sacrifices consistency.
Scenario 3: N=5, W=3, R=3 (Common multi-DC)
Replicas: A, B (DC1), C, D (DC2), E (DC3)
Write: value=100 → acknowledged by A, B, C (W=3 met)
Read: contacts C, D, E (R=3)
Write quorum: {A, B, C}
Read quorum: {C, D, E}
Intersection: {C}
C has the latest value → read sees value=100 ✓
R + W = 3 + 3 = 6 > 5 = N → guaranteed overlap
Even with R + W > N, there's a brief window during a write where some read quorums might not include the write's replicas. Writes are not atomic across replicas—they complete at different times. Quorum intersection guarantees consistency after the write is acknowledged, not during it.
Production systems like Cassandra provide named consistency levels that map to specific R and W values. Understanding these levels is crucial for configuring applications correctly.
Cassandra's write consistency levels:
| Level | Meaning | Trade-off |
|---|---|---|
| ANY | Write to any node, including hints | Maximum availability, may lose data if hint node fails |
| ONE | Write to at least 1 replica | Fast, durable on 1 node, stale reads possible |
| TWO | Write to at least 2 replicas | More durable, slightly higher latency |
| THREE | Write to at least 3 replicas | High durability for RF ≥ 3 |
| QUORUM | Write to ⌊N/2⌋ + 1 replicas | Strong consistency with QUORUM reads |
| LOCAL_QUORUM | Quorum in local datacenter only | Strong local consistency, no cross-DC wait |
| EACH_QUORUM | Quorum in each datacenter | Strong consistency everywhere, highest latency |
| ALL | Write to all N replicas | Maximum durability, any failure blocks writes |
Cassandra's read consistency levels:

| Level | Meaning | Trade-off |
|---|---|---|
| ONE | Read from 1 replica | Fastest, may return stale data |
| TWO | Read from 2 replicas | More likely to get fresh data |
| THREE | Read from 3 replicas | High confidence for RF ≥ 3 |
| QUORUM | Read from ⌊N/2⌋ + 1 replicas | Strong consistency with QUORUM writes |
| LOCAL_QUORUM | Quorum in local datacenter only | Strong local consistency, fast |
| EACH_QUORUM | Quorum from each datacenter | Strong global consistency |
| ALL | Read from all N replicas | Maximum freshness, any failure blocks reads |
| SERIAL | For LWT (linearizable reads) | Part of lightweight transaction protocol |
Choosing consistency levels:
For strong consistency: Use CL=QUORUM for both reads and writes. This guarantees overlap.
For eventual consistency with durability: Use CL=ONE for reads, CL=QUORUM for writes. Writes are durable but reads might be stale.
For maximum throughput: Use CL=ONE for both. Accept that reads may return stale data.
For multi-datacenter: Use LOCAL_QUORUM to avoid cross-DC latency. Note: this only provides strong consistency within each DC, not globally.
LOCAL_QUORUM is the most commonly used level in multi-DC deployments. It provides good consistency locally without waiting for distant datacenters. However, after a disaster recovery failover, data written with LOCAL_QUORUM in one DC might not be visible in another DC until async replication catches up.
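The ⌊N/2⌋ + 1 arithmetic behind these named levels is worth making concrete. This is a simplified model of the levels in the tables above, not Cassandra's implementation; LOCAL_QUORUM and EACH_QUORUM are omitted because they depend on per-datacenter replica counts:

```python
def required_acks(level: str, n: int) -> int:
    """Map a named consistency level to the number of replica
    acknowledgements it requires at replication factor n.
    Simplified model of the single-DC levels tabulated above."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": n // 2 + 1,  # floor(N/2) + 1
        "ALL": n,
    }
    return levels[level]

# With N=3, QUORUM needs 2 acks, so QUORUM reads plus QUORUM
# writes give R + W = 4 > 3 — guaranteed overlap.
print(required_acks("QUORUM", 3))  # 2
print(required_acks("QUORUM", 5))  # 3
```

Note that QUORUM reads combined with QUORUM writes always satisfy R + W > N, since two majorities of the same set must intersect.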
The choice of R and W directly impacts system availability during failures. Higher quorum requirements mean fewer failures can be tolerated.
Failure tolerance calculation:
With N replicas: writes can tolerate N − W node failures (W replicas must still be reachable), and reads can tolerate N − R failures.
Example: N=3, R=2, W=2 (QUORUM)
Write tolerance = 3 - 2 = 1 failure
Read tolerance = 3 - 2 = 1 failure
With 1 node down: Operations succeed (2 of 3 available)
With 2 nodes down: Operations FAIL (only 1 of 3 available)
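The calculation above generalizes to a two-line helper (an illustrative sketch, not a production API):

```python
def failure_tolerance(n: int, r: int, w: int) -> dict:
    """How many replica failures reads and writes each survive:
    a write needs w live replicas, a read needs r."""
    return {"writes": n - w, "reads": n - r}

print(failure_tolerance(3, 2, 2))  # {'writes': 1, 'reads': 1}
print(failure_tolerance(5, 3, 3))  # {'writes': 2, 'reads': 2}
print(failure_tolerance(3, 3, 3))  # {'writes': 0, 'reads': 0}
```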
| Config | N | R | W | Node Failures Tolerated | Consistency Level |
|---|---|---|---|---|---|
| Weak consistency | 3 | 1 | 1 | 2 (for reads or writes) | Eventual |
| Standard quorum | 3 | 2 | 2 | 1 (for reads AND writes) | Strong |
| Read-favoring | 3 | 1 | 2 | 2 (reads) / 1 (writes) | Eventual reads, durable writes |
| Write-favoring | 3 | 2 | 1 | 1 (reads) / 2 (writes) | Strong reads, fast writes |
| Total paranoia | 3 | 3 | 3 | 0 (any failure blocks) | Strongest, least available |
| 5-node quorum | 5 | 3 | 3 | 2 (for reads AND writes) | Strong, high availability |
The availability-consistency spectrum:
← More Available                                    More Consistent →

R=1, W=1         R=1, W=2           R=2, W=2       R=3, W=3
(2 failures)     (1-2 failures)     (1 failure)    (0 failures)
(stale reads)    (durable writes)   (strong)       (strongest)
Practical recommendation:
For most production workloads with N=3, start with R=2, W=2 (QUORUM reads and writes): R + W > N gives strong consistency while still tolerating one node failure. Relax individual operations to ONE only where you have verified that staleness is acceptable.
Complete node failures (node is down) are easier to handle than partial failures (node is slow, or responding to some requests but not others). A slow replica can cause cascading latency issues as operations wait for it. Many systems implement "speculative retry" to handle slow replicas by racing requests to additional nodes.
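The speculative-retry idea can be sketched with a thread pool: send to one replica, and if it hasn't answered within a deadline, race a second request and take whichever reply arrives first. The names here (`read_replica`, `speculative_read`) are illustrative, not any system's API, and latencies are simulated with sleeps:

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def read_replica(name: str, latency: float) -> str:
    """Simulated replica read: a slow replica simply takes longer."""
    time.sleep(latency)
    return f"value from {name}"

def speculative_read(primary, backup, speculate_after: float = 0.05):
    """Send to the primary; if no reply within speculate_after
    seconds, race a backup replica and return the first reply.
    (The pool still waits for stragglers on exit in this sketch.)"""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(read_replica, *primary)]
        done, _ = wait(futures, timeout=speculate_after)
        if not done:  # primary is slow — speculate to the backup
            futures.append(pool.submit(read_replica, *backup))
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

# Primary stalls for 0.5 s; the backup answers in 0.01 s and wins.
print(speculative_read(("slow-A", 0.5), ("fast-B", 0.01)))
```

This bounds tail latency at the cost of extra load: every speculation sends a duplicate request that one replica's work is wasted on.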
When a read operation contacts multiple replicas, it may discover they have different values. Read repair is the process of using this opportunity to synchronize the replicas.
The read repair process:
1. Read request for key K with R=2
2. Contact replicas A, B, C (contact all, wait for R)
A responds: value=100, timestamp=1000
B responds: value=100, timestamp=1000
C responds: value=50, timestamp=500 ← stale!
3. Return value=100 to client (newest value)
4. Initiate repair:
Send to C: "key K should have value=100, timestamp=1000"
C updates its local copy
5. All replicas now consistent
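The five-step process above can be condensed into a runnable sketch. The in-memory `replicas` dict stands in for real nodes, and timestamps decide which value is newest:

```python
def read_with_repair(replicas: dict, key: str):
    """Read a key from every replica, return the newest value,
    and push that value back to any replica holding a stale copy."""
    responses = {name: store[key] for name, store in replicas.items()}
    newest = max(responses.values(), key=lambda v: v["ts"])
    for name, version in responses.items():
        if version["ts"] < newest["ts"]:   # stale replica detected
            replicas[name][key] = newest   # repair it in place
    return newest["value"]

replicas = {
    "A": {"k": {"value": 100, "ts": 1000}},
    "B": {"k": {"value": 100, "ts": 1000}},
    "C": {"k": {"value": 50,  "ts": 500}},  # stale
}
print(read_with_repair(replicas, "k"))  # 100
print(replicas["C"]["k"])               # {'value': 100, 'ts': 1000}
```

After the read, replica C has been brought up to date as a side effect—no separate repair job was needed.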
Cassandra's read repair implementation:
Cassandra historically triggered additional background read repair probabilistically via the read_repair_chance and dclocal_read_repair_chance table options; both were deprecated and then removed in Cassandra 4.0. Blocking read repair still happens automatically: when the replicas contacted at the requested consistency level disagree, the coordinator writes the newest value back to the stale replicas before responding.
Why read repair helps quorum consistency:
Even with R + W > N, replicas can drift: writes are missed while a node is down, messages are dropped under load, and hints can expire before they are delivered.
Read repair is "continuous, opportunistic anti-entropy"—every read is a chance to fix inconsistencies.
Read repair adds latency (for repair) and network traffic (for corrections). For very hot keys read thousands of times per second, consider whether the repair overhead is justified. Some workloads disable read repair and rely on periodic anti-entropy repairs instead.
When two clients write the same key simultaneously, interesting scenarios arise even with quorum consistency. Understanding how concurrent writes interact is essential for correct application design.
Scenario: Concurrent writes with overlapping quorums
Replicas: A, B, C (N=3, W=2, R=2)
Client 1: Write value=X, timestamp=1000
→ Reaches A, B (W=2 met)
Client 2 (concurrent): Write value=Y, timestamp=1001
→ Reaches B, C (W=2 met)
Final state:
A: value=X, timestamp=1000
B: value=Y, timestamp=1001 (overwritten by Client 2)
C: value=Y, timestamp=1001
Subsequent read (contacts any 2):
- If contacts A, B: sees X and Y → Y wins (higher timestamp)
- If contacts B, C: sees Y, Y → returns Y
- If contacts A, C: sees X and Y → Y wins (higher timestamp)
Result: Y (higher timestamp) wins, X is silently lost
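The scenario can be simulated end to end. `lww_write` and `lww_read` are illustrative helpers, not a real client API; each replica keeps whichever version carries the highest timestamp:

```python
def lww_write(replicas, targets, value, ts):
    """Apply a write to the given replica subset; each replica
    keeps the version with the highest timestamp (last-writer-wins)."""
    for name in targets:
        if ts > replicas[name]["ts"]:
            replicas[name] = {"value": value, "ts": ts}

def lww_read(replicas, targets):
    """Read from a quorum and resolve by highest timestamp."""
    versions = [replicas[name] for name in targets]
    return max(versions, key=lambda v: v["ts"])["value"]

replicas = {name: {"value": None, "ts": 0} for name in "ABC"}
lww_write(replicas, ["A", "B"], "X", 1000)  # Client 1, W=2
lww_write(replicas, ["B", "C"], "Y", 1001)  # Client 2, W=2 (concurrent)

# Every possible 2-replica read quorum resolves to Y; X is lost.
for quorum in (["A", "B"], ["B", "C"], ["A", "C"]):
    print(lww_read(replicas, quorum))  # Y each time
```

Both writes satisfied W=2 and both were acknowledged as successful—yet X survives nowhere the resolver can prefer it.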
The "last writer wins" (LWW) problem:
With timestamp-based conflict resolution, concurrent writes result in one value silently overwriting the other. This is often undesirable:
// Shopping cart example
Client 1: cart = ["apple", "banana"] // timestamp 1000
Client 2: cart = ["milk", "bread"] // timestamp 1001
// LWW result: cart = ["milk", "bread"]
// Lost: "apple", "banana" silently gone!
Solutions to concurrent write conflicts include version vectors (detect concurrent versions and surface them to the application as siblings), CRDTs (data types that merge deterministically), and application-level merge logic—each explored in depth on the next page.
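One alternative to LWW, sketched under simplifying assumptions: keep both concurrent versions as siblings and merge them at read time. For a shopping cart, a natural merge is set union (removals would require tombstones, which this sketch omits):

```python
def merge_carts(siblings):
    """Merge concurrent cart versions by set union instead of
    discarding one. Removals need tombstones; omitted here."""
    merged = set()
    for cart in siblings:
        merged |= set(cart)
    return sorted(merged)

# The two concurrent carts from the LWW example:
print(merge_carts([["apple", "banana"], ["milk", "bread"]]))
# ['apple', 'banana', 'bread', 'milk']
```

With a union merge, neither client's items are silently lost—at the cost of occasionally "resurrecting" an item one client had removed, which is why real systems pair merges with tombstones or CRDTs.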
A common misconception is that R + W > N prevents write conflicts. It doesn't—it only guarantees you'll SEE the conflicts. Two concurrent writes can both succeed, and the conflict is detected and resolved during the next read. Quorums provide detection, not prevention.
With the theory understood, let's examine practical guidance for configuring quorums in production systems.
Starting point recommendations:
| Use Case | N | Write CL | Read CL | Rationale |
|---|---|---|---|---|
| User sessions | 3 | LOCAL_QUORUM | LOCAL_ONE | Durable writes; fast reads; stale OK briefly |
| User profiles | 3 | QUORUM | QUORUM | Strong consistency; always read fresh data |
| Activity feeds | 3 | ONE | ONE | Eventually consistent; optimized for throughput |
| Financial txns | 5 | QUORUM | QUORUM | High durability; survive 2 node failures |
| Metrics/logs | 3 | ONE | ONE | High ingestion rate; slight loss acceptable |
| Inventory counts | 3 | QUORUM | QUORUM | Cannot oversell; must read accurate count |
| ML feature store | 3 | ONE | ONE | Cached features; staleness OK for minutes |
Multi-datacenter considerations:
Setup: 3 DCs (US-East, US-West, EU), N=5 (2+2+1)
For local-first traffic:
Write: LOCAL_QUORUM (2 in local DC)
Read: LOCAL_QUORUM
Trade-off: Cross-DC replication is async
For global consistency:
Write: EACH_QUORUM (quorum in each DC)
Read: QUORUM
Trade-off: Slow (waits for distant DC)
Recommendation: LOCAL_QUORUM for most operations,
EACH_QUORUM only for critical writes
It's tempting to use QUORUM for everything "to be safe." But this can cause unnecessary latency and reduce availability. Many read workloads can safely use ONE if slight staleness is acceptable. Think carefully about the actual requirements for each operation type.
We've explored the mathematics and practice of quorum-based consistency. The key insights: R + W > N guarantees that read and write quorums overlap; quorums detect concurrent-write conflicts but do not prevent them; read repair opportunistically converges drifting replicas; and named consistency levels like QUORUM and LOCAL_QUORUM let you tune these trade-offs per request.
What's next:
With quorum semantics understood, we'll tackle the inevitable challenge: conflict handling. The final page of this module explores how leaderless systems detect, track, and resolve conflicts when concurrent writes create divergent versions.
You now understand quorum mechanics deeply—from the N-R-W formula to practical configuration strategies. The final page will complete your leaderless replication knowledge with comprehensive conflict resolution strategies.