When data lives on a single machine, consistency is straightforward: there's one copy, and all access goes through one system. But distributed systems replicate data across multiple nodes—for fault tolerance, for performance, for geographic coverage. This replication creates one of the most challenging problems in distributed computing: maintaining consistency.
What does it mean for distributed data to be 'consistent'? If I write a value in New York, when should it be visible in Tokyo? If two users update the same record simultaneously, which update wins? If a network partition prevents nodes from communicating, can they both continue accepting writes safely?
These questions have no easy answers. The solutions involve fundamental tradeoffs that shape every distributed system architecture. Understanding consistency—its models, impossibility results, and practical approaches—is essential for building and reasoning about distributed systems.
By the end of this page, you will understand the spectrum of consistency models from strong to eventual, the CAP theorem and its practical implications, special consistency guarantees that systems may provide, techniques for conflict resolution, and how to choose appropriate consistency levels for different use cases.
Why Consistency Is Hard:
In a distributed system with replicated data, the consistency problem arises from fundamental physical constraints:
1. Network Latency Is Non-Zero
Information cannot travel faster than light. Sydney to London is ~17,000 km; at c = 300,000 km/s, minimum round-trip is ~113ms. No protocol can make data instantly consistent globally.
2. Messages Can Be Delayed or Lost
Networks are unreliable. A message to synchronize replicas might take milliseconds or seconds, or might never arrive. Systems must handle all possibilities.
3. Nodes Can Fail Independently
While Node A processes an update, Node B might crash, leaving replicas in different states. Recovery must handle partial update scenarios.
4. Concurrent Operations Create Conflicts
Two nodes might accept conflicting updates to the same data simultaneously, with no way to know about each other until later.
The Central Question:
Given these constraints, how do we ensure that when users access data—from any replica—they see a reasonable, predictable view of the data? Different answers define different consistency models.
The 'C' in ACID (database transactions) refers to maintaining invariants within a single database (e.g., foreign keys, constraints). Distributed consistency refers to agreement across multiple nodes or replicas. These are related but distinct concepts. A system can have ACID consistency locally while having complex distributed consistency semantics.
Consistency models define the contract between a distributed storage system and its clients regarding what values can be returned by reads. They form a spectrum from strongest (most restrictive, easiest to reason about) to weakest (most permissive, hardest to program against).
| Model | Guarantee | Coordination Required | Example Systems |
|---|---|---|---|
| Linearizable | Real-time ordering; single-copy semantics | High (consensus on every op) | Spanner, CockroachDB |
| Sequential | Same global order; respects program order | High | Zookeeper |
| Causal | Cause precedes effect across all clients | Medium (metadata tracking) | COPS, Eiger |
| Read-Your-Writes | Client sees own writes | Low (per-session tracking) | Most web apps |
| Eventual | Converges eventually; no timing guarantee | Minimal | Cassandra, DynamoDB (default) |
Stronger consistency requires more coordination. More coordination means higher latency (waiting for acknowledgments), lower availability (can't proceed during partitions), and lower throughput (coordination is a bottleneck). Weaker consistency allows faster, more available systems but places the burden of handling anomalies on application developers.
Linearizability (also called atomic consistency or strong consistency) is the strongest single-object consistency model. It provides the illusion that there is only one copy of the data, and all operations on it are atomic.
Formal Definition:
An execution is linearizable if every operation appears to take effect atomically at a single point in time between its invocation and its response, and this order is consistent with real time: if operation A completes before operation B begins, every client observes A's effect before B's.
What Linearizability Provides: reads always return the value of the most recent completed write, and all clients observe operations on a given object in the same real-time order, as if there were a single copy.
What Linearizability Does NOT Provide: isolation for transactions spanning multiple objects or multiple operations; that is the domain of serializability, a separate (transactional) guarantee.
Achieving Linearizability:
Linearizability requires coordination—all replicas must agree on the order of all operations. Common approaches:
1. Single Leader: route all writes (and linearizable reads) through one leader replica, which serializes every operation.
2. Consensus Protocols: use Paxos or Raft so replicas agree on a single total order of operations even across failures.
3. Synchronized Clocks (Google Spanner): Spanner's TrueTime API bounds clock uncertainty using GPS and atomic clocks, letting it assign globally ordered timestamps and wait out the uncertainty before committing.
Use linearizability for: leader election (only one leader at a time), distributed locks (mutual exclusion), unique sequence generation (unique IDs), and any scenario where 'exactly-once' semantics are critical. Don't use for high-volume operations where the coordination cost is prohibitive.
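To make the lock and unique-ID use cases concrete, here is a minimal single-process sketch of the compare-and-set primitive that a linearizable store typically exposes, with leader election built on top. The `LinearizableRegister` class and its names are illustrative, not any product's API; in a real system the lock would be replaced by a single leader or a consensus round.

```python
import threading

class LinearizableRegister:
    """Toy single-copy register. A real system would replicate this state
    via a single leader or a consensus protocol (Raft, Paxos)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()  # stands in for the coordination layer

    def compare_and_set(self, expected, new):
        """Atomically set the value only if it currently equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

# Leader election: whoever successfully CASes None -> their id wins.
leader_slot = LinearizableRegister(value=None)
print(leader_slot.compare_and_set(None, "node-a"))  # True: node-a is leader
print(leader_slot.compare_and_set(None, "node-b"))  # False: slot already taken
```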
The CAP theorem (Consistency, Availability, Partition tolerance), formulated by Eric Brewer and proven by Gilbert and Lynch (2002), is the most famous impossibility result in distributed systems. It states that a distributed data store can provide at most two of these three properties:
C — Consistency (Linearizability) Every read receives the most recent write or an error.
A — Availability Every request to a non-failing node receives a response (no timeout or error).
P — Partition Tolerance The system continues to operate despite arbitrary partitions of the network (messages between nodes may be lost or delayed indefinitely).
The Impossibility:
During a network partition, you must choose:
- Consistency (CP): refuse or delay some requests rather than return data that might be stale or conflicting.
- Availability (AP): keep serving requests on both sides of the partition, accepting that replicas may diverge and reads may be stale.
CAP Nuances and Clarifications:
1. It's About Partitions
CAP only forces a choice during partitions. When the network is healthy, you can have both consistency and availability. The question is what happens when things go wrong.
2. Partition Tolerance Is Not Optional
Network partitions are a fact of life in distributed systems. 'CA' (without P) would mean the system simply doesn't work during partitions—not a useful system. Real choices are CP or AP.
3. Consistency Has a Specific Meaning
CAP's 'consistency' means linearizability, specifically. Weaker consistency models (eventual, causal) can provide high availability during partitions.
4. The Choice Is Per-Operation
A system can make different choices for different operations. Write operations might be CP (strong consistency) while reads might be AP (always respond, possibly stale).
The PACELC theorem (Daniel Abadi) extends CAP: 'if Partition, Availability or Consistency; else, Latency or Consistency.' Even without partitions, there's a tradeoff between latency and consistency. Synchronous replication (for consistency) adds latency. The full spectrum of tradeoffs is P→A/C, else L/C.
Eventual consistency is the consistency model of many highly scalable distributed systems. It's often misunderstood—let's examine it precisely.
Definition:
If no new updates are made to a data item, eventually all replicas will converge to the same value.
What 'Eventually' Means: there is no bound on convergence time; it may be milliseconds under healthy replication or much longer during failures and partitions.
What Eventual Consistency Allows: reads may return stale values, different replicas may return different values for the same key, and clients may observe updates in different orders.
What Eventual Consistency Does NOT Guarantee: when convergence happens, which value wins when updates conflict, or that a client will read its own writes.
The Challenge of Concurrent Updates:
When two replicas accept conflicting updates to the same data:
Time:
t1: Replica A sets x = 1
t2: Replica B sets x = 2
t3: Replicas synchronize... what is x?
Eventual consistency doesn't specify how conflicts resolve—only that replicas eventually agree. Additional mechanisms are needed:
Conflict Resolution Strategies:
Last-Writer-Wins (LWW): attach a timestamp to every write; on conflict, keep the write with the latest timestamp and silently discard the others.
Merge Functions (CRDTs): use data types whose concurrent updates always merge deterministically (covered in the CRDT section below).
Application-Level Resolution: keep the conflicting versions as siblings and let the application (or user) decide how to merge them.
| Strategy | Approach | Pros | Cons |
|---|---|---|---|
| Last-Writer-Wins | Latest timestamp wins | Simple, automatic | Loses updates, clock dependency |
| CRDTs | Mathematically mergeable types | No data loss, automatic | Limited data types, space overhead |
| Vector Clocks + App | Track causality, app resolves | Flexible, preserves updates | Complex, app burden |
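As a concrete sketch of the first strategy, the snippet below implements a tiny last-writer-wins register. Names are illustrative, and it assumes timestamps are comparable across nodes, which is exactly the clock dependency noted in the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWValue:
    value: object
    timestamp: float   # wall-clock or hybrid logical clock
    node_id: str       # tie-breaker so the merge is deterministic

def lww_merge(a: LWWValue, b: LWWValue) -> LWWValue:
    """Keep the write with the latest timestamp; break ties by node id.
    Note: the 'losing' write is silently discarded."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))

# Replica A and Replica B accepted conflicting writes while partitioned.
a = LWWValue(value=1, timestamp=1700000000.120, node_id="A")
b = LWWValue(value=2, timestamp=1700000000.250, node_id="B")
print(lww_merge(a, b).value)  # 2 -- B's write wins; A's update is lost
```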
Quorum systems are a powerful technique for providing tunable consistency in replicated systems. By adjusting the number of replicas that must participate in reads and writes, systems can trade off between consistency and availability.
The Quorum Principle:
With N total replicas:
- W = the number of replicas that must acknowledge a write before it is considered successful
- R = the number of replicas that must respond to a read before a result is returned
If W + R > N, reads and writes overlap—at least one replica in every read quorum has the latest write. This provides strong consistency.
Examples:
N = 3 replicas
W = 2, R = 2: W + R = 4 > 3 ✓ (strong consistency)
- Write to majority, read from majority
- Tolerates 1 failure for both reads and writes
W = 3, R = 1: W + R = 4 > 3 ✓ (read-optimized)
- All replicas must acknowledge writes
- Any replica can answer reads
- Fast reads, slower writes
W = 1, R = 3: W + R = 4 > 3 ✓ (write-optimized)
- Write to any replica
- Must read all replicas
- Fast writes, slower reads
W = 1, R = 1: W + R = 2 ≤ 3 ✗ (eventual consistency)
- May miss latest write
- Fast but potentially stale
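The toy coordinator below (illustrative code, not a real client library) shows why the overlap matters: with N = 3, W = 2, R = 2, every read quorum shares at least one replica with the last write quorum, and the reader returns the highest-versioned response.

```python
# Toy quorum coordinator: N=3 replicas, W=2, R=2, versioned values.
N, W, R = 3, 2, 2
replicas = [{} for _ in range(N)]   # each replica: key -> (version, value)

def quorum_write(key, value, version, targets):
    """Write succeeds once W replicas acknowledge."""
    acks = 0
    for i in targets:
        replicas[i][key] = (version, value)
        acks += 1
        if acks >= W:
            return True
    return False

def quorum_read(key, targets):
    """Read R replicas and return the value with the highest version."""
    responses = [replicas[i].get(key, (0, None)) for i in targets[:R]]
    return max(responses)[1]

quorum_write("x", "v1", version=1, targets=[0, 1])   # replica 2 missed it
print(quorum_read("x", targets=[1, 2]))              # 'v1': overlap at replica 1
print(quorum_read("x", targets=[2, 0]))              # 'v1': overlap at replica 0
```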
| W | R | N | Consistency | Availability | Latency |
|---|---|---|---|---|---|
| N | 1 | N | Strong | Low (need all for writes) | Low reads, high writes |
| (N+1)/2 | (N+1)/2 | N | Strong | Medium (majority) | Medium both |
| 1 | N | N | Strong | Low (need all for reads) | High reads, low writes |
| 1 | 1 | N | Eventual | High | Low both |
Quorums in Practice:
Cassandra uses tunable quorum:
- QUORUM: majority (N/2 + 1)
- ONE: single replica (eventual)
- ALL: all replicas (strongest)

Riak and DynamoDB similarly allow tuning W and R.
Limitations: W + R > N guarantees that a read quorum overlaps the latest successful write, but it does not by itself give full linearizability; sloppy quorums, partially failed writes, and concurrent writes can still produce anomalies unless combined with mechanisms such as versioning and read repair.
For critical operations (financial transactions), use QUORUM or ALL. For latency-sensitive operations tolerant of stale data (product catalog), use ONE or LOCAL_QUORUM. Mixing consistency levels within an application is common—different operations have different requirements.
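As a usage sketch with the DataStax Python driver for Cassandra (the keyspace, table, and column names here are invented for illustration), the consistency level can be set per statement:

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # hypothetical keyspace

# Latency-sensitive, staleness-tolerant read: a single replica answers.
catalog_read = SimpleStatement(
    "SELECT name, price FROM products WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(catalog_read, ("sku-123",))

# Critical write: a majority of replicas must acknowledge.
order_write = SimpleStatement(
    "INSERT INTO orders (id, user_id, total) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(order_write, ("order-1", "user-9", 42.00))
```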
Causal consistency is a compelling middle ground between strong and eventual consistency. It preserves intuitive cause-and-effect relationships while allowing high availability.
Definition:
If operation A causally precedes operation B (denoted A → B), then all clients observe A before B. Concurrent operations (neither causally precedes the other) may be observed in any order.
Causality Relationships: operation A causally precedes B (A → B) when A and B come from the same client and A was issued first, when B reads a value that A wrote, or when there is a chain A → C → B (transitivity).
Concurrent: If neither A → B nor B → A, they are concurrent.
Example: Social Media Post
Alice posts: "I got a job!"
Bob reads Alice's post
Bob comments: "Congratulations!"
With causal consistency: no client can see Bob's comment without also seeing Alice's post, because Bob read the post before writing the comment (read-then-write establishes causality).
With only eventual consistency: another user might see "Congratulations!" before, or without, the post it replies to, which reads as broken even though replicas eventually converge.
Tracking Causality:
Implementing causal consistency requires tracking causal dependencies:
Vector Clocks: each node maintains a counter per participant; comparing vectors tells whether one operation happened before another or whether they are concurrent (see the sketch below).
Dependency Metadata: each write carries references to the operations it depends on, and a replica makes the write visible only after those dependencies are locally applied (the approach taken by systems such as COPS).
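A minimal vector-clock sketch (illustrative helper functions, one counter per participant) showing how clock comparison distinguishes happened-before from concurrent, using the Alice/Bob example above:

```python
def vc_increment(clock, node):
    """Advance this node's entry after a local event."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(a, b):
    """Element-wise maximum: applied when a message's clock is received."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def happened_before(a, b):
    """True if a -> b: every entry of a is <= b's and the clocks differ."""
    return all(a.get(n, 0) <= b.get(n, 0) for n in a) and a != b

# Alice posts, Bob reads it (merging Alice's clock), then Bob comments.
post = vc_increment({}, "alice")                    # {'alice': 1}
comment = vc_increment(vc_merge(post, {}), "bob")   # {'alice': 1, 'bob': 1}
unrelated = vc_increment({}, "carol")               # {'carol': 1}

print(happened_before(post, comment))   # True: post -> comment
print(happened_before(post, unrelated), happened_before(unrelated, post))  # False False: concurrent
```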
| Aspect | Causal Consistency | Comparison |
|---|---|---|
| Availability | High (during partitions) | Better than linearizable |
| Latency | Low (local reads possible) | Better than linearizable |
| Programmer Model | Intuitive for many apps | Easier than eventual |
| Implementation | Metadata overhead | More complex than eventual |
| Constraint | No real-time ordering | Weaker than linearizable |
Many applications need causal consistency but not linearizability. Social feeds, collaborative documents, and messaging don't require global real-time ordering—they just need causes to precede effects. Causal consistency provides this with much better availability and latency than linearizability.
CRDTs (Conflict-free Replicated Data Types) are data structures designed to be replicated across distributed nodes without coordination, while guaranteeing eventual convergence. They mathematically ensure that concurrent modifications merge automatically and deterministically.
The CRDT Insight:
Traditional data types conflict on concurrent modification: if two replicas of a plain counter both read 5 and independently write back 6, one increment is lost when the replicas reconcile.
CRDTs define operations and merge functions that commute, associate, and are idempotent—ensuring that all orderings of operations produce the same result.
Types of CRDTs:
Operation-based (CmRDT): replicas broadcast operations (e.g., "add 3"); the operations must commute, and the delivery layer must apply each operation exactly once.
State-based (CvRDT): replicas periodically exchange their full state and merge it with a join function that is commutative, associative, and idempotent, so states can be gossiped in any order and any number of times (see the G-Counter sketch below).
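A state-based G-Counter (grow-only counter) is one of the simplest CvRDTs. The sketch below uses illustrative names; the point is that merge is an element-wise maximum, which is commutative, associative, and idempotent, so replicas converge no matter how states are exchanged.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica_id, amount=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Element-wise max: safe to apply in any order, any number of times."""
        merged = dict(self.counts)
        for rid, n in other.counts.items():
            merged[rid] = max(merged.get(rid, 0), n)
        return GCounter(merged)

# Two replicas count page views independently while partitioned.
a, b = GCounter(), GCounter()
a.increment("A"); a.increment("A")   # A saw 2 views
b.increment("B")                     # B saw 1 view
print(a.merge(b).value(), b.merge(a).value())  # 3 3 -- same result either way
```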
CRDTs in Practice: Riak ships CRDT-backed counters, sets, and maps; Redis Enterprise uses CRDTs for active-active geo-replication; collaborative editing libraries such as Yjs and Automerge are built on CRDTs.
Limitations: only certain data types have natural CRDT formulations, merge metadata (per-replica entries, tombstones) adds space overhead, and CRDTs cannot enforce global invariants such as "the balance never goes negative."
Choosing the right consistency model is a critical architectural decision. The right choice depends on application requirements, user expectations, and operational constraints.
Decision Framework:
| Requirement | Recommended Model | Reasoning |
|---|---|---|
| Financial transactions | Linearizable | Cannot tolerate stale reads or lost updates |
| Leader election, locks | Linearizable | Need mutual exclusion guarantees |
| Social media timeline | Causal | Causes before effects; high availability |
| Shopping cart | Eventual (with merge) | Availability critical; merge on checkout |
| Analytics/logging | Eventual | Stale data acceptable; volume high |
| Session state | Read-your-writes | User sees their own updates |
| User preferences | Eventual | Slightly stale settings are fine |
Mixing Consistency Levels:
Most real systems use different consistency levels for different operations:
Example: E-Commerce Platform
| Operation | Consistency | Reason |
|---|---|---|
| Inventory check | Eventual | Performance; a slightly stale count is acceptable because the decrement is the safeguard |
| Inventory decrement | Linearizable | Must not oversell |
| Shopping cart | Eventual + merge | High availability |
| Order placement | Linearizable | Financial correctness |
| Product reviews | Eventual | Stale reviews acceptable |
| User authentication | Strongly consistent | Security |
Operational Considerations: monitor replication lag, make the consistency level of each operation explicit in client code, and rehearse failure scenarios; consistency problems tend to surface only during partitions and failovers.
Consistency requirements are often discovered after production incidents. A system that 'works fine' with eventual consistency may have rare bugs that cause data loss or corruption. Test corner cases. Chaos test with partitions. Understand the implications before committing to a model.
Consistency is perhaps the most nuanced challenge in distributed systems. Let's consolidate the key insights:
- Consistency models form a spectrum from linearizable to eventual; stronger models demand more coordination and pay for it in latency and availability.
- The CAP theorem forces a choice only during partitions, and PACELC adds a latency/consistency tradeoff even without them.
- Eventual consistency requires a conflict-resolution strategy: last-writer-wins, CRDTs, or application-level merging.
- Quorum systems (W + R > N) make consistency tunable per operation.
- Causal consistency preserves cause-and-effect ordering with better availability and latency than linearizability.
- Choose consistency per operation, not per system, and test behavior under partitions.
Module Complete:
You've now completed the foundational module on Distributed System Concepts. You understand what distributed systems are, how they achieve transparency, the principles of scalability, mechanisms for fault tolerance, and the challenges of consistency. This foundation prepares you for deeper exploration of distributed communication, coordination, file systems, and cloud computing in subsequent modules.
You now understand consistency comprehensively: the spectrum of consistency models, the CAP theorem and its implications, eventual consistency and conflict resolution, quorum systems, causal consistency, CRDTs, and how to make practical consistency choices. This knowledge is essential for designing and reasoning about distributed data systems.