You've mastered the theory of CAP—consistency, availability, and partition tolerance. You understand the formal definitions, the trade-offs, and the impossibility results. But sitting in a design review, someone asks: "Should we use Cassandra or PostgreSQL for this service?" Theory alone won't answer that question.
CAP in practice is about translating theoretical understanding into architectural decisions. It's about recognizing that real systems exist on a spectrum, not in neat categories. It's about understanding that the same system can behave as CP in one configuration and AP in another. And it's about knowing when CAP is the right framework and when other considerations dominate.
By the end of this page, you will have practical frameworks for making CAP-informed decisions, understand how popular distributed systems make their CAP trade-offs, know how to evaluate systems against your requirements, and develop intuition for when CAP is the primary concern versus when other factors dominate.
When designing a distributed system, use this framework to make CAP-informed decisions:
Step 1: Identify Your Data Categories
Not all data has the same requirements. Categorize your data:
| Category | Examples | Typical Priority |
|---|---|---|
| Financial | Payments, balances, ledgers | Consistency |
| User Identity | Authentication, permissions | Consistency |
| User Content | Posts, comments, uploads | Often availability |
| Session Data | Login state, preferences | Availability |
| Analytics | Metrics, logs, events | Availability |
| Inventory | Stock levels, reservations | Context-dependent |
| Catalog | Products, prices, descriptions | Often availability |
Step 2: Define Your Failure Scenarios
For each category, ask: What happens during a partition?
Step 3: Choose Your Strategy Per Category
For data requiring consistency:
For data tolerating eventual consistency:
For data that needs both (the hard cases):
Step 4: Validate with Failure Testing
After implementation, validate your choices:
In most systems, 80% of data can use a simple AP strategy with eventual consistency. Only 20% (or less) truly requires strong consistency. Don't over-engineer. Start with the question: 'What's the actual damage if this data is briefly inconsistent?' Often, the answer is 'not much.'
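One lightweight way to apply this framework in code is a single per-category policy table that services consult instead of choosing consistency ad-hoc. This is an illustrative sketch — the category names, fields, and defaults are hypothetical, not any library's API:

```javascript
// Hypothetical per-category consistency policy table; names are illustrative.
const DATA_POLICIES = {
  payments:    { mode: 'CP', write: 'quorum', read: 'quorum' },
  identity:    { mode: 'CP', write: 'quorum', read: 'quorum' },
  userContent: { mode: 'AP', write: 'one',    read: 'one' },
  sessions:    { mode: 'AP', write: 'one',    read: 'one' },
  analytics:   { mode: 'AP', write: 'one',    read: 'one' },
};

// Look up the policy for a category, defaulting to the safe (CP) side
// so unclassified data is never silently made eventually consistent.
function policyFor(category) {
  return DATA_POLICIES[category] ?? { mode: 'CP', write: 'quorum', read: 'quorum' };
}
```

Defaulting to CP for unknown categories matches the "don't over-engineer, but don't silently weaken guarantees" advice above.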
Understanding how production systems handle CAP provides practical insight. Let's examine several popular distributed databases and their CAP characteristics.
| System | CAP Mode | Consistency Model | Partition Behavior | When to Use |
|---|---|---|---|---|
| PostgreSQL (single) | CA* | Strong (ACID) | N/A (single node) | Strong consistency, moderate scale |
| PostgreSQL (sync rep) | CP | Strong | Minority side unavailable | Strong consistency with HA |
| MySQL Cluster | CP | Strong | Waits for majority | OLTP with high availability |
| CockroachDB | CP | Serializable | Minority partitions readonly | Distributed SQL, strong consistency |
| MongoDB | Configurable | Tunable per-op | Depends on write/read concern | Flexible, document-oriented |
| Cassandra | AP | Eventual | All nodes serve | High availability, write-heavy |
| DynamoDB | AP | Eventual (default) | Continues operating | Serverless, predictable performance |
| Redis Cluster | CP-leaning | Best-effort (async replication can lose acknowledged writes) | Minority unavailable | Caching with availability |
| etcd | CP | Linearizable | Minority unavailable | Configuration, coordination |
| ZooKeeper | CP | Linearizable | Minority unavailable | Coordination, leader election |
*A single-node system cannot experience a network partition, so CAP does not strictly apply; "CA" here just means it is consistent and available until the node itself fails.

Labeling a system 'CP' or 'AP' oversimplifies. Most modern systems offer configurable consistency. MongoDB can be CP or AP depending on writeConcern and readConcern settings. DynamoDB offers strongly consistent reads. The label describes default or primary behavior, not absolute constraints.
Deep Dive: MongoDB's Configurable Consistency
MongoDB beautifully illustrates how modern systems offer tunable consistency:
Write Concern: Controls durability/consistency of writes
- w: 1 - Acknowledged when the primary receives the write (AP-ish)
- w: majority - Acknowledged when a majority has it (CP-ish)
- w: 0 - Fire and forget (pure AP)

Read Concern: Controls consistency of reads

- local - Returns the primary's data, which may be rolled back (AP)
- majority - Returns data confirmed by a majority (CP)
- linearizable - Reflects all acknowledged writes (strong CP)

Read Preference: Where to read from

- primary - Always read from the primary (consistent but a single point of failure)
- secondary - Read from secondaries (available but potentially stale)
- nearest - Read from the lowest-latency node (fastest but unpredictable consistency)

This flexibility lets you tune per operation, using strong consistency for critical operations and relaxed consistency for others.
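One way to exploit this tuning without scattering raw settings through the codebase is a helper that maps an operation class to driver-shaped options. This is a sketch — the 'critical'/'casual' class names are hypothetical conventions, though the option shapes follow the writeConcern/readConcern/readPreference settings described above:

```javascript
// Map an operation class to MongoDB-style write/read settings.
// The 'critical'/'casual' classes are hypothetical conventions.
function concernsFor(opClass) {
  switch (opClass) {
    case 'critical': // e.g. payments, signup: CP-ish
      return {
        writeConcern: { w: 'majority' },
        readConcern: { level: 'majority' },
        readPreference: 'primary',
      };
    case 'casual': // e.g. analytics, feeds: AP-ish
      return {
        writeConcern: { w: 1 },
        readConcern: { level: 'local' },
        readPreference: 'nearest',
      };
    default:
      throw new Error(`unknown operation class: ${opClass}`);
  }
}
```

A collection call would then pass these options through, e.g. `orders.insertOne(doc, concernsFor('critical'))` with the official Node.js driver.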
CP systems sacrifice availability during partitions to maintain consistency. Let's explore when and how to use them effectively.
When to Choose CP:
Implementing CP Effectively:
1. Use consensus-based systems (etcd, ZooKeeper, Consul)
2. PostgreSQL with synchronous replication
synchronous_commit = on
synchronous_standby_names = 'first 1 (standby1, standby2)'
Writes are acknowledged only after at least one standby confirms. Partition of primary from all standbys means writes block.
3. CockroachDB for distributed SQL
SCENARIO: Distributed lock for critical-section protection, using etcd or ZooKeeper as a CP lock service.

1. Client acquires the lock:
   PUT /locks/resource-123
   Value: client-id, TTL: 30s
   Response: { "lease_id": 12345, "fencing_token": 42 }

2. Client performs work and includes the fencing token:
   UPDATE resources SET ... WHERE fencing_token < 42
   (Only succeeds if we're the latest lock holder)

3. Client releases the lock (or the TTL expires):
   DELETE /locks/resource-123

WHY FENCING TOKENS MATTER:

Without fencing:
- Client A gets the lock, then pauses (GC, network delay)
- The lock TTL expires; Client B gets the lock
- Client A resumes, thinking it still holds the lock
- Both A and B modify the resource → corruption!

With fencing:
- Client A: fencing_token = 42
- Client B: fencing_token = 43
- A's delayed write includes token 42
- Storage rejects it: token 42 < current token 43
- Only B's writes succeed → correctness preserved!

Trade-offs of CP Systems:
Latency: Writes go through consensus, adding round-trips. Expect 10-100ms per write depending on deployment topology.
Availability during partitions: Minority partitions are completely unavailable. In a 5-node cluster, 2 failing nodes means the minority cannot serve requests.
Scalability challenges: Every write must be agreed upon, limiting write throughput. Many CP systems don't scale writes horizontally well.
Complexity: Consensus protocols are subtle. Many implementations have had bugs. Use well-tested systems.
Mitigation strategies:
CP systems guarantee consistency when they respond, but they may not respond during partitions. A CP system with all nodes in one datacenter that loses internet connectivity is 100% unavailable externally. CP is about choosing unavailability over inconsistency—but the unavailability is real.
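Returning to the fencing tokens in the lock scenario above, the storage-side check is tiny. A minimal sketch, with a hypothetical in-memory store standing in for the real resource storage:

```javascript
// A resource store that rejects writes carrying a stale fencing token.
class FencedStore {
  constructor() {
    this.highestToken = -1; // highest fencing token seen so far
    this.value = null;
  }

  // Accept the write only if its token is at least as new as any seen.
  write(token, value) {
    if (token < this.highestToken) return false; // stale lock holder: reject
    this.highestToken = token;
    this.value = value;
    return true;
  }
}
```

Replaying the scenario: client B writes with token 43; A's delayed write with token 42 then arrives and is rejected, so only B's writes take effect.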
AP systems continue serving requests during partitions, accepting that replicas may temporarily diverge. Let's explore when and how to use them effectively.
When to Choose AP:
Implementing AP Effectively:
1. Choose the right conflict resolution strategy
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Last-Writer-Wins (LWW) | Latest timestamp wins | Simple, automatic | Silently loses data |
| Multi-Value | Return all versions | No data loss | Pushes resolution to app |
| CRDTs | Mathematically merge | Automatic, no loss (for supported types) | Limited data type support |
| Custom merge | Application logic | Full control | Complex, error-prone |
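To make the CRDT row concrete, here is a minimal grow-only set (G-Set) sketch. Merge is just set union — commutative, associative, and idempotent — so replicas converge no matter the merge order:

```javascript
// Minimal G-Set CRDT: elements can only be added; merge is set union.
function gsetAdd(set, item) {
  return new Set(set).add(item); // copy, then add (Set.add returns the set)
}

function gsetMerge(a, b) {
  return new Set([...a, ...b]); // union: commutative, associative, idempotent
}
```

Merging carts {"item1"} and {"item2"} from two partitions yields both items instead of losing one to last-writer-wins. Removals need the richer OR-Set discussed in the scenario below.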
SCENARIO: Shopping cart across partitioned regions

Traditional approach (problematic):
- Partition A: cart = ["item1"]
- Partition B: cart = ["item2"]
- After merge with LWW: cart = ["item2"] (item1 LOST!)

CRDT approach (G-Set: Grow-only Set):
- Partition A: cart = {"item1": {added: true}}
- Partition B: cart = {"item2": {added: true}}
- Merge operation: UNION of sets
- After merge: cart = {"item1": ..., "item2": ...} (both items!)

For removals, use OR-Set (Observed-Remove Set):
- Each item has a unique ID + operation tag
- add("apple", id=1) → {apple: {adds: [{id:1}], removes: []}}
- add("banana", id=2) → {banana: {adds: [{id:2}], removes: []}}
- remove("apple", id=1) → {apple: {adds: [{id:1}], removes: [1]}}
- An item is visible if ANY add is not in removes
- Concurrent add and remove: add wins (the user will remove again)

2. Implement read-your-writes when possible
Even in AP systems, users expect to see their own changes. Strategies:
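One widely used strategy — an illustrative sketch, with hypothetical names — is client-tracked versions: the client remembers the version of its own last write and only reads from replicas that have caught up:

```javascript
// Client-tracked versions for read-your-writes: remember the version of
// our last write; only accept reads from replicas at or past that version.
class Session {
  constructor() {
    this.lastWrittenVersion = 0;
  }

  recordWrite(version) {
    this.lastWrittenVersion = Math.max(this.lastWrittenVersion, version);
  }

  // A replica is safe to read from only if it has applied our writes.
  canReadFrom(replicaVersion) {
    return replicaVersion >= this.lastWrittenVersion;
  }
}
```

If no nearby replica qualifies, the client falls back to the primary: consistent for that user's own data, at the cost of extra latency.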
3. Monitor and alert on divergence
AP systems should track:
Alert when divergence exceeds thresholds—a replica that's hours behind may have a problem.
When designing AP systems, start by designing the reconciliation process. How will you detect conflicts? How will you resolve them? What's the user experience during reconciliation? If you can't answer these questions, you're not ready to deploy AP.
Modern distributed systems increasingly offer tunable consistency, allowing per-operation or per-table consistency settings. This is powerful but introduces complexity.
How Tunable Consistency Works:
Most tunable systems use quorum-based logic: with N replicas, a write is acknowledged by W of them and a read consults R of them.

Guarantee: If W + R > N, at least one node in the read quorum was also in the write quorum, ensuring you read the latest write.

Example configurations for N=3:
| Setting | W | R | W+R | Behavior |
|---|---|---|---|---|
| Strong | 2 | 2 | 4 | Consistent: any read sees latest write |
| Write-optimized | 1 | 3 | 4 | Fast writes, slow reads, still consistent |
| Read-optimized | 3 | 1 | 4 | Slow writes, fast reads, still consistent |
| Eventual | 1 | 1 | 2 | Fast but may read stale data |
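The W + R > N rule is easy to encode. A small helper (a sketch) that classifies a quorum configuration:

```javascript
// Returns true when a read quorum must overlap the write quorum,
// i.e. W + R > N guarantees reads observe the latest acknowledged write.
function isStronglyConsistent(n, w, r) {
  if (w < 1 || r < 1 || w > n || r > n) throw new Error('invalid quorum config');
  return w + r > n;
}
```

For N=3 this confirms the table above: (2,2), (1,3), and (3,1) are consistent, while (1,1) is eventual.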
Cassandra consistency levels with RF=3 (3 replicas):

WRITE CONSISTENCY LEVELS:

| Level | Behavior |
|---|---|
| ONE | Write to 1 node, then return. Fast, risky. |
| QUORUM | Write to 2/3 nodes. Balanced. |
| ALL | Write to 3/3 nodes. Slow, safest. |
| LOCAL_QUORUM | Quorum within the local datacenter. |
| EACH_QUORUM | Quorum in each datacenter. For multi-DC. |

READ CONSISTENCY LEVELS:

| Level | Behavior |
|---|---|
| ONE | Read from the nearest replica. Fastest, may be stale. |
| QUORUM | Read from 2/3, return the newest. Consistent. |
| ALL | Read from 3/3. Slowest, most consistent. |
| LOCAL_QUORUM | Quorum within the local datacenter. |

CONSISTENCY COMBINATIONS:

For strong consistency:
- W=QUORUM + R=QUORUM → W(2) + R(2) = 4 > N(3) ✓
- W=ALL + R=ONE → W(3) + R(1) = 4 > N(3) ✓

For eventual consistency (fast):
- W=ONE + R=ONE → W(1) + R(1) = 2 < N(3) ✗ (may read stale data, but very fast!)

Practical Application: Per-Operation Tuning
Develop conventions for your application:
Critical operations: Use strong consistency
// User signup - must not duplicate
await cassandra.execute(
'INSERT INTO users ...',
values,
{ consistency: types.consistencies.quorum }
);
Read-heavy analytics: Use eventual consistency
// Dashboard view - okay if slightly stale
await cassandra.execute(
'SELECT * FROM page_views ...',
values,
{ consistency: types.consistencies.one }
);
The Danger of Tunable Consistency:
Tunable consistency adds cognitive load and error opportunity:
Best practice: Establish a small set of allowed patterns (e.g., "critical writes" and "casual reads") and use framework-level enforcement. Don't let every developer set consistency ad-hoc.
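Framework-level enforcement can be as simple as exposing only named profiles and refusing anything else. A sketch — the profile names and string levels are illustrative stand-ins for your driver's consistency constants:

```javascript
// Only these named profiles are allowed; raw consistency levels
// are deliberately not exposed to application code.
const PROFILES = {
  criticalWrite: { consistency: 'QUORUM' },
  casualRead:    { consistency: 'ONE' },
};

function queryOptions(profile) {
  const opts = PROFILES[profile];
  if (!opts) throw new Error(`disallowed consistency profile: ${profile}`);
  return opts;
}
```

Application code calls something like `execute(query, values, queryOptions('criticalWrite'))`; there is simply no code path for an arbitrary ad-hoc level.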
Higher consistency levels carry direct performance costs: QUORUM writes roughly double the latency of ONE writes, and with RF=3 an ALL read must wait for the slowest of the three replicas. Profile your application to understand the actual trade-offs in your deployment.
CAP is foundational, but it's not the only framework for distributed system trade-offs. Understanding related concepts provides a more complete picture.
PACELC: Extending CAP
The PACELC theorem extends CAP to describe behavior when there is NO partition:
If Partition (P): trade Availability (A) vs. Consistency (C); Else (E): trade Latency (L) vs. Consistency (C).

Translation: during a partition, a system must choose availability or consistency (classic CAP); in normal operation, it must choose low latency or consistency.
This captures an important reality: even without partitions, synchronous replication (strong consistency) increases latency. Some systems sacrifice consistency for lower latency even when they could achieve consistency.
Examples in PACELC terms:
| System | Partition (A/C) | Else (L/C) | Description |
|---|---|---|---|
| PostgreSQL (sync) | C | C | Always consistent, pays latency cost |
| Cassandra (default) | A | L | Available and fast, eventual consistency |
| DynamoDB (default) | A | L | Available and fast, eventual consistency |
| DynamoDB (strong read) | A | C | Available, but pays latency for consistency |
| VoltDB | C | C | Strongly consistent, accepts higher latency |
| PNUTS (Yahoo) | C | L | Gives up availability under partitions, prioritizes latency otherwise |
ACID vs BASE
Database consistency is often discussed in terms of ACID (traditional) vs BASE (NoSQL):
ACID (Atomicity, Consistency, Isolation, Durability): transactions are all-or-nothing, preserve invariants, do not interfere with each other, and survive crashes.

BASE (Basically Available, Soft state, Eventually consistent): the system stays responsive, state may be temporarily stale, and replicas converge over time.
BASE systems trade ACID guarantees for availability and partition tolerance. They're not "worse"—they're optimized for different use cases.
Consistency Models Hierarchy:
Beyond CAP's binary view of consistency, there's a rich hierarchy, from strongest to weakest: linearizable, sequential, causal, read-your-writes, and eventual consistency.
CAP, PACELC, ACID, BASE—these are all tools for reasoning about distributed systems. CAP helps when thinking about partition behavior. PACELC adds latency considerations. ACID/BASE helps compare transactional guarantees. Use whichever framework is most illuminating for your specific decision.
Let's examine architectural patterns that apply CAP thinking to real systems.
Pattern 1: Polyglot Persistence
Use different databases for different data based on CAP requirements:
┌─────────────────────────────────────────────────────────────┐
│ MICROSERVICES SYSTEM │
├─────────────────────────────────────────────────────────────┤
│ Order Service → PostgreSQL (CP) │
│ (Financial data needs strong consistency) │
│ │
│ Session Service → Redis Cluster (AP) │
│ (Sessions need availability, can tolerate staleness) │
│ │
│ Product Catalog → Elasticsearch (AP) │
│ (Search needs availability, eventual updates are fine) │
│ │
│ User Preferences → DynamoDB (AP) │
│ (Preferences can be eventually consistent) │
│ │
│ Configuration → etcd (CP) │
│ (Config must be consistent across all services) │
└─────────────────────────────────────────────────────────────┘
Pattern 2: CQRS with Event Sourcing
Command Query Responsibility Segregation separates reads from writes, allowing different consistency for each. Combined with event sourcing, it pairs naturally with the saga pattern for coordinating writes across services:
PROBLEM: Order placement needs to:
1. Reserve inventory (Inventory Service)
2. Process payment (Payment Service)
3. Confirm the order (Order Service)

Distributed transactions (2PC) won't work here: too slow, and they hurt availability.

SAGA PATTERN SOLUTION (choreography saga):
1. Order Service: create the order (PENDING) → publish OrderCreated event
2. Inventory Service: receives OrderCreated → reserve inventory → publish InventoryReserved event
3. Payment Service: receives InventoryReserved → process payment → publish PaymentProcessed event
4. Order Service: receives PaymentProcessed → mark the order CONFIRMED

COMPENSATION ON FAILURE — if payment fails:
- Payment Service: publish PaymentFailed
- Inventory Service: release the reserved inventory
- Order Service: mark the order CANCELLED

Result: eventually consistent, but correct! Trade-off: temporary inconsistency (the order is PENDING but no actual reservation holds during the payment-failure window).

Pattern 3: Cache-Aside with TTL
In this pattern, the cache (AP) provides low-latency reads while the database (CP) remains the source of truth:
Trade-off: Reads may return stale data up to TTL duration. Acceptable for many use cases (product pages, user profiles, analytics dashboards).
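A minimal cache-aside sketch with TTL; the injected clock makes the staleness window explicit, and the names are illustrative:

```javascript
// Cache-aside with TTL: read the cache first; on a miss (or expiry),
// load from the source of truth and repopulate the cache.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;                // injectable clock for testing
    this.entries = new Map();      // key -> { value, expiresAt }
  }

  get(key, loadFromDb) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) {
      return entry.value;          // hit: may be stale up to ttlMs
    }
    const value = loadFromDb(key); // miss: go to the consistent store
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

The TTL is the tuning knob: shorter TTLs shrink the staleness window at the cost of more load on the CP database behind the cache.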
Pattern 4: Write-Ahead Log with Async Replication
Local writes are durable to a WAL, then asynchronously replicated:
Trade-off: If the local node fails before replication, data is lost from other nodes' perspective. But writes are fast and available.
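The write path above can be sketched as a toy model: append to a local log, acknowledge immediately, and let a background step ship entries to a replica. Everything here is an illustrative in-memory stand-in for real disk durability and networking:

```javascript
// Write-ahead log with asynchronous replication (in-memory toy model).
class WalNode {
  constructor() {
    this.log = [];       // local WAL: the durability point for acks
    this.replicated = 0; // index up to which the replica has the log
    this.replica = [];   // stand-in for the remote replica's copy
  }

  // Fast path: append locally and acknowledge right away.
  write(entry) {
    this.log.push(entry);
    return { acked: true };
  }

  // Background step: ship the un-replicated suffix to the replica.
  replicate() {
    while (this.replicated < this.log.length) {
      this.replica.push(this.log[this.replicated]);
      this.replicated++;
    }
  }

  // Entries acked but not yet replicated would be lost if this node died.
  unreplicatedCount() {
    return this.log.length - this.replicated;
  }
}
```

`unreplicatedCount()` is exactly the exposure window the trade-off describes: writes in that gap are acknowledged locally but invisible to (and unrecoverable from) other nodes.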
Real systems combine multiple patterns. An e-commerce platform might use Polyglot Persistence for storage selection, CQRS for order processing, Saga for distributed transactions, and Cache-Aside for catalog reads. Each pattern addresses a specific CAP challenge in a specific component.
We've bridged theory and practice, exploring how CAP principles translate into real architectural decisions. Let's consolidate the essential insights.
What's Next:
With CAP in Practice covered, we'll close this module with Choosing Between CP and AP—a focused decision guide for making the final call. You'll get concrete criteria, decision trees, and case studies to solidify your ability to make CAP-informed architectural choices.
You now understand how to apply CAP theory in practice: decision frameworks, how popular systems make trade-offs, implementing CP and AP strategies, tunable consistency, and architectural patterns. This practical knowledge transforms CAP from an abstract theorem into a daily design tool.