Theory without application is incomplete. You now understand PACELC's extension of CAP, the behavior of systems during normal operation, and the mathematical underpinnings of the latency vs. consistency trade-off. But how do you actually use this knowledge?
This page bridges the gap between theoretical understanding and practical application. We'll examine how PACELC informs database selection, guides architecture decisions, shapes configuration strategies, and provides a framework for evaluating existing systems.
The goal is to transform PACELC from an academic concept into a practical tool you reach for whenever you design, evaluate, or troubleshoot distributed systems.
By the end of this page, you will be able to classify databases and systems according to PACELC, select appropriate technologies based on your consistency and latency requirements, configure systems to optimize for your specific workload, and apply PACELC thinking to real-world architecture decisions and trade-off discussions.
Database selection is one of the most consequential decisions in system design. PACELC provides a framework for evaluating databases based on their consistency and latency characteristics.
The PACELC Database Matrix:
Let's classify popular databases according to their default PACELC behavior:
| Database | PACELC | Partition Behavior | Normal Operation | Configurability |
|---|---|---|---|---|
| PostgreSQL (sync replication) | PC/EC | Primary unavailable during partition | Synchronous writes to standbys | Tune synchronous_commit, synchronous_standby_names |
| MySQL InnoDB Cluster | PC/EC | Requires majority for transactions | Group Replication with synchronous commit | Semi-sync mode available |
| MongoDB (default) | PC/EC | Election blocks writes; reads continue | Primary handles writes; configurable read preference | Write concern, read concern tunable |
| CockroachDB | PC/EC | Requires majority for liveness | Serializable by default; Raft consensus | Transaction isolation levels |
| Google Spanner | PC/EC | Requires majority | TrueTime for external consistency | Read timestamp options |
| Cassandra (default) | PA/EL | Accepts writes on both sides | Eventual consistency; tunable levels | Consistency level per operation |
| DynamoDB (default) | PA/EL | Eventual across regions | Eventually consistent reads default | Strong consistency opt-in per read |
| Riak | PA/EL | CRDTs resolve conflicts | Eventual consistency via vector clocks | R/W values tunable |
| Redis Cluster | PA/EL | Async replication; potential data loss | Single-node operations; no multi-key TX | WAIT command for sync |
| Amazon Aurora | PC/EL | Quorum for writes | Async read replicas; write to quorum | Reader endpoints, writer endpoint |
Selection Criteria Based on PACELC:
Most modern databases allow per-operation consistency tuning. Cassandra and DynamoDB can behave as PA/EL or PC/EC depending on consistency level settings. MongoDB's write and read concerns offer similar flexibility. This means your choice of database doesn't lock you into a single PACELC quadrant—but understanding the default behavior and configuration options is essential.
PACELC doesn't just inform database selection—it shapes entire architecture patterns. Let's examine common patterns through the PACELC lens:
Pattern 1: CQRS (Command Query Responsibility Segregation)
CQRS separates read and write models, allowing different PACELC choices for each:
```
CQRS Architecture with PACELC:

┌─────────────────────────────────────────────────────────────┐
│                           Clients                           │
└─────────────┬────────────────────────────┬──────────────────┘
              │ Commands (writes)          │ Queries (reads)
              ▼                            ▼
┌─────────────────────────┐    ┌──────────────────────────────┐
│   Command Service       │    │   Query Service              │
│   (PC/EC behavior)      │    │   (PA/EL behavior)           │
│                         │    │                              │
│   - Strong consistency  │    │   - Eventual consistency     │
│   - Higher latency OK   │    │   - Low latency critical     │
│   - Write to primary    │    │   - Read from replicas       │
└───────────┬─────────────┘    └──────────────┬───────────────┘
            │                                 │
            ▼                                 ▼
┌─────────────────────────┐    ┌──────────────────────────────┐
│   Write Database        │───▶│   Read Database(s)           │
│   (e.g., PostgreSQL)    │    │   (e.g., Elasticsearch,      │
│                         │    │    Redis, Read Replicas)     │
│   Optimized for:        │    │   Optimized for:             │
│   - ACID compliance     │    │   - Query performance        │
│   - Consistency         │    │   - Low latency              │
│   - Durability          │    │   - Read scaling             │
└─────────────────────────┘    └──────────────────────────────┘
            │                                 ▲
            └──────────── Async Sync ─────────┘
                   (Event sourcing/CDC)
```
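To see the two halves in code, here is a minimal, self-contained sketch of the CQRS split. All names are illustrative, and in-memory maps stand in for the write and read databases: the command side commits synchronously (EC-style), while a projector refreshes the read side asynchronously, so queries can briefly lag behind (EL-style).

```typescript
// CQRS sketch with in-memory stand-ins (illustrative only).

interface Order { id: string; status: string; updatedAt: number }

const writeStore = new Map<string, Order>();  // stands in for PostgreSQL
const readStore = new Map<string, Order>();   // stands in for Elasticsearch/Redis
const eventQueue: Order[] = [];               // stands in for CDC/event stream

// Command side: synchronous commit, then publish an event.
function createOrder(id: string): void {
  const order: Order = { id, status: "pending", updatedAt: Date.now() };
  writeStore.set(id, order);  // strongly consistent write
  eventQueue.push(order);     // async propagation to the read model
}

// Projector: drains events into the read model (with real lag in practice).
function runProjector(): void {
  while (eventQueue.length > 0) {
    const event = eventQueue.shift()!;
    readStore.set(event.id, event);
  }
}

// Query side: fast local read, reflects only what the projector has applied.
function getOrder(id: string): Order | undefined {
  return readStore.get(id);
}

createOrder("o-1");
console.log(getOrder("o-1")); // undefined: the read model has not caught up yet
runProjector();
console.log(getOrder("o-1")); // { id: "o-1", status: "pending", ... }
```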
Pattern 2: Multi-Region Active-Active
Globally distributed systems with multiple active regions face the full PACELC trade-off:
```
Multi-Region Deployment Strategies:

1. Single-Leader (PC/EC globally):

           ┌─────────────┐
           │  Region A   │◀── All writes
           │  (Primary)  │
           └──────┬──────┘
                  │ Sync replication (100-200ms latency penalty)
           ┌──────┼──────┐
           ▼      ▼      ▼
        ┌──────┐┌──────┐┌──────┐
        │Reg B ││Reg C ││Reg D │  (Read-only replicas)
        └──────┘└──────┘└──────┘

   + Strong consistency globally
   - High write latency from non-primary regions
   - Single point of failure for writes

2. Multi-Leader + Conflict Resolution (PA/EL globally):

   ┌──────────┐    ┌──────────┐    ┌──────────┐
   │ Region A │◀──▶│ Region B │◀──▶│ Region C │
   │ (Leader) │    │ (Leader) │    │ (Leader) │
   └──────────┘    └──────────┘    └──────────┘
        ▲               ▲               ▲
        │               │               │
     Writes          Writes          Writes
     (local)         (local)         (local)

   + Low latency writes everywhere
   + High availability (any region can accept writes)
   - Conflicts possible; need resolution strategy (LWW, CRDT, etc.)
   - Eventual consistency; temporary divergence

3. Hybrid: Strong within region, eventual across (PC locally, PA globally):

   - Each region runs PC/EC internally
   - Cross-region replication is async (EL)
   - Users routed to nearest region
   - Cross-region reads may be stale
```
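Option 2 depends on a conflict-resolution strategy. As one illustration of the CRDT approach, the sketch below implements a grow-only counter (G-Counter): each region increments its own slot, and merging two divergent copies takes the per-slot maximum, so replicas converge without any coordination. This is the textbook construction, not a specific library's API.

```typescript
// Minimal G-Counter CRDT sketch: one slot per region, merge = element-wise max.
// Replicas accept increments independently (PA/EL) and still converge.

type GCounter = Record<string, number>; // region id -> local count

function increment(counter: GCounter, region: string, by = 1): GCounter {
  return { ...counter, [region]: (counter[region] ?? 0) + by };
}

function merge(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const [region, count] of Object.entries(b)) {
    merged[region] = Math.max(merged[region] ?? 0, count);
  }
  return merged;
}

function value(counter: GCounter): number {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

// Two regions diverge during a partition, then reconcile:
let regionA: GCounter = {};
let regionB: GCounter = {};
regionA = increment(regionA, "us-east"); // write accepted locally
regionB = increment(regionB, "eu-west"); // write accepted locally
regionB = increment(regionB, "eu-west");

const healed = merge(regionA, regionB);
console.log(value(healed)); // 3: no updates lost, no coordination needed
```

Production systems typically reach for an established CRDT library rather than hand-rolled merges, but the convergence argument is the same.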
Pattern 3: The Saga Pattern for Distributed Transactions
When a transaction must span multiple services, Sagas trade the blocking coordination of two-phase commit for eventual consistency with compensating transactions:
```
Saga Pattern (PA/EL with application-level consistency):

Traditional 2PC (PC/EC):
- All participants must be available
- Locks held during coordination
- High latency, poor availability during partitions

Saga (PA/EL with eventual correctness):
  Step 1: Book flight    (local commit, fast)
  Step 2: Book hotel     (local commit, fast)
  Step 3: Charge payment (local commit, fast)

  If Step 3 fails:
    Compensate Step 2: Cancel hotel
    Compensate Step 1: Cancel flight

PACELC Implications:
- Each step is fast (EL) - no distributed locking
- Temporary inconsistencies exist (flight booked, hotel not yet)
- Eventually consistent via compensating actions
- Available during partitions (PA) - steps queued

Trade-off:
- Simpler consistency reasoning with 2PC
- Better latency and availability with Sagas
- More complex error handling (compensating logic)
```
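To make the compensation flow concrete, here is a minimal saga-orchestrator sketch. The trip-booking steps are hypothetical, and a production saga would persist step state so compensation can resume after a coordinator crash; this sketch shows only the happy path plus reverse-order compensation.

```typescript
// Minimal saga orchestrator sketch: run steps in order; on failure,
// run the compensations of completed steps in reverse order.

interface SagaStep {
  name: string;
  action: () => Promise<void>;     // local, fast commit
  compensate: () => Promise<void>; // undo for this step
}

async function runSaga(steps: SagaStep[]): Promise<boolean> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch (err) {
      console.error(`Step "${step.name}" failed; compensating...`, err);
      for (const done of completed.reverse()) {
        await done.compensate(); // e.g., cancel hotel, then cancel flight
      }
      return false;
    }
  }
  return true;
}

// Hypothetical trip-booking saga:
runSaga([
  { name: "book-flight", action: async () => { /* flight service call */ },
    compensate: async () => { /* cancel flight */ } },
  { name: "book-hotel", action: async () => { /* hotel service call */ },
    compensate: async () => { /* cancel hotel */ } },
  { name: "charge-payment", action: async () => { throw new Error("card declined"); },
    compensate: async () => { /* refund */ } },
]).then(ok => console.log(ok ? "saga committed" : "saga rolled back"));
```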
When choosing architecture patterns: If your system prioritizes correctness over latency and operates primarily in one region, favor patterns that provide PC/EC (single leader, 2PC). If your system prioritizes responsiveness and global availability, favor patterns that provide PA/EL (CQRS, multi-leader, Sagas) with appropriate conflict resolution and eventual consistency mechanisms.
Most distributed databases expose configuration knobs that let you tune your position on the PACELC spectrum. Understanding these configurations is essential for optimizing your system.
Cassandra Consistency Tuning:
```sql
-- Cassandra Consistency Levels

-- EL Behavior (default): Fast but eventually consistent
CONSISTENCY ONE
INSERT INTO orders (id, status) VALUES (uuid(), 'pending');
-- Latency: ~2-5ms
-- Writes to any single replica; async replication to others
-- Risk: Data loss if replica fails before replication

-- Balanced: Majority agreement
CONSISTENCY QUORUM
SELECT * FROM orders WHERE id = ?;
-- Latency: ~10-30ms (depends on RF and topology)
-- Requires majority (RF/2 + 1) for reads and writes
-- Strong consistency when using QUORUM for both R and W

-- EC Behavior: Maximum consistency
CONSISTENCY ALL
INSERT INTO accounts (id, balance) VALUES (?, ?);
-- Latency: ~50-200ms (slowest replica)
-- All replicas must acknowledge
-- Maximum durability and consistency

-- LOCAL variants for multi-DC:
CONSISTENCY LOCAL_QUORUM
-- Quorum within local datacenter only
-- Faster than QUORUM for geographically distributed clusters
-- Cross-DC consistency is eventually consistent
```
DynamoDB Consistency Configuration:
```javascript
// DynamoDB Consistency Options

// EL Behavior: Eventually consistent read (default)
const eventualRead = await docClient.get({
  TableName: 'Orders',
  Key: { orderId: '12345' }
  // ConsistentRead defaults to false
}).promise();
// Latency: ~5-10ms
// Cost: 0.5 RCU per 4KB
// May return stale data (typically <1 second old)

// EC Behavior: Strongly consistent read
const strongRead = await docClient.get({
  TableName: 'Orders',
  Key: { orderId: '12345' },
  ConsistentRead: true // Force strong consistency
}).promise();
// Latency: ~10-20ms (roughly 2x)
// Cost: 1.0 RCU per 4KB (2x cost)
// Always returns latest committed write

// Writes are always strongly consistent within a region
// Global Tables: Async replication, eventually consistent across regions

// TransactWriteItems for ACID across items (EC behavior)
const txResult = await docClient.transactWrite({
  TransactItems: [
    { Put: { TableName: 'Orders', Item: order } },
    { Update: { TableName: 'Inventory', Key: {...}, ... } }
  ]
}).promise();
// Latency: ~25-50ms (coordination overhead)
// Both operations succeed or both fail
```
MongoDB Read/Write Concerns:
```javascript
// MongoDB Consistency Configuration

// EL-leaning: Acknowledge primary only
db.orders.insertOne(
  { orderId: "12345", status: "pending" },
  { writeConcern: { w: 1 } } // Primary acknowledgment only
);
// Latency: ~2-5ms
// Risk: Data loss if primary fails before replication

// EC-leaning: Majority acknowledgment
db.orders.insertOne(
  { orderId: "12345", status: "pending" },
  { writeConcern: { w: "majority", j: true } } // Majority + journal
);
// Latency: ~10-50ms (depends on replica locations)
// Durability: Survives primary failure

// Read Concern for EC behavior
db.orders.find({ orderId: "12345" })
  .readConcern("majority"); // Only majority-committed data
// Prevents reading data that might be rolled back

// Linearizable reads (strongest EC)
db.orders.find({ orderId: "12345" })
  .readConcern("linearizable");
// Latency: Highest (~50-100ms+)
// Guarantees real-time consistency
// Use sparingly for critical reads

// Read Preference for EL behavior
db.orders.find({ status: "pending" })
  .readPreference("nearest"); // Lowest latency replica
// May return stale data from secondary
// Ideal for analytics, dashboards
```
The key insight is that consistency level can be chosen per operation, not per database. A single application might use eventual consistency for browsing products (fast, stale is OK), strong consistency for checking inventory during checkout (must be accurate), and linearizable reads for completing payment (absolutely correct). Design your data access layer to support this flexibility.
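One way to support that flexibility is to let callers state a consistency requirement per call and have the repository map it onto the underlying store's knobs. The sketch below is illustrative: the Consistency names and the db handle are assumptions rather than a specific driver's API, though the levels it maps onto mirror the MongoDB read concerns shown above.

```typescript
// Per-operation consistency hint in the data access layer (sketch).

type Consistency = "eventual" | "strong" | "linearizable";

interface Product { id: string; name: string; stock: number }

interface ProductRepository {
  getProduct(id: string, consistency: Consistency): Promise<Product | null>;
}

// db is a hypothetical handle; real drivers expose equivalent options.
class MongoProductRepository implements ProductRepository {
  constructor(
    private db: { findOne(q: object, readConcern: string): Promise<Product | null> },
  ) {}

  async getProduct(id: string, consistency: Consistency): Promise<Product | null> {
    // Map our hint onto the read concern levels shown earlier.
    const readConcern =
      consistency === "linearizable" ? "linearizable"
      : consistency === "strong" ? "majority"
      : "local";
    return this.db.findOne({ id }, readConcern);
  }
}

// Same data, different consistency per call site (hypothetical usage):
// await repo.getProduct(id, "eventual");     // product page: fast, stale OK
// await repo.getProduct(id, "strong");       // checkout: must be accurate
// await repo.getProduct(id, "linearizable"); // payment: absolutely correct
```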
Let's develop a structured decision framework for applying PACELC to system design:
Step 1: Categorize Your Operations
Classify each operation in your system by its consistency and latency requirements:
| Consistency Need | Latency Tolerance | PACELC Preference | Example Operations |
|---|---|---|---|
| Must be correct | Can wait 200ms | PC/EC | Financial transactions, inventory checks, booking confirmations |
| Must be correct | Needs <50ms | PC/EL (challenging) | Real-time bidding, high-frequency trading (need specialized solutions) |
| Can be stale (seconds) | Needs <50ms | PA/EL | Social feeds, product listings, dashboard metrics |
| Can be stale (minutes) | Needs <100ms | PA/EL | Recommendations, search results, analytics |
| Can be stale (hours) | Any | PA/EL | Reports, batch processing results, audit logs |
Step 2: Map Operations to Data Stores
Group operations by their PACELC requirements and assign appropriate data stores:
```
Example: E-Commerce Platform

PC/EC Operations (correctness critical):
- Inventory decrement on purchase → PostgreSQL (serializable)
- Order creation → PostgreSQL with ACID transactions
- Payment processing → External payment service with idempotency

  Data Store: PostgreSQL with synchronous standby
  Configuration: synchronous_commit = on
  Expected Latency: 20-50ms

PA/EL Operations (speed critical, staleness acceptable):
- Product catalog browsing → Elasticsearch
- User session data → Redis Cluster
- Shopping cart → DynamoDB (eventually consistent)
- Product recommendations → Precomputed, served from CDN

  Data Stores: Various, optimized for read latency
  Configuration: Async replication, local reads
  Expected Latency: 5-15ms

Hybrid Operations (context-dependent):
- Inventory display (browsing): Eventually consistent (stale OK)
- Inventory check (checkout): Strongly consistent (must be accurate)
  Same data, different consistency per operation context
```
Step 3: Design for Failure Modes
Consider how your system behaves during partitions (the PA/PC choice):
```
Partition Behavior Design:

1. Identify partition-sensitive operations:
   - Cross-region database writes
   - Distributed transactions
   - Consensus-dependent coordination

2. Define partition detection:
   - Timeout thresholds (e.g., 5 second timeout = assume partition)
   - Health check endpoints
   - Quorum loss detection

3. Design PA behavior (if choosing availability):
   - Accept writes locally, queue for reconciliation
   - Use CRDTs or LWW for conflict resolution
   - Communicate staleness to clients ("data may be outdated")
   - Reconcile when partition heals

4. Design PC behavior (if choosing consistency):
   - Return errors for operations requiring consensus
   - Allow read-only mode if read quorum available
   - Queue writes for execution after partition heals
   - Communicate degraded state to clients

5. Document failure mode in runbooks:
   - Expected behavior during partition
   - Monitoring alerts for partition detection
   - Manual intervention procedures if needed
   - Testing procedures (chaos engineering)
```
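Step 2's timeout-based detection can start as simply as the sketch below: probe each peer's health endpoint with a deadline, and suspect a partition when fewer than a majority of nodes respond. The /health path, the 5-second threshold, and the peer addresses are all illustrative assumptions.

```typescript
// Timeout-based partition suspicion sketch (Node 18+ global fetch).
// The /health URL and the 5-second threshold are illustrative, not a standard.

async function peerReachable(baseUrl: string, timeoutMs = 5000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`${baseUrl}/health`, { signal: controller.signal });
    return res.ok;
  } catch {
    return false; // timeout or network error: suspect a partition
  } finally {
    clearTimeout(timer);
  }
}

// A node suspects a partition when it can reach fewer than a majority of nodes.
async function hasQuorum(peers: string[]): Promise<boolean> {
  const results = await Promise.all(peers.map(p => peerReachable(p)));
  const reachable = results.filter(Boolean).length + 1; // +1 counts ourselves
  return reachable > (peers.length + 1) / 2;
}

// Usage (hypothetical peer addresses):
// if (!(await hasQuorum(["http://node-b:8080", "http://node-c:8080"]))) {
//   // PC choice: reject writes; PA choice: accept writes, queue reconciliation
// }
```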
If your system operates in a single region with reliable networking, you may never experience a partition. Focus your PACELC optimization on the Else clause (latency vs. consistency), since that's where your system actually lives. Partition handling is important for true global distribution but may be over-engineering for simpler deployments.
Understanding PACELC helps you avoid common distributed systems mistakes:
Mistake 1: Ignoring the Else Clause
Mistake 2: Uniform Consistency for All Operations
Mistake 3: Underestimating Cross-Region Latency
Mistake 4: Ignoring Tail Latency in Quorum Systems
```
The Mistake: "Our median latency is 10ms, so we're fine."

Reality with quorum systems:
- p50: 10ms (half of operations)
- p99: 100ms (1% of operations, but that's 10,000/day at 1M ops)
- p99.9: 500ms (rare but destroys user experience)

With fan-out (reading from multiple services):
- Each service p99: 100ms
- If a request hits 10 services: 1 - (0.99)^10 = 9.6% chance of >100ms
- Aggregate p99 is much worse than component p99

The Correction:
- Measure and alert on p99 and p99.9, not just p50
- Use speculation: send to more replicas, use fastest response
- Set timeouts and fallbacks for slow operations
- Consider hedged requests for latency-sensitive operations
- Accept that strong consistency increases tail latency
```
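The hedged-request idea from the correction list fits in a few lines: issue the call to one replica, and if it hasn't answered within a delay (often set near the observed p95), fire a backup copy to a second replica and take whichever finishes first. The delay value and replica fetchers below are assumptions for illustration.

```typescript
// Hedged request sketch (assumes exactly two replica fetchers).
// Trims tail latency at the cost of some duplicate load. A production
// version would also cancel the loser and fall back to the backup
// immediately if the primary errors early.

async function hedgedGet<T>(
  replicas: [() => Promise<T>, () => Promise<T>],
  hedgeAfterMs = 50, // illustrative: often set near the observed p95
): Promise<T> {
  const [primary, backup] = replicas;
  const delayedBackup = new Promise<T>((resolve, reject) => {
    setTimeout(() => backup().then(resolve, reject), hedgeAfterMs);
  });
  return Promise.race([primary(), delayedBackup]);
}

// Usage with hypothetical replica endpoints:
// const order = await hedgedGet([
//   () => fetch("http://replica-1/orders/12345").then(r => r.json()),
//   () => fetch("http://replica-2/orders/12345").then(r => r.json()),
// ]);
```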
To validate your PACELC understanding, test your system under network latency injection (add 100ms between replicas), under partition simulation (block traffic between regions), under load (consistency behavior may change), and measure both latency percentiles and consistency violations. Chaos engineering tools like Chaos Monkey, Gremlin, and Toxiproxy help with this.
Let's examine how real-world systems navigate PACELC trade-offs:
Case Study 1: Amazon DynamoDB Global Tables
```
DynamoDB Global Tables PACELC Analysis:

Architecture:
- Multi-region, multi-active
- Each region is a full read/write replica
- Async replication between regions via DynamoDB Streams

PACELC Classification: PA/EL globally, PA/EC regionally

Within a region (E clause):
- Writes: Synchronously replicated within region (EC)
- Reads: Eventually consistent default, strongly consistent opt-in
- Typical latency: 5-20ms

Across regions (E clause):
- Replication lag: typically 100-500ms
- Conflicts resolved by Last Writer Wins (LWW) based on timestamp
- No global strong consistency available

During partition (P clause):
- Each region continues operating (PA)
- Writes accepted locally, queued for replication
- Conflicts resolved when partition heals

Practical Implication:
- Users see local low-latency writes
- Cross-region users may see delayed/stale data
- Application must handle LWW conflict semantics
- Ideal for: user profiles, shopping carts, session data
- Not ideal for: globally consistent inventory, financial ledgers
```
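Since Global Tables resolve conflicts with last-writer-wins, it helps to see exactly what that merge does. A minimal sketch, assuming each item carries an update timestamp: the later write silently replaces the earlier one, which is why LWW suits profiles and carts but not counters or ledgers.

```typescript
// Last-Writer-Wins merge sketch: the later timestamp wins outright.
// The losing write is dropped, not combined, which is why LWW suits
// profiles and carts better than counters or financial ledgers.

interface VersionedItem<T> { value: T; updatedAtMs: number }

function lwwMerge<T>(a: VersionedItem<T>, b: VersionedItem<T>): VersionedItem<T> {
  return a.updatedAtMs >= b.updatedAtMs ? a : b;
}

// Two regions update the same profile during replication lag:
const usEast = { value: { nickname: "sam_east" }, updatedAtMs: 1_700_000_000_500 };
const euWest = { value: { nickname: "sam_west" }, updatedAtMs: 1_700_000_000_900 };

console.log(lwwMerge(usEast, euWest).value); // { nickname: "sam_west" } (the us-east write is lost)
```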
Case Study 2: Google Spanner
```
Google Spanner PACELC Analysis:

Architecture:
- Globally distributed, synchronized
- TrueTime API using atomic clocks + GPS
- Paxos consensus for each partition

PACELC Classification: PC/EC (achieves global strong consistency)

Within a region (E clause):
- Writes: Synchronous via Paxos, ~5-10ms
- Reads: Snapshot reads at TrueTime timestamp
- Latency: competitive with single-region databases

Across regions (E clause):
- TrueTime enables external consistency without 2PC
- Commit wait: ~7ms to account for clock uncertainty
- Cross-region transaction: ~50-200ms (physics limit)

During partition (P clause):
- Requires majority (PC behavior)
- Minority partitions cannot process transactions
- Availability sacrificed for consistency

How they achieve PC/EC globally:
- Specialized hardware (atomic clocks, GPS)
- TrueTime bounds clock uncertainty to ~7ms
- Commit wait ensures serialization
- Accept latency cost for global consistency

Practical Implication:
- True ACID transactions at global scale
- Higher latency than eventually consistent alternatives
- Significant infrastructure investment
- Ideal for: financial systems, inventory, anything requiring correctness
```
Case Study 3: Apache Cassandra at Netflix
```
Netflix Cassandra PACELC Usage:

Context:
- Global streaming service, 200M+ subscribers
- Viewing history, user preferences, session data
- Billions of operations per day across 3 regions

PACELC Classification: PA/EL (tuned for availability and speed)

Configuration:
- Replication Factor: 3 per datacenter (9 total globally)
- Default: LOCAL_QUORUM for most operations
- Strong consistency (QUORUM across all DCs) for critical metadata only

E clause behavior:
- LOCAL_QUORUM: ~5-15ms reads/writes within region
- Cross-region reads: routed regionally, <20ms
- Global consistency: eventual, 100-500ms propagation

P clause behavior:
- Regions degrade independently
- Local operations continue (PA)
- Cross-region operations may return stale data

Key Design Decisions:
- Accept eventual consistency for viewing history (stale is OK)
- Use LOCAL_QUORUM (not ALL) for availability during node failures
- Cross-region: async replication, accept staleness
- User context provides consistency: your own history is consistent

Result:
- Sub-20ms latency globally
- 99.99%+ availability
- Occasional stale reads accepted (user won't notice)
- Not suitable for billing/payment (use different system)
```
Notice that all three case studies use multiple systems with different PACELC properties. Amazon uses DynamoDB (PA/EL) alongside other systems. Netflix uses Cassandra (PA/EL) but has other databases for financial data. Google Spanner (PC/EC) exists alongside Bigtable (PA/EL). Mature architectures combine systems to get the right trade-offs for different data types.
As an architect, you'll need to explain PACELC trade-offs to non-technical stakeholders. Here's how to frame these discussions:
For Business Stakeholders:
```
Business-Friendly Framing:

Question: "Why can't we have both fast AND always-correct?"

Answer using analogy: Think of a chain of stores updating prices.

Option A (Fast, Eventually Correct):
- Each store updates immediately when they receive the memo
- Customers get instant service
- Different stores might briefly show different prices
- Eventually, all stores sync up (minutes)

Option B (Slow, Always Correct):
- Central coordinator calls each store before change
- No customer sees different prices at same time
- Every price update takes longer (calls to all stores)
- If one store is unreachable, updates halt

Our system faces the same choice:
- Fast responses OR perfect real-time consistency
- We've chosen [X] because [business reason]
- Here's the trade-off impact: [specific scenario]

Business Questions to Ask:
1. What's the cost of a user seeing stale data for 1 second?
2. What's the cost of a 200ms slower response?
3. Which matters more for this feature?
```
For Engineering Teams:
````
Engineering Team Documentation Template:

## PACELC Trade-off Decision: [Feature/System]

### Context
- Feature: [What we're building]
- Data: [What data is involved]
- Scale: [Expected throughput, geographic distribution]

### Requirements Analysis
| Operation | Consistency Need | Latency Target | PACELC |
|-----------|------------------|----------------|--------|
| Read X    | Eventual OK      | <50ms          | EL     |
| Write Y   | Strong           | <200ms OK      | EC     |

### Decision
- Database: [Selection] with PACELC: [Classification]
- Normal operation (E): [EL or EC behavior]
- Partition (P): [PA or PC behavior]

### Configuration
```
Consistency level: [Specific setting]
Replication: [Sync/async]
Read preference: [Setting]
```

### Implications
- Expected latency: [range]
- Consistency guarantee: [specific]
- Failure mode: [what happens during partition]

### Alternatives Considered
- [Option A]: Rejected because [reason]
- [Option B]: Rejected because [reason]
````
Every significant distributed system decision should document its PACELC rationale. When future engineers ask "Why is this eventually consistent?" or "Why is this so slow?", the documentation should explain the intentional trade-off, not leave it as a mystery to be reverse-engineered.
We've completed our deep dive into the PACELC theorem. Let's consolidate everything into actionable guidance:
The PACELC Mental Model:
Whenever you design, evaluate, or debug a distributed system, ask:
What happens during a partition? Does the system choose availability (accept writes, resolve conflicts later) or consistency (reject operations until partition heals)?
What happens during normal operation? Does the system choose low latency (async replication, local reads) or strong consistency (synchronous replication, quorum operations)?
Is this the right choice for this data? Financial transactions need PC/EC. Social feeds can use PA/EL. Session data might use PA/EC. There's no universal answer.
Are we configured appropriately? Even the right database can be misconfigured. Verify consistency levels, replication settings, and timeout configurations match requirements.
Congratulations! You've mastered the PACELC theorem—a critical framework for understanding distributed system behavior. You can now analyze systems beyond the limited CAP model, make informed database and architecture decisions, and communicate trade-offs clearly. This knowledge will serve you in every distributed system you design, evaluate, or troubleshoot throughout your career.