If you've studied distributed systems for even a few weeks, you've encountered the CAP theorem—Eric Brewer's famous result stating that distributed systems can provide at most two of three guarantees: Consistency, Availability, and Partition Tolerance. CAP has become gospel in system design discussions, the foundation upon which architects justify their database choices and system trade-offs.
But here's a provocative question: What does the CAP theorem tell you about system behavior when the network is working perfectly?
The answer is: almost nothing.
This is the fundamental limitation that led Daniel Abadi at Yale University to propose an extension in 2012—a theorem that completes the picture CAP left unfinished. Welcome to PACELC, the theorem that explains what's actually happening in your distributed systems most of the time.
By the end of this page, you will understand why CAP theorem provides an incomplete model for distributed system trade-offs, how PACELC extends CAP to cover normal operation scenarios, and why this distinction is critical for making informed architectural decisions. You'll discover that the trade-offs you face most frequently aren't about partitions at all—they're about latency.
Before we can understand why PACELC matters, we need to revisit CAP with fresh eyes—examining not just what it says, but critically, what it leaves unsaid.
The CAP Theorem in Brief:
In 2000, Eric Brewer conjectured (and in 2002, Seth Gilbert and Nancy Lynch formally proved) that any distributed data store can provide at most two of the following three guarantees simultaneously:

- Consistency: every read receives the most recent write or an error
- Availability: every request receives a (non-error) response, even if it may not reflect the most recent write
- Partition Tolerance: the system continues to operate despite messages being dropped or delayed between nodes
The theorem states that during a network partition—when nodes cannot communicate with each other—a distributed system must choose between providing consistency or availability. You cannot have both.
Why Partition Tolerance Is Non-Negotiable:
In any realistic distributed system, network partitions will occur. Cables get cut. Switches fail. Cloud regions lose connectivity. The question isn't whether partitions happen—it's how frequently and for how long.
This means that in practice, the CAP choice reduces to CP vs. AP:
| System Type | During Partition | Example Systems |
|---|---|---|
| CP (Consistency + Partition Tolerance) | Rejects operations that can't be consistently processed | MongoDB (default), HBase, Spanner, Zookeeper |
| AP (Availability + Partition Tolerance) | Accepts operations, allows divergence | Cassandra, DynamoDB (default), CouchDB, Riak |
CAP tells us what happens during a partition. But consider this reality:
Network partitions are rare events.
In a well-engineered system with quality infrastructure, partitions might occur for minutes or hours per year—not per day. The overwhelming majority of the time, your distributed system operates without partitions. All nodes can communicate. The network is functioning normally.
So what does CAP tell us about system behavior during normal operation?
Absolutely nothing.
In the absence of partitions, CAP appears to suggest you can have both consistency and availability. But anyone who has operated a distributed database knows this isn't the full story. Even when the network is healthy, you still face fundamental trade-offs.
Here's the core insight CAP misses: during normal operation, you don't face a consistency vs. availability trade-off—you face a consistency vs. latency trade-off. This is the gap that PACELC fills, and it's the trade-off you'll encounter far more frequently in practice.
The Latency Reality:
Consider a distributed database with three replicas across different data centers:
Strong consistency requires synchronous replication — A write isn't acknowledged until all (or a majority of) replicas confirm it. This means waiting for round-trip communication to potentially distant nodes.
Lower latency allows asynchronous replication — A write is acknowledged immediately after the primary processes it. Replicas are updated asynchronously, reducing wait time but creating a window where different replicas have different data.
This trade-off exists all the time, not just during partitions. And for most applications, it matters far more than the partition behavior because it affects every single operation, every single millisecond of every single day.
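To see the shape of this trade-off in code, here is a minimal Python sketch contrasting the two acknowledgment strategies. It is not any particular database's replication protocol: `Replica`, `write_sync`, and `write_async` are illustrative names, and real systems pipeline and batch far more aggressively.

```python
import threading
import time

class Replica:
    """Hypothetical replica; apply() simulates one network round trip."""
    def __init__(self, name, rtt_ms):
        self.name, self.rtt_ms, self.data = name, rtt_ms, {}

    def apply(self, key, value):
        time.sleep(self.rtt_ms / 1000)   # stand-in for the real network delay
        self.data[key] = value

def write_sync(replicas, key, value, quorum):
    """Strong consistency: acknowledge only after `quorum` replicas confirm.
    The wait ends at roughly the RTT of the quorum-th fastest replica."""
    acks = threading.Semaphore(0)

    def replicate(replica):
        replica.apply(key, value)
        acks.release()

    for replica in replicas:
        threading.Thread(target=replicate, args=(replica,), daemon=True).start()
    for _ in range(quorum):
        acks.acquire()                   # block until enough replicas have acknowledged
    return "ack"

def write_async(replicas, key, value):
    """Low latency: acknowledge immediately, replicate in the background.
    Replicas converge later; a read may briefly return stale data."""
    for replica in replicas:
        threading.Thread(target=replica.apply, args=(key, value), daemon=True).start()
    return "ack"
```

With replicas at roughly 1 ms, 40 ms, and 80 ms round trips and quorum=2, write_sync returns after about 40 ms; write_async returns almost immediately and lets the slower replicas catch up in the background.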
In 2012, Daniel Abadi published a paper titled "Consistency Tradeoffs in Modern Distributed Database System Design" that introduced PACELC as an extension to CAP. The formulation elegantly captures both partition and normal operation behavior:
PACELC is pronounced "pass-elk" and stands for:
Partition → Availability vs Consistency; Else → Latency vs Consistency
Read it as: "If there is a Partition, choose between Availability and Consistency; Else (during normal operation), choose between Latency and Consistency."
This simple extension captures a profound insight: distributed systems make trade-offs along two dimensions, not one.
IF Partition → A or C
ELSE → L or C
A system is classified as PA/EL, PA/EC, PC/EL, or PC/EC based on its choices in both scenarios. This four-way classification provides much richer insight into system behavior than the simple CP/AP dichotomy of CAP.
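The classification itself is simple enough to write down directly. Here is a minimal sketch, with illustrative names not taken from Abadi's paper:

```python
from enum import Enum

class Priority(Enum):
    AVAILABILITY = "A"
    CONSISTENCY = "C"
    LATENCY = "L"

class PACELC:
    """A system's stance on both branches, e.g. PACELC(AVAILABILITY, LATENCY) is PA/EL."""
    def __init__(self, on_partition: Priority, on_else: Priority):
        self.on_partition = on_partition   # during a partition: A or C
        self.on_else = on_else             # during normal operation: L or C

    def priority(self, partitioned: bool) -> Priority:
        # "If Partition, trade Availability vs. Consistency; Else, trade Latency vs. Consistency."
        return self.on_partition if partitioned else self.on_else

    def label(self) -> str:
        return f"P{self.on_partition.value}/E{self.on_else.value}"

pa_el = PACELC(Priority.AVAILABILITY, Priority.LATENCY)
assert pa_el.label() == "PA/EL"
assert pa_el.priority(partitioned=False) is Priority.LATENCY   # the everyday trade-off
```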
The Four PACELC Classifications:
| Classification | During Partition | Normal Operation | Behavior Summary |
|---|---|---|---|
| PA/EL | Availability over Consistency | Latency over Consistency | Always prioritizes responsiveness; never blocks for consistency |
| PA/EC | Availability over Consistency | Consistency over Latency | Relaxes during failures but strict during normal operation |
| PC/EL | Consistency over Availability | Latency over Consistency | Strict during failures but relaxes during normal operation |
| PC/EC | Consistency over Availability | Consistency over Latency | Always prioritizes consistency; accepts higher latency and reduced availability |
Why Four Categories Matter:
The CAP theorem would classify Cassandra and DynamoDB both as "AP systems." But their behavior differs significantly during normal operation:
Cassandra with default settings is PA/EL — It prioritizes availability during partitions and prioritizes low latency during normal operation. Consistency is always secondary.
DynamoDB with strong consistency enabled is PA/EC — It prioritizes availability during partitions, but enforces strong consistency during normal operation, accepting higher latency.
This distinction is invisible under CAP but critically important for system architects choosing between these databases.
To deeply understand PACELC, we need to examine the mathematical relationships that make these trade-offs inescapable.
The Latency-Consistency Relationship:
Consider a distributed system with N replicas. For a write operation to be "strongly consistent," the system must ensure that any subsequent read—from any replica—returns that write or a later one. This requires synchronization.
The fundamental equation governing this trade-off is derived from quorum systems:
```
Strong Consistency Requirement:

    R + W > N

Where:
    R = Read quorum (number of replicas consulted for reads)
    W = Write quorum (number of replicas that must acknowledge writes)
    N = Total number of replicas

Latency Implication:
    Write Latency ≥ RTT to Wth fastest replica
    Read Latency  ≥ RTT to Rth fastest replica

For strong consistency with N=3:
    If W=2, R=2: Wait for 2 of 3 replicas (majority)
    If W=3, R=1: Wait for all 3 on writes, single replica reads
    If W=1, R=3: Immediate writes, but reads must query all
```

The Inescapable Trade-off:
To achieve strong consistency:

- Every write must wait for acknowledgments from W replicas before returning to the client
- Every read must consult R replicas, with R + W > N so that the read and write quorums overlap
- Each operation's latency is therefore bounded below by the round-trip time to the farthest replica in its quorum
In a geo-distributed system where replicas span continents, this is devastating. With W=2 and N=3 replicas spread across regions, every write must wait for at least one cross-region RTT. Strong consistency imposes a latency floor that cannot be optimized away—it's physics, not engineering.
Light in fiber travels at roughly 200,000 km/s. New York to London is ~5,500 km. The theoretical minimum RTT is ~55ms. With routing overhead and switching delays, real-world RTT is 80-150ms. No amount of optimization can beat physics—this is why the latency vs consistency trade-off is fundamental, not merely an implementation detail.
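To make the arithmetic concrete, here is a small worked example in Python. The quorum check and latency floor follow directly from the formulas above; the round-trip figures are illustrative, loosely based on the New York/London numbers.

```python
def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """Quorum condition R + W > N: read and write quorums overlap, so every
    read contacts at least one replica that holds the latest write."""
    return r + w > n

def write_latency_floor_ms(replica_rtts_ms, w: int) -> float:
    """A write waits for the w fastest acknowledgments, so its latency is
    bounded below by the round trip to the w-th fastest replica."""
    return sorted(replica_rtts_ms)[w - 1]

# Illustrative round trips: same region ~1 ms, cross-country ~40 ms, cross-Atlantic ~80 ms.
rtts = [1, 40, 80]

print(is_strongly_consistent(r=2, w=2, n=3))   # True: majority quorums overlap
print(write_latency_floor_ms(rtts, w=2))       # 40 -> every write pays a cross-region trip
print(write_latency_floor_ms(rtts, w=1))       # 1  -> the EL choice: local acknowledgment only...
print(is_strongly_consistent(r=1, w=1, n=3))   # False: ...but the quorums no longer overlap
```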
Relaxing Consistency for Latency:
If we choose eventual consistency (the EL choice in PACELC), we can:

- Acknowledge a write as soon as the local replica has accepted it (effectively W=1)
- Replicate to the remaining replicas asynchronously, in the background
- Serve reads from the nearest replica without waiting for cross-region coordination
This reduces write latency from 100ms+ to 1ms, a 100x improvement. But we sacrifice the guarantee that reads return the latest write—different replicas may temporarily have different data.
The key insight: this trade-off exists independently of partition behavior. Whether or not any partition occurs, you must choose between synchronous (consistent, slow) and asynchronous (inconsistent, fast) replication.
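That independence is easy to demonstrate. In this minimal sketch, two in-memory dictionaries stand in for replicas and an illustrative 80 ms delay stands in for cross-Atlantic replication; the network is perfectly healthy throughout, yet a read routed to the lagging replica still returns stale data.

```python
import threading
import time

primary, london = {}, {}               # two replicas' in-memory state

def replicate_async(key, value, lag_s=0.08):
    """Background replication with an illustrative ~80 ms cross-Atlantic lag."""
    time.sleep(lag_s)
    london[key] = value

# EL choice: the write is acknowledged as soon as the primary has it.
primary["balance"] = 100
threading.Thread(target=replicate_async, args=("balance", 100), daemon=True).start()
print("write acknowledged")            # client-observed latency: local, ~1 ms

print(london.get("balance"))           # None: a read served from London is momentarily stale
time.sleep(0.1)                        # wait out the replication lag
print(london.get("balance"))           # 100: the replicas converged, and no partition ever occurred
```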
CAP theorem, despite its importance, has been criticized and misinterpreted since its introduction. Understanding these limitations illuminates why PACELC was necessary.
Problem 1: The Binary Fallacy
CAP presents C and A as binary choices—you have them or you don't. Reality is far more nuanced:

- Consistency comes in degrees: linearizable, sequential, causal, read-your-writes, eventual
- Availability is measured in nines (99.9%, 99.99%), not as an all-or-nothing property
- Systems can make different choices for different operations rather than committing globally to one side
CAP's binary framing obscures these nuances. PACELC, by introducing latency as a continuous variable, implicitly acknowledges the spectrum nature of these trade-offs.
Problem 2: Partitions Are the Exception, Not the Rule
CAP's focus on partition behavior is somewhat like an automobile safety guide that only discusses what to do during a collision. Useful, certainly—but you spend 99.999% of your driving time not colliding. For that time, you need different guidance.
Similarly, systems spend most of their operational life in the 'Else' state of PACELC—no partition present. CAP provides no framework for the trade-offs during this vast majority of operational time.
Problem 3: The Consistency Confusion
CAP uses 'Consistency' to mean linearizability—the strongest consistency guarantee. But real distributed systems employ many consistency models:

- Linearizability: every operation appears to take effect atomically between its start and completion
- Sequential consistency: all nodes observe operations in the same order, though not necessarily in real time
- Causal consistency: causally related operations are observed in the same order everywhere
- Session guarantees such as read-your-writes and monotonic reads
- Eventual consistency: replicas converge once updates stop arriving
CAP's binary C/not-C framing ignores this rich spectrum. PACELC's latency dimension implicitly captures it—stronger consistency requires more coordination, thus higher latency.
Understanding the timeline of distributed systems theory helps contextualize PACELC's contribution:
The Evolution of Distributed Systems Understanding:
| Year | Contribution | Key Insight |
|---|---|---|
| 1978 | Lamport's Time, Clocks paper | Fundamental impossibility of global time in distributed systems |
| 1985 | FLP Impossibility Result | Consensus impossible with even one faulty process (async model) |
| 2000 | Brewer's CAP Conjecture | Cannot have consistency, availability, and partition tolerance simultaneously |
| 2002 | Gilbert & Lynch CAP Proof | Formal proof of CAP as impossibility result |
| 2008 | Amazon Dynamo Paper | Demonstrated practical eventual consistency at scale; influenced modern NoSQL |
| 2011 | Google Megastore Paper | Achieved strong consistency across regions with latency trade-offs |
| 2012 | Abadi's PACELC Paper | Extended CAP to address normal operation latency/consistency trade-off |
| 2012 | Google Spanner Paper | TrueTime API enabling strong consistency with bounded latency |
The Industry Context:
When Abadi proposed PACELC, the industry was grappling with a practical problem: CAP was being used to justify architectural decisions it didn't actually address. Engineers were saying:
"We're an AP system because we need availability"
But this left unanswered: what about the 99.9% of operations during normal network conditions? The AP label said nothing about whether those operations would be consistent or eventually consistent, fast or slow.
PACELC provided the vocabulary to distinguish between Cassandra (PA/EL—always prioritizing responsiveness) and a system like DynamoDB with strong reads (PA/EC—allowing inconsistency only during partitions).
The Spanner Influence:
Google's Spanner, announced the same year as PACELC, demonstrated that achieving PC/EC (strong consistency always) was possible at global scale—if you were willing to invest in specialized hardware (atomic clocks and GPS receivers) and accept bounded latency penalties. Spanner's existence validated PACELC's framing: the trade-off isn't impossibility, it's cost—in latency, complexity, or literal dollars.
In 2012, Eric Brewer himself published "CAP Twelve Years Later: How the 'Rules' Have Changed," acknowledging that the binary interpretation of CAP was too simplistic. He noted that the consistency/availability trade-off is continuous, and that systems can implement different strategies for different operations. PACELC formalizes much of what Brewer clarified.
Let's examine how popular distributed systems classify under PACELC, revealing the nuances invisible under CAP's simpler model:
| System | PACELC | Partition Behavior | Normal Operation Behavior |
|---|---|---|---|
| Cassandra | PA/EL | Accepts writes on both sides, reconciles later | Prioritizes low latency; tunable consistency levels but defaults favor speed |
| DynamoDB (eventual) | PA/EL | Remains available across partitions | Eventually consistent reads are fast; writes use local quorum |
| DynamoDB (strong) | PA/EC | Remains available across partitions | Strong reads wait for consensus; higher latency for consistency |
| MongoDB | PC/EC | Primary becomes unavailable during election | All writes go to primary; reads can be from primary for consistency |
| PostgreSQL (streaming) | PC/EC | Primary unavailable if standbys unreachable | Synchronous replication ensures consistency; adds latency |
| CockroachDB | PC/EC | Majority required for operations | Consensus-based writes; latency floor determined by quorum |
| Google Spanner | PC/EC | Operations blocked if synchrony lost | TrueTime enables consistency with bounded latency ~10ms |
| Riak | PA/EL | Accepts conflicting writes, uses CRDTs | Eventual consistency with low latency; vector clocks track causality |
Analysis of Classifications:
Why are there more PA/EL and PC/EC systems than mixed classifications?
Systems tend to cluster at the extremes because organizations have clear priorities:
PA/EL systems serve use cases where speed and availability trump correctness: social media feeds, caching layers, session stores, analytics aggregation. Temporary inconsistency is acceptable.
PC/EC systems serve use cases where correctness is paramount: financial transactions, inventory management, coordination services. Latency is acceptable.
The mixed classifications (PA/EC, PC/EL) represent more sophisticated approaches:
PA/EC says: "During normal times, we can afford to wait for consistency. But during a crisis, availability matters more." DynamoDB with strongly consistent reads exemplifies this.
PC/EL is rare and represents an unusual philosophy: "Enforce consistency when things are broken, but relax when healthy." This is counterintuitive and thus uncommon; Yahoo's PNUTS is the classic PC/EL example cited in Abadi's analysis.
Many modern systems are PACELC-configurable rather than fixed. Cassandra and DynamoDB allow per-operation consistency levels. MongoDB offers read concern and write concern settings. This configurability lets a single system behave as PA/EL for some operations and PC/EC for others—providing flexibility that neither CAP nor a fixed PACELC classification captures.
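As a concrete illustration, here is how that per-operation choice looks from client code. This is a sketch, not a recommended setup: the `orders` table, `shop` keyspace, and key values are hypothetical, while the knobs themselves (boto3's `ConsistentRead` flag and the Python Cassandra driver's per-statement `consistency_level`) are the standard per-request switches.

```python
import boto3
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# DynamoDB: the same table can serve PA/EL-style and PA/EC-style reads.
orders = boto3.resource("dynamodb").Table("orders")            # hypothetical table
orders.get_item(Key={"order_id": "42"})                         # eventually consistent: default, lower latency
orders.get_item(Key={"order_id": "42"}, ConsistentRead=True)    # strongly consistent: higher latency

# Cassandra: the consistency level is chosen per statement.
session = Cluster(["127.0.0.1"]).connect("shop")                # hypothetical keyspace
query = "SELECT * FROM orders WHERE order_id = %s"
session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.ONE), ("42",))     # latency-first
session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.QUORUM), ("42",))  # consistency-first
```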
We've established why CAP provides an incomplete picture and how PACELC fills the gap. Let's consolidate the key insights:

- CAP only describes behavior during partitions, which are rare; it says nothing about normal operation
- During normal operation, the real trade-off is latency vs. consistency, and it applies to every single request
- PACELC's four classifications (PA/EL, PA/EC, PC/EL, PC/EC) distinguish systems that CAP would lump together as simply "AP" or "CP"
The Practical Impact:
Understanding PACELC changes how you approach system design:

- Ask two questions of every data store: what does it sacrifice during a partition, and what does it sacrifice the rest of the time?
- Treat latency as a first-class requirement that is traded explicitly against consistency, not discovered in production
- Recognize that an "AP" or "CP" label alone says almost nothing about the behavior of the vast majority of your operations
What's next:
In the following pages, we'll explore each side of these trade-offs in greater depth.
You now understand why PACELC extends CAP to provide a complete framework for distributed system trade-offs. The next page will explore what happens during normal operation—the 'Else' clause of PACELC—where your system spends the vast majority of its time.