In the year 2000, computer scientist Eric Brewer stood before an audience at the ACM Symposium on Principles of Distributed Computing and presented a conjecture that would fundamentally reshape how we think about distributed databases. His claim was deceptively simple yet profound:
"It is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees: Consistency, Availability, and Partition tolerance."
Two years later, Seth Gilbert and Nancy Lynch published a formal proof, elevating Brewer's conjecture to Brewer's theorem, commonly known as the CAP theorem.
This theorem didn't introduce new engineering constraints—it formalized an inherent limitation of physics. When network partitions occur (and they will), every distributed system must choose: reject some requests to maintain consistency, or serve requests with potentially stale data to maintain availability.
The CAP theorem explains why NoSQL databases make the trade-offs they do.
By the end of this page, you will understand the CAP theorem deeply—not just the theorem statement, but its implications, limitations, and practical applications. You'll understand ACID vs. BASE semantics and be able to evaluate database systems through the lens of consistency-availability trade-offs.
Before analyzing the theorem, we must precisely define its three properties. Misunderstandings of these definitions lead to confusion about CAP's implications.
In CAP, "consistency" means linearizability—the strongest form of consistency guarantee. Every read receives the most recent write or an error. All nodes see the same data at the same time.
This is different from the "C" in ACID, which refers to maintaining database invariants (constraints, foreign keys). CAP consistency is about data freshness across distributed nodes.
Formally: After a write completes successfully, all subsequent reads (from any node) must return that write's value or a newer value.
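To make the formal statement concrete, here is a minimal TypeScript sketch (with a made-up operation record format) that checks one necessary condition: no read that begins after a write has completed may return an older value. It handles only non-overlapping operations; full linearizability checking is more involved.

```typescript
// One completed operation; versions stand in for written values
// (made-up record format for illustration).
interface Op {
  kind: "write" | "read";
  version: number; // version written, or version observed by the read
  start: number;   // time the operation began
  end: number;     // time the operation completed
}

// CAP consistency (linearizability) requires that once a write of version v
// completes, every read that starts afterwards returns v or something newer.
function readsAreFresh(history: Op[]): boolean {
  return history.every(op => {
    if (op.kind !== "read") return true;
    const priorWrites = history.filter(
      w => w.kind === "write" && w.end < op.start
    );
    const latest = Math.max(0, ...priorWrites.map(w => w.version));
    return op.version >= latest; // a stale read breaks linearizability
  });
}

// Write of version 2 completes at t=5, yet a read starting at t=6 returns 1.
console.log(readsAreFresh([
  { kind: "write", version: 1, start: 0, end: 1 },
  { kind: "write", version: 2, start: 4, end: 5 },
  { kind: "read", version: 1, start: 6, end: 7 },
])); // false
```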
Availability means every request to a non-failing node receives a response—without guarantee that it contains the most recent write.
Importantly:

- The response does not have to contain the most recent write; stale data still counts as available.
- An error or a timeout is not a valid response; availability means returning a meaningful answer.
- The guarantee applies only to non-failing nodes; nodes that have crashed are outside the definition.
Formally: Every request received by a non-failing node must result in a response.
Partition tolerance means the system continues to operate despite network partitions—when communication between some nodes is lost.
Network partitions are:

- Inevitable: hardware fails, cables get cut, switches misbehave, and availability zones lose connectivity.
- Unpredictable in duration: a partition may last milliseconds or hours.
- Hard to distinguish, from any single node's perspective, from a slow or crashed remote node.
Formally: The system continues to operate despite arbitrary partitioning due to network failures.
| Property | Guarantee | Measured By | Cost of Not Having |
|---|---|---|---|
| Consistency | All nodes see same data simultaneously | Every read returns latest write | Stale reads, conflicting views |
| Availability | Every request gets a response | Successful responses (not errors) | Rejected requests, downtime |
| Partition Tolerance | System works despite network splits | Continued operation during partitions | Cannot build distributed systems |
Network partitions happen. Hardware fails. Cables get cut. Cloud availability zones become unreachable. Choosing 'CA' (Consistency + Availability without Partition Tolerance) means building a single-node system—which isn't distributed and can't scale horizontally. For distributed systems, the real choice is between CP and AP.
Now let's understand why CAP forces a choice. The proof is intuitive once you visualize what happens during a network partition.
Imagine a distributed database with two nodes, N1 and N2, each holding a copy of value V.
If we prioritize Consistency (CP):

- N1 accepts a write of a new value V'.
- Because the partition blocks communication, N1 cannot replicate V' to N2.
- To avoid returning stale data, N2 must refuse reads (or N1 must reject the write) until the partition heals.
- The system stays consistent, but some requests are rejected: availability is sacrificed.

If we prioritize Availability (AP):

- N1 accepts the write of V' and acknowledges it.
- N2, unaware of V', keeps serving reads and returns the old value V.
- Every request receives a response, but clients can observe stale or conflicting data: consistency is sacrificed.
There is no third option. During a partition, a node that doesn't know the latest state must either admit it (breaking availability) or guess (breaking consistency).
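A small simulation makes the fork concrete. The sketch below (node and mode names are illustrative, not any particular database) models the two-node scenario: in CP mode a node that cannot reach its peer refuses requests it cannot prove are safe; in AP mode it answers from local, possibly stale, state.

```typescript
type Mode = "CP" | "AP";

interface NodeState {
  name: string;
  value: string;          // local copy of V
  reachablePeer: boolean; // false while the partition lasts
}

// Write handled at N1: replicate if possible, otherwise apply the CAP choice.
function handleWrite(node: NodeState, peer: NodeState, newValue: string, mode: Mode) {
  if (node.reachablePeer) {
    node.value = newValue;
    peer.value = newValue; // synchronous replication while the network is healthy
    return { ok: true, detail: "replicated to peer" };
  }
  if (mode === "CP") {
    // Consistency first: refuse the write rather than let replicas diverge.
    return { ok: false, detail: "rejected: cannot replicate during partition" };
  }
  // Availability first: accept locally and reconcile after the partition heals.
  node.value = newValue;
  return { ok: true, detail: "accepted locally; replicas now diverge" };
}

// Read handled at N2: during a partition it cannot prove its copy is fresh.
function handleRead(node: NodeState, mode: Mode) {
  if (!node.reachablePeer && mode === "CP") {
    return { ok: false as const };                 // refuse: unavailable but never stale
  }
  return { ok: true as const, value: node.value }; // answer: available but maybe stale
}

const n1: NodeState = { name: "N1", value: "V", reachablePeer: false };
const n2: NodeState = { name: "N2", value: "V", reachablePeer: false };

console.log(handleWrite(n1, n2, "V2", "AP")); // accepted locally on N1 only
console.log(handleRead(n2, "AP"));            // { ok: true, value: "V" }  stale read
console.log(handleRead(n2, "CP"));            // { ok: false }             unavailable
```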
This isn't a limitation of current technology that might be overcome—it's mathematically proven impossible. Information cannot travel faster than light, network delays are unbounded, and a node cannot know what it doesn't know.
The CAP theorem is a fundamental constraint, like conservation of energy. Systems claiming to violate it are either:

- redefining consistency or availability to mean something weaker than the CAP definitions, or
- assuming partitions never happen, an assumption every real network eventually breaks.
Distributed databases can be broadly classified based on their CAP preference during partitions.
CP systems sacrifice availability during partitions to maintain consistency. When a partition occurs, affected nodes stop accepting writes (and sometimes reads) until the partition heals.
Examples of CP systems:

- MongoDB (with majority write and read concerns)
- HBase
- Google Spanner and CockroachDB
- Coordination services such as ZooKeeper and etcd

Use cases for CP:

- Financial transactions and account balances
- Inventory and order management
- Configuration, leader election, and coordination data
- Any operation where a stale read or lost update is unacceptable
AP systems sacrifice consistency during partitions to remain available. All nodes continue accepting reads and writes, with conflicts resolved later.
Examples of AP systems:

- Cassandra
- Amazon DynamoDB (default configuration)
- Riak
- CouchDB

Use cases for AP:

- Shopping carts and user sessions
- Social media feeds and timelines
- Metrics, logging, and analytics ingestion
- Any operation where serving slightly stale data beats rejecting the request
Real databases aren't purely CP or AP—they offer tunable consistency. Cassandra can behave as CP with quorum reads/writes. MongoDB can behave as AP with eventual consistency reads. The CAP classification describes default behavior and design philosophy, not immutable properties.
The original CAP theorem presentation led to oversimplified thinking. Modern understanding recognizes important nuances.
CAP only constrains behavior during partitions. When the network is healthy, well-designed systems can provide high consistency AND high availability. The trade-off only manifests when partitions occur.
Partitions occur but aren't constant:

- Most of the time the network is healthy and all nodes can communicate.
- Within a single, well-run data center, partitions are relatively rare and usually brief.
- Across regions or over the public internet, they happen more often and can last longer.
If partitions are rare and brief, CP systems experience minimal availability impact. The choice matters most in environments where partitions are frequent or long-lasting.
Very high latency is indistinguishable from a partition. If a response takes 30 seconds, is that availability (you get a response) or unavailability (unusable latency)?
Practical systems must make decisions under latency uncertainty:

- How long to wait before treating a slow node as partitioned (the timeout).
- Whether to retry, fail the request, or return possibly stale data once that timeout fires.
- How to balance aggressive timeouts (which declare false partitions) against patient ones (which inflate tail latency).
Daniel Abadi proposed PACELC to extend CAP by also capturing what happens when there is no partition:
If there is a Partition, choose Availability or Consistency; Else, when running normally, choose Latency or Consistency.
This recognizes that even without partitions, there's a trade-off between low latency (asynchronous replication) and consistency (synchronous replication).
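The Else branch is easy to see in code. The sketch below contrasts a synchronous write (acknowledge only after all replicas confirm) with an asynchronous one (acknowledge immediately, replicate in the background); the replica count and delays are made-up illustration values.

```typescript
// Simulated one-way delays to three replicas, in milliseconds (made-up values).
const replicaDelaysMs = [5, 12, 40];

function sendToReplica(delayMs: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, delayMs));
}

// EC-style write: acknowledge only after every replica has the data.
// Client-visible latency is dominated by the slowest replica (~40 ms here).
async function synchronousWrite(): Promise<void> {
  await Promise.all(replicaDelaysMs.map(sendToReplica));
}

// EL-style write: acknowledge immediately, replicate in the background.
// Latency is near zero, but replicas lag until the background sends complete.
function asynchronousWrite(): void {
  replicaDelaysMs.forEach(d => { void sendToReplica(d); });
}

async function main(): Promise<void> {
  let t = Date.now();
  await synchronousWrite();
  console.log(`synchronous ack after ~${Date.now() - t} ms`); // roughly 40 ms

  t = Date.now();
  asynchronousWrite();
  console.log(`asynchronous ack after ~${Date.now() - t} ms`); // roughly 0 ms
}

main();
```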
PACELC Classifications:

- PA/EL: During a partition, favor availability; otherwise, favor low latency.
- PA/EC: During a partition, favor availability; otherwise, favor consistency.
- PC/EL: During a partition, favor consistency; otherwise, favor low latency.
- PC/EC: Favor consistency in both cases.
PACELC provides a more complete picture of database behavior across all operating conditions.
| System | During Partition | Else (Normal) | PACELC |
|---|---|---|---|
| DynamoDB (default) | Available | Low Latency | PA/EL |
| Cassandra (default) | Available | Low Latency | PA/EL |
| MongoDB (majority) | Consistent | Low Latency | PC/EL |
| CockroachDB | Consistent | Consistent | PC/EC |
| Google Spanner | Consistent | Consistent | PC/EC |
| Riak | Available | Low Latency | PA/EL |
To understand BASE, we must first review ACID (Atomicity, Consistency, Isolation, Durability)—the consistency model that relational databases provide and that NoSQL databases often relax.
```sql
-- ACID bank transfer example
BEGIN TRANSACTION;

-- Atomicity: Both updates happen or neither happens
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';

-- Consistency: Constraint check happens automatically
-- (e.g., balance >= 0 constraint)

-- Isolation: Other transactions see either:
--   - Both accounts unchanged (before commit)
--   - Both accounts updated (after commit)
--   - Never one updated and one not

COMMIT;

-- Durability: After COMMIT returns, data survives power loss
```

Implementing ACID across distributed nodes is challenging:
- Atomicity across nodes: requires two-phase commit (2PC) or similar protocols
- Consistency across nodes: all nodes must agree on and enforce the same constraints
- Isolation across nodes: requires distributed locks or snapshot isolation
- Durability across nodes: data must be persisted on multiple nodes
Each of these adds latency and coordination overhead. The more nodes involved, the more expensive ACID becomes.
This is why traditional RDBMS struggles to scale horizontally while maintaining ACID guarantees—and why NoSQL databases often adopt weaker guarantees like BASE.
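To see where the coordination cost comes from, here is a minimal sketch of the two-phase commit pattern mentioned above; the `Participant` interface is hypothetical, and a real implementation also needs durable logging, timeouts, and recovery. Every transaction pays at least two round trips to every participant.

```typescript
// A participant votes in the prepare phase and then commits or aborts
// (hypothetical interface; a real participant persists its vote first).
interface Participant {
  prepare(txId: string): Promise<boolean>; // "yes, I can commit"
  commit(txId: string): Promise<void>;
  abort(txId: string): Promise<void>;
}

// Coordinator: one round trip to collect votes, a second to finish.
async function twoPhaseCommit(txId: string, participants: Participant[]): Promise<boolean> {
  // Phase 1: prepare. Every node must promise it can commit.
  const votes = await Promise.all(participants.map(p => p.prepare(txId)));

  if (votes.every(v => v)) {
    // Phase 2: commit everywhere.
    await Promise.all(participants.map(p => p.commit(txId)));
    return true;
  }
  // Any "no" (or a timeout, not shown) aborts the whole transaction.
  await Promise.all(participants.map(p => p.abort(txId)));
  return false;
}
```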
BASE is a backronym (intentionally contrasting with ACID) describing the consistency model embraced by many NoSQL databases.
Basically Available, Soft state, Eventually consistent
Let's unpack each component:
Basically Available: The system guarantees availability—it will respond to every request, even if the response is stale or an indication that the operation is pending. The system doesn't fail; it degrades gracefully.
Soft State: The system's state may change over time, even without new input, as updates propagate through the system. There's no guarantee that all nodes have the same view at any moment.
Eventually Consistent: If no new updates are made, eventually all replicas will converge to the same state. Given enough time, all nodes will be consistent—but "eventually" might be milliseconds or hours.
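A toy anti-entropy loop shows soft state and eventual consistency in miniature: replicas periodically exchange their latest timestamped value, and once writes stop, every replica converges. The last-writer-wins rule and the names below are illustrative, not a specific database's protocol.

```typescript
interface Replica {
  name: string;
  value: string;
  timestamp: number; // logical timestamp of the write that produced the value
}

// One anti-entropy round: every replica adopts any newer value it hears about.
function gossipRound(replicas: Replica[]): void {
  for (const a of replicas) {
    for (const b of replicas) {
      if (b.timestamp > a.timestamp) {
        a.value = b.value;
        a.timestamp = b.timestamp;
      }
    }
  }
}

const replicas: Replica[] = [
  { name: "r1", value: "old", timestamp: 1 },
  { name: "r2", value: "new", timestamp: 2 }, // accepted a later write during a partition
  { name: "r3", value: "old", timestamp: 1 },
];

// Soft state: r1 and r3 change without any new client input.
gossipRound(replicas);
console.log(replicas.every(r => r.value === "new")); // true: converged
```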
| Property | ACID | BASE |
|---|---|---|
| Philosophy | Pessimistic (assume conflicts) | Optimistic (assume success) |
| Consistency | Strong (immediate) | Eventual (deferred) |
| Availability | May sacrifice for consistency | Prioritized over consistency |
| Isolation | Full transaction isolation | Limited or application-managed |
| Failure handling | Rollback entire transaction | Compensating actions, conflict resolution |
| Scalability | Limited by coordination | Scales horizontally |
| Complexity | Database handles complexity | Application handles complexity |
Eventual consistency sounds vague, but it can be bounded:
Consistency window: The time between a write and all replicas being updated
Read-your-writes consistency: A weaker guarantee where a client always sees their own writes, even if other clients might not yet.
Monotonic reads: Once a client has seen a value, they won't see older values (no time travel).
Causal consistency: Effects of causally related operations are seen in order. If A causes B, everyone who sees B also sees A.
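One common way to provide these session guarantees without full linearizability is for each client to remember the version of the last write it observed and reject answers from replicas that are behind it. A minimal sketch, with hypothetical replica and session types:

```typescript
interface Replica {
  version: number; // highest write version this replica has applied
  value: string;
}

interface Session {
  lastSeenVersion: number; // updated on every write and successful read
}

// Read-your-writes / monotonic reads: only accept a reply from a replica
// that is at least as new as anything this session has already observed.
function sessionRead(session: Session, replica: Replica): string | null {
  if (replica.version < session.lastSeenVersion) {
    return null; // too stale for this session; try another replica
  }
  session.lastSeenVersion = replica.version;
  return replica.value;
}

const session: Session = { lastSeenVersion: 7 }; // this client wrote version 7
const laggingReplica: Replica = { version: 5, value: "old" };
const freshReplica: Replica = { version: 7, value: "mine" };

console.log(sessionRead(session, laggingReplica)); // null: would violate read-your-writes
console.log(sessionRead(session, freshReplica));   // "mine"
```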
In practice, 'eventual' often means 'within milliseconds.' Modern NoSQL databases propagate updates quickly through efficient replication. Eventual consistency doesn't mean 'inconsistent'—it means 'consistent after a brief propagation delay.' For many applications, this delay is imperceptible.
Modern databases recognize that consistency requirements vary by operation. Rather than forcing a system-wide choice, they offer tunable consistency levels.
Cassandra allows specifying consistency per operation:
Write consistency levels:
- `ANY`: Write to at least one node (may be a hint)
- `ONE`: Write to at least one replica
- `QUORUM`: Write to a majority of replicas (⌊N/2⌋ + 1)
- `ALL`: Write to all replicas

Read consistency levels:
- `ONE`: Read from one replica (fast but possibly stale)
- `QUORUM`: Read from a majority of replicas, return the most recent value
- `ALL`: Read from all replicas (slowest, most consistent)

The magic number: If W + R > N (write replicas + read replicas > total replicas), you achieve strong consistency, because every read is guaranteed to contact at least one replica that holds the latest write.
```sql
-- Configure consistency per query in Cassandra

-- High availability, possible stale reads
CONSISTENCY ONE;
SELECT * FROM user_sessions WHERE user_id = '12345';

-- Strong consistency for important operations
CONSISTENCY QUORUM;
UPDATE account_balance SET balance = 100.00 WHERE account_id = 'A';

-- Financial transaction requiring all nodes
CONSISTENCY ALL;
INSERT INTO transaction_log (id, amount, timestamp)
VALUES (uuid(), 1000.00, now());

-- Consistency arithmetic:
-- With replication factor N=3:
--   QUORUM = 2 nodes
--   W(QUORUM) + R(QUORUM) = 2 + 2 = 4 > 3 = N
--   ∴ Strong consistency guaranteed
```

MongoDB offers similar flexibility:
Write concern: How many replicas must acknowledge a write
- `w: 0`: Fire and forget (no acknowledgment)
- `w: 1`: Primary only
- `w: majority`: Majority of the replica set
- `w: <number>`: Specific count

Read concern: What data can be read
- `local`: Return whatever is in memory (might be uncommitted)
- `available`: Like `local`, but in sharded clusters
- `majority`: Return only data written to a majority of nodes
- `linearizable`: Strongest—reflects all successful writes

Read preference: Which nodes to read from
- `primary`: Only the primary (freshest data)
- `primaryPreferred`: Primary if available, else secondary
- `secondary`: Read from secondaries (reduces primary load)
- `secondaryPreferred`: Secondaries if available, else primary
- `nearest`: Lowest network latency

The power of tunable consistency is matching guarantees to requirements. Payment processing might use strong consistency (QUORUM for both read and write), while social media feed fetching uses eventual consistency (ONE for reads). One database can serve both patterns.
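As a rough sketch of how these knobs appear in application code, the example below uses the official Node.js `mongodb` driver; the connection string, database, and collection names are placeholders, and the exact option spellings should be verified against the driver version you use.

```typescript
import { MongoClient } from "mongodb";

async function main(): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017"); // placeholder URI
  await client.connect();
  const db = client.db("shop"); // hypothetical database name

  // Strong guarantees for money: majority write and read concerns.
  const accounts = db.collection("accounts", {
    writeConcern: { w: "majority" },
    readConcern: { level: "majority" },
  });
  await accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } });

  // Relaxed guarantees for a feed: prefer secondaries, accept local reads.
  const feed = db.collection("feed_items", {
    readPreference: "secondaryPreferred",
    readConcern: { level: "local" },
  });
  const items = await feed.find({ userId: "12345" }).limit(20).toArray();
  console.log(items.length);

  await client.close();
}

main().catch(console.error);
```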
In AP systems with eventual consistency, conflicting updates can occur during partitions. Different nodes may accept different writes for the same data. When the partition heals, these conflicts must be resolved.
```typescript
// Example: Shopping Cart conflict resolution

// Last Write Wins - Simple but loses data
interface LWWCart {
  items: string[];
  timestamp: number;
}

function resolveLWW(cart1: LWWCart, cart2: LWWCart): LWWCart {
  return cart1.timestamp > cart2.timestamp ? cart1 : cart2;
}
// Problem: If user adds ItemA on Node1, ItemB on Node2,
// one item is lost!

// Merge Function - Preserves all additions
interface MergeCart {
  items: Set<string>;
}

function mergeCarts(cart1: MergeCart, cart2: MergeCart): MergeCart {
  return { items: new Set([...cart1.items, ...cart2.items]) };
}
// Better: Cart contains ItemA AND ItemB after merge

// CRDT G-Counter - Increment-only counter, auto-mergeable
interface GCounter {
  [nodeId: string]: number; // Each node tracks its own increments
}

function mergeGCounters(a: GCounter, b: GCounter): GCounter {
  const result: GCounter = { ...a };
  for (const [nodeId, count] of Object.entries(b)) {
    result[nodeId] = Math.max(result[nodeId] || 0, count);
  }
  return result;
}

function getValue(counter: GCounter): number {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}
// Perfect: Count is always correct regardless of merge order
```

There's no universal 'correct' conflict resolution. A shopping cart should merge items (union). A bank balance cannot—the database can't know whether to add or subtract conflicting updates. Choose resolution strategies that match your domain semantics, or use strong consistency where conflicts are unacceptable.
We've explored the theoretical foundations that govern distributed database design. These aren't academic exercises—they're practical constraints that shape every database decision.
What's next:
With the theoretical foundation established, we'll explore the practical landscape of NoSQL databases. The next page examines the four primary NoSQL categories—key-value, document, column-family, and graph databases—each optimized for different data models and access patterns.
You now understand the CAP theorem deeply—its properties, implications, and limitations. You can distinguish ACID from BASE semantics, classify databases by their CAP preferences, and understand conflict resolution strategies. This theoretical foundation will guide your database technology choices.