Traditional databases offer a binary choice: either you have strong consistency (every read sees the most recent write) or you don't. But in globally distributed systems, this binary thinking creates painful trade-offs. Strong consistency in a system spanning continents means every operation waits for cross-continental acknowledgment—introducing latency that kills user experience.
Apache Cassandra introduces a revolutionary concept: tunable consistency. Instead of a global setting, Cassandra allows you to specify consistency requirements per-operation. A user login might require strong consistency, while updating a view counter can tolerate eventual consistency. This granular control lets you optimize each operation for its specific requirements.
By the end of this page, you will understand: (1) Cassandra's consistency levels and what each guarantees, (2) The relationship between consistency levels and the CAP theorem, (3) How to achieve strong consistency in Cassandra when needed, (4) Common consistency patterns for different use cases, (5) The performance implications of each consistency level, and (6) Best practices for choosing the right consistency level.
A consistency level (CL) defines how many replica nodes must respond to a read or write operation before the coordinator considers it successful. Different CLs provide different guarantees about data consistency and availability.
Write Consistency Level: Specifies how many replicas must acknowledge the write before returning success to the client.
Read Consistency Level: Specifies how many replicas must respond with data before the coordinator returns to the client.
Let's examine each consistency level in detail:
| Consistency Level | Write Behavior | Read Behavior | Use Case |
|---|---|---|---|
| ANY | Returns after any node (including hinted handoff) acknowledges | Cannot be used for reads | Maximum write availability; fire-and-forget logging |
| ONE | Returns after 1 replica acknowledges | Returns data from 1 replica | Fastest; acceptable for non-critical data |
| TWO | Returns after 2 replicas acknowledge | Returns data after reading from 2 replicas | Slightly more durable; rarely used |
| THREE | Returns after 3 replicas acknowledge | Returns data after reading from 3 replicas | Triple redundancy checks; rarely used |
| QUORUM | Returns after (RF/2)+1 replicas acknowledge | Returns data after (RF/2)+1 replicas respond | Strong consistency when paired together |
| LOCAL_QUORUM | Quorum within local datacenter only | Quorum read from local datacenter only | Low-latency strong consistency within a DC |
| EACH_QUORUM | Quorum in each datacenter | Cannot be used for reads | Multi-DC strong consistency for writes |
| ALL | Returns after all replicas acknowledge | Returns data after all replicas respond | Maximum consistency; lowest availability |
Understanding Quorum:
Quorum is the most important consistency level to understand. With RF (replication factor) = 3, quorum = floor(3/2) + 1 = 2, so two of the three replicas must participate. With RF = 5, quorum = floor(5/2) + 1 = 3 of the five replicas.
The quorum formula ensures that a majority of replicas participate, which has a crucial property: any two quorums must have at least one node in common. This overlap is what enables strong consistency.
If writes require W replicas and reads require R replicas, then W + R > N (where N is the replication factor) guarantees that every read set overlaps the most recent write set. With QUORUM for both reads and writes: W + R = 2 × (floor(N/2) + 1) ≥ N + 1 > N. This arithmetic guarantee is the foundation of strong consistency in Cassandra.
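The quorum arithmetic can be sketched in a few lines (the helper names here are illustrative, not part of any Cassandra API):

```python
# Sketch of Cassandra's quorum arithmetic (illustrative helpers,
# not part of any Cassandra driver).

def quorum(replication_factor: int) -> int:
    """Quorum size: a strict majority of replicas."""
    return replication_factor // 2 + 1

def is_strongly_consistent(w: int, r: int, n: int) -> bool:
    """W + R > N guarantees the read set overlaps the latest write set."""
    return w + r > n

# RF=3: quorum is 2; QUORUM writes + QUORUM reads -> 2 + 2 = 4 > 3
print(quorum(3))                                        # 2
print(quorum(5))                                        # 3
print(is_strongly_consistent(quorum(3), quorum(3), 3))  # True
print(is_strongly_consistent(1, 1, 3))                  # False: ONE/ONE may read stale data
```

The same check reproduces every row of the consistency-combination table below it: any pair whose sum exceeds the replication factor is strongly consistent.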
The CAP theorem states that in a distributed system, you can only guarantee two of three properties: Consistency (every read sees the most recent write), Availability (every request receives a response), and Partition tolerance (the system keeps operating despite network partitions between nodes).
Since network partitions are inevitable in distributed systems, the practical choice is between CP (consistency over availability) and AP (availability over consistency).
Cassandra's Position:
Cassandra is fundamentally an AP system—it prioritizes availability over consistency during partitions. However, tunable consistency lets you slide toward CP behavior when needed:
```
CAP Trade-off Spectrum in Cassandra
====================================

More Available (AP)                      More Consistent (CP)
←──────────────────────────────────────────────────────────→
   CL=ONE             CL=QUORUM                CL=ALL
      │                   │                       │
      ▼                   ▼                       ▼
 [Eventually         [Strongly               [Maximally
  Consistent]         Consistent              Consistent,
                      w/ Quorum]              Least Available]

   ONE, ONE          QUORUM, QUORUM          ALL, ALL
   (RF=3)            (RF=3)                  (RF=3)
      │                   │                       │
      ▼                   ▼                       ▼
   Can lose          Guarantees              Any node failure
   1-2 replicas      read-your-              blocks operations
   and still         writes
   operate
```

During Partitions:
When network partitions occur, the consistency level determines behavior: at low levels (ONE), each side of the partition can keep serving requests at the risk of divergence, while at higher levels (QUORUM, ALL) operations fail on any side that cannot assemble enough replicas, preserving consistency at the cost of availability.
The Availability-Consistency Trade-off:
With RF=3 and CL=QUORUM: operations need 2 of 3 replicas, so the cluster tolerates the loss of 1 replica per partition while still serving strongly consistent reads and writes.
With RF=3 and CL=ONE: operations need only 1 of 3 replicas, so the cluster tolerates the loss of 2 replicas, at the cost of potentially stale reads.
With CL=ONE, data is still replicated to all replicas—just asynchronously. In the absence of failures, all replicas converge within milliseconds. 'Eventually consistent' describes the guarantee during and after failures, not normal operation. Most reads with CL=ONE return the latest data; consistency issues arise only in specific failure scenarios or concurrent access patterns.
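This behavior can be shown with a toy simulation (a simplification, not driver code: real Cassandra replicates over the network, with hinted handoff for down nodes):

```python
# Toy simulation of CL=ONE write propagation (illustrative only).

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (value, timestamp)

def write_cl_one(replicas, key, value, ts):
    """Acknowledge after the first replica; queue the rest asynchronously."""
    replicas[0].data[key] = (value, ts)           # synchronous: 1 ack
    pending = [(r, key, value, ts) for r in replicas[1:]]
    return pending                                # client gets success here

def drain(pending):
    """Later, async replication completes and replicas converge."""
    for r, key, value, ts in pending:
        r.data[key] = (value, ts)

replicas = [Replica("A"), Replica("B"), Replica("C")]
pending = write_cl_one(replicas, "user:1", "alice", ts=100)
# Immediately after the ack, only one replica holds the write:
print(sum(1 for r in replicas if "user:1" in r.data))  # 1
drain(pending)
# In the absence of failures, all replicas converge:
print(sum(1 for r in replicas if "user:1" in r.data))  # 3
```

A read with CL=ONE in the window between the ack and the drain may hit a replica that has not yet received the write; that window is what "eventual" refers to.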
While Cassandra defaults to eventual consistency, you can achieve strong consistency by carefully choosing consistency levels. The key formula is:
W + R > N
Where: W is the number of replicas that must acknowledge a write, R is the number of replicas that must respond to a read, and N is the replication factor.
When this inequality holds, at least one replica in the read set must have the latest write, and Cassandra's conflict resolution (last-write-wins by timestamp) ensures the newest data is returned.
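The coordinator's merge step can be sketched as follows (a simplification with illustrative names, not the actual Cassandra implementation):

```python
# Sketch of last-write-wins resolution at the coordinator
# (illustrative; real Cassandra resolves per-cell, not per-row).

def resolve(responses):
    """Given (value, write_timestamp) pairs from the replicas in the
    read set, return the value with the highest timestamp."""
    return max(responses, key=lambda pair: pair[1])[0]

# With W + R > N, at least one responder holds the latest write,
# so the max-timestamp value is the newest one:
responses = [
    ("old_email@example.com", 1000),  # stale replica
    ("new_email@example.com", 2000),  # replica that saw the latest write
]
print(resolve(responses))  # new_email@example.com
```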
| Write CL | Read CL | W + R | Strongly Consistent? | Notes |
|---|---|---|---|---|
| ONE (1) | ONE (1) | 2 | No (2 ≤ 3) | Fastest, but may return stale reads |
| ONE (1) | QUORUM (2) | 3 | No (3 ≤ 3) | Still possible to miss latest write |
| QUORUM (2) | ONE (1) | 3 | No (3 ≤ 3) | Still possible to miss latest write |
| QUORUM (2) | QUORUM (2) | 4 | Yes (4 > 3) | Standard strong consistency |
| ALL (3) | ONE (1) | 4 | Yes (4 > 3) | Strong, but ALL writes reduce availability |
| ONE (1) | ALL (3) | 4 | Yes (4 > 3) | Strong, but ALL reads reduce availability |
| ALL (3) | ALL (3) | 6 | Yes (6 > 3) | Maximum consistency, minimum availability |
Why QUORUM/QUORUM is the Standard:
The QUORUM/QUORUM combination provides the optimal balance:
Common Strong Consistency Patterns:
Strong consistency (W + R > N) guarantees that reads see writes, but it doesn't provide multi-row transactions. Cassandra's lightweight transactions (LWTs), which use the SERIAL consistency level, provide linearizable single-partition operations, but there are no cross-partition ACID transactions. For transactional needs, consider LWTs or other design patterns.
Global deployments introduce a new dimension: how do you balance consistency across datacenters with latency? Cassandra provides datacenter-aware consistency levels:
LOCAL_QUORUM: Operate with quorum semantics but only consider replicas in the coordinator's datacenter. This provides: strong consistency within the local DC, low latency (no cross-DC round-trips on the request path), and isolation from remote-DC outages. Remote datacenters still receive the data asynchronously.
EACH_QUORUM (writes only): Require a quorum in every datacenter before acknowledging. This provides: a guarantee that every datacenter durably holds the write, at the cost of cross-DC latency on every request and unavailability if any datacenter is partitioned away.
```
Multi-DC Deployment: 3 DCs, RF=3 per DC
========================================

Keyspace configuration:
  class: NetworkTopologyStrategy
  us-east: 3
  eu-west: 3
  ap-south: 3

Total replicas per partition: 9 (3 + 3 + 3)

Scenario: Client in us-east writes data

┌──────────────────────────────────────────────────────────────┐
│ Consistency Level Comparison                                 │
├──────────────────────────────────────────────────────────────┤
│ LOCAL_QUORUM (Write):                                        │
│   Wait for 2 replicas in us-east to acknowledge              │
│   Latency: ~2ms (local only)                                 │
│   eu-west and ap-south receive data asynchronously           │
├──────────────────────────────────────────────────────────────┤
│ EACH_QUORUM (Write):                                         │
│   Wait for 2 replicas in us-east, 2 in eu-west, 2 in ap-south│
│   Latency: ~150ms (cross-continental)                        │
│   All DCs guaranteed to have data                            │
├──────────────────────────────────────────────────────────────┤
│ QUORUM (Write):                                              │
│   Wait for (9/2)+1 = 5 replicas across any DCs               │
│   Latency: Variable (depends on which 5 respond first)       │
│   Might be satisfied by us-east(3) + eu-west(2)              │
└──────────────────────────────────────────────────────────────┘
```

Choosing Multi-DC Consistency Levels:
Using EACH_QUORUM or cross-DC QUORUM adds significant latency (typically 50-150ms for cross-continental operations). This might be acceptable for rare operations (user signup) but not for every request. Design your data model and read/write patterns to minimize cross-DC coordination.
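The replica counts each level waits for can be computed directly (a sketch with illustrative function names, assuming quorum = floor(n/2) + 1; not a driver API):

```python
# How many replica acknowledgments each consistency level waits for
# in a multi-DC cluster (illustrative helper, not a Cassandra API).

def quorum(n: int) -> int:
    return n // 2 + 1

def required_acks(cl: str, rf_per_dc: dict, local_dc: str) -> int:
    total_rf = sum(rf_per_dc.values())
    if cl == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])                   # local DC only
    if cl == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_per_dc.values())  # quorum per DC
    if cl == "QUORUM":
        return quorum(total_rf)                              # majority across all DCs
    raise ValueError(f"unsupported consistency level: {cl}")

rf = {"us-east": 3, "eu-west": 3, "ap-south": 3}
print(required_acks("LOCAL_QUORUM", rf, "us-east"))  # 2 (local, fast)
print(required_acks("EACH_QUORUM", rf, "us-east"))   # 6 (2 in every DC)
print(required_acks("QUORUM", rf, "us-east"))        # 5 of 9 (any DCs)
```

Note that plain QUORUM's 5 acknowledgments can be satisfied without touching one datacenter at all, which is why LOCAL_QUORUM plus async cross-DC replication is the more predictable pattern.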
Consistency level directly impacts performance. Understanding these trade-offs is essential for system design.
| Consistency Level | Typical Latency | Throughput Impact | Failure Tolerance |
|---|---|---|---|
| ONE | ~1-2ms | Maximum | Can lose 2 replicas |
| TWO | ~2-3ms | High | Can lose 1 replica |
| QUORUM | ~3-5ms | Moderate | Can lose 1 replica |
| ALL | ~5-10ms | Lower | Any failure blocks |
Why Higher Consistency Affects Performance:
More Acknowledgments to Wait For: CL=ALL waits for all 3 replicas; CL=ONE returns after the first. The slowest required replica determines operation latency.
Tail Latency Amplification: With CL=ALL, a single slow replica (GC pause, network hiccup) delays the entire operation. CL=QUORUM is less affected because only 2 of 3 matter.
Coordinator Overhead: Higher CLs require the coordinator to track more acknowledgments, merge more results, and perform more network I/O.
Speculative Retry: Cassandra can speculatively send requests to extra replicas if initial responses are slow. Higher CLs reduce the benefit of this optimization.
Latency Distribution Comparison:
```
Latency Distribution (RF=3, percentiles)
==========================================

Consistency Level: ONE
├── p50:   1.2ms  (fastest replica)
├── p95:   2.8ms
├── p99:   4.5ms
└── p99.9: 12ms

Consistency Level: QUORUM
├── p50:   2.1ms  (median of 2 fastest replicas)
├── p95:   4.2ms
├── p99:   8.3ms
└── p99.9: 25ms

Consistency Level: ALL
├── p50:   3.5ms  (slowest of all replicas)
├── p95:   8.1ms
├── p99:   18ms
└── p99.9: 85ms

Key insight: ALL is ~3x slower at median, ~6x slower at p99.9.
This is because it always waits for the slowest replica.
```

Cassandra's speculative retry feature can improve p99 latencies for CL=QUORUM. If the first replica doesn't respond within a threshold, Cassandra speculatively sends the request to another replica. Configure this in your table schema or client driver for latency-sensitive workloads.
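The shape of these distributions follows from order statistics: with RF=3, an operation at CL=ONE completes at the fastest replica's latency, CL=QUORUM at the 2nd fastest, and CL=ALL at the slowest. A toy simulation with synthetic latencies (the numbers are illustrative, not benchmarks) shows how rare slow replicas dominate the tail at CL=ALL:

```python
import random

def cl_latency(replica_latencies, acks_needed):
    """An operation completes when the k-th fastest replica responds."""
    return sorted(replica_latencies)[acks_needed - 1]

random.seed(42)
samples = {1: [], 2: [], 3: []}  # acks needed: ONE, QUORUM, ALL (RF=3)
for _ in range(10_000):
    # Synthetic per-replica latency: 1-2ms base, plus a rare 50ms
    # outlier (GC pause / network hiccup) with 1% probability.
    lats = [random.uniform(1.0, 2.0) + (random.random() < 0.01) * 50
            for _ in range(3)]
    for k in samples:
        samples[k].append(cl_latency(lats, k))

for k, name in [(1, "ONE"), (2, "QUORUM"), (3, "ALL")]:
    s = sorted(samples[k])
    print(f"{name:6} p50={s[5000]:.2f}ms  p99.9={s[9990]:.2f}ms")
```

With a 1% outlier rate per replica, CL=ALL hits an outlier on roughly 3% of operations (any of three replicas can be slow), while CL=ONE hits one only when all three replicas are slow simultaneously.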
Different use cases demand different consistency approaches. Here are battle-tested patterns:
User Accounts and Profiles
Requirements: reads must see the latest profile data (a password change must take effect immediately), and the data must survive node failures.
Recommended Pattern: write with QUORUM and read with QUORUM (or LOCAL_QUORUM for both in multi-DC deployments).
Why This Works: with RF=3, W + R = 2 + 2 = 4 > 3, so every read overlaps the latest write, while the cluster still tolerates the loss of one replica.
Consistency levels are specified per CQL statement or at the session/connection level. This means you can use different CLs for different queries within the same application—for example, QUORUM for reading user profiles but ONE for reading feed items.
Even with consistency levels, replicas can diverge (e.g., a replica was down during a write). Cassandra provides background mechanisms to heal these inconsistencies:
Read Repair:
When a read operation contacts multiple replicas, the coordinator compares their responses. If discrepancies are found:
Blocking Read Repair: Fixes inconsistencies among the contacted replicas before returning to the client. Increases read latency.
Read Repair Chance (removed in Cassandra 4.0): Probabilistically triggered full read from all replicas to detect inconsistencies.
Post-Read Repair: After returning the latest data to the client, the coordinator asynchronously sends repair mutations to out-of-date replicas.
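The coordinator's merge-and-repair step can be sketched as a toy (illustrative names; real Cassandra exchanges compact digests first and requests full data only on mismatch):

```python
import hashlib

# Toy read-repair sketch (illustrative; not the Cassandra implementation).
# Each replica is modeled as a dict: key -> (value, timestamp).

def digest(record):
    return hashlib.md5(repr(record).encode()).hexdigest()

def read_with_repair(replicas, key):
    """Read from a quorum of replicas, return the newest value, and queue
    repair mutations for any replica that returned stale data."""
    contacted = replicas[:2]  # quorum of RF=3
    records = [r.get(key) for r in contacted]
    if len({digest(rec) for rec in records}) == 1:
        return records[0][0], []                  # fast path: digests match
    latest = max(records, key=lambda rec: rec[1])  # last-write-wins
    repairs = [(r, key, latest)
               for r, rec in zip(contacted, records) if rec != latest]
    return latest[0], repairs                     # repairs applied async

replica_a = {"x": ("new", 200)}
replica_b = {"x": ("old", 100)}   # missed the latest write
value, repairs = read_with_repair([replica_a, replica_b], "x")
print(value)                      # new
for r, key, latest in repairs:    # asynchronously heal the stale replica
    r[key] = latest
print(replica_b["x"])             # ('new', 200)
```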
How Read Repair Works:
```
Read Repair Flow (CL=QUORUM, RF=3)
===================================

1. Client reads partition_key=X with CL=QUORUM

2. Coordinator sends digest request to 2 replicas (quorum)
   - Replica A: responds with digest of data @ timestamp T1
   - Replica B: responds with digest of data @ timestamp T2

3. Digests match?
   → Yes: Return data to client (fast path)
   → No:  Request full data from both replicas

4. If digests didn't match:
   - Compare full data from both replicas
   - Determine latest version (by timestamp)
   - Return latest to client

5. Asynchronously:
   - Send repair mutation to replica with stale data
   - That replica updates its local copy

6. Result: Reading data actively heals inconsistencies
```

Anti-Entropy Repair (nodetool repair):
Background repair process that proactively finds inconsistencies: each replica builds Merkle trees over its data, the trees are compared to pinpoint divergent ranges, and only the differing data is streamed between replicas.
Best Practice: Run repair periodically (at least weekly) to catch any write that might have been missed during node outages. This is especially important if you use consistency levels lower than QUORUM.
gc_grace_seconds:
This setting defines how long tombstones (deletion markers) are kept before being garbage collected. To prevent zombie data resurrection, repair must run more frequently than gc_grace_seconds (default: 10 days). Failing to repair in this window can cause deleted data to reappear.
Running nodetool repair regularly is a requirement, not an optimization. Without repair, replicas can diverge permanently, and worse, deleted data can 'resurrect' when a node that missed the delete comes back after gc_grace_seconds. Many production incidents trace back to neglected repair schedules.
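The resurrection scenario can be shown as a toy timeline (a heavy simplification: real tombstone purging happens during compaction, and gc_grace_seconds is measured in seconds, not days):

```python
# Toy timeline of zombie-data resurrection (illustrative only).
# Each replica is a dict: key -> value, with "TOMBSTONE" marking a delete.

GC_GRACE = 10  # gc_grace window, in days here for readability

def compact(replica, tombstone_ages):
    """Purge tombstones older than the gc_grace window."""
    for key, age in tombstone_ages.items():
        if replica.get(key) == "TOMBSTONE" and age > GC_GRACE:
            del replica[key]

replica_a = {"user:9": "TOMBSTONE"}  # saw the delete
replica_b = {"user:9": "stale-row"}  # was down and missed the delete

# If repair ran now, the tombstone would win and remove replica_b's copy.
# Instead, suppose gc_grace passes first and replica_a purges the tombstone:
compact(replica_a, {"user:9": 11})
print("user:9" in replica_a)  # False: no record the row was ever deleted

# Now repair sees data on B and nothing on A, so the deleted row spreads back:
if "user:9" not in replica_a and "user:9" in replica_b:
    replica_a["user:9"] = replica_b["user:9"]
print(replica_a["user:9"])    # stale-row  (zombie data resurrected)
```

Running repair before the tombstone is purged avoids this: the delete marker propagates to replica_b while the cluster still remembers the deletion.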
For operations that require linearizable consistency—where the order of operations matters and concurrent operations must be serialized—Cassandra provides Lightweight Transactions (LWT).
When You Need LWT: enforcing uniqueness (e.g., usernames), compare-and-set updates (optimistic locking), and conditional deletes, where correctness depends on reading and writing as one atomic step.
How LWT Works:
LWT uses the Paxos consensus protocol to serialize operations: a prepare/promise phase to establish a ballot, a read phase to fetch the current value and evaluate the condition, a propose/accept phase to agree on the new value, and a commit phase to apply it.
```sql
-- Insert only if row doesn't exist (unique username)
INSERT INTO users (username, email, created_at)
VALUES ('johndoe', 'john@example.com', toTimestamp(now()))
IF NOT EXISTS;
-- Returns: [applied] = true if successful
-- Returns: [applied] = false + existing row if username exists

-- Update only if current value matches (optimistic locking)
UPDATE inventory SET quantity = 95
WHERE product_id = 'SKU-123'
IF quantity = 100;
-- Returns: [applied] = true if quantity was 100
-- Returns: [applied] = false + current quantity if different

-- Conditional delete
DELETE FROM sessions
WHERE user_id = 'user-456'
IF token = 'abc123';
-- Only deletes if the token matches
```

LWT Performance Implications:
LWT uses Paxos consensus, which requires multiple round-trips between the coordinator and the replicas (prepare, read, propose, commit). This makes an LWT roughly four times as expensive as a regular write, so reserve LWTs for the operations that truly need them.
LWT Consistency Levels: the Paxos phase runs at SERIAL (linearizable across the whole cluster) or LOCAL_SERIAL (linearizable within the local datacenter); the final commit is written at the regular consistency level you specify on the statement.
Many use cases that seem to require LWT can be solved with better data modeling. Time-series data with no updates, write-only audit logs, or designs that embrace eventual consistency often outperform LWT-heavy approaches. Consider if your problem truly requires linearizability.
We've explored Cassandra's tunable consistency in depth. Let's consolidate the key concepts: consistency is chosen per operation, not per cluster; W + R > N yields strong consistency, with QUORUM/QUORUM as the standard pattern; LOCAL_QUORUM keeps multi-DC latency low; higher levels trade latency and availability for stronger guarantees; read repair and regular nodetool repair keep replicas converged (and must outrun gc_grace_seconds); and LWTs provide linearizable single-partition operations when last-write-wins isn't enough.
What's Next:
With consistency levels understood, we now turn to Cassandra's distinctive data model. The next page explores the wide-column model—how Cassandra organizes data in partitions and rows, and how this model enables both the performance and the access patterns that define Cassandra applications.
You now understand how to tune Cassandra's consistency to match your application's requirements—from fire-and-forget logging to strongly consistent financial transactions. Next, we'll explore Cassandra's wide-column data model and how it enables high-performance distributed storage.