Traditional databases offer a binary choice: either you have strong consistency (every read sees the most recent write) or you don't. But in globally distributed systems, this binary thinking creates painful trade-offs. Strong consistency in a system spanning continents means every operation waits for cross-continental acknowledgment—introducing latency that kills user experience.
Apache Cassandra introduces a revolutionary concept: tunable consistency. Instead of a global setting, Cassandra allows you to specify consistency requirements per-operation. A user login might require strong consistency, while updating a view counter can tolerate eventual consistency. This granular control lets you optimize each operation for its specific requirements.
By the end of this page, you will understand: (1) Cassandra's consistency levels and what each guarantees, (2) The relationship between consistency levels and the CAP theorem, (3) How to achieve strong consistency in Cassandra when needed, (4) Common consistency patterns for different use cases, (5) The performance implications of each consistency level, and (6) Best practices for choosing the right consistency level.
A consistency level (CL) defines how many replica nodes must respond to a read or write operation before the coordinator considers it successful. Different CLs provide different guarantees about data consistency and availability.
Write Consistency Level: Specifies how many replicas must acknowledge the write before returning success to the client.
Read Consistency Level: Specifies how many replicas must respond with data before the coordinator returns to the client.
Let's examine each consistency level in detail:
| Consistency Level | Write Behavior | Read Behavior | Use Case |
|---|---|---|---|
| ANY | Returns after any node (including hinted handoff) acknowledges | Cannot be used for reads | Maximum write availability; fire-and-forget logging |
| ONE | Returns after 1 replica acknowledges | Returns data from 1 replica | Fastest; acceptable for non-critical data |
| TWO | Returns after 2 replicas acknowledge | Returns data after reading from 2 replicas | Slightly more durable; rarely used |
| THREE | Returns after 3 replicas acknowledge | Returns data after reading from 3 replicas | Triple redundancy checks; rarely used |
| QUORUM | Returns after (RF/2)+1 replicas acknowledge | Returns data after (RF/2)+1 replicas respond | Strong consistency when paired together |
| LOCAL_QUORUM | Quorum within local datacenter only | Quorum read from local datacenter only | Low-latency strong consistency within a DC |
| EACH_QUORUM | Quorum in each datacenter | Cannot be used for reads | Multi-DC strong consistency for writes |
| ALL | Returns after all replicas acknowledge | Returns data after all replicas respond | Maximum consistency; lowest availability |
Understanding Quorum:
Quorum is the most important consistency level to understand. With RF (replication factor) = 3, quorum = floor(3/2) + 1 = 2, so two of the three replicas must participate. With RF = 5, quorum = floor(5/2) + 1 = 3 of the five replicas.
The quorum formula ensures that a majority of replicas participate, which has a crucial property: any two quorums must have at least one node in common. This overlap is what enables strong consistency.
If writes require W replicas and reads require R replicas, then W + R > N (where N is the replication factor) guarantees that every read set overlaps the most recent write set. With QUORUM for both reads and writes: W + R = 2 × (floor(N/2) + 1) ≥ N + 1 > N. This arithmetic guarantee is the foundation of strong consistency in Cassandra.
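The quorum arithmetic can be sketched in a few lines (the helper names here are illustrative, not part of any Cassandra API):

```python
# Sketch of Cassandra's quorum arithmetic (illustrative helpers,
# not part of any Cassandra driver).

def quorum(replication_factor: int) -> int:
    """Quorum size: a strict majority of replicas."""
    return replication_factor // 2 + 1

def is_strongly_consistent(w: int, r: int, n: int) -> bool:
    """W + R > N guarantees the read set overlaps the latest write set."""
    return w + r > n

# RF=3: quorum is 2; QUORUM writes + QUORUM reads -> 2 + 2 = 4 > 3
print(quorum(3))                                        # 2
print(quorum(5))                                        # 3
print(is_strongly_consistent(quorum(3), quorum(3), 3))  # True
print(is_strongly_consistent(1, 1, 3))                  # False: ONE/ONE may read stale data
```

The same check reproduces every row of the consistency-combination table below it: any pair whose sum exceeds the replication factor is strongly consistent.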
The CAP theorem states that in a distributed system, you can only guarantee two of three properties: Consistency (every read sees the most recent write), Availability (every request receives a response), and Partition tolerance (the system keeps operating despite network partitions between nodes).
Since network partitions are inevitable in distributed systems, the practical choice is between CP (consistency over availability) and AP (availability over consistency).
Cassandra's Position:
Cassandra is fundamentally an AP system—it prioritizes availability over consistency during partitions. However, tunable consistency lets you slide toward CP behavior when needed:
```
CAP Trade-off Spectrum in Cassandra
====================================

More Available (AP)                      More Consistent (CP)
←──────────────────────────────────────────────────────────→
   CL=ONE             CL=QUORUM                CL=ALL
      │                   │                       │
      ▼                   ▼                       ▼
 [Eventually         [Strongly               [Maximally
  Consistent]         Consistent              Consistent,
                      w/ Quorum]              Least Available]

   ONE, ONE          QUORUM, QUORUM          ALL, ALL
   (RF=3)            (RF=3)                  (RF=3)
      │                   │                       │
      ▼                   ▼                       ▼
   Can lose          Guarantees              Any node failure
   1-2 replicas      read-your-              blocks operations
   and still         writes
   operate
```

During Partitions:
When network partitions occur, the consistency level determines behavior: at low levels (ONE), each side of the partition can keep serving requests at the risk of divergence, while at higher levels (QUORUM, ALL) operations fail on any side that cannot assemble enough replicas, preserving consistency at the cost of availability.
The Availability-Consistency Trade-off:
With RF=3 and CL=QUORUM: operations need 2 of 3 replicas, so the cluster tolerates the loss of 1 replica per partition while still serving strongly consistent reads and writes.
With RF=3 and CL=ONE: operations need only 1 of 3 replicas, so the cluster tolerates the loss of 2 replicas, at the cost of potentially stale reads.
With CL=ONE, data is still replicated to all replicas—just asynchronously. In the absence of failures, all replicas converge within milliseconds. 'Eventually consistent' describes the guarantee during and after failures, not normal operation. Most reads with CL=ONE return the latest data; consistency issues arise only in specific failure scenarios or concurrent access patterns.
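This behavior can be shown with a toy simulation (a simplification, not driver code: real Cassandra replicates over the network, with hinted handoff for down nodes):

```python
# Toy simulation of CL=ONE write propagation (illustrative only).

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (value, timestamp)

def write_cl_one(replicas, key, value, ts):
    """Acknowledge after the first replica; queue the rest asynchronously."""
    replicas[0].data[key] = (value, ts)           # synchronous: 1 ack
    pending = [(r, key, value, ts) for r in replicas[1:]]
    return pending                                # client gets success here

def drain(pending):
    """Later, async replication completes and replicas converge."""
    for r, key, value, ts in pending:
        r.data[key] = (value, ts)

replicas = [Replica("A"), Replica("B"), Replica("C")]
pending = write_cl_one(replicas, "user:1", "alice", ts=100)
# Immediately after the ack, only one replica holds the write:
print(sum(1 for r in replicas if "user:1" in r.data))  # 1
drain(pending)
# In the absence of failures, all replicas converge:
print(sum(1 for r in replicas if "user:1" in r.data))  # 3
```

A read with CL=ONE in the window between the ack and the drain may hit a replica that has not yet received the write; that window is what "eventual" refers to.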
While Cassandra defaults to eventual consistency, you can achieve strong consistency by carefully choosing consistency levels. The key formula is:
W + R > N
Where: W is the number of replicas that must acknowledge a write, R is the number of replicas that must respond to a read, and N is the replication factor.
When this inequality holds, at least one replica in the read set must have the latest write, and Cassandra's conflict resolution (last-write-wins by timestamp) ensures the newest data is returned.
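The coordinator's merge step can be sketched as follows (a simplification with illustrative names, not the actual Cassandra implementation):

```python
# Sketch of last-write-wins resolution at the coordinator
# (illustrative; real Cassandra resolves per-cell, not per-row).

def resolve(responses):
    """Given (value, write_timestamp) pairs from the replicas in the
    read set, return the value with the highest timestamp."""
    return max(responses, key=lambda pair: pair[1])[0]

# With W + R > N, at least one responder holds the latest write,
# so the max-timestamp value is the newest one:
responses = [
    ("old_email@example.com", 1000),  # stale replica
    ("new_email@example.com", 2000),  # replica that saw the latest write
]
print(resolve(responses))  # new_email@example.com
```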
| Write CL | Read CL | W + R | Strongly Consistent? | Notes |
|---|---|---|---|---|
| ONE (1) | ONE (1) | 2 | No (2 ≤ 3) | Fastest, but may return stale reads |
| ONE (1) | QUORUM (2) | 3 | No (3 ≤ 3) | Still possible to miss latest write |
| QUORUM (2) | ONE (1) | 3 | No (3 ≤ 3) | Still possible to miss latest write |
| QUORUM (2) | QUORUM (2) | 4 | Yes (4 > 3) | Standard strong consistency |
| ALL (3) | ONE (1) | 4 | Yes (4 > 3) | Strong, but ALL writes reduce availability |
| ONE (1) | ALL (3) | 4 | Yes (4 > 3) | Strong, but ALL reads reduce availability |
| ALL (3) | ALL (3) | 6 | Yes (6 > 3) | Maximum consistency, minimum availability |
Why QUORUM/QUORUM is the Standard:
The QUORUM/QUORUM combination provides the optimal balance:
Common Strong Consistency Patterns:
Strong consistency (W + R > N) guarantees that reads see writes, but it doesn't provide multi-row transactions. Cassandra's lightweight transactions (LWTs), which use the SERIAL consistency level, provide linearizable single-partition operations, but there are no cross-partition ACID transactions. For transactional needs, consider LWTs or other design patterns.
Global deployments introduce a new dimension: how do you balance consistency across datacenters with latency? Cassandra provides datacenter-aware consistency levels:
LOCAL_QUORUM: Operate with quorum semantics but only consider replicas in the coordinator's datacenter. This provides: strong consistency within the local DC, low latency (no cross-DC round-trips on the request path), and isolation from remote-DC outages. Remote datacenters still receive the data asynchronously.
EACH_QUORUM (writes only): Require a quorum in every datacenter before acknowledging. This provides: a guarantee that every datacenter durably holds the write, at the cost of cross-DC latency on every request and unavailability if any datacenter is partitioned away.
```
Multi-DC Deployment: 3 DCs, RF=3 per DC
========================================

Keyspace configuration:
  class: NetworkTopologyStrategy
  us-east: 3
  eu-west: 3
  ap-south: 3

Total replicas per partition: 9 (3 + 3 + 3)

Scenario: Client in us-east writes data

┌──────────────────────────────────────────────────────────────┐
│ Consistency Level Comparison                                 │
├──────────────────────────────────────────────────────────────┤
│ LOCAL_QUORUM (Write):                                        │
│   Wait for 2 replicas in us-east to acknowledge              │
│   Latency: ~2ms (local only)                                 │
│   eu-west and ap-south receive data asynchronously           │
├──────────────────────────────────────────────────────────────┤
│ EACH_QUORUM (Write):                                         │
│   Wait for 2 replicas in us-east, 2 in eu-west, 2 in ap-south│
│   Latency: ~150ms (cross-continental)                        │
│   All DCs guaranteed to have data                            │
├──────────────────────────────────────────────────────────────┤
│ QUORUM (Write):                                              │
│   Wait for (9/2)+1 = 5 replicas across any DCs               │
│   Latency: Variable (depends on which 5 respond first)       │
│   Might be satisfied by us-east(3) + eu-west(2)              │
└──────────────────────────────────────────────────────────────┘
```

Choosing Multi-DC Consistency Levels:
Using EACH_QUORUM or cross-DC QUORUM adds significant latency (typically 50-150ms for cross-continental operations). This might be acceptable for rare operations (user signup) but not for every request. Design your data model and read/write patterns to minimize cross-DC coordination.
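The replica counts each level waits for can be computed directly (a sketch with illustrative function names, assuming quorum = floor(n/2) + 1; not a driver API):

```python
# How many replica acknowledgments each consistency level waits for
# in a multi-DC cluster (illustrative helper, not a Cassandra API).

def quorum(n: int) -> int:
    return n // 2 + 1

def required_acks(cl: str, rf_per_dc: dict, local_dc: str) -> int:
    total_rf = sum(rf_per_dc.values())
    if cl == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])                   # local DC only
    if cl == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_per_dc.values())  # quorum per DC
    if cl == "QUORUM":
        return quorum(total_rf)                              # majority across all DCs
    raise ValueError(f"unsupported consistency level: {cl}")

rf = {"us-east": 3, "eu-west": 3, "ap-south": 3}
print(required_acks("LOCAL_QUORUM", rf, "us-east"))  # 2 (local, fast)
print(required_acks("EACH_QUORUM", rf, "us-east"))   # 6 (2 in every DC)
print(required_acks("QUORUM", rf, "us-east"))        # 5 of 9 (any DCs)
```

Note that plain QUORUM's 5 acknowledgments can be satisfied without touching one datacenter at all, which is why LOCAL_QUORUM plus async cross-DC replication is the more predictable pattern.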
Consistency level directly impacts performance. Understanding these trade-offs is essential for system design.
| Consistency Level | Typical Latency | Throughput Impact | Failure Tolerance |
|---|---|---|---|
| ONE | ~1-2ms | Maximum | Can lose 2 replicas |
| TWO | ~2-3ms | High | Can lose 1 replica |
| QUORUM | ~3-5ms | Moderate | Can lose 1 replica |
| ALL | ~5-10ms | Lower | Any failure blocks |
Why Higher Consistency Affects Performance:
More Acknowledgments to Wait For: CL=ALL waits for all 3 replicas; CL=ONE returns after the first. The slowest required replica determines operation latency.
Tail Latency Amplification: With CL=ALL, a single slow replica (GC pause, network hiccup) delays the entire operation. CL=QUORUM is less affected because only 2 of 3 matter.
Coordinator Overhead: Higher CLs require the coordinator to track more acknowledgments, merge more results, and perform more network I/O.
Speculative Retry: Cassandra can speculatively send requests to extra replicas if initial responses are slow. Higher CLs reduce the benefit of this optimization.
Latency Distribution Comparison:
```
Latency Distribution (RF=3, percentiles)
==========================================

Consistency Level: ONE
├── p50:   1.2ms  (fastest replica)
├── p95:   2.8ms
├── p99:   4.5ms
└── p99.9: 12ms

Consistency Level: QUORUM
├── p50:   2.1ms  (median of 2 fastest replicas)
├── p95:   4.2ms
├── p99:   8.3ms
└── p99.9: 25ms

Consistency Level: ALL
├── p50:   3.5ms  (slowest of all replicas)
├── p95:   8.1ms
├── p99:   18ms
└── p99.9: 85ms

Key insight: ALL is ~3x slower at median, ~6x slower at p99.9.
This is because it always waits for the slowest replica.
```

Cassandra's speculative retry feature can improve p99 latencies for CL=QUORUM. If the first replica doesn't respond within a threshold, Cassandra speculatively sends the request to another replica. Configure this in your table schema or client driver for latency-sensitive workloads.
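The shape of these distributions follows from order statistics: with RF=3, an operation at CL=ONE completes at the fastest replica's latency, CL=QUORUM at the 2nd fastest, and CL=ALL at the slowest. A toy simulation with synthetic latencies (the numbers are illustrative, not benchmarks) shows how rare slow replicas dominate the tail at CL=ALL:

```python
import random

def cl_latency(replica_latencies, acks_needed):
    """An operation completes when the k-th fastest replica responds."""
    return sorted(replica_latencies)[acks_needed - 1]

random.seed(42)
samples = {1: [], 2: [], 3: []}  # acks needed: ONE, QUORUM, ALL (RF=3)
for _ in range(10_000):
    # Synthetic per-replica latency: 1-2ms base, plus a rare 50ms
    # outlier (GC pause / network hiccup) with 1% probability.
    lats = [random.uniform(1.0, 2.0) + (random.random() < 0.01) * 50
            for _ in range(3)]
    for k in samples:
        samples[k].append(cl_latency(lats, k))

for k, name in [(1, "ONE"), (2, "QUORUM"), (3, "ALL")]:
    s = sorted(samples[k])
    print(f"{name:6} p50={s[5000]:.2f}ms  p99.9={s[9990]:.2f}ms")
```

With a 1% outlier rate per replica, CL=ALL hits an outlier on roughly 3% of operations (any of three replicas can be slow), while CL=ONE hits one only when all three replicas are slow simultaneously.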
Different use cases demand different consistency approaches. Here are battle-tested patterns:
User Accounts and Profiles
Requirements: reads must see the latest profile data (a password change must take effect immediately), and the data must survive node failures.
Recommended Pattern: write with QUORUM and read with QUORUM (or LOCAL_QUORUM for both in multi-DC deployments).
Why This Works: with RF=3, W + R = 2 + 2 = 4 > 3, so every read overlaps the latest write, while the cluster still tolerates the loss of one replica.
Consistency levels are specified per CQL statement or at the session/connection level. This means you can use different CLs for different queries within the same application—for example, QUORUM for reading user profiles but ONE for reading feed items.
Even with consistency levels, replicas can diverge (e.g., a replica was down during a write). Cassandra provides background mechanisms to heal these inconsistencies:
Read Repair:
When a read operation contacts multiple replicas, the coordinator compares their responses. If discrepancies are found:
Blocking Read Repair: Fixes inconsistencies among the contacted replicas before returning to the client. Increases read latency.
Read Repair Chance (removed in Cassandra 4.0): Probabilistically triggered full read from all replicas to detect inconsistencies.
Post-Read Repair: After returning the latest data to the client, the coordinator asynchronously sends repair mutations to out-of-date replicas.
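The coordinator's merge-and-repair step can be sketched as a toy (illustrative names; real Cassandra exchanges compact digests first and requests full data only on mismatch):

```python
import hashlib

# Toy read-repair sketch (illustrative; not the Cassandra implementation).
# Each replica is modeled as a dict: key -> (value, timestamp).

def digest(record):
    return hashlib.md5(repr(record).encode()).hexdigest()

def read_with_repair(replicas, key):
    """Read from a quorum of replicas, return the newest value, and queue
    repair mutations for any replica that returned stale data."""
    contacted = replicas[:2]  # quorum of RF=3
    records = [r.get(key) for r in contacted]
    if len({digest(rec) for rec in records}) == 1:
        return records[0][0], []                  # fast path: digests match
    latest = max(records, key=lambda rec: rec[1])  # last-write-wins
    repairs = [(r, key, latest)
               for r, rec in zip(contacted, records) if rec != latest]
    return latest[0], repairs                     # repairs applied async

replica_a = {"x": ("new", 200)}
replica_b = {"x": ("old", 100)}   # missed the latest write
value, repairs = read_with_repair([replica_a, replica_b], "x")
print(value)                      # new
for r, key, latest in repairs:    # asynchronously heal the stale replica
    r[key] = latest
print(replica_b["x"])             # ('new', 200)
```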
How Read Repair Works:
```
Read Repair Flow (CL=QUORUM, RF=3)
===================================

1. Client reads partition_key=X with CL=QUORUM

2. Coordinator sends digest request to 2 replicas (quorum)
   - Replica A: responds with digest of data @ timestamp T1
   - Replica B: responds with digest of data @ timestamp T2

3. Digests match?
   → Yes: Return data to client (fast path)
   → No:  Request full data from both replicas

4. If digests didn't match:
   - Compare full data from both replicas
   - Determine latest version (by timestamp)
   - Return latest to client

5. Asynchronously:
   - Send repair mutation to replica with stale data
   - That replica updates its local copy

6. Result: Reading data actively heals inconsistencies
```

Anti-Entropy Repair (nodetool repair):
Background repair process that proactively finds inconsistencies: each replica builds Merkle trees over its data, the trees are compared to pinpoint divergent ranges, and only the differing data is streamed between replicas.
Best Practice: Run repair periodically (at least weekly) to catch any write that might have been missed during node outages. This is especially important if you use consistency levels lower than QUORUM.
gc_grace_seconds:
This setting defines how long tombstones (deletion markers) are kept before being garbage collected. To prevent zombie data resurrection, repair must run more frequently than gc_grace_seconds (default: 10 days). Failing to repair in this window can cause deleted data to reappear.
Running nodetool repair regularly is a requirement, not an optimization. Without repair, replicas can diverge permanently, and worse, deleted data can 'resurrect' when a node that missed the delete comes back after gc_grace_seconds. Many production incidents trace back to neglected repair schedules.
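The resurrection scenario can be shown as a toy timeline (a heavy simplification: real tombstone purging happens during compaction, and gc_grace_seconds is measured in seconds, not days):

```python
# Toy timeline of zombie-data resurrection (illustrative only).
# Each replica is a dict: key -> value, with "TOMBSTONE" marking a delete.

GC_GRACE = 10  # gc_grace window, in days here for readability

def compact(replica, tombstone_ages):
    """Purge tombstones older than the gc_grace window."""
    for key, age in tombstone_ages.items():
        if replica.get(key) == "TOMBSTONE" and age > GC_GRACE:
            del replica[key]

replica_a = {"user:9": "TOMBSTONE"}  # saw the delete
replica_b = {"user:9": "stale-row"}  # was down and missed the delete

# If repair ran now, the tombstone would win and remove replica_b's copy.
# Instead, suppose gc_grace passes first and replica_a purges the tombstone:
compact(replica_a, {"user:9": 11})
print("user:9" in replica_a)  # False: no record the row was ever deleted

# Now repair sees data on B and nothing on A, so the deleted row spreads back:
if "user:9" not in replica_a and "user:9" in replica_b:
    replica_a["user:9"] = replica_b["user:9"]
print(replica_a["user:9"])    # stale-row  (zombie data resurrected)
```

Running repair before the tombstone is purged avoids this: the delete marker propagates to replica_b while the cluster still remembers the deletion.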
For operations that require linearizable consistency—where the order of operations matters and concurrent operations must be serialized—Cassandra provides Lightweight Transactions (LWT).
When You Need LWT: enforcing uniqueness (e.g., usernames), compare-and-set updates (optimistic locking), and conditional deletes, where correctness depends on reading and writing as one atomic step.
How LWT Works:
LWT uses the Paxos consensus protocol to serialize operations: a prepare/promise phase to establish a ballot, a read phase to fetch the current value and evaluate the condition, a propose/accept phase to agree on the new value, and a commit phase to apply it.
```sql
-- Insert only if row doesn't exist (unique username)
INSERT INTO users (username, email, created_at)
VALUES ('johndoe', 'john@example.com', toTimestamp(now()))
IF NOT EXISTS;
-- Returns: [applied] = true if successful
-- Returns: [applied] = false + existing row if username exists

-- Update only if current value matches (optimistic locking)
UPDATE inventory SET quantity = 95
WHERE product_id = 'SKU-123'
IF quantity = 100;
-- Returns: [applied] = true if quantity was 100
-- Returns: [applied] = false + current quantity if different

-- Conditional delete
DELETE FROM sessions
WHERE user_id = 'user-456'
IF token = 'abc123';
-- Only deletes if the token matches
```

LWT Performance Implications:
LWT uses Paxos consensus, which requires multiple round-trips between the coordinator and the replicas (prepare, read, propose, commit). This makes an LWT roughly four times as expensive as a regular write, so reserve LWTs for the operations that truly need them.
LWT Consistency Levels: the Paxos phase runs at SERIAL (linearizable across the whole cluster) or LOCAL_SERIAL (linearizable within the local datacenter); the final commit is written at the regular consistency level you specify on the statement.
Many use cases that seem to require LWT can be solved with better data modeling. Time-series data with no updates, write-only audit logs, or designs that embrace eventual consistency often outperform LWT-heavy approaches. Consider if your problem truly requires linearizability.
We've explored Cassandra's tunable consistency in depth. Let's consolidate the key concepts: consistency is chosen per operation, not per cluster; W + R > N yields strong consistency, with QUORUM/QUORUM as the standard pattern; LOCAL_QUORUM keeps multi-DC latency low; higher levels trade latency and availability for stronger guarantees; read repair and regular nodetool repair keep replicas converged (and must outrun gc_grace_seconds); and LWTs provide linearizable single-partition operations when last-write-wins isn't enough.
What's Next:
With consistency levels understood, we now turn to Cassandra's distinctive data model. The next page explores the wide-column model—how Cassandra organizes data in partitions and rows, and how this model enables both the performance and the access patterns that define Cassandra applications.
You now understand how to tune Cassandra's consistency to match your application's requirements—from fire-and-forget logging to strongly consistent financial transactions. Next, we'll explore Cassandra's wide-column data model and how it enables high-performance distributed storage.