Understanding linearizability is valuable, but building systems on a solid foundation requires knowing which production systems actually provide these guarantees—and under what conditions. Not all "strongly consistent" databases are created equal, and the devil is in the details.
Some systems provide linearizability by default; others require specific configurations. Some maintain guarantees during network partitions (by sacrificing availability); others silently degrade. Some have been rigorously tested with tools like Jepsen; others rely on correctness claims that haven't been independently verified.
This page explores the major categories of systems providing strong consistency, their architectural approaches, and practical guidance for choosing among them.
By the end of this page, you will understand the major categories of linearizable systems (coordination services, NewSQL databases, cloud services), how each achieves linearizability, the trade-offs between different approaches, and how to evaluate systems' consistency claims. You'll be equipped to choose the right foundation for your distributed applications.
Coordination services are specialized systems designed specifically to provide linearizable operations for distributed coordination tasks: leader election, distributed locking, service discovery, and configuration management.
Architecture: a replicated ensemble of servers in which a leader sequences every write through the ZAB atomic broadcast protocol; a quorum of followers must acknowledge each update, while reads are served locally by whichever server the client is connected to.
Consistency Model:
```java
// Linearizable read in ZooKeeper: sync() makes the connected server
// catch up with the leader before the read executes.
zk.sync("/important/path", (rc, path, ctx) -> {
    try {
        byte[] data = zk.getData("/important/path", false, null);
        // This read is linearizable
    } catch (KeeperException | InterruptedException e) {
        // handle failure / retry
    }
}, null);

// Non-linearizable (but faster) read, served from local server state
byte[] staleData = zk.getData("/important/path", false, null);
// May return stale data!
```
Performance Characteristics: every write is quorum-committed through the leader, so write throughput is bounded (on the order of tens of thousands of operations per second for a small ensemble), while locally served reads scale with the number of followers at the cost of possible staleness. The major coordination services compare as follows:
| System | Protocol | Linearizable Reads | Best For | Jepsen Status |
|---|---|---|---|---|
| ZooKeeper | ZAB | With sync() | Java ecosystem, Kafka, Hadoop | Tested, issues found/fixed |
| etcd | Raft | Yes (default) | Kubernetes, cloud-native | Tested, generally solid |
| Consul | Raft | Yes (consistent mode) | Service mesh, HashiCorp stack | Tested, issues found/fixed |
| Chubby | Paxos | Yes | Google internal (not public) | Google internal testing |
Architecture: a Raft-replicated key-value store; all writes go through the Raft leader and commit once a quorum of members has persisted them. Every member holds a full copy of the data, exposed through an MVCC store with watch support.
Consistency Model:
```go
// etcd provides linearizable reads by default
resp, err := client.Get(ctx, "my-key")
// This read is linearizable

// Optional: serializable read (faster, may be stale)
resp, err = client.Get(ctx, "my-key", clientv3.WithSerializable())
// May return stale data
```
Why etcd Is Popular: linearizable reads by default, a simple gRPC API, built-in watch and lease primitives, and its role as the backing store for Kubernetes.
ZooKeeper, etcd, and Consul are designed for small amounts of critical data: configuration, leader information, and service registration. They are not built for high-volume application data; plan for a few thousand keys at most. For application data requiring linearizability, use a linearizable database.
NewSQL databases combine the scalability of NoSQL with the strong consistency and SQL interface of traditional relational databases. They use consensus protocols to provide linearizability (or strict serializability) while sharding data across many nodes.
Architecture: data is split into shards, each replicated across zones by its own Paxos group; multi-shard transactions coordinate via two-phase commit on top of Paxos, and commit timestamps come from TrueTime, Google's GPS- and atomic-clock-backed time service.
Consistency Model: external consistency (strict serializability). If transaction T1 commits before T2 starts, T2 is guaranteed to observe T1's effects, regardless of which regions the transactions touch.
The TrueTime Magic:
```text
TrueTime.now() returns an interval: [earliest, latest]
Uncertainty: typically 5-10 ms

Commit wait:
1. Assign commit timestamp T = TrueTime.now().latest
2. Wait until TrueTime.now().earliest > T
3. Transaction is now durable and globally ordered

Result: all transactions have a globally consistent ordering,
without requiring global consensus for reads
```
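To make the commit-wait rule concrete, here is a toy sketch in Go. The `Interval` type and `now` function are invented stand-ins for a TrueTime-like clock API; the point is only that waiting out the uncertainty guarantees the chosen timestamp lies in the past for every observer before the commit is acknowledged.

```go
package main

import (
	"fmt"
	"time"
)

// Interval is a hypothetical TrueTime-style reading: real time is
// guaranteed to lie somewhere in [Earliest, Latest].
type Interval struct {
	Earliest, Latest time.Time
}

// now simulates a clock with a fixed uncertainty bound epsilon.
func now(epsilon time.Duration) Interval {
	t := time.Now()
	return Interval{Earliest: t.Add(-epsilon), Latest: t.Add(epsilon)}
}

// commitWait assigns a commit timestamp, then blocks until that
// timestamp is definitely in the past for all observers.
func commitWait(epsilon time.Duration) time.Time {
	commitTS := now(epsilon).Latest // step 1: pessimistic timestamp
	for !now(epsilon).Earliest.After(commitTS) {
		time.Sleep(time.Millisecond) // step 2: wait out the uncertainty
	}
	return commitTS // step 3: safe to acknowledge the commit
}

func main() {
	start := time.Now()
	ts := commitWait(5 * time.Millisecond)
	fmt.Printf("committed at %v after waiting %v\n", ts, time.Since(start))
}
```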
Performance: commit wait adds roughly the clock uncertainty (the 5-10 ms above) to every write, but in exchange snapshot reads at a chosen timestamp can be served by any sufficiently up-to-date replica without cross-region coordination.
Architecture: data is divided into ranges, each replicated with Raft; nodes are symmetric (any node can coordinate a query), the interface is the PostgreSQL wire protocol, and timestamps come from hybrid logical clocks (HLC) rather than specialized hardware.
Consistency Model: serializable isolation by default for all transactions, with linearizability for operations on a single range; unlike Spanner, CockroachDB does not promise strict serializability across independent transactions.
HLC vs TrueTime Trade-off:
TrueTime (Spanner):
- Requires specialized hardware (GPS, atomic clocks)
- Bounded uncertainty enables external consistency
- Commit wait adds latency but avoids remote coordination for reads
HLC (CockroachDB):
- Works on commodity hardware
- Cannot guarantee external consistency without coordination
- May require remote reads to ensure consistency
- Slightly higher read latency in some scenarios
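For intuition about the commodity-hardware side, here is a minimal hybrid logical clock sketch in Go, following the standard HLC update rules (take the max of physical time, the local timestamp, and any received timestamp; bump a logical counter to break ties). This is an illustrative toy, not CockroachDB's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// HLC is a minimal hybrid logical clock: a physical component (wall
// clock, ms) plus a logical counter to order events within one tick.
type HLC struct {
	mu      sync.Mutex
	wall    int64 // highest physical timestamp seen (ms)
	logical int32 // tie-breaking counter
}

func physicalNow() int64 { return time.Now().UnixMilli() }

// Now advances the clock for a local or send event.
func (c *HLC) Now() (int64, int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pt := physicalNow(); pt > c.wall {
		c.wall, c.logical = pt, 0
	} else {
		c.logical++ // same tick: order by counter
	}
	return c.wall, c.logical
}

// Update merges a timestamp received from another node, so the local
// clock never moves backward relative to messages it has seen.
func (c *HLC) Update(wall int64, logical int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	pt := physicalNow()
	switch {
	case pt > c.wall && pt > wall:
		c.wall, c.logical = pt, 0
	case wall > c.wall:
		c.wall, c.logical = wall, logical+1
	case c.wall > wall:
		c.logical++
	default: // equal wall components: merge counters
		if logical > c.logical {
			c.logical = logical
		}
		c.logical++
	}
}

func main() {
	var c HLC
	w, l := c.Now()
	c.Update(w+10, 0) // message from a node whose clock runs ahead
	w2, l2 := c.Now()
	fmt.Println(w, l, w2, l2) // timestamps only ever increase
}
```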
| Database | Consistency | Clock Mechanism | SQL Compatibility | Deployment |
|---|---|---|---|---|
| Spanner | Strict serializable | TrueTime | ANSI SQL | GCP only |
| CockroachDB | Serializable | HLC | PostgreSQL | Self-hosted or cloud |
| TiDB | Snapshot isolation | TSO (timestamp oracle) | MySQL | Self-hosted or cloud |
| YugabyteDB | Serializable | Hybrid clocks | PostgreSQL | Self-hosted or cloud |
| FoundationDB | Strict serializable | Centralized sequencer | KV only (layers) | Self-hosted |
Many NewSQL databases support multiple isolation levels. The default isn't always the strongest! Always verify your configuration provides the consistency level you need. For example, CockroachDB defaults to serializable, but PostgreSQL defaults to read committed.
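One practical way to check, sketched in Go against PostgreSQL (the connection string and database name are placeholders): query the session's actual isolation level rather than trusting the documentation.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // PostgreSQL driver
)

func main() {
	// Placeholder DSN; adjust for your deployment.
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Ask what isolation level this session actually runs at.
	var level string
	if err := db.QueryRow("SHOW transaction_isolation").Scan(&level); err != nil {
		log.Fatal(err)
	}
	fmt.Println("default isolation:", level) // e.g. "read committed"

	// Raise the database default if you need serializable everywhere.
	if _, err := db.Exec(
		"ALTER DATABASE app SET default_transaction_isolation = 'serializable'",
	); err != nil {
		log.Fatal(err)
	}
}
```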
FoundationDB deserves special attention as it represents a unique approach: provide a simple, rigorously correct linearizable key-value store, then build higher-level abstractions as "layers" on top.
Core components: a sequencer that assigns commit versions, proxies that accept client writes, resolvers that detect conflicts, log servers that make transactions durable, and storage servers that serve reads.
The Sequencer Model:
Unlike Raft/Paxos, where each operation goes through consensus, FoundationDB routes every commit through a centralized sequencer (sketched in code after the steps below):
1. Client starts transaction, reads from storage servers
2. Client sends writes to proxies
3. Proxy requests commit timestamp from sequencer
4. Sequencer assigns monotonically increasing timestamp
5. Resolvers check for conflicts against committed transactions
6. If no conflicts, log servers persist the transaction
7. Transaction is committed with guaranteed ordering
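A toy sketch of steps 3-7 in Go: a single sequencer hands out monotonically increasing versions, and a resolver aborts any transaction whose reads were overwritten after its read version. The `Sequencer` and `Resolver` names are illustrative, not FoundationDB's API.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Sequencer hands out strictly increasing commit versions; because a
// single process assigns them, ordering needs no per-commit consensus.
type Sequencer struct{ version atomic.Int64 }

func (s *Sequencer) Next() int64 { return s.version.Add(1) }

// Resolver remembers, per key, the last version that wrote it, and
// rejects transactions whose reads have been overwritten since.
type Resolver struct {
	mu        sync.Mutex
	lastWrite map[string]int64
}

type Txn struct {
	ReadVersion int64
	Reads       []string
	Writes      map[string]string
}

func (r *Resolver) TryCommit(t Txn, commitVersion int64) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, k := range t.Reads {
		if r.lastWrite[k] > t.ReadVersion {
			return false // conflict: key changed after we read it
		}
	}
	for k := range t.Writes {
		r.lastWrite[k] = commitVersion
	}
	return true // no conflicts: log servers would persist it here
}

func main() {
	seq := &Sequencer{}
	res := &Resolver{lastWrite: map[string]int64{}}

	t1 := Txn{ReadVersion: seq.Next(), Reads: []string{"x"},
		Writes: map[string]string{"x": "1"}}
	fmt.Println(res.TryCommit(t1, seq.Next())) // true

	// t2 read "x" before t1 committed, so it must abort and retry.
	t2 := Txn{ReadVersion: 1, Reads: []string{"x"},
		Writes: map[string]string{"y": "2"}}
	fmt.Println(res.TryCommit(t2, seq.Next())) // false
}
```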
Advantages: ordering requires one round trip to a single process instead of a consensus round per operation, which yields very high commit throughput and a simple, centralized conflict-detection pipeline.
Handling sequencer failure: the sequencer orders transactions but holds no durable state, so when it fails the cluster runs a recovery that recruits a new transaction subsystem (including a new sequencer); in-flight transactions abort and clients transparently retry.
FoundationDB provides strict serializability for key-value operations. Higher-level abstractions are built as layers:
```text
┌─────────────────────────────────────────────────┐
│                Application Layer                │
├─────────────────────────────────────────────────┤
│ Document Layer │ SQL Layer │ Graph Layer │ ...  │
├─────────────────────────────────────────────────┤
│                  Record Layer                   │
│  (structured records, indexes, transactions)    │
├─────────────────────────────────────────────────┤
│                FoundationDB Core                │
│  (ordered key-value, strict serializability)    │
└─────────────────────────────────────────────────┘
```
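To see what a layer means in practice, here is a toy sketch in Go: a record and its secondary-index entry are encoded as keys under different prefixes of one ordered key-value space, so a single transactional write keeps them consistent. The in-memory `KV` type stands in for FoundationDB's core; all names are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KV is a toy ordered key-value store standing in for the FDB core.
type KV struct{ m map[string]string }

func NewKV() *KV { return &KV{m: map[string]string{}} }

func (kv *KV) Set(k, v string) { kv.m[k] = v }

// Scan returns keys with the given prefix in order, like a range read.
func (kv *KV) Scan(prefix string) []string {
	var keys []string
	for k := range kv.m {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// The "layer": store a user record and a secondary-index entry under
// different key prefixes; in real FoundationDB both writes would sit
// inside one strictly serializable transaction.
func PutUser(kv *KV, id, email string) {
	kv.Set("user/"+id, email)         // primary record
	kv.Set("email/"+email+"/"+id, "") // index entry
}

func main() {
	kv := NewKV()
	PutUser(kv, "42", "alice@example.com")
	// Look up by email via a prefix scan over the index space.
	fmt.Println(kv.Scan("email/alice@example.com/"))
}
```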
Notable users: Apple (CloudKit) and Snowflake (its metadata store) both run FoundationDB at very large scale.
FoundationDB pioneered rigorous simulation testing:
Deterministic simulation:
- Run the entire cluster in a single-threaded simulator
- Control all randomness (network, disk, clocks)
- Inject failures systematically
- Replay bugs deterministically

The result: millions of simulated hours of operation, bugs found before they reach production, and high confidence in correctness claims.
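The essential trick is seeding every source of nondeterminism so that a failing run can be replayed exactly. A minimal illustration in Go (the fault-injection logic is invented for the example):

```go
package main

import (
	"fmt"
	"math/rand"
)

// step simulates one unit of work; with probability p the injected
// fault fires (a dropped message, a disk error, a clock jump...).
func step(rng *rand.Rand, p float64) bool { return rng.Float64() < p }

// simulate runs a whole "cluster" single-threaded from one seed, so
// any run, including a failing one, can be replayed bit-for-bit.
func simulate(seed int64, steps int) (faults int) {
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < steps; i++ {
		if step(rng, 0.01) {
			faults++
		}
	}
	return faults
}

func main() {
	// The same seed always yields the same fault schedule:
	fmt.Println(simulate(42, 10000)) // deterministic
	fmt.Println(simulate(42, 10000)) // identical result
}
```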
FoundationDB's philosophy is that a small, correct core is better than a feature-rich, complex one. By providing only linearizable key-value operations, FoundationDB can be exhaustively tested. Higher-level features (SQL, documents) are built as layers that inherit the core's correctness guarantees.
Major cloud providers offer managed services with strong consistency guarantees, often building on the technologies discussed above.
DynamoDB:
```python
# DynamoDB with strong consistency
response = dynamodb.get_item(
    TableName='users',
    Key={'user_id': {'S': 'alice'}},
    ConsistentRead=True  # Linearizable read
)

# DynamoDB transaction (serializable)
response = dynamodb.transact_write_items(
    TransactItems=[
        {'Put': {...}},
        {'Update': {...}},
        {'Delete': {...}},
    ]
)  # All-or-nothing, serializable
```
Aurora: MySQL- and PostgreSQL-compatible engines on a shared, distributed storage layer. Reads through the writer endpoint are strongly consistent; read replicas replicate asynchronously and may return stale data.
Amazon QLDB: a ledger database with an immutable, cryptographically verifiable journal; transactions use optimistic concurrency control with serializable isolation.
| Service | Provider | Default Consistency | Strong Consistency Option |
|---|---|---|---|
| Spanner | GCP | Strict serializable | Default (can't be weakened) |
| Cloud SQL | GCP | Serializable | Default for single primary |
| DynamoDB | AWS | Eventual | ConsistentRead=true |
| Aurora | AWS | Read committed | Writer endpoint, no read replicas |
| Cosmos DB | Azure | Config-dependent | Strong consistency level |
| Firestore | GCP | Linearizable | Default |
Cosmos DB offers five consistency levels, from strongest to weakest:
1. Strong (Linearizable)
- Reads guaranteed to return most recent committed write
- Single region only for writes
- Highest latency
2. Bounded Staleness
- Reads lag behind writes by at most K versions or T time
- Configurable K and T
3. Session
- Monotonic reads/writes within a session
- Different sessions may see different orders
4. Consistent Prefix
- Reads never see out-of-order writes
- May be arbitrarily stale
5. Eventual
- No ordering guarantees
- Highest throughput
Important: Strong consistency in Cosmos DB restricts writes to a single region, eliminating the multi-region write capability. This is a direct manifestation of the CAP theorem.
Firestore provides linearizable consistency by default:
```javascript
// All Firestore operations are linearizable
await db.collection('users').doc('alice').set({name: 'Alice'});

// This read will see the write above
const doc = await db.collection('users').doc('alice').get();
```
Firestore uses a similar architecture to Spanner, providing external consistency for all operations.
Cloud providers use terms like 'strong consistency' inconsistently. Some mean linearizable, others mean read-your-writes. Always verify: (1) What operations are covered? (2) What happens during partitions? (3) Is it default or opt-in? (4) Are there regional restrictions?
Not all consistency claims are equally trustworthy. Here's how to evaluate whether a system actually provides the guarantees it claims.
Watch out for: "strong consistency" that on inspection means only read-your-writes or session guarantees; benchmark numbers measured under weaker settings than the documented guarantees; and guarantees that quietly apply only within a single region, session, or partition.
For any system claiming linearizability:
1. Which operations are linearizable?
- All operations? Writes only? Specific APIs?
2. What happens during network partitions?
- Operations fail? Silent degradation? Best-effort?
3. What's the default configuration?
- Is linearizability default or opt-in?
4. Has it been independently tested?
- Jepsen report? Academic analysis? Internal testing?
5. What's the latency and availability trade-off?
- How much latency does strong consistency add?
- What's the availability target?
Jepsen (jepsen.io) has become the industry standard for testing distributed systems' consistency claims:
What Jepsen does: drives concurrent operations against a real cluster while injecting network partitions, process crashes, and clock skew, then analyzes the recorded operation history against formal consistency models to find violations.
Selected Jepsen findings:
| System | Claim | Finding | Impact |
|---|---|---|---|
| MongoDB | Linearizable with majority read concern | Violated under network partitions | Fixed in later versions |
| Redis (Redlock) | Distributed lock | Unsafe under timing assumptions | Known limitation |
| etcd | Linearizable | Generally correct, minor issues found | Issues fixed |
| CockroachDB | Serializable | Generally correct, edge cases found | Issues fixed |
| PostgreSQL | Serializable | Correct | Validated |
Key insight: Many systems that claim linearizability have had violations discovered by Jepsen. Check for a Jepsen report and whether issues were fixed.
Even with verified systems, always understand the specific configuration you're using. Run your own consistency tests if the operation is critical. A system may be linearizable in general but misconfigured in your specific deployment.
With the landscape understood, here's how to choose the right linearizable system for your needs.
| Use Case | Recommended System(s) | Rationale |
|---|---|---|
| Kubernetes cluster state | etcd | Native integration, well-tested, purpose-built |
| Leader election (general) | etcd, ZooKeeper, Consul | Mature, well-understood, low overhead |
| Distributed locks | etcd, Redis (single-node) | Coordination services handle this natively |
| Service discovery | Consul, etcd | Built-in features, health checking |
| OLTP with SQL | CockroachDB, Spanner, TiDB | SQL interface, scalable, transactions |
| Financial transactions | Spanner, FoundationDB | Strictest guarantees, proven at scale |
| Global distribution | Spanner, CockroachDB | Multi-region consensus built-in |
| Apple-scale workloads | FoundationDB | Proven, simulation-tested |
| Simple key-value | FoundationDB | Minimal overhead, strict guarantees |
| Serverless/managed | Firestore, Spanner, Cosmos DB | No operational overhead |
1. Self-Hosted vs Managed
Self-hosted (etcd, CockroachDB, FoundationDB):
+ Full control, no vendor lock-in
+ Potentially lower cost at scale
- Operational complexity
- Must handle upgrades, monitoring, recovery
Managed (Spanner, Cosmos DB, Firestore):
+ Zero operational overhead
+ Provider handles consistency verification
- Vendor lock-in
- Higher cost, especially at scale
2. Generality vs Specialization
General-purpose (CockroachDB, Spanner):
+ SQL interface, broad use cases
+ Feature-rich (transactions, indexes, schemas)
- Higher overhead for simple use cases
Specialized (etcd, ZooKeeper):
+ Low overhead for coordination tasks
+ Purpose-built APIs
- Not suitable for application data
- Limited data model
3. Geographic Distribution
Single-region:
- Most systems work well
- Lower latency
- Simpler operations
Multi-region:
- Spanner: Best-in-class (TrueTime)
- CockroachDB: Good (HLC)
- Most others: Significant latency penalty
For many applications, a single PostgreSQL instance with synchronous replication provides linearizable guarantees with minimal complexity. Only move to distributed NewSQL when you've outgrown PostgreSQL's capacity—typically 10,000+ writes/second or terabytes of data.
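If you take this route, the relevant knobs are PostgreSQL's synchronous replication settings; a minimal postgresql.conf sketch (standby names are placeholders), with linearizable reads served from the primary:

```
# postgresql.conf on the primary
synchronous_standby_names = 'ANY 1 (standby1, standby2)'  # quorum commit
synchronous_commit = on   # COMMIT waits until a standby has flushed the WAL
```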
We've surveyed the landscape of systems providing strong consistency, from specialized coordination services to globally distributed databases. Each approach has trade-offs, and the right choice depends on your specific requirements.
Across five pages, you've developed a deep understanding of strong consistency in distributed systems: what linearizability means, why it costs latency and availability, when applications truly need it, and which production systems deliver it.
You're now equipped to make informed decisions about consistency requirements in your distributed systems—knowing when to pay the linearizability tax and when weaker consistency suffices.
Congratulations! You've mastered strong consistency in distributed systems. You understand linearizability deeply, know its costs and when it's required, and can evaluate real-world systems' consistency claims. This knowledge is essential for building correct, reliable distributed applications.