Imagine you're transferring $10,000 between two bank accounts. You initiate the transfer on your phone while sitting at a café. A millisecond later, your spouse checks the account balance on their laptop at home. What should they see?
If the system shows the old balance—money in both accounts simultaneously—you've witnessed a consistency violation. If it shows money in neither account, that's equally problematic. The only acceptable answer is that your spouse sees either the complete pre-transfer state or the complete post-transfer state, never an intermediate or phantom state.
This scenario illustrates why strong consistency exists: some operations are so critical that any deviation from a single, globally-agreed-upon view of data is unacceptable. Strong consistency provides the gold standard guarantee—every read returns the most recent write, as if there were only one copy of the data and all operations happened sequentially.
By the end of this page, you will understand what strong consistency truly means, how linearizability provides the theoretical foundation, why real-time ordering matters, and the mechanisms—like consensus protocols—that make strong consistency achievable. You'll also understand the inherent costs and trade-offs that make strong consistency a carefully considered choice, not a default.
Strong consistency is a guarantee that once a write completes, all subsequent reads will return that written value. There is no window where different clients can observe different data—the system behaves as if there were a single, authoritative copy of every piece of data, even when that data is physically replicated across multiple nodes, data centers, or continents.
Formally, strong consistency provides the following guarantees:
1. Total Order of Operations: All operations (reads and writes) can be placed into a single, global, sequential order that respects real-time. If operation A completes before operation B starts, then A appears before B in this global order.
2. Reads Reflect Latest Writes: Every read operation returns the value from the most recent write operation in this global order. There are no stale reads.
3. Agreement Across All Observers: All clients, regardless of which replica they contact, observe the same sequence of states. There is no disagreement about what the current state is.
| Guarantee | Meaning | Violation Example |
|---|---|---|
| Recency | Reads always return the latest committed write | Client reads version 5 while version 6 is already committed |
| Total Ordering | All operations form a single global timeline | Operation A completes before B in real-time, but B's effects are observed first |
| Durability | Acknowledged writes survive failures | Write acknowledged but lost during subsequent node crash |
| Atomicity | Operations complete entirely or not at all | Partial update visible—some fields updated, others not |
Strong consistency creates an abstraction where a distributed system behaves like a single machine. From the client's perspective, it doesn't matter that data is replicated—every interaction feels like talking to one authoritative server. This illusion is extraordinarily valuable for application developers, but extraordinarily expensive to maintain.
Linearizability is the formal term for what we informally call strong consistency. Introduced by Maurice Herlihy and Jeannette Wing in 1990, linearizability provides a rigorous definition for correctness in concurrent systems.
A system is linearizable if every operation appears to take effect instantaneously at some point between its invocation and its response. This point is called the linearization point.
Consider the timeline of an operation:
```
├── Operation Invocation ──────────────●────────────────── Response ──┤
                                       ↑
                               Linearization Point
                            (effect takes place here)
```
Key properties of linearizability:
1. Real-Time Ordering: If operation A's response precedes operation B's invocation in real (wall-clock) time, then A must be ordered before B in the linearization. This real-time requirement is what distinguishes linearizability from serializability, which permits orderings that ignore wall-clock time.
2. Sequential Specification: The effects of all operations must be explainable by some sequential execution of those operations, where each operation appears atomic.
3. External Consistency: Observations by any external entity are consistent with a single global order. This includes clients, monitoring systems, or any observer outside the system.
```python
# Example: Testing for Linearizability
# Client A and Client B interact with a register
# Initial value: x = 0

# Timeline:
# ─────────────────────────────────────────────────────────────────
# Time        0    1    2    3    4    5    6    7    8
# ─────────────────────────────────────────────────────────────────
# Client A:   [──── write(x=1) ────]
#                                  ↑ response at time 3
#
# Client B:             [──── read(x) ────]
#                       ↑ invocation at time 2
#                                         ↑ response at time 5
# ─────────────────────────────────────────────────────────────────

# QUESTION: What can Client B's read return?

# Analysis:
# - Client A's write spans time [0, 3]
# - Client B's read spans time [2, 5]
# - These operations OVERLAP (time 2-3)

# In a linearizable system:
# - If write's linearization point is before read's: read returns 1
# - If read's linearization point is before write's: read returns 0
# - BOTH are valid! (because operations overlap)

# What WOULD violate linearizability:
# - If write completes (response at time 3)
# - Then read STARTS (invocation at time 4)
# - Read returns 0
# This violates linearizability because there's no possible
# linearization order where read comes before write.


class LinearizabilityChecker:
    """
    Simplified linearizability checker for educational purposes.
    Real checkers (like Jepsen's Knossos) are far more complex.
    """

    def __init__(self):
        self.history = []  # List of (op_type, op_id, value, start, end)

    def add_operation(self, op_type: str, op_id: str, value,
                      start_time: float, end_time: float):
        """Record an operation with its time bounds."""
        self.history.append({
            'type': op_type,
            'id': op_id,
            'value': value,
            'start': start_time,
            'end': end_time
        })

    def check_linearizable(self) -> bool:
        """
        Attempt to find a linearization of the history.
        Returns True if a valid linearization exists.
        This is NP-complete in general, so we use heuristics
        for real-world checking.
        """
        # Sort by end time (partial order)
        sorted_ops = sorted(self.history, key=lambda x: x['end'])

        # Try to find valid linearization points
        # (Simplified: actual algorithm is much more complex)
        for op in sorted_ops:
            # Each operation must have a linearization point
            # within its [start, end] interval
            if not self._can_linearize(op):
                return False
        return True

    def _can_linearize(self, op) -> bool:
        """Check if operation can be assigned a valid linearization point."""
        # Implementation would check all ordering constraints
        # This is where the NP-complete complexity lives
        return True  # Placeholder
```

Linearizability vs Serializability:
These terms are often confused, but they differ fundamentally:
| Property | Linearizability | Serializability |
|---|---|---|
| Scope | Individual operations | Transactions (groups of operations) |
| Time Sensitivity | Respects real-time | Does not require real-time ordering |
| Use Case | Single-object operations | Multi-object transactions |
| Guarantee | Recent value | Isolation of concurrent transactions |
Strict Serializability combines both: transactions must be serializable AND respect real-time ordering. This is what systems like Google Spanner provide.
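To see the difference concretely, the sketch below (illustrative values only, no particular database assumed) shows a history that is serializable but not strictly serializable: the serial order T2 before T1 explains every result, yet T1 had already committed before T2 even started, so that order contradicts real time.

```python
# A history that is serializable but NOT strictly serializable.
# x starts at 0; T1 writes x=1 and commits at t=3; T2 starts at t=5
# and reads x -> 0.
#
# Serializable?          Yes: the serial order T2, T1 explains the results
#                        (T2 reads the initial 0, then T1 writes 1).
# Strictly serializable? No: T1 committed before T2 began, so a
#                        real-time-respecting order must put T1 first,
#                        and then T2's read should have returned 1.

def respects_real_time(serial_order, start_times, commit_times):
    """A serial order respects real time if it never places transaction B
    after transaction A when B committed before A even started."""
    for i, a in enumerate(serial_order):
        for b in serial_order[i + 1:]:
            if commit_times[b] < start_times[a]:
                return False  # b finished before a began, yet is ordered after a
    return True

starts = {"T1": 0, "T2": 5}
commits = {"T1": 3, "T2": 7}
print(respects_real_time(["T2", "T1"], starts, commits))  # False: violates real time
print(respects_real_time(["T1", "T2"], starts, commits))  # True:  the only strict order
```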
Strong consistency doesn't happen automatically in distributed systems—it must be explicitly engineered. Several mechanisms can provide linearizability, each with different trade-offs:
Approach 1: Single Leader (Primary-Based)
The simplest approach: route all writes through a single leader node. The leader serializes all operations, assigning each a sequence number. Reads can either go to the leader (guaranteeing freshness) or to followers after confirming they're caught up.
Pros: Simple to implement, low coordination overhead for writes.
Cons: Leader is a single point of failure; leader changes are complex.
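A minimal sketch of the single-leader approach, assuming an in-memory register and synchronous replication to all followers (the class and method names are illustrative, not any particular database's API):

```python
class Follower:
    """Toy follower that applies entries in the order the leader assigns."""
    def __init__(self):
        self.data = {}

    def apply(self, seq: int, key: str, value):
        self.data[key] = (seq, value)


class SingleLeaderRegister:
    """Minimal single-leader replication sketch (not production code)."""
    def __init__(self, followers):
        self.followers = followers
        self.sequence = 0   # leader assigns a total order to writes
        self.data = {}

    def write(self, key: str, value) -> int:
        # The leader serializes all writes by assigning sequence numbers.
        self.sequence += 1
        self.data[key] = (self.sequence, value)
        # Replicate synchronously before acknowledging, so an acknowledged
        # write survives if a caught-up follower is later promoted.
        for follower in self.followers:
            follower.apply(self.sequence, key, value)
        return self.sequence

    def read(self, key: str):
        # Reads served by the leader always reflect the latest acknowledged write.
        entry = self.data.get(key)
        return entry[1] if entry else None


leader = SingleLeaderRegister([Follower(), Follower()])
leader.write("balance", 100)
print(leader.read("balance"))  # 100
```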
Approach 2: Consensus Protocols (Raft, Paxos, Viewstamped Replication)
Consensus protocols allow a cluster to agree on a sequence of operations despite node failures. A majority (quorum) of nodes must agree before an operation is committed.
Pros: Tolerates minority failures, no single point of failure after leader election.
Cons: Higher latency (majority must respond), leader election pauses availability.
```python
# Consensus-Based Strong Consistency
# Simplified view of how Raft provides linearizability

class RaftNode:
    """
    Simplified Raft node demonstrating strong consistency.
    Real implementations are significantly more complex.
    """

    def __init__(self, node_id: str, cluster_nodes: list):
        self.node_id = node_id
        self.cluster = cluster_nodes
        self.state = "follower"   # follower, candidate, leader
        self.current_term = 0
        self.log = []             # List of (term, index, command)
        self.commit_index = 0
        self.state_machine = {}   # Actual data storage

    def write(self, key: str, value: any) -> bool:
        """
        Strong consistency write operation.
        Only leader can accept writes.
        """
        if self.state != "leader":
            # Redirect to leader
            return self._forward_to_leader("write", key, value)

        # Step 1: Append to local log (not committed yet)
        entry = {
            'term': self.current_term,
            'index': len(self.log),
            'command': {'type': 'write', 'key': key, 'value': value}
        }
        self.log.append(entry)

        # Step 2: Replicate to followers
        # Must get acknowledgment from MAJORITY before committing
        acks = 1  # Count self
        quorum = len(self.cluster) // 2 + 1

        for node in self.cluster:
            if node != self.node_id:
                # Send AppendEntries RPC
                if self._send_append_entries(node, entry):
                    acks += 1

        # Step 3: If majority acknowledged, commit
        if acks >= quorum:
            self.commit_index = entry['index']
            # Apply to state machine
            self.state_machine[key] = value
            # Notify followers to commit
            self._broadcast_commit(entry['index'])
            return True

        # Failed to reach quorum - DO NOT acknowledge to client
        return False

    def read(self, key: str) -> any:
        """
        Linearizable read operation.
        Multiple strategies possible.
        """
        if self.state == "leader":
            # Option A: Read from leader
            # Must confirm we're still leader (quorum check)
            if not self._confirm_leadership():
                return self._forward_to_leader("read", key)
            # Safe to read - we're confirmed leader
            return self.state_machine.get(key)
        else:
            # Option B: Forward to leader
            return self._forward_to_leader("read", key)
            # Option C: Read from follower with ReadIndex protocol
            # (requires asking leader for current commit index)

    def _confirm_leadership(self) -> bool:
        """
        Heartbeat to quorum to confirm we're still leader.
        Prevents stale reads after network partition.
        """
        responses = 1  # Count self
        quorum = len(self.cluster) // 2 + 1
        for node in self.cluster:
            if node != self.node_id:
                # Send heartbeat
                if self._send_heartbeat(node):
                    responses += 1
        return responses >= quorum

    def _send_append_entries(self, node: str, entry: dict) -> bool:
        """Send log entry to follower node."""
        # Network call - may fail, timeout, etc.
        # Real implementation handles retries, term checking, etc.
        pass

    def _forward_to_leader(self, operation: str, *args) -> any:
        """Forward operation to current leader."""
        # Real implementation discovers leader via cluster state
        pass
```

Approach 3: Quorum Systems with Read Repair
In systems like DynamoDB or Cassandra with strongly consistent operations enabled, reads and writes go to multiple nodes simultaneously. With N replicas, writing to W nodes and reading from R nodes where W + R > N guarantees that every read quorum overlaps the most recent successful write quorum; combined with read repair (bringing stale replicas up to date when a read detects divergence), this yields strongly consistent reads.
Example: With 3 replicas (N=3), write to 2 nodes (W=2), read from 2 nodes (R=2). Since 2 + 2 = 4 > 3, at least one node in every read will have the latest write.
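The overlap argument can be checked mechanically. This small sketch enumerates every possible write quorum and read quorum and confirms that with W + R > N each pair shares at least one replica:

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Check that every possible write quorum intersects every read quorum."""
    replicas = range(n)
    return all(set(wq) & set(rq)
               for wq in combinations(replicas, w)
               for rq in combinations(replicas, r))

print(quorums_always_overlap(n=3, w=2, r=2))  # True:  2 + 2 > 3, overlap guaranteed
print(quorums_always_overlap(n=3, w=1, r=1))  # False: 1 + 1 <= 3, a read can miss the write
```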
Approach 4: Synchronized Clocks (TrueTime)
Google Spanner uses synchronized atomic clocks to bound clock uncertainty. Transactions can be ordered globally using timestamps without per-transaction coordination, as long as operations wait for clock uncertainty to pass. This enables external consistency at global scale.
Strong consistency is not free. The CAP theorem and the PACELC framework formalize the fundamental trade-offs:
CAP Theorem: In the presence of a network Partition (P), a distributed system must choose between Consistency (C), where every read sees the most recent write, and Availability (A), where every request receives a response.
Strong consistency systems choose C over A. During network partitions, strongly consistent systems become unavailable rather than risk returning stale data.
PACELC Extension: Even when there is no partition (Else), there is a trade-off between Latency (L) and Consistency (C).
Strong consistency requires coordination—waiting for quorums, confirming leadership, propagating updates. This coordination adds latency.
| Scenario | Weak Consistency Latency | Strong Consistency Latency | Overhead |
|---|---|---|---|
| Single datacenter read | ~1ms (local) | ~5-10ms (leader roundtrip) | 5-10x |
| Cross-datacenter read | ~1ms (local replica) | ~50-100ms (remote leader) | 50-100x |
| Write operation | ~5ms (async replication) | ~20-50ms (synchronous quorum) | 4-10x |
| Global transaction | N/A | ~150-300ms (global consensus) | N/A |
When a strongly consistent system loses contact with a majority of nodes, it MUST stop accepting writes. This is not a bug—it's the correctness guarantee in action. Accepting writes without consensus would risk split-brain scenarios where different partitions accept conflicting updates. For mission-critical systems, this unavailability is preferable to inconsistency.
Throughput Implications:
Strong consistency also limits throughput:
Serialization bottleneck: All operations must be ordered against each other. Even if you have 100 database nodes, all writes effectively pass through a single logical serialization point.
Cross-shard coordination: In sharded systems, transactions spanning multiple shards require distributed coordination (2PC, 3PC, or Saga patterns), multiplying latency (a minimal two-phase commit sketch appears below).
Replication overhead: Every write must be replicated to a quorum before acknowledgment, consuming network bandwidth and disk I/O on multiple nodes.
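As a rough illustration of why cross-shard transactions multiply latency, here is a bare-bones two-phase commit coordinator; the participant interface with prepare/commit/abort is an assumption for the sketch, not taken from any specific system. Every transaction pays one round trip to all involved shards to prepare and a second to commit or abort, on top of each shard's own replication work.

```python
class TwoPhaseCommitCoordinator:
    """Bare-bones 2PC sketch: two network round trips per cross-shard transaction."""

    def __init__(self, participants: dict):
        # Maps shard name -> participant object exposing prepare()/commit()/abort()
        self.participants = participants

    def execute(self, transaction_id: str, writes_by_shard: dict) -> bool:
        # Phase 1: every involved shard must promise it can commit (round trip #1).
        prepared = []
        for shard, writes in writes_by_shard.items():
            participant = self.participants[shard]
            if participant.prepare(transaction_id, writes):
                prepared.append(participant)
            else:
                # Any refusal aborts the whole transaction everywhere.
                for p in prepared:
                    p.abort(transaction_id)
                return False

        # Phase 2: all shards promised, so commit everywhere (round trip #2).
        for p in prepared:
            p.commit(transaction_id)
        return True
```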
Typical throughput differences:
| Configuration | Eventual Consistency | Strong Consistency |
|---|---|---|
| Single-shard writes | 50K ops/sec | 10K ops/sec |
| Multi-shard transactions | 30K ops/sec | 1-5K ops/sec |
| Global distribution | 100K+ ops/sec | 5-10K ops/sec |
Despite the costs, many applications genuinely require strong consistency. The decision framework centers on the cost of inconsistency:
Question: What happens if two clients briefly see different values?
If the answer involves money, safety, security, or correctness violations—strong consistency is likely necessary.
Most real-world systems use strong consistency selectively. Critical operations (payments, inventory decrements) use strongly consistent paths, while high-volume, less-critical operations (feeds, counters) use eventual consistency. This hybrid approach optimizes for both correctness and performance.
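One way to express this selectivity in application code is to let each call site declare the consistency level it needs, so only critical paths pay the coordination cost. The client API below is purely hypothetical, a sketch rather than any real database's interface:

```python
class Replica:
    """Toy replica: may lag behind the leader."""
    def __init__(self):
        self.data = {}

    def local_read(self, key):
        return self.data.get(key)


class Leader(Replica):
    """Toy leader: holds the authoritative, most recent state."""
    def linearizable_read(self, key):
        # A real system would confirm leadership or commit index first.
        return self.data.get(key)


class StoreClient:
    """Hypothetical client exposing a per-operation consistency choice."""
    def __init__(self, leader, replicas):
        self.leader = leader
        self.replicas = replicas

    def read(self, key, consistency="eventual"):
        if consistency == "strong":
            # Pay the coordination cost only where correctness demands it.
            return self.leader.linearizable_read(key)
        # Cheap, possibly stale read for feeds, counters, caches, etc.
        return self.replicas[0].local_read(key)


client = StoreClient(Leader(), [Replica()])
client.read("account:1234:balance", consistency="strong")  # critical path
client.read("post:9876:likes")                             # stale value acceptable
```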
Several production systems offer strong consistency guarantees, each with different approaches:
| System | Consistency Mechanism | Scope | Trade-offs |
|---|---|---|---|
| Google Spanner | TrueTime + Paxos | Global, multi-region | Requires synchronized clocks; higher latency for commits |
| CockroachDB | Raft consensus | Multi-region distributed SQL | Write latency increases with geographic distance |
| etcd | Raft consensus | Configuration/coordination | Not designed for high-throughput data workloads |
| ZooKeeper | Zab protocol | Coordination service | Limited to coordination use cases; not a general database |
| FoundationDB | Optimistic concurrency + Paxos | Ordered key-value store | Transaction size limits; learning curve |
| TiDB | Raft + 2PC | MySQL-compatible distributed SQL | Cross-shard transactions add latency |
| Amazon Aurora | Quorum writes + storage fleet | Regional MySQL/PostgreSQL | Single-region only for strong consistency |
```python
# How Google Spanner Achieves Global Strong Consistency
# Using TrueTime for External Consistency

import time


class TrueTime:
    """
    Simplified model of Google's TrueTime API.
    Real TrueTime uses GPS + atomic clocks in datacenters.
    """

    def now(self) -> tuple:
        """
        Returns [earliest, latest] - the interval within which
        the true current time definitely falls.

        Real TrueTime typically has ~7ms uncertainty (ε).
        GPS/atomic clocks bound this uncertainty.
        """
        current_time = self._get_clock_reading()
        epsilon = self._get_clock_uncertainty()  # Usually ~7ms
        return (current_time - epsilon, current_time + epsilon)

    def after(self, timestamp: float) -> bool:
        """Returns True if timestamp is definitely in the past."""
        earliest, _ = self.now()
        return earliest > timestamp

    def before(self, timestamp: float) -> bool:
        """Returns True if timestamp is definitely in the future."""
        _, latest = self.now()
        return latest < timestamp


class SpannerTransaction:
    """
    Simplified model of Spanner commit protocol.
    Demonstrates how TrueTime enables external consistency.
    """

    def __init__(self, truetime: TrueTime):
        self.truetime = truetime
        self.read_timestamp = None
        self.commit_timestamp = None

    def commit(self, writes: list) -> bool:
        """
        Commit transaction with external consistency guarantee.

        KEY INSIGHT: After commit returns, any transaction that
        starts will see this transaction's writes.
        """
        # Step 1: Acquire locks on all written keys (via Paxos groups)
        if not self._acquire_locks(writes):
            return False

        # Step 2: Choose commit timestamp
        # Must be >= any timestamp seen during transaction
        # Must be >= TrueTime.now().latest
        _, latest = self.truetime.now()
        self.commit_timestamp = max(
            self._max_timestamp_seen(),
            latest
        )

        # Step 3: THE CRITICAL WAIT
        # We must wait until we're sure the commit timestamp
        # is in the past at ALL nodes worldwide.
        #
        # Wait until: TrueTime.now().earliest > commit_timestamp
        #
        # This typically takes ~2ε (~14ms with 7ms uncertainty)
        while not self.truetime.after(self.commit_timestamp):
            # Wait for clock uncertainty to pass
            time.sleep(0.001)  # 1ms sleep

        # Step 4: Now safe to commit
        # Any future transaction will have a start timestamp
        # greater than our commit timestamp (external consistency!)
        self._apply_writes(writes)
        self._release_locks(writes)
        return True

    def _max_timestamp_seen(self) -> float:
        """Return maximum timestamp observed during reads."""
        # Implementation tracks timestamps of all read values
        pass


# WHY THIS WORKS:
#
# Transaction A commits with timestamp T_a
# After commit returns, TrueTime.now().earliest > T_a (guaranteed by wait)
#
# Transaction B starts after Transaction A returns
# Transaction B's start timestamp T_b comes from TrueTime.now().latest
#
# Since B started after A committed:
#   T_b >= B's TrueTime.now().latest
#   A's TrueTime.now().earliest > T_a (commit waited for this)
#
# Real time ordering ensures:
#   B's TrueTime.now().latest >= A's TrueTime.now().earliest
#   Therefore: T_b > T_a
#
# Since B's timestamp > A's timestamp, and Spanner orders by timestamp,
# B will always see A's writes. External consistency achieved!
```

TrueTime represents a fundamental insight: if you can bound clock uncertainty precisely, you can trade that small wait time for global coordination-free ordering. The ~7-10ms commit wait is much faster than traditional cross-datacenter consensus round-trips (~100-300ms). This is why Spanner can provide external consistency at global scale with reasonable latency.
Providing linearizable writes is only half the battle. Ensuring reads return the latest data requires careful implementation. Several patterns exist:
Pattern 1: Read from Leader
The simplest approach—all reads go to the leader node, which has authoritative state.

Pattern 2: Quorum Reads
Read from a majority of replicas, use the response with the highest version.

Pattern 3: ReadIndex Protocol (Raft)
Follower asks the leader for the current commit index, waits until the local log reaches that index.

Pattern 4: Lease-Based Reads
Leader holds a time-bounded lease; it can serve reads without a quorum while the lease is valid.
```python
# Implementing Linearizable Reads
# Multiple patterns with different trade-offs

import time


class LinearizableReader:
    """
    Demonstrates different approaches to ensuring read linearizability.
    """

    def __init__(self, cluster: list, leader_lease_duration: float = 10.0):
        self.cluster = cluster
        self.lease_duration = leader_lease_duration
        self.leader_lease_start = None
        self.is_leader = False

    # ═══════════════════════════════════════════════════════════════
    # PATTERN 1: Read from leader with leadership confirmation
    # ═══════════════════════════════════════════════════════════════

    def quorum_read(self, key: str) -> any:
        """
        Confirm leadership via quorum before reading.
        Most straightforward linearizable read.

        Latency: ~2 RTT (1 for quorum check, 1 for actual read)
        """
        if not self.is_leader:
            return self._forward_to_leader(key)

        # Confirm we're still leader
        # (Could be stale after network partition)
        if not self._confirm_leadership_via_heartbeat():
            # We're not leader anymore - redirect
            return self._forward_to_leader(key)

        # Safe to read from local state
        return self._local_read(key)

    def _confirm_leadership_via_heartbeat(self) -> bool:
        """Send heartbeats to majority to confirm leadership."""
        acks = 1  # Count self
        quorum = len(self.cluster) // 2 + 1
        for node in self.cluster:
            if self._send_heartbeat(node):
                acks += 1
        return acks >= quorum

    # ═══════════════════════════════════════════════════════════════
    # PATTERN 2: ReadIndex Protocol (Raft-style)
    # ═══════════════════════════════════════════════════════════════

    def read_index_read(self, key: str) -> any:
        """
        Follower-based linearizable read using ReadIndex protocol.
        Allows distributing read load across followers.

        Steps:
        1. Ask leader for current committed index
        2. Wait until local log reaches that index
        3. Read from local state

        Latency: ~1 RTT to leader + local wait
        """
        # Step 1: Get committed index from leader
        committed_index = self._get_committed_index_from_leader()
        if committed_index is None:
            # Leader unreachable - fail or retry
            raise Exception("Cannot reach leader for ReadIndex")

        # Step 2: Wait for local log to catch up
        self._wait_for_local_commit(committed_index)

        # Step 3: Now safe to read locally
        return self._local_read(key)

    def _get_committed_index_from_leader(self) -> int:
        """Query leader for current commit index."""
        # Leader must first confirm its own leadership
        # via quorum (can piggyback on heartbeat)
        pass

    def _wait_for_local_commit(self, target_index: int):
        """Wait until local log is committed up to target."""
        while self._get_local_commit_index() < target_index:
            time.sleep(0.001)  # 1ms polling

    # ═══════════════════════════════════════════════════════════════
    # PATTERN 3: Lease-based reads
    # ═══════════════════════════════════════════════════════════════

    def lease_read(self, key: str) -> any:
        """
        Fast reads when leader holds valid lease.
        Requires synchronized clocks (bounded drift).

        Latency: ~0 RTT when lease valid, ~1-2 RTT to renew
        """
        if not self.is_leader:
            return self._forward_to_leader(key)

        # Check if lease is still valid
        if self._lease_is_valid():
            # Fast path: no network required
            return self._local_read(key)

        # Lease expired - need to renew
        if self._renew_lease():
            return self._local_read(key)

        # Failed to renew - no longer leader
        return self._forward_to_leader(key)

    def _lease_is_valid(self) -> bool:
        """
        Check if current lease is valid.

        CRITICAL: Lease must expire BEFORE followers might
        elect a new leader. Must account for clock drift!
        """
        if self.leader_lease_start is None:
            return False

        current_time = time.time()
        elapsed = current_time - self.leader_lease_start

        # Use conservative estimate (account for clock drift)
        max_clock_drift = 0.1  # 100ms assumed max drift
        safe_lease_duration = self.lease_duration - max_clock_drift

        return elapsed < safe_lease_duration

    # ═══════════════════════════════════════════════════════════════
    # PATTERN 4: Strict quorum reads (Dynamo-style)
    # ═══════════════════════════════════════════════════════════════

    def quorum_read_dynamo_style(self, key: str) -> any:
        """
        Read from majority of replicas, return highest version.
        For linearizability with N=3: R=2, W=2 (R + W > N)

        Latency: parallelized, but wait for majority
        """
        n_replicas = 3
        read_quorum = 2

        responses = []
        for replica in self._get_replicas(key):
            response = self._read_from_replica(replica, key)
            if response:
                responses.append(response)

        if len(responses) < read_quorum:
            raise Exception("Failed to reach read quorum")

        # Return value with highest version
        return max(responses, key=lambda r: r['version'])['value']
```

A common production bug: configuring reads to go to followers without proper freshness checks. The read returns successfully, but with stale data. This creates subtle, hard-to-reproduce bugs where behavior depends on replication lag. Always verify your read path actually provides linearizability if that's what your application requires.
Claiming strong consistency and actually providing it are different things. Many distributed databases have been discovered to violate their consistency guarantees under specific failure scenarios. Testing is essential.
Jepsen Testing:
Kyle Kingsbury's Jepsen project has become the gold standard for distributed systems testing. Jepsen drives concurrent operations against a real cluster, injects faults such as network partitions, clock skew, and process crashes, records the complete operation history, and then checks that history against the system's claimed consistency model.
Systems that have failed Jepsen tests include: MongoDB, Cassandra, RethinkDB, CockroachDB (earlier versions), and many others. These failures often exposed real bugs and led to fixes.
Testing Strategies:
```python
# Simplified Consistency Testing Framework

import threading
import time
from dataclasses import dataclass
from typing import List, Optional
import random


@dataclass
class Operation:
    """Record of a single operation."""
    client_id: str
    op_type: str                 # 'write' or 'read'
    key: str
    input_value: Optional[any]   # Value written (for writes)
    output_value: Optional[any]  # Value read (for reads)
    start_time: float
    end_time: float
    success: bool


class ConsistencyTester:
    """
    Tests whether a distributed system provides linearizability.
    """

    def __init__(self, client_factory, num_clients: int = 10):
        self.client_factory = client_factory
        self.num_clients = num_clients
        self.operations: List[Operation] = []
        self.lock = threading.Lock()

    def run_test(self, duration_seconds: float = 30.0,
                 failure_probability: float = 0.1) -> bool:
        """
        Run concurrent operations and check linearizability.

        Args:
            duration_seconds: How long to run the test
            failure_probability: Probability of inducing failure per second

        Returns:
            True if history is linearizable, False otherwise
        """
        threads = []
        stop_event = threading.Event()

        # Start client threads
        for i in range(self.num_clients):
            client = self.client_factory()
            t = threading.Thread(
                target=self._client_worker,
                args=(f"client_{i}", client, stop_event)
            )
            threads.append(t)
            t.start()

        # Start failure injector
        failure_thread = threading.Thread(
            target=self._failure_injector,
            args=(stop_event, failure_probability)
        )
        failure_thread.start()

        # Run for duration
        time.sleep(duration_seconds)
        stop_event.set()

        # Wait for threads
        for t in threads:
            t.join()
        failure_thread.join()

        # Check linearizability
        return self._check_linearizability()

    def _client_worker(self, client_id: str, client, stop_event):
        """Worker thread performing random operations."""
        while not stop_event.is_set():
            key = f"key_{random.randint(0, 9)}"

            if random.random() < 0.5:
                # Write operation
                value = random.randint(1, 1000)
                start = time.time()
                try:
                    client.write(key, value)
                    success = True
                except:
                    success = False
                end = time.time()
                self._record_operation(
                    client_id, 'write', key, value, None, start, end, success
                )
            else:
                # Read operation
                start = time.time()
                try:
                    result = client.read(key)
                    success = True
                except:
                    result = None
                    success = False
                end = time.time()
                self._record_operation(
                    client_id, 'read', key, None, result, start, end, success
                )

            # Small delay between operations
            time.sleep(random.uniform(0.001, 0.01))

    def _record_operation(self, client_id, op_type, key, input_val,
                          output_val, start, end, success):
        """Thread-safe operation recording."""
        op = Operation(
            client_id=client_id,
            op_type=op_type,
            key=key,
            input_value=input_val,
            output_value=output_val,
            start_time=start,
            end_time=end,
            success=success
        )
        with self.lock:
            self.operations.append(op)

    def _failure_injector(self, stop_event, probability):
        """Periodically inject failures into the cluster."""
        while not stop_event.is_set():
            if random.random() < probability:
                failure_type = random.choice([
                    'network_partition',
                    'node_crash',
                    'node_restart',
                    'slow_network'
                ])
                self._inject_failure(failure_type)
            time.sleep(1.0)

    def _inject_failure(self, failure_type: str):
        """Inject a specific failure into the cluster."""
        print(f"[FAILURE] Injecting: {failure_type}")
        # Implementation would use iptables, kill signals, etc.
        pass

    def _check_linearizability(self) -> bool:
        """
        Check if recorded history is linearizable.
        This is NP-complete in general, but tractable for small histories.
        Real tools like Knossos use sophisticated algorithms.
        """
        successful_ops = [op for op in self.operations if op.success]
        print(f"Checking linearizability of {len(successful_ops)} operations")

        # Group by key (each key's operations must be independently linearizable)
        by_key = {}
        for op in successful_ops:
            by_key.setdefault(op.key, []).append(op)

        for key, ops in by_key.items():
            if not self._check_key_linearizable(ops):
                print(f"[VIOLATION] Key '{key}' is not linearizable!")
                return False

        print("[PASS] History is linearizable")
        return True

    def _check_key_linearizable(self, ops: List[Operation]) -> bool:
        """Check linearizability for single key's operations."""
        # Simplified check - real algorithms are much more complex
        # This handles basic sequential consistency checking
        ops_sorted = sorted(ops, key=lambda x: x.start_time)

        # Track what values could be current at each point
        current_value = None  # or some initial value

        for op in ops_sorted:
            if op.op_type == 'write':
                current_value = op.input_value
            elif op.op_type == 'read':
                # In a simple check, read should return current_value
                # Real linearizability allows for concurrent operations
                # This is just for illustration
                if op.output_value != current_value:
                    # Could be violation or concurrent operation
                    # Full algorithm would explore all orderings
                    pass

        return True  # Placeholder
```

If your system claims strong consistency, run Jepsen tests against it. Many companies have been embarrassed by Jepsen reports revealing consistency violations. Better to find issues in testing than have customers discover them in production. Jepsen is open source and well-documented.
We've covered the full landscape of strong consistency—from theoretical foundations to practical implementation. The key takeaways: strong consistency means linearizability, a single global order of operations that respects real time; it is achieved through coordination mechanisms such as single leaders, consensus protocols, strict quorums, and bounded-uncertainty clocks like TrueTime; that coordination costs latency, throughput, and availability during partitions; and because of those costs, production systems apply it selectively and verify it with tools like Jepsen.
What's Next:
Strong consistency represents one end of the consistency spectrum. The next page explores the opposite end: Eventual Consistency. We'll examine how systems can remain available and performant by relaxing consistency guarantees, the mathematical foundations in the form of convergence, and how applications can work effectively with eventually consistent data.
You now understand strong consistency at a deep, implementable level—from the theoretical foundation of linearizability through practical mechanisms like consensus protocols and TrueTime, to the real-world costs and testing approaches. This knowledge forms the baseline against which all other consistency models are compared.