In 2012, a Knight Capital trading algorithm went haywire, executing 4 million trades in 45 minutes and losing $440 million. Part of the failure involved systems that had inconsistent views of the same data—different servers believed different things about which version of code was active.
Consistency in distributed systems asks a deceptively simple question: When data changes, when do all observers see the change? The answer seems obvious—immediately, of course. But in distributed systems spanning multiple machines, data centers, and continents, "immediately" becomes surprisingly hard to define, achieve, and even want.
Consistency requirements specify how your system handles the fundamental impossibility of perfect, instant, global synchronization. They determine whether users see stale data, whether operations can conflict, and whether your system guarantees the correctness properties your business requires.
By the end of this page, you will master the framework for defining consistency requirements: understanding consistency models from strong to eventual, applying CAP and PACELC theorems, specifying consistency per operation type, and making the trade-offs between consistency, availability, and latency explicit.
Consistency has different meanings in different contexts. Before specifying requirements, we must clarify which type of consistency we're discussing.
Types of Consistency:
Data Consistency (CAP): All nodes see the same data at the same time. This is the "C" in the CAP theorem, distinct from the "C" in ACID.
Transactional Consistency: ACID properties—Atomicity, Consistency, Isolation, Durability. Operations either fully complete or fully fail.
Replica Consistency: Agreement between copies of the same data across multiple storage nodes.
Read-After-Write Consistency: When you write data, you can immediately read what you wrote.
Causal Consistency: If operation A caused operation B, everyone sees A before B.
The Core Challenge:
In a single-server system, consistency is trivial—there's one copy of data, one truth. In distributed systems, data is replicated for reliability and performance. The moment you have two copies, they can disagree.
Server A writes: balance = 100
Server B writes: balance = 50 (concurrently)
Which value is correct? Who wins?
What does a reader see? From which server?
| Scenario | What Can Go Wrong | User Impact |
|---|---|---|
| User updates profile | Change visible on one server, not another | User refreshes and sees old data |
| Inventory decrement | Two servers both allow purchase of last item | Overselling, customer disappointment |
| Bank transfer | Debit completes, credit fails | Money disappears (catastrophic) |
| Social media post | Post visible to some users, not others | Confusion, incomplete conversations |
| Distributed counter | Two increments become one final value | Lost updates, incorrect counts |
Systems are not simply 'consistent' or 'inconsistent.' They exist on a spectrum of consistency models, each with different guarantees and trade-offs. Your requirements must specify WHERE on this spectrum each operation needs to be.
Consistency models define what guarantees a system provides about the visibility and ordering of operations. Here's the spectrum from strongest to weakest:
Strong Consistency (Linearizability): Every read returns the most recent write; operations appear to execute atomically, in real-time order, as if on a single copy of the data.
Sequential Consistency: All observers see all operations in the same order, though that order may lag real time.
Causal Consistency: Causally related operations are seen in the same order by everyone; concurrent operations may appear in different orders to different observers.
Session Consistency (Read-Your-Writes): Within a session, a client always sees its own writes; other clients may see them later. A minimal sketch follows the table below.
| Model | Strength | Latency Impact | Availability Impact | Typical Implementation |
|---|---|---|---|---|
| Strong/Linearizable | Strongest | High (sync replication) | Low (blocks on failure) | Paxos, Raft, 2PC |
| Sequential | Strong | High | Low | Total order broadcast |
| Causal | Medium | Medium | Medium | Vector clocks, version vectors |
| Session/Read-Your-Writes | Medium | Low | High | Sticky sessions, local caching |
| Monotonic Reads | Weak | Low | High | Version tracking per client |
| Eventual | Weakest | Lowest | Highest | Async replication, CRDTs |
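To make the session tier concrete, here is a minimal read-your-writes sketch in Python. It assumes a single primary with an asynchronously updated replica and a known upper bound on replication lag; all names are illustrative, not a real client library.

```python
import time

# Minimal read-your-writes sketch (all names illustrative). Plain dicts stand
# in for a primary store and an asynchronously updated replica.

REPLICATION_LAG_BOUND = 0.5  # assumed upper bound on replica lag, in seconds

class SessionRouter:
    def __init__(self):
        self.primary = {}
        self.replica = {}     # in a real system, updated asynchronously
        self.last_write = {}  # session_id -> time of that session's last write

    def write(self, session_id, key, value):
        self.primary[key] = value
        self.last_write[session_id] = time.monotonic()

    def read(self, session_id, key):
        since_write = time.monotonic() - self.last_write.get(session_id, float("-inf"))
        if since_write < REPLICATION_LAG_BOUND:
            # The replica may not have caught up with this session's write yet,
            # so serve the read from the primary: the user sees their own write.
            return self.primary.get(key)
        # Otherwise a possibly slightly stale replica read is acceptable.
        return self.replica.get(key)
```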
Eventual Consistency:
Eventual consistency is often misunderstood. It doesn't mean "inconsistent"—it means consistency is deferred. The key question is: how eventual is eventual?
| Eventual Consistency Flavor | Typical Convergence Time | Example |
|---|---|---|
| Near real-time | <100ms | CDN cache invalidation |
| Short delay | 100ms - 1s | Cross-region database replication |
| Moderate delay | 1s - 30s | Async message processing |
| Long delay | Minutes to hours | Data warehouse synchronization |
| Unbounded | Unknown | Best-effort batch processing |
When specifying eventual consistency, always include a convergence time bound: 'The system shall provide eventual consistency with convergence within 5 seconds under normal operation.' Without this bound, 'eventual' could mean minutes or hours—very different user experiences.
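One way to enforce such a bound is a convergence probe that measures how long replicas take to agree. A sketch, assuming hypothetical `write_canary` and `read_replica` functions supplied by your store's client:

```python
import time
import uuid

# Convergence probe sketch for an eventual-consistency SLA. `write_canary` and
# `read_replica` are hypothetical client functions you would supply.

CONVERGENCE_BOUND_S = 5.0  # from the requirement: converge within 5 seconds

def measure_convergence(write_canary, read_replica, replica_ids):
    token = str(uuid.uuid4())  # unique canary value for this probe run
    start = time.monotonic()
    write_canary(token)
    pending = set(replica_ids)
    while pending and time.monotonic() - start < CONVERGENCE_BOUND_S:
        # Drop replicas that now return the canary; keep polling the rest.
        pending = {r for r in pending if read_replica(r) != token}
        time.sleep(0.05)
    elapsed = time.monotonic() - start
    if pending:
        raise AssertionError(f"replicas {sorted(pending)} not converged after {elapsed:.2f}s")
    return elapsed  # report to monitoring as the observed convergence time
```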
The CAP theorem and its extension PACELC formalize the trade-offs that consistency requirements must navigate.
CAP Theorem:
In a distributed system, you can have at most two of three properties: Consistency (every read sees the most recent write), Availability (every request receives a non-error response), and Partition tolerance (the system continues operating despite dropped or delayed messages between nodes).
The CAP Reality:
Since network partitions will happen (P is required), CAP really asks: During a partition, do you prefer Consistency or Availability?
┌────────────────────────────────────────────────────────────┐
│                  Network Partition Occurs                  │
│                                                            │
│    [Server A] ════╳════ [Server B]                         │
│     Has write           Doesn't have write                 │
│                                                            │
│  Option CP: Server B returns error (consistent)            │
│  Option AP: Server B returns stale data (available)        │
└────────────────────────────────────────────────────────────┘
CP (Consistency over Availability): During a partition, nodes that cannot confirm they hold the latest data reject requests. Users may see errors, but never stale or conflicting data.
AP (Availability over Consistency): During a partition, every node keeps serving requests from the data it has. Users always get an answer, but it may be stale, and concurrent writes may conflict and need resolution later.
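The two options differ in a single branch of the read path. A minimal sketch, where `replica_is_partitioned` stands in for real failure detection (for example, lost heartbeats to the primary):

```python
# The same read under the two policies; all names are illustrative.

class Unavailable(Exception):
    """CP choice: refuse to answer rather than risk answering wrong."""

def read_cp(replica, key, replica_is_partitioned):
    if replica_is_partitioned():
        # Cannot prove this replica holds the latest write: fail the request
        # and let the client retry against the majority partition.
        raise Unavailable("replica partitioned; retry elsewhere")
    return replica[key]

def read_ap(replica, key, replica_is_partitioned):
    # Always answer, even though the value may be stale during the partition.
    value = replica.get(key)
    maybe_stale = replica_is_partitioned()
    return value, maybe_stale  # callers may surface staleness to the user
```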
| Use Case | CAP Choice | Rationale |
|---|---|---|
| Banking transactions | CP | Wrong balance is unacceptable; prefer errors |
| Inventory management | CP | Overselling is costly; prefer rejection |
| Social media feed | AP | Stale posts are fine; unavailability isn't |
| Gaming leaderboard | AP | Slightly outdated scores are acceptable |
| Medical records | CP | Wrong data could harm patients |
| Shopping cart | AP | Availability drives revenue; merge conflicts later |
PACELC: The Complete Picture:
CAP only describes behavior during partitions. PACELC extends this:
Partition → Availability vs Consistency
Else (no partition) → Latency vs Consistency
Even without partitions, there's a trade-off between latency and consistency:
| System Type | Partition Behavior | Normal Behavior |
|---|---|---|
| PA/EL | Availability | Low Latency |
| PA/EC | Availability | Strong Consistency |
| PC/EL | Consistency | Low Latency |
| PC/EC | Consistency | Strong Consistency |
Most modern systems are PA/EL: they favor availability during partitions and low latency in normal operation, accepting weaker consistency in both cases. Dynamo-style stores such as Cassandra and DynamoDB are the canonical examples.
CAP describes behavior during partitions—which are rare. Most of the time, systems can provide both consistency AND availability. Your requirements should specify: 1) Normal operation behavior, 2) Partition behavior, 3) Recovery behavior after partition heals.
Not every operation needs the same consistency level. Sophisticated systems apply different consistency requirements to different operations based on business impact.
Multi-Consistency Architecture:
A single application might use:
Consistency Requirement Specification by Entity:
# Consistency Requirements by Operation Type

## Tier 1: Strong Consistency Required

### Financial Operations
- Account balance reads: Strong consistency
- Transfer execution: Linearizable transactions
- Reconciliation: Point-in-time snapshot consistency
- Rationale: Incorrect financial data is unacceptable

### Inventory Operations
- Stock level checks before purchase: Strong consistency
- Inventory decrement: Atomic with purchase
- Rationale: Overselling creates fulfillment failures

## Tier 2: Session Consistency Required

### User Profile Operations
- Profile reads after update: Read-your-writes within session
- Preference changes: Visible immediately to user
- Cross-session: Eventual (acceptable staleness: 30s)
- Rationale: User should see their own changes

### Shopping Cart
- Cart modifications: Session consistency
- Cart reads: Session consistency
- Cross-device sync: Eventual (5 minute convergence)
- Rationale: Cart abandonment if user loses items

## Tier 3: Causal Consistency Required

### Messaging
- Message ordering within thread: Causal
- Read receipts: Causal (follow message they acknowledge)
- Rationale: Out-of-order messages confuse users

### Comments and Replies
- Reply ordering: Causal (reply after parent)
- Edit visibility: Monotonic reads
- Rationale: Conversation flow must make sense

## Tier 4: Eventual Consistency Acceptable

### Recommendations
- Consistency model: Eventual
- Acceptable staleness: Up to 1 hour
- Rationale: Slightly outdated recommendations are fine

### Analytics Dashboards
- Consistency model: Eventual
- Acceptable staleness: Up to 15 minutes
- Rationale: Analytics don't drive real-time decisions

### Social Feed
- Consistency model: Eventual
- Acceptable staleness: Up to 30 seconds
- Read-your-writes: Yes (user sees own posts immediately)
- Rationale: Freshness preference over strict ordering

Tunable Consistency:
Some databases (Cassandra, DynamoDB) offer tunable consistency per query:
| Consistency Level | Reads From | Writes To | Use When |
|---|---|---|---|
| ONE | Any 1 replica | 1 replica | Highest performance, lowest consistency |
| QUORUM | Majority of replicas | Majority of replicas | Balance of consistency and performance |
| ALL | All replicas | All replicas | Highest consistency, lowest availability |
| LOCAL_QUORUM | Majority in local DC | Majority in local DC | Multi-region with DC-local consistency |
With tunable consistency, your requirements should specify per-operation levels, as in the list below; a short driver-level sketch follows it:
Consistency Level by Operation:
- User authentication: QUORUM read, QUORUM write
- Session creation: LOCAL_QUORUM write
- Product catalog read: ONE (cached)
- Inventory update: ALL write, QUORUM read
- Order creation: ALL write, ALL read
- Activity log: ONE write
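As an illustration, Cassandra's Python driver lets you set these levels per statement. A minimal sketch, assuming a hypothetical `shop` keyspace with matching tables:

```python
from datetime import datetime, timezone

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # keyspace and schemas are illustrative

# QUORUM read for authentication: a majority of replicas must answer.
auth_read = SimpleStatement(
    "SELECT password_hash FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(auth_read, ("user-123",)).one()

# ONE write for the low-value activity log: a single replica acknowledges.
log_write = SimpleStatement(
    "INSERT INTO activity_log (user_id, ts, event) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(log_write, ("user-123", datetime.now(timezone.utc), "login"))
```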
Future engineers will question why you chose eventual consistency for operation X, or strong consistency for operation Y. Document the rationale: 'Recommendations use eventual consistency because 30-second staleness doesn't impact purchase decisions, while reducing read latency by 70%.'
When choosing availability over consistency (AP systems), conflicts will occur. Your requirements must specify how conflicts are detected and resolved.
Types of Conflicts:
| Conflict Type | Description | Example |
|---|---|---|
| Write-Write | Two concurrent writes to same key | Two users update same doc |
| Read-Write | Read during concurrent write | Balance check during transfer |
| Delete-Update | Delete and update to same record | Cancel and modify same order |
| Constraint violation | Multiple operations violate invariant | Two last-item purchases |
Conflict Resolution Strategies:
| Strategy | How It Works | Pros | Cons | Use When |
|---|---|---|---|---|
| Last-Write-Wins (LWW) | Latest timestamp wins | Simple, no user intervention | Can lose data | Non-critical, idempotent writes |
| First-Write-Wins | First write preserved | Prevents overwrites | Later updates lost | Creation-only data |
| Merge | Combine both values | No data loss | Complex merging logic | Additive operations |
| Custom resolution | Application-specific logic | Business rules enforced | Complex implementation | Domain-specific needs |
| User resolution | Present choices to user | User controls outcome | Poor UX at scale | High-value, rare conflicts |
| CRDTs | Mathematically conflict-free | Automatic convergence | Limited data types | Counters, sets, maps |
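The last row deserves a concrete picture. Below is a minimal G-Counter sketch (illustrative, not a production CRDT library): each node increments only its own slot, and merging takes the per-node maximum, so replicas converge regardless of the order or number of merges.

```python
# Minimal G-Counter CRDT sketch: increments never conflict because each node
# writes only its own slot; the counter's value is the sum of all slots.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> that node's local increment total

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge no matter how merges are ordered or repeated.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas diverge, then converge after merging in either order:
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(); a.increment(); b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3
```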
Conflict Resolution Requirements Specification:
Conflict Resolution Requirements:
1. Shopping Cart Conflicts
- Detection: Version vectors per cart
- Resolution: Merge (union of items, max quantity per item; sketched after this list)
- Rationale: Losing items frustrates users; duplicates can be removed
2. User Profile Conflicts
- Detection: Timestamp per field
- Resolution: Last-Write-Wins per field, not per document (sketched after this list)
- Rationale: Users update different fields; field-level LWW preserves most data
3. Inventory Conflicts
- Detection: Local count vs expected count
- Resolution: Prevent if would go negative; alert if counts disagree
- Rationale: Financial accuracy required; human reconciliation for disputes
4. Comment Ordering Conflicts
- Detection: Causal dependency tracking (vector clocks)
- Resolution: Topological sort; concurrent comments shown in timestamp order
- Rationale: Replies must follow parents; siblings can be reordered
5. Distributed Counter Conflicts
- Data type: G-Counter CRDT (grow-only counter)
- Resolution: Automatic merge (sum of per-node values)
- Rationale: Like counts, view counts should never lose increments
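A minimal sketch of the first two policies above, assuming carts are dicts of item_id to quantity and profiles are dicts of field to (timestamp, value) pairs:

```python
# Policy 1: union of items, max quantity per item; nothing is lost, and
# duplicates can be cleaned up by the user afterwards.
def merge_carts(cart_a, cart_b):
    merged = dict(cart_a)
    for item, qty in cart_b.items():
        merged[item] = max(merged.get(item, 0), qty)
    return merged

# Policy 2: last-write-wins per field, so concurrent edits to different
# fields both survive; only same-field races lose the older write.
def merge_profiles(profile_a, profile_b):
    merged = dict(profile_a)
    for field, (ts, value) in profile_b.items():
        if field not in merged or ts > merged[field][0]:
            merged[field] = (ts, value)
    return merged
```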
Conflict resolution code often goes untested because conflicts are rare in development. Require: 'Conflict scenarios shall be covered by integration tests simulating concurrent operations. Chaos testing shall inject network partitions to validate conflict detection and resolution.'
Consistency requirements intersect with durability requirements through read and write concerns—how many replicas must acknowledge operations before they're considered complete.
Write Concerns:
Write concern specifies how many replicas must confirm a write before success is returned:
| Write Concern | Behavior | Durability Risk | Latency |
|---|---|---|---|
| w=0 (Fire and forget) | No acknowledgment | High (can lose write) | Lowest |
| w=1 | Primary acknowledges | Medium (primary loss = data loss) | Low |
| w=majority | Majority of replicas acknowledge | Low | Medium |
| w=all | All replicas acknowledge | Lowest | Highest |
| j=true (journaled) | Acknowledged only after the write reaches the on-disk journal | Very low | Medium-High |
Read Concerns:
Read concern specifies what data a read operation can see:
| Read Concern | Behavior | Consistency | Performance |
|---|---|---|---|
| Local | Reads from instance | May read uncommitted | Highest |
| Available | Reads what's available | May read stale | High |
| Majority | Reads majority-committed data | Strong in cluster | Medium |
| Linearizable | Reads real-time, confirmed data | Strongest | Lowest |
| Snapshot | Point-in-time consistent | Transaction consistent | N/A |
Combined Requirements Specification:
Read and write concerns combine to determine effective consistency; the requirements below pair them per operation, with a client-level sketch after the list:
Read/Write Concern Requirements:
1. Financial Transactions
- Write concern: majority, journaled
- Read concern: majority
- Rationale: Cannot lose transactions; reads must be committed
2. User Sessions
- Write concern: majority
- Read concern: local (performance)
- Rationale: Sessions are short-lived; rare read of stale session is acceptable
3. Audit Logs
- Write concern: majority, journaled
- Read concern: majority
- Rationale: Audit records must be durable and complete
4. Cache Writes (to persistent cache)
- Write concern: 1
- Read concern: local
- Rationale: Cache can be rebuilt; performance is priority
5. Distributed Analytics
- Write concern: 1 (local DC)
- Read concern: available
- Rationale: Analytics tolerate some data loss for throughput
Default (unspecified operations):
- Write concern: majority
- Read concern: local
- Rationale: Balance of durability and performance
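In MongoDB terms, these tiers map onto per-collection write and read concerns. A minimal PyMongo sketch, with illustrative database and collection names:

```python
from pymongo import MongoClient, WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017")
db = client.shop  # database and collection names are illustrative

# Tier 1: financial transactions. Majority, journaled writes; majority reads.
ledger = db.get_collection(
    "ledger",
    write_concern=WriteConcern(w="majority", j=True),
    read_concern=ReadConcern("majority"),
)

# Tier 4: rebuildable cache. Acknowledge on one node; read local data.
cache = db.get_collection(
    "cache",
    write_concern=WriteConcern(w=1),
    read_concern=ReadConcern("local"),
)

ledger.insert_one({"account": "acct-1", "delta": -100})
print(cache.find_one({"_id": "hot-key"}))
```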
If reads require acknowledgment from R replicas and writes from W replicas, you get strong consistency when R + W > N (the total number of replicas). Example: with 3 replicas, quorum reads (R=2) and quorum writes (W=2) give 2+2=4 > 3, so at least one replica participates in both the read and the write, guaranteeing the read sees the latest write.
Consistency is difficult to observe directly—you often don't know you have a consistency problem until a user reports seeing strange behavior. Proactive monitoring is essential.
Consistency Monitoring Requirements:
- Replication lag: Track lag per replica; alert when lag exceeds the convergence SLA
- Conflict rate: Track detected conflicts per operation type; alert on spikes above baseline
- Staleness probes: Periodically write canary values and measure how long replicas take to agree
- Invariant checks: Continuously verify business invariants (e.g., inventory never negative, debits match credits)
Consistency Testing Requirements:
1. Unit Tests
- Test conflict resolution logic with deterministic conflict scenarios
- Coverage: All merge functions, LWW tie-breakers, CRDT operations
2. Integration Tests
- Simulate concurrent writes from multiple clients (see the sketch after this list)
- Verify final state matches conflict resolution policy
- Test read-your-writes within session
3. Chaos Testing
- Inject network partitions between replicas
- Verify system behavior matches CAP choice (CP or AP)
- After partition heals, verify convergence within SLA
4. Load Testing with Consistency Validation
- Under sustained load, continuously verify:
- Replication lag stays within bounds
- Conflict rate doesn't spike
- Invariants hold
5. Production Verification
- Continuously run consistency probes in production
- Compare replica states periodically
- Shadow-read from multiple replicas and compare
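As a concrete starting point for item 2, here is a sketch of a concurrent-write integration test, assuming a hypothetical `store` client fixture whose cart API applies the merge policy specified earlier:

```python
import threading

# Fire concurrent writes at the same key and assert the final state matches
# the declared resolution policy (here, cart merge: union of items).

def test_concurrent_cart_updates_merge(store):
    barrier = threading.Barrier(2)

    def add_item(item, qty):
        barrier.wait()  # release both writers together to force a real race
        store.update_cart("cart-42", {item: qty})

    t1 = threading.Thread(target=add_item, args=("book", 1))
    t2 = threading.Thread(target=add_item, args=("pen", 2))
    t1.start(); t2.start(); t1.join(); t2.join()

    final = store.read_cart("cart-42", consistency="strong")
    # Merge policy: union of items; neither concurrent write may be lost.
    assert final == {"book": 1, "pen": 2}
```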
Consider requiring Jepsen-style testing for critical consistency guarantees. 'Linearizability claims shall be validated using automated consistency testing that injects failures and network partitions, verifying that no linearizability violations occur across 1000+ test runs.'
We have covered the complete framework for consistency requirements. The essential takeaways:
- Consistency is a spectrum of models, from linearizable to eventual; specify where each operation sits rather than one level for the whole system.
- CAP forces a choice between consistency and availability only during partitions; PACELC adds the everyday latency-versus-consistency trade-off.
- Always bound "eventual" with a convergence time, and document the rationale behind each consistency choice.
- When you choose availability, specify how conflicts are detected and resolved for each data type.
- Write and read concerns (and R + W > N quorums) are the levers that implement your chosen guarantees.
- Consistency guarantees must be validated: conflict simulations, chaos testing with injected partitions, and continuous production probes.
What's Next:
With consistency requirements mastered, we turn to the final non-functional dimension: Security Requirements. While consistency determines the correctness of your data, security determines who can access and modify it. Security requirements span authentication, authorization, data protection, and compliance—often the most legally consequential non-functional requirements you'll specify.
You now have a comprehensive framework for defining consistency requirements. These specifications determine your data architecture, replication strategy, and database selection. In the next page, we'll complete the non-functional requirements with security—the requirements that determine who can access your consistent, available, fast, scalable system.