In 2012, a Knight Capital trading algorithm went haywire, executing 4 million trades in 45 minutes and losing $440 million. Part of the failure involved systems that had inconsistent views of the same data—different servers believed different things about which version of code was active.
Consistency in distributed systems asks a deceptively simple question: When data changes, when do all observers see the change? The answer seems obvious—immediately, of course. But in distributed systems spanning multiple machines, data centers, and continents, "immediately" becomes surprisingly hard to define, achieve, and even want.
Consistency requirements specify how your system handles the fundamental impossibility of perfect, instant, global synchronization. They determine whether users see stale data, whether operations can conflict, and whether your system guarantees the correctness properties your business requires.
By the end of this page, you will master the framework for defining consistency requirements: understanding consistency models from strong to eventual, applying CAP and PACELC theorems, specifying consistency per operation type, and making the trade-offs between consistency, availability, and latency explicit.
Consistency has different meanings in different contexts. Before specifying requirements, we must clarify which type of consistency we're discussing.
Types of Consistency:
Data Consistency (CAP): All nodes see the same data at the same time. This is the "C" in the CAP theorem, distinct from the "C" in ACID.
Transactional Consistency: ACID properties—Atomicity, Consistency, Isolation, Durability. Operations either fully complete or fully fail.
Replica Consistency: Agreement between copies of the same data across multiple storage nodes.
Read-After-Write Consistency: When you write data, you can immediately read what you wrote.
Causal Consistency: If operation A caused operation B, everyone sees A before B.
The Core Challenge:
In a single-server system, consistency is trivial—there's one copy of data, one truth. In distributed systems, data is replicated for reliability and performance. The moment you have two copies, they can disagree.
Server A writes: balance = 100
Server B writes: balance = 50 (concurrently)
Which value is correct? Who wins?
What does a reader see? From which server?
| Scenario | What Can Go Wrong | User Impact |
|---|---|---|
| User updates profile | Change visible on one server, not another | User refreshes and sees old data |
| Inventory decrement | Two servers both allow purchase of last item | Overselling, customer disappointment |
| Bank transfer | Debit completes, credit fails | Money disappears (catastrophic) |
| Social media post | Post visible to some users, not others | Confusion, incomplete conversations |
| Distributed counter | Two increments become one final value | Lost updates, incorrect counts |
Systems are not simply 'consistent' or 'inconsistent.' They exist on a spectrum of consistency models, each with different guarantees and trade-offs. Your requirements must specify WHERE on this spectrum each operation needs to be.
Consistency models define what guarantees a system provides about the visibility and ordering of operations. Here's the spectrum from strongest to weakest:
Strong Consistency (Linearizability): Every read returns the most recent write; operations appear to execute atomically, in real-time order, as if on a single copy of the data.
Sequential Consistency: All observers see all operations in the same order, though that order may lag real time.
Causal Consistency: Causally related operations are seen in the same order by everyone; concurrent operations may appear in different orders to different observers.
Session Consistency (Read-Your-Writes): Within a session, a client always sees its own writes; other clients may see them later. A minimal sketch follows the table below.
| Model | Strength | Latency Impact | Availability Impact | Typical Implementation |
|---|---|---|---|---|
| Strong/Linearizable | Strongest | High (sync replication) | Low (blocks on failure) | Paxos, Raft, 2PC |
| Sequential | Strong | High | Low | Total order broadcast |
| Causal | Medium | Medium | Medium | Vector clocks, version vectors |
| Session/Read-Your-Writes | Medium | Low | High | Sticky sessions, local caching |
| Monotonic Reads | Weak | Low | High | Version tracking per client |
| Eventual | Weakest | Lowest | Highest | Async replication, CRDTs |
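To make the session tier concrete, here is a minimal read-your-writes sketch in Python. It assumes a single primary with an asynchronously updated replica and a known upper bound on replication lag; all names are illustrative, not a real client library.

```python
import time

# Minimal read-your-writes sketch (all names illustrative). Plain dicts stand
# in for a primary store and an asynchronously updated replica.

REPLICATION_LAG_BOUND = 0.5  # assumed upper bound on replica lag, in seconds

class SessionRouter:
    def __init__(self):
        self.primary = {}
        self.replica = {}     # in a real system, updated asynchronously
        self.last_write = {}  # session_id -> time of that session's last write

    def write(self, session_id, key, value):
        self.primary[key] = value
        self.last_write[session_id] = time.monotonic()

    def read(self, session_id, key):
        since_write = time.monotonic() - self.last_write.get(session_id, float("-inf"))
        if since_write < REPLICATION_LAG_BOUND:
            # The replica may not have caught up with this session's write yet,
            # so serve the read from the primary: the user sees their own write.
            return self.primary.get(key)
        # Otherwise a possibly slightly stale replica read is acceptable.
        return self.replica.get(key)
```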
Eventual Consistency:
Eventual consistency is often misunderstood. It doesn't mean "inconsistent"—it means consistency is deferred. The key question is: how eventual is eventual?
| Eventual Consistency Flavor | Typical Convergence Time | Example |
|---|---|---|
| Near real-time | <100ms | CDN cache invalidation |
| Short delay | 100ms - 1s | Cross-region database replication |
| Moderate delay | 1s - 30s | Async message processing |
| Long delay | Minutes to hours | Data warehouse synchronization |
| Unbounded | Unknown | Best-effort batch processing |
When specifying eventual consistency, always include a convergence time bound: 'The system shall provide eventual consistency with convergence within 5 seconds under normal operation.' Without this bound, 'eventual' could mean minutes or hours—very different user experiences.
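One way to enforce such a bound is a convergence probe that measures how long replicas take to agree. A sketch, assuming hypothetical `write_canary` and `read_replica` functions supplied by your store's client:

```python
import time
import uuid

# Convergence probe sketch for an eventual-consistency SLA. `write_canary` and
# `read_replica` are hypothetical client functions you would supply.

CONVERGENCE_BOUND_S = 5.0  # from the requirement: converge within 5 seconds

def measure_convergence(write_canary, read_replica, replica_ids):
    token = str(uuid.uuid4())  # unique canary value for this probe run
    start = time.monotonic()
    write_canary(token)
    pending = set(replica_ids)
    while pending and time.monotonic() - start < CONVERGENCE_BOUND_S:
        # Drop replicas that now return the canary; keep polling the rest.
        pending = {r for r in pending if read_replica(r) != token}
        time.sleep(0.05)
    elapsed = time.monotonic() - start
    if pending:
        raise AssertionError(f"replicas {sorted(pending)} not converged after {elapsed:.2f}s")
    return elapsed  # report to monitoring as the observed convergence time
```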
The CAP theorem and its extension PACELC formalize the trade-offs that consistency requirements must navigate.
CAP Theorem:
In a distributed system, you can have at most two of three properties: Consistency (every read sees the most recent write), Availability (every request receives a non-error response), and Partition tolerance (the system continues operating despite dropped or delayed messages between nodes).
The CAP Reality:
Since network partitions will happen (P is required), CAP really asks: During a partition, do you prefer Consistency or Availability?
┌────────────────────────────────────────────────────────────┐
│                  Network Partition Occurs                  │
│                                                            │
│    [Server A] ════╳════ [Server B]                         │
│     Has write           Doesn't have write                 │
│                                                            │
│  Option CP: Server B returns error (consistent)            │
│  Option AP: Server B returns stale data (available)        │
└────────────────────────────────────────────────────────────┘
CP (Consistency over Availability): During a partition, nodes that cannot confirm they hold the latest data reject requests. Users may see errors, but never stale or conflicting data.
AP (Availability over Consistency): During a partition, every node keeps serving requests from the data it has. Users always get an answer, but it may be stale, and concurrent writes may conflict and need resolution later.
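The two options differ in a single branch of the read path. A minimal sketch, where `replica_is_partitioned` stands in for real failure detection (for example, lost heartbeats to the primary):

```python
# The same read under the two policies; all names are illustrative.

class Unavailable(Exception):
    """CP choice: refuse to answer rather than risk answering wrong."""

def read_cp(replica, key, replica_is_partitioned):
    if replica_is_partitioned():
        # Cannot prove this replica holds the latest write: fail the request
        # and let the client retry against the majority partition.
        raise Unavailable("replica partitioned; retry elsewhere")
    return replica[key]

def read_ap(replica, key, replica_is_partitioned):
    # Always answer, even though the value may be stale during the partition.
    value = replica.get(key)
    maybe_stale = replica_is_partitioned()
    return value, maybe_stale  # callers may surface staleness to the user
```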
| Use Case | CAP Choice | Rationale |
|---|---|---|
| Banking transactions | CP | Wrong balance is unacceptable; prefer errors |
| Inventory management | CP | Overselling is costly; prefer rejection |
| Social media feed | AP | Stale posts are fine; unavailability isn't |
| Gaming leaderboard | AP | Slightly outdated scores are acceptable |
| Medical records | CP | Wrong data could harm patients |
| Shopping cart | AP | Availability drives revenue; merge conflicts later |
PACELC: The Complete Picture:
CAP only describes behavior during partitions. PACELC extends this:
Partition → Availability vs Consistency
Else (no partition) → Latency vs Consistency
Even without partitions, there's a trade-off between latency and consistency:
| System Type | Partition Behavior | Normal Behavior |
|---|---|---|
| PA/EL | Availability | Low Latency |
| PA/EC | Availability | Strong Consistency |
| PC/EL | Consistency | Low Latency |
| PC/EC | Consistency | Strong Consistency |
Most modern systems are PA/EL: they favor availability during partitions and low latency in normal operation, accepting weaker consistency in both cases. Dynamo-style stores such as Cassandra and DynamoDB are the canonical examples.
CAP describes behavior during partitions—which are rare. Most of the time, systems can provide both consistency AND availability. Your requirements should specify: 1) Normal operation behavior, 2) Partition behavior, 3) Recovery behavior after partition heals.
Not every operation needs the same consistency level. Sophisticated systems apply different consistency requirements to different operations based on business impact.
Multi-Consistency Architecture:
A single application might use:
Consistency Requirement Specification by Entity:
# Consistency Requirements by Operation Type

## Tier 1: Strong Consistency Required

### Financial Operations
- Account balance reads: Strong consistency
- Transfer execution: Linearizable transactions
- Reconciliation: Point-in-time snapshot consistency
- Rationale: Incorrect financial data is unacceptable

### Inventory Operations
- Stock level checks before purchase: Strong consistency
- Inventory decrement: Atomic with purchase
- Rationale: Overselling creates fulfillment failures

## Tier 2: Session Consistency Required

### User Profile Operations
- Profile reads after update: Read-your-writes within session
- Preference changes: Visible immediately to user
- Cross-session: Eventual (acceptable staleness: 30s)
- Rationale: User should see their own changes

### Shopping Cart
- Cart modifications: Session consistency
- Cart reads: Session consistency
- Cross-device sync: Eventual (5 minute convergence)
- Rationale: Cart abandonment if user loses items

## Tier 3: Causal Consistency Required

### Messaging
- Message ordering within thread: Causal
- Read receipts: Causal (follow message they acknowledge)
- Rationale: Out-of-order messages confuse users

### Comments and Replies
- Reply ordering: Causal (reply after parent)
- Edit visibility: Monotonic reads
- Rationale: Conversation flow must make sense

## Tier 4: Eventual Consistency Acceptable

### Recommendations
- Consistency model: Eventual
- Acceptable staleness: Up to 1 hour
- Rationale: Slightly outdated recommendations are fine

### Analytics Dashboards
- Consistency model: Eventual
- Acceptable staleness: Up to 15 minutes
- Rationale: Analytics don't drive real-time decisions

### Social Feed
- Consistency model: Eventual
- Acceptable staleness: Up to 30 seconds
- Read-your-writes: Yes (user sees own posts immediately)
- Rationale: Freshness preference over strict ordering

Tunable Consistency:
Some databases (Cassandra, DynamoDB) offer tunable consistency per query:
| Consistency Level | Reads From | Writes To | Use When |
|---|---|---|---|
| ONE | Any 1 replica | 1 replica | Highest performance, lowest consistency |
| QUORUM | Majority of replicas | Majority of replicas | Balance of consistency and performance |
| ALL | All replicas | All replicas | Highest consistency, lowest availability |
| LOCAL_QUORUM | Majority in local DC | Majority in local DC | Multi-region with DC-local consistency |
With tunable consistency, your requirements should specify per-operation levels, as in the list below; a short driver-level sketch follows it:
Consistency Level by Operation:
- User authentication: QUORUM read, QUORUM write
- Session creation: LOCAL_QUORUM write
- Product catalog read: ONE (cached)
- Inventory update: ALL write, QUORUM read
- Order creation: ALL write, ALL read
- Activity log: ONE write
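As an illustration, Cassandra's Python driver lets you set these levels per statement. A minimal sketch, assuming a hypothetical `shop` keyspace with matching tables:

```python
from datetime import datetime, timezone

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # keyspace and schemas are illustrative

# QUORUM read for authentication: a majority of replicas must answer.
auth_read = SimpleStatement(
    "SELECT password_hash FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(auth_read, ("user-123",)).one()

# ONE write for the low-value activity log: a single replica acknowledges.
log_write = SimpleStatement(
    "INSERT INTO activity_log (user_id, ts, event) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(log_write, ("user-123", datetime.now(timezone.utc), "login"))
```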
Future engineers will question why you chose eventual consistency for operation X, or strong consistency for operation Y. Document the rationale: 'Recommendations use eventual consistency because 30-second staleness doesn't impact purchase decisions, while reducing read latency by 70%.'
When choosing availability over consistency (AP systems), conflicts will occur. Your requirements must specify how conflicts are detected and resolved.
Types of Conflicts:
| Conflict Type | Description | Example |
|---|---|---|
| Write-Write | Two concurrent writes to same key | Two users update same doc |
| Read-Write | Read during concurrent write | Balance check during transfer |
| Delete-Update | Delete and update to same record | Cancel and modify same order |
| Constraint violation | Multiple operations violate invariant | Two last-item purchases |
Conflict Resolution Strategies:
| Strategy | How It Works | Pros | Cons | Use When |
|---|---|---|---|---|
| Last-Write-Wins (LWW) | Latest timestamp wins | Simple, no user intervention | Can lose data | Non-critical, idempotent writes |
| First-Write-Wins | First write preserved | Prevents overwrites | Later updates lost | Creation-only data |
| Merge | Combine both values | No data loss | Complex merging logic | Additive operations |
| Custom resolution | Application-specific logic | Business rules enforced | Complex implementation | Domain-specific needs |
| User resolution | Present choices to user | User controls outcome | Poor UX at scale | High-value, rare conflicts |
| CRDTs | Mathematically conflict-free | Automatic convergence | Limited data types | Counters, sets, maps |
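The last row deserves a concrete picture. Below is a minimal G-Counter sketch (illustrative, not a production CRDT library): each node increments only its own slot, and merging takes the per-node maximum, so replicas converge regardless of the order or number of merges.

```python
# Minimal G-Counter CRDT sketch: increments never conflict because each node
# writes only its own slot; the counter's value is the sum of all slots.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> that node's local increment total

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge no matter how merges are ordered or repeated.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas diverge, then converge after merging in either order:
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(); a.increment(); b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3
```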
Conflict Resolution Requirements Specification:
Conflict Resolution Requirements:
1. Shopping Cart Conflicts
- Detection: Version vectors per cart
- Resolution: Merge (union of items, max quantity per item; sketched after this list)
- Rationale: Losing items frustrates users; duplicates can be removed
2. User Profile Conflicts
- Detection: Timestamp per field
- Resolution: Last-Write-Wins per field, not per document (sketched after this list)
- Rationale: Users update different fields; field-level LWW preserves most data
3. Inventory Conflicts
- Detection: Local count vs expected count
- Resolution: Prevent if would go negative; alert if counts disagree
- Rationale: Financial accuracy required; human reconciliation for disputes
4. Comment Ordering Conflicts
- Detection: Causal dependency tracking (vector clocks)
- Resolution: Topological sort; concurrent comments shown in timestamp order
- Rationale: Replies must follow parents; siblings can be reordered
5. Distributed Counter Conflicts
- Data type: G-Counter CRDT (grow-only counter)
- Resolution: Automatic merge (sum of per-node values)
- Rationale: Like counts, view counts should never lose increments
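A minimal sketch of the first two policies above, assuming carts are dicts of item_id to quantity and profiles are dicts of field to (timestamp, value) pairs:

```python
# Policy 1: union of items, max quantity per item; nothing is lost, and
# duplicates can be cleaned up by the user afterwards.
def merge_carts(cart_a, cart_b):
    merged = dict(cart_a)
    for item, qty in cart_b.items():
        merged[item] = max(merged.get(item, 0), qty)
    return merged

# Policy 2: last-write-wins per field, so concurrent edits to different
# fields both survive; only same-field races lose the older write.
def merge_profiles(profile_a, profile_b):
    merged = dict(profile_a)
    for field, (ts, value) in profile_b.items():
        if field not in merged or ts > merged[field][0]:
            merged[field] = (ts, value)
    return merged
```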
Conflict resolution code often goes untested because conflicts are rare in development. Require: 'Conflict scenarios shall be covered by integration tests simulating concurrent operations. Chaos testing shall inject network partitions to validate conflict detection and resolution.'
Consistency requirements intersect with durability requirements through read and write concerns—how many replicas must acknowledge operations before they're considered complete.
Write Concerns:
Write concern specifies how many replicas must confirm a write before success is returned:
| Write Concern | Behavior | Durability Risk | Latency |
|---|---|---|---|
| w=0 (Fire and forget) | No acknowledgment | High (can lose write) | Lowest |
| w=1 | Primary acknowledges | Medium (primary loss = data loss) | Low |
| w=majority | Majority of replicas acknowledge | Low | Medium |
| w=all | All replicas acknowledge | Lowest | Highest |
| j=true (journaled) | Acknowledged only after the write reaches the on-disk journal | Very low | Medium-High |
Read Concerns:
Read concern specifies what data a read operation can see:
| Read Concern | Behavior | Consistency | Performance |
|---|---|---|---|
| Local | Reads from instance | May read uncommitted | Highest |
| Available | Reads what's available | May read stale | High |
| Majority | Reads majority-committed data | Strong in cluster | Medium |
| Linearizable | Reads real-time, confirmed data | Strongest | Lowest |
| Snapshot | Point-in-time consistent | Transaction consistent | N/A |
Combined Requirements Specification:
Read and write concerns combine to determine effective consistency; the requirements below pair them per operation, with a client-level sketch after the list:
Read/Write Concern Requirements:
1. Financial Transactions
- Write concern: majority, journaled
- Read concern: majority
- Rationale: Cannot lose transactions; reads must be committed
2. User Sessions
- Write concern: majority
- Read concern: local (performance)
- Rationale: Sessions are short-lived; rare read of stale session is acceptable
3. Audit Logs
- Write concern: majority, journaled
- Read concern: majority
- Rationale: Audit records must be durable and complete
4. Cache Writes (to persistent cache)
- Write concern: 1
- Read concern: local
- Rationale: Cache can be rebuilt; performance is priority
5. Distributed Analytics
- Write concern: 1 (local DC)
- Read concern: available
- Rationale: Analytics tolerate some data loss for throughput
Default (unspecified operations):
- Write concern: majority
- Read concern: local
- Rationale: Balance of durability and performance
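In MongoDB terms, these tiers map onto per-collection write and read concerns. A minimal PyMongo sketch, with illustrative database and collection names:

```python
from pymongo import MongoClient, WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017")
db = client.shop  # database and collection names are illustrative

# Tier 1: financial transactions. Majority, journaled writes; majority reads.
ledger = db.get_collection(
    "ledger",
    write_concern=WriteConcern(w="majority", j=True),
    read_concern=ReadConcern("majority"),
)

# Tier 4: rebuildable cache. Acknowledge on one node; read local data.
cache = db.get_collection(
    "cache",
    write_concern=WriteConcern(w=1),
    read_concern=ReadConcern("local"),
)

ledger.insert_one({"account": "acct-1", "delta": -100})
print(cache.find_one({"_id": "hot-key"}))
```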
If reads require acknowledgment from R replicas and writes from W replicas, you get strong consistency when R + W > N (the total number of replicas). Example: with 3 replicas, quorum reads (R=2) and quorum writes (W=2) give 2+2=4 > 3, so at least one replica participates in both the read and the write, guaranteeing the read sees the latest write.
Consistency is difficult to observe directly—you often don't know you have a consistency problem until a user reports seeing strange behavior. Proactive monitoring is essential.
Consistency Monitoring Requirements:
- Replication lag: Track lag per replica; alert when lag exceeds the convergence SLA
- Conflict rate: Track detected conflicts per operation type; alert on spikes above baseline
- Staleness probes: Periodically write canary values and measure how long replicas take to agree
- Invariant checks: Continuously verify business invariants (e.g., inventory never negative, debits match credits)
Consistency Testing Requirements:
1. Unit Tests
- Test conflict resolution logic with deterministic conflict scenarios
- Coverage: All merge functions, LWW tie-breakers, CRDT operations
2. Integration Tests
- Simulate concurrent writes from multiple clients (see the sketch after this list)
- Verify final state matches conflict resolution policy
- Test read-your-writes within session
3. Chaos Testing
- Inject network partitions between replicas
- Verify system behavior matches CAP choice (CP or AP)
- After partition heals, verify convergence within SLA
4. Load Testing with Consistency Validation
- Under sustained load, continuously verify:
- Replication lag stays within bounds
- Conflict rate doesn't spike
- Invariants hold
5. Production Verification
- Continuously run consistency probes in production
- Compare replica states periodically
- Shadow-read from multiple replicas and compare
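As a concrete starting point for item 2, here is a sketch of a concurrent-write integration test, assuming a hypothetical `store` client fixture whose cart API applies the merge policy specified earlier:

```python
import threading

# Fire concurrent writes at the same key and assert the final state matches
# the declared resolution policy (here, cart merge: union of items).

def test_concurrent_cart_updates_merge(store):
    barrier = threading.Barrier(2)

    def add_item(item, qty):
        barrier.wait()  # release both writers together to force a real race
        store.update_cart("cart-42", {item: qty})

    t1 = threading.Thread(target=add_item, args=("book", 1))
    t2 = threading.Thread(target=add_item, args=("pen", 2))
    t1.start(); t2.start(); t1.join(); t2.join()

    final = store.read_cart("cart-42", consistency="strong")
    # Merge policy: union of items; neither concurrent write may be lost.
    assert final == {"book": 1, "pen": 2}
```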
Consider requiring Jepsen-style testing for critical consistency guarantees. 'Linearizability claims shall be validated using automated consistency testing that injects failures and network partitions, verifying that no linearizability violations occur across 1000+ test runs.'
We have covered the complete framework for consistency requirements. The essential takeaways:
- Consistency is a spectrum of models, from linearizable to eventual; specify where each operation sits rather than one level for the whole system.
- CAP forces a choice between consistency and availability only during partitions; PACELC adds the everyday latency-versus-consistency trade-off.
- Always bound "eventual" with a convergence time, and document the rationale behind each consistency choice.
- When you choose availability, specify how conflicts are detected and resolved for each data type.
- Write and read concerns (and R + W > N quorums) are the levers that implement your chosen guarantees.
- Consistency guarantees must be validated: conflict simulations, chaos testing with injected partitions, and continuous production probes.
What's Next:
With consistency requirements mastered, we turn to the final non-functional dimension: Security Requirements. While consistency determines the correctness of your data, security determines who can access and modify it. Security requirements span authentication, authorization, data protection, and compliance—often the most legally consequential non-functional requirements you'll specify.
You now have a comprehensive framework for defining consistency requirements. These specifications determine your data architecture, replication strategy, and database selection. In the next page, we'll complete the non-functional requirements with security—the requirements that determine who can access your consistent, available, fast, scalable system.