Now that we understand the CAP theorem at a theoretical level, we face the practical question every distributed systems architect must answer: When a network partition occurs, should my system prioritize consistency or availability?
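This decision is not purely technical. It has profound implications for user experience, business operations, legal compliance, and operational complexity. The wrong choice can be costly in either direction: needless inconsistency risks financial loss, regulatory violations, and data corruption, while needless unavailability risks lost revenue, user abandonment, and SLA violations.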
This page provides a rigorous framework for making this decision. We'll examine the criteria that drive CP vs AP choices, analyze how leading companies have made these decisions, and develop practical heuristics you can apply to your own systems.
By the end of this page, you will have a systematic approach for determining whether a system—or specific operations within a system—should prioritize consistency or availability. You'll understand the business and technical factors that drive these decisions in real-world systems.
CP systems sacrifice availability during partitions to ensure that all operations observe a consistent view of the data. Some requests will fail or block while a partition lasts, but clients never observe conflicting data.
Ask: "If two nodes have different views of the data during a partition, and both accept writes, would the resulting inconsistency cause unacceptable harm?"
If yes → Choose CP.
Consider a simple bank account with a $100 balance. Two users attempt withdrawals simultaneously during a network partition:
AP System (Dangerous for Banking):
CP System (Correct for Banking):
The business trade-off: some withdrawal requests failing during network issues is acceptable; uncontrolled overdrafts are not.
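To make the banking scenario concrete, here is a minimal sketch of the two behaviours. It is an illustrative in-memory model, not a real banking API: `AccountReplica`, `withdrawLocally`, `withdrawWithQuorum`, and `simulatePartition` are hypothetical names, and the $80 withdrawal amounts and the 3-replica cluster split into groups of 2 and 1 are assumptions chosen for illustration.

```typescript
// Hypothetical in-memory model: one account replica per partition side.
type WithdrawalOutcome =
  | { ok: true; remaining: number }
  | { ok: false; reason: string };

class AccountReplica {
  constructor(public balance: number) {}
}

// AP behaviour: accept the withdrawal against the local copy, no coordination.
function withdrawLocally(replica: AccountReplica, amount: number): WithdrawalOutcome {
  if (replica.balance < amount) {
    return { ok: false, reason: "insufficient funds" };
  }
  replica.balance -= amount;
  return { ok: true, remaining: replica.balance };
}

// CP behaviour: refuse the write unless this side can reach a majority of replicas.
function withdrawWithQuorum(
  replica: AccountReplica,
  amount: number,
  reachableReplicas: number,
  clusterSize: number
): WithdrawalOutcome {
  const majority = Math.floor(clusterSize / 2) + 1;
  if (reachableReplicas < majority) {
    return { ok: false, reason: "no quorum, retry after the partition heals" };
  }
  return withdrawLocally(replica, amount);
}

// Two users each try to withdraw $80 from a $100 account, one on each side
// of a partition that splits a 3-replica cluster into groups of 2 and 1.
function simulatePartition(mode: "AP" | "CP"): { dispensed: number; rejected: number } {
  const clusterSize = 3;
  const amount = 80;
  const sides = [
    { replica: new AccountReplica(100), reachableReplicas: 2 },
    { replica: new AccountReplica(100), reachableReplicas: 1 },
  ];
  let dispensed = 0;
  let rejected = 0;
  for (const side of sides) {
    const outcome =
      mode === "AP"
        ? withdrawLocally(side.replica, amount)
        : withdrawWithQuorum(side.replica, amount, side.reachableReplicas, clusterSize);
    if (outcome.ok) {
      dispensed += amount;
    } else {
      rejected += 1;
    }
  }
  return { dispensed, rejected };
}
```

Under these assumptions, `simulatePartition("AP")` dispenses $160 from a $100 account, while `simulatePartition("CP")` dispenses $80 and rejects the request that arrives on the minority side.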
Choose CP when the consequences of inconsistency are more severe than the consequences of temporary unavailability. If wrong data is worse than no data, CP is your answer.
AP systems continue serving requests during partitions, accepting that different nodes may have temporarily divergent views of the data. Conflicts are resolved after the partition heals.
Ask: "If users receive stale or conflicting data during a partition, can the system recover gracefully when the partition heals? Is temporary inconsistency tolerable?"
If yes → Choose AP.
Amazon's Dynamo paper famously advocated for AP architecture in shopping carts. The reasoning:
The problem with CP for carts:
The AP solution:
The business trade-off: Having an extra item in a cart (user removes it) is far less costly than a user unable to add items and leaving the site. The conversion rate impact of unavailability dwarfs the minor friction of occasional cart merges.
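A minimal sketch of the merge-on-heal idea follows, assuming a cart can be modelled as a map from item ID to quantity. The `CartSnapshot` type and `mergeCartSnapshots` function are hypothetical names for this illustration, not Dynamo's actual API, and the union-with-larger-quantity rule is just one simple merge policy.

```typescript
// Assumption for this sketch: a cart is a map from item ID to quantity.
type CartSnapshot = Record<string, number>;

// Merge divergent cart versions by taking the union of their items and,
// for items present in several versions, the larger quantity.
function mergeCartSnapshots(...snapshots: CartSnapshot[]): CartSnapshot {
  const merged: CartSnapshot = {};
  for (const snapshot of snapshots) {
    for (const itemId of Object.keys(snapshot)) {
      const quantity = snapshot[itemId] ?? 0;
      merged[itemId] = Math.max(merged[itemId] ?? 0, quantity);
    }
  }
  return merged;
}

// Before the partition the cart held one pen. During the partition, one replica
// saw the user remove the pen and add a book; the other saw the user add a lamp.
const replicaOneCart: CartSnapshot = { book: 1 };
const replicaTwoCart: CartSnapshot = { pen: 1, lamp: 1 };
const healedCart = mergeCartSnapshots(replicaOneCart, replicaTwoCart);
// healedCart holds book, pen, and lamp: no addition is lost, but the removed pen resurfaces.
```

The resurfacing pen is the "extra item" cost described above: a union merge never loses an addition, at the price of occasionally undoing a removal.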
Choosing AP requires a conflict resolution strategy for when the partition heals:
| Strategy | Description | Best For |
|---|---|---|
| Last-Write-Wins (LWW) | Most recent timestamp wins | Simple, low-conflict data |
| First-Write-Wins | Original value preserved | Immutable-ish data |
| Merge | Combine conflicting values | Sets, counters, accumulators |
| Application-specific | Custom logic | Complex domain rules |
| CRDTs | Mathematically guaranteed merge | Collaborative apps |
Choose AP when unavailability causes more user harm than inconsistency. If a degraded experience is better than no experience, and conflicts can be resolved later, AP is your answer.
Rather than relying on intuition, use a structured framework to evaluate CP vs AP trade-offs. This framework examines multiple dimensions of the problem.
| Factor | Favors CP | Favors AP |
|---|---|---|
| Cost of Inconsistency | Financial loss, regulatory violation, data corruption | Minor user inconvenience, self-correcting errors |
| Cost of Unavailability | Users can wait or retry | Revenue loss, user abandonment, SLA violations |
| Conflict Resolution | Complex or impossible to merge | Simple merge strategy exists (LWW, CRDTs) |
| Data Criticality | Source of truth, authoritative records | Derived data, caches, aggregations |
| User Expectations | "My money must be correct" | "I want it to work even if imperfect" |
| Recovery Path | No good way to fix bad data | Conflicts detectable and resolvable |
| Partition Frequency | Rare (can absorb brief outages) | Frequent (must operate through them) |
| Read/Write Ratio | Write-heavy (conflicts likely) | Read-heavy (stale reads acceptable) |
For any system or operation, answer these questions:
1. What happens if data is inconsistent during a partition?
2. What happens if the system is unavailable during a partition?
3. Can conflicts be detected and resolved after the partition heals?
4. What are the business and regulatory constraints?
5. What do users expect?
    CP vs AP DECISION FLOWCHART
    ════════════════════════════════════════════════════════════════

                  ┌────────────────────────────────┐
                  │  Can inconsistency cause       │
                  │  financial/legal/safety harm?  │
                  └───────────────┬────────────────┘
                                  │
                  ┌───────────────┴───────────────┐
                  ▼                               ▼
                 YES                              NO
                  │                               │
                  ▼                               ▼
           ┌──────────────┐           ┌───────────────────────┐
           │  Strong CP   │           │  Is there a natural   │
           │  Required    │           │  conflict resolution? │
           └──────────────┘           └───────────┬───────────┘
                                                  │
                                      ┌───────────┴───────────┐
                                      ▼                       ▼
                                     YES                      NO
                                      │                       │
                                      ▼                       ▼
                              ┌───────────────┐     ┌──────────────────┐
                              │ Does downtime │     │ Lean toward CP;  │
                              │ cause harm?   │     │ manual conflicts │
                              └───────┬───────┘     │ are expensive    │
                                      │             └──────────────────┘
                          ┌───────────┴───────────┐
                          ▼                       ▼
                         YES                      NO
                          │                       │
                          ▼                       ▼
                   ┌──────────────┐        ┌──────────────┐
                   │ AP Preferred │        │ CP Preferred │
                   │ with merge   │        │              │
                   └──────────────┘        └──────────────┘

Few systems are purely CP or AP. A typical e-commerce platform might use CP for payments and inventory, AP for product catalogs and reviews, and different consistency levels for different read operations. The framework applies per operation or data type, not globally.
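The flowchart's three questions can also be written as a small function and applied per operation. This is only a sketch of the heuristic: the `OperationProfile` fields and the `choosePartitionStrategy` name are hypothetical, and the three boolean answers still have to come from your own analysis.

```typescript
type PartitionStrategy =
  | "Strong CP required"
  | "Lean toward CP"
  | "AP preferred with merge"
  | "CP preferred";

// Hypothetical description of one operation or data type.
interface OperationProfile {
  inconsistencyCausesSeriousHarm: boolean; // financial, legal, or safety harm
  hasNaturalConflictResolution: boolean;   // e.g. a merge rule, LWW, or a CRDT fits
  downtimeCausesHarm: boolean;             // revenue loss, abandonment, SLA breach
}

// Walks the same three questions as the flowchart, in the same order.
function choosePartitionStrategy(profile: OperationProfile): PartitionStrategy {
  if (profile.inconsistencyCausesSeriousHarm) {
    return "Strong CP required";
  }
  if (!profile.hasNaturalConflictResolution) {
    return "Lean toward CP"; // manual conflict handling is expensive
  }
  return profile.downtimeCausesHarm ? "AP preferred with merge" : "CP preferred";
}

// Example profiles (the answers here are illustrative assumptions).
const paymentStrategy = choosePartitionStrategy({
  inconsistencyCausesSeriousHarm: true,
  hasNaturalConflictResolution: false,
  downtimeCausesHarm: true,
});
const reviewStrategy = choosePartitionStrategy({
  inconsistencyCausesSeriousHarm: false,
  hasNaturalConflictResolution: true,
  downtimeCausesHarm: true,
});
```

Here `paymentStrategy` is `"Strong CP required"` and `reviewStrategy` is `"AP preferred with merge"`, matching the per-operation split described for an e-commerce platform.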
Let's examine how major companies have made CP vs AP decisions and the reasoning behind their choices.
Notice that Netflix uses different consistency models for different data:
| Data Type | Consistency | Rationale |
|---|---|---|
| Content catalog | Eventual | Static data, cached heavily |
| Personalization | Eventual | Stale recommendations still work |
| Playback session | Session-level | User shouldn't lose progress mid-movie |
| Billing/Entitlements | Strong | Must not stream unpaid content |
| Account credentials | Strong | Login must be accurate |
This is the mature approach: analyzing each data domain and applying the appropriate consistency model, rather than one-size-fits-all.
Read engineering blogs from Netflix, Uber, Airbnb, and other companies. They publish detailed explanations of their consistency choices, providing invaluable insight into how these decisions are made at scale.
Most production systems don't fit neatly into CP or AP—they employ mixed strategies that vary consistency by operation, data type, or client requirements.
For many applications, users expect to see their own writes immediately, but don't need to see others' writes instantly.
Implementation:
    // Client-side read-your-writes.
    // `db` and `session` are assumed to be provided by the application.
    async function updateProfile(userId, data) {
      const result = await db.write('profiles', userId, data);
      // Store the write timestamp in the session
      session.lastWriteTimestamp = result.timestamp;
      return result;
    }

    async function getProfile(userId) {
      return await db.read('profiles', userId, {
        // Read from a replica that has caught up to our last write
        minTimestamp: session.lastWriteTimestamp || 0
      });
    }
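The snippet above depends on a database client that honours `minTimestamp`. To show what such a client might do internally, here is a self-contained, in-memory sketch. `ReplicatedProfileStore` and its methods are hypothetical, and the single lagging replica with a logical clock is a simplifying assumption.

```typescript
interface VersionedRecord {
  value: string;
  timestamp: number;
}

interface ReadOutcome {
  record: VersionedRecord | undefined;
  servedBy: "replica" | "primary";
}

// Hypothetical store with one primary and one asynchronously updated replica.
class ReplicatedProfileStore {
  private primary: Record<string, VersionedRecord> = {};
  private replica: Record<string, VersionedRecord> = {};
  private logicalClock = 0;
  private replicaAppliedUpTo = 0;

  // Writes go to the primary only; the replica lags until catchUpReplica() runs.
  write(key: string, value: string): VersionedRecord {
    this.logicalClock += 1;
    const record: VersionedRecord = { value, timestamp: this.logicalClock };
    this.primary[key] = record;
    return record;
  }

  // Simulates replication finishing.
  catchUpReplica(): void {
    this.replica = { ...this.primary };
    this.replicaAppliedUpTo = this.logicalClock;
  }

  // Serve from the replica only if it has applied everything up to minTimestamp;
  // otherwise fall back to the primary so callers always see their own writes.
  read(key: string, minTimestamp: number): ReadOutcome {
    if (this.replicaAppliedUpTo >= minTimestamp) {
      return { record: this.replica[key], servedBy: "replica" };
    }
    return { record: this.primary[key], servedBy: "primary" };
  }
}
```

A session that passes the timestamp of its own last write is routed to the primary while the replica lags, whereas a session with no recent writes (`minTimestamp` of 0) can still be served, possibly stale, from the replica.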
Different data naturally has different consistency requirements.
Implementation:
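One way to sketch this is a single policy table that maps each data domain to a consistency level, so the choice is made once and looked up everywhere. The domain names follow the e-commerce example used earlier on this page; `ConsistencyLevel`, `consistencyPolicy`, and `readOptionsFor` are hypothetical names, not part of any particular database client.

```typescript
type ConsistencyLevel = "strong" | "eventual";
type DataDomain = "payments" | "inventory" | "productCatalog" | "reviews";

// Central, explicit record of the consistency choice for each data domain.
const consistencyPolicy: Record<DataDomain, ConsistencyLevel> = {
  payments: "strong",         // CP: wrong data is worse than no data
  inventory: "strong",        // CP: must not oversell
  productCatalog: "eventual", // AP: stale catalog entries are tolerable
  reviews: "eventual",        // AP: stale reviews are tolerable
};

// Derives per-read options from the policy instead of hard-coding them at call sites.
function readOptionsFor(domain: DataDomain): { consistency: ConsistencyLevel; allowStaleCache: boolean } {
  const consistency = consistencyPolicy[domain];
  return { consistency, allowStaleCache: consistency === "eventual" };
}
```

Keeping the mapping in one place also gives reviewers a single spot to challenge a choice, rather than hunting for consistency flags scattered across call sites.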
Databases like Cassandra and DynamoDB allow consistency levels per query.
Implementation:
    // Example: E-commerce platform with mixed consistency.
    // `cache`, `catalog`, `inventory`, `reviews`, and `db` are assumed to be
    // application-level clients defined elsewhere.
    class ProductService {
      // Catalog data: AP (eventually consistent), cached aggressively
      async getProductDetails(productId: string): Promise<Product> {
        return await cache.getOrFetch(
          `product:${productId}`,
          () => catalog.get(productId, { consistency: 'eventual' }),
          { ttl: 60 } // 1-minute cache is fine
        );
      }

      // Inventory check: CP (strongly consistent) for reservation
      async reserveInventory(productId: string, quantity: number): Promise<boolean> {
        return await inventory.reserve(productId, quantity, {
          consistency: 'strong', // Must not oversell
          timeout: 5000          // Accept brief unavailability
        });
      }

      // Reviews: AP (eventually consistent), user-generated content
      async getReviews(productId: string): Promise<Review[]> {
        return await reviews.list(productId, {
          consistency: 'eventual' // Stale reviews are fine
        });
      }

      // Add review: Read-your-writes for author
      async addReview(productId: string, authorId: string, review: ReviewInput): Promise<Review> {
        const result = await reviews.create({
          productId,
          authorId,
          ...review
        }, { consistency: 'quorum' }); // Strong enough for read-your-writes

        // Clear author's cache so they see their own review
        await cache.invalidate(`reviews:${productId}:author:${authorId}`);
        return result;
      }
    }

    // Order processing: CP (strongly consistent, transactional)
    class OrderService {
      async placeOrder(userId: string, cart: Cart): Promise<Order> {
        return await db.transaction(async (tx) => {
          // Check inventory (strong read)
          for (const item of cart.items) {
            const available = await tx.query(
              'SELECT quantity FROM inventory WHERE product_id = $1 FOR UPDATE',
              [item.productId]
            );
            if (available.quantity < item.quantity) {
              throw new InsufficientInventoryError(item.productId);
            }
          }

          // Decrement inventory
          for (const item of cart.items) {
            await tx.query(
              'UPDATE inventory SET quantity = quantity - $1 WHERE product_id = $2',
              [item.quantity, item.productId]
            );
          }

          // Create order
          const order = await tx.query(
            'INSERT INTO orders (user_id, items, total) VALUES ($1, $2, $3) RETURNING *',
            [userId, cart.items, cart.total]
          );
          return order;
        }, { isolation: 'serializable' }); // Maximum consistency for orders
      }
    }

Document your consistency choices explicitly. Future engineers (and future you) need to understand why each operation uses its particular consistency level. Code comments, architecture decision records (ADRs), and runbooks should capture this.
Having decided on CP or AP for a given operation, implementation choices determine how well your system realizes that intent.
The most common CP implementation uses quorum reads and writes.
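Rule: If R + W > N, where N is the number of replicas, R is the number of replicas that must respond to a read, and W is the number of replicas that must acknowledge a write, then every read quorum overlaps every write quorum in at least one replica. Reads are therefore guaranteed to see the latest write (assuming no concurrent writes).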
Example with N=3:
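One way to make the rule concrete is to check it by brute force for a small cluster. The sketch below uses hypothetical names (`QuorumConfig`, `quorumsOverlap`, `everyReadQuorumSeesWrite`, `toleratedFailures`) and simply enumerates every possible read and write quorum as a bitmask to confirm that they intersect exactly when R + W > N.

```typescript
interface QuorumConfig {
  n: number; // total replicas
  r: number; // replicas that must answer a read
  w: number; // replicas that must acknowledge a write
}

// The arithmetic rule: read and write quorums must overlap.
function quorumsOverlap(config: QuorumConfig): boolean {
  return config.r + config.w > config.n;
}

// How many unreachable replicas each operation type can survive.
function toleratedFailures(config: QuorumConfig): { reads: number; writes: number } {
  return { reads: config.n - config.r, writes: config.n - config.w };
}

function countBits(mask: number): number {
  let count = 0;
  while (mask > 0) {
    count += mask & 1;
    mask >>= 1;
  }
  return count;
}

// Brute-force check: every possible write quorum shares at least one replica
// with every possible read quorum. Replica sets are encoded as bitmasks.
function everyReadQuorumSeesWrite(config: QuorumConfig): boolean {
  const subsetCount = 1 << config.n;
  for (let writeSet = 0; writeSet < subsetCount; writeSet++) {
    if (countBits(writeSet) !== config.w) continue;
    for (let readSet = 0; readSet < subsetCount; readSet++) {
      if (countBits(readSet) !== config.r) continue;
      if ((writeSet & readSet) === 0) return false; // disjoint quorums: stale read possible
    }
  }
  return true;
}
```

With N=3, the configuration R=2, W=2 overlaps and tolerates one unreachable replica for both reads and writes. R=1, W=3 also overlaps, but a single unreachable replica blocks writes. R=1, W=1 does not overlap, so a read may miss the latest write.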
Trade-off: Higher W improves durability but slows writes and reduces write availability. Higher R slows reads but allows lower W.
AP systems need strategies for merging divergent data:
Last-Write-Wins (LWW):
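A minimal sketch of an LWW register, assuming each write carries a timestamp and the ID of the replica that accepted it. `LwwRegister` and `mergeLww` are hypothetical names, and the replica-ID tie-break is one common convention rather than a requirement.

```typescript
interface LwwRegister<T> {
  value: T;
  timestamp: number; // assumed to come from reasonably synchronized clocks
  replicaId: string; // used only to break timestamp ties deterministically
}

// Keeps the write with the larger timestamp; the other write is discarded.
function mergeLww<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  // Same timestamp: pick by replica ID so every replica chooses the same winner.
  return a.replicaId > b.replicaId ? a : b;
}
```

LWW is simple because the merge is a single comparison, but the losing write is silently dropped and the outcome depends on clock accuracy, which is why the table above recommends it only for simple, low-conflict data.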
Vector Clocks:
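A minimal sketch of vector clock comparison, assuming each replica keeps one counter per replica ID and increments its own entry on every local write. `VectorClock`, `tickClock`, and `compareClocks` are hypothetical names for this illustration.

```typescript
// One counter per replica ID; a missing entry counts as 0.
type VectorClock = Record<string, number>;
type ClockOrdering = "before" | "after" | "equal" | "concurrent";

// Returns a new clock with the given replica's counter incremented.
function tickClock(clock: VectorClock, replicaId: string): VectorClock {
  return { ...clock, [replicaId]: (clock[replicaId] ?? 0) + 1 };
}

// "before" means `a` happened before `b`; "concurrent" means neither
// version has seen the other, so the writes genuinely conflict.
function compareClocks(a: VectorClock, b: VectorClock): ClockOrdering {
  let aAhead = false;
  let bAhead = false;
  const replicaIds = Object.keys(a).concat(Object.keys(b));
  for (const id of replicaIds) {
    const aCount = a[id] ?? 0;
    const bCount = b[id] ?? 0;
    if (aCount > bCount) aAhead = true;
    if (bCount > aCount) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}
```

Unlike LWW, this comparison does not pick a winner. It only tells you whether one version supersedes the other or whether the two are concurrent and must be handed to a merge function or to the application.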
CRDTs (Conflict-free Replicated Data Types):
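As a small illustration, here is a sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. Each replica increments only its own slot, and merging takes the per-replica maximum. `GCounterState`, `incrementCounter`, `mergeCounters`, and `counterValue` are hypothetical names, and the sketch assumes increments are always positive.

```typescript
// One non-decreasing count per replica ID.
type GCounterState = Record<string, number>;

// A replica increments only its own entry (amount is assumed to be positive).
function incrementCounter(state: GCounterState, replicaId: string, amount: number = 1): GCounterState {
  return { ...state, [replicaId]: (state[replicaId] ?? 0) + amount };
}

// Merge takes the maximum per replica, which makes it commutative,
// associative, and idempotent: replicas converge regardless of merge order.
function mergeCounters(a: GCounterState, b: GCounterState): GCounterState {
  const merged: GCounterState = { ...a };
  for (const id of Object.keys(b)) {
    merged[id] = Math.max(merged[id] ?? 0, b[id] ?? 0);
  }
  return merged;
}

// The counter's value is the sum of all per-replica counts.
function counterValue(state: GCounterState): number {
  let total = 0;
  for (const id of Object.keys(state)) {
    total += state[id] ?? 0;
  }
  return total;
}
```

Because merging the same state twice changes nothing, replicas can exchange state as often as they like after a partition heals without double counting.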
Don't assume your system behaves correctly during partitions—test it. Use chaos engineering tools to inject network partitions and verify that your CP system properly rejects requests or your AP system properly merges conflicts. Untested partition behavior is unproven partition behavior.
We've established a rigorous framework for one of distributed systems' most critical decisions. Let's consolidate the key insights.
Having established when to choose CP vs AP, the next page explores how to tune consistency for availability—the techniques and patterns that allow systems to provide both strong consistency and good availability, even if they can't have both perfectly.
You now have a systematic framework for making CP vs AP decisions. This framework—evaluating costs, conflict resolution, user expectations, and regulatory requirements—will serve you across all distributed systems you design or evaluate.