Caching introduces a fundamental tension in distributed systems: the cache is a copy, and copies can become stale. Every time you add a cache, you're accepting that reads might return data that no longer reflects the current state of the source of truth.
This isn't a bug—it's an inherent property of caching. The value proposition of caching (reduced latency, reduced load on source systems) comes precisely from serving copies instead of always querying the source. But this creates consistency challenges that, if mishandled, lead to subtle bugs, confused users, and data integrity issues.
Consider the consequences of cache inconsistency: a user might see an account balance that no longer reflects a recent deposit, a product page might show stale inventory, or a just-saved profile edit might appear to vanish on the next page load.
None of these outcomes is necessarily wrong; users tolerate brief inconsistency in many contexts. But failing to understand and manage consistency expectations leads to poor user experiences and, in some cases, serious business problems.
This page examines cache consistency challenges in depth and provides strategies for achieving the right consistency level for your requirements.
By the end of this page, you will understand the root causes of cache inconsistency, evaluate different invalidation strategies and their trade-offs, apply patterns like Cache-Aside, Write-Through, and Write-Behind correctly, and design for your specific consistency requirements without over-engineering.
Cache inconsistency occurs when the cached value differs from the source-of-truth value. Understanding why this happens is essential for designing appropriate solutions.
1. Stale Data from TTL-Based Expiration
The most common cause: data changes in the source, but the cached copy hasn't expired yet.
T0: Cache user:42 with balance=$100, TTL=300s
T1: User deposits $50, database updated to $150
T150: Read user:42 → Returns $100 (stale)
T300: Cache expires
T301: Read user:42 → Returns $150 (fresh)
From T1 until the TTL expires at T300 (roughly 300 seconds), reads returned stale data. This is the price of caching.
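To make the staleness window concrete, here is a minimal cache-aside read path in the style used throughout this page; the `cache` and `db` objects are assumed stand-ins (for example, a redis-py client and a data-access module), not a specific library.
import json

def get_user(user_id):
    """Cache-aside read: serve the cached copy until the TTL expires."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # may be stale until the TTL expires
    user = db.get_user(user_id)                   # cache miss: read the source of truth
    cache.set(key, json.dumps(user), ex=300)      # cached copy lives for up to 300s
    return user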
2. Race Conditions During Updates
Concurrent operations can cause the cache to end up with wrong data even after an update attempt.
T0: Process A reads user:42 = $100 from DB
T1: Process B updates user:42 to $150 in DB
T2: Process B invalidates cache (or writes $150)
T3: Process A writes $100 to cache (from its stale read)
Result: Cache has $100, database has $150 — and it's not a TTL issue.
3. Distributed System Lag
In distributed caches with replication, updates may not propagate instantly: a write applied on the primary node takes time to reach replicas, so a read served by a replica can return the previous value until replication catches up.
4. Network Partitions and Failures
During network issues, invalidation messages can be lost, a cache node may be unreachable at the moment you try to delete a key, and a write can succeed in the database while the corresponding cache update never arrives, leaving stale entries behind until their TTL expires.
5. Clock Skew and Ordering
In distributed systems, time-based operations (TTL, timestamps) can misbehave: a node with a fast clock may expire entries early, a node with a slow clock may serve them past their intended lifetime, and timestamp comparisons across machines can order concurrent updates incorrectly.
You cannot eliminate cache inconsistency in a distributed system—you can only manage it. The CAP theorem tells us we must choose between consistency and availability during partitions. Caching inherently trades some consistency for performance. The question is: how much inconsistency is acceptable, and for how long?
Different use cases tolerate different amounts of inconsistency. Understanding your requirements prevents both under-engineering (too much inconsistency) and over-engineering (unnecessary complexity for strong consistency).
| Level | Description | Typical Staleness | Use Cases |
|---|---|---|---|
| Strong Consistency | Cache always reflects source of truth | 0 (no staleness) | Financial transactions, inventory counts |
| Read-Your-Writes | User sees their own writes immediately | Seconds (for others) | User profile updates, settings |
| Bounded Staleness | Data guaranteed fresh within time window | Seconds to minutes | Product catalogs, content feeds |
| Eventual Consistency | Data will converge, timing undefined | Minutes to hours | Analytics, aggregated counts |
| Best Effort | May never converge in edge cases | Variable | Non-critical caching |
Strong Consistency Use Cases: financial transactions, account balances, and inventory counts, where acting on a stale value causes real harm.
Read-Your-Writes Use Cases: user profile updates, settings, and other data whose author expects to see the change reflected immediately.
Eventual Consistency Use Cases: analytics, aggregated counts, and other derived data where convergence within minutes to hours is acceptable.
• What's the worst that happens if a user sees stale data for 5 seconds? 1 minute? 1 hour?
• Which specific data items have strong consistency requirements?
• Can we use stronger consistency selectively (just for critical paths)?
• What do users actually expect, not what would be technically "correct"?
Stronger consistency typically means: more round trips to the source of truth, lower cache hit rates, more complex invalidation logic, and higher operational cost.
For most web applications, eventual consistency with bounded staleness (TTL) is sufficient. Reserve strong consistency for the specific operations that truly require it.
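One way to apply consistency selectively is a per-data-class caching policy. The sketch below is purely illustrative; the data classes, TTLs, and `bypass_cache` flag are assumptions, not values prescribed by this page.
# Hypothetical per-data-class policy: strong consistency only where it matters
CACHE_POLICY = {
    "account_balance": {"bypass_cache": True,  "ttl": 0},     # always read the source
    "user_profile":    {"bypass_cache": False, "ttl": 300},   # bounded staleness: 5 min
    "product_catalog": {"bypass_cache": False, "ttl": 3600},  # eventual consistency: 1 h
}

def read(data_class, key, load_from_db):
    policy = CACHE_POLICY[data_class]
    if policy["bypass_cache"]:
        return load_from_db(key)        # strong consistency path
    cached = cache.get(key)
    if cached is not None:
        return cached                   # may be stale within the TTL window
    value = load_from_db(key)
    cache.set(key, value, ex=policy["ttl"])
    return value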
Cache invalidation is famously "one of the two hard things in computer science" (along with naming things and off-by-one errors). Each invalidation strategy has distinct characteristics.
The simplest approach: data expires after a fixed time.
SET user:42 "{...}" EX 300 # Expires in 5 minutes
Advantages: dead simple to implement, requires no invalidation logic on the write path, and bounds staleness by the TTL, so stale entries heal themselves when they expire.
Disadvantages: data can be stale for up to the full TTL, every expiry produces a cache miss (and possibly a stampede on hot keys), and choosing the right TTL is a guess that trades freshness against hit rate.
Best Practices: derive the TTL from how much staleness the data can tolerate rather than from a default, add random jitter so hot keys don't all expire at once (see the sketch below), and combine TTLs with explicit invalidation for data that changes unpredictably.
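A minimal sketch of TTL jitter, assuming the same redis-py style `cache.set` used elsewhere on this page; the base TTL and the ±10% spread are arbitrary illustrative choices.
import random

def set_with_jitter(key, value, base_ttl=300, jitter=0.1):
    """Spread expirations so keys cached together don't all expire together."""
    ttl = int(base_ttl * random.uniform(1 - jitter, 1 + jitter))  # e.g. 270-330s
    cache.set(key, value, ex=ttl)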
Application explicitly invalidates or updates cache when data changes.
Invalidate (Delete) on Write:
def update_user(user_id, data):
database.update(user_id, data)
cache.delete(f"user:{user_id}")
# Next read will cache fresh data
Update (Write-Through) on Write:
def update_user(user_id, data):
database.update(user_id, data)
cache.set(f"user:{user_id}", data, ex=300)
# Cache immediately has fresh data
Invalidate vs Update Trade-off:
| Approach | Advantage | Disadvantage |
|---|---|---|
| Invalidate | Simpler, less code | Next read triggers cache miss |
| Update | No cache miss after write | Must serialize data correctly |
# Pattern 1: Simple Invalidation
def update_user_simple(user_id: int, data: dict):
    """Simple invalidation - delete cache on write"""
    db.update_user(user_id, data)
    cache.delete(f"user:{user_id}")

# Pattern 2: Write-Through with Retry
def update_user_write_through(user_id: int, data: dict):
    """Write-through - update cache immediately after DB"""
    db.update_user(user_id, data)
    try:
        cache.set(f"user:{user_id}", serialize(data), ex=300)
    except CacheError:
        # Cache update failed - fall back to invalidation
        cache.delete(f"user:{user_id}")

# Pattern 3: Transactional Invalidation (with cleanup)
def update_user_transactional(user_id: int, data: dict):
    """
    Ensure cache is invalidated even if DB transaction fails.
    Use a cleanup pattern for reliability.
    """
    invalidation_key = f"pending_invalidation:user:{user_id}"
    try:
        # Mark pending invalidation
        cache.set(invalidation_key, "1", ex=60)
        # Update database
        db.update_user(user_id, data)
        # Invalidate cache
        cache.delete(f"user:{user_id}")
    finally:
        # Clean up pending marker
        cache.delete(invalidation_key)

# Pattern 4: Event-Driven Invalidation
def publish_user_update(user_id: int, data: dict):
    """Publish event for asynchronous cache invalidation"""
    db.update_user(user_id, data)
    event_bus.publish("user.updated", {"user_id": user_id})

# Separate consumer handles invalidation
def handle_user_updated(event):
    cache.delete(f"user:{event['user_id']}")

Decouple cache invalidation from the write path using events:
Advantages: the write path stays fast and simple, multiple consumers (application cache, CDN, search index) can react to the same event, and failed invalidations can be retried from the queue.
Disadvantages: invalidation becomes asynchronous, so there is a window of staleness between the write and the consumer processing the event, and you take on the operational complexity of a message bus plus the risk of lost or delayed events.
Some systems use database triggers or CDC (Change Data Capture) to detect changes and trigger cache invalidation. This catches all changes, including those made outside your application. Tools like Debezium can stream database changes to Kafka for cache invalidation consumers.
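As a rough sketch of what such a CDC consumer might look like: this assumes Debezium's default JSON envelope, a hypothetical topic name `dbserver1.public.users`, and the kafka-python client; the field names and deserialization would need adapting to your actual connector configuration.
import json
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "dbserver1.public.users",                     # hypothetical Debezium topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")) if m else None,
)

for message in consumer:
    if message.value is None:                     # tombstone record, nothing to do
        continue
    change = message.value.get("payload", message.value)  # envelope may be unwrapped
    row = change.get("after") or change.get("before") or {}
    user_id = row.get("id")
    if user_id is not None:
        cache.delete(f"user:{user_id}")           # invalidate on any captured change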
Race conditions are the most insidious source of cache inconsistency. They occur intermittently, are hard to reproduce, and can cause data to be incorrect indefinitely.
The problem we saw earlier: a process reads an old value from the database, a second process updates the database and invalidates (or updates) the cache, and then the first process writes its stale value back into the cache.
Result: Cache has stale data, and the TTL won't help until the full expiration window passes, because the entry was just "refreshed."
Prevent concurrent cache population for the same key:
def get_user(user_id):
cached = cache.get(f"user:{user_id}")
if cached:
return cached
# Acquire lock before populating
lock_key = f"lock:user:{user_id}"
if cache.set(lock_key, "1", nx=True, ex=5): # Got lock
try:
user = db.get_user(user_id)
cache.set(f"user:{user_id}", serialize(user), ex=300)
return user
finally:
cache.delete(lock_key)
else:
# Someone else is populating, wait and retry
time.sleep(0.1)
return get_user(user_id)
Include a version number in cached data and only accept newer versions:
def update_user(user_id, data):
    # Atomically increment the version in the DB
    new_version = db.update_user_with_version(user_id, data)
    # Only cache if this is the latest version
    cached = cache.get(f"user:{user_id}")  # assumes a deserialize() counterpart to serialize()
    if cached and deserialize(cached)['version'] >= new_version:
        return  # Cache already holds this version or a newer one; don't overwrite
    data['version'] = new_version
    cache.set(f"user:{user_id}", serialize(data), ex=300)

def populate_cache(user_id):
    user = db.get_user(user_id)
    # Check-and-set: only write if no newer version exists
    cached = cache.get(f"user:{user_id}")
    if cached and deserialize(cached)['version'] >= user['version']:
        return  # Already have this version or a newer one
    cache.set(f"user:{user_id}", serialize(user), ex=300)
Invalidate twice with a delay to catch late writes:
def update_user(user_id, data):
db.update_user(user_id, data)
# Immediate invalidation
cache.delete(f"user:{user_id}")
# Delayed second invalidation (catches late writes)
delayed_queue.enqueue_in(1.0, "invalidate_cache", f"user:{user_id}")
This catches the race condition where a stale read was in-flight during the update. The second invalidation clears the stale data written by the late process.
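The `delayed_queue` above is a stand-in for whatever deferred-execution mechanism you already run (a Celery countdown task, a Redis-backed delay queue, and so on). For a single-process sketch, even a timer thread conveys the idea:
import threading

def schedule_delayed_invalidation(key, delay_seconds=1.0):
    """Fire a second cache delete after a short delay (single-process sketch only)."""
    threading.Timer(delay_seconds, lambda: cache.delete(key)).start()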
When many requests hit an expired key simultaneously, they all query the database (thundering herd). Solutions:
• Locking: Only one request fetches, others wait
• Probabilistic refresh: Refresh before TTL expires
• Background refresh: Separate process refreshes hot keys
• Stale-while-revalidate: Return stale data, refresh async
The probabilistic refresh approach looks like this in practice:
import time
import random
from functools import wraps

def cache_with_probabilistic_refresh(ttl: int, refresh_ahead_factor: float = 0.1):
    """
    Probabilistic early refresh to prevent cache stampede.
    As TTL approaches, increase probability of fetching fresh data.
    This spreads cache refreshes over time instead of all at expiry.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(key, *args, **kwargs):
            cached = cache.get(key)
            if cached is None:
                # Cache miss - fetch and cache with a timestamp
                value = func(key, *args, **kwargs)
                cache.set(key, {"value": value, "cached_at": time.time()}, ex=ttl)
                return value
            value, cached_at = cached['value'], cached['cached_at']
            age = time.time() - cached_at
            remaining_ttl = ttl - age
            # Probabilistic refresh: as remaining TTL shrinks, probability increases
            refresh_window = ttl * refresh_ahead_factor
            if remaining_ttl < refresh_window:
                # Probability increases as we approach expiry
                probability = 1 - (remaining_ttl / refresh_window)
                if random.random() < probability:
                    # Refresh the cache
                    fresh_value = func(key, *args, **kwargs)
                    cache.set(key, {"value": fresh_value, "cached_at": time.time()}, ex=ttl)
                    return fresh_value
            return value
        return wrapper
    return decorator

# Usage: the cache key doubles as the function's first argument
@cache_with_probabilistic_refresh(ttl=300, refresh_ahead_factor=0.2)
def get_user(user_id):
    return db.get_user(user_id)

# As the remaining TTL drops from 60s toward 0, the probability of a refresh increases.
# This prevents all requests from hitting the DB at the same moment.
Read-your-writes consistency ensures that after a user makes a change, they immediately see that change reflected—even if other users might see stale data briefly.
Without read-your-writes: a user saves a profile change, the next page load is served from a cache (or a lagging replica) that still holds the old value, and the edit appears not to have taken effect.
Even though the update succeeded, the user experience is broken.
Strategy 1: Write-Through to Cache
After updating the database, immediately update the cache:
def update_profile(user_id, data):
db.update_user(user_id, data)
cache.set(f"user:{user_id}", serialize(data), ex=300)
return data # Return fresh data to client
Subsequent reads (from any replica) will get the fresh data.
Strategy 2: Session-Scoped Cache Bypass
Mark the user's session as having made recent writes:
def update_profile(user_id, data):
db.update_user(user_id, data)
cache.delete(f"user:{user_id}")
# Mark session as having fresh data
session['cache_bypass_until'] = time.time() + 5 # 5 second window
def get_profile(user_id):
# Check if we should bypass cache
if session.get('cache_bypass_until', 0) > time.time():
return db.get_user(user_id) # Skip cache, read from DB
return cached_get_user(user_id) # Normal cache-aside
Strategy 3: Client-Side Optimistic Updates
The client immediately displays the new value without waiting for server confirmation:
// Frontend code
async function updateUserName(newName) {
  const previousName = userName; // capture the current value so we can revert
  // Immediately update UI (optimistic)
  setUserName(newName);
  try {
    await api.updateProfile({ name: newName });
  } catch (error) {
    // Revert on failure
    setUserName(previousName);
    showError("Update failed");
  }
}
The client displays the update immediately. If the server request fails, it reverts. This provides instant feedback without server-side changes.
Strategy 4: Return Fresh Data in Write Response
The write operation returns the fresh data, which the client uses:
# Server
@app.put("/users/{user_id}")
def update_user(user_id, data):
updated_user = db.update_user(user_id, data)
cache.delete(f"user:{user_id}")
return updated_user # Client has fresh data without another request
In practice, combine multiple strategies: update or invalidate the server-side cache on write, return the fresh object in the write response, and apply optimistic updates on the client.
This belt-and-suspenders approach ensures read-your-writes regardless of how the user navigates.
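As an illustrative sketch of how those pieces can fit together in one write path (the `session` object and helper names follow the earlier snippets; the framework wiring is assumed):
import time

BYPASS_WINDOW_SECONDS = 5  # how long a recent writer reads around the cache

def update_profile(user_id, data, session):
    """Combine write-through, session bypass, and fresh-data responses."""
    updated_user = db.update_user(user_id, data)
    cache.set(f"user:{user_id}", serialize(updated_user), ex=300)            # write-through
    session['cache_bypass_until'] = time.time() + BYPASS_WINDOW_SECONDS      # replica-lag guard
    return updated_user  # client renders fresh data without a second request

def get_profile(user_id, session):
    if session.get('cache_bypass_until', 0) > time.time():
        return db.get_user(user_id)   # recent writer: skip the cache
    return cached_get_user(user_id)   # everyone else: normal cache-aside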
Real systems often have multiple caching layers, each with its own consistency characteristics: the browser cache, a CDN, the application cache (Redis or Memcached), the database's own caches, and ORM-level caches.
When data changes, all layers need updating:
Data Update
↓
[Database] → Updated
↓
[App Cache] → Invalidated
↓
[CDN Cache] → Needs purge
↓
[Browser Cache] → Stale until TTL
Each layer has different invalidation mechanisms and latencies.
| Layer | Invalidation Method | Latency | Control Level |
|---|---|---|---|
| Browser | Cache-Control headers, versioned URLs | Immediate (new requests) | Limited |
| CDN | Purge API, surrogate keys | Seconds to minutes | Good |
| Application (Redis) | DELETE command, TTL | Immediate | Full |
| Database | Query invalidation, FLUSH QUERY CACHE | Immediate | Full |
| ORM | Object refresh, session clear | Per-session | Moderate |
Versioned URLs:
Include version in URL—changing version = new cache entry:
<link rel="stylesheet" href="/styles.css?v=1.2.3">
<script src="/app.js?v=abc123"></script>
Surrogate Keys (Cache Tags):
Tag cached responses with identifiers for bulk invalidation:
Surrogate-Key: user-42 profile-page-v1
When user 42's data changes:
curl -X PURGE https://cdn.example.com/ -H "Surrogate-Key: user-42"
Short TTL + Stale-While-Revalidate:
Cache-Control: max-age=60, stale-while-revalidate=300
Return stale content while fetching fresh content in background.
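The origin has to emit that policy for the CDN and browser to honor it. A minimal sketch, assuming a Flask handler and a hypothetical `get_product_catalog()` helper; the max-age and stale-while-revalidate values simply mirror the header above:
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/products")
def list_products():
    response = jsonify(get_product_catalog())  # get_product_catalog() is assumed
    # Serve cached copies for 60s; for the next 300s, allow stale responses
    # while the CDN revalidates in the background.
    response.headers["Cache-Control"] = "max-age=60, stale-while-revalidate=300"
    return response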
import asyncio
from dataclasses import dataclass
from typing import List

@dataclass
class CacheLayer:
    name: str
    invalidate: callable
    priority: int  # Lower = invalidate first

class MultiLayerCacheInvalidator:
    """
    Coordinates invalidation across multiple cache layers.
    """
    def __init__(self, layers: List[CacheLayer]):
        # Sort by priority
        self.layers = sorted(layers, key=lambda l: l.priority)

    async def invalidate_key(self, key: str):
        """Invalidate a key across all cache layers."""
        failures = []
        for layer in self.layers:
            try:
                await layer.invalidate(key)
                print(f"Invalidated {key} in {layer.name}")
            except Exception as e:
                failures.append((layer.name, str(e)))
                print(f"Failed to invalidate {key} in {layer.name}: {e}")
        if failures:
            # Log for retry or alerting
            self.log_invalidation_failures(key, failures)

    async def invalidate_pattern(self, pattern: str):
        """Invalidate keys matching pattern (e.g., 'user:42:*')"""
        for layer in self.layers:
            try:
                await layer.invalidate(pattern)
            except Exception as e:
                self.log_invalidation_failures(pattern, [(layer.name, str(e))])

    def log_invalidation_failures(self, key, failures):
        # Hook for retries or alerting; a plain log keeps the example simple
        print(f"Invalidation failures for {key}: {failures}")

# Example setup
async def redis_invalidate(key):
    await redis.delete(key)

async def cdn_invalidate(key):
    await cdn_client.purge_by_surrogate_key(key)

async def memcached_invalidate(key):
    await memcached.delete(key)

cache_system = MultiLayerCacheInvalidator([
    CacheLayer("redis", redis_invalidate, priority=1),
    CacheLayer("memcached", memcached_invalidate, priority=2),
    CacheLayer("cdn", cdn_invalidate, priority=3),
])

# Usage (from within an async function)
await cache_system.invalidate_key("user:42")

Invalidate inner layers before outer layers:
If you invalidate CDN first, it might refetch and cache stale data from an app cache that wasn't yet invalidated.
You can't fix consistency problems you can't see. Proactive monitoring helps detect and quantify inconsistency.
| Metric | What It Indicates | Threshold Concerns |
|---|---|---|
| Stale read ratio | % of reads returning outdated data | Depends on tolerance; track trends |
| Invalidation latency | Time from write to invalidation | >1s may cause user-visible staleness |
| Invalidation failures | Failed invalidation attempts | Any failure that isn't retried accumulates staleness risk |
| Version mismatch rate | Reads returning old versions | Should approach 0 over time |
| Replication lag | Delay in cache replica updates | >100ms affects read-your-writes |
| Cache/DB divergence % | Sampled comparison of cache vs source | Any divergence outside TTL window |
Shadow Reads:
Periodically read from both cache and database, compare:
async def shadow_consistency_check(key):
"""Compare cache and database values for a key."""
cached_value = await cache.get(key)
db_value = await db.get(key)
if cached_value is None:
metrics.increment("consistency.cache_miss")
return
if cached_value == db_value:
metrics.increment("consistency.match")
else:
metrics.increment("consistency.mismatch")
log.warning(f"Consistency mismatch for {key}",
cached=cached_value, db=db_value)
Sampling:
You can't check every key. Sample a representative subset:
if random.random() < 0.01: # 1% sample
background_task.run(shadow_consistency_check, key)
When consistency problems are reported: reproduce the read, compare the cached value against the source of truth directly, check the key's remaining TTL, review recent invalidation attempts and failures for that key, and check replication lag at the time of the report.
Useful Debug Information: the cache key, the cached value and its remaining TTL, the current database value, the timestamp of the last write, and which cache node or replica served the read.
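A small helper can gather that information in one place. This sketch assumes a redis-py style client (`cache.get`, `cache.ttl`), the `db.get_user` accessor used earlier, and a structured logger like the one in the shadow-read example:
import json
import time

def debug_cache_key(user_id):
    """Collect the facts needed to diagnose a reported inconsistency."""
    key = f"user:{user_id}"
    cached_raw = cache.get(key)
    report = {
        "key": key,
        "cached_value": json.loads(cached_raw) if cached_raw else None,
        "remaining_ttl_s": cache.ttl(key),   # -2 if the key does not exist
        "db_value": db.get_user(user_id),
        "checked_at": time.time(),
    }
    log.info("cache debug report", **report)
    return report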
Cache consistency is an ongoing challenge, not a solved problem. Understanding the sources of inconsistency and applying appropriate strategies enables you to build systems that meet your specific requirements without over-engineering.
You have completed the Distributed Cache Systems module, covering Redis, Memcached, technology comparison, cluster management, and consistency challenges. You now have the knowledge to design, deploy, and operate distributed caching systems that meet demanding performance and reliability requirements.