Caching represents the most aggressive form of denormalization—creating complete, pre-constructed copies of data optimized entirely for read performance. While in-table denormalization (column replication, derived columns) trades minimal write overhead for join elimination, caching trades significant complexity for orders-of-magnitude performance improvements.
Every high-scale system relies on caching. The question is never whether to cache but what, where, how long, and how to handle staleness. Mastering caching strategies is essential for any engineer building systems that serve millions of users.
By the end of this page, you will understand caching architectures, cache-aside vs. read-through patterns, time-based vs. event-based invalidation, cache stampedes and their prevention, and production patterns for maintaining cache consistency with databases.
Before diving into strategies, let's understand the caching landscape in modern systems. Caching exists at multiple levels, each with different latency, capacity, and management characteristics:
Cache Hierarchy (from the application outward toward the client):
| Cache Layer | Latency | Capacity | Consistency Challenge |
|---|---|---|---|
| In-Process (HashMap) | ~100ns | MB (heap-limited) | Multi-instance coordination |
| Redis (network) | ~500µs-1ms | GB-TB (cluster) | Cache-DB sync |
| CDN Edge | ~10-50ms (miss) | TB (distributed) | Global invalidation delay |
| Browser | 0ms (local) | MB per origin | Version/freshness signaling |
Database Caching Focus:
For database denormalization, we primarily focus on application-level caching—specifically distributed caches like Redis or Memcached that sit between the application and database:
┌──────────┐      ┌───────────────┐      ┌──────────────┐
│   App    │ ──── │  Redis Cache  │ ──── │  PostgreSQL  │
└──────────┘      └───────────────┘      └──────────────┘
     │                    │                     │
   ~50µs                ~500µs               ~5-50ms
This cache layer exists to absorb read traffic before it reaches the database. In most systems, 20% of data receives 80% of reads. Effective caching exploits this skew: cache the hot data, let the cold data flow through to the database. Cache hit rates of 95%+ are common for well-designed systems.
How your application interacts with the cache matters significantly for performance, consistency, and complexity. The major patterns are:
Cache-Aside (Lazy Loading)
The most common pattern. Application manages cache explicitly:
def get_user(user_id):
    # Try cache first
    user = cache.get(f'user:{user_id}')
    if user is not None:
        return user

    # Cache miss: load from database
    user = db.query('SELECT * FROM users WHERE id = ?', user_id)

    # Populate cache for next time
    cache.set(f'user:{user_id}', user, ttl=3600)
    return user
Pros:
- Simple to implement; only data that is actually requested gets cached
- Resilient: if the cache is down, reads fall back to the database
- Works with any cache; no special infrastructure required

Cons:
- Every miss costs three trips (cache check, database query, cache populate)
- The first request for each key is always slow (cold cache)
- Application code is responsible for keeping cache and database consistent
Pattern Selection Guide:
| Pattern | Read Performance | Write Performance | Consistency | Use Case |
|---|---|---|---|---|
| Cache-Aside | Good (after warm) | N/A (DB direct) | Eventual | Most CRUD apps |
| Read-Through | Good (after warm) | N/A (DB direct) | Eventual | Centralized loading |
| Write-Through | Good | Slower | Strong | Read-heavy, consistency-critical |
| Write-Behind | Good | Very Fast | Eventual | Write-heavy, loss-tolerant |
Recommendation: Most applications start with cache-aside for its simplicity and flexibility. Move to read-through when you have many cache access points that benefit from centralized loading. Use write-through/write-behind only when requirements demand them.
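The table references write-through without showing it, so here is a minimal sketch for contrast with cache-aside above, using the same hypothetical db and cache clients from the earlier examples: every write updates the database and then synchronously refreshes the cache, so subsequent reads are already warm.
def write_through_update_user(user_id, data):
    # Write the authoritative copy first
    db.execute('UPDATE users SET name = ? WHERE id = ?', data['name'], user_id)

    # Then synchronously refresh the cache so the next read never misses
    user = db.query('SELECT * FROM users WHERE id = ?', user_id)
    cache.set(f'user:{user_id}', user, ttl=3600)
    return user
The cost is visible in the write path: the caller waits for both the database and the cache, which is why the table lists write-through as "Slower" on writes.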
As Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things."
Cache invalidation is the process of removing or updating cached data when the source data changes. Get it wrong, and users see stale data. Get it too aggressive, and you lose caching benefits.
TTL-Based Invalidation:
# Simple TTL - user profile cached for 1 hour
cache.set(f'user:{user_id}', user_data, ttl=3600)
Pros: Simple, self-healing (stale data eventually expires)
Cons: Users see stale data for up to the TTL duration
TTL Selection Considerations:
| Data Type | Typical TTL | Rationale |
|---|---|---|
| Static config | Hours-Days | Rarely changes; tolerate some staleness |
| User profile | 15-60 min | Balance freshness with load reduction |
| Product info | 5-15 min | Changes moderately; users expect current price |
| Real-time data | Seconds | Near-current accuracy needed |
| Session data | Hours | Tied to session lifetime |
Explicit Invalidation:
def update_user(user_id, data):
    # Update database
    db.execute('UPDATE users SET name = ? WHERE id = ?', data['name'], user_id)

    # Invalidate cache immediately
    cache.delete(f'user:{user_id}')
    # OR update cache with new data
    cache.set(f'user:{user_id}', data, ttl=3600)
The choice between delete and update:
Updating cache after DB write has a race condition: if a read fetches old DB data between your write and cache update, it may recache stale data. Delete is safer—the penalty is one cache miss, not stale data.
Event-Driven Invalidation (CDC):
PostgreSQL → Debezium (CDC) → Kafka → Cache Invalidation Service → Redis
Change Data Capture captures database writes at the transaction log level and publishes them as events. Advantages:
- Catches every change, including writes from other services, background jobs, or direct database access
- Decouples invalidation from application code, so no cache logic is scattered across write paths
- Cannot miss an invalidation because an application crashed between the DB write and the cache call
Latency is typically 50-500ms for invalidation, which is acceptable for most applications.
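A minimal sketch of the invalidation consumer at the end of that pipeline, assuming Debezium's default JSON envelope, a hypothetical topic named dbserver.public.users, and the kafka-python and redis client libraries (adjust names to your setup):
import json
import redis
from kafka import KafkaConsumer

cache = redis.Redis()
consumer = KafkaConsumer(
    'dbserver.public.users',              # hypothetical Debezium topic for the users table
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
)

for message in consumer:
    event = message.value
    # Debezium wraps row changes in a payload whose "after" field holds the new row image
    # ("before" for deletes); the exact shape depends on connector configuration.
    payload = event.get('payload', event)
    row = payload.get('after') or payload.get('before') or {}
    user_id = row.get('id')
    if user_id is not None:
        cache.delete(f'user:{user_id}')   # delete, not update: the next read refetches from the DB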
Version-Based Invalidation:
# Include version in key
version = db.query('SELECT cache_version FROM users WHERE id = ?', user_id)
key = f'user:{user_id}:v{version}'

def update_user(user_id, data):
    db.execute(
        'UPDATE users SET name = ?, cache_version = cache_version + 1 WHERE id = ?',
        data['name'], user_id
    )
    # No explicit cache invalidation needed - new reads use new version key
Old versions naturally expire via TTL. New reads always get new version key.
Downside: Unused cache entries accumulate (but TTL eventually clears them).
A cache stampede (or thundering herd) occurs when many requests simultaneously find the cache empty and all attempt to reload from the database. This can overwhelm the database and cause cascading failures.
Scenario:
1. A popular cache entry expires (or is evicted)
2. Thousands of concurrent requests miss the cache at the same moment
3. Each one issues the same expensive query against the database
4. The database saturates, latency spikes, and requests back up across the system

This is especially problematic for:
- Hot keys (home pages, popular products, celebrity profiles)
- Expensive queries (aggregations, large joins) where even a few concurrent reloads hurt
- Synchronized expirations (many keys written with the same TTL at the same time, or a cache flush after a deploy)
Locking Implementation:
import time

def get_user_with_lock(user_id):
    cache_key = f'user:{user_id}'
    lock_key = f'lock:user:{user_id}'

    # Try cache first
    user = cache.get(cache_key)
    if user is not None:
        return user

    # Try to acquire lock
    if cache.set(lock_key, '1', nx=True, ttl=10):  # nx=True means "if not exists"
        try:
            # We own the lock - load from DB
            user = db.query_user(user_id)
            cache.set(cache_key, user, ttl=3600)
            return user
        finally:
            cache.delete(lock_key)
    else:
        # Another request is loading - wait and retry
        time.sleep(0.05)  # Short sleep
        return get_user_with_lock(user_id)  # Retry
Probabilistic Early Expiration (XFetch):
import math
import random

def get_with_early_expiration(cache_key, ttl, beta=1.0):
    result = cache.get_with_meta(cache_key)  # Returns value + remaining TTL
    if result is None:
        return refresh_and_cache(cache_key, ttl)

    value, remaining_ttl = result

    # Probabilistically refresh before expiration
    # Higher beta = earlier refresh; log(random) is always negative
    delta = ttl * beta * math.log(random.random())
    should_refresh = remaining_ttl + delta < 0

    if should_refresh:
        # Refresh in background, return current value
        async_refresh(cache_key)

    return value
This algorithm makes entries "feel" older probabilistically, spreading refreshes over time instead of having all expire simultaneously.
Stale-While-Revalidate:
def get_with_swr(cache_key):
    result = cache.get_with_meta(cache_key)
    if result is None:
        return refresh_and_cache(cache_key)

    value, remaining_ttl, is_stale = result

    if is_stale:
        # Return stale immediately, refresh async
        trigger_background_refresh(cache_key)

    return value  # Always return immediately (stale or fresh)
HTTP caching formalizes this with Cache-Control: stale-while-revalidate=<seconds>.
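At the HTTP layer this is just a response header. A minimal sketch, assuming a Flask app and a hypothetical load_product helper (the 60s/300s values are illustrative):
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/products/<int:product_id>')
def product(product_id):
    resp = jsonify(load_product(product_id))  # load_product is a hypothetical loader
    # Clients and CDNs may reuse this response for 60s, and serve it stale for up to
    # another 300s while revalidating in the background.
    resp.headers['Cache-Control'] = 'max-age=60, stale-while-revalidate=300'
    return resp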
For predictably hot data, proactively populate caches before load arrives. Example: Before a sale event, pre-cache all sale products. This eliminates the cold-cache stampede entirely.
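A minimal warming-job sketch, assuming the same hypothetical db and cache helpers used above and run from a cron job or deploy hook before traffic arrives:
def warm_sale_products():
    # Load every product in the upcoming sale in one pass
    products = db.query('SELECT * FROM products WHERE on_sale = true')
    for product in products:
        # Pre-populate with a TTL long enough to cover the sale window
        cache.set(f"product:{product['id']}", product, ttl=6 * 3600)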
Beyond individual cache access patterns, the overall architecture of your caching layer matters significantly:
Multi-Tier Cache Implementation:
import redis

class TieredCache:
    def __init__(self):
        self.l1 = {}              # In-process (dict with LRU eviction in production)
        self.l2 = redis.Redis()   # Distributed cache

    def get(self, key):
        # Check L1 first (microseconds)
        if key in self.l1:
            return self.l1[key]

        # Check L2 (sub-millisecond)
        value = self.l2.get(key)
        if value:
            self.l1[key] = value  # Promote to L1
            return value

        return None  # Cache miss

    def set(self, key, value, ttl):
        self.l2.set(key, value, ex=ttl)
        self.l1[key] = value

    def invalidate(self, key):
        self.l2.delete(key)
        self.l1.pop(key, None)            # Remove from local cache
        self.broadcast_invalidation(key)  # Tell other app instances
L1 Invalidation Challenge:
With multiple application instances, each has its own L1 cache. When data changes, all L1 caches must be invalidated. Common approaches:
- Broadcast invalidations over a pub/sub channel (what broadcast_invalidation above hints at; see the sketch below)
- Keep L1 TTLs very short (seconds) so stale entries age out quickly
- Use versioned keys so stale L1 entries are simply never read again
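A minimal pub/sub sketch of that broadcast using redis-py; each instance publishes its own invalidations and subscribes to everyone else's (channel name and local_l1 are assumptions for illustration):
import redis

r = redis.Redis()
local_l1 = {}  # this instance's in-process cache

def broadcast_invalidation(key):
    r.publish('cache-invalidation', key)

def _handle_invalidation(message):
    # Drop the key from our local L1 when any instance invalidates it
    local_l1.pop(message['data'].decode(), None)

pubsub = r.pubsub(ignore_subscribe_messages=True)
pubsub.subscribe(**{'cache-invalidation': _handle_invalidation})
listener_thread = pubsub.run_in_thread(sleep_time=0.1, daemon=True)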
Cache Sharding:
For very large datasets, a single Redis instance may not suffice. Options:
- Redis Cluster, which partitions the keyspace across nodes using hash slots
- Client-side sharding, where the application hashes each key to pick a shard
- A sharding proxy (e.g., Twemproxy or Envoy) in front of multiple cache nodes
Sharding key selection matters:
# Good: Distribute across shards
key = f'user:{user_id}' # user_id has good distribution
# Bad: Hot shard
key = 'popular:products'  # one key holds the whole popular list, so every request hits the same shard
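A minimal client-side sharding sketch (simple modulo placement, which reshuffles keys when the shard count changes; Redis Cluster or consistent hashing avoids that; the ports are hypothetical):
import zlib
import redis

# Hypothetical shard set: three Redis instances on different ports
SHARDS = [redis.Redis(port=p) for p in (6379, 6380, 6381)]

def shard_for(key):
    # Stable hash of the full key; Python's built-in hash() is randomized per process
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

# Reads and writes for a given key always land on the same shard
shard_for('user:12345').get('user:12345')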
Production caches require monitoring: hit rate, memory usage, eviction rate, latency percentiles. A drop in hit rate often indicates workload change or cache misconfiguration. Set alerts for hit rate thresholds.
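For Redis specifically, the server already counts hits and misses; a minimal sketch of computing hit rate from INFO stats via redis-py:
import redis

r = redis.Redis()

def redis_hit_rate():
    stats = r.info('stats')
    hits = stats['keyspace_hits']
    misses = stats['keyspace_misses']
    total = hits + misses
    return hits / total if total else None  # None until the cache has served traffic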
Cache key design significantly impacts cache effectiveness, debuggability, and invalidation granularity. Poor key design leads to cache collisions, unnecessary misses, and maintenance headaches.
Key Design Principles:
- Namespace by entity type: user:123, product:456, session:abc. Prevents collisions and aids debugging.
- Keep parameter order consistent: always product:123:region:us, not sometimes product:123:us.

Examples of Good and Bad Keys:
# Good: Clear namespace, consistent structure
'user:profile:12345'
'product:listing:67890:region:us'
'order:history:12345:page:1'
# Bad: Ambiguous, collision-prone
'12345' # What entity? Collision between user 12345 and product 12345
'data' # Which data?
# Bad: Non-deterministic
f'user:{user_id}:{datetime.now()}' # Timestamp causes cache miss every time
# Bad: Missing parameter
'search:results:laptop' # Missing: page, sort order, filters - cache collision
# Good: Complete parameters, sorted
'search:results:q:laptop:page:1:sort:price_asc:brand:dell'
Versioned Keys for Schema Changes:
When cached data structure changes, old cache entries become incompatible:
# v1: Cached user had {name, email}
# v2: Cached user now has {name, email, avatar_url}
# Include version in key
key = f'user:v2:{user_id}'
On schema change, increment version. Old keys naturally expire.
Hash-Based Keys for Complex Queries:
For complex queries with many parameters, use hash:
import hashlib
import json

def cache_key(query_params):
    # Sort for determinism
    normalized = json.dumps(query_params, sort_keys=True)
    param_hash = hashlib.md5(normalized.encode()).hexdigest()[:16]

    # Readable debug entry stored separately, so the hash can be mapped back to its parameters
    cache.set(f'debug:search:{param_hash}', normalized)

    return f'search:results:{param_hash}'
Document your cache key schema in a central location. When debugging production issues, engineers must quickly understand what key pattern maps to what data—this is often the first step in diagnosing caching bugs.
Maintaining consistency between cache and database is the fundamental challenge of caching. Several patterns exist, each with different guarantees:
Pattern Spectrum:
Eventual ◄─────────────────────────────────────────────► Strong

TTL-only        Event-driven        Write-through        Distributed
                invalidation        sync                 transactions
| Pattern | Consistency | Write Latency | Complexity | Use Case |
|---|---|---|---|---|
| TTL-Only | Eventual (TTL bound) | Low | Simple | Static/semi-static data |
| TTL + Invalidation | Eventual (seconds) | Low-Medium | Medium | Most CRUD applications |
| Write-Through | Strong | High | Medium | Read-your-writes required |
| Cache Transactions | Strong | Very High | Very High | Rarely used (Lua scripts) |
The Read-After-Write Problem:
Common consistency issue: user updates data, then immediately reads and sees old value.
T1: Write to DB
T2: Invalidate cache (async)
T3: Read hits old cache (invalidation not yet processed)
Solutions:

1. Synchronous invalidation: delete the cache entry before returning from the write.

def update_user(user_id, data):
    db.update(user_id, data)
    cache.delete(f'user:{user_id}')  # Wait for completion

2. Write-through style update: put the new value in the cache as part of the write.

def update_user(user_id, data):
    new_data = db.update(user_id, data)
    cache.set(f'user:{user_id}', new_data)  # Update, not delete
Session Affinity:
Route a user's reads to the instance (or primary) that handled their write for a short window, so they see their own changes even while caches elsewhere are still stale.

Read-Your-Writes Token:
After a write, hand the client a version token (e.g., the new cache_version or a timestamp). Subsequent reads carrying the token reject cached entries older than the token, or bypass the cache entirely, until the cache has caught up. A sketch follows below.
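A minimal sketch of the token approach, reusing the cache_version column from the version-based invalidation example and assuming the cached user row includes that column (db and cache are the same hypothetical helpers used throughout):
def update_user(user_id, data):
    db.execute(
        'UPDATE users SET name = ?, cache_version = cache_version + 1 WHERE id = ?',
        data['name'], user_id
    )
    # Return the new version to the client as its read-your-writes token
    return db.query('SELECT cache_version FROM users WHERE id = ?', user_id)

def get_user(user_id, min_version=None):
    cached = cache.get(f'user:{user_id}')
    # Serve from cache only if it is at least as new as the client's token
    if cached is not None and (min_version is None or cached['cache_version'] >= min_version):
        return cached

    user = db.query('SELECT * FROM users WHERE id = ?', user_id)
    cache.set(f'user:{user_id}', user, ttl=3600)
    return user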
The Double-Write Problem:
T1: Write to DB
-- SYSTEM CRASH (before the cache update) --
Result: DB has new value, cache has the old value until TTL or invalidation. Recoverable.
OR:
T1: Update cache
-- SYSTEM CRASH (before the DB write) --
Result: DB has old value, cache has new value. INCONSISTENT!
Rule: Always write to the authoritative source (database) first. Cache is derivative.
When data can be modified by external systems (background jobs, other services, direct DB access), prefer cache deletion over cache updates. You may not have the complete new value, and deleting ensures the next read fetches from the authoritative source.
We've covered the essential caching strategies that complement in-table denormalization: access patterns (cache-aside, read-through, write-through, write-behind), invalidation approaches (TTL, explicit, event-driven CDC, versioned keys), stampede prevention (locking, probabilistic early expiration, stale-while-revalidate, warming), tiered cache architectures, key design, and cache-database consistency patterns.
What's Next:
Now that you understand caching strategies, we'll explore materialized views—database-native denormalization that offers a middle ground between in-table denormalization and external caching. Materialized views provide query acceleration with database-managed consistency.
You now have comprehensive knowledge of caching strategies as a denormalization approach. These patterns are essential for any high-traffic system and complement the in-table patterns covered earlier.