Every piece of cached data carries an implicit promise: the data you're reading reflects reality closely enough to be useful. But reality changes. Prices fluctuate, inventory depletes, user preferences evolve, and content updates. The fundamental question of cache invalidation is: How do we know when cached data has drifted too far from truth?
Time-To-Live (TTL) represents the simplest, most universal answer to this question. Rather than tracking every possible change in the source system, TTL makes a probabilistic bet: after a certain period, cached data is likely stale enough that we should refresh it.
This approach trades precision for simplicity. Instead of building complex invalidation pipelines that track every data mutation, TTL lets time itself serve as the invalidation trigger. It's the difference between a precision surgical strike and scheduled maintenance—both valid, but optimized for different contexts.
By the end of this page, you will deeply understand TTL-based cache expiration: the mathematical models that inform TTL selection, the distinction between absolute and sliding expiration, the cache stampede problem and its mitigations, tuning strategies for different data patterns, and how production systems implement TTL across multiple caching layers. You'll gain the expertise to configure TTL intelligently rather than guessing at arbitrary values.
Time-To-Live (TTL) is a value that specifies the maximum duration for which a cached entry remains valid. When a cache entry's TTL expires, the cache considers the entry stale and either evicts it immediately or marks it for refresh upon the next access.
The Core Concept:
At its essence, TTL is a timestamp-based validity check. When data enters the cache, it's paired with an expiration timestamp:
expiration_time = current_time + TTL_duration
On every cache read, the system checks:
if current_time >= expiration_time:
treat as cache miss (stale)
else:
return cached value (fresh)
This simple mechanism underlies virtually every caching system, from browser HTTP caches to distributed Redis clusters.
Why TTL Works:
TTL's power lies in its decoupling of cache from source system. The cache doesn't need to know how or when the source data changes—it only needs to know how long freshness matters. This decoupling provides several advantages:
No coordination overhead — The source system doesn't need to notify caches of changes. Each cache independently manages its own expiration.
Predictable memory reclamation — Expired entries are guaranteed to be evicted eventually, preventing unbounded memory growth from orphaned cache entries.
Graceful staleness — Rather than serving arbitrarily old data forever, TTL bounds the maximum staleness, providing a freshness guarantee (even if approximate).
Simplicity — TTL requires only a timestamp comparison. No event buses, no change tracking, no distributed coordination protocols.
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
TTL is often the first tool engineers reach for precisely because it sidesteps the complexity of explicit invalidation. Rather than solving invalidation perfectly, TTL makes it a time-bounded approximation problem.
Not all TTL implementations behave identically. The two primary models—absolute expiration and sliding expiration—serve fundamentally different use cases and have distinct performance characteristics.
| Model | Expiration formula |
|---|---|
| Absolute expiration | expiration = creation_time + TTL |
| Sliding expiration | expiration = last_access_time + TTL |

Practical Implications:
Absolute expiration guarantees that cached data is never older than TTL seconds. This is critical when data freshness has a strict upper bound—for example, regulatory requirements that prices must be updated within a certain window, or security tokens that must be re-validated periodically.
Sliding expiration optimizes for access patterns. Frequently accessed data stays cached, reducing load on the source system. Infrequently accessed data expires quickly, conserving memory. This model works well for session management, where active users should remain cached but abandoned sessions should evict.
Hybrid Approaches:
Many production systems combine both models. A common pattern:
absolute_max_ttl = 24 hours (hard ceiling)
sliding_ttl = 30 minutes (activity-based)
Entry expires when:
- 30 minutes pass without access, OR
- 24 hours pass since creation (whichever comes first)
This prevents indefinitely cached stale data while still optimizing for access patterns.
```python
import redis
import time


class HybridTTLCache:
    """
    Implements a hybrid absolute + sliding TTL strategy.

    - Absolute TTL: Maximum time entry can live (hard ceiling)
    - Sliding TTL: Time until expiration resets on each access
    """

    def __init__(self, redis_client: redis.Redis):
        self.client = redis_client
        self.ABSOLUTE_TTL = 86400  # 24 hours in seconds
        self.SLIDING_TTL = 1800    # 30 minutes in seconds

    def set(self, key: str, value: str) -> None:
        """Store value with both absolute and sliding TTL metadata."""
        creation_time = time.time()
        # Use Redis hash to store value + metadata
        self.client.hset(f"cache:{key}", mapping={
            "value": value,
            "created_at": creation_time,
            "last_accessed": creation_time
        })
        # Set initial TTL to sliding window
        self.client.expire(f"cache:{key}", self.SLIDING_TTL)

    def get(self, key: str) -> str | None:
        """Retrieve value, updating sliding TTL if within absolute bounds."""
        cache_key = f"cache:{key}"
        data = self.client.hgetall(cache_key)

        if not data:
            return None  # Cache miss

        current_time = time.time()
        created_at = float(data[b'created_at'])

        # Check if absolute TTL exceeded
        if current_time - created_at >= self.ABSOLUTE_TTL:
            self.client.delete(cache_key)
            return None  # Treat as miss, force refresh

        # Calculate remaining absolute time
        remaining_absolute = self.ABSOLUTE_TTL - (current_time - created_at)

        # New expiration is minimum of sliding TTL or remaining absolute time
        new_expiry = min(self.SLIDING_TTL, int(remaining_absolute))

        # Update last access time and reset expiration
        self.client.hset(cache_key, "last_accessed", current_time)
        self.client.expire(cache_key, new_expiry)

        return data[b'value'].decode('utf-8')
```

Use absolute expiration when data has a known staleness threshold (prices, scores, metrics). Use sliding expiration when access patterns indicate relevance (sessions, user state). Use hybrid when you need both freshness guarantees and access optimization. Most production systems benefit from hybrid approaches.
Selecting optimal TTL values is not guesswork—it's an engineering decision informed by data patterns, performance requirements, and business constraints. Understanding the mathematical relationships helps you make principled choices.
The Cache Hit Rate Model:
Let's define:

- λ (lambda) — the request rate: how many reads per second hit the cache
- μ (mu) — the change rate: how many times per second the source data changes
- T — the TTL duration in seconds

The probability that cached data is still valid (a cache hit with fresh data) follows:
P(fresh) = e^(-μ × T)
This is an exponential decay function. As TTL increases, the probability of serving stale data increases. As the source data change rate (μ) increases, stale probability increases faster.
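A quick numeric sketch of this decay (the function name is ours; the formula is the one above):

```python
import math


def p_fresh(mu: float, age_seconds: float) -> float:
    """P(fresh) = e^(-mu * t): probability the source has NOT changed
    within `age_seconds`, given a change rate of `mu` changes/second."""
    return math.exp(-mu * age_seconds)


# Source data changes about once per minute (mu = 1/60 changes/sec):
mu = 1 / 60
print(p_fresh(mu, 10))   # ~0.85: a 10-second-old entry is probably fresh
print(p_fresh(mu, 60))   # ~0.37: after one mean change interval
print(p_fresh(mu, 300))  # ~0.007: a 5-minute-old entry is almost surely stale
```

The takeaway: freshness does not degrade linearly with TTL; it falls off exponentially, which is why doubling a TTL can more than double your stale-read rate.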
The Freshness-Hit Rate Trade-off:
The cache hit rate depends on TTL relative to request frequency:
Very short TTL (T → 0): Nearly every request is a cache miss. You've effectively disabled caching. Hit rate approaches 0%.
Very long TTL (T → ∞): Nearly every request is a hit, but staleness grows unbounded. Hit rate approaches 100%, but freshness approaches 0%.
The optimal TTL balances these extremes. For a given workload:
Optimal TTL ≈ 1 / μ × ln(λ / μ)
When λ >> μ (requests far exceed changes), longer TTLs are optimal. When λ ≈ μ (requests and changes occur at similar rates), shorter TTLs prevent stale reads.
Example Calculation:
Consider a product catalog where items change about once per minute (μ ≈ 0.0167 changes/second) and the catalog serves λ = 1000 requests/second:
Optimal TTL ≈ (1 / 0.0167) × ln(1000 / 0.0167)
≈ 60 × ln(60000)
≈ 60 × 11.0
≈ 660 seconds ≈ 11 minutes
This suggests an 11-minute TTL balances hit rate against staleness for this workload.
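The calculation above can be captured in a small helper (the function name and the fallback for the λ ≤ μ case are our own choices; the formula is the one stated above, which assumes λ >> μ):

```python
import math


def optimal_ttl(request_rate: float, change_rate: float) -> float:
    """Optimal TTL ≈ (1/mu) * ln(lambda/mu), valid when lambda >> mu.

    request_rate: lambda, requests per second
    change_rate:  mu, source-data changes per second
    """
    if request_rate <= change_rate:
        # Requests are no more frequent than changes: caching buys little;
        # fall back to roughly one mean change interval
        return 1 / change_rate
    return (1 / change_rate) * math.log(request_rate / change_rate)


# The product-catalog workload above: lambda = 1000 req/s, mu = 1 change/min
ttl = optimal_ttl(request_rate=1000, change_rate=1 / 60)
print(f"{ttl:.0f} seconds")  # ~660s, i.e. about 11 minutes
```

Treat the result as a starting point for measurement, not a final answer: the model ignores stampedes, memory pressure, and business staleness limits.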
| Data Type | Typical Update Frequency | Recommended TTL Range | Rationale |
|---|---|---|---|
| Static assets (images, fonts) | Weeks/months | 1 day - 1 year | Immutable content; version via URL change |
| Configuration data | Days/weeks | 1 hour - 24 hours | Changes are infrequent but should propagate reasonably |
| Product catalog | Hours/days | 5 minutes - 1 hour | Price/inventory changes; balance freshness vs. load |
| User profile/preferences | User-dependent | 5 - 30 minutes | Personal data; tolerable staleness is subjective |
| Session/auth tokens | Per request/action | Sliding: 15-60 minutes | Security-sensitive; sliding keeps active users valid |
| Real-time data (stock prices) | Seconds | 5 - 60 seconds | High change rate; short TTL or consider push invalidation |
| Search results | Query-dependent | 30 seconds - 5 minutes | Balance result freshness vs. query load reduction |
| Aggregated metrics/analytics | Batch updates | 5 minutes - 1 hour | Expensive to compute; stale-on-revalidate pattern works well |
Many teams set TTL to arbitrary round numbers (5 minutes, 1 hour) without analysis. This often leaves performance on the table or serves unnecessarily stale data. Always analyze your actual update frequency and request patterns to select TTL rationally. Instrument your cache to measure actual staleness when possible.
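As a minimal illustration of that instrumentation (class and attribute names are ours; in production the samples would feed a histogram metric rather than a list), a cache can record each entry's age at access time:

```python
import time


class AgeTrackingCache:
    """Dict-backed cache that records entry age at each access, so the
    staleness actually observed by readers can inform TTL tuning."""

    def __init__(self):
        self._store = {}          # key -> (value, created_at)
        self.observed_ages = []   # seconds; export as a histogram in production

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, created_at = entry
        # Record how old this entry was when it was actually read
        self.observed_ages.append(time.monotonic() - created_at)
        return value
```

If the recorded ages cluster far below the TTL, the TTL is doing little work; if they cluster at the TTL, entries are dying of old age and a longer TTL (or refresh-ahead) may pay off.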
One of the most insidious problems with TTL-based caching is the cache stampede (also called thundering herd or cache avalanche). This occurs when a popular cache entry expires, and multiple concurrent requests simultaneously attempt to regenerate the cached value.
The Problem Illustrated:
This problem is especially severe when:

- The expiring entry is hot (many concurrent requests per second target the same key)
- Regenerating the value is expensive (slow queries, heavy aggregation)
- The origin has limited headroom, so the miss spike pushes it past capacity
```
Timeline: TTL Expiration at T=0

Request Timeline:
T=-1s:   [Cache HIT] ✓    1000 req/s served from cache
T=0:     [Cache EXPIRES]  TTL reached
T=0.1s:  [Cache MISS] ✗   Request 1: Hits DB
T=0.1s:  [Cache MISS] ✗   Request 2: Hits DB
T=0.1s:  [Cache MISS] ✗   Request 3: Hits DB
         ...
T=0.1s:  [Cache MISS] ✗   Request 1000: Hits DB

Database Load:
T=-1s:   [===]            50 QPS (background traffic)
T=0.1s:  [===========...] 1050 QPS (STAMPEDE!)
                          ↑ 21× spike!

Result: Latency spike, potential timeout cascade, partial outage
```

Mitigation Strategies:
1. Request Coalescing (Single-Flight Pattern)
When multiple concurrent requests target the same missing cache key, only one request actually computes the value. Other requests wait for the computation to complete and share the result.
2. Probabilistic Early Expiration (Background Refresh)
Before TTL expires, randomly selected requests pre-emptively refresh the cache. This ensures the cache is always warm when the actual TTL hits.
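A common way to implement this is the "XFetch"-style probabilistic check sketched below (the function name and parameters are ours): the closer a request arrives to the expiry, relative to how long recomputation takes, the more likely it is to trigger an early refresh.

```python
import math
import random
import time


def should_refresh_early(expiry_time: float, compute_cost: float,
                         beta: float = 1.0) -> bool:
    """Probabilistic early expiration: return True if this request should
    refresh the entry now, before `expiry_time` is actually reached.

    compute_cost: seconds a recomputation typically takes
    beta:         > 1 refreshes more aggressively, < 1 less so
    """
    # -log(1 - U) is an exponential random variable: usually small (refresh
    # only very near the deadline), occasionally large (refresh well ahead).
    gap = compute_cost * beta * -math.log(1 - random.random())
    return time.time() + gap >= expiry_time
```

Each request that reads a near-expiry entry rolls these dice; the first "winner" recomputes and pushes the expiry forward, so the hard deadline is rarely reached and no coordinated miss occurs.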
3. Lock-with-Lease Pattern
The first request to find a cache miss acquires a short-lived lock, preventing other requests from also hitting the origin. Other requests either wait for the lock to release (and cache to repopulate) or return stale data with a warning.
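An in-memory sketch of this pattern (all names are ours): the lock lets exactly one request recompute, while losers fall back to the last known value. In a distributed setup the same idea is typically built on an atomic set-if-absent with an expiry (e.g. Redis `SET ... NX EX`), so the lease frees itself if the holder crashes.

```python
import threading
import time


class LockWithLeaseCache:
    """On a miss, the first request takes a per-key lock and recomputes;
    concurrent requests serve the previous (stale) value instead of
    piling onto the origin."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}   # key -> (value, expiration_time)
        self._stale = {}   # key -> last known value, served while locked
        self._locks = {}   # key -> per-key recompute lock
        self._registry_lock = threading.Lock()

    def _lock_for(self, key):
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key, compute_fn):
        entry = self._store.get(key)
        if entry and time.monotonic() < entry[1]:
            return entry[0]  # fresh hit

        lock = self._lock_for(key)
        if lock.acquire(blocking=False):  # only one recomputer per key
            try:
                value = compute_fn()
                self._store[key] = (value, time.monotonic() + self.ttl)
                self._stale[key] = value  # keep a copy for lock losers
                return value
            finally:
                lock.release()
        # Lock held by another request: serve stale data rather than wait
        return self._stale.get(key)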
4. Stale-While-Revalidate
Serve stale data immediately while asynchronously refreshing the cache in the background. Users get fast responses (stale), and the cache refreshes without stampede.
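An async sketch of stale-while-revalidate (names are ours): expired entries are returned immediately, and at most one background task per key refreshes the value.

```python
import asyncio
import time


class StaleWhileRevalidateCache:
    """Serve expired entries immediately while one background task
    refreshes them, so readers never wait on recomputation."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}        # key -> (value, expiration_time)
        self._refreshing = set()  # keys with a refresh already running

    async def get_or_compute(self, key, compute_fn):
        entry = self._store.get(key)
        if entry is None:
            # Cold miss: nothing stale to serve, compute synchronously
            return await self._refresh(key, compute_fn)
        value, expiry = entry
        if time.monotonic() >= expiry and key not in self._refreshing:
            # Stale: kick off one background refresh, serve stale now
            self._refreshing.add(key)
            asyncio.create_task(self._refresh(key, compute_fn))
        return value

    async def _refresh(self, key, compute_fn):
        try:
            value = await compute_fn()
            self._store[key] = (value, time.monotonic() + self.ttl)
            return value
        finally:
            self._refreshing.discard(key)
```

The trade-off is explicit: readers may see data one refresh-cycle old, in exchange for flat latency and no origin spike at expiry.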
```python
import asyncio
from typing import Dict, Any, Callable, Awaitable
from dataclasses import dataclass
from time import time


@dataclass
class InFlight:
    """Represents an in-progress cache computation."""
    future: asyncio.Future
    started_at: float


class SingleFlightCache:
    """
    Cache with request coalescing to prevent stampedes.

    When multiple requests arrive for the same missing key,
    only one actually computes the value. Others wait and
    share the result.
    """

    def __init__(self, cache_backend, ttl: int = 300):
        self.cache = cache_backend
        self.ttl = ttl
        self.in_flight: Dict[str, InFlight] = {}
        self._lock = asyncio.Lock()

    async def get_or_compute(
        self,
        key: str,
        compute_fn: Callable[[], Awaitable[Any]]
    ) -> Any:
        """
        Get from cache, or compute with deduplication.

        If computation is already in progress for this key,
        wait for it rather than starting a duplicate computation.
        """
        # Fast path: cache hit
        cached = await self.cache.get(key)
        if cached is not None:
            return cached

        # Slow path: cache miss, need to compute
        async with self._lock:
            # Double-check: maybe another request filled cache
            cached = await self.cache.get(key)
            if cached is not None:
                return cached

            if key in self.in_flight:
                # Computation already in flight: grab its future here,
                # but await it *outside* the lock so the lock is never
                # held across a long wait
                future = self.in_flight[key].future
                is_leader = False
            else:
                # We're the first: register the computation
                future = asyncio.get_running_loop().create_future()
                self.in_flight[key] = InFlight(future=future, started_at=time())
                is_leader = True

        if not is_leader:
            # Wait for the existing computation and share its result
            return await future

        # Compute outside the lock to allow concurrency
        try:
            result = await compute_fn()
            # Store in cache
            await self.cache.set(key, result, ttl=self.ttl)
            # Notify waiting requests
            future.set_result(result)
            return result
        except Exception as exc:
            # On failure, propagate error to all waiters
            future.set_exception(exc)
            raise
        finally:
            # Cleanup
            async with self._lock:
                del self.in_flight[key]


# Usage:
# cache = SingleFlightCache(redis_client, ttl=300)
# value = await cache.get_or_compute("product:123", fetch_product_from_db)
```

A single TTL expiration on a popular cache key can cascade into a full system outage. In 2012, Facebook famously experienced a cache stampede that generated 20 million memcached operations per second, overwhelming their infrastructure. Always implement stampede prevention for high-traffic cache entries.
A subtle but critical optimization is TTL jitter—adding randomness to TTL values to prevent synchronized expirations. When many cache entries are written at roughly the same time (e.g., after a cold start or cache flush), using identical TTLs causes them all to expire simultaneously, creating a coordinated stampede.
The Problem with Uniform TTL:
# All entries written at T=0 with TTL=300s
Entry A: expires at T=300s
Entry B: expires at T=300s
Entry C: expires at T=300s
...
Entry N: expires at T=300s
# At T=300s: ALL entries expire together → massive stampede
The Solution: Add Jitter
base_ttl = 300 seconds
jitter_range = 0.1 (10%)
actual_ttl = base_ttl × (1 + random(-jitter_range, +jitter_range))
# Results:
Entry A: TTL = 285s → expires at T=285s
Entry B: TTL = 312s → expires at T=312s
Entry C: TTL = 297s → expires at T=297s
Entry D: TTL = 330s → expires at T=330s
# Expirations spread across 45-second window instead of instant spike
```typescript
/**
 * Calculates a jittered TTL value to prevent synchronized expirations.
 *
 * @param baseTTL - The base TTL in seconds
 * @param jitterPercent - The percentage of jitter (0.0 to 1.0)
 * @returns TTL with random jitter applied
 */
function getJitteredTTL(baseTTL: number, jitterPercent: number = 0.1): number {
  // jitterPercent of 0.1 means ±10% variation
  const jitterRange = baseTTL * jitterPercent;
  const jitter = (Math.random() * 2 - 1) * jitterRange;
  // Ensure TTL is at least 1 second
  return Math.max(1, Math.round(baseTTL + jitter));
}

/**
 * Alternative: Exponential jitter for longer-lived entries.
 * Provides more spread for longer TTLs.
 */
function getExponentialJitteredTTL(baseTTL: number): number {
  // Random factor between 0.8 and 1.2, with exponential distribution
  // This creates natural clustering around base while allowing outliers
  const factor = 0.8 + (Math.random() ** 2) * 0.4;
  return Math.max(1, Math.round(baseTTL * factor));
}

// Usage examples:
const productCacheTTL = getJitteredTTL(300, 0.1);   // 270-330s
const sessionTTL = getJitteredTTL(1800, 0.05);      // 1710-1890s
const configTTL = getExponentialJitteredTTL(3600);  // Varies around 1 hour

console.log(`Product cache: ${productCacheTTL}s`);
console.log(`Session cache: ${sessionTTL}s`);
console.log(`Config cache: ${configTTL}s`);
```

Advanced Jitter Strategies:
1. Entry-based deterministic jitter: Rather than random jitter, compute jitter from the cache key hash. This ensures the same key always has the same jittered TTL, preventing erratic behavior while still spreading expirations.
```python
import hashlib

def deterministic_jitter(key: str, base_ttl: int, jitter_percent: float = 0.1) -> int:
    # Generate consistent pseudo-random value from key
    key_hash = int(hashlib.md5(key.encode()).hexdigest(), 16)
    jitter_factor = (key_hash % 1000) / 1000  # 0.0 to 0.999
    jitter_range = base_ttl * jitter_percent * 2  # Full range
    jitter = (jitter_factor * jitter_range) - (jitter_range / 2)
    return max(1, int(base_ttl + jitter))
```
2. Load-aware adaptive jitter: Increase jitter when system load is high, decrease when load is low. This dynamically spreads expirations during peak periods.
3. Cohort-based staggering: Group cache entries into time-based cohorts (e.g., by minute of initial write), with each cohort having a different base TTL. This organically prevents mass expirations.
Always add at least 5-10% jitter to TTL values unless you have a specific reason for precise expiration timing (rare). The performance cost is negligible, but the stampede prevention is significant. Many production incidents have been caused by synchronized cache expirations that jitter would have prevented.
Production systems typically employ multiple caching layers, each with its own TTL configuration. Understanding how TTLs interact across layers is crucial for maintaining consistency and optimizing performance.
The Multi-Layer Cache Stack:
┌─────────────────────────────────────────────────────┐
│ Browser Cache (Client) TTL: 60-3600s │
├─────────────────────────────────────────────────────┤
│ CDN Edge Cache TTL: 60-86400s │
├─────────────────────────────────────────────────────┤
│ Application Cache (Redis/Memcached) TTL: 60-3600s │
├─────────────────────────────────────────────────────┤
│ Database Query Cache TTL: 10-300s │
├─────────────────────────────────────────────────────┤
│ Origin Database │
└─────────────────────────────────────────────────────┘
Critical Principle: TTL Should Decrease as You Approach Origin
Entries closer to the user can tolerate longer TTLs (higher staleness tolerance), while caches closer to the origin should have shorter TTLs (stricter freshness). This creates a staleness gradient that limits worst-case data age.
The Aggregate Staleness Problem:
When TTLs stack, worst-case staleness is the sum of all TTLs in the chain:
Worst case staleness = Sum of all layer TTLs
Example:
Browser cache TTL: 300s (5 min)
CDN TTL: 600s (10 min)
App cache TTL: 300s (5 min)
-----------------------------------
Worst case staleness: 1200s (20 min!)
User could see data that's 20 minutes old, even though no individual TTL exceeds 10 minutes.
Mitigation Strategies:
1. Set outer layers to fraction of inner layers:
Origin refresh: 60s
App cache TTL: 45s (75% of origin)
CDN TTL: 30s (50% of origin)
Browser TTL: 15s (25% of origin)
Worst case staleness: 60 + 45 + 30 + 15 = 150s (still bounded)
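The arithmetic behind both examples is just a sum, but it's worth making explicit (the function name is ours):

```python
def worst_case_staleness(layer_ttls: list[float]) -> float:
    """Worst-case age of data a user can see when caches stack: each layer
    may have cached an entry that the layer behind it served just before
    that layer's own TTL expired, so the per-layer TTLs add up."""
    return sum(layer_ttls)


# Browser (300s) + CDN (600s) + app cache (300s), from the example above:
print(worst_case_staleness([300, 600, 300]))   # 1200 seconds = 20 minutes
# The fractional scheme keeps the sum bounded:
print(worst_case_staleness([60, 45, 30, 15]))  # 150 seconds
```

When budgeting freshness, start from the acceptable end-to-end staleness and divide it across layers, rather than picking each layer's TTL in isolation.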
2. Use stale-while-revalidate at outer layers: Browser and CDN serve stale while fetching fresh, reducing user-visible staleness.
3. Propagate invalidation through layers: When origin data changes, actively invalidate outer caches rather than waiting for TTL.
```
# For static assets (long-lived, versioned by URL)
Cache-Control: public, max-age=31536000, immutable
# Browser + CDN cache for 1 year; URL changes on update

# For API responses (short-lived, must be fresh)
Cache-Control: private, max-age=60, stale-while-revalidate=300
# Browser caches 60s, can serve stale up to 5min while revalidating

# For user-specific data (no shared caching)
Cache-Control: private, no-cache, max-age=0
# Force revalidation on every request

# CDN-specific with origin caching
Cache-Control: public, s-maxage=600, max-age=60
# CDN caches 10min, browser caches 1min

# Surrogates (CDN-layer control)
Surrogate-Control: max-age=86400, stale-while-revalidate=3600
Cache-Control: max-age=300
# CDN caches 1 day w/ 1hr stale-while-revalidate; browser 5min
```

Once data is in a user's browser cache, you cannot invalidate it remotely. This is why browser TTLs should be conservative and why versioned URLs (cache busting) are standard for static assets. For dynamic data, prefer no-cache (revalidate every time) over long TTLs.
TTL values should not be set-and-forget. Production systems require continuous monitoring to validate that TTL settings achieve their intended goals and adapt to changing workloads.
Key Metrics to Track:
```yaml
# Cache hit/miss rates with TTL context
- name: cache_requests_total
  type: counter
  labels: [cache_name, result, ttl_bucket]
  help: Total cache requests by result (hit/miss) and TTL bucket

- name: cache_entry_age_seconds
  type: histogram
  labels: [cache_name]
  buckets: [1, 5, 15, 30, 60, 120, 300, 600, 1800, 3600]
  help: Age of cache entries when accessed

- name: cache_ttl_remaining_seconds
  type: histogram
  labels: [cache_name]
  buckets: [0, 10, 30, 60, 120, 300, 600]
  help: Remaining TTL when entries are accessed

- name: cache_origin_requests_total
  type: counter
  labels: [cache_name, reason]
  help: Requests to origin (miss, expired, invalidated)

- name: cache_stampede_coalesced_requests
  type: counter
  labels: [cache_name]
  help: Requests that waited on in-flight computation

# Alerting rules
groups:
  - name: cache_ttl_alerts
    rules:
      - alert: LowCacheHitRate
        expr: rate(cache_requests_total{result="miss"}[5m]) / rate(cache_requests_total[5m]) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 50%"
          description: "Consider increasing TTL or reviewing cache key design"

      - alert: CacheStampedeDetected
        expr: rate(cache_origin_requests_total{reason="expired"}[1m]) > 1000
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Potential cache stampede"
          description: "High rate of origin requests due to expirations"
```

Tuning Process:
1. Baseline measurement: Before changing TTL, measure current hit rate, latency distribution, and origin load for at least one full business cycle (typically a week).
2. Hypothesis formation: Based on metrics, form a specific hypothesis: "Increasing TTL from 5m to 15m will increase hit rate by 10% with acceptable staleness increase."
3. A/B testing or gradual rollout: Don't change TTL globally. Use feature flags or percentage-based rollout to test new TTL on a subset of traffic.
4. Measure impact: Compare metrics between control and experiment groups. Verify hypothesis, watch for unexpected side effects (memory growth, staleness complaints).
5. Iterate: TTL tuning is continuous. Workloads change, data patterns evolve. Revisit TTL settings quarterly or when significant system changes occur.
When uncertain, start with shorter TTLs and increase based on data. It's easier to relax freshness requirements than to recover from stale data being served to users. Always prioritize correctness over hit rate—an incorrect cache hit is worse than a miss.
Time-To-Live based cache expiration is deceptively simple in concept but nuanced in practice. Let's consolidate the critical insights from this deep dive:

- TTL decouples caches from source systems, trading bounded staleness for simplicity: no event buses, no change tracking, no coordination.
- Absolute expiration bounds data age; sliding expiration tracks access patterns; hybrid approaches combine both guarantees.
- Optimal TTL is an engineering decision: model it from the request rate (λ) and change rate (μ), then validate against real metrics.
- Popular keys need stampede protection: request coalescing, probabilistic early refresh, lock-with-lease, or stale-while-revalidate.
- Add jitter to TTLs to prevent synchronized expirations.
- In multi-layer stacks, worst-case staleness is the sum of layer TTLs; TTLs should shorten as you approach the origin.
- Monitor hit rate, entry age, and origin load, and tune TTLs continuously rather than setting and forgetting them.
What's Next:
TTL is the most common but not the only invalidation strategy. The next page explores Event-Based Invalidation—where changes in source data actively trigger cache invalidation, providing stronger consistency guarantees at the cost of additional complexity. Together, TTL and event-based strategies form the foundation of practical cache invalidation.
You now possess deep expertise in TTL-based cache expiration. You can select appropriate TTL models, calculate optimal values, prevent stampedes, and design multi-layer TTL hierarchies. This knowledge forms the foundation for all cache invalidation strategies.