Every piece of cached data carries an implicit promise: the data you're reading reflects reality closely enough to be useful. But reality changes. Prices fluctuate, inventory depletes, user preferences evolve, and content updates. The fundamental question of cache invalidation is: How do we know when cached data has drifted too far from truth?
Time-To-Live (TTL) represents the simplest, most universal answer to this question. Rather than tracking every possible change in the source system, TTL makes a probabilistic bet: after a certain period, cached data is likely stale enough that we should refresh it.
This approach trades precision for simplicity. Instead of building complex invalidation pipelines that track every data mutation, TTL lets time itself serve as the invalidation trigger. It's the difference between a precision surgical strike and scheduled maintenance—both valid, but optimized for different contexts.
By the end of this page, you will deeply understand TTL-based cache expiration: the mathematical models that inform TTL selection, the distinction between absolute and sliding expiration, the cache stampede problem and its mitigations, tuning strategies for different data patterns, and how production systems implement TTL across multiple caching layers. You'll gain the expertise to configure TTL intelligently rather than guessing at arbitrary values.
Time-To-Live (TTL) is a value that specifies the maximum duration for which a cached entry remains valid. When a cache entry's TTL expires, the cache considers the entry stale and either evicts it immediately or marks it for refresh upon the next access.
The Core Concept:
At its essence, TTL is a timestamp-based validity check. When data enters the cache, it's paired with an expiration timestamp:
expiration_time = current_time + TTL_duration
On every cache read, the system checks:
if current_time >= expiration_time:
treat as cache miss (stale)
else:
return cached value (fresh)
This simple mechanism underlies virtually every caching system, from browser HTTP caches to distributed Redis clusters.
Why TTL Works:
TTL's power lies in its decoupling of cache from source system. The cache doesn't need to know how or when the source data changes—it only needs to know how long freshness matters. This decoupling provides several advantages:
No coordination overhead — The source system doesn't need to notify caches of changes. Each cache independently manages its own expiration.
Predictable memory reclamation — Expired entries are guaranteed to be evicted eventually, preventing unbounded memory growth from orphaned cache entries.
Graceful staleness — Rather than serving arbitrarily old data forever, TTL bounds the maximum staleness, providing a freshness guarantee (even if approximate).
Simplicity — TTL requires only a timestamp comparison. No event buses, no change tracking, no distributed coordination protocols.
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
TTL is often the first tool engineers reach for precisely because it sidesteps the complexity of explicit invalidation. Rather than solving invalidation perfectly, TTL makes it a time-bounded approximation problem.
Not all TTL implementations behave identically. The two primary models—absolute expiration and sliding expiration—serve fundamentally different use cases and have distinct performance characteristics.
| Model | Expiration formula |
|---|---|
| Absolute expiration | expiration = creation_time + TTL |
| Sliding expiration | expiration = last_access_time + TTL |

Practical Implications:
Absolute expiration guarantees that cached data is never older than TTL seconds. This is critical when data freshness has a strict upper bound—for example, regulatory requirements that prices must be updated within a certain window, or security tokens that must be re-validated periodically.
Sliding expiration optimizes for access patterns. Frequently accessed data stays cached, reducing load on the source system. Infrequently accessed data expires quickly, conserving memory. This model works well for session management, where active users should remain cached but abandoned sessions should evict.
Hybrid Approaches:
Many production systems combine both models. A common pattern:
absolute_max_ttl = 24 hours (hard ceiling)
sliding_ttl = 30 minutes (activity-based)
Entry expires when:
- 30 minutes pass without access, OR
- 24 hours pass since creation (whichever comes first)
This prevents indefinitely cached stale data while still optimizing for access patterns.
```python
import redis
import time


class HybridTTLCache:
    """
    Implements a hybrid absolute + sliding TTL strategy.

    - Absolute TTL: Maximum time entry can live (hard ceiling)
    - Sliding TTL: Time until expiration resets on each access
    """

    def __init__(self, redis_client: redis.Redis):
        self.client = redis_client
        self.ABSOLUTE_TTL = 86400  # 24 hours in seconds
        self.SLIDING_TTL = 1800    # 30 minutes in seconds

    def set(self, key: str, value: str) -> None:
        """Store value with both absolute and sliding TTL metadata."""
        creation_time = time.time()
        # Use Redis hash to store value + metadata
        self.client.hset(f"cache:{key}", mapping={
            "value": value,
            "created_at": creation_time,
            "last_accessed": creation_time
        })
        # Set initial TTL to sliding window
        self.client.expire(f"cache:{key}", self.SLIDING_TTL)

    def get(self, key: str) -> str | None:
        """Retrieve value, updating sliding TTL if within absolute bounds."""
        cache_key = f"cache:{key}"
        data = self.client.hgetall(cache_key)

        if not data:
            return None  # Cache miss

        current_time = time.time()
        created_at = float(data[b'created_at'])

        # Check if absolute TTL exceeded
        if current_time - created_at >= self.ABSOLUTE_TTL:
            self.client.delete(cache_key)
            return None  # Treat as miss, force refresh

        # Calculate remaining absolute time
        remaining_absolute = self.ABSOLUTE_TTL - (current_time - created_at)

        # New expiration is minimum of sliding TTL or remaining absolute time
        new_expiry = min(self.SLIDING_TTL, int(remaining_absolute))

        # Update last access time and reset expiration
        self.client.hset(cache_key, "last_accessed", current_time)
        self.client.expire(cache_key, new_expiry)

        return data[b'value'].decode('utf-8')
```

Use absolute expiration when data has a known staleness threshold (prices, scores, metrics). Use sliding expiration when access patterns indicate relevance (sessions, user state). Use hybrid when you need both freshness guarantees and access optimization. Most production systems benefit from hybrid approaches.
Selecting optimal TTL values is not guesswork—it's an engineering decision informed by data patterns, performance requirements, and business constraints. Understanding the mathematical relationships helps you make principled choices.
The Cache Hit Rate Model:
Let's define:

- λ (lambda) — the request rate: how many reads per second hit the cache
- μ (mu) — the change rate: how many times per second the source data changes
- T — the TTL duration in seconds

The probability that cached data is still valid (a cache hit with fresh data) follows:
P(fresh) = e^(-μ × T)
This is an exponential decay function. As TTL increases, the probability of serving stale data increases. As the source data change rate (μ) increases, stale probability increases faster.
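A quick numeric sketch of this decay (the function name is ours; the formula is the one above):

```python
import math


def p_fresh(mu: float, age_seconds: float) -> float:
    """P(fresh) = e^(-mu * t): probability the source has NOT changed
    within `age_seconds`, given a change rate of `mu` changes/second."""
    return math.exp(-mu * age_seconds)


# Source data changes about once per minute (mu = 1/60 changes/sec):
mu = 1 / 60
print(p_fresh(mu, 10))   # ~0.85: a 10-second-old entry is probably fresh
print(p_fresh(mu, 60))   # ~0.37: after one mean change interval
print(p_fresh(mu, 300))  # ~0.007: a 5-minute-old entry is almost surely stale
```

The takeaway: freshness does not degrade linearly with TTL; it falls off exponentially, which is why doubling a TTL can more than double your stale-read rate.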
The Freshness-Hit Rate Trade-off:
The cache hit rate depends on TTL relative to request frequency:
Very short TTL (T → 0): Nearly every request is a cache miss. You've effectively disabled caching. Hit rate approaches 0%.
Very long TTL (T → ∞): Nearly every request is a hit, but staleness grows unbounded. Hit rate approaches 100%, but freshness approaches 0%.
The optimal TTL balances these extremes. For a given workload:
Optimal TTL ≈ 1 / μ × ln(λ / μ)
When λ >> μ (requests far exceed changes), longer TTLs are optimal. When λ ≈ μ (requests and changes occur at similar rates), shorter TTLs prevent stale reads.
Example Calculation:
Consider a product catalog where items change about once per minute (μ ≈ 0.0167 changes/second) and the catalog serves λ = 1000 requests/second:
Optimal TTL ≈ (1 / 0.0167) × ln(1000 / 0.0167)
≈ 60 × ln(60000)
≈ 60 × 11.0
≈ 660 seconds ≈ 11 minutes
This suggests an 11-minute TTL balances hit rate against staleness for this workload.
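The calculation above can be captured in a small helper (the function name and the fallback for the λ ≤ μ case are our own choices; the formula is the one stated above, which assumes λ >> μ):

```python
import math


def optimal_ttl(request_rate: float, change_rate: float) -> float:
    """Optimal TTL ≈ (1/mu) * ln(lambda/mu), valid when lambda >> mu.

    request_rate: lambda, requests per second
    change_rate:  mu, source-data changes per second
    """
    if request_rate <= change_rate:
        # Requests are no more frequent than changes: caching buys little;
        # fall back to roughly one mean change interval
        return 1 / change_rate
    return (1 / change_rate) * math.log(request_rate / change_rate)


# The product-catalog workload above: lambda = 1000 req/s, mu = 1 change/min
ttl = optimal_ttl(request_rate=1000, change_rate=1 / 60)
print(f"{ttl:.0f} seconds")  # ~660s, i.e. about 11 minutes
```

Treat the result as a starting point for measurement, not a final answer: the model ignores stampedes, memory pressure, and business staleness limits.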
| Data Type | Typical Update Frequency | Recommended TTL Range | Rationale |
|---|---|---|---|
| Static assets (images, fonts) | Weeks/months | 1 day - 1 year | Immutable content; version via URL change |
| Configuration data | Days/weeks | 1 hour - 24 hours | Changes are infrequent but should propagate reasonably |
| Product catalog | Hours/days | 5 minutes - 1 hour | Price/inventory changes; balance freshness vs. load |
| User profile/preferences | User-dependent | 5 - 30 minutes | Personal data; tolerable staleness is subjective |
| Session/auth tokens | Per request/action | Sliding: 15-60 minutes | Security-sensitive; sliding keeps active users valid |
| Real-time data (stock prices) | Seconds | 5 - 60 seconds | High change rate; short TTL or consider push invalidation |
| Search results | Query-dependent | 30 seconds - 5 minutes | Balance result freshness vs. query load reduction |
| Aggregated metrics/analytics | Batch updates | 5 minutes - 1 hour | Expensive to compute; stale-on-revalidate pattern works well |
Many teams set TTL to arbitrary round numbers (5 minutes, 1 hour) without analysis. This often leaves performance on the table or serves unnecessarily stale data. Always analyze your actual update frequency and request patterns to select TTL rationally. Instrument your cache to measure actual staleness when possible.
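As a minimal illustration of that instrumentation (class and attribute names are ours; in production the samples would feed a histogram metric rather than a list), a cache can record each entry's age at access time:

```python
import time


class AgeTrackingCache:
    """Dict-backed cache that records entry age at each access, so the
    staleness actually observed by readers can inform TTL tuning."""

    def __init__(self):
        self._store = {}          # key -> (value, created_at)
        self.observed_ages = []   # seconds; export as a histogram in production

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, created_at = entry
        # Record how old this entry was when it was actually read
        self.observed_ages.append(time.monotonic() - created_at)
        return value
```

If the recorded ages cluster far below the TTL, the TTL is doing little work; if they cluster at the TTL, entries are dying of old age and a longer TTL (or refresh-ahead) may pay off.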
One of the most insidious problems with TTL-based caching is the cache stampede (also called thundering herd or cache avalanche). This occurs when a popular cache entry expires, and multiple concurrent requests simultaneously attempt to regenerate the cached value.
The Problem Illustrated:
This problem is especially severe when:

- The expiring entry is hot (many concurrent requests per second target the same key)
- Regenerating the value is expensive (slow queries, heavy aggregation)
- The origin has limited headroom, so the miss spike pushes it past capacity
```
Timeline: TTL Expiration at T=0

Request Timeline:
T=-1s:   [Cache HIT] ✓    1000 req/s served from cache
T=0:     [Cache EXPIRES]  TTL reached
T=0.1s:  [Cache MISS] ✗   Request 1: Hits DB
T=0.1s:  [Cache MISS] ✗   Request 2: Hits DB
T=0.1s:  [Cache MISS] ✗   Request 3: Hits DB
         ...
T=0.1s:  [Cache MISS] ✗   Request 1000: Hits DB

Database Load:
T=-1s:   [===]            50 QPS (background traffic)
T=0.1s:  [===========...] 1050 QPS (STAMPEDE!)
                          ↑ 21× spike!

Result: Latency spike, potential timeout cascade, partial outage
```

Mitigation Strategies:
1. Request Coalescing (Single-Flight Pattern)
When multiple concurrent requests target the same missing cache key, only one request actually computes the value. Other requests wait for the computation to complete and share the result.
2. Probabilistic Early Expiration (Background Refresh)
Before TTL expires, randomly selected requests pre-emptively refresh the cache. This ensures the cache is always warm when the actual TTL hits.
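A common way to implement this is the "XFetch"-style probabilistic check sketched below (the function name and parameters are ours): the closer a request arrives to the expiry, relative to how long recomputation takes, the more likely it is to trigger an early refresh.

```python
import math
import random
import time


def should_refresh_early(expiry_time: float, compute_cost: float,
                         beta: float = 1.0) -> bool:
    """Probabilistic early expiration: return True if this request should
    refresh the entry now, before `expiry_time` is actually reached.

    compute_cost: seconds a recomputation typically takes
    beta:         > 1 refreshes more aggressively, < 1 less so
    """
    # -log(1 - U) is an exponential random variable: usually small (refresh
    # only very near the deadline), occasionally large (refresh well ahead).
    gap = compute_cost * beta * -math.log(1 - random.random())
    return time.time() + gap >= expiry_time
```

Each request that reads a near-expiry entry rolls these dice; the first "winner" recomputes and pushes the expiry forward, so the hard deadline is rarely reached and no coordinated miss occurs.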
3. Lock-with-Lease Pattern
The first request to find a cache miss acquires a short-lived lock, preventing other requests from also hitting the origin. Other requests either wait for the lock to release (and cache to repopulate) or return stale data with a warning.
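An in-memory sketch of this pattern (all names are ours): the lock lets exactly one request recompute, while losers fall back to the last known value. In a distributed setup the same idea is typically built on an atomic set-if-absent with an expiry (e.g. Redis `SET ... NX EX`), so the lease frees itself if the holder crashes.

```python
import threading
import time


class LockWithLeaseCache:
    """On a miss, the first request takes a per-key lock and recomputes;
    concurrent requests serve the previous (stale) value instead of
    piling onto the origin."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}   # key -> (value, expiration_time)
        self._stale = {}   # key -> last known value, served while locked
        self._locks = {}   # key -> per-key recompute lock
        self._registry_lock = threading.Lock()

    def _lock_for(self, key):
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key, compute_fn):
        entry = self._store.get(key)
        if entry and time.monotonic() < entry[1]:
            return entry[0]  # fresh hit

        lock = self._lock_for(key)
        if lock.acquire(blocking=False):  # only one recomputer per key
            try:
                value = compute_fn()
                self._store[key] = (value, time.monotonic() + self.ttl)
                self._stale[key] = value  # keep a copy for lock losers
                return value
            finally:
                lock.release()
        # Lock held by another request: serve stale data rather than wait
        return self._stale.get(key)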
4. Stale-While-Revalidate
Serve stale data immediately while asynchronously refreshing the cache in the background. Users get fast responses (stale), and the cache refreshes without stampede.
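An async sketch of stale-while-revalidate (names are ours): expired entries are returned immediately, and at most one background task per key refreshes the value.

```python
import asyncio
import time


class StaleWhileRevalidateCache:
    """Serve expired entries immediately while one background task
    refreshes them, so readers never wait on recomputation."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}        # key -> (value, expiration_time)
        self._refreshing = set()  # keys with a refresh already running

    async def get_or_compute(self, key, compute_fn):
        entry = self._store.get(key)
        if entry is None:
            # Cold miss: nothing stale to serve, compute synchronously
            return await self._refresh(key, compute_fn)
        value, expiry = entry
        if time.monotonic() >= expiry and key not in self._refreshing:
            # Stale: kick off one background refresh, serve stale now
            self._refreshing.add(key)
            asyncio.create_task(self._refresh(key, compute_fn))
        return value

    async def _refresh(self, key, compute_fn):
        try:
            value = await compute_fn()
            self._store[key] = (value, time.monotonic() + self.ttl)
            return value
        finally:
            self._refreshing.discard(key)
```

The trade-off is explicit: readers may see data one refresh-cycle old, in exchange for flat latency and no origin spike at expiry.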
```python
import asyncio
from typing import Dict, Any, Callable, Awaitable
from dataclasses import dataclass
from time import time


@dataclass
class InFlight:
    """Represents an in-progress cache computation."""
    future: asyncio.Future
    started_at: float


class SingleFlightCache:
    """
    Cache with request coalescing to prevent stampedes.

    When multiple requests arrive for the same missing key,
    only one actually computes the value. Others wait and
    share the result.
    """

    def __init__(self, cache_backend, ttl: int = 300):
        self.cache = cache_backend
        self.ttl = ttl
        self.in_flight: Dict[str, InFlight] = {}
        self._lock = asyncio.Lock()

    async def get_or_compute(
        self,
        key: str,
        compute_fn: Callable[[], Awaitable[Any]]
    ) -> Any:
        """
        Get from cache, or compute with deduplication.

        If computation is already in progress for this key,
        wait for it rather than starting a duplicate computation.
        """
        # Fast path: cache hit
        cached = await self.cache.get(key)
        if cached is not None:
            return cached

        # Slow path: cache miss, need to compute
        async with self._lock:
            # Double-check: maybe another request filled cache
            cached = await self.cache.get(key)
            if cached is not None:
                return cached

            if key in self.in_flight:
                # Computation already in flight: grab its future here,
                # but await it *outside* the lock so the lock is never
                # held across a long wait
                future = self.in_flight[key].future
                is_leader = False
            else:
                # We're the first: register the computation
                future = asyncio.get_running_loop().create_future()
                self.in_flight[key] = InFlight(future=future, started_at=time())
                is_leader = True

        if not is_leader:
            # Wait for the existing computation and share its result
            return await future

        # Compute outside the lock to allow concurrency
        try:
            result = await compute_fn()
            # Store in cache
            await self.cache.set(key, result, ttl=self.ttl)
            # Notify waiting requests
            future.set_result(result)
            return result
        except Exception as exc:
            # On failure, propagate error to all waiters
            future.set_exception(exc)
            raise
        finally:
            # Cleanup
            async with self._lock:
                del self.in_flight[key]


# Usage:
# cache = SingleFlightCache(redis_client, ttl=300)
# value = await cache.get_or_compute("product:123", fetch_product_from_db)
```

A single TTL expiration on a popular cache key can cascade into a full system outage. In 2012, Facebook famously experienced a cache stampede that generated 20 million memcached operations per second, overwhelming their infrastructure. Always implement stampede prevention for high-traffic cache entries.
A subtle but critical optimization is TTL jitter—adding randomness to TTL values to prevent synchronized expirations. When many cache entries are written at roughly the same time (e.g., after a cold start or cache flush), using identical TTLs causes them all to expire simultaneously, creating a coordinated stampede.
The Problem with Uniform TTL:
# All entries written at T=0 with TTL=300s
Entry A: expires at T=300s
Entry B: expires at T=300s
Entry C: expires at T=300s
...
Entry N: expires at T=300s
# At T=300s: ALL entries expire together → massive stampede
The Solution: Add Jitter
base_ttl = 300 seconds
jitter_range = 0.1 (10%)
actual_ttl = base_ttl × (1 + random(-jitter_range, +jitter_range))
# Results:
Entry A: TTL = 285s → expires at T=285s
Entry B: TTL = 312s → expires at T=312s
Entry C: TTL = 297s → expires at T=297s
Entry D: TTL = 330s → expires at T=330s
# Expirations spread across 45-second window instead of instant spike
```typescript
/**
 * Calculates a jittered TTL value to prevent synchronized expirations.
 *
 * @param baseTTL - The base TTL in seconds
 * @param jitterPercent - The percentage of jitter (0.0 to 1.0)
 * @returns TTL with random jitter applied
 */
function getJitteredTTL(baseTTL: number, jitterPercent: number = 0.1): number {
  // jitterPercent of 0.1 means ±10% variation
  const jitterRange = baseTTL * jitterPercent;
  const jitter = (Math.random() * 2 - 1) * jitterRange;
  // Ensure TTL is at least 1 second
  return Math.max(1, Math.round(baseTTL + jitter));
}

/**
 * Alternative: Exponential jitter for longer-lived entries.
 * Provides more spread for longer TTLs.
 */
function getExponentialJitteredTTL(baseTTL: number): number {
  // Random factor between 0.8 and 1.2, with exponential distribution
  // This creates natural clustering around base while allowing outliers
  const factor = 0.8 + (Math.random() ** 2) * 0.4;
  return Math.max(1, Math.round(baseTTL * factor));
}

// Usage examples:
const productCacheTTL = getJitteredTTL(300, 0.1);   // 270-330s
const sessionTTL = getJitteredTTL(1800, 0.05);      // 1710-1890s
const configTTL = getExponentialJitteredTTL(3600);  // Varies around 1 hour

console.log(`Product cache: ${productCacheTTL}s`);
console.log(`Session cache: ${sessionTTL}s`);
console.log(`Config cache: ${configTTL}s`);
```

Advanced Jitter Strategies:
1. Entry-based deterministic jitter: Rather than random jitter, compute jitter from the cache key hash. This ensures the same key always has the same jittered TTL, preventing erratic behavior while still spreading expirations.
```python
import hashlib

def deterministic_jitter(key: str, base_ttl: int, jitter_percent: float = 0.1) -> int:
    # Generate consistent pseudo-random value from key
    key_hash = int(hashlib.md5(key.encode()).hexdigest(), 16)
    jitter_factor = (key_hash % 1000) / 1000  # 0.0 to 0.999
    jitter_range = base_ttl * jitter_percent * 2  # Full range
    jitter = (jitter_factor * jitter_range) - (jitter_range / 2)
    return max(1, int(base_ttl + jitter))
```
2. Load-aware adaptive jitter: Increase jitter when system load is high, decrease when load is low. This dynamically spreads expirations during peak periods.
3. Cohort-based staggering: Group cache entries into time-based cohorts (e.g., by minute of initial write), with each cohort having a different base TTL. This organically prevents mass expirations.
Always add at least 5-10% jitter to TTL values unless you have a specific reason for precise expiration timing (rare). The performance cost is negligible, but the stampede prevention is significant. Many production incidents have been caused by synchronized cache expirations that jitter would have prevented.
Production systems typically employ multiple caching layers, each with its own TTL configuration. Understanding how TTLs interact across layers is crucial for maintaining consistency and optimizing performance.
The Multi-Layer Cache Stack:
┌─────────────────────────────────────────────────────┐
│ Browser Cache (Client) TTL: 60-3600s │
├─────────────────────────────────────────────────────┤
│ CDN Edge Cache TTL: 60-86400s │
├─────────────────────────────────────────────────────┤
│ Application Cache (Redis/Memcached) TTL: 60-3600s │
├─────────────────────────────────────────────────────┤
│ Database Query Cache TTL: 10-300s │
├─────────────────────────────────────────────────────┤
│ Origin Database │
└─────────────────────────────────────────────────────┘
Critical Principle: TTL Should Decrease as You Approach Origin
Entries closer to the user can tolerate longer TTLs (higher staleness tolerance), while caches closer to the origin should have shorter TTLs (stricter freshness). This creates a staleness gradient that limits worst-case data age.
The Aggregate Staleness Problem:
When TTLs stack, worst-case staleness is the sum of all TTLs in the chain:
Worst case staleness = Sum of all layer TTLs
Example:
Browser cache TTL: 300s (5 min)
CDN TTL: 600s (10 min)
App cache TTL: 300s (5 min)
-----------------------------------
Worst case staleness: 1200s (20 min!)
User could see data that's 20 minutes old, even though no individual TTL exceeds 10 minutes.
Mitigation Strategies:
1. Set outer layers to fraction of inner layers:
Origin refresh: 60s
App cache TTL: 45s (75% of origin)
CDN TTL: 30s (50% of origin)
Browser TTL: 15s (25% of origin)
Worst case staleness: 60 + 45 + 30 + 15 = 150s (still bounded)
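The arithmetic behind both examples is just a sum, but it's worth making explicit (the function name is ours):

```python
def worst_case_staleness(layer_ttls: list[float]) -> float:
    """Worst-case age of data a user can see when caches stack: each layer
    may have cached an entry that the layer behind it served just before
    that layer's own TTL expired, so the per-layer TTLs add up."""
    return sum(layer_ttls)


# Browser (300s) + CDN (600s) + app cache (300s), from the example above:
print(worst_case_staleness([300, 600, 300]))   # 1200 seconds = 20 minutes
# The fractional scheme keeps the sum bounded:
print(worst_case_staleness([60, 45, 30, 15]))  # 150 seconds
```

When budgeting freshness, start from the acceptable end-to-end staleness and divide it across layers, rather than picking each layer's TTL in isolation.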
2. Use stale-while-revalidate at outer layers: Browser and CDN serve stale while fetching fresh, reducing user-visible staleness.
3. Propagate invalidation through layers: When origin data changes, actively invalidate outer caches rather than waiting for TTL.
```
# For static assets (long-lived, versioned by URL)
Cache-Control: public, max-age=31536000, immutable
# Browser + CDN cache for 1 year; URL changes on update

# For API responses (short-lived, must be fresh)
Cache-Control: private, max-age=60, stale-while-revalidate=300
# Browser caches 60s, can serve stale up to 5min while revalidating

# For user-specific data (no shared caching)
Cache-Control: private, no-cache, max-age=0
# Force revalidation on every request

# CDN-specific with origin caching
Cache-Control: public, s-maxage=600, max-age=60
# CDN caches 10min, browser caches 1min

# Surrogates (CDN-layer control)
Surrogate-Control: max-age=86400, stale-while-revalidate=3600
Cache-Control: max-age=300
# CDN caches 1 day w/ 1hr stale-while-revalidate; browser 5min
```

Once data is in a user's browser cache, you cannot invalidate it remotely. This is why browser TTLs should be conservative and why versioned URLs (cache busting) are standard for static assets. For dynamic data, prefer no-cache (revalidate every time) over long TTLs.
TTL values should not be set-and-forget. Production systems require continuous monitoring to validate that TTL settings achieve their intended goals and adapt to changing workloads.
Key Metrics to Track:
```yaml
# Cache hit/miss rates with TTL context
- name: cache_requests_total
  type: counter
  labels: [cache_name, result, ttl_bucket]
  help: Total cache requests by result (hit/miss) and TTL bucket

- name: cache_entry_age_seconds
  type: histogram
  labels: [cache_name]
  buckets: [1, 5, 15, 30, 60, 120, 300, 600, 1800, 3600]
  help: Age of cache entries when accessed

- name: cache_ttl_remaining_seconds
  type: histogram
  labels: [cache_name]
  buckets: [0, 10, 30, 60, 120, 300, 600]
  help: Remaining TTL when entries are accessed

- name: cache_origin_requests_total
  type: counter
  labels: [cache_name, reason]
  help: Requests to origin (miss, expired, invalidated)

- name: cache_stampede_coalesced_requests
  type: counter
  labels: [cache_name]
  help: Requests that waited on in-flight computation

# Alerting rules
groups:
  - name: cache_ttl_alerts
    rules:
      - alert: LowCacheHitRate
        expr: rate(cache_requests_total{result="miss"}[5m]) / rate(cache_requests_total[5m]) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 50%"
          description: "Consider increasing TTL or reviewing cache key design"

      - alert: CacheStampedeDetected
        expr: rate(cache_origin_requests_total{reason="expired"}[1m]) > 1000
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Potential cache stampede"
          description: "High rate of origin requests due to expirations"
```

Tuning Process:
1. Baseline measurement: Before changing TTL, measure current hit rate, latency distribution, and origin load for at least one full business cycle (typically a week).
2. Hypothesis formation: Based on metrics, form a specific hypothesis: "Increasing TTL from 5m to 15m will increase hit rate by 10% with acceptable staleness increase."
3. A/B testing or gradual rollout: Don't change TTL globally. Use feature flags or percentage-based rollout to test new TTL on a subset of traffic.
4. Measure impact: Compare metrics between control and experiment groups. Verify hypothesis, watch for unexpected side effects (memory growth, staleness complaints).
5. Iterate: TTL tuning is continuous. Workloads change, data patterns evolve. Revisit TTL settings quarterly or when significant system changes occur.
When uncertain, start with shorter TTLs and increase based on data. It's easier to relax freshness requirements than to recover from stale data being served to users. Always prioritize correctness over hit rate—an incorrect cache hit is worse than a miss.
Time-To-Live based cache expiration is deceptively simple in concept but nuanced in practice. Let's consolidate the critical insights from this deep dive:

- TTL decouples caches from source systems, trading bounded staleness for simplicity: no event buses, no change tracking, no coordination.
- Absolute expiration bounds data age; sliding expiration tracks access patterns; hybrid approaches combine both guarantees.
- Optimal TTL is an engineering decision: model it from the request rate (λ) and change rate (μ), then validate against real metrics.
- Popular keys need stampede protection: request coalescing, probabilistic early refresh, lock-with-lease, or stale-while-revalidate.
- Add jitter to TTLs to prevent synchronized expirations.
- In multi-layer stacks, worst-case staleness is the sum of layer TTLs; TTLs should shorten as you approach the origin.
- Monitor hit rate, entry age, and origin load, and tune TTLs continuously rather than setting and forgetting them.
What's Next:
TTL is the most common but not the only invalidation strategy. The next page explores Event-Based Invalidation—where changes in source data actively trigger cache invalidation, providing stronger consistency guarantees at the cost of additional complexity. Together, TTL and event-based strategies form the foundation of practical cache invalidation.
You now possess deep expertise in TTL-based cache expiration. You can select appropriate TTL models, calculate optimal values, prevent stampedes, and design multi-layer TTL hierarchies. This knowledge forms the foundation for all cache invalidation strategies.