Memory is finite. This simple truth underlies one of the most critical decisions in cache design: when the cache is full and a new item must be stored, what gets removed?
This decision—cache eviction—appears deceptively simple. The naive answer is "remove something old or unused." But the engineering reality is far more nuanced. The choice of eviction policy directly impacts cache hit rate, which in turn affects system latency, origin load, and infrastructure costs. Raising the hit rate from 90% to 95%, for example, cuts misses from 10% to 5% of requests, halving the traffic that reaches your database.
Eviction policies have been studied extensively in computer science for decades, from CPU cache design to virtual memory systems to distributed caches. The fundamental challenge is predicting the future: which items will be accessed soon, and which won't? Since we can't actually predict the future, eviction policies use heuristics based on past access patterns.
Different policies make different assumptions about access patterns, and no single policy dominates all workloads. Understanding the trade-offs between policies—and when each excels—is essential for designing high-performance caches.
This page examines eviction policies comprehensively: from classical algorithms through modern approximations, from theoretical optimality through practical implementation constraints.
By the end of this page, you will be able to:
- Understand the theoretical basis for eviction policies and the optimal algorithm
- Compare LRU, LFU, and hybrid policies and their trade-offs
- Implement efficient approximations used in production systems
- Analyze which policies suit which workloads
- Recognize the impact of eviction policy on overall cache performance
Before examining practical policies, we must understand the theoretical optimum: Bélády's MIN algorithm (1966).
The Algorithm
Bélády's algorithm is simple: when eviction is needed, remove the item that will be used furthest in the future (or never used again). This minimizes cache misses because it keeps items that will be needed soon.
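Expressed as code over a trace that is fully known in advance, the policy is short. This is an illustrative sketch only; a real cache could never run it, precisely because it reads the future:

```python
def belady_simulate(accesses: list[str], capacity: int) -> int:
    """Simulate Bélády's MIN over a fully known access trace; return hit count."""
    cache: set[str] = set()
    hits = 0
    for i, key in enumerate(accesses):
        if key in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # Evict the cached key whose next use is furthest away (or never).
            def next_use(k: str) -> int:
                future = accesses[i + 1:]
                return future.index(k) if k in future else len(accesses) + 1
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return hits


# Example: with capacity 2, MIN keeps "a" resident across the whole trace.
print(belady_simulate(["a", "b", "c", "a", "b", "a"], capacity=2))  # 2 hits
```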
Why It's Optimal
The intuition is straightforward: if we evict something that won't be needed for a long time, we delay the miss penalty. If we evict something needed immediately, we cause an immediate miss.
Bélády proved this is optimal: no other policy can achieve fewer cache misses for a given sequence of accesses.
Why We Can't Use It
The catch is obvious: Bélády's algorithm requires knowing the future. In real systems, we don't know which keys will be accessed next. This makes Bélády's algorithm useful only as a benchmark—we can compare actual policies against it to see how close to optimal they perform.
The Benchmark Gap
| Policy | Hit Rate (illustrative) | Gap from Optimal | When It Excels |
|---|---|---|---|
| Bélády's MIN | 97% | — | Theoretical only |
| LRU | 89% | ~8% | Temporal locality workloads |
| LFU | 91% | ~6% | Frequency-skewed workloads |
| ARC (Adaptive) | 94% | ~3% | Mixed workloads |
| Random | 75% | ~22% | Never (baseline only) |
Since Bélády's algorithm is impractical, all real eviction policies are heuristics that approximate it. The goal is to get as close to optimal as possible while remaining computationally efficient. The best policies achieve hit rates within a few percentage points of optimal for typical workloads.
LRU is the most widely used eviction policy. Its premise is simple: items accessed recently are likely to be accessed again soon (temporal locality). When eviction is needed, remove the item that was accessed longest ago.
Why LRU Works Well
Many real-world access patterns exhibit temporal locality:
- A user who just loaded a page is likely to request it (or closely related data) again within seconds
- Session data touched by the current request will almost certainly be touched by the next one
- Content being read right now tends to keep being read for a while
For these patterns, LRU approximates Bélády well because 'recently used' often correlates with 'soon to be used'.
LRU Implementation
The classic LRU implementation uses a doubly-linked list + hash map:
```python
from collections import OrderedDict
from typing import Optional, TypeVar, Generic

K = TypeVar('K')
V = TypeVar('V')


class LRUCache(Generic[K, V]):
    """
    LRU Cache implementation using OrderedDict.

    Operations:
    - get(key): O(1)
    - put(key, value): O(1)
    - eviction: O(1)
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache: OrderedDict[K, V] = OrderedDict()

    def get(self, key: K) -> Optional[V]:
        """
        Get value for key, marking it as most recently used.
        Returns None if key not present.
        """
        if key not in self.cache:
            return None
        # Move to end (most recently used)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key: K, value: V) -> Optional[K]:
        """
        Insert or update key-value pair.
        Returns evicted key if eviction occurred, None otherwise.
        """
        evicted_key = None

        if key in self.cache:
            # Update existing: move to end
            self.cache.move_to_end(key)
        else:
            # New key: check capacity
            if len(self.cache) >= self.capacity:
                # Evict least recently used (first item)
                evicted_key, _ = self.cache.popitem(last=False)

        self.cache[key] = value
        return evicted_key

    def delete(self, key: K) -> bool:
        """Remove key from cache. Returns True if key existed."""
        if key in self.cache:
            del self.cache[key]
            return True
        return False


# Example usage
cache = LRUCache[str, str](capacity=3)
cache.put("a", "1")  # Cache: [a]
cache.put("b", "2")  # Cache: [a, b]
cache.put("c", "3")  # Cache: [a, b, c]
cache.get("a")       # Cache: [b, c, a] (a moved to end)
cache.put("d", "4")  # Cache: [c, a, d] (b evicted)
```

LRU Data Structure
The combination of doubly-linked list and hash map provides:
- O(1) lookup: the hash map points directly at each list node
- O(1) recency update: on access, a node is unlinked and moved to the head
- O(1) eviction: the least recently used item is always at the tail
A single sequential scan of data larger than the cache (e.g., a backup job reading all records) evicts the entire hot working set. After the scan, the cache is cold and hit rate drops to near zero until the working set is rebuilt. This is LRU's most significant weakness.
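This failure mode is easy to reproduce. The demo below assumes the LRUCache class from the listing above is in scope; key names and sizes are made up for illustration:

```python
# Assumes the LRUCache class defined earlier on this page is in scope.
cache = LRUCache[str, str](capacity=100)

# Build a hot working set that fits comfortably in the cache.
hot_keys = [f"hot:{i}" for i in range(100)]
for key in hot_keys:
    cache.put(key, "value")

assert all(cache.get(k) is not None for k in hot_keys)  # 100% hits

# A single pass over data larger than the cache (e.g., a backup scan)...
for i in range(1000):
    cache.put(f"scan:{i}", "value")

# ...leaves none of the hot keys behind: every hot access is now a miss.
hits = sum(1 for k in hot_keys if cache.get(k) is not None)
print(f"Hot-key hits after scan: {hits}/100")  # 0/100
```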
LFU takes a different approach: items accessed more often are more valuable (frequency locality). When eviction is needed, remove the item with the lowest access count.
Why LFU Makes Sense
Some access patterns are frequency-dominated:
- A small set of hot items (popular products, viral content, reference or configuration data) receives the bulk of requests
- Popularity follows a Zipf-like distribution: the head is accessed orders of magnitude more often than the long tail
- Which items are hot changes slowly compared to how often they are accessed
For these workloads, an item's past frequency is a strong predictor of future access probability.
Naive LFU Implementation
```python
from collections import defaultdict
from typing import Optional, TypeVar, Generic, Dict

K = TypeVar('K')
V = TypeVar('V')


class LFUCache(Generic[K, V]):
    """
    LFU Cache implementation with O(1) operations.

    Uses three data structures:
    1. key -> (value, frequency) mapping
    2. frequency -> set of keys with that frequency
    3. Track minimum frequency for O(1) eviction
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.min_freq = 0

        # Key -> (value, frequency)
        self.key_value_freq: Dict[K, tuple[V, int]] = {}

        # Frequency -> OrderedSet of keys (for LRU tie-breaking)
        # Using dict as ordered set (Python 3.7+)
        self.freq_keys: Dict[int, Dict[K, None]] = defaultdict(dict)

    def _update_frequency(self, key: K) -> None:
        """Increment frequency of a key."""
        value, freq = self.key_value_freq[key]

        # Remove from current frequency bucket
        del self.freq_keys[freq][key]

        # Update min_freq if current bucket is now empty
        if not self.freq_keys[freq]:
            del self.freq_keys[freq]
            if self.min_freq == freq:
                self.min_freq = freq + 1

        # Add to new frequency bucket
        new_freq = freq + 1
        self.freq_keys[new_freq][key] = None  # Using dict as ordered set
        self.key_value_freq[key] = (value, new_freq)

    def get(self, key: K) -> Optional[V]:
        """Get value and increment frequency."""
        if key not in self.key_value_freq:
            return None
        self._update_frequency(key)
        return self.key_value_freq[key][0]

    def put(self, key: K, value: V) -> Optional[K]:
        """Insert or update, with LFU eviction if needed."""
        if self.capacity <= 0:
            return None

        evicted_key = None

        if key in self.key_value_freq:
            # Update existing key
            _, freq = self.key_value_freq[key]
            self.key_value_freq[key] = (value, freq)
            self._update_frequency(key)
        else:
            # New key
            if len(self.key_value_freq) >= self.capacity:
                # Evict LFU (use LRU for tie-breaking)
                lfu_keys = self.freq_keys[self.min_freq]
                evicted_key = next(iter(lfu_keys))  # First inserted at this freq
                del lfu_keys[evicted_key]
                if not lfu_keys:
                    del self.freq_keys[self.min_freq]
                del self.key_value_freq[evicted_key]

            # Insert new key with frequency 1
            self.key_value_freq[key] = (value, 1)
            self.freq_keys[1][key] = None
            self.min_freq = 1

        return evicted_key
```

LFU Trade-offs
LFU's biggest weakness: items that were once popular but are no longer relevant keep high counts forever. A news article from last year with 1 million historical accesses will never be evicted, even though it will never be accessed again. This is called 'cache pollution' or 'history pollution.' Practical LFU variants counter it by aging the counters (for example, periodically halving them) so stale popularity decays over time.
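A contrived demonstration of the problem, assuming the LFUCache class from the listing above is in scope:

```python
# Assumes the LFUCache class defined earlier on this page is in scope.
cache = LFUCache[str, str](capacity=2)

# An item that was very popular in the past...
cache.put("old-article", "content")
for _ in range(100_000):
    cache.get("old-article")      # its frequency is now enormous

# ...crowds out everything that is actually being accessed today.
cache.put("today:a", "content")   # frequency 1
cache.put("today:b", "content")   # evicts today:a, never old-article
print(cache.get("today:a"))       # None: today's key lost the tie to history
```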
LRU and LFU each have significant weaknesses. Modern eviction policies combine their strengths or adapt based on observed access patterns.
LRU-K and 2Q
LRU-K (1993) bases eviction on the time of the K-th most recent access rather than the most recent one; items with fewer than K recorded accesses are preferred victims. In effect, a key must be accessed K times before it is treated as part of the hot set, which provides scan resistance while retaining LRU's recency benefits.
2Q ('Two Queue', 1994) simplifies LRU-K with two queues:
- A1in: a small FIFO queue where every new key starts
- Am: the main LRU queue, which a key enters only when it is accessed again

(The full algorithm also keeps a ghost queue, A1out, of keys recently evicted from A1in.) Single-access items (scans) stay in A1in and are evicted quickly. Frequently accessed items graduate to Am, as the sketch below illustrates.
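A minimal sketch of the simplified 2Q idea. The A1out ghost queue is omitted, and the queue sizes (a 25% probation split) are arbitrary choices for illustration:

```python
from collections import OrderedDict
from typing import Optional


class TwoQueueCache:
    """Simplified 2Q sketch: a FIFO probation queue (A1in) plus an LRU main
    queue (Am). The full algorithm also tracks a ghost queue (A1out) of keys
    recently evicted from A1in; that is omitted here for brevity."""

    def __init__(self, capacity: int, a1_fraction: float = 0.25):
        self.a1_capacity = max(1, int(capacity * a1_fraction))
        self.am_capacity = capacity - self.a1_capacity
        self.a1in: OrderedDict[str, str] = OrderedDict()  # FIFO, seen once
        self.am: OrderedDict[str, str] = OrderedDict()    # LRU, seen again

    def get(self, key: str) -> Optional[str]:
        if key in self.am:
            self.am.move_to_end(key)            # LRU update within Am
            return self.am[key]
        if key in self.a1in:
            value = self.a1in.pop(key)          # second access: graduate to Am
            if len(self.am) >= self.am_capacity:
                self.am.popitem(last=False)     # evict LRU entry from Am
            self.am[key] = value
            return value
        return None

    def put(self, key: str, value: str) -> None:
        if key in self.am:
            self.am[key] = value                # update in place
            return
        if key in self.a1in:
            self.a1in[key] = value
            return
        if len(self.a1in) >= self.a1_capacity:
            self.a1in.popitem(last=False)       # FIFO eviction of scan traffic
        self.a1in[key] = value                  # new keys start in A1in
```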
ARC: Adaptive Replacement Cache
IBM's ARC (2003) is a landmark algorithm that adapts between LRU and LFU behavior automatically:
- T1: real cache entries seen once recently (recency side)
- T2: real cache entries seen at least twice (frequency side)
- B1 and B2: 'ghost' lists holding only the keys recently evicted from T1 and T2
- An adaptive target p that shifts space between T1 and T2 based on whether misses hit the B1 or B2 ghost list
```python
from collections import OrderedDict


class ARCCache:
    """
    Simplified conceptual ARC implementation.
    Real ARC is more complex but this captures the key ideas.
    """

    def __init__(self, capacity: int):
        self.c = capacity  # Total cache capacity
        self.p = 0         # Target size for T1 (adaptive parameter)

        # Real cache entries
        self.T1 = OrderedDict()  # Recent, single access
        self.T2 = OrderedDict()  # Frequent, multiple access

        # Ghost entries (just keys, no values)
        self.B1 = OrderedDict()  # Recently evicted from T1
        self.B2 = OrderedDict()  # Recently evicted from T2

    def _adapt_on_b1_hit(self, key):
        """B1 hit: items similar to this key should have stayed in T1."""
        # Increase p to give T1 more space
        delta = max(1, len(self.B2) // len(self.B1)) if self.B1 else 1
        self.p = min(self.p + delta, self.c)

    def _adapt_on_b2_hit(self, key):
        """B2 hit: items similar to this key should have stayed in T2."""
        # Decrease p to give T2 more space
        delta = max(1, len(self.B1) // len(self.B2)) if self.B2 else 1
        self.p = max(self.p - delta, 0)

    def get(self, key):
        if key in self.T1:
            # Promote from T1 to T2 (now a frequent item)
            value = self.T1.pop(key)
            self.T2[key] = value
            return value

        if key in self.T2:
            # Already frequent, move to end (LRU within T2)
            self.T2.move_to_end(key)
            return self.T2[key]

        # Cache miss
        if key in self.B1:
            self._adapt_on_b1_hit(key)
        elif key in self.B2:
            self._adapt_on_b2_hit(key)

        return None

    # put() implementation would handle eviction between T1 and T2
    # based on the adaptive parameter p
```

TinyLFU and W-TinyLFU
Caffeine (a widely used high-performance Java caching library) uses W-TinyLFU, combining:
- A small LRU 'window' cache that admits every new item
- A TinyLFU frequency filter that decides which items may enter the main cache
- A larger main cache (segmented LRU) holding the items that pass the filter
The TinyLFU filter uses a Count-Min Sketch (probabilistic frequency counter) with minimal memory overhead. Items must pass the frequency filter to graduate from window to main cache.
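The sketch below shows the Count-Min Sketch idea in isolation; the hash construction and sizes are illustrative choices, not Caffeine's actual implementation. TinyLFU additionally halves all counters periodically so old popularity fades:

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: d hash rows of w counters each.
    Estimates never undercount; hash collisions can only inflate them."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.counters = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Derive an independent hash per row by salting with the row number.
        digest = hashlib.blake2b(key.encode(),
                                 salt=row.to_bytes(4, "little")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def increment(self, key: str) -> None:
        for row in range(self.depth):
            self.counters[row][self._index(key, row)] += 1

    def estimate(self, key: str) -> int:
        # The minimum across rows is the least-inflated estimate.
        return min(self.counters[row][self._index(key, row)]
                   for row in range(self.depth))


# TinyLFU-style admission test (conceptually): admit a new candidate only if
# its estimated frequency beats that of the entry it would evict.
sketch = CountMinSketch()
for _ in range(5):
    sketch.increment("popular")
sketch.increment("one-hit-wonder")
print(sketch.estimate("popular"), sketch.estimate("one-hit-wonder"))  # ~5, ~1
```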
| Policy | Scan Resistant | Adapts to Pattern | Memory Overhead | Complexity |
|---|---|---|---|---|
| LRU | No | No | Low | Simple |
| LFU | Yes | No | Medium | Medium |
| 2Q | Yes | No | Low | Simple |
| ARC | Yes | Yes | ~2x key metadata (ghost lists) | Complex |
| W-TinyLFU | Yes | Somewhat | Low | Complex |
For most applications: start with LRU. It's well-understood, has minimal overhead, and works well for typical workloads. If you observe scan-related hit rate drops, or a stable popularity distribution where frequency matters, consider W-TinyLFU (in Java, use Ben Manes' Caffeine library). ARC has patents that may affect commercial use.
Pure LRU's linked-list implementation has overhead: extra memory for pointers and lock contention on every access. Production systems often use approximated LRU that trades accuracy for efficiency.
Redis: Sampling-Based LRU
Redis approximates LRU by sampling:
- Each entry stores a last-access timestamp instead of sitting in a global ordering
- When eviction is needed, Redis samples a handful of keys (maxmemory-samples, default 5) and evicts the one idle the longest
- Newer versions keep a small pool of the best eviction candidates across rounds to improve accuracy
This avoids maintaining a global ordering while approximating LRU behavior. Testing shows that sampling 10 keys gets very close to true LRU.
```python
import random
from time import time
from typing import Any, Dict, Optional, Tuple


class SamplingLRUCache:
    """
    Approximated LRU using random sampling.
    Similar to Redis's approach.

    Advantages:
    - O(1) lookup and update (no list maintenance)
    - Lower memory overhead (just a timestamp per entry)
    - Better concurrency (no shared list to update)
    """

    def __init__(self, capacity: int, sample_size: int = 5):
        self.capacity = capacity
        self.sample_size = sample_size
        # Key -> (value, last_access_timestamp)
        self.cache: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        """Get value, updating access time."""
        if key not in self.cache:
            return None
        value, _ = self.cache[key]
        self.cache[key] = (value, time())
        return value

    def put(self, key: str, value: Any) -> Optional[str]:
        """Insert with sampling-based eviction."""
        evicted = None

        if key not in self.cache and len(self.cache) >= self.capacity:
            # Need to evict: sample random keys and evict oldest
            sample_keys = random.sample(list(self.cache.keys()),
                                        min(self.sample_size, len(self.cache)))
            # Find oldest in sample
            oldest_key = min(sample_keys, key=lambda k: self.cache[k][1])
            del self.cache[oldest_key]
            evicted = oldest_key

        self.cache[key] = (value, time())
        return evicted


# Comparison of exact vs sampled LRU hit rates
def simulate_hit_rates():
    """Compare exact LRU vs sampling approximation."""

    # Zipf-distributed access pattern (weights recomputed per call for simplicity)
    def zipf_key(n_keys=1000, s=1.5):
        # Simplified Zipf
        weights = [1.0 / (i ** s) for i in range(1, n_keys + 1)]
        total = sum(weights)
        weights = [w / total for w in weights]
        return f"key:{random.choices(range(n_keys), weights)[0]}"

    cache_size = 100
    num_accesses = 100000

    for sample_size in [1, 3, 5, 10, 100]:
        if sample_size == 100:
            cache = LRUCache(cache_size)  # Exact LRU for comparison
            name = "Exact LRU"
        else:
            cache = SamplingLRUCache(cache_size, sample_size)
            name = f"Sample-{sample_size}"

        hits = 0
        for _ in range(num_accesses):
            key = zipf_key()
            if cache.get(key) is not None:
                hits += 1
            else:
                cache.put(key, "value")

        hit_rate = hits / num_accesses * 100
        print(f"{name:12s}: {hit_rate:.1f}% hit rate")
```

Clock Algorithm (Second-Chance)
Another popular approximation is the Clock algorithm, used in operating systems for page replacement:
- Entries sit in a circular buffer, each with a reference bit that is set on access
- When eviction is needed, a 'clock hand' sweeps the ring, clearing set bits as it passes
- The first entry found with its bit already clear is evicted
This approximates LRU without maintaining ordering—it gives items a 'second chance' before eviction.
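A minimal Clock sketch (fixed-size ring, reference bits, one sweeping hand); real page-replacement code is considerably more involved:

```python
class ClockCache:
    """Minimal second-chance (Clock) sketch: entries sit in a fixed ring,
    each with a reference bit. The hand clears bits as it sweeps and evicts
    the first entry whose bit is already clear."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity        # (key, value) or None
        self.ref_bits = [False] * capacity
        self.index = {}                       # key -> slot position
        self.hand = 0

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        self.ref_bits[pos] = True             # mark as recently used
        return self.slots[pos][1]

    def put(self, key, value):
        if key in self.index:
            pos = self.index[key]
            self.slots[pos] = (key, value)
            self.ref_bits[pos] = True
            return
        # Advance the hand until we find a victim with a clear bit.
        while self.slots[self.hand] is not None and self.ref_bits[self.hand]:
            self.ref_bits[self.hand] = False  # give it a second chance
            self.hand = (self.hand + 1) % self.capacity
        victim = self.slots[self.hand]
        if victim is not None:
            del self.index[victim[0]]
        self.slots[self.hand] = (key, value)
        self.ref_bits[self.hand] = True
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```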
Approximated LRU also benefits from a cache-friendly memory layout. A linked list's nodes scatter across memory, causing CPU cache misses. Array-based approximations (like Clock) or hash-table-only approaches (like sampling) can be much faster due to CPU cache locality—ironic for a discussion about application-level caching!
Beyond capacity-based eviction, caches also remove items based on time: entries expire after a configured TTL (Time To Live). TTL eviction and capacity eviction are orthogonal but interact.
Why TTL Matters
Cached data is a copy, and the origin keeps changing underneath it. A TTL bounds how stale a cached entry can get, and it guarantees that entries for data nobody requests anymore are eventually reclaimed, even if capacity-based eviction never reaches them.
TTL Implementation Approaches
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Lazy (on-access) | Check expiration on get(); delete if expired | No background work | Memory not freed until access |
| Periodic scan | Background thread scans and deletes expired | Reclaims memory | CPU overhead; scan latency |
| Bucket-based | Group expires into time buckets; process buckets | Efficient batch deletion | Implementation complexity |
| Timer wheel | Scheduled timers per item or bucket | Precise expiration timing | Timer management overhead |
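For the bucket-based row in the table above, a minimal sketch: expiry times are rounded into coarse buckets so the cleanup sweep deletes whole buckets whose window has passed instead of examining every key. The bucket width and data layout here are illustrative assumptions:

```python
import time
from collections import defaultdict


class BucketedExpiry:
    """Group keys by expiry bucket so cleanup is a batch delete per bucket."""

    def __init__(self, bucket_seconds: int = 10):
        self.bucket_seconds = bucket_seconds
        self.store = {}                  # key -> (value, expiry_ts)
        self.buckets = defaultdict(set)  # bucket_id -> keys expiring in that window

    def _bucket(self, expiry_ts: float) -> int:
        return int(expiry_ts // self.bucket_seconds)

    def put(self, key, value, ttl: float) -> None:
        expiry = time.time() + ttl
        self.store[key] = (value, expiry)
        self.buckets[self._bucket(expiry)].add(key)

    def sweep(self) -> int:
        """Delete every bucket whose time window has fully elapsed."""
        now_bucket = self._bucket(time.time())
        removed = 0
        for bucket_id in [b for b in self.buckets if b < now_bucket]:
            for key in self.buckets.pop(bucket_id):
                entry = self.store.get(key)
                # A key re-put with a longer TTL stays until its newer bucket expires.
                if entry is not None and entry[1] <= time.time():
                    del self.store[key]
                    removed += 1
        return removed
```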
Redis TTL Implementation
Redis uses a hybrid approach:
- Lazy expiration: an expired key is deleted the moment a command touches it
- Active expiration: a background cycle periodically samples keys that have TTLs, deletes the expired ones, and repeats the sampling if a large fraction of the sample was expired
This balances memory reclamation with CPU usage.
```python
import random
from time import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """
    Cache with TTL support using lazy + periodic expiration.
    """

    def __init__(self, capacity: int, default_ttl: float = 300.0):
        self.capacity = capacity
        self.default_ttl = default_ttl

        # Key -> (value, expiration_timestamp)
        self.cache: Dict[str, Tuple[Any, float]] = {}

        self.last_cleanup = time()
        self.cleanup_interval = 60.0  # Periodic cleanup every 60s

    def _is_expired(self, key: str) -> bool:
        """Check if a key is expired."""
        if key not in self.cache:
            return True
        _, expiry = self.cache[key]
        return time() > expiry

    def _lazy_expire(self, key: str) -> bool:
        """Delete key if expired. Returns True if expired."""
        if self._is_expired(key):
            if key in self.cache:
                del self.cache[key]
            return True
        return False

    def _maybe_periodic_cleanup(self) -> None:
        """Run periodic cleanup if enough time has passed."""
        current_time = time()
        if current_time - self.last_cleanup < self.cleanup_interval:
            return

        self.last_cleanup = current_time

        # Sample-based cleanup like Redis
        sample_size = min(20, len(self.cache))
        if sample_size == 0:
            return

        for _ in range(5):  # Max 5 iterations
            if not self.cache:
                break
            sample_keys = random.sample(list(self.cache.keys()),
                                        min(sample_size, len(self.cache)))
            batch_expired = sum(1 for key in sample_keys if self._lazy_expire(key))

            # Stop if less than 25% of the sample was expired
            if batch_expired / len(sample_keys) < 0.25:
                break

    def get(self, key: str) -> Optional[Any]:
        """Get with lazy expiration."""
        self._maybe_periodic_cleanup()

        if self._lazy_expire(key):
            return None

        return self.cache[key][0]

    def put(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        """Insert with optional custom TTL."""
        self._maybe_periodic_cleanup()

        if ttl is None:
            ttl = self.default_ttl

        expiry = time() + ttl
        self.cache[key] = (value, expiry)
        # Note: capacity eviction would also happen here
```

Setting TTL too short increases cache misses and origin load. Setting TTL too long serves stale data. For most applications, TTL should align with how frequently the underlying data changes and how tolerant users are of staleness. Common values range from seconds (rate limiting) to hours (user profiles) to days (static content).
Eviction policy choice significantly impacts cache effectiveness. Let's consolidate the decision framework:
| Workload Characteristic | Recommended Policy | Rationale |
|---|---|---|
| General purpose / Unknown | LRU or approximated LRU | Good baseline, simple to operate |
| Sequential scans present | 2Q or W-TinyLFU | Scan resistant without complexity |
| Stable popularity (Zipf) | LFU or W-TinyLFU | Captures frequency-based value |
| Mixed/changing patterns | ARC or W-TinyLFU | Self-adapting |
| Extreme scale (millions QPS) | Approximated LRU | Lower overhead, better concurrency |
You now understand eviction policies from theoretical optimality through production implementation. In the next page, we'll explore Cache Coherence—how to maintain consistency between cached data and the source of truth when both can change.