Caches have finite memory. When capacity is exhausted and a new entry must be stored, something must be evicted. But which entry? The answer to this question profoundly impacts cache effectiveness.
Least Recently Used (LRU) is the most widely deployed cache eviction algorithm, used everywhere from CPU caches to Redis to CDN edge nodes. Its principle is elegantly simple: the entry accessed least recently is the one least likely to be needed again soon.
This bet, known as temporal locality, observes that recently accessed items tend to be accessed again in the near future. It's not always correct—but it's correct often enough to be remarkably effective across diverse workloads.
By the end of this page, you will deeply understand LRU: how it works conceptually, how to implement it efficiently using hash maps and doubly-linked lists, the O(1) vs. approximation trade-offs, variants like LRU-K and SLRU, and when LRU performs well versus poorly. You'll also learn how production systems like Redis implement approximate LRU at scale.
The Core Idea:
Imagine a stack of books where you always remove a book from its current position and place it on top after reading it. Over time, frequently read books stay near the top, while rarely touched books sink to the bottom. When you need to discard a book, you remove the one at the very bottom—it's the one you haven't touched in the longest time.
This is LRU: an ordering of cache entries by recency of access, where eviction always removes the oldest (least recently used) entry.
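Before the full implementation later on this page, here is a minimal sketch of the same idea using Python's collections.OrderedDict, which can move an entry to the end in O(1). The TinyLRU class name and the toy capacity are illustrative choices, not part of the original material.

from collections import OrderedDict

class TinyLRU:
    """Minimal LRU sketch: an OrderedDict keeps entries ordered by recency."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None                       # Miss
        self._data.move_to_end(key)           # Mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)    # Drop the least recently used entry

# The books-on-a-stack analogy in code:
cache = TinyLRU(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # Evicts "b", the least recently used
print(cache.get("b"))  # None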
Formal Definition:
Maintain an ordering of cache entries such that:
- Entries are ordered by the time of their most recent access, from MRU (most recently used) to LRU (least recently used).
- On every access (hit or insert), the touched entry moves to the MRU position.
- When capacity is exceeded, the entry at the LRU position is evicted.
LRU Cache with capacity 4

Initial state (empty):
┌─────────────────────────────────┐
│ [empty] [empty] [empty] [empty] │
│ ← LRU                     MRU → │
└─────────────────────────────────┘

After: GET A (miss, insert A)
┌─────────────────────────────────┐
│ [empty] [empty] [empty] [A]     │
└─────────────────────────────────┘

After: GET B, GET C, GET D (insert all)
┌─────────────────────────────────┐
│ [A] [B] [C] [D]                 │
│  ↑ LRU                 MRU ↑    │
└─────────────────────────────────┘

After: GET B (hit, move B to MRU)
┌─────────────────────────────────┐
│ [A] [C] [D] [B]                 │
│  ↑ LRU                 MRU ↑    │
└─────────────────────────────────┘

After: GET E (miss, evict A since it's LRU, insert E)
┌─────────────────────────────────┐
│ [C] [D] [B] [E]                 │
│  ↑ LRU                 MRU ↑    │
│  (A evicted)                    │
└─────────────────────────────────┘

After: GET C (hit, move C to MRU)
┌─────────────────────────────────┐
│ [D] [B] [E] [C]                 │
│  ↑ LRU                 MRU ↑    │
└─────────────────────────────────┘

Why Temporal Locality Works:
Temporal locality is observed across nearly all computing domains: CPU caches exploit it for instructions and data, databases see it in hot rows and index pages, web applications see it in active user sessions and popular content, and file systems see it in recently opened files.
LRU capitalizes on this: by keeping recently accessed items and discarding stale ones, it naturally retains the working set.
The theoretically optimal eviction algorithm (OPT or Bélády's algorithm) evicts the entry that will be used furthest in the future. This requires knowing the future, which is impossible. LRU approximates OPT by assuming the recent past predicts the near future. Under workloads with strong temporal locality, LRU performs close to optimal.
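To make the comparison concrete, here is a small, self-contained simulation (a sketch added for illustration, not taken from the original text) that replays one access trace through an exact LRU policy and through Bélády's clairvoyant OPT policy and prints both hit rates. The trace and cache size are arbitrary examples.

def lru_hits(trace, capacity):
    """Count hits for an exact LRU policy over an access trace."""
    cache, hits = [], 0               # List ordered LRU -> MRU
    for key in trace:
        if key in cache:
            hits += 1
            cache.remove(key)         # Will be re-appended at the MRU end
        elif len(cache) >= capacity:
            cache.pop(0)              # Evict the LRU entry
        cache.append(key)
    return hits

def belady_hits(trace, capacity):
    """Count hits for Bélády's OPT: evict the entry reused furthest in the future."""
    cache, hits = set(), 0
    for i, key in enumerate(trace):
        if key in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            def next_use(k):
                # Position of the next access to k, or infinity if never used again
                for j in range(i + 1, len(trace)):
                    if trace[j] == k:
                        return j
                return float('inf')
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return hits

trace = ["A", "B", "C", "A", "B", "D", "A", "B", "C", "D", "A", "B"]
for name, fn in [("LRU", lru_hits), ("OPT", belady_hits)]:
    h = fn(trace, capacity=3)
    print(f"{name}: {h}/{len(trace)} hits ({h / len(trace):.0%})")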
An efficient LRU cache requires three operations to be O(1):
- Lookup: find an entry by key (get).
- Insert/update: add a new entry or refresh an existing one (put).
- Eviction: identify and remove the least recently used entry when the cache is full.
Achieving all three in O(1) requires combining two data structures:
- A hash map, for O(1) lookup by key.
- A doubly-linked list, for O(1) reordering and O(1) access to the LRU end.
The Insight:
The hash map stores (key → node pointer), where each node lives in a doubly-linked list ordered by recency. The list head is MRU (most recently used), the tail is LRU. Moving a node to head is O(1) since we have its pointer from the hash map.
┌─────────────────────────────────────────────────────────────────┐
│                        LRU CACHE STRUCTURE                       │
└─────────────────────────────────────────────────────────────────┘

Hash Map (key → node pointer):
┌─────────────────────────────────────────┐
│ "user:1" → node for "user:1"            │
│ "user:2" → node for "user:2"            │
│ "user:3" → node for "user:3"            │
└─────────────────────────────────────────┘

Doubly-Linked List (ordered by recency):

HEAD (MRU)
    ↓
┌────────────────┐
│ key: "user:1"  │
│ val: {...}     │
│ prev: null     │
│ next: ●────────┼──► "user:2" node
└────────────────┘
┌────────────────┐
│ key: "user:2"  │
│ val: {...}     │
│ prev: ●────────┼──► points to "user:1"
│ next: ●────────┼──► "user:3" node
└────────────────┘
┌────────────────┐
│ key: "user:3"  │
│ val: {...}     │
│ prev: ●────────┼──► points to "user:2"
│ next: null     │
└────────────────┘
    ↓
TAIL (LRU) ← evict from here
from typing import TypeVar, Generic, Optional, Dict
from dataclasses import dataclass

K = TypeVar('K')
V = TypeVar('V')


@dataclass
class ListNode(Generic[K, V]):
    """Doubly-linked list node for LRU ordering."""
    key: K
    value: V
    prev: Optional['ListNode[K, V]'] = None
    next: Optional['ListNode[K, V]'] = None


class LRUCache(Generic[K, V]):
    """
    LRU Cache with O(1) get, put, and eviction.

    Uses a hash map for O(1) key lookup and a doubly-linked list
    for O(1) recency ordering operations.
    """

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("Capacity must be positive")
        self.capacity = capacity
        self.cache: Dict[K, ListNode[K, V]] = {}
        # Sentinel nodes simplify edge cases (empty list, single element)
        self.head = ListNode(None, None)  # Dummy head (MRU side)
        self.tail = ListNode(None, None)  # Dummy tail (LRU side)
        self.head.next = self.tail
        self.tail.prev = self.head

    def get(self, key: K) -> Optional[V]:
        """
        Retrieve value by key. Returns None if not found.
        On cache hit, moves entry to MRU position.
        Time complexity: O(1)
        """
        if key not in self.cache:
            return None  # Cache miss
        node = self.cache[key]
        # Move to front (MRU position)
        self._remove_node(node)
        self._add_to_front(node)
        return node.value

    def put(self, key: K, value: V) -> Optional[K]:
        """
        Store key-value pair. Returns evicted key if eviction occurred.
        If key exists, updates value and moves to MRU.
        If key is new and cache is full, evicts LRU entry first.
        Time complexity: O(1)
        """
        evicted_key = None

        if key in self.cache:
            # Update existing entry
            node = self.cache[key]
            node.value = value
            self._remove_node(node)
            self._add_to_front(node)
        else:
            # New entry
            if len(self.cache) >= self.capacity:
                # Evict LRU (node before tail sentinel)
                lru_node = self.tail.prev
                evicted_key = lru_node.key
                self._remove_node(lru_node)
                del self.cache[lru_node.key]

            # Add new entry
            new_node = ListNode(key, value)
            self.cache[key] = new_node
            self._add_to_front(new_node)

        return evicted_key

    def delete(self, key: K) -> bool:
        """
        Remove key from cache. Returns True if key existed.
        Time complexity: O(1)
        """
        if key not in self.cache:
            return False
        node = self.cache[key]
        self._remove_node(node)
        del self.cache[key]
        return True

    def _remove_node(self, node: ListNode[K, V]) -> None:
        """Remove node from its current position in the list."""
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_front(self, node: ListNode[K, V]) -> None:
        """Add node right after head (MRU position)."""
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def peek_lru(self) -> Optional[K]:
        """Return the LRU key without removing it."""
        if self.tail.prev == self.head:
            return None  # Empty cache
        return self.tail.prev.key

    def __len__(self) -> int:
        return len(self.cache)

    def __contains__(self, key: K) -> bool:
        return key in self.cache


# Usage example:
cache = LRUCache[str, dict](capacity=1000)

# Store user data
cache.put("user:123", {"name": "Alice", "email": "alice@example.com"})
cache.put("user:456", {"name": "Bob", "email": "bob@example.com"})

# Retrieve (moves to MRU)
user = cache.get("user:123")  # Returns {"name": "Alice", ...}

# Check if exists
if "user:789" in cache:
    print("Found!")
else:
    print("Not cached")  # This prints

# Get LRU key (for monitoring/debugging)
lru_key = cache.peek_lru()
print(f"Next to be evicted: {lru_key}")

Using dummy head and tail nodes eliminates special-case handling for empty lists and single-element lists. Every real node always has valid prev and next pointers, making the code cleaner and less error-prone.
Production caches are accessed by multiple threads simultaneously. A naive LRU implementation will corrupt its internal state under concurrent access. There are several approaches to thread-safety, each with different performance characteristics.
Approach 1: Global Lock
Wrap all operations in a mutex. Simple but creates contention under high concurrency.
import threading

class ThreadSafeLRUCache:
    """Wraps the LRUCache above; every operation holds a single global lock."""

    def __init__(self, capacity: int):
        self._cache = LRUCache(capacity)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._cache.get(key)

    def put(self, key, value):
        with self._lock:
            return self._cache.put(key, value)
Approach 2: Read-Write Lock
Allows concurrent reads, exclusive writes. Better for read-heavy workloads, but LRU complicates this because reads modify the list (recency update).
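One way to reconcile read-write locking with LRU's read-side mutation is to take the read lock only for the hash lookup and buffer the recency updates, replaying them later under the write lock. The sketch below is an illustration of that idea, not a production design: it assumes the LRUCache class defined earlier and hand-rolls a minimal readers-writer lock, since Python's standard library does not provide one.

import threading
from collections import deque

class RWLock:
    """Minimal readers-writer lock: many concurrent readers, one writer."""
    def __init__(self):
        self._readers = 0
        self._readers_mutex = threading.Lock()
        self._writer_mutex = threading.Lock()

    def acquire_read(self):
        with self._readers_mutex:
            self._readers += 1
            if self._readers == 1:
                self._writer_mutex.acquire()   # First reader blocks writers

    def release_read(self):
        with self._readers_mutex:
            self._readers -= 1
            if self._readers == 0:
                self._writer_mutex.release()   # Last reader unblocks writers

    def acquire_write(self):
        self._writer_mutex.acquire()

    def release_write(self):
        self._writer_mutex.release()


class ReadFriendlyLRUCache:
    """
    LRU wrapper where reads take only the read lock.

    Recency updates from reads are buffered and replayed the next time the
    write lock is held, so only writers ever touch the linked list.
    """
    def __init__(self, capacity: int):
        self._cache = LRUCache(capacity)       # From the implementation above
        self._rwlock = RWLock()
        self._pending_touches = deque()        # Keys read since the last drain

    def get(self, key):
        self._rwlock.acquire_read()
        try:
            node = self._cache.cache.get(key)  # Peek without reordering the list
            value = node.value if node is not None else None
        finally:
            self._rwlock.release_read()
        if value is not None:
            self._pending_touches.append(key)  # Record the access for later
        return value

    def put(self, key, value):
        self._rwlock.acquire_write()
        try:
            # Replay buffered recency updates before mutating the structure
            while self._pending_touches:
                self._cache.get(self._pending_touches.popleft())
            return self._cache.put(key, value)
        finally:
            self._rwlock.release_write()

Deferring recency bookkeeping like this is the same general idea behind the read buffers used by some production caches; in practice, the sharded approach described next is usually simpler to get right.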
Approach 3: Sharded LRU (Recommended for High Concurrency)
Partition the key space into N independent LRU caches. Each shard has its own lock, reducing contention N-fold. Keys are mapped to shards via hash.
Approach 4: Lock-Free Approximation
Track recency approximately (for example, with per-entry timestamps) rather than maintaining a precise ordering, and accept that under contention the eviction order may be slightly imprecise. This is what Redis does.
import threading
import hashlib
from typing import TypeVar, Generic, Optional, List

K = TypeVar('K')
V = TypeVar('V')


class ShardedLRUCache(Generic[K, V]):
    """
    Thread-safe LRU cache using sharding for reduced lock contention.

    Keys are partitioned across N independent LRU caches, each with
    its own lock. This allows N concurrent operations (on different
    shards) with no contention.
    """

    def __init__(self, capacity: int, num_shards: int = 16):
        """
        Args:
            capacity: Total capacity across all shards
            num_shards: Number of independent shards (more shards = less contention)
        """
        self.num_shards = num_shards
        shard_capacity = max(1, capacity // num_shards)

        # Create independent LRU caches, each with its own lock
        self._shards: List[LRUCache[K, V]] = [
            LRUCache(shard_capacity) for _ in range(num_shards)
        ]
        self._locks: List[threading.Lock] = [
            threading.Lock() for _ in range(num_shards)
        ]

    def _get_shard_index(self, key: K) -> int:
        """Map key to a shard index by hashing the key."""
        key_bytes = str(key).encode('utf-8')
        hash_value = int(hashlib.md5(key_bytes).hexdigest(), 16)
        return hash_value % self.num_shards

    def get(self, key: K) -> Optional[V]:
        """Thread-safe get with per-shard locking."""
        shard_idx = self._get_shard_index(key)
        with self._locks[shard_idx]:
            return self._shards[shard_idx].get(key)

    def put(self, key: K, value: V) -> Optional[K]:
        """Thread-safe put with per-shard locking."""
        shard_idx = self._get_shard_index(key)
        with self._locks[shard_idx]:
            return self._shards[shard_idx].put(key, value)

    def delete(self, key: K) -> bool:
        """Thread-safe delete with per-shard locking."""
        shard_idx = self._get_shard_index(key)
        with self._locks[shard_idx]:
            return self._shards[shard_idx].delete(key)

    def __len__(self) -> int:
        """Total entries across all shards."""
        total = 0
        for i, shard in enumerate(self._shards):
            with self._locks[i]:
                total += len(shard)
        return total

    def stats(self) -> dict:
        """Return per-shard statistics for monitoring."""
        return {
            "num_shards": self.num_shards,
            "shard_sizes": [len(s) for s in self._shards],
            "total_entries": sum(len(s) for s in self._shards),
        }


# Usage:
cache = ShardedLRUCache[str, bytes](
    capacity=100000,   # 100K total entries
    num_shards=64      # 64 independent shards
)

# Under high concurrency, 64 threads can access different shards simultaneously
# without any lock contention

# Monitor shard balance
print(cache.stats())
# {'num_shards': 64, 'shard_sizes': [...], 'total_entries': ...}

More shards reduce contention but increase metadata overhead and can skew eviction behavior (each shard evicts independently). A common choice is 16-64 shards for most workloads. Monitor shard size distribution to detect hot shards that might benefit from further splitting.
Classic LRU has a significant weakness: it's susceptible to scan pollution. A one-time sequential scan of many items can evict the entire working set, even though those scanned items will never be accessed again.
Example: Scan Pollution
Cache: [A B C D] (working set, frequently accessed)
Scan operation reads: X, Y, Z, W, V, U, T, S... (one-time only)
After scan: [V U T S] (working set evicted, replaced by the tail of the scan)
Result: Next access to A, B, C, D are all cache misses!
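The effect is easy to reproduce with the LRUCache class implemented earlier; the keys and sizes below are made-up illustrations.

cache = LRUCache[str, str](capacity=4)

# Establish the working set
for key in ["A", "B", "C", "D"]:
    cache.put(key, f"value-{key}")

# One-time sequential scan of unrelated keys
for i in range(8):
    cache.put(f"scan:{i}", "scanned")

# The working set is gone: every access below is a miss
print([cache.get(key) for key in ["A", "B", "C", "D"]])  # [None, None, None, None]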
Several LRU variants address this weakness:
LRU-K (K-th Occurrence LRU)
Instead of evicting the least recently accessed item, LRU-K evicts the item whose K-th most recent access is oldest. This means an item must be accessed K times before it's considered "established" in the cache.
LRU-2 (K=2) is most common:
- Track the two most recent access times for each entry.
- On eviction, remove the entry whose second-most-recent access is oldest.
- Entries accessed only once have no second access, so they are treated as oldest and evicted first.
Benefits:
- Scan resistance: items touched only once never displace entries with an established access history.
- Frequency awareness: repeatedly accessed items are favored over one-off accesses.

Drawbacks:
- The cache must track the last K access times for every entry (extra metadata).
- Eviction is no longer a simple O(1) list operation in straightforward implementations.
- K itself becomes a tuning parameter.

A minimal LRU-2 sketch follows below.
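The sketch is illustrative, not a reference implementation: it records the last two access times per key and, on eviction, removes the entry whose second-most-recent access is oldest, treating entries seen only once as oldest of all. A real LRU-K implementation would avoid the linear victim scan shown here (for example, with a priority queue).

import itertools

class LRU2Cache:
    """Toy LRU-2: evict the key whose second-most-recent access is oldest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._clock = itertools.count()      # Logical access counter
        self._values = {}
        self._history = {}                   # key -> [last, second_last] access times

    def _touch(self, key):
        now = next(self._clock)
        last = self._history.get(key, [None, None])[0]
        self._history[key] = [now, last]     # Keep only the two most recent accesses

    def get(self, key):
        if key not in self._values:
            return None
        self._touch(key)
        return self._values[key]

    def put(self, key, value):
        if key not in self._values and len(self._values) >= self.capacity:
            # Victim = smallest second-most-recent access time;
            # a key accessed only once counts as infinitely old.
            def backward_k_distance(k):
                second = self._history[k][1]
                return second if second is not None else float('-inf')
            victim = min(self._values, key=backward_k_distance)
            del self._values[victim]
            del self._history[victim]
        self._values[key] = value
        self._touch(key)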
SLRU (Segmented LRU)
SLRU divides the cache into two segments:
- Probationary: where new entries land on their first access.
- Protected: where entries are promoted once they are accessed again.
Eviction order: evict from the probationary segment first; when the protected segment overflows, its LRU entry is demoted back to probationary rather than evicted outright, so one-time items are pushed out before established ones.
┌─────────────────────────────────────────────┐
│ SEGMENTED LRU (SLRU) │
├─────────────────────────────────────────────┤
│ PROTECTED (80%) │ PROBATIONARY (20%) │
│ [D] [C] [B] [A] │ [X] [Y] [Z] │
│ ← LRU MRU →│ ← LRU MRU → │
│ │ ↑ │
│ Accessed 2+ times│ Evict from here first│
│ Stays longer │ One-time access items │
└─────────────────────────────────────────────┘
SLRU is simpler than LRU-K (no timestamp history needed) while providing similar scan resistance. It's used in many production systems including some database buffer pools.
from typing import TypeVar, Generic, Optional

K = TypeVar('K')
V = TypeVar('V')


class SegmentedLRUCache(Generic[K, V]):
    """
    Segmented LRU (SLRU) cache with scan resistance.

    Divides cache into two LRU segments:
    - Probationary: new entries land here
    - Protected: entries accessed again are promoted here

    Eviction prefers probationary segment, protecting the working set.
    """

    def __init__(self, capacity: int, protected_ratio: float = 0.8):
        """
        Args:
            capacity: Total cache capacity
            protected_ratio: Fraction of capacity for protected segment (0.0-1.0)
        """
        self.protected_capacity = int(capacity * protected_ratio)
        self.probationary_capacity = capacity - self.protected_capacity
        self.protected = LRUCache[K, V](self.protected_capacity)
        self.probationary = LRUCache[K, V](self.probationary_capacity)

    def get(self, key: K) -> Optional[V]:
        """
        Retrieve value.

        If in protected segment: return value, update recency in protected.
        If in probationary segment: PROMOTE to protected, return value.
        If not found: return None (cache miss).
        """
        # Check protected segment first (hot data)
        value = self.protected.get(key)
        if value is not None:
            return value

        # Check probationary segment
        value = self.probationary.get(key)
        if value is not None:
            # Promote to protected segment (second access = established)
            self.probationary.delete(key)
            # If protected is full, demote LRU of protected to probationary
            if len(self.protected) >= self.protected_capacity:
                demoted_key = self.protected.peek_lru()
                demoted_value = self.protected.get(demoted_key)  # Get before delete
                self.protected.delete(demoted_key)
                self.probationary.put(demoted_key, demoted_value)
            self.protected.put(key, value)
            return value

        return None  # Cache miss

    def put(self, key: K, value: V) -> None:
        """
        Store key-value pair.

        If key exists in either segment: update value and recency.
        If new: insert into probationary segment.
        """
        # Check if key exists in either segment
        if key in self.protected.cache:
            self.protected.put(key, value)
            return

        if key in self.probationary.cache:
            # Update and promote to protected
            self.probationary.delete(key)
            if len(self.protected) >= self.protected_capacity:
                demoted_key = self.protected.peek_lru()
                demoted_value = self.protected.get(demoted_key)
                self.protected.delete(demoted_key)
                self.probationary.put(demoted_key, demoted_value)
            self.protected.put(key, value)
            return

        # New entry: insert into probationary
        self.probationary.put(key, value)

    def delete(self, key: K) -> bool:
        """Delete key from whichever segment contains it."""
        if self.protected.delete(key):
            return True
        return self.probationary.delete(key)

    def stats(self) -> dict:
        """Return statistics for monitoring."""
        return {
            "protected_count": len(self.protected),
            "protected_capacity": self.protected_capacity,
            "probationary_count": len(self.probationary),
            "probationary_capacity": self.probationary_capacity,
            "protected_utilization": len(self.protected) / self.protected_capacity
                if self.protected_capacity > 0 else 0,
            "probationary_utilization": len(self.probationary) / self.probationary_capacity
                if self.probationary_capacity > 0 else 0,
        }


# Usage:
cache = SegmentedLRUCache[str, bytes](capacity=10000, protected_ratio=0.8)

# First access: goes to probationary
cache.put("key:1", b"value1")
# In probationary segment

# Second access: promotes to protected
_ = cache.get("key:1")
# Now in protected segment

# Sequential scan: all stay in probationary
for i in range(100):
    cache.put(f"scan:{i}", b"scan_data")
# These won't evict established items in protected

Classic LRU: Good for workloads with consistent temporal locality, minimal scans. Simple and fast.
SLRU: Good for workloads with occasional scans or mixed frequency patterns. Moderate complexity.
LRU-K: Best for highly varied workloads where access frequency matters significantly. Most complex but most adaptive.
Maintaining perfect LRU ordering is expensive at scale. Every cache access requires updating the doubly-linked list, which involves pointer manipulation and potentially lock contention. For caches with millions of entries and millions of operations per second, this overhead is significant.
Approximate LRU trades precision for performance by sampling rather than maintaining exact ordering.
Redis's Approximate LRU:
Redis implements LRU approximately rather than exactly (this was the only LRU mode prior to 4.0):
- Every key carries a small 24-bit last-access clock instead of living in a global recency list.
- When memory must be reclaimed, Redis samples a handful of random keys (maxmemory-samples, default 5) and evicts the one with the oldest access time.
- Newer versions keep a small pool of the best eviction candidates found across samplings to improve accuracy.
Each eviction costs O(sample_size), a small constant, rather than O(total_keys). It's remarkably effective despite being approximate.
import random
import time
from typing import Dict, Any, Optional
from dataclasses import dataclass


@dataclass
class CacheEntry:
    value: Any
    last_access_time: int  # 24-bit timestamp (wraps ~194 days)


class ApproximateLRUCache:
    """
    Approximate LRU cache using random sampling for eviction.

    Instead of maintaining expensive ordering structure, stores
    last-access timestamp per key and samples during eviction.
    This is similar to Redis's LRU implementation (pre-4.0).
    """

    def __init__(self, capacity: int, sample_size: int = 5):
        """
        Args:
            capacity: Maximum number of entries
            sample_size: Number of keys to sample during eviction decision
        """
        self.capacity = capacity
        self.sample_size = sample_size
        self.store: Dict[str, CacheEntry] = {}

    def _current_lru_clock(self) -> int:
        """
        Return current LRU clock value (24-bit, wraps).
        Using milliseconds >> 10 gives ~1 second resolution.
        24 bits at 1/sec resolution = ~194 days before wrap.
        """
        return int(time.time() * 1000) >> 10 & 0xFFFFFF

    def get(self, key: str) -> Optional[Any]:
        """Get value and update access time."""
        entry = self.store.get(key)
        if entry is None:
            return None
        # Update last access time (the only "ordering" we maintain)
        entry.last_access_time = self._current_lru_clock()
        return entry.value

    def put(self, key: str, value: Any) -> None:
        """Store value, evicting if necessary."""
        # Evict until we have space
        while len(self.store) >= self.capacity and key not in self.store:
            self._evict_approximate_lru()

        current_time = self._current_lru_clock()
        if key in self.store:
            # Update existing
            self.store[key].value = value
            self.store[key].last_access_time = current_time
        else:
            # Insert new
            self.store[key] = CacheEntry(value=value, last_access_time=current_time)

    def _evict_approximate_lru(self) -> Optional[str]:
        """
        Evict approximately the LRU entry using random sampling.

        1. Sample 'sample_size' random keys
        2. Find the one with oldest last_access_time
        3. Evict it

        The comparison itself is O(sample_size), not O(n).
        """
        if not self.store:
            return None

        # NOTE: building the key list is O(n) in this simple sketch; a production
        # implementation samples without materializing all keys (as Redis does).
        keys = list(self.store.keys())
        sample_count = min(self.sample_size, len(keys))
        sampled_keys = random.sample(keys, sample_count)

        # Find the oldest among samples (largest age)
        oldest_key = None
        oldest_age = -1
        for key in sampled_keys:
            entry = self.store[key]
            # Handle time wrap-around by computing relative age
            age = self._compute_age(entry.last_access_time)
            if age > oldest_age:
                oldest_age = age
                oldest_key = key

        if oldest_key is None:
            oldest_key = sampled_keys[0]

        # Evict
        del self.store[oldest_key]
        return oldest_key

    def _compute_age(self, last_access: int) -> int:
        """Compute age of entry, handling 24-bit wrap-around."""
        current = self._current_lru_clock()
        # Handle wrap-around: if current < last_access, we wrapped
        if current >= last_access:
            return current - last_access
        else:
            return (0xFFFFFF - last_access) + current

    def __len__(self) -> int:
        return len(self.store)


# Comparison: Approximate vs Exact LRU hit rates
#
# For most workloads, approximate LRU with sample size 5-10
# achieves 95-99% of the hit rate of exact LRU, while being
# significantly faster and simpler.
#
# Increasing sample size improves accuracy but costs more per eviction:
#   Sample Size 5:  ~95% of optimal LRU hit rate
#   Sample Size 10: ~97% of optimal LRU hit rate
#   Sample Size 20: ~99% of optimal LRU hit rate

| Aspect | Exact LRU | Approximate LRU |
|---|---|---|
| Time per access | O(1) but with pointer manipulation | O(1) just timestamp update |
| Time per eviction | O(1) | O(sample_size) ≈ O(1) |
| Space overhead | ~24 bytes/entry (prev/next pointers) | 8 bytes/entry (timestamp) |
| Hit rate | Optimal (for LRU assumption) | ~95-99% of optimal |
| Implementation | Complex (doubly-linked list) | Simple (hash map + timestamps) |
| Concurrency | List updates create contention | Timestamp updates are atomic |
Redis 4.0 introduced LFU (Least Frequently Used) as an alternative to approximate LRU. LFU better handles some workloads where access frequency matters more than recency. Redis allows choosing between LRU and LFU with the maxmemory-policy configuration.
LRU is not universally optimal. Understanding its failure modes helps you recognize when to choose alternative strategies.
Failure Case 1: Sequential Scan (Repeated)
Workload: Looping through N items repeatedly, where N > cache size.
Cache size: 4
Data set: A, B, C, D, E, F
Access pattern: A, B, C, D, E, F, A, B, C, D, E, F, ...
With LRU:
- Cache: [A B C D], access E → evict A → [B C D E]
- Access F → evict B → [C D E F]
- Access A → cache miss! → evict C → [D E F A]
- Access B → cache miss! → evict D → [E F A B]
- ...
Result: 0% hit rate forever! Every access is a miss.
This is called the thrashing pattern. The cache size is slightly smaller than the working set, causing continuous eviction of items about to be accessed.
Failure Case 2: Working Set Size Volatility
Workload: Working set changes dramatically over time.
Morning: Users A-F are active (6 items)
Afternoon: Users G-L are active (different 6 items)
Evening: Users A-B return (subset of morning)
With LRU:
- Evening access to A: cache miss (evicted hours ago)
- Evening access to B: cache miss
LRU assumes recent items are relevant. When recency doesn't predict future access, LRU wastes capacity on items that won't be accessed again.
Failure Case 3: Frequency Blindness
Workload: Mix of frequently and infrequently accessed items.
Item A: accessed 1000 times/hour (very hot)
Item B: accessed 1 time/hour (cold)
Item C: accessed 1 time (one-time)
If C is accessed after A:
LRU order: [... A, C]
C is now "more recent" than A!
If cache is nearly full and new items arrive:
A might be evicted before C, despite A being 1000x more valuable.
LRU doesn't consider access frequency, only recency. LFU (Least Frequently Used) handles this better.
Don't assume LRU is right for your workload. Measure your actual hit rate and compare against theoretical baselines. If hit rate is unexpectedly low, analyze the access pattern. It might be a scan-heavy workload where SLRU or LFU would perform better.
Let's examine how real production systems implement and configure LRU.
Redis maxmemory-policy Options:
# Classic LRU (approximate)
maxmemory-policy volatile-lru # LRU among keys with TTL set
maxmemory-policy allkeys-lru # LRU among all keys
# LFU alternatives (Redis 4.0+)
maxmemory-policy volatile-lfu # LFU among keys with TTL
maxmemory-policy allkeys-lfu # LFU among all keys
# LRU tuning
maxmemory-samples 5 # Sample size for approximate LRU (default: 5)
lfu-log-factor 10 # LFU frequency counter logarithm base
lfu-decay-time 1 # Minutes for LFU counter decay
Memcached LRU:
Memcached uses a segmented LRU with multiple "slab classes" based on item size. Each slab class has its own LRU list. This prevents small items from evicting large items and vice versa.
Slab 1 (64 bytes): LRU: [item1] → [item2] → [item3]
Slab 2 (128 bytes): LRU: [item4] → [item5]
Slab 3 (256 bytes): LRU: [item6] → [item7] → [item8]
...
Guava Cache (Java):
Google's Guava library provides a sophisticated cache with configurable eviction:
Cache<Key, Value> cache = CacheBuilder.newBuilder()
.maximumSize(10000)
.expireAfterWrite(10, TimeUnit.MINUTES)
.expireAfterAccess(5, TimeUnit.MINUTES) // Sliding TTL
.concurrencyLevel(16) // Sharding
.recordStats() // Enable hit rate metrics
.build();
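For Python services, a rough analog of this configuration can be assembled with the third-party cachetools library; treating that dependency as an assumption about your stack, a sketch might look like this. TTLCache combines size-bounded LRU eviction with expire-after-write behavior; per-access (sliding) expiry and hit-rate statistics would need to be layered on top, for example with the instrumentation pattern shown below.

import threading
from cachetools import TTLCache, cached

# Size-bounded LRU with a 10-minute expire-after-write TTL
product_cache = TTLCache(maxsize=10_000, ttl=600)

@cached(cache=product_cache, lock=threading.Lock())  # Lock makes the decorator thread-safe
def load_product(product_id: str) -> dict:
    # Placeholder for the real data-store lookup
    return {"id": product_id}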
Monitoring Hit Rates (Python + Prometheus):

from prometheus_client import Counter, Gauge

# Define metrics
CACHE_HITS = Counter('cache_hits_total', 'Total cache hits', ['cache_name'])
CACHE_MISSES = Counter('cache_misses_total', 'Total cache misses', ['cache_name'])
CACHE_EVICTIONS = Counter('cache_evictions_total', 'Total evictions', ['cache_name', 'reason'])
CACHE_SIZE = Gauge('cache_size', 'Current cache size', ['cache_name'])
CACHE_HIT_RATE = Gauge('cache_hit_rate', 'Cumulative hit rate', ['cache_name'])


class InstrumentedLRUCache:
    """
    LRU cache with Prometheus metrics for production monitoring.
    """

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.cache = LRUCache(capacity)
        self._hits = 0
        self._misses = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            CACHE_HITS.labels(cache_name=self.name).inc()
            self._hits += 1
        else:
            CACHE_MISSES.labels(cache_name=self.name).inc()
            self._misses += 1

        # Update cumulative hit rate
        total = self._hits + self._misses
        if total > 0:
            CACHE_HIT_RATE.labels(cache_name=self.name).set(self._hits / total)
        return value

    def put(self, key, value):
        evicted = self.cache.put(key, value)
        if evicted:
            CACHE_EVICTIONS.labels(cache_name=self.name, reason='lru').inc()
        CACHE_SIZE.labels(cache_name=self.name).set(len(self.cache))
        return evicted


# Example Grafana alert:
# Alert when hit rate drops below threshold
#
# expr: cache_hit_rate{cache_name="products"} < 0.7
# for: 5m
# labels:
#   severity: warning
# annotations:
#   summary: "Low cache hit rate"
#   description: "Product cache hit rate is {{ $value }}, expected > 70%"

A sudden drop in hit rate often indicates: (1) the cache is too small for the current working set, (2) the access pattern changed (scans, new features), or (3) the TTL is too aggressive. Trend alerts on hit rate are more valuable than absolute thresholds.
LRU is the workhorse of cache eviction algorithms, balancing effectiveness with simplicity. Let's consolidate the key insights:
- LRU bets on temporal locality: the recent past is a good, though imperfect, predictor of the near future.
- Exact O(1) LRU combines a hash map with a doubly-linked list; sentinel nodes keep the pointer logic simple.
- Under concurrency, prefer sharding or approximation over a single global lock.
- Classic LRU is vulnerable to scan pollution, cyclic thrashing, and frequency blindness; LRU-K and SLRU add scan resistance.
- At scale, production systems like Redis use sampled, approximate LRU and still reach roughly 95-99% of exact LRU's hit rate.
- Measure your hit rate in production; the access pattern, not the algorithm's reputation, determines whether LRU is the right choice.
What's Next:
LRU focuses on recency, but sometimes frequency is a better predictor of future access. The next page explores LFU (Least Frequently Used)—an alternative that evicts items based on how often they're accessed rather than when they were last accessed.
You now have deep expertise in LRU cache eviction. You can implement O(1) LRU, design thread-safe variants, choose the right LRU variant for your workload, and monitor LRU caches in production. This forms the foundation for understanding all cache eviction strategies.