Every cache faces an unavoidable constraint: memory is finite. No matter how much RAM you provision, your cache will eventually fill. When it does, you must answer two fundamental questions:
How much memory should this cache consume? Too little, and hit rates suffer. Too much, and you're wasting expensive resources.
When full, what should be evicted to make room? The wrong eviction policy can destroy cache effectiveness, turning a 95% hit rate into 5%.
These questions might seem like operational details, but they're actually architectural decisions with deep implications. A cache with the wrong size or eviction policy doesn't just underperform—it can create cascading failures, cost overruns, and user experience degradation.
Principal engineers understand that cache sizing and eviction are not one-time configuration choices. They're ongoing optimization targets that require workload analysis, monitoring, and periodic adjustment.
By the end of this page, you will understand how to determine appropriate cache sizes, select eviction policies for different workloads, handle memory pressure gracefully, and monitor cache efficiency. You'll learn the mathematics behind cache sizing and the algorithms behind eviction policies.
Determining the right cache size requires understanding the relationship between cache size, hit rate, and cost. This relationship is rarely linear—there are usually diminishing returns as cache size increases.
The Cache Size / Hit Rate Curve:
For most workloads, the relationship between cache size and hit rate follows a characteristic curve:
| Cache Size (% of total data) | Hit Rate | Monthly Cost (illustrative) | Requests Saved per $ |
|---|---|---|---|
| 5% | 60% | $10 | 600K |
| 10% | 75% | $20 | 375K |
| 25% | 88% | $50 | 176K |
| 50% | 95% | $100 | 95K |
| 100% | 99% | $200 | 49K |
This table illustrates the diminishing-returns principle: doubling cache size from 5% to 10% improves hit rate by 15 percentage points (60% → 75%), but doubling from 50% to 100% improves it by only 4 points (95% → 99%). (The per-dollar column is consistent with an illustrative workload of 10 million requests per month: requests saved per $ ≈ hit rate × 10M ÷ monthly cost.)
The optimal cache size depends on the shape of your access distribution and the size of your working set.
Many real-world workloads follow Pareto-like distributions: 20% of items account for 80% of accesses. For such workloads, a cache holding ~20% of data often achieves ~80% hit rate. Start here and adjust based on actual metrics.
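To make that intuition concrete, here is a minimal sketch that estimates hit rate from cache fraction, assuming a Zipf-like popularity distribution (the exponent `s` is a tunable assumption, not something the table above prescribes):

```typescript
// Estimate the hit rate achieved by caching the most popular fraction of items,
// under an ASSUMED Zipf(s) popularity distribution. Fit s to your measured
// access counts before trusting the output.
function estimateHitRate(totalItems: number, cacheFraction: number, s = 1.0): number {
  const weights: number[] = [];
  for (let rank = 1; rank <= totalItems; rank++) {
    weights.push(1 / Math.pow(rank, s)); // Popularity weight for this rank
  }
  const total = weights.reduce((a, b) => a + b, 0);
  const cachedItems = Math.floor(totalItems * cacheFraction);
  const hot = weights.slice(0, cachedItems).reduce((a, b) => a + b, 0);
  return hot / total; // Fraction of accesses served from cache
}

// With 100K items and s = 1.0, caching 20% of items captures roughly 87% of
// accesses -- close to the Pareto rule of thumb above.
console.log(estimateHitRate(100_000, 0.2).toFixed(2));
```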
The working set is the subset of data actively accessed within a time window. Understanding your working set is essential for cache sizing—if your cache can hold the entire working set, you achieve near-perfect hit rates.
Formal Definition:
For a time window T, the working set W(T) is the set of distinct items accessed during T. The working set size |W(T)| determines the minimum cache size for high hit rates.
Working Set Characteristics:
```typescript
/**
 * Working set analysis for cache sizing decisions.
 */
interface AccessEvent {
  key: string;
  timestamp: number;
  sizeBytes: number;
}

class WorkingSetAnalyzer {
  private accessLog: AccessEvent[] = [];

  /**
   * Record an access event for analysis.
   */
  recordAccess(key: string, sizeBytes: number): void {
    this.accessLog.push({
      key,
      timestamp: Date.now(),
      sizeBytes,
    });

    // Maintain bounded log (last 24 hours)
    const cutoff = Date.now() - 24 * 60 * 60 * 1000;
    this.accessLog = this.accessLog.filter(e => e.timestamp > cutoff);
  }

  /**
   * Calculate working set size for different time windows.
   */
  analyzeWorkingSets(): WorkingSetReport {
    const now = Date.now();
    return {
      oneHour: this.workingSet(now - 60 * 60 * 1000),
      fourHours: this.workingSet(now - 4 * 60 * 60 * 1000),
      oneDay: this.workingSet(now - 24 * 60 * 60 * 1000),
    };
  }

  private workingSet(since: number): WorkingSetMetrics {
    const relevant = this.accessLog.filter(e => e.timestamp >= since);

    // Unique keys accessed
    const uniqueKeys = new Set(relevant.map(e => e.key));

    // Size by key (latest size for each key)
    const sizeByKey = new Map<string, number>();
    for (const event of relevant) {
      sizeByKey.set(event.key, event.sizeBytes);
    }

    // Calculate total working set size
    let totalBytes = 0;
    for (const size of sizeByKey.values()) {
      totalBytes += size;
    }

    // Calculate access frequency distribution
    const accessCounts = new Map<string, number>();
    for (const event of relevant) {
      accessCounts.set(event.key, (accessCounts.get(event.key) || 0) + 1);
    }

    return {
      uniqueItems: uniqueKeys.size,
      totalBytes,
      totalAccesses: relevant.length,
      frequencyDistribution: this.calculateFrequencyDistribution(accessCounts),
    };
  }

  private calculateFrequencyDistribution(
    counts: Map<string, number>
  ): FrequencyDistribution {
    const values = Array.from(counts.values()).sort((a, b) => b - a);
    const total = values.reduce((sum, v) => sum + v, 0);

    // What percentage of items account for X% of accesses?
    let cumulative = 0;
    let itemsFor80Pct = 0;
    for (const count of values) {
      cumulative += count;
      itemsFor80Pct++;
      if (cumulative >= total * 0.8) break;
    }

    return {
      itemsFor80PercentAccesses: itemsFor80Pct,
      percentOfItems: (itemsFor80Pct / values.length) * 100,
      isSkewed: (itemsFor80Pct / values.length) < 0.3, // Less than 30% = skewed
    };
  }
}

interface WorkingSetMetrics {
  uniqueItems: number;
  totalBytes: number;
  totalAccesses: number;
  frequencyDistribution: FrequencyDistribution;
}

interface FrequencyDistribution {
  itemsFor80PercentAccesses: number;
  percentOfItems: number;
  isSkewed: boolean;
}

interface WorkingSetReport {
  oneHour: WorkingSetMetrics;
  fourHours: WorkingSetMetrics;
  oneDay: WorkingSetMetrics;
}

// Usage and interpretation
const analyzer = new WorkingSetAnalyzer();

// ... record access events over time ...

const report = analyzer.analyzeWorkingSets();

console.log(`1-hour working set: ${report.oneHour.uniqueItems} items, ${report.oneHour.totalBytes / 1e6} MB`);
console.log(`24-hour working set: ${report.oneDay.uniqueItems} items, ${report.oneDay.totalBytes / 1e6} MB`);

// Recommendation logic
if (report.oneHour.frequencyDistribution.isSkewed) {
  console.log("Workload is skewed - small cache can be effective");
  console.log(`Cache ~10% of 1-hour working set: ${report.oneHour.totalBytes * 0.1 / 1e6} MB`);
} else {
  console.log("Workload is uniform - larger cache needed");
  console.log(`Cache ~50% of 1-hour working set: ${report.oneHour.totalBytes * 0.5 / 1e6} MB`);
}
```

Working sets shift during special events (product launches, sales, viral content).
Monitor working set size continuously, not just during normal operations. Your cache size should accommodate peak working sets, or you'll see hit rate collapse during high-traffic events.
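One way to operationalize this, building on the WorkingSetAnalyzer sketch above (the PeakTracker helper below is hypothetical, and the 20% headroom is an assumption to tune):

```typescript
// Hypothetical helper: track the peak working set observed over time so that
// capacity planning targets peaks rather than averages.
class PeakTracker {
  private peakBytes = 0;
  private peakItems = 0;

  observe(metrics: { uniqueItems: number; totalBytes: number }): void {
    this.peakBytes = Math.max(this.peakBytes, metrics.totalBytes);
    this.peakItems = Math.max(this.peakItems, metrics.uniqueItems);
  }

  // Size the cache against the peak plus headroom (20% here is an assumption)
  recommendedCacheBytes(headroom = 1.2): number {
    return Math.ceil(this.peakBytes * headroom);
  }
}

// e.g. sample every few minutes:
// peaks.observe(analyzer.analyzeWorkingSets().oneHour);
```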
When a cache is full and a new item must be inserted, the eviction policy determines which existing item to remove. The right policy depends on your access patterns.
Understanding Policy Tradeoffs:
Every policy makes predictions about future accesses based on past behavior. No policy is universally best—each optimizes for different patterns.
Least Recently Used (LRU) evicts the item that hasn't been accessed for the longest time.
Assumption: Items accessed recently are likely to be accessed again soon.
Best for: General-purpose caching, web applications, API responses, user sessions.
```typescript
/**
 * LRU Cache implementation using a Map (insertion order) + size tracking.
 * Map maintains insertion order; we move accessed items to end.
 */
class LRUCache<K, V> {
  private cache: Map<K, { value: V; size: number }>;
  private currentSize: number = 0;
  private readonly maxSize: number;

  // Metrics
  private hits: number = 0;
  private misses: number = 0;
  private evictions: number = 0;

  constructor(maxSizeBytes: number) {
    this.maxSize = maxSizeBytes;
    this.cache = new Map();
  }

  get(key: K): V | undefined {
    const entry = this.cache.get(key);
    if (entry) {
      // Move to end (most recently used)
      this.cache.delete(key);
      this.cache.set(key, entry);
      this.hits++;
      return entry.value;
    }
    this.misses++;
    return undefined;
  }

  set(key: K, value: V, sizeBytes: number): void {
    // If key exists, remove it first
    if (this.cache.has(key)) {
      const existing = this.cache.get(key)!;
      this.currentSize -= existing.size;
      this.cache.delete(key);
    }

    // Evict until space available
    while (this.currentSize + sizeBytes > this.maxSize && this.cache.size > 0) {
      this.evictOldest();
    }

    // Insert at end (most recently used)
    this.cache.set(key, { value, size: sizeBytes });
    this.currentSize += sizeBytes;
  }

  private evictOldest(): void {
    // First key in Map is the oldest
    const oldestKey = this.cache.keys().next().value;
    if (oldestKey !== undefined) {
      const entry = this.cache.get(oldestKey)!;
      this.cache.delete(oldestKey);
      this.currentSize -= entry.size;
      this.evictions++;
    }
  }

  getMetrics(): CacheMetrics {
    return {
      size: this.currentSize,
      itemCount: this.cache.size,
      hitRate: this.hits / (this.hits + this.misses) || 0,
      hits: this.hits,
      misses: this.misses,
      evictions: this.evictions,
    };
  }
}

interface CacheMetrics {
  size: number;
  itemCount: number;
  hitRate: number;
  hits: number;
  misses: number;
  evictions: number;
}

// Strengths: Simple, low overhead, works well for recency-based patterns
// Weaknesses: Vulnerable to scan pollution (sequential access destroys cache)
```

LRU is vulnerable to scan pollution: a single sequential scan through data (e.g., a batch job) can evict all frequently-used items. If this is a concern, consider LRU with scan-resistance (SLRU) or LFU.
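To illustrate the scan-resistant alternative just mentioned, here is a minimal SLRU sketch. The two-segment structure and promotion-on-second-access rule are the core of SLRU; the capacity split and count-based (rather than byte-based) sizing are simplifying assumptions:

```typescript
// Minimal segmented-LRU (SLRU) sketch. Items enter a probationary segment;
// only a repeat access promotes them to the protected segment, so a one-off
// scan cannot displace proven-hot entries.
class SLRUCache<K, V> {
  private probation = new Map<K, V>();    // First-time entries (scans land here)
  private protectedSeg = new Map<K, V>(); // Entries with repeat accesses

  constructor(
    private probationCap: number,
    private protectedCap: number
  ) {}

  get(key: K): V | undefined {
    if (this.protectedSeg.has(key)) {
      const v = this.protectedSeg.get(key)!;
      this.protectedSeg.delete(key);
      this.protectedSeg.set(key, v); // Refresh recency within protected
      return v;
    }
    if (this.probation.has(key)) {
      const v = this.probation.get(key)!;
      this.probation.delete(key);
      this.promote(key, v); // Second access: promote to protected
      return v;
    }
    return undefined;
  }

  set(key: K, value: V): void {
    if (this.protectedSeg.has(key)) {
      this.protectedSeg.delete(key);
      this.promote(key, value);
      return;
    }
    this.probation.delete(key);
    this.probation.set(key, value);
    if (this.probation.size > this.probationCap) {
      // Evict the LRU probationary entry; scans never touch protected entries
      this.probation.delete(this.probation.keys().next().value!);
    }
  }

  private promote(key: K, value: V): void {
    this.protectedSeg.set(key, value);
    if (this.protectedSeg.size > this.protectedCap) {
      // Demote the protected LRU back to probation rather than dropping it
      const [demotedKey, demotedVal] = this.protectedSeg.entries().next().value!;
      this.protectedSeg.delete(demotedKey);
      this.probation.set(demotedKey, demotedVal);
      if (this.probation.size > this.probationCap) {
        this.probation.delete(this.probation.keys().next().value!);
      }
    }
  }
}
```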
Choosing the right eviction policy requires understanding your workload characteristics. Here's a decision framework:
| Workload Pattern | Recommended Policy | Avoid | Key Consideration |
|---|---|---|---|
| Temporal locality (recent = valuable) | LRU | FIFO | Standard web applications |
| Stable popularity (some items always hot) | LFU / W-TinyLFU | Pure LRU | CDN, static content |
| Sequential scans frequent | LFU, SLRU, ARC | Pure LRU | Database, analytics |
| Changing popularity over time | ARC, LIRS, W-TinyLFU | Pure LFU | Trending content, news |
| Memory/CPU constrained | Clock, FIFO | ARC (overhead) | Embedded systems |
| Mixed/unknown workload | ARC, 2Q, CAR | Single-policy | General-purpose systems |
Don't over-engineer eviction policy selection. Start with LRU—it's simple, well-understood, and works well for most workloads. Measure hit rates. If they're below expectations, profile your access patterns and consider switching policies. Most applications never need anything beyond LRU.
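If you do want to measure before switching, one low-cost approach is to replay a recorded key trace offline. A minimal sketch, assuming you can capture such a trace (the helper below is hypothetical and reuses the LRUCache class above):

```typescript
// Hypothetical offline evaluation: replay a recorded key trace against a
// candidate cache and report the simulated hit rate. Entry sizes are fixed at
// 1 "byte" so the capacity argument effectively counts items.
function simulateHitRate(trace: string[], cache: LRUCache<string, boolean>): number {
  let hits = 0;
  for (const key of trace) {
    if (cache.get(key) !== undefined) {
      hits++;
    } else {
      cache.set(key, true, 1); // Miss: admit the item
    }
  }
  return hits / trace.length;
}

// e.g. compare capacities, or swap in an SLRU/LFU candidate with the same shape:
// console.log(simulateHitRate(trace, new LRUCache(10_000)));
```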
```typescript
/**
 * Eviction policy recommendation based on workload analysis.
 */
interface WorkloadAnalysis {
  accessPattern: 'temporal' | 'popularity' | 'sequential' | 'mixed';
  workingSetStability: 'stable' | 'shifting' | 'volatile';
  memoryConstraint: 'tight' | 'moderate' | 'generous';
  performanceRequirement: 'latency-critical' | 'throughput' | 'balanced';
}

function recommendPolicy(analysis: WorkloadAnalysis): PolicyRecommendation {
  // Decision tree for policy selection
  if (analysis.memoryConstraint === 'tight') {
    return {
      policy: 'Clock',
      rationale: 'Clock has minimal memory overhead (a single reference bit per entry)',
      alternative: 'FIFO',
    };
  }

  if (analysis.accessPattern === 'temporal' && analysis.workingSetStability !== 'volatile') {
    return {
      policy: 'LRU',
      rationale: 'LRU excels at temporal locality patterns',
      alternative: 'SLRU (if scans are a concern)',
    };
  }

  if (analysis.accessPattern === 'popularity' && analysis.workingSetStability === 'stable') {
    return {
      policy: 'LFU',
      rationale: 'LFU preserves frequently-accessed items regardless of recency',
      alternative: 'W-TinyLFU (if some temporal variation)',
    };
  }

  if (analysis.accessPattern === 'sequential') {
    return {
      policy: 'SLRU or 2Q',
      rationale: 'Segmented LRU resists scan pollution',
      alternative: 'ARC',
    };
  }

  if (analysis.accessPattern === 'mixed' || analysis.workingSetStability === 'volatile') {
    return {
      policy: 'ARC or W-TinyLFU',
      rationale: 'Adaptive policies handle workload changes',
      alternative: 'CAR (if the IBM patent is a concern)',
    };
  }

  // Default fallback
  return {
    policy: 'LRU',
    rationale: 'Safe default that works well for most workloads',
    alternative: 'Measure and adjust based on hit rate',
  };
}

interface PolicyRecommendation {
  policy: string;
  rationale: string;
  alternative: string;
}
```

When system memory runs low, caches must respond gracefully. Uncontrolled cache growth can lead to out-of-memory crashes, while overly aggressive shrinking destroys hit rates.
Memory Pressure Scenarios:
```typescript
/**
 * Memory pressure handling strategies for cache systems.
 */
interface MemoryPressureConfig {
  softLimit: number;       // Start gentle eviction (bytes)
  hardLimit: number;       // Aggressive eviction threshold
  criticalLimit: number;   // Emergency purge threshold
  checkIntervalMs: number;
}

class MemoryPressureHandler<K, V> {
  private cache: Map<K, CacheEntry<V>> = new Map();
  private currentSize: number = 0;
  private readonly config: MemoryPressureConfig;

  // Monitoring
  private pressure: 'normal' | 'elevated' | 'high' | 'critical' = 'normal';
  private evictionMultiplier: number = 1;

  constructor(config: MemoryPressureConfig) {
    this.config = config;
    this.startPressureMonitoring();
  }

  private startPressureMonitoring(): void {
    setInterval(() => {
      this.assessPressure();
    }, this.config.checkIntervalMs);
  }

  private assessPressure(): void {
    const ratio = this.currentSize / this.config.softLimit;

    if (ratio < 0.8) {
      this.pressure = 'normal';
      this.evictionMultiplier = 1;
    } else if (ratio < 1.0) {
      this.pressure = 'elevated';
      this.evictionMultiplier = 1.5; // Evict 50% more aggressively
    } else if (this.currentSize < this.config.hardLimit) {
      this.pressure = 'high';
      this.evictionMultiplier = 3; // Evict 3x more
      this.proactiveEviction();
    } else if (this.currentSize < this.config.criticalLimit) {
      this.pressure = 'critical';
      this.emergencyPurge(0.25); // Purge 25% of cache
    } else {
      // Over critical limit - drastic measures
      this.emergencyPurge(0.5); // Purge 50% of cache
    }
  }

  /**
   * Proactive eviction during high pressure.
   * Evict oldest/least-valuable items preemptively.
   */
  private proactiveEviction(): void {
    const targetSize = this.config.softLimit * 0.9; // Target 90% of soft limit
    const toEvict = this.currentSize - targetSize;

    let evicted = 0;
    const sortedKeys = this.getKeysByPriority();

    for (const key of sortedKeys) {
      if (evicted >= toEvict) break;
      const entry = this.cache.get(key)!;
      this.remove(key);
      evicted += entry.size;
    }

    console.log(`Proactive eviction: freed ${evicted} bytes`);
  }

  /**
   * Emergency purge - rapidly reduce cache size.
   */
  private emergencyPurge(fraction: number): void {
    console.warn(`Emergency cache purge: removing ${fraction * 100}% of entries`);

    const keys = Array.from(this.cache.keys());
    const toRemove = Math.floor(keys.length * fraction);

    // Remove entries in insertion order (approximately oldest first)
    for (let i = 0; i < toRemove; i++) {
      this.remove(keys[i]);
    }
  }

  private getKeysByPriority(): K[] {
    // Sort by last access time (oldest first)
    return Array.from(this.cache.entries())
      .sort((a, b) => a[1].lastAccess - b[1].lastAccess)
      .map(([key]) => key);
  }

  /**
   * Set with pressure-aware behavior.
   */
  set(key: K, value: V, size: number): void {
    // Reject if critical and item is large
    if (this.pressure === 'critical' && size > this.config.softLimit * 0.01) {
      console.warn(`Rejecting large cache entry during critical pressure`);
      return;
    }

    // Evict with multiplier
    const targetFree = size * this.evictionMultiplier;
    this.ensureSpace(targetFree);

    this.cache.set(key, {
      value,
      size,
      lastAccess: Date.now(),
    });
    this.currentSize += size;
  }

  private ensureSpace(needed: number): void {
    while (this.currentSize + needed > this.config.softLimit && this.cache.size > 0) {
      const oldestKey = this.cache.keys().next().value;
      if (oldestKey === undefined) break; // Type guard; the map is non-empty here
      this.remove(oldestKey);
    }
  }

  private remove(key: K): void {
    const entry = this.cache.get(key);
    if (entry) {
      this.currentSize -= entry.size;
      this.cache.delete(key);
    }
  }

  getHealthMetrics(): MemoryHealthMetrics {
    return {
      currentSize: this.currentSize,
      pressure: this.pressure,
      utilizationPercent: (this.currentSize / this.config.softLimit) * 100,
      itemCount: this.cache.size,
    };
  }
}

interface CacheEntry<V> {
  value: V;
  size: number;
  lastAccess: number;
}

interface MemoryHealthMetrics {
  currentSize: number;
  pressure: string;
  utilizationPercent: number;
  itemCount: number;
}
```

Emergency cache purges cause sudden hit rate drops, spiking load on backing stores. Configure pressure thresholds conservatively to avoid reaching critical state. Monitor pressure metrics and alert before emergencies occur.
Effective cache management requires continuous monitoring. Without visibility into cache behavior, sizing and policy decisions are guesswork.
Essential Cache Metrics:
| Metric | Formula | Healthy Range | Action if Outside |
|---|---|---|---|
| Hit Rate | hits / (hits + misses) | > 80% | Increase size or review key design |
| Miss Rate | misses / (hits + misses) | < 20% | Same as hit rate |
| Eviction Rate | evictions / sec | Stable, low | Increase size or review TTLs |
| Memory Utilization | used / max | 60-90% | Adjust size (too low = waste, too high = pressure) |
| Latency (p99) | 99th percentile get latency | < 5ms (local), < 20ms (distributed) | Review memory pressure, network |
| Expiration Ratio | expirations / evictions | High (items age out before being evicted) | Adjust TTLs if evictions ≫ expirations |
```typescript
/**
 * Comprehensive cache monitoring and alerting.
 */
interface CacheStats {
  hits: number;
  misses: number;
  evictions: number;
  expirations: number;
  currentSize: number;
  maxSize: number;
  itemCount: number;
  getLatencyMs: number[]; // Recent latency samples
}

class CacheMonitor {
  private stats: CacheStats = {
    hits: 0,
    misses: 0,
    evictions: 0,
    expirations: 0,
    currentSize: 0,
    maxSize: 0,
    itemCount: 0,
    getLatencyMs: [],
  };

  private windowStart: number = Date.now();

  recordHit(latencyMs: number): void {
    this.stats.hits++;
    this.recordLatency(latencyMs);
  }

  recordMiss(latencyMs: number): void {
    this.stats.misses++;
    this.recordLatency(latencyMs);
  }

  recordEviction(): void {
    this.stats.evictions++;
  }

  recordExpiration(): void {
    this.stats.expirations++;
  }

  private recordLatency(ms: number): void {
    this.stats.getLatencyMs.push(ms);
    // Keep last 1000 samples
    if (this.stats.getLatencyMs.length > 1000) {
      this.stats.getLatencyMs.shift();
    }
  }

  updateSizeMetrics(current: number, max: number, items: number): void {
    this.stats.currentSize = current;
    this.stats.maxSize = max;
    this.stats.itemCount = items;
  }

  getMetrics(): CacheHealthReport {
    const windowDuration = (Date.now() - this.windowStart) / 1000;
    const totalOps = this.stats.hits + this.stats.misses;

    // Calculate latency percentiles
    const sorted = [...this.stats.getLatencyMs].sort((a, b) => a - b);
    const p50 = sorted[Math.floor(sorted.length * 0.5)] || 0;
    const p95 = sorted[Math.floor(sorted.length * 0.95)] || 0;
    const p99 = sorted[Math.floor(sorted.length * 0.99)] || 0;

    return {
      // Rates
      hitRate: totalOps > 0 ? this.stats.hits / totalOps : 0,
      missRate: totalOps > 0 ? this.stats.misses / totalOps : 0,
      evictionRate: this.stats.evictions / windowDuration,
      expirationRate: this.stats.expirations / windowDuration,
      throughput: totalOps / windowDuration,

      // Utilization
      memoryUtilization: this.stats.maxSize > 0 ? this.stats.currentSize / this.stats.maxSize : 0,
      itemCount: this.stats.itemCount,
      avgItemSize: this.stats.itemCount > 0 ? this.stats.currentSize / this.stats.itemCount : 0,

      // Latency
      latencyP50Ms: p50,
      latencyP95Ms: p95,
      latencyP99Ms: p99,

      // Health assessment
      health: this.assessHealth(
        totalOps > 0 ? this.stats.hits / totalOps : 0,
        this.stats.currentSize / this.stats.maxSize,
        p99
      ),
    };
  }

  private assessHealth(
    hitRate: number,
    utilization: number,
    p99Latency: number
  ): CacheHealthStatus {
    const issues: string[] = [];

    if (hitRate < 0.5) issues.push('Critical: Hit rate below 50%');
    else if (hitRate < 0.7) issues.push('Warning: Hit rate below 70%');

    if (utilization > 0.95) issues.push('Warning: Memory utilization > 95%');
    else if (utilization < 0.3) issues.push('Info: Memory underutilized');

    if (p99Latency > 50) issues.push('Critical: p99 latency > 50ms');
    else if (p99Latency > 20) issues.push('Warning: p99 latency > 20ms');

    return {
      status: issues.some(i => i.startsWith('Critical'))
        ? 'critical'
        : issues.some(i => i.startsWith('Warning'))
        ? 'warning'
        : 'healthy',
      issues,
    };
  }

  reset(): void {
    this.stats = {
      hits: 0,
      misses: 0,
      evictions: 0,
      expirations: 0,
      currentSize: this.stats.currentSize,
      maxSize: this.stats.maxSize,
      itemCount: this.stats.itemCount,
      getLatencyMs: [],
    };
    this.windowStart = Date.now();
  }
}

interface CacheHealthReport {
  hitRate: number;
  missRate: number;
  evictionRate: number;
  expirationRate: number;
  throughput: number;
  memoryUtilization: number;
  itemCount: number;
  avgItemSize: number;
  latencyP50Ms: number;
  latencyP95Ms: number;
  latencyP99Ms: number;
  health: CacheHealthStatus;
}

interface CacheHealthStatus {
  status: 'healthy' | 'warning' | 'critical';
  issues: string[];
}
```

Export these metrics to your monitoring system (Prometheus, CloudWatch, Datadog). Key visualizations: hit rate over time, eviction rate spikes, memory utilization trends, and latency percentiles. Set alerts for hit rate drops and memory pressure.
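As a rough sketch of the export step, the snippet below renders a CacheHealthReport in the Prometheus text exposition format by hand; the metric names are illustrative assumptions, and in practice you would more likely use a client library such as prom-client:

```typescript
// Hypothetical exporter: serialize selected CacheHealthReport fields as
// Prometheus text exposition format. Metric names are made up for illustration.
function toPrometheusText(report: CacheHealthReport): string {
  const gauge = (name: string, value: number): string =>
    `# TYPE ${name} gauge\n${name} ${value}\n`;

  return [
    gauge('cache_hit_rate', report.hitRate),
    gauge('cache_eviction_rate', report.evictionRate),
    gauge('cache_memory_utilization', report.memoryUtilization),
    gauge('cache_get_latency_p99_ms', report.latencyP99Ms),
  ].join('');
}

// Serve this string from a /metrics endpoint and scrape it with Prometheus;
// alert rules can then fire on cache_hit_rate drops or utilization spikes.
```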
Cache sizing and eviction policies are foundational to effective caching. To consolidate the key principles from this page:

- Size against the working set, not total data, and plan for peak working sets rather than averages.
- Expect diminishing returns: for skewed workloads, caching roughly 20% of items often captures roughly 80% of accesses.
- Start with LRU; switch policies only after profiling reveals patterns LRU handles poorly, such as scan pollution or stable popularity.
- Handle memory pressure in tiers (soft, hard, critical limits) so the cache degrades gradually instead of crashing or purging abruptly.
- Monitor continuously: hit rate, eviction rate, utilization, and latency percentiles turn sizing from guesswork into an optimization loop.
What's Next:
With cache sizing and eviction understood, we'll explore distributed vs local caching in the next page. You'll learn when to use in-process caches versus distributed caches like Redis, how to handle cache coherence, and the architectural patterns for multi-tier caching.
You now understand how to size caches effectively and select appropriate eviction policies. These skills enable you to build caching systems that maximize hit rates while managing memory efficiently. Apply working set analysis and continuous monitoring to keep your caches performing optimally.