Every cache faces an unavoidable constraint: memory is finite. No matter how much RAM you provision, your cache will eventually fill. When it does, you must answer two fundamental questions:
How much memory should this cache consume? Too little, and hit rates suffer. Too much, and you're wasting expensive resources.
When full, what should be evicted to make room? The wrong eviction policy can destroy cache effectiveness, turning a 95% hit rate into 5%.
These questions might seem like operational details, but they're actually architectural decisions with deep implications. A cache with the wrong size or eviction policy doesn't just underperform—it can create cascading failures, cost overruns, and user experience degradation.
Principal engineers understand that cache sizing and eviction are not one-time configuration choices. They're ongoing optimization targets that require workload analysis, monitoring, and periodic adjustment.
By the end of this page, you will understand how to determine appropriate cache sizes, select eviction policies for different workloads, handle memory pressure gracefully, and monitor cache efficiency. You'll learn the mathematics behind cache sizing and the algorithms behind eviction policies.
Determining the right cache size requires understanding the relationship between cache size, hit rate, and cost. This relationship is rarely linear—there are usually diminishing returns as cache size increases.
The Cache Size / Hit Rate Curve:
For most workloads, the relationship between cache size and hit rate follows a characteristic curve:
| Cache Size (% of total data) | Hit Rate | Monthly Cost (illustrative) | Requests Saved per $ |
|---|---|---|---|
| 5% | 60% | $10 | 600K |
| 10% | 75% | $20 | 375K |
| 25% | 88% | $50 | 176K |
| 50% | 95% | $100 | 95K |
| 100% | 99% | $200 | 49K |
This table illustrates the diminishing-returns principle: doubling cache size from 5% to 10% improves hit rate by 15 percentage points (60% → 75%), but doubling from 50% to 100% improves it by only 4 points (95% → 99%). (The per-dollar column is consistent with an illustrative workload of 10 million requests per month: requests saved per $ ≈ hit rate × 10M ÷ monthly cost.)
The optimal cache size depends on the shape of your access distribution and the size of your working set.
Many real-world workloads follow Pareto-like distributions: 20% of items account for 80% of accesses. For such workloads, a cache holding ~20% of data often achieves ~80% hit rate. Start here and adjust based on actual metrics.
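To make that intuition concrete, here is a minimal sketch that estimates hit rate from cache fraction, assuming a Zipf-like popularity distribution (the exponent `s` is a tunable assumption, not something the table above prescribes):

```typescript
// Estimate the hit rate achieved by caching the most popular fraction of items,
// under an ASSUMED Zipf(s) popularity distribution. Fit s to your measured
// access counts before trusting the output.
function estimateHitRate(totalItems: number, cacheFraction: number, s = 1.0): number {
  const weights: number[] = [];
  for (let rank = 1; rank <= totalItems; rank++) {
    weights.push(1 / Math.pow(rank, s)); // Popularity weight for this rank
  }
  const total = weights.reduce((a, b) => a + b, 0);
  const cachedItems = Math.floor(totalItems * cacheFraction);
  const hot = weights.slice(0, cachedItems).reduce((a, b) => a + b, 0);
  return hot / total; // Fraction of accesses served from cache
}

// With 100K items and s = 1.0, caching 20% of items captures roughly 87% of
// accesses -- close to the Pareto rule of thumb above.
console.log(estimateHitRate(100_000, 0.2).toFixed(2));
```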
The working set is the subset of data actively accessed within a time window. Understanding your working set is essential for cache sizing—if your cache can hold the entire working set, you achieve near-perfect hit rates.
Formal Definition:
For a time window T, the working set W(T) is the set of distinct items accessed during T. The working set size |W(T)| determines the minimum cache size for high hit rates.
Working Set Characteristics:
```typescript
/**
 * Working set analysis for cache sizing decisions.
 */
interface AccessEvent {
  key: string;
  timestamp: number;
  sizeBytes: number;
}

class WorkingSetAnalyzer {
  private accessLog: AccessEvent[] = [];

  /**
   * Record an access event for analysis.
   */
  recordAccess(key: string, sizeBytes: number): void {
    this.accessLog.push({
      key,
      timestamp: Date.now(),
      sizeBytes,
    });

    // Maintain bounded log (last 24 hours)
    const cutoff = Date.now() - 24 * 60 * 60 * 1000;
    this.accessLog = this.accessLog.filter(e => e.timestamp > cutoff);
  }

  /**
   * Calculate working set size for different time windows.
   */
  analyzeWorkingSets(): WorkingSetReport {
    const now = Date.now();
    return {
      oneHour: this.workingSet(now - 60 * 60 * 1000),
      fourHours: this.workingSet(now - 4 * 60 * 60 * 1000),
      oneDay: this.workingSet(now - 24 * 60 * 60 * 1000),
    };
  }

  private workingSet(since: number): WorkingSetMetrics {
    const relevant = this.accessLog.filter(e => e.timestamp >= since);

    // Unique keys accessed
    const uniqueKeys = new Set(relevant.map(e => e.key));

    // Size by key (latest size for each key)
    const sizeByKey = new Map<string, number>();
    for (const event of relevant) {
      sizeByKey.set(event.key, event.sizeBytes);
    }

    // Calculate total working set size
    let totalBytes = 0;
    for (const size of sizeByKey.values()) {
      totalBytes += size;
    }

    // Calculate access frequency distribution
    const accessCounts = new Map<string, number>();
    for (const event of relevant) {
      accessCounts.set(event.key, (accessCounts.get(event.key) || 0) + 1);
    }

    return {
      uniqueItems: uniqueKeys.size,
      totalBytes,
      totalAccesses: relevant.length,
      frequencyDistribution: this.calculateFrequencyDistribution(accessCounts),
    };
  }

  private calculateFrequencyDistribution(
    counts: Map<string, number>
  ): FrequencyDistribution {
    const values = Array.from(counts.values()).sort((a, b) => b - a);
    const total = values.reduce((sum, v) => sum + v, 0);

    // What percentage of items account for X% of accesses?
    let cumulative = 0;
    let itemsFor80Pct = 0;
    for (const count of values) {
      cumulative += count;
      itemsFor80Pct++;
      if (cumulative >= total * 0.8) break;
    }

    return {
      itemsFor80PercentAccesses: itemsFor80Pct,
      percentOfItems: (itemsFor80Pct / values.length) * 100,
      isSkewed: (itemsFor80Pct / values.length) < 0.3, // Less than 30% = skewed
    };
  }
}

interface WorkingSetMetrics {
  uniqueItems: number;
  totalBytes: number;
  totalAccesses: number;
  frequencyDistribution: FrequencyDistribution;
}

interface FrequencyDistribution {
  itemsFor80PercentAccesses: number;
  percentOfItems: number;
  isSkewed: boolean;
}

interface WorkingSetReport {
  oneHour: WorkingSetMetrics;
  fourHours: WorkingSetMetrics;
  oneDay: WorkingSetMetrics;
}

// Usage and interpretation
const analyzer = new WorkingSetAnalyzer();

// ... record access events over time ...

const report = analyzer.analyzeWorkingSets();

console.log(`1-hour working set: ${report.oneHour.uniqueItems} items, ${report.oneHour.totalBytes / 1e6} MB`);
console.log(`24-hour working set: ${report.oneDay.uniqueItems} items, ${report.oneDay.totalBytes / 1e6} MB`);

// Recommendation logic
if (report.oneHour.frequencyDistribution.isSkewed) {
  console.log("Workload is skewed - small cache can be effective");
  console.log(`Cache ~10% of 1-hour working set: ${report.oneHour.totalBytes * 0.1 / 1e6} MB`);
} else {
  console.log("Workload is uniform - larger cache needed");
  console.log(`Cache ~50% of 1-hour working set: ${report.oneHour.totalBytes * 0.5 / 1e6} MB`);
}
```

Working sets shift during special events (product launches, sales, viral content).
Monitor working set size continuously, not just during normal operations. Your cache size should accommodate peak working sets, or you'll see hit rate collapse during high-traffic events.
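One way to operationalize this, building on the WorkingSetAnalyzer sketch above (the PeakTracker helper below is hypothetical, and the 20% headroom is an assumption to tune):

```typescript
// Hypothetical helper: track the peak working set observed over time so that
// capacity planning targets peaks rather than averages.
class PeakTracker {
  private peakBytes = 0;
  private peakItems = 0;

  observe(metrics: { uniqueItems: number; totalBytes: number }): void {
    this.peakBytes = Math.max(this.peakBytes, metrics.totalBytes);
    this.peakItems = Math.max(this.peakItems, metrics.uniqueItems);
  }

  // Size the cache against the peak plus headroom (20% here is an assumption)
  recommendedCacheBytes(headroom = 1.2): number {
    return Math.ceil(this.peakBytes * headroom);
  }
}

// e.g. sample every few minutes:
// peaks.observe(analyzer.analyzeWorkingSets().oneHour);
```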
When a cache is full and a new item must be inserted, the eviction policy determines which existing item to remove. The right policy depends on your access patterns.
Understanding Policy Tradeoffs:
Every policy makes predictions about future accesses based on past behavior. No policy is universally best—each optimizes for different patterns.
Least Recently Used (LRU) evicts the item that hasn't been accessed for the longest time.
Assumption: Items accessed recently are likely to be accessed again soon.
Best for: General-purpose caching, web applications, API responses, user sessions.
```typescript
/**
 * LRU Cache implementation using a Map (insertion order) + size tracking.
 * Map maintains insertion order; we move accessed items to end.
 */
class LRUCache<K, V> {
  private cache: Map<K, { value: V; size: number }>;
  private currentSize: number = 0;
  private readonly maxSize: number;

  // Metrics
  private hits: number = 0;
  private misses: number = 0;
  private evictions: number = 0;

  constructor(maxSizeBytes: number) {
    this.maxSize = maxSizeBytes;
    this.cache = new Map();
  }

  get(key: K): V | undefined {
    const entry = this.cache.get(key);
    if (entry) {
      // Move to end (most recently used)
      this.cache.delete(key);
      this.cache.set(key, entry);
      this.hits++;
      return entry.value;
    }
    this.misses++;
    return undefined;
  }

  set(key: K, value: V, sizeBytes: number): void {
    // If key exists, remove it first
    if (this.cache.has(key)) {
      const existing = this.cache.get(key)!;
      this.currentSize -= existing.size;
      this.cache.delete(key);
    }

    // Evict until space available
    while (this.currentSize + sizeBytes > this.maxSize && this.cache.size > 0) {
      this.evictOldest();
    }

    // Insert at end (most recently used)
    this.cache.set(key, { value, size: sizeBytes });
    this.currentSize += sizeBytes;
  }

  private evictOldest(): void {
    // First key in Map is the oldest
    const oldestKey = this.cache.keys().next().value;
    if (oldestKey !== undefined) {
      const entry = this.cache.get(oldestKey)!;
      this.cache.delete(oldestKey);
      this.currentSize -= entry.size;
      this.evictions++;
    }
  }

  getMetrics(): CacheMetrics {
    return {
      size: this.currentSize,
      itemCount: this.cache.size,
      hitRate: this.hits / (this.hits + this.misses) || 0,
      hits: this.hits,
      misses: this.misses,
      evictions: this.evictions,
    };
  }
}

interface CacheMetrics {
  size: number;
  itemCount: number;
  hitRate: number;
  hits: number;
  misses: number;
  evictions: number;
}

// Strengths: Simple, low overhead, works well for recency-based patterns
// Weaknesses: Vulnerable to scan pollution (sequential access destroys cache)
```

LRU is vulnerable to scan pollution: a single sequential scan through data (e.g., a batch job) can evict all frequently-used items. If this is a concern, consider LRU with scan-resistance (SLRU) or LFU.
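To illustrate the scan-resistant alternative just mentioned, here is a minimal SLRU sketch. The two-segment structure and promotion-on-second-access rule are the core of SLRU; the capacity split and count-based (rather than byte-based) sizing are simplifying assumptions:

```typescript
// Minimal segmented-LRU (SLRU) sketch. Items enter a probationary segment;
// only a repeat access promotes them to the protected segment, so a one-off
// scan cannot displace proven-hot entries.
class SLRUCache<K, V> {
  private probation = new Map<K, V>();    // First-time entries (scans land here)
  private protectedSeg = new Map<K, V>(); // Entries with repeat accesses

  constructor(
    private probationCap: number,
    private protectedCap: number
  ) {}

  get(key: K): V | undefined {
    if (this.protectedSeg.has(key)) {
      const v = this.protectedSeg.get(key)!;
      this.protectedSeg.delete(key);
      this.protectedSeg.set(key, v); // Refresh recency within protected
      return v;
    }
    if (this.probation.has(key)) {
      const v = this.probation.get(key)!;
      this.probation.delete(key);
      this.promote(key, v); // Second access: promote to protected
      return v;
    }
    return undefined;
  }

  set(key: K, value: V): void {
    if (this.protectedSeg.has(key)) {
      this.protectedSeg.delete(key);
      this.promote(key, value);
      return;
    }
    this.probation.delete(key);
    this.probation.set(key, value);
    if (this.probation.size > this.probationCap) {
      // Evict the LRU probationary entry; scans never touch protected entries
      this.probation.delete(this.probation.keys().next().value!);
    }
  }

  private promote(key: K, value: V): void {
    this.protectedSeg.set(key, value);
    if (this.protectedSeg.size > this.protectedCap) {
      // Demote the protected LRU back to probation rather than dropping it
      const [demotedKey, demotedVal] = this.protectedSeg.entries().next().value!;
      this.protectedSeg.delete(demotedKey);
      this.probation.set(demotedKey, demotedVal);
      if (this.probation.size > this.probationCap) {
        this.probation.delete(this.probation.keys().next().value!);
      }
    }
  }
}
```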
Choosing the right eviction policy requires understanding your workload characteristics. Here's a decision framework:
| Workload Pattern | Recommended Policy | Avoid | Key Consideration |
|---|---|---|---|
| Temporal locality (recent = valuable) | LRU | FIFO | Standard web applications |
| Stable popularity (some items always hot) | LFU / W-TinyLFU | Pure LRU | CDN, static content |
| Sequential scans frequent | LFU, SLRU, ARC | Pure LRU | Database, analytics |
| Changing popularity over time | ARC, LIRS, W-TinyLFU | Pure LFU | Trending content, news |
| Memory/CPU constrained | Clock, FIFO | ARC (overhead) | Embedded systems |
| Mixed/unknown workload | ARC, 2Q, CAR | Single-policy | General-purpose systems |
Don't over-engineer eviction policy selection. Start with LRU—it's simple, well-understood, and works well for most workloads. Measure hit rates. If they're below expectations, profile your access patterns and consider switching policies. Most applications never need anything beyond LRU.
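If you do want to measure before switching, one low-cost approach is to replay a recorded key trace offline. A minimal sketch, assuming you can capture such a trace (the helper below is hypothetical and reuses the LRUCache class above):

```typescript
// Hypothetical offline evaluation: replay a recorded key trace against a
// candidate cache and report the simulated hit rate. Entry sizes are fixed at
// 1 "byte" so the capacity argument effectively counts items.
function simulateHitRate(trace: string[], cache: LRUCache<string, boolean>): number {
  let hits = 0;
  for (const key of trace) {
    if (cache.get(key) !== undefined) {
      hits++;
    } else {
      cache.set(key, true, 1); // Miss: admit the item
    }
  }
  return hits / trace.length;
}

// e.g. compare capacities, or swap in an SLRU/LFU candidate with the same shape:
// console.log(simulateHitRate(trace, new LRUCache(10_000)));
```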
```typescript
/**
 * Eviction policy recommendation based on workload analysis.
 */
interface WorkloadAnalysis {
  accessPattern: 'temporal' | 'popularity' | 'sequential' | 'mixed';
  workingSetStability: 'stable' | 'shifting' | 'volatile';
  memoryConstraint: 'tight' | 'moderate' | 'generous';
  performanceRequirement: 'latency-critical' | 'throughput' | 'balanced';
}

function recommendPolicy(analysis: WorkloadAnalysis): PolicyRecommendation {
  // Decision tree for policy selection
  if (analysis.memoryConstraint === 'tight') {
    return {
      policy: 'Clock',
      rationale: 'Clock has minimal memory overhead (a single reference bit per entry)',
      alternative: 'FIFO',
    };
  }

  if (analysis.accessPattern === 'temporal' && analysis.workingSetStability !== 'volatile') {
    return {
      policy: 'LRU',
      rationale: 'LRU excels at temporal locality patterns',
      alternative: 'SLRU (if scans are a concern)',
    };
  }

  if (analysis.accessPattern === 'popularity' && analysis.workingSetStability === 'stable') {
    return {
      policy: 'LFU',
      rationale: 'LFU preserves frequently-accessed items regardless of recency',
      alternative: 'W-TinyLFU (if some temporal variation)',
    };
  }

  if (analysis.accessPattern === 'sequential') {
    return {
      policy: 'SLRU or 2Q',
      rationale: 'Segmented LRU resists scan pollution',
      alternative: 'ARC',
    };
  }

  if (analysis.accessPattern === 'mixed' || analysis.workingSetStability === 'volatile') {
    return {
      policy: 'ARC or W-TinyLFU',
      rationale: 'Adaptive policies handle workload changes',
      alternative: 'CAR (if the IBM patent is a concern)',
    };
  }

  // Default fallback
  return {
    policy: 'LRU',
    rationale: 'Safe default that works well for most workloads',
    alternative: 'Measure and adjust based on hit rate',
  };
}

interface PolicyRecommendation {
  policy: string;
  rationale: string;
  alternative: string;
}
```

When system memory runs low, caches must respond gracefully. Uncontrolled cache growth can lead to out-of-memory crashes, while overly aggressive shrinking destroys hit rates.
Memory Pressure Scenarios:
```typescript
/**
 * Memory pressure handling strategies for cache systems.
 */
interface MemoryPressureConfig {
  softLimit: number;       // Start gentle eviction (bytes)
  hardLimit: number;       // Aggressive eviction threshold
  criticalLimit: number;   // Emergency purge threshold
  checkIntervalMs: number;
}

class MemoryPressureHandler<K, V> {
  private cache: Map<K, CacheEntry<V>> = new Map();
  private currentSize: number = 0;
  private readonly config: MemoryPressureConfig;

  // Monitoring
  private pressure: 'normal' | 'elevated' | 'high' | 'critical' = 'normal';
  private evictionMultiplier: number = 1;

  constructor(config: MemoryPressureConfig) {
    this.config = config;
    this.startPressureMonitoring();
  }

  private startPressureMonitoring(): void {
    setInterval(() => {
      this.assessPressure();
    }, this.config.checkIntervalMs);
  }

  private assessPressure(): void {
    const ratio = this.currentSize / this.config.softLimit;

    if (ratio < 0.8) {
      this.pressure = 'normal';
      this.evictionMultiplier = 1;
    } else if (ratio < 1.0) {
      this.pressure = 'elevated';
      this.evictionMultiplier = 1.5; // Evict 50% more aggressively
    } else if (this.currentSize < this.config.hardLimit) {
      this.pressure = 'high';
      this.evictionMultiplier = 3; // Evict 3x more
      this.proactiveEviction();
    } else if (this.currentSize < this.config.criticalLimit) {
      this.pressure = 'critical';
      this.emergencyPurge(0.25); // Purge 25% of cache
    } else {
      // Over critical limit - drastic measures
      this.emergencyPurge(0.5); // Purge 50% of cache
    }
  }

  /**
   * Proactive eviction during high pressure.
   * Evict oldest/least-valuable items preemptively.
   */
  private proactiveEviction(): void {
    const targetSize = this.config.softLimit * 0.9; // Target 90% of soft limit
    const toEvict = this.currentSize - targetSize;

    let evicted = 0;
    const sortedKeys = this.getKeysByPriority();

    for (const key of sortedKeys) {
      if (evicted >= toEvict) break;
      const entry = this.cache.get(key)!;
      this.remove(key);
      evicted += entry.size;
    }

    console.log(`Proactive eviction: freed ${evicted} bytes`);
  }

  /**
   * Emergency purge - rapidly reduce cache size.
   */
  private emergencyPurge(fraction: number): void {
    console.warn(`Emergency cache purge: removing ${fraction * 100}% of entries`);

    const keys = Array.from(this.cache.keys());
    const toRemove = Math.floor(keys.length * fraction);

    // Remove entries in insertion order (approximately oldest first)
    for (let i = 0; i < toRemove; i++) {
      this.remove(keys[i]);
    }
  }

  private getKeysByPriority(): K[] {
    // Sort by last access time (oldest first)
    return Array.from(this.cache.entries())
      .sort((a, b) => a[1].lastAccess - b[1].lastAccess)
      .map(([key]) => key);
  }

  /**
   * Set with pressure-aware behavior.
   */
  set(key: K, value: V, size: number): void {
    // Reject if critical and item is large
    if (this.pressure === 'critical' && size > this.config.softLimit * 0.01) {
      console.warn(`Rejecting large cache entry during critical pressure`);
      return;
    }

    // Evict with multiplier
    const targetFree = size * this.evictionMultiplier;
    this.ensureSpace(targetFree);

    this.cache.set(key, {
      value,
      size,
      lastAccess: Date.now(),
    });
    this.currentSize += size;
  }

  private ensureSpace(needed: number): void {
    while (this.currentSize + needed > this.config.softLimit && this.cache.size > 0) {
      const oldestKey = this.cache.keys().next().value;
      if (oldestKey === undefined) break; // Type guard; the map is non-empty here
      this.remove(oldestKey);
    }
  }

  private remove(key: K): void {
    const entry = this.cache.get(key);
    if (entry) {
      this.currentSize -= entry.size;
      this.cache.delete(key);
    }
  }

  getHealthMetrics(): MemoryHealthMetrics {
    return {
      currentSize: this.currentSize,
      pressure: this.pressure,
      utilizationPercent: (this.currentSize / this.config.softLimit) * 100,
      itemCount: this.cache.size,
    };
  }
}

interface CacheEntry<V> {
  value: V;
  size: number;
  lastAccess: number;
}

interface MemoryHealthMetrics {
  currentSize: number;
  pressure: string;
  utilizationPercent: number;
  itemCount: number;
}
```

Emergency cache purges cause sudden hit rate drops, spiking load on backing stores. Configure pressure thresholds conservatively to avoid reaching critical state. Monitor pressure metrics and alert before emergencies occur.
Effective cache management requires continuous monitoring. Without visibility into cache behavior, sizing and policy decisions are guesswork.
Essential Cache Metrics:
| Metric | Formula | Healthy Range | Action if Outside |
|---|---|---|---|
| Hit Rate | hits / (hits + misses) | > 80% | Increase size or review key design |
| Miss Rate | misses / (hits + misses) | < 20% | Same as hit rate |
| Eviction Rate | evictions / sec | Stable, low | Increase size or review TTLs |
| Memory Utilization | used / max | 60-90% | Adjust size (too low = waste, too high = pressure) |
| Latency (p99) | 99th percentile get latency | < 5ms (local), < 20ms (distributed) | Review memory pressure, network |
| Expiration Ratio | expirations / evictions | High (items age out before being evicted) | Adjust TTLs if evictions ≫ expirations |
```typescript
/**
 * Comprehensive cache monitoring and alerting.
 */
interface CacheStats {
  hits: number;
  misses: number;
  evictions: number;
  expirations: number;
  currentSize: number;
  maxSize: number;
  itemCount: number;
  getLatencyMs: number[]; // Recent latency samples
}

class CacheMonitor {
  private stats: CacheStats = {
    hits: 0,
    misses: 0,
    evictions: 0,
    expirations: 0,
    currentSize: 0,
    maxSize: 0,
    itemCount: 0,
    getLatencyMs: [],
  };

  private windowStart: number = Date.now();

  recordHit(latencyMs: number): void {
    this.stats.hits++;
    this.recordLatency(latencyMs);
  }

  recordMiss(latencyMs: number): void {
    this.stats.misses++;
    this.recordLatency(latencyMs);
  }

  recordEviction(): void {
    this.stats.evictions++;
  }

  recordExpiration(): void {
    this.stats.expirations++;
  }

  private recordLatency(ms: number): void {
    this.stats.getLatencyMs.push(ms);
    // Keep last 1000 samples
    if (this.stats.getLatencyMs.length > 1000) {
      this.stats.getLatencyMs.shift();
    }
  }

  updateSizeMetrics(current: number, max: number, items: number): void {
    this.stats.currentSize = current;
    this.stats.maxSize = max;
    this.stats.itemCount = items;
  }

  getMetrics(): CacheHealthReport {
    const windowDuration = (Date.now() - this.windowStart) / 1000;
    const totalOps = this.stats.hits + this.stats.misses;

    // Calculate latency percentiles
    const sorted = [...this.stats.getLatencyMs].sort((a, b) => a - b);
    const p50 = sorted[Math.floor(sorted.length * 0.5)] || 0;
    const p95 = sorted[Math.floor(sorted.length * 0.95)] || 0;
    const p99 = sorted[Math.floor(sorted.length * 0.99)] || 0;

    return {
      // Rates
      hitRate: totalOps > 0 ? this.stats.hits / totalOps : 0,
      missRate: totalOps > 0 ? this.stats.misses / totalOps : 0,
      evictionRate: this.stats.evictions / windowDuration,
      expirationRate: this.stats.expirations / windowDuration,
      throughput: totalOps / windowDuration,

      // Utilization
      memoryUtilization: this.stats.maxSize > 0 ? this.stats.currentSize / this.stats.maxSize : 0,
      itemCount: this.stats.itemCount,
      avgItemSize: this.stats.itemCount > 0 ? this.stats.currentSize / this.stats.itemCount : 0,

      // Latency
      latencyP50Ms: p50,
      latencyP95Ms: p95,
      latencyP99Ms: p99,

      // Health assessment
      health: this.assessHealth(
        totalOps > 0 ? this.stats.hits / totalOps : 0,
        this.stats.currentSize / this.stats.maxSize,
        p99
      ),
    };
  }

  private assessHealth(
    hitRate: number,
    utilization: number,
    p99Latency: number
  ): CacheHealthStatus {
    const issues: string[] = [];

    if (hitRate < 0.5) issues.push('Critical: Hit rate below 50%');
    else if (hitRate < 0.7) issues.push('Warning: Hit rate below 70%');

    if (utilization > 0.95) issues.push('Warning: Memory utilization > 95%');
    else if (utilization < 0.3) issues.push('Info: Memory underutilized');

    if (p99Latency > 50) issues.push('Critical: p99 latency > 50ms');
    else if (p99Latency > 20) issues.push('Warning: p99 latency > 20ms');

    return {
      status: issues.some(i => i.startsWith('Critical'))
        ? 'critical'
        : issues.some(i => i.startsWith('Warning'))
        ? 'warning'
        : 'healthy',
      issues,
    };
  }

  reset(): void {
    this.stats = {
      hits: 0,
      misses: 0,
      evictions: 0,
      expirations: 0,
      currentSize: this.stats.currentSize,
      maxSize: this.stats.maxSize,
      itemCount: this.stats.itemCount,
      getLatencyMs: [],
    };
    this.windowStart = Date.now();
  }
}

interface CacheHealthReport {
  hitRate: number;
  missRate: number;
  evictionRate: number;
  expirationRate: number;
  throughput: number;
  memoryUtilization: number;
  itemCount: number;
  avgItemSize: number;
  latencyP50Ms: number;
  latencyP95Ms: number;
  latencyP99Ms: number;
  health: CacheHealthStatus;
}

interface CacheHealthStatus {
  status: 'healthy' | 'warning' | 'critical';
  issues: string[];
}
```

Export these metrics to your monitoring system (Prometheus, CloudWatch, Datadog). Key visualizations: hit rate over time, eviction rate spikes, memory utilization trends, and latency percentiles. Set alerts for hit rate drops and memory pressure.
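As a rough sketch of the export step, the snippet below renders a CacheHealthReport in the Prometheus text exposition format by hand; the metric names are illustrative assumptions, and in practice you would more likely use a client library such as prom-client:

```typescript
// Hypothetical exporter: serialize selected CacheHealthReport fields as
// Prometheus text exposition format. Metric names are made up for illustration.
function toPrometheusText(report: CacheHealthReport): string {
  const gauge = (name: string, value: number): string =>
    `# TYPE ${name} gauge\n${name} ${value}\n`;

  return [
    gauge('cache_hit_rate', report.hitRate),
    gauge('cache_eviction_rate', report.evictionRate),
    gauge('cache_memory_utilization', report.memoryUtilization),
    gauge('cache_get_latency_p99_ms', report.latencyP99Ms),
  ].join('');
}

// Serve this string from a /metrics endpoint and scrape it with Prometheus;
// alert rules can then fire on cache_hit_rate drops or utilization spikes.
```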
Cache sizing and eviction policies are foundational to effective caching. To consolidate the key principles from this page:

- Size against the working set, not total data, and plan for peak working sets rather than averages.
- Expect diminishing returns: for skewed workloads, caching roughly 20% of items often captures roughly 80% of accesses.
- Start with LRU; switch policies only after profiling reveals patterns LRU handles poorly, such as scan pollution or stable popularity.
- Handle memory pressure in tiers (soft, hard, critical limits) so the cache degrades gradually instead of crashing or purging abruptly.
- Monitor continuously: hit rate, eviction rate, utilization, and latency percentiles turn sizing from guesswork into an optimization loop.
What's Next:
With cache sizing and eviction understood, we'll explore distributed vs local caching in the next page. You'll learn when to use in-process caches versus distributed caches like Redis, how to handle cache coherence, and the architectural patterns for multi-tier caching.
You now understand how to size caches effectively and select appropriate eviction policies. These skills enable you to build caching systems that maximize hit rates while managing memory efficiently. Apply working set analysis and continuous monitoring to keep your caches performing optimally.