A cache that 'works' isn't necessarily a cache that works well. The difference between a properly tuned cache and a default configuration can mean dramatic improvements in hit rate, latency, and resource efficiency—often without any code changes.
Cache tuning is the practice of optimizing cache configuration to match your specific workload characteristics. A cache optimized for an e-commerce product catalog behaves very differently from one optimized for user sessions or real-time analytics. One size does not fit all.
This page provides a systematic approach to cache performance tuning, from understanding your workload to selecting optimal configurations and validating the results.
By the end of this page, you will understand how to:
- Analyze workload characteristics to inform tuning decisions
- Configure optimal TTL values, memory limits, and eviction policies
- Identify and eliminate cache performance bottlenecks
- Validate that tuning changes achieve the desired improvements
Effective cache tuning starts with understanding your workload. You cannot optimize what you don't understand. Key workload characteristics that influence cache configuration:
| Characteristic | Description | Cache Implication |
|---|---|---|
| Access Distribution | How evenly are keys accessed? (uniform vs. skewed) | Skewed access (80/20) benefits more from caching; uniform access needs larger cache |
| Read/Write Ratio | Percentage of reads vs. writes | High read ratio = ideal for caching; high writes = more invalidation overhead |
| Working Set Size | Number of unique keys accessed in a time window | Must fit in cache for good hit rate; if larger, expect evictions |
| Access Recency | How quickly do previously accessed items get re-accessed? | Frequent re-access = longer TTL effective; rare re-access = shorter TTL or no cache |
| Object Size | Average size of cached values | Large objects consume memory quickly; may need compression or size limits |
| Access Patterns | Temporal patterns (peaks, periodic, steady) | Pre-warming before peaks; adjust TTL for pattern |
| Data Volatility | How frequently does underlying data change? | High volatility = shorter TTL or event-based invalidation |
Measuring Workload Characteristics:
```typescript
// Comprehensive workload analyzer
interface AccessLogEntry {
  key: string;
  timestamp: number;
  operation: 'get' | 'set' | 'delete';
  size?: number;
}

class CacheWorkloadAnalyzer {
  private accessLog: AccessLogEntry[] = [];

  constructor(private analysisWindow: number = 3600000) {} // Default 1 hour

  // Record all cache operations
  recordAccess(key: string, operation: 'get' | 'set' | 'delete', size?: number): void {
    this.accessLog.push({ key, timestamp: Date.now(), operation, size });

    // Prune old entries
    const cutoff = Date.now() - this.analysisWindow;
    this.accessLog = this.accessLog.filter(e => e.timestamp > cutoff);
  }

  // Generate comprehensive workload report
  analyze(): WorkloadReport {
    const now = Date.now();
    const cutoff = now - this.analysisWindow;
    const recentLogs = this.accessLog.filter(e => e.timestamp > cutoff);

    // 1. Read/Write Ratio
    const gets = recentLogs.filter(e => e.operation === 'get').length;
    const sets = recentLogs.filter(e => e.operation === 'set').length;
    const deletes = recentLogs.filter(e => e.operation === 'delete').length;
    const readWriteRatio = gets / (sets + deletes || 1);

    // 2. Access Distribution (calculate Pareto ratio)
    const keyAccess = new Map<string, number>();
    recentLogs.filter(e => e.operation === 'get').forEach(e => {
      keyAccess.set(e.key, (keyAccess.get(e.key) || 0) + 1);
    });
    const sortedAccess = [...keyAccess.values()].sort((a, b) => b - a);
    const totalAccess = sortedAccess.reduce((a, b) => a + b, 0);
    const top20Percent = Math.ceil(sortedAccess.length * 0.2);
    const top20Access = sortedAccess.slice(0, top20Percent).reduce((a, b) => a + b, 0);
    const paretoRatio = totalAccess > 0 ? top20Access / totalAccess : 0;

    // 3. Working Set Size
    const uniqueKeys = keyAccess.size;

    // 4. Access Frequency Distribution
    const accessCounts = [...keyAccess.values()];
    const avgAccessPerKey = accessCounts.reduce((a, b) => a + b, 0) / (accessCounts.length || 1);
    const maxAccessPerKey = Math.max(...accessCounts, 0);

    // 5. Object Size Distribution
    const sizes = recentLogs.filter(e => e.size).map(e => e.size!);
    const avgSize = sizes.reduce((a, b) => a + b, 0) / (sizes.length || 1);
    const maxSize = Math.max(...sizes, 0);
    const p95Size = sizes.length > 0 ? this.percentile(sizes, 95) : 0;

    // 6. Re-access Time Distribution
    const reAccessTimes = this.calculateReAccessTimes(recentLogs);
    const avgReAccessTime = reAccessTimes.length > 0
      ? reAccessTimes.reduce((a, b) => a + b, 0) / reAccessTimes.length
      : null;

    // 7. Temporal Pattern Analysis
    const hourlyDistribution = this.calculateHourlyDistribution(recentLogs);

    return {
      timeWindow: this.analysisWindow,
      totalOperations: recentLogs.length,
      readWriteRatio: {
        gets,
        sets,
        deletes,
        ratio: readWriteRatio,
        recommendation:
          readWriteRatio > 10 ? 'Excellent caching candidate' :
          readWriteRatio > 3 ? 'Good caching candidate' :
          'Consider if caching is beneficial',
      },
      accessDistribution: {
        paretoRatio,
        interpretation:
          paretoRatio > 0.8 ? 'Highly skewed (classic 80/20)' :
          paretoRatio > 0.6 ? 'Moderately skewed' :
          'Relatively uniform',
        recommendation: paretoRatio > 0.6
          ? 'Small cache can be very effective'
          : 'Need larger cache for good coverage',
      },
      workingSet: {
        uniqueKeys,
        recommendation: `Cache should hold at least ${Math.ceil(uniqueKeys * 0.3)} keys for 80% coverage`,
      },
      accessFrequency: {
        avgAccessPerKey,
        maxAccessPerKey,
        hotKeys: [...keyAccess.entries()]
          .sort((a, b) => b[1] - a[1])
          .slice(0, 10)
          .map(([key, count]) => ({ key, count })),
      },
      objectSize: {
        avgBytes: avgSize,
        maxBytes: maxSize,
        p95Bytes: p95Size,
        estimatedMemoryForWorkingSet: uniqueKeys * avgSize,
      },
      reAccessPattern: {
        avgReAccessTimeMs: avgReAccessTime,
        recommendation: avgReAccessTime
          ? `TTL should be at least ${Math.ceil(avgReAccessTime / 1000 * 1.5)}s`
          : 'Insufficient data for TTL recommendation',
      },
      temporalPattern: {
        hourlyDistribution,
        peakHour: hourlyDistribution.indexOf(Math.max(...hourlyDistribution)),
        recommendation: 'Consider pre-warming cache before peak hours',
      },
    };
  }

  private calculateReAccessTimes(logs: AccessLogEntry[]): number[] {
    const lastAccess = new Map<string, number>();
    const reAccessTimes: number[] = [];
    for (const log of logs) {
      if (log.operation === 'get') {
        const last = lastAccess.get(log.key);
        if (last) {
          reAccessTimes.push(log.timestamp - last);
        }
        lastAccess.set(log.key, log.timestamp);
      }
    }
    return reAccessTimes;
  }

  private calculateHourlyDistribution(logs: AccessLogEntry[]): number[] {
    const hourly = new Array(24).fill(0);
    for (const log of logs) {
      const hour = new Date(log.timestamp).getHours();
      hourly[hour]++;
    }
    return hourly;
  }

  private percentile(arr: number[], p: number): number {
    const sorted = [...arr].sort((a, b) => a - b);
    const index = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[index];
  }
}
```

Run workload analysis for at least 24-48 hours to capture daily patterns. A single hour's snapshot might miss important variations. Ideally, analyze a full week to capture weekend vs. weekday differences.
TTL (Time-To-Live) is one of the most impactful cache configuration parameters. Too short, and you're constantly refetching data unnecessarily. Too long, and you're serving stale data. The optimal TTL balances freshness requirements against cache effectiveness.
The TTL Optimization Process:
```typescript
// TTL optimization based on workload analysis
class TTLOptimizer {
  /**
   * Calculate optimal TTL based on data characteristics
   */
  calculateOptimalTTL(params: TTLParams): TTLRecommendation {
    const {
      avgChangeIntervalSeconds,   // How often data changes
      maxAcceptableStaleSeconds,  // Business tolerance for staleness
      avgReAccessTimeSeconds,     // How often items are re-accessed
      backendQueryCostMs,         // Cost of regenerating data
      invalidationReliability,    // 0-1: How reliable is event-based invalidation?
    } = params;

    // Rule 1: TTL should be less than acceptable staleness
    const stalenessBound = maxAcceptableStaleSeconds;

    // Rule 2: TTL should be long enough for re-access
    // Aim for at least 2x average re-access time to ensure hits on return visits
    const reAccessBound = avgReAccessTimeSeconds * 2;

    // Rule 3: With reliable invalidation, TTL can be longer (serves as safety net)
    // If invalidation is 100% reliable, allow 5x longer TTL
    const reliabilityMultiplier = 1 + (invalidationReliability * 4);
    const adjustedStalenessBound = stalenessBound * reliabilityMultiplier;

    // Rule 4: High-cost queries warrant longer TTL
    // 10ms query = 0.5x, 100ms = 1x, 1000ms = 1.5x
    const costMultiplier = Math.log10(backendQueryCostMs + 1) / 2;

    // Calculate recommended TTL
    const baseTTL = Math.min(adjustedStalenessBound, avgChangeIntervalSeconds * 0.8);
    const recommendedTTL = Math.max(
      baseTTL * (1 + costMultiplier),
      reAccessBound / 2 // At minimum, half of re-access time
    );

    // Cap at 24 hours unless explicitly allowed
    const cappedTTL = Math.min(recommendedTTL, 86400);

    return {
      recommendedTTLSeconds: Math.round(cappedTTL),
      reasoning: { stalenessBound, reAccessBound, reliabilityMultiplier, costMultiplier },
      confidence: this.calculateConfidence(params),
      warnings: this.generateWarnings(params, cappedTTL),
    };
  }

  /**
   * Per-key-pattern TTL configuration
   */
  generateTTLConfig(patterns: KeyPatternAnalysis[]): Map<string, number> {
    const config = new Map<string, number>();
    for (const pattern of patterns) {
      const recommendation = this.calculateOptimalTTL({
        avgChangeIntervalSeconds: pattern.avgChangeInterval,
        maxAcceptableStaleSeconds: pattern.stalenessTolerance,
        avgReAccessTimeSeconds: pattern.avgReAccessTime,
        backendQueryCostMs: pattern.queryLatencyMs,
        invalidationReliability: pattern.hasEventInvalidation ? 0.95 : 0,
      });
      config.set(pattern.pattern, recommendation.recommendedTTLSeconds);
    }
    return config;
  }

  private calculateConfidence(params: TTLParams): 'high' | 'medium' | 'low' {
    // Confidence based on data completeness
    if (params.avgChangeIntervalSeconds > 0 && params.avgReAccessTimeSeconds > 0) {
      return 'high';
    } else if (params.maxAcceptableStaleSeconds > 0) {
      return 'medium';
    }
    return 'low';
  }

  private generateWarnings(params: TTLParams, ttl: number): string[] {
    const warnings: string[] = [];
    if (ttl > params.avgChangeIntervalSeconds) {
      warnings.push('TTL exceeds average change interval - expect some stale data');
    }
    if (ttl < params.avgReAccessTimeSeconds) {
      warnings.push('TTL shorter than re-access time - hit rate may suffer');
    }
    if (params.invalidationReliability < 0.5 && ttl > 3600) {
      warnings.push('Long TTL with unreliable invalidation - staleness risk');
    }
    return warnings;
  }
}

// Example usage
const optimizer = new TTLOptimizer();

// Product catalog: changes rarely, needs freshness
const productTTL = optimizer.calculateOptimalTTL({
  avgChangeIntervalSeconds: 86400, // Products change ~daily
  maxAcceptableStaleSeconds: 300,  // 5 min staleness OK
  avgReAccessTimeSeconds: 1800,    // Re-accessed every 30 min
  backendQueryCostMs: 50,          // Moderate query cost
  invalidationReliability: 0.95,   // Good event-based invalidation
});
// Recommendation: ~1440s (24 min) with high confidence

// User session: changes frequently, needs immediate freshness
const sessionTTL = optimizer.calculateOptimalTTL({
  avgChangeIntervalSeconds: 60,  // Session changes every minute
  maxAcceptableStaleSeconds: 5,  // Must be <5s stale
  avgReAccessTimeSeconds: 10,    // Re-accessed every 10s
  backendQueryCostMs: 5,         // Fast query
  invalidationReliability: 0.99, // Very reliable invalidation
});
// Recommendation: ~25s with high confidence
```

When many cache entries have the same TTL and were populated at similar times, they'll all expire together, causing a cache stampede. Add random jitter (±10-20%) to TTL values to spread expirations over time.
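One way to apply that jitter is to randomize each entry's TTL at write time. A minimal sketch is below; the ±15% spread and the helper name are illustrative, not part of the optimizer above:

```typescript
// Randomize a base TTL so entries written at the same time don't all expire together.
// spread = 0.15 means the final TTL falls anywhere within ±15% of the base value.
function jitteredTTL(baseTTLSeconds: number, spread: number = 0.15): number {
  const factor = 1 + (Math.random() * 2 - 1) * spread; // uniform in [1 - spread, 1 + spread]
  return Math.max(1, Math.round(baseTTLSeconds * factor));
}

// Example: a 900s base TTL becomes roughly 765-1035s per entry
// await cache.set(key, value, jitteredTTL(900));
```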
Cache memory sizing requires balancing cost (memory is expensive) against effectiveness (larger cache = higher hit rate). The optimal size depends on your working set, access patterns, and budget constraints.
Memory Sizing Formula:
Minimum Cache Size = Working Set Size × Average Object Size × Safety Factor
Where:
- Working Set Size = Unique keys accessed in typical period
- Average Object Size = Mean bytes per cached value
- Safety Factor = 1.2 to 1.5 (account for overhead and headroom)
Example Calculation:
Cache Size = 100,000 × 2 KB × 1.3 = 260 MB
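As a quick sanity check, the same formula can be expressed in a few lines of code. The function name and defaults below are illustrative (the example uses decimal units, matching the 260 MB figure above):

```typescript
// Minimum cache size = working set size x average object size x safety factor
function minimumCacheSizeBytes(
  workingSetKeys: number,
  avgObjectBytes: number,
  safetyFactor: number = 1.3 // 1.2-1.5 covers metadata overhead and headroom
): number {
  return Math.ceil(workingSetKeys * avgObjectBytes * safetyFactor);
}

// 100,000 keys x 2 KB x 1.3 = 260,000,000 bytes (~260 MB)
const sizeBytes = minimumCacheSizeBytes(100_000, 2_000);
console.log(`${(sizeBytes / 1e6).toFixed(0)} MB`); // "260 MB"
```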
Eviction Policy Comparison:
| Policy | Description | Best For | Overhead |
|---|---|---|---|
| LRU (Least Recently Used) | Evicts least recently accessed item | General purpose, temporal locality | Low to Medium |
| LFU (Least Frequently Used) | Evicts least accessed item overall | Stable hot data sets | Medium (requires counters) |
| FIFO (First In First Out) | Evicts oldest item | Simple, predictable, streaming data | Very Low |
| Random | Evicts random item | Uniform access patterns | Very Low |
| TTL-Only | Items only expire by TTL, no eviction | When all data fits in memory | None |
| LRU + TTL | LRU with TTL as maximum age | Most production workloads | Low to Medium |
| Segmented LRU | Hot/Cold segments with LRU each | Mixed access patterns | Medium |
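If you run Redis, several of the policies in the table above map directly onto its built-in `maxmemory-policy` setting. The sketch below assumes an ioredis-style client and illustrative values; in production these settings usually live in `redis.conf` rather than being changed at runtime:

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Illustrative: cap cache memory and pick an eviction policy at runtime.
async function applyEvictionPolicy(): Promise<void> {
  await redis.call('CONFIG', 'SET', 'maxmemory', '260mb');              // memory limit
  await redis.call('CONFIG', 'SET', 'maxmemory-policy', 'allkeys-lru'); // LRU across all keys
  // Other built-in Redis policies include volatile-lru, allkeys-lfu,
  // volatile-ttl, allkeys-random, and noeviction.
}
```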
```typescript
// Cache memory sizing analysis and recommendations
class CacheSizingAnalyzer {
  async analyzeCurrentUsage(redis: Redis): Promise<MemoryAnalysis> {
    const info = await redis.info('memory');
    const keyspace = await redis.info('keyspace');

    // Parse Redis INFO output
    const usedMemory = this.parseInfoValue(info, 'used_memory');
    const maxMemory = this.parseInfoValue(info, 'maxmemory');
    const fragRatio = this.parseInfoValue(info, 'mem_fragmentation_ratio');
    const evictedKeys = this.parseInfoValue(info, 'evicted_keys');

    // Calculate key count and average size
    const dbInfo = this.parseKeyspaceInfo(keyspace);
    const totalKeys = dbInfo.keys;
    const avgKeySize = totalKeys > 0 ? usedMemory / totalKeys : 0;

    // Memory utilization
    const utilizationPercent = maxMemory > 0 ? (usedMemory / maxMemory) * 100 : 0;

    return {
      current: {
        usedMemoryBytes: usedMemory,
        maxMemoryBytes: maxMemory,
        utilizationPercent,
        fragmentationRatio: fragRatio,
        totalKeys,
        avgKeyBytes: avgKeySize,
        evictedKeys,
      },
      recommendations: this.generateRecommendations({
        utilization: utilizationPercent,
        fragmentation: fragRatio,
        evictions: evictedKeys,
      }),
    };
  }

  calculateOptimalSize(params: SizingParams): SizeRecommendation {
    const {
      workingSetSize,         // Number of unique keys
      avgObjectBytes,         // Average bytes per value
      targetHitRatePercent,   // Desired hit rate (e.g., 90)
      accessDistributionSkew, // 0-1, higher = more skewed
    } = params;

    // Base size to hold entire working set
    const fullWorkingSetBytes = workingSetSize * avgObjectBytes;

    // Adjust based on target hit rate and access distribution
    // With skewed access (hot keys), smaller cache achieves same hit rate
    const coverageNeeded = this.calculateCoverageForHitRate(
      targetHitRatePercent,
      accessDistributionSkew
    );
    const minSizeBytes = fullWorkingSetBytes * coverageNeeded;

    // Add overhead for metadata, fragmentation, etc.
    const overhead = 1.3;
    const recommendedBytes = minSizeBytes * overhead;

    return {
      minimumBytes: Math.ceil(minSizeBytes),
      recommendedBytes: Math.ceil(recommendedBytes),
      fullWorkingSetBytes,
      coveragePercent: coverageNeeded * 100,
      breakdown: {
        workingSetSize,
        avgObjectBytes,
        coveredKeys: Math.ceil(workingSetSize * coverageNeeded),
        overheadPercent: (overhead - 1) * 100,
      },
    };
  }

  // Simulate hit rate at different cache sizes
  async simulateSizingScenarios(
    accessHistory: string[],
    objectSizes: Map<string, number>,
    scenarios: number[]
  ): Promise<SizingScenario[]> {
    const results: SizingScenario[] = [];
    for (const sizeBytes of scenarios) {
      const simulation = this.runLRUSimulation(accessHistory, objectSizes, sizeBytes);
      results.push({
        cacheSizeBytes: sizeBytes,
        hitRate: simulation.hits / (simulation.hits + simulation.misses),
        evictionCount: simulation.evictions,
        avgMemoryUtilization: simulation.avgMemoryUsed / sizeBytes,
      });
    }
    return results;
  }

  private calculateCoverageForHitRate(targetHitRate: number, skew: number): number {
    // With perfect 80/20 distribution (skew=1), 20% of keys = 80% hit rate
    // With uniform distribution (skew=0), need 95% of keys for 95% hit rate
    const targetRate = targetHitRate / 100;
    if (skew > 0.8) {
      // Highly skewed: small cache is effective
      if (targetRate <= 0.8) return 0.2;
      if (targetRate <= 0.9) return 0.35;
      if (targetRate <= 0.95) return 0.5;
      return 0.7;
    } else if (skew > 0.5) {
      // Moderately skewed
      if (targetRate <= 0.8) return 0.4;
      if (targetRate <= 0.9) return 0.6;
      if (targetRate <= 0.95) return 0.75;
      return 0.9;
    } else {
      // Uniform access: need most of working set
      return Math.min(targetRate + 0.05, 1.0);
    }
  }

  private generateRecommendations(metrics: {
    utilization: number;
    fragmentation: number;
    evictions: number;
  }): string[] {
    const recommendations: string[] = [];
    if (metrics.utilization > 90) {
      recommendations.push('HIGH PRIORITY: Memory utilization >90%. Increase cache size or reduce working set.');
    } else if (metrics.utilization > 80) {
      recommendations.push('Memory utilization >80%. Monitor closely; consider increasing size.');
    }
    if (metrics.fragmentation > 1.5) {
      recommendations.push(`High fragmentation ratio (${metrics.fragmentation}). Consider restarting Redis or using active-defrag.`);
    }
    if (metrics.evictions > 0) {
      recommendations.push(`${metrics.evictions} evictions detected. If hit rate is suffering, increase cache size.`);
    }
    if (metrics.utilization < 30) {
      recommendations.push('Low memory utilization (<30%). Cache may be over-provisioned; could reduce costs.');
    }
    return recommendations;
  }

  private runLRUSimulation(
    accesses: string[],
    sizes: Map<string, number>,
    maxSize: number
  ): {hits: number; misses: number; evictions: number; avgMemoryUsed: number} {
    const cache = new Map<string, number>(); // key -> size
    const order: string[] = [];               // LRU order
    let currentSize = 0;
    let hits = 0, misses = 0, evictions = 0;
    let totalMemory = 0, samples = 0;

    for (const key of accesses) {
      const size = sizes.get(key) || 1000; // default 1KB
      if (cache.has(key)) {
        // Hit: move to front of LRU
        hits++;
        const idx = order.indexOf(key);
        order.splice(idx, 1);
        order.unshift(key);
      } else {
        // Miss: add to cache
        misses++;
        // Evict if necessary
        while (currentSize + size > maxSize && order.length > 0) {
          const evicted = order.pop()!;
          currentSize -= cache.get(evicted)!;
          cache.delete(evicted);
          evictions++;
        }
        cache.set(key, size);
        order.unshift(key);
        currentSize += size;
      }
      totalMemory += currentSize;
      samples++;
    }

    return {
      hits,
      misses,
      evictions,
      avgMemoryUsed: totalMemory / (samples || 1),
    };
  }

  private parseInfoValue(info: string, key: string): number {
    // Match integer and float values (e.g., mem_fragmentation_ratio:1.23)
    const match = info.match(new RegExp(`${key}:([\\d.]+)`));
    return match ? parseFloat(match[1]) : 0;
  }

  private parseKeyspaceInfo(keyspace: string): {keys: number} {
    const match = keyspace.match(/keys=(\d+)/);
    return { keys: match ? parseInt(match[1], 10) : 0 };
  }
}
```

Even a properly configured cache can have performance bottlenecks that limit its effectiveness. Common bottlenecks include network latency, serialization overhead, hot key contention, and connection pool exhaustion.
| Bottleneck | Symptoms | Diagnosis | Solutions |
|---|---|---|---|
| Network Latency | High P50/P95 latency even for hits | Compare localhost Redis to remote Redis latency | Use local caching tier, connection pooling, pipelining |
| Serialization | CPU spikes on cache operations, large objects slow | Profile serialization time separately | Use faster serializer (Protocol Buffers, MessagePack), compress large objects |
| Hot Key Contention | Single keys have extremely high access rate, potential throttling | Track per-key access counts | Replicate hot keys, add local cache layer |
| Connection Pool Exhaustion | Timeouts, connection errors under load | Monitor pool utilization, connection wait times | Increase pool size, reduce connection hold time |
| Large Object Size | High latency for specific keys, memory pressure | Analyze object size distribution | Compress, chunk large objects, or skip caching them |
| Cache Stampede | Backend load spikes when popular keys expire | Correlate backend load with cache expiration | Implement locking or probabilistic early refresh (see the sketch after this table) |
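For the stampede row above, one common mitigation is probabilistic early refresh: each reader may voluntarily refresh an entry slightly before it expires, so regeneration is spread across requests instead of all landing at the moment of expiry. Below is a minimal sketch under assumed interfaces; the `store`, `fetchFn`, and `beta` names are illustrative:

```typescript
// Probabilistic early refresh (a simplified "XFetch"-style sketch).
// The refresh probability grows as expiry approaches, weighted by how expensive
// the value is to recompute, so a hot key is rebuilt by one request, not a herd.
interface EarlyRefreshEntry<T> {
  value: T;
  expiresAt: number;     // ms epoch when the entry expires
  computeTimeMs: number; // how long the backend took to produce the value
}

async function getWithEarlyRefresh<T>(
  store: Map<string, EarlyRefreshEntry<T>>, // stand-in for a real cache client
  key: string,
  ttlSeconds: number,
  fetchFn: () => Promise<T>,
  beta: number = 1.0 // >1 refreshes earlier, <1 later
): Promise<T> {
  const now = Date.now();
  const entry = store.get(key);

  const shouldRefresh =
    !entry ||
    now >= entry.expiresAt ||
    // Refresh early with a probability that rises near expiry
    now - entry.computeTimeMs * beta * Math.log(Math.random()) >= entry.expiresAt;

  if (!shouldRefresh) return entry!.value;

  const start = Date.now();
  const value = await fetchFn();
  store.set(key, {
    value,
    expiresAt: Date.now() + ttlSeconds * 1000,
    computeTimeMs: Date.now() - start,
  });
  return value;
}
```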
```typescript
// Comprehensive cache performance profiler
class CachePerformanceProfiler {
  private latencyHistogram: Map<string, number[]> = new Map();
  private keyAccessCounts: Map<string, number> = new Map();
  private connectionMetrics: ConnectionMetrics;

  constructor(private cache: Cache<any>) {
    this.connectionMetrics = {
      totalConnections: 0,
      activeConnections: 0,
      peakConnections: 0,
      waitTimes: [],
    };
  }

  // Profile a specific operation
  async profileOperation<T>(
    operation: string,
    key: string,
    fn: () => Promise<T>
  ): Promise<{result: T; profile: OperationProfile}> {
    const profile: OperationProfile = {
      operation,
      key,
      startTime: performance.now(),
      phases: {},
    };

    // Track connection acquisition
    profile.phases.connectionAcquire = { start: performance.now() };
    // ... connection pool timing
    profile.phases.connectionAcquire.end = performance.now();

    // Track actual operation
    profile.phases.execution = { start: performance.now() };
    const result = await fn();
    profile.phases.execution.end = performance.now();

    // Track deserialization if result exists
    if (result) {
      profile.phases.deserialization = { start: performance.now() };
      // Deserialization typically happens during execution, but we can estimate
      profile.phases.deserialization.end = performance.now();
    }

    profile.totalTimeMs = performance.now() - profile.startTime;

    // Record for analysis
    this.recordLatency(operation, profile.totalTimeMs);
    this.recordKeyAccess(key);

    return { result, profile };
  }

  // Generate comprehensive performance report
  generateReport(): PerformanceReport {
    const report: PerformanceReport = {
      latencyAnalysis: {},
      hotKeyAnalysis: this.analyzeHotKeys(),
      connectionHealth: this.analyzeConnections(),
      recommendations: [],
    };

    // Latency analysis per operation
    for (const [operation, latencies] of this.latencyHistogram) {
      const sorted = [...latencies].sort((a, b) => a - b);
      report.latencyAnalysis[operation] = {
        p50: this.percentile(sorted, 50),
        p95: this.percentile(sorted, 95),
        p99: this.percentile(sorted, 99),
        avg: latencies.reduce((a, b) => a + b, 0) / latencies.length,
        count: latencies.length,
      };
    }

    // Generate recommendations
    report.recommendations = this.generateRecommendations(report);
    return report;
  }

  // Analyze for hot keys
  private analyzeHotKeys(): HotKeyAnalysis {
    const sorted = [...this.keyAccessCounts.entries()]
      .sort((a, b) => b[1] - a[1]);
    const totalAccess = sorted.reduce((sum, [_, count]) => sum + count, 0);
    const top10 = sorted.slice(0, 10);
    const top10Access = top10.reduce((sum, [_, count]) => sum + count, 0);

    return {
      topKeys: top10.map(([key, count]) => ({
        key,
        accessCount: count,
        percentOfTotal: (count / totalAccess) * 100,
      })),
      concentration: top10Access / totalAccess,
      hotKeyWarning: top10Access / totalAccess > 0.5,
    };
  }

  private analyzeConnections(): ConnectionAnalysis {
    const waitTimes = this.connectionMetrics.waitTimes;
    const sorted = [...waitTimes].sort((a, b) => a - b);

    return {
      poolUtilization: this.connectionMetrics.activeConnections /
        (this.connectionMetrics.totalConnections || 1) * 100,
      peakUtilization: this.connectionMetrics.peakConnections /
        (this.connectionMetrics.totalConnections || 1) * 100,
      avgWaitTimeMs: waitTimes.length > 0
        ? waitTimes.reduce((a, b) => a + b, 0) / waitTimes.length
        : 0,
      p95WaitTimeMs: sorted.length > 0 ? this.percentile(sorted, 95) : 0,
    };
  }

  private generateRecommendations(report: PerformanceReport): string[] {
    const recommendations: string[] = [];

    // Latency recommendations
    for (const [op, stats] of Object.entries(report.latencyAnalysis)) {
      if (stats.p95 > 10) { // > 10ms is concerning for cache
        recommendations.push(
          `${op} P95 latency is ${stats.p95.toFixed(1)}ms - investigate network or serialization`
        );
      }
    }

    // Hot key recommendations
    if (report.hotKeyAnalysis.hotKeyWarning) {
      recommendations.push(
        `Hot key detected: top 10 keys account for ${(report.hotKeyAnalysis.concentration * 100).toFixed(1)}% of traffic. Consider local caching or replication.`
      );
    }

    // Connection recommendations
    if (report.connectionHealth.peakUtilization > 80) {
      recommendations.push(
        `Connection pool peak utilization at ${report.connectionHealth.peakUtilization.toFixed(1)}%. Increase pool size.`
      );
    }
    if (report.connectionHealth.p95WaitTimeMs > 5) {
      recommendations.push(
        `P95 connection wait time is ${report.connectionHealth.p95WaitTimeMs.toFixed(1)}ms. Pool may be undersized.`
      );
    }

    return recommendations;
  }

  private recordLatency(operation: string, ms: number): void {
    if (!this.latencyHistogram.has(operation)) {
      this.latencyHistogram.set(operation, []);
    }
    this.latencyHistogram.get(operation)!.push(ms);
  }

  private recordKeyAccess(key: string): void {
    const pattern = key.split(':')[0]; // Group by pattern
    this.keyAccessCounts.set(pattern, (this.keyAccessCounts.get(pattern) || 0) + 1);
  }

  private percentile(sorted: number[], p: number): number {
    if (sorted.length === 0) return 0;
    const idx = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.max(0, idx)];
  }
}
```

Beyond basic tuning, advanced optimization strategies can significantly improve cache performance for specific workloads.
```typescript
// Two-tier caching: Local (L1) + Distributed (L2)
class TieredCache<T> implements Cache<T> {
  constructor(
    private l1Cache: LocalCache<T>,       // In-process, microsecond access
    private l2Cache: DistributedCache<T>, // Distributed, millisecond access
    private options: TieredCacheOptions = {}
  ) {}

  async get(key: string): Promise<T | null> {
    // Try L1 first (ultra-fast, no network)
    const l1Result = this.l1Cache.get(key);
    if (l1Result !== null) {
      this.options.metrics?.recordL1Hit();
      return l1Result;
    }

    // Try L2 (network round-trip, still fast)
    const l2Result = await this.l2Cache.get(key);
    if (l2Result !== null) {
      this.options.metrics?.recordL2Hit();
      // Promote to L1 for future access
      this.l1Cache.set(key, l2Result, this.options.l1TTLSeconds || 60);
      return l2Result;
    }

    this.options.metrics?.recordMiss();
    return null;
  }

  async set(key: string, value: T, ttlSeconds?: number): Promise<void> {
    // Write to both tiers
    await Promise.all([
      // L1: shorter TTL, memory sensitive
      Promise.resolve(
        this.l1Cache.set(key, value, Math.min(ttlSeconds || 60, this.options.l1MaxTTL || 60))
      ),
      // L2: full TTL, persistent
      this.l2Cache.set(key, value, ttlSeconds),
    ]);
  }

  async delete(key: string): Promise<void> {
    // Invalidate both tiers
    await Promise.all([
      Promise.resolve(this.l1Cache.delete(key)),
      this.l2Cache.delete(key),
    ]);
  }

  async clear(): Promise<void> {
    await Promise.all([
      Promise.resolve(this.l1Cache.clear()),
      this.l2Cache.clear(),
    ]);
  }

  // Get stats for both tiers
  getStats(): TieredCacheStats {
    return {
      l1: {
        size: this.l1Cache.size(),
        hitRate: this.options.metrics?.l1HitRate() || 0,
      },
      l2: {
        hitRate: this.options.metrics?.l2HitRate() || 0,
      },
      overallHitRate: this.options.metrics?.overallHitRate() || 0,
    };
  }
}

// LRU-based local cache with size limit
class LocalLRUCache<T> implements LocalCache<T> {
  private cache: Map<string, {value: T; expiresAt: number}> = new Map();
  private accessOrder: string[] = [];

  constructor(
    private maxSize: number,
    private maxMemoryBytes?: number
  ) {}

  get(key: string): T | null {
    const entry = this.cache.get(key);
    if (!entry) return null;
    if (Date.now() > entry.expiresAt) {
      this.delete(key);
      return null;
    }
    // Move to front of LRU
    this.touchKey(key);
    return entry.value;
  }

  set(key: string, value: T, ttlSeconds: number): void {
    // Evict if at capacity
    while (this.cache.size >= this.maxSize) {
      this.evictOldest();
    }
    this.cache.set(key, {
      value,
      expiresAt: Date.now() + (ttlSeconds * 1000),
    });
    this.touchKey(key);
  }

  delete(key: string): void {
    this.cache.delete(key);
    const idx = this.accessOrder.indexOf(key);
    if (idx !== -1) this.accessOrder.splice(idx, 1);
  }

  clear(): void {
    this.cache.clear();
    this.accessOrder = [];
  }

  size(): number {
    return this.cache.size;
  }

  private touchKey(key: string): void {
    const idx = this.accessOrder.indexOf(key);
    if (idx !== -1) this.accessOrder.splice(idx, 1);
    this.accessOrder.unshift(key);
  }

  private evictOldest(): void {
    const oldest = this.accessOrder.pop();
    if (oldest) this.cache.delete(oldest);
  }
}
```

Don't apply advanced optimizations without evidence they're needed. Profile your cache performance first. A simple single-tier cache with proper TTL is often sufficient. Add complexity only when measurements show clear bottlenecks.
Cache tuning without validation is guesswork. Every configuration change should be measured against baseline metrics to confirm improvement.
```typescript
// Framework for running cache tuning experiments
class CacheTuningExperiment {
  constructor(
    private name: string,
    private controlCache: Cache<any>,
    private experimentCache: Cache<any>,
    private trafficSplitter: TrafficSplitter,
    private metrics: ExperimentMetrics
  ) {}

  async runRequest(key: string, fetchFn: () => Promise<any>): Promise<any> {
    const bucket = this.trafficSplitter.getBucket(key);
    const cache = bucket === 'experiment' ? this.experimentCache : this.controlCache;

    const startTime = performance.now();
    let result = await cache.get(key);
    const cacheHit = result !== null;

    if (!result) {
      result = await fetchFn();
      await cache.set(key, result);
    }

    const duration = performance.now() - startTime;

    // Record metrics by bucket
    this.metrics.record(bucket, {
      hit: cacheHit,
      latencyMs: duration,
      timestamp: Date.now(),
    });

    return result;
  }

  async getResults(): Promise<ExperimentResults> {
    const control = this.metrics.getStats('control');
    const experiment = this.metrics.getStats('experiment');

    // Calculate relative improvement
    const hitRateDiff = experiment.hitRate - control.hitRate;
    const latencyDiff = control.p50Latency - experiment.p50Latency; // Positive = improvement

    // Statistical significance (simplified z-test for proportions)
    const hitRateSignificant = this.isProportionDifferenceSignificant(
      experiment.hitRate, experiment.sampleSize,
      control.hitRate, control.sampleSize,
      0.05 // 95% confidence
    );

    return {
      experimentName: this.name,
      control,
      experiment,
      comparison: {
        hitRateDelta: hitRateDiff,
        hitRateImprovement: (hitRateDiff / control.hitRate) * 100,
        latencyImprovement: latencyDiff,
        isSignificant: hitRateSignificant,
      },
      recommendation: this.generateRecommendation(hitRateDiff, hitRateSignificant),
    };
  }

  private isProportionDifferenceSignificant(
    p1: number, n1: number,
    p2: number, n2: number,
    alpha: number
  ): boolean {
    // Pooled proportion
    const p = (p1 * n1 + p2 * n2) / (n1 + n2);
    // Standard error
    const se = Math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2));
    // Z-score
    const z = Math.abs(p1 - p2) / se;
    // Two-tailed critical value for alpha=0.05 is ~1.96
    return z > 1.96;
  }

  private generateRecommendation(delta: number, significant: boolean): string {
    if (!significant) {
      return 'No statistically significant difference. Need more samples or the change has no effect.';
    }
    if (delta > 0.05) {
      return `RECOMMENDED: Roll out experiment. Hit rate improved by ${(delta * 100).toFixed(1)}%.`;
    }
    if (delta > 0) {
      return `MARGINAL: Small improvement (${(delta * 100).toFixed(1)}%). Consider if complexity is worth it.`;
    }
    return `NOT RECOMMENDED: Experiment performed worse by ${(Math.abs(delta) * 100).toFixed(1)}%.`;
  }
}

// Usage
const experiment = new CacheTuningExperiment(
  'TTL Increase Test',
  new RedisCache({ ttlSeconds: 300 }), // Control: 5 min TTL
  new RedisCache({ ttlSeconds: 900 }), // Experiment: 15 min TTL
  new PercentageSplitter(10),          // 10% to experiment
  new PrometheusExperimentMetrics()
);

// After sufficient traffic...
const results = await experiment.getResults();
console.log(results.recommendation);
```

Cache performance tuning transforms a basic cache implementation into an optimized system component. The key is systematic analysis, targeted optimization, and rigorous validation.
Module Complete:
You've now completed the Cache Testing and Monitoring module. You can test cache implementations, collect and interpret cache metrics, debug cache issues systematically, and tune cache performance for optimal effectiveness. Together, these skills form the operational foundation for building, operating, and optimizing production caching systems that are reliable, observable, and performant.