A cache that 'works' isn't necessarily a cache that works well. The difference between a properly tuned cache and a default configuration can mean dramatic improvements in hit rate, latency, and resource efficiency—often without any code changes.
Cache tuning is the practice of optimizing cache configuration to match your specific workload characteristics. A cache optimized for an e-commerce product catalog behaves very differently from one optimized for user sessions or real-time analytics. One size does not fit all.
This page provides a systematic approach to cache performance tuning, from understanding your workload to selecting optimal configurations and validating the results.
By the end of this page, you will understand how to:
- Analyze workload characteristics to inform tuning decisions
- Configure optimal TTL values, memory limits, and eviction policies
- Identify and eliminate cache performance bottlenecks
- Validate that tuning changes achieve the desired improvements
Effective cache tuning starts with understanding your workload. You cannot optimize what you don't understand. Key workload characteristics that influence cache configuration:
| Characteristic | Description | Cache Implication |
|---|---|---|
| Access Distribution | How evenly are keys accessed? (uniform vs. skewed) | Skewed access (80/20) benefits more from caching; uniform access needs larger cache |
| Read/Write Ratio | Percentage of reads vs. writes | High read ratio = ideal for caching; high writes = more invalidation overhead |
| Working Set Size | Number of unique keys accessed in a time window | Must fit in cache for good hit rate; if larger, expect evictions |
| Access Recency | How quickly do previously accessed items get re-accessed? | Frequent re-access = longer TTL effective; rare re-access = shorter TTL or no cache |
| Object Size | Average size of cached values | Large objects consume memory quickly; may need compression or size limits |
| Access Patterns | Temporal patterns (peaks, periodic, steady) | Pre-warming before peaks; adjust TTL for pattern |
| Data Volatility | How frequently does underlying data change? | High volatility = shorter TTL or event-based invalidation |
Measuring Workload Characteristics:
```typescript
// Comprehensive workload analyzer
interface AccessLogEntry {
  key: string;
  timestamp: number;
  operation: 'get' | 'set' | 'delete';
  size?: number;
}

class CacheWorkloadAnalyzer {
  private accessLog: AccessLogEntry[] = [];

  constructor(private analysisWindow: number = 3600000) {} // Default 1 hour

  // Record all cache operations
  recordAccess(key: string, operation: 'get' | 'set' | 'delete', size?: number): void {
    this.accessLog.push({ key, timestamp: Date.now(), operation, size });

    // Prune old entries
    const cutoff = Date.now() - this.analysisWindow;
    this.accessLog = this.accessLog.filter(e => e.timestamp > cutoff);
  }

  // Generate comprehensive workload report
  analyze(): WorkloadReport {
    const now = Date.now();
    const cutoff = now - this.analysisWindow;
    const recentLogs = this.accessLog.filter(e => e.timestamp > cutoff);

    // 1. Read/Write Ratio
    const gets = recentLogs.filter(e => e.operation === 'get').length;
    const sets = recentLogs.filter(e => e.operation === 'set').length;
    const deletes = recentLogs.filter(e => e.operation === 'delete').length;
    const readWriteRatio = gets / (sets + deletes || 1);

    // 2. Access Distribution (calculate Pareto ratio)
    const keyAccess = new Map<string, number>();
    recentLogs.filter(e => e.operation === 'get').forEach(e => {
      keyAccess.set(e.key, (keyAccess.get(e.key) || 0) + 1);
    });
    const sortedAccess = [...keyAccess.values()].sort((a, b) => b - a);
    const totalAccess = sortedAccess.reduce((a, b) => a + b, 0);
    const top20Percent = Math.ceil(sortedAccess.length * 0.2);
    const top20Access = sortedAccess.slice(0, top20Percent).reduce((a, b) => a + b, 0);
    const paretoRatio = totalAccess > 0 ? top20Access / totalAccess : 0;

    // 3. Working Set Size
    const uniqueKeys = keyAccess.size;

    // 4. Access Frequency Distribution
    const accessCounts = [...keyAccess.values()];
    const avgAccessPerKey = accessCounts.reduce((a, b) => a + b, 0) / (accessCounts.length || 1);
    const maxAccessPerKey = Math.max(...accessCounts, 0);

    // 5. Object Size Distribution
    const sizes = recentLogs.filter(e => e.size).map(e => e.size!);
    const avgSize = sizes.reduce((a, b) => a + b, 0) / (sizes.length || 1);
    const maxSize = Math.max(...sizes, 0);
    const p95Size = sizes.length > 0 ? this.percentile(sizes, 95) : 0;

    // 6. Re-access Time Distribution
    const reAccessTimes = this.calculateReAccessTimes(recentLogs);
    const avgReAccessTime = reAccessTimes.length > 0
      ? reAccessTimes.reduce((a, b) => a + b, 0) / reAccessTimes.length
      : null;

    // 7. Temporal Pattern Analysis
    const hourlyDistribution = this.calculateHourlyDistribution(recentLogs);

    return {
      timeWindow: this.analysisWindow,
      totalOperations: recentLogs.length,
      readWriteRatio: {
        gets,
        sets,
        deletes,
        ratio: readWriteRatio,
        recommendation:
          readWriteRatio > 10 ? 'Excellent caching candidate' :
          readWriteRatio > 3 ? 'Good caching candidate' :
          'Consider if caching is beneficial',
      },
      accessDistribution: {
        paretoRatio,
        interpretation:
          paretoRatio > 0.8 ? 'Highly skewed (classic 80/20)' :
          paretoRatio > 0.6 ? 'Moderately skewed' :
          'Relatively uniform',
        recommendation: paretoRatio > 0.6
          ? 'Small cache can be very effective'
          : 'Need larger cache for good coverage',
      },
      workingSet: {
        uniqueKeys,
        recommendation: `Cache should hold at least ${Math.ceil(uniqueKeys * 0.3)} keys for 80% coverage`,
      },
      accessFrequency: {
        avgAccessPerKey,
        maxAccessPerKey,
        hotKeys: [...keyAccess.entries()]
          .sort((a, b) => b[1] - a[1])
          .slice(0, 10)
          .map(([key, count]) => ({ key, count })),
      },
      objectSize: {
        avgBytes: avgSize,
        maxBytes: maxSize,
        p95Bytes: p95Size,
        estimatedMemoryForWorkingSet: uniqueKeys * avgSize,
      },
      reAccessPattern: {
        avgReAccessTimeMs: avgReAccessTime,
        recommendation: avgReAccessTime
          ? `TTL should be at least ${Math.ceil(avgReAccessTime / 1000 * 1.5)}s`
          : 'Insufficient data for TTL recommendation',
      },
      temporalPattern: {
        hourlyDistribution,
        peakHour: hourlyDistribution.indexOf(Math.max(...hourlyDistribution)),
        recommendation: 'Consider pre-warming cache before peak hours',
      },
    };
  }

  private calculateReAccessTimes(logs: AccessLogEntry[]): number[] {
    const lastAccess = new Map<string, number>();
    const reAccessTimes: number[] = [];
    for (const log of logs) {
      if (log.operation === 'get') {
        const last = lastAccess.get(log.key);
        if (last) {
          reAccessTimes.push(log.timestamp - last);
        }
        lastAccess.set(log.key, log.timestamp);
      }
    }
    return reAccessTimes;
  }

  private calculateHourlyDistribution(logs: AccessLogEntry[]): number[] {
    const hourly = new Array(24).fill(0);
    for (const log of logs) {
      const hour = new Date(log.timestamp).getHours();
      hourly[hour]++;
    }
    return hourly;
  }

  private percentile(arr: number[], p: number): number {
    const sorted = [...arr].sort((a, b) => a - b);
    const index = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[index];
  }
}
```

Run workload analysis for at least 24-48 hours to capture daily patterns. A single hour's snapshot might miss important variations. Ideally, analyze a full week to capture weekend vs. weekday differences.
TTL (Time-To-Live) is one of the most impactful cache configuration parameters. Too short, and you're constantly refetching data unnecessarily. Too long, and you're serving stale data. The optimal TTL balances freshness requirements against cache effectiveness.
The TTL Optimization Process:
```typescript
// TTL optimization based on workload analysis
class TTLOptimizer {
  /**
   * Calculate optimal TTL based on data characteristics
   */
  calculateOptimalTTL(params: TTLParams): TTLRecommendation {
    const {
      avgChangeIntervalSeconds,   // How often data changes
      maxAcceptableStaleSeconds,  // Business tolerance for staleness
      avgReAccessTimeSeconds,     // How often items are re-accessed
      backendQueryCostMs,         // Cost of regenerating data
      invalidationReliability,    // 0-1: How reliable is event-based invalidation?
    } = params;

    // Rule 1: TTL should be less than acceptable staleness
    const stalenessBound = maxAcceptableStaleSeconds;

    // Rule 2: TTL should be long enough for re-access
    // Aim for at least 2x average re-access time to ensure hits on return visits
    const reAccessBound = avgReAccessTimeSeconds * 2;

    // Rule 3: With reliable invalidation, TTL can be longer (serves as safety net)
    // If invalidation is 100% reliable, allow 5x longer TTL
    const reliabilityMultiplier = 1 + (invalidationReliability * 4);
    const adjustedStalenessBound = stalenessBound * reliabilityMultiplier;

    // Rule 4: High-cost queries warrant longer TTL
    // 10ms query = 0.5x, 100ms = 1x, 1000ms = 1.5x
    const costMultiplier = Math.log10(backendQueryCostMs + 1) / 2;

    // Calculate recommended TTL
    const baseTTL = Math.min(adjustedStalenessBound, avgChangeIntervalSeconds * 0.8);
    const recommendedTTL = Math.max(
      baseTTL * (1 + costMultiplier),
      reAccessBound / 2 // At minimum, half of re-access time
    );

    // Cap at 24 hours unless explicitly allowed
    const cappedTTL = Math.min(recommendedTTL, 86400);

    return {
      recommendedTTLSeconds: Math.round(cappedTTL),
      reasoning: { stalenessBound, reAccessBound, reliabilityMultiplier, costMultiplier },
      confidence: this.calculateConfidence(params),
      warnings: this.generateWarnings(params, cappedTTL),
    };
  }

  /**
   * Per-key-pattern TTL configuration
   */
  generateTTLConfig(patterns: KeyPatternAnalysis[]): Map<string, number> {
    const config = new Map<string, number>();
    for (const pattern of patterns) {
      const recommendation = this.calculateOptimalTTL({
        avgChangeIntervalSeconds: pattern.avgChangeInterval,
        maxAcceptableStaleSeconds: pattern.stalenessTolerance,
        avgReAccessTimeSeconds: pattern.avgReAccessTime,
        backendQueryCostMs: pattern.queryLatencyMs,
        invalidationReliability: pattern.hasEventInvalidation ? 0.95 : 0,
      });
      config.set(pattern.pattern, recommendation.recommendedTTLSeconds);
    }
    return config;
  }

  private calculateConfidence(params: TTLParams): 'high' | 'medium' | 'low' {
    // Confidence based on data completeness
    if (params.avgChangeIntervalSeconds > 0 && params.avgReAccessTimeSeconds > 0) {
      return 'high';
    } else if (params.maxAcceptableStaleSeconds > 0) {
      return 'medium';
    }
    return 'low';
  }

  private generateWarnings(params: TTLParams, ttl: number): string[] {
    const warnings: string[] = [];
    if (ttl > params.avgChangeIntervalSeconds) {
      warnings.push('TTL exceeds average change interval - expect some stale data');
    }
    if (ttl < params.avgReAccessTimeSeconds) {
      warnings.push('TTL shorter than re-access time - hit rate may suffer');
    }
    if (params.invalidationReliability < 0.5 && ttl > 3600) {
      warnings.push('Long TTL with unreliable invalidation - staleness risk');
    }
    return warnings;
  }
}

// Example usage
const optimizer = new TTLOptimizer();

// Product catalog: changes rarely, needs freshness
const productTTL = optimizer.calculateOptimalTTL({
  avgChangeIntervalSeconds: 86400, // Products change ~daily
  maxAcceptableStaleSeconds: 300,  // 5 min staleness OK
  avgReAccessTimeSeconds: 1800,    // Re-accessed every 30 min
  backendQueryCostMs: 50,          // Moderate query cost
  invalidationReliability: 0.95,   // Good event-based invalidation
});
// Recommendation: ~1440s (24 min) with high confidence

// User session: changes frequently, needs immediate freshness
const sessionTTL = optimizer.calculateOptimalTTL({
  avgChangeIntervalSeconds: 60,  // Session changes every minute
  maxAcceptableStaleSeconds: 5,  // Must be <5s stale
  avgReAccessTimeSeconds: 10,    // Re-accessed every 10s
  backendQueryCostMs: 5,         // Fast query
  invalidationReliability: 0.99, // Very reliable invalidation
});
// Recommendation: ~25s with high confidence
```

When many cache entries have the same TTL and were populated at similar times, they'll all expire together, causing a cache stampede. Add random jitter (±10-20%) to TTL values to spread expirations over time.
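One way to apply that jitter is to randomize each entry's TTL at write time. A minimal sketch is below; the ±15% spread and the helper name are illustrative, not part of the optimizer above:

```typescript
// Randomize a base TTL so entries written at the same time don't all expire together.
// spread = 0.15 means the final TTL falls anywhere within ±15% of the base value.
function jitteredTTL(baseTTLSeconds: number, spread: number = 0.15): number {
  const factor = 1 + (Math.random() * 2 - 1) * spread; // uniform in [1 - spread, 1 + spread]
  return Math.max(1, Math.round(baseTTLSeconds * factor));
}

// Example: a 900s base TTL becomes roughly 765-1035s per entry
// await cache.set(key, value, jitteredTTL(900));
```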
Cache memory sizing requires balancing cost (memory is expensive) against effectiveness (larger cache = higher hit rate). The optimal size depends on your working set, access patterns, and budget constraints.
Memory Sizing Formula:
Minimum Cache Size = Working Set Size × Average Object Size × Safety Factor
Where:
- Working Set Size = Unique keys accessed in typical period
- Average Object Size = Mean bytes per cached value
- Safety Factor = 1.2 to 1.5 (account for overhead and headroom)
Example Calculation:
Cache Size = 100,000 × 2 KB × 1.3 = 260 MB
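As a quick sanity check, the same formula can be expressed in a few lines of code. The function name and defaults below are illustrative (the example uses decimal units, matching the 260 MB figure above):

```typescript
// Minimum cache size = working set size x average object size x safety factor
function minimumCacheSizeBytes(
  workingSetKeys: number,
  avgObjectBytes: number,
  safetyFactor: number = 1.3 // 1.2-1.5 covers metadata overhead and headroom
): number {
  return Math.ceil(workingSetKeys * avgObjectBytes * safetyFactor);
}

// 100,000 keys x 2 KB x 1.3 = 260,000,000 bytes (~260 MB)
const sizeBytes = minimumCacheSizeBytes(100_000, 2_000);
console.log(`${(sizeBytes / 1e6).toFixed(0)} MB`); // "260 MB"
```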
Eviction Policy Comparison:
| Policy | Description | Best For | Overhead |
|---|---|---|---|
| LRU (Least Recently Used) | Evicts least recently accessed item | General purpose, temporal locality | Low to Medium |
| LFU (Least Frequently Used) | Evicts least accessed item overall | Stable hot data sets | Medium (requires counters) |
| FIFO (First In First Out) | Evicts oldest item | Simple, predictable, streaming data | Very Low |
| Random | Evicts random item | Uniform access patterns | Very Low |
| TTL-Only | Items only expire by TTL, no eviction | When all data fits in memory | None |
| LRU + TTL | LRU with TTL as maximum age | Most production workloads | Low to Medium |
| Segmented LRU | Hot/Cold segments with LRU each | Mixed access patterns | Medium |
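If you run Redis, several of the policies in the table above map directly onto its built-in `maxmemory-policy` setting. The sketch below assumes an ioredis-style client and illustrative values; in production these settings usually live in `redis.conf` rather than being changed at runtime:

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Illustrative: cap cache memory and pick an eviction policy at runtime.
async function applyEvictionPolicy(): Promise<void> {
  await redis.call('CONFIG', 'SET', 'maxmemory', '260mb');              // memory limit
  await redis.call('CONFIG', 'SET', 'maxmemory-policy', 'allkeys-lru'); // LRU across all keys
  // Other built-in Redis policies include volatile-lru, allkeys-lfu,
  // volatile-ttl, allkeys-random, and noeviction.
}
```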
```typescript
// Cache memory sizing analysis and recommendations
class CacheSizingAnalyzer {
  async analyzeCurrentUsage(redis: Redis): Promise<MemoryAnalysis> {
    const info = await redis.info('memory');
    const keyspace = await redis.info('keyspace');

    // Parse Redis INFO output
    const usedMemory = this.parseInfoValue(info, 'used_memory');
    const maxMemory = this.parseInfoValue(info, 'maxmemory');
    const fragRatio = this.parseInfoValue(info, 'mem_fragmentation_ratio');
    const evictedKeys = this.parseInfoValue(info, 'evicted_keys');

    // Calculate key count and average size
    const dbInfo = this.parseKeyspaceInfo(keyspace);
    const totalKeys = dbInfo.keys;
    const avgKeySize = totalKeys > 0 ? usedMemory / totalKeys : 0;

    // Memory utilization
    const utilizationPercent = maxMemory > 0 ? (usedMemory / maxMemory) * 100 : 0;

    return {
      current: {
        usedMemoryBytes: usedMemory,
        maxMemoryBytes: maxMemory,
        utilizationPercent,
        fragmentationRatio: fragRatio,
        totalKeys,
        avgKeyBytes: avgKeySize,
        evictedKeys,
      },
      recommendations: this.generateRecommendations({
        utilization: utilizationPercent,
        fragmentation: fragRatio,
        evictions: evictedKeys,
      }),
    };
  }

  calculateOptimalSize(params: SizingParams): SizeRecommendation {
    const {
      workingSetSize,         // Number of unique keys
      avgObjectBytes,         // Average bytes per value
      targetHitRatePercent,   // Desired hit rate (e.g., 90)
      accessDistributionSkew, // 0-1, higher = more skewed
    } = params;

    // Base size to hold entire working set
    const fullWorkingSetBytes = workingSetSize * avgObjectBytes;

    // Adjust based on target hit rate and access distribution
    // With skewed access (hot keys), smaller cache achieves same hit rate
    const coverageNeeded = this.calculateCoverageForHitRate(
      targetHitRatePercent,
      accessDistributionSkew
    );
    const minSizeBytes = fullWorkingSetBytes * coverageNeeded;

    // Add overhead for metadata, fragmentation, etc.
    const overhead = 1.3;
    const recommendedBytes = minSizeBytes * overhead;

    return {
      minimumBytes: Math.ceil(minSizeBytes),
      recommendedBytes: Math.ceil(recommendedBytes),
      fullWorkingSetBytes,
      coveragePercent: coverageNeeded * 100,
      breakdown: {
        workingSetSize,
        avgObjectBytes,
        coveredKeys: Math.ceil(workingSetSize * coverageNeeded),
        overheadPercent: (overhead - 1) * 100,
      },
    };
  }

  // Simulate hit rate at different cache sizes
  async simulateSizingScenarios(
    accessHistory: string[],
    objectSizes: Map<string, number>,
    scenarios: number[]
  ): Promise<SizingScenario[]> {
    const results: SizingScenario[] = [];
    for (const sizeBytes of scenarios) {
      const simulation = this.runLRUSimulation(accessHistory, objectSizes, sizeBytes);
      results.push({
        cacheSizeBytes: sizeBytes,
        hitRate: simulation.hits / (simulation.hits + simulation.misses),
        evictionCount: simulation.evictions,
        avgMemoryUtilization: simulation.avgMemoryUsed / sizeBytes,
      });
    }
    return results;
  }

  private calculateCoverageForHitRate(targetHitRate: number, skew: number): number {
    // With perfect 80/20 distribution (skew=1), 20% of keys = 80% hit rate
    // With uniform distribution (skew=0), need 95% of keys for 95% hit rate
    const targetRate = targetHitRate / 100;
    if (skew > 0.8) {
      // Highly skewed: small cache is effective
      if (targetRate <= 0.8) return 0.2;
      if (targetRate <= 0.9) return 0.35;
      if (targetRate <= 0.95) return 0.5;
      return 0.7;
    } else if (skew > 0.5) {
      // Moderately skewed
      if (targetRate <= 0.8) return 0.4;
      if (targetRate <= 0.9) return 0.6;
      if (targetRate <= 0.95) return 0.75;
      return 0.9;
    } else {
      // Uniform access: need most of working set
      return Math.min(targetRate + 0.05, 1.0);
    }
  }

  private generateRecommendations(metrics: {
    utilization: number;
    fragmentation: number;
    evictions: number;
  }): string[] {
    const recommendations: string[] = [];
    if (metrics.utilization > 90) {
      recommendations.push('HIGH PRIORITY: Memory utilization >90%. Increase cache size or reduce working set.');
    } else if (metrics.utilization > 80) {
      recommendations.push('Memory utilization >80%. Monitor closely; consider increasing size.');
    }
    if (metrics.fragmentation > 1.5) {
      recommendations.push(`High fragmentation ratio (${metrics.fragmentation}). Consider restarting Redis or using active-defrag.`);
    }
    if (metrics.evictions > 0) {
      recommendations.push(`${metrics.evictions} evictions detected. If hit rate is suffering, increase cache size.`);
    }
    if (metrics.utilization < 30) {
      recommendations.push('Low memory utilization (<30%). Cache may be over-provisioned; could reduce costs.');
    }
    return recommendations;
  }

  private runLRUSimulation(
    accesses: string[],
    sizes: Map<string, number>,
    maxSize: number
  ): {hits: number; misses: number; evictions: number; avgMemoryUsed: number} {
    const cache = new Map<string, number>(); // key -> size
    const order: string[] = [];               // LRU order
    let currentSize = 0;
    let hits = 0, misses = 0, evictions = 0;
    let totalMemory = 0, samples = 0;

    for (const key of accesses) {
      const size = sizes.get(key) || 1000; // default 1KB
      if (cache.has(key)) {
        // Hit: move to front of LRU
        hits++;
        const idx = order.indexOf(key);
        order.splice(idx, 1);
        order.unshift(key);
      } else {
        // Miss: add to cache
        misses++;
        // Evict if necessary
        while (currentSize + size > maxSize && order.length > 0) {
          const evicted = order.pop()!;
          currentSize -= cache.get(evicted)!;
          cache.delete(evicted);
          evictions++;
        }
        cache.set(key, size);
        order.unshift(key);
        currentSize += size;
      }
      totalMemory += currentSize;
      samples++;
    }

    return {
      hits,
      misses,
      evictions,
      avgMemoryUsed: totalMemory / (samples || 1),
    };
  }

  private parseInfoValue(info: string, key: string): number {
    // Match integer and float values (e.g., mem_fragmentation_ratio:1.23)
    const match = info.match(new RegExp(`${key}:([\\d.]+)`));
    return match ? parseFloat(match[1]) : 0;
  }

  private parseKeyspaceInfo(keyspace: string): {keys: number} {
    const match = keyspace.match(/keys=(\d+)/);
    return { keys: match ? parseInt(match[1], 10) : 0 };
  }
}
```

Even a properly configured cache can have performance bottlenecks that limit its effectiveness. Common bottlenecks include network latency, serialization overhead, hot key contention, and connection pool exhaustion.
| Bottleneck | Symptoms | Diagnosis | Solutions |
|---|---|---|---|
| Network Latency | High P50/P95 latency even for hits | Compare localhost Redis to remote Redis latency | Use local caching tier, connection pooling, pipelining |
| Serialization | CPU spikes on cache operations, large objects slow | Profile serialization time separately | Use faster serializer (Protocol Buffers, MessagePack), compress large objects |
| Hot Key Contention | Single keys have extremely high access rate, potential throttling | Track per-key access counts | Replicate hot keys, add local cache layer |
| Connection Pool Exhaustion | Timeouts, connection errors under load | Monitor pool utilization, connection wait times | Increase pool size, reduce connection hold time |
| Large Object Size | High latency for specific keys, memory pressure | Analyze object size distribution | Compress, chunk large objects, or skip caching them |
| Cache Stampede | Backend load spikes when popular keys expire | Correlate backend load with cache expiration | Implement locking or probabilistic early refresh (see the sketch after this table) |
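For the stampede row above, one common mitigation is probabilistic early refresh: each reader may voluntarily refresh an entry slightly before it expires, so regeneration is spread across requests instead of all landing at the moment of expiry. Below is a minimal sketch under assumed interfaces; the `store`, `fetchFn`, and `beta` names are illustrative:

```typescript
// Probabilistic early refresh (a simplified "XFetch"-style sketch).
// The refresh probability grows as expiry approaches, weighted by how expensive
// the value is to recompute, so a hot key is rebuilt by one request, not a herd.
interface EarlyRefreshEntry<T> {
  value: T;
  expiresAt: number;     // ms epoch when the entry expires
  computeTimeMs: number; // how long the backend took to produce the value
}

async function getWithEarlyRefresh<T>(
  store: Map<string, EarlyRefreshEntry<T>>, // stand-in for a real cache client
  key: string,
  ttlSeconds: number,
  fetchFn: () => Promise<T>,
  beta: number = 1.0 // >1 refreshes earlier, <1 later
): Promise<T> {
  const now = Date.now();
  const entry = store.get(key);

  const shouldRefresh =
    !entry ||
    now >= entry.expiresAt ||
    // Refresh early with a probability that rises near expiry
    now - entry.computeTimeMs * beta * Math.log(Math.random()) >= entry.expiresAt;

  if (!shouldRefresh) return entry!.value;

  const start = Date.now();
  const value = await fetchFn();
  store.set(key, {
    value,
    expiresAt: Date.now() + ttlSeconds * 1000,
    computeTimeMs: Date.now() - start,
  });
  return value;
}
```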
```typescript
// Comprehensive cache performance profiler
class CachePerformanceProfiler {
  private latencyHistogram: Map<string, number[]> = new Map();
  private keyAccessCounts: Map<string, number> = new Map();
  private connectionMetrics: ConnectionMetrics;

  constructor(private cache: Cache<any>) {
    this.connectionMetrics = {
      totalConnections: 0,
      activeConnections: 0,
      peakConnections: 0,
      waitTimes: [],
    };
  }

  // Profile a specific operation
  async profileOperation<T>(
    operation: string,
    key: string,
    fn: () => Promise<T>
  ): Promise<{result: T; profile: OperationProfile}> {
    const profile: OperationProfile = {
      operation,
      key,
      startTime: performance.now(),
      phases: {},
    };

    // Track connection acquisition
    profile.phases.connectionAcquire = { start: performance.now() };
    // ... connection pool timing
    profile.phases.connectionAcquire.end = performance.now();

    // Track actual operation
    profile.phases.execution = { start: performance.now() };
    const result = await fn();
    profile.phases.execution.end = performance.now();

    // Track deserialization if result exists
    if (result) {
      profile.phases.deserialization = { start: performance.now() };
      // Deserialization typically happens during execution, but we can estimate
      profile.phases.deserialization.end = performance.now();
    }

    profile.totalTimeMs = performance.now() - profile.startTime;

    // Record for analysis
    this.recordLatency(operation, profile.totalTimeMs);
    this.recordKeyAccess(key);

    return { result, profile };
  }

  // Generate comprehensive performance report
  generateReport(): PerformanceReport {
    const report: PerformanceReport = {
      latencyAnalysis: {},
      hotKeyAnalysis: this.analyzeHotKeys(),
      connectionHealth: this.analyzeConnections(),
      recommendations: [],
    };

    // Latency analysis per operation
    for (const [operation, latencies] of this.latencyHistogram) {
      const sorted = [...latencies].sort((a, b) => a - b);
      report.latencyAnalysis[operation] = {
        p50: this.percentile(sorted, 50),
        p95: this.percentile(sorted, 95),
        p99: this.percentile(sorted, 99),
        avg: latencies.reduce((a, b) => a + b, 0) / latencies.length,
        count: latencies.length,
      };
    }

    // Generate recommendations
    report.recommendations = this.generateRecommendations(report);
    return report;
  }

  // Analyze for hot keys
  private analyzeHotKeys(): HotKeyAnalysis {
    const sorted = [...this.keyAccessCounts.entries()]
      .sort((a, b) => b[1] - a[1]);
    const totalAccess = sorted.reduce((sum, [_, count]) => sum + count, 0);
    const top10 = sorted.slice(0, 10);
    const top10Access = top10.reduce((sum, [_, count]) => sum + count, 0);

    return {
      topKeys: top10.map(([key, count]) => ({
        key,
        accessCount: count,
        percentOfTotal: (count / totalAccess) * 100,
      })),
      concentration: top10Access / totalAccess,
      hotKeyWarning: top10Access / totalAccess > 0.5,
    };
  }

  private analyzeConnections(): ConnectionAnalysis {
    const waitTimes = this.connectionMetrics.waitTimes;
    const sorted = [...waitTimes].sort((a, b) => a - b);

    return {
      poolUtilization: this.connectionMetrics.activeConnections /
        (this.connectionMetrics.totalConnections || 1) * 100,
      peakUtilization: this.connectionMetrics.peakConnections /
        (this.connectionMetrics.totalConnections || 1) * 100,
      avgWaitTimeMs: waitTimes.length > 0
        ? waitTimes.reduce((a, b) => a + b, 0) / waitTimes.length
        : 0,
      p95WaitTimeMs: sorted.length > 0 ? this.percentile(sorted, 95) : 0,
    };
  }

  private generateRecommendations(report: PerformanceReport): string[] {
    const recommendations: string[] = [];

    // Latency recommendations
    for (const [op, stats] of Object.entries(report.latencyAnalysis)) {
      if (stats.p95 > 10) { // > 10ms is concerning for cache
        recommendations.push(
          `${op} P95 latency is ${stats.p95.toFixed(1)}ms - investigate network or serialization`
        );
      }
    }

    // Hot key recommendations
    if (report.hotKeyAnalysis.hotKeyWarning) {
      recommendations.push(
        `Hot key detected: top 10 keys account for ${(report.hotKeyAnalysis.concentration * 100).toFixed(1)}% of traffic. Consider local caching or replication.`
      );
    }

    // Connection recommendations
    if (report.connectionHealth.peakUtilization > 80) {
      recommendations.push(
        `Connection pool peak utilization at ${report.connectionHealth.peakUtilization.toFixed(1)}%. Increase pool size.`
      );
    }
    if (report.connectionHealth.p95WaitTimeMs > 5) {
      recommendations.push(
        `P95 connection wait time is ${report.connectionHealth.p95WaitTimeMs.toFixed(1)}ms. Pool may be undersized.`
      );
    }

    return recommendations;
  }

  private recordLatency(operation: string, ms: number): void {
    if (!this.latencyHistogram.has(operation)) {
      this.latencyHistogram.set(operation, []);
    }
    this.latencyHistogram.get(operation)!.push(ms);
  }

  private recordKeyAccess(key: string): void {
    const pattern = key.split(':')[0]; // Group by pattern
    this.keyAccessCounts.set(pattern, (this.keyAccessCounts.get(pattern) || 0) + 1);
  }

  private percentile(sorted: number[], p: number): number {
    if (sorted.length === 0) return 0;
    const idx = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.max(0, idx)];
  }
}
```

Beyond basic tuning, advanced optimization strategies can significantly improve cache performance for specific workloads.
```typescript
// Two-tier caching: Local (L1) + Distributed (L2)
class TieredCache<T> implements Cache<T> {
  constructor(
    private l1Cache: LocalCache<T>,       // In-process, microsecond access
    private l2Cache: DistributedCache<T>, // Distributed, millisecond access
    private options: TieredCacheOptions = {}
  ) {}

  async get(key: string): Promise<T | null> {
    // Try L1 first (ultra-fast, no network)
    const l1Result = this.l1Cache.get(key);
    if (l1Result !== null) {
      this.options.metrics?.recordL1Hit();
      return l1Result;
    }

    // Try L2 (network round-trip, still fast)
    const l2Result = await this.l2Cache.get(key);
    if (l2Result !== null) {
      this.options.metrics?.recordL2Hit();
      // Promote to L1 for future access
      this.l1Cache.set(key, l2Result, this.options.l1TTLSeconds || 60);
      return l2Result;
    }

    this.options.metrics?.recordMiss();
    return null;
  }

  async set(key: string, value: T, ttlSeconds?: number): Promise<void> {
    // Write to both tiers
    await Promise.all([
      // L1: shorter TTL, memory sensitive
      Promise.resolve(
        this.l1Cache.set(key, value, Math.min(ttlSeconds || 60, this.options.l1MaxTTL || 60))
      ),
      // L2: full TTL, persistent
      this.l2Cache.set(key, value, ttlSeconds),
    ]);
  }

  async delete(key: string): Promise<void> {
    // Invalidate both tiers
    await Promise.all([
      Promise.resolve(this.l1Cache.delete(key)),
      this.l2Cache.delete(key),
    ]);
  }

  async clear(): Promise<void> {
    await Promise.all([
      Promise.resolve(this.l1Cache.clear()),
      this.l2Cache.clear(),
    ]);
  }

  // Get stats for both tiers
  getStats(): TieredCacheStats {
    return {
      l1: {
        size: this.l1Cache.size(),
        hitRate: this.options.metrics?.l1HitRate() || 0,
      },
      l2: {
        hitRate: this.options.metrics?.l2HitRate() || 0,
      },
      overallHitRate: this.options.metrics?.overallHitRate() || 0,
    };
  }
}

// LRU-based local cache with size limit
class LocalLRUCache<T> implements LocalCache<T> {
  private cache: Map<string, {value: T; expiresAt: number}> = new Map();
  private accessOrder: string[] = [];

  constructor(
    private maxSize: number,
    private maxMemoryBytes?: number
  ) {}

  get(key: string): T | null {
    const entry = this.cache.get(key);
    if (!entry) return null;
    if (Date.now() > entry.expiresAt) {
      this.delete(key);
      return null;
    }
    // Move to front of LRU
    this.touchKey(key);
    return entry.value;
  }

  set(key: string, value: T, ttlSeconds: number): void {
    // Evict if at capacity
    while (this.cache.size >= this.maxSize) {
      this.evictOldest();
    }
    this.cache.set(key, {
      value,
      expiresAt: Date.now() + (ttlSeconds * 1000),
    });
    this.touchKey(key);
  }

  delete(key: string): void {
    this.cache.delete(key);
    const idx = this.accessOrder.indexOf(key);
    if (idx !== -1) this.accessOrder.splice(idx, 1);
  }

  clear(): void {
    this.cache.clear();
    this.accessOrder = [];
  }

  size(): number {
    return this.cache.size;
  }

  private touchKey(key: string): void {
    const idx = this.accessOrder.indexOf(key);
    if (idx !== -1) this.accessOrder.splice(idx, 1);
    this.accessOrder.unshift(key);
  }

  private evictOldest(): void {
    const oldest = this.accessOrder.pop();
    if (oldest) this.cache.delete(oldest);
  }
}
```

Don't apply advanced optimizations without evidence they're needed. Profile your cache performance first. A simple single-tier cache with proper TTL is often sufficient. Add complexity only when measurements show clear bottlenecks.
Cache tuning without validation is guesswork. Every configuration change should be measured against baseline metrics to confirm improvement.
```typescript
// Framework for running cache tuning experiments
class CacheTuningExperiment {
  constructor(
    private name: string,
    private controlCache: Cache<any>,
    private experimentCache: Cache<any>,
    private trafficSplitter: TrafficSplitter,
    private metrics: ExperimentMetrics
  ) {}

  async runRequest(key: string, fetchFn: () => Promise<any>): Promise<any> {
    const bucket = this.trafficSplitter.getBucket(key);
    const cache = bucket === 'experiment' ? this.experimentCache : this.controlCache;

    const startTime = performance.now();
    let result = await cache.get(key);
    const cacheHit = result !== null;

    if (!result) {
      result = await fetchFn();
      await cache.set(key, result);
    }

    const duration = performance.now() - startTime;

    // Record metrics by bucket
    this.metrics.record(bucket, {
      hit: cacheHit,
      latencyMs: duration,
      timestamp: Date.now(),
    });

    return result;
  }

  async getResults(): Promise<ExperimentResults> {
    const control = this.metrics.getStats('control');
    const experiment = this.metrics.getStats('experiment');

    // Calculate relative improvement
    const hitRateDiff = experiment.hitRate - control.hitRate;
    const latencyDiff = control.p50Latency - experiment.p50Latency; // Positive = improvement

    // Statistical significance (simplified z-test for proportions)
    const hitRateSignificant = this.isProportionDifferenceSignificant(
      experiment.hitRate, experiment.sampleSize,
      control.hitRate, control.sampleSize,
      0.05 // 95% confidence
    );

    return {
      experimentName: this.name,
      control,
      experiment,
      comparison: {
        hitRateDelta: hitRateDiff,
        hitRateImprovement: (hitRateDiff / control.hitRate) * 100,
        latencyImprovement: latencyDiff,
        isSignificant: hitRateSignificant,
      },
      recommendation: this.generateRecommendation(hitRateDiff, hitRateSignificant),
    };
  }

  private isProportionDifferenceSignificant(
    p1: number, n1: number,
    p2: number, n2: number,
    alpha: number
  ): boolean {
    // Pooled proportion
    const p = (p1 * n1 + p2 * n2) / (n1 + n2);
    // Standard error
    const se = Math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2));
    // Z-score
    const z = Math.abs(p1 - p2) / se;
    // Two-tailed critical value for alpha=0.05 is ~1.96
    return z > 1.96;
  }

  private generateRecommendation(delta: number, significant: boolean): string {
    if (!significant) {
      return 'No statistically significant difference. Need more samples or the change has no effect.';
    }
    if (delta > 0.05) {
      return `RECOMMENDED: Roll out experiment. Hit rate improved by ${(delta * 100).toFixed(1)}%.`;
    }
    if (delta > 0) {
      return `MARGINAL: Small improvement (${(delta * 100).toFixed(1)}%). Consider if complexity is worth it.`;
    }
    return `NOT RECOMMENDED: Experiment performed worse by ${(Math.abs(delta) * 100).toFixed(1)}%.`;
  }
}

// Usage
const experiment = new CacheTuningExperiment(
  'TTL Increase Test',
  new RedisCache({ ttlSeconds: 300 }), // Control: 5 min TTL
  new RedisCache({ ttlSeconds: 900 }), // Experiment: 15 min TTL
  new PercentageSplitter(10),          // 10% to experiment
  new PrometheusExperimentMetrics()
);

// After sufficient traffic...
const results = await experiment.getResults();
console.log(results.recommendation);
```

Cache performance tuning transforms a basic cache implementation into an optimized system component. The key is systematic analysis, targeted optimization, and rigorous validation.
Module Complete:
You've now completed the Cache Testing and Monitoring module. You can test cache implementations, collect and interpret cache metrics, debug cache issues systematically, and tune cache performance for optimal effectiveness. Together, these skills form the operational foundation for building, operating, and optimizing production caching systems that are reliable, observable, and performant.