A cache without metrics is flying blind. You might believe your cache is working—responses feel fast, databases aren't complaining—but without concrete measurements, you have no way to answer critical questions: Is the cache actually reducing backend load? Which data is worth caching? Is something quietly degrading before users notice?
Cache metrics are the foundation of cache management. They tell you whether your investment in caching infrastructure is paying dividends, where optimization opportunities exist, and when something is going wrong—often before users notice.
By the end of this page, you will understand the essential cache metrics to track, how to instrument your cache layer to collect these metrics, how to interpret metric patterns to diagnose issues, and how to set up alerting thresholds that catch problems before they impact users.
While caches expose dozens of metrics, a handful of core measurements provide the vast majority of insight. Understanding these foundational metrics is essential before diving into more advanced instrumentation.
| Metric | Definition | Why It Matters |
|---|---|---|
| Hit Rate | Percentage of requests served from cache (hits / total requests) | The primary indicator of cache effectiveness. Low hit rate means the cache isn't reducing backend load. |
| Miss Rate | Percentage of requests that required backend fetch (misses / total requests) | Inverse of hit rate. High miss rate indicates cache warming issues or poor key design. |
| Hit Count | Absolute number of cache hits in a time window | Helps calculate total cache value in terms of backend requests saved. |
| Miss Count | Absolute number of cache misses in a time window | Corresponds to actual backend load. Sudden spikes indicate cache failures. |
| Eviction Count | Number of entries removed to make space for new ones | High evictions suggest cache is undersized or has hot key contention. |
| Latency (Get) | Time to retrieve an entry from cache | Cache reads should be sub-millisecond. High latency negates caching benefits. |
| Latency (Set) | Time to write an entry to cache | Write latency affects user-perceived performance for cache-aside patterns. |
| Memory Usage | Current memory consumed by cached entries | Approaching memory limits increases eviction pressure. |
| Key Count | Number of entries currently in cache | Helps understand cache utilization and entry size distribution. |
| TTL Distribution | Histogram of remaining TTL across entries | Reveals whether entries are being accessed before expiration. |
The Hit Rate Formula:
Hit Rate = Cache Hits / (Cache Hits + Cache Misses) × 100%
A typical target hit rate depends on your use case; there is no single threshold that signals success for every workload.
A 99% hit rate is worthless if the 1% of misses correspond to your most expensive database queries. Always consider hit rate alongside the cost of misses—some cache misses are cheap (simple key lookups), while others are catastrophic (complex aggregation queries taking 10+ seconds).
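To make that trade-off concrete, here is a minimal sketch (with hypothetical numbers and helper names) of the effective latency a caller experiences: a blend of hit latency and miss cost. It illustrates how a 99% hit rate can still leave requests slow when each miss triggers an expensive query.

```typescript
// Effective latency = hitRate * hitLatency + missRate * missCost.
// The figures below are illustrative assumptions, not measurements.
interface PatternStats {
  hitRate: number;        // 0..1
  hitLatencyMs: number;   // time to serve from cache
  missCostMs: number;     // time to serve from the backend on a miss
}

function effectiveLatencyMs(stats: PatternStats): number {
  return stats.hitRate * stats.hitLatencyMs + (1 - stats.hitRate) * stats.missCostMs;
}

// 99% hit rate, but each miss is a 10-second aggregation query:
console.log(effectiveLatencyMs({ hitRate: 0.99, hitLatencyMs: 1, missCostMs: 10_000 })); // ~101 ms
// 90% hit rate on a cheap 10 ms lookup:
console.log(effectiveLatencyMs({ hitRate: 0.9, hitLatencyMs: 1, missCostMs: 10 }));      // ~1.9 ms
```

The first cache looks better on paper but delivers far worse average latency, which is why hit rate should always be read alongside miss cost.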
Effective cache instrumentation requires capturing metrics at the right granularity. You need both aggregate metrics (total hit rate across all caches) and per-key-pattern metrics (hit rate for user profiles vs. product data).
The instrumentation layer should be transparent to business logic—cache users shouldn't need to know that metrics are being collected.
```typescript
import { Counter, Histogram, Gauge, Registry } from 'prom-client';

// Metrics interface for dependency injection
interface CacheMetrics {
  recordHit(keyPattern: string): void;
  recordMiss(keyPattern: string): void;
  recordLatency(operation: 'get' | 'set' | 'delete', durationMs: number): void;
  recordEviction(keyPattern: string, reason: 'ttl' | 'lru' | 'manual'): void;
  setEntryCount(count: number): void;
  setMemoryUsage(bytes: number): void;
}

// Cache abstraction the wrapper decorates (shape implied by the code below)
interface Cache<T> {
  get(key: string): Promise<T | null>;
  set(key: string, value: T, ttlSeconds?: number): Promise<void>;
  delete(key: string): Promise<void>;
  clear(): Promise<void>;
}

// Prometheus-based metrics implementation
class PrometheusCacheMetrics implements CacheMetrics {
  private hitCounter: Counter;
  private missCounter: Counter;
  private latencyHistogram: Histogram;
  private evictionCounter: Counter;
  private entryCountGauge: Gauge;
  private memoryUsageGauge: Gauge;

  // cacheName is kept as a property so every recording method can label by cache
  constructor(registry: Registry, private cacheName: string) {
    this.hitCounter = new Counter({
      name: 'cache_hits_total',
      help: 'Total number of cache hits',
      labelNames: ['cache', 'key_pattern'],
      registers: [registry],
    });
    this.missCounter = new Counter({
      name: 'cache_misses_total',
      help: 'Total number of cache misses',
      labelNames: ['cache', 'key_pattern'],
      registers: [registry],
    });
    this.latencyHistogram = new Histogram({
      name: 'cache_operation_duration_seconds',
      help: 'Cache operation latency in seconds',
      labelNames: ['cache', 'operation'],
      buckets: [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
      registers: [registry],
    });
    this.evictionCounter = new Counter({
      name: 'cache_evictions_total',
      help: 'Total number of cache evictions',
      labelNames: ['cache', 'key_pattern', 'reason'],
      registers: [registry],
    });
    this.entryCountGauge = new Gauge({
      name: 'cache_entries_current',
      help: 'Current number of entries in cache',
      labelNames: ['cache'],
      registers: [registry],
    });
    this.memoryUsageGauge = new Gauge({
      name: 'cache_memory_bytes',
      help: 'Current memory usage of cache in bytes',
      labelNames: ['cache'],
      registers: [registry],
    });
  }

  recordHit(keyPattern: string): void {
    this.hitCounter.inc({ cache: this.cacheName, key_pattern: keyPattern });
  }

  recordMiss(keyPattern: string): void {
    this.missCounter.inc({ cache: this.cacheName, key_pattern: keyPattern });
  }

  recordLatency(operation: 'get' | 'set' | 'delete', durationMs: number): void {
    this.latencyHistogram.observe(
      { cache: this.cacheName, operation },
      durationMs / 1000 // Convert to seconds
    );
  }

  recordEviction(keyPattern: string, reason: 'ttl' | 'lru' | 'manual'): void {
    this.evictionCounter.inc({ cache: this.cacheName, key_pattern: keyPattern, reason });
  }

  setEntryCount(count: number): void {
    this.entryCountGauge.set({ cache: this.cacheName }, count);
  }

  setMemoryUsage(bytes: number): void {
    this.memoryUsageGauge.set({ cache: this.cacheName }, bytes);
  }
}

// Cache wrapper that adds metrics transparently
class InstrumentedCache<T> implements Cache<T> {
  constructor(
    private innerCache: Cache<T>,
    private metrics: CacheMetrics,
    private keyPatternExtractor: (key: string) => string = key => key.split(':')[0]
  ) {}

  async get(key: string): Promise<T | null> {
    const pattern = this.keyPatternExtractor(key);
    const startTime = performance.now();
    try {
      const result = await this.innerCache.get(key);
      if (result !== null) {
        this.metrics.recordHit(pattern);
      } else {
        this.metrics.recordMiss(pattern);
      }
      return result;
    } finally {
      const duration = performance.now() - startTime;
      this.metrics.recordLatency('get', duration);
    }
  }

  async set(key: string, value: T, ttlSeconds?: number): Promise<void> {
    const startTime = performance.now();
    try {
      await this.innerCache.set(key, value, ttlSeconds);
    } finally {
      const duration = performance.now() - startTime;
      this.metrics.recordLatency('set', duration);
    }
  }

  async delete(key: string): Promise<void> {
    const pattern = this.keyPatternExtractor(key);
    const startTime = performance.now();
    try {
      await this.innerCache.delete(key);
      this.metrics.recordEviction(pattern, 'manual');
    } finally {
      const duration = performance.now() - startTime;
      this.metrics.recordLatency('delete', duration);
    }
  }

  async clear(): Promise<void> {
    await this.innerCache.clear();
  }
}

// Usage example (RedisCache and redisClient come from your Redis integration layer)
const registry = new Registry();
const metrics = new PrometheusCacheMetrics(registry, 'product-cache');
const rawCache = new RedisCache<Product>(redisClient);
const cache = new InstrumentedCache(rawCache, metrics);

// Now all cache operations automatically record metrics
const product = await cache.get('product:12345');
```

Notice the keyPatternExtractor function that converts 'product:12345' to 'product'. This groups metrics by logical entity type rather than individual keys. Without this, you'd have millions of unique label values (one per key), which would explode your metrics cardinality and overwhelm your monitoring system.
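The PromQL queries and alert rules later on this page assume Prometheus can scrape these metrics over HTTP. As a bridge, here is a small sketch of exposing the registry; the use of Express and the port number are assumptions, not part of the original example.

```typescript
import express from 'express';
import { Registry } from 'prom-client';

// Expose the same registry that PrometheusCacheMetrics writes into.
export function startMetricsServer(registry: Registry, port = 9090): void {
  const app = express();

  // Prometheus scrapes this endpoint on its own schedule.
  app.get('/metrics', async (_req, res) => {
    res.set('Content-Type', registry.contentType);
    res.end(await registry.metrics());
  });

  app.listen(port, () => {
    console.log(`Metrics exposed at http://localhost:${port}/metrics`);
  });
}
```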
Raw hit rate numbers tell only part of the story. How hit rate changes over time reveals important information about cache behavior, system health, and user patterns.
Let's examine common hit rate patterns and what they indicate:
| Pattern | Visual Signature | Likely Cause | Action |
|---|---|---|---|
| Stable High | Flat line at 85-95% | Cache is working well, sized appropriately | Monitor but no action needed |
| Gradual Decline | Slow downward trend | Growing dataset exceeding cache capacity | Increase cache size or improve eviction strategy |
| Sudden Drop | Sharp cliff in hit rate | Cache restart, mass invalidation, or config change | Investigate recent deployments or system events |
| Periodic Dips | Regular valleys (hourly, daily) | Scheduled jobs, traffic patterns, or batch invalidations | Pre-warm cache before high-traffic periods |
| Sawtooth Pattern | Rise and fall cycles | Cache warming after cold start, then gradual decay | Consider longer TTLs or background refresh |
| Bimodal Distribution | Alternating high/low | Hot keys vs. long-tail access patterns | Implement tiered caching (hot vs. cold) |
| Always Low | Flat line below 50% | Poor cache key design, TTLs too short, or cold data | Analyze access patterns and redesign caching strategy |
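If you sample the hit rate periodically, some of the patterns in the table above can be flagged programmatically before a human looks at a graph. The sketch below is a rough, illustrative classifier; its thresholds and category names are assumptions, not a standard algorithm.

```typescript
type HitRatePattern = 'stable-high' | 'sudden-drop' | 'gradual-decline' | 'always-low' | 'unclassified';

// samples: hit rate in the range 0..1, oldest first, taken at a fixed interval.
function classifyHitRate(samples: number[]): HitRatePattern {
  if (samples.length < 10) return 'unclassified';

  const latest = samples[samples.length - 1];
  const previousAvg = avg(samples.slice(-6, -1)); // the five samples before the latest
  const baselineAvg = avg(samples.slice(0, 5));   // the oldest five samples
  const recentAvg = avg(samples.slice(-5));       // the newest five samples

  if (samples.every(s => s < 0.5)) return 'always-low';
  if (latest < previousAvg - 0.2) return 'sudden-drop';         // sharp cliff vs. the recent window
  if (baselineAvg - recentAvg > 0.1) return 'gradual-decline';  // slow drift away from the baseline
  if (samples.every(s => s > 0.85)) return 'stable-high';
  return 'unclassified';
}

function avg(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}
```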
Case Study: The Mysterious Monday Mornings
A team noticed their cache hit rate dropped from 90% to 40% every Monday morning, gradually recovering by Tuesday. Investigation revealed that weekend deployments restarted the cache, so Monday's traffic spike arrived at a cold cache that had to be rebuilt key by key.
Solution: Implement a pre-warming script that runs after deployment, populating the cache with the most-accessed keys before production traffic arrives.
```typescript
// Pre-warming script for deployment
// (WarmingReport, KeyInfo, WarmingMetrics, Database, and Cache are assumed to be
// defined elsewhere in the codebase.)
class CacheWarmer {
  constructor(
    private cache: Cache<any>,
    private database: Database,
    private metrics: WarmingMetrics
  ) {}

  async warmCache(): Promise<WarmingReport> {
    console.log('🔥 Starting cache pre-warming...');
    const startTime = Date.now();
    const report: WarmingReport = { succeeded: 0, failed: 0, skipped: 0 };

    // Get most accessed keys from analytics
    const hotKeys = await this.getHotKeys();
    console.log(`Found ${hotKeys.length} hot keys to warm`);

    // Warm in batches to avoid overwhelming database
    const batchSize = 100;
    for (let i = 0; i < hotKeys.length; i += batchSize) {
      const batch = hotKeys.slice(i, i + batchSize);

      await Promise.all(batch.map(async (keyInfo) => {
        try {
          // Check if already cached (another instance may have warmed it)
          const existing = await this.cache.get(keyInfo.key);
          if (existing) {
            report.skipped++;
            return;
          }

          // Fetch and cache
          const data = await this.fetchData(keyInfo);
          await this.cache.set(keyInfo.key, data, keyInfo.ttl);
          report.succeeded++;
        } catch (error) {
          console.error(`Failed to warm key ${keyInfo.key}:`, error);
          report.failed++;
        }
      }));

      // Progress logging (capped at 100% for the final partial batch)
      const progress = Math.min(100, Math.round(((i + batchSize) / hotKeys.length) * 100));
      console.log(`Warming progress: ${progress}%`);
    }

    const duration = Date.now() - startTime;
    console.log(`✅ Cache warming complete in ${duration}ms`);
    console.log(`   Succeeded: ${report.succeeded}, Failed: ${report.failed}, Skipped: ${report.skipped}`);

    this.metrics.recordWarmingComplete(report, duration);
    return report;
  }

  private async getHotKeys(): Promise<KeyInfo[]> {
    // Query analytics for most accessed keys in last 7 days
    return this.database.query(`
      SELECT cache_key as key, key_type, access_count, avg_ttl as ttl
      FROM cache_analytics
      WHERE accessed_at > NOW() - INTERVAL '7 days'
      GROUP BY cache_key, key_type
      ORDER BY access_count DESC
      LIMIT 10000
    `);
  }

  private async fetchData(keyInfo: KeyInfo): Promise<any> {
    switch (keyInfo.key_type) {
      case 'product':
        return this.database.findProduct(this.extractId(keyInfo.key));
      case 'user':
        return this.database.findUser(this.extractId(keyInfo.key));
      case 'category':
        return this.database.findCategory(this.extractId(keyInfo.key));
      default:
        throw new Error(`Unknown key type: ${keyInfo.key_type}`);
    }
  }

  private extractId(key: string): string {
    return key.split(':')[1];
  }
}

// Run warming after deployment
async function postDeploymentHook() {
  const warmer = new CacheWarmer(cache, database, metrics);
  const report = await warmer.warmCache();

  // Fail deployment if warming is too unsuccessful
  const successRate = report.succeeded / (report.succeeded + report.failed);
  if (successRate < 0.9) {
    throw new Error(`Cache warming failed: only ${successRate * 100}% success rate`);
  }
}
```

Aggregate cache metrics can hide important problems. A 90% overall hit rate might mask a 20% hit rate for your most expensive queries. Segmenting metrics by logical dimensions reveals these hidden issues.
```promql
# Overall hit rate
sum(rate(cache_hits_total[5m]))
  / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))

# Hit rate by key pattern
sum by (key_pattern) (rate(cache_hits_total[5m]))
  / (sum by (key_pattern) (rate(cache_hits_total[5m]))
     + sum by (key_pattern) (rate(cache_misses_total[5m])))

# Identify patterns with worst hit rates
bottomk(5,
  sum by (key_pattern) (rate(cache_hits_total[5m]))
    / (sum by (key_pattern) (rate(cache_hits_total[5m]))
       + sum by (key_pattern) (rate(cache_misses_total[5m]))))

# Hit rate change compared to 24 hours ago (detect degradation)
(
  sum(rate(cache_hits_total[5m]))
    / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
)
-
(
  sum(rate(cache_hits_total[5m] offset 24h))
    / (sum(rate(cache_hits_total[5m] offset 24h)) + sum(rate(cache_misses_total[5m] offset 24h)))
)

# Cache latency percentiles
histogram_quantile(0.95,
  sum by (le, operation) (rate(cache_operation_duration_seconds_bucket[5m])))

# Eviction rate by reason
sum by (reason) (rate(cache_evictions_total[5m]))

# Cache size utilization (if max is known)
cache_entries_current / cache_max_entries * 100

# Cost-weighted miss rate (expensive queries hurt more)
sum by (key_pattern) (rate(cache_misses_total[5m]) * on(key_pattern) query_cost_weight)
```

Interpreting Segmented Metrics:
Suppose overall hit rate is 88%, but segmented analysis reveals:
| Key Pattern | Hit Rate | Miss Cost | Impact |
|---|---|---|---|
| product | 95% | Low (10ms) | Minimal |
| user-profile | 92% | Medium (50ms) | Low |
| recommendations | 45% | Very High (2s) | Critical |
| session | 99% | Low (5ms) | Minimal |
The recommendations cache has a terrible hit rate, and each miss costs 2 seconds. This is where optimization effort should focus—not on the well-performing product cache.
Consider tracking a 'cost-weighted hit rate' that accounts for the expense of cache misses. A 50% hit rate on a 2-second query is much more valuable than a 99% hit rate on a 5ms query. Weight your metrics by the cost of the underlying operation.
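One way to implement that advice, sketched below with hypothetical traffic figures: compute the share of backend cost that hits actually avoided, rather than the share of requests that hit.

```typescript
// Cost-weighted hit rate: weight each pattern by the backend cost its
// misses incur, so expensive misses dominate the score.
// The figures mirror the table above and are illustrative only.
interface PatternCost {
  hits: number;
  misses: number;
  missCostMs: number; // cost of one miss against the backend
}

function costWeightedHitRate(patterns: PatternCost[]): number {
  let savedCost = 0; // backend time avoided by hits
  let totalCost = 0; // backend time if nothing were cached
  for (const p of patterns) {
    savedCost += p.hits * p.missCostMs;
    totalCost += (p.hits + p.misses) * p.missCostMs;
  }
  return totalCost === 0 ? 1 : savedCost / totalCost;
}

const weighted = costWeightedHitRate([
  { hits: 9_500, misses: 500, missCostMs: 10 },      // product: 95% hit rate, cheap misses
  { hits: 4_500, misses: 5_500, missCostMs: 2_000 }, // recommendations: 45% hit rate, costly misses
]);
console.log(`Cost-weighted hit rate: ${(weighted * 100).toFixed(1)}%`); // ≈45%, versus a raw 70% on the same traffic
```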
A well-designed cache dashboard provides at-a-glance understanding of cache health while enabling deep-dive investigation when issues arise. The dashboard should answer: "Is the cache healthy?" within 5 seconds of viewing.
{ "title": "Cache Health Dashboard", "panels": [ { "title": "Overall Hit Rate", "type": "stat", "targets": [{ "expr": "sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m]))) * 100", "legendFormat": "Hit Rate %" }], "fieldConfig": { "defaults": { "thresholds": { "steps": [ { "value": 0, "color": "red" }, { "value": 70, "color": "yellow" }, { "value": 85, "color": "green" } ] }, "unit": "percent" } } }, { "title": "Hit Rate Trend (vs 24h ago)", "type": "timeseries", "targets": [ { "expr": "sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m]))) * 100", "legendFormat": "Current" }, { "expr": "sum(rate(cache_hits_total[5m] offset 24h)) / (sum(rate(cache_hits_total[5m] offset 24h)) + sum(rate(cache_misses_total[5m] offset 24h))) * 100", "legendFormat": "24h Ago" } ] }, { "title": "Hit Rate by Key Pattern", "type": "piechart", "targets": [{ "expr": "sum by (key_pattern) (rate(cache_hits_total[5m]))", "legendFormat": "{{key_pattern}}" }] }, { "title": "Cache Operation Latency (P95)", "type": "timeseries", "targets": [{ "expr": "histogram_quantile(0.95, sum by (le, operation) (rate(cache_operation_duration_seconds_bucket[5m])))", "legendFormat": "{{operation}} P95" }] }, { "title": "Evictions by Reason", "type": "timeseries", "targets": [{ "expr": "sum by (reason) (rate(cache_evictions_total[5m]))", "legendFormat": "{{reason}}" }] }, { "title": "Memory Utilization", "type": "gauge", "targets": [{ "expr": "cache_memory_bytes / cache_memory_limit_bytes * 100", "legendFormat": "Memory %" }], "fieldConfig": { "defaults": { "thresholds": { "steps": [ { "value": 0, "color": "green" }, { "value": 70, "color": "yellow" }, { "value": 90, "color": "red" } ] } } } }, { "title": "Top Missed Key Patterns", "type": "table", "targets": [{ "expr": "topk(10, sum by (key_pattern) (rate(cache_misses_total[5m])))", "legendFormat": "{{key_pattern}}" }], "transformations": [ { "id": "sortBy", "options": { "fields": { "Value": { "order": "desc" } } } } ] } ]}Dashboards are for investigation; alerts are for detection. Proper cache alerting catches issues before users notice degraded performance.
Cache alerts fall into two categories: absolute-threshold alerts, which fire when a metric crosses a fixed floor or ceiling, and baseline-relative alerts, which fire when a metric deviates sharply from its own recent history. The rules below include both kinds.
```yaml
groups:
  - name: cache-alerts
    rules:
      # Hit rate dropped below acceptable threshold
      - alert: CacheHitRateLow
        expr: |
          (
            sum(rate(cache_hits_total[5m]))
            / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
          ) < 0.7
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 70%"
          description: "Cache hit rate is {{ $value | humanizePercentage }}, indicating potential cache issues"

      # Hit rate dropped significantly from baseline
      - alert: CacheHitRateDegraded
        expr: |
          (
            sum(rate(cache_hits_total[5m]))
            / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
          ) < (
            avg_over_time(
              (
                sum(rate(cache_hits_total[5m]))
                / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
              )[24h:5m]
            ) * 0.8
          )
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate degraded by more than 20%"
          description: "Current hit rate is 20%+ below 24h average"

      # Specific key pattern has critically low hit rate
      - alert: CachePatternHitRateCritical
        expr: |
          (
            sum by (key_pattern) (rate(cache_hits_total[5m]))
            / (sum by (key_pattern) (rate(cache_hits_total[5m]))
               + sum by (key_pattern) (rate(cache_misses_total[5m])))
          ) < 0.5
          and
          sum by (key_pattern) (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) > 100
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Key pattern {{ $labels.key_pattern }} has critical hit rate"
          description: "Hit rate for {{ $labels.key_pattern }} is {{ $value | humanizePercentage }}"

      # High eviction rate indicates memory pressure
      - alert: CacheEvictionRateHigh
        expr: sum(rate(cache_evictions_total{reason="lru"}[5m])) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High cache eviction rate"
          description: "{{ $value }} evictions/second indicates cache is under memory pressure"

      # Cache latency degraded
      - alert: CacheLatencyHigh
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(cache_operation_duration_seconds_bucket{operation="get"}[5m]))
          ) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache GET latency P95 above 10ms"
          description: "P95 cache read latency is {{ $value | humanizeDuration }}"

      # Cache memory approaching limit
      - alert: CacheMemoryHigh
        expr: cache_memory_bytes / cache_memory_limit_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache memory utilization above 90%"
          description: "Cache is using {{ $value | humanizePercentage }} of available memory"

      # No cache hits - cache may be down
      - alert: CacheNoHits
        expr: sum(rate(cache_hits_total[5m])) == 0 and sum(rate(cache_misses_total[5m])) > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Cache is returning zero hits"
          description: "Cache may be offline or completely flushed"
```

Set alert thresholds based on actual impact, not theoretical ideals. A 70% hit rate might trigger an alert but not require immediate action. Use severity levels (info, warning, critical) and escalation paths. Reserve critical alerts for issues that genuinely require immediate intervention.
Cache metrics transform caching from a black box into a transparent, observable system component. Effective metrics enable proactive optimization rather than reactive firefighting.
What's next:
With comprehensive metrics in place, we'll explore Debugging Cache Issues—systematic approaches to diagnosing cache problems when metrics indicate something is wrong.
You now understand how to instrument caches for comprehensive metric collection, interpret metric patterns to diagnose issues, build effective monitoring dashboards, and configure alerts that catch problems before they impact users.