A cache without metrics is flying blind. You might believe your cache is working—responses feel fast, databases aren't complaining—but without concrete measurements, you have no way to answer critical questions: Is the cache actually reducing backend load? Which data is worth caching? Is something quietly degrading before users notice?
Cache metrics are the foundation of cache management. They tell you whether your investment in caching infrastructure is paying dividends, where optimization opportunities exist, and when something is going wrong—often before users notice.
By the end of this page, you will understand the essential cache metrics to track, how to instrument your cache layer to collect these metrics, how to interpret metric patterns to diagnose issues, and how to set up alerting thresholds that catch problems before they impact users.
While caches expose dozens of metrics, a handful of core measurements provide the vast majority of insight. Understanding these foundational metrics is essential before diving into more advanced instrumentation.
| Metric | Definition | Why It Matters |
|---|---|---|
| Hit Rate | Percentage of requests served from cache (hits / total requests) | The primary indicator of cache effectiveness. Low hit rate means the cache isn't reducing backend load. |
| Miss Rate | Percentage of requests that required backend fetch (misses / total requests) | Inverse of hit rate. High miss rate indicates cache warming issues or poor key design. |
| Hit Count | Absolute number of cache hits in a time window | Helps calculate total cache value in terms of backend requests saved. |
| Miss Count | Absolute number of cache misses in a time window | Corresponds to actual backend load. Sudden spikes indicate cache failures. |
| Eviction Count | Number of entries removed to make space for new ones | High evictions suggest cache is undersized or has hot key contention. |
| Latency (Get) | Time to retrieve an entry from cache | Cache reads should be sub-millisecond. High latency negates caching benefits. |
| Latency (Set) | Time to write an entry to cache | Write latency affects user-perceived performance for cache-aside patterns. |
| Memory Usage | Current memory consumed by cached entries | Approaching memory limits increases eviction pressure. |
| Key Count | Number of entries currently in cache | Helps understand cache utilization and entry size distribution. |
| TTL Distribution | Histogram of remaining TTL across entries | Reveals whether entries are being accessed before expiration. |
The Hit Rate Formula:
Hit Rate = Cache Hits / (Cache Hits + Cache Misses) × 100%
A typical target hit rate depends on your use case; there is no single threshold that signals success for every workload.
A 99% hit rate is worthless if the 1% of misses correspond to your most expensive database queries. Always consider hit rate alongside the cost of misses—some cache misses are cheap (simple key lookups), while others are catastrophic (complex aggregation queries taking 10+ seconds).
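To make that trade-off concrete, here is a minimal sketch (with hypothetical numbers and helper names) of the effective latency a caller experiences: a blend of hit latency and miss cost. It illustrates how a 99% hit rate can still leave requests slow when each miss triggers an expensive query.

```typescript
// Effective latency = hitRate * hitLatency + missRate * missCost.
// The figures below are illustrative assumptions, not measurements.
interface PatternStats {
  hitRate: number;        // 0..1
  hitLatencyMs: number;   // time to serve from cache
  missCostMs: number;     // time to serve from the backend on a miss
}

function effectiveLatencyMs(stats: PatternStats): number {
  return stats.hitRate * stats.hitLatencyMs + (1 - stats.hitRate) * stats.missCostMs;
}

// 99% hit rate, but each miss is a 10-second aggregation query:
console.log(effectiveLatencyMs({ hitRate: 0.99, hitLatencyMs: 1, missCostMs: 10_000 })); // ~101 ms
// 90% hit rate on a cheap 10 ms lookup:
console.log(effectiveLatencyMs({ hitRate: 0.9, hitLatencyMs: 1, missCostMs: 10 }));      // ~1.9 ms
```

The first cache looks better on paper but delivers far worse average latency, which is why hit rate should always be read alongside miss cost.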
Effective cache instrumentation requires capturing metrics at the right granularity. You need both aggregate metrics (total hit rate across all caches) and per-key-pattern metrics (hit rate for user profiles vs. product data).
The instrumentation layer should be transparent to business logic—cache users shouldn't need to know that metrics are being collected.
```typescript
import { Counter, Histogram, Gauge, Registry } from 'prom-client';

// Metrics interface for dependency injection
interface CacheMetrics {
  recordHit(keyPattern: string): void;
  recordMiss(keyPattern: string): void;
  recordLatency(operation: 'get' | 'set' | 'delete', durationMs: number): void;
  recordEviction(keyPattern: string, reason: 'ttl' | 'lru' | 'manual'): void;
  setEntryCount(count: number): void;
  setMemoryUsage(bytes: number): void;
}

// Cache abstraction the wrapper decorates (shape implied by the code below)
interface Cache<T> {
  get(key: string): Promise<T | null>;
  set(key: string, value: T, ttlSeconds?: number): Promise<void>;
  delete(key: string): Promise<void>;
  clear(): Promise<void>;
}

// Prometheus-based metrics implementation
class PrometheusCacheMetrics implements CacheMetrics {
  private hitCounter: Counter;
  private missCounter: Counter;
  private latencyHistogram: Histogram;
  private evictionCounter: Counter;
  private entryCountGauge: Gauge;
  private memoryUsageGauge: Gauge;

  // cacheName is kept as a property so every recording method can label by cache
  constructor(registry: Registry, private cacheName: string) {
    this.hitCounter = new Counter({
      name: 'cache_hits_total',
      help: 'Total number of cache hits',
      labelNames: ['cache', 'key_pattern'],
      registers: [registry],
    });
    this.missCounter = new Counter({
      name: 'cache_misses_total',
      help: 'Total number of cache misses',
      labelNames: ['cache', 'key_pattern'],
      registers: [registry],
    });
    this.latencyHistogram = new Histogram({
      name: 'cache_operation_duration_seconds',
      help: 'Cache operation latency in seconds',
      labelNames: ['cache', 'operation'],
      buckets: [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
      registers: [registry],
    });
    this.evictionCounter = new Counter({
      name: 'cache_evictions_total',
      help: 'Total number of cache evictions',
      labelNames: ['cache', 'key_pattern', 'reason'],
      registers: [registry],
    });
    this.entryCountGauge = new Gauge({
      name: 'cache_entries_current',
      help: 'Current number of entries in cache',
      labelNames: ['cache'],
      registers: [registry],
    });
    this.memoryUsageGauge = new Gauge({
      name: 'cache_memory_bytes',
      help: 'Current memory usage of cache in bytes',
      labelNames: ['cache'],
      registers: [registry],
    });
  }

  recordHit(keyPattern: string): void {
    this.hitCounter.inc({ cache: this.cacheName, key_pattern: keyPattern });
  }

  recordMiss(keyPattern: string): void {
    this.missCounter.inc({ cache: this.cacheName, key_pattern: keyPattern });
  }

  recordLatency(operation: 'get' | 'set' | 'delete', durationMs: number): void {
    this.latencyHistogram.observe(
      { cache: this.cacheName, operation },
      durationMs / 1000 // Convert to seconds
    );
  }

  recordEviction(keyPattern: string, reason: 'ttl' | 'lru' | 'manual'): void {
    this.evictionCounter.inc({ cache: this.cacheName, key_pattern: keyPattern, reason });
  }

  setEntryCount(count: number): void {
    this.entryCountGauge.set({ cache: this.cacheName }, count);
  }

  setMemoryUsage(bytes: number): void {
    this.memoryUsageGauge.set({ cache: this.cacheName }, bytes);
  }
}

// Cache wrapper that adds metrics transparently
class InstrumentedCache<T> implements Cache<T> {
  constructor(
    private innerCache: Cache<T>,
    private metrics: CacheMetrics,
    private keyPatternExtractor: (key: string) => string = key => key.split(':')[0]
  ) {}

  async get(key: string): Promise<T | null> {
    const pattern = this.keyPatternExtractor(key);
    const startTime = performance.now();
    try {
      const result = await this.innerCache.get(key);
      if (result !== null) {
        this.metrics.recordHit(pattern);
      } else {
        this.metrics.recordMiss(pattern);
      }
      return result;
    } finally {
      const duration = performance.now() - startTime;
      this.metrics.recordLatency('get', duration);
    }
  }

  async set(key: string, value: T, ttlSeconds?: number): Promise<void> {
    const startTime = performance.now();
    try {
      await this.innerCache.set(key, value, ttlSeconds);
    } finally {
      const duration = performance.now() - startTime;
      this.metrics.recordLatency('set', duration);
    }
  }

  async delete(key: string): Promise<void> {
    const pattern = this.keyPatternExtractor(key);
    const startTime = performance.now();
    try {
      await this.innerCache.delete(key);
      this.metrics.recordEviction(pattern, 'manual');
    } finally {
      const duration = performance.now() - startTime;
      this.metrics.recordLatency('delete', duration);
    }
  }

  async clear(): Promise<void> {
    await this.innerCache.clear();
  }
}

// Usage example (RedisCache and redisClient come from your Redis integration layer)
const registry = new Registry();
const metrics = new PrometheusCacheMetrics(registry, 'product-cache');
const rawCache = new RedisCache<Product>(redisClient);
const cache = new InstrumentedCache(rawCache, metrics);

// Now all cache operations automatically record metrics
const product = await cache.get('product:12345');
```

Notice the keyPatternExtractor function that converts 'product:12345' to 'product'. This groups metrics by logical entity type rather than individual keys. Without this, you'd have millions of unique label values (one per key), which would explode your metrics cardinality and overwhelm your monitoring system.
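The PromQL queries and alert rules later on this page assume Prometheus can scrape these metrics over HTTP. As a bridge, here is a small sketch of exposing the registry; the use of Express and the port number are assumptions, not part of the original example.

```typescript
import express from 'express';
import { Registry } from 'prom-client';

// Expose the same registry that PrometheusCacheMetrics writes into.
export function startMetricsServer(registry: Registry, port = 9090): void {
  const app = express();

  // Prometheus scrapes this endpoint on its own schedule.
  app.get('/metrics', async (_req, res) => {
    res.set('Content-Type', registry.contentType);
    res.end(await registry.metrics());
  });

  app.listen(port, () => {
    console.log(`Metrics exposed at http://localhost:${port}/metrics`);
  });
}
```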
Raw hit rate numbers tell only part of the story. How hit rate changes over time reveals important information about cache behavior, system health, and user patterns.
Let's examine common hit rate patterns and what they indicate:
| Pattern | Visual Signature | Likely Cause | Action |
|---|---|---|---|
| Stable High | Flat line at 85-95% | Cache is working well, sized appropriately | Monitor but no action needed |
| Gradual Decline | Slow downward trend | Growing dataset exceeding cache capacity | Increase cache size or improve eviction strategy |
| Sudden Drop | Sharp cliff in hit rate | Cache restart, mass invalidation, or config change | Investigate recent deployments or system events |
| Periodic Dips | Regular valleys (hourly, daily) | Scheduled jobs, traffic patterns, or batch invalidations | Pre-warm cache before high-traffic periods |
| Sawtooth Pattern | Rise and fall cycles | Cache warming after cold start, then gradual decay | Consider longer TTLs or background refresh |
| Bimodal Distribution | Alternating high/low | Hot keys vs. long-tail access patterns | Implement tiered caching (hot vs. cold) |
| Always Low | Flat line below 50% | Poor cache key design, TTLs too short, or cold data | Analyze access patterns and redesign caching strategy |
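If you sample the hit rate periodically, some of the patterns in the table above can be flagged programmatically before a human looks at a graph. The sketch below is a rough, illustrative classifier; its thresholds and category names are assumptions, not a standard algorithm.

```typescript
type HitRatePattern = 'stable-high' | 'sudden-drop' | 'gradual-decline' | 'always-low' | 'unclassified';

// samples: hit rate in the range 0..1, oldest first, taken at a fixed interval.
function classifyHitRate(samples: number[]): HitRatePattern {
  if (samples.length < 10) return 'unclassified';

  const latest = samples[samples.length - 1];
  const previousAvg = avg(samples.slice(-6, -1)); // the five samples before the latest
  const baselineAvg = avg(samples.slice(0, 5));   // the oldest five samples
  const recentAvg = avg(samples.slice(-5));       // the newest five samples

  if (samples.every(s => s < 0.5)) return 'always-low';
  if (latest < previousAvg - 0.2) return 'sudden-drop';         // sharp cliff vs. the recent window
  if (baselineAvg - recentAvg > 0.1) return 'gradual-decline';  // slow drift away from the baseline
  if (samples.every(s => s > 0.85)) return 'stable-high';
  return 'unclassified';
}

function avg(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}
```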
Case Study: The Mysterious Monday Mornings
A team noticed their cache hit rate dropped from 90% to 40% every Monday morning, gradually recovering by Tuesday. Investigation revealed that weekend deployments restarted the cache, so Monday's traffic spike arrived at a cold cache that had to be rebuilt key by key.
Solution: Implement a pre-warming script that runs after deployment, populating the cache with the most-accessed keys before production traffic arrives.
```typescript
// Pre-warming script for deployment
// (WarmingReport, KeyInfo, WarmingMetrics, Database, and Cache are assumed to be
// defined elsewhere in the codebase.)
class CacheWarmer {
  constructor(
    private cache: Cache<any>,
    private database: Database,
    private metrics: WarmingMetrics
  ) {}

  async warmCache(): Promise<WarmingReport> {
    console.log('🔥 Starting cache pre-warming...');
    const startTime = Date.now();
    const report: WarmingReport = { succeeded: 0, failed: 0, skipped: 0 };

    // Get most accessed keys from analytics
    const hotKeys = await this.getHotKeys();
    console.log(`Found ${hotKeys.length} hot keys to warm`);

    // Warm in batches to avoid overwhelming database
    const batchSize = 100;
    for (let i = 0; i < hotKeys.length; i += batchSize) {
      const batch = hotKeys.slice(i, i + batchSize);

      await Promise.all(batch.map(async (keyInfo) => {
        try {
          // Check if already cached (another instance may have warmed it)
          const existing = await this.cache.get(keyInfo.key);
          if (existing) {
            report.skipped++;
            return;
          }

          // Fetch and cache
          const data = await this.fetchData(keyInfo);
          await this.cache.set(keyInfo.key, data, keyInfo.ttl);
          report.succeeded++;
        } catch (error) {
          console.error(`Failed to warm key ${keyInfo.key}:`, error);
          report.failed++;
        }
      }));

      // Progress logging (capped at 100% for the final partial batch)
      const progress = Math.min(100, Math.round(((i + batchSize) / hotKeys.length) * 100));
      console.log(`Warming progress: ${progress}%`);
    }

    const duration = Date.now() - startTime;
    console.log(`✅ Cache warming complete in ${duration}ms`);
    console.log(`   Succeeded: ${report.succeeded}, Failed: ${report.failed}, Skipped: ${report.skipped}`);

    this.metrics.recordWarmingComplete(report, duration);
    return report;
  }

  private async getHotKeys(): Promise<KeyInfo[]> {
    // Query analytics for most accessed keys in last 7 days
    return this.database.query(`
      SELECT cache_key as key, key_type, access_count, avg_ttl as ttl
      FROM cache_analytics
      WHERE accessed_at > NOW() - INTERVAL '7 days'
      GROUP BY cache_key, key_type
      ORDER BY access_count DESC
      LIMIT 10000
    `);
  }

  private async fetchData(keyInfo: KeyInfo): Promise<any> {
    switch (keyInfo.key_type) {
      case 'product':
        return this.database.findProduct(this.extractId(keyInfo.key));
      case 'user':
        return this.database.findUser(this.extractId(keyInfo.key));
      case 'category':
        return this.database.findCategory(this.extractId(keyInfo.key));
      default:
        throw new Error(`Unknown key type: ${keyInfo.key_type}`);
    }
  }

  private extractId(key: string): string {
    return key.split(':')[1];
  }
}

// Run warming after deployment
async function postDeploymentHook() {
  const warmer = new CacheWarmer(cache, database, metrics);
  const report = await warmer.warmCache();

  // Fail deployment if warming is too unsuccessful
  const successRate = report.succeeded / (report.succeeded + report.failed);
  if (successRate < 0.9) {
    throw new Error(`Cache warming failed: only ${successRate * 100}% success rate`);
  }
}
```

Aggregate cache metrics can hide important problems. A 90% overall hit rate might mask a 20% hit rate for your most expensive queries. Segmenting metrics by logical dimensions reveals these hidden issues.
```promql
# Overall hit rate
sum(rate(cache_hits_total[5m]))
  / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))

# Hit rate by key pattern
sum by (key_pattern) (rate(cache_hits_total[5m]))
  / (sum by (key_pattern) (rate(cache_hits_total[5m]))
     + sum by (key_pattern) (rate(cache_misses_total[5m])))

# Identify patterns with worst hit rates
bottomk(5,
  sum by (key_pattern) (rate(cache_hits_total[5m]))
    / (sum by (key_pattern) (rate(cache_hits_total[5m]))
       + sum by (key_pattern) (rate(cache_misses_total[5m]))))

# Hit rate change compared to 24 hours ago (detect degradation)
(
  sum(rate(cache_hits_total[5m]))
    / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
)
-
(
  sum(rate(cache_hits_total[5m] offset 24h))
    / (sum(rate(cache_hits_total[5m] offset 24h)) + sum(rate(cache_misses_total[5m] offset 24h)))
)

# Cache latency percentiles
histogram_quantile(0.95,
  sum by (le, operation) (rate(cache_operation_duration_seconds_bucket[5m])))

# Eviction rate by reason
sum by (reason) (rate(cache_evictions_total[5m]))

# Cache size utilization (if max is known)
cache_entries_current / cache_max_entries * 100

# Cost-weighted miss rate (expensive queries hurt more)
sum by (key_pattern) (rate(cache_misses_total[5m]) * on(key_pattern) query_cost_weight)
```

Interpreting Segmented Metrics:
Suppose overall hit rate is 88%, but segmented analysis reveals:
| Key Pattern | Hit Rate | Miss Cost | Impact |
|---|---|---|---|
| product | 95% | Low (10ms) | Minimal |
| user-profile | 92% | Medium (50ms) | Low |
| recommendations | 45% | Very High (2s) | Critical |
| session | 99% | Low (5ms) | Minimal |
The recommendations cache has a terrible hit rate, and each miss costs 2 seconds. This is where optimization effort should focus—not on the well-performing product cache.
Consider tracking a 'cost-weighted hit rate' that accounts for the expense of cache misses. A 50% hit rate on a 2-second query is much more valuable than a 99% hit rate on a 5ms query. Weight your metrics by the cost of the underlying operation.
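One way to implement that advice, sketched below with hypothetical traffic figures: compute the share of backend cost that hits actually avoided, rather than the share of requests that hit.

```typescript
// Cost-weighted hit rate: weight each pattern by the backend cost its
// misses incur, so expensive misses dominate the score.
// The figures mirror the table above and are illustrative only.
interface PatternCost {
  hits: number;
  misses: number;
  missCostMs: number; // cost of one miss against the backend
}

function costWeightedHitRate(patterns: PatternCost[]): number {
  let savedCost = 0; // backend time avoided by hits
  let totalCost = 0; // backend time if nothing were cached
  for (const p of patterns) {
    savedCost += p.hits * p.missCostMs;
    totalCost += (p.hits + p.misses) * p.missCostMs;
  }
  return totalCost === 0 ? 1 : savedCost / totalCost;
}

const weighted = costWeightedHitRate([
  { hits: 9_500, misses: 500, missCostMs: 10 },      // product: 95% hit rate, cheap misses
  { hits: 4_500, misses: 5_500, missCostMs: 2_000 }, // recommendations: 45% hit rate, costly misses
]);
console.log(`Cost-weighted hit rate: ${(weighted * 100).toFixed(1)}%`); // ≈45%, versus a raw 70% on the same traffic
```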
A well-designed cache dashboard provides at-a-glance understanding of cache health while enabling deep-dive investigation when issues arise. The dashboard should answer: "Is the cache healthy?" within 5 seconds of viewing.
{ "title": "Cache Health Dashboard", "panels": [ { "title": "Overall Hit Rate", "type": "stat", "targets": [{ "expr": "sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m]))) * 100", "legendFormat": "Hit Rate %" }], "fieldConfig": { "defaults": { "thresholds": { "steps": [ { "value": 0, "color": "red" }, { "value": 70, "color": "yellow" }, { "value": 85, "color": "green" } ] }, "unit": "percent" } } }, { "title": "Hit Rate Trend (vs 24h ago)", "type": "timeseries", "targets": [ { "expr": "sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m]))) * 100", "legendFormat": "Current" }, { "expr": "sum(rate(cache_hits_total[5m] offset 24h)) / (sum(rate(cache_hits_total[5m] offset 24h)) + sum(rate(cache_misses_total[5m] offset 24h))) * 100", "legendFormat": "24h Ago" } ] }, { "title": "Hit Rate by Key Pattern", "type": "piechart", "targets": [{ "expr": "sum by (key_pattern) (rate(cache_hits_total[5m]))", "legendFormat": "{{key_pattern}}" }] }, { "title": "Cache Operation Latency (P95)", "type": "timeseries", "targets": [{ "expr": "histogram_quantile(0.95, sum by (le, operation) (rate(cache_operation_duration_seconds_bucket[5m])))", "legendFormat": "{{operation}} P95" }] }, { "title": "Evictions by Reason", "type": "timeseries", "targets": [{ "expr": "sum by (reason) (rate(cache_evictions_total[5m]))", "legendFormat": "{{reason}}" }] }, { "title": "Memory Utilization", "type": "gauge", "targets": [{ "expr": "cache_memory_bytes / cache_memory_limit_bytes * 100", "legendFormat": "Memory %" }], "fieldConfig": { "defaults": { "thresholds": { "steps": [ { "value": 0, "color": "green" }, { "value": 70, "color": "yellow" }, { "value": 90, "color": "red" } ] } } } }, { "title": "Top Missed Key Patterns", "type": "table", "targets": [{ "expr": "topk(10, sum by (key_pattern) (rate(cache_misses_total[5m])))", "legendFormat": "{{key_pattern}}" }], "transformations": [ { "id": "sortBy", "options": { "fields": { "Value": { "order": "desc" } } } } ] } ]}Dashboards are for investigation; alerts are for detection. Proper cache alerting catches issues before users notice degraded performance.
Cache alerts fall into two categories: absolute-threshold alerts, which fire when a metric crosses a fixed floor or ceiling, and baseline-relative alerts, which fire when a metric deviates sharply from its own recent history. The rules below include both kinds.
```yaml
groups:
  - name: cache-alerts
    rules:
      # Hit rate dropped below acceptable threshold
      - alert: CacheHitRateLow
        expr: |
          (
            sum(rate(cache_hits_total[5m]))
            / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
          ) < 0.7
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 70%"
          description: "Cache hit rate is {{ $value | humanizePercentage }}, indicating potential cache issues"

      # Hit rate dropped significantly from baseline
      - alert: CacheHitRateDegraded
        expr: |
          (
            sum(rate(cache_hits_total[5m]))
            / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
          ) < (
            avg_over_time(
              (
                sum(rate(cache_hits_total[5m]))
                / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
              )[24h:5m]
            ) * 0.8
          )
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate degraded by more than 20%"
          description: "Current hit rate is 20%+ below 24h average"

      # Specific key pattern has critically low hit rate
      - alert: CachePatternHitRateCritical
        expr: |
          (
            sum by (key_pattern) (rate(cache_hits_total[5m]))
            / (sum by (key_pattern) (rate(cache_hits_total[5m]))
               + sum by (key_pattern) (rate(cache_misses_total[5m])))
          ) < 0.5
          and
          sum by (key_pattern) (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) > 100
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Key pattern {{ $labels.key_pattern }} has critical hit rate"
          description: "Hit rate for {{ $labels.key_pattern }} is {{ $value | humanizePercentage }}"

      # High eviction rate indicates memory pressure
      - alert: CacheEvictionRateHigh
        expr: sum(rate(cache_evictions_total{reason="lru"}[5m])) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High cache eviction rate"
          description: "{{ $value }} evictions/second indicates cache is under memory pressure"

      # Cache latency degraded
      - alert: CacheLatencyHigh
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(cache_operation_duration_seconds_bucket{operation="get"}[5m]))
          ) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache GET latency P95 above 10ms"
          description: "P95 cache read latency is {{ $value | humanizeDuration }}"

      # Cache memory approaching limit
      - alert: CacheMemoryHigh
        expr: cache_memory_bytes / cache_memory_limit_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache memory utilization above 90%"
          description: "Cache is using {{ $value | humanizePercentage }} of available memory"

      # No cache hits - cache may be down
      - alert: CacheNoHits
        expr: sum(rate(cache_hits_total[5m])) == 0 and sum(rate(cache_misses_total[5m])) > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Cache is returning zero hits"
          description: "Cache may be offline or completely flushed"
```

Set alert thresholds based on actual impact, not theoretical ideals. A 70% hit rate might trigger an alert but not require immediate action. Use severity levels (info, warning, critical) and escalation paths. Reserve critical alerts for issues that genuinely require immediate intervention.
Cache metrics transform caching from a black box into a transparent, observable system component. Effective metrics enable proactive optimization rather than reactive firefighting.
What's next:
With comprehensive metrics in place, we'll explore Debugging Cache Issues—systematic approaches to diagnosing cache problems when metrics indicate something is wrong.
You now understand how to instrument caches for comprehensive metric collection, interpret metric patterns to diagnose issues, build effective monitoring dashboards, and configure alerts that catch problems before they impact users.