Every time your system needs data, it asks a simple question: Is this data already in the cache? The answer—yes or no—determines whether you serve the request in microseconds or milliseconds. This binary outcome, multiplied across millions of requests, defines whether your system feels instantaneous or sluggish, whether your infrastructure costs are reasonable or astronomical.
Understanding the difference between cache hits and misses, and more importantly, understanding why hits occur and how to maximize them, is fundamental to building high-performance systems. In this page, we'll dissect the mechanics of cache access, explore the critical concept of hit rate, and examine how hit rate impacts every aspect of system behavior.
By the end of this page, you will understand the precise mechanics of cache hits and misses, how to calculate and interpret hit rates, how hit rate affects latency distribution and backend load, and the subtle factors that influence whether your cache is performing well or poorly.
A cache hit occurs when requested data is found in the cache and can be served directly without accessing the slower backend data source. This is the happy path—the scenario you want to optimize for.
The cache hit sequence:
1. The application requests data by key.
2. The cache performs a lookup, typically a hash-table operation.
3. The entry is found and is still valid (not expired or invalidated).
4. The cached value is returned directly to the application; the backend is never touched.
The entire sequence typically completes in well under a millisecond: nanoseconds to microseconds for an in-process cache, sub-millisecond to a few milliseconds for a network-accessible one.
Why cache hits are fast:
Cache hits are fast for several reinforcing reasons:
- Memory, not disk: caches hold data in RAM, avoiding disk I/O entirely.
- No query execution: a hit skips query parsing, planning, and execution on the backend; the answer was computed once and stored.
- Proximity: an in-process cache avoids the network altogether, and even a networked cache is usually a single fast round-trip away.
- Precomputed form: entries often store the final, serialized result, so nothing needs to be recomputed.
The hit's contribution to perceived performance:
Users experience the average response time of your system, which is heavily weighted by cache hits. If 95% of requests hit the cache (0.5ms each) and 5% miss (100ms each), your average response time is:
0.95 × 0.5ms + 0.05 × 100ms = 0.475ms + 5ms = 5.475ms
The 5% of misses dominate the average! This is why even small improvements in hit rate can dramatically improve perceived performance.
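The arithmetic above generalizes to a simple weighted average. A minimal sketch (the function name is illustrative; the latency figures match the example):

```python
def average_latency(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Expected response time as a hit-rate-weighted average of the two paths."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

# The example from the text: 95% hits at 0.5ms, 5% misses at 100ms.
print(average_latency(0.95, 0.5, 100))  # 5.475

# Small hit-rate gains shave the average dramatically at the high end.
for h in (0.95, 0.97, 0.99):
    print(f"{h:.0%}: {average_latency(h, 0.5, 100):.3f} ms")
```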
A 'warm' cache is one that has been populated with data through normal operation. A 'cold' cache (just started, just cleared) has no data and will miss on everything. System performance immediately after a cache restart can be significantly worse until the cache warms up—a phenomenon called the 'cold start problem.'
A cache miss occurs when requested data is not found in the cache (or is found but stale/invalid). The application must then fetch data from the slower backend source. This is the expensive path that caching aims to minimize.
The cache miss sequence:
1. The application requests data by key.
2. The cache lookup finds nothing, or finds an entry that is expired or invalidated.
3. The application fetches the data from the slower backend (database, service call, computation).
4. The result is written into the cache so future requests can hit.
5. The value is returned to the application.
This sequence takes significantly longer—typically 10-1000x longer than a cache hit, depending on the backend.
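To make the two paths concrete, here is a minimal cache-aside sketch; the `fetch_from_backend` stub, the TTL value, and the in-memory dict stand in for whatever backend and cache store you actually use:

```python
import time

cache: dict = {}     # key -> (value, expires_at); stands in for a real cache store
TTL_SECONDS = 60

def fetch_from_backend(key):
    # Placeholder for the slow path: a database query, RPC, or computation.
    time.sleep(0.1)  # simulate ~100ms of backend latency
    return f"value-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value                 # cache hit: served without the backend
        del cache[key]                   # expiration miss: entry exists but is stale
    value = fetch_from_backend(key)      # miss path: pay the backend cost
    cache[key] = (value, time.monotonic() + TTL_SECONDS)
    return value
```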
Types of cache misses:
Not all misses are equal. Understanding miss types helps diagnose cache behavior:
1. Compulsory Miss (Cold Miss)
The very first request for a piece of data. No cache can avoid this—the data has never been accessed before. Compulsory misses are inevitable; they represent the cost of populating an empty cache.
2. Capacity Miss
The cache is full and had to evict this entry to make room for other data. The data was previously cached but has been removed. Indicates the cache is undersized for the working set.
3. Conflict Miss
In caches with limited associativity (common in CPU caches), data may be evicted even when the cache isn't full because of hash collisions. Less common in application-level caches.
4. Coherence/Invalidation Miss
The entry was explicitly invalidated due to a data update. The data existed but was removed to maintain correctness. This is an intentional miss.
5. Expiration Miss
The entry's TTL expired. The data was cached but is now considered stale. Indicates TTL may be too short, or this is working as designed.
When debugging poor cache performance, classify your misses. High compulsory misses on startup are normal. Persistent capacity misses indicate you need a larger cache. Frequent invalidation misses suggest overly aggressive invalidation. The fix depends on the miss type.
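One way to act on this advice is to count misses by type as they occur. A hedged sketch building on the dict-based cache above (the "capacity or invalidation" bucket is approximate, since separating those two requires eviction metadata this toy cache doesn't keep):

```python
import time
from collections import Counter

cache: dict = {}           # key -> (value, expires_at)
ever_cached: set = set()   # keys that have been stored at least once
miss_types = Counter()

def classify_and_get(key, ttl=60.0, loader=lambda k: f"value-for-{k}"):
    now = time.monotonic()
    entry = cache.get(key)
    if entry is not None and now < entry[1]:
        return entry[0]                                  # hit: nothing to classify
    if entry is not None:
        miss_types["expiration"] += 1                    # cached, but TTL elapsed
    elif key in ever_cached:
        miss_types["capacity_or_invalidation"] += 1      # was cached, then removed
    else:
        miss_types["compulsory"] += 1                    # first-ever access
    value = loader(key)
    cache[key] = (value, now + ttl)
    ever_cached.add(key)
    return value
```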
The cache hit rate is the most important metric for evaluating cache effectiveness. It measures the percentage of requests that are served from the cache without accessing the backend.
Hit Rate Formula:
Hit Rate = (Cache Hits) / (Cache Hits + Cache Misses) × 100%
Alternatively stated:
Hit Rate = (Cache Hits) / (Total Requests) × 100%
A hit rate of 95% means 95 out of every 100 requests are served from cache. Only 5 requests reach the backend.
Miss Rate:
The complement of hit rate is miss rate:
Miss Rate = 100% - Hit Rate
Miss Rate = (Cache Misses) / (Total Requests) × 100%
At 95% hit rate, miss rate is 5%.
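In code, the calculation is one line; guarding against a zero denominator is the only subtlety worth noting (the counter values here are made up):

```python
def hit_rate(hits: int, misses: int) -> float:
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

hits, misses = 95_000, 5_000
print(f"hit rate:  {hit_rate(hits, misses):.1f}%")        # 95.0%
print(f"miss rate: {100 - hit_rate(hits, misses):.1f}%")  # 5.0%
```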
| Hit Rate | Miss Rate | Assessment | Typical Scenario |
|---|---|---|---|
| < 50% | > 50% | Poor - Cache likely misconfigured | Wrong data being cached, TTL too short |
| 50-80% | 20-50% | Moderate - Room for improvement | Diverse workload, undersized cache |
| 80-90% | 10-20% | Good - Effective caching | Well-configured general workload |
| 90-95% | 5-10% | Very Good - Optimized | Hot data well-cached |
| 95-99% | 1-5% | Excellent - High-performance | Highly cacheable workload |
| > 99% | < 1% | Exceptional - Rare to achieve | Static/infrequently changing data |
Interpreting hit rates:
Hit rate interpretation depends heavily on context:
- Cost of a miss: a 70% hit rate in front of a 10ms database query is a different situation than 70% in front of a multi-second computation.
- Workload cacheability: read-heavy, slowly changing data supports far higher hit rates than write-heavy or highly personalized data.
- Data volatility: frequently updated data forces invalidations and short TTLs, capping the achievable hit rate.
- Absolute traffic: at very high request volumes, even a small miss rate is a large absolute backend load.
Hit rate over time:
Hit rates aren't static. They change based on:
- Traffic patterns: daily and weekly cycles shift which data is hot.
- Deployments and restarts: a cold cache depresses hit rate until it warms up.
- Invalidation bursts: bulk data updates temporarily evict large swaths of entries.
- Working set drift: new content, products, or users shift access toward not-yet-cached keys.
Hit rates below 50% usually indicate a fundamental problem: caching the wrong data, TTLs too short, cache too small, or key design issues causing unnecessary uniqueness. A low hit rate means you're paying for cache infrastructure without getting corresponding benefit.
One of the most important, and counterintuitive, properties of hit rate is that its impact on system performance is non-linear. The same percentage-point improvement matters far more at an already-high hit rate than at a low one.
The backend load calculation:
If your system receives R requests per second and your cache hit rate is H, the requests reaching your backend are:
Backend Load = R × (1 - H)
Consider 100,000 requests per second:
| Hit Rate | Miss Rate | Backend QPS | Improvement vs. Previous Row |
|---|---|---|---|
| 80% | 20% | 20,000 | — |
| 85% | 15% | 15,000 | 5,000 fewer QPS (25% reduction) |
| 90% | 10% | 10,000 | 5,000 fewer QPS (33% reduction) |
| 95% | 5% | 5,000 | 5,000 fewer QPS (50% reduction) |
| 99% | 1% | 1,000 | 4,000 fewer QPS (80% reduction) |
Notice the pattern: moving from 90% to 95% hit rate halves your backend load. Moving from 95% to 99% cuts backend load by 80%. At high hit rates, each percentage point improvement has outsized impact.
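The table above can be reproduced directly from the formula; a small sketch:

```python
REQUESTS_PER_SECOND = 100_000

def backend_qps(hit_rate: float) -> float:
    # Backend Load = R * (1 - H)
    return REQUESTS_PER_SECOND * (1 - hit_rate)

prev = None
for h in (0.80, 0.85, 0.90, 0.95, 0.99):
    qps = backend_qps(h)
    if prev is None:
        print(f"{h:.0%} hit rate -> {qps:,.0f} backend QPS")
    else:
        print(f"{h:.0%} hit rate -> {qps:,.0f} backend QPS "
              f"({(prev - qps) / prev:.0%} reduction from previous row)")
    prev = qps
```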
The latency distribution effect:
Hit rate also determines your latency distribution. Let's model a system where cache hits complete in 1ms and cache misses take 100ms:
| Hit Rate | Average Latency | p50 Latency | p99 Latency |
|---|---|---|---|
| 50% | 50.5ms | ~50ms (mix) | 100ms |
| 80% | 20.8ms | 1ms | 100ms |
| 90% | 10.9ms | 1ms | 100ms |
| 95% | 5.95ms | 1ms | 100ms |
| 99% | 1.99ms | 1ms | 100ms |
Important observation: Your p50 (median) latency improves dramatically with hit rate, but your p99 latency remains at the miss latency until you achieve extremely high hit rates. This is why tail latency is often dominated by cache misses even in well-optimized systems.
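A quick Monte Carlo simulation makes the percentile behavior visible; it assumes only the same two-point latency model as the table (1ms hits, 100ms misses):

```python
import random
import statistics

HIT_MS, MISS_MS = 1.0, 100.0   # the model from the table above

def simulate(hit_rate, n=100_000):
    samples = sorted(HIT_MS if random.random() < hit_rate else MISS_MS
                     for _ in range(n))
    return statistics.fmean(samples), samples[n // 2], samples[int(n * 0.99)]

for h in (0.50, 0.90, 0.95, 0.99):
    avg, p50, p99 = simulate(h)
    print(f"{h:.0%} hit rate -> avg {avg:6.2f} ms, p50 {p50:.0f} ms, p99 {p99:.0f} ms")
```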
Capacity planning implications:
Because hit rate determines backend load, capacity planning must account for cache behavior:
- Provision the backend for realistic worst cases (a cold cache after a deploy or failover), not just the steady-state hit rate.
- Treat the cache tier as load-bearing infrastructure: if a cache serving a 99% hit rate fails, the backend abruptly sees 100x its normal traffic.
- Model growth in terms of misses rather than total requests, since backend load scales with R × (1 - H).
At 99% hit rate, your backend sees 1% of traffic. At 99.9% hit rate, it sees only 0.1%—a 10x reduction. This is why highly cacheable workloads (CDN, static content) can serve millions of requests per second with minimal origin infrastructure. Every nine you add multiplies the benefit.
Achieving high hit rates requires understanding and optimizing the factors that influence whether requests hit or miss. These factors are often interconnected and must be balanced against each other.
The working set concept:
The working set is the subset of data that is actively accessed during a given time window. Understanding your working set is crucial for cache sizing: if the cache can hold the working set, hit rates stay high; if it can't, capacity misses dominate.
Measuring working set:
Estimate working set by analyzing:
- The number of unique keys accessed per time window (hour, day, week).
- The average entry size.
- Their product, which gives the memory needed to hold that window's working set.
Example analysis:
- 1 hour: 50,000 unique keys accessed
- 1 day: 200,000 unique keys accessed
- 1 week: 500,000 unique keys accessed
- Average entry size: 2 KB
1-hour working set: ~100 MB
1-day working set: ~400 MB
1-week working set: ~1 GB
If your cache is 512 MB with a 1-hour TTL, the 1-hour working set (~100 MB) fits, but any key re-accessed on daily or weekly timescales will already have expired or been evicted: the configuration is undersized for the broader workload.
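The sizing check is simple multiplication; a sketch using the numbers from the example (decimal units, matching the text's arithmetic):

```python
AVG_ENTRY_BYTES = 2_000          # ~2 KB average entry, as in the example
CACHE_BYTES = 512_000_000        # a 512 MB cache

windows = {"1 hour": 50_000, "1 day": 200_000, "1 week": 500_000}

for window, unique_keys in windows.items():
    need = unique_keys * AVG_ENTRY_BYTES
    verdict = "fits" if need <= CACHE_BYTES else "does NOT fit"
    print(f"{window}: ~{need / 1e6:,.0f} MB working set ({verdict})")
```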
Real workloads typically follow Pareto-like distributions: a small percentage of data receives most of the traffic. This is good news for caching—a cache that can hold just the 'hot' data can achieve high hit rates. Analyze your access patterns; they're probably more cacheable than random distributions would suggest.
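You can check how much a skewed distribution helps with a short simulation. The sketch below generates a Zipf-like access stream and measures the hit rate of an idealized cache that pins only the hottest 1% of keys; all parameters are illustrative:

```python
import random
from collections import Counter

N_KEYS, N_REQUESTS, CACHE_SLOTS = 100_000, 1_000_000, 1_000  # cache holds 1% of keys

# Zipf-like popularity: the key at rank r is accessed with weight 1/r.
weights = [1 / rank for rank in range(1, N_KEYS + 1)]
stream = random.choices(range(N_KEYS), weights=weights, k=N_REQUESTS)

# Idealized cache: permanently holds the CACHE_SLOTS most popular keys.
hot = {key for key, _ in Counter(stream).most_common(CACHE_SLOTS)}
hits = sum(1 for key in stream if key in hot)
print(f"hit rate caching 1% of keys: {hits / N_REQUESTS:.1%}")  # well over half
```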
One of the most dangerous cache-related failure modes is the cache miss storm, also known as the thundering herd problem. This occurs when a large number of requests simultaneously miss the cache and all hit the backend, potentially overwhelming it.
How thundering herds form:
Imagine a popular cached entry—say, the homepage data—that 1,000 requests per second access. At exactly the moment this entry expires:
1. Every in-flight request for that key misses simultaneously.
2. Each of them independently queries the backend for the same data.
3. The backend absorbs a sudden burst of identical, expensive queries.
4. Each response is written back to the cache, almost all of them redundantly.
5. If the backend slows under the burst, requests queue up and the spike compounds.
Scenarios that trigger thundering herds:
- Hot-key expiration: a heavily accessed entry's TTL lapses during peak traffic.
- Cold cache: a restart, flush, or failover empties the cache and everything misses at once.
- Mass invalidation: a deploy or bulk data update invalidates many entries together.
- Synchronized TTLs: entries created at the same time expire at the same time.
Mitigation strategies:
- Request coalescing (single-flight): let one request fetch from the backend while concurrent requests for the same key wait for its result (sketched below).
- Stale-while-revalidate: serve the expired value while a single background refresh runs.
- TTL jitter: add randomness to expiration times so hot entries don't expire in lockstep.
- Probabilistic early refresh: refresh entries slightly before expiry so the TTL never lapses under load.
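A minimal thread-based single-flight sketch, assuming a threaded server; the class and method names are illustrative, and error propagation to waiting requests is omitted for brevity:

```python
import threading

class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> (Event, result-holder dict)

    def load(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        event, result = entry
        if leader:
            try:
                result["value"] = loader(key)   # only this thread hits the backend
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                     # release all waiting followers
        else:
            event.wait()                        # followers block instead of querying
        return result["value"]
```

With this in place, 1,000 simultaneous misses on the same key become one backend query and 999 short waits.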
Thundering herds are often discovered in production during high-traffic periods. A site that works fine at moderate load suddenly crashes when a popular cache entry expires during peak traffic. Implement request coalescing proactively—don't wait for the outage.
You can't optimize what you don't measure. Effective cache management requires continuous monitoring of hit rates and related metrics. Let's explore how to instrument and interpret cache metrics.
```
# Cache metrics exposed by application
# Example Prometheus metric exposition

# Counter: Total cache hits
cache_hits_total{cache="product_cache"} 15234567

# Counter: Total cache misses
cache_misses_total{cache="product_cache"} 892341

# Gauge: Current number of cached entries
cache_entries{cache="product_cache"} 48234

# Gauge: Memory usage in bytes
cache_memory_bytes{cache="product_cache"} 268435456

# Counter: Evictions by reason
cache_evictions_total{cache="product_cache",reason="capacity"} 12456
cache_evictions_total{cache="product_cache",reason="expired"} 34521

# Histogram: Cache operation latency
cache_operation_duration_seconds_bucket{cache="product_cache",op="get",le="0.001"} 14000000
cache_operation_duration_seconds_bucket{cache="product_cache",op="get",le="0.01"} 15500000

# Derived metric (PromQL query)
# Hit rate = hits / (hits + misses)
# rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))
```

Alerting on cache metrics:
Set up alerts for cache health:
- Hit rate dropping below its normal baseline (sustained, not momentary dips).
- Eviction rate spiking, especially capacity evictions, which signal an undersized cache.
- Memory usage approaching the configured limit.
- Cache operation latency climbing, which can indicate an overloaded cache tier.
- Backend QPS rising without a corresponding traffic increase, the downstream symptom of a falling hit rate.
A sketch of the hit-rate check follows.
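A minimal sketch of the hit-rate alert logic, computing a windowed rate from two samples of the monotonically increasing counters (the same idea as the PromQL `rate()` expression above); the threshold and sample values are assumptions:

```python
ALERT_THRESHOLD = 0.90   # assumed SLO: alert if the 5-minute hit rate falls below 90%

def windowed_hit_rate(hits_now, hits_then, misses_now, misses_then):
    """Hit rate over a window, from two samples of ever-increasing counters."""
    dh = hits_now - hits_then
    dm = misses_now - misses_then
    return dh / (dh + dm) if (dh + dm) else 1.0

# Counter samples taken five minutes apart (hypothetical values).
rate = windowed_hit_rate(15_234_567, 15_100_000, 892_341, 875_000)
if rate < ALERT_THRESHOLD:
    print(f"ALERT: cache hit rate dropped to {rate:.1%}")   # fires at ~88.6%
```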
Dashboards and visualization:
Effective cache dashboards show:
- Hit rate over time, per cache and per namespace.
- Backend QPS plotted alongside miss rate, since the two should move together.
- Evictions broken down by reason (capacity versus expiration).
- Memory usage against the configured limit.
- Cache operation latency percentiles.
Track hit rates per cache 'namespace' or data type. Aggregate hit rate can hide problems: 99% hit rate on session data might mask 40% hit rate on product data. Segmented metrics reveal which caches need attention.
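A minimal sketch of segmented tracking; the namespaces and counts are illustrative and reproduce the session/product example above:

```python
from collections import defaultdict

class SegmentedStats:
    """Track hit rate per cache namespace so one segment can't mask another."""

    def __init__(self):
        self.hits = defaultdict(int)
        self.misses = defaultdict(int)

    def record(self, namespace, hit):
        (self.hits if hit else self.misses)[namespace] += 1

    def report(self):
        for ns in sorted(set(self.hits) | set(self.misses)):
            h, m = self.hits[ns], self.misses[ns]
            print(f"{ns:>8}: {h / (h + m):6.1%} hit rate over {h + m:,} requests")

stats = SegmentedStats()
for _ in range(99):
    stats.record("session", True)
stats.record("session", False)
for _ in range(40):
    stats.record("product", True)
for _ in range(60):
    stats.record("product", False)
stats.report()   # session 99.0%, product 40.0%; the 69.5% aggregate hides the gap
```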
Understanding the mechanics of cache hits and misses is fundamental to building and operating cached systems effectively. Let's consolidate the key insights:
- A hit is served in microseconds to low milliseconds; a miss costs 10-1000x more, so misses dominate average latency even at high hit rates.
- Misses come in distinct types (compulsory, capacity, conflict, invalidation, expiration), and the right fix depends on the type.
- Hit rate is hits divided by total requests, and its impact is non-linear: each additional 'nine' multiplies the reduction in backend load.
- p50 latency improves quickly with hit rate, but p99 stays pinned at the miss latency until hit rates are extremely high.
- Working set size relative to cache size largely determines the achievable hit rate, and skewed access patterns work in your favor.
- Thundering herds demand proactive mitigation: request coalescing, TTL jitter, stale-while-revalidate.
- Measure continuously, and segment hit-rate metrics by namespace so healthy caches don't hide unhealthy ones.
What's next:
Now that we understand how cache hits and misses work and their impact on systems, the next page explores the performance improvement potential of caching. We'll quantify just how much faster and more scalable systems become with effective caching, and explore the mathematical models that help us predict cache behavior.
You now understand the fundamental mechanics of cache hits and misses, how to measure hit rates, and the factors that influence cache effectiveness. This knowledge is essential for designing, operating, and troubleshooting cached systems.