Caching doesn't just make systems a little faster—it can transform them fundamentally. A system that takes 500 milliseconds to respond can become one that responds in 5 milliseconds. A database that struggles at 1,000 queries per second can sit behind a system serving 100,000 requests per second. Infrastructure costs can drop by 90% or more.
These aren't hypothetical improvements. They're what effective caching achieves in production systems every day. But to achieve these gains, you need to understand the potential—how much improvement is possible, under what conditions, and how to predict the impact of caching before you implement it.
In this page, we'll quantify the performance benefits of caching with mathematical precision, explore the models that predict cache behavior, and understand the boundaries of what caching can and cannot achieve.
By the end of this page, you will be able to calculate expected latency improvements from caching, model throughput gains mathematically, use Amdahl's Law to understand caching limits, and predict the ROI of cache investments before implementing them.
The most immediate and visible benefit of caching is latency reduction. By serving data from fast storage instead of slow backends, caching can reduce response times by orders of magnitude.
Understanding the latency gap:
The latency difference between cached and uncached access is profound:
| Data Source | Typical Latency | Latency Ratio vs. Cache |
|---|---|---|
| In-process cache (HashMap) | 100 nanoseconds | 1x (baseline) |
| Local Redis/Memcached | 200-500 microseconds | 2,000-5,000x |
| Remote Redis (same DC) | 0.5-2 milliseconds | 5,000-20,000x |
| PostgreSQL simple query | 5-20 milliseconds | 50,000-200,000x |
| PostgreSQL complex query | 50-500 milliseconds | 500,000-5,000,000x |
| External API call | 100-1000 milliseconds | 1,000,000-10,000,000x |
| Cross-region database | 200-500 milliseconds | 2,000,000-5,000,000x |
Calculating average latency:
With caching, your average latency becomes a weighted average of hit and miss latencies:
Average Latency = (Hit Rate × Hit Latency) + (Miss Rate × Miss Latency)
Example calculation:
Consider a product page whose product data takes 80ms to fetch from the database but only 1ms to serve from cache, with a 95% hit rate:
Average Latency = (0.95 × 1ms) + (0.05 × 80ms)
= 0.95ms + 4ms
= 4.95ms
Without caching (100% miss rate): 80ms
With 95% hit rate: 4.95ms
That's a 16x improvement in average latency.
The latency distribution shift:
Caching doesn't just improve averages—it fundamentally shifts the latency distribution. Without caching, every request is slow. With caching, most requests are fast, with a tail of slower requests (the misses).
| Hit Rate | p50 (Median) | p90 | p95 | p99 | p99.9 |
|---|---|---|---|---|---|
| 0% (no cache) | 80ms | 80ms | 80ms | 80ms | 80ms |
| 50% | ~40ms | 80ms | 80ms | 80ms | 80ms |
| 90% | 1ms | 80ms | 80ms | 80ms | 80ms |
| 95% | 1ms | 1ms | 80ms | 80ms | 80ms |
| 99% | 1ms | 1ms | 1ms | 80ms | 80ms |
| 99.9% | 1ms | 1ms | 1ms | 1ms | 80ms |
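The table above can be reproduced with a tiny model: treat latency as a two-point distribution where hits are fast and misses pay the full backend cost. A minimal sketch, assuming the 1ms hit / 80ms miss figures from the table:

```python
def latency_at_percentile(p: float, hit_rate: float,
                          hit_ms: float = 1.0, miss_ms: float = 80.0) -> float:
    """Latency at percentile p (0-1) for a two-point hit/miss distribution:
    every percentile below the hit rate sees the fast path, the rest pay
    the full miss cost."""
    return hit_ms if p < hit_rate else miss_ms

# Reproduce the 95% hit rate row of the table
for p in (0.50, 0.90, 0.95, 0.99):
    print(f"p{int(p * 100)}: {latency_at_percentile(p, hit_rate=0.95):.0f}ms")
# p50: 1ms, p90: 1ms, p95: 80ms, p99: 80ms
```

The step function makes the key point visible: a percentile only improves once the hit rate climbs past it.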
Don't fixate only on average latency. Your p95 or p99 latency determines the experience of your slowest requests—which users definitely notice. With 95% hit rate, 5% of users still experience the full miss latency. For critical paths, you may need even higher hit rates.
Beyond latency, caching dramatically increases your system's throughput capacity. By serving most requests from cache, your backend can handle the remaining traffic comfortably—even under loads that would otherwise crush it.
The throughput multiplier formula:
If your backend can handle B requests per second without caching, and your cache has hit rate H, your effective system capacity becomes:
Effective Capacity = B ÷ (1 - H)
This is the throughput multiplier effect.
| Hit Rate | Backend Capacity | System Capacity | Multiplier |
|---|---|---|---|
| 0% | 1,000 RPS | 1,000 RPS | 1x |
| 50% | 1,000 RPS | 2,000 RPS | 2x |
| 80% | 1,000 RPS | 5,000 RPS | 5x |
| 90% | 1,000 RPS | 10,000 RPS | 10x |
| 95% | 1,000 RPS | 20,000 RPS | 20x |
| 99% | 1,000 RPS | 100,000 RPS | 100x |
| 99.9% | 1,000 RPS | 1,000,000 RPS | 1,000x |
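The multiplier column follows directly from the formula; a quick sketch (the figures match the table, not any particular system):

```python
def effective_capacity(backend_rps: float, hit_rate: float) -> float:
    """Effective system capacity when the cache absorbs hit_rate of all
    requests: Effective Capacity = B / (1 - H)."""
    if hit_rate >= 1.0:
        return float('inf')  # a perfect cache never touches the backend
    return backend_rps / (1.0 - hit_rate)

for h in (0.0, 0.5, 0.9, 0.95, 0.99):
    print(f"{h:.0%} hit rate -> {effective_capacity(1000, h):,.0f} RPS")
```

Note the nonlinearity: each additional "nine" of hit rate multiplies capacity rather than adding to it.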
Real-world interpretation:
At 95% hit rate, a database that can handle 1,000 queries per second can effectively support a system receiving 20,000 requests per second. The cache absorbs 19,000 requests; only 1,000 reach the database.
This is why caching is the first technique to consider when scaling read-heavy systems. Before adding read replicas, before sharding, before exotic database architectures—try caching. It's often sufficient and far simpler.
The scaling curve:
Without caching, capacity scales roughly linearly with infrastructure investment: doubling the load means roughly doubling the database hardware. With effective caching, cost grows sub-linearly with traffic, because most additional load is absorbed by the cache tier. Cache infrastructure scales more efficiently than database infrastructure—adding Redis memory is far cheaper than adding database read replicas.
Most applications find a sweet spot where caching handles >90% of read traffic, database handles writes plus cache misses, and the system scales by adding cache capacity rather than database capacity. This architecture is simpler, cheaper, and more resilient than alternatives.
While caching offers dramatic improvements, these improvements have limits. Amdahl's Law, originally formulated for parallel computing, applies equally well to caching optimization.
Amdahl's Law for Caching:
The maximum speedup achievable by caching is limited by the portion of the workload that cannot be cached.
Maximum Speedup = 1 ÷ (1 - Cacheable Fraction)
Where Cacheable Fraction is the portion of requests that can be served from cache.
Example:
If 90% of your requests are cacheable (read queries for static-ish data) and 10% are uncacheable (writes, real-time data):
Maximum Speedup = 1 ÷ (1 - 0.90) = 1 ÷ 0.10 = 10x
No matter how perfect your cache, you cannot achieve more than 10x speedup because 10% of traffic will always experience the uncached latency.
| Cacheable % | Uncacheable % | Maximum Speedup | Effective Latency* |
|---|---|---|---|
| 50% | 50% | 2x | 50% of original |
| 80% | 20% | 5x | 20% of original |
| 90% | 10% | 10x | 10% of original |
| 95% | 5% | 20x | 5% of original |
| 99% | 1% | 100x | 1% of original |
| 99.9% | 0.1% | 1,000x | 0.1% of original |
*Assuming perfect cache (100% hit rate on cacheable data, 0ms cache latency)
Practical implications:
Amdahl's Law tells us where to focus optimization efforts:
Identify the uncacheable fraction — What percentage of your traffic cannot be cached? Writes? Real-time data? Highly personalized content?
Optimize both cached and uncached paths — If 10% is uncacheable and takes 500ms, your average latency floor is 50ms even with perfect caching. You must also optimize the uncacheable path.
Expand what's cacheable — Often, data assumed to be uncacheable can be cached with creative approaches: short TTLs for near-real-time data, splitting personalized pages into a cacheable shared shell plus small per-user fragments, or caching intermediate query results rather than final responses.
Amdahl's Law also shows diminishing returns. Going from 90% to 95% cacheable doubles your maximum speedup (10x to 20x). Going from 95% to 99% multiplies it by 5x (20x to 100x). Going from 99% to 99.9% multiplies by 10x. At high cacheable fractions, further improvement requires disproportionate effort.
The cache-aside (or look-aside) pattern is the most common caching approach. Understanding its performance characteristics helps predict real-world behavior.
Cache-aside flow:
Latency model:
T_hit = T_cache_lookup + T_cache_read
T_miss = T_cache_lookup + T_db_query + T_cache_write
T_average = (H × T_hit) + ((1-H) × T_miss)
Where:
- H = Hit rate
- T_cache_lookup = Time to check cache (typically < 1ms)
- T_cache_read = Time to read from cache (typically < 1ms)
- T_db_query = Database query time (often 10-100ms+)
- T_cache_write = Time to populate cache (typically < 1ms)
```python
# Performance modeling for cache-aside pattern
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CacheConfig:
    cache_lookup_ms: float = 0.5   # Time to lookup in cache
    cache_read_ms: float = 0.5     # Time to read cached value
    cache_write_ms: float = 0.5    # Time to write to cache
    db_query_ms: float = 50.0      # Time for database query
    hit_rate: float = 0.95         # Cache hit rate (0-1)


def calculate_latencies(config: CacheConfig) -> Tuple[float, float, float]:
    """Calculate hit, miss, and average latencies."""
    # Hit path: lookup + read
    hit_latency = config.cache_lookup_ms + config.cache_read_ms
    # Miss path: lookup + db query + write
    miss_latency = (config.cache_lookup_ms
                    + config.db_query_ms
                    + config.cache_write_ms)
    # Weighted average
    avg_latency = (config.hit_rate * hit_latency
                   + (1 - config.hit_rate) * miss_latency)
    return hit_latency, miss_latency, avg_latency


def calculate_throughput_multiplier(hit_rate: float) -> float:
    """Calculate how much more traffic the system can handle."""
    if hit_rate >= 1.0:
        return float('inf')
    return 1.0 / (1.0 - hit_rate)


# Example usage
config = CacheConfig(hit_rate=0.95, db_query_ms=50)
hit_ms, miss_ms, avg_ms = calculate_latencies(config)

print(f"Hit latency: {hit_ms:.2f}ms")
print(f"Miss latency: {miss_ms:.2f}ms")
print(f"Average latency: {avg_ms:.2f}ms")
print(f"Speedup vs no cache: {config.db_query_ms / avg_ms:.1f}x")
print(f"Throughput multiplier: {calculate_throughput_multiplier(0.95):.1f}x")

# Output:
# Hit latency: 1.00ms
# Miss latency: 51.00ms
# Average latency: 3.50ms
# Speedup vs no cache: 14.3x
# Throughput multiplier: 20.0x
```

Sensitivity analysis:
Different parameters have different impacts on performance: because the miss path dominates the weighted average, hit rate and database query time matter far more than cache latency.
Key insight: Invest effort proportionally to impact. Improving hit rate almost always yields better returns than optimizing cache latency.
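A small sweep makes the sensitivity claim concrete. Using the same assumed figures as before (50ms backend, 1ms total cache path), compare halving cache latency against raising the hit rate:

```python
def avg_latency(hit_rate: float, cache_ms: float, db_ms: float = 50.0) -> float:
    """Average cache-aside latency: hits pay the cache round trip,
    misses pay the cache overhead plus the database query."""
    hit = cache_ms
    miss = cache_ms + db_ms
    return hit_rate * hit + (1 - hit_rate) * miss

base = avg_latency(0.95, 1.0)          # baseline: 95% hits, 1ms cache path
faster_cache = avg_latency(0.95, 0.5)  # halve cache latency
better_hits = avg_latency(0.99, 1.0)   # raise hit rate instead

print(f"baseline:     {base:.2f}ms")        # 3.50ms
print(f"faster cache: {faster_cache:.2f}ms")  # 3.00ms
print(f"better hits:  {better_hits:.2f}ms")   # 1.50ms
```

Halving cache latency shaves ~14% off the average; raising the hit rate from 95% to 99% cuts it by more than half.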
Before implementing caching, model the expected improvement using realistic latency estimates. This helps you predict ROI, set expectations, and identify whether caching is the right solution. Sometimes, the model reveals that optimizing the database query would be more effective.
Beyond raw latency and throughput, caching provides significant benefits to concurrency and connection management—often overlooked but critically important factors.
The connection pool problem:
Databases have connection limits, and each concurrent query typically holds a connection for its full duration. Without caching, every in-flight request competes for a connection. With a 95% hit rate, only 5% of requests ever touch the connection pool.
| Hit Rate | Concurrent Requests | DB Connections Needed | Connection Pool Pressure |
|---|---|---|---|
| 0% | 1,000 | 1,000 | Critical (likely exhausted) |
| 50% | 1,000 | 500 | High |
| 80% | 1,000 | 200 | Moderate |
| 90% | 1,000 | 100 | Comfortable |
| 95% | 1,000 | 50 | Low |
| 99% | 1,000 | 10 | Minimal |
Little's Law and caching:
Little's Law states: L = λW
Where:
- L = Average number of items in a system (concurrent operations)
- λ = Arrival rate (requests per second)
- W = Average time in system (latency)

For database connections:
Concurrent DB Operations = (Requests/sec × Miss Rate) × DB Query Time
Example (1,000 RPS, 95% hit rate, 50ms average query time):
Concurrent DB Operations = (1000 × 0.05) × 0.050
= 50 × 0.050
= 2.5 concurrent queries
With caching, a system handling 1,000 RPS only needs ~3 database connections on average, with some headroom for variance. Without caching (0% hit rate):
Concurrent DB Operations = 1000 × 0.050 = 50 concurrent queries
Caching reduces connection needs by 20x in this example.
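The Little's Law arithmetic above fits in a one-liner (the 1,000 RPS, 95% hit rate, and 50ms figures are the ones from the example):

```python
def concurrent_db_queries(rps: float, hit_rate: float, db_query_s: float) -> float:
    """Little's Law (L = lambda * W): average in-flight DB queries equals
    the miss traffic rate times the query duration."""
    return rps * (1 - hit_rate) * db_query_s

print(f"with cache: {concurrent_db_queries(1000, 0.95, 0.050):.1f} queries")  # 2.5
print(f"no cache:   {concurrent_db_queries(1000, 0.00, 0.050):.1f} queries")  # 50.0
```

This is the number to size your connection pool against (plus headroom for variance), not the raw request rate.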
Connection pool exhaustion is a common hidden bottleneck. When all connections are busy, new requests queue or fail. Symptoms include sporadic timeouts and latency spikes that don't correlate with database load. Caching's reduction of connection pressure often resolves these mysterious issues.
Performance improvement directly translates to cost savings. Let's quantify the financial impact of caching.
Cost per request calculation:
The cost of serving a request includes database I/O and CPU, application server time, internal network transfer, and (when caching is in place) the cache operation itself. Cached requests have minimal cost compared to uncached requests:
| Component | Uncached Cost | Cached Cost | Savings |
|---|---|---|---|
| Database I/O | High (query execution) | Zero | 100% |
| Database CPU | High (query processing) | Zero | 100% |
| App server time | High (waiting for DB) | Low (fast return) | ~95% |
| Network (internal) | Moderate | Lower | 50-80% |
| Cache operation | N/A | Low | N/A |
Total cost of ownership model:
Consider a workload of 10 million requests per day. Without caching, every one of those requests hits the database, which drives its sizing and cost. With a 95% cache hit rate, only 500,000 requests per day reach the database, so it can be provisioned far smaller—the cache tier that replaces that capacity is much cheaper.
Savings in this example: $5,500/month (79% reduction)
Calculate the ROI of caching before implementation: (Monthly Savings - Cache Cost) × 12 = Annual Savings. In most read-heavy systems, caching pays for itself within the first month and provides ongoing savings indefinitely.
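The ROI formula from the tip, sketched in code. The $5,500/month savings figure comes from the example above; the $300/month cache cost is a hypothetical figure for illustration:

```python
def annual_cache_roi(monthly_savings: float, monthly_cache_cost: float) -> float:
    """Annual net savings: (Monthly Savings - Cache Cost) x 12."""
    return (monthly_savings - monthly_cache_cost) * 12

# $5,500/month backend savings; $300/month cache cost (hypothetical)
print(f"${annual_cache_roi(5500, 300):,.0f}/year")  # $62,400/year
```

If the result is near zero or negative, caching is the wrong tool for that workload and the model has paid for itself before any code was written.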
Beyond performance and cost, caching provides resilience benefits that improve system reliability and availability. A well-designed cache can keep your system running even when backend components fail.
Caching as a circuit breaker:
When a database becomes slow or unavailable, a cache can continue serving previously fetched data, absorb read traffic that would otherwise pile onto the struggling backend, and buy time for recovery.
Stale data as a fallback:
When properly configured, caches can serve stale data during backend outages.
For many applications, slightly stale data is far preferable to no data at all. A product catalog showing prices from 5 minutes ago is better than an error page.
Traffic absorption during incidents:
During backend incidents, caching provides critical protection: it keeps absorbing the bulk of read traffic, shielding the degraded backend from full load and keeping user-facing error rates far lower than they would otherwise be.
Consider caching as part of your resilience strategy, not just performance optimization. Configure appropriate stale-serving behaviors, monitor cache health alongside other critical components, and understand what happens when cache fails. Your cache might be keeping your system running during your next incident.
We've explored the substantial performance improvements that caching enables, from latency reduction to cost savings to resilience benefits. Let's consolidate the key insights:
- Average latency is a weighted blend of hit and miss latency; high hit rates can cut it by 10-100x.
- Tail latency (p95/p99) only improves once the hit rate climbs past that percentile; misses still pay full backend cost.
- Effective capacity multiplies by 1 ÷ (1 − hit rate), often the simplest way to scale read-heavy systems.
- Amdahl's Law caps speedup at 1 ÷ (1 − cacheable fraction), so the uncacheable path must be optimized too.
- Caching slashes connection pool pressure (Little's Law) and infrastructure cost per request.
- A cache is also a resilience layer, serving stale data and absorbing traffic during backend incidents.
What's next:
While caching offers tremendous benefits, it's not without trade-offs. The next page explores the trade-offs and challenges of caching—cache invalidation, consistency issues, memory costs, and the complexity caching adds to systems. Understanding these trade-offs is essential for making informed caching decisions.
You now understand how to quantify caching benefits, model performance improvements mathematically, and evaluate the potential impact of caching on your systems. This analytical foundation prepares you for making informed caching decisions.