In write-around caching, cache misses are not just expected—they're fundamental to how the system works. Unlike write-through caching, where writes keep the cache populated, write-around treats read misses as the primary mechanism for cache population. Every piece of data in the cache arrived there through a read miss.
But cache misses have costs: increased latency for the requesting client, additional load on the database, and potential for cascading failures under high concurrency. Understanding read miss behavior in depth is essential for building resilient write-around systems.
By the end of this page, you will understand the complete anatomy of a cache miss—the latency breakdown, database load implications, failure cascades, and the full arsenal of mitigation strategies. You'll be equipped to design systems that handle cache misses gracefully even under extreme load.
A cache miss in write-around caching triggers a multi-step process. Understanding each step—and its latency contribution—is crucial for performance analysis and optimization.
The Cache Miss Timeline:
┌──────────────────────────────────────────────────────────────────────────────────┐
│                             Total Cache Miss Latency                              │
├────────────┬──────────────┬─────────────────┬────────────────┬───────────────────┤
│   Cache    │   Network    │    Database     │    Network     │       Cache       │
│   Lookup   │    to DB     │      Query      │    from DB     │    Population     │
│   (miss)   │              │                 │                │                   │
├────────────┼──────────────┼─────────────────┼────────────────┼───────────────────┤
│  0.1-1ms   │   0.5-5ms    │     1-100ms     │    0.5-5ms     │      0.1-2ms      │
└────────────┴──────────────┴─────────────────┴────────────────┴───────────────────┘
Total: 2-115ms (typically 10-30ms)
| Step | Typical Latency | Variability | Optimization Opportunities |
|---|---|---|---|
| Cache lookup (miss) | 0.1-1ms | Low | Fast cache client, connection pooling |
| Network to database | 0.5-5ms | Medium | Co-location, connection pooling |
| Database query execution | 1-100ms | High | Query optimization, indexing |
| Data serialization | 0.1-5ms | Medium | Efficient serialization (protobuf) |
| Network from database | 0.5-5ms | Medium | Compact payloads, compression |
| Cache population | 0.1-2ms | Low | Async population, pipelining |
| Response to client | 0.1-5ms | Medium | Efficient serialization |
```typescript
interface MissLatencyBreakdown {
  cacheLookupMs: number;
  networkToDbMs: number;
  dbQueryMs: number;
  dbResultSizeBytes: number;
  cachePopulationMs: number;
  totalMs: number;
}

class InstrumentedCacheMissHandler<T> {
  async handleCacheMiss(key: string): Promise<{ data: T | null; breakdown: MissLatencyBreakdown }> {
    const breakdown: MissLatencyBreakdown = {
      cacheLookupMs: 0,
      networkToDbMs: 0,
      dbQueryMs: 0,
      dbResultSizeBytes: 0,
      cachePopulationMs: 0,
      totalMs: 0,
    };
    const totalStart = performance.now();

    // Step 1: Cache lookup (resulting in miss)
    const cacheStart = performance.now();
    const cached = await this.cache.get(key);
    breakdown.cacheLookupMs = performance.now() - cacheStart;
    if (cached !== null) {
      throw new Error("Expected cache miss, got hit");
    }

    // Step 2: Database query
    const dbStart = performance.now();
    const dbResult = await this.database.getWithMetadata(key);
    breakdown.dbQueryMs = performance.now() - dbStart;

    if (dbResult !== null) {
      breakdown.dbResultSizeBytes = JSON.stringify(dbResult.data).length;

      // Step 3: Cache population
      const cachePopStart = performance.now();
      await this.cache.set(key, dbResult.data, this.ttl);
      breakdown.cachePopulationMs = performance.now() - cachePopStart;
    }

    breakdown.totalMs = performance.now() - totalStart;
    breakdown.networkToDbMs =
      breakdown.totalMs -
      breakdown.cacheLookupMs -
      breakdown.dbQueryMs -
      breakdown.cachePopulationMs;

    // Log for analysis
    this.metrics.recordMiss(breakdown);

    return {
      data: dbResult?.data ?? null,
      breakdown,
    };
  }

  // Analysis: Where is time being spent?
  analyzeMissProfile(): MissAnalysis {
    const avgBreakdown = this.metrics.getAverageBreakdown();

    return {
      bottleneck: this.identifyBottleneck(avgBreakdown),
      dbQueryPercentage: (avgBreakdown.dbQueryMs / avgBreakdown.totalMs) * 100,
      networkPercentage: (avgBreakdown.networkToDbMs / avgBreakdown.totalMs) * 100,
      cacheOverheadPercentage:
        ((avgBreakdown.cacheLookupMs + avgBreakdown.cachePopulationMs) / avgBreakdown.totalMs) * 100,
      recommendations: this.generateRecommendations(avgBreakdown),
    };
  }

  private identifyBottleneck(breakdown: MissLatencyBreakdown): string {
    const components = [
      { name: 'database_query', value: breakdown.dbQueryMs },
      { name: 'network', value: breakdown.networkToDbMs },
      { name: 'cache_overhead', value: breakdown.cacheLookupMs + breakdown.cachePopulationMs },
    ];
    return components.sort((a, b) => b.value - a.value)[0].name;
  }
}
```

In most systems, the database query accounts for 50-80% of cache miss latency. This is why database optimization (indexes, query tuning, connection pooling) has the highest impact on miss performance. Cache layer optimization provides diminishing returns if the database is slow.
Every cache miss translates to a database query. In write-around caching, understanding the relationship between miss rate and database load is critical for capacity planning and avoiding cascading failures.
```typescript
interface LoadModel {
  totalReadQPS: number;       // Total read queries per second
  cacheHitRate: number;       // Percentage of reads served from cache
  missRate: number;           // Percentage of reads causing DB queries
  dbQueryQPS: number;         // Resulting database queries per second
  avgQueryLatencyMs: number;  // Average DB query time
  dbCPUUtilization: number;   // Estimated CPU usage
}

function modelDatabaseLoad(
  totalReads: number,
  hitRate: number,
  avgDbLatencyMs: number,
  dbMaxQPS: number
): LoadModel {
  const missRate = 1 - hitRate;
  const dbQueryQPS = totalReads * missRate;
  const dbCPUUtilization = dbQueryQPS / dbMaxQPS;

  return {
    totalReadQPS: totalReads,
    cacheHitRate: hitRate,
    missRate: missRate,
    dbQueryQPS: dbQueryQPS,
    avgQueryLatencyMs: avgDbLatencyMs,
    dbCPUUtilization: dbCPUUtilization,
  };
}

// Example scenarios
const scenarios = {
  // Healthy: High cache hit rate
  healthy: modelDatabaseLoad(
    10000, // 10K reads/sec
    0.95,  // 95% cache hit rate
    5,     // 5ms average query
    1000   // DB can handle 1000 QPS
  ),
  // Result: 500 DB queries/sec (50% DB utilization)

  // Warming: Cold cache after restart
  warmingPhase: modelDatabaseLoad(
    10000, // 10K reads/sec
    0.50,  // 50% hit rate (cache warming)
    5,     // 5ms average query
    1000   // DB can handle 1000 QPS
  ),
  // Result: 5000 DB queries/sec (500% DB - OVERLOADED!)

  // Degraded: Cache partially failed
  degraded: modelDatabaseLoad(
    10000, // 10K reads/sec
    0.80,  // 80% hit rate
    15,    // 15ms (DB slowing under load)
    1000   // DB can handle 1000 QPS
  ),
  // Result: 2000 DB queries/sec (200% DB - DANGER ZONE)
};

// The takeaway: Hit rate directly determines DB load
// Small changes in hit rate cause large changes in DB load
```

The Amplification Effect:
Consider a system serving 10,000 reads/second:
| Cache Hit Rate | DB Queries/sec | DB Load Change |
|---|---|---|
| 99% | 100 | Baseline |
| 95% | 500 | 5x increase! |
| 90% | 1,000 | 10x increase! |
| 80% | 2,000 | 20x increase! |
| 50% | 5,000 | 50x increase! |
A drop of just four percentage points in hit rate (99% → 95%) causes a 5x increase in database load, because database load tracks the miss rate (1 − hit rate), not the hit rate itself. This leverage is why cache health monitoring and protection mechanisms are critical.
When database load increases, query latency increases. Slower queries mean requests hold connections longer, reducing connection pool availability. This causes more timeouts, more retries, more load—a death spiral. Cache misses can trigger cascading failures if not managed carefully.
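To make that feedback loop concrete, here is a rough, illustrative model—not measurements from any real system. It assumes queueing-style latency growth as the database approaches saturation and that every timed-out query is retried exactly once; both the latency curve and the retry policy are simplifying assumptions.

```typescript
// Illustrative sketch of the miss-driven retry spiral (assumed numbers, not a benchmark).
// Model: query latency grows as the database nears saturation, and every request that
// exceeds the client timeout is retried, adding to the next round's offered load.
function simulateRetrySpiral(
  missQPS: number,       // cache misses per second hitting the database
  dbMaxQPS: number,      // database capacity
  baseLatencyMs: number, // query latency at low load
  timeoutMs: number      // client timeout that triggers a retry
): void {
  let offeredQPS = missQPS;
  for (let round = 1; round <= 5; round++) {
    // Clamp utilization so the toy latency formula stays finite
    const utilization = Math.min(offeredQPS / dbMaxQPS, 0.99);
    const latencyMs = baseLatencyMs / (1 - utilization);
    const timeoutRate = latencyMs > timeoutMs ? (latencyMs - timeoutMs) / latencyMs : 0;
    console.log(
      `round ${round}: offered=${offeredQPS.toFixed(0)} qps, ` +
      `latency≈${latencyMs.toFixed(0)}ms, timeouts≈${(timeoutRate * 100).toFixed(0)}%`
    );
    // Timed-out requests come back as retries on top of the steady miss traffic
    offeredQPS = missQPS + offeredQPS * timeoutRate;
  }
}

simulateRetrySpiral(500, 1000, 10, 60); // stable: latency stays under the timeout
simulateRetrySpiral(900, 1000, 10, 60); // spiral: retries push offered load higher each round
```

The point of the sketch is the shape of the curve, not the numbers: once latency crosses the client timeout, retries add load, which adds latency, which adds more retries.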
Cache misses, while normal in isolation, can trigger cascading failures when they occur at scale. Understanding these failure scenarios helps you design protective mechanisms.
```typescript
// Scenario: What happens when a popular cache key expires?

interface SimulationResult {
  dbQueriesTriggered: number;
  totalLatencyMs: number;
  failureOccurred: boolean;
  failureType?: string;
}

async function simulateCacheStampede(
  concurrentRequests: number,
  dbMaxQPS: number,
  dbLatencyMs: number
): Promise<SimulationResult> {
  // All requests arrive at the same moment for an expired key
  let dbQueriesTriggered = 0;
  let failureOccurred = false;
  let failureType: string | undefined;

  // Without protection: All requests hit the database
  dbQueriesTriggered = concurrentRequests;

  // Database response based on load
  const actualLoadRatio = dbQueriesTriggered / dbMaxQPS;

  if (actualLoadRatio > 3) {
    failureOccurred = true;
    failureType = 'database_connection_exhausted';
  } else if (actualLoadRatio > 1.5) {
    failureOccurred = true;
    failureType = 'database_timeout_cascade';
  }

  // Latency increases non-linearly with load
  const latencyMultiplier = Math.pow(actualLoadRatio, 1.5);
  const actualLatency = dbLatencyMs * latencyMultiplier;

  return {
    dbQueriesTriggered,
    totalLatencyMs: actualLatency,
    failureOccurred,
    failureType,
  };
}

// Example: 1000 concurrent requests, DB can handle 100 QPS
const naiveResult = await simulateCacheStampede(1000, 100, 10);
// Result: 1000 DB queries, 10x overload, failure!

// With request coalescing:
async function simulateWithCoalescing(
  concurrentRequests: number,
  dbMaxQPS: number,
  dbLatencyMs: number
): Promise<SimulationResult> {
  // Only ONE request hits the database
  // Others wait for the result
  const dbQueriesTriggered = 1;

  return {
    dbQueriesTriggered,
    totalLatencyMs: dbLatencyMs, // Single query latency
    failureOccurred: false,
  };
}

const coalescedResult = await simulateWithCoalescing(1000, 100, 10);
// Result: 1 DB query, no overload, success!
```

| Scenario | Trigger | Without Protection | With Protection |
|---|---|---|---|
| Cache Stampede | Hot key expiry | 1000s of DB queries | 1 query (coalescing) |
| Cold Start | Cache restart | 100% miss rate | Gradual traffic shift |
| Mass Eviction | Memory pressure | All evicted keys query DB | Rate limiting, backpressure |
| TTL Sync | Batch of same TTL | Periodic spikes | TTL jitter |
Your system will eventually experience every failure scenario. A popular key WILL expire exactly when traffic is highest. Your cache WILL restart unexpectedly. Design protective mechanisms before they're needed, not after the first outage.
A comprehensive miss mitigation strategy combines multiple techniques to handle cache misses gracefully under all conditions.
```typescript
class ProtectedWriteAroundCache<T> {
  private coalescing = new Map<string, Promise<T | null>>();
  private circuitBreaker: CircuitBreaker;
  private rateLimiter: RateLimiter;

  constructor(
    private cache: CacheStore<T>,
    private database: Database<T>,
    private config: CacheConfig
  ) {
    this.circuitBreaker = new CircuitBreaker({
      failureThreshold: 5,
      resetTimeoutMs: 30000,
    });
    this.rateLimiter = new RateLimiter({
      maxRequestsPerSecond: config.maxDbQPS,
    });
  }

  async read(key: string): Promise<T | null> {
    // Layer 1: Check cache (with stale-while-revalidate)
    const cached = await this.cache.getWithMetadata(key);
    if (cached !== null) {
      const { data, ttlRemaining, setTime } = cached;

      // If near expiry, trigger background refresh
      if (ttlRemaining < this.config.refreshThreshold) {
        this.backgroundRefresh(key);
      }
      return data;
    }

    // Layer 2: Request coalescing
    if (this.coalescing.has(key)) {
      return this.coalescing.get(key)!;
    }

    // Layer 3: Circuit breaker check
    if (!this.circuitBreaker.canRequest()) {
      return this.handleCircuitOpen(key);
    }

    // Layer 4: Rate limiting
    if (!this.rateLimiter.tryAcquire()) {
      return this.handleRateLimited(key);
    }

    // Execute with protection
    const promise = this.fetchWithProtection(key);
    this.coalescing.set(key, promise);

    try {
      return await promise;
    } finally {
      this.coalescing.delete(key);
    }
  }

  private async fetchWithProtection(key: string): Promise<T | null> {
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error('Timeout')), this.config.dbTimeoutMs)
    );

    try {
      const result = await Promise.race([
        this.database.get(key),
        timeout,
      ]);

      this.circuitBreaker.recordSuccess();

      if (result !== null) {
        await this.cache.set(key, result, this.jitteredTTL());
      }
      return result;
    } catch (error) {
      this.circuitBreaker.recordFailure();
      throw error;
    }
  }

  private async backgroundRefresh(key: string): Promise<void> {
    // Don't await - fire and forget
    (async () => {
      try {
        const fresh = await this.database.get(key);
        if (fresh !== null) {
          await this.cache.set(key, fresh, this.jitteredTTL());
        }
      } catch {
        // Background refresh failure is not critical
      }
    })();
  }

  private handleCircuitOpen(key: string): T | null {
    // Options:
    // 1. Return stale data if available
    // 2. Return fallback/default value
    // 3. Return null and let caller handle
    return this.getFallbackValue(key);
  }

  private handleRateLimited(key: string): Promise<T | null> {
    // Wait and retry, or return fallback
    return new Promise(resolve => {
      setTimeout(async () => {
        resolve(await this.read(key));
      }, 100);
    });
  }

  private jitteredTTL(): number {
    const jitter = this.config.baseTTL * 0.1;
    return this.config.baseTTL + (Math.random() - 0.5) * 2 * jitter;
  }
}
```

No single mitigation strategy is sufficient. The most resilient systems layer multiple protections: TTL jitter prevents synchronized expirations, coalescing handles stampedes, circuit breakers stop cascading failures, and stale-while-revalidate ensures availability during database issues.
Stale-While-Revalidate (SWR) is a cache strategy that serves stale (expired) data immediately while asynchronously fetching fresh data in the background. This pattern eliminates perceived cache misses entirely from the user's perspective.
How SWR Works:
┌───────────────────┬───────────────────┬─────────────────────┐
│   Fresh Period    │   Stale Period    │       Expired       │
│  (serve as-is)    │ (serve + refresh) │    (cache miss)     │
├───────────────────┼───────────────────┼─────────────────────┤
│    TTL: 300s      │       +60s        │                     │
└───────────────────┴───────────────────┴─────────────────────┘
```typescript
interface SWRConfig {
  freshTTL: number;  // Time (seconds) data is considered fresh
  staleTTL: number;  // Time (seconds) stale data can still be served
}

class StaleWhileRevalidateCache<T> {
  private refreshInFlight = new Map<string, boolean>();

  constructor(
    private cache: CacheStore<T>,
    private database: Database<T>,
    private config: SWRConfig
  ) {}

  async read(key: string): Promise<T | null> {
    const entry = await this.cache.getWithTimestamps(key);

    if (entry !== null) {
      const { data, setTime } = entry;
      const age = Date.now() - setTime;

      // Case 1: Fresh - serve directly
      if (age < this.config.freshTTL * 1000) {
        return data;
      }

      // Case 2: Stale but serveable - serve and refresh
      if (age < (this.config.freshTTL + this.config.staleTTL) * 1000) {
        // Serve stale data immediately
        this.triggerBackgroundRefresh(key);
        return data; // User sees no latency!
      }

      // Case 3: Too stale - treat as miss (fall through)
    }

    // True cache miss - must wait for database
    return this.fetchAndCache(key);
  }

  private triggerBackgroundRefresh(key: string): void {
    // Prevent multiple concurrent refreshes for same key
    if (this.refreshInFlight.get(key)) {
      return;
    }
    this.refreshInFlight.set(key, true);

    // Fire and forget
    (async () => {
      try {
        const fresh = await this.database.get(key);
        if (fresh !== null) {
          await this.cache.setWithTimestamp(key, fresh, {
            setTime: Date.now(),
            expiresAt: Date.now() + (this.config.freshTTL + this.config.staleTTL) * 1000,
          });
        }
      } finally {
        this.refreshInFlight.delete(key);
      }
    })();
  }

  private async fetchAndCache(key: string): Promise<T | null> {
    const data = await this.database.get(key);
    if (data !== null) {
      await this.cache.setWithTimestamp(key, data, {
        setTime: Date.now(),
        expiresAt: Date.now() + (this.config.freshTTL + this.config.staleTTL) * 1000,
      });
    }
    return data;
  }
}

// Usage example
const swrCache = new StaleWhileRevalidateCache(cache, db, {
  freshTTL: 300, // 5 minutes fresh
  staleTTL: 60,  // 1 minute stale grace period
});

// User always sees low latency
// During stale period: 0.5ms (cache read)
// True miss: 15ms (database fetch)
// The stale period absorbs the refresh latency
```

| Data State | User Latency | Background Action | Data Freshness |
|---|---|---|---|
| Fresh (< freshTTL) | ~1ms (cache) | None | Current |
| Stale (< freshTTL + staleTTL) | ~1ms (cache) | Async DB fetch | Slightly outdated |
| Expired (> freshTTL + staleTTL) | ~20ms (DB) | None (waited) | Fresh on response |
The stale-while-revalidate pattern is standardized in HTTP's Cache-Control header. CDNs like Cloudflare and Fastly support 'stale-while-revalidate' directives, allowing edge caches to serve stale content while fetching updates from origin servers.
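For example, a response header like `Cache-Control: max-age=300, stale-while-revalidate=60` tells downstream caches to treat the response as fresh for 300 seconds and to keep serving it for up to 60 additional seconds while revalidating it in the background; the specific values here are illustrative and mirror the freshTTL/staleTTL settings above.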
The circuit breaker pattern protects downstream services (your database) when they're struggling. Instead of continuing to hammer a failing database with requests, the circuit breaker 'opens' to stop the flood and give the database time to recover.
Circuit Breaker States:
┌─────────────────────┐
│       CLOSED        │  Normal operation
│   (Requests pass)   │  Monitor for failures
└──────────┬──────────┘
           │ Failure threshold exceeded
           ▼
┌─────────────────────┐
│        OPEN         │  Stop all requests
│ (Fail immediately)  │  Wait for cooldown
└──────────┬──────────┘
           │ Cooldown period expires
           ▼
┌─────────────────────┐
│      HALF-OPEN      │  Test with limited requests
│ (Allow some tests)  │  Determine if recovered
└──────────┬──────────┘
           │
    ┌──────┴──────────┐
    │ Test succeeds   │ Test fails
    ▼                 ▼
┌────────┐        ┌────────┐
│ CLOSED │        │  OPEN  │
└────────┘        └────────┘
```typescript
enum CircuitState {
  CLOSED = 'CLOSED',       // Normal, requests pass through
  OPEN = 'OPEN',           // Failing, requests rejected
  HALF_OPEN = 'HALF_OPEN', // Testing if recovered
}

interface CircuitBreakerConfig {
  failureThreshold: number; // Failures before opening
  successThreshold: number; // Successes to close from half-open
  resetTimeoutMs: number;   // Time before trying again
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failures = 0;
  private successes = 0;
  private lastFailureTime = 0;

  constructor(private config: CircuitBreakerConfig) {}

  canRequest(): boolean {
    switch (this.state) {
      case CircuitState.CLOSED:
        return true;

      case CircuitState.OPEN:
        // Check if cooldown has elapsed
        if (Date.now() - this.lastFailureTime > this.config.resetTimeoutMs) {
          this.transitionTo(CircuitState.HALF_OPEN);
          return true;
        }
        return false;

      case CircuitState.HALF_OPEN:
        // Allow limited test requests
        return true;
    }
  }

  recordSuccess(): void {
    if (this.state === CircuitState.HALF_OPEN) {
      this.successes++;
      if (this.successes >= this.config.successThreshold) {
        this.transitionTo(CircuitState.CLOSED);
      }
    }
    this.failures = 0; // Reset failure count on success
  }

  recordFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.state === CircuitState.HALF_OPEN) {
      this.transitionTo(CircuitState.OPEN);
    } else if (this.failures >= this.config.failureThreshold) {
      this.transitionTo(CircuitState.OPEN);
    }
  }

  private transitionTo(newState: CircuitState): void {
    console.log(`Circuit breaker: ${this.state} -> ${newState}`);
    this.state = newState;

    if (newState === CircuitState.CLOSED) {
      this.failures = 0;
      this.successes = 0;
    } else if (newState === CircuitState.HALF_OPEN) {
      this.successes = 0;
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}

// Usage in cache read
class CacheWithCircuitBreaker<T> {
  private circuitBreaker = new CircuitBreaker({
    failureThreshold: 5,
    successThreshold: 2,
    resetTimeoutMs: 30000,
  });

  async read(key: string): Promise<T | null> {
    const cached = await this.cache.get(key);
    if (cached !== null) return cached;

    // Check circuit breaker before DB call
    if (!this.circuitBreaker.canRequest()) {
      // Return fallback instead of hitting DB
      return this.fallbackValue(key);
    }

    try {
      const data = await this.database.get(key);
      this.circuitBreaker.recordSuccess();

      if (data !== null) {
        await this.cache.set(key, data);
      }
      return data;
    } catch (error) {
      this.circuitBreaker.recordFailure();
      throw error;
    }
  }
}
```

When the circuit opens, cache misses return fallback values or errors—not real data. Users experience degraded functionality. This is intentional: it's better to serve fallbacks than to cascade failures across your entire system. Design your fallback behavior carefully.
Effective monitoring of cache miss behavior is essential for understanding system health and preventing issues before they become outages.
```typescript
interface CacheMissMetrics {
  // Core metrics
  totalReads: Counter;
  cacheHits: Counter;
  cacheMisses: Counter;

  // Latency histograms
  missLatency: Histogram;
  dbQueryLatency: Histogram;

  // Protection metrics
  coalescedRequests: Counter;
  circuitBreakerOpens: Counter;
  staleServes: Counter;
  rateLimitedRequests: Counter;

  // Health indicators
  dbConnectionPoolUsage: Gauge;
  cacheMemoryUsage: Gauge;
  evictionRate: Gauge;
}

class MetricsCollector {
  private metrics: CacheMissMetrics;

  recordCacheMiss(latencyMs: number, dbLatencyMs: number): void {
    this.metrics.cacheMisses.inc();
    this.metrics.missLatency.observe(latencyMs);
    this.metrics.dbQueryLatency.observe(dbLatencyMs);
  }

  recordCoalescedRequest(): void {
    this.metrics.coalescedRequests.inc();
  }

  // Alert thresholds
  checkAlerts(): Alert[] {
    const alerts: Alert[] = [];

    const hitRate = this.calculateHitRate();
    if (hitRate < 0.80) {
      alerts.push({
        severity: 'warning',
        message: `Cache hit rate dropped to ${(hitRate * 100).toFixed(1)}%`,
      });
    }

    const p99Latency = this.metrics.missLatency.getPercentile(0.99);
    if (p99Latency > 100) {
      alerts.push({
        severity: 'critical',
        message: `Cache miss p99 latency: ${p99Latency}ms exceeds threshold`,
      });
    }

    const evictionRate = this.metrics.evictionRate.get();
    if (evictionRate > 1000) {
      alerts.push({
        severity: 'warning',
        message: `High cache eviction rate: ${evictionRate}/sec`,
      });
    }

    return alerts;
  }

  // Dashboard data
  getDashboardData(): DashboardSnapshot {
    return {
      hitRate: this.calculateHitRate(),
      missRate: 1 - this.calculateHitRate(),
      avgMissLatencyMs: this.metrics.missLatency.getMean(),
      p99MissLatencyMs: this.metrics.missLatency.getPercentile(0.99),
      dbQPSFromMisses: this.metrics.cacheMisses.getRate(),
      coalescingEfficiency: this.getCoalescingEfficiency(),
      circuitBreakerState: this.getCircuitState(),
      cacheMemoryUsagePercent: this.metrics.cacheMemoryUsage.get(),
    };
  }
}
```

| Metric | Warning Threshold | Critical Threshold | Action |
|---|---|---|---|
| Hit Rate | < 85% | < 70% | Investigate cache size, TTL, access patterns |
| Miss p99 Latency | 50ms | 100ms | Check database performance, network |
| DB QPS from Misses | 50% DB capacity | 80% DB capacity | Increase cache, reduce traffic |
| Circuit Breaker Opens | Any occurrence | Multiple/hour | Investigate database health |
| Eviction Rate | 100/sec | 1000/sec | Increase cache size or reduce TTL |
Sometimes the absolute values look fine, but a sudden change indicates problems. Alert on rate of change: if hit rate drops 10% in 5 minutes, something is wrong even if the absolute hit rate is still acceptable.
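A minimal sketch of such a rate-of-change check is shown below; it assumes hit-rate samples are recorded periodically, and the five-minute window and ten-point drop threshold are illustrative choices, not recommendations from this page.

```typescript
// Illustrative rate-of-change alert: compare the current hit rate with the oldest
// sample inside a lookback window and alert on the delta, even if the absolute
// value still looks acceptable.
class HitRateTrendAlert {
  private samples: { timestamp: number; hitRate: number }[] = [];

  constructor(
    private windowMs = 5 * 60 * 1000, // lookback window: 5 minutes (assumed)
    private maxDrop = 0.10            // alert if hit rate falls 10 points (assumed)
  ) {}

  record(hitRate: number, now = Date.now()): string | null {
    this.samples.push({ timestamp: now, hitRate });
    // Keep only samples inside the lookback window
    this.samples = this.samples.filter(s => now - s.timestamp <= this.windowMs);

    const oldest = this.samples[0];
    const drop = oldest.hitRate - hitRate;
    if (drop >= this.maxDrop) {
      return `Hit rate fell ${(drop * 100).toFixed(1)} points in ` +
             `${((now - oldest.timestamp) / 60000).toFixed(1)} min ` +
             `(${(oldest.hitRate * 100).toFixed(1)}% -> ${(hitRate * 100).toFixed(1)}%)`;
    }
    return null;
  }
}

// Example: a 96% -> 85% slide within the window triggers an alert even though 85%
// may still be above the absolute warning threshold.
const trend = new HitRateTrendAlert();
trend.record(0.96);
trend.record(0.85); // returns an alert message
```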
Cache misses in write-around caching are a design feature, not a bug. Understanding their behavior—latency profile, database impact, failure scenarios, and mitigation strategies—is essential for building resilient systems.
What's Next:
Now that you deeply understand how cache misses behave and how to handle them, the final page explores when to use write-around caching—the use cases, workload patterns, and system characteristics that make write-around the optimal choice.
You now have a complete understanding of cache miss behavior in write-around caching—the latency anatomy, database load implications, cascading failure risks, and the full toolkit of mitigation strategies. You can design systems that handle misses gracefully under normal and extreme conditions.