Imagine deploying a new version of your application. Everything works perfectly in testing. But the moment traffic hits production, response times spike from 50ms to 5000ms. Databases get hammered. Users experience timeouts. What happened?
Your cache was cold.
A cold cache—one without pre-populated data—creates a "thundering herd" problem. Every request becomes a cache miss, flooding your backing stores with sudden load. At scale, this can cascade into complete system failure.
Cache warming is the practice of proactively populating caches before they're needed. It transforms the dangerous cold-start period into a managed operation, ensuring your system performs well from the first request.
Principal engineers treat cache warming as a critical deployment concern, not an afterthought. They understand that a well-warmed cache is the difference between a smooth deployment and an incident.
By the end of this page, you will understand cache warming strategies, implementation patterns, and best practices. You'll learn how to warm caches on deployment, maintain warm caches during operation, and handle warming during scaling events.
A cold cache creates multiple cascading problems. Understanding these helps prioritize warming investments.
Cold Cache Scenarios and Warming Priorities:
| Scenario | Risk Level | Warming Priority |
|---|---|---|
| Fresh deployment (all instances) | Critical | Pre-warm before routing traffic |
| Rolling deployment (one instance) | Medium | Accept gradual warming |
| Scaling out (adding instances) | Medium | Pre-warm or gradual ramp |
| Cache system restart | Critical | Implement persistence or fast reload |
| TTL expiration of hot keys | Low-Medium | Background refresh before expiry |
| Traffic pattern shift | Low | Predictive warming if detectable |
```typescript
/**
 * Analyze the impact of cold cache on system capacity.
 */
interface SystemCapacity {
  cacheHitLatencyMs: number;       // Latency when cache hit
  cacheMissLatencyMs: number;      // Latency when cache miss
  normalHitRate: number;           // Normal cache hit rate (0-1)
  normalBackingStoreQPS: number;   // Queries/sec the database sees in normal operation
  backingStoreCapacityQPS: number; // Max queries/sec to database
  targetLatencyMs: number;         // SLA target
}

function analyzeColdCacheImpact(capacity: SystemCapacity): ColdCacheAnalysis {
  // Normal operation
  const normalAvgLatency =
    capacity.cacheHitLatencyMs * capacity.normalHitRate +
    capacity.cacheMissLatencyMs * (1 - capacity.normalHitRate);
  const normalMissRate = 1 - capacity.normalHitRate;

  // Cold cache operation (0% hit rate): every request becomes a miss,
  // so the backing store sees 1 / missRate times its normal load.
  const coldLatency = capacity.cacheMissLatencyMs;
  const coldBackingStoreMultiplier = 1 / normalMissRate;
  const coldBackingStoreQPS =
    capacity.normalBackingStoreQPS * coldBackingStoreMultiplier;

  return {
    normalAvgLatencyMs: normalAvgLatency,
    coldLatencyMs: coldLatency,
    latencyMultiplier: coldLatency / normalAvgLatency,
    backingStoreLoadMultiplier: coldBackingStoreMultiplier,
    // Overloaded if the cold-cache query rate exceeds database capacity
    backingStoreWouldOverload: coldBackingStoreQPS > capacity.backingStoreCapacityQPS,
    warmingRecommendation: getWarmingRecommendation(
      coldBackingStoreMultiplier,
      coldLatency,
      capacity.targetLatencyMs
    ),
  };
}

function getWarmingRecommendation(
  loadMultiplier: number,
  coldLatency: number,
  targetLatency: number
): WarmingRecommendation {
  if (loadMultiplier > 5 || coldLatency > targetLatency * 10) {
    return {
      priority: 'critical',
      strategy: 'mandatory-pre-warming',
      message: 'System cannot sustain cold cache. Pre-warming required before receiving traffic.',
    };
  }
  if (loadMultiplier > 2 || coldLatency > targetLatency * 3) {
    return {
      priority: 'high',
      strategy: 'pre-warming-with-gradual-ramp',
      message: 'Pre-warm critical paths, then gradually increase traffic.',
    };
  }
  return {
    priority: 'medium',
    strategy: 'gradual-warming-acceptable',
    message: 'System can absorb cold cache. Consider background warming for optimal UX.',
  };
}

interface ColdCacheAnalysis {
  normalAvgLatencyMs: number;
  coldLatencyMs: number;
  latencyMultiplier: number;
  backingStoreLoadMultiplier: number;
  backingStoreWouldOverload: boolean;
  warmingRecommendation: WarmingRecommendation;
}

interface WarmingRecommendation {
  priority: 'critical' | 'high' | 'medium' | 'low';
  strategy: string;
  message: string;
}

// Example analysis
const analysis = analyzeColdCacheImpact({
  cacheHitLatencyMs: 5,
  cacheMissLatencyMs: 100,
  normalHitRate: 0.95,
  normalBackingStoreQPS: 2500,
  backingStoreCapacityQPS: 10000,
  targetLatencyMs: 50,
});

console.log(analysis);
// {
//   normalAvgLatencyMs: 9.75,
//   coldLatencyMs: 100,
//   latencyMultiplier: 10.26,
//   backingStoreLoadMultiplier: 20,
//   backingStoreWouldOverload: true,
//   warmingRecommendation: {
//     priority: 'critical',
//     strategy: 'mandatory-pre-warming',
//     message: 'System cannot sustain cold cache...'
//   }
// }
```

If your system assumes a 95% cache hit rate, a cold cache creates 20x backing store load (100% misses vs 5% normally). Most databases cannot absorb a 20x load spike. This is why cache warming isn't optional for high-hit-rate systems—it's mission critical.
Cache warming strategies can be categorized by when and how they populate the cache:
The Warming Strategy Spectrum:
| Strategy | When Executed | Best For | Complexity |
|---|---|---|---|
| Startup Warming | Before accepting traffic | Known hot keys, critical data | Low |
| Background Warming | Continuously during operation | Maintaining warm cache, refresh | Medium |
| Lazy Warming | On first access (cache miss) | Unknown access patterns | Low |
| Predictive Warming | Based on predicted future access | Time-based patterns, ML predictions | High |
| Replica Warming | Copy from existing cache | Scaling, failover scenarios | Medium |
| TTL Refresh | Before expiration | Preventing hot key expiration | Medium |
Choosing the Right Strategy:
Most production systems combine several of these strategies rather than relying on any single one.
Typically 20% of your data serves 80% of requests. Focus startup warming on identifying and loading this hot subset. Lazy warming handles the remaining long tail. You don't need to warm everything—just enough to absorb initial traffic.
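The hot-subset idea can be sketched as a simple selection routine. This is an illustrative example, not part of any specific system: `selectHotSubset`, the access counts, and the 80% coverage threshold are all assumptions you would replace with your own metrics data.

```typescript
// Pick the smallest set of keys that covers ~80% of observed traffic.
// Input: access counts per key (e.g. exported from your metrics system).
function selectHotSubset(
  accessCounts: Record<string, number>,
  coverage = 0.8
): string[] {
  // Sort keys by access count, most-accessed first
  const entries = Object.entries(accessCounts).sort(([, a], [, b]) => b - a);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);

  const hot: string[] = [];
  let covered = 0;
  for (const [key, n] of entries) {
    if (covered >= total * coverage) break; // Enough traffic covered
    hot.push(key);
    covered += n;
  }
  return hot;
}

// Hypothetical counts: two keys dominate traffic.
const hot = selectHotSubset({
  "product:1": 500,
  "product:2": 300,
  "config:global": 150,
  "product:999": 30,
  "product:1000": 20,
});
console.log(hot); // ["product:1", "product:2"]
```

Two of five keys cover 80% of requests here, so startup warming only needs to load those; lazy warming handles the rest.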
Startup warming populates the cache before the instance accepts traffic. This is the most important warming strategy for systems with high cache dependency.
Implementation Patterns:
```typescript
/**
 * Startup cache warming implementation.
 * Warms cache before accepting traffic.
 */
interface WarmingConfig {
  hotKeysSource: 'static' | 'analytics' | 'redis-keyspace';
  maxWarmingTimeMs: number;
  maxConcurrency: number;
  minItemsToWarm: number;
  failOnWarmingError: boolean;
}

class StartupWarmer<V> {
  private cache: CacheClient<V>;
  private dataLoader: DataLoader<V>;
  private config: WarmingConfig;

  constructor(
    cache: CacheClient<V>,
    dataLoader: DataLoader<V>,
    config: WarmingConfig
  ) {
    this.cache = cache;
    this.dataLoader = dataLoader;
    this.config = config;
  }

  /**
   * Execute warming before accepting traffic.
   * Should be called during application startup.
   */
  async warmBeforeTraffic(): Promise<WarmingResult> {
    const startTime = Date.now();
    console.log('Starting cache warming...');

    // 1. Identify keys to warm
    const hotKeys = await this.identifyHotKeys();
    console.log(`Identified ${hotKeys.length} hot keys to warm`);

    if (hotKeys.length < this.config.minItemsToWarm) {
      console.warn(`Only found ${hotKeys.length} keys; expected at least ${this.config.minItemsToWarm}`);
    }

    // 2. Warm in parallel batches with concurrency limit
    const results = await this.warmKeysBatched(hotKeys);

    const elapsed = Date.now() - startTime;
    const result = this.compileResults(results, elapsed, hotKeys.length);

    console.log(`Warming complete: ${result.successCount}/${result.totalAttempted} keys in ${elapsed}ms`);

    // 3. Decide if we should proceed
    if (this.config.failOnWarmingError && result.errorCount > 0) {
      throw new Error(`Warming failed with ${result.errorCount} errors`);
    }

    return result;
  }

  /**
   * Identify hot keys based on configured source.
   */
  private async identifyHotKeys(): Promise<string[]> {
    switch (this.config.hotKeysSource) {
      case 'static':
        return this.getStaticHotKeys();
      case 'analytics':
        return this.getAnalyticsHotKeys();
      case 'redis-keyspace':
        return this.getRedisKeyspaceHotKeys();
      default:
        return this.getStaticHotKeys();
    }
  }

  /**
   * Static hot keys from configuration.
   */
  private getStaticHotKeys(): string[] {
    // Example: Critical configuration, feature flags, popular products
    return [
      'config:global',
      'feature-flags:active',
      ...Array.from({ length: 100 }, (_, i) => `product:${i + 1}`),
      ...Array.from({ length: 50 }, (_, i) => `category:${i + 1}`),
    ];
  }

  /**
   * Hot keys from analytics/metrics system.
   */
  private async getAnalyticsHotKeys(): Promise<string[]> {
    // Query analytics for most-accessed keys in last period
    // This requires integration with your metrics system
    const response = await fetch(
      'http://analytics-service/api/cache/hot-keys?period=24h&limit=1000'
    );
    const data = await response.json();
    return data.keys;
  }

  /**
   * Hot keys from Redis memory analysis.
   */
  private async getRedisKeyspaceHotKeys(): Promise<string[]> {
    // Use Redis SCAN + OBJECT FREQ for hot-key detection
    // Requires Redis 4.0+ with an LFU eviction policy
    // Alternative: Sample keys and check access patterns
    return []; // Implementation depends on Redis setup
  }

  /**
   * Warm keys in parallel with concurrency limit.
   */
  private async warmKeysBatched(keys: string[]): Promise<WarmingAttempt[]> {
    const results: WarmingAttempt[] = [];
    const deadline = Date.now() + this.config.maxWarmingTimeMs;

    // Process in chunks respecting concurrency
    const chunks = this.chunkArray(keys, this.config.maxConcurrency);

    for (const chunk of chunks) {
      if (Date.now() > deadline) {
        console.warn('Warming time limit reached, stopping early');
        break;
      }
      const chunkResults = await Promise.all(
        chunk.map(key => this.warmSingleKey(key))
      );
      results.push(...chunkResults);
    }

    return results;
  }

  /**
   * Warm a single key: load from source, store in cache.
   */
  private async warmSingleKey(key: string): Promise<WarmingAttempt> {
    const start = Date.now();
    try {
      // Load from authoritative source
      const value = await this.dataLoader.load(key);
      // Store in cache
      await this.cache.set(key, value);
      return { key, success: true, durationMs: Date.now() - start };
    } catch (error) {
      return {
        key,
        success: false,
        durationMs: Date.now() - start,
        error: error instanceof Error ? error.message : 'Unknown error',
      };
    }
  }

  private chunkArray<T>(array: T[], size: number): T[][] {
    const chunks: T[][] = [];
    for (let i = 0; i < array.length; i += size) {
      chunks.push(array.slice(i, i + size));
    }
    return chunks;
  }

  private compileResults(
    attempts: WarmingAttempt[],
    totalDurationMs: number,
    totalTargeted: number
  ): WarmingResult {
    const successful = attempts.filter(a => a.success);
    const failed = attempts.filter(a => !a.success);

    return {
      totalAttempted: attempts.length,
      successCount: successful.length,
      errorCount: failed.length,
      skippedCount: totalTargeted - attempts.length,
      totalDurationMs,
      avgItemDurationMs:
        attempts.length > 0
          ? attempts.reduce((sum, a) => sum + a.durationMs, 0) / attempts.length
          : 0,
      errors: failed.map(f => ({ key: f.key, error: f.error! })),
    };
  }
}

interface WarmingAttempt {
  key: string;
  success: boolean;
  durationMs: number;
  error?: string;
}

interface WarmingResult {
  totalAttempted: number;
  successCount: number;
  errorCount: number;
  skippedCount: number;
  totalDurationMs: number;
  avgItemDurationMs: number;
  errors: { key: string; error: string }[];
}

interface DataLoader<V> {
  load(key: string): Promise<V>;
}

interface CacheClient<V> {
  set(key: string, value: V): Promise<void>;
}
```

Deployment Integration:
Startup warming should integrate with your deployment lifecycle:
```typescript
/**
 * Kubernetes-style deployment with cache warming.
 */
import express from 'express';

class ApplicationServer {
  private app: express.Application;
  private isReady: boolean = false;
  private warmer: StartupWarmer<unknown>;

  constructor(warmer: StartupWarmer<unknown>) {
    this.app = express();
    this.warmer = warmer;
    this.setupHealthEndpoints();
  }

  private setupHealthEndpoints(): void {
    // Liveness: Is the process running?
    this.app.get('/health/live', (req, res) => {
      res.status(200).json({ status: 'alive' });
    });

    // Readiness: Should we receive traffic?
    this.app.get('/health/ready', (req, res) => {
      if (this.isReady) {
        res.status(200).json({ status: 'ready' });
      } else {
        res.status(503).json({ status: 'warming' });
      }
    });
  }

  async start(port: number): Promise<void> {
    // Start server but not ready for traffic
    this.app.listen(port, () => {
      console.log(`Server listening on port ${port} (not ready yet)`);
    });

    // Warm cache before accepting traffic
    try {
      console.log('Beginning cache warming...');
      const result = await this.warmer.warmBeforeTraffic();
      console.log(`Cache warming complete: ${result.successCount} items`);

      // Now ready for traffic
      this.isReady = true;
      console.log('Server ready for traffic');
    } catch (error) {
      console.error('Cache warming failed:', error);
      // Decide: fail startup or proceed with cold cache?
      if (process.env.REQUIRE_WARM_CACHE === 'true') {
        console.error('Shutting down due to warming failure');
        process.exit(1);
      } else {
        console.warn('Proceeding with partially warm cache');
        this.isReady = true;
      }
    }
  }
}

// Kubernetes deployment configuration (pseudo-YAML):
//
// readinessProbe:
//   httpGet:
//     path: /health/ready
//     port: 8080
//   initialDelaySeconds: 5
//   periodSeconds: 5
//
// livenessProbe:
//   httpGet:
//     path: /health/live
//     port: 8080
//   initialDelaySeconds: 10
//   periodSeconds: 10
```

Set a maximum warming time. If warming takes too long, you risk deployment timeouts, stuck deployments, and delayed rollbacks. Better to proceed with 80% warm than block indefinitely trying for 100%.
Background warming maintains cache freshness during normal operation. It proactively refreshes entries before they expire, preventing cache misses on hot keys.
The Problem with TTL Expiration:
Consider a product detail page cached with a 5-minute TTL. The moment the entry expires, every concurrent request for that page misses at once, and all of them hit the database simultaneously—a miniature thundering herd for each hot key, repeating every TTL cycle. Background refresh solves this by refreshing entries before they expire.
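A back-of-envelope calculation shows the size of each per-key herd. The request rate and reload latency below are assumed numbers for illustration:

```typescript
// Back-of-envelope: how many requests pile up on the database when a
// hot key's TTL expires and reloading it takes `reloadMs`.
function concurrentMissesAtExpiry(requestsPerSecond: number, reloadMs: number): number {
  // Every request for this key arriving during the reload window is a miss.
  return Math.ceil(requestsPerSecond * (reloadMs / 1000));
}

// A product page at 1000 RPS whose reload from the database takes 100ms:
console.log(concurrentMissesAtExpiry(1000, 100)); // 100 simultaneous misses per expiry
```

Without background refresh (or request coalescing), those 100 requests all reach the database; with it, the refresh happens once, before expiry, off the request path.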
```typescript
/**
 * Background cache refresh to prevent TTL thundering herd.
 * Refreshes entries before they expire.
 */
interface RefreshConfig {
  refreshWindowMs: number;        // Refresh this long before expiry
  maxConcurrentRefreshes: number;
  refreshIntervalMs: number;      // How often to check for items to refresh
}

class BackgroundRefresher<V> {
  private cache: CacheWithMetadata<V>;
  private loader: DataLoader<V>;
  private config: RefreshConfig;
  private refreshing: Set<string> = new Set();
  private intervalId?: NodeJS.Timeout;

  constructor(
    cache: CacheWithMetadata<V>,
    loader: DataLoader<V>,
    config: RefreshConfig
  ) {
    this.cache = cache;
    this.loader = loader;
    this.config = config;
  }

  start(): void {
    console.log('Starting background cache refresher');
    this.intervalId = setInterval(
      () => this.refreshDueItems(),
      this.config.refreshIntervalMs
    );
  }

  stop(): void {
    if (this.intervalId) {
      clearInterval(this.intervalId);
      this.intervalId = undefined;
    }
  }

  /**
   * Find and refresh items approaching expiration.
   */
  private async refreshDueItems(): Promise<void> {
    const now = Date.now();
    const refreshThreshold = now + this.config.refreshWindowMs;

    // Find items due for refresh
    const dueItems = this.cache.getItemsExpiringBefore(refreshThreshold);

    // Filter out items already being refreshed
    const toRefresh = dueItems
      .filter(item => !this.refreshing.has(item.key))
      .slice(0, this.config.maxConcurrentRefreshes);

    if (toRefresh.length > 0) {
      console.log(`Refreshing ${toRefresh.length} cache items in background`);
    }

    // Refresh in parallel
    await Promise.all(toRefresh.map(item => this.refreshItem(item.key)));
  }

  /**
   * Refresh a single item.
   */
  private async refreshItem(key: string): Promise<void> {
    this.refreshing.add(key);
    try {
      // Load fresh data
      const value = await this.loader.load(key);
      // Update cache with fresh TTL
      await this.cache.set(key, value);
      console.log(`Refreshed cache key: ${key}`);
    } catch (error) {
      console.error(`Failed to refresh ${key}:`, error);
      // Don't delete existing entry - better to serve stale than nothing
    } finally {
      this.refreshing.delete(key);
    }
  }
}

interface CacheWithMetadata<V> {
  getItemsExpiringBefore(timestamp: number): { key: string; expiresAt: number }[];
  set(key: string, value: V): Promise<void>;
}

/**
 * Alternative: Request-time stale-while-revalidate pattern.
 */
class StaleWhileRevalidateCache<V> {
  private cache: Map<string, SWREntry<V>> = new Map();
  private loader: DataLoader<V>;
  private refreshing: Set<string> = new Set();

  constructor(
    loader: DataLoader<V>,
    private ttlMs: number,
    private staleWindowMs: number // Serve stale for this long while refreshing
  ) {
    this.loader = loader;
  }

  async get(key: string): Promise<V | undefined> {
    const entry = this.cache.get(key);
    const now = Date.now();

    if (!entry) {
      return undefined; // Cache miss
    }

    const isFresh = now < entry.expiresAt;
    const isStale = now >= entry.expiresAt && now < entry.expiresAt + this.staleWindowMs;

    if (isFresh) {
      return entry.value; // Fresh hit
    }

    if (isStale) {
      // Serve stale, trigger background refresh
      if (!this.refreshing.has(key)) {
        this.triggerBackgroundRefresh(key);
      }
      return entry.value; // Stale hit (user doesn't wait)
    }

    // Past the stale window: fully expired, treat as a miss
    this.cache.delete(key);
    return undefined;
  }

  private async triggerBackgroundRefresh(key: string): Promise<void> {
    this.refreshing.add(key);
    try {
      const value = await this.loader.load(key);
      this.set(key, value);
    } catch (error) {
      console.error(`Background refresh failed for ${key}`);
    } finally {
      this.refreshing.delete(key);
    }
  }

  set(key: string, value: V): void {
    this.cache.set(key, {
      value,
      createdAt: Date.now(),
      expiresAt: Date.now() + this.ttlMs,
    });
  }
}

interface SWREntry<V> {
  value: V;
  createdAt: number;
  expiresAt: number;
}
```

The stale-while-revalidate pattern (borrowed from HTTP caching) provides the best user experience: users always get a fast response (even if slightly stale), while freshness is maintained through background updates. This pattern is used extensively at Netflix, Airbnb, and other high-traffic systems.
Predictive warming pre-fetches data based on anticipated future access patterns. This is the most sophisticated warming strategy, using historical patterns or machine learning to predict what data will be needed.
Predictive Warming Triggers:
```typescript
/**
 * Predictive cache warming based on access patterns.
 */
interface AccessPattern {
  key: string;
  hourOfDay: number[];        // Hours when frequently accessed
  dayOfWeek: number[];        // Days when frequently accessed (0=Sun)
  avgAccessesPerHour: number;
}

class PredictiveWarmer<V> {
  private patterns: Map<string, AccessPattern> = new Map();
  private cache: CacheClient<V>;
  private loader: DataLoader<V>;

  constructor(cache: CacheClient<V>, loader: DataLoader<V>) {
    this.cache = cache;
    this.loader = loader;
  }

  /**
   * Load access patterns from analytics.
   */
  async loadPatterns(source: PatternSource): Promise<void> {
    const patterns = await source.getAccessPatterns();
    for (const pattern of patterns) {
      this.patterns.set(pattern.key, pattern);
    }
    console.log(`Loaded ${patterns.length} access patterns`);
  }

  /**
   * Schedule warming based on time patterns.
   */
  scheduleTimedWarming(): void {
    // Run prediction every hour
    setInterval(() => {
      this.warmPredictedKeys();
    }, 60 * 60 * 1000);
    // Also run immediately
    this.warmPredictedKeys();
  }

  /**
   * Warm keys predicted to be accessed soon.
   */
  private async warmPredictedKeys(): Promise<void> {
    const now = new Date();
    const currentDay = now.getDay();
    const nextHour = (now.getHours() + 1) % 24;

    // Find keys expected to be hot in the next hour
    const predictedHot: string[] = [];
    for (const [key, pattern] of this.patterns) {
      if (
        pattern.hourOfDay.includes(nextHour) &&
        pattern.dayOfWeek.includes(currentDay) &&
        pattern.avgAccessesPerHour > 10 // Threshold
      ) {
        predictedHot.push(key);
      }
    }

    console.log(`Predictive warming: ${predictedHot.length} keys for upcoming hour`);

    // Warm predicted keys
    await Promise.all(predictedHot.map(key => this.warmIfMissing(key)));
  }

  private async warmIfMissing(key: string): Promise<void> {
    // Check if already cached
    const existing = await this.cache.get(key);
    if (existing !== null) {
      return; // Already warm
    }
    // Load and cache
    try {
      const value = await this.loader.load(key);
      await this.cache.set(key, value);
    } catch (error) {
      console.error(`Predictive warming failed for ${key}`);
    }
  }
}

/**
 * Event-triggered warming.
 */
class EventTriggeredWarmer<V> {
  private eventHandlers: Map<string, (event: CacheWarmingEvent) => string[]>;
  private cache: CacheClient<V>;
  private loader: DataLoader<V>;

  constructor(cache: CacheClient<V>, loader: DataLoader<V>) {
    this.cache = cache;
    this.loader = loader;
    this.eventHandlers = new Map();
  }

  /**
   * Register warming handler for event type.
   */
  registerHandler(
    eventType: string,
    handler: (event: CacheWarmingEvent) => string[]
  ): void {
    this.eventHandlers.set(eventType, handler);
  }

  /**
   * Handle incoming event.
   */
  async handleEvent(event: CacheWarmingEvent): Promise<void> {
    const handler = this.eventHandlers.get(event.type);
    if (!handler) {
      return;
    }
    const keysToWarm = handler(event);
    console.log(`Event '${event.type}' triggered warming of ${keysToWarm.length} keys`);
    await Promise.all(keysToWarm.map(key => this.warmKey(key)));
  }

  private async warmKey(key: string): Promise<void> {
    try {
      const value = await this.loader.load(key);
      await this.cache.set(key, value);
    } catch (error) {
      console.error(`Event-triggered warming failed for ${key}`);
    }
  }
}

interface CacheWarmingEvent {
  type: string;
  payload: unknown;
}

interface PatternSource {
  getAccessPatterns(): Promise<AccessPattern[]>;
}

// Unlike startup warming, predictive warming must also read from the cache
// to avoid redundant loads, so the client needs a get method here.
interface CacheClient<V> {
  get(key: string): Promise<V | null>;
  set(key: string, value: V): Promise<void>;
}

interface DataLoader<V> {
  load(key: string): Promise<V>;
}

// Example: Marketing campaign warming (assumes cache and loader instances exist)
declare const cache: CacheClient<unknown>;
declare const loader: DataLoader<unknown>;

const eventWarmer = new EventTriggeredWarmer(cache, loader);

eventWarmer.registerHandler('marketing.email.sent', (event) => {
  const campaign = event.payload as { productIds: string[] };
  // Warm all products featured in the email
  return campaign.productIds.map(id => `product:${id}`);
});

eventWarmer.registerHandler('feature.announcement', (event) => {
  const feature = event.payload as { featureKey: string };
  // Warm documentation and related pages
  return [
    `feature:${feature.featureKey}:docs`,
    `feature:${feature.featureKey}:faq`,
  ];
});
```

You don't need ML for useful predictions. Simple heuristics work well: "warm homepage products before the workday starts," "warm product pages when an inventory alert is sent." Add complexity only when simple rules prove insufficient.
Scaling events—adding instances, replacing failed nodes, or handling failover—create cold cache challenges similar to deployments. Proper warming strategies prevent performance degradation during scaling.
Scaling Scenarios and Strategies:
| Scenario | Challenge | Warming Strategy |
|---|---|---|
| Scale out (add instances) | New instances have empty local cache | Pre-warm from L2 or gradual traffic ramp |
| Scale in (remove instances) | Removed instance's cache lost | No action (remaining instances warm) |
| Rolling restart | Each instance restarts cold | Stagger restarts, pre-warm before traffic |
| Cache node failure | Portion of distributed cache lost | Replica promotion or rehashing with pre-warm |
| Region failover | All caches in failed region lost | Pre-warm new region from backup before traffic |
```typescript
/**
 * Cache warming during auto-scaling events.
 */
interface ScalingWarmingConfig {
  warmingConcurrency: number;
  maxWarmingTimeMs: number;
  trafficRampDurationMs: number;
  initialTrafficPercent: number;
}

class ScalingWarmingManager {
  private config: ScalingWarmingConfig;
  private loadBalancerClient: LoadBalancerClient;
  private warmer: StartupWarmer<unknown>;

  constructor(
    config: ScalingWarmingConfig,
    loadBalancerClient: LoadBalancerClient,
    warmer: StartupWarmer<unknown>
  ) {
    this.config = config;
    this.loadBalancerClient = loadBalancerClient;
    this.warmer = warmer;
  }

  /**
   * Scale out with proper warming.
   * Called when new instances are launched.
   */
  async onNewInstanceLaunched(instanceId: string): Promise<void> {
    console.log(`New instance ${instanceId} launched, beginning warm-up`);

    // 1. Instance is NOT in load balancer yet
    // 2. Execute cache warming
    const warmingResult = await this.warmer.warmBeforeTraffic();
    console.log(`Instance ${instanceId} warming complete: ${warmingResult.successCount} items`);

    // 3. Gradually add to load balancer
    await this.gradualTrafficRamp(instanceId);
  }

  /**
   * Gradually increase traffic to new instance.
   */
  private async gradualTrafficRamp(instanceId: string): Promise<void> {
    const steps = 5;
    const stepDuration = this.config.trafficRampDurationMs / steps;
    let currentPercent = this.config.initialTrafficPercent;
    const increment = (100 - currentPercent) / (steps - 1);

    for (let i = 0; i < steps; i++) {
      await this.loadBalancerClient.setInstanceWeight(instanceId, currentPercent);
      console.log(`Instance ${instanceId} traffic: ${currentPercent}%`);
      if (i < steps - 1) {
        await this.sleep(stepDuration);
        currentPercent += increment;
      }
    }
    console.log(`Instance ${instanceId} at full traffic`);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

/**
 * Distributed cache replica warming.
 * For when cache nodes fail and need replacement.
 */
class ReplicaWarmer {
  /**
   * Warm new cache node from surviving replicas.
   */
  async warmFromReplica(
    newNode: CacheNode,
    sourceNode: CacheNode,
    keyPattern: string
  ): Promise<void> {
    console.log(`Warming ${newNode.id} from replica ${sourceNode.id}`);

    // Get keys matching pattern from source
    const keys = await sourceNode.scanKeys(keyPattern);
    console.log(`Found ${keys.length} keys to replicate`);

    // Stream data from source to new node
    const batchSize = 100;
    for (let i = 0; i < keys.length; i += batchSize) {
      const batch = keys.slice(i, i + batchSize);
      // Get values from source
      const values = await sourceNode.mget(batch);
      // Set on new node
      await newNode.mset(values);
      console.log(`Replicated ${Math.min(i + batchSize, keys.length)}/${keys.length} keys`);
    }

    console.log('Replica warming complete');
  }
}

/**
 * Region failover with cache warming.
 */
class RegionFailoverManager {
  constructor(
    private routingService: RoutingService,
    private regionalWarmer: RegionalWarmer
  ) {}

  /**
   * Execute failover to backup region with warming.
   */
  async failoverToBackupRegion(
    primaryRegion: string,
    backupRegion: string
  ): Promise<void> {
    console.log(`Initiating failover from ${primaryRegion} to ${backupRegion}`);

    // 1. Stop routing traffic to failed region
    await this.routingService.disableRegion(primaryRegion);

    // 2. Warm backup region cache from data store
    console.log('Warming backup region cache...');
    const warmingResult = await this.regionalWarmer.warmRegion(backupRegion);
    console.log(`Warmed ${warmingResult.successCount} items in backup region`);

    // 3. Gradually route traffic to backup region
    await this.routingService.gradualEnable(backupRegion, {
      initialPercent: 10,
      incrementPercent: 10,
      intervalMs: 30000,
    });

    console.log(`Failover to ${backupRegion} complete`);
  }
}

interface LoadBalancerClient {
  setInstanceWeight(instanceId: string, percent: number): Promise<void>;
}

interface CacheNode {
  id: string;
  scanKeys(pattern: string): Promise<string[]>;
  mget(keys: string[]): Promise<Map<string, unknown>>;
  mset(values: Map<string, unknown>): Promise<void>;
}

interface RoutingService {
  disableRegion(region: string): Promise<void>;
  gradualEnable(
    region: string,
    ramp: { initialPercent: number; incrementPercent: number; intervalMs: number }
  ): Promise<void>;
}

interface RegionalWarmer {
  warmRegion(region: string): Promise<{ successCount: number }>;
}
```

Never immediately send 100% of traffic to a cold instance. Gradually ramp traffic (10% → 25% → 50% → 100%) while the cache warms from actual requests. This converts the cold start into gradual warming with minimal user impact.
Cache warming, when implemented poorly, can cause as many problems as it solves. Here are best practices and pitfalls to avoid:
```typescript
/**
 * Cache warming readiness checklist.
 */
interface WarmingChecklist {
  // Configuration
  hasTimeLimit: boolean;
  hasConcurrencyLimit: boolean;
  hasHotKeyIdentification: boolean;

  // Integration
  integratedWithHealthCheck: boolean;
  integratedWithDeployment: boolean;
  integratedWithScaling: boolean;

  // Observability
  hasSuccessMetrics: boolean;
  hasFailureMetrics: boolean;
  hasDurationMetrics: boolean;
  hasAlerting: boolean;

  // Resilience
  handlesMissingData: boolean;
  handlesSlowBackingStore: boolean;
  handlesBackingStoreFailure: boolean;

  // Testing
  testedInDevelopment: boolean;
  testedInStaging: boolean;
  loadTestedWarmingImpact: boolean;
}

function evaluateWarmingReadiness(checklist: WarmingChecklist): {
  ready: boolean;
  score: number;
  issues: string[];
} {
  const issues: string[] = [];
  let score = 0;
  const maxScore = Object.keys(checklist).length;

  // Critical items
  if (!checklist.hasTimeLimit) {
    issues.push('CRITICAL: No time limit - warming could hang forever');
  } else score++;

  if (!checklist.hasConcurrencyLimit) {
    issues.push('CRITICAL: No concurrency limit - could overwhelm backing store');
  } else score++;

  if (!checklist.integratedWithHealthCheck) {
    issues.push('CRITICAL: Not integrated with health check - traffic routed before ready');
  } else score++;

  // Important items
  if (!checklist.hasHotKeyIdentification) {
    issues.push('WARNING: No hot key identification - may warm wrong data');
  } else score++;

  if (!checklist.hasFailureMetrics) {
    issues.push("WARNING: No failure metrics - can't detect warming problems");
  } else score++;

  if (!checklist.handlesMissingData) {
    issues.push("WARNING: Doesn't handle missing data - may fail on edge cases");
  } else score++;

  // Nice to have
  const niceToHaveFields: (keyof WarmingChecklist)[] = [
    'integratedWithDeployment',
    'integratedWithScaling',
    'hasSuccessMetrics',
    'hasDurationMetrics',
    'hasAlerting',
    'handlesSlowBackingStore',
    'handlesBackingStoreFailure',
    'testedInDevelopment',
    'testedInStaging',
    'loadTestedWarmingImpact',
  ];
  for (const field of niceToHaveFields) {
    if (checklist[field]) score++;
  }

  const ready = !issues.some(i => i.startsWith('CRITICAL'));

  return {
    ready,
    score: Math.round((score / maxScore) * 100),
    issues,
  };
}

// Example evaluation
const myWarmingSetup: WarmingChecklist = {
  hasTimeLimit: true,
  hasConcurrencyLimit: true,
  hasHotKeyIdentification: true,
  integratedWithHealthCheck: true,
  integratedWithDeployment: true,
  integratedWithScaling: false,
  hasSuccessMetrics: true,
  hasFailureMetrics: true,
  hasDurationMetrics: true,
  hasAlerting: false,
  handlesMissingData: true,
  handlesSlowBackingStore: false,
  handlesBackingStoreFailure: false,
  testedInDevelopment: true,
  testedInStaging: true,
  loadTestedWarmingImpact: false,
};

const evaluation = evaluateWarmingReadiness(myWarmingSetup);
console.log(evaluation);
// { ready: true, score: 69, issues: [] }
```

If your system relies on 90%+ cache hit rates to meet SLOs, cache warming is mandatory infrastructure. Treat it with the same rigor as database migrations or load balancer configuration. Test it, monitor it, and plan for its failure.
Cache warming transforms cold-start risk into a managed operation. The key principles: identify the hot subset of data that serves most traffic, warm it before an instance accepts traffic, keep it warm with background refresh, and ramp traffic gradually during deployments and scaling events.
Module Complete:
You have now completed the Cache Design Considerations module, covering everything from cache key design through consistency, failure handling, and the warming strategies on this page. These concepts equip you to design and implement production-grade caching systems that perform reliably at scale.
You now possess a comprehensive understanding of cache design considerations at the LLD level. From key design to warming strategies, you can architect caching solutions that maximize hit rates, maintain consistency, handle failures gracefully, and perform optimally from the first request. Apply these patterns to build systems that scale confidently.