Imagine deploying a new version of your application. Everything works perfectly in testing. But the moment traffic hits production, response times spike from 50ms to 5000ms. Databases get hammered. Users experience timeouts. What happened?
Your cache was cold.
A cold cache—one without pre-populated data—creates a "thundering herd" problem. Every request becomes a cache miss, flooding your backing stores with sudden load. At scale, this can cascade into complete system failure.
Cache warming is the practice of proactively populating caches before they're needed. It transforms the dangerous cold-start period into a managed operation, ensuring your system performs well from the first request.
Principal engineers treat cache warming as a critical deployment concern, not an afterthought. They understand that a well-warmed cache is the difference between a smooth deployment and an incident.
By the end of this page, you will understand cache warming strategies, implementation patterns, and best practices. You'll learn how to warm caches on deployment, maintain warm caches during operation, and handle warming during scaling events.
A cold cache creates multiple cascading problems. Understanding these helps prioritize warming investments.
Cold Cache Scenarios and Warming Priorities:
| Scenario | Risk Level | Warming Priority |
|---|---|---|
| Fresh deployment (all instances) | Critical | Pre-warm before routing traffic |
| Rolling deployment (one instance) | Medium | Accept gradual warming |
| Scaling out (adding instances) | Medium | Pre-warm or gradual ramp |
| Cache system restart | Critical | Implement persistence or fast reload |
| TTL expiration of hot keys | Low-Medium | Background refresh before expiry |
| Traffic pattern shift | Low | Predictive warming if detectable |
```typescript
/**
 * Analyze the impact of cold cache on system capacity.
 */
interface SystemCapacity {
  cacheHitLatencyMs: number;       // Latency when cache hit
  cacheMissLatencyMs: number;      // Latency when cache miss
  normalHitRate: number;           // Normal cache hit rate (0-1)
  normalBackingStoreQPS: number;   // Queries/sec the database sees in normal operation
  backingStoreCapacityQPS: number; // Max queries/sec to database
  targetLatencyMs: number;         // SLA target
}

function analyzeColdCacheImpact(capacity: SystemCapacity): ColdCacheAnalysis {
  // Normal operation
  const normalAvgLatency =
    capacity.cacheHitLatencyMs * capacity.normalHitRate +
    capacity.cacheMissLatencyMs * (1 - capacity.normalHitRate);
  const normalMissRate = 1 - capacity.normalHitRate;

  // Cold cache operation (0% hit rate): every request becomes a miss,
  // so the backing store sees 1 / missRate times its normal load.
  const coldLatency = capacity.cacheMissLatencyMs;
  const coldBackingStoreMultiplier = 1 / normalMissRate;
  const coldBackingStoreQPS =
    capacity.normalBackingStoreQPS * coldBackingStoreMultiplier;

  return {
    normalAvgLatencyMs: normalAvgLatency,
    coldLatencyMs: coldLatency,
    latencyMultiplier: coldLatency / normalAvgLatency,
    backingStoreLoadMultiplier: coldBackingStoreMultiplier,
    // Overloaded if the cold-cache query rate exceeds database capacity
    backingStoreWouldOverload: coldBackingStoreQPS > capacity.backingStoreCapacityQPS,
    warmingRecommendation: getWarmingRecommendation(
      coldBackingStoreMultiplier,
      coldLatency,
      capacity.targetLatencyMs
    ),
  };
}

function getWarmingRecommendation(
  loadMultiplier: number,
  coldLatency: number,
  targetLatency: number
): WarmingRecommendation {
  if (loadMultiplier > 5 || coldLatency > targetLatency * 10) {
    return {
      priority: 'critical',
      strategy: 'mandatory-pre-warming',
      message: 'System cannot sustain cold cache. Pre-warming required before receiving traffic.',
    };
  }
  if (loadMultiplier > 2 || coldLatency > targetLatency * 3) {
    return {
      priority: 'high',
      strategy: 'pre-warming-with-gradual-ramp',
      message: 'Pre-warm critical paths, then gradually increase traffic.',
    };
  }
  return {
    priority: 'medium',
    strategy: 'gradual-warming-acceptable',
    message: 'System can absorb cold cache. Consider background warming for optimal UX.',
  };
}

interface ColdCacheAnalysis {
  normalAvgLatencyMs: number;
  coldLatencyMs: number;
  latencyMultiplier: number;
  backingStoreLoadMultiplier: number;
  backingStoreWouldOverload: boolean;
  warmingRecommendation: WarmingRecommendation;
}

interface WarmingRecommendation {
  priority: 'critical' | 'high' | 'medium' | 'low';
  strategy: string;
  message: string;
}

// Example analysis
const analysis = analyzeColdCacheImpact({
  cacheHitLatencyMs: 5,
  cacheMissLatencyMs: 100,
  normalHitRate: 0.95,
  normalBackingStoreQPS: 2500,
  backingStoreCapacityQPS: 10000,
  targetLatencyMs: 50,
});

console.log(analysis);
// {
//   normalAvgLatencyMs: 9.75,
//   coldLatencyMs: 100,
//   latencyMultiplier: 10.26,
//   backingStoreLoadMultiplier: 20,
//   backingStoreWouldOverload: true,
//   warmingRecommendation: {
//     priority: 'critical',
//     strategy: 'mandatory-pre-warming',
//     message: 'System cannot sustain cold cache...'
//   }
// }
```

If your system assumes a 95% cache hit rate, a cold cache creates 20x backing store load (100% misses vs 5% normally). Most databases cannot absorb a 20x load spike. This is why cache warming isn't optional for high-hit-rate systems—it's mission critical.
Cache warming strategies can be categorized by when and how they populate the cache:
The Warming Strategy Spectrum:
| Strategy | When Executed | Best For | Complexity |
|---|---|---|---|
| Startup Warming | Before accepting traffic | Known hot keys, critical data | Low |
| Background Warming | Continuously during operation | Maintaining warm cache, refresh | Medium |
| Lazy Warming | On first access (cache miss) | Unknown access patterns | Low |
| Predictive Warming | Based on predicted future access | Time-based patterns, ML predictions | High |
| Replica Warming | Copy from existing cache | Scaling, failover scenarios | Medium |
| TTL Refresh | Before expiration | Preventing hot key expiration | Medium |
Choosing the Right Strategy:
Most production systems combine several of these strategies rather than relying on any single one.
Typically 20% of your data serves 80% of requests. Focus startup warming on identifying and loading this hot subset. Lazy warming handles the remaining long tail. You don't need to warm everything—just enough to absorb initial traffic.
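The hot-subset idea can be sketched as a simple selection routine. This is an illustrative example, not part of any specific system: `selectHotSubset`, the access counts, and the 80% coverage threshold are all assumptions you would replace with your own metrics data.

```typescript
// Pick the smallest set of keys that covers ~80% of observed traffic.
// Input: access counts per key (e.g. exported from your metrics system).
function selectHotSubset(
  accessCounts: Record<string, number>,
  coverage = 0.8
): string[] {
  // Sort keys by access count, most-accessed first
  const entries = Object.entries(accessCounts).sort(([, a], [, b]) => b - a);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);

  const hot: string[] = [];
  let covered = 0;
  for (const [key, n] of entries) {
    if (covered >= total * coverage) break; // Enough traffic covered
    hot.push(key);
    covered += n;
  }
  return hot;
}

// Hypothetical counts: two keys dominate traffic.
const hot = selectHotSubset({
  "product:1": 500,
  "product:2": 300,
  "config:global": 150,
  "product:999": 30,
  "product:1000": 20,
});
console.log(hot); // ["product:1", "product:2"]
```

Two of five keys cover 80% of requests here, so startup warming only needs to load those; lazy warming handles the rest.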
Startup warming populates the cache before the instance accepts traffic. This is the most important warming strategy for systems with high cache dependency.
Implementation Patterns:
```typescript
/**
 * Startup cache warming implementation.
 * Warms cache before accepting traffic.
 */
interface WarmingConfig {
  hotKeysSource: 'static' | 'analytics' | 'redis-keyspace';
  maxWarmingTimeMs: number;
  maxConcurrency: number;
  minItemsToWarm: number;
  failOnWarmingError: boolean;
}

class StartupWarmer<V> {
  private cache: CacheClient<V>;
  private dataLoader: DataLoader<V>;
  private config: WarmingConfig;

  constructor(
    cache: CacheClient<V>,
    dataLoader: DataLoader<V>,
    config: WarmingConfig
  ) {
    this.cache = cache;
    this.dataLoader = dataLoader;
    this.config = config;
  }

  /**
   * Execute warming before accepting traffic.
   * Should be called during application startup.
   */
  async warmBeforeTraffic(): Promise<WarmingResult> {
    const startTime = Date.now();
    console.log('Starting cache warming...');

    // 1. Identify keys to warm
    const hotKeys = await this.identifyHotKeys();
    console.log(`Identified ${hotKeys.length} hot keys to warm`);

    if (hotKeys.length < this.config.minItemsToWarm) {
      console.warn(`Only found ${hotKeys.length} keys; expected at least ${this.config.minItemsToWarm}`);
    }

    // 2. Warm in parallel batches with concurrency limit
    const results = await this.warmKeysBatched(hotKeys);

    const elapsed = Date.now() - startTime;
    const result = this.compileResults(results, elapsed, hotKeys.length);

    console.log(`Warming complete: ${result.successCount}/${result.totalAttempted} keys in ${elapsed}ms`);

    // 3. Decide if we should proceed
    if (this.config.failOnWarmingError && result.errorCount > 0) {
      throw new Error(`Warming failed with ${result.errorCount} errors`);
    }

    return result;
  }

  /**
   * Identify hot keys based on configured source.
   */
  private async identifyHotKeys(): Promise<string[]> {
    switch (this.config.hotKeysSource) {
      case 'static':
        return this.getStaticHotKeys();
      case 'analytics':
        return this.getAnalyticsHotKeys();
      case 'redis-keyspace':
        return this.getRedisKeyspaceHotKeys();
      default:
        return this.getStaticHotKeys();
    }
  }

  /**
   * Static hot keys from configuration.
   */
  private getStaticHotKeys(): string[] {
    // Example: Critical configuration, feature flags, popular products
    return [
      'config:global',
      'feature-flags:active',
      ...Array.from({ length: 100 }, (_, i) => `product:${i + 1}`),
      ...Array.from({ length: 50 }, (_, i) => `category:${i + 1}`),
    ];
  }

  /**
   * Hot keys from analytics/metrics system.
   */
  private async getAnalyticsHotKeys(): Promise<string[]> {
    // Query analytics for most-accessed keys in last period
    // This requires integration with your metrics system
    const response = await fetch(
      'http://analytics-service/api/cache/hot-keys?period=24h&limit=1000'
    );
    const data = await response.json();
    return data.keys;
  }

  /**
   * Hot keys from Redis memory analysis.
   */
  private async getRedisKeyspaceHotKeys(): Promise<string[]> {
    // Use Redis SCAN + OBJECT FREQ for hot-key detection
    // Requires Redis 4.0+ with an LFU eviction policy
    // Alternative: Sample keys and check access patterns
    return []; // Implementation depends on Redis setup
  }

  /**
   * Warm keys in parallel with concurrency limit.
   */
  private async warmKeysBatched(keys: string[]): Promise<WarmingAttempt[]> {
    const results: WarmingAttempt[] = [];
    const deadline = Date.now() + this.config.maxWarmingTimeMs;

    // Process in chunks respecting concurrency
    const chunks = this.chunkArray(keys, this.config.maxConcurrency);

    for (const chunk of chunks) {
      if (Date.now() > deadline) {
        console.warn('Warming time limit reached, stopping early');
        break;
      }
      const chunkResults = await Promise.all(
        chunk.map(key => this.warmSingleKey(key))
      );
      results.push(...chunkResults);
    }

    return results;
  }

  /**
   * Warm a single key: load from source, store in cache.
   */
  private async warmSingleKey(key: string): Promise<WarmingAttempt> {
    const start = Date.now();
    try {
      // Load from authoritative source
      const value = await this.dataLoader.load(key);
      // Store in cache
      await this.cache.set(key, value);
      return { key, success: true, durationMs: Date.now() - start };
    } catch (error) {
      return {
        key,
        success: false,
        durationMs: Date.now() - start,
        error: error instanceof Error ? error.message : 'Unknown error',
      };
    }
  }

  private chunkArray<T>(array: T[], size: number): T[][] {
    const chunks: T[][] = [];
    for (let i = 0; i < array.length; i += size) {
      chunks.push(array.slice(i, i + size));
    }
    return chunks;
  }

  private compileResults(
    attempts: WarmingAttempt[],
    totalDurationMs: number,
    totalTargeted: number
  ): WarmingResult {
    const successful = attempts.filter(a => a.success);
    const failed = attempts.filter(a => !a.success);

    return {
      totalAttempted: attempts.length,
      successCount: successful.length,
      errorCount: failed.length,
      skippedCount: totalTargeted - attempts.length,
      totalDurationMs,
      avgItemDurationMs:
        attempts.length > 0
          ? attempts.reduce((sum, a) => sum + a.durationMs, 0) / attempts.length
          : 0,
      errors: failed.map(f => ({ key: f.key, error: f.error! })),
    };
  }
}

interface WarmingAttempt {
  key: string;
  success: boolean;
  durationMs: number;
  error?: string;
}

interface WarmingResult {
  totalAttempted: number;
  successCount: number;
  errorCount: number;
  skippedCount: number;
  totalDurationMs: number;
  avgItemDurationMs: number;
  errors: { key: string; error: string }[];
}

interface DataLoader<V> {
  load(key: string): Promise<V>;
}

interface CacheClient<V> {
  set(key: string, value: V): Promise<void>;
}
```

Deployment Integration:
Startup warming should integrate with your deployment lifecycle:
```typescript
/**
 * Kubernetes-style deployment with cache warming.
 */
import express from 'express';

class ApplicationServer {
  private app: express.Application;
  private isReady: boolean = false;
  private warmer: StartupWarmer<unknown>;

  constructor(warmer: StartupWarmer<unknown>) {
    this.app = express();
    this.warmer = warmer;
    this.setupHealthEndpoints();
  }

  private setupHealthEndpoints(): void {
    // Liveness: Is the process running?
    this.app.get('/health/live', (req, res) => {
      res.status(200).json({ status: 'alive' });
    });

    // Readiness: Should we receive traffic?
    this.app.get('/health/ready', (req, res) => {
      if (this.isReady) {
        res.status(200).json({ status: 'ready' });
      } else {
        res.status(503).json({ status: 'warming' });
      }
    });
  }

  async start(port: number): Promise<void> {
    // Start server but not ready for traffic
    this.app.listen(port, () => {
      console.log(`Server listening on port ${port} (not ready yet)`);
    });

    // Warm cache before accepting traffic
    try {
      console.log('Beginning cache warming...');
      const result = await this.warmer.warmBeforeTraffic();
      console.log(`Cache warming complete: ${result.successCount} items`);

      // Now ready for traffic
      this.isReady = true;
      console.log('Server ready for traffic');
    } catch (error) {
      console.error('Cache warming failed:', error);
      // Decide: fail startup or proceed with cold cache?
      if (process.env.REQUIRE_WARM_CACHE === 'true') {
        console.error('Shutting down due to warming failure');
        process.exit(1);
      } else {
        console.warn('Proceeding with partially warm cache');
        this.isReady = true;
      }
    }
  }
}

// Kubernetes deployment configuration (pseudo-YAML):
//
// readinessProbe:
//   httpGet:
//     path: /health/ready
//     port: 8080
//   initialDelaySeconds: 5
//   periodSeconds: 5
//
// livenessProbe:
//   httpGet:
//     path: /health/live
//     port: 8080
//   initialDelaySeconds: 10
//   periodSeconds: 10
```

Set a maximum warming time. If warming takes too long, you risk deployment timeouts, stuck deployments, and delayed rollbacks. Better to proceed with 80% warm than block indefinitely trying for 100%.
Background warming maintains cache freshness during normal operation. It proactively refreshes entries before they expire, preventing cache misses on hot keys.
The Problem with TTL Expiration:
Consider a product detail page cached with a 5-minute TTL. The moment the entry expires, every concurrent request for that page misses at once, and all of them hit the database simultaneously—a miniature thundering herd for each hot key, repeating every TTL cycle. Background refresh solves this by refreshing entries before they expire.
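A back-of-envelope calculation shows the size of each per-key herd. The request rate and reload latency below are assumed numbers for illustration:

```typescript
// Back-of-envelope: how many requests pile up on the database when a
// hot key's TTL expires and reloading it takes `reloadMs`.
function concurrentMissesAtExpiry(requestsPerSecond: number, reloadMs: number): number {
  // Every request for this key arriving during the reload window is a miss.
  return Math.ceil(requestsPerSecond * (reloadMs / 1000));
}

// A product page at 1000 RPS whose reload from the database takes 100ms:
console.log(concurrentMissesAtExpiry(1000, 100)); // 100 simultaneous misses per expiry
```

Without background refresh (or request coalescing), those 100 requests all reach the database; with it, the refresh happens once, before expiry, off the request path.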
```typescript
/**
 * Background cache refresh to prevent TTL thundering herd.
 * Refreshes entries before they expire.
 */
interface RefreshConfig {
  refreshWindowMs: number;        // Refresh this long before expiry
  maxConcurrentRefreshes: number;
  refreshIntervalMs: number;      // How often to check for items to refresh
}

class BackgroundRefresher<V> {
  private cache: CacheWithMetadata<V>;
  private loader: DataLoader<V>;
  private config: RefreshConfig;
  private refreshing: Set<string> = new Set();
  private intervalId?: NodeJS.Timeout;

  constructor(
    cache: CacheWithMetadata<V>,
    loader: DataLoader<V>,
    config: RefreshConfig
  ) {
    this.cache = cache;
    this.loader = loader;
    this.config = config;
  }

  start(): void {
    console.log('Starting background cache refresher');
    this.intervalId = setInterval(
      () => this.refreshDueItems(),
      this.config.refreshIntervalMs
    );
  }

  stop(): void {
    if (this.intervalId) {
      clearInterval(this.intervalId);
      this.intervalId = undefined;
    }
  }

  /**
   * Find and refresh items approaching expiration.
   */
  private async refreshDueItems(): Promise<void> {
    const now = Date.now();
    const refreshThreshold = now + this.config.refreshWindowMs;

    // Find items due for refresh
    const dueItems = this.cache.getItemsExpiringBefore(refreshThreshold);

    // Filter out items already being refreshed
    const toRefresh = dueItems
      .filter(item => !this.refreshing.has(item.key))
      .slice(0, this.config.maxConcurrentRefreshes);

    if (toRefresh.length > 0) {
      console.log(`Refreshing ${toRefresh.length} cache items in background`);
    }

    // Refresh in parallel
    await Promise.all(toRefresh.map(item => this.refreshItem(item.key)));
  }

  /**
   * Refresh a single item.
   */
  private async refreshItem(key: string): Promise<void> {
    this.refreshing.add(key);
    try {
      // Load fresh data
      const value = await this.loader.load(key);
      // Update cache with fresh TTL
      await this.cache.set(key, value);
      console.log(`Refreshed cache key: ${key}`);
    } catch (error) {
      console.error(`Failed to refresh ${key}:`, error);
      // Don't delete existing entry - better to serve stale than nothing
    } finally {
      this.refreshing.delete(key);
    }
  }
}

interface CacheWithMetadata<V> {
  getItemsExpiringBefore(timestamp: number): { key: string; expiresAt: number }[];
  set(key: string, value: V): Promise<void>;
}

/**
 * Alternative: Request-time stale-while-revalidate pattern.
 */
class StaleWhileRevalidateCache<V> {
  private cache: Map<string, SWREntry<V>> = new Map();
  private loader: DataLoader<V>;
  private refreshing: Set<string> = new Set();

  constructor(
    loader: DataLoader<V>,
    private ttlMs: number,
    private staleWindowMs: number // Serve stale for this long while refreshing
  ) {
    this.loader = loader;
  }

  async get(key: string): Promise<V | undefined> {
    const entry = this.cache.get(key);
    const now = Date.now();

    if (!entry) {
      return undefined; // Cache miss
    }

    const isFresh = now < entry.expiresAt;
    const isStale = now >= entry.expiresAt && now < entry.expiresAt + this.staleWindowMs;

    if (isFresh) {
      return entry.value; // Fresh hit
    }

    if (isStale) {
      // Serve stale, trigger background refresh
      if (!this.refreshing.has(key)) {
        this.triggerBackgroundRefresh(key);
      }
      return entry.value; // Stale hit (user doesn't wait)
    }

    // Past the stale window: fully expired, treat as a miss
    this.cache.delete(key);
    return undefined;
  }

  private async triggerBackgroundRefresh(key: string): Promise<void> {
    this.refreshing.add(key);
    try {
      const value = await this.loader.load(key);
      this.set(key, value);
    } catch (error) {
      console.error(`Background refresh failed for ${key}`);
    } finally {
      this.refreshing.delete(key);
    }
  }

  set(key: string, value: V): void {
    this.cache.set(key, {
      value,
      createdAt: Date.now(),
      expiresAt: Date.now() + this.ttlMs,
    });
  }
}

interface SWREntry<V> {
  value: V;
  createdAt: number;
  expiresAt: number;
}
```

The stale-while-revalidate pattern (borrowed from HTTP caching) provides the best user experience: users always get a fast response (even if slightly stale), while freshness is maintained through background updates. This pattern is used extensively at Netflix, Airbnb, and other high-traffic systems.
Predictive warming pre-fetches data based on anticipated future access patterns. This is the most sophisticated warming strategy, using historical patterns or machine learning to predict what data will be needed.
Predictive Warming Triggers:
```typescript
/**
 * Predictive cache warming based on access patterns.
 */
interface AccessPattern {
  key: string;
  hourOfDay: number[];        // Hours when frequently accessed
  dayOfWeek: number[];        // Days when frequently accessed (0=Sun)
  avgAccessesPerHour: number;
}

class PredictiveWarmer<V> {
  private patterns: Map<string, AccessPattern> = new Map();
  private cache: CacheClient<V>;
  private loader: DataLoader<V>;

  constructor(cache: CacheClient<V>, loader: DataLoader<V>) {
    this.cache = cache;
    this.loader = loader;
  }

  /**
   * Load access patterns from analytics.
   */
  async loadPatterns(source: PatternSource): Promise<void> {
    const patterns = await source.getAccessPatterns();
    for (const pattern of patterns) {
      this.patterns.set(pattern.key, pattern);
    }
    console.log(`Loaded ${patterns.length} access patterns`);
  }

  /**
   * Schedule warming based on time patterns.
   */
  scheduleTimedWarming(): void {
    // Run prediction every hour
    setInterval(() => {
      this.warmPredictedKeys();
    }, 60 * 60 * 1000);
    // Also run immediately
    this.warmPredictedKeys();
  }

  /**
   * Warm keys predicted to be accessed soon.
   */
  private async warmPredictedKeys(): Promise<void> {
    const now = new Date();
    const currentDay = now.getDay();
    const nextHour = (now.getHours() + 1) % 24;

    // Find keys expected to be hot in the next hour
    const predictedHot: string[] = [];
    for (const [key, pattern] of this.patterns) {
      if (
        pattern.hourOfDay.includes(nextHour) &&
        pattern.dayOfWeek.includes(currentDay) &&
        pattern.avgAccessesPerHour > 10 // Threshold
      ) {
        predictedHot.push(key);
      }
    }

    console.log(`Predictive warming: ${predictedHot.length} keys for upcoming hour`);

    // Warm predicted keys
    await Promise.all(predictedHot.map(key => this.warmIfMissing(key)));
  }

  private async warmIfMissing(key: string): Promise<void> {
    // Check if already cached
    const existing = await this.cache.get(key);
    if (existing !== null) {
      return; // Already warm
    }
    // Load and cache
    try {
      const value = await this.loader.load(key);
      await this.cache.set(key, value);
    } catch (error) {
      console.error(`Predictive warming failed for ${key}`);
    }
  }
}

/**
 * Event-triggered warming.
 */
class EventTriggeredWarmer<V> {
  private eventHandlers: Map<string, (event: CacheWarmingEvent) => string[]>;
  private cache: CacheClient<V>;
  private loader: DataLoader<V>;

  constructor(cache: CacheClient<V>, loader: DataLoader<V>) {
    this.cache = cache;
    this.loader = loader;
    this.eventHandlers = new Map();
  }

  /**
   * Register warming handler for event type.
   */
  registerHandler(
    eventType: string,
    handler: (event: CacheWarmingEvent) => string[]
  ): void {
    this.eventHandlers.set(eventType, handler);
  }

  /**
   * Handle incoming event.
   */
  async handleEvent(event: CacheWarmingEvent): Promise<void> {
    const handler = this.eventHandlers.get(event.type);
    if (!handler) {
      return;
    }
    const keysToWarm = handler(event);
    console.log(`Event '${event.type}' triggered warming of ${keysToWarm.length} keys`);
    await Promise.all(keysToWarm.map(key => this.warmKey(key)));
  }

  private async warmKey(key: string): Promise<void> {
    try {
      const value = await this.loader.load(key);
      await this.cache.set(key, value);
    } catch (error) {
      console.error(`Event-triggered warming failed for ${key}`);
    }
  }
}

interface CacheWarmingEvent {
  type: string;
  payload: unknown;
}

interface PatternSource {
  getAccessPatterns(): Promise<AccessPattern[]>;
}

// Unlike startup warming, predictive warming must also read from the cache
// to avoid redundant loads, so the client needs a get method here.
interface CacheClient<V> {
  get(key: string): Promise<V | null>;
  set(key: string, value: V): Promise<void>;
}

interface DataLoader<V> {
  load(key: string): Promise<V>;
}

// Example: Marketing campaign warming (assumes cache and loader instances exist)
declare const cache: CacheClient<unknown>;
declare const loader: DataLoader<unknown>;

const eventWarmer = new EventTriggeredWarmer(cache, loader);

eventWarmer.registerHandler('marketing.email.sent', (event) => {
  const campaign = event.payload as { productIds: string[] };
  // Warm all products featured in the email
  return campaign.productIds.map(id => `product:${id}`);
});

eventWarmer.registerHandler('feature.announcement', (event) => {
  const feature = event.payload as { featureKey: string };
  // Warm documentation and related pages
  return [
    `feature:${feature.featureKey}:docs`,
    `feature:${feature.featureKey}:faq`,
  ];
});
```

You don't need ML for useful predictions. Simple heuristics work well: "warm homepage products before the workday starts," "warm product pages when an inventory alert is sent." Add complexity only when simple rules prove insufficient.
Scaling events—adding instances, replacing failed nodes, or handling failover—create cold cache challenges similar to deployments. Proper warming strategies prevent performance degradation during scaling.
Scaling Scenarios and Strategies:
| Scenario | Challenge | Warming Strategy |
|---|---|---|
| Scale out (add instances) | New instances have empty local cache | Pre-warm from L2 or gradual traffic ramp |
| Scale in (remove instances) | Removed instance's cache lost | No action (remaining instances warm) |
| Rolling restart | Each instance restarts cold | Stagger restarts, pre-warm before traffic |
| Cache node failure | Portion of distributed cache lost | Replica promotion or rehashing with pre-warm |
| Region failover | All caches in failed region lost | Pre-warm new region from backup before traffic |
```typescript
/**
 * Cache warming during auto-scaling events.
 */
interface ScalingWarmingConfig {
  warmingConcurrency: number;
  maxWarmingTimeMs: number;
  trafficRampDurationMs: number;
  initialTrafficPercent: number;
}

class ScalingWarmingManager {
  private config: ScalingWarmingConfig;
  private loadBalancerClient: LoadBalancerClient;
  private warmer: StartupWarmer<unknown>;

  constructor(
    config: ScalingWarmingConfig,
    loadBalancerClient: LoadBalancerClient,
    warmer: StartupWarmer<unknown>
  ) {
    this.config = config;
    this.loadBalancerClient = loadBalancerClient;
    this.warmer = warmer;
  }

  /**
   * Scale out with proper warming.
   * Called when new instances are launched.
   */
  async onNewInstanceLaunched(instanceId: string): Promise<void> {
    console.log(`New instance ${instanceId} launched, beginning warm-up`);

    // 1. Instance is NOT in load balancer yet
    // 2. Execute cache warming
    const warmingResult = await this.warmer.warmBeforeTraffic();
    console.log(`Instance ${instanceId} warming complete: ${warmingResult.successCount} items`);

    // 3. Gradually add to load balancer
    await this.gradualTrafficRamp(instanceId);
  }

  /**
   * Gradually increase traffic to new instance.
   */
  private async gradualTrafficRamp(instanceId: string): Promise<void> {
    const steps = 5;
    const stepDuration = this.config.trafficRampDurationMs / steps;
    let currentPercent = this.config.initialTrafficPercent;
    const increment = (100 - currentPercent) / (steps - 1);

    for (let i = 0; i < steps; i++) {
      await this.loadBalancerClient.setInstanceWeight(instanceId, currentPercent);
      console.log(`Instance ${instanceId} traffic: ${currentPercent}%`);
      if (i < steps - 1) {
        await this.sleep(stepDuration);
        currentPercent += increment;
      }
    }
    console.log(`Instance ${instanceId} at full traffic`);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

/**
 * Distributed cache replica warming.
 * For when cache nodes fail and need replacement.
 */
class ReplicaWarmer {
  /**
   * Warm new cache node from surviving replicas.
   */
  async warmFromReplica(
    newNode: CacheNode,
    sourceNode: CacheNode,
    keyPattern: string
  ): Promise<void> {
    console.log(`Warming ${newNode.id} from replica ${sourceNode.id}`);

    // Get keys matching pattern from source
    const keys = await sourceNode.scanKeys(keyPattern);
    console.log(`Found ${keys.length} keys to replicate`);

    // Stream data from source to new node
    const batchSize = 100;
    for (let i = 0; i < keys.length; i += batchSize) {
      const batch = keys.slice(i, i + batchSize);
      // Get values from source
      const values = await sourceNode.mget(batch);
      // Set on new node
      await newNode.mset(values);
      console.log(`Replicated ${Math.min(i + batchSize, keys.length)}/${keys.length} keys`);
    }

    console.log('Replica warming complete');
  }
}

/**
 * Region failover with cache warming.
 */
class RegionFailoverManager {
  constructor(
    private routingService: RoutingService,
    private regionalWarmer: RegionalWarmer
  ) {}

  /**
   * Execute failover to backup region with warming.
   */
  async failoverToBackupRegion(
    primaryRegion: string,
    backupRegion: string
  ): Promise<void> {
    console.log(`Initiating failover from ${primaryRegion} to ${backupRegion}`);

    // 1. Stop routing traffic to failed region
    await this.routingService.disableRegion(primaryRegion);

    // 2. Warm backup region cache from data store
    console.log('Warming backup region cache...');
    const warmingResult = await this.regionalWarmer.warmRegion(backupRegion);
    console.log(`Warmed ${warmingResult.successCount} items in backup region`);

    // 3. Gradually route traffic to backup region
    await this.routingService.gradualEnable(backupRegion, {
      initialPercent: 10,
      incrementPercent: 10,
      intervalMs: 30000,
    });

    console.log(`Failover to ${backupRegion} complete`);
  }
}

interface LoadBalancerClient {
  setInstanceWeight(instanceId: string, percent: number): Promise<void>;
}

interface CacheNode {
  id: string;
  scanKeys(pattern: string): Promise<string[]>;
  mget(keys: string[]): Promise<Map<string, unknown>>;
  mset(values: Map<string, unknown>): Promise<void>;
}

interface RoutingService {
  disableRegion(region: string): Promise<void>;
  gradualEnable(
    region: string,
    ramp: { initialPercent: number; incrementPercent: number; intervalMs: number }
  ): Promise<void>;
}

interface RegionalWarmer {
  warmRegion(region: string): Promise<{ successCount: number }>;
}
```

Never immediately send 100% of traffic to a cold instance. Gradually ramp traffic (10% → 25% → 50% → 100%) while the cache warms from actual requests. This converts the cold start into gradual warming with minimal user impact.
Cache warming, when implemented poorly, can cause as many problems as it solves. Here are best practices and pitfalls to avoid:
```typescript
/**
 * Cache warming readiness checklist.
 */
interface WarmingChecklist {
  // Configuration
  hasTimeLimit: boolean;
  hasConcurrencyLimit: boolean;
  hasHotKeyIdentification: boolean;

  // Integration
  integratedWithHealthCheck: boolean;
  integratedWithDeployment: boolean;
  integratedWithScaling: boolean;

  // Observability
  hasSuccessMetrics: boolean;
  hasFailureMetrics: boolean;
  hasDurationMetrics: boolean;
  hasAlerting: boolean;

  // Resilience
  handlesMissingData: boolean;
  handlesSlowBackingStore: boolean;
  handlesBackingStoreFailure: boolean;

  // Testing
  testedInDevelopment: boolean;
  testedInStaging: boolean;
  loadTestedWarmingImpact: boolean;
}

function evaluateWarmingReadiness(checklist: WarmingChecklist): {
  ready: boolean;
  score: number;
  issues: string[];
} {
  const issues: string[] = [];
  let score = 0;
  const maxScore = Object.keys(checklist).length;

  // Critical items
  if (!checklist.hasTimeLimit) {
    issues.push('CRITICAL: No time limit - warming could hang forever');
  } else score++;

  if (!checklist.hasConcurrencyLimit) {
    issues.push('CRITICAL: No concurrency limit - could overwhelm backing store');
  } else score++;

  if (!checklist.integratedWithHealthCheck) {
    issues.push('CRITICAL: Not integrated with health check - traffic routed before ready');
  } else score++;

  // Important items
  if (!checklist.hasHotKeyIdentification) {
    issues.push('WARNING: No hot key identification - may warm wrong data');
  } else score++;

  if (!checklist.hasFailureMetrics) {
    issues.push("WARNING: No failure metrics - can't detect warming problems");
  } else score++;

  if (!checklist.handlesMissingData) {
    issues.push("WARNING: Doesn't handle missing data - may fail on edge cases");
  } else score++;

  // Nice to have
  const niceToHaveFields: (keyof WarmingChecklist)[] = [
    'integratedWithDeployment',
    'integratedWithScaling',
    'hasSuccessMetrics',
    'hasDurationMetrics',
    'hasAlerting',
    'handlesSlowBackingStore',
    'handlesBackingStoreFailure',
    'testedInDevelopment',
    'testedInStaging',
    'loadTestedWarmingImpact',
  ];
  for (const field of niceToHaveFields) {
    if (checklist[field]) score++;
  }

  const ready = !issues.some(i => i.startsWith('CRITICAL'));

  return {
    ready,
    score: Math.round((score / maxScore) * 100),
    issues,
  };
}

// Example evaluation
const myWarmingSetup: WarmingChecklist = {
  hasTimeLimit: true,
  hasConcurrencyLimit: true,
  hasHotKeyIdentification: true,
  integratedWithHealthCheck: true,
  integratedWithDeployment: true,
  integratedWithScaling: false,
  hasSuccessMetrics: true,
  hasFailureMetrics: true,
  hasDurationMetrics: true,
  hasAlerting: false,
  handlesMissingData: true,
  handlesSlowBackingStore: false,
  handlesBackingStoreFailure: false,
  testedInDevelopment: true,
  testedInStaging: true,
  loadTestedWarmingImpact: false,
};

const evaluation = evaluateWarmingReadiness(myWarmingSetup);
console.log(evaluation);
// { ready: true, score: 69, issues: [] }
```

If your system relies on 90%+ cache hit rates to meet SLOs, cache warming is mandatory infrastructure. Treat it with the same rigor as database migrations or load balancer configuration. Test it, monitor it, and plan for its failure.
Cache warming transforms cold-start risk into a managed operation. The key principles: identify the hot subset of data that serves most traffic, warm it before an instance accepts traffic, keep it warm with background refresh, and ramp traffic gradually during deployments and scaling events.
Module Complete:
You have now completed the Cache Design Considerations module, covering everything from cache key design through consistency, failure handling, and the warming strategies on this page. These concepts equip you to design and implement production-grade caching systems that perform reliably at scale.
You now possess a comprehensive understanding of cache design considerations at the LLD level. From key design to warming strategies, you can architect caching solutions that maximize hit rates, maintain consistency, handle failures gracefully, and perform optimally from the first request. Apply these patterns to build systems that scale confidently.