The most effective way to reduce latency is to avoid the slow operation entirely. Caching is the systematic application of this principle—storing results of expensive operations close to where they're needed, transforming seconds into microseconds.
Consider the latency hierarchy:
| Storage Layer | Access Time | Relative Speed |
|---|---|---|
| CPU L1 Cache | ~1 ns | 1x |
| CPU L3 Cache | ~10 ns | 10x slower |
| Main Memory (RAM) | ~100 ns | 100x slower |
| Local SSD | ~100 μs | 100,000x slower |
| Network (same DC) | ~500 μs | 500,000x slower |
| Network (cross-DC) | ~50 ms | 50,000,000x slower |
| Database query | 10-1000 ms | 10,000,000-1,000,000,000x slower |
Every layer you climb in this hierarchy provides orders-of-magnitude latency reduction. A Redis cache hit (~500μs) is 2,000x faster than a 1-second database query; an in-memory application cache hit (~100μs) is 10,000x faster.
By completing this page, you will understand how to design cache hierarchies for minimal latency, implement cache access patterns that maximize hit rates, handle cache misses gracefully, and make intelligent tradeoffs between latency, consistency, and memory usage.
Where you place caches determines the latency characteristics of your system. Each layer offers different tradeoffs between latency, capacity, consistency, and operational complexity.
The Multi-Layer Cache Hierarchy:
Client Request
↓
[Browser Cache] ─────────── 0ms (already on client)
↓ miss
[CDN Edge Cache] ─────────── 5-20ms (nearest POP)
↓ miss
[API Gateway Cache] ──────── 1-5ms (edge of infrastructure)
↓ miss
[Application In-Memory] ──── <1ms (same process)
↓ miss
[Distributed Cache] ──────── 1-10ms (Redis/Memcached)
↓ miss
[Database Query Cache] ───── 1-5ms (if enabled)
↓ miss
[Database Disk] ─────────── 10-1000ms (actual storage)
Each layer intercepts requests before they reach slower layers. The goal is to satisfy as many requests as possible at the fastest layer.
| Layer | Latency | Capacity | Best For | Challenges |
|---|---|---|---|---|
| Browser Cache | 0ms | MB-GB | Static assets, user-specific data | Limited control, varies by client |
| CDN Edge | 5-20ms | TB | Static content, cacheable APIs | Invalidation delay, edge complexity |
| API Gateway | 1-5ms | GB | Auth tokens, rate limiting, common responses | Single point of failure concern |
| In-Memory (App) | <1ms | MB-GB | Hot data, computed values, session | Lost on restart, not shared across instances |
| Distributed (Redis) | 1-10ms | TB | Shared state, sessions, feature flags | Network hop, operational overhead |
| Database Query | 1-5ms | MB-GB | Repeated exact queries | Limited hit rates, memory contention |
Layer Selection Guidelines:
Cache at the client when:
- Data is static or specific to one user and tolerates staleness (assets, previously fetched responses)
- You want repeat views to cost zero network round-trips

Cache at the CDN/edge when:
- The same content is served to many users and is cacheable by URL (static content, public API responses)
- You want to absorb traffic before it reaches your origin

Cache in the application when:
- Data is hot, small, and read far more often than it changes (computed values, configuration, session data)
- Slight divergence between instances is acceptable

Cache in distributed cache when:
- State must be shared across instances or survive restarts (sessions, feature flags, shared computed results)
- The working set is too large for process memory
Combine in-memory (L1) and distributed (L2) caches for optimal latency. Check local memory first (<1ms), then Redis (1-10ms), then database. This pattern handles 90%+ of requests from L1 while L2 provides consistency across instances. Libraries like Caffeine (Java) or node-cache (Node.js) plus Redis form this naturally.
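A minimal sketch of that two-tier lookup, using a plain in-process Map as L1 and ioredis as L2 (the TTL values and the `loadFromDb` loader are illustrative placeholders):

```typescript
// Two-tier cache: L1 = process-local memory, L2 = Redis (shared)
import Redis from 'ioredis';

const redis = new Redis();
const l1 = new Map<string, { value: unknown; expiresAt: number }>();
const L1_TTL_MS = 5_000; // short, bounds cross-instance staleness
const L2_TTL_S = 3600;   // longer, shared across instances

async function getTwoTier<T>(key: string, loadFromDb: () => Promise<T>): Promise<T> {
  // L1: same-process memory (<1ms)
  const local = l1.get(key);
  if (local && local.expiresAt > Date.now()) {
    return local.value as T;
  }

  // L2: Redis (1-10ms)
  const remote = await redis.get(key);
  if (remote) {
    const value = JSON.parse(remote) as T;
    l1.set(key, { value, expiresAt: Date.now() + L1_TTL_MS });
    return value;
  }

  // Miss in both tiers: load from source, populate L2 then L1
  const value = await loadFromDb();
  await redis.setex(key, L2_TTL_S, JSON.stringify(value));
  l1.set(key, { value, expiresAt: Date.now() + L1_TTL_MS });
  return value;
}
```

Keeping the L1 TTL short limits how long instances can disagree, while the shared L2 entry absorbs most cold starts.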
The access pattern—how you interact with the cache—directly impacts achieved latency. Some patterns are designed for throughput; others optimize specifically for latency.
Cache-Aside (Lazy Loading):
The most common pattern. Application checks cache first, loads from database on miss, and populates cache.
```typescript
// Cache-Aside pattern implementation
// 'db' is assumed to be an existing database client (e.g., a Prisma client) defined elsewhere
import Redis from 'ioredis';

const redis = new Redis();
const CACHE_TTL = 3600; // 1 hour

interface User {
  id: string;
  name: string;
  email: string;
}

async function getUserById(userId: string): Promise<User | null> {
  const cacheKey = `user:${userId}`;

  // Step 1: Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    // Cache hit - return immediately (1-10ms)
    return JSON.parse(cached);
  }

  // Step 2: Cache miss - load from database (10-1000ms)
  const user = await db.users.findUnique({ where: { id: userId } });

  if (user) {
    // Step 3: Populate cache for next time
    await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(user));
  }

  return user;
}

// Latency characteristics:
// - Cache hit: 1-10ms
// - Cache miss: 10-1000ms + 1-10ms (to populate)
// - First request for any user is always slow
```

Read-Through Cache:
Cache handles loading transparently. Application only interacts with cache; cache loads from source on miss.
```typescript
// Read-Through cache with automatic loading
class ReadThroughCache<T> {
  private cache: Map<string, { value: T; expiresAt: number }> = new Map();
  private pending: Map<string, Promise<T>> = new Map();

  constructor(
    private loader: (key: string) => Promise<T>,
    private ttlMs: number = 60000
  ) {}

  async get(key: string): Promise<T> {
    // Check if cached and not expired
    const cached = this.cache.get(key);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.value;
    }

    // Prevent thundering herd - only one loader per key
    if (this.pending.has(key)) {
      return this.pending.get(key)!;
    }

    // Load and cache
    const loadPromise = this.loader(key).then(value => {
      this.cache.set(key, { value, expiresAt: Date.now() + this.ttlMs });
      this.pending.delete(key);
      return value;
    }).catch(err => {
      this.pending.delete(key);
      throw err;
    });

    this.pending.set(key, loadPromise);
    return loadPromise;
  }
}

// Usage - application never directly queries database
const userCache = new ReadThroughCache<User>(
  async (userId) => db.users.findUnique({ where: { id: userId } }),
  3600000 // 1 hour TTL
);

async function handleRequest(userId: string) {
  // Cache handles everything - always same interface
  const user = await userCache.get(userId);
  return user;
}
```

Cache Warming / Preloading:
For latency-critical paths, cold cache is unacceptable. Preload cache with expected hot data before traffic arrives.
```typescript
// Cache warming strategies

// 1. Warm on startup
async function warmCacheOnStartup() {
  console.log('Warming cache...');

  // Load most accessed items
  const popularProducts = await db.products.findMany({
    orderBy: { viewCount: 'desc' },
    take: 1000
  });

  await Promise.all(
    popularProducts.map(p =>
      redis.setex(`product:${p.id}`, 3600, JSON.stringify(p))
    )
  );

  console.log(`Warmed ${popularProducts.length} products`);
}

// 2. Warm before deployment/cache flush
async function preWarmBeforeCacheFlush() {
  // Get current cache keys
  const keys = await redis.keys('product:*');

  // Refresh all in background
  for (const key of keys) {
    const id = key.split(':')[1];
    const product = await db.products.findUnique({ where: { id } });
    if (product) {
      await redis.setex(key, 3600, JSON.stringify(product));
    }
  }
}

// 3. Predictive warming based on patterns
async function predictiveWarm() {
  // Warm Monday morning data on Sunday night
  const upcomingReports = await db.reports.findMany({
    where: {
      scheduledFor: {
        gte: new Date(),
        lte: new Date(Date.now() + 24 * 60 * 60 * 1000)
      }
    }
  });

  for (const report of upcomingReports) {
    // Pre-compute and cache report data
    const data = await computeReportData(report.id);
    await redis.setex(`report:${report.id}`, 86400, JSON.stringify(data));
  }
}

// Run warming on server start
warmCacheOnStartup().catch(console.error);
```

After deployments, cache restarts, or traffic spikes to uncached data, latency will spike. Monitor cold cache latency separately from warm cache latency. Implement cache warming for critical paths, and use gradual rollouts to warm caches before receiving full traffic.
Caching can paradoxically increase latency if not handled properly. Several patterns cause cache-related latency spikes.
Thundering Herd / Cache Stampede:
When a cached item expires, hundreds of concurrent requests all find a cache miss simultaneously, all query the database, overwhelming it.
```typescript
// Stampede protection with distributed lock
import Redis from 'ioredis';
import Redlock from 'redlock';

const redis = new Redis();
const redlock = new Redlock([redis]);

async function getCachedWithLock<T>(
  key: string,
  ttl: number,
  loader: () => Promise<T>
): Promise<T> {
  // Try to get from cache
  const cached = await redis.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss - acquire lock to prevent stampede
  const lockKey = `lock:${key}`;

  try {
    // Try to acquire lock (auto-expires after 5s; retry behavior comes from Redlock options)
    const lock = await redlock.acquire([lockKey], 5000);

    try {
      // Double-check cache (another request may have populated)
      const recheckCached = await redis.get(key);
      if (recheckCached) {
        return JSON.parse(recheckCached);
      }

      // We have the lock, load from source
      const value = await loader();
      await redis.setex(key, ttl, JSON.stringify(value));
      return value;
    } finally {
      await lock.release();
    }
  } catch (lockError) {
    // Could not acquire lock - another request is loading
    // Wait briefly and try cache again
    await new Promise(r => setTimeout(r, 100));

    const waitedCached = await redis.get(key);
    if (waitedCached) {
      return JSON.parse(waitedCached);
    }

    // Still no cache, fall through to direct load
    // (prevents deadlock if lock holder crashes)
    return loader();
  }
}

// Alternative: Stale-While-Revalidate
// Returns stale data immediately while refreshing in background
async function getWithSWR<T>(
  key: string,
  ttl: number,
  staleTtl: number, // How long stale data is acceptable
  loader: () => Promise<T>
): Promise<T> {
  const data = await redis.hgetall(`swr:${key}`);
  const now = Date.now();

  if (data.value) {
    const expiresAt = parseInt(data.expiresAt);
    const staleUntil = parseInt(data.staleUntil);

    if (now < expiresAt) {
      // Fresh - return immediately
      return JSON.parse(data.value);
    }

    if (now < staleUntil) {
      // Stale but acceptable - return immediately, refresh async
      refreshInBackground(key, ttl, staleTtl, loader);
      return JSON.parse(data.value);
    }
  }

  // No data or too stale - must wait for fresh
  return refreshAndReturn(key, ttl, staleTtl, loader);
}

async function refreshInBackground<T>(
  key: string,
  ttl: number,
  staleTtl: number,
  loader: () => Promise<T>
) {
  // Set a flag to prevent multiple background refreshes
  const refreshKey = `refreshing:${key}`;
  const acquired = await redis.set(refreshKey, '1', 'EX', 30, 'NX');
  if (!acquired) return; // Another process is refreshing

  try {
    const value = await loader();
    const now = Date.now();
    await redis.hset(`swr:${key}`, {
      value: JSON.stringify(value),
      expiresAt: (now + ttl * 1000).toString(),
      staleUntil: (now + (ttl + staleTtl) * 1000).toString()
    });
  } finally {
    await redis.del(refreshKey);
  }
}

// Missing helper, added for completeness: synchronous refresh path
// used when no usable cached value exists
async function refreshAndReturn<T>(
  key: string,
  ttl: number,
  staleTtl: number,
  loader: () => Promise<T>
): Promise<T> {
  const value = await loader();
  const now = Date.now();
  await redis.hset(`swr:${key}`, {
    value: JSON.stringify(value),
    expiresAt: (now + ttl * 1000).toString(),
    staleUntil: (now + (ttl + staleTtl) * 1000).toString()
  });
  return value;
}
```

Other Cache Latency Issues:
Cache Serialization Overhead: JSON serialization/deserialization of large objects adds latency. For hot paths with complex objects, consider:
- A faster binary format (e.g., MessagePack or Protocol Buffers) instead of JSON
- Caching the already-serialized bytes so you serialize once, not on every request
- Keeping frequently used objects deserialized in an in-process cache

Network Latency to Cache: Distributed caches add a network round-trip. For ultra-low-latency needs:
- Put an in-process L1 cache in front of the distributed cache (see the two-tier sketch above)
- Co-locate the cache with the application (same availability zone, or a local sidecar)
- Reuse connections and pipeline commands instead of opening a connection per request

Large Value Latency: Retrieving 10MB cached values takes time (network transfer). Solutions:
- Compress values before caching (see the sketch below)
- Cache smaller derived views or individual fields instead of whole blobs
- Split very large objects into chunks and fetch only what the request needs
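A minimal sketch of the compression idea, assuming ioredis and Node's built-in zlib (the key names are illustrative; whether compression pays off depends on value size and CPU budget):

```typescript
// Compress large values before caching to reduce transfer time
import Redis from 'ioredis';
import { gzipSync, gunzipSync } from 'zlib';

const redis = new Redis();

async function setCompressed(key: string, value: unknown, ttlSeconds: number) {
  // Serialize once, then gzip the bytes
  const compressed = gzipSync(Buffer.from(JSON.stringify(value)));
  await redis.setex(key, ttlSeconds, compressed);
}

async function getCompressed<T>(key: string): Promise<T | null> {
  // getBuffer returns raw bytes instead of a UTF-8 string
  const compressed = await redis.getBuffer(key);
  if (!compressed) return null;
  return JSON.parse(gunzipSync(compressed).toString()) as T;
}
```

The trade is CPU for bandwidth: for values in the tens of kilobytes and up, the smaller network transfer usually more than repays the compression cost.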
When making multiple Redis calls, pipeline them. Without pipelining, 10 Redis calls = 10 round-trips = 10-100ms. With pipelining, 10 calls = 1 round-trip = 1-10ms. Use MULTI/EXEC for atomic pipelines, or client.pipeline() for non-atomic batching.
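A short ioredis sketch of that batching (the keys are illustrative):

```typescript
// Without pipelining: each command waits for its own round-trip.
// With pipelining: all commands share one round-trip.
import Redis from 'ioredis';

const redis = new Redis();

async function getUserBundle(userId: string) {
  const pipeline = redis.pipeline();
  pipeline.get(`v1:user:${userId}`);
  pipeline.get(`v1:user:${userId}:preferences`);
  pipeline.smembers(`v1:user:${userId}:roles`);

  // exec() sends everything at once and returns [error, result] pairs
  const results = await pipeline.exec();
  const [profile, preferences, roles] = results!.map(([err, value]) => {
    if (err) throw err;
    return value;
  });

  return { profile, preferences, roles };
}
```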
Cache key design impacts hit rates, memory efficiency, and debugging. Poor key design leads to cache misses for data that should be cached, or cache pollution from data that shouldn't.
Key Design Principles:
- Include every dimension that changes the result: `products:us:featured` vs `products:de:featured`.
- Keep ordering consistent: `user:123:orders`, not sometimes `orders:user:123`. Establish conventions.
- Version your keys: `v2:user:123` allows safe migration when cache format changes.
- Namespace by domain: `auth:token:xyz`, `catalog:product:456` prevents collisions in shared cache.
- Keep keys compact: `u:123` saves bytes vs `user_id:123` at scale.
- But stay readable: `x:a1b2c3` is poor; `session:a1b2c3` is clear.
```typescript
// Cache key builder for consistency
import crypto from 'crypto';

class CacheKeyBuilder {
  private static VERSION = 'v1';
  private static SEPARATOR = ':';

  // User-related keys
  static user(userId: string): string {
    return `${this.VERSION}:user:${userId}`;
  }

  static userOrders(userId: string, status?: string): string {
    const base = `${this.VERSION}:user:${userId}:orders`;
    return status ? `${base}:${status}` : base;
  }

  // Product-related keys with locale
  static product(productId: string, locale: string = 'en'): string {
    return `${this.VERSION}:prod:${locale}:${productId}`;
  }

  static productList(category: string, page: number, locale: string = 'en'): string {
    return `${this.VERSION}:prodlist:${locale}:${category}:p${page}`;
  }

  // Search results - include all query parameters
  static searchResults(params: {
    query: string;
    category?: string;
    sortBy?: string;
    page?: number;
  }): string {
    // Normalize and sort params for consistent keys
    const normalized = {
      q: params.query.toLowerCase().trim(),
      c: params.category || 'all',
      s: params.sortBy || 'relevance',
      p: params.page || 1
    };

    // Create deterministic key from params
    const hash = crypto
      .createHash('md5')
      .update(JSON.stringify(normalized))
      .digest('hex')
      .substring(0, 12);

    return `${this.VERSION}:search:${hash}`;
  }

  // Session keys with TTL consideration
  static session(sessionId: string): string {
    // Sessions are short-lived, no version needed
    return `sess:${sessionId}`;
  }

  // Feature flags - include rollout segment
  static featureFlag(flagName: string, userId?: string): string {
    // Global flag or user-specific override
    return userId
      ? `ff:${flagName}:u:${userId}`
      : `ff:${flagName}:global`;
  }
}

// Usage
const userKey = CacheKeyBuilder.user('12345');
// => "v1:user:12345"

const searchKey = CacheKeyBuilder.searchResults({
  query: 'laptop',
  category: 'electronics',
  page: 2
});
// => "v1:search:a7b3c9d2e1f0"

// Wildcard patterns for bulk invalidation
// Redis: SCAN with pattern matching
async function invalidateUserCache(userId: string) {
  const pattern = `v1:user:${userId}:*`;
  let cursor = '0';

  do {
    const [nextCursor, keys] = await redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
    cursor = nextCursor;

    if (keys.length > 0) {
      await redis.del(...keys);
    }
  } while (cursor !== '0');
}
```

Be careful with high-cardinality keys. If every unique search query gets a cache key, you'll have millions of rarely-hit keys consuming memory. Set short TTLs for high-cardinality caches (for example, 5 minutes) and monitor cache hit rates. If the hit rate is below 10%, the cache may not be worth the memory.
Time-To-Live (TTL) balances data freshness against cache hit rate. Longer TTLs mean higher hit rates but staler data. The right TTL depends on how quickly data changes and how tolerant users are of staleness.
TTL Selection Framework:
| Data Type | Suggested TTL | Rationale |
|---|---|---|
| Static assets (JS, CSS, images) | 1 year + versioned URLs | Never changes without URL change |
| Reference data (countries, categories) | 24 hours | Changes very rarely |
| User profiles | 1-6 hours | Changes occasionally, staleness usually acceptable |
| Product details | 5-60 minutes | Changes with inventory, pricing updates |
| Search results | 1-5 minutes | Needs freshness but exact consistency not critical |
| Session/auth tokens | Match token expiry | Security-sensitive, must not outlive token |
| Real-time data (stock prices) | Seconds or none | Staleness directly impacts correctness |
| Computed aggregates | 1-60 minutes | Expensive to compute, staleness often acceptable |
Advanced TTL Patterns:
Adaptive TTL: Adjust TTL based on data volatility. Frequently-changing data gets short TTL; stable data gets long TTL.
```typescript
// Adaptive TTL based on change frequency
class AdaptiveTTLCache {
  private changeHistory: Map<string, number[]> = new Map();

  private calculateTTL(key: string): number {
    const history = this.changeHistory.get(key) || [];

    if (history.length < 2) {
      // Not enough history, use default
      return 300; // 5 minutes
    }

    // Calculate average time between changes
    const intervals: number[] = [];
    for (let i = 1; i < history.length; i++) {
      intervals.push(history[i] - history[i - 1]);
    }
    const avgInterval = intervals.reduce((a, b) => a + b, 0) / intervals.length;

    // TTL = 50% of average change interval
    // (ensures usually fresh, occasional staleness)
    const calculatedTTL = Math.floor(avgInterval * 0.5 / 1000);

    // Clamp to reasonable bounds
    return Math.max(60, Math.min(3600, calculatedTTL));
  }

  async set(key: string, value: unknown) {
    const ttl = this.calculateTTL(key);
    await redis.setex(key, ttl, JSON.stringify(value));

    // Record change timestamp
    const history = this.changeHistory.get(key) || [];
    history.push(Date.now());
    // Keep last 10 changes
    if (history.length > 10) history.shift();
    this.changeHistory.set(key, history);

    console.log(`Cache set: ${key} with TTL ${ttl}s`);
  }
}

// Jittered TTL to prevent synchronized expiration
function ttlWithJitter(baseTtl: number, jitterPercent: number = 10): number {
  const jitterRange = baseTtl * (jitterPercent / 100);
  const jitter = (Math.random() - 0.5) * 2 * jitterRange;
  return Math.floor(baseTtl + jitter);
}

// Usage: All keys expire around 300s, but not simultaneously
await redis.setex(key, ttlWithJitter(300), value);
// Results in TTLs between 270-330 seconds

// Probabilistic Early Expiration (to prevent stampede)
function shouldRefresh(ttl: number, remainingTtl: number): boolean {
  if (remainingTtl <= 0) return true;

  // Probability increases as expiration approaches:
  // 0% when fresh, ~10% at half the TTL remaining, ~20% just before expiry
  const fractionRemaining = remainingTtl / ttl;
  const refreshProbability = Math.max(0, (1 - fractionRemaining) * 0.2);

  return Math.random() < refreshProbability;
}
```

TTL-based expiration is simpler but means data can be stale for up to the TTL duration. For a consistent user experience (e.g., a user updates their name and sees the change immediately), combine TTL with active invalidation: keep a long TTL for efficiency, but explicitly delete cache keys when data changes. Event-driven invalidation (user.updated → delete cache) provides both efficiency and consistency.
When multiple concurrent requests need the same data, request coalescing merges them into a single backend call. This reduces load and prevents redundant work.
The Problem:
Imagine 100 users open a product page simultaneously. Without coalescing:
- All 100 requests miss the cache at the same moment
- All 100 issue the same database query, so the database does 100x the necessary work
- Every user waits on a database that is now under needless load

With coalescing:
- The first request triggers a single database query
- The other 99 requests wait for and share that result
- The database sees one query, and all 100 users get the response at roughly the same time
Implementation: Single-Flight Pattern
```typescript
// Single-flight / request coalescing implementation
class SingleFlight<T> {
  private inFlight: Map<string, Promise<T>> = new Map();

  async do(key: string, fn: () => Promise<T>): Promise<T> {
    // Check if request is already in flight
    const existing = this.inFlight.get(key);
    if (existing) {
      // Join existing request instead of making new one
      return existing;
    }

    // First request - make the call
    const promise = fn().finally(() => {
      // Clean up after completion
      this.inFlight.delete(key);
    });

    this.inFlight.set(key, promise);
    return promise;
  }
}

// Usage with cache
const singleFlight = new SingleFlight<Product | null>();

async function getProduct(productId: string): Promise<Product | null> {
  const cacheKey = `product:${productId}`;

  // Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Use single-flight for database call
  const product = await singleFlight.do(cacheKey, async () => {
    const result = await db.products.findUnique({ where: { id: productId } });

    // Populate cache
    if (result) {
      await redis.setex(cacheKey, 3600, JSON.stringify(result));
    }

    return result;
  });

  return product;
}

// Now 100 concurrent calls to getProduct('abc') result in:
// - 1 database query
// - 1 cache write
// - 100 successful responses

// DataLoader pattern (popular in GraphQL)
// Automatically batches and deduplicates within a request
import DataLoader from 'dataloader';

// Create loader that batches requests
const productLoader = new DataLoader<string, Product>(async (ids) => {
  // Single query for all requested IDs
  const products = await db.products.findMany({
    where: { id: { in: [...ids] } }
  });

  // Return in same order as requested IDs
  const productMap = new Map(products.map(p => [p.id, p]));
  return ids.map(id => productMap.get(id) || null);
}, {
  // Cache within request (typically request-scoped loader)
  cache: true,
  // Batch within 10ms window
  batchScheduleFn: callback => setTimeout(callback, 10)
});

// Usage - calls are automatically batched
async function getRelatedProducts(productId: string) {
  const product = await productLoader.load(productId);

  // Even if these are loaded in separate parts of code,
  // they'll be batched into one DB query
  const related = await Promise.all(
    product.relatedIds.map(id => productLoader.load(id))
  );

  return { product, related };
}
```

DataLoader should typically be request-scoped (new instance per HTTP request). This ensures batching within a request while preventing cross-request data leakage. Create the loader in middleware and attach it to the request context, as sketched below. For SingleFlight on cache, global scope is usually fine since results are cached anyway.
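A minimal sketch of that request-scoping idea, assuming an Express app; `createProductLoader` is a hypothetical factory wrapping the DataLoader definition above, and `Product`/`db` are as in that block:

```typescript
// Request-scoped DataLoader: a fresh instance per HTTP request
import express from 'express';
import DataLoader from 'dataloader';

// Factory so each request gets its own batching/cache scope
function createProductLoader() {
  return new DataLoader<string, Product | null>(async (ids) => {
    const products = await db.products.findMany({ where: { id: { in: [...ids] } } });
    const byId = new Map(products.map(p => [p.id, p]));
    return ids.map(id => byId.get(id) ?? null);
  });
}

const app = express();

// Middleware attaches loaders to the request context
app.use((req: any, _res, next) => {
  req.loaders = { product: createProductLoader() };
  next();
});

app.get('/products/:id', async (req: any, res) => {
  // Loads within this request are batched and deduplicated;
  // nothing leaks to the next request
  const product = await req.loaders.product.load(req.params.id);
  res.json(product);
});
```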
Cache invalidation is famously one of the two hard problems in computer science. From a latency perspective, the goal is to maintain high hit rates while ensuring users don't see unacceptably stale data.
Invalidation Strategies:
```typescript
// Event-driven cache invalidation
import { EventEmitter } from 'events';

// Create application event bus
const eventBus = new EventEmitter();

// Entity update handlers
eventBus.on('user.updated', async (userId: string) => {
  // Invalidate user cache
  await redis.del(`v1:user:${userId}`);

  // Also invalidate derived caches
  await redis.del(`v1:user:${userId}:orders`);
  await redis.del(`v1:user:${userId}:preferences`);

  console.log(`Invalidated cache for user ${userId}`);
});

eventBus.on('product.updated', async (productId: string) => {
  // Invalidate in all locales
  const locales = ['en', 'de', 'fr', 'es'];
  await Promise.all(
    locales.map(locale =>
      redis.del(`v1:prod:${locale}:${productId}`)
    )
  );

  // Invalidate any lists containing this product
  // (harder - may need tracking of dependencies)
});

// Usage in update function
async function updateUser(userId: string, updates: Partial<User>) {
  // Update database
  const user = await db.users.update({
    where: { id: userId },
    data: updates
  });

  // Emit event for cache invalidation
  eventBus.emit('user.updated', userId);

  return user;
}

// Versioned cache keys for "instant" invalidation
class VersionedCache {
  private async getVersion(entity: string, id: string): Promise<number> {
    const version = await redis.get(`version:${entity}:${id}`);
    // Default to 0 so the first invalidation (INCR -> 1) actually changes the key
    return version ? parseInt(version) : 0;
  }

  private async incrementVersion(entity: string, id: string): Promise<void> {
    await redis.incr(`version:${entity}:${id}`);
  }

  async get<T>(entity: string, id: string): Promise<T | null> {
    const version = await this.getVersion(entity, id);
    const key = `${entity}:${id}:v${version}`;
    const cached = await redis.get(key);
    return cached ? JSON.parse(cached) : null;
  }

  async set<T>(entity: string, id: string, value: T, ttl: number): Promise<void> {
    const version = await this.getVersion(entity, id);
    const key = `${entity}:${id}:v${version}`;
    await redis.setex(key, ttl, JSON.stringify(value));
  }

  async invalidate(entity: string, id: string): Promise<void> {
    // Simply increment version - old keys become orphaned
    // (will be cleaned up by TTL)
    await this.incrementVersion(entity, id);
  }
}

// Usage
const versionedCache = new VersionedCache();

// Reads use current version
const user = await versionedCache.get('user', '123');

// Invalidation is instant - just increment version
await versionedCache.invalidate('user', '123');

// Next read will cache miss (and populate with new version)
```

Event-driven invalidation provides the best consistency but adds complexity. You need reliable event delivery (message queue), careful dependency tracking (what caches depend on this entity?), and monitoring for invalidation failures. Start with TTL-based caching, and move to event-driven invalidation only when consistency requirements demand it.
Caching is the most powerful tool for latency reduction—moving data closer to where it's needed and avoiding expensive operations entirely. Effective caching requires understanding the memory hierarchy, choosing appropriate access patterns, and handling edge cases that can actually increase latency.
What's Next:
Caching reduces latency by avoiding work. Async processing reduces latency by deferring work—moving expensive operations out of the request path entirely. The next page explores how to use queues, background jobs, and event-driven architectures to achieve ultra-low-latency responses.
You now understand how to use caching as a latency optimization strategy. From multi-layer cache hierarchies to stampede prevention to intelligent invalidation, these techniques reduce response times by orders of magnitude while maintaining acceptable data freshness.