The most effective way to reduce latency is to avoid the slow operation entirely. Caching is the systematic application of this principle—storing results of expensive operations close to where they're needed, transforming seconds into microseconds.
Consider the latency hierarchy:
| Storage Layer | Access Time | Relative Speed |
|---|---|---|
| CPU L1 Cache | ~1 ns | 1x |
| CPU L3 Cache | ~10 ns | 10x slower |
| Main Memory (RAM) | ~100 ns | 100x slower |
| Local SSD | ~100 μs | 100,000x slower |
| Network (same DC) | ~500 μs | 500,000x slower |
| Network (cross-DC) | ~50 ms | 50,000,000x slower |
| Database query | 10-1000 ms | 10,000,000-1,000,000,000x slower |
Every layer you climb in this hierarchy provides orders-of-magnitude latency reduction. A Redis cache hit (~500μs) is 2,000x faster than a 1-second database query; an in-memory application cache hit (~100μs) is 10,000x faster.
By completing this page, you will understand how to design cache hierarchies for minimal latency, implement cache access patterns that maximize hit rates, handle cache misses gracefully, and make intelligent tradeoffs between latency, consistency, and memory usage.
Where you place caches determines the latency characteristics of your system. Each layer offers different tradeoffs between latency, capacity, consistency, and operational complexity.
The Multi-Layer Cache Hierarchy:
Client Request
↓
[Browser Cache] ─────────── 0ms (already on client)
↓ miss
[CDN Edge Cache] ─────────── 5-20ms (nearest POP)
↓ miss
[API Gateway Cache] ──────── 1-5ms (edge of infrastructure)
↓ miss
[Application In-Memory] ──── <1ms (same process)
↓ miss
[Distributed Cache] ──────── 1-10ms (Redis/Memcached)
↓ miss
[Database Query Cache] ───── 1-5ms (if enabled)
↓ miss
[Database Disk] ─────────── 10-1000ms (actual storage)
Each layer intercepts requests before they reach slower layers. The goal is to satisfy as many requests as possible at the fastest layer.
| Layer | Latency | Capacity | Best For | Challenges |
|---|---|---|---|---|
| Browser Cache | 0ms | MB-GB | Static assets, user-specific data | Limited control, varies by client |
| CDN Edge | 5-20ms | TB | Static content, cacheable APIs | Invalidation delay, edge complexity |
| API Gateway | 1-5ms | GB | Auth tokens, rate limiting, common responses | Single point of failure concern |
| In-Memory (App) | <1ms | MB-GB | Hot data, computed values, session | Lost on restart, not shared across instances |
| Distributed (Redis) | 1-10ms | TB | Shared state, sessions, feature flags | Network hop, operational overhead |
| Database Query | 1-5ms | MB-GB | Repeated exact queries | Limited hit rates, memory contention |
Layer Selection Guidelines:
Cache at the client when:
- Data is static or specific to one user and tolerates staleness (assets, previously fetched responses)
- You want repeat views to cost zero network round-trips

Cache at the CDN/edge when:
- The same content is served to many users and is cacheable by URL (static content, public API responses)
- You want to absorb traffic before it reaches your origin

Cache in the application when:
- Data is hot, small, and read far more often than it changes (computed values, configuration, session data)
- Slight divergence between instances is acceptable

Cache in distributed cache when:
- State must be shared across instances or survive restarts (sessions, feature flags, shared computed results)
- The working set is too large for process memory
Combine in-memory (L1) and distributed (L2) caches for optimal latency. Check local memory first (<1ms), then Redis (1-10ms), then database. This pattern handles 90%+ of requests from L1 while L2 provides consistency across instances. Libraries like Caffeine (Java) or node-cache (Node.js) plus Redis form this naturally.
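A minimal sketch of that two-tier lookup, using a plain in-process Map as L1 and ioredis as L2 (the TTL values and the `loadFromDb` loader are illustrative placeholders):

```typescript
// Two-tier cache: L1 = process-local memory, L2 = Redis (shared)
import Redis from 'ioredis';

const redis = new Redis();
const l1 = new Map<string, { value: unknown; expiresAt: number }>();
const L1_TTL_MS = 5_000; // short, bounds cross-instance staleness
const L2_TTL_S = 3600;   // longer, shared across instances

async function getTwoTier<T>(key: string, loadFromDb: () => Promise<T>): Promise<T> {
  // L1: same-process memory (<1ms)
  const local = l1.get(key);
  if (local && local.expiresAt > Date.now()) {
    return local.value as T;
  }

  // L2: Redis (1-10ms)
  const remote = await redis.get(key);
  if (remote) {
    const value = JSON.parse(remote) as T;
    l1.set(key, { value, expiresAt: Date.now() + L1_TTL_MS });
    return value;
  }

  // Miss in both tiers: load from source, populate L2 then L1
  const value = await loadFromDb();
  await redis.setex(key, L2_TTL_S, JSON.stringify(value));
  l1.set(key, { value, expiresAt: Date.now() + L1_TTL_MS });
  return value;
}
```

Keeping the L1 TTL short limits how long instances can disagree, while the shared L2 entry absorbs most cold starts.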
The access pattern—how you interact with the cache—directly impacts achieved latency. Some patterns are designed for throughput; others optimize specifically for latency.
Cache-Aside (Lazy Loading):
The most common pattern. Application checks cache first, loads from database on miss, and populates cache.
```typescript
// Cache-Aside pattern implementation
// 'db' is assumed to be an existing database client (e.g., a Prisma client) defined elsewhere
import Redis from 'ioredis';

const redis = new Redis();
const CACHE_TTL = 3600; // 1 hour

interface User {
  id: string;
  name: string;
  email: string;
}

async function getUserById(userId: string): Promise<User | null> {
  const cacheKey = `user:${userId}`;

  // Step 1: Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    // Cache hit - return immediately (1-10ms)
    return JSON.parse(cached);
  }

  // Step 2: Cache miss - load from database (10-1000ms)
  const user = await db.users.findUnique({ where: { id: userId } });

  if (user) {
    // Step 3: Populate cache for next time
    await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(user));
  }

  return user;
}

// Latency characteristics:
// - Cache hit: 1-10ms
// - Cache miss: 10-1000ms + 1-10ms (to populate)
// - First request for any user is always slow
```

Read-Through Cache:
Cache handles loading transparently. Application only interacts with cache; cache loads from source on miss.
```typescript
// Read-Through cache with automatic loading
class ReadThroughCache<T> {
  private cache: Map<string, { value: T; expiresAt: number }> = new Map();
  private pending: Map<string, Promise<T>> = new Map();

  constructor(
    private loader: (key: string) => Promise<T>,
    private ttlMs: number = 60000
  ) {}

  async get(key: string): Promise<T> {
    // Check if cached and not expired
    const cached = this.cache.get(key);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.value;
    }

    // Prevent thundering herd - only one loader per key
    if (this.pending.has(key)) {
      return this.pending.get(key)!;
    }

    // Load and cache
    const loadPromise = this.loader(key).then(value => {
      this.cache.set(key, { value, expiresAt: Date.now() + this.ttlMs });
      this.pending.delete(key);
      return value;
    }).catch(err => {
      this.pending.delete(key);
      throw err;
    });

    this.pending.set(key, loadPromise);
    return loadPromise;
  }
}

// Usage - application never directly queries database
const userCache = new ReadThroughCache<User>(
  async (userId) => db.users.findUnique({ where: { id: userId } }),
  3600000 // 1 hour TTL
);

async function handleRequest(userId: string) {
  // Cache handles everything - always same interface
  const user = await userCache.get(userId);
  return user;
}
```

Cache Warming / Preloading:
For latency-critical paths, cold cache is unacceptable. Preload cache with expected hot data before traffic arrives.
```typescript
// Cache warming strategies

// 1. Warm on startup
async function warmCacheOnStartup() {
  console.log('Warming cache...');

  // Load most accessed items
  const popularProducts = await db.products.findMany({
    orderBy: { viewCount: 'desc' },
    take: 1000
  });

  await Promise.all(
    popularProducts.map(p =>
      redis.setex(`product:${p.id}`, 3600, JSON.stringify(p))
    )
  );

  console.log(`Warmed ${popularProducts.length} products`);
}

// 2. Warm before deployment/cache flush
async function preWarmBeforeCacheFlush() {
  // Get current cache keys
  const keys = await redis.keys('product:*');

  // Refresh all in background
  for (const key of keys) {
    const id = key.split(':')[1];
    const product = await db.products.findUnique({ where: { id } });
    if (product) {
      await redis.setex(key, 3600, JSON.stringify(product));
    }
  }
}

// 3. Predictive warming based on patterns
async function predictiveWarm() {
  // Warm Monday morning data on Sunday night
  const upcomingReports = await db.reports.findMany({
    where: {
      scheduledFor: {
        gte: new Date(),
        lte: new Date(Date.now() + 24 * 60 * 60 * 1000)
      }
    }
  });

  for (const report of upcomingReports) {
    // Pre-compute and cache report data
    const data = await computeReportData(report.id);
    await redis.setex(`report:${report.id}`, 86400, JSON.stringify(data));
  }
}

// Run warming on server start
warmCacheOnStartup().catch(console.error);
```

After deployments, cache restarts, or traffic spikes to uncached data, latency will spike. Monitor cold cache latency separately from warm cache latency. Implement cache warming for critical paths, and use gradual rollouts to warm caches before receiving full traffic.
Caching can paradoxically increase latency if not handled properly. Several patterns cause cache-related latency spikes.
Thundering Herd / Cache Stampede:
When a cached item expires, hundreds of concurrent requests all find a cache miss simultaneously, all query the database, overwhelming it.
```typescript
// Stampede protection with distributed lock
import Redis from 'ioredis';
import Redlock from 'redlock';

const redis = new Redis();
const redlock = new Redlock([redis]);

async function getCachedWithLock<T>(
  key: string,
  ttl: number,
  loader: () => Promise<T>
): Promise<T> {
  // Try to get from cache
  const cached = await redis.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss - acquire lock to prevent stampede
  const lockKey = `lock:${key}`;

  try {
    // Try to acquire lock (auto-expires after 5s; retry behavior comes from Redlock options)
    const lock = await redlock.acquire([lockKey], 5000);

    try {
      // Double-check cache (another request may have populated)
      const recheckCached = await redis.get(key);
      if (recheckCached) {
        return JSON.parse(recheckCached);
      }

      // We have the lock, load from source
      const value = await loader();
      await redis.setex(key, ttl, JSON.stringify(value));
      return value;
    } finally {
      await lock.release();
    }
  } catch (lockError) {
    // Could not acquire lock - another request is loading
    // Wait briefly and try cache again
    await new Promise(r => setTimeout(r, 100));

    const waitedCached = await redis.get(key);
    if (waitedCached) {
      return JSON.parse(waitedCached);
    }

    // Still no cache, fall through to direct load
    // (prevents deadlock if lock holder crashes)
    return loader();
  }
}

// Alternative: Stale-While-Revalidate
// Returns stale data immediately while refreshing in background
async function getWithSWR<T>(
  key: string,
  ttl: number,
  staleTtl: number, // How long stale data is acceptable
  loader: () => Promise<T>
): Promise<T> {
  const data = await redis.hgetall(`swr:${key}`);
  const now = Date.now();

  if (data.value) {
    const expiresAt = parseInt(data.expiresAt);
    const staleUntil = parseInt(data.staleUntil);

    if (now < expiresAt) {
      // Fresh - return immediately
      return JSON.parse(data.value);
    }

    if (now < staleUntil) {
      // Stale but acceptable - return immediately, refresh async
      refreshInBackground(key, ttl, staleTtl, loader);
      return JSON.parse(data.value);
    }
  }

  // No data or too stale - must wait for fresh
  return refreshAndReturn(key, ttl, staleTtl, loader);
}

async function refreshInBackground<T>(
  key: string,
  ttl: number,
  staleTtl: number,
  loader: () => Promise<T>
) {
  // Set a flag to prevent multiple background refreshes
  const refreshKey = `refreshing:${key}`;
  const acquired = await redis.set(refreshKey, '1', 'EX', 30, 'NX');
  if (!acquired) return; // Another process is refreshing

  try {
    const value = await loader();
    const now = Date.now();
    await redis.hset(`swr:${key}`, {
      value: JSON.stringify(value),
      expiresAt: (now + ttl * 1000).toString(),
      staleUntil: (now + (ttl + staleTtl) * 1000).toString()
    });
  } finally {
    await redis.del(refreshKey);
  }
}

// Missing helper, added for completeness: synchronous refresh path
// used when no usable cached value exists
async function refreshAndReturn<T>(
  key: string,
  ttl: number,
  staleTtl: number,
  loader: () => Promise<T>
): Promise<T> {
  const value = await loader();
  const now = Date.now();
  await redis.hset(`swr:${key}`, {
    value: JSON.stringify(value),
    expiresAt: (now + ttl * 1000).toString(),
    staleUntil: (now + (ttl + staleTtl) * 1000).toString()
  });
  return value;
}
```

Other Cache Latency Issues:
Cache Serialization Overhead: JSON serialization/deserialization of large objects adds latency. For hot paths with complex objects, consider:
- A faster binary format (e.g., MessagePack or Protocol Buffers) instead of JSON
- Caching the already-serialized bytes so you serialize once, not on every request
- Keeping frequently used objects deserialized in an in-process cache

Network Latency to Cache: Distributed caches add a network round-trip. For ultra-low-latency needs:
- Put an in-process L1 cache in front of the distributed cache (see the two-tier sketch above)
- Co-locate the cache with the application (same availability zone, or a local sidecar)
- Reuse connections and pipeline commands instead of opening a connection per request

Large Value Latency: Retrieving 10MB cached values takes time (network transfer). Solutions:
- Compress values before caching (see the sketch below)
- Cache smaller derived views or individual fields instead of whole blobs
- Split very large objects into chunks and fetch only what the request needs
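A minimal sketch of the compression idea, assuming ioredis and Node's built-in zlib (the key names are illustrative; whether compression pays off depends on value size and CPU budget):

```typescript
// Compress large values before caching to reduce transfer time
import Redis from 'ioredis';
import { gzipSync, gunzipSync } from 'zlib';

const redis = new Redis();

async function setCompressed(key: string, value: unknown, ttlSeconds: number) {
  // Serialize once, then gzip the bytes
  const compressed = gzipSync(Buffer.from(JSON.stringify(value)));
  await redis.setex(key, ttlSeconds, compressed);
}

async function getCompressed<T>(key: string): Promise<T | null> {
  // getBuffer returns raw bytes instead of a UTF-8 string
  const compressed = await redis.getBuffer(key);
  if (!compressed) return null;
  return JSON.parse(gunzipSync(compressed).toString()) as T;
}
```

The trade is CPU for bandwidth: for values in the tens of kilobytes and up, the smaller network transfer usually more than repays the compression cost.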
When making multiple Redis calls, pipeline them. Without pipelining, 10 Redis calls = 10 round-trips = 10-100ms. With pipelining, 10 calls = 1 round-trip = 1-10ms. Use MULTI/EXEC for atomic pipelines, or client.pipeline() for non-atomic batching.
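A short ioredis sketch of that batching (the keys are illustrative):

```typescript
// Without pipelining: each command waits for its own round-trip.
// With pipelining: all commands share one round-trip.
import Redis from 'ioredis';

const redis = new Redis();

async function getUserBundle(userId: string) {
  const pipeline = redis.pipeline();
  pipeline.get(`v1:user:${userId}`);
  pipeline.get(`v1:user:${userId}:preferences`);
  pipeline.smembers(`v1:user:${userId}:roles`);

  // exec() sends everything at once and returns [error, result] pairs
  const results = await pipeline.exec();
  const [profile, preferences, roles] = results!.map(([err, value]) => {
    if (err) throw err;
    return value;
  });

  return { profile, preferences, roles };
}
```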
Cache key design impacts hit rates, memory efficiency, and debugging. Poor key design leads to cache misses for data that should be cached, or cache pollution from data that shouldn't.
Key Design Principles:
- Include every dimension that changes the result: `products:us:featured` vs `products:de:featured`.
- Keep ordering consistent: `user:123:orders`, not sometimes `orders:user:123`. Establish conventions.
- Version your keys: `v2:user:123` allows safe migration when cache format changes.
- Namespace by domain: `auth:token:xyz`, `catalog:product:456` prevents collisions in shared cache.
- Keep keys compact: `u:123` saves bytes vs `user_id:123` at scale.
- But stay readable: `x:a1b2c3` is poor; `session:a1b2c3` is clear.
```typescript
// Cache key builder for consistency
import crypto from 'crypto';

class CacheKeyBuilder {
  private static VERSION = 'v1';
  private static SEPARATOR = ':';

  // User-related keys
  static user(userId: string): string {
    return `${this.VERSION}:user:${userId}`;
  }

  static userOrders(userId: string, status?: string): string {
    const base = `${this.VERSION}:user:${userId}:orders`;
    return status ? `${base}:${status}` : base;
  }

  // Product-related keys with locale
  static product(productId: string, locale: string = 'en'): string {
    return `${this.VERSION}:prod:${locale}:${productId}`;
  }

  static productList(category: string, page: number, locale: string = 'en'): string {
    return `${this.VERSION}:prodlist:${locale}:${category}:p${page}`;
  }

  // Search results - include all query parameters
  static searchResults(params: {
    query: string;
    category?: string;
    sortBy?: string;
    page?: number;
  }): string {
    // Normalize and sort params for consistent keys
    const normalized = {
      q: params.query.toLowerCase().trim(),
      c: params.category || 'all',
      s: params.sortBy || 'relevance',
      p: params.page || 1
    };

    // Create deterministic key from params
    const hash = crypto
      .createHash('md5')
      .update(JSON.stringify(normalized))
      .digest('hex')
      .substring(0, 12);

    return `${this.VERSION}:search:${hash}`;
  }

  // Session keys with TTL consideration
  static session(sessionId: string): string {
    // Sessions are short-lived, no version needed
    return `sess:${sessionId}`;
  }

  // Feature flags - include rollout segment
  static featureFlag(flagName: string, userId?: string): string {
    // Global flag or user-specific override
    return userId
      ? `ff:${flagName}:u:${userId}`
      : `ff:${flagName}:global`;
  }
}

// Usage
const userKey = CacheKeyBuilder.user('12345');
// => "v1:user:12345"

const searchKey = CacheKeyBuilder.searchResults({
  query: 'laptop',
  category: 'electronics',
  page: 2
});
// => "v1:search:a7b3c9d2e1f0"

// Wildcard patterns for bulk invalidation
// Redis: SCAN with pattern matching
async function invalidateUserCache(userId: string) {
  const pattern = `v1:user:${userId}:*`;
  let cursor = '0';

  do {
    const [nextCursor, keys] = await redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
    cursor = nextCursor;

    if (keys.length > 0) {
      await redis.del(...keys);
    }
  } while (cursor !== '0');
}
```

Be careful with high-cardinality keys. If every unique search query gets a cache key, you'll have millions of rarely-hit keys consuming memory. Set short TTLs for high-cardinality caches (for example, 5 minutes) and monitor cache hit rates. If the hit rate is below 10%, the cache may not be worth the memory.
Time-To-Live (TTL) balances data freshness against cache hit rate. Longer TTLs mean higher hit rates but staler data. The right TTL depends on how quickly data changes and how tolerant users are of staleness.
TTL Selection Framework:
| Data Type | Suggested TTL | Rationale |
|---|---|---|
| Static assets (JS, CSS, images) | 1 year + versioned URLs | Never changes without URL change |
| Reference data (countries, categories) | 24 hours | Changes very rarely |
| User profiles | 1-6 hours | Changes occasionally, staleness usually acceptable |
| Product details | 5-60 minutes | Changes with inventory, pricing updates |
| Search results | 1-5 minutes | Needs freshness but exact consistency not critical |
| Session/auth tokens | Match token expiry | Security-sensitive, must not outlive token |
| Real-time data (stock prices) | Seconds or none | Staleness directly impacts correctness |
| Computed aggregates | 1-60 minutes | Expensive to compute, staleness often acceptable |
Advanced TTL Patterns:
Adaptive TTL: Adjust TTL based on data volatility. Frequently-changing data gets short TTL; stable data gets long TTL.
```typescript
// Adaptive TTL based on change frequency
class AdaptiveTTLCache {
  private changeHistory: Map<string, number[]> = new Map();

  private calculateTTL(key: string): number {
    const history = this.changeHistory.get(key) || [];

    if (history.length < 2) {
      // Not enough history, use default
      return 300; // 5 minutes
    }

    // Calculate average time between changes
    const intervals: number[] = [];
    for (let i = 1; i < history.length; i++) {
      intervals.push(history[i] - history[i - 1]);
    }
    const avgInterval = intervals.reduce((a, b) => a + b, 0) / intervals.length;

    // TTL = 50% of average change interval
    // (ensures usually fresh, occasional staleness)
    const calculatedTTL = Math.floor(avgInterval * 0.5 / 1000);

    // Clamp to reasonable bounds
    return Math.max(60, Math.min(3600, calculatedTTL));
  }

  async set(key: string, value: unknown) {
    const ttl = this.calculateTTL(key);
    await redis.setex(key, ttl, JSON.stringify(value));

    // Record change timestamp
    const history = this.changeHistory.get(key) || [];
    history.push(Date.now());
    // Keep last 10 changes
    if (history.length > 10) history.shift();
    this.changeHistory.set(key, history);

    console.log(`Cache set: ${key} with TTL ${ttl}s`);
  }
}

// Jittered TTL to prevent synchronized expiration
function ttlWithJitter(baseTtl: number, jitterPercent: number = 10): number {
  const jitterRange = baseTtl * (jitterPercent / 100);
  const jitter = (Math.random() - 0.5) * 2 * jitterRange;
  return Math.floor(baseTtl + jitter);
}

// Usage: All keys expire around 300s, but not simultaneously
await redis.setex(key, ttlWithJitter(300), value);
// Results in TTLs between 270-330 seconds

// Probabilistic Early Expiration (to prevent stampede)
function shouldRefresh(ttl: number, remainingTtl: number): boolean {
  if (remainingTtl <= 0) return true;

  // Probability increases as expiration approaches:
  // 0% when fresh, ~10% at half the TTL remaining, ~20% just before expiry
  const fractionRemaining = remainingTtl / ttl;
  const refreshProbability = Math.max(0, (1 - fractionRemaining) * 0.2);

  return Math.random() < refreshProbability;
}
```

TTL-based expiration is simpler but means data can be stale for up to the TTL duration. For a consistent user experience (e.g., a user updates their name and sees the change immediately), combine TTL with active invalidation: keep a long TTL for efficiency, but explicitly delete cache keys when data changes. Event-driven invalidation (user.updated → delete cache) provides both efficiency and consistency.
When multiple concurrent requests need the same data, request coalescing merges them into a single backend call. This reduces load and prevents redundant work.
The Problem:
Imagine 100 users open a product page simultaneously. Without coalescing:
- All 100 requests miss the cache at the same moment
- All 100 issue the same database query, so the database does 100x the necessary work
- Every user waits on a database that is now under needless load

With coalescing:
- The first request triggers a single database query
- The other 99 requests wait for and share that result
- The database sees one query, and all 100 users get the response at roughly the same time
Implementation: Single-Flight Pattern
```typescript
// Single-flight / request coalescing implementation
class SingleFlight<T> {
  private inFlight: Map<string, Promise<T>> = new Map();

  async do(key: string, fn: () => Promise<T>): Promise<T> {
    // Check if request is already in flight
    const existing = this.inFlight.get(key);
    if (existing) {
      // Join existing request instead of making new one
      return existing;
    }

    // First request - make the call
    const promise = fn().finally(() => {
      // Clean up after completion
      this.inFlight.delete(key);
    });

    this.inFlight.set(key, promise);
    return promise;
  }
}

// Usage with cache
const singleFlight = new SingleFlight<Product | null>();

async function getProduct(productId: string): Promise<Product | null> {
  const cacheKey = `product:${productId}`;

  // Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Use single-flight for database call
  const product = await singleFlight.do(cacheKey, async () => {
    const result = await db.products.findUnique({ where: { id: productId } });

    // Populate cache
    if (result) {
      await redis.setex(cacheKey, 3600, JSON.stringify(result));
    }

    return result;
  });

  return product;
}

// Now 100 concurrent calls to getProduct('abc') result in:
// - 1 database query
// - 1 cache write
// - 100 successful responses

// DataLoader pattern (popular in GraphQL)
// Automatically batches and deduplicates within a request
import DataLoader from 'dataloader';

// Create loader that batches requests
const productLoader = new DataLoader<string, Product>(async (ids) => {
  // Single query for all requested IDs
  const products = await db.products.findMany({
    where: { id: { in: [...ids] } }
  });

  // Return in same order as requested IDs
  const productMap = new Map(products.map(p => [p.id, p]));
  return ids.map(id => productMap.get(id) || null);
}, {
  // Cache within request (typically request-scoped loader)
  cache: true,
  // Batch within 10ms window
  batchScheduleFn: callback => setTimeout(callback, 10)
});

// Usage - calls are automatically batched
async function getRelatedProducts(productId: string) {
  const product = await productLoader.load(productId);

  // Even if these are loaded in separate parts of code,
  // they'll be batched into one DB query
  const related = await Promise.all(
    product.relatedIds.map(id => productLoader.load(id))
  );

  return { product, related };
}
```

DataLoader should typically be request-scoped (new instance per HTTP request). This ensures batching within a request while preventing cross-request data leakage. Create the loader in middleware and attach it to the request context, as sketched below. For SingleFlight on cache, global scope is usually fine since results are cached anyway.
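A minimal sketch of that request-scoping idea, assuming an Express app; `createProductLoader` is a hypothetical factory wrapping the DataLoader definition above, and `Product`/`db` are as in that block:

```typescript
// Request-scoped DataLoader: a fresh instance per HTTP request
import express from 'express';
import DataLoader from 'dataloader';

// Factory so each request gets its own batching/cache scope
function createProductLoader() {
  return new DataLoader<string, Product | null>(async (ids) => {
    const products = await db.products.findMany({ where: { id: { in: [...ids] } } });
    const byId = new Map(products.map(p => [p.id, p]));
    return ids.map(id => byId.get(id) ?? null);
  });
}

const app = express();

// Middleware attaches loaders to the request context
app.use((req: any, _res, next) => {
  req.loaders = { product: createProductLoader() };
  next();
});

app.get('/products/:id', async (req: any, res) => {
  // Loads within this request are batched and deduplicated;
  // nothing leaks to the next request
  const product = await req.loaders.product.load(req.params.id);
  res.json(product);
});
```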
Cache invalidation is famously one of the two hard problems in computer science. From a latency perspective, the goal is to maintain high hit rates while ensuring users don't see unacceptably stale data.
Invalidation Strategies:
```typescript
// Event-driven cache invalidation
import { EventEmitter } from 'events';

// Create application event bus
const eventBus = new EventEmitter();

// Entity update handlers
eventBus.on('user.updated', async (userId: string) => {
  // Invalidate user cache
  await redis.del(`v1:user:${userId}`);

  // Also invalidate derived caches
  await redis.del(`v1:user:${userId}:orders`);
  await redis.del(`v1:user:${userId}:preferences`);

  console.log(`Invalidated cache for user ${userId}`);
});

eventBus.on('product.updated', async (productId: string) => {
  // Invalidate in all locales
  const locales = ['en', 'de', 'fr', 'es'];
  await Promise.all(
    locales.map(locale =>
      redis.del(`v1:prod:${locale}:${productId}`)
    )
  );

  // Invalidate any lists containing this product
  // (harder - may need tracking of dependencies)
});

// Usage in update function
async function updateUser(userId: string, updates: Partial<User>) {
  // Update database
  const user = await db.users.update({
    where: { id: userId },
    data: updates
  });

  // Emit event for cache invalidation
  eventBus.emit('user.updated', userId);

  return user;
}

// Versioned cache keys for "instant" invalidation
class VersionedCache {
  private async getVersion(entity: string, id: string): Promise<number> {
    const version = await redis.get(`version:${entity}:${id}`);
    // Default to 0 so the first invalidation (INCR -> 1) actually changes the key
    return version ? parseInt(version) : 0;
  }

  private async incrementVersion(entity: string, id: string): Promise<void> {
    await redis.incr(`version:${entity}:${id}`);
  }

  async get<T>(entity: string, id: string): Promise<T | null> {
    const version = await this.getVersion(entity, id);
    const key = `${entity}:${id}:v${version}`;
    const cached = await redis.get(key);
    return cached ? JSON.parse(cached) : null;
  }

  async set<T>(entity: string, id: string, value: T, ttl: number): Promise<void> {
    const version = await this.getVersion(entity, id);
    const key = `${entity}:${id}:v${version}`;
    await redis.setex(key, ttl, JSON.stringify(value));
  }

  async invalidate(entity: string, id: string): Promise<void> {
    // Simply increment version - old keys become orphaned
    // (will be cleaned up by TTL)
    await this.incrementVersion(entity, id);
  }
}

// Usage
const versionedCache = new VersionedCache();

// Reads use current version
const user = await versionedCache.get('user', '123');

// Invalidation is instant - just increment version
await versionedCache.invalidate('user', '123');

// Next read will cache miss (and populate with new version)
```

Event-driven invalidation provides the best consistency but adds complexity. You need reliable event delivery (message queue), careful dependency tracking (what caches depend on this entity?), and monitoring for invalidation failures. Start with TTL-based caching, and move to event-driven invalidation only when consistency requirements demand it.
Caching is the most powerful tool for latency reduction—moving data closer to where it's needed and avoiding expensive operations entirely. Effective caching requires understanding the memory hierarchy, choosing appropriate access patterns, and handling edge cases that can actually increase latency.
What's Next:
Caching reduces latency by avoiding work. Async processing reduces latency by deferring work—moving expensive operations out of the request path entirely. The next page explores how to use queues, background jobs, and event-driven architectures to achieve ultra-low-latency responses.
You now understand how to use caching as a latency optimization strategy. From multi-layer cache hierarchies to stampede prevention to intelligent invalidation, these techniques reduce response times by orders of magnitude while maintaining acceptable data freshness.