Where should your cache live? This deceptively simple question has profound implications for your system's performance, consistency, and operational complexity.
Local caches (in-process, same machine) offer blazing speed—nanosecond access times with zero network overhead. But they're isolated: each application instance has its own cache with potentially different data.
Distributed caches (Redis, Memcached, shared across instances) provide a single source of truth that all instances share. But every access crosses the network, adding latency and failure modes.
Neither approach is universally superior. Principal engineers understand that cache topology is an architectural decision whose trade-offs must align with system requirements. Getting it wrong leads either to excessive latency (over-reliance on the distributed cache) or to consistency nightmares (poorly managed local caches).
By the end of this page, you will understand the characteristics, trade-offs, and appropriate use cases for local and distributed caches. You'll learn cache coherence strategies, multi-tier architectures, and a decision framework for choosing cache topology in your systems.
Cache topology refers to where cache storage is located relative to your application instances. The three primary topologies are:
1. Local Cache (In-Process)
2. Distributed Cache (Shared)
3. Multi-Tier Cache (Layered)
| Aspect | Local Cache | Distributed Cache |
|---|---|---|
| Access Latency | < 1μs (sub-microsecond) | 1-10ms (network RTT) |
| Total Capacity | Bounded by instance RAM | Scales independently |
| Consistency Across Instances | No built-in coherence | Single source of truth |
| Failure Impact | Lost on instance restart | Survives instance restarts |
| Operational Complexity | Zero (embedded) | Additional infrastructure |
| Cost | Uses application memory | Separate compute/memory cost |
A local cache lookup takes roughly 100 nanoseconds. A Redis lookup takes roughly 1 millisecond including the network round trip: a 10,000x difference. For hot paths that need sub-millisecond responses, this gap matters enormously. For cold paths or large datasets, the shared capacity of a distributed cache matters more.
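The compounding effect of per-lookup latency can be sketched with back-of-envelope arithmetic. The figures below are the rough numbers from the text (about 100ns local, about 1ms remote), not measurements from any specific system:

```typescript
// Illustrative constants taken from the rough figures in the text
const LOCAL_LOOKUP_MS = 0.0001; // ~100 nanoseconds
const REMOTE_LOOKUP_MS = 1;     // ~1 millisecond network round trip

// Total cache-lookup overhead for a single request
function cacheOverheadMs(lookupsPerRequest: number, perLookupMs: number): number {
  return lookupsPerRequest * perLookupMs;
}

// A request that performs 50 cache reads:
console.log(cacheOverheadMs(50, LOCAL_LOOKUP_MS));  // ≈ 0.005 ms (negligible)
console.log(cacheOverheadMs(50, REMOTE_LOOKUP_MS)); // 50 ms (dominates a 100 ms budget)
```

This is why chatty request paths (many lookups per request) push systems toward a local tier, while a handful of lookups per request can comfortably go to Redis.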
Local caches are embedded within your application process. They offer unmatched performance but require careful management to avoid problems.
A reference implementation illustrates the essential features: TTL expiry, LRU eviction, optional memory bounds, and hit/miss metrics.
```typescript
/**
 * Production-ready local cache with essential features.
 */
interface LocalCacheConfig<K, V> {
  maxSize: number;                       // Maximum items
  maxMemoryBytes?: number;               // Maximum memory (if tracking size)
  ttlMs: number;                         // Time-to-live for entries
  refreshMs?: number;                    // Background refresh interval
  sizeEstimator?: (value: V) => number;  // For memory tracking
}

class LocalCache<K, V> {
  private cache: Map<K, CacheEntry<V>> = new Map();
  private config: LocalCacheConfig<K, V>;
  private currentMemory = 0;

  // Metrics
  private metrics = { hits: 0, misses: 0, evictions: 0, refreshes: 0 };

  constructor(config: LocalCacheConfig<K, V>) {
    this.config = config;
    this.startCleanupTask();
  }

  get(key: K): V | undefined {
    const entry = this.cache.get(key);
    if (!entry) {
      this.metrics.misses++;
      return undefined;
    }
    // Check TTL
    if (Date.now() > entry.expiresAt) {
      this.cache.delete(key);
      this.currentMemory -= entry.memorySize;
      this.metrics.misses++;
      return undefined;
    }
    // Update access time for LRU
    entry.lastAccess = Date.now();
    this.metrics.hits++;
    return entry.value;
  }

  set(key: K, value: V): void {
    const memorySize = this.config.sizeEstimator?.(value) ?? 1;
    // Remove existing entry if present
    if (this.cache.has(key)) {
      const existing = this.cache.get(key)!;
      this.currentMemory -= existing.memorySize;
      this.cache.delete(key);
    }
    // Evict if necessary
    this.evictIfNeeded(memorySize);
    // Insert new entry
    const entry: CacheEntry<V> = {
      value,
      createdAt: Date.now(),
      lastAccess: Date.now(),
      expiresAt: Date.now() + this.config.ttlMs,
      memorySize,
    };
    this.cache.set(key, entry);
    this.currentMemory += memorySize;
  }

  /**
   * Get with loader - returns cached value or loads and caches.
   */
  async getOrLoad(key: K, loader: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) {
      return cached;
    }
    // Load and cache
    const value = await loader();
    this.set(key, value);
    return value;
  }

  private evictIfNeeded(incomingSize: number): void {
    // Evict by count
    while (this.cache.size >= this.config.maxSize) {
      this.evictOldest();
    }
    // Evict by memory (if configured)
    if (this.config.maxMemoryBytes) {
      while (
        this.currentMemory + incomingSize > this.config.maxMemoryBytes &&
        this.cache.size > 0
      ) {
        this.evictOldest();
      }
    }
  }

  private evictOldest(): void {
    let oldestKey: K | undefined;
    let oldestAccess = Infinity;
    for (const [key, entry] of this.cache) {
      if (entry.lastAccess < oldestAccess) {
        oldestAccess = entry.lastAccess;
        oldestKey = key;
      }
    }
    if (oldestKey !== undefined) {
      const entry = this.cache.get(oldestKey)!;
      this.cache.delete(oldestKey);
      this.currentMemory -= entry.memorySize;
      this.metrics.evictions++;
    }
  }

  private startCleanupTask(): void {
    // Periodic cleanup of expired entries
    setInterval(() => {
      const now = Date.now();
      for (const [key, entry] of this.cache) {
        if (now > entry.expiresAt) {
          this.cache.delete(key);
          this.currentMemory -= entry.memorySize;
        }
      }
    }, Math.min(this.config.ttlMs / 2, 60000));
  }

  invalidate(key: K): boolean {
    const entry = this.cache.get(key);
    if (entry) {
      this.currentMemory -= entry.memorySize;
      this.cache.delete(key);
      return true;
    }
    return false;
  }

  clear(): void {
    this.cache.clear();
    this.currentMemory = 0;
  }

  getMetrics(): LocalCacheMetrics {
    const total = this.metrics.hits + this.metrics.misses;
    return {
      hitRate: total > 0 ? this.metrics.hits / total : 0,
      size: this.cache.size,
      memoryUsed: this.currentMemory,
      ...this.metrics,
    };
  }
}

interface CacheEntry<V> {
  value: V;
  createdAt: number;
  lastAccess: number;
  expiresAt: number;
  memorySize: number;
}

interface LocalCacheMetrics {
  hitRate: number;
  size: number;
  memoryUsed: number;
  hits: number;
  misses: number;
  evictions: number;
  refreshes: number;
}

// Usage example (UserProfile is an application-defined type)
const userCache = new LocalCache<string, UserProfile>({
  maxSize: 10000,
  maxMemoryBytes: 100 * 1024 * 1024, // 100MB
  ttlMs: 5 * 60 * 1000,              // 5 minutes
  sizeEstimator: (user) => JSON.stringify(user).length * 2, // Rough estimate
});
```

Local caches consume application heap memory. Large caches can cause GC pressure, especially in Java/JVM environments. Monitor application memory and GC metrics when using local caches. For very large local caches on the JVM, consider off-heap stores (e.g., Ehcache's off-heap tier or Chronicle Map) rather than heap-based libraries like Caffeine.
Distributed caches run as separate services, providing shared storage accessible by all application instances. They're the backbone of scalable caching architectures.
Popular Distributed Cache Systems:
| System | Strengths | Best For | Considerations |
|---|---|---|---|
| Redis | Rich data types, Pub/Sub, Persistence | General purpose, Sessions, Queues | Single-threaded, Memory-bound |
| Memcached | Simplicity, Multi-threaded | Pure key-value caching | No persistence, No data types |
| Hazelcast | Distributed computing, Near-cache | Java apps, Compute + Cache | JVM-only traditionally |
| Apache Ignite | SQL queries, Compute grid | Data grid, Analytics cache | Complexity, Resource intensive |
| Couchbase | JSON documents, Mobile sync | Document caching, Mobile | Operational overhead |
```typescript
import Redis from 'ioredis';

/**
 * Production distributed cache client with resilience patterns.
 */
interface DistributedCacheConfig {
  redisUrl: string;
  defaultTtlSeconds: number;
  connectionTimeout: number;
  maxRetries: number;
  retryDelayMs: number;
  prefix: string;
}

class DistributedCache<V> {
  private redis: Redis;
  private config: DistributedCacheConfig;
  private connected = false;

  // Circuit breaker state
  private failures = 0;
  private circuitOpen = false;
  private lastFailure = 0;

  constructor(config: DistributedCacheConfig) {
    this.config = config;
    this.redis = new Redis(config.redisUrl, {
      connectTimeout: config.connectionTimeout,
      retryStrategy: (times) => {
        if (times > config.maxRetries) {
          return null; // Stop retrying
        }
        return Math.min(times * config.retryDelayMs, 5000);
      },
      lazyConnect: true,
    });
    this.setupConnectionHandlers();
  }

  private setupConnectionHandlers(): void {
    this.redis.on('connect', () => {
      this.connected = true;
      this.failures = 0;
      this.circuitOpen = false;
      console.log('Distributed cache connected');
    });
    this.redis.on('error', (err) => {
      console.error('Distributed cache error:', err.message);
      this.recordFailure();
    });
    this.redis.on('close', () => {
      this.connected = false;
      console.warn('Distributed cache connection closed');
    });
  }

  private recordFailure(): void {
    this.failures++;
    this.lastFailure = Date.now();
    // Open circuit after 5 consecutive failures
    if (this.failures >= 5) {
      this.circuitOpen = true;
      console.warn('Cache circuit breaker opened');
      // Auto-reset after 30 seconds
      setTimeout(() => {
        this.circuitOpen = false;
        this.failures = 0;
        console.log('Cache circuit breaker reset');
      }, 30000);
    }
  }

  private fullKey(key: string): string {
    return `${this.config.prefix}:${key}`;
  }

  async get(key: string): Promise<V | null> {
    if (this.circuitOpen) {
      return null; // Fail fast
    }
    try {
      const data = await this.redis.get(this.fullKey(key));
      if (!data) return null;
      return JSON.parse(data) as V;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  async set(key: string, value: V, ttlSeconds?: number): Promise<boolean> {
    if (this.circuitOpen) {
      return false; // Fail fast
    }
    try {
      const serialized = JSON.stringify(value);
      const ttl = ttlSeconds ?? this.config.defaultTtlSeconds;
      await this.redis.setex(this.fullKey(key), ttl, serialized);
      return true;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  /**
   * Delete a single key (used by the tiered cache later in this page).
   */
  async invalidate(key: string): Promise<boolean> {
    if (this.circuitOpen) return false;
    try {
      return (await this.redis.del(this.fullKey(key))) > 0;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  /**
   * Get with fallback loader and cache-aside pattern.
   */
  async getOrLoad(
    key: string,
    loader: () => Promise<V>,
    ttlSeconds?: number
  ): Promise<V> {
    // Try cache first
    try {
      const cached = await this.get(key);
      if (cached !== null) {
        return cached;
      }
    } catch (error) {
      // Cache unavailable - continue to loader
      console.warn('Cache read failed, loading from source');
    }
    // Load from source
    const value = await loader();
    // Try to cache (fire and forget)
    this.set(key, value, ttlSeconds).catch((err) => {
      console.warn('Failed to cache loaded value:', err.message);
    });
    return value;
  }

  /**
   * Delete with pattern matching (Redis SCAN + DEL).
   */
  async invalidatePattern(pattern: string): Promise<number> {
    if (this.circuitOpen) return 0;
    let deleted = 0;
    const fullPattern = this.fullKey(pattern);
    // Use SCAN for non-blocking pattern search
    const stream = this.redis.scanStream({ match: fullPattern, count: 100 });
    return new Promise((resolve, reject) => {
      stream.on('data', (keys: string[]) => {
        if (keys.length === 0) return;
        // Pause the scan while the batch delete is in flight, so that
        // 'end' cannot fire before all deletions have completed
        stream.pause();
        const pipeline = this.redis.pipeline();
        keys.forEach((key) => pipeline.del(key));
        pipeline
          .exec()
          .then(() => {
            deleted += keys.length;
            stream.resume();
          })
          .catch(reject);
      });
      stream.on('end', () => resolve(deleted));
      stream.on('error', reject);
    });
  }

  /**
   * Bulk get for multiple keys.
   */
  async mget(keys: string[]): Promise<Map<string, V>> {
    if (this.circuitOpen || keys.length === 0) {
      return new Map();
    }
    const fullKeys = keys.map((k) => this.fullKey(k));
    const results = await this.redis.mget(...fullKeys);
    const map = new Map<string, V>();
    for (let i = 0; i < keys.length; i++) {
      if (results[i]) {
        map.set(keys[i], JSON.parse(results[i]!) as V);
      }
    }
    return map;
  }

  getHealthStatus(): CacheHealthStatus {
    return {
      connected: this.connected,
      circuitOpen: this.circuitOpen,
      recentFailures: this.failures,
      lastFailureAt:
        this.lastFailure > 0 ? new Date(this.lastFailure).toISOString() : null,
    };
  }

  async disconnect(): Promise<void> {
    await this.redis.quit();
  }
}

interface CacheHealthStatus {
  connected: boolean;
  circuitOpen: boolean;
  recentFailures: number;
  lastFailureAt: string | null;
}
```

Distributed caches require serialization (object → bytes) on write and deserialization (bytes → object) on read. This adds latency and CPU cost. For performance-critical paths, consider binary formats like MessagePack or Protocol Buffers instead of JSON.
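Serialization has a correctness cost, too, not just a performance one. A generic illustration (not tied to any particular cache client): JSON round-tripping, as the `DistributedCache` sketch does, silently degrades rich JavaScript types.

```typescript
// JSON round-trips silently degrade rich types: Dates become ISO strings,
// and Maps/Sets become empty objects, so what you read back from a
// distributed cache may not be structurally identical to what you wrote.
const original = {
  id: 42,
  createdAt: new Date(0),    // Date -> ISO string on stringify
  roles: new Set(['admin']), // Set -> {} on stringify
};

const roundTripped = JSON.parse(JSON.stringify(original));

console.log(typeof roundTripped.createdAt); // "string"
console.log(roundTripped.roles);            // {} (the Set's contents are lost)
```

Defensive options include storing only plain data-transfer objects in the cache, or rehydrating types explicitly after deserialization.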
Cache coherence is the problem of keeping cached data consistent across multiple caches. This is primarily a challenge with local caches, where each instance maintains independent state.
The Coherence Problem:
Imagine three application instances, each with its own local cache, and all three caching the same user's profile. Instance A then processes an update: it writes the new profile to the database and invalidates its own cached copy. Instances B and C never see that invalidation, so they keep serving the old profile. This inconsistency can persist until the cached data expires (minutes or hours, depending on TTL).
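The scenario can be reproduced in a few lines. This is a deliberately minimal simulation with hypothetical names: each "instance" keeps an independent `Map` cache over a shared store, and writes invalidate only the writer's own cache.

```typescript
// Shared backing store (stands in for the database)
const database = new Map<string, string>([['user:1:email', 'old@example.com']]);

class AppInstance {
  private local = new Map<string, string>();

  // Read-through: populate the local cache on miss
  read(key: string): string | undefined {
    if (!this.local.has(key)) {
      const value = database.get(key);
      if (value !== undefined) this.local.set(key, value);
    }
    return this.local.get(key);
  }

  // Writes update the database but invalidate only THIS instance's cache
  write(key: string, value: string): void {
    database.set(key, value);
    this.local.delete(key);
  }
}

const instanceA = new AppInstance();
const instanceB = new AppInstance();

instanceA.read('user:1:email'); // both instances warm their caches
instanceB.read('user:1:email');

instanceA.write('user:1:email', 'new@example.com');

console.log(instanceA.read('user:1:email')); // "new@example.com" (fresh)
console.log(instanceB.read('user:1:email')); // "old@example.com" (stale until TTL)
```

Instance B keeps returning the stale value because nothing tells it the key changed; that is exactly the gap the coherence strategies below address.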
Coherence Strategies:
TTL (Time-to-Live) coherence accepts eventual consistency by ensuring cached data expires within a bounded time. No active invalidation required.
When to Use: When temporary staleness is acceptable (product catalogs, configuration, non-critical content).
```typescript
/**
 * TTL-based coherence relies on natural expiration.
 * Staleness window = TTL duration.
 */
class TTLCoherentCache<V> {
  private cache: Map<string, { value: V; expiresAt: number }> = new Map();

  constructor(
    private readonly ttlMs: number,
    private readonly maxStalenessMs: number // Maximum acceptable staleness
  ) {
    // TTL should not exceed max staleness
    if (ttlMs > maxStalenessMs) {
      console.warn(`TTL (${ttlMs}ms) exceeds max staleness (${maxStalenessMs}ms)`);
    }
  }

  get(key: string): V | undefined {
    const entry = this.cache.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.cache.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.cache.set(key, {
      value,
      expiresAt: Date.now() + this.ttlMs,
    });
  }
}

// Coherence guarantee: Data is never staler than TTL
// Trade-off: Must balance staleness tolerance vs hit rate

// Shorter TTL = fresher data, lower hit rate
const realTimeCache = new TTLCoherentCache<User>(
  30 * 1000, // 30 second TTL
  60 * 1000  // 1 minute max staleness acceptable
);

// Longer TTL = higher hit rate, potentially staler data
const catalogCache = new TTLCoherentCache<Product>(
  5 * 60 * 1000, // 5 minute TTL
  10 * 60 * 1000 // 10 minutes max staleness acceptable
);
```

Multi-tier caching combines local and distributed caches to achieve both performance and consistency. This is the standard architecture for production systems at scale.
The L1/L2 Pattern:
Read path: Check L1 → if miss, check L2 → if miss, load from source → populate L2 → populate L1
```typescript
/**
 * Multi-tier (L1/L2) cache implementation.
 * L1: Local in-memory cache (fast, small)
 * L2: Distributed Redis cache (larger, shared)
 */
interface TieredCacheConfig {
  l1MaxItems: number;
  l1TtlMs: number;
  l2TtlSeconds: number;
  namespace: string;
}

class TieredCache<V> {
  private l1: LocalCache<string, V>;
  private l2: DistributedCache<V>;
  private config: TieredCacheConfig;

  // Metrics by tier
  private metrics = { l1Hits: 0, l2Hits: 0, misses: 0 };

  constructor(config: TieredCacheConfig, redisUrl: string) {
    this.config = config;
    this.l1 = new LocalCache({
      maxSize: config.l1MaxItems,
      ttlMs: config.l1TtlMs,
    });
    this.l2 = new DistributedCache({
      redisUrl,
      defaultTtlSeconds: config.l2TtlSeconds,
      prefix: config.namespace,
      connectionTimeout: 5000,
      maxRetries: 3,
      retryDelayMs: 100,
    });
  }

  /**
   * Multi-tier get with L1 → L2 → source fallback.
   */
  async get(key: string): Promise<V | undefined> {
    // Try L1 first (local, ultra-fast)
    let value = this.l1.get(key);
    if (value !== undefined) {
      this.metrics.l1Hits++;
      return value;
    }
    // Try L2 (distributed)
    try {
      value = (await this.l2.get(key)) ?? undefined;
      if (value !== undefined) {
        this.metrics.l2Hits++;
        // Promote to L1 for future accesses
        this.l1.set(key, value);
        return value;
      }
    } catch (error) {
      // L2 unavailable, treat as miss
      console.warn('L2 cache unavailable:', error);
    }
    this.metrics.misses++;
    return undefined;
  }

  /**
   * Get with loader - populates both tiers on miss.
   */
  async getOrLoad(key: string, loader: () => Promise<V>): Promise<V> {
    // Try caches first
    const cached = await this.get(key);
    if (cached !== undefined) {
      return cached;
    }
    // Load from source
    const value = await loader();
    // Populate both tiers
    await this.set(key, value);
    return value;
  }

  /**
   * Set in both tiers.
   */
  async set(key: string, value: V): Promise<void> {
    // Write to L1 (synchronous)
    this.l1.set(key, value);
    // Write to L2 (async, non-blocking)
    this.l2.set(key, value, this.config.l2TtlSeconds).catch((err) => {
      console.warn('L2 cache write failed:', err);
    });
  }

  /**
   * Invalidate from both tiers.
   */
  async invalidate(key: string): Promise<void> {
    // Invalidate L1 (local only)
    this.l1.invalidate(key);
    // Invalidate L2 (all instances will see this)
    try {
      await this.l2.invalidate(key);
    } catch (error) {
      console.warn('L2 cache invalidation failed:', error);
    }
    // Note: Other instances still have stale L1 entries.
    // Combine with Pub/Sub for cross-instance L1 invalidation.
  }

  /**
   * Get tiered cache metrics.
   */
  getMetrics(): TieredCacheMetrics {
    const total =
      this.metrics.l1Hits + this.metrics.l2Hits + this.metrics.misses;
    return {
      l1HitRate: total > 0 ? this.metrics.l1Hits / total : 0,
      l2HitRate: total > 0 ? this.metrics.l2Hits / total : 0,
      overallHitRate:
        total > 0 ? (this.metrics.l1Hits + this.metrics.l2Hits) / total : 0,
      l1Hits: this.metrics.l1Hits,
      l2Hits: this.metrics.l2Hits,
      misses: this.metrics.misses,
      l1Stats: this.l1.getMetrics(),
      l2Status: this.l2.getHealthStatus(),
    };
  }
}

interface TieredCacheMetrics {
  l1HitRate: number;
  l2HitRate: number;
  overallHitRate: number;
  l1Hits: number;
  l2Hits: number;
  misses: number;
  l1Stats: LocalCacheMetrics;
  l2Status: CacheHealthStatus;
}

// Production configuration example
const userCache = new TieredCache<User>(
  {
    l1MaxItems: 1000,     // Small L1 for hottest users
    l1TtlMs: 30 * 1000,   // 30 second L1 TTL
    l2TtlSeconds: 5 * 60, // 5 minute L2 TTL
    namespace: 'user-service:users',
  },
  process.env.REDIS_URL!
);
```

L1 TTL should be significantly shorter than L2 TTL. This ensures L1 refreshes from L2 regularly, maintaining coherence across instances. A common pattern: L1 = 30 seconds, L2 = 5-15 minutes.
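The cross-instance L1 invalidation that the `TieredCache` comment alludes to is usually built on Redis Pub/Sub. The sketch below shows the mechanism with a Node `EventEmitter` standing in for the Pub/Sub channel so it runs self-contained; in production you would use `redis.publish` on write and a dedicated subscriber connection per instance. All class and channel names here are illustrative.

```typescript
import { EventEmitter } from 'events';

// Stand-in for a Redis Pub/Sub channel shared by all instances
const invalidationBus = new EventEmitter();
const CHANNEL = 'cache-invalidation';

class PubSubCoherentL1<V> {
  private cache = new Map<string, V>();

  constructor(private readonly instanceId: string) {
    // Every instance drops its local copy when any peer invalidates a key
    invalidationBus.on(CHANNEL, (msg: { key: string; from: string }) => {
      if (msg.from !== this.instanceId) this.cache.delete(msg.key);
    });
  }

  get(key: string): V | undefined {
    return this.cache.get(key);
  }

  set(key: string, value: V): void {
    this.cache.set(key, value);
  }

  // Invalidate locally, then broadcast so peers invalidate too
  invalidate(key: string): void {
    this.cache.delete(key);
    invalidationBus.emit(CHANNEL, { key, from: this.instanceId });
  }
}

const a = new PubSubCoherentL1<string>('instance-a');
const b = new PubSubCoherentL1<string>('instance-b');
a.set('user:1', 'v1');
b.set('user:1', 'v1');

a.invalidate('user:1'); // broadcast reaches b without waiting for TTL

console.log(a.get('user:1')); // undefined
console.log(b.get('user:1')); // undefined (stale copy removed immediately)
```

Note that real Pub/Sub delivery is asynchronous and best-effort (a disconnected subscriber misses messages), so a short L1 TTL remains the safety net even with broadcast invalidation.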
Choosing between local, distributed, or multi-tier caching requires analyzing your specific requirements. Here's a decision framework:
| Requirement | Local Only | Distributed Only | Multi-Tier |
|---|---|---|---|
| Sub-millisecond latency critical | ✅ Best choice | ❌ Network latency | ✅ L1 handles hot path |
| Data must be consistent across instances | ❌ Coherence issues | ✅ Single source | ⚠️ Requires coherence strategy |
| Large dataset (GB+) | ❌ Memory limited | ✅ Scales separately | ✅ L2 holds large dataset |
| Survive instance restarts | ❌ Lost on restart | ✅ Persists externally | ✅ L2 persists |
| Zero external dependencies | ✅ Self-contained | ❌ Redis/Memcached needed | ❌ Requires distributed tier |
| Simple operational model | ✅ Embedded | ⚠️ Additional service | ⚠️ Most complex |
```typescript
/**
 * Cache topology decision helper.
 */
interface CacheRequirements {
  latencyBudgetMs: number;           // Maximum acceptable latency
  consistencyRequired: boolean;      // Must be consistent across instances
  datasetSizeGB: number;             // Total cacheable data size
  instanceCount: number;             // Number of application instances
  operationalComplexity: 'minimal' | 'moderate' | 'acceptable';
  stalenessToleranceSeconds: number; // How stale is okay
}

function recommendTopology(req: CacheRequirements): TopologyRecommendation {
  // Single instance = local cache is sufficient
  if (req.instanceCount === 1) {
    return {
      topology: 'local',
      rationale: 'Single instance - no coherence concerns',
      l1Config: {
        maxMB: Math.min(req.datasetSizeGB * 1000, 500),
        ttlMs: 300000,
      },
    };
  }

  // Ultra-low latency requirement
  if (req.latencyBudgetMs < 1) {
    if (req.consistencyRequired) {
      return {
        topology: 'multi-tier',
        rationale: 'Need speed of L1 with consistency from L2',
        l1Config: {
          maxMB: 100,
          ttlMs: Math.min(req.stalenessToleranceSeconds * 1000, 30000),
        },
        l2Config: { maxGB: req.datasetSizeGB, ttlSeconds: 300 },
        coherenceStrategy: 'pubsub',
      };
    }
    return {
      topology: 'local',
      rationale: 'Latency critical, consistency secondary',
      l1Config: { maxMB: 200, ttlMs: req.stalenessToleranceSeconds * 1000 },
      coherenceStrategy: 'ttl-only',
    };
  }

  // Strong consistency required
  if (req.consistencyRequired && req.stalenessToleranceSeconds < 5) {
    return {
      topology: 'distributed',
      rationale: 'Consistency critical - single source of truth',
      l2Config: { maxGB: req.datasetSizeGB, ttlSeconds: 300 },
    };
  }

  // Large dataset
  if (req.datasetSizeGB > 1) {
    return {
      topology: 'multi-tier',
      rationale: 'Dataset too large for local; L1 for hot subset',
      l1Config: { maxMB: 500, ttlMs: 60000 },
      l2Config: { maxGB: req.datasetSizeGB, ttlSeconds: 600 },
      coherenceStrategy:
        req.stalenessToleranceSeconds < 60 ? 'pubsub' : 'ttl-only',
    };
  }

  // Default: multi-tier for balance
  return {
    topology: 'multi-tier',
    rationale: 'Balanced approach for typical workload',
    l1Config: { maxMB: 100, ttlMs: 30000 },
    l2Config: { maxGB: 1, ttlSeconds: 300 },
    coherenceStrategy: 'pubsub',
  };
}

interface TopologyRecommendation {
  topology: 'local' | 'distributed' | 'multi-tier';
  rationale: string;
  l1Config?: { maxMB: number; ttlMs: number };
  l2Config?: { maxGB: number; ttlSeconds: number };
  coherenceStrategy?: 'ttl-only' | 'pubsub' | 'version-based';
}

// Example usage
const recommendation = recommendTopology({
  latencyBudgetMs: 5,
  consistencyRequired: true,
  datasetSizeGB: 2,
  instanceCount: 10,
  operationalComplexity: 'acceptable',
  stalenessToleranceSeconds: 30,
});

console.log(recommendation);
// {
//   topology: 'multi-tier',
//   rationale: 'Dataset too large for local; L1 for hot subset',
//   l1Config: { maxMB: 500, ttlMs: 60000 },
//   l2Config: { maxGB: 2, ttlSeconds: 600 },
//   coherenceStrategy: 'pubsub'
// }
```

For greenfield projects, start with distributed-only (Redis). Add a local L1 tier only when profiling shows network latency is a bottleneck. Multi-tier complexity is only justified when you have clear evidence of need.
Cache topology is an architectural decision with significant implications. To consolidate the key principles from this page:
1. Local caches deliver sub-microsecond reads but provide no cross-instance coherence; bound their size and lean on TTLs.
2. Distributed caches provide a shared source of truth at the cost of network latency, serialization overhead, and extra infrastructure.
3. Multi-tier (L1/L2) architectures combine both; keep L1 TTLs much shorter than L2 TTLs, and add Pub/Sub invalidation when staleness tolerance is tight.
4. Choose a topology from measured requirements (latency budget, consistency needs, dataset size, instance count) rather than by default.
What's Next:
With cache topology understood, we'll explore cache warming strategies in the final page of this module. You'll learn how to pre-populate caches to avoid cold-start penalties, implement background warming, and handle cache warming during deployments and scaling events.
You now understand the trade-offs between local and distributed caches, cache coherence challenges, and multi-tier architectures. These patterns enable you to design caching systems that achieve both performance and consistency goals appropriate to your requirements.