If there's one technique that delivers outsized scaling benefits relative to its complexity, it's caching. A well-designed caching layer can absorb 90-99% of read traffic, reducing database load by orders of magnitude. Systems that would require 100 database servers can often run on 10—or fewer—with effective caching.
But caching at scale introduces its own challenges. Cache invalidation famously joins naming things as one of the two hard problems in computer science. Distributed caches add consistency, partitioning, and availability concerns. Scaling caches requires understanding not just what to cache, but how to build caching infrastructure that scales horizontally.
This page explores the patterns for building and scaling caching layers in distributed systems, from single-node caches to globally distributed caching infrastructure.
By the end of this page, you will understand cache topologies, partitioning strategies for distributed caches, consistent hashing, cache clustering patterns, and how to build caching infrastructure that scales horizontally. You'll be equipped to design caching layers for systems serving millions of requests per second.
Before diving into scaling patterns, let's quantify the impact of caching at scale:
The Math of Cache Hit Rates:
```
Scenario: E-commerce product catalog
- 100,000 requests/second to product API
- Each database query: 10ms latency
- Database can handle: 10,000 queries/second

WITHOUT CACHING:
100,000 req/s → Database
Need 10 database servers (expensive!)
Total latency: 10ms minimum

WITH 90% CACHE HIT RATE:
100,000 req/s → Cache: 90,000 hits → 0.1ms latency
                       10,000 misses → Database
Need only 1 database server!
Average latency: 0.1 × 0.9 + 10 × 0.1 = 1.09ms (~9x faster)

WITH 99% CACHE HIT RATE:
100,000 req/s → Cache: 99,000 hits → 0.1ms
                        1,000 misses → Database
Database load: 1% of original
Average latency: 0.1 × 0.99 + 10 × 0.01 = 0.199ms (~50x faster)

The Cache Efficiency Formula:
─────────────────────────────
Database Load = Total Traffic × (1 − Hit Rate)

95% hit rate   → 5% database load   → 20x reduction
99% hit rate   → 1% database load   → 100x reduction
99.9% hit rate → 0.1% database load → 1000x reduction
```

In most systems, 20% of data accounts for 80% of accesses. Often it's more extreme: 1% of products might receive 50% of views. This long-tail distribution means even small caches can achieve high hit rates by caching the hot data. Understanding your access distribution is key to cache sizing.
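These back-of-the-envelope numbers are easy to reproduce. Here is a minimal sketch using the scenario's assumed figures (0.1ms cache hit, 10ms database query); the function name and constants are illustrative, not a library API:

```typescript
// Sketch: effective latency and database load for a given cache hit rate.
// The 0.1ms / 10ms figures mirror the scenario above; they are assumptions.
function cacheEffect(hitRate: number, cacheLatencyMs = 0.1, dbLatencyMs = 10) {
  const avgLatencyMs = cacheLatencyMs * hitRate + dbLatencyMs * (1 - hitRate);
  const dbLoadFraction = 1 - hitRate; // share of traffic that reaches the database
  return { avgLatencyMs, dbLoadFraction, speedup: dbLatencyMs / avgLatencyMs };
}

console.log(cacheEffect(0.9));  // ~1.09ms average latency, 10% of traffic hits the DB
console.log(cacheEffect(0.99)); // ~0.199ms average latency, 1% of traffic hits the DB
```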
Cache Latency Comparison:
| Cache Type | Typical Latency | Throughput (ops/s) | Use Case |
|---|---|---|---|
| CPU L1 cache | ~1 nanosecond | Billions | CPU internal |
| CPU L3 cache | ~10 nanoseconds | Billions | CPU internal |
| In-process (HashMap) | ~100 nanoseconds | 10-100 million | Hot data, no network |
| Local Redis (same machine) | ~100 microseconds | 100,000-500,000 | Session data, rate limits |
| Remote Redis (same datacenter) | ~500 microseconds | 50,000-200,000 | Shared cache, distributed |
| CDN edge cache | ~10-50 milliseconds | Varies | Static content, global users |
| Database (optimized) | ~1-10 milliseconds | 1,000-10,000 | Persistent storage |
As systems scale, cache architecture evolves from single nodes to distributed clusters. Understanding these topologies is essential for building scalable caching layers.
Topology 1: Embedded Cache
The cache lives inside the application process.
```
Application Server
 └─ Application Code
     └─ Embedded Cache (In-Memory)
         - HashMap/ConcurrentHashMap
         - Caffeine, Guava Cache
         - Node.js Map/lru-cache
```

Pros:
- Fastest possible access (no network)
- No external dependencies
- Simple to implement

Cons:
- Cache per server = memory duplication
- Cache cold after restart
- Inconsistency across instances
- Limited by process memory

Topology 2: Sidecar Cache
A cache process runs alongside the application on the same machine.
```
Server / Container Pod
 ├─ Application Container ──(localhost)──► Redis Sidecar Container
 │                                           - Local to the pod
 │                                           - Low latency
```

Pros:
- Low latency (localhost network)
- Cache survives app restarts
- Larger cache than in-process
- Standard Redis tooling

Cons:
- Still per-server duplication
- Inconsistency across pods
- Pod termination loses the cache

Topology 3: Centralized Distributed Cache
A shared cache cluster serves all application instances.
```
App Server 1 ─┐
App Server 2 ─┼──► Distributed Cache Cluster
App Server N ─┘      ┌──────┐ ┌──────┐ ┌──────┐
                     │Node 1│ │Node 2│ │Node 3│
                     │Shard │ │Shard │ │Shard │
                     └──────┘ └──────┘ └──────┘
                     Redis Cluster / Memcached
```

Pros:
- Single source of truth
- No duplication
- Scales independently
- Efficient memory usage

Cons:
- Network latency overhead
- Cache cluster is critical infrastructure
- More complex operations

Topology 4: Multi-Tier Cache
Combine local and distributed caches for optimal performance.
```
Application Server
 └─ L1: In-Process Cache (Caffeine)
     - Hottest data only
     - TTL: 10-60 seconds
     - Size: 1,000-10,000 entries
          │ On L1 miss
          ▼
L2: Distributed Cache (Redis)
     - Warm data, shared across servers
     - TTL: 5-60 minutes
     - Size: Millions of entries
          │ On L2 miss
          ▼
L3: Database
     - Cold/uncached data
     - Source of truth
```

Read path: L1 → L2 → L3 (database)
Write path: Database → Invalidate L2 → Invalidate L1

Benefits:
- L1 absorbs the hottest traffic (no network)
- L2 handles cache misses efficiently
- Database only handles true misses

Multi-tier caches multiply consistency challenges. When data changes, you must invalidate all tiers. L1 caches across servers may see stale data until their TTL expires. Design L1 TTLs to be short (seconds, not minutes) and accept eventual consistency, or implement a distributed invalidation protocol.
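To make the read path concrete, here is a minimal two-tier read sketch. The `localCache` map, the `redis` client, and the `loadFromDatabase` loader are assumed placeholders standing in for a real L1 library, a connected node-redis client, and the data access layer:

```typescript
// Sketch of an L1 (in-process) + L2 (Redis) read path under the assumptions above.
const localCache = new Map<string, { value: string; expiresAt: number }>();
const L1_TTL_MS = 30_000; // keep L1 short-lived to limit staleness

async function getTiered(key: string): Promise<string> {
  // L1: in-process
  const l1 = localCache.get(key);
  if (l1 && l1.expiresAt > Date.now()) return l1.value;

  // L2: shared Redis
  const l2 = await redis.get(key);
  if (l2 !== null) {
    localCache.set(key, { value: l2, expiresAt: Date.now() + L1_TTL_MS });
    return l2;
  }

  // L3: database (source of truth), then populate both tiers
  const value = await loadFromDatabase(key);
  await redis.set(key, value, { EX: 1800 }); // 30 minute L2 TTL
  localCache.set(key, { value, expiresAt: Date.now() + L1_TTL_MS });
  return value;
}
```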
When a distributed cache grows to multiple nodes, you need a strategy for determining which node holds which keys. Consistent hashing is the foundational algorithm that makes this work at scale.
The Problem with Simple Hashing:
```
Simple modulo hashing: node = hash(key) % num_nodes

With 3 nodes:
  hash("user:123") % 3 = 1 → Node 1
  hash("user:456") % 3 = 0 → Node 0
  hash("user:789") % 3 = 2 → Node 2

PROBLEM: Adding or removing nodes reshuffles EVERYTHING

Add Node 3 (now 4 nodes):
  hash("user:123") % 4 = 2 → Node 2 (was Node 1, MOVED!)
  hash("user:456") % 4 = 3 → Node 3 (was Node 0, MOVED!)
  hash("user:789") % 4 = 1 → Node 1 (was Node 2, MOVED!)

Result: ~75% of keys map to different nodes
All moved keys = cache misses = database thundering herd
Every scaling event triggers a cache miss storm!
```

Consistent Hashing Solution:
Consistent hashing maps both nodes and keys onto a circular space (the "hash ring"). Keys are assigned to the nearest node clockwise on the ring.
```
THE HASH RING:

Both nodes and keys are hashed onto a circle of positions (0° to 360°).
Each key is served by the first node encountered moving clockwise from
the key's position.

Key "user:123" → hash to position 45°  → served by Node A
Key "user:456" → hash to position 120° → served by Node B
Key "user:789" → hash to position 200° → served by Node C

ADDING A NODE:
Insert Node D at position 225° (between C and A).
Only the keys that hash between 180° and 225° move (from Node A to Node D).
Every other key stays exactly where it was.

REMOVING A NODE:
Remove Node B.
The keys that were on Node B now go to the next node clockwise (Node C).
Only those keys are affected; nothing else moves.
```

Virtual Nodes for Better Distribution:
With only physical nodes on the ring, distribution can be uneven. Virtual nodes solve this:
```typescript
class ConsistentHashRing<T> {
  private ring: Map<number, T> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodesPerPhysical: number;

  constructor(virtualNodesPerPhysical = 150) {
    this.virtualNodesPerPhysical = virtualNodesPerPhysical;
  }

  // Add a physical node with many virtual nodes
  addNode(node: T, nodeId: string): void {
    for (let i = 0; i < this.virtualNodesPerPhysical; i++) {
      const virtualNodeId = `${nodeId}#${i}`;
      const hash = this.hash(virtualNodeId);
      this.ring.set(hash, node);
      this.sortedKeys.push(hash);
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  // Remove all virtual nodes for a physical node
  removeNode(nodeId: string): void {
    for (let i = 0; i < this.virtualNodesPerPhysical; i++) {
      const virtualNodeId = `${nodeId}#${i}`;
      const hash = this.hash(virtualNodeId);
      this.ring.delete(hash);
      this.sortedKeys = this.sortedKeys.filter(k => k !== hash);
    }
  }

  // Find the node responsible for a key
  getNode(key: string): T | undefined {
    if (this.ring.size === 0) return undefined;

    const hash = this.hash(key);

    // Binary search for the first node with hash >= key hash
    let left = 0, right = this.sortedKeys.length - 1;
    while (left < right) {
      const mid = Math.floor((left + right) / 2);
      if (this.sortedKeys[mid] < hash) {
        left = mid + 1;
      } else {
        right = mid;
      }
    }

    // Wrap around if we've gone past the end
    const nodeHash = this.sortedKeys[left] >= hash
      ? this.sortedKeys[left]
      : this.sortedKeys[0];

    return this.ring.get(nodeHash);
  }

  private hash(key: string): number {
    // Use a proper hash function (MD5, xxHash, etc.)
    // This is a simplified example
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash) + key.charCodeAt(i);
      hash = hash & hash;
    }
    return Math.abs(hash);
  }
}

// Usage
const ring = new ConsistentHashRing<CacheNode>();
ring.addNode(redisNode1, 'redis-1');
ring.addNode(redisNode2, 'redis-2');
ring.addNode(redisNode3, 'redis-3');

const node = ring.getNode('user:12345');
await node.get('user:12345');
```

More virtual nodes = better distribution but more memory. 100-200 virtual nodes per physical node is typical. With 10 physical nodes and 150 virtual each, you have 1,500 points on the ring, ensuring relatively even distribution even if nodes have different capacities (assign more virtual nodes to larger nodes).
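As a quick sanity check on the "few keys move" claim, here is a small sketch that reuses the ConsistentHashRing class above with plain string node names standing in for real cache clients. With the simplified hash function from the class the split will only be approximate, but the contrast with modulo hashing still shows:

```typescript
// Count how many keys remap when one node is added to the ring.
const demoRing = new ConsistentHashRing<string>();
demoRing.addNode('redis-1', 'redis-1');
demoRing.addNode('redis-2', 'redis-2');
demoRing.addNode('redis-3', 'redis-3');

const keys = Array.from({ length: 10_000 }, (_, i) => `user:${i}`);
const before = new Map<string, string | undefined>();
for (const k of keys) before.set(k, demoRing.getNode(k));

demoRing.addNode('redis-4', 'redis-4'); // scale out by one node

const moved = keys.filter(k => demoRing.getNode(k) !== before.get(k)).length;
console.log(`${moved} of ${keys.length} keys moved (~${((100 * moved) / keys.length).toFixed(1)}%)`);
// Expect roughly the share now owned by redis-4 to move (about a quarter),
// versus ~75% with naive modulo hashing.
```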
Redis Cluster is Redis's native solution for horizontal scaling. Understanding its architecture is essential for scaling cache infrastructure.
Redis Cluster Fundamentals:
```
REDIS CLUSTER ARCHITECTURE:

HASH SLOTS (16,384 total)
0 ────────────── 5461 ─────────── 10922 ─────────── 16383
│    Shard 1     │     Shard 2     │      Shard 3      │

Each shard owns a range of slots and consists of a primary plus a replica:

Shard 1: Primary Redis ── replication ──► Replica Redis
Shard 2: Primary Redis ── replication ──► Replica Redis
Shard 3: Primary Redis ── replication ──► Replica Redis

Hash Slot Assignment:
- Key slot = CRC16(key) % 16384
- "user:123" → CRC16("user:123") % 16384 = 5862 → Shard 2
- Each shard owns a range of slots
```

Key Concepts in Redis Cluster:
```typescript
import { createCluster } from 'redis';

// Create cluster client
const cluster = createCluster({
  rootNodes: [
    { url: 'redis://redis-node-1.internal:6379' },
    { url: 'redis://redis-node-2.internal:6379' },
    { url: 'redis://redis-node-3.internal:6379' },
  ],
  defaults: {
    socket: {
      connectTimeout: 10000,
      keepAlive: 5000,
    },
  },
  // Enable reading from replicas for read scaling
  useReplicas: true,
});

await cluster.connect();

// Simple operations work transparently
await cluster.set('user:123', JSON.stringify({ name: 'Alice' }));
const user = await cluster.get('user:123');

// Multi-key operations: ONLY work if keys hash to the same slot!
// Use hash tags to ensure colocation
await cluster.set('{user:123}:profile', '...');
await cluster.set('{user:123}:settings', '...');

// MGET works because both keys have {user:123} → same slot
const [profile, settings] = await cluster.mGet([
  '{user:123}:profile',
  '{user:123}:settings',
]);

// WARNING: This may fail if keys are on different slots!
// await cluster.mGet(['user:123', 'user:456']); // Cross-slot error

// For cross-slot reads, issue parallel single-key requests instead
async function getMultipleUsers(userIds: string[]): Promise<Map<string, string | null>> {
  const results = new Map<string, string | null>();

  const promises = userIds.map(async (id) => {
    const value = await cluster.get(`user:${id}`);
    results.set(id, value);
  });

  await Promise.all(promises);
  return results;
}
```

Redis Cluster does not support multi-key operations across slots. Commands like MGET, MSET, and SUNION will fail if keys hash to different slots. Use hash tags {common_tag} or redesign your data model to colocate related data. This is the most significant constraint when migrating from single-node Redis to Redis Cluster.
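To see why hash-tagged keys land on the same slot, here is a small sketch of the slot calculation. The CRC16 variant (XMODEM, polynomial 0x1021) and the hash-tag rule follow the Redis Cluster specification, but the implementation below is illustrative rather than the library's own code:

```typescript
// Sketch: compute a key's Redis Cluster hash slot.
// CRC16-XMODEM (poly 0x1021, init 0) is the variant Redis Cluster uses.
function crc16(str: string): number {
  let crc = 0;
  for (let i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i) << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function hashSlot(key: string): number {
  // Hash tag rule: if the key contains a non-empty {...} section,
  // only the text inside the first {...} is hashed.
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close > open + 1) key = key.substring(open + 1, close);
  }
  return crc16(key) % 16384;
}

console.log(hashSlot('{user:123}:profile') === hashSlot('{user:123}:settings')); // true
console.log(hashSlot('user:123'), hashSlot('user:456')); // usually different slots
```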
Different caching patterns have different scaling characteristics. Understanding these helps choose the right pattern for your workload.
Pattern 1: Cache-Aside (Lazy Loading)
The application manages cache population explicitly.
```typescript
// Cache-Aside: Application manages the cache
async function getProduct(productId: string): Promise<Product> {
  // 1. Check cache first
  const cached = await cache.get(`product:${productId}`);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: load from database
  const product = await database.query(
    'SELECT * FROM products WHERE id = ?',
    [productId]
  );

  // 3. Populate cache for future requests
  await cache.set(
    `product:${productId}`,
    JSON.stringify(product),
    { EX: 3600 } // 1 hour TTL
  );

  return product;
}

// Write path: invalidate cache
async function updateProduct(productId: string, data: ProductData): Promise<void> {
  // 1. Update database
  await database.update('products', productId, data);

  // 2. Invalidate cache (delete, don't update)
  await cache.del(`product:${productId}`);

  // Next read will repopulate with fresh data
}

// Scaling characteristics:
// ✓ Simple, works with any cache
// ✓ Only caches data that's actually read
// ✗ First request after invalidation hits the database
// ✗ Cache stampede possible on popular keys
```

Pattern 2: Read-Through Cache
The cache itself is responsible for loading data on miss.
```typescript
// Read-Through: Cache loads data transparently
class ReadThroughCache<T> {
  constructor(
    private cache: CacheClient,
    private loader: (key: string) => Promise<T>,
    private options: { ttlSeconds: number }
  ) {}

  async get(key: string): Promise<T> {
    // Cache handles the miss internally
    const cached = await this.cache.get(key);
    if (cached) {
      return JSON.parse(cached);
    }

    // Load from source
    const value = await this.loader(key);

    // Store in cache
    await this.cache.set(key, JSON.stringify(value), {
      EX: this.options.ttlSeconds
    });

    return value;
  }
}

// Usage: application just reads, cache handles loading
const productCache = new ReadThroughCache<Product>(
  redisClient,
  async (key) => {
    const productId = key.replace('product:', '');
    return database.getProduct(productId);
  },
  { ttlSeconds: 3600 }
);

const product = await productCache.get('product:123');

// Scaling characteristics:
// ✓ Simpler application code
// ✓ Cache encapsulates loading logic
// ✗ Still susceptible to stampedes
// ✗ Requires the cache to understand data sources
```

Preventing Cache Stampedes:
```typescript
// Problem: When a cached key expires, many concurrent requests
// hit the database simultaneously = STAMPEDE

// Solution 1: Probabilistic Early Refresh
async function getWithEarlyRefresh(key: string): Promise<Product> {
  const { value, ttl } = await cache.getWithTTL(key);

  if (value) {
    // Probabilistically refresh before expiration
    // As TTL approaches 0, the probability of refresh increases
    const refreshProbability = Math.exp(-ttl / 30); // 30 second window

    if (Math.random() < refreshProbability) {
      // Asynchronously refresh (don't block the response)
      refreshCache(key).catch(console.error);
    }

    return JSON.parse(value);
  }

  return cacheMiss(key);
}

// Solution 2: Distributed Lock (Single Refresh)
async function getWithLock(key: string): Promise<Product> {
  const cached = await cache.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  const lockKey = `lock:${key}`;
  const locked = await cache.set(lockKey, '1', { NX: true, EX: 10 });

  if (locked) {
    // We got the lock - we refresh
    try {
      const product = await database.getProduct(key);
      await cache.set(key, JSON.stringify(product), { EX: 3600 });
      return product;
    } finally {
      await cache.del(lockKey);
    }
  } else {
    // Someone else is refreshing - wait and retry
    await sleep(100);
    return getWithLock(key); // Recursively retry
  }
}

// Solution 3: Background Refresh (Never Expire)
// The cache never truly expires - a background job refreshes entries
class BackgroundRefreshCache {
  async start(): Promise<void> {
    setInterval(async () => {
      const keysNearingExpiry = await this.getKeysNearingExpiry();

      for (const key of keysNearingExpiry) {
        const fresh = await this.loader(key);
        await this.cache.set(key, fresh, { EX: 3600 });
      }
    }, 30000); // Check every 30 seconds
  }
}
```

Extremely popular keys (celebrity tweets, viral products) can overwhelm a single cache node. Solutions include: local caching with jittered TTLs, replicating hot keys to multiple nodes, or using a dedicated 'hot key' cache tier. Monitoring key access patterns helps identify hot keys before they cause incidents.
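Here is a small sketch of one of those mitigations, TTL jittering, so that copies of a hot key cached on many servers do not all expire at the same instant. The `cache.set` call assumes the node-redis style client used throughout this page, and the base TTL and jitter range are arbitrary example values:

```typescript
// Spread expirations out by adding random jitter to the base TTL,
// so many servers' copies of a hot key don't expire simultaneously.
async function setWithJitter(
  key: string,
  value: string,
  baseTtlSeconds = 300,
  jitterFraction = 0.2
): Promise<void> {
  const jitter = Math.floor(baseTtlSeconds * jitterFraction * Math.random());
  await cache.set(key, value, { EX: baseTtlSeconds + jitter });
}

// Example: the TTL lands somewhere in [300s, 360s) instead of exactly 300s
await setWithJitter('product:hot-item', JSON.stringify(hotProduct));
```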
Both Memcached and Redis are excellent caching solutions, but they have different scaling characteristics.
| Aspect | Memcached | Redis |
|---|---|---|
| Architecture | Multi-threaded, uses all cores | Single-threaded event loop* |
| Clustering | Client-side sharding only | Native cluster mode |
| Data structures | Simple key-value only | Strings, lists, sets, hashes, etc. |
| Persistence | None (pure cache) | Optional RDB/AOF persistence |
| Memory efficiency | Less metadata, more cache space | More features, more overhead |
| Replication | Not native | Built-in primary/replica |
| Pub/Sub | Not supported | Built-in pub/sub |
| Lua scripting | Not supported | Full Lua scripting |
| Max item size | Default 1MB (configurable) | 512MB per value |
*Redis 7.0+ supports IO threading for network operations, improving multi-core utilization.
When to Choose Memcached:
- Simple key-value caching with no need for rich data structures
- Maximum memory efficiency for a pure cache workload
- High-core-count machines, where its multi-threaded architecture uses all cores
- You already have (or accept) client-side sharding
When to Choose Redis:
- You need data structures beyond key-value (lists, sets, sorted sets, hashes)
- Native clustering and built-in primary/replica replication
- Persistence, pub/sub, or Lua scripting are part of your use case
- You want one tool to cover caching plus adjacent needs (sessions, rate limits, queues)
In practice, Redis has become the default choice for most applications due to its richer features and native clustering. Memcached remains relevant for specialized use cases where its multi-threaded architecture provides better performance on high-core-count machines, or where pure caching with maximum memory efficiency is paramount.
For global applications, cache latency depends on geographic distance. Distributing caches globally reduces latency for users worldwide.
Global Distribution Patterns:
```
PATTERN 1: CDN Caching (Edge Caching)

Asia Users      Europe Users     US-East Users    US-West Users
     │                │                 │                │
     ▼                ▼                 ▼                ▼
CDN Edge Asia   CDN Edge Europe   CDN Edge US-East  CDN Edge US-West
     └────────────────┴────────┬────────┴────────────────┘
                               ▼
                         Origin Server

Best for: Static content, API responses with cache headers
Providers: Cloudflare, Fastly, CloudFront, Akamai

PATTERN 2: Regional Redis Clusters

US-EAST REGION                          US-WEST REGION
App Servers                             App Servers
     │                                       │
     ▼                                       ▼
Redis Cluster (Primary) ── replication ──► Redis Cluster (Replica)

Writes: Always to the primary region
Reads:  Local region (may be stale)

Best for: Read-heavy workloads, acceptable staleness

PATTERN 3: Write-Local, Read-Global (CRDT-based)

Each region has a writable cache; changes sync via CRDTs (conflict-free).

Region A Cache + CRDT ◄──► Region B Cache + CRDT ◄──► Region C Cache + CRDT

All regions accept writes
Conflicts automatically resolved
Eventually consistent globally

Best for: User preferences, counters, eventual consistency OK
```

Modern CDNs can cache API responses, not just static files. Set appropriate Cache-Control headers and use cache keys that include relevant query parameters. Services like Cloudflare Workers and Fastly Compute@Edge can even run custom logic at the edge, enabling sophisticated caching strategies.
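As a concrete illustration of the Cache-Control point, here is a minimal sketch of an API handler setting edge-cacheable headers. The Express-style handler, the specific directive values, and the reuse of the earlier `getProduct` cache-aside function are assumptions for illustration:

```typescript
import express from 'express';

const app = express();

// Let the CDN cache this product response at the edge for 60 seconds,
// and serve a stale copy for up to 5 minutes while revalidating.
app.get('/api/products/:id', async (req, res) => {
  const product = await getProduct(req.params.id); // cache-aside lookup from earlier
  res.set('Cache-Control', 'public, s-maxage=60, stale-while-revalidate=300');
  // Vary ensures different representations get separate cache keys
  res.set('Vary', 'Accept-Encoding');
  res.json(product);
});
```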
Caching is the great multiplier of system scalability. Let's consolidate the key insights:
- Hit rate drives everything: database load = total traffic × (1 − hit rate), so moving from 95% to 99% hit rate cuts database load by another 5x.
- Cache topology evolves with scale: embedded → sidecar → centralized distributed cluster → multi-tier (L1 in-process plus L2 distributed).
- Consistent hashing with virtual nodes keeps key movement small when cache nodes are added or removed, avoiding cache-miss storms.
- Redis Cluster shards by hash slot (CRC16 % 16384); multi-key operations require hash tags to colocate related keys.
- Cache-aside and read-through both need stampede protection: probabilistic early refresh, distributed locks, or background refresh.
- Hot keys and multi-tier consistency are the main operational hazards; short L1 TTLs, jittered expirations, and hot-key replication mitigate them.
- Go global with CDN edge caching, regional replicas, or CRDT-based multi-region caches, depending on your consistency needs.
What's Next:
With caching mastered, we'll explore the final scaling strategy in this module: queue scaling patterns. Message queues enable asynchronous processing, decoupling, and load leveling—essential patterns for building resilient, scalable systems.
You now understand how to design and scale caching layers for distributed systems, from consistent hashing fundamentals to Redis Cluster architecture to global distribution strategies. This knowledge enables you to build systems that handle massive read loads efficiently.