If there's one technique that delivers outsized scaling benefits relative to its complexity, it's caching. A well-designed caching layer can absorb 90-99% of read traffic, reducing database load by orders of magnitude. Systems that would require 100 database servers can often run on 10—or fewer—with effective caching.
But caching at scale introduces its own challenges. Cache invalidation famously joins naming things as one of the two hard problems in computer science. Distributed caches add consistency, partitioning, and availability concerns. Scaling caches requires understanding not just what to cache, but how to build caching infrastructure that scales horizontally.
This page explores the patterns for building and scaling caching layers in distributed systems, from single-node caches to globally distributed caching infrastructure.
By the end of this page, you will understand cache topologies, partitioning strategies for distributed caches, consistent hashing, cache clustering patterns, and how to build caching infrastructure that scales horizontally. You'll be equipped to design caching layers for systems serving millions of requests per second.
Before diving into scaling patterns, let's quantify the impact of caching at scale:
The Math of Cache Hit Rates:
```
Scenario: E-commerce product catalog
- 100,000 requests/second to product API
- Each database query: 10ms latency
- Database can handle: 10,000 queries/second

WITHOUT CACHING:
100,000 req/s → Database
Need 10 database servers (expensive!)
Total latency: 10ms minimum

WITH 90% CACHE HIT RATE:
100,000 req/s → Cache: 90,000 hits → 0.1ms latency
                       10,000 misses → Database
Need only 1 database server!
Average latency: 0.1 × 0.9 + 10 × 0.1 = 1.09ms (~9x faster)

WITH 99% CACHE HIT RATE:
100,000 req/s → Cache: 99,000 hits → 0.1ms
                        1,000 misses → Database
Database load: 1% of original
Average latency: 0.1 × 0.99 + 10 × 0.01 = 0.199ms (~50x faster)

The Cache Efficiency Formula:
─────────────────────────────
Database Load = Total Traffic × (1 − Hit Rate)

95% hit rate   → 5% database load   → 20x reduction
99% hit rate   → 1% database load   → 100x reduction
99.9% hit rate → 0.1% database load → 1000x reduction
```

In most systems, 20% of data accounts for 80% of accesses. Often it's more extreme: 1% of products might receive 50% of views. This long-tail distribution means even small caches can achieve high hit rates by caching the hot data. Understanding your access distribution is key to cache sizing.
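These back-of-the-envelope numbers are easy to reproduce. Here is a minimal sketch using the scenario's assumed figures (0.1ms cache hit, 10ms database query); the function name and constants are illustrative, not a library API:

```typescript
// Sketch: effective latency and database load for a given cache hit rate.
// The 0.1ms / 10ms figures mirror the scenario above; they are assumptions.
function cacheEffect(hitRate: number, cacheLatencyMs = 0.1, dbLatencyMs = 10) {
  const avgLatencyMs = cacheLatencyMs * hitRate + dbLatencyMs * (1 - hitRate);
  const dbLoadFraction = 1 - hitRate; // share of traffic that reaches the database
  return { avgLatencyMs, dbLoadFraction, speedup: dbLatencyMs / avgLatencyMs };
}

console.log(cacheEffect(0.9));  // ~1.09ms average latency, 10% of traffic hits the DB
console.log(cacheEffect(0.99)); // ~0.199ms average latency, 1% of traffic hits the DB
```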
Cache Latency Comparison:
| Cache Type | Typical Latency | Throughput (ops/s) | Use Case |
|---|---|---|---|
| CPU L1 cache | ~1 nanosecond | Billions | CPU internal |
| CPU L3 cache | ~10 nanoseconds | Billions | CPU internal |
| In-process (HashMap) | ~100 nanoseconds | 10-100 million | Hot data, no network |
| Local Redis (same machine) | ~100 microseconds | 100,000-500,000 | Session data, rate limits |
| Remote Redis (same datacenter) | ~500 microseconds | 50,000-200,000 | Shared cache, distributed |
| CDN edge cache | ~10-50 milliseconds | Varies | Static content, global users |
| Database (optimized) | ~1-10 milliseconds | 1,000-10,000 | Persistent storage |
As systems scale, cache architecture evolves from single nodes to distributed clusters. Understanding these topologies is essential for building scalable caching layers.
Topology 1: Embedded Cache
The cache lives inside the application process.
```
Application Server
 └─ Application Code
     └─ Embedded Cache (In-Memory)
         - HashMap/ConcurrentHashMap
         - Caffeine, Guava Cache
         - Node.js Map/lru-cache
```

Pros:
- Fastest possible access (no network)
- No external dependencies
- Simple to implement

Cons:
- Cache per server = memory duplication
- Cache cold after restart
- Inconsistency across instances
- Limited by process memory

Topology 2: Sidecar Cache
A cache process runs alongside the application on the same machine.
```
Server / Container Pod
 ├─ Application Container ──(localhost)──► Redis Sidecar Container
 │                                           - Local to the pod
 │                                           - Low latency
```

Pros:
- Low latency (localhost network)
- Cache survives app restarts
- Larger cache than in-process
- Standard Redis tooling

Cons:
- Still per-server duplication
- Inconsistency across pods
- Pod termination loses the cache

Topology 3: Centralized Distributed Cache
A shared cache cluster serves all application instances.
```
App Server 1 ─┐
App Server 2 ─┼──► Distributed Cache Cluster
App Server N ─┘      ┌──────┐ ┌──────┐ ┌──────┐
                     │Node 1│ │Node 2│ │Node 3│
                     │Shard │ │Shard │ │Shard │
                     └──────┘ └──────┘ └──────┘
                     Redis Cluster / Memcached
```

Pros:
- Single source of truth
- No duplication
- Scales independently
- Efficient memory usage

Cons:
- Network latency overhead
- Cache cluster is critical infrastructure
- More complex operations

Topology 4: Multi-Tier Cache
Combine local and distributed caches for optimal performance.
```
Application Server
 └─ L1: In-Process Cache (Caffeine)
     - Hottest data only
     - TTL: 10-60 seconds
     - Size: 1,000-10,000 entries
          │ On L1 miss
          ▼
L2: Distributed Cache (Redis)
     - Warm data, shared across servers
     - TTL: 5-60 minutes
     - Size: Millions of entries
          │ On L2 miss
          ▼
L3: Database
     - Cold/uncached data
     - Source of truth
```

Read path: L1 → L2 → L3 (database)
Write path: Database → Invalidate L2 → Invalidate L1

Benefits:
- L1 absorbs the hottest traffic (no network)
- L2 handles cache misses efficiently
- Database only handles true misses

Multi-tier caches multiply consistency challenges. When data changes, you must invalidate all tiers. L1 caches across servers may see stale data until their TTL expires. Design L1 TTLs to be short (seconds, not minutes) and accept eventual consistency, or implement a distributed invalidation protocol.
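To make the read path concrete, here is a minimal two-tier read sketch. The `localCache` map, the `redis` client, and the `loadFromDatabase` loader are assumed placeholders standing in for a real L1 library, a connected node-redis client, and the data access layer:

```typescript
// Sketch of an L1 (in-process) + L2 (Redis) read path under the assumptions above.
const localCache = new Map<string, { value: string; expiresAt: number }>();
const L1_TTL_MS = 30_000; // keep L1 short-lived to limit staleness

async function getTiered(key: string): Promise<string> {
  // L1: in-process
  const l1 = localCache.get(key);
  if (l1 && l1.expiresAt > Date.now()) return l1.value;

  // L2: shared Redis
  const l2 = await redis.get(key);
  if (l2 !== null) {
    localCache.set(key, { value: l2, expiresAt: Date.now() + L1_TTL_MS });
    return l2;
  }

  // L3: database (source of truth), then populate both tiers
  const value = await loadFromDatabase(key);
  await redis.set(key, value, { EX: 1800 }); // 30 minute L2 TTL
  localCache.set(key, { value, expiresAt: Date.now() + L1_TTL_MS });
  return value;
}
```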
When a distributed cache grows to multiple nodes, you need a strategy for determining which node holds which keys. Consistent hashing is the foundational algorithm that makes this work at scale.
The Problem with Simple Hashing:
```
Simple modulo hashing: node = hash(key) % num_nodes

With 3 nodes:
  hash("user:123") % 3 = 1 → Node 1
  hash("user:456") % 3 = 0 → Node 0
  hash("user:789") % 3 = 2 → Node 2

PROBLEM: Adding or removing nodes reshuffles EVERYTHING

Add Node 3 (now 4 nodes):
  hash("user:123") % 4 = 2 → Node 2 (was Node 1, MOVED!)
  hash("user:456") % 4 = 3 → Node 3 (was Node 0, MOVED!)
  hash("user:789") % 4 = 1 → Node 1 (was Node 2, MOVED!)

Result: ~75% of keys map to different nodes
All moved keys = cache misses = database thundering herd
Every scaling event triggers a cache miss storm!
```

Consistent Hashing Solution:
Consistent hashing maps both nodes and keys onto a circular space (the "hash ring"). Keys are assigned to the nearest node clockwise on the ring.
```
THE HASH RING:

Both nodes and keys are hashed onto a circle of positions (0° to 360°).
Each key is served by the first node encountered moving clockwise from
the key's position.

Key "user:123" → hash to position 45°  → served by Node A
Key "user:456" → hash to position 120° → served by Node B
Key "user:789" → hash to position 200° → served by Node C

ADDING A NODE:
Insert Node D at position 225° (between C and A).
Only the keys that hash between 180° and 225° move (from Node A to Node D).
Every other key stays exactly where it was.

REMOVING A NODE:
Remove Node B.
The keys that were on Node B now go to the next node clockwise (Node C).
Only those keys are affected; nothing else moves.
```

Virtual Nodes for Better Distribution:
With only physical nodes on the ring, distribution can be uneven. Virtual nodes solve this:
```typescript
class ConsistentHashRing<T> {
  private ring: Map<number, T> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodesPerPhysical: number;

  constructor(virtualNodesPerPhysical = 150) {
    this.virtualNodesPerPhysical = virtualNodesPerPhysical;
  }

  // Add a physical node with many virtual nodes
  addNode(node: T, nodeId: string): void {
    for (let i = 0; i < this.virtualNodesPerPhysical; i++) {
      const virtualNodeId = `${nodeId}#${i}`;
      const hash = this.hash(virtualNodeId);
      this.ring.set(hash, node);
      this.sortedKeys.push(hash);
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  // Remove all virtual nodes for a physical node
  removeNode(nodeId: string): void {
    for (let i = 0; i < this.virtualNodesPerPhysical; i++) {
      const virtualNodeId = `${nodeId}#${i}`;
      const hash = this.hash(virtualNodeId);
      this.ring.delete(hash);
      this.sortedKeys = this.sortedKeys.filter(k => k !== hash);
    }
  }

  // Find the node responsible for a key
  getNode(key: string): T | undefined {
    if (this.ring.size === 0) return undefined;

    const hash = this.hash(key);

    // Binary search for the first node with hash >= key hash
    let left = 0, right = this.sortedKeys.length - 1;
    while (left < right) {
      const mid = Math.floor((left + right) / 2);
      if (this.sortedKeys[mid] < hash) {
        left = mid + 1;
      } else {
        right = mid;
      }
    }

    // Wrap around if we've gone past the end
    const nodeHash = this.sortedKeys[left] >= hash
      ? this.sortedKeys[left]
      : this.sortedKeys[0];

    return this.ring.get(nodeHash);
  }

  private hash(key: string): number {
    // Use a proper hash function (MD5, xxHash, etc.)
    // This is a simplified example
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash) + key.charCodeAt(i);
      hash = hash & hash;
    }
    return Math.abs(hash);
  }
}

// Usage
const ring = new ConsistentHashRing<CacheNode>();
ring.addNode(redisNode1, 'redis-1');
ring.addNode(redisNode2, 'redis-2');
ring.addNode(redisNode3, 'redis-3');

const node = ring.getNode('user:12345');
await node.get('user:12345');
```

More virtual nodes = better distribution but more memory. 100-200 virtual nodes per physical node is typical. With 10 physical nodes and 150 virtual each, you have 1,500 points on the ring, ensuring relatively even distribution even if nodes have different capacities (assign more virtual nodes to larger nodes).
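As a quick sanity check on the "few keys move" claim, here is a small sketch that reuses the ConsistentHashRing class above with plain string node names standing in for real cache clients. With the simplified hash function from the class the split will only be approximate, but the contrast with modulo hashing still shows:

```typescript
// Count how many keys remap when one node is added to the ring.
const demoRing = new ConsistentHashRing<string>();
demoRing.addNode('redis-1', 'redis-1');
demoRing.addNode('redis-2', 'redis-2');
demoRing.addNode('redis-3', 'redis-3');

const keys = Array.from({ length: 10_000 }, (_, i) => `user:${i}`);
const before = new Map<string, string | undefined>();
for (const k of keys) before.set(k, demoRing.getNode(k));

demoRing.addNode('redis-4', 'redis-4'); // scale out by one node

const moved = keys.filter(k => demoRing.getNode(k) !== before.get(k)).length;
console.log(`${moved} of ${keys.length} keys moved (~${((100 * moved) / keys.length).toFixed(1)}%)`);
// Expect roughly the share now owned by redis-4 to move (about a quarter),
// versus ~75% with naive modulo hashing.
```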
Redis Cluster is Redis's native solution for horizontal scaling. Understanding its architecture is essential for scaling cache infrastructure.
Redis Cluster Fundamentals:
```
REDIS CLUSTER ARCHITECTURE:

HASH SLOTS (16,384 total)
0 ────────────── 5461 ─────────── 10922 ─────────── 16383
│    Shard 1     │     Shard 2     │      Shard 3      │

Each shard owns a range of slots and consists of a primary plus a replica:

Shard 1: Primary Redis ── replication ──► Replica Redis
Shard 2: Primary Redis ── replication ──► Replica Redis
Shard 3: Primary Redis ── replication ──► Replica Redis

Hash Slot Assignment:
- Key slot = CRC16(key) % 16384
- "user:123" → CRC16("user:123") % 16384 = 5862 → Shard 2
- Each shard owns a range of slots
```

Key Concepts in Redis Cluster:
```typescript
import { createCluster } from 'redis';

// Create cluster client
const cluster = createCluster({
  rootNodes: [
    { url: 'redis://redis-node-1.internal:6379' },
    { url: 'redis://redis-node-2.internal:6379' },
    { url: 'redis://redis-node-3.internal:6379' },
  ],
  defaults: {
    socket: {
      connectTimeout: 10000,
      keepAlive: 5000,
    },
  },
  // Enable reading from replicas for read scaling
  useReplicas: true,
});

await cluster.connect();

// Simple operations work transparently
await cluster.set('user:123', JSON.stringify({ name: 'Alice' }));
const user = await cluster.get('user:123');

// Multi-key operations: ONLY work if keys hash to the same slot!
// Use hash tags to ensure colocation
await cluster.set('{user:123}:profile', '...');
await cluster.set('{user:123}:settings', '...');

// MGET works because both keys have {user:123} → same slot
const [profile, settings] = await cluster.mGet([
  '{user:123}:profile',
  '{user:123}:settings',
]);

// WARNING: This may fail if keys are on different slots!
// await cluster.mGet(['user:123', 'user:456']); // Cross-slot error

// For cross-slot reads, issue parallel single-key requests instead
async function getMultipleUsers(userIds: string[]): Promise<Map<string, string | null>> {
  const results = new Map<string, string | null>();

  const promises = userIds.map(async (id) => {
    const value = await cluster.get(`user:${id}`);
    results.set(id, value);
  });

  await Promise.all(promises);
  return results;
}
```

Redis Cluster does not support multi-key operations across slots. Commands like MGET, MSET, and SUNION will fail if keys hash to different slots. Use hash tags {common_tag} or redesign your data model to colocate related data. This is the most significant constraint when migrating from single-node Redis to Redis Cluster.
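To see why hash-tagged keys land on the same slot, here is a small sketch of the slot calculation. The CRC16 variant (XMODEM, polynomial 0x1021) and the hash-tag rule follow the Redis Cluster specification, but the implementation below is illustrative rather than the library's own code:

```typescript
// Sketch: compute a key's Redis Cluster hash slot.
// CRC16-XMODEM (poly 0x1021, init 0) is the variant Redis Cluster uses.
function crc16(str: string): number {
  let crc = 0;
  for (let i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i) << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function hashSlot(key: string): number {
  // Hash tag rule: if the key contains a non-empty {...} section,
  // only the text inside the first {...} is hashed.
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close > open + 1) key = key.substring(open + 1, close);
  }
  return crc16(key) % 16384;
}

console.log(hashSlot('{user:123}:profile') === hashSlot('{user:123}:settings')); // true
console.log(hashSlot('user:123'), hashSlot('user:456')); // usually different slots
```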
Different caching patterns have different scaling characteristics. Understanding these helps choose the right pattern for your workload.
Pattern 1: Cache-Aside (Lazy Loading)
The application manages cache population explicitly.
```typescript
// Cache-Aside: Application manages the cache
async function getProduct(productId: string): Promise<Product> {
  // 1. Check cache first
  const cached = await cache.get(`product:${productId}`);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: load from database
  const product = await database.query(
    'SELECT * FROM products WHERE id = ?',
    [productId]
  );

  // 3. Populate cache for future requests
  await cache.set(
    `product:${productId}`,
    JSON.stringify(product),
    { EX: 3600 } // 1 hour TTL
  );

  return product;
}

// Write path: invalidate cache
async function updateProduct(productId: string, data: ProductData): Promise<void> {
  // 1. Update database
  await database.update('products', productId, data);

  // 2. Invalidate cache (delete, don't update)
  await cache.del(`product:${productId}`);

  // Next read will repopulate with fresh data
}

// Scaling characteristics:
// ✓ Simple, works with any cache
// ✓ Only caches data that's actually read
// ✗ First request after invalidation hits the database
// ✗ Cache stampede possible on popular keys
```

Pattern 2: Read-Through Cache
The cache itself is responsible for loading data on miss.
```typescript
// Read-Through: Cache loads data transparently
class ReadThroughCache<T> {
  constructor(
    private cache: CacheClient,
    private loader: (key: string) => Promise<T>,
    private options: { ttlSeconds: number }
  ) {}

  async get(key: string): Promise<T> {
    // Cache handles the miss internally
    const cached = await this.cache.get(key);
    if (cached) {
      return JSON.parse(cached);
    }

    // Load from source
    const value = await this.loader(key);

    // Store in cache
    await this.cache.set(key, JSON.stringify(value), {
      EX: this.options.ttlSeconds
    });

    return value;
  }
}

// Usage: application just reads, cache handles loading
const productCache = new ReadThroughCache<Product>(
  redisClient,
  async (key) => {
    const productId = key.replace('product:', '');
    return database.getProduct(productId);
  },
  { ttlSeconds: 3600 }
);

const product = await productCache.get('product:123');

// Scaling characteristics:
// ✓ Simpler application code
// ✓ Cache encapsulates loading logic
// ✗ Still susceptible to stampedes
// ✗ Requires the cache to understand data sources
```

Preventing Cache Stampedes:
```typescript
// Problem: When a cached key expires, many concurrent requests
// hit the database simultaneously = STAMPEDE

// Solution 1: Probabilistic Early Refresh
async function getWithEarlyRefresh(key: string): Promise<Product> {
  const { value, ttl } = await cache.getWithTTL(key);

  if (value) {
    // Probabilistically refresh before expiration
    // As TTL approaches 0, the probability of refresh increases
    const refreshProbability = Math.exp(-ttl / 30); // 30 second window

    if (Math.random() < refreshProbability) {
      // Asynchronously refresh (don't block the response)
      refreshCache(key).catch(console.error);
    }

    return JSON.parse(value);
  }

  return cacheMiss(key);
}

// Solution 2: Distributed Lock (Single Refresh)
async function getWithLock(key: string): Promise<Product> {
  const cached = await cache.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  const lockKey = `lock:${key}`;
  const locked = await cache.set(lockKey, '1', { NX: true, EX: 10 });

  if (locked) {
    // We got the lock - we refresh
    try {
      const product = await database.getProduct(key);
      await cache.set(key, JSON.stringify(product), { EX: 3600 });
      return product;
    } finally {
      await cache.del(lockKey);
    }
  } else {
    // Someone else is refreshing - wait and retry
    await sleep(100);
    return getWithLock(key); // Recursively retry
  }
}

// Solution 3: Background Refresh (Never Expire)
// The cache never truly expires - a background job refreshes entries
class BackgroundRefreshCache {
  async start(): Promise<void> {
    setInterval(async () => {
      const keysNearingExpiry = await this.getKeysNearingExpiry();

      for (const key of keysNearingExpiry) {
        const fresh = await this.loader(key);
        await this.cache.set(key, fresh, { EX: 3600 });
      }
    }, 30000); // Check every 30 seconds
  }
}
```

Extremely popular keys (celebrity tweets, viral products) can overwhelm a single cache node. Solutions include: local caching with jittered TTLs, replicating hot keys to multiple nodes, or using a dedicated 'hot key' cache tier. Monitoring key access patterns helps identify hot keys before they cause incidents.
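Here is a small sketch of one of those mitigations, TTL jittering, so that copies of a hot key cached on many servers do not all expire at the same instant. The `cache.set` call assumes the node-redis style client used throughout this page, and the base TTL and jitter range are arbitrary example values:

```typescript
// Spread expirations out by adding random jitter to the base TTL,
// so many servers' copies of a hot key don't expire simultaneously.
async function setWithJitter(
  key: string,
  value: string,
  baseTtlSeconds = 300,
  jitterFraction = 0.2
): Promise<void> {
  const jitter = Math.floor(baseTtlSeconds * jitterFraction * Math.random());
  await cache.set(key, value, { EX: baseTtlSeconds + jitter });
}

// Example: the TTL lands somewhere in [300s, 360s) instead of exactly 300s
await setWithJitter('product:hot-item', JSON.stringify(hotProduct));
```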
Both Memcached and Redis are excellent caching solutions, but they have different scaling characteristics.
| Aspect | Memcached | Redis |
|---|---|---|
| Architecture | Multi-threaded, uses all cores | Single-threaded event loop* |
| Clustering | Client-side sharding only | Native cluster mode |
| Data structures | Simple key-value only | Strings, lists, sets, hashes, etc. |
| Persistence | None (pure cache) | Optional RDB/AOF persistence |
| Memory efficiency | Less metadata, more cache space | More features, more overhead |
| Replication | Not native | Built-in primary/replica |
| Pub/Sub | Not supported | Built-in pub/sub |
| Lua scripting | Not supported | Full Lua scripting |
| Max item size | Default 1MB (configurable) | 512MB per value |
*Redis 7.0+ supports IO threading for network operations, improving multi-core utilization.
When to Choose Memcached:
- Simple key-value caching with no need for rich data structures
- Maximum memory efficiency for a pure cache workload
- High-core-count machines, where its multi-threaded architecture uses all cores
- You already have (or accept) client-side sharding
When to Choose Redis:
- You need data structures beyond key-value (lists, sets, sorted sets, hashes)
- Native clustering and built-in primary/replica replication
- Persistence, pub/sub, or Lua scripting are part of your use case
- You want one tool to cover caching plus adjacent needs (sessions, rate limits, queues)
In practice, Redis has become the default choice for most applications due to its richer features and native clustering. Memcached remains relevant for specialized use cases where its multi-threaded architecture provides better performance on high-core-count machines, or where pure caching with maximum memory efficiency is paramount.
For global applications, cache latency depends on geographic distance. Distributing caches globally reduces latency for users worldwide.
Global Distribution Patterns:
```
PATTERN 1: CDN Caching (Edge Caching)

Asia Users      Europe Users     US-East Users    US-West Users
     │                │                 │                │
     ▼                ▼                 ▼                ▼
CDN Edge Asia   CDN Edge Europe   CDN Edge US-East  CDN Edge US-West
     └────────────────┴────────┬────────┴────────────────┘
                               ▼
                         Origin Server

Best for: Static content, API responses with cache headers
Providers: Cloudflare, Fastly, CloudFront, Akamai

PATTERN 2: Regional Redis Clusters

US-EAST REGION                          US-WEST REGION
App Servers                             App Servers
     │                                       │
     ▼                                       ▼
Redis Cluster (Primary) ── replication ──► Redis Cluster (Replica)

Writes: Always to the primary region
Reads:  Local region (may be stale)

Best for: Read-heavy workloads, acceptable staleness

PATTERN 3: Write-Local, Read-Global (CRDT-based)

Each region has a writable cache; changes sync via CRDTs (conflict-free).

Region A Cache + CRDT ◄──► Region B Cache + CRDT ◄──► Region C Cache + CRDT

All regions accept writes
Conflicts automatically resolved
Eventually consistent globally

Best for: User preferences, counters, eventual consistency OK
```

Modern CDNs can cache API responses, not just static files. Set appropriate Cache-Control headers and use cache keys that include relevant query parameters. Services like Cloudflare Workers and Fastly Compute@Edge can even run custom logic at the edge, enabling sophisticated caching strategies.
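As a concrete illustration of the Cache-Control point, here is a minimal sketch of an API handler setting edge-cacheable headers. The Express-style handler, the specific directive values, and the reuse of the earlier `getProduct` cache-aside function are assumptions for illustration:

```typescript
import express from 'express';

const app = express();

// Let the CDN cache this product response at the edge for 60 seconds,
// and serve a stale copy for up to 5 minutes while revalidating.
app.get('/api/products/:id', async (req, res) => {
  const product = await getProduct(req.params.id); // cache-aside lookup from earlier
  res.set('Cache-Control', 'public, s-maxage=60, stale-while-revalidate=300');
  // Vary ensures different representations get separate cache keys
  res.set('Vary', 'Accept-Encoding');
  res.json(product);
});
```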
Caching is the great multiplier of system scalability. Let's consolidate the key insights:
- Hit rate drives everything: database load = total traffic × (1 − hit rate), so moving from 95% to 99% hit rate cuts database load by another 5x.
- Cache topology evolves with scale: embedded → sidecar → centralized distributed cluster → multi-tier (L1 in-process plus L2 distributed).
- Consistent hashing with virtual nodes keeps key movement small when cache nodes are added or removed, avoiding cache-miss storms.
- Redis Cluster shards by hash slot (CRC16 % 16384); multi-key operations require hash tags to colocate related keys.
- Cache-aside and read-through both need stampede protection: probabilistic early refresh, distributed locks, or background refresh.
- Hot keys and multi-tier consistency are the main operational hazards; short L1 TTLs, jittered expirations, and hot-key replication mitigate them.
- Go global with CDN edge caching, regional replicas, or CRDT-based multi-region caches, depending on your consistency needs.
What's Next:
With caching mastered, we'll explore the final scaling strategy in this module: queue scaling patterns. Message queues enable asynchronous processing, decoupling, and load leveling—essential patterns for building resilient, scalable systems.
You now understand how to design and scale caching layers for distributed systems, from consistent hashing fundamentals to Redis Cluster architecture to global distribution strategies. This knowledge enables you to build systems that handle massive read loads efficiently.