If there's one pattern that delivers outsized impact with relatively modest complexity, it's caching. A well-designed caching layer can reduce database load by 90% or more, decrease response times from hundreds of milliseconds to single digits, and defer expensive database scaling indefinitely.
The fundamental insight is simple: most data is read far more often than it's written, and the same data is requested repeatedly. If the same product page is viewed 10,000 times per hour but its data changes once per day, why fetch it from the database 10,000 times? Cache it once, serve it instantly, and refresh only when necessary.
Yet caching introduces its own challenges: cache invalidation (famously one of the "two hard things in computer science"), cache coherence, thundering herds, and cache penetration attacks. This page explores caching as a complete discipline—from fundamental concepts through production-grade implementation.
By the end of this page, you will understand how to design and implement effective caching layers at multiple levels of a system. You'll learn cache invalidation strategies, how to handle the challenges of distributed caching, and when caching is appropriate versus when it adds complexity without benefit.
Modern systems employ caching at multiple layers, each with different characteristics, trade-offs, and use cases. Understanding this hierarchy is essential for designing effective caching strategies.
Layer 1: Browser Cache The closest cache to the user. HTTP headers (Cache-Control, ETag, Last-Modified) instruct browsers to cache static assets (images, CSS, JavaScript) and even API responses. Cache hits never reach your servers.
Layer 2: CDN Cache (Edge) Content Delivery Networks cache content at points of presence (PoPs) geographically close to users. Reduces latency and offloads origin servers. Critical for static content and increasingly for dynamic content.
Layer 3: Application Cache (In-Memory) Local caches within application servers (e.g., Guava cache, local in-memory maps). Fastest access, but not shared across instances. Good for small, frequently accessed data that doesn't change often.
Layer 4: Distributed Cache Shared cache layer across the application tier (Redis, Memcached). Network hop required, but provides consistency across all instances and larger capacity than local memory.
Layer 5: Database Query Cache Some databases cache query results internally. Useful for transparent caching but limited control over invalidation.
| Layer | Location | Latency | Capacity | Shared | Best For |
|---|---|---|---|---|---|
| Browser Cache | Client device | 0ms (instant) | Limited (50-500MB) | No | Static assets, personalized data |
| CDN Cache | Edge PoPs | 1-20ms | Very large (TB+) | Yes (per-region) | Static content, cacheable APIs |
| Application Cache | App server memory | < 1ms | Limited (GB) | No (per instance) | Hot path data, computed values |
| Distributed Cache | Cache cluster | 1-5ms (network) | Large (100s GB) | Yes (global) | Session data, database query results |
| Database Cache | Database server | 1-10ms | Based on DB config | Yes | Transparent query caching |
The multi-layer strategy:
Effective systems employ caches at multiple layers, with each layer filtering requests:
If each layer achieves a 90% hit rate, each layer passes only 10% of its traffic downstream: 100% of requests reach the outermost cache, 10% reach the next layer, 1% the layer after that, and just 0.1% reach the database.
This is the power of cache layering—multiplicative reduction at each tier.
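The multiplicative effect can be made concrete with a small helper. This is illustrative arithmetic, not code from the text: each layer passes through the fraction `(1 - hitRate)`, and the fractions multiply.

```typescript
// Fraction of traffic that survives a stack of cache layers.
// Each layer passes (1 - hitRate) of its incoming traffic downstream.
function residualFraction(hitRates: number[]): number {
  return hitRates.reduce((fraction, hitRate) => fraction * (1 - hitRate), 1);
}

// Three layers at 90% each: 0.1 × 0.1 × 0.1 = 0.001,
// i.e. only 0.1% of requests reach the database.
const reachesDatabase = residualFraction([0.9, 0.9, 0.9]);
```

Note how adding a mediocre fourth layer (say, 50% hit rate) still halves database traffic—layers compound.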
When designing caching, work from outside in. Every request that can be served by the browser or CDN never reaches your infrastructure. Maximize these layers before optimizing internal caches. A proper CDN configuration often reduces origin traffic by 70-90% for content-heavy applications.
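Working from the outside in starts with HTTP headers. As an illustrative sketch (the header values are common defaults, not prescribed by the text), a simple policy table maps content classes to `Cache-Control` directives:

```typescript
// Hypothetical Cache-Control policies by content class.
// Values are common conventions, shown for illustration.
const cachePolicies: Record<"static-asset" | "api-cacheable" | "personalized", string> = {
  // Fingerprinted assets can be cached "forever" at browser and CDN
  "static-asset": "public, max-age=31536000, immutable",
  // Short shared TTL, with stale-while-revalidate for smooth refresh
  "api-cacheable": "public, max-age=60, stale-while-revalidate=300",
  // Personalized responses must never be stored by shared caches
  "personalized": "private, no-store",
};
```

Every response tagged `static-asset` here is served by browsers and CDN PoPs without touching the origin until the asset's URL changes.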
Redis has become the de facto standard for distributed caching. Its rich data structures, atomic operations, and excellent performance make it suitable for a wide range of caching patterns.
Why Redis for caching?
Performance: Redis operates entirely in memory. Single-threaded event loop eliminates lock contention. Typical latency is < 1ms for simple operations.
Data structures: Beyond simple key-value, Redis supports lists, sets, sorted sets, hashes, bitmaps, and more. These enable patterns impossible with simple caches.
Atomic operations: INCR, LPUSH, ZADD, and other atomic commands enable race-condition-free counter updates, rate limiting, and leaderboards.
Persistence options: Optional disk persistence (RDB snapshots, AOF logs) protects against data loss during restarts—useful when cache warmup is expensive.
Cluster mode: Redis Cluster provides automatic sharding and failover for horizontal scaling.
```typescript
import Redis from 'ioredis';

const redis = new Redis({
  host: 'redis-cluster.example.com',
  port: 6379,
  retryDelayOnFailover: 100,
  maxRetriesPerRequest: 3,
});

// Pattern 1: Simple cache-aside with TTL
async function getCachedUser(userId: string): Promise<User | null> {
  const cacheKey = `user:${userId}`;

  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss - fetch from database
  const user = await database.getUser(userId);
  if (user) {
    // Cache for 1 hour
    await redis.setex(cacheKey, 3600, JSON.stringify(user));
  }
  return user;
}

// Pattern 2: Hash for object storage (more memory efficient)
async function getCachedProduct(productId: string): Promise<Product | null> {
  const cacheKey = `product:${productId}`;
  const cached = await redis.hgetall(cacheKey);

  if (Object.keys(cached).length > 0) {
    return {
      id: cached.id,
      name: cached.name,
      price: parseFloat(cached.price),
      stock: parseInt(cached.stock, 10),
    };
  }

  const product = await database.getProduct(productId);
  if (product) {
    await redis.hmset(cacheKey, {
      id: product.id,
      name: product.name,
      price: product.price.toString(),
      stock: product.stock.toString(),
    });
    await redis.expire(cacheKey, 3600);
  }
  return product;
}

// Pattern 3: Sorted sets for leaderboards/ranking
async function getTopScores(limit: number = 10): Promise<LeaderboardEntry[]> {
  // ZREVRANGE returns highest scores first
  const results = await redis.zrevrange('leaderboard', 0, limit - 1, 'WITHSCORES');

  const entries: LeaderboardEntry[] = [];
  for (let i = 0; i < results.length; i += 2) {
    entries.push({
      userId: results[i],
      score: parseFloat(results[i + 1]),
    });
  }
  return entries;
}

// Pattern 4: Rate limiting with sliding window
async function isRateLimited(userId: string, limit: number, windowSecs: number): Promise<boolean> {
  const key = `ratelimit:${userId}`;
  const now = Date.now();
  const windowStart = now - (windowSecs * 1000);

  // Remove old entries
  await redis.zremrangebyscore(key, 0, windowStart);

  // Count recent entries
  const count = await redis.zcard(key);
  if (count >= limit) {
    return true; // Rate limited
  }

  // Add current request
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, windowSecs);
  return false;
}
```

Memcached is simpler and can be slightly faster for pure key-value workloads. Redis offers richer data structures, persistence, and Lua scripting. For most applications, Redis's flexibility outweighs Memcached's marginal performance advantage. Use Memcached only if you have specific needs (legacy compatibility, extreme simplicity) that Redis doesn't address.
The cache-aside pattern (also called "lazy loading") is the most common caching strategy. The application manages both cache and database, loading data into cache on demand.
The flow:
1. The application receives a read request and checks the cache first.
2. Cache hit: return the cached value immediately.
3. Cache miss: query the database, store the result in the cache with a TTL, and return it.

Advantages: only data that is actually requested gets cached; the cache can fail without taking reads down (requests fall through to the database); and the logic is simple enough to implement in a few lines.

Disadvantages: every miss pays three costs (cache check, database read, cache write); data can be stale until the TTL expires or the key is invalidated; and a cold or freshly flushed cache produces a burst of misses.
Critical implementation considerations:
1. TTL selection: Every cached item needs a TTL (time-to-live). Too short: frequent cache misses, database hammered. Too long: stale data served to users.
Rules of thumb: rarely changing reference data (product catalogs, configuration) can tolerate TTLs of hours; frequently updated data (inventory, prices) should use seconds to a few minutes; and anything a user expects to see immediately after a write should pair a short TTL with explicit invalidation.
2. Serialization format: JSON is readable but verbose. MessagePack or Protocol Buffers are more compact but require schema changes. For most applications, JSON is fine—Redis is memory-bound before serialization becomes a bottleneck.
3. Cache key design: Keys should be descriptive and namespaced (entity type, then identifier), versioned so schema changes don't deserialize into stale shapes, and deterministic—the same logical lookup must always produce the same key.
Good: user:123:v2, product:abc:inventory
Bad: 123, data, cache_key_1
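A small key builder can enforce these conventions mechanically. This is a hypothetical helper, not from the text; the sanitization rule is an assumption:

```typescript
// Hypothetical cache key builder: namespaced, colon-delimited, versioned.
// Sanitizes the identifier so whitespace or delimiters can't corrupt the key space.
function buildCacheKey(namespace: string, id: string, version = "v1"): string {
  const safeId = id.replace(/[^a-zA-Z0-9_-]/g, "_");
  return `${namespace}:${safeId}:${version}`;
}

// buildCacheKey("user", "123", "v2") produces "user:123:v2"
```

Centralizing key construction in one function also makes bulk invalidation by prefix or version bump far easier later.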
4. Null caching: If results for a query are legitimately empty, cache the empty result. Otherwise, repeated queries for non-existent data always hit the database. Use a sentinel value or short TTL for negative caching.
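Negative caching can be sketched with a sentinel value. This is a minimal illustration using an in-memory `Map` in place of Redis; the sentinel string and TTL values are assumptions:

```typescript
// Sentinel marking "we looked this up and found nothing"
const NEGATIVE = "__NULL__";

// In-memory stand-in for Redis, for illustration only
const cache = new Map<string, { value: string; expiresAt: number }>();

async function getWithNegativeCaching(
  key: string,
  loader: () => Promise<string | null>
): Promise<string | null> {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    // A cached sentinel answers "not found" without touching the database
    return entry.value === NEGATIVE ? null : entry.value;
  }

  const value = await loader();
  if (value === null) {
    // Cache the miss briefly so repeated lookups don't hammer the database
    cache.set(key, { value: NEGATIVE, expiresAt: Date.now() + 30_000 });
  } else {
    cache.set(key, { value, expiresAt: Date.now() + 3_600_000 });
  }
  return value;
}
```

The short 30-second TTL on the sentinel bounds how long a newly created record appears missing.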
Cache-aside only addresses reads. On writes, you must decide: invalidate the cache (delete the key) or update the cache (set new value). Generally, invalidation is safer—it avoids race conditions where a stale read happens between database write and cache update. Update only if you can guarantee atomicity.
Phil Karlton's famous quote—"There are only two hard things in Computer Science: cache invalidation and naming things"—resonates because cache invalidation is genuinely difficult. When data changes, cached copies become stale. The question is: how and when do we detect and correct this?
Strategy 1: Time-Based Expiration (TTL)
The simplest approach: every cached item expires after a fixed duration.
Pros: trivially simple to implement; self-healing, since staleness is bounded by the TTL; no invalidation code paths to build or maintain.

Cons: data can be stale for up to the full TTL; shortening the TTL shifts load back onto the database; and keys written together tend to expire together, creating periodic load spikes.
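One common refinement (an assumption here, not prescribed by the text) is adding random jitter to TTLs so that keys written in the same burst don't all expire in the same instant:

```typescript
// Spread expirations by adding up to jitterFraction of random extra lifetime.
// Keys cached together then expire spread across a window instead of at once.
function ttlWithJitter(baseSeconds: number, jitterFraction = 0.1): number {
  const jitter = baseSeconds * jitterFraction * Math.random();
  return Math.round(baseSeconds + jitter);
}

// e.g. ttlWithJitter(3600) returns something between 3600 and 3960 seconds
```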
```typescript
// Pattern 1: Write-Through (synchronous cache update)
async function updateUserWithWriteThrough(userId: string, updates: Partial<User>) {
  const cacheKey = `user:${userId}`;

  // Start transaction
  const trx = await database.transaction();
  try {
    // Update database
    const user = await trx.update('users', userId, updates);

    // Update cache
    await redis.setex(cacheKey, 3600, JSON.stringify(user));

    // Commit transaction
    await trx.commit();
    return user;
  } catch (error) {
    await trx.rollback();
    // On failure, invalidate cache to prevent stale data
    await redis.del(cacheKey);
    throw error;
  }
}

// Pattern 2: Event-Based Invalidation
class UserService {
  private eventBus: EventBus;

  async updateUser(userId: string, updates: Partial<User>) {
    const user = await database.update('users', userId, updates);

    // Emit event for cache invalidation
    await this.eventBus.publish('user.updated', {
      userId,
      timestamp: Date.now(),
    });
    return user;
  }
}

// Separate cache invalidation handler
class CacheInvalidator {
  constructor(private redis: Redis, private eventBus: EventBus) {
    this.eventBus.subscribe('user.updated', this.handleUserUpdate.bind(this));
  }

  private async handleUserUpdate(event: { userId: string }) {
    await this.redis.del(`user:${event.userId}`);
    // Also invalidate related caches
    await this.redis.del(`user:${event.userId}:preferences`);
    await this.redis.del(`user:${event.userId}:activity`);
  }
}

// Pattern 3: Version-Based Cache Keys
class VersionedCache {
  private versionKey = 'cache:version:users';

  async getVersion(): Promise<number> {
    const version = await redis.get(this.versionKey);
    return version ? parseInt(version, 10) : 1;
  }

  async incrementVersion(): Promise<number> {
    return redis.incr(this.versionKey);
  }

  async getUser(userId: string): Promise<User | null> {
    const version = await this.getVersion();
    const cacheKey = `user:${userId}:v${version}`;

    const cached = await redis.get(cacheKey);
    if (cached) return JSON.parse(cached);

    const user = await database.getUser(userId);
    if (user) {
      await redis.setex(cacheKey, 3600, JSON.stringify(user));
    }
    return user;
  }

  async invalidateAllUsers(): Promise<void> {
    // Simply increment version; old keys expire naturally
    await this.incrementVersion();
  }
}
```

Production systems often combine strategies. Use TTL as a safety net (data never stale for more than X minutes), event-based invalidation for real-time updates (changes reflected immediately), and version keys for bulk invalidation (invalidate all users after a schema migration). Each strategy covers different failure modes.
One of the most dangerous caching failure modes is the cache stampede (also called thundering herd). It occurs when a popular cache key expires and multiple requests simultaneously attempt to regenerate it, overwhelming the database.
The scenario: a popular key—say, the homepage feed—expires. Within milliseconds, hundreds of concurrent requests all miss the cache, and every one of them issues the same expensive database query. The database, sized for cached traffic, buckles under the sudden load, which slows the queries, which keeps the cache empty, which lets still more requests pile on.
The irony: The cache was protecting the database. When it expires, the protection disappears precisely when traffic is highest.
```typescript
// Pattern 1: Distributed Lock (Mutex)
async function getCachedWithLock<T>(
  key: string,
  ttlSeconds: number,
  loader: () => Promise<T>
): Promise<T | null> {
  // Try cache first
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const lockKey = `lock:${key}`;
  const lockTTL = 10; // Lock expires after 10 seconds

  // Try to acquire lock
  const acquired = await redis.set(lockKey, '1', 'EX', lockTTL, 'NX');

  if (acquired) {
    // We have the lock - load data
    try {
      const data = await loader();
      await redis.setex(key, ttlSeconds, JSON.stringify(data));
      return data;
    } finally {
      await redis.del(lockKey);
    }
  } else {
    // Another process is loading - wait and retry
    await sleep(100);
    const retried = await redis.get(key);
    if (retried) return JSON.parse(retried);
    // Still no data - retry with backoff
    return getCachedWithLock(key, ttlSeconds, loader);
  }
}

// Pattern 2: Probabilistic Early Expiration
async function getCachedWithProbabilisticRefresh<T>(
  key: string,
  ttlSeconds: number,
  earlyRefreshWindow: number, // seconds before expiration to consider refresh
  loader: () => Promise<T>
): Promise<T | null> {
  const cached = await redis.get(key);

  if (cached) {
    const { data, createdAt } = JSON.parse(cached);
    const age = (Date.now() - createdAt) / 1000;
    const timeToExpiry = ttlSeconds - age;

    if (timeToExpiry < earlyRefreshWindow) {
      // In early refresh window - probabilistically refresh
      // Probability increases as we approach expiration
      const refreshProbability = 1 - (timeToExpiry / earlyRefreshWindow);
      if (Math.random() < refreshProbability) {
        // Refresh in background (don't await)
        refreshCache(key, ttlSeconds, loader).catch(console.error);
      }
    }
    return data;
  }

  // Cache miss - load synchronously
  const data = await loader();
  await redis.setex(key, ttlSeconds, JSON.stringify({
    data,
    createdAt: Date.now(),
  }));
  return data;
}

// Pattern 3: Stale-While-Revalidate
async function getCachedSWR<T>(
  key: string,
  ttlSeconds: number,
  staleWhileRevalidate: number,
  loader: () => Promise<T>
): Promise<T | null> {
  const cached = await redis.get(key);

  if (cached) {
    const { data, createdAt } = JSON.parse(cached);
    const age = (Date.now() - createdAt) / 1000;

    if (age > ttlSeconds) {
      if (age < ttlSeconds + staleWhileRevalidate) {
        // Stale but within revalidate window:
        // return stale immediately, refresh in background
        refreshCache(key, ttlSeconds, loader).catch(console.error);
        return data;
      }
      // Too stale - let it fall through to reload
    } else {
      return data; // Fresh data
    }
  }

  // No cache or too stale - load synchronously
  const data = await loader();
  await saveToCache(key, data, ttlSeconds);
  return data;
}
```

While locking prevents stampedes, it can create contention under very high concurrency. If 10,000 requests wait for one lock, you've traded database overload for lock waiting. For extremely hot keys, consider background refresh or request coalescing instead of locks. The right pattern depends on your traffic patterns.
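Request coalescing, mentioned above as an alternative to locks, can be done entirely in-process: concurrent callers for the same key share one in-flight loader promise, so the database sees a single query per instance. A minimal sketch:

```typescript
// In-process request coalescing: all concurrent callers for a key
// await the same loader promise; only one load actually runs.
const inFlight = new Map<string, Promise<unknown>>();

function coalesce<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  // Remove the entry once settled so later calls trigger a fresh load
  const promise = loader().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```

Unlike a distributed lock, this only deduplicates within one process; across N application instances the database still sees up to N concurrent loads, which is usually a tolerable ceiling.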
When data exists in multiple places—database, distributed cache, local caches across multiple servers—maintaining consistency becomes challenging. Cache coherence refers to ensuring all cache copies reflect the same state.
The fundamental tension:
Stronger consistency requires more coordination, which reduces performance. Weaker consistency improves performance but allows stale reads. There's no universal right answer—the choice depends on your domain.
Consistency levels in caching:
Strong consistency: Cache always reflects current database state. Requires synchronous invalidation on every write. Practically impossible with local/CDN caches.
Eventual consistency: Cache will converge to database state within a bounded time (TTL). Most common approach. Allows brief windows of stale data.
Read-your-writes consistency: A user sees their own writes immediately, even if others see stale data. Implemented by routing reads to primary after writes.
Causal consistency: Related operations are seen in order. If A causes B, no observer sees B before A. Complex to implement in distributed caches.
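Read-your-writes consistency is often implemented with a short "sticky" window. The sketch below is an assumption about one common implementation, not a prescribed design—after a user writes, that user's reads are routed to the primary for a few seconds while replicas and caches catch up:

```typescript
// Track each user's last write; route their reads to the primary
// for a short window afterward so they always see their own writes.
const lastWriteAt = new Map<string, number>();
const STICKY_WINDOW_MS = 5_000;

function recordWrite(userId: string): void {
  lastWriteAt.set(userId, Date.now());
}

function readTarget(userId: string): "primary" | "cache" {
  const last = lastWriteAt.get(userId);
  return last !== undefined && Date.now() - last < STICKY_WINDOW_MS
    ? "primary"
    : "cache";
}
```

Other users still read from cache and may briefly see stale data, which is exactly the trade this consistency level accepts.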
| Use Case | Required Consistency | Approach | Staleness Tolerance |
|---|---|---|---|
| User profile display | Eventual | TTL + event invalidation | Minutes acceptable |
| Account balance | Strong | Cache-aside with short TTL or no cache | Zero tolerance |
| Shopping cart | Read-your-writes | Write-through + user affinity | Own changes immediate |
| Product inventory | Eventual with bounds | Event invalidation + 30s TTL | Brief oversell acceptable |
| News feed | Eventual | Long TTL, background refresh | Hours acceptable |
| Session authentication | Strong | In-memory or very short TTL | Zero tolerance |
The dual-write problem:
A common pitfall occurs when updating cache and database separately. Consider two concurrent requests writing the same balance:
1. Request B writes 100 to the database.
2. Request A writes 150 to the database (A's write lands last, so the database holds 150).
3. Request A updates the cache to 150.
4. Request B's delayed cache update overwrites it with 100.
Result: Database has 150, cache has 100. Data is inconsistent.
Solutions:
Invalidate, don't update — Delete cache key on write. Next read fetches fresh data. Race conditions cause extra cache misses, not inconsistency.
Single writer — All writes to a key go through one service. No concurrent conflicts.
Conditional updates — Use CAS (compare-and-set) operations. Update cache only if version matches.
Transaction log — Write to database, then publish to cache via change data capture. Order is guaranteed by log ordering.
The safest cache update strategy is deletion. On any write, delete the cache key. The next read will populate fresh data. This trades a cache miss for guaranteed consistency. Most systems can tolerate occasional cache misses far better than inconsistent data.
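The delete-on-write discipline is short enough to sketch directly. Here an in-memory `Map` stands in for Redis, and the database write is a placeholder assumption:

```typescript
// In-memory stand-in for the distributed cache, for illustration
const kv = new Map<string, string>();

// Write the database first, then delete (not update) the cache key.
// The next read misses and repopulates from the fresh database state.
async function updateAndInvalidate(
  cacheKey: string,
  databaseWrite: () => Promise<void>
): Promise<void> {
  await databaseWrite(); // 1. Durable write happens first
  kv.delete(cacheKey);   // 2. Then drop the cached copy
}
```

Ordering matters: deleting first and writing second leaves a window where a concurrent read repopulates the cache with the old value.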
Cache sizing is both art and science. Too small: low hit rate, frequent eviction, limited benefit. Too large: wasted resources, cold data consuming memory.
Key metrics for sizing:
Working set size: How much data is actively accessed? If you have 100GB of data but only 1GB is accessed in any hour, the working set is ~1GB.
Hit rate target: What hit rate do you need? 90%? 99%? Higher hit rates require more memory to store more data.
Object size: Average size of cached items. Determines how many items fit in a given memory budget.
Access pattern: Uniform access (all items equally likely) or skewed (some items much hotter than others)? Skewed patterns allow smaller caches—hot items stay in cache.
Sizing calculation example:
Scenario: 1 million registered users, each cached profile averaging 2KB (so caching everything would take roughly 2GB). Only about 150,000 users are active in any given hour—that is the working set.

Calculation: 150,000 active users × 2KB per profile = 300MB.
With 10% buffer for metadata and fragmentation: 330MB
This is much smaller than caching all 1M users (2GB) because only active users need caching.
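The sizing arithmetic generalizes into a small helper. Parameter names are assumptions for illustration:

```typescript
// Cache size estimate: working set × average object size, plus an
// overhead buffer for metadata and memory fragmentation.
function cacheSizeMB(
  activeItems: number,
  avgObjectKB: number,
  overheadFraction = 0.1
): number {
  const rawMB = (activeItems * avgObjectKB) / 1000;
  return Math.round(rawMB * (1 + overheadFraction));
}

// 150,000 active users × 2KB = 300MB, +10% buffer = 330MB
// versus 1,000,000 users × 2KB = 2,000MB for the full dataset
```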
Monitoring cache efficiency:
Track these metrics continuously: hit rate (hits divided by total lookups), eviction rate (items evicted before expiry signal an undersized cache), memory usage against the configured limit, and cache operation latency (p99 spikes often reveal network issues or oversized values).
Most real-world access patterns follow a Zipf-like power law: roughly 20% of items receive 80% of traffic. This means a cache sized for 20% of data can achieve around 80% hit rate. Understand your access pattern before sizing. Uniform random access is the worst case; real traffic is usually much more skewed—and therefore more cache-friendly.
Caches introduce security considerations that are easy to overlook. Since caches are designed for performance, security features may be minimal. Understanding and mitigating cache-related attacks is essential.
Attack Vector 1: Cache Poisoning
An attacker injects malicious content into the cache, which is then served to legitimate users.
Example: If cache keys include user-controlled data (like a URL parameter) without proper validation, an attacker can pollute caches with malicious content.
Mitigation: normalize and validate every input that participates in a cache key; include in the key only the headers and parameters the response actually varies on (and configure CDN Vary behavior to match); and never cache responses built from unvalidated user input.
Attack Vector 2: Cache Timing Attacks
An attacker measures response times to determine if data is cached, revealing information about other users' access patterns.
Mitigation: partition caches by user or tenant so one user's lookups cannot be probed through another's entries, and avoid caching data whose mere presence in the cache is sensitive. Normalizing response times is possible but rarely practical; isolation is the stronger defense.
Security best practices:
Network security: Redis should not be exposed to the internet. Use VPCs, firewalls, and Redis AUTH.
Encryption in transit: Use TLS for connections to cache, especially in cloud environments.
Access control: Use separate cache instances or key prefixes for different trust levels.
Audit logging: Log cache operations for security forensics.
Sensitive data handling: Encrypt sensitive data before caching, or don't cache it at all. Authentication tokens, payment information, and personal data require special care.
Cache key hygiene: Never use user-controlled values directly in cache keys. Hash or sanitize inputs.
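Hashing is the simplest robust sanitizer for user-controlled key components. A minimal sketch using Node's built-in crypto module (the function name and namespace scheme are assumptions):

```typescript
import { createHash } from "node:crypto";

// Hash user-controlled input before it becomes part of a cache key,
// so attacker-chosen strings cannot inject delimiters, overflow key
// length limits, or collide with other namespaces.
function safeCacheKey(namespace: string, userInput: string): string {
  const digest = createHash("sha256").update(userInput).digest("hex");
  return `${namespace}:${digest}`;
}
```

The cost is that keys are no longer human-readable in debugging tools; some teams log the input-to-digest mapping separately to keep keys traceable.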
Redis has historically been designed for trusted environments and has minimal security by default. Always enable AUTH, bind to internal interfaces only, use TLS, and disable dangerous commands (FLUSHDB, CONFIG, KEYS) in production. Never expose Redis to the internet without these protections.
Caching is one of the most powerful tools in the scaling engineer's toolkit. Let's consolidate the key learnings:
- Cache in layers—browser, CDN, application, distributed—and work from the outside in.
- Cache-aside with well-chosen TTLs and disciplined key design is the right default pattern.
- Combine invalidation strategies: TTL as a safety net, events for real-time updates, versioned keys for bulk invalidation.
- Protect hot keys from stampedes with locks, probabilistic early refresh, or stale-while-revalidate.
- On writes, prefer deleting cache keys to updating them.
- Size for the working set, not the full dataset; real traffic is skewed and therefore cache-friendly.
- Treat the cache as part of the security perimeter: lock down Redis, sanitize keys, and handle sensitive data with care.
What's next:
With caching understood, we turn to queue-based decoupling—another essential scaling pattern. Queues decouple producers from consumers, enabling asynchronous processing, traffic shaping, and resilience to transient failures. The next page explores message queue architectures, processing patterns, and when to introduce queues into your system.
You now have a comprehensive understanding of caching as a scaling strategy—from cache hierarchies through invalidation strategies, stampede prevention, and security considerations. Caching is often the highest-leverage optimization available, and this knowledge enables you to apply it effectively.