If range query structures are surgical instruments for slicing through data, caching structures are memory banks for avoiding work entirely. A well-designed cache doesn't just speed up individual operations—it fundamentally changes the economics of your system by trading space for time, turning expensive repeated computations into near-instantaneous retrievals.
But caching is far from a silver bullet. Every cache decision involves tradeoffs: memory consumption, staleness of data, complexity of invalidation, and the overhead of cache management itself. Knowing when to cache, what eviction policy to use, and how to design the cache interface separates thoughtful system design from naive optimization attempts.
This page exhaustively covers cache design considerations—from the fundamental question of whether you need a cache at all, through eviction policy selection, to the intricate details of cache sizing and invalidation strategies.
By the end of this page, you will understand when caching is beneficial versus when it adds unnecessary complexity, master the tradeoffs between different eviction policies (LRU, LFU, FIFO, etc.), learn principles for cache sizing and hit rate optimization, and develop intuition for cache invalidation strategies.
The first—and most important—cache design decision is whether to cache at all. Caching introduces complexity: additional code, potential consistency bugs, memory overhead, and the perpetual question of 'is my cache serving stale data?' Before adding a cache, rigorously evaluate whether it's truly necessary.
Developers often assume caching will help without measuring. A cache with a 10% hit rate and 5ms cache overhead on every request can actually SLOW DOWN your system compared to no cache. Always measure your hit rate and ensure cache hits significantly outnumber misses before deploying.
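To see why, it helps to compare expected latency with and without the cache. The sketch below works through the 10% hit rate / 5ms overhead example; the 20ms backend cost is an illustrative assumption, not a measured figure.

```typescript
// Back-of-the-envelope check: does the cache actually pay for itself?
// Numbers are illustrative assumptions -- measure your own system.
function expectedLatencyWithCache(
  hitRate: number,          // fraction of requests served from cache (0..1)
  cacheOverheadMs: number,  // cost of the cache lookup itself, paid on every request
  backendMs: number         // cost of the underlying fetch or computation
): number {
  // Every request pays the lookup; only misses also pay the backend call.
  return cacheOverheadMs + (1 - hitRate) * backendMs;
}

// 10% hit rate, 5ms cache overhead, assumed 20ms backend call:
const withCache = expectedLatencyWithCache(0.1, 5, 20); // 5 + 0.9 * 20 = 23ms
const withoutCache = 20;                                // every request hits the backend
console.log(withCache > withoutCache); // true -- this cache makes requests slower
```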
When a cache reaches capacity, an eviction policy determines which existing item to remove to make room for new entries. The choice of eviction policy dramatically affects cache performance—the difference between a well-tuned policy and a poor one can be a 2-3x difference in hit rate.
| Policy | Evicts | Pros | Cons | Best For |
|---|---|---|---|---|
| LRU (Least Recently Used) | Item not accessed for longest time | Simple, good for temporal locality | Doesn't consider frequency | General purpose, web caches |
| LFU (Least Frequently Used) | Item with lowest access count | Great for popularity-based access | Slow to adapt to changing patterns | CDN, static asset caches |
| FIFO (First In First Out) | Oldest item in cache | Extremely simple | Ignores access patterns entirely | Bounded buffers, simple cases |
| Random | Random item | O(1), no tracking overhead | Suboptimal hit rate | When simplicity trumps all |
| LRU-K | Least recent among K-th accesses | Better than LRU for scans | More complex to implement | Database buffer pools |
| ARC (Adaptive Replacement) | Adapts between recency and frequency | Self-tuning, robust | Complex implementation | Operating system caches |
| SLRU (Segmented LRU) | Probationary items first | Protects frequently used items | Two-tier complexity | CPU caches, database caches |
| TTL (Time-To-Live) | Items older than threshold | Guarantees freshness | May evict still-useful items | API caches, session stores |
Deep Dive: LRU vs LFU
The LRU vs LFU decision is the most common policy choice engineers face. Let's analyze when each excels:
LRU (Least Recently Used): excels when access patterns have strong temporal locality, where something accessed recently is likely to be accessed again soon, as in web caches and user sessions. Its weakness is that it ignores frequency: a single large scan of cold data can flush genuinely hot items out of the cache.
LFU (Least Frequently Used): excels when popularity is stable over time, as with CDN and static asset caches where a small set of items absorbs most of the traffic. Its weakness is inertia: items with a large historical access count linger even after demand shifts, so it adapts slowly to changing patterns.
The Hybrid Approach: Many production systems use hybrid policies. For example, SLRU keeps new entries in a probationary segment until a repeat access promotes them, protecting frequently used items from one-off reads, while ARC adaptively balances recency against frequency without manual tuning.
LRU is the right choice about 80% of the time for general-purpose caching. Unless you have specific evidence that your access patterns favor frequency over recency, start with LRU. You can always optimize the policy later based on observed hit rates.
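As a concrete starting point, here is a minimal LRU cache sketch built on JavaScript's `Map`, which preserves insertion order. The class name and get/put interface are illustrative assumptions, not a specific library's API.

```typescript
// Minimal LRU cache sketch using Map's insertion-order guarantee.
// Each access re-inserts the key, moving it to the "most recent" end;
// when capacity is exceeded, the first key in iteration order is the LRU victim.
class LRUCache<K, V> {
  private entries = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key)!;
    this.entries.delete(key);      // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  put(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as K; // least recently used
      this.entries.delete(oldest);
    }
  }
}
```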
Choosing the right cache size is a delicate balancing act. Too small, and hit rates suffer. Too large, and you waste memory that could be used elsewhere (or cause garbage collection issues). The optimal size depends on your access patterns, available memory, and acceptable miss rates.
The Working Set Concept:
The working set is the subset of data actively being accessed during a given time window. Ideally, your cache should be large enough to hold the entire working set: if it is, most accesses hit; if it is smaller, entries get evicted before they are reused and the hit rate collapses.
Estimating Working Set:
```typescript
// Simulate cache behavior to find optimal size
interface CacheSimulationResult {
  cacheSize: number;
  hitRate: number;
  memoryUsed: number;
}

function simulateCachePerformance(
  accessLog: string[],           // Sequence of accessed keys
  cacheSizesToTest: number[],    // Different sizes to evaluate
  evictionPolicy: 'lru' | 'lfu'  // Policy to test
): CacheSimulationResult[] {
  return cacheSizesToTest.map(size => {
    const cache = createCache(size, evictionPolicy);
    let hits = 0;
    let total = 0;

    for (const key of accessLog) {
      total++;
      if (cache.get(key) !== undefined) {
        hits++;
      } else {
        cache.put(key, fetchValue(key));
      }
    }

    return {
      cacheSize: size,
      hitRate: hits / total,
      memoryUsed: size * estimatedItemSize
    };
  });
}

// Example output analysis:
// Size 100:  Hit rate 45%, Memory 1MB
// Size 500:  Hit rate 72%, Memory 5MB   ← Diminishing returns start here
// Size 1000: Hit rate 78%, Memory 10MB
// Size 5000: Hit rate 81%, Memory 50MB  ← Not worth 5x memory for 3% gain

// Choose size 500: Best balance of hit rate and memory
```

Cache hit rates often follow a cliff pattern: increasing size rapidly improves hit rate up to a point (the working set size), then additional size yields diminishing returns. Find the 'knee' of this curve—that's your optimal cache size.
Memory Budget Allocation:
Caches don't exist in isolation. Consider the memory budget across your entire system:
| Component | Typical Allocation | Notes |
|---|---|---|
| Application heap | 40-60% | Core business logic |
| Caches | 20-40% | Split across multiple caches |
| Buffers & I/O | 10-20% | Network, file I/O |
| GC headroom | 10-20% | For languages with GC |
Multiple Caches: Most systems have multiple caches (query cache, session cache, computed result cache). Each needs its own sizing strategy: size each cache against its own working set and hit-rate curve rather than splitting memory evenly, and revisit the split as access patterns change (see the sketch below).
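One way to make the split explicit is to allocate a total cache budget by weight. The budget, cache names, and weights below are hypothetical; in practice the weights would come from measured hit-rate curves.

```typescript
// Hypothetical split of a 2 GB cache budget across several caches.
// The weights are assumptions; derive real ones from simulated hit-rate curves.
const totalCacheBudgetBytes = 2 * 1024 ** 3;

const cacheWeights: Record<string, number> = {
  queryCache: 0.5,          // largest working set in this hypothetical system
  sessionCache: 0.3,
  computedResultCache: 0.2,
};

const cacheBudgets = Object.fromEntries(
  Object.entries(cacheWeights).map(([name, weight]) => [
    name,
    Math.floor(totalCacheBudgetBytes * weight),
  ])
);

console.log(cacheBudgets); // bytes allocated to each cache
```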
Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." Cache invalidation—ensuring cached data doesn't become stale—is indeed one of the most challenging aspects of cache design.
The Invalidation Dilemma:
Every invalidation strategy involves tradeoffs:
Too Aggressive (Frequent Invalidation): entries are purged before they can be reused, so hit rates fall and the backend absorbs the load the cache was supposed to shield it from; you pay the complexity of caching for little of the benefit.
Too Conservative (Rare Invalidation): hit rates stay high, but the window during which the cache disagrees with the source of truth grows, and users increasingly see stale data.
Finding the Balance:
The right approach depends on your staleness tolerance:
| Staleness Tolerance | Invalidation Approach | Example |
|---|---|---|
| Milliseconds | Write-through + synchronous invalidation | Financial transactions |
| Seconds | Event-driven invalidation | Social media feeds |
| Minutes | Short TTL + event backup | Product catalogs |
| Hours | Moderate TTL | CDN cached static assets |
| Days | Long TTL, manual refresh | Documentation caches |
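For the TTL-based rows above, here is a minimal sketch of TTL enforcement on read. The entry shape and lazy-eviction approach are assumptions for illustration.

```typescript
// Minimal TTL check on read: an entry older than its ttl is treated as a miss.
interface TtlEntry<V> {
  value: V;
  storedAtMs: number; // Date.now() at write time
  ttlMs: number;
}

function getFresh<V>(store: Map<string, TtlEntry<V>>, key: string): V | undefined {
  const entry = store.get(key);
  if (!entry) return undefined;
  if (Date.now() - entry.storedAtMs > entry.ttlMs) {
    store.delete(key); // expired: evict lazily and report a miss
    return undefined;
  }
  return entry.value;
}
```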
One of the most common production bugs: forgetting to invalidate a cache when data changes. A new code path updates the database but doesn't touch the cache. Users see stale data. Always document what invalidates each cache entry, and consider making invalidation automatic (e.g., decorators/annotations on write operations).
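A hedged sketch of what such automatic invalidation might look like: a wrapper that runs the write, then invalidates the affected keys. The wrapper name, `database.saveUser`, and the `cache.invalidate` call are assumptions, mirroring the repository example later on this page.

```typescript
// Wrap a write operation so the relevant cache keys are always invalidated
// after a successful write. Names here are illustrative assumptions.
function invalidates<T>(
  writeFn: (...args: any[]) => Promise<T>,
  keysFor: (...args: any[]) => string[]   // which cache entries this write affects
): (...args: any[]) => Promise<T> {
  return async (...args) => {
    const result = await writeFn(...args);  // perform the write first
    await Promise.all(keysFor(...args).map(key => cache.invalidate(key))); // then invalidate
    return result;
  };
}

// Usage: updating a user always invalidates that user's cache entry.
const updateUser = invalidates(
  (user: User) => database.saveUser(user),   // assumed write helper
  (user: User) => [`user:${user.id}`]
);
```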
Beyond choosing eviction policies and invalidation strategies, the implementation pattern you select affects code maintainability, testability, and the potential for bugs.
Pattern 1: Inline Caching (Anti-Pattern for Complex Cases)
```typescript
function getUser(userId: string): User {
  // Cache check inline with business logic
  const cached = cache.get(`user:${userId}`);
  if (cached) return cached;
  const user = database.fetchUser(userId);
  cache.put(`user:${userId}`, user, { ttl: 3600 });
  return user;
}
```
Problems: caching logic is interleaved with business logic, the key format and TTL are duplicated in every function that needs caching, the code is harder to unit test without a live cache, and it is easy to forget the matching invalidation on the write path.
Pattern 2: Decorator/Wrapper Pattern (Recommended)
```typescript
function cached<T>(
  fn: (...args: any[]) => Promise<T>,
  keyGenerator: (...args: any[]) => string,
  options: { ttl: number }
): (...args: any[]) => Promise<T> {
  return async (...args) => {
    const key = keyGenerator(...args);
    const cached = await cache.get(key);
    if (cached !== undefined) return cached;
    const result = await fn(...args);
    await cache.put(key, result, options);
    return result;
  };
}

// Usage
const getUserCached = cached(
  (userId: string) => database.fetchUser(userId),
  (userId) => `user:${userId}`,
  { ttl: 3600 }
);
```
Benefits: caching concerns live in one reusable helper, key generation and TTL policy are declared alongside the function they cache, the underlying function stays pure and testable, and swapping or disabling the cache touches a single call site.
Pattern 3: Repository Pattern with Caching Layer
For complex domain models, implement caching as a repository decorator:
```typescript
interface UserRepository {
  findById(id: string): Promise<User>;
  save(user: User): Promise<void>;
}

class DatabaseUserRepository implements UserRepository {
  async findById(id: string): Promise<User> {
    return this.db.query('SELECT * FROM users WHERE id = ?', [id]);
  }
  async save(user: User): Promise<void> {
    await this.db.query('UPDATE users SET ... WHERE id = ?', [user]);
  }
}

class CachedUserRepository implements UserRepository {
  constructor(
    private underlying: UserRepository,
    private cache: Cache<User>
  ) {}
  async findById(id: string): Promise<User> {
    const cached = await this.cache.get(id);
    if (cached) return cached;
    const user = await this.underlying.findById(id);
    await this.cache.put(id, user);
    return user;
  }
  async save(user: User): Promise<void> {
    await this.underlying.save(user);
    await this.cache.invalidate(user.id); // Invalidation on write
  }
}
```
Benefits: callers depend only on the UserRepository interface and never know a cache exists, read caching and write invalidation live side by side so they cannot drift apart, and the cached and uncached implementations can be tested independently or swapped at composition time.
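Wiring it together might look like the following; `databaseUserRepository` and `userCache` are assumed to be constructed elsewhere (the sketch above elides how the database handle is injected).

```typescript
// Composition: the caching repository wraps the database-backed one.
// `databaseUserRepository` and `userCache` are assumptions, built elsewhere.
const users: UserRepository = new CachedUserRepository(databaseUserRepository, userCache);

const user = await users.findById('42'); // read-through: cache first, then database
await users.save(user);                  // write to the database, then invalidate the entry
```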
When a popular cache entry expires, hundreds of requests may simultaneously try to recompute it, overloading your backend. Solutions: (1) Lock during recomputation so only one request fetches, (2) Proactive refresh before expiration, (3) Stale-while-revalidate patterns. Always consider stampede prevention for high-traffic cache entries.
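A hedged sketch of option (1), often called request coalescing or "single flight": concurrent misses for the same key share one in-flight fetch instead of each hitting the backend. The function name and cache API here are illustrative assumptions.

```typescript
// Request coalescing: concurrent cache misses for the same key await a single
// in-flight fetch rather than each recomputing the value.
const inFlight = new Map<string, Promise<unknown>>();

async function getWithStampedeProtection<T>(
  key: string,
  fetchFn: () => Promise<T>
): Promise<T> {
  const hit = await cache.get(key);
  if (hit !== undefined) return hit as T;

  if (!inFlight.has(key)) {
    // First miss for this key starts the fetch; later misses reuse the same promise.
    const promise = fetchFn()
      .then(async value => {
        await cache.put(key, value, { ttl: 3600 });
        return value;
      })
      .finally(() => inFlight.delete(key)); // allow future refreshes
    inFlight.set(key, promise);
  }
  return inFlight.get(key) as Promise<T>;
}
```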
In multi-server deployments, caching introduces additional complexity. Each server could maintain its own local cache, leading to inconsistencies, or you could use a shared distributed cache, introducing network latency and failure modes.
Two-Tier Caching (The Production Standard):
Most production systems use a two-tier approach:
```
Request → Check L1 → Hit? Return
        → Miss? Check L2 → Hit? Populate L1, Return
                         → Miss? Fetch from DB, Populate L1+L2, Return
```
L1 Configuration: a small in-process cache (for example, an LRU map holding only the hottest entries) with a short TTL, trading some staleness and per-server duplication for nanosecond lookups.
L2 Configuration: a shared distributed cache such as Redis, sized closer to the full working set with longer TTLs, so an L1 miss on any server can usually be served without touching the database.
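A hedged sketch of the read path above; `localCache`, `distributedCache`, and `database` stand in for an in-process map, a distributed cache client, and the data store, and are assumptions rather than a specific library's API.

```typescript
// Two-tier read path: L1 (in-process) -> L2 (distributed) -> database.
// `localCache`, `distributedCache`, and `database` are assumed clients.
async function getUserTwoTier(userId: string): Promise<User> {
  const key = `user:${userId}`;

  const l1 = localCache.get(key);              // in-process lookup: ~nanoseconds, per server
  if (l1 !== undefined) return l1;

  const l2 = await distributedCache.get(key);  // shared cache: ~0.5-2ms over the network
  if (l2 !== undefined) {
    localCache.put(key, l2);                   // populate L1 on the way back up
    return l2;
  }

  const user = await database.fetchUser(userId);        // slowest path: the database
  localCache.put(key, user);
  await distributedCache.put(key, user, { ttl: 3600 }); // assumed put signature with TTL
  return user;
}
```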
A Redis call typically takes 0.5-2ms (network + processing). A local HashMap lookup takes 10-100 nanoseconds. That's a 10,000x difference! For hot data accessed millions of times per second, local caching is essential even if you also use a distributed cache.
Just as important as knowing when to cache is recognizing scenarios where caching would be counterproductive. Adding a cache in these situations wastes development time, increases complexity, and may even hurt performance.
The Decision Heuristic:
Before implementing a cache, answer these questions:
- Is the underlying data read far more often than it changes?
- Is the value expensive enough to fetch or compute that skipping that work matters?
- Will the expected hit rate be high enough to outweigh the cache's own lookup overhead?
- Can you tolerate some staleness, or define a workable invalidation strategy?
- Do you have the memory budget to hold a meaningful share of the working set?
If any answer is unfavorable, strongly reconsider whether caching is appropriate.
The Performance Testing Imperative:
Never assume caching helps. Implement it behind a feature flag, measure hit rates and latency with and without the cache, and only enable it if metrics prove it's beneficial:
```typescript
const cacheEnabled = featureFlags.get('user-cache-enabled');
const getUser = cacheEnabled
  ? cachedGetUser
  : directGetUser;
```
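To make that measurement concrete, here is a minimal sketch of hit/miss counters around the cached read path. The counter object and function names are assumptions; in production these figures would feed your monitoring system rather than a local variable.

```typescript
// Minimal hit-rate instrumentation around the cached read path.
const cacheStats = { hits: 0, misses: 0 };

async function getUserMeasured(userId: string): Promise<User> {
  const key = `user:${userId}`;
  const hit = await cache.get(key);
  if (hit !== undefined) {
    cacheStats.hits++;
    return hit;
  }
  cacheStats.misses++;
  const user = await database.fetchUser(userId);
  await cache.put(key, user, { ttl: 3600 });
  return user;
}

function hitRate(): number {
  const total = cacheStats.hits + cacheStats.misses;
  return total === 0 ? 0 : cacheStats.hits / total;
}
```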
Every cache you add is a cache you must maintain: monitoring, capacity planning, invalidation logic, debugging tools. The simplest system is one with no caches. Add caching only when you have evidence it's necessary, not as a premature optimization.
We've covered the complete landscape of cache design considerations. Let's synthesize the key principles:
- Cache only when measurements show it pays off; a low hit rate plus lookup overhead can make requests slower than no cache at all.
- Start with LRU unless you have evidence that frequency matters more than recency for your access patterns.
- Size the cache around the working set and stop at the knee of the hit-rate curve rather than buying marginal gains with large amounts of memory.
- Decide your staleness tolerance first, then choose the invalidation strategy that matches it, and document what invalidates each entry.
- Keep caching out of business logic with decorators or a caching repository layer, guard hot entries against stampedes, and use two-tier caching in multi-server deployments.
What's Next:
The final page of this module addresses the overarching question that underlies every decision we've discussed: how to balance the power of advanced data structures against their implementation complexity. We'll develop a framework for deciding when sophistication is warranted versus when simplicity wins.
You now have a comprehensive framework for cache design decisions. From eviction policies to invalidation strategies to distributed architectures, you can confidently design caching solutions that match your system's specific requirements.