In computer science, we often face a fundamental trade-off: time vs space. Caching and memoization represent one of the most powerful applications of this trade-off—using extra memory to avoid redundant computation.
The core idea is elegant:
If you've computed something before, don't compute it again—just look it up.
This pattern appears everywhere: CPU caches, browser and CDN caches, database query caches, and memoized function calls in application code.
At the heart of all these techniques is the hash table—providing O(1) lookup for cached values.
By the end of this page, you will understand memoization as a fundamental optimization technique: how to implement it manually and with utilities, when to apply it, what its limitations are, and how it connects to dynamic programming. You'll also learn about cache eviction policies and practical caching considerations.
Memoization is a specific form of caching where we store the results of function calls. When the function is called again with the same arguments, we return the cached result instead of recomputing.
Requirements for Memoization:
Pure Function: The function must be pure—given the same inputs, it always produces the same output (referential transparency). Side effects break memoization.
Hashable Arguments: Arguments must be usable as cache keys. For complex objects, this requires serialization.
Worth Caching: The computation must be expensive enough to justify the memory overhead.
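To make the hashable-arguments requirement concrete, here is a minimal sketch (the `areaByRef` and `areaByValue` names are illustrative, not from the original). A JavaScript `Map` compares object keys by reference, so two structurally equal objects do not share a cache entry until the arguments are serialized into a string key.

```typescript
// A Map compares object keys by reference, not by value,
// so a naive object-keyed cache misses on structurally equal inputs.
const refCache = new Map<object, number>();

function areaByRef(rect: { w: number; h: number }): number {
  const hit = refCache.get(rect);
  if (hit !== undefined) return hit;
  const result = rect.w * rect.h;
  refCache.set(rect, result);
  return result;
}

areaByRef({ w: 2, h: 3 });
areaByRef({ w: 2, h: 3 });  // structurally equal, but a different object: miss
console.log(refCache.size); // 2 — both literals were cached separately

// Serializing the arguments into a string key restores value-based lookup
const strCache = new Map<string, number>();

function areaByValue(rect: { w: number; h: number }): number {
  const key = JSON.stringify(rect);
  const hit = strCache.get(key);
  if (hit !== undefined) return hit;
  const result = rect.w * rect.h;
  strCache.set(key, result);
  return result;
}

areaByValue({ w: 2, h: 3 });
areaByValue({ w: 2, h: 3 }); // cache hit
console.log(strCache.size);  // 1 — one entry for both structurally equal inputs
```

One caveat: `JSON.stringify` output depends on property order, so for general objects a key function that sorts properties first is safer.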
The Canonical Example: Fibonacci
The naive recursive Fibonacci implementation has exponential time complexity due to redundant computation:
```typescript
/**
 * Naive Fibonacci: Exponential Time O(2^n)
 *
 * Problem: fib(40) makes over 300 million recursive calls
 * Each call to fib(n) computes fib(n-1) and fib(n-2) from scratch
 */
function fibNaive(n: number): number {
  if (n <= 1) return n;
  return fibNaive(n - 1) + fibNaive(n - 2);
}

/**
 * Memoized Fibonacci: Linear Time O(n)
 *
 * Key insight: fib(k) is computed at most once for each k from 0 to n
 */
function fibMemoized(n: number, cache = new Map<number, number>()): number {
  // Check cache first
  if (cache.has(n)) {
    return cache.get(n)!;
  }

  // Base cases
  if (n <= 1) return n;

  // Compute and cache
  const result = fibMemoized(n - 1, cache) + fibMemoized(n - 2, cache);
  cache.set(n, result);
  return result;
}

// Performance comparison
console.time('naive');
// fibNaive(40); // ~1-2 seconds
console.timeEnd('naive');

console.time('memoized');
fibMemoized(40); // ~0.1 milliseconds
console.timeEnd('memoized');

// Call count visualization for n=5:
// Naive:    fib(5) → fib(4)×1 → fib(3)×2 → fib(2)×3 → fib(1)×5 → fib(0)×3
//           Total: 15 calls
// Memoized: fib(5) → fib(4)×1 → fib(3)×1 → fib(2)×1 → fib(1)×1 → fib(0)×1
//           Total: 6 calls (one per unique value)
```

For Fibonacci, memoization reduces complexity from O(2ⁿ) to O(n)—an exponential-to-linear improvement. At n=40, that's from over 300 million calls down to about 40. This pattern applies to ANY function with overlapping subproblems.
Rather than adding caching logic to every function, we can create a higher-order function that wraps any function with memoization. This is the decorator pattern applied to caching.
```typescript
/**
 * Generic memoization utility for single-argument functions
 */
function memoize<T, R>(fn: (arg: T) => R): (arg: T) => R {
  const cache = new Map<T, R>();

  return (arg: T): R => {
    if (cache.has(arg)) {
      return cache.get(arg)!;
    }
    const result = fn(arg);
    cache.set(arg, result);
    return result;
  };
}

// Example usage
const expensiveComputation = (n: number): number => {
  console.log(`Computing for ${n}...`);
  let result = 0;
  for (let i = 0; i < n * 1000000; i++) {
    result += i;
  }
  return result;
};

const memoizedComputation = memoize(expensiveComputation);

console.log(memoizedComputation(5));  // "Computing for 5..." (computed)
console.log(memoizedComputation(5));  // (cached, no log)
console.log(memoizedComputation(10)); // "Computing for 10..." (computed)
console.log(memoizedComputation(5));  // (cached, no log)

/**
 * Memoization for multi-argument functions
 * Key challenge: How to create cache keys from multiple arguments
 */
function memoizeMulti<T extends any[], R>(
  fn: (...args: T) => R,
  keyFn: (...args: T) => string = (...args) => JSON.stringify(args)
): (...args: T) => R {
  const cache = new Map<string, R>();

  return (...args: T): R => {
    const key = keyFn(...args);
    if (cache.has(key)) {
      return cache.get(key)!;
    }
    const result = fn(...args);
    cache.set(key, result);
    return result;
  };
}

// Example: Memoizing a function with multiple arguments
// Note: the recursive calls must go through the memoized wrapper;
// calling gridTraversal directly inside would bypass the cache.
const gridTraversal = (rows: number, cols: number): number => {
  if (rows === 1 || cols === 1) return 1;
  return memoizedGrid(rows - 1, cols) + memoizedGrid(rows, cols - 1);
};

const memoizedGrid = memoizeMulti(gridTraversal);
console.log(memoizedGrid(10, 10)); // Fast: each (rows, cols) pair computed once
```

Common pitfalls:

- Non-hashable keys: lists, dicts, and custom objects need serialization
- Memory growth: unbounded caches can consume all memory
- Stale data: if underlying data changes, cached results become invalid
- Side effects: memoizing impure functions causes bugs
Dynamic Programming (DP) is memoization applied systematically. The two approaches are equivalent but differ in direction:
Top-Down (Memoization): Start from the original problem and recurse toward the base cases, caching each subproblem's result the first time it is computed.
Bottom-Up (Tabulation): Start from the base cases and iteratively fill a table of subproblem results until the original problem is answered, with no recursion.
Both achieve the same asymptotic complexity, but each has advantages: top-down computes only the subproblems actually needed and often mirrors the natural recursive definition, while bottom-up avoids recursion-depth limits and typically has lower constant-factor overhead.
```typescript
/**
 * Coin Change Problem: Minimum coins to make amount
 * Classic DP problem demonstrating both approaches
 */

// Top-Down (Memoization)
function coinChangeTopDown(coins: number[], amount: number): number {
  const cache = new Map<number, number>();

  function dp(remaining: number): number {
    // Base cases
    if (remaining === 0) return 0;
    if (remaining < 0) return Infinity;

    // Check cache
    if (cache.has(remaining)) {
      return cache.get(remaining)!;
    }

    // Try each coin, take minimum
    let minCoins = Infinity;
    for (const coin of coins) {
      const result = dp(remaining - coin);
      minCoins = Math.min(minCoins, result + 1);
    }

    cache.set(remaining, minCoins);
    return minCoins;
  }

  const result = dp(amount);
  return result === Infinity ? -1 : result;
}

// Bottom-Up (Tabulation)
function coinChangeBottomUp(coins: number[], amount: number): number {
  // dp[i] = minimum coins to make amount i
  const dp = new Array(amount + 1).fill(Infinity);
  dp[0] = 0; // Base case: 0 coins for amount 0

  // Fill table from smallest to largest
  for (let i = 1; i <= amount; i++) {
    for (const coin of coins) {
      if (coin <= i) {
        dp[i] = Math.min(dp[i], dp[i - coin] + 1);
      }
    }
  }

  return dp[amount] === Infinity ? -1 : dp[amount];
}

// Both produce same results
console.log(coinChangeTopDown([1, 2, 5], 11));  // 3 (5+5+1)
console.log(coinChangeBottomUp([1, 2, 5], 11)); // 3
```

Real-world caches can't grow indefinitely. When the cache is full, we need a cache eviction policy to decide what to remove.
Common Eviction Policies:

- LRU (Least Recently Used): evict the entry accessed longest ago
- LFU (Least Frequently Used): evict the entry with the fewest accesses
- FIFO (First In, First Out): evict the oldest entry, regardless of access
- TTL (Time To Live): evict entries after a fixed expiration time
- Random: evict an arbitrary entry; simple, and sometimes good enough
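To see how simple an eviction policy can be, here is a minimal FIFO sketch (a hypothetical `FIFOCache` class, not from the original text). It relies on the fact that a JavaScript `Map` iterates keys in insertion order, so the first key returned by `keys()` is always the oldest entry.

```typescript
// Minimal FIFO cache sketch (hypothetical `FIFOCache`, for contrast with LRU).
// A Map iterates keys in insertion order, so keys().next() is the oldest entry.
class FIFOCache<K, V> {
  private map = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    // Unlike LRU, reading an entry does NOT protect it from eviction
    return this.map.get(key);
  }

  set(key: K, value: V): void {
    if (!this.map.has(key) && this.map.size >= this.capacity) {
      // Evict the oldest (first-inserted) entry
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

// Usage: capacity 2; inserting a third key evicts the first-inserted one
const fifo = new FIFOCache<number, string>(2);
fifo.set(1, "a");
fifo.set(2, "b");
fifo.get(1);               // reads are irrelevant to FIFO ordering
fifo.set(3, "c");          // evicts key 1
console.log(fifo.get(1));  // undefined
console.log(fifo.get(2));  // "b"
```

FIFO needs no extra bookkeeping, but it can evict hot entries; that is exactly the weakness LRU addresses by tracking access order.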
LRU Cache Implementation:
LRU is the most common policy. It requires O(1) access AND O(1) eviction, which is achieved using a hash map + doubly linked list:
```typescript
/**
 * LRU Cache Implementation
 *
 * Data Structures:
 * - Map: key → Node (O(1) lookup)
 * - Doubly Linked List: maintains access order (O(1) move/remove)
 *
 * Most recently used at tail, least recently used at head
 */
class LRUNode<K, V> {
  key: K;
  value: V;
  prev: LRUNode<K, V> | null = null;
  next: LRUNode<K, V> | null = null;

  constructor(key: K, value: V) {
    this.key = key;
    this.value = value;
  }
}

class LRUCache<K, V> {
  private capacity: number;
  private cache: Map<K, LRUNode<K, V>>;
  private head: LRUNode<K, V>; // Dummy head (oldest)
  private tail: LRUNode<K, V>; // Dummy tail (newest)

  constructor(capacity: number) {
    this.capacity = capacity;
    this.cache = new Map();

    // Initialize dummy nodes
    this.head = new LRUNode<K, V>(null as any, null as any);
    this.tail = new LRUNode<K, V>(null as any, null as any);
    this.head.next = this.tail;
    this.tail.prev = this.head;
  }

  private addToTail(node: LRUNode<K, V>): void {
    node.prev = this.tail.prev;
    node.next = this.tail;
    this.tail.prev!.next = node;
    this.tail.prev = node;
  }

  private removeNode(node: LRUNode<K, V>): void {
    node.prev!.next = node.next;
    node.next!.prev = node.prev;
  }

  private moveToTail(node: LRUNode<K, V>): void {
    this.removeNode(node);
    this.addToTail(node);
  }

  get(key: K): V | undefined {
    const node = this.cache.get(key);
    if (!node) return undefined;

    // Mark as recently used
    this.moveToTail(node);
    return node.value;
  }

  put(key: K, value: V): void {
    const existingNode = this.cache.get(key);

    if (existingNode) {
      // Update existing
      existingNode.value = value;
      this.moveToTail(existingNode);
    } else {
      // Add new
      const newNode = new LRUNode(key, value);
      this.cache.set(key, newNode);
      this.addToTail(newNode);

      // Evict if over capacity
      if (this.cache.size > this.capacity) {
        const lru = this.head.next!;
        this.removeNode(lru);
        this.cache.delete(lru.key);
      }
    }
  }
}

// Example usage
const cache = new LRUCache<number, string>(3);
cache.put(1, "one");
cache.put(2, "two");
cache.put(3, "three");
console.log(cache.get(1)); // "one" (1 becomes most recent)
cache.put(4, "four");      // Evicts 2 (least recent)
console.log(cache.get(2)); // undefined (evicted)
console.log(cache.get(3)); // "three"
```

Beyond algorithm optimization, caching appears throughout software systems. Understanding these patterns helps you apply hashing concepts to real-world problems.
Application 1: API Response Caching
Cache expensive API calls to reduce latency and server load:
```typescript
/**
 * API Response Cache with TTL (Time To Live)
 */
interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

class TTLCache<T> {
  private cache = new Map<string, CacheEntry<T>>();
  private defaultTTL: number; // milliseconds

  constructor(defaultTTL: number = 60000) { // Default 1 minute
    this.defaultTTL = defaultTTL;
  }

  get(key: string): T | undefined {
    const entry = this.cache.get(key);
    if (!entry) return undefined;

    if (Date.now() > entry.expiresAt) {
      this.cache.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, ttl?: number): void {
    this.cache.set(key, {
      value,
      expiresAt: Date.now() + (ttl ?? this.defaultTTL)
    });
  }

  // Periodic cleanup of expired entries
  cleanup(): void {
    const now = Date.now();
    for (const [key, entry] of this.cache) {
      if (now > entry.expiresAt) {
        this.cache.delete(key);
      }
    }
  }
}

// Example: Caching API responses
const apiCache = new TTLCache<any>(30000); // 30 second TTL

async function fetchWithCache(url: string): Promise<any> {
  // Check cache first
  const cached = apiCache.get(url);
  if (cached) {
    console.log('Cache hit:', url);
    return cached;
  }

  // Fetch and cache
  console.log('Cache miss, fetching:', url);
  const response = await fetch(url);
  const data = await response.json();
  apiCache.set(url, data);
  return data;
}
```

Application 2: Computed Property Caching
Cache expensive property computations in objects:
```python
from functools import cached_property
from typing import List
import hashlib

class DatasetAnalyzer:
    """
    Class demonstrating computed property caching.
    Expensive computations are cached after first access.
    """

    def __init__(self, data: List[float]):
        self._data = data

    @cached_property
    def mean(self) -> float:
        """Computed once, then cached."""
        print("Computing mean...")
        return sum(self._data) / len(self._data)

    @cached_property
    def variance(self) -> float:
        """Uses cached mean."""
        print("Computing variance...")
        mean = self.mean  # Uses cached value
        return sum((x - mean) ** 2 for x in self._data) / len(self._data)

    @cached_property
    def checksum(self) -> str:
        """Expensive hash computation."""
        print("Computing checksum...")
        data_str = ','.join(map(str, self._data))
        return hashlib.sha256(data_str.encode()).hexdigest()

# Usage
analyzer = DatasetAnalyzer([1.0, 2.0, 3.0, 4.0, 5.0])

# First access computes
print(analyzer.mean)      # "Computing mean..." → 3.0
print(analyzer.variance)  # "Computing variance..." (mean already cached) → 2.0

# Subsequent access returns cached
print(analyzer.mean)      # 3.0 (no computation)
print(analyzer.variance)  # 2.0 (no computation)

# Manual cache invalidation pattern
class MutableDataset:
    """When data can change, need manual cache management."""

    def __init__(self, data: List[float]):
        self._data = data
        self._cache = {}

    @property
    def mean(self) -> float:
        if 'mean' not in self._cache:
            self._cache['mean'] = sum(self._data) / len(self._data)
        return self._cache['mean']

    def add_point(self, value: float) -> None:
        self._data.append(value)
        self._cache.clear()  # Invalidate cache
```

Application 3: Request Deduplication
Prevent duplicate in-flight requests:
```typescript
/**
 * Request deduplication: If same request is in-flight, return that promise
 * instead of making a duplicate request.
 */
class RequestDeduplicator {
  private inFlight = new Map<string, Promise<any>>();

  async fetch(url: string): Promise<any> {
    // If request is already in-flight, return that promise
    if (this.inFlight.has(url)) {
      console.log(`Deduplicating request to ${url}`);
      return this.inFlight.get(url);
    }

    // Start new request
    const promise = fetch(url)
      .then(r => r.json())
      .finally(() => {
        // Remove from in-flight when done
        this.inFlight.delete(url);
      });

    this.inFlight.set(url, promise);
    return promise;
  }
}

// Usage: Multiple simultaneous calls get same response
const dedup = new RequestDeduplicator();

// These all share ONE network request
Promise.all([
  dedup.fetch('/api/data'),
  dedup.fetch('/api/data'),
  dedup.fetch('/api/data'),
]).then(results => {
  // All three get same data from single request
  console.log(results);
});
```

Memoization isn't free. Understanding when it hurts more than it helps is crucial.
```typescript
// ❌ ANTI-PATTERN: Memoizing cheap operations
const addBad = memoizeMulti((a: number, b: number) => a + b);
// Cache lookup is comparable cost to addition!

// ❌ ANTI-PATTERN: Memoizing with unique inputs
const formatDateBad = memoize((timestamp: number) =>
  new Date(timestamp).toISOString()
);
// If every timestamp is unique, cache never hits

// ❌ ANTI-PATTERN: Memoizing impure functions
let counter = 0;
const badMemo = memoize((n: number) => {
  counter++; // Side effect!
  return n * 2;
});
badMemo(5); // counter = 1
badMemo(5); // counter = 1 (cached)
badMemo(5); // counter = 1 (cached)
// Caller might expect counter to be 3!

// ❌ ANTI-PATTERN: Unbounded cache with many unique keys
// (HugeObject and expensiveProcess are illustrative placeholders)
const processingCache = new Map<string, any>();
async function processData(data: HugeObject) {
  const key = JSON.stringify(data); // Expensive serialization!
  if (processingCache.has(key)) {
    return processingCache.get(key);
  }
  const result = expensiveProcess(data);
  processingCache.set(key, result); // Memory leak: cache grows indefinitely!
  return result;
}

// ✅ CORRECT: Bounded cache with eviction
const boundedCache = new LRUCache<string, any>(1000);
```

Before memoizing, ask:

- Is the function pure (same inputs always produce the same output)?
- Is the computation expensive enough to outweigh the cache lookup?
- Will the function actually be called again with the same arguments?
- Can you bound the cache's memory growth, or accept that it is unbounded?
If any answer is 'no', reconsider memoization.
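One way to answer the "will inputs actually repeat?" question empirically is to instrument the cache. Here is a sketch of a hypothetical `memoizeWithStats` wrapper (not part of the original utilities) that counts hits and misses so a hit rate can be measured on a representative workload.

```typescript
// Sketch: a memoize wrapper that tracks cache hits and misses,
// so the decision checklist above can be checked with real data.
function memoizeWithStats<T, R>(fn: (arg: T) => R) {
  const cache = new Map<T, R>();
  const stats = { hits: 0, misses: 0 };

  const wrapped = (arg: T): R => {
    if (cache.has(arg)) {
      stats.hits++;
      return cache.get(arg)!;
    }
    stats.misses++;
    const result = fn(arg);
    cache.set(arg, result);
    return result;
  };

  return { wrapped, stats };
}

// Usage: measure the hit rate on a sample workload
const { wrapped: square, stats } = memoizeWithStats((n: number) => n * n);
[2, 3, 2, 2, 3].forEach(n => square(n));
console.log(stats); // { hits: 3, misses: 2 } — a 60% hit rate
```

A near-zero hit rate on realistic inputs is a strong signal that memoization is pure overhead for that function.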
Caching and memoization are fundamental techniques for trading space for time. Understanding when and how to apply them is essential for building efficient systems.
| Pattern | Use Case | Key Consideration |
|---|---|---|
| Simple Memoization | Recursive functions | Pure functions only |
| LRU Cache | Bounded memory | Capacity tuning |
| TTL Cache | Time-sensitive data | Staleness tolerance |
| Request Deduplication | Concurrent requests | Promise handling |
| Computed Properties | Expensive derivations | Invalidation on change |
Congratulations! You have now mastered the common hashing patterns: frequency counting, two-sum/pair problems, duplicate detection, grouping by characteristic, and caching/memoization. These patterns form the foundation for solving countless problems efficiently. With hash tables in your toolkit, you can transform O(n²) algorithms into O(n) solutions and build systems that scale gracefully.