When we add memoization to a recursive function, something profound happens to the underlying computation. What was once an exponentially exploding tree of recursive calls collapses into a compact directed acyclic graph (DAG) where each unique subproblem appears exactly once.
This transformation is the mathematical heart of dynamic programming. Understanding it deeply—not just procedurally—reveals why DP is so powerful and when it will (and won't) help.
On this page, we'll visualize this transformation, quantify the savings, understand the dependency structure that memoization exploits, and develop intuition for predicting when memoization will yield dramatic improvements.
You'll understand precisely how memoization eliminates redundant work—not just that it does, but the structural reasons why. You'll learn to visualize the transformation from call trees to DAGs, calculate the exact computational savings, and recognize the dependency patterns that make memoization effective.
To understand what memoization eliminates, we must first visualize what naive recursion creates. Every recursive call spawns a node in the call tree—a tree structure representing the execution of the algorithm.
For Fibonacci(5), the call tree looks like this:
```
                         fib(5)
                        /      \
                  fib(4)        fib(3)
                 /      \       /     \
            fib(3)    fib(2)  fib(2)  fib(1)
            /    \     /  \    /  \
       fib(2) fib(1) f(1) f(0) f(1) f(0)
        /  \
      f(1) f(0)
```
Observations about this tree:

- The tree contains 15 calls, but only 6 distinct subproblems: fib(0) through fib(5).
- fib(3) is computed 2 times, fib(2) 3 times, fib(1) 5 times, and fib(0) 3 times.
- The repetition compounds: every duplicated node drags its entire subtree of work along with it.
- As n grows, the number of calls grows exponentially while the number of unique subproblems grows only linearly.
The gap between exponential tree nodes and polynomial unique subproblems is precisely what memoization exploits. If we could ensure each unique subproblem is computed only once, we'd reduce exponential work to polynomial work.
For Fibonacci, the redundancy ratio (total calls / unique subproblems) grows exponentially. At n=30, naive recursion makes ~2.7 million calls for just 31 unique subproblems—a redundancy ratio of ~87,000! Memoization reduces this to exactly 31 computations.
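If you want to check these figures yourself, a short sketch along these lines will do it (the helper name `count_naive_calls` is ours, purely for illustration):

```python
def count_naive_calls(n: int) -> int:
    """Count how many calls naive fib(n) makes, without memoization."""
    if n <= 1:
        return 1  # the call itself
    return 1 + count_naive_calls(n - 1) + count_naive_calls(n - 2)

n = 30
total_calls = count_naive_calls(n)   # 2,692,537 for n = 30
unique = n + 1                       # fib(0) .. fib(30)
print(f"Naive calls:        {total_calls:,}")
print(f"Unique subproblems: {unique}")
print(f"Redundancy ratio:   {total_calls / unique:,.0f}x")
```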
When we add memoization, the call tree collapses into a directed acyclic graph (DAG). Instead of computing each subproblem multiple times, we compute it once and reuse the cached result.
The memoized Fibonacci(5) execution looks like this:
```
DAG of Subproblem Dependencies (each node appears exactly once):

fib(5) ──→ fib(4), fib(3)     (fib(3) reused from cache)
fib(4) ──→ fib(3), fib(2)     (fib(2) reused from cache)
fib(3) ──→ fib(2), fib(1)     (fib(1) reused from cache)
fib(2) ──→ fib(1), fib(0)     (both are base cases)

Values: fib(0)=0  fib(1)=1  fib(2)=1  fib(3)=2  fib(4)=3  fib(5)=5
```

Execution order: fib(1) → fib(0) → fib(2) → fib(3) → fib(4) → fib(5)

Each node is computed EXACTLY ONCE; every later request is a cache hit.
Properties of the DAG:
| Aspect | Naive Call Tree | Memoized DAG |
|---|---|---|
| Nodes | Exponential O(2^n) for Fibonacci | Linear O(n) for Fibonacci |
| Work per node | O(1) addition | O(1) addition |
| Total work | O(2^n) | O(n) |
| Duplicate computations | Massive (~87,000× at n=30) | Zero |
| Memory for results | None (recomputed each time) | O(n) for memo cache |
| Traversal | All children visited | Each child visited once |
Memoization trades space for time. We use O(n) extra memory to store cached results, but in return, we reduce time from O(2^n) to O(n). This is almost always an excellent trade—the space cost is tiny compared to the time savings.
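To see the trade concretely, here is a rough sketch (helper names and exact timings are illustrative; results vary by machine) that times both versions and reports the cache size:

```python
import time

def fib_naive(n: int) -> int:
    """Plain exponential recursion."""
    return n if n <= 1 else fib_naive(n - 1) + fib_naive(n - 2)

def fib_memo(n: int, memo: dict) -> int:
    """Same recursion with an explicit cache."""
    if n in memo:
        return memo[n]
    result = n if n <= 1 else fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    memo[n] = result
    return result

n = 30

start = time.perf_counter()
fib_naive(n)
naive_time = time.perf_counter() - start

memo = {}
start = time.perf_counter()
fib_memo(n, memo)
memo_time = time.perf_counter() - start

print(f"Naive:    {naive_time:.4f}s, no cache")
print(f"Memoized: {memo_time:.6f}s, cache holds {len(memo)} entries")
# The cache holds only n + 1 integers -- a tiny price for the speedup.
```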
Let's precisely analyze how much work memoization saves. The analysis rests on a key insight:
Time complexity of memoized recursion = (Number of unique subproblems) × (Work per subproblem)
This formula, called the subproblem counting method, gives us a precise way to analyze memoized algorithms.
"""Complexity Analysis of Memoized Algorithms========================================== Formula: Time = (# unique subproblems) × (work per subproblem) Let's analyze several examples:""" # -----------------------------# EXAMPLE 1: Fibonacci# -----------------------------# Subproblems: fib(0), fib(1), ..., fib(n) → (n + 1) subproblems# Work per subproblem: 2 additions (combine fib(n-1) + fib(n-2)) → O(1)# Total time: O(n) × O(1) = O(n)# Space: O(n) for memo + O(n) for call stack = O(n) def fib(n, memo={}): if n in memo: return memo[n] if n <= 1: return n memo[n] = fib(n-1, memo) + fib(n-2, memo) # O(1) work return memo[n] # -----------------------------# EXAMPLE 2: Grid Paths (m × n grid)# -----------------------------# Subproblems: paths(i, j) for 0 ≤ i < m, 0 ≤ j < n → (m × n) subproblems# Work per subproblem: 1 addition (combine from above + from left) → O(1)# Total time: O(m × n) × O(1) = O(m × n)# Space: O(m × n) for memo + O(m + n) for call stack = O(m × n) def grid_paths(m, n, memo={}): if (m, n) in memo: return memo[(m, n)] if m == 1 or n == 1: return 1 memo[(m, n)] = grid_paths(m-1, n, memo) + grid_paths(m, n-1, memo) return memo[(m, n)] # -----------------------------# EXAMPLE 3: Longest Common Subsequence (strings of length m, n)# -----------------------------# Subproblems: lcs(i, j) for 0 ≤ i ≤ m, 0 ≤ j ≤ n → O(m × n) subproblems# Work per subproblem: String comparison + max → O(1)# Total time: O(m × n) × O(1) = O(m × n)# Space: O(m × n) for memo def lcs(s1, s2, i=None, j=None, memo=None): if i is None: i, j = len(s1), len(s2) if memo is None: memo = {} if (i, j) in memo: return memo[(i, j)] if i == 0 or j == 0: return 0 if s1[i-1] == s2[j-1]: result = 1 + lcs(s1, s2, i-1, j-1, memo) else: result = max(lcs(s1, s2, i-1, j, memo), lcs(s1, s2, i, j-1, memo)) memo[(i, j)] = result return result # -----------------------------# EXAMPLE 4: 0/1 Knapsack (n items, capacity W)# -----------------------------# Subproblems: knapsack(i, cap) for 0 ≤ i ≤ n, 0 ≤ cap ≤ W → O(n × W) subproblems# Work per subproblem: max of two choices → O(1)# Total time: O(n × W) × O(1) = O(n × W)# Note: This is "pseudo-polynomial" because W is a value, not input size def knapsack(weights, values, i, cap, memo={}): if (i, cap) in memo: return memo[(i, cap)] if i >= len(weights) or cap == 0: return 0 skip = knapsack(weights, values, i+1, cap, memo) take = 0 if weights[i] <= cap: take = values[i] + knapsack(weights, values, i+1, cap-weights[i], memo) memo[(i, cap)] = max(skip, take) return memo[(i, cap)] # -----------------------------# SUMMARY TABLE# -----------------------------"""Problem | Subproblems | Work/Subproblem | Total Time------------------------|----------------|-----------------|------------Fibonacci | O(n) | O(1) | O(n)Grid Paths (m×n) | O(m×n) | O(1) | O(m×n)LCS (lengths m, n) | O(m×n) | O(1) | O(m×n)0/1 Knapsack (n, W) | O(n×W) | O(1) | O(n×W)Edit Distance (m, n) | O(m×n) | O(1) | O(m×n)Matrix Chain Mult (n) | O(n²) | O(n) | O(n³)Palindrome Partition | O(n²) | O(n) | O(n³) The pattern: Most classic DP problems have polynomial subproblem countsand O(1) or O(n) work per subproblem, yielding polynomial total time."""When analyzing a memoized algorithm, always ask: (1) How many unique states can the arguments take? (2) How much work do we do per state (excluding recursive calls)? The product gives total time complexity. This method is more reliable than analyzing the call tree.
Not all recursive problems exhibit overlapping subproblems. To understand when memoization helps, we need to recognize the structural patterns that create redundant computation.
When does overlap occur?
Overlap occurs when the dependency structure of subproblems contains converging paths in the subproblem graph—that is, when multiple paths lead to the same subproblem.
The diamond pattern:
A clear indicator of overlap is the diamond (or rhombus) shape in the dependency graph:
```
      P
     / \
    A   B    <-- P depends on both A and B
     \ /
      C      <-- Both A and B depend on C!
```
When both branches of a recursion lead to the same sub-subproblem, we have overlap. Without memoization, C is computed twice. With memoization, C is computed once and reused.
The more diamonds in your dependency structure, the more memoization helps.
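As a minimal sketch of this effect (the functions P, A, B, and C simply mirror the diagram above and are purely illustrative), we can count how many times C is evaluated with and without a cache:

```python
from functools import lru_cache

calls_to_c = 0

def C():
    """The shared sub-subproblem at the bottom of the diamond."""
    global calls_to_c
    calls_to_c += 1
    return 1

def A(): return C() + 1
def B(): return C() + 2
def P(): return A() + B()   # P needs both A and B, which both need C

P()
print(f"Without memoization, C computed {calls_to_c} times")  # 2

# With a cache on C, the second request is a lookup, not a recomputation
calls_to_c = 0
C_cached = lru_cache(maxsize=None)(C)

def A2(): return C_cached() + 1
def B2(): return C_cached() + 2
def P2(): return A2() + B2()

P2()
print(f"With memoization, C computed {calls_to_c} times")     # 1
```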
"""Analyzing Dependency Structure for Overlap Detection====================================================""" # EXAMPLE 1: Fibonacci has extreme overlap (diamond at every level)## fib(n)# / \# fib(n-1) fib(n-2) <- Two branches# / \ / \# fib(n-2) fib(n-3) ... fib(n-3) <- n-2 and n-3 appear multiple times!## Overlapping paths: fib(n) → fib(n-1) → fib(n-2)# fib(n) → fib(n-2) (same destination!) # EXAMPLE 2: Merge Sort has NO overlap (tree structure)## mergesort([1,3,2,4])# / \# mergesort([1,3]) mergesort([2,4]) <- Disjoint halves# / \ / \# ms([1]) ms([3]) ms([2]) ms([4]) <- All unique!## No shared subproblems: left and right halves are completely independent # EXAMPLE 3: Grid paths has overlap (lattice structure)## State space for 4×4 grid:## (0,0) → (0,1) → (0,2) → (0,3)# ↓ ↓ ↓ ↓# (1,0) → (1,1) → (1,2) → (1,3)# ↓ ↓ ↓ ↓ # (2,0) → (2,1) → (2,2) → (2,3)# ↓ ↓ ↓ ↓# (3,0) → (3,1) → (3,2) → (3,3)## Multiple paths reach each cell! # e.g., (1,1) reachable via (0,0)→(0,1)→(1,1) and (0,0)→(1,0)→(1,1) # Detecting overlap programmatically:def detect_overlap(n, call_counts=None): """ Count how many times each subproblem is requested. If any count > 1, we have overlap. """ if call_counts is None: call_counts = {} # Record this call call_counts[n] = call_counts.get(n, 0) + 1 # Base case if n <= 1: return n, call_counts # Recurse (don't actually compute, just track calls) detect_overlap(n - 1, call_counts) detect_overlap(n - 2, call_counts) return call_counts # Check overlap in Fibonaccicounts = detect_overlap(10)overlaps = {k: v for k, v in counts.items() if v > 1}print(f"Overlapping subproblems in fib(10): {overlaps}")# Output: {9: 1, 8: 2, 7: 3, 6: 5, 5: 8, 4: 13, 3: 21, 2: 34, 1: 55, 0: 34}# Nearly every subproblem is computed many times!We can measure how effective memoization is by tracking cache hits versus cache misses:
The hit rate = hits / (hits + misses) tells us what fraction of requests were answered from the cache rather than recomputed.
```python
class MemoStats:
    """
    Wrapper that tracks memoization statistics.
    Reveals exactly how much work memoization saves.
    """
    def __init__(self):
        self.memo = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.memo:
            self.hits += 1
            return self.memo[key], True
        self.misses += 1
        return None, False

    def put(self, key, value):
        self.memo[key] = value

    def report(self):
        total = self.hits + self.misses
        hit_rate = self.hits / total if total > 0 else 0
        print(f"Cache Statistics:")
        print(f"  Hits:     {self.hits}")
        print(f"  Misses:   {self.misses}")
        print(f"  Hit Rate: {hit_rate:.2%}")
        print(f"  Work Saved: {self.hits} / {total} computations avoided")


def fib_with_stats(n: int, stats: MemoStats) -> int:
    """Fibonacci with cache statistics tracking."""
    cached, found = stats.get(n)
    if found:
        return cached
    if n <= 1:
        result = n
    else:
        result = fib_with_stats(n - 1, stats) + fib_with_stats(n - 2, stats)
    stats.put(n, result)
    return result


# Run and see statistics
print("=" * 50)
print("Fibonacci(30) Cache Analysis:")
print("=" * 50)

stats = MemoStats()
result = fib_with_stats(30, stats)
print(f"Result: {result}")
stats.report()

# Output:
# Cache Statistics:
#   Hits:     28
#   Misses:   31
#   Hit Rate: 47.46%
#   Work Saved: 28 / 59 computations avoided
#
# Without memoization: ~2.7 million calls
# With memoization: 59 calls (31 misses + 28 hits)
# Reduction factor: ~45,000×

print("\n" + "=" * 50)
print("Grid Paths(15, 15) Cache Analysis:")
print("=" * 50)

def grid_paths_with_stats(m: int, n: int, stats: MemoStats) -> int:
    """Grid paths with cache statistics."""
    key = (m, n)
    cached, found = stats.get(key)
    if found:
        return cached
    if m == 1 or n == 1:
        result = 1
    else:
        result = grid_paths_with_stats(m-1, n, stats) + \
                 grid_paths_with_stats(m, n-1, stats)
    stats.put(key, result)
    return result

stats2 = MemoStats()
result2 = grid_paths_with_stats(15, 15, stats2)
print(f"Result: {result2}")
stats2.report()

# For grid problems, the hit rate increases with grid size
# because more paths converge to intermediate cells
```

The number of misses equals the number of unique subproblems—this determines your true time complexity. The number of hits represents avoided redundant work. For Fibonacci(30), we have 31 unique subproblems but would have made ~2.7 million calls without memoization. The hit rate understates the savings because each avoided call would have spawned its own subtree.
For memoization to work, we must compute subproblems before the problems that depend on them. In top-down memoization, the recursion automatically ensures this order.
How recursion enforces dependency order:
- When we call dp(n), we immediately recurse into dp(n-1) and dp(n-2) before combining results.
- Those calls recurse further until they reach base cases, so every dependency is computed (and cached) before the value that needs it.

This is lazy evaluation: we only compute what we need, and dependencies are computed just-in-time when first requested.
```python
def fib_traced(n: int, memo: dict = None, order: list = None) -> int:
    """
    Track the order in which subproblems are first computed.
    Shows how recursion automatically respects dependencies.
    """
    if memo is None:
        memo = {}
        order = []

    if n in memo:
        print(f"  fib({n}) → hit (already computed)")
        return memo[n]

    if n <= 1:
        result = n
        order.append(n)
        print(f"  fib({n}) → base case = {result}")
    else:
        print(f"  fib({n}) → need fib({n-1}) and fib({n-2})")
        result = fib_traced(n-1, memo, order) + fib_traced(n-2, memo, order)
        order.append(n)
        print(f"  fib({n}) → computed = {result}")

    memo[n] = result
    return result


print("Tracing fib(6):")
print("=" * 50)
memo = {}
order = []
result = fib_traced(6, memo, order)
print("=" * 50)
print(f"Computation order: {order}")
print(f"Final result: fib(6) = {result}")

# Output shows:
# - Recursion descends to base cases first
# - Values computed in order: 1, 0, 2, 1 (hit), 3, 2 (hit), 4, 3 (hit), 5, 4 (hit), 6
# - Dependency order respected automatically


# For bottom-up DP (tabulation), we explicitly loop in dependency order
def fib_bottom_up(n: int) -> int:
    """
    Explicit ordering: compute 0, 1, 2, ..., n in sequence.
    Each value available before it's needed.
    """
    if n <= 1:
        return n

    dp = [0] * (n + 1)
    dp[0] = 0
    dp[1] = 1

    print("Bottom-up computation order:")
    for i in range(2, n + 1):
        dp[i] = dp[i-1] + dp[i-2]
        print(f"  dp[{i}] = dp[{i-1}] + dp[{i-2}] = {dp[i]}")

    return dp[n]


print("\n" + "=" * 50)
fib_bottom_up(6)

# Both approaches respect the same dependency order:
# smaller problems computed before larger ones
```

The beauty of top-down memoization is that you don't need to figure out the computation order. The recursion naturally computes dependencies first. This is why memoization is often easier to implement than tabulation—you just write the recursive solution and add caching.
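In Python, that "just add caching" step can literally be a one-line decorator from the standard library; here is a minimal sketch using functools.lru_cache, which maintains the memo dictionary for you:

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # cache results keyed by the arguments
def fib(n: int) -> int:
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(90))                    # 2880067194370816120, found in O(n) calls
print(fib.cache_info())           # CacheInfo(hits=88, misses=91, maxsize=None, currsize=91)
```

The cache_info() report mirrors the hit/miss breakdown we tracked by hand with MemoStats above: misses equal the number of unique subproblems, hits are the avoided recomputations.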
Memoization isn't magic—it only helps when there's redundant work to eliminate. Here are cases where memoization provides little or no benefit:
```python
# Example: Binary search has NO overlapping subproblems
# Adding memoization would be pointless

def binary_search(arr: list[int], target: int, lo: int = 0, hi: int = None) -> int:
    """
    Each call searches a unique range [lo, hi].
    No range is ever searched twice - no overlap exists.
    """
    if hi is None:
        hi = len(arr) - 1

    # Could add memo here, but it would never hit
    # because (lo, hi) is always unique per call
    if lo > hi:
        return -1

    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        return binary_search(arr, target, mid + 1, hi)   # New unique range
    else:
        return binary_search(arr, target, lo, mid - 1)   # New unique range


# Example: Merge sort likewise has no overlap
def merge_sort_analysis(n: int, subproblems: list = None) -> list:
    """
    Track all subproblems visited in merge sort.
    Each subproblem (start, end) appears exactly once.
    """
    if subproblems is None:
        subproblems = []

    # Simulate merge sort's division pattern
    def divide(start, end):
        if start >= end:
            return
        subproblems.append((start, end))
        mid = (start + end) // 2
        divide(start, mid)
        divide(mid + 1, end)

    divide(0, n - 1)
    return subproblems


# Check for duplicates
subproblems = merge_sort_analysis(16)
print(f"Merge sort on n=16:")
print(f"  Total subproblems: {len(subproblems)}")
print(f"  All unique: {len(subproblems) == len(set(subproblems))}")
# All subproblems are unique - memoization wouldn't help


# Contrast with Fibonacci
def fib_subproblems(n: int, visited: dict = None) -> dict:
    """Count visits to each subproblem in naive fib."""
    if visited is None:
        visited = {}
    visited[n] = visited.get(n, 0) + 1
    if n <= 1:
        return visited
    fib_subproblems(n - 1, visited)
    fib_subproblems(n - 2, visited)
    return visited

visits = fib_subproblems(15)
duplicates = {k: v for k, v in visits.items() if v > 1}
print(f"\nFibonacci(15):")
print(f"  Subproblems visited multiple times: {len(duplicates)}")
print(f"  Example: fib(5) visited {visits[5]} times")
# Massive duplication - memoization helps enormously
```

Adding memoization to algorithms without overlap wastes memory and adds lookup overhead. Before memoizing, ask: "Can the same arguments ever be passed to this function twice?" If no, memoization won't help.
We've explored the deep mechanics of how memoization eliminates recomputation. Here are the essential insights:

- Memoization collapses the exponential call tree into a DAG in which each unique subproblem appears exactly once.
- Time complexity of a memoized algorithm = (number of unique subproblems) × (work per subproblem).
- Overlap comes from converging paths (diamonds) in the dependency structure; the more diamonds, the bigger the payoff.
- Top-down recursion resolves dependencies automatically, computing each value just-in-time on its first request.
- If a recursion never revisits the same arguments (binary search, merge sort), memoization adds overhead without benefit.
What's next:
With our understanding of how memoization avoids recomputation complete, we'll explore when top-down (memoization) is the natural choice over bottom-up tabulation. Different problems favor different approaches, and knowing when top-down shines will make you a more effective problem solver.
You now understand the deep mechanics of how memoization eliminates redundant computation. The transformation from exponential trees to polynomial DAGs is the mathematical key to DP's power. This understanding forms the foundation for all future DP analysis.