Dynamic programming often intimidates newcomers with its reputation for complexity and abstraction. But here's a secret that experienced engineers understand deeply: the most intuitive form of DP is simply recursion with memory.
Memoization—the technique of caching recursive function results—represents the natural evolution of recursive problem-solving. It preserves the elegant top-down decomposition that makes recursion intellectually satisfying while eliminating the catastrophic inefficiency that plagues naive recursive implementations.
If you can write a recursive solution, you can write a memoized solution. The transformation is mechanical, almost automatic. And that's precisely what makes memoization such a powerful entry point into the world of dynamic programming.
By the end of this page, you will understand why pure recursion without caching leads to exponential explosion, how memoization surgically eliminates redundant computation, and why this approach is called 'top-down' dynamic programming. You'll see the fundamental connection between recursive structure and DP optimization.
Before we can appreciate memoization, we must first deeply understand the recursive approach it enhances. Recursion is more than a programming technique—it's a problem-solving philosophy based on a simple but profound observation:
Many complex problems can be expressed as simpler versions of themselves.
This self-referential structure is the essence of recursive thinking. When you face a problem, you ask: "Can I solve this by solving a smaller instance of the same problem?" If yes, recursion provides a natural framework for expressing that solution.
The power of recursive framing:
Consider the Fibonacci sequence, the paradigmatic example of recursive definition:
This definition (F(0) = 0, F(1) = 1, and F(n) = F(n-1) + F(n-2) for n > 1) is mathematically elegant. It captures the essence of Fibonacci numbers without ambiguity. And it translates directly into code:
```python
def fibonacci(n: int) -> int:
    """
    Compute the nth Fibonacci number using pure recursion.

    This implementation directly mirrors the mathematical definition:
    - F(0) = 0
    - F(1) = 1
    - F(n) = F(n-1) + F(n-2) for n > 1

    While mathematically elegant, this has exponential time complexity O(2^n).
    """
    # Base cases: the foundation of our recursive definition
    if n == 0:
        return 0
    if n == 1:
        return 1

    # Recursive case: express F(n) in terms of smaller Fibonacci numbers
    return fibonacci(n - 1) + fibonacci(n - 2)


# Example usage demonstrating the simplicity
print(f"F(0) = {fibonacci(0)}")    # 0
print(f"F(1) = {fibonacci(1)}")    # 1
print(f"F(5) = {fibonacci(5)}")    # 5
print(f"F(10) = {fibonacci(10)}")  # 55

# Warning: F(40) will take several seconds
# Warning: F(50) will take minutes
# The exponential explosion is real and dramatic
```

This code is beautiful in its simplicity. It reads almost like the mathematical definition. A programmer unfamiliar with Fibonacci could understand this function in seconds.
But there's a devastating problem lurking beneath this elegance.
The naive recursive Fibonacci function hides a catastrophic inefficiency. To understand it, let's trace what happens when we compute fibonacci(5):
```
fibonacci(5)
├── fibonacci(4)
│   ├── fibonacci(3)
│   │   ├── fibonacci(2)
│   │   │   ├── fibonacci(1) → 1
│   │   │   └── fibonacci(0) → 0
│   │   └── fibonacci(1) → 1
│   └── fibonacci(2)
│       ├── fibonacci(1) → 1
│       └── fibonacci(0) → 0
└── fibonacci(3)
    ├── fibonacci(2)
    │   ├── fibonacci(1) → 1
    │   └── fibonacci(0) → 0
    └── fibonacci(1) → 1
```
Notice the repetition. fibonacci(3) is computed twice. fibonacci(2) is computed three times. fibonacci(1) is computed five times. And this is just for n=5!
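These counts are easy to verify empirically. The sketch below (not part of the original lesson) instruments the naive function with a `collections.Counter` to tally how many times each argument is computed:

```python
from collections import Counter

calls: Counter = Counter()

def fib_counted(n: int) -> int:
    """Naive recursive Fibonacci, instrumented to count calls per argument."""
    calls[n] += 1  # record every invocation, including repeats
    if n < 2:
        return n
    return fib_counted(n - 1) + fib_counted(n - 2)

fib_counted(5)
print(dict(calls))          # {5: 1, 4: 1, 3: 2, 2: 3, 1: 5, 0: 3}
print(sum(calls.values()))  # 15 calls in total, just for n = 5
```

The tallies match the call tree exactly: fibonacci(3) twice, fibonacci(2) three times, fibonacci(1) five times.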
For fibonacci(n), the number of function calls grows approximately as φ^n, where φ ≈ 1.618 (the golden ratio). For n=50, this means roughly 40 billion function calls, almost all of them redundant. A value that could be computed in microseconds instead takes tens of seconds, and only slightly larger inputs become utterly infeasible.
| n | Function Calls | Approximate Time (1GHz) | Result |
|---|---|---|---|
| 10 | 177 | < 1 microsecond | 55 |
| 20 | 21,891 | ~22 microseconds | 6,765 |
| 30 | 2,692,537 | ~2.7 milliseconds | 832,040 |
| 40 | 331,160,281 | ~0.33 seconds | 102,334,155 |
| 50 | ~40 billion | ~40 seconds | 12,586,269,025 |
| 100 | ~10^21 | ~36,000 years | 354,224,848,179,261,915,075 |
This exponential explosion occurs because the naive implementation solves the same subproblem multiple times. Every call to fibonacci(n) spawns calls to fibonacci(n-1) and fibonacci(n-2), and those calls spawn further overlapping calls.
The tree of recursive calls grows exponentially, but the number of distinct subproblems is only linear: we need F(0), F(1), F(2), ..., F(n). Just n+1 unique values.
This is the fundamental insight that memoization exploits.
The solution to exponential explosion is beautifully simple: remember what you've already computed.
This is the core insight of memoization. If we cache the result of each unique subproblem the first time we solve it, we can simply look up the answer instead of recomputing it when that subproblem appears again.
Memoization: From the Latin memorandum ("to be remembered"). The technique of storing function results to avoid redundant computation.
The transformation from naive recursion to memoized recursion is mechanical:
| Aspect | Pure Recursion | Memoized Recursion |
|---|---|---|
| Before computing | Proceed immediately | Check if result is cached |
| After computing | Return result | Store result in cache, then return |
| Repeated calls | Recompute from scratch | Return cached value in O(1) |
| Time complexity | Often exponential O(2^n) | Typically polynomial O(n) or O(n²) |
| Space complexity | Just call stack O(n) | Call stack + cache O(n) |
The memoization recipe:

1. Check the cache before doing any work: if the result for these arguments is already stored, return it immediately.
2. Otherwise, compute the result recursively, exactly as the pure recursive version would.
3. Store the result in the cache before returning it, so every future call with the same arguments is an O(1) lookup.
This simple modification transforms exponential algorithms into polynomial ones—often the difference between "never finishes" and "instantaneous".
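The recipe is so mechanical that it can be packaged once and reused. Here is a minimal generic decorator sketch (the name `memoize` is ours; Python's standard library offers the equivalent `functools.cache`):

```python
from functools import wraps

def memoize(func):
    """Minimal generic memoization decorator (assumes hashable arguments)."""
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:          # check the cache before computing
            cache[args] = func(*args)  # compute once, store the result
        return cache[args]             # every later call is a lookup

    return wrapper

@memoize
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025
```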
```python
def fibonacci_memoized(n: int, memo: dict[int, int] | None = None) -> int:
    """
    Compute the nth Fibonacci number using memoization.

    This implementation adds caching to the recursive approach:
    - Before computing: check if result exists in memo
    - After computing: store result in memo before returning

    Time: O(n) - each subproblem computed exactly once
    Space: O(n) - memo stores n+1 values, call stack depth is n
    """
    # Initialize memo dictionary on first call
    if memo is None:
        memo = {}

    # CHECK CACHE: If we've solved this before, return cached result
    if n in memo:
        return memo[n]

    # BASE CASES: Solve directly, no caching needed for trivial cases
    if n == 0:
        return 0
    if n == 1:
        return 1

    # RECURSIVE CASE: Compute result from subproblems
    result = fibonacci_memoized(n - 1, memo) + fibonacci_memoized(n - 2, memo)

    # STORE IN CACHE: Remember this result for future calls
    memo[n] = result
    return result


# Now we can compute large Fibonacci numbers instantly
print(f"F(50) = {fibonacci_memoized(50)}")    # 12586269025 (instant!)
print(f"F(100) = {fibonacci_memoized(100)}")  # 354224848179261915075 (still instant!)
print(f"F(200) = {fibonacci_memoized(200)}")  # Huge number, still instant

# The exponential explosion is completely eliminated
```

With just a few lines of additional code—a cache check and a cache store—we've transformed an O(2^n) algorithm into an O(n) algorithm. F(100), which would naively have taken tens of thousands of years, now computes in microseconds. This is the power of memoization.
Memoization is called top-down dynamic programming because of how it approaches problem-solving: you start at the original problem (the top) and recursively descend toward the base cases, caching each result as the recursion unwinds. You never decide in advance which subproblems to solve; the recursion discovers them on demand.
The "top-down" label contrasts with bottom-up DP (tabulation), which we'll cover in the next module. Bottom-up starts with base cases and systematically builds toward the target.
Here's a visualization of how memoized Fibonacci works:
```python
def fibonacci_traced(n: int, memo: dict | None = None, depth: int = 0) -> int:
    """
    Fibonacci with tracing to visualize the top-down nature.
    Watch how we start at the top (n=7) and descend.
    """
    if memo is None:
        memo = {}

    indent = "  " * depth

    # Check cache first
    if n in memo:
        print(f"{indent}fib({n}) → CACHED: {memo[n]}")
        return memo[n]

    # Base cases
    if n == 0:
        print(f"{indent}fib(0) → BASE: 0")
        return 0
    if n == 1:
        print(f"{indent}fib(1) → BASE: 1")
        return 1

    # Recursive case
    print(f"{indent}fib({n}) → COMPUTING...")
    result = fibonacci_traced(n - 1, memo, depth + 1) + \
             fibonacci_traced(n - 2, memo, depth + 1)
    memo[n] = result
    print(f"{indent}fib({n}) → STORED: {result}")
    return result


# Trace fibonacci(7) to see top-down behavior
print("Computing fibonacci(7) with memoization:")
print("=" * 50)
result = fibonacci_traced(7)
print("=" * 50)
print(f"Result: {result}")

# Output shows:
# - We start at fib(7) [the TOP]
# - Descend through fib(6), fib(5), etc.
# - Hit base cases fib(1) and fib(0)
# - On the way back UP, we hit CACHED values
# - Each unique subproblem computed exactly once
```

Why 'top-down' matters:
The top-down approach has several important characteristics:
Natural problem decomposition — You think about problems the way humans naturally do: "To solve this big problem, I need to solve these smaller problems first."
Lazy computation — Only subproblems that are actually needed get computed. If certain parts of the subproblem space are never touched by the recursion, they're never computed.
Recursion depth — The call stack depth equals the longest path from the original problem to a base case. This can cause stack overflow for very deep recursions.
Overhead — Function call overhead and hash table lookups add constant factors compared to bottom-up iteration.
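Both caveats show up concretely in Python. A small sketch (assuming CPython, whose default recursion limit is roughly 1000 frames) using the idiomatic built-in `functools.cache`:

```python
import sys
from functools import cache  # built-in memoization (Python 3.9+)

@cache
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075

# Top-down recursion for n = 2000 needs ~2000 stack frames, which
# exceeds CPython's default recursion limit, so we raise it first.
sys.setrecursionlimit(5000)
fib(2000)  # deep, but fine once the limit is raised
```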
Memoization rests on a fundamental principle that applies far beyond Fibonacci:
If a function is pure (deterministic with no side effects) and its arguments fully determine its result, then calling it multiple times with the same arguments is wasted work.
This principle leads to a formal definition of when memoization applies:
The memoization invariant:
At any point during execution, the memo dictionary satisfies this invariant:
```
For every entry (args, result) in memo:
    function(args) == result
```
This invariant guarantees correctness: looking up a cached value is semantically equivalent to recomputing it. The memo is a materialized view of the function's behavior over the subproblem space.
Thinking about state space:
Every recursive DP problem defines a state space—the set of all possible subproblems. For Fibonacci, a state is just the single integer n, so the state space is {0, 1, 2, ..., N}: exactly N + 1 states.
Memoization ensures each state is visited at most once. The time complexity of memoized recursion is:
O(number of states × work per state)
For Fibonacci: O(N) states × O(1) work per state = O(N) total.
Think of memoization as converting a tree of overlapping recursive calls into a directed acyclic graph (DAG) of unique subproblems. The tree may have exponentially many nodes, but the DAG has only polynomially many. Caching is what transforms the tree into the DAG.
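You can watch this tree-to-DAG collapse by counting calls in both versions. A sketch (function names are ours):

```python
def naive_calls(n: int, counter: list[int]) -> int:
    """Naive Fibonacci; counter[0] tallies every call (the call TREE)."""
    counter[0] += 1
    if n < 2:
        return n
    return naive_calls(n - 1, counter) + naive_calls(n - 2, counter)

def memo_calls(n: int, memo: dict[int, int], counter: list[int]) -> int:
    """Memoized Fibonacci; counter[0] tallies every call (the subproblem DAG)."""
    counter[0] += 1
    if n in memo:
        return memo[n]
    result = n if n < 2 else (memo_calls(n - 1, memo, counter)
                              + memo_calls(n - 2, memo, counter))
    memo[n] = result
    return result

tree = [0]
naive_calls(20, tree)
dag = [0]
memo_calls(20, {}, dag)
print(f"naive: {tree[0]} calls, memoized: {dag[0]} calls")  # 21891 vs 39
```

The same n=20 computation shrinks from tens of thousands of calls to a few dozen.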
Fibonacci is illustrative but simple. Let's see memoization applied to a problem with more interesting structure: counting unique paths in a grid.
Problem: You're at the top-left corner of an m×n grid. You can only move right or down. How many unique paths exist to reach the bottom-right corner?
This problem appears in robotics, game development, and combinatorics. The recursive insight is elegant: the number of paths to any cell is the sum of the paths arriving from above and the paths arriving from the left, because those are the only two ways to reach it.
```python
def unique_paths_naive(m: int, n: int) -> int:
    """
    Count unique paths in m×n grid using naive recursion.

    This has EXPONENTIAL time complexity due to overlapping subproblems.
    For a 20×20 grid, this would take an impractical amount of time.
    """
    # Base case: a single row or column admits exactly one path
    if m == 1 or n == 1:
        return 1

    # Recursive case: sum of paths from above and from left
    return unique_paths_naive(m - 1, n) + unique_paths_naive(m, n - 1)


def unique_paths_memo(m: int, n: int, memo: dict | None = None) -> int:
    """
    Count unique paths using memoization.

    Time: O(m × n) - each cell computed exactly once
    Space: O(m × n) - memo size + O(m + n) call stack

    This handles even 100×100 grids instantly.
    """
    if memo is None:
        memo = {}

    # CHECK CACHE with tuple key (m, n)
    if (m, n) in memo:
        return memo[(m, n)]

    # Base cases
    if m == 1 or n == 1:
        return 1

    # Recursive case
    result = unique_paths_memo(m - 1, n, memo) + unique_paths_memo(m, n - 1, memo)

    # STORE IN CACHE
    memo[(m, n)] = result
    return result


# Compare performance
import time

# Small grid: both work
print(f"3×3 grid: {unique_paths_memo(3, 3)} paths")      # 6
print(f"10×10 grid: {unique_paths_memo(10, 10)} paths")  # 48620

# Large grid: only memoized version is feasible
start = time.time()
result = unique_paths_memo(100, 100)
elapsed = time.time() - start
print(f"100×100 grid: {result} paths (computed in {elapsed:.6f}s)")

# The answer is a massive number, computed instantly:
# 22750883079422934966181954039568885395604168260154104734000
```

Notice the pattern: the memoized version follows exactly the same recipe as Fibonacci (check the cache, handle base cases, recurse, store, return). The only new wrinkle is the cache key: because a subproblem here is described by two dimensions, we key the memo on the tuple (m, n).
This same transformation works for countless problems. The memoization recipe is universal.
With practice, you'll develop an instinct for when memoization applies. Here are the telltale patterns: a natural recursive decomposition exists; the same subproblems recur across different branches of the call tree; the function is pure, so its arguments fully determine its result; and the number of distinct subproblems stays polynomial even though the raw call tree is exponential.
Experienced engineers don't consciously analyze these patterns—they recognize them instantly. This comes from solving many problems. Each problem you solve adds to your pattern library, making future recognition faster and more reliable.
We've established the foundational understanding of memoization as enhanced recursion. Let's consolidate the key insights: pure recursion mirrors a problem's mathematical structure but re-solves overlapping subproblems exponentially many times; the number of distinct subproblems is usually only polynomial; caching each result the first time it is computed collapses the exponential call tree into a DAG of unique states; and the running time of the memoized version is O(number of states × work per state).
What's next:
Now that you understand the conceptual foundation of memoization as recursive caching, we'll dive into the practical implementation details. The next page covers the mechanics of adding memo dictionaries and arrays—including key design, initialization, and language-specific idioms.
You now understand memoization as the natural evolution of recursion. The core insight—cache what you've computed to avoid recomputing—transforms exponential algorithms into polynomial ones. Next, we'll master the practical mechanics of implementing memo caches.