Dynamic programming is fundamentally about state design—choosing the right way to describe subproblems so that larger problems can be built from smaller ones. In the 0/1 Knapsack problem, the state choice might seem obvious in retrospect, but understanding why it works reveals deep insights applicable to all DP problems.
The state for 0/1 Knapsack is typically expressed as:
dp[i][c] = Maximum value using items 1 through i with remaining capacity c
But why this particular formulation? Why track "capacity remaining" rather than "capacity used"? Why do we need two dimensions at all? Could we use fewer? Could different state formulations lead to better algorithms?
These questions touch on the heart of dynamic programming design. Answering them will sharpen your ability to design DP solutions for novel problems.
By the end of this page, you will understand why 'capacity remaining' is the natural state variable for knapsack problems, how alternative formulations compare, and how to reason about state design for DP problems in general. You'll also see how the state dimensions directly determine time and space complexity.
In dynamic programming, state refers to the minimal information needed to describe a subproblem completely. A good state definition must satisfy several criteria:
1. Sufficient Information: The state must contain enough information to solve the subproblem without referring back to the full problem description. Given the state, we should be able to make optimal decisions for that subproblem.
2. Optimal Substructure: The state must be defined so that optimal solutions to subproblems combine to form optimal solutions to larger problems. The state boundaries must align with the problem's natural decomposition.
3. Overlapping Subproblems: The state space should exhibit repetition—many distinct problem instances should reduce to the same state. This enables the caching that makes DP efficient.
4. Minimality: The state should be as compact as possible while still satisfying the above. Unnecessary state dimensions increase complexity without benefit.
The state must capture exactly what matters for future decisions. Include too little, and you can't make optimal choices. Include too much, and you lose the efficiency of subproblem reuse. Finding the right balance is the art of DP.
For the 0/1 Knapsack problem, we need to ask: What information do we need to decide whether to include item i?
The answer crystallizes into two pieces:
1. Which items are still under consideration (captured by the item index).
2. How much capacity remains for those items.
This leads naturally to a two-dimensional state space indexed by (item_index, remaining_capacity).
Consider the decision to include item i. This decision depends critically on one factor: whether item i fits in the knapsack given the remaining capacity.
If we have items [A, B, C] and we've already included A, the decision about B and C doesn't depend on A itself—it depends only on how much capacity A consumed. Two different items that consume the same weight leave us in identical decision situations for subsequent items.
This observation is crucial:
The past matters only through its effect on remaining capacity.
Formally, if two partial solutions S₁ and S₂ have the same remaining capacity, then the optimal continuation from S₁ is the same as from S₂. This is the principle that enables us to collapse exponentially many partial solutions into polynomially many states.
| Items Considered | Items Included | Weight Used | Remaining Capacity |
|---|---|---|---|
| [A, B] | {A} | 3 | 7 |
| [A, B] | {B} | 3 | 7 |
| [A, B] | {} | 0 | 10 |
In this example, the first two rows represent different partial solutions (one includes A, one includes B), but if both A and B have weight 3, the remaining decisions for items C, D, etc. are identical. We don't need to distinguish between "included A" and "included B"—only the remaining capacity matters.
This insight is profound: it means the state space is O(n × W) rather than O(2^n). We've compressed an exponential space into a polynomial one.
There are 2^n possible subsets of items, but only (W+1) possible remaining capacities. By tracking capacity instead of the specific subset, we achieve massive compression. This is why DP works—it exploits the structure that makes many distinct histories equivalent.
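The equivalence of histories can be checked directly by brute force. The sketch below uses hypothetical items matching the table above (A and B both weigh 3); the specific weights and values are invented for illustration:

```python
from itertools import combinations

# Hypothetical items: A and B share weight 3, so including either one
# leaves the same remaining capacity; C and D are the "future" items.
weights = {"A": 3, "B": 3, "C": 4, "D": 5}
values = {"A": 4, "B": 9, "C": 5, "D": 6}
CAPACITY = 10

def best_continuation(remaining, future_items=("C", "D")):
    """Brute-force the best value obtainable from the future items.
    Note the past never appears: only the remaining capacity matters."""
    best = 0
    for r in range(len(future_items) + 1):
        for combo in combinations(future_items, r):
            w = sum(weights[i] for i in combo)
            if w <= remaining:
                best = max(best, sum(values[i] for i in combo))
    return best

# Two different histories, identical remaining capacity (10 - 3 = 7):
after_A = best_continuation(CAPACITY - weights["A"])
after_B = best_continuation(CAPACITY - weights["B"])
assert after_A == after_B  # the optimal continuation is the same
```

Because `best_continuation` takes only the remaining capacity as input, the two partial solutions are indistinguishable from this point on, which is exactly the compression the DP exploits.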
A natural question arises: why track "remaining capacity" rather than "used capacity"? Mathematically, they're equivalent—if we know one, we can compute the other. But the choice affects how naturally the recurrence reads and how we think about the problem.
Remaining Capacity Formulation:
dp[i][c] = max value using items 1..i with remaining capacity c
Recurrence: dp[i][c] = max(dp[i-1][c], dp[i-1][c-w[i]] + v[i])
Used Capacity Formulation:
dp'[i][u] = max value using items 1..i with exactly u capacity used
Recurrence: dp'[i][u] = max(dp'[i-1][u], dp'[i-1][u-w[i]] + v[i])
Final answer: max(dp'[n][u]) for all u ≤ W
Both formulations work, but remaining capacity is generally preferred because:
1. The answer is at a specific cell: dp[n][W] directly gives the maximum value with full capacity available. With used capacity, you'd need to find max(dp'[n][u]) across all valid u.
2. The constraint is naturally expressed: checking w[i] ≤ c (the item fits in what remains) is more intuitive than checking bounds on u + w[i].
3. Consistency with the problem statement: the problem gives us capacity W as what we have, not as what we've used.
That said, recognizing that both are equivalent helps when you encounter problems where "used resource" is more natural (e.g., "minimize cost to achieve at least value V").
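As a sketch, both recurrences can be implemented side by side and checked against each other (the item data below is made up for illustration):

```python
def knapsack_remaining(weights, values, W):
    """dp[i][c]: max value using items 1..i with remaining capacity c.
    The answer sits in a single cell, dp[n][W]."""
    n = len(weights)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(W + 1):
            dp[i][c] = dp[i - 1][c]                      # exclude item i
            if weights[i - 1] <= c:                      # item i fits
                dp[i][c] = max(dp[i][c],
                               dp[i - 1][c - weights[i - 1]] + values[i - 1])
    return dp[n][W]

def knapsack_used(weights, values, W):
    """dp2[i][u]: max value using items 1..i with exactly u capacity used.
    The answer requires a scan over all u <= W."""
    n = len(weights)
    NEG = float("-inf")
    dp2 = [[NEG] * (W + 1) for _ in range(n + 1)]
    dp2[0][0] = 0                                        # nothing used yet
    for i in range(1, n + 1):
        for u in range(W + 1):
            dp2[i][u] = dp2[i - 1][u]                    # exclude item i
            if u >= weights[i - 1] and dp2[i - 1][u - weights[i - 1]] != NEG:
                dp2[i][u] = max(dp2[i][u],
                                dp2[i - 1][u - weights[i - 1]] + values[i - 1])
    return max(dp2[n])                                   # scan, not one cell

# Sample data (invented): both formulations agree
w, v, W = [3, 4, 5], [4, 5, 6], 10
assert knapsack_remaining(w, v, W) == knapsack_used(w, v, W)
```

Note the extra bookkeeping in the used-capacity version: unreachable states must be marked (here with -inf), and the answer is spread across a whole row rather than sitting in one cell.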
The 0/1 Knapsack state is inherently two-dimensional: (item_index, capacity). Let's understand why both dimensions are necessary and how they interact.
Dimension 1: Item Index (i)
This dimension tracks which items we've processed or considered. At state i, we've made decisions for items 1 through i and are building toward the full solution.
- At state dp[i][c], only items 1 through i have been considered
- Each transition moves from i-1 to i (processing one item)

Dimension 2: Remaining Capacity (c)
This dimension tracks the resource constraint—how much capacity remains for future items.
- At state dp[i][c], exactly c capacity is available for future items

Think of the DP table as a grid where rows are items (progressing through decisions) and columns are capacities (tracking resource consumption). We fill it row by row, and each cell depends only on cells in the previous row. This dependency structure is what enables efficient computation.
State Space Size:
The total number of states is:
$$(n + 1) \times (W + 1) = O(n \cdot W)$$
This is pseudo-polynomial complexity—polynomial in the numeric value of W, but exponential in the number of bits needed to represent W. A capacity of W = 10^9 would create a billion states per item, which is infeasible. We'll discuss this limitation and workarounds later.
| n (items) | W (capacity) | States | Feasible? |
|---|---|---|---|
| 100 | 1,000 | 100,000 | ✓ Easy |
| 1,000 | 10,000 | 10,000,000 | ✓ Reasonable |
| 100 | 1,000,000 | 100,000,000 | ⚠️ Tight |
| 100 | 10^9 | 10^11 | ✗ Infeasible |
Understanding how states depend on each other is crucial for both correctness and optimization. In 0/1 Knapsack, the dependencies are particularly clean.
Dependency Rule:
dp[i][c] depends only on:
- dp[i-1][c] (exclude item i)
- dp[i-1][c-w[i]] (include item i, if c ≥ w[i])

Both dependencies are in row i-1:

```
Row i-1:    ...  dp[i-1][c-w[i]]   ...   dp[i-1][c]  ...
                        │                     │
                  include item i        exclude item i
                  (add v[i])            (keep same value)
                        └──────────┬──────────┘
                                   ▼
Row i:                 ...     dp[i][c]     ...
```

Key observations:
- Dependencies only point to row i-1
- c-w[i] is always ≤ c (leftward or same position)
- Each cell in row i can be computed independently

Implications of the Dependency Structure:
1. Fill Order:
We must complete row i-1 before starting row i. Within a row, cells can be filled in any order (left-to-right, right-to-left, or even in parallel).
2. Space Optimization: Since each row only depends on the previous row, we can reduce space from O(nW) to O(W) by keeping only two rows (or even one row with careful indexing). This is a critical optimization we'll explore later.
3. Parallelization: Cells within the same row are independent, making this amenable to parallel computation if W is large enough to benefit.
4. Correctness Guarantee: The acyclic dependency structure ensures we never access uncomputed values—a common source of bugs in DP implementations.
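The space optimization in point 2 can be sketched with a single reusable row. The key detail is the direction of the capacity loop (sample data invented for illustration):

```python
def knapsack_1d(weights, values, W):
    """Space-optimized 0/1 knapsack: one row of the DP table, O(W) space.
    Iterating c from W down to w[i] ensures dp[c - w[i]] still holds the
    previous row's value when we read it, so each item is used at most once."""
    dp = [0] * (W + 1)
    for wt, val in zip(weights, values):
        for c in range(W, wt - 1, -1):   # descending: preserves 0/1 semantics
            dp[c] = max(dp[c], dp[c - wt] + val)
    return dp[W]

# Invented sample instance: items (w=3,v=4), (w=4,v=5), (w=5,v=6), W=10
assert knapsack_1d([3, 4, 5], [4, 5, 6], 10) == 11
```

Iterating capacities in ascending order would instead allow an item to be counted multiple times, which is the unbounded-knapsack recurrence rather than 0/1.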
While the standard (item_index, capacity) formulation is canonical, understanding alternatives deepens our grasp of state design.
Formulation A: Subset-Based State (Exponential)
dp[S] = max value using exactly the items in subset S
This works but has 2^n states—no better than brute force. It fails to exploit the "capacity is all that matters" insight.
Formulation B: Value-Indexed State (The Dual)
dp[i][v] = minimum weight to achieve exactly value v using items 1..i
This is the dual formulation—instead of maximizing value for a given weight limit, we minimize weight for a given value. It's useful when values are bounded but weights are large.
Formulation C: Suffix Items
dp[i][c] = max value using items i..n with remaining capacity c
This is equivalent to the prefix formulation but processes items in reverse. Some find it more intuitive because it matches the recursive structure more naturally.
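As a sketch, the suffix formulation maps directly onto memoized recursion: best(i, c) asks "what is the best value obtainable from items i onward with c capacity left?" (sample data invented for illustration):

```python
from functools import lru_cache

def knapsack_suffix(weights, values, W):
    """Top-down suffix formulation: best(i, c) = max value using items
    i..n-1 (0-indexed) with remaining capacity c."""
    n = len(weights)

    @lru_cache(maxsize=None)
    def best(i, c):
        if i == n:                       # no items left to consider
            return 0
        result = best(i + 1, c)          # skip item i
        if weights[i] <= c:              # take item i if it fits
            result = max(result, best(i + 1, c - weights[i]) + values[i])
        return result

    return best(0, W)

# Invented sample instance: same answer as the bottom-up prefix version
assert knapsack_suffix([3, 4, 5], [4, 5, 6], 10) == 11
```

The cache keyed on (i, c) is exactly the (item_index, remaining_capacity) state space; the recursion simply fills it in demand order instead of row by row.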
| Formulation | State Space | Best When | Complexity |
|---|---|---|---|
| (item, capacity) | O(nW) | W is moderate | O(nW) |
| Subset-based | O(2^n) | Never (dominated) | O(2^n) |
| (item, value) | O(nV) | V << W (values small) | O(nV) |
| Meet-in-middle | O(2^(n/2)) | n ≤ 40, W huge | O(2^(n/2)) |
When capacity W is huge but the sum of values Σv[i] is moderate, the dual formulation dp[i][v] = min weight for value v is more efficient. This demonstrates how the 'right' state depends on the problem parameters, not just the problem structure.
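The dual state can be sketched as a minimum-weight table indexed by value; the item data below is invented, and the deliberately huge W shows why this formulation survives where the O(nW) table would not:

```python
def knapsack_by_value(weights, values, W):
    """Dual formulation: g[v] = minimum weight needed to achieve total
    value exactly v. Runs in O(n * V) where V = sum(values), so it is
    efficient when values are small even if W is astronomically large."""
    V = sum(values)
    INF = float("inf")
    g = [INF] * (V + 1)
    g[0] = 0                                  # zero value costs zero weight
    for wt, val in zip(weights, values):
        for v in range(V, val - 1, -1):       # descending: each item once
            if g[v - val] + wt < g[v]:
                g[v] = g[v - val] + wt
    # Answer: the largest value achievable within weight W
    return max(v for v in range(V + 1) if g[v] <= W)

# Invented sample: with W = 10**9 the O(nW) table is hopeless,
# but the value-indexed table has only sum(values) + 1 = 16 columns.
assert knapsack_by_value([3, 4, 5], [4, 5, 6], 10**9) == 15
```

Note the same descending-loop trick as the capacity-indexed version; only the axis being swept has changed.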
The 0/1 Knapsack state design illustrates principles applicable to all DP problems. When facing a new problem, apply these guidelines:
Principle 1: Identify What Matters for Future Decisions
Ask: "If I'm at step i, what do I need to know to make optimal decisions for steps i+1 onward?" For knapsack, only the remaining capacity matters—not which specific items we chose.
Principle 2: Find the Natural Decomposition
Most DP problems decompose along one or two dimensions:
- A position or prefix dimension (e.g., an index into a sequence: items, characters, days).
- A resource or budget dimension (e.g., remaining capacity, remaining moves, a target sum).
Principle 3: Minimize State Dimensions
Each state dimension multiplies complexity. Ask: "Is this dimension actually necessary, or can it be derived from others?" Unnecessary dimensions waste time and space.
Principle 4: Ensure Polynomial State Count
The state space size directly determines DP complexity. Choose state variables with bounded ranges. Avoid states that enumerate subsets or permutations directly.
Mastering state design is perhaps the most transferable skill in dynamic programming. Once you can identify the right state for a problem, the recurrence and implementation often follow naturally.
The 0/1 Knapsack problem is NP-Hard, yet our DP solution runs in O(nW) time. How can this be? The answer lies in understanding pseudo-polynomial complexity.
Polynomial vs Pseudo-Polynomial:
A truly polynomial algorithm runs in time polynomial in the input size (number of bits needed to represent the input). For knapsack:
- The n items, with weights and values up to some bound M, require O(n log M) bits to list
- The capacity W requires O(log W) bits

Total input size: O(n log M + log W) bits.
Our O(nW) algorithm is polynomial in n but exponential in log W (the number of bits representing W). If W = 2^k, the algorithm runs in O(n · 2^k) time—exponential in the input size!
When W is given in binary, O(nW) is not polynomial in the input size. With W = 10^9, we have 10^9 capacity states, yet log₂(10^9) ≈ 30, so we perform on the order of 2^30 operations per item for just 30 bits of input. The running time is exponential in the size of the representation, not polynomial.
When Pseudo-Polynomial Is Acceptable:
1. Small W in practice: many real problems have moderate capacities (W ≤ 10^6), making O(nW) practical.
2. W scales with n: if W = O(n) or W = O(n²), then O(nW) = O(n²) or O(n³), which are genuinely polynomial.
3. Approximation alternative: for large W, fully polynomial-time approximation schemes (FPTAS) achieve (1+ε)-approximations in time polynomial in n and 1/ε; the classic value-scaling scheme runs in O(n³/ε).
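A minimal sketch of the value-scaling idea behind the FPTAS: round values down to multiples of εv_max/n, then solve exactly on the small scaled values. The returned bound is within a (1−ε) factor of optimal (assuming every item fits on its own); the sample instance is invented:

```python
def knapsack_fptas(weights, values, W, eps):
    """Value-scaling FPTAS sketch: scale values down, solve the scaled
    instance exactly with the value-indexed DP, and report the (scaled-back)
    result, which is within a (1 - eps) factor of the true optimum."""
    n = len(values)
    vmax = max(values)
    scale = eps * vmax / n
    scaled = [int(v // scale) for v in values]        # rounded-down values

    # Exact DP over scaled values: min weight to reach each scaled total
    V = sum(scaled)
    INF = float("inf")
    g = [INF] * (V + 1)
    g[0] = 0
    for wt, sv in zip(weights, scaled):
        for t in range(V, sv - 1, -1):                # each item at most once
            if g[t - sv] + wt < g[t]:
                g[t] = g[t - sv] + wt

    best_scaled = max(t for t in range(V + 1) if g[t] <= W)
    return best_scaled * scale                        # lower bound on OPT

# Invented instance: true optimum is 110 ({50, 60}); with eps = 0.3 the
# scaled DP guarantees a value of at least (1 - 0.3) * 110 = 77.
approx = knapsack_fptas([3, 4, 5], [40, 50, 60], 10, 0.3)
assert approx >= 0.7 * 110
```

The state space is O(n · Σ scaled values) = O(n²/ε), independent of W, which is what makes the scheme fully polynomial.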
Implications for Problem Solving:
- Check the magnitude of W, not just n, before committing to the O(nW) table.
- If W is huge but the total value is moderate, switch to the dual state over values.
- When exact DP is infeasible, consider an FPTAS or meet-in-the-middle search.
We've taken a deep dive into the state design of the 0/1 Knapsack problem, uncovering principles that extend far beyond this specific algorithm.
What's Next:
With the state design firmly understood, we'll explore a powerful variant: the Unbounded Knapsack problem, where each item type can be selected multiple times. This changes the recurrence structure in subtle but important ways.
You now understand why 'capacity remaining' is the key state variable in knapsack problems and how this choice enables dynamic programming. These state design principles will serve you well across the full spectrum of DP problems.