Loading learning content...
The heart of the LCS algorithm is its DP table—a two-dimensional matrix where each cell represents the answer to a subproblem. Understanding how to construct this table is essential not just for LCS, but for building intuition that transfers to dozens of other dynamic programming problems.
In this page, we'll trace through the table construction process step by step, examining how each cell is computed, what information flows between cells, and how patterns in the table reveal properties of the input strings. By the end, you'll be able to construct an LCS table by hand and, more importantly, you'll deeply understand why it works.
This page focuses on developing visual and intuitive understanding of the DP table. You'll learn to read the table to understand string relationships, trace the computation flow, and recognize patterns that help with debugging and optimization.
The Table Dimensions
For strings X of length m and Y of length n, we create a table with:
Why the Extra Row and Column?
The zeroth row represents LCS(∅, Y[1..j]) — the LCS of the empty string with various prefixes of Y. Since the empty string has no characters, this LCS is always 0.
Similarly, the zeroth column represents LCS(X[1..i], ∅) — always 0.
These base cases are crucial: they provide the foundation from which all other values are computed.
The Computation Order
We fill the table in row-major order: left to right, top to bottom. This order is not arbitrary—it ensures that when we compute dp[i][j], the three cells it depends on have already been computed:
This dependency pattern is what makes the row-major filling order correct. Other valid orders exist (column-major, anti-diagonal), but row-major is most natural for most programming languages.
Imagine arrows pointing into each cell from its dependencies. For cell (i, j), there are arrows coming from (i-1, j-1), (i-1, j), and (i, j-1). The filling order must respect these arrows: we can only compute a cell after all cells that point to it are complete.
Each cell dp[i][j] is computed using exactly one of two rules, depending on whether the current characters match.
Rule 1: Characters Match (X[i] == Y[j])
dp[i][j] = dp[i-1][j-1] + 1
Intuition: When the characters match, we can extend the LCS of the shorter prefixes by including this matching character. We look diagonally (representing both strings being one character shorter) and add 1.
Rule 2: Characters Don't Match (X[i] ≠ Y[j])
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Intuition: When characters don't match, at least one of them cannot be in the LCS. We consider two possibilities:
We take the maximum because we want the longest common subsequence.
Notice that the diagonal cell (dp[i-1][j-1]) is only used when characters match. When they don't match, we never look diagonally—we only look up and left. This is because a mismatch means we cannot simultaneously include both X[i] and Y[j] in the LCS, so extending from the diagonal would be invalid.
Let's trace through the complete table construction for X = "ABCD" and Y = "AEBD". We'll compute every cell explicitly.
Step 1: Initialize the table with base cases
ε A E B D
ε 0 0 0 0 0
A 0 _ _ _ _
B 0 _ _ _ _
C 0 _ _ _ _
D 0 _ _ _ _
Step 2: Fill Row 1 (i = 1, X[1] = 'A')
ε A E B D
ε 0 0 0 0 0
A 0 1 1 1 1
B 0 _ _ _ _
C 0 _ _ _ _
D 0 _ _ _ _
Observation: Once we find the first 'A' match, the value stays at 1 across the row. This makes sense: X[1..1] = "A" has an LCS of length 1 with any prefix of Y that contains 'A'.
Step 3: Fill Row 2 (i = 2, X[2] = 'B')
ε A E B D
ε 0 0 0 0 0
A 0 1 1 1 1
B 0 1 1 2 2
C 0 _ _ _ _
D 0 _ _ _ _
Key Insight: When 'B' matches 'B' at position (2, 3), we look diagonally at (1, 2), which is 1 (the LCS of "A" and "AE"). Adding 1 gives us 2, representing the LCS "AB".
Step 4: Fill Row 3 (i = 3, X[3] = 'C')
ε A E B D
ε 0 0 0 0 0
A 0 1 1 1 1
B 0 1 1 2 2
C 0 1 1 2 2
D 0 _ _ _ _
Observation: 'C' doesn't appear in Y at all, so it cannot contribute to any LCS. The row maintains the values from the row above.
Step 5: Fill Row 4 (i = 4, X[4] = 'D')
Final Table:
ε A E B D
ε 0 0 0 0 0
A 0 1 1 1 1
B 0 1 1 2 2
C 0 1 1 2 2
D 0 1 1 2 3
The Answer: dp[4][4] = 3. The LCS of "ABCD" and "AEBD" has length 3.
From this table, we can determine that the LCS is "ABD". It comes from: A (matched at position 1,1), B (matched at position 2,3), and D (matched at position 4,4). We'll formalize the reconstruction process in the next section.
Understanding the patterns that emerge in LCS tables develops intuition that helps with debugging, optimization, and related problems.
Pattern 1: Monotonic Non-Decrease
As you move right or down in the table, values never decrease. Formally:
Why? Adding characters to a prefix can only maintain or improve the LCS, never shorten it.
Pattern 2: Maximum Increase of 1
The value can increase by at most 1 when moving to an adjacent cell. You'll never see a jump from 2 to 4, for example.
Why? Each step processes at most one character from each string. At most one new character can be added to the LCS.
Pattern 3: Diagonal Increases Indicate Matches
When dp[i][j] = dp[i-1][j-1] + 1, the characters X[i] and Y[j] matched. These diagonal increases trace the path of the LCS through the table.
Pattern 4: Plateaus Indicate No Contribution
When a large region of the table has the same value, it means those characters don't contribute to extending the LCS beyond what was already achieved.
Pattern 5: The Main Diagonal Matters Most
In strings that are similar (many matches), the largest values tend to cluster near the main diagonal. Dissimilar strings have their maximum values pushed toward the corners.
Pattern 6: Row/Column of Zeros
Beyond the base case, if Y[1..k] contains no characters from X, then the first k+1 entries of every row will be 0. This indicates a "dead zone" for the LCS.
When debugging LCS code, check these invariants: (1) First row/column should be all zeros, (2) Values should never decrease going right or down, (3) Values should never increase by more than 1 per cell, (4) The final answer should equal the number of diagonal increases in any path from (0,0) to (m,n).
Let's work through a more complex example: X = "AGGTAB" and Y = "GXTXAYB".
Initial Setup:
ε G X T X A Y B
ε 0 0 0 0 0 0 0 0
A 0 _ _ _ _ _ _ _
G 0 _ _ _ _ _ _ _
G 0 _ _ _ _ _ _ _
T 0 _ _ _ _ _ _ _
A 0 _ _ _ _ _ _ _
B 0 _ _ _ _ _ _ _
Row-by-Row Computation:
Row 1 (A): Matches at column 5 (A). Values: 0, 0, 0, 0, 0, 1, 1, 1
Row 2 (G): Matches at column 1 (G). Values: 0, 1, 1, 1, 1, 1, 1, 1
Row 3 (G): Matches at column 1 (G). Values: 0, 1, 1, 1, 1, 1, 1, 1
Row 4 (T): Matches at column 3 (T).
Row 5 (A): Matches at column 5 (A).
Row 6 (B): Matches at column 7 (B).
Complete Table:
ε G X T X A Y B
ε 0 0 0 0 0 0 0 0
A 0 0 0 0 0 1 1 1
G 0 1 1 1 1 1 1 1
G 0 1 1 1 1 1 1 1
T 0 1 1 2 2 2 2 2
A 0 1 1 2 2 3 3 3
B 0 1 1 2 2 3 3 4
The Answer: LCS length = 4.
The LCS itself: "GTAB"
This example also has another valid LCS: "GXAB" (G at position 1, X at position 2... wait, there's no second X in the first string). Actually, "GTAB" is the unique LCS here. In general, multiple distinct LCS of the same length may exist, but this particular example has only one.
While the basic algorithm is O(m × n), there are implementation techniques that can improve practical performance.
Technique 1: Early Termination
If we're computing LCS(X, Y) and we find a common subsequence of length min(m, n), we can stop immediately—the LCS cannot be longer than the shorter string.
Technique 2: Prefix/Suffix Optimization
If X and Y share a common prefix of length p and a common suffix of length s (where p + s ≤ min(m, n)):
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
def lcs_with_prefix_suffix_optimization(X: str, Y: str) -> int: """ LCS with prefix/suffix optimization for similar strings. When strings share common prefixes/suffixes, we can reduce the effective problem size significantly. """ m, n = len(X), len(Y) # Find common prefix length prefix_len = 0 while prefix_len < m and prefix_len < n and X[prefix_len] == Y[prefix_len]: prefix_len += 1 # Find common suffix length (avoiding overlap with prefix) suffix_len = 0 while (suffix_len < m - prefix_len and suffix_len < n - prefix_len and X[m - 1 - suffix_len] == Y[n - 1 - suffix_len]): suffix_len += 1 # If strings are identical, return length if prefix_len + suffix_len >= min(m, n): return min(m, n) # Extract middle portions X_middle = X[prefix_len:m - suffix_len] Y_middle = Y[prefix_len:n - suffix_len] # Compute LCS of middle portions middle_lcs = lcs_length(X_middle, Y_middle) # Total LCS = prefix + middle_lcs + suffix return prefix_len + middle_lcs + suffix_len def lcs_length(X: str, Y: str) -> int: """Standard LCS computation for reference.""" m, n = len(X), len(Y) dp = [[0] * (n + 1) for _ in range(m + 1)] for i in range(1, m + 1): for j in range(1, n + 1): if X[i - 1] == Y[j - 1]: dp[i][j] = dp[i - 1][j - 1] + 1 else: dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]) return dp[m][n] # Example: Comparing similar stringsX = "Hello, World! This is a test."Y = "Hello, World! That was a test." # Similar prefix and suffix # Without optimization: 30 × 31 = 930 cells# With optimization: Shares "Hello, World! Th" (16) prefix and "s a test." (9) suffix# Middle portions: "is i" and "at wa" → 4 × 5 = 20 cells print(f"LCS length: {lcs_with_prefix_suffix_optimization(X, Y)}")The prefix/suffix optimization is particularly valuable for diff algorithms (comparing file versions) where changes are typically localized. Two 10,000-line files that differ only in the middle 100 lines would otherwise require a 10,000 × 10,000 = 100 million cell table, but with optimization might only need a 100 × 100 = 10,000 cell table—a 10,000× speedup!
We've covered the complete mechanics of DP table construction for the LCS problem.
What's Next:
Now that we can compute the LCS length efficiently, the next crucial question is: how do we actually find the LCS itself? The next page covers LCS reconstruction—tracing back through the DP table to extract one or all longest common subsequences.
You now understand every aspect of DP table construction for the LCS problem. You can trace through examples by hand, recognize patterns in completed tables, and apply optimizations for similar strings. Next: reconstructing the actual LCS from the table.