Data Structures & AlgorithmsString Algorithms

Longest Common Subsequence (Revisited)

LevelIntermediate

Duration75 mins

TopicString Algorithms

2 / 4

DP Table Construction

Building the Solution Matrix

The heart of the LCS algorithm is its DP table—a two-dimensional matrix where each cell represents the answer to a subproblem. Understanding how to construct this table is essential not just for LCS, but for building intuition that transfers to dozens of other dynamic programming problems.

In this page, we'll trace through the table construction process step by step, examining how each cell is computed, what information flows between cells, and how patterns in the table reveal properties of the input strings. By the end, you'll be able to construct an LCS table by hand and, more importantly, you'll deeply understand why it works.

What You Will Master

This page focuses on developing visual and intuitive understanding of the DP table. You'll learn to read the table to understand string relationships, trace the computation flow, and recognize patterns that help with debugging and optimization.

Table Structure and Initialization

The Table Dimensions

For strings X of length m and Y of length n, we create a table with:

(m + 1) rows: One for each prefix of X, plus a row for the empty prefix
(n + 1) columns: One for each prefix of Y, plus a column for the empty prefix

Why the Extra Row and Column?

The zeroth row represents LCS(∅, Y[1..j]) — the LCS of the empty string with various prefixes of Y. Since the empty string has no characters, this LCS is always 0.

Similarly, the zeroth column represents LCS(X[1..i], ∅) — always 0.

These base cases are crucial: they provide the foundation from which all other values are computed.

Initial Table Setup for X = "ABCD" and Y = "AEBD"Before any computation, the table looks like this:

Input

Output

The Computation Order

We fill the table in row-major order: left to right, top to bottom. This order is not arbitrary—it ensures that when we compute dp[i][j], the three cells it depends on have already been computed:

dp[i-1][j-1]: The cell diagonally above-left (previous row, previous column)
dp[i-1][j]: The cell directly above (previous row, same column)
dp[i][j-1]: The cell directly to the left (same row, previous column)

This dependency pattern is what makes the row-major filling order correct. Other valid orders exist (column-major, anti-diagonal), but row-major is most natural for most programming languages.

Dependency Visualization

Imagine arrows pointing into each cell from its dependencies. For cell (i, j), there are arrows coming from (i-1, j-1), (i-1, j), and (i, j-1). The filling order must respect these arrows: we can only compute a cell after all cells that point to it are complete.

Cell Computation Rules

Each cell dp[i][j] is computed using exactly one of two rules, depending on whether the current characters match.

Rule 1: Characters Match (X[i] == Y[j])

dp[i][j] = dp[i-1][j-1] + 1

Intuition: When the characters match, we can extend the LCS of the shorter prefixes by including this matching character. We look diagonally (representing both strings being one character shorter) and add 1.

Rule 2: Characters Don't Match (X[i] ≠ Y[j])

dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Intuition: When characters don't match, at least one of them cannot be in the LCS. We consider two possibilities:

The LCS doesn't include X[i], so we take the best answer from X[1..i-1] and Y[1..j]
The LCS doesn't include Y[j], so we take the best answer from X[1..i] and Y[1..j-1]

We take the maximum because we want the longest common subsequence.

When Characters Match

•Look diagonally up-left
•Add 1 to that value
•This represents extending the LCS
•The matching character joins the subsequence
•Example: dp[2][3] = dp[1][2] + 1

When Characters Don't Match

•Look above and to the left
•Take the maximum of these two values
•This represents 'skipping' one character
•Neither character joins the LCS at this step
•Example: dp[2][3] = max(dp[1][3], dp[2][2])

The Diagonal Insight

Notice that the diagonal cell (dp[i-1][j-1]) is only used when characters match. When they don't match, we never look diagonally—we only look up and left. This is because a mismatch means we cannot simultaneously include both X[i] and Y[j] in the LCS, so extending from the diagonal would be invalid.

Complete Worked Example

Let's trace through the complete table construction for X = "ABCD" and Y = "AEBD". We'll compute every cell explicitly.

Step 1: Initialize the table with base cases

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    _    _    _    _
    B   0    _    _    _    _
    C   0    _    _    _    _
    D   0    _    _    _    _

Step 2: Fill Row 1 (i = 1, X[1] = 'A')

Cell (1, 1): X[1] = 'A', Y[1] = 'A'. Match! → dp[0][0] + 1 = 0 + 1 = 1
Cell (1, 2): X[1] = 'A', Y[2] = 'E'. No match → max(dp[0][2], dp[1][1]) = max(0, 1) = 1
Cell (1, 3): X[1] = 'A', Y[3] = 'B'. No match → max(dp[0][3], dp[1][2]) = max(0, 1) = 1
Cell (1, 4): X[1] = 'A', Y[4] = 'D'. No match → max(dp[0][4], dp[1][3]) = max(0, 1) = 1

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    _    _    _    _
    C   0    _    _    _    _
    D   0    _    _    _    _

Observation: Once we find the first 'A' match, the value stays at 1 across the row. This makes sense: X[1..1] = "A" has an LCS of length 1 with any prefix of Y that contains 'A'.

Step 3: Fill Row 2 (i = 2, X[2] = 'B')

Cell (2, 1): X[2] = 'B', Y[1] = 'A'. No match → max(dp[1][1], dp[2][0]) = max(1, 0) = 1
Cell (2, 2): X[2] = 'B', Y[2] = 'E'. No match → max(dp[1][2], dp[2][1]) = max(1, 1) = 1
Cell (2, 3): X[2] = 'B', Y[3] = 'B'. Match! → dp[1][2] + 1 = 1 + 1 = 2
Cell (2, 4): X[2] = 'B', Y[4] = 'D'. No match → max(dp[1][4], dp[2][3]) = max(1, 2) = 2

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    1    1    2    2
    C   0    _    _    _    _
    D   0    _    _    _    _

Key Insight: When 'B' matches 'B' at position (2, 3), we look diagonally at (1, 2), which is 1 (the LCS of "A" and "AE"). Adding 1 gives us 2, representing the LCS "AB".

Step 4: Fill Row 3 (i = 3, X[3] = 'C')

Cell (3, 1): X[3] = 'C', Y[1] = 'A'. No match → max(dp[2][1], dp[3][0]) = max(1, 0) = 1
Cell (3, 2): X[3] = 'C', Y[2] = 'E'. No match → max(dp[2][2], dp[3][1]) = max(1, 1) = 1
Cell (3, 3): X[3] = 'C', Y[3] = 'B'. No match → max(dp[2][3], dp[3][2]) = max(2, 1) = 2
Cell (3, 4): X[3] = 'C', Y[4] = 'D'. No match → max(dp[2][4], dp[3][3]) = max(2, 2) = 2

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    1    1    2    2
    C   0    1    1    2    2
    D   0    _    _    _    _

Observation: 'C' doesn't appear in Y at all, so it cannot contribute to any LCS. The row maintains the values from the row above.

Step 5: Fill Row 4 (i = 4, X[4] = 'D')

Cell (4, 1): X[4] = 'D', Y[1] = 'A'. No match → max(dp[3][1], dp[4][0]) = max(1, 0) = 1
Cell (4, 2): X[4] = 'D', Y[2] = 'E'. No match → max(dp[3][2], dp[4][1]) = max(1, 1) = 1
Cell (4, 3): X[4] = 'D', Y[3] = 'B'. No match → max(dp[3][3], dp[4][2]) = max(2, 1) = 2
Cell (4, 4): X[4] = 'D', Y[4] = 'D'. Match! → dp[3][3] + 1 = 2 + 1 = 3

Final Table:

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    1    1    2    2
    C   0    1    1    2    2
    D   0    1    1    2    3

The Answer: dp[4][4] = 3. The LCS of "ABCD" and "AEBD" has length 3.

Identifying the LCS

From this table, we can determine that the LCS is "ABD". It comes from: A (matched at position 1,1), B (matched at position 2,3), and D (matched at position 4,4). We'll formalize the reconstruction process in the next section.

Patterns in the Table

Understanding the patterns that emerge in LCS tables develops intuition that helps with debugging, optimization, and related problems.

Pattern 1: Monotonic Non-Decrease

As you move right or down in the table, values never decrease. Formally:

dp[i][j] ≥ dp[i-1][j] (moving down)
dp[i][j] ≥ dp[i][j-1] (moving right)
dp[i][j] ≥ dp[i-1][j-1] (moving diagonally)

Why? Adding characters to a prefix can only maintain or improve the LCS, never shorten it.

Pattern 2: Maximum Increase of 1

The value can increase by at most 1 when moving to an adjacent cell. You'll never see a jump from 2 to 4, for example.

Why? Each step processes at most one character from each string. At most one new character can be added to the LCS.

Pattern 3: Diagonal Increases Indicate Matches

When dp[i][j] = dp[i-1][j-1] + 1, the characters X[i] and Y[j] matched. These diagonal increases trace the path of the LCS through the table.

Pattern 4: Plateaus Indicate No Contribution

When a large region of the table has the same value, it means those characters don't contribute to extending the LCS beyond what was already achieved.

Pattern 5: The Main Diagonal Matters Most

In strings that are similar (many matches), the largest values tend to cluster near the main diagonal. Dissimilar strings have their maximum values pushed toward the corners.

Pattern 6: Row/Column of Zeros

Beyond the base case, if Y[1..k] contains no characters from X, then the first k+1 entries of every row will be 0. This indicates a "dead zone" for the LCS.

Recognizing PatternsAnalyzing the table for X = "ABCD" and Y = "AEBD"

Input

Output

Debugging with Patterns

When debugging LCS code, check these invariants: (1) First row/column should be all zeros, (2) Values should never decrease going right or down, (3) Values should never increase by more than 1 per cell, (4) The final answer should equal the number of diagonal increases in any path from (0,0) to (m,n).

Second Worked Example: Longer Strings

Let's work through a more complex example: X = "AGGTAB" and Y = "GXTXAYB".

Initial Setup:

        ε    G    X    T    X    A    Y    B
    ε   0    0    0    0    0    0    0    0
    A   0    _    _    _    _    _    _    _
    G   0    _    _    _    _    _    _    _
    G   0    _    _    _    _    _    _    _
    T   0    _    _    _    _    _    _    _
    A   0    _    _    _    _    _    _    _
    B   0    _    _    _    _    _    _    _

Row-by-Row Computation:

Row 1 (A): Matches at column 5 (A). Values: 0, 0, 0, 0, 0, 1, 1, 1

Row 2 (G): Matches at column 1 (G). Values: 0, 1, 1, 1, 1, 1, 1, 1

Row 3 (G): Matches at column 1 (G). Values: 0, 1, 1, 1, 1, 1, 1, 1

Note: The second 'G' in X doesn't increase the LCS because there's only one 'G' in Y.

Row 4 (T): Matches at column 3 (T).

Cell (4,3): X[4]='T', Y[3]='T'. Match! → dp[3][2] + 1 = 1 + 1 = 2
Values: 0, 1, 1, 2, 2, 2, 2, 2

Row 5 (A): Matches at column 5 (A).

Cell (5,5): X[5]='A', Y[5]='A'. Match! → dp[4][4] + 1 = 2 + 1 = 3
Values: 0, 1, 1, 2, 2, 3, 3, 3

Row 6 (B): Matches at column 7 (B).

Cell (6,7): X[6]='B', Y[7]='B'. Match! → dp[5][6] + 1 = 3 + 1 = 4
Values: 0, 1, 1, 2, 2, 3, 3, 4

Complete Table:

        ε    G    X    T    X    A    Y    B
    ε   0    0    0    0    0    0    0    0
    A   0    0    0    0    0    1    1    1
    G   0    1    1    1    1    1    1    1
    G   0    1    1    1    1    1    1    1
    T   0    1    1    2    2    2    2    2
    A   0    1    1    2    2    3    3    3
    B   0    1    1    2    2    3    3    4

The Answer: LCS length = 4.

The LCS itself: "GTAB"

G: matched at (2, 1)
T: matched at (4, 3)
A: matched at (5, 5)
B: matched at (6, 7)

Multiple Valid LCS

This example also has another valid LCS: "GXAB" (G at position 1, X at position 2... wait, there's no second X in the first string). Actually, "GTAB" is the unique LCS here. In general, multiple distinct LCS of the same length may exist, but this particular example has only one.

Efficient Table Construction Techniques

While the basic algorithm is O(m × n), there are implementation techniques that can improve practical performance.

Technique 1: Early Termination

If we're computing LCS(X, Y) and we find a common subsequence of length min(m, n), we can stop immediately—the LCS cannot be longer than the shorter string.

Technique 2: Prefix/Suffix Optimization

If X and Y share a common prefix of length p and a common suffix of length s (where p + s ≤ min(m, n)):

These characters are guaranteed to be in the LCS
We only need to compute LCS for the middle portions: X[p+1..m-s] and Y[p+1..n-s]
This reduces the table size significantly for similar strings

lcs_optimized_construction.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
def lcs_with_prefix_suffix_optimization(X: str, Y: str) -> int:
    """
    LCS with prefix/suffix optimization for similar strings.
    
    When strings share common prefixes/suffixes, we can reduce
    the effective problem size significantly.
    """
    m, n = len(X), len(Y)
    
    # Find common prefix length
    prefix_len = 0
    while prefix_len < m and prefix_len < n and X[prefix_len] == Y[prefix_len]:
        prefix_len += 1
    
    # Find common suffix length (avoiding overlap with prefix)
    suffix_len = 0
    while (suffix_len < m - prefix_len and 
           suffix_len < n - prefix_len and
           X[m - 1 - suffix_len] == Y[n - 1 - suffix_len]):
        suffix_len += 1
    
    # If strings are identical, return length
    if prefix_len + suffix_len >= min(m, n):
        return min(m, n)
    
    # Extract middle portions
    X_middle = X[prefix_len:m - suffix_len]
    Y_middle = Y[prefix_len:n - suffix_len]
    
    # Compute LCS of middle portions
    middle_lcs = lcs_length(X_middle, Y_middle)
    
    # Total LCS = prefix + middle_lcs + suffix
    return prefix_len + middle_lcs + suffix_len
 
 
def lcs_length(X: str, Y: str) -> int:
    """Standard LCS computation for reference."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    return dp[m][n]
 
 
# Example: Comparing similar strings
X = "Hello, World! This is a test."
Y = "Hello, World! That was a test."  # Similar prefix and suffix
 
# Without optimization: 30 × 31 = 930 cells
# With optimization: Shares "Hello, World! Th" (16) prefix and "s a test." (9) suffix
# Middle portions: "is i" and "at wa" → 4 × 5 = 20 cells
 
print(f"LCS length: {lcs_with_prefix_suffix_optimization(X, Y)}")

Practical Impact

The prefix/suffix optimization is particularly valuable for diff algorithms (comparing file versions) where changes are typically localized. Two 10,000-line files that differ only in the middle 100 lines would otherwise require a 10,000 × 10,000 = 100 million cell table, but with optimization might only need a 100 × 100 = 10,000 cell table—a 10,000× speedup!

Summary: Mastering Table Construction

We've covered the complete mechanics of DP table construction for the LCS problem.

Key Takeaways

•Table dimensions are (m+1) × (n+1) — The extra row and column handle base cases (empty prefixes).
•Base cases are all zeros — Empty strings have no common subsequence with anything.
•Two computation rules — Match (diagonal + 1) or no match (max of top and left).
•Row-major order ensures dependencies are met — Each cell's prerequisites are computed before it.
•Patterns reveal string relationships — Monotonicity, plateaus, and diagonal increases tell a story.
•Practical optimizations exist — Prefix/suffix trimming can dramatically reduce work for similar strings.

What's Next:

Now that we can compute the LCS length efficiently, the next crucial question is: how do we actually find the LCS itself? The next page covers LCS reconstruction—tracing back through the DP table to extract one or all longest common subsequences.

Page Complete

You now understand every aspect of DP table construction for the LCS problem. You can trace through examples by hand, recognize patterns in completed tables, and apply optimizations for similar strings. Next: reconstructing the actual LCS from the table.

2 / 4

Loading learning content...

Data Structures & AlgorithmsString Algorithms

Longest Common Subsequence (Revisited)

LevelIntermediate

Duration75 mins

TopicString Algorithms

2 / 4

DP Table Construction

Building the Solution Matrix

What You Will Master

Table Structure and Initialization

The Table Dimensions

For strings X of length m and Y of length n, we create a table with:

(m + 1) rows: One for each prefix of X, plus a row for the empty prefix
(n + 1) columns: One for each prefix of Y, plus a column for the empty prefix

Why the Extra Row and Column?

The zeroth row represents LCS(∅, Y[1..j]) — the LCS of the empty string with various prefixes of Y. Since the empty string has no characters, this LCS is always 0.

Similarly, the zeroth column represents LCS(X[1..i], ∅) — always 0.

These base cases are crucial: they provide the foundation from which all other values are computed.

Initial Table Setup for X = "ABCD" and Y = "AEBD"Before any computation, the table looks like this:

Input

Output

The Computation Order

dp[i-1][j-1]: The cell diagonally above-left (previous row, previous column)
dp[i-1][j]: The cell directly above (previous row, same column)
dp[i][j-1]: The cell directly to the left (same row, previous column)

This dependency pattern is what makes the row-major filling order correct. Other valid orders exist (column-major, anti-diagonal), but row-major is most natural for most programming languages.

Dependency Visualization

Cell Computation Rules

Each cell dp[i][j] is computed using exactly one of two rules, depending on whether the current characters match.

Rule 1: Characters Match (X[i] == Y[j])

dp[i][j] = dp[i-1][j-1] + 1

Rule 2: Characters Don't Match (X[i] ≠ Y[j])

dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Intuition: When characters don't match, at least one of them cannot be in the LCS. We consider two possibilities:

The LCS doesn't include X[i], so we take the best answer from X[1..i-1] and Y[1..j]
The LCS doesn't include Y[j], so we take the best answer from X[1..i] and Y[1..j-1]

We take the maximum because we want the longest common subsequence.

When Characters Match

•Look diagonally up-left
•Add 1 to that value
•This represents extending the LCS
•The matching character joins the subsequence
•Example: dp[2][3] = dp[1][2] + 1

When Characters Don't Match

•Look above and to the left
•Take the maximum of these two values
•This represents 'skipping' one character
•Neither character joins the LCS at this step
•Example: dp[2][3] = max(dp[1][3], dp[2][2])

The Diagonal Insight

Complete Worked Example

Let's trace through the complete table construction for X = "ABCD" and Y = "AEBD". We'll compute every cell explicitly.

Step 1: Initialize the table with base cases

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    _    _    _    _
    B   0    _    _    _    _
    C   0    _    _    _    _
    D   0    _    _    _    _

Step 2: Fill Row 1 (i = 1, X[1] = 'A')

Cell (1, 1): X[1] = 'A', Y[1] = 'A'. Match! → dp[0][0] + 1 = 0 + 1 = 1
Cell (1, 2): X[1] = 'A', Y[2] = 'E'. No match → max(dp[0][2], dp[1][1]) = max(0, 1) = 1
Cell (1, 3): X[1] = 'A', Y[3] = 'B'. No match → max(dp[0][3], dp[1][2]) = max(0, 1) = 1
Cell (1, 4): X[1] = 'A', Y[4] = 'D'. No match → max(dp[0][4], dp[1][3]) = max(0, 1) = 1

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    _    _    _    _
    C   0    _    _    _    _
    D   0    _    _    _    _

Observation: Once we find the first 'A' match, the value stays at 1 across the row. This makes sense: X[1..1] = "A" has an LCS of length 1 with any prefix of Y that contains 'A'.

Step 3: Fill Row 2 (i = 2, X[2] = 'B')

Cell (2, 1): X[2] = 'B', Y[1] = 'A'. No match → max(dp[1][1], dp[2][0]) = max(1, 0) = 1
Cell (2, 2): X[2] = 'B', Y[2] = 'E'. No match → max(dp[1][2], dp[2][1]) = max(1, 1) = 1
Cell (2, 3): X[2] = 'B', Y[3] = 'B'. Match! → dp[1][2] + 1 = 1 + 1 = 2
Cell (2, 4): X[2] = 'B', Y[4] = 'D'. No match → max(dp[1][4], dp[2][3]) = max(1, 2) = 2

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    1    1    2    2
    C   0    _    _    _    _
    D   0    _    _    _    _

Key Insight: When 'B' matches 'B' at position (2, 3), we look diagonally at (1, 2), which is 1 (the LCS of "A" and "AE"). Adding 1 gives us 2, representing the LCS "AB".

Step 4: Fill Row 3 (i = 3, X[3] = 'C')

Cell (3, 1): X[3] = 'C', Y[1] = 'A'. No match → max(dp[2][1], dp[3][0]) = max(1, 0) = 1
Cell (3, 2): X[3] = 'C', Y[2] = 'E'. No match → max(dp[2][2], dp[3][1]) = max(1, 1) = 1
Cell (3, 3): X[3] = 'C', Y[3] = 'B'. No match → max(dp[2][3], dp[3][2]) = max(2, 1) = 2
Cell (3, 4): X[3] = 'C', Y[4] = 'D'. No match → max(dp[2][4], dp[3][3]) = max(2, 2) = 2

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    1    1    2    2
    C   0    1    1    2    2
    D   0    _    _    _    _

Observation: 'C' doesn't appear in Y at all, so it cannot contribute to any LCS. The row maintains the values from the row above.

Step 5: Fill Row 4 (i = 4, X[4] = 'D')

Cell (4, 1): X[4] = 'D', Y[1] = 'A'. No match → max(dp[3][1], dp[4][0]) = max(1, 0) = 1
Cell (4, 2): X[4] = 'D', Y[2] = 'E'. No match → max(dp[3][2], dp[4][1]) = max(1, 1) = 1
Cell (4, 3): X[4] = 'D', Y[3] = 'B'. No match → max(dp[3][3], dp[4][2]) = max(2, 1) = 2
Cell (4, 4): X[4] = 'D', Y[4] = 'D'. Match! → dp[3][3] + 1 = 2 + 1 = 3

Final Table:

        ε    A    E    B    D
    ε   0    0    0    0    0
    A   0    1    1    1    1
    B   0    1    1    2    2
    C   0    1    1    2    2
    D   0    1    1    2    3

The Answer: dp[4][4] = 3. The LCS of "ABCD" and "AEBD" has length 3.

Identifying the LCS

Patterns in the Table

Understanding the patterns that emerge in LCS tables develops intuition that helps with debugging, optimization, and related problems.

Pattern 1: Monotonic Non-Decrease

As you move right or down in the table, values never decrease. Formally:

dp[i][j] ≥ dp[i-1][j] (moving down)
dp[i][j] ≥ dp[i][j-1] (moving right)
dp[i][j] ≥ dp[i-1][j-1] (moving diagonally)

Why? Adding characters to a prefix can only maintain or improve the LCS, never shorten it.

Pattern 2: Maximum Increase of 1

The value can increase by at most 1 when moving to an adjacent cell. You'll never see a jump from 2 to 4, for example.

Why? Each step processes at most one character from each string. At most one new character can be added to the LCS.

Pattern 3: Diagonal Increases Indicate Matches

When dp[i][j] = dp[i-1][j-1] + 1, the characters X[i] and Y[j] matched. These diagonal increases trace the path of the LCS through the table.

Pattern 4: Plateaus Indicate No Contribution

When a large region of the table has the same value, it means those characters don't contribute to extending the LCS beyond what was already achieved.

Pattern 5: The Main Diagonal Matters Most

In strings that are similar (many matches), the largest values tend to cluster near the main diagonal. Dissimilar strings have their maximum values pushed toward the corners.

Pattern 6: Row/Column of Zeros

Beyond the base case, if Y[1..k] contains no characters from X, then the first k+1 entries of every row will be 0. This indicates a "dead zone" for the LCS.

Recognizing PatternsAnalyzing the table for X = "ABCD" and Y = "AEBD"

Input

Output

Debugging with Patterns

Second Worked Example: Longer Strings

Let's work through a more complex example: X = "AGGTAB" and Y = "GXTXAYB".

Initial Setup:

        ε    G    X    T    X    A    Y    B
    ε   0    0    0    0    0    0    0    0
    A   0    _    _    _    _    _    _    _
    G   0    _    _    _    _    _    _    _
    G   0    _    _    _    _    _    _    _
    T   0    _    _    _    _    _    _    _
    A   0    _    _    _    _    _    _    _
    B   0    _    _    _    _    _    _    _

Row-by-Row Computation:

Row 1 (A): Matches at column 5 (A). Values: 0, 0, 0, 0, 0, 1, 1, 1

Row 2 (G): Matches at column 1 (G). Values: 0, 1, 1, 1, 1, 1, 1, 1

Row 3 (G): Matches at column 1 (G). Values: 0, 1, 1, 1, 1, 1, 1, 1

Note: The second 'G' in X doesn't increase the LCS because there's only one 'G' in Y.

Row 4 (T): Matches at column 3 (T).

Cell (4,3): X[4]='T', Y[3]='T'. Match! → dp[3][2] + 1 = 1 + 1 = 2
Values: 0, 1, 1, 2, 2, 2, 2, 2

Row 5 (A): Matches at column 5 (A).

Cell (5,5): X[5]='A', Y[5]='A'. Match! → dp[4][4] + 1 = 2 + 1 = 3
Values: 0, 1, 1, 2, 2, 3, 3, 3

Row 6 (B): Matches at column 7 (B).

Cell (6,7): X[6]='B', Y[7]='B'. Match! → dp[5][6] + 1 = 3 + 1 = 4
Values: 0, 1, 1, 2, 2, 3, 3, 4

Complete Table:

        ε    G    X    T    X    A    Y    B
    ε   0    0    0    0    0    0    0    0
    A   0    0    0    0    0    1    1    1
    G   0    1    1    1    1    1    1    1
    G   0    1    1    1    1    1    1    1
    T   0    1    1    2    2    2    2    2
    A   0    1    1    2    2    3    3    3
    B   0    1    1    2    2    3    3    4

The Answer: LCS length = 4.

The LCS itself: "GTAB"

G: matched at (2, 1)
T: matched at (4, 3)
A: matched at (5, 5)
B: matched at (6, 7)

Multiple Valid LCS

Efficient Table Construction Techniques

While the basic algorithm is O(m × n), there are implementation techniques that can improve practical performance.

Technique 1: Early Termination

If we're computing LCS(X, Y) and we find a common subsequence of length min(m, n), we can stop immediately—the LCS cannot be longer than the shorter string.

Technique 2: Prefix/Suffix Optimization

If X and Y share a common prefix of length p and a common suffix of length s (where p + s ≤ min(m, n)):

These characters are guaranteed to be in the LCS
We only need to compute LCS for the middle portions: X[p+1..m-s] and Y[p+1..n-s]
This reduces the table size significantly for similar strings

lcs_optimized_construction.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
def lcs_with_prefix_suffix_optimization(X: str, Y: str) -> int:
    """
    LCS with prefix/suffix optimization for similar strings.
    
    When strings share common prefixes/suffixes, we can reduce
    the effective problem size significantly.
    """
    m, n = len(X), len(Y)
    
    # Find common prefix length
    prefix_len = 0
    while prefix_len < m and prefix_len < n and X[prefix_len] == Y[prefix_len]:
        prefix_len += 1
    
    # Find common suffix length (avoiding overlap with prefix)
    suffix_len = 0
    while (suffix_len < m - prefix_len and 
           suffix_len < n - prefix_len and
           X[m - 1 - suffix_len] == Y[n - 1 - suffix_len]):
        suffix_len += 1
    
    # If strings are identical, return length
    if prefix_len + suffix_len >= min(m, n):
        return min(m, n)
    
    # Extract middle portions
    X_middle = X[prefix_len:m - suffix_len]
    Y_middle = Y[prefix_len:n - suffix_len]
    
    # Compute LCS of middle portions
    middle_lcs = lcs_length(X_middle, Y_middle)
    
    # Total LCS = prefix + middle_lcs + suffix
    return prefix_len + middle_lcs + suffix_len
 
 
def lcs_length(X: str, Y: str) -> int:
    """Standard LCS computation for reference."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    return dp[m][n]
 
 
# Example: Comparing similar strings
X = "Hello, World! This is a test."
Y = "Hello, World! That was a test."  # Similar prefix and suffix
 
# Without optimization: 30 × 31 = 930 cells
# With optimization: Shares "Hello, World! Th" (16) prefix and "s a test." (9) suffix
# Middle portions: "is i" and "at wa" → 4 × 5 = 20 cells
 
print(f"LCS length: {lcs_with_prefix_suffix_optimization(X, Y)}")

Practical Impact

Summary: Mastering Table Construction

We've covered the complete mechanics of DP table construction for the LCS problem.

Key Takeaways

•Table dimensions are (m+1) × (n+1) — The extra row and column handle base cases (empty prefixes).
•Base cases are all zeros — Empty strings have no common subsequence with anything.
•Two computation rules — Match (diagonal + 1) or no match (max of top and left).
•Row-major order ensures dependencies are met — Each cell's prerequisites are computed before it.
•Patterns reveal string relationships — Monotonicity, plateaus, and diagonal increases tell a story.
•Practical optimizations exist — Prefix/suffix trimming can dramatically reduce work for similar strings.

What's Next:

Page Complete

2 / 4