Data Structures & Algorithms2D DP Problems — Grid Paths & LCS

2D Dynamic Programming — Grid Paths & Longest Common Subsequence

LevelIntermediate

Duration75 mins

Topic2D DP Problems — Grid Paths & LCS

2 / 4

Longest Common Subsequence

From Grids to Sequences: The LCS Problem

The Longest Common Subsequence (LCS) problem is one of the most celebrated problems in computer science. It asks a deceptively simple question:

Given two sequences, what is the longest sequence that appears as a subsequence in both?

Unlike substrings (which must be contiguous), a subsequence can skip elements while maintaining relative order. For example, "ACE" is a subsequence of "ABCDE" (skip B, D), but "AEC" is not (order violated).

LCS lies at the heart of:

Unix diff utility: Comparing files and showing changes
Version control systems: Git, SVN, Mercurial merge algorithms
Bioinformatics: DNA/protein sequence alignment
Plagiarism detection: Finding common text fragments
Spell checkers: Computing edit distance (related problem)

The algorithm we'll develop is elegant, instructive, and foundational to understanding how DP handles sequence problems.

What You Will Learn

By the end of this page, you will understand the LCS problem deeply: its recursive structure, the 2D DP formulation, time and space complexity, path reconstruction to find the actual subsequence, and connections to related problems like edit distance. This is essential knowledge for any serious algorithm practitioner.

Understanding the LCS Problem

Formal Definition:

Given two strings X of length m and Y of length n, find the length of the longest sequence Z such that:

Z is a subsequence of X
Z is a subsequence of Y

Subsequence vs Substring:

Concept	Definition	Example from "ALGORITHM"
Substring	Contiguous characters	"LGOR", "ALGO", "RITHM"
Subsequence	Characters in order (can skip)	"AGRM", "LRTM", "AGORITHM"

Examples:

X = "ABCDGH"
Y = "AEDFHR"
LCS = "ADH" (length 3)

X = "AGGTAB"
Y = "GXTXAYB"
LCS = "GTAB" (length 4)

LCS May Not Be Unique

Multiple LCS of the same length may exist. For X="ABCBDAB" and Y="BDCABA", both "BDAB" and "BCAB" are valid LCS (length 4). The algorithm finds the length; reconstructing one specific LCS requires additional work.

Why LCS is Computationally Interesting:

A naive approach would enumerate all subsequences of X (2^m possibilities) and check which appear in Y. This is exponential—completely impractical for strings of even moderate length.

The breakthrough insight: LCS has optimal substructure and overlapping subproblems, making it perfectly suited for dynamic programming. The same LCS algorithm that seemed impossibly slow becomes O(m×n)—practical for strings with millions of characters.

Discovering the Recursive Structure

To develop the DP solution, we first need to understand the problem's recursive structure. Consider strings X = X[0..m-1] and Y = Y[0..n-1].

The Key Insight:

Let LCS(i, j) denote the length of LCS of X[0..i-1] and Y[0..j-1] (the first i characters of X and first j characters of Y).

Case 1: Last characters match

If X[i-1] == Y[j-1], this matching character must be part of some LCS. Why? If we excluded it, we couldn't do better (matching gives us at least one character).

LCS(i, j) = 1 + LCS(i-1, j-1)

Case 2: Last characters don't match

If X[i-1] ≠ Y[j-1], at least one of them is not in the LCS. Either:

The LCS doesn't include X[i-1], so LCS(i, j) = LCS(i-1, j), or
The LCS doesn't include Y[j-1], so LCS(i, j) = LCS(i, j-1)

We take the maximum:

LCS(i, j) = max(LCS(i-1, j), LCS(i, j-1))

Base Cases:

LCS(0, j) = 0 for all j (empty X has no common subsequence)
LCS(i, 0) = 0 for all i (empty Y has no common subsequence)

Converting Mermaid diagram...

lcs_recursive.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def lcs_recursive(X: str, Y: str) -> int:
    """
    Naive recursive LCS - exponential time complexity.
    
    Time Complexity: O(2^(m+n)) in worst case
    Space Complexity: O(m + n) for recursion stack
    """
    def lcs(i: int, j: int) -> int:
        # Base case: one string is empty
        if i == 0 or j == 0:
            return 0
        
        # Case 1: Last characters match
        if X[i - 1] == Y[j - 1]:
            return 1 + lcs(i - 1, j - 1)
        
        # Case 2: Last characters don't match
        else:
            return max(lcs(i - 1, j), lcs(i, j - 1))
    
    return lcs(len(X), len(Y))
 
 
# Example
X = "AGGTAB"
Y = "GXTXAYB"
print(f"LCS length: {lcs_recursive(X, Y)}")  # Output: 4

Exponential Explosion

The naive recursion is disastrously slow. For strings like "AAAA...A" (all same characters), nearly every call branches into two recursive calls. For two 20-character strings, this can require over a million calls. We need memoization.

Memoization: Top-Down DP

The recursive structure reveals overlapping subproblems. Consider computing LCS(i, j): it may call LCS(i-1, j) and LCS(i, j-1), both of which may independently call LCS(i-1, j-1). This overlap multiplies as the recursion deepens.

The Solution: Memoization

Cache each (i, j) result in a 2D table. The recursion tree collapses from exponential branches to at most m × n unique computations.

lcs_memoization.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def lcs_memoization(X: str, Y: str) -> int:
    """
    LCS with memoization - polynomial time complexity.
    
    Time Complexity: O(m * n) - each state computed once
    Space Complexity: O(m * n) for memo table + O(m + n) for stack
    """
    m, n = len(X), len(Y)
    
    # Memoization table: -1 means not yet computed
    memo = [[-1] * (n + 1) for _ in range(m + 1)]
    
    def lcs(i: int, j: int) -> int:
        # Base case
        if i == 0 or j == 0:
            return 0
        
        # Return cached result if available
        if memo[i][j] != -1:
            return memo[i][j]
        
        # Compute based on whether characters match
        if X[i - 1] == Y[j - 1]:
            memo[i][j] = 1 + lcs(i - 1, j - 1)
        else:
            memo[i][j] = max(lcs(i - 1, j), lcs(i, j - 1))
        
        return memo[i][j]
    
    return lcs(m, n)
 
 
# Now handles large strings efficiently
X = "AGGTAB" * 100  # 600 characters
Y = "GXTXAYB" * 100  # 700 characters
print(f"LCS length: {lcs_memoization(X, Y)}")

Complexity Improvement:

Aspect	Naive Recursion	Memoization
Time	O(2^(m+n))	O(m × n)
States computed	Exponential duplicates	Each state once
20×20 strings	~1 million calls	400 unique states
1000×1000 strings	Infeasible	1 million states (fast)

Memoization transforms an unusable algorithm into a practical one. But we can do better by eliminating recursion entirely.

Tabulation: Bottom-Up DP

The tabulation approach builds the solution iteratively, avoiding recursion overhead and potential stack overflow. We create an (m+1) × (n+1) table dp where dp[i][j] represents the LCS length of X[0..i-1] and Y[0..j-1].

Table Construction:

Initialize first row and first column to 0 (base cases)
Fill remaining cells using the recurrence:
- If X[i-1] == Y[j-1]: dp[i][j] = 1 + dp[i-1][j-1]
- Otherwise: dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Answer is at dp[m][n]

lcs_tabulation.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def lcs_tabulation(X: str, Y: str) -> int:
    """
    LCS using tabulation (bottom-up DP).
    
    Time Complexity: O(m * n)
    Space Complexity: O(m * n)
    """
    m, n = len(X), len(Y)
    
    # Create (m+1) x (n+1) table, initialized to 0
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    # Fill the table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                # Characters match: extend LCS from diagonal
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                # No match: take max of excluding one character
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    return dp[m][n]
 
 
def lcs_with_table_visualization(X: str, Y: str) -> int:
    """Show the DP table for understanding."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    # Print table with headers
    print("\n    " + "  ".join(f"{c}" for c in " " + Y))
    for i, row in enumerate(dp):
        prefix = X[i - 1] if i > 0 else " "
        print(f"{prefix}   " + "  ".join(f"{x}" for x in row))
    
    return dp[m][n]
 
 
X = "ABCDGH"
Y = "AEDFHR"
print(f"\nLCS length: {lcs_with_table_visualization(X, Y)}")

Visualizing the DP Table for X="ABCDGH", Y="AEDFHR":

      A  E  D  F  H  R
   0  0  0  0  0  0  0
A  0  1  1  1  1  1  1
B  0  1  1  1  1  1  1
C  0  1  1  1  1  1  1
D  0  1  1  2  2  2  2
G  0  1  1  2  2  2  2
H  0  1  1  2  2  3  3

Reading the Table:

Cell (4, 3) = 2: LCS of "ABCD" and "AED" has length 2 ("AD")
Cell (6, 5) = 3: LCS of "ABCDGH" and "AEDFH" has length 3 ("ADH")
The value increases when we find a matching character that extends a previous LCS
The "flow" of increasing values traces the LCS characters

Understanding the Table Geometry

In the LCS table, increases happen at diagonal moves when characters match. Horizontal and vertical moves maintain the previous maximum. The "staircase" pattern of increasing values corresponds to matched characters in the LCS.

Reconstructing the Actual LCS

The DP table gives us the length, but often we need the actual subsequence. We can reconstruct it by tracing back through the table, following the decisions that led to each cell's value.

Backtracking Algorithm:

Start at dp[m][n] and trace backward:

If X[i-1] == Y[j-1]: This character is in the LCS. Add it, move diagonally to (i-1, j-1).
Otherwise: Move to the larger of dp[i-1][j] or dp[i][j-1].
Stop when i == 0 or j == 0.

Since we trace backward, the LCS is built in reverse order—reverse it at the end.

lcs_reconstruct.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
def lcs_with_reconstruction(X: str, Y: str) -> tuple[int, str]:
    """
    Compute LCS length and reconstruct the actual LCS string.
    
    Time Complexity: O(m * n) for DP + O(m + n) for reconstruction
    Space Complexity: O(m * n)
    
    Returns:
        Tuple of (length, LCS string)
    """
    m, n = len(X), len(Y)
    
    # Build the DP table as before
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    # Reconstruct LCS by backtracking
    lcs_chars = []
    i, j = m, n
    
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            # This character is part of LCS
            lcs_chars.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            # Came from above
            i -= 1
        else:
            # Came from left (or equal, arbitrary choice)
            j -= 1
    
    # Reverse since we built it backwards
    lcs_string = "".join(reversed(lcs_chars))
    
    return dp[m][n], lcs_string
 
 
# Example
X = "AGGTAB"
Y = "GXTXAYB"
length, lcs = lcs_with_reconstruction(X, Y)
print(f"X = '{X}'")
print(f"Y = '{Y}'")
print(f"LCS = '{lcs}' (length {length})")
# Output: LCS = 'GTAB' (length 4)

Converting Mermaid diagram...

Finding All LCS

When dp[i-1][j] == dp[i][j-1], we have a choice of which way to go, leading to different but equally valid LCS. To find all LCS, use backtracking with branching at these tie points. The number of distinct LCS can be exponential in pathological cases.

Space Optimization

Like grid paths, LCS can be space-optimized. Each cell only depends on:

The cell directly above: dp[i-1][j]
The cell to the left: dp[i][j-1]
The cell diagonally above-left: dp[i-1][j-1]

We can reduce from O(m×n) to O(min(m, n)) using two rows or even one row with a saved diagonal value.

Important Caveat: Space optimization destroys the backtracking path. If you need to reconstruct the LCS string, keep the full table. If you only need the length, optimize space.

lcs_space_optimized.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
def lcs_space_optimized(X: str, Y: str) -> int:
    """
    Space-optimized LCS using two rows.
    
    Time Complexity: O(m * n)
    Space Complexity: O(min(m, n))
    """
    # Ensure Y is the shorter string for optimal space
    if len(X) < len(Y):
        X, Y = Y, X
    
    m, n = len(X), len(Y)
    
    # Two rows of size n+1
    prev = [0] * (n + 1)
    curr = [0] * (n + 1)
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = 1 + prev[j - 1]
            else:
                curr[j] = max(prev[j], curr[j - 1])
        
        # Swap rows
        prev, curr = curr, prev
    
    return prev[n]
 
 
def lcs_single_row(X: str, Y: str) -> int:
    """
    Even more optimized: single row with diagonal tracking.
    
    Time Complexity: O(m * n)
    Space Complexity: O(min(m, n))
    """
    if len(X) < len(Y):
        X, Y = Y, X
    
    m, n = len(X), len(Y)
    dp = [0] * (n + 1)
    
    for i in range(1, m + 1):
        prev_diagonal = 0  # dp[i-1][j-1] before update
        
        for j in range(1, n + 1):
            # Save current value before overwriting (it becomes prev_diagonal for next j)
            temp = dp[j]
            
            if X[i - 1] == Y[j - 1]:
                dp[j] = 1 + prev_diagonal
            else:
                dp[j] = max(dp[j], dp[j - 1])
            
            prev_diagonal = temp
    
    return dp[n]
 
 
# Verify both give same results
X = "ABCDGH"
Y = "AEDFHR"
print(f"Two rows: {lcs_space_optimized(X, Y)}")  # 3
print(f"Single row: {lcs_single_row(X, Y)}")     # 3

LCS Space Complexity Options
Approach	Space	Can Reconstruct LCS?
Full 2D Table	O(m × n)	Yes
Two Rows	O(min(m, n))	No
Single Row	O(min(m, n))	No
Hirschberg Algorithm	O(min(m, n))	Yes! (Advanced)

Hirschberg's Algorithm

The Hirschberg algorithm achieves O(min(m,n)) space while still allowing LCS reconstruction. It uses a divide-and-conquer approach: find the "midpoint" of the LCS using forward and backward passes, then recurse. This is the standard algorithm for memory-constrained sequence alignment in bioinformatics.

Applications and Variations

LCS is the foundation for many practical algorithms. Understanding it unlocks a family of related problems.

1. Diff and Patch Algorithms:

The Unix diff utility computes the LCS of two files (treating lines as characters). The difference between files is everything not in the LCS. This enables minimal change descriptions for version control.

2. Edit Distance (Levenshtein Distance):

Closely related to LCS. If lcs_length is the LCS length, then:

edit_distance = (m - lcs_length) + (n - lcs_length)

Or equivalently: the number of insertions + deletions to transform X into Y.

3. Sequence Alignment in Bioinformatics:

DNA sequence comparison uses weighted LCS variants. The Smith-Waterman algorithm extends LCS with:

Match/mismatch scores
Gap penalties (consecutive missing characters)
Local alignment (find best matching subregions)

LCS Variations

•Longest Common Substring — Must be contiguous, different DP formulation
•Shortest Common Supersequence — Smallest string containing both as subsequences
•LCS of 3+ Strings — Extends to higher dimensions
•Longest Palindromic Subsequence — LCS(s, reverse(s))

Real-World Uses

•Git — Three-way merge uses LCS to find common base
•BLAST — Biological sequence comparison
•Plagiarism detection — Finding copied text fragments
•Spell checkers — Edit distance for suggestions

lcs_applications.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def longest_palindromic_subsequence(s: str) -> int:
    """
    LCS of s and reverse(s) gives longest palindromic subsequence.
    
    Example: "bbbab" -> "bbbb" (length 4)
    """
    return lcs_tabulation(s, s[::-1])
 
 
def shortest_common_supersequence_length(X: str, Y: str) -> int:
    """
    Length of shortest string containing both X and Y as subsequences.
    
    SCS length = m + n - LCS length
    
    Example: X="AGGTAB", Y="GXTXAYB"
    LCS = "GTAB" (4)
    SCS = "AGGXTXAYB" (6 + 7 - 4 = 9)
    """
    lcs_len = lcs_tabulation(X, Y)
    return len(X) + len(Y) - lcs_len
 
 
def edit_distance_from_lcs(X: str, Y: str) -> int:
    """
    Edit distance (insertions + deletions) from LCS.
    
    To transform X to Y:
    - Delete characters in X not in LCS
    - Insert characters in Y not in LCS
    """
    lcs_len = lcs_tabulation(X, Y)
    deletions = len(X) - lcs_len
    insertions = len(Y) - lcs_len
    return deletions + insertions
 
 
# Reference implementation from earlier
def lcs_tabulation(X: str, Y: str) -> int:
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
 
 
# Examples
print(f"Longest palindromic subseq of 'bbbab': {longest_palindromic_subsequence('bbbab')}")
print(f"SCS length of 'AGGTAB' and 'GXTXAYB': {shortest_common_supersequence_length('AGGTAB', 'GXTXAYB')}")
print(f"Edit distance: {edit_distance_from_lcs('kitten', 'sitting')}")

Common Mistakes and Interview Tips

LCS is a frequent interview topic. Here are common pitfalls and how to avoid them:

Mistake 1: Confusing Subsequence with Substring

Subsequence: Can skip characters ("ACE" from "ABCDE")
Substring: Must be contiguous ("BCD" from "ABCDE")
The DP recurrences are different!

Mistake 2: Off-by-One Indexing

The DP table has dimensions (m+1) × (n+1) to accommodate base cases. dp[i][j] refers to prefixes X[0..i-1] and Y[0..j-1]. Confusing 0-indexed strings with 1-indexed table positions is a common bug source.

Mistake 3: Incorrect Base Cases

Both dp[0][j] and dp[i][0] should be 0 (empty string has no common subsequence). Forgetting to initialize or initializing incorrectly breaks the algorithm.

Interview Tips

•Start with the recurrence: Explain the two cases (match/no match) before coding. This shows understanding.
•Draw the table: For small examples, drawing the DP table catches logical errors and impresses interviewers.
•Discuss complexity upfront: State O(m×n) time and space before implementation. Mention space optimization if asked.
•Handle edge cases: Empty strings, identical strings, completely different strings. Test your solution mentally.
•Know the reconstruction: Be ready to modify your solution to return the actual LCS string, not just its length.

The Substring Trap

If asked for Longest Common Substring (contiguous), the recurrence changes: dp[i][j] = dp[i-1][j-1] + 1 only when characters match, and dp[i][j] = 0 otherwise. The answer is max(dp), not dp[m][n]. Don't confuse the two problems!

Summary and Key Takeaways

The Longest Common Subsequence problem exemplifies the power of 2D dynamic programming for sequence comparison. Let's consolidate our understanding:

Key Concepts Mastered

•Problem Structure — LCS asks for the longest sequence appearing in both inputs, with characters in order but not necessarily contiguous.
•Recursive Insight — Match → extend diagonally; No match → take max of excluding one character.
•2D DP Table — dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1], with O(m×n) time and space.
•Reconstruction — Backtrack from dp[m][n], moving diagonally on matches and toward larger values otherwise.
•Space Optimization — Reduce to O(min(m,n)) using rolling rows, but lose reconstruction ability (unless using Hirschberg).
•Applications — Diff, version control, sequence alignment, edit distance, plagiarism detection.

The Bigger Picture:

LCS and grid paths share the same 2D DP structure but solve different problems. Grid paths count possibilities; LCS optimizes sequence similarity. This versatility is the power of 2D DP: a single framework for diverse problems.

What's Next:

We've seen how to structure state with two indices. The next page formalizes this: when do we need 2D state? How do we identify the right indices? Mastering state design is the key to solving novel DP problems.

Page Complete

You now understand the Longest Common Subsequence problem—its structure, algorithms, and applications. Combined with grid paths, you have a strong foundation in 2D DP. Next, we'll explore the art of state design: how to identify when two indices are needed and what they should represent.

2 / 4

Loading learning content...

Data Structures & Algorithms2D DP Problems — Grid Paths & LCS

2D Dynamic Programming — Grid Paths & Longest Common Subsequence

LevelIntermediate

Duration75 mins

Topic2D DP Problems — Grid Paths & LCS

2 / 4

Longest Common Subsequence

From Grids to Sequences: The LCS Problem

The Longest Common Subsequence (LCS) problem is one of the most celebrated problems in computer science. It asks a deceptively simple question:

Given two sequences, what is the longest sequence that appears as a subsequence in both?

LCS lies at the heart of:

Unix diff utility: Comparing files and showing changes
Version control systems: Git, SVN, Mercurial merge algorithms
Bioinformatics: DNA/protein sequence alignment
Plagiarism detection: Finding common text fragments
Spell checkers: Computing edit distance (related problem)

The algorithm we'll develop is elegant, instructive, and foundational to understanding how DP handles sequence problems.

What You Will Learn

Understanding the LCS Problem

Formal Definition:

Given two strings X of length m and Y of length n, find the length of the longest sequence Z such that:

Z is a subsequence of X
Z is a subsequence of Y

Subsequence vs Substring:

Concept	Definition	Example from "ALGORITHM"
Substring	Contiguous characters	"LGOR", "ALGO", "RITHM"
Subsequence	Characters in order (can skip)	"AGRM", "LRTM", "AGORITHM"

Examples:

X = "ABCDGH"
Y = "AEDFHR"
LCS = "ADH" (length 3)

X = "AGGTAB"
Y = "GXTXAYB"
LCS = "GTAB" (length 4)

LCS May Not Be Unique

Why LCS is Computationally Interesting:

A naive approach would enumerate all subsequences of X (2^m possibilities) and check which appear in Y. This is exponential—completely impractical for strings of even moderate length.

Discovering the Recursive Structure

To develop the DP solution, we first need to understand the problem's recursive structure. Consider strings X = X[0..m-1] and Y = Y[0..n-1].

The Key Insight:

Let LCS(i, j) denote the length of LCS of X[0..i-1] and Y[0..j-1] (the first i characters of X and first j characters of Y).

Case 1: Last characters match

If X[i-1] == Y[j-1], this matching character must be part of some LCS. Why? If we excluded it, we couldn't do better (matching gives us at least one character).

LCS(i, j) = 1 + LCS(i-1, j-1)

Case 2: Last characters don't match

If X[i-1] ≠ Y[j-1], at least one of them is not in the LCS. Either:

The LCS doesn't include X[i-1], so LCS(i, j) = LCS(i-1, j), or
The LCS doesn't include Y[j-1], so LCS(i, j) = LCS(i, j-1)

We take the maximum:

LCS(i, j) = max(LCS(i-1, j), LCS(i, j-1))

Base Cases:

LCS(0, j) = 0 for all j (empty X has no common subsequence)
LCS(i, 0) = 0 for all i (empty Y has no common subsequence)

Converting Mermaid diagram...

lcs_recursive.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def lcs_recursive(X: str, Y: str) -> int:
    """
    Naive recursive LCS - exponential time complexity.
    
    Time Complexity: O(2^(m+n)) in worst case
    Space Complexity: O(m + n) for recursion stack
    """
    def lcs(i: int, j: int) -> int:
        # Base case: one string is empty
        if i == 0 or j == 0:
            return 0
        
        # Case 1: Last characters match
        if X[i - 1] == Y[j - 1]:
            return 1 + lcs(i - 1, j - 1)
        
        # Case 2: Last characters don't match
        else:
            return max(lcs(i - 1, j), lcs(i, j - 1))
    
    return lcs(len(X), len(Y))
 
 
# Example
X = "AGGTAB"
Y = "GXTXAYB"
print(f"LCS length: {lcs_recursive(X, Y)}")  # Output: 4

Exponential Explosion

Memoization: Top-Down DP

The Solution: Memoization

Cache each (i, j) result in a 2D table. The recursion tree collapses from exponential branches to at most m × n unique computations.

lcs_memoization.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def lcs_memoization(X: str, Y: str) -> int:
    """
    LCS with memoization - polynomial time complexity.
    
    Time Complexity: O(m * n) - each state computed once
    Space Complexity: O(m * n) for memo table + O(m + n) for stack
    """
    m, n = len(X), len(Y)
    
    # Memoization table: -1 means not yet computed
    memo = [[-1] * (n + 1) for _ in range(m + 1)]
    
    def lcs(i: int, j: int) -> int:
        # Base case
        if i == 0 or j == 0:
            return 0
        
        # Return cached result if available
        if memo[i][j] != -1:
            return memo[i][j]
        
        # Compute based on whether characters match
        if X[i - 1] == Y[j - 1]:
            memo[i][j] = 1 + lcs(i - 1, j - 1)
        else:
            memo[i][j] = max(lcs(i - 1, j), lcs(i, j - 1))
        
        return memo[i][j]
    
    return lcs(m, n)
 
 
# Now handles large strings efficiently
X = "AGGTAB" * 100  # 600 characters
Y = "GXTXAYB" * 100  # 700 characters
print(f"LCS length: {lcs_memoization(X, Y)}")

Complexity Improvement:

Aspect	Naive Recursion	Memoization
Time	O(2^(m+n))	O(m × n)
States computed	Exponential duplicates	Each state once
20×20 strings	~1 million calls	400 unique states
1000×1000 strings	Infeasible	1 million states (fast)

Memoization transforms an unusable algorithm into a practical one. But we can do better by eliminating recursion entirely.

Tabulation: Bottom-Up DP

Table Construction:

Initialize first row and first column to 0 (base cases)
Fill remaining cells using the recurrence:
- If X[i-1] == Y[j-1]: dp[i][j] = 1 + dp[i-1][j-1]
- Otherwise: dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Answer is at dp[m][n]

lcs_tabulation.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def lcs_tabulation(X: str, Y: str) -> int:
    """
    LCS using tabulation (bottom-up DP).
    
    Time Complexity: O(m * n)
    Space Complexity: O(m * n)
    """
    m, n = len(X), len(Y)
    
    # Create (m+1) x (n+1) table, initialized to 0
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    # Fill the table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                # Characters match: extend LCS from diagonal
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                # No match: take max of excluding one character
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    return dp[m][n]
 
 
def lcs_with_table_visualization(X: str, Y: str) -> int:
    """Show the DP table for understanding."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    # Print table with headers
    print("\n    " + "  ".join(f"{c}" for c in " " + Y))
    for i, row in enumerate(dp):
        prefix = X[i - 1] if i > 0 else " "
        print(f"{prefix}   " + "  ".join(f"{x}" for x in row))
    
    return dp[m][n]
 
 
X = "ABCDGH"
Y = "AEDFHR"
print(f"\nLCS length: {lcs_with_table_visualization(X, Y)}")

Visualizing the DP Table for X="ABCDGH", Y="AEDFHR":

      A  E  D  F  H  R
   0  0  0  0  0  0  0
A  0  1  1  1  1  1  1
B  0  1  1  1  1  1  1
C  0  1  1  1  1  1  1
D  0  1  1  2  2  2  2
G  0  1  1  2  2  2  2
H  0  1  1  2  2  3  3

Reading the Table:

Cell (4, 3) = 2: LCS of "ABCD" and "AED" has length 2 ("AD")
Cell (6, 5) = 3: LCS of "ABCDGH" and "AEDFH" has length 3 ("ADH")
The value increases when we find a matching character that extends a previous LCS
The "flow" of increasing values traces the LCS characters

Understanding the Table Geometry

Reconstructing the Actual LCS

The DP table gives us the length, but often we need the actual subsequence. We can reconstruct it by tracing back through the table, following the decisions that led to each cell's value.

Backtracking Algorithm:

Start at dp[m][n] and trace backward:

If X[i-1] == Y[j-1]: This character is in the LCS. Add it, move diagonally to (i-1, j-1).
Otherwise: Move to the larger of dp[i-1][j] or dp[i][j-1].
Stop when i == 0 or j == 0.

Since we trace backward, the LCS is built in reverse order—reverse it at the end.

lcs_reconstruct.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
def lcs_with_reconstruction(X: str, Y: str) -> tuple[int, str]:
    """
    Compute LCS length and reconstruct the actual LCS string.
    
    Time Complexity: O(m * n) for DP + O(m + n) for reconstruction
    Space Complexity: O(m * n)
    
    Returns:
        Tuple of (length, LCS string)
    """
    m, n = len(X), len(Y)
    
    # Build the DP table as before
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    # Reconstruct LCS by backtracking
    lcs_chars = []
    i, j = m, n
    
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            # This character is part of LCS
            lcs_chars.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            # Came from above
            i -= 1
        else:
            # Came from left (or equal, arbitrary choice)
            j -= 1
    
    # Reverse since we built it backwards
    lcs_string = "".join(reversed(lcs_chars))
    
    return dp[m][n], lcs_string
 
 
# Example
X = "AGGTAB"
Y = "GXTXAYB"
length, lcs = lcs_with_reconstruction(X, Y)
print(f"X = '{X}'")
print(f"Y = '{Y}'")
print(f"LCS = '{lcs}' (length {length})")
# Output: LCS = 'GTAB' (length 4)

Converting Mermaid diagram...

Finding All LCS

Space Optimization

Like grid paths, LCS can be space-optimized. Each cell only depends on:

The cell directly above: dp[i-1][j]
The cell to the left: dp[i][j-1]
The cell diagonally above-left: dp[i-1][j-1]

We can reduce from O(m×n) to O(min(m, n)) using two rows or even one row with a saved diagonal value.

Important Caveat: Space optimization destroys the backtracking path. If you need to reconstruct the LCS string, keep the full table. If you only need the length, optimize space.

lcs_space_optimized.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
def lcs_space_optimized(X: str, Y: str) -> int:
    """
    Space-optimized LCS using two rows.
    
    Time Complexity: O(m * n)
    Space Complexity: O(min(m, n))
    """
    # Ensure Y is the shorter string for optimal space
    if len(X) < len(Y):
        X, Y = Y, X
    
    m, n = len(X), len(Y)
    
    # Two rows of size n+1
    prev = [0] * (n + 1)
    curr = [0] * (n + 1)
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = 1 + prev[j - 1]
            else:
                curr[j] = max(prev[j], curr[j - 1])
        
        # Swap rows
        prev, curr = curr, prev
    
    return prev[n]
 
 
def lcs_single_row(X: str, Y: str) -> int:
    """
    Even more optimized: single row with diagonal tracking.
    
    Time Complexity: O(m * n)
    Space Complexity: O(min(m, n))
    """
    if len(X) < len(Y):
        X, Y = Y, X
    
    m, n = len(X), len(Y)
    dp = [0] * (n + 1)
    
    for i in range(1, m + 1):
        prev_diagonal = 0  # dp[i-1][j-1] before update
        
        for j in range(1, n + 1):
            # Save current value before overwriting (it becomes prev_diagonal for next j)
            temp = dp[j]
            
            if X[i - 1] == Y[j - 1]:
                dp[j] = 1 + prev_diagonal
            else:
                dp[j] = max(dp[j], dp[j - 1])
            
            prev_diagonal = temp
    
    return dp[n]
 
 
# Verify both give same results
X = "ABCDGH"
Y = "AEDFHR"
print(f"Two rows: {lcs_space_optimized(X, Y)}")  # 3
print(f"Single row: {lcs_single_row(X, Y)}")     # 3

LCS Space Complexity Options
Approach	Space	Can Reconstruct LCS?
Full 2D Table	O(m × n)	Yes
Two Rows	O(min(m, n))	No
Single Row	O(min(m, n))	No
Hirschberg Algorithm	O(min(m, n))	Yes! (Advanced)

Hirschberg's Algorithm

Applications and Variations

LCS is the foundation for many practical algorithms. Understanding it unlocks a family of related problems.

1. Diff and Patch Algorithms:

2. Edit Distance (Levenshtein Distance):

Closely related to LCS. If lcs_length is the LCS length, then:

edit_distance = (m - lcs_length) + (n - lcs_length)

Or equivalently: the number of insertions + deletions to transform X into Y.

3. Sequence Alignment in Bioinformatics:

DNA sequence comparison uses weighted LCS variants. The Smith-Waterman algorithm extends LCS with:

Match/mismatch scores
Gap penalties (consecutive missing characters)
Local alignment (find best matching subregions)

LCS Variations

•Longest Common Substring — Must be contiguous, different DP formulation
•Shortest Common Supersequence — Smallest string containing both as subsequences
•LCS of 3+ Strings — Extends to higher dimensions
•Longest Palindromic Subsequence — LCS(s, reverse(s))

Real-World Uses

•Git — Three-way merge uses LCS to find common base
•BLAST — Biological sequence comparison
•Plagiarism detection — Finding copied text fragments
•Spell checkers — Edit distance for suggestions

lcs_applications.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def longest_palindromic_subsequence(s: str) -> int:
    """
    LCS of s and reverse(s) gives longest palindromic subsequence.
    
    Example: "bbbab" -> "bbbb" (length 4)
    """
    return lcs_tabulation(s, s[::-1])
 
 
def shortest_common_supersequence_length(X: str, Y: str) -> int:
    """
    Length of shortest string containing both X and Y as subsequences.
    
    SCS length = m + n - LCS length
    
    Example: X="AGGTAB", Y="GXTXAYB"
    LCS = "GTAB" (4)
    SCS = "AGGXTXAYB" (6 + 7 - 4 = 9)
    """
    lcs_len = lcs_tabulation(X, Y)
    return len(X) + len(Y) - lcs_len
 
 
def edit_distance_from_lcs(X: str, Y: str) -> int:
    """
    Edit distance (insertions + deletions) from LCS.
    
    To transform X to Y:
    - Delete characters in X not in LCS
    - Insert characters in Y not in LCS
    """
    lcs_len = lcs_tabulation(X, Y)
    deletions = len(X) - lcs_len
    insertions = len(Y) - lcs_len
    return deletions + insertions
 
 
# Reference implementation from earlier
def lcs_tabulation(X: str, Y: str) -> int:
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
 
 
# Examples
print(f"Longest palindromic subseq of 'bbbab': {longest_palindromic_subsequence('bbbab')}")
print(f"SCS length of 'AGGTAB' and 'GXTXAYB': {shortest_common_supersequence_length('AGGTAB', 'GXTXAYB')}")
print(f"Edit distance: {edit_distance_from_lcs('kitten', 'sitting')}")

Common Mistakes and Interview Tips

LCS is a frequent interview topic. Here are common pitfalls and how to avoid them:

Mistake 1: Confusing Subsequence with Substring

Subsequence: Can skip characters ("ACE" from "ABCDE")
Substring: Must be contiguous ("BCD" from "ABCDE")
The DP recurrences are different!

Mistake 2: Off-by-One Indexing

Mistake 3: Incorrect Base Cases

Both dp[0][j] and dp[i][0] should be 0 (empty string has no common subsequence). Forgetting to initialize or initializing incorrectly breaks the algorithm.

Interview Tips

•Start with the recurrence: Explain the two cases (match/no match) before coding. This shows understanding.
•Draw the table: For small examples, drawing the DP table catches logical errors and impresses interviewers.
•Discuss complexity upfront: State O(m×n) time and space before implementation. Mention space optimization if asked.
•Handle edge cases: Empty strings, identical strings, completely different strings. Test your solution mentally.
•Know the reconstruction: Be ready to modify your solution to return the actual LCS string, not just its length.

The Substring Trap

Summary and Key Takeaways

The Longest Common Subsequence problem exemplifies the power of 2D dynamic programming for sequence comparison. Let's consolidate our understanding:

Key Concepts Mastered

•Problem Structure — LCS asks for the longest sequence appearing in both inputs, with characters in order but not necessarily contiguous.
•Recursive Insight — Match → extend diagonally; No match → take max of excluding one character.
•2D DP Table — dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1], with O(m×n) time and space.
•Reconstruction — Backtrack from dp[m][n], moving diagonally on matches and toward larger values otherwise.
•Space Optimization — Reduce to O(min(m,n)) using rolling rows, but lose reconstruction ability (unless using Hirschberg).
•Applications — Diff, version control, sequence alignment, edit distance, plagiarism detection.

The Bigger Picture:

What's Next:

Page Complete

2 / 4