Every transformation from one string to another can be decomposed into a precise sequence of atomic operations. These operations—insertion, deletion, and substitution (replacement)—form the complete basis for string transformation. Understanding them deeply is essential not just for implementing edit distance correctly, but for reasoning about string algorithms, designing efficient solutions, and understanding the elegant mathematical structure underlying string comparison.
In this page, we examine each operation in meticulous detail, explore how they interact in the DP formulation, implement the complete algorithm, and investigate important variants where different operations carry different costs.
By the end of this page, you will deeply understand how each operation contributes to the edit distance computation, be able to trace through the algorithm step-by-step, implement both memoized and tabulation solutions, and understand weighted edit distance variants used in real-world applications.
Insertion adds a character to the source string to make it more like the target. It's the operation we use when the target has characters that the source lacks.
Formal Definition:
An insertion at position k in string s = s₀s₁...sₘ₋₁ with character c produces:
s' = s₀s₁...sₖ₋₁ · c · sₖ...sₘ₋₁
The new string s' has length m + 1.
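This definition maps directly onto string slicing. A minimal sketch in Python (the helper name `insert_char` is ours, for illustration only):

```python
def insert_char(s: str, k: int, c: str) -> str:
    """Insert character c at position k: s' = s[:k] + c + s[k:]."""
    return s[:k] + c + s[k:]

assert insert_char("car", 3, "t") == "cart"   # append at the end
assert insert_char("cat", 1, "h") == "chat"   # insert in the middle
assert len(insert_char("car", 3, "t")) == len("car") + 1  # length grows by 1
```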
The DP Perspective on Insertion:
In the recurrence relation, insertion corresponds to:
$$dp[i][j] = dp[i][j-1] + 1$$
Interpretation: We've matched the first i characters of s with the first j-1 characters of t. Now we insert t[j-1] to match the j-th character of t. The source pointer (i) stays the same because we didn't consume any source character—we added one.
Visual Intuition:
```
Source: c a r _    (pointer stays at position 3, after 'r')
Target: c a r t    (pointer moves from position 3 to 4)
              ↑
        insert 't' here
```
After insertion, both strings have matched up to position j.
Insertions are the primary operation when transforming a shorter string into a longer one. The base case dp[0][j] = j represents a sequence of j insertions, transforming the empty string into the first j characters of t.
Deletion removes a character from the source string. It's the operation we use when the source has characters that don't belong in the target.
Formal Definition:
A deletion at position k in string s = s₀s₁...sₘ₋₁ produces:
s' = s₀s₁...sₖ₋₁ · sₖ₊₁...sₘ₋₁
Character sₖ is removed. The new string s' has length m - 1.
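As with insertion, the definition is one slicing expression in Python (the helper name `delete_char` is ours):

```python
def delete_char(s: str, k: int) -> str:
    """Remove the character at position k: s' = s[:k] + s[k+1:]."""
    return s[:k] + s[k + 1:]

assert delete_char("cart", 3) == "car"   # drop the final 't'
assert delete_char("chat", 1) == "cat"   # drop the 'h' in the middle
```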
The DP Perspective on Deletion:
In the recurrence relation, deletion corresponds to:
$$dp[i][j] = dp[i-1][j] + 1$$
Interpretation: We delete s[i-1] and then solve the subproblem of matching the first i-1 characters of s with the first j characters of t. The target pointer (j) stays the same because we didn't progress in matching t—we just removed a source character.
Visual Intuition:
```
Source: c a r t    (pointer moves from position 4 to 3)
Target: c a r _    (pointer stays at position 3)
              ↑
        delete 't' here
```
After deletion, we continue trying to match the remaining source with the target.
Insertion into s is equivalent to deletion from t, and vice versa. This is why edit distance is symmetric: d(s, t) = d(t, s). Every insertion in one direction becomes a deletion in the reverse direction. This symmetry is a powerful insight for reasoning about string transformations.
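The symmetry is easy to check empirically. Here is a condensed Levenshtein implementation (a compact version of the full code later on this page), used to confirm d(s, t) = d(t, s) on a couple of pairs:

```python
def levenshtein(s: str, t: str) -> int:
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j] + 1,         # deletion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

# Every insertion in one direction is a deletion in the other,
# so swapping the arguments never changes the distance.
assert levenshtein("car", "cart") == levenshtein("cart", "car") == 1
assert levenshtein("kitten", "sitting") == levenshtein("sitting", "kitten") == 3
```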
Substitution (also called replacement) changes one character into another. It's used when the source and target have different characters at corresponding positions.
Formal Definition:
A substitution at position k in string s = s₀s₁...sₘ₋₁, replacing sₖ with character c, produces:
s' = s₀s₁...sₖ₋₁ · c · sₖ₊₁...sₘ₋₁
The string length remains m, but character sₖ is now c.
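Again the definition is a single slicing expression (the helper name `substitute_char` is ours):

```python
def substitute_char(s: str, k: int, c: str) -> str:
    """Replace the character at position k with c: s' = s[:k] + c + s[k+1:]."""
    return s[:k] + c + s[k + 1:]

assert substitute_char("cat", 1, "u") == "cut"
assert len(substitute_char("cat", 1, "u")) == len("cat")  # length unchanged
```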
The DP Perspective on Substitution:
In the recurrence relation, substitution corresponds to:
$$dp[i][j] = dp[i-1][j-1] + 1$$
Interpretation: We replace s[i-1] with t[j-1], making them match. Then we solve the subproblem of matching the first i-1 characters of s with the first j-1 characters of t. Both pointers advance because we've handled one position in both strings.
Visual Intuition:
```
Source: c a t    (pointer moves from position 2 to 1)
Target: c u t    (pointer moves from position 2 to 1)
          ↑
   substitute 'a' → 'u'
```
The Match Case (Free Substitution):
When s[i-1] == t[j-1], the substitution costs 0 instead of 1. We're effectively "substituting" a character with itself—which is a no-op. This is why the recurrence has the special case:
```python
if s[i - 1] == t[j - 1]:
    dp[i][j] = dp[i - 1][j - 1]  # No cost, characters already match
```
Note that a substitution can be simulated by a deletion followed by an insertion (or vice versa). Replacing 'a' with 'u' is equivalent to: delete 'a', then insert 'u'. This costs 2 with standard weights. The fact that substitution costs only 1 in Levenshtein distance reflects a modeling choice: we consider a single-character change to be a single atomic operation, not two operations.
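This equivalence can be checked numerically: if substitution is charged 2, the algorithm never prefers it over a delete-insert pair, and the distance becomes the insert/delete-only distance. A small sketch (the function and its cost parameters are ours, anticipating the weighted variant discussed later on this page):

```python
def edit_distance_costs(s: str, t: str, ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i * dele
    for j in range(n + 1):
        dp[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(dp[i][j - 1] + ins,
                               dp[i - 1][j] + dele,
                               dp[i - 1][j - 1] + sub)
    return dp[m][n]

# Standard Levenshtein: substitution is one atomic operation.
assert edit_distance_costs("kitten", "sitting") == 3
# Charging 2 per substitution makes it exactly as expensive as
# delete + insert, so the distance rises to the delete/insert-only value.
assert edit_distance_costs("kitten", "sitting", sub=2) == 5
```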
Now that we understand each operation individually, let's see how they combine into the complete recurrence relation that drives the edit distance algorithm.
```
EDIT-DISTANCE(s, t, i, j):
    # Base cases
    if i == 0: return j   # transform empty prefix into t[0..j-1]: j insertions
    if j == 0: return i   # transform s[0..i-1] into empty: i deletions

    # If characters match, no operation needed at this position
    if s[i-1] == t[j-1]:
        return EDIT-DISTANCE(s, t, i-1, j-1)   # move diagonally, cost 0

    # Characters don't match: consider all three operations
    insertion    = EDIT-DISTANCE(s, t, i, j-1) + 1     # insert t[j-1]
    deletion     = EDIT-DISTANCE(s, t, i-1, j) + 1     # delete s[i-1]
    substitution = EDIT-DISTANCE(s, t, i-1, j-1) + 1   # replace s[i-1] with t[j-1]
    return min(insertion, deletion, substitution)
```
Dependency Diagram:
Each cell dp[i][j] depends on exactly three neighboring cells:
```
         j-1     j
      ┌─────┬─────┐
 i-1  │  ↖  │  ↑  │
      │ sub │ del │
      ├─────┼─────┤
 i    │  ←  │  ?  │   ? = dp[i][j]
      │ ins │     │
      └─────┴─────┘
```
This dependency structure means we can fill the table in any order that ensures these three cells are computed before dp[i][j]. Row-by-row (left to right) and column-by-column (top to bottom) both work.
At each cell, we take the minimum of the three options. This local minimization is safe because of optimal substructure: each of the three neighboring cells already holds the optimal cost of its subproblem, so choosing the cheapest way to extend them yields the globally optimal solution. The recurrence thereby implicitly explores every possible transformation strategy without enumerating them.
Let's implement edit distance in both top-down (memoized) and bottom-up (tabulation) styles. Both have the same O(mn) time complexity, but with different tradeoffs.
```python
def edit_distance(s: str, t: str) -> int:
    """
    Compute edit distance using bottom-up tabulation.
    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(s), len(t)

    # Create DP table with (m+1) x (n+1) dimensions
    # dp[i][j] = edit distance between s[0:i] and t[0:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: transforming to/from empty string
    for i in range(m + 1):
        dp[i][0] = i  # Delete all characters from s[0:i]
    for j in range(n + 1):
        dp[0][j] = j  # Insert all characters of t[0:j]

    # Fill the table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                # Characters match - no operation needed
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # Take minimum of three operations
                dp[i][j] = 1 + min(
                    dp[i][j - 1],      # Insert t[j-1]
                    dp[i - 1][j],      # Delete s[i-1]
                    dp[i - 1][j - 1]   # Replace s[i-1] with t[j-1]
                )

    return dp[m][n]


# Example usage
print(edit_distance("kitten", "sitting"))    # Output: 3
print(edit_distance("saturday", "sunday"))   # Output: 3
```
Tabulation is generally preferred for edit distance because: (1) it avoids recursion stack overhead, (2) it has more predictable memory access patterns (cache-friendly), and (3) it's easier to optimize for space (as we'll see later). Memoization can be useful when you only need to compute a subset of the table or when the natural problem formulation is recursive.
Let's trace through the algorithm for computing edit distance between "INTENTION" and "EXECUTION". This classic example appears in many algorithms textbooks.
Strings: source s = "INTENTION" (length 9), target t = "EXECUTION" (length 9).
Expected Result: 5 operations
|   | ε | E | X | E | C | U | T | I | O | N |
|---|---|---|---|---|---|---|---|---|---|---|
| ε | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| I | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 6 | 7 | 8 |
| N | 2 | 2 | 2 | 3 | 4 | 5 | 6 | 7 | 7 | 7 |
| T | 3 | 3 | 3 | 3 | 4 | 5 | 5 | 6 | 7 | 8 |
| E | 4 | 3 | 4 | 3 | 4 | 5 | 6 | 6 | 7 | 8 |
| N | 5 | 4 | 4 | 4 | 4 | 5 | 6 | 7 | 7 | 7 |
| T | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 7 | 8 |
| I | 7 | 6 | 6 | 6 | 6 | 6 | 6 | 5 | 6 | 7 |
| O | 8 | 7 | 7 | 7 | 7 | 7 | 7 | 6 | 5 | 6 |
| N | 9 | 8 | 8 | 8 | 8 | 8 | 8 | 7 | 6 | 5 |
Key Observations:
Base cases (row 0 and column 0): Values increase from 0 to 9, representing pure runs of insertions (row 0, building t from the empty string) or deletions (column 0, reducing s to the empty string).
Diagonal matches: When characters match, the cell equals its diagonal predecessor. For example, dp[4][1] = dp[3][0] = 3 because s[3] = 'E' matches t[0] = 'E', and dp[9][9] = dp[8][8] = 5 because both strings end in 'N'.
The final answer: dp[9][9] = 5
The 5 operations (one possible sequence):
1. Delete 'I': INTENTION → NTENTION
2. Substitute 'N' → 'E': NTENTION → ETENTION
3. Substitute 'T' → 'X': ETENTION → EXENTION
4. Insert 'C': EXENTION → EXECNTION
5. Substitute 'N' → 'U': EXECNTION → EXECUTION
There are often multiple sequences of operations that achieve the minimum edit distance. The table tells us the cost but not uniquely which operations to use. In the next page, we'll learn how to reconstruct an optimal sequence by backtracking through the table.
In the standard Levenshtein distance, all operations cost 1. But in many real-world applications, different operations should have different costs.
Common Weighted Variants:
- Spell checking: substitutions between keyboard-adjacent keys cost less than distant ones, since they reflect likely typos.
- Biological sequence alignment: substitution costs come from matrices that encode how likely one residue is to mutate into another.
- Document comparison: operations on whole words or tokens can be weighted differently from character-level edits.
```python
def weighted_edit_distance(
    s: str,
    t: str,
    insert_cost: float = 1.0,
    delete_cost: float = 1.0,
    replace_cost: float = 1.0
) -> float:
    """
    Compute weighted edit distance with configurable operation costs.

    Args:
        s: Source string
        t: Target string
        insert_cost: Cost of inserting a character
        delete_cost: Cost of deleting a character
        replace_cost: Cost of replacing a character

    Returns:
        Minimum weighted cost to transform s into t
    """
    m, n = len(s), len(t)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]

    # Base cases with weighted costs
    for i in range(m + 1):
        dp[i][0] = i * delete_cost
    for j in range(n + 1):
        dp[0][j] = j * insert_cost

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(
                    dp[i][j - 1] + insert_cost,
                    dp[i - 1][j] + delete_cost,
                    dp[i - 1][j - 1] + replace_cost
                )

    return dp[m][n]


# Variant: position-dependent or character-dependent costs
def custom_edit_distance(
    s: str,
    t: str,
    insert_cost_fn,   # (char, position) -> cost
    delete_cost_fn,   # (char, position) -> cost
    replace_cost_fn   # (char_s, char_t, position) -> cost
) -> float:
    """
    Fully customizable edit distance with function-based costs.
    """
    m, n = len(s), len(t)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]

    # Base cases (accumulate position-dependent costs)
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + delete_cost_fn(s[i - 1], i - 1)
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + insert_cost_fn(t[j - 1], j - 1)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(
                    dp[i][j - 1] + insert_cost_fn(t[j - 1], j - 1),
                    dp[i - 1][j] + delete_cost_fn(s[i - 1], i - 1),
                    dp[i - 1][j - 1] + replace_cost_fn(s[i - 1], t[j - 1], i - 1)
                )

    return dp[m][n]
```
The right weighting scheme depends on your domain. For spell checking, keyboard-based weights work well. For biological sequence alignment, bioinformatics uses substitution matrices such as BLOSUM (for protein sequences) that capture how likely one residue is to be exchanged for another.
For plagiarism detection, word-level operations might cost less than character-level. Always consider what 'similarity' means in your specific context.
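To illustrate the word-level idea: the same recurrence runs unchanged over sequences of tokens instead of characters. A sketch (the function name and example sentences are ours):

```python
def token_edit_distance(a: list, b: list) -> int:
    """Edit distance over tokens (e.g. words) instead of characters."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i][j - 1] + 1,         # insert a token
                           dp[i - 1][j] + 1,         # delete a token
                           dp[i - 1][j - 1] + cost)  # substitute / match
    return dp[m][n]


s1 = "the quick brown fox".split()
s2 = "the slow brown dog".split()
print(token_edit_distance(s1, s2))   # Output: 2 (two word substitutions)
```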
We've now fully explored the three fundamental operations that define edit distance. Let's consolidate our understanding:
- Insertion advances only the target pointer: dp[i][j] = dp[i][j-1] + 1.
- Deletion advances only the source pointer: dp[i][j] = dp[i-1][j] + 1.
- Substitution advances both pointers: dp[i][j] = dp[i-1][j-1] + 1, or dp[i-1][j-1] with no cost when the characters already match.
What's Next:
We can now compute the edit distance—the minimum cost. But often we need more: the actual sequence of operations that achieves this minimum. In the next page, we'll learn how to construct the DP table and then backtrack through it to reconstruct the optimal edit sequence.
You now deeply understand how insertion, deletion, and substitution work individually and together in the edit distance algorithm. You can implement the complete solution and understand weighted variants. Next, we'll learn table construction patterns and how to reconstruct the actual edit sequence from the DP table.