How do you quantify how different two strings are? This deceptively simple question lies at the heart of some of the most important applications in computer science—from the spell checker correcting your typos to the genetic sequencing algorithms that revolutionized modern medicine.
When you type "teh" and your word processor suggests "the," when a search engine understands that "restaraunt" means "restaurant," when DNA sequences are compared to identify mutations or evolutionary relationships—behind all of these is a precise mathematical framework for measuring string similarity. That framework is edit distance.
By the end of this page, you will understand what edit distance means mathematically, why it's defined the way it is, how it captures our intuitive notion of string similarity, and why it's one of the most important problems in string dynamic programming. You'll see the formal definition, develop deep intuition for what the metric represents, and understand why this problem is ideally suited for dynamic programming.
Edit distance (also known as Levenshtein distance after Soviet mathematician Vladimir Levenshtein, who defined it in 1965) is the minimum number of single-character edits required to transform one string into another.
The elegance of this definition lies in its simplicity: we're counting the fewest changes needed to make two strings identical. But what exactly counts as an "edit"? Three operations are allowed:

- Insertion: add a single character anywhere in the string.
- Deletion: remove a single character from the string.
- Substitution: replace a single character with a different one.

Each of these operations has a cost of 1 in the standard Levenshtein distance. The edit distance between two strings is the minimum total cost to transform one into the other.
Formal Definition:
Given two strings s of length m and t of length n, the edit distance d(s, t) is defined as:
$$d(s, t) = \min\,\{\text{cost of a sequence of edits transforming } s \text{ into } t\}$$
This minimum is taken over all possible sequences of insertions, deletions, and substitutions.
These three operations capture all possible single-character modifications. Any transformation between strings can be decomposed into a sequence of these atomic operations. Insertion adds information, deletion removes information, and substitution changes information. Together, they form a complete basis for string transformation.
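To make the three operations concrete, here is a minimal Python sketch showing one instance of each applied to "cat" (the specific characters are chosen purely for illustration):

```python
s = "cat"

# Insertion: add one character ('h' after the 'c')
inserted = s[:1] + "h" + s[1:]   # "chat"

# Deletion: remove one character (the 'a')
deleted = s[:1] + s[2:]          # "ct"

# Substitution: replace one character ('c' with 'b')
substituted = "b" + s[1:]        # "bat"

print(inserted, deleted, substituted)  # chat ct bat
```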
Let's build intuition by working through several examples, progressively increasing in complexity.
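For instance (these particular pairs are illustrative choices; each distance is easy to verify by hand):

- "cat" → "cut": substitute 'a' → 'u'. Distance 1.
- "cat" → "cats": insert 's'. Distance 1.
- "flaw" → "lawn": delete 'f', insert 'n'. Distance 2.
- "kitten" → "sitting": substitute 'k' → 's', substitute 'e' → 'i', insert 'g'. Distance 3.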
Why is the minimum important?
For any pair of strings, there are infinitely many edit sequences that transform one into the other, because you can always make unnecessary edits. For example, to transform "cat" into "bat," you could:

- delete 'c' and insert 'b' (cost 2), or
- delete all three characters and insert 'b', 'a', 't' from scratch (cost 6).

But this is clearly wasteful: a single substitution 'c' → 'b' (cost 1) does the job. The edit distance captures the minimum cost, reflecting the "true" difference between strings.
Two strings may have multiple different edit sequences that achieve the minimum cost, and when they do, all of them count as optimal solutions. Note that not every sequence is optimal, though: transforming "abc" into "adc" can be done by substituting 'b' → 'd' (cost 1) or by deleting 'b' and inserting 'd' (cost 2), and only the substitution achieves the minimum.
Edit distance isn't just a useful similarity measure: it's a proper mathematical distance function (a metric). For any strings s, t, and u:

- Non-negativity: d(s, t) ≥ 0.
- Identity: d(s, t) = 0 if and only if s = t.
- Symmetry: d(s, t) = d(t, s).
- Triangle inequality: d(s, u) ≤ d(s, t) + d(t, u).

These properties make it suitable for many applications.
Why Do These Properties Matter?
Because edit distance is a proper metric, we can:

- compare distances between different string pairs meaningfully,
- plug it into metric-space data structures (such as BK-trees) for fast similarity search,
- use the triangle inequality to prune candidates when searching large collections, and
- cluster strings by similarity.
The symmetry property is particularly important for applications: it doesn't matter which string we consider the "source" and which the "target."
For strings of lengths m and n, the edit distance is at most max(m, n). A trivial transformation deletes all m characters of s and then inserts all n characters of t, for a total cost of m + n. But we can always do better: substitute the first min(m, n) characters of s with the corresponding characters of t, then insert or delete the remaining |m − n| characters, for a total cost of max(m, n). For example, transforming "abc" into "xy" takes two substitutions and one deletion: cost 3 = max(3, 2). This maximum is achieved when the strings share no common structure.
Given how simple the definition is—just count the minimum edits—you might expect a straightforward algorithm. But computing edit distance naively is computationally expensive.
The Brute Force Approach:
To find the minimum, we could enumerate all possible edit sequences. But how many are there?
Consider transforming a string of length m into a string of length n. At each step, we have multiple choices: match the current characters (if they agree), insert a character, delete a character, or substitute one character for another.
The number of possible sequences grows exponentially. For the classic "kitten" to "sitting" example, there are thousands of possible edit sequences to explore.
```python
def naive_edit_distance(s, t):
    """
    Naive recursive edit distance - exponential time complexity!
    This demonstrates the approach but is impractical for real use.
    """
    # Base cases
    if len(s) == 0:
        return len(t)  # Insert all characters of t
    if len(t) == 0:
        return len(s)  # Delete all characters of s

    # If last characters match, no edit needed for them
    if s[-1] == t[-1]:
        return naive_edit_distance(s[:-1], t[:-1])

    # Otherwise, try all three operations and take minimum
    insert = naive_edit_distance(s, t[:-1])        # Insert t[-1]
    delete = naive_edit_distance(s[:-1], t)        # Delete s[-1]
    replace = naive_edit_distance(s[:-1], t[:-1])  # Replace s[-1] with t[-1]

    return 1 + min(insert, delete, replace)

# This works but is EXTREMELY slow!
# Time complexity: O(3^(m+n)) - exponential!
```

Exponential Explosion:
For strings of length 10, the naive algorithm might make millions of recursive calls. For length 100, it's computationally intractable: the heat death of the universe would occur before the algorithm completes.
But there's a saving grace: look at the recursive calls. We're computing naive_edit_distance(s[:-1], t[:-1]), naive_edit_distance(s, t[:-1]), etc. These are the same subproblems over and over!
If we have strings of length m and n, there are only (m + 1) × (n + 1) unique subproblems, one for each pair of prefixes (including the empty prefix). But the naive recursion explores exponentially many paths through these subproblems, recomputing the same answers repeatedly.
Edit distance exhibits both hallmarks of dynamic programming: overlapping subproblems (the same prefix pairs are computed repeatedly) and optimal substructure (the optimal solution to the whole problem is built from optimal solutions to subproblems). This makes it a perfect candidate for DP optimization, reducing time complexity from exponential to polynomial—specifically O(m × n).
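To see this concretely, here is a sketch of a top-down (memoized) version, assuming Python's `functools.lru_cache` as the cache. It is the same recursion as the naive version above, but each prefix pair is computed only once:

```python
from functools import lru_cache

def edit_distance_memo(s, t):
    """Top-down DP: same recursion as the naive version,
    but each (i, j) prefix pair is computed only once."""

    @lru_cache(maxsize=None)
    def d(i, j):
        # d(i, j) = edit distance between s[:i] and t[:j]
        if i == 0:
            return j  # insert the first j characters of t
        if j == 0:
            return i  # delete the first i characters of s
        if s[i - 1] == t[j - 1]:
            return d(i - 1, j - 1)  # match: no edit needed
        return 1 + min(
            d(i, j - 1),      # insert t[j-1]
            d(i - 1, j),      # delete s[i-1]
            d(i - 1, j - 1),  # substitute
        )

    return d(len(s), len(t))

print(edit_distance_memo("kitten", "sitting"))  # 3
```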
To solve edit distance efficiently with DP, we need to understand its recursive structure precisely. Let's define the problem in terms of subproblems.
State Definition:
Let dp[i][j] = the edit distance between the first i characters of string s (i.e., s[0..i-1]) and the first j characters of string t (i.e., t[0..j-1]).
Our goal is to compute dp[m][n] where m = len(s) and n = len(t).
Base Cases:
- dp[0][j] = j: Transforming an empty string into the first j characters of t requires j insertions.
- dp[i][0] = i: Transforming the first i characters of s into an empty string requires i deletions.

Recurrence Relation:
For dp[i][j] where i > 0 and j > 0, we consider two cases:
Case 1: The characters match (s[i-1] == t[j-1])
If the current characters match, no edit is needed for this position. The edit distance equals the cost of transforming the remaining prefixes:
$$dp[i][j] = dp[i-1][j-1]$$
Case 2: The characters don't match (s[i-1] ≠ t[j-1])
We must make an edit. We consider all three operations and take the minimum:
$$dp[i][j] = 1 + \min(dp[i][j-1], dp[i-1][j], dp[i-1][j-1])$$
Where:
- dp[i][j-1] + 1: Insert t[j-1] at position i in s, then solve for s[0..i-1] vs t[0..j-2].
- dp[i-1][j] + 1: Delete s[i-1], then solve for s[0..i-2] vs t[0..j-1].
- dp[i-1][j-1] + 1: Substitute s[i-1] with t[j-1], then solve for s[0..i-2] vs t[0..j-2].

Think of it this way: we're aligning the two strings from right to left. At each position, if the characters match, we just move left in both strings. If they don't match, we try all three ways to fix the mismatch (insert, delete, substitute) and take the cheapest. The DP table records the optimal cost for each prefix pair so we never recompute.
The Complete Recurrence:
```
dp[i][j] =
    if i == 0: j
    else if j == 0: i
    else if s[i-1] == t[j-1]: dp[i-1][j-1]
    else: 1 + min(
        dp[i][j-1],      // insert
        dp[i-1][j],      // delete
        dp[i-1][j-1]     // substitute
    )
```
Dependency Structure:
Each cell dp[i][j] depends on:
- dp[i-1][j-1] (diagonal)
- dp[i-1][j] (above)
- dp[i][j-1] (left)

This means we can fill the table row by row, left to right, or column by column, top to bottom. All dependencies will be satisfied.
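Putting the base cases, the recurrence, and the row-by-row fill order together, one possible bottom-up implementation is sketched below (Python, following the state definition above):

```python
def edit_distance(s, t):
    m, n = len(s), len(t)
    # dp[i][j] = edit distance between s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: transforming to/from the empty string
    for i in range(m + 1):
        dp[i][0] = i  # i deletions
    for j in range(n + 1):
        dp[0][j] = j  # j insertions

    # Fill row by row, left to right: every dependency
    # (diagonal, above, left) is already computed.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match: free
            else:
                dp[i][j] = 1 + min(
                    dp[i][j - 1],      # insert
                    dp[i - 1][j],      # delete
                    dp[i - 1][j - 1],  # substitute
                )
    return dp[m][n]

print(edit_distance("cat", "cut"))  # 1
```

Each cell is computed once from three already-filled neighbors, giving the promised O(m × n) time (and O(m × n) space for the table).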
Let's trace through a complete example to see how the DP table is constructed. We'll compute the edit distance between "cat" and "cut".
|   | ε | c | u | t |
|---|---|---|---|---|
| ε | 0 | 1 | 2 | 3 |
| c | 1 | 0 | 1 | 2 |
| a | 2 | 1 | 1 | 2 |
| t | 3 | 2 | 2 | 1 |
Reading the table: rows index prefixes of "cat" and columns index prefixes of "cut" (ε is the empty prefix). Cell dp[i][j] holds the edit distance between the first i characters of "cat" and the first j characters of "cut"; the answer is the bottom-right cell, dp[3][3] = 1.
Trace of key cells:
- dp[1][1]: 'c' == 'c', so dp[1][1] = dp[0][0] = 0 (characters match).
- dp[2][2]: 'a' ≠ 'u', so dp[2][2] = 1 + min(dp[1][2], dp[2][1], dp[1][1]) = 1 + min(1, 1, 0) = 1.
- dp[3][3]: 't' == 't', so dp[3][3] = dp[2][2] = 1 (characters match).

The edit distance is 1. This corresponds to substituting 'a' → 'u' to transform "cat" → "cut".
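As a sanity check on the hand trace, here is a small variant of the bottom-up sketch that returns the whole table so it can be printed (same fill logic as `edit_distance` above):

```python
def edit_distance_table(s, t):
    """Same fill as edit_distance above, but returns the whole table."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1])
    return dp

for row in edit_distance_table("cat", "cut"):
    print(row)
# [0, 1, 2, 3]
# [1, 0, 1, 2]
# [2, 1, 1, 2]
# [3, 2, 2, 1]
```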
Every path from the top-left corner to the bottom-right corner represents a sequence of edit operations. Diagonal moves with no cost increase mean matching characters. Diagonal moves with +1 mean substitution. Horizontal moves mean insertion. Vertical moves mean deletion. The optimal path follows the cells that led to the minimum values.
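To recover one optimal edit sequence rather than just its cost, you can walk the filled table back from dp[m][n]. A minimal sketch, assuming the `edit_distance_table` helper above:

```python
def trace_edits(s, t, dp):
    """Recover one optimal edit sequence by walking back from dp[m][n].
    Expects the filled table from edit_distance_table above."""
    ops = []
    i, j = len(s), len(t)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1  # free diagonal move: characters match
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"substitute {s[i - 1]!r} -> {t[j - 1]!r}")
            i, j = i - 1, j - 1  # diagonal move with +1: substitution
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(f"insert {t[j - 1]!r}")
            j -= 1               # horizontal move: insertion
        else:
            ops.append(f"delete {s[i - 1]!r}")
            i -= 1               # vertical move: deletion
    return ops[::-1]             # edits were collected right to left

dp = edit_distance_table("cat", "cut")
print(trace_edits("cat", "cut", dp))  # ["substitute 'a' -> 'u'"]
```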
Edit distance is not just an academic exercise—it's one of the most widely applied algorithms in computer science. Understanding its importance helps motivate the careful study we'll undertake in the following pages.
Historical Significance:
Vladimir Levenshtein introduced this distance in 1965 while studying error-correcting codes. The dynamic programming algorithm that computes it efficiently was developed soon after (it was discovered independently several times, most famously by Wagner and Fischer in 1974), and it became a cornerstone of string processing algorithms.
The DP solution for edit distance is considered one of the classic examples of dynamic programming—simple enough to teach the paradigm, yet representative of real algorithmic challenges. Nearly every algorithms course covers it, and it appears frequently in technical interviews.
Understanding edit distance provides the mental framework for tackling many other string DP problems. The pattern of building a 2D table over two string indices, with transitions based on character comparisons, recurs in problems like longest common subsequence, longest palindromic subsequence, regular expression matching, and more. Master edit distance, and you've mastered a fundamental pattern.
We've established the conceptual foundation for edit distance. Let's consolidate the key insights:
- State: dp[i][j] = the edit distance between the first i characters of s and the first j characters of t.
- Goal: compute dp[m][n], where m and n are the string lengths.

What's Next:
We've understood what edit distance is. In the next page, we'll dive deep into the three fundamental operations—insertion, deletion, and substitution—understanding how each contributes to the solution, exploring weighted variants, and examining how the choice of operation costs affects the optimal solution.
You now understand the edit distance problem at a conceptual level: its definition, properties, computational challenges, and significance. Next, we'll explore the three operations in detail and see how to implement the complete DP solution.