While frequency counting teaches us to view strings as unordered collections, many problems require us to understand strings as ordered sequences with structure. This page explores three fundamental concepts that capture positional relationships within strings: prefixes, suffixes, and substrings.
These aren't just vocabulary words—they're the building blocks of virtually every string algorithm. Pattern matching algorithms like KMP and Boyer-Moore reason about prefix and suffix overlaps. Substring search is the foundation of text editors, search engines, and genomic analysis. Suffix structures (suffix trees, suffix arrays) power some of the most efficient text processing algorithms known.
Before you can understand these advanced algorithms, you must deeply internalize what prefixes, suffixes, and substrings are, how they relate, and how to reason about them.
By the end of this page, you will have precise definitions of prefix, suffix, and substring with clear mental models for each. You'll understand how these concepts relate to each other and to the full string. You'll be able to reason about the number of possible prefixes, suffixes, and substrings. And you'll recognize how these structures appear in classic string problems.
A prefix of a string is any leading portion of the string, starting from the first character. Think of it as "the first k characters" for some k.
Formal Definition:
Given a string S of length n, a prefix of S is any string P such that:
P = S[0..k], the first k characters of S, for some 0 ≤ k ≤ n
Equivalently, S starts with P.
Example: For S = "algorithm":
| k | Prefix S[0..k] |
|---|---|
| 0 | "" (empty string) |
| 1 | "a" |
| 2 | "al" |
| 3 | "alg" |
| 4 | "algo" |
| 5 | "algor" |
| 6 | "algori" |
| 7 | "algorit" |
| 8 | "algorith" |
| 9 | "algorithm" (the full string) |
A string of length n has exactly n + 1 prefixes (including the empty string and the full string itself). Each integer from 0 to n corresponds to exactly one prefix of that length. This is a small, finite set—critical for algorithms that precompute prefix-based information.
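As a quick sanity check, here is a minimal Python sketch (function name is illustrative; it uses the half-open slicing convention introduced later on this page) that enumerates all n + 1 prefixes:

```python
def all_prefixes(s: str) -> list[str]:
    """Return the n + 1 prefixes of s, from shortest ("") to longest (s itself)."""
    return [s[:k] for k in range(len(s) + 1)]

prefixes = all_prefixes("algorithm")
print(len(prefixes))   # 10, i.e. n + 1 for n = 9
print(prefixes[:4])    # ['', 'a', 'al', 'alg']
```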
Key Properties of Prefixes:
Nesting: Prefixes are nested. Every prefix of length k is also a prefix of every prefix of length > k. If "al" is a prefix, so is "a".
Unique per length: There is exactly one prefix of each length from 0 to n.
Two special cases:
Empty prefix: The empty string "" is a prefix of every string.
Proper prefix: A prefix strictly shorter than the full string is called a proper prefix. There are exactly n proper prefixes (excluding the full string).
A suffix is the mirror concept to prefix: any trailing portion of the string, ending at the last character. Think of it as "the last k characters" for some k.
Formal Definition:
Given a string S of length n, a suffix of S is any string T such that:
T = S[j..n], the characters of S from position j to the end, for some 0 ≤ j ≤ n
Equivalently, S ends with T.
Example: For S = "algorithm":
| j | Suffix S[j..n] |
|---|---|
| 0 | "algorithm" (the full string) |
| 1 | "lgorithm" |
| 2 | "gorithm" |
| 3 | "orithm" |
| 4 | "rithm" |
| 5 | "ithm" |
| 6 | "thm" |
| 7 | "hm" |
| 8 | "m" |
| 9 | "" (empty string) |
Just like prefixes, a string of length n has exactly n + 1 suffixes. The suffix of length k starts at position n - k. This symmetry is profound: many prefix-based algorithms have suffix-based counterparts, and suffix structures are among the most powerful tools in text processing.
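The mirrored Python sketch for suffixes (again, the helper name is illustrative):

```python
def all_suffixes(s: str) -> list[str]:
    """Return the n + 1 suffixes of s, from the full string down to ""."""
    return [s[j:] for j in range(len(s) + 1)]

suffixes = all_suffixes("algorithm")
print(len(suffixes))    # 10
print(suffixes[-3:])    # ['hm', 'm', '']
```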
Key Properties of Suffixes:
Nesting (in reverse): Suffixes nest in the opposite direction. Every suffix of length k is also a suffix of every suffix of length > k.
Unique per length: There is exactly one suffix of each length from 0 to n.
Special cases:
Empty suffix: The empty string "" is a suffix of every string.
Proper suffix: A suffix strictly shorter than the full string is called a proper suffix. There are exactly n proper suffixes.
The Prefix-Suffix Duality:
Reversing a string swaps prefixes and suffixes. If P is a prefix of S, then reverse(P) is a suffix of reverse(S). This duality means algorithms can work on either end—choosing based on which direction is more convenient.
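The duality is easy to check directly; a tiny Python sketch:

```python
s = "algorithm"
p = "algo"                          # a prefix of s
assert s.startswith(p)
assert s[::-1].endswith(p[::-1])    # reverse(p) is a suffix of reverse(s)
```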
A substring is any contiguous sequence of characters within a string—not just from the beginning (prefix) or end (suffix), but from anywhere to anywhere.
Formal Definition:
Given a string S of length n, a substring of S is any string T such that:
T = S[i..j], the characters from position i up to (but not including) position j, for some 0 ≤ i ≤ j ≤ n
T is a contiguous portion of S.
Key Insight: Every prefix is a substring (starting at position 0). Every suffix is a substring (running to the end). But substrings also include everything in between.
Example: For S = "abcde", some substrings:
| Start | End | Substring |
|---|---|---|
| 0 | 0 | "" (empty) |
| 0 | 3 | "abc" (prefix) |
| 2 | 5 | "cde" (suffix) |
| 1 | 4 | "bcd" (middle) |
| 2 | 3 | "c" (single char) |
A string of length n has O(n²) possible substrings. Precisely: n(n+1)/2 + 1 substrings (including the empty string). For "abcde" (n=5): 5×6/2 + 1 = 16 substrings. This quadratic growth is why naive substring enumeration is expensive—and why clever algorithms are needed for substring problems at scale.
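A brute-force Python sketch that enumerates every substring and confirms the count; this is exactly the O(n²) enumeration the text warns about:

```python
def all_substrings(s: str) -> set[str]:
    """Collect every substring s[i:j] with 0 <= i <= j <= n (duplicates collapse in the set)."""
    n = len(s)
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}

subs = all_substrings("abcde")
print(len(subs))   # 16 = 5*6/2 + 1 (all substrings of "abcde" happen to be distinct)
```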
Key Properties of Substrings:
Defined by two indices: A substring needs both a start and end position (or start and length).
Subsumes prefixes and suffixes: Every prefix S[0..k] and every suffix S[j..n] is a substring.
Contiguity is essential: The characters must be consecutive. "ace" is NOT a substring of "abcde" (skips 'b' and 'd')—that's a subsequence, a different concept.
Quadratic count: There are Θ(n²) substrings, making exhaustive enumeration expensive.
Substring vs Subsequence:
This distinction is critical and often confused:
"bcd" is a substring of "abcde""ace" is a subsequence of "abcde"Substring problems are typically easier (O(n²) substrings vs 2^n subsequences).
Substring (contiguous): characters must be adjacent in the original string. "bcd" from "abcde". O(n²) possible substrings. Easier to enumerate and process.
Subsequence (non-contiguous): characters maintain order but can skip. "ace" from "abcde". O(2^n) possible subsequences. Much harder to enumerate.
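A short Python sketch contrasting the two membership tests (the subsequence check consumes an iterator, so characters must be matched in order):

```python
def is_substring(p: str, s: str) -> bool:
    """Contiguous: p must appear as consecutive characters of s."""
    return p in s

def is_subsequence(p: str, s: str) -> bool:
    """Non-contiguous: p's characters must appear in order in s, gaps allowed."""
    it = iter(s)
    return all(ch in it for ch in p)

print(is_substring("bcd", "abcde"))    # True
print(is_substring("ace", "abcde"))    # False
print(is_subsequence("ace", "abcde"))  # True
```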
Understanding how prefixes, suffixes, and substrings relate to each other deepens your ability to reason about string structure.
Set Relationships:
Substrings ⊃ Prefixes ∪ Suffixes
All prefixes are substrings. All suffixes are substrings. But not all substrings are prefixes or suffixes.
For S = "abcd":
Notice that prefixes + suffixes ≠ substrings (there's overlap at "" and the full string, plus unique middle substrings like "b", "c", "bc").
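A quick Python check of this relationship for "abcd":

```python
s = "abcd"
n = len(s)
prefixes   = {s[:k] for k in range(n + 1)}
suffixes   = {s[j:] for j in range(n + 1)}
substrings = {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}

assert prefixes | suffixes <= substrings           # every prefix and suffix is a substring
print(sorted(substrings - (prefixes | suffixes)))  # ['b', 'bc', 'c'] -- middle-only substrings
```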
The Powerful Intersection: Prefix That Is Also a Suffix
One of the most important concepts in string algorithms is finding prefixes that are also suffixes (excluding the trivial case of the full string). This is called the border of a string.
Definition: A border of string S is a proper prefix that is also a suffix.
Example: For S = "abcab":
The longest border is "ab" (length 2).
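A brute-force Python sketch that lists all borders (fine for small examples; the prefix function later on this page finds the longest border in O(n)):

```python
def borders(s: str) -> list[str]:
    """All borders of s: proper prefixes that are also suffixes."""
    return [s[:k] for k in range(len(s)) if s.endswith(s[:k])]

print(borders("abcab"))   # ['', 'ab']  -> longest border is "ab"
```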
Why borders matter:
When searching for a pattern P in text T, if we find a partial match then fail, we can skip ahead by using the longest border of the matched portion. Instead of restarting from scratch, we leverage the overlapping prefix-suffix structure. This single insight is what makes KMP O(n + m) instead of O(nm).
| Concept | Count | Formula | Example (n=5) |
|---|---|---|---|
| Prefixes | n + 1 | One per length 0..n | 6 |
| Suffixes | n + 1 | One per length 0..n | 6 |
| Substrings | n(n+1)/2 + 1 | Quadratic | 16 |
| Proper Prefixes | n | Excluding full string | 5 |
| Proper Suffixes | n | Excluding full string | 5 |
| Unique Substrings | ≤ n(n+1)/2 + 1 | May have duplicates | ≤ 16 |
To work effectively with prefixes, suffixes, and substrings, you need a clear mental model of indexing. Let's establish precise conventions.
Convention: 0-Based Indexing
Most programming languages use 0-based indexing: the first character is S[0] and the last is S[n-1].
For S = "hello" (n = 5):
Index: 0 1 2 3 4
Char: h e l l o
Substring Notation: S[i:j] (Exclusive End)
We use half-open intervals: S[i:j] means characters from index i up to (but not including) j.
S[0:3] = "hel" (indices 0, 1, 2)S[2:5] = "llo" (indices 2, 3, 4)S[i:i] = "" (empty string, i to i exclusive is nothing)Half-open intervals have elegant properties: the length is simply j - i, adjacent slices share an endpoint (S[0:k] + S[k:n] = S[0:n]), and empty ranges are naturally represented (i == j). This convention is used by Python, Go, and many algorithms textbooks.
Expressing Prefixes and Suffixes:
Using half-open notation:
Prefix of length k: S[0:k]
Suffix of length k: S[n-k:n]
Suffix starting at position j: S[j:n]
Example: S = "algorithm" (n = 9)
| Description | Expression | Result |
|---|---|---|
| First 4 chars (prefix) | S[0:4] | "algo" |
| Last 3 chars (suffix) | S[6:9] | "thm" |
| Middle 5 chars | S[2:7] | "gorit" |
| Empty | S[4:4] | "" |
| Full string | S[0:9] | "algorithm" |
Length Calculations:
Length of S[i:j] = j - i
Length of S[0:k] = k
Length of S[j:n] = n - j
These formulas should become automatic. When solving string problems, you'll constantly translate between "the first k characters" and S[0:k], or "starting from position j" and S[j:n].
Bounds Checking:
Valid indices satisfy:
0 ≤ i ≤ j ≤ n
Out-of-bounds access is a common bug. Train yourself to verify:
Is i non-negative?
Is j at most n?
Is i ≤ j?
Many algorithms precompute information about prefixes and suffixes for efficient processing. Understanding these "functions" is crucial for pattern matching and other advanced algorithms.
The Prefix Function (Failure Function)
For each position i in a string S, the prefix function π[i] gives the length of the longest proper prefix of S[0:i+1] that is also a suffix of S[0:i+1].
In simpler terms: π[i] answers "What's the longest border of the prefix ending at i?"
Example: For S = "abcabc":
| i | Prefix S[0:i+1] | Longest Border | π[i] |
|---|---|---|---|
| 0 | "a" | "" (no proper prefix) | 0 |
| 1 | "ab" | "" | 0 |
| 2 | "abc" | "" | 0 |
| 3 | "abca" | "a" (prefix and suffix) | 1 |
| 4 | "abcab" | "ab" | 2 |
| 5 | "abcabc" | "abc" | 3 |
When the KMP algorithm has matched the first i characters of a pattern but then fails, π[i-1] tells it exactly how many characters still match after the shift. Instead of restarting from 0, it resumes from π[i-1]. This turns O(nm) naive search into O(n + m).
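Here is a sketch of the standard O(n) computation of the prefix function in Python; the key invariant is that k always holds the length of the longest border of the prefix processed so far:

```python
def prefix_function(s: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    n = len(s)
    pi = [0] * n
    k = 0                      # longest border carried into iteration i
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]      # fall back to the next-shorter border
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi

print(prefix_function("abcabc"))   # [0, 0, 0, 1, 2, 3]
```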
The Z-Function (Z-Array)
An alternative to the prefix function. For each position i, Z[i] gives the length of the longest substring starting at i that matches a prefix of the string.
Example: For S = "aabaaab":
| i | S[i:] starts as | Matches prefix | Z[i] |
|---|---|---|---|
| 0 | "aabaaab" | (undefined/full match) | 7 |
| 1 | "abaaab" | "a" matches "a" | 1 |
| 2 | "baaab" | "" (b ≠ a) | 0 |
| 3 | "aaab" | "aa" matches "aa" | 2 |
| 4 | "aab" | "aab" matches "aab" | 3 |
| 5 | "ab" | "a" matches "a" | 1 |
| 6 | "b" | "" (b ≠ a) | 0 |
The Z-function can be computed in O(n) time and is useful for pattern matching and repetition detection.
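A sketch of the standard O(n) computation, maintaining the rightmost window [l, r) known to match a prefix:

```python
def z_function(s: str) -> list[int]:
    """z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n               # convention: the whole string matches itself
    l, r = 0, 0                # [l, r) is the rightmost segment known to match a prefix
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])   # reuse previously computed information
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1                     # extend the match by direct comparison
        if i + z[i] > r:
            l, r = i, i + z[i]            # update the rightmost window
    return z

print(z_function("aabaaab"))   # [7, 1, 0, 2, 3, 1, 0]
```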
Understanding the computational cost of prefix/suffix/substring operations helps you analyze algorithm complexity.
Extracting a substring S[i:j]: copies j - i characters in most languages, so it costs O(j - i) time and space.
Checking if P is a prefix of S: compare P against the first |P| characters of S, which takes O(|P|) time and O(1) extra space.
Checking if P is a suffix of S: compare P against the last |P| characters of S, also O(|P|) time and O(1) extra space.
Checking if P is a substring of S (naive): try every starting position in S and compare character by character, O(|S| × |P|) time in the worst case.
| Operation | Naive Time | Optimal Time | Space |
|---|---|---|---|
| Extract substring S[i:j] | O(j-i) | O(j-i) | O(j-i) |
| Check if P is prefix | O(\|P\|) | O(\|P\|) | O(1) |
| Check if P is suffix | O(\|P\|) | O(\|P\|) | O(1) |
| Find P in S (substring search) | O(\|S\| × \|P\|) | O(\|S\| + \|P\|) | O(\|P\|) |
| Find all occurrences of P in S | O(\|S\| × \|P\|) | O(\|S\| + \|P\|) | O(\|P\|) |
| Longest common prefix of 2 strings | O(min(\|S\|, \|T\|)) | O(min(\|S\|, \|T\|)) | O(1) |
| Enumerate all substrings | O(n²) | O(n²) | O(n²) |
In languages with immutable strings (Java, Python, JavaScript), every substring extraction creates a new string object. What looks like a simple operation can trigger O(n) allocation and copying. This matters when extracting many substrings in a loop—consider using indices instead of creating actual substring copies.
Prefixes, suffixes, and substrings appear in recurring problem patterns. Recognizing these patterns accelerates problem-solving.
Pattern 1: Longest Common Prefix (LCP)
Find the longest string that is a prefix of all given strings.
Approach: Compare character by character, stopping at first mismatch. For k strings of average length m, this is O(k × m) in the worst case.
Use case: Autocomplete prefix matching, directory path comparison.
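A minimal Python sketch of the character-by-character approach (the helper name is illustrative):

```python
def longest_common_prefix(strings: list[str]) -> str:
    """Compare column by column; stop at the first mismatch or the shortest string."""
    if not strings:
        return ""
    first, shortest = strings[0], min(len(s) for s in strings)
    for k in range(shortest):
        if any(s[k] != first[k] for s in strings):
            return first[:k]
    return first[:shortest]

print(longest_common_prefix(["interview", "internal", "internet"]))  # "inter"
```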
Pattern 2: Longest Common Suffix
Mirror of LCP—find the longest shared ending.
Approach: Reverse the problem or compare from the end.
Use case: File extension extraction, domain matching.
Pattern 3: Substring Search
Does string T contain pattern P as a substring?
Naive approach: O(|T| × |P|)
Optimal approach: O(|T| + |P|) using KMP or the Z-algorithm (Rabin-Karp achieves the same bound in expectation)
Use case: Text editors, grep, database LIKE queries.
Pattern 4: All Occurrences
Find every position where pattern P occurs in T.
Approach: Build a suffix array or use KMP/Z-function for O(n + m + k) where k is the number of occurrences.
Pattern 5: Longest Repeated Substring
Find the longest substring that appears at least twice.
Approach: Suffix array + LCP array gives O(n) solution after O(n) or O(n log n) construction.
Use case: Compression, plagiarism detection.
A classic insight: string A is a rotation of string B if and only if A is a substring of B concatenated with itself (B+B). For example, "defabc" is a rotation of "abcdef" because "defabc" appears in "abcdefabcdef". This reduces rotation checking to substring search.
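The reduction is a one-liner in Python; an equal-length check is needed so that a fragment of B doesn't pass as a rotation:

```python
def is_rotation(a: str, b: str) -> bool:
    """a is a rotation of b iff they have equal length and a occurs inside b + b."""
    return len(a) == len(b) and a in b + b

print(is_rotation("defabc", "abcdef"))  # True
print(is_rotation("defabc", "abcdfe"))  # False
```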
We've explored the fundamental anatomy of strings—prefixes, suffixes, and substrings. These concepts are the vocabulary of string algorithms and the foundation for advanced techniques like KMP, suffix arrays, and suffix trees.
With frequency counting (compositional view) and prefix/suffix/substring (positional view), you have two complementary lenses for analyzing strings. In the next page, we'll introduce the two-pointer technique applied to strings—a powerful tool for problems involving pairs, palindromes, and range-based queries.
You now understand the structural building blocks of strings: prefixes, suffixes, and substrings. These concepts will appear constantly as you study pattern matching, suffix structures, and advanced string algorithms. Next, we move to the two-pointer technique for strings.