While frequency counting teaches us to view strings as unordered collections, many problems require us to understand strings as ordered sequences with structure. This page explores three fundamental concepts that capture positional relationships within strings: prefixes, suffixes, and substrings.
These aren't just vocabulary words—they're the building blocks of virtually every string algorithm. Pattern matching algorithms like KMP and Boyer-Moore reason about prefix and suffix overlaps. Substring search is the foundation of text editors, search engines, and genomic analysis. Suffix structures (suffix trees, suffix arrays) power some of the most efficient text processing algorithms known.
Before you can understand these advanced algorithms, you must deeply internalize what prefixes, suffixes, and substrings are, how they relate, and how to reason about them.
By the end of this page, you will have precise definitions of prefix, suffix, and substring with clear mental models for each. You'll understand how these concepts relate to each other and to the full string. You'll be able to reason about the number of possible prefixes, suffixes, and substrings. And you'll recognize how these structures appear in classic string problems.
A prefix of a string is any leading portion of the string, starting from the first character. Think of it as "the first k characters" for some k.
Formal Definition:
Given a string S of length n, a prefix of S is any string P such that:
P = S[0..k], the first k characters of S, for some 0 ≤ k ≤ n
Equivalently, S starts with P.
Example: For S = "algorithm":
| k | Prefix S[0..k] |
|---|---|
| 0 | "" (empty string) |
| 1 | "a" |
| 2 | "al" |
| 3 | "alg" |
| 4 | "algo" |
| 5 | "algor" |
| 6 | "algori" |
| 7 | "algorit" |
| 8 | "algorith" |
| 9 | "algorithm" (the full string) |
A string of length n has exactly n + 1 prefixes (including the empty string and the full string itself). Each integer from 0 to n corresponds to exactly one prefix of that length. This is a small, finite set—critical for algorithms that precompute prefix-based information.
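As a quick sanity check, here is a minimal Python sketch (function name is illustrative; it uses the half-open slicing convention introduced later on this page) that enumerates all n + 1 prefixes:

```python
def all_prefixes(s: str) -> list[str]:
    """Return the n + 1 prefixes of s, from shortest ("") to longest (s itself)."""
    return [s[:k] for k in range(len(s) + 1)]

prefixes = all_prefixes("algorithm")
print(len(prefixes))   # 10, i.e. n + 1 for n = 9
print(prefixes[:4])    # ['', 'a', 'al', 'alg']
```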
Key Properties of Prefixes:
Nesting: Prefixes are nested. Every prefix of length k is also a prefix of every prefix of length > k. If "al" is a prefix, so is "a".
Unique per length: There is exactly one prefix of each length from 0 to n.
Two special cases:
Empty prefix: The empty string "" is a prefix of every string.
Proper prefix: A prefix strictly shorter than the full string is called a proper prefix. There are exactly n proper prefixes (excluding the full string).
A suffix is the mirror concept to prefix: any trailing portion of the string, ending at the last character. Think of it as "the last k characters" for some k.
Formal Definition:
Given a string S of length n, a suffix of S is any string T such that:
T = S[j..n], the characters of S from position j to the end, for some 0 ≤ j ≤ n
Equivalently, S ends with T.
Example: For S = "algorithm":
| j | Suffix S[j..n] |
|---|---|
| 0 | "algorithm" (the full string) |
| 1 | "lgorithm" |
| 2 | "gorithm" |
| 3 | "orithm" |
| 4 | "rithm" |
| 5 | "ithm" |
| 6 | "thm" |
| 7 | "hm" |
| 8 | "m" |
| 9 | "" (empty string) |
Just like prefixes, a string of length n has exactly n + 1 suffixes. The suffix of length k starts at position n - k. This symmetry is profound: many prefix-based algorithms have suffix-based counterparts, and suffix structures are among the most powerful tools in text processing.
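The mirrored Python sketch for suffixes (again, the helper name is illustrative):

```python
def all_suffixes(s: str) -> list[str]:
    """Return the n + 1 suffixes of s, from the full string down to ""."""
    return [s[j:] for j in range(len(s) + 1)]

suffixes = all_suffixes("algorithm")
print(len(suffixes))    # 10
print(suffixes[-3:])    # ['hm', 'm', '']
```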
Key Properties of Suffixes:
Nesting (in reverse): Suffixes nest in the opposite direction. Every suffix of length k is also a suffix of every suffix of length > k.
Unique per length: There is exactly one suffix of each length from 0 to n.
Special cases:
Empty suffix: The empty string "" is a suffix of every string.
Proper suffix: A suffix strictly shorter than the full string is called a proper suffix. There are exactly n proper suffixes.
The Prefix-Suffix Duality:
Reversing a string swaps prefixes and suffixes. If P is a prefix of S, then reverse(P) is a suffix of reverse(S). This duality means algorithms can work on either end—choosing based on which direction is more convenient.
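The duality is easy to check directly; a tiny Python sketch:

```python
s = "algorithm"
p = "algo"                          # a prefix of s
assert s.startswith(p)
assert s[::-1].endswith(p[::-1])    # reverse(p) is a suffix of reverse(s)
```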
A substring is any contiguous sequence of characters within a string—not just from the beginning (prefix) or end (suffix), but from anywhere to anywhere.
Formal Definition:
Given a string S of length n, a substring of S is any string T such that:
T = S[i..j], the characters from position i up to (but not including) position j, for some 0 ≤ i ≤ j ≤ n
T is a contiguous portion of S.
Key Insight: Every prefix is a substring (starting at position 0). Every suffix is a substring (running to the end). But substrings also include everything in between.
Example: For S = "abcde", some substrings:
| Start | End | Substring |
|---|---|---|
| 0 | 0 | "" (empty) |
| 0 | 3 | "abc" (prefix) |
| 2 | 5 | "cde" (suffix) |
| 1 | 4 | "bcd" (middle) |
| 2 | 3 | "c" (single char) |
A string of length n has O(n²) possible substrings. Precisely: n(n+1)/2 + 1 substrings (including the empty string). For "abcde" (n=5): 5×6/2 + 1 = 16 substrings. This quadratic growth is why naive substring enumeration is expensive—and why clever algorithms are needed for substring problems at scale.
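A brute-force Python sketch that enumerates every substring and confirms the count; this is exactly the O(n²) enumeration the text warns about:

```python
def all_substrings(s: str) -> set[str]:
    """Collect every substring s[i:j] with 0 <= i <= j <= n (duplicates collapse in the set)."""
    n = len(s)
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}

subs = all_substrings("abcde")
print(len(subs))   # 16 = 5*6/2 + 1 (all substrings of "abcde" happen to be distinct)
```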
Key Properties of Substrings:
Defined by two indices: A substring needs both a start and end position (or start and length).
Subsumes prefixes and suffixes: Every prefix S[0..k] and every suffix S[j..n] is a substring.
Contiguity is essential: The characters must be consecutive. "ace" is NOT a substring of "abcde" (skips 'b' and 'd')—that's a subsequence, a different concept.
Quadratic count: There are Θ(n²) substrings, making exhaustive enumeration expensive.
Substring vs Subsequence:
This distinction is critical and often confused:
"bcd" is a substring of "abcde""ace" is a subsequence of "abcde"Substring problems are typically easier (O(n²) substrings vs 2^n subsequences).
Substring (contiguous): characters must be adjacent in the original string. "bcd" from "abcde". O(n²) possible substrings. Easier to enumerate and process.
Subsequence (non-contiguous): characters maintain order but can skip. "ace" from "abcde". O(2^n) possible subsequences. Much harder to enumerate.
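A short Python sketch contrasting the two membership tests (the subsequence check consumes an iterator, so characters must be matched in order):

```python
def is_substring(p: str, s: str) -> bool:
    """Contiguous: p must appear as consecutive characters of s."""
    return p in s

def is_subsequence(p: str, s: str) -> bool:
    """Non-contiguous: p's characters must appear in order in s, gaps allowed."""
    it = iter(s)
    return all(ch in it for ch in p)

print(is_substring("bcd", "abcde"))    # True
print(is_substring("ace", "abcde"))    # False
print(is_subsequence("ace", "abcde"))  # True
```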
Understanding how prefixes, suffixes, and substrings relate to each other deepens your ability to reason about string structure.
Set Relationships:
Substrings ⊃ Prefixes ∪ Suffixes
All prefixes are substrings. All suffixes are substrings. But not all substrings are prefixes or suffixes.
For S = "abcd":
Notice that prefixes + suffixes ≠ substrings (there's overlap at "" and the full string, plus unique middle substrings like "b", "c", "bc").
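A quick Python check of this relationship for "abcd":

```python
s = "abcd"
n = len(s)
prefixes   = {s[:k] for k in range(n + 1)}
suffixes   = {s[j:] for j in range(n + 1)}
substrings = {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}

assert prefixes | suffixes <= substrings           # every prefix and suffix is a substring
print(sorted(substrings - (prefixes | suffixes)))  # ['b', 'bc', 'c'] -- middle-only substrings
```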
The Powerful Intersection: Prefix That Is Also a Suffix
One of the most important concepts in string algorithms is finding prefixes that are also suffixes (excluding the trivial case of the full string). This is called the border of a string.
Definition: A border of string S is a proper prefix that is also a suffix.
Example: For S = "abcab":
The longest border is "ab" (length 2).
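A brute-force Python sketch that lists all borders (fine for small examples; the prefix function later on this page finds the longest border in O(n)):

```python
def borders(s: str) -> list[str]:
    """All borders of s: proper prefixes that are also suffixes."""
    return [s[:k] for k in range(len(s)) if s.endswith(s[:k])]

print(borders("abcab"))   # ['', 'ab']  -> longest border is "ab"
```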
Why borders matter:
When searching for a pattern P in text T, if we find a partial match then fail, we can skip ahead by using the longest border of the matched portion. Instead of restarting from scratch, we leverage the overlapping prefix-suffix structure. This single insight is what makes KMP O(n + m) instead of O(nm).
| Concept | Count | Formula | Example (n=5) |
|---|---|---|---|
| Prefixes | n + 1 | One per length 0..n | 6 |
| Suffixes | n + 1 | One per length 0..n | 6 |
| Substrings | n(n+1)/2 + 1 | Quadratic | 16 |
| Proper Prefixes | n | Excluding full string | 5 |
| Proper Suffixes | n | Excluding full string | 5 |
| Unique Substrings | ≤ n(n+1)/2 + 1 | May have duplicates | ≤ 16 |
To work effectively with prefixes, suffixes, and substrings, you need a clear mental model of indexing. Let's establish precise conventions.
Convention: 0-Based Indexing
Most programming languages use 0-based indexing: the first character is S[0] and the last is S[n-1].
For S = "hello" (n = 5):
Index: 0 1 2 3 4
Char: h e l l o
Substring Notation: S[i:j] (Exclusive End)
We use half-open intervals: S[i:j] means characters from index i up to (but not including) j.
S[0:3] = "hel" (indices 0, 1, 2)S[2:5] = "llo" (indices 2, 3, 4)S[i:i] = "" (empty string, i to i exclusive is nothing)Half-open intervals have elegant properties: the length is simply j - i, adjacent slices share an endpoint (S[0:k] + S[k:n] = S[0:n]), and empty ranges are naturally represented (i == j). This convention is used by Python, Go, and many algorithms textbooks.
Expressing Prefixes and Suffixes:
Using half-open notation:
Prefix of length k: S[0:k]
Suffix of length k: S[n-k:n]
Suffix starting at position j: S[j:n]
Example: S = "algorithm" (n = 9)
| Description | Expression | Result |
|---|---|---|
| First 4 chars (prefix) | S[0:4] | "algo" |
| Last 3 chars (suffix) | S[6:9] | "thm" |
| Middle 5 chars | S[2:7] | "gorit" |
| Empty | S[4:4] | "" |
| Full string | S[0:9] | "algorithm" |
Length Calculations:
Length of S[i:j] = j - i
Length of S[0:k] = k
Length of S[j:n] = n - j
These formulas should become automatic. When solving string problems, you'll constantly translate between "the first k characters" and S[0:k], or "starting from position j" and S[j:n].
Bounds Checking:
Valid indices satisfy:
0 ≤ i ≤ j ≤ n
Out-of-bounds access is a common bug. Train yourself to verify:
Is i non-negative?
Is j at most n?
Is i ≤ j?
Many algorithms precompute information about prefixes and suffixes for efficient processing. Understanding these "functions" is crucial for pattern matching and other advanced algorithms.
The Prefix Function (Failure Function)
For each position i in a string S, the prefix function π[i] gives the length of the longest proper prefix of S[0:i+1] that is also a suffix of S[0:i+1].
In simpler terms: π[i] answers "What's the longest border of the prefix ending at i?"
Example: For S = "abcabc":
| i | Prefix S[0:i+1] | Longest Border | π[i] |
|---|---|---|---|
| 0 | "a" | "" (no proper prefix) | 0 |
| 1 | "ab" | "" | 0 |
| 2 | "abc" | "" | 0 |
| 3 | "abca" | "a" (prefix and suffix) | 1 |
| 4 | "abcab" | "ab" | 2 |
| 5 | "abcabc" | "abc" | 3 |
When the KMP algorithm has matched the first i characters of a pattern but then fails, π[i-1] tells it exactly how many characters still match after the shift. Instead of restarting from 0, it resumes from π[i-1]. This turns O(nm) naive search into O(n + m).
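Here is a sketch of the standard O(n) computation of the prefix function in Python; the key invariant is that k always holds the length of the longest border of the prefix processed so far:

```python
def prefix_function(s: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    n = len(s)
    pi = [0] * n
    k = 0                      # longest border carried into iteration i
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]      # fall back to the next-shorter border
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi

print(prefix_function("abcabc"))   # [0, 0, 0, 1, 2, 3]
```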
The Z-Function (Z-Array)
An alternative to the prefix function. For each position i, Z[i] gives the length of the longest substring starting at i that matches a prefix of the string.
Example: For S = "aabaaab":
| i | S[i:] starts as | Matches prefix | Z[i] |
|---|---|---|---|
| 0 | "aabaaab" | (undefined/full match) | 7 |
| 1 | "abaaab" | "a" matches "a" | 1 |
| 2 | "baaab" | "" (b ≠ a) | 0 |
| 3 | "aaab" | "aa" matches "aa" | 2 |
| 4 | "aab" | "aab" matches "aab" | 3 |
| 5 | "ab" | "a" matches "a" | 1 |
| 6 | "b" | "" (b ≠ a) | 0 |
The Z-function can be computed in O(n) time and is useful for pattern matching and repetition detection.
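A sketch of the standard O(n) computation, maintaining the rightmost window [l, r) known to match a prefix:

```python
def z_function(s: str) -> list[int]:
    """z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n               # convention: the whole string matches itself
    l, r = 0, 0                # [l, r) is the rightmost segment known to match a prefix
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])   # reuse previously computed information
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1                     # extend the match by direct comparison
        if i + z[i] > r:
            l, r = i, i + z[i]            # update the rightmost window
    return z

print(z_function("aabaaab"))   # [7, 1, 0, 2, 3, 1, 0]
```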
Understanding the computational cost of prefix/suffix/substring operations helps you analyze algorithm complexity.
Extracting a substring S[i:j]: copies j - i characters in most languages, so it costs O(j - i) time and space.
Checking if P is a prefix of S: compare P against the first |P| characters of S, which takes O(|P|) time and O(1) extra space.
Checking if P is a suffix of S: compare P against the last |P| characters of S, also O(|P|) time and O(1) extra space.
Checking if P is a substring of S (naive): try every starting position in S and compare character by character, O(|S| × |P|) time in the worst case.
| Operation | Naive Time | Optimal Time | Space |
|---|---|---|---|
| Extract substring S[i:j] | O(j-i) | O(j-i) | O(j-i) |
| Check if P is prefix | O(\|P\|) | O(\|P\|) | O(1) |
| Check if P is suffix | O(\|P\|) | O(\|P\|) | O(1) |
| Find P in S (substring search) | O(\|S\| × \|P\|) | O(\|S\| + \|P\|) | O(\|P\|) |
| Find all occurrences of P in S | O(\|S\| × \|P\|) | O(\|S\| + \|P\|) | O(\|P\|) |
| Longest common prefix of 2 strings | O(min(\|S\|, \|T\|)) | O(min(\|S\|, \|T\|)) | O(1) |
| Enumerate all substrings | O(n²) | O(n²) | O(n²) |
In languages with immutable strings (Java, Python, JavaScript), every substring extraction creates a new string object. What looks like a simple operation can trigger O(n) allocation and copying. This matters when extracting many substrings in a loop—consider using indices instead of creating actual substring copies.
Prefixes, suffixes, and substrings appear in recurring problem patterns. Recognizing these patterns accelerates problem-solving.
Pattern 1: Longest Common Prefix (LCP)
Find the longest string that is a prefix of all given strings.
Approach: Compare character by character, stopping at first mismatch. For k strings of average length m, this is O(k × m) in the worst case.
Use case: Autocomplete prefix matching, directory path comparison.
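A minimal Python sketch of the character-by-character approach (the helper name is illustrative):

```python
def longest_common_prefix(strings: list[str]) -> str:
    """Compare column by column; stop at the first mismatch or the shortest string."""
    if not strings:
        return ""
    first, shortest = strings[0], min(len(s) for s in strings)
    for k in range(shortest):
        if any(s[k] != first[k] for s in strings):
            return first[:k]
    return first[:shortest]

print(longest_common_prefix(["interview", "internal", "internet"]))  # "inter"
```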
Pattern 2: Longest Common Suffix
Mirror of LCP—find the longest shared ending.
Approach: Reverse the problem or compare from the end.
Use case: File extension extraction, domain matching.
Pattern 3: Substring Search
Does string T contain pattern P as a substring?
Naive approach: O(|T| × |P|)
Optimal approach: O(|T| + |P|) using KMP or the Z-algorithm (Rabin-Karp achieves the same bound in expectation)
Use case: Text editors, grep, database LIKE queries.
Pattern 4: All Occurrences
Find every position where pattern P occurs in T.
Approach: Build a suffix array or use KMP/Z-function for O(n + m + k) where k is the number of occurrences.
Pattern 5: Longest Repeated Substring
Find the longest substring that appears at least twice.
Approach: Suffix array + LCP array gives O(n) solution after O(n) or O(n log n) construction.
Use case: Compression, plagiarism detection.
A classic insight: string A is a rotation of string B if and only if A is a substring of B concatenated with itself (B+B). For example, "defabc" is a rotation of "abcdef" because "defabc" appears in "abcdefabcdef". This reduces rotation checking to substring search.
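The reduction is a one-liner in Python; an equal-length check is needed so that a fragment of B doesn't pass as a rotation:

```python
def is_rotation(a: str, b: str) -> bool:
    """a is a rotation of b iff they have equal length and a occurs inside b + b."""
    return len(a) == len(b) and a in b + b

print(is_rotation("defabc", "abcdef"))  # True
print(is_rotation("defabc", "abcdfe"))  # False
```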
We've explored the fundamental anatomy of strings—prefixes, suffixes, and substrings. These concepts are the vocabulary of string algorithms and the foundation for advanced techniques like KMP, suffix arrays, and suffix trees.
With frequency counting (compositional view) and prefix/suffix/substring (positional view), you have two complementary lenses for analyzing strings. In the next page, we'll introduce the two-pointer technique applied to strings—a powerful tool for problems involving pairs, palindromes, and range-based queries.
You now understand the structural building blocks of strings: prefixes, suffixes, and substrings. These concepts will appear constantly as you study pattern matching, suffix structures, and advanced string algorithms. Next, we move to the two-pointer technique for strings.