What do the strings 'flower', 'flow', 'flight' have in common? They all start with 'fl'. This longest common prefix (LCP) is the longest string that is a prefix of every string in a collection.
While this seems like a simple string problem, it appears with surprising frequency in real systems: file path handling, URL routing, autocomplete suggestions, and more.
Tries offer an elegant solution to LCP, but understanding the full landscape of approaches—and when each applies—separates good engineers from great ones.
By the end of this page, you will master: (1) Multiple algorithms for finding LCP with their trade-offs, (2) How tries naturally solve LCP through structure, (3) The horizontal scanning, vertical scanning, and divide & conquer approaches, (4) Binary search on prefix length, and (5) When to choose each approach based on constraints.
Formal Definition:
Given a list of n strings S = [s₁, s₂, ..., sₙ], find the longest string P such that P is a prefix of every string sᵢ.
Properties:
The LCP can be no longer than the shortest string: |LCP| ≤ min(|sᵢ|).

Examples:
- Example 1: ["flower", "flow", "flight"] → LCP: "fl". "fl" is the longest string that starts every word.
- Example 2: ["dog", "racecar", "car"] → LCP: "". No common prefix exists.
- Example 3: ["interview", "internal", "internet", "interleave"] → LCP: "inter". All words share "inter" at the start.
- Example 4: ["", "hello", "help"] → LCP: "". An empty string means no common prefix.
- Example 5: ["test"] → LCP: "test". A single string is its own LCP.
- Example 6: ["a", "a", "a"] → LCP: "a". Identical strings yield the full string.
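These examples are easy to double-check: Python's standard library ships os.path.commonprefix, which (despite its name) performs a plain character-wise longest-common-prefix and so matches the definition above. A quick sanity check, separate from the approaches discussed below:

```python
import os.path

# os.path.commonprefix compares character-by-character, so it matches
# the LCP definition above (it does not respect path boundaries).
print(os.path.commonprefix(["flower", "flow", "flight"]))                         # "fl"
print(os.path.commonprefix(["dog", "racecar", "car"]))                            # ""
print(os.path.commonprefix(["interview", "internal", "internet", "interleave"]))  # "inter"
print(os.path.commonprefix(["", "hello", "help"]))                                # ""
print(os.path.commonprefix(["test"]))                                             # "test"
print(os.path.commonprefix(["a", "a", "a"]))                                      # "a"
```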
Why Multiple Approaches Exist:

Unlike many string problems with a clear "best" algorithm, LCP has multiple valid approaches, each with different performance characteristics:
| Approach | Time | Space | Best When |
|---|---|---|---|
| Horizontal Scanning | O(S) | O(1) | General purpose |
| Vertical Scanning | O(S) | O(1) | Short LCP expected |
| Divide & Conquer | O(S) | O(m log n) | Parallelizable computation |
| Binary Search | O(S × log m) | O(1) | Random access strings |
| Trie | O(S) build + O(m) query | O(S) | Multiple LCP queries |
Where S = sum of all character lengths, n = number of strings, m = min string length.
Let's explore each approach in depth.
The most intuitive approach: compare strings pairwise, progressively narrowing the common prefix.
Algorithm:
1. Initialize prefix = s₁ (the first string)
2. For each subsequent string sᵢ:
   - While sᵢ does not start with prefix, shrink prefix by one character
   - If prefix becomes empty, return ""
3. Return prefix

Intuition:
We start with the maximal possible prefix (the first string) and trim it until it fits all subsequent strings. Like carving a sculpture—we start with a block and remove material until the shape emerges.
```python
def longest_common_prefix_horizontal(strs: list[str]) -> str:
    """
    Horizontal Scanning approach for Longest Common Prefix.

    Time Complexity: O(S) where S = sum of all string lengths
        (in the worst case, we compare all characters)
    Space Complexity: O(1) - only using constant extra space

    Best for: General purpose, when you have no prior knowledge
        about the strings.
    """
    if not strs:
        return ""

    # Start with the first string as the candidate prefix
    prefix = strs[0]

    for i in range(1, len(strs)):
        # Shrink prefix until the current string starts with it
        while not strs[i].startswith(prefix):
            prefix = prefix[:-1]  # Remove last character
            if not prefix:
                return ""  # No common prefix exists

    return prefix


# Detailed trace for ["flower", "flow", "flight"]:
#
# Initial: prefix = "flower"
#
# i=1, strs[1]="flow":
#   "flow".startswith("flower")? No  -> prefix = "flowe"
#   "flow".startswith("flowe")?  No  -> prefix = "flow"
#   "flow".startswith("flow")?   Yes ✓
#
# i=2, strs[2]="flight":
#   "flight".startswith("flow")? No  -> prefix = "flo"
#   "flight".startswith("flo")?  No  -> prefix = "fl"
#   "flight".startswith("fl")?   Yes ✓
#
# Return "fl"
```

Time: O(S), where S is the total character count; in the worst case (all strings identical), we compare every character. Space: O(1) extra space, since we only maintain the single shrinking prefix variable.
Trade-offs:
✅ Strengths:

- Simple to implement and reason about
- O(1) extra space

❌ Weaknesses:

- May examine many characters that never end up in the final LCP before the prefix shrinks
- The repeated startsWith operation may be suboptimal in some languages

Instead of comparing entire strings horizontally, vertical scanning compares one character position at a time across all strings.
Algorithm:
1. For each character position i, starting from 0:
   - Take the character c at position i of the first string
   - For every other string sⱼ:
     - If i >= len(sⱼ) or sⱼ[i] != c, return s₁[0:i]
2. If the loop finishes, return s₁ (the first string is a prefix of every other string)

Intuition:
Imagine the strings as columns in a grid. We scan column by column (vertically), stopping at the first mismatch.
```python
def longest_common_prefix_vertical(strs: list[str]) -> str:
    """
    Vertical Scanning approach for Longest Common Prefix.

    Time Complexity: O(S) where S = sum of all string lengths,
        but terminates early when the LCP ends
    Space Complexity: O(1)

    Best for: When the LCP is expected to be short -
        terminates as soon as a mismatch is found.
    """
    if not strs:
        return ""

    # Compare character-by-character across all strings
    for i in range(len(strs[0])):
        char = strs[0][i]
        for j in range(1, len(strs)):
            # Check if the current position exceeds the string length
            # or if the characters don't match
            if i >= len(strs[j]) or strs[j][i] != char:
                return strs[0][:i]

    # All characters of the first string matched in all strings
    return strs[0]


# Alternative with explicit loop (more educational):
def longest_common_prefix_vertical_explicit(strs: list[str]) -> str:
    """More explicit version showing the column-by-column comparison."""
    if not strs:
        return ""
    if len(strs) == 1:
        return strs[0]

    min_len = min(len(s) for s in strs)

    # Check each column
    for col in range(min_len):
        # All strings must have the same character in this column
        reference_char = strs[0][col]
        for row in range(1, len(strs)):
            if strs[row][col] != reference_char:
                # Mismatch found - return prefix up to (but not including) this column
                return strs[0][:col]

    # All columns matched up to min_len
    return strs[0][:min_len]


# Visual: Vertical Scanning
#
# Column:      0   1   2   3   4   5
#           ┌───────────────────────┐
# String 0: │ f │ l │ o │ w │ e │ r │
#           ├───────────────────────┤
# String 1: │ f │ l │ o │ w │   │   │
#           ├───────────────────────┤
# String 2: │ f │ l │ i │ g │ h │ t │
#           └───────────────────────┘
#             ↓   ↓   ↓
#             ✓   ✓   ✗  (mismatch at column 2)
#
# Return strs[0][:2] = "fl"
```

Vertical scanning terminates immediately when a mismatch is found. If the LCP is 'fl' (length 2), we only examine 2 characters from each string, regardless of how long the strings are. Horizontal scanning might examine many more characters before realizing the prefix must shrink.
Comparison: Horizontal vs Vertical
Consider: ["verylongstringthatdoesntmatter", "v", "veryshort"]
Horizontal Scanning:
- Start with prefix = "verylongstringthatdoesntmatter" (30 chars)
- "v".startsWith(prefix)? No: shrink the prefix one character at a time (29 shrinks) until prefix = "v"
- "veryshort".startsWith("v")? Yes

Vertical Scanning:

- Column 0: every string has 'v'
- Column 1: "v" has already ended, so stop and return "v"

Vertical scanning performs the minimum necessary work.
Apply the classic divide and conquer paradigm: split the problem, solve subproblems, combine results.
Algorithm:

1. Divide: split the list of strings into two halves
2. Conquer: recursively compute the LCP of each half
3. Combine: the answer is the LCP of the two halves' results
Key Insight:
LCP(S₁, S₂, ..., Sₙ) = LCP(LCP(S₁, ..., Sₙ/₂), LCP(Sₙ/₂₊₁, ..., Sₙ))
The LCP of all strings equals the LCP of the LCPs of the two halves. This works because any common prefix of a group of strings is also a prefix of that group's LCP, so comparing the two halves' LCPs is equivalent to comparing all the strings.
```python
def longest_common_prefix_divide_conquer(strs: list[str]) -> str:
    """
    Divide and Conquer approach for Longest Common Prefix.

    Time Complexity: O(S) where S = sum of all string lengths
        T(n) = 2T(n/2) + O(m), which solves to O(S)
    Space Complexity: O(m log n) for the recursion stack
        (depth log n, and each level may store a prefix of up to m characters)

    Best for: Parallelizable scenarios, distributed computing -
        each subproblem can be solved independently.
    """
    if not strs:
        return ""

    def lcp_two_strings(s1: str, s2: str) -> str:
        """Find LCP of exactly two strings."""
        min_len = min(len(s1), len(s2))
        for i in range(min_len):
            if s1[i] != s2[i]:
                return s1[:i]
        return s1[:min_len]

    def divide_conquer(left: int, right: int) -> str:
        """Recursively find LCP of strs[left:right+1]."""
        # Base case: single string
        if left == right:
            return strs[left]

        # Divide: split into two halves
        mid = (left + right) // 2

        # Conquer: solve each half recursively
        lcp_left = divide_conquer(left, mid)
        lcp_right = divide_conquer(mid + 1, right)

        # Combine: find LCP of the two results
        return lcp_two_strings(lcp_left, lcp_right)

    return divide_conquer(0, len(strs) - 1)


# Recursion tree for ["flower", "flow", "flight", "fly"]:
#
#                     LCP(0,3)
#                    /        \
#             LCP(0,1)        LCP(2,3)
#             /      \         /     \
#       "flower"  "flow"  "flight"  "fly"
#             \      /         \     /
#             "flow"            "fl"
#                   \           /
#                    \         /
#                       "fl"
#
# Work at each level:
#   Level 2: the four individual strings (base cases)
#   Level 1: LCP("flower", "flow") -> "flow",  LCP("flight", "fly") -> "fl"
#   Level 0: LCP("flow", "fl") -> "fl"
```

This approach is ideal for parallel computing. Each subproblem is independent: you can distribute divide_conquer(0, mid) and divide_conquer(mid+1, n) to different processors, then combine the results. In a MapReduce framework, this is the natural approach.
Parallel Execution Potential:
Processor 1: LCP("flower", "flow") → "flow"
Processor 2: LCP("flight", "fly") → "fl"
[Wait for both to complete]
Processor 1: LCP("flow", "fl") → "fl"
With enough processors, we achieve O(log n) parallel time instead of O(n) sequential time.
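As a minimal sketch of that idea, assuming the input is large enough to justify process start-up costs: the two halves are handed to a process pool and their results combined. The helper names below are illustrative; lcp_two_strings mirrors the helper in the divide-and-conquer code above.

```python
from concurrent.futures import ProcessPoolExecutor


def lcp_two_strings(s1: str, s2: str) -> str:
    """LCP of exactly two strings (same idea as the divide & conquer helper)."""
    i = 0
    while i < min(len(s1), len(s2)) and s1[i] == s2[i]:
        i += 1
    return s1[:i]


def lcp_sequential(strs: list[str]) -> str:
    """Plain sequential fold, used for each half."""
    prefix = strs[0]
    for s in strs[1:]:
        prefix = lcp_two_strings(prefix, s)
        if not prefix:
            break
    return prefix


def lcp_parallel_two_way(strs: list[str]) -> str:
    """Split the input in half, solve each half in its own process, combine."""
    if not strs:
        return ""
    if len(strs) < 2:
        return strs[0]
    mid = len(strs) // 2
    with ProcessPoolExecutor(max_workers=2) as pool:
        left = pool.submit(lcp_sequential, strs[:mid])
        right = pool.submit(lcp_sequential, strs[mid:])
        return lcp_two_strings(left.result(), right.result())


if __name__ == "__main__":
    print(lcp_parallel_two_way(["flower", "flow", "flight", "fly"]))  # "fl"
```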
Trade-offs:
✅ Strengths:

- Naturally parallelizable: each half can be solved independently
- Elegant, classic recursive structure

❌ Weaknesses:

- Recursion stack space and call overhead
- No asymptotic gain over simple scanning when run sequentially
A clever approach: binary search on the length of the LCP, not the strings themselves.
Key Observation:
If a prefix of length k is common to all strings, then all shorter prefixes are also common. This monotonicity enables binary search:
1. Search the range [0, minLen], where minLen = min(|sᵢ|)
2. For a candidate length mid, check if s₁[0:mid] is a prefix of all strings
3. If it is, a longer prefix may still work: search in [mid+1, right]
4. If it is not, search in [left, mid-1]
```python
def longest_common_prefix_binary_search(strs: list[str]) -> str:
    """
    Binary Search approach for Longest Common Prefix.

    Time Complexity: O(S × log m) where m = min string length, S = total chars
        (log m binary search iterations, each checking O(n × mid) characters)
    Space Complexity: O(1)

    Best for: Unique interview approach, demonstrating binary search thinking.
    Note: Often less efficient than vertical scanning in practice.
    """
    if not strs:
        return ""

    # Find the minimum string length - the LCP cannot be longer
    min_len = min(len(s) for s in strs)

    # Binary search on the length of the prefix
    left, right = 0, min_len

    def is_common_prefix(length: int) -> bool:
        """Check if strs[0][:length] is a prefix of all strings."""
        prefix = strs[0][:length]
        return all(s.startswith(prefix) for s in strs)

    while left <= right:
        mid = (left + right) // 2
        if is_common_prefix(mid):
            left = mid + 1   # This length works, try longer
        else:
            right = mid - 1  # This length doesn't work, try shorter

    # 'right' is the longest valid prefix length
    return strs[0][:right] if right >= 0 else ""


# More efficient version avoiding repeated string slicing:
def longest_common_prefix_binary_search_optimized(strs: list[str]) -> str:
    """
    Optimized binary search that compares characters directly
    instead of building a prefix slice for every check.
    """
    if not strs:
        return ""

    min_len = min(len(s) for s in strs)
    if min_len == 0:
        return ""

    # Binary search on the index of the last matching position
    left, right = 0, min_len - 1

    while left <= right:
        mid = (left + right) // 2

        # Check that every string matches strs[0] on all positions [0, mid]
        all_match = all(
            s[i] == strs[0][i]
            for s in strs[1:]
            for i in range(mid + 1)
        )

        if all_match:
            left = mid + 1   # Positions up to mid agree, try a longer prefix
        else:
            right = mid - 1  # Mismatch within [0, mid], try a shorter prefix

    # 'right' is the index of the last matching position (-1 if none)
    return strs[0][:right + 1] if right >= 0 else ""


# Binary Search Visualization:
#
# strs = ["flower", "flow", "flight"]
# min_len = 4 (length of "flow")
#
# Binary search on length [0, 4]:
#
# left=0, right=4, mid=2
#   Is "fl" a prefix of all? Yes -> search [3, 4]
#
# left=3, right=4, mid=3
#   Is "flo" a prefix of all? No ("flight" has "fli") -> search [3, 2]
#
# left=3, right=2 -> loop ends
#
# Return strs[0][:2] = "fl"
```

While elegant, binary search on prefix length is often slower in practice than vertical scanning. Each binary search iteration is O(n × mid), and we need O(log m) iterations. The total is O(n × m × log m), worse than vertical scanning's O(n × m) for short LCPs.
When Binary Search Makes Sense:

- When checking "is this prefix shared by all strings?" is cheap relative to scanning, for example when a prefix oracle or index already exists
- As an interview discussion point, to demonstrate binary-search-on-the-answer thinking
In general interviews, present vertical scanning first, then mention binary search as an alternative to demonstrate algorithmic versatility.
Now we arrive at the trie-based solution—elegant, intuitive, and powerful for multiple LCP queries.
Key Insight:
When we insert all strings into a trie, the LCP corresponds to the path from the root until we encounter either:

- a node with more than one child (the strings diverge), or
- a node that marks the end of a word (the shortest string is a prefix of the others).
The trie naturally represents shared structure—LCP is literally walking down the shared path.
Strings: ["flower", "flow", "flight"]

```
Trie structure:

            (root)
              │
              f
              │
              l
             / \
            o   i
            │   │
            w   g
           / \  │
         (·)  e h
              │ │
              r t
              │ │
            (·) (·)

(·) = end of word marker
```

Finding LCP:

1. Start at the root
2. 'f': the root's only child, append it → LCP so far: "f"
3. 'l': the only child of 'f', append it → LCP so far: "fl"
4. The 'l' node has TWO children ('o' and 'i'), so STOP
5. Return "fl"

The branching point reveals the LCP!
```python
class TrieNode:
    """Trie node for LCP computation."""
    __slots__ = ['children', 'is_end', 'count']

    def __init__(self):
        self.children: dict[str, 'TrieNode'] = {}
        self.is_end: bool = False   # Marks end of a word
        self.count: int = 0         # Number of words passing through


class LCPTrie:
    """
    Trie optimized for Longest Common Prefix queries.

    Build Time: O(S) where S = sum of all string lengths
    LCP Query Time: O(m) where m = LCP length
    Space: O(S) worst case
    """

    def __init__(self):
        self.root = TrieNode()
        self.word_count = 0

    def insert(self, word: str) -> None:
        """Insert a word into the trie."""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.count += 1
        node.is_end = True
        self.word_count += 1

    def find_lcp(self) -> str:
        """
        Find the longest common prefix of all inserted words.

        Traverse the trie until:
          1. Node has multiple children (divergence point)
          2. Node is end of word (one word is a prefix of another)
          3. Node has no children (reached end of all words)
        """
        if self.word_count == 0:
            return ""

        lcp = []
        node = self.root

        while True:
            # Stop if node marks end of word (shortest word ends here)
            if node.is_end:
                break
            # Stop if not exactly one child (strings diverge, or no children)
            if len(node.children) != 1:
                break
            # Exactly one child - this character is part of the LCP
            char = next(iter(node.children))
            lcp.append(char)
            node = node.children[char]

        return ''.join(lcp)


def longest_common_prefix_trie(strs: list[str]) -> str:
    """
    Find LCP using a trie.

    Time: O(S) for building + O(m) for query = O(S)
    Space: O(S) for the trie

    Best for: Multiple LCP queries after the initial build,
        or when a trie is already built for other purposes.
    """
    if not strs:
        return ""

    # Handle empty strings
    if any(s == "" for s in strs):
        return ""

    trie = LCPTrie()
    for word in strs:
        trie.insert(word)

    return trie.find_lcp()


# Example usage:
strs = ["flower", "flow", "flight"]
print(longest_common_prefix_trie(strs))  # Output: "fl"

# The trie also supports:
#   - Adding more strings and recomputing the LCP
#   - LCP for subsets of strings (with count-based filtering)
#   - Prefix existence queries
```

Use a trie when: (1) you need the LCP for the same dataset multiple times, (2) you're incrementally adding strings and tracking the LCP, (3) you already have a trie for autocomplete or other purposes, (4) you need additional operations like prefix search alongside LCP.
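As a small usage sketch of the incremental case (the words below are just illustrative, and it assumes the LCPTrie class defined above is in scope):

```python
# Assumes LCPTrie from the block above is in scope.
trie = LCPTrie()

trie.insert("interview")
trie.insert("internal")
print(trie.find_lcp())   # "inter" - LCP of everything inserted so far

trie.insert("internet")
print(trie.find_lcp())   # "inter" - the new word does not change it

trie.insert("integer")
print(trie.find_lcp())   # "inte"  - the divergence point moves up

trie.insert("in")
print(trie.find_lcp())   # "in"    - a word that is itself a prefix caps the LCP
```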
Let's summarize all approaches with a detailed comparison to guide your choice:
| Approach | Time | Space | Strengths | Weaknesses |
|---|---|---|---|---|
| Horizontal Scan | O(S) | O(1) | Simple, minimal space | May do extra work |
| Vertical Scan | O(S), early exit | O(1) | Early termination, optimal for short LCP | Sequential only |
| Divide & Conquer | O(S) | O(m log n) | Parallelizable, elegant | Stack space, overhead |
| Binary Search | O(S log m) | O(1) | Creative, works with prefix oracles | Often slower |
| Trie | O(S)+O(m) | O(S) | Reusable, supports other queries | Memory overhead |
Decision Framework:
┌─────────────────────────────────────────────────────────────┐
│ LCP Algorithm Selection │
└─────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────┐
│ Single query or │
│ multiple queries?│
└────────┬─────────┘
│
┌─────────────────┴─────────────────┐
▼ ▼
┌─────────────┐ ┌─────────────┐
│ SINGLE │ │ MULTIPLE │
│ QUERY │ │ QUERIES │
└──────┬──────┘ └──────┬──────┘
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ Need parallel │ │ Use TRIE │
│ computation? │ │ Build once, │
└──────┬────────┘ │ query O(m) │
│ └───────────────┘
┌─────┴─────┐
▼ ▼
┌───┐ ┌───┐
│YES│ │NO │
└─┬─┘ └─┬─┘
│ │
▼ ▼
┌────────┐ ┌───────────────────────┐
│ D&C │ │ Vertical Scanning │
│ (best │ │ (best single-threaded)│
│ for │ │ │
│ multi- │ │ │
│ core) │ │ │
└────────┘ └───────────────────────┘
For interviews: Start with Vertical Scanning (simple, efficient). Then mention Trie if asked about extensions. Discuss Divide & Conquer if parallelism comes up. Show Binary Search to demonstrate algorithmic creativity if time permits.
LCP isn't just an academic exercise—it solves real engineering problems:
- Command-line tools: given `cp /home/user/docs/a.txt /home/user/docs/b.txt ...`, extract the shared directory `/home/user/docs/`.
- URL routing: `/api/users/123` matches `/api/users/:id` over `/api`, because routers prefer the longest matching prefix (see the sketch below).
- Autocomplete: given ['interview', 'internal', 'internet'], we can show 'inter' with confidence.
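For the routing case, a minimal sketch of segment-wise longest-prefix matching might look like the following. The route table and the match_route helper are hypothetical, not any specific framework's API:

```python
def match_route(path: str, routes: list[str]) -> str | None:
    """
    Return the registered route with the longest segment-wise prefix
    match for `path`. Matching whole segments prevents "/api" from
    accidentally matching "/apiary"; ":name" segments match anything.
    """
    path_parts = path.strip("/").split("/")
    best, best_len = None, -1
    for route in routes:
        route_parts = route.strip("/").split("/")
        if len(route_parts) > len(path_parts):
            continue  # route is more specific than the request path
        matched = 0
        for r, p in zip(route_parts, path_parts):
            if r == p or r.startswith(":"):
                matched += 1
            else:
                break
        if matched == len(route_parts) and matched > best_len:
            best, best_len = route, matched
    return best


routes = ["/api", "/api/users/:id", "/static"]
print(match_route("/api/users/123", routes))  # /api/users/:id  (longest prefix wins)
print(match_route("/api/health", routes))     # /api
```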
```python
# Application: Find common directory to simplify paths
def simplify_paths(file_paths: list[str]) -> tuple[str, list[str]]:
    """
    Given file paths, extract the common prefix and relative paths.

    Example:
        Input:  ["/home/user/docs/a.txt",
                 "/home/user/docs/b.txt",
                 "/home/user/docs/sub/c.txt"]
        Output: ("/home/user/docs/", ["a.txt", "b.txt", "sub/c.txt"])
    """
    if not file_paths:
        return "", []

    # Split paths into components
    split_paths = [p.split('/') for p in file_paths]

    # Find LCP of path components
    lcp_components = []
    min_len = min(len(p) for p in split_paths)

    for i in range(min_len):
        component = split_paths[0][i]
        if all(p[i] == component for p in split_paths):
            lcp_components.append(component)
        else:
            break

    # Ensure we don't include partial directory names
    common_prefix = '/'.join(lcp_components)
    if common_prefix and not common_prefix.endswith('/'):
        common_prefix += '/'

    # Generate relative paths
    relative_paths = [
        p[len(common_prefix):] if p.startswith(common_prefix) else p
        for p in file_paths
    ]

    return common_prefix, relative_paths


# Usage
paths = [
    "/home/user/projects/myapp/src/main.py",
    "/home/user/projects/myapp/src/utils.py",
    "/home/user/projects/myapp/tests/test_main.py",
]
common, relative = simplify_paths(paths)
print(f"Common prefix: {common}")
print(f"Relative paths: {relative}")

# Output:
# Common prefix: /home/user/projects/myapp/
# Relative paths: ['src/main.py', 'src/utils.py', 'tests/test_main.py']
```

We've explored the Longest Common Prefix problem from multiple angles, building a complete understanding of this fundamental string pattern.
You now understand the Longest Common Prefix problem deeply—multiple algorithms, their trade-offs, and real-world applications. Next, we'll explore Word Break Problems, another critical trie pattern where dynamic programming meets string processing.