In the previous page, we explored the architecture of autocomplete systems and established why tries are the data structure of choice. Now we confront the core algorithmic challenge: given a prefix typed by the user, how do we efficiently retrieve all words that begin with that prefix?
This operation—prefix-based word suggestion—is the beating heart of any autocomplete system. It must be fast enough to run on every keystroke, correct for every edge case, and scalable to large dictionaries.
The solution involves two distinct phases: navigation (finding the prefix node) and collection (gathering all words in the subtree). Each phase has its own complexity characteristics and optimization opportunities.
By the end of this page, you will understand how to navigate from the root to any prefix node, implement complete subtree collection using DFS, analyze time and space complexity of word collection, handle edge cases robustly, and optimize collection for practical use cases.
Suggesting all words with a prefix is a two-phase process:
Phase 1: Navigate to the Prefix Node
Starting from the root, follow the path defined by each character in the prefix. If at any point a character's child doesn't exist, the prefix has no matches in the dictionary.
Phase 2: Collect All Words in the Subtree
Once at the prefix node, traverse the entire subtree rooted at that node. Every node marked as an end-of-word represents a complete word starting with our prefix.
Let's trace through an example to build intuition:
Example: Finding all words starting with 'pro'
Consider a trie containing: [program, programmer, programming, progress, project, promise, protect]
```
root
└─ p─r─o                        ← prefix node for 'pro'
   ├─ g─r
   │    ├─ a─m ∎                program
   │    │    └─ m
   │    │       ├─ e─r ∎        programmer
   │    │       └─ i─n─g ∎      programming
   │    └─ e─s─s ∎              progress
   ├─ j─e─c─t ∎                 project
   ├─ m─i─s─e ∎                 promise
   └─ t─e─c─t ∎                 protect
```
Phase 1: Navigate p → r → o (3 steps).
Phase 2: DFS from the 'o' node, collecting: program, programmer, programming, progress, project, promise, protect.
Note that Phase 1 is O(m) where m is prefix length, independent of dictionary size. Phase 2's cost depends on the number of words that match.
Think of the prefix node as a gateway. Everything 'beyond' it (in the subtree) matches the prefix. Everything 'beside' it (sibling branches) doesn't. The trie structure naturally partitions the dictionary based on prefixes, which is exactly why it's perfect for this task.
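Before diving into each phase in detail, the two-phase process can be reproduced end to end with a compact, self-contained sketch. The node shape and function names here (`TNode`, `insert`, `suggest`) are illustrative, not the implementation developed later on this page:

```typescript
// Minimal trie: a node stores its children and, on end-of-word
// nodes, the complete word.
class TNode {
  children = new Map<string, TNode>();
  word: string | null = null; // non-null only on end-of-word nodes
}

function insert(root: TNode, word: string): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) cur.children.set(ch, new TNode());
    cur = cur.children.get(ch)!;
  }
  cur.word = word;
}

// Phase 1: walk from the root along the prefix characters.
function findNode(root: TNode, prefix: string): TNode | null {
  let cur = root;
  for (const ch of prefix) {
    const next = cur.children.get(ch);
    if (!next) return null; // prefix not present
    cur = next;
  }
  return cur;
}

// Phase 2: DFS-collect every word in the subtree.
function suggest(root: TNode, prefix: string): string[] {
  const start = findNode(root, prefix);
  if (!start) return [];
  const results: string[] = [];
  const stack: TNode[] = [start];
  while (stack.length > 0) {
    const n = stack.pop()!;
    if (n.word !== null) results.push(n.word);
    for (const child of n.children.values()) stack.push(child);
  }
  return results;
}

const root = new TNode();
for (const w of ['program', 'programmer', 'programming', 'progress',
                 'project', 'promise', 'protect']) {
  insert(root, w);
}

console.log(suggest(root, 'pro').length); // 7 — all words match 'pro'
console.log(suggest(root, 'xyz'));       // [] — navigation fails early
```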
The navigation phase is straightforward but deserves careful attention because it's where we detect invalid prefixes early.
The Algorithm:
1. Start at the root node.
2. For each character c in the prefix:
   - Look up c in the current node's children.
   - If no child exists for c, return null: no word in the dictionary starts with this prefix.
   - Otherwise, move to that child.
3. After consuming every character, the current node is the prefix node.
Key Properties: the walk takes exactly one step per prefix character (O(m)), and an invalid prefix is detected at the first missing child rather than after scanning the dictionary.
```typescript
class TrieNode {
  children: Map<string, TrieNode> = new Map();
  isEndOfWord: boolean = false;
  word: string | null = null;
}

/**
 * Navigate to the node representing the given prefix.
 *
 * @param root - The root node of the trie
 * @param prefix - The prefix to search for
 * @returns The prefix node, or null if prefix doesn't exist
 *
 * Time Complexity: O(m) where m is prefix length
 * Space Complexity: O(1)
 */
function findPrefixNode(root: TrieNode, prefix: string): TrieNode | null {
  let current = root;

  for (const char of prefix) {
    const child = current.children.get(char);
    if (!child) {
      // This character doesn't exist at this position:
      // no words in the trie start with this prefix
      return null;
    }
    current = child;
  }

  // We've successfully traversed all prefix characters;
  // 'current' is now the prefix node
  return current;
}

// Example usage
const trie = buildTrie(['program', 'programmer', 'progress', 'project']);

const proNode = findPrefixNode(trie.root, 'pro');
// proNode !== null; all words starting with 'pro' are in this subtree

const xyzNode = findPrefixNode(trie.root, 'xyz');
// xyzNode === null; no words start with 'xyz'

const emptyNode = findPrefixNode(trie.root, '');
// emptyNode === root; empty prefix matches everything
```

Edge Cases to Handle:
Empty prefix: Returns the root node. The subtree is the entire trie (all words).
Single character prefix: Works identically; navigates one step from root.
Prefix that is a complete word: The prefix node may itself have isEndOfWord = true. This word should be included in results.
Prefix longer than any word: If the trie contains 'cat' but you search for 'catalog', navigation fails at 'a' (no 'l' child).
Case sensitivity: Decide in advance whether 'A' and 'a' are the same. Typically, normalize to lowercase during both insertion and search.
Whatever normalization you apply during navigation (lowercase, accent stripping, etc.) must match what you did during insertion. Inconsistent normalization is a common bug that causes 'missing' words that are actually present under different casing.
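The earlier example calls `buildTrie`, which this page never defines. A minimal sketch (the name and return shape are assumptions) that bakes the same normalization into both insertion and lookup, so mismatched casing cannot hide words:

```typescript
class TrieNode {
  children: Map<string, TrieNode> = new Map();
  isEndOfWord: boolean = false;
  word: string | null = null;
}

// One shared normalizer: applied identically on insert and on search.
const normalize = (s: string): string => s.toLowerCase();

// Hypothetical buildTrie helper, sketched for illustration.
function buildTrie(words: string[]): { root: TrieNode } {
  const root = new TrieNode();
  for (const raw of words) {
    const word = normalize(raw); // normalize at insertion...
    let cur = root;
    for (const ch of word) {
      if (!cur.children.has(ch)) cur.children.set(ch, new TrieNode());
      cur = cur.children.get(ch)!;
    }
    cur.isEndOfWord = true;
    cur.word = word;
  }
  return { root };
}

function hasPrefix(root: TrieNode, prefix: string): boolean {
  let cur = root;
  for (const ch of normalize(prefix)) { // ...and at search
    const next = cur.children.get(ch);
    if (!next) return false;
    cur = next;
  }
  return true;
}

const { root } = buildTrie(['Program', 'PROGRESS']);
console.log(hasPrefix(root, 'proG')); // true — both sides normalized
```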
Once we've navigated to the prefix node, we need to collect all words in its subtree. This is a classic tree traversal problem, and Depth-First Search (DFS) is the natural choice.
Why DFS? It naturally follows each root-to-leaf path, extending one word at a time as it descends, and its memory footprint is only O(h) for the recursion stack, where h is the maximum word length in the subtree.
The Basic DFS Approach:
```typescript
/**
 * Collect all words in a trie subtree using DFS.
 * This version stores the complete word in each end-of-word node.
 *
 * Time Complexity: O(n) where n is the number of nodes in the subtree
 * Space Complexity: O(h) for recursion stack, where h is max word length
 */
function collectWordsStored(node: TrieNode, results: string[]): void {
  // If this node marks the end of a word, add it to results
  if (node.isEndOfWord && node.word !== null) {
    results.push(node.word);
  }

  // Recursively collect from all children
  for (const child of node.children.values()) {
    collectWordsStored(child, results);
  }
}

/**
 * Alternative: Build words character by character during traversal.
 * Useful when words aren't stored in nodes.
 *
 * Time Complexity: O(n) for traversal + O(total_chars) for string building
 * Space Complexity: O(h) for recursion + O(h) for current path
 */
function collectWordsBuild(
  node: TrieNode,
  currentPath: string,
  results: string[]
): void {
  if (node.isEndOfWord) {
    results.push(currentPath);
  }

  for (const [char, child] of node.children.entries()) {
    collectWordsBuild(child, currentPath + char, results);
  }
}

/**
 * Main function: Get all words starting with a prefix
 */
function getAllWordsWithPrefix(root: TrieNode, prefix: string): string[] {
  const prefixNode = findPrefixNode(root, prefix);

  if (prefixNode === null) {
    return []; // No words match this prefix
  }

  const results: string[] = [];

  // Option 1: If words are stored in nodes
  collectWordsStored(prefixNode, results);

  // Option 2: If building words during traversal
  // collectWordsBuild(prefixNode, prefix, results);

  return results;
}
```

Stored vs Built Words:
There are two common approaches for retrieving the actual word strings:
| Approach | Pros | Cons |
|---|---|---|
| Store word in node | Fast retrieval (O(1) per word) | Higher memory usage |
| Build during traversal | Lower memory | String concatenation overhead |
For autocomplete, storing words in nodes is usually preferred because retrieval is O(1) per word, no string concatenation happens on the hot path, and the extra memory is modest when suggestion lists are capped at a handful of results.
Recursion can cause stack overflow for very deep tries (words with thousands of characters). An iterative approach using an explicit stack avoids this. The logic is identical: push the root, pop nodes one by one, process them, push their children. This is rarely needed for natural language but essential for applications like long DNA sequences.
Understanding the complexity of prefix-based word retrieval is crucial for predicting system behavior and making informed design decisions.
Let's define our variables:
- m = length of the prefix
- n = number of nodes in the entire trie
- k = number of words matching the prefix
- s = number of nodes in the subtree rooted at the prefix node
- L = maximum word length

| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Navigate to prefix | O(m) | O(1) | m = prefix length, independent of dictionary size |
| Collect all words (DFS) | O(s) | O(L) | s = subtree size; L = max depth for recursion stack |
| Total for getAllWordsWithPrefix | O(m + s) | O(L + k) | k words in result array |
Detailed Breakdown:
Navigation: O(m)
We traverse exactly m edges in the trie, one per character in the prefix. This is independent of the dictionary size. Whether your dictionary has 1,000 or 1,000,000 words, navigating to 'prog' takes 4 steps.
Collection: O(s)
We visit every node in the subtree rooted at the prefix node. Each node is visited exactly once. The number of nodes s depends on how many words share the prefix and how much of their suffixes those words share with each other.
Space for Recursion: O(L)
The recursion depth is at most the length of the longest word in the subtree. For natural language, this is typically 20-30 characters. For other applications (URLs, file paths), it could be much longer.
Space for Results: O(k × average_word_length)
We store k words, each requiring space proportional to its length. If words are stored in nodes, this is just O(k) for the array of references.
Short prefixes match many words. If your dictionary has 500,000 words and 30% start with 's', searching for prefix 's' means navigating 1 step (fast!) then collecting 150,000 words (slow!). This is why autocomplete systems don't fetch all matches—they limit results and use ranking instead.
Worst Case Analysis:
The worst case for collection occurs when the prefix is very short (or empty) and nearly every word in the dictionary matches, so the subtree size s approaches the total node count n.
Example: prefix '' (empty) on an English dictionary: navigation takes 0 steps, but collection must visit every node and return every word. This is essentially "return the entire dictionary," which is never what users want but highlights why we need limits.
Best Case Analysis:
The best case occurs when the prefix doesn't exist in the trie: navigation fails at the first missing child and we return an empty result immediately.
Time: O(m) for navigation, O(1) for empty result.
Practical Average Case:
For a well-distributed dictionary with typical autocomplete usage, a prefix of three or four characters usually narrows the match set to tens of words rather than thousands, so collection cost is dominated by the handful of results actually returned.
While recursive DFS is elegant, an iterative version using an explicit stack is sometimes necessary. This avoids potential stack overflow for deep tries and gives you more control over the traversal.
When to Use Iterative: very deep tries where recursion could overflow the stack, traversals that need early termination by count or time budget, or situations where you want to pause and resume collection.
```typescript
/**
 * Collect words using iterative DFS with explicit stack.
 * More control, no recursion depth limits.
 *
 * Time Complexity: O(s) where s is subtree size
 * Space Complexity: the explicit stack can hold up to O(h × b) nodes,
 * where h is max depth and b is max branching factor
 */
function collectWordsIterative(prefixNode: TrieNode): string[] {
  const results: string[] = [];
  const stack: TrieNode[] = [prefixNode];

  while (stack.length > 0) {
    const current = stack.pop()!;

    // Process current node
    if (current.isEndOfWord && current.word !== null) {
      results.push(current.word);
    }

    // Add children to stack
    // Note: Order of traversal depends on iteration order of children
    for (const child of current.children.values()) {
      stack.push(child);
    }
  }

  return results;
}

/**
 * Collect with a maximum count limit.
 * Stops early once we have enough results.
 *
 * Useful for autocomplete where we only need top-K results.
 */
function collectWordsWithLimit(prefixNode: TrieNode, limit: number): string[] {
  const results: string[] = [];
  const stack: TrieNode[] = [prefixNode];

  while (stack.length > 0 && results.length < limit) {
    const current = stack.pop()!;

    if (current.isEndOfWord && current.word !== null) {
      results.push(current.word);
      if (results.length >= limit) {
        break; // Early termination
      }
    }

    for (const child of current.children.values()) {
      stack.push(child);
    }
  }

  return results;
}

/**
 * Collect with a time limit (for responsive systems).
 * Ensures we don't block too long on large subtrees.
 */
function collectWordsWithTimeout(
  prefixNode: TrieNode,
  maxTimeMs: number
): string[] {
  const results: string[] = [];
  const stack: TrieNode[] = [prefixNode];
  const startTime = performance.now();
  let processed = 0;

  while (stack.length > 0) {
    // Check the clock periodically (every 100 nodes processed)
    if (processed++ % 100 === 0 &&
        performance.now() - startTime > maxTimeMs) {
      console.warn('Collection timeout - returning partial results');
      break;
    }

    const current = stack.pop()!;

    if (current.isEndOfWord && current.word !== null) {
      results.push(current.word);
    }

    for (const child of current.children.values()) {
      stack.push(child);
    }
  }

  return results;
}
```

Using a queue instead of a stack gives Breadth-First Search. BFS explores nodes level by level, which means shorter words are found before longer ones. This can be useful if you want to prioritize shorter completions. Simply replace stack.pop() with queue.shift() (or use a proper deque for O(1) operations).
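The queue-based BFS variant just described can be sketched as follows; instead of `shift()` (which is O(n) on arrays), a moving head index keeps dequeue O(1). The helper `insert` and node shape are assumptions for the demo:

```typescript
interface TrieNode {
  children: Map<string, TrieNode>;
  isEndOfWord: boolean;
  word: string | null;
}

// BFS collection: level-by-level traversal yields shorter words
// before longer ones.
function collectWordsBFS(prefixNode: TrieNode): string[] {
  const results: string[] = [];
  const queue: TrieNode[] = [prefixNode];
  let head = 0; // moving head index avoids O(n) Array.shift()

  while (head < queue.length) {
    const current = queue[head++];
    if (current.isEndOfWord && current.word !== null) {
      results.push(current.word);
    }
    for (const child of current.children.values()) {
      queue.push(child);
    }
  }
  return results;
}

// Tiny helper to build a demo trie (illustrative only)
function insert(root: TrieNode, word: string): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) {
      cur.children.set(ch, { children: new Map(), isEndOfWord: false, word: null });
    }
    cur = cur.children.get(ch)!;
  }
  cur.isEndOfWord = true;
  cur.word = word;
}

const root: TrieNode = { children: new Map(), isEndOfWord: false, word: null };
['to', 'tea', 'teapot'].forEach(w => insert(root, w));

console.log(collectWordsBFS(root)); // ['to', 'tea', 'teapot'] — shortest first
```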
Robust autocomplete implementations must handle various edge cases gracefully. Let's examine each one in detail.
Edge Case 1: Empty Prefix
When the user hasn't typed anything yet, the prefix is empty. This should return all words in the dictionary (or trigger a different behavior like showing recent searches).
```typescript
/**
 * Handle empty prefix case.
 * Empty prefix = start from root, all words match.
 */
function getWordsWithPrefix(root: TrieNode, prefix: string): string[] {
  if (prefix === '') {
    // Decision point: return all words or handle specially?

    // Option 1: Return all (could be slow for large dictionaries)
    const results: string[] = [];
    collectWordsStored(root, results);
    return results;

    // Option 2: Return empty (let caller show recent/popular)
    // return [];

    // Option 3: Return most popular N words
    // return getMostPopular(root, 10);
  }

  const prefixNode = findPrefixNode(root, prefix);
  if (!prefixNode) return [];

  const results: string[] = [];
  collectWordsStored(prefixNode, results);
  return results;
}

/**
 * Edge Case 2: Prefix that is itself a complete word.
 * Example: prefix "program" when "program" is a word.
 * The prefix node should be included if it marks end-of-word.
 */
function prefixIsWord(root: TrieNode, prefix: string): boolean {
  const prefixNode = findPrefixNode(root, prefix);
  return prefixNode !== null && prefixNode.isEndOfWord;
}
// This is automatically handled if collectWords checks isEndOfWord

/**
 * Edge Case 3: Non-existent prefix.
 * findPrefixNode returns null, we return empty array.
 */
function handleNonExistentPrefix(): void {
  const words = getWordsWithPrefix(trieRoot, 'xyz');
  console.log(words); // [] - empty, not an error
}

/**
 * Edge Case 4: Special characters and mixed case.
 * Normalize consistently in both insert and search.
 */
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()                    // Case insensitive
    .normalize('NFD')                 // Decompose accented characters
    .replace(/[\u0300-\u036f]/g, '')  // Remove diacritics (combining marks)
    .replace(/[^a-z0-9]/g, '');       // Remove special chars (optional)
}

/**
 * Edge Case 5: Unicode and emoji.
 * Ensure character iteration handles multi-byte correctly.
 */
function iterateCharacters(str: string): string[] {
  // Using spread operator handles Unicode surrogates correctly
  return [...str];
}
// "café"    → ['c', 'a', 'f', 'é']
// "🎉hello" → ['🎉', 'h', 'e', 'l', 'l', 'o']
```

Always validate input at the API boundary. A production autocomplete endpoint should handle null, undefined, extremely long strings, and malformed input gracefully—returning empty results rather than throwing errors.
Sometimes you want results in a specific order—typically alphabetical. The trie structure can produce alphabetically sorted results naturally, with slight modifications to the traversal.
Key Insight:
A trie already organizes words by their prefixes. If we traverse children in alphabetical order, we visit nodes in a way that produces words in alphabetical order.
Requirements for Alphabetical Collection:
```typescript
class SortedTrieNode {
  // Using a sorted array of [char, node] pairs for ordered iteration
  children: [string, SortedTrieNode][] = [];
  isEndOfWord: boolean = false;
  word: string | null = null;

  getChild(char: string): SortedTrieNode | undefined {
    // Binary search for O(log k) lookup
    let left = 0, right = this.children.length - 1;
    while (left <= right) {
      const mid = Math.floor((left + right) / 2);
      if (this.children[mid][0] === char) {
        return this.children[mid][1];
      } else if (this.children[mid][0] < char) {
        left = mid + 1;
      } else {
        right = mid - 1;
      }
    }
    return undefined;
  }

  addChild(char: string, node: SortedTrieNode): void {
    // Insert in sorted position
    let i = 0;
    while (i < this.children.length && this.children[i][0] < char) {
      i++;
    }
    this.children.splice(i, 0, [char, node]);
  }
}

/**
 * Collect words in alphabetical order.
 * Children are traversed in sorted order, producing sorted results.
 */
function collectWordsAlphabetical(
  node: SortedTrieNode,
  results: string[]
): void {
  // Pre-order: add word first, then explore children
  if (node.isEndOfWord && node.word !== null) {
    results.push(node.word);
  }

  // Children are already sorted, iterate in order
  for (const [, child] of node.children) {
    collectWordsAlphabetical(child, results);
  }
}

// Alternative: Sort after collection (simpler but less efficient)
function collectThenSort(prefixNode: TrieNode): string[] {
  const results: string[] = [];
  collectWordsStored(prefixNode, results);
  return results.sort((a, b) => a.localeCompare(b));
}
```

Trade-offs:
| Approach | Insert Time | Lookup Time | Collection Order | Use Case |
|---|---|---|---|---|
| Hash Map children | O(1) | O(1) | Unordered | Maximum speed, sort later |
| Sorted Array children | O(k) | O(log k) | Alphabetical | Always need sorted output |
| TreeMap children | O(log k) | O(log k) | Alphabetical | Balanced performance |
where k is the number of children (alphabet size, typically 26-256).
For autocomplete, hash-based children with post-collection sorting is usually best because inserts and lookups stay O(1), and sorting only the few results you actually display costs far less than keeping every child list sorted on every insert.
For non-ASCII text, 'alphabetical' order depends on locale. 'ñ' comes after 'n' in Spanish but after 'z' in English sorting. Use locale-aware comparison like Intl.Collator in JavaScript or locale.strxfrm() in Python for proper international sorting.
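The locale point can be demonstrated with `Intl.Collator`. Plain code-point comparison (JavaScript's default array sort) puts 'ñ' (U+00F1) after 'z' (U+007A), while Spanish collation places it between 'n' and 'o':

```typescript
const words = ['zeta', 'ñu', 'nube', 'oso'];

// Default sort compares UTF-16 code units: 'ñ' lands after 'z'
const codePointSorted = [...words].sort();

// Locale-aware sort: Spanish collation places ñ between n and o
const collator = new Intl.Collator('es');
const localeSorted = [...words].sort(collator.compare);

console.log(codePointSorted); // ['nube', 'oso', 'zeta', 'ñu']
console.log(localeSorted);    // ['nube', 'ñu', 'oso', 'zeta']
```

Node.js ships with full ICU data by default in modern versions, so locale-specific collation works out of the box; in other environments, verify the locale data is available.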
In practice, autocomplete shows only 5-10 suggestions. Collecting thousands of matches just to discard most of them is wasteful. Let's explore strategies for efficient result limiting.
Strategy 1: Early Termination
The simplest approach—stop collecting once you have enough results. Works well when you don't need ranking.
Strategy 2: Priority-Based Collection
If nodes store ranking information (popularity, recency), traverse in a way that finds the best results first.
Strategy 3: Bounded Heap
Use a min-heap of size K. As you traverse, add words to the heap. When full, only add if the new word is better than the worst in the heap.
```typescript
// Assume we have min-heap and max-heap implementations
import { MinHeap, MaxHeap } from './heap';

interface RankedWord {
  word: string;
  score: number;
}

/**
 * Strategy 3: Bounded heap for top-K collection.
 * Efficiently finds the K highest-scored words without
 * collecting all words first.
 *
 * Time: O(s log K) where s is subtree size
 * Space: O(K) for the heap
 */
function collectTopK(prefixNode: TrieNode, k: number): string[] {
  // Min-heap keeps track of top K (smallest of the top K at root)
  const minHeap = new MinHeap<RankedWord>(
    (a, b) => a.score - b.score // Compare by score
  );

  const stack: TrieNode[] = [prefixNode];

  while (stack.length > 0) {
    const current = stack.pop()!;

    if (current.isEndOfWord && current.word !== null) {
      const candidate: RankedWord = {
        word: current.word,
        score: current.frequency // Assume nodes store frequency
      };

      if (minHeap.size() < k) {
        // Haven't found k words yet, add unconditionally
        minHeap.push(candidate);
      } else if (candidate.score > minHeap.peek()!.score) {
        // This word is better than the worst in our top-K
        minHeap.pop();           // Remove worst
        minHeap.push(candidate); // Add better one
      }
      // If score <= worst in heap, skip this word
    }

    for (const child of current.children.values()) {
      stack.push(child);
    }
  }

  // Extract results (will be in heap order, reverse for descending)
  const results: string[] = [];
  while (minHeap.size() > 0) {
    results.push(minHeap.pop()!.word);
  }
  return results.reverse(); // Best first
}

/**
 * Optimization: Branch pruning with max-score metadata.
 * If nodes store the maximum score of any descendant,
 * we can skip entire branches that can't compete.
 */
interface TrieNodeWithMaxScore extends TrieNode {
  maxDescendantScore: number; // Max score in this subtree
}

function collectTopKOptimized(
  prefixNode: TrieNodeWithMaxScore,
  k: number
): string[] {
  const minHeap = new MinHeap<RankedWord>(
    (a, b) => a.score - b.score
  );

  // Use a priority queue for traversal - visit high-potential branches first
  const pq = new MaxHeap<TrieNodeWithMaxScore>(
    (a, b) => a.maxDescendantScore - b.maxDescendantScore
  );
  pq.push(prefixNode);

  while (pq.size() > 0) {
    const current = pq.pop()!;

    // Pruning: if this branch can't beat our worst top-K, skip it
    if (minHeap.size() >= k &&
        current.maxDescendantScore <= minHeap.peek()!.score) {
      continue; // Skip entire branch
    }

    if (current.isEndOfWord && current.word !== null) {
      const candidate: RankedWord = {
        word: current.word,
        score: current.frequency
      };

      if (minHeap.size() < k) {
        minHeap.push(candidate);
      } else if (candidate.score > minHeap.peek()!.score) {
        minHeap.pop();
        minHeap.push(candidate);
      }
    }

    for (const child of current.children.values()) {
      pq.push(child as TrieNodeWithMaxScore);
    }
  }

  const results: string[] = [];
  while (minHeap.size() > 0) {
    results.push(minHeap.pop()!.word);
  }
  return results.reverse();
}
```

| Strategy | Time Complexity | Quality | Implementation |
|---|---|---|---|
| Early termination | O(m + number_found) | Low (arbitrary results) | Simple |
| Collect all + sort + slice | O(m + s + s log s) | High (truly top-K) | Simple |
| Bounded heap (basic) | O(m + s log K) | High (truly top-K) | Medium |
| Bounded heap + pruning | O(m + visited log K) | High (truly top-K) | Complex |
Branch pruning with max-score metadata requires maintaining this metadata during insertion (additional work). It pays off when: the subtree is large, score distribution is skewed (few high-score branches), and queries frequently hit common prefixes. For small datasets, the overhead isn't worth it.
We've thoroughly explored how to suggest all words with a given prefix—the fundamental operation behind autocomplete. To consolidate: navigation to the prefix node is O(m) and independent of dictionary size; collection is a DFS over the subtree, O(s); edge cases (empty prefix, prefix-as-word, missing prefix, normalization) need explicit handling; and limiting strategies—early termination, bounded heaps, branch pruning—keep large result sets practical.
What's Next:
In the next page, we'll dive deeper into DFS from prefix node—specifically, the detailed mechanics of traversing trie subtrees. We'll explore different traversal orders, how to construct words during traversal, and advanced patterns for combining collection with filtering.
You now understand how to implement prefix-based word suggestion completely—from navigating to the prefix node through collecting all matching words. You can analyze the complexity, handle edge cases, and apply various limiting strategies. This knowledge forms the core of any autocomplete implementation.