On the surface, prefix matching seems trivial: given "prog", find all strings starting with "prog". But production typeahead systems face challenges that transform this simple problem into a complex engineering undertaking.
This page explores the full spectrum of prefix matching challenges and their solutions.
By the end of this page, you will understand how to handle multi-word queries, implement word-boundary prefix matching, normalize text for robust matching, handle international text (CJK, Arabic, etc.), and implement fuzzy matching for typo tolerance—all while maintaining sub-100ms latency.
Users type queries with multiple words. The typeahead system must understand how these words relate to potential suggestions.
When a user types "machine le", we need to determine: is this a prefix of "machine learning" or two separate terms "machine" and "le*"?
Strategy 1: Phrase Prefix (Last Word is Prefix)
Treat all words except the last as complete words, and only the last word as a prefix:
```typescript
function phrasePrefix(query: string): { completeWords: string[]; prefix: string } {
  // filter(Boolean) drops the empty string produced by splitting ""
  const words = query.trim().split(/\s+/).filter(Boolean);

  if (words.length === 0) {
    return { completeWords: [], prefix: '' };
  }
  if (words.length === 1) {
    return { completeWords: [], prefix: words[0] };
  }
  return {
    completeWords: words.slice(0, -1),
    prefix: words[words.length - 1],
  };
}

// "machine le" → complete: ["machine"], prefix: "le"
// Matches: "machine learning", "machine learning python"
// Does NOT match: "how machines learn" (word order matters)

function matchPhrasePrefix(
  suggestion: string,
  completeWords: string[],
  prefix: string
): boolean {
  const suggestionWords = suggestion.toLowerCase().split(/\s+/);

  // Check if complete words appear in order at the start
  for (let i = 0; i < completeWords.length; i++) {
    if (i >= suggestionWords.length) return false;
    if (suggestionWords[i] !== completeWords[i].toLowerCase()) return false;
  }

  // Check if prefix matches next word
  const nextWordIndex = completeWords.length;
  if (nextWordIndex >= suggestionWords.length) return false;
  return suggestionWords[nextWordIndex].startsWith(prefix.toLowerCase());
}
```

Strategy 2: All Words as Prefixes (Boolean AND)
Treat every word as a prefix and require all to match:
```typescript
function matchAllPrefixes(
  suggestion: string,
  queryWords: string[]
): boolean {
  const suggestionLower = suggestion.toLowerCase();
  const suggestionWords = suggestionLower.split(/\s+/);

  // Each query word must match as prefix of some suggestion word
  return queryWords.every(queryWord => {
    const qLower = queryWord.toLowerCase();
    return suggestionWords.some(sWord => sWord.startsWith(qLower));
  });
}

// "le mach" →
// Matches: "machine learning" (le→learning, mach→machine)
// Matches: "learn machine" (same reason)
// Matches: "machine learning for beginners" (same prefixes matched)

// This is more flexible but can over-match
// "py learn" might match "learn python" or "learning pygame"
```

Strategy 3: Hybrid with Position Weighting
Combine strategies with scoring that prefers phrase-order matches:
```typescript
interface MatchResult {
  matches: boolean;
  score: number;
  matchType: 'exact_phrase' | 'ordered_prefix' | 'unordered_prefix' | 'no_match';
}

function hybridMatch(suggestion: string, query: string): MatchResult {
  const queryWords = query.toLowerCase().trim().split(/\s+/);
  const suggestionWords = suggestion.toLowerCase().split(/\s+/);

  // Check phrase prefix (highest score)
  if (matchesPhrasePrefix(suggestionWords, queryWords)) {
    return { matches: true, score: 1.0, matchType: 'exact_phrase' };
  }

  // Check ordered prefix (medium score)
  if (matchesOrderedPrefixes(suggestionWords, queryWords)) {
    return { matches: true, score: 0.7, matchType: 'ordered_prefix' };
  }

  // Check unordered prefix (lowest score)
  if (matchesUnorderedPrefixes(suggestionWords, queryWords)) {
    return { matches: true, score: 0.4, matchType: 'unordered_prefix' };
  }

  return { matches: false, score: 0, matchType: 'no_match' };
}

// matchesPhrasePrefix and matchesUnorderedPrefixes follow the strategies
// shown earlier; only the ordered-prefix check is spelled out here.
function matchesOrderedPrefixes(
  suggestionWords: string[],
  queryWords: string[]
): boolean {
  // Query words must match in order, but not necessarily contiguously
  // "mach learn" should match "machine deep learning"
  let sIndex = 0;
  for (const qWord of queryWords) {
    // Find this query word prefix in remaining suggestion words
    let found = false;
    while (sIndex < suggestionWords.length) {
      if (suggestionWords[sIndex].startsWith(qWord)) {
        found = true;
        sIndex++;
        break;
      }
      sIndex++;
    }
    if (!found) return false;
  }
  return true;
}
```

Most production systems use phrase prefix as the primary strategy (it matches the user's mental model), with ordered-prefix as a fallback. Unordered matching is typically reserved for fuzzy search features rather than core autocomplete. Score-weight your strategies accordingly.
Text normalization ensures that superficially different strings match when semantically equivalent. This is critical for user experience—users shouldn't need to guess the exact casing or spelling used in the corpus.
The simplest but most impactful normalization:
```typescript
// Simple approach: lowercase everything
function normalizeCase(text: string): string {
  return text.toLowerCase();
}

// But beware of locale-specific issues!
// Turkish: 'I'.toLowerCase() should be 'ı' not 'i'
// German: 'ß'.toUpperCase() is 'SS'

function normalizeCaseLocale(text: string, locale: string = 'en'): string {
  return text.toLocaleLowerCase(locale);
}

// For search indices, often use case folding (a Unicode standard)
function caseFold(text: string): string {
  // NFKC normalization plus lowercasing approximates full case folding
  return text.normalize('NFKC').toLowerCase();
}
```

Handle accented characters for languages like French, Spanish, German:
```typescript
function removeDiacritics(text: string): string {
  // NFD: Decomposes characters (é → e + combining-acute)
  // Then remove the combining marks
  return text
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, ''); // Remove combining diacritical marks
}

// "café" → "cafe"
// "résumé" → "resume"
// "naïve" → "naive"
// "Zürich" → "Zurich"

// But preserve original for display!
interface NormalizedSuggestion {
  original: string;   // "Café Latte" - shown to user
  normalized: string; // "cafe latte" - used for matching
}

// Index stores both; search normalizes the query the same way
function search(query: string): Suggestion[] {
  const normalizedQuery = removeDiacritics(normalizeCase(query));
  return index.search(normalizedQuery); // Returns with original text
}
```

Unicode text can represent the same visual character in multiple ways. Normalization ensures consistent representation:
| Form | Name | Use Case |
|---|---|---|
| NFC | Canonical Composition | Most common; combines characters (e + ́ → é) |
| NFD | Canonical Decomposition | Separates characters (é → e + ́); useful for diacritic removal |
| NFKC | Compatibility Composition | Maps compatibility characters (fi → fi, ① → 1) |
| NFKD | Compatibility Decomposition | Most aggressive; use for search indices |
```typescript
function normalizeForIndex(text: string): string {
  return text
    // 1. Unicode normalization
    .normalize('NFKC')
    // 2. Case folding
    .toLocaleLowerCase('en')
    // 3. Remove diacritics
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    // 4. Normalize whitespace
    .replace(/\s+/g, ' ')
    .trim()
    // 5. Remove punctuation (optional, depends on use case)
    .replace(/[^\p{L}\p{N}\s]/gu, '');
}

// "  Café  Latte™ " → "cafe lattetm" (NFKC maps ™ to the letters "tm")
// "naïve résumé" → "naive resume"
// "Ｈｅｌｌｏ Ｗｏｒｌｄ" (full-width) → "hello world"
```

The same normalization must be applied to both the indexed suggestions AND the user query at search time. Mismatched normalization is a common source of "why doesn't my search work" bugs. Centralize normalization logic and use it everywhere.
Building a typeahead system for global users introduces fundamental challenges. Different languages have different concepts of "words," "prefixes," and even character boundaries.
These languages don't use spaces between words. A sentence like "机器学习" (machine learning) is four characters with no explicit word boundaries:
```typescript
// The problem: Where are the "words" in "机器学习"?
// 机器 (machine) + 学习 (learning) - but this requires linguistic knowledge

// Solution 1: Character-level prefix matching
// Simple but over-matches: "机器" matches "机器学习" and "机器人"
function cjkCharacterMatch(suggestion: string, query: string): boolean {
  return suggestion.includes(query); // Substring, not just prefix
}

// Solution 2: Dictionary-based segmentation
// Use libraries like jieba (Chinese), MeCab (Japanese), Okt (Korean)
import Segment from 'segment'; // Chinese word segmentation

const segmenter = new Segment();
segmenter.useDefault();

function segmentChinese(text: string): string[] {
  return segmenter.doSegment(text, { simple: true });
}

// "机器学习教程" → ["机器", "学习", "教程"]

// Solution 3: N-gram indexing
// Index all overlapping character sequences
function generateNgrams(text: string, n: number = 2): string[] {
  const ngrams: string[] = [];
  for (let i = 0; i <= text.length - n; i++) {
    ngrams.push(text.slice(i, i + n));
  }
  return ngrams;
}

// "学习" → ["学习"] (2-gram)
// Can match within "机器学习" without full segmentation
```

RTL languages have unique challenges for prefix matching:
```typescript
// Arabic example: "الترجمة" (translation)
// Visually rendered right-to-left, but stored in logical (typing) order

// No code change needed for the prefix itself - just understand that:
// - The user types alef (ا) first
// - The string is stored with alef at index 0
// - So prefix matching over the stored string works normally

// BUT: Arabic has complex letter forms based on position
// "ب" (bā') looks different at start, middle, end of word
// Initial: بـ  Medial: ـبـ  Final: ـب  Isolated: ب

// Solution: Normalize presentation forms to base letters for matching
function normalizeArabic(text: string): string {
  // Map presentation forms to base letters
  const arabicNormMap: Record<string, string> = {
    // ... mapping of presentation forms to base
  };
  return [...text]
    .map(char => arabicNormMap[char] || char)
    .join('');
}

// Hebrew considerations:
// - Final letter forms (ם vs מ, ן vs נ, etc.)
// - Vowel points (niqqud) typically stripped for search
```

Users may mix scripts in a single query ("Python チュートリアル" = Python tutorial in Japanese):
```typescript
function detectScripts(text: string): Set<string> {
  const scripts = new Set<string>();

  for (const char of text) {
    const code = char.charCodeAt(0);
    if (code >= 0x0000 && code <= 0x007F) scripts.add('latin');
    else if (code >= 0x4E00 && code <= 0x9FFF) scripts.add('cjk');
    else if (code >= 0x3040 && code <= 0x309F) scripts.add('hiragana');
    else if (code >= 0x30A0 && code <= 0x30FF) scripts.add('katakana');
    else if (code >= 0x0600 && code <= 0x06FF) scripts.add('arabic');
    else if (code >= 0x0590 && code <= 0x05FF) scripts.add('hebrew');
    else if (code >= 0xAC00 && code <= 0xD7AF) scripts.add('korean');
    // ... more scripts
  }

  return scripts;
}

function handleMixedScriptQuery(query: string): string[] {
  // Split into script-homogeneous segments
  // (getCharScript and normalizeForScript are per-script helpers)
  const segments: { script: string; text: string }[] = [];
  let currentScript = '';
  let currentText = '';

  for (const char of query) {
    const charScript = getCharScript(char);
    if (charScript !== currentScript && currentText) {
      segments.push({ script: currentScript, text: currentText });
      currentText = '';
    }
    currentScript = charScript;
    currentText += char;
  }
  if (currentText) {
    segments.push({ script: currentScript, text: currentText });
  }

  // Apply appropriate normalization per segment
  return segments.map(s => normalizeForScript(s.text, s.script));
}
```

For production internationalization, use the ICU library (International Components for Unicode). It provides locale-aware text segmentation, collation, and normalization. Available in most languages: icu4c (C/C++), icu4j (Java), Intl API (JavaScript). Don't reinvent linguistic wheels.
Users make typos. A typeahead system that only matches exact prefixes frustrates users who mis-type. Fuzzy matching allows "prgoramming" to match "programming."
The classic metric for string similarity: minimum edits (insert, delete, replace) to transform one string into another.
```typescript
function levenshteinDistance(a: string, b: string): number {
  const m = a.length;
  const n = b.length;

  // DP table: dp[i][j] = edit distance between a[0..i-1] and b[0..j-1]
  const dp: number[][] = Array(m + 1)
    .fill(null)
    .map(() => Array(n + 1).fill(0));

  // Base cases
  for (let i = 0; i <= m; i++) dp[i][0] = i; // Delete all from a
  for (let j = 0; j <= n; j++) dp[0][j] = j; // Insert all to get b

  // Fill DP table
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (a[i - 1] === b[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1]; // No edit needed
      } else {
        dp[i][j] = 1 + Math.min(
          dp[i - 1][j],     // Delete from a
          dp[i][j - 1],     // Insert into a
          dp[i - 1][j - 1]  // Replace
        );
      }
    }
  }

  return dp[m][n];
}

// "prgoramming" vs "programming": distance = 2 (transposed "og" costs two substitutions)
// "pyton" vs "python": distance = 1 (missing 'h')
```

Naive fuzzy matching is O(n × m × k) where n = corpus size, m = query length, k = average suggestion length. This is far too slow for real-time typeahead.
Build a finite automaton that accepts all strings within edit distance d of the query. Then traverse the trie in parallel with the automaton:
```typescript
// A Levenshtein automaton for query "cat" with distance 1
// accepts: "cat" (0), "cats" (1), "bat" (1), "at" (1), "cart" (1)
// rejects: "dog" (3), "catch" (2)

interface AutomatonState {
  // The state encodes how much of the query we've matched
  // and how many errors we've used
  position: number;
  errorCount: number;

  // Transitions on input character
  transitions: Map<string | 'any', AutomatonState[]>;

  // Can we accept (match) from this state?
  canMatch: boolean;
}

// buildLevenshteinAutomaton, step, and isDead are assumed helpers
function fuzzySearchWithAutomaton(
  trie: Trie,
  query: string,
  maxDistance: number
): Suggestion[] {
  const automaton = buildLevenshteinAutomaton(query, maxDistance);
  const results: Suggestion[] = [];

  function traverse(
    trieNode: TrieNode,
    autoState: AutomatonState
  ): void {
    // If automaton can match and trie node is terminal, we have a result
    if (autoState.canMatch && trieNode.isTerminal) {
      results.push({
        word: trieNode.word!,
        score: trieNode.metadata!.score
      });
    }

    // Try all trie children
    for (const [char, childTrie] of trieNode.children) {
      // Check what automaton states we can reach
      const nextStates = step(autoState, char);
      for (const nextState of nextStates) {
        if (!isDead(nextState)) {
          traverse(childTrie, nextState);
        }
      }
    }
  }

  traverse(trie.root, automaton.initialState);
  return results;
}

// Time complexity: O(m × |Σ| × d) to build automaton
// + O(k × d) per trie node visited, where d = max distance
// Much better than O(n × m × k) naive approach
```

Match based on how words sound, not how they're spelled. Useful for names and products:
```typescript
// Soundex: Encode words by their sound (English-centric)
function soundex(word: string): string {
  const codes: Record<string, string> = {
    'b': '1', 'f': '1', 'p': '1', 'v': '1',
    'c': '2', 'g': '2', 'j': '2', 'k': '2', 'q': '2', 's': '2', 'x': '2', 'z': '2',
    'd': '3', 't': '3',
    'l': '4',
    'm': '5', 'n': '5',
    'r': '6',
  };

  const first = word[0].toUpperCase();
  let code = first;
  let lastCode = codes[word[0].toLowerCase()] || '';

  for (let i = 1; i < word.length && code.length < 4; i++) {
    const c = word[i].toLowerCase();
    const newCode = codes[c];

    if (newCode && newCode !== lastCode) {
      code += newCode;
      lastCode = newCode;
    } else if (!newCode) {
      // Simplification: any un-coded letter (vowels, h, w, y) separates
      // consonants; strict Soundex treats h and w as non-separating
      lastCode = '';
    }
  }

  return code.padEnd(4, '0');
}

// "Robert" → "R163"
// "Rupert" → "R163" (same code!)
// "Programming" → "P626"
// "Programing" → "P626" (same code - handles misspelling)

// Build a phonetic index alongside the prefix index
interface DualIndex {
  prefixTrie: Trie;
  phoneticIndex: Map<string, string[]>; // Soundex → words
}

function fuzzySearch(query: string): Suggestion[] {
  const prefixMatches = prefixTrie.search(query);
  const phoneticCode = soundex(query);
  const phoneticMatches = phoneticIndex.get(phoneticCode) || [];

  // Merge and deduplicate
  return mergeResults(prefixMatches, phoneticMatches);
}
```

Production systems often use a tiered approach: exact prefix matches first (score 1.0), then edit distance 1 matches (score 0.8), then phonetic matches (score 0.5). This ensures exact matches always rank highest while still surfacing relevant fuzzy matches.
Users expect prefix matching to respect word boundaries. "learn" should match "machine learning" but perhaps not "relearn" (unless we explicitly want infix matching).
Strategy 1: First Word Prefix Only
Only match if the query is a prefix of the first word:
```typescript
function matchFirstWordPrefix(
  suggestion: string,
  query: string
): boolean {
  const firstWord = suggestion.split(/\s+/)[0];
  return firstWord.toLowerCase().startsWith(query.toLowerCase());
}

// Query: "mac"
// ✓ "MacBook Pro" (first word starts with mac)
// ✓ "Machine Learning" (first word starts with mac)
// ✗ "Apple MacBook" (first word is Apple)
```

Strategy 2: Any Word Prefix (Word Boundary)
Match if query is prefix of ANY word in the suggestion:
```typescript
function matchAnyWordPrefix(
  suggestion: string,
  query: string
): { matches: boolean; position: number } {
  const words = suggestion.split(/\s+/);
  const queryLower = query.toLowerCase();

  for (let i = 0; i < words.length; i++) {
    if (words[i].toLowerCase().startsWith(queryLower)) {
      return { matches: true, position: i };
    }
  }
  return { matches: false, position: -1 };
}

// Query: "learn"
// "Machine Learning" → (true, 1) - matches 2nd word
// "Learn Python" → (true, 0) - matches 1st word
// "Relearn Basics" → (false, -1) - "learn" is not a word prefix

// Use position for ranking: earlier position = higher score
function scoreByPosition(position: number): number {
  return position === 0 ? 1.0 : 1.0 / (position + 1);
}
```

Strategy 3: Multi-Index Approach
Build separate trie indices for different word positions:
```typescript
interface MultiPositionIndex {
  // Index by word position
  firstWordTrie: Trie; // Only first words
  allWordsTrie: Trie;  // All words with pointers to full suggestions

  // Posting lists: word → [(suggestionId, position), ...]
  wordToSuggestions: Map<string, { id: number; position: number }[]>;
}

function buildMultiPositionIndex(suggestions: string[]): MultiPositionIndex {
  const firstWordTrie = new Trie();
  const allWordsTrie = new Trie();
  const wordToSuggestions = new Map<string, { id: number; position: number }[]>();

  for (let id = 0; id < suggestions.length; id++) {
    const suggestion = suggestions[id];
    const words = suggestion.split(/\s+/);

    // Index first word
    firstWordTrie.insert(words[0].toLowerCase(), { suggestionId: id });

    // Index all words
    for (let pos = 0; pos < words.length; pos++) {
      const wordLower = words[pos].toLowerCase();
      allWordsTrie.insert(wordLower, { suggestionId: id, position: pos });

      // Posting list
      if (!wordToSuggestions.has(wordLower)) {
        wordToSuggestions.set(wordLower, []);
      }
      wordToSuggestions.get(wordLower)!.push({ id, position: pos });
    }
  }

  return { firstWordTrie, allWordsTrie, wordToSuggestions };
}

function search(query: string, index: MultiPositionIndex): Suggestion[] {
  // First try exact first-word match (highest relevance)
  const firstWordMatches = index.firstWordTrie.search(query);
  if (firstWordMatches.length >= 5) {
    return firstWordMatches.slice(0, 10); // Enough first-word matches
  }

  // Supplement with any-word matches
  const anyWordMatches = index.allWordsTrie.search(query);
  return mergeByScore(firstWordMatches, anyWordMatches).slice(0, 10);
}
```

For technical domains (code, product names), consider CamelCase/snake_case boundaries: 'getUser' should be searchable as 'get' or 'user'. Split on case transitions and underscores when indexing: 'getUserById' → ['get', 'user', 'by', 'id'].
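That identifier splitting can be done with a couple of regex passes before indexing. A minimal sketch; `splitIdentifier` is a name of my choosing:

```typescript
// Split identifiers on lower→upper case transitions, acronym boundaries,
// and underscores, then lowercase each token for indexing.
function splitIdentifier(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')    // getUser → get User
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2') // HTTPServer → HTTP Server
    .split(/[_\s]+/)
    .filter(Boolean)
    .map(w => w.toLowerCase());
}

console.log(splitIdentifier('getUserById'));     // ["get", "user", "by", "id"]
console.log(splitIdentifier('snake_case_name')); // ["snake", "case", "name"]
console.log(splitIdentifier('HTTPServer'));      // ["http", "server"]
```

Each token (and the unsplit identifier itself) would then be inserted into the word-position index above, so "user" finds `getUserById` just like a word prefix finds a phrase.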
Even with tries, matching at scale requires careful optimization. Here are proven techniques.
Very short prefixes (1-2 characters) match too many suggestions. Cache their results:
```typescript
class ShortPrefixCache {
  // Pre-computed results for all 1-2 character prefixes
  private cache: Map<string, Suggestion[]>;

  constructor(trie: Trie, topK: number = 10) {
    this.cache = new Map();

    // Generate all 1-char prefixes
    for (let c = 'a'.charCodeAt(0); c <= 'z'.charCodeAt(0); c++) {
      const prefix = String.fromCharCode(c);
      this.cache.set(prefix, trie.getTopK(prefix, topK));
    }

    // Generate all 2-char prefixes (26 × 26 = 676)
    for (let c1 = 'a'.charCodeAt(0); c1 <= 'z'.charCodeAt(0); c1++) {
      for (let c2 = 'a'.charCodeAt(0); c2 <= 'z'.charCodeAt(0); c2++) {
        const prefix = String.fromCharCode(c1) + String.fromCharCode(c2);
        this.cache.set(prefix, trie.getTopK(prefix, topK));
      }
    }

    // Optional: 3-char prefixes (26³ = 17,576 entries)
    // Only if memory allows and 2-char results are too broad
  }

  search(prefix: string, k: number): Suggestion[] | null {
    if (prefix.length <= 2) {
      return this.cache.get(prefix.toLowerCase()) || null;
    }
    return null; // Fall through to trie search
  }
}

// Usage pattern:
function search(prefix: string): Suggestion[] {
  const cached = shortPrefixCache.search(prefix, 10);
  if (cached) return cached;
  return trie.search(prefix);
}
```

Keystrokes often extend the previous query. Reuse previous computation:
```typescript
class IncrementalSearchSession {
  private lastQuery: string = '';
  private lastNode: TrieNode | null = null;
  private lastResults: Suggestion[] = [];

  search(query: string, trie: Trie): Suggestion[] {
    // Check if this is an extension of the last query
    if (query.startsWith(this.lastQuery) && this.lastNode) {
      // Navigate from last position instead of root
      const extension = query.slice(this.lastQuery.length);
      let node = this.lastNode;

      for (const char of extension) {
        const child = node.children.get(char.toLowerCase());
        if (!child) {
          // No match for extension
          this.reset();
          return [];
        }
        node = child;
      }

      this.lastQuery = query;
      this.lastNode = node;
      this.lastResults = node.getTopK(10);
      return this.lastResults;
    }

    // Not an extension, start fresh
    this.reset();
    this.lastNode = trie.findNode(query);
    this.lastQuery = query;

    if (this.lastNode) {
      this.lastResults = this.lastNode.getTopK(10);
      return this.lastResults;
    }
    return [];
  }

  reset(): void {
    this.lastQuery = '';
    this.lastNode = null;
    this.lastResults = [];
  }
}

// Benefits:
// - "mach" → "machi" → "machin" → "machine"
//   Each keystroke traverses only 1 node instead of the full path
// - Dramatic speedup for interactive typing
```

Client-side optimization reduces unnecessary requests:
```typescript
class TypeaheadClient {
  private debounceMs: number = 100; // Wait 100ms after last keystroke
  private minChars: number = 2;     // Don't search until 2+ characters
  private timeoutId: number | null = null;
  private lastQuery: string = '';
  private controller: AbortController | null = null;

  onInput(query: string): void {
    // Cancel pending request
    if (this.controller) {
      this.controller.abort();
    }

    // Clear pending debounce
    if (this.timeoutId) {
      clearTimeout(this.timeoutId);
    }

    // Skip short queries
    if (query.length < this.minChars) {
      this.clearResults();
      return;
    }

    // Debounce
    this.timeoutId = setTimeout(() => {
      this.executeSearch(query);
    }, this.debounceMs);
  }

  private async executeSearch(query: string): Promise<void> {
    // Avoid duplicate requests
    if (query === this.lastQuery) return;
    this.lastQuery = query;

    // Abort previous in-flight request
    this.controller = new AbortController();

    try {
      const response = await fetch(`/api/suggest?q=${encodeURIComponent(query)}`, {
        signal: this.controller.signal,
      });
      const suggestions = await response.json();
      this.displayResults(suggestions);
    } catch (e: any) {
      if (e.name !== 'AbortError') {
        console.error('Search failed:', e);
      }
    }
  }

  // displayResults and clearResults are UI hooks, omitted here
}

// Optimization impact:
// Without debouncing: "machine" = 7 requests
// With 100ms debounce: "machine" typed quickly = 1-2 requests
```

Prefix matching performance is a full-stack concern. Client-side debouncing, edge caching, incremental search sessions, and trie optimizations all contribute. A 20ms server response means nothing if the network round-trip is 200ms—that's why edge deployment and caching matter.
How do we know if our prefix matching is working well? Quality metrics help guide optimization efforts.
| Metric | Definition | Target |
|---|---|---|
| Prefix Recall | % of relevant suggestions retrieved for given prefix | 90% for exact prefix |
| Fuzzy Recall | % of relevant suggestions retrieved despite typos | 70% at edit distance 1 |
| MRR (Mean Reciprocal Rank) | Average of 1/rank of first relevant result | 0.5 (avg rank 2 or better) |
| P@K (Precision at K) | % of top-K results that are relevant | 80% for K=5 |
| Character Savings Rate | Avg keystrokes saved by using suggestions | 50% |
| Zero Results Rate | % of queries with no suggestions | < 5% |
```typescript
interface SearchEvaluation {
  query: string;
  expectedResult: string;  // What the user was looking for
  actualResults: string[]; // What the system returned
}

function computeMRR(evaluations: SearchEvaluation[]): number {
  let sumReciprocalRanks = 0;

  for (const ev of evaluations) { // note: "eval" is a reserved word
    const rank = ev.actualResults.findIndex(
      r => r.toLowerCase() === ev.expectedResult.toLowerCase()
    ) + 1; // 1-indexed rank

    if (rank > 0) {
      sumReciprocalRanks += 1 / rank;
    }
    // If not found, adds 0 (reciprocal rank of ∞ is 0)
  }

  return sumReciprocalRanks / evaluations.length;
}

// MRR = 1.0 means perfect (always rank 1)
// MRR = 0.5 means average rank 2
// MRR = 0.33 means average rank 3

function computeCharacterSavings(evaluations: SearchEvaluation[]): number {
  let totalSaved = 0;
  let totalPossible = 0;

  for (const ev of evaluations) {
    const queryLength = ev.query.length;
    const targetLength = ev.expectedResult.length;

    if (ev.actualResults.includes(ev.expectedResult)) {
      totalSaved += targetLength - queryLength;
    }
    totalPossible += targetLength;
  }

  return totalSaved / totalPossible;
}
```

Metrics matter, but user behavior is the final judge. A/B test matching strategies with real users. Measure not just clicks on suggestions, but downstream outcomes: Did users find what they wanted? Did they complete their task? Matching changes can have surprising effects on user behavior.
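The metrics table also lists P@K, which isn't computed above. A minimal sketch, with `precisionAtK` as a name of my choosing and relevance simplified to a set of known-relevant (lowercased) strings:

```typescript
// Precision at K: the fraction of the top-K returned results that are
// relevant. Dividing by k (not the result count) penalizes short result lists.
function precisionAtK(
  results: string[],
  relevant: Set<string>,
  k: number
): number {
  if (k <= 0) return 0;
  const topK = results.slice(0, k);
  const hits = topK.filter(r => relevant.has(r.toLowerCase())).length;
  return hits / k;
}

// 2 of the top 3 results are relevant → P@3 ≈ 0.67
precisionAtK(
  ['machine learning', 'machine shop', 'machine learning python'],
  new Set(['machine learning', 'machine learning python']),
  3
);
```

In practice P@K would be averaged over an evaluation set, just like MRR above, and tracked per query-length bucket, since short prefixes are inherently harder to rank.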
Prefix matching is far more nuanced than simple string comparison. Let's consolidate the key takeaways:
```
User Input: "prgrm tutoral"
        ↓
Normalization: "prgrm tutoral" (lowercase, no diacritics)
        ↓
Tokenization: ["prgrm", "tutoral"]
        ↓
Fuzzy Expansion: ["program", "programming"], ["tutorial", "tutorials"]
        ↓
Prefix Match: Find all suggestions matching expanded terms
        ↓
Scoring: Rank by match quality, position, popularity
        ↓
Top-K: Return best 10 suggestions
        ↓
Display: "Programming Tutorial", "Program Tutorial", ...
```
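The pipeline above can be sketched end to end. This is a hypothetical skeleton: `suggest` and its four function parameters (`normalize`, `fuzzyExpand`, `prefixMatch`, `scoreMatch`) are names of my choosing, standing in for the components built throughout this page.

```typescript
interface ScoredSuggestion { suggestion: string; score: number }

// End-to-end matching pipeline: normalize → tokenize → expand → match →
// score → top-K. Each stage is injected so real implementations can plug in.
function suggest(
  rawInput: string,
  normalize: (s: string) => string,
  fuzzyExpand: (token: string) => string[],
  prefixMatch: (term: string) => string[],
  scoreMatch: (suggestion: string, terms: string[]) => number,
  k: number = 10
): string[] {
  const tokens = normalize(rawInput).split(/\s+/).filter(Boolean);
  const expanded = tokens.flatMap(fuzzyExpand);           // typo-tolerant terms
  const candidates = new Set(expanded.flatMap(prefixMatch)); // dedupe matches

  const scored: ScoredSuggestion[] = [...candidates].map(s => ({
    suggestion: s,
    score: scoreMatch(s, expanded),
  }));

  return scored
    .sort((a, b) => b.score - a.score) // best first
    .slice(0, k)
    .map(s => s.suggestion);
}
```

Structuring the pipeline as injected stages also makes each stage independently testable and A/B-testable, which matters given how much matching changes can shift user behavior.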
What's next:
With matching mechanics understood, the next page explores Ranking Suggestions—how to order millions of matching suggestions so the best ones appear first. Ranking determines whether typeahead feels magical or mediocre.
You now understand the breadth and depth of prefix matching challenges. From simple case-folding to complex CJK segmentation, from exact prefix to fuzzy tolerance—you can make informed decisions about matching strategies for any typeahead system.