Finding all words that match a prefix is only half the battle. When a user types 'pro' and your dictionary contains hundreds of matching words—program, project, progress, promise, property, protocol, professor, production, professional, and countless more—which ones do you show first?
This is the ranking problem: ordering candidates by relevance so the most useful suggestions appear at the top. Get it right, and autocomplete feels magical—anticipating exactly what users want. Get it wrong, and even a technically correct system feels broken, buried beneath irrelevant suggestions.
Ranking transforms autocomplete from a data structure exercise into a user experience optimization problem. It draws on signals from usage patterns, user behavior, temporal dynamics, and context. This page will equip you with the principles and techniques to build ranking systems that truly serve users.
By the end of this page, you will understand why ranking matters beyond correctness, implement popularity-based and recency-based ranking, design personalized ranking systems, combine multiple signals into unified scores, and recognize when simple ranking suffices versus when you need sophisticated models.
Before diving into algorithms, let's understand why ranking is so critical to autocomplete success.
The Fundamental Constraint:
Users see only 5-10 suggestions. A prefix like 'pro' might match 500 words in your dictionary. Showing the wrong 5 means the right suggestion is buried—users must keep typing or scroll through options, defeating the purpose of autocomplete.
The User Expectation:
Users expect the first suggestion to be their intended word most of the time. When typing 'prog', a software developer expects 'programming' or 'program' at the top—not 'prognosis' or 'progeny', even though these are valid matches.
The Business Impact:
Poor ranking creates friction. In search engines, it means more typing and slower task completion. In e-commerce, it means missed products and lost sales. In code editors, it means slower development. Every bad suggestion costs user satisfaction and productivity.
| Ranking Quality | User Experience | System Perception | User Behavior |
|---|---|---|---|
| Excellent | First suggestion is usually correct | System feels intelligent, magical | Users adopt and rely on autocomplete |
| Good | Correct suggestion in top 3 | System is helpful | Users use autocomplete when convenient |
| Poor | Must scroll or type more | System feels dumb, frustrating | Users ignore autocomplete |
| Terrible | Suggestions are never useful | Feature feels broken | Users actively avoid the feature |
Simple popularity-based ranking gets you 80% of the way. Most users searching 'pro' want common words. But the last 20%—personalization, context-awareness, recency—separates good autocomplete from great. Invest in the basics first, then add sophistication.
The simplest and most effective baseline: rank words by how frequently they're used. Popular words are popular for a reason—they're what most users want.
How to Measure Popularity:
Common signals include how often a word appears in query logs or a representative text corpus, and how often users actually select it when it is suggested.
Implementation Approach:
Store a frequency/popularity score with each word in the trie. During collection, include this score. After collection, sort by score descending.
```typescript
import { MinHeap } from './heap'; // Comparator-based min-heap helper (a sketch follows this listing)

class TrieNode {
  children: Map<string, TrieNode> = new Map();
  isEndOfWord: boolean = false;
  word: string | null = null;
  frequency: number = 0; // Popularity score
}

interface RankedWord {
  word: string;
  score: number;
}

/**
 * Collect words with their popularity scores.
 */
function collectWithScores(node: TrieNode): RankedWord[] {
  const results: RankedWord[] = [];

  function dfs(current: TrieNode): void {
    if (current.isEndOfWord && current.word) {
      results.push({ word: current.word, score: current.frequency });
    }
    for (const child of current.children.values()) {
      dfs(child);
    }
  }

  dfs(node);
  return results;
}

/**
 * Get top-K suggestions ranked by popularity.
 */
function getTopSuggestions(
  root: TrieNode,
  prefix: string,
  k: number = 10
): string[] {
  // Navigate to prefix node
  let current = root;
  for (const char of prefix) {
    if (!current.children.has(char)) {
      return [];
    }
    current = current.children.get(char)!;
  }

  // Collect and rank
  const candidates = collectWithScores(current);

  // Sort by frequency descending
  candidates.sort((a, b) => b.score - a.score);

  // Return top K words
  return candidates.slice(0, k).map(c => c.word);
}

/**
 * Efficient top-K using min-heap (avoids full sort).
 */
function getTopSuggestionsEfficient(
  root: TrieNode,
  prefix: string,
  k: number = 10
): string[] {
  let current = root;
  for (const char of prefix) {
    if (!current.children.has(char)) {
      return [];
    }
    current = current.children.get(char)!;
  }

  // Min-heap of size K
  const heap = new MinHeap<RankedWord>((a, b) => a.score - b.score);

  function dfs(node: TrieNode): void {
    if (node.isEndOfWord && node.word) {
      const candidate = { word: node.word, score: node.frequency };
      if (heap.size() < k) {
        heap.push(candidate);
      } else if (candidate.score > heap.peek()!.score) {
        heap.pop();
        heap.push(candidate);
      }
    }
    for (const child of node.children.values()) {
      dfs(child);
    }
  }

  dfs(current);

  // Extract in sorted order
  const result: string[] = [];
  while (heap.size() > 0) {
    result.push(heap.pop()!.word);
  }
  return result.reverse(); // Highest first
}
```

Popularity isn't static. Track which suggestions users actually select and increment their scores. Use exponential decay or sliding windows to prevent stale popular terms from dominating forever. Balance historical popularity with recent trends.
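The efficient variant above imports `MinHeap` from `./heap`, a helper that isn't defined on this page. For completeness, here is a minimal comparator-based heap with the `push`/`pop`/`peek`/`size` interface the code assumes; the class itself is a sketch, not part of the original listing.

```typescript
/**
 * Minimal binary min-heap. The comparator returns a negative number
 * when `a` should sit closer to the top of the heap.
 */
export class MinHeap<T> {
  private items: T[] = [];

  constructor(private compare: (a: T, b: T) => number) {}

  size(): number {
    return this.items.length;
  }

  peek(): T | undefined {
    return this.items[0];
  }

  push(item: T): void {
    this.items.push(item);
    // Bubble the new item up until the heap property holds
    let i = this.items.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.compare(this.items[i], this.items[parent]) >= 0) break;
      [this.items[i], this.items[parent]] = [this.items[parent], this.items[i]];
      i = parent;
    }
  }

  pop(): T | undefined {
    if (this.items.length === 0) return undefined;
    const top = this.items[0];
    const last = this.items.pop()!;
    if (this.items.length > 0) {
      this.items[0] = last;
      // Sift the moved item down until neither child is smaller
      let i = 0;
      while (true) {
        const left = 2 * i + 1;
        const right = 2 * i + 2;
        let smallest = i;
        if (left < this.items.length && this.compare(this.items[left], this.items[smallest]) < 0) {
          smallest = left;
        }
        if (right < this.items.length && this.compare(this.items[right], this.items[smallest]) < 0) {
          smallest = right;
        }
        if (smallest === i) break;
        [this.items[i], this.items[smallest]] = [this.items[smallest], this.items[i]];
        i = smallest;
      }
    }
    return top;
  }
}
```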
Popularity captures long-term trends, but what about the present moment? Recency-based ranking favors recently popular or newly added terms.
Use Cases for Recency:
Recency matters most when interest shifts quickly: trending queries and current events, newly added terms that have no usage history yet, and words the user selected in recent sessions.
Time Decay Models:
Recency is typically implemented by decaying historical scores over time, so recent activity counts more than old activity.
```typescript
/**
 * Exponential decay: recent events count more.
 * Score decays by half every 'halfLife' time units.
 */
function exponentialDecay(
  baseScore: number,
  ageMs: number,
  halfLifeMs: number
): number {
  const decayFactor = Math.pow(0.5, ageMs / halfLifeMs);
  return baseScore * decayFactor;
}

// Example usage:
// A word searched 1 hour ago with base score 100:
// exponentialDecay(100, 60*60*1000, 24*60*60*1000) // halfLife = 1 day
// → ~97.2 (decayed slightly)

// Same word searched 7 days ago:
// exponentialDecay(100, 7*24*60*60*1000, 24*60*60*1000)
// → ~0.78 (heavily decayed)

/**
 * Sliding window: count only events within time window.
 */
interface TimestampedEvent {
  timestamp: number;
  weight: number;
}

function slidingWindowScore(
  events: TimestampedEvent[],
  windowMs: number,
  now: number = Date.now()
): number {
  const cutoff = now - windowMs;
  return events
    .filter(e => e.timestamp >= cutoff)
    .reduce((sum, e) => sum + e.weight, 0);
}

/**
 * Combined recency and popularity scoring.
 */
interface WordMetadata {
  word: string;
  totalScore: number;               // Historical popularity
  lastAccessTime: number;           // When was this last selected
  recentEvents: TimestampedEvent[]; // Recent selection events
}

function calculateRecencyAwareScore(
  metadata: WordMetadata,
  config: {
    popularityWeight: number; // e.g., 0.6
    recencyWeight: number;    // e.g., 0.4
    decayHalfLifeMs: number;  // e.g., 24 hours
    windowMs: number;         // e.g., 7 days
  }
): number {
  const now = Date.now();

  // Popularity component (with decay based on staleness)
  const ageMs = now - metadata.lastAccessTime;
  const decayedPopularity = exponentialDecay(
    metadata.totalScore,
    ageMs,
    config.decayHalfLifeMs
  );

  // Recency component (recent activity)
  const recentScore = slidingWindowScore(
    metadata.recentEvents,
    config.windowMs,
    now
  );

  // Weighted combination
  return (
    config.popularityWeight * decayedPopularity +
    config.recencyWeight * recentScore
  );
}

/**
 * Trie node with recency tracking.
 */
class RecencyTrieNode {
  children: Map<string, RecencyTrieNode> = new Map();
  isEndOfWord: boolean = false;
  word: string | null = null;

  // Recency data
  totalSelections: number = 0;
  lastSelectedAt: number = 0;
  selectionHistory: number[] = []; // Timestamps of selections

  // Pre-computed score (refreshed periodically)
  cachedScore: number = 0;
  cacheExpiry: number = 0;

  getScore(now: number = Date.now()): number {
    // Use cached score if fresh
    if (now < this.cacheExpiry) {
      return this.cachedScore;
    }

    // Recompute score
    const windowMs = 7 * 24 * 60 * 60 * 1000; // 7 days
    const halfLifeMs = 24 * 60 * 60 * 1000;   // 1 day

    // Recent selections count
    const cutoff = now - windowMs;
    const recentCount = this.selectionHistory.filter(t => t >= cutoff).length;

    // Decay based on last access
    const ageMs = now - this.lastSelectedAt;
    const decayFactor = Math.pow(0.5, ageMs / halfLifeMs);

    this.cachedScore = (this.totalSelections * decayFactor) + (recentCount * 10);
    this.cacheExpiry = now + 60000; // Cache for 1 minute

    return this.cachedScore;
  }

  recordSelection(timestamp: number = Date.now()): void {
    this.totalSelections++;
    this.lastSelectedAt = timestamp;
    this.selectionHistory.push(timestamp);

    // Prune old history to prevent unbounded growth
    const cutoff = timestamp - (30 * 24 * 60 * 60 * 1000); // 30 days
    this.selectionHistory = this.selectionHistory.filter(t => t >= cutoff);
  }
}
```

Pure recency ranking is vulnerable to manipulation. A malicious actor could artificially boost terms by repeatedly searching for them. Always combine recency with other signals and implement rate limiting. For critical applications, use authenticated signals (logged-in user selections) rather than anonymous activity.
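The warning above recommends rate limiting without showing it. Here is a minimal sketch, assuming each selection carries a user identifier; the cap of 10 counted selections per user per term per hour is an illustrative choice, and the `SelectionRateLimiter` class is hypothetical.

```typescript
/**
 * Caps how many of a user's selections of a given term are allowed to
 * influence that term's ranking score within a time window.
 */
class SelectionRateLimiter {
  // Key: `${userId}:${word}` → timestamps of selections that were counted
  private counted: Map<string, number[]> = new Map();

  constructor(
    private maxPerWindow: number = 10,
    private windowMs: number = 60 * 60 * 1000 // 1 hour
  ) {}

  /** Returns true if this selection may increment the term's score. */
  allow(userId: string, word: string, now: number = Date.now()): boolean {
    const key = `${userId}:${word}`;
    const cutoff = now - this.windowMs;
    const recent = (this.counted.get(key) ?? []).filter(t => t >= cutoff);

    if (recent.length >= this.maxPerWindow) {
      this.counted.set(key, recent);
      return false; // Over the cap: log the event if you like, but don't boost ranking
    }

    recent.push(now);
    this.counted.set(key, recent);
    return true;
  }
}

// Usage sketch: only counted selections reach the trie's recency tracking.
// if (limiter.allow(userId, word)) node.recordSelection();
```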
Different users want different things. A software engineer typing 'class' wants programming-related suggestions; a biology student might want 'classification'. Personalization tailors ranking to individual users.
Personalization Signals:
The profile used below tracks three signals: how often this user has selected each word, the categories they are interested in, and the words they have already used in the current session.
Implementation Strategies:
```typescript
interface UserProfile {
  userId: string;
  wordScores: Map<string, number>; // Personal frequency for each word
  categories: Set<string>;         // Interest categories
  sessionHistory: string[];        // Words used this session
}

/**
 * Calculate personalized score for a word.
 */
function personalizedScore(
  word: string,
  globalScore: number,
  userProfile: UserProfile,
  config: {
    globalWeight: number;   // e.g., 0.5
    personalWeight: number; // e.g., 0.3
    sessionWeight: number;  // e.g., 0.2
    categoryBoost: number;  // e.g., 1.5x multiplier
  }
): number {
  // Global popularity component
  const globalComponent = globalScore * config.globalWeight;

  // Personal history component
  const personalFreq = userProfile.wordScores.get(word) ?? 0;
  const personalComponent = personalFreq * config.personalWeight;

  // Session context component (huge boost if used this session)
  const usedInSession = userProfile.sessionHistory.includes(word);
  const sessionComponent = usedInSession ? 100 * config.sessionWeight : 0;

  // Category boost (if we know word categories)
  const wordCategories = getWordCategories(word); // External function
  const hasMatchingCategory = wordCategories.some(
    cat => userProfile.categories.has(cat)
  );
  const categoryMultiplier = hasMatchingCategory ? config.categoryBoost : 1.0;

  return (globalComponent + personalComponent + sessionComponent) * categoryMultiplier;
}

/**
 * Ranking with personalization.
 */
function getRankedSuggestions(
  root: TrieNode,
  prefix: string,
  userProfile: UserProfile | null,
  k: number = 10
): string[] {
  // Navigate to prefix node
  let current = root;
  for (const char of prefix) {
    if (!current.children.has(char)) {
      return [];
    }
    current = current.children.get(char)!;
  }

  // Collect candidates with global scores
  const candidates: { word: string; score: number }[] = [];

  function dfs(node: TrieNode): void {
    if (node.isEndOfWord && node.word) {
      let score = node.frequency; // Global score

      // Apply personalization if we have a user profile
      if (userProfile) {
        score = personalizedScore(
          node.word,
          node.frequency,
          userProfile,
          {
            globalWeight: 0.5,
            personalWeight: 0.3,
            sessionWeight: 0.2,
            categoryBoost: 1.5
          }
        );
      }

      candidates.push({ word: node.word, score });
    }
    for (const child of node.children.values()) {
      dfs(child);
    }
  }

  dfs(current);

  // Sort by personalized score
  candidates.sort((a, b) => b.score - a.score);
  return candidates.slice(0, k).map(c => c.word);
}

/**
 * Learning from user selections.
 * Called when user selects a suggestion.
 */
function recordUserSelection(
  userProfile: UserProfile,
  selectedWord: string
): void {
  // Update personal word score
  const currentScore = userProfile.wordScores.get(selectedWord) ?? 0;
  userProfile.wordScores.set(selectedWord, currentScore + 1);

  // Update session history
  if (!userProfile.sessionHistory.includes(selectedWord)) {
    userProfile.sessionHistory.push(selectedWord);
  }

  // Optional: limit history size
  if (userProfile.sessionHistory.length > 100) {
    userProfile.sessionHistory.shift();
  }
}
```

Heavy personalization can create filter bubbles—users only see what they've seen before. Intentionally inject some diversity (random popular terms, new entries) to expose users to new content and prevent over-personalization.
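To make that diversity advice concrete, one simple sketch: reserve the final slot for a globally popular word the user has never selected. The single reserved slot and the `injectDiversity` helper are illustrative assumptions, not part of the original design.

```typescript
/**
 * Keep the personalized top (k - 1) suggestions, then fill the final slot
 * with a globally popular word the user has never selected, if one exists.
 */
function injectDiversity(
  ranked: string[],          // Personalized ranking, best first
  globallyPopular: string[], // Global popularity ranking, best first
  userProfile: UserProfile,
  k: number
): string[] {
  const result = ranked.slice(0, k);
  if (result.length < k) {
    return result; // Not enough candidates to bother
  }

  const alreadyShown = new Set(result);
  const fresh = globallyPopular.find(
    w => !alreadyShown.has(w) && (userProfile.wordScores.get(w) ?? 0) === 0
  );

  if (fresh) {
    result[result.length - 1] = fresh; // Sacrifice the weakest slot for novelty
  }
  return result;
}
```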
Context extends beyond user history to include situational factors that influence what the user likely wants.
Contextual Signals:
The examples below draw on the application domain, time of day, geographic location, device type, and session context such as the user's previous queries or the surrounding code.
```typescript
interface QueryContext {
  // Application context
  domain: 'search' | 'code' | 'email' | 'maps' | 'shopping';

  // Temporal context
  timestamp: Date;
  dayOfWeek: number;

  // Geographic context
  userLocation?: { lat: number; lng: number };

  // Device context
  deviceType: 'desktop' | 'mobile' | 'tablet';

  // Session context
  previousQueries: string[];
  currentDocument?: string; // For code editors
}

/**
 * Context-sensitive scoring adjustments.
 */
function contextualScore(
  word: string,
  baseScore: number,
  context: QueryContext
): number {
  let score = baseScore;
  let boost = 1.0;

  // Domain-specific boosts
  const domainCategories = getDomainCategories(word);
  if (domainCategories.includes(context.domain)) {
    boost *= 2.0; // Double score for domain-relevant words
  }

  // Time-based adjustments (e.g., food at mealtimes)
  const hour = context.timestamp.getHours();
  if (isFood(word) && (hour >= 11 && hour <= 13 || hour >= 17 && hour <= 20)) {
    boost *= 1.5; // Boost food during meal times
  }

  // Mobile-friendly (prefer shorter completions)
  if (context.deviceType === 'mobile' && word.length < 8) {
    boost *= 1.2;
  }

  // Query sequence context (search refinement)
  if (context.previousQueries.length > 0) {
    const lastQuery = context.previousQueries[context.previousQueries.length - 1];
    if (isRefinementOf(word, lastQuery)) {
      boost *= 1.5;
    }
  }

  return score * boost;
}

/**
 * Code editor context: consider surrounding code.
 */
interface CodeContext {
  language: string;          // 'typescript', 'python', etc.
  currentScope: string;      // 'class', 'function', 'global'
  expectedType?: string;     // Type constraint from position
  importedModules: string[]; // Available imports
  localSymbols: string[];    // Variables in scope
}

function codeContextScore(
  symbol: string,
  baseScore: number,
  codeContext: CodeContext
): number {
  let score = baseScore;

  // Huge boost for symbols in scope
  if (codeContext.localSymbols.includes(symbol)) {
    score += 1000; // Local symbols should appear first
  }

  // Boost for expected type match
  if (codeContext.expectedType) {
    const symbolType = getSymbolType(symbol);
    if (symbolType === codeContext.expectedType) {
      score *= 2.0;
    }
  }

  // Boost for recently used in this file
  // (Would need additional tracking)

  return score;
}

/**
 * Geographic context for local suggestions.
 */
function geoContextScore(
  term: string,
  baseScore: number,
  userLocation: { lat: number; lng: number } | undefined
): number {
  if (!userLocation || !isLocationTerm(term)) {
    return baseScore;
  }

  const termLocation = getTermLocation(term);
  if (!termLocation) {
    return baseScore;
  }

  // Calculate distance
  const distance = haversineDistance(userLocation, termLocation);

  // Boost closer locations (decay with distance)
  const maxBoostDistance = 50; // km
  const distanceBoost = Math.max(0, 1 - distance / maxBoostDistance);

  return baseScore * (1 + distanceBoost);
}
```

Some context signals are highly reliable (code editor knowing expected type) while others are probabilistic guesses (time of day → food interest). Weight your signals accordingly. Strong signals should have large effects; weak signals should only break ties.
Real ranking systems draw on multiple signals at once. The challenge is combining them into a single score that produces good rankings.
Combination Strategies:
The code below sketches four approaches: a weighted linear combination, tiered ranking in which a primary signal dominates and secondary signals break ties, BM25-style scoring borrowed from information retrieval, and a simple learned linear model updated from click feedback.
```typescript
interface RankingFeatures {
  popularity: number;     // Historical selection count
  recency: number;        // Time since last selection (lower = better)
  personalScore: number;  // User-specific score
  exactMatch: boolean;    // Prefix matches exactly (the prefix IS a word)
  prefixLength: number;   // Length of matching prefix
  wordLength: number;     // How long is the complete word
  categoryMatch: boolean; // Matches user's interests
}

/**
 * Strategy 1: Weighted Linear Combination
 * Simple, interpretable, easy to tune.
 */
function linearRanking(
  features: RankingFeatures,
  weights: { [key: string]: number }
): number {
  return (
    weights.popularity * features.popularity +
    weights.recency * Math.max(0, 1000 - features.recency / 3600000) + // Decay over hours
    weights.personal * features.personalScore +
    weights.exactMatch * (features.exactMatch ? 100 : 0) +
    weights.lengthPenalty * (-features.wordLength) + // Prefer shorter words
    weights.categoryBoost * (features.categoryMatch ? 50 : 0)
  );
}

// Example weights (would be tuned based on data)
const defaultWeights = {
  popularity: 1.0,
  recency: 0.5,
  personal: 2.0,
  exactMatch: 1.0,
  lengthPenalty: 0.1,
  categoryBoost: 0.3
};

/**
 * Strategy 2: Tiered Ranking
 * Primary signal dominates; secondary breaks ties.
 */
interface TieredCandidate {
  word: string;
  tier: number; // Lower tier = higher priority
  primaryScore: number;
  secondaryScore: number;
}

function tieredComparison(a: TieredCandidate, b: TieredCandidate): number {
  // First compare by tier
  if (a.tier !== b.tier) {
    return a.tier - b.tier; // Lower tier wins
  }
  // Within same tier, compare by primary score
  if (a.primaryScore !== b.primaryScore) {
    return b.primaryScore - a.primaryScore; // Higher score wins
  }
  // If primary scores tie, use secondary
  return b.secondaryScore - a.secondaryScore;
}

function assignTier(features: RankingFeatures): number {
  // Tier 0: Exact matches (prefix is complete word)
  if (features.exactMatch) return 0;

  // Tier 1: Personal favorites (used before)
  if (features.personalScore > 0) return 1;

  // Tier 2: Popular suggestions
  if (features.popularity > 100) return 2;

  // Tier 3: Everything else
  return 3;
}

/**
 * Strategy 3: BM25-style scoring (from information retrieval)
 * Used for text search ranking, adaptable to autocomplete.
 */
function bm25Score(
  termFreq: number,    // How often user selected this term
  docFreq: number,     // How many users selected this term
  totalDocs: number,   // Total number of users
  avgTermFreq: number, // Average selection count
  k1: number = 1.2,    // Term frequency saturation
  b: number = 0.75     // Length normalization
): number {
  // IDF component: rare terms are more distinctive
  const idf = Math.log(
    (totalDocs - docFreq + 0.5) / (docFreq + 0.5) + 1
  );

  // TF component with saturation
  const tfComponent = (termFreq * (k1 + 1)) / (
    termFreq + k1 * (1 - b + b * (termFreq / avgTermFreq))
  );

  return idf * tfComponent;
}

/**
 * Strategy 4: Learned Ranking (simple linear model)
 * Train weights from user feedback data.
 */
class LearnedRanker {
  private weights: number[];

  constructor(numFeatures: number) {
    // Initialize with uniform weights
    this.weights = new Array(numFeatures).fill(1 / numFeatures);
  }

  score(featureVector: number[]): number {
    return featureVector.reduce(
      (sum, f, i) => sum + f * this.weights[i],
      0
    );
  }

  /**
   * Update weights from click feedback. When the user clicks a suggestion
   * that was ranked below another suggestion they skipped, the clicked item
   * should have scored higher: nudge the weights in that direction.
   */
  update(
    skippedFeatures: number[], // Features of the higher-ranked item the user skipped
    clickedFeatures: number[], // Features of the lower-ranked item the user clicked
    learningRate: number = 0.01
  ): void {
    // Gradient: push weights toward the clicked item and away from the skipped one
    for (let i = 0; i < this.weights.length; i++) {
      const gradient = clickedFeatures[i] - skippedFeatures[i];
      this.weights[i] += learningRate * gradient;
    }

    // Normalize to prevent weight explosion
    const norm = Math.sqrt(this.weights.reduce((s, w) => s + w * w, 0));
    this.weights = this.weights.map(w => w / norm);
  }
}
```

Initial weights are educated guesses. Tune them based on A/B tests (which weights lead to higher selection rates?), user feedback (explicit ratings or implicit click patterns), and domain expertise (what should matter most in this application?).
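One way to turn that tuning advice into a procedure is an offline grid search: re-rank logged queries under each candidate weight setting and keep the one with the best mean reciprocal rank. The sketch below assumes you have logged selections and a `rerank` function that applies a weight setting; both are hypothetical.

```typescript
/**
 * Offline tuning loop: try several weight settings, re-rank logged queries
 * with each, and keep the setting whose rankings would have placed the
 * users' actual selections highest (MRR).
 */
function tuneWeightsByMRR(
  candidateWeights: { [key: string]: number }[],
  loggedEvents: { prefix: string; selectedWord: string }[],
  rerank: (prefix: string, weights: { [key: string]: number }) => string[]
): { [key: string]: number } {
  let best = candidateWeights[0];
  let bestMRR = -1;

  for (const weights of candidateWeights) {
    let sum = 0;
    for (const event of loggedEvents) {
      const ranking = rerank(event.prefix, weights);
      const position = ranking.indexOf(event.selectedWord);
      sum += position >= 0 ? 1 / (position + 1) : 0; // Reciprocal rank, 0 if not ranked
    }
    const mrr = loggedEvents.length > 0 ? sum / loggedEvents.length : 0;
    if (mrr > bestMRR) {
      bestMRR = mrr;
      best = weights;
    }
  }
  return best;
}
```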
Ranking systems must handle various edge cases gracefully. Poor handling leads to confusing or frustrating user experiences.
Common Edge Cases:
The code below handles an empty candidate set, candidates that all share the same score, brand-new words with no usage history, prefixes with no exact match (falling back to fuzzy matching), and scores that grow without bound.
```typescript
/**
 * Robust ranking with edge case handling.
 */
function robustRankSuggestions(
  candidates: { word: string; score: number; createdAt: number }[],
  k: number,
  now: number = Date.now()
): string[] {
  if (candidates.length === 0) {
    return []; // Empty prefix match
  }

  // Handle all-same-score case
  const allSameScore = candidates.every(
    c => c.score === candidates[0].score
  );

  if (allSameScore) {
    // Fall back to alphabetical order (deterministic)
    candidates.sort((a, b) => a.word.localeCompare(b.word));
  } else {
    // Standard score-based sorting
    candidates.sort((a, b) => {
      // Primary: score descending
      if (b.score !== a.score) {
        return b.score - a.score;
      }
      // Secondary: alphabetical (for determinism)
      return a.word.localeCompare(b.word);
    });
  }

  // Handle new words: boost window for visibility
  const newWordBoostMs = 7 * 24 * 60 * 60 * 1000; // 7 days
  const newWords = candidates.filter(
    c => now - c.createdAt < newWordBoostMs && c.score < 10
  );

  // Ensure at least one new word in results (if any exist)
  const result = candidates.slice(0, k).map(c => c.word);
  if (newWords.length > 0 && !result.some(w =>
    newWords.some(nw => nw.word === w)
  )) {
    // Replace last position with a new word
    result[result.length - 1] = newWords[0].word;
  }

  return result;
}

/**
 * Handle no-match case with fuzzy fallback.
 */
function getSuggestionsWithFallback(
  root: TrieNode,
  prefix: string,
  k: number,
  fuzzyMatcher: (query: string, words: string[]) => string[]
): { suggestions: string[]; isFuzzy: boolean } {
  // Try exact prefix match first
  const exactMatches = getTopSuggestions(root, prefix, k);

  if (exactMatches.length > 0) {
    return { suggestions: exactMatches, isFuzzy: false };
  }

  // No exact matches — try fuzzy matching
  const allWords = getAllWords(root); // Expensive, cache this
  const fuzzyMatches = fuzzyMatcher(prefix, allWords);

  if (fuzzyMatches.length > 0) {
    return { suggestions: fuzzyMatches.slice(0, k), isFuzzy: true };
  }

  // Still nothing — return empty with flag
  return { suggestions: [], isFuzzy: false };
}

/**
 * Periodic score normalization to prevent overflow.
 */
function normalizeScores(root: TrieNode, targetMax: number = 1000): void {
  // Find current max score
  let maxScore = 0;

  function findMax(node: TrieNode): void {
    if (node.isEndOfWord) {
      maxScore = Math.max(maxScore, node.frequency);
    }
    for (const child of node.children.values()) {
      findMax(child);
    }
  }
  findMax(root);

  if (maxScore <= targetMax) {
    return; // No normalization needed
  }

  // Scale all scores
  const scaleFactor = targetMax / maxScore;

  function scale(node: TrieNode): void {
    if (node.isEndOfWord) {
      node.frequency = Math.floor(node.frequency * scaleFactor);
    }
    for (const child of node.children.values()) {
      scale(child);
    }
  }
  scale(root);
}
```

When breaking ties, prefer deterministic methods (alphabetical order) over random. Users find it confusing when the same query produces different orderings. Consistency builds trust in the system.
How do you know if your ranking is good? You need metrics to measure quality and guide improvements.
Key Metrics:
| Metric | Definition | Ideal Value | What It Measures |
|---|---|---|---|
| Mean Reciprocal Rank (MRR) | Average of 1/position of correct answer | Close to 1.0 | How high the right answer typically appears |
| Success@K | % of queries where correct answer is in top K | High % (>80%) | Can users usually find what they want? |
| Keystroke Savings | Characters saved by using autocomplete | High | Practical utility: fewer keystrokes = faster |
| Click Position Distribution | How often users click 1st, 2nd, 3rd... suggestion | Heavily skewed toward 1st | If users often click 3rd+, ranking is poor |
| Abandonment Rate | % of times users ignore suggestions entirely | Low | Suggestions are irrelevant if ignored |
```typescript
interface AutocompleteEvent {
  prefix: string;
  suggestions: string[];       // What we showed
  selectedWord: string | null; // What user selected (null = abandoned)
  selectedPosition: number;    // 0-indexed position of selection
  finalQuery: string;          // What user actually submitted
}

/**
 * Mean Reciprocal Rank (MRR)
 * Average of 1/rank of the correct answer.
 * Higher is better; 1.0 means always first position.
 */
function calculateMRR(events: AutocompleteEvent[]): number {
  const reciprocalRanks = events.map(event => {
    if (event.selectedWord === null) {
      return 0; // No selection = reciprocal rank 0
    }
    // Position is 0-indexed, rank is 1-indexed
    return 1 / (event.selectedPosition + 1);
  });

  return reciprocalRanks.reduce((a, b) => a + b, 0) / events.length;
}

// Example:
// Event 1: Selected 1st suggestion → 1/1 = 1.0
// Event 2: Selected 3rd suggestion → 1/3 = 0.33
// Event 3: No selection → 0
// MRR = (1.0 + 0.33 + 0) / 3 = 0.44

/**
 * Success@K: percentage of queries with correct answer in top K
 */
function calculateSuccessAtK(events: AutocompleteEvent[], k: number): number {
  const successes = events.filter(event => {
    if (event.selectedWord === null) return false;
    return event.selectedPosition < k;
  });

  return successes.length / events.length;
}

/**
 * Keystroke savings: characters saved by autocomplete
 */
function calculateKeystrokeSavings(events: AutocompleteEvent[]): number {
  let totalSaved = 0;
  let totalPossible = 0;

  for (const event of events) {
    if (event.selectedWord !== null) {
      const charsTyped = event.prefix.length;
      const wordLength = event.selectedWord.length;
      const charsSaved = Math.max(0, wordLength - charsTyped);

      totalSaved += charsSaved;
      totalPossible += wordLength;
    }
  }

  return totalPossible > 0 ? totalSaved / totalPossible : 0;
}

/**
 * Click position distribution
 */
function getClickDistribution(
  events: AutocompleteEvent[],
  positions: number // How many positions to track
): number[] {
  const distribution = new Array(positions + 1).fill(0); // +1 for "none/abandoned"

  for (const event of events) {
    if (event.selectedWord === null) {
      distribution[positions]++; // Last bucket = abandoned
    } else if (event.selectedPosition < positions) {
      distribution[event.selectedPosition]++;
    }
  }

  // Normalize to percentages
  const total = events.length;
  return distribution.map(count => count / total);
}

// Example output: [0.65, 0.15, 0.08, 0.05, 0.02, 0.05]
// Means: 65% clicked 1st, 15% clicked 2nd, ..., 5% abandoned

/**
 * A/B test comparison
 */
interface ABTestResult {
  variant: 'A' | 'B';
  mrr: number;
  successAt3: number;
  keystrokeSavings: number;
  pValue: number; // Statistical significance
}

function compareRankingVariants(
  variantAEvents: AutocompleteEvent[],
  variantBEvents: AutocompleteEvent[]
): ABTestResult[] {
  const mrrA = calculateMRR(variantAEvents);
  const mrrB = calculateMRR(variantBEvents);

  // Would compute p-value with statistical test (e.g., t-test)
  const pValue = computePValue(variantAEvents, variantBEvents);

  return [
    {
      variant: 'A',
      mrr: mrrA,
      successAt3: calculateSuccessAtK(variantAEvents, 3),
      keystrokeSavings: calculateKeystrokeSavings(variantAEvents),
      pValue
    },
    {
      variant: 'B',
      mrr: mrrB,
      successAt3: calculateSuccessAtK(variantBEvents, 3),
      keystrokeSavings: calculateKeystrokeSavings(variantBEvents),
      pValue
    }
  ];
}
```

Optimizing blindly for a metric can backfire. For example, showing only super-popular words maximizes MRR but frustrates users seeking specific rare terms. Always complement metrics with qualitative analysis: Does the experience feel good? Are users satisfied?
We've explored how to rank autocomplete suggestions—transforming raw prefix matches into useful, prioritized recommendations. Let's consolidate the key insights:

- Ranking is a user experience problem: only the top handful of suggestions matter.
- Popularity is the workhorse baseline, and recency decay keeps it current.
- Personalization and context tailor results to the individual user and situation.
- Multiple signals are combined through weights, tiers, or learned models, with ties broken deterministically.
- Edge cases (no matches, equal scores, brand-new words) need explicit handling.
- Metrics such as MRR, Success@K, and keystroke savings tell you whether ranking is actually improving.
Module Complete:
You've now mastered the complete autocomplete pipeline: storing words in a trie, navigating to a prefix, collecting the matching completions, ranking them, and measuring how well the ranking serves users.
With this knowledge, you can build production-quality autocomplete systems for search engines, code editors, mobile keyboards, e-commerce platforms, and any application that benefits from intelligent text completion.
Congratulations! You've completed the Word Dictionary & Autocomplete Systems module. You now understand how to build, optimize, and evaluate autocomplete systems using tries. This is a practical skill used in applications serving billions of users daily. Apply this knowledge to create responsive, intelligent text completion experiences.