Finding matching suggestions is only half the battle. When a user types "app", your system might find 10,000 matching terms: "apple", "application", "appointment", "appetite", "appreciation", and countless more. Which five do you show?
Ranking is the process of ordering candidates to maximize user satisfaction. It's where search becomes an art as much as a science—balancing objective signals like popularity with subjective factors like personalization and freshness.
This page takes you through the complete ranking stack: from simple heuristics through sophisticated machine learning models, teaching you to design ranking systems that feel intelligent to users.
By completing this page, you'll understand scoring function design, multi-signal combination strategies, personalization techniques, contextual ranking, the cold start problem, and how production systems like Google and Amazon rank autocomplete suggestions.
Ranking is fundamentally about combining signals into a single score that predicts user satisfaction. Each signal captures a different aspect of relevance.
Core Ranking Signals
Production autocomplete systems consider multiple signal categories:
| Signal Category | Examples | Purpose |
|---|---|---|
| Query Match Quality | Prefix length, edit distance, position of match | How well does suggestion match input? |
| Popularity (Global) | Search frequency, click-through rate, trend velocity | What do most users want? |
| Personalization | User history, preferences, demographics | What does THIS user want? |
| Freshness | Recency of content, trending topics | Is suggestion timely? |
| Authority | Source quality, brand recognition | Is suggestion trustworthy? |
| Context | Location, device, time of day, previous queries | What's the user's situation? |
The Scoring Function
The fundamental structure of ranking is a scoring function S that takes a query q, suggestion s, and context c, returning a real-valued score:
S(q, s, c) = w₁·f₁(q, s, c) + w₂·f₂(q, s, c) + ... + wₙ·fₙ(q, s, c)
Where each fᵢ is a feature function (prefix match quality, popularity, and so on) and each wᵢ is a weight controlling how much that feature influences the final score. Weights can be hand-tuned or learned from data.
```python
from dataclasses import dataclass
from typing import Optional
import math

@dataclass
class Suggestion:
    term: str
    prefix_match_length: int
    edit_distance: int
    global_popularity: float   # 0-1 normalized
    user_history_score: float  # 0-1 normalized
    freshness_score: float     # 0-1 normalized

@dataclass
class RankingContext:
    query: str
    user_id: Optional[str]
    location: Optional[str]
    timestamp: float
    previous_queries: list[str]

class BasicRanker:
    """
    Simple linear combination of ranking signals.

    This is the starting point for most ranking systems.
    Each signal contributes linearly to the final score.
    """

    def __init__(self):
        # Initial weights (tuned via offline evaluation or A/B tests)
        self.weights = {
            'prefix_match': 0.30,     # How well query matches
            'popularity': 0.25,       # Global popularity
            'personalization': 0.20,  # User-specific relevance
            'freshness': 0.15,        # Recency
            'edit_distance': 0.10,    # Penalize fuzzy matches
        }

    def score(self, query: str, suggestion: Suggestion,
              context: RankingContext) -> float:
        """
        Compute ranking score for a suggestion.
        Higher score = more relevant.
        """
        scores = {}

        # Prefix match quality: longer match = better
        scores['prefix_match'] = (
            suggestion.prefix_match_length / len(suggestion.term)
        )

        # Global popularity (already normalized 0-1)
        scores['popularity'] = suggestion.global_popularity

        # Personalization: user history affinity
        scores['personalization'] = suggestion.user_history_score

        # Freshness (already normalized 0-1)
        scores['freshness'] = suggestion.freshness_score

        # Edit distance penalty (convert to 0-1 where 1 = no edits)
        max_expected_distance = 3
        scores['edit_distance'] = max(
            0, 1 - suggestion.edit_distance / max_expected_distance
        )

        # Linear combination
        return sum(
            self.weights[key] * scores[key]
            for key in self.weights
        )

    def rank(self, query: str, suggestions: list[Suggestion],
             context: RankingContext, limit: int = 10) -> list[Suggestion]:
        """Rank suggestions by score, return top-k."""
        scored = [(s, self.score(query, s, context)) for s in suggestions]
        scored.sort(key=lambda x: -x[1])
        return [s for s, _ in scored[:limit]]
```

Linear combination is not unsophisticated: it's interpretable, fast, and often surprisingly effective. Many production systems use linear models enhanced with careful feature engineering. Non-linear models (ML) improve quality but add complexity and latency.
Popularity is typically the strongest signal for autocomplete. If 1 million users searched for "programming" and 10 searched for "programmable thermostat", the former should usually rank higher for prefix "prog".
Types of Popularity Metrics
```python
import math
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

class PopularityTracker:
    """Track and score term popularity with time decay."""

    def __init__(self, decay_half_life_days: float = 7.0):
        """
        decay_half_life_days: after this many days, an event's
        contribution is halved.
        """
        self.decay_half_life = decay_half_life_days * 24 * 3600  # seconds
        # Store events as (timestamp, count) pairs
        self.term_events: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
        # Pre-computed scores (refreshed periodically)
        self.cached_scores: Dict[str, float] = {}
        self.cache_timestamp: float = 0
        self.cache_ttl: float = 300  # 5 minutes

    def record_event(self, term: str, timestamp: Optional[float] = None,
                     count: int = 1) -> None:
        """Record a search/selection event for a term."""
        ts = timestamp or time.time()
        self.term_events[term].append((ts, count))

    def compute_decayed_score(self, term: str, current_time: float) -> float:
        """
        Compute popularity score with exponential time decay.

        score = Σ count_i × exp(-λ × age_i)   where λ = ln(2) / half_life
        """
        if term not in self.term_events:
            return 0.0

        decay_lambda = math.log(2) / self.decay_half_life
        score = 0.0
        for timestamp, count in self.term_events[term]:
            age = current_time - timestamp
            if age < 0:
                continue  # Future event (shouldn't happen)
            score += count * math.exp(-decay_lambda * age)
        return score

    def get_popularity_score(self, term: str) -> float:
        """Get normalized popularity score (0-1 range)."""
        current_time = time.time()
        # Refresh cache if stale
        if current_time - self.cache_timestamp > self.cache_ttl:
            self._refresh_cache(current_time)
        return self.cached_scores.get(term, 0.0)

    def _refresh_cache(self, current_time: float) -> None:
        """Recompute all popularity scores."""
        self.cache_timestamp = current_time
        raw_scores = {
            term: self.compute_decayed_score(term, current_time)
            for term in self.term_events
        }
        if not raw_scores:
            self.cached_scores = {}
            return

        # Normalize to 0-1 using log scale (handles power-law distribution)
        max_score = max(raw_scores.values())
        if max_score <= 0:
            self.cached_scores = {t: 0.0 for t in raw_scores}
            return

        self.cached_scores = {
            term: math.log1p(score) / math.log1p(max_score)
            for term, score in raw_scores.items()
        }

class TrendingDetector:
    """Detect trending terms by comparing recent vs historical popularity."""

    def __init__(self, recent_window_hours: float = 24,
                 baseline_window_days: float = 30):
        self.recent_window = recent_window_hours * 3600
        self.baseline_window = baseline_window_days * 24 * 3600
        self.term_events: Dict[str, List[float]] = defaultdict(list)

    def record_event(self, term: str, timestamp: Optional[float] = None) -> None:
        ts = timestamp or time.time()
        self.term_events[term].append(ts)

    def get_trend_score(self, term: str) -> float:
        """
        Calculate trending score as the ratio of recent to baseline velocity.

        trend = (recent_rate / baseline_rate) - 1
        Positive = trending up, negative = trending down, zero = stable.
        """
        if term not in self.term_events:
            return 0.0

        current_time = time.time()
        events = self.term_events[term]

        # Count events in the recent window
        recent_cutoff = current_time - self.recent_window
        recent_count = sum(1 for ts in events if ts >= recent_cutoff)

        # Count events in the baseline window (excluding recent)
        baseline_start = current_time - self.baseline_window
        baseline_count = sum(
            1 for ts in events
            if baseline_start <= ts < recent_cutoff
        )

        # Calculate both rates in events per second so units match
        recent_rate = recent_count / self.recent_window
        baseline_duration = self.baseline_window - self.recent_window
        baseline_rate = (baseline_count / baseline_duration
                         if baseline_duration > 0 else 0)

        if baseline_rate <= 0:
            # No baseline: can't measure trend
            return 1.0 if recent_count > 5 else 0.0

        trend = (recent_rate / baseline_rate) - 1
        # Clamp to a reasonable range
        return max(-1, min(5, trend))
```

Over-relying on popularity creates a feedback loop: popular items get shown more, get clicked more, and become more popular. This hurts diversity and new content discovery.
Balance with freshness, exploration, and diversity signals.
Personalization transforms autocomplete from one-size-fits-all to individually tailored. When a developer types "app", they want "application" or "append". When a foodie types "app", they might want "appetizer" or "apple pie recipe".
Personalization Data Sources
| Source | Signals | Privacy Consideration |
|---|---|---|
| Search history | Past queries, clicked results | Medium—PII if linked to identity |
| Browsing history | Pages visited, time on page | High—very revealing |
| Purchase history | Bought items, categories | Medium—transactional data |
| User profile | Age, location, preferences | Medium—explicit collection |
| Session context | Current session queries | Low—short-term, anonymous |
| Device/platform | Mobile vs desktop, OS | Low—technical metadata |
```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set

class UserProfile:
    """User profile for personalization."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        # Historical behavior
        self.query_history: List[str] = []
        self.selected_terms: List[str] = []
        # Aggregated interests (category -> score)
        self.interests: Dict[str, float] = {}
        # Recent session context
        self.session_queries: List[str] = []

class PersonalizationEngine:
    """Compute personalized relevance scores for suggestions."""

    def __init__(self):
        # Category assignments for terms
        self.term_categories: Dict[str, Set[str]] = {}
        # Co-occurrence model: term -> related terms with counts
        self.term_cooccurrence: Dict[str, Counter] = defaultdict(Counter)

    def register_term_category(self, term: str, categories: Set[str]) -> None:
        """Associate a term with categories for interest matching."""
        self.term_categories[term] = categories

    def record_selection(self, user: UserProfile, query: str,
                         selected_term: str) -> None:
        """Update the user profile based on a selection."""
        user.query_history.append(query)
        user.selected_terms.append(selected_term)

        # Update interest scores based on the selected term's categories
        if selected_term in self.term_categories:
            for category in self.term_categories[selected_term]:
                user.interests[category] = user.interests.get(category, 0) + 1.0

        # Update the co-occurrence model
        for prev_term in user.selected_terms[-10:]:  # Last 10 selections
            if prev_term != selected_term:
                self.term_cooccurrence[prev_term][selected_term] += 1
                self.term_cooccurrence[selected_term][prev_term] += 1

    def personalized_score(self, user: UserProfile, term: str) -> float:
        """
        Compute a personalized relevance score for a term.

        Combines:
        1. Direct history match (user searched/selected this before)
        2. Interest match (term's categories align with user interests)
        3. Co-occurrence (term often follows user's recent selections)
        """
        score = 0.0

        # Direct history: has the user engaged with this term?
        direct_count = user.selected_terms.count(term)
        if direct_count > 0:
            # Log scale to prevent single-term dominance
            score += 0.3 * math.log1p(direct_count)

        # Interest match: does the term align with the user's interests?
        if term in self.term_categories and user.interests:
            term_categories = self.term_categories[term]
            total_interest = sum(user.interests.values())
            interest_score = sum(
                user.interests.get(cat, 0) / total_interest
                for cat in term_categories
            )
            score += 0.4 * interest_score

        # Co-occurrence: does the term follow recent activity?
        if user.selected_terms:
            recent_terms = user.selected_terms[-5:]  # Very recent
            cooc_score = 0.0
            for recent in recent_terms:
                if recent in self.term_cooccurrence:
                    cooc = self.term_cooccurrence[recent].get(term, 0)
                    total_cooc = sum(self.term_cooccurrence[recent].values())
                    if total_cooc > 0:
                        cooc_score += cooc / total_cooc
            score += 0.3 * (cooc_score / len(recent_terms))

        return min(1.0, score)  # Clamp to 0-1

class SessionContextRanker:
    """Use within-session context for anonymous personalization."""

    def __init__(self):
        # Query refinement patterns
        self.refinement_patterns: Dict[str, str] = {}

    def session_score(self, session_queries: List[str], term: str) -> float:
        """
        Score a term based on session context.

        Patterns detected:
        1. Query refinement: "java" -> "javascript" suggests JS preference
        2. Related topics: "python" + "machine learning" suggests ML context
        3. Reformulation: helps a user who seems stuck
        """
        if not session_queries:
            return 0.0

        score = 0.0
        last_query = session_queries[-1].lower()

        # Refinement detection: is the term a refinement of the previous query?
        if term.lower().startswith(last_query):
            score += 0.5  # Natural progression

        # Topic coherence: does the term share words with the session?
        session_words = set()
        for q in session_queries:
            session_words.update(q.lower().split())

        term_words = set(term.lower().split())
        overlap = len(session_words & term_words)
        if overlap > 0:
            score += 0.3 * min(1, overlap / len(term_words))

        return min(1.0, score)
```

Session-based personalization is privacy-friendly: no long-term storage or user identification is needed. It provides 70-80% of the personalization benefit with minimal privacy concerns. Always prefer session context over historical profiles when possible.
Context goes beyond personal history—it captures the user's current situation: where they are, what device they're using, what time it is, and what they were doing moments before.
High-Value Context Signals
| Context Type | Example Impact | Implementation |
|---|---|---|
| Location | "coffee" → nearby coffee shops vs coffee brands | Geo-filter or geo-boost |
| Time of day | "breakfast" higher in AM, "dinner" in PM | Time-based weights |
| Device type | Shorter suggestions on mobile | Length/complexity filter |
| App/page context | In shopping app → product suggestions | Domain-specific index |
| Previous query | After "python", boost programming terms | Session memory |
| Calendar/events | Near meeting, boost attendee names | Integration APIs |
```python
import datetime
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set, Tuple

class DeviceType(Enum):
    DESKTOP = "desktop"
    MOBILE = "mobile"
    TABLET = "tablet"

@dataclass
class QueryContext:
    """Rich context for a query."""
    query: str
    timestamp: datetime.datetime
    # Location
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    city: Optional[str] = None
    country: Optional[str] = None
    # Device
    device_type: DeviceType = DeviceType.DESKTOP
    screen_width: int = 1920
    # Application context
    app_name: Optional[str] = None
    page_type: Optional[str] = None
    # Session
    session_queries: Optional[list[str]] = None
    session_duration_seconds: float = 0

class ContextualRanker:
    """Adjust ranking based on query context."""

    def __init__(self):
        # Time-of-day profiles for categories. Note the nightlife window
        # wraps past midnight, so it needs "or", not a chained comparison.
        self.time_profiles: Dict[str, Dict[int, float]] = {
            'breakfast_food': {h: 2.0 if 6 <= h <= 10 else 0.5 for h in range(24)},
            'lunch_food': {h: 2.0 if 11 <= h <= 14 else 0.5 for h in range(24)},
            'dinner_food': {h: 2.0 if 17 <= h <= 21 else 0.5 for h in range(24)},
            'nightlife': {h: 2.0 if (h >= 20 or h <= 3) else 0.3 for h in range(24)},
        }
        # Location relevance
        self.geo_sensitive_categories = {
            'restaurants', 'stores', 'services', 'places'
        }
        # Term -> (lat, lon) for local results
        self.term_locations: Dict[str, Tuple[float, float]] = {}

    def time_adjustment(self, term: str, context: QueryContext) -> float:
        """
        Adjust score based on time of day.
        Returns a multiplier (1.0 = no change).
        """
        hour = context.timestamp.hour
        # Check if the term belongs to a time-sensitive category
        for category in self._get_term_categories(term):
            if category in self.time_profiles:
                return self.time_profiles[category].get(hour, 1.0)
        return 1.0

    def location_adjustment(self, term: str, context: QueryContext) -> float:
        """Boost local results when location is available."""
        if context.latitude is None or context.longitude is None:
            return 1.0
        if term not in self.term_locations:
            return 1.0

        term_lat, term_lon = self.term_locations[term]
        distance_km = self._haversine_distance(
            context.latitude, context.longitude, term_lat, term_lon
        )

        # Boost local results; the boost decays with distance
        if distance_km < 1:
            return 3.0
        elif distance_km < 5:
            return 2.0
        elif distance_km < 20:
            return 1.5
        elif distance_km < 50:
            return 1.2
        return 1.0

    def device_adjustment(self, term: str, context: QueryContext) -> float:
        """
        Adjust for device constraints.
        Mobile: prefer shorter, more common terms.
        """
        if context.device_type == DeviceType.MOBILE:
            # Penalize long suggestions on mobile
            if len(term) > 20:
                return 0.7
            elif len(term) > 15:
                return 0.85
        return 1.0

    def get_context_multiplier(self, term: str, context: QueryContext) -> float:
        """Combine all context adjustments into a single multiplier."""
        multiplier = 1.0
        multiplier *= self.time_adjustment(term, context)
        multiplier *= self.location_adjustment(term, context)
        multiplier *= self.device_adjustment(term, context)
        return multiplier

    def _get_term_categories(self, term: str) -> Set[str]:
        """Get categories for a term (placeholder)."""
        # In production, use a category index or ML classifier
        return set()

    def _haversine_distance(self, lat1, lon1, lat2, lon2) -> float:
        """Great-circle distance between two points in km."""
        from math import radians, sin, cos, sqrt, atan2
        R = 6371  # Earth radius in km
        lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * atan2(sqrt(a), sqrt(1-a))
        return R * c
```

Not all contexts are equally valuable. Location matters hugely for restaurant search and not at all for software documentation search.
Build context-aware systems that know WHEN context matters, not just systems that blindly apply context everywhere.
Learning to Rank (LTR) uses machine learning to automatically learn optimal weights and feature combinations from historical data. Instead of manually tuning weights, we train models on user behavior.
LTR Problem Formulation
Given: historical queries, the candidate suggestions shown for each, the context at query time, and relevance labels derived from user behavior (clicks, selections, skips).
Learn: ranking function f(q, c) that orders candidates by relevance.
| Approach | Optimization | Pros | Cons |
|---|---|---|---|
| Pointwise | Predict absolute score | Simple implementation | Ignores relative ranking |
| Pairwise | Predict which of pair is better | Captures relative order | Expensive training |
| Listwise | Optimize full list metric | Directly optimizes NDCG | Complex, data hungry |
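To make the pairwise idea from the table concrete, here is a minimal sketch, not part of the ranking stack above: a linear model scores a clicked suggestion and a skipped one, and a hinge loss penalizes the model when the clicked item fails to outscore the skipped item by a margin. The function name `pairwise_hinge_loss` and the feature values are illustrative assumptions.

```python
import numpy as np

def pairwise_hinge_loss(w, x_pos, x_neg, margin=1.0):
    """Hinge loss for one preference pair: the clicked suggestion
    (x_pos) should outscore the skipped one (x_neg) by `margin`."""
    diff = w @ x_pos - w @ x_neg
    return max(0.0, margin - diff)

# Feature vectors: [prefix_quality, popularity, personalization]
w = np.array([0.5, 0.3, 0.2])
clicked = np.array([1.0, 0.8, 0.6])  # the user clicked this suggestion
skipped = np.array([1.0, 0.9, 0.1])  # shown above it, but skipped

loss = pairwise_hinge_loss(w, clicked, skipped)
# A positive loss means the model under-ranks the clicked item relative
# to the margin; gradient descent on w would update along the feature
# difference (clicked - skipped) to widen the margin.
```

Training on many such pairs is what libraries like XGBoost and LightGBM do internally under their pairwise ranking objectives.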
```python
import numpy as np
from typing import List, Tuple
# Relies on Suggestion, QueryContext, and DeviceType defined earlier.

class FeatureExtractor:
    """Extract features for the LTR model."""

    def extract(self, query: str, suggestion: Suggestion,
                context: QueryContext) -> np.ndarray:
        """
        Extract a feature vector for a query-suggestion pair.
        Returns a fixed-size vector for ML model input.
        """
        features = []

        # Text match features
        features.append(suggestion.prefix_match_length / len(suggestion.term))
        features.append(len(query) / len(suggestion.term))  # Query coverage
        features.append(suggestion.edit_distance / max(len(query), 1))
        features.append(1 if query.lower() == suggestion.term.lower() else 0)

        # Popularity features
        features.append(suggestion.global_popularity)
        features.append(np.log1p(suggestion.global_popularity * 1000))  # Log scale

        # Personalization features
        features.append(suggestion.user_history_score)

        # Freshness features
        features.append(suggestion.freshness_score)

        # Context features
        features.append(1 if context.device_type == DeviceType.MOBILE else 0)
        features.append(context.timestamp.hour / 24)  # Time of day, normalized
        features.append(len(context.session_queries or []))  # Session depth

        # Length features
        features.append(len(suggestion.term))
        features.append(len(suggestion.term.split()))  # Word count

        return np.array(features, dtype=np.float32)

class GradientBoostingRanker:
    """
    LTR using gradient boosting (XGBoost/LightGBM style).

    In production, use XGBoost or LightGBM with the 'rank:pairwise'
    or 'lambdarank' objective. This is a simplified illustration.
    """

    def __init__(self, n_trees: int = 100, learning_rate: float = 0.1):
        self.n_trees = n_trees
        self.learning_rate = learning_rate
        self.trees = []
        self.feature_extractor = FeatureExtractor()

    def train(self, training_data: List[Tuple[str, List[Suggestion],
                                              QueryContext, List[int]]]):
        """
        Train on labeled data.

        Each training example: (query, suggestions, context, labels)
        Labels: relevance grades (0-4) or binary (0/1 clicked).
        """
        # In production: use XGBoost/LightGBM, e.g.
        # xgb.train(params={'objective': 'rank:ndcg'}, dtrain=dmatrix)
        pass

    def predict(self, query: str, suggestions: List[Suggestion],
                context: QueryContext) -> List[float]:
        """Predict relevance scores for suggestions."""
        scores = []
        for suggestion in suggestions:
            features = self.feature_extractor.extract(query, suggestion, context)
            # Sum predictions from all trees
            score = sum(
                self.learning_rate * tree.predict(features)
                for tree in self.trees
            )
            scores.append(score)
        return scores

    def rank(self, query: str, suggestions: List[Suggestion],
             context: QueryContext, limit: int = 10) -> List[Suggestion]:
        """Rank suggestions using model predictions."""
        scores = self.predict(query, suggestions, context)
        # Sort by score descending
        ranked = sorted(zip(suggestions, scores), key=lambda x: -x[1])
        return [s for s, _ in ranked[:limit]]

# Training data example:
# [
#     ("python",
#      [Suggestion("python programming", ...), Suggestion("python snake", ...)],
#      context,
#      [1, 0])  # First was clicked, second wasn't
# ]

# Key training signals:
# - Clicks (implicit positive)
# - Skip-over (implicit negative)
# - Dwell time after click
# - Reformulation (user tried again = negative)
```

Don't jump to deep learning. Linear models and gradient boosting with good features outperform neural networks in many ranking scenarios. Add complexity only when simpler approaches plateau and you have abundant training data.
Cold start occurs when ranking systems lack data: new users (no history), new terms (no popularity data), or new products (no engagement). Handling cold start gracefully is critical for system robustness.
Cold Start Scenarios
| Scenario | Challenge | Mitigation Strategy |
|---|---|---|
| New user | No personalization history | Fall back to demographics, session context, popularity |
| New term | No popularity data | Use category averages, content similarity, editorial boost |
| New session | No context yet | Use trending topics, time-appropriate defaults |
| New market | No localized data | Mirror successful markets, universal signals |
```python
from typing import Dict, List, Optional, Set, Tuple
# Relies on UserProfile, PersonalizationEngine, and PopularityTracker
# defined earlier.

class ColdStartHandler:
    """Handle cold start scenarios with graceful degradation."""

    def __init__(self):
        # Category-level default scores (fallback when a term is unknown)
        self.category_defaults: Dict[str, float] = {}
        # Trending terms (for new sessions)
        self.trending_terms: List[str] = []
        # Demographic-based defaults
        self.demographic_interests: Dict[str, Dict[str, float]] = {}

    def get_user_score(self, user: Optional[UserProfile],
                       term: str) -> Tuple[float, str]:
        """
        Get a personalized score with cold start handling.
        Returns (score, source) for debugging.
        """
        if user is None:
            # Completely anonymous: use global popularity only
            return (0.5, 'anonymous_default')

        if len(user.selected_terms) >= 5:
            # Warm user: use full personalization
            engine = PersonalizationEngine()
            score = engine.personalized_score(user, term)
            return (score, 'personalized')

        if len(user.selected_terms) >= 1:
            # Cool user: limited history, blend personalization with defaults
            engine = PersonalizationEngine()
            personal = engine.personalized_score(user, term)
            default = self._get_demographic_default(user, term)
            history_weight = len(user.selected_terms) / 5
            score = personal * history_weight + default * (1 - history_weight)
            return (score, 'partial_personalization')

        # Cold user: no history, use demographic defaults if available
        default = self._get_demographic_default(user, term)
        return (default, 'demographic_default')

    def get_term_popularity(self, term: str) -> Tuple[float, str]:
        """Get a popularity score with cold start handling for new terms."""
        popularity_tracker = PopularityTracker()
        score = popularity_tracker.get_popularity_score(term)
        if score > 0:
            return (score, 'tracked_popularity')

        # Unknown term: try the category default
        categories = self._infer_categories(term)
        if categories:
            cat_score = max(
                self.category_defaults.get(cat, 0.3)
                for cat in categories
            )
            return (cat_score, 'category_default')

        # Truly unknown: use content-based heuristics
        return (self._content_heuristic_score(term), 'content_heuristic')

    def _get_demographic_default(self, user: UserProfile, term: str) -> float:
        """Get a default score based on user demographics."""
        # In production, use user attributes like country/language,
        # age group, and platform (mobile/desktop) to look up
        # demographic-specific term affinities.
        return 0.5

    def _infer_categories(self, term: str) -> Set[str]:
        """Infer categories from term content."""
        # Use keyword matching, an ML classifier, or embedding similarity
        return set()

    def _content_heuristic_score(self, term: str) -> float:
        """Score based on content features for completely unknown terms."""
        score = 0.5  # Neutral baseline

        # Prefer shorter terms (likely more common)
        if len(term) < 10:
            score += 0.1
        elif len(term) > 20:
            score -= 0.1

        # Prefer single words (simpler concepts)
        word_count = len(term.split())
        if word_count == 1:
            score += 0.05
        elif word_count > 4:
            score -= 0.05

        # Penalize unusual characters
        alpha_ratio = sum(c.isalpha() for c in term) / len(term)
        if alpha_ratio < 0.8:
            score -= 0.1

        return max(0.1, min(0.9, score))
```

Cold start is also an opportunity: occasionally showing cold items generates training data. Use multi-armed bandit strategies (epsilon-greedy, Thompson sampling) to balance showing proven winners with exploring new options.
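The simplest bandit strategy, epsilon-greedy, can be sketched in a few lines: most requests serve the ranked suggestions unchanged, but a small fraction swaps a cold item into the last slot so it can accumulate click data. This is an illustrative sketch; the function name `epsilon_greedy_slate` and the 10% exploration rate are assumptions, not part of the handler above.

```python
import random

def epsilon_greedy_slate(ranked, cold_items, explore_rate=0.1, rng=random):
    """Return the ranked list, occasionally replacing the last slot
    with an unproven cold item so it can accumulate engagement data."""
    slate = list(ranked)
    if cold_items and slate and rng.random() < explore_rate:
        slate[-1] = rng.choice(cold_items)
    return slate

rng = random.Random(42)  # seeded for reproducibility
ranked = ["python tutorial", "python snake", "python jobs"]
cold = ["python 3.13 release notes"]

# Over many requests, roughly 10% of slates end with the cold item.
explored = sum(
    epsilon_greedy_slate(ranked, cold, 0.1, rng)[-1] == cold[0]
    for _ in range(10_000)
)
```

Thompson sampling improves on this by exploring proportionally to uncertainty, but epsilon-greedy is often good enough as a first step and is trivial to reason about.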
A ranking that shows 10 variations of the same thing frustrates users. Diversity ensures the result set covers different intents, while deduplication removes redundant suggestions.
Why Diversity Matters
For query "apple", compare two candidate result sets (illustrative):

- Set 1: "apple inc", "apple incorporated", "apple company", "apple corporation", "apple the company"
- Set 2: "apple inc", "apple stock", "apple store near me", "apple pie recipe", "apple watch"
The second set covers different user intents; the first just repeats one.
```python
from typing import List, Tuple
# Relies on the Suggestion dataclass defined earlier.

class DiversityReranker:
    """Rerank results for diversity using MMR (Maximal Marginal Relevance)."""

    def __init__(self, lambda_param: float = 0.7):
        """
        lambda_param: trade-off between relevance and diversity.
        - Higher (0.9+): prefer relevance
        - Lower (0.3-0.5): prefer diversity
        """
        self.lambda_param = lambda_param

    def similarity(self, term1: str, term2: str) -> float:
        """
        Compute similarity between suggestions (used to penalize
        redundancy) as Jaccard similarity over character bigrams.
        """
        ngrams1 = set(term1[i:i+2] for i in range(len(term1)-1))
        ngrams2 = set(term2[i:i+2] for i in range(len(term2)-1))
        if not ngrams1 or not ngrams2:
            return 0.0
        intersection = len(ngrams1 & ngrams2)
        union = len(ngrams1 | ngrams2)
        return intersection / union

    def mmr_rerank(self, suggestions: List[Tuple[Suggestion, float]],
                   limit: int = 10) -> List[Suggestion]:
        """
        Maximal Marginal Relevance reranking.

        MMR = λ × relevance(s) - (1-λ) × max_similarity(s, selected)

        Iteratively select the suggestion that maximizes MMR.
        """
        if not suggestions:
            return []

        selected: List[Suggestion] = []
        remaining = list(suggestions)

        # First item: highest relevance
        remaining.sort(key=lambda x: -x[1])
        selected.append(remaining.pop(0)[0])

        while len(selected) < limit and remaining:
            best_idx = -1
            best_mmr = float('-inf')

            for i, (suggestion, relevance) in enumerate(remaining):
                # Max similarity to already-selected items
                max_sim = max(
                    self.similarity(suggestion.term, s.term)
                    for s in selected
                )
                mmr = (self.lambda_param * relevance
                       - (1 - self.lambda_param) * max_sim)
                if mmr > best_mmr:
                    best_mmr = mmr
                    best_idx = i

            selected.append(remaining.pop(best_idx)[0])

        return selected

class Deduplicator:
    """Remove near-duplicate suggestions."""

    def __init__(self, threshold: float = 0.85):
        """threshold: similarity above which items count as duplicates."""
        self.threshold = threshold

    def deduplicate(self, suggestions: List[Suggestion]) -> List[Suggestion]:
        """Remove duplicates, keeping the highest-scored version."""
        if not suggestions:
            return []

        result: List[Suggestion] = []
        for suggestion in suggestions:
            is_duplicate = any(
                self._is_duplicate(suggestion.term, existing.term)
                for existing in result
            )
            if not is_duplicate:
                result.append(suggestion)
        return result

    def _is_duplicate(self, term1: str, term2: str) -> bool:
        """Check if two terms are duplicates."""
        # Exact match after normalization
        norm1 = term1.lower().strip()
        norm2 = term2.lower().strip()
        if norm1 == norm2:
            return True

        # One is a substring of the other and lengths are close
        if norm1 in norm2 or norm2 in norm1:
            len_ratio = min(len(norm1), len(norm2)) / max(len(norm1), len(norm2))
            if len_ratio > 0.8:
                return True

        # High character-set similarity
        common = len(set(norm1) & set(norm2))
        total = len(set(norm1) | set(norm2))
        if total > 0 and common / total > self.threshold:
            return True

        return False
```

The best diversity isn't just textual, it's intent-based. Use query intent classification (navigational, informational, transactional) to ensure the result set covers multiple likely intents. This requires understanding what users might MEAN, not just what they typed.
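One way to approximate intent coverage, assuming each candidate carries an intent label (a hypothetical label produced by an upstream classifier, not a field on the `Suggestion` class): round-robin across intent groups in relevance order, so the top slots span different interpretations instead of repeating one. The function name `intent_round_robin` and the example intents are illustrative.

```python
from collections import defaultdict

def intent_round_robin(scored, limit=5):
    """scored: list of (term, intent, relevance) tuples.
    Interleave intents so the slate covers multiple likely meanings."""
    by_intent = defaultdict(list)
    for term, intent, rel in sorted(scored, key=lambda x: -x[2]):
        by_intent[intent].append(term)

    # Visit intents in order of each group's best relevance score
    order = sorted(
        by_intent,
        key=lambda i: -max(r for t, it, r in scored if it == i),
    )

    slate, idx = [], 0
    while len(slate) < limit and any(by_intent.values()):
        intent = order[idx % len(order)]
        if by_intent[intent]:
            slate.append(by_intent[intent].pop(0))
        idx += 1
    return slate

scored = [
    ("apple stock", "finance", 0.9),
    ("apple stock price", "finance", 0.8),
    ("apple store near me", "local", 0.7),
    ("apple pie recipe", "recipe", 0.6),
    ("apple stock today", "finance", 0.5),
]
slate = intent_round_robin(scored, limit=3)
# The first three slots cover the finance, local, and recipe intents,
# even though three finance candidates outscore the recipe one.
```

In practice the intent labels would come from a classifier, and this interleaving would run after MMR and deduplication as a final coverage pass.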
Ranking transforms raw matches into a curated list that serves user needs. The art lies in combining multiple signals into a single ordering that feels natural and helpful.
What's Next: Performance Optimization
Ranking adds latency to every query. The final page explores performance optimization: caching strategies, precomputation, index sharding, and latency budgeting to deliver ranked suggestions in milliseconds.
You now understand the complete ranking stack for autocomplete systems. Combined with previous pages on matching and data structures, you can build autocomplete systems that feel intelligent and responsive. Next, we'll optimize for the speed users demand.