Design a Search Autocomplete System

Design a search autocomplete (typeahead) system like Google Search Suggestions. As the user types each character into the search box, the system returns the top K most popular query suggestions matching the typed prefix — all within 100ms for a seamless, lag-free typing experience.

Scale Estimates

Metric	Value
Daily active users (DAU)	300 million
Searches per day	10 billion
Average characters typed per search	15 (each triggers an autocomplete request)
Autocomplete queries per day	10B × 15 × 0.5 (debounce) ≈ 75 billion
Peak QPS	~2 million autocomplete lookups / sec
Unique searchable queries	100 million (after deduplication and filtering)
Trie node count (compressed)	~500 million nodes
Trie memory (in-memory)	~10–20 GB per replica
Trie server replicas	20–50 per region
K (suggestions per request)	10
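
The arithmetic behind the table can be checked with a short back-of-envelope script. This is only a sketch: the constant names are made up for illustration, 1 GB is taken as 10^9 bytes, and the peak-to-average ratio and bytes-per-node figures are derived from the table rather than stated in it.

```python
# Back-of-envelope check of the scale estimates (all inputs come from the table).
SEARCHES_PER_DAY = 10e9      # 10 billion searches per day
CHARS_PER_SEARCH = 15        # average characters typed per search
DEBOUNCE_FACTOR = 0.5        # debouncing drops roughly half of the keystroke requests
SECONDS_PER_DAY = 86_400

autocomplete_per_day = SEARCHES_PER_DAY * CHARS_PER_SEARCH * DEBOUNCE_FACTOR
average_qps = autocomplete_per_day / SECONDS_PER_DAY

PEAK_QPS = 2e6               # ~2 million lookups per second at peak
peak_to_average = PEAK_QPS / average_qps

TRIE_NODES = 500e6           # ~500 million compressed nodes
bytes_per_node_low = 10e9 / TRIE_NODES    # 10 GB per replica (assuming 1 GB = 1e9 bytes)
bytes_per_node_high = 20e9 / TRIE_NODES   # 20 GB per replica

print(f"Autocomplete requests per day: {autocomplete_per_day:.3g}")
print(f"Average QPS: {average_qps:,.0f}")
print(f"Implied peak-to-average ratio: {peak_to_average:.1f}")
print(f"Implied bytes per Trie node: {bytes_per_node_low:.0f} to {bytes_per_node_high:.0f}")
```

The average works out to roughly 870 thousand requests per second, so the ~2 million peak figure implies a peak-to-average ratio of about 2.3, and the memory estimate implies roughly 20 to 40 bytes per node.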

Non-Functional Requirements

Ultra-low latency: < 100 ms end-to-end per keystroke. The Trie lookup itself takes microseconds, so the network round trip is the bottleneck; edge caching and geo-distribution address it
High availability: Replicated Trie servers behind load balancer; cache at CDN edge and client; graceful degradation (show cached/stale suggestions if Trie server is slow)
Freshness: Trending queries surface within 15 minutes via real-time aggregation pipeline; full Trie rebuild every 15 min–1 hour
Relevance: Suggestions ranked by popularity (search frequency with exponential decay); personalised boost from user history; trending injection
Content safety: Offensive/NSFW suggestions filtered via blocklist + ML classifier before reaching the user
Scalability: CDN absorbs majority of traffic for common prefixes; Trie servers handle long-tail; horizontal scaling via replication
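
The latency, relevance, and content-safety requirements above can be illustrated with a small Trie that caches the top K suggestions at every node, so a lookup is a walk down the prefix followed by a list read. This is a minimal sketch under stated assumptions, not the reference design: the names `AutocompleteTrie`, `TrieNode`, and `decayed_score` are hypothetical, the 7-day half-life is an arbitrary choice, the Trie is uncompressed for readability, and only the blocklist half of the safety filter is shown (no ML classifier).

```python
import math
from dataclasses import dataclass, field

HALF_LIFE_DAYS = 7.0  # assumed half-life for the exponential decay (not from the source)


def decayed_score(daily_counts, half_life_days=HALF_LIFE_DAYS):
    """Popularity score: daily_counts[0] is today, daily_counts[i] is i days ago."""
    decay = math.log(2) / half_life_days
    return sum(count * math.exp(-decay * age) for age, count in enumerate(daily_counts))


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    top_k: list = field(default_factory=list)  # [(score, query)], best first


class AutocompleteTrie:
    """Uncompressed Trie with the top-K suggestions precomputed at each node."""

    def __init__(self, k=10, blocklist=frozenset()):
        self.root = TrieNode()
        self.k = k
        self.blocklist = blocklist

    def insert(self, query, score):
        # Intended for an offline build in which each query is inserted once.
        if query in self.blocklist:
            return  # blocked queries never enter the Trie
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top_k.append((score, query))
            # Highest score first; ties broken alphabetically for determinism.
            node.top_k.sort(key=lambda entry: (-entry[0], entry[1]))
            del node.top_k[self.k:]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored query starts with this prefix
        return [query for _, query in node.top_k]


# Hypothetical per-day search counts (today first).
counts = {
    "weather today": [100, 100],
    "weather tomorrow": [300, 0],
    "web mail": [50, 50],
    "bad query": [999, 999],
}
trie = AutocompleteTrie(k=2, blocklist={"bad query"})
for query, daily in counts.items():
    trie.insert(query, decayed_score(daily))

print(trie.suggest("we"))   # ['weather tomorrow', 'weather today']
print(trie.suggest("bad"))  # []
```

Storing the top K at each node trades memory and build time for read speed: the lookup cost depends on the prefix length, not on how many queries share that prefix.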

Approach Guide

Non-Functional Requirements (~3 min)
Core Entities (~2 min)
API Design (~3 min)
High-Level Design (~5 min)

Follow-up Deep Dives (questions an interviewer might ask)

1. How would you use a Trie data structure to power prefix-based autocomplete?
2. How would you efficiently update the Trie with new search data without disrupting live reads?
3. How would you aggregate search frequency data from billions of daily searches?
4. How would you serve autocomplete at scale with sub-100ms latency?
5. How would you implement personalised autocomplete?
6. How would you handle trending/breaking queries that spike within minutes?
7. How would you filter offensive or inappropriate suggestions?

