Design Google Search

Design a web search engine like Google that crawls the public web, builds an inverted index of hundreds of billions of pages, and returns the most relevant results for any text query in under 500 ms, at a scale of 8.5 billion searches per day. The design covers web crawling, indexing, query understanding, multi-signal ranking (BM25 + PageRank + ML), snippet generation, and serving with massive scatter-gather fan-out.
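
To make "inverted index" and "BM25" concrete, here is a toy sketch in Python. It is not how a production engine stores or scores postings: the function names, the whitespace tokenizer, and the parameter defaults (k1 = 1.2, b = 0.75) are illustrative assumptions, and the IDF formula is the non-negative variant commonly used in practice. A real system would add compression, sharding, and many more ranking signals on top.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency} and record document lengths."""
    postings = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        terms = text.lower().split()  # naive tokenizer, assumed for illustration
        lengths[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            postings[term][doc_id] = tf
    return postings, lengths


def bm25_search(query, postings, lengths, k1=1.2, b=0.75, top_k=3):
    """Score only documents that appear in the query terms' posting lists."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        plist = postings.get(term, {})
        if not plist:
            continue  # term not in the index
        # Rare terms get a higher inverse document frequency.
        idf = math.log((n_docs - len(plist) + 0.5) / (len(plist) + 0.5) + 1)
        for doc_id, tf in plist.items():
            # Length normalisation: long documents need more hits to score well.
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    docs = {
        "a": "web search engine ranking",
        "b": "cooking pasta at home",
        "c": "search engine index shards and search latency",
    }
    postings, lengths = build_index(docs)
    for doc_id, score in bm25_search("search engine", postings, lengths):
        print(doc_id, round(score, 3))
```

The key property is that the query only touches the posting lists of its own terms, so document "b" is never scored at all. That is what makes lexical retrieval feasible before the more expensive ranking stages run.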

Scale Estimates

Metric	Value
Indexed web pages	100+ billion
Queries per day	8.5 billion
Peak queries per second	200,000
Index size (compressed)	~100 petabytes
Index shards	10,000+
Index shard replicas	3× per data centre
Data centres	20+ globally
Crawler pages fetched per day	1+ billion
Unique terms in index	1+ trillion
Average query latency	< 500ms
Result cache hit rate	~30% (head queries)
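
A quick back-of-envelope check of the table, as a sketch. The inputs are the figures above; the derived numbers rest on assumptions that are labelled in the comments (decimal petabytes, every data centre holding a complete index copy, and every cache miss fanning out to every shard).

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Inputs taken from the table above.
queries_per_day = 8.5e9
peak_qps = 200_000
index_bytes = 100e15           # ~100 PB, assuming decimal petabytes
shards = 10_000
replicas_per_dc = 3
data_centres = 20
pages_crawled_per_day = 1e9
cache_hit_rate = 0.30

avg_qps = queries_per_day / SECONDS_PER_DAY          # roughly 98,000
peak_to_avg = peak_qps / avg_qps                     # roughly 2x
bytes_per_shard = index_bytes / shards               # 10 TB per shard
# Assumes each data centre holds a complete copy of the index.
shard_copies = shards * replicas_per_dc * data_centres
crawl_fetches_per_sec = pages_crawled_per_day / SECONDS_PER_DAY
# Assumes every cache miss is sent to every shard (no shard pruning).
shard_rpcs_per_sec_at_peak = peak_qps * (1 - cache_hit_rate) * shards

print(f"average QPS:            {avg_qps:,.0f}")
print(f"peak / average:         {peak_to_avg:.2f}")
print(f"index per shard:        {bytes_per_shard / 1e12:.0f} TB")
print(f"shard copies worldwide: {shard_copies:,}")
print(f"crawl fetches per sec:  {crawl_fetches_per_sec:,.0f}")
print(f"shard RPCs/s at peak:   {shard_rpcs_per_sec_at_peak:,.0f}")
```

Two things stand out. The stated peak of 200,000 QPS is only about twice the daily average, and at 10 TB per shard only a fraction of each shard can live in memory, which is why a tiered index matters. The fan-out figure also shows that even with a 30% cache hit rate, the serving tier would handle on the order of a billion shard-level requests per second at peak under these assumptions.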

Non-Functional Requirements

Relevance: Results must be highly relevant. BM25 handles lexical matching, PageRank measures authority, and BERT adds semantic understanding; hundreds of ranking signals are combined, with an ML re-ranker producing the final ordering
Latency: Under 500 ms end-to-end despite fanning out to thousands of index shards, achieved via an in-memory Tier 1 index, early termination, hedged requests, and aggressive caching
Freshness: Breaking news is searchable within minutes via a real-time supplemental index (fed by a Kafka pipeline); the base index is rebuilt periodically to optimise the full corpus
Scale: 100K–200K QPS, with each query fanning out to 10,000+ shards; complete index replicas are geo-distributed; tail latency is managed with hedged requests and partial results
Fault tolerance: Any shard or data centre can fail without affecting availability; a query returns partial results when shards time out, and the index is replicated across data centres
Comprehensiveness: Crawl and index the entire public web, prioritising high-quality and frequently changing pages while filtering out spam and low-quality content
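
The latency and fault-tolerance requirements above lean on two techniques: hedged requests and partial results on timeout. The following is a minimal asyncio sketch of both, not a production design. `query_replica` is a hypothetical stand-in for a shard RPC, and the latencies, hedge delay, and deadline are made-up values chosen so the example runs quickly.

```python
import asyncio
import heapq
import random

FAST, SLOW = 0.005, 1.0  # simulated replica latencies in seconds (assumed)


async def query_replica(shard, replica, query, slow_replicas):
    """Hypothetical stand-in for an RPC to one replica of one index shard."""
    await asyncio.sleep(SLOW if (shard, replica) in slow_replicas else FAST)
    rng = random.Random(f"{query}:{shard}")  # every replica returns the same hits
    return [(rng.random(), f"doc-{shard}-{i}") for i in range(3)]


async def hedged_query(shard, query, slow_replicas, hedge_after=0.02):
    """Ask replica 0; if it is still silent after hedge_after, also ask replica 1."""
    tasks = [asyncio.create_task(query_replica(shard, 0, query, slow_replicas))]
    try:
        done, _ = await asyncio.wait(tasks, timeout=hedge_after)
        if not done:
            tasks.append(
                asyncio.create_task(query_replica(shard, 1, query, slow_replicas))
            )
            done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        for t in tasks:
            t.cancel()  # no-op for finished tasks; stops the losing replica


async def scatter_gather(query, num_shards, slow_replicas=frozenset(),
                         deadline=0.2, top_k=5):
    """Fan out to all shards, then merge whatever arrived before the deadline."""
    tasks = [
        asyncio.create_task(hedged_query(s, query, slow_replicas))
        for s in range(num_shards)
    ]
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for t in pending:
        t.cancel()  # give up on shards that missed the deadline
    await asyncio.gather(*pending, return_exceptions=True)
    hits = [hit for t in done for hit in t.result()]
    return heapq.nlargest(top_k, hits), len(done)


if __name__ == "__main__":
    # Shard 3 has one slow replica (the hedge rescues it);
    # shard 7 has two slow replicas (it is dropped from the results).
    slow = {(3, 0), (7, 0), (7, 1)}
    results, answered = asyncio.run(scatter_gather("example", 10, slow))
    print(f"{answered}/10 shards answered")
    for score, doc_id in results:
        print(f"{score:.3f} {doc_id}")
```

The hedge is sent only after a short delay rather than immediately, so the extra load stays small while the slowest responses are cut off. The deadline then bounds the whole query: a shard with no healthy replica costs some recall, not availability.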

Approach Guide

Non-Functional Requirements (~3 min)
Core Entities (~2 min)
API Design (~3 min)
High-Level Design (~5 min)

Follow-up Deep Dives (questions an interviewer might ask)

1. How does the web crawling pipeline work at Google scale (hundreds of billions of pages)?
2. How is the inverted index built and structured for sub-second query performance?
3. How does the query serving pipeline work to return results in < 500 ms?
4. How does ranking work? What signals does the ranking algorithm use?
5. How does query understanding work (spell correction, entity recognition, intent)?
6. How would you handle the massive scale of serving 8.5 billion queries per day?
7. How is the index kept fresh? How do updates propagate from crawl to search results?
