Design a personalised recommendation system (like Netflix, YouTube, Amazon, or Spotify) that serves relevant item recommendations to hundreds of millions of users from a catalogue of ~100 million items. The system uses a two-stage architecture: fast candidate generation (embedding-based ANN retrieval from multiple sources) followed by accurate neural ranking, supported by a real-time feature store and continuously improved through A/B testing.
| Metric | Value |
|---|---|
| Users | 500 million |
| Items in catalogue | 100 million |
| User interaction events per day | 10 billion |
| Recommendation requests per second | 500,000 |
| Candidates per request | 500 (from multiple sources) |
| Items scored by ranking model per request | 500 |
| Top-K returned per request | 20 |
| End-to-end latency (p99) | < 100ms |
| Embedding dimensions | 256 |
| ANN index size | 100M items × 256 dims × 4 bytes (float32) ≈ 100 GB |
| Model retraining frequency | Daily |
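The index-size and throughput figures above follow from simple arithmetic; a quick sanity check (assuming uncompressed float32 embeddings in a flat index):

```python
def ann_index_bytes(num_items: int, dims: int, bytes_per_float: int = 4) -> int:
    # Flat float32 index: one vector per item, no compression or quantisation.
    return num_items * dims * bytes_per_float

def events_per_second(events_per_day: float) -> float:
    return events_per_day / 86_400  # seconds in a day

index_gb = ann_index_bytes(100_000_000, 256) / 1e9
print(f"ANN index: {index_gb:.1f} GB")                           # 102.4 GB, rounded to ~100 GB
print(f"Event ingest: {events_per_second(10e9):,.0f} events/s")  # ≈ 115,741 events/s average
```

Note that 115K events/s is an average; peak ingest will be a multiple of this, and quantisation (e.g. product quantisation) can shrink the index well below 100 GB at some recall cost.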
**Personalised recommendations:** given a user, return a ranked list of items (products, videos, songs, articles) tailored to their preferences; recommendations reflect the user's past behaviour, demographics, and stated preferences
**Candidate generation:** from the full catalogue (~100M items here), efficiently narrow down to hundreds of candidate items relevant to a user; multiple candidate sources (collaborative filtering, content-based, trending, etc.) are merged into a single candidate pool
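The merge step can be as simple as round-robin interleaving across per-source lists with de-duplication; a minimal sketch (the source names are illustrative, and real systems often apply per-source quotas):

```python
from itertools import zip_longest

def merge_candidates(sources: dict[str, list[str]], pool_size: int = 500) -> list[str]:
    """Round-robin across candidate sources, dropping duplicates,
    until the pool is full or all sources are exhausted."""
    pool, seen = [], set()
    for batch in zip_longest(*sources.values()):
        for item in batch:
            if item is not None and item not in seen:
                seen.add(item)
                pool.append(item)
                if len(pool) == pool_size:
                    return pool
    return pool

pool = merge_candidates({
    "collaborative_filtering": ["a", "b", "c"],
    "content_based": ["b", "d"],
    "trending": ["e", "a"],
})
# Round-robin order: a, b, e, (b dup skipped), d, (a dup skipped), c
```

Round-robin gives every source a voice even when one source dominates; the downstream ranker then decides which candidates actually surface.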
**Ranking / scoring:** given the candidate pool (~500 items per request, per the metrics above), score each item using a learned ranking model; return the top-K items sorted by predicted relevance/engagement; the ranking model uses user features, item features, and context features
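A neural ranker is beyond a sketch, but the serving-time contract is simple: score every candidate with user, item, and context features, then return the top-K. The logistic scorer below is a stand-in for the real model, not the model itself:

```python
import math

def score(user_feats, item_feats, ctx_feats, weights):
    # Stand-in for a neural ranker: logistic regression over concatenated features.
    x = user_feats + item_feats + ctx_feats
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1 / (1 + math.exp(-z))   # predicted engagement probability

def rank(candidates, user_feats, ctx_feats, item_features, weights, k=20):
    """Score each candidate, then return the top-k item ids by score."""
    scored = [(score(user_feats, item_features[c], ctx_feats, weights), c)
              for c in candidates]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]
```

In production the `score` call becomes a batched forward pass over all ~500 candidates at once, which is what makes the <100ms p99 budget achievable.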
**Real-time feature serving:** at recommendation time, fetch user features (watch history, click history, demographics) and item features (category, popularity, recency) with < 10ms latency; features stored in a low-latency feature store
**Similar items:** given an item, return the K most similar items ('customers who viewed X also viewed Y'); based on item embeddings (cosine similarity) or co-interaction patterns
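With item embeddings in hand, 'similar items' reduces to nearest neighbours under cosine similarity; the brute-force scan below shows the computation, while an ANN index replaces the linear scan at 100M-item scale:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_items(query_id, embeddings, k=5):
    """Return the k item ids whose embeddings are closest to the query's."""
    q = embeddings[query_id]
    scored = [(cosine(q, v), item) for item, v in embeddings.items() if item != query_id]
    scored.sort(reverse=True)
    return [item for _, item in scored[:k]]

emb = {"x": [1.0, 0.0], "y": [0.9, 0.1], "z": [0.0, 1.0]}
similar_items("x", emb, k=2)   # → ["y", "z"]
```

If embeddings are pre-normalised to unit length, cosine similarity becomes a plain dot product, which is what most ANN indexes optimise for.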
**Diversity and freshness:** recommendations should not be monotonous; inject diversity (different categories, different content types), balance exploitation (recommend proven preferences) with exploration (introduce new items), and surface fresh/new content
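One standard way to inject diversity is maximal marginal relevance (MMR): greedily pick the item that balances relevance against similarity to the items already selected. A sketch, where both λ and the similarity function are tunable choices:

```python
def mmr_rerank(candidates, relevance, similarity, k=20, lam=0.7):
    """Greedy MMR: at each step pick the item maximising
    lam * relevance - (1 - lam) * (max similarity to items picked so far)."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 1 this degenerates to plain relevance ranking; lowering λ trades raw predicted engagement for variety, and a near-duplicate of an already-picked item gets penalised even if its relevance score is high.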
**Cold-start handling:** provide meaningful recommendations for new users (no history) and new items (no interactions); for new users: use onboarding preferences, demographics, or popularity-based fallback; for new items: use content features or editorial boosting
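The user-side fallback chain above is mostly routing logic: pick a candidate-generation strategy based on how much history exists. A sketch with hypothetical thresholds (the cutoff of 20 interactions is illustrative, not a recommended value):

```python
def candidate_strategy(num_interactions: int, has_onboarding_prefs: bool) -> str:
    """Route a user to a candidate-generation strategy (thresholds illustrative)."""
    if num_interactions >= 20:
        return "personalised"        # enough history for CF / embedding retrieval
    if num_interactions > 0:
        return "hybrid"              # blend sparse history with popularity
    if has_onboarding_prefs:
        return "onboarding_prefs"    # seed from declared genres/categories
    return "popularity"              # demographic or global popularity fallback
```

New items take the mirror-image path: until an item accrues interactions, it is retrieved via content-feature embeddings or editorial boosts rather than collaborative signals.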
**Implicit and explicit feedback:** capture implicit signals (views, clicks, dwell time, add-to-cart, purchase, skip) and explicit signals (ratings, likes, dislikes, saves); weight signals appropriately (purchase > click > view)
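Signal weighting can be captured as a per-event-type weight table folded into a user-item affinity score; the weights below are illustrative placeholders, not tuned values:

```python
# Illustrative weights: stronger intent → larger weight; negative for skips.
EVENT_WEIGHTS = {"view": 1.0, "click": 3.0, "add_to_cart": 8.0,
                 "purchase": 20.0, "skip": -2.0}

def affinity_scores(events):
    """events: iterable of (user_id, item_id, event_type) tuples.
    Returns {(user_id, item_id): summed weighted score}."""
    scores: dict[tuple[str, str], float] = {}
    for user, item, kind in events:
        key = (user, item)
        scores[key] = scores.get(key, 0.0) + EVENT_WEIGHTS.get(kind, 0.0)
    return scores

affinity_scores([("u1", "i1", "view"), ("u1", "i1", "click"), ("u1", "i2", "skip")])
# → {("u1", "i1"): 4.0, ("u1", "i2"): -2.0}
```

In practice these aggregates are computed incrementally in the stream-processing layer (10B events/day rules out batch-only aggregation) and often decayed over time so stale interest fades.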
**A/B testing:** run experiments comparing different recommendation models/algorithms; split users into control/treatment groups; measure engagement metrics (CTR, watch time, conversion rate); statistically determine the winner
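Deterministic hash bucketing keeps each user in a stable group for the life of an experiment, and a two-proportion z-test then compares CTRs between groups. A sketch, seeding the hash with the experiment name so that different experiments assign users independently:

```python
import hashlib, math

def assign_group(user_id: str, experiment: str, treatment_pct: float = 0.5) -> str:
    """Stable assignment: hash (experiment, user) into [0, 1)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "treatment" if bucket < treatment_pct else "control"

def ctr_z_score(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test on click-through rates (a = control, b = treatment)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p = (clicks_a + clicks_b) / (views_a + views_b)       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se   # |z| > 1.96 → significant at the 5% level
```

For example, 100 clicks on 1,000 views in control versus 150 on 1,000 in treatment gives z ≈ 3.4, comfortably past the 1.96 threshold. Note the z-test assumes independent impressions; per-user metrics usually need clustered variance estimates instead.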
**Contextual recommendations:** factor in real-time context (time of day, device type, location, current session activity, trending events); e.g. 'morning commute' → short content, 'evening at home' → long content
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'Low latency search under 100ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?