Design a personalised recommendation system (like Netflix, YouTube, Amazon, or Spotify) that serves relevant item recommendations to hundreds of millions of users from a catalogue of ~100 million items. The system uses a two-stage architecture: fast candidate generation (embedding-based ANN retrieval from multiple sources) followed by accurate neural ranking, supported by a real-time feature store and continuously improved through A/B testing.
| Metric | Value |
|---|---|
| Users | 500 million |
| Items in catalogue | 100 million |
| User interaction events per day | 10 billion |
| Recommendation requests per second | 500,000 |
| Candidates per request | 500 (from multiple sources) |
| Items scored by ranking model per request | 500 |
| Top-K returned per request | 20 |
| End-to-end latency (p99) | < 100ms |
| Embedding dimensions | 256 |
| ANN index size | 100M items × 256 dims × 4 bytes (float32) ≈ 100 GB |
| Model retraining frequency | Daily |
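The index-size and throughput figures above follow from simple arithmetic; a quick sanity check (assuming uncompressed float32 embeddings in a flat index):

```python
def ann_index_bytes(num_items: int, dims: int, bytes_per_float: int = 4) -> int:
    # Flat float32 index: one vector per item, no compression or quantisation.
    return num_items * dims * bytes_per_float

def events_per_second(events_per_day: float) -> float:
    return events_per_day / 86_400  # seconds in a day

index_gb = ann_index_bytes(100_000_000, 256) / 1e9
print(f"ANN index: {index_gb:.1f} GB")                           # 102.4 GB, rounded to ~100 GB
print(f"Event ingest: {events_per_second(10e9):,.0f} events/s")  # ≈ 115,741 events/s average
```

Note that 115K events/s is an average; peak ingest will be a multiple of this, and quantisation (e.g. product quantisation) can shrink the index well below 100 GB at some recall cost.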
**Personalised recommendations:** given a user, return a ranked list of items (products, videos, songs, articles) tailored to their preferences; recommendations reflect the user's past behaviour, demographics, and stated preferences
**Candidate generation:** from the full catalogue (~100M items here), efficiently narrow down to hundreds of candidate items relevant to a user; multiple candidate sources (collaborative filtering, content-based, trending, etc.) are merged into a single candidate pool
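The merge step can be as simple as round-robin interleaving across per-source lists with de-duplication; a minimal sketch (the source names are illustrative, and real systems often apply per-source quotas):

```python
from itertools import zip_longest

def merge_candidates(sources: dict[str, list[str]], pool_size: int = 500) -> list[str]:
    """Round-robin across candidate sources, dropping duplicates,
    until the pool is full or all sources are exhausted."""
    pool, seen = [], set()
    for batch in zip_longest(*sources.values()):
        for item in batch:
            if item is not None and item not in seen:
                seen.add(item)
                pool.append(item)
                if len(pool) == pool_size:
                    return pool
    return pool

pool = merge_candidates({
    "collaborative_filtering": ["a", "b", "c"],
    "content_based": ["b", "d"],
    "trending": ["e", "a"],
})
# Round-robin order: a, b, e, (b dup skipped), d, (a dup skipped), c
```

Round-robin gives every source a voice even when one source dominates; the downstream ranker then decides which candidates actually surface.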
**Ranking / scoring:** given the candidate pool (~500 items per request, per the metrics above), score each item using a learned ranking model; return the top-K items sorted by predicted relevance/engagement; the ranking model uses user features, item features, and context features
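A neural ranker is beyond a sketch, but the serving-time contract is simple: score every candidate with user, item, and context features, then return the top-K. The logistic scorer below is a stand-in for the real model, not the model itself:

```python
import math

def score(user_feats, item_feats, ctx_feats, weights):
    # Stand-in for a neural ranker: logistic regression over concatenated features.
    x = user_feats + item_feats + ctx_feats
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1 / (1 + math.exp(-z))   # predicted engagement probability

def rank(candidates, user_feats, ctx_feats, item_features, weights, k=20):
    """Score each candidate, then return the top-k item ids by score."""
    scored = [(score(user_feats, item_features[c], ctx_feats, weights), c)
              for c in candidates]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]
```

In production the `score` call becomes a batched forward pass over all ~500 candidates at once, which is what makes the <100ms p99 budget achievable.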
**Real-time feature serving:** at recommendation time, fetch user features (watch history, click history, demographics) and item features (category, popularity, recency) with < 10ms latency; features stored in a low-latency feature store
**Similar items:** given an item, return the K most similar items ('customers who viewed X also viewed Y'); based on item embeddings (cosine similarity) or co-interaction patterns
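With item embeddings in hand, 'similar items' reduces to nearest neighbours under cosine similarity; the brute-force scan below shows the computation, while an ANN index replaces the linear scan at 100M-item scale:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_items(query_id, embeddings, k=5):
    """Return the k item ids whose embeddings are closest to the query's."""
    q = embeddings[query_id]
    scored = [(cosine(q, v), item) for item, v in embeddings.items() if item != query_id]
    scored.sort(reverse=True)
    return [item for _, item in scored[:k]]

emb = {"x": [1.0, 0.0], "y": [0.9, 0.1], "z": [0.0, 1.0]}
similar_items("x", emb, k=2)   # → ["y", "z"]
```

If embeddings are pre-normalised to unit length, cosine similarity becomes a plain dot product, which is what most ANN indexes optimise for.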
**Diversity and freshness:** recommendations should not be monotonous; inject diversity (different categories, different content types), balance exploitation (recommend proven preferences) with exploration (introduce new items), and surface fresh/new content
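One standard way to inject diversity is maximal marginal relevance (MMR): greedily pick the item that balances relevance against similarity to the items already selected. A sketch, where both λ and the similarity function are tunable choices:

```python
def mmr_rerank(candidates, relevance, similarity, k=20, lam=0.7):
    """Greedy MMR: at each step pick the item maximising
    lam * relevance - (1 - lam) * (max similarity to items picked so far)."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 1 this degenerates to plain relevance ranking; lowering λ trades raw predicted engagement for variety, and a near-duplicate of an already-picked item gets penalised even if its relevance score is high.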
**Cold-start handling:** provide meaningful recommendations for new users (no history) and new items (no interactions); for new users: use onboarding preferences, demographics, or popularity-based fallback; for new items: use content features or editorial boosting
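The user-side fallback chain above is mostly routing logic: pick a candidate-generation strategy based on how much history exists. A sketch with hypothetical thresholds (the cutoff of 20 interactions is illustrative, not a recommended value):

```python
def candidate_strategy(num_interactions: int, has_onboarding_prefs: bool) -> str:
    """Route a user to a candidate-generation strategy (thresholds illustrative)."""
    if num_interactions >= 20:
        return "personalised"        # enough history for CF / embedding retrieval
    if num_interactions > 0:
        return "hybrid"              # blend sparse history with popularity
    if has_onboarding_prefs:
        return "onboarding_prefs"    # seed from declared genres/categories
    return "popularity"              # demographic or global popularity fallback
```

New items take the mirror-image path: until an item accrues interactions, it is retrieved via content-feature embeddings or editorial boosts rather than collaborative signals.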
**Implicit and explicit feedback:** capture implicit signals (views, clicks, dwell time, add-to-cart, purchase, skip) and explicit signals (ratings, likes, dislikes, saves); weight signals appropriately (purchase > click > view)
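Signal weighting can be captured as a per-event-type weight table folded into a user-item affinity score; the weights below are illustrative placeholders, not tuned values:

```python
# Illustrative weights: stronger intent → larger weight; negative for skips.
EVENT_WEIGHTS = {"view": 1.0, "click": 3.0, "add_to_cart": 8.0,
                 "purchase": 20.0, "skip": -2.0}

def affinity_scores(events):
    """events: iterable of (user_id, item_id, event_type) tuples.
    Returns {(user_id, item_id): summed weighted score}."""
    scores: dict[tuple[str, str], float] = {}
    for user, item, kind in events:
        key = (user, item)
        scores[key] = scores.get(key, 0.0) + EVENT_WEIGHTS.get(kind, 0.0)
    return scores

affinity_scores([("u1", "i1", "view"), ("u1", "i1", "click"), ("u1", "i2", "skip")])
# → {("u1", "i1"): 4.0, ("u1", "i2"): -2.0}
```

In practice these aggregates are computed incrementally in the stream-processing layer (10B events/day rules out batch-only aggregation) and often decayed over time so stale interest fades.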
**A/B testing:** run experiments comparing different recommendation models/algorithms; split users into control/treatment groups; measure engagement metrics (CTR, watch time, conversion rate); statistically determine the winner
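Deterministic hash bucketing keeps each user in a stable group for the life of an experiment, and a two-proportion z-test then compares CTRs between groups. A sketch, seeding the hash with the experiment name so that different experiments assign users independently:

```python
import hashlib, math

def assign_group(user_id: str, experiment: str, treatment_pct: float = 0.5) -> str:
    """Stable assignment: hash (experiment, user) into [0, 1)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "treatment" if bucket < treatment_pct else "control"

def ctr_z_score(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test on click-through rates (a = control, b = treatment)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p = (clicks_a + clicks_b) / (views_a + views_b)       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se   # |z| > 1.96 → significant at the 5% level
```

For example, 100 clicks on 1,000 views in control versus 150 on 1,000 in treatment gives z ≈ 3.4, comfortably past the 1.96 threshold. Note the z-test assumes independent impressions; per-user metrics usually need clustered variance estimates instead.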
**Contextual recommendations:** factor in real-time context (time of day, device type, location, current session activity, trending events); e.g. 'morning commute' → short content, 'evening at home' → long content
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'Low latency search under 100ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?