With over 800 million videos on YouTube and 500 hours uploaded every minute, users face an impossible discovery challenge. No human could browse even a fraction of the available content. The recommendation engine solves this by predicting what each user wants to watch, often before users know it themselves.
Recommendations drive 70%+ of all watch time on YouTube. This isn't just a nice feature—it's the core product. A better recommendation system directly translates to more engagement, longer sessions, and higher revenue. It's also one of the most complex ML systems ever built at scale.
At YouTube's scale, the recommendation engine must return personalized results for every page load within a p99 latency target of 100 ms.
By the end of this page, you will understand the architecture of large-scale recommendation systems, including candidate generation, ranking models, feature engineering, and serving infrastructure. You'll learn how to balance multiple objectives and handle the unique challenges of video recommendation.
A production recommendation system is a multi-stage pipeline that progressively narrows hundreds of millions of candidates down to a handful of personalized suggestions.
```text
┌────────────────────────────────────────────────────────────────────────────┐
│                     RECOMMENDATION SYSTEM ARCHITECTURE                     │
└────────────────────────────────────────────────────────────────────────────┘

   800M+ videos                                             10-50 videos
    in catalog                                              shown to user
        │                                                        ▲
        │                                                        │
        ▼                                                        │
┌────────────────────────────────────────────────────────────────────────────┐
│                            CANDIDATE GENERATION                            │
│               "Find videos this user might be interested in"               │
│                                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │ Collaborative│  │   Content    │  │    Social    │  │   Trending   │    │
│  │  Filtering   │  │    Based     │  │    Graph     │  │  & Popular   │    │
│  │              │  │              │  │              │  │              │    │
│  │"Users like   │  │"Videos like  │  │"Friends are  │  │"Viral now"   │    │
│  │ you watched" │  │ this one"    │  │ watching"    │  │              │    │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘    │
│                                                                            │
│  Output: ~10,000 candidate videos per user                                 │
└─────────────────────────────────────┬──────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                               RANKING MODEL                                │
│               "Score each candidate for this specific user"                │
│                                                                            │
│  Features:                                                                 │
│  • User features: watch history, demographics, preferences                 │
│  • Video features: content, creator, freshness, popularity                 │
│  • Context features: time of day, device, location                         │
│                                                                            │
│  Objective: Predict P(watch) × E(watch_time)                               │
│                                                                            │
│  Output: Ranked list of ~1000 videos with scores                           │
└─────────────────────────────────────┬──────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                           RE-RANKING / FILTERING                           │
│             "Apply business rules and diversity requirements"              │
│                                                                            │
│  • Remove watched videos, flagged content                                  │
│  • Apply age restrictions, geo-blocks                                      │
│  • Ensure diversity (don't show 10 videos from same creator)               │
│  • Apply fairness constraints                                              │
│  • Insert ads at appropriate positions                                     │
│                                                                            │
│  Output: ~100 videos (more than needed for scroll depth)                   │
└─────────────────────────────────────┬──────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                               SERVING LAYER                                │
│                 "Deliver recommendations with low latency"                 │
│                                                                            │
│  • Assemble final response with thumbnails, metadata                       │
│  • A/B test assignment                                                     │
│  • Logging for training feedback                                           │
│  • p99 latency < 100ms                                                     │
│                                                                            │
│  Output: 10-50 videos for initial page load                                │
└────────────────────────────────────────────────────────────────────────────┘
```

Candidate generation is the first filter: it reduces hundreds of millions of videos to ~10,000 relevant candidates in milliseconds. Multiple generators run in parallel, each capturing a different aspect of relevance.
```typescript
// ================================================================
// APPROACH 1: TWO-TOWER NEURAL RETRIEVAL
// ================================================================
// Learn embeddings for users and videos in the same vector space.
// Similar users and videos are close in embedding space.

interface TwoTowerModel {
  // User tower: encodes user history into embedding
  userEncoder: {
    inputs: [
      'watch_history',     // Last N watched videos
      'search_history',    // Recent searches
      'liked_videos',      // Explicit likes
      'channel_subscriptions',
      'demographics',      // Age, gender, country
    ];
    output: 'user_embedding[256]';
  };

  // Video tower: encodes video into embedding
  videoEncoder: {
    inputs: [
      'video_id',          // Learned embedding
      'title_embedding',   // NLP encoding of title
      'channel_id',
      'category',
      'tags',
      'visual_features',   // CNN features from thumbnails/frames
    ];
    output: 'video_embedding[256]';
  };

  // Similarity: dot product of embeddings
  // score(user, video) = user_embedding · video_embedding
}

// Training: positive examples are (user, watched_video) pairs
// Use in-batch negatives for efficiency

// Serving: user embedding computed at request time
// Video embeddings pre-computed and indexed for ANN search

class TwoTowerCandidateGenerator {
  private videoIndex: ApproximateNearestNeighborIndex;

  async generateCandidates(userId: string): Promise<VideoCandidate[]> {
    // 1. Compute user embedding (real-time)
    const userFeatures = await this.getUserFeatures(userId);
    const userEmbedding = await this.userEncoder.encode(userFeatures);

    // 2. Find nearest videos in embedding space (ANN search)
    // Using HNSW or ScaNN for efficient approximate search
    const candidates = await this.videoIndex.search(userEmbedding, {
      k: 1000,         // Return top 1000
      efSearch: 2000,  // HNSW candidate-list size; keep >= k (larger = higher recall, slower)
    });

    return candidates.map(c => ({
      videoId: c.id,
      score: c.similarity,
      source: 'two-tower',
    }));
  }
}

// ================================================================
// APPROACH 2: COLLABORATIVE FILTERING
// ================================================================
// "Users who watched X also watched Y"

class MatrixFactorizationGenerator {
  // User-item interaction matrix factorized into:
  //   R ≈ U × V^T
  //   U: user latent factors  [n_users × k]
  //   V: video latent factors [n_videos × k]
  private userFactors: Map<string, Float32Array>;
  private videoIndex: ANNIndex<Float32Array>;

  async generateCandidates(userId: string): Promise<VideoCandidate[]> {
    const userVector = this.userFactors.get(userId);
    if (!userVector) {
      // Cold start: use popularity-based fallback
      return this.getPopularVideos();
    }

    // Find videos with high dot product, mapped to the shared candidate shape
    const neighbors = await this.videoIndex.search(userVector, { k: 500 });
    return neighbors.map(n => ({
      videoId: n.id,
      score: n.similarity,
      source: 'matrix-factorization',
    }));
  }
}

// ================================================================
// APPROACH 3: CONTENT-BASED / CONTEXTUAL
// ================================================================
// "Videos similar to what you just watched"

class ContentBasedGenerator {
  async generateCandidates(
    userId: string,
    context: RequestContext
  ): Promise<VideoCandidate[]> {
    const candidates: VideoCandidate[] = [];

    // Seed videos: recently watched, currently watching
    const seedVideos = await this.getSeedVideos(userId, context);

    for (const seed of seedVideos) {
      // Find videos with similar content
      const similar = await this.findSimilarContent(seed, {
        methods: ['topic', 'creator', 'visual'],
      });

      candidates.push(...similar.map(v => ({
        videoId: v.id,
        score: v.similarity * seed.recency_weight,
        source: `content-similar-to-${seed.id}`,
      })));
    }

    return this.deduplicate(candidates);
  }

  private async findSimilarContent(
    video: Video,
    options: SimilarityOptions
  ): Promise<SimilarVideo[]> {
    const results: SimilarVideo[] = [];

    if (options.methods.includes('topic')) {
      // Same topic/category
      results.push(...await this.findByTopic(video.topics));
    }

    if (options.methods.includes('creator')) {
      // Same creator's other videos
      results.push(...await this.findByCreator(video.channelId));
    }

    if (options.methods.includes('visual')) {
      // Visually similar (thumbnail/content embeddings)
      results.push(...await this.findByVisualSimilarity(video.visualEmbedding));
    }

    return results;
  }
}

// ================================================================
// COMBINING MULTIPLE SOURCES
// ================================================================

class CandidateAggregator {
  private generators: CandidateGenerator[] = [
    new TwoTowerCandidateGenerator(),
    new MatrixFactorizationGenerator(),
    new ContentBasedGenerator(),
    new TrendingGenerator(),
    new SubscriptionGenerator(),
  ];

  async generateCandidates(request: RecommendationRequest): Promise<VideoCandidate[]> {
    // Run all generators in parallel
    const results = await Promise.all(
      this.generators.map(g => g.generateCandidates(request.userId, request.context))
    );

    // Merge and deduplicate
    const candidateMap = new Map<string, VideoCandidate>();
    for (const candidates of results) {
      for (const candidate of candidates) {
        const existing = candidateMap.get(candidate.videoId);
        if (existing) {
          // Keep the highest score and record every source that nominated the video
          existing.score = Math.max(existing.score, candidate.score);
          existing.sources.push(candidate.source);
        } else {
          candidateMap.set(candidate.videoId, {
            ...candidate,
            sources: [candidate.source],
          });
        }
      }
    }

    // Sort by initial score and take top N
    return Array.from(candidateMap.values())
      .sort((a, b) => b.score - a.score)
      .slice(0, 10000);
  }
}
```

Exact nearest neighbor search over 800M vectors would take seconds. Approximate nearest neighbor (ANN) methods, such as the HNSW graph index or the ScaNN and FAISS libraries, find approximate neighbors in milliseconds with 95%+ recall. The small accuracy loss is a worthwhile trade for the large speed gain.
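To see why exhaustive search is too slow, note that scoring 800M catalog vectors of 256 dimensions (the embedding size in `TwoTowerModel`) costs about 800×10⁶ × 256 ≈ 2×10¹¹ multiply-adds for a single query. The sketch below shows the exact baseline that ANN indexes approximate, plus the recall@k metric behind figures like "95%+ recall": the fraction of the true top-k results that the approximate search also returns. The function names and toy data are illustrative only, not part of any ANN library's API.

```typescript
// Exact (brute-force) maximum-inner-product search: scores every vector,
// so the cost is O(N * d) per query.
function exactTopK(query: Float32Array, vectors: Float32Array[], k: number): number[] {
  const scored = vectors.map((v, id) => {
    let dot = 0;
    for (let i = 0; i < query.length; i++) dot += query[i] * v[i];
    return { id, dot };
  });
  scored.sort((a, b) => b.dot - a.dot);
  return scored.slice(0, k).map(s => s.id);
}

// recall@k: share of the exact top-k ids that the approximate search also returned.
function recallAtK(exactIds: number[], approxIds: number[]): number {
  if (exactIds.length === 0) return 1;
  const returned = new Set(approxIds);
  let hits = 0;
  for (const id of exactIds) {
    if (returned.has(id)) hits++;
  }
  return hits / exactIds.length;
}

// Toy catalog of 2-D embeddings; a real index would hold 256-D vectors.
const catalog = [
  new Float32Array([1, 0]),
  new Float32Array([0, 1]),
  new Float32Array([0.9, 0.1]),
  new Float32Array([-1, 0]),
];
const userVec = new Float32Array([1, 0]);
const exact = exactTopK(userVec, catalog, 2); // [0, 2]
const approximate = [0, 1];                   // pretend an ANN index missed id 2
console.log(recallAtK(exact, approximate));   // 0.5
```

In practice the approximate list would come from an HNSW, ScaNN, or FAISS index, and recall is measured offline on a sample of queries against this kind of exact baseline.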
The ranking model is the most complex component—a deep neural network that predicts user engagement for each candidate. It must be expressive enough to capture subtle preferences while fast enough to score thousands of candidates in milliseconds.
```typescript
// ================================================================
// RANKING MODEL ARCHITECTURE
// ================================================================

interface RankingModelInputs {
  // User features
  user: {
    userId: string;
    watchHistory: VideoId[];          // Last 100 videos
    searchHistory: string[];          // Last 50 searches
    demographics: {
      ageGroup: number;               // Age bucket
      gender: number;                 // Encoded gender
      country: number;                // Country ID
    };
    preferences: {
      preferredCategories: number[];
      avgWatchDuration: number;       // User's typical watch length
      activityLevel: number;          // How active on platform
    };
  };

  // Video features
  video: {
    videoId: string;
    channelId: string;
    category: number;
    duration: number;
    uploadAge: number;                // Days since upload
    popularity: {
      totalViews: number;
      recentViews: number;            // Last 7 days
      avgWatchPercentage: number;
      likeRatio: number;
    };
    content: {
      titleEmbedding: Float32Array;      // NLP encoding
      thumbnailEmbedding: Float32Array;  // Vision encoding
      topics: number[];
    };
  };

  // Context features
  context: {
    timeOfDay: number;                // Hour of day; encoded cyclically (sin/cos) before entering the model
    dayOfWeek: number;
    device: number;                   // Mobile, desktop, TV
    isWeekend: boolean;
    sessionDepth: number;             // How many videos already watched
  };

  // Interaction features (user × video)
  interaction: {
    userWatchedChannel: boolean;      // Has user watched this creator?
    userSubscribed: boolean;
    lastWatchedFromChannel: number;   // Days since
    topicOverlap: number;             // User preference × video topic match
  };
}

// Multi-objective output: predict multiple engagement signals
interface RankingModelOutputs {
  pClick: number;              // Probability of clicking
  pWatch: number;              // Probability of watching > 30 seconds
  expectedWatchTime: number;   // Expected watch duration (seconds)
  pLike: number;               // Probability of liking
  pShare: number;              // Probability of sharing
  pSubscribe: number;          // Probability of subscribing to channel
}

// Final score combines multiple objectives
function computeFinalScore(outputs: RankingModelOutputs): number {
  // Weights determined by business objectives and A/B testing (they sum to 1.0)
  const weights = {
    watchTime: 0.6,    // Primary objective
    click: 0.1,        // Necessary but not sufficient
    like: 0.15,        // Explicit positive signal
    share: 0.1,        // High-value engagement
    subscribe: 0.05,   // Channel growth
  };

  // Normalize the watch-time prediction to [0, 1], assuming a 5-minute (300 s) cap
  const normalizedWatchTime = Math.min(outputs.expectedWatchTime / 300, 1);

  return (
    weights.watchTime * normalizedWatchTime * outputs.pWatch +
    weights.click * outputs.pClick +
    weights.like * outputs.pLike +
    weights.share * outputs.pShare +
    weights.subscribe * outputs.pSubscribe
  );
}

// ================================================================
// FEATURE ENGINEERING
// ================================================================

class FeatureExtractor {
  async extractFeatures(
    userId: string,
    videoId: string,
    context: RequestContext
  ): Promise<RankingModelInputs> {
    // Parallel feature fetching for low latency
    const [userFeatures, videoFeatures] = await Promise.all([
      this.getUserFeatures(userId),
      this.getVideoFeatures(videoId),
    ]);

    // Compute interaction features
    const interactionFeatures = this.computeInteractionFeatures(
      userFeatures,
      videoFeatures
    );

    // Context features from request
    const contextFeatures = this.extractContextFeatures(context);

    return {
      user: userFeatures,
      video: videoFeatures,
      context: contextFeatures,
      interaction: interactionFeatures,
    };
  }

  private computeInteractionFeatures(
    user: UserFeatures,
    video: VideoFeatures
  ): InteractionFeatures {
    // Cross features that capture user-item interactions
    return {
      userWatchedChannel: user.watchedChannels.has(video.channelId),
      userSubscribed: user.subscriptions.has(video.channelId),
      lastWatchedFromChannel: this.daysSinceLastWatch(user, video.channelId),
      topicOverlap: this.computeTopicOverlap(user.preferredTopics, video.topics),
    };
  }

  private computeTopicOverlap(userTopics: Map<number, number>, videoTopics: number[]): number {
    // Weighted overlap based on user topic preferences
    let overlap = 0;
    for (const topic of videoTopics) {
      overlap += userTopics.get(topic) ?? 0;
    }
    return overlap;
  }
}

// ================================================================
// MODEL SERVING
// ================================================================

class RankingServer {
  private model: TensorFlowServingModel;
  private featureExtractor: FeatureExtractor;

  async rankCandidates(
    userId: string,
    candidates: VideoCandidate[],
    context: RequestContext
  ): Promise<RankedVideo[]> {
    // Extract features for all candidates concurrently
    // (each call refetches the user's features; a production path would fetch them once)
    const features = await Promise.all(
      candidates.map(c =>
        this.featureExtractor.extractFeatures(userId, c.videoId, context)
      )
    );

    // Batch inference (much faster than individual requests)
    const predictions = await this.model.predict(features);

    // Compute final scores and rank
    const rankedVideos = candidates.map((candidate, i) => ({
      videoId: candidate.videoId,
      ...predictions[i],
      finalScore: computeFinalScore(predictions[i]),
      candidateSources: candidate.sources,
    }));

    return rankedVideos.sort((a, b) => b.finalScore - a.finalScore);
  }
}
```

| Feature Category | Examples | Predictive Power | Freshness |
|---|---|---|---|
| User history | Watch history, likes, subscriptions | High | Real-time to daily |
| Video popularity | View count, CTR, avg watch % | Medium-High | Hourly |
| Content features | Title, thumbnail, topics | Medium | Static after upload |
| Context | Time, device, session depth | Medium | Real-time |
| Interaction | User × video crosses | Very High | Real-time |
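
`RankingModelInputs` describes `timeOfDay` as encoded cyclically. A minimal sketch of one common way to do that, assuming a sine/cosine encoding: each periodic value becomes a point on the unit circle, so the model sees 23:00 as close to 00:00 rather than 23 units away. The helper names `encodeCyclic` and `distance` are illustrative; `dayOfWeek` can use the same function with a period of 7.

```typescript
// Map a periodic value (hour of day, day of week) onto the unit circle so that
// the end of a cycle sits next to its start: 23:00 is close to 00:00.
function encodeCyclic(value: number, period: number): [number, number] {
  const angle = (2 * Math.PI * value) / period;
  return [Math.sin(angle), Math.cos(angle)];
}

// Euclidean distance between two encoded points.
function distance(a: [number, number], b: [number, number]): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

const HOURS_PER_DAY = 24;
const lateNight = encodeCyclic(23, HOURS_PER_DAY);
const midnight = encodeCyclic(0, HOURS_PER_DAY);
const noon = encodeCyclic(12, HOURS_PER_DAY);

// A raw 0..23 integer would put 23 and 0 at opposite ends of the range.
console.log(distance(lateNight, midnight).toFixed(3)); // 0.261
console.log(distance(midnight, noon).toFixed(3));      // 2.000
```

The cost of this choice is that one scalar feature becomes two model inputs; the benefit is that adjacent hours on either side of midnight get similar representations.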

Optimizing purely for watch time can lead to clickbait, autoplay loops, and addictive content. Responsible recommendation systems also include signals for user satisfaction (likes, not just passive watching), diversity, and "time well spent" metrics.
Training recommendation models requires processing billions of user interactions efficiently. The training pipeline must handle massive data volumes while ensuring model freshness.
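One efficiency technique is already named in the candidate-generation code: the two-tower model is trained with in-batch negatives. The sketch below shows that loss under simple assumptions: a batch of B (user, watched video) embedding pairs, where row i's positive is video i and the other B−1 videos in the same batch serve as negatives, so no separate negative-sampling pass is needed. Real systems often add a correction for popular videos appearing as negatives more often than rare ones; that is omitted here, and the function names are illustrative.

```typescript
// In-batch softmax loss for a two-tower retrieval model (sketch).
// userEmb[i] and videoEmb[i] are the i-th (user, watched video) pair; every
// other video in the batch acts as a negative for user i.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function inBatchSoftmaxLoss(userEmb: number[][], videoEmb: number[][]): number {
  const batchSize = userEmb.length;
  let total = 0;
  for (let i = 0; i < batchSize; i++) {
    // Row i of the B x B logit matrix U · Vᵀ.
    const logits = videoEmb.map(v => dot(userEmb[i], v));
    // Numerically stable log-sum-exp over the row.
    const max = Math.max(...logits);
    const logSumExp = max + Math.log(logits.reduce((acc, z) => acc + Math.exp(z - max), 0));
    // Softmax cross-entropy with the diagonal entry (the watched video) as the label.
    total += logSumExp - logits[i];
  }
  return total / batchSize;
}

const users = [[1, 0], [0, 1]];
const matched = [[1, 0], [0, 1]];  // each user's positive lies along its own embedding
const swapped = [[0, 1], [1, 0]];  // positives point the "wrong" way
console.log(inBatchSoftmaxLoss(users, matched).toFixed(4)); // 0.3133
console.log(inBatchSoftmaxLoss(users, swapped).toFixed(4)); // 1.3133
```

Because the negatives are the other positives in the batch, every embedding computed for the batch is reused B times, which is what makes this cheap at large batch sizes.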
```typescript
// ================================================================
// DATA COLLECTION AND LABELING
// ================================================================

interface InteractionEvent {
  userId: string;
  videoId: string;
  sessionId: string;
  timestamp: Date;

  // What the user did
  action: 'impression' | 'click' | 'watch' | 'like' | 'share' | 'subscribe';

  // For watch events
  watchDuration?: number;
  watchPercentage?: number;

  // Context at time of interaction
  device: string;
  country: string;
  source: 'home' | 'search' | 'related' | 'subscription' | 'notification';
  position: number;  // Position in recommendation list
}

// Positive examples: videos user engaged with
// Negative examples: videos shown but not clicked/watched

interface TrainingExample {
  features: RankingModelInputs;
  labels: {
    clicked: boolean;
    watchDuration: number;
    liked: boolean;
    shared: boolean;
    subscribed: boolean;
  };
  // Sample weight (for importance sampling)
  weight: number;
}

// ================================================================
// TRAINING DATA PROCESSING
// ================================================================

class TrainingDataPipeline {
  async generateTrainingData(date: Date): Promise<TrainingDataset> {
    // 1. Load interaction logs from data warehouse
    const interactions = await this.loadInteractions(date);

    // 2. Join with feature snapshots
    //    (features as they were when interaction happened)
    const examples = await this.joinWithFeatures(interactions);

    // 3. Apply sampling strategies
    //    - Downsample negative examples (most impressions are not clicks)
    //    - Up-weight rare positive signals (shares are rare but valuable)
    const balanced = this.applySampling(examples, {
      negativeRatio: 5,       // 5 negatives per positive
      rarePositiveBoost: 2,   // 2x weight for shares/subscribes
    });

    // 4. Create training/validation split
    //    (temporal split: older for training, recent for validation)
    const { training, validation } = this.temporalSplit(balanced, {
      validationHours: 6,
    });

    return { training, validation };
  }

  private applySampling(
    examples: TrainingExample[],
    config: SamplingConfig
  ): TrainingExample[] {
    const positives = examples.filter(e => e.labels.clicked);
    const negatives = examples.filter(e => !e.labels.clicked);

    // Downsample negatives
    const sampledNegatives = this.sampleWithoutReplacement(
      negatives,
      positives.length * config.negativeRatio
    );

    // Re-weight kept negatives by the inverse sampling rate so the weighted
    // negative count matches the original (guard against an empty sample)
    if (sampledNegatives.length > 0) {
      const negativeWeight = negatives.length / sampledNegatives.length;
      for (const neg of sampledNegatives) {
        neg.weight *= negativeWeight;
      }
    }

    // Boost rare positives
    for (const pos of positives) {
      if (pos.labels.shared || pos.labels.subscribed) {
        pos.weight *= config.rarePositiveBoost;
      }
    }

    return [...positives, ...sampledNegatives];
  }
}

// ================================================================
// MODEL TRAINING
// ================================================================

interface TrainingConfig {
  // Model architecture
  architecture: {
    userTowerLayers: number[];    // e.g., [512, 256, 128]
    videoTowerLayers: number[];
    crossLayers: number;          // Deep & Cross Network layers
    finalLayers: number[];        // e.g., [256, 128]
  };

  // Multi-task learning
  tasks: {
    click: { weight: 1.0, loss: 'binary_crossentropy' };
    watchTime: { weight: 0.5, loss: 'mse' };
    like: { weight: 0.3, loss: 'binary_crossentropy' };
    share: { weight: 0.2, loss: 'binary_crossentropy' };
  };

  // Training hyperparameters
  training: {
    batchSize: 4096;
    learningRate: 0.001;
    epochs: 5;                    // Few epochs, lots of data
    earlyStoppingPatience: 2;
  };

  // Regularization
  regularization: {
    dropout: 0.2;
    l2: 0.0001;
    embeddingL2: 0.00001;
  };
}

class ModelTrainer {
  async train(
    dataset: TrainingDataset,
    config: TrainingConfig
  ): Promise<TrainedModel> {
    // Build model
    const model = this.buildModel(config.architecture);

    // Per-task weights for the multi-task loss function
    const lossWeights = Object.fromEntries(
      Object.entries(config.tasks).map(([task, cfg]) => [task, cfg.weight])
    );

    // Train with distributed training (data parallel).
    // Pseudocode: MirroredStrategy and strategy.scope() mirror Python
    // TensorFlow's tf.distribute API, which TensorFlow.js does not provide.
    const strategy = new tf.distribute.MirroredStrategy();
    await strategy.scope(() => {
      model.compile({
        optimizer: tf.train.adam(config.training.learningRate),
        loss: this.buildMultiTaskLoss(config.tasks),
        lossWeights,
      });

      return model.fit(dataset.training, {
        epochs: config.training.epochs,
        batchSize: config.training.batchSize,
        validationData: dataset.validation,
        callbacks: [
          tf.callbacks.earlyStopping({ patience: config.training.earlyStoppingPatience }),
          new MetricsLogger(),
          new ModelCheckpointer(),
        ],
      });
    });

    return model;
  }
}

// ================================================================
// CONTINUOUS TRAINING
// ================================================================

class ContinuousTrainingPipeline {
  // Train new model daily
  async dailyTrainingJob(): Promise<void> {
    // 1. Generate training data from yesterday
    const dataset = await this.dataGenerator.generateTrainingData(yesterday());

    // 2. Train new model
    const newModel = await this.trainer.train(dataset, this.config);

    // 3. Evaluate on holdout set
    const metrics = await this.evaluator.evaluate(newModel, holdoutSet);

    // 4. Compare with current production model
    const comparison = await this.compare(newModel, this.productionModel);

    // 5. If better, gradually roll out
    if (comparison.improvement > 0.005) {  // 0.5% improvement threshold
      await this.deployModel(newModel, {
        rolloutStrategy: 'gradual',
        initialPercent: 5,
        maxPercent: 100,
        rolloutDays: 7,
      });
    }
  }
}
```

Models trained on yesterday's data are always slightly behind. For fast-moving signals (trending videos, breaking news), supplement the model with real-time boosting rules that don't require retraining.
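One way to implement such a rule, sketched under assumptions: multiply the model's score by a boost whose strength halves at a fixed interval after a spike is detected, so it fades smoothly instead of switching off at a hard cutoff like the one-hour window in `TrendingBooster` later on this page. `BoostSignal`, `decayedBoost`, `applyBoost`, and the 30-minute half-life are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical real-time boost that fades smoothly instead of switching off
// at a fixed cutoff. It multiplies the model score, so no retraining is needed.
interface BoostSignal {
  strength: number;      // Boost term at detection time (1.0 => score doubles)
  detectedAtMs: number;  // When the spike was detected (epoch ms)
}

const HALF_LIFE_MS = 30 * 60 * 1000; // Assumed: boost halves every 30 minutes

function decayedBoost(signal: BoostSignal, nowMs: number, halfLifeMs: number): number {
  const ageMs = Math.max(0, nowMs - signal.detectedAtMs); // Clamp clock skew
  return signal.strength * Math.pow(0.5, ageMs / halfLifeMs);
}

function applyBoost(
  modelScore: number,
  signal: BoostSignal | undefined,
  nowMs: number,
  halfLifeMs: number = HALF_LIFE_MS,
): number {
  if (!signal) return modelScore;
  return modelScore * (1 + decayedBoost(signal, nowMs, halfLifeMs));
}

const spike: BoostSignal = { strength: 1.0, detectedAtMs: 0 };
console.log(applyBoost(0.4, spike, 0).toFixed(2));                // 0.80
console.log(applyBoost(0.4, spike, HALF_LIFE_MS).toFixed(2));     // 0.60
console.log(applyBoost(0.4, spike, 2 * HALF_LIFE_MS).toFixed(2)); // 0.50
```

Because the boost is applied after model scoring, it can be tuned or switched off without touching the daily training job.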
Serving recommendations at scale requires specialized infrastructure optimized for low-latency, high-throughput inference. Every millisecond of latency impacts user experience and engagement.
```typescript
// ================================================================
// FEATURE STORE
// ================================================================
// Pre-computed features for low-latency serving

interface FeatureStore {
  // User features: updated hourly, cached in memory
  userFeatures: {
    storage: Redis | Memcached;
    ttl: '1 hour';
    precomputed: [
      'watch_history_embedding',
      'topic_preferences',
      'activity_level',
      'demographic_segment',
    ];
  };

  // Video features: content features written at upload, popularity refreshed
  // periodically; cached long-term
  videoFeatures: {
    storage: Redis | Bigtable;
    ttl: 'indefinite';
    precomputed: [
      'content_embedding',
      'popularity_metrics',
      'creator_features',
    ];
  };

  // Real-time features: computed at request time
  realtimeFeatures: [
    'session_context',
    'recency_since_last_watch',
  ];
}

class FeatureServer {
  private userCache: Redis;
  private videoCache: Bigtable;

  async getUserFeatures(userId: string): Promise<UserFeatures> {
    // Try cache first
    const cached = await this.userCache.get(`user:${userId}`);
    if (cached) return JSON.parse(cached);

    // Fallback to cold-start features
    return this.getColdStartUserFeatures(userId);
  }

  async getVideoFeaturesBatch(videoIds: string[]): Promise<Map<string, VideoFeatures>> {
    // Batch request for efficiency
    const keys = videoIds.map(id => `video:${id}`);
    const results = await this.videoCache.getBatch(keys);

    const features = new Map<string, VideoFeatures>();
    for (const [key, value] of results) {
      const videoId = key.slice('video:'.length);
      features.set(videoId, JSON.parse(value));
    }
    return features;
  }
}

// ================================================================
// MODEL SERVING
// ================================================================

interface ServingCluster {
  // TensorFlow Serving or TensorRT for GPU inference
  modelServers: {
    replicas: 100;
    instanceType: 'GPU' | 'CPU';
    modelsLoaded: ['ranking_v23', 'ranking_v24'];  // A/B testing
    batchSize: 128;                                 // Batch requests for GPU efficiency
    maxConcurrency: 32;
  };

  // Load balancing
  loadBalancer: {
    strategy: 'least_connections';
    healthCheck: '/health';
    timeoutMs: 50;
  };
}

class ModelServer {
  private model: TFServingClient;

  async predict(features: RankingModelInputs[]): Promise<RankingModelOutputs[]> {
    // Split into fixed-size batches for GPU efficiency
    const batches = this.createBatches(features, 128);

    // Send batches concurrently; Promise.all preserves batch order
    const batchResults = await Promise.all(
      batches.map(batch =>
        this.model.predict({
          inputs: this.tensorize(batch),
          // Request every head that computeFinalScore() reads
          outputNames: [
            'p_click', 'p_watch', 'expected_watch_time',
            'p_like', 'p_share', 'p_subscribe',
          ],
        })
      )
    );

    return batchResults.flatMap(r => this.parsePredictions(r));
  }
}

// ================================================================
// REQUEST HANDLING
// ================================================================

class RecommendationService {
  private candidateGenerator: CandidateAggregator;
  private featureStore: FeatureServer;
  private ranker: ModelServer;
  private reranker: ReRanker;

  async getRecommendations(request: RecommendationRequest): Promise<RecommendationResponse> {
    const startTime = Date.now();

    try {
      // Stage 1: Candidate generation (target: 20ms)
      const t1 = Date.now();
      const candidates = await this.candidateGenerator.generateCandidates(request);
      this.logLatency('candidate_generation', Date.now() - t1);

      // Stage 2: Feature extraction (target: 15ms)
      const t2 = Date.now();
      const features = await this.extractFeatures(request.userId, candidates);
      this.logLatency('feature_extraction', Date.now() - t2);

      // Stage 3: Ranking (target: 25ms)
      const t3 = Date.now();
      const ranked = await this.ranker.rank(features);
      this.logLatency('ranking', Date.now() - t3);

      // Stage 4: Re-ranking and filtering (target: 10ms)
      const t4 = Date.now();
      const final = await this.reranker.rerank(ranked, request);
      this.logLatency('reranking', Date.now() - t4);

      // Stage 5: Response assembly (target: 5ms)
      const response = await this.assembleResponse(final, request);

      // Log total latency
      const totalLatency = Date.now() - startTime;
      this.logLatency('total', totalLatency);
      if (totalLatency > this.LATENCY_BUDGETS.total) {
        this.logSlowRequest(request, totalLatency);
      }

      return response;
    } catch (error) {
      // Fallback to cached/popular recommendations
      return this.getFallbackRecommendations(request);
    }
  }

  // Latency budget allocation (ms)
  private readonly LATENCY_BUDGETS = {
    candidateGeneration: 20,
    featureExtraction: 15,
    ranking: 25,
    reranking: 10,
    assembly: 5,
    network: 20,
    total: 100,
  };
}

// ================================================================
// CACHING LAYERS
// ================================================================

class RecommendationCache {
  // Cache recommendations for recently active users
  // Reduces cold-start latency and handles traffic spikes
  private static readonly TTL_SECONDS = 300;

  async getOrCompute(request: RecommendationRequest): Promise<RecommendationResponse> {
    const cacheKey = this.computeCacheKey(request);

    // Check cache (valid for ~5 minutes for active users)
    const cached = await this.cache.get(cacheKey);
    if (cached && this.isValid(cached, request)) {
      return cached;
    }

    // Compute fresh recommendations
    const fresh = await this.service.getRecommendations(request);

    // Cache for next request
    await this.cache.set(cacheKey, fresh, { ttl: RecommendationCache.TTL_SECONDS });

    return fresh;
  }

  private computeCacheKey(request: RecommendationRequest): string {
    // Cache key based on user + surface + time bucket.
    // Fine-grained enough to be relevant, coarse enough to get hits.
    // The bucket matches the 5-minute TTL; a 1-minute bucket would change the
    // key every minute and make most of the TTL unreachable.
    const bucket = Math.floor(Date.now() / (RecommendationCache.TTL_SECONDS * 1000));
    return `rec:${request.userId}:${request.surface}:${bucket}`;
  }
}
```

| Stage | Budget (ms) | Key Optimizations |
|---|---|---|
| Candidate generation | 20 | ANN search, parallel generators, pre-computed embeddings |
| Feature extraction | 15 | Feature store, batch lookups, caching |
| Ranking inference | 25 | GPU batching, model optimization, quantization |
| Re-ranking | 10 | Simple rules, no ML |
| Response assembly | 5 | Pre-fetched metadata |
| Network overhead | 20 | CDN, connection reuse |
| Total | < 100 | – |
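
The stage budgets in the table add up to 20 + 15 + 25 + 10 + 5 + 20 = 95 ms, leaving about 5 ms of headroom under the 100 ms target. A budget only protects the total if a stage that overruns is cut off. A minimal sketch, assuming a hypothetical `withBudget` helper that races a stage against a timer and substitutes a cheaper fallback result, in the spirit of the `getFallbackRecommendations` path in `RecommendationService`:

```typescript
// Hypothetical helper: give a pipeline stage a latency budget and fall back to
// a cheaper result if the budget runs out first.
async function withBudget<T>(
  work: Promise<T>,
  budgetMs: number,
  fallback: () => T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>(resolve => {
    timer = setTimeout(() => resolve(fallback()), budgetMs);
  });
  try {
    // Whichever settles first wins; a rejection from `work` propagates.
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // Don't leave the timer pending after a fast stage
  }
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function demo(): Promise<void> {
  // A ranking stage that takes 50 ms against its 25 ms budget.
  const slowRanking = sleep(50).then(() => ["personalized-1", "personalized-2"]);
  const shown = await withBudget(slowRanking, 25, () => ["popular-1", "popular-2"]);
  console.log(shown); // the popular fallback, because ranking missed its budget
}

demo();
```

`Promise.race` does not cancel the slower work; it keeps running in the background. A production service would also propagate cancellation (for example with an `AbortSignal`) so that abandoned stages release their resources.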
Real-world recommendation systems must handle numerous edge cases and special scenarios that don't fit the standard pipeline.
// ================================================================
// COLD START: NEW USERS
// ================================================================

class ColdStartHandler {
  async getRecommendationsForNewUser(
    request: RecommendationRequest
  ): Promise<RecommendationResponse> {
    // 1. Use the signals available before any watch history exists
    const signals: ColdStartSignals = {
      device: request.device,           // Mobile users tend to prefer shorter content
      country: request.country,         // Regional preferences
      language: request.language,       // Language-appropriate content
      referrer: request.referrer,       // What brought them here
      firstSearch: request.firstSearch, // Their first query, if they searched first
    };

    // 2. Find similar users (demographic clustering)
    const similarUserCluster = await this.findSimilarCluster(signals);

    // 3. Get popular content for that cluster
    const popularInCluster = await this.getPopularForCluster(similarUserCluster);

    // 4. Mix in regional trending content for discovery
    const trending = await this.getTrendingForRegion(request.country);

    // 5. Include exploration content to learn the user's preferences
    const exploration = await this.getExplorationContent(signals);

    return this.blend([
      { weight: 0.4, content: popularInCluster },
      { weight: 0.3, content: trending },
      { weight: 0.3, content: exploration },
    ]);
  }
}

// ================================================================
// COLD START: NEW VIDEOS
// ================================================================

class NewVideoHandler {
  async scoreNewVideo(video: Video, user: User): Promise<number> {
    // No engagement history yet, so rely on content features
    const contentScore = this.scoreByContent(video, user);

    // Creator's track record
    const creatorScore = await this.getCreatorScore(video.channelId);

    // How similar videos have performed
    const similarScore = await this.getSimilarVideosScore(video);

    // Uncertainty bonus so new uploads get explored
    // (assumes video.uploadAge is measured in hours)
    const explorationBonus = this.computeExplorationBonus(video.uploadAge);

    // Weights sum to 1.0
    return (
      contentScore * 0.3 +
      creatorScore * 0.3 +
      similarScore * 0.3 +
      explorationBonus * 0.1
    );
  }

  private computeExplorationBonus(uploadAgeHours: number): number {
    // Step function: highest for very new uploads, zero after 48 hours
    if (uploadAgeHours < 1) return 1.0;
    if (uploadAgeHours < 6) return 0.8;
    if (uploadAgeHours < 24) return 0.5;
    if (uploadAgeHours < 48) return 0.2;
    return 0;
  }
}

// ================================================================
// REAL-TIME TRENDING BOOST
// ================================================================

const ONE_HOUR_MS = 60 * 60 * 1000;

class TrendingBooster {
  private trendingScores: Map<string, TrendingScore> = new Map();

  // Run every few minutes
  async updateTrendingScores(): Promise<void> {
    // Drop expired boosts so the map does not grow without bound
    const now = Date.now();
    for (const [id, score] of this.trendingScores) {
      if (now - score.detectedAt >= ONE_HOUR_MS) this.trendingScores.delete(id);
    }

    // Real-time view velocity per video
    const velocities = await this.getViewVelocities();

    for (const [videoId, velocity] of velocities) {
      const historical = await this.getHistoricalVelocity(videoId);

      // How many times faster than this video's usual rate
      const ratio = velocity / (historical || 1);

      if (ratio > 5) {
        // More than 5x normal velocity is treated as viral
        this.trendingScores.set(videoId, {
          boost: Math.min(ratio / 10, 2.0), // Boost capped at 2.0, i.e. at most a 3x score multiplier
          reason: 'viral',
          detectedAt: Date.now(),
        });
      }
    }
  }

  applyTrendingBoost(video: RankedVideo): number {
    const trending = this.trendingScores.get(video.videoId);
    // Boosts expire one hour after detection
    if (trending && Date.now() - trending.detectedAt < ONE_HOUR_MS) {
      return video.finalScore * (1 + trending.boost);
    }
    return video.finalScore;
  }
}

// ================================================================
// DIVERSITY AND EXPLORATION
// ================================================================

class DiversityReRanker {
  rerank(videos: RankedVideo[]): RankedVideo[] {
    const result: RankedVideo[] = [];
    const usedCreators = new Set<string>();
    const usedCategories = new Map<string, number>();

    // Walk candidates in ranked order (input assumed sorted by finalScore, descending)
    for (const original of videos) {
      // Copy so the caller's objects are not mutated
      const video = { ...original };

      // Penalize if this creator already appears higher up
      if (usedCreators.has(video.channelId)) {
        video.finalScore *= 0.7; // 30% penalty
      }

      // Penalize once a category already has 3 or more videos
      const categoryCount = usedCategories.get(video.category) || 0;
      if (categoryCount >= 3) {
        video.finalScore *= 0.8; // 20% penalty
      }

      result.push(video);
      usedCreators.add(video.channelId);
      usedCategories.set(video.category, categoryCount + 1);
    }

    // Re-sort after penalties
    return result.sort((a, b) => b.finalScore - a.finalScore);
  }

  // Insert exploration videos at the given positions
  injectExploration(videos: RankedVideo[], positions: number[]): RankedVideo[] {
    const exploration = this.getExplorationVideos();
    const result = [...videos];

    // Ascending order, so each video lands at its requested final position
    for (const pos of [...positions].sort((a, b) => a - b)) {
      if (pos >= result.length) continue;
      const next = exploration.shift();
      if (next === undefined) break; // Ran out of exploration candidates
      result.splice(pos, 0, next);
    }
    return result;
  }
}

Pure exploitation (showing only what the model is confident about) creates filter bubbles and never discovers new preferences. Allocate 5-10% of recommendations to exploration: content the model is uncertain about. The feedback on those slots is the data that improves future recommendations.
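`ColdStartHandler` calls a `blend` helper whose behavior is not shown, and the exploration share above is given only as a percentage. The sketch below shows one way to turn source weights into concrete slots, assuming each source is already ranked best-first and weights are relative shares of the output. `blendBySlots` and `WeightedSource` are hypothetical names, not part of the design above. Quotas use largest-remainder rounding, and each position goes to the source furthest behind its pro-rata share, which spreads a small source (such as exploration) through the list instead of clustering it at the top.

```typescript
// Hypothetical sketch: `blendBySlots` and `WeightedSource` are illustrative
// names, not APIs from the design above.
interface WeightedSource<T> {
  weight: number; // relative share of output slots (normalized below)
  content: T[];   // items from this source, best first
}

function blendBySlots<T>(
  sources: WeightedSource<T>[],
  total: number,
  key: (item: T) => string,
): T[] {
  const weightSum = sources.reduce((sum, s) => sum + s.weight, 0);
  if (weightSum <= 0 || total <= 0) return [];

  // 1. Slot quotas proportional to weight; largest-remainder rounding
  //    makes the quotas sum exactly to `total`.
  const exact = sources.map(s => (s.weight / weightSum) * total);
  const quotas = exact.map(x => Math.floor(x));
  let leftover = total - quotas.reduce((a, b) => a + b, 0);
  const byRemainder = exact
    .map((x, i) => ({ i, frac: x - Math.floor(x) }))
    .sort((a, b) => b.frac - a.frac);
  for (const { i } of byRemainder) {
    if (leftover <= 0) break;
    quotas[i]++;
    leftover--;
  }

  // 2. Fill positions one at a time, picking the source that is furthest
  //    behind its pro-rata share at this position.
  const cursors = sources.map(() => 0);
  const taken = sources.map(() => 0);
  const seen = new Set<string>();
  const result: T[] = [];
  for (let pos = 0; pos < total; pos++) {
    let best = -1;
    let bestDeficit = -Infinity;
    for (let i = 0; i < sources.length; i++) {
      const items = sources[i].content;
      // Skip items another source already contributed
      while (cursors[i] < items.length && seen.has(key(items[cursors[i]]))) {
        cursors[i]++;
      }
      if (taken[i] >= quotas[i] || cursors[i] >= items.length) continue;
      const deficit = (quotas[i] * (pos + 1)) / total - taken[i];
      if (deficit > bestDeficit) {
        best = i;
        bestDeficit = deficit;
      }
    }
    if (best === -1) break; // every source is exhausted or at its quota
    const item = sources[best].content[cursors[best]++];
    seen.add(key(item));
    result.push(item);
    taken[best]++;
  }
  return result;
}

const ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
const exploration = ["x", "y", "z"];
const feed = blendBySlots(
  [
    { weight: 0.9, content: ranked },      // exploitation
    { weight: 0.1, content: exploration }, // ~10% exploration
  ],
  10,
  id => id,
);
console.log(feed); // → ["a", "b", "c", "d", "e", "x", "f", "g", "h", "i"]
```

If a source runs out, its unused slots are left empty rather than backfilled, so the result can be shorter than `total`; a production blender would likely backfill from the remaining sources.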
Recommendation systems have significant societal impact. Optimizing purely for engagement can promote sensationalism, misinformation, and addictive patterns. Responsible design considers broader impacts.
// ================================================================
// POLICY-BASED FILTERING
// ================================================================

class SafetyFilter {
  async filter(videos: RankedVideo[], user: User): Promise<RankedVideo[]> {
    const filtered: RankedVideo[] = [];

    for (const video of videos) {
      const policy = await this.getPolicyStatus(video.videoId);

      // Remove policy-violating content
      if (policy.violated) continue;

      // Apply age restrictions
      if (policy.ageRestricted && user.age < 18) continue;

      // Apply the user's content preferences
      if (user.restrictedMode && policy.matureContent) continue;

      // Demote borderline content (kept, but ranked lower); copy to avoid mutating input
      filtered.push(
        policy.borderline
          ? { ...video, finalScore: video.finalScore * 0.3 } // Heavy demotion
          : video
      );
    }

    // Re-sort so the demotion actually changes the order
    return filtered.sort((a, b) => b.finalScore - a.finalScore);
  }
}

// ================================================================
// RADICALIZATION PREVENTION
// ================================================================

class RadicalizationMonitor {
  async checkRecommendationPath(
    session: Session,
    newRecommendations: RankedVideo[]
  ): Promise<RankedVideo[]> {
    // Track the content 'direction' over the session
    const sessionPath = await this.getSessionContentPath(session);

    // Detect whether the session is trending toward extreme content
    const extremityTrend = this.computeExtremityTrend(sessionPath);

    if (extremityTrend > this.THRESHOLD) {
      // Intervene: inject moderate/authoritative content on the same topic
      const intervention = await this.getInterventionContent(
        sessionPath,
        { type: 'authoritative', topic: sessionPath.mainTopic }
      );

      // Replace part of the recommendations with intervention content
      return this.blend([
        { weight: 0.6, content: newRecommendations },
        { weight: 0.4, content: intervention },
      ]);
    }

    return newRecommendations;
  }
}

// ================================================================
// CREATOR FAIRNESS
// ================================================================

const SMALL_CREATOR_SUBSCRIBERS = 10000;

class CreatorFairnessHandler {
  async adjustForFairness(videos: RankedVideo[]): Promise<RankedVideo[]> {
    // Reserve up to 10% of slots for quality videos from small/new creators
    const smallCreatorSlots = Math.ceil(videos.length * 0.1);

    const promoted = videos
      .filter(v =>
        v.creatorSubscribers < SMALL_CREATOR_SUBSCRIBERS && v.qualityScore > 0.6
      )
      .slice(0, smallCreatorSlots);
    const promotedIds = new Set(promoted.map(v => v.videoId));

    // Everything else keeps its ranked order, including small creators
    // that did not qualify for a reserved slot
    const rest = videos.filter(v => !promotedIds.has(v.videoId));

    // Interleave so small creators appear throughout the list;
    // the output has the same length as the input
    return this.interleave(rest, promoted);
  }
}

// ================================================================
// WELLBEING FEATURES
// ================================================================

class WellbeingHandler {
  async adjustForWellbeing(
    videos: RankedVideo[],
    session: Session
  ): Promise<RecommendationResponse> {
    const response: RecommendationResponse = { videos };

    // Check session length (session.duration in seconds)
    if (session.duration > 2 * 3600) { // 2+ hours
      // Prompt the user to take a break
      response.wellbeingPrompt = {
        type: 'break_reminder',
        message: "You've been watching for a while. Consider taking a break.",
      };
    }

    // Check the user's local time of day
    const userLocalHour = this.getUserLocalHour(session);
    if (userLocalHour >= 0 && userLocalHour < 5) { // Late night
      // Offer a bedtime reminder if enabled (takes precedence over the break reminder)
      if (session.user.bedtimeReminders) {
        response.wellbeingPrompt = {
          type: 'bedtime_reminder',
          message: "It's getting late. Set a reminder to stop watching?",
        };
      }
    }

    return response;
  }
}

Maximizing watch time and maximizing user wellbeing sometimes conflict. Responsible platforms must make deliberate tradeoffs, accepting some engagement loss in exchange for healthier usage patterns.
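One way to make that tradeoff explicit is a scalarized objective: rank by predicted engagement minus a weighted wellbeing penalty, and treat the weight as the policy knob. This is a minimal sketch under stated assumptions: `Candidate`, `expectedWatchMinutes`, `wellbeingCost` (for example a predicted-regret score in [0, 1]), and `lambda` are hypothetical names, not fields from the design above, and a real system would tune the weight through experiments rather than fix it by hand.

```typescript
// Hypothetical scoring sketch: `Candidate`, `wellbeingCost`, and `lambda`
// are illustrative assumptions, not fields from the design above.
interface Candidate {
  id: string;
  expectedWatchMinutes: number; // engagement model output
  wellbeingCost: number;        // assumed penalty score in [0, 1]
}

// Scalarized objective: engagement minus a weighted wellbeing penalty.
// lambda = 0 is pure engagement; lambda converts the penalty into minutes.
function scoreWithTradeoff(c: Candidate, lambda: number): number {
  return c.expectedWatchMinutes - lambda * c.wellbeingCost;
}

function rankWithTradeoff(cands: Candidate[], lambda: number): Candidate[] {
  return [...cands].sort(
    (a, b) => scoreWithTradeoff(b, lambda) - scoreWithTradeoff(a, lambda),
  );
}

// Expected watch minutes given up in the top-k when moving from lambda = 0.
function engagementCost(cands: Candidate[], lambda: number, k: number): number {
  const minutes = (list: Candidate[]) =>
    list.slice(0, k).reduce((sum, c) => sum + c.expectedWatchMinutes, 0);
  return minutes(rankWithTradeoff(cands, 0)) - minutes(rankWithTradeoff(cands, lambda));
}

const cands: Candidate[] = [
  { id: "binge", expectedWatchMinutes: 12, wellbeingCost: 0.9 },
  { id: "tutorial", expectedWatchMinutes: 9, wellbeingCost: 0.1 },
  { id: "news", expectedWatchMinutes: 6, wellbeingCost: 0.2 },
];
console.log(rankWithTradeoff(cands, 0).map(c => c.id));  // → ["binge", "tutorial", "news"]
console.log(rankWithTradeoff(cands, 10).map(c => c.id)); // → ["tutorial", "news", "binge"]
console.log(engagementCost(cands, 10, 2));               // → 6
```

With `lambda = 10`, the high-engagement but high-cost item drops to last place, and the top two slots lose 6 expected minutes (21 down to 15). That lost engagement is the explicit price of the healthier ordering.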
We've explored the architecture, algorithms, and responsible-design considerations for building a world-class video recommendation system.
Congratulations! You've completed the comprehensive design of a YouTube-scale video platform, covering the entire content lifecycle from upload to personalized recommendation. You now understand the systems that power one of the largest distributed applications in the world.
You now understand the architecture of recommendation systems at scale. From candidate generation to ranking models to responsible design, these patterns power the discovery engines that drive engagement on the world's largest video platforms.