Designing YouTube: A Video Platform at Planetary Scale

Level: Advanced

Duration: 180 mins

Topic: YouTube Video Platform

Recommendation Engine

The Discovery Challenge: Surfacing Relevant Content at Scale

With over 800 million videos on YouTube (a common estimate) and more than 500 hours of new video uploaded every minute, no user could browse even a tiny fraction of the catalog. The recommendation engine solves this discovery problem by predicting what each user wants to watch, often before they know it themselves.

Recommendations reportedly drive about 70% of all watch time on YouTube. They are not a side feature; they are the core product. A better recommendation system translates directly into more engagement, longer sessions, and higher revenue, and it is one of the most complex ML systems operated at this scale.

At YouTube's scale, the recommendation engine:

• Processes billions of user interactions daily as training data
• Retrieves candidates from hundreds of millions of videos and scores thousands of them per request
• Serves recommendations with p99 latency under 100 ms
• Balances user satisfaction, creator economics, and platform health

What You Will Learn

By the end of this page, you will understand the architecture of large-scale recommendation systems, including candidate generation, ranking models, feature engineering, and serving infrastructure. You'll learn how to balance multiple objectives and handle the unique challenges of video recommendation.

Recommendation System Architecture

A production recommendation system is a multi-stage pipeline that progressively narrows hundreds of millions of candidates down to a handful of personalized suggestions.

Recommendation Architecture
┌─────────────────────────────────────────────────────────────────────────────────┐
│                     RECOMMENDATION SYSTEM ARCHITECTURE                           │
└─────────────────────────────────────────────────────────────────────────────────┘
 
     800M+ videos                                              10-50 videos
     in catalog                                                shown to user
         │                                                          ▲
         │                                                          │
         ▼                                                          │
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           CANDIDATE GENERATION                                   │
│  "Find videos this user might be interested in"                                 │
│                                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │ Collaborative│  │   Content    │  │   Social     │  │  Trending    │        │
│  │  Filtering   │  │    Based     │  │   Graph      │  │  & Popular   │        │
│  │              │  │              │  │              │  │              │        │
│  │"Users like   │  │"Videos like  │  │"Friends are  │  │"Viral now"   │        │
│  │ you watched" │  │ this one"    │  │ watching"    │  │              │        │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                                  │
│  Output: ~10,000 candidate videos per user                                      │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              RANKING MODEL                                       │
│  "Score each candidate for this specific user"                                  │
│                                                                                  │
│  Features:                                                                       │
│  • User features: watch history, demographics, preferences                      │
│  • Video features: content, creator, freshness, popularity                      │
│  • Context features: time of day, device, location                              │
│                                                                                  │
│  Objective: Predict P(watch) × E(watch_time)                                    │
│                                                                                  │
│  Output: Ranked list of ~1000 videos with scores                                │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              RE-RANKING / FILTERING                              │
│  "Apply business rules and diversity requirements"                              │
│                                                                                  │
│  • Remove watched videos, flagged content                                       │
│  • Apply age restrictions, geo-blocks                                           │
│  • Ensure diversity (don't show 10 videos from same creator)                    │
│  • Apply fairness constraints                                                   │
│  • Insert ads at appropriate positions                                          │
│                                                                                  │
│  Output: ~100 videos (more than needed for scroll depth)                        │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              SERVING LAYER                                       │
│  "Deliver recommendations with low latency"                                     │
│                                                                                  │
│  • Assemble final response with thumbnails, metadata                            │
│  • A/B test assignment                                                          │
│  • Logging for training feedback                                                │
│  • p99 latency < 100ms                                                          │
│                                                                                  │
│  Output: 10-50 videos for initial page load                                     │
└─────────────────────────────────────────────────────────────────────────────────┘
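The diversity rule in the re-ranking stage ("don't show 10 videos from same creator") can be enforced with a single greedy pass over the ranked list. The sketch below is illustrative, not YouTube's implementation: `RankedItem`, `capPerChannel`, and the cap and page-size values are hypothetical. Over-cap videos are deferred to the tail rather than dropped, so they stay available for deeper scrolling.

```typescript
// Hypothetical shape of an item coming out of the ranking stage.
interface RankedItem {
  videoId: string;
  channelId: string;
  score: number; // ranking-model score, higher is better
}

// Greedy per-creator cap: walk the list in score order and allow at most
// `maxPerChannel` videos from any one channel in the first `pageSize` slots.
// Videos that exceed the cap are moved to the tail instead of being discarded.
function capPerChannel(
  ranked: RankedItem[],
  maxPerChannel: number,
  pageSize: number,
): RankedItem[] {
  const sorted = [...ranked].sort((a, b) => b.score - a.score);
  const counts = new Map<string, number>();
  const head: RankedItem[] = [];
  const deferred: RankedItem[] = [];

  for (const item of sorted) {
    const seen = counts.get(item.channelId) ?? 0;
    if (head.length < pageSize && seen < maxPerChannel) {
      head.push(item);
      counts.set(item.channelId, seen + 1);
    } else {
      deferred.push(item);
    }
  }
  return [...head, ...deferred];
}

// Usage: channel "a" dominates the raw ranking.
const page = capPerChannel(
  [
    { videoId: 'v1', channelId: 'a', score: 0.9 },
    { videoId: 'v2', channelId: 'a', score: 0.8 },
    { videoId: 'v3', channelId: 'a', score: 0.7 },
    { videoId: 'v4', channelId: 'b', score: 0.6 },
  ],
  2, // at most two videos per channel on the first page
  3, // the first page has three slots
);
console.log(page.map(v => v.videoId)); // [ 'v1', 'v2', 'v4', 'v3' ]
```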

Why Multi-Stage?

• Computational efficiency: Running a heavy neural ranker over all 800M videos on every request is infeasible. Instead, cheap models quickly narrow the catalog to ~10K candidates, and the expensive model ranks only the survivors.
• Different optimization targets: Candidate generation optimizes for recall (don't miss good videos); ranking optimizes for precision (put the best videos at the top).
• Separation of concerns: Business logic (diversity, fairness, ads) stays separate from the ML models, so rules can change without retraining.
• Latency control: Each stage gets its own latency budget, and the total must stay under 100 ms for a responsive UX.
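The computational-efficiency and latency arguments can be made concrete with back-of-envelope arithmetic. The sketch assumes, purely for illustration, a heavy ranker costing 1 µs per video, a fixed 10 ms ANN lookup, and a rules pass at 5 µs per item; these are not measured YouTube numbers, and `Stage`, `stageMs`, and `pipelineMs` are hypothetical names.

```typescript
// Back-of-envelope cost model for one recommendation request.
// All costs are hypothetical, chosen only to illustrate the scaling.
interface Stage {
  name: string;
  fixedMs: number;   // cost independent of input size (e.g. an ANN index lookup)
  items: number;     // number of items this stage scores
  perItemUs: number; // scoring cost per item, in microseconds
}

function stageMs(s: Stage): number {
  return s.fixedMs + (s.items * s.perItemUs) / 1000;
}

// Stages run one after another, so their costs add up.
function pipelineMs(stages: Stage[]): number {
  return stages.reduce((total, s) => total + stageMs(s), 0);
}

// Single stage: run the heavy ranker over the whole catalog.
const singleStage: Stage[] = [
  { name: 'heavy ranker, full catalog', fixedMs: 0, items: 800_000_000, perItemUs: 1 },
];

// Multi-stage: cheap retrieval narrows to 10K, the heavy ranker scores those,
// then business rules touch the top 1K.
const multiStage: Stage[] = [
  { name: 'candidate generation (ANN)', fixedMs: 10, items: 0, perItemUs: 0 },
  { name: 'heavy ranker', fixedMs: 0, items: 10_000, perItemUs: 1 },
  { name: 're-ranking rules', fixedMs: 0, items: 1_000, perItemUs: 5 },
];

console.log(pipelineMs(singleStage)); // 800000 (about 13 minutes of compute)
console.log(pipelineMs(multiStage));  // 25, inside a 100 ms budget
```

The single-stage figure is total compute rather than wall-clock latency, since it could be sharded across machines, but that cost would be paid on every request.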

Candidate Generation

Candidate generation is the first filter—reducing hundreds of millions of videos to ~10,000 relevant candidates in milliseconds. Multiple generators run in parallel, each capturing different aspects of relevance.

candidate-generation.ts
// ================================================================
// APPROACH 1: TWO-TOWER NEURAL RETRIEVAL
// ================================================================
// Learn embeddings for users and videos in the same vector space.
// Similar users and videos are close in embedding space.
 
interface TwoTowerModel {
  // User tower: encodes user history into embedding
  userEncoder: {
    inputs: [
      'watch_history',       // Last N watched videos
      'search_history',      // Recent searches
      'liked_videos',        // Explicit likes
      'channel_subscriptions',
      'demographics',        // Age, gender, country
    ];
    output: 'user_embedding[256]';
  };
  
  // Video tower: encodes video into embedding
  videoEncoder: {
    inputs: [
      'video_id',            // Learned embedding
      'title_embedding',     // NLP encoding of title
      'channel_id',
      'category',
      'tags',
      'visual_features',     // CNN features from thumbnails/frames
    ];
    output: 'video_embedding[256]';
  };
  
  // Similarity: dot product of embeddings
  // score(user, video) = user_embedding · video_embedding
}
 
// Training: positive examples are (user, watched_video) pairs
// Use in-batch negatives for efficiency
 
// Serving: user embedding computed at request time
// Video embeddings pre-computed and indexed for ANN search
 
class TwoTowerCandidateGenerator {
  private videoIndex: ANNIndex<Float32Array>;
  private userEncoder: { encode(features: UserFeatures): Promise<Float32Array> }; // user tower
  
  async generateCandidates(userId: string): Promise<VideoCandidate[]> {
    // 1. Compute user embedding (real-time)
    const userFeatures = await this.getUserFeatures(userId);
    const userEmbedding = await this.userEncoder.encode(userFeatures);
    
    // 2. Find nearest videos in embedding space (ANN search)
    // Using HNSW or ScaNN for efficient approximate search
    const candidates = await this.videoIndex.search(userEmbedding, {
      k: 1000,           // Return top 1000
      efSearch: 200,     // HNSW search parameter
    });
    
    return candidates.map(c => ({
      videoId: c.id,
      score: c.similarity,
      source: 'two-tower',
    }));
  }
}
 
// ================================================================
// APPROACH 2: COLLABORATIVE FILTERING
// ================================================================
// "Users who watched X also watched Y"
 
class MatrixFactorizationGenerator {
  // User-item interaction matrix factorized into:
  // R ≈ U × V^T
  // U: user latent factors [n_users × k]
  // V: video latent factors [n_videos × k]
  
  private userFactors: Map<string, Float32Array>;
  private videoIndex: ANNIndex<Float32Array>;
  
  async generateCandidates(userId: string): Promise<VideoCandidate[]> {
    const userVector = this.userFactors.get(userId);
    if (!userVector) {
      // Cold start: use popularity-based fallback
      return this.getPopularVideos();
    }
    
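    // Find videos with the highest dot product (maximum inner product search)
    const neighbors = await this.videoIndex.search(userVector, { k: 500 });
    return neighbors.map(n => ({
      videoId: n.id,
      score: n.similarity,
      source: 'matrix-factorization',
    }));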
  }
}
 
// ================================================================
// APPROACH 3: CONTENT-BASED / CONTEXTUAL
// ================================================================
// "Videos similar to what you just watched"
 
class ContentBasedGenerator {
  async generateCandidates(
    userId: string,
    context: RequestContext
  ): Promise<VideoCandidate[]> {
    const candidates: VideoCandidate[] = [];
    
    // Seed videos: recently watched, currently watching
    const seedVideos = await this.getSeedVideos(userId, context);
    
    for (const seed of seedVideos) {
      // Find videos with similar content
      const similar = await this.findSimilarContent(seed, {
        methods: ['topic', 'creator', 'visual'],
      });
      
      candidates.push(...similar.map(v => ({
        videoId: v.id,
        score: v.similarity * seed.recency_weight,
        source: `content-similar-to-${seed.id}`,
      })));
    }
    
    return this.deduplicate(candidates);
  }
  
  private async findSimilarContent(
    video: Video,
    options: SimilarityOptions
  ): Promise<SimilarVideo[]> {
    const results: SimilarVideo[] = [];
    
    if (options.methods.includes('topic')) {
      // Same topic/category
      results.push(...await this.findByTopic(video.topics));
    }
    
    if (options.methods.includes('creator')) {
      // Same creator's other videos
      results.push(...await this.findByCreator(video.channelId));
    }
    
    if (options.methods.includes('visual')) {
      // Visually similar (thumbnail/content embeddings)
      results.push(...await this.findByVisualSimilarity(video.visualEmbedding));
    }
    
    return results;
  }
}
 
// ================================================================
// COMBINING MULTIPLE SOURCES
// ================================================================
 
class CandidateAggregator {
  private generators: CandidateGenerator[] = [
    new TwoTowerCandidateGenerator(),
    new MatrixFactorizationGenerator(),
    new ContentBasedGenerator(),
    new TrendingGenerator(),
    new SubscriptionGenerator(),
  ];
  
  async generateCandidates(request: RecommendationRequest): Promise<VideoCandidate[]> {
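    // Run all generators in parallel; with allSettled, a failing generator
    // costs some recall instead of failing the whole request
    const settled = await Promise.allSettled(
      this.generators.map(g => g.generateCandidates(request.userId, request.context))
    );
    const results = settled
      .filter((r): r is PromiseFulfilledResult<VideoCandidate[]> => r.status === 'fulfilled')
      .map(r => r.value);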
    
    // Merge and deduplicate
    const candidateMap = new Map<string, VideoCandidate>();
    
    for (const candidates of results) {
      for (const candidate of candidates) {
        const existing = candidateMap.get(candidate.videoId);
        if (existing) {
          // Keep the highest score and record every source that nominated the
          // video; agreement across sources can serve as a ranking feature
          existing.score = Math.max(existing.score, candidate.score);
          existing.sources.push(candidate.source);
        } else {
          candidateMap.set(candidate.videoId, {
            ...candidate,
            sources: [candidate.source],
          });
        }
      }
    }
    
    // Sort by initial score and take top N
    return Array.from(candidateMap.values())
      .sort((a, b) => b.score - a.score)
      .slice(0, 10000);
  }
}
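The two-tower listing above notes that training uses in-batch negatives. A minimal sketch of that loss, assuming embeddings as plain arrays: in a batch of B (user, watched video) pairs, each user's own video is the positive and the other B-1 videos act as negatives, giving a B-way softmax cross-entropy over dot-product logits. `inBatchSoftmaxLoss` is a hypothetical name, and the correction for the popularity bias this sampling introduces (popular videos show up as negatives more often) is omitted.

```typescript
// In-batch softmax loss for a two-tower model (sketch).
// users[i] and videos[i] are the embeddings of the i-th positive (user, video) pair.
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  let s = 0;
  for (let k = 0; k < a.length; k++) s += a[k] * b[k];
  return s;
}

function inBatchSoftmaxLoss(users: Vec[], videos: Vec[]): number {
  const B = users.length;
  if (B === 0 || videos.length !== B) throw new Error('need equal, non-empty batches');
  let total = 0;
  for (let i = 0; i < B; i++) {
    // Logits of user i against every video in the batch.
    const logits = videos.map(v => dot(users[i], v));
    // Numerically stable log-sum-exp.
    const max = Math.max(...logits);
    const logSumExp = max + Math.log(logits.reduce((s, l) => s + Math.exp(l - max), 0));
    // Cross-entropy with the diagonal entry (the watched video) as the target.
    total += logSumExp - logits[i];
  }
  return total / B;
}

// When all embeddings are identical the model cannot tell pairs apart,
// so the loss equals log(B).
const same = [1, 0];
console.log(inBatchSoftmaxLoss([same, same, same, same], [same, same, same, same])); // ≈ 1.386 (log 4)

// Well-separated matching pairs give a loss close to zero.
const users = [[5, 0], [0, 5]];
const videos = [[5, 0], [0, 5]];
console.log(inBatchSoftmaxLoss(users, videos));
```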

Approximate Nearest Neighbor (ANN) Search

Exact nearest-neighbor search over 800M 256-dimensional embeddings would take seconds per query. Approximate nearest neighbor (ANN) methods such as HNSW (implemented in libraries like FAISS) and Google's ScaNN return approximate neighbors in milliseconds, typically at 95%+ recall. The small accuracy loss is well worth the speedup.
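HNSW is too involved to show in a few lines, so the sketch below swaps in a simpler ANN family, the inverted-file (IVF) index that FAISS also offers: each vector is filed under its closest centroid, and a query scans only the `nprobe` most promising lists. It assumes dot-product scoring, centroids sampled from the data instead of trained with k-means, and a seeded PRNG for repeatability; `IvfIndex`, `recallAtK`, and the sizes are hypothetical. Recall@k is measured against exact brute-force search, and probing every list reduces to exact search.

```typescript
type Vec = number[];

// Dot product, used both for filing vectors into lists and for scoring.
const dot = (a: Vec, b: Vec): number => a.reduce((s, x, i) => s + x * b[i], 0);

// Small seeded PRNG (mulberry32) so the example is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Score the given ids exactly and return the k best.
function topK(data: Vec[], ids: number[], q: Vec, k: number): number[] {
  return ids
    .map(id => ({ id, s: dot(q, data[id]) }))
    .sort((x, y) => y.s - x.s)
    .slice(0, k)
    .map(x => x.id);
}

// Inverted-file index: each vector lives in the list of its closest centroid.
class IvfIndex {
  private readonly centroids: Vec[];
  private readonly lists: number[][];

  constructor(private readonly data: Vec[], nlist: number, rand: () => number) {
    // Assumption: centroids are sampled from the data; real IVF trains them with k-means.
    this.centroids = Array.from({ length: nlist }, () => data[Math.floor(rand() * data.length)]);
    this.lists = this.centroids.map((): number[] => []);
    data.forEach((v, id) => this.lists[this.closestCentroids(v, 1)[0]].push(id));
  }

  private closestCentroids(q: Vec, n: number): number[] {
    return topK(this.centroids, this.centroids.map((_, i) => i), q, n);
  }

  // Scan only the nprobe lists whose centroids score highest against the query.
  search(q: Vec, k: number, nprobe: number): number[] {
    const candidates = this.closestCentroids(q, nprobe)
      .flatMap(c => this.lists[c])
      .sort((x, y) => x - y); // canonical id order, so ties break as in exact search
    return topK(this.data, candidates, q, k);
  }
}

const exactSearch = (data: Vec[], q: Vec, k: number): number[] =>
  topK(data, data.map((_, i) => i), q, k);

// Fraction of the true top-k that the approximate search returned.
function recallAtK(approx: number[], exact: number[]): number {
  const truth = new Set(exact);
  return approx.filter(id => truth.has(id)).length / exact.length;
}

// Usage: 2,000 random 16-d vectors split into 32 lists.
const rand = mulberry32(42);
const randomVec = (dim: number): Vec => Array.from({ length: dim }, () => rand() * 2 - 1);
const data: Vec[] = Array.from({ length: 2000 }, () => randomVec(16));
const index = new IvfIndex(data, 32, rand);
const query = randomVec(16);
const exact = exactSearch(data, query, 10);

console.log('recall@10, nprobe=4: ', recallAtK(index.search(query, 10, 4), exact));
console.log('recall@10, nprobe=32:', recallAtK(index.search(query, 10, 32), exact)); // 1
```

Raising `nprobe` trades latency for recall, which is the same knob that `efSearch` turns for HNSW in the listing above.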

Ranking Model

The ranking model is the most complex component—a deep neural network that predicts user engagement for each candidate. It must be expressive enough to capture subtle preferences while fast enough to score thousands of candidates in milliseconds.

ranking-model.ts
// ================================================================
// RANKING MODEL ARCHITECTURE
// ================================================================
 
interface RankingModelInputs {
  // User features
  user: {
    userId: string;
    watchHistory: VideoId[];           // Last 100 videos
    searchHistory: string[];           // Last 50 searches
    demographics: {
      ageGroup: number;                // Age bucket
      gender: number;                  // Encoded gender
      country: number;                 // Country ID
    };
    preferences: {
      preferredCategories: number[];
      avgWatchDuration: number;        // User's typical watch length
      activityLevel: number;           // How active on platform
    };
  };
  
  // Video features
  video: {
    videoId: string;
    channelId: string;
    category: number;
    duration: number;
    uploadAge: number;                 // Days since upload
    popularity: {
      totalViews: number;
      recentViews: number;             // Last 7 days
      avgWatchPercentage: number;
      likeRatio: number;
    };
    content: {
      titleEmbedding: Float32Array;    // NLP encoding
      thumbnailEmbedding: Float32Array; // Vision encoding
      topics: number[];
    };
  };
  
  // Context features
  context: {
    timeOfDay: number;                 // Hour, encoded cyclically
    dayOfWeek: number;
    device: number;                    // Mobile, desktop, TV
    isWeekend: boolean;
    sessionDepth: number;              // How many videos already watched
  };
  
  // Interaction features (user × video)
  interaction: {
    userWatchedChannel: boolean;       // Has user watched this creator?
    userSubscribed: boolean;
    lastWatchedFromChannel: number;    // Days since
    topicOverlap: number;              // User preference × video topic match
  };
}
 
// Multi-objective output: predict multiple engagement signals
interface RankingModelOutputs {
  pClick: number;               // Probability of clicking
  pWatch: number;               // Probability of watching > 30 seconds
  expectedWatchTime: number;    // Expected watch duration (seconds) if watched
  pLike: number;                // Probability of liking
  pShare: number;               // Probability of sharing
  pSubscribe: number;           // Probability of subscribing to channel
}
 
// Final score combines multiple objectives
function computeFinalScore(outputs: RankingModelOutputs): number {
  // Weights determined by business objectives and A/B testing
  const weights = {
    watchTime: 0.6,      // Primary objective
    click: 0.1,          // Necessary but not sufficient
    like: 0.15,          // Explicit positive signal
    share: 0.1,          // High-value engagement
    subscribe: 0.05,     // Channel growth
  };
  
  // Normalize the watch-time prediction to [0, 1], capping at 5 minutes (300 s)
  const normalizedWatchTime = Math.min(outputs.expectedWatchTime / 300, 1);
  
  return (
    weights.watchTime * normalizedWatchTime * outputs.pWatch +
    weights.click * outputs.pClick +
    weights.like * outputs.pLike +
    weights.share * outputs.pShare +
    weights.subscribe * outputs.pSubscribe
  );
}
 
// ================================================================
// FEATURE ENGINEERING
// ================================================================
 
class FeatureExtractor {
  async extractFeatures(
    userId: string,
    videoId: string,
    context: RequestContext
  ): Promise<RankingModelInputs> {
    // Parallel feature fetching for low latency
    const [userFeatures, videoFeatures] = await Promise.all([
      this.getUserFeatures(userId),
      this.getVideoFeatures(videoId),
    ]);
    
    // Compute interaction features
    const interactionFeatures = this.computeInteractionFeatures(
      userFeatures, 
      videoFeatures
    );
    
    // Context features from request
    const contextFeatures = this.extractContextFeatures(context);
    
    return {
      user: userFeatures,
      video: videoFeatures,
      context: contextFeatures,
      interaction: interactionFeatures,
    };
  }
  
  private computeInteractionFeatures(
    user: UserFeatures,
    video: VideoFeatures
  ): InteractionFeatures {
    // Cross features that capture user-item interactions
    return {
      userWatchedChannel: user.watchedChannels.has(video.channelId),
      userSubscribed: user.subscriptions.has(video.channelId),
      lastWatchedFromChannel: this.daysSinceLastWatch(user, video.channelId),
      topicOverlap: this.computeTopicOverlap(user.preferredTopics, video.topics),
    };
  }
  
  private computeTopicOverlap(userTopics: Map<number, number>, videoTopics: number[]): number {
    // Weighted overlap based on user topic preferences
    let overlap = 0;
    for (const topic of videoTopics) {
      overlap += userTopics.get(topic) || 0;
    }
    return overlap;
  }
}
 
// ================================================================
// MODEL SERVING
// ================================================================
 
class RankingServer {
  private model: TensorFlowServingModel;
  private featureExtractor: FeatureExtractor;
  
  async rankCandidates(
    userId: string,
    candidates: VideoCandidate[],
    context: RequestContext
  ): Promise<RankedVideo[]> {
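    // Extract features for all candidates in parallel. This refetches the
    // user's features once per candidate; fetching them once and batching the
    // video lookups (as FeatureServer.getVideoFeaturesBatch does) is cheaper.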
    const features = await Promise.all(
      candidates.map(c => 
        this.featureExtractor.extractFeatures(userId, c.videoId, context)
      )
    );
    
    // Batch inference (much faster than individual requests)
    const predictions = await this.model.predict(features);
    
    // Compute final scores and rank
    const rankedVideos = candidates.map((candidate, i) => ({
      videoId: candidate.videoId,
      ...predictions[i],
      finalScore: computeFinalScore(predictions[i]),
      candidateSources: candidate.sources,
    }));
    
    return rankedVideos.sort((a, b) => b.finalScore - a.finalScore);
  }
}
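The ranking inputs describe `timeOfDay` as "Hour, encoded cyclically". A common way to do this, sketched here as one plausible reading of that comment, maps the hour onto the unit circle with sine and cosine, so 23:00 sits as close to midnight as 01:00 does. `encodeHourCyclic` is a hypothetical helper; `dayOfWeek` could be encoded the same way with a period of 7.

```typescript
// Map an hour in [0, 24) onto the unit circle as [sin, cos].
function encodeHourCyclic(hour: number): [number, number] {
  const angle = (2 * Math.PI * hour) / 24;
  return [Math.sin(angle), Math.cos(angle)];
}

function euclidean(a: [number, number], b: [number, number]): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

const midnight = encodeHourCyclic(0); // [0, 1]
const oneAm = encodeHourCyclic(1);
const elevenPm = encodeHourCyclic(23);

// As a raw integer, 23 is 23 units from 0; on the circle it is exactly as
// close to midnight as 1 is.
console.log(euclidean(midnight, oneAm).toFixed(4));    // 0.2611
console.log(euclidean(midnight, elevenPm).toFixed(4)); // 0.2611
```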

Feature Categories and Their Impact
Feature Category	Examples	Predictive Power	Freshness
User history	Watch history, likes, subscriptions	High	Real-time to daily
Video popularity	View count, CTR, avg watch %	Medium-High	Hourly
Content features	Title, thumbnail, topics	Medium	Static after upload
Context	Time, device, session depth	Medium	Real-time
Interaction	User × video crosses	Very High	Real-time

The Watch Time Trap

Optimizing purely for clicks invites clickbait, and optimizing purely for watch time can reward autoplay loops and compulsive viewing over content users actually value. Responsible recommendation systems therefore add signals for user satisfaction (explicit likes, not just passive watching), diversity, and time-well-spent metrics.
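One standard way to add diversity at re-ranking time is Maximal Marginal Relevance (MMR), named here as a swapped-in illustration rather than YouTube's method: candidates are picked greedily, trading each one's relevance against its similarity to the videos already picked. The sketch assumes each candidate carries a relevance score and a content embedding, uses cosine similarity as the redundancy measure, and treats `lambda` (1 = pure relevance, 0 = pure diversity) as a tuning knob; `Candidate` and `mmr` are hypothetical names.

```typescript
// Hypothetical candidate shape for the re-ranking stage.
interface Candidate {
  id: string;
  relevance: number;   // e.g. the ranking model's final score
  embedding: number[]; // content embedding used to measure redundancy
}

function cosine(a: number[], b: number[]): number {
  let dotAB = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotAB += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dotAB / Math.sqrt(normA * normB);
}

// Greedy MMR: at each step pick the candidate maximizing
//   lambda * relevance - (1 - lambda) * (max similarity to an already-picked item)
function mmr(candidates: Candidate[], k: number, lambda: number): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    remaining.forEach((c, i) => {
      const redundancy = selected.length === 0
        ? 0
        : Math.max(...selected.map(s => cosine(c.embedding, s.embedding)));
      const score = lambda * c.relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    });
    selected.push(remaining.splice(bestIdx, 1)[0]);
  }
  return selected;
}

const candidates: Candidate[] = [
  { id: 'cat-video-1', relevance: 0.95, embedding: [1, 0] },
  { id: 'cat-video-2', relevance: 0.94, embedding: [0.99, 0.01] }, // near-duplicate of the first
  { id: 'cooking', relevance: 0.8, embedding: [0, 1] },
];
const picks = mmr(candidates, 2, 0.5);
console.log(picks.map(p => p.id)); // [ 'cat-video-1', 'cooking' ]
```

With pure relevance ranking the near-duplicate `cat-video-2` would take the second slot; at `lambda = 0.5` its similarity to the first pick outweighs its slightly higher score.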

Training Pipeline

Training recommendation models requires processing billions of user interactions efficiently. The training pipeline must handle massive data volumes while ensuring model freshness.

training-pipeline.ts
// ================================================================
// DATA COLLECTION AND LABELING
// ================================================================
 
interface InteractionEvent {
  userId: string;
  videoId: string;
  sessionId: string;
  timestamp: Date;
  
  // What the user did
  action: 'impression' | 'click' | 'watch' | 'like' | 'share' | 'subscribe';
  
  // For watch events
  watchDuration?: number;
  watchPercentage?: number;
  
  // Context at time of interaction
  device: string;
  country: string;
  source: 'home' | 'search' | 'related' | 'subscription' | 'notification';
  position: number;  // Position in recommendation list
}
 
// Positive examples: videos user engaged with
// Negative examples: videos shown but not clicked/watched
 
interface TrainingExample {
  features: RankingModelInputs;
  labels: {
    clicked: boolean;
    watchDuration: number;
    liked: boolean;
    shared: boolean;
    subscribed: boolean;
  };
  // Sample weight (for importance sampling)
  weight: number;
}
 
// ================================================================
// TRAINING DATA PROCESSING
// ================================================================
 
class TrainingDataPipeline {
  async generateTrainingData(date: Date): Promise<TrainingDataset> {
    // 1. Load interaction logs from data warehouse
    const interactions = await this.loadInteractions(date);
    
    // 2. Join with feature snapshots
    // (features as they were when interaction happened)
    const examples = await this.joinWithFeatures(interactions);
    
    // 3. Apply sampling strategies
    //    - Downsample negative examples (most impressions are not clicks)
    //    - Upsample rare positive signals (shares are rare but valuable)
    const balanced = this.applySampling(examples, {
      negativeRatio: 5,   // 5 negatives per positive
      rarePositiveBoost: 2, // 2x weight for shares/subscribes
    });
    
    // 4. Create training/validation split
    //    (temporal split: older for training, recent for validation)
    const { training, validation } = this.temporalSplit(balanced, {
      validationHours: 6,
    });
    
    return { training, validation };
  }
  
  private applySampling(
    examples: TrainingExample[],
    config: SamplingConfig
  ): TrainingExample[] {
    const positives = examples.filter(e => e.labels.clicked);
    const negatives = examples.filter(e => !e.labels.clicked);
    
    // Downsample negatives
    const sampledNegatives = this.sampleWithoutReplacement(
      negatives,
      positives.length * config.negativeRatio
    );
    
    // Adjust weights for sampled negatives
    const negativeWeight = negatives.length / sampledNegatives.length;
    for (const neg of sampledNegatives) {
      neg.weight *= negativeWeight;
    }
    
    // Boost rare positives
    for (const pos of positives) {
      if (pos.labels.shared || pos.labels.subscribed) {
        pos.weight *= config.rarePositiveBoost;
      }
    }
    
    return [...positives, ...sampledNegatives];
  }
}
 
// ================================================================
// MODEL TRAINING
// ================================================================
 
interface TrainingConfig {
  // Model architecture
  architecture: {
    userTowerLayers: number[];      // e.g., [512, 256, 128]
    videoTowerLayers: number[];
    crossLayers: number;            // Deep & Cross Network layers
    finalLayers: number[];          // e.g., [256, 128]
  };
  
  // Multi-task learning
  tasks: {
    click: { weight: 1.0, loss: 'binary_crossentropy' };
    watchTime: { weight: 0.5, loss: 'mse' };
    like: { weight: 0.3, loss: 'binary_crossentropy' };
    share: { weight: 0.2, loss: 'binary_crossentropy' };
  };
  
  // Training hyperparameters
  training: {
    batchSize: 4096;
    learningRate: 0.001;
    epochs: 5;                      // Few epochs, lots of data
    earlyStoppingPatience: 2;
  };
  
  // Regularization
  regularization: {
    dropout: 0.2;
    l2: 0.0001;
    embeddingL2: 0.00001;
  };
}
 
class ModelTrainer {
  async train(
    dataset: TrainingDataset,
    config: TrainingConfig
  ): Promise<TrainedModel> {
    // Build model
    const model = this.buildModel(config.architecture);
    
    // Multi-task loss: buildMultiTaskLoss (a helper not shown) is assumed to
    // combine the per-task losses using the weights in config.tasks.
    //
    // tf.distribute.MirroredStrategy belongs to Python TensorFlow and has no
    // TensorFlow.js equivalent; data-parallel training at this scale would
    // normally run in Python TF. This single-process TF.js version assumes
    // dataset.training and dataset.validation are tf.data.Dataset objects
    // already batched to config.training.batchSize.
    model.compile({
      optimizer: tf.train.adam(config.training.learningRate),
      loss: this.buildMultiTaskLoss(config.tasks),
    });
    
    await model.fitDataset(dataset.training, {
      epochs: config.training.epochs,
      validationData: dataset.validation,
      callbacks: [
        tf.callbacks.earlyStopping({ patience: config.training.earlyStoppingPatience }),
        new MetricsLogger(),
        new ModelCheckpointer(),
      ],
    });
    
    return model;
  }
}
 
// ================================================================
// CONTINUOUS TRAINING
// ================================================================
 
class ContinuousTrainingPipeline {
  // Train new model daily
  async dailyTrainingJob(): Promise<void> {
    // 1. Generate training data from yesterday
    const dataset = await this.dataGenerator.generateTrainingData(yesterday());
    
    // 2. Train new model
    const newModel = await this.trainer.train(dataset, this.config);
    
    // 3. Evaluate on holdout set
    const metrics = await this.evaluator.evaluate(newModel, holdoutSet);
    
    // 4. Compare with current production model
    const comparison = await this.compare(newModel, this.productionModel);
    
    // 5. If better, gradually roll out
    if (comparison.improvement > 0.005) { // 0.5% improvement threshold
      await this.deployModel(newModel, {
        rolloutStrategy: 'gradual',
        initialPercent: 5,
        maxPercent: 100,
        rolloutDays: 7,
      });
    }
  }
}
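The `applySampling` method calls a `sampleWithoutReplacement` helper that the listing does not define. One possible implementation is a partial Fisher-Yates shuffle, sketched here with a seeded PRNG (an assumption for repeatability) and a trimmed-down hypothetical `Example` type. The usage shows why the weight correction matters: after keeping 50 of 1,000 negatives and scaling each by 1000 / 50 = 20, the total negative weight the loss sees is unchanged.

```typescript
// Minimal stand-in for TrainingExample: only the fields this sketch touches.
interface Example {
  clicked: boolean;
  weight: number;
}

// Seeded PRNG (mulberry32) so the sample is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Partial Fisher-Yates shuffle: only the first n slots are shuffled, so the
// cost is O(n) swaps plus one array copy; the input array is not modified.
function sampleWithoutReplacement<T>(items: T[], n: number, rand: () => number): T[] {
  const pool = [...items];
  const count = Math.min(n, pool.length);
  for (let i = 0; i < count; i++) {
    // Pick uniformly from the not-yet-chosen tail [i, pool.length).
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

const rand = mulberry32(7);
const negatives: Example[] = Array.from({ length: 1000 }, () => ({ clicked: false, weight: 1 }));
const sampled = sampleWithoutReplacement(negatives, 50, rand);

// Same correction as applySampling: each kept negative stands in for
// negatives.length / sampled.length of the originals.
const negativeWeight = negatives.length / sampled.length; // 20
const reweighted = sampled.map(e => ({ ...e, weight: e.weight * negativeWeight }));

console.log(sampled.length);                               // 50
console.log(new Set(sampled).size);                        // 50 (no duplicates)
console.log(reweighted.reduce((s, e) => s + e.weight, 0)); // 1000
```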

Feedback Loop Latency

Models trained on yesterday's data are always slightly behind. For fast-moving signals (trending videos, breaking news), supplement with real-time boosting rules that don't require retraining.
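One way to implement such real-time boosting, sketched under stated assumptions, is an exponentially decayed view counter per video, updated in O(1) per event and folded into the model score at serving time. `DecayedCounter`, `boostedScore`, the one-hour half-life, and the log-scaled multiplicative boost are all illustrative choices, not YouTube's rule.

```typescript
// Exponentially decayed event counter: each event adds `count`, and the total
// halves every `halfLifeMs` in which no new events arrive.
class DecayedCounter {
  private value = 0;
  private lastUpdateMs: number;
  private readonly lambda: number; // decay rate per millisecond

  constructor(halfLifeMs: number, nowMs: number) {
    this.lambda = Math.LN2 / halfLifeMs;
    this.lastUpdateMs = nowMs;
  }

  // Bring the stored value forward to nowMs. Late (out-of-order) events are
  // simply added at the current level rather than decayed backward.
  private decayTo(nowMs: number): void {
    if (nowMs > this.lastUpdateMs) {
      this.value *= Math.exp(-this.lambda * (nowMs - this.lastUpdateMs));
      this.lastUpdateMs = nowMs;
    }
  }

  record(nowMs: number, count = 1): void {
    this.decayTo(nowMs);
    this.value += count;
  }

  read(nowMs: number): number {
    this.decayTo(nowMs);
    return this.value;
  }
}

// Hypothetical serving-time boost: scale the model score by recent velocity,
// using log1p so viral outliers cannot dominate the ranking.
function boostedScore(modelScore: number, trending: number, strength = 0.1): number {
  return modelScore * (1 + strength * Math.log1p(trending));
}

const HOUR = 60 * 60 * 1000;
const views = new DecayedCounter(HOUR, 0);
for (let i = 0; i < 100; i++) views.record(0);   // 100 views at t = 0
console.log(views.read(HOUR));                     // ≈ 50 after one half-life
console.log(boostedScore(0.4, views.read(HOUR)));  // ≈ 0.4 * (1 + 0.1 * ln 51)
```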

Serving Infrastructure

Serving recommendations at scale requires specialized infrastructure optimized for low-latency, high-throughput inference. Every millisecond of latency impacts user experience and engagement.
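The feature store in the listing below keeps user features under a one-hour TTL. The TTL logic itself is small; this sketch shows it as an in-process cache with an injectable clock so expiry can be tested deterministically. `TtlCache` is hypothetical (it is not Redis's API), and a real deployment would also bound the cache's size, for example with LRU eviction.

```typescript
// Tiny TTL cache with an injectable clock, e.g. an in-process layer in front
// of a remote feature store.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Usage with a fake clock.
let clock = 0;
const userFeatures = new TtlCache<{ activityLevel: number }>(60 * 60 * 1000, () => clock);
userFeatures.set('user:42', { activityLevel: 0.7 });
clock += 30 * 60 * 1000;
console.log(userFeatures.get('user:42')); // { activityLevel: 0.7 }
clock += 31 * 60 * 1000;
console.log(userFeatures.get('user:42')); // undefined (past the 1-hour TTL)
```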

serving-infrastructure.ts
// ================================================================
// FEATURE STORE
// ================================================================
// Pre-computed features for low-latency serving
 
interface FeatureStore {
  // User features: updated hourly, cached in memory
  userFeatures: {
    storage: Redis | Memcached;
    ttl: '1 hour';
    precomputed: [
      'watch_history_embedding',
      'topic_preferences',
      'activity_level',
      'demographic_segment',
    ];
  };
  
  // Video features: content features are written at upload and cached long-term;
  // popularity_metrics change hourly, so they must be refreshed in place
  videoFeatures: {
    storage: Redis | Bigtable;
    ttl: 'indefinite';
    precomputed: [
      'content_embedding',
      'popularity_metrics',
      'creator_features',
    ];
  };
  
  // Real-time features: computed at request time
  realtimeFeatures: [
    'session_context',
    'recency_since_last_watch',
  ];
}
 
class FeatureServer {
  private userCache: Redis;
  private videoCache: Bigtable;
  
  async getUserFeatures(userId: string): Promise<UserFeatures> {
    // Try cache first
    const cached = await this.userCache.get(`user:${userId}`);
    if (cached) return JSON.parse(cached);
    
    // Fallback to cold-start features
    return this.getColdStartUserFeatures(userId);
  }
  
  async getVideoFeaturesBatch(videoIds: string[]): Promise<Map<string, VideoFeatures>> {
    // Batch request for efficiency
    const keys = videoIds.map(id => `video:${id}`);
    const results = await this.videoCache.getBatch(keys);
    
    const features = new Map<string, VideoFeatures>();
    for (const [key, value] of results) {
      if (!value) continue; // cache miss: caller falls back to default features
      const videoId = key.slice('video:'.length);
      features.set(videoId, JSON.parse(value));
    }
    
    return features;
  }
}
 
// ================================================================
// MODEL SERVING
// ================================================================
 
interface ServingCluster {
  // TensorFlow Serving or TensorRT for GPU inference
  modelServers: {
    replicas: 100;
    instanceType: 'GPU' | 'CPU';
    modelsLoaded: ['ranking_v23', 'ranking_v24'];  // A/B testing
    batchSize: 128;     // Batch requests for GPU efficiency
    maxConcurrency: 32;
  };
  
  // Load balancing
  loadBalancer: {
    strategy: 'least_connections';
    healthCheck: '/health';
    timeoutMs: 50;
  };
}
 
class ModelServer {
  private model: TFServingClient;
  
  async predict(features: RankingModelInputs[]): Promise<RankingModelOutputs[]> {
    // Batch for GPU efficiency
    const batches = this.createBatches(features, 128);
    
    // Send all batches concurrently: awaiting them one at a time would
    // multiply latency by the batch count. Promise.all keeps batch order.
    const batchResults = await Promise.all(
      batches.map(batch =>
        this.model.predict({
          inputs: this.tensorize(batch),
          outputNames: ['p_click', 'p_watch', 'expected_watch_time', 'p_like'],
        })
      )
    );
    
    const results: RankingModelOutputs[] = [];
    for (const batchResult of batchResults) {
      results.push(...this.parsePredictions(batchResult));
    }
    
    return results;
  }
}
 
// ================================================================
// REQUEST HANDLING
// ================================================================
 
class RecommendationService {
  private candidateGenerator: CandidateAggregator;
  private featureStore: FeatureServer;
  private ranker: ModelServer;
  private reranker: ReRanker;
  
  async getRecommendations(request: RecommendationRequest): Promise<RecommendationResponse> {
    const startTime = Date.now();
    
    try {
      // Stage 1: Candidate generation (target: 20ms)
      const t1 = Date.now();
      const candidates = await this.candidateGenerator.generateCandidates(request);
      this.logLatency('candidate_generation', Date.now() - t1);
      
      // Stage 2: Feature extraction (target: 15ms)
      const t2 = Date.now();
      const features = await this.extractFeatures(request.userId, candidates);
      this.logLatency('feature_extraction', Date.now() - t2);
      
      // Stage 3: Ranking (target: 25ms)
      const t3 = Date.now();
      const ranked = await this.ranker.rank(features);
      this.logLatency('ranking', Date.now() - t3);
      
      // Stage 4: Re-ranking and filtering (target: 10ms)
      const t4 = Date.now();
      const final = await this.reranker.rerank(ranked, request);
      this.logLatency('reranking', Date.now() - t4);
      
      // Stage 5: Response assembly (target: 5ms)
      const t5 = Date.now();
      const response = await this.assembleResponse(final, request);
      this.logLatency('assembly', Date.now() - t5);
      
      // Log total latency
      const totalLatency = Date.now() - startTime;
      this.logLatency('total', totalLatency);
      
      if (totalLatency > 100) {
        this.logSlowRequest(request, totalLatency);
      }
      
      return response;
      
    } catch (error) {
      // Record the failure instead of swallowing it, then degrade gracefully
      // to cached/popular recommendations
      console.error('recommendation pipeline failed', error);
      return this.getFallbackRecommendations(request);
    }
  }
  
  // Latency budget allocation
  private readonly LATENCY_BUDGETS = {
    candidateGeneration: 20,
    featureExtraction: 15,
    ranking: 25,
    reranking: 10,
    assembly: 5,
    network: 20,
    total: 100,
  };
}
 
// ================================================================
// CACHING LAYERS
// ================================================================
 
class RecommendationCache {
  // Cache recommendations for recently active users.
  // Serves repeat requests (refreshes, back-navigation) cheaply and absorbs traffic spikes.
  private cache: Cache<RecommendationResponse>;
  private service: RecommendationService;
  
  async getOrCompute(request: RecommendationRequest): Promise<RecommendationResponse> {
    const cacheKey = this.computeCacheKey(request);
    
    // Check cache (entries expire after 5 minutes via the TTL below)
    const cached = await this.cache.get(cacheKey);
    if (cached && this.isValid(cached, request)) {
      return cached;
    }
    
    // Compute fresh recommendations
    const fresh = await this.service.getRecommendations(request);
    
    // Cache for next request (TTL in seconds)
    await this.cache.set(cacheKey, fresh, { ttl: 300 });
    
    return fresh;
  }
  
  private computeCacheKey(request: RecommendationRequest): string {
    // Key on user + surface: fine-grained enough to be relevant, coarse enough to get hits.
    // Freshness is left to the TTL. A per-minute time bucket in the key would
    // cap the effective lifetime at 60s and defeat the 300s TTL.
    return `rec:${request.userId}:${request.surface}`;
  }
}
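RecommendationCache delegates expiry to a cache client with a ttl option. To make the key/TTL interplay concrete, here is a minimal in-process sketch, assuming an injectable clock so expiry can be tested deterministically. TtlCache and recCacheKey are hypothetical names, not a specific library's API; in production the TTL would normally be enforced by the shared cache itself (for example, Redis key expiry).

```typescript
// Hypothetical in-process TTL cache (not a real library API), used only to
// illustrate the expiry semantics RecommendationCache relies on.
type Clock = () => number; // current time in milliseconds

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private readonly now: Clock = Date.now) {}

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }
}

// Key on user + surface only; the TTL alone decides freshness.
function recCacheKey(userId: string, surface: string): string {
  return `rec:${userId}:${surface}`;
}

// Usage with a fake clock
let fakeNow = 0;
const recCache = new TtlCache<string[]>(() => fakeNow);
recCache.set(recCacheKey("u1", "home"), ["v1", "v2"], 300);
fakeNow = 299000;
recCache.get(recCacheKey("u1", "home")); // hit: entry is 299 s old
fakeNow = 300000;
recCache.get(recCacheKey("u1", "home")); // miss: the 300 s TTL has elapsed
```

Because the key carries no timestamp, every request from the same user and surface within the TTL window can hit the same entry.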

Latency Budget by Stage
Stage	Budget (ms)	Key Optimizations
Candidate generation	20	ANN search, parallel generators, pre-computed embeddings
Feature extraction	15	Feature store, batch lookups, caching
Ranking inference	25	GPU batching, model optimization, quantization
Re-ranking	10	Simple rules, no ML
Response assembly	5	Pre-fetched metadata
Network overhead	20	CDN, connection reuse
Total	< 100	–
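The budgets in the table are targets; RecommendationService only logs each stage's latency after the fact. One common way to enforce a stage budget is to race the stage against a timer and fall back to a degraded result when the timer wins. A minimal sketch, assuming every stage can supply a cheap fallback value; withDeadline and sleep are hypothetical helpers. Note that Promise.race does not cancel the slow work; it only stops waiting for it.

```typescript
// Result of a budgeted stage: the value used and whether the budget was hit.
interface Outcome<T> {
  value: T;
  timedOut: boolean;
}

// Hypothetical helper: resolve with the stage's own result if it settles within
// budgetMs, otherwise with the fallback. The slow promise keeps running and its
// result is ignored. Rejections from `work` propagate to the caller.
async function withDeadline<T>(work: Promise<T>, budgetMs: number, fallback: T): Promise<Outcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ value: fallback, timedOut: true }), budgetMs);
  });
  try {
    return await Promise.race([
      work.then((value): Outcome<T> => ({ value, timedOut: false })),
      timeout,
    ]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after a fast stage
  }
}

// Illustrative stand-in for a stage that takes `ms` milliseconds.
function sleep<T>(ms: number, value: T): Promise<T> {
  return new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));
}

// Usage: candidate generation with its 20 ms budget from the table above.
withDeadline(sleep(50, ["v1", "v2"]), 20, [] as string[]).then((r) =>
  console.log(r.timedOut ? "budget exceeded, using fallback" : "on time", r.value)
);
```

In the service, the fallback for a late stage could be the cached or popular list that getFallbackRecommendations already serves; only the 20 ms figure comes from the table, the rest is illustrative.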

Handling Special Cases

Real-world recommendation systems must handle numerous edge cases and special scenarios that don't fit the standard pipeline.

Critical Edge Cases

•Cold start (new users) — No watch history to personalize from. Use demographic similarity, trending content, and explore/exploit strategies to learn preferences quickly.
•Cold start (new videos) — No engagement data to rank with. Use content features, creator track record, and exploration slots to gather initial signals.
•Trending/viral content — Standard models are trained on yesterday's data; viral videos need real-time boosting before models catch up.
•Breaking news — Time-sensitive content must be promoted quickly, before engagement-trained models have any signal on it, so it may need to bypass the normal ranking path.
•Returning users after absence — Preferences may have shifted. Exploration is needed to re-learn them without alienating the user with irrelevant content.
•Multi-profile (shared accounts) — Different people use the same account, so personalization must work at the session level, not just the user level.

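ColdStartHandler and RadicalizationMonitor (in the code below) both hand a list of weighted pools to a blend helper whose behaviour the document leaves open. One plausible reading, sketched here as an assumption: fill the output one slot at a time from whichever pool is furthest below its target share, skipping videos already taken from another pool. Pool and blend are hypothetical names, and pools are simplified to ranked lists of video IDs.

```typescript
// Hypothetical weighted blend. Each pool is a ranked list of video IDs; weight
// is the pool's target share of the output. Deterministic: ties go to the
// earlier pool.
interface Pool {
  weight: number;
  content: string[];
}

function blend(pools: Pool[], size: number): string[] {
  const out: string[] = [];
  const seen = new Set<string>();
  const cursors = pools.map(() => 0); // next unread index per pool
  const taken = pools.map(() => 0);   // items contributed per pool

  while (out.length < size) {
    // Choose the non-exhausted pool furthest below its target share
    let best = -1;
    let bestDeficit = -Infinity;
    for (let i = 0; i < pools.length; i++) {
      if (cursors[i] >= pools[i].content.length) continue;
      const deficit = pools[i].weight * (out.length + 1) - taken[i];
      if (deficit > bestDeficit) {
        bestDeficit = deficit;
        best = i;
      }
    }
    if (best === -1) break; // every pool is exhausted

    const id = pools[best].content[cursors[best]++];
    if (seen.has(id)) continue; // already taken from another pool: skip it
    seen.add(id);
    out.push(id);
    taken[best]++;
  }
  return out;
}
```

With the cold-start weights (0.4 popular-in-cluster, 0.3 trending, 0.3 exploration) and ten slots, this yields four, three and three videos respectively, interleaved rather than concatenated.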
special-cases.ts
// ================================================================
// COLD START: NEW USERS
// ================================================================
 
class ColdStartHandler {
  async getRecommendationsForNewUser(
    request: RecommendationRequest
  ): Promise<RecommendationResponse> {
    // 1. Use available signals
    const signals: ColdStartSignals = {
      device: request.device,        // Mobile users prefer shorter content
      country: request.country,      // Regional preferences
      language: request.language,    // Language-appropriate content
      referrer: request.referrer,    // What brought them here
      firstSearch: request.firstSearch, // If they searched first
    };
    
    // 2. Find similar users (demographic clustering)
    const similarUserCluster = await this.findSimilarCluster(signals);
    
    // 3. Get popular content for that cluster
    const popularInCluster = await this.getPopularForCluster(similarUserCluster);
    
    // 4. Mix with regional trending for discovery
    const trending = await this.getTrendingForRegion(request.country);
    
    // 5. Include exploration content to learn preferences
    const exploration = await this.getExplorationContent(signals);
    
    return this.blend([
      { weight: 0.4, content: popularInCluster },
      { weight: 0.3, content: trending },
      { weight: 0.3, content: exploration },
    ]);
  }
}
 
// ================================================================
// COLD START: NEW VIDEOS
// ================================================================
 
class NewVideoHandler {
  async scoreNewVideo(
    video: Video,
    user: User
  ): Promise<number> {
    // No engagement history - use content features
    const contentScore = this.scoreByContent(video, user);
    
    // Creator track record
    const creatorScore = await this.getCreatorScore(video.channelId);
    
    // Similar videos' performance
    const similarScore = await this.getSimilarVideosScore(video);
    
    // Combine with uncertainty bonus (explore new videos)
    const explorationBonus = this.computeExplorationBonus(video.uploadAge);
    
    return (
      contentScore * 0.3 +
      creatorScore * 0.3 +
      similarScore * 0.3 +
      explorationBonus * 0.1
    );
  }
  
  private computeExplorationBonus(uploadAgeHours: number): number {
    // Higher bonus for very new videos (encourage exploration)
    // Decays over first 48 hours
    if (uploadAgeHours < 1) return 1.0;
    if (uploadAgeHours < 6) return 0.8;
    if (uploadAgeHours < 24) return 0.5;
    if (uploadAgeHours < 48) return 0.2;
    return 0;
  }
}
 
// ================================================================
// REAL-TIME TRENDING BOOST
// ================================================================
 
class TrendingBooster {
  private trendingScores: Map<string, TrendingScore> = new Map();
  
  // Update every few minutes
  async updateTrendingScores(): Promise<void> {
    // Get real-time view velocity
    const velocities = await this.getViewVelocities();
    
    for (const [videoId, velocity] of velocities) {
      const historical = await this.getHistoricalVelocity(videoId);
      
      // Score = how much faster than expected
      const ratio = velocity / (historical || 1);
      
      if (ratio > 5) {
        // 5x+ normal velocity = viral
        this.trendingScores.set(videoId, {
          boost: Math.min(ratio / 10, 2.0), // Boost term capped at 2.0; applied as (1 + boost), so at most a 3x multiplier
          reason: 'viral',
          detectedAt: Date.now(),
        });
      }
    }
  }
  
  applyTrendingBoost(video: RankedVideo): number {
    const trending = this.trendingScores.get(video.videoId);
    // Boosts expire one hour after detection unless refreshed by a later update
    if (trending && Date.now() - trending.detectedAt < 60 * 60 * 1000) {
      return video.finalScore * (1 + trending.boost);
    }
    return video.finalScore;
  }
}
 
// ================================================================
// DIVERSITY AND EXPLORATION
// ================================================================
 
class DiversityReRanker {
  rerank(videos: RankedVideo[]): RankedVideo[] {
    const result: RankedVideo[] = [];
    const usedCreators = new Set<string>();
    const usedCategories = new Map<string, number>();
    
    // Walk in the ranker's order (highest score first), so stronger videos
    // claim a creator/category before weaker ones are penalized
    for (const original of videos) {
      const video = { ...original }; // copy: don't mutate the caller's objects
      
      // Penalize if creator already in results
      if (usedCreators.has(video.channelId)) {
        video.finalScore *= 0.7; // 30% penalty
      }
      
      // Penalize the 4th and later videos from the same category
      const categoryCount = usedCategories.get(video.category) || 0;
      if (categoryCount >= 3) {
        video.finalScore *= 0.8; // 20% penalty
      }
      
      result.push(video);
      usedCreators.add(video.channelId);
      usedCategories.set(video.category, categoryCount + 1);
    }
    
    // Re-sort after penalties
    return result.sort((a, b) => b.finalScore - a.finalScore);
  }
  
  // Insert exploration slots at the given positions (ascending order)
  injectExploration(videos: RankedVideo[], positions: number[]): RankedVideo[] {
    const exploration = this.getExplorationVideos();
    const output = [...videos];
    
    for (const pos of positions) {
      // Skip out-of-range positions and stop cleanly if exploration runs out
      if (pos < output.length && exploration.length > 0) {
        output.splice(pos, 0, exploration.shift()!);
      }
    }
    
    return output;
  }
}
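In TrendingBooster, the stored boost is min(ratio / 10, 2.0) and it is applied as finalScore × (1 + boost). Because boosts only start above a ratio of 5, the multiplier jumps from 1x to about 1.5x at the threshold and tops out at 3x once the ratio reaches 20. A pure function (trendingMultiplier, a hypothetical name) makes that arithmetic checkable; the view velocities in the usage lines are made up for illustration.

```typescript
// Hypothetical pure version of TrendingBooster's rule: no boost at or below
// 5x the historical view velocity; above that, boost = min(ratio / 10, 2.0),
// applied multiplicatively as (1 + boost).
function trendingMultiplier(velocity: number, historicalVelocity: number): number {
  const ratio = velocity / (historicalVelocity || 1); // treat missing history as 1
  if (ratio <= 5) return 1;
  const boost = Math.min(ratio / 10, 2.0);
  return 1 + boost;
}

// Worked examples (illustrative views per minute)
console.log(trendingMultiplier(400, 100));  // ratio 4  -> 1 (below threshold)
console.log(trendingMultiplier(800, 100));  // ratio 8  -> 1.8
console.log(trendingMultiplier(5000, 100)); // ratio 50 -> 3 (boost capped)
```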

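DiversityReRanker applies its penalties in a single pass over the ranker's order and then re-sorts once, so a video's penalty depends on input order rather than on what actually ends up above it. A greedy alternative, offered as a sketch rather than the document's method, re-scores every remaining video against the videos already placed and picks the best at each step. It reuses the same constants (0.7 for a repeated creator, 0.8 beyond three per category); Item and greedyDiversify are hypothetical names.

```typescript
// Hypothetical greedy variant of DiversityReRanker: rather than penalising in
// input order and re-sorting once, re-score every remaining video against
// what has already been placed and take the best one each step.
interface Item {
  id: string;
  channelId: string;
  category: string;
  score: number;
}

function greedyDiversify(
  items: Item[],
  creatorPenalty = 0.7, // same constants as DiversityReRanker
  categoryPenalty = 0.8,
  categoryLimit = 3
): Item[] {
  const remaining = [...items];
  const result: Item[] = [];
  const perCreator = new Map<string, number>();
  const perCategory = new Map<string, number>();

  // Score of a candidate given what has already been placed
  const adjusted = (v: Item): number => {
    let s = v.score;
    if ((perCreator.get(v.channelId) ?? 0) > 0) s *= creatorPenalty;
    if ((perCategory.get(v.category) ?? 0) >= categoryLimit) s *= categoryPenalty;
    return s;
  };

  while (remaining.length > 0) {
    let best = 0;
    for (let i = 1; i < remaining.length; i++) {
      if (adjusted(remaining[i]) > adjusted(remaining[best])) best = i;
    }
    const picked = remaining.splice(best, 1)[0];
    result.push(picked);
    perCreator.set(picked.channelId, (perCreator.get(picked.channelId) ?? 0) + 1);
    perCategory.set(picked.category, (perCategory.get(picked.category) ?? 0) + 1);
  }
  return result;
}
```

Each step rescans the remaining list, so the cost is quadratic in list length: cheap for a short re-ranked list, but worth measuring against the 10 ms re-ranking budget for longer ones.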
Explore vs. Exploit

Pure exploitation (showing only what the model is confident about) leads to filter bubbles and fails to discover new preferences. Allocate 5-10% of recommendations to exploration—content the model is uncertain about. This gathers data to improve future recommendations.
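The 5-10% guideline says how many slots to give to exploration but not where they go. A minimal sketch, assuming exploration slots are placed uniformly at random and the RNG is injected for reproducibility: explorationPositions (a hypothetical helper) returns sorted slot indices, which is the form the splice-based injectExploration in DiversityReRanker expects, because inserting in ascending order lands each item at its requested final index.

```typescript
// Hypothetical helper: pick which result slots become exploration slots.
// rate is the fraction of slots to explore (the 5-10% guideline);
// rng must return values in [0, 1), like Math.random.
function explorationPositions(slots: number, rate: number, rng: () => number = Math.random): number[] {
  // At least one exploration slot whenever the list is non-empty
  const k = Math.min(slots, Math.max(1, Math.round(slots * rate)));

  // Partial Fisher-Yates shuffle: after k swaps, idx[0..k) is a uniform random sample
  const idx = Array.from({ length: slots }, (_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rng() * (slots - i));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }

  // Ascending, so splice-based insertion places each item at its requested index
  return idx.slice(0, k).sort((a, b) => a - b);
}

// Usage: two distinct, ascending slot indices in [0, 20)
explorationPositions(20, 0.1);
```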

Responsible Recommendations

Recommendation systems have significant societal impact. Optimizing purely for engagement can promote sensationalism, misinformation, and addictive patterns. Responsible design considers broader impacts.

Responsibility Considerations

•Misinformation demotion — Reduce the visibility of content flagged by fact-checkers, and don't recommend borderline content even when it is engaging.
•Radicalization prevention — Detect and interrupt recommendation paths that lead to increasingly extreme content.
•Advertiser safety — Don't recommend content that would put ads in inappropriate contexts.
•Creator fairness — Avoid winner-take-all dynamics where small creators can't get discovered.
•User wellbeing — Consider 'time well spent' metrics, not just time spent. Avoid patterns that feel addictive.
•Transparency — Provide users insight into why recommendations are made. Enable preference controls.
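Interrupting paths toward increasingly extreme content presupposes a way to measure the direction of a session; RadicalizationMonitor (in the code below) reduces this to a single computeExtremityTrend value compared against a threshold. One simple way to produce such a value, shown purely as an assumption: score each watched video for extremity in [0, 1] with some classifier not described here, then take the least-squares slope of those scores against watch order. extremityTrend is a hypothetical name, and a slope over one scalar per video is deliberately simplistic.

```typescript
// Hypothetical trend measure: ordinary least-squares slope of per-video
// extremity scores (assumed in [0, 1]) against their position in the session.
// Positive slope = later videos score higher than earlier ones.
function extremityTrend(scores: number[]): number {
  const n = scores.length;
  if (n < 2) return 0; // no trend from fewer than two points
  const meanX = (n - 1) / 2; // mean of positions 0..n-1
  const meanY = scores.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * (scores[i] - meanY);
    den += (i - meanX) ** 2;
  }
  return num / den;
}

// Usage: a session drifting upward by 0.1 per video has slope 0.1
extremityTrend([0.1, 0.2, 0.3, 0.4]);
```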

responsible-recommendations.ts
// ================================================================
// POLICY-BASED FILTERING
// ================================================================
 
class SafetyFilter {
  async filter(videos: RankedVideo[], user: User): Promise<RankedVideo[]> {
    const filtered: RankedVideo[] = [];
    
    for (const video of videos) {
      const policy = await this.getPolicyStatus(video.videoId);
      
      // Hard filters first: remove policy-violating content
      if (policy.violated) continue;
      
      // Apply age restrictions
      if (policy.ageRestricted && user.age < 18) continue;
      
      // Apply user's content preferences
      if (user.restrictedMode && policy.matureContent) continue;
      
      // Demote borderline content (not removed, but lower ranked)
      filtered.push(
        policy.borderline
          ? { ...video, finalScore: video.finalScore * 0.3 } // Heavy demotion
          : video
      );
    }
    
    // Re-sort so the demotion actually moves borderline videos down
    return filtered.sort((a, b) => b.finalScore - a.finalScore);
  }
}
 
// ================================================================
// RADICALIZATION PREVENTION
// ================================================================
 
class RadicalizationMonitor {
  async checkRecommendationPath(
    session: Session,
    newRecommendations: RankedVideo[]
  ): Promise<RankedVideo[]> {
    // Track content 'direction' in session
    const sessionPath = await this.getSessionContentPath(session);
    
    // Detect if trending toward extreme content
    const extremityTrend = this.computeExtremityTrend(sessionPath);
    
    if (extremityTrend > this.THRESHOLD) {
      // Intervene: inject moderate/authoritative content
      const intervention = await this.getInterventionContent(
        sessionPath,
        { type: 'authoritative', topic: sessionPath.mainTopic }
      );
      
      // Replace some recommendations with intervention
      return this.blend([
        { weight: 0.6, content: newRecommendations },
        { weight: 0.4, content: intervention },
      ]);
    }
    
    return newRecommendations;
  }
}
 
// ================================================================
// CREATOR FAIRNESS
// ================================================================
 
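class CreatorFairnessHandler {
  async adjustForFairness(videos: RankedVideo[]): Promise<RankedVideo[]> {
    // Ensure small/new creators get some exposure: reserve ~10% of slots
    const smallCreatorSlots = Math.ceil(videos.length * 0.1);
    
    // Quality-gated small-creator picks, in ranked order
    const smallCreatorVideos = videos
      .filter(v => v.creatorSubscribers < 10000 && v.qualityScore > 0.6)
      .slice(0, smallCreatorSlots);
    
    // Everything not reserved fills the remaining slots. Building this as
    // "all other videos" rather than "large creators only" means small-creator
    // videos below the quality gate are not silently dropped, and the list keeps
    // its length when there are fewer small-creator picks than reserved slots.
    const reserved = new Set(smallCreatorVideos);
    const otherVideos = videos.filter(v => !reserved.has(v));
    
    // Interleave to ensure small creators appear
    return this.interleave(otherVideos, smallCreatorVideos);
  }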
}
 
// ================================================================
// WELLBEING FEATURES
// ================================================================
 
class WellbeingHandler {
  async adjustForWellbeing(
    videos: RankedVideo[],
    session: Session
  ): Promise<RecommendationResponse> {
    const response: RecommendationResponse = { videos };
    
    // Check session length
    if (session.duration > 2 * 3600) { // 2+ hours
      // Prompt to take a break
      response.wellbeingPrompt = {
        type: 'break_reminder',
        message: "You've been watching for a while. Consider taking a break.",
      };
    }
    
    // Check time of day
    const userLocalHour = this.getUserLocalHour(session);
    if (userLocalHour >= 0 && userLocalHour < 5) { // Late night
      // Suggest bedtime reminder if enabled
      if (session.user.bedtimeReminders) {
        response.wellbeingPrompt = {
          type: 'bedtime_reminder',
          message: "It's getting late. Set a reminder to stop watching?",
        };
      }
    }
    
    return response;
  }
}
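CreatorFairnessHandler ends by calling an unspecified interleave helper. A sketch of one reasonable interpretation, assuming the aim is to spread the reserved small-creator videos evenly through the list rather than bunching them at the top or bottom: insert one reserved item after every stride regular items, where stride is the ratio of the two list lengths. interleaveEvenly is a hypothetical name.

```typescript
// Hypothetical interleave: spread `reserved` items evenly among `regular`
// items, preserving the relative order within each list.
function interleaveEvenly<T>(regular: T[], reserved: T[]): T[] {
  if (reserved.length === 0) return [...regular];
  // One reserved item after every `stride` regular items
  const stride = Math.max(1, Math.floor(regular.length / reserved.length));
  const out: T[] = [];
  let r = 0;
  for (let i = 0; i < regular.length; i++) {
    out.push(regular[i]);
    if ((i + 1) % stride === 0 && r < reserved.length) out.push(reserved[r++]);
  }
  // Any reserved items left over (e.g. regular list too short) go at the end
  while (r < reserved.length) out.push(reserved[r++]);
  return out;
}
```

With the 10% reservation above, stride works out to about 9, so a small-creator video appears roughly every tenth slot.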

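WellbeingHandler's late-night check depends on getUserLocalHour, which has to convert the current instant into the user's own time zone. A sketch assuming the session stores an IANA time-zone name such as "Asia/Tokyo", using only the built-in Intl.DateTimeFormat API; localHour and isLateNight are hypothetical names. An unknown zone name makes Intl.DateTimeFormat throw a RangeError, so a production version would need a default.

```typescript
// Sketch: hour of day (0-23) in the user's time zone, assuming the session
// stores an IANA zone name. Uses only the built-in Intl API.
function localHour(at: Date, timeZone: string): number {
  const formatted = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).format(at);
  // Some engines render midnight as "24" when hour12 is false; map it to 0
  return parseInt(formatted, 10) % 24;
}

// Late-night window used by WellbeingHandler: [00:00, 05:00) local time
function isLateNight(at: Date, timeZone: string): boolean {
  const h = localHour(at, timeZone);
  return h >= 0 && h < 5;
}

// Usage: 03:30 UTC is 12:30 in Tokyo (UTC+9, no daylight saving)
const instant = new Date(Date.UTC(2024, 0, 1, 3, 30));
isLateNight(instant, "UTC");        // late night
isLateNight(instant, "Asia/Tokyo"); // midday
```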
Engagement vs. Wellbeing Tension

Maximizing watch time and maximizing user wellbeing are sometimes in conflict. Responsible platforms must make deliberate tradeoffs, accepting some engagement loss for healthier usage patterns.

Recommendation Engine Summary

We've explored the architecture, algorithms, and considerations for building a world-class video recommendation system. Let's consolidate the key takeaways:

Key Design Decisions

•Multi-stage pipeline — Candidate generation (fast, high recall), ranking (expensive, high precision), re-ranking (business rules). Each stage progressively narrows options.
•Multiple candidate sources — Collaborative filtering, content-based, trending, subscriptions. Parallel generators capture different relevance signals.
•Deep ranking models — Multi-task learning predicting click, watch time, likes, shares. Combine objectives based on business goals.
•Feature engineering — Pre-computed features in feature stores. Real-time features computed at request time. Interaction features (user × item crosses) are highly predictive.
•Low-latency serving — Feature stores, GPU batching, model optimization, caching. Total latency < 100ms for good UX.
•Cold start handling — Demographic similarity for new users, creator track record for new videos, exploration bonuses for both.
•Responsible design — Safety filtering, radicalization prevention, creator fairness, wellbeing features. Engagement isn't the only objective.

Module Complete:

Congratulations! You've completed the comprehensive design of a YouTube-scale video platform, covering the entire content lifecycle from upload to personalized recommendation. You now understand the systems that power one of the largest distributed applications in the world.

Module Complete

You now understand the architecture of recommendation systems at scale. From candidate generation to ranking models to responsible design, these patterns power the discovery engines that drive engagement on the world's largest video platforms.


Recommendation Engine

The Discovery Challenge: Surfacing Relevant Content at Scale

At YouTube's scale, the recommendation engine:

Processes billions of user interactions daily for training
Scores millions of candidate videos per user session
Serves recommendations in under 100ms latency
Balances user satisfaction, creator economics, and platform health

What You Will Learn

Recommendation System Architecture

A production recommendation system is a multi-stage pipeline that progressively narrows hundreds of millions of candidates down to a handful of personalized suggestions.

Recommendation Architecture
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
┌─────────────────────────────────────────────────────────────────────────────────┐
│                     RECOMMENDATION SYSTEM ARCHITECTURE                           │
└─────────────────────────────────────────────────────────────────────────────────┘
 
     800M+ videos                                              10-50 videos
     in catalog                                                shown to user
         │                                                          ▲
         │                                                          │
         ▼                                                          │
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           CANDIDATE GENERATION                                   │
│  "Find videos this user might be interested in"                                 │
│                                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │ Collaborative│  │   Content    │  │   Social    │  │  Trending    │        │
│  │  Filtering   │  │    Based     │  │   Graph     │  │  & Popular   │        │
│  │              │  │              │  │             │  │              │        │
│  │"Users like   │  │"Videos like  │  │"Friends are │  │"Viral now"   │        │
│  │ you watched" │  │ this one"    │  │ watching"   │  │              │        │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                                  │
│  Output: ~10,000 candidate videos per user                                      │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              RANKING MODEL                                       │
│  "Score each candidate for this specific user"                                  │
│                                                                                  │
│  Features:                                                                       │
│  • User features: watch history, demographics, preferences                      │
│  • Video features: content, creator, freshness, popularity                      │
│  • Context features: time of day, device, location                              │
│                                                                                  │
│  Objective: Predict P(watch) × E(watch_time)                                    │
│                                                                                  │
│  Output: Ranked list of ~1000 videos with scores                                │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              RE-RANKING / FILTERING                              │
│  "Apply business rules and diversity requirements"                              │
│                                                                                  │
│  • Remove watched videos, flagged content                                       │
│  • Apply age restrictions, geo-blocks                                           │
│  • Ensure diversity (don't show 10 videos from same creator)                    │
│  • Apply fairness constraints                                                   │
│  • Insert ads at appropriate positions                                          │
│                                                                                  │
│  Output: ~100 videos (more than needed for scroll depth)                        │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              SERVING LAYER                                       │
│  "Deliver recommendations with low latency"                                     │
│                                                                                  │
│  • Assemble final response with thumbnails, metadata                            │
│  • A/B test assignment                                                          │
│  • Logging for training feedback                                                │
│  • p99 latency < 100ms                                                          │
│                                                                                  │
│  Output: 10-50 videos for initial page load                                     │
└─────────────────────────────────────────────────────────────────────────────────┘

Why Multi-Stage?

•Computational efficiency — A complex neural network could score 800M videos for each user, but it would take hours. Instead, cheap models quickly narrow to 10K candidates, then expensive models rank the survivors.
•Different optimization targets — Candidate generation optimizes for recall (don't miss good videos); ranking optimizes for precision (put best videos at top).
•Separation of concerns — Business logic (diversity, fairness, ads) is cleanly separated from ML models. Easy to modify rules without retraining.
•Latency control — Each stage has a latency budget. Total must be <100ms for good UX.

Candidate Generation

candidate-generation.ts
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
// ================================================================
// APPROACH 1: TWO-TOWER NEURAL RETRIEVAL
// ================================================================
// Learn embeddings for users and videos in the same vector space.
// Similar users and videos are close in embedding space.
 
interface TwoTowerModel {
  // User tower: encodes user history into embedding
  userEncoder: {
    inputs: [
      'watch_history',       // Last N watched videos
      'search_history',      // Recent searches
      'liked_videos',        // Explicit likes
      'channel_subscriptions',
      'demographics',        // Age, gender, country
    ];
    output: 'user_embedding[256]';
  };
  
  // Video tower: encodes video into embedding
  videoEncoder: {
    inputs: [
      'video_id',            // Learned embedding
      'title_embedding',     // NLP encoding of title
      'channel_id',
      'category',
      'tags',
      'visual_features',     // CNN features from thumbnails/frames
    ];
    output: 'video_embedding[256]';
  };
  
  // Similarity: dot product of embeddings
  // score(user, video) = user_embedding · video_embedding
}
 
// Training: positive examples are (user, watched_video) pairs
// Use in-batch negatives for efficiency
 
// Serving: user embedding computed at request time
// Video embeddings pre-computed and indexed for ANN search
 
class TwoTowerCandidateGenerator {
  private videoIndex: ApproximateNearestNeighborIndex;
  
  async generateCandidates(userId: string): Promise<VideoCandidate[]> {
    // 1. Compute user embedding (real-time)
    const userFeatures = await this.getUserFeatures(userId);
    const userEmbedding = await this.userEncoder.encode(userFeatures);
    
    // 2. Find nearest videos in embedding space (ANN search)
    // Using HNSW or ScaNN for efficient approximate search
    const candidates = await this.videoIndex.search(userEmbedding, {
      k: 1000,           // Return top 1000
      efSearch: 200,     // HNSW search parameter
    });
    
    return candidates.map(c => ({
      videoId: c.id,
      score: c.similarity,
      source: 'two-tower',
    }));
  }
}
 
// ================================================================
// APPROACH 2: COLLABORATIVE FILTERING
// ================================================================
// "Users who watched X also watched Y"
 
class MatrixFactorizationGenerator {
  // User-item interaction matrix factorized into:
  // R ≈ U × V^T
  // U: user latent factors [n_users × k]
  // V: video latent factors [n_videos × k]
  
  private userFactors: Map<string, Float32Array>;
  private videoIndex: ANNIndex<Float32Array>;
  
  async generateCandidates(userId: string): Promise<VideoCandidate[]> {
    const userVector = this.userFactors.get(userId);
    if (!userVector) {
      // Cold start: use popularity-based fallback
      return this.getPopularVideos();
    }
    
    // Find videos with high dot product
    return this.videoIndex.search(userVector, { k: 500 });
  }
}
 
// ================================================================
// APPROACH 3: CONTENT-BASED / CONTEXTUAL
// ================================================================
// "Videos similar to what you just watched"
 
class ContentBasedGenerator {
  async generateCandidates(
    userId: string,
    context: RequestContext
  ): Promise<VideoCandidate[]> {
    const candidates: VideoCandidate[] = [];
    
    // Seed videos: recently watched, currently watching
    const seedVideos = await this.getSeedVideos(userId, context);
    
    for (const seed of seedVideos) {
      // Find videos with similar content
      const similar = await this.findSimilarContent(seed, {
        methods: ['topic', 'creator', 'visual'],
      });
      
      candidates.push(...similar.map(v => ({
        videoId: v.id,
        score: v.similarity * seed.recency_weight,
        source: `content-similar-to-${seed.id}`,
      })));
    }
    
    return this.deduplicate(candidates);
  }
  
  private async findSimilarContent(
    video: Video,
    options: SimilarityOptions
  ): Promise<SimilarVideo[]> {
    const results: SimilarVideo[] = [];
    
    if (options.methods.includes('topic')) {
      // Same topic/category
      results.push(...await this.findByTopic(video.topics));
    }
    
    if (options.methods.includes('creator')) {
      // Same creator's other videos
      results.push(...await this.findByCreator(video.channelId));
    }
    
    if (options.methods.includes('visual')) {
      // Visually similar (thumbnail/content embeddings)
      results.push(...await this.findByVisualSimilarity(video.visualEmbedding));
    }
    
    return results;
  }
}
 
// ================================================================
// COMBINING MULTIPLE SOURCES
// ================================================================
 
class CandidateAggregator {
  private generators: CandidateGenerator[] = [
    new TwoTowerCandidateGenerator(),
    new MatrixFactorizationGenerator(),
    new ContentBasedGenerator(),
    new TrendingGenerator(),
    new SubscriptionGenerator(),
  ];
  
  async generateCandidates(request: RecommendationRequest): Promise<VideoCandidate[]> {
    // Run all generators in parallel
    const results = await Promise.all(
      this.generators.map(g => g.generateCandidates(request.userId, request.context))
    );
    
    // Merge and deduplicate
    const candidateMap = new Map<string, VideoCandidate>();
    
    for (const candidates of results) {
      for (const candidate of candidates) {
        const existing = candidateMap.get(candidate.videoId);
        if (existing) {
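          // Keep the best score across sources and record every nominating source
          // (multi-source nomination is itself a useful ranking signal).
          // Note: raw scores from different generators are not directly comparable.
          existing.score = Math.max(existing.score, candidate.score);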
          existing.sources.push(candidate.source);
        } else {
          candidateMap.set(candidate.videoId, {
            ...candidate,
            sources: [candidate.source],
          });
        }
      }
    }
    
    // Sort by initial score and take top N
    return Array.from(candidateMap.values())
      .sort((a, b) => b.score - a.score)
      .slice(0, 10000);
  }
}

Approximate Nearest Neighbor (ANN) Search
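Two-tower candidate generation, and the "ANN search" entry in the latency budget later on, rely on approximate nearest neighbor search. Instead of scoring every video embedding exactly, the system retrieves vectors that are probably close to the user embedding. Production systems typically use dedicated libraries such as ScaNN or FAISS with graph- or quantization-based indexes. The sketch below uses a simpler technique, random-hyperplane locality-sensitive hashing (LSH) for cosine similarity, purely to illustrate the idea. The class name, the number of hash bits, and the seeded generator are illustrative assumptions, not YouTube's implementation.

```typescript
// Illustrative random-hyperplane LSH index for cosine similarity (not a production ANN).
// Each vector hashes to a bit signature: bit i = sign of dot(v, hyperplane_i).
// Vectors separated by a small angle tend to share a signature, so a query
// scores only the items in its own bucket instead of the whole corpus.
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function cosine(a: Vec, b: Vec): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Seeded PRNG (mulberry32) so the hyperplanes, and therefore the buckets, are reproducible.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via Box-Muller. Gaussian hyperplanes make the per-bit
// collision probability depend only on the angle between the two vectors.
function gaussian(rand: () => number): number {
  const u1 = 1 - rand(); // in (0, 1], so log(u1) is finite
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

class LshIndex {
  private planes: Vec[] = [];
  private buckets = new Map<string, { id: string; vec: Vec }[]>();

  constructor(dim: number, numBits: number, seed = 42) {
    const rand = mulberry32(seed);
    for (let b = 0; b < numBits; b++) {
      this.planes.push(Array.from({ length: dim }, () => gaussian(rand)));
    }
  }

  private signature(v: Vec): string {
    return this.planes.map(p => (dot(v, p) >= 0 ? '1' : '0')).join('');
  }

  add(id: string, vec: Vec): void {
    const key = this.signature(vec);
    const bucket = this.buckets.get(key) ?? [];
    bucket.push({ id, vec });
    this.buckets.set(key, bucket);
  }

  // Exact cosine scoring, but only within the query's bucket; returns the top k.
  query(q: Vec, k: number): { id: string; score: number }[] {
    const bucket = this.buckets.get(this.signature(q)) ?? [];
    return bucket
      .map(item => ({ id: item.id, score: cosine(q, item.vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

With a single hash table, a true neighbor that lands in an adjacent bucket is missed. Practical LSH setups use several independent tables or probe nearby buckets, trading memory and latency for recall.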

Ranking Model

ranking-model.ts
// ================================================================
// RANKING MODEL ARCHITECTURE
// ================================================================
 
interface RankingModelInputs {
  // User features
  user: {
    userId: string;
    watchHistory: VideoId[];           // Last 100 videos
    searchHistory: string[];           // Last 50 searches
    demographics: {
      ageGroup: number;                // Age bucket
      gender: number;                  // Encoded gender
      country: number;                 // Country ID
    };
    preferences: {
      preferredCategories: number[];
      avgWatchDuration: number;        // User's typical watch length
      activityLevel: number;           // How active on platform
    };
  };
  
  // Video features
  video: {
    videoId: string;
    channelId: string;
    category: number;
    duration: number;
    uploadAge: number;                 // Days since upload
    popularity: {
      totalViews: number;
      recentViews: number;             // Last 7 days
      avgWatchPercentage: number;
      likeRatio: number;
    };
    content: {
      titleEmbedding: Float32Array;    // NLP encoding
      thumbnailEmbedding: Float32Array; // Vision encoding
      topics: number[];
    };
  };
  
  // Context features
  context: {
    timeOfDay: number;                 // Hour, encoded cyclically
    dayOfWeek: number;
    device: number;                    // Mobile, desktop, TV
    isWeekend: boolean;
    sessionDepth: number;              // How many videos already watched
  };
  
  // Interaction features (user × video)
  interaction: {
    userWatchedChannel: boolean;       // Has user watched this creator?
    userSubscribed: boolean;
    lastWatchedFromChannel: number;    // Days since
    topicOverlap: number;              // User preference × video topic match
  };
}
 
// Multi-objective output: predict multiple engagement signals
interface RankingModelOutputs {
  pClick: number;               // Probability of clicking
  pWatch: number;               // Probability of watching > 30 seconds
  expectedWatchTime: number;    // Expected watch duration
  pLike: number;                // Probability of liking
  pShare: number;               // Probability of sharing
  pSubscribe: number;           // Probability of subscribing to channel
}
 
// Final score combines multiple objectives
function computeFinalScore(outputs: RankingModelOutputs): number {
  // Weights determined by business objectives and A/B testing
  const weights = {
    watchTime: 0.6,      // Primary objective
    click: 0.1,          // Necessary but not sufficient
    like: 0.15,          // Explicit positive signal
    share: 0.1,          // High-value engagement
    subscribe: 0.05,     // Channel growth
  };
  
  // Normalize watch time prediction to [0, 1], capping at 5 minutes (300 s)
  const normalizedWatchTime = Math.min(outputs.expectedWatchTime / 300, 1);
  
  return (
    weights.watchTime * normalizedWatchTime * outputs.pWatch +
    weights.click * outputs.pClick +
    weights.like * outputs.pLike +
    weights.share * outputs.pShare +
    weights.subscribe * outputs.pSubscribe
  );
}
 
// ================================================================
// FEATURE ENGINEERING
// ================================================================
 
class FeatureExtractor {
  async extractFeatures(
    userId: string,
    videoId: string,
    context: RequestContext
  ): Promise<RankingModelInputs> {
    // Parallel feature fetching for low latency
    const [userFeatures, videoFeatures] = await Promise.all([
      this.getUserFeatures(userId),
      this.getVideoFeatures(videoId),
    ]);
    
    // Compute interaction features
    const interactionFeatures = this.computeInteractionFeatures(
      userFeatures, 
      videoFeatures
    );
    
    // Context features from request
    const contextFeatures = this.extractContextFeatures(context);
    
    return {
      user: userFeatures,
      video: videoFeatures,
      context: contextFeatures,
      interaction: interactionFeatures,
    };
  }
  
  private computeInteractionFeatures(
    user: UserFeatures,
    video: VideoFeatures
  ): InteractionFeatures {
    // Cross features that capture user-item interactions
    return {
      userWatchedChannel: user.watchedChannels.has(video.channelId),
      userSubscribed: user.subscriptions.has(video.channelId),
      lastWatchedFromChannel: this.daysSinceLastWatch(user, video.channelId),
      topicOverlap: this.computeTopicOverlap(user.preferredTopics, video.topics),
    };
  }
  
  private computeTopicOverlap(userTopics: Map<number, number>, videoTopics: number[]): number {
    // Weighted overlap based on user topic preferences
    let overlap = 0;
    for (const topic of videoTopics) {
      overlap += userTopics.get(topic) || 0;
    }
    return overlap;
  }
}
 
// ================================================================
// MODEL SERVING
// ================================================================
 
class RankingServer {
  private model: TensorFlowServingModel;
  private featureExtractor: FeatureExtractor;
  
  async rankCandidates(
    userId: string,
    candidates: VideoCandidate[],
    context: RequestContext
  ): Promise<RankedVideo[]> {
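    // Extract features for all candidates concurrently. extractFeatures looks up
    // user features on every call, so in practice they should be fetched once per
    // request (or memoized) and only video features looked up per candidate.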
    const features = await Promise.all(
      candidates.map(c => 
        this.featureExtractor.extractFeatures(userId, c.videoId, context)
      )
    );
    
    // Batch inference (much faster than individual requests)
    const predictions = await this.model.predict(features);
    
    // Compute final scores and rank
    const rankedVideos = candidates.map((candidate, i) => ({
      videoId: candidate.videoId,
      ...predictions[i],
      finalScore: computeFinalScore(predictions[i]),
      candidateSources: candidate.sources,
    }));
    
    return rankedVideos.sort((a, b) => b.finalScore - a.finalScore);
  }
}
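`RankingModelInputs.context.timeOfDay` is described as "encoded cyclically," but the code does not show how. A common approach, assumed here because the original does not specify one, maps the hour onto the unit circle with a sine/cosine pair. With this encoding, 23:00 and 00:00 sit close together instead of 23 units apart. Because the result is two numbers, a real feature schema would need two fields rather than the single `number` shown above. The function names are illustrative.

```typescript
// Cyclic encoding: project a periodic value (hour of day, day of week) onto the unit circle.
// Values near the wraparound (e.g. hour 23 and hour 0) end up close in feature space.
function encodeCyclic(value: number, period: number): [number, number] {
  const angle = (2 * Math.PI * value) / period;
  return [Math.sin(angle), Math.cos(angle)];
}

// Euclidean distance between two encodings, used here only to show the wraparound effect.
function distance(a: [number, number], b: [number, number]): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

const hour23 = encodeCyclic(23, 24);
const hour0 = encodeCyclic(0, 24);
const hour12 = encodeCyclic(12, 24);

console.log(distance(hour23, hour0).toFixed(3)); // "0.261": adjacent hours stay close
console.log(distance(hour0, hour12).toFixed(3)); // "2.000": opposite hours are farthest apart
```

The same helper covers `dayOfWeek` with `encodeCyclic(day, 7)`.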

Feature Categories and Their Impact
Feature Category	Examples	Predictive Power	Freshness
User history	Watch history, likes, subscriptions	High	Real-time to daily
Video popularity	View count, CTR, avg watch %	Medium-High	Hourly
Content features	Title, thumbnail, topics	Medium	Static after upload
Context	Time, device, session depth	Medium	Real-time
Interaction	User × video crosses	Very High	Real-time

The Watch Time Trap

Training Pipeline

Training recommendation models requires processing billions of user interactions efficiently. The training pipeline must handle massive data volumes while ensuring model freshness.

training-pipeline.ts
// ================================================================
// DATA COLLECTION AND LABELING
// ================================================================
 
interface InteractionEvent {
  userId: string;
  videoId: string;
  sessionId: string;
  timestamp: Date;
  
  // What the user did
  action: 'impression' | 'click' | 'watch' | 'like' | 'share' | 'subscribe';
  
  // For watch events
  watchDuration?: number;
  watchPercentage?: number;
  
  // Context at time of interaction
  device: string;
  country: string;
  source: 'home' | 'search' | 'related' | 'subscription' | 'notification';
  position: number;  // Position in recommendation list
}
 
// Positive examples: videos user engaged with
// Negative examples: videos shown but not clicked/watched
 
interface TrainingExample {
  features: RankingModelInputs;
  labels: {
    clicked: boolean;
    watchDuration: number;
    liked: boolean;
    shared: boolean;
    subscribed: boolean;
  };
  // Sample weight (for importance sampling)
  weight: number;
}
 
// ================================================================
// TRAINING DATA PROCESSING
// ================================================================
 
class TrainingDataPipeline {
  async generateTrainingData(date: Date): Promise<TrainingDataset> {
    // 1. Load interaction logs from data warehouse
    const interactions = await this.loadInteractions(date);
    
    // 2. Join with feature snapshots
    // (features as they were when interaction happened)
    const examples = await this.joinWithFeatures(interactions);
    
    // 3. Apply sampling strategies
    //    - Downsample negative examples (most impressions are not clicks)
    //    - Upsample rare positive signals (shares are rare but valuable)
    const balanced = this.applySampling(examples, {
      negativeRatio: 5,   // 5 negatives per positive
      rarePositiveBoost: 2, // 2x weight for shares/subscribes
    });
    
    // 4. Create training/validation split
    //    (temporal split: older for training, recent for validation)
    const { training, validation } = this.temporalSplit(balanced, {
      validationHours: 6,
    });
    
    return { training, validation };
  }
  
  private applySampling(
    examples: TrainingExample[],
    config: SamplingConfig
  ): TrainingExample[] {
    const positives = examples.filter(e => e.labels.clicked);
    const negatives = examples.filter(e => !e.labels.clicked);
    
    // Downsample negatives
    const sampledNegatives = this.sampleWithoutReplacement(
      negatives,
      positives.length * config.negativeRatio
    );
    
    // Adjust weights for sampled negatives
    const negativeWeight = negatives.length / sampledNegatives.length;
    for (const neg of sampledNegatives) {
      neg.weight *= negativeWeight;
    }
    
    // Boost rare positives
    for (const pos of positives) {
      if (pos.labels.shared || pos.labels.subscribed) {
        pos.weight *= config.rarePositiveBoost;
      }
    }
    
    return [...positives, ...sampledNegatives];
  }
}
 
// ================================================================
// MODEL TRAINING
// ================================================================
 
interface TrainingConfig {
  // Model architecture
  architecture: {
    userTowerLayers: number[];      // e.g., [512, 256, 128]
    videoTowerLayers: number[];
    crossLayers: number;            // Deep & Cross Network layers
    finalLayers: number[];          // e.g., [256, 128]
  };
  
  // Multi-task learning
  tasks: {
    click: { weight: 1.0, loss: 'binary_crossentropy' };
    watchTime: { weight: 0.5, loss: 'mse' };
    like: { weight: 0.3, loss: 'binary_crossentropy' };
    share: { weight: 0.2, loss: 'binary_crossentropy' };
  };
  
  // Training hyperparameters
  training: {
    batchSize: 4096;
    learningRate: 0.001;
    epochs: 5;                      // Few epochs, lots of data
    earlyStoppingPatience: 2;
  };
  
  // Regularization
  regularization: {
    dropout: 0.2;
    l2: 0.0001;
    embeddingL2: 0.00001;
  };
}
 
class ModelTrainer {
  async train(
    dataset: TrainingDataset,
    config: TrainingConfig
  ): Promise<TrainedModel> {
    // Build model
    const model = this.buildModel(config.architecture);
    
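    // Multi-task setup: one loss per output head, combined using per-task weights
    const losses = Object.fromEntries(
      Object.entries(config.tasks).map(([task, cfg]) => [task, cfg.loss])
    );
    const lossWeights = Object.fromEntries(
      Object.entries(config.tasks).map(([task, cfg]) => [task, cfg.weight])
    );
    
    // Data-parallel distributed training. This is pseudocode modeled on Python
    // TensorFlow/Keras (tf.distribute.MirroredStrategy, compile(loss_weights=...));
    // TensorFlow.js does not provide tf.distribute.
    const strategy = new tf.distribute.MirroredStrategy();
    
    await strategy.scope(() => {
      model.compile({
        optimizer: tf.train.adam(config.training.learningRate),
        loss: losses,   // per-task losses; task weights are applied once, via lossWeights
        lossWeights,
      });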
      
      return model.fit(dataset.training, {
        epochs: config.training.epochs,
        batchSize: config.training.batchSize,
        validationData: dataset.validation,
        callbacks: [
          tf.callbacks.earlyStopping({ patience: config.training.earlyStoppingPatience }),
          new MetricsLogger(),
          new ModelCheckpointer(),
        ],
      });
    });
    
    return model;
  }
}
 
// ================================================================
// CONTINUOUS TRAINING
// ================================================================
 
class ContinuousTrainingPipeline {
  // Train new model daily
  async dailyTrainingJob(): Promise<void> {
    // 1. Generate training data from yesterday
    const dataset = await this.dataGenerator.generateTrainingData(yesterday());
    
    // 2. Train new model
    const newModel = await this.trainer.train(dataset, this.config);
    
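    // 3. Evaluate the new and current production models on the same holdout set
    const newMetrics = await this.evaluator.evaluate(newModel, this.holdoutSet);
    const prodMetrics = await this.evaluator.evaluate(this.productionModel, this.holdoutSet);
    
    // 4. Compare offline metrics (relative improvement of new over production)
    const comparison = this.compare(newMetrics, prodMetrics);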
    
    // 5. If better, gradually roll out
    if (comparison.improvement > 0.005) { // 0.5% improvement threshold
      await this.deployModel(newModel, {
        rolloutStrategy: 'gradual',
        initialPercent: 5,
        maxPercent: 100,
        rolloutDays: 7,
      });
    }
  }
}
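The `deployModel` call above describes a gradual rollout from 5% to 100% of traffic over 7 days, but its internals are not shown. One common way to implement such a rollout, sketched here with hypothetical function names, has two parts. First, ramp the percentage linearly. Second, assign users with a stable hash, so a given user stays on the same model across requests and moves to the new model only as the percentage rises.

```typescript
// Hypothetical rollout config mirroring the fields passed to deployModel above.
interface RolloutConfig {
  initialPercent: number; // traffic share on day 0
  maxPercent: number;     // traffic share at the end of the ramp
  rolloutDays: number;    // length of the linear ramp
}

// Linear ramp from initialPercent to maxPercent over rolloutDays.
function rolloutPercent(cfg: RolloutConfig, daysSinceStart: number): number {
  if (daysSinceStart <= 0) return cfg.initialPercent;
  if (daysSinceStart >= cfg.rolloutDays) return cfg.maxPercent;
  const progress = daysSinceStart / cfg.rolloutDays;
  return cfg.initialPercent + (cfg.maxPercent - cfg.initialPercent) * progress;
}

// FNV-1a 32-bit hash over UTF-16 code units: stable across processes and restarts.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// A user gets the new model if their bucket (0-9999) falls below the threshold.
// Because the bucket depends only on (model, user), raising the percentage only
// adds users; nobody flips back to the old model mid-rollout.
// Salting with the model name gives each rollout an independent user split.
function useNewModel(userId: string, modelName: string, percent: number): boolean {
  const bucket = fnv1a(`${modelName}:${userId}`) % 10000;
  return bucket < percent * 100;
}
```

For example, with `{ initialPercent: 5, maxPercent: 100, rolloutDays: 7 }`, `rolloutPercent` returns 52.5 at day 3.5, and a serving node would call `useNewModel(userId, 'ranking_v24', 52.5)` per request.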

Feedback Loop Latency

Models trained on yesterday's data are always slightly behind. For fast-moving signals (trending videos, breaking news), supplement with real-time boosting rules that don't require retraining.
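A minimal sketch of one such rule gives a surging video a score multiplier that decays exponentially from the moment the surge is detected. The half-life and boost size below are illustrative assumptions, not values from the original. Compared with a fixed cutoff window, decay avoids a cliff where a video's score drops abruptly once the window closes.

```typescript
// Real-time boost that fades smoothly: the extra score fraction halves every
// halfLifeMs after a surge is detected. Returns a multiplier >= 1 to apply to
// the model's score, so no retraining is needed.
function decayedBoost(
  initialBoost: number, // extra score fraction at detection, e.g. 1.0 = +100%
  detectedAtMs: number,
  nowMs: number,
  halfLifeMs: number
): number {
  // Clamp negative ages (clock skew between servers) to zero
  const ageMs = Math.max(0, nowMs - detectedAtMs);
  return 1 + initialBoost * Math.pow(0.5, ageMs / halfLifeMs);
}

const HOUR_MS = 60 * 60 * 1000;
const detectedAt = 0;
console.log(decayedBoost(1.0, detectedAt, 0, HOUR_MS));           // 2
console.log(decayedBoost(1.0, detectedAt, HOUR_MS, HOUR_MS));     // 1.5
console.log(decayedBoost(1.0, detectedAt, 2 * HOUR_MS, HOUR_MS)); // 1.25
```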

Serving Infrastructure

Serving recommendations at scale requires specialized infrastructure optimized for low-latency, high-throughput inference. Every millisecond of latency impacts user experience and engagement.

serving-infrastructure.ts
// ================================================================
// FEATURE STORE
// ================================================================
// Pre-computed features for low-latency serving
 
interface FeatureStore {
  // User features: updated hourly, cached in memory
  userFeatures: {
    storage: Redis | Memcached;
    ttl: '1 hour';
    precomputed: [
      'watch_history_embedding',
      'topic_preferences',
      'activity_level',
      'demographic_segment',
    ];
  };
  
  // Video features: content embeddings computed at upload and cached long-term;
  // popularity metrics are refreshed periodically (hourly)
  videoFeatures: {
    storage: Redis | Bigtable;
    ttl: 'indefinite';
    precomputed: [
      'content_embedding',
      'popularity_metrics',
      'creator_features',
    ];
  };
  
  // Real-time features: computed at request time
  realtimeFeatures: [
    'session_context',
    'recency_since_last_watch',
  ];
}
 
class FeatureServer {
  private userCache: Redis;
  private videoCache: Bigtable;
  
  async getUserFeatures(userId: string): Promise<UserFeatures> {
    // Try cache first
    const cached = await this.userCache.get(`user:${userId}`);
    if (cached) return JSON.parse(cached);
    
    // Fallback to cold-start features
    return this.getColdStartUserFeatures(userId);
  }
  
  async getVideoFeaturesBatch(videoIds: string[]): Promise<Map<string, VideoFeatures>> {
    // Batch request for efficiency
    const keys = videoIds.map(id => `video:${id}`);
    const results = await this.videoCache.getBatch(keys);
    
    // Videos missing from the cache are simply absent from the returned map
    const features = new Map<string, VideoFeatures>();
    for (const [key, value] of results) {
      const videoId = key.replace('video:', '');
      features.set(videoId, JSON.parse(value));
    }
    
    return features;
  }
}
 
// ================================================================
// MODEL SERVING
// ================================================================
 
interface ServingCluster {
  // TensorFlow Serving or TensorRT for GPU inference
  modelServers: {
    replicas: 100;
    instanceType: 'GPU' | 'CPU';
    modelsLoaded: ['ranking_v23', 'ranking_v24'];  // A/B testing
    batchSize: 128;     // Batch requests for GPU efficiency
    maxConcurrency: 32;
  };
  
  // Load balancing
  loadBalancer: {
    strategy: 'least_connections';
    healthCheck: '/health';
    timeoutMs: 50;
  };
}
 
class ModelServer {
  private model: TFServingClient;
  
  async predict(features: RankingModelInputs[]): Promise<RankingModelOutputs[]> {
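    // Split into batches of 128 for GPU efficiency and send the batches
    // concurrently; awaiting them one by one would add a full round trip per batch.
    const batches = this.createBatches(features, 128);
    
    const batchResults = await Promise.all(
      batches.map(batch =>
        this.model.predict({
          inputs: this.tensorize(batch),
          // Request every head used by computeFinalScore; a missing head makes the score NaN
          outputNames: ['p_click', 'p_watch', 'expected_watch_time', 'p_like', 'p_share', 'p_subscribe'],
        })
      )
    );
    
    // Promise.all preserves batch order, so results line up with the input features
    return batchResults.flatMap(r => this.parsePredictions(r));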
  }
}
 
// ================================================================
// REQUEST HANDLING
// ================================================================
 
class RecommendationService {
  private candidateGenerator: CandidateAggregator;
  private featureStore: FeatureServer;
  private ranker: ModelServer;
  private reranker: ReRanker;
  
  async getRecommendations(request: RecommendationRequest): Promise<RecommendationResponse> {
    const startTime = Date.now();
    
    try {
      // Stage 1: Candidate generation (target: 20ms)
      const t1 = Date.now();
      const candidates = await this.candidateGenerator.generateCandidates(request);
      this.logLatency('candidate_generation', Date.now() - t1);
      
      // Stage 2: Feature extraction (target: 15ms)
      const t2 = Date.now();
      const features = await this.extractFeatures(request.userId, candidates);
      this.logLatency('feature_extraction', Date.now() - t2);
      
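      // Stage 3: Ranking (target: 25ms)
      const t3 = Date.now();
      const predictions = await this.ranker.predict(features);
      // scoreAndSort (hypothetical helper) applies computeFinalScore to each
      // prediction and sorts descending, as RankingServer.rankCandidates does
      const ranked = this.scoreAndSort(candidates, predictions);
      this.logLatency('ranking', Date.now() - t3);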
      
      // Stage 4: Re-ranking and filtering (target: 10ms)
      const t4 = Date.now();
      const final = await this.reranker.rerank(ranked, request);
      this.logLatency('reranking', Date.now() - t4);
      
      // Stage 5: Response assembly (target: 5ms)
      const response = await this.assembleResponse(final, request);
      
      // Log total latency
      const totalLatency = Date.now() - startTime;
      this.logLatency('total', totalLatency);
      
      if (totalLatency > 100) {
        this.logSlowRequest(request, totalLatency);
      }
      
      return response;
      
    } catch (error) {
      // Fallback to cached/popular recommendations
      return this.getFallbackRecommendations(request);
    }
  }
  
  // Latency budget allocation
  private readonly LATENCY_BUDGETS = {
    candidateGeneration: 20,
    featureExtraction: 15,
    ranking: 25,
    reranking: 10,
    assembly: 5,
    network: 20,
    total: 100,
  };
}
 
// ================================================================
// CACHING LAYERS
// ================================================================
 
class RecommendationCache {
  // Cache recommendations for recently active users.
  // Serves repeat requests without rerunning the full pipeline and absorbs traffic spikes.
  
  async getOrCompute(request: RecommendationRequest): Promise<RecommendationResponse> {
    const cacheKey = this.computeCacheKey(request);
    
    // Check cache (valid for ~5 minutes for active users)
    const cached = await this.cache.get(cacheKey);
    if (cached && this.isValid(cached, request)) {
      return cached;
    }
    
    // Compute fresh recommendations
    const fresh = await this.service.getRecommendations(request);
    
    // Cache for next request
    await this.cache.set(cacheKey, fresh, { ttl: 300 });
    
    return fresh;
  }
  
  private computeCacheKey(request: RecommendationRequest): string {
    // Key on user + surface; freshness comes from the 300 s TTL set in getOrCompute.
    // (A per-minute time bucket in the key would cap the effective TTL at 60 s.)
    return `rec:${request.userId}:${request.surface}`;
  }
}

Latency Budget by Stage
Stage	Budget (ms)	Key Optimizations
Candidate generation	20	ANN search, parallel generators, pre-computed embeddings
Feature extraction	15	Feature store, batch lookups, caching
Ranking inference	25	GPU batching, model optimization, quantization
Re-ranking	10	Simple rules, no ML
Response assembly	5	Pre-fetched metadata
Network overhead	20	CDN, connection reuse
Total	< 100	–
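These budgets only hold if each stage is actually cut off when it overruns. A minimal sketch of one way to enforce that races each stage against a timer and substitutes a fallback value on timeout. The helper name and fallback behavior are assumptions, not part of the original service. This is the same degrade-rather-than-fail approach that the `catch` block in `RecommendationService` applies to the whole request.

```typescript
// Resolve with the stage's result if it finishes within budgetMs,
// otherwise resolve with the fallback. The timer is always cleared.
// Note: this does not cancel the underlying work; it only stops waiting for it.
async function withBudget<T>(
  stage: Promise<T>,
  budgetMs: number,
  fallback: T
): Promise<{ value: T; timedOut: boolean }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<{ value: T; timedOut: boolean }>(resolve => {
    timer = setTimeout(() => resolve({ value: fallback, timedOut: true }), budgetMs);
  });
  try {
    return await Promise.race([
      stage.then(value => ({ value, timedOut: false })),
      timeout,
    ]);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical stage that takes `ms` milliseconds, used for illustration only.
function delay<T>(ms: number, value: T): Promise<T> {
  return new Promise(resolve => setTimeout(() => resolve(value), ms));
}

async function demo(): Promise<void> {
  const fast = await withBudget(delay(5, ['v1', 'v2']), 20, [] as string[]);
  const slow = await withBudget(delay(100, ['v3']), 20, ['popular-1']);
  console.log(JSON.stringify(fast)); // {"value":["v1","v2"],"timedOut":false}
  console.log(JSON.stringify(slow)); // {"value":["popular-1"],"timedOut":true}
}

demo();
```

Logging the `timedOut` flag per stage gives the same per-stage visibility as the `logLatency` calls, and also records how often each fallback was actually served.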

Handling Special Cases

Real-world recommendation systems must handle numerous edge cases and special scenarios that don't fit the standard pipeline.

Critical Edge Cases

•Cold start (new users) — No watch history to personalize from. Use demographic similarity, trending content, and explore/exploit strategies to learn preferences quickly.
•Cold start (new videos) — No engagement data to rank with. Use content features, creator track record, and exploration slots to gather initial signals.
•Trending/viral content — Standard models are trained on yesterday's data; viral videos need real-time boosting before models catch up.
•Breaking news — Time-sensitive content needs rapid promotion, which can mean bypassing the normal relevance ranking.
•Returning users after absence — Preferences may have shifted. Explore to re-learn them without alienating the user with irrelevant content.
•Multi-profile (shared accounts) — Different people using same account. Need session-level personalization, not just user-level.
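Several of these cases lean on explore/exploit strategies. The original does not say which algorithm is used. As an illustrative stand-in, the sketch below applies UCB1, a classic bandit rule, to new-video exploration slots. Each video's score is its observed click-through rate plus an uncertainty bonus that shrinks as the video gathers impressions. Under-tested videos therefore get shown until the system has enough signal to rank them normally. Treating clicks as the reward and the names `ArmStats` and `pickExplorationVideos` are assumptions.

```typescript
// Per-video engagement statistics gathered from exploration impressions so far.
interface ArmStats {
  videoId: string;
  impressions: number; // times shown
  clicks: number;      // times clicked
}

// UCB1 score: observed CTR plus an exploration bonus c * sqrt(ln N / n).
// Unseen videos score Infinity, so each gets at least one impression first.
function ucbScore(arm: ArmStats, totalImpressions: number, c = Math.SQRT2): number {
  if (arm.impressions === 0) return Infinity;
  const ctr = arm.clicks / arm.impressions;
  const bonus = c * Math.sqrt(Math.log(totalImpressions) / arm.impressions);
  return ctr + bonus;
}

// Choose which candidate videos fill `slots` exploration positions.
function pickExplorationVideos(arms: ArmStats[], slots: number): string[] {
  const total = Math.max(1, arms.reduce((sum, a) => sum + a.impressions, 0));
  return arms
    .map(a => ({ id: a.videoId, score: ucbScore(a, total) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, slots)
    .map(a => a.id);
}

const arms: ArmStats[] = [
  { videoId: 'a', impressions: 1000, clicks: 50 }, // well measured, CTR 5%
  { videoId: 'b', impressions: 10, clicks: 1 },    // barely measured, CTR 10%
  { videoId: 'c', impressions: 0, clicks: 0 },     // never shown
];
console.log(pickExplorationVideos(arms, 2)); // [ 'c', 'b' ]
```

Thompson sampling and epsilon-greedy are common alternatives. UCB1 is shown here because it is deterministic and easy to inspect.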

special-cases.ts
// ================================================================
// COLD START: NEW USERS
// ================================================================
 
class ColdStartHandler {
  async getRecommendationsForNewUser(
    request: RecommendationRequest
  ): Promise<RecommendationResponse> {
    // 1. Use available signals
    const signals: ColdStartSignals = {
      device: request.device,        // Mobile users prefer shorter content
      country: request.country,      // Regional preferences
      language: request.language,    // Language-appropriate content
      referrer: request.referrer,    // What brought them here
      firstSearch: request.firstSearch, // If they searched first
    };
    
    // 2. Find similar users (demographic clustering)
    const similarUserCluster = await this.findSimilarCluster(signals);
    
    // 3. Get popular content for that cluster
    const popularInCluster = await this.getPopularForCluster(similarUserCluster);
    
    // 4. Mix with global trending for discovery
    const trending = await this.getTrendingForRegion(request.country);
    
    // 5. Include exploration content to learn preferences
    const exploration = await this.getExplorationContent(signals);
    
    return this.blend([
      { weight: 0.4, content: popularInCluster },
      { weight: 0.3, content: trending },
      { weight: 0.3, content: exploration },
    ]);
  }
}
 
// ================================================================
// COLD START: NEW VIDEOS
// ================================================================
 
class NewVideoHandler {
  async scoreNewVideo(
    video: Video,
    user: User
  ): Promise<number> {
    // No engagement history - use content features
    const contentScore = this.scoreByContent(video, user);
    
    // Creator track record
    const creatorScore = await this.getCreatorScore(video.channelId);
    
    // Similar videos' performance
    const similarScore = await this.getSimilarVideosScore(video);
    
    // Combine with uncertainty bonus (explore new videos).
    // uploadAge is in days (see RankingModelInputs); the bonus function expects hours.
    const explorationBonus = this.computeExplorationBonus(video.uploadAge * 24);
    
    return (
      contentScore * 0.3 +
      creatorScore * 0.3 +
      similarScore * 0.3 +
      explorationBonus * 0.1
    );
  }
  
  private computeExplorationBonus(uploadAgeHours: number): number {
    // Higher bonus for very new videos (encourage exploration)
    // Decays over first 48 hours
    if (uploadAgeHours < 1) return 1.0;
    if (uploadAgeHours < 6) return 0.8;
    if (uploadAgeHours < 24) return 0.5;
    if (uploadAgeHours < 48) return 0.2;
    return 0;
  }
}
 
// ================================================================
// REAL-TIME TRENDING BOOST
// ================================================================
 
class TrendingBooster {
  private trendingScores: Map<string, TrendingScore> = new Map();
  
  // Update every few minutes
  async updateTrendingScores(): Promise<void> {
    // Get real-time view velocity
    const velocities = await this.getViewVelocities();
    
    for (const [videoId, velocity] of velocities) {
      const historical = await this.getHistoricalVelocity(videoId);
      
      // Score = how much faster than expected
      const ratio = velocity / (historical || 1);
      
      if (ratio > 5) {
        // 5x+ normal velocity = viral
        this.trendingScores.set(videoId, {
          boost: Math.min(ratio / 10, 2.0), // Cap boost at 2.0; applied as (1 + boost), so at most a 3x multiplier
          reason: 'viral',
          detectedAt: Date.now(),
        });
      }
    }
  }
  
  applyTrendingBoost(video: RankedVideo): number {
    const trending = this.trendingScores.get(video.videoId);
    // Boost expires one hour (3,600,000 ms) after the surge was detected
    if (trending && Date.now() - trending.detectedAt < 3600000) {
      return video.finalScore * (1 + trending.boost);
    }
    return video.finalScore;
  }
}
 
// ================================================================
// DIVERSITY AND EXPLORATION
// ================================================================
 
class DiversityReRanker {
  rerank(videos: RankedVideo[]): RankedVideo[] {
    const result: RankedVideo[] = [];
    const usedCreators = new Set<string>();
    const usedCategories = new Map<string, number>();
    
    for (const video of videos) {
      // Penalize if creator already in results
      if (usedCreators.has(video.channelId)) {
        video.finalScore *= 0.7; // 30% penalty
      }
      
      // Penalize if too many from same category
      const categoryCount = usedCategories.get(video.category) || 0;
      if (categoryCount >= 3) {
        video.finalScore *= 0.8;
      }
      
      result.push(video);
      usedCreators.add(video.channelId);
      usedCategories.set(video.category, categoryCount + 1);
    }
    
    // Re-sort after penalties
    return result.sort((a, b) => b.finalScore - a.finalScore);
  }
  
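  // Insert exploration videos at the given slot positions.
  // Positions should be ascending so each insert lands at its intended index.
  injectExploration(videos: RankedVideo[], positions: number[]): RankedVideo[] {
    const exploration = this.getExplorationVideos();
    
    for (const pos of positions) {
      if (pos >= videos.length) continue;
      const next = exploration.shift();
      if (next === undefined) break; // no exploration candidates left
      videos.splice(pos, 0, next);
    }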
    
    return videos;
  }
}
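`ColdStartHandler` above, and `RadicalizationMonitor` in the next listing, both call a `blend` helper that the original code does not define. One plausible reading, which is an assumption, is weighted interleaving. Each output slot is filled from the source that is furthest behind its target share, and duplicates are skipped. The sketch below works on plain lists of video IDs.

```typescript
interface BlendSource {
  weight: number;    // target share of output slots (weights need not sum to 1)
  content: string[]; // ranked video IDs from this source
}

// Weighted interleave: at each step, take the next item from the source whose
// filled-slot count lags its weighted target the most. Duplicate IDs are skipped.
function blend(sources: BlendSource[], limit: number): string[] {
  const totalWeight = sources.reduce((s, src) => s + src.weight, 0);
  const cursors = sources.map(() => 0); // next index to read in each source
  const taken = sources.map(() => 0);   // slots filled by each source
  const seen = new Set<string>();
  const out: string[] = [];

  while (out.length < limit) {
    let best = -1;
    let bestDeficit = -Infinity;
    sources.forEach((src, i) => {
      if (cursors[i] >= src.content.length) return; // source exhausted
      const target = (src.weight / totalWeight) * (out.length + 1);
      const deficit = target - taken[i];
      if (deficit > bestDeficit) {
        bestDeficit = deficit;
        best = i;
      }
    });
    if (best === -1) break; // every source exhausted

    const id = sources[best].content[cursors[best]++];
    if (seen.has(id)) continue; // duplicate: advance the cursor, fill no slot
    seen.add(id);
    out.push(id);
    taken[best]++;
  }
  return out;
}

console.log(blend([
  { weight: 0.5, content: ['a1', 'a2'] },
  { weight: 0.5, content: ['b1', 'b2'] },
], 4)); // [ 'a1', 'b1', 'a2', 'b2' ]
```

Compared with sampling a source at random for each slot, the deficit rule keeps every prefix of the result close to the target mix. That matters because users mostly see only the first few positions.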

Explore vs. Exploit

Responsible Recommendations

Responsibility Considerations

•Misinformation demotion — Reduce visibility of content flagged by fact-checkers. Don't recommend borderline content even if it is engaging.
•Radicalization prevention — Detect and interrupt recommendation paths that lead to increasingly extreme content.
•Advertiser safety — Don't recommend content that would put ads in inappropriate contexts.
•Creator fairness — Avoid winner-take-all dynamics where small creators can't get discovered.
•User wellbeing — Consider 'time well spent' metrics, not just time spent. Avoid patterns that feel addictive.
•Transparency — Provide users insight into why recommendations are made. Enable preference controls.
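Radicalization prevention needs some measure of whether a session is drifting toward more extreme content. The `RadicalizationMonitor` class in the code below calls `computeExtremityTrend` for this but does not show its body. One simple option assumes that each watched video already carries a classifier-produced extremity score in [0, 1] (a hypothetical input). It then takes the least-squares slope of those scores over the session. A persistently positive slope means each video is, on average, more extreme than the one before.

```typescript
// Least-squares slope of per-video extremity scores against their position
// in the session (0, 1, 2, ...). Positive = drifting toward more extreme content.
// Scores are assumed to come from a hypothetical upstream classifier.
function extremityTrend(scores: number[]): number {
  const n = scores.length;
  if (n < 2) return 0; // not enough history to estimate a trend
  const meanX = (n - 1) / 2;
  const meanY = scores.reduce((s, y) => s + y, 0) / n;
  let num = 0;
  let den = 0;
  for (let x = 0; x < n; x++) {
    num += (x - meanX) * (scores[x] - meanY);
    den += (x - meanX) ** 2;
  }
  return num / den;
}

console.log(extremityTrend([0.1, 0.2, 0.3, 0.4]).toFixed(2)); // "0.10": steady upward drift
console.log(extremityTrend([0.3, 0.3, 0.3]).toFixed(2));      // "0.00": flat session
```

A slope alone ignores level. A session that stays at a uniformly high score has a slope near zero, so a real monitor would likely combine the trend with the recent average score before comparing against its threshold.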

responsible-recommendations.ts
// ================================================================
// POLICY-BASED FILTERING
// ================================================================
 
class SafetyFilter {
  async filter(videos: RankedVideo[], user: User): Promise<RankedVideo[]> {
    const filtered: RankedVideo[] = [];
    
    for (const video of videos) {
      const policy = await this.getPolicyStatus(video.videoId);
      
      // Remove policy-violating content
      if (policy.violated) continue;
      
      // Demote borderline content (not removed, but lower ranked)
      if (policy.borderline) {
        video.finalScore *= 0.3; // Heavy demotion
      }
      
      // Apply age restrictions
      if (policy.ageRestricted && user.age < 18) continue;
      
      // Apply user's content preferences
      if (user.restrictedMode && policy.matureContent) continue;
      
      filtered.push(video);
    }
    
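    // Re-sort so that demoted borderline videos actually move down the list
    return filtered.sort((a, b) => b.finalScore - a.finalScore);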
  }
}
 
// ================================================================
// RADICALIZATION PREVENTION
// ================================================================
 
class RadicalizationMonitor {
  async checkRecommendationPath(
    session: Session,
    newRecommendations: RankedVideo[]
  ): Promise<RankedVideo[]> {
    // Track content 'direction' in session
    const sessionPath = await this.getSessionContentPath(session);
    
    // Detect if trending toward extreme content
    const extremityTrend = this.computeExtremityTrend(sessionPath);
    
    if (extremityTrend > this.THRESHOLD) {
      // Intervene: inject moderate/authoritative content
      const intervention = await this.getInterventionContent(
        sessionPath,
        { type: 'authoritative', topic: sessionPath.mainTopic }
      );
      
      // Replace some recommendations with intervention
      return this.blend([
        { weight: 0.6, content: newRecommendations },
        { weight: 0.4, content: intervention },
      ]);
    }
    
    return newRecommendations;
  }
}
 
// ================================================================
// CREATOR FAIRNESS
// ================================================================
 
class CreatorFairnessHandler {
  async adjustForFairness(videos: RankedVideo[]): Promise<RankedVideo[]> {
    // Ensure small/new creators get some exposure
    const smallCreatorSlots = Math.ceil(videos.length * 0.1); // 10% of slots
    
    const smallCreatorVideos = videos.filter(v => 
      v.creatorSubscribers < 10000 && v.qualityScore > 0.6
    );
    
    const largeCreatorVideos = videos.filter(v => 
      v.creatorSubscribers >= 10000
    );
    
    // Interleave to ensure small creators appear
    return this.interleave(
      largeCreatorVideos.slice(0, videos.length - smallCreatorSlots),
      smallCreatorVideos.slice(0, smallCreatorSlots)
    );
  }
}
 
// ================================================================
// WELLBEING FEATURES
// ================================================================
 
class WellbeingHandler {
  async adjustForWellbeing(
    videos: RankedVideo[],
    session: Session
  ): Promise<RecommendationResponse> {
    const response: RecommendationResponse = { videos };
    
    // Check session length
    if (session.duration > 2 * 3600) { // 2+ hours
      // Prompt to take a break
      response.wellbeingPrompt = {
        type: 'break_reminder',
        message: "You've been watching for a while. Consider taking a break.",
      };
    }
    
    // Check time of day
    const userLocalHour = this.getUserLocalHour(session);
    if (userLocalHour >= 0 && userLocalHour < 5) { // Late night
      // Suggest bedtime reminder if enabled
      if (session.user.bedtimeReminders) {
        response.wellbeingPrompt = {
          type: 'bedtime_reminder',
          message: "It's getting late. Set a reminder to stop watching?",
        };
      }
    }
    
    return response;
  }
}
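The file above covers safety filtering, radicalization checks, creator fairness, and wellbeing. It does not cover the transparency principle from the list. Below is a minimal sketch of what that could look like. It assumes a hypothetical `Reason` union that the ranker attaches to each recommendation. It also assumes a hypothetical `PreferenceControls` store that records the user's "Not interested" and "Don't recommend channel" choices. These names are illustrative and are not YouTube's actual API.

```typescript
// Hypothetical machine-readable reasons a ranker might attach to each recommendation
type Reason =
  | { kind: 'subscribed'; channel: string }
  | { kind: 'similar_to_watched'; videoTitle: string }
  | { kind: 'popular_in_region'; region: string };

interface ExplainedVideo {
  videoId: string;
  channel: string;
  reason: Reason;
}

// Turn a structured reason into the short "why am I seeing this?" text shown to the user
function explain(reason: Reason): string {
  switch (reason.kind) {
    case 'subscribed':
      return `From ${reason.channel}, a channel you subscribe to`;
    case 'similar_to_watched':
      return `Because you watched "${reason.videoTitle}"`;
    case 'popular_in_region':
      return `Popular in ${reason.region}`;
  }
}

// User-controlled preferences that the serving layer applies before returning results
class PreferenceControls {
  private notInterested = new Set<string>();   // video IDs the user dismissed
  private blockedChannels = new Set<string>(); // channels the user asked not to see

  markNotInterested(videoId: string): void {
    this.notInterested.add(videoId);
  }

  dontRecommendChannel(channel: string): void {
    this.blockedChannels.add(channel);
  }

  // Drop any recommendation the user has explicitly opted out of
  apply(videos: ExplainedVideo[]): ExplainedVideo[] {
    return videos.filter(
      v => !this.notInterested.has(v.videoId) && !this.blockedChannels.has(v.channel)
    );
  }
}
```

One option is to keep the reason as structured data rather than a pre-rendered string. That way the same value can drive the UI label, logging, and audits.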

Engagement vs. Wellbeing Tension

Maximizing watch time and maximizing user wellbeing are sometimes in conflict. Responsible platforms must make deliberate tradeoffs, accepting some engagement loss for healthier usage patterns.
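One common way to make that tradeoff explicit is to rank on a weighted sum of an engagement prediction and a wellbeing or satisfaction prediction. The sketch below assumes both predictions are already normalized to [0, 1]. The `satisfaction` signal (for example, a predicted post-watch survey rating) and the weight `lambda` are hypothetical. The sketch also measures how much top-k engagement is given up relative to ranking on engagement alone.

```typescript
interface Candidate {
  videoId: string;
  engagement: number;   // predicted engagement, normalized to [0, 1] (assumed)
  satisfaction: number; // predicted satisfaction/wellbeing signal in [0, 1] (hypothetical)
}

// Linear scalarization: lambda = 0 ranks purely on engagement, lambda = 1 purely on satisfaction
function blendedScore(c: Candidate, lambda: number): number {
  return (1 - lambda) * c.engagement + lambda * c.satisfaction;
}

// Sort a copy of the candidates by blended score, highest first
function rank(candidates: Candidate[], lambda: number): Candidate[] {
  return [...candidates].sort((a, b) => blendedScore(b, lambda) - blendedScore(a, lambda));
}

// Engagement lost in the top-k under `lambda`, compared with the pure-engagement ranking
function engagementCost(candidates: Candidate[], lambda: number, k: number): number {
  const sumTop = (list: Candidate[]) =>
    list.slice(0, k).reduce((sum, c) => sum + c.engagement, 0);
  return sumTop(rank(candidates, 0)) - sumTop(rank(candidates, lambda));
}
```

Consider two videos: a clickbait-style video (engagement 0.9, satisfaction 0.2) and an explainer (0.7, 0.8). With `lambda = 0`, the clickbait video ranks first. With `lambda = 0.5`, the order flips, since the blended scores are 0.55 vs 0.75. The cost is 0.2 of engagement in the top slot.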

Recommendation Engine Summary

We've explored the architecture, algorithms, and responsible-design considerations behind a large-scale video recommendation system. Let's consolidate the key takeaways:

Key Design Decisions

•Multi-stage pipeline — Candidate generation (fast, high recall), ranking (expensive, high precision), re-ranking (business rules). Each stage progressively narrows options.
•Multiple candidate sources — Collaborative filtering, content-based, trending, subscriptions. Parallel generators capture different relevance signals.
•Deep ranking models — Multi-task learning predicting click, watch time, likes, shares. Combine objectives based on business goals.
•Feature engineering — Pre-computed features in feature stores. Real-time features computed at request time. Interaction features (user × item crosses) are highly predictive.
•Low-latency serving — Feature stores, GPU batching, model optimization, caching. Total latency < 100ms for good UX.
•Cold start handling — Demographic similarity for new users, creator track record for new videos, exploration bonuses for both.
•Responsible design — Safety filtering, radicalization prevention, creator fairness, wellbeing features. Engagement isn't the only objective.
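The first decision, the multi-stage funnel, can be sketched end to end. This is a minimal illustration under assumed interfaces. `CandidateGenerator`, `Ranker`, and `ReRanker` are hypothetical function types. `capPerChannel` is just one example of a business rule, not a production policy.

```typescript
interface CandidateRef {
  videoId: string;
  source: string; // which generator produced it, e.g. "subscriptions" or "trending"
}

interface ScoredVideo {
  videoId: string;
  channel: string;
  score: number;
}

// Stage 1: cheap, high-recall generators (collaborative filtering, trending, subscriptions, ...)
type CandidateGenerator = (userId: string, limit: number) => Promise<CandidateRef[]>;
// Stage 2: expensive, high-precision model that scores the merged candidate pool
type Ranker = (userId: string, videoIds: string[]) => Promise<ScoredVideo[]>;
// Stage 3: business-rule re-rankers applied to the ranked list
type ReRanker = (ranked: ScoredVideo[]) => ScoredVideo[];

interface PipelineLimits {
  perSourceLimit: number; // candidates requested from each generator
  rankLimit: number;      // ranked videos that survive into re-ranking
  finalK: number;         // slots returned to the client
}

async function recommend(
  userId: string,
  generators: CandidateGenerator[],
  ranker: Ranker,
  reRankers: ReRanker[],
  limits: PipelineLimits
): Promise<ScoredVideo[]> {
  // Stage 1: run generators in parallel, then merge and de-duplicate by videoId
  const batches = await Promise.all(generators.map(g => g(userId, limits.perSourceLimit)));
  const merged = ([] as CandidateRef[]).concat(...batches);
  const uniqueIds = Array.from(new Set(merged.map(c => c.videoId)));

  // Stage 2: score the merged pool and keep the best `rankLimit`
  const ranked = (await ranker(userId, uniqueIds))
    .sort((a, b) => b.score - a.score)
    .slice(0, limits.rankLimit);

  // Stage 3: apply business rules in order, then truncate to the final slot count
  const reRanked = reRankers.reduce((list, rule) => rule(list), ranked);
  return reRanked.slice(0, limits.finalK);
}

// Example re-ranking rule: at most `max` videos per channel, for diversity
function capPerChannel(max: number): ReRanker {
  return ranked => {
    const counts = new Map<string, number>();
    return ranked.filter(v => {
      const seen = counts.get(v.channel) ?? 0;
      counts.set(v.channel, seen + 1);
      return seen < max;
    });
  };
}
```

Passing the stages in as functions keeps each one independently replaceable. A new candidate source or re-ranking rule can be added without touching the ranker.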
