TikTok's For You Page (FYP) is often described as 'addictive' or 'uncannily accurate.' Users report the algorithm understanding their preferences better than they understand themselves. This isn't hyperbole—it's the result of perhaps the most sophisticated content recommendation system ever built for consumer media.

What makes the FYP special?

- Interest-first, not social-first: Unlike Instagram or Facebook, you don't need to follow anyone to see great content
- Cold start excellence: New users get compelling content within 5 videos; new creators can go viral with zero followers
- Real-time adaptation: Watch 3 cooking videos, and your feed immediately shifts to include more cooking
- Content-level personalization: It's not just 'you like cooking'—it's 'you like sous vide techniques from home chefs, not professional chefs, in evening posting times'
- Serendipity preservation: Even perfect personalization gets boring; the algorithm intentionally introduces novel content

This page dissects how to build a recommendation system that achieves these properties at 6+ million requests per second.
By the end of this page, you will understand: (1) Multi-stage retrieval and ranking architecture, (2) User and content embedding approaches, (3) Real-time feature stores for online ML, (4) Cold start strategies for users and content, (5) Exploration-exploitation trade-offs, and (6) A/B testing infrastructure for recommendation systems.
Before diving into architecture, we must understand the philosophical approach that distinguishes TikTok's recommendations from competitors.
The Fundamental Equation

At its core, TikTok's recommendation problem can be framed as:

```
Score(user, video) = P(engagement | user_features, video_features, context_features)
```

Where engagement is a weighted combination of:

- Completion rate: Did they watch to the end? (highest weight)
- Rewatch: Did they watch it multiple times? (very high signal)
- Share: Did they share externally? (high intent signal)
- Save to Favorites: Did they bookmark it?
- Like: Did they explicitly like it?
- Comment: Did they engage in discussion?
- Follow creator: Did they follow after watching?
- Negative signals: Did they scroll away quickly? Long press and select 'Not Interested'?

The model predicts the probability of these engagement actions and combines them into a ranking score.
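As a concrete, simplified sketch of this combination: the per-action probabilities a ranking model emits can be collapsed into a single score with a weighted sum. The weights below are illustrative assumptions (the real values are proprietary), and the names `ENGAGEMENT_WEIGHTS` and `ranking_score` are invented for this example:

```python
from typing import Dict

# Illustrative weights; the real values are proprietary and these are assumed.
ENGAGEMENT_WEIGHTS: Dict[str, float] = {
    "completion": 1.0,     # watched to the end (highest weight)
    "rewatch": 0.9,
    "share": 0.7,
    "follow": 0.6,
    "favorite": 0.5,
    "comment": 0.4,
    "like": 0.3,
    "skip": -0.8,          # scrolled away quickly (negative signal)
    "not_interested": -1.0,
}

def ranking_score(predicted_probs: Dict[str, float]) -> float:
    """Collapse per-action engagement probabilities into one ranking score."""
    return sum(
        ENGAGEMENT_WEIGHTS[action] * p
        for action, p in predicted_probs.items()
        if action in ENGAGEMENT_WEIGHTS
    )

score = ranking_score({"completion": 0.8, "like": 0.2, "skip": 0.1})
```

Note how a high completion probability dominates the score even when the predicted like probability is modest, matching the signal hierarchy described above.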
Likes are cheap—one tap. Shares require social risk. But watching a 30-second video to completion is a strong signal of genuine interest. Rewatching is even stronger. TikTok's short format makes completion rate a dense, high-quality signal unavailable to long-form platforms.
With 500+ million active videos and 1 billion users, scoring every video for every request is computationally impossible. Instead, TikTok uses a funnel architecture that progressively narrows candidates while applying increasingly sophisticated (and expensive) models at each stage.
| Stage | Input Size | Output Size | Latency | Model Complexity |
|---|---|---|---|---|
| Candidate Retrieval | 500M+ videos | ~10,000 | 10ms | Approximate nearest neighbor (ANN) |
| Pre-Filtering | 10,000 | 2,000 | 5ms | Rule-based, bloom filters |
| Coarse Ranking | 2,000 | 500 | 20ms | GBDT (LightGBM/XGBoost) |
| Fine Ranking | 500 | 100 | 30ms | Deep neural network (DNN) |
| Re-Ranking | 100 | 20 | 10ms | Rule-based + lightweight ML |
Total ranking latency must be <100ms to maintain a smooth scrolling experience. Pre-fetching the next batch while the user watches the current video hides most of the latency, but if ranking takes >150ms, users perceive stutter when scrolling quickly.
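The funnel above can be sketched as a generic top-k cascade, where each stage re-scores the survivors of the previous stage with a progressively more expensive model. This is a toy illustration under stated assumptions: `Stage` and the lambda scorers are stand-ins, not TikTok's actual retrieval or ranking models:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    output_size: int
    scorer: Callable[[str], float]  # cheap at the top, expensive at the bottom

def run_funnel(candidates: List[str], stages: List[Stage]) -> List[str]:
    """Each stage re-scores the survivors and keeps only its top-k."""
    for stage in stages:
        candidates = sorted(candidates, key=stage.scorer, reverse=True)[
            : stage.output_size
        ]
    return candidates

# Toy deterministic scorers standing in for ANN distance, GBDT, and DNN models.
stages = [
    Stage("coarse", 5, lambda v: len(v)),               # prefer longer ids
    Stage("fine", 2, lambda v: -int(v.split("_")[1])),  # prefer low index
]
feed = run_funnel([f"video_{i}" for i in range(20)], stages)  # 20 -> 5 -> 2
```

The design point is that the expensive "fine" scorer only ever sees what the cheap "coarse" scorer let through, which is exactly how the 500M-to-20 funnel keeps total latency under budget.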
The retrieval stage must quickly identify ~10,000 potentially relevant videos from a pool of 500M+. This is achieved through multiple retrieval triggers that each contribute candidates, later merged and deduplicated.
```python
import asyncio
import numpy as np
from typing import List, Set
from dataclasses import dataclass

# Note: load_ann_index, load_cf_model, TrendingCache, and the sampling
# helpers are assumed to exist elsewhere in the codebase.

@dataclass
class RetrievalConfig:
    embedding_candidates: int = 5000      # From ANN embedding search
    collaborative_candidates: int = 2000  # From CF item-item
    trending_candidates: int = 1000       # From regional trending
    following_candidates: int = 1000      # From followed creators
    exploration_candidates: int = 1000    # Random exploration

class CandidateRetrieval:
    def __init__(self, config: RetrievalConfig):
        self.config = config
        self.ann_index = load_ann_index()  # FAISS/ScaNN index
        self.cf_model = load_cf_model()
        self.trending_cache = TrendingCache()

    async def retrieve_candidates(
        self,
        user_id: str,
        user_embedding: np.ndarray,
        recent_watches: List[str],
        followed_creators: List[str]
    ) -> Set[str]:
        """Retrieve candidate videos from multiple sources."""
        # Parallel retrieval from all sources
        results = await asyncio.gather(
            self._embedding_retrieval(user_embedding),
            self._collaborative_retrieval(recent_watches),
            self._trending_retrieval(user_id),
            self._following_retrieval(followed_creators),
            self._exploration_retrieval()
        )

        # Merge and deduplicate
        all_candidates = set()
        for candidate_list in results:
            all_candidates.update(candidate_list)

        # Remove already-watched videos
        all_candidates -= set(recent_watches)
        return all_candidates  # ~10,000 candidates

    async def _embedding_retrieval(
        self, user_embedding: np.ndarray
    ) -> List[str]:
        """Approximate nearest neighbor search."""
        video_ids, distances = self.ann_index.search(
            user_embedding,
            k=self.config.embedding_candidates
        )
        return video_ids

    async def _exploration_retrieval(self) -> List[str]:
        """Sample from exploration pool for new content discovery."""
        # Stratified sample: 50% new videos (<24h), 50% random
        new_videos = sample_recent_videos(
            count=self.config.exploration_candidates // 2
        )
        random_videos = sample_random_videos(
            count=self.config.exploration_candidates // 2
        )
        return new_videos + random_videos
```

500M video embeddings at 128 dimensions = 256GB of vectors.
FAISS with IVF+PQ can compress to ~50GB while maintaining 95% recall. Distributed across multiple shards with routing based on user location. Index updates every 15-30 minutes for new content; real-time updates for extremely viral videos.
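To make the storage math and the baseline that ANN approximates concrete, here is a small numpy sketch. The byte counts mirror the figures quoted above (128-dim float32 vectors; 32 bytes of PQ codes per vector is an assumed configuration), and `exact_nn` is the brute-force inner-product search that a FAISS IVF+PQ index approximates at a fraction of the cost:

```python
import numpy as np

DIM = 128
NUM_VIDEOS = 500_000_000

# Full-precision storage: 500M x 128 dims x 4 bytes (float32) = 256 GB.
full_bytes = NUM_VIDEOS * DIM * 4
# PQ with 32 subquantizers at 8 bits stores 32 bytes of codes per vector.
pq_bytes = NUM_VIDEOS * 32

def exact_nn(query: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Brute-force inner-product search; ANN indexes approximate this."""
    scores = index @ query
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
index = rng.normal(size=(1_000, DIM)).astype(np.float32)
query = index[42]                  # a stored vector is its own nearest neighbor
top = exact_nn(query, index, k=5)
```

Brute force is O(N·D) per query, which is why it is only feasible here for a toy 1,000-vector index; at 500M vectors the IVF coarse quantizer restricts the search to a handful of cells, trading the last few points of recall for orders-of-magnitude speedup.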
The ranking models consume hundreds of features that capture user preferences, video characteristics, and contextual signals. Careful feature engineering is often more impactful than model architecture improvements.
| Category | Example Features | Update Frequency | Storage |
|---|---|---|---|
| User Static | Age, gender, country, language, account age | Rarely | User profile DB |
| User Historical | Genre preferences, avg watch time, favorite sounds | Daily batch | Feature store |
| User Session | Recent watches, current mood, time since last engagement | Real-time | Session cache |
| Video Static | Duration, resolution, has_text, music_id, creator_id | At upload | Video metadata DB |
| Video Engagement | Total views, like rate, completion rate, share rate | Near-real-time | Engagement store |
| Video Content | Visual embedding, audio embedding, text transcript | At upload | Feature store |
| Cross Features | User-creator affinity, user-genre history, time-of-day patterns | Computed online | Calculated at inference |
| Contextual | Device type, network quality, time of day, day of week | Request-time | Request context |
Features are served from a unified feature store (Feast, Tecton, or custom) that provides: (1) Low-latency online serving (<5ms), (2) Point-in-time correct training data, (3) Feature versioning and lineage, (4) Streaming updates for real-time features. The feature store is the backbone of ML infrastructure at scale.
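A toy in-memory version makes the two read paths concrete. `MiniFeatureStore` below is a hypothetical sketch, not the Feast or Tecton API: `get_online` serves the latest value (the low-latency serving path), while `get_as_of` returns the value as of a training timestamp, which is what "point-in-time correct" means in practice:

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

class MiniFeatureStore:
    """Toy feature store: online reads plus point-in-time-correct reads."""

    def __init__(self) -> None:
        # entity -> feature -> append-only, time-ordered (event_time, value) log
        self._log: Dict[str, Dict[str, List[Tuple[float, Any]]]] = \
            defaultdict(lambda: defaultdict(list))

    def write(self, entity: str, feature: str, value: Any,
              event_time: float) -> None:
        self._log[entity][feature].append((event_time, value))

    def get_online(self, entity: str, feature: str) -> Any:
        """Serving path: the latest value wins."""
        return self._log[entity][feature][-1][1]

    def get_as_of(self, entity: str, feature: str, ts: float) -> Optional[Any]:
        """Training path: last value at or before ts, so no label leakage."""
        valid = [v for t, v in self._log[entity][feature] if t <= ts]
        return valid[-1] if valid else None

fs = MiniFeatureStore()
fs.write("user_1", "avg_watch_time", 12.0, event_time=100.0)
fs.write("user_1", "avg_watch_time", 15.5, event_time=200.0)
online = fs.get_online("user_1", "avg_watch_time")         # 15.5
training = fs.get_as_of("user_1", "avg_watch_time", 150.0)  # 12.0
```

The `get_as_of` path is the subtle one: if training examples were joined against the latest feature values instead, the model would train on information from the future and its offline metrics would not transfer to serving.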
The fine ranking stage uses deep neural networks to predict engagement probability. TikTok's models evolved through several generations, each improving personalization quality.
Serving a deep neural network to 6 million requests per second requires: (1) Model quantization (FP32 → INT8) for 4x throughput, (2) Batched inference to maximize GPU utilization, (3) Model partitioning across multiple GPUs, (4) Aggressive caching of user/video embeddings, (5) Early exit for obvious high/low-score candidates. TensorRT, TF Serving, or custom inference engines are essential.
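Of these techniques, quantization is the easiest to illustrate in isolation. The sketch below shows symmetric per-tensor FP32 to INT8 quantization, a simplified version of what engines like TensorRT perform (real deployments typically calibrate per-channel scales on representative data):

```python
from typing import Tuple
import numpy as np

def quantize_int8(weights: np.ndarray) -> Tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: 4x smaller than FP32."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = float(np.abs(dequantize(q, scale) - w).max())
```

Storage drops 4x (1 byte per weight instead of 4), and INT8 matrix multiplies run at several times FP32 throughput on hardware with integer tensor cores, which is where the quoted throughput gain comes from.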
Cold start is simultaneously TikTok's greatest challenge and greatest achievement. The platform must provide excellent recommendations for:

- New users: No watch history, preferences unknown
- New videos: No engagement data, quality unknown
- New creators: No track record, audience unknown

TikTok's superiority in cold start is a major competitive moat.
```python
import numpy as np
from scipy.stats import beta

class VideoExplorationScore:
    """
    Thompson Sampling for video exploration.
    Each video maintains (successes, failures) = (engagements, non-engagements).
    Sample from Beta distribution to balance explore/exploit.
    """
    def __init__(self, prior_alpha: float = 1.0, prior_beta: float = 1.0):
        # Prior represents our belief before any observations
        # (1, 1) = uniform prior, no strong belief
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta

    def sample_score(
        self,
        engagements: int,
        impressions: int,
        decay_factor: float = 0.95  # Recent data weighted more
                                    # (not applied in this simplified sketch)
    ) -> float:
        """
        Sample expected engagement rate using Thompson Sampling.
        Higher uncertainty = higher exploration bonus.
        """
        non_engagements = impressions - engagements
        alpha = self.prior_alpha + engagements
        beta_param = self.prior_beta + non_engagements
        # Sample from posterior Beta distribution
        sampled_rate = beta.rvs(alpha, beta_param)
        return sampled_rate

    def should_explore(
        self,
        video_age_hours: float,
        impressions: int,
        min_impressions: int = 100
    ) -> bool:
        """
        Videos under min_impressions should always be explored.
        Freshness boost for new content.
        """
        if impressions < min_impressions:
            return True
        # Freshness factor: new videos get exploration boost
        freshness_urgency = max(0, 24 - video_age_hours) / 24
        exploration_probability = 0.1 + 0.3 * freshness_urgency
        return np.random.random() < exploration_probability
```

Every video shown for exploration is a slot NOT given to a proven high-engagement video. This 'exploration tax' reduces short-term engagement metrics. But without it, new creators can't surface, and the content ecosystem stagnates. TikTok accepts ~10-15% of impressions going to exploration to maintain content diversity.
A defining feature of TikTok is how quickly the algorithm adapts: watch 3 cooking videos, and your next scroll shows more cooking. This requires real-time signal processing and model adaptation that most recommendation systems lack.
Session State Management

The key innovation is maintaining a session preference vector that updates in real-time:

```
Session Start:  session_pref = user_long_term_pref
After Video 1:  session_pref = 0.9 * session_pref + 0.1 * video_1_features
After Video 2:  session_pref = 0.9 * session_pref + 0.1 * video_2_features
...
```

This exponential moving average rapidly incorporates recent signals while maintaining stability. The session preference vector is passed to the ranking model alongside static user preferences, allowing the model to capture 'what does this user want right now?' separate from 'what does this user generally like?'
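The EMA update is a few lines of numpy. The 0.9/0.1 blend matches the recurrence above, while the 4-dimensional vectors and the 'cooking' direction are illustrative values, not production constants:

```python
import numpy as np

def update_session_pref(session_pref: np.ndarray,
                        video_features: np.ndarray,
                        alpha: float = 0.1) -> np.ndarray:
    """Exponential moving average: blend in the video just watched."""
    return (1 - alpha) * session_pref + alpha * video_features

long_term_pref = np.zeros(4)               # session starts from long-term pref
session = long_term_pref.copy()
cooking = np.array([1.0, 0.0, 0.0, 0.0])   # hypothetical 'cooking' direction

for _ in range(3):                         # watch three cooking videos
    session = update_session_pref(session, cooking)

# session[0] is now 1 - 0.9**3 = 0.271: the feed tilts toward cooking,
# while the untouched dimensions stay anchored to long-term preferences.
```

Three watches move the session vector about 27% of the way toward the new interest, which is why the very next scroll can already look different without discarding what the user generally likes.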
End-to-end latency from user action to updated ranking must be <500ms. User likes a video → next scroll includes similar content. This requires: (1) Edge-to-backend event delivery <100ms, (2) Stream processing update <200ms, (3) Feature store propagation <100ms, (4) Ranking service reads fresh features on next request.
Pure engagement optimization creates filter bubbles—users see only what they've already expressed interest in, missing potentially engaging new content. This is both an ethical concern (echo chambers) and a business concern (user boredom leads to churn).
```python
import numpy as np
from typing import List, Dict
from dataclasses import dataclass

@dataclass
class Video:
    id: str
    category: str
    creator_id: str
    sound_id: str
    score: float
    embedding: np.ndarray

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def maximal_marginal_relevance(
    candidates: List[Video],
    selected: List[Video],
    lambda_param: float = 0.7  # Balance relevance vs diversity
) -> Video:
    """
    MMR: Select video that maximizes:
    lambda * relevance(v) - (1 - lambda) * max_similarity(v, selected)
    """
    if not selected:
        return max(candidates, key=lambda v: v.score)

    best_video = None
    best_mmr = float('-inf')
    for candidate in candidates:
        relevance = candidate.score
        # Max similarity to already selected videos
        max_sim = max(
            cosine_similarity(candidate.embedding, s.embedding)
            for s in selected
        )
        mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
        if mmr_score > best_mmr:
            best_mmr = mmr_score
            best_video = candidate
    return best_video

def apply_diversity_constraints(
    ranked_videos: List[Video],
    final_count: int = 20
) -> List[Video]:
    """Apply business rules for diversity."""
    selected = []
    category_counts: Dict[str, int] = {}
    creator_counts: Dict[str, int] = {}
    MAX_PER_CATEGORY = 4  # No more than 4 videos per category
    MAX_PER_CREATOR = 2   # No more than 2 videos per creator

    for video in ranked_videos:
        if len(selected) >= final_count:
            break
        # Check constraints
        cat_count = category_counts.get(video.category, 0)
        creator_count = creator_counts.get(video.creator_id, 0)
        if cat_count >= MAX_PER_CATEGORY:
            continue
        if creator_count >= MAX_PER_CREATOR:
            continue
        selected.append(video)
        category_counts[video.category] = cat_count + 1
        creator_counts[video.creator_id] = creator_count + 1
    return selected
```

Diversity reduces short-term engagement (showing non-optimal content) but increases long-term retention (preventing boredom). A/B tests show 5-10% diversity injection reduces daily engagement by ~2% but improves 30-day retention by ~5%. The tradeoff is worth it for sustainable platform health.
TikTok runs hundreds of concurrent A/B experiments on its recommendation system. Every model change, feature addition, or parameter tweak is rigorously tested before full rollout. This experimentation infrastructure is as critical as the models themselves.
Typical Experiment Lifecycle

1. Hypothesis → 'Adding audio similarity features will improve completion rate'
2. Design → 1% treatment vs 1% control, 14-day duration
3. Launch → Feature flag enabled, metrics collection starts
4. Monitoring → Daily metric checks, guardrail alerts
5. Analysis → Statistical significance reached, lift measured
6. Decision → Ship (if positive), iterate (if neutral), kill (if negative)
7. Rollout → Gradual expansion: 1% → 10% → 50% → 100%

Velocity: TikTok runs ~1,000 experiments per quarter on recommendations alone, with ~100 concurrent at any time.
With 100+ concurrent experiments, interference is inevitable. User in experiment A might also be in experiments B, C, D. Interaction effects can confound results. Mitigation: (1) Orthogonal experiment layers, (2) Interaction effect modeling, (3) Larger holdout groups for cross-experiment validation.
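Orthogonal experiment layers are commonly implemented by salting the bucketing hash with a layer name, so a user's bucket in one layer is statistically independent of their bucket in any other. A minimal sketch (the layer names and percentages here are hypothetical):

```python
import hashlib

def bucket(user_id: str, layer: str, num_buckets: int = 100) -> int:
    """Hash (layer, user) so assignments are independent across layers."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

def assignment(user_id: str, layer: str, treatment_pct: int) -> str:
    """Deterministic per-layer split: same user, same layer, same arm."""
    return "treatment" if bucket(user_id, layer) < treatment_pct else "control"

# The same user gets an independent bucket draw in each experiment layer,
# so being in one treatment does not bias their chance of being in another.
arm_ranking = assignment("user_42", layer="ranking_model", treatment_pct=1)
arm_audio = assignment("user_42", layer="audio_features", treatment_pct=1)
```

Determinism matters as much as independence: a user must see the same variant on every request for the experiment's metrics to be interpretable, which hashing gives for free without storing per-user assignments.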
Coming Up Next

The For You algorithm generates recommendations, but how do those recommendations translate into the engaging, instantaneous experience users feel? The next page explores Real-Time Engagement infrastructure—how likes, comments, and shares flow through the system, update counters at scale, and feed back into the recommendation loop.
You now understand the architecture of one of the most sophisticated recommendation systems in the world. The key insight: it's not a single model but an orchestrated system of retrieval, ranking, and re-ranking stages, each optimized for different tradeoffs. Real-time personalization and deliberate exploration are the differentiators.