The Instagram Explore page is fundamentally different from the Home Feed. While the Home Feed shows content from accounts you follow, Explore surfaces content from the entire Instagram ecosystem—billions of public posts from accounts you've never seen.
This is one of the hardest recommendation problems in consumer technology:
The Scale Challenge:
Unlike search (where users express intent explicitly), Explore must infer what users want based on their behavior. A successful Explore experience feels like Instagram "knows" you—surfacing content you didn't know you wanted.
By the end of this page, you will understand: (1) The multi-stage recommendation funnel from billions of candidates to dozens of recommendations, (2) Candidate generation strategies including embedding-based retrieval, (3) Multi-objective ranking that balances engagement, diversity, and safety, (4) Real-time personalization and interest modeling, (5) Cold-start handling for new users, (6) Safety and policy enforcement in recommendations, and (7) The infrastructure required for recommendations at 100K+ QPS.
Generating personalized recommendations from billions of candidates in <200ms is impossible with a single model pass. Instead, Explore uses a multi-stage funnel that progressively narrows candidates while increasing ranking precision.
The Funnel Stages:
| Stage | Input Size | Output Size | Model Complexity | Latency Budget |
|---|---|---|---|---|
| Candidate Generation | ~10B posts | ~10K | Embedding retrieval (ANN) | 30ms |
| First-Pass Ranking | ~10K | ~1K | Lightweight NN/GBDT | 15ms |
| Second-Pass Ranking | ~1K | ~200 | Heavy transformer | 25ms |
| Business Rules | ~200 | ~100 | Rules engine | 5ms |
| Diversity/Blending | ~100 | ~50 | Heuristics + ML | 10ms |
| Total | ~10B | ~50 | All stages combined | <100ms |
Why Multi-Stage?
Each stage trades off coverage for precision:
The key insight: running a heavy neural network over billions of candidates is impossible in real time. But running it on 1,000 pre-filtered candidates is feasible.
async def generate_explore_feed(user_id: str, context: RequestContext) -> ExploreResponse:
"""
Main Explore recommendation pipeline.
"""
# Stage 1: Generate candidates from multiple sources
candidates = await generate_candidates(user_id, limit=10000)
# Stage 2: First-pass ranking (fast, broad filtering)
candidates = await first_pass_rank(user_id, candidates, limit=1000)
# Stage 3: Second-pass ranking (heavy model, precise scoring)
candidates = await second_pass_rank(user_id, candidates, limit=200)
# Stage 4: Apply business rules (policy, freshness, etc.)
candidates = await apply_business_rules(user_id, candidates, limit=100)
# Stage 5: Diversity and blending
final_posts = await apply_diversity(user_id, candidates, limit=50)
return ExploreResponse(
posts=final_posts,
impression_id=generate_impression_id(),
metadata=generate_metadata(candidates)
)
Every Explore load generates an impression_id that tracks which candidates were shown. This enables: (1) measuring engagement on recommended content, (2) deduplicating across sessions (don't show the same post twice), and (3) training data collection for model improvement.
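The dedup use case can be served by a per-user "seen" store with expiry. A minimal in-memory sketch (names like `SeenStore` are illustrative; production would back this with Redis sets or a bloom filter keyed by impression logs):

```python
import time
from typing import Dict, List, Optional, Tuple

class SeenStore:
    """Tracks which posts a user has already been shown, with expiry."""

    def __init__(self, ttl_seconds: float = 7 * 86400):
        self.ttl = ttl_seconds
        # (user_id, post_id) -> expiry timestamp
        self._seen: Dict[Tuple[str, str], float] = {}

    def mark_shown(self, user_id: str, post_ids: List[str],
                   now: Optional[float] = None) -> None:
        """Record an impression for each shown post."""
        now = time.time() if now is None else now
        for pid in post_ids:
            self._seen[(user_id, pid)] = now + self.ttl

    def filter_unseen(self, user_id: str, post_ids: List[str],
                      now: Optional[float] = None) -> List[str]:
        """Keep only candidates the user has not seen recently."""
        now = time.time() if now is None else now
        return [pid for pid in post_ids
                if self._seen.get((user_id, pid), 0.0) <= now]
```

The TTL matters: expiring "seen" entries after a week lets genuinely good content resurface without feeling repetitive session-to-session.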
Candidate generation is the most critical stage—if good content isn't in the candidate pool, no amount of ranking can surface it. Instagram uses multiple candidate sources that each contribute candidates based on different signals.
Candidate Sources:
Embedding-Based Retrieval Deep Dive:
The most powerful candidate source uses learned embeddings to represent users and content in a shared vector space:
# User embedding: 512-dimensional vector representing user interests
user_embedding = user_tower_model(user_features)
# Content embedding: 512-dimensional vector representing post content
content_embedding = content_tower_model(post_features)
# Relevance is measured by cosine similarity
similarity = cosine_similarity(user_embedding, content_embedding)
Two-Tower Architecture:
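The two-tower idea can be shown without any ML framework: each tower projects its own feature space into a shared embedding space, and relevance reduces to a dot product on normalized vectors. A dependency-free sketch (the tiny linear towers and weights are illustrative toys, not the real architecture):

```python
import math
from typing import List

def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    """Multiply a weight matrix by a feature vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def tower(weights: List[List[float]], features: List[float]) -> List[float]:
    """One linear 'tower': project into the shared space, then normalize."""
    return l2_normalize(matvec(weights, features))

def similarity(user_emb: List[float], content_emb: List[float]) -> float:
    """Cosine similarity is a plain dot product on normalized embeddings."""
    return sum(u * c for u, c in zip(user_emb, content_emb))

# Toy 2-d shared space: user features (3-d) and content features (2-d)
# live in different input spaces but land in the same output space.
user_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
content_w = [[0.5, 0.5], [0.0, 1.0]]

user_emb = tower(user_w, [3.0, 4.0, 7.0])   # normalizes [3, 4]
food_post = tower(content_w, [2.0, 0.0])    # normalizes [1, 0]
score = similarity(user_emb, food_post)
```

The practical payoff of this factorization: content embeddings can be precomputed offline and indexed, so only the user tower runs at request time.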
Approximate Nearest Neighbor (ANN) Search:
Finding the most similar embeddings among billions requires specialized infrastructure:
| Technology | Description | Use Case |
|---|---|---|
| FAISS (Facebook AI Similarity Search) | GPU-accelerated vector search | Real-time retrieval |
| ScaNN (Google) | Asymmetric hashing for efficiency | Large-scale retrieval |
| Pinecone/Milvus/Weaviate | Managed vector databases | Production deployments |
| HNSW (Hierarchical NSW) | Graph-based ANN | Billion-scale with low latency |
# Embedding retrieval with FAISS
import faiss
import numpy as np
from typing import Dict, List

class EmbeddingRetriever:
    def __init__(self):
        # Index contains all content embeddings (~10B vectors)
        # IVF (Inverted File) for partitioning, PQ (Product Quantization) for compression
        # Typical: IVF65536,PQ64 for billion-scale
        self.index = faiss.read_index("content_embeddings.index")
        # Maps FAISS row positions back to post IDs (loaded alongside the index)
        self.index_to_post_id: Dict[int, str] = load_index_mapping()

    def retrieve(self, user_embedding: np.ndarray, top_k: int = 5000) -> List[str]:
        """Find most similar content to user embedding."""
        # Normalize for cosine similarity
        user_embedding = user_embedding / np.linalg.norm(user_embedding)
        # Search index (returns top_k approximate nearest neighbors;
        # FAISS expects float32 inputs)
        distances, indices = self.index.search(
            user_embedding.reshape(1, -1).astype(np.float32),
            top_k
        )
        # Convert indices to post IDs (-1 marks empty result slots)
        return [self.index_to_post_id[idx] for idx in indices[0] if idx != -1]
New posts need embeddings computed and indexed before they can be retrieved. This creates a 'cold start' for new content. Solutions include: (1) Fast embedding computation as part of upload pipeline, (2) Near-real-time index updates, (3) Separate 'recent content' index with frequent rebuilds, and (4) Non-embedding candidate sources for very recent content.
Once candidates are generated, ranking models predict how likely the user is to engage with each post. Modern ranking uses deep learning models that predict multiple engagement outcomes simultaneously.
Multi-Task Learning Objectives:
Instead of predicting a single 'relevance' score, Instagram predicts multiple outcomes:
| Prediction Task | Weight | Why |
|---|---|---|
| P(like) | High | Strong positive signal |
| P(comment) | Medium | Deeper engagement |
| P(save) | High | Strong quality signal |
| P(share) | Medium | Viral potential |
| P(profile visit) | Medium | Discovery of creator |
| P(follow from post) | High | Strongest conversion |
| P(hide) | Negative | User doesn't want this content |
| P(report) | Strongly negative | Policy risk signal |
| Expected time spent | Low | Engagement depth |
import torch
import torch.nn as nn

class ExploreRankingModel(nn.Module):
    """
    Multi-task ranking model for Explore recommendations.
    Predicts multiple engagement outcomes from user-content pair features.
    """
    def __init__(self, input_dim=1024, hidden_dim=512):
        super().__init__()
        # Shared representation layers
        self.shared_layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_dim),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_dim),
        )
        # Task-specific heads
        self.like_head = nn.Linear(hidden_dim, 1)
        self.comment_head = nn.Linear(hidden_dim, 1)
        self.save_head = nn.Linear(hidden_dim, 1)
        self.share_head = nn.Linear(hidden_dim, 1)
        self.follow_head = nn.Linear(hidden_dim, 1)
        self.hide_head = nn.Linear(hidden_dim, 1)
        # Time spent (regression head)
        self.time_spent_head = nn.Linear(hidden_dim, 1)

    def forward(self, features):
        """
        Args:
            features: Combined user, content, and context features
        Returns:
            Dictionary of predicted probabilities and values
        """
        shared_repr = self.shared_layers(features)
        return {
            'p_like': torch.sigmoid(self.like_head(shared_repr)),
            'p_comment': torch.sigmoid(self.comment_head(shared_repr)),
            'p_save': torch.sigmoid(self.save_head(shared_repr)),
            'p_share': torch.sigmoid(self.share_head(shared_repr)),
            'p_follow': torch.sigmoid(self.follow_head(shared_repr)),
            'p_hide': torch.sigmoid(self.hide_head(shared_repr)),
            'expected_time_spent': torch.relu(self.time_spent_head(shared_repr)),
        }

def combine_predictions_to_score(predictions: dict) -> float:
    """
    Combine multi-task predictions into a single ranking score.
    Weights reflect relative value of each engagement type.
    """
    return (
        1.0 * predictions['p_like']
        + 2.0 * predictions['p_comment']
        + 3.0 * predictions['p_save']
        + 2.0 * predictions['p_share']
        + 5.0 * predictions['p_follow']
        - 10.0 * predictions['p_hide']  # Negative: penalize content user might hide
        + 0.1 * predictions['expected_time_spent']
    )

Feature Engineering:
The ranking model consumes hundreds of features categorized as:
| Category | Example Features | Source |
|---|---|---|
| User features | Following count, account age, activity patterns, interests | User profile service |
| Content features | Visual embedding, caption embedding, post age, engagement rate | Content service |
| Author features | Follower count, post frequency, avg engagement, verification | Author profile |
| User-Author affinity | Prior engagement, shared connections, topic overlap | Interaction history |
| Context features | Time of day, day of week, device, app version | Request context |
| Real-time features | Recent engagement velocity, trending status | Event stream |
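These feature groups arrive from different services and must be flattened into the fixed-order vector the model was trained on, with stable defaults for missing values. A hypothetical sketch (the feature names and defaults are invented for illustration):

```python
from typing import Dict, List, Tuple

# Fixed feature layout: the model was trained against exactly this order,
# so serving must assemble vectors the same way with the same defaults.
FEATURE_SPEC: List[Tuple[str, float]] = [
    ("user.following_count", 0.0),
    ("user.account_age_days", 0.0),
    ("author.follower_count", 0.0),
    ("content.age_hours", 0.0),
    ("affinity.prior_engagements", 0.0),
    ("context.hour_of_day", 12.0),
]

def assemble_features(groups: Dict[str, Dict[str, float]]) -> List[float]:
    """Flatten per-source feature groups into one model-input vector."""
    flat = {
        f"{group}.{name}": value
        for group, feats in groups.items()
        for name, value in feats.items()
    }
    # Missing features fall back to their training-time defaults
    return [flat.get(key, default) for key, default in FEATURE_SPEC]
```

A subtle failure mode this guards against: if a feature service is down, the model still receives a well-formed vector rather than crashing or silently shifting feature positions.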
Model Serving Infrastructure:
class RankingService:
"""Serves ranking predictions at high throughput."""
def __init__(self):
# Load optimized model (TensorRT, ONNX, etc.)
self.model = load_optimized_model("explore_ranker_v42")
self.feature_store = FeatureStore()
# GPU-based batch inference
self.batch_size = 64
self.gpu_device = torch.device("cuda:0")
async def rank_candidates(
self,
user_id: str,
candidates: List[str],
context: RequestContext
) -> List[RankedCandidate]:
"""Rank candidates by predicted engagement."""
# Batch fetch features
user_features = await self.feature_store.get_user_features(user_id)
content_features = await self.feature_store.batch_get_content_features(candidates)
# Combine features into model input
feature_batch = self.prepare_features(
user_features, content_features, context
)
# Batch inference on GPU
with torch.no_grad():
feature_tensor = torch.tensor(feature_batch).to(self.gpu_device)
predictions = self.model(feature_tensor)
# Compute final scores
scores = combine_predictions_to_score(predictions)
# Return ranked candidates
ranked = [
RankedCandidate(post_id=candidates[i], score=scores[i].item())
for i in range(len(candidates))
]
ranked.sort(key=lambda x: x.score, reverse=True)
return ranked
The ranking model runs on every Explore request—100K+ QPS. Model inference must complete in <30ms. This requires: (1) Model compression and quantization, (2) GPU-based serving with batching, (3) Aggressive feature caching, and (4) Careful model architecture to limit FLOPs.
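The batching piece can be sketched with asyncio: concurrent requests accumulate until the batch is full or a small wait expires, then one batched call serves them all. A simplified illustration, where a doubling function stands in for GPU inference and the thresholds are arbitrary:

```python
import asyncio
from typing import Callable, List, Tuple

class MicroBatcher:
    """Groups concurrent scoring requests into one batched model call."""

    def __init__(self, score_batch: Callable[[List[float]], List[float]],
                 max_batch: int = 64, max_wait_ms: float = 5.0):
        self.score_batch = score_batch  # stands in for a GPU model call
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000.0
        self._pending: List[Tuple[float, asyncio.Future]] = []
        self._timer = None

    async def score(self, features: float) -> float:
        """Enqueue one request; resolves when its batch is flushed."""
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((features, fut))
        if len(self._pending) >= self.max_batch:
            self._flush()  # batch full: flush immediately
        elif self._timer is None:
            # Otherwise flush after max_wait, trading latency for batch size
            self._timer = asyncio.get_running_loop().call_later(
                self.max_wait, self._flush)
        return await fut

    def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        results = self.score_batch([x for x, _ in batch])  # one "GPU" call
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)

async def demo() -> List[float]:
    # Two concurrent requests share a single batched call
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs], max_batch=2)
    return list(await asyncio.gather(batcher.score(1.0), batcher.score(3.0)))
```

The `max_wait` knob is the classic batching trade-off: waiting longer yields bigger, more GPU-efficient batches at the cost of per-request latency.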
Effective recommendations require understanding user interests. Instagram builds interest profiles from user behavior that guide candidate generation and ranking.
Interest Signals:
| Signal | Strength | Recency Weight |
|---|---|---|
| Liked content | Strong | Decay over weeks |
| Saved content | Strongest | Slow decay |
| Commented content | Strong | Medium decay |
| Long-viewed content | Medium | Fast decay |
| Searched content | Strong | Fast decay |
| Followed accounts | Strong | Persistent |
| Shared content | Strong | Medium decay |
| Quickly scrolled past | Weak negative | Fast decay |
| Hidden content | Strong negative | Slow decay |
Interest Profile Structure:
interface UserInterestProfile {
  userId: string;
  lastUpdated: number;
  // Topic-level interests (hierarchical taxonomy)
  topicInterests: TopicScore[];  // e.g., [{topic: "food/cooking", score: 0.85}, ...]
  // Account-level affinities
  accountAffinities: AccountScore[];  // Similar accounts user might like
  // Content-level embeddings
  interestEmbedding: number[];  // 512-d vector aggregating engagement
  negativeEmbedding: number[];  // What user dislikes
  // Behavioral patterns
  preferredContentTypes: ContentTypeScore[];  // photo vs video vs carousel
  activeTimePatterns: HourOfDayScore[];
  preferredTopicsTime: TopicTimePattern[];  // e.g., news in morning, entertainment at night
  // Exploration vs exploitation balance
  explorationScore: number;  // How open to new topics is user
  // Short-term interests (session-level)
  sessionInterests: SessionInterest[];
}

interface TopicScore {
  topic: string;
  score: number;       // 0-1, normalized interest strength
  confidence: number;  // How confident based on signal volume
  lastEngagement: number;  // For recency weighting
}

interface SessionInterest {
  // Interests inferred from current session
  topics: TopicScore[];
  searchQueries: string[];
  viewedAccounts: string[];
  sessionDuration: number;
}

Interest Profile Updates:
Interest profiles are updated both in real-time and batch:
Real-time Updates (Session-level):
def update_session_interests(user_id: str, event: EngagementEvent):
"""Update session-level interests for immediate effect."""
session = get_session(user_id)
if event.type == 'like':
# Strong positive signal - boost topic interest
topics = get_content_topics(event.content_id)
for topic in topics:
session.boost_interest(topic, weight=1.0)
elif event.type == 'scroll_past_quickly':
# Weak negative - user not interested
topics = get_content_topics(event.content_id)
for topic in topics:
session.reduce_interest(topic, weight=0.1)
# Session interests affect next Explore load
cache_session(user_id, session)
Batch Updates (Daily):
@scheduled(daily)
def update_interest_profiles():
"""Nightly batch update of all user interest profiles."""
for user_id in all_active_users():
# Aggregate all engagement from past 90 days
engagements = get_user_engagements(user_id, days=90)
# Compute topic interests with time decay
topic_scores = {}
for eng in engagements:
topics = get_content_topics(eng.content_id)
decay = compute_time_decay(eng.timestamp, half_life_days=14)
weight = get_engagement_weight(eng.type) # like: 1.0, save: 2.0, etc.
for topic in topics:
topic_scores[topic] = topic_scores.get(topic, 0) + weight * decay
# Normalize and store
profile = build_interest_profile(user_id, topic_scores, engagements)
store_interest_profile(user_id, profile)
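The `compute_time_decay` helper referenced in the batch job can be plain exponential half-life decay. A minimal version, written here in terms of engagement age in days rather than a raw timestamp:

```python
def compute_time_decay(age_days: float, half_life_days: float = 14.0) -> float:
    """Engagement weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)

# With the engagement weights above: a fresh save contributes
# 2.0 * 1.0 = 2.0 to its topics, while a 28-day-old save contributes
# 2.0 * 0.25 = 0.5 -- recent behavior dominates the profile.
```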
Pure exploitation (only showing known interests) leads to filter bubbles. Instagram balances with exploration: occasionally showing content outside the user's interest profile to discover new interests. The exploration_score tracks how receptive users are to new topics—users who engage with diverse content get more exploration.
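One plausible way to derive such a score is the normalized entropy of the user's engaged-topic distribution: engagement concentrated on one topic yields 0, an even spread across topics yields 1. A sketch under that assumption (this entropy formulation is illustrative, not Instagram's documented method):

```python
import math
from collections import Counter
from typing import List

def exploration_score(engaged_topics: List[str]) -> float:
    """Normalized Shannon entropy of the user's engaged-topic distribution.

    0.0 = all engagement on one topic; 1.0 = evenly spread across topics.
    """
    counts = Counter(engaged_topics)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))  # normalize to [0, 1]
```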
The cold start problem occurs when the recommendation system lacks sufficient data about a user or content item. There are two variants:
- User Cold Start: New users have no engagement history
- Content Cold Start: New posts have no engagement signals
User Cold Start Solutions:
| Strategy | Description | When Used |
|---|---|---|
| Global popularity | Show trending/popular content | First session, no signals |
| Demographic inference | Infer interests from age, location, device | Registration time |
| Onboarding interests | Ask user to select interests explicitly | First app open |
| Social graph bootstrap | Use interests of connected users | If phone contacts synced |
| Rapid learning | Heavily weight first few engagements | First 24 hours |
| Cross-platform signals | Use Facebook interests if connected | For FB-linked accounts |
async def get_explore_for_new_user(user_id: str, context: RequestContext) -> List[Post]:
    """
    Generate Explore recommendations for a user with minimal history.
    Uses multiple fallback strategies.
    """
    user = await get_user(user_id)
    # Check for any engagement history
    engagement_count = await get_engagement_count(user_id)
    if engagement_count == 0:
        # Complete cold start - use global signals
        return await cold_start_strategy_zero_engagement(user, context)
    elif engagement_count < 10:
        # Warming up - blend cold start with early signals
        cold_candidates = await cold_start_candidates(user, context)
        warm_candidates = await early_signal_candidates(user_id)
        return blend_candidates(
            cold_candidates,
            warm_candidates,
            warm_weight=engagement_count / 10  # Gradually increase warm weight
        )
    else:
        # Enough signal for standard pipeline
        return await standard_explore_pipeline(user_id, context)

async def cold_start_strategy_zero_engagement(user: User, context: RequestContext) -> List[Post]:
    """Generate recommendations with zero user engagement data."""
    candidates = []
    # 1. Global trending content
    trending = await get_trending_posts(region=user.country, limit=20)
    candidates.extend(trending)
    # 2. High-quality evergreen content (reliably engaging)
    evergreen = await get_evergreen_content(limit=15)
    candidates.extend(evergreen)
    # 3. Inferred interests from registration
    if user.declared_interests:
        interest_content = await get_content_by_topics(user.declared_interests, limit=15)
        candidates.extend(interest_content)
    # 4. Similar users' popular content
    if user.phone_contacts_synced:
        friends = await get_registered_contacts(user.id)
        friend_favorites = await get_friends_liked_content(friends, limit=10)
        candidates.extend(friend_favorites)
    # Deduplicate and rank by popularity
    candidates = deduplicate(candidates)
    return sort_by_engagement_rate(candidates)[:50]

Content Cold Start Solutions:
| Strategy | Description | Trade-off |
|---|---|---|
| Content-based signals | Use visual/caption embeddings to match to users | Ignores engagement quality |
| Author-based inference | New posts from engaging authors get exploration budget | Favors established creators |
| Controlled exploration | Show new content to small sample, measure engagement | Delays virality |
| Time decay boosting | Give new content temporary rank boost | May surface low-quality content |
| Similar content proxy | Use engagement of visually similar older posts | Assumes visual similarity = interest similarity |
import math
from datetime import datetime

def get_new_content_exploration_score(post_id: str, post: Post) -> float:
    """
    Compute exploration boost for new content with no engagement data.
    """
    hours_since_post = (datetime.now() - post.created_at).total_seconds() / 3600
    if hours_since_post > 48:
        return 0.0  # No longer 'new'
    # Base exploration boost (exponential decay with a 12-hour time constant)
    time_boost = math.exp(-hours_since_post / 12)
    # Author quality signal
    author_quality = get_author_avg_engagement_rate(post.author_id)
    author_boost = min(author_quality / 0.05, 2.0)  # Cap at 2x
    # Content quality prediction (from visual model)
    predicted_quality = predict_content_quality(post.visual_embedding)
    quality_boost = predicted_quality  # 0-1 range
    return time_boost * (0.5 + 0.25 * author_boost + 0.25 * quality_boost)
Showing cold-start content to users is an 'exploration tax'—potentially worse recommendations in exchange for exploration data. Instagram carefully budgets exploration: it never dominates the feed, and it's allocated to users/contexts where suboptimal recommendations are least costly (e.g., users who browse Explore extensively anyway).
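Budgeting might look like a capped fraction of feed slots that scales with the user's receptiveness. A hypothetical sketch (the fractions are invented for illustration):

```python
def exploration_slot_count(feed_size: int, exploration_score: float,
                           min_frac: float = 0.02, max_frac: float = 0.15) -> int:
    """Number of feed slots given to cold-start/exploration content.

    Scales with the user's exploration_score but is always capped so
    exploration never dominates the feed.
    """
    clamped = max(0.0, min(1.0, exploration_score))
    frac = min_frac + (max_frac - min_frac) * clamped
    return int(feed_size * frac)
```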
Pure engagement optimization creates filter bubbles—users only see content matching narrow interests, missing broader discovery. Instagram actively counters this with diversity and freshness mechanisms.
Diversity Dimensions:
def apply_diversity_reranking(
    ranked_posts: List[RankedPost],
    target_count: int = 50
) -> List[RankedPost]:
    """
    Re-rank posts to ensure diversity while preserving quality.
    Uses Maximal Marginal Relevance (MMR) principle.
    """
    selected = []
    remaining = list(ranked_posts)
    # Track diversity dimensions
    topic_counts = Counter()
    creator_set = set()
    content_type_counts = Counter()
    while len(selected) < target_count and remaining:
        best_score = -float('inf')
        best_idx = 0
        for i, post in enumerate(remaining):
            # Original relevance score (from ranking)
            relevance = post.score
            # Diversity penalty based on what's already selected
            topic_penalty = sum(
                topic_counts[t] for t in post.topics
            ) * 0.1
            creator_penalty = 0.3 if post.author_id in creator_set else 0
            content_penalty = content_type_counts[post.content_type] * 0.05
            # Combined score: relevance - diversity penalties
            combined = relevance - topic_penalty - creator_penalty - content_penalty
            if combined > best_score:
                best_score = combined
                best_idx = i
        # Select best post
        selected_post = remaining.pop(best_idx)
        selected.append(selected_post)
        # Update diversity trackers
        for topic in selected_post.topics:
            topic_counts[topic] += 1
        creator_set.add(selected_post.author_id)
        content_type_counts[selected_post.content_type] += 1
    return selected

def inject_novelty(selected: List[Post], user_profile: UserInterestProfile) -> List[Post]:
    """
    Inject a small amount of novel content outside user's interest profile.
    Enables interest discovery and prevents filter bubbles.
    """
    novelty_budget = int(len(selected) * 0.1)  # 10% novelty
    # Get topics user hasn't engaged with recently
    user_topics = set(i.topic for i in user_profile.topicInterests[:20])
    novel_topics = get_popular_topics() - user_topics
    # Fetch high-quality content from novel topics
    novel_posts = get_top_content_for_topics(novel_topics, limit=novelty_budget)
    # Inject at distributed positions (not all at the end)
    result = []
    novel_idx = 0
    for i, post in enumerate(selected):
        result.append(post)
        if i % 10 == 9 and novel_idx < len(novel_posts):  # Every 10th position
            result.append(novel_posts[novel_idx])
            novel_idx += 1
    return result

Freshness Requirements:
Explore must surface recent/trending content, not just historically engaging content:
| Freshness Mechanism | Description |
|---|---|
| Time decay in ranking | Older posts get score penalty |
| Trending injection | Force-include currently viral content |
| Recency candidate source | Dedicated candidate pool for posts <24h old |
| Session-based freshness | Don't re-show content from previous sessions |
| Breaking content detection | Identify rapidly-engaging new content for priority ranking |
Too much diversity sacrifices relevance. Too little creates filter bubbles. Instagram aims for a 'Goldilocks' balance where ~80% of content matches user interests and ~20% provides exploration and diversity. This ratio may vary by user—exploration-friendly users get more novelty.
Recommendations amplify content—showing policy-violating or harmful content in Explore has outsized impact. Safety enforcement in the recommendation pipeline is critical.
Recommendation-Specific Policies:
Explore has stricter policies than general content:
| Content Category | Allowed on Platform | Recommended in Explore |
|---|---|---|
| Borderline content | Yes (with labels) | No |
| Sensational headlines | Yes | Deprioritized |
| Unverified claims | Yes (with context) | No |
| Low-quality content | Yes | Deprioritized |
| Engagement bait | Yes | Deprioritized |
| Repeat policy violators | Yes (unless banned) | No |
Safety Integration Points:
class ExploreSafetyFilter:
    """Enforces safety policies in recommendations."""

    async def apply_safety_filters(
        self,
        candidates: List[Post],
        viewer_id: str
    ) -> List[Post]:
        """Filter candidates for safety/policy compliance."""
        # Stage 1: Hard blocks (content never recommended)
        safe_candidates = []
        for post in candidates:
            if await self.is_hard_blocked(post):
                continue  # Remove entirely
            safe_candidates.append(post)
        # Stage 2: Soft filters (demote but don't remove)
        for post in safe_candidates:
            demotion = await self.calculate_demotion(post, viewer_id)
            post.safety_demotion = demotion
        return safe_candidates

    async def is_hard_blocked(self, post: Post) -> bool:
        """Check if post is blocked from recommendations."""
        # Author-level blocks
        author = await get_author(post.author_id)
        if author.repeat_violator:
            return True
        if author.recommendation_suspension:
            return True
        # Content-level blocks
        classifications = await get_content_classifications(post.id)
        if classifications.nudity_score > 0.9:
            return True
        if classifications.violence_score > 0.9:
            return True
        if classifications.hate_speech > 0.8:
            return True
        if classifications.misinformation_flag:
            return True
        return False

    async def calculate_demotion(self, post: Post, viewer_id: str) -> float:
        """Calculate soft demotion factor (0 = no demotion, 1 = max demotion)."""
        demotion = 0.0
        classifications = await get_content_classifications(post.id)
        # Borderline content gets demoted
        if classifications.borderline_score > 0.5:
            demotion += 0.3
        # Low quality content
        if classifications.quality_score < 0.3:
            demotion += 0.2
        # Engagement bait
        if classifications.engagement_bait_score > 0.6:
            demotion += 0.4
        # Sensationalism
        if classifications.sensationalism_score > 0.7:
            demotion += 0.3
        # Viewer-specific context (e.g., minor protection)
        viewer = await get_user(viewer_id)
        if viewer.is_minor and classifications.mature_content > 0.3:
            demotion += 0.5
        return min(demotion, 1.0)  # Cap at 1.0

Recommending content is an active choice to amplify it.
Even if content is allowed on the platform, recommending it to millions gives Instagram responsibility for its spread. This is why Explore policies are stricter than general content policies—the platform is actively promoting what appears here.
Serving personalized recommendations at 100K+ QPS with <200ms latency requires purpose-built infrastructure. Let's examine the serving stack.
Serving Architecture:
| Component | Technology | Scale | Latency |
|---|---|---|---|
| API Gateway | Custom L7 LB | 100K+ QPS | <5ms |
| Orchestrator | gRPC services | Stateless pods | Coordinates all |
| User Profiles | Redis Cluster | 500M+ profiles | <5ms |
| Feature Store | Custom (Feast-like) | 10B+ features | <10ms |
| Embedding Index | FAISS on GPU | 10B+ vectors | <20ms |
| Ranking Service | TorchServe/TF Serving | GPU cluster | <25ms |
| Content Store | Distributed cache + DB | 10B+ posts | <15ms |
Caching Strategy:
class ExploreCacheStrategy:
"""Multi-level caching for Explore recommendations."""
# Level 1: Response cache (short TTL, exact user+context match)
RESPONSE_CACHE_TTL = 60 # 1 minute
# Level 2: Candidate cache (medium TTL, reusable across context)
CANDIDATE_CACHE_TTL = 300 # 5 minutes
# Level 3: User profile cache (longer TTL, slower to change)
PROFILE_CACHE_TTL = 3600 # 1 hour
# Level 4: Content features cache (long TTL, immutable per version)
CONTENT_CACHE_TTL = 86400 # 1 day
async def get_explore(self, user_id: str, context: RequestContext) -> ExploreResponse:
cache_key = f"explore:{user_id}:{context.page}:{context.session_id}"
# Try response cache
cached = await self.response_cache.get(cache_key)
if cached:
return cached
# Generate fresh response
response = await self.generate_explore(user_id, context)
# Cache response
await self.response_cache.set(cache_key, response, ttl=self.RESPONSE_CACHE_TTL)
return response
GPU Serving for ML:
Both embedding retrieval and ranking require GPU acceleration:
| Component | GPU Role | Serving Pattern |
|---|---|---|
| Embedding retrieval | Parallel vector similarity search | FAISS GPU index |
| Ranking model | Batch inference | TensorRT optimized |
| Content analysis | Vision models | Async, not real-time path |
Horizontal Scaling:
The feature store is the unsung hero of recommendation systems. It precomputes and serves hundreds of features per request with <10ms latency. Building a reliable, low-latency feature store is often harder than building the ML models themselves.
The Explore algorithm represents the pinnacle of recommendation system engineering—personalized discovery from billions of candidates in under 200 milliseconds. Let's consolidate the key patterns:
What's Next: Scaling Media Storage
We've covered how content is created, distributed, and discovered. The final piece is storage at exabyte scale—how Instagram stores, replicates, and serves billions of photos and videos with 11-nines durability.
You now understand one of the most sophisticated recommendation systems in production—serving personalized discovery to 500 million daily users. The patterns here (retrieval, ranking, diversity, safety) form the foundation of any large-scale recommendation system. Next, we'll see how all this content is stored at planetary scale.