When you open Facebook, your feed loads in under a second. But consider what happens behind the scenes: the system must collect posts from your 350+ friends, hundreds of pages, and dozens of groups, select relevant ads, and bring everything together for ranking—all before you can scroll.
This is the content aggregation problem, and it's one of the most challenging aspects of feed system design. The numbers involved are staggering.
The naive approach—query all friends' posts on demand—would take minutes. The alternative—precompute all feeds—would require impossible storage. Feed systems must find a middle path that balances computation, storage, and latency.
By the end of this page, you will understand the fan-out problem and its implications, master push vs pull aggregation strategies, learn hybrid approaches used at Facebook scale, and explore the distributed systems patterns that make content aggregation tractable.
Fan-out describes how many operations a single action triggers. In feed systems, we encounter two types of fan-out that create fundamentally different challenges: fan-out on write, where publishing one post triggers updates to every follower's feed, and fan-out on read, where loading one feed triggers queries against every followed source.
Let's quantify these challenges with realistic numbers:
```python
# Fan-out on Write Analysis
# =========================

# Average case
avg_followers = 350
posts_per_user_per_day = 2
total_users = 2_000_000_000

daily_posts = total_users * posts_per_user_per_day
# = 4 billion posts/day

daily_fanout_writes = daily_posts * avg_followers
# = 4B * 350 = 1.4 trillion feed cache writes/day
# = 16.2 million writes/second (average)

# Celebrity case (problematic)
celebrity_followers = 10_000_000
celebrity_posts_per_day = 5

celebrity_fanout_per_post = celebrity_followers
# = 10 million writes per single post
# Must complete before post appears "live"

# If each write takes 1ms, 10M sequential writes = 167 minutes (unacceptable)
# ~170 parallel writers finish in about a minute; ~10,000 finish in about a second

# Fan-out on Read Analysis
# ========================

# Average case
avg_friends = 350
avg_pages_followed = 100
avg_groups = 20

feed_sources_per_user = avg_friends + avg_pages_followed + avg_groups
# = 470 sources to query

daily_feed_requests = 2_000_000_000 * 8  # 8 sessions/user/day
# = 16 billion feed requests/day

total_source_queries = daily_feed_requests * feed_sources_per_user
# = 16B * 470 = 7.5 trillion queries/day
# = 86.8 million queries/second

# Sequential queries at 5ms each = 470 * 5ms = 2.35 seconds per feed (unacceptable)
# Need heavy parallelization or pre-aggregation
```

| Dimension | Fan-out on Write | Fan-out on Read |
|---|---|---|
| Write Latency | High (must update many caches) | Low (single write) |
| Read Latency | Low (pre-computed feeds) | High (aggregate on demand) |
| Storage Cost | High (feed cached per user) | Low (posts stored once) |
| Freshness | Eventual (propagation delay) | Real-time (current data) |
| Consistency | Complex (distributed updates) | Simple (single source) |
| Celebrity Posts | Extremely expensive | Same as average user |
| Inactive Users | Wasteful (feed never read) | No waste |
Pure push fails for celebrities (10M writes per post is unsustainable). Pure pull fails for latency (470 queries per feed request is too slow). Real systems use hybrid approaches that combine benefits of both.
Facebook's solution is a hybrid push-pull model that uses different strategies based on user type and access patterns. The key insight: optimize for the common case while having fallbacks for edge cases.
The system classifies users based on their follower count and posting behavior:
| User Type | Follower Threshold | Write Strategy | Read Strategy |
|---|---|---|---|
| Regular User | < 10K followers | Push to all followers | Read from cache |
| Micro-celebrity | 10K - 100K | Push to active followers only | Hybrid cache + pull |
| Celebrity | 100K - 1M | Push to top 5K, index for rest | Pull from celebrity index |
| Mega-celebrity | > 1M followers | No push, index only | Pure pull from index |
| Pages (Brands) | Varies | Push to highly engaged followers | Pull for casual followers |
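The write path implied by this table can be captured in a small dispatch function. The sketch below is illustrative only: the thresholds come from the table above, but the `FanoutServices` interface and its method names are assumptions, not Facebook's actual APIs.

```typescript
type WriteStrategy = 'push_all' | 'push_active' | 'push_top_and_index' | 'index_only';

// Hypothetical service surface used for illustration
interface FanoutServices {
  getFollowers(authorId: string, opts?: { activeOnly?: boolean; limit?: number }): Promise<string[]>;
  pushToFeedCaches(postId: string, followerIds: string[]): Promise<void>;
  appendToCelebrityIndex(authorId: string, postId: string): Promise<void>;
}

function chooseWriteStrategy(followerCount: number): WriteStrategy {
  if (followerCount < 10_000) return 'push_all';              // regular user
  if (followerCount < 100_000) return 'push_active';          // micro-celebrity
  if (followerCount < 1_000_000) return 'push_top_and_index'; // celebrity
  return 'index_only';                                        // mega-celebrity
}

async function onNewPost(
  svc: FanoutServices,
  authorId: string,
  postId: string,
  followerCount: number
): Promise<void> {
  switch (chooseWriteStrategy(followerCount)) {
    case 'push_all':
      // Eager fan-out on write: every follower's feed cache receives the new post ID
      await svc.pushToFeedCaches(postId, await svc.getFollowers(authorId));
      break;
    case 'push_active':
      // Only recently active followers get an eager push; the rest pull later
      await svc.pushToFeedCaches(postId, await svc.getFollowers(authorId, { activeOnly: true }));
      break;
    case 'push_top_and_index':
      // Push to the most engaged ~5K followers, index the post for everyone else
      await svc.pushToFeedCaches(postId, await svc.getFollowers(authorId, { limit: 5_000 }));
      await svc.appendToCelebrityIndex(authorId, postId);
      break;
    case 'index_only':
      // Pure fan-out on read: followers pull from the celebrity index at request time
      await svc.appendToCelebrityIndex(authorId, postId);
      break;
  }
}
```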
The feed cache stores pre-aggregated content IDs (not full content) for each user. This dramatically reduces storage while enabling fast reads.
```typescript
// Feed Cache Entry (stored per user)
interface FeedCacheEntry {
  userId: string;

  // Sorted list of post references (not full posts)
  posts: PostReference[];

  // Metadata for cache management
  lastUpdated: timestamp;
  cacheVersion: number;
}

interface PostReference {
  postId: string;           // Reference to actual post
  authorId: string;         // For fast filtering
  createdAt: timestamp;     // For time decay
  estimatedScore: number;   // Pre-computed score (may be stale)
  contentType: ContentType; // For diversity filtering
}

// Storage calculation
const AVERAGE_POST_REFERENCES = 1500;   // Posts per user cache
const REFERENCE_SIZE_BYTES = 64;        // Compact encoding
const USERS_TO_CACHE = 2_000_000_000;   // All active users

// Total storage
const feedCacheStorage =
  AVERAGE_POST_REFERENCES * REFERENCE_SIZE_BYTES * USERS_TO_CACHE;
// = 1500 * 64 * 2B = 192 TB

// This is feasible for a distributed cache (Redis cluster)
// Caching full post content instead would be: 1500 * 50KB * 2B users ≈ 150 PB
// (roughly 800x larger, impractical to hold in cache)
```

For high-follower accounts, posts are indexed centrally rather than pushed to followers.
```typescript
// Celebrity Index Service
interface CelebrityIndex {
  // Sharded by celebrity ID for parallel access
  shardKey: string;

  // Recent posts per celebrity (rolling window)
  recentPosts: Map<CelebrityId, PostReference[]>;

  // Maximum posts stored per celebrity
  maxPostsPerCelebrity: 200; // ~7 days of posts
}

// Query pattern: Get posts from celebrities I follow
async function getCelebrityPosts(
  userId: string,
  followedCelebrities: string[]
): Promise<PostReference[]> {
  // Batch query across shards
  const shardQueries = groupByShards(followedCelebrities);

  const results = await Promise.all(
    shardQueries.map(async ([shard, celebIds]) => {
      return celebrityIndex.shard(shard).batchGet(celebIds);
    })
  );

  // Flatten and merge
  const allCelebrityPosts = results.flat();

  // Sort by recency or estimated score
  return allCelebrityPosts.sort((a, b) => b.createdAt - a.createdAt);
}

// This turns O(N followers) writes into O(1) index update
// At read time: O(K celebrities followed) queries (typically < 100)
```

Approximately 0.1% of users (celebrities) cause 99% of fan-out problems. By handling this tiny fraction differently, the system becomes tractable. This pattern—special-casing outliers—appears throughout large-scale system design.
With content references aggregated, the system needs to efficiently retrieve full post content for rendering. This involves a multi-tier storage architecture optimized for different access patterns.
| Tier | Latency | Capacity | Cost | Typical Content |
|---|---|---|---|---|
| Hot (Memory) | < 1ms | ~50 TB | $$$$$ | Last hour's viral posts |
| Warm (SSD) | 1-5ms | ~10 PB | $$$ | Week's active content |
| Cold (HDD) | 10-50ms | ~500 PB | $$ | Historical posts |
| Archive | 100ms - minutes | Unlimited | $ | Old posts, rarely accessed |
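Reads typically flow through these tiers in order, promoting content upward when it is found in a slower tier. The following is a minimal read-through sketch under the assumption of a simple per-tier `Store` interface; the tier objects and `Post` shape are illustrative, not the production API.

```typescript
interface Post { id: string; content: string; }

interface Store {
  get(postId: string): Promise<Post | null>;
  set(postId: string, post: Post): Promise<void>;
}

async function readThrough(
  postId: string,
  hot: Store,   // in-memory, < 1ms
  warm: Store,  // SSD, 1-5ms
  cold: Store   // HDD, 10-50ms
): Promise<Post | null> {
  const fromHot = await hot.get(postId);
  if (fromHot) return fromHot;

  const fromWarm = await warm.get(postId);
  if (fromWarm) {
    void hot.set(postId, fromWarm); // promote asynchronously; don't block the read
    return fromWarm;
  }

  const fromCold = await cold.get(postId);
  if (fromCold) {
    // Promote to both faster tiers so subsequent reads stay cheap
    void warm.set(postId, fromCold);
    void hot.set(postId, fromCold);
  }
  return fromCold;
}
```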
Rather than wait for cache misses, the system proactively prefetches content likely to be needed.
```typescript
// Prefetch Manager
class ContentPrefetcher {
  // Strategy 1: Prefetch on aggregation
  // When building the feed candidate list, prefetch content in parallel
  async onFeedRequest(userId: string): Promise<void> {
    const candidates = await getCandidatePostIds(userId);

    // Speculatively prefetch top candidates into hot cache
    const topCandidates = candidates.slice(0, 100);

    // Non-blocking prefetch (don't wait for completion)
    this.prefetchBatch(topCandidates);
  }

  // Strategy 2: Prefetch on trending
  // When a post starts going viral, proactively cache it
  async onTrendingDetected(postId: string): Promise<void> {
    const post = await coldStorage.get(postId);

    // Promote to all tiers
    await Promise.all([
      hotCache.set(postId, post),
      warmCache.set(postId, post),
    ]);

    // Prefetch the author's other recent posts (likely to be accessed)
    const authorPosts = await getAuthorRecentPosts(post.authorId, 10);
    this.prefetchBatch(authorPosts.map(p => p.id));
  }

  // Strategy 3: Prefetch on session start
  // When the user opens the app, predict content needs based on patterns
  async onSessionStart(userId: string): Promise<void> {
    // Get user's typical engagement patterns
    const patterns = await getUserPatterns(userId);

    // Prefetch content from frequently engaged authors
    const topAuthors = patterns.topEngagedAuthors.slice(0, 20);
    for (const authorId of topAuthors) {
      const recentPosts = await getAuthorRecentPosts(authorId, 5);
      this.prefetchBatch(recentPosts.map(p => p.id));
    }
  }
}
```

Feed caches store post IDs; full content must be 'hydrated' before serving.
```typescript
// Hydration Service
interface HydratedPost {
  id: string;
  author: UserProfile;          // Hydrated from user service
  content: PostContent;         // Text, media URLs, etc.
  engagement: EngagementCounts; // Like, comment, share counts
  socialContext: SocialContext; // Friends who engaged
  media: MediaAssets[];         // Image/video URLs from CDN
}

async function hydratePostBatch(
  postIds: string[],
  viewerId: string
): Promise<HydratedPost[]> {
  // Parallel fetches from multiple services
  const [posts, authors, engagement, socialContext] = await Promise.all([
    postStore.batchGet(postIds),
    userService.batchGetProfiles(getAuthorIds(postIds)),
    engagementService.batchGetCounts(postIds),
    socialService.getSocialContext(postIds, viewerId),
  ]);

  // Merge into hydrated posts
  return postIds.map(id => ({
    id,
    author: authors.get(posts.get(id).authorId),
    content: posts.get(id).content,
    engagement: engagement.get(id),
    socialContext: socialContext.get(id),
    media: generateCDNUrls(posts.get(id).mediaIds),
  }));
}

// Batching is critical: 50 individual requests = ~500ms
// Single batch request = ~50ms
// Always batch across services!
```

Content hydration often dominates feed latency. Each post requires data from 4-5 services (posts, users, engagement, social, media). Aggressive batching, parallel fetching, and caching at every layer are essential. Facebook uses custom RPC frameworks optimized for this pattern.
Content aggregation fundamentally depends on the social graph—the edges that connect users to friends, pages, and groups. This graph must support fast lookups while handling Facebook's massive scale.
```typescript
// Social Graph Edges
interface GraphEdge {
  sourceId: string;                   // User creating the edge
  destinationType: 'user' | 'page' | 'group';
  targetId: string;                   // Friend, page, or group
  edgeType: EdgeType;                 // friend, follow, member, etc.

  // Edge metadata for feed ranking
  createdAt: timestamp;
  interactionCount: number;           // Engagements on target's content
  lastInteraction: timestamp;
  relationship: RelationshipStrength; // close_friend, acquaintance, etc.
}

enum EdgeType {
  FRIEND = 'friend',             // Bidirectional friendship
  FOLLOW = 'follow',             // Unidirectional follow
  PAGE_LIKE = 'page_like',       // Following a page
  GROUP_MEMBER = 'group_member', // Group membership
  BLOCKED = 'blocked',           // Negative edge (filter content)
}

// Graph statistics
// - 2 billion+ nodes (users)
// - Average edges per user: ~500 (350 friends + pages + groups)
// - Total edges: ~1 trillion
// - Edge updates per second: ~1 million (friendships, follows)
```

Facebook built TAO (The Associations and Objects) specifically for social graph access patterns.
```typescript
// Common TAO query patterns for feed aggregation

// Get all friends of a user
const friends = await tao.assocRange(
  userId,   // Source node
  'friend', // Edge type
  0,        // Offset
  1000      // Limit
);

// Get all pages a user follows
const pages = await tao.assocRange(
  userId,
  'page_like',
  0,
  500
);

// Get all groups a user is a member of
const groups = await tao.assocRange(
  userId,
  'group_member',
  0,
  100
);

// Get mutual friends with another user
const mutualFriends = await tao.assocIntersect(
  userId,
  targetUserId,
  'friend'
);

// Check if user is blocked
const isBlocked = await tao.assocGet(
  userId,
  targetUserId,
  'blocked'
);

// These operations are O(1) or O(log N) due to indexing
// Sub-millisecond latency from cache layer
```

Not all connections are equal. The system computes relationship strength to prioritize content from close connections.
```python
def compute_relationship_strength(user_id, connection_id):
    """
    Compute affinity score between user and a connection.
    Higher score = closer relationship = higher feed priority.
    """
    signals = {}

    # Explicit signals
    signals['is_close_friend'] = is_marked_close_friend(user_id, connection_id)
    signals['is_family'] = is_marked_family(user_id, connection_id)

    # Engagement signals (past 30 days)
    signals['comment_count'] = count_comments_on_connection(user_id, connection_id)
    signals['reaction_count'] = count_reactions_on_connection(user_id, connection_id)
    signals['message_count'] = count_messages_with_connection(user_id, connection_id)
    signals['profile_views'] = count_profile_views(user_id, connection_id)
    signals['tag_count'] = count_mutual_tags(user_id, connection_id)
    signals['photo_views'] = count_photo_album_views(user_id, connection_id)

    # Recency signals
    signals['days_since_last_interaction'] = days_since_interaction(user_id, connection_id)
    signals['days_since_friendship'] = days_since_connected(user_id, connection_id)

    # Network signals
    signals['mutual_friend_count'] = count_mutual_friends(user_id, connection_id)
    signals['mutual_group_count'] = count_mutual_groups(user_id, connection_id)

    # Weighted combination (weights learned from engagement data)
    weights = load_affinity_weights()
    score = sum(
        weights[signal] * value
        for signal, value in signals.items()
    )

    # Normalize to 0-1 range
    return sigmoid(score)

# This score is precomputed and stored with edge metadata
# Updated periodically (daily) or on significant interactions
```

With relationship strength precomputed, the aggregation step can quickly prune low-affinity connections. Instead of fetching posts from 500 friends, the system might only fetch from the top 100 by affinity, dramatically reducing query load while maintaining feed quality.
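A sketch of that pruning step, assuming the precomputed affinity score is stored on each edge; the `Connection` shape and field names are illustrative.

```typescript
interface Connection {
  targetId: string; // friend, page, or group ID
  affinity: number; // precomputed relationship strength in [0, 1]
}

// Keep only the strongest connections before fanning out read queries:
// ~500 candidate sources shrink to the top `limit` by affinity.
function pruneByAffinity(connections: Connection[], limit = 100): Connection[] {
  return [...connections]
    .sort((a, b) => b.affinity - a.affinity)
    .slice(0, limit);
}

// Usage: query recent posts only from the pruned set
// const sources = pruneByAffinity(allConnections, 100);
// const posts = await fetchRecentPosts(sources.map(c => c.targetId));
```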
Putting it all together, the aggregation pipeline orchestrates multiple services across data centers to assemble feed candidates within latency budgets.
| Stage | Budget | Parallelism | Notes |
|---|---|---|---|
| Network (client → edge) | 50ms | Sequential | CDN/edge routing |
| Graph lookup | 30ms | Parallel | TAO cache hit |
| Feed cache read | 20ms | Parallel | Redis cluster |
| Celebrity index query | 40ms | Parallel | Sharded queries |
| Freshness query | 30ms | Parallel | Recent posts only |
| Candidate merging | 10ms | Sequential | In-memory |
| Ranking inference | 50ms | Sequential | ML model |
| Content hydration | 80ms | Parallel | Batch fetches |
| Serialization | 20ms | Sequential | Protobuf |
| Network (edge → client) | 50ms | Sequential | Response delivery |
| Total | ~380ms | | With buffer for variability |
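The aggregation pipeline enforces these budgets stage by stage and degrades gracefully when a stage overruns. The degradation controller below wraps each call in a `withTimeout` helper; a minimal sketch of such a helper, assuming it simply races the wrapped promise against a timer and rejects once the per-stage budget elapses, so the caller can fall back instead of stalling the whole request:

```typescript
// Hypothetical helper assumed by the controller below: reject after budgetMs
// so each stage can fail fast and trigger its fallback path.
function withTimeout<T>(promise: Promise<T>, budgetMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${budgetMs}ms`)),
      budgetMs
    );
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      error => { clearTimeout(timer); reject(error); }
    );
  });
}
```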
```typescript
// Degradation controller for feed aggregation
class FeedDegradationController {
  async aggregateWithDegradation(userId: string): Promise<FeedResponse> {
    const startTime = Date.now();
    const budget = 400; // ms before degradation
    let candidates: PostReference[] = [];
    let degradationFlags: string[] = [];

    // Stage 1: Try cache (always fast)
    try {
      const cached = await withTimeout(feedCache.get(userId), 50);
      candidates.push(...cached);
    } catch (e) {
      degradationFlags.push('cache_miss');
    }

    // Stage 2: Social graph (critical path)
    let connections;
    try {
      connections = await withTimeout(tao.getConnections(userId), 50);
    } catch (e) {
      // Fall back to cached connections
      connections = await localCache.getConnections(userId);
      degradationFlags.push('stale_graph');
    }

    // Stage 3: Celebrity content (non-critical)
    const remainingBudget = budget - (Date.now() - startTime);
    if (remainingBudget > 100) {
      try {
        const celebPosts = await withTimeout(
          celebrityIndex.getPosts(connections.celebrities),
          Math.min(remainingBudget / 2, 100)
        );
        candidates.push(...celebPosts);
      } catch (e) {
        degradationFlags.push('no_celebrity_content');
      }
    }

    // Stage 4: Ranking (fallback to heuristic)
    let rankedFeed;
    try {
      rankedFeed = await withTimeout(rankingService.rank(candidates), 80);
    } catch (e) {
      rankedFeed = heuristicRank(candidates);
      degradationFlags.push('heuristic_ranking');
    }

    return {
      posts: rankedFeed,
      degraded: degradationFlags.length > 0,
      degradationFlags,
    };
  }
}
```

A slightly worse feed that loads in 300ms is better than a perfect feed that times out. Design degradation paths that preserve user experience even when subsystems fail. Users tolerate stale data far better than loading spinners.
We've explored the infrastructure that aggregates content for feed generation: the fan-out problem, the hybrid push-pull model, tiered content storage and hydration, social graph lookups, and graceful degradation under latency budgets.
What's Next:
With content aggregated and ranked, the next page explores Real-time Updates—how the feed stays fresh with new content and engagement updates without requiring full page refreshes.
You now understand how content is aggregated from distributed sources for feed generation. The hybrid push-pull model, tiered caching, and graceful degradation patterns form the backbone of scalable feed infrastructure.