When you open Twitter (now X), your timeline loads in milliseconds—a personalized stream of tweets from the hundreds or thousands of accounts you follow. This seemingly simple interaction masks one of the most challenging distributed systems problems in modern software engineering.
Consider the scale:
- 200-250 million daily active users posting roughly 500 million tweets per day
- Over 100 billion timeline reads per day, a read:write ratio near 200:1
- A handful of accounts with more than 100 million followers each
- Traffic spikes of 3-5x the average during major events
Designing a system that can handle this scale while maintaining sub-second response times requires deep understanding of distributed systems, data modeling, caching strategies, and algorithmic trade-offs. In this module, we'll design Twitter's core features from scratch.
By the end of this page, you will master the requirements analysis for a Twitter-like system. You'll understand how to decompose the problem into functional and non-functional requirements, identify critical constraints, estimate scale, and establish clear success criteria—the foundation for all subsequent design decisions.
Before diving into technical requirements, we must deeply understand what Twitter/X actually does. This understanding guides every technical decision we make.
Twitter's Core Value Proposition:
Twitter is a real-time information network where users share short-form content with followers. Unlike Facebook's bidirectional friendships, Twitter uses a unidirectional follow model—you can follow anyone without their approval. This asymmetric relationship creates unique technical challenges:
| Characteristic | Twitter/X | Facebook | Instagram |
|---|---|---|---|
| Relationship Model | Unidirectional (follow) | Bidirectional (friend) | Unidirectional (follow) |
| Content Format | Text-first, 280 chars | Multi-format, unlimited | Image/video-first |
| Primary Use Case | Real-time information | Social connections | Visual storytelling |
| Feed Expectation | Chronological + ranked | Heavily ranked | Ranked by engagement |
| Follower Distribution | Extreme power law | Moderate variance | Strong power law |
| Time Sensitivity | Seconds matter | Hours acceptable | Hours acceptable |
The Unidirectional Follow Model Implications:
This seemingly simple design choice has profound technical implications:
- Anyone can follow anyone, so follower counts follow an extreme power law: most users have a few hundred followers, while a handful of celebrities exceed 100 million
- A single tweet from a highly followed account must be delivered to tens of millions of timelines
- The graph is asymmetric: the accounts you follow determine what you read, while the accounts that follow you determine where your tweets must go
When designing Twitter in an interview, always clarify the relationship model first. The unidirectional follow model (vs. bidirectional friendship) fundamentally changes your architecture. This question demonstrates that you understand the problem before jumping to solutions.
Functional requirements define what the system must do. For Twitter, we'll focus on three core features: posting tweets, following users, and viewing timelines. Each has nuanced sub-requirements that significantly impact design.
The tweet posting feature is deceptively complex. Let's decompose it (a validation sketch follows this list):
- Create a tweet of 1-280 characters (up to 4,000 for premium accounts)
- Reply to, quote, and retweet existing tweets
- Attach up to 4 media items, a poll, or a geo tag
- Delete your own tweets
- Track engagement metrics: likes, retweets, replies, and views
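To make these rules concrete, here is a minimal Python sketch of the validation implied by the API contract later on this page. The names (`TweetDraft`, `validate_tweet`) and the premium flag are illustrative assumptions, not Twitter's actual code:

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_CHARS = 280           # standard accounts
MAX_CHARS_PREMIUM = 4000  # premium accounts, per the API contract below
MAX_MEDIA = 4             # at most 4 media attachments per tweet

@dataclass
class TweetDraft:
    content: str
    reply_to_id: Optional[str] = None     # set when replying
    quote_tweet_id: Optional[str] = None  # set when quote-tweeting
    media_ids: list[str] = field(default_factory=list)

def validate_tweet(draft: TweetDraft, premium: bool = False) -> list[str]:
    """Return a list of validation errors; an empty list means the draft is valid."""
    errors = []
    limit = MAX_CHARS_PREMIUM if premium else MAX_CHARS
    if not (1 <= len(draft.content) <= limit):
        errors.append(f"content must be 1-{limit} characters")
    if len(draft.media_ids) > MAX_MEDIA:
        errors.append(f"at most {MAX_MEDIA} media attachments allowed")
    return errors

# An empty tweet with five attachments fails both checks.
bad = TweetDraft(content="", media_ids=["m1", "m2", "m3", "m4", "m5"])
assert len(validate_tweet(bad)) == 2
```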
The follow system establishes the social graph that powers content distribution (see the sketch after this list):
- Follow any account instantly; protected accounts require approval before the follow takes effect
- Unfollow at any time
- List an account's followers and followees with cursor-based pagination
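A toy in-memory sketch makes the asymmetry visible: timeline assembly asks "whom do I follow?", while tweet delivery asks "who follows me?", so the graph is typically stored as two adjacency sets. A production system would shard the roughly 120 billion edges estimated below; this only shows the shape of the data:

```python
from collections import defaultdict

class FollowGraph:
    """Unidirectional follow model: no approval step, no symmetry."""

    def __init__(self):
        self.following = defaultdict(set)  # user_id -> ids of accounts they follow
        self.followers = defaultdict(set)  # user_id -> ids of accounts following them

    def follow(self, follower_id: str, followee_id: str) -> None:
        if follower_id == followee_id:
            raise ValueError("users cannot follow themselves")
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: str, followee_id: str) -> None:
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)

graph = FollowGraph()
graph.follow("alice", "bob")   # alice follows bob; bob need not reciprocate
assert "bob" in graph.following["alice"]
assert "alice" in graph.followers["bob"]
```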
The timeline is the most complex and performance-critical feature:
- Home timeline: a merged feed of tweets from every followed account, in chronological or algorithmic order
- Profile timeline: a single user's tweets, optionally including replies and retweets
- Mentions timeline: tweets mentioning the authenticated user
- Efficient pagination via cursors, incremental polling via since_id, and gap detection when a client has missed tweets
In a 45-minute interview, you cannot design all these features. Confirm with your interviewer: 'Should I focus on the core tweet-follow-timeline loop, or dive deep into a specific area like real-time updates or media handling?' This shows prioritization skills.
Non-functional requirements (NFRs) define how well the system must perform. For Twitter, these NFRs drive architectural decisions more than functional requirements do.
| Metric | Estimate | Implications |
|---|---|---|
| Daily Active Users (DAU) | 200-250 million | Global distribution required |
| Monthly Active Users (MAU) | 350-400 million | Account storage at scale |
| Tweets per day | 500 million | ~6,000 writes/second average |
| Timeline reads per day | 100+ billion | ~1.2 million reads/second average |
| Read:Write ratio | ~200:1 | Heavily read-optimized architecture |
| Average followers per user | 200-400 | Moderate fan-out for most users |
| Max followers (celebrities) | 100+ million | Extreme fan-out edge cases |
| Peak traffic multiplier | 3-5x average | Spikes during major events |
Twitter is a critical communication platform, especially during emergencies and breaking news. Availability therefore takes priority: the core timeline path targets 99.99% uptime (about 4.3 minutes of downtime per month), and the system should degrade gracefully under extreme load rather than fail outright.
User experience depends critically on response times:
| Operation | P50 Target | P99 Target | Rationale |
|---|---|---|---|
| Home timeline load | < 100ms | < 300ms | Primary user experience |
| Post a tweet | < 200ms | < 500ms | Immediate feedback required |
| Follow/unfollow | < 100ms | < 300ms | Should feel instant |
| Search results | < 200ms | < 500ms | Users expect fast search |
| Real-time tweet delivery | < 5 seconds | < 30 seconds | "Real-time" expectation |
| Profile load | < 150ms | < 400ms | Frequent navigation target |
Twitter can tolerate some consistency trade-offs in favor of availability and performance: a timeline may briefly omit a just-posted tweet, and engagement counts such as likes and retweets can lag by a few seconds without hurting the experience.
Twitter explicitly chooses Availability and Partition Tolerance over perfect Consistency. A user seeing a slightly stale timeline is acceptable; the timeline being unavailable is not. This trade-off shapes the entire architecture.
Before designing any system, we must estimate the scale. These calculations inform technology choices, capacity planning, and architecture patterns.
```
// Daily Active Users (DAU)
DAU = 200 million users

// Tweet writes
Tweets per day = 500 million
Tweets per second (average) = 500M / 86,400 ≈ 5,800 TPS
Peak tweet write rate = 5,800 × 5 = ~30,000 TPS

// Timeline reads
Assume each DAU opens timeline 10 times/day
Timeline reads per day = 200M × 10 = 2 billion
Timeline reads per second (average) = 2B / 86,400 ≈ 23,000 RPS
Peak read rate = 23,000 × 5 = ~115,000 RPS

// Read:Write ratio
Ratio = 2 billion reads / 500 million writes ≈ 4:1 (timeline loads)

// But each timeline load fetches many tweets!
Avg tweets fetched per timeline = 50
Effective read ratio = 50 × 4 = 200:1 (tweet reads)

// Tweet storage
Average tweet size:
  - Tweet ID: 8 bytes
  - User ID: 8 bytes
  - Content: 280 chars × 4 bytes (UTF-8 worst case) = 1,120 bytes
  - Timestamp: 8 bytes
  - Metadata (likes, RTs, replies counts): 24 bytes
  - Media references: 50 bytes (URLs/IDs)
Total per tweet ≈ 1,200 bytes ≈ 1.2 KB

Tweets per year = 500M × 365 = 182.5 billion tweets
Tweet storage per year = 182.5B × 1.2 KB ≈ 219 TB

// Media storage (separate system)
Assume 20% of tweets have media
Media tweets per year = 182.5B × 0.2 = 36.5 billion
Average media size = 500 KB (images compressed)
Media storage per year = 36.5B × 500 KB ≈ 18.25 PB

// Follow graph storage
Total users = 400 million
Average follows per user = 300
Total follow edges = 400M × 300 = 120 billion
Edge size = 16 bytes (follower_id + followee_id)
Follow graph storage = 120B × 16 bytes ≈ 1.92 TB

// Outbound bandwidth (serving timelines)
Timeline reads per second (peak) = 115,000
Tweets per timeline = 50
Tweet size (with metadata) = 2 KB
Outbound per request = 50 × 2 KB = 100 KB

Peak outbound = 115,000 × 100 KB = 11.5 GB/s = 92 Gbps

// Inbound bandwidth (receiving tweets)
Tweet writes per second (peak) = 30,000
Average tweet size = 1.2 KB
Inbound text = 30,000 × 1.2 KB = 36 MB/s

// Media upload bandwidth (peak)
Media tweets per second = 30,000 × 0.2 = 6,000
Average media size = 500 KB
Media inbound = 6,000 × 500 KB = 3 GB/s = 24 Gbps

Total peak inbound ≈ 25 Gbps
```

In interviews, round aggressively and show your reasoning. '200 million × 10 × 50 = 100 billion tweet reads daily' is better than getting lost in precise calculations. The goal is order-of-magnitude accuracy to guide architectural decisions.
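To sanity-check the arithmetic, the same traffic estimates fit in a few lines of Python, with assumptions identical to the worksheet above:

```python
# Back-of-envelope traffic estimates (same assumptions as the worksheet above).
DAU = 200_000_000
TWEETS_PER_DAY = 500_000_000
TIMELINE_OPENS_PER_USER = 10
TWEETS_PER_TIMELINE = 50
PEAK_MULTIPLIER = 5
SECONDS_PER_DAY = 86_400

write_tps = TWEETS_PER_DAY / SECONDS_PER_DAY                        # ~5,800
timeline_reads_per_day = DAU * TIMELINE_OPENS_PER_USER              # 2 billion
read_rps = timeline_reads_per_day / SECONDS_PER_DAY                 # ~23,000
tweet_reads_per_day = timeline_reads_per_day * TWEETS_PER_TIMELINE  # 100 billion

print(f"avg write TPS: {write_tps:,.0f} (peak ~{write_tps * PEAK_MULTIPLIER:,.0f})")
print(f"avg timeline RPS: {read_rps:,.0f} (peak ~{read_rps * PEAK_MULTIPLIER:,.0f})")
print(f"effective read:write ratio: {tweet_reads_per_day / TWEETS_PER_DAY:.0f}:1")
```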
Every system has constraints that shape its design. For Twitter, several constraints are particularly critical:
The most challenging constraint in Twitter's design is the power-law distribution of followers:
| Account Type | Followers | When They Tweet | Fan-out Impact |
|---|---|---|---|
| Regular user | 200 | Deliver to 200 timelines | Trivial |
| Micro-influencer | 10,000 | Deliver to 10K timelines | Moderate load |
| Celebrity | 10 million | Deliver to 10M timelines | Significant spike |
| @BarackObama | 130+ million | Deliver to 130M timelines | Potential outage |
| @elonmusk | 170+ million | Deliver to 170M timelines | System-wide impact |
This creates a fundamental design challenge: should we pre-compute timelines (fanout on write) or compute them on request (fanout on read)? We'll explore both approaches in detail later.
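As a preview, here is a deliberately naive sketch contrasting the two strategies, reusing the `FollowGraph` shape from the earlier sketch. Real systems keep pre-computed timelines in a distributed cache, not Python dicts:

```python
from collections import defaultdict

timelines = defaultdict(list)         # user_id -> pre-computed list of tweet ids
tweets_by_author = defaultdict(list)  # author_id -> that author's tweet dicts

def fanout_on_write(tweet: dict, graph) -> None:
    """Push model: on post, append to every follower's timeline.
    Reads become trivial, but one celebrity tweet means ~170M timeline writes."""
    for follower_id in graph.followers[tweet["author_id"]]:
        timelines[follower_id].append(tweet["id"])

def fanout_on_read(user_id: str, graph, limit: int = 50) -> list:
    """Pull model: on request, merge recent tweets from every followee.
    Writes become trivial, but each timeline load touches hundreds of authors."""
    candidates = []
    for followee_id in graph.following[user_id]:
        candidates.extend(tweets_by_author[followee_id])
    candidates.sort(key=lambda t: t["created_at"], reverse=True)
    return candidates[:limit]
```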
Users expect tweets to appear in their timeline within seconds of posting. For breaking news and live events, this expectation is even higher. This constraint shapes the fan-out strategy, cache invalidation, and the delivery pipeline, all of which must support the SLO of tweets reaching timelines within 5 seconds.
When content goes viral, millions of users may request the same tweet simultaneously. Without careful caching, those requests converge on the same database rows and cache keys, creating hot spots that can overwhelm individual storage nodes.
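One common mitigation is a tiny in-process cache in front of the shared cache and database, so a viral tweet is served from local memory for a few seconds. A minimal TTL-cache sketch (the class name and TTL are illustrative):

```python
import time

class HotTweetCache:
    """In-process TTL cache that absorbs repeated reads of a viral tweet."""

    def __init__(self, ttl_seconds: float = 5.0):
        self.ttl = ttl_seconds
        self._store = {}  # tweet_id -> (expires_at, tweet)

    def get(self, tweet_id: str, loader):
        """Return the cached tweet, or call `loader` (e.g. a database
        fetch) once and serve the result locally until the TTL expires."""
        entry = self._store.get(tweet_id)
        now = time.monotonic()
        if entry and entry[0] > now:
            return entry[1]
        tweet = loader(tweet_id)
        self._store[tweet_id] = (now + self.ttl, tweet)
        return tweet

cache = HotTweetCache()
fetches = []
load = lambda tid: fetches.append(tid) or {"id": tid}
for _ in range(10_000):     # ten thousand reads of the same viral tweet...
    cache.get("viral123", load)
assert len(fetches) == 1    # ...hit the backing store exactly once
```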
A system that works for the average user but fails for edge cases is not production-ready. Your design must handle both @regularuser with 200 followers AND @elonmusk with 170 million. This duality defines Twitter's architecture.
Before designing internal systems, let's define the external API contract. This clarifies exactly what the system must support.
```
## POST /tweets
Create a new tweet.

Request:
{
  "content": "string (1-280 chars, 1-4000 for premium)",
  "reply_to_id": "string (optional, for replies)",
  "quote_tweet_id": "string (optional, for quote tweets)",
  "media_ids": ["string"] (optional, up to 4),
  "poll": { "options": ["string"], "duration_minutes": int } (optional),
  "geo": { "lat": float, "lon": float } (optional)
}

Response: 201 Created
{
  "id": "1234567890",
  "author_id": "user_abc",
  "content": "Hello, world!",
  "created_at": "2024-01-15T10:30:00Z",
  "metrics": { "likes": 0, "retweets": 0, "replies": 0, "views": 0 }
}

---

## DELETE /tweets/{tweet_id}
Delete a tweet (author only).

Response: 204 No Content

---

## POST /tweets/{tweet_id}/retweet
Retweet an existing tweet.

Response: 201 Created
{ "retweet_id": "9876543210" }

---

## POST /tweets/{tweet_id}/like
Like a tweet.

Response: 200 OK
{ "liked": true }
```
```
## POST /users/{user_id}/follow
Follow a user.

Response: 200 OK
{
  "following": true,
  "pending_approval": false  // true for protected accounts
}

---

## DELETE /users/{user_id}/follow
Unfollow a user.

Response: 200 OK
{ "following": false }

---

## GET /users/{user_id}/followers
Get paginated list of followers.

Query params:
  - cursor: string (pagination token)
  - limit: int (default 20, max 100)

Response:
{
  "users": [
    { "id": "user_xyz", "username": "johndoe", "display_name": "John Doe", ... }
  ],
  "next_cursor": "abc123",
  "has_more": true
}

---

## GET /users/{user_id}/following
Get paginated list of accounts user follows.
(Same response format as /followers)
```
```
## GET /timeline/home
Get authenticated user's home timeline.

Query params:
  - cursor: string (pagination token for older tweets)
  - since_id: string (fetch tweets newer than this ID)
  - limit: int (default 50, max 200)
  - include_replies: boolean (default true)
  - ranking: "chronological" | "algorithmic" (default "algorithmic")

Response:
{
  "tweets": [
    {
      "id": "1234567890",
      "author": { "id": "user_abc", "username": "alice", ... },
      "content": "Just shipped a new feature!",
      "created_at": "2024-01-15T10:30:00Z",
      "metrics": { "likes": 42, "retweets": 5, "replies": 3, "views": 1200 },
      "in_reply_to": null,
      "retweet_of": null,
      "media": []
    },
    ...
  ],
  "next_cursor": "eyJsYXN0X2lkIjogIjEyMzQ1Njc4OTAifQ==",
  "has_more": true,
  "gap_detected": false  // true if user missed tweets
}

---

## GET /users/{user_id}/tweets
Get a user's profile timeline.

Query params:
  - cursor, limit (same as home timeline)
  - include_replies: boolean
  - include_retweets: boolean

---

## GET /timeline/mentions
Get tweets mentioning the authenticated user.
```

Note the use of cursor-based pagination (not offset-based), which performs better at scale. Also note 'since_id' for efficient polling—clients can ask 'what's new since I last checked?' without fetching the entire timeline.
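The sample `next_cursor` above is simply base64-encoded JSON recording the last tweet ID the client saw. A sketch of how such opaque cursors might be minted and parsed (illustrative, not Twitter's actual scheme):

```python
import base64
import json

def encode_cursor(last_id: str) -> str:
    """Opaque pagination token: base64 of a small JSON payload."""
    return base64.b64encode(json.dumps({"last_id": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> str:
    return json.loads(base64.b64decode(cursor))["last_id"]

# Round-trips to the exact sample cursor shown in the response above.
assert encode_cursor("1234567890") == "eyJsYXN0X2lkIjogIjEyMzQ1Njc4OTAifQ=="
assert decode_cursor("eyJsYXN0X2lkIjogIjEyMzQ1Njc4OTAifQ==") == "1234567890"
```

Because the cursor pins a position by ID rather than by row offset, new tweets arriving mid-pagination don't shift the window, which is why cursors outperform offsets at this scale.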
Before building, we must define how we'll measure success. Service Level Indicators (SLIs) and Service Level Objectives (SLOs) provide concrete targets.
| Service | SLI (What We Measure) | SLO (Target) | Error Budget |
|---|---|---|---|
| Timeline API | P99 latency | < 300ms | 0.1% of requests can exceed |
| Timeline API | Availability | 99.99% | 4.32 min downtime/month |
| Tweet Post API | P99 latency | < 500ms | 0.1% can exceed |
| Tweet Post API | Success rate | 99.9% | 0.1% can fail |
| Tweet Delivery | Time to timeline | < 5 seconds | 95th percentile |
| Follow API | P99 latency | < 300ms | 0.1% can exceed |
| Search | P99 latency | < 500ms | 0.5% can exceed |
Define these metrics before writing code. Instrument every API endpoint with latency histograms, error rates, and throughput counters. Without observability, you're flying blind—you won't know if your system meets requirements until users complain.
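As a minimal illustration of that advice, here is a decorator that records per-endpoint latencies and error counts; in production you would export these to a metrics system such as Prometheus rather than keep lists in memory:

```python
import time
from collections import defaultdict
from functools import wraps

latency_ms = defaultdict(list)  # endpoint -> observed latencies in ms
error_count = defaultdict(int)  # endpoint -> failed requests

def instrumented(endpoint: str):
    """Wrap a handler so every call records its latency and failures are counted."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            except Exception:
                error_count[endpoint] += 1
                raise
            finally:
                latency_ms[endpoint].append((time.monotonic() - start) * 1000)
        return wrapper
    return decorator

def p99(endpoint: str) -> float:
    """Naive P99 over all samples so far; real systems use histograms."""
    samples = sorted(latency_ms[endpoint])
    return samples[int(len(samples) * 0.99)] if samples else 0.0

@instrumented("timeline.home")
def get_home_timeline(user_id: str) -> list:
    return []  # placeholder handler

get_home_timeline("alice")
print(f"timeline.home P99: {p99('timeline.home'):.2f} ms")
```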
We've completed a thorough requirements analysis for our Twitter-like system. Let's consolidate our understanding:
- Functional requirements: posting tweets, following users, and viewing timelines, each with nuanced sub-requirements
- Scale: roughly 200 million DAU, ~6,000 tweet writes/second and over a million tweet reads/second on average, a ~200:1 read:write ratio
- Latency: sub-300ms P99 timeline loads and tweet delivery within 5 seconds
- Constraints: the celebrity fan-out problem, real-time expectations, and viral read spikes
- Success criteria: explicit SLIs and SLOs with error budgets
What's Next:
With requirements established, we'll explore how to actually build this system. The next page dives into Feed Generation Approaches—the architectural patterns for assembling a user's timeline from hundreds of followed accounts efficiently. We'll examine pull-based, push-based, and hybrid approaches, each with distinct trade-offs.
The requirements we've defined here will directly inform which approach works best for different user segments. The celebrity problem, in particular, will drive us toward a hybrid architecture that treats different account types differently.
You've mastered the requirements analysis for a Twitter-like system. You understand the functional requirements (post, follow, timeline), non-functional requirements (scale, latency, availability), critical constraints (celebrity problem, real-time expectations), and success metrics. This foundation enables informed architectural decisions in the pages ahead.