Imagine you're in a system design interview, tasked with designing Twitter's backend. The interviewer asks: "How many requests per second should your system handle?" You freeze. You've studied caching, load balancing, and database sharding—but without knowing the traffic numbers, you can't make any concrete decisions.
This is where most engineers struggle. They understand how to scale systems but not how big the system needs to be. Traffic estimation is the foundation upon which all other system design decisions rest. Get it wrong, and you'll either over-engineer a solution that wastes millions in infrastructure or under-engineer one that collapses under real load.
Traffic estimation is not about getting the exact number—it's about getting within an order of magnitude. The difference between 1,000 requests per second and 10,000 RPS fundamentally changes your architecture. The difference between 10,000 and 11,000? Usually irrelevant.
By the end of this page, you will be able to: (1) Convert business metrics like DAU and MAU into technical requirements, (2) Calculate requests per second from user behavior patterns, (3) Account for peak traffic multipliers and temporal distributions, (4) Apply the essential formulas that every senior engineer uses intuitively.
Before we can estimate traffic, we need to understand how user activity is measured. Every system design begins with user metrics—the fundamental numbers that describe how people interact with a product.
Daily Active Users (DAU) represents the number of unique users who engage with your product in a single day. This is the heartbeat of any consumer application. A social media platform might have 100 million DAU; a B2B SaaS tool might have 50,000.
Monthly Active Users (MAU) captures unique users over a 30-day window. The DAU/MAU ratio (often called "stickiness") reveals user engagement quality. A ratio of 50% means the average user engages 15 days per month—excellent for consumer apps. A ratio of 10% suggests sporadic usage, common for utility apps.
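The stickiness arithmetic can be sketched in a few lines; the 15M/30M figures are illustrative, chosen to reproduce the 50% ratio described above:

```python
# Sketch: deriving engagement from the DAU/MAU "stickiness" ratio.

def stickiness(dau: float, mau: float) -> float:
    """DAU/MAU ratio: fraction of monthly users active on an average day."""
    return dau / mau

def engaged_days_per_month(dau: float, mau: float, days: int = 30) -> float:
    """Expected active days per month for the average monthly user."""
    return stickiness(dau, mau) * days

# A product with 15M DAU and 30M MAU has 50% stickiness:
print(stickiness(15_000_000, 30_000_000))              # 0.5
print(engaged_days_per_month(15_000_000, 30_000_000))  # 15.0 days/month
```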
| Product Category | Typical DAU/MAU | Engagement Pattern | Example |
|---|---|---|---|
| Social Media | 50-70% | Multiple daily sessions | Facebook, Instagram, TikTok |
| Messaging | 60-80% | Continuous throughout day | WhatsApp, Slack, Discord |
| Content Streaming | 25-40% | Evening-heavy usage | Netflix, Spotify, YouTube |
| E-Commerce | 5-15% | Purchase-intent driven | Amazon, eBay, Shopify stores |
| Productivity SaaS | 20-35% | Workday concentrated | Notion, Figma, Jira |
| Utility Apps | 10-20% | As-needed usage | Weather, Maps, Banking |
Why this matters for traffic estimation:
Understanding the DAU/MAU ratio helps you model traffic distribution. A 70% ratio (like messaging apps) means your daily traffic is relatively consistent across the month. A 5% ratio (like tax software) means traffic is heavily concentrated in specific periods.
Concurrent Users (CCU) is another critical metric—the number of users active simultaneously at any given moment. This is typically 5-15% of DAU for consumer apps, but varies wildly by product type. A stock trading platform might see 30% CCU during market hours; a sleep tracking app might see 1% CCU at any moment.
For most consumer applications, concurrent users peak at roughly 10% of DAU during peak hours. This means if you have 10 million DAU, expect up to 1 million simultaneous users during your busiest hour. Adjust this multiplier based on your product's usage patterns.
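A minimal sketch of that concurrency estimate, using the ~10% default and the 30% trading-platform figure mentioned above:

```python
# Sketch: concurrent-user estimate from DAU.
# The ratio is product-dependent (roughly 5-15% for consumer apps).

def concurrent_users(dau: int, ccu_ratio: float = 0.10) -> int:
    """Rough peak simultaneous users as a fraction of DAU."""
    return int(dau * ccu_ratio)

print(concurrent_users(10_000_000))        # 1,000,000 at the default 10%
print(concurrent_users(10_000_000, 0.30))  # e.g. trading app during market hours
```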
User metrics tell us who is active. Now we need to understand what they're doing—specifically, how many requests each user generates.
The fundamental equation:
Total Daily Requests = DAU × Actions per User per Day × Requests per Action
Actions per User per Day depends on your product. A Twitter user might scroll their feed 10 times, post 2 tweets, like 20 posts, and view 5 profiles—that's 37 actions per day. A banking app user might check their balance twice and transfer money once—3 actions per day.
Requests per Action is where engineering meets product. A single "scroll feed" action might generate 20 API calls (fetch posts, fetch images, fetch engagement counts, prefetch next page). A "check balance" action might be just 1 API call.
```python
# Example: Twitter-scale traffic estimation

# User metrics
daily_active_users = 300_000_000  # 300 million DAU

# User behavior model: average actions per user per day
actions_per_user = {
    "scroll_feed": 10,        # Average feed refreshes per day
    "view_tweet_detail": 15,  # Click to view full tweet
    "post_tweet": 0.5,        # Half of users post once per day on average
    "like_tweet": 20,         # Likes per day
    "retweet": 2,             # Retweets per day
    "view_profile": 5,        # Profile views per day
    "search": 3,              # Searches per day
}

# Requests per action (API calls generated)
requests_per_action = {
    "scroll_feed": 25,        # Fetch tweets, images, counts, suggestions
    "view_tweet_detail": 10,  # Tweet, replies, related tweets
    "post_tweet": 5,          # Write tweet, update timeline, send notifications
    "like_tweet": 3,          # Update count, notify author, update recommendations
    "retweet": 4,             # Similar to post but with source reference
    "view_profile": 15,       # Profile, tweets, followers, media
    "search": 20,             # Search, autocomplete, trending, results
}

# Calculate total daily requests
total_daily_requests = 0
for action, frequency in actions_per_user.items():
    daily_per_user = frequency * requests_per_action[action]
    total_for_action = daily_active_users * daily_per_user
    total_daily_requests += total_for_action
    print(f"{action}: {daily_per_user} API calls/user/day → "
          f"{total_for_action / 1e9:.1f}B requests/day")

print(f"Total Daily Requests: {total_daily_requests / 1e12:.2f} trillion")
print(f"Requests per Second (average): {total_daily_requests / 86400 / 1e6:.1f}M RPS")
```

Most systems are heavily read-biased. Twitter's read-to-write ratio is approximately 1000:1. For every tweet posted, it's read thousands of times. Instagram is similar. Understanding this ratio is crucial for designing caching strategies and database architectures. A 1000:1 read/write ratio means aggressive caching can eliminate 99.9% of database reads.
Systems don't handle "daily requests"—they handle requests per second (RPS). Converting daily metrics to RPS requires understanding temporal distribution of traffic.
The Basic Conversion:
Average RPS = Daily Requests / 86,400
(There are 86,400 seconds in a day: 24 × 60 × 60)
But average RPS is dangerous—it's almost always lower than what your system actually needs to handle. Traffic is never uniformly distributed. Users sleep, work, and cluster their activity around specific hours.
The Peak Multiplier:
Real traffic follows a daily curve. The peak hour typically sees 2-3x average traffic for consumer apps, and can reach 5-10x for event-driven systems (sports streaming, flash sales, election nights).
| System Type | Peak/Average Ratio | Peak Duration | Example Scenario |
|---|---|---|---|
| Social Media (Global) | 2-3x | Several hours | Evening hours across time zones |
| Social Media (Regional) | 4-5x | 2-3 hours | 8-10 PM local evening peak |
| E-Commerce (Normal) | 3-4x | 2-4 hours | Lunch breaks and evening shopping |
| E-Commerce (Sale Events) | 10-50x | Minutes to hours | Black Friday, Prime Day flash sales |
| Streaming (On-Demand) | 2-3x | 4-6 hours | Evening entertainment hours |
| Streaming (Live Events) | 20-100x | Event duration | Super Bowl, World Cup Final |
| B2B SaaS | 2-3x | 6-8 hours | Business hours (9 AM - 5 PM) |
| Gaming | 3-5x | 4-6 hours | After school/work hours |
The Complete RPS Formula:
Peak RPS = (Daily Requests / 86,400) × Peak Multiplier
Example Calculation:
Suppose your system serves 10 billion requests per day. Average RPS = 10,000,000,000 / 86,400 ≈ 116K. With a 3x peak multiplier: Peak RPS ≈ 116K × 3 ≈ 350K.
Your system must handle ~350K RPS during peak hours, not the ~116K average. Designing for average means failure during peak.
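As a sketch, assuming a daily volume of 10 billion requests (a figure chosen to produce the ~116K average cited above):

```python
# Sketch of the peak-RPS formula: Peak RPS = (Daily Requests / 86,400) × Multiplier
SECONDS_PER_DAY = 86_400

def peak_rps(daily_requests: float, peak_multiplier: float = 3.0) -> float:
    """Convert daily request volume to peak RPS for capacity planning."""
    average = daily_requests / SECONDS_PER_DAY
    return average * peak_multiplier

daily = 10_000_000_000
print(f"Average RPS: {daily / SECONDS_PER_DAY:,.0f}")  # ~115,741
print(f"Peak RPS:    {peak_rps(daily):,.0f}")          # ~347,222
```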
Beyond daily peaks, there are second-level bursts. A celebrity tweet, breaking news, or viral moment can spike traffic 10x above normal peak levels for seconds to minutes. Production systems need headroom—design for 2-3x your expected peak, or implement traffic shedding and rate limiting for overflow scenarios.
For global services, geographic distribution dramatically affects traffic patterns. A service with users across all time zones doesn't experience the sharp peaks of a single-region service—the "peak" shifts around the globe throughout the day.
Global vs Regional Traffic Patterns:
Global services benefit from traffic smoothing. When North America sleeps, Asia is awake. This naturally distributes load over 24 hours, reducing the peak/average ratio.
Regional services experience sharper peaks. A US-only service sees nearly all users active within an evening window spanning its time zones (7 PM Eastern to 7 PM Pacific ≈ midnight to 3 AM UTC during standard time).
Modeling Geographic Distribution:
When estimating traffic for a global service, consider user distribution:
Total DAU = 100 million
Distribution:
- North America: 30% = 30M DAU
- Europe: 25% = 25M DAU
- Asia Pacific: 35% = 35M DAU
- Rest of World: 10% = 10M DAU
Peak RPS Calculation:
- NA peak (8-10 PM EST): 30M × 0.10 concurrent × 5 actions/minute × 10 requests/action ÷ 60 ≈ 2.5M RPS
- EU peak (8-10 PM CET): 25M × 0.10 × 5 × 10 ÷ 60 ≈ 2.1M RPS
- APAC peak (8-10 PM JST): 35M × 0.10 × 5 × 10 ÷ 60 ≈ 2.9M RPS
Non-overlapping peaks mean global peak ≈ max(regional peaks) + baseline
Global Peak RPS ≈ 3-4M RPS (not sum of regional peaks)
This is the power of geographic distribution—peak traffic doesn't simply sum across regions.
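The regional model can be sketched as below. The 10-requests-per-action factor is an assumption added to reconcile a per-minute action rate with RPS-scale figures:

```python
# Sketch: regional peak RPS under assumed parameters
# (10% concurrency, 5 actions/min per concurrent user, 10 API requests/action).

REGIONS = {
    "North America": 30e6,
    "Europe": 25e6,
    "Asia Pacific": 35e6,
    "Rest of World": 10e6,
}
CCU_RATIO = 0.10
ACTIONS_PER_MIN = 5
REQUESTS_PER_ACTION = 10

def regional_peak_rps(dau: float) -> float:
    """Peak RPS for one region during its local evening peak."""
    return dau * CCU_RATIO * ACTIONS_PER_MIN * REQUESTS_PER_ACTION / 60

peaks = {region: regional_peak_rps(dau) for region, dau in REGIONS.items()}
for region, rps in peaks.items():
    print(f"{region}: {rps / 1e6:.1f}M RPS")

# Non-overlapping peaks: global peak ≈ the largest regional peak, not the sum.
print(f"Global peak ≈ {max(peaks.values()) / 1e6:.1f}M RPS + baseline")
```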
Understanding traffic patterns goes beyond simple peak calculations. Real systems exhibit multiple traffic patterns that compound and interact.
Daily Patterns (Diurnal):
Most consumer applications follow a predictable daily curve: a trough overnight, a ramp through the morning, a midday plateau, and a peak in the evening hours.
Weekly Patterns:
Weekends differ from weekdays:
| Day | Relative Traffic | Notes |
|---|---|---|
| Monday | 100% | Baseline - return from weekend |
| Tuesday | 105% | Peak engagement mid-week |
| Wednesday | 107% | Highest weekday engagement |
| Thursday | 105% | Similar to Tuesday |
| Friday | 95% | Early evening drop-off |
| Saturday | 110% | Weekend leisure time |
| Sunday | 108% | Sunday-evening spike as users prepare for the week |
Seasonal Patterns:
Many businesses experience significant seasonal variation: retail peaks in Q4 around Black Friday and the holidays, tax software in filing season, travel in summer.
Event-Driven Spikes:
Unpredictable events can dwarf all pattern-based predictions: breaking news, viral content, or a celebrity mention can multiply traffic within minutes.
Well-designed systems account for these patterns through capacity planning and auto-scaling strategies.
Beware of self-inflicted traffic spikes. Push notifications to all users, scheduled jobs that start at :00 minutes, or cache expiration at fixed times can cause 'thundering herd' problems—where millions of requests arrive simultaneously. Stagger notifications, add jitter to scheduled jobs, and use probabilistic cache expiration to smooth these artificial peaks.
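A minimal sketch of both smoothing techniques. The jitter window and the linear refresh probability are illustrative choices, not prescribed values:

```python
# Sketch: smoothing self-inflicted spikes with jitter and probabilistic
# cache refresh, so fleets of clients don't all fire at the same instant.
import random

def jittered_start(base_epoch: float, max_jitter_seconds: float = 300.0) -> float:
    """Spread a fleet-wide scheduled job over a window instead of firing at :00."""
    return base_epoch + random.uniform(0.0, max_jitter_seconds)

def should_refresh_early(ttl_remaining: float, ttl_total: float) -> bool:
    """Probabilistically refresh a cache entry before expiry: the closer to
    expiry, the more likely a refresh, so entries don't all expire together."""
    p_refresh = max(0.0, 1.0 - ttl_remaining / ttl_total)
    return random.random() < p_refresh

start = jittered_start(1_700_000_000.0)
print(f"Job fires {start - 1_700_000_000.0:.1f}s after the hour mark")
```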
When estimating traffic—especially in interviews—you often lack precise data. Here are techniques for making reasonable estimates from limited information.
Technique 1: Top-Down Estimation
Start with public information and work down:
Example: "LinkedIn has ~900 million registered users and ~65 million DAU based on industry reports. For a news feed feature..."
Technique 2: Bottom-Up Estimation
Start with individual user behavior and scale up:
This is more accurate when you understand the product deeply.
```python
# Bottom-Up Estimation: Instagram Story Views

# Define a typical user session
class UserSession:
    def __init__(self):
        self.api_calls = 0

    def open_app(self):
        self.api_calls += 5  # Auth, profile, notifications, feed init, stories bar

    def view_stories_bar(self):
        self.api_calls += 2  # Fetch story thumbnails, mark as seen

    def view_single_story(self):
        self.api_calls += 3  # Fetch story content, analytics, prefetch next

    def navigate_stories(self, count):
        self.api_calls += count * 1  # Swipe = prefetch

    def watch_story_fully(self, count):
        self.api_calls += count * 2  # View completion + engagement metrics

    def react_to_story(self, count):
        self.api_calls += count * 3  # Send reaction, notify creator, update analytics

# Model a typical session
session = UserSession()
session.open_app()
session.view_stories_bar()
for _ in range(10):  # View 10 stories
    session.view_single_story()
    session.navigate_stories(1)
    session.watch_story_fully(1)
session.react_to_story(2)  # React to 2 stories

print(f"API calls per story session: {session.api_calls}")

# Scale to platform level
dau = 500_000_000           # 500M DAU
sessions_per_day = 7        # Average user opens app 7 times/day
story_sessions_ratio = 0.6  # 60% of sessions include stories

daily_story_api_calls = dau * sessions_per_day * story_sessions_ratio * session.api_calls
print(f"Daily Story API Calls: {daily_story_api_calls / 1e12:.2f} trillion")
print(f"Average RPS: {daily_story_api_calls / 86400 / 1e6:.1f}M")
print(f"Peak RPS (3x): {daily_story_api_calls / 86400 * 3 / 1e6:.1f}M")
```

Technique 3: Analogous System Comparison
Compare to known systems:
"WhatsApp handles 100 billion messages per day with 2 billion MAU. Our messaging feature targets 10 million MAU, so roughly 1/200th scale = 500 million messages/day. At 3 API calls per message, that's 1.5 billion requests/day or ~17,000 RPS average, ~50,000 peak."
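That analogy can be sketched directly; the WhatsApp figures are the rough public numbers quoted above:

```python
# Sketch: scaling a known system's volume down to our target user base.
SECONDS_PER_DAY = 86_400

def scale_by_analogy(reference_volume: float, reference_mau: float,
                     target_mau: float) -> float:
    """Linearly scale a known system's daily volume by MAU ratio."""
    return reference_volume * (target_mau / reference_mau)

daily_messages = scale_by_analogy(100e9, 2e9, 10e6)  # 1/200th of WhatsApp scale
daily_requests = daily_messages * 3                  # assumed 3 API calls/message
avg_rps = daily_requests / SECONDS_PER_DAY
print(f"{daily_messages / 1e6:.0f}M messages/day, "
      f"~{avg_rps:,.0f} RPS average, ~{avg_rps * 3:,.0f} RPS peak")
```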
Technique 4: Working Backwards from Constraints
Sometimes you know the infrastructure limits and need to validate:
"We have 10 app servers, each handling 1,000 RPS max. That's 10,000 RPS capacity. With 3x peak headroom needed, our sustainable peak is ~3,300 RPS. Can our expected traffic fit within this?"
This is especially useful for capacity validation.
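A sketch of this validation, using the server counts from the example above:

```python
# Sketch: working backwards from infrastructure limits.

def sustainable_peak_rps(servers: int, rps_per_server: float,
                         headroom_factor: float = 3.0) -> float:
    """Raw capacity divided by the headroom you want to keep for bursts."""
    return servers * rps_per_server / headroom_factor

def fits(expected_peak_rps: float, servers: int, rps_per_server: float) -> bool:
    """Does expected peak traffic fit within sustainable capacity?"""
    return expected_peak_rps <= sustainable_peak_rps(servers, rps_per_server)

print(f"Sustainable peak: ~{sustainable_peak_rps(10, 1_000):,.0f} RPS")
print(fits(3_000, 10, 1_000))  # True: within the ~3,300 RPS budget
print(fits(5_000, 10, 1_000))  # False: needs more servers or less headroom
```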
Experienced system designers have a mental library of reference numbers. Here are the traffic-related numbers you should internalize:
| Metric | Value | Derivation |
|---|---|---|
| Seconds per day | 86,400 | 24 × 60 × 60 |
| Seconds per month | ~2.6 million | 30 × 86,400 |
| Seconds per year | 31.5 million | ~365 × 86,400 |
| 1 million RPS for a full day | 86.4 billion requests | 1M × 86,400 |
| 1 billion requests/day | ~11,574 RPS average | 1B / 86,400 |
| Peak multiplier (typical) | 2-3x | Evening peak vs average |
| Peak multiplier (events) | 10-100x | Viral moments, major events |
| Concurrent user ratio | 5-15% of DAU | At any given moment |
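A few of these reference numbers written out as executable checks:

```python
# Sketch: the reference numbers above as quick conversion helpers.
SECONDS_PER_DAY = 86_400                    # 24 × 60 × 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY    # ~2.6 million
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY    # ~31.5 million

def daily_to_avg_rps(daily_requests: float) -> float:
    """Convert daily request volume to average RPS."""
    return daily_requests / SECONDS_PER_DAY

print(f"1B requests/day ≈ {daily_to_avg_rps(1e9):,.0f} RPS average")
print(f"1M RPS sustained ≈ {1e6 * SECONDS_PER_DAY / 1e9:.1f}B requests/day")
```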
In most systems, 80% of traffic comes from 20% of features (or users). When designing capacity, focus on the high-traffic paths first. In interviews, identify and size the critical path before considering edge cases.
In interviews, traffic estimation demonstrates your ability to think quantitatively about systems. Here's how to approach it effectively:
Step 1: Clarify the Scope (1-2 minutes)
"Before I estimate traffic, let me confirm a few things. Are we designing for the current scale or a 5-year projection? Is this a global service or regional? Are there known peak events we need to handle?"
This shows maturity—you're not jumping to assumptions.
Step 2: State Your Assumptions (1 minute)
"I'll assume we have 50 million DAU, with a DAU/MAU ratio of about 30%, typical of social platforms. Each user performs roughly 20 actions per day based on..."
State assumptions clearly. Interviewers may adjust them or confirm they're reasonable.
Step 3: Do the Math Out Loud (2-3 minutes)
Walk through your calculation step by step. Use round numbers for easier mental math.
```
// Sample interview calculation (spoken out loud)

"Let's size the traffic for this URL shortening service.

User Metrics:
- We'll assume 100 million MAU
- With ~30% DAU/MAU (utility app), that's 30 million DAU

User Behavior:
- Creating short URLs: 1 per user per day = 30M writes/day
- Redirecting (clicking short URLs): This is the read-heavy operation
- If each URL is clicked 100 times on average, that's 3 billion reads/day

The math:
- Writes: 30M / 86,400 ≈ 350 writes/sec average, ~1,000 peak
- Reads: 3B / 86,400 ≈ 35,000 reads/sec average, ~100,000 peak

Summary:
- Read:Write ratio is 100:1 (heavily read-optimized system)
- Peak RPS: ~100,000 reads/sec, ~1,000 writes/sec
- This tells us we need aggressive read caching and can use simpler write infrastructure"
```

Step 4: Validate with Sanity Checks
"Let me sanity check this: 100K RPS is about 8.6 billion requests per day. With 30M DAU, that's roughly 290 requests per user, which makes sense for 3 actions triggering ~100 redirects each. The numbers are consistent."
Step 5: Acknowledge Uncertainty
"These are rough estimates. In production, I'd validate with actual metrics and build with 2-3x headroom for unexpected peaks."
This demonstrates engineering maturity—you know estimates are directional, not precise.
Don't get stuck on exact numbers. If you calculate 127,345 RPS, just say '~130K RPS.' Interviewers care about your reasoning process and order-of-magnitude accuracy, not arithmetic precision. Spending time on precise calculation suggests you're missing the forest for the trees.
You've now learned the foundational skills of traffic estimation. Let's consolidate the key formulas and concepts:
| Formula | Usage |
|---|---|
| Daily Requests = DAU × Actions/User × Requests/Action | Total volume calculation |
| Average RPS = Daily Requests / 86,400 | Baseline throughput |
| Peak RPS = Average RPS × Peak Multiplier | Capacity planning |
| CCU ≈ DAU × 0.05 to 0.15 | Concurrent user estimation |
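The four formulas above combine into a single estimator. This sketch uses illustrative inputs (50M DAU, 20 actions/day, 10 API calls per action):

```python
# Sketch: the core traffic-estimation formulas in one function.
SECONDS_PER_DAY = 86_400

def estimate(dau: int, actions_per_user: float, requests_per_action: float,
             peak_multiplier: float = 3.0, ccu_ratio: float = 0.10) -> dict:
    """Apply the summary formulas: daily volume, average RPS, peak RPS, CCU."""
    daily = dau * actions_per_user * requests_per_action
    avg_rps = daily / SECONDS_PER_DAY
    return {
        "daily_requests": daily,
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * peak_multiplier,
        "concurrent_users": dau * ccu_ratio,
    }

result = estimate(50_000_000, 20, 10)
print(f"~{result['avg_rps'] / 1e3:.0f}K RPS average, "
      f"~{result['peak_rps'] / 1e3:.0f}K RPS peak, "
      f"~{result['concurrent_users'] / 1e6:.0f}M concurrent users")
```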
What's Next:
With traffic estimation mastered, we'll move to storage estimation—calculating how much data your system needs to store based on write patterns, retention policies, and data growth projections. Traffic drives storage: traffic tells you how many requests arrive; storage tells you how much data accumulates.
You now have a robust framework for estimating traffic in any system design scenario. Practice by estimating traffic for products you use daily—Twitter, Netflix, Uber, Slack. The more you practice, the more instinctive these calculations become.