In the heat of a system design interview—or during an urgent production incident—you don't have time to derive formulas from first principles. You need instant recall of key numbers and equations that let you reason quickly about systems.
This page is your reference toolkit. It consolidates everything we've covered into quick-reference formulas, provides the essential numbers every system designer should memorize, highlights common estimation mistakes, and offers practice problems to build your intuition.
Think of this as your mental cheat sheet—not for memorizing by rote, but for internalizing the patterns that experienced engineers use instinctively. After working through this material and practicing, these estimations will become second nature.
By the end of this page, you will have: (1) A quick-reference formula sheet for all estimation types, (2) Essential numbers to memorize, (3) Awareness of the most common estimation pitfalls, (4) Practice problems with worked solutions, (5) Confidence to estimate any system rapidly.
Here are all the core formulas you need, organized by estimation type:
Traffic Estimation Formulas:
```
TRAFFIC ESTIMATION FORMULAS
===========================

Daily Requests   = DAU × Actions/User × Requests/Action
Average RPS      = Daily Requests / 86,400
Peak RPS         = Average RPS × Peak Multiplier (typically 2-5x)
Concurrent Users ≈ DAU × 0.05 to 0.15 (5-15%)

Read:Write Ratio = Read Requests / Write Requests
Write RPS        = Total RPS / (1 + Read:Write Ratio)
Read RPS         = Total RPS - Write RPS
```
Storage Estimation Formulas:
```
STORAGE ESTIMATION FORMULAS
===========================

Raw Storage         = Objects/Day × Size/Object × Retention Days
Production Storage  = Raw × Replication (3x) × Overhead (1.5x)
Annual Storage      = Daily New Storage × 365
5-Year Storage      = Year 1 × (1 + Growth Rate)^5, accumulated over prior years
Index Overhead      = Raw Data × 0.2 to 0.5 (20-50%)
Compression Savings = Original × 0.7 to 0.9 (compressed size is 10-30% of original)
```
Bandwidth Estimation Formulas:
```
BANDWIDTH ESTIMATION FORMULAS
=============================

Bandwidth (bytes/sec) = RPS × Response Size (bytes)
Bandwidth (Gbps)      = (bytes/sec × 8) / 1,000,000,000
Daily Data            = RPS × Response Size × 86,400
Egress                = Total Bandwidth - Ingress
Origin Bandwidth      = Total × (1 - CDN Cache Hit Ratio)
Monthly Egress Cost   = GB × Rate (tiered pricing)
```
Server Estimation Formulas:
```
SERVER ESTIMATION FORMULAS
==========================

Base Servers     = Peak RPS / RPS per Server
Multi-AZ Servers = Base × (AZs / (AZs - 1))   # For N AZs, survive 1 failure
With Deployment  = Multi-AZ × 1.15 (15% overhead)
Production Total = With Deployment × 1.2 (20% spike headroom)

Little's Law: L = λ × W
  L = concurrent requests
  λ = arrival rate (RPS)
  W = average latency (seconds)
  ⇒ RPS = Concurrent Connections / Average Latency
```
Certain numbers appear so frequently in system design that having them memorized accelerates every estimation. Here's your essential list:
Time Constants:
| Period | Seconds | Useful For |
|---|---|---|
| 1 minute | 60 | Short-term rates |
| 1 hour | 3,600 | Hourly metrics |
| 1 day | 86,400 ≈ 100,000 | Round to 100K for easy math |
| 1 month | 2.5 million | Monthly volumes |
| 1 year | 31.5 million | Annual volumes |
| 1 million seconds | ~11.5 days | Sanity check |
| 1 billion seconds | ~31.7 years | Perspective |
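These constants are pure arithmetic, so you can verify them yourself. A minimal sketch (the constant names are mine, not from any library):

```python
# Sanity-check the time constants in the table above
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 -- round to 100K for mental math
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # ~2.6M -- round to 2.5M
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # ~31.5M

print(SECONDS_PER_DAY)                   # 86400
print(round(SECONDS_PER_YEAR / 1e6, 1))  # 31.5
print(round(1e9 / SECONDS_PER_YEAR, 1))  # 1 billion seconds ≈ 31.7 years
```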
Storage and Bandwidth Units:
| Unit | Bytes | Context |
|---|---|---|
| 1 KB | 1,000 | Small JSON, text messages |
| 1 MB | 1,000,000 | Photos, documents |
| 1 GB | 1 billion | Video hour (low quality) |
| 1 TB | 1 trillion | Small database |
| 1 PB | 1 quadrillion | Large scale storage |
| 1 Gbps | 125 MB/s | Bits to bytes (÷8) |
| 10 Gbps | 1.25 GB/s | Fast server NIC |
| 100 Gbps | 12.5 GB/s | Data center interconnect |
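A hypothetical helper (`humanize_bytes` is my name, not a standard function) that applies the decimal units from the table, handy for labeling estimation outputs:

```python
def humanize_bytes(n: float) -> str:
    """Label a byte count with the decimal (SI) units from the table above."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1000:
            return f"{n:.0f} {unit}"
        n /= 1000
    return f"{n:.0f} PB"

print(humanize_bytes(500))                # 500 B
print(humanize_bytes(5_000_000))          # 5 MB
print(humanize_bytes(2_000_000_000_000))  # 2 TB
```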
Latency Numbers:
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | Fastest possible |
| L2 cache reference | 7 ns | On-CPU |
| Main memory reference | 100 ns | DRAM access |
| SSD random read | 150 μs | 150,000 ns |
| SSD sequential 1 MB | 1 ms | Fast storage |
| HDD seek | 10 ms | Mechanical |
| Same datacenter RTT | 0.5 ms | AZ to AZ |
| Cross-region RTT | 50-150 ms | Coast to coast |
| Intercontinental RTT | 100-300 ms | US to Europe/Asia |
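These latency numbers compose: sum the operations on a request's path to get a rough end-to-end budget. A sketch with an assumed path (the 20 ms edge RTT and 5 ms of processing are my example values; the SSD figure is from the table):

```python
# Back-of-envelope latency budget for one request, in milliseconds
budget_ms = {
    "client_to_edge_rtt": 20.0,   # nearby PoP (assumed)
    "edge_to_origin_rtt": 50.0,   # cross-region hop
    "cache_lookup": 0.1,          # in-memory store
    "ssd_random_read": 0.15,      # 150 μs from the table
    "app_processing": 5.0,        # business logic (assumed)
}
total_ms = sum(budget_ms.values())
print(f"End-to-end budget: ~{total_ms:.0f} ms")  # ~75 ms, dominated by network RTTs
```

Note how the two network hops dwarf everything else; this is why caching close to users matters more than shaving microseconds off storage.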
Throughput Numbers:
| System | Throughput | Notes |
|---|---|---|
| Redis get/set | 100,000+ ops/sec | In-memory |
| PostgreSQL queries | 10,000-50,000 qps | Simple queries |
| PostgreSQL writes | 1,000-10,000 tps | With durability |
| Kafka throughput | 1M+ msgs/sec | Per partition |
| HTTP server (simple) | 10,000-50,000 RPS | Static/cached |
| HTTP server (complex) | 500-5,000 RPS | With business logic |
| Single thread CPU ops | 100M-1B ops/sec | Tight loops |
Notice how most numbers differ by powers of 10: 1ms vs 100ms vs 10s. 1KB vs 1MB vs 1GB. When estimating, round to the nearest power of 10. Being off by 2x doesn't matter; being off by 10x does.
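One way to make "2x doesn't matter, 10x does" concrete is to compare estimates to the nearest power of 10 (a small sketch; `same_order` is my own helper name):

```python
import math

def same_order(a: float, b: float) -> bool:
    """True when two positive estimates agree to the nearest power of 10."""
    return round(math.log10(a)) == round(math.log10(b))

print(same_order(100_000, 200_000))    # True  -- 2x off is fine
print(same_order(100_000, 1_000_000))  # False -- 10x off is a real error
```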
When you need to estimate quickly—under time pressure in interviews or during incidents—use these shortcut techniques:
Technique 1: Powers of 2 for Storage
Memory and storage often use powers of 2. Memorize these:
2^10 = 1,024 ≈ 1 thousand (1 KB)
2^20 ≈ 1 million (1 MB)
2^30 ≈ 1 billion (1 GB)
2^40 ≈ 1 trillion (1 TB)
2^50 ≈ 1 quadrillion (1 PB)
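The binary and decimal units drift apart as scale grows, but stay within estimation tolerance. A quick check:

```python
# Drift between binary (2^n) and decimal (10^m) units grows with scale
for label, exp2, exp10 in [("KB", 10, 3), ("MB", 20, 6), ("GB", 30, 9), ("TB", 40, 12)]:
    drift_pct = (2**exp2 - 10**exp10) / 10**exp10 * 100
    print(f"{label}: 2^{exp2} is {drift_pct:.1f}% above 10^{exp10}")
```

Even at TB scale the gap is only ~10%, well inside the 2x tolerance of back-of-envelope work.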
Technique 2: The 100K Seconds Trick
86,400 seconds per day ≈ 100,000 (10^5)
So for daily→per-second conversion:
```python
# Quick estimation using the 100K trick
def daily_to_rps(daily_count: int) -> float:
    """Convert a daily count to approximate RPS using the 100K trick."""
    return daily_count / 100_000

# Examples
print("Daily to RPS conversions (quick method):")
print(f"  1 billion/day → {daily_to_rps(1_000_000_000):,.0f} RPS")
print(f"  100 million/day → {daily_to_rps(100_000_000):,.0f} RPS")
print(f"  10 million/day → {daily_to_rps(10_000_000):,.0f} RPS")
print(f"  1 million/day → {daily_to_rps(1_000_000):,.0f} RPS")

# Verify accuracy against the exact 86,400 seconds/day
def accurate_conversion(daily_count: int) -> float:
    return daily_count / 86_400

print("Accuracy check:")
for daily in [1_000_000_000, 100_000_000, 10_000_000]:
    quick = daily_to_rps(daily)
    accurate = accurate_conversion(daily)
    error = abs(quick - accurate) / accurate * 100
    print(f"  {daily:,}/day: Quick={quick:,.0f}, Accurate={accurate:,.0f}, Error={error:.1f}%")
```
Technique 3: The 2.5 Million Seconds/Month
30 days × 86,400 ≈ 2.5 million seconds per month
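The same trick as daily→RPS works for monthly volumes. A minimal sketch (`monthly_to_rps` is my own helper name):

```python
def monthly_to_rps(monthly_count: float) -> float:
    """Approximate average RPS from a monthly volume (~2.5M seconds/month)."""
    return monthly_count / 2_500_000

print(monthly_to_rps(500_000_000))  # 200.0 -- e.g. 500M URL creations/month
```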
Technique 4: Bits/Bytes Quick Multiply
To convert Gbps to MB/s, divide by 8 and multiply by 1000:
Quick mental rule: Gbps → MB/s: multiply by 125
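Encoded as a tiny helper (name is mine), this rule reproduces the rows in the units table above:

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Gbps → MB/s: ×1000 to Mbps, ÷8 for bits→bytes, i.e. ×125."""
    return gbps * 125.0

print(gbps_to_mb_per_s(1))   # 125.0  -- matches the units table
print(gbps_to_mb_per_s(10))  # 1250.0 -- a 10 Gbps NIC moves 1.25 GB/s
```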
Technique 5: Order of Magnitude Validation
After any calculation, sanity check the result: does it land within a plausible order of magnitude when compared against a system you already know at similar scale?
In interviews, use round numbers: 100M not 97.3M, 1 billion not 1.073B. Round to 1, 2, 5, or 10 × powers of 10. This makes mental math fast while maintaining accuracy within the order of magnitude that matters.
Even experienced engineers make these estimation errors. Knowing the pitfalls helps you avoid them:
Mistake 1: Forgetting the Bits/Bytes Conversion. Network links are quoted in bits/sec while payloads are measured in bytes; dropping the ×8 understates bandwidth by a factor of 8.
Mistake 2: Using Average Instead of Peak. Capacity must be sized for peak load, typically 2-5x the daily average; sizing for the average guarantees an outage at rush hour.
Mistake 3: Ignoring Replication and Overhead. Raw data typically grows 3-5x in production once replication, indexes, and backups are included.
Mistake 4: Assuming Linear Scaling. Doubling servers rarely doubles throughput; coordination, contention, and shared dependencies add overhead.
Mistake 5: Forgetting About Time (Retention). Storage accumulates; multiply daily volume by the retention period, not just one day.
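To see how much the first two mistakes cost, here is a sketch with an assumed example workload (10K RPS, 100 KB responses, 2K RPS per server, values chosen by me for illustration):

```python
# Numbers behind Mistakes 1 and 2
rps, resp_kb = 10_000, 100
bytes_per_sec = rps * resp_kb * 1000

gbps_correct = bytes_per_sec * 8 / 1e9  # ×8 for bits: 8.0 Gbps
gbps_wrong = bytes_per_sec / 1e9        # forgot ×8: looks like only 1.0 Gbps

servers_for_average = rps / 2_000       # 5 servers -- melts at peak
servers_for_peak = rps * 3 / 2_000      # 15 servers with a 3x peak multiplier

print(gbps_correct, gbps_wrong, servers_for_average, servers_for_peak)
```

Either slip alone under-provisions by a large factor; together they compound.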
Don't calculate exact numbers like 127,345.72 RPS. This implies false precision when your inputs (DAU, actions/user) are estimates. Say '~130K RPS' or 'roughly 100K-150K RPS.' Ranges communicate appropriate confidence levels.
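One way to enforce this habit is to round every result to one significant figure and report it as a range. A sketch (`hedged_estimate` is my own helper name):

```python
import math

def hedged_estimate(x: float) -> str:
    """Present a positive estimate as an order-of-magnitude range."""
    mag = 10 ** math.floor(math.log10(x))
    low = math.floor(x / mag) * mag
    return f"roughly {low:,.0f}-{low + mag:,.0f}"

print(hedged_estimate(127_345.72))  # roughly 100,000-200,000
```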
Let's walk through a complete estimation for a Twitter-like social media platform to demonstrate how all the pieces fit together.
```python
"""
Complete Back-of-Envelope Estimation: Twitter Clone
===================================================
"""

# ===================
# 1. TRAFFIC ESTIMATION
# ===================

# User metrics
total_users = 500_000_000  # 500M registered users
dau = 200_000_000          # 200M DAU (40% engagement)
mau = 350_000_000          # 350M MAU
dau_mau_ratio = dau / mau  # ~57% stickiness

# User behavior
tweets_per_user_day = 0.5   # Average (many read-only users)
feed_views_per_user = 10    # Feed refreshes
likes_per_user = 20         # Likes given
profile_views_per_user = 5  # Profile views

# Operations per day
tweets_per_day = dau * tweets_per_user_day            # 100M tweets/day
feed_views_per_day = dau * feed_views_per_user        # 2B feed views/day
likes_per_day = dau * likes_per_user                  # 4B likes/day
profile_views_per_day = dau * profile_views_per_user  # 1B profile views/day

# API calls per operation
api_calls_per_tweet = 5       # Write + fan-out + notify
api_calls_per_feed_view = 25  # Multiple API calls
api_calls_per_like = 3        # Update + notify + record

# Total daily API calls
write_api_calls = tweets_per_day * api_calls_per_tweet
read_api_calls = (
    feed_views_per_day * api_calls_per_feed_view
    + profile_views_per_day * 15  # ~15 API calls per profile view
    + likes_per_day * api_calls_per_like
)
total_api_calls = write_api_calls + read_api_calls

# Convert to RPS
average_rps = total_api_calls / 86_400
peak_rps = average_rps * 3

print("=== TRAFFIC ESTIMATION ===")
print(f"DAU: {dau/1e6:.0f}M")
print(f"Daily API calls: {total_api_calls/1e12:.1f} trillion")
print(f"Average RPS: {average_rps/1e6:.1f}M")
print(f"Peak RPS: {peak_rps/1e6:.1f}M")
print(f"Read:Write ratio: {read_api_calls/write_api_calls:.0f}:1")

# ===================
# 2. STORAGE ESTIMATION
# ===================

# Tweet storage
tweet_size_bytes = 500  # Text + metadata
tweets_yearly = tweets_per_day * 365
tweet_storage_yearly = tweets_yearly * tweet_size_bytes

# Media storage (30% of tweets have media)
media_tweets_ratio = 0.30
avg_media_size_mb = 3  # All resolutions
media_storage_yearly = tweets_yearly * media_tweets_ratio * avg_media_size_mb * 1e6

# User data (500M users × 2KB each)
user_storage = total_users * 2000

# Engagement data (likes, retweets, follows)
engagement_events_daily = likes_per_day + tweets_per_day * 2  # likes + RTs
engagement_record_bytes = 32
engagement_storage_yearly = engagement_events_daily * 365 * engagement_record_bytes

# Total raw storage
total_raw_storage = (
    tweet_storage_yearly
    + media_storage_yearly
    + user_storage
    + engagement_storage_yearly
)

# With replication and overhead
storage_multiplier = 3 * 1.5  # 3x replication, 50% overhead
total_production_storage = total_raw_storage * storage_multiplier

print("=== STORAGE ESTIMATION ===")
print(f"Tweets/year: {tweets_yearly/1e9:.0f}B")
print(f"Tweet storage/year: {tweet_storage_yearly/1e15:.2f} PB")
print(f"Media storage/year: {media_storage_yearly/1e15:.2f} PB")
print(f"Total raw/year: {total_raw_storage/1e15:.2f} PB")
print(f"Total production/year: {total_production_storage/1e15:.1f} PB")

# ===================
# 3. BANDWIDTH ESTIMATION
# ===================

# Egress (downloads); feed views dominate egress
feed_response_kb = 50  # JSON, no embedded media
media_views_per_day = feed_views_per_day * 5  # 5 media items per feed view
avg_media_view_kb = 200  # Lazy loaded, optimized

egress_feed = feed_views_per_day * feed_response_kb * 1024
egress_media = media_views_per_day * avg_media_view_kb * 1024
total_daily_egress = egress_feed + egress_media

# Convert to bandwidth
egress_bps = (total_daily_egress * 8) / 86_400
egress_gbps = egress_bps / 1e9
peak_egress_gbps = egress_gbps * 3

# CDN impact
cdn_hit_ratio = 0.90
origin_egress_gbps = peak_egress_gbps * (1 - cdn_hit_ratio)

print("=== BANDWIDTH ESTIMATION ===")
print(f"Daily egress: {total_daily_egress/1e15:.2f} PB")
print(f"Average egress: {egress_gbps/1000:.1f} Tbps")
print(f"Peak egress: {peak_egress_gbps/1000:.1f} Tbps")
print(f"Origin (10% cache miss): {origin_egress_gbps:.0f} Gbps")

# ===================
# 4. SERVER ESTIMATION
# ===================

# Application servers (handle the API)
rps_per_app_server = 2000  # Mixed read/write
base_app_servers = peak_rps / rps_per_app_server

# Apply redundancy (3 AZs, survive 1)
app_servers_with_az = base_app_servers * 1.5
# Deployment overhead
app_servers_with_deploy = app_servers_with_az * 1.15
# Spike headroom
app_servers_final = app_servers_with_deploy * 1.2

# Cache layer (Redis)
hot_data_gb = 500  # Recently accessed tweets, trending
redis_instance_gb = 64
redis_instances = (hot_data_gb * 3) / redis_instance_gb  # 3x for replication

# Database (distributed)
# Assume DynamoDB/Cassandra style; size by write throughput
write_rps = peak_rps * 0.01  # 1% writes at peak
writes_per_db_node = 10_000
db_nodes = write_rps / writes_per_db_node * 3  # 3x replication

print("=== SERVER ESTIMATION ===")
print(f"App servers (base): {base_app_servers:,.0f}")
print(f"App servers (production): {app_servers_final:,.0f}")
print(f"Redis instances: {redis_instances:.0f}")
print(f"Database nodes: {db_nodes:.0f}")

# ===================
# 5. SUMMARY
# ===================
print(f"{'='*50}")
print("TWITTER CLONE ESTIMATION SUMMARY")
print(f"{'='*50}")
print(f"DAU: {dau/1e6:.0f} million")
print(f"Peak RPS: {peak_rps/1e6:.1f} million")
print(f"Annual storage: {total_production_storage/1e15:.1f} PB")
print(f"Peak bandwidth: {peak_egress_gbps/1000:.1f} Tbps")
print(f"Application servers: ~{round(app_servers_final, -2):,.0f}")
print(f"Cache instances: ~{round(redis_instances, -1):.0f}")
print(f"Database nodes: ~{round(db_nodes, -1):.0f}")
```
Practice builds intuition. Work through these problems before checking the solutions.
Problem 1: URL Shortener
Given: 500 million URL creations per month, 50 clicks per short URL on average, URLs retained forever (use 5 years for the calculation).
Calculate: write and read RPS (average and peak), 5-year storage, and peak bandwidth.
Problem 2: Video Streaming Service
Given: 10 million DAU, 2 hours watched per user per day, 5 Mbps average bitrate, ~10% of DAU watching concurrently.
Calculate: average and peak concurrent viewers, peak bandwidth, and daily data transfer.
Problem 3: Chat Application
Given: 50 million DAU, 100 messages per user per day, 500 bytes per message, 30-day message retention.
Calculate: message rate (average and peak), storage for the retention window, and peak concurrent WebSocket connections plus servers needed.
Work through these problems on paper before looking at the solutions. The act of struggling with the estimation builds the intuition you need for interviews. Set a timer for 5 minutes per problem.
Solution 1: URL Shortener
```python
# URL Shortener Solution

# Given
url_creations_month = 500_000_000
clicks_per_url = 50
retention_years = 5  # "forever" -- use 5 years for the calculation

# Traffic
writes_per_second = url_creations_month / (30 * 86_400)
reads_per_second = writes_per_second * clicks_per_url

print("URL Shortener Traffic:")
print(f"  Write RPS: {writes_per_second:,.0f} (×3 peak = {writes_per_second*3:,.0f})")
print(f"  Read RPS: {reads_per_second:,.0f} (×3 peak = {reads_per_second*3:,.0f})")
print(f"  Read:Write ratio: {clicks_per_url}:1")

# Storage
url_record_bytes = 200  # short_code + long_url + metadata
urls_in_5_years = url_creations_month * 12 * retention_years
storage_raw = urls_in_5_years * url_record_bytes
storage_production = storage_raw * 3 * 1.5  # replication + overhead

print("URL Shortener Storage (5 years):")
print(f"  URLs created: {urls_in_5_years/1e9:.0f}B")
print(f"  Raw storage: {storage_raw/1e12:.1f} TB")
print(f"  Production storage: {storage_production/1e12:.1f} TB")

# Bandwidth (reads return a 302 redirect, minimal payload)
redirect_response_bytes = 300
daily_bandwidth = reads_per_second * 86_400 * redirect_response_bytes
bandwidth_mbps = (reads_per_second * 3 * redirect_response_bytes * 8) / 1e6

print("URL Shortener Bandwidth:")
print(f"  Daily transfer: {daily_bandwidth/1e9:.1f} GB")
print(f"  Peak bandwidth: {bandwidth_mbps:.0f} Mbps")
```
Solution 2: Video Streaming Service
```python
# Video Streaming Solution

# Given
dau = 10_000_000
hours_per_user = 2
avg_bitrate_mbps = 5
concurrent_ratio = 0.10

# Concurrent viewers
concurrent_viewers = dau * concurrent_ratio
peak_concurrent = concurrent_viewers * 1.5  # Peak is 1.5x average

print("Video Streaming - Concurrent Viewers:")
print(f"  Average concurrent: {concurrent_viewers/1e6:.1f}M")
print(f"  Peak concurrent: {peak_concurrent/1e6:.1f}M")

# Bandwidth at peak
peak_bandwidth_mbps = peak_concurrent * avg_bitrate_mbps
peak_bandwidth_tbps = peak_bandwidth_mbps / 1e6

print("Video Streaming - Bandwidth:")
print(f"  Peak bandwidth: {peak_bandwidth_tbps:.1f} Tbps")

# Daily data transfer
total_viewing_hours = dau * hours_per_user
data_per_hour_gb = (avg_bitrate_mbps * 3600) / (8 * 1024)  # Mbps-hours to GB
daily_data_pb = (total_viewing_hours * data_per_hour_gb) / 1e6

print("Video Streaming - Daily Transfer:")
print(f"  Total viewing hours: {total_viewing_hours/1e6:.0f}M hours")
print(f"  Data per hour: {data_per_hour_gb:.1f} GB")
print(f"  Daily transfer: {daily_data_pb:.1f} PB")
```
Solution 3: Chat Application
```python
# Chat Application Solution

# Given
dau = 50_000_000
messages_per_user = 100
msg_size_bytes = 500
retention_days = 30

# Messages per second
daily_messages = dau * messages_per_user
messages_per_second = daily_messages / 86_400
peak_mps = messages_per_second * 3

print("Chat Application - Message Rate:")
print(f"  Daily messages: {daily_messages/1e9:.0f}B")
print(f"  Average MPS: {messages_per_second/1e3:.0f}K")
print(f"  Peak MPS: {peak_mps/1e3:.0f}K")

# Storage (30-day rolling window)
messages_retained = daily_messages * retention_days
raw_storage = messages_retained * msg_size_bytes
production_storage = raw_storage * 3 * 1.5

print("Chat Application - Storage:")
print(f"  Messages retained: {messages_retained/1e9:.0f}B")
print(f"  Raw storage: {raw_storage/1e12:.1f} TB")
print(f"  Production storage: {production_storage/1e12:.1f} TB")

# WebSocket connections
# Assume 60% of DAU online at peak, each holding 1 connection
concurrent_connections = dau * 0.60
# Distribute across servers (each handles ~50K connections)
ws_servers = concurrent_connections / 50_000

print("Chat Application - Connections:")
print(f"  Peak concurrent connections: {concurrent_connections/1e6:.0f}M")
print(f"  WebSocket servers needed: {ws_servers:.0f}")
```
Print this mental checklist for quick reference before interviews:
Step 1: Clarify Requirements (1-2 min)
Step 2: Key Assumptions (1 min)
Step 3: Traffic Estimation (2-3 min)
Step 4: Storage Estimation (2-3 min)
Step 5: Bandwidth (1-2 min)
Step 6: Server Count (2-3 min)
Step 7: Sanity Check (30 sec)
| Remember | Value | Usage |
|---|---|---|
| Seconds/day | ~100,000 | Easy division |
| Seconds/month | ~2.5M | Monthly to RPS |
| Peak multiplier | 3x | Default assumption |
| Storage overhead | 3-5x | Replication + indexes |
| CDN hit ratio | 90-95% | Reduces origin load |
| AZ redundancy | 1.5x | Survive 1 AZ failure |
| vCPU benchmark | 1-5K RPS | Simple API servers |
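The whole cheat sheet chains together into a 60-second estimate. A sketch using only the constants above, with assumed example inputs (100M DAU, 20 actions/user, 2K RPS per server are my illustrative choices):

```python
# End-to-end mini estimate from the cheat-sheet constants
dau, actions_per_user = 100_000_000, 20

daily_requests = dau * actions_per_user  # 2B requests/day
avg_rps = daily_requests / 100_000       # seconds/day ≈ 100K
peak_rps = avg_rps * 3                   # default peak multiplier
servers = peak_rps / 2_000 * 1.5 * 1.2   # per-server RPS, AZ redundancy, headroom

print(f"~{avg_rps:,.0f} avg RPS, ~{peak_rps:,.0f} peak RPS, ~{servers:,.0f} servers")
```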
The goal of back-of-envelope estimation isn't precision—it's demonstrating you can reason quantitatively about systems. Interviewers are evaluating your thought process and awareness of scale, not your arithmetic. Stay calm, state assumptions clearly, and show your work.
Congratulations! You've completed the comprehensive guide to back-of-envelope estimation. Let's consolidate what you've learned:
The Estimation Mindset:
Order of magnitude matters, not precision — 100K vs 1M is significant; 100K vs 130K is not
State assumptions explicitly — Makes your reasoning transparent and adjustable
Always apply safety factors — Peak traffic, redundancy, and operational headroom
Sanity check against reality — Compare to known systems; if Twitter needs 1000 servers, your 100M-user system shouldn't need 10,000
Practice makes intuitive — After estimating 50 systems, you'll have instant intuition for scale
You now possess the quantitative skills that separate junior engineers from senior ones. These estimation abilities apply to every system design interview and every capacity planning conversation in your career. Next, you'll learn how to take these estimated requirements and synthesize them into high-level system designs.