Instagram ingests hundreds of millions of new photos and videos every day. WhatsApp processes roughly 100 billion messages daily. Netflix stores thousands of hours of video in multiple resolutions across global data centers. Behind these staggering numbers lies a fundamental system design question: how much storage do we actually need?
Storage estimation is where theoretical capacity meets economic reality. Unlike traffic—which is transient—storage accumulates. Every message sent, every photo uploaded, every log line written occupies space until it is explicitly deleted. Underestimate traffic and responses get slower until you scale out; underestimate storage and disks fill, writes fail, and you face a catastrophic outage.
More critically, storage decisions are hard to reverse. Choosing the wrong database or storage tier is expensive to fix. Data migrations at scale can take months. Storage estimation isn't just about calculating numbers—it's about making architectural decisions with multi-year consequences.
By the end of this page, you will be able to: (1) Calculate storage requirements for different data types, (2) Project storage growth over multiple years, (3) Understand replication and backup overhead, (4) Optimize storage costs through tiered strategies, (5) Apply these principles in system design interviews.
Storage estimation follows a systematic framework. Every data point in your system flows through this chain:
The Storage Equation:
Total Storage = Objects Created × Size per Object × Retention Period × Replication Factor × Overhead
Let's decompose each component:
Objects Created: How many new data items are written per time unit? For Twitter, this is tweets per day. For Netflix, this is new video hours per month. For a banking system, this is transactions per day.
Size per Object: How large is each item? A tweet is a few kilobytes including metadata. A Netflix movie is potentially terabytes across all quality levels. Accurate size estimation requires understanding the data model.
Retention Period: How long do you keep data? Session logs might be kept for 30 days. Financial transactions for 7 years (regulatory). Social media posts forever (until user deletion).
Replication Factor: How many copies exist? Production databases typically replicate 3x. Backups add more. Cross-region redundancy doubles again.
Overhead: Indexes, metadata, filesystem overhead, and operational headroom. Typically 20-50% on top of raw data.
| Factor | Typical Multiplier | Reason |
|---|---|---|
| Database replication | 3x | Primary + 2 replicas for HA |
| Cross-region redundancy | 2x | DR in secondary region |
| Backup copies | 1.5-2x | Daily/weekly/monthly backups |
| Index overhead | 1.2-1.5x | B-tree indexes, secondary indexes |
| Filesystem overhead | 1.1-1.2x | Block allocation, metadata |
| Operational headroom | 1.3x | 30% free space for operations |
These factors multiply, not add. If your raw data is 100TB: 100TB × 3 (replication) × 2 (DR) × 1.5 (backups) × 1.2 (indexes) × 1.3 (headroom) = 1,404 TB ≈ 1.4 PB. Your 100TB of 'data' requires 1.4PB of actual storage infrastructure.
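The multiplier chain above is easy to encode as a helper. This is a minimal sketch; the default factor values are the illustrative numbers from the table, not universal constants:

```python
# Hedged sketch: compose the storage multipliers from the table above.
# Defaults are illustrative values, not universal constants.
def actual_storage_tb(raw_tb: float,
                      replication: float = 3.0,
                      cross_region: float = 2.0,
                      backups: float = 1.5,
                      indexes: float = 1.3,
                      headroom: float = 1.2) -> float:
    """Multiply (never add) every overhead factor onto the raw footprint."""
    return raw_tb * replication * cross_region * backups * indexes * headroom

print(f"{actual_storage_tb(100):,.0f} TB")  # 100 TB raw -> ~1,404 TB ≈ 1.4 PB
```

Because the factors compound, trimming any single one (say, dropping cross-region DR for non-critical data) cuts the final footprint proportionally.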
Different data types have vastly different storage characteristics. A senior engineer intuitively knows approximate sizes for common objects. Here's your reference guide:
| Data Type | Typical Size | Size Range | Storage Considerations |
|---|---|---|---|
| User ID (UUID) | 16-36 bytes | 16B binary, 36B string | Use binary UUIDs for ~55% space savings |
| Integer ID | 4-8 bytes | int32 vs int64 | int64 for >2B records |
| Timestamp | 8 bytes | 4-8 bytes | Unix epoch (4B) or precise datetime (8B) |
| Short text (username) | 20-50 bytes | Variable | VARCHAR, not fixed CHAR |
| Medium text (tweet) | 300-500 bytes | 140-280 chars + metadata | UTF-8 encoding varies by language |
| Long text (article) | 5-50 KB | Variable | Consider compression |
| JSON document | 1-10 KB | Variable | JSONB more compact than text JSON |
| Thumbnail image | 10-50 KB | Variable | Aggressive compression |
| Standard photo | 2-5 MB | 1-20 MB | Quality-dependent |
| HD video (1 min) | 50-150 MB | Variable | Highly codec-dependent |
| 4K video (1 min) | 200-500 MB | Variable | Multiple formats for adaptive streaming |
| Log entry | 200-500 bytes | Variable | Structured logs more compact |
| Metric data point | 8-32 bytes | Variable | Time-series optimized storage |
Character Encoding Matters:
Character size varies by encoding:
- ASCII: 1 byte per character
- UTF-8: 1 byte for ASCII, 2 bytes for most accented European scripts, 3 bytes for CJK characters, 4 bytes for emoji
- UTF-16: 2 or 4 bytes per character
A global platform with multi-language support should assume an average of 2 bytes per character for text content.
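You can verify these sizes directly in Python; the sample strings below are arbitrary examples:

```python
# The same count of "characters" occupies very different byte counts
# under UTF-8 depending on the script.
samples = [
    ("hello", "ASCII"),
    ("héllo", "accented Latin"),
    ("こんにちは", "Japanese"),
    ("😀😀😀😀😀", "emoji"),
]
for text, label in samples:
    encoded = text.encode("utf-8")
    print(f"{label}: {len(text)} chars -> {len(encoded)} bytes")
```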
Metadata Overhead:
Every object has metadata beyond its content: identifiers, timestamps, denormalized counters, flags, and foreign-key references.
A "simple" 280-character tweet carries roughly 200 additional bytes of metadata (see the breakdown below).
```python
# Detailed size calculation for a single tweet

class Tweet:
    """
    Tweet storage size breakdown.
    All sizes in bytes.
    """
    # Core identifiers
    tweet_id: int = 8           # int64
    user_id: int = 8            # int64

    # Content
    text_max: int = 560         # 280 chars × ~2 bytes (UTF-8 average)

    # Timestamps
    created_at: int = 8         # datetime
    updated_at: int = 8         # datetime

    # Engagement counters (denormalized for read performance)
    like_count: int = 4         # int32
    retweet_count: int = 4      # int32
    reply_count: int = 4        # int32
    quote_count: int = 4        # int32

    # Metadata
    language_code: int = 3      # 'en', 'es', 'jp', etc.
    source_app: int = 50        # 'Twitter for iPhone'

    # References
    reply_to_id: int = 8        # Nullable - original tweet ID
    quoted_tweet_id: int = 8    # Nullable - quoted tweet ID

    # Media references (actual media stored separately)
    media_ids: int = 32         # Up to 4 media items × 8 bytes

    # Location (optional)
    geo_lat: int = 8            # double
    geo_lng: int = 8            # double
    place_id: int = 24          # String reference

    # Flags
    is_sensitive: int = 1       # boolean
    is_reply: int = 1           # boolean
    has_media: int = 1          # boolean

    def total_size(self) -> int:
        return (
            self.tweet_id + self.user_id + self.text_max
            + self.created_at + self.updated_at
            + self.like_count + self.retweet_count
            + self.reply_count + self.quote_count
            + self.language_code + self.source_app
            + self.reply_to_id + self.quoted_tweet_id
            + self.media_ids
            + self.geo_lat + self.geo_lng + self.place_id
            + self.is_sensitive + self.is_reply + self.has_media
        )

tweet = Tweet()
print(f"Single tweet size: {tweet.total_size()} bytes ≈ {tweet.total_size()/1024:.2f} KB")

# Scale calculation
tweets_per_day = 500_000_000  # 500 million tweets/day
raw_daily_storage = tweets_per_day * tweet.total_size()
print(f"Daily tweet storage: {raw_daily_storage / (1024**4):.2f} TB (raw)")
print(f"With 3x replication: {raw_daily_storage * 3 / (1024**4):.2f} TB")
print(f"Yearly (365 days): {raw_daily_storage * 365 / (1024**5):.2f} PB")
```

Databases add significant overhead beyond raw data.
Understanding these overheads is crucial for accurate estimation.
Index Overhead:
Indexes trade space for query speed. Each B-tree index typically adds 10-30% of the table's size, and most production tables carry several.
Example: a users table with 100M rows × 500 bytes = 50GB raw data. Three indexes (a primary key, a unique email index, and a created_at index) might together add roughly 23GB.
Total: 50GB + 23GB = 73GB (46% overhead just from indexes)
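A quick sketch of where such a 46% figure might come from. The per-index fractions here are assumptions for illustration, not measurements:

```python
# Index overhead estimate for a hypothetical 100M-row users table.
# Per-index fractions are illustrative assumptions.
rows, row_bytes = 100_000_000, 500
raw_gb = rows * row_bytes / 1e9          # 50 GB of raw data

index_fraction_of_table = {
    "primary key (B-tree)": 0.15,
    "unique email index": 0.15,
    "created_at index": 0.16,
}
index_gb = raw_gb * sum(index_fraction_of_table.values())
print(f"Raw: {raw_gb:.0f} GB, indexes: {index_gb:.0f} GB, "
      f"total: {raw_gb + index_gb:.0f} GB")
```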
Write Amplification:
When you write 1KB of data, the database might write 10KB or more: the row itself, a write-ahead log entry, an update to every index on the table, copies shipped to each replica, and later rewrites during compaction or vacuuming.
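A back-of-envelope sketch of that fan-out. Every amplification factor here is an assumed, order-of-magnitude illustration, not a benchmark:

```python
# Order-of-magnitude write amplification for a 1 KB logical write.
# All factors below are assumed for illustration.
payload_kb = 1.0
extra_writes_kb = {
    "write-ahead log entry": 1.0,       # full record logged before commit
    "3 secondary index updates": 0.9,   # ~0.3 KB per index entry (assumed)
    "replication to 2 standbys": 2.0,   # payload shipped to each replica
    "compaction/vacuum rewrite": 2.0,   # row rewritten during maintenance
}
total_kb = payload_kb + sum(extra_writes_kb.values())
print(f"1 KB logical write -> ~{total_kb:.1f} KB of physical writes")
```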
| Database Type | Storage Efficiency | Index Overhead | Best For |
|---|---|---|---|
| PostgreSQL | High (TOAST compression) | 10-30% | General purpose, structured data |
| MySQL (InnoDB) | Medium | 15-40% | OLTP workloads |
| MongoDB | Medium (BSON) | 20-50% | Flexible schemas |
| Cassandra | Low (replication) | 5-15% | Write-heavy, wide-column |
| Redis | Low (in-memory) | 50-100% | Caching, sessions |
| ClickHouse | Very High (columnar) | 5-10% | Analytics, time-series |
| Elasticsearch | Low | 100-300% | Full-text search |
Elasticsearch Special Case:
Elasticsearch deserves special mention because its storage overhead often surprises engineers: the inverted index, doc values, and the stored `_source` document each keep their own representation of your data, and replica shards multiply all of it. A total footprint of 2-4x the raw data is common.
DynamoDB/Cassandra Distribution:
Distributed databases spread data across partitions, and each partition carries its own indexes, metadata, and (in Cassandra's case) tombstones for deleted rows. Partition keys rarely distribute perfectly, so some nodes hold noticeably more than the average.
When sizing, account for partition overhead and potential imbalance.
Text and JSON compress extremely well—often 5-10x reduction. Enable compression for archival storage. However, compression increases CPU usage. For hot data with frequent access, the CPU cost may outweigh storage savings. Compress cold data aggressively; keep hot data uncompressed for performance.
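A quick experiment with Python's built-in zlib shows why repetitive text and JSON compress so well; the exact ratio depends on the data, so no specific figure is claimed here:

```python
import json
import zlib

# Repetitive structured logs — typical of what lands in archival storage.
records = [
    {"user_id": i, "event": "page_view", "path": "/home", "ts": 1700000000 + i}
    for i in range(1000)
]
raw = json.dumps(records).encode("utf-8")
compressed = zlib.compress(raw, level=6)
print(f"Raw: {len(raw):,} bytes, compressed: {len(compressed):,} bytes, "
      f"ratio: {len(raw) / len(compressed):.1f}x")
```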
Media typically dominates storage in consumer applications. A single 4K video can consume more storage than millions of text records.
Image Storage:
Images are stored at multiple sizes for different use cases: a small thumbnail for lists, a medium rendition for feeds, a full-resolution version for detail views, and often the original upload.
Total storage per image = sum of all versions ≈ 3-5 MB on average.
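Summing assumed rendition sizes lands in that 3-5 MB range. The per-version sizes below are illustrative and vary with format and quality settings:

```python
# Assumed rendition sizes for one uploaded photo (KB); values are
# illustrative, not measurements.
versions_kb = {
    "thumbnail (150px)": 20,
    "feed (1080px)": 200,
    "full resolution": 2500,
    "original upload": 1500,
}
total_mb = sum(versions_kb.values()) / 1024
print(f"Storage per photo, all versions: {total_mb:.1f} MB")
```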
Video Storage:
Video requires multiple renditions for adaptive streaming (HLS/DASH): roughly 1.5 MB/min at 144p, 3 at 240p, 4 at 360p, 8 at 480p, 22 at 720p, 45 at 1080p, and 150 at 4K.
Total storage per minute of video (all qualities) ≈ 230 MB/min
```python
# Storage calculation for a YouTube-like platform

# Video upload assumptions
hours_uploaded_per_minute = 500  # YouTube actual stat: 500 hours/min
minutes_per_day = 60 * 24

# Minutes of video uploaded daily
video_minutes_daily = hours_uploaded_per_minute * 60 * minutes_per_day
print(f"Video minutes uploaded daily: {video_minutes_daily:,}")

# Storage per minute of video (all quality levels)
storage_per_minute_mb = {
    "144p": 1.5,
    "240p": 3,
    "360p": 4,
    "480p": 8,
    "720p": 22,
    "1080p": 45,
    "1440p": 90,
    "2160p (4K)": 150,
}

# Not all videos are encoded at all qualities
# Assume average encoding profile
average_mb_per_minute = 100  # Weighted average

# Daily storage (raw)
daily_storage_tb = video_minutes_daily * average_mb_per_minute / (1024 * 1024)
print(f"Daily raw video storage: {daily_storage_tb:,.0f} TB")

# With CDN distribution (multiple copies across edge locations)
cdn_copies = 3          # Minimum copies for global coverage
# Plus origin storage with replication
origin_replication = 3

# Actual storage (origin + some CDN)
# CDN typically caches popular 20% of content
cdn_cached_percentage = 0.20

total_daily = (daily_storage_tb * origin_replication
               + daily_storage_tb * cdn_cached_percentage * cdn_copies)
print(f"Daily storage with replication: {total_daily:,.0f} TB")
print(f"Annual storage growth: {total_daily * 365 / 1000:,.1f} PB/year")
```

| Media Type | Typical Size | Storage Strategy |
|---|---|---|
| Profile photo | 500 KB total (all sizes) | Cache aggressively, rarely changes |
| Social media photo | 3-5 MB (all sizes) | Hot storage for recent, cold for old |
| Short video (15 sec) | 50-100 MB (all qualities) | CDN caching, adaptive streaming |
| Standard video (10 min) | 1-3 GB (all qualities) | Tiered storage by view count |
| Movie (2 hours) | 20-50 GB (all qualities) | Origin + edge caching |
| User-generated document | 100 KB - 10 MB | Deduplicated storage |
| Audio track (3 min) | 10-30 MB (all qualities) | Cache popular tracks |
Many platforms see significant duplicate content—the same meme might be uploaded thousands of times. Content-addressable storage (using a hash of the content as the key) can reduce storage by 20-40% for UGC platforms. This is why content-addressed systems like IPFS, and deduplication strategies generally, matter at scale.
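A minimal sketch of the idea: the blob key is the SHA-256 of the content, so identical uploads store one physical copy. `BlobStore` is a hypothetical toy, not a real library:

```python
import hashlib

class BlobStore:
    """Toy content-addressable store: key = SHA-256 of the bytes."""
    def __init__(self):
        self.blobs = {}      # content hash -> bytes (stored once)
        self.refcount = {}   # content hash -> number of logical uploads

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data  # only the first upload pays for storage
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

store = BlobStore()
meme = b"the same image bytes uploaded over and over"
for _ in range(1000):
    store.put(meme)
print(f"Logical uploads: 1000, physical blobs stored: {len(store.blobs)}")
```

Deletion becomes reference counting: the blob is only reclaimed when its last logical reference goes away.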
Storage planning requires looking years into the future. You can't simply provision petabytes of new capacity or migrate storage systems mid-year when capacity runs out—lead times are measured in months.
The Compound Growth Formula:
Storage(year N) = Current Storage × (1 + Growth Rate)^N + Cumulative New Data
But this is simplistic. Real storage growth compounds along several axes: user growth, per-user engagement growth, growth in object size (e.g., higher-resolution media), and changes to retention policy.
Modeling Growth Rates:
```python
# 5-Year storage projection model

from dataclasses import dataclass
from typing import List

@dataclass
class YearlyProjection:
    year: int
    dau: int
    objects_per_user_day: float
    object_size_bytes: int
    retention_days: int
    new_storage_pb: float
    cumulative_storage_pb: float

def project_storage(
    initial_dau: int,
    dau_growth_rate: float,
    engagement_growth_rate: float,
    initial_objects_per_user: float,
    object_size_bytes: int,
    retention_days: int,
    years: int
) -> List[YearlyProjection]:
    projections = []
    cumulative_storage = 0

    for year in range(1, years + 1):
        # Calculate metrics for this year
        dau = int(initial_dau * (1 + dau_growth_rate) ** (year - 1))
        objects_per_user = (initial_objects_per_user
                            * (1 + engagement_growth_rate) ** (year - 1))

        # Annual data creation
        daily_objects = dau * objects_per_user
        annual_objects = daily_objects * 365
        annual_storage_bytes = annual_objects * object_size_bytes
        annual_storage_pb = annual_storage_bytes / (1024 ** 5)

        # Account for data retention
        # If retention is 365 days, we keep 1 year of data
        # If retention is unlimited, cumulative grows indefinitely
        if retention_days >= 365:
            cumulative_storage += annual_storage_pb
        else:
            # Rolling retention - only keep retention_days worth
            daily_bytes = dau * objects_per_user * object_size_bytes
            cumulative_storage = (daily_bytes * retention_days) / (1024 ** 5)

        projections.append(YearlyProjection(
            year=year,
            dau=dau,
            objects_per_user_day=objects_per_user,
            object_size_bytes=object_size_bytes,
            retention_days=retention_days,
            new_storage_pb=annual_storage_pb,
            cumulative_storage_pb=cumulative_storage
        ))

    return projections

# Example: Social media platform with photos
projections = project_storage(
    initial_dau=50_000_000,             # 50M DAU year 1
    dau_growth_rate=0.25,               # 25% YoY user growth
    engagement_growth_rate=0.10,        # 10% more photos per user each year
    initial_objects_per_user=3,         # 3 photos/day initially
    object_size_bytes=3 * 1024 * 1024,  # 3MB per photo (all sizes)
    retention_days=36500,               # Keep forever (100 years)
    years=5
)

print("5-Year Storage Projection (Photo Platform)")
print("=" * 70)
for p in projections:
    print(f"Year {p.year}: DAU={p.dau/1e6:.0f}M | "
          f"Photos/user={p.objects_per_user_day:.1f} | "
          f"New={p.new_storage_pb:.1f}PB | "
          f"Total={p.cumulative_storage_pb:.1f}PB")
```

| Year | DAU | New Data (PB) | Total Data (PB) | Storage Cost (est) |
|---|---|---|---|---|
| Year 1 | 50M | 153 PB | 153 PB | $3.7M/month |
| Year 2 | 62.5M | 210 PB | 363 PB | $8.8M/month |
| Year 3 | 78.1M | 289 PB | 653 PB | $15.7M/month |
| Year 4 | 97.7M | 398 PB | 1,050 PB | $25.3M/month |
| Year 5 | 122M | 547 PB | 1,597 PB | $38.5M/month |
Storage costs grow exponentially while revenue typically grows linearly or sub-linearly. A platform adding ~150PB in Year 1 is adding ~550PB a year by Year 5. Without tiered storage strategies (moving cold data to cheaper tiers), storage costs can consume unsustainable portions of revenue.
Not all data deserves the same storage class. Hot data needs fast access; cold data just needs to exist. Tiered storage is key to controlling costs at scale.
The Storage Temperature Model:
Hot Storage: Frequently accessed, low latency required (<10ms). SSDs, high-IOPS databases. Most expensive.
Warm Storage: Occasionally accessed, moderate latency acceptable (<100ms). HDDs, standard cloud storage.
Cold Storage: Rarely accessed, high latency acceptable (<1 hour). Archive storage, tape.
Glacier/Archive: Almost never accessed, retrieval takes hours. Compliance and disaster recovery.
| Tier | Use Case | Latency | Cost per GB/month | Retrieval Cost |
|---|---|---|---|---|
| S3 Standard | Frequent access | <10ms | $0.023 | Free |
| S3 Intelligent-Tiering | Variable access | <10ms | $0.0225 | Free |
| S3 Standard-IA | Infrequent access | <10ms | $0.0125 | $0.01/GB |
| S3 One Zone-IA | Recreatable data | <10ms | $0.01 | $0.01/GB |
| S3 Glacier Instant | Rare access | <10ms | $0.004 | $0.03/GB |
| S3 Glacier Flexible | Archive | Minutes-hours | $0.0036 | $0.03-0.05/GB |
| S3 Glacier Deep Archive | Long-term archive | 12-48 hours | $0.00099 | $0.02/GB |
Automatic Tiering Strategies:
Implement lifecycle policies based on access patterns:
Policy: Social Media Photos
- Days 0-30: S3 Standard (frequently viewed)
- Days 31-90: S3 Standard-IA (occasional viewing)
- Days 91-365: S3 Glacier Instant (rare viewing)
- Days 365+: S3 Glacier Deep Archive (almost never)
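The policy above can be expressed as an S3 lifecycle configuration (the shape boto3's `put_bucket_lifecycle_configuration` accepts). The bucket prefix and rule ID below are made-up examples:

```python
# Hedged sketch of the photo lifecycle policy as an S3 lifecycle
# configuration; prefix and rule ID are illustrative.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "photo-tiering",
            "Status": "Enabled",
            "Filter": {"Prefix": "photos/"},
            "Transitions": [
                {"Days": 31, "StorageClass": "STANDARD_IA"},
                {"Days": 91, "StorageClass": "GLACIER_IR"},
                {"Days": 366, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

days = [t["Days"] for t in lifecycle_configuration["Rules"][0]["Transitions"]]
print(days)  # transitions must be in increasing order: [31, 91, 366]
```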
The 80/20 Rule of Storage:
In most systems, roughly 80% of reads hit the newest or most popular 20% of the data, while content older than about 90 days is touched rarely—often less than once a month.
By tiering aggressively, you can reduce costs by 60-80% while maintaining user experience for the actively accessed content.
```python
# Calculate tiered storage savings

class StorageTierAnalysis:
    def __init__(self, total_storage_pb: float, monthly_growth_pb: float):
        self.total_storage_pb = total_storage_pb
        self.monthly_growth_pb = monthly_growth_pb

    def calculate_flat_cost(self) -> float:
        """All data in S3 Standard"""
        cost_per_gb = 0.023
        gb = self.total_storage_pb * 1024 * 1024  # PB to GB
        return gb * cost_per_gb

    def calculate_tiered_cost(self) -> float:
        """
        Distribution:
        - 15% Hot (last 30 days of new data)
        - 25% Warm (31-90 days)
        - 30% Cold (91-365 days)
        - 30% Archive (365+ days)
        """
        gb = self.total_storage_pb * 1024 * 1024
        hot_pct, warm_pct, cold_pct, archive_pct = 0.15, 0.25, 0.30, 0.30

        hot_cost = gb * hot_pct * 0.023          # S3 Standard
        warm_cost = gb * warm_pct * 0.0125       # S3 Standard-IA
        cold_cost = gb * cold_pct * 0.004        # Glacier Instant
        archive_cost = gb * archive_pct * 0.001  # Glacier Deep Archive

        return hot_cost + warm_cost + cold_cost + archive_cost

    def savings_analysis(self) -> dict:
        flat = self.calculate_flat_cost()
        tiered = self.calculate_tiered_cost()
        savings = flat - tiered
        savings_pct = (savings / flat) * 100

        return {
            "flat_cost_monthly": flat,
            "tiered_cost_monthly": tiered,
            "monthly_savings": savings,
            "savings_percentage": savings_pct,
            "annual_savings": savings * 12
        }

# Example: 100PB photo storage platform
analysis = StorageTierAnalysis(total_storage_pb=100, monthly_growth_pb=5)
results = analysis.savings_analysis()

print("Storage Cost Analysis: 100PB Photo Platform")
print("=" * 50)
print(f"Flat pricing (all S3 Standard): ${results['flat_cost_monthly']:,.0f}/month")
print(f"Tiered pricing: ${results['tiered_cost_monthly']:,.0f}/month")
print(f"Monthly savings: ${results['monthly_savings']:,.0f}")
print(f"Savings percentage: {results['savings_percentage']:.1f}%")
print(f"Annual savings: ${results['annual_savings']/1e6:.1f}M")
```

For a 100PB storage footprint, proper tiering saves nearly $20 million annually versus flat S3 Standard pricing.
This is why every major platform has dedicated storage infrastructure teams focused on data lifecycle management.
Let's walk through complete storage estimations for three different system types:
Example 1: URL Shortening Service (Like bit.ly)
```python
# URL Shortening Service Storage Estimation

# Service assumptions
daily_url_creations = 100_000_000  # 100M URLs created/day
service_lifespan_years = 10        # URLs never expire

# Data model per URL
url_record = {
    "short_code": 7,      # 7 character code
    "original_url": 200,  # Average URL length
    "user_id": 8,         # int64 (nullable for anonymous)
    "created_at": 8,      # timestamp
    "click_count": 4,     # int32
    "last_clicked": 8,    # timestamp (nullable)
    "metadata": 50,       # custom tracking params, title
}
bytes_per_url = sum(url_record.values())
print(f"Bytes per URL: {bytes_per_url}")

# Analytics data (per click)
click_record = {
    "click_id": 8,
    "short_code": 7,
    "timestamp": 8,
    "ip_hash": 16,      # Anonymized
    "user_agent": 100,
    "referer": 100,
    "country": 2,
    "device_type": 1,
}
bytes_per_click = sum(click_record.values())
print(f"Bytes per click: {bytes_per_click}")

# Assume each URL gets clicked 50 times on average
clicks_per_url = 50

# Daily storage
daily_url_storage = daily_url_creations * bytes_per_url
daily_click_storage = daily_url_creations * clicks_per_url * bytes_per_click
daily_total = daily_url_storage + daily_click_storage

print(f"Daily URL storage: {daily_url_storage / 1e9:.1f} GB")
print(f"Daily click storage: {daily_click_storage / 1e9:.1f} GB")
print(f"Daily total: {daily_total / 1e9:.1f} GB")

# With replication and overhead (3x replication, 1.5x indexes/overhead)
storage_multiplier = 3 * 1.5
daily_actual = daily_total * storage_multiplier

# 10-year projection
ten_year_storage = daily_actual * 365 * service_lifespan_years
print(f"10-year storage: {ten_year_storage / 1e15:.1f} PB")
```

Example 2: Chat Application (Like Slack/Discord)
```python
# Chat Application Storage Estimation

# User assumptions
monthly_active_users = 20_000_000  # 20M MAU
dau_mau_ratio = 0.65               # High engagement
daily_active_users = monthly_active_users * dau_mau_ratio

# Message patterns
messages_per_user_per_day = 40  # Active messengers
direct_message_ratio = 0.4      # 40% DMs, 60% channels

# Message data model
text_message = {
    "message_id": 16,        # Snowflake ID
    "channel_id": 16,        # Where sent
    "user_id": 16,
    "content": 500,          # Average message (incl. emojis, links)
    "created_at": 8,
    "edited_at": 8,
    "attachments_meta": 50,  # References to files
    "reactions_count": 4,
    "is_pinned": 1,
    "thread_id": 16,         # If reply
}
bytes_per_message = sum(text_message.values())

# File attachments (images, documents)
messages_with_attachments_ratio = 0.15  # 15% have attachments
average_attachment_size_mb = 2

# Reactions (separate table for many-to-many)
reactions_per_message = 0.5  # Average reactions
reaction_record_bytes = 32   # message_id + user_id + emoji

# Daily calculations
daily_messages = daily_active_users * messages_per_user_per_day
daily_text_storage = daily_messages * bytes_per_message
daily_attachment_storage = (daily_messages * messages_with_attachments_ratio
                            * average_attachment_size_mb * 1e6)
daily_reaction_storage = daily_messages * reactions_per_message * reaction_record_bytes

print(f"Daily messages: {daily_messages/1e6:.0f}M")
print(f"Daily text storage: {daily_text_storage/1e9:.1f} GB")
print(f"Daily attachment storage: {daily_attachment_storage/1e9:.1f} GB")
print(f"Daily reaction storage: {daily_reaction_storage/1e9:.2f} GB")

total_daily = daily_text_storage + daily_attachment_storage + daily_reaction_storage
print(f"Total daily storage: {total_daily/1e9:.1f} GB")

# Retention: Keep messages forever, but tier attachments
# With 3x replication
annual_storage_tb = total_daily * 365 * 3 / 1e12
print(f"Annual storage (replicated): {annual_storage_tb:.0f} TB")
```

Example 3: E-Commerce Platform (Like Amazon)
```python
# E-Commerce Platform Storage Estimation

# Business scale
products = 500_000_000               # 500M product listings
daily_orders = 50_000_000            # 50M orders/day
daily_product_views = 5_000_000_000  # 5B product views/day

# Product catalog storage
product_record = {
    "product_id": 8,
    "seller_id": 8,
    "title": 200,
    "description": 2000,
    "category_ids": 32,     # Multiple categories
    "price": 8,
    "inventory": 4,
    "ratings_summary": 24,
    "attributes": 500,      # JSON attributes
    "created_at": 8,
    "updated_at": 8,
}
bytes_per_product = sum(product_record.values())

# Product images (separate storage)
images_per_product = 7          # Main + gallery
image_size_all_versions_mb = 3  # All resolutions

# Product catalog total
catalog_data = products * bytes_per_product
catalog_images = products * images_per_product * image_size_all_versions_mb * 1e6
print(f"Catalog data: {catalog_data/1e12:.1f} TB")
print(f"Catalog images: {catalog_images/1e15:.1f} PB")

# Order storage
order_record = {
    "order_id": 16,
    "user_id": 8,
    "total": 8,
    "status": 1,
    "payment_status": 1,
    "shipping_address": 500,
    "created_at": 8,
    "updated_at": 8,
}
order_item_record = {
    "order_item_id": 16,
    "order_id": 16,
    "product_id": 8,
    "quantity": 4,
    "price": 8,
}
bytes_per_order = sum(order_record.values())
bytes_per_order_item = sum(order_item_record.values())
items_per_order = 3  # Average

daily_order_storage = daily_orders * (bytes_per_order
                                      + items_per_order * bytes_per_order_item)
print(f"Daily order storage: {daily_order_storage/1e9:.1f} GB")

# View/click history for recommendations
view_event = 50  # bytes per event
daily_view_storage = daily_product_views * view_event
print(f"Daily view event storage: {daily_view_storage/1e12:.1f} TB")

# Retain views for 90 days (recommendation training)
rolling_view_storage = daily_view_storage * 90
print(f"90-day view history: {rolling_view_storage/1e15:.1f} PB")
```

You now have a comprehensive framework for estimating storage requirements. Let's consolidate the key principles:
| Formula | Usage |
|---|---|
| Raw Storage = Objects/Day × Size × Retention Days | Baseline calculation |
| Actual Storage = Raw × Replication (3x) × Overhead (1.5x) | Production sizing |
| Annual Cost = Storage GB × Tier Rate × 12 | Budget planning |
| Savings = Flat Cost - Tiered Cost | Optimization opportunity |
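The formulas in the table chain together end-to-end. A minimal worked example, with all inputs as arbitrary assumed values:

```python
# Applying the summary formulas with assumed example inputs.
objects_per_day = 1_000_000
size_bytes = 2_000
retention_days = 365
replication, overhead = 3, 1.5
tier_rate_per_gb_month = 0.023  # e.g. S3 Standard

raw_gb = objects_per_day * size_bytes * retention_days / 1e9  # baseline
actual_gb = raw_gb * replication * overhead                   # production sizing
annual_cost = actual_gb * tier_rate_per_gb_month * 12         # budget planning

print(f"Raw: {raw_gb:.0f} GB, actual: {actual_gb:.0f} GB, "
      f"annual cost: ${annual_cost:,.0f}")
```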
What's Next:
With traffic and storage estimation complete, we move to bandwidth estimation—calculating the network capacity needed to deliver your data to users. Bandwidth connects traffic (requests per second) with storage (data transferred per request) to determine network infrastructure requirements.
You now understand how to estimate storage for any system. Practice by analyzing products you use: How much data does a single Instagram post consume? How much storage does Netflix need for its movie catalog? Building this intuition makes system design interviews significantly easier.