YouTube ingests more than 500 hours of uploaded video every minute and serves over 1 billion hours of watch time every day. Behind these staggering numbers lies one of the most sophisticated distributed systems ever built: a platform that must simultaneously handle creators uploading 4K content from mobile devices in remote locations and viewers streaming HDR video on smart TVs in metropolitan centers.
Designing a video platform at YouTube's scale isn't merely an exercise in handling large files. It's a masterclass in distributed systems, signal processing, content delivery networks, machine learning, and human-computer interaction. Every architectural decision ripples through the entire stack, affecting upload success rates, transcoding costs, streaming quality, and ultimately, user engagement.
Before diving into architecture, we must crystallize what we're building. This page establishes the comprehensive requirements that will guide every subsequent design decision.
By the end of this page, you will understand how to systematically decompose a video platform's requirements into functional features, quality attributes, and scale constraints. You'll learn to think like a Principal Engineer who must anticipate edge cases, quantify expectations, and establish measurable success criteria before a single line of code is written.
A video platform is fundamentally a content lifecycle management system with three distinct phases:

1. **Ingestion**: accepting raw source files uploaded by creators from diverse devices and networks.
2. **Processing**: transcoding, thumbnail generation, and validation that turn a source file into streamable assets.
3. **Delivery**: distributing encoded renditions through CDNs and streaming them adaptively to viewers.
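To make the lifecycle concrete, here is a minimal Python sketch of the three phases as a forward-only state machine. The names and transitions are illustrative assumptions for this page, not any real platform's schema:

```python
from enum import Enum, auto


class VideoPhase(Enum):
    """The three top-level phases of the content lifecycle."""
    INGESTION = auto()   # creator uploads the raw source file
    PROCESSING = auto()  # transcoding, thumbnails, validation
    DELIVERY = auto()    # CDN distribution and adaptive streaming


# Legal forward-only transitions between phases (illustrative).
TRANSITIONS = {
    VideoPhase.INGESTION: {VideoPhase.PROCESSING},
    VideoPhase.PROCESSING: {VideoPhase.DELIVERY},
    VideoPhase.DELIVERY: set(),  # terminal: the video is live
}


def advance(current: VideoPhase, target: VideoPhase) -> VideoPhase:
    """Move a video to the next phase, rejecting illegal jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```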
Each phase presents unique challenges that compound when operating at planetary scale. Let's examine the stakeholders and their expectations to understand the full scope of what we're building.
| Stakeholder | Primary Needs | Quality Expectations | Scale Context |
|---|---|---|---|
| Content Creators | Reliable uploads, processing feedback, quality preservation | Upload success rate > 99.9%, processing < 30 min for 1-hour video | 500+ hours uploaded per minute globally |
| Viewers | Instant playback, adaptive quality, seamless experience | Time-to-first-byte < 200ms, zero buffering on stable connections | 1+ billion hours watched daily |
| Advertisers | Accurate targeting, brand safety, viewability | Ad delivery latency < 100ms, 99.99% insertion success rate | Billions of ad impressions daily |
| Platform Operators | Cost efficiency, observability, compliance | Processing cost < $0.01 per minute of video, 99.95% uptime | Exabytes of storage, petabytes of daily egress |
| Regulators | Content moderation, data privacy, accessibility | Policy violation detection < 24 hours, GDPR compliance | Global regulatory coverage across 100+ countries |
At YouTube's scale, even 'rare' edge cases happen constantly. A 0.1% upload failure rate means roughly 4,300 failed uploads every day; a 0.01% transcoding error rate affects 430+ videos daily. Requirements must account for these 'rare' scenarios as first-class concerns.
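The arithmetic is worth making explicit; a quick sketch using the daily upload volume estimated later on this page:

```python
# Back-of-envelope: how often do 'rare' failures occur at this scale?
DAILY_UPLOADS = 4_300_000          # ~4.3M videos/day (estimated below)

upload_failure_rate = 0.001        # 0.1% of uploads fail
transcode_error_rate = 0.0001      # 0.01% of transcodes fail

failed_uploads_per_day = DAILY_UPLOADS * upload_failure_rate
failed_transcodes_per_day = DAILY_UPLOADS * transcode_error_rate

print(f"{failed_uploads_per_day:,.0f} failed uploads/day")     # 4,300
print(f"{failed_transcodes_per_day:,.0f} bad transcodes/day")  # 430
```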
Functional requirements define what the system must do. For a video platform, these span the entire content lifecycle plus supporting capabilities that enable the core experience.
Beyond the core upload-process-stream lifecycle, a production video platform requires extensive supporting functionality that enables monetization, discovery, engagement, and compliance.
For an initial design or interview scenario, focus on the core lifecycle (upload, transcode, stream) plus essential discovery (search, recommendations). Engagement features, monetization, and moderation can be addressed as follow-up extensions.
Non-functional requirements define how well the system must perform its functions. For a video platform, these quality attributes are often the differentiating factor between a hobby project and a production system.
| Attribute | Upload Phase | Processing Phase | Streaming Phase | Rationale |
|---|---|---|---|---|
| Availability | 99.9% (8.76h downtime/year) | 99.5% (1.83 days downtime/year) | 99.99% (52.6 min downtime/year) | Streaming is user-facing; processing can be deferred; uploads should rarely fail |
| Latency | Time-to-first-byte < 100ms | Processing time < 1.5x realtime | Time-to-first-frame < 200ms | Users expect instant feedback at every stage |
| Throughput | ~4 PB/day ingestion | 50,000 concurrent transcoding jobs | 1 billion streams/day | Scale requirements based on actual YouTube metrics |
| Durability | 99.999999999% (11 nines) | 99.9999999% (9 nines) during processing | 99.99999% (7 nines) for encoded assets | Source videos are irreplaceable; derived assets can be regenerated |
| Consistency | Eventual (minutes acceptable) | Strong within workflow | Eventual (seconds acceptable) | Metadata propagation can lag; playback requires current state |
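The downtime budgets in parentheses follow directly from the availability targets; a small Python helper makes any such figure easy to sanity-check:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def downtime_per_year(availability: float) -> str:
    """Convert an availability target into an annual downtime budget."""
    hours = HOURS_PER_YEAR * (1 - availability)
    if hours >= 24:
        return f"{hours / 24:.2f} days"
    if hours >= 1:
        return f"{hours:.2f} hours"
    return f"{hours * 60:.1f} minutes"


print(downtime_per_year(0.999))    # 8.76 hours   (upload phase)
print(downtime_per_year(0.995))    # 1.83 days    (processing phase)
print(downtime_per_year(0.9999))   # 52.6 minutes (streaming phase)
```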
Before designing the architecture, we must quantify the scale we're targeting. These estimates inform capacity planning, storage requirements, and cost projections.
```
// ===============================================
// DAILY UPLOAD VOLUME
// ===============================================
Uploads per minute: 500 hours = 30,000 minutes of video
Daily uploads: 30,000 min × 60 min × 24 hr = 43,200,000 minutes/day
Average video length: ~10 minutes
Videos per day: ~4.3 million new videos

// ===============================================
// STORAGE REQUIREMENTS - RAW UPLOADS
// ===============================================
Average raw file size: ~1 GB per 10-minute video (varies wildly)
Daily ingestion: 4.3M videos × 1 GB = 4.3 PB/day
Yearly ingestion: 4.3 PB × 365 = ~1.6 EB/year (raw only)

// ===============================================
// STORAGE REQUIREMENTS - PROCESSED VIDEOS
// ===============================================
Encoded versions per video:
  - 8 resolutions × 3 bitrates = 24 video tracks
  - 2 audio tracks (stereo + spatial)
  - Thumbnails, captions, metadata
Storage multiplier: ~3-5x raw size (encoding efficiency varies)
Total daily storage: 4.3 PB × 4 = ~17 PB/day processed
Total storage (1 year): ~6 EB (plus historical content)

// ===============================================
// STREAMING BANDWIDTH
// ===============================================
Watch hours per day: 1 billion hours
Average bitrate: ~4 Mbps (mix of mobile/desktop/TV)
Concurrent streams (peak): ~50 million simultaneous
Daily egress: 1B hours × 3600 sec × 4 Mbps / 8 = ~1.8 EB/day
Peak bandwidth: 50M × 4 Mbps = 200 Tbps aggregate

// ===============================================
// TRANSCODING COMPUTE
// ===============================================
Videos to process: 4.3M/day = ~50 videos/second
Minutes to encode: 43.2M minutes/day
Encoding time per minute: ~0.5 CPU-hours (at medium quality)
Daily compute need: 43.2M × 0.5 = 21.6 million CPU-hours
Concurrent encoding capacity: ~900,000 CPU cores busy around the clock
```

In interviews, focus on orders of magnitude rather than precise numbers. Know that YouTube handles 'petabytes daily' for uploads and 'exabytes' for streaming egress. Precise numbers change; the scale categories remain relevant for architectural decisions.
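These figures move whenever the assumptions move. A minimal Python sketch of the same arithmetic makes it easy to vary the inputs; every constant below is one of the assumptions stated above:

```python
# Back-of-envelope capacity model; vary any constant to see how
# the architectural targets shift.
UPLOAD_HOURS_PER_MIN = 500   # YouTube's public ingest figure
AVG_VIDEO_MIN = 10           # assumed average video length
RAW_GB_PER_VIDEO = 1.0       # assumed raw size of a 10-minute upload
WATCH_HOURS_PER_DAY = 1e9    # daily watch time
AVG_BITRATE_MBPS = 4         # blended mobile/desktop/TV bitrate

# 500 h/min = 30,000 video-minutes per wall-minute; 1,440 wall-minutes/day.
upload_min_per_day = UPLOAD_HOURS_PER_MIN * 60 * 60 * 24    # 43.2M minutes
videos_per_day = upload_min_per_day / AVG_VIDEO_MIN          # ~4.3M videos
raw_pb_per_day = videos_per_day * RAW_GB_PER_VIDEO / 1e6     # GB -> PB

# Egress: hours -> seconds, Mbps -> MB/s (divide by 8), MB -> EB (1e12 MB).
egress_eb_per_day = WATCH_HOURS_PER_DAY * 3600 * (AVG_BITRATE_MBPS / 8) / 1e12

print(f"{videos_per_day / 1e6:.1f}M videos/day")    # 4.3M
print(f"{raw_pb_per_day:.1f} PB/day raw ingest")    # 4.3
print(f"{egress_eb_per_day:.1f} EB/day egress")     # 1.8
```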
| Dimension | Approximate Scale | Architectural Implication |
|---|---|---|
| Daily video uploads | ~4-5 million videos | Massive parallel processing; async workflows |
| Daily data ingestion | ~4 PB raw uploads | Distributed storage; chunked uploads mandatory |
| Total storage | ~10+ EB | Cold storage tiers; intelligent caching |
| Daily egress | ~1-2 EB | Global CDN essential; edge computing |
| Peak concurrent streams | ~50 million | Massive horizontal scaling; predictive caching |
| Transcoding jobs | ~50/second | Distributed job queues; priority scheduling |
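The "chunked uploads mandatory" implication is easy to quantify: with multi-gigabyte files on unreliable links, resumable chunks bound the cost of a connection drop to a single chunk. A small sketch, where the 8 MB chunk size is an assumed illustration rather than a platform constant:

```python
# Why chunked uploads are mandatory: a failed monolithic upload loses
# everything, while a failed chunk loses at most one chunk.
FILE_SIZE_MB = 1024   # ~1 GB raw upload (average from our estimates)
CHUNK_MB = 8          # illustrative chunk size

chunks = -(-FILE_SIZE_MB // CHUNK_MB)    # ceiling division -> 128 chunks

# On a connection that drops once mid-transfer:
monolithic_retransmit_mb = FILE_SIZE_MB  # restart from byte zero
chunked_retransmit_mb = CHUNK_MB         # resume from last acked chunk

print(f"{chunks} chunks; worst-case retransmit "
      f"{monolithic_retransmit_mb} MB vs {chunked_retransmit_mb} MB")
```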
Every design operates within constraints—limitations imposed by technology, business, or physics. Articulating these upfront prevents wasted effort on impossible solutions and clarifies the design space.
Requirements are only meaningful if we can measure compliance. Service Level Objectives (SLOs) translate requirements into measurable targets that engineering teams can monitor, alert on, and optimize toward.
| Domain | Metric | SLO Target | Measurement Method |
|---|---|---|---|
| Upload | Upload success rate | ≥ 99.9% | (Successful uploads / Attempted uploads) over 24h window |
| Upload | Time to upload confirmation | < 100ms p99 | Server acknowledgment latency for final chunk |
| Processing | Processing completion rate | ≥ 99.5% | (Videos processed / Videos uploaded) within 24h |
| Processing | Time to first rendition | < 10 min p95 | Duration from upload complete to first playable version |
| Processing | Full processing time | < 2x realtime p95 | Duration from upload to all renditions available |
| Streaming | Playback availability | ≥ 99.99% | (Successful stream starts / Attempted plays) over 1h window |
| Streaming | Time to first frame | < 200ms p95 | Duration from play intent to first frame rendered |
| Streaming | Rebuffering ratio | < 0.5% p95 | (Buffer time / Total watch time) per session |
| Streaming | Video quality score | ≥ 4.0/5.0 MOS | Automated quality assessment (VMAF) of delivered video |
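SLOs are only useful if every team computes them the same way. Below is a hedged sketch of how the playback-availability and rebuffering-ratio targets might be evaluated; the session schema and field names are illustrative assumptions, not a real telemetry format:

```python
from dataclasses import dataclass


@dataclass
class PlaybackSession:
    """Per-session metrics emitted by the player (illustrative schema)."""
    started: bool            # did the stream start successfully?
    watch_seconds: float     # total time spent watching
    buffer_seconds: float    # total time spent stalled/rebuffering


def playback_availability(sessions: list[PlaybackSession]) -> float:
    """SLO: successful stream starts / attempted plays, target >= 99.99%."""
    attempted = len(sessions)
    return sum(s.started for s in sessions) / attempted if attempted else 1.0


def rebuffering_ratio(s: PlaybackSession) -> float:
    """SLO: buffer time / total watch time per session, target < 0.5%."""
    return s.buffer_seconds / s.watch_seconds if s.watch_seconds else 0.0


sessions = [
    PlaybackSession(True, 600.0, 1.2),
    PlaybackSession(True, 1800.0, 0.0),
    PlaybackSession(False, 0.0, 0.0),  # failed start burns error budget
]
print(f"availability: {playback_availability(sessions):.4%}")       # 66.6667%
print(f"rebuffer (session 0): {rebuffering_ratio(sessions[0]):.2%}")  # 0.20%
```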
Notice how SLOs like '99.99% playback availability' directly mandate redundancy strategies, failover mechanisms, and geographic distribution. The architecture emerges from the requirements—not the other way around.
We've established a comprehensive requirements foundation for our video platform design. The key takeaways before moving to architecture:

- Five stakeholder groups (creators, viewers, advertisers, platform operators, regulators) impose distinct and sometimes competing expectations.
- Quality attributes differ by phase: streaming demands the highest availability (99.99%), while processing can tolerate deferred work.
- The scale targets are extreme: ~4-5 million uploads and ~4 PB of raw ingestion daily, exabyte-scale egress, and ~50 million peak concurrent streams.
- Source uploads are irreplaceable and warrant 11 nines of durability; derived renditions can be regenerated.
- Every requirement maps to a measurable SLO, so the architecture can be monitored, alerted on, and optimized.
What's next:
With requirements established, we'll dive into the Video Upload Pipeline in the next page. We'll explore how to accept files from diverse sources, handle interruptions gracefully, validate content, and trigger downstream processing—all while maintaining the reliability and performance our SLOs demand.
You now understand the comprehensive requirements for designing a YouTube-scale video platform. These requirements will guide every architectural decision in the pages that follow, from upload handling to CDN integration.