Delivering video at Netflix scale confronts fundamental physics constraints. Light in fiber optics travels at approximately 200,000 kilometers per second—about two-thirds the speed of light in vacuum. A round trip from New York to Tokyo covers roughly 22,000 kilometers, requiring a minimum of 110 milliseconds just for the signal to travel—before any server processing, queueing, or content delivery.
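The propagation-delay arithmetic above is worth making concrete. A minimal sketch, using the same approximate figures as the text:

```python
# Propagation delay estimate for the New York-Tokyo example above.
# Figures are the approximations used in the text, not measured values.

SPEED_IN_FIBER_KM_PER_S = 200_000   # light in fiber: ~2/3 of c in vacuum
ROUND_TRIP_KM = 22_000              # NY <-> Tokyo round trip, roughly

def propagation_delay_ms(distance_km: float) -> float:
    """Minimum signal travel time, ignoring processing and queueing."""
    return distance_km / SPEED_IN_FIBER_KM_PER_S * 1000

print(f"{propagation_delay_ms(ROUND_TRIP_KM):.0f} ms")  # prints "110 ms"
```

Every real request adds queueing, TLS handshakes, and server time on top of this floor, which is why the floor itself has to be attacked by moving content closer.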
This latency is immutable. No amount of engineering can make light travel through fiber any faster. The only solution is to bring content closer to users.
Furthermore, streaming 50+ terabits per second from centralized data centers would require backbone network capacity that simply doesn't exist. The internet's core network would collapse. The only feasible architecture pushes content to the edge—as close to end users as physically possible.
This page explores how Netflix architected a content delivery system that effectively shrinks the globe, placing content within milliseconds of virtually every subscriber while maintaining consistency, freshness, and cost efficiency.
This page covers the complete content delivery architecture: multi-tier topology (origin, regional, edge), traffic routing strategies, cache warming and eviction, consistency models for distributed content, and how Netflix coordinates thousands of edge locations to deliver seamless playback globally.
Netflix's content delivery follows a hierarchical architecture where content flows from origin storage through multiple tiers before reaching end users. Each tier serves a specific purpose in the content distribution pipeline.
Netflix originally used Akamai and other third-party CDNs. In 2012, they built Open Connect because (1) video delivery has unique requirements vs. web content, (2) the scale made CDN costs astronomical, (3) customization was needed for adaptive streaming optimization, and (4) ISP partnership opportunities required direct relationships. Open Connect now handles >95% of Netflix traffic.
The origin layer is where content begins its journey. This isn't just storage—it's a sophisticated content preparation and management system that transforms raw master files into optimized delivery formats.
| Quality Level | Resolution | Bitrate Range | Use Case |
|---|---|---|---|
| Ultra Low | 320p | 150-300 Kbps | Severe bandwidth constraints |
| Mobile Low | 480p | 300-700 Kbps | Mobile on cellular |
| SD | 720p | 700-2000 Kbps | Standard definition streaming |
| HD Low | 1080p | 2-4 Mbps | HD on constrained bandwidth |
| HD High | 1080p | 4-8 Mbps | Full HD streaming |
| 4K SDR | 2160p | 10-16 Mbps | 4K standard dynamic range |
| 4K HDR | 2160p | 15-25 Mbps | 4K with HDR/Dolby Vision |
| 4K HDR High | 2160p | 25-40 Mbps | Premium 4K on fiber connections |
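A player consuming a ladder like the one above typically picks the highest rung whose bitrate fits inside measured throughput, minus a safety margin. A minimal sketch of that selection logic (the ladder values are taken from the table; the 80% margin is an assumption for illustration):

```python
# Ladder derived from the table above: (label, top_of_range_kbps)
LADDER = [
    ("320p ultra low", 300),
    ("480p mobile low", 700),
    ("720p SD", 2000),
    ("1080p HD low", 4000),
    ("1080p HD high", 8000),
    ("2160p 4K SDR", 16000),
    ("2160p 4K HDR", 25000),
    ("2160p 4K HDR high", 40000),
]

def select_rung(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rung whose peak bitrate fits within a safety
    margin of measured throughput; fall back to the lowest rung."""
    budget = measured_kbps * safety
    best = LADDER[0][0]
    for label, kbps in LADDER:
        if kbps <= budget:
            best = label
    return best

print(select_rung(6000))   # prints "1080p HD low" (6000 * 0.8 < 8000)
```

Real adaptive-bitrate logic also smooths throughput estimates and accounts for buffer level, but the core quality/bandwidth trade-off is this simple comparison run once per segment.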
Per-Title Encoding:
Netflix pioneered per-title encoding—analyzing each piece of content individually to determine optimal bitrate ladders. A static shot of two people talking can look perfect at 1 Mbps. An action sequence with explosions needs 10 Mbps for equivalent quality.
Instead of one-size-fits-all encoding profiles, Netflix's system:
- Analyzes each title's visual complexity
- Runs trial encodes across many resolution/bitrate combinations
- Selects the ladder of encodes that delivers the best perceptual quality per bit for that specific title
This is why Netflix can stream 4K over connections where competitors struggle with HD—they're more efficient per bit.
Per-title encoding is computationally expensive—$50-100+ in compute per title. But bandwidth savings over the content's lifetime dwarf encoding costs. A two-hour movie that saves 1 Mbps on average saves roughly 0.9 GB per stream; across 100 million streams, that is about 90 petabytes of bandwidth—millions of dollars at CDN prices.
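The savings arithmetic is straightforward to check (the stream count and bitrate saving are the illustrative figures from the text, not reported data):

```python
# Rough savings arithmetic for per-title encoding (illustrative numbers).
def gb_saved_per_stream(bitrate_saved_mbps: float, duration_s: float) -> float:
    # Mbps * seconds = megabits; / 8 -> megabytes; / 1000 -> gigabytes
    return bitrate_saved_mbps * duration_s / 8 / 1000

per_stream = gb_saved_per_stream(1.0, 2 * 3600)   # 2-hour film, 1 Mbps saved
total_pb = per_stream * 100_000_000 / 1_000_000   # across 100M streams
print(f"{per_stream:.2f} GB/stream, {total_pb:.0f} PB total")
# prints "0.90 GB/stream, 90 PB total"
```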
Open Connect Appliances (OCAs) are purpose-built servers optimized for video delivery. Unlike general-purpose CDN edge nodes, these machines are designed with a single mission: stream video as efficiently as possible.
Capacity Per OCA:
A single OCA can serve:
- On the order of 100 Gbps of streaming throughput (flash-based models substantially more)
- Tens of thousands of concurrent streams (100 Gbps at a 5 Mbps average is ~20,000 streams)
- A large slice of the regional catalog entirely from local storage
With 15,000+ OCAs globally, Netflix has over 1,500 Tbps of edge capacity—more than most countries' total internet capacity.
Storage Tiering Within OCA:
Even within a single server, content is tiered:
- RAM for the hottest titles currently being streamed
- Flash/NVMe for popular content that needs fast random reads
- Spinning disks for the long tail of the catalog
Effective cache hit rates exceed 95% for edge locations—meaning 95% of requests are served entirely from local storage with no upstream fetch.
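Why does a cache holding a small fraction of the catalog serve so many requests? Because viewing popularity is heavily skewed. A toy LRU simulation (catalog size, cache size, and the Zipf popularity model are all invented for illustration) shows demand-driven caching alone already captures most traffic; proactive fill, covered below, pushes real hit rates higher still:

```python
import random
from collections import OrderedDict

# Toy edge cache in front of a large catalog with Zipf-like popularity.
random.seed(42)

CATALOG = 10_000       # titles in the full catalog
CACHE_SLOTS = 1_000    # titles the edge can hold (10% of catalog)
REQUESTS = 50_000

weights = [1 / rank for rank in range(1, CATALOG + 1)]  # Zipf(1) popularity

cache: OrderedDict[int, None] = OrderedDict()
hits = 0
for title in random.choices(range(CATALOG), weights=weights, k=REQUESTS):
    if title in cache:
        hits += 1
        cache.move_to_end(title)          # LRU: refresh recency
    else:
        cache[title] = None
        if len(cache) > CACHE_SLOTS:
            cache.popitem(last=False)     # evict least recently used

print(f"hit rate: {hits / REQUESTS:.1%}")
```

With only 10% of the catalog cached, the hit rate lands well above 10%—the skew of the popularity curve does most of the work.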
Netflix chose FreeBSD for OCAs because of its superior network stack for high-throughput streaming, ZFS for reliable storage management, and BSD license allowing proprietary modifications. They've contributed extensively back to FreeBSD—Netflix engineers are core contributors to the FreeBSD network stack.
With thousands of edge locations, Netflix must intelligently route each viewer to the optimal server. This isn't simple geographic proximity—the routing system considers dozens of factors to maximize quality while minimizing cost.
Steering Mechanisms:
Netflix uses multiple mechanisms to direct traffic:
1. DNS-Based Steering
Initial server selection via DNS. Client resolves netflix.com and receives IP addresses of recommended CDN servers. DNS responses are customized per-request based on client IP's location and network.
2. HTTP Redirect Steering
After initial connection, manifest files contain URLs pointing to specific CDN servers. The control plane can dynamically update these between segments to rebalance load or recover from failures.
3. Client-Side Selection
The Netflix player can probe multiple candidate servers and select based on measured performance. This provides ultimate flexibility but adds complexity.
4. BGP Anycast (Limited Use)
For some high-level routing, anycast IPs route to the topologically nearest server. Less precise but very fast for initial connection.
Routing decisions aren't static. If a server becomes overloaded mid-stream, the player can transparently switch to another server between segments. Users never notice—they just experience uninterrupted playback. This 'server switching' happens millions of times per hour across the platform.
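The mechanics of mid-stream switching reduce to the client holding a ranked candidate list and re-evaluating it at segment boundaries. A minimal sketch, with hypothetical server names (the real client also weighs measured throughput, not just health):

```python
# Segment-level server switching: the client holds a ranked candidate
# list from the control plane. Server names here are hypothetical.

CANDIDATES = ["oca-isp-local", "oca-ixp-regional", "oca-backbone"]

def fetch_segment(segment: int, healthy: set[str]) -> str:
    """Return which server serves this segment: the best-ranked healthy
    candidate. Switching happens invisibly between segments."""
    for server in CANDIDATES:
        if server in healthy:
            return server
    raise RuntimeError("no CDN server reachable")

healthy = {"oca-isp-local", "oca-ixp-regional", "oca-backbone"}
print(fetch_segment(1, healthy))    # prints "oca-isp-local"
healthy.discard("oca-isp-local")    # mid-stream overload or failure
print(fetch_segment(2, healthy))    # prints "oca-ixp-regional"
```

Because each segment is an independent HTTP fetch, the switch needs no connection handoff—the next segment simply comes from a different address.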
Unlike reactive caching (fetch on demand), Netflix proactively pushes content to edge servers before users request it. This is possible because video streaming is uniquely predictable—new content is known in advance.
Fill During Night Hours:
Netflix schedules most cache filling during local off-peak hours (typically 2-6 AM). Benefits:
- Fill traffic never competes with peak viewing traffic
- ISP links are otherwise idle, so fills cost the network little
- New content is in place before the evening peak when demand arrives
Mathematical Modeling:
The cache filling system solves an optimization problem: given predicted per-title demand at each location, limited storage per server, and limited fill bandwidth, choose a content placement that maximizes locally served traffic—equivalently, minimizes expensive upstream fetches.
This is fundamentally a variant of the facility location problem with time-varying demand.
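A common heuristic for this kind of placement problem is greedy value-density packing: fill each cache with the titles that promise the most predicted streams per gigabyte of storage. A minimal sketch, with all demand and size numbers invented (the real system also models fill cost and time-varying demand):

```python
# Greedy knapsack-style sketch of the fill-placement problem.
# All titles, demand figures, and sizes are made up for illustration.

def plan_fill(titles: dict[str, tuple[float, float]], capacity_gb: float) -> list[str]:
    """titles maps name -> (predicted_streams, size_gb).
    Greedily pick titles by predicted streams per GB until the cache
    is full -- a classic knapsack heuristic, not an exact solution."""
    ranked = sorted(titles.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    plan, used = [], 0.0
    for name, (streams, size) in ranked:
        if used + size <= capacity_gb:
            plan.append(name)
            used += size
    return plan

catalog = {
    "new_hit_s01": (9000, 40.0),     # high demand, mid size
    "classic_film": (1200, 8.0),     # modest demand, small
    "niche_doc": (150, 12.0),
    "blockbuster_4k": (7000, 90.0),  # high demand, but huge
}
print(plan_fill(catalog, capacity_gb=60.0))
# prints "['new_hit_s01', 'classic_film', 'niche_doc']"
```

Note how the huge 4K title loses out despite high demand: per-byte value, not raw popularity, drives placement when storage is the binding constraint.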
Even with proactive filling, unexpected virality creates problems. If a 5-year-old show suddenly trends on social media, edge caches may not have it. The system must handle cascading fill requests without overwhelming upstream tiers. Rate limiting and prioritization prevent 'thundering herd' effects.
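Two standard defenses against this cascade are request coalescing (duplicate requests for the same title piggyback on one upstream fetch) and capping concurrent fills. A simplified single-threaded sketch of both, with hypothetical names throughout:

```python
# Sketch of 'thundering herd' protection for cache fills: coalesce
# duplicate upstream requests and cap concurrent fills. Simplified,
# single-threaded; a real coordinator would be concurrent.

class FillCoordinator:
    def __init__(self, max_inflight: int):
        self.max_inflight = max_inflight
        self.inflight: set[str] = set()
        self.queued: list[str] = []

    def request_fill(self, title: str) -> str:
        if title in self.inflight:
            return "coalesced"            # piggyback on in-flight fetch
        if len(self.inflight) >= self.max_inflight:
            self.queued.append(title)     # defer rather than flood upstream
            return "queued"
        self.inflight.add(title)
        return "fetching"

coord = FillCoordinator(max_inflight=2)
print(coord.request_fill("trending_show"))   # prints "fetching"
print(coord.request_fill("trending_show"))   # prints "coalesced"
print(coord.request_fill("other_a"))         # prints "fetching"
print(coord.request_fill("other_b"))         # prints "queued"
```

Whatever the storm of viewer requests, the upstream tier sees at most `max_inflight` fetches per title cluster—the herd is absorbed at the edge.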
Unlike most distributed systems where consistency is critical, video content is immutable once encoded. Episode 5 of a show doesn't change. This simplifies consistency dramatically—but introduces other challenges around content updates, removals, and version management.
Content Addressing:
Netflix uses content-addressable storage at the edge. File names include content hash:
movie_12345_video_1080p_h264_hash_a7b3c9d2e1f4.mp4
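The idea behind a name like this can be sketched in a few lines: derive the filename from the bytes themselves, so changed content is by definition a new file. The naming pattern below mimics the example above; Netflix's actual scheme is internal, and the hash truncation is an assumption:

```python
import hashlib

# Content addressing sketch: filename derived from the file's bytes,
# so a changed file is a *new* file and caches never need invalidation.

def content_name(title_id: int, variant: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()[:12]
    return f"movie_{title_id}_{variant}_hash_{digest}.mp4"

def verify(name: str, data: bytes) -> bool:
    """An edge server can detect corruption by re-hashing what it stored."""
    stored = name.rsplit("_hash_", 1)[1].removesuffix(".mp4")
    return hashlib.sha256(data).hexdigest()[:12] == stored

name = content_name(12345, "video_1080p_h264", b"...encoded segment bytes...")
print(name, verify(name, b"...encoded segment bytes..."))
```

Any bit flip on disk changes the hash, so verification fails and the server can re-fetch the file rather than serve corrupt video.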
Benefits of content addressing:
- Updated content gets a new name, so stale caches can never serve the wrong bytes
- No cache invalidation protocol is needed—old files simply age out
- Corruption is detectable by re-hashing the stored file
- Identical content is naturally deduplicated
Manifest Version Control:
Manifests (the 'table of contents' for adaptive streaming) are versioned separately from content. When anything changes—a re-encode, a subtitle fix, an availability update—a new manifest version is published pointing at the new content names, while the old files remain until they age out of caches.
For content delivery, strong consistency isn't needed. If an edge cache serves a slightly stale manifest for a few minutes, the impact is minimal. This allows aggressive caching (long TTLs) and lazy invalidation—accepting minutes of staleness for massive performance gains.
With 15,000+ servers across 1,000+ locations, failures are constant—not exceptional. The architecture must treat failure as normal and maintain service through all but the most catastrophic scenarios.
Chaos Engineering:
Netflix pioneered Chaos Engineering—intentionally injecting failures to verify resilience. Famous examples include Chaos Monkey, which randomly terminates production instances, and Chaos Kong, which simulates the loss of an entire AWS region.
For the CDN specifically, the same discipline applies: failures of individual OCAs and whole clusters are exercised deliberately to verify that traffic steering reroutes viewers without visible interruption.
Recovery Time Objectives:
| Failure Type | Detection Time | Recovery Time | User Impact |
|---|---|---|---|
| Single OCA | < 10 seconds | < 30 seconds | None (other servers in cluster) |
| Cluster | < 1 minute | < 2 minutes | Brief quality reduction |
| Region | < 5 minutes | < 10 minutes | Possible rebuffering |
| Origin | N/A | N/A | None (edge serves cached content) |
A key architectural principle: edge caches must not depend on origin for serving. Once content is cached, playback continues even if the origin is completely unavailable. Origin is only needed for cache fills—which can be delayed. This provides extraordinary resilience.
Content delivery is Netflix's largest infrastructure cost after content licensing itself. Understanding the economics explains many architectural decisions.
| Component | Cost Driver | Optimization Strategy |
|---|---|---|
| Origin Egress | Per-GB charges from cloud | Maximize edge cache hits |
| Transit Bandwidth | Per-Mbps or per-GB to ISPs | ISP embedding, peering deals |
| Edge Hardware | Server purchase + refresh cycle | Maximize utilization, longer lifecycles |
| Colocation | Rack space + power | Optimize server density and power efficiency |
| Encoding Compute | CPU-hours for transcoding | Per-title optimization, efficient codecs |
| Operations | Personnel, tooling, monitoring | Automation, centralized control plane |
ISP Embedding Economics:
When Netflix places servers inside an ISP's network:
For Netflix:
- Transit and peering costs for that traffic largely disappear
- Viewers get lower latency and higher sustainable bitrates
For ISPs:
- Netflix traffic—often a large share of peak load—stays inside the network instead of crossing paid transit links
- Subscribers get better streaming quality at no hardware cost, since Netflix supplies the appliances
This is why Netflix offers Open Connect free to ISPs—it's still cheaper than paying transit. Major ISPs have hundreds of OCAs embedded in their networks.
Cost-Per-Stream Optimization:
All architectural decisions ultimately aim to minimize cost-per-stream while maintaining quality: higher cache hit rates cut origin egress, better codecs cut bits per stream, and ISP embedding cuts transit—and each improvement compounds across billions of streaming hours.
Netflix's effective cost per stream is estimated at fractions of a cent—remarkable for delivering gigabytes of video per viewing session.
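A back-of-envelope model shows why fractions of a cent is plausible. Every number below is an assumption invented for illustration, not a Netflix figure:

```python
# Back-of-envelope hardware cost per stream. All inputs are assumptions
# for illustration only, not Netflix figures.

OCA_COST_USD = 20_000       # assumed hardware cost per appliance
LIFETIME_YEARS = 4          # assumed refresh cycle
STREAMS_PER_DAY = 100_000   # assumed stream turnover per OCA per day

amortized_per_day = OCA_COST_USD / (LIFETIME_YEARS * 365)
cost_per_stream = amortized_per_day / STREAMS_PER_DAY
print(f"~${cost_per_stream:.5f} per stream (hardware only)")
```

Even adding colocation, power, and fill bandwidth, amortizing a busy appliance over its lifetime of streams lands well under a cent each.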
You now understand Netflix's content delivery architecture—from origin storage through multi-tier caching to ISP-embedded edge servers. This infrastructure enables 200+ million subscribers to stream content with sub-second startup and minimal rebuffering. Next, we'll explore Open Connect CDN in detail—the custom CDN that makes this possible.