Delivering video at Netflix scale confronts fundamental physics constraints. Light in fiber optics travels at approximately 200,000 kilometers per second—about two-thirds the speed of light in vacuum. A round trip from New York to Tokyo covers roughly 22,000 kilometers, requiring a minimum of 110 milliseconds just for the signal to travel—before any server processing, queueing, or content delivery.
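The propagation-delay arithmetic above is worth making concrete. A minimal sketch, using the same approximate figures as the text:

```python
# Propagation delay estimate for the New York-Tokyo example above.
# Figures are the approximations used in the text, not measured values.

SPEED_IN_FIBER_KM_PER_S = 200_000   # light in fiber: ~2/3 of c in vacuum
ROUND_TRIP_KM = 22_000              # NY <-> Tokyo round trip, roughly

def propagation_delay_ms(distance_km: float) -> float:
    """Minimum signal travel time, ignoring processing and queueing."""
    return distance_km / SPEED_IN_FIBER_KM_PER_S * 1000

print(f"{propagation_delay_ms(ROUND_TRIP_KM):.0f} ms")  # prints "110 ms"
```

Every real request adds queueing, TLS handshakes, and server time on top of this floor, which is why the floor itself has to be attacked by moving content closer.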
This latency is immutable. No amount of engineering can make light travel through fiber any faster. The only solution is to bring content closer to users.
Furthermore, streaming 50+ terabits per second from centralized data centers would require backbone network capacity that simply doesn't exist. The internet's core network would collapse. The only feasible architecture pushes content to the edge—as close to end users as physically possible.
This page explores how Netflix architected a content delivery system that effectively shrinks the globe, placing content within milliseconds of virtually every subscriber while maintaining consistency, freshness, and cost efficiency.
This page covers the complete content delivery architecture: multi-tier topology (origin, regional, edge), traffic routing strategies, cache warming and eviction, consistency models for distributed content, and how Netflix coordinates thousands of edge locations to deliver seamless playback globally.
Netflix's content delivery follows a hierarchical architecture where content flows from origin storage through multiple tiers before reaching end users. Each tier serves a specific purpose in the content distribution pipeline.
Netflix originally used Akamai and other third-party CDNs. In 2012, they built Open Connect because (1) video delivery has unique requirements vs. web content, (2) the scale made CDN costs astronomical, (3) customization was needed for adaptive streaming optimization, and (4) ISP partnership opportunities required direct relationships. Open Connect now handles >95% of Netflix traffic.
The origin layer is where content begins its journey. This isn't just storage—it's a sophisticated content preparation and management system that transforms raw master files into optimized delivery formats.
| Quality Level | Resolution | Bitrate Range | Use Case |
|---|---|---|---|
| Ultra Low | 320p | 150-300 Kbps | Severe bandwidth constraints |
| Mobile Low | 480p | 300-700 Kbps | Mobile on cellular |
| SD | 720p | 700-2000 Kbps | Standard definition streaming |
| HD Low | 1080p | 2-4 Mbps | HD on constrained bandwidth |
| HD High | 1080p | 4-8 Mbps | Full HD streaming |
| 4K SDR | 2160p | 10-16 Mbps | 4K standard dynamic range |
| 4K HDR | 2160p | 15-25 Mbps | 4K with HDR/Dolby Vision |
| 4K HDR High | 2160p | 25-40 Mbps | Premium 4K on fiber connections |
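A player consuming a ladder like the one above typically picks the highest rung whose bitrate fits inside measured throughput, minus a safety margin. A minimal sketch of that selection logic (the ladder values are taken from the table; the 80% margin is an assumption for illustration):

```python
# Ladder derived from the table above: (label, top_of_range_kbps)
LADDER = [
    ("320p ultra low", 300),
    ("480p mobile low", 700),
    ("720p SD", 2000),
    ("1080p HD low", 4000),
    ("1080p HD high", 8000),
    ("2160p 4K SDR", 16000),
    ("2160p 4K HDR", 25000),
    ("2160p 4K HDR high", 40000),
]

def select_rung(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rung whose peak bitrate fits within a safety
    margin of measured throughput; fall back to the lowest rung."""
    budget = measured_kbps * safety
    best = LADDER[0][0]
    for label, kbps in LADDER:
        if kbps <= budget:
            best = label
    return best

print(select_rung(6000))   # prints "1080p HD low" (6000 * 0.8 < 8000)
```

Real adaptive-bitrate logic also smooths throughput estimates and accounts for buffer level, but the core quality/bandwidth trade-off is this simple comparison run once per segment.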
Per-Title Encoding:
Netflix pioneered per-title encoding—analyzing each piece of content individually to determine optimal bitrate ladders. A static shot of two people talking can look perfect at 1 Mbps. An action sequence with explosions needs 10 Mbps for equivalent quality.
Instead of one-size-fits-all encoding profiles, Netflix's system:
- Analyzes each title's visual complexity
- Runs trial encodes across many resolution/bitrate combinations
- Selects the ladder of encodes that delivers the best perceptual quality per bit for that specific title
This is why Netflix can stream 4K over connections where competitors struggle with HD—they're more efficient per bit.
Per-title encoding is computationally expensive—$50-100+ in compute per title. But bandwidth savings over the content's lifetime dwarf encoding costs. A two-hour movie that saves 1 Mbps on average saves roughly 0.9 GB per stream; across 100 million streams, that is about 90 petabytes of bandwidth—millions of dollars at CDN prices.
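The savings arithmetic is straightforward to check (the stream count and bitrate saving are the illustrative figures from the text, not reported data):

```python
# Rough savings arithmetic for per-title encoding (illustrative numbers).
def gb_saved_per_stream(bitrate_saved_mbps: float, duration_s: float) -> float:
    # Mbps * seconds = megabits; / 8 -> megabytes; / 1000 -> gigabytes
    return bitrate_saved_mbps * duration_s / 8 / 1000

per_stream = gb_saved_per_stream(1.0, 2 * 3600)   # 2-hour film, 1 Mbps saved
total_pb = per_stream * 100_000_000 / 1_000_000   # across 100M streams
print(f"{per_stream:.2f} GB/stream, {total_pb:.0f} PB total")
# prints "0.90 GB/stream, 90 PB total"
```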
Open Connect Appliances (OCAs) are purpose-built servers optimized for video delivery. Unlike general-purpose CDN edge nodes, these machines are designed with a single mission: stream video as efficiently as possible.
Capacity Per OCA:
A single OCA can serve:
- On the order of 100 Gbps of streaming throughput (flash-based models substantially more)
- Tens of thousands of concurrent streams (100 Gbps at a 5 Mbps average is ~20,000 streams)
- A large slice of the regional catalog entirely from local storage
With 15,000+ OCAs globally, Netflix has over 1,500 Tbps of edge capacity—more than most countries' total internet capacity.
Storage Tiering Within OCA:
Even within a single server, content is tiered:
- RAM for the hottest titles currently being streamed
- Flash/NVMe for popular content that needs fast random reads
- Spinning disks for the long tail of the catalog
Effective cache hit rates exceed 95% for edge locations—meaning 95% of requests are served entirely from local storage with no upstream fetch.
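Why does a cache holding a small fraction of the catalog serve so many requests? Because viewing popularity is heavily skewed. A toy LRU simulation (catalog size, cache size, and the Zipf popularity model are all invented for illustration) shows demand-driven caching alone already captures most traffic; proactive fill, covered below, pushes real hit rates higher still:

```python
import random
from collections import OrderedDict

# Toy edge cache in front of a large catalog with Zipf-like popularity.
random.seed(42)

CATALOG = 10_000       # titles in the full catalog
CACHE_SLOTS = 1_000    # titles the edge can hold (10% of catalog)
REQUESTS = 50_000

weights = [1 / rank for rank in range(1, CATALOG + 1)]  # Zipf(1) popularity

cache: OrderedDict[int, None] = OrderedDict()
hits = 0
for title in random.choices(range(CATALOG), weights=weights, k=REQUESTS):
    if title in cache:
        hits += 1
        cache.move_to_end(title)          # LRU: refresh recency
    else:
        cache[title] = None
        if len(cache) > CACHE_SLOTS:
            cache.popitem(last=False)     # evict least recently used

print(f"hit rate: {hits / REQUESTS:.1%}")
```

With only 10% of the catalog cached, the hit rate lands well above 10%—the skew of the popularity curve does most of the work.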
Netflix chose FreeBSD for OCAs because of its superior network stack for high-throughput streaming, ZFS for reliable storage management, and BSD license allowing proprietary modifications. They've contributed extensively back to FreeBSD—Netflix engineers are core contributors to the FreeBSD network stack.
With thousands of edge locations, Netflix must intelligently route each viewer to the optimal server. This isn't simple geographic proximity—the routing system considers dozens of factors to maximize quality while minimizing cost.
Steering Mechanisms:
Netflix uses multiple mechanisms to direct traffic:
1. DNS-Based Steering
Initial server selection via DNS. Client resolves netflix.com and receives IP addresses of recommended CDN servers. DNS responses are customized per-request based on client IP's location and network.
2. HTTP Redirect Steering
After initial connection, manifest files contain URLs pointing to specific CDN servers. The control plane can dynamically update these between segments to rebalance load or recover from failures.
3. Client-Side Selection
The Netflix player can probe multiple candidate servers and select based on measured performance. This provides ultimate flexibility but adds complexity.
4. BGP Anycast (Limited Use)
For some high-level routing, anycast IPs route to the topologically nearest server. Less precise but very fast for initial connection.
Routing decisions aren't static. If a server becomes overloaded mid-stream, the player can transparently switch to another server between segments. Users never notice—they just experience uninterrupted playback. This 'server switching' happens millions of times per hour across the platform.
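The mechanics of mid-stream switching reduce to the client holding a ranked candidate list and re-evaluating it at segment boundaries. A minimal sketch, with hypothetical server names (the real client also weighs measured throughput, not just health):

```python
# Segment-level server switching: the client holds a ranked candidate
# list from the control plane. Server names here are hypothetical.

CANDIDATES = ["oca-isp-local", "oca-ixp-regional", "oca-backbone"]

def fetch_segment(segment: int, healthy: set[str]) -> str:
    """Return which server serves this segment: the best-ranked healthy
    candidate. Switching happens invisibly between segments."""
    for server in CANDIDATES:
        if server in healthy:
            return server
    raise RuntimeError("no CDN server reachable")

healthy = {"oca-isp-local", "oca-ixp-regional", "oca-backbone"}
print(fetch_segment(1, healthy))    # prints "oca-isp-local"
healthy.discard("oca-isp-local")    # mid-stream overload or failure
print(fetch_segment(2, healthy))    # prints "oca-ixp-regional"
```

Because each segment is an independent HTTP fetch, the switch needs no connection handoff—the next segment simply comes from a different address.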
Unlike reactive caching (fetch on demand), Netflix proactively pushes content to edge servers before users request it. This is possible because video streaming is uniquely predictable—new content is known in advance.
Fill During Night Hours:
Netflix schedules most cache filling during local off-peak hours (typically 2-6 AM). Benefits:
- Fill traffic never competes with peak viewing traffic
- ISP links are otherwise idle, so fills cost the network little
- New content is in place before the evening peak when demand arrives
Mathematical Modeling:
The cache filling system solves an optimization problem: given predicted per-title demand at each location, limited storage per server, and limited fill bandwidth, choose a content placement that maximizes locally served traffic—equivalently, minimizes expensive upstream fetches.
This is fundamentally a variant of the facility location problem with time-varying demand.
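A common heuristic for this kind of placement problem is greedy value-density packing: fill each cache with the titles that promise the most predicted streams per gigabyte of storage. A minimal sketch, with all demand and size numbers invented (the real system also models fill cost and time-varying demand):

```python
# Greedy knapsack-style sketch of the fill-placement problem.
# All titles, demand figures, and sizes are made up for illustration.

def plan_fill(titles: dict[str, tuple[float, float]], capacity_gb: float) -> list[str]:
    """titles maps name -> (predicted_streams, size_gb).
    Greedily pick titles by predicted streams per GB until the cache
    is full -- a classic knapsack heuristic, not an exact solution."""
    ranked = sorted(titles.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    plan, used = [], 0.0
    for name, (streams, size) in ranked:
        if used + size <= capacity_gb:
            plan.append(name)
            used += size
    return plan

catalog = {
    "new_hit_s01": (9000, 40.0),     # high demand, mid size
    "classic_film": (1200, 8.0),     # modest demand, small
    "niche_doc": (150, 12.0),
    "blockbuster_4k": (7000, 90.0),  # high demand, but huge
}
print(plan_fill(catalog, capacity_gb=60.0))
# prints "['new_hit_s01', 'classic_film', 'niche_doc']"
```

Note how the huge 4K title loses out despite high demand: per-byte value, not raw popularity, drives placement when storage is the binding constraint.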
Even with proactive filling, unexpected virality creates problems. If a 5-year-old show suddenly trends on social media, edge caches may not have it. The system must handle cascading fill requests without overwhelming upstream tiers. Rate limiting and prioritization prevent 'thundering herd' effects.
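Two standard defenses against this cascade are request coalescing (duplicate requests for the same title piggyback on one upstream fetch) and capping concurrent fills. A simplified single-threaded sketch of both, with hypothetical names throughout:

```python
# Sketch of 'thundering herd' protection for cache fills: coalesce
# duplicate upstream requests and cap concurrent fills. Simplified,
# single-threaded; a real coordinator would be concurrent.

class FillCoordinator:
    def __init__(self, max_inflight: int):
        self.max_inflight = max_inflight
        self.inflight: set[str] = set()
        self.queued: list[str] = []

    def request_fill(self, title: str) -> str:
        if title in self.inflight:
            return "coalesced"            # piggyback on in-flight fetch
        if len(self.inflight) >= self.max_inflight:
            self.queued.append(title)     # defer rather than flood upstream
            return "queued"
        self.inflight.add(title)
        return "fetching"

coord = FillCoordinator(max_inflight=2)
print(coord.request_fill("trending_show"))   # prints "fetching"
print(coord.request_fill("trending_show"))   # prints "coalesced"
print(coord.request_fill("other_a"))         # prints "fetching"
print(coord.request_fill("other_b"))         # prints "queued"
```

Whatever the storm of viewer requests, the upstream tier sees at most `max_inflight` fetches per title cluster—the herd is absorbed at the edge.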
Unlike most distributed systems where consistency is critical, video content is immutable once encoded. Episode 5 of a show doesn't change. This simplifies consistency dramatically—but introduces other challenges around content updates, removals, and version management.
Content Addressing:
Netflix uses content-addressable storage at the edge. File names include content hash:
movie_12345_video_1080p_h264_hash_a7b3c9d2e1f4.mp4
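The idea behind a name like this can be sketched in a few lines: derive the filename from the bytes themselves, so changed content is by definition a new file. The naming pattern below mimics the example above; Netflix's actual scheme is internal, and the hash truncation is an assumption:

```python
import hashlib

# Content addressing sketch: filename derived from the file's bytes,
# so a changed file is a *new* file and caches never need invalidation.

def content_name(title_id: int, variant: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()[:12]
    return f"movie_{title_id}_{variant}_hash_{digest}.mp4"

def verify(name: str, data: bytes) -> bool:
    """An edge server can detect corruption by re-hashing what it stored."""
    stored = name.rsplit("_hash_", 1)[1].removesuffix(".mp4")
    return hashlib.sha256(data).hexdigest()[:12] == stored

name = content_name(12345, "video_1080p_h264", b"...encoded segment bytes...")
print(name, verify(name, b"...encoded segment bytes..."))
```

Any bit flip on disk changes the hash, so verification fails and the server can re-fetch the file rather than serve corrupt video.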
Benefits of content addressing:
- Updated content gets a new name, so stale caches can never serve the wrong bytes
- No cache invalidation protocol is needed—old files simply age out
- Corruption is detectable by re-hashing the stored file
- Identical content is naturally deduplicated
Manifest Version Control:
Manifests (the 'table of contents' for adaptive streaming) are versioned separately from content. When anything changes—a re-encode, a subtitle fix, an availability update—a new manifest version is published pointing at the new content names, while the old files remain until they age out of caches.
For content delivery, strong consistency isn't needed. If an edge cache serves a slightly stale manifest for a few minutes, the impact is minimal. This allows aggressive caching (long TTLs) and lazy invalidation—accepting minutes of staleness for massive performance gains.
With 15,000+ servers across 1,000+ locations, failures are constant—not exceptional. The architecture must treat failure as normal and maintain service through all but the most catastrophic scenarios.
Chaos Engineering:
Netflix pioneered Chaos Engineering—intentionally injecting failures to verify resilience. Famous examples include Chaos Monkey, which randomly terminates production instances, and Chaos Kong, which simulates the loss of an entire AWS region.
For the CDN specifically, the same discipline applies: failures of individual OCAs and whole clusters are exercised deliberately to verify that traffic steering reroutes viewers without visible interruption.
Recovery Time Objectives:
| Failure Type | Detection Time | Recovery Time | User Impact |
|---|---|---|---|
| Single OCA | < 10 seconds | < 30 seconds | None (other servers in cluster) |
| Cluster | < 1 minute | < 2 minutes | Brief quality reduction |
| Region | < 5 minutes | < 10 minutes | Possible rebuffering |
| Origin | N/A | N/A | None (edge serves cached content) |
A key architectural principle: edge caches must not depend on origin for serving. Once content is cached, playback continues even if the origin is completely unavailable. Origin is only needed for cache fills—which can be delayed. This provides extraordinary resilience.
Content delivery is Netflix's largest infrastructure cost after content licensing itself. Understanding the economics explains many architectural decisions.
| Component | Cost Driver | Optimization Strategy |
|---|---|---|
| Origin Egress | Per-GB charges from cloud | Maximize edge cache hits |
| Transit Bandwidth | Per-Mbps or per-GB to ISPs | ISP embedding, peering deals |
| Edge Hardware | Server purchase + refresh cycle | Maximize utilization, longer lifecycles |
| Colocation | Rack space + power | Optimize server density and power efficiency |
| Encoding Compute | CPU-hours for transcoding | Per-title optimization, efficient codecs |
| Operations | Personnel, tooling, monitoring | Automation, centralized control plane |
ISP Embedding Economics:
When Netflix places servers inside an ISP's network:
For Netflix:
- Transit and peering costs for that traffic largely disappear
- Viewers get lower latency and higher sustainable bitrates
For ISPs:
- Netflix traffic—often a large share of peak load—stays inside the network instead of crossing paid transit links
- Subscribers get better streaming quality at no hardware cost, since Netflix supplies the appliances
This is why Netflix offers Open Connect free to ISPs—it's still cheaper than paying transit. Major ISPs have hundreds of OCAs embedded in their networks.
Cost-Per-Stream Optimization:
All architectural decisions ultimately aim to minimize cost-per-stream while maintaining quality: higher cache hit rates cut origin egress, better codecs cut bits per stream, and ISP embedding cuts transit—and each improvement compounds across billions of streaming hours.
Netflix's effective cost per stream is estimated at fractions of a cent—remarkable for delivering gigabytes of video per viewing session.
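A back-of-envelope model shows why fractions of a cent is plausible. Every number below is an assumption invented for illustration, not a Netflix figure:

```python
# Back-of-envelope hardware cost per stream. All inputs are assumptions
# for illustration only, not Netflix figures.

OCA_COST_USD = 20_000       # assumed hardware cost per appliance
LIFETIME_YEARS = 4          # assumed refresh cycle
STREAMS_PER_DAY = 100_000   # assumed stream turnover per OCA per day

amortized_per_day = OCA_COST_USD / (LIFETIME_YEARS * 365)
cost_per_stream = amortized_per_day / STREAMS_PER_DAY
print(f"~${cost_per_stream:.5f} per stream (hardware only)")
```

Even adding colocation, power, and fill bandwidth, amortizing a busy appliance over its lifetime of streams lands well under a cent each.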
You now understand Netflix's content delivery architecture—from origin storage through multi-tier caching to ISP-embedded edge servers. This infrastructure enables 200+ million subscribers to stream content with sub-second startup and minimal rebuffering. Next, we'll explore Open Connect CDN in detail—the custom CDN that makes this possible.