In 2007, Netflix started streaming video using third-party CDNs like Akamai and Limelight. By 2012, as streaming exploded in popularity, Netflix was paying hundreds of millions annually for CDN services—and still facing quality limitations. The solution was radical: build a proprietary CDN from scratch.
Open Connect launched in 2012 and now delivers over 95% of Netflix traffic globally. It's not just a cost-saving measure—it's a strategic asset that enables quality and features impossible with generic CDNs.
This page examines Open Connect in depth: its architecture, deployment models, the control plane that orchestrates it, ISP partnership dynamics, and the technical innovations that make it the world's most efficient video delivery network.
Open Connect comprises 15,000+ servers across 1,000+ locations in 50+ countries. During peak hours, it accounts for approximately 15% of all downstream internet traffic in North America. A single Open Connect Appliance can serve 100+ Gbps—more bandwidth than entire data centers at many companies.
Building infrastructure from scratch is usually a mistake; conventional wisdom says buy, don't build. Why did Netflix choose differently? The answer lies in video streaming's unique characteristics and Netflix's scale.
The Build vs Buy Calculation:
At 2012 scale (30 million subscribers, ~3% of today's traffic), the math was already compelling:
Third-Party CDN Cost:
Open Connect Cost:
With Netflix serving exabytes of content, Open Connect pays for itself multiple times over each year. The economics only improve with scale.
Open Connect makes sense at Netflix scale. For smaller streaming services, third-party CDNs remain the right choice. The crossover point is typically hundreds of petabytes per month—traffic levels only a handful of companies reach.
The Open Connect Appliance (OCA) is a purpose-built server optimized for a single task: serving video efficiently. Every component—hardware and software—is chosen for this specific workload.
| Generation | Year | Storage | Network | Key Improvement |
|---|---|---|---|---|
| Gen 1 | 2012 | 36 HDD (108 TB) | 10 Gbps | Initial deployment |
| Gen 2 | 2014 | 36 HDD (144 TB) | 40 Gbps | 4x network speedup |
| Gen 3 | 2016 | 36 SSD (216 TB) | 100 Gbps | SSD transition, 10x IOPS |
| Gen 4 | 2019 | 36 SSD (360 TB) | 100 Gbps | NVMe, larger SSDs |
| Gen 5 | 2022 | 36 SSD (400+ TB) | 200-400 Gbps | PCIe 4.0, next-gen NICs |
Software Stack Deep Dive:
Operating System: FreeBSD
Netflix chose FreeBSD over Linux for OCAs due to:
HTTP Server: Custom nginx Fork
Netflix forked nginx and heavily modified it:
TLS Termination:
Every video byte is encrypted. OCAs must terminate TLS at 100+ Gbps:
Content Storage:
Content is stored as simple files on ZFS:
OCAs are remarkably simple by design. No application databases, no complex state management, no microservices. Just receive HTTP requests, read files from disk, send bytes over network. This simplicity enables reliability—fewer components mean fewer failure modes.
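As a rough illustration of that serving model, here is a toy HTTP handler that does nothing but map a URL path to a file on disk and stream the requested byte range back to the client. Real OCAs run FreeBSD with a modified nginx and kernel-level sendfile/TLS, none of which appears here; the content root path is hypothetical and no hardening (path checks, TLS) is included.

```python
# Toy sketch of the OCA serving model: receive an HTTP request, read a file,
# send bytes. Not Netflix's implementation; illustration only.
import os
import re
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

CONTENT_ROOT = "/var/content"            # hypothetical local content store
RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")

class SegmentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = os.path.join(CONTENT_ROOT, self.path.lstrip("/"))
        if not os.path.isfile(path):
            self.send_error(404)
            return
        size = os.path.getsize(path)
        start, end = 0, size - 1
        match = RANGE_RE.match(self.headers.get("Range", ""))
        if match:                        # players fetch video in byte ranges
            start = int(match.group(1))
            if match.group(2):
                end = min(int(match.group(2)), size - 1)
        length = end - start + 1
        self.send_response(206 if match else 200)
        self.send_header("Content-Length", str(length))
        if match:
            self.send_header("Content-Range", f"bytes {start}-{end}/{size}")
        self.end_headers()
        with open(path, "rb") as f:      # read the file, send the bytes, nothing else
            f.seek(start)
            remaining = length
            while remaining > 0:
                chunk = f.read(min(1 << 20, remaining))
                if not chunk:
                    break
                self.wfile.write(chunk)
                remaining -= len(chunk)

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), SegmentHandler).serve_forever()
```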
Open Connect servers are deployed in three primary models, each serving different purposes and requirements. The choice of model depends on traffic volume, ISP relationships, and geographic needs.
Location Selection Criteria:
Netflix evaluates potential locations using multiple factors:
Traffic Analysis:
Network Quality:
Economic Factors:
Operational Considerations:
A location is added when the ROI is positive—when traffic volume justifies the capital expense and operational overhead.
Netflix provides OCAs to qualifying ISPs at no cost—no hardware fee, no licensing fee. ISPs provide rack space, power, and network connectivity. Both parties benefit: Netflix gets optimal delivery, ISPs reduce their backhaul traffic. This 'free equipment' model has driven rapid adoption across major ISPs worldwide.
While OCAs are simple content servers, the control plane that manages them is sophisticated. It handles steering decisions, cache management, health monitoring, and configuration across 15,000+ servers.
Steering Deep Dive:
When a Netflix player initiates playback, it doesn't know which OCA to connect to. The steering process:
Real-Time Adaptation:
Steering decisions aren't static. If an OCA becomes overloaded mid-session:
This all happens in seconds—users don't notice the orchestration.
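To make the steering idea concrete, here is a minimal sketch of how a control plane might rank the OCAs that hold a requested title and hand the player an ordered list of URLs. The field names (has_title, load, predicted_bitrate_kbps), the scoring rule, and the hostnames are assumptions for illustration, not Netflix's actual logic.

```python
# Minimal steering sketch: filter candidate OCAs, rank by a predicted-quality
# score, return an ordered list of URLs for the player to try.
from dataclasses import dataclass

@dataclass
class CandidateOCA:
    url: str
    has_title: bool               # is the requested encode already cached here?
    load: float                   # 0.0 (idle) .. 1.0 (saturated)
    healthy: bool                 # last health check passed
    predicted_bitrate_kbps: int   # historical throughput toward this client's network

def steer(candidates: list[CandidateOCA], max_results: int = 3) -> list[str]:
    usable = [c for c in candidates if c.healthy and c.has_title and c.load < 0.9]
    # Prefer the OCA expected to sustain the highest bitrate, lightly penalizing
    # already-busy servers so load spreads across a cluster.
    ranked = sorted(usable,
                    key=lambda c: c.predicted_bitrate_kbps * (1.0 - c.load),
                    reverse=True)
    return [c.url for c in ranked[:max_results]]

# Example: the player receives a short ordered list of URLs at playback start.
urls = steer([
    CandidateOCA("https://oca1.example.net", True, 0.55, True, 16000),
    CandidateOCA("https://oca2.example.net", True, 0.20, True, 12000),
    CandidateOCA("https://oca3.example.net", False, 0.10, True, 25000),
])
print(urls)   # lightly loaded oca2 outranks faster-but-busier oca1; oca3 lacks the title
```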
If the control plane fails, new playback sessions can't be steered. But ongoing sessions continue—players already have URLs. The control plane is therefore built with extreme redundancy across multiple AWS regions. It's the most protected component in the entire Netflix infrastructure.
Optimal traffic steering is a complex optimization problem. Netflix has developed sophisticated algorithms that continuously learn and adapt to network conditions.
Multi-Armed Bandit Approach:
Netflix uses a variant of the multi-armed bandit algorithm for steering decisions (a minimal sketch follows this outline):
The Problem:
The Solution:
Continuous Learning:
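To make the bandit framing concrete, here is a rough epsilon-greedy sketch in which each candidate OCA (or route) is an "arm" and the reward is an observed session-quality score, such as delivered bitrate with a rebuffer penalty. Netflix's production steering models are far more sophisticated; the epsilon value, arm names, and reward definition here are assumptions.

```python
# Epsilon-greedy sketch of explore/exploit steering. Illustration only.
import random
from collections import defaultdict

class EpsilonGreedySteering:
    def __init__(self, arms: list[str], epsilon: float = 0.05):
        self.arms = arms
        self.epsilon = epsilon           # small fraction of sessions used to explore
        self.count = defaultdict(int)
        self.mean_reward = defaultdict(float)

    def choose(self) -> str:
        if random.random() < self.epsilon or not self.count:
            return random.choice(self.arms)                        # explore
        return max(self.arms, key=lambda a: self.mean_reward[a])   # exploit best so far

    def record(self, arm: str, reward: float) -> None:
        # Incrementally update the running mean quality observed for this arm.
        self.count[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.count[arm]

bandit = EpsilonGreedySteering(["oca-a", "oca-b", "oca-c"])
arm = bandit.choose()
# After the session ends, feed back the observed quality score for that arm.
bandit.record(arm, reward=0.87)
```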
Failover Logic:
Steering always provides multiple fallback OCAs:
Primary: oca1.netflix.com (best predicted quality)
Secondary: oca2.netflix.com (second best, different cluster)
Tertiary: oca3.netflix.com (different region, worst-case fallback)
Players try primary first, fall back on errors. This provides resilience without control plane involvement during playback.
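A minimal sketch of that player-side fallback behavior, reusing the primary/secondary/tertiary hostnames from the example above; the segment path, timeout, and error handling are illustrative assumptions rather than the real client's logic.

```python
# Client-side failover sketch: try the primary OCA, fall back down the list on errors.
import urllib.request

STEERED_OCAS = [
    "https://oca1.netflix.com",   # best predicted quality
    "https://oca2.netflix.com",   # second best, different cluster
    "https://oca3.netflix.com",   # different region, worst-case fallback
]

def fetch_segment(segment_path: str, timeout_s: float = 2.0) -> bytes:
    last_error = None
    for base in STEERED_OCAS:
        try:
            with urllib.request.urlopen(base + segment_path, timeout=timeout_s) as resp:
                return resp.read()
        except OSError as exc:    # connection failure, timeout, HTTP error, etc.
            last_error = exc      # note the failure and move to the next OCA
    raise RuntimeError(f"all steered OCAs failed: {last_error}")
```

No control-plane round trip is needed during this loop; the player already holds every URL it might need.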
Netflix doesn't just optimize for lowest latency—they optimize for stream quality: bitrate achieved, rebuffer rate, video resolution. A slightly higher-latency path that delivers better quality is preferred. This is why Netflix built their own CDN: third-party CDNs optimize for generic web metrics, not streaming-specific quality.
Getting content onto OCAs before users request it is critical for quality streaming. The fill system is a complex choreography of prediction, prioritization, and network utilization.
| Priority | Content Type | Fill Strategy | Cache Duration |
|---|---|---|---|
| P0 (Critical) | New release first 24h | Push to all relevant OCAs pre-launch | Indefinite |
| P1 (High) | Top 100 titles | Keep on all OCAs, immediate refill | Weeks |
| P2 (Medium) | Top 1000 titles | Regional OCAs, moderate fill priority | Days |
| P3 (Low) | Long tail content | On-demand fill only, limited replication | Hours |
| P4 (Archive) | Rarely accessed | Origin-only, no edge caching | N/A |
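The table above can be read as a simple decision rule. The sketch below maps a title's attributes to a fill-priority tier; the rank thresholds and field names are illustrative assumptions, since real fill decisions are driven by per-region popularity predictions rather than a single global rank.

```python
# Hedged sketch of the priority mapping in the fill table. Illustration only.
from dataclasses import dataclass

@dataclass
class Title:
    popularity_rank: int        # predicted regional rank (1 = most popular)
    is_new_release: bool        # launching now or just launched
    hours_since_launch: float

def fill_priority(title: Title) -> str:
    if title.is_new_release and title.hours_since_launch <= 24:
        return "P0"    # push to all relevant OCAs pre-launch, keep indefinitely
    if title.popularity_rank <= 100:
        return "P1"    # keep on all OCAs, refill immediately on eviction
    if title.popularity_rank <= 1000:
        return "P2"    # regional OCAs only, moderate fill priority
    if title.popularity_rank <= 100_000:
        return "P3"    # fill on demand, limited replication
    return "P4"        # origin-only, no edge caching
```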
Fill Traffic Engineering:
Fill operations must not interfere with viewer traffic. Several mechanisms ensure this (a bandwidth-limiting sketch follows the list):
Bandwidth Limiting:
Hierarchical Filling:
Deduplication:
Graceful Degradation:
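As referenced above, one common way to keep fill traffic from competing with viewers is a token-bucket rate limit on fill transfers, with the rate tuned down during peak viewing hours. The sketch below is a generic rate limiter under assumed rates, not Netflix's actual fill mechanism.

```python
# Token-bucket sketch for throttling fill traffic. Illustration only.
import time

class FillRateLimiter:
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def set_rate(self, rate_bytes_per_s: float) -> None:
        self.rate = rate_bytes_per_s     # e.g. reduced during peak viewing hours

    def acquire(self, nbytes: int) -> None:
        # Block until enough tokens have accumulated to send `nbytes` of fill traffic.
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Usage: cap off-peak fill at roughly 1 Gbps (illustrative number).
limiter = FillRateLimiter(rate_bytes_per_s=125e6, burst_bytes=16e6)
# limiter.acquire(len(chunk))  # call before each fill write on the wire
```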
Netflix targets a 95%+ cache hit rate at edge OCAs, meaning only about 5% of requests need upstream fetches. For popular content, hit rates exceed 99%. This is achieved through careful fill policies, not luck: every fill decision is data-driven.
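A back-of-the-envelope illustration with assumed numbers shows why the hit rate matters so much for upstream capacity planning:

```python
# Upstream fill bandwidth implied by a given hit rate (assumed egress figures).
def upstream_gbps(egress_gbps: float, hit_rate: float) -> float:
    return egress_gbps * (1.0 - hit_rate)

print(upstream_gbps(80, 0.95))   # ~4 Gbps of upstream fetches for an 80 Gbps OCA
print(upstream_gbps(80, 0.99))   # ~0.8 Gbps at a 99% hit rate
```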
Open Connect's success depends on ISP partnerships. These relationships are complex—technical, commercial, and sometimes political. Understanding them is essential for any large-scale content delivery system.
Partnership Requirements:
To qualify for Open Connect, ISPs must meet technical and business requirements:
Technical Requirements:
Operational Requirements:
Relationship Tiers:
| Tier | Subscriber Count | Typical Deployment | Netflix Involvement |
|---|---|---|---|
| Tier 1 | 10M+ | 100+ OCAs, multiple sites | Dedicated relationship manager |
| Tier 2 | 1-10M | 10-100 OCAs, several sites | Regional partnership team |
| Tier 3 | 100K-1M | 2-10 OCAs, 1-2 sites | Automated onboarding |
| IXP-only | <100K | Served via shared IXP deployment | No direct partnership |
Not all ISPs want Open Connect. Some see it as Netflix avoiding transit payments. Others (especially those with their own streaming services) resist making Netflix's experience superior. In these cases, Netflix relies on IXP deployments and transit—quality is lower, but service continues.
Operating 15,000+ servers across 1,000+ locations requires sophisticated monitoring and streamlined operations. Netflix has developed tools and processes that enable a relatively small team to manage this massive fleet.
Operational Philosophy:
Automate Everything:
With 15,000+ servers, human-driven operations don't scale. Netflix automates:
No On-Site Requirements:
OCAs are designed for fully remote operation:
Hardware Replacement Process:
Typical time from failure to replacement: 24-72 hours depending on location.
Incident Response:
When issues occur, the incident process:
You now have deep knowledge of Open Connect, the proprietary CDN that delivers over 95% of Netflix traffic and roughly 15% of North America's downstream internet traffic at peak. From purpose-built hardware to sophisticated steering algorithms to ISP partnerships, Open Connect is a masterclass in building infrastructure at planetary scale. Next, we'll explore the Personalization Engine: how Netflix decides what to show you.