When you stream a video from Netflix, load images on Instagram, or download a software update from Microsoft, you're experiencing the magic of Content Delivery Networks (CDNs). These invisible infrastructure giants ensure that content reaches billions of users worldwide with sub-second latency, regardless of whether the user is in Tokyo, São Paulo, or Cape Town.
Consider the physics of the problem: light travels through optical fiber at roughly 200,000 kilometers per second—about two-thirds of its speed in a vacuum. A request from Sydney to a server in New York must traverse roughly 16,000 kilometers of fiber, introducing at least 80 milliseconds of latency each way, 160 milliseconds round-trip, from propagation delay alone. In reality, network routing, packet processing, and protocol overhead push this to 200-400 milliseconds. For a webpage that loads 100 resources sequentially, that compounds into 20-40 seconds of latency—an eternity in user experience terms.
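A quick back-of-the-envelope script makes the arithmetic concrete (the speeds and distances are the illustrative figures from above, not measured values):

```python
# Back-of-the-envelope propagation latency, using the figures above.
FIBER_LIGHT_SPEED_KM_S = 200_000   # ~2/3 of c, due to fiber's refractive index
SYDNEY_TO_NY_KM = 16_000

one_way_ms = SYDNEY_TO_NY_KM / FIBER_LIGHT_SPEED_KM_S * 1000
round_trip_ms = 2 * one_way_ms
print(f"Propagation alone: {one_way_ms:.0f} ms one-way, {round_trip_ms:.0f} ms round-trip")

# With routing and processing overhead, real round-trips land around 200-400 ms.
# 100 sequential fetches at those round-trip times:
for rtt_ms in (200, 400):
    print(f"100 sequential resources at {rtt_ms} ms RTT: {100 * rtt_ms / 1000:.0f} s")
```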
CDNs solve this fundamental physics problem by shrinking the distance between content and users.
By the end of this page, you will understand the complete architecture of Content Delivery Networks—from their core components and topology patterns to the sophisticated techniques they use to route traffic, cache content, and maintain consistency across thousands of globally distributed edge servers. You'll gain the knowledge to design CDN strategies for systems serving global audiences.
A Content Delivery Network (CDN) is a geographically distributed network of proxy servers and data centers designed to provide high availability and performance by distributing content closer to end users. Rather than serving all content from a single origin server, CDNs cache and deliver content from edge locations strategically positioned around the world.
The Core Principle: Move Content to Users, Not Users to Content
The fundamental insight behind CDNs is deceptively simple: instead of making every user request travel to a centralized data center, replicate content across geographically distributed locations and serve users from the nearest one. This transforms content delivery from a single-point model to a distributed mesh model.
The Evolution of CDNs
CDNs emerged in the late 1990s as internet traffic began overwhelming origin servers. Akamai, founded in 1998, pioneered the commercial CDN industry with algorithms developed at MIT to solve the "flash crowd" problem—sudden traffic spikes that would crash popular websites. Today, CDNs have evolved from simple static content caches into sophisticated platforms handling dynamic content, security (DDoS protection, TLS), media delivery, and edge computing. The major commercial providers:
| Provider | Founded | Key Strengths | Notable Use Cases |
|---|---|---|---|
| Cloudflare | 2010 | Security-first, global anycast, edge computing (Workers) | Web performance, DDoS protection, edge applications |
| Akamai | 1998 | Largest network, enterprise-grade, media delivery | Streaming platforms, large enterprises, gaming |
| Amazon CloudFront | 2008 | AWS integration, Lambda@Edge, global infrastructure | AWS-native workloads, e-commerce, SaaS |
| Fastly | 2011 | Real-time configuration, edge computing, instant purge | Publishers, media companies, high-performance apps |
| Google Cloud CDN | 2015 | GCP integration, Anycast, global load balancing | GCP workloads, YouTube infrastructure |
| Azure CDN | 2015 | Azure integration, multiple providers, enterprise focus | Microsoft ecosystem, enterprise applications |
Traditional hosting serves all requests from a single location, creating a performance ceiling based on geographic distance. CDNs invert this model—content exists everywhere simultaneously, and the network routes users to the optimal copy. This isn't just an optimization; it's a fundamental architectural shift that enables truly global applications.
A production CDN architecture comprises several interconnected components, each serving a specific function in the content delivery pipeline. Understanding these components is essential for designing effective caching strategies and troubleshooting delivery issues.
The Complete CDN Stack:
The stack spans four layers: the CDN's DNS routing infrastructure, globally distributed edge PoPs, an optional origin shield, and the origin itself.
Understanding the Request Flow:
1. DNS Resolution: User requests cdn.example.com. The CDN's DNS infrastructure determines the optimal PoP based on geography, network conditions, PoP health, and capacity.
2. Edge Connection: The user connects to the assigned edge PoP. Modern CDNs use persistent connections (HTTP/2, HTTP/3/QUIC) to eliminate connection setup overhead for subsequent requests.
3. Cache Lookup: The edge server checks its local cache for the requested content. This lookup is typically sub-millisecond using in-memory hash tables or SSD-backed storage.
4. Cache Hit (Happy Path): If content is cached and valid (not expired), the edge server responds immediately. Latency is typically 10-50ms depending on user proximity to the PoP.
5. Cache Miss (Origin Fetch): If content is missing or stale, the edge server fetches from the origin shield (if configured) or directly from the origin. The response is cached for subsequent requests.
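A minimal sketch of steps 3-5 as an edge server might implement them. The cache structure, the shield endpoint, and the TTL handling here are illustrative assumptions, not any provider's actual code:

```python
import time
from dataclasses import dataclass

@dataclass
class CachedObject:
    body: bytes
    expires_at: float  # absolute epoch time derived from the upstream TTL

cache: dict[str, CachedObject] = {}          # stands in for the tiered store
ORIGIN_SHIELD = "https://shield.example.com" # hypothetical shield endpoint

def fetch_from_upstream(path: str) -> tuple[bytes, int]:
    """Placeholder for an HTTP fetch from the origin shield (or origin)."""
    return b"...origin response...", 300     # body, TTL in seconds

def handle_request(path: str) -> bytes:
    now = time.time()
    cached = cache.get(path)                 # step 3: sub-millisecond lookup
    if cached and cached.expires_at > now:   # step 4: cache hit, serve locally
        return cached.body
    body, ttl = fetch_from_upstream(path)    # step 5: miss -> shield/origin
    cache[path] = CachedObject(body, now + ttl)
    return body
```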
CDN providers employ different network topologies to balance cost, performance, and operational complexity. Understanding these patterns helps architects make informed decisions about CDN selection and configuration.
Three Primary Topology Patterns:
Flat Topology (Direct-to-Origin)
In a flat topology, every edge PoP fetches cache misses directly from the origin. It is the simplest model to operate, but each PoP caches its own copy independently, so the origin absorbs duplicate fetches from every location.
Hierarchical Topology (Origin Shield)
Hierarchical CDNs route edge misses through an intermediate caching tier—a regional parent cache or origin shield—that consolidates misses from many PoPs into far fewer origin requests, at the cost of one extra hop.
Mesh Topology (Peer-to-Peer Edge Caching)
Advanced CDNs implement mesh topologies where edge PoPs can fetch content from each other rather than always going to the origin or origin shield, so a miss can often be served from a nearby peer that already holds the object.
The Trade-off Triangle:
Every CDN topology balances three competing concerns—origin load, cache efficiency, and operational complexity—and each choice also shapes cache-miss latency:
| Topology | Origin Load | Cache Efficiency | Complexity | Latency (Miss) |
|---|---|---|---|---|
| Flat | High | Lower (duplicate caching) | Low | Depends on origin distance |
| Hierarchical | Very Low | High (consolidated misses) | Medium | Medium (extra hop) |
| Mesh | Low | Highest (peer sharing) | High | Lowest (nearest peer) |
Most production deployments benefit from hierarchical topology with origin shields. The slight latency increase on cache misses (typically 20-50ms) is vastly outweighed by the origin load reduction—often 10-100x fewer requests reaching your origin servers. Mesh topology is typically reserved for the largest CDN providers who can justify the operational complexity.
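One reason a shield cuts origin load so sharply is request coalescing: when many edges miss on the same object at once, the shield makes a single origin fetch and fans the response out to all waiters. A minimal asyncio sketch of the idea—the fetch function and key scheme are assumptions for illustration:

```python
import asyncio

_inflight: dict[str, asyncio.Future] = {}

async def fetch_origin(key: str) -> bytes:
    await asyncio.sleep(0.1)          # stands in for the real origin round-trip
    return f"body-of-{key}".encode()

async def coalesced_fetch(key: str) -> bytes:
    """Collapse concurrent misses for the same key into one origin request."""
    if key in _inflight:              # another task is already fetching this
        return await _inflight[key]
    fut = asyncio.get_running_loop().create_future()
    _inflight[key] = fut
    try:
        body = await fetch_origin(key)
        fut.set_result(body)
        return body
    except Exception as exc:
        fut.set_exception(exc)        # propagate failure to all waiters
        raise
    finally:
        del _inflight[key]

async def main():
    # 50 simultaneous edge misses for one object -> one origin fetch
    bodies = await asyncio.gather(*[coalesced_fetch("/video/seg1.ts") for _ in range(50)])
    assert len(set(bodies)) == 1

asyncio.run(main())
```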
How does a CDN ensure that a user in Mumbai connects to a server in Mumbai rather than one in Montreal? This is the request routing problem, and CDNs employ multiple sophisticated techniques to solve it.
Primary Routing Mechanisms:
DNS-Based Routing: The CDN's authoritative DNS servers answer each lookup with the IP of a PoP chosen for that resolver's location, current PoP health, and load. It is flexible and fine-grained per customer, but bound by resolver TTL caching (see the pitfall below).
Anycast Routing: Every PoP announces the same IP address via BGP, and internet routing automatically delivers each packet to the topologically nearest PoP. Failover happens instantly at the network layer, but per-request control is coarser.
Hybrid Routing: The Best of Both Worlds
Modern CDNs typically combine DNS-based and Anycast routing: DNS provides coarse geographic steering and per-customer policy, while Anycast underneath it handles fine-grained path selection and instant network-layer failover.
Client-Based Routing (Advanced)
Some CDNs implement client-side routing where JavaScript running in the browser measures latency to multiple PoPs and directs subsequent requests to the fastest one. This provides true end-to-end latency optimization but adds complexity and initial measurement overhead.
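In practice this measurement runs as JavaScript in the browser, but the logic looks roughly like the following Python sketch. The PoP probe hostnames are hypothetical; a real implementation would use small cache-busting requests and repeat measurements to smooth out jitter:

```python
import time
import urllib.request

# Hypothetical per-PoP probe endpoints exposed by the CDN.
POP_PROBES = {
    "syd": "https://syd.pop.example.com/ping",
    "sin": "https://sin.pop.example.com/ping",
    "lax": "https://lax.pop.example.com/ping",
}

def measure_pop_latency(url: str, timeout: float = 2.0) -> float:
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout):
        pass
    return time.perf_counter() - start

def pick_fastest_pop() -> str:
    timings = {}
    for pop, url in POP_PROBES.items():
        try:
            timings[pop] = measure_pop_latency(url)
        except OSError:
            continue                  # unreachable PoPs are simply skipped
    if not timings:
        raise RuntimeError("no PoP reachable")
    return min(timings, key=timings.get)
```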
DNS resolvers cache responses according to TTL values. If a CDN uses DNS routing with a 5-minute TTL and a PoP fails, users may continue attempting to connect to the failed PoP until their cached DNS entry expires. Anycast solves this at the network layer but isn't always suitable. Balance TTL values carefully: shorter TTLs enable faster failover but increase DNS query volume and latency.
Edge servers are the workhorses of CDN infrastructure. Each edge server handles thousands of concurrent connections, performs cache lookups, negotiates TLS, executes edge logic, and routes cache misses—all while maintaining sub-millisecond response times. Understanding their internal architecture illuminates CDN behavior and performance characteristics.
Edge Server Component Stack:
The Cache Storage Hierarchy:
Modern edge servers implement tiered caching to optimize for both speed and capacity:
| Tier | Medium | Capacity | Access Latency | Content Type |
|---|---|---|---|---|
| L1 | RAM | 64-256 GB | 1-10 μs | Extremely hot objects |
| L2 | NVMe SSD | 2-16 TB | 50-200 μs | Popular objects |
| L3 | SATA SSD/HDD | 16-100 TB | 500 μs - 5 ms | Long-tail content |
Cache admission policies determine which tier receives new objects. Simple policies place all objects in L3 and promote based on access frequency. Advanced policies predict object popularity from request patterns to optimize initial placement.
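A toy two-tier version of this promote-on-frequency idea. The capacities and the promotion threshold are arbitrary illustrations, and real systems use far more sophisticated structures:

```python
from collections import OrderedDict

L1_CAPACITY, L2_CAPACITY = 3, 100     # tiny numbers for illustration
PROMOTE_AFTER = 3                     # hits in L2 before promotion to L1

l1: OrderedDict[str, bytes] = OrderedDict()   # "RAM" tier: hot objects
l2: dict[str, tuple[bytes, int]] = {}         # "SSD" tier: (body, hit count)

def get(key: str) -> bytes | None:
    if key in l1:
        l1.move_to_end(key)           # keep L1 in LRU order
        return l1[key]
    if key in l2:
        body, hits = l2[key]
        if hits + 1 >= PROMOTE_AFTER: # hot enough: promote to L1
            del l2[key]
            l1[key] = body
            if len(l1) > L1_CAPACITY:
                l1.popitem(last=False)  # evict least-recently-used L1 entry
        else:
            l2[key] = (body, hits + 1)
        return body
    return None

def put(key: str, body: bytes) -> None:
    if len(l2) >= L2_CAPACITY:
        l2.pop(next(iter(l2)))        # crude eviction; real tiers use LRU/ARC
    l2[key] = (body, 1)               # new objects land in the lower tier
```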
Connection Handling at Scale:
A single edge server may handle 100,000+ concurrent connections. This requires:
- An event-driven, non-blocking I/O model (epoll/kqueue) rather than a thread per connection
- TLS session resumption and persistent connections (HTTP/2, HTTP/3/QUIC) to avoid repeated handshake costs
- Connection pooling and reuse toward the origin and peer PoPs
- Tight per-connection memory budgets so that connection state stays in kilobytes, not megabytes
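A stripped-down asyncio illustration of the event-driven pattern: one loop multiplexing many sockets, each connection a cheap coroutine rather than a thread. Real edge servers implement this with tuned epoll/kqueue loops in lower-level languages:

```python
import asyncio

async def handle_connection(reader: asyncio.StreamReader,
                            writer: asyncio.StreamWriter) -> None:
    # Each connection is a coroutine holding kilobytes of state, not a
    # thread holding megabytes of stack -- the key to 100k+ sockets.
    try:
        while data := await reader.read(4096):
            writer.write(data)        # echo stands in for real request handling
            await writer.drain()
    finally:
        writer.close()

async def main() -> None:
    server = await asyncio.start_server(handle_connection, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```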
CDN edge servers aren't just web servers with caching—they're purpose-built systems optimized for the specific access patterns of content delivery. A typical web server might handle 10,000 concurrent connections; a CDN edge server handles 10x that while maintaining predictable microsecond-level performance. This specialization is why building your own CDN is rarely cost-effective compared to using established providers.
How edge servers decide what to cache, what to evict, and where to store objects significantly impacts cache hit rates and overall CDN performance. These decisions involve complex trade-offs between storage capacity, access latency, and operational cost.
Cache Storage Decisions:
The two highest-impact decisions are admission—which objects enter the cache, and into which tier—and eviction—which objects are removed when space runs out.
Eviction Policy Deep Dive:
LRU (Least Recently Used)
Evicts the object that has gone unaccessed the longest. Simple, fast, and effective for recency-driven traffic, but a burst of one-time requests (a crawl, a scan) can flush genuinely hot objects.
LFU (Least Frequently Used)
Evicts the object with the fewest accesses. Resists scan pollution, but adapts slowly when popularity shifts unless access counts are aged or decayed.
SLRU (Segmented LRU)
Splits the cache into a probationary and a protected segment; objects enter probation on first access and are promoted only when hit again, shielding the protected segment from one-hit wonders.
ARC (Adaptive Replacement Cache)
Balances recency and frequency dynamically by tracking "ghost" entries for recently evicted objects and resizing its internal lists based on which ghosts get re-requested—self-tuning at the cost of extra bookkeeping.
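To make the recency/frequency distinction concrete, here is LRU in a few lines; the other policies are elaborations on this structure (SLRU is essentially two of these chained, and ARC adaptively sizes the segments):

```python
from collections import OrderedDict

class LRUCache:
    """Least Recently Used: evict whatever has gone unread the longest."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> bytes | None:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # touching refreshes recency
        return self._data[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the oldest entry
```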
After PoP restarts or new PoP deployments, caches are cold—every request is a cache miss. Smart CDNs implement cache warming by pre-populating caches from peer PoPs or origin servers before receiving live traffic. This is especially important for video streaming where cache misses cause buffering. Some CDNs allow customers to trigger cache warming before major events (product launches, live streams).
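Customer-triggered warming can be as simple as replaying a hot-object manifest through the edge hostname so the PoP fetches and caches each item before the event. The hostname, paths, and header name below are hypothetical—consult your CDN's documentation for how it reports cache status:

```python
import urllib.request

EDGE_HOST = "https://cdn.example.com"   # hypothetical; resolves to the local PoP
HOT_PATHS = ["/launch/hero.jpg", "/launch/video/seg1.ts", "/launch/app.js"]

def warm_cache() -> None:
    for path in HOT_PATHS:
        try:
            with urllib.request.urlopen(EDGE_HOST + path, timeout=10) as resp:
                resp.read()   # full GET so the edge actually caches the body
                # Many CDNs expose hit/miss in a header such as X-Cache;
                # a miss here means the PoP just pulled and stored the object.
                print(path, resp.headers.get("X-Cache", "unknown"))
        except OSError as exc:
            print(path, "failed:", exc)
```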
For mission-critical applications, relying on a single CDN provider introduces unacceptable risk. Multi-CDN architectures distribute traffic across multiple CDN providers to maximize performance, reduce costs, and ensure resilience against provider-specific outages.
Why Multi-CDN?
No single provider is fastest in every region, cheapest for every traffic profile, or immune to outages. Multi-CDN lets you steer each request to whichever provider currently serves it best—and keeps you online when one provider fails.
Multi-CDN Implementation Approaches:
1. DNS-Based Traffic Splitting: Weighted or geo-targeted DNS records distribute requests across each provider's hostname. Simple to set up, but failover speed is limited by resolver TTL caching (the core selection logic is sketched after this list).
2. Intelligent Traffic Management Platforms: Managed steering services use real-user performance measurements to route each request to the currently best-performing CDN.
3. CDN Load Balancer Layer: A load-balancing tier in front of the CDNs makes per-request routing decisions, trading an extra hop for full control.
4. Application-Layer Routing: The application itself chooses which CDN hostname to embed in asset URLs, enabling per-user, per-asset, or per-experiment decisions.
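A sketch of the weighted selection at the heart of approach 1, with unhealthy providers dropped from rotation. In a real DNS-based split this logic lives in the DNS provider's weighted records; shown application-side (effectively approach 4), with illustrative hostnames and weights:

```python
import random

# Illustrative providers: hostname -> traffic weight
CDN_WEIGHTS = {
    "cdn-a.example.net": 60,   # primary provider gets ~60% of traffic
    "cdn-b.example.net": 40,   # secondary gets ~40%
}
unhealthy: set[str] = set()    # populated by an external health checker

def pick_cdn() -> str:
    candidates = {h: w for h, w in CDN_WEIGHTS.items() if h not in unhealthy}
    if not candidates:
        raise RuntimeError("no healthy CDN available")
    hosts, weights = zip(*candidates.items())
    return random.choices(hosts, weights=weights)[0]

def asset_url(path: str) -> str:
    return f"https://{pick_cdn()}{path}"

print(asset_url("/static/app.js"))
```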
Multi-CDN architectures significantly increase operational complexity. You must maintain configurations at multiple providers, ensure cache invalidation propagates to all CDNs, monitor performance across providers, and handle billing from multiple vendors. For many organizations, the complexity cost outweighs the benefits. Evaluate honestly whether your scale and reliability requirements justify multi-CDN before implementing.
We've completed an exhaustive exploration of CDN architecture—from the fundamental physics problem that CDNs solve to the sophisticated components and patterns that enable global content delivery at scale.
Next Steps:
With a solid understanding of CDN architecture, we'll next explore Edge Locations—diving deeper into how CDN providers strategically position their Points of Presence, the infrastructure within each PoP, and how to evaluate CDN coverage for your geographic requirements.
You now possess a comprehensive understanding of CDN architecture—the foundation for all subsequent CDN caching topics. This knowledge enables you to make informed decisions about CDN selection, configuration, and optimization for globally distributed applications.