When developers think about network latency, they often imagine a simple pipe connecting client to server—the longer the pipe, the higher the latency. Reality is far more complex.
The internet is a mesh of interconnected networks, each with its own routing policies, congestion patterns, and failure modes. A packet from Sydney to Virginia might traverse 20+ networks, each making independent routing decisions. The path taken is often not the fastest—it's determined by business agreements, cost optimization, and routing policies that prioritize everything except latency.
CDNs fundamentally change this equation through intelligent route optimization: actively measuring network conditions and dynamically selecting paths that minimize latency and maximize reliability.
This page covers how CDNs measure network conditions in real-time, select optimal paths through anycast and intelligent routing, handle global traffic management, and adapt to network failures and congestion—all critical for dynamic content acceleration.
To understand why CDN route optimization matters, we must first understand how the internet routes traffic and why default routing often produces suboptimal paths.
```
THE INTERNET HIERARCHY:

┌─────────────────────────────────────────────────────────┐
│ Tier 1 Networks (Global Backbone)                       │
│ AT&T, Level3/Lumen, NTT, Telia, Cogent                  │
│ • Have global reach                                     │
│ • Peer freely with each other (settlement-free)         │
│ • Carry transit for lower-tier networks                 │
└──────────────────┬─────────────────────┬────────────────┘
                   │                     │
                   ▼                     ▼
┌─────────────────────────────────────────────────────────┐
│ Tier 2 Networks (Regional/National)                     │
│ Regional ISPs, national carriers                        │
│ • Pay Tier 1 for transit to global internet             │
│ • Peer with each other regionally                       │
│ • Carry transit for Tier 3                              │
└──────────────────┬─────────────────────┬────────────────┘
                   │                     │
                   ▼                     ▼
┌─────────────────────────────────────────────────────────┐
│ Tier 3 Networks (Edge Access)                           │
│ Local ISPs, enterprise networks, mobile carriers        │
│ • Pay Tier 2 or directly Tier 1 for upstream transit    │
│ • Connect end users and enterprise customers            │
│ • Often called "eyeball networks"                       │
└─────────────────────────────────────────────────────────┘

ROUTING PROTOCOL: BGP (Border Gateway Protocol)
• Each network (AS - Autonomous System) announces routes
• Routing decisions based on: policy > AS path length > metrics
• Optimizes for: cost, business relationships, traffic engineering
• Does NOT optimize for: latency, packet loss, jitter
```
```
User in London accessing server in Amsterdam (~400km direct)

POLICY-OPTIMAL PATH (what BGP might choose):
London → London exchange → Tier 1 in US → US peering point
       → European Tier 1 → Amsterdam
Total distance: ~14,000km, Latency: 120ms

LATENCY-OPTIMAL PATH (what we want):
London → London Internet Exchange → Amsterdam peering → Amsterdam
Total distance: ~400km, Latency: 8ms

The BGP path has 15× the latency because:
1. User's ISP has cheaper transit to US network
2. US network peers with destination's upstream in US
3. No direct peering between user's ISP and destination network
```

CDNs have direct peering at major internet exchanges worldwide, allowing them to bypass the inefficient transit hierarchy. User traffic terminates at a nearby edge, then travels through the CDN's optimized backbone rather than the public internet.
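The decision order above (policy first, then AS-path length, with latency nowhere in the tiebreak) can be sketched as a comparator. This is a deliberately simplified illustration: real BGP best-path selection has more steps (MED, eBGP vs iBGP, router ID), and the route fields and names here are hypothetical.

```typescript
// Simplified sketch of BGP best-path logic. Note that latencyMs is carried
// only to show it plays no role in the decision.
interface Route {
  via: string;
  localPref: number;   // policy knob set by the ISP (higher wins)
  asPath: string[];    // sequence of AS numbers toward the destination
  latencyMs: number;   // ignored by BGP entirely
}

function bgpBestPath(routes: Route[]): Route {
  return [...routes].sort((a, b) =>
    b.localPref - a.localPref ||        // 1. policy (local preference) first
    a.asPath.length - b.asPath.length   // 2. then shortest AS path
  )[0];
}

const routes: Route[] = [
  { via: 'LINX direct peering', localPref: 80,  asPath: ['AS2', 'AS3'], latencyMs: 8 },
  { via: 'US transit provider', localPref: 100, asPath: ['AS7', 'AS8', 'AS9'], latencyMs: 120 },
];

// Cheaper transit is given higher local-pref, so the 120ms path wins
console.log(bgpBestPath(routes).via);  // → 'US transit provider'
```

This is exactly the London-to-Amsterdam failure mode: an 8ms path loses to a 120ms path because policy outranks everything else.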
Anycast is the foundational technology enabling CDN route optimization. Unlike unicast (one address = one destination), anycast announces the same IP address from multiple locations. The network naturally routes users to the "nearest" instance of that address.
```
UNICAST (Traditional):
IP: 203.0.113.50 = Only server in Virginia

User in Sydney    → [internet routing] → Virginia (only option)
User in London    → [internet routing] → Virginia (only option)
User in São Paulo → [internet routing] → Virginia (only option)

All users, regardless of location, route to Virginia.

ANYCAST (CDN):
IP: 203.0.113.50 = Announced from 200+ locations globally

User in Sydney    → [internet routing] → Sydney PoP (closest)
User in London    → [internet routing] → London PoP (closest)
User in São Paulo → [internet routing] → São Paulo PoP (closest)

Same IP address, different physical destination based on routing.

HOW IT WORKS:
1. Each CDN PoP announces 203.0.113.50 via BGP
2. ISP networks receive multiple paths to 203.0.113.50
3. BGP selects path with shortest AS path (usually nearest PoP)
4. User traffic naturally flows to geographically proximate edge
5. No DNS involvement—happens at IP routing layer
```

| Aspect | Advantage | Consideration |
|---|---|---|
| Automatic proximity | Users routed to nearest PoP without DNS complexity | "Nearest" is AS path length, not always latency-optimal |
| DDoS resilience | Attack traffic distributed across all PoPs | Requires consistent capacity at all anycast locations |
| Failover | If PoP fails, traffic re-routes automatically | BGP convergence takes 30-90 seconds typically |
| No DNS propagation | Changes take effect at BGP speed, not DNS TTL | Less granular control than DNS-based routing |
| Session stability | Stateless protocols work without issue | Long-lived TCP sessions can break if routing shifts mid-connection |
Anycast for dynamic content:
Anycast works exceptionally well for the user-to-edge hop. Users connect to their nearest edge server via anycast. However, for dynamic content, that edge server must then forward to the origin. This edge-to-origin hop uses unicast with CDN-controlled routing, allowing for more sophisticated path selection.
Major CDNs combine anycast with intelligent DNS. DNS resolves to an anycast IP, ensuring users reach a nearby edge. But DNS can also encode routing hints, allowing the edge to make informed decisions about which backend path to use.
Intelligent routing requires accurate, real-time understanding of network conditions. CDNs maintain continuous measurement systems that probe network paths and collect performance metrics from actual traffic.
```typescript
interface PathMetrics {
  latency: {
    current: number;  // Most recent measurement (ms)
    avg1m: number;    // 1-minute moving average
    avg5m: number;    // 5-minute moving average
    p95: number;      // 95th percentile latency
    jitter: number;   // Standard deviation of latency
  };
  loss: {
    current: number;  // Recent packet loss rate (0-1)
    avg5m: number;    // 5-minute average loss rate
  };
  throughput: {
    measured: number;   // Measured bandwidth (Mbps)
    estimated: number;  // Estimated capacity
  };
  health: 'healthy' | 'degraded' | 'down';
  lastProbe: number;  // Timestamp of last measurement
}

class PathMeasurementService {
  private paths: Map<string, PathMetrics> = new Map();

  async measurePath(source: EdgeServer, dest: string): Promise<PathMetrics> {
    // Active probe: TCP ping to destination
    const probe = await this.tcpPing(source, dest, {
      count: 10,
      interval: 10,  // ms between probes
      timeout: 1000
    });

    // Calculate metrics from probe results
    const latencies = probe.results.filter(r => r.success).map(r => r.rtt);
    const losses = probe.results.filter(r => !r.success).length / probe.results.length;

    const metrics: PathMetrics = {
      latency: {
        current: latencies[latencies.length - 1],
        avg1m: this.calculateAvg(latencies),
        avg5m: await this.getHistoricalAvg(source, dest, 5 * 60 * 1000),
        p95: this.percentile(latencies, 95),
        jitter: this.standardDeviation(latencies),
      },
      loss: {
        current: losses,
        avg5m: await this.getHistoricalLoss(source, dest, 5 * 60 * 1000),
      },
      throughput: await this.estimateThroughput(source, dest, latencies, losses),
      health: this.determineHealth(latencies, losses),
      lastProbe: Date.now(),
    };

    // Update global path database
    this.paths.set(`${source.id}→${dest}`, metrics);
    return metrics;
  }

  getBestPath(source: EdgeServer, dests: string[]): string {
    // Score each path by combined latency/loss metric
    const scores = dests.map(dest => ({
      dest,
      score: this.calculatePathScore(this.paths.get(`${source.id}→${dest}`))
    }));

    // Return destination with best (lowest) score
    return scores.sort((a, b) => a.score - b.score)[0].dest;
  }

  private calculatePathScore(metrics: PathMetrics | undefined): number {
    if (!metrics || metrics.health === 'down') return Infinity;

    // Weighted combination: latency + penalty for loss/jitter
    return metrics.latency.avg1m +
      (metrics.loss.current * 1000) +  // 1% loss = 10ms penalty
      (metrics.latency.jitter * 2);    // Jitter penalty
  }
}
```

Active probing consumes bandwidth and origin resources. CDNs carefully balance measurement fidelity against overhead—probing more frequently for critical paths, less frequently for rarely-used routes. Passive measurement from production traffic often provides superior data without additional overhead.
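That fidelity-versus-overhead balance could be expressed as a probe scheduler whose interval tracks how much traffic a path actually carries. The bounds here (1-second floor, 5-minute ceiling) and the square-root scaling are illustrative assumptions, not any particular CDN's policy.

```typescript
// Hypothetical adaptive probe scheduler: busy paths get probed often,
// idle paths rarely. requestsPerMin comes from passive traffic stats.
function probeIntervalMs(requestsPerMin: number, baseMs = 60_000): number {
  if (requestsPerMin <= 0) return 5 * 60_000;  // idle path: probe every 5 min

  // More traffic → shorter interval; clamp to [1s, baseMs]
  const interval = baseMs / Math.sqrt(requestsPerMin);
  return Math.max(1_000, Math.min(baseMs, interval));
}

console.log(probeIntervalMs(0));     // → 300000 (idle: every 5 minutes)
console.log(probeIntervalMs(1));     // → 60000  (quiet: every minute)
console.log(probeIntervalMs(3600));  // → 1000   (hot path: hits the 1s floor)
```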
When customers have multiple origin servers across regions, CDN edges can dynamically select the optimal origin based on real-time conditions. This goes beyond simple geographic proximity to consider actual network performance.
```
SCENARIO: Customer has origins in US-East, US-West, and EU-West
Edge server in Singapore receives user request

STATIC SELECTION (naive approach):
  Geographic lookup: Singapore is closer to... (check coordinates)
  Result: Route to US-West (7,000km vs 17,000km to EU)

DYNAMIC SELECTION (CDN approach):
  Current measurements from Singapore edge:
    → US-East: 185ms latency, 0.1% loss, healthy
    → US-West: 160ms latency, 2.5% loss, degraded (undersea cable issue)
    → EU-West: 145ms latency, 0.05% loss, healthy (via Middle East route)

  Path score calculation:
    US-East: 185 + (0.001 × 1000) + jitter = 190
    US-West: 160 + (0.025 × 1000) + jitter = 210 (penalty for loss)
    EU-West: 145 + (0.0005 × 1000) + jitter = 148

  RESULT: Route to EU-West (lowest score)

Even though EU is geographically farther, current network conditions
make it the fastest, most reliable path for THIS request.
```

Factors in origin selection:
Dynamic selection often uses weighted distribution rather than winner-take-all. If US-East and EU-West have similar scores, traffic might split 60/40 rather than 100/0. This prevents oscillation and provides resilience if the 'best' path suddenly degrades.
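A weighted split like that 60/40 example can be derived directly from path scores. The sketch below (helper names are assumptions) turns inverse scores into traffic weights and picks an origin per request, so similar scores yield similar shares rather than winner-take-all.

```typescript
// Convert path scores (lower = better) into normalized traffic weights.
function toWeights(scores: Record<string, number>): Record<string, number> {
  const inv = Object.entries(scores).map(([k, s]) => [k, 1 / s] as const);
  const total = inv.reduce((acc, [, w]) => acc + w, 0);
  return Object.fromEntries(inv.map(([k, w]) => [k, w / total]));
}

// Weighted random pick: each request lands on an origin in proportion
// to its weight, preventing oscillation between near-equal paths.
function pickOrigin(weights: Record<string, number>, rand = Math.random()): string {
  let acc = 0;
  for (const [origin, w] of Object.entries(weights)) {
    acc += w;
    if (rand < acc) return origin;
  }
  return Object.keys(weights).pop()!;
}

// Scores from the Singapore scenario: close enough that both get traffic
const w = toWeights({ 'us-east': 190, 'eu-west': 148 });
console.log(w);  // roughly { 'us-east': 0.44, 'eu-west': 0.56 }
```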
Advanced CDN routing goes beyond selecting a single best path. Multi-path routing simultaneously uses multiple routes, distributing traffic based on real-time conditions and aggregating bandwidth across paths.
```
TRADITIONAL ROUTING: Single Path
Edge → Path A → Origin
(100% of traffic on one path)

MULTI-PATH ROUTING: Distributed Traffic
Edge → Path A (40%) → Origin
     → Path B (35%) → Origin
     → Path C (25%) → Origin

BENEFITS:
1. Aggregate bandwidth: 3 paths of 100Mbps = ~250-280Mbps effective
2. Failure resilience: Path A fails? Instantly shift to B+C
3. Latency hedging: Some requests route through faster path
4. Congestion avoidance: Spread load to prevent any single path congestion

IMPLEMENTATION:
• Per-request path selection based on current metrics
• Sticky sessions: Keep related requests on same path when needed
• Automatic rebalancing as conditions change
• Sub-second failover when paths degrade
```

Traffic engineering at CDN scale:
| Approach | Latency | Throughput | Complexity |
|---|---|---|---|
| Single best path | Optimal for path | Limited by path bandwidth | Simple |
| Static multi-path | Average of paths | Aggregated bandwidth | Moderate |
| Dynamic multi-path | Near-optimal + hedging | Aggregated + optimized | Complex |
| Per-request adaptive | Best possible | Maximum utilization | Very complex |
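The latency hedging mentioned among the multi-path benefits can be sketched as a simple race: issue the request on the best path, and start a second copy on the next-best path only if the first hasn't answered within a hedge delay. In this sketch `sleep` stands in for real network calls, and the path labels are hypothetical.

```typescript
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

// Race the primary path against a delayed backup: the backup only
// effectively joins the race if the primary is still pending.
async function hedged<T>(
  primary: () => Promise<T>,
  backup: () => Promise<T>,
  hedgeAfterMs: number
): Promise<T> {
  return Promise.race([
    primary(),
    sleep(hedgeAfterMs).then(() => backup()),
  ]);
}

// Demo: primary stalls for 300ms; backup fires at 100ms and answers 50ms later
hedged(
  () => sleep(300).then(() => 'via path A'),
  () => sleep(50).then(() => 'via path B'),
  100
).then(winner => console.log(winner));  // → 'via path B' (arrives at ~150ms)
```

Hedging like this is only safe for idempotent requests, since the origin may ultimately receive both copies.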
Multi-path routing can cause request reordering. If two requests from the same session take different paths with different latencies, they may arrive at the origin out of order. CDNs must handle this for protocols sensitive to ordering.
Network failures are inevitable—cables are cut, routers fail, entire regions go dark. CDN route optimization includes rapid detection of failures and automatic rerouting to maintain service continuity.
```
DETECTION SPEED VS ACCURACY TRADE-OFF:

Level 1: Active Health Checks (seconds)
├── Probe interval: 1-5 seconds
├── Failure threshold: 2-3 consecutive failures
├── Detection time: 5-15 seconds
└── Action: Remove path from rotation

Level 2: Passive Traffic Analysis (sub-second)
├── Monitor: TCP retransmits, connection failures, HTTP errors
├── Threshold: Error rate > X% over Y requests
├── Detection time: 0.5-2 seconds
└── Action: Reduce traffic weight, increase other paths

Level 3: Real-time Connection Failure (immediate)
├── Trigger: Connection refused, timeout, reset
├── Detection time: 0 (per-request)
├── Action: Retry on alternate path immediately
└── Scope: Affects only the failing request

Level 4: BGP-level Detection (30-90 seconds)
├── Trigger: BGP route withdrawal or path change
├── Detection time: BGP convergence time
├── Action: Traffic naturally re-routes via anycast
└── Scope: Affects all traffic in that routing domain
```

Rapid failover implementation:
```typescript
class RequestRouter {
  async routeRequest(request: Request): Promise<Response> {
    const origins = this.getAvailableOrigins(request);
    const sortedOrigins = this.rankByScore(origins);

    for (let attempt = 0; attempt < 3; attempt++) {
      const origin = sortedOrigins[attempt % sortedOrigins.length];
      try {
        // Attempt with timeout based on expected latency
        const timeout = Math.min(
          origin.metrics.latency.p95 * 2,  // 2× of p95 latency
          5000                             // Max 5 second timeout
        );
        const response = await this.forwardToOrigin(request, origin, timeout);

        // Success: update metrics positively
        this.recordSuccess(origin, response.timing);
        return response;
      } catch (error) {
        // Record failure for this origin
        this.recordFailure(origin, error);

        if (this.isRetryable(error)) {
          // Connection/timeout errors: try next origin immediately
          console.log(`Attempt ${attempt + 1} failed, trying next origin`);
          continue;
        } else {
          // Application error (4xx/5xx): don't retry, return to user
          throw error;
        }
      }
    }

    // All attempts exhausted
    throw new AllOriginsFailedError('Request failed after 3 attempts');
  }

  private recordFailure(origin: Origin, error: Error): void {
    origin.failureCount++;
    origin.lastFailure = Date.now();

    // Consecutive failures? Reduce weight rapidly
    if (origin.failureCount >= 3) {
      origin.weight = Math.max(origin.weight * 0.5, 0.1);

      // Many failures? Mark unhealthy temporarily
      if (origin.failureCount >= 5) {
        origin.health = 'degraded';
        this.scheduleHealthCheck(origin);
      }
    }
  }
}
```

When one path fails, traffic shifts to remaining paths. This surge can overload them, causing secondary failures. CDN failover includes circuit breakers and load shedding to prevent cascade failures—sometimes it's better to reject excess traffic than to fail completely.
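The circuit breakers just mentioned reduce to a small state machine: closed while failures stay under a threshold, open (shedding load) for a cooldown after it trips, then half-open to let a trial request through. A minimal sketch with illustrative thresholds:

```typescript
// Minimal circuit breaker: protects a struggling path from pile-on load.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 10_000) {}

  allowRequest(now = Date.now()): boolean {
    if (this.failures < this.threshold) return true;  // closed: pass traffic
    if (now - this.openedAt >= this.cooldownMs) {
      this.failures = this.threshold - 1;             // half-open: admit one trial
      return true;
    }
    return false;                                     // open: shed load
  }

  recordSuccess(): void { this.failures = 0; }        // trial succeeded: close

  recordFailure(now = Date.now()): void {
    this.failures++;
    if (this.failures === this.threshold) this.openedAt = now;  // trip (or re-trip)
  }
}

const cb = new CircuitBreaker(3, 1000);
cb.recordFailure(); cb.recordFailure(); cb.recordFailure();  // trips the breaker
console.log(cb.allowRequest());  // → false (open: rejecting rather than cascading)
```

Rejected requests would be retried on an alternate path by routing logic like `routeRequest` above, which is exactly the "reject excess traffic rather than fail completely" trade-off.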
Global Traffic Management (GTM) orchestrates routing decisions across the entire CDN network. It combines DNS-based routing with real-time intelligence to direct users to optimal entry points.
```
USER REQUEST: www.example.com

STEP 1: DNS Resolution
├── User queries local DNS resolver
├── Resolver queries authoritative DNS (CDN-operated)
├── GTM evaluates:
│   ├── User's resolver location (approximate user location)
│   ├── Edge PoP availability/health
│   ├── Current edge load distribution
│   └── Real-time performance metrics
└── Returns IP of optimal edge PoP

STEP 2: Edge Selection Factors
├── Geographic proximity (primary)
├── Network connectivity (peering quality to user's network)
├── Edge capacity (current load vs capacity)
├── Health status (synthetic monitoring results)
├── Business rules (cost, contractual requirements)
└── Traffic shaping (A/B testing, canary deployments)

STEP 3: User connects to edge
├── TCP/TLS to selected edge
├── HTTP request forwarded
├── Edge applies internal routing (origin selection, path optimization)
└── Response returns through same path

STEP 4: Continuous optimization
├── Real User Monitoring (RUM) measures actual experience
├── GTM updates decisions based on aggregated RUM data
├── Anomaly detection triggers investigation
└── Feedback loop: RUM → GTM → DNS → User routing
```

| Aspect | Traditional LB | CDN GTM |
|---|---|---|
| Scope | Single datacenter | Global, 200+ PoPs |
| Primary signal | Server health | User experience (RUM) |
| Latency consideration | Minimal | Primary optimization target |
| Anycast support | No | Deeply integrated |
| Update speed | Seconds | Seconds (DNS TTL permitting) |
| Intelligence | Basic health checks | ML-based path selection |
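As a rough illustration of GTM-style edge selection, the sketch below ranks candidate PoPs by RUM-observed latency and penalizes edges near capacity. The field names, the 80% threshold, and the penalty formula are all assumptions for illustration, not any vendor's actual algorithm.

```typescript
interface EdgePoP {
  id: string;
  rumLatencyMs: number;  // median latency from real-user monitoring
  load: number;          // current utilization, 0..1
  healthy: boolean;      // synthetic monitoring verdict
}

// Score = RUM latency plus a steep penalty once load exceeds 80%;
// unhealthy PoPs are excluded outright.
function selectEdge(pops: EdgePoP[]): EdgePoP {
  const score = (p: EdgePoP) =>
    !p.healthy ? Infinity
    : p.rumLatencyMs + (p.load > 0.8 ? (p.load - 0.8) * 500 : 0);
  return [...pops].sort((a, b) => score(a) - score(b))[0];
}

const pops: EdgePoP[] = [
  { id: 'lon1', rumLatencyMs: 12, load: 0.95, healthy: true },  // closest, nearly full
  { id: 'ams1', rumLatencyMs: 18, load: 0.40, healthy: true },
  { id: 'fra1', rumLatencyMs: 25, load: 0.30, healthy: true },
];

// lon1 scores 12 + 0.15×500 = 87; ams1 scores 18, so it wins
console.log(selectEdge(pops).id);  // → 'ams1'
```

This captures the table's distinction: a traditional load balancer would look only at health, while GTM trades a few milliseconds of proximity against user-experience and capacity signals.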
Route optimization is a core CDN capability that delivers significant performance improvements for dynamic content—content that cannot benefit from caching but can benefit enormously from better network paths.
What's next:
The final page of this module explores edge computing—moving application logic to CDN edge locations, enabling not just network optimization but actual computation at the edge for the ultimate in dynamic content acceleration.
You now understand how CDNs optimize network paths beyond what default internet routing provides. This route optimization, combined with edge termination, TCP optimization, and connection reuse, explains the dramatic latency improvements CDNs achieve for dynamic content.