When a user in Tokyo accesses your application, do they connect to a server in Tokyo or one in Virginia? The answer to this question, and the sophisticated systems that determine it, is the domain of Global Server Load Balancing (GSLB), also known as global load balancing or geographic load balancing.
Global load balancing operates at a fundamentally different layer than the load balancers we've discussed so far. While traditional load balancers distribute traffic within a data center or region, global load balancers distribute traffic across data centers and regions, often spanning continents.
This page explores the architectures, protocols, and strategies that enable global services to route billions of requests daily to the optimal location, providing low latency, high availability, and intelligent traffic management across the entire planet.
By the end of this page, you will understand: how DNS-based global load balancing works, how BGP anycast provides geographic routing, the tradeoffs between different GSLB approaches, multi-region architecture patterns, failover and disaster recovery strategies, and performance optimization techniques used by global services.
Global load balancing addresses four fundamental challenges that single-region architectures cannot solve:
Challenge 1: Latency
Data travels at the speed of light, but even light is slow across continents:
| Route | Distance | Light Speed Latency | Real-World RTT |
|---|---|---|---|
| NYC → London | 5,585 km | ~19 ms | 70-90 ms |
| NYC → Tokyo | 10,850 km | ~36 ms | 150-200 ms |
| NYC → Sydney | 16,000 km | ~53 ms | 200-250 ms |
RTT = Round Trip Time. Real-world is higher due to routing, switching, and processing.
For a user in Sydney accessing a US-only service, every request adds 200+ ms of latency. If a page requires 10 sequential API calls, that's 2+ seconds of added delay, often the difference between engagement and abandonment.
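The table's light-speed column can be reproduced in a couple of lines. The ~1.47x fiber slowdown factor is an assumption here, not from the table:

```python
# Back-of-envelope check of the latency table above: theoretical one-way
# propagation delay. Fiber is roughly 1.5x slower than vacuum.
C_VACUUM_KM_S = 299_792  # speed of light in vacuum, km/s

def one_way_latency_ms(distance_km, slowdown=1.0):
    """One-way propagation delay in milliseconds."""
    return distance_km / (C_VACUUM_KM_S / slowdown) * 1000

print(round(one_way_latency_ms(5_585)))        # NYC -> London: ~19 ms
print(round(one_way_latency_ms(10_850)))       # NYC -> Tokyo:  ~36 ms
print(round(one_way_latency_ms(5_585, 1.47)))  # in fiber: ~27 ms one way
```

The gap between these theoretical numbers and the real-world RTT column is routing, switching, and processing overhead.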
Challenge 2: Availability
No single data center provides 100% availability. Hardware fails. Networks have outages. Natural disasters occur.
Single Region Availability:
99.9% uptime = 8.76 hours/year downtime
Multi-Region with Failover:
If regions fail independently:
99.99% uptime = 52.6 minutes/year downtime
(Achieved when either region can serve all traffic)
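The availability arithmetic above can be made concrete (using a 365-day year):

```python
# Downtime per year for a given availability fraction.
HOURS_PER_YEAR = 8_760  # 365 days

def downtime_per_year(availability):
    """Hours of downtime per year."""
    return (1 - availability) * HOURS_PER_YEAR

print(round(downtime_per_year(0.999), 2))        # 8.76 hours (99.9%)
print(round(downtime_per_year(0.9999) * 60, 1))  # 52.6 minutes (99.99%)

# Two regions failing independently, either able to carry all traffic:
combined = 1 - (1 - 0.999) ** 2
print(round(combined, 6))  # 0.999999 in theory; shared dependencies
                           # (DNS, deploy pipelines, bad config pushes)
                           # make ~99.99% a more realistic target
```

The theoretical combined number is better than 99.99%; the page's figure is conservative because real regions rarely fail fully independently.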
Challenge 3: Regulatory Compliance
Many regulations require data to remain in specific geographic regions:
| Regulation | Requirement |
|---|---|
| GDPR (EU) | EU citizen data processed in EU |
| CCPA (California) | California consumer data protections |
| PDPA (Singapore) | Personal data processed in Singapore |
| Data Localization Laws | Various countries require in-country processing |
Global load balancing enables routing users to compliant regions automatically.
Challenge 4: Cost Optimization
Data egress costs vary by region. Traffic between regions is expensive:
AWS Data Transfer Costs (approximate):
Within AZ: Free
Between AZs: $0.01/GB
Between Regions: $0.02-0.09/GB
To Internet: $0.05-0.09/GB
Keeping traffic in-region reduces costs significantly at scale.
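A rough monthly egress bill can be sketched from the approximate rates above. The traffic volume and per-path rates here are illustrative, not a quote:

```python
# Rough egress-cost comparison using the approximate AWS rates above.
RATES_USD_PER_GB = {
    "within-az": 0.00,
    "between-az": 0.01,
    "between-region": 0.02,  # low end of the $0.02-0.09 range
    "to-internet": 0.09,
}

def monthly_cost(gb, path):
    """Monthly egress cost in USD for a given traffic path."""
    return gb * RATES_USD_PER_GB[path]

monthly_gb = 100_000  # 100 TB/month, illustrative
for path in RATES_USD_PER_GB:
    print(f"{path:>15}: ${monthly_cost(monthly_gb, path):,.0f}")
```

At 100 TB/month, the same traffic costs $0 in-AZ, $2,000 cross-region (at the low end), and $9,000 to the internet, which is why keeping traffic in-region matters at scale.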
Research shows that every 100ms of latency reduces conversions by ~1%. For e-commerce at scale, this translates to millions in lost revenue. Global load balancing isn't just a technical optimization—it's a business imperative.
The most widely used approach to global load balancing leverages the Domain Name System (DNS). Since every connection begins with a DNS lookup, DNS provides a natural interception point for geographic routing.
How DNS-Based GSLB Works:
1. User wants to access app.example.com
2. Browser queries DNS resolver for app.example.com
3. Resolver contacts authoritative DNS (GSLB-enabled)
4. GSLB DNS determines best endpoint based on:
- Geographic location of resolver
- Health of backends in each region
- Load/capacity in each region
- Custom routing policies
5. GSLB returns IP address of optimal region
6. User connects to returned IP
User (Tokyo)
│
▼
DNS Resolver
│
▼
GSLB DNS ─────► Determines: Tokyo is closest
│
▼
Returns: 52.198.x.x (Tokyo region IP)
│
▼
User connects to Tokyo datacenter
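The decision in step 4 can be sketched as a toy lookup. All latencies, region names, and IPs below are illustrative:

```python
# Toy sketch of the GSLB decision: return the closest healthy region
# for a given resolver. All data here is invented for illustration.
RESOLVER_LATENCY_MS = {  # measured resolver -> region latency
    "tokyo-resolver": {"ap-northeast-1": 8, "us-east-1": 160, "eu-west-1": 220},
}
HEALTHY = {"ap-northeast-1": True, "us-east-1": True, "eu-west-1": False}
REGION_IP = {
    "ap-northeast-1": "52.198.0.10",
    "us-east-1": "54.90.0.10",
    "eu-west-1": "34.240.0.10",
}

def resolve(resolver):
    candidates = [(ms, region)
                  for region, ms in RESOLVER_LATENCY_MS[resolver].items()
                  if HEALTHY[region]]
    _, best = min(candidates)  # lowest latency among healthy regions
    return REGION_IP[best]

print(resolve("tokyo-resolver"))  # 52.198.0.10 (Tokyo region IP)
```

Real GSLB systems add weights, capacity, and policy on top, but the core loop is the same: filter unhealthy regions, then rank the rest.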
Key Caveat: Resolver Location ≠ User Location
DNS queries come from resolvers, not users. A user might use a public resolver (such as 8.8.8.8 or 1.1.1.1) or a corporate resolver hosted far from their actual location, so routing on the resolver's IP alone can pick the wrong region.
EDNS Client Subnet (ECS):
ECS solves the resolver location problem by including client subnet in DNS queries:
DNS Query (without ECS):
"What is app.example.com?" (from resolver 8.8.8.8)
DNS Query (with ECS):
"What is app.example.com?"
"Asking on behalf of client in 203.0.113.0/24"
The authoritative DNS can route based on actual user location, not resolver location.
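The difference ECS makes can be sketched with a toy lookup. The subnet-to-region mapping below is invented:

```python
# Toy sketch of the ECS idea: route on the client subnet when present,
# falling back to the resolver's address. The mapping is invented.
import ipaddress

SUBNET_TO_REGION = {
    "203.0.113.0/24": "ap-northeast-1",  # example client network (Tokyo)
    "8.8.8.0/24": "us-east-1",           # a resolver hosted in the US
}

def route(resolver_ip, ecs_subnet=None):
    target = (ipaddress.ip_network(ecs_subnet) if ecs_subnet
              else ipaddress.ip_network(f"{resolver_ip}/24", strict=False))
    for net, region in SUBNET_TO_REGION.items():
        if target.subnet_of(ipaddress.ip_network(net)):
            return region
    return "us-east-1"  # default region

print(route("8.8.8.8"))                    # us-east-1 (resolver location)
print(route("8.8.8.8", "203.0.113.0/24"))  # ap-northeast-1 (actual user)
```

Without ECS, the Tokyo user behind a US-hosted resolver gets routed to the US; with ECS, the authoritative server sees the client subnet and routes correctly.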
DNS-Based GSLB Configuration (AWS Route 53 Example):
{
"HealthCheck": {
"Type": "HTTPS",
"FullyQualifiedDomainName": "app.us-east-1.example.com"
}
}
{
"RecordSet": {
"Name": "app.example.com",
"Type": "A",
"SetIdentifier": "us-east-1",
"Region": "us-east-1",
"HealthCheckId": "abc123",
"AliasTarget": {
"DNSName": "alb-us-east-1.elb.amazonaws.com"
}
}
}
// Similar records for us-west-2, eu-west-1, ap-northeast-1...
Routing Policies:
| Policy | Description | Use Case |
|---|---|---|
| Geolocation | Route based on user location | Compliance, localization |
| Latency | Route to lowest-latency region | Performance |
| Weighted | Distribute by percentage | Gradual migration |
| Failover | Primary-secondary | Disaster recovery |
| Geoproximity | Route to nearest with bias | Balance latency/capacity |
DNS responses are cached based on TTL (Time To Live). If TTL is 300 seconds (5 minutes), failover takes up to 5 minutes as caches expire. Lower TTLs mean faster failover but more DNS queries. Common compromise: 60 seconds for critical services.
Anycast is a network addressing and routing method where the same IP address is announced from multiple locations. Routers automatically direct traffic to the nearest (in network terms) location.
How Anycast Works:
Data Centers:
Tokyo DC announces: 203.0.113.0/24
London DC announces: 203.0.113.0/24
NYC DC announces: 203.0.113.0/24
(All the same IP range)
Routing:
User in Tokyo → Internet → Route to Tokyo DC (closest)
User in Paris → Internet → Route to London DC (closest)
User in Brazil → Internet → Route to NYC DC (closest)
┌──────────────────────────────────────────┐
│ Internet (BGP) │
│ │
│ User Tokyo ─────────► Tokyo DC │
│ (shortest AS path) │
│ │
│ User Paris ─────────► London DC │
│ (shortest AS path) │
│ │
│ User Brazil ────────► NYC DC │
│ (shortest AS path) │
└──────────────────────────────────────────┘
Anycast vs. DNS-Based:
| Aspect | DNS-Based | Anycast |
|---|---|---|
| Routing granularity | Per DNS resolver | Per packet |
| Failover speed | TTL-dependent (seconds to minutes) | Seconds (BGP convergence) |
| Connection affinity | Good (same IP for connection) | Variable (routing changes mid-connection possible) |
| Caching impact | DNS caches delay changes (TTL) | None |
| Implementation | DNS configuration | BGP/network configuration |
| Best for | HTTP/HTTPS (application layer) | DNS, UDP, stateless services |
Anycast and TCP Connections:
Anycast works best with stateless protocols because routing changes can occur mid-connection:
1. User starts TCP connection to 203.0.113.1 (routed to NYC)
2. During connection, BGP route changes
3. Next packets are routed to London
4. London doesn't have connection state → Connection fails
Mitigations:
- BGP routes are typically stable for minutes to hours, so most connections complete before a route change
- Keep connections short-lived, or terminate TCP at the edge PoP and proxy to the origin
- Share or synchronize connection state between nearby PoPs (complex, rarely worth the cost)
Where Anycast is Used:
- DNS infrastructure, including root servers and public resolvers (8.8.8.8, 1.1.1.1)
- CDNs and DDoS mitigation services
- Global HTTP(S) entry points operated by large providers
Cloudflare's Architecture:
Cloudflare announces the same IPs from 300+ data centers worldwide.
User in Tokyo:
1. DNS for example.com → Returns Cloudflare anycast IP
2. TCP connection → Anycast routes to nearest Cloudflare PoP
3. Cloudflare PoP proxies to origin (if needed)
Implementing anycast requires announcing BGP routes, which typically requires having your own AS number and IP space, or using a provider (cloud, CDN) that offers anycast. It's not something you can do with standard hosting.
Global load balancing integrates with broader multi-region architecture. Several patterns exist, each with distinct tradeoffs.
Pattern 1: Active-Passive (Primary-Secondary)
Primary (US-East)
┌─────────────────┐
All traffic ─►│ Full Application │
│ Full Database │
└─────────────────┘
│
│ Replication
▼
Secondary (EU-West)
┌─────────────────┐
On failover ─►│ Full Application │
│ DB Replica │
└─────────────────┘
Characteristics:
- Low complexity and cost; the secondary sits mostly idle
- RTO measured in minutes (failover detection, DNS change, replica promotion)
- All users pay the latency to the primary region
Pattern 2: Active-Active (All Regions Serve Traffic)
US-East EU-West
┌─────────────────┐ ┌─────────────────┐
US Users ─►│ Application │ │ Application │◄─ EU Users
│ Database (Primary)│◄──────►│ Database (Replica)│
└─────────────────┘ └─────────────────┘
Characteristics:
- Every user is served from a nearby region, minimizing latency
- Failover in seconds, since all regions already carry live traffic
- High complexity and cost: cross-region replication and write-conflict handling
Pattern 3: Follow-the-Sun
Timeframe Active Region
00:00-08:00 UTC Asia-Pacific
08:00-16:00 UTC Europe
16:00-24:00 UTC Americas
GSLB routes based on time of day + user location.
Characteristics:
- Capacity concentrates where users are awake, reducing idle cost
- Medium complexity; the traffic handoff between regions must be rehearsed
- RTO varies depending on which region is active when a failure occurs
Pattern 4: Data Sovereignty (Geo-Sharded)
EU Users → EU Region (EU data only)
US Users → US Region (US data only)
APAC Users → APAC Region (APAC data only)
No cross-region data replication (compliance requirement).
Characteristics:
- Satisfies data-residency requirements by design
- No cross-region failover for user data, since data cannot leave its region
- Medium complexity; each user is pinned to their home region
Pattern 5: CDN with Origin Shield
Edge PoPs (100s worldwide)
│
▼
Origin Shield (1-2 per region)
│
▼
Origin (Single or Multi-Region)
GSLB routes to CDN edge, CDN handles global distribution.
Characteristics:
- Low complexity for the application team; the CDN handles global routing
- The origin shield collapses many edge cache misses into few origin requests
- Best for content-heavy, cache-friendly workloads
| Pattern | Complexity | Cost | RTO | Best For |
|---|---|---|---|---|
| Active-Passive | Low | Low | Minutes | Basic DR, cost-sensitive |
| Active-Active | High | High | Seconds | Low latency, high availability |
| Follow-the-Sun | Medium | Medium | Varies | Regional user bases |
| Geo-Sharded | Medium | Medium | N/A | Compliance requirements |
| CDN + Origin | Low | Varies | N/A | Content-heavy applications |
Global load balancing enables automatic failover when regions fail. However, failover involves multiple components that must work together.
Failover Detection:
GSLB systems detect failures through health checks:
1. GSLB health check to US-East fails
2. Health check fails 3 consecutive times (30 seconds)
3. US-East marked unhealthy
4. DNS queries now return EU-West only
5. DNS TTL expires on clients (60 seconds)
6. Clients reconnect to EU-West
Total failover time: ~90 seconds (30s detection + 60s TTL)
Reducing Failover Time:
| Component | Optimization | Tradeoff |
|---|---|---|
| Health check interval | Reduce 10s → 5s | More health check traffic |
| Unhealthy threshold | Reduce 3 → 2 | More false positives |
| DNS TTL | Reduce 300s → 60s | More DNS queries (cost) |
| Connection timeout | Reduce on clients | Faster retry to new region |
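The detection-plus-TTL arithmetic from the sequence above can be expressed as a quick calculator; all numbers are illustrative:

```python
# Worst-case time until clients reach the healthy region after a failure:
# (health-check interval x unhealthy threshold) + DNS TTL.
def failover_seconds(check_interval_s, unhealthy_threshold, dns_ttl_s):
    """Worst-case DNS-based failover time in seconds."""
    detection = check_interval_s * unhealthy_threshold
    return detection + dns_ttl_s

print(failover_seconds(10, 3, 60))  # 90 s, matching the sequence above
print(failover_seconds(5, 2, 30))   # 40 s with tighter (costlier) settings
```

The second call shows the effect of the optimizations in the table: halving the check interval, threshold, and TTL more than halves worst-case failover time, at the cost of more health-check and DNS traffic.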
Failover Types:
1. Automated Failover: GSLB automatically reroutes when health checks fail.
2. Manual Failover: Operators explicitly trigger failover.
3. Hybrid (Automated with Safeguards):
If (US-East unhealthy) AND (EU-West healthy) AND (not in maintenance window):
Failover automatically
Else:
Page on-call for manual decision
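The hybrid policy above can be written as a plain function; the inputs are booleans that would come from health checks and a maintenance calendar (all names here are invented):

```python
# The hybrid failover policy: automate the safe case, page a human
# for anything ambiguous.
def failover_decision(us_east_healthy, eu_west_healthy, in_maintenance):
    if not us_east_healthy and eu_west_healthy and not in_maintenance:
        return "failover automatically"
    return "page on-call for manual decision"

print(failover_decision(False, True, False))  # failover automatically
print(failover_decision(False, True, True))   # page on-call for manual decision
print(failover_decision(False, False, False)) # page on-call for manual decision
```

Note the third case: if both regions look unhealthy, automation should never "fail over" blindly; that pattern usually indicates a monitoring problem, not two simultaneous outages.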
Data Consistency During Failover:
The hardest part of failover is data:
Scenario:
User writes data to US-East
US-East fails before replication to EU-West
User is failed over to EU-West
User's recent data is missing!
Mitigation Strategies:
Synchronous Replication: Write confirmed only when replicated
Asynchronous with Replay: Accept some data lag, handle on failover
Multi-Primary Databases: Write to any region
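The synchronous-replication strategy can be sketched minimally: a write is acknowledged only after the remote region confirms it. All classes and names below are invented for illustration:

```python
# Minimal sketch of synchronous replication across two regions.
class Region:
    def __init__(self, name):
        self.name, self.store = name, {}

    def apply(self, key, value):
        self.store[key] = value
        return True  # ack to the caller

def write_sync(primary, replica, key, value):
    primary.apply(key, value)
    if not replica.apply(key, value):  # block until the replica acks
        raise RuntimeError("replication failed; reject the write")
    return "committed"

us, eu = Region("us-east-1"), Region("eu-west-1")
print(write_sync(us, eu, "user:42", "Ada"))  # committed
# If us-east-1 now fails, eu-west-1 already has the write:
print(eu.store["user:42"])  # Ada
```

The tradeoff is latency: every write pays a cross-region round trip (the 70-200 ms from the table at the top of this page), which is why many systems accept asynchronous replication plus replay instead.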
Failback After Recovery:
1. US-East recovers
2. Verify US-East is fully healthy (data sync complete)
3. Gradually shift traffic back (10% → 50% → 100%)
4. Monitor for issues
5. Return to normal state
DO NOT:
- Immediately failback (may oscillate)
- Failback without verifying data sync
- Failback during business peak
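The gradual shift in step 3 can be sketched as weighted routing moved through stages; region names and percentages follow the failback plan above, and the seed just makes the demo reproducible:

```python
# Weighted coin-flip router: shift the us-east-1 weight in stages.
import random

def pick_region(weight_us_east):
    """Route a single request by weight."""
    return "us-east-1" if random.random() < weight_us_east else "eu-west-1"

random.seed(42)
for stage in (0.10, 0.50, 1.00):
    hits = sum(pick_region(stage) == "us-east-1" for _ in range(10_000))
    print(f"weight {stage:.0%}: {hits / 100:.1f}% of requests to us-east-1")
```

In practice this weight lives in the GSLB configuration (e.g., a weighted routing policy), and each stage is held long enough to watch error rates and latency before advancing.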
The only way to know if failover works is to test it. Regularly (quarterly or monthly) perform failover drills. Chaos engineering (deliberately causing failures) reveals gaps before real outages find them. If you can't test it in production, test in staging with production-like data.
Global load balancing is not just about routing to the closest region—it's about optimizing the entire user experience.
Technique 1: Latency-Based Routing
Rather than geographic proximity, route based on actual measured latency:
AWS Route 53 Latency Routing:
Continuously measures latency from resolver locations to regions
Returns IP of region with lowest measured latency
Result: User in São Paulo might be routed to US-East (lower latency)
instead of EU-West (geographically closer but higher latency)
Technique 2: Edge/CDN Integration
Request Path with CDN:
User → CDN Edge (nearest PoP)
│
▼ cached? ──yes──► Return from cache
│
no
│
▼
Origin Shield (regional cache)
│
▼ cached? ──yes──► Return from cache
│
no
│
▼
Origin (Application server)
CDN reduces load on origin and improves latency for cacheable content.
Technique 3: Connection Optimization
TLS Session Resumption: reuse keys from a previous session to skip the full handshake on reconnect, saving a round trip.
0-RTT TLS (TLS 1.3): returning clients send application data in the first flight, eliminating handshake latency entirely (with replay-attack caveats for non-idempotent requests).
TCP Fast Open: carry data in the SYN packet to save a round trip on connection setup (limited by middlebox support).
Technique 4: Protocol Optimization
| Protocol | Benefit |
|---|---|
| HTTP/2 | Multiplexing (reduce connections), header compression |
| HTTP/3 (QUIC) | No head-of-line blocking, 0-RTT, faster handshake |
| gRPC | Efficient binary protocol, multiplexed |
| WebSocket | Persistent connection (reduce connection setup) |
Technique 5: Regional Pre-fetching
<!-- Hint browser to establish connection early -->
<link rel="preconnect" href="https://api.example.com">
<link rel="dns-prefetch" href="https://cdn.example.com">
Browser establishes connections before they're needed, hiding latency from user perception.
Use Real User Monitoring (RUM) to measure actual latency experienced by users. Server-side metrics miss network and rendering time. Tools like Google Analytics, Datadog RUM, or New Relic provide user-centric performance data.
Major cloud providers offer managed GSLB services with varying capabilities.
AWS Global Accelerator + Route 53:
Global Accelerator
(Anycast IPs)
│
┌────────────┼────────────┐
▼ ▼ ▼
Edge PoP Edge PoP Edge PoP
(Americas) (Europe) (Asia)
│ │ │
│ AWS Backbone │
└────────────┬────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
Region 1 Region 2 Region 3
(ALB/NLB) (ALB/NLB) (ALB/NLB)
Features:
- Two static anycast IPs as a fixed, global entry point
- Traffic enters the AWS backbone at the nearest edge PoP instead of traversing the public internet
- Health-check-driven failover in seconds, independent of DNS TTLs
Google Cloud Global Load Balancer:
Single anycast IP serves entire globe:
Client → Google Edge → Google Network → Backend (any region)
Backend selection based on:
- Geographic proximity
- Backend capacity
- Health status
Features:
- Single global anycast IP; no per-region DNS records to manage
- Automatic overflow to the next-closest region when a backend reaches capacity
- Integrates with Google's CDN and edge security services
| Feature | AWS (GA + R53) | GCP (Global LB) | Azure (Traffic Manager) |
|---|---|---|---|
| Routing Method | Anycast + DNS | Anycast | DNS |
| Static IP | Yes (GA) | Yes | No (DNS-based) |
| Failover Speed | ~30s | Seconds | TTL-dependent |
| Protocol Support | TCP/UDP (GA), HTTP (ALB) | HTTP(S), TCP, UDP | HTTP, TCP |
| CDN Integration | CloudFront separate | Cloud CDN integrated | Azure CDN separate |
| Health Checks | Multi-layer | Multi-layer | Multi-layer |
| Pricing Model | Per accelerator + data | Per forwarding rule + data | Per DNS query + endpoint |
CDN Providers:
CDN providers offer sophisticated GSLB as a core feature:
| Provider | Approach | Edge PoPs |
|---|---|---|
| Cloudflare | Anycast | 300+ |
| Akamai | DNS + Anycast | 4000+ |
| Fastly | Anycast | 70+ |
| AWS CloudFront | DNS | 450+ |
Configuration Example: Cloudflare Load Balancer
{
"name": "example-lb",
"default_pools": ["pool-us-east", "pool-eu-west"],
"fallback_pool": "pool-us-east",
"region_pools": {
"WNAM": ["pool-us-west"],
"ENAM": ["pool-us-east"],
"WEU": ["pool-eu-west"],
"EEU": ["pool-eu-west"],
"OC": ["pool-ap-syd"]
},
"steering_policy": "geo"
}
Cloud-native GSLB solutions work best with same-cloud resources. For multi-cloud or hybrid, consider CDN-based GSLB (Cloudflare, Akamai) or self-managed DNS (NS1, Dyn) that can route to any backend regardless of cloud provider.
We've comprehensively explored global load balancing, from DNS-based routing to BGP anycast, and from multi-region architecture patterns to cloud provider solutions.
Module Complete:
Congratulations! You've completed the Load Balancing module. You now understand: how DNS-based global load balancing and BGP anycast route users to the best region, the tradeoffs between the major multi-region architecture patterns, failover and disaster recovery strategies, and the performance techniques and managed services that tie it all together.
This knowledge equips you to design, implement, and operate load balancing at any scale, from a simple web application to a globally distributed platform serving millions of users. Apply these concepts to build resilient, performant, and scalable network architectures.