Before a user's request touches your application code, before it queries your database or hits your cache, a critical decision has already been made: which region will handle this request? This decision—made in milliseconds by DNS servers, load balancers, and routing policies—determines the user's latency experience, affects your system's load distribution, and enables regional failover.
Traffic routing is the front door of multi-region architecture. A user in São Paulo types your URL; within 20 milliseconds, routing infrastructure has determined whether their request goes to US-East, EU-West, or a South American edge location. This invisible choreography happens billions of times daily, yet most users never know it exists.
This page explores the technologies and strategies that make intelligent traffic routing possible—from DNS fundamentals to sophisticated health-aware load balancers—giving you the tools to direct traffic with precision and resilience.
By the end of this page, you will be able to explain DNS-based geographic routing, implement global load balancing strategies, design health check systems that enable automatic failover, and choose appropriate routing policies for different multi-region configurations.
The Domain Name System (DNS) is the internet's phone book, translating human-readable domains into IP addresses. For multi-region systems, DNS becomes a routing layer, directing users to different IP addresses based on geography, health, or other factors.
How DNS Routing Works
When a user requests app.example.com:
1. The browser checks its local DNS cache (and the operating system's).
2. If no cached answer exists, the query goes to a recursive resolver, typically run by the ISP or a public DNS provider.
3. The resolver queries the authoritative name servers for example.com, which return an IP address for app.example.com.
4. The browser connects to that IP address.

With geographic DNS routing, step 3 becomes intelligent: the authoritative server considers the resolver's location (or uses EDNS Client Subnet for the user's actual location) and returns the IP address of the nearest/best region.
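You can observe geographic answers directly by attaching an EDNS Client Subnet option to a query and comparing the records returned for different client subnets. A minimal sketch using the dnspython library; the resolver, domain, and subnets are placeholders, and the resolver you query must forward ECS for the difference to show up:

```python
# Query a resolver with different EDNS Client Subnet hints and compare answers.
# Requires: pip install dnspython. The domain and subnets below are placeholders.
import dns.edns
import dns.message
import dns.query
import dns.rdatatype

RESOLVER = "8.8.8.8"         # a resolver that forwards EDNS Client Subnet
DOMAIN = "app.example.com"   # hypothetical geo-routed domain

def resolve_as_if_from(client_subnet: str, prefix_len: int = 24) -> list[str]:
    """Return the A records as seen by a client in the given subnet."""
    ecs = dns.edns.ECSOption(client_subnet, prefix_len)
    query = dns.message.make_query(DOMAIN, "A", use_edns=0, options=[ecs])
    response = dns.query.udp(query, RESOLVER, timeout=5.0)
    return [
        item.address
        for rrset in response.answer
        if rrset.rdtype == dns.rdatatype.A
        for item in rrset
    ]

if __name__ == "__main__":
    # RFC 5737 documentation subnets standing in for "a user in region X"
    for subnet in ("203.0.113.0", "198.51.100.0"):
        print(subnet, "->", resolve_as_if_from(subnet))
```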
DNS Record Types for Routing
A Record: Maps domain to IPv4 address
app.example.com. 60 IN A 54.192.1.1
AAAA Record: Maps domain to IPv6 address
app.example.com. 60 IN AAAA 2600:9000:5306:6f00::1
CNAME Record: Aliases one domain to another (used for cloud load balancer integration)
app.example.com. 60 IN CNAME d123abc.cloudfront.net.
Alias Record (AWS Route 53): AWS-specific, points to AWS resources without CNAME limitations
TTL (Time-To-Live) Considerations
TTL controls how long DNS responses are cached by resolvers and clients. For multi-region systems, TTL is a tradeoff: low TTLs let routing changes and failover take effect quickly but increase query load on your DNS infrastructure; high TTLs reduce load but delay failover until cached answers expire.
Best Practice: Use 60-300 second TTL for production services. This balances failover speed with DNS infrastructure load. For critical services with sub-minute failover requirements, consider supplementing DNS with IP-layer failover.
Limitations of DNS Routing

DNS routing is constrained by caching you don't control (resolvers and operating systems may hold answers even past the TTL), by decisions based on the resolver's location rather than the user's, and by failover that can only be as fast as cached answers expire.
Before planned failovers or infrastructure changes, lower your TTL well in advance: at least 24 hours before the change, set the TTL to its target value. This ensures cached answers expire before the change, enabling a faster cutover. Restore the higher TTL once stability is confirmed.
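One way to operationalize this is a small script that drops the record's TTL ahead of the window and restores it after the change has proven stable. A sketch using boto3's Route 53 API; the hosted zone ID, record name, and addresses are placeholders:

```python
# Lower a record's TTL ahead of a planned failover, then restore it later.
# Zone ID, record name, and addresses below are placeholders.
import boto3

route53 = boto3.client("route53")
HOSTED_ZONE_ID = "Z123456789"
RECORD_NAME = "app.example.com"

def set_ttl(ttl_seconds: int, addresses: list[str]) -> None:
    """UPSERT the A record with a new TTL, keeping the addresses you pass in."""
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Comment": f"Set TTL to {ttl_seconds}s ahead of planned change",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "A",
                    "TTL": ttl_seconds,
                    "ResourceRecords": [{"Value": ip} for ip in addresses],
                },
            }],
        },
    )

# More than 24h before the change: shrink the TTL so caches expire quickly at cutover.
set_ttl(60, ["54.192.1.1"])
# After the change has been verified stable: restore a longer TTL.
# set_ttl(300, ["52.215.2.2"])
```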
Modern DNS services offer sophisticated routing policies that go beyond simple domain-to-IP mapping. Understanding these policies is essential for designing multi-region traffic distribution.
Simple (Round-Robin) Routing
Multiple IP addresses returned for a single domain; clients pick one (typically first):
app.example.com. 60 IN A 54.192.1.1
app.example.com. 60 IN A 52.215.2.2
app.example.com. 60 IN A 13.114.3.3
Weighted Routing
Assign weights to endpoints; DNS responses proportionally reflect weights:
App US-East (weight 70)
App EU-West (weight 30)
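The effect of the weights is proportional over many resolutions: roughly 70% of answers should point at US-East and 30% at EU-West. A quick simulation of that draw, using the example weights above:

```python
# Simulate how weighted DNS answers distribute across endpoints over many queries.
import random
from collections import Counter

ENDPOINTS = {"us-east": 70, "eu-west": 30}  # weights from the example above

def pick_endpoint() -> str:
    """Choose one endpoint with probability proportional to its weight."""
    names = list(ENDPOINTS)
    weights = [ENDPOINTS[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

counts = Counter(pick_endpoint() for _ in range(10_000))
for region, n in counts.most_common():
    print(f"{region}: {n / 10_000:.1%}")   # expect roughly 70% / 30%
```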
Use cases: gradual region migrations, canary releases, and proportional load distribution across regions of different capacity. The Terraform configuration below shows weighted routing alongside the other Route 53 routing policies covered in this section:
```hcl
# AWS Route 53 Routing Policy Examples

# 1. GEOLOCATION ROUTING
# Route users based on geographic location
resource "aws_route53_record" "app_us" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  geolocation_routing_policy {
    continent = "NA" # North America
  }

  set_identifier = "us-east"

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "app_eu" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  geolocation_routing_policy {
    continent = "EU" # Europe
  }

  set_identifier = "eu-west"

  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "app_default" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  geolocation_routing_policy {
    country = "*" # Default for unmatched locations
  }

  set_identifier = "default"

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }
}

# 2. LATENCY-BASED ROUTING
# Route to the region with lowest latency from user's location
resource "aws_route53_record" "app_latency_us" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  latency_routing_policy {
    region = "us-east-1"
  }

  set_identifier = "us-east-latency"

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "app_latency_eu" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  latency_routing_policy {
    region = "eu-west-1"
  }

  set_identifier = "eu-west-latency"

  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }
}

# 3. FAILOVER ROUTING
# Primary/secondary configuration for active-passive
resource "aws_route53_record" "app_primary" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier = "primary"

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true # Critical: enables failover
  }

  health_check_id = aws_route53_health_check.primary.id
}

resource "aws_route53_record" "app_secondary" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  failover_routing_policy {
    type = "SECONDARY"
  }

  set_identifier = "secondary"

  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }
}

# 4. WEIGHTED ROUTING
# Distribute traffic by percentage
resource "aws_route53_record" "app_weighted_us" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"
  ttl     = 60

  weighted_routing_policy {
    weight = 80 # 80% of traffic
  }

  set_identifier  = "us-east-weighted"
  records         = ["54.192.1.1"]
  health_check_id = aws_route53_health_check.us_east.id
}

resource "aws_route53_record" "app_weighted_eu" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"
  ttl     = 60

  weighted_routing_policy {
    weight = 20 # 20% of traffic
  }

  set_identifier  = "eu-west-weighted"
  records         = ["52.215.2.2"]
  health_check_id = aws_route53_health_check.eu_west.id
}

# 5. HEALTH CHECK CONFIGURATION
resource "aws_route53_health_check" "primary" {
  fqdn              = "health.us-east.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = "3"
  request_interval  = "10"

  regions = [
    "us-east-1",
    "us-west-2",
    "eu-west-1"
  ]

  tags = {
    Name = "Primary Region Health Check"
  }
}
```

Geolocation Routing
Route based on the user's geographic location (country, continent, or US state); see the geolocation records in the Terraform example above.
Latency-Based Routing
AWS Route 53 maintains latency measurements between users' networks and AWS regions. Responses direct users to the lowest-latency region; see the latency-based records in the example above.
Failover Routing
Explicit primary/secondary configuration with health-check-driven failover; see the failover records in the example above.
Multivalue Answer Routing
Returns multiple healthy IP addresses (up to eight) per response; the client chooses which to use.
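Because the client receives several healthy addresses and picks among them itself, the resulting distribution depends on resolver and application behavior. A sketch of one reasonable client strategy, shuffling the answers and falling back to the next address on connection failure (the domain and port are placeholders):

```python
# Resolve a multivalue DNS name, shuffle the answers, and try each address in turn.
import random
import socket

DOMAIN = "app.example.com"   # placeholder multivalue-routed name
PORT = 443

def connect_with_fallback(timeout: float = 3.0) -> socket.socket:
    """Try each resolved address until one accepts a TCP connection."""
    infos = socket.getaddrinfo(DOMAIN, PORT, type=socket.SOCK_STREAM)
    random.shuffle(infos)  # spread load across the returned addresses
    last_error = None
    for family, socktype, proto, _, sockaddr in infos:
        try:
            return socket.create_connection(sockaddr[:2], timeout=timeout)
        except OSError as exc:
            last_error = exc   # dead address: move on to the next answer
    raise ConnectionError(f"all addresses for {DOMAIN} failed") from last_error
```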
| Policy | Best For | Considerations |
|---|---|---|
| Geolocation | Compliance, regulatory requirements | Configure a default record; routing for users near borders may be suboptimal |
| Latency | Performance optimization | AWS only; measures network, not application latency |
| Weighted | Migrations, canary, load balancing | Weights must be adjusted manually or via automation |
| Failover | Active-passive DR | Only two tiers; can combine with other policies |
| Multivalue | Simple load distribution with health | Client behavior varies; not true load balancing |
DNS-based routing has fundamental limitations: caching, coarse-grained decisions, and slow failover. Global Load Balancers (GLBs) operate at the network layer, providing more sophisticated and responsive traffic management.
How Global Load Balancers Work
Unlike DNS routing, GLBs typically use anycast IP addressing: multiple locations advertise the same IP address, and network routing (BGP) directs traffic to the nearest point of presence. From there, the GLB can make intelligent, per-connection decisions: choosing the closest healthy backend region, respecting capacity limits, and retrying or rerouting failed connections.
This provides a single, stable IP address for clients, failover that does not wait for DNS caches to expire, and health-aware routing at the connection level rather than at resolution time.
Major Global Load Balancer Solutions
AWS Global Accelerator
Google Cloud Load Balancing
Cloudflare
Azure Front Door
```hcl
# Google Cloud Global Load Balancer Configuration
# Single anycast IP serving traffic from multiple regions

# Backend service grouping regional instance groups
resource "google_compute_backend_service" "global_app" {
  name        = "global-app-backend"
  protocol    = "HTTP"
  port_name   = "http"
  timeout_sec = 30
  enable_cdn  = true

  # Health check for backend instances
  health_checks = [google_compute_health_check.app.id]

  # US region backend with capacity
  backend {
    group           = google_compute_region_instance_group_manager.us.instance_group
    balancing_mode  = "UTILIZATION"
    capacity_scaler = 1.0
    max_utilization = 0.8
  }

  # EU region backend with capacity
  backend {
    group           = google_compute_region_instance_group_manager.eu.instance_group
    balancing_mode  = "UTILIZATION"
    capacity_scaler = 1.0
    max_utilization = 0.8
  }

  # Asia region backend with capacity
  backend {
    group           = google_compute_region_instance_group_manager.asia.instance_group
    balancing_mode  = "UTILIZATION"
    capacity_scaler = 1.0
    max_utilization = 0.8
  }

  # Outlier detection for automatic ejection of unhealthy instances
  outlier_detection {
    consecutive_errors = 5
    interval {
      seconds = 10
    }
    base_ejection_time {
      seconds = 30
    }
    max_ejection_percent = 50
  }

  # Circuit breaker settings
  circuit_breakers {
    max_connections      = 1000
    max_pending_requests = 200
    max_requests         = 1000
    max_retries          = 3
  }
}

# Health check definition
resource "google_compute_health_check" "app" {
  name                = "app-health-check"
  check_interval_sec  = 5
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 3

  http_health_check {
    port         = 8080
    request_path = "/healthz"
  }
}

# URL map for routing
resource "google_compute_url_map" "global_app" {
  name            = "global-app-urlmap"
  default_service = google_compute_backend_service.global_app.id

  # Path-based routing example
  host_rule {
    hosts        = ["app.example.com"]
    path_matcher = "app-paths"
  }

  path_matcher {
    name            = "app-paths"
    default_service = google_compute_backend_service.global_app.id

    path_rule {
      paths   = ["/api/*"]
      service = google_compute_backend_service.global_app.id
    }

    path_rule {
      paths   = ["/static/*"]
      service = google_compute_backend_bucket.static.id
    }
  }
}

# HTTPS proxy with TLS termination
resource "google_compute_target_https_proxy" "global_app" {
  name             = "global-app-https-proxy"
  url_map          = google_compute_url_map.global_app.id
  ssl_certificates = [google_compute_managed_ssl_certificate.app.id]
}

# Global forwarding rule with anycast IP
resource "google_compute_global_forwarding_rule" "global_app" {
  name       = "global-app-forwarding-rule"
  target     = google_compute_target_https_proxy.global_app.id
  port_range = "443"
  ip_address = google_compute_global_address.app.address
}

# Reserve a global anycast IP
resource "google_compute_global_address" "app" {
  name = "global-app-ip"
}

# Managed SSL certificate
resource "google_compute_managed_ssl_certificate" "app" {
  name = "app-cert"
  managed {
    domains = ["app.example.com"]
  }
}
```

GLB vs DNS Routing: When to Use Each
Prefer Global Load Balancer when: you need failover measured in seconds rather than DNS TTL expirations, stable anycast IP addresses for clients or allowlists, connection-level routing and retry decisions, or acceleration of non-HTTP (TCP/UDP) traffic.
Prefer DNS Routing when: your endpoints span multiple clouds or on-premises environments, coarse geographic or weighted steering at resolution time is sufficient, or you want to avoid the cost and operational surface of an additional global network layer.
Combined Approach (Common in Practice)
Many production systems combine DNS and GLB: DNS (latency-based or geolocation routing) delivers users to a regional or global entry point, while a GLB behind that name performs health-aware, connection-level distribution across backends. DNS acts as the coarse-grained first hop; the GLB provides fast, fine-grained control and failover.
With anycast, users always connect to the same IP address regardless of region. This simplifies DNS configuration, eliminates TTL-based failover delays, and allows instant traffic shifting. If you're using a cloud GLB, you're likely already using anycast.
Health checks are the nervous system of multi-region traffic routing. They continuously probe endpoints, detect failures, and trigger routing changes. Properly designed health checks are essential for reliable failover.
Health Check Types
TCP Health Checks: verify that the endpoint accepts connections on a port; cheap, but they say nothing about application behavior.
HTTP/HTTPS Health Checks: request a specific path and require an expected status code, confirming the application is responding.
Deep Health Checks (Application-Aware): exercise dependencies such as the database, cache, and downstream services to confirm the region can actually serve real requests.
Best Practice: Layered Health Checks
Implement multiple health endpoints:
/alive (liveness): Application process is running
/ready (readiness): Application can serve traffic
/health (deep): All dependencies are functional
```typescript
/**
 * Multi-Layer Health Check Implementation
 *
 * Provides endpoints for different health check scenarios:
 * - Liveness: Process is running
 * - Readiness: Ready to receive traffic
 * - Deep: All dependencies verified
 */

import { Router, Request, Response } from 'express';
import { Pool } from 'pg';
import Redis from 'ioredis';

interface HealthStatus {
  status: 'healthy' | 'degraded' | 'unhealthy';
  checks: Record<string, CheckResult>;
  timestamp: string;
  version: string;
  region: string;
}

interface CheckResult {
  status: 'pass' | 'fail' | 'warn';
  latencyMs?: number;
  message?: string;
}

class HealthChecker {
  private db: Pool;
  private redis: Redis;
  private appVersion: string;
  private region: string;

  constructor(db: Pool, redis: Redis) {
    this.db = db;
    this.redis = redis;
    this.appVersion = process.env.APP_VERSION || 'unknown';
    this.region = process.env.AWS_REGION || 'unknown';
  }

  /**
   * Liveness check: Is the process running?
   * Used by: Container orchestration (restart on fail)
   */
  async checkLiveness(): Promise<CheckResult> {
    // If this code executes, we're alive
    return { status: 'pass' };
  }

  /**
   * Readiness check: Can we serve traffic?
   * Used by: Load balancer (remove from rotation on fail)
   */
  async checkReadiness(): Promise<HealthStatus> {
    const checks: Record<string, CheckResult> = {};

    // Check database connection pool
    checks.database = await this.checkDatabase();

    // Check cache connection
    checks.cache = await this.checkCache();

    // Overall status based on critical dependencies
    const criticalFailed = checks.database.status === 'fail';

    return {
      status: criticalFailed ? 'unhealthy' : 'healthy',
      checks,
      timestamp: new Date().toISOString(),
      version: this.appVersion,
      region: this.region
    };
  }

  /**
   * Deep health check: Are all dependencies fully functional?
   * Used by: DNS/GLB routing (failover to another region on fail)
   */
  async checkDeep(): Promise<HealthStatus> {
    const checks: Record<string, CheckResult> = {};

    // All dependency checks
    const [dbCheck, cacheCheck, externalCheck] = await Promise.allSettled([
      this.checkDatabaseQuery(),
      this.checkCacheReadWrite(),
      this.checkExternalDependencies()
    ]);

    checks.database = dbCheck.status === 'fulfilled' ?
      dbCheck.value : { status: 'fail', message: 'Check threw exception' };
    checks.cache = cacheCheck.status === 'fulfilled' ?
      cacheCheck.value : { status: 'fail', message: 'Check threw exception' };
    checks.external = externalCheck.status === 'fulfilled' ?
      externalCheck.value : { status: 'fail', message: 'Check threw exception' };

    // Determine overall health
    const failCount = Object.values(checks)
      .filter(c => c.status === 'fail').length;
    const warnCount = Object.values(checks)
      .filter(c => c.status === 'warn').length;

    let status: 'healthy' | 'degraded' | 'unhealthy' = 'healthy';
    if (failCount > 0) status = 'unhealthy';
    else if (warnCount > 0) status = 'degraded';

    return {
      status,
      checks,
      timestamp: new Date().toISOString(),
      version: this.appVersion,
      region: this.region
    };
  }

  private async checkDatabase(): Promise<CheckResult> {
    const start = Date.now();
    try {
      // Quick connection check
      await this.db.query('SELECT 1');
      return { status: 'pass', latencyMs: Date.now() - start };
    } catch (error) {
      return {
        status: 'fail',
        message: (error as Error).message,
        latencyMs: Date.now() - start
      };
    }
  }

  private async checkDatabaseQuery(): Promise<CheckResult> {
    const start = Date.now();
    try {
      // More thorough check: verify we can query
      const result = await this.db.query(
        'SELECT COUNT(*) FROM information_schema.tables'
      );
      const latency = Date.now() - start;

      // Warn if query is slow
      if (latency > 1000) {
        return {
          status: 'warn',
          message: 'Query latency elevated',
          latencyMs: latency
        };
      }
      return { status: 'pass', latencyMs: latency };
    } catch (error) {
      return {
        status: 'fail',
        message: (error as Error).message,
        latencyMs: Date.now() - start
      };
    }
  }

  private async checkCache(): Promise<CheckResult> {
    const start = Date.now();
    try {
      await this.redis.ping();
      return { status: 'pass', latencyMs: Date.now() - start };
    } catch (error) {
      return {
        status: 'fail',
        message: (error as Error).message,
        latencyMs: Date.now() - start
      };
    }
  }

  private async checkCacheReadWrite(): Promise<CheckResult> {
    const start = Date.now();
    const testKey = `health-check:${Date.now()}`;
    const testValue = 'test';
    try {
      await this.redis.set(testKey, testValue, 'EX', 10);
      const retrieved = await this.redis.get(testKey);
      await this.redis.del(testKey);

      if (retrieved !== testValue) {
        return {
          status: 'fail',
          message: 'Read/write verification failed',
          latencyMs: Date.now() - start
        };
      }
      return { status: 'pass', latencyMs: Date.now() - start };
    } catch (error) {
      return {
        status: 'fail',
        message: (error as Error).message,
        latencyMs: Date.now() - start
      };
    }
  }

  private async checkExternalDependencies(): Promise<CheckResult> {
    // Check critical external services
    // In production, you might check payment providers, etc.
    return { status: 'pass' };
  }
}

// Express router setup
export function createHealthRouter(db: Pool, redis: Redis): Router {
  const router = Router();
  const checker = new HealthChecker(db, redis);

  // Liveness - always succeed if running
  router.get('/alive', async (req: Request, res: Response) => {
    const result = await checker.checkLiveness();
    res.status(200).json(result);
  });

  // Readiness - check critical dependencies
  router.get('/ready', async (req: Request, res: Response) => {
    const result = await checker.checkReadiness();
    const statusCode = result.status === 'healthy' ? 200 : 503;
    res.status(statusCode).json(result);
  });

  // Deep health - comprehensive dependency check
  router.get('/health', async (req: Request, res: Response) => {
    const result = await checker.checkDeep();
    let statusCode: number;
    switch (result.status) {
      case 'healthy': statusCode = 200; break;
      case 'degraded': statusCode = 200; break; // Still serving
      case 'unhealthy': statusCode = 503; break;
    }
    res.status(statusCode).json(result);
  });

  return router;
}
```

Health Check Configuration Parameters
Interval: How often to probe (typically 10-30 seconds)
Timeout: How long to wait for response (typically 5-10 seconds)
Healthy Threshold: Consecutive passes to mark healthy (typically 2-3)
Unhealthy Threshold: Consecutive failures to mark unhealthy (typically 2-5)
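These parameters combine into a worst-case failover window: roughly unhealthy_threshold × interval (plus a timeout) to declare the endpoint unhealthy, plus up to one DNS TTL for cached answers to expire; GLB-routed traffic skips the TTL term. A small calculator under that simplified model:

```python
# Estimate worst-case failover time from health check parameters and DNS TTL.
# Simplified model: detection = unhealthy_threshold * interval + timeout,
# and DNS-routed clients may additionally wait up to one TTL for caches to expire.

def failover_window(interval_s: int, timeout_s: int,
                    unhealthy_threshold: int, dns_ttl_s: int = 0) -> dict:
    detection = unhealthy_threshold * interval_s + timeout_s
    return {
        "detection_s": detection,
        "worst_case_total_s": detection + dns_ttl_s,
    }

# Example: 10s interval, 5s timeout, 3 failures to trip, 60s TTL
print(failover_window(interval_s=10, timeout_s=5,
                      unhealthy_threshold=3, dns_ttl_s=60))
# -> {'detection_s': 35, 'worst_case_total_s': 95}
```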
Multi-Location Probing
Probe each endpoint from multiple geographic locations so that one vantage point's local network trouble is not mistaken for an endpoint failure; Route 53, for example, lets you specify the probing regions, as in the health check resource shown earlier.
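A common way to act on multi-location probes is a quorum rule: only declare the endpoint unhealthy when at least a configured fraction of vantage points agree. A minimal sketch of that aggregation:

```python
# Aggregate health probe results from multiple locations with a quorum rule.

def endpoint_healthy(probe_results: dict[str, bool], quorum: float = 0.5) -> bool:
    """Healthy while at least `quorum` of probe locations report success."""
    if not probe_results:
        return False   # no data: treat as unhealthy rather than guessing
    passing = sum(1 for ok in probe_results.values() if ok)
    return passing / len(probe_results) >= quorum

# One probe location having local network trouble does not flip the endpoint:
print(endpoint_healthy({"us-east-1": True, "us-west-2": True, "eu-west-1": False}))   # True
# A real regional outage seen from most locations does:
print(endpoint_healthy({"us-east-1": False, "us-west-2": False, "eu-west-1": True}))  # False
```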
Deep health checks can cause cascading failures: if a non-critical dependency fails, the health check fails, traffic shifts, overwhelming other regions, which then also fail health checks. Consider separating critical (routing-affecting) and non-critical (monitoring-only) dependencies.
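One way to implement that separation is to tag each dependency check as critical or informational and let only critical failures change the status code that routing sees. A sketch of that idea (the dependency names are illustrative):

```python
# Only failures of dependencies marked critical affect the routing-visible status.
from dataclasses import dataclass

@dataclass
class DependencyCheck:
    name: str
    critical: bool   # True: failure should pull this region out of routing
    passed: bool

def health_response(checks: list[DependencyCheck]) -> tuple[int, dict]:
    critical_failures = [c.name for c in checks if c.critical and not c.passed]
    warnings = [c.name for c in checks if not c.critical and not c.passed]
    status_code = 503 if critical_failures else 200   # routers key off this
    body = {
        "status": "unhealthy" if critical_failures
                  else ("degraded" if warnings else "healthy"),
        "critical_failures": critical_failures,
        "warnings": warnings,   # surfaced for monitoring, ignored by routing
    }
    return status_code, body

checks = [
    DependencyCheck("database", critical=True, passed=True),
    DependencyCheck("recommendation-service", critical=False, passed=False),  # hypothetical
]
print(health_response(checks))   # (200, {... 'status': 'degraded' ...})
```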
Beyond automatic failover, traffic shifting enables controlled migration between regions for deployments, maintenance, and testing.
Gradual Traffic Migration
When deploying to a new region or recovering from failover, shift traffic in stages (for example 25% → 50% → 75% → 90%), pausing at each stage to validate error rates, latency, and capacity before proceeding.
This identifies issues before they affect all users and allows capacity validation.
Canary Routing
Route a small percentage of traffic to a canary deployment, for example with a low-weight DNS record or GLB backend; compare its error rate and latency against the baseline, then promote or roll back (a decision sketch follows below).
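The routing weight is only half of a canary setup; the other half is the decision rule that compares canary metrics against the baseline before promoting. A simplified promote/rollback check; the thresholds and counts are illustrative:

```python
# Decide whether to promote a canary by comparing its error rate to the baseline.
# The allowed_ratio and minimum request count are illustrative thresholds.

def canary_decision(baseline_errors: int, baseline_requests: int,
                    canary_errors: int, canary_requests: int,
                    allowed_ratio: float = 1.5, min_requests: int = 500) -> str:
    if canary_requests < min_requests:
        return "wait"   # not enough canary traffic yet to judge
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    canary_rate = canary_errors / max(canary_requests, 1)
    if canary_rate > baseline_rate * allowed_ratio:
        return "rollback"   # canary is meaningfully worse than baseline
    return "promote"

print(canary_decision(baseline_errors=40, baseline_requests=20_000,
                      canary_errors=3, canary_requests=1_000))   # promote
print(canary_decision(baseline_errors=40, baseline_requests=20_000,
                      canary_errors=9, canary_requests=1_000))   # rollback
```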
Session Affinity Considerations
Some applications require consistent region routing for a user session, for example when session state or recently written data lives in one region. Common options are sticky cookies at the load balancer, storing a home region on the user's profile, or deterministically hashing a stable identifier to a region, as sketched below.
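A simple stateless technique is to derive a user's home region deterministically by hashing a stable identifier, so every request in a session maps to the same region. A sketch (the region list is illustrative); note that plain modulo hashing reassigns many users if the region list changes, which is one reason sticky cookies or a stored home region are often preferred:

```python
# Map a stable session/user ID to a region deterministically so a session
# always routes to the same region. The region list below is illustrative.
import hashlib

REGIONS = ["us-east-1", "eu-west-1", "ap-northeast-1"]

def home_region(user_id: str, regions: list[str] = REGIONS) -> str:
    """Deterministically assign a region from a stable identifier."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(regions)
    return regions[index]

print(home_region("user-12345"))   # same input always yields the same region
```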
"""Traffic Shifting Controller for Multi-Region Systems Manages gradual traffic shifts between regions withautomated rollback based on health metrics."""import boto3from dataclasses import dataclassfrom datetime import datetime, timedeltafrom typing import Dict, List, Optionalimport time @dataclassclass ShiftStage: """Configuration for a traffic shift stage.""" target_weights: Dict[str, int] # region -> weight (0-100) duration_minutes: int # Time to wait before next stage rollback_threshold: float # Error rate that triggers rollback @dataclassclass ShiftPlan: """Multi-stage traffic shift plan.""" name: str stages: List[ShiftStage] current_stage: int = 0 started_at: Optional[datetime] = None class TrafficShifter: """ Manages controlled traffic shifts between regions. Integrates with Route 53 weighted routing and CloudWatch for metric-based progression and rollback. """ def __init__(self, hosted_zone_id: str, record_name: str): self.route53 = boto3.client('route53') self.cloudwatch = boto3.client('cloudwatch') self.hosted_zone_id = hosted_zone_id self.record_name = record_name self.active_plan: Optional[ShiftPlan] = None def create_gradual_shift_plan( self, from_region: str, to_region: str, stages: int = 4 ) -> ShiftPlan: """ Create a gradual traffic shift plan between regions. Example 4-stage shift (75% US to 25% EU → 25% US to 75% EU): Stage 1: 75% US, 25% EU (validate) Stage 2: 50% US, 50% EU Stage 3: 25% US, 75% EU Stage 4: 10% US, 90% EU (keep some in original) """ shift_amounts = [25, 50, 75, 90][:stages] plan_stages = [] for i, to_weight in enumerate(shift_amounts): from_weight = 100 - to_weight plan_stages.append(ShiftStage( target_weights={ from_region: from_weight, to_region: to_weight }, duration_minutes=15 if i < stages - 1 else 0, rollback_threshold=0.05 # 5% error rate )) return ShiftPlan( name=f"shift-{from_region}-to-{to_region}", stages=plan_stages ) def execute_plan(self, plan: ShiftPlan, auto_progress: bool = False): """ Execute a traffic shift plan. If auto_progress is True, automatically advances stages when metrics are healthy. Otherwise, waits for manual confirmation. """ self.active_plan = plan plan.started_at = datetime.now() for stage_num, stage in enumerate(plan.stages): plan.current_stage = stage_num print(f"Executing stage {stage_num + 1}/{len(plan.stages)}") print(f"Target weights: {stage.target_weights}") # Apply traffic weights self._update_route53_weights(stage.target_weights) # Wait for DNS propagation print("Waiting for DNS propagation (60s)...") time.sleep(60) # Monitor for stage duration if stage.duration_minutes > 0: if auto_progress: healthy = self._monitor_stage( stage.duration_minutes, stage.rollback_threshold ) if not healthy: print("Unhealthy metrics detected, initiating rollback") self._rollback_to_stage(0) # Return to initial state return False else: print(f"Stage complete. 
Waiting {stage.duration_minutes}m") print("Monitor metrics and call advance_stage() to continue") return True # Pause for manual review print("Traffic shift completed successfully") self.active_plan = None return True def _update_route53_weights(self, weights: Dict[str, int]): """Update Route 53 weighted records.""" changes = [] # Get current records response = self.route53.list_resource_record_sets( HostedZoneId=self.hosted_zone_id, StartRecordName=self.record_name, StartRecordType='A', MaxItems='10' ) for record in response['ResourceRecordSets']: if record['Name'].rstrip('.') == self.record_name: if 'SetIdentifier' in record: region = record['SetIdentifier'] if region in weights: changes.append({ 'Action': 'UPSERT', 'ResourceRecordSet': { **record, 'Weight': weights[region] } }) if changes: self.route53.change_resource_record_sets( HostedZoneId=self.hosted_zone_id, ChangeBatch={ 'Comment': f'Traffic shift: {weights}', 'Changes': changes } ) def _monitor_stage( self, duration_minutes: int, error_threshold: float ) -> bool: """ Monitor metrics during stage execution. Returns False if rollback should be triggered. """ end_time = datetime.now() + timedelta(minutes=duration_minutes) check_interval = 60 # seconds while datetime.now() < end_time: error_rate = self._get_current_error_rate() print(f"Current error rate: {error_rate:.2%}") if error_rate > error_threshold: return False # Trigger rollback time.sleep(check_interval) return True # Stage completed successfully def _get_current_error_rate(self) -> float: """Get current error rate from CloudWatch.""" end_time = datetime.now() start_time = end_time - timedelta(minutes=5) # Get 5xx error count errors_response = self.cloudwatch.get_metric_statistics( Namespace='AWS/ApplicationELB', MetricName='HTTPCode_Target_5XX_Count', StartTime=start_time, EndTime=end_time, Period=300, Statistics=['Sum'] ) # Get total request count requests_response = self.cloudwatch.get_metric_statistics( Namespace='AWS/ApplicationELB', MetricName='RequestCount', StartTime=start_time, EndTime=end_time, Period=300, Statistics=['Sum'] ) errors = sum( dp['Sum'] for dp in errors_response.get('Datapoints', []) ) requests = sum( dp['Sum'] for dp in requests_response.get('Datapoints', []) ) if requests == 0: return 0.0 return errors / requests def _rollback_to_stage(self, stage_num: int): """Rollback to a previous stage's weights.""" if not self.active_plan: return stage = self.active_plan.stages[stage_num] print(f"Rolling back to stage {stage_num}: {stage.target_weights}") self._update_route53_weights(stage.target_weights) # Example usageif __name__ == "__main__": shifter = TrafficShifter( hosted_zone_id="Z123456789", record_name="app.example.com" ) # Create and execute gradual shift from US to EU plan = shifter.create_gradual_shift_plan( from_region="us-east", to_region="eu-west", stages=4 ) shifter.execute_plan(plan, auto_progress=True)Blue-Green Region Deployments
For major changes, run parallel infrastructure: stand up the new (green) stack alongside the existing (blue) one, shift traffic to green gradually using weighted routing, and keep blue running until green has proven stable so that rollback is only a weight change away.
Maintenance Window Routing
During regional maintenance, drain traffic away from the region ahead of the window (weighted routing to zero, or failover to the secondary), perform the maintenance, verify health, and then shift traffic back gradually.
Sudden traffic shifts can overwhelm the receiving region. Always scale capacity before increasing traffic weight. Connection pools, cache warming, and JIT compilation all need time to stabilize under new load patterns.
We've explored the technologies and strategies that direct users to the right region in multi-region systems. Let's consolidate the key principles: DNS routing policies (geolocation, latency-based, weighted, failover, multivalue) provide the coarse-grained first layer of control; global load balancers with anycast addressing add connection-level routing and failover that doesn't wait for caches to expire; layered, multi-location health checks drive automatic failover without false alarms; and traffic shifts should be gradual, metric-gated, and preceded by capacity scaling in the receiving region.
Module Complete: Multi-Region Architecture
This module has taken you through the complete journey of multi-region system design.
With this knowledge, you can design multi-region architectures that provide high availability, low latency, and resilience to regional failures—the hallmarks of truly global-scale systems.
You've completed the Multi-Region Architecture module. You now understand the patterns, mechanisms, and operational practices for building systems that span geographic regions—delivering high availability and low latency to users worldwide.