At 1 billion redirects per day, our URL shortener faces an extraordinary read challenge: roughly 11,600 requests per second on average, with peaks several times higher, each of which must resolve in tens of milliseconds.
This isn't just about adding more servers—it's about designing a system where the architecture itself scales, where adding capacity is linear, and where no single component becomes a bottleneck.
In this final page, we'll bring together everything we've learned into a cohesive, production-ready architecture capable of handling the world's redirect traffic.
By the end of this page, you will understand how to horizontally scale the redirect path, when and how to shard databases, design for high availability with zero-downtime deployments, and implement comprehensive monitoring and alerting.
Before diving into tactics, let's establish first principles for scaling read-heavy workloads.
Complete read path for a billion-scale URL shortener (example request: GET https://short.url/a7Xk2B):

- Edge layer: CDN PoPs in Europe, the Americas, and Asia serve ~70% of requests from cache, so only ~30% ever reach the origin.
- Regional layer: a load balancer (ALB/NLB) performs health checks and SSL termination in front of an auto-scaling application server fleet; each server's local in-process cache absorbs ~25% of the traffic that reaches it.
- Redis cluster: three master shards plus a replica pool answer ~95% of the lookups that miss the local caches.
- Database tier: a write primary with 3-5 low-latency read replicas is the source of truth for the remaining misses.

Hit rate summary:

- CDN: 70% hit rate → only 30% reach the origin
- Local cache: 25% of the remainder → only 22.5% go to Redis
- Redis: 95% of the remainder → only ~1.1% reach the database

Net result: roughly 1% of requests hit the database. At 1B requests/day that is about 11 million database reads/day, or ~130/second on average. With three cache layers (CDN 70%, local 25%, Redis 95%), a ~12,000 RPS problem becomes a ~130 RPS problem, easily handled by a single database with read replicas.
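As a quick sanity check, here is a small TypeScript sketch (using only the hit rates above) that works the cache cascade through to the resulting database load:

```typescript
// Effective traffic per layer, multiplying the miss rates of each cache.
const dailyRequests = 1_000_000_000;
const cdnHitRate = 0.70;
const localCacheHitRate = 0.25;
const redisHitRate = 0.95;

const reachOrigin = 1 - cdnHitRate;                        // 0.30
const reachRedis = reachOrigin * (1 - localCacheHitRate);  // 0.225
const reachDatabase = reachRedis * (1 - redisHitRate);     // ~0.011

const dbReadsPerDay = dailyRequests * reachDatabase;       // ~11M reads/day
const dbReadsPerSecond = dbReadsPerDay / 86_400;           // ~130 reads/second

console.log(`~${(reachDatabase * 100).toFixed(1)}% of requests reach the database`);
console.log(`~${Math.round(dbReadsPerSecond)} database reads/second on average`);
```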
Application servers should be completely stateless, allowing them to scale horizontally with zero coordination.
```typescript
/**
 * Stateless Application Server Design
 *
 * Any request can be handled by any server.
 * All state lives in external systems (Redis, Database).
 */

class RedirectServer {
  // Configuration loaded at startup (immutable during runtime)
  private readonly config: ServerConfig;

  // Caches - ephemeral, can be lost on restart
  private readonly localCache: LocalUrlCache;

  // External connections - shared state
  private readonly redis: RedisCluster;
  private readonly db: DatabasePool;
  private readonly analytics: AnalyticsEmitter;

  constructor(config: ServerConfig) {
    this.config = config;

    // Local cache: volatile, per-instance
    this.localCache = new LocalUrlCache({
      maxSize: 1_000_000,       // ~300MB
      ttlMs: 10 * 60 * 1000,    // 10 minutes
    });

    // Redis: shared across all instances
    this.redis = new RedisCluster(config.redisNodes);

    // Database: primary + read replicas
    this.db = new DatabasePool({
      primary: config.dbPrimary,
      replicas: config.dbReplicas,
      readFromReplica: true,    // Reads go to replicas
    });

    // Analytics: fire-and-forget via Kafka
    this.analytics = new AnalyticsEmitter(config.kafkaBrokers);
  }

  /**
   * Main redirect handler - stateless.
   * Same request on any server produces same result.
   */
  async handleRedirect(shortCode: string, request: Request): Promise<Response> {
    // 1. Check local cache
    let longUrl = this.localCache.get(shortCode);
    if (longUrl) {
      this.emitAnalytics(shortCode, request, 'local');
      return this.redirect(longUrl);
    }

    // 2. Check Redis (shared cache)
    longUrl = await this.redis.get(`url:${shortCode}`);
    if (longUrl) {
      this.localCache.set(shortCode, longUrl); // Promote to local
      this.emitAnalytics(shortCode, request, 'redis');
      return this.redirect(longUrl);
    }

    // 3. Check database (source of truth)
    const result = await this.db.replica.query(
      'SELECT long_url FROM urls WHERE short_code = $1',
      [shortCode]
    );
    longUrl = result.rows[0]?.long_url ?? null;
    if (!longUrl) {
      return new Response('Not Found', { status: 404 });
    }

    // Populate caches
    this.localCache.set(shortCode, longUrl);
    this.redis.setex(`url:${shortCode}`, 3600, longUrl).catch(() => {});
    this.emitAnalytics(shortCode, request, 'database');
    return this.redirect(longUrl);
  }
}

// Key properties of stateless design:
// ✓ No session data on server
// ✓ No local writes that need persistence
// ✓ Can restart/replace any server instantly
// ✓ Scale from 2 to 200 instances seamlessly
```
```yaml
# Kubernetes Horizontal Pod Autoscaler Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: redirect-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: redirect-server

  # Scale between 5 and 100 pods
  minReplicas: 5
  maxReplicas: 100

  metrics:
    # Primary: CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # Scale up at 60% CPU

    # Secondary: Request latency (custom metric)
    - type: Pods
      pods:
        metric:
          name: http_request_duration_p99
        target:
          type: AverageValue
          averageValue: 30ms       # Scale up if P99 exceeds 30ms

    # Tertiary: Requests per second
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 1000       # ~1000 RPS per pod

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30    # Wait 30s before scaling up
      policies:
        - type: Pods
          value: 10                     # Add up to 10 pods at a time
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5min before scaling down
      policies:
        - type: Percent
          value: 10                     # Remove 10% of pods at a time
          periodSeconds: 60
```

Scale-up should be aggressive (respond to traffic spikes quickly), but scale-down should be gradual. Traffic can spike again; scaling down too fast and then up again (flapping) wastes resources and causes instability. Wait 5+ minutes of stable low traffic before reducing capacity.
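To get a feel for what those replica bounds imply, here is a rough sizing sketch. The ~1,000 RPS-per-pod figure is the HPA target above; the 3x peak-to-average ratio is an assumption for illustration:

```typescript
// Rough fleet sizing, assuming the CDN absorbs 70% of global traffic.
const totalAvgRps = 1_000_000_000 / 86_400;      // ~11,600 RPS globally
const originShare = 0.30;                        // 30% of requests reach the origin
const originAvgRps = totalAvgRps * originShare;  // ~3,500 RPS at the origin
const rpsPerPod = 1_000;                         // HPA target above
const peakMultiplier = 3;                        // assumed peak-to-average ratio

const podsAtAverage = Math.ceil(originAvgRps / rpsPerPod);                  // ~4 pods
const podsAtPeak = Math.ceil((originAvgRps * peakMultiplier) / rpsPerPod);  // ~11 pods

console.log({ podsAtAverage, podsAtPeak });  // comfortably inside the 5-100 replica range
```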
Even with aggressive caching, the database must handle millions of reads daily. Read replicas and careful sharding ensure the database layer scales.
Database read scaling with replicas:

- The application layer splits traffic by type: writes go to the primary, reads are distributed across the replica pool.
- Primary (single node): all writes, synchronous replication, HA failover.
- Replica pool (e.g., 4 replicas): load balanced via round-robin or latency-based routing; asynchronous replication lag is typically 10-100ms.

Benefits:

- Read capacity scales linearly with replicas
- Replicas can be in different AZs for DR
- Read queries don't impact write performance
- Easy to add capacity without downtime

Trade-offs:

- Replication lag (reads may be slightly stale)
- Additional infrastructure cost
- Operational complexity (promotion, failover)
```typescript
/**
 * Database Connection Pool with Read Routing
 */

class DatabasePool {
  private primary: Pool;
  private replicas: Pool[];
  private replicaIndex = 0;

  constructor(config: DbConfig) {
    // Primary: used only for writes
    this.primary = new Pool({
      host: config.primaryHost,
      max: 20,                  // Lower pool size for writes
      idleTimeoutMillis: 30000,
    });

    // Replicas: used for reads
    this.replicas = config.replicaHosts.map(host => new Pool({
      host,
      max: 50,                  // Higher pool size for reads
      idleTimeoutMillis: 30000,
    }));
  }

  /**
   * Route read queries to replicas (round-robin).
   */
  async read(query: string, params: any[]): Promise<QueryResult> {
    const replica = this.getNextReplica();
    try {
      return await replica.query(query, params);
    } catch (error) {
      // If replica fails, try another
      return await this.readWithFallback(query, params, replica);
    }
  }

  /**
   * Route write queries to primary only.
   */
  async write(query: string, params: any[]): Promise<QueryResult> {
    return await this.primary.query(query, params);
  }

  private getNextReplica(): Pool {
    // Round-robin across replicas
    this.replicaIndex = (this.replicaIndex + 1) % this.replicas.length;
    return this.replicas[this.replicaIndex];
  }

  private async readWithFallback(
    query: string,
    params: any[],
    failedReplica: Pool
  ): Promise<QueryResult> {
    // Try other replicas
    for (const replica of this.replicas) {
      if (replica !== failedReplica) {
        try {
          return await replica.query(query, params);
        } catch {
          continue;
        }
      }
    }

    // All replicas failed - fall back to primary
    console.warn('All replicas failed, falling back to primary');
    return await this.primary.query(query, params);
  }
}

// Usage in redirect handler:
const longUrl = await db.read(
  'SELECT long_url FROM urls WHERE short_code = $1',
  [shortCode]
);
```

Async replication means replicas typically lag the primary by 10-100ms. For redirects this is acceptable: a URL created 50ms ago appearing on replicas 500ms later is fine. For more critical read-after-write scenarios (e.g., a user dashboard), route those reads to the primary.
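One way to handle that read-after-write case is a small routing wrapper. This is a sketch, not part of the DatabasePool above; the one-second freshness window is an assumed upper bound on replication lag:

```typescript
// Sketch: send reads to the primary for a short window after a write,
// so dashboard-style read-after-write queries never see stale replicas.
class ReadYourWritesRouter {
  // shortCode -> timestamp of the most recent write this instance performed
  private recentWrites = new Map<string, number>();

  constructor(
    private readonly db: DatabasePool,
    private readonly freshnessWindowMs = 1_000  // assumed replication-lag bound
  ) {}

  recordWrite(shortCode: string): void {
    this.recentWrites.set(shortCode, Date.now());
  }

  async read(shortCode: string, query: string, params: any[]): Promise<QueryResult> {
    const writtenAt = this.recentWrites.get(shortCode);
    const recentlyWritten =
      writtenAt !== undefined && Date.now() - writtenAt < this.freshnessWindowMs;

    // db.write() routes to the primary pool; db.read() uses the replicas.
    return recentlyWritten
      ? this.db.write(query, params)
      : this.db.read(query, params);
  }
}
```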
At extreme scale (100+ billion URLs), even read replicas aren't enough. Sharding distributes data across multiple independent database clusters.
| Metric | Single Database Is Fine | Sharding Needed |
|---|---|---|
| Data Size | < 1 TB | > 5 TB |
| Write Rate | < 10K/sec | > 50K/sec |
| Read Rate (post-cache) | < 100K/sec | > 500K/sec |
| Query Latency | < 10ms P99 | > 50ms P99 despite optimization |
| Replication Lag | < 100ms | > 1 second (replicas can't keep up) |
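As a rough sizing example of when the data-size threshold bites, here is the kind of arithmetic involved; the 500-byte row size and 2 TB per-shard target are assumptions for illustration:

```typescript
// Back-of-the-envelope shard count for the extreme-scale case.
const totalUrls = 100_000_000_000;              // 100B URLs
const bytesPerRow = 500;                        // assumed: URL + metadata + index overhead
const totalBytes = totalUrls * bytesPerRow;     // 50 TB
const targetBytesPerShard = 2_000_000_000_000;  // assumed ~2 TB per shard

const shardCount = Math.ceil(totalBytes / targetBytesPerShard);  // 25 shards
console.log({ totalTB: totalBytes / 1e12, shardCount });
```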
For URL shorteners, sharding by short code is natural—each lookup needs only one shard:
```typescript
/**
 * Database Sharding by Short Code
 *
 * Distribute URLs across N shards based on short code hash.
 * Each short code maps to exactly one shard.
 */

class ShardedDatabase {
  private shards: DatabasePool[];
  private readonly numShards: number;

  constructor(shardConfigs: ShardConfig[]) {
    this.numShards = shardConfigs.length;
    this.shards = shardConfigs.map(config => new DatabasePool(config));
  }

  /**
   * Determine which shard holds a given short code.
   * Uses consistent hashing for stable mapping.
   */
  getShardIndex(shortCode: string): number {
    // Simple modulo hashing (use consistent hashing for production)
    const hash = this.hashCode(shortCode);
    return Math.abs(hash) % this.numShards;
  }

  /**
   * Get the shard for a short code.
   */
  getShard(shortCode: string): DatabasePool {
    const index = this.getShardIndex(shortCode);
    return this.shards[index];
  }

  /**
   * Lookup a URL (routed to correct shard).
   */
  async getLongUrl(shortCode: string): Promise<string | null> {
    const shard = this.getShard(shortCode);
    const result = await shard.read(
      'SELECT long_url FROM urls WHERE short_code = $1',
      [shortCode]
    );
    return result.rows[0]?.long_url ?? null;
  }

  /**
   * Create a URL (routed to correct shard).
   */
  async createUrl(shortCode: string, longUrl: string, userId: string): Promise<void> {
    const shard = this.getShard(shortCode);
    await shard.write(
      'INSERT INTO urls (short_code, long_url, user_id) VALUES ($1, $2, $3)',
      [shortCode, longUrl, userId]
    );
  }

  /**
   * DJB2 hash function - fast and well-distributed.
   */
  private hashCode(str: string): number {
    let hash = 5381;
    for (let i = 0; i < str.length; i++) {
      hash = ((hash << 5) + hash) ^ str.charCodeAt(i);
    }
    return hash;
  }
}

// Shard topology example (4 shards):
//
// Shard 0: short codes where hash(code) % 4 == 0
// Shard 1: short codes where hash(code) % 4 == 1
// Shard 2: short codes where hash(code) % 4 == 2
// Shard 3: short codes where hash(code) % 4 == 3
//
// Each shard is a full primary + replicas cluster
```
```typescript
/**
 * Consistent Hashing for Sharding
 *
 * When adding/removing shards, only K/N keys need to move
 * (K = total keys, N = total shards).
 *
 * Without consistent hashing, adding 1 shard redistributes
 * ~all keys (disaster for operations).
 */

import { createHash } from 'crypto';

class ConsistentHashRing {
  private ring: Map<number, string> = new Map(); // position -> shardId
  private sortedPositions: number[] = [];
  private readonly virtualNodes: number;

  constructor(virtualNodes: number = 150) {
    this.virtualNodes = virtualNodes; // More virtual nodes = better distribution
  }

  /**
   * Add a shard to the ring.
   */
  addShard(shardId: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      const position = this.hash(`${shardId}:vnode:${i}`);
      this.ring.set(position, shardId);
    }
    this.sortedPositions = [...this.ring.keys()].sort((a, b) => a - b);
  }

  /**
   * Remove a shard from the ring.
   */
  removeShard(shardId: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      const position = this.hash(`${shardId}:vnode:${i}`);
      this.ring.delete(position);
    }
    this.sortedPositions = [...this.ring.keys()].sort((a, b) => a - b);
  }

  /**
   * Get the shard for a key.
   */
  getShard(key: string): string {
    if (this.ring.size === 0) {
      throw new Error('No shards in ring');
    }

    const position = this.hash(key);

    // Find first position >= key position (binary search)
    let idx = this.binarySearch(position);

    // Wrap around to first position if past end
    if (idx >= this.sortedPositions.length) {
      idx = 0;
    }

    return this.ring.get(this.sortedPositions[idx])!;
  }

  private hash(key: string): number {
    const hash = createHash('md5').update(key).digest();
    return hash.readUInt32BE(0);
  }

  private binarySearch(position: number): number {
    let low = 0;
    let high = this.sortedPositions.length;
    while (low < high) {
      const mid = Math.floor((low + high) / 2);
      if (this.sortedPositions[mid] < position) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }
}

// Adding a shard (e.g., scaling from 4 to 5 shards):
// - Only ~20% of keys need to migrate (1/5)
// - Compare to modulo hashing: ~80% of keys would migrate!
```

Sharding optimizes single-key lookups but makes aggregate queries (count all URLs, a user's URLs across shards) expensive. Design your shard key to avoid cross-shard queries for common operations. For URL shorteners, shard by short code since each redirect needs only one shard.
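To see why cross-shard queries are expensive, here is a sketch of the scatter-gather needed to list one user's URLs when data is partitioned by short code (it assumes an index on user_id; a production system would usually maintain a separate user-to-codes mapping instead):

```typescript
// Sketch: listing a user's URLs must query EVERY shard, because the
// partition key (short code) tells us nothing about ownership.
async function getUrlsForUser(
  shards: DatabasePool[],   // all shards in the cluster
  userId: string
): Promise<{ short_code: string; long_url: string }[]> {
  // One query per shard, issued in parallel: N shards = N queries.
  const perShard = await Promise.all(
    shards.map(shard =>
      shard.read(
        'SELECT short_code, long_url FROM urls WHERE user_id = $1',
        [userId]
      )
    )
  );

  // Merge the partial results; latency is bounded by the slowest shard.
  return perShard.flatMap(result => result.rows);
}
```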
At 99.99% availability (52 minutes downtime/year), we need redundancy at every layer and automated failover.
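The 52-minute figure falls straight out of the availability target; as a quick check:

```typescript
// Error budget implied by a 99.99% availability target.
const availability = 0.9999;
const minutesPerYear = 365 * 24 * 60;            // 525,600
const minutesPerMonth = minutesPerYear / 12;     // 43,800

const downtimePerYear = minutesPerYear * (1 - availability);    // ~52.6 minutes/year
const downtimePerMonth = minutesPerMonth * (1 - availability);  // ~4.4 minutes/month

console.log({ downtimePerYear, downtimePerMonth });
```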
Multi-availability-zone architecture:

- A global load balancer (DNS-based / anycast) fronts two regions: US-EAST and EU-WEST.
- Each region spans three availability zones (A, B, C). Every AZ runs its own slice of the application fleet (e.g., App 1-3, 4-6, 7-9), a Redis node, and a database node.
- Each region places its database primary in one AZ (AZ B in US-EAST, AZ C in EU-WEST), with replicas in the remaining AZs.

Failure scenarios handled:

- Single server failure: load balancer routes to healthy servers
- Single AZ failure: other AZs in the region continue serving
- Single region failure: traffic routes to the other region
- Database primary failure: automated failover to standby

| Component | Failure Detection | Failover Mechanism | Recovery Time |
|---|---|---|---|
| Application Server | Load balancer health checks | Remove from rotation, start new | 5-30 seconds |
| Redis Node | Sentinel/Cluster monitoring | Promote replica to master | 1-5 seconds |
| Database Replica | Replication monitoring | Remove from pool, alert | Instant (other replicas continue) |
| Database Primary | Heartbeat + replication lag | Promote standby, update DNS | 10-60 seconds |
| Entire AZ | Cross-AZ health probes | Route traffic to other AZs | 10-30 seconds |
| Entire Region | Global health monitoring | DNS failover to other region | 1-5 minutes |
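Most of the detection in the table above reduces to health checks. Here is a minimal sketch of a readiness endpoint; the ping() call on the Redis client and the 200ms timeouts are assumptions, and whether a Redis outage should fail readiness (versus degrading to database reads) is a policy choice:

```typescript
// Sketch: readiness probe used by the load balancer / Kubernetes to decide
// whether this instance should stay in rotation.
async function handleHealthCheck(redis: RedisCluster, db: DatabasePool): Promise<Response> {
  const checks = await Promise.allSettled([
    withTimeout(redis.ping(), 200),             // assumed ping() on the Redis client
    withTimeout(db.read('SELECT 1', []), 200),  // cheap replica query
  ]);

  const healthy = checks.every(c => c.status === 'fulfilled');
  const body = JSON.stringify({
    redis: checks[0].status,
    database: checks[1].status,
  });

  return new Response(body, { status: healthy ? 200 : 503 });
}

// Reject a dependency check that does not answer within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) => setTimeout(() => reject(new Error('timeout')), ms)),
  ]);
}
```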
```typescript
/**
 * Circuit Breaker Pattern
 *
 * Prevent cascading failures by failing fast when a dependency is unhealthy.
 */

enum CircuitState {
  CLOSED = 'CLOSED',       // Normal operation
  OPEN = 'OPEN',           // Failing fast
  HALF_OPEN = 'HALF_OPEN'  // Testing recovery
}

// Thrown when the breaker short-circuits a call instead of invoking the dependency.
class CircuitOpenError extends Error {}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount: number = 0;
  private lastFailureTime: number = 0;
  private successCount: number = 0;

  constructor(
    private readonly failureThreshold: number = 5,
    private readonly resetTimeoutMs: number = 30000,
    private readonly halfOpenSuccessThreshold: number = 3
  ) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    // If circuit is open, check if we should try again
    if (this.state === CircuitState.OPEN) {
      if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
        this.state = CircuitState.HALF_OPEN;
        this.successCount = 0;
      } else {
        throw new CircuitOpenError('Circuit breaker is open');
      }
    }

    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess(): void {
    if (this.state === CircuitState.HALF_OPEN) {
      this.successCount++;
      if (this.successCount >= this.halfOpenSuccessThreshold) {
        this.state = CircuitState.CLOSED;
        this.failureCount = 0;
      }
    } else {
      this.failureCount = 0;
    }
  }

  private onFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    if (this.failureCount >= this.failureThreshold) {
      this.state = CircuitState.OPEN;
    }
    if (this.state === CircuitState.HALF_OPEN) {
      this.state = CircuitState.OPEN;
    }
  }
}

// Usage:
const redisCircuit = new CircuitBreaker(5, 30000, 3);

async function getFromRedis(key: string): Promise<string | null> {
  try {
    return await redisCircuit.execute(() => redis.get(key));
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      // Redis is down, skip to database
      return null;
    }
    throw error;
  }
}
```

When components fail, degrade gracefully. If Redis is down, serve from the database (slower, but it works). If analytics is down, still serve redirects (skip analytics). If one region is down, serve from another (higher latency but available). Never let non-critical failures impact critical paths.
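The "analytics down, redirects still served" rule can be enforced directly in code. A small sketch, assuming AnalyticsEmitter.emit() returns a promise:

```typescript
// Sketch: analytics is strictly fire-and-forget. A Kafka outage must never
// add latency to, or fail, the redirect path.
function emitAnalyticsSafely(
  analytics: AnalyticsEmitter,
  event: { shortCode: string; cacheLevel: string; timestamp: number }
): void {
  try {
    // Deliberately not awaited: the redirect response is already on its way.
    analytics.emit(event).catch(err => {
      // Record the drop for monitoring, but swallow the error.
      console.warn('analytics emit failed (event dropped)', err?.message);
    });
  } catch (err) {
    // Even synchronous failures (e.g. serialization) must not propagate.
    console.warn('analytics emit threw synchronously (event dropped)', err);
  }
}
```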
At billion-scale, you need comprehensive observability to detect issues before users notice and debug when they don't.
| Category | Metric | Alert Threshold |
|---|---|---|
| Latency | Redirect P50 | > 15ms (warning) |
| Latency | Redirect P99 | > 50ms (critical) |
| Throughput | Requests per second | < 80% or > 120% of baseline (±20% deviation) |
| Errors | 5xx error rate | > 0.1% (warning), > 1% (critical) |
| Errors | 4xx error rate | > 5% (investigate; normal levels vary) |
| Cache | Redis hit rate | < 90% (warning) |
| Cache | CDN hit rate | < 60% (warning) |
| Infrastructure | CPU utilization | > 70% (warning), > 85% (critical) |
| Infrastructure | Memory utilization | > 80% (warning) |
| Database | Replication lag | > 500ms (warning), > 5s (critical) |
| Database | Connection pool exhaustion | > 80% utilized (warning) |
```typescript
/**
 * Metrics Instrumentation
 *
 * Instrument key operations for monitoring and alerting.
 */

import { Counter, Histogram, Gauge } from 'prom-client';

// Request metrics
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status', 'cache_level'],
  buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
});

const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

// Cache metrics
const cacheHits = new Counter({
  name: 'cache_hits_total',
  help: 'Total cache hits',
  labelNames: ['cache_level'], // local, redis, cdn
});

const cacheMisses = new Counter({
  name: 'cache_misses_total',
  help: 'Total cache misses',
  labelNames: ['cache_level'],
});

// Database metrics
const dbQueryDuration = new Histogram({
  name: 'db_query_duration_seconds',
  help: 'Database query duration',
  labelNames: ['query_type', 'shard'],
  buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.5],
});

const dbConnectionPoolSize = new Gauge({
  name: 'db_connection_pool_size',
  help: 'Current database connection pool size',
  labelNames: ['pool', 'state'], // state: active, idle
});

// Instrumented redirect handler
async function handleRedirectWithMetrics(
  shortCode: string,
  request: Request
): Promise<Response> {
  const startTime = Date.now();
  let status = 200;
  let cacheLevel = 'database';

  try {
    // ... lookup logic with cache level tracking
    const result = await lookupUrl(shortCode);
    cacheLevel = result.cacheLevel;

    if (!result.longUrl) {
      status = 404;
      return new Response('Not Found', { status: 404 });
    }

    status = 302;
    return Response.redirect(result.longUrl, 302);
  } catch (error) {
    status = 500;
    throw error;
  } finally {
    const duration = (Date.now() - startTime) / 1000;

    httpRequestDuration.observe(
      { method: 'GET', route: '/redirect', status, cache_level: cacheLevel },
      duration
    );
    httpRequestTotal.inc({ method: 'GET', route: '/redirect', status });

    if (cacheLevel !== 'database') {
      cacheHits.inc({ cache_level: cacheLevel });
    } else {
      cacheMisses.inc({ cache_level: 'all' });
    }
  }
}
```

The standard observability stack: Prometheus for metrics, Grafana for dashboards, Jaeger/Zipkin for traces, ELK/Loki for logs, PagerDuty/Opsgenie for alerting. Alternatively, all-in-one platforms like Datadog or New Relic provide integrated solutions.
Let's consolidate everything we've designed into a complete, production-ready architecture:
Complete URL shortener architecture:

- Global users hit CDN edge nodes (Cloudflare / Fastly, ~70% hit rate); misses go through GeoDNS routing (Route 53 / NS1) to the nearest region.
- Three regions (US-EAST-1, EU-WEST-1, AP-SOUTH-1) each run an ALB in front of a Kubernetes pod fleet (5-100 pods), backed by a regional Redis cluster.
- The database tier uses an RDS primary in US-EAST-1 with cross-region replicas in EU-WEST-1 and AP-SOUTH-1.
- All regions publish click events to a Kafka cluster; Flink / Spark stream processing enriches and aggregates them into ClickHouse for analytics storage.

| Component | Technology | Scale | Purpose |
|---|---|---|---|
| CDN | Cloudflare / Fastly | 200+ PoPs globally | Edge caching, DDoS protection |
| DNS | Route 53 / NS1 | Anycast global | Geo-routing, health checks |
| Load Balancer | AWS ALB / NLB | Per-region | SSL termination, distribution |
| App Servers | Kubernetes | 5-100 pods/region | Stateless redirect handling |
| Cache | Redis Cluster | 100GB/region | Shared URL cache |
| Database | PostgreSQL / Aurora | Multi-AZ primary + replicas | Source of truth |
| Analytics Queue | Kafka | Multi-region | Event streaming |
| Analytics Processing | Flink | Auto-scaling | Enrichment, aggregation |
| Analytics Storage | ClickHouse | 100TB+ | Time-series queries |
We've completed our comprehensive design of a billion-scale URL shortener; the component table above consolidates the key scaling decisions.
Throughout this module, we've designed a complete URL shortening service capable of handling 1 billion redirects per day with sub-50ms P99 latency and 99.99% availability. Here's how the pieces came together:
| Page | Topic | Key Concepts |
|---|---|---|
| 1 | Requirements | Functional/non-functional requirements, capacity estimation, API design |
| 2 | URL Encoding | Base62, hash-based, Snowflake IDs, collision handling |
| 3 | Redirect Latency | Multi-layer caching, CDN, global distribution |
| 4 | Analytics | Event streaming, aggregation, time-series storage |
| 5 | Custom URLs | Namespace separation, validation, abuse prevention |
| 6 | Scaling Reads | Horizontal scaling, sharding, high availability |
Congratulations! You've mastered the design of a URL shortener—a classic system design problem that touches on caching, databases, distributed systems, and global infrastructure. These patterns apply far beyond URL shortening to any read-heavy, latency-sensitive service.