When a user clicks a shortened URL, they expect instantaneous redirection. Tens of milliseconds of added delay are noticeable; a few hundred milliseconds makes the service feel broken.
Consider the redirect path:

1. The user clicks the short link.
2. DNS resolves the short domain.
3. A TCP connection is established.
4. The TLS handshake completes and the request reaches our service.
5. Our service looks up the short code and returns a 301/302 response with the long URL.
6. The redirect response travels back to the client.
7. The browser follows the redirect to the destination.
Steps 2-4 and 6-7 are network overhead (100-400ms typically). Our goal is to make step 5—our processing—essentially invisible at under 10ms for cached lookups and under 50ms worst-case.
At 1 billion redirects per day (roughly 12,000 per second on average, 50,000+ per second at peak), this latency target requires sophisticated caching, global distribution, and optimized data paths.
By the end of this page, you will master multi-layer caching strategies (CDN → in-memory → distributed cache → database), understand database selection and optimization for redirect workloads, and design global distribution patterns for consistent sub-50ms latency worldwide.
Before optimizing, we must understand where time is spent in a redirect request. Let's trace a request through the system:
```
Redirect Request Latency Breakdown
==================================

[Client] → [DNS] → [CDN Edge] → [Load Balancer] → [App Server] → [Cache] → [DB]

Component Latency (typical):
────────────────────────────
DNS Lookup:              1-50ms    (cached: 0ms, cold: 50ms+)
CDN Edge:                1-5ms     (regional, very fast)
TLS Handshake:           10-50ms   (session resumption helps)
Load Balancer:           0.5-2ms   (minimal overhead)
App Server Processing:   1-10ms    (parse request, logic)
Cache Lookup (Redis):    0.5-2ms   (same region)
Database Lookup:         2-20ms    (depends on indexing)

TOTAL (cache hit):       15-70ms   (dominated by network)
TOTAL (cache miss):      25-100ms  (+ database latency)

What we control directly:
- App Server Processing: 1-10ms
- Cache Lookup:          0.5-2ms
- Database Lookup:       2-20ms
─────────────────────────────
Our Budget: <50ms total, target <10ms for cache hit
```

Average latency is misleading. What matters is the tail latency—how slow the slowest requests are:
| Percentile | Target | What It Means | At 1B Requests/Day |
|---|---|---|---|
| P50 (median) | <10ms | Half of requests under 10ms | 500M requests under 10ms |
| P90 | <25ms | 90% of requests under 25ms | 100M may exceed 25ms |
| P99 | <50ms | 99% under 50ms | 10M may exceed 50ms |
| P99.9 | <100ms | 99.9% under 100ms | 1M may exceed 100ms |
| P99.99 | <500ms | Avoid timeouts | 100K may be slow |
At 1B requests per day, even the P99.99 tail (500ms) affects 100,000 requests daily, and the users behind those requests will perceive the service as broken. Tail latency optimization is critical—and often harder than median optimization because it means eliminating edge cases, garbage collection pauses, and cache misses.
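As a quick illustration of how such figures are read off real traffic, the sketch below computes nearest-rank percentiles from a window of observed latencies (the sample values and function name are purely illustrative, not part of the system described here):

```typescript
// Nearest-rank percentile: the smallest observed value such that at least
// p% of observations are less than or equal to it.
function percentile(latenciesMs: number[], p: number): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[Math.min(rank, sorted.length) - 1];
}

const sampleMs = [3, 4, 4, 5, 6, 7, 9, 12, 18, 48, 95, 210]; // one monitoring window
console.log(`P50: ${percentile(sampleMs, 50)}ms`); // typical request
console.log(`P99: ${percentile(sampleMs, 99)}ms`); // the tail users complain about
```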
URL shortener redirects are perfectly cacheable: the mapping from short code to long URL rarely changes. We exploit this with aggressive, multi-layer caching.
```
Multi-Layer Cache Architecture
==============================

┌────────────────────────────────────────────────┐
│ LAYER 1: CDN EDGE CACHE                       │
│ • 200+ global edge locations                  │
│ • Latency: 1-5ms                              │
│ • Capacity: Effectively unlimited             │
│ • TTL: 24 hours (popular URLs always cached)  │
│ • Hit rate target: 60-70% of all requests     │
└────────────────────────────────────────────────┘
                    ↓ cache miss
┌────────────────────────────────────────────────┐
│ LAYER 2: LOCAL IN-MEMORY CACHE                │
│ • Per-server LRU cache (HashMap)              │
│ • Latency: <0.1ms (memory access)             │
│ • Capacity: 1-5GB per server (10-50M entries) │
│ • TTL: 10 minutes                             │
│ • Hit rate target: 20-30% of CDN misses       │
└────────────────────────────────────────────────┘
                    ↓ cache miss
┌────────────────────────────────────────────────┐
│ LAYER 3: DISTRIBUTED REDIS CACHE              │
│ • Redis Cluster across region                 │
│ • Latency: 0.5-2ms                            │
│ • Capacity: 100GB+ per region                 │
│ • TTL: 1 hour                                 │
│ • Hit rate target: 95%+ of local cache misses │
└────────────────────────────────────────────────┘
                    ↓ cache miss
┌────────────────────────────────────────────────┐
│ LAYER 4: DATABASE (Source)                    │
│ • Primary database or read replica            │
│ • Latency: 2-20ms                             │
│ • Contains all URL mappings                   │
│ • Target: <5% of total requests reach DB      │
└────────────────────────────────────────────────┘
```

Content Delivery Networks (CDNs) like Cloudflare, AWS CloudFront, or Fastly can cache redirect responses at edge locations worldwide:
```typescript
// CDN Cache Configuration for Redirects

// HTTP Response Headers for Caching
const redirectResponse = {
  statusCode: 302, // Temporary redirect (allows analytics tracking)
  headers: {
    'Location': longUrl,

    // CDN Cache Control
    'Cache-Control': 'public, max-age=86400, s-maxage=86400',
    // public:   CDN can cache
    // max-age:  browser cache for 24h
    // s-maxage: CDN cache for 24h (overrides max-age for CDN)

    // CDN-specific headers
    'CDN-Cache-Control': 'public, max-age=86400',
    'Surrogate-Control': 'max-age=86400', // Fastly/Varnish

    // Cache key variants (cache per-URL only, not per-user)
    'Vary': 'Accept-Encoding', // Only vary on encoding, not cookies

    // Analytics bypass hint
    'X-Cache-Status': 'origin', // Track origin vs edge hits
  }
};

// CDN Edge Configuration (conceptual - varies by provider)
const cdnConfig = {
  // Cache based only on path (not query params or headers)
  cacheKeyRules: {
    includeQueryString: false, // /abc123?ref=twitter same as /abc123
    includeHost: true,         // Different domains cached separately
    includeCookies: false,     // User-specific cookies don't vary cache
  },

  // Stale-while-revalidate for high availability
  staleContentRules: {
    serveStaleOnError: true, // Return cached version if origin down
    staleMaxAge: 3600,       // Serve stale up to 1 hour
  },

  // Origin shield (reduce origin load)
  originShield: {
    enabled: true,
    region: 'us-east-1', // Single region contacts origin
  }
};
```

301 (Permanent) redirects are cached indefinitely by browsers—subsequent visits never hit your service. This prevents analytics tracking. 302 (Temporary) redirects allow CDN caching while ensuring browsers check back. Use 302 for analytics, 301 for static/permanent links.
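If a service needs both behaviors, the status code can be chosen per link. A minimal sketch, assuming a hypothetical `isPermanentLink` flag on the stored URL record:

```typescript
// Sketch only: pick the redirect status per link.
// `isPermanentLink` is an assumed field, not defined elsewhere on this page.
function redirectStatus(isPermanentLink: boolean): 301 | 302 {
  // 301: browsers may cache the redirect indefinitely, so repeat clicks never
  //      reach our service and per-click analytics are lost.
  // 302: browsers re-request on later visits, keeping every click countable
  //      while the CDN still absorbs most of the traffic.
  return isPermanentLink ? 301 : 302;
}
```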
When the CDN misses, requests hit our application servers. Here we employ two further cache layers: a local in-memory cache and a distributed Redis cache.
```typescript
/**
 * Local In-Memory LRU Cache
 *
 * Each application server maintains its own cache for the hottest URLs.
 * Eliminates the network round-trip for frequently accessed URLs.
 */

class LocalUrlCache {
  private cache: Map<string, CacheEntry>;
  private readonly maxSize: number;
  private readonly ttlMs: number;

  constructor(maxSize: number = 1_000_000, ttlMs: number = 600_000) {
    this.cache = new Map();
    this.maxSize = maxSize; // 1M entries ≈ 300MB memory
    this.ttlMs = ttlMs;     // 10 minutes TTL
  }

  get(shortCode: string): string | null {
    const entry = this.cache.get(shortCode);
    if (!entry) return null;

    // Check expiration
    if (Date.now() > entry.expiresAt) {
      this.cache.delete(shortCode);
      return null;
    }

    // LRU: Move to end (most recently used)
    this.cache.delete(shortCode);
    this.cache.set(shortCode, entry);

    return entry.longUrl;
  }

  set(shortCode: string, longUrl: string): void {
    // Evict oldest if at capacity (LRU eviction)
    if (this.cache.size >= this.maxSize) {
      const oldestKey = this.cache.keys().next().value;
      this.cache.delete(oldestKey);
    }

    this.cache.set(shortCode, {
      longUrl,
      expiresAt: Date.now() + this.ttlMs,
    });
  }

  invalidate(shortCode: string): void {
    this.cache.delete(shortCode);
  }
}

interface CacheEntry {
  longUrl: string;
  expiresAt: number;
}

// Memory footprint estimation:
// 1M entries × 300 bytes = 300MB per server
// Typical server RAM: 16GB → cache uses ~2% of memory
```

Redis provides shared caching across servers with sub-millisecond latency:
```typescript
/**
 * Distributed Redis Cache Layer
 *
 * Shared cache across all application servers.
 * Provides cache coherence and higher capacity than the local cache.
 */

import Redis from 'ioredis';

class RedisUrlCache {
  private redis: Redis.Cluster;
  private readonly defaultTtl: number = 3600; // 1 hour

  constructor(nodes: { host: string; port: number }[]) {
    this.redis = new Redis.Cluster(nodes, {
      redisOptions: {
        connectTimeout: 5000,
        commandTimeout: 100, // Fail fast - 100ms timeout
      },
      scaleReads: 'slave', // Read from replicas for scalability
      enableReadyCheck: true,
    });
  }

  async get(shortCode: string): Promise<string | null> {
    try {
      const longUrl = await this.redis.get(`url:${shortCode}`);
      return longUrl;
    } catch (error) {
      // Redis failure should not block redirects
      console.error('Redis get failed:', error);
      return null;
    }
  }

  async set(shortCode: string, longUrl: string): Promise<void> {
    try {
      await this.redis.setex(`url:${shortCode}`, this.defaultTtl, longUrl);
    } catch (error) {
      // Log but don't fail - cache is optimization, not requirement
      console.error('Redis set failed:', error);
    }
  }

  async getMulti(shortCodes: string[]): Promise<Map<string, string>> {
    const keys = shortCodes.map(code => `url:${code}`);
    const values = await this.redis.mget(...keys);

    const result = new Map<string, string>();
    values.forEach((value, index) => {
      if (value) {
        result.set(shortCodes[index], value);
      }
    });
    return result;
  }

  async invalidate(shortCode: string): Promise<void> {
    await this.redis.del(`url:${shortCode}`);
  }
}

// Redis Cluster Topology for URL Shortener (6 nodes per region):
//
//   ┌─────────┐    ┌──────────┐    ┌───────────┐
//   │Master 1 │    │Master 2  │    │Master 3   │
//   │ Slots   │    │ Slots    │    │ Slots     │
//   │ 0-5460  │    │5461-10922│    │10923-16383│
//   └────┬────┘    └────┬─────┘    └────┬──────┘
//        │              │               │
//   ┌────┴────┐    ┌────┴─────┐    ┌────┴──────┐
//   │Replica 1│    │Replica 2 │    │Replica 3  │
//   └─────────┘    └──────────┘    └───────────┘
```

On server startup or after a cache flush, cold caches cause latency spikes. Implement cache warming: pre-populate the cache with the top 1000-10000 most accessed URLs from the last 24 hours. This ensures immediate high hit rates.
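A warming pass can run on startup, before the instance joins the load-balancer pool. The sketch below is illustrative: `getTopShortCodes` is a hypothetical analytics query, while the cache and database objects are the classes used throughout this page.

```typescript
/**
 * Cache warming sketch (illustrative, not a prescribed API).
 * Loads the hottest short codes from the last 24 hours into both
 * the local LRU cache and the shared Redis layer.
 */
async function warmCaches(
  analytics: { getTopShortCodes(hours: number, limit: number): Promise<string[]> },
  database: { getLongUrl(code: string): Promise<string | null> },
  localCache: LocalUrlCache,
  redisCache: RedisUrlCache,
  limit = 10_000
): Promise<void> {
  const hotCodes = await analytics.getTopShortCodes(24, limit);

  for (const code of hotCodes) {
    const longUrl = await database.getLongUrl(code);
    if (!longUrl) continue;

    localCache.set(code, longUrl);        // warm this server's in-memory LRU
    await redisCache.set(code, longUrl);  // warm the shared Redis layer
  }
}
```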
Let's implement the complete redirect flow with all cache layers:
```typescript
/**
 * Production Redirect Handler
 *
 * Multi-layer cache lookup with graceful fallback.
 */

class RedirectHandler {
  private localCache: LocalUrlCache;
  private redisCache: RedisUrlCache;
  private database: UrlDatabase;
  private analytics: AnalyticsEmitter;

  async handleRedirect(
    shortCode: string,
    request: Request
  ): Promise<Response> {
    const startTime = performance.now();
    let cacheLevel = 'none';

    try {
      // Layer 1: Check local in-memory cache (fastest)
      let longUrl = this.localCache.get(shortCode);
      if (longUrl) {
        cacheLevel = 'local';
      } else {
        // Layer 2: Check Redis distributed cache
        longUrl = await this.redisCache.get(shortCode);
        if (longUrl) {
          cacheLevel = 'redis';
          // Promote to local cache for future requests
          this.localCache.set(shortCode, longUrl);
        } else {
          // Layer 3: Database lookup (slowest)
          longUrl = await this.database.getLongUrl(shortCode);
          if (longUrl) {
            cacheLevel = 'database';
            // Populate both cache layers
            this.localCache.set(shortCode, longUrl);
            // Don't await - async cache population
            this.redisCache.set(shortCode, longUrl).catch(() => {});
          }
        }
      }

      // URL not found
      if (!longUrl) {
        return this.notFoundResponse(shortCode);
      }

      // Emit analytics asynchronously (never block redirect)
      this.emitAnalytics(shortCode, request, cacheLevel);

      // Return redirect response
      const latency = performance.now() - startTime;
      return this.redirectResponse(longUrl, latency, cacheLevel);

    } catch (error) {
      // Fail gracefully - try to serve from any available source
      return this.handleError(shortCode, error);
    }
  }

  private redirectResponse(
    longUrl: string,
    latencyMs: number,
    cacheLevel: string
  ): Response {
    return new Response(null, {
      status: 302,
      headers: {
        'Location': longUrl,
        'Cache-Control': 'public, max-age=86400, s-maxage=86400',
        'X-Response-Time': `${latencyMs.toFixed(2)}ms`,
        'X-Cache-Level': cacheLevel,
      },
    });
  }

  private notFoundResponse(shortCode: string): Response {
    return new Response('URL not found', {
      status: 404,
      headers: {
        'Cache-Control': 'no-store', // Don't cache 404s
      },
    });
  }

  private emitAnalytics(
    shortCode: string,
    request: Request,
    cacheLevel: string
  ): void {
    // Fire-and-forget analytics emission
    setImmediate(() => {
      this.analytics.emit({
        shortCode,
        timestamp: Date.now(),
        ip: request.headers.get('cf-connecting-ip') ?? request.ip,
        userAgent: request.headers.get('user-agent'),
        referer: request.headers.get('referer'),
        cacheLevel,
      });
    });
  }
}
```

When cache misses occur, database performance is critical. With proper caching, only 1-5% of requests reach the database, but that's still 10-50 million queries per day!
| Database | Strengths | Weaknesses | Best For |
|---|---|---|---|
| PostgreSQL | Mature, ACID, rich indexing | Horizontal scaling complex | Moderate scale, complex queries |
| MySQL | Simple, well-understood, replication | Limited horizontal scaling | Simple use cases, read replicas |
| DynamoDB | Serverless, auto-scaling, global tables | Limited query flexibility | AWS-native, global scale |
| Cassandra | Write-heavy, linear scalability | Eventual consistency, no joins | Extreme write scale |
| MongoDB | Flexible schema, sharding built-in | Less mature transactions | Rapid iteration, flexible needs |
For pure key-value lookups, DynamoDB (or similar) excels:
```typescript
/**
 * DynamoDB Table Design for URL Shortener
 */

// AWS SDK v2 DocumentClient (assumed here so the lookup below is self-contained)
import { DynamoDB } from 'aws-sdk';
const dynamodb = new DynamoDB.DocumentClient();

// Primary table: URLs
const urlsTable = {
  TableName: 'Urls',

  // Simple primary key - short code only
  KeySchema: [
    { AttributeName: 'shortCode', KeyType: 'HASH' }
  ],

  AttributeDefinitions: [
    { AttributeName: 'shortCode', AttributeType: 'S' },
    { AttributeName: 'userId', AttributeType: 'S' },
    { AttributeName: 'createdAt', AttributeType: 'N' },
  ],

  // Global Secondary Index for a user's URLs
  GlobalSecondaryIndexes: [
    {
      IndexName: 'UserUrlsIndex',
      KeySchema: [
        { AttributeName: 'userId', KeyType: 'HASH' },
        { AttributeName: 'createdAt', KeyType: 'RANGE' }
      ],
      Projection: { ProjectionType: 'ALL' }
    }
  ],

  // On-demand capacity for auto-scaling
  BillingMode: 'PAY_PER_REQUEST',
};

// Sample item structure
const urlItem = {
  shortCode: 'a7Xk2B',      // Partition key
  longUrl: 'https://example.com/very/long/path?with=params',
  userId: 'user_12345',
  createdAt: 1704067200000, // Unix timestamp
  expiresAt: null,          // Optional TTL
  clickCount: 1523,         // Denormalized for quick access
  customAlias: false,
  metadata: {
    title: 'My Campaign Link',
    tags: ['marketing', 'q1-2024'],
  }
};

// Redirect lookup query
const getLongUrl = async (shortCode: string): Promise<string | null> => {
  const result = await dynamodb.get({
    TableName: 'Urls',
    Key: { shortCode },
    ProjectionExpression: 'longUrl, expiresAt', // Only fetch needed fields
    ConsistentRead: false, // Eventually consistent = faster
  }).promise();

  if (!result.Item) return null;

  // Check expiration
  if (result.Item.expiresAt && result.Item.expiresAt < Date.now()) {
    return null; // Expired
  }

  return result.Item.longUrl;
};
```

URL shortener redirects are pure key-value lookups with no joins or complex queries. DynamoDB provides single-digit millisecond latency, automatic horizontal scaling, global tables for multi-region, and serverless operation. For this access pattern, it's nearly ideal.
Users click short URLs from everywhere. A user in Tokyo shouldn't wait 200ms for a round-trip to US servers. Global distribution minimizes latency for all users.
```
Global Distribution Architecture
================================

                   ┌─────────────────────┐
                   │    DNS (GeoDNS)     │
                   │  Route to nearest   │
                   │    edge region      │
                   └──────────┬──────────┘
                              │
        ┌─────────────────────┼──────────────────────┐
        │                     │                      │
        ▼                     ▼                      ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│   US-EAST     │      │   EU-WEST     │      │ AP-NORTHEAST  │
│   Region      │      │   Region      │      │   Region      │
├───────────────┤      ├───────────────┤      ├───────────────┤
│ • CDN Edge    │      │ • CDN Edge    │      │ • CDN Edge    │
│ • App Servers │      │ • App Servers │      │ • App Servers │
│ • Redis Cache │      │ • Redis Cache │      │ • Redis Cache │
│ • DB Replica  │◄─────┤ • DB Replica  │◄─────┤ • DB Replica  │
└───────┬───────┘      └───────────────┘      └───────────────┘
        │
        │ Replication
        ▼
┌───────────────┐
│  PRIMARY DB   │
│  (US-EAST)    │
│               │
│  All writes   │
│  go here      │
└───────────────┘

Data Flow:
- READS:  Served from nearest region (local replica)
- WRITES: Routed to primary region, async replicated

Latency from any major city:
- With local region:    10-30ms
- Without local region: 100-300ms (unacceptable)
```
```typescript
/**
 * Geographic DNS Routing Configuration
 *
 * Route users to the nearest healthy region based on:
 *   1. Geographic location (latency-based)
 *   2. Health checks (failover to next-closest)
 *   3. Load balancing (within region)
 */

// AWS Route 53 Latency-Based Routing (conceptual)
const dnsConfig = {
  recordSets: [
    {
      name: 'short.url',
      type: 'A',
      region: 'us-east-1',
      aliasTarget: 'alb-us-east.elb.amazonaws.com',
      healthCheckId: 'hc-us-east',
    },
    {
      name: 'short.url',
      type: 'A',
      region: 'eu-west-1',
      aliasTarget: 'alb-eu-west.elb.amazonaws.com',
      healthCheckId: 'hc-eu-west',
    },
    {
      name: 'short.url',
      type: 'A',
      region: 'ap-northeast-1',
      aliasTarget: 'alb-ap-ne.elb.amazonaws.com',
      healthCheckId: 'hc-ap-ne',
    },
  ],

  healthChecks: {
    type: 'HTTPS',
    path: '/health',
    interval: 10,        // Check every 10 seconds
    failureThreshold: 2, // 2 failures = unhealthy
    regions: ['us-east-1', 'eu-west-1', 'ap-northeast-1'],
  },

  routingPolicy: 'latency', // Route to lowest-latency healthy region
};

// Failover behavior:
// 1. US-East goes down → Route 53 detects via health check
// 2. US users routed to EU-West (next closest healthy)
// 3. When US-East recovers → traffic automatically returns
```

CDNs like Cloudflare use Anycast IP addressing—the same IP address is announced from multiple locations. Users are automatically routed to the nearest point of presence based on BGP routing, often faster than DNS-based routing.
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
When a URL is updated or deleted, cached versions must be invalidated across ALL layers and regions.
| Event | What to Invalidate | Urgency | Strategy |
|---|---|---|---|
| URL destination changed | All caches for that code | High | Active invalidation + short TTL |
| URL deleted | All caches for that code | High | Active invalidation |
| URL expired (TTL) | All caches for that code | Medium | TTL-based expiration |
| URL made private by its owner | All caches for that code | High | Immediate invalidation |
| Security issue (phishing) | All caches for affected URLs | Critical | Emergency purge |
```typescript
/**
 * Multi-Layer Cache Invalidation
 */

class CacheInvalidator {
  private localCache: LocalUrlCache;
  private redisCache: RedisUrlCache;
  private cdnPurger: CdnPurger;
  private pubsub: PubSub;

  /**
   * Invalidate a short code across all cache layers and regions.
   * Uses pub/sub to notify all app server instances.
   */
  async invalidate(shortCode: string): Promise<void> {
    const invalidationId = generateUuid();
    console.log(`[Invalidation ${invalidationId}] Starting for ${shortCode}`);

    // 1. Invalidate local cache (this instance only)
    this.localCache.invalidate(shortCode);

    // 2. Invalidate distributed Redis cache
    await this.redisCache.invalidate(shortCode);

    // 3. Publish invalidation event to all app servers (all regions)
    await this.pubsub.publish('cache-invalidation', {
      shortCode,
      invalidationId,
      timestamp: Date.now(),
    });

    // 4. Purge from CDN edge caches
    await this.cdnPurger.purge(`/${shortCode}`);

    console.log(`[Invalidation ${invalidationId}] Complete`);
  }

  /**
   * Subscribe to invalidation events from other instances
   */
  async subscribeToInvalidations(): Promise<void> {
    await this.pubsub.subscribe('cache-invalidation', (message) => {
      // Invalidate local cache when notified by other instances
      this.localCache.invalidate(message.shortCode);
      console.log(`Received invalidation for ${message.shortCode}`);
    });
  }
}

// CDN Purge API (example for Cloudflare)
class CdnPurger {
  async purge(path: string): Promise<void> {
    await fetch('https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.CF_API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        files: [`https://short.url${path}`],
      }),
    });
  }

  async purgeAll(): Promise<void> {
    // Emergency: purge entire cache
    await fetch('https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.CF_API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ purge_everything: true }),
    });
  }
}
```

Even with active invalidation, there's a propagation window. CDN purges take 1-30 seconds. Pub/sub messages take 100-500ms. Browser caches may not be clearable at all (if a user has cached a 301 redirect). Design for eventual consistency—absolute immediate invalidation is impossible in distributed systems.
We've built a comprehensive latency optimization strategy for URL shortener redirects. Let's consolidate the key approaches:
| Technique | Latency Saved | Implementation Effort | Hit Rate |
|---|---|---|---|
| CDN Edge Caching | 100ms+ (eliminates origin round-trip) | Medium | 60-70% |
| Local In-Memory Cache | 1-2ms (eliminates Redis hop) | Low | 20-30% of CDN misses |
| Redis Distributed Cache | 5-15ms (eliminates DB lookup) | Medium | 95%+ of local-cache misses |
| Database Read Replicas | 10-50ms (reduces read latency) | Medium | N/A |
| Global Multi-Region | 50-200ms (geographic locality) | High | N/A |
| Key-value DB (DynamoDB) | 5-10ms vs. a relational DB | Medium | N/A |
You now understand how to achieve sub-50ms redirect latency at scale. Next, we'll explore analytics collection—how to gather click data from billions of redirects without impacting that hard-won latency.