A rate limiter running on a single server is straightforward—we've mastered token buckets and sliding windows. But modern API platforms run across hundreds or thousands of servers in multiple geographic regions. When a client with a 100 requests/minute limit makes a request that lands on server A in Virginia, how does server B in Singapore know about it?
This is the challenge of distributed rate limiting: maintaining consistent, accurate limits across a globally distributed fleet while preserving the sub-millisecond latency that users expect. It's a classic distributed systems problem that forces us to navigate the trade-offs of the CAP theorem.
In this page, we'll explore how companies like Stripe, Cloudflare, and GitHub solve distributed rate limiting at scale—from centralized coordination to gossip protocols to intelligent approximations that sacrifice a little accuracy for orders of magnitude better performance.
By the end of this page, you'll understand: (1) Why distributed rate limiting is fundamentally hard, (2) Centralized approaches with Redis/Memcached, (3) Local + remote hybrid architectures, (4) Gossip protocols for eventually consistent limiting, (5) Cell-based architectures for geographic distribution, and (6) The trade-offs each approach makes.
Before diving into solutions, let's deeply understand why distributed rate limiting is challenging. This understanding prevents us from choosing inappropriate solutions.
Consider a simplified scenario:
Client: 100 requests/minute limit
Server A Server B
├── Receives request 1 ├── Receives request 2
├── Local count: 1 ├── Local count: 1
├── Allows ├── Allows
└── ??? └── ???
Question: What is the client's *actual* request count?
Answer: 2 (but neither server knows this)
Without coordination, each server makes decisions based on incomplete information. With N servers and no coordination, a client can potentially make N × limit requests before any server notices: a client with a 100 requests/minute limit spread across 20 servers could slip through up to 2,000 requests/minute, as the sketch below illustrates.
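To make the multiplication concrete, here is a minimal sketch (the `LocalOnlyLimiter` class and round-robin client are hypothetical) of what happens when every node enforces the full limit independently:

```typescript
// Minimal sketch: N independent fixed-window counters with no coordination.
class LocalOnlyLimiter {
  private count = 0;
  constructor(private readonly limit: number) {}

  allow(): boolean {
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

// 20 servers, each enforcing the full 100 requests/minute limit in isolation.
const servers = Array.from({ length: 20 }, () => new LocalOnlyLimiter(100));

// A client spreads 5,000 requests across the fleet within a single window.
let allowedCount = 0;
for (let i = 0; i < 5000; i++) {
  if (servers[i % servers.length].allow()) allowedCount++;
}

console.log(allowedCount); // 2000 (20 × 100): 20× the intended quota
```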
Distributed rate limiting is subject to the CAP theorem:
Consistency (C): All nodes see the same request count at the same time.
Availability (A): Every request gets a rate limit decision.
Partition Tolerance (P): The system continues working during network failures.
We must choose between:
CP (Consistent + Partition-tolerant): Every check consults a shared, authoritative counter before answering. Counts stay exact, but each check pays a network round trip, and during a partition some requests cannot get a decision at all.
AP (Available + Partition-tolerant): Nodes decide locally and synchronize asynchronously. Checks stay fast and always return an answer, but counts can temporarily diverge, so clients may briefly exceed their limits.
Real-world choice: Most production systems choose AP, accepting some over-limiting in exchange for availability and low latency. The trade-off is acceptable because rate limits are usually approximate quotas, not hard security boundaries.
Consider: If your API has a 50ms p99 latency target and rate limiting adds 10ms for a remote check, you've consumed 20% of your budget. For sub-10ms APIs, synchronous remote rate limiting is simply not viable. Architecture must account for this constraint.
The simplest distributed rate limiting architecture uses a centralized counter service—typically Redis—that all application servers query. While not infinitely scalable, this approach works well for many production systems.
┌─────────────────────────────────────┐
│ Load Balancer │
└─────────────┬───────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ App │ │ App │ │ App │
│ Server A │ │ Server B │ │ Server C │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└──────────────┬───────┴──────────────────────┘
│
▼
┌─────────────────┐
│ Redis │
│ (Counters) │
└─────────────────┘
Every rate limit check goes through Redis, ensuring all servers see the same state.
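Before the implementation itself, here is a rough usage sketch of how an application server might call such a limiter. The Express-style middleware, the header names, and the API-key lookup are illustrative assumptions; the `CentralizedRateLimiter` class it relies on is defined just below.

```typescript
// Hypothetical wiring: check the shared limiter on every incoming request.
import express from 'express';

const app = express();
const limiter = new CentralizedRateLimiter(process.env.REDIS_URL ?? 'redis://localhost:6379');

app.use(async (req, res, next) => {
  // Assumption: clients are identified by an API key header, falling back to IP.
  const clientId = req.header('x-api-key') ?? req.ip ?? 'anonymous';
  const result = await limiter.checkLimit(clientId, 100, 60); // 100 requests/minute

  res.setHeader('RateLimit-Limit', result.limit);
  res.setHeader('RateLimit-Remaining', result.remaining);

  if (!result.allowed) {
    res.setHeader('Retry-After', result.retryAfter);
    res.status(429).json({ error: 'rate_limit_exceeded' });
    return;
  }
  next();
});
```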
```typescript
/**
 * Centralized rate limiter using Redis
 * All nodes share a single source of truth
 */
import Redis from 'ioredis';

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  limit: number;
  retryAfter: number; // Seconds until the client should retry
  resetAt?: number;   // Epoch milliseconds when the current window resets
}

class CentralizedRateLimiter {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl, {
      // Aggressive timeouts: a slow rate limiter is worse than no rate limiter
      maxRetriesPerRequest: 3,
      connectTimeout: 1000,
      commandTimeout: 100,
    });
  }

  /**
   * Check the rate limit using a Redis Lua script (atomic increment + check)
   */
  async checkLimit(
    clientId: string,
    limit: number,
    windowSizeSeconds: number
  ): Promise<RateLimitResult> {
    const now = Date.now();
    const windowKey = this.getWindowKey(clientId, now, windowSizeSeconds);

    try {
      // EVAL runs the whole check-and-increment atomically on the Redis server
      const result = await this.redis.eval(
        FIXED_WINDOW_SCRIPT,
        1,                            // Number of keys
        windowKey,                    // KEYS[1]
        limit.toString(),             // ARGV[1]
        windowSizeSeconds.toString(), // ARGV[2]
        now.toString()                // ARGV[3]
      ) as [number, number, number];

      const [allowed, remaining, retryAfter] = result;

      return {
        allowed: allowed === 1,
        remaining,
        limit,
        retryAfter,
        resetAt: this.calculateResetAt(now, windowSizeSeconds)
      };
    } catch (error) {
      // Fail-open: allow on Redis errors
      console.error('Rate limiter Redis error:', error);
      return this.failOpen(limit);
    }
  }

  private getWindowKey(clientId: string, now: number, windowSize: number): string {
    // Key includes the window number so old windows expire automatically
    const window = Math.floor(now / (windowSize * 1000));
    return `ratelimit:${clientId}:${window}`;
  }

  private calculateResetAt(now: number, windowSize: number): number {
    const windowMs = windowSize * 1000;
    const currentWindowStart = Math.floor(now / windowMs) * windowMs;
    return currentWindowStart + windowMs;
  }

  /**
   * Fail-open policy: allow the request on Redis failure.
   * Protects availability at the cost of rate limit accuracy.
   */
  private failOpen(limit: number): RateLimitResult {
    return {
      allowed: true,
      remaining: limit,
      limit,
      retryAfter: 0,
      resetAt: Date.now() + 60000
    };
  }
}

// Lua script for an atomic fixed-window counter
const FIXED_WINDOW_SCRIPT = `
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window_size = tonumber(ARGV[2]) * 1000
local now = tonumber(ARGV[3])

-- Get current count
local current = tonumber(redis.call('GET', key) or '0')

if current < limit then
  -- Increment and set expiry
  redis.call('INCR', key)
  redis.call('PEXPIRE', key, window_size * 2)
  return {1, limit - current - 1, 0}
else
  return {0, 0, math.ceil(window_size / 1000)}
end
`;
```

Redis Cluster: To scale beyond a single node, Redis Cluster shards keys across multiple nodes, distributing load:
Client A → Hash → Shard 1
Client B → Hash → Shard 2
Client C → Hash → Shard 3
With 10 shards, throughput scales ~10×.
Read Replicas: For read-heavy workloads (checking limits without incrementing), replicas can serve reads:
Master: Handles INCR (writes)
Replica 1-N: Handle GET (reads)
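A minimal sketch of the client-side wiring with ioredis, assuming a three-shard cluster; hostnames and option values are illustrative:

```typescript
// Point the limiter at a Redis Cluster instead of a single node.
import Redis from 'ioredis';

const cluster = new Redis.Cluster(
  [
    { host: 'redis-shard-1', port: 6379 },
    { host: 'redis-shard-2', port: 6379 },
    { host: 'redis-shard-3', port: 6379 },
  ],
  {
    // Route read-only commands (e.g. GET for "check without consuming") to replicas.
    // Replicas lag slightly, so counts read this way may be marginally stale.
    scaleReads: 'slave',
    redisOptions: { connectTimeout: 1000, commandTimeout: 100 },
  }
);

// Each counter key hashes to a slot, so different clients spread across shards.
// A hash tag such as `ratelimit:{clientId}:<window>` would pin all of one
// client's keys to the same shard if multi-key operations were needed.
```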
Hot Key Problem: Very active clients become hot keys that overload specific shards. Common mitigations are to split a hot client's counter across several sub-keys that are summed on read, to cache the hot client's state locally for a few milliseconds, or to move such clients onto the hybrid approach described in the next section. A key-splitting sketch follows below.
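One possible shape of key splitting, reusing the fixed-window keys from the code above; `SPLIT_FACTOR` and the helper names are illustrative:

```typescript
import Redis from 'ioredis';

// Split a hot client's counter across several sub-keys so its increments
// spread over multiple shards instead of hammering one.
const SPLIT_FACTOR = 8;

function subKey(clientId: string, window: number, shard: number): string {
  return `ratelimit:${clientId}:${window}:${shard}`;
}

async function incrementSplit(redis: Redis, clientId: string, window: number): Promise<void> {
  // Write to a random sub-counter; any one shard sees only ~1/SPLIT_FACTOR of the writes.
  const shard = Math.floor(Math.random() * SPLIT_FACTOR);
  await redis.incr(subKey(clientId, window, shard));
}

async function totalCount(redis: Redis, clientId: string, window: number): Promise<number> {
  // Reads must sum every sub-counter to reconstruct the client's total usage.
  const keys = Array.from({ length: SPLIT_FACTOR }, (_, i) => subKey(clientId, window, i));
  const values = await Promise.all(keys.map((k) => redis.get(k)));
  return values.reduce((sum, v) => sum + parseInt(v ?? '0', 10), 0);
}
```

The trade-off is that each read becomes SPLIT_FACTOR GETs instead of one, so splitting is typically applied only to clients identified as hot.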
The most common production pattern combines local rate limiting with asynchronous remote synchronization. This approach provides low latency while maintaining reasonable accuracy.
┌────────────────────────────────────┐
│ Client Request │
└─────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Application Server │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Local Rate Limiter │ │
│ │ • In-memory token bucket │ │
│ │ • Sub-microsecond latency │ │
│ │ • Per-node limit (global limit ÷ node count) │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ╔══════════╧═══════════╗ │
│ ║ Local Allowed? ║ │
│ ╚══════════╤═══════════╝ │
│ ▼ No ▼ Yes │
│ Return 429 Mark for Background Sync │
└───────────────────────────────────┬─────────────────────────────────┘
│ (Async, every 100ms)
▼
┌───────────────────────┐
│ Redis Cluster │
│ (Global Counters) │
└───────────────────────┘
How it works:
1. Each node runs an in-memory token bucket sized to its share of the global limit (roughly global limit ÷ node count, plus a small buffer).
2. Every request is checked against this local bucket first, so the decision costs microseconds, not a network round trip.
3. If the local bucket is empty, the request is rejected immediately.
4. If the request is allowed, the increment is added to a pending batch.
5. A background task flushes the batch to Redis every ~100ms and pulls back the global count, refreshing each node's estimate of the client's true usage.
```typescript
/**
 * Hybrid Rate Limiter: Local + Remote
 * Fast local checks with async global synchronization
 * (Simplified: per-window expiry of the cached global counts is omitted.)
 */
import Redis from 'ioredis';

// Assumes the single-node TokenBucket from the previous page: constructed with
// (capacity, refillPerSecond) and exposing consume(): { allowed, retryAfter, resetAt }.
class HybridRateLimiter {
  private localBuckets: Map<string, TokenBucket> = new Map();
  private pendingIncrements: Map<string, number> = new Map();
  private globalCounts: Map<string, number> = new Map();
  private readonly syncIntervalMs = 100; // Sync every 100ms
  private readonly localSharePercent: number;

  constructor(
    private readonly redis: Redis,
    private readonly nodeCount: number = 10
  ) {
    // Each node gets 1/N of the limit plus a buffer to absorb imbalance
    this.localSharePercent = 1.0 / nodeCount + 0.1; // 10% buffer

    // Start background sync
    setInterval(() => this.syncWithRedis(), this.syncIntervalMs);
  }

  /**
   * Main rate limit check - uses the local bucket first
   */
  async checkLimit(
    clientId: string,
    globalLimit: number,
    windowSizeSeconds: number
  ): Promise<RateLimitResult> {
    // Calculate local limit (this node's share of the global limit)
    const localLimit = Math.ceil(globalLimit * this.localSharePercent);

    // Get or create local bucket
    let bucket = this.localBuckets.get(clientId);
    if (!bucket) {
      bucket = new TokenBucket(localLimit, localLimit / windowSizeSeconds);
      this.localBuckets.set(clientId, bucket);
    }

    // Fast local check
    const localResult = bucket.consume();
    if (!localResult.allowed) {
      // Definitely denied - local share exhausted
      return {
        allowed: false,
        remaining: 0,
        limit: globalLimit,
        retryAfter: localResult.retryAfter,
        resetAt: localResult.resetAt
      };
    }

    // Allowed locally - record for sync
    this.recordIncrement(clientId);

    // Estimate global remaining from cached data
    const globalUsed = this.globalCounts.get(clientId) || 0;
    const remaining = Math.max(0, globalLimit - globalUsed - 1);

    return {
      allowed: true,
      remaining,
      limit: globalLimit,
      retryAfter: 0,
      resetAt: this.calculateResetAt(windowSizeSeconds)
    };
  }

  /**
   * Record an increment for the next batch sync
   */
  private recordIncrement(clientId: string): void {
    const current = this.pendingIncrements.get(clientId) || 0;
    this.pendingIncrements.set(clientId, current + 1);
  }

  /**
   * Background sync with Redis
   * Batches local increments and fetches global state
   */
  private async syncWithRedis(): Promise<void> {
    if (this.pendingIncrements.size === 0) return;

    const now = Date.now();
    const pipeline = this.redis.pipeline();

    // Snapshot and clear pending increments so new requests start a fresh batch
    const batch = Array.from(this.pendingIncrements.entries());
    this.pendingIncrements.clear();

    // Batch all pending increments
    for (const [clientId, count] of batch) {
      const key = this.getWindowKey(clientId, now);
      pipeline.incrby(key, count);
      pipeline.pexpire(key, 120000); // 2 minute expiry
    }

    try {
      const results = await pipeline.exec();

      // Update global counts from Redis responses (INCRBY returns the new total)
      batch.forEach(([clientId], i) => {
        const countResult = results?.[i * 2]; // [error, newTotal] for this client's INCRBY
        if (countResult && countResult[0] == null) {
          this.globalCounts.set(clientId, countResult[1] as number);
        }
      });
    } catch (error) {
      console.error('Redis sync failed:', error);
      // Continue operating with local limits only
    }
  }

  private getWindowKey(clientId: string, now: number): string {
    const minute = Math.floor(now / 60000);
    return `ratelimit:${clientId}:${minute}`;
  }

  private calculateResetAt(windowSize: number): number {
    const windowMs = windowSize * 1000;
    const now = Date.now();
    return Math.ceil(now / windowMs) * windowMs;
  }
}
```

The local share percentage significantly affects accuracy. Too small and you get excessive false rejections; too large and you allow significant traffic over the limit. Start with (1/nodeCount + 10%) and tune based on observed traffic patterns: if traffic is well balanced across nodes, reduce the buffer; if traffic is bursty, increase it (see the worked example below).
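A quick worked example of that trade-off, using illustrative numbers (10 nodes, 100 requests/minute):

```typescript
// 10 nodes, 100 req/min global limit, 10% buffer per node.
const globalLimit = 100;
const nodeCount = 10;
const localShare = 1 / nodeCount + 0.1;                 // 0.2 -> 20% per node
const localLimit = Math.ceil(globalLimit * localShare); // 20 requests per node

// Balanced traffic: each node sees ~10 req/min, comfortably under its local 20.
// Worst case before the next sync: every node fills its own bucket independently,
// so the client could briefly be allowed nodeCount * localLimit requests.
console.log(nodeCount * localLimit); // 200 -> up to 2x the intended global limit
```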
Best Case (Balanced Traffic): When requests are evenly distributed across N nodes, each node consumes roughly its local share, the aggregate stays close to the global limit, and the background sync mostly confirms what each node already assumed.
Worst Case (Skewed Traffic): When all requests hit one node, that node exhausts its small local share (1/N plus the buffer) long before the client reaches its global limit, producing false rejections even though plenty of global quota remains.
Mitigation: Adaptive rebalancing can detect skew and adjust local limits dynamically:
```typescript
if (localUsed > 0.8 * localLimit && globalUsed < 0.5 * globalLimit) {
  // This node is hot, others are cold
  // Temporarily increase this node's local share
  localShare = Math.min(0.5, localShare * 1.2);
}
```
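A slightly fuller, still hypothetical version of that rule as a standalone helper; the thresholds, the growth factor, and the bounds are tuning knobs rather than prescriptions:

```typescript
// Adjust a node's local share based on how hot it is relative to the fleet.
function rebalanceLocalShare(
  localUsed: number,
  localLimit: number,
  globalUsed: number,
  globalLimit: number,
  currentShare: number
): number {
  const localUtilization = localUsed / localLimit;
  const globalUtilization = globalUsed / globalLimit;

  if (localUtilization > 0.8 && globalUtilization < 0.5) {
    // This node is hot while the fleet is cold: grow its share, capped at 50%.
    return Math.min(0.5, currentShare * 1.2);
  }
  if (localUtilization < 0.2 && globalUtilization > 0.8) {
    // This node is cold while the fleet is near the limit: shrink its share.
    return Math.max(0.01, currentShare * 0.8);
  }
  return currentShare;
}
```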
For systems that must avoid any centralized dependency, gossip protocols enable peer-to-peer coordination. Each node shares its local counts with neighbors, eventually reaching a consistent view without a central coordinator.
Time T0: Each node has local counts
Node A: {client1: 10, client2: 5}
Node B: {client1: 8, client3: 12}
Node C: {client2: 3, client3: 7}
Time T1: Node A gossips to Node B
Node A → Node B: {client1: 10, client2: 5, timestamp: T0}
Node B merges: {client1: 18, client2: 5, client3: 12}
Time T2: Node B gossips to Node C
Node B → Node C: {client1: 18, client2: 5, client3: 12}
Node C merges: {client1: 18, client2: 8, client3: 19}
Time T3: Eventually all nodes converge to:
{client1: 18, client2: 8, client3: 19}
Key Properties:
- No central coordinator and no single point of failure; any surviving subset of nodes keeps serving decisions.
- Counts converge after a few gossip rounds (roughly O(log N) rounds for information to reach all nodes), so they are eventually consistent rather than exact.
- Between rounds each node underestimates the global count, so short bursts can exceed the limit before the fleet converges.
- Bandwidth and CPU cost grow with gossip frequency and with the number of clients being tracked.
```typescript
/**
 * Gossip-based distributed rate limiter
 * Decentralized coordination without a single point of failure
 */
class GossipRateLimiter {
  private localCounts: Map<string, CounterWithTimestamp> = new Map();
  private peerNodes: string[] = [];
  private readonly gossipIntervalMs = 50;   // Gossip every 50ms
  private readonly convergenceTarget = 200; // Expect convergence within ~200ms

  constructor(
    private readonly nodeId: string,
    private readonly nodeCount: number
  ) {
    // Start gossip protocol
    setInterval(() => this.gossip(), this.gossipIntervalMs);
  }

  /**
   * Configure peer nodes for gossip
   */
  setPeers(peers: string[]): void {
    this.peerNodes = peers.filter(p => p !== this.nodeId);
  }

  /**
   * Check rate limit using local + gossiped counts
   * (Window expiry is omitted for brevity; production code must reset counters.)
   */
  async checkLimit(
    clientId: string,
    limit: number,
    windowSizeSeconds: number
  ): Promise<RateLimitResult> {
    const entry = this.localCounts.get(clientId);
    const globalEstimate = entry?.globalEstimate || 0;

    if (globalEstimate >= limit) {
      return {
        allowed: false,
        remaining: 0,
        limit,
        retryAfter: Math.ceil(windowSizeSeconds / limit)
      };
    }

    // Increment local count
    this.incrementLocal(clientId);

    return {
      allowed: true,
      remaining: Math.max(0, limit - globalEstimate - 1),
      limit,
      retryAfter: 0
    };
  }

  /**
   * Increment local counter for a client
   */
  private incrementLocal(clientId: string): void {
    let entry = this.localCounts.get(clientId);
    if (!entry) {
      entry = {
        localCount: 0,
        remoteCounts: new Map(),
        lastUpdated: Date.now(),
        globalEstimate: 0
      };
      this.localCounts.set(clientId, entry);
    }

    entry.localCount++;
    entry.lastUpdated = Date.now();
    this.updateGlobalEstimate(entry);
  }

  /**
   * Gossip local state to a random peer
   */
  private async gossip(): Promise<void> {
    if (this.peerNodes.length === 0) return;

    // Select random peer (or use consistent hashing for stability)
    const peer = this.peerNodes[Math.floor(Math.random() * this.peerNodes.length)];

    // Build gossip message with local counts
    const message: GossipMessage = {
      sourceNode: this.nodeId,
      timestamp: Date.now(),
      counts: new Map()
    };

    for (const [clientId, entry] of this.localCounts) {
      message.counts.set(clientId, {
        nodeId: this.nodeId,
        count: entry.localCount,
        timestamp: entry.lastUpdated
      });
    }

    // Send to peer (in a real implementation, use HTTP/gRPC)
    await this.sendToPeer(peer, message);
  }

  /**
   * Handle incoming gossip from a peer
   */
  handleGossip(message: GossipMessage): void {
    for (const [clientId, peerCount] of message.counts) {
      let entry = this.localCounts.get(clientId);
      if (!entry) {
        entry = {
          localCount: 0,
          remoteCounts: new Map(),
          lastUpdated: Date.now(),
          globalEstimate: 0
        };
        this.localCounts.set(clientId, entry);
      }

      // Merge the peer's count if it is newer than what we have for that node
      const existing = entry.remoteCounts.get(peerCount.nodeId);
      if (!existing || peerCount.timestamp > existing.timestamp) {
        entry.remoteCounts.set(peerCount.nodeId, peerCount);
        this.updateGlobalEstimate(entry);
      }
    }
  }

  /**
   * Calculate global estimate from local + all known remote counts
   */
  private updateGlobalEstimate(entry: CounterWithTimestamp): void {
    let total = entry.localCount;

    // Sum all remote counts (kept per node, so nothing is double counted)
    for (const remote of entry.remoteCounts.values()) {
      total += remote.count;
    }

    entry.globalEstimate = total;
  }

  private async sendToPeer(peer: string, message: GossipMessage): Promise<void> {
    // Implementation: HTTP POST, gRPC call, or message queue
    // Typically uses a lightweight, best-effort transport
  }
}

interface CounterWithTimestamp {
  localCount: number;
  remoteCounts: Map<string, PeerCount>;
  lastUpdated: number;
  globalEstimate: number;
}

interface PeerCount {
  nodeId: string;
  count: number;
  timestamp: number;
}

interface GossipMessage {
  sourceNode: string;
  timestamp: number;
  counts: Map<string, PeerCount>;
}
```

How frequently nodes gossip determines how quickly they converge, and at what cost:

| Parameter | Faster Gossip | Slower Gossip |
|---|---|---|
| Convergence Time | 10-50ms | 200-1000ms |
| Network Bandwidth | Higher | Lower |
| Accuracy During Burst | Better | Worse |
| CPU Overhead | Higher | Lower |
| Suitable For | Strict limits, low latency | Loose limits, bandwidth-constrained |
For globally distributed systems serving users across continents, cross-region latency makes real-time synchronization impractical. Cell-based architectures solve this by partitioning the problem geographically.
Global Control Plane
(Configuration, Aggregated Analytics)
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ US-EAST │ │ EU-WEST │ │ AP-SOUTH│
│ CELL │ │ CELL │ │ CELL │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ Local │ │ Local │ │ Local │
│ Redis │ │ Redis │ │ Redis │
│ Cluster │ │ Cluster │ │ Cluster │
└─────────┘ └─────────┘ └─────────┘
Cell Properties:
- Each cell is self-contained, with its own application servers and its own Redis cluster, so a failure in one region is isolated from the others.
- Clients are routed to, and counted in, the cell nearest to them, keeping rate limit checks within a single region.
- The global control plane handles configuration and aggregated analytics, not per-request decisions.
- Cells coordinate on quota asynchronously, using one of the allocation strategies below.
Static Allocation:
Global limit: 10,000 requests/hour
US-EAST: 4,000 (40%)
EU-WEST: 3,500 (35%)
AP-SOUTH: 2,500 (25%)
Based on expected traffic distribution. Simple but inflexible.
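A minimal sketch of static allocation as configuration; the share table and helper function are illustrative:

```typescript
// Fixed per-cell shares, resolved to a local limit at request time.
const CELL_SHARES: Record<string, number> = {
  'us-east': 0.40,
  'eu-west': 0.35,
  'ap-south': 0.25,
};

function cellLimit(cellId: string, globalLimit: number): number {
  const share = CELL_SHARES[cellId] ?? 0;
  return Math.floor(globalLimit * share);
}

// cellLimit('us-east', 10_000) === 4000; each cell then enforces its slice
// with any single-region limiter, such as the centralized Redis approach above.
```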
Dynamic Allocation: Cells request quota from a global coordinator:
```typescript
async function requestQuota(cellId: string, clientId: string): Promise<number> {
  const globalRemaining = await globalCoordinator.getRemaining(clientId);
  const cellCount = await globalCoordinator.getActiveCellCount();

  // Grant a fair share to this cell
  const quota = Math.floor(globalRemaining / cellCount);
  await globalCoordinator.reserveQuota(clientId, cellId, quota);
  return quota;
}
```
Cells request quota in batches (e.g., 100 request blocks) to minimize coordinator traffic.
Overcommit + Reconcile: Each cell gets 100% of the limit but tracks global usage:
Cell A: Allows up to 10,000, used 3,000
Cell B: Allows up to 10,000, used 4,500
Cell C: Allows up to 10,000, used 2,000
Global usage: 9,500 (reported via async sync)
Remaining: 500 (distributed proactively)
When global usage approaches limit, cells reduce local limits.
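A small sketch of how a cell might derive its effective local ceiling from reported global usage; the 80% threshold and proportional cutover are illustrative choices:

```typescript
// Overcommit + reconcile: start at 100% of the global limit, then throttle
// the local ceiling as globally reported usage climbs.
function effectiveLocalLimit(
  globalLimit: number,
  reportedGlobalUsage: number // from async cross-cell sync, may lag by seconds
): number {
  const remaining = Math.max(0, globalLimit - reportedGlobalUsage);
  const usageRatio = reportedGlobalUsage / globalLimit;

  if (usageRatio < 0.8) {
    // Plenty of headroom: keep overcommitting at the full global limit.
    return globalLimit;
  }
  // Near the limit: each cell only allows what is known to remain globally,
  // accepting some false rejections in exchange for bounding the overshoot.
  return remaining;
}

// With 9,500 of 10,000 used (as in the example above), every cell caps itself at 500.
console.log(effectiveLocalLimit(10_000, 9_500)); // 500
```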
```typescript
/**
 * Cell-based rate limiter for global deployments
 * Each geographic cell manages local limits with global coordination
 */
import Redis from 'ioredis';

class CellRateLimiter {
  private localQuota: Map<string, CellQuota> = new Map();
  private readonly quotaRefreshIntervalMs = 5000; // Refresh quota every 5s

  constructor(
    private readonly cellId: string,
    private readonly localRedis: Redis,
    private readonly globalCoordinator: GlobalCoordinator
  ) {
    // Periodic quota refresh from the global coordinator
    setInterval(() => this.refreshQuotas(), this.quotaRefreshIntervalMs);
  }

  /**
   * Check rate limit using the cell's local quota
   */
  async checkLimit(
    clientId: string,
    globalLimit: number
  ): Promise<RateLimitResult> {
    let quota = this.localQuota.get(clientId);

    if (!quota || quota.remaining <= 0) {
      // Need to fetch/refresh quota from the global coordinator
      quota = await this.fetchQuota(clientId, globalLimit);
      this.localQuota.set(clientId, quota);
    }

    if (quota.remaining <= 0) {
      return {
        allowed: false,
        remaining: 0,
        limit: globalLimit,
        retryAfter: Math.ceil((quota.resetAt - Date.now()) / 1000)
      };
    }

    // Consume from local quota
    quota.remaining--;
    quota.used++;

    // Record in local Redis for durability
    await this.recordUsage(clientId, 1);

    return {
      allowed: true,
      remaining: quota.remaining,
      limit: globalLimit,
      retryAfter: 0
    };
  }

  /**
   * Fetch a quota allocation from the global coordinator
   */
  private async fetchQuota(
    clientId: string,
    globalLimit: number
  ): Promise<CellQuota> {
    try {
      const allocation = await this.globalCoordinator.requestQuota({
        cellId: this.cellId,
        clientId,
        requestedQuota: Math.ceil(globalLimit / 10), // Request in 10% blocks
        currentUsage: await this.getLocalUsage(clientId)
      });

      return {
        remaining: allocation.granted,
        used: 0,
        resetAt: allocation.expiresAt
      };
    } catch (error) {
      // Fail-open with a conservative local limit
      console.error('Global coordinator unreachable:', error);
      return {
        remaining: Math.ceil(globalLimit / 100), // 1% fallback
        used: 0,
        resetAt: Date.now() + 60000
      };
    }
  }

  /**
   * Report usage for active clients back to the coordinator
   */
  private async refreshQuotas(): Promise<void> {
    const activeClients = Array.from(this.localQuota.keys());

    for (const clientId of activeClients) {
      const quota = this.localQuota.get(clientId)!;

      // Report usage so the coordinator can track global consumption
      if (quota.used > 0) {
        await this.globalCoordinator.reportUsage({
          cellId: this.cellId,
          clientId,
          used: quota.used
        });
        quota.used = 0;
      }
    }
  }

  private async recordUsage(clientId: string, count: number): Promise<void> {
    const key = `cell:${this.cellId}:ratelimit:${clientId}`;
    await this.localRedis.incrby(key, count);
  }

  private async getLocalUsage(clientId: string): Promise<number> {
    const key = `cell:${this.cellId}:ratelimit:${clientId}`;
    const usage = await this.localRedis.get(key);
    return parseInt(usage || '0', 10);
  }
}

interface CellQuota {
  remaining: number;
  used: number;
  resetAt: number;
}

interface QuotaRequest {
  cellId: string;
  clientId: string;
  requestedQuota: number;
  currentUsage: number;
}

interface QuotaAllocation {
  granted: number;
  expiresAt: number;
}

interface UsageReport {
  cellId: string;
  clientId: string;
  used: number;
}

interface GlobalCoordinator {
  requestQuota(request: QuotaRequest): Promise<QuotaAllocation>;
  reportUsage(report: UsageReport): Promise<void>;
}
```

Stripe uses a cell-based architecture for its rate limiting. Each geographic cell (US, EU, APAC) manages local limits while a global aggregation layer ensures total limits are respected. This approach lets them serve 99th-percentile latencies under 10ms while accurately enforcing global limits.
Distributed systems fail. Network partitions, node crashes, and Redis outages are inevitable. A robust distributed rate limiter must handle these gracefully.
```typescript
/**
 * Resilient rate limiter with multiple fallback layers
 */

// Minimal interfaces for the limiters built earlier in this page;
// any object with a compatible checkLimit() works here.
interface DistributedRateLimiter {
  checkLimit(clientId: string, limit: number): Promise<RateLimitResult>;
  isHealthy(): boolean;
}

interface LocalRateLimiter {
  checkLimit(clientId: string, limit: number): Promise<RateLimitResult>;
}

interface RateLimitLayer {
  name: string;
  limiter: { checkLimit(clientId: string, limit: number): Promise<RateLimitResult> };
  healthCheck: () => boolean;
  timeout: number; // Milliseconds
}

class ResilientRateLimiter {
  private readonly layers: RateLimitLayer[];

  constructor(
    private readonly distributed: DistributedRateLimiter,
    private readonly local: LocalRateLimiter,
    private readonly emergency: EmergencyRateLimiter
  ) {
    this.layers = [
      {
        name: 'distributed',
        limiter: this.distributed,
        healthCheck: () => this.distributed.isHealthy(),
        timeout: 100 // 100ms timeout
      },
      {
        name: 'local',
        limiter: this.local,
        healthCheck: () => true, // Always available
        timeout: 1 // 1ms (in-memory)
      },
      {
        name: 'emergency',
        limiter: this.emergency,
        healthCheck: () => true,
        timeout: 1
      }
    ];
  }

  /**
   * Try each layer in order until one succeeds
   */
  async checkLimit(
    clientId: string,
    limit: number
  ): Promise<RateLimitResult> {
    for (const layer of this.layers) {
      if (!layer.healthCheck()) {
        continue; // Skip unhealthy layer
      }

      try {
        const result = await this.withTimeout(
          layer.limiter.checkLimit(clientId, limit),
          layer.timeout
        );

        // Record which layer handled this
        this.metrics.recordLayerUsage(layer.name);
        return result;
      } catch (error) {
        console.warn(`Rate limiter layer ${layer.name} failed:`, error);
        this.metrics.recordLayerFailure(layer.name);
        continue; // Try next layer
      }
    }

    // All layers failed - fail open as a last resort
    console.error('All rate limiter layers failed, failing open');
    this.metrics.recordFailOpen();
    return { allowed: true, remaining: limit, limit, retryAfter: 0 };
  }

  private async withTimeout<T>(
    promise: Promise<T>,
    timeoutMs: number
  ): Promise<T> {
    return Promise.race([
      promise,
      new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error('Timeout')), timeoutMs)
      )
    ]);
  }

  private metrics = {
    recordLayerUsage: (layer: string) => { /* Prometheus counter */ },
    recordLayerFailure: (layer: string) => { /* Prometheus counter */ },
    recordFailOpen: () => { /* Prometheus counter + alert */ }
  };
}

/**
 * Emergency rate limiter with very conservative limits
 * Used when all other systems are down
 */
class EmergencyRateLimiter {
  private counts: Map<string, number> = new Map();
  private lastReset: number = Date.now();
  private readonly resetIntervalMs = 60000;

  // Emergency limits: 10% of normal to prevent abuse during an outage
  private readonly emergencyLimitPercent = 0.1;

  async checkLimit(
    clientId: string,
    normalLimit: number
  ): Promise<RateLimitResult> {
    const now = Date.now();

    // Reset all counters periodically
    if (now - this.lastReset > this.resetIntervalMs) {
      this.counts.clear();
      this.lastReset = now;
    }

    const emergencyLimit = Math.ceil(normalLimit * this.emergencyLimitPercent);
    const current = this.counts.get(clientId) || 0;

    if (current >= emergencyLimit) {
      return {
        allowed: false,
        remaining: 0,
        limit: emergencyLimit,
        retryAfter: Math.ceil((this.lastReset + this.resetIntervalMs - now) / 1000)
      };
    }

    this.counts.set(clientId, current + 1);
    return {
      allowed: true,
      remaining: emergencyLimit - current - 1,
      limit: emergencyLimit,
      retryAfter: 0
    };
  }
}
```

Distributed rate limiting is a complex but critical component of production API systems. Let's consolidate our understanding:
- Centralized Redis counters are simple and accurate, but every check pays a network round trip and the store must scale with traffic.
- Hybrid local + async sync keeps checks in memory and accepts bounded inaccuracy in exchange for latency.
- Gossip protocols remove the central dependency entirely, at the cost of eventual consistency and careful tuning.
- Cell-based architectures partition the problem geographically, with static, dynamic, or overcommit quota allocation across cells.
- Whatever the architecture, layered fallbacks and explicit fail-open or fail-closed policies determine behavior when components fail.
What's Next:
We've covered how to coordinate rate limits across infrastructure. But how do we design the policies themselves? In the next page, we'll explore Per-User vs Per-API Limits—the strategies for designing flexible, hierarchical rate limiting policies that serve different use cases fairly.
You now understand the full landscape of distributed rate limiting—from centralized Redis to gossip protocols to cell-based global architecture. You can design systems that maintain accurate limits across hundreds of servers while preserving sub-millisecond latency. Next, we'll dive into policy design.