A rate limiter running on a single server is straightforward—we've mastered token buckets and sliding windows. But modern API platforms run across hundreds or thousands of servers in multiple geographic regions. When a client with a 100 requests/minute limit makes a request that lands on server A in Virginia, how does server B in Singapore know about it?
This is the challenge of distributed rate limiting: maintaining consistent, accurate limits across a globally distributed fleet while preserving the sub-millisecond latency that users expect. It's a classic distributed systems problem that forces us to navigate the trade-offs of the CAP theorem.
In this page, we'll explore how companies like Stripe, Cloudflare, and GitHub solve distributed rate limiting at scale—from centralized coordination to gossip protocols to intelligent approximations that sacrifice a little accuracy for orders of magnitude better performance.
By the end of this page, you'll understand: (1) Why distributed rate limiting is fundamentally hard, (2) Centralized approaches with Redis/Memcached, (3) Local + remote hybrid architectures, (4) Gossip protocols for eventually consistent limiting, (5) Cell-based architectures for geographic distribution, and (6) The trade-offs each approach makes.
Before diving into solutions, let's deeply understand why distributed rate limiting is challenging. This understanding prevents us from choosing inappropriate solutions.
Consider a simplified scenario:
Client: 100 requests/minute limit
Server A Server B
├── Receives request 1 ├── Receives request 2
├── Local count: 1 ├── Local count: 1
├── Allows ├── Allows
└── ??? └── ???
Question: What is the client's *actual* request count?
Answer: 2 (but neither server knows this)
Without coordination, each server makes decisions based on incomplete information. With N servers and no coordination, a client can potentially make N × limit requests before any server notices: a client with a 100 requests/minute limit spread across 20 servers could slip through up to 2,000 requests/minute, as the sketch below illustrates.
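To make the multiplication concrete, here is a minimal sketch (the `LocalOnlyLimiter` class and round-robin client are hypothetical) of what happens when every node enforces the full limit independently:

```typescript
// Minimal sketch: N independent fixed-window counters with no coordination.
class LocalOnlyLimiter {
  private count = 0;
  constructor(private readonly limit: number) {}

  allow(): boolean {
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

// 20 servers, each enforcing the full 100 requests/minute limit in isolation.
const servers = Array.from({ length: 20 }, () => new LocalOnlyLimiter(100));

// A client spreads 5,000 requests across the fleet within a single window.
let allowedCount = 0;
for (let i = 0; i < 5000; i++) {
  if (servers[i % servers.length].allow()) allowedCount++;
}

console.log(allowedCount); // 2000 (20 × 100): 20× the intended quota
```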
Distributed rate limiting is subject to the CAP theorem:
Consistency (C): All nodes see the same request count at the same time.
Availability (A): Every request gets a rate limit decision.
Partition Tolerance (P): The system continues working during network failures.
We must choose between:
CP (Consistent + Partition-tolerant): Every check consults a shared, authoritative counter before answering. Counts stay exact, but each check pays a network round trip, and during a partition some requests cannot get a decision at all.
AP (Available + Partition-tolerant): Nodes decide locally and synchronize asynchronously. Checks stay fast and always return an answer, but counts can temporarily diverge, so clients may briefly exceed their limits.
Real-world choice: Most production systems choose AP, accepting some over-limiting in exchange for availability and low latency. The trade-off is acceptable because rate limits are usually approximate quotas, not hard security boundaries.
Consider: If your API has a 50ms p99 latency target and rate limiting adds 10ms for a remote check, you've consumed 20% of your budget. For sub-10ms APIs, synchronous remote rate limiting is simply not viable. Architecture must account for this constraint.
The simplest distributed rate limiting architecture uses a centralized counter service—typically Redis—that all application servers query. While not infinitely scalable, this approach works well for many production systems.
┌─────────────────────────────────────┐
│ Load Balancer │
└─────────────┬───────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ App │ │ App │ │ App │
│ Server A │ │ Server B │ │ Server C │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└──────────────┬───────┴──────────────────────┘
│
▼
┌─────────────────┐
│ Redis │
│ (Counters) │
└─────────────────┘
Every rate limit check goes through Redis, ensuring all servers see the same state.
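Before the implementation itself, here is a rough usage sketch of how an application server might call such a limiter. The Express-style middleware, the header names, and the API-key lookup are illustrative assumptions; the `CentralizedRateLimiter` class it relies on is defined just below.

```typescript
// Hypothetical wiring: check the shared limiter on every incoming request.
import express from 'express';

const app = express();
const limiter = new CentralizedRateLimiter(process.env.REDIS_URL ?? 'redis://localhost:6379');

app.use(async (req, res, next) => {
  // Assumption: clients are identified by an API key header, falling back to IP.
  const clientId = req.header('x-api-key') ?? req.ip ?? 'anonymous';
  const result = await limiter.checkLimit(clientId, 100, 60); // 100 requests/minute

  res.setHeader('RateLimit-Limit', result.limit);
  res.setHeader('RateLimit-Remaining', result.remaining);

  if (!result.allowed) {
    res.setHeader('Retry-After', result.retryAfter);
    res.status(429).json({ error: 'rate_limit_exceeded' });
    return;
  }
  next();
});
```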
```typescript
/**
 * Centralized rate limiter using Redis
 * All nodes share a single source of truth
 */
import Redis from 'ioredis';

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  limit: number;
  retryAfter: number; // Seconds until the client should retry
  resetAt?: number;   // Epoch milliseconds when the current window resets
}

class CentralizedRateLimiter {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl, {
      // Aggressive timeouts: a slow rate limiter is worse than no rate limiter
      maxRetriesPerRequest: 3,
      connectTimeout: 1000,
      commandTimeout: 100,
    });
  }

  /**
   * Check the rate limit using a Redis Lua script (atomic increment + check)
   */
  async checkLimit(
    clientId: string,
    limit: number,
    windowSizeSeconds: number
  ): Promise<RateLimitResult> {
    const now = Date.now();
    const windowKey = this.getWindowKey(clientId, now, windowSizeSeconds);

    try {
      // EVAL runs the whole check-and-increment atomically on the Redis server
      const result = await this.redis.eval(
        FIXED_WINDOW_SCRIPT,
        1,                            // Number of keys
        windowKey,                    // KEYS[1]
        limit.toString(),             // ARGV[1]
        windowSizeSeconds.toString(), // ARGV[2]
        now.toString()                // ARGV[3]
      ) as [number, number, number];

      const [allowed, remaining, retryAfter] = result;

      return {
        allowed: allowed === 1,
        remaining,
        limit,
        retryAfter,
        resetAt: this.calculateResetAt(now, windowSizeSeconds)
      };
    } catch (error) {
      // Fail-open: allow on Redis errors
      console.error('Rate limiter Redis error:', error);
      return this.failOpen(limit);
    }
  }

  private getWindowKey(clientId: string, now: number, windowSize: number): string {
    // Key includes the window number so old windows expire automatically
    const window = Math.floor(now / (windowSize * 1000));
    return `ratelimit:${clientId}:${window}`;
  }

  private calculateResetAt(now: number, windowSize: number): number {
    const windowMs = windowSize * 1000;
    const currentWindowStart = Math.floor(now / windowMs) * windowMs;
    return currentWindowStart + windowMs;
  }

  /**
   * Fail-open policy: allow the request on Redis failure.
   * Protects availability at the cost of rate limit accuracy.
   */
  private failOpen(limit: number): RateLimitResult {
    return {
      allowed: true,
      remaining: limit,
      limit,
      retryAfter: 0,
      resetAt: Date.now() + 60000
    };
  }
}

// Lua script for an atomic fixed-window counter
const FIXED_WINDOW_SCRIPT = `
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window_size = tonumber(ARGV[2]) * 1000
local now = tonumber(ARGV[3])

-- Get current count
local current = tonumber(redis.call('GET', key) or '0')

if current < limit then
  -- Increment and set expiry
  redis.call('INCR', key)
  redis.call('PEXPIRE', key, window_size * 2)
  return {1, limit - current - 1, 0}
else
  return {0, 0, math.ceil(window_size / 1000)}
end
`;
```

Redis Cluster: To scale beyond a single node, Redis Cluster shards keys across multiple nodes, distributing load:
Client A → Hash → Shard 1
Client B → Hash → Shard 2
Client C → Hash → Shard 3
With 10 shards, throughput scales ~10×.
Read Replicas: For read-heavy workloads (checking limits without incrementing), replicas can serve reads:
Master: Handles INCR (writes)
Replica 1-N: Handle GET (reads)
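A minimal sketch of the client-side wiring with ioredis, assuming a three-shard cluster; hostnames and option values are illustrative:

```typescript
// Point the limiter at a Redis Cluster instead of a single node.
import Redis from 'ioredis';

const cluster = new Redis.Cluster(
  [
    { host: 'redis-shard-1', port: 6379 },
    { host: 'redis-shard-2', port: 6379 },
    { host: 'redis-shard-3', port: 6379 },
  ],
  {
    // Route read-only commands (e.g. GET for "check without consuming") to replicas.
    // Replicas lag slightly, so counts read this way may be marginally stale.
    scaleReads: 'slave',
    redisOptions: { connectTimeout: 1000, commandTimeout: 100 },
  }
);

// Each counter key hashes to a slot, so different clients spread across shards.
// A hash tag such as `ratelimit:{clientId}:<window>` would pin all of one
// client's keys to the same shard if multi-key operations were needed.
```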
Hot Key Problem: Very active clients become hot keys that overload specific shards. Common mitigations are to split a hot client's counter across several sub-keys that are summed on read, to cache the hot client's state locally for a few milliseconds, or to move such clients onto the hybrid approach described in the next section. A key-splitting sketch follows below.
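One possible shape of key splitting, reusing the fixed-window keys from the code above; `SPLIT_FACTOR` and the helper names are illustrative:

```typescript
import Redis from 'ioredis';

// Split a hot client's counter across several sub-keys so its increments
// spread over multiple shards instead of hammering one.
const SPLIT_FACTOR = 8;

function subKey(clientId: string, window: number, shard: number): string {
  return `ratelimit:${clientId}:${window}:${shard}`;
}

async function incrementSplit(redis: Redis, clientId: string, window: number): Promise<void> {
  // Write to a random sub-counter; any one shard sees only ~1/SPLIT_FACTOR of the writes.
  const shard = Math.floor(Math.random() * SPLIT_FACTOR);
  await redis.incr(subKey(clientId, window, shard));
}

async function totalCount(redis: Redis, clientId: string, window: number): Promise<number> {
  // Reads must sum every sub-counter to reconstruct the client's total usage.
  const keys = Array.from({ length: SPLIT_FACTOR }, (_, i) => subKey(clientId, window, i));
  const values = await Promise.all(keys.map((k) => redis.get(k)));
  return values.reduce((sum, v) => sum + parseInt(v ?? '0', 10), 0);
}
```

The trade-off is that each read becomes SPLIT_FACTOR GETs instead of one, so splitting is typically applied only to clients identified as hot.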
The most common production pattern combines local rate limiting with asynchronous remote synchronization. This approach provides low latency while maintaining reasonable accuracy.
┌────────────────────────────────────┐
│ Client Request │
└─────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Application Server │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Local Rate Limiter │ │
│ │ • In-memory token bucket │ │
│ │ • Sub-microsecond latency │ │
│ │ • Per-node limit (global limit ÷ node count) │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ╔══════════╧═══════════╗ │
│ ║ Local Allowed? ║ │
│ ╚══════════╤═══════════╝ │
│ ▼ No ▼ Yes │
│ Return 429 Mark for Background Sync │
└───────────────────────────────────┬─────────────────────────────────┘
│ (Async, every 100ms)
▼
┌───────────────────────┐
│ Redis Cluster │
│ (Global Counters) │
└───────────────────────┘
How it works:
1. Each node runs an in-memory token bucket sized to its share of the global limit (roughly global limit ÷ node count, plus a small buffer).
2. Every request is checked against this local bucket first, so the decision costs microseconds, not a network round trip.
3. If the local bucket is empty, the request is rejected immediately.
4. If the request is allowed, the increment is added to a pending batch.
5. A background task flushes the batch to Redis every ~100ms and pulls back the global count, refreshing each node's estimate of the client's true usage.
```typescript
/**
 * Hybrid Rate Limiter: Local + Remote
 * Fast local checks with async global synchronization
 * (Simplified: per-window expiry of the cached global counts is omitted.)
 */
import Redis from 'ioredis';

// Assumes the single-node TokenBucket from the previous page: constructed with
// (capacity, refillPerSecond) and exposing consume(): { allowed, retryAfter, resetAt }.
class HybridRateLimiter {
  private localBuckets: Map<string, TokenBucket> = new Map();
  private pendingIncrements: Map<string, number> = new Map();
  private globalCounts: Map<string, number> = new Map();
  private readonly syncIntervalMs = 100; // Sync every 100ms
  private readonly localSharePercent: number;

  constructor(
    private readonly redis: Redis,
    private readonly nodeCount: number = 10
  ) {
    // Each node gets 1/N of the limit plus a buffer to absorb imbalance
    this.localSharePercent = 1.0 / nodeCount + 0.1; // 10% buffer

    // Start background sync
    setInterval(() => this.syncWithRedis(), this.syncIntervalMs);
  }

  /**
   * Main rate limit check - uses the local bucket first
   */
  async checkLimit(
    clientId: string,
    globalLimit: number,
    windowSizeSeconds: number
  ): Promise<RateLimitResult> {
    // Calculate local limit (this node's share of the global limit)
    const localLimit = Math.ceil(globalLimit * this.localSharePercent);

    // Get or create local bucket
    let bucket = this.localBuckets.get(clientId);
    if (!bucket) {
      bucket = new TokenBucket(localLimit, localLimit / windowSizeSeconds);
      this.localBuckets.set(clientId, bucket);
    }

    // Fast local check
    const localResult = bucket.consume();
    if (!localResult.allowed) {
      // Definitely denied - local share exhausted
      return {
        allowed: false,
        remaining: 0,
        limit: globalLimit,
        retryAfter: localResult.retryAfter,
        resetAt: localResult.resetAt
      };
    }

    // Allowed locally - record for sync
    this.recordIncrement(clientId);

    // Estimate global remaining from cached data
    const globalUsed = this.globalCounts.get(clientId) || 0;
    const remaining = Math.max(0, globalLimit - globalUsed - 1);

    return {
      allowed: true,
      remaining,
      limit: globalLimit,
      retryAfter: 0,
      resetAt: this.calculateResetAt(windowSizeSeconds)
    };
  }

  /**
   * Record an increment for the next batch sync
   */
  private recordIncrement(clientId: string): void {
    const current = this.pendingIncrements.get(clientId) || 0;
    this.pendingIncrements.set(clientId, current + 1);
  }

  /**
   * Background sync with Redis
   * Batches local increments and fetches global state
   */
  private async syncWithRedis(): Promise<void> {
    if (this.pendingIncrements.size === 0) return;

    const now = Date.now();
    const pipeline = this.redis.pipeline();

    // Snapshot and clear pending increments so new requests start a fresh batch
    const batch = Array.from(this.pendingIncrements.entries());
    this.pendingIncrements.clear();

    // Batch all pending increments
    for (const [clientId, count] of batch) {
      const key = this.getWindowKey(clientId, now);
      pipeline.incrby(key, count);
      pipeline.pexpire(key, 120000); // 2 minute expiry
    }

    try {
      const results = await pipeline.exec();

      // Update global counts from Redis responses (INCRBY returns the new total)
      batch.forEach(([clientId], i) => {
        const countResult = results?.[i * 2]; // [error, newTotal] for this client's INCRBY
        if (countResult && countResult[0] == null) {
          this.globalCounts.set(clientId, countResult[1] as number);
        }
      });
    } catch (error) {
      console.error('Redis sync failed:', error);
      // Continue operating with local limits only
    }
  }

  private getWindowKey(clientId: string, now: number): string {
    const minute = Math.floor(now / 60000);
    return `ratelimit:${clientId}:${minute}`;
  }

  private calculateResetAt(windowSize: number): number {
    const windowMs = windowSize * 1000;
    const now = Date.now();
    return Math.ceil(now / windowMs) * windowMs;
  }
}
```

The local share percentage significantly affects accuracy. Too small and you get excessive false rejections; too large and you allow significant traffic over the limit. Start with (1/nodeCount + 10%) and tune based on observed traffic patterns: if traffic is well balanced across nodes, reduce the buffer; if traffic is bursty, increase it (see the worked example below).
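A quick worked example of that trade-off, using illustrative numbers (10 nodes, 100 requests/minute):

```typescript
// 10 nodes, 100 req/min global limit, 10% buffer per node.
const globalLimit = 100;
const nodeCount = 10;
const localShare = 1 / nodeCount + 0.1;                 // 0.2 -> 20% per node
const localLimit = Math.ceil(globalLimit * localShare); // 20 requests per node

// Balanced traffic: each node sees ~10 req/min, comfortably under its local 20.
// Worst case before the next sync: every node fills its own bucket independently,
// so the client could briefly be allowed nodeCount * localLimit requests.
console.log(nodeCount * localLimit); // 200 -> up to 2x the intended global limit
```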
Best Case (Balanced Traffic): When requests are evenly distributed across N nodes, each node consumes roughly its local share, the aggregate stays close to the global limit, and the background sync mostly confirms what each node already assumed.
Worst Case (Skewed Traffic): When all requests hit one node, that node exhausts its small local share (1/N plus the buffer) long before the client reaches its global limit, producing false rejections even though plenty of global quota remains.
Mitigation: Adaptive rebalancing can detect skew and adjust local limits dynamically:
```typescript
if (localUsed > 0.8 * localLimit && globalUsed < 0.5 * globalLimit) {
  // This node is hot, others are cold
  // Temporarily increase this node's local share
  localShare = Math.min(0.5, localShare * 1.2);
}
```
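A slightly fuller, still hypothetical version of that rule as a standalone helper; the thresholds, the growth factor, and the bounds are tuning knobs rather than prescriptions:

```typescript
// Adjust a node's local share based on how hot it is relative to the fleet.
function rebalanceLocalShare(
  localUsed: number,
  localLimit: number,
  globalUsed: number,
  globalLimit: number,
  currentShare: number
): number {
  const localUtilization = localUsed / localLimit;
  const globalUtilization = globalUsed / globalLimit;

  if (localUtilization > 0.8 && globalUtilization < 0.5) {
    // This node is hot while the fleet is cold: grow its share, capped at 50%.
    return Math.min(0.5, currentShare * 1.2);
  }
  if (localUtilization < 0.2 && globalUtilization > 0.8) {
    // This node is cold while the fleet is near the limit: shrink its share.
    return Math.max(0.01, currentShare * 0.8);
  }
  return currentShare;
}
```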
For systems that must avoid any centralized dependency, gossip protocols enable peer-to-peer coordination. Each node shares its local counts with neighbors, eventually reaching a consistent view without a central coordinator.
Time T0: Each node has local counts
Node A: {client1: 10, client2: 5}
Node B: {client1: 8, client3: 12}
Node C: {client2: 3, client3: 7}
Time T1: Node A gossips to Node B
Node A → Node B: {client1: 10, client2: 5, timestamp: T0}
Node B merges: {client1: 18, client2: 5, client3: 12}
Time T2: Node B gossips to Node C
Node B → Node C: {client1: 18, client2: 5, client3: 12}
Node C merges: {client1: 18, client2: 8, client3: 19}
Time T3: Eventually all nodes converge to:
{client1: 18, client2: 8, client3: 19}
Key Properties:
- No central coordinator and no single point of failure; any surviving subset of nodes keeps serving decisions.
- Counts converge after a few gossip rounds (roughly O(log N) rounds for information to reach all nodes), so they are eventually consistent rather than exact.
- Between rounds each node underestimates the global count, so short bursts can exceed the limit before the fleet converges.
- Bandwidth and CPU cost grow with gossip frequency and with the number of clients being tracked.
```typescript
/**
 * Gossip-based distributed rate limiter
 * Decentralized coordination without a single point of failure
 */
class GossipRateLimiter {
  private localCounts: Map<string, CounterWithTimestamp> = new Map();
  private peerNodes: string[] = [];
  private readonly gossipIntervalMs = 50;   // Gossip every 50ms
  private readonly convergenceTarget = 200; // Expect convergence within ~200ms

  constructor(
    private readonly nodeId: string,
    private readonly nodeCount: number
  ) {
    // Start gossip protocol
    setInterval(() => this.gossip(), this.gossipIntervalMs);
  }

  /**
   * Configure peer nodes for gossip
   */
  setPeers(peers: string[]): void {
    this.peerNodes = peers.filter(p => p !== this.nodeId);
  }

  /**
   * Check rate limit using local + gossiped counts
   * (Window expiry is omitted for brevity; production code must reset counters.)
   */
  async checkLimit(
    clientId: string,
    limit: number,
    windowSizeSeconds: number
  ): Promise<RateLimitResult> {
    const entry = this.localCounts.get(clientId);
    const globalEstimate = entry?.globalEstimate || 0;

    if (globalEstimate >= limit) {
      return {
        allowed: false,
        remaining: 0,
        limit,
        retryAfter: Math.ceil(windowSizeSeconds / limit)
      };
    }

    // Increment local count
    this.incrementLocal(clientId);

    return {
      allowed: true,
      remaining: Math.max(0, limit - globalEstimate - 1),
      limit,
      retryAfter: 0
    };
  }

  /**
   * Increment local counter for a client
   */
  private incrementLocal(clientId: string): void {
    let entry = this.localCounts.get(clientId);
    if (!entry) {
      entry = {
        localCount: 0,
        remoteCounts: new Map(),
        lastUpdated: Date.now(),
        globalEstimate: 0
      };
      this.localCounts.set(clientId, entry);
    }

    entry.localCount++;
    entry.lastUpdated = Date.now();
    this.updateGlobalEstimate(entry);
  }

  /**
   * Gossip local state to a random peer
   */
  private async gossip(): Promise<void> {
    if (this.peerNodes.length === 0) return;

    // Select random peer (or use consistent hashing for stability)
    const peer = this.peerNodes[Math.floor(Math.random() * this.peerNodes.length)];

    // Build gossip message with local counts
    const message: GossipMessage = {
      sourceNode: this.nodeId,
      timestamp: Date.now(),
      counts: new Map()
    };

    for (const [clientId, entry] of this.localCounts) {
      message.counts.set(clientId, {
        nodeId: this.nodeId,
        count: entry.localCount,
        timestamp: entry.lastUpdated
      });
    }

    // Send to peer (in a real implementation, use HTTP/gRPC)
    await this.sendToPeer(peer, message);
  }

  /**
   * Handle incoming gossip from a peer
   */
  handleGossip(message: GossipMessage): void {
    for (const [clientId, peerCount] of message.counts) {
      let entry = this.localCounts.get(clientId);
      if (!entry) {
        entry = {
          localCount: 0,
          remoteCounts: new Map(),
          lastUpdated: Date.now(),
          globalEstimate: 0
        };
        this.localCounts.set(clientId, entry);
      }

      // Merge the peer's count if it is newer than what we have for that node
      const existing = entry.remoteCounts.get(peerCount.nodeId);
      if (!existing || peerCount.timestamp > existing.timestamp) {
        entry.remoteCounts.set(peerCount.nodeId, peerCount);
        this.updateGlobalEstimate(entry);
      }
    }
  }

  /**
   * Calculate global estimate from local + all known remote counts
   */
  private updateGlobalEstimate(entry: CounterWithTimestamp): void {
    let total = entry.localCount;

    // Sum all remote counts (kept per node, so nothing is double counted)
    for (const remote of entry.remoteCounts.values()) {
      total += remote.count;
    }

    entry.globalEstimate = total;
  }

  private async sendToPeer(peer: string, message: GossipMessage): Promise<void> {
    // Implementation: HTTP POST, gRPC call, or message queue
    // Typically uses a lightweight, best-effort transport
  }
}

interface CounterWithTimestamp {
  localCount: number;
  remoteCounts: Map<string, PeerCount>;
  lastUpdated: number;
  globalEstimate: number;
}

interface PeerCount {
  nodeId: string;
  count: number;
  timestamp: number;
}

interface GossipMessage {
  sourceNode: string;
  timestamp: number;
  counts: Map<string, PeerCount>;
}
```

How frequently nodes gossip determines how quickly they converge, and at what cost:

| Parameter | Faster Gossip | Slower Gossip |
|---|---|---|
| Convergence Time | 10-50ms | 200-1000ms |
| Network Bandwidth | Higher | Lower |
| Accuracy During Burst | Better | Worse |
| CPU Overhead | Higher | Lower |
| Suitable For | Strict limits, low latency | Loose limits, bandwidth-constrained |
For globally distributed systems serving users across continents, cross-region latency makes real-time synchronization impractical. Cell-based architectures solve this by partitioning the problem geographically.
Global Control Plane
(Configuration, Aggregated Analytics)
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ US-EAST │ │ EU-WEST │ │ AP-SOUTH│
│ CELL │ │ CELL │ │ CELL │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ Local │ │ Local │ │ Local │
│ Redis │ │ Redis │ │ Redis │
│ Cluster │ │ Cluster │ │ Cluster │
└─────────┘ └─────────┘ └─────────┘
Cell Properties:
- Each cell is self-contained, with its own application servers and its own Redis cluster, so a failure in one region is isolated from the others.
- Clients are routed to, and counted in, the cell nearest to them, keeping rate limit checks within a single region.
- The global control plane handles configuration and aggregated analytics, not per-request decisions.
- Cells coordinate on quota asynchronously, using one of the allocation strategies below.
Static Allocation:
Global limit: 10,000 requests/hour
US-EAST: 4,000 (40%)
EU-WEST: 3,500 (35%)
AP-SOUTH: 2,500 (25%)
Based on expected traffic distribution. Simple but inflexible.
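A minimal sketch of static allocation as configuration; the share table and helper function are illustrative:

```typescript
// Fixed per-cell shares, resolved to a local limit at request time.
const CELL_SHARES: Record<string, number> = {
  'us-east': 0.40,
  'eu-west': 0.35,
  'ap-south': 0.25,
};

function cellLimit(cellId: string, globalLimit: number): number {
  const share = CELL_SHARES[cellId] ?? 0;
  return Math.floor(globalLimit * share);
}

// cellLimit('us-east', 10_000) === 4000; each cell then enforces its slice
// with any single-region limiter, such as the centralized Redis approach above.
```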
Dynamic Allocation: Cells request quota from a global coordinator:
```typescript
async function requestQuota(cellId: string, clientId: string): Promise<number> {
  const globalRemaining = await globalCoordinator.getRemaining(clientId);
  const cellCount = await globalCoordinator.getActiveCellCount();

  // Grant a fair share to this cell
  const quota = Math.floor(globalRemaining / cellCount);
  await globalCoordinator.reserveQuota(clientId, cellId, quota);
  return quota;
}
```
Cells request quota in batches (e.g., 100 request blocks) to minimize coordinator traffic.
Overcommit + Reconcile: Each cell gets 100% of the limit but tracks global usage:
Cell A: Allows up to 10,000, used 3,000
Cell B: Allows up to 10,000, used 4,500
Cell C: Allows up to 10,000, used 2,000
Global usage: 9,500 (reported via async sync)
Remaining: 500 (distributed proactively)
When global usage approaches limit, cells reduce local limits.
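A small sketch of how a cell might derive its effective local ceiling from reported global usage; the 80% threshold and proportional cutover are illustrative choices:

```typescript
// Overcommit + reconcile: start at 100% of the global limit, then throttle
// the local ceiling as globally reported usage climbs.
function effectiveLocalLimit(
  globalLimit: number,
  reportedGlobalUsage: number // from async cross-cell sync, may lag by seconds
): number {
  const remaining = Math.max(0, globalLimit - reportedGlobalUsage);
  const usageRatio = reportedGlobalUsage / globalLimit;

  if (usageRatio < 0.8) {
    // Plenty of headroom: keep overcommitting at the full global limit.
    return globalLimit;
  }
  // Near the limit: each cell only allows what is known to remain globally,
  // accepting some false rejections in exchange for bounding the overshoot.
  return remaining;
}

// With 9,500 of 10,000 used (as in the example above), every cell caps itself at 500.
console.log(effectiveLocalLimit(10_000, 9_500)); // 500
```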
```typescript
/**
 * Cell-based rate limiter for global deployments
 * Each geographic cell manages local limits with global coordination
 */
import Redis from 'ioredis';

class CellRateLimiter {
  private localQuota: Map<string, CellQuota> = new Map();
  private readonly quotaRefreshIntervalMs = 5000; // Refresh quota every 5s

  constructor(
    private readonly cellId: string,
    private readonly localRedis: Redis,
    private readonly globalCoordinator: GlobalCoordinator
  ) {
    // Periodic quota refresh from the global coordinator
    setInterval(() => this.refreshQuotas(), this.quotaRefreshIntervalMs);
  }

  /**
   * Check rate limit using the cell's local quota
   */
  async checkLimit(
    clientId: string,
    globalLimit: number
  ): Promise<RateLimitResult> {
    let quota = this.localQuota.get(clientId);

    if (!quota || quota.remaining <= 0) {
      // Need to fetch/refresh quota from the global coordinator
      quota = await this.fetchQuota(clientId, globalLimit);
      this.localQuota.set(clientId, quota);
    }

    if (quota.remaining <= 0) {
      return {
        allowed: false,
        remaining: 0,
        limit: globalLimit,
        retryAfter: Math.ceil((quota.resetAt - Date.now()) / 1000)
      };
    }

    // Consume from local quota
    quota.remaining--;
    quota.used++;

    // Record in local Redis for durability
    await this.recordUsage(clientId, 1);

    return {
      allowed: true,
      remaining: quota.remaining,
      limit: globalLimit,
      retryAfter: 0
    };
  }

  /**
   * Fetch a quota allocation from the global coordinator
   */
  private async fetchQuota(
    clientId: string,
    globalLimit: number
  ): Promise<CellQuota> {
    try {
      const allocation = await this.globalCoordinator.requestQuota({
        cellId: this.cellId,
        clientId,
        requestedQuota: Math.ceil(globalLimit / 10), // Request in 10% blocks
        currentUsage: await this.getLocalUsage(clientId)
      });

      return {
        remaining: allocation.granted,
        used: 0,
        resetAt: allocation.expiresAt
      };
    } catch (error) {
      // Fail-open with a conservative local limit
      console.error('Global coordinator unreachable:', error);
      return {
        remaining: Math.ceil(globalLimit / 100), // 1% fallback
        used: 0,
        resetAt: Date.now() + 60000
      };
    }
  }

  /**
   * Report usage for active clients back to the coordinator
   */
  private async refreshQuotas(): Promise<void> {
    const activeClients = Array.from(this.localQuota.keys());

    for (const clientId of activeClients) {
      const quota = this.localQuota.get(clientId)!;

      // Report usage so the coordinator can track global consumption
      if (quota.used > 0) {
        await this.globalCoordinator.reportUsage({
          cellId: this.cellId,
          clientId,
          used: quota.used
        });
        quota.used = 0;
      }
    }
  }

  private async recordUsage(clientId: string, count: number): Promise<void> {
    const key = `cell:${this.cellId}:ratelimit:${clientId}`;
    await this.localRedis.incrby(key, count);
  }

  private async getLocalUsage(clientId: string): Promise<number> {
    const key = `cell:${this.cellId}:ratelimit:${clientId}`;
    const usage = await this.localRedis.get(key);
    return parseInt(usage || '0', 10);
  }
}

interface CellQuota {
  remaining: number;
  used: number;
  resetAt: number;
}

interface QuotaRequest {
  cellId: string;
  clientId: string;
  requestedQuota: number;
  currentUsage: number;
}

interface QuotaAllocation {
  granted: number;
  expiresAt: number;
}

interface UsageReport {
  cellId: string;
  clientId: string;
  used: number;
}

interface GlobalCoordinator {
  requestQuota(request: QuotaRequest): Promise<QuotaAllocation>;
  reportUsage(report: UsageReport): Promise<void>;
}
```

Stripe uses a cell-based architecture for its rate limiting. Each geographic cell (US, EU, APAC) manages local limits while a global aggregation layer ensures total limits are respected. This approach lets them serve 99th-percentile latencies under 10ms while accurately enforcing global limits.
Distributed systems fail. Network partitions, node crashes, and Redis outages are inevitable. A robust distributed rate limiter must handle these gracefully.
```typescript
/**
 * Resilient rate limiter with multiple fallback layers
 */

// Minimal interfaces for the limiters built earlier in this page;
// any object with a compatible checkLimit() works here.
interface DistributedRateLimiter {
  checkLimit(clientId: string, limit: number): Promise<RateLimitResult>;
  isHealthy(): boolean;
}

interface LocalRateLimiter {
  checkLimit(clientId: string, limit: number): Promise<RateLimitResult>;
}

interface RateLimitLayer {
  name: string;
  limiter: { checkLimit(clientId: string, limit: number): Promise<RateLimitResult> };
  healthCheck: () => boolean;
  timeout: number; // Milliseconds
}

class ResilientRateLimiter {
  private readonly layers: RateLimitLayer[];

  constructor(
    private readonly distributed: DistributedRateLimiter,
    private readonly local: LocalRateLimiter,
    private readonly emergency: EmergencyRateLimiter
  ) {
    this.layers = [
      {
        name: 'distributed',
        limiter: this.distributed,
        healthCheck: () => this.distributed.isHealthy(),
        timeout: 100 // 100ms timeout
      },
      {
        name: 'local',
        limiter: this.local,
        healthCheck: () => true, // Always available
        timeout: 1 // 1ms (in-memory)
      },
      {
        name: 'emergency',
        limiter: this.emergency,
        healthCheck: () => true,
        timeout: 1
      }
    ];
  }

  /**
   * Try each layer in order until one succeeds
   */
  async checkLimit(
    clientId: string,
    limit: number
  ): Promise<RateLimitResult> {
    for (const layer of this.layers) {
      if (!layer.healthCheck()) {
        continue; // Skip unhealthy layer
      }

      try {
        const result = await this.withTimeout(
          layer.limiter.checkLimit(clientId, limit),
          layer.timeout
        );

        // Record which layer handled this
        this.metrics.recordLayerUsage(layer.name);
        return result;
      } catch (error) {
        console.warn(`Rate limiter layer ${layer.name} failed:`, error);
        this.metrics.recordLayerFailure(layer.name);
        continue; // Try next layer
      }
    }

    // All layers failed - fail open as a last resort
    console.error('All rate limiter layers failed, failing open');
    this.metrics.recordFailOpen();
    return { allowed: true, remaining: limit, limit, retryAfter: 0 };
  }

  private async withTimeout<T>(
    promise: Promise<T>,
    timeoutMs: number
  ): Promise<T> {
    return Promise.race([
      promise,
      new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error('Timeout')), timeoutMs)
      )
    ]);
  }

  private metrics = {
    recordLayerUsage: (layer: string) => { /* Prometheus counter */ },
    recordLayerFailure: (layer: string) => { /* Prometheus counter */ },
    recordFailOpen: () => { /* Prometheus counter + alert */ }
  };
}

/**
 * Emergency rate limiter with very conservative limits
 * Used when all other systems are down
 */
class EmergencyRateLimiter {
  private counts: Map<string, number> = new Map();
  private lastReset: number = Date.now();
  private readonly resetIntervalMs = 60000;

  // Emergency limits: 10% of normal to prevent abuse during an outage
  private readonly emergencyLimitPercent = 0.1;

  async checkLimit(
    clientId: string,
    normalLimit: number
  ): Promise<RateLimitResult> {
    const now = Date.now();

    // Reset all counters periodically
    if (now - this.lastReset > this.resetIntervalMs) {
      this.counts.clear();
      this.lastReset = now;
    }

    const emergencyLimit = Math.ceil(normalLimit * this.emergencyLimitPercent);
    const current = this.counts.get(clientId) || 0;

    if (current >= emergencyLimit) {
      return {
        allowed: false,
        remaining: 0,
        limit: emergencyLimit,
        retryAfter: Math.ceil((this.lastReset + this.resetIntervalMs - now) / 1000)
      };
    }

    this.counts.set(clientId, current + 1);
    return {
      allowed: true,
      remaining: emergencyLimit - current - 1,
      limit: emergencyLimit,
      retryAfter: 0
    };
  }
}
```

Distributed rate limiting is a complex but critical component of production API systems. Let's consolidate our understanding:
- Centralized Redis counters are simple and accurate, but every check pays a network round trip and the store must scale with traffic.
- Hybrid local + async sync keeps checks in memory and accepts bounded inaccuracy in exchange for latency.
- Gossip protocols remove the central dependency entirely, at the cost of eventual consistency and careful tuning.
- Cell-based architectures partition the problem geographically, with static, dynamic, or overcommit quota allocation across cells.
- Whatever the architecture, layered fallbacks and explicit fail-open or fail-closed policies determine behavior when components fail.
What's Next:
We've covered how to coordinate rate limits across infrastructure. But how do we design the policies themselves? In the next page, we'll explore Per-User vs Per-API Limits—the strategies for designing flexible, hierarchical rate limiting policies that serve different use cases fairly.
You now understand the full landscape of distributed rate limiting—from centralized Redis to gossip protocols to cell-based global architecture. You can design systems that maintain accurate limits across hundreds of servers while preserving sub-millisecond latency. Next, we'll dive into policy design.