Imagine you're managing a popular concert venue. You want to control the flow of people entering, but you also understand that people arrive in groups—not one by one in perfect intervals. A rigid "exactly one person per second" rule would create frustrating queues, while "unlimited entry" would overwhelm your venue.
The solution? Give each person a token at the door. Tokens accumulate up to a maximum (the bucket size), and each person entering consumes one token. If tokens are available, entry is immediate. If the bucket is empty, wait for more tokens to accumulate.
This is the Token Bucket Algorithm—and it has been the industry standard for rate limiting for decades. Used by AWS, Google Cloud, Stripe, and virtually every major API provider, it elegantly balances burst accommodation with sustained rate enforcement.
By the end of this page, you will understand the token bucket algorithm's mechanics, mathematical properties, implementation patterns, and practical considerations. You'll know how to configure buckets for your specific use cases and understand the algorithm's behavior under various traffic patterns.
The token bucket algorithm operates on a beautifully simple principle: tokens are added to a bucket at a constant rate, and each request consumes tokens from the bucket. Let's formalize the mechanics.
The Algorithm:
When a request arrives:
1. Compute the tokens accumulated since the last update: `elapsed_time × refill_rate`
2. Update the bucket, capping at capacity: `tokens = min(tokens + accumulated, bucket_size)`
3. If `tokens >= cost`, allow the request and deduct `tokens -= cost`; otherwise, reject it.

This lazy evaluation approach is efficient—we only update the bucket when requests arrive, not continuously.
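The steps above can be sketched in a few lines. This is a minimal illustration (names like `makeBucket` are ours, not a standard API) that takes the current time as a parameter so its behavior is deterministic; a fuller, production-oriented implementation appears later on this page.

```typescript
// Minimal token bucket: state is just a token count and a timestamp.
// Time (in seconds) is passed in so the behavior is easy to test.
function makeBucket(bucketSize: number, refillRate: number, start: number) {
  let tokens = bucketSize; // start with a full bucket
  let lastRefill = start;

  return {
    tryConsume(now: number, cost = 1): boolean {
      // Step 1: tokens accumulated since the last update
      const accumulated = (now - lastRefill) * refillRate;
      // Step 2: add them, capped at the bucket size
      tokens = Math.min(bucketSize, tokens + accumulated);
      lastRefill = now;
      // Step 3: consume if enough tokens remain, otherwise reject
      if (tokens >= cost) {
        tokens -= cost;
        return true;
      }
      return false;
    },
  };
}
```

Note that the bucket only needs two numbers of state, and all the work happens inside `tryConsume`—there is no background process.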
Understanding the mathematical properties of the token bucket helps you configure it correctly and predict its behavior under various conditions.
| Property | Formula | Interpretation |
|---|---|---|
| Maximum burst | b (bucket size) | Maximum requests that can be processed instantaneously when bucket is full |
| Sustained rate | r (refill rate) | Long-term average rate the system can sustain |
| Time to refill | b / r | Time to refill an empty bucket to full |
| Burst interval | b / r | Minimum time between maximum bursts |
| Tokens after time T | min(b, t₀ + r×T) | Where t₀ is initial tokens; capped at bucket size |
| Max requests in time T | b + r×T | Initial burst plus sustained rate over time |
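The formulas in the table are easy to sanity-check in code. Here is a quick sketch using assumed values of b = 100 and r = 10 (the same configuration used in the implementation example on this page):

```typescript
// Sanity-checking the token bucket formulas with b = 100, r = 10.
const b = 100; // bucket size (maximum burst)
const r = 10;  // refill rate (tokens per second)

// Time to refill an empty bucket to full: b / r
const refillTime = b / r; // seconds

// Tokens after T seconds, starting from t0 tokens: min(b, t0 + r*T)
const tokensAfter = (t0: number, T: number) => Math.min(b, t0 + r * T);

// Max requests in T seconds, starting from a full bucket: b + r*T
const maxRequests = (T: number) => b + r * T;
```

For example, an empty bucket reaches 30 tokens after 3 seconds, and a client starting with a full bucket can get at most 700 requests through in a 60-second window.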
A larger bucket allows bigger bursts but also means longer recovery time after a burst. A smaller bucket provides tighter control but may reject legitimate bursty traffic. The key is matching bucket size to expected legitimate burst patterns—like page loads that trigger multiple API calls.
Example Scenario:

Configure a bucket with:
- Bucket size b = 100 tokens
- Refill rate r = 10 tokens/second

Behavior:
- A full bucket absorbs an instantaneous burst of 100 requests.
- After the burst, traffic is throttled to the sustained rate of 10 requests/second.
- An empty bucket takes b / r = 10 seconds to refill completely, so any window of T seconds that starts with a full bucket admits at most 100 + 10×T requests.
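This burst-then-throttle behavior can be simulated directly. The sketch below (assuming a bucket of 100 tokens refilling at 10/second, with simulated time so the run is deterministic) shows a client firing 150 requests at once, then retrying a second later:

```typescript
// Simulate a burst against a token bucket (bucketSize = 100, refillRate = 10/s).
const bucketSize = 100;
const refillRate = 10;
let tokens = bucketSize;
let lastRefill = 0;

function tryConsume(now: number): boolean {
  tokens = Math.min(bucketSize, tokens + (now - lastRefill) * refillRate);
  lastRefill = now;
  if (tokens >= 1) {
    tokens -= 1;
    return true;
  }
  return false;
}

// 150 requests arrive at t = 0: the first 100 drain the bucket, the rest fail.
let allowed = 0;
for (let i = 0; i < 150; i++) if (tryConsume(0)) allowed++;

// By t = 1, ten tokens have accumulated, so ten more requests succeed
// before the bucket runs dry again.
let allowedLater = 0;
for (let i = 0; i < 20; i++) if (tryConsume(1)) allowedLater++;
```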
Let's build a production-ready token bucket implementation. We'll start with a single-node version, then discuss distributed considerations.
```typescript
interface TokenBucketConfig {
  bucketSize: number;       // Maximum tokens (burst capacity)
  refillRate: number;       // Tokens per second
  costPerRequest?: number;  // Default: 1
}

interface BucketState {
  tokens: number;
  lastRefillTime: number; // Unix timestamp in ms
}

class TokenBucket {
  private config: Required<TokenBucketConfig>;
  private state: BucketState;

  constructor(config: TokenBucketConfig) {
    this.config = {
      ...config,
      costPerRequest: config.costPerRequest ?? 1,
    };
    // Start with a full bucket
    this.state = {
      tokens: this.config.bucketSize,
      lastRefillTime: Date.now(),
    };
  }

  /**
   * Attempt to consume tokens for a request.
   * Returns true if allowed, false if rate limited.
   */
  tryConsume(cost: number = this.config.costPerRequest): boolean {
    this.refill();
    if (this.state.tokens >= cost) {
      this.state.tokens -= cost;
      return true;
    }
    return false;
  }

  /**
   * Get current bucket state (for rate limit headers)
   */
  getState(): { remaining: number; resetTime: number } {
    this.refill();
    const timeToFullRefill =
      (this.config.bucketSize - this.state.tokens) / this.config.refillRate;
    return {
      remaining: Math.floor(this.state.tokens),
      resetTime: Date.now() + timeToFullRefill * 1000,
    };
  }

  /**
   * Refill tokens based on elapsed time (lazy evaluation)
   */
  private refill(): void {
    const now = Date.now();
    const elapsedSeconds = (now - this.state.lastRefillTime) / 1000;
    const tokensToAdd = elapsedSeconds * this.config.refillRate;
    this.state.tokens = Math.min(
      this.config.bucketSize,
      this.state.tokens + tokensToAdd
    );
    this.state.lastRefillTime = now;
  }
}

// Usage example
const limiter = new TokenBucket({
  bucketSize: 100, // Allow burst of 100 requests
  refillRate: 10,  // Sustain 10 requests/second
});

function handleRequest(req: Request): Response {
  if (!limiter.tryConsume()) {
    const state = limiter.getState();
    return new Response(
      JSON.stringify({
        error: "rate_limit_exceeded",
        retry_after_ms: state.resetTime - Date.now(),
      }),
      {
        status: 429,
        headers: {
          "X-RateLimit-Remaining": String(state.remaining),
          "Retry-After": String(Math.ceil((state.resetTime - Date.now()) / 1000)),
        },
      }
    );
  }
  // Process request normally...
  return processRequest(req);
}
```

Notice we don't run a background timer to add tokens. Instead, we calculate accumulated tokens when a request arrives. This lazy evaluation is more efficient and avoids timer overhead, especially when managing millions of buckets (one per user).
In practice, you need separate buckets for different clients—identified by API key, user ID, IP address, or other attributes. This requires managing a collection of buckets.
```typescript
interface RateLimiterConfig {
  bucketSize: number;
  refillRate: number;
  keyExtractor: (req: Request) => string;
  cleanupIntervalMs?: number; // How often to purge stale buckets
  bucketTTLMs?: number;       // How long to keep idle buckets
}

class RateLimiterManager {
  private buckets: Map<string, { bucket: TokenBucket; lastAccess: number }>;
  private config: Required<RateLimiterConfig>;
  private cleanupInterval: NodeJS.Timeout;

  constructor(config: RateLimiterConfig) {
    this.config = {
      ...config,
      cleanupIntervalMs: config.cleanupIntervalMs ?? 60000, // 1 minute
      bucketTTLMs: config.bucketTTLMs ?? 3600000,           // 1 hour
    };
    this.buckets = new Map();

    // Periodic cleanup to prevent memory leaks
    this.cleanupInterval = setInterval(
      () => this.cleanup(),
      this.config.cleanupIntervalMs
    );
  }

  tryConsume(req: Request, cost: number = 1): boolean {
    const key = this.config.keyExtractor(req);
    const entry = this.getOrCreateBucket(key);
    entry.lastAccess = Date.now();
    return entry.bucket.tryConsume(cost);
  }

  private getOrCreateBucket(key: string) {
    let entry = this.buckets.get(key);
    if (!entry) {
      entry = {
        bucket: new TokenBucket({
          bucketSize: this.config.bucketSize,
          refillRate: this.config.refillRate,
        }),
        lastAccess: Date.now(),
      };
      this.buckets.set(key, entry);
    }
    return entry;
  }

  private cleanup(): void {
    const now = Date.now();
    for (const [key, entry] of this.buckets) {
      if (now - entry.lastAccess > this.config.bucketTTLMs) {
        this.buckets.delete(key);
      }
    }
  }

  destroy(): void {
    clearInterval(this.cleanupInterval);
  }
}

// Usage: Rate limit by API key
const apiKeyLimiter = new RateLimiterManager({
  bucketSize: 1000,
  refillRate: 100,
  keyExtractor: (req) => req.headers.get("X-API-Key") ?? "anonymous",
});

// Usage: Rate limit by IP address
const ipLimiter = new RateLimiterManager({
  bucketSize: 60,
  refillRate: 1,
  keyExtractor: (req) =>
    req.headers.get("X-Forwarded-For")?.split(",")[0] ?? "unknown",
});
```

Without cleanup, bucket maps grow unbounded. If you have millions of unique clients, you'll exhaust memory. The cleanup mechanism purges buckets that haven't been accessed recently—they'll be recreated fresh if the client returns.
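TTL-based purging is one strategy; another is to cap the number of buckets and evict the least-recently-used entry when the cap is reached, which bounds memory regardless of traffic patterns. Here is a sketch of that alternative (the `LruBuckets` class and `maxBuckets` setting are illustrative, not part of the manager above), relying on the fact that a JavaScript `Map` iterates in insertion order:

```typescript
// LRU-style bucket store: re-inserting a key on access moves it to the end
// of the Map's iteration order, so the first key is always the stalest.
class LruBuckets<V> {
  private entries = new Map<string, V>();

  constructor(private maxBuckets: number) {}

  get(key: string, create: () => V): V {
    let value = this.entries.get(key);
    if (value !== undefined) {
      this.entries.delete(key); // refresh recency
    } else {
      value = create();
      if (this.entries.size >= this.maxBuckets) {
        // Evict the least-recently-used (first) entry
        const oldest = this.entries.keys().next().value!;
        this.entries.delete(oldest);
      }
    }
    this.entries.set(key, value);
    return value;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The trade-off: evicting an active client's bucket effectively grants it a fresh, full bucket on its next request, so the cap should be sized well above the expected number of concurrently active clients.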
Not all API endpoints are equal. A simple health check consumes far fewer resources than a complex search query or data export. Token buckets support this naturally through variable token costs.
| Endpoint | Token Cost | Rationale |
|---|---|---|
| GET /health | 0 | No-cost monitoring endpoints |
| GET /users/:id | 1 | Simple database lookup |
| GET /users | 5 | Paginated list query |
| POST /search | 10 | Full-text search across indexes |
| POST /export | 50 | Expensive data export operation |
| POST /batch | N | Dynamic cost based on batch size |
```typescript
// Define endpoint costs
const ENDPOINT_COSTS: Record<string, number> = {
  "GET:/health": 0,
  "GET:/users/:id": 1,
  "GET:/users": 5,
  "POST:/search": 10,
  "POST:/export": 50,
};

function getEndpointCost(method: string, path: string): number {
  // Normalize path (e.g., /users/123 -> /users/:id)
  const normalizedPath = normalizePath(path);
  const key = `${method}:${normalizedPath}`;
  return ENDPOINT_COSTS[key] ?? 1; // Default cost of 1
}

function handleRequest(req: Request): Response {
  const cost = getEndpointCost(req.method, req.url);

  // Health checks bypass rate limiting entirely
  if (cost === 0) {
    return processRequest(req);
  }

  if (!limiter.tryConsume(req, cost)) {
    return rateLimitResponse(req);
  }

  return processRequest(req);
}

// For batch endpoints, calculate dynamic cost
function handleBatchRequest(req: Request): Response {
  const items = req.body.items;
  const cost = Math.min(items.length, 100); // Cap at 100 to prevent abuse

  if (!limiter.tryConsume(req, cost)) {
    return rateLimitResponse(req);
  }

  return processBatch(req);
}
```

Profile your endpoints to understand actual resource consumption. A search endpoint that queries Elasticsearch and processes results might consume 10x the resources of a simple GET. Your cost weights should reflect this reality.
The token bucket is often confused with the leaky bucket algorithm. While related, they have distinct behaviors that suit different use cases.
Key Difference:
The token bucket controls the average rate while allowing bursts. The leaky bucket controls the instantaneous rate, smoothing bursts into a constant flow.
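To make the contrast concrete, here is a minimal leaky-bucket sketch (illustrative names, with simulated time for determinism). Instead of granting tokens, it models a fixed-capacity queue that drains at a constant rate—accepted requests wait their turn rather than being served in a burst:

```typescript
// Leaky bucket as a queue: requests join a FIFO queue (capacity = queueSize)
// that drains at a fixed drainRate, smoothing bursts into a steady flow.
class LeakyBucket {
  private queueDepth = 0;
  private lastDrain = 0;

  constructor(private queueSize: number, private drainRate: number) {}

  // Returns true if the request fits in the queue (to be processed later,
  // at the constant drain rate); false if the queue overflows.
  offer(now: number): boolean {
    const drained = (now - this.lastDrain) * this.drainRate;
    this.queueDepth = Math.max(0, this.queueDepth - drained);
    this.lastDrain = now;
    if (this.queueDepth < this.queueSize) {
      this.queueDepth += 1;
      return true;
    }
    return false;
  }
}
```

Note the behavioral difference: a token bucket with capacity 5 would *serve* 5 burst requests immediately, while this leaky bucket merely *admits* 5 requests into a queue and releases them one at a time at `drainRate`.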
For API rate limiting, the token bucket is almost always preferred because:

- Real clients are naturally bursty—a single page load can trigger a dozen API calls at once, and the bucket absorbs that without queuing or rejecting them.
- It still enforces the long-term average rate precisely, keeping capacity planning simple.
- It requires only constant state per client (a token count and a timestamp) and constant work per request.
The token bucket algorithm is the workhorse of API rate limiting. Let's consolidate what we've learned:

- Tokens refill at a constant rate r up to a maximum of b; each request consumes tokens, so b bounds the burst size and r bounds the sustained average rate.
- Lazy refill on request arrival avoids background timers and scales to millions of per-client buckets.
- Per-client bucket collections need a cleanup strategy, or memory grows without bound.
- Variable token costs let a single limiter reflect the real resource cost of each endpoint.
- Unlike the leaky bucket, the token bucket accommodates bursts while still enforcing the average rate.
What's Next:
The token bucket is excellent for controlling average rates with burst accommodation. But what if you need more precise time-window control? The next page explores the Sliding Window Algorithm—an approach that provides smoother rate limiting without the "boundary problem" of fixed windows.
You now have a deep understanding of the token bucket algorithm and can implement it in production. This knowledge forms the foundation for more advanced rate limiting strategies we'll explore next.