Behind every rate limiting decision—every request allowed, every request rejected—lies an algorithm tracking time, counting requests, and making split-second decisions. The choice of algorithm determines not just whether you can implement rate limiting, but how precisely you can control traffic, how much memory you consume, and how fairly you treat your users.
Rate limiting algorithms range from simple counters that can be implemented in a few lines of code to sophisticated sliding window implementations that provide mathematical accuracy. Each has trade-offs in precision, memory usage, implementation complexity, and performance. Understanding these trade-offs is essential for choosing the right algorithm for your specific use case.
By the end of this page, you will deeply understand five core rate limiting algorithms—Token Bucket, Leaky Bucket, Fixed Window, Sliding Window Log, and Sliding Window Counter. You'll know how each works, their memory and computational characteristics, their strengths and weaknesses, and when to choose each for different scenarios.
Before diving into specific algorithms, let's establish the criteria we use to evaluate them. Different systems have different constraints, and the 'best' algorithm depends entirely on your requirements.
| Criterion | What It Measures | Why It Matters |
|---|---|---|
| Memory Efficiency | Memory per tracked entity (IP, user, key) | At millions of tracked entities, memory adds up; O(1) state per entity vs O(n) in requests per entity matters |
| Accuracy | How closely actual rate matches configured limit | Inaccurate algorithms allow bursts above limit or reject requests below limit |
| Burst Handling | Ability to allow short bursts while enforcing long-term rate | Real traffic is bursty; pure smoothing frustrates legitimate users |
| Computational Cost | CPU per rate limit check | At millions of requests/second, every microsecond matters |
| Implementation Complexity | Difficulty of correct implementation | Complex algorithms are harder to implement, test, debug, and maintain |
| Distributed Support | Ease of coordination across multiple nodes | Most production systems require distributed rate limiting |
| Fairness | Equal treatment of requests within limit | Some algorithms can starve late-arriving requests within a window |
| Smooth Rate Enforcement | Whether rate enforcement is gradual or spiky | Some algorithms allow bursts at window boundaries that can stress systems |
There is no universally 'best' rate limiting algorithm. Token Bucket is excellent for allowing bursts while enforcing average rates. Fixed Window is simple and memory-efficient but has boundary problems. Choose based on your specific constraints and requirements.
The Token Bucket algorithm is perhaps the most widely used rate limiting algorithm, favored for its intuitive model and excellent balance of features. It's used by AWS, Google Cloud Platform, and countless enterprise systems.
The mental model:
Imagine a bucket that holds tokens. The bucket has a maximum capacity (let's say 100 tokens). Tokens are added to the bucket at a constant rate (say, 10 tokens per second). When a request arrives, it takes a token from the bucket. If the bucket is empty, the request is rejected (or queued).
This simple model naturally supports:

- Bursts: a client with a full bucket can spend up to its full capacity at once.
- Sustained rate control: over the long run, throughput cannot exceed the refill rate.
- Variable operation costs: expensive operations can consume more than one token.
```typescript
/**
 * Token Bucket Rate Limiter
 *
 * State per client:
 * - tokens: Current number of available tokens
 * - lastRefillTime: Timestamp of last token refill
 */
interface TokenBucketState {
  tokens: number;
  lastRefillTime: number;
}

class TokenBucketRateLimiter {
  private readonly capacity: number;   // Maximum tokens in bucket
  private readonly refillRate: number; // Tokens added per second
  private readonly buckets: Map<string, TokenBucketState> = new Map();

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
  }

  /**
   * Check if request should be allowed
   * @param clientId - Unique identifier for the client
   * @param tokensNeeded - Tokens required (usually 1, can vary by operation cost)
   * @returns Whether the request is allowed
   */
  isAllowed(clientId: string, tokensNeeded: number = 1): boolean {
    const now = Date.now();
    let state = this.buckets.get(clientId);

    // Initialize new clients with full bucket
    if (!state) {
      state = { tokens: this.capacity, lastRefillTime: now };
      this.buckets.set(clientId, state);
    }

    // Calculate tokens to add based on elapsed time
    const elapsed = (now - state.lastRefillTime) / 1000; // seconds
    const tokensToAdd = elapsed * this.refillRate;

    // Refill bucket (capped at capacity)
    state.tokens = Math.min(this.capacity, state.tokens + tokensToAdd);
    state.lastRefillTime = now;

    // Check if we have enough tokens
    if (state.tokens >= tokensNeeded) {
      state.tokens -= tokensNeeded;
      return true;
    }

    return false;
  }

  /**
   * Get time until next token is available
   */
  getRetryAfter(clientId: string): number {
    const state = this.buckets.get(clientId);
    if (!state) return 0;

    // Account for tokens refilled since the last check (without mutating state)
    const elapsed = (Date.now() - state.lastRefillTime) / 1000;
    const tokens = Math.min(this.capacity, state.tokens + elapsed * this.refillRate);
    if (tokens >= 1) return 0;

    return Math.ceil(((1 - tokens) / this.refillRate) * 1000); // ms
  }
}

// Usage example
const limiter = new TokenBucketRateLimiter(
  100, // capacity: 100 tokens max
  10   // refillRate: 10 tokens per second
);

// This allows bursts of up to 100 requests immediately,
// but sustained rate is limited to 10 requests/second
const isAllowed = limiter.isAllowed("user-123");
```

Token Bucket is ideal when you want to: allow burst traffic while limiting sustained rate, charge different costs for different operations, or implement a model users can easily understand ('you have X requests remaining'). It's the default choice for most API rate limiting use cases.
The Leaky Bucket algorithm is often confused with Token Bucket, but serves a fundamentally different purpose. While Token Bucket controls the average rate while allowing bursts, Leaky Bucket enforces a smooth, constant output rate.
The mental model:
Imagine a bucket with a hole in the bottom. Water (requests) flows into the bucket. The hole 'leaks' water out at a constant rate. If water (requests) arrive faster than the bucket can leak, the bucket fills up. Once full, additional water (requests) overflows and is discarded.
Key difference from Token Bucket: a Token Bucket lets saved-up tokens fund bursts of output, while a Leaky Bucket absorbs bursts into the bucket and drains them at a constant rate, so the output is always smooth.
```typescript
/**
 * Leaky Bucket Rate Limiter (as a meter)
 *
 * This implementation treats Leaky Bucket as a rate limiter that
 * smooths traffic by processing requests at a constant rate.
 *
 * State per client:
 * - waterLevel: Current "water" in the bucket
 * - lastLeakTime: Timestamp of last leak calculation
 */
interface LeakyBucketState {
  waterLevel: number;
  lastLeakTime: number;
}

class LeakyBucketRateLimiter {
  private readonly capacity: number; // Maximum bucket size
  private readonly leakRate: number; // "Water" leaked per second
  private readonly buckets: Map<string, LeakyBucketState> = new Map();

  constructor(capacity: number, leakRate: number) {
    this.capacity = capacity;
    this.leakRate = leakRate;
  }

  /**
   * Attempt to add a request to the bucket
   * @returns true if request accepted, false if bucket overflow (rejected)
   */
  isAllowed(clientId: string, amount: number = 1): boolean {
    const now = Date.now();
    let state = this.buckets.get(clientId);

    // Initialize new clients with empty bucket
    if (!state) {
      state = { waterLevel: 0, lastLeakTime: now };
      this.buckets.set(clientId, state);
    }

    // Calculate water that has leaked since last check
    const elapsed = (now - state.lastLeakTime) / 1000;
    const leaked = elapsed * this.leakRate;

    // Update water level (can't go below 0)
    state.waterLevel = Math.max(0, state.waterLevel - leaked);
    state.lastLeakTime = now;

    // Check if adding this request would overflow
    if (state.waterLevel + amount <= this.capacity) {
      state.waterLevel += amount;
      return true;
    }

    // Bucket would overflow - reject request
    return false;
  }

  /**
   * Get queue depth (how full is the bucket)
   */
  getQueueDepth(clientId: string): number {
    const state = this.buckets.get(clientId);
    if (!state) return 0;

    const now = Date.now();
    const elapsed = (now - state.lastLeakTime) / 1000;
    const leaked = elapsed * this.leakRate;
    return Math.max(0, state.waterLevel - leaked);
  }
}

// Usage: Allow bursts into queue but process at steady rate
const limiter = new LeakyBucketRateLimiter(
  50, // capacity: queue up to 50 requests
  10  // leakRate: process 10 requests per second
);

// Requests are accepted as long as bucket isn't full
// Processing happens at constant 10/second rate
```

Token Bucket vs Leaky Bucket: A Critical Distinction
These two algorithms are often confused because they both use the bucket metaphor, but they solve different problems:
| Aspect | Token Bucket | Leaky Bucket |
|---|---|---|
| Mental model | Tokens accumulate, requests consume them | Requests fill bucket, processing drains it |
| Burst behavior | Allows bursts up to capacity | Queues bursts, processes at constant rate |
| Output pattern | Variable (bursty allowed) | Smooth (constant rate) |
| Best for | API rate limits | Traffic shaping/smoothing |
| When bucket full | Requests succeed (tokens are available) | New requests overflow and are rejected |
| When bucket empty | Requests fail | Processing stops |
Use Leaky Bucket when:

- You need a smooth, constant output rate rather than burst tolerance.
- You're protecting a downstream system (a database or third-party service) that handles steady load better than spikes.
- Briefly queuing requests is acceptable in exchange for smoothing.
In many implementations, Leaky Bucket is literally a queue with a size limit. Requests enter the queue, and a worker processes them at a fixed rate. This transforms bursty input into smooth output—exactly what you want for traffic shaping to protect databases or downstream services.
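Here is a minimal sketch of that queue-based form, assuming an in-process queue drained by a timer; the class name and the `processFn` callback are illustrative, not from a specific library:

```typescript
// Leaky Bucket as a bounded queue: bursty arrivals, steady departures.
// A minimal in-process sketch; names and the process callback are illustrative.
class LeakyBucketQueue<T> {
  private queue: T[] = [];

  constructor(
    private readonly capacity: number, // max queued requests
    ratePerSecond: number,             // steady processing rate
    processFn: (item: T) => void       // work to perform per request
  ) {
    // Drain one item at a fixed interval - this is the "leak"
    setInterval(() => {
      const item = this.queue.shift();
      if (item !== undefined) processFn(item);
    }, 1000 / ratePerSecond);
  }

  /** Enqueue a request; returns false if the bucket would overflow */
  offer(item: T): boolean {
    if (this.queue.length >= this.capacity) return false; // overflow: drop
    this.queue.push(item);
    return true;
  }
}

// Usage: accept bursts of up to 50 requests, process exactly 10/second
const shaper = new LeakyBucketQueue<string>(50, 10, (req) => {
  console.log(`processing ${req}`);
});
shaper.offer("request-1");
```

However bursty the callers are, `processFn` fires at most ten times per second, which is the smoothing guarantee downstream systems care about.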
The Fixed Window Counter is the simplest rate limiting algorithm. It divides time into fixed windows (e.g., 60-second windows) and counts requests within each window. When the count exceeds the limit, requests are rejected until the next window begins.
The mental model:
Imagine a counter that resets every minute. Each request increments the counter. If the counter exceeds your limit, reject requests. When a new minute starts, the counter resets to zero.
Simplicity is its strength: Fixed Window can be implemented with a single atomic increment operation and a timestamp comparison, making it extremely fast and easy to distribute.
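That atomicity is also why Fixed Window distributes so easily: all servers can share one counter. Below is a minimal sketch of the distributed pattern using Redis's INCR and EXPIRE, assuming the `ioredis` client; the key scheme and TTL choice are illustrative:

```typescript
// Distributed Fixed Window via a shared counter - a sketch, assuming ioredis.
// Every app server runs this same code against one Redis instance.
import Redis from "ioredis";

const redis = new Redis(); // assumes Redis on localhost:6379

async function isAllowedDistributed(
  clientId: string,
  limit: number,
  windowSeconds: number
): Promise<boolean> {
  // All servers compute the same window number, so they share one counter
  const window = Math.floor(Date.now() / (windowSeconds * 1000));
  const key = `ratelimit:${clientId}:${window}`; // illustrative key scheme

  const count = await redis.incr(key); // single atomic increment
  if (count === 1) {
    // First request in this window: set a TTL so old windows clean themselves up
    await redis.expire(key, windowSeconds * 2);
  }
  return count <= limit;
}
```

For a single process, the same algorithm needs nothing more than a counter and a timestamp, as the full implementation below shows.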
```typescript
/**
 * Fixed Window Counter Rate Limiter
 *
 * Simple and efficient, but has boundary problems.
 *
 * State per client:
 * - count: Requests in current window
 * - windowStart: Timestamp when current window began
 */
interface FixedWindowState {
  count: number;
  windowStart: number;
}

class FixedWindowRateLimiter {
  private readonly limit: number;        // Max requests per window
  private readonly windowSizeMs: number; // Window duration in ms
  private readonly windows: Map<string, FixedWindowState> = new Map();

  constructor(limit: number, windowSizeSeconds: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeSeconds * 1000;
  }

  /**
   * Get current window start time (always aligned to window boundaries)
   */
  private getCurrentWindow(now: number): number {
    return Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const currentWindow = this.getCurrentWindow(now);
    let state = this.windows.get(clientId);

    // New client or window has changed - reset counter
    if (!state || state.windowStart !== currentWindow) {
      state = { count: 0, windowStart: currentWindow };
      this.windows.set(clientId, state);
    }

    // Check limit
    if (state.count < this.limit) {
      state.count++;
      return true;
    }

    return false;
  }

  /**
   * Get remaining requests in current window
   */
  getRemaining(clientId: string): number {
    const now = Date.now();
    const currentWindow = this.getCurrentWindow(now);
    const state = this.windows.get(clientId);

    if (!state || state.windowStart !== currentWindow) {
      return this.limit;
    }
    return Math.max(0, this.limit - state.count);
  }

  /**
   * Get time until window resets
   */
  getResetTime(clientId: string): number {
    const now = Date.now();
    const currentWindow = this.getCurrentWindow(now);
    return currentWindow + this.windowSizeMs - now;
  }
}

// Usage: 100 requests per 60-second window
const limiter = new FixedWindowRateLimiter(100, 60);
```

The Boundary Problem:
Fixed Window has a critical flaw: the 'boundary problem' or 'burst-at-edges' issue. Consider a limit of 100 requests per minute:
|-------- Window 1 --------|-------- Window 2 --------|
                   99 reqs | 100 reqs
                   (0:59)  | (1:00)
A client makes 99 requests at second 59 of Window 1, then 100 requests at second 0 of Window 2. That's 199 requests in 2 seconds—nearly 2x the intended limit!
This happens because each window is independent. The algorithm has no visibility into request patterns near boundaries.
Avoid Fixed Window for security-critical rate limiting (authentication, payment endpoints) where the boundary problem could be exploited. It's acceptable for soft limits where approximate enforcement is sufficient (general API usage tracking, non-critical analytics).
The Sliding Window Log algorithm provides perfect accuracy by maintaining a log of all request timestamps within the rate limit window. Instead of fixed boundaries, it uses a 'sliding' window that always looks back from the current moment.
The mental model:
Keep a list of all request timestamps. For each new request, remove timestamps older than the window size, then count remaining entries. If the count is under the limit, allow the request and add its timestamp to the log.
Mathematical precision: This algorithm enforces exact limits. If your limit is 100 requests per minute, there will never be more than 100 requests in any 60-second period, regardless of when those periods start.
```typescript
/**
 * Sliding Window Log Rate Limiter
 *
 * Perfectly accurate but memory-intensive.
 *
 * State per client:
 * - timestamps: Array of all request timestamps within window
 */
class SlidingWindowLogRateLimiter {
  private readonly limit: number;
  private readonly windowSizeMs: number;
  private readonly logs: Map<string, number[]> = new Map();

  constructor(limit: number, windowSizeSeconds: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeSeconds * 1000;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or initialize the log for this client
    let timestamps = this.logs.get(clientId);
    if (!timestamps) {
      timestamps = [];
      this.logs.set(clientId, timestamps);
    }

    // Remove timestamps outside the current window.
    // Timestamps are appended in order, so expired entries sit at the front.
    while (timestamps.length > 0 && timestamps[0] <= windowStart) {
      timestamps.shift();
    }

    // Check if we're under the limit
    if (timestamps.length < this.limit) {
      timestamps.push(now);
      return true;
    }

    return false;
  }

  getRemaining(clientId: string): number {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;
    const timestamps = this.logs.get(clientId);

    if (!timestamps) return this.limit;

    // Count timestamps within window
    const inWindow = timestamps.filter(ts => ts > windowStart).length;
    return Math.max(0, this.limit - inWindow);
  }

  /**
   * Get time until oldest request expires (next slot opens)
   */
  getRetryAfter(clientId: string): number {
    const now = Date.now();
    const timestamps = this.logs.get(clientId);

    if (!timestamps || timestamps.length < this.limit) {
      return 0;
    }

    // Oldest timestamp will expire first
    const oldestExpiry = timestamps[0] + this.windowSizeMs;
    return Math.max(0, oldestExpiry - now);
  }
}

// Usage: Exactly 100 requests in any rolling 60-second window
const limiter = new SlidingWindowLogRateLimiter(100, 60);
```

Use Sliding Window Log when perfect accuracy is critical AND request volumes are low (< 100 requests/minute per client). It's ideal for high-value, low-frequency operations: payment processing, admin actions, expensive computations. Avoid for high-throughput general API limiting where memory costs would be prohibitive.
The Sliding Window Counter algorithm combines the memory efficiency of Fixed Window with the accuracy of Sliding Window Log. It achieves this by weighting the previous window's count based on overlap with the current sliding window.
The mental model:
Maintain counts for the current and previous fixed windows. When checking the rate limit, calculate a weighted sum: the full count of the current window plus a proportion of the previous window based on how much it overlaps with the sliding window.
The magic formula:
weighted_count = current_window_count + (previous_window_count × overlap_percentage)
If we're 30% through the current window, we include 70% of the previous window's count.
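A quick worked example (numbers are illustrative): with a limit of 100 per minute, suppose the previous window counted 80 requests and we are 18 seconds (30%) into the current window, which has counted 40. The weighted count is 40 + 80 × 0.7 = 96, which is under 100, so the next request is allowed.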
```typescript
/**
 * Sliding Window Counter Rate Limiter
 *
 * Best of both worlds: O(1) memory with near-perfect accuracy.
 *
 * State per client:
 * - currentCount: Requests in current window
 * - previousCount: Requests in previous window
 * - currentWindowStart: When current window began
 */
interface SlidingWindowCounterState {
  currentCount: number;
  previousCount: number;
  currentWindowStart: number;
}

class SlidingWindowCounterRateLimiter {
  private readonly limit: number;
  private readonly windowSizeMs: number;
  private readonly windows: Map<string, SlidingWindowCounterState> = new Map();

  constructor(limit: number, windowSizeSeconds: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeSeconds * 1000;
  }

  private getWindowStart(timestamp: number): number {
    return Math.floor(timestamp / this.windowSizeMs) * this.windowSizeMs;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const currentWindowStart = this.getWindowStart(now);
    let state = this.windows.get(clientId);

    // Initialize or handle window transitions
    if (!state) {
      state = {
        currentCount: 0,
        previousCount: 0,
        currentWindowStart: currentWindowStart,
      };
      this.windows.set(clientId, state);
    } else if (state.currentWindowStart !== currentWindowStart) {
      // Window has changed - shift counts
      const windowsElapsed = Math.floor(
        (currentWindowStart - state.currentWindowStart) / this.windowSizeMs
      );

      if (windowsElapsed === 1) {
        // Normal transition: current becomes previous
        state.previousCount = state.currentCount;
        state.currentCount = 0;
      } else {
        // Multiple windows elapsed: both reset
        state.previousCount = 0;
        state.currentCount = 0;
      }
      state.currentWindowStart = currentWindowStart;
    }

    // Calculate position within current window (0.0 to 1.0)
    const windowElapsed = (now - currentWindowStart) / this.windowSizeMs;

    // Calculate overlap of sliding window with previous fixed window
    const previousWindowWeight = 1 - windowElapsed;

    // Weighted count across both windows
    const weightedCount =
      state.currentCount + state.previousCount * previousWindowWeight;

    if (weightedCount < this.limit) {
      state.currentCount++;
      return true;
    }

    return false;
  }

  getRemaining(clientId: string): number {
    const now = Date.now();
    const currentWindowStart = this.getWindowStart(now);
    const state = this.windows.get(clientId);

    if (!state) return this.limit;

    // Handle stale state
    if (state.currentWindowStart !== currentWindowStart) {
      return this.limit; // Effectively reset
    }

    const windowElapsed = (now - currentWindowStart) / this.windowSizeMs;
    const previousWindowWeight = 1 - windowElapsed;
    const weightedCount =
      state.currentCount + state.previousCount * previousWindowWeight;

    return Math.max(0, Math.floor(this.limit - weightedCount));
  }
}

// Usage: ~100 requests per rolling 60-second window
const limiter = new SlidingWindowCounterRateLimiter(100, 60);
```

Accuracy Analysis:
Sliding Window Counter isn't perfectly accurate, but its error is small and predictable: it is zero when the previous window's traffic was evenly spread, and grows only as that traffic skews toward one edge of the window.
Why it works:
The weighted approximation assumes requests in the previous window were evenly distributed. If a client made 60 requests in the previous window, we assume ~1 request per second. When calculating overlap, we credit them proportionally.
This assumption breaks down with very bursty traffic within a single fixed window, but for most traffic patterns, it's remarkably accurate.
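To see the worst case concretely (numbers are illustrative): with a limit of 100, suppose all 100 previous-window requests arrived in its final second, and we are 10% into the current window. Every one of those requests is still inside the true sliding window, yet the weighted estimate counts only 100 × 0.9 = 90 of them, so roughly 10 extra requests can slip through. With evenly spread traffic, the error disappears entirely.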
Sliding Window Counter is often the best default choice for API rate limiting. It provides near-perfect accuracy with O(1) memory—only 2 counters per client regardless of request volume. It eliminates the Fixed Window boundary problem while avoiding Sliding Window Log's memory explosion. Major cloud providers use variants of this algorithm.
Let's compare all five algorithms across key dimensions to help you choose the right one for your use case:
| Algorithm | Memory | Accuracy | Burst Handling | Complexity | Best For |
|---|---|---|---|---|---|
| Token Bucket | O(1) per client | Exact (for sustained rate) | Excellent (by design) | Low | APIs needing burst tolerance |
| Leaky Bucket | O(1) per client | Exact (for output rate) | Queues bursts | Low | Traffic shaping/smoothing |
| Fixed Window | O(1) per client | Poor (2x at boundaries) | All at once per window | Very Low | Simple quota tracking |
| Sliding Window Log | O(n) per client | Perfect | Natural handling | Medium | Low-volume, high-value operations |
| Sliding Window Counter | O(1) per client | Near-perfect (~99%) | Natural handling | Medium | General API rate limiting |
Decision Framework:
┌─────────────────────────────────────────────────────────────┐
│ Do you need to allow bursts while limiting sustained rate?  │
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Token Bucket
                    └───────────┘
                          │ NO
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Do you need smooth, constant output rate?                   │
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Leaky Bucket
                    └───────────┘
                          │ NO
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Is request volume low (< 100/min per client)?               │
│ AND do you need perfect accuracy?                           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Sliding Window Log
                    └───────────┘
                          │ NO
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Is approximate accuracy okay (~99%)? Need memory efficiency?│
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Sliding Window Counter
                    └───────────┘        (Recommended Default)
                          │ NO
                          ▼
                    Fixed Window
        (Only for very simple use cases)
In production, you often combine algorithms. Example: Token Bucket for per-user API limits (allowing bursts) + Leaky Bucket queue in front of your database (smoothing load) + Fixed Window for simple daily quotas (where boundaries don't matter). Different layers can use different algorithms based on their specific requirements.
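A minimal sketch of such layering, reusing the limiter classes defined earlier on this page (the limits, tier ordering, and return labels are illustrative):

```typescript
// Layered rate limiting: each layer uses the algorithm suited to its job.
// Reuses the limiter classes defined earlier; limits shown are illustrative.
const perUserLimiter = new TokenBucketRateLimiter(100, 10);    // bursts OK
const dailyQuota = new FixedWindowRateLimiter(10_000, 86_400); // boundaries don't matter
const dbShaper = new LeakyBucketRateLimiter(50, 10);           // smooth downstream load

function handleRequest(
  userId: string
): "ok" | "quota_exceeded" | "rate_limited" | "overloaded" {
  if (!dailyQuota.isAllowed(userId)) return "quota_exceeded";  // coarse daily cap
  if (!perUserLimiter.isAllowed(userId)) return "rate_limited"; // per-user burst/rate
  if (!dbShaper.isAllowed("database")) return "overloaded";     // shared downstream protection
  return "ok";
}
```

Note the ordering: cheap, coarse checks run first, and the shared downstream limiter runs last so a single user's burst is throttled before it can consume the global budget.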
We've covered the five fundamental rate limiting algorithms in depth. Let's consolidate the key insights:

- Token Bucket: O(1) memory; allows bursts up to capacity while enforcing a sustained average rate. The default for APIs that need burst tolerance.
- Leaky Bucket: O(1) memory; turns bursty input into a smooth, constant output rate. Best for traffic shaping.
- Fixed Window: the simplest and cheapest, but boundary bursts can allow nearly 2x the intended limit.
- Sliding Window Log: perfectly accurate but O(n) memory per client; reserve it for low-volume, high-value operations.
- Sliding Window Counter: near-perfect accuracy with O(1) memory; the recommended default for general API rate limiting.
What's next:
Understanding algorithms is essential, but most production systems require rate limiting across multiple servers. The next page covers Distributed Rate Limiting—the challenges of coordination, consistency trade-offs, and architectures for implementing rate limiting at scale across distributed infrastructure.
You now understand the core rate limiting algorithms, their mechanics, and their trade-offs. Whether you're implementing rate limiting from scratch or evaluating existing solutions, this knowledge helps you make informed decisions about which algorithm fits your specific requirements.