Behind every rate limiting decision—every request allowed, every request rejected—lies an algorithm tracking time, counting requests, and making split-second decisions. The choice of algorithm determines not just whether you can implement rate limiting, but how precisely you can control traffic, how much memory you consume, and how fairly you treat your users.
Rate limiting algorithms range from simple counters that can be implemented in a few lines of code to sophisticated sliding window implementations that provide mathematical accuracy. Each has trade-offs in precision, memory usage, implementation complexity, and performance. Understanding these trade-offs is essential for choosing the right algorithm for your specific use case.
By the end of this page, you will deeply understand five core rate limiting algorithms—Token Bucket, Leaky Bucket, Fixed Window, Sliding Window Log, and Sliding Window Counter. You'll know how each works, their memory and computational characteristics, their strengths and weaknesses, and when to choose each for different scenarios.
Before diving into specific algorithms, let's establish the criteria we use to evaluate them. Different systems have different constraints, and the 'best' algorithm depends entirely on your requirements.
| Criterion | What It Measures | Why It Matters |
|---|---|---|
| Memory Efficiency | Memory per tracked entity (IP, user, key) | At millions of tracked entities, memory adds up; O(1) state per entity vs O(n) in requests per entity matters |
| Accuracy | How closely actual rate matches configured limit | Inaccurate algorithms allow bursts above limit or reject requests below limit |
| Burst Handling | Ability to allow short bursts while enforcing long-term rate | Real traffic is bursty; pure smoothing frustrates legitimate users |
| Computational Cost | CPU per rate limit check | At millions of requests/second, every microsecond matters |
| Implementation Complexity | Difficulty of correct implementation | Complex algorithms are harder to implement, test, debug, and maintain |
| Distributed Support | Ease of coordination across multiple nodes | Most production systems require distributed rate limiting |
| Fairness | Equal treatment of requests within limit | Some algorithms can starve late-arriving requests within a window |
| Smooth Rate Enforcement | Whether rate enforcement is gradual or spiky | Some algorithms allow bursts at window boundaries that can stress systems |
There is no universally 'best' rate limiting algorithm. Token Bucket is excellent for allowing bursts while enforcing average rates. Fixed Window is simple and memory-efficient but has boundary problems. Choose based on your specific constraints and requirements.
The Token Bucket algorithm is perhaps the most widely used rate limiting algorithm, favored for its intuitive model and excellent balance of features. It's used by AWS, Google Cloud Platform, and countless enterprise systems.
The mental model:
Imagine a bucket that holds tokens. The bucket has a maximum capacity (let's say 100 tokens). Tokens are added to the bucket at a constant rate (say, 10 tokens per second). When a request arrives, it takes a token from the bucket. If the bucket is empty, the request is rejected (or queued).
This simple model naturally supports:

- Bursts: a client with a full bucket can spend up to its full capacity at once.
- Sustained rate control: over the long run, throughput cannot exceed the refill rate.
- Variable operation costs: expensive operations can consume more than one token.
```typescript
/**
 * Token Bucket Rate Limiter
 *
 * State per client:
 * - tokens: Current number of available tokens
 * - lastRefillTime: Timestamp of last token refill
 */
interface TokenBucketState {
  tokens: number;
  lastRefillTime: number;
}

class TokenBucketRateLimiter {
  private readonly capacity: number;   // Maximum tokens in bucket
  private readonly refillRate: number; // Tokens added per second
  private readonly buckets: Map<string, TokenBucketState> = new Map();

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
  }

  /**
   * Check if request should be allowed
   * @param clientId - Unique identifier for the client
   * @param tokensNeeded - Tokens required (usually 1, can vary by operation cost)
   * @returns Whether the request is allowed
   */
  isAllowed(clientId: string, tokensNeeded: number = 1): boolean {
    const now = Date.now();
    let state = this.buckets.get(clientId);

    // Initialize new clients with full bucket
    if (!state) {
      state = { tokens: this.capacity, lastRefillTime: now };
      this.buckets.set(clientId, state);
    }

    // Calculate tokens to add based on elapsed time
    const elapsed = (now - state.lastRefillTime) / 1000; // seconds
    const tokensToAdd = elapsed * this.refillRate;

    // Refill bucket (capped at capacity)
    state.tokens = Math.min(this.capacity, state.tokens + tokensToAdd);
    state.lastRefillTime = now;

    // Check if we have enough tokens
    if (state.tokens >= tokensNeeded) {
      state.tokens -= tokensNeeded;
      return true;
    }

    return false;
  }

  /**
   * Get time until next token is available
   */
  getRetryAfter(clientId: string): number {
    const state = this.buckets.get(clientId);
    if (!state) return 0;

    // Account for tokens refilled since the last check (without mutating state)
    const elapsed = (Date.now() - state.lastRefillTime) / 1000;
    const tokens = Math.min(this.capacity, state.tokens + elapsed * this.refillRate);
    if (tokens >= 1) return 0;

    return Math.ceil(((1 - tokens) / this.refillRate) * 1000); // ms
  }
}

// Usage example
const limiter = new TokenBucketRateLimiter(
  100, // capacity: 100 tokens max
  10   // refillRate: 10 tokens per second
);

// This allows bursts of up to 100 requests immediately,
// but sustained rate is limited to 10 requests/second
const isAllowed = limiter.isAllowed("user-123");
```

Token Bucket is ideal when you want to: allow burst traffic while limiting sustained rate, charge different costs for different operations, or implement a model users can easily understand ('you have X requests remaining'). It's the default choice for most API rate limiting use cases.
The Leaky Bucket algorithm is often confused with Token Bucket, but serves a fundamentally different purpose. While Token Bucket controls the average rate while allowing bursts, Leaky Bucket enforces a smooth, constant output rate.
The mental model:
Imagine a bucket with a hole in the bottom. Water (requests) flows into the bucket. The hole 'leaks' water out at a constant rate. If water (requests) arrive faster than the bucket can leak, the bucket fills up. Once full, additional water (requests) overflows and is discarded.
Key difference from Token Bucket: a Token Bucket lets saved-up tokens fund bursts of output, while a Leaky Bucket absorbs bursts into the bucket and drains them at a constant rate, so the output is always smooth.
```typescript
/**
 * Leaky Bucket Rate Limiter (as a meter)
 *
 * This implementation treats Leaky Bucket as a rate limiter that
 * smooths traffic by processing requests at a constant rate.
 *
 * State per client:
 * - waterLevel: Current "water" in the bucket
 * - lastLeakTime: Timestamp of last leak calculation
 */
interface LeakyBucketState {
  waterLevel: number;
  lastLeakTime: number;
}

class LeakyBucketRateLimiter {
  private readonly capacity: number; // Maximum bucket size
  private readonly leakRate: number; // "Water" leaked per second
  private readonly buckets: Map<string, LeakyBucketState> = new Map();

  constructor(capacity: number, leakRate: number) {
    this.capacity = capacity;
    this.leakRate = leakRate;
  }

  /**
   * Attempt to add a request to the bucket
   * @returns true if request accepted, false if bucket overflow (rejected)
   */
  isAllowed(clientId: string, amount: number = 1): boolean {
    const now = Date.now();
    let state = this.buckets.get(clientId);

    // Initialize new clients with empty bucket
    if (!state) {
      state = { waterLevel: 0, lastLeakTime: now };
      this.buckets.set(clientId, state);
    }

    // Calculate water that has leaked since last check
    const elapsed = (now - state.lastLeakTime) / 1000;
    const leaked = elapsed * this.leakRate;

    // Update water level (can't go below 0)
    state.waterLevel = Math.max(0, state.waterLevel - leaked);
    state.lastLeakTime = now;

    // Check if adding this request would overflow
    if (state.waterLevel + amount <= this.capacity) {
      state.waterLevel += amount;
      return true;
    }

    // Bucket would overflow - reject request
    return false;
  }

  /**
   * Get queue depth (how full is the bucket)
   */
  getQueueDepth(clientId: string): number {
    const state = this.buckets.get(clientId);
    if (!state) return 0;

    const now = Date.now();
    const elapsed = (now - state.lastLeakTime) / 1000;
    const leaked = elapsed * this.leakRate;
    return Math.max(0, state.waterLevel - leaked);
  }
}

// Usage: Allow bursts into queue but process at steady rate
const limiter = new LeakyBucketRateLimiter(
  50, // capacity: queue up to 50 requests
  10  // leakRate: process 10 requests per second
);

// Requests are accepted as long as bucket isn't full
// Processing happens at constant 10/second rate
```

Token Bucket vs Leaky Bucket: A Critical Distinction
These two algorithms are often confused because they both use the bucket metaphor, but they solve different problems:
| Aspect | Token Bucket | Leaky Bucket |
|---|---|---|
| Mental model | Tokens accumulate, requests consume them | Requests fill bucket, processing drains it |
| Burst behavior | Allows bursts up to capacity | Queues bursts, processes at constant rate |
| Output pattern | Variable (bursty allowed) | Smooth (constant rate) |
| Best for | API rate limits | Traffic shaping/smoothing |
| When bucket full | Requests succeed (tokens are available) | New requests overflow and are rejected |
| When bucket empty | Requests fail | Processing stops |
Use Leaky Bucket when:

- You need a smooth, constant output rate rather than burst tolerance.
- You're protecting a downstream system (a database or third-party service) that handles steady load better than spikes.
- Briefly queuing requests is acceptable in exchange for smoothing.
In many implementations, Leaky Bucket is literally a queue with a size limit. Requests enter the queue, and a worker processes them at a fixed rate. This transforms bursty input into smooth output—exactly what you want for traffic shaping to protect databases or downstream services.
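Here is a minimal sketch of that queue-based form, assuming an in-process queue drained by a timer; the class name and the `processFn` callback are illustrative, not from a specific library:

```typescript
// Leaky Bucket as a bounded queue: bursty arrivals, steady departures.
// A minimal in-process sketch; names and the process callback are illustrative.
class LeakyBucketQueue<T> {
  private queue: T[] = [];

  constructor(
    private readonly capacity: number, // max queued requests
    ratePerSecond: number,             // steady processing rate
    processFn: (item: T) => void       // work to perform per request
  ) {
    // Drain one item at a fixed interval - this is the "leak"
    setInterval(() => {
      const item = this.queue.shift();
      if (item !== undefined) processFn(item);
    }, 1000 / ratePerSecond);
  }

  /** Enqueue a request; returns false if the bucket would overflow */
  offer(item: T): boolean {
    if (this.queue.length >= this.capacity) return false; // overflow: drop
    this.queue.push(item);
    return true;
  }
}

// Usage: accept bursts of up to 50 requests, process exactly 10/second
const shaper = new LeakyBucketQueue<string>(50, 10, (req) => {
  console.log(`processing ${req}`);
});
shaper.offer("request-1");
```

However bursty the callers are, `processFn` fires at most ten times per second, which is the smoothing guarantee downstream systems care about.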
The Fixed Window Counter is the simplest rate limiting algorithm. It divides time into fixed windows (e.g., 60-second windows) and counts requests within each window. When the count exceeds the limit, requests are rejected until the next window begins.
The mental model:
Imagine a counter that resets every minute. Each request increments the counter. If the counter exceeds your limit, reject requests. When a new minute starts, the counter resets to zero.
Simplicity is its strength: Fixed Window can be implemented with a single atomic increment operation and a timestamp comparison, making it extremely fast and easy to distribute.
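That atomicity is also why Fixed Window distributes so easily: all servers can share one counter. Below is a minimal sketch of the distributed pattern using Redis's INCR and EXPIRE, assuming the `ioredis` client; the key scheme and TTL choice are illustrative:

```typescript
// Distributed Fixed Window via a shared counter - a sketch, assuming ioredis.
// Every app server runs this same code against one Redis instance.
import Redis from "ioredis";

const redis = new Redis(); // assumes Redis on localhost:6379

async function isAllowedDistributed(
  clientId: string,
  limit: number,
  windowSeconds: number
): Promise<boolean> {
  // All servers compute the same window number, so they share one counter
  const window = Math.floor(Date.now() / (windowSeconds * 1000));
  const key = `ratelimit:${clientId}:${window}`; // illustrative key scheme

  const count = await redis.incr(key); // single atomic increment
  if (count === 1) {
    // First request in this window: set a TTL so old windows clean themselves up
    await redis.expire(key, windowSeconds * 2);
  }
  return count <= limit;
}
```

For a single process, the same algorithm needs nothing more than a counter and a timestamp, as the full implementation below shows.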
```typescript
/**
 * Fixed Window Counter Rate Limiter
 *
 * Simple and efficient, but has boundary problems.
 *
 * State per client:
 * - count: Requests in current window
 * - windowStart: Timestamp when current window began
 */
interface FixedWindowState {
  count: number;
  windowStart: number;
}

class FixedWindowRateLimiter {
  private readonly limit: number;        // Max requests per window
  private readonly windowSizeMs: number; // Window duration in ms
  private readonly windows: Map<string, FixedWindowState> = new Map();

  constructor(limit: number, windowSizeSeconds: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeSeconds * 1000;
  }

  /**
   * Get current window start time (always aligned to window boundaries)
   */
  private getCurrentWindow(now: number): number {
    return Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const currentWindow = this.getCurrentWindow(now);
    let state = this.windows.get(clientId);

    // New client or window has changed - reset counter
    if (!state || state.windowStart !== currentWindow) {
      state = { count: 0, windowStart: currentWindow };
      this.windows.set(clientId, state);
    }

    // Check limit
    if (state.count < this.limit) {
      state.count++;
      return true;
    }

    return false;
  }

  /**
   * Get remaining requests in current window
   */
  getRemaining(clientId: string): number {
    const now = Date.now();
    const currentWindow = this.getCurrentWindow(now);
    const state = this.windows.get(clientId);

    if (!state || state.windowStart !== currentWindow) {
      return this.limit;
    }
    return Math.max(0, this.limit - state.count);
  }

  /**
   * Get time until window resets
   */
  getResetTime(clientId: string): number {
    const now = Date.now();
    const currentWindow = this.getCurrentWindow(now);
    return currentWindow + this.windowSizeMs - now;
  }
}

// Usage: 100 requests per 60-second window
const limiter = new FixedWindowRateLimiter(100, 60);
```

The Boundary Problem:
Fixed Window has a critical flaw: the 'boundary problem' or 'burst-at-edges' issue. Consider a limit of 100 requests per minute:
|-------- Window 1 --------|-------- Window 2 --------|
                   99 reqs | 100 reqs
                   (0:59)  | (1:00)
A client makes 99 requests at second 59 of Window 1, then 100 requests at second 0 of Window 2. That's 199 requests in 2 seconds—nearly 2x the intended limit!
This happens because each window is independent. The algorithm has no visibility into request patterns near boundaries.
Avoid Fixed Window for security-critical rate limiting (authentication, payment endpoints) where the boundary problem could be exploited. It's acceptable for soft limits where approximate enforcement is sufficient (general API usage tracking, non-critical analytics).
The Sliding Window Log algorithm provides perfect accuracy by maintaining a log of all request timestamps within the rate limit window. Instead of fixed boundaries, it uses a 'sliding' window that always looks back from the current moment.
The mental model:
Keep a list of all request timestamps. For each new request, remove timestamps older than the window size, then count remaining entries. If the count is under the limit, allow the request and add its timestamp to the log.
Mathematical precision: This algorithm enforces exact limits. If your limit is 100 requests per minute, there will never be more than 100 requests in any 60-second period, regardless of when those periods start.
```typescript
/**
 * Sliding Window Log Rate Limiter
 *
 * Perfectly accurate but memory-intensive.
 *
 * State per client:
 * - timestamps: Array of all request timestamps within window
 */
class SlidingWindowLogRateLimiter {
  private readonly limit: number;
  private readonly windowSizeMs: number;
  private readonly logs: Map<string, number[]> = new Map();

  constructor(limit: number, windowSizeSeconds: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeSeconds * 1000;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or initialize the log for this client
    let timestamps = this.logs.get(clientId);
    if (!timestamps) {
      timestamps = [];
      this.logs.set(clientId, timestamps);
    }

    // Remove timestamps outside the current window.
    // Timestamps are appended in order, so expired entries sit at the front.
    while (timestamps.length > 0 && timestamps[0] <= windowStart) {
      timestamps.shift();
    }

    // Check if we're under the limit
    if (timestamps.length < this.limit) {
      timestamps.push(now);
      return true;
    }

    return false;
  }

  getRemaining(clientId: string): number {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;
    const timestamps = this.logs.get(clientId);

    if (!timestamps) return this.limit;

    // Count timestamps within window
    const inWindow = timestamps.filter(ts => ts > windowStart).length;
    return Math.max(0, this.limit - inWindow);
  }

  /**
   * Get time until oldest request expires (next slot opens)
   */
  getRetryAfter(clientId: string): number {
    const now = Date.now();
    const timestamps = this.logs.get(clientId);

    if (!timestamps || timestamps.length < this.limit) {
      return 0;
    }

    // Oldest timestamp will expire first
    const oldestExpiry = timestamps[0] + this.windowSizeMs;
    return Math.max(0, oldestExpiry - now);
  }
}

// Usage: Exactly 100 requests in any rolling 60-second window
const limiter = new SlidingWindowLogRateLimiter(100, 60);
```

Use Sliding Window Log when perfect accuracy is critical AND request volumes are low (< 100 requests/minute per client). It's ideal for high-value, low-frequency operations: payment processing, admin actions, expensive computations. Avoid for high-throughput general API limiting where memory costs would be prohibitive.
The Sliding Window Counter algorithm combines the memory efficiency of Fixed Window with the accuracy of Sliding Window Log. It achieves this by weighting the previous window's count based on overlap with the current sliding window.
The mental model:
Maintain counts for the current and previous fixed windows. When checking the rate limit, calculate a weighted sum: the full count of the current window plus a proportion of the previous window based on how much it overlaps with the sliding window.
The magic formula:
weighted_count = current_window_count + (previous_window_count × overlap_percentage)
If we're 30% through the current window, we include 70% of the previous window's count.
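A quick worked example (numbers are illustrative): with a limit of 100 per minute, suppose the previous window counted 80 requests and we are 18 seconds (30%) into the current window, which has counted 40. The weighted count is 40 + 80 × 0.7 = 96, which is under 100, so the next request is allowed.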
```typescript
/**
 * Sliding Window Counter Rate Limiter
 *
 * Best of both worlds: O(1) memory with near-perfect accuracy.
 *
 * State per client:
 * - currentCount: Requests in current window
 * - previousCount: Requests in previous window
 * - currentWindowStart: When current window began
 */
interface SlidingWindowCounterState {
  currentCount: number;
  previousCount: number;
  currentWindowStart: number;
}

class SlidingWindowCounterRateLimiter {
  private readonly limit: number;
  private readonly windowSizeMs: number;
  private readonly windows: Map<string, SlidingWindowCounterState> = new Map();

  constructor(limit: number, windowSizeSeconds: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeSeconds * 1000;
  }

  private getWindowStart(timestamp: number): number {
    return Math.floor(timestamp / this.windowSizeMs) * this.windowSizeMs;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const currentWindowStart = this.getWindowStart(now);
    let state = this.windows.get(clientId);

    // Initialize or handle window transitions
    if (!state) {
      state = {
        currentCount: 0,
        previousCount: 0,
        currentWindowStart: currentWindowStart,
      };
      this.windows.set(clientId, state);
    } else if (state.currentWindowStart !== currentWindowStart) {
      // Window has changed - shift counts
      const windowsElapsed = Math.floor(
        (currentWindowStart - state.currentWindowStart) / this.windowSizeMs
      );

      if (windowsElapsed === 1) {
        // Normal transition: current becomes previous
        state.previousCount = state.currentCount;
        state.currentCount = 0;
      } else {
        // Multiple windows elapsed: both reset
        state.previousCount = 0;
        state.currentCount = 0;
      }
      state.currentWindowStart = currentWindowStart;
    }

    // Calculate position within current window (0.0 to 1.0)
    const windowElapsed = (now - currentWindowStart) / this.windowSizeMs;

    // Calculate overlap of sliding window with previous fixed window
    const previousWindowWeight = 1 - windowElapsed;

    // Weighted count across both windows
    const weightedCount =
      state.currentCount + state.previousCount * previousWindowWeight;

    if (weightedCount < this.limit) {
      state.currentCount++;
      return true;
    }

    return false;
  }

  getRemaining(clientId: string): number {
    const now = Date.now();
    const currentWindowStart = this.getWindowStart(now);
    const state = this.windows.get(clientId);

    if (!state) return this.limit;

    // Handle stale state
    if (state.currentWindowStart !== currentWindowStart) {
      return this.limit; // Effectively reset
    }

    const windowElapsed = (now - currentWindowStart) / this.windowSizeMs;
    const previousWindowWeight = 1 - windowElapsed;
    const weightedCount =
      state.currentCount + state.previousCount * previousWindowWeight;

    return Math.max(0, Math.floor(this.limit - weightedCount));
  }
}

// Usage: ~100 requests per rolling 60-second window
const limiter = new SlidingWindowCounterRateLimiter(100, 60);
```

Accuracy Analysis:
Sliding Window Counter isn't perfectly accurate, but its error is small and predictable: it is zero when the previous window's traffic was evenly spread, and grows only as that traffic skews toward one edge of the window.
Why it works:
The weighted approximation assumes requests in the previous window were evenly distributed. If a client made 60 requests in the previous window, we assume ~1 request per second. When calculating overlap, we credit them proportionally.
This assumption breaks down with very bursty traffic within a single fixed window, but for most traffic patterns, it's remarkably accurate.
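To see the worst case concretely (numbers are illustrative): with a limit of 100, suppose all 100 previous-window requests arrived in its final second, and we are 10% into the current window. Every one of those requests is still inside the true sliding window, yet the weighted estimate counts only 100 × 0.9 = 90 of them, so roughly 10 extra requests can slip through. With evenly spread traffic, the error disappears entirely.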
Sliding Window Counter is often the best default choice for API rate limiting. It provides near-perfect accuracy with O(1) memory—only 2 counters per client regardless of request volume. It eliminates the Fixed Window boundary problem while avoiding Sliding Window Log's memory explosion. Major cloud providers use variants of this algorithm.
Let's compare all five algorithms across key dimensions to help you choose the right one for your use case:
| Algorithm | Memory | Accuracy | Burst Handling | Complexity | Best For |
|---|---|---|---|---|---|
| Token Bucket | O(1) per client | Exact (for sustained rate) | Excellent (by design) | Low | APIs needing burst tolerance |
| Leaky Bucket | O(1) per client | Exact (for output rate) | Queues bursts | Low | Traffic shaping/smoothing |
| Fixed Window | O(1) per client | Poor (2x at boundaries) | All at once per window | Very Low | Simple quota tracking |
| Sliding Window Log | O(n) per client | Perfect | Natural handling | Medium | Low-volume, high-value operations |
| Sliding Window Counter | O(1) per client | Near-perfect (~99%) | Natural handling | Medium | General API rate limiting |
Decision Framework:
┌─────────────────────────────────────────────────────────────┐
│ Do you need to allow bursts while limiting sustained rate?  │
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Token Bucket
                    └───────────┘
                          │ NO
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Do you need smooth, constant output rate?                   │
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Leaky Bucket
                    └───────────┘
                          │ NO
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Is request volume low (< 100/min per client)?               │
│ AND do you need perfect accuracy?                           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Sliding Window Log
                    └───────────┘
                          │ NO
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Is approximate accuracy okay (~99%)? Need memory efficiency?│
└─────────────────────────┬───────────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │    YES    │──────▶ Sliding Window Counter
                    └───────────┘        (Recommended Default)
                          │ NO
                          ▼
                    Fixed Window
        (Only for very simple use cases)
In production, you often combine algorithms. Example: Token Bucket for per-user API limits (allowing bursts) + Leaky Bucket queue in front of your database (smoothing load) + Fixed Window for simple daily quotas (where boundaries don't matter). Different layers can use different algorithms based on their specific requirements.
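A minimal sketch of such layering, reusing the limiter classes defined earlier on this page (the limits, tier ordering, and return labels are illustrative):

```typescript
// Layered rate limiting: each layer uses the algorithm suited to its job.
// Reuses the limiter classes defined earlier; limits shown are illustrative.
const perUserLimiter = new TokenBucketRateLimiter(100, 10);    // bursts OK
const dailyQuota = new FixedWindowRateLimiter(10_000, 86_400); // boundaries don't matter
const dbShaper = new LeakyBucketRateLimiter(50, 10);           // smooth downstream load

function handleRequest(
  userId: string
): "ok" | "quota_exceeded" | "rate_limited" | "overloaded" {
  if (!dailyQuota.isAllowed(userId)) return "quota_exceeded";  // coarse daily cap
  if (!perUserLimiter.isAllowed(userId)) return "rate_limited"; // per-user burst/rate
  if (!dbShaper.isAllowed("database")) return "overloaded";     // shared downstream protection
  return "ok";
}
```

Note the ordering: cheap, coarse checks run first, and the shared downstream limiter runs last so a single user's burst is throttled before it can consume the global budget.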
We've covered the five fundamental rate limiting algorithms in depth. Let's consolidate the key insights:

- Token Bucket: O(1) memory; allows bursts up to capacity while enforcing a sustained average rate. The default for APIs that need burst tolerance.
- Leaky Bucket: O(1) memory; turns bursty input into a smooth, constant output rate. Best for traffic shaping.
- Fixed Window: the simplest and cheapest, but boundary bursts can allow nearly 2x the intended limit.
- Sliding Window Log: perfectly accurate but O(n) memory per client; reserve it for low-volume, high-value operations.
- Sliding Window Counter: near-perfect accuracy with O(1) memory; the recommended default for general API rate limiting.
What's next:
Understanding algorithms is essential, but most production systems require rate limiting across multiple servers. The next page covers Distributed Rate Limiting—the challenges of coordination, consistency trade-offs, and architectures for implementing rate limiting at scale across distributed infrastructure.
You now understand the core rate limiting algorithms, their mechanics, and their trade-offs. Whether you're implementing rate limiting from scratch or evaluating existing solutions, this knowledge helps you make informed decisions about which algorithm fits your specific requirements.