While the Token Bucket algorithm excels at allowing controlled bursts, some systems require exact enforcement of time-based limits. Consider a payment API that allows 100 transactions per hour—allowing 100 transactions in the last minute of one hour and 100 more in the first minute of the next violates the spirit of the limit.
The Sliding Window family of algorithms solves this problem by tracking requests within a continuously moving time window, ensuring that at any instant, the rate limit is precisely enforced. This seemingly simple requirement leads to fascinating trade-offs between accuracy, memory, and computational complexity.
In this page, we'll explore the evolution from fixed windows to sliding window logs to sliding window counters, understanding when each approach is appropriate and how they're implemented in production systems at companies like Google, Amazon, and Cloudflare.
By the end of this page, you'll understand: (1) The boundary-spike problem with fixed windows, (2) Sliding Window Log for perfect accuracy, (3) Sliding Window Counter for efficient approximation, (4) Memory and computational trade-offs, and (5) Choosing between algorithms for specific use cases.
Before diving into sliding windows, let's understand why fixed windows are insufficient for many use cases. This understanding motivates the more complex sliding window approaches.
Fixed window rate limiting divides time into discrete, non-overlapping intervals:
|←── Window 1 ──→|←── Window 2 ──→|←── Window 3 ──→|
12:00-12:01 12:01-12:02 12:02-12:03
Counter: 0→87 Counter: 0→45 Counter: 0→...
Algorithm:
1. Divide time into fixed intervals (for example, one minute each)
2. Keep one counter per client for the current interval
3. On each request, reset the counter if a new interval has started, then increment it
4. Allow the request if the counter is within the limit; otherwise reject

Advantages:
- O(1) time and O(1) memory per client
- Trivial to implement, including atomically in Redis with INCR and EXPIRE
- Counter semantics are easy to reason about and debug
```typescript
/**
 * Fixed Window Rate Limiter
 * Simple but has boundary-spike problem
 */
class FixedWindowRateLimiter {
  private counter: number = 0;
  private windowStart: number = 0;

  constructor(
    private readonly limit: number,        // Max requests per window
    private readonly windowSizeMs: number  // Window duration in milliseconds
  ) {}

  check(): RateLimitResult {
    const now = Date.now();
    const currentWindow = Math.floor(now / this.windowSizeMs);

    // Check if we've moved to a new window
    if (currentWindow > this.windowStart) {
      this.counter = 0;
      this.windowStart = currentWindow;
    }

    if (this.counter < this.limit) {
      this.counter++;
      return {
        allowed: true,
        remaining: this.limit - this.counter,
        resetAt: (currentWindow + 1) * this.windowSizeMs
      };
    }

    return {
      allowed: false,
      remaining: 0,
      resetAt: (currentWindow + 1) * this.windowSizeMs,
      retryAfter: Math.ceil(((currentWindow + 1) * this.windowSizeMs - now) / 1000)
    };
  }
}
```

Fixed windows have a critical flaw: requests cluster at window boundaries, allowing bursts up to 2× the intended limit.
Example Scenario:
Window 1 (12:00-12:01) Window 2 (12:01-12:02)
| ▓▓▓▓▓▓▓▓▓▓▓▓|▓▓▓▓▓▓▓▓▓▓▓ |
Last 1s: 100 reqs │ First 1s: 100 reqs
↓
200 requests in 2 seconds
(2× the intended rate!)
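To make the boundary spike concrete, here is a minimal, self-contained simulation; the helper name and numbers are illustrative and not from the original text. It replays 100 requests in the last second of one minute and 100 in the first second of the next against a fixed 60-second window with a limit of 100.

```typescript
// Sketch: a fixed 60s window admits 200 requests within ~2 seconds
// when they straddle a window boundary. Names and numbers are illustrative.
function simulateFixedWindow(timestampsMs: number[], limit: number, windowMs: number): number {
  const counts = new Map<number, number>(); // window index -> requests admitted
  let admitted = 0;
  for (const t of timestampsMs) {
    const window = Math.floor(t / windowMs);
    const count = counts.get(window) ?? 0;
    if (count < limit) {
      counts.set(window, count + 1);
      admitted++;
    }
  }
  return admitted;
}

const boundary = 60_000; // 12:01:00, expressed as ms since 12:00
const burst = [
  ...Array.from({ length: 100 }, (_, i) => boundary - 1_000 + i * 10), // last second of window 1
  ...Array.from({ length: 100 }, (_, i) => boundary + i * 10),         // first second of window 2
];

console.log(simulateFixedWindow(burst, 100, 60_000)); // 200 -> twice the per-minute limit
```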
This isn't a theoretical problem: clients and scrapers routinely time their requests just before and just after a window boundary to double their effective rate.
At scale, boundary spikes cause visible problems. If 1,000 clients each make their full quota at boundaries, servers receive 2× expected load every minute. Many production outages trace back to fixed-window boundary effects combined with synchronized client behavior.
| Aspect | Characteristic | Impact |
|---|---|---|
| Time Complexity | O(1) | Excellent |
| Space Complexity | O(1) per client | Excellent |
| Accuracy | ±100% at boundaries | Poor |
| Implementation | Very simple | Easy |
| Suitable For | Rough limiting, non-critical | Limited use |
The Sliding Window Log algorithm provides perfect accuracy by tracking every individual request timestamp. At any moment, it counts exactly how many requests occurred in the trailing window.
How It Works:
- Keep a log (a sorted list) of the timestamp of every allowed request
- On each new request, discard all timestamps older than now minus the window size
- Count the remaining timestamps; if the count is below the limit, allow the request and append its timestamp
- Otherwise reject, and report when the oldest logged request will age out of the window
Visual Representation:
Current time: 12:05:30
Window size: 60 seconds
Log entries: [12:04:31, 12:04:45, 12:04:55, 12:05:10, 12:05:20, 12:05:28]
↑ ↑ ↑ ↑ ↑ ↑
In window In window In window In window In window In window
Requests in window: 6
Limit: 10
→ This request ALLOWED (6 < 10)
```typescript
/**
 * Sliding Window Log Rate Limiter
 * Perfect accuracy but high memory usage
 */
class SlidingWindowLog {
  private timestamps: number[] = [];

  constructor(
    private readonly limit: number,
    private readonly windowSizeMs: number
  ) {}

  check(): RateLimitResult {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Remove expired timestamps (entries older than window)
    // Using binary search for efficiency in large logs
    const cutoffIndex = this.binarySearchCutoff(windowStart);
    this.timestamps = this.timestamps.slice(cutoffIndex);

    // Check current count
    const currentCount = this.timestamps.length;

    if (currentCount < this.limit) {
      // Allow and record this request
      this.timestamps.push(now);
      return {
        allowed: true,
        remaining: this.limit - currentCount - 1,
        resetAt: this.calculateResetTime(now)
      };
    }

    // Calculate when the oldest request will expire
    const oldestTimestamp = this.timestamps[0];
    const retryAfter = Math.ceil((oldestTimestamp + this.windowSizeMs - now) / 1000);

    return {
      allowed: false,
      remaining: 0,
      resetAt: oldestTimestamp + this.windowSizeMs,
      retryAfter
    };
  }

  /**
   * Binary search for first timestamp >= cutoff
   */
  private binarySearchCutoff(cutoff: number): number {
    let low = 0;
    let high = this.timestamps.length;

    while (low < high) {
      const mid = Math.floor((low + high) / 2);
      if (this.timestamps[mid] < cutoff) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }

    return low;
  }

  private calculateResetTime(now: number): number {
    if (this.timestamps.length === 0) {
      return now + this.windowSizeMs;
    }
    return this.timestamps[0] + this.windowSizeMs;
  }

  /**
   * Get detailed status for debugging/monitoring
   */
  getStatus(): WindowLogStatus {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;
    const validTimestamps = this.timestamps.filter(t => t >= windowStart);

    return {
      requestsInWindow: validTimestamps.length,
      limit: this.limit,
      remaining: Math.max(0, this.limit - validTimestamps.length),
      oldestRequest: validTimestamps[0],
      newestRequest: validTimestamps[validTimestamps.length - 1]
    };
  }
}

interface WindowLogStatus {
  requestsInWindow: number;
  limit: number;
  remaining: number;
  oldestRequest?: number;
  newestRequest?: number;
}
```

Redis sorted sets are ideal for sliding window logs—timestamps serve as scores, enabling efficient range queries:
```lua
-- Redis Lua script for sliding window log
-- KEYS[1] = rate limit key
-- ARGV[1] = current timestamp (ms)
-- ARGV[2] = window size (ms)
-- ARGV[3] = limit

local key = KEYS[1]
local now = tonumber(ARGV[1])
local windowSize = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local windowStart = now - windowSize

-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', windowStart)

-- Count entries in window
local count = redis.call('ZCARD', key)

if count < limit then
  -- Add new entry with current timestamp as score
  -- Use timestamp + random to handle concurrent requests
  local member = now .. ':' .. math.random(1000000)
  redis.call('ZADD', key, now, member)

  -- Set expiry to clean up inactive keys
  redis.call('PEXPIRE', key, windowSize * 2)

  return {1, limit - count - 1, 0} -- allowed, remaining, retryAfter
else
  -- Get oldest entry to calculate retry time
  local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
  local oldestTimestamp = tonumber(oldest[2]) or now
  local retryAfter = math.ceil((oldestTimestamp + windowSize - now) / 1000)

  return {0, 0, retryAfter} -- denied, remaining=0, retryAfter
end
```

Per-request storage: every allowed request adds one entry to the log. In process memory that is at least an 8-byte timestamp; as a Redis sorted-set member (timestamp plus a uniqueness suffix) it typically costs a few dozen bytes per entry once overhead is included.
Example calculation: with a limit of 10,000 requests per hour and 1 million active clients, a fully utilized log could hold 10 billion entries, which works out to hundreds of gigabytes of state.
This is clearly impractical for high-volume, high-limit scenarios. The sliding window log is best suited for:
- Low limits on high-value endpoints (for example, login attempts or password resets)
- Billing-critical limits where over-allowing has direct financial impact
- Compliance requirements that mandate exact enforcement
The Sliding Window Counter combines fixed windows' efficiency with sliding windows' smooth behavior by estimating the request count in the current sliding window using weighted counts from adjacent fixed windows.
The Key Insight:
Instead of tracking every timestamp, we track counts in fixed windows and interpolate. If we're 30% into the current fixed window, we weight the previous window's count at 70% (the portion of the sliding window that still overlaps it) and count the current window in full.
This approximation eliminates boundary spikes while using only O(1) memory per client.
Estimated requests in sliding window:
sliding_count = (previous_window_count × previous_weight) + current_window_count
where:
previous_weight = 1 - (elapsed_in_current / window_size)
elapsed_in_current = current_time - current_window_start
Example:
Limit: 100 requests per minute
Current time: 12:05:30
Window size: 60 seconds
Previous window (12:04:00-12:05:00): 80 requests
Current window (12:05:00-ongoing): 30 requests so far
elapsed_in_current = 30 seconds
previous_weight = 1 - (30/60) = 0.5
sliding_count = (80 × 0.5) + 30
= 40 + 30
= 70 requests
Limit = 100, current estimate = 70
→ Request ALLOWED (30 remaining)
```typescript
/**
 * Sliding Window Counter Rate Limiter
 * O(1) memory with ~1-2% accuracy variance
 */
class SlidingWindowCounter {
  private previousWindowCount: number = 0;
  private currentWindowCount: number = 0;
  private currentWindowStart: number;

  constructor(
    private readonly limit: number,
    private readonly windowSizeMs: number
  ) {
    // Align to window boundary for consistency
    const now = Date.now();
    this.currentWindowStart = Math.floor(now / windowSizeMs) * windowSizeMs;
  }

  check(): RateLimitResult {
    const now = Date.now();
    this.updateWindows(now);

    // Calculate sliding window estimate
    const elapsedInCurrent = now - this.currentWindowStart;
    const previousWeight = 1 - (elapsedInCurrent / this.windowSizeMs);
    const estimatedCount =
      (this.previousWindowCount * previousWeight) + this.currentWindowCount;

    if (estimatedCount < this.limit) {
      this.currentWindowCount++;

      // Recalculate after increment
      const newEstimatedCount =
        (this.previousWindowCount * previousWeight) + this.currentWindowCount;

      return {
        allowed: true,
        remaining: Math.max(0, Math.floor(this.limit - newEstimatedCount)),
        resetAt: this.calculateResetTime(now, previousWeight)
      };
    }

    // Calculate retry time
    const retryAfter = this.calculateRetryAfter(now, estimatedCount, previousWeight);

    return {
      allowed: false,
      remaining: 0,
      resetAt: this.currentWindowStart + this.windowSizeMs,
      retryAfter
    };
  }

  /**
   * Slide windows forward as time progresses
   */
  private updateWindows(now: number): void {
    const currentWindow = Math.floor(now / this.windowSizeMs);
    const storedWindow = Math.floor(this.currentWindowStart / this.windowSizeMs);

    if (currentWindow === storedWindow) {
      // Still in the same window
      return;
    }

    if (currentWindow === storedWindow + 1) {
      // Moved to the next window
      this.previousWindowCount = this.currentWindowCount;
      this.currentWindowCount = 0;
      this.currentWindowStart = currentWindow * this.windowSizeMs;
    } else {
      // Skipped one or more windows (long idle period)
      this.previousWindowCount = 0;
      this.currentWindowCount = 0;
      this.currentWindowStart = currentWindow * this.windowSizeMs;
    }
  }

  /**
   * Calculate when to retry if rate limited
   */
  private calculateRetryAfter(
    now: number,
    currentCount: number,
    previousWeight: number
  ): number {
    // We need: (prev × newWeight) + current < limit
    // As time passes, previousWeight decreases
    // Find when the estimate drops below the limit
    const excess = currentCount - this.limit + 1;
    if (excess <= 0) return 0;

    // Simplified: wait for more of the previous window to expire
    const fractionToWait = excess / this.previousWindowCount;
    const msToWait = fractionToWait * this.windowSizeMs;

    return Math.ceil(msToWait / 1000);
  }

  private calculateResetTime(now: number, previousWeight: number): number {
    // Window resets when current window fully expires
    return this.currentWindowStart + (2 * this.windowSizeMs);
  }
}
```

The sliding window counter requires just two keys per client:
```lua
-- Redis Lua script for sliding window counter
-- KEYS[1] = current window key (e.g., "ratelimit:user123:202301011200")
-- KEYS[2] = previous window key (e.g., "ratelimit:user123:202301011159")
-- ARGV[1] = limit
-- ARGV[2] = window size in seconds
-- ARGV[3] = current timestamp (ms)

local currentKey = KEYS[1]
local previousKey = KEYS[2]
local limit = tonumber(ARGV[1])
local windowSize = tonumber(ARGV[2]) * 1000 -- Convert to ms
local now = tonumber(ARGV[3])

-- Get counts from both windows
local previousCount = tonumber(redis.call('GET', previousKey) or '0')
local currentCount = tonumber(redis.call('GET', currentKey) or '0')

-- Calculate the sliding window estimate
local currentWindowStart = math.floor(now / windowSize) * windowSize
local elapsedInCurrent = now - currentWindowStart
local previousWeight = 1 - (elapsedInCurrent / windowSize)

local estimate = (previousCount * previousWeight) + currentCount

if estimate < limit then
  -- Increment current window counter
  redis.call('INCR', currentKey)

  -- Set TTL (2 window sizes to cover previous + current)
  redis.call('PEXPIRE', currentKey, windowSize * 2)

  -- Calculate new remaining
  local newEstimate = (previousCount * previousWeight) + currentCount + 1
  local remaining = math.max(0, math.floor(limit - newEstimate))

  return {1, remaining, 0} -- allowed, remaining, retryAfter
else
  -- Calculate retry time
  local excess = estimate - limit + 1
  local fractionToWait = excess / (previousCount + 0.01) -- Avoid div by 0
  local msToWait = fractionToWait * windowSize
  local retryAfter = math.ceil(msToWait / 1000)

  return {0, 0, retryAfter} -- denied, remaining, retryAfter
end
```

Use predictable key names that include the window timestamp: ratelimit:{client}:{window_number}. This allows calculating previous and current window keys from any timestamp. Example: ratelimit:user123:1672531200 where 1672531200 is the window start epoch.
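As a small sketch of that key scheme (the helper name is hypothetical and not part of the Lua script above), an application could derive the two key names it passes as KEYS[1] and KEYS[2] like this:

```typescript
// Derive current and previous window keys for the sliding window counter script.
// Window numbers are epoch seconds truncated to the window boundary.
function windowKeys(clientId: string, nowMs: number, windowSizeSec: number): [string, string] {
  const windowSizeMs = windowSizeSec * 1000;
  const currentWindow = Math.floor(nowMs / windowSizeMs) * windowSizeSec;
  const previousWindow = currentWindow - windowSizeSec;
  return [
    `ratelimit:${clientId}:${currentWindow}`,   // KEYS[1]
    `ratelimit:${clientId}:${previousWindow}`,  // KEYS[2]
  ];
}

// Example: windowKeys('user123', 1672531230000, 60)
// -> ['ratelimit:user123:1672531200', 'ratelimit:user123:1672531140']
```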
| Aspect | Value | Notes |
|---|---|---|
| Time Complexity | O(1) | Fixed number of operations |
| Space Complexity | O(1) | Just 2 counters per client |
| Accuracy | ~98-99% | Slightly conservative estimate |
| Boundary Spikes | Eliminated | Smooth transition between windows |
| Memory per Client | ~32 bytes | Two 8-byte counters + overhead |
The sliding window counter makes an approximation that has bounded error. Understanding this error helps determine when the algorithm is appropriate.
The approximation assumes requests are uniformly distributed within each window. When traffic is bursty, the estimate can diverge from reality:
Scenario 1: Traffic Front-Loaded
Previous window: 100 requests, all in first 10 seconds
Current time: 30 seconds into current window (50% through)
Actual previous contribution: those 100 requests are now 80-90 seconds old, entirely outside the trailing 60-second window → 0 should count
Our weight: 1 - 0.5 = 50%
Estimate: 100 × 0.5 = 50 (from previous)
Actual: 0 (from previous)
→ We over-estimate by 50 requests (conservative error)
Scenario 2: Traffic Back-Loaded
Previous window: 100 requests, all in last 10 seconds
Current time: 30 seconds into current window (50% through)
Actual previous weight: All 100 still relevant → 100%
Our weight: 1 - 0.5 = 50%
Estimate: 100 × 0.5 = 50 (from previous)
Actual: 100 × 1.0 = 100 (from previous)
→ We under-estimate by 50 requests (allows burst)
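A quick way to sanity-check these two scenarios numerically (an illustrative helper, not from the original text) is to compute the weighted estimate and compare it with an exact count over the same timestamps:

```typescript
// Compare the sliding-window-counter estimate with an exact trailing-window count.
// "now" is 30 seconds into the current window in both scenarios; window size is 60s.
function estimate(prevCount: number, currCount: number, elapsedInCurrentMs: number, windowMs: number): number {
  const previousWeight = 1 - elapsedInCurrentMs / windowMs;
  return prevCount * previousWeight + currCount;
}

function exactCount(timestampsMs: number[], nowMs: number, windowMs: number): number {
  return timestampsMs.filter(t => t > nowMs - windowMs && t <= nowMs).length;
}

const windowMs = 60_000;
const now = 90_000; // previous window was [0s, 60s); we are 30s into the current one

// Scenario 1: 100 requests in the first 10s of the previous window
const frontLoaded = Array.from({ length: 100 }, (_, i) => i * 100);          // 0.0s .. 9.9s
console.log(estimate(100, 0, 30_000, windowMs));      // 50 (over-estimate)
console.log(exactCount(frontLoaded, now, windowMs));  // 0

// Scenario 2: 100 requests in the last 10s of the previous window
const backLoaded = Array.from({ length: 100 }, (_, i) => 50_000 + i * 100);  // 50.0s .. 59.9s
console.log(estimate(100, 0, 30_000, windowMs));      // 50 (under-estimate)
console.log(exactCount(backLoaded, now, windowMs));   // 100
```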
Worst Case:
Maximum over-count happens when the previous window received its full limit of requests, all clustered at the very start of that window, so none of them actually fall inside the trailing window.
Maximum under-count happens when the previous window received roughly limit × (1 - 1/windowSize) requests, all clustered at the very end of that window, so all of them are still inside the trailing window but receive only partial weight.
Practical Error: With real traffic patterns (not adversarial), Cloudflare reports <1% error in production. The algorithm is conservative on average—it tends to allow slightly fewer requests than the exact limit.
| Algorithm | Accuracy | Error Pattern | Memory |
|---|---|---|---|
| Fixed Window | ±100% | Boundary spikes allow 2× limit | O(1) |
| Sliding Window Log | 100% | Perfect, no error | O(limit) |
| Sliding Window Counter | ~99% | Slight under/over-count based on distribution | O(1) |
| Token Bucket | N/A | Different model (average rate + burst) | O(1) |
For most APIs, the sliding window counter's ~99% accuracy is perfectly acceptable. Use the sliding window log (or combine with external enforcement) for: billing-critical limits where over-allowing has financial impact, security rate limits where precision prevents attacks, and compliance requirements mandating exact enforcement.
With multiple algorithms available, choosing the right one for your use case is critical. Let's create a decision framework.
Use Token Bucket when:
- Traffic is naturally bursty and you want to permit short bursts above the average rate
- You care about average rate plus burst capacity rather than an exact count per window

Use Fixed Window when:
- You need the simplest possible implementation (prototypes, internal tooling)
- Rough enforcement is acceptable and boundary spikes are tolerable

Use Sliding Window Counter when:
- You need smooth, boundary-free enforcement with O(1) memory per client
- You are limiting many clients or endpoints and ~99% accuracy is sufficient

Use Sliding Window Log when:
- Limits must be exact: billing-critical endpoints, authentication attempts, compliance requirements
- Limits are low enough that storing a timestamp per request is affordable
| Use Case | Recommended | Rationale |
|---|---|---|
| General API rate limiting | Sliding Window Counter | Balance of accuracy and efficiency |
| High-value endpoints with low limits | Sliding Window Log | Perfect accuracy justifies memory cost |
| Allowing burst with average rate | Token Bucket | Natural fit for bursty traffic |
| Per-endpoint limiting with many endpoints | Sliding Window Counter | O(1) memory scales well |
| Authentication/login attempts | Sliding Window Log | Security requires precision |
| Prototype/MVP | Fixed Window | Simplest to implement quickly |
| Combined burst + rate | Token Bucket + Sliding Window Counter | Layered limiting strategy |
Production systems often combine algorithms:
Token Bucket + Sliding Window Counter:
1. Token Bucket: Allow 100 burst, refill 10/sec
→ Controls instant burst capacity
2. Sliding Window Counter: 1000/hour
→ Controls sustained rate
A request must pass BOTH checks. This prevents:
- Exhausting the entire hourly quota in one instantaneous burst (the token bucket caps burst size)
- Sustaining a high request rate for a full hour just below the burst threshold (the sliding window caps the total)

A minimal sketch of this layered check follows.
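This sketch assumes a TokenBucket class with the same check() shape as the limiters above; its actual implementation lives on the previous page, so treat the interface here as an assumption.

```typescript
// Layered limiter: a request must pass both the burst check and the sustained-rate check.
// TokenBucket is assumed to expose check(): { allowed: boolean } like the classes above.
class LayeredRateLimiter {
  constructor(
    private readonly burst: TokenBucket,              // e.g. 100 burst, refill 10/sec
    private readonly sustained: SlidingWindowCounter  // e.g. 1000 per hour
  ) {}

  check(): { allowed: boolean; reason?: string } {
    if (!this.burst.check().allowed) {
      return { allowed: false, reason: 'burst limit exceeded' };
    }
    if (!this.sustained.check().allowed) {
      return { allowed: false, reason: 'hourly limit exceeded' };
    }
    return { allowed: true };
  }
}
```

Note that when the first check passes but the second fails, a token has already been consumed; a production implementation would typically refund it, which this sketch omits.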
Per-Endpoint + Global Limits:
1. Global: 10,000 requests/hour per user (any endpoint)
2. /search: 100 requests/minute per user
3. /export: 10 requests/hour per user
Checking all applicable rules ensures expensive operations don't consume all quota while still limiting overall usage.
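One way to express that layering (a hypothetical structure, not from the original text) is as a list of rules that are all evaluated for every request, with endpoint-specific rules sitting alongside the global one:

```typescript
// Hypothetical rule table: every matching rule must allow the request.
// In practice you would keep one limiter instance per (user, rule) pair.
interface LimitRule {
  name: string;
  matches: (path: string) => boolean;
  limiter: SlidingWindowCounter;
}

const rules: LimitRule[] = [
  { name: 'global', matches: () => true,                   limiter: new SlidingWindowCounter(10_000, 3_600_000) },
  { name: 'search', matches: p => p.startsWith('/search'), limiter: new SlidingWindowCounter(100, 60_000) },
  { name: 'export', matches: p => p.startsWith('/export'), limiter: new SlidingWindowCounter(10, 3_600_000) },
];

function checkAll(path: string): { allowed: boolean; violated?: string } {
  for (const rule of rules) {
    if (rule.matches(path) && !rule.limiter.check().allowed) {
      return { allowed: false, violated: rule.name };
    }
  }
  return { allowed: true };
}
```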
Begin with a single sliding window counter for your entire API. Monitor usage patterns. Add endpoint-specific limits only for endpoints that need them. Layer token buckets only if burst control becomes important. Complexity has maintenance costs—add it only when data justifies it.
For systems handling millions of requests per second, even O(1) algorithms need optimization. Let's explore techniques used at massive scale.
Instead of one window size, use multiple resolutions for different purposes:
```typescript
// Coarse windows for long-term limits
const hourlyCounter = new SlidingWindowCounter(10000, 3600 * 1000);

// Fine windows for burst prevention
const secondCounter = new SlidingWindowCounter(100, 1000);

function checkLimit(request: Request): boolean {
  // Short-term check first (faster to fail)
  if (!secondCounter.check().allowed) return false;

  // Then long-term check
  if (!hourlyCounter.check().allowed) return false;

  return true;
}
```
Benefits:
- The cheap one-second check rejects abusive bursts before the hourly counter is ever touched
- The hourly counter still bounds sustained usage over the long term
- Both counters remain O(1) in time and memory
For extremely high limits or cardinality counting (unique IPs), exact counting is impractical. HyperLogLog provides approximate counting with bounded error:
```lua
-- Redis HyperLogLog for unique visitors per hour
local key = 'ratelimit:unique_ips:' .. window_hour
redis.call('PFADD', key, client_ip)
redis.call('EXPIRE', key, 7200) -- 2 hours

local unique_count = redis.call('PFCOUNT', key)
if unique_count > limit then
  return 0 -- Rate limited
end
```
HyperLogLog characteristics:
- Fixed memory of roughly 12 KB per key, regardless of how many distinct items are added
- Standard error of about 0.81% on the reported count
- PFADD and PFCOUNT run in constant time, and counters can be merged across keys with PFMERGE
```typescript
/**
 * Optimized Sliding Window with local caching
 * Reduces Redis round-trips for high-throughput scenarios
 */
class OptimizedSlidingWindow {
  private localCache: Map<string, LocalCounter> = new Map();
  private readonly localIncrement: number;
  private readonly syncIntervalMs: number;

  constructor(
    private readonly redis: RedisClient,
    private readonly limit: number,
    private readonly windowSizeMs: number,
    nodeCount: number = 10
  ) {
    // Each node gets a portion of the limit for local counting
    this.localIncrement = Math.max(1, Math.floor(limit / (nodeCount * 10)));
    this.syncIntervalMs = 100; // Sync with Redis every 100ms

    // Periodic sync
    setInterval(() => this.syncToRedis(), this.syncIntervalMs);
  }

  async check(clientId: string): Promise<RateLimitResult> {
    let local = this.localCache.get(clientId);
    const now = Date.now();

    if (!local || local.expiry < now) {
      // Cache miss or expired - fetch from Redis
      const redisResult = await this.fetchFromRedis(clientId, now);
      local = {
        count: redisResult.count,
        pending: 0,
        expiry: now + this.syncIntervalMs
      };
      this.localCache.set(clientId, local);
    }

    // Estimate: Redis count + pending local increments
    const estimate = local.count + local.pending;

    if (estimate < this.limit) {
      // Increment locally, batch sync to Redis
      local.pending++;
      return {
        allowed: true,
        remaining: this.limit - estimate - 1,
        resetAt: this.calculateReset(now)
      };
    }

    return {
      allowed: false,
      remaining: 0,
      resetAt: this.calculateReset(now),
      retryAfter: Math.ceil(this.windowSizeMs / 1000 / this.limit)
    };
  }

  private async syncToRedis(): Promise<void> {
    const pipeline = this.redis.pipeline();

    for (const [clientId, local] of this.localCache) {
      if (local.pending > 0) {
        // Batch increment
        pipeline.incrby(this.getRedisKey(clientId), local.pending);
        local.pending = 0;
      }
    }

    await pipeline.exec();
  }

  private async fetchFromRedis(clientId: string, now: number): Promise<{ count: number }> {
    const key = this.getRedisKey(clientId);
    const count = await this.redis.get(key);
    return { count: parseInt(count || '0', 10) };
  }

  private getRedisKey(clientId: string): string {
    const window = Math.floor(Date.now() / this.windowSizeMs);
    return `ratelimit:${clientId}:${window}`;
  }

  private calculateReset(now: number): number {
    const windowStart = Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
    return windowStart + this.windowSizeMs;
  }
}

interface LocalCounter {
  count: number;   // Last known Redis count
  pending: number; // Local increments not yet synced
  expiry: number;  // When to refresh from Redis
}
```

We've explored the family of sliding window algorithms that solve the boundary-spike problem of fixed windows. Let's consolidate our understanding:
- Fixed windows are simple and O(1), but allow up to 2× the limit at window boundaries
- The sliding window log is exact, but stores a timestamp per request, so memory grows with the limit and client count
- The sliding window counter interpolates between two fixed counters, eliminating boundary spikes with O(1) memory at the cost of roughly 1% estimation error
What's Next:
We've mastered single-node rate limiting algorithms. But in production, APIs run on multiple servers across multiple regions. How do we coordinate rate limits across distributed systems? In the next page, we'll explore Distributed Rate Limiting—the challenge of maintaining consistent limits when state is spread across a fleet of servers.
You now understand the full spectrum of sliding window algorithms—from the simple fixed window to the precise sliding window log to the efficient sliding window counter. You can analyze accuracy trade-offs, choose appropriate algorithms, and implement optimized solutions. Next, we'll tackle the distributed systems challenge.