While the Token Bucket algorithm excels at allowing controlled bursts, some systems require exact enforcement of time-based limits. Consider a payment API that allows 100 transactions per hour—allowing 100 transactions in the last minute of one hour and 100 more in the first minute of the next violates the spirit of the limit.
The Sliding Window family of algorithms solves this problem by tracking requests within a continuously moving time window, ensuring that at any instant, the rate limit is precisely enforced. This seemingly simple requirement leads to fascinating trade-offs between accuracy, memory, and computational complexity.
In this page, we'll explore the evolution from fixed windows to sliding window logs to sliding window counters, understanding when each approach is appropriate and how they're implemented in production systems at companies like Google, Amazon, and Cloudflare.
By the end of this page, you'll understand: (1) The boundary-spike problem with fixed windows, (2) Sliding Window Log for perfect accuracy, (3) Sliding Window Counter for efficient approximation, (4) Memory and computational trade-offs, and (5) Choosing between algorithms for specific use cases.
Before diving into sliding windows, let's understand why fixed windows are insufficient for many use cases. This understanding motivates the more complex sliding window approaches.
Fixed window rate limiting divides time into discrete, non-overlapping intervals:
|←── Window 1 ──→|←── Window 2 ──→|←── Window 3 ──→|
12:00-12:01 12:01-12:02 12:02-12:03
Counter: 0→87 Counter: 0→45 Counter: 0→...
Algorithm:
1. Divide time into fixed intervals (for example, one minute each)
2. Keep one counter per client for the current interval
3. On each request, reset the counter if a new interval has started, then increment it
4. Allow the request if the counter is within the limit; otherwise reject

Advantages:
- O(1) time and O(1) memory per client
- Trivial to implement, including atomically in Redis with INCR and EXPIRE
- Counter semantics are easy to reason about and debug
```typescript
/**
 * Fixed Window Rate Limiter
 * Simple but has boundary-spike problem
 */
class FixedWindowRateLimiter {
  private counter: number = 0;
  private windowStart: number = 0;

  constructor(
    private readonly limit: number,        // Max requests per window
    private readonly windowSizeMs: number  // Window duration in milliseconds
  ) {}

  check(): RateLimitResult {
    const now = Date.now();
    const currentWindow = Math.floor(now / this.windowSizeMs);

    // Check if we've moved to a new window
    if (currentWindow > this.windowStart) {
      this.counter = 0;
      this.windowStart = currentWindow;
    }

    if (this.counter < this.limit) {
      this.counter++;
      return {
        allowed: true,
        remaining: this.limit - this.counter,
        resetAt: (currentWindow + 1) * this.windowSizeMs
      };
    }

    return {
      allowed: false,
      remaining: 0,
      resetAt: (currentWindow + 1) * this.windowSizeMs,
      retryAfter: Math.ceil(((currentWindow + 1) * this.windowSizeMs - now) / 1000)
    };
  }
}
```

Fixed windows have a critical flaw: requests cluster at window boundaries, allowing bursts up to 2× the intended limit.
Example Scenario:
Window 1 (12:00-12:01) Window 2 (12:01-12:02)
| ▓▓▓▓▓▓▓▓▓▓▓▓|▓▓▓▓▓▓▓▓▓▓▓ |
Last 1s: 100 reqs │ First 1s: 100 reqs
↓
200 requests in 2 seconds
(2× the intended rate!)
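To make the boundary spike concrete, here is a minimal, self-contained simulation; the helper name and numbers are illustrative and not from the original text. It replays 100 requests in the last second of one minute and 100 in the first second of the next against a fixed 60-second window with a limit of 100.

```typescript
// Sketch: a fixed 60s window admits 200 requests within ~2 seconds
// when they straddle a window boundary. Names and numbers are illustrative.
function simulateFixedWindow(timestampsMs: number[], limit: number, windowMs: number): number {
  const counts = new Map<number, number>(); // window index -> requests admitted
  let admitted = 0;
  for (const t of timestampsMs) {
    const window = Math.floor(t / windowMs);
    const count = counts.get(window) ?? 0;
    if (count < limit) {
      counts.set(window, count + 1);
      admitted++;
    }
  }
  return admitted;
}

const boundary = 60_000; // 12:01:00, expressed as ms since 12:00
const burst = [
  ...Array.from({ length: 100 }, (_, i) => boundary - 1_000 + i * 10), // last second of window 1
  ...Array.from({ length: 100 }, (_, i) => boundary + i * 10),         // first second of window 2
];

console.log(simulateFixedWindow(burst, 100, 60_000)); // 200 -> twice the per-minute limit
```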
This isn't a theoretical problem: clients and scrapers routinely time their requests just before and just after a window boundary to double their effective rate.
At scale, boundary spikes cause visible problems. If 1,000 clients each make their full quota at boundaries, servers receive 2× expected load every minute. Many production outages trace back to fixed-window boundary effects combined with synchronized client behavior.
| Aspect | Characteristic | Impact |
|---|---|---|
| Time Complexity | O(1) | Excellent |
| Space Complexity | O(1) per client | Excellent |
| Accuracy | ±100% at boundaries | Poor |
| Implementation | Very simple | Easy |
| Suitable For | Rough limiting, non-critical | Limited use |
The Sliding Window Log algorithm provides perfect accuracy by tracking every individual request timestamp. At any moment, it counts exactly how many requests occurred in the trailing window.
How It Works:
- Keep a log (a sorted list) of the timestamp of every allowed request
- On each new request, discard all timestamps older than now minus the window size
- Count the remaining timestamps; if the count is below the limit, allow the request and append its timestamp
- Otherwise reject, and report when the oldest logged request will age out of the window
Visual Representation:
Current time: 12:05:30
Window size: 60 seconds
Log entries: [12:04:31, 12:04:45, 12:04:55, 12:05:10, 12:05:20, 12:05:28]
↑ ↑ ↑ ↑ ↑ ↑
In window In window In window In window In window In window
Requests in window: 6
Limit: 10
→ This request ALLOWED (6 < 10)
```typescript
/**
 * Sliding Window Log Rate Limiter
 * Perfect accuracy but high memory usage
 */
class SlidingWindowLog {
  private timestamps: number[] = [];

  constructor(
    private readonly limit: number,
    private readonly windowSizeMs: number
  ) {}

  check(): RateLimitResult {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Remove expired timestamps (entries older than window)
    // Using binary search for efficiency in large logs
    const cutoffIndex = this.binarySearchCutoff(windowStart);
    this.timestamps = this.timestamps.slice(cutoffIndex);

    // Check current count
    const currentCount = this.timestamps.length;

    if (currentCount < this.limit) {
      // Allow and record this request
      this.timestamps.push(now);
      return {
        allowed: true,
        remaining: this.limit - currentCount - 1,
        resetAt: this.calculateResetTime(now)
      };
    }

    // Calculate when the oldest request will expire
    const oldestTimestamp = this.timestamps[0];
    const retryAfter = Math.ceil((oldestTimestamp + this.windowSizeMs - now) / 1000);

    return {
      allowed: false,
      remaining: 0,
      resetAt: oldestTimestamp + this.windowSizeMs,
      retryAfter
    };
  }

  /**
   * Binary search for first timestamp >= cutoff
   */
  private binarySearchCutoff(cutoff: number): number {
    let low = 0;
    let high = this.timestamps.length;

    while (low < high) {
      const mid = Math.floor((low + high) / 2);
      if (this.timestamps[mid] < cutoff) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }

    return low;
  }

  private calculateResetTime(now: number): number {
    if (this.timestamps.length === 0) {
      return now + this.windowSizeMs;
    }
    return this.timestamps[0] + this.windowSizeMs;
  }

  /**
   * Get detailed status for debugging/monitoring
   */
  getStatus(): WindowLogStatus {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;
    const validTimestamps = this.timestamps.filter(t => t >= windowStart);

    return {
      requestsInWindow: validTimestamps.length,
      limit: this.limit,
      remaining: Math.max(0, this.limit - validTimestamps.length),
      oldestRequest: validTimestamps[0],
      newestRequest: validTimestamps[validTimestamps.length - 1]
    };
  }
}

interface WindowLogStatus {
  requestsInWindow: number;
  limit: number;
  remaining: number;
  oldestRequest?: number;
  newestRequest?: number;
}
```

Redis sorted sets are ideal for sliding window logs—timestamps serve as scores, enabling efficient range queries:
```lua
-- Redis Lua script for sliding window log
-- KEYS[1] = rate limit key
-- ARGV[1] = current timestamp (ms)
-- ARGV[2] = window size (ms)
-- ARGV[3] = limit

local key = KEYS[1]
local now = tonumber(ARGV[1])
local windowSize = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local windowStart = now - windowSize

-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', windowStart)

-- Count entries in window
local count = redis.call('ZCARD', key)

if count < limit then
  -- Add new entry with current timestamp as score
  -- Use timestamp + random to handle concurrent requests
  local member = now .. ':' .. math.random(1000000)
  redis.call('ZADD', key, now, member)

  -- Set expiry to clean up inactive keys
  redis.call('PEXPIRE', key, windowSize * 2)

  return {1, limit - count - 1, 0} -- allowed, remaining, retryAfter
else
  -- Get oldest entry to calculate retry time
  local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
  local oldestTimestamp = tonumber(oldest[2]) or now
  local retryAfter = math.ceil((oldestTimestamp + windowSize - now) / 1000)

  return {0, 0, retryAfter} -- denied, remaining=0, retryAfter
end
```

Per-request storage: every allowed request adds one entry to the log. In process memory that is at least an 8-byte timestamp; as a Redis sorted-set member (timestamp plus a uniqueness suffix) it typically costs a few dozen bytes per entry once overhead is included.
Example calculation: with a limit of 10,000 requests per hour and 1 million active clients, a fully utilized log could hold 10 billion entries, which works out to hundreds of gigabytes of state.
This is clearly impractical for high-volume, high-limit scenarios. The sliding window log is best suited for:
- Low limits on high-value endpoints (for example, login attempts or password resets)
- Billing-critical limits where over-allowing has direct financial impact
- Compliance requirements that mandate exact enforcement
The Sliding Window Counter combines fixed windows' efficiency with sliding windows' smooth behavior by estimating the request count in the current sliding window using weighted counts from adjacent fixed windows.
The Key Insight:
Instead of tracking every timestamp, we track counts in fixed windows and interpolate. If we're 30% into the current fixed window, we weight the previous window's count at 70% (the portion of the sliding window that still overlaps it) and count the current window in full.
This approximation eliminates boundary spikes while using only O(1) memory per client.
Estimated requests in sliding window:
sliding_count = (previous_window_count × previous_weight) + current_window_count
where:
previous_weight = 1 - (elapsed_in_current / window_size)
elapsed_in_current = current_time - current_window_start
Example:
Limit: 100 requests per minute
Current time: 12:05:30
Window size: 60 seconds
Previous window (12:04:00-12:05:00): 80 requests
Current window (12:05:00-ongoing): 30 requests so far
elapsed_in_current = 30 seconds
previous_weight = 1 - (30/60) = 0.5
sliding_count = (80 × 0.5) + 30
= 40 + 30
= 70 requests
Limit = 100, current estimate = 70
→ Request ALLOWED (30 remaining)
```typescript
/**
 * Sliding Window Counter Rate Limiter
 * O(1) memory with ~1-2% accuracy variance
 */
class SlidingWindowCounter {
  private previousWindowCount: number = 0;
  private currentWindowCount: number = 0;
  private currentWindowStart: number;

  constructor(
    private readonly limit: number,
    private readonly windowSizeMs: number
  ) {
    // Align to window boundary for consistency
    const now = Date.now();
    this.currentWindowStart = Math.floor(now / windowSizeMs) * windowSizeMs;
  }

  check(): RateLimitResult {
    const now = Date.now();
    this.updateWindows(now);

    // Calculate sliding window estimate
    const elapsedInCurrent = now - this.currentWindowStart;
    const previousWeight = 1 - (elapsedInCurrent / this.windowSizeMs);
    const estimatedCount =
      (this.previousWindowCount * previousWeight) + this.currentWindowCount;

    if (estimatedCount < this.limit) {
      this.currentWindowCount++;

      // Recalculate after increment
      const newEstimatedCount =
        (this.previousWindowCount * previousWeight) + this.currentWindowCount;

      return {
        allowed: true,
        remaining: Math.max(0, Math.floor(this.limit - newEstimatedCount)),
        resetAt: this.calculateResetTime(now, previousWeight)
      };
    }

    // Calculate retry time
    const retryAfter = this.calculateRetryAfter(now, estimatedCount, previousWeight);

    return {
      allowed: false,
      remaining: 0,
      resetAt: this.currentWindowStart + this.windowSizeMs,
      retryAfter
    };
  }

  /**
   * Slide windows forward as time progresses
   */
  private updateWindows(now: number): void {
    const currentWindow = Math.floor(now / this.windowSizeMs);
    const storedWindow = Math.floor(this.currentWindowStart / this.windowSizeMs);

    if (currentWindow === storedWindow) {
      // Still in the same window
      return;
    }

    if (currentWindow === storedWindow + 1) {
      // Moved to the next window
      this.previousWindowCount = this.currentWindowCount;
      this.currentWindowCount = 0;
      this.currentWindowStart = currentWindow * this.windowSizeMs;
    } else {
      // Skipped one or more windows (long idle period)
      this.previousWindowCount = 0;
      this.currentWindowCount = 0;
      this.currentWindowStart = currentWindow * this.windowSizeMs;
    }
  }

  /**
   * Calculate when to retry if rate limited
   */
  private calculateRetryAfter(
    now: number,
    currentCount: number,
    previousWeight: number
  ): number {
    // We need: (prev × newWeight) + current < limit
    // As time passes, previousWeight decreases
    // Find when the estimate drops below the limit
    const excess = currentCount - this.limit + 1;
    if (excess <= 0) return 0;

    // Simplified: wait for more of the previous window to expire
    const fractionToWait = excess / this.previousWindowCount;
    const msToWait = fractionToWait * this.windowSizeMs;

    return Math.ceil(msToWait / 1000);
  }

  private calculateResetTime(now: number, previousWeight: number): number {
    // Window resets when current window fully expires
    return this.currentWindowStart + (2 * this.windowSizeMs);
  }
}
```

The sliding window counter requires just two keys per client:
```lua
-- Redis Lua script for sliding window counter
-- KEYS[1] = current window key (e.g., "ratelimit:user123:202301011200")
-- KEYS[2] = previous window key (e.g., "ratelimit:user123:202301011159")
-- ARGV[1] = limit
-- ARGV[2] = window size in seconds
-- ARGV[3] = current timestamp (ms)

local currentKey = KEYS[1]
local previousKey = KEYS[2]
local limit = tonumber(ARGV[1])
local windowSize = tonumber(ARGV[2]) * 1000 -- Convert to ms
local now = tonumber(ARGV[3])

-- Get counts from both windows
local previousCount = tonumber(redis.call('GET', previousKey) or '0')
local currentCount = tonumber(redis.call('GET', currentKey) or '0')

-- Calculate the sliding window estimate
local currentWindowStart = math.floor(now / windowSize) * windowSize
local elapsedInCurrent = now - currentWindowStart
local previousWeight = 1 - (elapsedInCurrent / windowSize)

local estimate = (previousCount * previousWeight) + currentCount

if estimate < limit then
  -- Increment current window counter
  redis.call('INCR', currentKey)

  -- Set TTL (2 window sizes to cover previous + current)
  redis.call('PEXPIRE', currentKey, windowSize * 2)

  -- Calculate new remaining
  local newEstimate = (previousCount * previousWeight) + currentCount + 1
  local remaining = math.max(0, math.floor(limit - newEstimate))

  return {1, remaining, 0} -- allowed, remaining, retryAfter
else
  -- Calculate retry time
  local excess = estimate - limit + 1
  local fractionToWait = excess / (previousCount + 0.01) -- Avoid div by 0
  local msToWait = fractionToWait * windowSize
  local retryAfter = math.ceil(msToWait / 1000)

  return {0, 0, retryAfter} -- denied, remaining, retryAfter
end
```

Use predictable key names that include the window timestamp: ratelimit:{client}:{window_number}. This allows calculating previous and current window keys from any timestamp. Example: ratelimit:user123:1672531200 where 1672531200 is the window start epoch.
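As a small sketch of that key scheme (the helper name is hypothetical and not part of the Lua script above), an application could derive the two key names it passes as KEYS[1] and KEYS[2] like this:

```typescript
// Derive current and previous window keys for the sliding window counter script.
// Window numbers are epoch seconds truncated to the window boundary.
function windowKeys(clientId: string, nowMs: number, windowSizeSec: number): [string, string] {
  const windowSizeMs = windowSizeSec * 1000;
  const currentWindow = Math.floor(nowMs / windowSizeMs) * windowSizeSec;
  const previousWindow = currentWindow - windowSizeSec;
  return [
    `ratelimit:${clientId}:${currentWindow}`,   // KEYS[1]
    `ratelimit:${clientId}:${previousWindow}`,  // KEYS[2]
  ];
}

// Example: windowKeys('user123', 1672531230000, 60)
// -> ['ratelimit:user123:1672531200', 'ratelimit:user123:1672531140']
```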
| Aspect | Value | Notes |
|---|---|---|
| Time Complexity | O(1) | Fixed number of operations |
| Space Complexity | O(1) | Just 2 counters per client |
| Accuracy | ~98-99% | Slightly conservative estimate |
| Boundary Spikes | Eliminated | Smooth transition between windows |
| Memory per Client | ~32 bytes | Two 8-byte counters + overhead |
The sliding window counter makes an approximation that has bounded error. Understanding this error helps determine when the algorithm is appropriate.
The approximation assumes requests are uniformly distributed within each window. When traffic is bursty, the estimate can diverge from reality:
Scenario 1: Traffic Front-Loaded
Previous window: 100 requests, all in first 10 seconds
Current time: 30 seconds into current window (50% through)
Actual previous contribution: those 100 requests are now 80-90 seconds old, entirely outside the trailing 60-second window → 0 should count
Our weight: 1 - 0.5 = 50%
Estimate: 100 × 0.5 = 50 (from previous)
Actual: 0 (from previous)
→ We over-estimate by 50 requests (conservative error)
Scenario 2: Traffic Back-Loaded
Previous window: 100 requests, all in last 10 seconds
Current time: 30 seconds into current window (50% through)
Actual previous weight: All 100 still relevant → 100%
Our weight: 1 - 0.5 = 50%
Estimate: 100 × 0.5 = 50 (from previous)
Actual: 100 × 1.0 = 100 (from previous)
→ We under-estimate by 50 requests (allows burst)
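A quick way to sanity-check these two scenarios numerically (an illustrative helper, not from the original text) is to compute the weighted estimate and compare it with an exact count over the same timestamps:

```typescript
// Compare the sliding-window-counter estimate with an exact trailing-window count.
// "now" is 30 seconds into the current window in both scenarios; window size is 60s.
function estimate(prevCount: number, currCount: number, elapsedInCurrentMs: number, windowMs: number): number {
  const previousWeight = 1 - elapsedInCurrentMs / windowMs;
  return prevCount * previousWeight + currCount;
}

function exactCount(timestampsMs: number[], nowMs: number, windowMs: number): number {
  return timestampsMs.filter(t => t > nowMs - windowMs && t <= nowMs).length;
}

const windowMs = 60_000;
const now = 90_000; // previous window was [0s, 60s); we are 30s into the current one

// Scenario 1: 100 requests in the first 10s of the previous window
const frontLoaded = Array.from({ length: 100 }, (_, i) => i * 100);          // 0.0s .. 9.9s
console.log(estimate(100, 0, 30_000, windowMs));      // 50 (over-estimate)
console.log(exactCount(frontLoaded, now, windowMs));  // 0

// Scenario 2: 100 requests in the last 10s of the previous window
const backLoaded = Array.from({ length: 100 }, (_, i) => 50_000 + i * 100);  // 50.0s .. 59.9s
console.log(estimate(100, 0, 30_000, windowMs));      // 50 (under-estimate)
console.log(exactCount(backLoaded, now, windowMs));   // 100
```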
Worst Case:
Maximum over-count happens when the previous window received its full limit of requests, all clustered at the very start of that window, so none of them actually fall inside the trailing window.
Maximum under-count happens when the previous window received roughly limit × (1 - 1/windowSize) requests, all clustered at the very end of that window, so all of them are still inside the trailing window but receive only partial weight.
Practical Error: With real traffic patterns (not adversarial), Cloudflare reports <1% error in production. The algorithm is conservative on average—it tends to allow slightly fewer requests than the exact limit.
| Algorithm | Accuracy | Error Pattern | Memory |
|---|---|---|---|
| Fixed Window | ±100% | Boundary spikes allow 2× limit | O(1) |
| Sliding Window Log | 100% | Perfect, no error | O(limit) |
| Sliding Window Counter | ~99% | Slight under/over-count based on distribution | O(1) |
| Token Bucket | N/A | Different model (average rate + burst) | O(1) |
For most APIs, the sliding window counter's ~99% accuracy is perfectly acceptable. Use the sliding window log (or combine with external enforcement) for: billing-critical limits where over-allowing has financial impact, security rate limits where precision prevents attacks, and compliance requirements mandating exact enforcement.
With multiple algorithms available, choosing the right one for your use case is critical. Let's create a decision framework.
Use Token Bucket when:
- Traffic is naturally bursty and you want to permit short bursts above the average rate
- You care about average rate plus burst capacity rather than an exact count per window

Use Fixed Window when:
- You need the simplest possible implementation (prototypes, internal tooling)
- Rough enforcement is acceptable and boundary spikes are tolerable

Use Sliding Window Counter when:
- You need smooth, boundary-free enforcement with O(1) memory per client
- You are limiting many clients or endpoints and ~99% accuracy is sufficient

Use Sliding Window Log when:
- Limits must be exact: billing-critical endpoints, authentication attempts, compliance requirements
- Limits are low enough that storing a timestamp per request is affordable
| Use Case | Recommended | Rationale |
|---|---|---|
| General API rate limiting | Sliding Window Counter | Balance of accuracy and efficiency |
| High-value endpoints with low limits | Sliding Window Log | Perfect accuracy justifies memory cost |
| Allowing burst with average rate | Token Bucket | Natural fit for bursty traffic |
| Per-endpoint limiting with many endpoints | Sliding Window Counter | O(1) memory scales well |
| Authentication/login attempts | Sliding Window Log | Security requires precision |
| Prototype/MVP | Fixed Window | Simplest to implement quickly |
| Combined burst + rate | Token Bucket + Sliding Window Counter | Layered limiting strategy |
Production systems often combine algorithms:
Token Bucket + Sliding Window Counter:
1. Token Bucket: Allow 100 burst, refill 10/sec
→ Controls instant burst capacity
2. Sliding Window Counter: 1000/hour
→ Controls sustained rate
A request must pass BOTH checks. This prevents:
- Exhausting the entire hourly quota in one instantaneous burst (the token bucket caps burst size)
- Sustaining a high request rate for a full hour just below the burst threshold (the sliding window caps the total)

A minimal sketch of this layered check follows.
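This sketch assumes a TokenBucket class with the same check() shape as the limiters above; its actual implementation lives on the previous page, so treat the interface here as an assumption.

```typescript
// Layered limiter: a request must pass both the burst check and the sustained-rate check.
// TokenBucket is assumed to expose check(): { allowed: boolean } like the classes above.
class LayeredRateLimiter {
  constructor(
    private readonly burst: TokenBucket,              // e.g. 100 burst, refill 10/sec
    private readonly sustained: SlidingWindowCounter  // e.g. 1000 per hour
  ) {}

  check(): { allowed: boolean; reason?: string } {
    if (!this.burst.check().allowed) {
      return { allowed: false, reason: 'burst limit exceeded' };
    }
    if (!this.sustained.check().allowed) {
      return { allowed: false, reason: 'hourly limit exceeded' };
    }
    return { allowed: true };
  }
}
```

Note that when the first check passes but the second fails, a token has already been consumed; a production implementation would typically refund it, which this sketch omits.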
Per-Endpoint + Global Limits:
1. Global: 10,000 requests/hour per user (any endpoint)
2. /search: 100 requests/minute per user
3. /export: 10 requests/hour per user
Checking all applicable rules ensures expensive operations don't consume all quota while still limiting overall usage.
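One way to express that layering (a hypothetical structure, not from the original text) is as a list of rules that are all evaluated for every request, with endpoint-specific rules sitting alongside the global one:

```typescript
// Hypothetical rule table: every matching rule must allow the request.
// In practice you would keep one limiter instance per (user, rule) pair.
interface LimitRule {
  name: string;
  matches: (path: string) => boolean;
  limiter: SlidingWindowCounter;
}

const rules: LimitRule[] = [
  { name: 'global', matches: () => true,                   limiter: new SlidingWindowCounter(10_000, 3_600_000) },
  { name: 'search', matches: p => p.startsWith('/search'), limiter: new SlidingWindowCounter(100, 60_000) },
  { name: 'export', matches: p => p.startsWith('/export'), limiter: new SlidingWindowCounter(10, 3_600_000) },
];

function checkAll(path: string): { allowed: boolean; violated?: string } {
  for (const rule of rules) {
    if (rule.matches(path) && !rule.limiter.check().allowed) {
      return { allowed: false, violated: rule.name };
    }
  }
  return { allowed: true };
}
```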
Begin with a single sliding window counter for your entire API. Monitor usage patterns. Add endpoint-specific limits only for endpoints that need them. Layer token buckets only if burst control becomes important. Complexity has maintenance costs—add it only when data justifies it.
For systems handling millions of requests per second, even O(1) algorithms need optimization. Let's explore techniques used at massive scale.
Instead of one window size, use multiple resolutions for different purposes:
```typescript
// Coarse windows for long-term limits
const hourlyCounter = new SlidingWindowCounter(10000, 3600 * 1000);

// Fine windows for burst prevention
const secondCounter = new SlidingWindowCounter(100, 1000);

function checkLimit(request: Request): boolean {
  // Short-term check first (faster to fail)
  if (!secondCounter.check().allowed) return false;

  // Then long-term check
  if (!hourlyCounter.check().allowed) return false;

  return true;
}
```
Benefits:
- The cheap one-second check rejects abusive bursts before the hourly counter is ever touched
- The hourly counter still bounds sustained usage over the long term
- Both counters remain O(1) in time and memory
For extremely high limits or cardinality counting (unique IPs), exact counting is impractical. HyperLogLog provides approximate counting with bounded error:
```lua
-- Redis HyperLogLog for unique visitors per hour
local key = 'ratelimit:unique_ips:' .. window_hour
redis.call('PFADD', key, client_ip)
redis.call('EXPIRE', key, 7200) -- 2 hours

local unique_count = redis.call('PFCOUNT', key)
if unique_count > limit then
  return 0 -- Rate limited
end
```
HyperLogLog characteristics:
- Fixed memory of roughly 12 KB per key, regardless of how many distinct items are added
- Standard error of about 0.81% on the reported count
- PFADD and PFCOUNT run in constant time, and counters can be merged across keys with PFMERGE
```typescript
/**
 * Optimized Sliding Window with local caching
 * Reduces Redis round-trips for high-throughput scenarios
 */
class OptimizedSlidingWindow {
  private localCache: Map<string, LocalCounter> = new Map();
  private readonly localIncrement: number;
  private readonly syncIntervalMs: number;

  constructor(
    private readonly redis: RedisClient,
    private readonly limit: number,
    private readonly windowSizeMs: number,
    nodeCount: number = 10
  ) {
    // Each node gets a portion of the limit for local counting
    this.localIncrement = Math.max(1, Math.floor(limit / (nodeCount * 10)));
    this.syncIntervalMs = 100; // Sync with Redis every 100ms

    // Periodic sync
    setInterval(() => this.syncToRedis(), this.syncIntervalMs);
  }

  async check(clientId: string): Promise<RateLimitResult> {
    let local = this.localCache.get(clientId);
    const now = Date.now();

    if (!local || local.expiry < now) {
      // Cache miss or expired - fetch from Redis
      const redisResult = await this.fetchFromRedis(clientId, now);
      local = {
        count: redisResult.count,
        pending: 0,
        expiry: now + this.syncIntervalMs
      };
      this.localCache.set(clientId, local);
    }

    // Estimate: Redis count + pending local increments
    const estimate = local.count + local.pending;

    if (estimate < this.limit) {
      // Increment locally, batch sync to Redis
      local.pending++;
      return {
        allowed: true,
        remaining: this.limit - estimate - 1,
        resetAt: this.calculateReset(now)
      };
    }

    return {
      allowed: false,
      remaining: 0,
      resetAt: this.calculateReset(now),
      retryAfter: Math.ceil(this.windowSizeMs / 1000 / this.limit)
    };
  }

  private async syncToRedis(): Promise<void> {
    const pipeline = this.redis.pipeline();

    for (const [clientId, local] of this.localCache) {
      if (local.pending > 0) {
        // Batch increment
        pipeline.incrby(this.getRedisKey(clientId), local.pending);
        local.pending = 0;
      }
    }

    await pipeline.exec();
  }

  private async fetchFromRedis(clientId: string, now: number): Promise<{ count: number }> {
    const key = this.getRedisKey(clientId);
    const count = await this.redis.get(key);
    return { count: parseInt(count || '0', 10) };
  }

  private getRedisKey(clientId: string): string {
    const window = Math.floor(Date.now() / this.windowSizeMs);
    return `ratelimit:${clientId}:${window}`;
  }

  private calculateReset(now: number): number {
    const windowStart = Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
    return windowStart + this.windowSizeMs;
  }
}

interface LocalCounter {
  count: number;   // Last known Redis count
  pending: number; // Local increments not yet synced
  expiry: number;  // When to refresh from Redis
}
```

We've explored the family of sliding window algorithms that solve the boundary-spike problem of fixed windows. Let's consolidate our understanding:
- Fixed windows are simple and O(1), but allow up to 2× the limit at window boundaries
- The sliding window log is exact, but stores a timestamp per request, so memory grows with the limit and client count
- The sliding window counter interpolates between two fixed counters, eliminating boundary spikes with O(1) memory at the cost of roughly 1% estimation error
What's Next:
We've mastered single-node rate limiting algorithms. But in production, APIs run on multiple servers across multiple regions. How do we coordinate rate limits across distributed systems? In the next page, we'll explore Distributed Rate Limiting—the challenge of maintaining consistent limits when state is spread across a fleet of servers.
You now understand the full spectrum of sliding window algorithms—from the simple fixed window to the precise sliding window log to the efficient sliding window counter. You can analyze accuracy trade-offs, choose appropriate algorithms, and implement optimized solutions. Next, we'll tackle the distributed systems challenge.