Every public API is under constant attack. Automated scripts probe for weaknesses, botnets attempt credential stuffing, and competitors scrape data. Without rate limiting, a single malicious actor can overwhelm your infrastructure, exhaust your resources, and deny service to legitimate users.
Rate limiting for security goes far beyond simple "100 requests per minute" throttling. Modern rate limiting systems must distinguish failures from successes, coordinate limits across distributed servers, resist deliberate bypass attempts, and adapt to evolving attack patterns.
This page explores rate limiting as a core security control, not just a resource management tool.
By the end of this page, you will understand rate limiting algorithms (token bucket, sliding window), security-specific patterns for authentication endpoints, distributed rate limiting architectures, bypass prevention techniques, and intelligent abuse detection systems that adapt to attack patterns.
Traditional rate limiting protects resources from overload. Security-focused rate limiting protects assets from exploitation. The distinction matters:
Resource Protection: caps overall request volume so that traffic spikes, accidental or malicious, cannot overload servers, databases, or downstream dependencies.
Security Protection: applies targeted limits to exploitable operations (logins, password resets, enumeration-prone lookups) so that attacks become slow, noisy, and expensive.
The same API often needs both: a general rate limit protects infrastructure, while security-specific limits protect authentication flows.
| Rate Limit Type | Primary Purpose | Example | Security Benefit |
|---|---|---|---|
| Global rate limit | Infrastructure protection | 10,000 req/min total | DDoS mitigation |
| Per-API-key limit | Fair usage enforcement | 1,000 req/min per key | Limits compromised key blast radius |
| Per-IP limit | Bot protection | 100 req/min per IP | Slows enumeration attacks |
| Per-user limit | Account protection | 10 auth attempts/hour | Prevents brute force |
| Per-endpoint limit | Sensitive operation protection | 5 password resets/day | Prevents abuse of high-value operations |
| Failure-based limit | Attack detection | 5 failed logins trigger lockout | Stops credential stuffing |
Rate limits slow down attacks but don't prevent them. An attacker willing to wait can still eventually brute-force a weak password or enumerate all user IDs. Rate limits buy time for detection and response, and make attacks economically unviable. Always combine with strong authentication, monitoring, and alerting.
Choosing the right algorithm affects both security effectiveness and user experience. Each algorithm has distinct characteristics for handling burst traffic and edge cases.
Common Algorithms: fixed window, sliding window, and token bucket.
For security applications, sliding window and token bucket are most common. Fixed window has a dangerous edge case: by clustering requests on either side of a window boundary, an attacker can send up to 2x the limit in a short burst.
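The boundary flaw is easy to demonstrate with a toy in-memory fixed-window counter (an illustrative sketch; the class and method names here are ours, not from any library):

```python
from collections import defaultdict


class FixedWindowCounter:
    """Naive fixed-window limiter: the count resets at every window boundary."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        # All requests in the same window share one counter
        window_id = int(now // self.window)
        if self.counts[(key, window_id)] >= self.limit:
            return False
        self.counts[(key, window_id)] += 1
        return True


# 100 req/min limit, but an attacker sends 100 requests at t=59s
# and 100 more at t=60s: every one of the 200 passes, because the
# second burst lands in a fresh window.
limiter = FixedWindowCounter(limit=100, window_seconds=60)
burst_one = sum(limiter.allow("attacker", 59.0) for _ in range(100))
burst_two = sum(limiter.allow("attacker", 60.0) for _ in range(100))
print(burst_one + burst_two)  # 200 requests allowed in about one second
```

The sliding window algorithm below closes this gap by weighting the previous window's count into the current total.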
```python
import time
import redis
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Tuple, Optional


@dataclass
class RateLimitResult:
    """Result of a rate limit check."""
    allowed: bool
    remaining: int
    reset_at: float
    retry_after: Optional[float] = None


class RateLimiter(ABC):
    """Base class for rate limiters."""

    @abstractmethod
    def check(self, key: str) -> RateLimitResult:
        """Check if request is allowed and consume one token."""
        pass


class SlidingWindowRateLimiter(RateLimiter):
    """
    Sliding window counter algorithm.

    Combines current and previous window counts with weighting,
    providing accurate rate limiting without per-request storage.

    Example: 100 req/min limit
    - Previous window (0:00-1:00): 80 requests
    - Current window (1:00-2:00): 30 requests
    - Current time: 1:30 (halfway through current window)
    - Weighted count: 80 * 0.5 + 30 = 70 (under limit)
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int,
        window_seconds: int,
    ):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def check(self, key: str) -> RateLimitResult:
        now = time.time()

        # Current and previous window keys
        current_window = int(now // self.window)
        previous_window = current_window - 1
        current_key = f"{key}:{current_window}"
        previous_key = f"{key}:{previous_window}"

        # Get counts (pipeline for efficiency)
        pipe = self.redis.pipeline()
        pipe.get(previous_key)
        pipe.get(current_key)
        results = pipe.execute()

        previous_count = int(results[0] or 0)
        current_count = int(results[1] or 0)

        # Calculate window position (0.0 to 1.0)
        window_position = (now % self.window) / self.window

        # Weighted count: previous window contribution decreases over time
        weighted_count = previous_count * (1 - window_position) + current_count

        # Check limit
        if weighted_count >= self.limit:
            reset_at = (current_window + 1) * self.window
            return RateLimitResult(
                allowed=False,
                remaining=0,
                reset_at=reset_at,
                retry_after=reset_at - now,
            )

        # Increment current window counter
        pipe = self.redis.pipeline()
        pipe.incr(current_key)
        pipe.expire(current_key, self.window * 2)  # Keep for next window's calculation
        pipe.execute()

        return RateLimitResult(
            allowed=True,
            remaining=max(0, int(self.limit - weighted_count - 1)),
            reset_at=(current_window + 1) * self.window,
        )


class TokenBucketRateLimiter(RateLimiter):
    """
    Token bucket algorithm.

    Tokens are added at a constant rate up to a maximum.
    Each request consumes one token. Allows controlled bursts
    when bucket is full.

    Useful for APIs where occasional bursts are acceptable
    but sustained high rates should be limited.
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        capacity: int,
        refill_rate: float,  # tokens per second
    ):
        self.redis = redis_client
        self.capacity = capacity
        self.refill_rate = refill_rate

    def check(self, key: str) -> RateLimitResult:
        now = time.time()

        # Lua script for atomic token bucket (prevents race conditions)
        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        -- Get current state
        local bucket = redis.call('HMGET', key, 'tokens', 'last_update')
        local tokens = tonumber(bucket[1]) or capacity
        local last_update = tonumber(bucket[2]) or now

        -- Calculate tokens to add since last update
        local elapsed = now - last_update
        tokens = math.min(capacity, tokens + elapsed * refill_rate)

        -- Check if request can be fulfilled
        if tokens < 1 then
            -- Calculate when a token will be available
            local wait_time = (1 - tokens) / refill_rate
            return {0, tokens, wait_time}
        end

        -- Consume token
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_update', now)
        redis.call('EXPIRE', key, 3600)  -- Cleanup after 1 hour of inactivity

        return {1, tokens, 0}
        """

        result = self.redis.eval(
            lua_script,
            1,
            key,
            self.capacity,
            self.refill_rate,
            now,
        )

        allowed, remaining, retry_after = result

        return RateLimitResult(
            allowed=bool(allowed),
            remaining=int(remaining),
            reset_at=now + (self.capacity - remaining) / self.refill_rate,
            retry_after=retry_after if retry_after > 0 else None,
        )


class FailureBasedRateLimiter:
    """
    Rate limiter that tracks failures, not total requests.

    Perfect for authentication endpoints where:
    - Successful requests should not count against limit
    - Failed attempts indicate potential attack
    - Progressive lockout after repeated failures
    """

    LOCKOUT_LEVELS = [
        (5, 60),      # After 5 failures: 1 minute lockout
        (10, 300),    # After 10 failures: 5 minute lockout
        (15, 1800),   # After 15 failures: 30 minute lockout
        (20, 3600),   # After 20 failures: 1 hour lockout
    ]

    def __init__(self, redis_client: redis.Redis, window_seconds: int = 3600):
        self.redis = redis_client
        self.window = window_seconds

    def record_failure(self, key: str) -> Tuple[bool, int, Optional[int]]:
        """
        Record a failed attempt.

        Returns:
            Tuple of (is_locked_out, failure_count, lockout_seconds)
        """
        failure_key = f"failures:{key}"

        pipe = self.redis.pipeline()
        pipe.incr(failure_key)
        pipe.expire(failure_key, self.window)
        results = pipe.execute()

        failure_count = results[0]

        # Determine lockout based on failure count
        lockout_seconds = 0
        for threshold, duration in self.LOCKOUT_LEVELS:
            if failure_count >= threshold:
                lockout_seconds = duration

        if lockout_seconds > 0:
            lockout_key = f"lockout:{key}"
            self.redis.setex(lockout_key, lockout_seconds, "1")
            return True, failure_count, lockout_seconds

        return False, failure_count, None

    def record_success(self, key: str) -> None:
        """
        Record a successful attempt.

        Clears failure count and any lockout.
        """
        failure_key = f"failures:{key}"
        lockout_key = f"lockout:{key}"
        self.redis.delete(failure_key, lockout_key)

    def is_locked_out(self, key: str) -> Tuple[bool, Optional[int]]:
        """
        Check if key is currently locked out.

        Returns:
            Tuple of (is_locked_out, seconds_remaining)
        """
        lockout_key = f"lockout:{key}"
        ttl = self.redis.ttl(lockout_key)

        if ttl > 0:
            return True, ttl

        return False, None
```

For authentication endpoints, use failure-based limiting.
For general API protection, sliding window offers the best balance. Token bucket is ideal when you want to allow legitimate burst traffic (e.g., initial page loads) while capping sustained rates.
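The burst-then-throttle behavior is the token bucket's defining property. A minimal in-memory sketch (single-process, unlike the Redis-backed version above; names are illustrative) makes it concrete:

```python
class TokenBucket:
    """Minimal in-memory token bucket (single-process sketch)."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_rate,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)
# A full bucket absorbs an initial burst of 5 requests...
print([bucket.allow(0.0) for _ in range(6)])  # [True, True, True, True, True, False]
# ...after which sustained traffic is capped at the refill rate.
print(bucket.allow(1.0))  # True: one token refilled after 1 second
```

This is why token bucket suits page-load bursts: the capacity sets the burst allowance, while the refill rate sets the long-run ceiling.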
In distributed systems with multiple servers, rate limiting must be coordinated to prevent attackers from rotating requests across servers to bypass limits.
Centralized vs. Local Rate Limiting:
| Approach | Accuracy | Latency | Availability | Complexity |
|---|---|---|---|---|
| Local only | Low | None | High | Low |
| Centralized (Redis) | High | 1-5ms | Medium | Medium |
| Eventually consistent | Medium | Near-zero | High | High |
| Synchronous consensus | Very High | 10-50ms | Low | Very High |
For security purposes, centralized (Redis) is the standard choice. The 1-5ms latency is acceptable, and accuracy matters when detecting attacks.
```go
package ratelimit

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// DistributedRateLimiter provides cluster-wide rate limiting
type DistributedRateLimiter struct {
	redis     *redis.ClusterClient
	keyPrefix string
}

// LimitConfig defines rate limit parameters
type LimitConfig struct {
	Key    string        // Identifier (IP, user ID, API key)
	Limit  int           // Max requests
	Window time.Duration // Time window
	Cost   int           // Request cost (default 1)
}

// LimitResult contains the rate limit check result
type LimitResult struct {
	Allowed    bool
	Remaining  int
	ResetAt    time.Time
	RetryAfter time.Duration
}

// NewDistributedRateLimiter creates a new distributed limiter
func NewDistributedRateLimiter(
	redis *redis.ClusterClient,
	keyPrefix string,
) *DistributedRateLimiter {
	return &DistributedRateLimiter{
		redis:     redis,
		keyPrefix: keyPrefix,
	}
}

// Check performs an atomic rate limit check using Lua
func (rl *DistributedRateLimiter) Check(
	ctx context.Context,
	config LimitConfig,
) (*LimitResult, error) {
	if config.Cost == 0 {
		config.Cost = 1
	}

	windowSecs := int(config.Window.Seconds())
	now := time.Now()

	// Sliding window counter with Lua for atomicity
	luaScript := `
		local key = KEYS[1]
		local window = tonumber(ARGV[1])
		local limit = tonumber(ARGV[2])
		local now = tonumber(ARGV[3])
		local cost = tonumber(ARGV[4])

		local current_window = math.floor(now / window)
		local previous_window = current_window - 1
		local current_key = key .. ":" .. current_window
		local previous_key = key .. ":" .. previous_window

		-- Get counts
		local previous_count = tonumber(redis.call('GET', previous_key) or 0)
		local current_count = tonumber(redis.call('GET', current_key) or 0)

		-- Calculate weighted count
		local position = (now % window) / window
		local weighted = previous_count * (1 - position) + current_count

		-- Check limit
		if weighted + cost > limit then
			local reset_at = (current_window + 1) * window
			return {0, math.max(0, math.floor(limit - weighted)), reset_at}
		end

		-- Increment
		redis.call('INCRBY', current_key, cost)
		redis.call('EXPIRE', current_key, window * 2)

		local remaining = math.max(0, math.floor(limit - weighted - cost))
		local reset_at = (current_window + 1) * window
		return {1, remaining, reset_at}
	`

	key := fmt.Sprintf("%s:%s", rl.keyPrefix, config.Key)

	result, err := rl.redis.Eval(ctx, luaScript, []string{key},
		windowSecs, config.Limit, now.Unix(), config.Cost,
	).Slice()
	if err != nil {
		return nil, fmt.Errorf("rate limit check failed: %w", err)
	}

	allowed := result[0].(int64) == 1
	remaining := int(result[1].(int64))
	resetAt := time.Unix(result[2].(int64), 0)

	var retryAfter time.Duration
	if !allowed {
		retryAfter = resetAt.Sub(now)
	}

	return &LimitResult{
		Allowed:    allowed,
		Remaining:  remaining,
		ResetAt:    resetAt,
		RetryAfter: retryAfter,
	}, nil
}

// MultiDimensionLimiter applies multiple rate limits
type MultiDimensionLimiter struct {
	limiter *DistributedRateLimiter
	limits  []LimitConfig
}

// CheckAll applies all configured limits
func (mdl *MultiDimensionLimiter) CheckAll(
	ctx context.Context,
	dimensions map[string]string,
) (*LimitResult, string, error) {
	// Check each dimension
	for _, limitConfig := range mdl.limits {
		key, ok := dimensions[limitConfig.Key]
		if !ok {
			continue
		}

		config := LimitConfig{
			Key:    fmt.Sprintf("%s:%s", limitConfig.Key, key),
			Limit:  limitConfig.Limit,
			Window: limitConfig.Window,
			Cost:   limitConfig.Cost,
		}

		result, err := mdl.limiter.Check(ctx, config)
		if err != nil {
			return nil, "", err
		}

		if !result.Allowed {
			return result, limitConfig.Key, nil
		}
	}

	// All limits passed
	return &LimitResult{Allowed: true}, "", nil
}

// Example configuration for a login endpoint
func NewLoginRateLimiter(redis *redis.ClusterClient) *MultiDimensionLimiter {
	limiter := NewDistributedRateLimiter(redis, "login_limit")

	return &MultiDimensionLimiter{
		limiter: limiter,
		limits: []LimitConfig{
			// Per-IP: 10 attempts per minute
			{Key: "ip", Limit: 10, Window: time.Minute},
			// Per-username: 5 attempts per 15 minutes
			{Key: "username", Limit: 5, Window: 15 * time.Minute},
			// Global: 1000 per minute (DDoS protection)
			{Key: "global", Limit: 1000, Window: time.Minute},
		},
	}
}
```

If Redis is unavailable, don't fail open (allow all) or fail closed (block all). Implement fallback to local rate limiting with conservative limits. This maintains protection while avoiding cascading failures. Log centralized limiter failures for investigation.
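That fallback strategy can be sketched in a few lines (the wrapper class, its names, and the conservative local limit are our assumptions, shown here with a stub in place of a real Redis-backed check):

```python
import logging

logger = logging.getLogger(__name__)


class FallbackRateLimiter:
    """Wraps a centralized check; degrades to a conservative in-process
    fixed-window count when the central store is unreachable."""

    def __init__(self, central_check, local_limit: int, window_seconds: int):
        self.central_check = central_check  # callable(key) -> bool, may raise
        self.local_limit = local_limit      # deliberately stricter than normal
        self.window = window_seconds
        self.local_counts = {}

    def allow(self, key: str, now: float) -> bool:
        try:
            return self.central_check(key)
        except Exception:
            # Central limiter is down: log for investigation, then
            # count locally rather than failing open or closed.
            logger.warning("central rate limiter unavailable; using local fallback")
            window_id = int(now // self.window)
            count = self.local_counts.get((key, window_id), 0)
            if count >= self.local_limit:
                return False
            self.local_counts[(key, window_id)] = count + 1
            return True


def broken_central(key):
    """Stub simulating an unreachable Redis."""
    raise ConnectionError("redis down")


limiter = FallbackRateLimiter(broken_central, local_limit=2, window_seconds=60)
print([limiter.allow("ip:1.2.3.4", 0.0) for _ in range(3)])  # [True, True, False]
```

Each server enforcing a stricter local limit during an outage means the cluster-wide total stays bounded even without coordination.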
Authentication endpoints (login, password reset, registration) are the most attacked surfaces of any API. They require specialized rate limiting strategies that go beyond simple request counting.
Attack Patterns to Defend Against: credential stuffing, password brute force, username enumeration, and abuse of password reset, registration, and MFA verification flows.
```python
import hashlib
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional, Tuple

import redis

logger = logging.getLogger(__name__)


class AuthAction(Enum):
    LOGIN = "login"
    PASSWORD_RESET = "password_reset"
    REGISTRATION = "registration"
    MFA_VERIFY = "mfa_verify"


@dataclass
class AuthRateLimitResult:
    allowed: bool
    remaining_attempts: int
    lockout_until: Optional[datetime] = None
    requires_captcha: bool = False
    warning_message: Optional[str] = None


class AuthenticationRateLimiter:
    """
    Specialized rate limiter for authentication endpoints.

    Implements multi-dimensional limiting:
    - Per-IP: Blocks IPs with too many failures
    - Per-username: Protects specific accounts
    - Per-IP+username: Catches targeted attacks
    - Global: DDoS protection

    Also implements:
    - Progressive lockouts
    - CAPTCHA triggers
    - Username enumeration protection
    """

    # Thresholds for different actions: (count, window_seconds)
    LIMITS = {
        AuthAction.LOGIN: {
            "per_ip": (20, 300),       # 20 attempts per 5 minutes
            "per_username": (5, 900),  # 5 attempts per 15 minutes
            "per_combo": (3, 600),     # 3 attempts per 10 minutes
            "captcha_threshold": 3,    # Require CAPTCHA after 3 failures
        },
        AuthAction.PASSWORD_RESET: {
            "per_ip": (10, 3600),      # 10 per hour
            "per_email": (3, 3600),    # 3 per hour per email
            "per_combo": (2, 3600),    # 2 per hour
            "captcha_threshold": 1,    # Always CAPTCHA after 1
        },
        AuthAction.REGISTRATION: {
            "per_ip": (5, 3600),       # 5 registrations per hour per IP
            "captcha_threshold": 1,    # Always CAPTCHA
        },
        AuthAction.MFA_VERIFY: {
            "per_user": (5, 300),      # 5 attempts per 5 minutes
            "lockout_threshold": 5,    # Lock after 5 failures
        },
    }

    # Progressive lockout durations
    LOCKOUT_PROGRESSION = [
        (1, 0),        # First offense: no lockout
        (3, 60),       # After 3: 1 minute
        (5, 300),      # After 5: 5 minutes
        (10, 1800),    # After 10: 30 minutes
        (15, 3600),    # After 15: 1 hour
        (20, 86400),   # After 20: 24 hours
    ]

    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    def check_login(
        self,
        ip: str,
        username: str,
    ) -> AuthRateLimitResult:
        """
        Check rate limits for a login attempt.

        Returns AuthRateLimitResult with allowed status and
        any requirements (CAPTCHA, lockout info).
        """
        limits = self.LIMITS[AuthAction.LOGIN]

        # Check account lockout first
        lockout_result = self._check_lockout(f"user:{username}")
        if lockout_result:
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                lockout_until=lockout_result,
                warning_message="Account temporarily locked due to repeated failed attempts",
            )

        # Check IP-level lockout
        ip_lockout = self._check_lockout(f"ip:{ip}")
        if ip_lockout:
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                lockout_until=ip_lockout,
                warning_message="Too many attempts from this IP address",
            )

        # Get failure counts
        ip_failures = self._get_failure_count(f"ip:{ip}", limits["per_ip"][1])
        user_failures = self._get_failure_count(f"user:{username}", limits["per_username"][1])
        combo_failures = self._get_failure_count(
            f"combo:{self._hash_combo(ip, username)}", limits["per_combo"][1]
        )

        # Check limits
        if ip_failures >= limits["per_ip"][0]:
            self._apply_lockout(f"ip:{ip}", ip_failures)
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                warning_message="Too many login attempts. Please try again later.",
            )

        if user_failures >= limits["per_username"][0]:
            self._apply_lockout(f"user:{username}", user_failures)
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                warning_message="Account protected due to multiple failed attempts.",
            )

        # Determine CAPTCHA requirement
        max_failures = max(ip_failures, user_failures, combo_failures)
        requires_captcha = max_failures >= limits["captcha_threshold"]

        # Calculate remaining attempts (most restrictive)
        remaining = min(
            limits["per_ip"][0] - ip_failures,
            limits["per_username"][0] - user_failures,
        )

        return AuthRateLimitResult(
            allowed=True,
            remaining_attempts=remaining,
            requires_captcha=requires_captcha,
        )

    def record_login_failure(
        self,
        ip: str,
        username: str,
    ) -> None:
        """Record a failed login attempt."""
        limits = self.LIMITS[AuthAction.LOGIN]

        # Increment all relevant counters
        self._increment_failure(f"ip:{ip}", limits["per_ip"][1])
        self._increment_failure(f"user:{username}", limits["per_username"][1])
        self._increment_failure(
            f"combo:{self._hash_combo(ip, username)}", limits["per_combo"][1]
        )

        logger.info(
            "Login failure recorded",
            extra={"ip": ip, "username_hash": hashlib.sha256(username.encode()).hexdigest()[:8]},
        )

    def record_login_success(
        self,
        ip: str,
        username: str,
    ) -> None:
        """Clear failure counts on successful login."""
        # Clear per-user failures (account is now authenticated)
        self.redis.delete(f"failures:user:{username}")

        # Clear combo failures
        combo_key = f"failures:combo:{self._hash_combo(ip, username)}"
        self.redis.delete(combo_key)

        # Clear any lockout on the account
        self.redis.delete(f"lockout:user:{username}")

        # Note: Don't clear IP failures - a legitimate user shouldn't
        # clear penalties from previous attack traffic

    def _hash_combo(self, ip: str, username: str) -> str:
        """Create deterministic hash of IP + username combo."""
        combined = f"{ip}:{username.lower()}"
        return hashlib.sha256(combined.encode()).hexdigest()[:16]

    def _get_failure_count(self, key: str, window: int) -> int:
        """Get failure count for a key within window."""
        full_key = f"failures:{key}"
        count = self.redis.get(full_key)
        return int(count) if count else 0

    def _increment_failure(self, key: str, window: int) -> int:
        """Increment failure count with expiry."""
        full_key = f"failures:{key}"
        pipe = self.redis.pipeline()
        pipe.incr(full_key)
        pipe.expire(full_key, window)
        results = pipe.execute()
        return results[0]

    def _check_lockout(self, key: str) -> Optional[datetime]:
        """Check if key is locked out, return lockout end time."""
        lockout_key = f"lockout:{key}"
        ttl = self.redis.ttl(lockout_key)

        if ttl > 0:
            return datetime.now(timezone.utc) + timedelta(seconds=ttl)

        return None

    def _apply_lockout(self, key: str, failure_count: int) -> int:
        """Apply progressive lockout based on failure count."""
        lockout_seconds = 0
        for threshold, duration in self.LOCKOUT_PROGRESSION:
            if failure_count >= threshold:
                lockout_seconds = duration

        if lockout_seconds > 0:
            lockout_key = f"lockout:{key}"
            self.redis.setex(lockout_key, lockout_seconds, "1")
            logger.warning(
                "Lockout applied",
                extra={
                    "key_hash": hashlib.sha256(key.encode()).hexdigest()[:8],
                    "duration": lockout_seconds,
                },
            )

        return lockout_seconds
```

Always return identical responses for valid and invalid usernames. Rate limit based on the submitted username regardless of whether it exists. Use consistent response times (add artificial delay if needed). This prevents attackers from building lists of valid usernames.
Sophisticated attackers will attempt to bypass rate limits through various techniques. Your implementation must anticipate and counter these evasion strategies.
Common Bypass Techniques:
| Technique | How It Works | Countermeasure |
|---|---|---|
| IP rotation | Use proxy/VPN pool to get new IP for each request | Rate limit by multiple dimensions (IP + fingerprint + behavior) |
| Distributed attacks | Spread requests across botnet to stay under per-IP limits | Global rate limits, anomaly detection, behavioral analysis |
| Header manipulation | Spoof X-Forwarded-For to claim different IPs | Configure trusted proxy lists, use CF-Connecting-IP or equivalent |
| Request variation | Add random query params to bypass caching/dedup | Normalize requests before rate limiting |
| Slowloris | Keep connections open with slow data to exhaust limits | Connection-level limits, timeouts, concurrent request caps |
| Window boundary abuse | Time requests at window boundaries for 2x limit | Use sliding window instead of fixed window |
| Account rotation | Create many accounts to multiply per-user limits | Strict registration limits, phone verification, behavioral analysis |
Never trust X-Forwarded-For from untrusted sources. Configure your reverse proxy (nginx, Cloudflare, AWS ALB) as the trusted source, and use their specific headers (CF-Connecting-IP, X-Real-IP). Attackers can trivially spoof X-Forwarded-For if you don't filter it.
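A sketch of that filtering logic (the function name, trusted-proxy set, and right-to-left walk are illustrative conventions, not a specific framework's API):

```python
def client_ip(remote_addr: str, xff: str, trusted_proxies: set) -> str:
    """Resolve the real client IP, trusting X-Forwarded-For only
    when the TCP peer is a known proxy."""
    if remote_addr not in trusted_proxies:
        # Direct connection: the header is attacker-controlled, ignore it.
        return remote_addr
    if not xff:
        return remote_addr
    # Walk right-to-left past trusted hops; the first untrusted
    # address is the client as seen by our own edge.
    hops = [h.strip() for h in xff.split(",")]
    for ip in reversed(hops):
        if ip not in trusted_proxies:
            return ip
    return hops[0]


TRUSTED = {"10.0.0.1"}  # our load balancer

# A spoofed header on a direct connection is ignored:
print(client_ip("203.0.113.9", "1.1.1.1", TRUSTED))            # 203.0.113.9
# A header appended by the trusted proxy is honored:
print(client_ip("10.0.0.1", "198.51.100.7, 10.0.0.1", TRUSTED))  # 198.51.100.7
```

Walking from the right is the key point: attackers control the leftmost entries of X-Forwarded-For, but only your own proxies append the rightmost ones.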
Good APIs communicate rate limit status clearly, allowing legitimate clients to adjust behavior while not revealing too much to attackers.
Standard Headers (RFC 6585 & Draft RateLimit Headers):
```http
HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 95
RateLimit-Reset: 1640995200
Retry-After: 60
```
Header Definitions:
- `RateLimit-Limit`: Maximum requests allowed in window
- `RateLimit-Remaining`: Requests remaining in current window
- `RateLimit-Reset`: Unix timestamp when limit resets
- `Retry-After`: Seconds to wait before retrying (on 429 response)
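On the client side, well-behaved consumers should honor these headers. A minimal retry loop might look like this (the `send` callable is a stand-in for a real HTTP client returning status, headers, and body):

```python
import time


def request_with_backoff(send, max_retries: int = 3):
    """Retry a request when rate limited, sleeping for the
    server-advertised Retry-After interval between attempts."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Honor the server's hint; default to 1s if absent
            time.sleep(float(headers.get("Retry-After", 1)))
    return status, body


# Stub transport: rate limited once, then succeeds.
responses = iter([
    (429, {"Retry-After": "0"}, ""),
    (200, {}, "ok"),
])
print(request_with_backoff(lambda: next(responses)))  # (200, 'ok')
```

Honoring `Retry-After` instead of retrying immediately keeps legitimate clients from amplifying the very load the limiter is shedding.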
```go
package ratelimit

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// AddRateLimitHeaders adds rate limit info to response headers
func AddRateLimitHeaders(
	w http.ResponseWriter,
	result *LimitResult,
	limit int,
) {
	// Standard draft headers
	w.Header().Set("RateLimit-Limit", strconv.Itoa(limit))
	w.Header().Set("RateLimit-Remaining", strconv.Itoa(result.Remaining))
	w.Header().Set("RateLimit-Reset", strconv.FormatInt(result.ResetAt.Unix(), 10))

	// Legacy headers for compatibility
	w.Header().Set("X-RateLimit-Limit", strconv.Itoa(limit))
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(result.Remaining))
	w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(result.ResetAt.Unix(), 10))
}

// SendRateLimitedResponse sends a proper 429 response
func SendRateLimitedResponse(
	w http.ResponseWriter,
	result *LimitResult,
	limit int,
	limitType string, // "ip", "user", "api_key", etc.
) {
	// Add headers
	AddRateLimitHeaders(w, result, limit)

	// Add Retry-After
	if result.RetryAfter > 0 {
		w.Header().Set("Retry-After", strconv.Itoa(int(result.RetryAfter.Seconds())))
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusTooManyRequests)

	// Response body - be careful not to reveal too much;
	// don't specify the exact limit type to attackers
	response := fmt.Sprintf(`{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests. Please retry after %d seconds.",
    "retry_after": %d
  }
}`, int(result.RetryAfter.Seconds()), int(result.RetryAfter.Seconds()))

	w.Write([]byte(response))
}

// RateLimitMiddleware creates HTTP middleware for rate limiting
func RateLimitMiddleware(
	limiter *DistributedRateLimiter,
	keyExtractor func(*http.Request) string,
	limit int,
	window time.Duration,
) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := keyExtractor(r)

			config := LimitConfig{
				Key:    key,
				Limit:  limit,
				Window: window,
			}

			result, err := limiter.Check(r.Context(), config)
			if err != nil {
				// Log the error; for security, fail closed rather than open
				http.Error(w, "Service unavailable", http.StatusServiceUnavailable)
				return
			}

			// Always add headers (even on success)
			AddRateLimitHeaders(w, result, limit)

			if !result.Allowed {
				SendRateLimitedResponse(w, result, limit, "request")
				return
			}

			next.ServeHTTP(w, r)
		})
	}
}

// IPExtractor gets the client IP while respecting proxies
func IPExtractor(trustedProxies []string) func(*http.Request) string {
	return func(r *http.Request) string {
		// If behind Cloudflare
		if cfIP := r.Header.Get("CF-Connecting-IP"); cfIP != "" {
			return cfIP
		}

		// If behind AWS ALB
		if xffIP := r.Header.Get("X-Forwarded-For"); xffIP != "" {
			// Take the leftmost IP (first client)
			// In production, validate against the trusted proxy list
			return strings.Split(xffIP, ",")[0]
		}

		// Direct connection
		return strings.Split(r.RemoteAddr, ":")[0]
	}
}
```

For security-sensitive endpoints (login, password reset), don't reveal rate limit details to potential attackers. Return a generic '429 Too Many Requests' without specifying which limit was hit or the exact remaining attempts. This prevents attackers from optimizing their approach.
Rate limiting generates valuable security intelligence. Monitoring rate limit events helps detect attacks, tune thresholds, and understand usage patterns.
Key Conditions to Monitor and Alert On:
| Condition | Severity | Typical Threshold | Response |
|---|---|---|---|
| Global rate limit approaching | Warning | 80% capacity for >5 min | Investigate traffic spike |
| Unusual IP concentration | Medium | 100 limits from single IP/hour | Consider IP block |
| Auth endpoint spike | High | 10x baseline auth failures | Credential stuffing likely |
| Multiple user lockouts | Medium | 50 lockouts/hour | Targeted attack or enumeration |
| Legitimate user limits | High | Known-good users hitting limits | Review/adjust thresholds |
| Distributed pattern | Critical | 1000 IPs hitting limits simultaneously | Botnet attack, escalate |
Export rate limit events to your SIEM (Splunk, ELK, Datadog) for correlation with other security events. Rate limit triggers often correlate with vulnerability scans, brute force campaigns, and other attack activity. The combination provides richer context for investigation.
Rate limiting is a critical security control that slows attacks, protects resources, and provides visibility into malicious activity. The key takeaways: choose an algorithm that resists window-boundary abuse, coordinate limits across distributed servers, give authentication endpoints failure-based limits with progressive lockouts, anticipate bypass techniques like IP rotation and header spoofing, and feed rate limit events into your monitoring.
What's Next:
Rate limiting controls how many requests can be made, but not what's in those requests. The next page explores Input Validation, covering how to protect your API from injection attacks, malformed data, and malicious payloads that slip past authentication and rate limits.
You now understand rate limiting as a security mechanism, from algorithm selection through distributed implementation, authentication protection, bypass prevention, and monitoring. You can design rate limiting systems that defend against real-world attack patterns. Next, we'll explore input validation to protect against malicious request content.