Every public API is under constant attack. Automated scripts probe for weaknesses, botnets attempt credential stuffing, and competitors scrape data. Without rate limiting, a single malicious actor can overwhelm your infrastructure, exhaust your resources, and deny service to legitimate users.
Rate limiting for security goes far beyond simple "100 requests per minute" throttling. Modern rate limiting systems must distinguish failures from successes, coordinate limits across distributed servers, resist deliberate bypass attempts, and adapt to evolving attack patterns.
This page explores rate limiting as a core security control, not just a resource management tool.
By the end of this page, you will understand rate limiting algorithms (token bucket, sliding window), security-specific patterns for authentication endpoints, distributed rate limiting architectures, bypass prevention techniques, and intelligent abuse detection systems that adapt to attack patterns.
Traditional rate limiting protects resources from overload. Security-focused rate limiting protects assets from exploitation. The distinction matters:
Resource Protection: caps overall request volume so that traffic spikes, accidental or malicious, cannot overload servers, databases, or downstream dependencies.
Security Protection: applies targeted limits to exploitable operations (logins, password resets, enumeration-prone lookups) so that attacks become slow, noisy, and expensive.
The same API often needs both: a general rate limit protects infrastructure, while security-specific limits protect authentication flows.
| Rate Limit Type | Primary Purpose | Example | Security Benefit |
|---|---|---|---|
| Global rate limit | Infrastructure protection | 10,000 req/min total | DDoS mitigation |
| Per-API-key limit | Fair usage enforcement | 1,000 req/min per key | Limits compromised key blast radius |
| Per-IP limit | Bot protection | 100 req/min per IP | Slows enumeration attacks |
| Per-user limit | Account protection | 10 auth attempts/hour | Prevents brute force |
| Per-endpoint limit | Sensitive operation protection | 5 password resets/day | Prevents abuse of high-value operations |
| Failure-based limit | Attack detection | 5 failed logins trigger lockout | Stops credential stuffing |
Rate limits slow down attacks but don't prevent them. An attacker willing to wait can still eventually brute-force a weak password or enumerate all user IDs. Rate limits buy time for detection and response, and make attacks economically unviable. Always combine with strong authentication, monitoring, and alerting.
Choosing the right algorithm affects both security effectiveness and user experience. Each algorithm has distinct characteristics for handling burst traffic and edge cases.
Common Algorithms: fixed window, sliding window, and token bucket.
For security applications, sliding window and token bucket are most common. Fixed window has a dangerous edge case: by clustering requests on either side of a window boundary, an attacker can send up to 2x the limit in a short burst.
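The boundary flaw is easy to demonstrate with a toy in-memory fixed-window counter (an illustrative sketch; the class and method names here are ours, not from any library):

```python
from collections import defaultdict


class FixedWindowCounter:
    """Naive fixed-window limiter: the count resets at every window boundary."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        # All requests in the same window share one counter
        window_id = int(now // self.window)
        if self.counts[(key, window_id)] >= self.limit:
            return False
        self.counts[(key, window_id)] += 1
        return True


# 100 req/min limit, but an attacker sends 100 requests at t=59s
# and 100 more at t=60s: every one of the 200 passes, because the
# second burst lands in a fresh window.
limiter = FixedWindowCounter(limit=100, window_seconds=60)
burst_one = sum(limiter.allow("attacker", 59.0) for _ in range(100))
burst_two = sum(limiter.allow("attacker", 60.0) for _ in range(100))
print(burst_one + burst_two)  # 200 requests allowed in about one second
```

The sliding window algorithm below closes this gap by weighting the previous window's count into the current total.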
```python
import time
import redis
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Tuple, Optional


@dataclass
class RateLimitResult:
    """Result of a rate limit check."""
    allowed: bool
    remaining: int
    reset_at: float
    retry_after: Optional[float] = None


class RateLimiter(ABC):
    """Base class for rate limiters."""

    @abstractmethod
    def check(self, key: str) -> RateLimitResult:
        """Check if request is allowed and consume one token."""
        pass


class SlidingWindowRateLimiter(RateLimiter):
    """
    Sliding window counter algorithm.

    Combines current and previous window counts with weighting,
    providing accurate rate limiting without per-request storage.

    Example: 100 req/min limit
    - Previous window (0:00-1:00): 80 requests
    - Current window (1:00-2:00): 30 requests
    - Current time: 1:30 (halfway through current window)
    - Weighted count: 80 * 0.5 + 30 = 70 (under limit)
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int,
        window_seconds: int,
    ):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def check(self, key: str) -> RateLimitResult:
        now = time.time()

        # Current and previous window keys
        current_window = int(now // self.window)
        previous_window = current_window - 1
        current_key = f"{key}:{current_window}"
        previous_key = f"{key}:{previous_window}"

        # Get counts (pipeline for efficiency)
        pipe = self.redis.pipeline()
        pipe.get(previous_key)
        pipe.get(current_key)
        results = pipe.execute()

        previous_count = int(results[0] or 0)
        current_count = int(results[1] or 0)

        # Calculate window position (0.0 to 1.0)
        window_position = (now % self.window) / self.window

        # Weighted count: previous window contribution decreases over time
        weighted_count = previous_count * (1 - window_position) + current_count

        # Check limit
        if weighted_count >= self.limit:
            reset_at = (current_window + 1) * self.window
            return RateLimitResult(
                allowed=False,
                remaining=0,
                reset_at=reset_at,
                retry_after=reset_at - now,
            )

        # Increment current window counter
        pipe = self.redis.pipeline()
        pipe.incr(current_key)
        pipe.expire(current_key, self.window * 2)  # Keep for next window's calculation
        pipe.execute()

        return RateLimitResult(
            allowed=True,
            remaining=max(0, int(self.limit - weighted_count - 1)),
            reset_at=(current_window + 1) * self.window,
        )


class TokenBucketRateLimiter(RateLimiter):
    """
    Token bucket algorithm.

    Tokens are added at a constant rate up to a maximum.
    Each request consumes one token. Allows controlled bursts
    when bucket is full.

    Useful for APIs where occasional bursts are acceptable
    but sustained high rates should be limited.
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        capacity: int,
        refill_rate: float,  # tokens per second
    ):
        self.redis = redis_client
        self.capacity = capacity
        self.refill_rate = refill_rate

    def check(self, key: str) -> RateLimitResult:
        now = time.time()

        # Lua script for atomic token bucket (prevents race conditions)
        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        -- Get current state
        local bucket = redis.call('HMGET', key, 'tokens', 'last_update')
        local tokens = tonumber(bucket[1]) or capacity
        local last_update = tonumber(bucket[2]) or now

        -- Calculate tokens to add since last update
        local elapsed = now - last_update
        tokens = math.min(capacity, tokens + elapsed * refill_rate)

        -- Check if request can be fulfilled
        if tokens < 1 then
            -- Calculate when a token will be available
            local wait_time = (1 - tokens) / refill_rate
            return {0, tokens, wait_time}
        end

        -- Consume token
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_update', now)
        redis.call('EXPIRE', key, 3600)  -- Cleanup after 1 hour of inactivity

        return {1, tokens, 0}
        """

        result = self.redis.eval(
            lua_script,
            1,
            key,
            self.capacity,
            self.refill_rate,
            now,
        )

        allowed, remaining, retry_after = result

        return RateLimitResult(
            allowed=bool(allowed),
            remaining=int(remaining),
            reset_at=now + (self.capacity - remaining) / self.refill_rate,
            retry_after=retry_after if retry_after > 0 else None,
        )


class FailureBasedRateLimiter:
    """
    Rate limiter that tracks failures, not total requests.

    Perfect for authentication endpoints where:
    - Successful requests should not count against limit
    - Failed attempts indicate potential attack
    - Progressive lockout after repeated failures
    """

    LOCKOUT_LEVELS = [
        (5, 60),      # After 5 failures: 1 minute lockout
        (10, 300),    # After 10 failures: 5 minute lockout
        (15, 1800),   # After 15 failures: 30 minute lockout
        (20, 3600),   # After 20 failures: 1 hour lockout
    ]

    def __init__(self, redis_client: redis.Redis, window_seconds: int = 3600):
        self.redis = redis_client
        self.window = window_seconds

    def record_failure(self, key: str) -> Tuple[bool, int, Optional[int]]:
        """
        Record a failed attempt.

        Returns:
            Tuple of (is_locked_out, failure_count, lockout_seconds)
        """
        failure_key = f"failures:{key}"

        pipe = self.redis.pipeline()
        pipe.incr(failure_key)
        pipe.expire(failure_key, self.window)
        results = pipe.execute()

        failure_count = results[0]

        # Determine lockout based on failure count
        lockout_seconds = 0
        for threshold, duration in self.LOCKOUT_LEVELS:
            if failure_count >= threshold:
                lockout_seconds = duration

        if lockout_seconds > 0:
            lockout_key = f"lockout:{key}"
            self.redis.setex(lockout_key, lockout_seconds, "1")
            return True, failure_count, lockout_seconds

        return False, failure_count, None

    def record_success(self, key: str) -> None:
        """
        Record a successful attempt.

        Clears failure count and any lockout.
        """
        failure_key = f"failures:{key}"
        lockout_key = f"lockout:{key}"
        self.redis.delete(failure_key, lockout_key)

    def is_locked_out(self, key: str) -> Tuple[bool, Optional[int]]:
        """
        Check if key is currently locked out.

        Returns:
            Tuple of (is_locked_out, seconds_remaining)
        """
        lockout_key = f"lockout:{key}"
        ttl = self.redis.ttl(lockout_key)

        if ttl > 0:
            return True, ttl

        return False, None
```

For authentication endpoints, use failure-based limiting.
For general API protection, sliding window offers the best balance. Token bucket is ideal when you want to allow legitimate burst traffic (e.g., initial page loads) while capping sustained rates.
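The burst-then-throttle behavior is the token bucket's defining property. A minimal in-memory sketch (single-process, unlike the Redis-backed version above; names are illustrative) makes it concrete:

```python
class TokenBucket:
    """Minimal in-memory token bucket (single-process sketch)."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_rate,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)
# A full bucket absorbs an initial burst of 5 requests...
print([bucket.allow(0.0) for _ in range(6)])  # [True, True, True, True, True, False]
# ...after which sustained traffic is capped at the refill rate.
print(bucket.allow(1.0))  # True: one token refilled after 1 second
```

This is why token bucket suits page-load bursts: the capacity sets the burst allowance, while the refill rate sets the long-run ceiling.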
In distributed systems with multiple servers, rate limiting must be coordinated to prevent attackers from rotating requests across servers to bypass limits.
Centralized vs. Local Rate Limiting:
| Approach | Accuracy | Latency | Availability | Complexity |
|---|---|---|---|---|
| Local only | Low | None | High | Low |
| Centralized (Redis) | High | 1-5ms | Medium | Medium |
| Eventually consistent | Medium | Near-zero | High | High |
| Synchronous consensus | Very High | 10-50ms | Low | Very High |
For security purposes, centralized (Redis) is the standard choice. The 1-5ms latency is acceptable, and accuracy matters when detecting attacks.
```go
package ratelimit

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// DistributedRateLimiter provides cluster-wide rate limiting
type DistributedRateLimiter struct {
	redis     *redis.ClusterClient
	keyPrefix string
}

// LimitConfig defines rate limit parameters
type LimitConfig struct {
	Key    string        // Identifier (IP, user ID, API key)
	Limit  int           // Max requests
	Window time.Duration // Time window
	Cost   int           // Request cost (default 1)
}

// LimitResult contains the rate limit check result
type LimitResult struct {
	Allowed    bool
	Remaining  int
	ResetAt    time.Time
	RetryAfter time.Duration
}

// NewDistributedRateLimiter creates a new distributed limiter
func NewDistributedRateLimiter(
	redis *redis.ClusterClient,
	keyPrefix string,
) *DistributedRateLimiter {
	return &DistributedRateLimiter{
		redis:     redis,
		keyPrefix: keyPrefix,
	}
}

// Check performs an atomic rate limit check using Lua
func (rl *DistributedRateLimiter) Check(
	ctx context.Context,
	config LimitConfig,
) (*LimitResult, error) {
	if config.Cost == 0 {
		config.Cost = 1
	}

	windowSecs := int(config.Window.Seconds())
	now := time.Now()

	// Sliding window counter with Lua for atomicity
	luaScript := `
		local key = KEYS[1]
		local window = tonumber(ARGV[1])
		local limit = tonumber(ARGV[2])
		local now = tonumber(ARGV[3])
		local cost = tonumber(ARGV[4])

		local current_window = math.floor(now / window)
		local previous_window = current_window - 1
		local current_key = key .. ":" .. current_window
		local previous_key = key .. ":" .. previous_window

		-- Get counts
		local previous_count = tonumber(redis.call('GET', previous_key) or 0)
		local current_count = tonumber(redis.call('GET', current_key) or 0)

		-- Calculate weighted count
		local position = (now % window) / window
		local weighted = previous_count * (1 - position) + current_count

		-- Check limit
		if weighted + cost > limit then
			local reset_at = (current_window + 1) * window
			return {0, math.max(0, math.floor(limit - weighted)), reset_at}
		end

		-- Increment
		redis.call('INCRBY', current_key, cost)
		redis.call('EXPIRE', current_key, window * 2)

		local remaining = math.max(0, math.floor(limit - weighted - cost))
		local reset_at = (current_window + 1) * window
		return {1, remaining, reset_at}
	`

	key := fmt.Sprintf("%s:%s", rl.keyPrefix, config.Key)

	result, err := rl.redis.Eval(ctx, luaScript, []string{key},
		windowSecs, config.Limit, now.Unix(), config.Cost,
	).Slice()
	if err != nil {
		return nil, fmt.Errorf("rate limit check failed: %w", err)
	}

	allowed := result[0].(int64) == 1
	remaining := int(result[1].(int64))
	resetAt := time.Unix(result[2].(int64), 0)

	var retryAfter time.Duration
	if !allowed {
		retryAfter = resetAt.Sub(now)
	}

	return &LimitResult{
		Allowed:    allowed,
		Remaining:  remaining,
		ResetAt:    resetAt,
		RetryAfter: retryAfter,
	}, nil
}

// MultiDimensionLimiter applies multiple rate limits
type MultiDimensionLimiter struct {
	limiter *DistributedRateLimiter
	limits  []LimitConfig
}

// CheckAll applies all configured limits
func (mdl *MultiDimensionLimiter) CheckAll(
	ctx context.Context,
	dimensions map[string]string,
) (*LimitResult, string, error) {
	// Check each dimension
	for _, limitConfig := range mdl.limits {
		key, ok := dimensions[limitConfig.Key]
		if !ok {
			continue
		}

		config := LimitConfig{
			Key:    fmt.Sprintf("%s:%s", limitConfig.Key, key),
			Limit:  limitConfig.Limit,
			Window: limitConfig.Window,
			Cost:   limitConfig.Cost,
		}

		result, err := mdl.limiter.Check(ctx, config)
		if err != nil {
			return nil, "", err
		}

		if !result.Allowed {
			return result, limitConfig.Key, nil
		}
	}

	// All limits passed
	return &LimitResult{Allowed: true}, "", nil
}

// Example configuration for a login endpoint
func NewLoginRateLimiter(redis *redis.ClusterClient) *MultiDimensionLimiter {
	limiter := NewDistributedRateLimiter(redis, "login_limit")

	return &MultiDimensionLimiter{
		limiter: limiter,
		limits: []LimitConfig{
			// Per-IP: 10 attempts per minute
			{Key: "ip", Limit: 10, Window: time.Minute},
			// Per-username: 5 attempts per 15 minutes
			{Key: "username", Limit: 5, Window: 15 * time.Minute},
			// Global: 1000 per minute (DDoS protection)
			{Key: "global", Limit: 1000, Window: time.Minute},
		},
	}
}
```

If Redis is unavailable, don't fail open (allow all) or fail closed (block all). Implement fallback to local rate limiting with conservative limits. This maintains protection while avoiding cascading failures. Log centralized limiter failures for investigation.
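That fallback strategy can be sketched in a few lines (the wrapper class, its names, and the conservative local limit are our assumptions, shown here with a stub in place of a real Redis-backed check):

```python
import logging

logger = logging.getLogger(__name__)


class FallbackRateLimiter:
    """Wraps a centralized check; degrades to a conservative in-process
    fixed-window count when the central store is unreachable."""

    def __init__(self, central_check, local_limit: int, window_seconds: int):
        self.central_check = central_check  # callable(key) -> bool, may raise
        self.local_limit = local_limit      # deliberately stricter than normal
        self.window = window_seconds
        self.local_counts = {}

    def allow(self, key: str, now: float) -> bool:
        try:
            return self.central_check(key)
        except Exception:
            # Central limiter is down: log for investigation, then
            # count locally rather than failing open or closed.
            logger.warning("central rate limiter unavailable; using local fallback")
            window_id = int(now // self.window)
            count = self.local_counts.get((key, window_id), 0)
            if count >= self.local_limit:
                return False
            self.local_counts[(key, window_id)] = count + 1
            return True


def broken_central(key):
    """Stub simulating an unreachable Redis."""
    raise ConnectionError("redis down")


limiter = FallbackRateLimiter(broken_central, local_limit=2, window_seconds=60)
print([limiter.allow("ip:1.2.3.4", 0.0) for _ in range(3)])  # [True, True, False]
```

Each server enforcing a stricter local limit during an outage means the cluster-wide total stays bounded even without coordination.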
Authentication endpoints (login, password reset, registration) are the most attacked surfaces of any API. They require specialized rate limiting strategies that go beyond simple request counting.
Attack Patterns to Defend Against: credential stuffing, password brute force, username enumeration, and abuse of password reset, registration, and MFA verification flows.
```python
import hashlib
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional, Tuple

import redis

logger = logging.getLogger(__name__)


class AuthAction(Enum):
    LOGIN = "login"
    PASSWORD_RESET = "password_reset"
    REGISTRATION = "registration"
    MFA_VERIFY = "mfa_verify"


@dataclass
class AuthRateLimitResult:
    allowed: bool
    remaining_attempts: int
    lockout_until: Optional[datetime] = None
    requires_captcha: bool = False
    warning_message: Optional[str] = None


class AuthenticationRateLimiter:
    """
    Specialized rate limiter for authentication endpoints.

    Implements multi-dimensional limiting:
    - Per-IP: Blocks IPs with too many failures
    - Per-username: Protects specific accounts
    - Per-IP+username: Catches targeted attacks
    - Global: DDoS protection

    Also implements:
    - Progressive lockouts
    - CAPTCHA triggers
    - Username enumeration protection
    """

    # Thresholds for different actions: (count, window_seconds)
    LIMITS = {
        AuthAction.LOGIN: {
            "per_ip": (20, 300),       # 20 attempts per 5 minutes
            "per_username": (5, 900),  # 5 attempts per 15 minutes
            "per_combo": (3, 600),     # 3 attempts per 10 minutes
            "captcha_threshold": 3,    # Require CAPTCHA after 3 failures
        },
        AuthAction.PASSWORD_RESET: {
            "per_ip": (10, 3600),      # 10 per hour
            "per_email": (3, 3600),    # 3 per hour per email
            "per_combo": (2, 3600),    # 2 per hour
            "captcha_threshold": 1,    # Always CAPTCHA after 1
        },
        AuthAction.REGISTRATION: {
            "per_ip": (5, 3600),       # 5 registrations per hour per IP
            "captcha_threshold": 1,    # Always CAPTCHA
        },
        AuthAction.MFA_VERIFY: {
            "per_user": (5, 300),      # 5 attempts per 5 minutes
            "lockout_threshold": 5,    # Lock after 5 failures
        },
    }

    # Progressive lockout durations
    LOCKOUT_PROGRESSION = [
        (1, 0),        # First offense: no lockout
        (3, 60),       # After 3: 1 minute
        (5, 300),      # After 5: 5 minutes
        (10, 1800),    # After 10: 30 minutes
        (15, 3600),    # After 15: 1 hour
        (20, 86400),   # After 20: 24 hours
    ]

    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    def check_login(
        self,
        ip: str,
        username: str,
    ) -> AuthRateLimitResult:
        """
        Check rate limits for a login attempt.

        Returns AuthRateLimitResult with allowed status and
        any requirements (CAPTCHA, lockout info).
        """
        limits = self.LIMITS[AuthAction.LOGIN]

        # Check account lockout first
        lockout_result = self._check_lockout(f"user:{username}")
        if lockout_result:
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                lockout_until=lockout_result,
                warning_message="Account temporarily locked due to repeated failed attempts",
            )

        # Check IP-level lockout
        ip_lockout = self._check_lockout(f"ip:{ip}")
        if ip_lockout:
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                lockout_until=ip_lockout,
                warning_message="Too many attempts from this IP address",
            )

        # Get failure counts
        ip_failures = self._get_failure_count(f"ip:{ip}", limits["per_ip"][1])
        user_failures = self._get_failure_count(f"user:{username}", limits["per_username"][1])
        combo_failures = self._get_failure_count(
            f"combo:{self._hash_combo(ip, username)}", limits["per_combo"][1]
        )

        # Check limits
        if ip_failures >= limits["per_ip"][0]:
            self._apply_lockout(f"ip:{ip}", ip_failures)
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                warning_message="Too many login attempts. Please try again later.",
            )

        if user_failures >= limits["per_username"][0]:
            self._apply_lockout(f"user:{username}", user_failures)
            return AuthRateLimitResult(
                allowed=False,
                remaining_attempts=0,
                warning_message="Account protected due to multiple failed attempts.",
            )

        # Determine CAPTCHA requirement
        max_failures = max(ip_failures, user_failures, combo_failures)
        requires_captcha = max_failures >= limits["captcha_threshold"]

        # Calculate remaining attempts (most restrictive)
        remaining = min(
            limits["per_ip"][0] - ip_failures,
            limits["per_username"][0] - user_failures,
        )

        return AuthRateLimitResult(
            allowed=True,
            remaining_attempts=remaining,
            requires_captcha=requires_captcha,
        )

    def record_login_failure(
        self,
        ip: str,
        username: str,
    ) -> None:
        """Record a failed login attempt."""
        limits = self.LIMITS[AuthAction.LOGIN]

        # Increment all relevant counters
        self._increment_failure(f"ip:{ip}", limits["per_ip"][1])
        self._increment_failure(f"user:{username}", limits["per_username"][1])
        self._increment_failure(
            f"combo:{self._hash_combo(ip, username)}", limits["per_combo"][1]
        )

        logger.info(
            "Login failure recorded",
            extra={"ip": ip, "username_hash": hashlib.sha256(username.encode()).hexdigest()[:8]},
        )

    def record_login_success(
        self,
        ip: str,
        username: str,
    ) -> None:
        """Clear failure counts on successful login."""
        # Clear per-user failures (account is now authenticated)
        self.redis.delete(f"failures:user:{username}")

        # Clear combo failures
        combo_key = f"failures:combo:{self._hash_combo(ip, username)}"
        self.redis.delete(combo_key)

        # Clear any lockout on the account
        self.redis.delete(f"lockout:user:{username}")

        # Note: Don't clear IP failures - a legitimate user shouldn't
        # clear penalties from previous attack traffic

    def _hash_combo(self, ip: str, username: str) -> str:
        """Create deterministic hash of IP + username combo."""
        combined = f"{ip}:{username.lower()}"
        return hashlib.sha256(combined.encode()).hexdigest()[:16]

    def _get_failure_count(self, key: str, window: int) -> int:
        """Get failure count for a key within window."""
        full_key = f"failures:{key}"
        count = self.redis.get(full_key)
        return int(count) if count else 0

    def _increment_failure(self, key: str, window: int) -> int:
        """Increment failure count with expiry."""
        full_key = f"failures:{key}"
        pipe = self.redis.pipeline()
        pipe.incr(full_key)
        pipe.expire(full_key, window)
        results = pipe.execute()
        return results[0]

    def _check_lockout(self, key: str) -> Optional[datetime]:
        """Check if key is locked out, return lockout end time."""
        lockout_key = f"lockout:{key}"
        ttl = self.redis.ttl(lockout_key)

        if ttl > 0:
            return datetime.now(timezone.utc) + timedelta(seconds=ttl)

        return None

    def _apply_lockout(self, key: str, failure_count: int) -> int:
        """Apply progressive lockout based on failure count."""
        lockout_seconds = 0
        for threshold, duration in self.LOCKOUT_PROGRESSION:
            if failure_count >= threshold:
                lockout_seconds = duration

        if lockout_seconds > 0:
            lockout_key = f"lockout:{key}"
            self.redis.setex(lockout_key, lockout_seconds, "1")
            logger.warning(
                "Lockout applied",
                extra={
                    "key_hash": hashlib.sha256(key.encode()).hexdigest()[:8],
                    "duration": lockout_seconds,
                },
            )

        return lockout_seconds
```

Always return identical responses for valid and invalid usernames. Rate limit based on the submitted username regardless of whether it exists. Use consistent response times (add artificial delay if needed). This prevents attackers from building lists of valid usernames.
Sophisticated attackers will attempt to bypass rate limits through various techniques. Your implementation must anticipate and counter these evasion strategies.
Common Bypass Techniques:
| Technique | How It Works | Countermeasure |
|---|---|---|
| IP rotation | Use proxy/VPN pool to get new IP for each request | Rate limit by multiple dimensions (IP + fingerprint + behavior) |
| Distributed attacks | Spread requests across botnet to stay under per-IP limits | Global rate limits, anomaly detection, behavioral analysis |
| Header manipulation | Spoof X-Forwarded-For to claim different IPs | Configure trusted proxy lists, use CF-Connecting-IP or equivalent |
| Request variation | Add random query params to bypass caching/dedup | Normalize requests before rate limiting |
| Slowloris | Keep connections open with slow data to exhaust limits | Connection-level limits, timeouts, concurrent request caps |
| Window boundary abuse | Time requests at window boundaries for 2x limit | Use sliding window instead of fixed window |
| Account rotation | Create many accounts to multiply per-user limits | Strict registration limits, phone verification, behavioral analysis |
Never trust X-Forwarded-For from untrusted sources. Configure your reverse proxy (nginx, Cloudflare, AWS ALB) as the trusted source, and use their specific headers (CF-Connecting-IP, X-Real-IP). Attackers can trivially spoof X-Forwarded-For if you don't filter it.
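A sketch of that filtering logic (the function name, trusted-proxy set, and right-to-left walk are illustrative conventions, not a specific framework's API):

```python
def client_ip(remote_addr: str, xff: str, trusted_proxies: set) -> str:
    """Resolve the real client IP, trusting X-Forwarded-For only
    when the TCP peer is a known proxy."""
    if remote_addr not in trusted_proxies:
        # Direct connection: the header is attacker-controlled, ignore it.
        return remote_addr
    if not xff:
        return remote_addr
    # Walk right-to-left past trusted hops; the first untrusted
    # address is the client as seen by our own edge.
    hops = [h.strip() for h in xff.split(",")]
    for ip in reversed(hops):
        if ip not in trusted_proxies:
            return ip
    return hops[0]


TRUSTED = {"10.0.0.1"}  # our load balancer

# A spoofed header on a direct connection is ignored:
print(client_ip("203.0.113.9", "1.1.1.1", TRUSTED))            # 203.0.113.9
# A header appended by the trusted proxy is honored:
print(client_ip("10.0.0.1", "198.51.100.7, 10.0.0.1", TRUSTED))  # 198.51.100.7
```

Walking from the right is the key point: attackers control the leftmost entries of X-Forwarded-For, but only your own proxies append the rightmost ones.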
Good APIs communicate rate limit status clearly, allowing legitimate clients to adjust behavior while not revealing too much to attackers.
Standard Headers (RFC 6585 & Draft RateLimit Headers):
```http
HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 95
RateLimit-Reset: 1640995200
Retry-After: 60
```
Header Definitions:
- `RateLimit-Limit`: Maximum requests allowed in window
- `RateLimit-Remaining`: Requests remaining in current window
- `RateLimit-Reset`: Unix timestamp when limit resets
- `Retry-After`: Seconds to wait before retrying (on 429 response)
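On the client side, well-behaved consumers should honor these headers. A minimal retry loop might look like this (the `send` callable is a stand-in for a real HTTP client returning status, headers, and body):

```python
import time


def request_with_backoff(send, max_retries: int = 3):
    """Retry a request when rate limited, sleeping for the
    server-advertised Retry-After interval between attempts."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Honor the server's hint; default to 1s if absent
            time.sleep(float(headers.get("Retry-After", 1)))
    return status, body


# Stub transport: rate limited once, then succeeds.
responses = iter([
    (429, {"Retry-After": "0"}, ""),
    (200, {}, "ok"),
])
print(request_with_backoff(lambda: next(responses)))  # (200, 'ok')
```

Honoring `Retry-After` instead of retrying immediately keeps legitimate clients from amplifying the very load the limiter is shedding.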
```go
package ratelimit

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// AddRateLimitHeaders adds rate limit info to response headers
func AddRateLimitHeaders(
	w http.ResponseWriter,
	result *LimitResult,
	limit int,
) {
	// Standard draft headers
	w.Header().Set("RateLimit-Limit", strconv.Itoa(limit))
	w.Header().Set("RateLimit-Remaining", strconv.Itoa(result.Remaining))
	w.Header().Set("RateLimit-Reset", strconv.FormatInt(result.ResetAt.Unix(), 10))

	// Legacy headers for compatibility
	w.Header().Set("X-RateLimit-Limit", strconv.Itoa(limit))
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(result.Remaining))
	w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(result.ResetAt.Unix(), 10))
}

// SendRateLimitedResponse sends a proper 429 response
func SendRateLimitedResponse(
	w http.ResponseWriter,
	result *LimitResult,
	limit int,
	limitType string, // "ip", "user", "api_key", etc.
) {
	// Add headers
	AddRateLimitHeaders(w, result, limit)

	// Add Retry-After
	if result.RetryAfter > 0 {
		w.Header().Set("Retry-After", strconv.Itoa(int(result.RetryAfter.Seconds())))
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusTooManyRequests)

	// Response body - be careful not to reveal too much;
	// don't specify the exact limit type to attackers
	response := fmt.Sprintf(`{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests. Please retry after %d seconds.",
    "retry_after": %d
  }
}`, int(result.RetryAfter.Seconds()), int(result.RetryAfter.Seconds()))

	w.Write([]byte(response))
}

// RateLimitMiddleware creates HTTP middleware for rate limiting
func RateLimitMiddleware(
	limiter *DistributedRateLimiter,
	keyExtractor func(*http.Request) string,
	limit int,
	window time.Duration,
) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := keyExtractor(r)

			config := LimitConfig{
				Key:    key,
				Limit:  limit,
				Window: window,
			}

			result, err := limiter.Check(r.Context(), config)
			if err != nil {
				// Log the error; for security, fail closed rather than open
				http.Error(w, "Service unavailable", http.StatusServiceUnavailable)
				return
			}

			// Always add headers (even on success)
			AddRateLimitHeaders(w, result, limit)

			if !result.Allowed {
				SendRateLimitedResponse(w, result, limit, "request")
				return
			}

			next.ServeHTTP(w, r)
		})
	}
}

// IPExtractor gets the client IP while respecting proxies
func IPExtractor(trustedProxies []string) func(*http.Request) string {
	return func(r *http.Request) string {
		// If behind Cloudflare
		if cfIP := r.Header.Get("CF-Connecting-IP"); cfIP != "" {
			return cfIP
		}

		// If behind AWS ALB
		if xffIP := r.Header.Get("X-Forwarded-For"); xffIP != "" {
			// Take the leftmost IP (first client)
			// In production, validate against the trusted proxy list
			return strings.Split(xffIP, ",")[0]
		}

		// Direct connection
		return strings.Split(r.RemoteAddr, ":")[0]
	}
}
```

For security-sensitive endpoints (login, password reset), don't reveal rate limit details to potential attackers. Return a generic '429 Too Many Requests' without specifying which limit was hit or the exact remaining attempts. This prevents attackers from optimizing their approach.
Rate limiting generates valuable security intelligence. Monitoring rate limit events helps detect attacks, tune thresholds, and understand usage patterns.
Key Conditions to Monitor and Alert On:
| Condition | Severity | Typical Threshold | Response |
|---|---|---|---|
| Global rate limit approaching | Warning | 80% capacity for >5 min | Investigate traffic spike |
| Unusual IP concentration | Medium | 100 limits from single IP/hour | Consider IP block |
| Auth endpoint spike | High | 10x baseline auth failures | Credential stuffing likely |
| Multiple user lockouts | Medium | 50 lockouts/hour | Targeted attack or enumeration |
| Legitimate user limits | High | Known-good users hitting limits | Review/adjust thresholds |
| Distributed pattern | Critical | 1000 IPs hitting limits simultaneously | Botnet attack, escalate |
Export rate limit events to your SIEM (Splunk, ELK, Datadog) for correlation with other security events. Rate limit triggers often correlate with vulnerability scans, brute force campaigns, and other attack activity. The combination provides richer context for investigation.
Rate limiting is a critical security control that slows attacks, protects resources, and provides visibility into malicious activity. The key takeaways: choose an algorithm that resists window-boundary abuse, coordinate limits across distributed servers, give authentication endpoints failure-based limits with progressive lockouts, anticipate bypass techniques like IP rotation and header spoofing, and feed rate limit events into your monitoring.
What's Next:
Rate limiting controls how many requests can be made, but not what's in those requests. The next page explores Input Validation, covering how to protect your API from injection attacks, malformed data, and malicious payloads that slip past authentication and rate limits.
You now understand rate limiting as a security mechanism, from algorithm selection through distributed implementation, authentication protection, bypass prevention, and monitoring. You can design rate limiting systems that defend against real-world attack patterns. Next, we'll explore input validation to protect against malicious request content.