Rate limiting is ultimately a communication problem. When you reject a request because a limit has been exceeded, you're having a conversation with the client: explaining what happened, why, and what they should do next. The quality of this conversation determines whether your rate limiting is a frustrating obstacle or a transparent, manageable constraint.
Poorly communicated rate limits lead to confused developers, frustrated users, wasted support tickets, and clients implementing inefficient retry strategies that make the problem worse. Well-communicated rate limits become a predictable part of the API contract—something developers can plan for and handle gracefully.
By the end of this page, you will master HTTP status codes and headers for rate limiting, understand client-side retry strategies, design graceful degradation patterns, handle various failure modes, and create excellent user experiences even when enforcing limits.
The HTTP specification and conventions provide specific status codes for rate limiting scenarios. Using the correct codes ensures clients and intermediaries (proxies, CDNs) handle responses appropriately.
| Status Code | Name | Use Case | Key Characteristic |
|---|---|---|---|
| 429 | Too Many Requests | Primary rate limit response | Indicates client should slow down; use Retry-After header |
| 503 | Service Unavailable | Server overloaded, temporary | General overload, not client-specific; use for circuit breakers |
| 403 | Forbidden | Quota exhausted (no recovery) | When limit is permanent until action (upgrade, wait for billing cycle) |
| 401 | Unauthorized | Authentication-related limits | Failed auth attempts exceeded; forces re-authentication |
| 420 | Enhance Your Calm | Unofficial (Twitter legacy) | Avoid using; not standardized, but you may see it |
429 Too Many Requests: The Primary Choice
RFC 6585 defines 429 specifically for rate limiting:
The 429 status code indicates that the user has sent too many requests in a given amount of time ("rate limiting").
Important distinctions:
429 = Client-specific limit exceeded. This specific client sent too many requests. The solution is to slow down.
503 = Server-wide capacity issue. The server can't handle more requests from anyone. Not the client's fault.
Using 503 for rate limiting is incorrect because it signals server problems rather than client behavior problems. Clients receiving 503 might not implement backoff, assuming the issue is temporary.
Some APIs return 200 OK with an error in the body when rate limited. This is problematic: clients may not parse the body, monitoring tools will show success, and caching proxies may cache the 'error' response. Always use proper 4xx status codes for client errors.
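To make these distinctions concrete, here is a minimal sketch mapping limiting scenarios to status codes; the `LimitOutcome` names are illustrative, not part of any standard.

```typescript
// Hypothetical helper: map a limiting decision to the right status code.
type LimitOutcome =
  | 'rate_limited'            // this client sent too many requests
  | 'server_overloaded'       // global capacity problem (circuit breaker)
  | 'quota_exhausted'         // quota gone until upgrade or billing reset
  | 'auth_attempts_exceeded'; // too many failed authentication attempts

function statusForOutcome(outcome: LimitOutcome): number {
  switch (outcome) {
    case 'rate_limited':           return 429;
    case 'server_overloaded':      return 503;
    case 'quota_exhausted':        return 403;
    case 'auth_attempts_exceeded': return 401;
  }
}
```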
HTTP headers communicate rate limit state to clients, enabling them to track their usage and adjust behavior before hitting limits. While there's no single standard, the IETF draft RateLimit Fields for HTTP is converging on common conventions.
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Date: Wed, 08 Jan 2025 07:30:44 GMT

# Standard rate limit headers (IETF draft)
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 45

# Alternative: Unix timestamp (GitHub style)
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704700284

# Retry-After: How long to wait (RFC 7231)
# Can be seconds or HTTP-date
Retry-After: 45
# OR
Retry-After: Wed, 08 Jan 2025 07:31:29 GMT

# Optional: Policy name for multiple limits
RateLimit-Policy: standard;w=60;burst=10

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please retry after 45 seconds.",
  "retry_after": 45,
  "limit": 100,
  "remaining": 0,
  "reset": 1704700284
}
```

Header conventions across major APIs:
| Provider | Limit Header | Remaining Header | Reset Header | Reset Format |
|---|---|---|---|---|
| GitHub | X-RateLimit-Limit | X-RateLimit-Remaining | X-RateLimit-Reset | Unix timestamp |
| Twitter | x-rate-limit-limit | x-rate-limit-remaining | x-rate-limit-reset | Unix timestamp |
| Stripe | (in body) | (in body) | (in body) | Unix timestamp |
| Shopify | X-Shopify-Shop-Api-Call-Limit | (as fraction) | n/a | bucket format |
| Google Cloud | (varies by API) | (varies) | Retry-After | Seconds |
Recommendation: Use the IETF draft format (RateLimit-*) as primary, with X-RateLimit-* for compatibility with existing clients. Always include Retry-After on 429 responses.
Include rate limit headers on successful responses too, not just 429s. This allows clients to proactively adjust their request rate as they approach limits, rather than waiting to be rejected. Well-designed clients will monitor these headers and slow down before hitting limits.
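The sketch below shows one way to do this with an Express-style middleware; the in-memory fixed-window counter and the constants are assumptions for illustration, not a recommended production limiter (a real deployment would typically back this with a shared store).

```typescript
import type { Request, Response, NextFunction } from 'express';

const WINDOW_SECONDS = 60;
const LIMIT = 100;

// Minimal in-memory fixed-window counter, keyed by client IP (illustrative only).
const windows = new Map<string, { count: number; windowStart: number }>();

function rateLimitHeaders(req: Request, res: Response, next: NextFunction): void {
  const key = req.ip ?? 'unknown';
  const now = Math.floor(Date.now() / 1000);

  let w = windows.get(key);
  if (!w || now - w.windowStart >= WINDOW_SECONDS) {
    w = { count: 0, windowStart: now };
    windows.set(key, w);
  }
  w.count++;

  const remaining = Math.max(0, LIMIT - w.count);
  const resetSeconds = WINDOW_SECONDS - (now - w.windowStart);

  // Emit headers on every response so clients can adapt before a 429.
  res.setHeader('RateLimit-Limit', LIMIT);            // IETF draft names
  res.setHeader('RateLimit-Remaining', remaining);
  res.setHeader('RateLimit-Reset', resetSeconds);
  res.setHeader('X-RateLimit-Limit', LIMIT);          // legacy compatibility
  res.setHeader('X-RateLimit-Remaining', remaining);
  res.setHeader('X-RateLimit-Reset', resetSeconds);

  if (w.count > LIMIT) {
    res.setHeader('Retry-After', resetSeconds);
    res.status(429).json({
      error: 'rate_limit_exceeded',
      message: `Too many requests. Please retry after ${resetSeconds} seconds.`,
      retry_after: resetSeconds,
    });
    return;
  }

  next();
}
```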
While headers provide machine-readable rate limit information, the response body should provide human-readable context, detailed error information, and guidance for resolution.
```jsonc
// Basic rate limit response
{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded your request limit of 100 requests per minute.",
  "retry_after": 45
}

// Detailed response with context
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "API rate limit exceeded",
    "details": "Your application has exceeded the rate limit of 100 requests per 60 seconds.",
    "type": "rate_limit_error"
  },
  "rate_limit": {
    "limit": 100,
    "remaining": 0,
    "reset_at": "2025-01-08T07:31:29Z",
    "reset_in_seconds": 45,
    "policy": "standard_api",
    "scope": "user"
  },
  "guidance": {
    "retry_after": 45,
    "suggested_action": "Implement exponential backoff and retry after the reset time.",
    "documentation": "https://docs.example.com/rate-limits",
    "upgrade_url": "https://example.com/pricing"
  },
  "request_id": "req_abc123xyz"
}

// Multiple limit types response
{
  "error": "rate_limit_exceeded",
  "message": "Daily API quota exceeded",
  "limits": {
    "minute": {
      "limit": 100,
      "remaining": 85,
      "reset_in_seconds": 45
    },
    "daily": {
      "limit": 10000,
      "remaining": 0,
      "reset_at": "2025-01-09T00:00:00Z"
    }
  },
  "triggered_by": "daily",
  "guidance": "Your daily quota has been exhausted. Consider upgrading to a higher tier for increased limits."
}

// Quota exhaustion (requires action)
{
  "error": "quota_exhausted",
  "message": "Monthly API quota exhausted",
  "code": "QUOTA_EXCEEDED",
  "quota": {
    "type": "monthly",
    "limit": 100000,
    "used": 100000,
    "reset_at": "2025-02-01T00:00:00Z"
  },
  "resolution": {
    "options": [
      {
        "action": "upgrade",
        "description": "Upgrade to Professional tier for 500,000 requests/month",
        "url": "https://example.com/upgrade"
      },
      {
        "action": "wait",
        "description": "Wait for quota reset on February 1st",
        "reset_at": "2025-02-01T00:00:00Z"
      },
      {
        "action": "purchase_addon",
        "description": "Purchase additional request credits",
        "url": "https://example.com/credits"
      }
    ]
  }
}
```

Error messages should be localizable. Include both a stable error code for programmatic handling and a human-readable message that can be translated. Consider accepting an Accept-Language header and returning localized messages for international APIs.
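A lightweight pattern for this is sketched below, with a hypothetical message catalog and naive Accept-Language parsing; a real API would use a proper i18n library.

```typescript
// Stable codes for programmatic handling; messages are looked up per language.
type MessageParams = { seconds: number };

const MESSAGES: Record<string, Record<string, (p: MessageParams) => string>> = {
  rate_limit_exceeded: {
    en: ({ seconds }) => `Too many requests. Please retry after ${seconds} seconds.`,
    de: ({ seconds }) => `Zu viele Anfragen. Bitte in ${seconds} Sekunden erneut versuchen.`,
    es: ({ seconds }) => `Demasiadas solicitudes. Vuelva a intentarlo en ${seconds} segundos.`,
  },
};

function localizedError(code: string, acceptLanguage: string | undefined, seconds: number) {
  // Very rough negotiation: primary subtag of the first language listed.
  const lang = (acceptLanguage ?? 'en').split(',')[0].trim().slice(0, 2).toLowerCase();
  const render = MESSAGES[code]?.[lang] ?? MESSAGES[code]?.['en'];

  return {
    error: code,                                  // stable code, never translated
    message: render ? render({ seconds }) : code, // human-readable, localized
    retry_after: seconds,
  };
}
```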
How clients respond to 429 errors significantly impacts both their success rate and your server load. Poorly implemented retries can amplify problems; well-implemented retries handle limits gracefully.
```typescript
/**
 * Rate Limit-Aware HTTP Client
 *
 * Implements exponential backoff with jitter and respects Retry-After.
 */
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterFactor: number; // 0-1, how much randomness to add
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 5,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  jitterFactor: 0.3,
};

class RateLimitAwareClient {
  private config: RetryConfig;

  constructor(config: Partial<RetryConfig> = {}) {
    this.config = { ...DEFAULT_RETRY_CONFIG, ...config };
  }

  async request<T>(
    url: string,
    options: RequestInit = {}
  ): Promise<T> {
    let lastError: Error | null = null;

    for (let attempt = 0; attempt <= this.config.maxRetries; attempt++) {
      try {
        const response = await fetch(url, options);

        // Track rate limit headers proactively
        this.updateRateLimitState(url, response.headers);

        // Success
        if (response.ok) {
          return await response.json();
        }

        // Rate limited - extract retry timing
        if (response.status === 429) {
          const retryAfter = this.parseRetryAfter(response.headers);
          const delay = retryAfter || this.calculateBackoff(attempt);

          console.log(
            `Rate limited on attempt ${attempt + 1}. ` +
            `Waiting ${delay}ms before retry.`
          );

          await this.sleep(delay);
          continue;
        }

        // Server overloaded
        if (response.status === 503) {
          const retryAfter = this.parseRetryAfter(response.headers);
          const delay = retryAfter || this.calculateBackoff(attempt);
          await this.sleep(delay);
          continue;
        }

        // Other errors - don't retry
        const error = await response.json();
        throw new Error(`API error ${response.status}: ${error.message}`);
      } catch (err) {
        lastError = err as Error;

        // Network errors - worth retrying
        if (this.isRetryableError(err)) {
          const delay = this.calculateBackoff(attempt);
          await this.sleep(delay);
          continue;
        }

        throw err;
      }
    }

    throw new Error(
      `Max retries (${this.config.maxRetries}) exceeded. ` +
      `Last error: ${lastError?.message}`
    );
  }

  /**
   * Parse Retry-After header (seconds or HTTP-date)
   */
  private parseRetryAfter(headers: Headers): number | null {
    const retryAfter = headers.get('Retry-After');
    if (!retryAfter) return null;

    // Check if it's a number of seconds
    const seconds = parseInt(retryAfter, 10);
    if (!isNaN(seconds)) {
      return seconds * 1000; // Convert to ms
    }

    // Try to parse as HTTP-date
    const date = new Date(retryAfter);
    if (!isNaN(date.getTime())) {
      return Math.max(0, date.getTime() - Date.now());
    }

    return null;
  }

  /**
   * Exponential backoff with jitter
   */
  private calculateBackoff(attempt: number): number {
    // Exponential: 1s, 2s, 4s, 8s, 16s, ...
    const exponentialDelay = this.config.baseDelayMs * Math.pow(2, attempt);

    // Cap at maximum
    const cappedDelay = Math.min(exponentialDelay, this.config.maxDelayMs);

    // Add jitter (randomness to prevent thundering herd)
    const jitter = cappedDelay * this.config.jitterFactor * Math.random();

    return Math.floor(cappedDelay + jitter);
  }

  /**
   * Proactively track rate limits to avoid hitting them
   */
  private updateRateLimitState(url: string, headers: Headers): void {
    const remaining = headers.get('RateLimit-Remaining') ||
                      headers.get('X-RateLimit-Remaining');
    const limit = headers.get('RateLimit-Limit') ||
                  headers.get('X-RateLimit-Limit');

    if (remaining && limit) {
      const remainingNum = parseInt(remaining, 10);
      const limitNum = parseInt(limit, 10);

      // If approaching limit, slow down proactively
      if (remainingNum < limitNum * 0.1) {
        console.warn(
          `Approaching rate limit: ${remainingNum}/${limitNum} remaining`
        );
        // Client could slow down request rate here
      }
    }
  }

  private isRetryableError(err: unknown): boolean {
    // Network errors are retryable
    if (err instanceof TypeError && err.message.includes('fetch')) {
      return true;
    }
    return false;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
```

Why jitter matters:
Without jitter, many clients hitting a rate limit at the same time will all retry at exactly the same moment. This creates a 'thundering herd' that can re-overload the server:
Time 0:00 - Server rate limits 1000 clients
Time 0:01 - All 1000 clients retry simultaneously → overload again
Time 0:02 - All 1000 clients retry simultaneously → overload again
...
With jitter:
Time 0:00 - Server rate limits 1000 clients
Time 0:00-0:02 - Clients retry spread across 2 seconds → manageable load
Jitter formula: delay = baseDelay * 2^attempt * (1 + random() * jitterFactor)
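With the defaults used in the client above (baseDelayMs = 1000, jitterFactor = 0.3), that works out to roughly 1.0–1.3 s on the first retry, 2.0–2.6 s on the second, 4.0–5.2 s on the third, and 8.0–10.4 s on the fourth, so retries from different clients land at different moments within each range.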
When a server sends a Retry-After header, ALWAYS respect it. The server knows better than any client algorithm when it's safe to retry. Ignoring Retry-After in favor of aggressive retries wastes resources and may result in extended rate limiting or blocking.
Rate limiting doesn't have to be all-or-nothing. Graceful degradation allows you to reduce service quality progressively as limits approach, maintaining some functionality even when full service isn't possible.
| Strategy | Description | Example | When to Use |
|---|---|---|---|
| Reduce Response Quality | Return less data as limit approaches | Search returns 10 results instead of 100 | Data-heavy endpoints |
| Disable Non-Essential Features | Turn off expensive features first | Disable real-time updates, use polling | Feature-rich applications |
| Serve Cached Data | Return stale data instead of blocking | Show cached product prices (5 min old) | Read-heavy workloads |
| Queue for Later | Accept request but process asynchronously | Webhook delivery with delay | Write operations that tolerate delay |
| Reduce Frequency | Allow operation but limit how often | Full sync daily, incremental syncs blocked | Sync/export operations |
| Priority Queuing | Serve high-priority requests, queue others | Checkout proceeds, browsing waits | Mixed-priority traffic |
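As a concrete illustration of the "Serve Cached Data" row (the fuller degradation and priority middleware follows below), here is a minimal sketch; the in-memory cache, staleness window, and nearLimit flag are assumptions for illustration.

```typescript
// Hypothetical in-memory cache for the "serve cached data" strategy.
interface CachedEntry<T> {
  value: T;
  storedAt: number; // epoch ms
}

const responseCache = new Map<string, CachedEntry<unknown>>();
const MAX_STALENESS_MS = 5 * 60 * 1000; // accept data up to 5 minutes old

async function getWithCacheFallback<T>(
  key: string,
  nearLimit: boolean,            // e.g. remaining < 10% of limit
  fetchFresh: () => Promise<T>
): Promise<{ data: T; stale: boolean }> {
  const cached = responseCache.get(key) as CachedEntry<T> | undefined;

  // Close to the limit: prefer slightly stale data over spending budget.
  if (nearLimit && cached && Date.now() - cached.storedAt < MAX_STALENESS_MS) {
    return { data: cached.value, stale: true };
  }

  const fresh = await fetchFresh();
  responseCache.set(key, { value: fresh, storedAt: Date.now() });
  return { data: fresh, stale: false };
}
```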
```typescript
/**
 * Graceful Degradation Implementation
 *
 * Reduce service quality progressively rather than hard blocking.
 */
interface DegradationLevel {
  name: string;
  threshold: number; // % of limit consumed
  modifications: {
    maxResults?: number;
    includeDetails?: boolean;
    useCachedData?: boolean;
    maxPageSize?: number;
    enableRealtime?: boolean;
  };
}

const DEGRADATION_LEVELS: DegradationLevel[] = [
  {
    name: 'full',
    threshold: 0,
    modifications: {
      maxResults: 100,
      includeDetails: true,
      useCachedData: false,
      maxPageSize: 100,
      enableRealtime: true,
    },
  },
  {
    name: 'reduced',
    threshold: 70,
    modifications: {
      maxResults: 50,
      includeDetails: true,
      useCachedData: false,
      maxPageSize: 50,
      enableRealtime: true,
    },
  },
  {
    name: 'minimal',
    threshold: 85,
    modifications: {
      maxResults: 20,
      includeDetails: false,
      useCachedData: true, // Accept slightly stale data
      maxPageSize: 20,
      enableRealtime: false,
    },
  },
  {
    name: 'critical',
    threshold: 95,
    modifications: {
      maxResults: 10,
      includeDetails: false,
      useCachedData: true,
      maxPageSize: 10,
      enableRealtime: false,
    },
  },
];

class GracefulDegradationMiddleware {
  getDegradationLevel(rateLimitInfo: RateLimitInfo): DegradationLevel {
    const usagePercent =
      ((rateLimitInfo.limit - rateLimitInfo.remaining) / rateLimitInfo.limit) * 100;

    // Find highest threshold we've exceeded
    for (let i = DEGRADATION_LEVELS.length - 1; i >= 0; i--) {
      if (usagePercent >= DEGRADATION_LEVELS[i].threshold) {
        return DEGRADATION_LEVELS[i];
      }
    }

    return DEGRADATION_LEVELS[0];
  }

  applyDegradation(
    req: Request,
    res: Response,
    level: DegradationLevel
  ): void {
    // Modify request parameters based on degradation level
    const mods = level.modifications;

    // Limit result count
    if (req.query.limit && mods.maxResults) {
      req.query.limit = Math.min(
        parseInt(req.query.limit as string, 10),
        mods.maxResults
      ).toString();
    }

    // Set header indicating degradation
    res.setHeader('X-Degradation-Level', level.name);

    if (level.name !== 'full') {
      res.setHeader(
        'Warning',
        `299 - "Service degraded: ${level.name} mode active"`
      );
    }

    // Attach to request for downstream use
    (req as any).degradationLevel = level;
  }
}

/**
 * Priority-based request handling under load
 */
class PriorityRateLimiter {
  private readonly highPriorityEndpoints = [
    '/api/checkout',
    '/api/payment',
    '/api/auth/login',
  ];

  private readonly lowPriorityEndpoints = [
    '/api/analytics',
    '/api/recommendations',
    '/api/history',
  ];

  determineRequestPriority(path: string, user: User | null): Priority {
    // Critical endpoints always high priority
    if (this.highPriorityEndpoints.some(e => path.startsWith(e))) {
      return 'high';
    }

    // Low-value endpoints are deprioritized
    if (this.lowPriorityEndpoints.some(e => path.startsWith(e))) {
      return 'low';
    }

    // Paying customers get priority
    if (user?.subscription === 'premium') {
      return 'high';
    }

    return 'normal';
  }

  async handleWithPriority(
    priority: Priority,
    handler: () => Promise<Response>
  ): Promise<Response> {
    const limits = {
      high: { limit: 1000, queue: true },
      normal: { limit: 100, queue: true },
      low: { limit: 50, queue: false }, // Don't queue, just reject
    };

    const config = limits[priority];

    if (!this.checkLimit(priority, config.limit)) {
      if (config.queue) {
        return this.queueRequest(priority, handler);
      }
      return new Response('Rate limit exceeded', { status: 429 });
    }

    return handler();
  }
}
```

When serving degraded responses, tell the client. Use headers like X-Degradation-Level or Warning to indicate reduced quality, and note in the response body that results are limited or cached. This allows clients to handle degraded responses appropriately (e.g., show a banner indicating results may be incomplete).
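The checkLimit and queueRequest helpers used by PriorityRateLimiter above are left out of the excerpt. One possible shape for the queueing side, using simple per-priority FIFOs drained on a timer, is sketched here; it is an illustrative assumption, not the only design.

```typescript
type Priority = 'high' | 'normal' | 'low';

// Deferred work is held in per-priority queues and released gradually,
// so queued requests re-enter processing at a rate the limiter can absorb.
class RequestQueue {
  private queues: Record<Priority, Array<() => Promise<void>>> = {
    high: [],
    normal: [],
    low: [],
  };

  constructor(drainIntervalMs = 250) {
    setInterval(() => void this.drainOne(), drainIntervalMs);
  }

  queueRequest(priority: Priority, handler: () => Promise<Response>): Promise<Response> {
    return new Promise((resolve, reject) => {
      this.queues[priority].push(async () => {
        try {
          resolve(await handler());
        } catch (err) {
          reject(err);
        }
      });
    });
  }

  // Drain one queued request per tick, highest priority first.
  private async drainOne(): Promise<void> {
    for (const priority of ['high', 'normal', 'low'] as const) {
      const next = this.queues[priority].shift();
      if (next) {
        await next();
        return;
      }
    }
  }
}
```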
For user-facing applications, rate limiting must be handled gracefully in the UI. A spinning loader that never resolves or a cryptic error message creates a poor experience. Thoughtful UX design can turn rate limiting from a frustration into an acceptable constraint.
```tsx
/**
 * React component for handling rate-limited API calls
 */
import { useState, useCallback } from 'react';

interface RateLimitState {
  isLimited: boolean;
  retryAfter: number | null;
  remaining: number | null;
  limit: number | null;
}

function useRateLimitedApi<T>(
  apiCall: () => Promise<T>
): {
  execute: () => Promise<T | null>;
  state: RateLimitState;
  isLoading: boolean;
} {
  const [state, setState] = useState<RateLimitState>({
    isLimited: false,
    retryAfter: null,
    remaining: null,
    limit: null,
  });
  const [isLoading, setIsLoading] = useState(false);

  const execute = useCallback(async () => {
    if (state.isLimited && state.retryAfter && state.retryAfter > 0) {
      return null;
    }

    setIsLoading(true);

    try {
      const response = await apiCall();
      setState(prev => ({
        ...prev,
        isLimited: false,
        retryAfter: null,
      }));
      return response;
    } catch (error) {
      if (error.status === 429) {
        const retryAfter = error.retryAfter || 60;
        setState({
          isLimited: true,
          retryAfter,
          remaining: 0,
          limit: error.limit,
        });

        // Auto-reset after retry period
        setTimeout(() => {
          setState(prev => ({
            ...prev,
            isLimited: false,
            retryAfter: null,
          }));
        }, retryAfter * 1000);
      }
      throw error;
    } finally {
      setIsLoading(false);
    }
  }, [apiCall, state.isLimited, state.retryAfter]);

  return { execute, state, isLoading };
}

// Usage in component
function SearchComponent() {
  const { execute, state, isLoading } = useRateLimitedApi(
    () => api.search(query)
  );

  if (state.isLimited) {
    return (
      <div className="rate-limit-banner">
        <div className="message">
          You've made too many searches. Please wait:
        </div>
        <CountdownTimer seconds={state.retryAfter} />
        <div className="help">
          <a href="/upgrade">Upgrade</a> for unlimited searches
        </div>
      </div>
    );
  }

  return (
    <div>
      <SearchInput onSearch={execute} disabled={isLoading} />
      {state.remaining !== null && (
        <div className="usage-indicator">
          {state.remaining} / {state.limit} searches remaining
        </div>
      )}
    </div>
  );
}
```

Rate limiting should be invisible to normal users. If legitimate users frequently hit limits, either your limits are too aggressive or your UX encourages excessive requests. Monitor how often real users encounter limits and adjust accordingly.
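The CountdownTimer component referenced in SearchComponent isn't defined in the excerpt above; a minimal sketch might look like this (the className is an assumption).

```tsx
import { useEffect, useState } from 'react';

// Minimal countdown used by the banner above: ticks once per second to zero.
function CountdownTimer({ seconds }: { seconds: number | null }) {
  const [remaining, setRemaining] = useState(seconds ?? 0);

  useEffect(() => {
    setRemaining(seconds ?? 0);
    const id = setInterval(() => {
      setRemaining(prev => (prev > 0 ? prev - 1 : 0));
    }, 1000);
    return () => clearInterval(id);
  }, [seconds]);

  return <span className="countdown">{remaining}s remaining</span>;
}
```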
Clear rate limit documentation prevents confusion, reduces support tickets, and enables developers to build resilient integrations. Every API with rate limiting should document its limits comprehensively.
````markdown
# Rate Limits

## Overview

Our API implements rate limiting to ensure fair usage and protect service quality.
Rate limits are applied per API key.

## Limit Tiers

| Tier         | Requests/min | Requests/day | Burst  |
|--------------|--------------|--------------|--------|
| Free         | 60           | 1,000        | 10     |
| Starter      | 300          | 25,000       | 50     |
| Professional | 1,000        | 250,000      | 200    |
| Enterprise   | Custom       | Custom       | Custom |

## Headers

Every response includes rate limit headers:

| Header | Description |
|--------|-------------|
| `RateLimit-Limit` | Maximum requests in current window |
| `RateLimit-Remaining` | Requests remaining in current window |
| `RateLimit-Reset` | Seconds until limit resets |
| `Retry-After` | (429 only) Seconds to wait before retrying |

## Handling 429 Responses

When you receive a 429 response:

1. Read the `Retry-After` header
2. Wait for the specified number of seconds
3. Retry your request

**Example 429 Response:**

```json
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Retry after 30 seconds.",
  "retry_after": 30
}
```

## Best Practices

1. **Monitor headers** — Track remaining requests proactively
2. **Implement backoff** — Use exponential backoff with jitter
3. **Respect Retry-After** — Always wait the specified time
4. **Cache responses** — Reduce requests by caching when possible
5. **Batch operations** — Use bulk endpoints where available

## Special Endpoint Limits

Some endpoints have stricter limits:

- `/auth/login` — 5 requests/minute (security)
- `/export` — 10 requests/hour (resource intensive)
- `/search` — 30 requests/minute (compute intensive)

## Request Costs

Most endpoints cost 1 request. Some cost more:

| Endpoint | Cost |
|----------|------|
| `/search` | 1 |
| `/export` | 10 |
| `/batch/*` | 1 per item (max 100) |
````

If you provide SDKs, build rate limit handling in. The SDK should automatically retry with backoff, expose remaining quota to the application, and provide hooks for applications to handle rate limit events. This reduces the burden on developers and ensures consistent behavior.
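For the SDK hooks mentioned above, one possible design is an event callback that surfaces rate limit state to the host application; the ExampleSdk class and onRateLimit option shown here are a sketch, not an established API.

```typescript
interface RateLimitEvent {
  limit: number;
  remaining: number;
  resetSeconds: number;
  retryAfterMs: number | null; // set when the last response was a 429
}

interface SdkOptions {
  // Invoked when quota is running low or a request was rejected with 429.
  onRateLimit?: (event: RateLimitEvent) => void;
}

class ExampleSdk {
  constructor(private readonly options: SdkOptions = {}) {}

  // Called by the SDK's transport layer after each response.
  protected reportRateLimit(headers: Headers, retryAfterMs: number | null): void {
    const limit = parseInt(
      headers.get('RateLimit-Limit') ?? headers.get('X-RateLimit-Limit') ?? '', 10);
    const remaining = parseInt(
      headers.get('RateLimit-Remaining') ?? headers.get('X-RateLimit-Remaining') ?? '', 10);
    const resetSeconds = parseInt(
      headers.get('RateLimit-Reset') ?? headers.get('X-RateLimit-Reset') ?? '', 10);

    if (Number.isNaN(limit) || Number.isNaN(remaining)) return;

    // Surface the event when the caller was rejected or is close to the limit.
    if (retryAfterMs !== null || remaining < limit * 0.1) {
      this.options.onRateLimit?.({ limit, remaining, resetSeconds, retryAfterMs });
    }
  }
}
```

An application could use such a hook to pause background jobs or show a quota warning without reimplementing header parsing itself.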
Response handling transforms rate limiting from an obstacle into a manageable constraint. Done well, it enables clients to adjust their behavior, provides clear guidance on recovery, and maintains good user experience even when limits are enforced.
Module Complete:
Congratulations! You've completed the comprehensive module on Rate Limiting and Throttling. You now understand the algorithms, enforcement architectures, client identification techniques, and response handling patterns that make rate limiting work in practice.
These capabilities form a critical component of securing and scaling distributed systems. Apply these patterns thoughtfully, monitor their effectiveness, and iterate based on real-world traffic patterns and attack evolution.
You have completed Module 4: Rate Limiting and Throttling. You now possess world-class knowledge of rate limiting—from the algorithms and architectures to the nuances of client identification and response handling. These skills are essential for any engineer building or operating systems at scale.