Rate limiting is ultimately a communication problem. When you reject a request because a limit has been exceeded, you're having a conversation with the client: explaining what happened, why, and what they should do next. The quality of this conversation determines whether your rate limiting is a frustrating obstacle or a transparent, manageable constraint.
Poorly communicated rate limits lead to confused developers, frustrated users, wasted support tickets, and clients implementing inefficient retry strategies that make the problem worse. Well-communicated rate limits become a predictable part of the API contract—something developers can plan for and handle gracefully.
By the end of this page, you will master HTTP status codes and headers for rate limiting, understand client-side retry strategies, design graceful degradation patterns, handle various failure modes, and create excellent user experiences even when enforcing limits.
The HTTP specification and conventions provide specific status codes for rate limiting scenarios. Using the correct codes ensures clients and intermediaries (proxies, CDNs) handle responses appropriately.
| Status Code | Name | Use Case | Key Characteristic |
|---|---|---|---|
| 429 | Too Many Requests | Primary rate limit response | Indicates client should slow down; use Retry-After header |
| 503 | Service Unavailable | Server overloaded, temporary | General overload, not client-specific; use for circuit breakers |
| 403 | Forbidden | Quota exhausted (no recovery) | When limit is permanent until action (upgrade, wait for billing cycle) |
| 401 | Unauthorized | Authentication-related limits | Failed auth attempts exceeded; forces re-authentication |
| 420 | Enhance Your Calm | Unofficial (Twitter legacy) | Avoid using; not standardized, but you may see it |
429 Too Many Requests: The Primary Choice
RFC 6585 defines 429 specifically for rate limiting:
The 429 status code indicates that the user has sent too many requests in a given amount of time ("rate limiting").
Important distinctions:
429 = Client-specific limit exceeded. This specific client sent too many requests. The solution is to slow down.
503 = Server-wide capacity issue. The server can't handle more requests from anyone. Not the client's fault.
Using 503 for rate limiting is incorrect because it signals server problems rather than client behavior problems. Clients receiving 503 might not implement backoff, assuming the issue is temporary.
Some APIs return 200 OK with an error in the body when rate limited. This is problematic: clients may not parse the body, monitoring tools will show success, and caching proxies may cache the 'error' response. Always use proper 4xx status codes for client errors.
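To make these distinctions concrete, here is a minimal sketch mapping limiting scenarios to status codes; the `LimitOutcome` names are illustrative, not part of any standard.

```typescript
// Hypothetical helper: map a limiting decision to the right status code.
type LimitOutcome =
  | 'rate_limited'            // this client sent too many requests
  | 'server_overloaded'       // global capacity problem (circuit breaker)
  | 'quota_exhausted'         // quota gone until upgrade or billing reset
  | 'auth_attempts_exceeded'; // too many failed authentication attempts

function statusForOutcome(outcome: LimitOutcome): number {
  switch (outcome) {
    case 'rate_limited':           return 429;
    case 'server_overloaded':      return 503;
    case 'quota_exhausted':        return 403;
    case 'auth_attempts_exceeded': return 401;
  }
}
```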
HTTP headers communicate rate limit state to clients, enabling them to track their usage and adjust behavior before hitting limits. While there's no single standard, the IETF draft RateLimit Fields for HTTP is converging on common conventions.
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Date: Wed, 08 Jan 2025 07:30:44 GMT

# Standard rate limit headers (IETF draft)
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 45

# Alternative: Unix timestamp (GitHub style)
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704700284

# Retry-After: How long to wait (RFC 7231)
# Can be seconds or HTTP-date
Retry-After: 45
# OR
Retry-After: Wed, 08 Jan 2025 07:31:29 GMT

# Optional: Policy name for multiple limits
RateLimit-Policy: standard;w=60;burst=10

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please retry after 45 seconds.",
  "retry_after": 45,
  "limit": 100,
  "remaining": 0,
  "reset": 1704700284
}
```

Header conventions across major APIs:
| Provider | Limit Header | Remaining Header | Reset Header | Reset Format |
|---|---|---|---|---|
| GitHub | X-RateLimit-Limit | X-RateLimit-Remaining | X-RateLimit-Reset | Unix timestamp |
| Twitter | x-rate-limit-limit | x-rate-limit-remaining | x-rate-limit-reset | Unix timestamp |
| Stripe | (in body) | (in body) | (in body) | Unix timestamp |
| Shopify | X-Shopify-Shop-Api-Call-Limit | (as fraction) | n/a | bucket format |
| Google Cloud | (varies by API) | (varies) | Retry-After | Seconds |
Recommendation: Use the IETF draft format (RateLimit-*) as primary, with X-RateLimit-* for compatibility with existing clients. Always include Retry-After on 429 responses.
Include rate limit headers on successful responses too, not just 429s. This allows clients to proactively adjust their request rate as they approach limits, rather than waiting to be rejected. Well-designed clients will monitor these headers and slow down before hitting limits.
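The sketch below shows one way to do this with an Express-style middleware; the in-memory fixed-window counter and the constants are assumptions for illustration, not a recommended production limiter (a real deployment would typically back this with a shared store).

```typescript
import type { Request, Response, NextFunction } from 'express';

const WINDOW_SECONDS = 60;
const LIMIT = 100;

// Minimal in-memory fixed-window counter, keyed by client IP (illustrative only).
const windows = new Map<string, { count: number; windowStart: number }>();

function rateLimitHeaders(req: Request, res: Response, next: NextFunction): void {
  const key = req.ip ?? 'unknown';
  const now = Math.floor(Date.now() / 1000);

  let w = windows.get(key);
  if (!w || now - w.windowStart >= WINDOW_SECONDS) {
    w = { count: 0, windowStart: now };
    windows.set(key, w);
  }
  w.count++;

  const remaining = Math.max(0, LIMIT - w.count);
  const resetSeconds = WINDOW_SECONDS - (now - w.windowStart);

  // Emit headers on every response so clients can adapt before a 429.
  res.setHeader('RateLimit-Limit', LIMIT);            // IETF draft names
  res.setHeader('RateLimit-Remaining', remaining);
  res.setHeader('RateLimit-Reset', resetSeconds);
  res.setHeader('X-RateLimit-Limit', LIMIT);          // legacy compatibility
  res.setHeader('X-RateLimit-Remaining', remaining);
  res.setHeader('X-RateLimit-Reset', resetSeconds);

  if (w.count > LIMIT) {
    res.setHeader('Retry-After', resetSeconds);
    res.status(429).json({
      error: 'rate_limit_exceeded',
      message: `Too many requests. Please retry after ${resetSeconds} seconds.`,
      retry_after: resetSeconds,
    });
    return;
  }

  next();
}
```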
While headers provide machine-readable rate limit information, the response body should provide human-readable context, detailed error information, and guidance for resolution.
```jsonc
// Basic rate limit response
{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded your request limit of 100 requests per minute.",
  "retry_after": 45
}

// Detailed response with context
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "API rate limit exceeded",
    "details": "Your application has exceeded the rate limit of 100 requests per 60 seconds.",
    "type": "rate_limit_error"
  },
  "rate_limit": {
    "limit": 100,
    "remaining": 0,
    "reset_at": "2025-01-08T07:31:29Z",
    "reset_in_seconds": 45,
    "policy": "standard_api",
    "scope": "user"
  },
  "guidance": {
    "retry_after": 45,
    "suggested_action": "Implement exponential backoff and retry after the reset time.",
    "documentation": "https://docs.example.com/rate-limits",
    "upgrade_url": "https://example.com/pricing"
  },
  "request_id": "req_abc123xyz"
}

// Multiple limit types response
{
  "error": "rate_limit_exceeded",
  "message": "Daily API quota exceeded",
  "limits": {
    "minute": {
      "limit": 100,
      "remaining": 85,
      "reset_in_seconds": 45
    },
    "daily": {
      "limit": 10000,
      "remaining": 0,
      "reset_at": "2025-01-09T00:00:00Z"
    }
  },
  "triggered_by": "daily",
  "guidance": "Your daily quota has been exhausted. Consider upgrading to a higher tier for increased limits."
}

// Quota exhaustion (requires action)
{
  "error": "quota_exhausted",
  "message": "Monthly API quota exhausted",
  "code": "QUOTA_EXCEEDED",
  "quota": {
    "type": "monthly",
    "limit": 100000,
    "used": 100000,
    "reset_at": "2025-02-01T00:00:00Z"
  },
  "resolution": {
    "options": [
      {
        "action": "upgrade",
        "description": "Upgrade to Professional tier for 500,000 requests/month",
        "url": "https://example.com/upgrade"
      },
      {
        "action": "wait",
        "description": "Wait for quota reset on February 1st",
        "reset_at": "2025-02-01T00:00:00Z"
      },
      {
        "action": "purchase_addon",
        "description": "Purchase additional request credits",
        "url": "https://example.com/credits"
      }
    ]
  }
}
```

Error messages should be localizable. Include both a stable error code for programmatic handling and a human-readable message that can be translated. Consider accepting an Accept-Language header and returning localized messages for international APIs.
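A lightweight pattern for this is sketched below, with a hypothetical message catalog and naive Accept-Language parsing; a real API would use a proper i18n library.

```typescript
// Stable codes for programmatic handling; messages are looked up per language.
type MessageParams = { seconds: number };

const MESSAGES: Record<string, Record<string, (p: MessageParams) => string>> = {
  rate_limit_exceeded: {
    en: ({ seconds }) => `Too many requests. Please retry after ${seconds} seconds.`,
    de: ({ seconds }) => `Zu viele Anfragen. Bitte in ${seconds} Sekunden erneut versuchen.`,
    es: ({ seconds }) => `Demasiadas solicitudes. Vuelva a intentarlo en ${seconds} segundos.`,
  },
};

function localizedError(code: string, acceptLanguage: string | undefined, seconds: number) {
  // Very rough negotiation: primary subtag of the first language listed.
  const lang = (acceptLanguage ?? 'en').split(',')[0].trim().slice(0, 2).toLowerCase();
  const render = MESSAGES[code]?.[lang] ?? MESSAGES[code]?.['en'];

  return {
    error: code,                                  // stable code, never translated
    message: render ? render({ seconds }) : code, // human-readable, localized
    retry_after: seconds,
  };
}
```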
How clients respond to 429 errors significantly impacts both their success rate and your server load. Poorly implemented retries can amplify problems; well-implemented retries handle limits gracefully.
```typescript
/**
 * Rate Limit-Aware HTTP Client
 *
 * Implements exponential backoff with jitter and respects Retry-After.
 */
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterFactor: number; // 0-1, how much randomness to add
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 5,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  jitterFactor: 0.3,
};

class RateLimitAwareClient {
  private config: RetryConfig;

  constructor(config: Partial<RetryConfig> = {}) {
    this.config = { ...DEFAULT_RETRY_CONFIG, ...config };
  }

  async request<T>(
    url: string,
    options: RequestInit = {}
  ): Promise<T> {
    let lastError: Error | null = null;

    for (let attempt = 0; attempt <= this.config.maxRetries; attempt++) {
      try {
        const response = await fetch(url, options);

        // Track rate limit headers proactively
        this.updateRateLimitState(url, response.headers);

        // Success
        if (response.ok) {
          return await response.json();
        }

        // Rate limited - extract retry timing
        if (response.status === 429) {
          const retryAfter = this.parseRetryAfter(response.headers);
          const delay = retryAfter || this.calculateBackoff(attempt);

          console.log(
            `Rate limited on attempt ${attempt + 1}. ` +
            `Waiting ${delay}ms before retry.`
          );

          await this.sleep(delay);
          continue;
        }

        // Server overloaded
        if (response.status === 503) {
          const retryAfter = this.parseRetryAfter(response.headers);
          const delay = retryAfter || this.calculateBackoff(attempt);
          await this.sleep(delay);
          continue;
        }

        // Other errors - don't retry
        const error = await response.json();
        throw new Error(`API error ${response.status}: ${error.message}`);
      } catch (err) {
        lastError = err as Error;

        // Network errors - worth retrying
        if (this.isRetryableError(err)) {
          const delay = this.calculateBackoff(attempt);
          await this.sleep(delay);
          continue;
        }

        throw err;
      }
    }

    throw new Error(
      `Max retries (${this.config.maxRetries}) exceeded. ` +
      `Last error: ${lastError?.message}`
    );
  }

  /**
   * Parse Retry-After header (seconds or HTTP-date)
   */
  private parseRetryAfter(headers: Headers): number | null {
    const retryAfter = headers.get('Retry-After');
    if (!retryAfter) return null;

    // Check if it's a number of seconds
    const seconds = parseInt(retryAfter, 10);
    if (!isNaN(seconds)) {
      return seconds * 1000; // Convert to ms
    }

    // Try to parse as HTTP-date
    const date = new Date(retryAfter);
    if (!isNaN(date.getTime())) {
      return Math.max(0, date.getTime() - Date.now());
    }

    return null;
  }

  /**
   * Exponential backoff with jitter
   */
  private calculateBackoff(attempt: number): number {
    // Exponential: 1s, 2s, 4s, 8s, 16s, ...
    const exponentialDelay = this.config.baseDelayMs * Math.pow(2, attempt);

    // Cap at maximum
    const cappedDelay = Math.min(exponentialDelay, this.config.maxDelayMs);

    // Add jitter (randomness to prevent thundering herd)
    const jitter = cappedDelay * this.config.jitterFactor * Math.random();

    return Math.floor(cappedDelay + jitter);
  }

  /**
   * Proactively track rate limits to avoid hitting them
   */
  private updateRateLimitState(url: string, headers: Headers): void {
    const remaining = headers.get('RateLimit-Remaining') ||
                      headers.get('X-RateLimit-Remaining');
    const limit = headers.get('RateLimit-Limit') ||
                  headers.get('X-RateLimit-Limit');

    if (remaining && limit) {
      const remainingNum = parseInt(remaining, 10);
      const limitNum = parseInt(limit, 10);

      // If approaching limit, slow down proactively
      if (remainingNum < limitNum * 0.1) {
        console.warn(
          `Approaching rate limit: ${remainingNum}/${limitNum} remaining`
        );
        // Client could slow down request rate here
      }
    }
  }

  private isRetryableError(err: unknown): boolean {
    // Network errors are retryable
    if (err instanceof TypeError && err.message.includes('fetch')) {
      return true;
    }
    return false;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
```

Why jitter matters:
Without jitter, many clients hitting a rate limit at the same time will all retry at exactly the same moment. This creates a 'thundering herd' that can re-overload the server:
Time 0:00 - Server rate limits 1000 clients
Time 0:01 - All 1000 clients retry simultaneously → overload again
Time 0:02 - All 1000 clients retry simultaneously → overload again
...
With jitter:
Time 0:00 - Server rate limits 1000 clients
Time 0:00-0:02 - Clients retry spread across 2 seconds → manageable load
Jitter formula: delay = baseDelay * 2^attempt * (1 + random() * jitterFactor)
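With the defaults used in the client above (baseDelayMs = 1000, jitterFactor = 0.3), that works out to roughly 1.0–1.3 s on the first retry, 2.0–2.6 s on the second, 4.0–5.2 s on the third, and 8.0–10.4 s on the fourth, so retries from different clients land at different moments within each range.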
When a server sends a Retry-After header, ALWAYS respect it. The server knows better than any client algorithm when it's safe to retry. Ignoring Retry-After in favor of aggressive retries wastes resources and may result in extended rate limiting or blocking.
Rate limiting doesn't have to be all-or-nothing. Graceful degradation allows you to reduce service quality progressively as limits approach, maintaining some functionality even when full service isn't possible.
| Strategy | Description | Example | When to Use |
|---|---|---|---|
| Reduce Response Quality | Return less data as limit approaches | Search returns 10 results instead of 100 | Data-heavy endpoints |
| Disable Non-Essential Features | Turn off expensive features first | Disable real-time updates, use polling | Feature-rich applications |
| Serve Cached Data | Return stale data instead of blocking | Show cached product prices (5 min old) | Read-heavy workloads |
| Queue for Later | Accept request but process asynchronously | Webhook delivery with delay | Write operations that tolerate delay |
| Reduce Frequency | Allow operation but limit how often | Full sync daily, incremental syncs blocked | Sync/export operations |
| Priority Queuing | Serve high-priority requests, queue others | Checkout proceeds, browsing waits | Mixed-priority traffic |
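As a concrete illustration of the "Serve Cached Data" row (the fuller degradation and priority middleware follows below), here is a minimal sketch; the in-memory cache, staleness window, and nearLimit flag are assumptions for illustration.

```typescript
// Hypothetical in-memory cache for the "serve cached data" strategy.
interface CachedEntry<T> {
  value: T;
  storedAt: number; // epoch ms
}

const responseCache = new Map<string, CachedEntry<unknown>>();
const MAX_STALENESS_MS = 5 * 60 * 1000; // accept data up to 5 minutes old

async function getWithCacheFallback<T>(
  key: string,
  nearLimit: boolean,            // e.g. remaining < 10% of limit
  fetchFresh: () => Promise<T>
): Promise<{ data: T; stale: boolean }> {
  const cached = responseCache.get(key) as CachedEntry<T> | undefined;

  // Close to the limit: prefer slightly stale data over spending budget.
  if (nearLimit && cached && Date.now() - cached.storedAt < MAX_STALENESS_MS) {
    return { data: cached.value, stale: true };
  }

  const fresh = await fetchFresh();
  responseCache.set(key, { value: fresh, storedAt: Date.now() });
  return { data: fresh, stale: false };
}
```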
```typescript
/**
 * Graceful Degradation Implementation
 *
 * Reduce service quality progressively rather than hard blocking.
 */
interface DegradationLevel {
  name: string;
  threshold: number; // % of limit consumed
  modifications: {
    maxResults?: number;
    includeDetails?: boolean;
    useCachedData?: boolean;
    maxPageSize?: number;
    enableRealtime?: boolean;
  };
}

const DEGRADATION_LEVELS: DegradationLevel[] = [
  {
    name: 'full',
    threshold: 0,
    modifications: {
      maxResults: 100,
      includeDetails: true,
      useCachedData: false,
      maxPageSize: 100,
      enableRealtime: true,
    },
  },
  {
    name: 'reduced',
    threshold: 70,
    modifications: {
      maxResults: 50,
      includeDetails: true,
      useCachedData: false,
      maxPageSize: 50,
      enableRealtime: true,
    },
  },
  {
    name: 'minimal',
    threshold: 85,
    modifications: {
      maxResults: 20,
      includeDetails: false,
      useCachedData: true, // Accept slightly stale data
      maxPageSize: 20,
      enableRealtime: false,
    },
  },
  {
    name: 'critical',
    threshold: 95,
    modifications: {
      maxResults: 10,
      includeDetails: false,
      useCachedData: true,
      maxPageSize: 10,
      enableRealtime: false,
    },
  },
];

class GracefulDegradationMiddleware {
  getDegradationLevel(rateLimitInfo: RateLimitInfo): DegradationLevel {
    const usagePercent =
      ((rateLimitInfo.limit - rateLimitInfo.remaining) / rateLimitInfo.limit) * 100;

    // Find highest threshold we've exceeded
    for (let i = DEGRADATION_LEVELS.length - 1; i >= 0; i--) {
      if (usagePercent >= DEGRADATION_LEVELS[i].threshold) {
        return DEGRADATION_LEVELS[i];
      }
    }

    return DEGRADATION_LEVELS[0];
  }

  applyDegradation(
    req: Request,
    res: Response,
    level: DegradationLevel
  ): void {
    // Modify request parameters based on degradation level
    const mods = level.modifications;

    // Limit result count
    if (req.query.limit && mods.maxResults) {
      req.query.limit = Math.min(
        parseInt(req.query.limit as string, 10),
        mods.maxResults
      ).toString();
    }

    // Set header indicating degradation
    res.setHeader('X-Degradation-Level', level.name);

    if (level.name !== 'full') {
      res.setHeader(
        'Warning',
        `299 - "Service degraded: ${level.name} mode active"`
      );
    }

    // Attach to request for downstream use
    (req as any).degradationLevel = level;
  }
}

/**
 * Priority-based request handling under load
 */
class PriorityRateLimiter {
  private readonly highPriorityEndpoints = [
    '/api/checkout',
    '/api/payment',
    '/api/auth/login',
  ];

  private readonly lowPriorityEndpoints = [
    '/api/analytics',
    '/api/recommendations',
    '/api/history',
  ];

  determineRequestPriority(path: string, user: User | null): Priority {
    // Critical endpoints always high priority
    if (this.highPriorityEndpoints.some(e => path.startsWith(e))) {
      return 'high';
    }

    // Low-value endpoints are deprioritized
    if (this.lowPriorityEndpoints.some(e => path.startsWith(e))) {
      return 'low';
    }

    // Paying customers get priority
    if (user?.subscription === 'premium') {
      return 'high';
    }

    return 'normal';
  }

  async handleWithPriority(
    priority: Priority,
    handler: () => Promise<Response>
  ): Promise<Response> {
    const limits = {
      high: { limit: 1000, queue: true },
      normal: { limit: 100, queue: true },
      low: { limit: 50, queue: false }, // Don't queue, just reject
    };

    const config = limits[priority];

    if (!this.checkLimit(priority, config.limit)) {
      if (config.queue) {
        return this.queueRequest(priority, handler);
      }
      return new Response('Rate limit exceeded', { status: 429 });
    }

    return handler();
  }
}
```

When serving degraded responses, tell the client. Use headers like X-Degradation-Level or Warning to indicate reduced quality, and note in the response body that results are limited or cached. This allows clients to handle degraded responses appropriately (e.g., show a banner indicating results may be incomplete).
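The checkLimit and queueRequest helpers used by PriorityRateLimiter above are left out of the excerpt. One possible shape for the queueing side, using simple per-priority FIFOs drained on a timer, is sketched here; it is an illustrative assumption, not the only design.

```typescript
type Priority = 'high' | 'normal' | 'low';

// Deferred work is held in per-priority queues and released gradually,
// so queued requests re-enter processing at a rate the limiter can absorb.
class RequestQueue {
  private queues: Record<Priority, Array<() => Promise<void>>> = {
    high: [],
    normal: [],
    low: [],
  };

  constructor(drainIntervalMs = 250) {
    setInterval(() => void this.drainOne(), drainIntervalMs);
  }

  queueRequest(priority: Priority, handler: () => Promise<Response>): Promise<Response> {
    return new Promise((resolve, reject) => {
      this.queues[priority].push(async () => {
        try {
          resolve(await handler());
        } catch (err) {
          reject(err);
        }
      });
    });
  }

  // Drain one queued request per tick, highest priority first.
  private async drainOne(): Promise<void> {
    for (const priority of ['high', 'normal', 'low'] as const) {
      const next = this.queues[priority].shift();
      if (next) {
        await next();
        return;
      }
    }
  }
}
```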
For user-facing applications, rate limiting must be handled gracefully in the UI. A spinning loader that never resolves or a cryptic error message creates a poor experience. Thoughtful UX design can turn rate limiting from a frustration into an acceptable constraint.
```tsx
/**
 * React component for handling rate-limited API calls
 */
import { useState, useCallback } from 'react';

interface RateLimitState {
  isLimited: boolean;
  retryAfter: number | null;
  remaining: number | null;
  limit: number | null;
}

function useRateLimitedApi<T>(
  apiCall: () => Promise<T>
): {
  execute: () => Promise<T | null>;
  state: RateLimitState;
  isLoading: boolean;
} {
  const [state, setState] = useState<RateLimitState>({
    isLimited: false,
    retryAfter: null,
    remaining: null,
    limit: null,
  });
  const [isLoading, setIsLoading] = useState(false);

  const execute = useCallback(async () => {
    if (state.isLimited && state.retryAfter && state.retryAfter > 0) {
      return null;
    }

    setIsLoading(true);

    try {
      const response = await apiCall();
      setState(prev => ({
        ...prev,
        isLimited: false,
        retryAfter: null,
      }));
      return response;
    } catch (error) {
      if (error.status === 429) {
        const retryAfter = error.retryAfter || 60;
        setState({
          isLimited: true,
          retryAfter,
          remaining: 0,
          limit: error.limit,
        });

        // Auto-reset after retry period
        setTimeout(() => {
          setState(prev => ({
            ...prev,
            isLimited: false,
            retryAfter: null,
          }));
        }, retryAfter * 1000);
      }
      throw error;
    } finally {
      setIsLoading(false);
    }
  }, [apiCall, state.isLimited, state.retryAfter]);

  return { execute, state, isLoading };
}

// Usage in component
function SearchComponent() {
  const { execute, state, isLoading } = useRateLimitedApi(
    () => api.search(query)
  );

  if (state.isLimited) {
    return (
      <div className="rate-limit-banner">
        <div className="message">
          You've made too many searches. Please wait:
        </div>
        <CountdownTimer seconds={state.retryAfter} />
        <div className="help">
          <a href="/upgrade">Upgrade</a> for unlimited searches
        </div>
      </div>
    );
  }

  return (
    <div>
      <SearchInput onSearch={execute} disabled={isLoading} />
      {state.remaining !== null && (
        <div className="usage-indicator">
          {state.remaining} / {state.limit} searches remaining
        </div>
      )}
    </div>
  );
}
```

Rate limiting should be invisible to normal users. If legitimate users frequently hit limits, either your limits are too aggressive or your UX encourages excessive requests. Monitor how often real users encounter limits and adjust accordingly.
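The CountdownTimer component referenced in SearchComponent isn't defined in the excerpt above; a minimal sketch might look like this (the className is an assumption).

```tsx
import { useEffect, useState } from 'react';

// Minimal countdown used by the banner above: ticks once per second to zero.
function CountdownTimer({ seconds }: { seconds: number | null }) {
  const [remaining, setRemaining] = useState(seconds ?? 0);

  useEffect(() => {
    setRemaining(seconds ?? 0);
    const id = setInterval(() => {
      setRemaining(prev => (prev > 0 ? prev - 1 : 0));
    }, 1000);
    return () => clearInterval(id);
  }, [seconds]);

  return <span className="countdown">{remaining}s remaining</span>;
}
```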
Clear rate limit documentation prevents confusion, reduces support tickets, and enables developers to build resilient integrations. Every API with rate limiting should document its limits comprehensively.
````markdown
# Rate Limits

## Overview

Our API implements rate limiting to ensure fair usage and protect service quality.
Rate limits are applied per API key.

## Limit Tiers

| Tier         | Requests/min | Requests/day | Burst  |
|--------------|--------------|--------------|--------|
| Free         | 60           | 1,000        | 10     |
| Starter      | 300          | 25,000       | 50     |
| Professional | 1,000        | 250,000      | 200    |
| Enterprise   | Custom       | Custom       | Custom |

## Headers

Every response includes rate limit headers:

| Header | Description |
|--------|-------------|
| `RateLimit-Limit` | Maximum requests in current window |
| `RateLimit-Remaining` | Requests remaining in current window |
| `RateLimit-Reset` | Seconds until limit resets |
| `Retry-After` | (429 only) Seconds to wait before retrying |

## Handling 429 Responses

When you receive a 429 response:

1. Read the `Retry-After` header
2. Wait for the specified number of seconds
3. Retry your request

**Example 429 Response:**

```json
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Retry after 30 seconds.",
  "retry_after": 30
}
```

## Best Practices

1. **Monitor headers** — Track remaining requests proactively
2. **Implement backoff** — Use exponential backoff with jitter
3. **Respect Retry-After** — Always wait the specified time
4. **Cache responses** — Reduce requests by caching when possible
5. **Batch operations** — Use bulk endpoints where available

## Special Endpoint Limits

Some endpoints have stricter limits:

- `/auth/login` — 5 requests/minute (security)
- `/export` — 10 requests/hour (resource intensive)
- `/search` — 30 requests/minute (compute intensive)

## Request Costs

Most endpoints cost 1 request. Some cost more:

| Endpoint | Cost |
|----------|------|
| `/search` | 1 |
| `/export` | 10 |
| `/batch/*` | 1 per item (max 100) |
````

If you provide SDKs, build rate limit handling in. The SDK should automatically retry with backoff, expose remaining quota to the application, and provide hooks for applications to handle rate limit events. This reduces the burden on developers and ensures consistent behavior.
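For the SDK hooks mentioned above, one possible design is an event callback that surfaces rate limit state to the host application; the ExampleSdk class and onRateLimit option shown here are a sketch, not an established API.

```typescript
interface RateLimitEvent {
  limit: number;
  remaining: number;
  resetSeconds: number;
  retryAfterMs: number | null; // set when the last response was a 429
}

interface SdkOptions {
  // Invoked when quota is running low or a request was rejected with 429.
  onRateLimit?: (event: RateLimitEvent) => void;
}

class ExampleSdk {
  constructor(private readonly options: SdkOptions = {}) {}

  // Called by the SDK's transport layer after each response.
  protected reportRateLimit(headers: Headers, retryAfterMs: number | null): void {
    const limit = parseInt(
      headers.get('RateLimit-Limit') ?? headers.get('X-RateLimit-Limit') ?? '', 10);
    const remaining = parseInt(
      headers.get('RateLimit-Remaining') ?? headers.get('X-RateLimit-Remaining') ?? '', 10);
    const resetSeconds = parseInt(
      headers.get('RateLimit-Reset') ?? headers.get('X-RateLimit-Reset') ?? '', 10);

    if (Number.isNaN(limit) || Number.isNaN(remaining)) return;

    // Surface the event when the caller was rejected or is close to the limit.
    if (retryAfterMs !== null || remaining < limit * 0.1) {
      this.options.onRateLimit?.({ limit, remaining, resetSeconds, retryAfterMs });
    }
  }
}
```

An application could use such a hook to pause background jobs or show a quota warning without reimplementing header parsing itself.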
Response handling transforms rate limiting from an obstacle into a manageable constraint. Done well, it enables clients to adjust their behavior, provides clear guidance on recovery, and maintains good user experience even when limits are enforced.
Module Complete:
Congratulations! You've completed the comprehensive module on Rate Limiting and Throttling. You now understand the algorithms, enforcement architectures, client identification techniques, and response handling patterns that make rate limiting work in practice.
These capabilities form a critical component of securing and scaling distributed systems. Apply these patterns thoughtfully, monitor their effectiveness, and iterate based on real-world traffic patterns and attack evolution.
You have completed Module 4: Rate Limiting and Throttling. You now possess world-class knowledge of rate limiting—from the algorithms and architectures to the nuances of client identification and response handling. These skills are essential for any engineer building or operating systems at scale.