A rate limiter that simply rejects requests without explanation is a rate limiter that creates frustrated developers and support tickets. The difference between a good and excellent rate limiting experience lies entirely in communication—how you inform clients of their limits, their current usage, and what they should do when limits are reached.
Consider the difference:
Poor: 429 Too Many Requests (no additional context)
Excellent:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1609459200
Retry-After: 37
{"error": "Rate limit exceeded", "message": "You have exceeded your quota of 1000 requests per hour. Your quota resets in 37 seconds."}
The excellent version tells developers exactly what happened, how much quota they have, when it resets, and when to retry. This page covers everything needed to create this excellent experience.
By the end of this page, you'll understand: (1) Standard rate limiting HTTP headers, (2) Designing informative error responses, (3) Client-side retry strategies and exponential backoff, (4) Proactive rate limit communication, and (5) Building SDK support for rate limiting.
HTTP headers are the primary mechanism for communicating rate limit status to clients. While no single standard exists, a set of conventional headers has emerged that most major APIs follow.
| Header | Purpose | Example Value | Notes |
|---|---|---|---|
| X-RateLimit-Limit | Maximum requests in window | 1000 | Per-window quota |
| X-RateLimit-Remaining | Requests remaining in current window | 423 | Decrements with each request |
| X-RateLimit-Reset | When window resets (Unix timestamp) | 1609459200 | UTC seconds since epoch |
| Retry-After | Seconds until retry is likely to succeed | 37 | Standard HTTP header (RFC 7231) |
| X-RateLimit-Resource | Which resource is being limited | /api/search | Useful for per-endpoint limits |
| X-RateLimit-Policy | Detailed policy string | 1000;w=3600 | IETF draft format |
Every rate-limited API should return at minimum:
X-RateLimit-Limit: The maximum number of requests allowed in the current time window.
X-RateLimit-Limit: 1000
This tells clients their total budget.
X-RateLimit-Remaining: How many requests remain before hitting the limit.
X-RateLimit-Remaining: 423
Clients can use this to implement client-side throttling before hitting 429s.
X-RateLimit-Reset: Unix timestamp (seconds) when the rate limit window resets.
X-RateLimit-Reset: 1609459200
Clients can calculate: seconds_to_reset = reset_timestamp - current_time
Important: Use Unix timestamps in seconds or milliseconds (document which). Avoid human-readable formats that require parsing.
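On the client side, these three headers are enough to compute when the window reopens. A minimal sketch of parsing them from a fetch Response (the `RateLimitInfo` shape and function name here are illustrative, not part of any standard):

```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetAt: Date;           // When the window resets
  secondsToReset: number;  // reset_timestamp - current_time
}

// Parse the conventional X-RateLimit-* headers.
// Returns null if the server did not send them.
function parseRateLimitHeaders(
  headers: Headers,
  nowMs: number = Date.now()
): RateLimitInfo | null {
  const limit = headers.get('X-RateLimit-Limit');
  const remaining = headers.get('X-RateLimit-Remaining');
  const reset = headers.get('X-RateLimit-Reset');
  if (limit === null || remaining === null || reset === null) return null;

  const resetSeconds = parseInt(reset, 10); // assumed to be Unix seconds
  return {
    limit: parseInt(limit, 10),
    remaining: parseInt(remaining, 10),
    resetAt: new Date(resetSeconds * 1000),
    secondsToReset: Math.max(0, resetSeconds - Math.floor(nowMs / 1000))
  };
}
```

With the example headers above (limit 1000, remaining 423, reset 1609459200), a client 37 seconds before reset would compute `secondsToReset` as 37.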
/**
 * Generate rate limit headers for HTTP response
 */
interface RateLimitHeaders {
  'X-RateLimit-Limit': string;
  'X-RateLimit-Remaining': string;
  'X-RateLimit-Reset': string;
  'Retry-After'?: string;
  'X-RateLimit-Resource'?: string;
}

function generateRateLimitHeaders(
  result: RateLimitResult,
  resource?: string
): RateLimitHeaders {
  const headers: RateLimitHeaders = {
    'X-RateLimit-Limit': result.limit.toString(),
    'X-RateLimit-Remaining': Math.max(0, result.remaining).toString(),
    'X-RateLimit-Reset': Math.floor(result.resetAt / 1000).toString()
  };

  // Only include Retry-After when rate limited
  if (!result.allowed && result.retryAfter > 0) {
    headers['Retry-After'] = result.retryAfter.toString();
  }

  // Include resource if provided (for multi-limit APIs)
  if (resource) {
    headers['X-RateLimit-Resource'] = resource;
  }

  return headers;
}

/**
 * Express middleware to add rate limit headers
 */
function rateLimitMiddleware(limiter: RateLimiter) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const clientId = extractClientId(req);
    const result = await limiter.checkLimit(clientId);

    // Always add headers, even on successful requests
    const headers = generateRateLimitHeaders(result);
    Object.entries(headers).forEach(([key, value]) => {
      res.setHeader(key, value);
    });

    if (!result.allowed) {
      res.status(429).json({
        error: 'Rate limit exceeded',
        message: `You have exceeded your quota of ${result.limit} requests per hour.`,
        retryAfter: result.retryAfter
      });
      return;
    }

    next();
  };
}

The Retry-After header is a standard HTTP header (RFC 7231) that tells clients how long to wait before retrying. It's essential for 429 responses:
Formats:
# Seconds (preferred for rate limiting)
Retry-After: 120
# HTTP date (alternative)
Retry-After: Wed, 21 Oct 2025 07:28:00 GMT
Why Retry-After matters: it gives clients an authoritative wait time, so well-behaved clients can stop guessing and avoid spending requests that are guaranteed to fail.
Calculating Retry-After:
function calculateRetryAfter(result: RateLimitResult): number {
  if (result.allowed) return 0;

  const now = Date.now();
  const resetAt = result.resetAt;

  // Seconds until reset
  let retryAfter = Math.ceil((resetAt - now) / 1000);

  // Add small jitter to prevent thundering herd
  const jitter = Math.random() * 5; // 0-5 seconds
  retryAfter = Math.ceil(retryAfter + jitter);

  return Math.max(1, retryAfter); // Minimum 1 second
}
Include rate limit headers on successful responses, not just 429s. This allows clients to monitor their usage proactively and implement client-side throttling before hitting limits. It's much better to slow down gracefully than to suddenly hit a wall.
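The "slow down gracefully" idea can be made concrete with a small client-side throttle that delays requests as the remaining quota shrinks. This is a sketch; the 50% threshold and the spread-over-window policy are illustrative assumptions, not a standard:

```typescript
// Sketch of client-side throttling: delay requests as the remaining
// quota shrinks, instead of running full speed into a 429.
function throttleDelayMs(
  remaining: number,
  limit: number,
  msUntilReset: number
): number {
  if (limit <= 0 || remaining >= limit * 0.5) return 0; // plenty left: no delay
  if (remaining <= 0) return msUntilReset;              // exhausted: wait for reset
  // Below 50% remaining: spread the rest of the budget over the rest of the window
  return Math.ceil(msUntilReset / remaining);
}
```

For example, with 10 requests left and 10 seconds until reset, the client would pace itself at one request per second rather than hitting the wall immediately.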
When APIs have multiple rate limits (global + per-endpoint, per-second + per-hour), communicating all limits requires careful design. Several patterns exist.
Return only the limit closest to being exceeded:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 3 # Only 3 left!
X-RateLimit-Reset: 1609459200
X-RateLimit-Resource: /api/search # Which limit is constrained
Pros: simple headers; clients see their most pressing constraint. Cons: other limits stay invisible and may surprise clients.
Return global limit in standard headers, endpoint-specific in extended:
# Primary (global limit)
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 4500
X-RateLimit-Reset: 1609459200
# Secondary (endpoint limit)
X-RateLimit-Limit-Endpoint: 100
X-RateLimit-Remaining-Endpoint: 23
X-RateLimit-Reset-Endpoint: 1609458600
Pros: both limits visible. Cons: non-standard headers; more complex parsing.
Return detailed limit info in response body:
{
  "data": { ... },
  "rate_limits": {
    "core": {
      "limit": 5000,
      "remaining": 4999,
      "reset": 1609459200
    },
    "search": {
      "limit": 30,
      "remaining": 28,
      "reset": 1609458600
    },
    "graphql": {
      "limit": 5000,
      "remaining": 4900,
      "reset": 1609459200,
      "unit": "points"
    }
  }
}
Pros: complete, well-structured information. Cons: adds to response size and changes the response schema.
An IETF draft standard proposes a structured policy header:
RateLimit-Policy: burst;q=100, daily;q=10000, search;q=30
RateLimit: limit=100, remaining=23, reset=37
Structured format:
RateLimit-Policy: <policy-name>;q=<quota>;w=<window-seconds>
Examples:
RateLimit-Policy: default;q=1000;w=3600
RateLimit-Policy: burst;q=100;w=1, sustained;q=5000;w=3600
Pros: standard format; extensible. Cons: not yet widely adopted; requires structured parsing.
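A client consuming the draft policy header needs to parse the `name;q=...;w=...` items. Here is a hedged sketch for the simple format shown above (a production client should use a proper structured-field parser once the spec is final; the `PolicyItem` shape is an assumption):

```typescript
interface PolicyItem {
  name: string;
  quota: number;
  windowSeconds?: number;
}

// Parse a RateLimit-Policy value like "burst;q=100;w=1, sustained;q=5000;w=3600".
function parseRateLimitPolicy(value: string): PolicyItem[] {
  return value.split(',').map(item => {
    const [name, ...params] = item.trim().split(';');
    const policy: PolicyItem = { name: name.trim(), quota: 0 };
    for (const p of params) {
      const [k, v] = p.trim().split('=');
      if (k === 'q') policy.quota = parseInt(v, 10);
      if (k === 'w') policy.windowSeconds = parseInt(v, 10);
    }
    return policy;
  });
}
```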
/**
 * Generate headers for multiple rate limits
 */
interface MultiLimitResult {
  limits: {
    name: string;
    limit: number;
    remaining: number;
    resetAt: number;
  }[];
  mostConstrained: string; // Name of the most constrained limit
}

function generateMultiLimitHeaders(result: MultiLimitResult): Record<string, string> {
  // Find the most constrained (lowest remaining percentage)
  const mostConstrained = result.limits.reduce((prev, curr) => {
    const prevPercent = prev.remaining / prev.limit;
    const currPercent = curr.remaining / curr.limit;
    return currPercent < prevPercent ? curr : prev;
  });

  // Standard headers show the most constrained limit
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': mostConstrained.limit.toString(),
    'X-RateLimit-Remaining': mostConstrained.remaining.toString(),
    'X-RateLimit-Reset': Math.floor(mostConstrained.resetAt / 1000).toString(),
    'X-RateLimit-Resource': mostConstrained.name
  };

  // Extended headers for all limits
  for (const limit of result.limits) {
    const prefix = `X-RateLimit-${capitalize(limit.name)}`;
    headers[`${prefix}-Limit`] = limit.limit.toString();
    headers[`${prefix}-Remaining`] = limit.remaining.toString();
    headers[`${prefix}-Reset`] = Math.floor(limit.resetAt / 1000).toString();
  }

  return headers;
}

// Example output:
// X-RateLimit-Limit: 30
// X-RateLimit-Remaining: 2
// X-RateLimit-Reset: 1609458600
// X-RateLimit-Resource: search
// X-RateLimit-Global-Limit: 5000
// X-RateLimit-Global-Remaining: 4500
// X-RateLimit-Global-Reset: 1609459200
// X-RateLimit-Search-Limit: 30
// X-RateLimit-Search-Remaining: 2
// X-RateLimit-Search-Reset: 1609458600

When rate limiting kicks in, the 429 response body should be a model of clarity. Well-designed error responses reduce support burden and help developers self-serve.
{
  "error": {
    "type": "rate_limit_exceeded",
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "You have exceeded your rate limit.",
    "details": {
      "limit": 1000,
      "remaining": 0,
      "reset_at": "2025-01-15T10:00:00Z",
      "reset_in_seconds": 1847,
      "window": "1 hour",
      "resource": "core",
      "scope": "user"
    },
    "guidance": {
      "retry_after": 1847,
      "message": "Your rate limit will reset in 30 minutes and 47 seconds.",
      "documentation_url": "https://api.example.com/docs/rate-limits",
      "upgrade_url": "https://example.com/pricing"
    },
    "request_id": "req_abc123xyz"
  }
}

1. Error Identification:
"type": "rate_limit_exceeded",
"code": "RATE_LIMIT_EXCEEDED"
Machine-readable type and code for programmatic handling.
2. Human-Readable Message:
"message": "You have exceeded your rate limit."
Clear, non-technical language that can be shown to end users if needed.
3. Technical Details:
"details": {
"limit": 1000,
"remaining": 0,
"reset_at": "2025-01-15T10:00:00Z",
"window": "1 hour"
}
All the information a developer needs to understand the situation.
4. Actionable Guidance:
"guidance": {
"retry_after": 1847,
"message": "Your rate limit will reset in 30 minutes.",
"documentation_url": "...",
"upgrade_url": "..."
}
Tells developers what to do next.
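The four elements above can be assembled by one server-side helper. A sketch, assuming a simple `RateLimitState` input; the URLs and the humanized message are illustrative placeholders:

```typescript
interface RateLimitState {
  limit: number;
  resetAtMs: number; // epoch milliseconds when the window resets
  window: string;
  resource: string;
}

// Build a 429 body in the shape shown above.
function buildRateLimitErrorBody(
  state: RateLimitState,
  requestId: string,
  nowMs: number = Date.now()
) {
  const resetInSeconds = Math.max(0, Math.ceil((state.resetAtMs - nowMs) / 1000));
  const minutes = Math.floor(resetInSeconds / 60);
  const seconds = resetInSeconds % 60;
  return {
    error: {
      type: 'rate_limit_exceeded',
      code: 'RATE_LIMIT_EXCEEDED',
      message: 'You have exceeded your rate limit.',
      details: {
        limit: state.limit,
        remaining: 0,
        reset_at: new Date(state.resetAtMs).toISOString(),
        reset_in_seconds: resetInSeconds,
        window: state.window,
        resource: state.resource
      },
      guidance: {
        retry_after: resetInSeconds,
        message: `Your rate limit will reset in ${minutes} minutes and ${seconds} seconds.`,
        documentation_url: 'https://api.example.com/docs/rate-limits'
      },
      request_id: requestId
    }
  };
}
```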
For international APIs, consider localized error messages:
function getRateLimitMessage(locale: string, details: RateLimitDetails): string {
  const messages: Record<string, (d: RateLimitDetails) => string> = {
    'en': (d) => `Rate limit exceeded. Resets in ${d.resetIn} seconds.`,
    'es': (d) => `Límite de tasa excedido. Se reinicia en ${d.resetIn} segundos.`,
    'ja': (d) => `レート制限を超えました。${d.resetIn}秒後にリセットされます。`,
    'de': (d) => `Ratenlimit überschritten. Wird in ${d.resetIn} Sekunden zurückgesetzt.`
  };

  const getMessage = messages[locale] || messages['en'];
  return getMessage(details);
}
Well-behaved API clients implement intelligent retry strategies. As an API provider, you should guide clients toward correct retry behavior and provide SDKs that implement it.
The industry-standard retry strategy combines exponential backoff with random jitter:
Algorithm:
wait_time = min(max_wait, base_wait * (2 ^ attempt)) + random_jitter
Example progression:
Attempt 1: 1.0s + jitter(0-0.5s) → wait 1.0-1.5s
Attempt 2: 2.0s + jitter(0-0.5s) → wait 2.0-2.5s
Attempt 3: 4.0s + jitter(0-0.5s) → wait 4.0-4.5s
Attempt 4: 8.0s + jitter(0-0.5s) → wait 8.0-8.5s
Attempt 5: 16.0s + jitter(0-0.5s) → wait 16.0-16.5s
...
Max wait: 60s (capped)
Why jitter is critical: Without jitter, clients that failed at the same time retry at the same time, creating a 'thundering herd'. Random jitter spreads retries across time, smoothing load.
/**
 * Intelligent retry client with exponential backoff
 */
interface RetryConfig {
  maxRetries: number;    // Maximum retry attempts
  baseDelayMs: number;   // Initial delay
  maxDelayMs: number;    // Maximum delay cap
  jitterPercent: number; // Jitter as percentage (0-100)
}

const defaultConfig: RetryConfig = {
  maxRetries: 5,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  jitterPercent: 25
};

async function fetchWithRetry(
  url: string,
  options: RequestInit,
  config: RetryConfig = defaultConfig
): Promise<Response> {
  let lastError: Error | null = null;

  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      // Success - return immediately
      if (response.ok) {
        return response;
      }

      // Rate limited - retry with appropriate delay
      if (response.status === 429) {
        const retryAfter = getRetryAfter(response);
        if (attempt < config.maxRetries) {
          const delay = calculateDelay(attempt, retryAfter, config);
          console.log(`Rate limited. Waiting ${delay}ms before retry ${attempt + 1}`);
          await sleep(delay);
          continue;
        }
      }

      // Server error (5xx) - may be transient, retry
      if (response.status >= 500) {
        if (attempt < config.maxRetries) {
          const delay = calculateDelay(attempt, null, config);
          console.log(`Server error. Waiting ${delay}ms before retry ${attempt + 1}`);
          await sleep(delay);
          continue;
        }
      }

      // Client error (4xx except 429) - don't retry
      return response;
    } catch (error) {
      // Network error - retry
      lastError = error as Error;
      if (attempt < config.maxRetries) {
        const delay = calculateDelay(attempt, null, config);
        console.log(`Network error. Waiting ${delay}ms before retry ${attempt + 1}`);
        await sleep(delay);
        continue;
      }
    }
  }

  throw lastError || new Error('Max retries exceeded');
}

function getRetryAfter(response: Response): number | null {
  const retryAfter = response.headers.get('Retry-After');
  if (!retryAfter) return null;

  // Could be seconds or HTTP date
  const seconds = parseInt(retryAfter, 10);
  if (!isNaN(seconds)) {
    return seconds * 1000; // Convert to ms
  }

  // Try parsing as date
  const date = new Date(retryAfter);
  if (!isNaN(date.getTime())) {
    return Math.max(0, date.getTime() - Date.now());
  }

  return null;
}

function calculateDelay(
  attempt: number,
  retryAfterMs: number | null,
  config: RetryConfig
): number {
  // If server provided Retry-After, respect it (with small jitter)
  if (retryAfterMs !== null) {
    const jitter = retryAfterMs * (config.jitterPercent / 100) * Math.random();
    return retryAfterMs + jitter;
  }

  // Exponential backoff
  const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
  const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);

  // Add jitter
  const jitter = cappedDelay * (config.jitterPercent / 100) * Math.random();
  return cappedDelay + jitter;
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

For clients making many requests, the circuit breaker pattern prevents repeatedly hitting a known-broken or rate-limited endpoint:
States:
[Closed] ---(failures > threshold)---> [Open]
   ^                                    |  ^
   |                          (timeout) |  | (failure)
   |                                    v  |
   +-----------(success)---------- [Half-Open]
This prevents hammering a rate-limited API and wasting quota on guaranteed-to-fail requests.
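The state machine above is small enough to sketch directly. The failure threshold and reset timeout below are illustrative defaults, not recommendations:

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Minimal circuit breaker sketch for a rate-limited endpoint.
class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,
    private resetTimeoutMs = 30000,
    private now: () => number = Date.now // injectable clock for testing
  ) {}

  // Should we even attempt the request?
  allowRequest(): boolean {
    if (this.state === 'open') {
      if (this.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = 'half-open'; // probe with one request
        return true;
      }
      return false;
    }
    return true; // closed or half-open
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures++;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

A caller checks `allowRequest()` before each attempt and reports the outcome with `recordSuccess()` or `recordFailure()`; a 429 counts as a failure so the breaker opens instead of wasting quota.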
Explicitly document expected retry behavior in your API docs. State: (1) Which error codes are retryable, (2) Expected backoff algorithm, (3) Maximum retry count before giving up, (4) How to interpret Retry-After. Clients that ignore these guidelines can be deprioritized or blocked.
Great APIs don't wait until clients hit limits to communicate. Proactive communication helps developers manage usage before problems occur.
Offer a dedicated /v1/rate_limit endpoint so clients can check their status without consuming quota:

/**
 * Dedicated rate limit status endpoint
 * GET /v1/rate-limit/status
 *
 * This endpoint does NOT consume quota (exempt from rate limiting)
 */
app.get('/v1/rate-limit/status', async (req, res) => {
  const clientId = extractClientId(req);
  const status = await rateLimiter.getStatus(clientId);

  res.json({
    limits: status.limits.map(limit => ({
      name: limit.name,
      resource: limit.resource,
      window: limit.window,
      limit: limit.limit,
      used: limit.used,
      remaining: limit.remaining,
      reset_at: limit.resetAt.toISOString(),
      utilization_percent: Math.round((limit.used / limit.limit) * 100)
    })),
    usage_summary: {
      current_period: {
        start: status.periodStart.toISOString(),
        end: status.periodEnd.toISOString(),
        total_requests: status.totalRequests
      },
      historical: {
        last_24h: status.last24hRequests,
        last_7d: status.last7dRequests,
        last_30d: status.last30dRequests
      }
    },
    tier: {
      name: status.tierName,
      limits_url: `https://api.example.com/docs/rate-limits#${status.tierName}`
    },
    warnings: status.warnings // e.g., "Approaching 90% of daily limit"
  });
});

/**
 * Webhook notification for approaching limits
 */
interface RateLimitWebhook {
  event: 'limit.warning' | 'limit.exceeded' | 'limit.reset';
  timestamp: string;
  client: {
    id: string;
    email: string;
  };
  limit: {
    name: string;
    value: number;
    used: number;
    threshold_percent: number; // 80, 90, 95, 100
    reset_at: string;
  };
  recommendation: string;
}

// Example webhook payload
const webhookPayload: RateLimitWebhook = {
  event: 'limit.warning',
  timestamp: '2025-01-15T09:30:00Z',
  client: {
    id: 'client_abc123',
    email: 'developer@example.com'
  },
  limit: {
    name: 'api_requests_daily',
    value: 10000,
    used: 9000,
    threshold_percent: 90,
    reset_at: '2025-01-16T00:00:00Z'
  },
  recommendation: 'Consider upgrading to Pro plan for 100,000 daily requests.'
};

A developer-friendly dashboard should show:
1. Current Status (at a glance):
┌─────────────────────────────────────────────────┐
│ API Requests (Daily) │
│ ████████████████████░░░░ 8,234 / 10,000 │
│ 82% used │
│ Resets in 4 hours 23 minutes │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Search API (Per Minute) │
│ ████████░░░░░░░░░░░░░░░░ 12 / 30 │
│ 40% used │
│ Resets in 48 seconds │
└─────────────────────────────────────────────────┘
2. Historical Trends:
3. Per-Endpoint Breakdown:
4. Alert Configuration:
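The usage bars in the status view above are easy to generate server-side or in the dashboard frontend. A small sketch (bar width and formatting are presentation choices, not requirements):

```typescript
// Render utilization as a fixed-width bar plus usage and percentage,
// like the dashboard mockup above.
function renderUsageBar(used: number, limit: number, width = 24): string {
  const ratio = limit > 0 ? Math.min(1, used / limit) : 0;
  const filled = Math.round(ratio * width);
  const bar = '█'.repeat(filled) + '░'.repeat(width - filled);
  const fmt = (n: number) => n.toLocaleString('en-US');
  return `${bar} ${fmt(used)} / ${fmt(limit)} (${Math.round(ratio * 100)}% used)`;
}
```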
Official SDKs should handle rate limiting automatically, providing a seamless developer experience. Here's how to design SDK rate limit handling.
/**
 * Example SDK with built-in rate limit handling
 */
class ExampleAPIClient {
  private rateLimitState: RateLimitState = {
    remaining: Infinity,
    resetAt: 0,
    limit: Infinity
  };
  private config: ClientConfig;

  constructor(apiKey: string, options: Partial<ClientConfig> = {}) {
    this.config = {
      apiKey,
      baseUrl: options.baseUrl || 'https://api.example.com/v1',
      maxRetries: options.maxRetries ?? 3,
      autoRetry: options.autoRetry ?? true,
      onRateLimitWarning: options.onRateLimitWarning,
      onRateLimitExceeded: options.onRateLimitExceeded
    };
  }

  /**
   * Main request method with automatic rate limit handling
   */
  async request<T>(method: string, path: string, data?: unknown): Promise<T> {
    // Pre-flight check: warn if approaching limit
    if (this.rateLimitState.remaining < 10) {
      this.config.onRateLimitWarning?.({
        remaining: this.rateLimitState.remaining,
        resetAt: new Date(this.rateLimitState.resetAt)
      });
    }

    // Pre-flight check: wait if definitely rate limited
    if (this.rateLimitState.remaining <= 0) {
      const waitTime = this.rateLimitState.resetAt - Date.now();
      if (waitTime > 0 && waitTime < 60000) {
        // Wait up to 60s rather than making a doomed request
        await this.sleep(waitTime);
      }
    }

    let lastError: Error | null = null;

    for (let attempt = 0; attempt <= this.config.maxRetries; attempt++) {
      try {
        const response = await this.makeRequest(method, path, data);

        // Update rate limit state from headers
        this.updateRateLimitState(response);

        if (response.ok) {
          return await response.json();
        }

        if (response.status === 429) {
          this.config.onRateLimitExceeded?.({
            retryAfter: this.getRetryAfter(response)
          });

          if (this.config.autoRetry && attempt < this.config.maxRetries) {
            const retryAfter = this.getRetryAfter(response);
            await this.sleep(retryAfter);
            continue;
          }

          throw new RateLimitError(await response.json());
        }

        // Other errors...
        throw new APIError(response.status, await response.json());
      } catch (error) {
        lastError = error as Error;
        if (!(error instanceof RateLimitError) || !this.config.autoRetry) {
          throw error;
        }
      }
    }

    throw lastError;
  }

  /**
   * Get current rate limit status without making a request
   */
  getRateLimitStatus(): RateLimitStatus {
    return {
      remaining: this.rateLimitState.remaining,
      limit: this.rateLimitState.limit,
      resetAt: new Date(this.rateLimitState.resetAt),
      isLimited: this.rateLimitState.remaining <= 0 &&
        Date.now() < this.rateLimitState.resetAt
    };
  }

  /**
   * Check if it's safe to make a request
   */
  canMakeRequest(): boolean {
    if (this.rateLimitState.remaining > 0) return true;
    if (Date.now() >= this.rateLimitState.resetAt) return true;
    return false;
  }

  private updateRateLimitState(response: Response): void {
    const limit = response.headers.get('X-RateLimit-Limit');
    const remaining = response.headers.get('X-RateLimit-Remaining');
    const reset = response.headers.get('X-RateLimit-Reset');

    if (limit) this.rateLimitState.limit = parseInt(limit, 10);
    if (remaining) this.rateLimitState.remaining = parseInt(remaining, 10);
    if (reset) this.rateLimitState.resetAt = parseInt(reset, 10) * 1000;
  }

  private getRetryAfter(response: Response): number {
    const retryAfter = response.headers.get('Retry-After');
    if (retryAfter) {
      return parseInt(retryAfter, 10) * 1000;
    }
    // Fallback to reset time
    return Math.max(0, this.rateLimitState.resetAt - Date.now());
  }

  private async makeRequest(
    method: string,
    path: string,
    data?: unknown
  ): Promise<Response> {
    return fetch(`${this.config.baseUrl}${path}`, {
      method,
      headers: {
        'Authorization': `Bearer ${this.config.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: data ? JSON.stringify(data) : undefined
    });
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

interface ClientConfig {
  apiKey: string;
  baseUrl: string;
  maxRetries: number;
  autoRetry: boolean;
  onRateLimitWarning?: (info: { remaining: number; resetAt: Date }) => void;
  onRateLimitExceeded?: (info: { retryAfter: number }) => void;
}

class RateLimitError extends Error {
  constructor(public readonly details: object) {
    super('Rate limit exceeded');
    this.name = 'RateLimitError';
  }
}

1. Automatic Retry with Configurable Behavior:
// Default: retry automatically
const client = new Client(apiKey);
// Disable auto-retry for more control
const client = new Client(apiKey, { autoRetry: false });
2. Proactive Waiting: If the SDK knows a request will fail (remaining = 0 and reset is soon), wait rather than make a doomed request.
3. Hook for Custom Handling:
const client = new Client(apiKey, {
  onRateLimitWarning: (info) => {
    console.warn(`Only ${info.remaining} requests left!`);
  },
  onRateLimitExceeded: (info) => {
    alertOps(`Rate limited, waiting ${info.retryAfter}ms`);
  }
});
4. Expose Rate Limit State:
// Let users check status
if (!client.canMakeRequest()) {
  const status = client.getRateLimitStatus();
  console.log(`Rate limited until ${status.resetAt}`);
}
Effective client communication transforms rate limiting from a frustrating barrier into a manageable part of API integration.
Module Complete:
Congratulations! You've completed the Rate Limiter module. You've covered rate limiter design from algorithms to policies to client communication, and you're now equipped to build production-grade rate limiters that protect infrastructure at any scale while remaining developer-friendly.