Implementing a rate limiting algorithm is only half the challenge. The other half—often more impactful on user experience—is policy design: deciding what to limit, who to identify, and how limits should interact across different dimensions.
Consider Stripe's API: it limits by API key, by endpoint, by IP address, and by the card number being charged, all simultaneously. GitHub limits by authenticated user, but also by organization and by IP for unauthenticated requests. Cloudflare limits by zone, by plan tier, and by specific security rule.
In this page, we'll master the art of designing rate limiting policies that are fair, flexible, and aligned with business objectives—policies that protect infrastructure while enabling legitimate high-volume usage.
By the end of this page, you'll understand: (1) Identity dimensions for rate limiting (user, API key, IP, etc.), (2) Resource dimensions (global, endpoint-specific, operation-based), (3) Hierarchical and composite limit strategies, (4) Business tier differentiation, and (5) Real-world policy patterns from major API providers.
The first decision in rate limit policy design is identity: what entity are we tracking and limiting? The choice profoundly affects both security and user experience.
| Identity | Best For | Challenges | Examples |
|---|---|---|---|
| API Key | Authenticated APIs | Key sharing, key leakage | Stripe, Twilio, OpenAI |
| User ID | Multi-device users | Requires authentication | GitHub, Slack |
| Organization/Tenant | B2B SaaS | Large orgs need high limits | AWS, Salesforce |
| IP Address | Unauthenticated endpoints | NAT, shared IPs, VPNs | Login pages, public APIs |
| Device/Client ID | Mobile apps | Device spoofing | Mobile gaming, streaming |
| Composite | Defense in depth | Complex configuration | Most production systems |
The most common approach for authenticated APIs. Each API key gets its own quota.
Advantages:
Challenges:
Best Practices:
```typescript
// Per-key limits with abuse detection
interface APIKeyPolicy {
  keyId: string;
  limits: {
    requestsPerMinute: number;
    requestsPerDay: number;
    concurrentRequests: number;
  };
  flags: {
    allowOverageWith429: boolean;        // Soft vs hard limit
    suspiciousActivityThreshold: number; // Trigger investigation
  };
}
```
Limits tied to the authenticated user, regardless of which API keys or sessions they use.
Advantages:
Challenges:
Pattern: Key and User Limits Together
```
User: alice@example.com
├── API Key: key_dev_abc  → 10,000 req/day (development key)
├── API Key: key_prod_xyz → 10,000 req/day (production key)
└── User Total: 15,000 req/day (regardless of key used)
```
Result: Even with 2 keys, Alice can't exceed 15,000/day total
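A minimal sketch of this combined check, using in-memory counters in place of a real shared store (the names and limits here are illustrative, not any vendor's API):

```typescript
// Combined per-key and per-user limits: a request must fit under BOTH quotas.
// In production these counters would live in a shared store such as Redis.
const counters = new Map<string, number>();

function increment(key: string): number {
  const next = (counters.get(key) ?? 0) + 1;
  counters.set(key, next);
  return next;
}

function checkKeyAndUserLimits(
  apiKey: string,
  userId: string,
  perKeyLimit: number,
  perUserLimit: number
): boolean {
  const keyCount = counters.get(`key:${apiKey}`) ?? 0;
  const userCount = counters.get(`user:${userId}`) ?? 0;
  if (keyCount >= perKeyLimit || userCount >= perUserLimit) {
    return false; // Reject without consuming quota
  }
  increment(`key:${apiKey}`);
  increment(`user:${userId}`);
  return true;
}

// Alice has two keys, each allowed 10 requests, but a user total of 15
let allowed = 0;
for (let i = 0; i < 10; i++) if (checkKeyAndUserLimits('key_abc', 'alice', 10, 15)) allowed++;
for (let i = 0; i < 10; i++) if (checkKeyAndUserLimits('key_xyz', 'alice', 10, 15)) allowed++;
console.log(allowed); // 15: the user total caps combined usage across keys
```

Note that the check rejects before incrementing, so denied requests never consume quota.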
Essential for unauthenticated endpoints and as a fallback layer.
Advantages:
Challenges:
Best Practices:
```typescript
function getRateLimitIP(request: Request): string {
  // Check for forwarded headers (from trusted proxies)
  const forwarded = request.headers['x-forwarded-for'];
  if (forwarded && isFromTrustedProxy(request)) {
    return forwarded.split(',')[0].trim(); // Client's real IP
  }

  // Check Cloudflare header
  const cfIP = request.headers['cf-connecting-ip'];
  if (cfIP) return cfIP;

  // Fallback to direct connection IP
  return request.socket.remoteAddress;
}
```
Handle Shared IPs:
A single IP address from a large corporation might represent 50,000 employees. If you rate limit that IP at 100 requests/minute, effectively no one can use your service. Mitigation: identify major corporate IP ranges (via WHOIS) and apply higher limits, or use authenticated rate limiting for business customers.
For B2B SaaS, limits often apply at the organization level:
```
Organization: Acme Corp
├── User: alice → No individual limit
├── User: bob   → No individual limit
├── User: carol → No individual limit
└── Org Total: 1,000,000 req/day (shared pool)
```
Why Organization Limits:
Combined Org + User Limits:
```
Organization: 1,000,000 req/day
└── Per-User within Org: 100,000 req/day
```
Result: Org can use 1M total, but no single user can use more than 100K
This prevents one runaway user from starving the team
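A sketch of that two-level check, reporting which level blocked the request (in-memory counters and hypothetical names for illustration):

```typescript
// Shared org pool with a per-user cap. The return value says which
// level blocked, which is useful for error messages and metrics.
const used = new Map<string, number>();

function checkOrgAndUser(
  orgId: string,
  userId: string,
  orgLimit: number,
  perUserLimit: number
): 'allowed' | 'org_limit' | 'user_limit' {
  const orgUsed = used.get(`org:${orgId}`) ?? 0;
  const userUsed = used.get(`user:${orgId}:${userId}`) ?? 0;
  if (userUsed >= perUserLimit) return 'user_limit'; // Runaway user stopped first
  if (orgUsed >= orgLimit) return 'org_limit';       // Shared pool exhausted
  used.set(`org:${orgId}`, orgUsed + 1);
  used.set(`user:${orgId}:${userId}`, userUsed + 1);
  return 'allowed';
}

// With a per-user cap of 2 inside an org pool of 3, one user can't drain the pool
checkOrgAndUser('acme', 'alice', 3, 2); // allowed
checkOrgAndUser('acme', 'alice', 3, 2); // allowed
console.log(checkOrgAndUser('acme', 'alice', 3, 2)); // 'user_limit'
console.log(checkOrgAndUser('acme', 'bob', 3, 2));   // 'allowed' (pool has 1 left)
```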
Beyond who, we must decide what resource to limit. Different operations have vastly different costs, and a single global limit is often too coarse.
Global Limit Only:
```
API Key: 10,000 requests/hour (any endpoint)
  /api/search: 0 cost awareness
  /api/export: 0 cost awareness
  /api/simple: 0 cost awareness
```
Problem: A single /api/export call might consume 100× the resources of /api/simple, but both count as 1 request.
Per-Endpoint Limits:
```
API Key: 10,000 requests/hour (global)
  /api/search: 100 requests/minute
  /api/export: 10 requests/hour
  /api/simple: 1,000 requests/minute
```
Now expensive operations are protected separately, and cheap operations can sustain higher throughput.
```typescript
/**
 * Endpoint-specific rate limit configuration
 */
interface EndpointRateLimits {
  // Global limit applies to all endpoints
  global: RateLimit;
  // Per-endpoint overrides
  endpoints: {
    [pattern: string]: RateLimit;
  };
}

interface RateLimit {
  requests: number;
  window: 'second' | 'minute' | 'hour' | 'day';
  burst?: number; // Optional burst allowance
}

// Example configuration
const apiLimits: EndpointRateLimits = {
  global: { requests: 10000, window: 'hour' },
  endpoints: {
    // Expensive operations
    'POST /api/export': { requests: 10, window: 'hour' },
    'POST /api/analyze': { requests: 100, window: 'hour' },

    // Medium-cost operations
    'GET /api/search': { requests: 100, window: 'minute' },
    'POST /api/upload': { requests: 50, window: 'minute' },

    // Cheap operations (high limit)
    'GET /api/status': { requests: 1000, window: 'minute' },

    // Pattern-based limits
    'GET /api/users/*': { requests: 500, window: 'minute' },
    'POST /api/webhooks/*': { requests: 1000, window: 'hour' },
  }
};

/**
 * Rate limit middleware that checks applicable limits
 */
async function rateLimitMiddleware(
  request: Request,
  limits: EndpointRateLimits,
  limiter: RateLimiter
): Promise<RateLimitResult> {
  const clientId = extractClientId(request);
  const endpoint = `${request.method} ${request.path}`;

  // Check global limit first
  const globalResult = await limiter.checkLimit(
    `${clientId}:global`,
    limits.global.requests,
    windowToSeconds(limits.global.window)
  );
  if (!globalResult.allowed) {
    return globalResult;
  }

  // Check endpoint-specific limit
  const endpointLimit = findMatchingLimit(endpoint, limits.endpoints);
  if (endpointLimit) {
    const endpointResult = await limiter.checkLimit(
      `${clientId}:${endpoint}`,
      endpointLimit.requests,
      windowToSeconds(endpointLimit.window)
    );
    if (!endpointResult.allowed) {
      return endpointResult;
    }
  }

  return { allowed: true, remaining: globalResult.remaining };
}
```

Instead of per-endpoint limits, assign costs to operations:
```
API Key: 100,000 'credits' per hour
  GET /api/simple:  1 credit
  GET /api/search:  10 credits
  POST /api/upload: 50 credits
  POST /api/export: 500 credits
```
Advantages:
Disadvantages:
| Operation | Typical Cost | Rationale |
|---|---|---|
| Read from cache | 1 unit | Very cheap, served from memory |
| Read from database | 5 units | I/O bound, uses DB connections |
| Complex search | 20 units | CPU intensive, index scanning |
| Write operation | 10 units | Requires replication, durability |
| File upload | 50 units | Storage, bandwidth, processing |
| ML inference | 100 units | GPU time, expensive compute |
| Report generation | 500 units | Long-running, resource heavy |
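The cost table above translates directly into a small debit function. This is a minimal sketch with an in-memory budget map; the operation names and costs mirror the table and are otherwise illustrative:

```typescript
// Cost-based limiting: each operation debits credits from a shared
// hourly budget instead of counting as one request.
const OPERATION_COSTS: Record<string, number> = {
  cache_read: 1,
  db_read: 5,
  write: 10,
  search: 20,
  upload: 50,
  ml_inference: 100,
  report: 500,
};

const budgets = new Map<string, number>(); // clientId -> credits used this window

function spendCredits(clientId: string, operation: string, hourlyBudget: number): boolean {
  const cost = OPERATION_COSTS[operation] ?? 1; // Unknown ops default to cheapest
  const spent = budgets.get(clientId) ?? 0;
  if (spent + cost > hourlyBudget) return false; // Would exceed budget: reject
  budgets.set(clientId, spent + cost);
  return true;
}

// A 1,000-credit budget fits exactly two reports, then nothing more
console.log(spendCredits('acme', 'report', 1000));     // true  (500 used)
console.log(spendCredits('acme', 'report', 1000));     // true  (1000 used)
console.log(spendCredits('acme', 'cache_read', 1000)); // false (budget exhausted)
```

In a real deployment the budget map would be a windowed counter in a shared store, and rejected requests would receive a 429 with the credit cost in the response.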
Real-world rate limiting policies are rarely one-dimensional. Hierarchical rate limiting applies limits at multiple levels of a hierarchy, where requests must satisfy all applicable limits.
```
Level 1: Infrastructure Limit (protect the platform)
│
└── Level 2: Organization Limit (capacity per paying customer)
    │
    └── Level 3: User Limit (fairness within org)
        │
        └── Level 4: Endpoint Limit (protect expensive operations)
            │
            └── Level 5: IP Limit (prevent credential stuffing)
```
Evaluation Order: A request must pass ALL levels:
If ANY level fails → 429 Too Many Requests
```typescript
/**
 * Hierarchical rate limiter that evaluates multiple levels
 */
interface RateLimitHierarchy {
  levels: RateLimitLevel[];
}

interface RateLimitLevel {
  name: string;
  extractKey: (request: Request) => string;
  getLimit: (request: Request) => RateLimitConfig;
  priority: number; // Lower = check first
}

class HierarchicalRateLimiter {
  constructor(
    private readonly hierarchy: RateLimitLevel[],
    private readonly limiter: RateLimiter
  ) {
    // Sort by priority
    this.hierarchy.sort((a, b) => a.priority - b.priority);
  }

  async checkRequest(request: Request): Promise<HierarchicalResult> {
    const results: LevelResult[] = [];

    for (const level of this.hierarchy) {
      const key = level.extractKey(request);
      const config = level.getLimit(request);

      const result = await this.limiter.checkLimit(
        `${level.name}:${key}`,
        config.requests,
        config.windowSeconds
      );

      results.push({ level: level.name, key, ...result });

      if (!result.allowed) {
        // This level blocked the request
        return {
          allowed: false,
          blockedBy: level.name,
          levels: results,
          retryAfter: result.retryAfter
        };
      }
    }

    // All levels passed.
    // Return the most constrained remaining count.
    const minRemaining = Math.min(...results.map(r => r.remaining));
    return { allowed: true, levels: results, remaining: minRemaining };
  }
}

// Example hierarchy configuration
const apiHierarchy: RateLimitLevel[] = [
  {
    name: 'infrastructure',
    priority: 0,
    extractKey: () => 'global',
    getLimit: () => ({ requests: 10000000, windowSeconds: 60 })
  },
  {
    name: 'organization',
    priority: 1,
    extractKey: (req) => req.auth.organizationId,
    getLimit: (req) => getOrgLimit(req.auth.organizationId)
  },
  {
    name: 'user',
    priority: 2,
    extractKey: (req) => req.auth.userId,
    getLimit: (req) => getUserLimit(req.auth.userId)
  },
  {
    name: 'endpoint',
    priority: 3,
    extractKey: (req) => `${req.auth.userId}:${req.method}:${req.path}`,
    getLimit: (req) => getEndpointLimit(req.path)
  },
  {
    name: 'ip',
    priority: 4,
    extractKey: (req) => req.clientIP,
    getLimit: () => ({ requests: 1000, windowSeconds: 60 })
  }
];
```

Check cheaper/faster limits first. If a local in-memory check fails, skip the expensive distributed checks. Order matters: infrastructure (cached) → endpoint (local) → user (distributed) → organization (distributed).
Hierarchies should support inheritance for configuration simplicity:
```yaml
defaults:
  user:
    requests_per_hour: 10000
  endpoint:
    GET: 1000/minute
    POST: 100/minute
    DELETE: 10/minute

overrides:
  organizations:
    acme_corp:
      user:
        requests_per_hour: 100000   # 10x default
      endpoints:
        "POST /api/export":
          requests: 100/hour        # Override specific endpoint
  users:
    alice@acme.com:
      requests_per_hour: 500000     # Power user exception
```
Resolution: Most specific wins. Alice gets 500K/hour even though Acme's default is 100K.
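The "most specific wins" rule reduces to an ordered lookup: per-user override, then per-org override, then the default. A minimal sketch with hypothetical names:

```typescript
// Resolve the effective hourly limit, most specific source first.
interface LimitSources {
  defaultLimit: number;
  orgOverrides: Record<string, number>;
  userOverrides: Record<string, number>;
}

function resolveHourlyLimit(sources: LimitSources, orgId: string, userId: string): number {
  if (userId in sources.userOverrides) return sources.userOverrides[userId]; // Most specific
  if (orgId in sources.orgOverrides) return sources.orgOverrides[orgId];
  return sources.defaultLimit; // Least specific
}

const sources: LimitSources = {
  defaultLimit: 10_000,
  orgOverrides: { acme_corp: 100_000 },
  userOverrides: { 'alice@acme.com': 500_000 },
};

console.log(resolveHourlyLimit(sources, 'acme_corp', 'alice@acme.com'));  // 500000
console.log(resolveHourlyLimit(sources, 'acme_corp', 'bob@acme.com'));    // 100000
console.log(resolveHourlyLimit(sources, 'other_org', 'carol@other.com')); // 10000
```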
Rate limits are often tied to business tiers—free, pro, enterprise—reflecting both capacity constraints and commercial value. Designing tier-based limits requires balancing fairness, monetization, and user experience.
| Limit Dimension | Free | Pro ($) | Enterprise ($$) |
|---|---|---|---|
| Requests/month | 1,000 | 100,000 | 10,000,000+ |
| Requests/second | 10 | 100 | 1,000+ |
| Concurrent connections | 5 | 50 | Unlimited |
| File upload size | 10 MB | 100 MB | 1 GB |
| API versions | Latest only | Last 2 | All supported |
| Support response time | Community | 24 hours | 1 hour SLA |
| Custom limits | No | No | Yes |
Principle 1: Free Tier Should Be Genuinely Useful
Principle 2: Clear Upgrade Path
Principle 3: Enterprise = Flexibility
Principle 4: Protect the Platform at All Tiers
```typescript
/**
 * Tier-based rate limit configuration
 */
interface TierConfig {
  tier: 'free' | 'pro' | 'business' | 'enterprise';
  limits: {
    global: RateLimit;
    endpoints: Record<string, RateLimit>;
    features: {
      maxFileSize: number;
      maxConcurrent: number;
      burstMultiplier: number; // How much burst allowed
    };
  };
  overages: {
    allowed: boolean;
    pricePerRequest?: number; // Usage-based pricing
    hardCap?: number;         // Absolute maximum
  };
}

const tierConfigs: Record<string, TierConfig> = {
  free: {
    tier: 'free',
    limits: {
      global: { requests: 1000, window: 'month' },
      endpoints: {
        'POST /api/ai/*': { requests: 100, window: 'month' },
      },
      features: {
        maxFileSize: 10 * 1024 * 1024, // 10 MB
        maxConcurrent: 5,
        burstMultiplier: 1.0, // No burst
      }
    },
    overages: {
      allowed: false, // Hard stop at limit
    }
  },
  pro: {
    tier: 'pro',
    limits: {
      global: { requests: 100000, window: 'month' },
      endpoints: {
        'POST /api/ai/*': { requests: 10000, window: 'month' },
      },
      features: {
        maxFileSize: 100 * 1024 * 1024, // 100 MB
        maxConcurrent: 50,
        burstMultiplier: 2.0, // 2x burst allowed
      }
    },
    overages: {
      allowed: true,
      pricePerRequest: 0.001, // $0.001 per extra request
      hardCap: 500000,        // Can't exceed 5x limit even with overages
    }
  },
  enterprise: {
    tier: 'enterprise',
    limits: {
      global: { requests: 10000000, window: 'month' }, // Base, negotiable
      endpoints: {}, // All endpoint limits negotiated
      features: {
        maxFileSize: 1024 * 1024 * 1024, // 1 GB
        maxConcurrent: 1000,
        burstMultiplier: 5.0, // High burst for batch operations
      }
    },
    overages: {
      allowed: true,
      pricePerRequest: 0.0001, // Volume discount
      // No hard cap for enterprise
    }
  }
};

/**
 * Get effective limits for a user
 */
function getEffectiveLimits(user: User): TierConfig {
  const baseConfig = tierConfigs[user.tier];

  // Apply any negotiated overrides
  if (user.customLimits) {
    return mergeConfigs(baseConfig, user.customLimits);
  }

  return baseConfig;
}
```

Soft Limits (Overages Allowed):
Hard Limits (Strict Enforcement):
Hybrid Approach (Common):
```
0-100%:   Included in plan
100-200%: Overage charges apply
200%+:    Hard rejection (must upgrade plan)
```
This allows some flexibility while preventing runaway usage.
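Those three bands reduce to a small decision function. A sketch assuming a 2x overage ceiling (the multiplier is illustrative):

```typescript
// Hybrid enforcement: within plan → allow; up to 2x → allow and bill
// overage; beyond 2x → hard rejection until the plan is upgraded.
type Decision = { allowed: boolean; overage: boolean };

function hybridDecision(usedThisMonth: number, planLimit: number): Decision {
  if (usedThisMonth < planLimit) return { allowed: true, overage: false };
  if (usedThisMonth < planLimit * 2) return { allowed: true, overage: true };
  return { allowed: false, overage: false }; // Hard cap reached
}

console.log(hybridDecision(50_000, 100_000));  // { allowed: true, overage: false }
console.log(hybridDecision(150_000, 100_000)); // { allowed: true, overage: true }
console.log(hybridDecision(250_000, 100_000)); // { allowed: false, overage: false }
```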
Let's examine rate limiting policies from major API providers to understand how these concepts work in practice.
Stripe limits at multiple dimensions simultaneously:
By API Key:
By Resource:
By IP (Fraudulent Activity):
Key Innovation: Stripe supports idempotency keys. Retries carrying the same key do not consume quota, which is critical for payment reliability.
Authenticated Requests:
Unauthenticated Requests:
GraphQL API:
Search API:
Key Innovation: GitHub's point-based GraphQL limiting accurately reflects query cost. Fetching 1 field costs 1 point; fetching 100 nested objects costs 100+ points.
Multi-Dimensional Limits:
Tier Progression:
```
Free:   3 RPM,      40K TPM
Tier 1: 3,500 RPM,  90K TPM   ($5 min spend)
Tier 2: 5,000 RPM,  450K TPM  ($50 min spend)
Tier 3: 5,000 RPM,  1M TPM    ($100 min spend)
Tier 4: 10,000 RPM, 2M TPM    ($250 min spend)
Tier 5: 10,000 RPM, 10M TPM   ($1000 min spend)
```
Model-Specific Limits:
Key Innovation: Token-based limiting accurately reflects LLM costs. A request generating 1,000 tokens costs 20× more than 50 tokens, reflected in TPM limits.
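In the spirit of RPM + TPM limits, a request must fit under both the request count and the token budget for the window. A minimal sketch with illustrative numbers:

```typescript
// Dual-dimension check: request count (RPM) and token budget (TPM)
// must both have room, or the request is rejected without consuming either.
interface WindowUsage { requests: number; tokens: number; }

function allowLLMRequest(
  usage: WindowUsage,
  estimatedTokens: number,
  rpm: number,
  tpm: number
): boolean {
  if (usage.requests + 1 > rpm) return false;             // Request-count limit
  if (usage.tokens + estimatedTokens > tpm) return false; // Token limit
  usage.requests += 1;
  usage.tokens += estimatedTokens;
  return true;
}

const usage: WindowUsage = { requests: 0, tokens: 0 };
console.log(allowLLMRequest(usage, 30_000, 3, 40_000)); // true
console.log(allowLLMRequest(usage, 30_000, 3, 40_000)); // false: TPM hit before RPM
```

Because output length is unknown in advance, real token limiters debit an estimate up front and reconcile with the actual token count after the response completes.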
| Industry | Primary Limit Dimension | Secondary Dimensions | Key Consideration |
|---|---|---|---|
| Payments (Stripe) | Per API Key + Per Endpoint | Per-card, Per-IP | Idempotency for retries |
| AI/ML (OpenAI) | Tokens Not Requests | Per-model, Per-tier | Cost varies by output length |
| Developer Tools (GitHub) | Per-user + Per-IP | Per-endpoint (search) | GraphQL query complexity |
| Communication (Twilio) | Per-account + Per-number | Per-message type | Carrier regulations |
| Cloud (AWS) | Per-service + Per-account | Per-region, Per-resource | Service-specific limits |
| Social (Twitter/X) | Per-user + Per-app | Per-endpoint, Per-feature | Read vs. write distinction |
Production rate limiting requires dynamic policy management—the ability to update limits without deployment, create custom overrides, and respond to incidents rapidly.
```typescript
/**
 * Rate limit policy management system
 */
interface PolicyStore {
  // Get effective policy for a request
  getPolicy(context: PolicyContext): Promise<EffectivePolicy>;

  // Policy CRUD operations
  createPolicy(policy: Policy): Promise<Policy>;
  updatePolicy(id: string, policy: Partial<Policy>): Promise<Policy>;
  deletePolicy(id: string): Promise<void>;

  // Override management
  createOverride(override: PolicyOverride): Promise<PolicyOverride>;
  listOverrides(filters: OverrideFilters): Promise<PolicyOverride[]>;

  // Resolution helpers used by PolicyResolver below
  getPoliciesForContext(context: PolicyContext): Promise<Policy[]>;
  getActiveOverrides(context: PolicyContext): Promise<PolicyOverride[]>;
}

interface Policy {
  id: string;
  name: string;
  description: string;

  // Selector: who does this policy apply to?
  selector: PolicySelector;

  // Limits: what are the limits?
  limits: LimitDefinition[];

  // Metadata
  priority: number; // Higher = takes precedence
  enabled: boolean;
  createdAt: Date;
  updatedAt: Date;
  version: number; // For optimistic concurrency
}

interface PolicySelector {
  type: 'all' | 'tier' | 'organization' | 'user' | 'ip_range';
  value?: string;                 // e.g., tier name, org ID, user ID, CIDR
  conditions?: PolicyCondition[]; // Additional conditions
}

interface LimitDefinition {
  resource: string; // 'global', 'endpoint:/api/export', etc.
  limit: number;
  window: string;   // '1m', '1h', '1d'
  burst?: number;
  action: 'reject' | 'throttle' | 'warn';
}

/**
 * Policy resolution with inheritance and priority
 */
class PolicyResolver {
  constructor(private readonly store: PolicyStore) {}

  async resolve(context: PolicyContext): Promise<EffectivePolicy> {
    // Get all applicable policies
    const policies = await this.store.getPoliciesForContext(context);

    // Sort by priority (descending), then specificity
    policies.sort((a, b) => {
      if (a.priority !== b.priority) return b.priority - a.priority;
      return this.getSpecificity(b) - this.getSpecificity(a);
    });

    // Merge limits: policies are sorted highest-priority first, so the
    // first policy to claim a resource wins
    const effectiveLimits = new Map<string, LimitDefinition>();
    for (const policy of policies) {
      for (const limit of policy.limits) {
        if (!effectiveLimits.has(limit.resource)) {
          effectiveLimits.set(limit.resource, limit);
        }
      }
    }

    // Active overrides trump merged policies
    const overrides = await this.store.getActiveOverrides(context);
    for (const override of overrides) {
      effectiveLimits.set(override.resource, override.limit);
    }

    return {
      limits: Array.from(effectiveLimits.values()),
      appliedPolicies: policies.map(p => p.id),
      appliedOverrides: overrides.map(o => o.id)
    };
  }

  private getSpecificity(policy: Policy): number {
    // More specific selectors have higher specificity
    switch (policy.selector.type) {
      case 'user': return 4;
      case 'organization': return 3;
      case 'tier': return 2;
      case 'ip_range': return 1;
      case 'all': return 0;
    }
  }
}
```

Rate limiting policy design is as important as the underlying algorithms. Let's consolidate our understanding:
What's Next:
We've designed policies—but how do we communicate them to clients? In the final page, we'll explore Client Communication—the headers, error responses, and retry strategies that create a great developer experience even when rate limiting is engaged.
You now understand how to design sophisticated rate limiting policies across multiple dimensions—identity, resource, and business tier. You can create hierarchical policies, configure tier-based limits, and implement dynamic policy management. Next, we'll master client communication.