A free-tier user testing your API and an enterprise customer powering production workloads have vastly different needs. A search endpoint hitting Elasticsearch and a simple health check endpoint have vastly different costs. A single global rate limit ignores these realities.
Effective rate limiting is multi-dimensional. It considers who is making the request, what resource they're accessing, and what tier of service they're entitled to. This page explores the strategies for implementing sophisticated, fair, and business-aligned rate limiting.
By the end of this page, you will understand per-user and per-API limiting strategies, hierarchical limit structures, tier-based differentiation, and how to balance fairness, protection, and business requirements in your rate limiting design.
Rate limits can be applied along multiple dimensions. Understanding these dimensions helps you design limits that are both protective and fair.
| Dimension | Identifier | Purpose | Example |
|---|---|---|---|
| Global | None (aggregate) | Protect overall system capacity | 10,000 req/sec across all users |
| Per-IP | Client IP address | Block distributed attacks, anonymous abuse | 100 req/min per IP |
| Per-User | User ID, Session | Fair usage per authenticated user | 1,000 req/hour per user |
| Per-API Key | API Key | Enforce contract limits for developers | 10,000 req/day per key |
| Per-Tenant | Organization ID | Multi-tenant isolation | 50,000 req/hour per org |
| Per-Endpoint | URL path/method | Protect expensive operations | 10 req/min on /export |
| Per-Resource | Resource ID | Prevent resource-specific abuse | 100 req/min per document |
Production systems typically apply multiple dimensions simultaneously. A request might be checked against global limits, per-IP limits, per-user limits, AND per-endpoint limits. Each dimension catches different abuse patterns.
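As a minimal sketch of how one request fans out into several checks, the function below derives one counter key per applicable dimension. The `RequestContext` shape, its field names, and the key formats are assumptions for illustration, not part of any particular framework.

```typescript
// Hypothetical request shape; field names are assumptions for illustration.
interface RequestContext {
  ip: string;
  userId?: string;
  apiKey?: string;
  orgId?: string;
  method: string;
  route: string; // normalized route, e.g. "/export"
}

interface DimensionKey {
  dimension: string;
  key: string;
}

// Produce one counter key for every dimension that applies to this request.
// Global, per-IP, and per-endpoint always apply; the others need an identifier.
function dimensionKeys(ctx: RequestContext): DimensionKey[] {
  const keys: DimensionKey[] = [
    { dimension: 'global', key: 'global' },
    { dimension: 'ip', key: `ip:${ctx.ip}` },
  ];
  if (ctx.userId) keys.push({ dimension: 'user', key: `user:${ctx.userId}` });
  if (ctx.apiKey) keys.push({ dimension: 'apiKey', key: `key:${ctx.apiKey}` });
  if (ctx.orgId) keys.push({ dimension: 'tenant', key: `org:${ctx.orgId}` });
  keys.push({ dimension: 'endpoint', key: `endpoint:${ctx.method}:${ctx.route}` });
  return keys;
}
```

Each returned key would then be checked against its own limit, and the request proceeds only if every check passes.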
Per-user rate limiting is the foundation of fair API access. It ensures that each authenticated user gets their allocated share of resources, regardless of how other users behave.
```typescript
// Note: MultiWindowLimiter, LimitResult, getUserConfig, and checkWindow
// are assumed to be defined elsewhere.
interface UserLimitConfig {
  tier: 'free' | 'pro' | 'enterprise';
  customLimits?: Partial<TierLimits>;
}

interface TierLimits {
  requestsPerMinute: number;
  requestsPerHour: number;
  requestsPerDay: number;
  burstSize: number;
}

const TIER_LIMITS: Record<string, TierLimits> = {
  free: {
    requestsPerMinute: 20,
    requestsPerHour: 500,
    requestsPerDay: 5000,
    burstSize: 10,
  },
  pro: {
    requestsPerMinute: 100,
    requestsPerHour: 5000,
    requestsPerDay: 50000,
    burstSize: 50,
  },
  enterprise: {
    requestsPerMinute: 1000,
    requestsPerHour: 50000,
    requestsPerDay: 500000,
    burstSize: 200,
  },
};

class PerUserRateLimiter {
  private limiters: Map<string, MultiWindowLimiter>;
  private userConfigs: Map<string, UserLimitConfig>;

  async checkLimit(userId: string): Promise<LimitResult> {
    const config = await this.getUserConfig(userId);
    const limits = this.getEffectiveLimits(config);

    // Check all time windows
    const results = await Promise.all([
      this.checkWindow(userId, 'minute', limits.requestsPerMinute),
      this.checkWindow(userId, 'hour', limits.requestsPerHour),
      this.checkWindow(userId, 'day', limits.requestsPerDay),
    ]);

    // Report the first exceeded window (checked from shortest to longest)
    const blocked = results.find(r => !r.allowed);
    if (blocked) {
      return {
        allowed: false,
        retryAfter: blocked.retryAfter,
        limitType: blocked.window,
        remaining: 0,
      };
    }

    // Return the most restrictive remaining count
    const minRemaining = Math.min(...results.map(r => r.remaining));
    return {
      allowed: true,
      remaining: minRemaining,
      limits: {
        minute: results[0].remaining,
        hour: results[1].remaining,
        day: results[2].remaining,
      },
    };
  }

  // Per-user custom limits override the tier defaults
  private getEffectiveLimits(config: UserLimitConfig): TierLimits {
    const baseLimits = TIER_LIMITS[config.tier];
    return { ...baseLimits, ...config.customLimits };
  }
}
```

Per-user limiting requires reliable user identification. For authenticated APIs, this is straightforward. For partially authenticated or anonymous access, you may need to fall back to IP-based or device fingerprint-based limiting, which is less precise.
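The fallback order can be sketched as a small resolver. This is an illustration only: `IncomingRequest`, `LimitIdentity`, and the key prefixes are hypothetical names, and device fingerprinting is left out because it depends on the client.

```typescript
// Hypothetical request shape for illustration.
interface IncomingRequest {
  userId?: string; // set after authentication succeeds
  apiKey?: string; // set when the caller presents a key
  ip: string;      // always available, but shared behind NATs and proxies
}

interface LimitIdentity {
  key: string;      // counter key used by the rate limiter
  precise: boolean; // false when the key may cover several real users
}

// Prefer the most specific identifier available, falling back to the IP.
function resolveIdentity(req: IncomingRequest): LimitIdentity {
  if (req.userId) return { key: `user:${req.userId}`, precise: true };
  if (req.apiKey) return { key: `key:${req.apiKey}`, precise: true };
  return { key: `ip:${req.ip}`, precise: false };
}
```

The `precise` flag lets the caller apply a more lenient limit, or a different error message, when several users may share one key.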
Not all API endpoints are equal. Some operations are cheap (reading cached data), while others are expensive (complex queries, external API calls, data exports). Per-endpoint limits protect expensive operations while allowing high throughput on inexpensive ones.
| Category | Examples | Suggested Limit Approach |
|---|---|---|
| No-Cost | Health checks, ping, status | No limit or very high (monitoring traffic) |
| Low-Cost | Read single resource, user profile | High limits (hundreds per minute) |
| Medium-Cost | List with pagination, search | Moderate limits (tens per minute) |
| High-Cost | Complex aggregations, reports | Low limits (tens per hour) |
| Expensive | Data export, batch operations | Very low limits (a few per day) |
| External | Calls to third-party APIs | Match third-party limits |
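One way to turn these cost categories into a single limit is a token bucket in which each request consumes tokens in proportion to its cost. The sketch below is a single-process, in-memory illustration; the class name `CostAwareTokenBucket`, the cost values, and the capacity are assumptions chosen for the example.

```typescript
// Assumed token costs per category; real values depend on measured load.
const TOKEN_COST: Record<string, number> = {
  health: 0,
  read: 1,
  search: 5,
  export: 50,
};

class CostAwareTokenBucket {
  private capacity: number;
  private refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true and deducts `cost` tokens if enough are available.
  tryConsume(cost: number, nowMs: number = Date.now()): boolean {
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;

    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }
}
```

With a capacity of 100 tokens, a caller could make two exports or twenty searches before waiting for a refill, so expensive operations throttle themselves without a separate counter per endpoint.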
```typescript
interface EndpointConfig {
  pattern: string;    // URL pattern (glob or regex)
  methods: string[];  // HTTP methods
  limits: {
    requestsPerMinute?: number;
    requestsPerHour?: number;
    tokenCost?: number;  // For token bucket
  };
  bypassForTiers?: string[];  // Tiers that bypass this limit
}

const ENDPOINT_CONFIGS: EndpointConfig[] = [
  {
    pattern: '/health',
    methods: ['GET'],
    limits: { requestsPerMinute: Infinity },  // No limit
  },
  {
    pattern: '/api/v1/users/:id',
    methods: ['GET'],
    limits: { requestsPerMinute: 100 },
  },
  {
    pattern: '/api/v1/search',
    methods: ['GET', 'POST'],
    limits: { requestsPerMinute: 30, tokenCost: 5 },
  },
  {
    pattern: '/api/v1/export',
    methods: ['POST'],
    limits: { requestsPerHour: 5, tokenCost: 50 },
    bypassForTiers: ['enterprise'],
  },
  {
    pattern: '/api/v1/*/batch',
    methods: ['POST'],
    limits: { requestsPerMinute: 10 },
  },
];

class EndpointRateLimiter {
  private configs: EndpointConfig[];
  private limiters: Map<string, RateLimiter>;

  getEndpointConfig(method: string, path: string): EndpointConfig | null {
    return this.configs.find(config =>
      config.methods.includes(method) &&
      this.matchPattern(config.pattern, path)
    ) ?? null;
  }

  async checkEndpointLimit(
    userId: string,
    method: string,
    path: string,
    userTier: string
  ): Promise<LimitResult> {
    const config = this.getEndpointConfig(method, path);

    if (!config) {
      return { allowed: true, remaining: Infinity };
    }

    // Check if user tier bypasses this endpoint limit
    if (config.bypassForTiers?.includes(userTier)) {
      return { allowed: true, remaining: Infinity };
    }

    // Create composite key for per-user-per-endpoint limiting
    const limitKey = `${userId}:${method}:${this.normalizePattern(config.pattern)}`;

    return this.checkLimit(limitKey, config.limits);
  }
}
```

Real-world systems often need hierarchical limits where a parent entity (organization) has an overall limit, and child entities (users) have individual limits within that. This prevents a single user from consuming an organization's entire allocation.
```typescript
// Note: Request here is an application-specific type carrying userId,
// organizationId, and endpoint, not the Fetch API Request.
interface HierarchicalLimitCheck {
  level: string;
  key: string;
  limit: number;
  window: string;
}

class HierarchicalRateLimiter {
  /**
   * Check limits at multiple hierarchy levels.
   * All levels must pass for the request to be allowed.
   */
  async checkHierarchy(request: Request): Promise<LimitResult> {
    // buildHierarchy looks up org and user limits, so it is async
    const checks = await this.buildHierarchy(request);

    const results = await Promise.all(
      checks.map(check => this.checkSingleLimit(check))
    );

    // Find first blocked level
    for (let i = 0; i < results.length; i++) {
      if (!results[i].allowed) {
        return {
          allowed: false,
          blockedAt: checks[i].level,
          retryAfter: results[i].retryAfter,
          message: `Rate limit exceeded at ${checks[i].level} level`,
        };
      }
    }

    // Return the most restrictive remaining count
    const minRemaining = Math.min(...results.map(r => r.remaining));
    return { allowed: true, remaining: minRemaining };
  }

  private async buildHierarchy(request: Request): Promise<HierarchicalLimitCheck[]> {
    const userId = request.userId;
    const orgId = request.organizationId;
    const endpoint = request.endpoint;

    const checks: HierarchicalLimitCheck[] = [];

    // Level 1: Global limit
    checks.push({
      level: 'global',
      key: 'global',
      limit: 100000,
      window: 'minute',
    });

    // Level 2: Organization limit (if applicable)
    if (orgId) {
      const orgLimit = await this.getOrgLimit(orgId);
      checks.push({
        level: 'organization',
        key: `org:${orgId}`,
        limit: orgLimit,
        window: 'minute',
      });
    }

    // Level 3: User limit
    const userLimit = await this.getUserLimit(userId);
    checks.push({
      level: 'user',
      key: `user:${userId}`,
      limit: userLimit,
      window: 'minute',
    });

    // Level 4: Endpoint limit for this user
    const endpointLimit = this.getEndpointLimit(endpoint);
    if (endpointLimit) {
      checks.push({
        level: 'endpoint',
        key: `user:${userId}:endpoint:${endpoint}`,
        limit: endpointLimit,
        window: 'minute',
      });
    }

    return checks;
  }
}
```

Each hierarchy level adds a rate limit check, increasing latency. For most APIs, 2-3 levels (global + user + endpoint) are sufficient. Deep hierarchies (5+ levels) may add noticeable overhead.
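A related subtlety: if every level's check also increments its counter, a request rejected at one level still consumes quota at the others. The sketch below shows one way to avoid that by checking all levels first and incrementing only when all of them pass. It is a single-process, in-memory illustration using fixed windows; `InMemoryHierarchy`, `LevelCheck`, and `Decision` are hypothetical names, and a shared store would need an atomic equivalent of the two phases.

```typescript
interface LevelCheck {
  level: string;
  key: string;
  limit: number;
  windowMs: number;
}

interface Decision {
  allowed: boolean;
  blockedAt?: string;
  remaining: number;
}

class InMemoryHierarchy {
  private counters = new Map<string, { windowStart: number; count: number }>();

  // Start of the fixed window that contains `nowMs`.
  private windowStart(check: LevelCheck, nowMs: number): number {
    return Math.floor(nowMs / check.windowMs) * check.windowMs;
  }

  // Current count for the key, or 0 if the stored window has expired.
  private currentCount(check: LevelCheck, nowMs: number): number {
    const entry = this.counters.get(check.key);
    return entry && entry.windowStart === this.windowStart(check, nowMs) ? entry.count : 0;
  }

  check(checks: LevelCheck[], nowMs: number = Date.now()): Decision {
    // Phase 1: read-only pass. Reject without touching any counter.
    for (const c of checks) {
      if (this.currentCount(c, nowMs) >= c.limit) {
        return { allowed: false, blockedAt: c.level, remaining: 0 };
      }
    }
    // Phase 2: every level passed, so charge the request to each level.
    let remaining = Infinity;
    for (const c of checks) {
      const count = this.currentCount(c, nowMs) + 1;
      this.counters.set(c.key, { windowStart: this.windowStart(c, nowMs), count });
      remaining = Math.min(remaining, c.limit - count);
    }
    return { allowed: true, remaining };
  }
}
```

Because the whole hierarchy is evaluated in one call, this shape also suits a single round trip to a shared store, which keeps the latency cost of extra levels small.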
Many commercial APIs differentiate limits based on subscription tier. This enables monetization while maintaining fair access for all users.
| Tier | Req/Min | Req/Hour | Req/Day | Cost Weight | Priority |
|---|---|---|---|---|---|
| Anonymous | 10 | 100 | 500 | 1.5x | Low |
| Free | 30 | 500 | 5,000 | 1x | Normal |
| Starter | 100 | 3,000 | 30,000 | 1x | Normal |
| Pro | 500 | 20,000 | 200,000 | 0.8x | High |
| Enterprise | Custom | Custom | Custom | 0.5x | Highest |
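The Cost Weight column can be read as a multiplier on the quota charged per request. That reading is an assumption, as are the function names in this short sketch of the arithmetic.

```typescript
// Quota units charged for one request: the endpoint's base cost
// scaled by the tier's cost weight.
function chargedCost(baseCost: number, costWeight: number): number {
  return baseCost * costWeight;
}

// Number of base-cost-1 requests that fit into a quota at a given weight.
function effectiveRequests(quota: number, costWeight: number): number {
  if (costWeight <= 0) throw new RangeError('costWeight must be positive');
  return Math.floor(quota / costWeight);
}
```

Under this reading, the Anonymous tier's nominal 10 requests per minute at 1.5x allows 6 actual requests, while a 0.8x weight stretches a quota by a factor of 1/0.8 = 1.25.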
```typescript
interface TierConfig {
  id: string;
  name: string;
  limits: {
    requestsPerMinute: number;
    requestsPerHour: number;
    requestsPerDay: number;
    concurrentRequests: number;
  };
  features: {
    burstMultiplier: number;    // Allow 2x burst for premium
    priorityQueue: boolean;     // Premium users get priority
    customEndpoints: boolean;   // Access to premium endpoints
    reducedCostWeight: number;  // 0.8 = each request costs 0.8 quota units (20% discount)
  };
  pricing: {
    monthlyPrice: number;
    overageRate: number;        // Cost per 1000 requests over limit
    allowOverage: boolean;
  };
}

const TIERS: TierConfig[] = [
  {
    id: 'free',
    name: 'Free',
    limits: {
      requestsPerMinute: 30,
      requestsPerHour: 500,
      requestsPerDay: 5000,
      concurrentRequests: 2,
    },
    features: {
      burstMultiplier: 1.0,
      priorityQueue: false,
      customEndpoints: false,
      reducedCostWeight: 1.0,
    },
    pricing: {
      monthlyPrice: 0,
      overageRate: 0,
      allowOverage: false,  // Hard limit
    },
  },
  {
    id: 'pro',
    name: 'Pro',
    limits: {
      requestsPerMinute: 500,
      requestsPerHour: 20000,
      requestsPerDay: 200000,
      concurrentRequests: 10,
    },
    features: {
      burstMultiplier: 2.0,
      priorityQueue: true,
      customEndpoints: true,
      reducedCostWeight: 0.8,
    },
    pricing: {
      monthlyPrice: 99,
      overageRate: 0.50,  // $0.50 per 1000 over
      allowOverage: true,
    },
  },
];
```

When users hit limits, provide a clear upgrade path. Include upgrade URLs in 429 responses. This turns rate limiting from a frustration into a conversion opportunity.
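A minimal sketch of assembling such a response follows. The `LimitViolation` and `RateLimitResponse` shapes, the function name, and the body field names are illustrative assumptions; `Retry-After` is the standard HTTP header for telling clients how many seconds to wait.

```typescript
interface LimitViolation {
  limitType: string; // e.g. "user_hourly"
  limit: number;
  used: number;
  resetsAt: Date;    // when the exceeded window resets
}

interface RateLimitResponse {
  status: number;
  headers: Record<string, string>;
  body: {
    error: {
      code: string;
      message: string;
      details: {
        limit_type: string;
        limit: number;
        used: number;
        resets_at: string;
        retry_after_seconds: number;
      };
      upgrade_url: string;
    };
  };
}

// Build a 429 response that tells the client which limit was hit,
// when to retry, and where to upgrade.
function buildRateLimitResponse(
  violation: LimitViolation,
  upgradeUrl: string,
  now: Date = new Date()
): RateLimitResponse {
  const msUntilReset = violation.resetsAt.getTime() - now.getTime();
  const retryAfterSeconds = Math.max(0, Math.ceil(msUntilReset / 1000));
  return {
    status: 429,
    headers: {
      'Content-Type': 'application/json',
      'Retry-After': String(retryAfterSeconds),
    },
    body: {
      error: {
        code: 'RATE_LIMIT_EXCEEDED',
        message: `Rate limit exceeded: ${violation.limitType}`,
        details: {
          limit_type: violation.limitType,
          limit: violation.limit,
          used: violation.used,
          resets_at: violation.resetsAt.toISOString(),
          retry_after_seconds: retryAfterSeconds,
        },
        upgrade_url: upgradeUrl,
      },
    },
  };
}
```

Keeping the retry delay in both the header and the body serves generic HTTP clients, which read the header, as well as API-specific SDKs, which usually parse the body.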
Designing multi-dimensional rate limits requires careful thought. Here are battle-tested practices from production systems:
An informative 429 response with all dimensions:

```json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "You have exceeded your hourly request limit",
    "details": {
      "limit_type": "user_hourly",
      "limit": 1000,
      "used": 1000,
      "resets_at": "2024-01-15T14:00:00Z",
      "retry_after_seconds": 847
    },
    "other_limits": {
      "user_daily": { "limit": 10000, "remaining": 4523 },
      "endpoint_minute": { "limit": 30, "remaining": 28 }
    },
    "upgrade_url": "https://api.example.com/pricing",
    "documentation_url": "https://docs.example.com/rate-limits"
  }
}
```

What's Next:
So far, we've assumed rate limiting on a single server. But modern systems span multiple regions and instances. The next page tackles Distributed Rate Limiting—ensuring consistent limits across a global infrastructure.
You now understand how to design multi-dimensional rate limiting that balances protection, fairness, and business requirements. Next, we'll tackle the challenge of distributed consistency.