Implementing a rate limiting algorithm is only half the challenge. The other half—often more impactful on user experience—is policy design: deciding what to limit, who to identify, and how limits should interact across different dimensions.
Consider Stripe's API: it limits by API key, by endpoint, by IP address, and by the card number being charged, all simultaneously. GitHub limits by authenticated user, but also by organization and by IP for unauthenticated requests. Cloudflare limits by zone, by plan tier, and by specific security rule.
In this page, we'll master the art of designing rate limiting policies that are fair, flexible, and aligned with business objectives—policies that protect infrastructure while enabling legitimate high-volume usage.
By the end of this page, you'll understand: (1) Identity dimensions for rate limiting (user, API key, IP, etc.), (2) Resource dimensions (global, endpoint-specific, operation-based), (3) Hierarchical and composite limit strategies, (4) Business tier differentiation, and (5) Real-world policy patterns from major API providers.
The first decision in rate limit policy design is identity: what entity are we tracking and limiting? The choice profoundly affects both security and user experience.
| Identity | Best For | Challenges | Examples |
|---|---|---|---|
| API Key | Authenticated APIs | Key sharing, key leakage | Stripe, Twilio, OpenAI |
| User ID | Multi-device users | Requires authentication | GitHub, Slack |
| Organization/Tenant | B2B SaaS | Large orgs need high limits | AWS, Salesforce |
| IP Address | Unauthenticated endpoints | NAT, shared IPs, VPNs | Login pages, public APIs |
| Device/Client ID | Mobile apps | Device spoofing | Mobile gaming, streaming |
| Composite | Defense in depth | Complex configuration | Most production systems |
The most common approach for authenticated APIs. Each API key gets its own quota.
Advantages:
Challenges:
Best Practices:
```typescript
// Per-key limits with abuse detection
interface APIKeyPolicy {
  keyId: string;
  limits: {
    requestsPerMinute: number;
    requestsPerDay: number;
    concurrentRequests: number;
  };
  flags: {
    allowOverageWith429: boolean;        // Soft vs hard limit
    suspiciousActivityThreshold: number; // Trigger investigation
  };
}
```
Limits tied to the authenticated user, regardless of which API keys or sessions they use.
Advantages:
Challenges:
Pattern: Key and User Limits Together
```
User: alice@example.com
├── API Key: key_dev_abc  → 10,000 req/day (development key)
├── API Key: key_prod_xyz → 10,000 req/day (production key)
└── User Total: 15,000 req/day (regardless of key used)
```
Result: Even with 2 keys, Alice can't exceed 15,000/day total
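A minimal sketch of this combined check, using in-memory counters in place of a real shared store (the names and limits here are illustrative, not any vendor's API):

```typescript
// Combined per-key and per-user limits: a request must fit under BOTH quotas.
// In production these counters would live in a shared store such as Redis.
const counters = new Map<string, number>();

function increment(key: string): number {
  const next = (counters.get(key) ?? 0) + 1;
  counters.set(key, next);
  return next;
}

function checkKeyAndUserLimits(
  apiKey: string,
  userId: string,
  perKeyLimit: number,
  perUserLimit: number
): boolean {
  const keyCount = counters.get(`key:${apiKey}`) ?? 0;
  const userCount = counters.get(`user:${userId}`) ?? 0;
  if (keyCount >= perKeyLimit || userCount >= perUserLimit) {
    return false; // Reject without consuming quota
  }
  increment(`key:${apiKey}`);
  increment(`user:${userId}`);
  return true;
}

// Alice has two keys, each allowed 10 requests, but a user total of 15
let allowed = 0;
for (let i = 0; i < 10; i++) if (checkKeyAndUserLimits('key_abc', 'alice', 10, 15)) allowed++;
for (let i = 0; i < 10; i++) if (checkKeyAndUserLimits('key_xyz', 'alice', 10, 15)) allowed++;
console.log(allowed); // 15: the user total caps combined usage across keys
```

Note that the check rejects before incrementing, so denied requests never consume quota.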
Essential for unauthenticated endpoints and as a fallback layer.
Advantages:
Challenges:
Best Practices:
```typescript
function getRateLimitIP(request: Request): string {
  // Check for forwarded headers (from trusted proxies)
  const forwarded = request.headers['x-forwarded-for'];
  if (forwarded && isFromTrustedProxy(request)) {
    return forwarded.split(',')[0].trim(); // Client's real IP
  }

  // Check Cloudflare header
  const cfIP = request.headers['cf-connecting-ip'];
  if (cfIP) return cfIP;

  // Fallback to direct connection IP
  return request.socket.remoteAddress;
}
```
Handle Shared IPs:
A single IP address from a large corporation might represent 50,000 employees. If you rate limit that IP at 100 requests/minute, effectively no one can use your service. Mitigation: identify major corporate IP ranges (via WHOIS) and apply higher limits, or use authenticated rate limiting for business customers.
For B2B SaaS, limits often apply at the organization level:
```
Organization: Acme Corp
├── User: alice → No individual limit
├── User: bob   → No individual limit
├── User: carol → No individual limit
└── Org Total: 1,000,000 req/day (shared pool)
```
Why Organization Limits:
Combined Org + User Limits:
```
Organization: 1,000,000 req/day
└── Per-User within Org: 100,000 req/day
```
Result: Org can use 1M total, but no single user can use more than 100K
This prevents one runaway user from starving the team
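A sketch of that two-level check, reporting which level blocked the request (in-memory counters and hypothetical names for illustration):

```typescript
// Shared org pool with a per-user cap. The return value says which
// level blocked, which is useful for error messages and metrics.
const used = new Map<string, number>();

function checkOrgAndUser(
  orgId: string,
  userId: string,
  orgLimit: number,
  perUserLimit: number
): 'allowed' | 'org_limit' | 'user_limit' {
  const orgUsed = used.get(`org:${orgId}`) ?? 0;
  const userUsed = used.get(`user:${orgId}:${userId}`) ?? 0;
  if (userUsed >= perUserLimit) return 'user_limit'; // Runaway user stopped first
  if (orgUsed >= orgLimit) return 'org_limit';       // Shared pool exhausted
  used.set(`org:${orgId}`, orgUsed + 1);
  used.set(`user:${orgId}:${userId}`, userUsed + 1);
  return 'allowed';
}

// With a per-user cap of 2 inside an org pool of 3, one user can't drain the pool
checkOrgAndUser('acme', 'alice', 3, 2); // allowed
checkOrgAndUser('acme', 'alice', 3, 2); // allowed
console.log(checkOrgAndUser('acme', 'alice', 3, 2)); // 'user_limit'
console.log(checkOrgAndUser('acme', 'bob', 3, 2));   // 'allowed' (pool has 1 left)
```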
Beyond who, we must decide what resource to limit. Different operations have vastly different costs, and a single global limit is often too coarse.
Global Limit Only:
```
API Key: 10,000 requests/hour (any endpoint)
  /api/search: 0 cost awareness
  /api/export: 0 cost awareness
  /api/simple: 0 cost awareness
```
Problem: A single /api/export call might consume 100× the resources of /api/simple, but both count as 1 request.
Per-Endpoint Limits:
```
API Key: 10,000 requests/hour (global)
  /api/search: 100 requests/minute
  /api/export: 10 requests/hour
  /api/simple: 1,000 requests/minute
```
Now expensive operations are protected separately, and cheap operations can sustain higher throughput.
```typescript
/**
 * Endpoint-specific rate limit configuration
 */
interface EndpointRateLimits {
  // Global limit applies to all endpoints
  global: RateLimit;
  // Per-endpoint overrides
  endpoints: {
    [pattern: string]: RateLimit;
  };
}

interface RateLimit {
  requests: number;
  window: 'second' | 'minute' | 'hour' | 'day';
  burst?: number; // Optional burst allowance
}

// Example configuration
const apiLimits: EndpointRateLimits = {
  global: { requests: 10000, window: 'hour' },
  endpoints: {
    // Expensive operations
    'POST /api/export': { requests: 10, window: 'hour' },
    'POST /api/analyze': { requests: 100, window: 'hour' },

    // Medium-cost operations
    'GET /api/search': { requests: 100, window: 'minute' },
    'POST /api/upload': { requests: 50, window: 'minute' },

    // Cheap operations (high limit)
    'GET /api/status': { requests: 1000, window: 'minute' },

    // Pattern-based limits
    'GET /api/users/*': { requests: 500, window: 'minute' },
    'POST /api/webhooks/*': { requests: 1000, window: 'hour' },
  }
};

/**
 * Rate limit middleware that checks applicable limits
 */
async function rateLimitMiddleware(
  request: Request,
  limits: EndpointRateLimits,
  limiter: RateLimiter
): Promise<RateLimitResult> {
  const clientId = extractClientId(request);
  const endpoint = `${request.method} ${request.path}`;

  // Check global limit first
  const globalResult = await limiter.checkLimit(
    `${clientId}:global`,
    limits.global.requests,
    windowToSeconds(limits.global.window)
  );
  if (!globalResult.allowed) {
    return globalResult;
  }

  // Check endpoint-specific limit
  const endpointLimit = findMatchingLimit(endpoint, limits.endpoints);
  if (endpointLimit) {
    const endpointResult = await limiter.checkLimit(
      `${clientId}:${endpoint}`,
      endpointLimit.requests,
      windowToSeconds(endpointLimit.window)
    );
    if (!endpointResult.allowed) {
      return endpointResult;
    }
  }

  return { allowed: true, remaining: globalResult.remaining };
}
```

Instead of per-endpoint limits, assign costs to operations:
```
API Key: 100,000 'credits' per hour
  GET /api/simple:  1 credit
  GET /api/search:  10 credits
  POST /api/upload: 50 credits
  POST /api/export: 500 credits
```
Advantages:
Disadvantages:
| Operation | Typical Cost | Rationale |
|---|---|---|
| Read from cache | 1 unit | Very cheap, served from memory |
| Read from database | 5 units | I/O bound, uses DB connections |
| Complex search | 20 units | CPU intensive, index scanning |
| Write operation | 10 units | Requires replication, durability |
| File upload | 50 units | Storage, bandwidth, processing |
| ML inference | 100 units | GPU time, expensive compute |
| Report generation | 500 units | Long-running, resource heavy |
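The cost table above translates directly into a small debit function. This is a minimal sketch with an in-memory budget map; the operation names and costs mirror the table and are otherwise illustrative:

```typescript
// Cost-based limiting: each operation debits credits from a shared
// hourly budget instead of counting as one request.
const OPERATION_COSTS: Record<string, number> = {
  cache_read: 1,
  db_read: 5,
  write: 10,
  search: 20,
  upload: 50,
  ml_inference: 100,
  report: 500,
};

const budgets = new Map<string, number>(); // clientId -> credits used this window

function spendCredits(clientId: string, operation: string, hourlyBudget: number): boolean {
  const cost = OPERATION_COSTS[operation] ?? 1; // Unknown ops default to cheapest
  const spent = budgets.get(clientId) ?? 0;
  if (spent + cost > hourlyBudget) return false; // Would exceed budget: reject
  budgets.set(clientId, spent + cost);
  return true;
}

// A 1,000-credit budget fits exactly two reports, then nothing more
console.log(spendCredits('acme', 'report', 1000));     // true  (500 used)
console.log(spendCredits('acme', 'report', 1000));     // true  (1000 used)
console.log(spendCredits('acme', 'cache_read', 1000)); // false (budget exhausted)
```

In a real deployment the budget map would be a windowed counter in a shared store, and rejected requests would receive a 429 with the credit cost in the response.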
Real-world rate limiting policies are rarely one-dimensional. Hierarchical rate limiting applies limits at multiple levels of a hierarchy, where requests must satisfy all applicable limits.
```
Level 1: Infrastructure Limit (protect the platform)
│
└── Level 2: Organization Limit (capacity per paying customer)
    │
    └── Level 3: User Limit (fairness within org)
        │
        └── Level 4: Endpoint Limit (protect expensive operations)
            │
            └── Level 5: IP Limit (prevent credential stuffing)
```
Evaluation Order: A request must pass ALL levels:
If ANY level fails → 429 Too Many Requests
```typescript
/**
 * Hierarchical rate limiter that evaluates multiple levels
 */
interface RateLimitHierarchy {
  levels: RateLimitLevel[];
}

interface RateLimitLevel {
  name: string;
  extractKey: (request: Request) => string;
  getLimit: (request: Request) => RateLimitConfig;
  priority: number; // Lower = check first
}

class HierarchicalRateLimiter {
  constructor(
    private readonly hierarchy: RateLimitLevel[],
    private readonly limiter: RateLimiter
  ) {
    // Sort by priority
    this.hierarchy.sort((a, b) => a.priority - b.priority);
  }

  async checkRequest(request: Request): Promise<HierarchicalResult> {
    const results: LevelResult[] = [];

    for (const level of this.hierarchy) {
      const key = level.extractKey(request);
      const config = level.getLimit(request);

      const result = await this.limiter.checkLimit(
        `${level.name}:${key}`,
        config.requests,
        config.windowSeconds
      );

      results.push({ level: level.name, key, ...result });

      if (!result.allowed) {
        // This level blocked the request
        return {
          allowed: false,
          blockedBy: level.name,
          levels: results,
          retryAfter: result.retryAfter
        };
      }
    }

    // All levels passed.
    // Return the most constrained remaining count.
    const minRemaining = Math.min(...results.map(r => r.remaining));
    return { allowed: true, levels: results, remaining: minRemaining };
  }
}

// Example hierarchy configuration
const apiHierarchy: RateLimitLevel[] = [
  {
    name: 'infrastructure',
    priority: 0,
    extractKey: () => 'global',
    getLimit: () => ({ requests: 10000000, windowSeconds: 60 })
  },
  {
    name: 'organization',
    priority: 1,
    extractKey: (req) => req.auth.organizationId,
    getLimit: (req) => getOrgLimit(req.auth.organizationId)
  },
  {
    name: 'user',
    priority: 2,
    extractKey: (req) => req.auth.userId,
    getLimit: (req) => getUserLimit(req.auth.userId)
  },
  {
    name: 'endpoint',
    priority: 3,
    extractKey: (req) => `${req.auth.userId}:${req.method}:${req.path}`,
    getLimit: (req) => getEndpointLimit(req.path)
  },
  {
    name: 'ip',
    priority: 4,
    extractKey: (req) => req.clientIP,
    getLimit: () => ({ requests: 1000, windowSeconds: 60 })
  }
];
```

Check cheaper/faster limits first. If a local in-memory check fails, skip the expensive distributed checks. Order matters: infrastructure (cached) → endpoint (local) → user (distributed) → organization (distributed).
Hierarchies should support inheritance for configuration simplicity:
```yaml
defaults:
  user:
    requests_per_hour: 10000
  endpoint:
    GET: 1000/minute
    POST: 100/minute
    DELETE: 10/minute

overrides:
  organizations:
    acme_corp:
      user:
        requests_per_hour: 100000   # 10x default
      endpoints:
        "POST /api/export":
          requests: 100/hour        # Override specific endpoint
  users:
    alice@acme.com:
      requests_per_hour: 500000     # Power user exception
```
Resolution: Most specific wins. Alice gets 500K/hour even though Acme's default is 100K.
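The "most specific wins" rule reduces to an ordered lookup: per-user override, then per-org override, then the default. A minimal sketch with hypothetical names:

```typescript
// Resolve the effective hourly limit, most specific source first.
interface LimitSources {
  defaultLimit: number;
  orgOverrides: Record<string, number>;
  userOverrides: Record<string, number>;
}

function resolveHourlyLimit(sources: LimitSources, orgId: string, userId: string): number {
  if (userId in sources.userOverrides) return sources.userOverrides[userId]; // Most specific
  if (orgId in sources.orgOverrides) return sources.orgOverrides[orgId];
  return sources.defaultLimit; // Least specific
}

const sources: LimitSources = {
  defaultLimit: 10_000,
  orgOverrides: { acme_corp: 100_000 },
  userOverrides: { 'alice@acme.com': 500_000 },
};

console.log(resolveHourlyLimit(sources, 'acme_corp', 'alice@acme.com'));  // 500000
console.log(resolveHourlyLimit(sources, 'acme_corp', 'bob@acme.com'));    // 100000
console.log(resolveHourlyLimit(sources, 'other_org', 'carol@other.com')); // 10000
```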
Rate limits are often tied to business tiers—free, pro, enterprise—reflecting both capacity constraints and commercial value. Designing tier-based limits requires balancing fairness, monetization, and user experience.
| Limit Dimension | Free | Pro ($) | Enterprise ($$) |
|---|---|---|---|
| Requests/month | 1,000 | 100,000 | 10,000,000+ |
| Requests/second | 10 | 100 | 1,000+ |
| Concurrent connections | 5 | 50 | Unlimited |
| File upload size | 10 MB | 100 MB | 1 GB |
| API versions | Latest only | Last 2 | All supported |
| Support response time | Community | 24 hours | 1 hour SLA |
| Custom limits | No | No | Yes |
Principle 1: Free Tier Should Be Genuinely Useful
Principle 2: Clear Upgrade Path
Principle 3: Enterprise = Flexibility
Principle 4: Protect the Platform at All Tiers
```typescript
/**
 * Tier-based rate limit configuration
 */
interface TierConfig {
  tier: 'free' | 'pro' | 'business' | 'enterprise';
  limits: {
    global: RateLimit;
    endpoints: Record<string, RateLimit>;
    features: {
      maxFileSize: number;
      maxConcurrent: number;
      burstMultiplier: number; // How much burst allowed
    };
  };
  overages: {
    allowed: boolean;
    pricePerRequest?: number; // Usage-based pricing
    hardCap?: number;         // Absolute maximum
  };
}

const tierConfigs: Record<string, TierConfig> = {
  free: {
    tier: 'free',
    limits: {
      global: { requests: 1000, window: 'month' },
      endpoints: {
        'POST /api/ai/*': { requests: 100, window: 'month' },
      },
      features: {
        maxFileSize: 10 * 1024 * 1024, // 10 MB
        maxConcurrent: 5,
        burstMultiplier: 1.0, // No burst
      }
    },
    overages: {
      allowed: false, // Hard stop at limit
    }
  },
  pro: {
    tier: 'pro',
    limits: {
      global: { requests: 100000, window: 'month' },
      endpoints: {
        'POST /api/ai/*': { requests: 10000, window: 'month' },
      },
      features: {
        maxFileSize: 100 * 1024 * 1024, // 100 MB
        maxConcurrent: 50,
        burstMultiplier: 2.0, // 2x burst allowed
      }
    },
    overages: {
      allowed: true,
      pricePerRequest: 0.001, // $0.001 per extra request
      hardCap: 500000,        // Can't exceed 5x limit even with overages
    }
  },
  enterprise: {
    tier: 'enterprise',
    limits: {
      global: { requests: 10000000, window: 'month' }, // Base, negotiable
      endpoints: {}, // All endpoint limits negotiated
      features: {
        maxFileSize: 1024 * 1024 * 1024, // 1 GB
        maxConcurrent: 1000,
        burstMultiplier: 5.0, // High burst for batch operations
      }
    },
    overages: {
      allowed: true,
      pricePerRequest: 0.0001, // Volume discount
      // No hard cap for enterprise
    }
  }
};

/**
 * Get effective limits for a user
 */
function getEffectiveLimits(user: User): TierConfig {
  const baseConfig = tierConfigs[user.tier];

  // Apply any negotiated overrides
  if (user.customLimits) {
    return mergeConfigs(baseConfig, user.customLimits);
  }

  return baseConfig;
}
```

Soft Limits (Overages Allowed):
Hard Limits (Strict Enforcement):
Hybrid Approach (Common):
```
0-100%:   Included in plan
100-200%: Overage charges apply
200%+:    Hard rejection (must upgrade plan)
```
This allows some flexibility while preventing runaway usage.
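Those three bands reduce to a small decision function. A sketch assuming a 2x overage ceiling (the multiplier is illustrative):

```typescript
// Hybrid enforcement: within plan → allow; up to 2x → allow and bill
// overage; beyond 2x → hard rejection until the plan is upgraded.
type Decision = { allowed: boolean; overage: boolean };

function hybridDecision(usedThisMonth: number, planLimit: number): Decision {
  if (usedThisMonth < planLimit) return { allowed: true, overage: false };
  if (usedThisMonth < planLimit * 2) return { allowed: true, overage: true };
  return { allowed: false, overage: false }; // Hard cap reached
}

console.log(hybridDecision(50_000, 100_000));  // { allowed: true, overage: false }
console.log(hybridDecision(150_000, 100_000)); // { allowed: true, overage: true }
console.log(hybridDecision(250_000, 100_000)); // { allowed: false, overage: false }
```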
Let's examine rate limiting policies from major API providers to understand how these concepts work in practice.
Stripe limits at multiple dimensions simultaneously:
By API Key:
By Resource:
By IP (Fraudulent Activity):
Key Innovation: Stripe supports idempotency keys. Retries carrying the same key do not consume quota, which is critical for payment reliability.
Authenticated Requests:
Unauthenticated Requests:
GraphQL API:
Search API:
Key Innovation: GitHub's point-based GraphQL limiting accurately reflects query cost. Fetching 1 field costs 1 point; fetching 100 nested objects costs 100+ points.
Multi-Dimensional Limits:
Tier Progression:
```
Free:   3 RPM,      40K TPM
Tier 1: 3,500 RPM,  90K TPM   ($5 min spend)
Tier 2: 5,000 RPM,  450K TPM  ($50 min spend)
Tier 3: 5,000 RPM,  1M TPM    ($100 min spend)
Tier 4: 10,000 RPM, 2M TPM    ($250 min spend)
Tier 5: 10,000 RPM, 10M TPM   ($1000 min spend)
```
Model-Specific Limits:
Key Innovation: Token-based limiting accurately reflects LLM costs. A request generating 1,000 tokens costs 20× more than 50 tokens, reflected in TPM limits.
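In the spirit of RPM + TPM limits, a request must fit under both the request count and the token budget for the window. A minimal sketch with illustrative numbers:

```typescript
// Dual-dimension check: request count (RPM) and token budget (TPM)
// must both have room, or the request is rejected without consuming either.
interface WindowUsage { requests: number; tokens: number; }

function allowLLMRequest(
  usage: WindowUsage,
  estimatedTokens: number,
  rpm: number,
  tpm: number
): boolean {
  if (usage.requests + 1 > rpm) return false;             // Request-count limit
  if (usage.tokens + estimatedTokens > tpm) return false; // Token limit
  usage.requests += 1;
  usage.tokens += estimatedTokens;
  return true;
}

const usage: WindowUsage = { requests: 0, tokens: 0 };
console.log(allowLLMRequest(usage, 30_000, 3, 40_000)); // true
console.log(allowLLMRequest(usage, 30_000, 3, 40_000)); // false: TPM hit before RPM
```

Because output length is unknown in advance, real token limiters debit an estimate up front and reconcile with the actual token count after the response completes.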
| Industry | Primary Limit Dimension | Secondary Dimensions | Key Consideration |
|---|---|---|---|
| Payments (Stripe) | Per API Key + Per Endpoint | Per-card, Per-IP | Idempotency for retries |
| AI/ML (OpenAI) | Tokens Not Requests | Per-model, Per-tier | Cost varies by output length |
| Developer Tools (GitHub) | Per-user + Per-IP | Per-endpoint (search) | GraphQL query complexity |
| Communication (Twilio) | Per-account + Per-number | Per-message type | Carrier regulations |
| Cloud (AWS) | Per-service + Per-account | Per-region, Per-resource | Service-specific limits |
| Social (Twitter/X) | Per-user + Per-app | Per-endpoint, Per-feature | Read vs. write distinction |
Production rate limiting requires dynamic policy management—the ability to update limits without deployment, create custom overrides, and respond to incidents rapidly.
```typescript
/**
 * Rate limit policy management system
 */
interface PolicyStore {
  // Get effective policy for a request
  getPolicy(context: PolicyContext): Promise<EffectivePolicy>;

  // Policy CRUD operations
  createPolicy(policy: Policy): Promise<Policy>;
  updatePolicy(id: string, policy: Partial<Policy>): Promise<Policy>;
  deletePolicy(id: string): Promise<void>;

  // Override management
  createOverride(override: PolicyOverride): Promise<PolicyOverride>;
  listOverrides(filters: OverrideFilters): Promise<PolicyOverride[]>;

  // Resolution helpers used by PolicyResolver below
  getPoliciesForContext(context: PolicyContext): Promise<Policy[]>;
  getActiveOverrides(context: PolicyContext): Promise<PolicyOverride[]>;
}

interface Policy {
  id: string;
  name: string;
  description: string;

  // Selector: who does this policy apply to?
  selector: PolicySelector;

  // Limits: what are the limits?
  limits: LimitDefinition[];

  // Metadata
  priority: number; // Higher = takes precedence
  enabled: boolean;
  createdAt: Date;
  updatedAt: Date;
  version: number; // For optimistic concurrency
}

interface PolicySelector {
  type: 'all' | 'tier' | 'organization' | 'user' | 'ip_range';
  value?: string;                 // e.g., tier name, org ID, user ID, CIDR
  conditions?: PolicyCondition[]; // Additional conditions
}

interface LimitDefinition {
  resource: string; // 'global', 'endpoint:/api/export', etc.
  limit: number;
  window: string;   // '1m', '1h', '1d'
  burst?: number;
  action: 'reject' | 'throttle' | 'warn';
}

/**
 * Policy resolution with inheritance and priority
 */
class PolicyResolver {
  constructor(private readonly store: PolicyStore) {}

  async resolve(context: PolicyContext): Promise<EffectivePolicy> {
    // Get all applicable policies
    const policies = await this.store.getPoliciesForContext(context);

    // Sort by priority (descending), then specificity
    policies.sort((a, b) => {
      if (a.priority !== b.priority) return b.priority - a.priority;
      return this.getSpecificity(b) - this.getSpecificity(a);
    });

    // Merge limits: policies are sorted highest-priority first, so the
    // first policy to claim a resource wins
    const effectiveLimits = new Map<string, LimitDefinition>();
    for (const policy of policies) {
      for (const limit of policy.limits) {
        if (!effectiveLimits.has(limit.resource)) {
          effectiveLimits.set(limit.resource, limit);
        }
      }
    }

    // Active overrides trump merged policies
    const overrides = await this.store.getActiveOverrides(context);
    for (const override of overrides) {
      effectiveLimits.set(override.resource, override.limit);
    }

    return {
      limits: Array.from(effectiveLimits.values()),
      appliedPolicies: policies.map(p => p.id),
      appliedOverrides: overrides.map(o => o.id)
    };
  }

  private getSpecificity(policy: Policy): number {
    // More specific selectors have higher specificity
    switch (policy.selector.type) {
      case 'user': return 4;
      case 'organization': return 3;
      case 'tier': return 2;
      case 'ip_range': return 1;
      case 'all': return 0;
    }
  }
}
```

Rate limiting policy design is as important as the underlying algorithms. Let's consolidate our understanding:
What's Next:
We've designed policies—but how do we communicate them to clients? In the final page, we'll explore Client Communication—the headers, error responses, and retry strategies that create a great developer experience even when rate limiting is engaged.
You now understand how to design sophisticated rate limiting policies across multiple dimensions—identity, resource, and business tier. You can create hierarchical policies, configure tier-based limits, and implement dynamic policy management. Next, we'll master client communication.