Rate Limiting At Gateway - Learning Module

Per-User vs Per-API Limits

One Size Does Not Fit All

A free-tier user testing your API and an enterprise customer powering production workloads have vastly different needs. A search endpoint hitting Elasticsearch and a simple health check endpoint have vastly different costs. A single global rate limit ignores these realities.

Effective rate limiting is multi-dimensional. It considers who is making the request, what resource they're accessing, and what tier of service they're entitled to. This page explores the strategies for implementing sophisticated, fair, and business-aligned rate limiting.

What You Will Learn

By the end of this page, you will understand per-user and per-API limiting strategies, hierarchical limit structures, tier-based differentiation, and how to balance fairness, protection, and business requirements in your rate limiting design.

Dimensions of Rate Limiting

Rate limits can be applied along multiple dimensions. Understanding these dimensions helps you design limits that are both protective and fair.

Rate Limiting Dimensions
Dimension	Identifier	Purpose	Example
Global	None (aggregate)	Protect overall system capacity	10,000 req/sec across all users
Per-IP	Client IP address	Block distributed attacks, anonymous abuse	100 req/min per IP
Per-User	User ID, Session	Fair usage per authenticated user	1,000 req/hour per user
Per-API Key	API Key	Enforce contract limits for developers	10,000 req/day per key
Per-Tenant	Organization ID	Multi-tenant isolation	50,000 req/hour per org
Per-Endpoint	URL path/method	Protect expensive operations	10 req/min on /export
Per-Resource	Resource ID	Prevent resource-specific abuse	100 req/min per document

Layer Your Limits

Production systems typically apply multiple dimensions simultaneously. A request might be checked against global limits, per-IP limits, per-user limits, AND per-endpoint limits. Each dimension catches different abuse patterns.

Per-User Rate Limiting

Per-user rate limiting is the foundation of fair API access. It ensures that each authenticated user gets their allocated share of resources, regardless of how other users behave.

Benefits of Per-User Limits

•Fairness — Heavy users can't monopolize shared resources
•Predictability — Users can rely on their allocated capacity
•Accountability — Usage is tracked to identifiable accounts
•Monetization — Different tiers can have different limits
•Abuse Detection — Unusual patterns for a specific user are easier to spot

per-user-limiter.ts
interface UserLimitConfig {
  tier: 'free' | 'pro' | 'enterprise';
  customLimits?: Partial<TierLimits>;
}
 
interface TierLimits {
  requestsPerMinute: number;
  requestsPerHour: number;
  requestsPerDay: number;
  burstSize: number;
}
 
const TIER_LIMITS: Record<UserLimitConfig['tier'], TierLimits> = {
  free: {
    requestsPerMinute: 20,
    requestsPerHour: 500,
    requestsPerDay: 5000,
    burstSize: 10,
  },
  pro: {
    requestsPerMinute: 100,
    requestsPerHour: 5000,
    requestsPerDay: 50000,
    burstSize: 50,
  },
  enterprise: {
    requestsPerMinute: 1000,
    requestsPerHour: 50000,
    requestsPerDay: 500000,
    burstSize: 200,
  },
};
 
class PerUserRateLimiter {
  private limiters: Map<string, MultiWindowLimiter>;
  private userConfigs: Map<string, UserLimitConfig>;
 
  async checkLimit(userId: string): Promise<LimitResult> {
    const config = await this.getUserConfig(userId);
    const limits = this.getEffectiveLimits(config);
    
    // Check all time windows
    const results = await Promise.all([
      this.checkWindow(userId, 'minute', limits.requestsPerMinute),
      this.checkWindow(userId, 'hour', limits.requestsPerHour),
      this.checkWindow(userId, 'day', limits.requestsPerDay),
    ]);
    
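    // If several windows are blocked, report the one with the longest wait:
    // the request cannot succeed until that window resets
    const blocked = results
      .filter(r => !r.allowed)
      .sort((a, b) => (b.retryAfter ?? 0) - (a.retryAfter ?? 0))[0];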
    if (blocked) {
      return {
        allowed: false,
        retryAfter: blocked.retryAfter,
        limitType: blocked.window,
        remaining: 0,
      };
    }
    
    // Return the most restrictive remaining count
    const minRemaining = Math.min(...results.map(r => r.remaining));
    return {
      allowed: true,
      remaining: minRemaining,
      limits: {
        minute: results[0].remaining,
        hour: results[1].remaining,
        day: results[2].remaining,
      },
    };
  }
 
  private getEffectiveLimits(config: UserLimitConfig): TierLimits {
    const baseLimits = TIER_LIMITS[config.tier];
    return { ...baseLimits, ...config.customLimits };
  }
}
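
The listing above leaves `checkWindow` undefined. A minimal sketch of one possible implementation is a fixed-window counter kept in memory. The class name `FixedWindowCounter`, the result shape, and the explicit `nowMs` parameter are assumptions made for illustration, not part of the original limiter; a production gateway would usually keep these counters in a shared store and expire old keys.

```typescript
// Hypothetical fixed-window counter; all names here are illustrative.
type WindowName = 'minute' | 'hour' | 'day';

interface WindowResult {
  allowed: boolean;
  remaining: number;
  retryAfter: number; // seconds until the current window resets
  window: WindowName;
}

const WINDOW_SECONDS: Record<WindowName, number> = {
  minute: 60,
  hour: 3600,
  day: 86400,
};

class FixedWindowCounter {
  // Maps "user:window:windowStart" to the number of requests counted so far.
  // Old keys are never evicted in this sketch.
  private counts = new Map<string, number>();

  check(userId: string, window: WindowName, limit: number, nowMs: number): WindowResult {
    const sizeMs = WINDOW_SECONDS[window] * 1000;
    // Align the window to a fixed boundary (for example, the top of the minute)
    const windowStart = Math.floor(nowMs / sizeMs) * sizeMs;
    const key = `${userId}:${window}:${windowStart}`;
    const used = this.counts.get(key) ?? 0;
    const retryAfter = Math.ceil((windowStart + sizeMs - nowMs) / 1000);

    if (used >= limit) {
      return { allowed: false, remaining: 0, retryAfter, window };
    }
    this.counts.set(key, used + 1);
    return { allowed: true, remaining: limit - used - 1, retryAfter, window };
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic and easy to test. Fixed windows allow up to twice the limit across a window boundary, which is one reason to pair them with a burst control.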

The Identity Challenge

Per-user limiting requires reliable user identification. For authenticated APIs, this is straightforward. For partially authenticated or anonymous access, you may need to fall back to IP-based or device fingerprint-based limiting, which is less precise.
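
The fallback described above can be sketched as a function that picks the most precise identifier available. The field names (`userId`, `apiKey`, `ip`) and the `precise` flag are hypothetical; how a real gateway extracts them depends on its authentication layer.

```typescript
// Hypothetical request identity; the fields are illustrative assumptions.
interface RequestIdentity {
  userId?: string;
  apiKey?: string;
  ip: string;
}

interface LimitSubject {
  key: string;                    // key used for the rate limit counter
  kind: 'user' | 'apiKey' | 'ip';
  precise: boolean;               // false when several clients may share the key
}

function resolveLimitSubject(req: RequestIdentity): LimitSubject {
  if (req.userId) {
    return { key: `user:${req.userId}`, kind: 'user', precise: true };
  }
  if (req.apiKey) {
    return { key: `key:${req.apiKey}`, kind: 'apiKey', precise: true };
  }
  // Anonymous traffic: fall back to the client IP. Many users can sit behind
  // one address (NAT, corporate proxies), so this identifier is less precise.
  return { key: `ip:${req.ip}`, kind: 'ip', precise: false };
}
```

The `precise` flag lets callers choose a different limit for shared identifiers, since one IP address may represent many legitimate users.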

Per-API and Per-Endpoint Limiting

Not all API endpoints are equal. Some operations are cheap (reading cached data), while others are expensive (complex queries, external API calls, data exports). Per-endpoint limits protect expensive operations while allowing high throughput on inexpensive ones.

Endpoint Cost Categories
Category	Examples	Suggested Limit Approach
No-Cost	Health checks, ping, status	No limit or very high (monitoring traffic)
Low-Cost	Read single resource, user profile	High limits (100s/minute)
Medium-Cost	List with pagination, search	Moderate limits (10s/minute)
High-Cost	Complex aggregations, reports	Low limits (10s/hour)
Expensive	Data export, batch operations	Very low limits (few/day)
External	Calls to third-party APIs	Match third-party limits
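
One way to turn these cost categories into enforcement is a token bucket in which expensive endpoints consume more tokens per request. The sketch below is an illustration under assumptions: the class name, the capacity and refill numbers, and the `health` and `read` costs are hypothetical, while the search and export costs (5 and 50) mirror the `tokenCost` values used in `endpoint-limiter.ts`.

```typescript
// Hypothetical per-category token costs.
const ENDPOINT_COSTS = { health: 0, read: 1, search: 5, export: 50 };

class CostAwareTokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and deducts `cost` tokens if the bucket can afford the request.
  tryConsume(cost: number, nowMs: number): boolean {
    if (nowMs > this.lastRefillMs) {
      const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
      this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
      this.lastRefillMs = nowMs;
    }
    if (cost > this.tokens) {
      return false;
    }
    this.tokens -= cost;
    return true;
  }

  available(): number {
    return this.tokens;
  }
}
```

With a capacity of 100 tokens, a user could run two exports back to back or twenty searches, but not both, which ties each user's budget to backend cost rather than to raw request count.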

endpoint-limiter.ts
interface EndpointConfig {
  pattern: string;                    // URL pattern (glob or regex)
  methods: string[];                  // HTTP methods
  limits: {
    requestsPerMinute?: number;
    requestsPerHour?: number;
    tokenCost?: number;               // For token bucket
  };
  bypassForTiers?: string[];          // Tiers that bypass this limit
}
 
const ENDPOINT_CONFIGS: EndpointConfig[] = [
  {
    pattern: '/health',
    methods: ['GET'],
    limits: { requestsPerMinute: Infinity },  // No limit
  },
  {
    pattern: '/api/v1/users/:id',
    methods: ['GET'],
    limits: { requestsPerMinute: 100 },
  },
  {
    pattern: '/api/v1/search',
    methods: ['GET', 'POST'],
    limits: { requestsPerMinute: 30, tokenCost: 5 },
  },
  {
    pattern: '/api/v1/export',
    methods: ['POST'],
    limits: { requestsPerHour: 5, tokenCost: 50 },
    bypassForTiers: ['enterprise'],
  },
  {
    pattern: '/api/v1/*/batch',
    methods: ['POST'],
    limits: { requestsPerMinute: 10 },
  },
];
 
class EndpointRateLimiter {
  private configs: EndpointConfig[];
  private limiters: Map<string, RateLimiter>;
 
  getEndpointConfig(method: string, path: string): EndpointConfig | null {
    return this.configs.find(config => 
      config.methods.includes(method) && this.matchPattern(config.pattern, path)
    ) ?? null;
  }
 
  async checkEndpointLimit(
    userId: string,
    method: string,
    path: string,
    userTier: string
  ): Promise<LimitResult> {
    const config = this.getEndpointConfig(method, path);
    
    if (!config) {
      return { allowed: true, remaining: Infinity };
    }
    
    // Check if user tier bypasses this endpoint limit
    if (config.bypassForTiers?.includes(userTier)) {
      return { allowed: true, remaining: Infinity };
    }
    
    // Create composite key for per-user-per-endpoint limiting
    const limitKey = `${userId}:${method}:${this.normalizePattern(config.pattern)}`;
    
    return this.checkLimit(limitKey, config.limits);
  }
}
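
The limiter above relies on `matchPattern` and `normalizePattern`, which are not shown. A minimal sketch, assuming patterns are matched one path segment at a time and that both `:name` and `*` stand for exactly one segment (an assumption; the original only says "glob or regex"):

```typescript
// Hypothetical segment-based matcher for patterns such as
// '/api/v1/users/:id' and '/api/v1/*/batch'.
function matchPattern(pattern: string, path: string): boolean {
  const pathOnly = path.split('?')[0] ?? ''; // ignore any query string
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = pathOnly.split('/').filter(Boolean);

  if (patternParts.length !== pathParts.length) {
    return false;
  }
  return patternParts.every(
    (part, i) => part === '*' || part.startsWith(':') || part === pathParts[i],
  );
}

// Collapse parameter names so '/users/:id' and '/users/:userId' share a key.
function normalizePattern(pattern: string): string {
  return pattern
    .split('/')
    .map(part => (part.startsWith(':') ? ':param' : part))
    .join('/');
}
```

Because `getEndpointConfig` returns the first matching entry, the order of `ENDPOINT_CONFIGS` matters: more specific patterns should come before wildcard ones.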

Hierarchical Rate Limiting

Real-world systems often need hierarchical limits where a parent entity (organization) has an overall limit, and child entities (users) have individual limits within that. This prevents a single user from consuming an organization's entire allocation.

hierarchical-limiter.ts
interface HierarchicalLimitCheck {
  level: string;
  key: string;
  limit: number;
  window: string;
}
 
class HierarchicalRateLimiter {
  /**
   * Check limits at multiple hierarchy levels.
   * All levels must pass for the request to be allowed.
   */
  async checkHierarchy(request: Request): Promise<LimitResult> {
    // buildHierarchy looks up org and user limits asynchronously, so await it
    const checks = await this.buildHierarchy(request);
    
    const results = await Promise.all(
      checks.map(check => this.checkSingleLimit(check))
    );
    
    // Find first blocked level
    for (let i = 0; i < results.length; i++) {
      if (!results[i].allowed) {
        return {
          allowed: false,
          blockedAt: checks[i].level,
          retryAfter: results[i].retryAfter,
          message: `Rate limit exceeded at ${checks[i].level} level`,
        };
      }
    }
    
    // Return the most restrictive remaining count
    const minRemaining = Math.min(...results.map(r => r.remaining));
    return { allowed: true, remaining: minRemaining };
  }
 
  // Declared async because it awaits getOrgLimit and getUserLimit below
  private async buildHierarchy(request: Request): Promise<HierarchicalLimitCheck[]> {
    const userId = request.userId;
    const orgId = request.organizationId;
    const endpoint = request.endpoint;
    
    const checks: HierarchicalLimitCheck[] = [];
    
    // Level 1: Global limit
    checks.push({
      level: 'global',
      key: 'global',
      limit: 100000,
      window: 'minute',
    });
    
    // Level 2: Organization limit (if applicable)
    if (orgId) {
      const orgLimit = await this.getOrgLimit(orgId);
      checks.push({
        level: 'organization',
        key: `org:${orgId}`,
        limit: orgLimit,
        window: 'minute',
      });
    }
    
    // Level 3: User limit
    const userLimit = await this.getUserLimit(userId);
    checks.push({
      level: 'user',
      key: `user:${userId}`,
      limit: userLimit,
      window: 'minute',
    });
    
    // Level 4: Endpoint limit for this user
    const endpointLimit = this.getEndpointLimit(endpoint);
    if (endpointLimit) {
      checks.push({
        level: 'endpoint',
        key: `user:${userId}:endpoint:${endpoint}`,
        limit: endpointLimit,
        window: 'minute',
      });
    }
    
    return checks;
  }
}
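
One subtlety with the listing above: if `checkSingleLimit` both checks and increments a counter, a request rejected at the user level still consumes global and organization quota. A possible remedy is to check every level first and only count the request once all levels pass. The sketch below illustrates that idea with in-memory counters; the class and field names are hypothetical, and window resets are omitted for brevity.

```typescript
// Hypothetical two-phase (check, then commit) hierarchy limiter.
interface LevelCheck {
  level: string;
  key: string;
  limit: number;
}

interface HierarchyResult {
  allowed: boolean;
  blockedAt?: string;
  remaining: number;
}

class TwoPhaseHierarchyLimiter {
  private counts = new Map<string, number>();

  check(checks: LevelCheck[]): HierarchyResult {
    // Phase 1: read-only pass. Nothing is counted if any level is exhausted.
    for (const c of checks) {
      if ((this.counts.get(c.key) ?? 0) >= c.limit) {
        return { allowed: false, blockedAt: c.level, remaining: 0 };
      }
    }
    // Phase 2: every level passed, so count the request at each level.
    let remaining = Infinity;
    for (const c of checks) {
      const used = (this.counts.get(c.key) ?? 0) + 1;
      this.counts.set(c.key, used);
      remaining = Math.min(remaining, c.limit - used);
    }
    return { allowed: true, remaining };
  }

  used(key: string): number {
    return this.counts.get(key) ?? 0;
  }
}
```

This works here because both phases run synchronously in one process. With a shared counter store, the two phases would need to execute as a single atomic operation to avoid races between concurrent requests.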

Hierarchical Limit Trade-offs

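Each hierarchy level adds another counter lookup per request. Running the checks in parallel keeps latency close to that of the slowest check, but every level still adds load on the counter store. For most APIs, 2-3 levels (global + user + endpoint) are sufficient. Deep hierarchies (5+ levels) may add noticeable overhead.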

Tier-Based Rate Limiting

Most APIs differentiate limits based on subscription tier. This enables monetization while maintaining fair access for all users.

Example Tier Structure
Tier	Req/Min	Req/Hour	Req/Day	Cost Weight	Priority
Anonymous	10	100	500	1.5x	Low
Free	30	500	5,000	1x	Normal
Starter	100	3,000	30,000	1x	Normal
Pro	500	20,000	200,000	0.8x	High
Enterprise	Custom	Custom	Custom	0.5x	Highest

tier-config.ts
interface TierConfig {
  id: string;
  name: string;
  limits: {
    requestsPerMinute: number;
    requestsPerHour: number;
    requestsPerDay: number;
    concurrentRequests: number;
  };
  features: {
    burstMultiplier: number;     // Allow 2x burst for premium
    priorityQueue: boolean;       // Premium users get priority
    customEndpoints: boolean;     // Access to premium endpoints
    reducedCostWeight: number;    // 0.8 = each request counts as 0.8, so 25% more requests fit in the same quota
  };
  pricing: {
    monthlyPrice: number;
    overageRate: number;          // Cost per 1000 requests over limit
    allowOverage: boolean;
  };
}
 
const TIERS: TierConfig[] = [
  {
    id: 'free',
    name: 'Free',
    limits: {
      requestsPerMinute: 30,
      requestsPerHour: 500,
      requestsPerDay: 5000,
      concurrentRequests: 2,
    },
    features: {
      burstMultiplier: 1.0,
      priorityQueue: false,
      customEndpoints: false,
      reducedCostWeight: 1.0,
    },
    pricing: {
      monthlyPrice: 0,
      overageRate: 0,
      allowOverage: false,  // Hard limit
    },
  },
  {
    id: 'pro',
    name: 'Pro',
    limits: {
      requestsPerMinute: 500,
      requestsPerHour: 20000,
      requestsPerDay: 200000,
      concurrentRequests: 10,
    },
    features: {
      burstMultiplier: 2.0,
      priorityQueue: true,
      customEndpoints: true,
      reducedCostWeight: 0.8,
    },
    pricing: {
      monthlyPrice: 99,
      overageRate: 0.50,    // $0.50 per 1000 over
      allowOverage: true,
    },
  },
];
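
To make the `reducedCostWeight` and `overageRate` fields concrete, here is a small sketch of the arithmetic they imply. The function names are hypothetical, and it assumes overage is billed against the daily limit, which the configuration above does not specify.

```typescript
// Hypothetical helpers illustrating the tier arithmetic.
interface OveragePricing {
  overageRate: number;   // price per 1000 requests over the limit
  allowOverage: boolean;
}

// With a cost weight below 1, each request consumes less than one unit of
// quota, so more requests fit in the same nominal limit.
function effectiveRequests(limit: number, costWeight: number): number {
  return limit / costWeight;
}

// Charge for usage beyond the limit; tiers with a hard limit never incur overage.
function overageCharge(used: number, limit: number, pricing: OveragePricing): number {
  if (!pricing.allowOverage || used <= limit) {
    return 0;
  }
  return ((used - limit) / 1000) * pricing.overageRate;
}
```

For the Pro tier above, a weight of 0.8 stretches 500 requests per minute to roughly 625, and 203,000 requests against a 200,000 daily limit would cost 3 × $0.50 = $1.50 in overage.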

Graceful Upgrades

When users hit limits, provide a clear upgrade path. Include upgrade URLs in 429 responses. This turns rate limiting from a frustration into a conversion opportunity.

Best Practices

Designing multi-dimensional rate limits requires careful thought. Here are battle-tested practices from production systems:

Rate Limiting Design Principles

•Start with user limits, add dimensions as needed — Over-engineered limits add complexity without benefit. Start simple.
•Make limits visible and predictable — Include X-RateLimit-* headers in every response. Surprises frustrate developers.
•Provide granular feedback — Tell users which limit they hit (user hourly? endpoint? global?) so they can adapt.
•Allow bursts within reason — Legitimate usage is often bursty. Token buckets or generous burst sizes prevent frustration.
•Monitor and adjust — Track how often users hit limits. If 10% of legitimate users hit limits regularly, the limits are too restrictive.
•Implement graceful degradation — Consider returning cached/stale data instead of 429 for some read endpoints.
•Plan for limit increases — Make limits configurable without code changes. Customers will ask for increases.
•Test at limits — Load test at and beyond your limits to verify system behavior under constraint.
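
The visibility principle above can be sketched as a small header builder. `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` are a widely used convention rather than a ratified standard, and the exact semantics of the reset value vary between APIs; this sketch assumes it is a Unix timestamp in seconds. `Retry-After` is a standard HTTP header. The `LimitState` shape and function name are hypothetical.

```typescript
// Hypothetical snapshot of the limit that applies to the current request.
interface LimitState {
  limit: number;
  remaining: number;
  resetEpochSeconds: number; // assumed: Unix time when the window resets
}

function rateLimitHeaders(state: LimitState, nowEpochSeconds: number): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(state.limit),
    'X-RateLimit-Remaining': String(Math.max(0, state.remaining)),
    'X-RateLimit-Reset': String(state.resetEpochSeconds),
  };
  // Only tell the client to wait once the quota is actually exhausted
  if (state.remaining <= 0) {
    headers['Retry-After'] = String(Math.max(0, state.resetEpochSeconds - nowEpochSeconds));
  }
  return headers;
}
```

Attaching these headers to successful responses as well as to 429s lets clients slow down before they are blocked.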

rate-limit-response.ts
// Informative 429 response with all dimensions
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "You have exceeded your hourly request limit",
    "details": {
      "limit_type": "user_hourly",
      "limit": 1000,
      "used": 1000,
      "resets_at": "2024-01-15T14:00:00Z",
      "retry_after_seconds": 847
    },
    "other_limits": {
      "user_daily": { "limit": 10000, "remaining": 4523 },
      "endpoint_minute": { "limit": 30, "remaining": 28 }
    },
    "upgrade_url": "https://api.example.com/pricing",
    "documentation_url": "https://docs.example.com/rate-limits"
  }
}
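
A response like the one above can be assembled from per-limit snapshots. The following is a minimal sketch, assuming a hypothetical `LimitSnapshot` shape; it reports the exceeded limit with the latest reset time as the blocking one and lists the rest under `other_limits`.

```typescript
// Hypothetical per-limit snapshot used to build the error body.
interface LimitSnapshot {
  type: string;       // e.g. 'user_hourly'
  limit: number;
  used: number;
  resetsAtMs: number; // epoch milliseconds
}

function buildRateLimitError(snapshots: LimitSnapshot[], nowMs: number) {
  const exceeded = snapshots.filter(s => s.used >= s.limit);
  if (exceeded.length === 0) {
    return null; // nothing is blocking this request
  }
  // Report the limit that resets last: retrying earlier cannot succeed
  const blocking = exceeded.reduce((a, b) => (b.resetsAtMs > a.resetsAtMs ? b : a));

  const otherLimits: Record<string, { limit: number; remaining: number }> = {};
  for (const s of snapshots) {
    if (s !== blocking) {
      otherLimits[s.type] = { limit: s.limit, remaining: Math.max(0, s.limit - s.used) };
    }
  }

  return {
    error: {
      code: 'RATE_LIMIT_EXCEEDED',
      details: {
        limit_type: blocking.type,
        limit: blocking.limit,
        used: blocking.used,
        resets_at: new Date(blocking.resetsAtMs).toISOString(),
        retry_after_seconds: Math.ceil((blocking.resetsAtMs - nowMs) / 1000),
      },
      other_limits: otherLimits,
    },
  };
}
```

Deriving `retry_after_seconds` from the same timestamp as `resets_at` keeps the two fields consistent, so clients can rely on either one.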

Summary: Per-User vs Per-API Limits

Key Takeaways

•Rate limiting is multi-dimensional — Global, IP, user, tenant, endpoint, and resource dimensions each catch different abuse patterns.
•Per-user limits ensure fairness — Each user gets their allocation regardless of others' behavior.
•Per-endpoint limits protect resources — Expensive operations need stricter limits than cheap ones.
•Hierarchical limits prevent monopolization — Org limits prevent single users from consuming team allocation.
•Tiers enable monetization — Different limits for different subscription levels.
•Transparency builds trust — Clear headers, informative errors, and documented limits.

What's Next:

So far, we've assumed rate limiting on a single server. But modern systems span multiple regions and instances. The next page tackles Distributed Rate Limiting—ensuring consistent limits across a global infrastructure.

You now understand how to design multi-dimensional rate limiting that balances protection, fairness, and business requirements. Next, we'll tackle the challenge of distributed consistency.