Design a distributed rate-limiting service that protects APIs from abuse by throttling the number of requests a client can make within a configurable time window. The rate limiter should work across multiple API servers sharing state via a centralised store (Redis).
| Metric | Value |
|---|---|
| Total API requests | 10 billion / day (~115,000 per second) |
| Unique clients (API keys) | 10 million |
| Rate-limit rules | ~1,000 (endpoint × tier combinations) |
| Redis memory per client | ~100 bytes (counter + timestamp) |
| Total Redis memory | 10M clients × 100B ≈ 1 GB |
| Redis latency (p99) | < 1ms (same-AZ) |
| Rate-limit check overhead | < 2ms added to request latency |
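The figures in the table can be sanity-checked with quick arithmetic. A minimal sketch (constants mirror the table; the 86,400 is seconds per day):

```python
# Back-of-envelope check of the capacity estimates above.
requests_per_day = 10_000_000_000
clients = 10_000_000
bytes_per_client = 100  # counter + timestamp

qps = requests_per_day / 86_400            # seconds per day
redis_memory_gb = clients * bytes_per_client / 1e9

print(f"~{qps:,.0f} requests/second")       # ≈ 115,741, matching ~115,000/s
print(f"~{redis_memory_gb:.1f} GB Redis")   # ≈ 1.0 GB, matching the table
```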
- Limit the number of requests a client (identified by user ID, API key, or IP address) can make within a configurable time window.
- Support multiple rate-limiting algorithms: Token Bucket, Leaky Bucket, Fixed Window Counter, Sliding Window Log, and Sliding Window Counter.
- Return appropriate HTTP headers on every response: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`.
- Return `429 Too Many Requests` with a `Retry-After` header when the limit is exceeded.
- Support configurable rules per API endpoint (e.g., `/api/login`: 5 req/min; `/api/search`: 100 req/min).
- Support tiered limits based on subscription plan (Free: 100 req/hr, Pro: 10,000 req/hr, Enterprise: unlimited).
- Work in a distributed environment: multiple API servers share the same rate-limiting state.
- Allow burst traffic within limits (Token Bucket: bucket size = burst capacity).
- Provide a dashboard/API for monitoring rate-limit metrics: throttled requests per client, top offenders, and limit utilisation.
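Several of the requirements above (Token Bucket with burst capacity, rate-limit headers, `Retry-After` on throttle) can be illustrated together. This is a minimal in-process sketch, not the distributed design: a production version would hold the bucket state in the shared Redis store (typically via an atomic Lua script) rather than in local memory, and the header names follow the requirement list above:

```python
import time

class TokenBucket:
    """Minimal in-process token bucket; bucket size = burst capacity.
    Illustrative only: the distributed design keeps this state in Redis
    so all API servers see the same counters."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec        # steady refill rate
        self.burst = burst              # bucket size = max burst
        self.tokens = float(burst)      # start full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens accrued since the last call, capped at burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        allowed = self.tokens >= 1
        if allowed:
            self.tokens -= 1
        headers = {
            "X-RateLimit-Limit": str(self.burst),
            "X-RateLimit-Remaining": str(int(self.tokens)),
        }
        if not allowed:
            # Seconds until at least one token is available again.
            headers["Retry-After"] = str(max(1, int((1 - self.tokens) / self.rate)))
        return allowed, headers

# e.g. the /api/login rule: 5 req/min, burst of 5
bucket = TokenBucket(rate_per_sec=5 / 60, burst=5)
results = [bucket.allow()[0] for _ in range(6)]
print(results)  # [True, True, True, True, True, False]
```

Back-to-back calls drain the burst allowance; the sixth request is throttled and would be answered with `429` plus the `Retry-After` header from the returned dict.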
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system: 'rate-limit check adds < 2ms at p99' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?