Design a distributed rate-limiting service that protects APIs from abuse by throttling the number of requests a client can make within a configurable time window. The rate limiter should work across multiple API servers sharing state via a centralised store (Redis).
| Metric | Value |
|---|---|
| Total API requests | 10 billion / day (~115,000 per second) |
| Unique clients (API keys) | 10 million |
| Rate-limit rules | ~1,000 (endpoint × tier combinations) |
| Redis memory per client | ~100 bytes (counter + timestamp) |
| Total Redis memory | 10M clients × 100B ≈ 1 GB |
| Redis latency (p99) | < 1ms (same-AZ) |
| Rate-limit check overhead | < 2ms added to request latency |
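The figures in the table can be sanity-checked with quick arithmetic. A minimal sketch (constants mirror the table; the 86,400 is seconds per day):

```python
# Back-of-envelope check of the capacity estimates above.
requests_per_day = 10_000_000_000
clients = 10_000_000
bytes_per_client = 100  # counter + timestamp

qps = requests_per_day / 86_400            # seconds per day
redis_memory_gb = clients * bytes_per_client / 1e9

print(f"~{qps:,.0f} requests/second")       # ≈ 115,741, matching ~115,000/s
print(f"~{redis_memory_gb:.1f} GB Redis")   # ≈ 1.0 GB, matching the table
```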
- Limit the number of requests a client (identified by user ID, API key, or IP address) can make within a configurable time window.
- Support multiple rate-limiting algorithms: Token Bucket, Leaky Bucket, Fixed Window Counter, Sliding Window Log, and Sliding Window Counter.
- Return appropriate HTTP headers on every response: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`.
- Return `429 Too Many Requests` with a `Retry-After` header when the limit is exceeded.
- Support configurable rules per API endpoint (e.g., `/api/login`: 5 req/min; `/api/search`: 100 req/min).
- Support tiered limits based on subscription plan (Free: 100 req/hr, Pro: 10,000 req/hr, Enterprise: unlimited).
- Work in a distributed environment: multiple API servers share the same rate-limiting state.
- Allow burst traffic within limits (Token Bucket: bucket size = burst capacity).
- Provide a dashboard/API for monitoring rate-limit metrics: throttled requests per client, top offenders, and limit utilisation.
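Several of the requirements above (Token Bucket with burst capacity, rate-limit headers, `Retry-After` on throttle) can be illustrated together. This is a minimal in-process sketch, not the distributed design: a production version would hold the bucket state in the shared Redis store (typically via an atomic Lua script) rather than in local memory, and the header names follow the requirement list above:

```python
import time

class TokenBucket:
    """Minimal in-process token bucket; bucket size = burst capacity.
    Illustrative only: the distributed design keeps this state in Redis
    so all API servers see the same counters."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec        # steady refill rate
        self.burst = burst              # bucket size = max burst
        self.tokens = float(burst)      # start full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens accrued since the last call, capped at burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        allowed = self.tokens >= 1
        if allowed:
            self.tokens -= 1
        headers = {
            "X-RateLimit-Limit": str(self.burst),
            "X-RateLimit-Remaining": str(int(self.tokens)),
        }
        if not allowed:
            # Seconds until at least one token is available again.
            headers["Retry-After"] = str(max(1, int((1 - self.tokens) / self.rate)))
        return allowed, headers

# e.g. the /api/login rule: 5 req/min, burst of 5
bucket = TokenBucket(rate_per_sec=5 / 60, burst=5)
results = [bucket.allow()[0] for _ in range(6)]
print(results)  # [True, True, True, True, True, False]
```

Back-to-back calls drain the burst allowance; the sixth request is throttled and would be answered with `429` plus the `Retry-After` header from the returned dict.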
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system: 'rate-limit check adds < 2ms at p99' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?