Design an API Gateway — the single entry point for all client requests in a microservices architecture. The gateway handles cross-cutting concerns (authentication, rate limiting, routing, circuit breaking, observability) so that individual backend services don't have to.
| Metric | Value |
|---|---|
| Total API requests | 1 billion / day (~12,000 per second) |
| Peak traffic | 50,000 requests per second |
| Backend services | 50–200 microservices |
| Gateway instances | 10–50 (auto-scaled) |
| Auth token validations | 12,000 / sec (cached: ~90% hit rate) |
| Rate-limit checks | 12,000 / sec (Redis: < 1ms per check) |
| Gateway latency overhead | < 5ms average (excluding backend latency) |
| Concurrent connections | 100,000+ per gateway instance (non-blocking I/O) |
Route incoming API requests to the correct backend microservice based on URL path, HTTP method, and headers
Authenticate and authorise every request: verify JWT/OAuth2 tokens, API keys, or mTLS certificates before forwarding to backends
Rate limit requests per client (API key / user ID / IP) with configurable limits per endpoint and subscription tier
Aggregate responses from multiple backend services into a single API response (API composition / BFF pattern)
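Composition is only cheap if the backend calls run concurrently. A minimal asyncio sketch, with `fetch_user`/`fetch_orders` as stubs standing in for real HTTP calls:

```python
# Sketch of API composition: fan out to backends concurrently, merge the results.
import asyncio

async def fetch_user(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada"}     # stand-in for GET /users/{id}

async def fetch_orders(user_id: str) -> list:
    return [{"order": 1}, {"order": 2}]       # stand-in for GET /orders?user={id}

async def user_profile(user_id: str) -> dict:
    # Concurrent fan-out: total latency ~ slowest backend, not the sum of all of them.
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return {**user, "orders": orders}
```

This is why non-blocking I/O matters for the 100,000+ concurrent connections in the estimates: each in-flight composition holds a coroutine, not a thread.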
Transform requests and responses: header injection, payload reshaping, protocol translation (REST ↔ gRPC, HTTP ↔ WebSocket)
Implement circuit breaker pattern: detect unhealthy backends, stop forwarding traffic, and return fallback responses
Cache responses for idempotent GET requests at the gateway layer to reduce backend load
Provide centralised logging, distributed tracing (trace ID propagation), and metrics collection for all API traffic
Support canary/blue-green deployments by routing a percentage of traffic to new service versions
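Percentage-based canary routing needs a split that is stable per client, so a user doesn't flip between versions on every request. A hashing sketch (version labels are illustrative):

```python
# Canary-routing sketch: hash the client ID into [0, 100) and send that stable
# slice of users to the new version. hashlib keeps the split deterministic across
# gateway instances (unlike Python's per-process-randomized built-in hash()).
import hashlib

def pick_version(client_id: str, canary_percent: int) -> str:
    bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 100
    return "v2-canary" if bucket < canary_percent else "v1-stable"
```

Ramping the rollout is then just raising `canary_percent` in config; the same clients stay in the canary cohort as it grows.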
Load balance across multiple instances of each backend service using round-robin, least-connections, or weighted strategies
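Two of the strategies named above, sketched in a few lines each; the instance addresses are made up:

```python
# Load-balancing sketch: round-robin and least-connections over backend instances.
import itertools

INSTANCES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# Round-robin: cycle through instances in a fixed order.
_rr = itertools.cycle(INSTANCES)

def round_robin() -> str:
    return next(_rr)

# Least-connections: pick the instance with the fewest in-flight requests.
active = {addr: 0 for addr in INSTANCES}

def least_connections() -> str:
    addr = min(active, key=active.get)
    active[addr] += 1          # caller must decrement when the request completes
    return addr
```

Weighted variants extend either one: repeat an instance in the round-robin cycle, or divide its in-flight count by its weight before taking the minimum.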
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'P99 gateway overhead under 5ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3–5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?