An API Gateway sits at the most critical juncture in your architecture—the boundary between the external world and your internal services. Every request from every client passes through this single point. This position of absolute visibility and control comes with significant responsibility.
Understanding what an API Gateway should do (and equally important, what it should not do) is essential for designing systems that are secure, performant, and maintainable. In this page, we'll examine the core responsibilities of an API Gateway in the depth necessary to make informed architectural decisions.
By the end of this page, you will understand the complete set of responsibilities an API Gateway handles: request routing, security (authentication and authorization), rate limiting and throttling, request/response transformation, load balancing, observability, and fault tolerance. You'll know when to implement each capability at the gateway versus delegating to backend services.
The most fundamental responsibility of an API Gateway is routing—determining which backend service should handle each incoming request. While this sounds simple, production routing involves sophisticated logic that goes far beyond basic URL matching.
When a request arrives at the gateway, the routing engine evaluates multiple dimensions to determine the destination:
| Factor | Example | Use Case |
|---|---|---|
| URL Path | /api/v2/products/{id} | Route to Product Service v2 |
| HTTP Method | POST vs GET on same path | Different handlers for read vs write |
| Query Parameters | ?version=beta | Route to canary deployment |
| HTTP Headers | Accept: application/json vs xml | Content negotiation routing |
| Host/Domain | api.example.com vs partner.example.com | Multi-tenant routing |
| Client Identity | Premium tier vs free tier | Route to dedicated service instances |
| Geographic Origin | Request from EU vs US | Data residency compliance |
| Request Body | GraphQL operation name | Route GraphQL to appropriate resolver |
| Time of Day | Business hours vs off-peak | Route to cost-optimized backends |
| Traffic Weight | 90% stable, 10% canary | Progressive deployment rollouts |
```typescript
// Production-grade routing configuration
interface Route {
  id: string;
  priority: number; // Higher priority routes evaluated first
  predicates: RoutePredicate[]; // All must match (AND logic)
  destination: Destination;
  filters?: RequestFilter[]; // Transformations before forwarding
  fallback?: Destination; // If primary destination fails
  metadata?: RouteMetadata; // For observability and debugging
}

interface RoutePredicate {
  type: 'path' | 'method' | 'header' | 'query' | 'host' | 'weight' | 'time' | 'custom';
  pattern?: string; // Regex or glob pattern
  value?: string | string[]; // Exact match values
  weight?: number; // For traffic splitting (0-100)
  timeWindow?: TimeWindow; // For time-based routing
  customEvaluator?: string; // Reference to custom logic (use sparingly!)
}

// Example: Complex multi-factor routing
const routes: Route[] = [
  // Route 1: Premium API users to dedicated, high-performance cluster
  {
    id: 'premium-users-products',
    priority: 100, // Evaluated before general routes
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
      { type: 'header', value: 'x-api-tier: premium' },
    ],
    destination: {
      service: 'product-service-premium',
      loadBalancing: 'least-connections', // Optimal for long-running requests
    },
    metadata: {
      sla: '99.99%',
      latencyTarget: 50, // ms
    },
  },

  // Route 2: Canary deployment - 5% of traffic to new version
  {
    id: 'products-canary',
    priority: 90,
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
      { type: 'weight', weight: 5 }, // 5% of matching requests
    ],
    destination: {
      service: 'product-service-v2-canary',
      version: '2.1.0-beta',
    },
    filters: [
      { type: 'addHeader', key: 'X-Canary', value: 'true' },
    ],
    metadata: {
      experiment: 'product-v2-rollout',
      owner: 'platform-team',
    },
  },

  // Route 3: EU data residency - route EU users to EU region
  {
    id: 'products-eu',
    priority: 85,
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
      { type: 'header', value: 'CF-IPCountry: DE,FR,IT,ES,NL,BE,AT,PL' }, // EU countries
    ],
    destination: {
      service: 'product-service-eu',
      region: 'eu-west-1',
    },
    metadata: {
      compliance: 'GDPR',
    },
  },

  // Route 4: Default stable route - catches everything else
  {
    id: 'products-default',
    priority: 50,
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
    ],
    destination: {
      service: 'product-service',
      version: '2.0.0-stable',
    },
  },
];
```

Path matching is the most common routing predicate. Gateways typically support multiple matching strategies:
Exact Match: /api/users matches only /api/users, not /api/users/123
Prefix Match: /api/users matches /api/users, /api/users/123, /api/users/123/orders
Glob Patterns: /api/*/products matches /api/v1/products, /api/v2/products
Regex Match: /api/users/[0-9]+ matches /api/users/123 but not /api/users/abc
Path Variables: /api/users/{userId}/orders/{orderId} extracts path parameters for backend use
```typescript
class PathMatcher {
  // Compile patterns once for performance
  private readonly matchers: Map<string, CompiledMatcher> = new Map();

  match(pattern: string, path: string): PathMatchResult | null {
    const compiled = this.getOrCompile(pattern);
    const result = compiled.regex.exec(path);
    if (!result) return null;

    // Extract path variables if pattern contains them
    const params: Record<string, string> = {};
    compiled.paramNames.forEach((name, index) => {
      params[name] = result[index + 1];
    });

    return {
      matched: true,
      pattern,
      path,
      params,
      // For weighted routing decisions
      specificity: this.calculateSpecificity(pattern),
    };
  }

  private getOrCompile(pattern: string): CompiledMatcher {
    let compiled = this.matchers.get(pattern);
    if (!compiled) {
      compiled = this.compile(pattern);
      this.matchers.set(pattern, compiled);
    }
    return compiled;
  }

  private compile(pattern: string): CompiledMatcher {
    // Extract parameter names: /users/{userId} → ['userId']
    const paramNames: string[] = [];

    // Convert pattern to regex
    // /api/users/{userId}/orders → /api/users/([^/]+)/orders
    let regexPattern = pattern
      .replace(/\{([^}]+)\}/g, (_, paramName) => {
        paramNames.push(paramName);
        return '([^/]+)';
      })
      // Replace ** via a placeholder first, so the single-* rule below
      // doesn't rewrite the '*' inside the generated '.*'
      .replace(/\*\*/g, '\u0000')
      .replace(/\*/g, '[^/]*') // * matches anything except slashes
      .replace(/\u0000/g, '.*'); // ** matches anything including slashes

    return {
      regex: new RegExp(`^${regexPattern}$`),
      paramNames,
    };
  }

  private calculateSpecificity(pattern: string): number {
    // More specific patterns have higher scores
    // /users/123/orders is more specific than /users/**
    let score = 0;
    const segments = pattern.split('/').filter(Boolean);
    for (const segment of segments) {
      if (segment === '**') score += 1; // Least specific
      else if (segment === '*') score += 5; // Somewhat specific
      else if (segment.includes('{')) score += 10; // Path variable
      else score += 20; // Exact segment match
    }
    return score;
  }
}

interface CompiledMatcher {
  regex: RegExp;
  paramNames: string[];
}

interface PathMatchResult {
  matched: boolean;
  pattern: string;
  path: string;
  params: Record<string, string>;
  specificity: number;
}
```

Routes are typically evaluated in priority order, and the first match wins. A misconfigured priority can cause traffic to route incorrectly. Always test routing configurations thoroughly, and implement route validation that warns about overlapping patterns with unclear precedence.
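Such a validation pass can be sketched as a config-time check. The sketch below flags route pairs that share the same priority and whose literal path prefixes overlap; the `RouteDecl` shape and the prefix heuristic are illustrative assumptions, not part of any specific gateway product.

```typescript
// Config-time sanity check: warn about route pairs with ambiguous precedence.
interface RouteDecl {
  id: string;
  priority: number;
  pathPattern: string;
}

function findAmbiguousRoutes(routes: RouteDecl[]): [string, string][] {
  const warnings: [string, string][] = [];
  for (let i = 0; i < routes.length; i++) {
    for (let j = i + 1; j < routes.length; j++) {
      const a = routes[i];
      const b = routes[j];
      if (a.priority !== b.priority) continue; // distinct priority = clear precedence
      // Crude overlap test: compare the literal prefix before any wildcard.
      // A production validator would compare the compiled patterns instead.
      const prefixA = a.pathPattern.split('*')[0];
      const prefixB = b.pathPattern.split('*')[0];
      if (prefixA.startsWith(prefixB) || prefixB.startsWith(prefixA)) {
        warnings.push([a.id, b.id]);
      }
    }
  }
  return warnings;
}
```

Running this at deploy time (and failing the deploy on warnings) catches the "two routes match, which one wins?" class of misconfiguration before it reaches production traffic.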
The API Gateway serves as the primary security checkpoint for your entire system. Every request from the external world must be authenticated and authorized before reaching backend services. This centralization of security is both powerful and critical—a misconfiguration here exposes your entire infrastructure.
Authentication verifies identity. The gateway supports multiple authentication mechanisms depending on client types:
| Mechanism | How It Works | Best For | Security Considerations |
|---|---|---|---|
| API Keys | Static key in header or query param | Simple integrations, internal services | Revocation requires key rotation; no user context; easy to leak |
| JWT (Bearer Token) | Self-contained signed token with claims | Mobile/web apps, microservices | Stateless validation; must handle expiration and refresh |
| OAuth 2.0 | Delegated authorization via access tokens | Third-party integrations, user consent flows | Complex flows; requires token introspection or JWT validation |
| mTLS (Mutual TLS) | Client certificate authentication | Service-to-service, high-security B2B | Certificate management overhead; strongest machine identity |
| HMAC Signatures | Request signing with shared secret | Webhooks, partner APIs | Replay protection needed; clock sync issues |
| Basic Auth | Username:password in header (base64) | Internal tools, legacy systems | Only with HTTPS; credentials in every request |
```typescript
interface AuthResult {
  authenticated: boolean;
  identity?: Identity;
  method?: string;
  error?: AuthError;
}

interface Identity {
  type: 'user' | 'service' | 'apiKey';
  id: string;
  tenantId?: string;
  roles: string[];
  permissions: string[];
  metadata: Record<string, unknown>;
  // When the identity was established (for freshness checks)
  authenticatedAt: Date;
  // When this identity expires
  expiresAt?: Date;
}

class AuthenticationHandler {
  private readonly strategies: AuthStrategy[];

  constructor(
    private readonly jwtValidator: JWTValidator,
    private readonly apiKeyStore: ApiKeyStore,
    private readonly mtlsValidator: MTLSValidator,
  ) {
    // Strategies evaluated in order; first success wins
    this.strategies = [
      new MTLSStrategy(this.mtlsValidator),
      new JWTStrategy(this.jwtValidator),
      new ApiKeyStrategy(this.apiKeyStore),
    ];
  }

  async authenticate(request: Request): Promise<AuthResult> {
    // Track which strategies were attempted (for debugging)
    const attempts: { strategy: string; result: 'skipped' | 'failed' | 'success' }[] = [];

    for (const strategy of this.strategies) {
      // Check if this strategy applies to this request
      if (!strategy.shouldAttempt(request)) {
        attempts.push({ strategy: strategy.name, result: 'skipped' });
        continue;
      }

      try {
        const identity = await strategy.authenticate(request);
        if (identity) {
          attempts.push({ strategy: strategy.name, result: 'success' });
          // Record successful authentication for observability
          this.recordSuccess(strategy.name, identity);
          return {
            authenticated: true,
            identity,
            method: strategy.name,
          };
        }
        attempts.push({ strategy: strategy.name, result: 'failed' });
      } catch (error) {
        // Log but continue to next strategy
        console.error(`Auth strategy ${strategy.name} threw:`, error);
        attempts.push({ strategy: strategy.name, result: 'failed' });
      }
    }

    // No strategy succeeded
    this.recordFailure(attempts);
    return {
      authenticated: false,
      error: this.determineError(attempts),
    };
  }

  private determineError(attempts: { strategy: string; result: string }[]): AuthError {
    // If all strategies were skipped, no credentials were provided
    if (attempts.every(a => a.result === 'skipped')) {
      return {
        code: 'MISSING_CREDENTIALS',
        message: 'No authentication credentials provided',
        httpStatus: 401,
      };
    }
    // If some strategies were attempted but failed
    return {
      code: 'INVALID_CREDENTIALS',
      message: 'Authentication failed',
      httpStatus: 401,
    };
  }
}

// JWT Strategy Implementation
class JWTStrategy implements AuthStrategy {
  name = 'jwt';

  constructor(private readonly validator: JWTValidator) {}

  shouldAttempt(request: Request): boolean {
    const authHeader = request.headers.get('Authorization');
    return authHeader?.startsWith('Bearer ') ?? false;
  }

  async authenticate(request: Request): Promise<Identity | null> {
    const token = request.headers.get('Authorization')!.replace('Bearer ', '');

    try {
      const payload = await this.validator.verify(token);

      // Check token hasn't expired (belt and suspenders)
      if (payload.exp && payload.exp < Date.now() / 1000) {
        return null;
      }

      return {
        type: 'user',
        id: payload.sub,
        tenantId: payload.tenant_id,
        roles: payload.roles || [],
        permissions: payload.permissions || [],
        metadata: {
          email: payload.email,
          name: payload.name,
        },
        authenticatedAt: new Date(),
        expiresAt: payload.exp ? new Date(payload.exp * 1000) : undefined,
      };
    } catch (error) {
      // Invalid signature, malformed token, etc.
      return null;
    }
  }
}
```

Authentication establishes identity; authorization determines permissions. The gateway can enforce authorization at multiple levels:
```typescript
interface AuthorizationPolicy {
  route: string; // Route pattern this policy applies to
  requiredRoles?: string[]; // Any of these roles grants access (OR)
  requiredPermissions?: string[]; // All of these required (AND)
  conditions?: AuthCondition[]; // Additional runtime checks
  denyPolicy?: DenyPolicy; // Explicit deny rules (evaluated first)
}

interface AuthCondition {
  type: 'rateLimit' | 'timeWindow' | 'ipRange' | 'resourceOwnership' | 'custom';
  params: Record<string, unknown>;
}

class AuthorizationEnforcer {
  async authorize(
    identity: Identity,
    request: Request,
    route: RouteMatch,
  ): Promise<AuthorizationResult> {
    const policies = this.getPoliciesForRoute(route.id);

    for (const policy of policies) {
      // Deny policies are evaluated first and are absolute
      if (policy.denyPolicy) {
        const denied = await this.evaluateDenyPolicy(policy.denyPolicy, identity, request);
        if (denied) {
          return {
            authorized: false,
            reason: 'EXPLICIT_DENY',
            policy: policy.id,
          };
        }
      }

      // Check role requirements (any role matches = pass)
      if (policy.requiredRoles?.length) {
        const hasRole = policy.requiredRoles.some(role =>
          identity.roles.includes(role)
        );
        if (!hasRole) {
          return {
            authorized: false,
            reason: 'MISSING_ROLE',
            required: policy.requiredRoles,
            actual: identity.roles,
          };
        }
      }

      // Check permission requirements (all must match)
      if (policy.requiredPermissions?.length) {
        const hasAllPermissions = policy.requiredPermissions.every(perm =>
          identity.permissions.includes(perm) ||
          this.hasWildcardPermission(identity, perm)
        );
        if (!hasAllPermissions) {
          return {
            authorized: false,
            reason: 'MISSING_PERMISSION',
            required: policy.requiredPermissions,
            actual: identity.permissions,
          };
        }
      }

      // Evaluate runtime conditions
      if (policy.conditions?.length) {
        for (const condition of policy.conditions) {
          const passed = await this.evaluateCondition(condition, identity, request);
          if (!passed) {
            return {
              authorized: false,
              reason: 'CONDITION_FAILED',
              condition: condition.type,
            };
          }
        }
      }
    }

    return { authorized: true };
  }

  private hasWildcardPermission(identity: Identity, requiredPerm: string): boolean {
    // Check if user has wildcard permission
    // e.g., 'products:*' should match 'products:read'
    const [resource] = requiredPerm.split(':');
    return identity.permissions.some(perm => {
      if (perm === '*') return true; // Super admin
      if (perm === `${resource}:*`) return true; // Resource admin
      return false;
    });
  }
}
```

The gateway should handle coarse-grained authorization (can this user access this API at all?). Fine-grained authorization (can this user access this specific resource?) often belongs in the service with full context. The gateway passes verified identity downstream; services make final access decisions.
Rate limiting protects your infrastructure from abuse—whether malicious attacks or unintentional client bugs that hammer your APIs. The gateway is the natural enforcement point: it sees all traffic before it reaches backend services.
Rate limiting isn't one-size-fits-all. Different scenarios require different approaches:
| Strategy | Mechanism | Best For | Trade-offs |
|---|---|---|---|
| Fixed Window | Count requests in fixed time intervals (e.g., per minute) | Simple quota enforcement | Burst at window boundaries; not smooth |
| Sliding Window | Rolling time window for smooth limiting | Fair distribution over time | More complex; requires timestamp tracking |
| Token Bucket | Tokens refill at steady rate; burst up to bucket size | Allowing controlled bursts | Most flexible; widely used |
| Leaky Bucket | Requests processed at fixed rate; excess queued or dropped | Smooth outgoing rate | Can introduce latency; queue management |
| Concurrent Limit | Limit simultaneous in-flight requests | Protecting slow backends | Complements rate limits; different dimension |
```typescript
interface RateLimitConfig {
  // Bucket capacity (max burst size)
  bucketSize: number;
  // Tokens added per second (sustained rate)
  refillRate: number;
  // Key type for limiting (global, per-user, per-IP, per-API-key)
  keyType: 'global' | 'userId' | 'ip' | 'apiKey' | 'custom';
  // For custom key type, function to extract key
  keyExtractor?: (req: Request, identity: Identity) => string;
  // Response when rate limited
  rejectionResponse: {
    status: number;
    body: unknown;
    headers: Record<string, string>;
  };
}

class TokenBucketRateLimiter {
  constructor(
    private readonly redis: Redis,
    private readonly config: RateLimitConfig,
  ) {}

  async checkLimit(request: Request, identity: Identity | null): Promise<RateLimitResult> {
    const key = this.extractKey(request, identity);
    const now = Date.now();

    // Lua script for atomic token bucket operation
    // This is critical - non-atomic implementations have race conditions
    const result = await this.redis.eval(
      this.TOKEN_BUCKET_SCRIPT,
      1, // Number of keys
      `ratelimit:${key}`, // Key
      now, // Current timestamp
      this.config.bucketSize, // Bucket capacity
      this.config.refillRate, // Refill rate (tokens/second)
      1, // Tokens requested
    ) as [number, number, number]; // [allowed (0/1), remaining, resetTime]

    const [allowed, remaining, resetTime] = result;

    return {
      allowed: allowed === 1,
      remaining,
      resetTime: new Date(resetTime),
      limit: this.config.bucketSize,
      headers: {
        'X-RateLimit-Limit': String(this.config.bucketSize),
        'X-RateLimit-Remaining': String(Math.max(0, remaining)),
        'X-RateLimit-Reset': String(Math.ceil(resetTime / 1000)),
      },
    };
  }

  private extractKey(request: Request, identity: Identity | null): string {
    switch (this.config.keyType) {
      case 'global':
        return 'global';
      case 'userId':
        return identity?.id ?? 'anonymous';
      case 'ip':
        return this.extractClientIP(request);
      case 'apiKey':
        return request.headers.get('X-API-Key') ?? 'no-key';
      case 'custom':
        return this.config.keyExtractor!(request, identity!);
    }
  }

  private extractClientIP(request: Request): string {
    // Check common proxy headers in order of trust
    const xForwardedFor = request.headers.get('X-Forwarded-For');
    if (xForwardedFor) {
      // Take the first (client) IP, not proxies
      return xForwardedFor.split(',')[0].trim();
    }
    const xRealIP = request.headers.get('X-Real-IP');
    if (xRealIP) return xRealIP;
    // Fallback to connection IP (if available)
    return request.headers.get('CF-Connecting-IP') ?? 'unknown';
  }

  // Atomic Lua script for token bucket
  // Ensures no race conditions across distributed gateway instances
  private readonly TOKEN_BUCKET_SCRIPT = `
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local refillRate = tonumber(ARGV[3])
    local requested = tonumber(ARGV[4])

    -- Get current bucket state
    local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
    local tokens = tonumber(bucket[1]) or capacity
    local lastRefill = tonumber(bucket[2]) or now

    -- Calculate token refill since last request
    local elapsed = (now - lastRefill) / 1000 -- Convert to seconds
    local refill = math.min(capacity, tokens + (elapsed * refillRate))

    -- Try to consume tokens
    local allowed = 0
    local remaining = refill
    if refill >= requested then
      remaining = refill - requested
      allowed = 1
    end

    -- Update bucket state
    redis.call('HMSET', key, 'tokens', remaining, 'lastRefill', now)
    redis.call('EXPIRE', key, 3600) -- Cleanup unused keys after 1 hour

    -- Calculate reset time (when bucket will be full again)
    local resetTime = now + ((capacity - remaining) / refillRate * 1000)

    return {allowed, remaining, resetTime}
  `;
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetTime: Date;
  limit: number;
  headers: Record<string, string>;
}
```

Production systems often need rate limits across multiple dimensions simultaneously:
A request is only allowed if it passes all applicable rate limits.
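Combining dimensions can be sketched as follows. `FixedWindowLimiter` here is a deliberately naive in-memory stand-in (per-instance state, no window expiry) for the shared token bucket shown earlier, and all names are illustrative:

```typescript
// Every applicable limiter must allow the request.
interface Limiter {
  name: string;
  check(key: string): { allowed: boolean };
}

// Naive fixed-window counter, for illustration only.
class FixedWindowLimiter implements Limiter {
  private counts = new Map<string, number>();
  constructor(public name: string, private limit: number) {}

  check(key: string): { allowed: boolean } {
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return { allowed: n <= this.limit };
  }
}

// Evaluate all dimensions; the first violated one short-circuits and is
// reported, so the client learns which limit it hit. Note that earlier
// limiters have already consumed a unit; stricter implementations refund it.
function checkAllLimits(
  checks: { limiter: Limiter; key: string }[],
): { allowed: boolean; violated?: string } {
  for (const { limiter, key } of checks) {
    if (!limiter.check(key).allowed) {
      return { allowed: false, violated: limiter.name };
    }
  }
  return { allowed: true };
}
```

A typical stack pairs a tight per-user limit with looser per-IP and global limits, each keyed differently but evaluated through the same loop.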
With multiple gateway instances, rate limit state must be shared (Redis, memcached). This introduces latency and potential inconsistency. Some systems accept eventual consistency in rate limiting—allowing brief overages—rather than adding latency to every request. Choose based on your requirements: strict compliance vs. performance.
The gateway often needs to transform requests before forwarding them to backend services, and transform responses before returning them to clients. This capability bridges differences between external API contracts and internal service implementations.
For example, the external path `/api/v1/users` might be rewritten to the internal `/users-service/api/users` before forwarding.
```typescript
interface TransformationPipeline {
  requestTransforms: RequestTransform[];
  responseTransforms: ResponseTransform[];
}

interface RequestTransform {
  type: 'addHeader' | 'removeHeader' | 'rewritePath' | 'addQuery' | 'modifyBody';
  config: unknown;
}

class TransformationEngine {
  async transformRequest(
    request: Request,
    transforms: RequestTransform[],
    context: RequestContext,
  ): Promise<Request> {
    let transformed = request.clone();
    for (const transform of transforms) {
      transformed = await this.applyRequestTransform(transformed, transform, context);
    }
    return transformed;
  }

  private async applyRequestTransform(
    request: Request,
    transform: RequestTransform,
    context: RequestContext,
  ): Promise<Request> {
    const headers = new Headers(request.headers);

    switch (transform.type) {
      case 'addHeader': {
        const { key, value, valueTemplate } = transform.config as AddHeaderConfig;
        // Support template interpolation: 'User {identity.userId}'
        const resolvedValue = valueTemplate
          ? this.interpolate(valueTemplate, context)
          : value;
        headers.set(key, resolvedValue);
        break;
      }

      case 'removeHeader': {
        const { keys } = transform.config as RemoveHeaderConfig;
        keys.forEach(key => headers.delete(key));
        break;
      }

      case 'rewritePath': {
        const { pattern, replacement } = transform.config as RewritePathConfig;
        const url = new URL(request.url);
        url.pathname = url.pathname.replace(new RegExp(pattern), replacement);
        return new Request(url.toString(), { ...request, headers });
      }

      case 'modifyBody': {
        const { modifications } = transform.config as ModifyBodyConfig;
        const body = await request.json();
        const modified = this.applyBodyModifications(body, modifications);
        return new Request(request.url, {
          ...request,
          headers,
          body: JSON.stringify(modified),
        });
      }
    }

    return new Request(request.url, { ...request, headers });
  }

  async transformResponse(
    response: Response,
    transforms: ResponseTransform[],
    context: RequestContext,
  ): Promise<Response> {
    let transformed = response.clone();
    for (const transform of transforms) {
      transformed = await this.applyResponseTransform(transformed, transform, context);
    }
    return transformed;
  }

  private async applyResponseTransform(
    response: Response,
    transform: ResponseTransform,
    context: RequestContext,
  ): Promise<Response> {
    switch (transform.type) {
      case 'filterFields': {
        // Remove internal fields from response
        const { exclude } = transform.config as FilterFieldsConfig;
        const body = await response.json();
        const filtered = this.filterObject(body, exclude);
        return new Response(JSON.stringify(filtered), {
          status: response.status,
          headers: response.headers,
        });
      }

      case 'normalizeError': {
        // Convert backend error format to standard client format
        if (response.status >= 400) {
          const error = await response.json().catch(() => ({}));
          const normalized = {
            error: {
              code: error.code || error.errorCode || 'UNKNOWN_ERROR',
              message: error.message || error.description || 'An error occurred',
              requestId: context.requestId,
              // Explicitly exclude internal details
            },
          };
          return new Response(JSON.stringify(normalized), {
            status: response.status,
            headers: response.headers,
          });
        }
        return response;
      }

      case 'addResponseHeader': {
        const { key, value } = transform.config as AddHeaderConfig;
        const headers = new Headers(response.headers);
        headers.set(key, value);
        return new Response(response.body, {
          status: response.status,
          headers,
        });
      }
    }

    return response;
  }

  private filterObject(obj: unknown, excludePaths: string[]): unknown {
    if (typeof obj !== 'object' || obj === null) return obj;
    if (Array.isArray(obj)) {
      return obj.map(item => this.filterObject(item, excludePaths));
    }
    const result: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(obj)) {
      if (!excludePaths.includes(key)) {
        result[key] = this.filterObject(value, excludePaths);
      }
    }
    return result;
  }
}
```

Body transformations require parsing and re-serializing payloads, which adds latency. For high-throughput APIs, prefer header-only transformations where possible. If body transformation is necessary, consider streaming approaches that don't buffer the entire payload.
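A streaming pass-through can be sketched with the web-standard `TransformStream` (available in modern runtimes such as Node 18+). This example only counts bytes for metrics without buffering the body; real field-level rewrites would need an incremental JSON parser, which is beyond this sketch, and the function name is illustrative:

```typescript
// Pipe the response body through a counting TransformStream; chunks are
// forwarded unchanged, so nothing is buffered in memory.
function streamWithByteCount(
  response: Response,
  onDone: (bytes: number) => void,
): Response {
  let bytes = 0;
  const counter = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      bytes += chunk.byteLength;
      controller.enqueue(chunk); // forward unchanged
    },
    flush() {
      // Called once the upstream body is fully consumed
      onDone(bytes);
    },
  });
  return new Response(response.body?.pipeThrough(counter), {
    status: response.status,
    headers: response.headers,
  });
}
```

The same shape works for any chunk-wise rewrite (e.g. redacting a known byte pattern), as long as the transformation doesn't need to see the whole payload at once.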
The API Gateway is the ideal vantage point for observability. Every request—successful or failed—flows through it. By instrumenting the gateway thoroughly, you gain complete visibility into system behavior without modifying individual services.
```typescript
interface RequestMetrics {
  requestId: string;
  startTime: number;
  endTime?: number;
  route: string;
  method: string;
  statusCode?: number;
  backendLatency?: number;
  totalLatency?: number;
  bytesIn: number;
  bytesOut?: number;
  userId?: string;
  clientIp: string;
  userAgent: string;
  cacheHit?: boolean;
  rateLimited?: boolean;
  error?: ErrorInfo;
}

class GatewayObservability {
  constructor(
    private readonly metrics: MetricsClient,
    private readonly logger: Logger,
    private readonly tracer: Tracer,
  ) {}

  // Called at the start of every request
  startRequest(request: Request): RequestContext {
    const requestId = request.headers.get('X-Request-Id') ?? crypto.randomUUID();
    const startTime = performance.now();

    // Start distributed trace
    const span = this.tracer.startSpan('gateway.request', {
      attributes: {
        'http.method': request.method,
        'http.url': request.url,
        'http.user_agent': request.headers.get('User-Agent') ?? 'unknown',
        'request.id': requestId,
      },
    });

    return {
      requestId,
      startTime,
      span,
      metrics: {
        requestId,
        startTime,
        method: request.method,
        route: 'unknown', // Set after routing
        clientIp: this.extractClientIP(request),
        userAgent: request.headers.get('User-Agent') ?? 'unknown',
        bytesIn: parseInt(request.headers.get('Content-Length') ?? '0'),
      },
    };
  }

  // Called after routing decision
  recordRoute(context: RequestContext, route: RouteMatch): void {
    context.metrics.route = route.id;
    context.span.setAttribute('route.id', route.id);
    context.span.setAttribute('route.backend', route.destination.service);
  }

  // Called after authentication
  recordAuth(context: RequestContext, identity: Identity | null): void {
    if (identity) {
      context.metrics.userId = identity.id;
      context.span.setAttribute('user.id', identity.id);
      // Don't log PII in metrics!
    }
  }

  // Called for backend request
  startBackendSpan(context: RequestContext, backend: string): Span {
    return this.tracer.startSpan('gateway.backend_request', {
      parent: context.span,
      attributes: {
        'backend.service': backend,
      },
    });
  }

  // Called at the end of every request
  finishRequest(
    context: RequestContext,
    response: Response,
    backendLatency?: number,
  ): void {
    const endTime = performance.now();
    const totalLatency = endTime - context.startTime;

    // Complete metrics
    context.metrics.endTime = endTime;
    context.metrics.statusCode = response.status;
    context.metrics.totalLatency = totalLatency;
    context.metrics.backendLatency = backendLatency;
    context.metrics.bytesOut = parseInt(response.headers.get('Content-Length') ?? '0');

    // Emit Prometheus-style metrics
    this.metrics.histogram('gateway_request_duration_seconds', totalLatency / 1000, {
      route: context.metrics.route,
      method: context.metrics.method,
      status: String(response.status),
    });

    this.metrics.counter('gateway_requests_total', 1, {
      route: context.metrics.route,
      method: context.metrics.method,
      status: String(response.status),
    });

    if (backendLatency) {
      this.metrics.histogram('gateway_backend_duration_seconds', backendLatency / 1000, {
        route: context.metrics.route,
        backend: context.span.getAttribute('route.backend') as string,
      });
    }

    // Structured access log
    this.logger.info('request', {
      request_id: context.requestId,
      method: context.metrics.method,
      route: context.metrics.route,
      status: response.status,
      latency_ms: totalLatency.toFixed(2),
      backend_latency_ms: backendLatency?.toFixed(2),
      bytes_in: context.metrics.bytesIn,
      bytes_out: context.metrics.bytesOut,
      user_id: context.metrics.userId,
      client_ip: context.metrics.clientIp,
      // Omit user_agent from default logs (high cardinality)
    });

    // Finish trace span
    context.span.setStatus({ code: response.status >= 400 ? 2 : 0 });
    context.span.end();
  }

  // Called on errors
  recordError(context: RequestContext, error: Error): void {
    context.metrics.error = {
      type: error.name,
      message: error.message,
    };
    context.span.recordException(error);

    this.logger.error('request_error', {
      request_id: context.requestId,
      error_type: error.name,
      error_message: error.message,
      stack: error.stack,
    });

    this.metrics.counter('gateway_errors_total', 1, {
      route: context.metrics.route,
      error_type: error.name,
    });
  }
}
```

Focus on the four golden signals: Latency (how long requests take), Traffic (request rate), Errors (failure rate), and Saturation (how "full" the gateway is). With these, you can answer most operational questions about system health.
While dedicated load balancers often sit in front of the gateway, the gateway itself performs load balancing across backend service instances and implements fault tolerance mechanisms to handle backend failures gracefully.
| Algorithm | How It Works | Best For | Limitations |
|---|---|---|---|
| Round Robin | Rotate through instances sequentially | Identical instances, uniform request cost | Ignores instance load; simple but naive |
| Weighted Round Robin | Rotate with weights per instance | Heterogeneous instance capacities | Static weights; doesn't adapt |
| Least Connections | Send to instance with fewest active requests | Variable request durations | Requires tracking; slight overhead |
| Least Response Time | Send to instance with fastest recent responses | Performance-sensitive workloads | Requires tracking; can amplify issues |
| Random | Random selection (with optional weights) | Simple, surprisingly effective | No load awareness |
| Consistent Hashing | Hash request attribute to determine instance | Cache locality, sticky sessions (careful!) | Uneven distribution possible; requires rebalancing |
| P2C (Power of Two Choices) | Pick 2 random instances, choose less loaded | Modern general-purpose default; combines simplicity with adaptivity | Requires per-instance load tracking |
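To make the comparison concrete, here is a minimal sketch of P2C selection. The `Instance` shape and the injectable `rng` parameter are illustrative assumptions (the `rng` hook exists so selection can be made deterministic in tests), not the API of any particular gateway:

```typescript
interface Instance {
  id: string;
  activeRequests: number; // current in-flight request count
}

// Power of Two Choices: sample two distinct instances uniformly at random,
// then route to whichever currently has fewer active requests.
function pickP2C(
  instances: Instance[],
  rng: () => number = Math.random,
): Instance {
  if (instances.length === 1) return instances[0];
  const i = Math.floor(rng() * instances.length);
  let j = Math.floor(rng() * (instances.length - 1));
  if (j >= i) j++; // shift so the second pick is distinct from the first
  const a = instances[i];
  const b = instances[j];
  return a.activeRequests <= b.activeRequests ? a : b;
}
```

Comparing just two samples avoids scanning the whole pool on every request, yet provably avoids the worst-case pileups that pure random selection allows.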
When a backend service becomes unhealthy, continuing to send requests just makes things worse—requests queue up, timeouts cascade, and the failure spreads. The circuit breaker pattern stops this cascade:
```typescript
enum CircuitState {
  CLOSED,    // Normal operation, requests flow through
  OPEN,      // Failing, requests immediately rejected
  HALF_OPEN, // Testing if backend recovered
}

interface CircuitBreakerConfig {
  // Number of failures before opening circuit
  failureThreshold: number;
  // Time to wait in OPEN state before testing
  resetTimeout: number;
  // Number of successful requests in HALF_OPEN to close
  successThreshold: number;
  // What counts as a failure (status codes, exceptions)
  failureCondition: (response: Response | Error) => boolean;
  // Fallback response when circuit is open
  fallback?: () => Response;
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount: number = 0;
  private successCount: number = 0;
  private lastFailureTime: number = 0;
  private stateChanges: { state: CircuitState; timestamp: number }[] = [];

  constructor(
    private readonly name: string,
    private readonly config: CircuitBreakerConfig,
    private readonly metrics: MetricsClient,
  ) {}

  async execute(operation: () => Promise<Response>): Promise<Response> {
    // Check if circuit should remain open
    if (this.state === CircuitState.OPEN) {
      if (Date.now() - this.lastFailureTime < this.config.resetTimeout) {
        // Circuit still open, fail fast
        this.metrics.counter('circuit_breaker_rejected', 1, { name: this.name });
        if (this.config.fallback) {
          return this.config.fallback();
        }
        throw new CircuitOpenError(`Circuit ${this.name} is open`);
      }
      // Timeout elapsed, try half-open
      this.transition(CircuitState.HALF_OPEN);
    }

    try {
      const response = await operation();

      // Check if response counts as failure
      if (this.config.failureCondition(response)) {
        this.recordFailure();
        return response;
      }

      // Success!
      this.recordSuccess();
      return response;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private recordSuccess(): void {
    if (this.state === CircuitState.HALF_OPEN) {
      this.successCount++;
      if (this.successCount >= this.config.successThreshold) {
        // Backend recovered, close circuit
        this.transition(CircuitState.CLOSED);
        this.failureCount = 0;
        this.successCount = 0;
      }
    } else if (this.state === CircuitState.CLOSED) {
      // Decay failure count on success
      this.failureCount = Math.max(0, this.failureCount - 1);
    }
  }

  private recordFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    if (this.state === CircuitState.HALF_OPEN) {
      // Still failing, go back to open
      this.transition(CircuitState.OPEN);
      this.successCount = 0;
    } else if (
      this.state === CircuitState.CLOSED &&
      this.failureCount >= this.config.failureThreshold
    ) {
      // Threshold exceeded, open circuit
      this.transition(CircuitState.OPEN);
    }

    this.metrics.counter('circuit_breaker_failure', 1, { name: this.name });
  }

  private transition(newState: CircuitState): void {
    const oldState = this.state;
    this.state = newState;
    this.stateChanges.push({ state: newState, timestamp: Date.now() });
    this.metrics.gauge('circuit_breaker_state', newState, { name: this.name });
    console.log(`Circuit ${this.name}: ${CircuitState[oldState]} → ${CircuitState[newState]}`);
  }

  getState(): CircuitState {
    return this.state;
  }
}
```

Poorly tuned circuit breakers cause problems: too sensitive and they open on transient errors; too slow and they don't protect. Start conservative (higher thresholds, longer reset times) and tune based on observed behavior. Consider adaptive algorithms that adjust thresholds based on traffic volume.
We've examined the comprehensive set of responsibilities that make an API Gateway the critical infrastructure component it is: routing, security, rate limiting and throttling, request/response transformation, load balancing, observability, and fault tolerance.
What's Next:
With a thorough understanding of what a gateway does, we'll next explore where to place the gateway in your architecture—edge vs. internal placement, multi-tier gateway architectures, and how gateway placement affects security, performance, and operational concerns.
You now have a comprehensive understanding of API Gateway responsibilities. From routing and security to rate limiting, transformation, observability, and fault tolerance, you understand the critical functions that make the gateway the backbone of modern distributed architectures.