An API Gateway sits at the most critical juncture in your architecture—the boundary between the external world and your internal services. Every request from every client passes through this single point. This position of absolute visibility and control comes with significant responsibility.
Understanding what an API Gateway should do (and equally important, what it should not do) is essential for designing systems that are secure, performant, and maintainable. In this page, we'll examine the core responsibilities of an API Gateway in the depth necessary to make informed architectural decisions.
By the end of this page, you will understand the complete set of responsibilities an API Gateway handles: request routing, security (authentication and authorization), rate limiting and throttling, request/response transformation, load balancing, observability, and fault tolerance. You'll know when to implement each capability at the gateway versus delegating to backend services.
The most fundamental responsibility of an API Gateway is routing—determining which backend service should handle each incoming request. While this sounds simple, production routing involves sophisticated logic that goes far beyond basic URL matching.
When a request arrives at the gateway, the routing engine evaluates multiple dimensions to determine the destination:
| Factor | Example | Use Case |
|---|---|---|
| URL Path | /api/v2/products/{id} | Route to Product Service v2 |
| HTTP Method | POST vs GET on same path | Different handlers for read vs write |
| Query Parameters | ?version=beta | Route to canary deployment |
| HTTP Headers | Accept: application/json vs xml | Content negotiation routing |
| Host/Domain | api.example.com vs partner.example.com | Multi-tenant routing |
| Client Identity | Premium tier vs free tier | Route to dedicated service instances |
| Geographic Origin | Request from EU vs US | Data residency compliance |
| Request Body | GraphQL operation name | Route GraphQL to appropriate resolver |
| Time of Day | Business hours vs off-peak | Route to cost-optimized backends |
| Traffic Weight | 90% stable, 10% canary | Progressive deployment rollouts |
```typescript
// Production-grade routing configuration
interface Route {
  id: string;
  priority: number; // Higher priority routes evaluated first
  predicates: RoutePredicate[]; // All must match (AND logic)
  destination: Destination;
  filters?: RequestFilter[]; // Transformations before forwarding
  fallback?: Destination; // If primary destination fails
  metadata?: RouteMetadata; // For observability and debugging
}

interface RoutePredicate {
  type: 'path' | 'method' | 'header' | 'query' | 'host' | 'weight' | 'time' | 'custom';
  pattern?: string; // Regex or glob pattern
  value?: string | string[]; // Exact match values
  weight?: number; // For traffic splitting (0-100)
  timeWindow?: TimeWindow; // For time-based routing
  customEvaluator?: string; // Reference to custom logic (use sparingly!)
}

// Example: Complex multi-factor routing
const routes: Route[] = [
  // Route 1: Premium API users to dedicated, high-performance cluster
  {
    id: 'premium-users-products',
    priority: 100, // Evaluated before general routes
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
      { type: 'header', value: 'x-api-tier: premium' },
    ],
    destination: {
      service: 'product-service-premium',
      loadBalancing: 'least-connections', // Optimal for long-running requests
    },
    metadata: {
      sla: '99.99%',
      latencyTarget: 50, // ms
    },
  },

  // Route 2: Canary deployment - 5% of traffic to new version
  {
    id: 'products-canary',
    priority: 90,
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
      { type: 'weight', weight: 5 }, // 5% of matching requests
    ],
    destination: {
      service: 'product-service-v2-canary',
      version: '2.1.0-beta',
    },
    filters: [
      { type: 'addHeader', key: 'X-Canary', value: 'true' },
    ],
    metadata: {
      experiment: 'product-v2-rollout',
      owner: 'platform-team',
    },
  },

  // Route 3: EU data residency - route EU users to EU region
  {
    id: 'products-eu',
    priority: 85,
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
      { type: 'header', value: 'CF-IPCountry: DE,FR,IT,ES,NL,BE,AT,PL' }, // EU countries
    ],
    destination: {
      service: 'product-service-eu',
      region: 'eu-west-1',
    },
    metadata: {
      compliance: 'GDPR',
    },
  },

  // Route 4: Default stable route - catches everything else
  {
    id: 'products-default',
    priority: 50,
    predicates: [
      { type: 'path', pattern: '/api/products/**' },
    ],
    destination: {
      service: 'product-service',
      version: '2.0.0-stable',
    },
  },
];
```

Path matching is the most common routing predicate. Gateways typically support multiple matching strategies:
Exact Match: /api/users matches only /api/users, not /api/users/123
Prefix Match: /api/users matches /api/users, /api/users/123, /api/users/123/orders
Glob Patterns: /api/*/products matches /api/v1/products, /api/v2/products
Regex Match: /api/users/[0-9]+ matches /api/users/123 but not /api/users/abc
Path Variables: /api/users/{userId}/orders/{orderId} extracts path parameters for backend use
```typescript
class PathMatcher {
  // Compile patterns once for performance
  private readonly matchers: Map<string, CompiledMatcher> = new Map();

  match(pattern: string, path: string): PathMatchResult | null {
    const compiled = this.getOrCompile(pattern);
    const result = compiled.regex.exec(path);
    if (!result) return null;

    // Extract path variables if pattern contains them
    const params: Record<string, string> = {};
    compiled.paramNames.forEach((name, index) => {
      params[name] = result[index + 1];
    });

    return {
      matched: true,
      pattern,
      path,
      params,
      // For weighted routing decisions
      specificity: this.calculateSpecificity(pattern),
    };
  }

  private getOrCompile(pattern: string): CompiledMatcher {
    let compiled = this.matchers.get(pattern);
    if (!compiled) {
      compiled = this.compile(pattern);
      this.matchers.set(pattern, compiled);
    }
    return compiled;
  }

  private compile(pattern: string): CompiledMatcher {
    // Extract parameter names: /users/{userId} → ['userId']
    const paramNames: string[] = [];

    // Convert pattern to regex
    // /api/users/{userId}/orders → /api/users/([^/]+)/orders
    let regexPattern = pattern
      .replace(/\{([^}]+)\}/g, (_, paramName) => {
        paramNames.push(paramName);
        return '([^/]+)';
      })
      // Replace ** via a placeholder first, so the single-* rule below
      // doesn't rewrite the '*' inside the generated '.*'
      .replace(/\*\*/g, '\u0000')
      .replace(/\*/g, '[^/]*') // * matches anything except slashes
      .replace(/\u0000/g, '.*'); // ** matches anything including slashes

    return {
      regex: new RegExp(`^${regexPattern}$`),
      paramNames,
    };
  }

  private calculateSpecificity(pattern: string): number {
    // More specific patterns have higher scores
    // /users/123/orders is more specific than /users/**
    let score = 0;
    const segments = pattern.split('/').filter(Boolean);
    for (const segment of segments) {
      if (segment === '**') score += 1; // Least specific
      else if (segment === '*') score += 5; // Somewhat specific
      else if (segment.includes('{')) score += 10; // Path variable
      else score += 20; // Exact segment match
    }
    return score;
  }
}

interface CompiledMatcher {
  regex: RegExp;
  paramNames: string[];
}

interface PathMatchResult {
  matched: boolean;
  pattern: string;
  path: string;
  params: Record<string, string>;
  specificity: number;
}
```

Routes are typically evaluated in priority order, and the first match wins. A misconfigured priority can cause traffic to route incorrectly. Always test routing configurations thoroughly, and implement route validation that warns about overlapping patterns with unclear precedence.
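Such a validation pass can be sketched as a config-time check. The sketch below flags route pairs that share the same priority and whose literal path prefixes overlap; the `RouteDecl` shape and the prefix heuristic are illustrative assumptions, not part of any specific gateway product.

```typescript
// Config-time sanity check: warn about route pairs with ambiguous precedence.
interface RouteDecl {
  id: string;
  priority: number;
  pathPattern: string;
}

function findAmbiguousRoutes(routes: RouteDecl[]): [string, string][] {
  const warnings: [string, string][] = [];
  for (let i = 0; i < routes.length; i++) {
    for (let j = i + 1; j < routes.length; j++) {
      const a = routes[i];
      const b = routes[j];
      if (a.priority !== b.priority) continue; // distinct priority = clear precedence
      // Crude overlap test: compare the literal prefix before any wildcard.
      // A production validator would compare the compiled patterns instead.
      const prefixA = a.pathPattern.split('*')[0];
      const prefixB = b.pathPattern.split('*')[0];
      if (prefixA.startsWith(prefixB) || prefixB.startsWith(prefixA)) {
        warnings.push([a.id, b.id]);
      }
    }
  }
  return warnings;
}
```

Running this at deploy time (and failing the deploy on warnings) catches the "two routes match, which one wins?" class of misconfiguration before it reaches production traffic.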
The API Gateway serves as the primary security checkpoint for your entire system. Every request from the external world must be authenticated and authorized before reaching backend services. This centralization of security is both powerful and critical—a misconfiguration here exposes your entire infrastructure.
Authentication verifies identity. The gateway supports multiple authentication mechanisms depending on client types:
| Mechanism | How It Works | Best For | Security Considerations |
|---|---|---|---|
| API Keys | Static key in header or query param | Simple integrations, internal services | Revocation requires key rotation; no user context; easy to leak |
| JWT (Bearer Token) | Self-contained signed token with claims | Mobile/web apps, microservices | Stateless validation; must handle expiration and refresh |
| OAuth 2.0 | Delegated authorization via access tokens | Third-party integrations, user consent flows | Complex flows; requires token introspection or JWT validation |
| mTLS (Mutual TLS) | Client certificate authentication | Service-to-service, high-security B2B | Certificate management overhead; strongest machine identity |
| HMAC Signatures | Request signing with shared secret | Webhooks, partner APIs | Replay protection needed; clock sync issues |
| Basic Auth | Username:password in header (base64) | Internal tools, legacy systems | Only with HTTPS; credentials in every request |
```typescript
interface AuthResult {
  authenticated: boolean;
  identity?: Identity;
  method?: string;
  error?: AuthError;
}

interface Identity {
  type: 'user' | 'service' | 'apiKey';
  id: string;
  tenantId?: string;
  roles: string[];
  permissions: string[];
  metadata: Record<string, unknown>;
  // When the identity was established (for freshness checks)
  authenticatedAt: Date;
  // When this identity expires
  expiresAt?: Date;
}

class AuthenticationHandler {
  private readonly strategies: AuthStrategy[];

  constructor(
    private readonly jwtValidator: JWTValidator,
    private readonly apiKeyStore: ApiKeyStore,
    private readonly mtlsValidator: MTLSValidator,
  ) {
    // Strategies evaluated in order; first success wins
    this.strategies = [
      new MTLSStrategy(this.mtlsValidator),
      new JWTStrategy(this.jwtValidator),
      new ApiKeyStrategy(this.apiKeyStore),
    ];
  }

  async authenticate(request: Request): Promise<AuthResult> {
    // Track which strategies were attempted (for debugging)
    const attempts: { strategy: string; result: 'skipped' | 'failed' | 'success' }[] = [];

    for (const strategy of this.strategies) {
      // Check if this strategy applies to this request
      if (!strategy.shouldAttempt(request)) {
        attempts.push({ strategy: strategy.name, result: 'skipped' });
        continue;
      }

      try {
        const identity = await strategy.authenticate(request);
        if (identity) {
          attempts.push({ strategy: strategy.name, result: 'success' });
          // Record successful authentication for observability
          this.recordSuccess(strategy.name, identity);
          return {
            authenticated: true,
            identity,
            method: strategy.name,
          };
        }
        attempts.push({ strategy: strategy.name, result: 'failed' });
      } catch (error) {
        // Log but continue to next strategy
        console.error(`Auth strategy ${strategy.name} threw:`, error);
        attempts.push({ strategy: strategy.name, result: 'failed' });
      }
    }

    // No strategy succeeded
    this.recordFailure(attempts);
    return {
      authenticated: false,
      error: this.determineError(attempts),
    };
  }

  private determineError(attempts: { strategy: string; result: string }[]): AuthError {
    // If all strategies were skipped, no credentials were provided
    if (attempts.every(a => a.result === 'skipped')) {
      return {
        code: 'MISSING_CREDENTIALS',
        message: 'No authentication credentials provided',
        httpStatus: 401,
      };
    }
    // If some strategies were attempted but failed
    return {
      code: 'INVALID_CREDENTIALS',
      message: 'Authentication failed',
      httpStatus: 401,
    };
  }
}

// JWT Strategy Implementation
class JWTStrategy implements AuthStrategy {
  name = 'jwt';

  constructor(private readonly validator: JWTValidator) {}

  shouldAttempt(request: Request): boolean {
    const authHeader = request.headers.get('Authorization');
    return authHeader?.startsWith('Bearer ') ?? false;
  }

  async authenticate(request: Request): Promise<Identity | null> {
    const token = request.headers.get('Authorization')!.replace('Bearer ', '');

    try {
      const payload = await this.validator.verify(token);

      // Check token hasn't expired (belt and suspenders)
      if (payload.exp && payload.exp < Date.now() / 1000) {
        return null;
      }

      return {
        type: 'user',
        id: payload.sub,
        tenantId: payload.tenant_id,
        roles: payload.roles || [],
        permissions: payload.permissions || [],
        metadata: {
          email: payload.email,
          name: payload.name,
        },
        authenticatedAt: new Date(),
        expiresAt: payload.exp ? new Date(payload.exp * 1000) : undefined,
      };
    } catch (error) {
      // Invalid signature, malformed token, etc.
      return null;
    }
  }
}
```

Authentication establishes identity; authorization determines permissions. The gateway can enforce authorization at multiple levels:
```typescript
interface AuthorizationPolicy {
  route: string; // Route pattern this policy applies to
  requiredRoles?: string[]; // Any of these roles grants access (OR)
  requiredPermissions?: string[]; // All of these required (AND)
  conditions?: AuthCondition[]; // Additional runtime checks
  denyPolicy?: DenyPolicy; // Explicit deny rules (evaluated first)
}

interface AuthCondition {
  type: 'rateLimit' | 'timeWindow' | 'ipRange' | 'resourceOwnership' | 'custom';
  params: Record<string, unknown>;
}

class AuthorizationEnforcer {
  async authorize(
    identity: Identity,
    request: Request,
    route: RouteMatch,
  ): Promise<AuthorizationResult> {
    const policies = this.getPoliciesForRoute(route.id);

    for (const policy of policies) {
      // Deny policies are evaluated first and are absolute
      if (policy.denyPolicy) {
        const denied = await this.evaluateDenyPolicy(policy.denyPolicy, identity, request);
        if (denied) {
          return {
            authorized: false,
            reason: 'EXPLICIT_DENY',
            policy: policy.id,
          };
        }
      }

      // Check role requirements (any role matches = pass)
      if (policy.requiredRoles?.length) {
        const hasRole = policy.requiredRoles.some(role =>
          identity.roles.includes(role)
        );
        if (!hasRole) {
          return {
            authorized: false,
            reason: 'MISSING_ROLE',
            required: policy.requiredRoles,
            actual: identity.roles,
          };
        }
      }

      // Check permission requirements (all must match)
      if (policy.requiredPermissions?.length) {
        const hasAllPermissions = policy.requiredPermissions.every(perm =>
          identity.permissions.includes(perm) ||
          this.hasWildcardPermission(identity, perm)
        );
        if (!hasAllPermissions) {
          return {
            authorized: false,
            reason: 'MISSING_PERMISSION',
            required: policy.requiredPermissions,
            actual: identity.permissions,
          };
        }
      }

      // Evaluate runtime conditions
      if (policy.conditions?.length) {
        for (const condition of policy.conditions) {
          const passed = await this.evaluateCondition(condition, identity, request);
          if (!passed) {
            return {
              authorized: false,
              reason: 'CONDITION_FAILED',
              condition: condition.type,
            };
          }
        }
      }
    }

    return { authorized: true };
  }

  private hasWildcardPermission(identity: Identity, requiredPerm: string): boolean {
    // Check if user has wildcard permission
    // e.g., 'products:*' should match 'products:read'
    const [resource] = requiredPerm.split(':');
    return identity.permissions.some(perm => {
      if (perm === '*') return true; // Super admin
      if (perm === `${resource}:*`) return true; // Resource admin
      return false;
    });
  }
}
```

The gateway should handle coarse-grained authorization (can this user access this API at all?). Fine-grained authorization (can this user access this specific resource?) often belongs in the service with full context. The gateway passes verified identity downstream; services make final access decisions.
Rate limiting protects your infrastructure from abuse—whether malicious attacks or unintentional client bugs that hammer your APIs. The gateway is the natural enforcement point: it sees all traffic before it reaches backend services.
Rate limiting isn't one-size-fits-all. Different scenarios require different approaches:
| Strategy | Mechanism | Best For | Trade-offs |
|---|---|---|---|
| Fixed Window | Count requests in fixed time intervals (e.g., per minute) | Simple quota enforcement | Burst at window boundaries; not smooth |
| Sliding Window | Rolling time window for smooth limiting | Fair distribution over time | More complex; requires timestamp tracking |
| Token Bucket | Tokens refill at steady rate; burst up to bucket size | Allowing controlled bursts | Most flexible; widely used |
| Leaky Bucket | Requests processed at fixed rate; excess queued or dropped | Smooth outgoing rate | Can introduce latency; queue management |
| Concurrent Limit | Limit simultaneous in-flight requests | Protecting slow backends | Complements rate limits; different dimension |
```typescript
interface RateLimitConfig {
  // Bucket capacity (max burst size)
  bucketSize: number;
  // Tokens added per second (sustained rate)
  refillRate: number;
  // Key type for limiting (global, per-user, per-IP, per-API-key)
  keyType: 'global' | 'userId' | 'ip' | 'apiKey' | 'custom';
  // For custom key type, function to extract key
  keyExtractor?: (req: Request, identity: Identity) => string;
  // Response when rate limited
  rejectionResponse: {
    status: number;
    body: unknown;
    headers: Record<string, string>;
  };
}

class TokenBucketRateLimiter {
  constructor(
    private readonly redis: Redis,
    private readonly config: RateLimitConfig,
  ) {}

  async checkLimit(request: Request, identity: Identity | null): Promise<RateLimitResult> {
    const key = this.extractKey(request, identity);
    const now = Date.now();

    // Lua script for atomic token bucket operation
    // This is critical - non-atomic implementations have race conditions
    const result = await this.redis.eval(
      this.TOKEN_BUCKET_SCRIPT,
      1, // Number of keys
      `ratelimit:${key}`, // Key
      now, // Current timestamp
      this.config.bucketSize, // Bucket capacity
      this.config.refillRate, // Refill rate (tokens/second)
      1, // Tokens requested
    ) as [number, number, number]; // [allowed (0/1), remaining, resetTime]

    const [allowed, remaining, resetTime] = result;

    return {
      allowed: allowed === 1,
      remaining,
      resetTime: new Date(resetTime),
      limit: this.config.bucketSize,
      headers: {
        'X-RateLimit-Limit': String(this.config.bucketSize),
        'X-RateLimit-Remaining': String(Math.max(0, remaining)),
        'X-RateLimit-Reset': String(Math.ceil(resetTime / 1000)),
      },
    };
  }

  private extractKey(request: Request, identity: Identity | null): string {
    switch (this.config.keyType) {
      case 'global':
        return 'global';
      case 'userId':
        return identity?.id ?? 'anonymous';
      case 'ip':
        return this.extractClientIP(request);
      case 'apiKey':
        return request.headers.get('X-API-Key') ?? 'no-key';
      case 'custom':
        return this.config.keyExtractor!(request, identity!);
    }
  }

  private extractClientIP(request: Request): string {
    // Check common proxy headers in order of trust
    const xForwardedFor = request.headers.get('X-Forwarded-For');
    if (xForwardedFor) {
      // Take the first (client) IP, not proxies
      return xForwardedFor.split(',')[0].trim();
    }
    const xRealIP = request.headers.get('X-Real-IP');
    if (xRealIP) return xRealIP;
    // Fallback to connection IP (if available)
    return request.headers.get('CF-Connecting-IP') ?? 'unknown';
  }

  // Atomic Lua script for token bucket
  // Ensures no race conditions across distributed gateway instances
  private readonly TOKEN_BUCKET_SCRIPT = `
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local refillRate = tonumber(ARGV[3])
    local requested = tonumber(ARGV[4])

    -- Get current bucket state
    local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
    local tokens = tonumber(bucket[1]) or capacity
    local lastRefill = tonumber(bucket[2]) or now

    -- Calculate token refill since last request
    local elapsed = (now - lastRefill) / 1000 -- Convert to seconds
    local refill = math.min(capacity, tokens + (elapsed * refillRate))

    -- Try to consume tokens
    local allowed = 0
    local remaining = refill
    if refill >= requested then
      remaining = refill - requested
      allowed = 1
    end

    -- Update bucket state
    redis.call('HMSET', key, 'tokens', remaining, 'lastRefill', now)
    redis.call('EXPIRE', key, 3600) -- Cleanup unused keys after 1 hour

    -- Calculate reset time (when bucket will be full again)
    local resetTime = now + ((capacity - remaining) / refillRate * 1000)

    return {allowed, remaining, resetTime}
  `;
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetTime: Date;
  limit: number;
  headers: Record<string, string>;
}
```

Production systems often need rate limits across multiple dimensions simultaneously:
A request is only allowed if it passes all applicable rate limits.
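Combining dimensions can be sketched as follows. `FixedWindowLimiter` here is a deliberately naive in-memory stand-in (per-instance state, no window expiry) for the shared token bucket shown earlier, and all names are illustrative:

```typescript
// Every applicable limiter must allow the request.
interface Limiter {
  name: string;
  check(key: string): { allowed: boolean };
}

// Naive fixed-window counter, for illustration only.
class FixedWindowLimiter implements Limiter {
  private counts = new Map<string, number>();
  constructor(public name: string, private limit: number) {}

  check(key: string): { allowed: boolean } {
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return { allowed: n <= this.limit };
  }
}

// Evaluate all dimensions; the first violated one short-circuits and is
// reported, so the client learns which limit it hit. Note that earlier
// limiters have already consumed a unit; stricter implementations refund it.
function checkAllLimits(
  checks: { limiter: Limiter; key: string }[],
): { allowed: boolean; violated?: string } {
  for (const { limiter, key } of checks) {
    if (!limiter.check(key).allowed) {
      return { allowed: false, violated: limiter.name };
    }
  }
  return { allowed: true };
}
```

A typical stack pairs a tight per-user limit with looser per-IP and global limits, each keyed differently but evaluated through the same loop.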
With multiple gateway instances, rate limit state must be shared (Redis, memcached). This introduces latency and potential inconsistency. Some systems accept eventual consistency in rate limiting—allowing brief overages—rather than adding latency to every request. Choose based on your requirements: strict compliance vs. performance.
The gateway often needs to transform requests before forwarding them to backend services, and transform responses before returning them to clients. This capability bridges differences between external API contracts and internal service implementations.
For example, the external path `/api/v1/users` might be rewritten to the internal `/users-service/api/users` before forwarding.
```typescript
interface TransformationPipeline {
  requestTransforms: RequestTransform[];
  responseTransforms: ResponseTransform[];
}

interface RequestTransform {
  type: 'addHeader' | 'removeHeader' | 'rewritePath' | 'addQuery' | 'modifyBody';
  config: unknown;
}

class TransformationEngine {
  async transformRequest(
    request: Request,
    transforms: RequestTransform[],
    context: RequestContext,
  ): Promise<Request> {
    let transformed = request.clone();
    for (const transform of transforms) {
      transformed = await this.applyRequestTransform(transformed, transform, context);
    }
    return transformed;
  }

  private async applyRequestTransform(
    request: Request,
    transform: RequestTransform,
    context: RequestContext,
  ): Promise<Request> {
    const headers = new Headers(request.headers);

    switch (transform.type) {
      case 'addHeader': {
        const { key, value, valueTemplate } = transform.config as AddHeaderConfig;
        // Support template interpolation: 'User {identity.userId}'
        const resolvedValue = valueTemplate
          ? this.interpolate(valueTemplate, context)
          : value;
        headers.set(key, resolvedValue);
        break;
      }

      case 'removeHeader': {
        const { keys } = transform.config as RemoveHeaderConfig;
        keys.forEach(key => headers.delete(key));
        break;
      }

      case 'rewritePath': {
        const { pattern, replacement } = transform.config as RewritePathConfig;
        const url = new URL(request.url);
        url.pathname = url.pathname.replace(new RegExp(pattern), replacement);
        return new Request(url.toString(), { ...request, headers });
      }

      case 'modifyBody': {
        const { modifications } = transform.config as ModifyBodyConfig;
        const body = await request.json();
        const modified = this.applyBodyModifications(body, modifications);
        return new Request(request.url, {
          ...request,
          headers,
          body: JSON.stringify(modified),
        });
      }
    }

    return new Request(request.url, { ...request, headers });
  }

  async transformResponse(
    response: Response,
    transforms: ResponseTransform[],
    context: RequestContext,
  ): Promise<Response> {
    let transformed = response.clone();
    for (const transform of transforms) {
      transformed = await this.applyResponseTransform(transformed, transform, context);
    }
    return transformed;
  }

  private async applyResponseTransform(
    response: Response,
    transform: ResponseTransform,
    context: RequestContext,
  ): Promise<Response> {
    switch (transform.type) {
      case 'filterFields': {
        // Remove internal fields from response
        const { exclude } = transform.config as FilterFieldsConfig;
        const body = await response.json();
        const filtered = this.filterObject(body, exclude);
        return new Response(JSON.stringify(filtered), {
          status: response.status,
          headers: response.headers,
        });
      }

      case 'normalizeError': {
        // Convert backend error format to standard client format
        if (response.status >= 400) {
          const error = await response.json().catch(() => ({}));
          const normalized = {
            error: {
              code: error.code || error.errorCode || 'UNKNOWN_ERROR',
              message: error.message || error.description || 'An error occurred',
              requestId: context.requestId,
              // Explicitly exclude internal details
            },
          };
          return new Response(JSON.stringify(normalized), {
            status: response.status,
            headers: response.headers,
          });
        }
        return response;
      }

      case 'addResponseHeader': {
        const { key, value } = transform.config as AddHeaderConfig;
        const headers = new Headers(response.headers);
        headers.set(key, value);
        return new Response(response.body, {
          status: response.status,
          headers,
        });
      }
    }

    return response;
  }

  private filterObject(obj: unknown, excludePaths: string[]): unknown {
    if (typeof obj !== 'object' || obj === null) return obj;
    if (Array.isArray(obj)) {
      return obj.map(item => this.filterObject(item, excludePaths));
    }
    const result: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(obj)) {
      if (!excludePaths.includes(key)) {
        result[key] = this.filterObject(value, excludePaths);
      }
    }
    return result;
  }
}
```

Body transformations require parsing and re-serializing payloads, which adds latency. For high-throughput APIs, prefer header-only transformations where possible. If body transformation is necessary, consider streaming approaches that don't buffer the entire payload.
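A streaming pass-through can be sketched with the web-standard `TransformStream` (available in modern runtimes such as Node 18+). This example only counts bytes for metrics without buffering the body; real field-level rewrites would need an incremental JSON parser, which is beyond this sketch, and the function name is illustrative:

```typescript
// Pipe the response body through a counting TransformStream; chunks are
// forwarded unchanged, so nothing is buffered in memory.
function streamWithByteCount(
  response: Response,
  onDone: (bytes: number) => void,
): Response {
  let bytes = 0;
  const counter = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      bytes += chunk.byteLength;
      controller.enqueue(chunk); // forward unchanged
    },
    flush() {
      // Called once the upstream body is fully consumed
      onDone(bytes);
    },
  });
  return new Response(response.body?.pipeThrough(counter), {
    status: response.status,
    headers: response.headers,
  });
}
```

The same shape works for any chunk-wise rewrite (e.g. redacting a known byte pattern), as long as the transformation doesn't need to see the whole payload at once.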
The API Gateway is the ideal vantage point for observability. Every request—successful or failed—flows through it. By instrumenting the gateway thoroughly, you gain complete visibility into system behavior without modifying individual services.
```typescript
interface RequestMetrics {
  requestId: string;
  startTime: number;
  endTime?: number;
  route: string;
  method: string;
  statusCode?: number;
  backendLatency?: number;
  totalLatency?: number;
  bytesIn: number;
  bytesOut?: number;
  userId?: string;
  clientIp: string;
  userAgent: string;
  cacheHit?: boolean;
  rateLimited?: boolean;
  error?: ErrorInfo;
}

class GatewayObservability {
  constructor(
    private readonly metrics: MetricsClient,
    private readonly logger: Logger,
    private readonly tracer: Tracer,
  ) {}

  // Called at the start of every request
  startRequest(request: Request): RequestContext {
    const requestId = request.headers.get('X-Request-Id') ?? crypto.randomUUID();
    const startTime = performance.now();

    // Start distributed trace
    const span = this.tracer.startSpan('gateway.request', {
      attributes: {
        'http.method': request.method,
        'http.url': request.url,
        'http.user_agent': request.headers.get('User-Agent') ?? 'unknown',
        'request.id': requestId,
      },
    });

    return {
      requestId,
      startTime,
      span,
      metrics: {
        requestId,
        startTime,
        method: request.method,
        route: 'unknown', // Set after routing
        clientIp: this.extractClientIP(request),
        userAgent: request.headers.get('User-Agent') ?? 'unknown',
        bytesIn: parseInt(request.headers.get('Content-Length') ?? '0'),
      },
    };
  }

  // Called after routing decision
  recordRoute(context: RequestContext, route: RouteMatch): void {
    context.metrics.route = route.id;
    context.span.setAttribute('route.id', route.id);
    context.span.setAttribute('route.backend', route.destination.service);
  }

  // Called after authentication
  recordAuth(context: RequestContext, identity: Identity | null): void {
    if (identity) {
      context.metrics.userId = identity.id;
      context.span.setAttribute('user.id', identity.id);
      // Don't log PII in metrics!
    }
  }

  // Called for backend request
  startBackendSpan(context: RequestContext, backend: string): Span {
    return this.tracer.startSpan('gateway.backend_request', {
      parent: context.span,
      attributes: {
        'backend.service': backend,
      },
    });
  }

  // Called at the end of every request
  finishRequest(
    context: RequestContext,
    response: Response,
    backendLatency?: number,
  ): void {
    const endTime = performance.now();
    const totalLatency = endTime - context.startTime;

    // Complete metrics
    context.metrics.endTime = endTime;
    context.metrics.statusCode = response.status;
    context.metrics.totalLatency = totalLatency;
    context.metrics.backendLatency = backendLatency;
    context.metrics.bytesOut = parseInt(response.headers.get('Content-Length') ?? '0');

    // Emit Prometheus-style metrics
    this.metrics.histogram('gateway_request_duration_seconds', totalLatency / 1000, {
      route: context.metrics.route,
      method: context.metrics.method,
      status: String(response.status),
    });

    this.metrics.counter('gateway_requests_total', 1, {
      route: context.metrics.route,
      method: context.metrics.method,
      status: String(response.status),
    });

    if (backendLatency) {
      this.metrics.histogram('gateway_backend_duration_seconds', backendLatency / 1000, {
        route: context.metrics.route,
        backend: context.span.getAttribute('route.backend') as string,
      });
    }

    // Structured access log
    this.logger.info('request', {
      request_id: context.requestId,
      method: context.metrics.method,
      route: context.metrics.route,
      status: response.status,
      latency_ms: totalLatency.toFixed(2),
      backend_latency_ms: backendLatency?.toFixed(2),
      bytes_in: context.metrics.bytesIn,
      bytes_out: context.metrics.bytesOut,
      user_id: context.metrics.userId,
      client_ip: context.metrics.clientIp,
      // Omit user_agent from default logs (high cardinality)
    });

    // Finish trace span
    context.span.setStatus({ code: response.status >= 400 ? 2 : 0 });
    context.span.end();
  }

  // Called on errors
  recordError(context: RequestContext, error: Error): void {
    context.metrics.error = {
      type: error.name,
      message: error.message,
    };
    context.span.recordException(error);

    this.logger.error('request_error', {
      request_id: context.requestId,
      error_type: error.name,
      error_message: error.message,
      stack: error.stack,
    });

    this.metrics.counter('gateway_errors_total', 1, {
      route: context.metrics.route,
      error_type: error.name,
    });
  }
}
```

Focus on the four golden signals: Latency (how long requests take), Traffic (request rate), Errors (failure rate), and Saturation (how "full" the gateway is). With these, you can answer most operational questions about system health.
While dedicated load balancers often sit in front of the gateway, the gateway itself performs load balancing across backend service instances and implements fault tolerance mechanisms to handle backend failures gracefully.
| Algorithm | How It Works | Best For | Limitations |
|---|---|---|---|
| Round Robin | Rotate through instances sequentially | Identical instances, uniform request cost | Ignores instance load; simple but naive |
| Weighted Round Robin | Rotate with weights per instance | Heterogeneous instance capacities | Static weights; doesn't adapt |
| Least Connections | Send to instance with fewest active requests | Variable request durations | Requires tracking; slight overhead |
| Least Response Time | Send to instance with fastest recent responses | Performance-sensitive workloads | Requires tracking; can amplify issues |
| Random | Random selection (with optional weights) | Simple, surprisingly effective | No load awareness |
| Consistent Hashing | Hash request attribute to determine instance | Cache locality, sticky sessions (careful!) | Uneven distribution possible; requires rebalancing |
| P2C (Power of Two Choices) | Pick 2 random instances, choose less loaded | Modern general-purpose default; combines simplicity with adaptivity | Requires per-instance load tracking |
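To make the comparison concrete, here is a minimal sketch of P2C selection. The `Instance` shape and the injectable `rng` parameter are illustrative assumptions (the `rng` hook exists so selection can be made deterministic in tests), not the API of any particular gateway:

```typescript
interface Instance {
  id: string;
  activeRequests: number; // current in-flight request count
}

// Power of Two Choices: sample two distinct instances uniformly at random,
// then route to whichever currently has fewer active requests.
function pickP2C(
  instances: Instance[],
  rng: () => number = Math.random,
): Instance {
  if (instances.length === 1) return instances[0];
  const i = Math.floor(rng() * instances.length);
  let j = Math.floor(rng() * (instances.length - 1));
  if (j >= i) j++; // shift so the second pick is distinct from the first
  const a = instances[i];
  const b = instances[j];
  return a.activeRequests <= b.activeRequests ? a : b;
}
```

Comparing just two samples avoids scanning the whole pool on every request, yet provably avoids the worst-case pileups that pure random selection allows.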
When a backend service becomes unhealthy, continuing to send requests just makes things worse—requests queue up, timeouts cascade, and the failure spreads. The circuit breaker pattern stops this cascade:
```typescript
enum CircuitState {
  CLOSED,    // Normal operation, requests flow through
  OPEN,      // Failing, requests immediately rejected
  HALF_OPEN, // Testing if backend recovered
}

interface CircuitBreakerConfig {
  // Number of failures before opening circuit
  failureThreshold: number;
  // Time to wait in OPEN state before testing
  resetTimeout: number;
  // Number of successful requests in HALF_OPEN to close
  successThreshold: number;
  // What counts as a failure (status codes, exceptions)
  failureCondition: (response: Response | Error) => boolean;
  // Fallback response when circuit is open
  fallback?: () => Response;
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount: number = 0;
  private successCount: number = 0;
  private lastFailureTime: number = 0;
  private stateChanges: { state: CircuitState; timestamp: number }[] = [];

  constructor(
    private readonly name: string,
    private readonly config: CircuitBreakerConfig,
    private readonly metrics: MetricsClient,
  ) {}

  async execute(operation: () => Promise<Response>): Promise<Response> {
    // Check if circuit should remain open
    if (this.state === CircuitState.OPEN) {
      if (Date.now() - this.lastFailureTime < this.config.resetTimeout) {
        // Circuit still open, fail fast
        this.metrics.counter('circuit_breaker_rejected', 1, { name: this.name });
        if (this.config.fallback) {
          return this.config.fallback();
        }
        throw new CircuitOpenError(`Circuit ${this.name} is open`);
      }
      // Timeout elapsed, try half-open
      this.transition(CircuitState.HALF_OPEN);
    }

    try {
      const response = await operation();

      // Check if response counts as failure
      if (this.config.failureCondition(response)) {
        this.recordFailure();
        return response;
      }

      // Success!
      this.recordSuccess();
      return response;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private recordSuccess(): void {
    if (this.state === CircuitState.HALF_OPEN) {
      this.successCount++;
      if (this.successCount >= this.config.successThreshold) {
        // Backend recovered, close circuit
        this.transition(CircuitState.CLOSED);
        this.failureCount = 0;
        this.successCount = 0;
      }
    } else if (this.state === CircuitState.CLOSED) {
      // Decay failure count on success
      this.failureCount = Math.max(0, this.failureCount - 1);
    }
  }

  private recordFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    if (this.state === CircuitState.HALF_OPEN) {
      // Still failing, go back to open
      this.transition(CircuitState.OPEN);
      this.successCount = 0;
    } else if (
      this.state === CircuitState.CLOSED &&
      this.failureCount >= this.config.failureThreshold
    ) {
      // Threshold exceeded, open circuit
      this.transition(CircuitState.OPEN);
    }

    this.metrics.counter('circuit_breaker_failure', 1, { name: this.name });
  }

  private transition(newState: CircuitState): void {
    const oldState = this.state;
    this.state = newState;
    this.stateChanges.push({ state: newState, timestamp: Date.now() });
    this.metrics.gauge('circuit_breaker_state', newState, { name: this.name });
    console.log(`Circuit ${this.name}: ${CircuitState[oldState]} → ${CircuitState[newState]}`);
  }

  getState(): CircuitState {
    return this.state;
  }
}
```

Poorly tuned circuit breakers cause problems: too sensitive and they open on transient errors; too slow and they don't protect. Start conservative (higher thresholds, longer reset times) and tune based on observed behavior. Consider adaptive algorithms that adjust thresholds based on traffic volume.
We've examined the comprehensive set of responsibilities that make an API Gateway the critical infrastructure component it is: routing, security, rate limiting and throttling, request/response transformation, load balancing, observability, and fault tolerance.
What's Next:
With a thorough understanding of what a gateway does, we'll next explore where to place the gateway in your architecture—edge vs. internal placement, multi-tier gateway architectures, and how gateway placement affects security, performance, and operational concerns.
You now have a comprehensive understanding of API Gateway responsibilities. From routing and security to rate limiting, transformation, observability, and fault tolerance, you understand the critical functions that make the gateway the backbone of modern distributed architectures.