The Strangler Fig Pattern's entire viability rests on a single architectural element: the routing façade. This component is the air traffic controller of your migration—it decides which requests go to the legacy monolith and which are directed to the new microservices. Without a well-designed façade, gradual migration is impossible; with a brilliant one, it becomes almost invisible.
Think of the routing façade as the fig tree's seed deposit point—the location where the new system first attaches to the old. Every request to your application passes through this layer, giving you the control necessary to redirect traffic without changing clients, DNS entries, or any external configuration.
The façade serves multiple critical functions: it routes each request to the right backend, preserves API contracts so clients see no difference, and gives you a single point of control for shifting, shadowing, and rolling back traffic.
By the end of this page, you will understand how to design and implement a routing façade, the various technological approaches to façade implementation, strategies for intelligent request routing, techniques for maintaining consistency during migration, and patterns for handling edge cases and failure scenarios.
The classic Façade pattern from the Gang of Four provides a unified interface to a set of interfaces in a subsystem. In the Strangler Fig context, we extend this concept: our façade not only unifies interfaces but also decides which implementation serves each request.
From Traditional Façade to Migration Façade:
A traditional façade hides complexity behind a simple interface. A migration façade does this and more: it also decides, per request, which implementation responds, and lets you change that decision over time without touching clients.
The Principle of Transparent Substitution:
The migration façade must implement what we call transparent substitution—clients should be completely unaware of which backend serves their request. This means:
API Contract Preservation: The façade exposes identical APIs regardless of backend. If the monolith returned JSON with specific field names, the new service must too.
Latency Consistency: The façade should add minimal latency. Any significant overhead defeats gradual migration, as shifted traffic will perform differently.
Error Handling Uniformity: Error responses must be consistent across backends. A 404 from the monolith should look identical to a 404 from a microservice.
Header/Cookie Preservation: The façade must correctly proxy headers, cookies, and other metadata. Breaking authentication or session state is a critical failure.
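To make error handling uniformity concrete, the façade can normalize error responses into one envelope before they reach clients, regardless of which backend produced them. A minimal sketch in TypeScript (the envelope shape `{ error: { code, message } }` and the field names probed are assumptions, not from the source):

```typescript
// Sketch: enforce a uniform error envelope regardless of backend.
// Success responses pass through untouched; error responses are
// rewritten into one consistent JSON shape.
async function uniformErrors(res: Response): Promise<Response> {
  if (res.status < 400) return res;

  let message = res.statusText || 'Error';
  try {
    // Different backends may name the error field differently.
    const body = await res.clone().json();
    message = body.message ?? body.error ?? message;
  } catch {
    /* non-JSON error body: keep the status text */
  }

  return new Response(
    JSON.stringify({ error: { code: res.status, message } }),
    { status: res.status, headers: { 'content-type': 'application/json' } }
  );
}
```

The same idea applies to 404s, validation errors, and timeouts: clients should never be able to fingerprint the backend from the shape of a failure.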
Every proxy hop adds latency—typically 1-5ms for well-optimized proxies, potentially 10-50ms for poorly implemented ones. If your migration adds visible latency, users will notice degraded performance that you'll struggle to explain. Profile your façade under production load before committing to its design.
There are several approaches to implementing the routing façade, each with distinct tradeoffs. The right choice depends on your existing infrastructure, team expertise, and specific requirements.
| Approach | Examples | Best For | Considerations |
|---|---|---|---|
| API Gateway | Kong, AWS API Gateway, Apigee | Organizations already using API management, teams needing rich features | Can add latency; may require license costs; learning curve for complex routing |
| Reverse Proxy | NGINX, HAProxy, Envoy | High-performance requirements, teams with ops expertise | Requires configuration management; less built-in analytics |
| Service Mesh | Istio, Linkerd, Consul Connect | Kubernetes environments, teams with service mesh investment | Complex to operate; overkill if not already adopted |
| In-Monolith Router | Custom code in existing app | Simple migrations, teams wanting minimal infrastructure | Couples routing to monolith lifecycle; harder to change |
| Load Balancer | AWS ALB, F5, Cloudflare | Simple path-based routing, existing infrastructure reuse | Limited routing sophistication; typically edge-only |
NGINX is one of the most common choices for routing façades due to its performance, flexibility, and wide adoption. Here's a configuration pattern for Strangler Fig routing:
```nginx
# Upstream definitions for backends
upstream monolith {
    server monolith.internal:8080;
    keepalive 32;
}

upstream user_service {
    server user-service.internal:8080;
    keepalive 32;
}

upstream order_service {
    server order-service.internal:8080;
    keepalive 32;
}

# Use maps for flexible routing configuration
map $request_uri $backend {
    default monolith;
    ~^/api/v2/users user_service;
    ~^/api/v2/orders order_service;
}

# Split traffic for gradual rollout (using split_clients)
split_clients "$request_id" $user_service_variant {
    10% user_service;   # 10% to new service
    *   monolith;       # 90% to monolith
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    # SSL configuration
    ssl_certificate /etc/ssl/certs/api.crt;
    ssl_certificate_key /etc/ssl/private/api.key;

    # Preserve client information
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Default routing based on path
    location / {
        proxy_pass http://$backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    # Users endpoint with percentage-based split
    location /api/v2/users {
        proxy_pass http://$user_service_variant;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    # Health checks
    location /health {
        access_log off;
        return 200 'healthy';
    }
}
```

How you decide which backend handles a request is a critical design decision. Different strategies suit different stages of migration and different risk tolerances.
Path-Based Routing: `/api/v2/users/*` goes to user-service, everything else to the monolith. Simple and predictable, but coarse-grained.

Percentage-Based Routing: a configured fraction of traffic (say 10%) goes to the new service, as in the `split_clients` example above. Enables gradual, reversible rollout.

Header-Based Overrides: an explicit request header forces traffic to a specific backend, useful for testing and incident response.

User-Based Routing: specific users or cohorts (internal employees, beta testers) reach the new service before general traffic does.

Composing Strategies:
In practice, you'll combine multiple strategies. A sophisticated configuration checks for an explicit override header first, then routes designated users to the new service, then applies a percentage split to everything else, with the monolith as the final default.
This composition provides layered control: you can always override the default behavior for specific users or requests, while the percentage split handles general traffic.
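The composition above can be sketched as a single routing function. The backend names, the `betaUsers` cohort, and the `X-Force-New` header are illustrative assumptions; `X-Force-Legacy` comes from the tip below:

```typescript
type Backend = 'monolith' | 'user-service';

interface RouteInput {
  headers: Record<string, string>; // lowercase header names
  userId?: string;
  percentToNew: number;            // e.g. 10 means 10% of traffic
}

const betaUsers = new Set(['alice', 'bob']); // hypothetical cohort

// Deterministic hash so a given user always lands in the same bucket.
function bucket(key: string): number {
  let h = 0;
  for (const c of key) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % 100;
}

function chooseBackend(req: RouteInput): { backend: Backend; reason: string } {
  // 1. Explicit overrides always win (the incident escape hatch).
  if (req.headers['x-force-legacy'] === 'true') {
    return { backend: 'monolith', reason: 'header-override' };
  }
  if (req.headers['x-force-new'] === 'true') {
    return { backend: 'user-service', reason: 'header-override' };
  }
  // 2. Known cohorts go to the new service ahead of general traffic.
  if (req.userId && betaUsers.has(req.userId)) {
    return { backend: 'user-service', reason: 'beta-cohort' };
  }
  // 3. Percentage split handles everyone else.
  if (req.userId && bucket(req.userId) < req.percentToNew) {
    return { backend: 'user-service', reason: 'percentage-split' };
  }
  // 4. Default: the monolith.
  return { backend: 'monolith', reason: 'default' };
}
```

Returning a `reason` alongside the backend is deliberate: it can be stamped into a response header so every routing decision is explainable after the fact.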
Always implement a header-based override that lets you force traffic to either backend. During an incident, being able to say "send `X-Force-Legacy: true` to bypass the migration" is invaluable. This escape hatch has saved countless on-call engineers in production emergencies.
Before shifting production traffic, you need confidence that the new service behaves identically to the monolith. Shadow traffic (also called dark launching or traffic mirroring) provides this confidence by sending copies of production requests to the new service without serving its responses to users.
How Shadow Traffic Works: the façade sends each request to the monolith and returns the monolith's response to the client as usual. Asynchronously, it mirrors a copy of the request to the new service, records both responses, and compares them offline. The shadow response is never served to the user.
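A minimal sketch of the mirroring wrapper, assuming handlers with a `(Request) => Promise<Response>` shape; the `sampleRate` parameter and `compare` callback signature are illustrative:

```typescript
type Handler = (req: Request) => Promise<Response>;

// The primary response is returned to the client; the shadow call runs
// in the background and only feeds a comparison callback.
function withShadow(
  primary: Handler,
  shadow: Handler,
  compare: (primaryRes: Response, shadowRes: Response) => void,
  sampleRate = 0.1
): Handler {
  return async (req: Request) => {
    // Clone before the primary consumes the body; only mirror safe,
    // idempotent reads, and only a sample of them.
    const mirror =
      req.method === 'GET' && Math.random() < sampleRate ? req.clone() : null;

    const primaryRes = await primary(req);

    if (mirror) {
      const primaryCopy = primaryRes.clone(); // clone before the client reads the body
      shadow(mirror)
        .then((shadowRes) => compare(primaryCopy, shadowRes))
        .catch(() => {
          /* shadow failures must never affect the user */
        });
    }
    return primaryRes;
  };
}
```

Note that the shadow promise is intentionally not awaited: shadow latency and shadow errors stay invisible to the client.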
Critical Considerations for Shadow Traffic:
Only Shadow Read Operations: Never shadow writes. Sending a 'create order' request to both backends would create duplicate orders. Shadow GET requests, not POST/PUT/DELETE.
Handle Non-Determinism: Timestamps, UUIDs, and random values will differ between backends. Your comparison logic must normalize these, comparing structure and semantics rather than exact bytes.
Monitor Shadow Latency: If the new service is slow, it's an early warning—even though users don't see it yet. Shadow traffic reveals performance problems before they impact users.
Beware of Side Effects: Even 'read' operations might have side effects (cache warming, analytics events, rate limiting). Ensure shadowing doesn't cause problems.
Sample if Necessary: Shadowing 100% of traffic can overload the new service. Start with a sample rate (1%, 10%) and increase as the service proves stable.
A response comparator that applies these rules might look like this:

```typescript
interface ComparisonResult {
  requestId: string;
  match: boolean;
  differences: Difference[];
  primaryLatencyMs: number;
  shadowLatencyMs: number;
}

interface Difference {
  path: string;
  primaryValue: unknown;
  shadowValue: unknown;
  type: 'missing' | 'extra' | 'mismatch';
}

class ShadowComparator {
  private ignorePaths: Set<string> = new Set([
    '$.timestamp',
    '$.requestId',
    '$.serverVersion',
    '$.traceId',
  ]);

  async compare(
    requestId: string,
    primary: Response,
    shadow: Response,
    primaryLatencyMs: number,
    shadowLatencyMs: number
  ): Promise<ComparisonResult> {
    const differences: Difference[] = [];

    // Compare status codes
    if (primary.status !== shadow.status) {
      differences.push({
        path: '$.statusCode',
        primaryValue: primary.status,
        shadowValue: shadow.status,
        type: 'mismatch',
      });
    }

    // Compare response bodies
    const primaryBody = await this.normalizeBody(primary);
    const shadowBody = await this.normalizeBody(shadow);
    this.compareObjects('$', primaryBody, shadowBody, differences);

    const result: ComparisonResult = {
      requestId,
      match: differences.length === 0,
      differences,
      primaryLatencyMs,
      shadowLatencyMs,
    };

    // Log discrepancies for investigation
    if (!result.match) {
      await this.logDiscrepancy(result);
    }

    // Alert on significant latency differences
    if (shadowLatencyMs > primaryLatencyMs * 2) {
      await this.alertLatencyRegression(requestId, primaryLatencyMs, shadowLatencyMs);
    }

    return result;
  }

  private compareObjects(
    path: string,
    primary: any,
    shadow: any,
    differences: Difference[]
  ): void {
    // Skip ignored paths
    if (this.ignorePaths.has(path)) return;

    // Normalize timestamps for comparison
    if (this.isTimestamp(primary) && this.isTimestamp(shadow)) {
      // Timestamps within 1 second are considered matching
      if (Math.abs(new Date(primary).getTime() - new Date(shadow).getTime()) < 1000) {
        return;
      }
    }

    // Handle different types
    if (typeof primary !== typeof shadow) {
      differences.push({
        path,
        primaryValue: primary,
        shadowValue: shadow,
        type: 'mismatch',
      });
      return;
    }

    // Handle arrays
    if (Array.isArray(primary) && Array.isArray(shadow)) {
      const maxLength = Math.max(primary.length, shadow.length);
      for (let i = 0; i < maxLength; i++) {
        this.compareObjects(`${path}[${i}]`, primary[i], shadow[i], differences);
      }
      return;
    }

    // Handle objects (guard against null: typeof null === 'object')
    if (typeof primary === 'object' && primary !== null && shadow !== null) {
      const allKeys = new Set([...Object.keys(primary), ...Object.keys(shadow)]);
      for (const key of allKeys) {
        const newPath = `${path}.${key}`;
        if (!(key in primary)) {
          differences.push({
            path: newPath,
            primaryValue: undefined,
            shadowValue: shadow[key],
            type: 'extra',
          });
        } else if (!(key in shadow)) {
          differences.push({
            path: newPath,
            primaryValue: primary[key],
            shadowValue: undefined,
            type: 'missing',
          });
        } else {
          this.compareObjects(newPath, primary[key], shadow[key], differences);
        }
      }
      return;
    }

    // Handle primitives
    if (primary !== shadow) {
      differences.push({
        path,
        primaryValue: primary,
        shadowValue: shadow,
        type: 'mismatch',
      });
    }
  }

  private isTimestamp(value: unknown): boolean {
    if (typeof value !== 'string') return false;
    const date = new Date(value);
    return !isNaN(date.getTime());
  }

  private async normalizeBody(response: Response): Promise<any> {
    // Read the body once; a failed json() would consume it,
    // so read text and parse from there
    const text = await response.text();
    try {
      return JSON.parse(text);
    } catch {
      return text;
    }
  }

  private async logDiscrepancy(result: ComparisonResult): Promise<void> {
    // Send to observability platform
    console.log('Shadow comparison discrepancy:', JSON.stringify(result, null, 2));
  }

  private async alertLatencyRegression(
    requestId: string,
    primaryMs: number,
    shadowMs: number
  ): Promise<void> {
    console.warn(`Latency regression detected: primary=${primaryMs}ms, shadow=${shadowMs}ms`);
  }
}
```

New services will fail. This isn't pessimism—it's realism. The routing façade must gracefully handle these failures, ideally by falling back to the monolith until the new service recovers. This is your safety net during migration.
The Fallback Hierarchy: attempt the new service first; if it fails or its circuit is open, fall back to the monolith; if the monolith also fails, return an explicit error (such as a 503) to the client rather than hanging.
Circuit Breaker Configuration:
Circuit breakers prevent a failing service from being overwhelmed with requests (making recovery harder) and protect your system from wasting resources on doomed requests. Key parameters: the failure threshold (failures before the circuit opens), the timeout (how long the circuit stays open before a half-open probe), the success threshold (successful probes needed to close it again), and the volume threshold (minimum request count before the circuit is evaluated).
```typescript
interface CircuitBreakerState {
  status: 'closed' | 'open' | 'half-open';
  failures: number;
  successes: number;
  lastFailureTime: number | null;
  nextAttemptTime: number | null;
}

interface CircuitBreakerConfig {
  failureThreshold: number; // Number of failures to open
  successThreshold: number; // Successes to close from half-open
  timeout: number;          // Ms before half-open transition
  volumeThreshold: number;  // Min requests before evaluation
}

class FallbackRouter {
  private circuits: Map<string, CircuitBreakerState> = new Map();
  private config: CircuitBreakerConfig = {
    failureThreshold: 5,
    successThreshold: 3,
    timeout: 30000,
    volumeThreshold: 10,
  };

  async route(request: Request, primary: Backend, fallback: Backend): Promise<Response> {
    const circuitId = primary.name;
    const circuit = this.getCircuit(circuitId);

    // Check if circuit is open
    if (circuit.status === 'open') {
      if (Date.now() < (circuit.nextAttemptTime || 0)) {
        // Circuit is open, use fallback
        console.log(`Circuit open for ${circuitId}, using fallback`);
        return this.callWithMetrics(fallback, request, 'fallback');
      }
      // Time to try again - transition to half-open
      circuit.status = 'half-open';
      circuit.successes = 0;
    }

    try {
      // Attempt primary
      const response = await this.callWithMetrics(primary, request, 'primary');

      if (response.ok) {
        this.recordSuccess(circuitId);
        return response;
      } else if (response.status >= 500) {
        // Server error - treat as failure
        this.recordFailure(circuitId);
        return this.callWithMetrics(fallback, request, 'fallback');
      }

      // Client error (4xx) - not a circuit failure
      return response;
    } catch (error) {
      // Network error, timeout, etc.
      this.recordFailure(circuitId);
      console.error(`Primary failed for ${circuitId}:`, error);

      try {
        return await this.callWithMetrics(fallback, request, 'fallback');
      } catch (fallbackError) {
        console.error('Fallback also failed:', fallbackError);
        return new Response(
          JSON.stringify({ error: 'Service temporarily unavailable' }),
          { status: 503 }
        );
      }
    }
  }

  private getCircuit(id: string): CircuitBreakerState {
    if (!this.circuits.has(id)) {
      this.circuits.set(id, {
        status: 'closed',
        failures: 0,
        successes: 0,
        lastFailureTime: null,
        nextAttemptTime: null,
      });
    }
    return this.circuits.get(id)!;
  }

  private recordSuccess(circuitId: string): void {
    const circuit = this.getCircuit(circuitId);

    if (circuit.status === 'half-open') {
      circuit.successes++;
      if (circuit.successes >= this.config.successThreshold) {
        // Enough successes - close the circuit
        circuit.status = 'closed';
        circuit.failures = 0;
        console.log(`Circuit closed for ${circuitId}`);
      }
    } else {
      // Reset failure count on success
      circuit.failures = Math.max(0, circuit.failures - 1);
    }
  }

  private recordFailure(circuitId: string): void {
    const circuit = this.getCircuit(circuitId);

    if (circuit.status === 'half-open') {
      // Failed during probe - reopen circuit
      circuit.status = 'open';
      circuit.nextAttemptTime = Date.now() + this.config.timeout;
      console.log(`Circuit reopened for ${circuitId}`);
    } else {
      circuit.failures++;
      circuit.lastFailureTime = Date.now();

      if (circuit.failures >= this.config.failureThreshold) {
        // Threshold exceeded - open circuit
        circuit.status = 'open';
        circuit.nextAttemptTime = Date.now() + this.config.timeout;
        console.log(`Circuit opened for ${circuitId} after ${circuit.failures} failures`);
      }
    }
  }

  private async callWithMetrics(
    backend: Backend,
    request: Request,
    role: 'primary' | 'fallback'
  ): Promise<Response> {
    const start = Date.now();
    try {
      const response = await backend.handle(request);
      const duration = Date.now() - start;
      // Emit metrics
      this.emitMetric('request_duration_ms', duration, {
        backend: backend.name,
        role,
        status: response.status.toString(),
      });
      return response;
    } catch (error) {
      const duration = Date.now() - start;
      this.emitMetric('request_duration_ms', duration, {
        backend: backend.name,
        role,
        status: 'error',
      });
      throw error;
    }
  }

  private emitMetric(name: string, value: number, labels: Record<string, string>): void {
    // Integration with your metrics system (Prometheus, DataDog, etc.)
    console.log(`Metric: ${name}=${value}`, labels);
  }
}

interface Backend {
  name: string;
  handle(request: Request): Promise<Response>;
}
```

If you've shifted 80% of traffic to a new service and it fails, that traffic will suddenly hit the monolith. Ensure the monolith can still handle full load, or your fallback becomes a cascading failure. Test fallback scenarios regularly to verify capacity assumptions.
One of the trickiest aspects of routing façade design is handling session state and stateful interactions. When a user's requests might be handled by different backends, state consistency becomes critical.
Common Stateful Challenges:
Session Storage: User sessions may be stored in the monolith's local memory or database. New services need access.
Shopping Cart State: During checkout migration, cart state might be split between old and new systems.
Multi-Step Flows: Wizards, checkout processes, and other multi-step operations may span both backends.
Authentication Tokens: JWTs or session cookies need to be valid across all backends.
Rate Limiting State: Request counts for rate limiting must be consistent across backends.
Sticky Sessions: Use with Caution
Sticky sessions (routing all requests from the same user to the same backend) seem like an easy solution. Indeed, they can help during migration by ensuring consistency. However, they have real costs: stickiness is lost when instances restart or scale, it skews load distribution and your traffic-split percentages, and it hides state-consistency bugs that resurface the moment a session lands on the other backend.
Use sticky sessions as an optimization (avoiding session lookup) rather than a correctness requirement (the only way things work). Your system should function correctly even without stickiness—just potentially with higher latency for session lookup.
Before you begin migration, externalize all stateful dependencies. Move sessions to Redis, move caches to a shared tier, move rate-limiting state to a distributed counter. This investment pays dividends throughout migration by making any request routable to any backend.
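Externalized state reduces to a shared store behind a small interface that every backend uses. The sketch below uses an in-memory map as a stand-in for Redis or another shared tier; the interface and names are illustrative assumptions:

```typescript
// Any backend -- monolith or microservice -- touches sessions only
// through this interface, so any request can be routed anywhere.
interface SessionStore {
  get(sessionId: string): Promise<Record<string, unknown> | null>;
  set(sessionId: string, data: Record<string, unknown>, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in for a shared tier. In production this would be
// backed by Redis (or similar) so all backends see the same state.
class InMemorySessionStore implements SessionStore {
  private sessions = new Map<
    string,
    { data: Record<string, unknown>; expiresAt: number }
  >();

  async get(sessionId: string): Promise<Record<string, unknown> | null> {
    const entry = this.sessions.get(sessionId);
    if (!entry || Date.now() > entry.expiresAt) {
      this.sessions.delete(sessionId); // lazily expire stale sessions
      return null;
    }
    return entry.data;
  }

  async set(
    sessionId: string,
    data: Record<string, unknown>,
    ttlSeconds: number
  ): Promise<void> {
    this.sessions.set(sessionId, {
      data,
      expiresAt: Date.now() + ttlSeconds * 1000,
    });
  }
}
```

Because the interface is async from the start, swapping the in-memory implementation for a networked one changes no calling code.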
When requests can go to either the monolith or microservices, debugging becomes significantly more complex. Your observability strategy must account for this complexity, providing clear visibility into routing decisions and cross-backend request flows.
Essential Headers for Debugging:
Inject these headers at the façade for every response, enabling rapid debugging:
```
X-Backend: user-service          # Which backend handled this
X-Backend-Version: 1.2.3         # Backend's version
X-Correlation-ID: abc-123        # Request trace ID
X-Route-Reason: percentage-split # Why this routing was chosen
X-Fallback-Used: false           # Whether fallback was triggered
```
These headers let any engineer quickly determine which system handled a request, invaluable when investigating discrepancies or debugging production issues.
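Stamping these headers at the façade is a small wrapper around the backend response. A sketch (the `RoutingDecision` shape is an assumption; the header names match the list above):

```typescript
interface RoutingDecision {
  backend: string;
  backendVersion: string;
  correlationId: string;
  reason: string;
  fallbackUsed: boolean;
}

// Copy the backend response and stamp the debugging headers onto it,
// leaving status and body untouched.
function withDebugHeaders(res: Response, d: RoutingDecision): Response {
  const headers = new Headers(res.headers);
  headers.set('X-Backend', d.backend);
  headers.set('X-Backend-Version', d.backendVersion);
  headers.set('X-Correlation-ID', d.correlationId);
  headers.set('X-Route-Reason', d.reason);
  headers.set('X-Fallback-Used', String(d.fallbackUsed));
  return new Response(res.body, { status: res.status, headers });
}
```

If you worry about leaking topology details externally, emit these headers only for authenticated internal callers or strip them at the edge.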
Build a dashboard that shows, for each migrated endpoint: request volume, error rate, P50/P95/P99 latency, and response size—split by backend. This single dashboard becomes your migration control center, instantly revealing differences between old and new implementations.
The routing façade is the linchpin of the Strangler Fig Pattern. Without it, gradual migration is impossible. With a well-designed façade, migration becomes a controlled, reversible, observable process.
What's Next:
With the routing façade in place, the next page explores Extracting Functionality—the methodology for identifying, isolating, and migrating bounded contexts from the monolith to independent microservices. This is where the actual work of decomposition happens.
You now understand how to design and implement the routing façade that enables gradual migration. From implementation approaches to routing strategies, shadow traffic to circuit breakers, you have the tools to build a robust traffic management layer. Next, we'll learn how to identify and extract functionality from the monolith.