The Strangler Fig Pattern's entire viability rests on a single architectural element: the routing façade. This component is the air traffic controller of your migration—it decides which requests go to the legacy monolith and which are directed to the new microservices. Without a well-designed façade, gradual migration is impossible; with a brilliant one, it becomes almost invisible.
Think of the routing façade as the fig tree's seed deposit point—the location where the new system first attaches to the old. Every request to your application passes through this layer, giving you the control necessary to redirect traffic without changing clients, DNS entries, or any external configuration.
The façade serves multiple critical functions: it routes each request to the right backend, preserves API contracts so clients see no difference, and gives you a single point of control for shifting, shadowing, and rolling back traffic.
By the end of this page, you will understand how to design and implement a routing façade, the various technological approaches to façade implementation, strategies for intelligent request routing, techniques for maintaining consistency during migration, and patterns for handling edge cases and failure scenarios.
The classic Façade pattern from the Gang of Four provides a unified interface to a set of interfaces in a subsystem. In the Strangler Fig context, we extend this concept: our façade not only unifies interfaces but also decides which implementation serves each request.
From Traditional Façade to Migration Façade:
A traditional façade hides complexity behind a simple interface. A migration façade does this and more: it also decides, per request, which implementation responds, and lets you change that decision over time without touching clients.
The Principle of Transparent Substitution:
The migration façade must implement what we call transparent substitution—clients should be completely unaware of which backend serves their request. This means:
API Contract Preservation: The façade exposes identical APIs regardless of backend. If the monolith returned JSON with specific field names, the new service must too.
Latency Consistency: The façade should add minimal latency. Any significant overhead defeats gradual migration, as shifted traffic will perform differently.
Error Handling Uniformity: Error responses must be consistent across backends. A 404 from the monolith should look identical to a 404 from a microservice.
Header/Cookie Preservation: The façade must correctly proxy headers, cookies, and other metadata. Breaking authentication or session state is a critical failure.
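To make error handling uniformity concrete, the façade can normalize error responses into one envelope before they reach clients, regardless of which backend produced them. A minimal sketch in TypeScript (the envelope shape `{ error: { code, message } }` and the field names probed are assumptions, not from the source):

```typescript
// Sketch: enforce a uniform error envelope regardless of backend.
// Success responses pass through untouched; error responses are
// rewritten into one consistent JSON shape.
async function uniformErrors(res: Response): Promise<Response> {
  if (res.status < 400) return res;

  let message = res.statusText || 'Error';
  try {
    // Different backends may name the error field differently.
    const body = await res.clone().json();
    message = body.message ?? body.error ?? message;
  } catch {
    /* non-JSON error body: keep the status text */
  }

  return new Response(
    JSON.stringify({ error: { code: res.status, message } }),
    { status: res.status, headers: { 'content-type': 'application/json' } }
  );
}
```

The same idea applies to 404s, validation errors, and timeouts: clients should never be able to fingerprint the backend from the shape of a failure.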
Every proxy hop adds latency—typically 1-5ms for well-optimized proxies, potentially 10-50ms for poorly implemented ones. If your migration adds visible latency, users will notice degraded performance that you'll struggle to explain. Profile your façade under production load before committing to its design.
There are several approaches to implementing the routing façade, each with distinct tradeoffs. The right choice depends on your existing infrastructure, team expertise, and specific requirements.
| Approach | Examples | Best For | Considerations |
|---|---|---|---|
| API Gateway | Kong, AWS API Gateway, Apigee | Organizations already using API management, teams needing rich features | Can add latency; may require license costs; learning curve for complex routing |
| Reverse Proxy | NGINX, HAProxy, Envoy | High-performance requirements, teams with ops expertise | Requires configuration management; less built-in analytics |
| Service Mesh | Istio, Linkerd, Consul Connect | Kubernetes environments, teams with service mesh investment | Complex to operate; overkill if not already adopted |
| In-Monolith Router | Custom code in existing app | Simple migrations, teams wanting minimal infrastructure | Couples routing to monolith lifecycle; harder to change |
| Load Balancer | AWS ALB, F5, Cloudflare | Simple path-based routing, existing infrastructure reuse | Limited routing sophistication; typically edge-only |
NGINX is one of the most common choices for routing façades due to its performance, flexibility, and wide adoption. Here's a configuration pattern for Strangler Fig routing:
```nginx
# Upstream definitions for backends
upstream monolith {
    server monolith.internal:8080;
    keepalive 32;
}

upstream user_service {
    server user-service.internal:8080;
    keepalive 32;
}

upstream order_service {
    server order-service.internal:8080;
    keepalive 32;
}

# Use maps for flexible routing configuration
map $request_uri $backend {
    default monolith;
    ~^/api/v2/users user_service;
    ~^/api/v2/orders order_service;
}

# Split traffic for gradual rollout (using split_clients)
split_clients "$request_id" $user_service_variant {
    10% user_service;   # 10% to new service
    *   monolith;       # 90% to monolith
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    # SSL configuration
    ssl_certificate /etc/ssl/certs/api.crt;
    ssl_certificate_key /etc/ssl/private/api.key;

    # Preserve client information
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Default routing based on path
    location / {
        proxy_pass http://$backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    # Users endpoint with percentage-based split
    location /api/v2/users {
        proxy_pass http://$user_service_variant;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    # Health checks
    location /health {
        access_log off;
        return 200 'healthy';
    }
}
```

How you decide which backend handles a request is a critical design decision. Different strategies suit different stages of migration and different risk tolerances.
Path-Based Routing: `/api/v2/users/*` goes to user-service, everything else to the monolith. Simple and predictable, but coarse-grained.

Percentage-Based Routing: a configured fraction of traffic (say 10%) goes to the new service, as in the `split_clients` example above. Enables gradual, reversible rollout.

Header-Based Overrides: an explicit request header forces traffic to a specific backend, useful for testing and incident response.

User-Based Routing: specific users or cohorts (internal employees, beta testers) reach the new service before general traffic does.

Composing Strategies:
In practice, you'll combine multiple strategies. A sophisticated configuration checks for an explicit override header first, then routes designated users to the new service, then applies a percentage split to everything else, with the monolith as the final default.
This composition provides layered control: you can always override the default behavior for specific users or requests, while the percentage split handles general traffic.
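The composition above can be sketched as a single routing function. The backend names, the `betaUsers` cohort, and the `X-Force-New` header are illustrative assumptions; `X-Force-Legacy` comes from the tip below:

```typescript
type Backend = 'monolith' | 'user-service';

interface RouteInput {
  headers: Record<string, string>; // lowercase header names
  userId?: string;
  percentToNew: number;            // e.g. 10 means 10% of traffic
}

const betaUsers = new Set(['alice', 'bob']); // hypothetical cohort

// Deterministic hash so a given user always lands in the same bucket.
function bucket(key: string): number {
  let h = 0;
  for (const c of key) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % 100;
}

function chooseBackend(req: RouteInput): { backend: Backend; reason: string } {
  // 1. Explicit overrides always win (the incident escape hatch).
  if (req.headers['x-force-legacy'] === 'true') {
    return { backend: 'monolith', reason: 'header-override' };
  }
  if (req.headers['x-force-new'] === 'true') {
    return { backend: 'user-service', reason: 'header-override' };
  }
  // 2. Known cohorts go to the new service ahead of general traffic.
  if (req.userId && betaUsers.has(req.userId)) {
    return { backend: 'user-service', reason: 'beta-cohort' };
  }
  // 3. Percentage split handles everyone else.
  if (req.userId && bucket(req.userId) < req.percentToNew) {
    return { backend: 'user-service', reason: 'percentage-split' };
  }
  // 4. Default: the monolith.
  return { backend: 'monolith', reason: 'default' };
}
```

Returning a `reason` alongside the backend is deliberate: it can be stamped into a response header so every routing decision is explainable after the fact.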
Always implement a header-based override that lets you force traffic to either backend. During an incident, being able to say "send `X-Force-Legacy: true` to bypass the migration" is invaluable. This escape hatch has saved countless on-call engineers in production emergencies.
Before shifting production traffic, you need confidence that the new service behaves identically to the monolith. Shadow traffic (also called dark launching or traffic mirroring) provides this confidence by sending copies of production requests to the new service without serving its responses to users.
How Shadow Traffic Works: the façade sends each request to the monolith and returns the monolith's response to the client as usual. Asynchronously, it mirrors a copy of the request to the new service, records both responses, and compares them offline. The shadow response is never served to the user.
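A minimal sketch of the mirroring wrapper, assuming handlers with a `(Request) => Promise<Response>` shape; the `sampleRate` parameter and `compare` callback signature are illustrative:

```typescript
type Handler = (req: Request) => Promise<Response>;

// The primary response is returned to the client; the shadow call runs
// in the background and only feeds a comparison callback.
function withShadow(
  primary: Handler,
  shadow: Handler,
  compare: (primaryRes: Response, shadowRes: Response) => void,
  sampleRate = 0.1
): Handler {
  return async (req: Request) => {
    // Clone before the primary consumes the body; only mirror safe,
    // idempotent reads, and only a sample of them.
    const mirror =
      req.method === 'GET' && Math.random() < sampleRate ? req.clone() : null;

    const primaryRes = await primary(req);

    if (mirror) {
      const primaryCopy = primaryRes.clone(); // clone before the client reads the body
      shadow(mirror)
        .then((shadowRes) => compare(primaryCopy, shadowRes))
        .catch(() => {
          /* shadow failures must never affect the user */
        });
    }
    return primaryRes;
  };
}
```

Note that the shadow promise is intentionally not awaited: shadow latency and shadow errors stay invisible to the client.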
Critical Considerations for Shadow Traffic:
Only Shadow Read Operations: Never shadow writes. Sending a 'create order' request to both backends would create duplicate orders. Shadow GET requests, not POST/PUT/DELETE.
Handle Non-Determinism: Timestamps, UUIDs, and random values will differ between backends. Your comparison logic must normalize these, comparing structure and semantics rather than exact bytes.
Monitor Shadow Latency: If the new service is slow, it's an early warning—even though users don't see it yet. Shadow traffic reveals performance problems before they impact users.
Beware of Side Effects: Even 'read' operations might have side effects (cache warming, analytics events, rate limiting). Ensure shadowing doesn't cause problems.
Sample if Necessary: Shadowing 100% of traffic can overload the new service. Start with a sample rate (1%, 10%) and increase as the service proves stable.
A response comparator that applies these rules might look like this:

```typescript
interface ComparisonResult {
  requestId: string;
  match: boolean;
  differences: Difference[];
  primaryLatencyMs: number;
  shadowLatencyMs: number;
}

interface Difference {
  path: string;
  primaryValue: unknown;
  shadowValue: unknown;
  type: 'missing' | 'extra' | 'mismatch';
}

class ShadowComparator {
  private ignorePaths: Set<string> = new Set([
    '$.timestamp',
    '$.requestId',
    '$.serverVersion',
    '$.traceId',
  ]);

  async compare(
    requestId: string,
    primary: Response,
    shadow: Response,
    primaryLatencyMs: number,
    shadowLatencyMs: number
  ): Promise<ComparisonResult> {
    const differences: Difference[] = [];

    // Compare status codes
    if (primary.status !== shadow.status) {
      differences.push({
        path: '$.statusCode',
        primaryValue: primary.status,
        shadowValue: shadow.status,
        type: 'mismatch',
      });
    }

    // Compare response bodies
    const primaryBody = await this.normalizeBody(primary);
    const shadowBody = await this.normalizeBody(shadow);
    this.compareObjects('$', primaryBody, shadowBody, differences);

    const result: ComparisonResult = {
      requestId,
      match: differences.length === 0,
      differences,
      primaryLatencyMs,
      shadowLatencyMs,
    };

    // Log discrepancies for investigation
    if (!result.match) {
      await this.logDiscrepancy(result);
    }

    // Alert on significant latency differences
    if (shadowLatencyMs > primaryLatencyMs * 2) {
      await this.alertLatencyRegression(requestId, primaryLatencyMs, shadowLatencyMs);
    }

    return result;
  }

  private compareObjects(
    path: string,
    primary: any,
    shadow: any,
    differences: Difference[]
  ): void {
    // Skip ignored paths
    if (this.ignorePaths.has(path)) return;

    // Normalize timestamps for comparison
    if (this.isTimestamp(primary) && this.isTimestamp(shadow)) {
      // Timestamps within 1 second are considered matching
      if (Math.abs(new Date(primary).getTime() - new Date(shadow).getTime()) < 1000) {
        return;
      }
    }

    // Handle different types
    if (typeof primary !== typeof shadow) {
      differences.push({
        path,
        primaryValue: primary,
        shadowValue: shadow,
        type: 'mismatch',
      });
      return;
    }

    // Handle arrays
    if (Array.isArray(primary) && Array.isArray(shadow)) {
      const maxLength = Math.max(primary.length, shadow.length);
      for (let i = 0; i < maxLength; i++) {
        this.compareObjects(`${path}[${i}]`, primary[i], shadow[i], differences);
      }
      return;
    }

    // Handle objects (guard against null: typeof null === 'object')
    if (typeof primary === 'object' && primary !== null && shadow !== null) {
      const allKeys = new Set([...Object.keys(primary), ...Object.keys(shadow)]);
      for (const key of allKeys) {
        const newPath = `${path}.${key}`;
        if (!(key in primary)) {
          differences.push({
            path: newPath,
            primaryValue: undefined,
            shadowValue: shadow[key],
            type: 'extra',
          });
        } else if (!(key in shadow)) {
          differences.push({
            path: newPath,
            primaryValue: primary[key],
            shadowValue: undefined,
            type: 'missing',
          });
        } else {
          this.compareObjects(newPath, primary[key], shadow[key], differences);
        }
      }
      return;
    }

    // Handle primitives
    if (primary !== shadow) {
      differences.push({
        path,
        primaryValue: primary,
        shadowValue: shadow,
        type: 'mismatch',
      });
    }
  }

  private isTimestamp(value: unknown): boolean {
    if (typeof value !== 'string') return false;
    const date = new Date(value);
    return !isNaN(date.getTime());
  }

  private async normalizeBody(response: Response): Promise<any> {
    // Read the body once; a failed json() would consume it,
    // so read text and parse from there
    const text = await response.text();
    try {
      return JSON.parse(text);
    } catch {
      return text;
    }
  }

  private async logDiscrepancy(result: ComparisonResult): Promise<void> {
    // Send to observability platform
    console.log('Shadow comparison discrepancy:', JSON.stringify(result, null, 2));
  }

  private async alertLatencyRegression(
    requestId: string,
    primaryMs: number,
    shadowMs: number
  ): Promise<void> {
    console.warn(`Latency regression detected: primary=${primaryMs}ms, shadow=${shadowMs}ms`);
  }
}
```

New services will fail. This isn't pessimism—it's realism. The routing façade must gracefully handle these failures, ideally by falling back to the monolith until the new service recovers. This is your safety net during migration.
The Fallback Hierarchy: attempt the new service first; if it fails or its circuit is open, fall back to the monolith; if the monolith also fails, return an explicit error (such as a 503) to the client rather than hanging.
Circuit Breaker Configuration:
Circuit breakers prevent a failing service from being overwhelmed with requests (making recovery harder) and protect your system from wasting resources on doomed requests. Key parameters: the failure threshold (failures before the circuit opens), the timeout (how long the circuit stays open before a half-open probe), the success threshold (successful probes needed to close it again), and the volume threshold (minimum request count before the circuit is evaluated).
```typescript
interface CircuitBreakerState {
  status: 'closed' | 'open' | 'half-open';
  failures: number;
  successes: number;
  lastFailureTime: number | null;
  nextAttemptTime: number | null;
}

interface CircuitBreakerConfig {
  failureThreshold: number; // Number of failures to open
  successThreshold: number; // Successes to close from half-open
  timeout: number;          // Ms before half-open transition
  volumeThreshold: number;  // Min requests before evaluation
}

class FallbackRouter {
  private circuits: Map<string, CircuitBreakerState> = new Map();
  private config: CircuitBreakerConfig = {
    failureThreshold: 5,
    successThreshold: 3,
    timeout: 30000,
    volumeThreshold: 10,
  };

  async route(request: Request, primary: Backend, fallback: Backend): Promise<Response> {
    const circuitId = primary.name;
    const circuit = this.getCircuit(circuitId);

    // Check if circuit is open
    if (circuit.status === 'open') {
      if (Date.now() < (circuit.nextAttemptTime || 0)) {
        // Circuit is open, use fallback
        console.log(`Circuit open for ${circuitId}, using fallback`);
        return this.callWithMetrics(fallback, request, 'fallback');
      }
      // Time to try again - transition to half-open
      circuit.status = 'half-open';
      circuit.successes = 0;
    }

    try {
      // Attempt primary
      const response = await this.callWithMetrics(primary, request, 'primary');

      if (response.ok) {
        this.recordSuccess(circuitId);
        return response;
      } else if (response.status >= 500) {
        // Server error - treat as failure
        this.recordFailure(circuitId);
        return this.callWithMetrics(fallback, request, 'fallback');
      }

      // Client error (4xx) - not a circuit failure
      return response;
    } catch (error) {
      // Network error, timeout, etc.
      this.recordFailure(circuitId);
      console.error(`Primary failed for ${circuitId}:`, error);

      try {
        return await this.callWithMetrics(fallback, request, 'fallback');
      } catch (fallbackError) {
        console.error('Fallback also failed:', fallbackError);
        return new Response(
          JSON.stringify({ error: 'Service temporarily unavailable' }),
          { status: 503 }
        );
      }
    }
  }

  private getCircuit(id: string): CircuitBreakerState {
    if (!this.circuits.has(id)) {
      this.circuits.set(id, {
        status: 'closed',
        failures: 0,
        successes: 0,
        lastFailureTime: null,
        nextAttemptTime: null,
      });
    }
    return this.circuits.get(id)!;
  }

  private recordSuccess(circuitId: string): void {
    const circuit = this.getCircuit(circuitId);

    if (circuit.status === 'half-open') {
      circuit.successes++;
      if (circuit.successes >= this.config.successThreshold) {
        // Enough successes - close the circuit
        circuit.status = 'closed';
        circuit.failures = 0;
        console.log(`Circuit closed for ${circuitId}`);
      }
    } else {
      // Reset failure count on success
      circuit.failures = Math.max(0, circuit.failures - 1);
    }
  }

  private recordFailure(circuitId: string): void {
    const circuit = this.getCircuit(circuitId);

    if (circuit.status === 'half-open') {
      // Failed during probe - reopen circuit
      circuit.status = 'open';
      circuit.nextAttemptTime = Date.now() + this.config.timeout;
      console.log(`Circuit reopened for ${circuitId}`);
    } else {
      circuit.failures++;
      circuit.lastFailureTime = Date.now();

      if (circuit.failures >= this.config.failureThreshold) {
        // Threshold exceeded - open circuit
        circuit.status = 'open';
        circuit.nextAttemptTime = Date.now() + this.config.timeout;
        console.log(`Circuit opened for ${circuitId} after ${circuit.failures} failures`);
      }
    }
  }

  private async callWithMetrics(
    backend: Backend,
    request: Request,
    role: 'primary' | 'fallback'
  ): Promise<Response> {
    const start = Date.now();
    try {
      const response = await backend.handle(request);
      const duration = Date.now() - start;
      // Emit metrics
      this.emitMetric('request_duration_ms', duration, {
        backend: backend.name,
        role,
        status: response.status.toString(),
      });
      return response;
    } catch (error) {
      const duration = Date.now() - start;
      this.emitMetric('request_duration_ms', duration, {
        backend: backend.name,
        role,
        status: 'error',
      });
      throw error;
    }
  }

  private emitMetric(name: string, value: number, labels: Record<string, string>): void {
    // Integration with your metrics system (Prometheus, DataDog, etc.)
    console.log(`Metric: ${name}=${value}`, labels);
  }
}

interface Backend {
  name: string;
  handle(request: Request): Promise<Response>;
}
```

If you've shifted 80% of traffic to a new service and it fails, that traffic will suddenly hit the monolith. Ensure the monolith can still handle full load, or your fallback becomes a cascading failure. Test fallback scenarios regularly to verify capacity assumptions.
One of the trickiest aspects of routing façade design is handling session state and stateful interactions. When a user's requests might be handled by different backends, state consistency becomes critical.
Common Stateful Challenges:
Session Storage: User sessions may be stored in the monolith's local memory or database. New services need access.
Shopping Cart State: During checkout migration, cart state might be split between old and new systems.
Multi-Step Flows: Wizards, checkout processes, and other multi-step operations may span both backends.
Authentication Tokens: JWTs or session cookies need to be valid across all backends.
Rate Limiting State: Request counts for rate limiting must be consistent across backends.
Sticky Sessions: Use with Caution
Sticky sessions (routing all requests from the same user to the same backend) seem like an easy solution. Indeed, they can help during migration by ensuring consistency. However, they have real costs: stickiness is lost when instances restart or scale, it skews load distribution and your traffic-split percentages, and it hides state-consistency bugs that resurface the moment a session lands on the other backend.
Use sticky sessions as an optimization (avoiding session lookup) rather than a correctness requirement (the only way things work). Your system should function correctly even without stickiness—just potentially with higher latency for session lookup.
Before you begin migration, externalize all stateful dependencies. Move sessions to Redis, move caches to a shared tier, move rate-limiting state to a distributed counter. This investment pays dividends throughout migration by making any request routable to any backend.
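Externalized state reduces to a shared store behind a small interface that every backend uses. The sketch below uses an in-memory map as a stand-in for Redis or another shared tier; the interface and names are illustrative assumptions:

```typescript
// Any backend -- monolith or microservice -- touches sessions only
// through this interface, so any request can be routed anywhere.
interface SessionStore {
  get(sessionId: string): Promise<Record<string, unknown> | null>;
  set(sessionId: string, data: Record<string, unknown>, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in for a shared tier. In production this would be
// backed by Redis (or similar) so all backends see the same state.
class InMemorySessionStore implements SessionStore {
  private sessions = new Map<
    string,
    { data: Record<string, unknown>; expiresAt: number }
  >();

  async get(sessionId: string): Promise<Record<string, unknown> | null> {
    const entry = this.sessions.get(sessionId);
    if (!entry || Date.now() > entry.expiresAt) {
      this.sessions.delete(sessionId); // lazily expire stale sessions
      return null;
    }
    return entry.data;
  }

  async set(
    sessionId: string,
    data: Record<string, unknown>,
    ttlSeconds: number
  ): Promise<void> {
    this.sessions.set(sessionId, {
      data,
      expiresAt: Date.now() + ttlSeconds * 1000,
    });
  }
}
```

Because the interface is async from the start, swapping the in-memory implementation for a networked one changes no calling code.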
When requests can go to either the monolith or microservices, debugging becomes significantly more complex. Your observability strategy must account for this complexity, providing clear visibility into routing decisions and cross-backend request flows.
Essential Headers for Debugging:
Inject these headers at the façade for every response, enabling rapid debugging:
```
X-Backend: user-service          # Which backend handled this
X-Backend-Version: 1.2.3         # Backend's version
X-Correlation-ID: abc-123        # Request trace ID
X-Route-Reason: percentage-split # Why this routing was chosen
X-Fallback-Used: false           # Whether fallback was triggered
```
These headers let any engineer quickly determine which system handled a request, invaluable when investigating discrepancies or debugging production issues.
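Stamping these headers at the façade is a small wrapper around the backend response. A sketch (the `RoutingDecision` shape is an assumption; the header names match the list above):

```typescript
interface RoutingDecision {
  backend: string;
  backendVersion: string;
  correlationId: string;
  reason: string;
  fallbackUsed: boolean;
}

// Copy the backend response and stamp the debugging headers onto it,
// leaving status and body untouched.
function withDebugHeaders(res: Response, d: RoutingDecision): Response {
  const headers = new Headers(res.headers);
  headers.set('X-Backend', d.backend);
  headers.set('X-Backend-Version', d.backendVersion);
  headers.set('X-Correlation-ID', d.correlationId);
  headers.set('X-Route-Reason', d.reason);
  headers.set('X-Fallback-Used', String(d.fallbackUsed));
  return new Response(res.body, { status: res.status, headers });
}
```

If you worry about leaking topology details externally, emit these headers only for authenticated internal callers or strip them at the edge.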
Build a dashboard that shows, for each migrated endpoint: request volume, error rate, P50/P95/P99 latency, and response size—split by backend. This single dashboard becomes your migration control center, instantly revealing differences between old and new implementations.
The routing façade is the linchpin of the Strangler Fig Pattern. Without it, gradual migration is impossible. With a well-designed façade, migration becomes a controlled, reversible, observable process.
What's Next:
With the routing façade in place, the next page explores Extracting Functionality—the methodology for identifying, isolating, and migrating bounded contexts from the monolith to independent microservices. This is where the actual work of decomposition happens.
You now understand how to design and implement the routing façade that enables gradual migration. From implementation approaches to routing strategies, shadow traffic to circuit breakers, you have the tools to build a robust traffic management layer. Next, we'll learn how to identify and extract functionality from the monolith.