In the pre-mesh era, routing decisions happened at the application level or through central load balancers with limited context. Traffic splitting for canary deployments required custom infrastructure. Retries were scattered across client code with inconsistent implementation. Circuit breaking, when implemented at all, lived in libraries that needed per-language support.
Service mesh transforms traffic management from a fragmented responsibility into a unified, declarative capability. The mesh knows about every service, every endpoint, and every request. It can make intelligent routing decisions, implement sophisticated deployment strategies, and enforce resilience policies—all without changing application code.
This is arguably the most transformative capability of service mesh: making traffic behavior a configurable resource rather than an application development concern.
By the end of this page, you will understand: how service meshes implement request routing and traffic splitting; load balancing algorithms and their trade-offs; retry policies and timeout configuration; circuit breaking for resilience; advanced deployment patterns (canary, blue-green, A/B testing); and traffic mirroring for safe validation.
At its core, service mesh traffic management answers a simple question: when a request arrives, where should it go? The answer depends on rich context the mesh can inspect:
The Routing Decision Pipeline:
When a sidecar proxy receives an outbound request (e.g., http://product-service:8080/api/products), it proceeds through a multi-stage routing decision:
```
ROUTING DECISION PIPELINE

Outbound Request: GET http://product-service:8080/api/products/123

STEP 1: Service Discovery
  → Resolve "product-service" to available endpoints
  → Query service registry (Kubernetes endpoints, Consul catalog)
  → Result: [10.1.2.3:8080, 10.1.2.4:8080, 10.1.2.5:8080]
          │
          ▼
STEP 2: Route Matching (VirtualService / ServiceRouter)
  → Does any route rule match this request?
      - Match by path: /api/products/*
      - Match by headers: x-user-type: premium
      - Match by method: GET
  → If matched, apply route-specific destination/weights
  → Result: Route to subset "v2" with 10% weight
          │
          ▼
STEP 3: Subset Selection (DestinationRule)
  → Filter endpoints by subset labels (e.g., version=v2)
  → Apply subset-specific traffic policies
  → Result: [10.1.2.5:8080] (only v2 endpoints)
          │
          ▼
STEP 4: Load Balancing
  → Apply load balancing algorithm (round-robin, least-conn, etc.)
  → Consider endpoint health, locality, weight
  → Result: Select 10.1.2.5:8080
          │
          ▼
STEP 5: Connection & Policy Application
  → Establish mTLS connection to destination proxy
  → Apply timeout, retry policies
  → Forward request with telemetry
```

Key Routing Abstractions:
Different meshes use different abstractions, but the concepts map:
| Concept | Istio | Linkerd | Consul Connect |
|---|---|---|---|
| Route rules | VirtualService | HTTPRoute (Gateway API) | ServiceRouter |
| Destination config | DestinationRule | ServiceProfile | ServiceDefaults |
| Service subsets | subset (labels) | TrafficSplit | ServiceResolver |
| Retry/timeout | Per-route in VS | ServiceProfile | ServiceDefaults |
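The whole decision can be condensed into a runnable sketch. This is an illustrative model, not any mesh's real API: the `ENDPOINTS`, `ROUTES`, and `SUBSETS` structures and the trivial random load balancer are assumptions standing in for steps 1 through 4.

```python
import random

ENDPOINTS = [  # Step 1: service discovery result (illustrative data)
    {"addr": "10.1.2.3:8080", "labels": {"version": "v1"}},
    {"addr": "10.1.2.4:8080", "labels": {"version": "v1"}},
    {"addr": "10.1.2.5:8080", "labels": {"version": "v2"}},
]

ROUTES = [  # Step 2: route rules, evaluated in order; first match wins
    {"header": ("x-user-tier", "premium"), "subset": "v2"},
    {"header": None, "subset": "v1"},  # generic catch-all route goes last
]

SUBSETS = {"v1": {"version": "v1"}, "v2": {"version": "v2"}}  # Step 3

def route(headers: dict) -> str:
    for rule in ROUTES:
        match = rule["header"]
        if match is None or headers.get(match[0]) == match[1]:
            wanted = SUBSETS[rule["subset"]]
            pool = [e for e in ENDPOINTS
                    if all(e["labels"].get(k) == v for k, v in wanted.items())]
            return random.choice(pool)["addr"]  # Step 4: trivial load balancer
    raise LookupError("no route matched")

print(route({"x-user-tier": "premium"}))  # 10.1.2.5:8080 (only v2 endpoint)
```

Premium traffic matches the first rule and is filtered down to the single v2 endpoint; everything else falls through to the v1 catch-all, echoing the first-match-wins ordering discussed above.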
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service-routing
  namespace: ecommerce
spec:
  hosts:
  - product-service        # Internal service name
  - products.example.com   # External hostname (if exposed)
  http:
  # Route 1: A/B test - premium users get new experience
  - name: "premium-users-new-ui"
    match:
    - headers:
        x-user-tier:
          exact: "premium"
        x-feature-flags:
          regex: ".*new-ui.*"
    route:
    - destination:
        host: product-service
        subset: v2-experimental
      weight: 100
    timeout: 5s
    retries:
      attempts: 2
      perTryTimeout: 2s
  # Route 2: Path-based routing for specific features
  - name: "recommendations-api"
    match:
    - uri:
        prefix: "/api/v2/recommendations"
    route:
    - destination:
        host: recommendation-service   # Route to a different service
        port:
          number: 8080
    timeout: 10s   # Recommendations can be slower
  # Route 3: Canary deployment - gradual rollout
  - name: "canary-rollout"
    route:
    - destination:
        host: product-service
        subset: v1-stable
      weight: 90
    - destination:
        host: product-service
        subset: v2-canary
      weight: 10
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "5xx,reset,connect-failure,retriable-4xx"
    timeout: 3s
    # Fault injection for testing (comment out in prod)
    # fault:
    #   delay:
    #     percentage:
    #       value: 1
    #     fixedDelay: 5s
    #   abort:
    #     percentage:
    #       value: 0.1
    #     httpStatus: 500
```

Routes are evaluated in order—first match wins. Place more specific routes (with header/path matches) before generic catch-all routes. A common mistake is placing a generic route first, which matches all requests and ignores subsequent specific rules.
One of the most powerful traffic management capabilities is traffic splitting—directing percentages of traffic to different versions of a service. This enables deployment strategies that were previously complex and risky:
Canary Deployments:
Deploy a new version alongside the existing one, sending a small percentage of traffic (1-5%) to the canary. Monitor error rates, latency, and business metrics. If the canary performs well, gradually increase traffic percentage. If problems occur, immediately route 100% to stable.
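The promote-or-rollback loop can be sketched in a few lines. This is an illustrative control loop with assumed names and thresholds; real controllers such as Flagger drive the decision from Prometheus metrics.

```python
def run_canary(check_metrics, step=10, max_weight=100):
    """Raise the canary weight stepwise; roll back to 0 on any failed check.

    check_metrics(weight) should return True while error rates and latency
    stay within thresholds. Returns 100 on promotion, 0 on rollback.
    """
    weight = 0
    while weight < max_weight:
        weight = min(weight + step, max_weight)
        if not check_metrics(weight):
            return 0  # immediate rollback: route 100% to stable
    return weight

print(run_canary(lambda w: True))    # healthy canary -> 100 (promoted)
print(run_canary(lambda w: w < 30))  # regression at 30% traffic -> 0
```

The key property is that traffic percentage only increases while evidence stays good, and a single failed check snaps everything back to stable.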
Blue-Green Deployments:
Maintain two complete environments (blue and green). Deploy new version to the inactive environment, validate, then switch traffic instantly from blue to green (or vice versa). The mesh enables instant traffic cutover without DNS propagation delays.
A/B Testing:
Route different user segments to different versions based on headers, cookies, or user attributes. Measure conversion rates, engagement, or other business metrics per version.
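For A/B tests, version assignment must be deterministic per user. A minimal sketch, assuming a stable user ID is available (for example in an `x-user-id` header); the function name and bucket scheme are illustrative.

```python
import hashlib

def assign_version(user_id: str, v2_percent: int) -> str:
    """Deterministically bucket a user into v1 or v2."""
    # Use a stable hash; Python's built-in hash() is randomized per process.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "v2" if bucket < v2_percent else "v1"

# The same user gets the same version on every request:
assert assign_version("user-42", 10) == assign_version("user-42", 10)
```

A mesh achieves the per-user stickiness half of this with consistent hashing on a user ID header, so measurements of conversion or engagement are not polluted by users bouncing between versions.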
```
CANARY DEPLOYMENT TIMELINE

Day 0: Deploy v2 canary (no traffic)
────────────────────────────────────────────────────────────────
v1 ████████████████████████████████████████████████████ 100%
v2                                                        0%

Day 1: Initial canary (1% traffic)
────────────────────────────────────────────────────────────────
v1 ███████████████████████████████████████████████████   99%
v2 █                                                      1%
📊 Monitor: Error rate 0.1%, P99 latency 45ms (same as v1)

Day 2: Increased canary (5% traffic)
────────────────────────────────────────────────────────────────
v1 █████████████████████████████████████████████████     95%
v2 ███                                                    5%
📊 Monitor: Error rate 0.1%, P99 latency 42ms (better!)

Day 3: Expanded canary (25% traffic)
────────────────────────────────────────────────────────────────
v1 █████████████████████████████████████████             75%
v2 █████████████                                         25%
📊 Monitor: All metrics nominal

Day 4: Majority canary (50% traffic)
────────────────────────────────────────────────────────────────
v1 ██████████████████████████                            50%
v2 ██████████████████████████                            50%
📊 Monitor: v2 showing 10% better conversion

Day 5: Full rollout (100% traffic to v2)
────────────────────────────────────────────────────────────────
v1 (standby for rollback if needed)                       0%
v2 ████████████████████████████████████████████████████ 100%
🎉 Canary complete - v2 is now stable
```
```yaml
# DestinationRule: Define service subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: product-service
  namespace: ecommerce
spec:
  host: product-service
  subsets:
  - name: stable
    labels:
      version: v1
  - name: canary
    labels:
      version: v2
---
# VirtualService: Day 1 - 1% canary
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service
  namespace: ecommerce
spec:
  hosts:
  - product-service
  http:
  - route:
    - destination:
        host: product-service
        subset: stable
      weight: 99
    - destination:
        host: product-service
        subset: canary
      weight: 1
---
# Update weights as the canary progresses...
# Day 2: weight: 95/5
# Day 3: weight: 75/25
# Day 4: weight: 50/50
# Day 5: weight: 0/100
---
# Automated progressive delivery with Flagger (GitOps approach)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: product-service
  namespace: ecommerce
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: product-service
  progressDeadlineSeconds: 600
  service:
    port: 8080
    targetPort: 8080
  analysis:
    # Canary analysis interval
    interval: 1m
    # Max number of failed checks before rollback
    threshold: 5
    # Max traffic percentage routed to canary
    maxWeight: 50
    # Canary increment step
    stepWeight: 10
    # Prometheus metrics to analyze
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500   # milliseconds
      interval: 1m
```

Traffic splitting is typically per-request, not per-user. A user might hit v1 on one request and v2 on the next. For consistent user experience during A/B tests, use header-based routing with session affinity or hash-based load balancing on user ID. This ensures the same user consistently reaches the same version.
Once the mesh selects which service and subset to route to, it must choose among multiple endpoints (pod instances). Load balancing algorithms determine this selection, each with distinct characteristics suitable for different workloads.
| Algorithm | How It Works | Best For | Limitations |
|---|---|---|---|
| Round Robin | Cycles through endpoints sequentially | Homogeneous endpoints, uniform request cost | Ignores endpoint load; slow endpoints back up requests |
| Least Connections | Routes to endpoint with fewest active connections | Variable request duration workloads | May not account for in-progress request completion |
| Random | Randomly selects endpoint | Simple, no state needed | Can create uneven distribution short-term |
| Weighted | Routes proportionally to configured weights | Heterogeneous capacity endpoints | Requires manual weight configuration |
| Consistent Hash | Hashes request attribute to select endpoint | Session stickiness, caching locality | Rebalancing on endpoint changes |
| Locality-Aware | Prefers endpoints in same zone/region | Multi-zone deployments, latency reduction | May reduce effective capacity |
| Least Request | Routes to endpoint with fewest pending requests | High-throughput scenarios | Requires request counting state |
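To make the consistent-hash row concrete, here is a toy hash ring. This is an illustrative sketch, not Envoy's actual `RING_HASH` implementation: requests hash onto a ring of virtual nodes, and the first node clockwise owns the key, so removing one endpoint remaps only the keys that endpoint owned (the "rebalancing on endpoint changes" limitation in its mildest form).

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Stable hash (md5 here; real ring-hash balancers use faster hashes)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, endpoints, vnodes=100):
        # Each endpoint owns `vnodes` positions on the ring for smoothness.
        self.ring = sorted((h(f"{ep}#{i}"), ep)
                           for ep in endpoints for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def pick(self, request_key: str) -> str:
        # First virtual node clockwise from the key's hash owns the key.
        i = bisect.bisect(self.keys, h(request_key)) % len(self.keys)
        return self.ring[i][1]

ring = Ring(["10.1.2.3", "10.1.2.4", "10.1.2.5"])
assert ring.pick("user-42") == ring.pick("user-42")  # sticky per key
```

If `10.1.2.5` is removed, only the keys it owned move to a new endpoint; every other key keeps its previous owner, which preserves session stickiness and cache locality for most traffic.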
```yaml
# Istio DestinationRule: Load Balancing Configuration
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: product-service
  namespace: ecommerce
spec:
  host: product-service
  trafficPolicy:
    # Global load balancing policy
    loadBalancer:
      # Simple algorithm selection
      simple: LEAST_REQUEST
      # OR: Consistent hash for session affinity
      # consistentHash:
      #   httpHeaderName: x-user-id   # Hash on user ID header
      #   # OR
      #   httpCookie:
      #     name: SERVERID
      #     ttl: 3600s
      #   # OR
      #   useSourceIp: true
      # Locality-aware load balancing
      localityLbSetting:
        enabled: true
        # Failover order when local zone endpoints unavailable
        failover:
        - from: us-east-1a
          to: us-east-1b
        - from: us-east-1b
          to: us-east-1c
        # Distribute within region before cross-region
        distribute:
        - from: "us-east-1/*"
          to:
            "us-east-1/*": 80   # 80% to same region
            "us-west-2/*": 20   # 20% to other region (DR readiness)
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      # Subset-specific load balancing override
      loadBalancer:
        simple: ROUND_ROBIN
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: LEAST_CONN
```

```hcl
# Consul Connect: Load Balancing Configuration
Kind = "service-resolver"
Name = "product-service"

# Locality-aware routing
Redirect {
  Service    = "product-service"
  Datacenter = "dc1"
}

# Load balancing policy
LoadBalancer {
  Policy = "least_request"
  # Ring hash for consistent hashing
  # RingHashConfig {
  #   MinimumRingSize = 1024
  # }
  # Least request config
  LeastRequestConfig {
    ChoiceCount = 2   # Power of two random choices
  }
}
```

Locality-Aware Load Balancing:
In multi-zone or multi-region deployments, locality-aware load balancing is critical for latency and cost optimization:
Zone Affinity: Prefer endpoints in the same availability zone. Cross-zone traffic incurs latency and often cost.
Weighted Distribution: Configure what percentage of traffic should stay local vs. distribute for resilience.
Failover Ordering: Define where traffic should go when local endpoints are unhealthy.
Priority Levels: Istio supports priority levels where higher-priority zones are preferred until their endpoints are saturated.
The 'LEAST_REQUEST' algorithm in Envoy often uses the 'Power of Two Random Choices' technique: pick two endpoints randomly, route to the one with fewer pending requests. This achieves near-optimal load distribution with minimal coordination overhead. It's the recommended default for most workloads.
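A quick simulation of power-of-two-choices (an illustrative sketch; real proxies also track request completions, which this omits):

```python
import random

def p2c_pick(pending: dict) -> str:
    """Pick two endpoints at random; route to the one with fewer pending."""
    a, b = random.sample(list(pending), 2)
    return a if pending[a] <= pending[b] else b

random.seed(1)  # deterministic demo
pending = {"10.1.2.3": 0, "10.1.2.4": 0, "10.1.2.5": 0}
for _ in range(9000):
    pending[p2c_pick(pending)] += 1  # dispatch (completions omitted)

print(pending)  # near-even split across the three endpoints
```

Despite sampling only two endpoints per request, the counts stay almost perfectly balanced, which is why this technique approximates a full least-request scan at a fraction of the bookkeeping cost.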
Distributed systems experience transient failures—network glitches, momentary overload, rolling updates. Without automatic retries, every such failure propagates to users. Without timeouts, slow downstream services cause cascading resource exhaustion.
Service mesh provides consistent retry and timeout policies at the infrastructure level.
Retry Configuration Principles:
Idempotency First: Retry only idempotent operations; a retried POST can create duplicate resources.

Per-Try Timeouts: Bound each attempt so that all retries fit within the overall request timeout.

Bounded Volume: Cap retries with attempt limits or retry budgets to prevent retry amplification.

Selective Conditions: Choose retryOn conditions carefully; retry transient failures (connection resets, 503s), not deterministic errors like 400s.
```yaml
# Istio VirtualService: Retry and Timeout Configuration
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service
  namespace: ecommerce
spec:
  hosts:
  - product-service
  http:
  - route:
    - destination:
        host: product-service
    # Overall request timeout (including all retries)
    timeout: 10s
    retries:
      # Maximum retry attempts (excludes initial request)
      attempts: 3
      # Timeout per retry attempt
      perTryTimeout: 2s
      # Conditions that trigger retry
      retryOn: >-
        5xx,
        reset,
        connect-failure,
        retriable-4xx,
        refused-stream,
        retriable-status-codes,
        retriable-headers
      # Allow retries against endpoints in other localities
      retryRemoteLocalities: true
---
# Linkerd ServiceProfile: Retry with Budget
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: product-service.ecommerce.svc.cluster.local
  namespace: ecommerce
spec:
  routes:
  - name: GET /products
    condition:
      method: GET
      pathRegex: /products.*
    isRetryable: true    # Mark this route as safe to retry
    timeout: 5s
  - name: POST /products
    condition:
      method: POST
      pathRegex: /products
    isRetryable: false   # POST creates resources - don't retry
    timeout: 10s
  # Retry budget prevents retry amplification
  retryBudget:
    # Max ratio of retries to original requests
    retryRatio: 0.2           # At most 20% of traffic can be retries
    # Minimum retries per second (regardless of ratio)
    minRetriesPerSecond: 10
    # How long to consider past requests for the budget
    ttl: 10s
```

```hcl
# Consul Connect: Retry Configuration
Kind     = "service-defaults"
Name     = "product-service"
Protocol = "http"

UpstreamConfig {
  Defaults {
    ConnectTimeoutMs = 5000   # Connection establishment timeout
    Limits {
      MaxConnections = 100
    }
  }
}

# Note: Consul relies on Envoy's retry configuration.
# Configure retries via proxy-defaults or a service-router.
```

Timeout Strategy:
Timeouts must be carefully coordinated across call chains. Consider a three-service chain: A → B → C. If A gives up after 3 seconds but B waits up to 10 seconds for C, then B (and C) keep working on requests whose caller has already abandoned them, wasting resources on responses nobody will read.

Solution: Cascading Timeouts

Give the outermost caller the largest budget and shrink the timeout at each inner hop, leaving headroom for retries and network overhead (for example, A waits 10s overall while B caps each attempt against C at 2s). This ensures timeouts cascade inward, preventing wasted work downstream when upstream has already given up.
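The budget arithmetic for cascading timeouts can be sketched as follows; the specific numbers are illustrative:

```python
def hop_budget(per_try_s: float, retries: int) -> float:
    """Worst-case time a hop can spend: initial attempt plus retries."""
    return per_try_s * (1 + retries)

# Innermost hop first: B -> C at 1s per attempt with 1 retry.
c_budget = hop_budget(per_try_s=1.0, retries=1)             # 2.0s
# B's per-try budget must cover C's worst case plus overhead.
b_budget = hop_budget(per_try_s=c_budget + 0.5, retries=1)  # 5.0s
# A's overall timeout sits above B's worst case.
a_timeout = b_budget + 1.0                                  # 6.0s

assert c_budget < b_budget < a_timeout  # budgets shrink moving inward
print(a_timeout)  # 6.0
```

Working from the innermost hop outward guarantees that no service is still waiting on work its caller has already timed out.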
Without retry budgets, retries can cause cascading failure. If Service A makes up to 3 attempts against B, and B makes up to 3 attempts against C, a single failing request can generate up to 9 requests to C. Across multiple callers, this amplification can overwhelm already-stressed services. Linkerd's retry budgets and Istio's outlier detection help contain this danger.
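The amplification arithmetic, plus a Linkerd-style budget cap, in a few lines (function names here are illustrative, not any library's API):

```python
def amplification(attempts_per_hop) -> int:
    """Worst-case number of requests hitting the innermost service."""
    total = 1
    for attempts in attempts_per_hop:
        total *= attempts
    return total

# 3 attempts at A->B times 3 attempts at B->C:
print(amplification([3, 3]))  # 9

def allowed_retries_per_s(original_rps: float, retry_ratio: float) -> float:
    """Budget-style cap: retries limited to a ratio of original traffic."""
    return original_rps * retry_ratio

print(allowed_retries_per_s(1000, 0.2))  # 200.0
```

Because amplification is multiplicative with chain depth, a ratio-based budget (rather than a fixed per-request retry count) is what keeps retry volume bounded as load rises.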
Circuit breaking is a resilience pattern that prevents cascading failures by failing fast when a downstream service is unhealthy. Rather than sending requests to a failing service (which wastes resources and delays failure response), the circuit "opens" and immediately returns errors.
Circuit States:
Closed (Normal): Requests flow normally. The circuit monitors for failures.
Open (Tripped): Too many failures detected. New requests fail immediately without attempting the downstream call.
Half-Open (Testing): After a cooldown period, circuit allows a limited number of test requests. If they succeed, circuit closes. If they fail, circuit reopens.
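The three states map directly onto a small state machine. This is a simplified sketch with assumed thresholds and naive clock handling; real implementations such as Envoy's outlier detection apply this logic per endpoint.

```python
import time

class CircuitBreaker:
    """Simplified three-state breaker; thresholds are illustrative."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0):
        self.state = "CLOSED"
        self.failures = 0
        self.threshold = failure_threshold
        self.cooldown = cooldown_s
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.state = "HALF_OPEN"  # cooldown over: allow a probe
                return True
            return False  # fail fast without calling downstream
        return True  # CLOSED and HALF_OPEN let requests through

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.state = "CLOSED"  # a successful probe closes the circuit
        else:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.threshold:
                self.state = "OPEN"
                self.opened_at = time.monotonic()

cb = CircuitBreaker()
for _ in range(5):
    cb.record(success=False)  # 5 consecutive failures trip the breaker
print(cb.state, cb.allow_request())  # OPEN False
```

Note that in the half-open state a single failed probe reopens the circuit immediately, while a success resets the failure count and resumes normal traffic.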
Mesh Circuit Breaking:
Service mesh implements circuit breaking through outlier detection—identifying and ejecting unhealthy endpoints from the load balancer pool. When an endpoint has too many consecutive failures, it's temporarily removed. Traffic routes only to healthy endpoints.
```
CIRCUIT BREAKER STATE MACHINE

      ┌───────────────────┐
      │      CLOSED       │◄──────────────────────────────────────┐
      │                   │                                       │
      │ Normal operation  │          Success on test              │
      │ Requests flow     │          requests                     │
      │ through           │                                       │
      └────────┬──────────┘                                       │
               │                                                  │
               │ Failure threshold exceeded                       │
               │ (e.g., 5 consecutive 5xx)                        │
               ▼                                                  │
      ┌───────────────────┐                          ┌────────────┴────────┐
      │       OPEN        │   Timeout/cooldown       │      HALF-OPEN      │
      │                   │   period expires         │                     │
      │ Fail immediately  │ ────────────────────────►│  Testing phase      │
      │ No downstream     │                          │  Allow limited      │
      │ requests sent     │◄──────────────────────── │  test requests      │
      └───────────────────┘   Failures on test       └─────────────────────┘
                              requests: return to OPEN
```
```yaml
# Istio DestinationRule: Outlier Detection (Circuit Breaking)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: product-service
  namespace: ecommerce
spec:
  host: product-service
  trafficPolicy:
    # Connection pool limits (prevent resource exhaustion)
    connectionPool:
      tcp:
        maxConnections: 100   # Max TCP connections to service
        connectTimeout: 5s    # Connection establishment timeout
      http:
        h2UpgradePolicy: UPGRADE       # Use HTTP/2 when possible
        http1MaxPendingRequests: 100   # Max pending HTTP/1.1 requests
        http2MaxRequests: 1000         # Max concurrent HTTP/2 requests
        maxRequestsPerConnection: 100
        maxRetries: 3                  # Max concurrent retries
    # Outlier detection (circuit breaking logic)
    outlierDetection:
      # How often to analyze endpoints for ejection
      interval: 10s
      # Consecutive errors before ejection
      consecutive5xxErrors: 5
      # Or: consecutive locally originated failures
      # consecutiveLocalOriginFailures: 5
      # Consecutive gateway errors (502, 503, 504)
      consecutiveGatewayErrors: 5
      # How long an endpoint stays ejected
      baseEjectionTime: 30s
      # Maximum percentage of the pool that can be ejected
      maxEjectionPercent: 50
      # Min healthy percentage required (prevent ejecting everyone)
      minHealthPercent: 20
      # Track locally vs. externally originated errors separately
      splitExternalLocalOriginErrors: true
```

```hcl
# Consul Connect: Circuit Breaking via Service Defaults
Kind     = "service-defaults"
Name     = "product-service"
Protocol = "http"

UpstreamConfig {
  Defaults {
    PassiveHealthCheck {
      # Check interval
      Interval = "10s"
      # Consecutive failures for ejection
      MaxFailures = 5
      # Enforce ejection for 100% of consecutive-5xx detections
      EnforcingConsecutive5xx = 100
    }
    Limits {
      # Connection limits
      MaxConnections        = 100
      MaxPendingRequests    = 100
      MaxConcurrentRequests = 100
    }
  }
}
```

maxEjectionPercent prevents the circuit breaker from ejecting all endpoints during widespread failure. If set to 50%, at least half the endpoints always remain in the pool—even if technically unhealthy. This ensures service availability (with degraded quality) rather than complete outage. Balance this against minHealthPercent for your availability requirements.
Traffic mirroring—also called shadowing—sends copies of live production traffic to a non-production service for testing. The mirrored service's responses are ignored; only the primary service's response returns to the caller.
Use Cases:
Pre-production validation: Test a new version with real traffic patterns without user impact.
Performance benchmarking: Compare latency and resource consumption between versions under identical load.
Bug reproduction: Capture production traffic patterns that expose issues difficult to reproduce synthetically.
ML model validation: Validate new recommendation or fraud detection models against production traffic without affecting users.
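The fire-and-forget semantics can be modeled in a few lines of plain Python standing in for what the proxy does at the network layer; `handle`, `primary`, and `shadow` are illustrative names, not any mesh API.

```python
import random
import threading

def _mirror(shadow, request):
    try:
        shadow(request)  # shadow response is discarded
    except Exception:
        pass             # so are shadow-side failures

def handle(request, primary, shadow, mirror_percent=100.0):
    if random.uniform(0, 100) < mirror_percent:
        # Fire-and-forget: the caller never waits on the shadow.
        threading.Thread(target=_mirror, args=(shadow, request),
                         daemon=True).start()
    return primary(request)  # only the primary response is returned

def broken_v2(request):
    raise RuntimeError("v2 bug")  # simulated defect in the shadow version

print(handle({"path": "/api/products/123"},
             primary=lambda r: ("v1", 200), shadow=broken_v2))
```

Even with the shadow version crashing on every request, the client only ever sees v1's response; the crash shows up in the shadow's own metrics and logs, which is exactly where you want to find it.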
```
TRAFFIC MIRRORING FLOW

                     Client Request
                           │
                           ▼
                    ┌──────────────┐
                    │   Sidecar    │
                    │    Proxy     │
                    └──────┬───────┘
                           │
             ┌─────────────┴────────────────────┐
             │ Primary Request                   │ Mirrored Copy
             │ (synchronous, response            │ (fire-and-forget,
             │  returned to client)              │  response ignored)
             ▼                                   ▼
  ┌──────────────────────┐           ┌──────────────────────┐
  │   Primary Service    │           │    Shadow Service    │
  │    (v1 - Stable)     │           │    (v2 - Testing)    │
  │                      │           │                      │
  │  Handles actual      │           │  Processes request   │
  │  client request      │           │  but response is     │
  │                      │           │  discarded           │
  └──────────┬───────────┘           └──────────────────────┘
             │
             │ Response returned to client
             ▼
   Client receives response from v1 only

  📊 Meanwhile: Compare v1 vs v2 metrics, logs, behaviors
```
```yaml
# Istio VirtualService: Traffic Mirroring
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service
  namespace: ecommerce
spec:
  hosts:
  - product-service
  http:
  - route:
    - destination:
        host: product-service
        subset: v1
      weight: 100
    # Mirror 100% of traffic to v2
    mirror:
      host: product-service
      subset: v2
    # Percentage of traffic to mirror (optional, defaults to 100)
    mirrorPercentage:
      value: 100.0
    # Note: mirrored requests have "-shadow" appended to the Host/Authority
---
# With sampling for high-volume services
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: high-volume-service
  namespace: production
spec:
  hosts:
  - high-volume-service
  http:
  - route:
    - destination:
        host: high-volume-service
        subset: stable
    # Mirror only 10% of traffic to avoid overwhelming the shadow
    mirror:
      host: high-volume-service
      subset: shadow
    mirrorPercentage:
      value: 10.0
---
# The shadow service deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service-shadow
  namespace: ecommerce
spec:
  replicas: 2   # Smaller than production (no real traffic served)
  selector:
    matchLabels:
      app: product-service
      version: v2-shadow
  template:
    metadata:
      labels:
        app: product-service
        version: v2-shadow
    spec:
      containers:
      - name: product-service
        image: product-service:v2
        resources:
          # Can use fewer resources since responses are not needed
          requests:
            memory: "256Mi"
            cpu: "100m"
```

Mirrored requests can have unintended side effects! If your service writes to databases, sends emails, or calls external APIs, the shadow service will duplicate these operations. Ensure shadow services use isolated datastores, mock external dependencies, or have safeguards to prevent duplicate side effects.
How do you know your retry logic works? That your circuit breakers trip correctly? That your timeouts are configured properly? You test with intentional faults.
Service mesh enables fault injection—deliberately introducing failures to validate system resilience. Rather than waiting for production incidents, you create controlled failure scenarios.
Types of Fault Injection:

Delay Injection: Add artificial latency to a percentage of requests, simulating slow dependencies and exposing timeout misconfigurations.

Abort Injection: Fail a percentage of requests with a chosen HTTP status (e.g., 503), validating retry policies and circuit breakers.
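The two fault types the mesh injects, delays and aborts, behave like a wrapper around the real call. This client-side sketch (illustrative names and interface, not the proxy's actual mechanism) mirrors what the sidecar does on the wire:

```python
import random
import time

def with_faults(call, delay_pct=0, delay_s=0.0, abort_pct=0, abort_status=503):
    """Wrap `call` with probabilistic delay and abort faults."""
    def wrapped(request):
        if random.uniform(0, 100) < delay_pct:
            time.sleep(delay_s)                      # injected latency
        if random.uniform(0, 100) < abort_pct:
            return (abort_status, "injected fault")  # injected failure
        return call(request)
    return wrapped

random.seed(7)  # deterministic demo
chaotic = with_faults(lambda r: (200, "ok"), abort_pct=10)
codes = [chaotic({})[0] for _ in range(1000)]
print(codes.count(503))  # roughly 100 of 1000 requests aborted
```

Running resilience tests against a wrapped dependency like this is the same experiment the VirtualService fault stanza performs, except the mesh does it without touching application code.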
```yaml
# Istio VirtualService: Fault Injection for Testing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service-chaos
  namespace: ecommerce
spec:
  hosts:
  - payment-service
  http:
  # Inject faults only for test traffic (identified by header)
  - match:
    - headers:
        x-chaos-test:
          exact: "enabled"
    route:
    - destination:
        host: payment-service
    fault:
      # Delay 50% of requests by 5 seconds
      delay:
        percentage:
          value: 50
        fixedDelay: 5s
      # Abort 10% of requests with 503
      abort:
        percentage:
          value: 10
        httpStatus: 503
  # Normal traffic (no fault injection)
  - route:
    - destination:
        host: payment-service
---
# Targeted fault injection: Only affect specific paths
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: targeted-chaos
  namespace: ecommerce
spec:
  hosts:
  - inventory-service
  http:
  # Simulate slow database queries on a specific endpoint
  - match:
    - uri:
        prefix: "/api/inventory/check"
      headers:
        x-load-test:
          exact: "true"
    route:
    - destination:
        host: inventory-service
    fault:
      delay:
        percentage:
          value: 100
        fixedDelay: 3s   # Simulate slow inventory check
  # All other requests normal
  - route:
    - destination:
        host: inventory-service
---
# Chaos engineering: Random failures for a service
# (Istio has no wildcard/variable host substitution for mesh routing;
# apply one VirtualService per target service.)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: chaos-engineering
  namespace: staging   # Only in staging environment!
spec:
  hosts:
  - checkout-service   # Example target; repeat per service under test
  http:
  - route:
    - destination:
        host: checkout-service
    fault:
      # 1% of requests fail randomly
      abort:
        percentage:
          value: 1
        httpStatus: 500
      # 5% of requests delayed
      delay:
        percentage:
          value: 5
        fixedDelay: 2s
```

Fault injection configurations should be applied carefully. Use header-based matching to limit faults to test traffic. Apply to staging/test environments first. If running chaos experiments in production (advanced practice), use extremely low percentages and have immediate rollback capability. Always monitor impact during chaos experiments.
We've explored the full spectrum of service mesh traffic management capabilities. These features transform traffic behavior from hardcoded application logic into configurable infrastructure policy.
What's Next:
The final page examines observability through mesh—how service mesh provides metrics, distributed tracing, and service graphs that illuminate the behavior of distributed systems without application instrumentation.
You now understand the comprehensive traffic management capabilities of service mesh—from basic routing to advanced deployment strategies, from retry policies to chaos engineering. These tools enable operational confidence in complex distributed systems.