Kubernetes provides the networking primitives—pod IPs, Services, DNS. But as microservice architectures grow to hundreds of services, new challenges emerge that Kubernetes alone cannot solve: fine-grained traffic management, consistent service-to-service security, and uniform observability.
The answer is a service mesh—a dedicated infrastructure layer that handles service-to-service communication, providing a unified approach to traffic management, security, and observability without requiring changes to application code.
By the end of this page, you will understand the service mesh architecture: the sidecar proxy pattern, data plane vs. control plane, traffic management capabilities (routing, load balancing, retries), security features (mTLS, authorization), and observability (distributed tracing, metrics). You'll see how implementations like Istio, Linkerd, and Cilium service mesh work.
A service mesh is a configurable infrastructure layer for microservices that makes service-to-service communication safe, fast, and reliable. It abstracts the complexity of network communication away from application code, providing common functionality in a uniform, platform-wide manner.
The core insight: Instead of building retries, timeout handling, circuit breakers, TLS termination, and monitoring into every microservice, offload these concerns to infrastructure that's automatically injected into every service.
Why not implement this in application code?
Historically, these features were implemented as libraries (e.g., Netflix OSS Hystrix for circuit breaking, Ribbon for load balancing). This approach has drawbacks:
| Library Approach | Service Mesh Approach |
|---|---|
| Language-specific (Java, Python, Go separate) | Language-agnostic (works with any language) |
| Requires code changes | No code changes (inject proxy) |
| Inconsistent across teams | Consistent, platform-wide |
| Version drift between services | Centrally managed, updated uniformly |
| Observability scattered | Unified telemetry collection |
Service meshes add complexity and resource overhead (every pod gets a sidecar proxy). They're not always necessary. Consider a service mesh when you have: (1) 10+ microservices, (2) multiple teams/languages, (3) strict security requirements (mTLS everywhere), or (4) need for advanced traffic management. For simpler architectures, Kubernetes primitives may suffice.
The sidecar pattern is the architectural foundation of most service meshes. For every application container, a sidecar proxy is injected into the pod. This proxy intercepts all network traffic in and out of the pod, enabling the mesh to control and observe all communication.
Envoy Proxy (created by Lyft) is the most common sidecar, used by Istio, AWS App Mesh, and many others. linkerd-proxy (Rust-based) is used by Linkerd for its efficiency.
How traffic interception works:
When an application sends a request (e.g., curl http://backend-api:3000/users), instead of going directly to the network, the request is redirected to the sidecar proxy. This is accomplished through iptables rules (or eBPF in newer implementations) that transparently redirect traffic.
Interception flow:
The app container sends a request to backend-api:3000 → iptables (or eBPF) redirects it to the pod's local sidecar proxy → the sidecar applies routing, mTLS, and telemetry → the request travels to the destination pod's sidecar → that sidecar hands it to the destination application container.
```yaml
# Automatic sidecar injection (Istio)
# Enable for an entire namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    istio-injection: enabled       # Istio injects sidecar automatically
---
# Or annotate specific deployments:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  selector:                        # selector/labels added so the manifest is valid
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: frontend
        image: myapp/frontend:v1
        ports:
        - containerPort: 8080
# Istio's mutating webhook adds the envoy sidecar container
# Result: Pod has 2 containers (frontend + istio-proxy)
```
```bash
# View pods with sidecars
kubectl get pods -n my-app
# NAME                     READY   STATUS    RESTARTS   AGE
# frontend-7b9c6d8-xk2m4   2/2     Running   0          5m
#                          ^^^
#                          2 containers: app + sidecar

# Inspect the sidecar container
kubectl describe pod frontend-7b9c6d8-xk2m4 -n my-app
# Containers:
#   frontend:
#     Image: myapp/frontend:v1
#     Port:  8080/TCP
#   istio-proxy:
#     Image: docker.io/istio/proxyv2:1.18.0
#     Ports: 15090/TCP, 15021/TCP, 15020/TCP
#     Args:  proxy, sidecar, ...

# View iptables rules inside the pod (traffic redirection)
kubectl exec frontend-7b9c6d8-xk2m4 -c istio-proxy -- iptables -t nat -L -n
# Shows REDIRECT rules capturing port 80, 443, etc.
```

Each sidecar consumes ~50-100MB of memory and adds ~1-3ms of latency per hop. For a cluster with 1,000 pods, that's 50-100GB of RAM for sidecars alone. eBPF-based meshes (like Cilium) reduce this dramatically by running in the kernel without per-pod proxies.
Service meshes follow a two-plane architecture:
Data Plane: The network of sidecar proxies that handle actual traffic. Every request between services passes through the data plane. This is the 'hot path'—it must be highly performant.
Control Plane: The management layer that configures the data plane. It translates high-level policies into proxy configuration, distributes certificates, collects telemetry. This is the 'brain' of the mesh.
| Aspect | Data Plane (Proxies) | Control Plane |
|---|---|---|
| Function | Process and route traffic | Manage and configure proxies |
| Components | Envoy, linkerd-proxy | istiod, linkerd-controller |
| Performance | Ultra-low latency (<1ms) | Not latency-critical |
| Scaling | One per pod (thousands) | Small centralized cluster |
| State | Ephemeral (config from control plane) | Persistent (certificates, policies) |
| Failure impact | That pod's traffic is disrupted | Existing traffic continues; new configs not propagated |
The xDS protocol:
Control planes push configuration to data plane proxies using the xDS (x Discovery Service) protocol family:
- LDS (Listener Discovery Service): the ports and protocols the proxy listens on
- RDS (Route Discovery Service): how requests are routed
- CDS (Cluster Discovery Service): the upstream services (clusters) the proxy can send traffic to
- EDS (Endpoint Discovery Service): the individual pod IPs behind each cluster
- SDS (Secret Discovery Service): certificates and keys for mTLS
xDS is a gRPC streaming protocol—proxies maintain long-lived connections to the control plane and receive updates in real-time as pods come and go.
```bash
# View Istio control plane (istiod)
kubectl get pods -n istio-system
# NAME                      READY   STATUS    RESTARTS   AGE
# istiod-76d66d9c9c-abcde   1/1     Running   0          10d

# View Envoy's configuration from the control plane
istioctl proxy-config listeners <pod-name> -n my-app
# Shows listeners (ports Envoy is accepting connections on)

istioctl proxy-config clusters <pod-name> -n my-app
# Shows discovered upstream clusters (services)

istioctl proxy-config routes <pod-name> -n my-app
# Shows routing rules

# Validate mesh configuration
istioctl analyze -n my-app
# Checks for configuration errors and warnings

# Debug control plane to sidecar sync
istioctl proxy-status
# Shows sync status of all sidecars
```

Service meshes provide sophisticated traffic management that goes far beyond Kubernetes Services' basic round-robin load balancing. You can implement complex routing, gradual rollouts, and resilience patterns—all without modifying application code.
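For example, a canary rollout with built-in retries and timeouts can be expressed in a single VirtualService. The sketch below uses illustrative names (a checkout service with v1/v2 subsets, which would be defined in a DestinationRule like the one shown later in this page); treat it as the shape of the configuration, not a drop-in manifest:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout-canary          # illustrative name
spec:
  hosts:
  - checkout                     # illustrative Kubernetes Service
  http:
  - route:
    - destination:
        host: checkout
        subset: v1               # stable version (subset defined in a DestinationRule)
      weight: 90                 # 90% of traffic stays on stable
    - destination:
        host: checkout
        subset: v2               # canary version
      weight: 10                 # 10% goes to the canary
    timeout: 2s                  # fail fast instead of hanging
    retries:
      attempts: 3
      perTryTimeout: 500ms
      retryOn: "5xx,connect-failure"   # retry transient upstream failures
```

Shifting the weights (50/50, then 100/0) completes the rollout without touching application code or redeploying pods.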
VirtualService defines how requests are routed to destinations. You can route based on headers, URI paths, query parameters, or any combination. A companion DestinationRule defines the named subsets (typically version labels) that these routes target, as shown in the example below.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-routing
spec:
  hosts:
  - reviews                      # Kubernetes Service name
  http:
  # Route by header (A/B testing)
  - match:
    - headers:
        x-user-type:
          exact: "beta-tester"
    route:
    - destination:
        host: reviews
        subset: v3               # Beta version
  # Route by path (API versioning)
  - match:
    - uri:
        prefix: "/api/v2"
    route:
    - destination:
        host: reviews
        subset: v2
  # Default route (production)
  - route:
    - destination:
        host: reviews
        subset: v1
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-subsets
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
```

Service meshes provide comprehensive security without application changes. The two pillars are mutual TLS (mTLS) for encryption and identity, and authorization policies for access control.
Mutual TLS (mTLS):
Unlike standard TLS (where only the server presents a certificate), mTLS requires both parties to present certificates. This provides:
- Encryption of all service-to-service traffic in transit
- Mutual authentication: both client and server prove their identity with certificates
- A verified workload identity that authorization policies can act on
```yaml
# Enable strict mTLS for the entire mesh (Istio)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system       # Applies mesh-wide
spec:
  mtls:
    mode: STRICT                # Reject all non-mTLS traffic
---
# Per-namespace policy (overrides the mesh default)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: allow-plaintext-legacy
  namespace: legacy-apps        # Only for this namespace
spec:
  mtls:
    mode: PERMISSIVE            # Accept both mTLS and plaintext
---
# Authorization Policy: only frontend can call backend
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
# Default for the selected workload: deny all other traffic
```

SPIFFE identity:
Service meshes use SPIFFE (Secure Production Identity Framework For Everyone) for workload identity. Each workload gets a cryptographic identity (SPIFFE ID):
spiffe://cluster.local/ns/production/sa/frontend
This identity is embedded in the workload's X.509 certificate, issued by the mesh's certificate authority. Authorization policies use these identities for fine-grained access control.
Service meshes enable zero-trust security: never trust, always verify. Even within the cluster, all traffic is encrypted and authenticated. Network position (IP address, network segment) grants no implicit trust—only cryptographic identity matters.
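A common zero-trust starting point is an explicit deny-all AuthorizationPolicy per namespace, with targeted allow policies (like the backend-authz example above) layered on top. A minimal sketch, reusing the production namespace from earlier:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}   # No selector, no rules: nothing matches an ALLOW rule,
           # so all requests to workloads in this namespace are denied
           # unless another policy explicitly allows them
```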
One of the most powerful service mesh benefits is automatic observability. Because all traffic flows through sidecars, the mesh can collect comprehensive telemetry without any application instrumentation.
| Metric | Description | Example Use |
|---|---|---|
| `istio_requests_total` | Total requests by source, destination, response code | Calculate error rate, QPS |
| `istio_request_duration_milliseconds` | Request latency histogram | P50, P95, P99 latency |
| `istio_tcp_connections_opened_total` | TCP connections opened | Connection pool monitoring |
| `istio_request_bytes_total` | Request body size | Bandwidth analysis |
| `istio_response_bytes_total` | Response body size | Bandwidth analysis |
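Because these metrics share a consistent label scheme, dashboards and alerts can be defined once for every service. The sketch below shows Prometheus recording rules for error rate and P99 latency; it assumes Prometheus is scraping the sidecars with Istio's standard labels (destination_service, response_code, reporter), and the rule names are illustrative:

```yaml
groups:
- name: istio-golden-signals
  rules:
  # 5xx error rate per destination service over 5 minutes
  - record: service:error_rate:5m
    expr: |
      sum by (destination_service) (
        rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m])
      )
      /
      sum by (destination_service) (
        rate(istio_requests_total{reporter="destination"}[5m])
      )
  # P99 request latency per destination service over 5 minutes
  - record: service:request_duration_ms:p99_5m
    expr: |
      histogram_quantile(0.99,
        sum by (destination_service, le) (
          rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m])
        )
      )
```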
```yaml
# Istio tracing configuration
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-tracing
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: jaeger
    randomSamplingPercentage: 10   # Sample 10% of requests

# View traces in Jaeger:
#   kubectl port-forward svc/jaeger-query 16686:16686 -n istio-system
#   Open http://localhost:16686

# Example trace shows:
#   frontend (10ms)
#    └── backend-api (8ms)
#         ├── database (5ms)
#         └── cache (1ms)
```

Trace context propagation:
For distributed tracing to work, trace context (trace ID, span ID) must propagate through the request chain. Applications must forward tracing headers:
- `x-request-id`
- `x-b3-traceid`, `x-b3-spanid`, `x-b3-parentspanid`, `x-b3-sampled` (B3 format)
- `traceparent`, `tracestate` (W3C Trace Context)

Tracing libraries and HTTP middleware usually propagate these automatically, but it's important to verify—if any service drops these headers, the trace breaks.
Service meshes provide 'golden signals' (latency, traffic, errors, saturation) for every service automatically. This is invaluable for rapidly understanding system behavior. However, application-level tracing (database queries, cache hits, business logic) still requires instrumentation.
Several service mesh implementations exist, each with different architectures, trade-offs, and use cases.
| Mesh | Data Plane | Control Plane | Best For |
|---|---|---|---|
| Istio | Envoy (C++) | istiod (Go) | Feature-rich; complex environments |
| Linkerd | linkerd-proxy (Rust) | linkerd-control (Go) | Simplicity; low overhead; fast start |
| Cilium | eBPF (kernel) | Cilium Agent | Performance; no sidecars; observability |
| Consul Connect | Envoy or built-in | Consul (Go) | HashiCorp ecosystem integration |
| AWS App Mesh | Envoy | AWS managed | AWS-native applications |
| GCP Traffic Director | Envoy | GCP managed | GCP-native; global load balancing |
Istio is the most feature-rich and widely adopted service mesh. It provides comprehensive traffic management, security, and observability. Originally complex (multiple control plane components), it has simplified significantly (single istiod binary since v1.5).
Pros: the most comprehensive feature set (traffic management, security, observability), a large community and ecosystem, and broad vendor support built on Envoy.
Cons: operational complexity and a steep learning curve, per-pod sidecar resource overhead, and more moving parts to upgrade and debug than lighter-weight meshes like Linkerd.
We've covered service mesh architecture comprehensively—from the sidecar pattern to advanced traffic management, security, and observability.
What's next:
In the final page of this module, we'll explore CNI plugins in depth—the implementations that actually create the container network. You'll understand how Calico, Cilium, Flannel, and others implement Kubernetes' networking requirements, their architectures, and when to choose each.
You now understand service mesh architecture deeply—the sidecar pattern, traffic management, mTLS security, and observability features. Whether you choose Istio, Linkerd, or Cilium, you have the conceptual foundation to operate a service mesh in production.