We've explored Kubernetes' native networking primitives—Services, Ingress, Network Policies, and DNS. These tools provide a solid foundation for container networking, but as organizations scale to hundreds of microservices, new challenges emerge: How do you encrypt and authenticate all service-to-service traffic? How do you shift traffic gradually between service versions? How do you get consistent metrics and traces without instrumenting every application?

Kubernetes doesn't answer these questions natively. Enter the service mesh—an infrastructure layer that handles service-to-service communication with features that go far beyond what Kubernetes provides out of the box.
By the end of this page, you will understand service mesh architecture, how meshes integrate with Kubernetes networking, key features (mTLS, traffic management, observability), and how to evaluate whether your organization needs a service mesh.
A service mesh is an infrastructure layer that mediates all service-to-service communication within a cluster. It transparently intercepts network traffic through sidecar proxies, adding capabilities without requiring application code changes.
Without Service Mesh:

```text
┌────────────┐              ┌────────────┐
│ Service A  │──── HTTP ───▶│ Service B  │
└────────────┘              └────────────┘
```

With Service Mesh:

```text
┌─────────────────────────┐        ┌─────────────────────────┐
│ Pod                     │        │ Pod                     │
│  ┌───────────┐          │        │          ┌───────────┐  │
│  │ Service A │──▶┐      │        │      ┌──▶│ Service B │  │
│  └───────────┘   │      │        │      │   └───────────┘  │
│                  ▼      │        │      │                  │
│        ┌─────────┐      │        │      ┌─────────┐        │
│        │ Sidecar │──────┼────────┼─────▶│ Sidecar │        │
│        │ Proxy   │      │        │      │ Proxy   │        │
│        └─────────┘      │        │      └─────────┘        │
└─────────────────────────┘        └─────────────────────────┘
             │                                 │
             └───────────────┬─────────────────┘
                             ▼
                     ┌───────────────┐
                     │ Control Plane │
                     └───────────────┘
```
Data Plane: The sidecar proxies (typically Envoy) that handle actual traffic. They intercept all ingress and egress from application containers.
Control Plane: The management component that configures and coordinates the proxies. It distributes configuration, certificates, and policies.
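To see this split in practice with Istio, you can ask the control plane which proxies it manages and what configuration it has pushed. A minimal sketch, assuming istioctl is installed and pointed at your cluster:

```bash
# List every sidecar istiod knows about and whether its listener,
# cluster, and route configuration is in sync with the control plane:
istioctl proxy-status

# Inspect the Envoy cluster configuration pushed to one sidecar
# (replace <pod-name> with an actual Pod in your cluster):
istioctl proxy-config clusters <pod-name> -n production
```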
| Mesh | Proxy | Strengths | Considerations |
|---|---|---|---|
| Istio | Envoy | Feature-rich, wide adoption, strong traffic management | Resource overhead, complexity |
| Linkerd | linkerd2-proxy (Rust) | Lightweight, simple, low latency | Fewer features than Istio |
| Consul Connect | Envoy/Built-in | Multi-platform, HashiCorp ecosystem integration | Requires Consul service discovery |
| AWS App Mesh | Envoy | Native AWS integration | AWS-only |
| Cilium Service Mesh | eBPF/Envoy | No sidecars (eBPF), high performance | Requires newer kernels, maturing |
| Kuma | Envoy | Multi-zone (Kubernetes + VMs), simple UX | Smaller community |
Envoy Proxy (created by Lyft) is the de facto data plane for most service meshes. Its programmability through xDS APIs makes it ideal for dynamic mesh environments. Understanding Envoy concepts translates across Istio, App Mesh, Consul, and others.
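To make the xDS idea concrete, here is a minimal, illustrative Envoy v3 bootstrap file. The node ID, cluster name, and control-plane address are assumptions for the sketch, not any particular mesh's actual configuration. Instead of static routes, the proxy is told where to fetch listener (LDS) and cluster (CDS) configuration over gRPC, which is how a mesh control plane reconfigures every sidecar at runtime:

```yaml
# Minimal Envoy bootstrap sketch: listener and cluster config is pulled
# dynamically from an xDS server over gRPC. Addresses are illustrative.
node:
  id: sidecar~service-a            # Hypothetical node identity
  cluster: service-a
dynamic_resources:
  lds_config:                      # Listeners come from the control plane
    resource_api_version: V3
    api_config_source:
      api_type: GRPC
      transport_api_version: V3
      grpc_services:
      - envoy_grpc:
          cluster_name: xds-control-plane
  cds_config:                      # Clusters come from the control plane
    resource_api_version: V3
    api_config_source:
      api_type: GRPC
      transport_api_version: V3
      grpc_services:
      - envoy_grpc:
          cluster_name: xds-control-plane
static_resources:
  clusters:
  - name: xds-control-plane        # The only statically defined upstream
    type: STRICT_DNS
    connect_timeout: 1s
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}   # xDS uses gRPC (HTTP/2)
    load_assignment:
      cluster_name: xds-control-plane
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: istiod.istio-system.svc  # Example control plane
                port_value: 15010
```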
Service meshes transparently intercept traffic using iptables rules or eBPF programs. Understanding this mechanism is crucial for debugging and understanding mesh behavior.
```text
# When a Pod with a sidecar makes an outbound request:

1. Application container sends a request to Service B (ClusterIP:Port)
2. iptables rules REDIRECT traffic to the sidecar proxy (localhost:15001)
3. Sidecar proxy receives the request
4. Proxy applies policies (mTLS, retries, timeouts)
5. Proxy establishes a connection to the destination Pod's sidecar
6. Destination sidecar receives traffic, applies policies
7. Destination sidecar forwards to the application container

# Actual iptables rules (Istio example):
iptables -t nat -L -n

# Chain ISTIO_REDIRECT: redirect outbound TCP to Envoy
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001

# Chain ISTIO_IN_REDIRECT: redirect inbound TCP to Envoy
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006

# Visual flow:
# App → localhost:15001 (outbound proxy) → Remote Pod:15006 (inbound proxy) → App
#            ↑                                        ↑
#      iptables REDIRECT                        iptables REDIRECT
```

Sidecars are typically injected automatically via Kubernetes admission webhooks. When a Pod is created, the mesh's mutating admission webhook modifies the Pod spec to add the sidecar container and an init container (for iptables setup).
```yaml
# Enable automatic sidecar injection for a namespace (Istio)
kubectl label namespace production istio-injection=enabled

# Alternatively, annotate individual Pods
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    sidecar.istio.io/inject: "true"       # Enable injection
    sidecar.istio.io/proxyCPU: "100m"     # Sidecar resource requests
    sidecar.istio.io/proxyMemory: "128Mi"
spec:
  containers:
  - name: my-app
    image: myapp:latest

# After injection, the Pod has additional containers:
# - istio-init:  Init container that sets up iptables
# - istio-proxy: The Envoy sidecar

# View the injected Pod:
kubectl get pod my-app -o yaml
# containers:
# - name: my-app       # Your application
# - name: istio-proxy  # Injected sidecar
# initContainers:
# - name: istio-init   # Sets up traffic interception
```

Each sidecar consumes CPU and memory. In Istio, the default request is ~100m CPU and ~128Mi memory per sidecar. For a cluster with 1,000 Pods, that's ~100 CPU cores and ~128Gi of memory just for sidecars. Plan capacity accordingly.
Mutual TLS (mTLS) is one of the most compelling service mesh features. It encrypts all service-to-service traffic and provides cryptographic identity verification for both ends of every connection.
```text
# mTLS handshake between Service A and Service B:

1. Service A's sidecar initiates a connection to Service B
2. Service A's sidecar presents its certificate:
     Identity:  spiffe://cluster.local/ns/production/sa/service-a
     Signed by: Mesh CA (Istiod, Linkerd Identity)
3. Service B's sidecar verifies:
     - Certificate is valid and not expired
     - Signed by a trusted CA
     - Identity matches the expected caller (AuthorizationPolicy)
4. Service B's sidecar presents its certificate (mutual verification)
5. Service A's sidecar verifies Service B's identity
6. Encrypted channel established (TLS 1.3)
7. Application traffic flows encrypted

# Certificate identity (SPIFFE format):
spiffe://cluster.local/ns/<namespace>/sa/<service-account>

# Example:
spiffe://cluster.local/ns/production/sa/user-service

# This cryptographically proves:
# - The workload is in the 'production' namespace
# - The workload runs as ServiceAccount 'user-service'
# - The mesh CA has validated this identity
```
```yaml
# Enable strict mTLS for a namespace (Istio)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
    # STRICT:     Reject non-mTLS connections
    # PERMISSIVE: Accept both mTLS and plaintext (for migration)
    # DISABLE:    No mTLS
---
# Enable strict mTLS mesh-wide
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # Mesh-wide default
spec:
  mtls:
    mode: STRICT
---
# DestinationRule to enforce client-side mTLS
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: enable-mtls
  namespace: production
spec:
  host: "*.production.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL  # Use mesh certificates
---
# Verify mTLS is working
# Check if the connection is encrypted:
kubectl exec -n production deploy/service-a -c istio-proxy -- \
  openssl s_client -connect service-b:80 -showcerts
```

Service meshes provide sophisticated traffic management capabilities that go far beyond what Kubernetes Services offer. This enables canary deployments, A/B testing, traffic mirroring, and resilience patterns.
```yaml
# Istio VirtualService for traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: production
spec:
  hosts:
  - reviews              # The service to configure
  http:
  - match:
    - headers:
        end-user:
          exact: "jason" # Route test user to v3
    route:
    - destination:
        host: reviews
        subset: v3
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90         # 90% to v1
    - destination:
        host: reviews
        subset: v2
      weight: 10         # 10% to v2 (canary)
---
# DestinationRule defines subsets (versions)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: production
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
```
```yaml
# VirtualService with retries and timeouts
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
  namespace: production
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    retries:
      attempts: 3           # Retry up to 3 times
      perTryTimeout: 2s     # Timeout per attempt
      retryOn: 5xx,reset,connect-failure,refused-stream
    timeout: 10s            # Total request timeout
---
# DestinationRule with circuit breaker
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: ratings
  namespace: production
spec:
  host: ratings
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100       # Max connections to the service
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:             # Circuit breaker
      consecutive5xxErrors: 5     # Trip after 5 consecutive errors
      interval: 30s               # Analysis window
      baseEjectionTime: 30s       # Initial ejection time
      maxEjectionPercent: 50      # Max % of hosts to eject
      minHealthPercent: 30        # Min healthy hosts before tripping
```

Mirror production traffic to test new versions without affecting users:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
      weight: 100          # 100% to v1 (production)
    mirror:
      host: ratings
      subset: v2           # Mirror a copy to v2
    mirrorPercentage:
      value: 100.0         # Mirror 100% of traffic

# Traffic flow:
# Client → ratings-v1 (responds to client)
#        ↘ ratings-v2 (response discarded, but logs/metrics captured)
```

Service meshes enable sophisticated progressive delivery strategies such as canary releases, A/B testing, and traffic mirroring.
Tools like Flagger and Argo Rollouts integrate with meshes for automated progressive delivery.
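As a sketch of what that integration looks like, here is a minimal Flagger Canary resource for the ratings service from the examples above. The intervals, thresholds, and metric bounds are illustrative values, not recommendations. Flagger generates the Istio VirtualService itself, shifts traffic in steps, and rolls back automatically if the mesh-reported metrics degrade:

```yaml
# Illustrative Flagger Canary: traffic is shifted stepwise while
# Flagger watches mesh-generated metrics for regressions.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: ratings
  namespace: production
spec:
  targetRef:                      # The Deployment to roll out progressively
    apiVersion: apps/v1
    kind: Deployment
    name: ratings
  service:
    port: 80
  analysis:
    interval: 1m                  # Evaluate metrics every minute
    threshold: 5                  # Roll back after 5 failed checks
    maxWeight: 50                 # Never send more than 50% to the canary
    stepWeight: 10                # Increase canary traffic 10% per step
    metrics:
    - name: request-success-rate  # Built-in check on mesh request metrics
      thresholdRange:
        min: 99                   # Require >= 99% success
      interval: 1m
    - name: request-duration      # Built-in latency check (milliseconds)
      thresholdRange:
        max: 500
      interval: 1m
```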
While Kubernetes Network Policies operate at L3/L4 (IPs and ports), service mesh authorization works at L7, enabling identity-based access control with understanding of HTTP methods, paths, and headers.
```yaml
# Deny all traffic by default (zero trust)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}   # Empty spec = deny all
---
# Allow specific service-to-service communication
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/production/sa/frontend"  # SPIFFE identity
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/v1/*"]
---
# Allow based on JWT claims (end-user identity)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["https://auth.example.com/*"]  # JWT issuer
    when:
    - key: request.auth.claims[groups]
      values: ["admin", "order-managers"]
---
# Deny access from specific namespaces
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-untrusted
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: DENY
  rules:
  - from:
    - source:
        namespaces: ["development", "testing"]
```

| Aspect | Network Policy | Service Mesh AuthZ |
|---|---|---|
| Layer | L3/L4 (IP, Port) | L7 (HTTP, gRPC) |
| Identity | Labels, Namespaces | Cryptographic (mTLS/JWT) |
| Granularity | Pod-level | Request-level (method, path) |
| JWT Support | No | Yes |
| Logging | Limited | Rich audit logging |
| Enforcement | CNI/iptables | Sidecar proxy |
| Overhead | Minimal | Sidecar resources |
Service meshes provide consistent observability across all services without requiring application instrumentation. The sidecar proxies emit metrics, traces, and logs for every request.
Proxies automatically generate RED metrics (Rate, Errors, Duration) for all traffic:
```promql
# Istio generates Prometheus metrics automatically:

# Request rate by source/destination:
istio_requests_total{
  source_workload="frontend",
  destination_workload="api",
  response_code="200"
}

# Request latency histogram:
istio_request_duration_milliseconds_bucket{
  source_workload="frontend",
  destination_workload="api",
  le="100"
}

# TCP bytes:
istio_tcp_sent_bytes_total
istio_tcp_received_bytes_total

# Example Prometheus queries:

# Request rate per service:
sum(rate(istio_requests_total[5m])) by (destination_workload)

# P99 latency:
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket[5m]))
  by (destination_workload, le))

# Error rate:
sum(rate(istio_requests_total{response_code=~"5.*"}[5m]))
  /
sum(rate(istio_requests_total[5m]))
```

Service meshes integrate with tracing backends (Jaeger, Zipkin, Datadog) to provide request-level visibility:
```yaml
# Istio telemetry configuration for tracing
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - randomSamplingPercentage: 100.0  # Sample 100% in dev; 1-5% in prod
    providers:
    - name: jaeger
    customTags:
      environment:
        literal:
          value: "production"

# Applications must propagate trace headers:
#   x-request-id
#   x-b3-traceid
#   x-b3-spanid
#   x-b3-parentspanid
#   x-b3-sampled
#   x-b3-flags
#   x-ot-span-context
#
# Or, for W3C Trace Context:
#   traceparent
#   tracestate
```

Mesh observability tools such as Kiali visualize service dependencies and traffic flow.
The mesh provides the four golden signals of monitoring without any application changes:
• Latency: istio_request_duration_milliseconds
• Traffic: istio_requests_total
• Errors: istio_requests_total{response_code=~"5.*"}
• Saturation: Connection pool exhaustion, queue depth
Build dashboards and alerts around these metrics for comprehensive SLO monitoring.
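For example, the error-rate query above can be turned into a Prometheus alerting rule. The threshold, duration, and labels here are illustrative, not recommendations:

```yaml
# Sketch of an SLO-style alert on mesh-generated metrics.
groups:
- name: mesh-slo
  rules:
  - alert: MeshHighErrorRate
    expr: |
      sum(rate(istio_requests_total{response_code=~"5.*"}[5m])) by (destination_workload)
        /
      sum(rate(istio_requests_total[5m])) by (destination_workload)
        > 0.01
    for: 10m                 # Must be sustained for 10 minutes
    labels:
      severity: page         # Illustrative routing label
    annotations:
      summary: "{{ $labels.destination_workload }} 5xx rate above 1%"
```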
Service meshes work alongside Kubernetes networking, not as a replacement. Understanding how they integrate helps you design effective architectures.
| Component | Kubernetes Native | With Service Mesh |
|---|---|---|
| Service Discovery | ClusterIP + DNS | ClusterIP + DNS (mesh uses this) |
| Load Balancing | kube-proxy (L4) | Sidecar (L7, advanced algorithms) |
| External Traffic | Ingress Controller | Mesh Gateway (Istio Gateway) or Ingress |
| Network Security | Network Policies (L3/L4) | Network Policies + AuthZ Policies (L7) |
| TLS | Application or Ingress | Automatic mTLS everywhere |
| Traffic Management | None | VirtualServices, DestinationRules |
| Observability | Application-implemented | Automatic (proxy-generated) |
```yaml
# Istio Gateway replaces/complements Ingress for mesh traffic
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: main-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway   # Use Istio's ingress gateway deployment
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: main-tls-secret
    hosts:
    - "*.example.com"
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*.example.com"
    tls:
      httpsRedirect: true   # Redirect HTTP to HTTPS
---
# VirtualService binds the Gateway to internal services
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-routing
  namespace: production
spec:
  hosts:
  - "api.example.com"
  gateways:
  - istio-system/main-gateway   # Reference the Gateway
  http:
  - match:
    - uri:
        prefix: /v1
    route:
    - destination:
        host: api-v1
        port:
          number: 80
  - match:
    - uri:
        prefix: /v2
    route:
    - destination:
        host: api-v2
        port:
          number: 80
```

Even with a service mesh, keep Kubernetes Network Policies as defense-in-depth. Network Policies operate at the CNI level (before traffic reaches sidecars) and provide coarse-grained segmentation. Mesh authorization adds fine-grained L7 controls on top.
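A sketch of that layering: a coarse L3/L4 Network Policy guarding the database from the earlier examples, beneath the fine-grained AuthorizationPolicy. The labels and port are assumptions for illustration:

```yaml
# CNI-level segmentation that holds even if a sidecar is bypassed:
# only api-gateway Pods may reach the database on its service port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-api-gateway-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 5432        # Example database port
```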
Service meshes add significant value but also complexity and resource overhead. The decision to adopt one should be based on your specific requirements.
If you need only some mesh features, consider lighter-weight alternatives:
| Need | Mesh-Free Alternative |
|---|---|
| mTLS | SPIFFE/SPIRE with native app support |
| Observability | OpenTelemetry SDK in applications |
| Retries/Timeouts | Client libraries (resilience4j, Polly) |
| Canary Deployments | Argo Rollouts, Flagger with Ingress |
| Rate Limiting | API Gateway (Kong, Ambassador) |
| Circuit Breakers | Client libraries or sidecar-less options |
Don't adopt a service mesh preemptively. Start with Kubernetes native networking. As your architecture grows and requirements emerge, evaluate if mesh benefits outweigh the complexity. Many successful organizations run hundreds of services without a mesh, relying on client libraries and API gateways.
We've comprehensively covered service mesh integration with Kubernetes networking. Let's consolidate the key takeaways:
• A mesh consists of a data plane (sidecar proxies, typically Envoy) and a control plane that configures them; traffic is intercepted transparently via iptables or eBPF.
• Automatic mTLS gives every workload a cryptographic SPIFFE identity and encrypts all service-to-service traffic without application changes.
• VirtualServices and DestinationRules enable traffic splitting, retries, timeouts, circuit breaking, and mirroring.
• L7 authorization policies complement Kubernetes Network Policies rather than replace them.
• Sidecars emit metrics, traces, and logs for every request, covering the four golden signals without instrumentation.
• Adopt a mesh only when its benefits outweigh the resource overhead and operational complexity; lighter-weight alternatives cover many individual needs.
Module Complete:
You've now completed the Kubernetes Networking module. We covered Kubernetes' native primitives (Services, Ingress, Network Policies, and DNS) and, on this page, service mesh integration.
With this knowledge, you can design, implement, and troubleshoot networking for production Kubernetes clusters at any scale.
Congratulations! You now have a comprehensive understanding of Kubernetes networking—from native primitives through service mesh integration. You can design secure, scalable, and observable networking architectures for containerized applications. Continue to the next module to explore Kubernetes storage!