When you log into a web application and are routed to a server that already has your session cached, when a mobile app version receives different API responses than the web version, or when A/B testing silently directs 10% of users to a new feature—Layer 7 load balancing is at work. Unlike its Layer 4 counterpart, a Layer 7 load balancer doesn't just see packets; it understands the application protocol, parsing HTTP requests, inspecting headers, modifying responses, and making intelligent routing decisions based on content.
This intelligence comes at a cost: Layer 7 load balancers must terminate connections, parse protocols, and potentially re-encrypt traffic. But the capabilities this enables—content-based routing, request transformation, protocol translation, sophisticated health checking, and granular observability—make Layer 7 indispensable for modern application architectures.
By the end of this page, you will understand how Layer 7 load balancers process HTTP/HTTPS traffic, the routing capabilities enabled by application-layer inspection, TLS termination strategies and their trade-offs, and the use cases where Layer 7 is essential despite its performance overhead.
Layer 7—the Application Layer in the OSI model—is where application protocols like HTTP, HTTPS, gRPC, and WebSocket operate. A Layer 7 load balancer fully participates in these protocols, acting as a client to backend servers and a server to incoming clients.
At Layer 7, the load balancer has visibility into every element of the request and response. This complete visibility enables routing decisions impossible at lower layers:
| Category | Attributes | Routing Examples |
|---|---|---|
| Request Line | HTTP method, URL path, query parameters | Route /api/* to API servers, /static/* to CDN origin |
| Headers | Host, User-Agent, Accept, Authorization, Custom headers | Route mobile clients differently, A/B testing by header |
| Cookies | Session ID, user preferences, feature flags | Sticky sessions, canary deployments |
| Request Body | JSON/XML payload, form data | Content-based routing (e.g., customer tier in request) |
| TLS | SNI hostname, client certificate | Virtual hosting, mutual TLS authentication |
| Response | Status code, headers, body | Error handling, response transformation |
The fundamental architectural difference between Layer 4 and Layer 7 is connection handling:
Layer 4: Connections pass through the load balancer (NAT) or around it (DSR). The load balancer doesn't participate in the protocol.
Layer 7: The load balancer terminates the client connection and establishes a new connection to the backend. These are separate TCP connections: Connection A between the client and the load balancer, and Connection B between the load balancer and the backend.
The load balancer receives the complete HTTP request on Connection A, parses it, makes a routing decision, and forwards it on Connection B. This is why a Layer 7 load balancer is often called a proxy or reverse proxy.
Layer 7 load balancers typically maintain connection pools to backend servers instead of establishing new connections for every request. This amortizes the TCP handshake cost across multiple requests and is essential for high-throughput environments. The load balancer multiplexes many client connections onto fewer backend connections.
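The pooling arithmetic can be sketched in a few lines of Python. This is a toy pool with hypothetical names, not any real proxy's implementation; the point is that many requests reuse a small set of warm connections, so only the first request per pooled slot pays a TCP handshake:

```python
from collections import deque

class ConnectionPool:
    """Toy backend connection pool: reuse connections so each request
    doesn't pay a fresh TCP handshake (illustrative sketch)."""

    def __init__(self, backend: str, size: int):
        self.backend = backend
        self.size = size
        self.idle = deque()
        self.handshakes = 0  # new TCP connections actually opened

    def acquire(self):
        if self.idle:
            return self.idle.popleft()  # reuse a pooled connection
        self.handshakes += 1            # only now do we pay a handshake
        return f"conn-{self.handshakes}->{self.backend}"

    def release(self, conn):
        if len(self.idle) < self.size:
            self.idle.append(conn)      # keep warm for the next request

pool = ConnectionPool("api-1.internal:8080", size=2)
for _ in range(100):        # 100 sequential requests...
    conn = pool.acquire()
    pool.release(conn)
print(pool.handshakes)      # ...but only 1 handshake paid
```

A real proxy adds timeouts, health eviction, and per-backend limits, but the amortization principle is the same.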
The power of Layer 7 load balancing lies in content-based routing—directing requests based on their content rather than just their source. This enables sophisticated traffic management patterns.
Routing based on the URL path is the most common Layer 7 pattern:
/api/* → API servers (high-memory instances)
/static/* → Static file servers (CDN origin)
/admin/* → Admin panel (secured network)
/health → Health check endpoint (any server)
/ws/* → WebSocket servers (sticky sessions)
This allows a single load balancer to front multiple distinct services, routing to appropriate backends based on the request URL.
```nginx
# NGINX Layer 7 Path-Based Routing Configuration

upstream api_servers {
    server api-1.internal:8080 weight=3;
    server api-2.internal:8080 weight=2;
    server api-3.internal:8080 weight=1;
    keepalive 32;  # Connection pool
}

upstream static_servers {
    server static-1.internal:80;
    server static-2.internal:80;
    keepalive 64;
}

upstream websocket_servers {
    ip_hash;  # Sticky sessions for WebSocket
    server ws-1.internal:9000;
    server ws-2.internal:9000;
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    # Path-based routing
    location /api/ {
        proxy_pass http://api_servers;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # Enable keepalive
    }

    location /static/ {
        proxy_pass http://static_servers;
        proxy_cache_valid 200 1h;
    }

    location /ws/ {
        proxy_pass http://websocket_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Routing based on HTTP headers enables sophisticated traffic segmentation:
Host header (Virtual hosting): Route requests by domain name
- api.example.com → API cluster
- www.example.com → Web frontend cluster
- admin.example.com → Admin cluster

User-Agent routing: Different backends for different clients

Custom headers for traffic management:

- X-Feature-Flag: new-checkout → Canary servers
- X-Client-Version: 2.0 → Version-specific backends
- X-Debug: true → Debug-enabled servers

| Header | Pattern | Use Case |
|---|---|---|
| Host | Match domain name | Multi-tenant, virtual hosting |
| User-Agent | Contains 'Mobile' or 'iOS' | Mobile-specific backends |
| Authorization | JWT audience claim | Multi-service authentication |
| Accept-Language | Match locale | Geo-localized content |
| X-Forwarded-For | IP range match | Internal vs external traffic |
| Cookie | Contains feature flag | A/B testing, canary releases |
HTTP methods can drive routing decisions: for example, GET and HEAD requests can be sent to read replicas, while POST, PUT, and DELETE go to the primary write path.
This pattern is foundational for read-write splitting in database architectures.
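A minimal sketch of method-based routing for read-write splitting (the pool names are illustrative assumptions, not from the original configuration):

```python
# Sketch: route HTTP requests by method for read-write splitting.
# Backend pool names are hypothetical.
READ_METHODS = {"GET", "HEAD", "OPTIONS"}

def route_by_method(method: str) -> str:
    """Send safe/read methods to replicas, writes to the primary pool."""
    if method.upper() in READ_METHODS:
        return "read-replica-pool"
    return "primary-pool"

print(route_by_method("GET"))   # read-replica-pool
print(route_by_method("POST"))  # primary-pool
```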
Routing based on query parameters enables dynamic traffic management:
/search?region=us → US search cluster
/search?region=eu → EU search cluster
/api?version=2 → API v2 servers
/checkout?test=true → Test environment
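The region examples above can be sketched in Python using the standard library's query parsing (cluster names are hypothetical):

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping mirroring the /search?region=... examples above.
REGION_CLUSTERS = {"us": "us-search-cluster", "eu": "eu-search-cluster"}

def route_by_query(url: str, default: str = "default-cluster") -> str:
    """Pick a backend cluster based on the `region` query parameter."""
    params = parse_qs(urlparse(url).query)
    region = params.get("region", [None])[0]
    return REGION_CLUSTERS.get(region, default)

print(route_by_query("/search?region=eu"))  # eu-search-cluster
print(route_by_query("/search?region=jp"))  # default-cluster (unmapped)
```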
While Layer 7 routing is powerful, complex routing rules can become difficult to maintain and debug. Each routing decision adds latency and cognitive overhead. Design routing strategies that are as simple as possible while meeting requirements—overly clever routing often leads to operational nightmares.
HTTPS traffic is encrypted, which presents a fundamental challenge: how can the load balancer inspect HTTP content for routing if the content is encrypted? The answer is TLS termination—decrypting traffic at the load balancer.
There are three primary approaches to handling TLS in load-balanced environments:
1. TLS Termination at Load Balancer (Most Common)
Client → [HTTPS] → Load Balancer → [HTTP] → Backend
2. TLS Termination with Re-encryption
Client → [HTTPS] → Load Balancer → [HTTPS] → Backend
3. TLS Passthrough (Layer 4 behavior)
Client → [HTTPS] → Load Balancer → [same encrypted stream, never decrypted] → Backend
Server Name Indication (SNI) is a TLS extension where the client sends the target hostname in the initial TLS handshake (ClientHello). Because the hostname is visible before decryption, this enables hosting multiple HTTPS sites on a single IP address, selecting the correct certificate per hostname, and routing encrypted traffic by hostname.
SNI-based routing allows a Layer 4 load balancer to make routing decisions for HTTPS traffic without terminating TLS—a hybrid approach.
TLS adds significant overhead: asymmetric cryptography during the handshake, additional round trips to establish each session, and per-record symmetric encryption of all application data.
Modern hardware acceleration (AES-NI, specialized TLS offload) mitigates CPU costs, but TLS remains the primary performance differentiator between Layer 4 and Layer 7.
| Strategy | Security | Performance | Layer 7 Features | Complexity |
|---|---|---|---|---|
| Termination only | Good (internal trust) | Best | Full | Low |
| Re-encryption | Excellent | Moderate | Full | High |
| Passthrough | Excellent | Best | None (SNI only) | Low |
| Mutual TLS | Excellent + AuthN | Moderate | Full | High |
In zero-trust architectures, mutual TLS requires both client and server to present certificates. Layer 7 load balancers can terminate client mTLS, validate the client certificate, and pass identity information (e.g., CN, SAN) to backends via headers. This centralizes certificate validation while enabling certificate-based authentication.
Layer 7 load balancers don't just route—they transform. The ability to modify requests and responses passing through enables powerful patterns for observability, security, and compatibility.
Common request modifications:
Standard proxy headers:
- X-Forwarded-For: Original client IP (appended if the header already exists)
- X-Forwarded-Proto: Original protocol (http/https)
- X-Forwarded-Host: Original Host header
- X-Real-IP: Client IP (single value)

Custom headers for routing context:
- X-Request-ID: Unique identifier for distributed tracing
- X-Correlation-ID: Cross-service correlation
- X-Client-Cert-CN: Client certificate common name (from mTLS)
- X-Backend-Server: For debugging routing decisions
```yaml
# Envoy Proxy Header Manipulation Configuration
routes:
  - match:
      prefix: "/api/"
    route:
      cluster: api_cluster
    request_headers_to_add:
      - header:
          key: "X-Request-ID"
          value: "%REQ(X-Request-ID)%"  # Preserve if exists
        append: false
      - header:
          key: "X-Forwarded-Proto"
          value: "%DOWNSTREAM_PROTOCOL%"
      - header:
          key: "X-Request-Start"
          value: "%START_TIME(%s.%3f)%"  # Timestamp for latency tracking
    request_headers_to_remove:
      - "X-Debug"  # Strip debug header for production
    response_headers_to_add:
      - header:
          key: "X-Served-By"
          value: "%UPSTREAM_HOST%"
      - header:
          key: "X-Response-Time"
          value: "%RESPONSE_DURATION%ms"
```

Response modifications serve different purposes:
Security headers (added by LB):
- Strict-Transport-Security: HSTS enforcement
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY

CORS headers:

- Access-Control-Allow-Origin
- Access-Control-Allow-Methods
- Access-Control-Allow-Headers

Centralizing security headers at the load balancer ensures consistent enforcement without requiring each backend to implement them.
Layer 7 load balancers can transform URLs before forwarding:
Incoming: /api/v1/users/123
Rewritten: /users/123
Incoming: /legacy-endpoint
Rewritten: /v2/new-endpoint
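These rewrites can be expressed as ordered regex rules; the following sketch mirrors the two examples above (the patterns are illustrative, not any specific proxy's syntax):

```python
import re

# Illustrative rewrite rules, applied in order; first match wins.
REWRITES = [
    (re.compile(r"^/api/v1(/.*)$"), r"\1"),              # strip version prefix
    (re.compile(r"^/legacy-endpoint$"), "/v2/new-endpoint"),
]

def rewrite(path: str) -> str:
    """Return the rewritten path, or the original if no rule matches."""
    for pattern, replacement in REWRITES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return path

print(rewrite("/api/v1/users/123"))  # /users/123
print(rewrite("/legacy-endpoint"))   # /v2/new-endpoint
```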
This enables API version abstraction, backend restructuring without client-visible changes, and migration away from legacy endpoints.
While Layer 7 load balancers can inspect and modify request/response bodies, this is expensive. The entire body must be buffered in memory before processing, adding latency and memory pressure. Use body inspection sparingly—typically only for security scanning or specific transformation requirements.
Layer 7 load balancers enable sophisticated traffic management patterns that are impossible at lower layers. These capabilities form the foundation of modern deployment strategies and resilience patterns.
Gradually roll out changes by sending a small percentage of traffic to new versions:
The load balancer tracks metrics (error rates, latency) per backend, enabling automated rollback if the canary performs poorly.
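A toy model of this feedback loop (thresholds, sample sizes, and backend names are illustrative assumptions):

```python
import random

class CanaryRouter:
    """Weighted canary selection with automated rollback on high error rate."""

    def __init__(self, canary_weight=0.05, error_threshold=0.02):
        self.canary_weight = canary_weight      # fraction of traffic to canary
        self.error_threshold = error_threshold  # max tolerated 5xx rate
        self.canary_requests = 0
        self.canary_errors = 0

    def pick_backend(self) -> str:
        if self.canary_weight > 0 and random.random() < self.canary_weight:
            return "api-canary"
        return "api-stable"

    def record(self, backend: str, status: int) -> None:
        if backend != "api-canary":
            return
        self.canary_requests += 1
        if status >= 500:
            self.canary_errors += 1
        # Automated rollback: stop routing to the canary once we have a
        # large enough sample and the error rate exceeds the threshold.
        if (self.canary_requests >= 100 and
                self.canary_errors / self.canary_requests > self.error_threshold):
            self.canary_weight = 0.0

router = CanaryRouter()
for _ in range(200):                # canary returning only 500s...
    router.record("api-canary", 500)
print(router.pick_backend())        # api-stable (canary rolled back)
```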
Route users to different backends for experimentation:
```yaml
# Kubernetes Gateway API Traffic Splitting
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: canary-route
spec:
  parentRefs:
    - name: production-gateway
  rules:
    # Specific header routes to canary
    - matches:
        - headers:
            - name: "X-Canary"
              value: "true"
      backendRefs:
        - name: api-canary
          port: 8080
          weight: 100
    # Default traffic: 95% stable, 5% canary
    - backendRefs:
        - name: api-stable
          port: 8080
          weight: 95
        - name: api-canary
          port: 8080
          weight: 5
```

Maintain two identical production environments:
Switch all traffic instantly by changing load balancer routing. Rollback is equally instant. The load balancer provides the abstraction that makes this possible.
Layer 7 load balancers can implement circuit breakers, automatically removing a failing backend from rotation and periodically probing it for recovery.
Triggers include consecutive 5xx responses, elevated request latency, connection failures, and failed health checks.
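A common circuit-breaker implementation tracks three states: closed (traffic flows), open (requests fail fast), and half-open (a probe is allowed). A minimal sketch, with illustrative thresholds:

```python
import time

class CircuitBreaker:
    """Minimal closed/open/half-open circuit breaker sketch."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold  # consecutive failures to trip
        self.reset_timeout = reset_timeout          # seconds before a probe
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let one probe request through
                return True
            return False                  # fail fast while open
        return True

    def record_result(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.state = "closed"
        else:
            self.failures += 1
            # A failed probe, or too many consecutive failures, opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()

cb = CircuitBreaker()
for _ in range(5):
    cb.record_result(False)   # five consecutive 5xx responses
print(cb.state)               # open
print(cb.allow_request())     # False (failing fast)
```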
Request mirroring (also called traffic shadowing) sends a copy of production traffic to a secondary cluster. The mirror responses are discarded, but you can validate new code against real traffic patterns. This is invaluable for testing performance characteristics, catching edge cases, and validating behavior before canary deployment.
Layer 7 load balancers provide rich health checking and observability capabilities, leveraging their protocol awareness to deliver insights impossible at Layer 4.
Layer 7 health checks validate application health, not just network connectivity:
HTTP health checks:

- Request dedicated endpoints (/health, /ready)
- Validate response status codes and content, not just TCP connectivity

Sophisticated health semantics can distinguish liveness (the process is running) from readiness (the process can serve traffic):
```yaml
# Envoy Health Check Configuration
clusters:
  - name: api_cluster
    health_checks:
      - timeout: 5s
        interval: 10s
        unhealthy_threshold: 3
        healthy_threshold: 2
        http_health_check:
          path: "/health/ready"
          host: "internal-health-check"
          expected_statuses:
            - start: 200
              end: 299
        event_log_path: /var/log/health-checks.log
```

Layer 7 load balancers emit detailed per-request metrics:
Latency metrics: time to first byte, total request duration, and upstream (backend) versus downstream (client) latency, typically as percentile distributions.
Request/Response metrics: request rate, status code distribution, and request/response sizes per route and per backend.
Error tracking: 5xx rates, timeouts, retries, and connection failures per backend.
Layer 7 load balancers participate in distributed tracing by generating and propagating trace context in X-Request-ID, traceparent, and X-B3-* headers.

| Metric | Description | Alert Threshold Example |
|---|---|---|
| request_duration_p99 | 99th percentile latency | 500ms for 5 minutes |
| request_error_rate | 5xx responses / total | 1% for 2 minutes |
| backend_health | Healthy backends / total | < 50% |
| connection_pool_usage | Active / max connections | 80% |
| upstream_rq_retry | Retry rate | 5% |
| downstream_rq_timeout | Client timeout rate | 0.1% |
Because all traffic flows through the Layer 7 load balancer, it provides a unique vantage point for observability. Error rates, latency distributions, and traffic patterns are visible without instrumentation in every backend. This makes the load balancer a critical data source for incident response and capacity planning.
While HTTP dominates Layer 7 load balancing, modern load balancers support a growing array of application protocols, each with unique routing and management capabilities.
WebSocket provides full-duplex communication over a single TCP connection. Layer 7 load balancers must:
- Recognize the upgrade handshake (the Upgrade: websocket header and 101 Switching Protocols response)
- Keep long-lived connections open rather than recycling them
- Pin each client to the same backend (sticky sessions)

gRPC uses HTTP/2 as its transport, enabling Layer 7 load balancers to:

- Route by gRPC service and method (the HTTP/2 path, e.g. /package.Service/Method)
- Balance individual streams rather than whole connections
- Handle trailers and propagate deadlines
HTTP/2 introduces multiplexing—multiple requests over a single connection. Layer 7 load balancers must balance individual streams rather than connections, manage HPACK header compression state, and translate to HTTP/1.1 for backends that do not speak HTTP/2.
HTTP/3 (QUIC-based) adds further complexity: it runs over UDP rather than TCP, and QUIC connection IDs, which survive client address changes, replace the 4-tuple for connection tracking.
| Protocol | Routing Capabilities | Special Considerations |
|---|---|---|
| HTTP/1.1 | Full (path, headers, method, body) | Connection: keep-alive management |
| HTTP/2 | Full with stream awareness | Multiplexing, HPACK compression |
| HTTP/3 | Full but emerging | UDP, QUIC connection IDs |
| WebSocket | Initial handshake only | Long-lived connections, sticky required |
| gRPC | Service and method routing | Streaming, trailers, deadline propagation |
| GraphQL | Query/mutation parsing possible | Complex—typically via plugins |
Some Layer 7 load balancers can translate between protocols, for example accepting HTTP/1.1 from clients while speaking HTTP/2 to backends, or transcoding between gRPC and JSON/REST.
This enables gradual protocol migrations and supports clients that cannot use newer protocols.
Using HTTP/2 between the load balancer and backends can significantly improve performance by enabling connection multiplexing. Instead of maintaining large connection pools, a single HTTP/2 connection can carry hundreds of concurrent requests. This is particularly valuable in microservice architectures with high request rates between services.
Layer 7 load balancing provides application-aware traffic management, enabling routing decisions and transformations impossible at the transport layer. This intelligence comes at a cost—connection termination, protocol parsing, and potential TLS overhead—but delivers capabilities essential for modern application architectures.
What's next:
With Layer 4 and Layer 7 fundamentals established, the next page examines the performance vs. flexibility trade-off—quantifying the overhead of Layer 7 processing and establishing decision criteria for when each layer is appropriate.
You now understand how Layer 7 load balancing leverages application-layer awareness to deliver intelligent traffic routing, transformation, and management. This positions you to evaluate the trade-offs between Layer 4 and Layer 7 approaches in different scenarios.