Layer 4 vs Layer 7 Load Balancing

Level: Intermediate

Duration: 75 mins

Topic: Layer 4 vs Layer 7 Load Balancing

Hybrid Approaches

The Best of Both Worlds

Real-world production systems rarely fit neatly into a "Layer 4 only" or "Layer 7 only" category. The most sophisticated infrastructures—those powering hyperscale cloud providers, global content delivery networks, and the world's largest web properties—employ hybrid architectures that strategically layer L4 and L7 components.

These hybrid approaches don't just combine layers arbitrarily; they're designed to leverage each layer's strengths at the appropriate point in the request path. When architected thoughtfully, hybrid systems deliver performance approaching Layer 4 limits while retaining the intelligence and flexibility of Layer 7.

What You Will Learn

By the end of this page, you will understand the major hybrid architectural patterns used in production, how cloud providers implement multi-layer load balancing, the design principles that guide hybrid architecture decisions, and implementation considerations for building robust hybrid systems.

The Case for Hybrid Architectures

Hybrid architectures emerge from the recognition that different parts of the traffic path have different requirements. The entry point to your infrastructure has different constraints than the internal service mesh.

Why Not Pure Layer 4?

Pure Layer 4 architectures struggle with:

HTTP routing: Cannot route by path, header, or host
TLS termination: Each backend must manage its own certificates
Observability: Limited to connection-level metrics
Deployment strategies: No canary, A/B, or weighted routing
Security: No WAF, limited rate limiting options

Why Not Pure Layer 7?

Pure Layer 7 architectures face challenges with:

Scalability of L7 tier: L7 proxies require more resources per connection
Non-HTTP protocols: Cannot handle TCP/UDP services efficiently
Latency accumulation: Multiple L7 hops add up
TLS passthrough: Cannot support scenarios requiring end-to-end client encryption
Cost: Roughly 4-8x more compute for equivalent throughput, depending on workload and enabled features

Hybrid Architecture Motivations
Challenge	Single-Layer Solution	Hybrid Solution
L7 scalability	Over-provision L7	L4 distributes to L7 pool
Mixed protocols	Separate infrastructures	L4 for non-HTTP, L7 for HTTP
TLS passthrough + routing	Compromise on one	L4 for passthrough, L7 for termination
Geographic distribution	DNS only	L4 Anycast + L7 routing
Cost optimization	Accept higher cost	Route traffic to appropriate layer

The Hybrid Principle

Use each layer for what it does best:

Layer 4: High-throughput distribution, TCP/UDP handling, geographic routing, health of L7 tier
Layer 7: Content routing, TLS termination, application features, observability, security

The goal is to minimize total latency while maximizing capability—pushing L4's efficiency as far as possible before adding L7 intelligence only where needed.

The 80/20 Rule for Hybrid

In most web architectures, 80%+ of traffic is HTTP/HTTPS that benefits from L7 features. The hybrid architecture ensures this traffic gets L7 treatment while the remaining 20% (databases, custom protocols, latency-critical paths) gets optimized L4 handling.

Pattern: Layer 4 in Front of Layer 7

The most common hybrid pattern: a Layer 4 tier that distributes traffic across a pool of Layer 7 proxies. This architecture combines L4's distribution efficiency with L7's intelligence.

Architecture Overview

Internet → DNS → L4 Load Balancer(s) → L7 Proxy Pool → Services

Layer 4 tier responsibilities:

Distribute connections across L7 instances using consistent hashing or round-robin
Health check L7 proxies (TCP health or simple HTTP health endpoint)
Provide high availability for the L7 tier
Optionally handle TLS passthrough (SNI-based routing to L7 instances)

Layer 7 tier responsibilities:

TLS termination (with L4 passthrough, this is the only point where TLS is terminated)
Content-based routing (path, header, cookie)
Request/response transformation
Caching, compression, security features
Detailed observability and logging

Benefits of L4 + L7 Architecture

Horizontal scaling of L7: Add more L7 proxies; L4 distributes automatically
L7 high availability: L4 health checks detect L7 failures and remove failed instances from rotation
Zero-downtime L7 updates: Roll L7 instances one at a time; L4 drains connections
Resource isolation: L7 instances can be sized independently
Multi-tenancy: Different L7 pools for different workloads (production vs staging)

Implementation Considerations

Connection distribution:

Use consistent hashing on client IP for sticky sessions without L7 cookie tracking
Or use round-robin with L7-level session handling
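To make the consistent-hashing option concrete, here is a minimal Python sketch of a hash ring that maps client IPs to L7 proxies. The class name `HashRing`, the virtual-node count, and the proxy names are illustrative assumptions; production L4 balancers implement this in their own data plane, often with different algorithms.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping client IPs to L7 proxy names (illustrative)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes   # virtual nodes per proxy smooth the distribution
        self._keys = []         # sorted ring positions
        self._owners = {}       # ring position -> proxy name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as an integer ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            self._owners[pos] = node
            bisect.insort(self._keys, pos)

    def remove(self, node):
        self._keys = [k for k in self._keys if self._owners[k] != node]
        self._owners = {k: v for k, v in self._owners.items() if v != node}

    def pick(self, client_ip):
        if not self._keys:
            raise LookupError("no L7 proxies in the ring")
        # First ring position clockwise from the client's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(client_ip)) % len(self._keys)
        return self._owners[self._keys[idx]]


ring = HashRing(["nginx1", "nginx2", "nginx3"])
print(ring.pick("203.0.113.7"))  # same proxy every time for this client IP
```

The useful property is that removing one proxy only remaps the clients that were assigned to it; everyone else keeps their existing proxy, which plain modulo hashing does not guarantee.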

Health checking:

L4 should check a dedicated health endpoint on L7, not the application
Health endpoint responds 200 if L7 is ready; 503 during draining/startup

Connection draining:

When updating L7, fail health check to stop new connections from L4
L7 continues serving existing connections until graceful shutdown timeout
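The health-check and draining behavior described above can be sketched as a small readiness endpoint that runs next to the L7 proxy. This is a sketch under assumptions: the `/healthz` path, port 8080, and the `Readiness` class are hypothetical names, and real proxies usually expose an equivalent endpoint themselves.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Readiness:
    """Thread-safe lifecycle flag: starting -> ready -> draining."""

    STATES = ("starting", "ready", "draining")

    def __init__(self):
        self._state = "starting"
        self._lock = threading.Lock()

    def set(self, state):
        if state not in self.STATES:
            raise ValueError(f"unknown state: {state}")
        with self._lock:
            self._state = state

    def status_code(self):
        # 200 only when ready; 503 while starting up or draining.
        with self._lock:
            return 200 if self._state == "ready" else 503


READINESS = Readiness()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical path; anything else is a 404.
        code = READINESS.status_code() if self.path == "/healthz" else 404
        self.send_response(code)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep frequent health probes out of the log


def serve_health(port=8080):
    """Blocking call; run it in a background thread next to the proxy."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

During a rollout, the deploy tooling would call `READINESS.set("draining")`, wait for the L4 tier to mark the instance down, and only then stop the proxy once its graceful shutdown timeout has passed. Note that the L4 tier must use an HTTP check to see the 503; a plain TCP connect check would still succeed.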

l4-l7-hybrid.conf

HAProxy

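# HAProxy L4 Configuration - Frontend to L7 Pool

global
    maxconn 500000

defaults
    mode tcp
    timeout connect 5s
    timeout client 30s
    timeout server 30s

# L4 frontend - distributes to L7 pool
frontend tcp_front
    bind *:443
    default_backend l7_pool

# L7 proxy pool
backend l7_pool
    balance roundrobin
    # TCP connect check; use "option httpchk" instead if the check must
    # distinguish a 200 from a 503 on the health endpoint
    option tcp-check

    # Check timing; default-server only applies to server lines that follow it
    default-server inter 3s fall 3 rise 2

    # Health check L7 instances on dedicated port
    server nginx1 10.0.1.10:443 check port 8080
    server nginx2 10.0.1.11:443 check port 8080
    server nginx3 10.0.1.12:443 check port 8080

    # Connection draining: a server that fails its check gets no new
    # connections; established ones continue until they close or time out
    # (they are only killed if "on-marked-down shutdown-sessions" is set)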

Cloud Provider Implementation

AWS NLB in front of ALB is a common implementation of this pattern. NLB provides static IPs, ultra-high throughput, and WebSocket support, while ALB provides path routing, WAF integration, and HTTP/2 support. GCP and Azure offer similar combinations with their respective load balancer types.

Pattern: Anycast Layer 4 + Regional Layer 7

For global infrastructure, combining Anycast at Layer 4 with regional Layer 7 processing enables both geographic optimization and content intelligence.

How Anycast Works

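Anycast is a routing technique where the same IP address is advertised from multiple locations. BGP then delivers each client's packets to the advertising location that is closest in routing terms (the best BGP path), which is usually, but not always, the geographically nearest one.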

Single IP: 192.0.2.1
Advertised from: Singapore, Frankfurt, Virginia, São Paulo

Client in Tokyo → routes to Singapore instance
Client in London → routes to Frankfurt instance
Client in NYC → routes to Virginia instance
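As a rough mental model, the selection above can be sketched in Python as "pick the advertising PoP with the shortest AS path". The path lengths below are invented for illustration, and real BGP best-path selection weighs several attributes before AS-path length, so treat this as a toy model only.

```python
def select_pop(as_path_lengths, withdrawn=frozenset()):
    """Pick the PoP with the shortest AS path among those still advertising.

    as_path_lengths: dict of PoP name -> AS-path length as seen by the client's network.
    withdrawn: PoPs that have stopped advertising the Anycast prefix.
    Ties are broken alphabetically to keep the result deterministic.
    """
    candidates = {pop: n for pop, n in as_path_lengths.items() if pop not in withdrawn}
    if not candidates:
        raise LookupError("no PoP is advertising the prefix")
    return min(sorted(candidates), key=candidates.get)


# Hypothetical view from a Tokyo client's network.
tokyo_view = {"Singapore": 2, "Frankfurt": 5, "Virginia": 4, "Sao Paulo": 6}
print(select_pop(tokyo_view))
print(select_pop(tokyo_view, withdrawn={"Singapore"}))
```

The second call shows why withdrawing a route shifts traffic: once Singapore stops advertising, the same client lands on the next-best PoP without any DNS change.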

Anycast + L7 Architecture

Global Anycast IP: Single IP routed to nearest PoP (Point of Presence)
L4 at PoP: Accepts client connections, forwards them to the regional L7 tier, and health checks that tier
Regional L7: TLS termination, routing, caching
Origin: Backend services (may be centralized or distributed)

Benefits of Anycast + L7

Minimized latency: Clients usually connect to a nearby PoP (nearest by BGP path, which typically tracks geography)
DDoS absorption: Attack traffic distributed across all PoPs
Regional processing: TLS termination close to users
Edge caching: Cache responses at each PoP
Simple client configuration: Single IP, no geographic DNS complexity

CDN Architecture Example

Content Delivery Networks are the canonical example of Anycast + L7:

1. Client resolves cdn.example.com → Anycast IP 198.51.100.1
2. BGP routes to nearest PoP (e.g., Singapore)
3. Singapore L4 receives connection, forwards to L7
4. Singapore L7 checks cache:
   - Cache hit: Serve immediately
   - Cache miss: Fetch from origin, cache, serve
5. Future requests for same content served from Singapore cache
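The cache hit and miss branches in step 4 can be sketched as a small TTL cache in Python. `EdgeCache`, the `fetch_origin` callback, and the 60-second TTL are assumptions made for illustration; real CDN caches also honor `Cache-Control` headers, vary keys, and eviction limits.

```python
import time


class EdgeCache:
    """Per-PoP cache: serve from memory on a hit, fetch from origin on a miss."""

    def __init__(self, fetch_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_origin   # callable(path) -> response body
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store = {}             # path -> (expires_at, body)

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"
        # Miss or expired entry: fetch from origin, cache, then serve.
        body = self._fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body, "MISS"


cache = EdgeCache(lambda path: f"origin body for {path}")
print(cache.get("/logo.png"))  # first request goes to origin
print(cache.get("/logo.png"))  # repeat request is served from the PoP
```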

Anycast Challenges

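BGP convergence: Route changes during network events can redirect a client's packets to a different PoP mid-connection, which breaks the TCP connection because the new PoP holds no state for it. Mitigation: fast client reconnects, plus TLS session tickets that are valid across PoPs so resumption stays cheap.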

Uneven distribution: BGP doesn't balance load—it routes to "nearest" by AS path metrics, which may not reflect capacity. Mitigation: Withdraw routes when overloaded, use BGP communities for traffic engineering.

Debugging complexity: Same IP maps to different infrastructure in different locations. Mitigation: Include PoP identifier in responses, robust distributed logging.

When to Use Anycast

Anycast suits global, latency-sensitive services with geographically distributed users. If your users are in one region, a single regional deployment is simpler. Anycast shines for CDNs, DNS infrastructure, DDoS protection, and global APIs where every millisecond of latency matters.

Pattern: Protocol-Based Traffic Split

Organizations with diverse protocols route different traffic types through different layers based on protocol requirements.

Architecture Overview

┌─────────────────────────────────────────────────┐
│                  DNS Resolution                  │
└─────────────────────────────────────────────────┘
           ↓                    ↓                  ↓
     api.example.com      db.example.com     game.example.com
           ↓                    ↓                  ↓
    ┌───────────┐        ┌───────────┐       ┌───────────┐
    │ L7 Proxy  │        │ L4 LB     │       │ L4 LB     │
    │ (HTTP/S)  │        │ (TCP/5432)│       │ (UDP/7777)│
    └───────────┘        └───────────┘       └───────────┘
           ↓                    ↓                  ↓
      Web/API              PostgreSQL         Game Servers

Protocol Categories

Protocol Routing Strategy
Protocol Type	Layer	Example Services	Key Features Used
HTTP/HTTPS	Layer 7	Web apps, REST APIs	Path routing, TLS, caching, WAF
gRPC	Layer 7	Internal services	Method routing, streaming, health
WebSocket	L7 or L4	Real-time features	Upgrade handling, sticky sessions
Database (SQL)	Layer 4	PostgreSQL, MySQL	Connection pooling, read replicas
Cache (Redis)	Layer 4	Redis, Memcached	High-throughput, low-latency
Message Queue	Layer 4	Kafka, RabbitMQ	Protocol-specific, persistent
Gaming/Custom UDP	Layer 4	Game servers	Ultra-low latency, packet-level

Unified Entry Point Approach

Some organizations want a single entry point that routes by protocol:

Single L4 frontend inspects initial packets:

TCP + TLS ClientHello with http/1.1 ALPN → Route to L7 pool
TCP + TLS ClientHello with h2 ALPN → Route to L7 pool
TCP + PostgreSQL startup → Route to DB pool
UDP → Route to gaming servers

This requires sophisticated L4 that can parse enough of the connection to determine protocol.
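A minimal Python sketch of that first-bytes inspection is shown below. It only distinguishes a TLS ClientHello from a PostgreSQL startup or SSLRequest message; it does not parse the ALPN extension, and UDP is separated by socket type rather than by payload. The pool names and the `route` helper are hypothetical.

```python
import struct

PG_PROTOCOL_3_0 = 196608    # StartupMessage, protocol version 3.0
PG_SSL_REQUEST = 80877103   # SSLRequest code sent before a TLS upgrade

# Hypothetical pool names for illustration.
POOLS = {"tls": "l7_pool", "postgres": "db_pool"}


def classify_first_bytes(data: bytes) -> str:
    """Guess the protocol from the first bytes a TCP client sends."""
    # TLS record: type 0x16 (handshake), major version 0x03,
    # then a handshake message of type 0x01 (ClientHello) at offset 5.
    if len(data) >= 6 and data[0] == 0x16 and data[1] == 0x03 and data[5] == 0x01:
        return "tls"
    # PostgreSQL: 4-byte big-endian length, then a 4-byte protocol/request code.
    if len(data) >= 8:
        length, code = struct.unpack("!II", data[:8])
        if length >= 8 and code in (PG_PROTOCOL_3_0, PG_SSL_REQUEST):
            return "postgres"
    return "unknown"


def route(data: bytes) -> str:
    """Map the detected protocol to a pool; unknown traffic is rejected."""
    return POOLS.get(classify_first_bytes(data), "reject")


print(route(struct.pack("!II", 8, PG_SSL_REQUEST)))
```

Detection like this has to buffer the first packet before choosing a backend, which is part of why the port-based approach is simpler to operate.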

Implementation with Port-Based Routing

Simpler approach: Separate ports for different protocols:

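# Kubernetes Service definition: one LoadBalancer Service exposing several ports.
# All ports share one selector, so the selected pods (a proxy tier here; the
# label is an assumption) must forward each port to the right pool.
# Mixing TCP and UDP in one LoadBalancer Service depends on cluster version
# and cloud provider support.
---
apiVersion: v1
kind: Service
metadata:
  name: unified-entry
spec:
  type: LoadBalancer
  selector:
    app: edge-proxy  # assumed label of the proxy pods
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443  # → L7 pool
    - name: database
      protocol: TCP
      port: 5432
      targetPort: 5432  # → L4 to DB pool
    - name: redis
      protocol: TCP
      port: 6379
      targetPort: 6379  # → L4 to Redis
    - name: game
      protocol: UDP
      port: 7777
      targetPort: 7777  # → L4 to game servers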

protocol-routing.yaml
YAML
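# Envoy Protocol-Based Routing Configuration
# Cluster definitions (web_cluster, postgres_cluster, game_servers) are omitted.
static_resources:
  listeners:
    # HTTP/HTTPS → L7 Processing
    - name: https_listener
      address:
        socket_address: {address: 0.0.0.0, port_value: 443}
      filter_chains:
        - transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                alpn_protocols: ["h2", "http/1.1"]
                tls_certificates:
                  - certificate_chain: {filename: /etc/envoy/tls/cert.pem}  # example path
                    private_key: {filename: /etc/envoy/tls/key.pem}         # example path
          filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_https
                route_config:
                  # Full L7 routing config here; a minimal catch-all is shown
                  virtual_hosts:
                    - name: default
                      domains: ["*"]
                      routes:
                        - match: {prefix: "/"}
                          route: {cluster: web_cluster}
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

    # Database → L4 Passthrough
    - name: postgres_listener
      address:
        socket_address: {address: 0.0.0.0, port_value: 5432}
      filter_chains:
        - filters:
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: postgres_tcp
                cluster: postgres_cluster

    # Gaming → L4 UDP (the UDP proxy is a listener filter, not a filter chain)
    - name: game_listener
      address:
        socket_address: {address: 0.0.0.0, port_value: 7777, protocol: UDP}
      listener_filters:
        - name: envoy.filters.udp_listener.udp_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig
            stat_prefix: game_udp
            matcher:
              on_no_match:
                action:
                  name: route
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.Route
                    cluster: game_servers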

Operational Simplicity vs Capability

A unified proxy handling all protocols (like Envoy) simplifies operations—one technology to master, one configuration language, unified metrics. However, specialized proxies (HAProxy for TCP, NGINX for HTTP) may outperform generalists for specific protocols. Balance operational simplicity against performance requirements.

Cloud Provider Hybrid Architectures

Major cloud providers offer both L4 and L7 load balancing services designed to work together. Understanding these options enables effective hybrid architectures in cloud environments.

AWS Hybrid Architecture

Network Load Balancer (NLB): Layer 4

Ultra-high performance (millions of requests/second)
Static IPs, Elastic IPs
TLS passthrough or termination
WebSocket and long-lived TCP support

Application Load Balancer (ALB): Layer 7

Path/host/header-based routing
TLS termination with certificate management (ACM)
WAF integration
HTTP/2 and gRPC support

Common pattern: NLB → ALB for static IPs with L7 features.

Cloud Provider Load Balancer Comparison
Provider	Layer 4 Product	Layer 7 Product	Hybrid Support
AWS	Network Load Balancer (NLB)	Application Load Balancer (ALB)	NLB → ALB for static IPs
GCP	Network LB / TCP Proxy LB	HTTP(S) LB	Unified Global LB supports both
Azure	Azure Load Balancer	Application Gateway	Frontend IP can use both
Cloudflare	Spectrum (L4)	CDN / Workers	Automatic protocol detection

GCP Global Load Balancing

Google Cloud offers a unique unified global load balancer that combines aspects of both L4 and L7:

External HTTP(S) LB (Premium Tier):

Global Anycast IP
Google's global network routes to nearest PoP
L7 routing with URL maps
CDN integration, Cloud Armor WAF
Backend service supports multi-region

TCP/UDP Load Balancer:

Regional or global scope
L4 passthrough
Health checking with TCP or HTTP

Azure Traffic Management

Azure Load Balancer: Layer 4 regional

High availability sets
Outbound NAT
Inbound NAT for management

Azure Application Gateway: Layer 7 regional

URL-based routing
WAF v2 integration
WebSocket support

Azure Front Door: Global L7

Global Anycast
Edge caching
WAF at edge
Acceleration for dynamic content

aws-hybrid-terraform.tf

HCL

# AWS Hybrid Architecture: NLB → ALB
# Provides static IPs (NLB) with L7 features (ALB)
 
# Network Load Balancer (L4) with static IPs
resource "aws_lb" "nlb" {
  name               = "nlb-frontend"
  internal           = false
  load_balancer_type = "network"
  subnets            = var.public_subnets
  
  enable_cross_zone_load_balancing = true
}
 
# NLB Listener - TCP passthrough to ALB (TLS is terminated by the ALB).
# Target groups of type "alb" require a TCP listener.
resource "aws_lb_listener" "nlb_443" {
  load_balancer_arn = aws_lb.nlb.arn
  port              = 443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.alb_target.arn
  }
}
 
# Target Group pointing to ALB
resource "aws_lb_target_group" "alb_target" {
  name        = "alb-target"
  port        = 443
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "alb"
}
 
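resource "aws_lb_target_group_attachment" "alb" {
  target_group_arn = aws_lb_target_group.alb_target.arn
  target_id        = aws_lb.alb.arn
  port             = 443

  # The ALB needs a listener on this port before it can be registered
  depends_on = [aws_lb_listener.alb_443]
}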
 
# Application Load Balancer (L7) - behind NLB
resource "aws_lb" "alb" {
  name               = "alb-backend"
  internal           = true  # Not public-facing
  load_balancer_type = "application"
  subnets            = var.private_subnets
  security_groups    = [aws_security_group.alb.id]
}
 
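# ALB routing rules (L7 features); an HTTPS listener needs a certificate
resource "aws_lb_listener" "alb_443" {
  load_balancer_arn = aws_lb.alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
}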
 
resource "aws_lb_listener_rule" "admin" {
  listener_arn = aws_lb_listener.alb_443.arn
  priority     = 100
  
  condition {
    path_pattern { values = ["/admin/*"] }
  }
  
  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.admin.arn
  }
}

Cloud-Managed vs Self-Managed

Cloud-managed load balancers (AWS ALB, GCP HTTP LB) provide operational simplicity, automatic scaling, and integration with cloud services. Self-managed proxies (NGINX, Envoy on VMs) provide more control and can be cheaper at very high scale. A common split is cloud-managed for general production traffic and self-managed for specialized needs.

Service Mesh Hybrid Patterns

Service meshes like Istio, Linkerd, and Consul Connect implement sophisticated hybrid patterns at the application level, providing L7 capabilities for internal (east-west) traffic while integrating with external (north-south) load balancing.

The Two-Tier Model

North-South tier (external traffic):

Cloud load balancer or reverse proxy
Terminates external TLS
Routes to mesh ingress

East-West tier (internal traffic):

Sidecar proxies (Envoy-based typically)
mTLS between services
L7 routing, observability, policy

Istio Architecture Example

Performance Considerations in Service Mesh

Service mesh adds L7 processing to every hop:

Latency per hop:

Envoy proxy: 0.5-2ms per sidecar
mTLS overhead: 0.1-0.5ms for handshake (amortized with connection reuse)

For a request traversing 4 services:

Ingress → A → B → C → D
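= 5 proxy hops (ingress gateway plus one inbound sidecar per service) × ~1ms ≈ 5ms added latency; counting each caller's outbound sidecar as well raises the total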
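The same arithmetic can be run across the quoted 0.5-2ms range to see the spread. This Python sketch counts one proxy per element of the path (the ingress gateway plus one inbound sidecar per service), which is a simplifying assumption.

```python
def added_latency_ms(proxy_hops, per_proxy_ms):
    """Total added latency when every proxy hop costs the same amount."""
    return proxy_hops * per_proxy_ms


path = ["ingress", "A", "B", "C", "D"]
hops = len(path)  # one proxy per element: ingress gateway + 4 inbound sidecars

low, typical, high = (added_latency_ms(hops, ms) for ms in (0.5, 1.0, 2.0))
print(f"{hops} proxy hops: {low}ms to {high}ms added, {typical}ms at 1ms per hop")
```

At the upper end of the range the added latency doubles to 10ms, which is why deep call chains are the first candidates for the mitigation strategies that follow.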

Mitigation strategies:

Connection pooling: Reuse connections to amortize TLS cost
Protocol optimization: HTTP/2 for multiplexing
Bypass for latency-critical: Some paths can bypass mesh
eBPF optimization: Cilium uses eBPF to reduce proxy hops

Hybrid Mesh Patterns

Selective mesh enrollment:

Enroll services that benefit from mesh features
Exclude latency-critical services (direct connection)
Exclude non-HTTP services (use L4 LB instead)

Ambient mesh (Istio):

L4 processing in shared node proxy
L7 processing only when needed (via waypoint proxy)
Reduces sidecar overhead while retaining capabilities

Service Mesh Traffic Categories
Traffic Type	Mesh Treatment	Rationale
External API	Through ingress gateway	TLS termination, routing, auth
Internal HTTP	Full mesh (sidecars)	mTLS, observability, policy
Internal gRPC	Full mesh (sidecars)	Streaming, deadline propagation
Database connections	L4 only or bypass	Binary protocol, low overhead needed
High-frequency internal	Consider bypass	When sub-ms latency critical

Mesh Complexity Trade-off

Service mesh provides powerful capabilities but adds operational complexity: proxy resource consumption, control plane management, configuration complexity, and debugging difficulty. Adopt incrementally, starting with observability (which requires minimal config), then adding mTLS, traffic management, and policy as needed.

Design Principles for Hybrid Architectures

Successful hybrid architectures follow consistent design principles that balance performance, capability, and operational simplicity.

Principle 1: Push Work to the Appropriate Layer

L4 should handle:

Initial connection distribution
Geographic/Anycast routing
Health monitoring of the L7 tier
Non-HTTP protocols
Ultra-low-latency requirements

L7 should handle:

Content-based routing
TLS termination (when inspection needed)
Authorization, authentication
Caching, compression
Observability, logging

Principle 2: Minimize Hops

Each layer adds latency. Design to minimize total hops:

Avoid: Client → L4 → L7 → L4 → Backend
Prefer: Client → L4 → L7 → Backend

Questions to ask:

Is each layer providing value?
Can layers be consolidated?
Are there paths that bypass unnecessary layers?

Hybrid Architecture Design Checklist

• Purpose: Every layer must have a clear, justifiable purpose
• Observability: Each layer should emit metrics visible in a unified system
• Health propagation: Health state should propagate through layers (unhealthy backend → L7 → L4)
• Failure isolation: Failure in one layer shouldn't cascade to others
• Independent scaling: Layers should scale independently based on their bottleneck
• Consistent identity: Request IDs and trace context should flow through all layers
• Draining support: All layers must support graceful connection draining

Principle 3: Plan for Failure

L4 failure:

Use HA pairs or cloud-managed L4
Stateless L4 allows instant failover
ECMP across multiple L4 instances

L7 failure:

Pool of L7 instances; L4 health checks remove failed instances
Connection draining for graceful shutdown
Session replication or externalized session state

Network partition:

L4 continues forwarding (no state to split)
L7 may lose consistency if state is distributed
Design L7 to be stateless where possible

Principle 4: Operational Simplicity

Configuration management:

Use infrastructure-as-code for all layers
Consistent configuration language where possible
Automated testing of configuration changes

Monitoring:

Unified dashboards showing all layers
Correlated metrics (L4 throughput + L7 request rate + backend health)
Alerting on each layer's role (L4: connection failures; L7: HTTP errors)

Debugging:

Request tracing through all layers
Unique request IDs generated at entry, propagated throughout
Logging at each layer with correlated IDs
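The request-ID practice above can be sketched in a few lines of Python. The `X-Request-ID` header name is a common convention assumed here rather than something the text prescribes, and the helper names are hypothetical. Because a pure L4 tier does not see HTTP headers, the ID is generated at the first L7 hop.

```python
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name (common convention)


def find_request_id(headers):
    """Return the existing request ID (case-insensitive lookup), or None."""
    for name, value in headers.items():
        if name.lower() == REQUEST_ID_HEADER.lower() and value:
            return value
    return None


def ensure_request_id(headers):
    """Copy the headers, generating a request ID only if none is present."""
    out = dict(headers)
    if find_request_id(out) is None:
        out[REQUEST_ID_HEADER] = uuid.uuid4().hex
    return out


def log_line(layer, headers, message):
    """Format a log line that can be correlated across layers by request ID."""
    rid = find_request_id(headers) or "-"
    return f"layer={layer} request_id={rid} msg={message}"


headers = ensure_request_id({"Host": "api.example.com"})  # entry point generates the ID
print(log_line("l7-edge", headers, "routed to api pool"))
print(log_line("backend", ensure_request_id(headers), "handled request"))  # ID is reused
```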

Start Simple, Add Layers Deliberately

The best architecture is the simplest one that meets requirements. Start with a single L7 layer (or cloud-managed LB). Add L4 only when you have specific needs: static IPs, protocol support, scale of L7 tier, or latency requirements that demand it. Complexity should be earned, not assumed.

Summary: Hybrid Approaches

Hybrid architectures combine Layer 4 and Layer 7 load balancing to achieve objectives impossible with either layer alone. When designed thoughtfully, they deliver the best of both worlds: L4's performance and L7's intelligence.

Key Takeaways

• Hybrid is the norm, not the exception: Production systems typically combine L4 and L7 strategically
• L4 in front of L7: The most common pattern, in which L4 distributes to a pool of L7 proxies for HA and scale
• Anycast + L7: Global distribution via Anycast with regional L7 processing, the CDN architecture pattern
• Protocol-based split: Different protocols route to appropriate layers (HTTP → L7, DB → L4)
• Cloud providers support hybrid: NLB + ALB, GCP Global LB, Azure Front Door all enable hybrid patterns
• Service mesh is internal hybrid: L7 processing for east-west traffic via sidecar proxies
• Design principles: Push work to appropriate layers, minimize hops, plan for failure, maintain simplicity

Module Complete:

You've now completed the comprehensive exploration of Layer 4 vs Layer 7 load balancing. You understand:

How each layer operates and its fundamental characteristics
The performance vs flexibility trade-off with quantified metrics
Concrete use cases guiding when to choose each layer
Hybrid architectural patterns that combine both layers effectively

This knowledge forms the foundation for designing load balancing architectures that are performant, intelligent, and operationally excellent.

Module Complete

You now possess expert-level understanding of Layer 4 vs Layer 7 load balancing. From transport-layer packet forwarding to application-layer content routing, from quantified performance trade-offs to production hybrid architectures—you're equipped to make informed load balancing decisions for any scale of system.

5 / 5

Loading learning content...

System Design HLDLayer 4 vs Layer 7 Load Balancing

Layer 4 vs Layer 7 Load Balancing

LevelIntermediate

Duration75 mins

TopicLayer 4 vs Layer 7 Load Balancing

5 / 5

Hybrid Approaches

The Best of Both Worlds

What You Will Learn

The Case for Hybrid Architectures

Why Not Pure Layer 4?

Pure Layer 4 architectures struggle with:

HTTP routing: Cannot route by path, header, or host
TLS termination: Each backend must manage its own certificates
Observability: Limited to connection-level metrics
Deployment strategies: No canary, A/B, or weighted routing
Security: No WAF, limited rate limiting options

Why Not Pure Layer 7?

Pure Layer 7 architectures face challenges with:

Scalability of L7 tier: L7 proxies require more resources per connection
Non-HTTP protocols: Cannot handle TCP/UDP services efficiently
Latency accumulation: Multiple L7 hops add up
TLS passthrough: Cannot support scenarios requiring end-to-end client encryption
Cost: 4-8x more compute for equivalent throughput

Hybrid Architecture Motivations
Challenge	Single-Layer Solution	Hybrid Solution
L7 scalability	Over-provision L7	L4 distributes to L7 pool
Mixed protocols	Separate infrastructures	L4 for non-HTTP, L7 for HTTP
TLS passthrough + routing	Compromise on one	L4 for passthrough, L7 for termination
Geographic distribution	DNS only	L4 Anycast + L7 routing
Cost optimization	Accept higher cost	Route traffic to appropriate layer

The Hybrid Principle

Use each layer for what it does best:

Layer 4: High-throughput distribution, TCP/UDP handling, geographic routing, health of L7 tier
Layer 7: Content routing, TLS termination, application features, observability, security

The goal is to minimize total latency while maximizing capability—pushing L4's efficiency as far as possible before adding L7 intelligence only where needed.

The 80/20 Rule for Hybrid

Pattern: Layer 4 in Front of Layer 7

The most common hybrid pattern: a Layer 4 tier that distributes traffic across a pool of Layer 7 proxies. This architecture combines L4's distribution efficiency with L7's intelligence.

Architecture Overview

Internet → DNS → L4 Load Balancer(s) → L7 Proxy Pool → Services

Layer 4 tier responsibilities:

Distribute connections across L7 instances using consistent hashing or round-robin
Health check L7 proxies (TCP health or simple HTTP health endpoint)
Provide high availability for the L7 tier
Optionally handle TLS passthrough (SNI-based routing to L7 instances)

Layer 7 tier responsibilities:

TLS termination (or second termination after L4 passthrough)
Content-based routing (path, header, cookie)
Request/response transformation
Caching, compression, security features
Detailed observability and logging

Converting Mermaid diagram...

Benefits of L4 + L7 Architecture

Horizontal scaling of L7: Add more L7 proxies; L4 distributes automatically
L7 high availability: L4 health checks detect L7 failures; removes from rotation
Zero-downtime L7 updates: Roll L7 instances one at a time; L4 drains connections
Resource isolation: L7 instances can be sized independently
Multi-tenancy: Different L7 pools for different workloads (production vs staging)

Implementation Considerations

Connection distribution:

Use consistent hashing on client IP for sticky sessions without L7 cookie tracking
Or use round-robin with L7-level session handling

Health checking:

L4 should check a dedicated health endpoint on L7, not the application
Health endpoint responds 200 if L7 is ready; 503 during draining/startup

Connection draining:

When updating L7, fail health check to stop new connections from L4
L7 continues serving existing connections until graceful shutdown timeout

l4-l7-hybrid.conf

HAProxy

# HAProxy L4 Configuration - Frontend to L7 Pool
 
global
    maxconn 500000
    
defaults
    mode tcp
    timeout connect 5s
    timeout client 30s
    timeout server 30s
 
# L4 frontend - distributes to L7 pool
frontend tcp_front
    bind *:443
    default_backend l7_pool
 
# L7 proxy pool
backend l7_pool
    balance roundrobin
    option tcp-check
    
    # Health check L7 instances on dedicated port
    server nginx1 10.0.1.10:443 check port 8080
    server nginx2 10.0.1.11:443 check port 8080
    server nginx3 10.0.1.12:443 check port 8080
    
    # Connection draining: wait up to 60s for connections to close
    default-server inter 3s fall 3 rise 2 drain-timeout 60s

Cloud Provider Implementation

Pattern: Anycast Layer 4 + Regional Layer 7

For global infrastructure, combining Anycast at Layer 4 with regional Layer 7 processing enables both geographic optimization and content intelligence.

How Anycast Works

Anycast is a routing technique where the same IP address is advertised from multiple locations. BGP routing ensures clients connect to the nearest advertising location.

Single IP: 192.0.2.1
Advertised from: Singapore, Frankfurt, Virginia, São Paulo

Client in Tokyo → routes to Singapore instance
Client in London → routes to Frankfurt instance
Client in NYC → routes to Virginia instance

Anycast + L7 Architecture

Global Anycast IP: Single IP routed to nearest PoP (Point of Presence)
L4 at PoP: Terminates TCP, health checks regional L7
Regional L7: TLS termination, routing, caching
Origin: Backend services (may be centralized or distributed)

Converting Mermaid diagram...

Benefits of Anycast + L7

Minimized latency: Clients connect to geographically nearest PoP
DDoS absorption: Attack traffic distributed across all PoPs
Regional processing: TLS termination close to users
Edge caching: Cache responses at each PoP
Simple client configuration: Single IP, no geographic DNS complexity

CDN Architecture Example

Content Delivery Networks are the canonical example of Anycast + L7:

1. Client resolves cdn.example.com → Anycast IP 198.51.100.1
2. BGP routes to nearest PoP (e.g., Singapore)
3. Singapore L4 receives connection, forwards to L7
4. Singapore L7 checks cache:
   - Cache hit: Serve immediately
   - Cache miss: Fetch from origin, cache, serve
5. Future requests for same content served from Singapore cache

Anycast Challenges

BGP convergence: Route changes during network events can redirect clients mid-connection. Mitigation: TCP connection recovery, session tickets that work across PoPs.

Debugging complexity: Same IP maps to different infrastructure in different locations. Mitigation: Include PoP identifier in responses, robust distributed logging.

When to Use Anycast

Pattern: Protocol-Based Traffic Split

Organizations with diverse protocols route different traffic types through different layers based on protocol requirements.

Architecture Overview

┌─────────────────────────────────────────────────┐
│                  DNS Resolution                  │
└─────────────────────────────────────────────────┘
           ↓                    ↓                  ↓
     api.example.com      db.example.com     game.example.com
           ↓                    ↓                  ↓
    ┌───────────┐        ┌───────────┐       ┌───────────┐
    │ L7 Proxy  │        │ L4 LB     │       │ L4 LB     │
    │ (HTTP/S)  │        │ (TCP/5432)│       │ (UDP/7777)│
    └───────────┘        └───────────┘       └───────────┘
           ↓                    ↓                  ↓
      Web/API              PostgreSQL         Game Servers

Protocol Categories

Protocol Routing Strategy
Protocol Type	Layer	Example Services	Key Features Used
HTTP/HTTPS	Layer 7	Web apps, REST APIs	Path routing, TLS, caching, WAF
gRPC	Layer 7	Internal services	Method routing, streaming, health
WebSocket	L7 or L4	Real-time features	Upgrade handling, sticky sessions
Database (SQL)	Layer 4	PostgreSQL, MySQL	Connection pooling, read replicas
Cache (Redis)	Layer 4	Redis, Memcached	High-throughput, low-latency
Message Queue	Layer 4	Kafka, RabbitMQ	Protocol-specific, persistent
Gaming/Custom UDP	Layer 4	Game servers	Ultra-low latency, packet-level

Unified Entry Point Approach

Some organizations want a single entry point that routes by protocol:

Single L4 frontend inspects initial packets:

TCP + TLS ClientHello with ALPN http/1.1 → Route to L7 pool
TCP + TLS ClientHello with ALPN h2 → Route to L7 pool
TCP + PostgreSQL startup → Route to DB pool
UDP → Route to gaming servers

This requires a sophisticated L4 frontend that can parse enough of a connection's first bytes to determine the protocol.
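
As a rough sketch of that kind of inspection, the function below classifies the first bytes of a TCP stream. It only distinguishes a TLS ClientHello from a PostgreSQL startup or SSLRequest message and does not parse ALPN, which would require walking the ClientHello extensions. The function name and the pool names are hypothetical.

```python
import struct

PG_PROTOCOL_V3 = 196608     # 3 << 16: StartupMessage protocol version 3.0
PG_SSL_REQUEST = 80877103   # request code sent in a PostgreSQL SSLRequest

# Hypothetical pool names, used only for illustration
POOLS = {"tls": "l7_pool", "postgres": "db_pool"}


def classify_first_bytes(data: bytes) -> str:
    """Guess the protocol from the first bytes of a TCP stream.

    Returns 'tls', 'postgres', or 'unknown'.
    """
    # TLS record header: type 0x16 (handshake), major version 0x03,
    # two version/length bytes, then handshake type 0x01 (ClientHello).
    if len(data) >= 6 and data[0] == 0x16 and data[1] == 0x03 and data[5] == 0x01:
        return "tls"
    # PostgreSQL: Int32 length followed by Int32 version or request code.
    if len(data) >= 8:
        _length, code = struct.unpack("!II", data[:8])
        if code in (PG_PROTOCOL_V3, PG_SSL_REQUEST):
            return "postgres"
    return "unknown"


tls_hello = bytes([0x16, 0x03, 0x01, 0x00, 0x05, 0x01, 0x00, 0x00, 0x01, 0x00])
pg_ssl_request = struct.pack("!II", 8, PG_SSL_REQUEST)
plain_http = b"GET / HTTP/1.1\r\n"
```

A frontend built this way has to buffer the first few bytes before choosing a pool, which is part of why the port-based approach is simpler.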

Implementation with Port-Based Routing

Simpler approach: Separate ports for different protocols:

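# Kubernetes Service definition
# A single Service sends every port to the same selected pods, so the pods
# are assumed here to be a proxy tier that forwards each port onward.
# Mixing TCP and UDP on one LoadBalancer Service needs a recent Kubernetes
# version (stable since 1.26) and a cloud provider that supports it.
---
apiVersion: v1
kind: Service
metadata:
  name: unified-entry
spec:
  type: LoadBalancer
  selector:
    app: edge-proxy  # hypothetical label for the proxy pods
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443  # → L7 pool
    - name: database
      protocol: TCP
      port: 5432
      targetPort: 5432  # → L4 to DB pool
    - name: redis
      protocol: TCP
      port: 6379
      targetPort: 6379  # → L4 to Redis
    - name: game
      protocol: UDP
      port: 7777
      targetPort: 7777  # → L4 to game servers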

protocol-routing.yaml
YAML
# Envoy Protocol-Based Routing Configuration
static_resources:
  listeners:
    # HTTP/HTTPS → L7 Processing
    - name: https_listener
      address:
        socket_address: {address: 0.0.0.0, port_value: 443}
      filter_chains:
        - transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                # tls_certificates (or an SDS config) omitted here
                alpn_protocols: ["h2", "http/1.1"]
          filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_https  # required field
                route_config:
                  # Full L7 routing config here
    
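    # Database → L4 Passthrough
    - name: postgres_listener
      address:
        socket_address: {address: 0.0.0.0, port_value: 5432}
      filter_chains:
        - filters:
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: postgres_tcp  # required field
                cluster: postgres_cluster
    
    # Gaming → L4 UDP
    # UDP listeners use listener_filters rather than filter_chains
    - name: game_listener
      address:
        socket_address: {address: 0.0.0.0, port_value: 7777, protocol: UDP}
      listener_filters:
        - name: envoy.filters.udp_listener.udp_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig
            stat_prefix: game_udp  # required field
            matcher:
              on_no_match:
                action:
                  name: route
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.Route
                    cluster: game_servers

  # The postgres_cluster and game_servers clusters are defined elsewhere
  # under static_resources.clusters.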

Operational Simplicity vs Capability

Cloud Provider Hybrid Architectures

Major cloud providers offer both L4 and L7 load balancing services designed to work together. Understanding these options enables effective hybrid architectures in cloud environments.

AWS Hybrid Architecture

Network Load Balancer (NLB): Layer 4

Ultra-high performance (millions of requests/second)
Static IPs, Elastic IPs
TLS passthrough or termination
WebSocket and long-lived TCP support

Application Load Balancer (ALB): Layer 7

Path/host/header-based routing
TLS termination with certificate management (ACM)
WAF integration
HTTP/2 and gRPC support

Common pattern: NLB → ALB for static IPs with L7 features.

Cloud Provider Load Balancer Comparison
Provider	Layer 4 Product	Layer 7 Product	Hybrid Support
AWS	Network Load Balancer (NLB)	Application Load Balancer (ALB)	NLB → ALB for static IPs
GCP	Network LB / TCP Proxy LB	HTTP(S) LB	Unified Global LB supports both
Azure	Azure Load Balancer	Application Gateway	Frontend IP can use both
Cloudflare	Spectrum (L4)	CDN / Workers	Automatic protocol detection

GCP Global Load Balancing

Google Cloud offers a unique unified global load balancer that combines aspects of both L4 and L7:

External HTTP(S) LB (Premium Tier):

Global Anycast IP
Google's global network routes to nearest PoP
L7 routing with URL maps
CDN integration, Cloud Armor WAF
Backend service supports multi-region

TCP/UDP Load Balancer:

Regional or global scope
L4 passthrough
Health checking with TCP or HTTP

Azure Traffic Management

Azure Load Balancer: Layer 4 regional

High availability sets
Outbound NAT
Inbound NAT for management

Azure Application Gateway: Layer 7 regional

URL-based routing
WAF v2 integration
WebSocket support

Azure Front Door: Global L7

Global Anycast
Edge caching
WAF at edge
Acceleration for dynamic content

aws-hybrid-terraform.tf

HCL

# AWS Hybrid Architecture: NLB → ALB
# Provides static IPs (NLB) with L7 features (ALB)
 
# Network Load Balancer (L4) with static IPs
resource "aws_lb" "nlb" {
  name               = "nlb-frontend"
  internal           = false
  load_balancer_type = "network"
  subnets            = var.public_subnets
  
  enable_cross_zone_load_balancing = true
}
 
# NLB Listener - TCP passthrough to ALB (TLS is terminated at the ALB)
# ALB-type target groups use TCP, so the NLB forwards the encrypted stream unchanged
resource "aws_lb_listener" "nlb_443" {
  load_balancer_arn = aws_lb.nlb.arn
  port              = 443
  protocol          = "TCP"
  
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.alb_target.arn
  }
}
 
# Target Group pointing to ALB
resource "aws_lb_target_group" "alb_target" {
  name        = "alb-target"
  port        = 443
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "alb"
}
 
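resource "aws_lb_target_group_attachment" "alb" {
  target_group_arn = aws_lb_target_group.alb_target.arn
  target_id        = aws_lb.alb.arn
  port             = 443

  # The ALB needs a listener on this port before it can be registered
  depends_on = [aws_lb_listener.alb_443]
}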
 
# Application Load Balancer (L7) - behind NLB
resource "aws_lb" "alb" {
  name               = "alb-backend"
  internal           = true  # Not public-facing
  load_balancer_type = "application"
  subnets            = var.private_subnets
  security_groups    = [aws_security_group.alb.id]
}
 
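# ALB routing rules (L7 features)
# An HTTPS listener terminates TLS, so it needs a certificate
resource "aws_lb_listener" "alb_443" {
  load_balancer_arn = aws_lb.alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.certificate_arn
  
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
}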
 
resource "aws_lb_listener_rule" "admin" {
  listener_arn = aws_lb_listener.alb_443.arn
  priority     = 100
  
  condition {
    path_pattern { values = ["/admin/*"] }
  }
  
  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.admin.arn
  }
}

Cloud-Managed vs Self-Managed

Service Mesh Hybrid Patterns

The Two-Tier Model

North-South tier (external traffic):

Cloud load balancer or reverse proxy
Terminates external TLS
Routes to mesh ingress

East-West tier (internal traffic):

Sidecar proxies (Envoy-based typically)
mTLS between services
L7 routing, observability, policy

Istio Architecture Example

[Diagram not rendered: Istio architecture example]

Performance Considerations in Service Mesh

Service mesh adds L7 processing to every hop:

Latency per hop:

Envoy proxy: 0.5-2ms per sidecar
mTLS overhead: 0.1-0.5ms for handshake (amortized with connection reuse)

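For a request traversing 4 services:

Ingress → A → B → C → D
= 5 proxy hops × 1ms = ~5ms added latency

This counts one proxy per service plus the ingress gateway. In a sidecar mesh, each service-to-service call also passes through the caller's outbound sidecar, so treat ~5ms as a lower-bound estimate.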
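
The same arithmetic can be turned into a quick budget check using the 0.5-2ms per-proxy range quoted above. This is a back-of-the-envelope sketch that assumes every hop costs the same; the function name is illustrative.

```python
def added_latency_ms(proxy_hops: int, per_hop_ms: float) -> float:
    """Latency added by proxies, assuming each hop costs the same."""
    return proxy_hops * per_hop_ms


# Ingress -> A -> B -> C -> D, counted as 5 proxy hops as in the text
hops = 5
low = added_latency_ms(hops, 0.5)      # best case per proxy
typical = added_latency_ms(hops, 1.0)  # the ~1ms figure used above
high = added_latency_ms(hops, 2.0)     # worst case per proxy

print(f"added latency: {low}-{high} ms (typical ~{typical} ms)")
```

Comparing the high end of the range against a service's latency budget is a simple way to decide whether a path is a candidate for the mitigation strategies that follow.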

Mitigation strategies:

Connection pooling: Reuse connections to amortize TLS cost
Protocol optimization: HTTP/2 for multiplexing
Bypass for latency-critical: Some paths can bypass mesh
eBPF optimization: Cilium uses eBPF to reduce proxy hops

Hybrid Mesh Patterns

Selective mesh enrollment:

Enroll services that benefit from mesh features
Exclude latency-critical services (direct connection)
Exclude non-HTTP services (use L4 LB instead)

Ambient mesh (Istio):

L4 processing in shared node proxy
L7 processing only when needed (via waypoint proxy)
Reduces sidecar overhead while retaining capabilities

Service Mesh Traffic Categories
Traffic Type	Mesh Treatment	Rationale
External API	Through ingress gateway	TLS termination, routing, auth
Internal HTTP	Full mesh (sidecars)	mTLS, observability, policy
Internal gRPC	Full mesh (sidecars)	Streaming, deadline propagation
Database connections	L4 only or bypass	Binary protocol, low overhead needed
High-frequency internal	Consider bypass	When sub-ms latency critical

Mesh Complexity Trade-off

Design Principles for Hybrid Architectures

Successful hybrid architectures follow consistent design principles that balance performance, capability, and operational simplicity.

Principle 1: Push Work to the Appropriate Layer

L4 should handle:

Initial connection distribution
Geographic/Anycast routing
Health monitoring of the L7 tier
Non-HTTP protocols
Ultra-low-latency requirements

L7 should handle:

Content-based routing
TLS termination (when inspection needed)
Authorization, authentication
Caching, compression
Observability, logging

Principle 2: Minimize Hops

Each layer adds latency. Design to minimize total hops:

Avoid: Client → L4 → L7 → L4 → Backend
Prefer: Client → L4 → L7 → Backend

Questions to ask:

Is each layer providing value?
Can layers be consolidated?
Are there paths that bypass unnecessary layers?

Hybrid Architecture Design Checklist

• Purpose: Every layer must have a clear, justifiable purpose
• Observability: Each layer should emit metrics visible in a unified system
• Health propagation: Health state should propagate through layers (unhealthy backend → L7 → L4)
• Failure isolation: Failure in one layer shouldn't cascade to others
• Independent scaling: Layers should scale independently based on their bottleneck
• Consistent identity: Request IDs and trace context should flow through all layers
• Draining support: All layers must support graceful connection draining

Principle 3: Plan for Failure

L4 failure:

Use HA pairs or cloud-managed L4
Stateless L4 allows instant failover
ECMP across multiple L4 instances
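
One way a stateless L4 tier can keep flows stable behind ECMP is for every instance to derive the backend from the packet's 5-tuple alone. The sketch below uses rendezvous (highest-random-weight) hashing as an illustration; this is an assumed technique chosen for brevity, not something the text prescribes, and the backend names and addresses are hypothetical.

```python
import hashlib
from typing import Sequence, Tuple

# src ip, src port, dst ip, dst port, protocol
FlowKey = Tuple[str, int, str, int, str]


def pick_backend(flow: FlowKey, backends: Sequence[str]) -> str:
    """Pick a backend using rendezvous hashing over the flow 5-tuple."""
    if not backends:
        raise ValueError("no backends available")
    flow_bytes = "|".join(map(str, flow)).encode()

    def score(backend: str) -> int:
        digest = hashlib.sha256(flow_bytes + b"#" + backend.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The backend with the highest score wins; no shared state is needed.
    return max(backends, key=score)


l7_pool = ["l7-a", "l7-b", "l7-c"]
flow = ("203.0.113.7", 51514, "198.51.100.1", 443, "tcp")

# Two L4 instances with differently ordered pool lists agree on the result
choice_on_lb1 = pick_backend(flow, l7_pool)
choice_on_lb2 = pick_backend(flow, list(reversed(l7_pool)))
```

Because the choice depends only on the flow and the pool membership, a packet that lands on a different L4 instance after a failover still reaches the same L7 proxy, and removing an unrelated backend does not move the flow.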

L7 failure:

Pool of L7 instances; L4 health checks remove failed instances
Connection draining for graceful shutdown
Session replication or externalized session state
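
The first point, L4 health checks removing failed L7 instances, can be sketched as a pool that hides an instance after a number of consecutive failed probes. The class name, the threshold of 3, and the instance names are assumptions for illustration; real load balancers also apply a healthy threshold, probe intervals, and timeouts.

```python
from typing import Callable, Dict, List


class HealthCheckedPool:
    """Tracks L7 instances and hides those failing consecutive checks."""

    def __init__(self, instances: List[str], probe: Callable[[str], bool],
                 unhealthy_threshold: int = 3):
        self._probe = probe
        self._threshold = unhealthy_threshold
        self._failures: Dict[str, int] = {name: 0 for name in instances}

    def run_checks(self) -> None:
        """Probe every instance once and update its failure counter."""
        for name in self._failures:
            if self._probe(name):
                self._failures[name] = 0   # any success resets the counter
            else:
                self._failures[name] += 1

    def healthy(self) -> List[str]:
        """Instances that are still eligible to receive new connections."""
        return [n for n, f in self._failures.items() if f < self._threshold]


down = {"l7-b"}  # simulated outage
pool = HealthCheckedPool(["l7-a", "l7-b"], probe=lambda name: name not in down)

for _ in range(3):
    pool.run_checks()
after_failure = pool.healthy()    # l7-b has failed 3 checks in a row

down.clear()                      # instance recovers
pool.run_checks()
after_recovery = pool.healthy()
```

Requiring several consecutive failures avoids ejecting an instance because of a single dropped probe.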

Network partition:

L4 continues forwarding (no state to split)
L7 may lose consistency if state is distributed
Design L7 to be stateless where possible

Principle 4: Operational Simplicity

Configuration management:

Use infrastructure-as-code for all layers
Consistent configuration language where possible
Automated testing of configuration changes

Monitoring:

Unified dashboards showing all layers
Correlated metrics (L4 throughput + L7 request rate + backend health)
Alerting on each layer's role (L4: connection failures; L7: HTTP errors)

Debugging:

Request tracing through all layers
Unique request IDs generated at entry, propagated throughout
Logging at each layer with correlated IDs
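
A minimal sketch of the request ID practice: the entry layer assigns an ID if none is present, and every later layer reuses it in its logs. The `X-Request-ID` header name is a common convention assumed here, the helper names are hypothetical, and header lookup is case-sensitive in this toy version.

```python
import uuid
from typing import Dict

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name


def ensure_request_id(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers that always carries a request ID."""
    out = dict(headers)
    if not out.get(REQUEST_ID_HEADER):
        out[REQUEST_ID_HEADER] = uuid.uuid4().hex  # generated at entry
    return out


def log_line(layer: str, headers: Dict[str, str], message: str) -> str:
    """Format a log line tagged with the layer and the shared request ID."""
    return f"layer={layer} request_id={headers[REQUEST_ID_HEADER]} {message}"


at_entry = ensure_request_id({"Host": "api.example.com"})
at_l7 = ensure_request_id(at_entry)  # existing ID is preserved downstream

entry_log = log_line("entry", at_entry, "accepted connection")
l7_log = log_line("l7", at_l7, "routed to backend")
```

Searching logs for one `request_id` value then returns the lines from every layer that handled the request.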

Start Simple, Add Layers Deliberately

Summary: Hybrid Approaches

Key Takeaways

• Hybrid is the norm, not the exception: Production systems typically combine L4 and L7 strategically
• L4 in front of L7: The most common pattern, where L4 distributes to a pool of L7 proxies for HA and scale
• Anycast + L7: Global distribution via Anycast with regional L7 processing (the CDN architecture pattern)
• Protocol-based split: Different protocols route to appropriate layers (HTTP → L7, DB → L4)
• Cloud providers support hybrid: NLB + ALB, GCP Global LB, Azure Front Door all enable hybrid patterns
• Service mesh is internal hybrid: L7 processing for east-west traffic via sidecar proxies
• Design principles: Push work to appropriate layers, minimize hops, plan for failure, maintain simplicity

Module Complete:

You've now completed the comprehensive exploration of Layer 4 vs Layer 7 load balancing. You understand:

How each layer operates and its fundamental characteristics
The performance vs flexibility trade-off with quantified metrics
Concrete use cases guiding when to choose each layer
Hybrid architectural patterns that combine both layers effectively

This knowledge forms the foundation for designing load balancing architectures that are performant, intelligent, and operationally excellent.
