Load Balancing - Learning Module

Layer 4 vs Layer 7 Load Balancing

The OSI Layer Decision: Where Intelligence Meets Performance

When architects design load balancing solutions, one of the most consequential decisions they make is selecting between Layer 4 (L4) and Layer 7 (L7) load balancing. This choice determines everything from the information available for routing decisions to the performance characteristics of the system.

At its core, this decision represents a fundamental tradeoff: performance versus intelligence. Layer 4 load balancers are blindingly fast but relatively uninformed about the actual content being transferred. Layer 7 load balancers understand the application protocol fully but must pay a performance cost for that understanding.

This page dissects both approaches with the depth required to make informed architectural decisions in production systems.

What You Will Master

By the end of this page, you will understand: the OSI model context for L4 vs L7, exactly what information each layer can access, the performance implications of each approach, when to choose one over the other, and how modern systems often combine both approaches.

OSI Model Context: Understanding the Layers

To truly understand L4 vs L7 load balancing, we must first establish a clear picture of the OSI model layers involved.

Relevant OSI Layers for Load Balancing:

Layer	Name	Protocol Examples	Load Balancer Visibility
7	Application	HTTP, HTTPS, DNS, FTP	Full request/response content
6	Presentation	SSL/TLS, Compression	Encryption handling
5	Session	NetBIOS, RPC	Connection sessions
4	Transport	TCP, UDP	Ports, TCP flags, connection state
3	Network	IP, ICMP	Source/destination IP addresses
2	Data Link	Ethernet, WiFi	MAC addresses
1	Physical	Cables, signals	Bit-level transmission

Why Layers 4 and 7?

Load balancers primarily operate at Layer 4 or Layer 7 because:

Layer 4 is where connection-oriented transport (TCP) provides enough information for intelligent distribution without requiring content inspection
Layer 7 is where application semantics become visible, enabling content-based routing

Layers 5 and 6 are often considered part of the application layer in the TCP/IP model, and their functions (sessions, encryption) are handled as part of L7 load balancing.

What Each Layer Can See:

Layer 3 (Network) Information:

Source IP address
Destination IP address
Protocol number (TCP=6, UDP=17)
IP header options

Layer 4 (Transport) Information: All of Layer 3 plus:

Source port number (ephemeral port)
Destination port number (service port)
TCP flags (SYN, ACK, FIN, RST, etc.)
TCP sequence and acknowledgment numbers
TCP window size
UDP datagram length

Layer 7 (Application) Information: All of Layers 3 and 4 plus:

HTTP method (GET, POST, PUT, DELETE)
URL path and query parameters
HTTP headers (Host, Cookie, User-Agent, etc.)
HTTP body content
Protocol-specific fields (gRPC service/method, WebSocket messages)
SSL/TLS handshake details (SNI hostname before decryption)

The Critical Distinction

A Layer 4 load balancer sees a TCP connection as a stream of bytes with no understanding of what those bytes represent. A Layer 7 load balancer parses those bytes according to the application protocol (e.g., HTTP) and understands the semantic meaning of the request.
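
To make the distinction concrete, here is a minimal Python sketch of what each kind of balancer could extract from the same bytes. The names `FourTuple`, `l4_view`, and `l7_view` are hypothetical, and the sketch assumes one complete HTTP/1.1 request arrives in a single buffer; it illustrates the idea rather than how any particular product works.

```python
from dataclasses import dataclass

# Hypothetical example payload: one complete HTTP/1.1 request.
RAW = (
    b"GET /api/users?id=7 HTTP/1.1\r\n"
    b"Host: api.example.com\r\n"
    b"Cookie: session_id=abc123\r\n"
    b"\r\n"
)

@dataclass(frozen=True)
class FourTuple:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def l4_view(conn: FourTuple, payload: bytes) -> dict:
    """An L4 balancer knows the addresses, ports, and byte count only."""
    return {"tuple": conn, "payload_bytes": len(payload)}

def l7_view(payload: bytes) -> dict:
    """An L7 balancer parses the bytes as HTTP and sees the request itself."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("ascii")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    path, _, query = target.partition("?")
    return {"method": method, "path": path, "query": query, "headers": headers}

conn = FourTuple("203.0.113.10", 54321, "10.0.0.1", 80)
l4 = l4_view(conn, RAW)   # addresses, ports, payload size
l7 = l7_view(RAW)         # method, path, query, headers
```

Only the second view contains anything a path-based or cookie-based rule could match on.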

Layer 4 Load Balancing: Deep Dive

Layer 4 load balancing operates at the transport layer, making routing decisions based solely on network information without inspecting application content.

How L4 Load Balancing Works:

Connection Arrival: Client initiates TCP connection (SYN packet) to LB VIP
Backend Selection: LB selects backend using L4 algorithm (hash of IP+port, round-robin)
Connection Establishment: LB either:
- NAT Mode: Translates destination IP/port, forwards packet
- Proxy Mode: Completes TCP handshake with client, opens new connection to backend
Data Transfer: All subsequent packets in the connection flow to the selected backend
Connection Termination: When either side closes, LB handles connection cleanup

NAT-Based L4 Load Balancing (DNAT):

Client: 203.0.113.10:54321
    │
    ▼ SYN to 10.0.0.1:80 (VIP)
┌───────────────────────────────────────────────────────────┐
│ L4 Load Balancer                                          │
│  - Records: 203.0.113.10:54321 → Backend 1                │
│  - Translates destination: 10.0.0.1:80 → 192.168.1.10:80  │
└───────────────────────────────────────────────────────────┘
    │
    ▼ SYN to 192.168.1.10:80 (Backend 1)
Backend 1: 192.168.1.10:80

The load balancer maintains a connection tracking table that maps each client connection to its backend, so every packet in a flow is directed to the same backend.
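
The connection tracking idea can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: the class name `DnatBalancer` is hypothetical, flows are keyed on the client address and port only (real implementations key on the full tuple and protocol), and backends are chosen round-robin.

```python
from itertools import cycle

class DnatBalancer:
    """Toy NAT-mode L4 balancer: picks a backend per flow and remembers it."""

    def __init__(self, vip, backends):
        self.vip = vip                          # (ip, port) that clients connect to
        self._next_backend = cycle(backends)    # round-robin over backends
        self.conntrack = {}                     # (client_ip, client_port) -> backend

    def forward(self, src, dst):
        """Return the rewritten (src, dst) for a packet addressed to the VIP."""
        if dst != self.vip:
            raise ValueError("packet is not addressed to the VIP")
        backend = self.conntrack.get(src)
        if backend is None:                     # first packet of a new flow (SYN)
            backend = next(self._next_backend)
            self.conntrack[src] = backend
        return src, backend                     # only the destination is translated

    def close(self, src):
        """Forget a flow once the connection has terminated."""
        self.conntrack.pop(src, None)

lb = DnatBalancer(("10.0.0.1", 80), [("192.168.1.10", 80), ("192.168.1.11", 80)])
client_a = ("203.0.113.10", 54321)
client_b = ("203.0.113.11", 40000)
first = lb.forward(client_a, ("10.0.0.1", 80))   # new flow, first backend
second = lb.forward(client_b, ("10.0.0.1", 80))  # new flow, next backend
again = lb.forward(client_a, ("10.0.0.1", 80))   # existing flow, same backend as first
```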

L4 Load Balancer Routing Factors:

Factor	Source	Routing Use
Source IP	IP header	Geographic routing, client affinity
Source Port	TCP/UDP header	Connection hashing
Destination Port	TCP/UDP header	Service routing (port 80 → web, port 443 → TLS)
Protocol	IP header	Protocol-specific pools

What L4 Load Balancers CANNOT Do:

Route based on URL path (/api/* vs /static/*)
Route based on HTTP headers (Host, Cookie, Authorization)
Perform content-based health checks (check HTTP 200 response)
Modify HTTP requests or responses
Implement rate limiting per API endpoint
Perform request/response logging with HTTP details
SSL termination with certificate management (L4 LBs typically offer SSL passthrough only, leaving termination to the backends)

What L4 Load Balancers CAN Do:

Extremely fast packet forwarding (millions of packets per second)
Protocol-agnostic load balancing (works with any TCP/UDP protocol)
Simple health checks (TCP connect, port availability)
Connection tracking and consistent routing
Direct Server Return (DSR) for high-bandwidth scenarios
Low latency (microseconds, not milliseconds)
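
The "TCP connect" health check mentioned above amounts to attempting a connection and seeing whether it succeeds. A minimal Python sketch follows; the function name `tcp_health_check` and the one-second timeout are assumptions, and the demonstration uses a throwaway local listener rather than a real backend.

```python
import socket

def tcp_health_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Demonstration against a throwaway local listener (port chosen by the OS).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

healthy_while_listening = tcp_health_check("127.0.0.1", port)
listener.close()
healthy_after_close = tcp_health_check("127.0.0.1", port)
```

Note what this check cannot tell you: a process that accepts connections but returns errors for every request still looks healthy at this layer.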

L4 Load Balancer Performance Characteristics
Metric	Typical L4 Performance	Notes
Latency Added	< 100 μs	Order of magnitude faster than L7
Connections/sec	1M+	Limited by connection tracking table size
Throughput	100+ Gbps	Often wire-speed forwarding
Memory Usage	Low	Only connection tracking state
CPU Usage	Low	No application parsing

When to Choose L4

Choose L4 load balancing when: you need maximum throughput and minimum latency, you're load balancing non-HTTP protocols (databases, gaming, custom TCP), your routing logic can be based purely on IP and port, or you want the backends themselves to terminate SSL (passthrough).

Layer 7 Load Balancing: Deep Dive

Layer 7 load balancing operates at the application layer, parsing the full request content and making intelligent routing decisions based on application semantics.

How L7 Load Balancing Works:

Connection Establishment: Client completes full TCP handshake with LB
SSL Termination: If HTTPS, LB decrypts the traffic
Request Parsing: LB reads and parses the complete HTTP request
Routing Decision: LB applies rules based on request content
Backend Selection: Based on routing rules and load balancing algorithm
Backend Connection: LB opens connection to selected backend
Request Forwarding: LB forwards request (potentially modified)
Response Handling: LB receives response, may modify it
Client Response: LB sends response to client

Why L7 Requires Full Proxy:

Unlike L4, which can use NAT to forward packets, L7 must operate as a full proxy:

Client ◄──TCP Connection 1──► L7 LB ◄──TCP Connection 2──► Backend
        (TLS Session 1)                (Optionally TLS Session 2)

The load balancer terminates the client's TCP connection and establishes a completely separate connection to the backend. This is necessary because:

HTTP is a request-response protocol that must be parsed
Headers may need to be added/modified (X-Forwarded-For, X-Real-IP)
Routing decisions require reading the complete request
SSL termination requires the LB to decrypt the traffic, which only the endpoint of the TLS session can do
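
Because the backend sees the proxy's address rather than the client's, the proxy usually records the original client in forwarding headers. The sketch below shows one way this could look in Python; `add_forwarding_headers` is a hypothetical helper, and it assumes header names have already been normalized to the casing shown.

```python
def add_forwarding_headers(headers: dict, client_ip: str, scheme: str) -> dict:
    """Return a copy of the request headers with proxy headers added."""
    out = dict(headers)
    prior = out.get("X-Forwarded-For")
    # Append to an existing chain so earlier proxies stay visible.
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    out["X-Real-IP"] = client_ip
    out["X-Forwarded-Proto"] = scheme
    return out

# Request arriving straight from the client.
direct = add_forwarding_headers({"Host": "example.com"}, "203.0.113.10", "https")

# Request that already passed through another proxy.
chained = add_forwarding_headers(
    {"Host": "example.com", "X-Forwarded-For": "198.51.100.7"},
    "203.0.113.10",
    "https",
)
```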

L7 Load Balancer Routing Factors:

Factor	Example	Routing Use
URL Path	`/api/v2/*`	Route to API service version 2
Host Header	`api.example.com`	Virtual hosting, multi-tenant routing
HTTP Method	`GET` vs `POST`	Route reads vs writes differently
Query Parameters	`?version=beta`	A/B testing, feature flags
Cookie Values	`session_id`	Session stickiness
Custom Headers	`X-Tenant-ID`	Multi-tenant routing
Content-Type	`application/json`	Route to JSON-optimized backends
Client Certificate	CN, OU fields	mTLS-based routing
gRPC Service/Method	`user.UserService/GetUser`	gRPC service routing
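
Several of these factors can be combined in a single rule. The following Python sketch models a first-match rule table; the `Request`, `Rule`, and `route` names and all pool names are hypothetical, and real balancers offer far richer matching (regexes, wildcards, weights).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Request:
    method: str
    host: str
    path: str
    headers: dict = field(default_factory=dict)

@dataclass
class Rule:
    pool: str
    host: Optional[str] = None
    path_prefix: Optional[str] = None
    method: Optional[str] = None
    header: Optional[tuple] = None  # (name, required value)

    def matches(self, req: Request) -> bool:
        """A rule matches when every condition it specifies holds."""
        if self.host is not None and req.host != self.host:
            return False
        if self.path_prefix is not None and not req.path.startswith(self.path_prefix):
            return False
        if self.method is not None and req.method != self.method:
            return False
        if self.header is not None:
            name, value = self.header
            if req.headers.get(name) != value:
                return False
        return True

def route(req: Request, rules: list, default_pool: str) -> str:
    """Return the pool of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(req):
            return rule.pool
    return default_pool

RULES = [
    Rule(pool="enterprise_pool", header=("X-Tenant-ID", "enterprise")),
    Rule(pool="api_v2_pool", host="api.example.com", path_prefix="/api/v2/"),
    Rule(pool="read_pool", method="GET", path_prefix="/api/"),
]

chosen = route(Request("GET", "api.example.com", "/api/v2/users"), RULES, "web_pool")
```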

Powerful L7 Routing Examples:

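# NGINX L7 routing configuration (fragments from inside a server block)

# Route by URL path
location /api/ {
    proxy_pass http://api_backends;
}

location /static/ {
    proxy_pass http://cdn_backends;
}

# proxy_pass is only valid inside a location (or an "if" nested in one),
# so the header and cookie checks live in a catch-all location.
location / {
    # Route by header
    if ($http_x_tenant_id = "enterprise") {
        proxy_pass http://enterprise_backends;
    }

    # Route by cookie for A/B testing
    if ($cookie_experiment = "new_ui") {
        proxy_pass http://new_ui_backends;
    }

    # Fallback when neither condition matches (pool name is illustrative)
    proxy_pass http://default_backends;
}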

L7 Load Balancer Capabilities Matrix
Capability	Description	Value
Content-Based Routing	Route by URL, headers, cookies	Essential for microservices
Request Modification	Add/remove/modify headers	Required for distributed tracing
Response Modification	Transform responses, compression	Reduces client bandwidth
SSL Termination	Centralized certificate management	Operational simplification
Rate Limiting	Per-endpoint or per-user limits	API protection
Authentication	JWT validation, OAuth integration	Security boundary
HTTP/2 to HTTP/1.1	Protocol translation	Legacy backend support
Connection Pooling	Reuse backend connections	Reduces backend load
Retry Logic	Automatic retry on failure	Improved reliability
Circuit Breaking	Prevent cascading failures	System resilience
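
Of the capabilities above, circuit breaking is perhaps the least obvious, so here is a small Python sketch of the idea. It is a simplified model with a hypothetical `CircuitBreaker` class: the circuit opens after a number of consecutive failures and lets a trial request through once a cooldown has passed. Timestamps are passed in explicitly to keep the example deterministic.

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def allow_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, let a trial request through (half-open).
        return now - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now

breaker = CircuitBreaker(threshold=3, cooldown=30.0)
for t in (0.0, 1.0, 2.0):
    breaker.record_failure(now=t)

blocked = breaker.allow_request(now=5.0)    # still inside the cooldown
trial = breaker.allow_request(now=32.0)     # cooldown elapsed, trial allowed
breaker.record_success()                    # trial succeeded, circuit closes
```

While the circuit is open, the balancer fails fast instead of piling more requests onto a backend that is already struggling.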

L7 Performance Overhead

L7 load balancing adds real overhead: request buffering, HTTP parsing, SSL processing, and potentially request/response modification. As a rough guide, expect on the order of 1-10 ms of added latency (vs. microseconds for L4) and 10-100x lower throughput per node; actual figures depend heavily on hardware, TLS configuration, and request size. This is often an acceptable tradeoff for the routing intelligence gained.

L4 vs L7: Comprehensive Comparison

Let's directly compare L4 and L7 load balancing across all relevant dimensions:

L4 vs L7 Complete Comparison
Dimension	Layer 4	Layer 7
OSI Layer	Transport (TCP/UDP)	Application (HTTP/HTTPS)
Visibility	IP addresses, ports, TCP flags	Full request content
Routing Intelligence	Limited to IP/port hashing	Content-based, highly flexible
Latency Added	< 100 μs	1-10 ms
Throughput	100+ Gbps	1-10 Gbps (typical)
Connections/sec	1M+	10K-100K
Connection Model	NAT or simple proxy	Full terminating proxy
SSL/TLS	Passthrough only	Full termination/re-encryption
Health Checks	TCP connect, port probe	HTTP response code, content
Protocol Support	Any TCP/UDP protocol	HTTP, HTTPS, gRPC, WebSocket
Header Manipulation	Not possible	Full add/modify/remove
Sticky Sessions	IP/port hash only	Cookie, header, or URL-based
Observability	Connection metrics	Full request/response logging
Complexity	Lower	Higher
Resource Usage	Lower (connection tracking state only)	Higher (request buffering)

L4 Use Cases

•Database load balancing (MySQL, PostgreSQL)
•Gaming servers (low latency critical)
•Video streaming (high bandwidth)
•Non-HTTP protocols (SMTP, MQTT)
•Front-tier load balancing before L7 LB pool
•SSL passthrough when backends must terminate
•High-frequency trading systems
•DNS load balancing

L7 Use Cases

•Web application traffic routing
•API gateway functionality
•Microservices routing (path-based)
•A/B testing and canary deployments
•Multi-tenant SaaS routing
•WebSocket load balancing
•gRPC service routing
•CDN origin selection

Connection Multiplexing and Protocol Evolution

One of the most significant differences between L4 and L7 load balancing becomes apparent when considering modern HTTP protocols.

The HTTP/1.1 vs HTTP/2 Challenge:

HTTP/1.1:

One request-response in flight per connection at a time (without pipelining); keep-alive lets the connection carry further requests sequentially
Clients open multiple connections for parallelism (typically 6-8 per domain)
L4 load balancing works reasonably well (each connection gets one backend)

HTTP/2:

Multiple streams (requests) multiplexed over single TCP connection
Clients typically use 1-2 connections per domain
L4 load balancing is problematic:
- Single connection → single backend
- No distribution of requests within the connection
- Backend affinity is per-connection, not per-request

L7 Multiplexing Advantage:

                    HTTP/2 Streams
                    ┌─────────────┐
Client ─HTTP/2──►   │   Stream 1  │──► Backend 1
(single connection) │   Stream 2  │──► Backend 2
                    │   Stream 3  │──► Backend 1
                    │   Stream 4  │──► Backend 3
                    └─────────────┘
                    L7 Load Balancer
                    (demultiplexes streams)

An L7 load balancer can route individual HTTP/2 streams to different backends, maintaining the parallelism benefits even though there's a single client connection.
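
The difference shows up clearly in a small simulation. The Python sketch below is an illustration under simple assumptions (round-robin selection, three hypothetical backends, two client connections carrying 300 requests each); it compares balancing per connection with balancing per request.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["backend-1", "backend-2", "backend-3"]

def l4_distribution(connections: int, requests_per_connection: int) -> Counter:
    """Each connection is pinned to one backend; its requests all go there."""
    counts = Counter()
    picker = cycle(BACKENDS)
    for _ in range(connections):
        backend = next(picker)              # chosen once, at connection time
        counts[backend] += requests_per_connection
    return counts

def l7_distribution(connections: int, requests_per_connection: int) -> Counter:
    """Each request (HTTP/2 stream) is balanced on its own."""
    counts = Counter()
    picker = cycle(BACKENDS)
    for _ in range(connections * requests_per_connection):
        counts[next(picker)] += 1           # chosen per request
    return counts

l4 = l4_distribution(connections=2, requests_per_connection=300)
l7 = l7_distribution(connections=2, requests_per_connection=300)
```

With only two connections, per-connection balancing leaves the third backend idle, while per-request balancing spreads the same 600 requests evenly.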

gRPC Considerations:

gRPC uses HTTP/2 as its transport protocol, making L7 load balancing particularly important:

Aspect	L4 Load Balancing	L7 Load Balancing
Connection model	Long-lived connections	Stream-level routing
Request distribution	All requests to one backend	Distributed per-RPC
Streaming RPCs	Every stream pinned to the connection's backend	Each new stream balanced independently
Connection failures	Full reconnection needed	Transparent retry
Health checking	TCP only	gRPC health protocol

HTTP/3 and QUIC:

HTTP/3 uses QUIC (UDP-based) instead of TCP, further complicating L4 load balancing:

UDP has no connection establishment (L4 LB connection tracking is harder)
QUIC connection IDs may change during a session, and so may the client's IP address and port (connection migration)
L4 must therefore track flows by connection ID rather than by 4-tuple
L7 load balancing with QUIC termination is emerging
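
A short Python sketch shows why the 4-tuple is a poor routing key here. It assumes, for illustration only, that the connection ID stays fixed while the client moves between networks; real deployments also need a scheme for connection IDs that change. The function names, addresses, and the sample connection ID are all hypothetical.

```python
import hashlib

BACKENDS = ["backend-1", "backend-2", "backend-3", "backend-4"]

def pick(key: bytes) -> str:
    """Map a key to a backend with a stable hash."""
    digest = hashlib.sha256(key).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

def route_by_four_tuple(packet: dict) -> str:
    src_ip, src_port = packet["src"]
    dst_ip, dst_port = packet["dst"]
    return pick(f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode())

def route_by_connection_id(packet: dict) -> str:
    return pick(packet["connection_id"])

# The same QUIC connection before and after the client changes networks.
cid = bytes.fromhex("8394c8f03e515708")
on_wifi = {"src": ("203.0.113.10", 54321), "dst": ("10.0.0.1", 443), "connection_id": cid}
on_cellular = {"src": ("198.51.100.77", 40112), "dst": ("10.0.0.1", 443), "connection_id": cid}

# Keyed on the connection ID, both packets reach the same backend.
stable = route_by_connection_id(on_wifi) == route_by_connection_id(on_cellular)
```

Keyed on the 4-tuple, the two packets hash different inputs, so nothing guarantees they land on the backend that holds the connection state.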

Protocol Translation:

L7 load balancers can perform protocol translation:

Client ──HTTP/2──► L7 LB ──HTTP/1.1──► Legacy Backend
Client ──HTTP/3──► L7 LB ──HTTP/2──► Modern Backend
Client ──gRPC──► L7 LB ──JSON/REST──► REST Backend

This enables gradual protocol upgrades without replacing all backends.

gRPC Load Balancing Best Practice

For gRPC services, always use L7 load balancing (or client-side load balancing with service discovery). L4 load balancing will create hot spots because gRPC clients maintain long-lived connections, and all RPCs from one connection go to the same backend.

Multi-Tier Load Balancing Architecture

In practice, large-scale systems often combine L4 and L7 load balancing in a multi-tier architecture that leverages the strengths of each.

The Multi-Tier Pattern:

                          Internet
                              │
                              ▼
                    ┌─────────────────┐
                    │   Edge Router   │ ◄── BGP Anycast (L3)
                    │   / Global LB   │     Geographic routing
                    └─────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │   L4 LB Tier    │ ◄── High throughput
                    │   (NLB/IPVS)    │     SSL pass-through
                    └─────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ┌──────────┐    ┌──────────┐    ┌──────────┐
        │  L7 LB   │    │  L7 LB   │    │  L7 LB   │
        │  (Envoy) │    │  (Envoy) │    │  (Envoy) │
        └──────────┘    └──────────┘    └──────────┘
              │               │               │
              ▼               ▼               ▼
        ┌──────────┐    ┌──────────┐    ┌──────────┐
        │ Services │    │ Services │    │ Services │
        └──────────┘    └──────────┘    └──────────┘

Why This Architecture?

L4 Tier Responsibilities:
- Distribute massive traffic across L7 LB pool
- Handle DDoS mitigation (drops bad traffic early)
- Provide redundancy for L7 LBs
- SSL passthrough for L7 LBs to terminate
- Achieve wire-speed packet processing

L7 Tier Responsibilities:
- SSL/TLS termination with certificate management
- Content-based routing to services
- Rate limiting and authentication
- Request/response transformation
- Detailed observability and logging

Multi-Tier Architecture Benefits
Concern	Single-Tier L7	Multi-Tier (L4 + L7)
L7 LB Scaling	Complex, requires DNS changes	Add L7 nodes; L4 distributes automatically
L7 LB Failures	Direct user impact	L4 routes around failed L7 nodes
DDoS Protection	L7 LBs exhausted by attack	L4 tier drops attack traffic
SSL Performance	L7 LB CPU constrained	L7 pool scales horizontally
Deployment	Rolling update is complex	Update L7 behind L4 seamlessly
Cost	Lower (fewer components)	Higher but more scalable
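
The "L4 routes around failed L7 nodes" row can be sketched as follows. This Python model is deliberately simple: the `L4Tier` class and node names are hypothetical, and it uses plain modulo hashing over the healthy nodes, which remaps many clients when membership changes. Production systems tend to use a consistent-hashing scheme to limit that churn.

```python
import zlib

class L4Tier:
    """Toy front tier: source-hash over the L7 nodes that are currently healthy."""

    def __init__(self, l7_nodes):
        self.health = {node: True for node in l7_nodes}

    def mark(self, node: str, healthy: bool) -> None:
        """Record the result of a health check for one L7 node."""
        self.health[node] = healthy

    def pick(self, client_ip: str) -> str:
        healthy = sorted(n for n, ok in self.health.items() if ok)
        if not healthy:
            raise RuntimeError("no healthy L7 nodes")
        return healthy[zlib.crc32(client_ip.encode()) % len(healthy)]

tier = L4Tier(["l7-a", "l7-b", "l7-c"])
clients = [f"203.0.113.{i}" for i in range(1, 51)]

before = {ip: tier.pick(ip) for ip in clients}
tier.mark("l7-b", False)                     # health check fails for one node
after = {ip: tier.pick(ip) for ip in clients}
```

After the failure, no client is sent to `l7-b`, and no DNS change or client-side action was needed.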

Real-World Example: AWS Architecture

Internet ──► Route 53 (DNS)
                 │
                 ▼
            CloudFront (CDN) ◄── L7-like edge caching
                 │
                 ▼
            NLB (Network LB) ◄── L4 load balancing
                 │
                 ▼
            ALB (Application LB) ◄── L7 load balancing
                 │
                 ▼
            Target Groups (ECS/EKS/EC2)

When to Use Multi-Tier:

Traffic exceeds single L7 LB capacity
Need HA for L7 load balancers themselves
DDoS protection requirements
Multiple L7 LB pools for different services
Geographic distribution with regional L7 LBs

Cost Consideration

Multi-tier architectures add complexity and cost. Start with single-tier L7 unless you have specific requirements (massive scale, DDoS concerns, HA for LBs). Cloud-managed L7 load balancers like AWS ALB handle many of these concerns automatically.

Implementation Examples: L4 and L7 in Practice

Let's examine concrete implementations of both L4 and L7 load balancing to solidify understanding.

L4 Example: Linux IPVS (IP Virtual Server)

IPVS is a highly performant L4 load balancer built into the Linux kernel.

# Install IPVS admin tools
apt-get install ipvsadm

# Create virtual service on VIP 10.0.0.1:80
ipvsadm -A -t 10.0.0.1:80 -s rr

# Add real servers (backends)
ipvsadm -a -t 10.0.0.1:80 -r 192.168.1.10:80 -m
ipvsadm -a -t 10.0.0.1:80 -r 192.168.1.11:80 -m
ipvsadm -a -t 10.0.0.1:80 -r 192.168.1.12:80 -m

# Flags:
# -A: Add virtual service
# -t: TCP (use -u for UDP)
# -s rr: Scheduling algorithm (round-robin)
# -a: Add real server
# -r: Real server address
# -m: Masquerading (NAT) mode

IPVS Scheduling Algorithms:

rr: Round-Robin
wrr: Weighted Round-Robin
lc: Least Connections
wlc: Weighted Least Connections
lblc: Locality-Based Least Connections
sh: Source Hashing (session persistence)
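
To show how three of these schedulers differ, here is a simplified Python sketch. It is not the kernel's implementation (IPVS also accounts for inactive connections, for instance); the `RealServer` class, the helper names, and the connection counts are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class RealServer:
    address: str
    weight: int
    active: int = 0   # connections currently open

def pick_round_robin(servers, state):
    """rr: rotate through servers; `state` holds the next index."""
    server = servers[state["next"] % len(servers)]
    state["next"] += 1
    return server

def pick_least_connections(servers):
    """lc: fewest active connections wins."""
    return min(servers, key=lambda s: s.active)

def pick_weighted_least_connections(servers):
    """wlc: lowest active/weight ratio wins; weight 0 receives nothing new."""
    usable = [s for s in servers if s.weight > 0]
    return min(usable, key=lambda s: s.active / s.weight)

servers = [
    RealServer("192.168.1.10:80", weight=4, active=8),   # ratio 2.0
    RealServer("192.168.1.11:80", weight=1, active=3),   # ratio 3.0
    RealServer("192.168.1.12:80", weight=2, active=5),   # ratio 2.5
]

rr_state = {"next": 0}
rr_order = [pick_round_robin(servers, rr_state).address for _ in range(4)]
lc_choice = pick_least_connections(servers).address
wlc_choice = pick_weighted_least_connections(servers).address
```

Here `lc` favors the server with only three connections, while `wlc` favors the heavily weighted server because it is the least loaded relative to its capacity.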

L7 Example: NGINX Configuration

# nginx.conf - L7 Load Balancing

# Define upstream backends
upstream api_servers {
    least_conn;  # L7 algorithm: least connections
    server 192.168.1.10:8080 weight=5;
    server 192.168.1.11:8080 weight=3;
    server 192.168.1.12:8080 backup;
    
    keepalive 32;  # Connection pooling
}

upstream static_servers {
    server 192.168.2.10:80;
    server 192.168.2.11:80;
}

server {
    listen 443 ssl http2;  # on NGINX 1.25.1+, prefer "listen 443 ssl;" plus "http2 on;"
    server_name example.com;
    
    # SSL termination
    ssl_certificate /etc/nginx/certs/example.crt;
    ssl_certificate_key /etc/nginx/certs/example.key;
    
    # L7 content-based routing
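    location /api/ {
        proxy_pass http://api_servers;
        # Upstream keepalive only takes effect with HTTP/1.1 and a cleared Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }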
    
    location /static/ {
        proxy_pass http://static_servers;
    }
    
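    # A/B testing by cookie
    # app_v1_servers and app_v2_servers need their own upstream blocks,
    # defined like api_servers above (omitted here).
    location /app/ {
        if ($cookie_experiment = "v2") {
            proxy_pass http://app_v2_servers;
        }
        proxy_pass http://app_v1_servers;
    }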
    
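    # Rate limiting (L7 capability)
    # The zone must be declared in the http context, for example:
    #   limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
    # (the 10m size and 10r/s rate are illustrative values)
    limit_req zone=api_limit burst=20;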
}
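
The `limit_req` directive in the configuration above enforces an average request rate with a burst allowance. NGINX documents it as a leaky-bucket limiter; the Python sketch below uses the closely related token bucket to convey the idea and is not NGINX's implementation. The `TokenBucket` class and the rate of 10 requests per second are assumptions, and timestamps are passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=10.0, burst=20)

# 25 requests arrive at the same instant: the burst allowance covers 20 of them.
burst_results = [bucket.allow(now=0.0) for _ in range(25)]

# One second later the bucket has refilled by rate * 1s = 10 tokens.
after_one_second = [bucket.allow(now=1.0) for _ in range(12)]
```

An L4 balancer could apply a similar limit per source IP, but only an L7 balancer can key it on an endpoint, an API key, or a user.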

L7 Example: Envoy Proxy (Modern Cloud-Native)

# envoy.yaml - L7 Load Balancing
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        # L7 path-based routing
                        - match:
                            prefix: "/api/v2"
                          route:
                            cluster: api_v2_cluster
                        - match:
                            prefix: "/api"
                          route:
                            cluster: api_v1_cluster
                        - match:
                            prefix: "/"
                          route:
                            cluster: web_cluster
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

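  clusters:
    # Only api_v2_cluster is shown. api_v1_cluster and web_cluster, referenced by
    # the routes above, need entries defined the same way.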
    - name: api_v2_cluster
      connect_timeout: 0.25s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: api_v2_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: api-v2
                      port_value: 8080
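
The order of the routes in the Envoy configuration above matters, because the first matching route wins. The Python sketch below mimics that first-match prefix behavior with the same prefixes and cluster names; the `match_cluster` function is a hypothetical stand-in, not Envoy code.

```python
# Same prefixes and clusters as the route list above, in the same order.
ROUTES = [
    ("/api/v2", "api_v2_cluster"),
    ("/api", "api_v1_cluster"),
    ("/", "web_cluster"),
]

def match_cluster(path, routes=ROUTES):
    """Return the cluster of the first route whose prefix matches the path."""
    for prefix, cluster in routes:
        if path.startswith(prefix):
            return cluster
    return None

v2 = match_cluster("/api/v2/users")
v1 = match_cluster("/api/users")
web = match_cluster("/index.html")

# With the catch-all first, it would swallow every request.
misordered = match_cluster("/api/v2/users", list(reversed(ROUTES)))

# Plain prefix matching has no notion of path segments.
surprising = match_cluster("/apiary")
```

The last line is worth noting: a bare `/api` prefix also matches `/apiary`, so a trailing slash or a segment-aware matcher may be the safer choice.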

Configuration Differences

Notice how L4 configuration (IPVS) is minimal—just IPs, ports, and algorithm. L7 configuration (NGINX, Envoy) is much more detailed because it understands application semantics and can make complex routing decisions.

Summary: L4 vs L7 Decision Framework

We've comprehensively explored the differences between Layer 4 and Layer 7 load balancing. Let's consolidate this knowledge into a decision framework.

Key Takeaways

•L4 sees network data; L7 sees application data — This fundamental difference drives all capabilities and tradeoffs.
•L4 is fast but limited — Microsecond latency and massive throughput, but routing based only on IP and port.
•L7 is intelligent but slower — Full content inspection enables sophisticated routing at the cost of latency and throughput.
•HTTP/2 and gRPC need L7 — Multiplexed protocols require application-layer load balancing for proper request distribution.
•Multi-tier combines both — Use L4 for scale and resilience, L7 for intelligence and routing.
•Choose based on requirements — Non-HTTP protocols or extreme performance → L4. Content-based routing or modern protocols → L7.
•Most web applications need L7 — The capabilities (SSL termination, path routing, observability) are essential for modern architectures.

Decision Flowchart:

Need content-based routing? ──Yes──► L7
         │
         No
         │
         ▼
Need SSL termination? ──Yes──► L7
         │
         No
         │
         ▼
Protocol is HTTP/2, gRPC, WebSocket? ──Yes──► L7
         │
         No
         │
         ▼
Need sub-millisecond latency? ──Yes──► L4
         │
         No
         │
         ▼
Need >10 Gbps throughput? ──Yes──► L4
         │
         No
         │
         ▼
Consider L7 for observability benefits
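
The flowchart can also be written as a small function, which makes the order of the checks explicit. This Python sketch is a direct transcription; the function and parameter names are hypothetical.

```python
def choose_layer(
    *,
    content_routing: bool = False,
    ssl_termination: bool = False,
    multiplexed_protocol: bool = False,   # HTTP/2, gRPC, WebSocket
    sub_ms_latency: bool = False,
    over_10_gbps: bool = False,
) -> str:
    """Walk the decision flowchart from top to bottom."""
    # The first three questions each lead straight to L7.
    if content_routing or ssl_termination or multiplexed_protocol:
        return "L7"
    # Only then do the performance questions point to L4.
    if sub_ms_latency or over_10_gbps:
        return "L4"
    return "L7 (consider for observability)"

database_proxy = choose_layer(sub_ms_latency=True)
grpc_service = choose_layer(multiplexed_protocol=True, sub_ms_latency=True)
plain_default = choose_layer()
```

Because the functional questions come first, a gRPC service lands on L7 even when it also wants very low latency; that is where a multi-tier design becomes relevant.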

What's Next:

Now that we understand how load balancers work at different layers, we'll explore the algorithms they use to select backends. Round-robin, weighted distribution, least connections, consistent hashing—each algorithm has distinct characteristics that make it suitable for different workloads.

You now understand the fundamental distinction between L4 and L7 load balancing, including the tradeoffs, capabilities, and use cases for each. Next, we'll dive into the algorithms that determine how requests are distributed across backends.