When Netflix streams video to 200 million subscribers simultaneously, when Google handles millions of search queries per second, or when AWS routes traffic across global regions, Layer 4 load balancing forms the invisible backbone that makes such scale possible. Operating at the transport layer of the networking stack, Layer 4 load balancers make routing decisions based on network information alone—IP addresses and port numbers—without any awareness of the payload they're carrying.
This seeming limitation is actually its greatest strength. By remaining agnostic to application content, Layer 4 load balancers achieve extraordinary throughput with minimal latency overhead. They're the high-performance workhorses of modern infrastructure, handling millions of connections per second on commodity hardware.
By the end of this page, you will understand how Layer 4 load balancing operates at the transport layer, the mechanics of TCP and UDP connection routing, the architectural patterns that enable massive scalability, and the precise scenarios where Layer 4 is the optimal choice over application-aware alternatives.
To fully understand Layer 4 load balancing, we must first establish where it sits in the networking stack and what information is available at this layer.
The OSI (Open Systems Interconnection) model defines seven layers of network abstraction, each building upon the previous: Physical, Data Link, Network, Transport, Session, Presentation, and Application.
Layer 4—the Transport Layer—is where Layer 4 load balancers operate. At this level, the load balancer sees source and destination IP addresses, source and destination ports, the transport protocol (TCP or UDP), and TCP control flags such as SYN and FIN.
Critically, the load balancer does not see the actual content of the request—no HTTP headers, no URLs, no cookies, no request bodies. It operates on metadata alone.
| Layer | Name | Information Available | Load Balancing Capability |
|---|---|---|---|
| Layer 3 | Network | IP addresses only | Basic routing, geographic distribution |
| Layer 4 | Transport | IP + Port + Protocol | Connection/session distribution |
| Layer 7 | Application | Full request content | Content-based routing, header inspection |
In practice, most engineers use the simplified TCP/IP model (4 layers) rather than OSI (7 layers). In TCP/IP terminology, Layer 4 corresponds to the Transport layer, and Layer 7 corresponds to the Application layer. The concepts remain identical regardless of which model you reference.
Layer 4 load balancing operates through one of two fundamental mechanisms: NAT-based routing or Direct Server Return (DSR). Each has distinct operational characteristics that determine when it's appropriate.
In NAT (Network Address Translation) mode, the load balancer acts as a full proxy at the network level: it accepts packets addressed to the virtual IP (VIP), rewrites the destination address and port to a chosen backend, and forwards them; when the backend replies, the load balancer rewrites the source back to the VIP before returning the response to the client.
The key characteristic: all traffic (both request and response) flows through the load balancer. This enables connection tracking, health checking, and consistent routing, but the load balancer becomes a potential bottleneck.
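To make the NAT-mode flow concrete, here is a minimal Python sketch of the bookkeeping it implies; the `Flow` type, field names, and hash-based backend choice are illustrative assumptions rather than any particular product's behavior.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    """Addressing fields of a single packet (illustrative)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Client-side flow (destination = VIP) -> backend address it was NATed to.
nat_table = {}

def forward_inbound(flow: Flow, backends: list) -> Flow:
    """Pick (or recall) a backend and rewrite the packet's destination."""
    backend = nat_table.setdefault(flow, backends[hash(flow) % len(backends)])
    return Flow(flow.src_ip, flow.src_port, backend[0], backend[1])

def forward_outbound(reply: Flow, vip: tuple) -> Flow:
    """Rewrite a backend's reply so the client sees it coming from the VIP."""
    return Flow(vip[0], vip[1], reply.dst_ip, reply.dst_port)
```

Because both directions pass through the balancer, the reverse rewrite is always possible, which is exactly what makes NAT mode simple to deploy and also what makes it a potential bottleneck.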
DSR (also called Direct Routing or Triangulation) is a high-performance technique in which the load balancer forwards each inbound packet to a chosen backend without rewriting the destination IP (typically by changing only the destination MAC address); the backend, which is configured to accept traffic for the VIP, then replies directly to the client with the VIP as its source address.
The key characteristic: only inbound traffic flows through the load balancer; responses go directly from backend to client. This dramatically increases throughput, as response traffic (typically larger than requests) doesn't consume load balancer resources.
DSR requires all backend servers to be on the same Layer 2 network (same broadcast domain) as the load balancer. Additionally, each backend must accept packets destined for the VIP, typically by configuring the VIP on a loopback interface. These constraints make DSR more complex to deploy, but it delivers superior performance for asymmetric traffic patterns.
| Characteristic | NAT Mode | Direct Server Return (DSR) |
|---|---|---|
| Traffic flow | Symmetric (all through LB) | Asymmetric (responses bypass LB) |
| Load balancer as bottleneck | Yes (for response traffic) | No (handles only requests) |
| Backend network requirements | Any (can span networks) | Same Layer 2 segment |
| Backend configuration | Standard | VIP on loopback interface |
| Connection tracking | Full support | Limited (no response visibility) |
| Health checking | Straightforward | Requires additional mechanisms |
| Typical use case | General purpose | High-bandwidth services (video, downloads) |
TCP connection handling is central to Layer 4 load balancing performance and behavior. Understanding the nuances of TCP at this layer is essential for production deployments.
Every TCP connection begins with a three-way handshake: the client sends a SYN, the server responds with a SYN-ACK, and the client completes the handshake with an ACK.
The load balancer must make its routing decision at the SYN packet—before any data is exchanged. This is why Layer 4 balancers cannot route based on content; routing happens before content exists.
Layer 4 load balancers identify connections using the 5-tuple:
(Source IP, Source Port, Destination IP, Destination Port, Protocol)
Once a routing decision is made for a connection, subsequent packets with the same 5-tuple must go to the same backend server. This is called connection affinity or connection persistence.
The load balancer maintains a connection table mapping 5-tuples to backend servers. For high-traffic systems, this table can contain millions of entries, requiring careful memory management.
```
# Conceptual Layer 4 Connection Table

+----------------------+----------------------+-------------------+
| Client Connection    | Backend Server       | State             |
+----------------------+----------------------+-------------------+
| 10.0.1.5:45123       | 192.168.1.10:8080    | ESTABLISHED       |
| 10.0.1.7:52891       | 192.168.1.11:8080    | ESTABLISHED       |
| 10.0.1.5:45124       | 192.168.1.12:8080    | TIME_WAIT         |
| 10.0.1.9:38442       | 192.168.1.10:8080    | SYN_RECEIVED      |
+----------------------+----------------------+-------------------+

# Note: Same client (10.0.1.5) can have multiple connections
# routed to different backends based on source port
```

Layer 4 load balancers must track connection state to ensure packets are routed correctly. This creates several operational considerations:
TCP Connection States: a connection-table entry typically moves through SYN_RECEIVED while the handshake completes, ESTABLISHED while data flows, and FIN_WAIT or TIME_WAIT during teardown; each state determines how long the entry must be retained.
The TIME_WAIT problem: When a connection closes, TCP requires maintaining state for a period (TIME_WAIT) to handle delayed packets. With millions of short-lived connections, TIME_WAIT entries can exhaust connection table memory. Production load balancers implement aggressive connection reaping, reduced TIME_WAIT durations, or connection table compression.
State synchronization in HA pairs: When running load balancers in high-availability pairs, connection state must be synchronized between the active and standby units. This is complex at high throughput and represents a key engineering challenge.
Layer 4 load balancers are exposed to SYN flood attacks—malicious actors send millions of SYN packets to exhaust connection table memory without completing handshakes. Production systems implement SYN cookies, connection rate limiting, and stateless packet filtering to mitigate this attack vector.
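The idea behind SYN cookies can be sketched briefly. This is a simplified illustration, not the kernel's actual encoding (real implementations also pack an MSS index into the value and tolerate the previous time window): the flow's identity is folded into the initial sequence number, so no table entry exists until the client's ACK proves the handshake completed.

```python
import hashlib
import time

SECRET = b"per-load-balancer secret, rotated periodically"  # hypothetical value

def syn_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Derive a 32-bit initial sequence number from the flow and a timestamp."""
    window = int(time.time()) // 60  # coarse time window to limit replay
    material = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{window}".encode()
    digest = hashlib.sha256(SECRET + material).digest()
    return int.from_bytes(digest[:4], "big")

def ack_is_valid(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 ack_number: int) -> bool:
    """The final ACK acknowledges cookie + 1; recompute and compare."""
    expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port) + 1) & 0xFFFFFFFF
    return ack_number == expected
```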
While TCP dominates web traffic, UDP is critical for real-time applications: video streaming, voice over IP, gaming, and DNS. Layer 4 load balancing for UDP presents unique challenges due to the protocol's connectionless nature.
Unlike TCP, UDP has no connection establishment or teardown—packets are fired independently with no guaranteed delivery or ordering. This creates fundamental difficulties: there is no handshake on which to anchor a routing decision, no packet that marks the end of a flow, and therefore no natural point at which the load balancer can create or release per-flow state.
Despite being connectionless, many UDP-based protocols require session affinity—all packets from a given source must reach the same backend. This is critical for protocols such as SIP and RTP (voice and video), online gaming, and QUIC, where mid-session rebalancing would break the application.
Layer 4 load balancers implement pseudo-sessions for UDP: the first packet from a new source creates a table entry mapping that flow to a backend, subsequent packets matching the entry follow the same path, and the entry is discarded after a configurable idle timeout.
The timeout value is critical: too short causes session breakage, too long exhausts memory.
| Application | Protocol | Recommended Timeout | Rationale |
|---|---|---|---|
| DNS | UDP/53 | 30 seconds | Short queries, stateless |
| VoIP/SIP | UDP/5060 | 180 seconds | Call setup, registration |
| RTP (Voice/Video) | UDP dynamic | 60 seconds | Active streams, packet loss acceptable |
| Gaming | UDP custom | 120 seconds | Session persistence, reconnection |
| QUIC/HTTP3 | UDP/443 | 300 seconds | Long-lived connections |
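A minimal sketch of the pseudo-session mechanism, assuming a 30-second idle timeout (names and structure are illustrative): the first packet of a flow claims a backend, later packets refresh the entry, and a periodic sweep reclaims idle entries so the table cannot grow without bound.

```python
import time

IDLE_TIMEOUT_SECONDS = 30.0  # illustrative; see the table above for typical values

# (src_ip, src_port, dst_ip, dst_port) -> (backend, last_seen)
sessions = {}

def route_udp_packet(src_ip, src_port, dst_ip, dst_port, backends):
    """Route a UDP packet, creating or refreshing its pseudo-session."""
    now = time.monotonic()
    key = (src_ip, src_port, dst_ip, dst_port)
    entry = sessions.get(key)
    if entry is not None and now - entry[1] < IDLE_TIMEOUT_SECONDS:
        backend = entry[0]                              # keep affinity
    else:
        backend = backends[hash(key) % len(backends)]   # new pseudo-session
    sessions[key] = (backend, now)                      # refresh the idle timer
    return backend

def reap_expired_sessions():
    """Drop entries idle past the timeout so the table cannot grow unbounded."""
    now = time.monotonic()
    for key, (_, last_seen) in list(sessions.items()):
        if now - last_seen >= IDLE_TIMEOUT_SECONDS:
            del sessions[key]

# Example usage
backends = ["backend-1", "backend-2", "backend-3"]
print(route_udp_packet("10.0.1.5", 53124, "203.0.113.9", 53, backends))
```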
The emergence of QUIC (Quick UDP Internet Connections)—the protocol underlying HTTP/3—has transformed UDP load balancing requirements. QUIC implements connection semantics over UDP: each connection is identified by connection IDs carried in the packet header rather than by the IP/port tuple, which allows a connection to survive changes in the client's address.
Traditional Layer 4 load balancing based on the 5-tuple breaks when clients migrate networks (e.g., a phone moving from WiFi to cellular). Advanced Layer 4 load balancers now support QUIC-aware routing: they read the destination connection ID from the UDP payload and route on it, keeping a session pinned to its backend even when the client's IP address and port change.
This represents an evolution of Layer 4 balancing—extracting just enough information from the packet to maintain sessions without full Layer 7 inspection.
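As an illustration of "just enough information," the sketch below routes on the QUIC Destination Connection ID. It assumes the deployment uses fixed 8-byte connection IDs for short-header packets; production QUIC-aware balancers often encode a server identifier directly in the connection ID rather than hashing it.

```python
def extract_quic_dcid(payload: bytes, short_header_cid_len: int = 8) -> bytes:
    """Extract the Destination Connection ID from a QUIC packet.

    Per the QUIC invariants: long-header packets (first bit set) carry an
    explicit DCID length byte after the 4-byte version field; short-header
    packets carry a DCID whose length the deployment fixes in advance
    (assumed to be 8 bytes here).
    """
    if payload[0] & 0x80:                        # long header
        dcid_len = payload[5]
        return payload[6:6 + dcid_len]
    return payload[1:1 + short_header_cid_len]   # short header

def route_quic(payload: bytes, backends: list) -> str:
    """Route on the connection ID so the session survives address changes."""
    dcid = extract_quic_dcid(payload)
    return backends[int.from_bytes(dcid, "big") % len(backends)]
```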
Unlike TCP, UDP provides no built-in acknowledgment mechanism. Health checking UDP services requires application-specific probes—sending a valid request and expecting a valid response. For DNS, this might mean sending a query and expecting a response. Many environments fall back to TCP health checks even for UDP services when the service supports both protocols.
The primary advantage of Layer 4 load balancing is performance. By avoiding application-layer inspection, Layer 4 balancers achieve throughput and latency figures that Layer 7 balancers cannot match.
Modern Layer 4 load balancers handle extraordinary traffic volumes; as the technology comparison below shows, single-node throughput ranges from a few gigabits per second for general-purpose software to well over 100 Gbps with kernel bypass, hardware offload, or managed cloud load balancers.
The key enabler is minimal per-packet processing: hash or look up the 5-tuple, choose a backend, rewrite addresses if required, and forward the packet.
No protocol parsing, no content inspection, no connection termination/re-establishment.
Latency at Layer 4 is dominated by network propagation time, not processing overhead: the balancer itself typically adds only tens of microseconds or less (see the technology comparison below).
For comparison, Layer 7 processing typically adds 0.5-5 milliseconds—orders of magnitude more.
Layer 4 load balancers scale through several patterns:
Horizontal scaling with ECMP: Deploy multiple independent Layer 4 balancers behind a router using Equal-Cost Multi-Path (ECMP) routing. The router distributes traffic across balancers based on packet hashes.
Kernel bypass: Technologies like DPDK (Data Plane Development Kit) and XDP (eXpress Data Path) allow packet processing in user space or early kernel stages, avoiding kernel network stack overhead entirely.
Hardware offload: Specialized NICs and ASICs implement load balancing in hardware, achieving throughput impossible in software.
| Technology | Throughput | Latency | Complexity | Use Case |
|---|---|---|---|---|
| Linux IPVS | 1-10 Gbps | 50-100 µs | Low | General purpose, small-medium scale |
| HAProxy (L4 mode) | 1-5 Gbps | 100-200 µs | Low | Flexible configuration, visibility |
| DPDK-based | 10-40 Gbps | <10 µs | High | High-performance, telecom grade |
| XDP/eBPF | 10-40 Gbps | <10 µs | Medium | Programmable, cloud native |
| Hardware ASIC | 100+ Gbps | <5 µs | Medium | Enterprise, carrier grade |
| Cloud LB (AWS NLB) | 100+ Gbps | ~50 µs | Low | Cloud-native, managed |
Layer 4 load balancers use various algorithms to distribute connections across backend servers. The choice of algorithm affects load distribution, cache efficiency, and connection persistence.
The simplest algorithm: connections are distributed to backends in circular order.
Advantages: Perfect distribution if all connections are equal.
Disadvantages: Ignores connection duration, backend capacity, or current load.
Servers receive connections in proportion to assigned weights.
Example: Server A (weight 3), Server B (weight 1) → A receives 75% of connections.
Use case: Heterogeneous server capacities, gradual rollouts.
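A minimal sketch of weighted round robin (with every weight set to 1 it reduces to plain round robin); production schedulers usually interleave picks smoothly rather than emitting each backend's share in a burst.

```python
import itertools

def weighted_round_robin(weights: dict):
    """Cycle through backends in proportion to their weights."""
    expanded = [backend for backend, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

# Server A (weight 3) receives 3 of every 4 new connections.
scheduler = weighted_round_robin({"server-a": 3, "server-b": 1})
print([next(scheduler) for _ in range(8)])
# ['server-a', 'server-a', 'server-a', 'server-b', 'server-a', ...]
```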
New connections go to the backend with the fewest active connections.
Advantages: Automatically adapts to varying connection durations.
Disadvantages: Requires real-time connection counting, which adds overhead in high-volume systems.
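A sketch of the selection step, assuming the balancer maintains a per-backend count of active connections that it increments on setup and decrements on teardown.

```python
def least_connections(active: dict) -> str:
    """Return the backend currently holding the fewest active connections."""
    return min(active, key=active.get)

active = {"backend-1": 120, "backend-2": 87, "backend-3": 203}
chosen = least_connections(active)
active[chosen] += 1  # the new connection is now counted against it
print(chosen)        # backend-2
```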
The source IP address is hashed to determine the backend. Same source IP always routes to the same backend.
Advantages: Natural session affinity without connection tracking.
Disadvantages: Uneven distribution if source IPs are clustered (e.g., behind NAT).
```python
def source_ip_hash(source_ip: str, backends: list[str]) -> str:
    """
    Route based on hash of source IP address.
    Same source IP always maps to same backend
    (assuming stable backend list).
    """
    # Simple hash-based routing
    hash_value = hash(source_ip)
    backend_index = hash_value % len(backends)
    return backends[backend_index]

# Example usage
backends = ["backend-1", "backend-2", "backend-3"]
print(source_ip_hash("10.0.1.5", backends))  # Consistent result
print(source_ip_hash("10.0.1.5", backends))  # Same result
print(source_ip_hash("10.0.1.6", backends))  # May differ
```

Consistent hashing minimizes redistribution when backends are added or removed. Instead of modular hashing, backends are placed on a hash ring, and connections map to the nearest backend clockwise.
Advantages: Adding/removing a backend affects only 1/N of connections (where N = number of backends).
Use case: Cache servers, stateful backends where connection redistribution is costly.
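A compact sketch of a hash ring with virtual nodes; the replica count and SHA-256-based placement are illustrative choices.

```python
import bisect
import hashlib

def _point(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Minimal hash ring; virtual nodes (replicas) smooth the distribution."""

    def __init__(self, backends: list, replicas: int = 100):
        points = []
        for backend in backends:
            for r in range(replicas):
                points.append((_point(f"{backend}#{r}"), backend))
        points.sort()
        self._points = points
        self._keys = [p for p, _ in points]

    def route(self, connection_key: str) -> str:
        """Return the first backend clockwise from the key's ring position."""
        index = bisect.bisect(self._keys, _point(connection_key)) % len(self._points)
        return self._points[index][1]

ring = ConsistentHashRing(["backend-1", "backend-2", "backend-3"])
print(ring.route("10.0.1.5:45123->203.0.113.9:443/TCP"))
```

Removing a backend reassigns only the keys that landed on its ring segments; every other mapping is untouched.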
Google's Maglev paper introduced a consistent hashing algorithm specifically designed for load balancers. It provides near-uniform distribution of connections across backends, minimal disruption when the backend set changes, and constant-time lookups via a precomputed table.
Maglev hashing is used in Google's production load balancing and several open-source implementations.
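The heart of the algorithm is how the lookup table is populated. The sketch below captures that idea in simplified form; the hash functions and table size are illustrative, and the production implementation described in the paper differs in detail.

```python
import hashlib

def _h(value: str, seed: int) -> int:
    """Stable 64-bit hash with a seed (illustrative choice of SHA-256)."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def build_maglev_table(backends: list, table_size: int = 65537) -> list:
    """Populate a Maglev-style lookup table.

    table_size should be prime and much larger than the backend count.
    Each backend walks its own permutation of table slots and claims free
    slots in round-robin turns, so shares come out nearly equal.
    """
    n = len(backends)
    offsets = [_h(b, 1) % table_size for b in backends]
    skips = [_h(b, 2) % (table_size - 1) + 1 for b in backends]
    next_idx = [0] * n
    table = [None] * table_size
    filled = 0
    while filled < table_size:
        for i, backend in enumerate(backends):
            # Advance along backend i's permutation to its next free slot.
            while True:
                slot = (offsets[i] + next_idx[i] * skips[i]) % table_size
                next_idx[i] += 1
                if table[slot] is None:
                    table[slot] = backend
                    filled += 1
                    break
            if filled == table_size:
                break
    return table

def lookup(table: list, connection_key: str) -> str:
    """Map a serialized 5-tuple to a backend via the lookup table."""
    return table[_h(connection_key, 3) % len(table)]

# Example with a small prime table; real tables are much larger.
table = build_maglev_table(["backend-1", "backend-2", "backend-3"], table_size=251)
print(lookup(table, "10.0.1.5:45123->203.0.113.9:443/TCP"))
```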
At large scale, algorithm choice matters less than expected—with millions of connections per second, even imperfect algorithms converge toward even distribution. The critical factors become: connection tracking overhead, memory usage for hash tables, and behavior during backend changes. Maglev hashing excels at all three.
Understanding real-world implementations helps contextualize Layer 4 concepts. Here are the most significant production systems:
IPVS is the Linux kernel's native Layer 4 load balancing implementation, part of the LVS (Linux Virtual Server) project. It operates within the kernel's netfilter framework.
Capabilities: TCP, UDP, and SCTP balancing; NAT, direct routing (DSR), and IP-in-IP tunneling forwarding modes; and a range of scheduling algorithms including round robin, weighted round robin, and least connections.
Use case: Kubernetes kube-proxy (in IPVS mode), traditional datacenter load balancing
AWS NLB is a managed Layer 4 service designed for extreme scale:
Capabilities: millions of requests per second per load balancer, static IP addresses (one per Availability Zone), preservation of the client source IP, and TCP, UDP, and TLS listeners.
Use case: High-performance TCP/UDP services, gaming, IoT, non-HTTP protocols
Modern cloud-native load balancing increasingly uses eBPF (extended Berkeley Packet Filter) technology:
How it works: small eBPF programs are attached to kernel hooks (such as XDP at the driver level or the socket layer), where they select a backend and redirect packets without traversing the full kernel network stack or iptables rule chains.
Advantages: routing decisions happen in the kernel with constant-time hash-map lookups, performance stays flat as the number of services grows (unlike long iptables chains), and the logic is programmable and can be updated on the fly.
Implementation: Cilium uses eBPF to implement kube-proxy replacement, achieving 10x better performance than iptables-based implementations.
Historically, high-performance Layer 4 load balancing required expensive hardware (F5, Citrix). Modern software implementations—particularly those using DPDK or eBPF—can match or exceed hardware performance on commodity servers. This has democratized Layer 4 load balancing, making it accessible to any organization.
Layer 4 load balancing represents the foundational layer of network traffic distribution. By operating at the transport layer, it achieves performance characteristics impossible at higher layers.
What's next:
Layer 4 load balancing excels at raw performance but lacks application awareness. In the next page, we'll explore Layer 7 load balancing, which trades some performance for the ability to make routing decisions based on HTTP headers, URLs, cookies, and request content—enabling sophisticated traffic management that Layer 4 cannot achieve.
You now understand how Layer 4 load balancing operates at the transport layer, handling TCP and UDP connections with minimal overhead. This foundational knowledge prepares you to understand Layer 7 load balancing and, critically, when to choose each approach.