Every time you access Netflix, Google, Amazon, or any major web service, your request is handled not by a single server, but by one selected from potentially thousands of servers distributed across the globe. The invisible orchestrator making this selection—deciding which server handles your specific request—is the load balancer.
Load balancing is not merely a networking convenience; it is the foundational architectural pattern that enables the internet as we know it. Without load balancing, no website could handle more traffic than a single server provides, no service could achieve high availability, and the entire concept of horizontal scaling would be impossible.
This page provides an exhaustive exploration of load balancer concepts, from fundamental principles to architectural considerations that guide the design of systems serving billions of users.
By the end of this page, you will understand: the fundamental problem load balancing solves, the core architectural patterns for implementing load balancers, how load balancers fit into the broader network topology, the key metrics and considerations that drive load balancer design, and why load balancing is essential for every aspect of modern distributed systems.
To understand load balancing at a deep level, we must first understand the fundamental problem it solves. Consider a simple web application serving users:
The Single Server Limitation:
Every server has finite resources: CPU cycles, memory, network bandwidth, disk I/O, and a bounded number of concurrent connections it can hold open.
When user demand exceeds any of these limits, the server becomes a bottleneck. Response times increase, requests queue up, and eventually the server fails entirely—often at the worst possible moment (during traffic spikes when you need it most).
A particularly dangerous scenario is cascading failure: when a single server fails and its traffic shifts to the remaining servers, the sudden load increase can push those servers past capacity too, toppling them one after another. (This is often discussed alongside the related 'thundering herd' problem, where many clients retry or reconnect simultaneously.) Load balancing with proper health checks and capacity headroom prevents this catastrophic scenario.
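The cascade risk comes down to simple arithmetic. A minimal sketch (the utilization figures are illustrative, and it assumes traffic redistributes evenly over the survivors):

```python
def utilization_after_failure(servers: int, per_server_load: float) -> float:
    """Per-server load after one of `servers` fails, assuming the
    failed server's traffic is spread evenly over the survivors."""
    return servers * per_server_load / (servers - 1)

# Three servers each at 80% of capacity: losing one pushes the two
# survivors to 120%, so they fail too and the cascade continues.
assert round(utilization_after_failure(3, 0.80), 2) == 1.2
# Five servers at 60%: survivors land at 75% and absorb the failure.
assert round(utilization_after_failure(5, 0.60), 2) == 0.75
```

This is why capacity planning targets headroom: a pool running hot has no margin to absorb a peer's failure.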
Vertical vs. Horizontal Scaling:
Faced with capacity limits, there are two fundamental approaches:
Vertical Scaling (Scale Up): add capacity to the existing server with more CPU, memory, and faster storage.
Horizontal Scaling (Scale Out): add more servers of similar size and distribute the work across them.
Horizontal scaling is the approach that enables modern web-scale services—and it requires load balancing to function.
| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger, more powerful servers | More servers of similar size |
| Complexity | Low (single server) | Higher (distributed system) |
| Cost Curve | Exponential (diminishing returns) | Linear (predictable) |
| Failure Impact | Total outage | Partial degradation |
| Maximum Capacity | Hardware limits | Practically unlimited |
| Recovery Time | Full restart required | Traffic shifts automatically |
| Geographic Distribution | Single location | Multiple regions possible |
| Requires Load Balancing | No | Yes (essential) |
A load balancer is a network device or software component that distributes incoming network traffic across multiple backend servers (also called targets, endpoints, or upstream servers) according to configurable rules and algorithms.
Formal Definition:
A load balancer is a reverse proxy that accepts client connections and forwards them to one or more backend servers, deciding which backend should handle each request based on: the configured distribution algorithm, each backend's health and current load, and, for Layer 7 balancers, the content of the request itself.
Key Terminology:
| Term | Definition |
|---|---|
| Frontend | The client-facing side of the load balancer (IP:port clients connect to) |
| Backend | The pool of servers that actually handle requests |
| Listener | A process on the LB that accepts connections on a port |
| Target Group | A logical grouping of backend servers |
| Health Check | Periodic tests to verify backend availability |
| Session Persistence | Routing subsequent requests from same client to same backend |
| Connection Draining | Gracefully completing existing connections before removing a backend |
While all load balancers are reverse proxies, not all reverse proxies are load balancers. A reverse proxy forwards requests to a backend; a load balancer is a reverse proxy that specifically distributes load across multiple backends using selection algorithms. Technologies like NGINX, HAProxy, and Envoy can function as both.
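Session persistence from the terminology table is often implemented by hashing a client identifier. A minimal sketch (the backend names are placeholders) using IP-hash persistence:

```python
import hashlib

# Hypothetical backend pool; names are placeholders.
backends = ["app-1", "app-2", "app-3"]

def pick_backend(client_ip: str) -> str:
    """Simple IP-hash session persistence: the same client IP always
    maps to the same backend. (Not consistent hashing, so the mapping
    shifts if the pool size changes.)"""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

# The same client lands on the same backend on every request.
assert pick_backend("203.0.113.7") == pick_backend("203.0.113.7")
```

Production systems more often use cookies (L7) or consistent hashing, which keeps most mappings stable when backends are added or removed.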
The Load Balancer's Core Responsibilities:
Traffic Distribution: select a backend for every request according to the configured algorithm.
Health Monitoring: probe backends periodically and remove failing ones from rotation.
Session Management: keep a given client's requests on the same backend when the application requires it.
Security Boundary: hide backend topology from clients and provide a single point for TLS termination, rate limiting, and filtering.
Observability: expose traffic, latency, and error metrics from a vantage point that sees every request.
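The first two responsibilities, distribution and health monitoring, can be sketched together. A minimal round-robin selector that skips backends marked unhealthy (names and health states are illustrative; a real balancer would run the health checks itself):

```python
import itertools

class LoadBalancer:
    """Minimal round-robin balancer that skips backends
    currently marked unhealthy."""

    def __init__(self, backends):
        self.health = {b: True for b in backends}
        self._cycle = itertools.cycle(backends)

    def mark(self, backend, healthy):
        # In practice a periodic health checker would call this.
        self.health[backend] = healthy

    def pick(self):
        # Try each backend at most once per request.
        for _ in range(len(self.health)):
            backend = next(self._cycle)
            if self.health[backend]:
                return backend
        raise RuntimeError("no healthy backends")

lb = LoadBalancer(["app-1", "app-2", "app-3"])
lb.mark("app-2", False)          # health check failed
picks = [lb.pick() for _ in range(4)]
assert picks == ["app-1", "app-3", "app-1", "app-3"]
```

Real balancers layer weights, connection counts, and connection draining on top of this core loop, but the shape is the same: a selection algorithm filtered by health state.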
Load balancers can be deployed in several architectural patterns, each with distinct characteristics for performance, reliability, and complexity. Understanding these patterns is essential for designing scalable systems.
Pattern 1: Single Load Balancer (Basic)
┌─────────────┐
Clients ────────► │ Load │ ────► Backend 1
│ Balancer │ ────► Backend 2
└─────────────┘ ────► Backend 3
Pattern 2: Active-Passive (High Availability)
┌─────────────┐
Clients ────────► │ Active LB │ ────► Backends
└─────────────┘
▲ heartbeat
▼
┌─────────────┐
│ Passive LB │ (standby)
└─────────────┘
Pattern 3: Active-Active (Load Sharing)
┌─────────────┐
┌─►│ LB Node 1 │─┐
DNS/Anycast ──┼──►│ │─┼──► Backends
└─►│ LB Node 2 │─┘
└─────────────┘
Pattern 4: Multi-Tier Load Balancing
┌─────────────┐
Clients ────────► │ L4 LB │ ──► L7 LB Pool
│ (TCP/UDP) │ ┌──────────┐
└─────────────┘ ──► │ L7 LB 1 │──► Services
│ L7 LB 2 │──► Services
└──────────┘
Pattern 5: Service Mesh / Sidecar Proxy
┌───────────────────────┐
│ Pod/Container │
──────────────► │ ┌─────────┐ │
│ │ Sidecar │◄──────►Service│
│ │ Proxy │ │
│ └─────────┘ │
└───────────────────────┘
The choice of architecture pattern depends on traffic volume, availability requirements, and operational complexity tolerance. Most production systems start with Active-Passive for simplicity and migrate to Active-Active or Multi-Tier as they scale. Service mesh patterns are increasingly common in containerized environments.
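Pattern 2's heartbeat mechanism can be sketched as well. A simplified model (the timeout value is illustrative, and real implementations use protocols like VRRP to move a virtual IP) of a passive node deciding to take over:

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds of silence before failover (illustrative)

class PassiveNode:
    """Standby load balancer that promotes itself when the
    active peer's heartbeats stop arriving."""

    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False

    def on_heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def check(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.active = True  # in a real setup: claim the virtual IP
        return self.active

node = PassiveNode()
assert node.check(node.last_heartbeat + 1.0) is False  # peer still alive
assert node.check(node.last_heartbeat + 5.0) is True   # peer silent: fail over
```

The hard part in practice is not this logic but avoiding split-brain: both nodes believing they are active, which is why real deployments use fencing or quorum alongside heartbeats.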
Load balancers come in three fundamental forms, each with distinct operational characteristics:
Hardware Load Balancers:
Purpose-built physical appliances optimized for network processing.
Examples: F5 BIG-IP, Citrix ADC (NetScaler), A10 Networks
Characteristics: very high throughput via purpose-built network hardware, predictable low latency, high upfront cost, vendor-specific management, and slower provisioning than software alternatives.
Best For: Financial services, telecommunications, enterprises with existing hardware infrastructure
Software Load Balancers:
Software applications running on commodity hardware or virtual machines.
Examples: NGINX, HAProxy, Envoy Proxy, Traefik, Caddy
Characteristics: run on commodity hardware or VMs, highly configurable and scriptable, low cost, easy to automate and version-control, with performance bounded by the host they run on.
Performance Benchmarks (typical):
| Software LB | Requests/sec (L7) | Connections/sec (L4) | Latency Added |
|---|---|---|---|
| NGINX | 100K-500K | 50K-200K | 1-5ms |
| HAProxy | 500K-1M | 100K-500K | 0.5-2ms |
| Envoy | 100K-300K | 50K-150K | 1-3ms |
Note: Performance varies significantly based on configuration, hardware, and workload
Cloud Load Balancers:
Managed services provided by cloud providers.
Examples: AWS Elastic Load Balancing (ALB/NLB/GWLB), Google Cloud Load Balancing, Azure Load Balancer and Application Gateway
Characteristics: fully managed and pay-per-use, scale automatically, integrate with provider health checks and autoscaling, but offer less fine-grained control and carry some provider lock-in.
AWS Load Balancer Types:
| Type | Layer | Use Case | Key Feature |
|---|---|---|---|
| Classic (CLB) | 4/7 | Legacy | Simple, deprecated |
| Application (ALB) | 7 | HTTP/HTTPS | Content-based routing |
| Network (NLB) | 4 | TCP/UDP | Ultra-low latency |
| Gateway (GWLB) | 3 | Appliances | Inline security insertion |
Many organizations combine multiple load balancer types: cloud LBs for external traffic ingress, software LBs (like Envoy) for internal service-to-service communication, and potentially hardware LBs for specialized high-frequency trading or telecom workloads.
Understanding where load balancers fit in the overall network topology is essential for designing resilient systems. Let's examine the typical data path for a request.
Complete Request Path:
User Browser
│
▼
┌─────────────────┐
│ DNS Resolution │ ──► Returns load balancer IP(s)
└─────────────────┘
│
▼
┌─────────────────┐
│ CDN / Edge │ ──► Cache hit → return cached content
└─────────────────┘ Cache miss → continue to origin
│
▼
┌─────────────────┐
│ Global LB │ ──► Route to nearest/best region
│ (GSLB/Anycast) │
└─────────────────┘
│
▼
┌─────────────────┐
│ Regional LB │ ──► Route to availability zone
│ (L4/L7) │
└─────────────────┘
│
▼
┌─────────────────┐
│ Application LB │ ──► Route to specific service
│ (L7 Routing) │
└─────────────────┘
│
▼
┌─────────────────┐
│ Service Mesh │ ──► Route to service instance
│ (Sidecar Proxy) │
└─────────────────┘
│
▼
┌─────────────────┐
│ Application │ ──► Process request
│ Instance │
└─────────────────┘
Notice that a single request may traverse multiple layers of load balancing, each making decisions at different scopes.
| Layer | Scope | Decision Factors | Examples |
|---|---|---|---|
| DNS/GSLB | Global | Geography, health, policy | Route 53, Cloudflare, Akamai |
| CDN Edge | Regional | Cache status, content type | CloudFront, Fastly |
| Regional LB | Region/DC | Zone health, capacity | AWS ALB, Azure LB |
| Application LB | Service | Path, headers, content | NGINX Ingress, Envoy |
| Service Mesh | Instance | Load, latency, circuit state | Istio, Linkerd |
Inline vs. Out-of-Band:
Load balancers can operate in two fundamental modes:
Inline (Proxy) Mode: all traffic in both directions, requests and responses, flows through the load balancer, which can therefore inspect and modify everything.
Direct Server Return (DSR) Mode: requests pass through the load balancer, but backends send responses directly to clients, bypassing the load balancer on the return path.
Direct Server Return is valuable when responses are significantly larger than requests (streaming video, file downloads) and you want to minimize load on the load balancer. However, it sacrifices the ability to inspect or modify responses, and requires backends to be configured with the LB's VIP address.
Designing and operating load balancers requires understanding the key metrics that indicate system health and performance.
Traffic Metrics:
| Metric | Description | Typical Thresholds |
|---|---|---|
| Requests per Second (RPS) | Total incoming request rate | Scale trigger: 80% of capacity |
| Connections per Second (CPS) | New TCP connections established | High CPS can exhaust ephemeral ports |
| Concurrent Connections | Active connections at any moment | Affects memory usage |
| Bandwidth (In/Out) | Data transfer rate | Network capacity limits |
| Active Backend Count | Healthy backends available | Alert if < N backends |
Latency Metrics:
| Metric | Description | Target Values |
|---|---|---|
| Connection Time | Time to establish backend connection | < 10ms |
| Time to First Byte (TTFB) | Time until first response byte | < 100ms (internal) |
| Total Request Time | Complete request-response duration | Application specific |
| Queue Time | Time spent waiting for processing | Should be ~0ms |
Error Metrics:
| Metric | Description | Healthy Threshold |
|---|---|---|
| 5xx Error Rate | Server-side errors percentage | < 0.1% |
| 4xx Error Rate | Client-side errors percentage | < 5% (varies by app) |
| Connection Errors | Failed connections to backends | < 0.01% |
| Health Check Failures | Failed backend health checks | Alert on any |
| Retry Rate | Requests requiring retry | < 1% |
Capacity Planning Considerations: provision N+1 capacity so the pool can absorb a single failure, size for peak rather than average traffic, and watch the load balancer's own limits (connection-table memory, ephemeral ports, bandwidth).
Don't just monitor average latency—monitor P99 and P99.9 latencies. If your P99 latency is 10x your average, 1% of your users are having a terrible experience. Load balancer issues often manifest in tail latencies before affecting averages.
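The tail-latency point is easy to demonstrate in code. A sketch comparing average and P99 latency over a sample (the latency values are made up; `percentile` is a simple nearest-rank implementation, not a library function):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value such that at least
    p% of the samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# 98 fast requests and 2 slow outliers (latencies in ms, illustrative).
latencies = [10.0] * 98 + [500.0] * 2
avg = sum(latencies) / len(latencies)
p99 = percentile(latencies, 99)

assert avg == 19.8     # the average barely moves...
assert p99 == 500.0    # ...but P99 exposes the outliers
```

Two slow requests in a hundred leave the average looking healthy while P99 shows the full 500ms that those users actually experienced.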
One of the most important functions of modern load balancers is SSL/TLS termination—decrypting incoming HTTPS traffic and optionally re-encrypting it before forwarding to backends.
How SSL/TLS Termination Works:
Client ──HTTPS──► Load Balancer ──HTTP──► Backend
(encrypted) (decrypts) (plaintext)
(inspects)
(routes)
Benefits of SSL Termination at LB: centralized certificate management and renewal, CPU offload from application backends, and visibility into decrypted request content for routing, logging, and security inspection.
SSL/TLS Deployment Patterns:
| Pattern | Client→LB | LB→Backend | Use Case |
|---|---|---|---|
| SSL Termination | HTTPS | HTTP | Internal backends in trusted network |
| SSL Passthrough | HTTPS | HTTPS (unchanged) | End-to-end encryption required |
| SSL Re-encryption | HTTPS | HTTPS (new connection) | Most common for compliance |
| Mutual TLS (mTLS) | mTLS | mTLS | Zero-trust security models |
Performance Considerations:
SSL/TLS processing is CPU-intensive. Key factors affecting performance: the key-exchange and certificate type (ECDSA handshakes are cheaper than RSA), session resumption (a session cache or tickets avoids repeating full handshakes), the TLS version (TLS 1.3 saves a round trip), and hardware support such as AES-NI for bulk encryption.
Example NGINX SSL Configuration (Optimized):
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1d;
ssl_session_tickets off; # Prefer session cache for forward secrecy
ssl_stapling on;
ssl_stapling_verify on;
Some compliance standards (PCI-DSS, HIPAA) require encryption in transit at all points. In these cases, use SSL re-encryption: terminate at the LB for inspection and routing, then establish a new encrypted connection to backends. This provides both visibility and compliance.
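A backend participating in SSL re-encryption can enforce the same protocol floor programmatically. A minimal Python sketch using the standard `ssl` module, mirroring the `ssl_protocols TLSv1.2 TLSv1.3;` line in the NGINX config above (certificate loading is omitted for brevity):

```python
import ssl

# Server-side TLS context that refuses anything below TLS 1.2.
# A real server would also call ctx.load_cert_chain(cert, key).
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.maximum_version = ssl.TLSVersion.TLSv1_3

assert ctx.minimum_version == ssl.TLSVersion.TLSv1_2
```

Pinning the floor on both the load balancer and the backends ensures a downgraded protocol can't slip in on the internal leg of a re-encrypted connection.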
We've established the foundational concepts of load balancing. Let's consolidate the key principles before moving to Layer 4 vs. Layer 7 specifics.
What's Next:
Now that we understand what load balancers are and their role in network architecture, we'll dive deep into the fundamental distinction between Layer 4 (L4) and Layer 7 (L7) load balancing. This distinction determines what information the load balancer can use for routing decisions and has profound implications for performance, capabilities, and use cases.
You now understand the core concepts of load balancing: why it's essential, how it fits into network architecture, the types available, and key operational considerations. Next, we'll explore how L4 and L7 load balancers differ in their approach to traffic distribution.