When a request arrives at a load balancer, a critical decision must be made in microseconds: which backend server should handle this request? This decision, repeated millions of times per second across the internet, is governed by load balancing algorithms.
The choice of algorithm profoundly impacts system behavior. A poor algorithm choice can create hot spots where some servers are overwhelmed while others sit idle. It can break session stickiness, cause cache inefficiency, or lead to uneven resource utilization. Conversely, the right algorithm ensures optimal distribution, maintains consistency when needed, and adapts to backend health and capacity.
This page provides a comprehensive examination of load balancing algorithms, from simple static policies to sophisticated adaptive techniques used by the world's largest services.
By the end of this page, you will understand: the taxonomy of load balancing algorithms (static vs dynamic, stateful vs stateless), the mechanics of each major algorithm, consistency guarantees and their importance, how to select the right algorithm for your use case, and the mathematical foundations underlying modern algorithms.
Load balancing algorithms can be classified along several dimensions. Understanding this taxonomy helps in selecting the right approach.
Dimension 1: Static vs Dynamic
Static Algorithms: decide using only fixed configuration (the backend list and any weights), with no runtime feedback. Round-robin, random, and hash-based policies fall here.
Dynamic Algorithms: incorporate real-time signals such as active connections, response times, or queue depth, and adapt routing as backend conditions change. Least connections and least response time are examples.
Dimension 2: Stateless vs Stateful
Stateless Algorithms: select a backend from the request (and static configuration) alone, so any load balancer instance reaches the same decision without shared runtime state.
Stateful Algorithms: maintain per-backend runtime state, such as active connection counts, which must be tracked continuously and, in multi-instance deployments, kept reasonably consistent.
| Algorithm | Static/Dynamic | Stateless/Stateful | Complexity |
|---|---|---|---|
| Round-Robin | Static | Stateless* | Low |
| Weighted Round-Robin | Static | Stateless* | Low |
| Random | Static | Stateless | Very Low |
| IP Hash | Static | Stateless | Low |
| Least Connections | Dynamic | Stateful | Medium |
| Weighted Least Connections | Dynamic | Stateful | Medium |
| Least Response Time | Dynamic | Stateful | High |
| Consistent Hashing | Static | Stateless | Medium |
| Maglev Hashing | Static | Stateless | Medium |
| Random Two Choices | Dynamic | Stateless | Low |
Round-robin is often classified as 'stateless' even though it maintains a counter for the current position. This state is trivial (a single integer) and losing it merely randomizes the starting position. True stateful algorithms like Least Connections require tracking active connections per backend.
Round-Robin is the simplest and most widely used load balancing algorithm. It distributes requests sequentially across all available backends.
How Round-Robin Works:
Backends: [A, B, C]
Counter: 0
Request 1 → Backend[0 % 3] = A, Counter = 1
Request 2 → Backend[1 % 3] = B, Counter = 2
Request 3 → Backend[2 % 3] = C, Counter = 3
Request 4 → Backend[3 % 3] = A, Counter = 4
...
Implementation (Pseudocode):
```python
from threading import Lock

class RoundRobinLB:
    def __init__(self, backends):
        self.backends = backends
        self.counter = 0
        self.lock = Lock()  # Thread safety for concurrent selection

    def select_backend(self):
        with self.lock:
            backend = self.backends[self.counter % len(self.backends)]
            self.counter += 1
            return backend
```
Characteristics: O(1) selection, no runtime feedback, perfectly even request counts across backends, and no awareness of request cost or backend load. Round-robin performs well when backends are homogeneous and requests are roughly uniform in cost.
Weighted Round-Robin:
When backends have different capacities, weighted round-robin assigns more requests to higher-capacity servers.
Example:
Backend A: weight=3 (handles 3x more traffic)
Backend B: weight=2
Backend C: weight=1
Total weight = 6
Distribution: A, A, A, B, B, C, A, A, A, B, B, C, ...
Smooth Weighted Round-Robin (SWRR):
Naive weighted round-robin creates bursts (all 'A' requests, then 'B', etc.). SWRR distributes more evenly:
Initial: A(3), B(2), C(1) → A, B, A, C, B, A, A, B, A, C, B, A, ...
Pattern: More interleaved distribution
Nginx SWRR Implementation:
```python
def smooth_weighted_round_robin(backends, weights):
    current_weights = [0] * len(backends)
    total = sum(weights)

    def select():
        # Add each backend's configured weight to its current weight
        for i in range(len(backends)):
            current_weights[i] += weights[i]
        # Select the backend with the highest current weight
        max_idx = current_weights.index(max(current_weights))
        # Reduce the selected backend's current weight by the total
        current_weights[max_idx] -= total
        return backends[max_idx]

    return select
```
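As a quick check of the smoothing behavior, the generator can be exercised over one full cycle. This is a minimal sketch; the function is restated here only so the snippet runs standalone:

```python
def smooth_weighted_round_robin(backends, weights):
    # Restated from above so this snippet is self-contained
    current_weights = [0] * len(backends)
    total = sum(weights)

    def select():
        for i in range(len(backends)):
            current_weights[i] += weights[i]
        max_idx = current_weights.index(max(current_weights))
        current_weights[max_idx] -= total
        return backends[max_idx]

    return select

select = smooth_weighted_round_robin(["A", "B", "C"], [3, 2, 1])
sequence = [select() for _ in range(6)]
print(sequence)  # One full cycle: ['A', 'B', 'A', 'C', 'B', 'A']
```

After six selections (the total weight), the current weights return to zero and the cycle repeats, with A appearing three times, B twice, and C once, interleaved rather than bursty.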
When to Use Weighted Round-Robin:
| Scenario | Weight Assignment |
|---|---|
| Different CPU cores | Weight = number of cores |
| Different memory | Weight = relative memory capacity |
| Canary deployments | New version = 1, old version = 9 (10% canary) |
| Graceful degradation | Unhealthy but responsive = reduced weight |
Setting optimal weights is challenging. Static weights don't adapt to changing conditions—a powerful server running expensive queries might need fewer requests than a less powerful server handling simple requests. Consider dynamic algorithms if workload varies significantly.
Least Connections is a dynamic algorithm that routes requests to the backend with the fewest active connections, naturally adapting to backend capacity and request duration.
How Least Connections Works:
Backends at time T:
A: 5 active connections
B: 3 active connections ← Selected (lowest)
C: 8 active connections
Request arrives → Routed to B
Backends after:
A: 5 active connections
B: 4 active connections
C: 8 active connections
Implementation (Pseudocode):
```python
from threading import Lock

class LeastConnectionsLB:
    def __init__(self, backends):
        self.backends = backends
        self.connections = {b: 0 for b in backends}
        self.lock = Lock()

    def select_backend(self):
        with self.lock:
            # Find the backend with the minimum active connections
            backend = min(self.connections, key=self.connections.get)
            self.connections[backend] += 1
            return backend

    def release_connection(self, backend):
        with self.lock:
            self.connections[backend] -= 1
```
Why Least Connections is Adaptive: a backend bogged down by slow requests holds connections open longer, so its count stays high and it automatically receives fewer new requests. Fast backends drain connections quickly and attract more traffic, all without explicit weight tuning.
Weighted Least Connections:
Combines backend weights with connection tracking:
Score = Active Connections / Weight
Backend A: 6 connections, weight=3 → Score = 6/3 = 2.0
Backend B: 4 connections, weight=2 → Score = 4/2 = 2.0
Backend C: 3 connections, weight=1 → Score = 3/1 = 3.0
Select: A or B (tied), typically first encountered
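The weighted scoring above can be sketched directly; this is a minimal illustration using the same backend names and counts as the example:

```python
def weighted_least_connections(connections, weights):
    # Score each backend by connections normalized by its weight;
    # lower score means more spare capacity. On a tie, min() returns
    # the first backend encountered in iteration order.
    return min(connections, key=lambda b: connections[b] / weights[b])

connections = {"A": 6, "B": 4, "C": 3}
weights = {"A": 3, "B": 2, "C": 1}
print(weighted_least_connections(connections, weights))  # A (tied with B at 2.0)
```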
Least Response Time:
More sophisticated than Least Connections, considers actual response times:
Score = Active Connections × Average Response Time
Backend A: 5 conns, 20ms avg → Score = 100
Backend B: 8 conns, 10ms avg → Score = 80 ← Selected (lowest)
Backend C: 3 conns, 50ms avg → Score = 150
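A corresponding sketch for the connections × latency score, using the numbers from the example (illustrative only; real implementations smooth the latency measurement):

```python
def least_response_time(stats):
    # stats maps backend -> (active_connections, avg_response_ms).
    # Score = connections * average latency; lower is better.
    return min(stats, key=lambda b: stats[b][0] * stats[b][1])

stats = {"A": (5, 20.0), "B": (8, 10.0), "C": (3, 50.0)}
print(least_response_time(stats))  # B (score 80 vs 100 and 150)
```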
Implementation Challenges: response times must be measured and smoothed (typically with an exponentially weighted moving average) to avoid overreacting to outliers; with multiple load balancer instances, each sees only its own traffic, so local statistics can diverge; and newly added backends have no history, requiring a warm-up policy before they receive full traffic.
NGINX Least Time Configuration:
```nginx
upstream backend {
    least_time header;           # Time to receive response headers
    # OR: least_time last_byte;  # Time to receive full response
    server 192.168.1.10:8080;
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;
}
```
| Algorithm | Metric | Pros | Cons |
|---|---|---|---|
| Least Connections | Active connections | Simple, self-adaptive | Doesn't consider capacity |
| Weighted Least Conn. | Connections / weight | Accounts for capacity | Weights need configuration |
| Least Response Time | Connections × latency | Most accurate | Complex, needs measurement |
| Least Pending | Queued requests | Queue-aware | Requires queue visibility |
Least Connections excels when request durations vary significantly (some are fast, some are slow). Round-robin assumes all requests are equal; Least Connections adapts to reality. For uniform requests with similar durations, round-robin is simpler and equally effective.
Hashing algorithms provide consistent backend selection for requests with the same key. This is essential for session affinity, caching efficiency, and stateful services.
IP Hash:
Routes requests from the same client IP to the same backend:
backend_index = hash(client_ip) % num_backends
Client 203.0.113.10 → hash("203.0.113.10") % 3 = 1 → Backend B
Client 198.51.100.5 → hash("198.51.100.5") % 3 = 0 → Backend A
# Same client always hits same backend (unless backends change)
Implementation:
```python
import hashlib

def ip_hash(client_ip, backends):
    # Use a stable hash: Python's built-in hash() is randomized per
    # process for strings, which would break consistency across
    # load balancer restarts or instances.
    # Production: use a fast non-cryptographic hash like xxHash.
    hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return backends[hash_val % len(backends)]
```
Limitations of Simple Modulo Hashing:
When backends change (add/remove), most mappings change:
Before (3 backends): hash(ip) % 3
After (4 backends): hash(ip) % 4
Client with hash=5:
Before: 5 % 3 = 2 → Backend C
After: 5 % 4 = 1 → Backend B ← Different!
Going from 3 to 4 backends remaps roughly (N-1)/N = 3/4 = 75% of clients. In general, any change to the backend count under modulo hashing remaps almost all keys.
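The scale of the disruption is easy to measure empirically. A minimal sketch using synthetic client IPs and a stable hash:

```python
import hashlib

def stable_hash(s):
    # Deterministic hash (Python's built-in hash() is randomized per process)
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
before = [stable_hash(ip) % 3 for ip in ips]  # 3 backends
after = [stable_hash(ip) % 4 for ip in ips]   # add a 4th backend
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved / len(ips):.0%} of clients remapped")  # roughly 75%
```

Only keys whose hash is congruent to the same value mod 3 and mod 4 (i.e., hash mod 12 in {0, 1, 2}, a 25% chance) keep their backend; the other ~75% move.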
URL/Header/Cookie Hashing:
Hash on application-layer attributes for more specific affinity:
| Hash Key | Use Case | Example |
|---|---|---|
| URL Path | Cache locality | Same path → same cached server |
| Query Parameter | User sharding | ?user_id=123 → consistent backend |
| Cookie Value | Session stickiness | Session cookie → owning server |
| HTTP Header | Tenant routing | X-Tenant-ID → tenant's backend pool |
NGINX Hash Configuration:
```nginx
upstream backend {
    hash $remote_addr consistent;         # IP hash with consistent hashing
    # OR:
    # hash $request_uri consistent;       # URL hash
    # hash $cookie_session_id consistent; # Cookie hash
    server 192.168.1.10:8080;
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;
}
```
Why Session Affinity Matters: if a backend holds in-memory session data, a warm local cache, or a long-lived connection (e.g., a WebSocket), routing a client's subsequent requests elsewhere loses that context, causing re-authentication, cache misses, or broken connections.
Session affinity can create hot spots. If many users share a hash (e.g., behind same NAT), one backend is overloaded. Modern architectures prefer stateless backends with external session stores (Redis, database) to avoid this tradeoff.
Consistent hashing solves the remapping problem of simple modulo hashing. When backends change, only a minimal number of keys need to be remapped.
The Concept: Hash Ring
              0 / 2^32
           .-----○-----.
          /             \
     B3 ○                 ○ B1
         |                |
     B2 ○                 ○ B4 (new)
          \             /
           '-----------'

○ = a backend's position on the ring (the hash of its identifier);
key positions 0 .. 2^32 - 1 wrap around the circle.
Key K: hash(K) → Position P → Walk clockwise → First backend = handler
Adding a Backend:
When B4 is added, only the keys whose positions fall between B4's predecessor on the ring and B4 itself remap: those keys previously walked clockwise past B4's position to the next backend, and now stop at B4. All other mappings are unchanged.
Expected Remapping: K/N keys (K = total keys, N = backends)
Compare to modulo hashing: (N-1)/N × K keys remapped.
Virtual Nodes (VNodes):
With few backends, the ring can be unbalanced. Virtual nodes create multiple positions per backend:
Backend A → VNodes: A1, A2, A3, A4 (each at different ring positions)
Backend B → VNodes: B1, B2, B3, B4
Backend C → VNodes: C1, C2, C3, C4
Benefits: load spreads far more evenly across backends; when a backend is removed, its keys scatter across many survivors instead of piling onto a single neighbor; and capacity weights can be expressed by assigning larger backends more virtual nodes.
Implementation (Pseudocode):
```python
import bisect
import hashlib

def stable_hash(s):
    # Deterministic hash; Python's built-in hash() is randomized per process
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHash:
    def __init__(self, backends, vnodes=100):
        self.ring = {}           # position -> backend
        self.sorted_keys = []    # sorted ring positions, for binary search
        self.vnodes = vnodes
        for backend in backends:
            self.add_backend(backend)

    def add_backend(self, backend):
        for i in range(self.vnodes):
            position = stable_hash(f"{backend}-{i}")
            self.ring[position] = backend
            bisect.insort(self.sorted_keys, position)

    def remove_backend(self, backend):
        for i in range(self.vnodes):
            position = stable_hash(f"{backend}-{i}")
            del self.ring[position]
            self.sorted_keys.remove(position)

    def get_backend(self, key):
        if not self.ring:
            return None
        position = stable_hash(key)
        # Binary search: first ring position >= key position (wrap to start)
        idx = bisect.bisect_left(self.sorted_keys, position)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]
```
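The minimal-remapping property can be verified empirically. This compact standalone sketch builds rings with 3 and then 4 backends (A, B, C share identical vnode positions in both, since the hash is deterministic) and counts how many keys move:

```python
import bisect
import hashlib

def h(s):
    # Deterministic hash for reproducibility across processes
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(backends, vnodes=100):
    # Sorted list of (position, backend) pairs: vnodes per backend
    return sorted((h(f"{b}-{i}"), b) for b in backends for i in range(vnodes))

def lookup(ring, key):
    # Linear rebuild of the position list per call: fine for a sketch,
    # a real implementation would cache it
    positions = [p for p, _ in ring]
    idx = bisect.bisect_left(positions, h(key))
    return ring[idx % len(ring)][1]  # wrap around past the last position

keys = [f"key-{i}" for i in range(5_000)]
ring3 = build_ring(["A", "B", "C"])
ring4 = build_ring(["A", "B", "C", "D"])
moved = sum(lookup(ring3, k) != lookup(ring4, k) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # close to 1/4, vs ~75% for modulo
```

The moved keys are exactly those landing in D's arcs, so the fraction approximates D's fair share 1/N = 25% (with some variance from the finite vnode count).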
Jump Consistent Hash:
An alternative approach that achieves consistent hashing without a ring:
```python
def jump_consistent_hash(key, num_buckets):
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = ((key * 2862933555777941757) + 1) & (2**64 - 1)
        j = int((b + 1) * (2**31 / ((key >> 33) + 1)))
    return b
```
Benefits: No storage, O(log N) expected time, near-perfect distribution. Limitation: buckets must be numbered sequentially (0..N-1), so the backend set can only grow or shrink at the end; removing an arbitrary backend requires an extra indirection layer.
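Jump hash's consistency property, that growing from N to N+1 buckets moves only ~1/(N+1) of keys, and moves them only into the new bucket, can be checked directly. The function is restated so the snippet runs standalone:

```python
def jump_consistent_hash(key, num_buckets):
    # Restated from above so this snippet is self-contained
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = ((key * 2862933555777941757) + 1) & (2**64 - 1)
        j = int((b + 1) * (2**31 / ((key >> 33) + 1)))
    return b

keys = range(10_000)
before = [jump_consistent_hash(k, 10) for k in keys]
after = [jump_consistent_hash(k, 11) for k in keys]  # add an 11th bucket
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved / 10_000:.1%} of keys moved")  # close to 1/11 ≈ 9.1%
```

Every moved key lands in the new bucket (index 10); no key is shuffled between existing buckets.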
Consistent hashing is essential when backend changes are frequent AND session/cache continuity matters. Use cases: distributed caches (Redis Cluster, Memcached), sharded databases, content delivery networks. If backend changes are rare or continuity doesn't matter, simpler hashing suffices.
Maglev hashing is a consistent hashing algorithm developed by Google for their Maglev load balancers. It provides better distribution and faster lookups than ring-based consistent hashing.
The Problem with Ring-Based Consistent Hashing: achieving tight load balance requires many virtual nodes per backend (memory grows as N × VNodes), each lookup costs an O(log N) binary search, and even then backends can deviate noticeably from their fair share of the keyspace.
Maglev's Approach: Lookup Table
Instead of a ring, Maglev builds a fixed-size lookup table where each slot maps to a backend:
Table size M = 65537 (prime, larger than backends)
+---+---+---+---+---+---+---+---+---+---+
| A | B | C | A | C | B | A | B | C | A | ...
+---+---+---+---+---+---+---+---+---+---+
0 1 2 3 4 5 6 7 8 9
Lookup: backend = table[hash(key) % M]
Benefits: O(1) lookup (one hash plus one array index), near-perfect load distribution (each backend owns within a few slots of M/N table entries), and minimal disruption when the backend set changes.
How Maglev Builds the Table:
Each backend derives a preference order over table slots from two hashes, offset and skip:

```python
import hashlib

def _hash(s, seed):
    # Deterministic stand-in for the two independent hash functions
    return int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)

def build_maglev_table(backends, table_size):
    # table_size should be prime and much larger than len(backends)
    table = [None] * table_size
    filled = 0
    # Each backend's preference list is a permutation of all slots,
    # generated from offset (start) and skip (stride). A prime
    # table_size guarantees the stride visits every slot.
    preferences = {}
    for backend in backends:
        offset = _hash(backend, 1) % table_size
        skip = _hash(backend, 2) % (table_size - 1) + 1
        preferences[backend] = [(offset + i * skip) % table_size
                                for i in range(table_size)]
    # Backends take turns claiming their next preferred empty slot, so
    # each ends up with roughly table_size / len(backends) slots
    next_pref = {b: 0 for b in backends}
    while filled < table_size:
        for backend in backends:
            if filled == table_size:
                break
            # Advance to this backend's next unfilled preferred slot;
            # slots never empty again, so this index only moves forward
            while table[preferences[backend][next_pref[backend]]] is not None:
                next_pref[backend] += 1
            table[preferences[backend][next_pref[backend]]] = backend
            filled += 1
    return table
```
Disruption Analysis:
When one backend is removed: its roughly M/N slots are redistributed among the survivors, and the vast majority of the remaining slots keep their existing backend. The Maglev paper reports only a small additional disruption beyond the unavoidable minimum when M is large.
When one backend is added: it claims roughly M/N slots, taken approximately evenly from the existing backends, and all other slots are untouched or nearly so.
| Aspect | Ring-Based Consistent Hash | Maglev |
|---|---|---|
| Lookup Time | O(log N) | O(1) |
| Memory | O(N × VNodes) | O(M) fixed |
| Distribution | Approximate | Near-perfect |
| Disruption | Optimal | Near-optimal |
| Table Build | O(N × VNodes) | O(M × N) |
| Implementation | Simpler | More complex |
Maglev is used in Google's network load balancers serving Google Search, YouTube, and other services. Similar algorithms are used in Envoy Proxy, Katran (Facebook's L4 LB), and modern service meshes. If you're building high-performance infrastructure, Maglev is worth understanding.
The Power of Two Random Choices (often abbreviated "P2C"; the general form is "JSQ(d)", join the shortest of d randomly sampled queues) is an elegant algorithm that achieves near-optimal load distribution with minimal overhead.
The Insight:
Pure random selection has high variance—some backends get many more requests than others. Least Connections is optimal but requires global state.
P2C combines the best of both: pick two backends uniformly at random, then send the request to whichever of the two has fewer active connections.
This simple change exponentially reduces the maximum load imbalance.
Mathematical Foundation:
| Algorithm | Max Load (expected) |
|---|---|
| Random | O(log N / log log N) |
| Two Random Choices | O(log log N) |
For N=1000 backends (assigning N requests): pure random selection leaves the busiest backend at roughly log N / log log N ≈ 3 to 4 times the average load, while two random choices keeps the busiest backend within about log log N ≈ 2 of the average, an exponential tightening.
How It Works:
```python
import random

class PowerOfTwoLB:
    def __init__(self, backends):
        self.backends = backends
        self.connections = {b: 0 for b in backends}

    def select_backend(self):
        # Randomly sample 2 distinct backends
        choice1, choice2 = random.sample(self.backends, 2)
        # Select the one with fewer active connections
        if self.connections[choice1] <= self.connections[choice2]:
            selected = choice1
        else:
            selected = choice2
        self.connections[selected] += 1
        return selected

    def release_connection(self, backend):
        # Decrement when a request completes (as with Least Connections)
        self.connections[backend] -= 1
```
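The variance reduction is striking in simulation. This illustrative sketch (parameters are arbitrary) assigns requests that never complete, so the final counts expose cumulative imbalance:

```python
import random

def simulate(num_backends=100, num_requests=10_000, choices=1):
    # Assign each request to a backend; with choices=2, sample two
    # candidates and take the less-loaded one (P2C)
    load = [0] * num_backends
    for _ in range(num_requests):
        if choices == 1:
            pick = random.randrange(num_backends)
        else:
            a, b = random.sample(range(num_backends), 2)
            pick = a if load[a] <= load[b] else b
        load[pick] += 1
    return max(load)

random.seed(42)
# Average load is 10_000 / 100 = 100 requests per backend
print("random max load:", simulate(choices=1))  # noticeably above 100
print("P2C max load:   ", simulate(choices=2))  # very close to 100
```

With pure random assignment the busiest backend typically ends up 20-35% above the average; P2C keeps the maximum within a few requests of it.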
Weighted P2C:
For weighted backends, compare normalized load:
```python
import random

def weighted_p2c_select(backends, weights, connections):
    # Sample two distinct indices and compare weight-normalized load
    choice1, choice2 = random.sample(range(len(backends)), 2)
    load1 = connections[choice1] / weights[choice1]
    load2 = connections[choice2] / weights[choice2]
    return backends[choice1 if load1 <= load2 else choice2]
```
Advantages: O(1) work per request with no scan over all backends, good tolerance of stale or locally tracked load information, no coordination needed between load balancer instances, and none of the herd behavior that comes from always picking the single least-loaded backend.
Use in Envoy Proxy:
```yaml
clusters:
- name: my_service
  lb_policy: LEAST_REQUEST
  least_request_lb_config:
    choice_count: 2  # Power of two choices
```
Interestingly, going from 1 → 2 choices provides exponential improvement. Going from 2 → 3 or more provides only marginal additional benefit. Two choices hits the sweet spot of simplicity and effectiveness.
Choosing the right algorithm requires matching your application's characteristics to algorithm strengths.
| Scenario | Recommended Algorithm | Reason |
|---|---|---|
| Homogeneous backends, uniform requests | Round-Robin | Simplicity; equal distribution |
| Backends with different capacity | Weighted Round-Robin | Respects capacity differences |
| Variable request duration | Least Connections | Adapts to actual load |
| Session stickiness required | IP/Cookie Hash | Consistent routing per client |
| Distributed caching | Consistent Hashing | Maintains cache locality |
| High-performance infrastructure | Maglev | O(1) lookup, minimal disruption |
| Stateless with good distribution | Power of Two Choices | Simple, distributed, effective |
| Microservices with varying latency | Least Response Time | Routes to fastest responders |
What's Next:
Algorithms determine which backend is selected, but how do we know if a backend is healthy enough to receive traffic? Next, we'll explore health checking mechanisms that monitor backend availability and integrate with load balancing algorithms to route traffic only to responsive servers.
You now understand the major load balancing algorithms, from simple round-robin to sophisticated consistent hashing. You can analyze the tradeoffs and select the right algorithm for your specific requirements. Next, we'll explore health checking mechanisms.