Round Robin and Weighted Round Robin distribute requests evenly, but they're blind to what happens after a request is sent. Consider this scenario:
The Problem:
You have 3 servers receiving round-robin traffic. Suppose the first three requests have very different costs (illustrative durations):

- Request 1 → Server A: a fast lookup (0.1s)
- Request 2 → Server B: a heavy report (5s)
- Request 3 → Server C: a typical query (0.5s)

After distributing 3 requests, round robin considers its job done: each server got one. But Server B is now overloaded, processing a long request, while Server A completed its work 4.9 seconds ago and sits idle.
The next 3 requests will again go equally to A, B, and C, despite B still processing its first request. Server B accumulates connections while A and C remain underutilized.
Least Connections solves this by tracking the actual number of active connections to each server and routing new requests to the server with the fewest current connections. It's load-aware, not just request-aware.
By the end of this page, you will understand Least Connections' mechanics and variants, its significant advantages for variable workloads, implementation complexities including connection tracking overhead, and precisely when it outperforms round-robin approaches.
The Least Connections algorithm is conceptually simple:
Core Algorithm:

1. Maintain a counter of active connections for each backend server.
2. When a request arrives, route it to the healthy server with the fewest active connections and increment that server's counter.
3. When the request completes, decrement the counter.
Definition of "Connection":
The term "connection" means different things at different layers:

- Layer 4 (TCP): an open TCP connection to the backend, whether or not data is currently flowing.
- Layer 7 (HTTP): an in-flight request. With keep-alive or HTTP/2, one TCP connection can carry many requests, so counting TCP connections can understate actual load.

Most implementations track HTTP requests or active TCP connections depending on the load balancer type.
```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerState:
    """Tracks connection state for a single server."""
    address: str
    active_connections: int = 0
    healthy: bool = True


class LeastConnectionsLoadBalancer:
    """
    Least Connections load balancer implementation.

    Uses a simple linear scan to find the server with fewest connections.
    For small server counts (< 100), this is efficient.
    """

    def __init__(self, servers: list[str]):
        """
        Initialize with list of backend server addresses.

        Args:
            servers: List of server identifiers
        """
        self._servers = {
            addr: ServerState(address=addr) for addr in servers
        }
        self._lock = threading.Lock()

    def get_next_server(self) -> Optional[str]:
        """
        Select server with fewest active connections.

        Time Complexity: O(n) where n is number of servers

        Returns:
            Server address with minimum connections, or None if all unhealthy
        """
        with self._lock:
            # Filter to healthy servers only
            healthy_servers = [
                s for s in self._servers.values() if s.healthy
            ]
            if not healthy_servers:
                return None

            # Find server with minimum connections
            # Use address as tiebreaker for determinism
            best = min(
                healthy_servers,
                key=lambda s: (s.active_connections, s.address)
            )

            # Increment connection count
            best.active_connections += 1
            return best.address

    def release_connection(self, address: str) -> None:
        """
        Called when a connection completes.

        CRITICAL: This must be called for every connection,
        or the load balancer state becomes corrupted!
        """
        with self._lock:
            if address in self._servers:
                server = self._servers[address]
                server.active_connections = max(0, server.active_connections - 1)

    def mark_unhealthy(self, address: str) -> None:
        """Mark a server as unhealthy."""
        with self._lock:
            if address in self._servers:
                self._servers[address].healthy = False
                # Reset connections - existing requests will still complete
                self._servers[address].active_connections = 0

    def mark_healthy(self, address: str) -> None:
        """Mark a server as healthy."""
        with self._lock:
            if address in self._servers:
                self._servers[address].healthy = True

    def get_connection_counts(self) -> dict[str, int]:
        """Get current connection counts for monitoring."""
        with self._lock:
            return {
                addr: s.active_connections
                for addr, s in self._servers.items()
            }


# Context manager for clean connection lifecycle
class ConnectionContext:
    """
    Ensures connection is properly released after use.

    Usage:
        with ConnectionContext(lb) as server:
            # Make request to server
            response = make_request(server)
    """

    def __init__(self, lb: LeastConnectionsLoadBalancer):
        self._lb = lb
        self._server = None

    def __enter__(self) -> Optional[str]:
        self._server = self._lb.get_next_server()
        return self._server

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._server:
            self._lb.release_connection(self._server)
        return False  # Don't suppress exceptions


# Example usage
if __name__ == "__main__":
    import time
    import random

    servers = ["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"]
    lb = LeastConnectionsLoadBalancer(servers)

    # Simulate variable workload
    print("Simulating variable request durations:")
    print("-" * 50)

    for i in range(10):
        with ConnectionContext(lb) as server:
            # Simulate variable processing time
            duration = random.uniform(0.1, 1.0)
            print(f"Request {i+1}: {server} "
                  f"(counts: {lb.get_connection_counts()})")
            time.sleep(duration)

    print("-" * 50)
    print(f"Final state: {lb.get_connection_counts()}")
```

The most common bug in Least Connections implementations is failing to decrement the connection count when requests complete. Every 'get' must have a matching 'release'. Use RAII patterns, context managers, or defer statements to ensure cleanup happens even when requests fail.
To understand when Least Connections excels, we need to examine workload variability—the differences in processing time across requests.
Workload Variability Classifications:
| Pattern | Characteristics | Example | Best Algorithm |
|---|---|---|---|
| Uniform | All requests ~same duration | Health checks, simple APIs | Round Robin (simpler) |
| Low Variance | Most requests similar, occasional outliers | CRUD operations | Round Robin acceptable |
| High Variance | Wide range of durations | Report generation, batch processing | Least Connections |
| Bimodal | Two distinct duration classes | Quick reads + slow writes | Least Connections or queue separation |
| Long-tail | Most fast, few very slow | Search with complex queries | Least Connections essential |
Mathematical Illustration:

Consider 2 servers receiving requests with these processing times (illustrative numbers): 90% of requests take 100ms, 10% take 5 seconds.

With Round Robin (1000 requests): each server receives exactly 500 requests, but the slow requests land wherever the rotation happens to place them. When several slow requests cluster on one server, its in-flight work piles up while the other server drains. Total busy time is roughly equal, but instantaneous load is not.

With Least Connections: a server tied up with a 5-second request stops receiving new traffic until its connection count drops back below its peer's. Slow requests spread themselves across both servers, and in-flight counts stay close to balanced.
Least Connections is self-balancing: servers that are slow (for any reason) automatically receive less traffic. This is powerful for heterogeneous backends, variable workloads, or when server performance degrades due to garbage collection, I/O issues, or noisy neighbors in cloud environments.
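The self-balancing behavior can be checked with a toy event-driven simulation. This sketch is mine, not from a library; the workload is deliberately adversarial to round robin (every third request is slow, aligning exactly with a 3-server rotation), and all durations are illustrative:

```python
import heapq

def simulate(num_servers, num_requests, pick, gap=0.01):
    """Toy event-driven simulation; returns peak in-flight requests per server."""
    active = [0] * num_servers       # in-flight requests per server
    peak = [0] * num_servers
    completions = []                 # min-heap of (finish_time, server_index)
    now = 0.0
    for i in range(num_requests):
        now += gap
        # Retire requests that finished before this arrival
        while completions and completions[0][0] <= now:
            _, s = heapq.heappop(completions)
            active[s] -= 1
        # Adversarial bimodal workload: every 3rd request is slow (2s), rest fast (10ms)
        duration = 2.0 if i % 3 == 0 else 0.01
        s = pick(i, active)
        active[s] += 1
        peak[s] = max(peak[s], active[s])
        heapq.heappush(completions, (now + duration, s))
    return peak

round_robin = lambda i, active: i % len(active)
least_conn = lambda i, active: min(range(len(active)), key=active.__getitem__)

print("Round Robin peaks:", simulate(3, 3000, round_robin))
print("Least Conn peaks: ", simulate(3, 3000, least_conn))
```

Under this alignment, round robin sends every slow request to server 0, whose in-flight count climbs into the dozens while the other two servers idle; least connections keeps all three servers' in-flight counts within a few of each other.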
Standard Least Connections assumes all servers have equal capacity. Weighted Least Connections combines connection tracking with capacity weighting.
The Algorithm:
Instead of selecting the server with minimum connections, select the server with minimum connections / weight.
This ratio represents the load relative to capacity—a server with weight 10 can handle 10 connections as easily as a weight-1 server handles 1.
Example:
Server A: weight=10, connections=3 → ratio = 0.3
Server B: weight=5, connections=2 → ratio = 0.4
Server C: weight=2, connections=0 → ratio = 0.0
Select Server C (lowest ratio).
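The selection rule can be sketched in a few lines. This helper (the name and dict layout are mine) mirrors the worked example above:

```python
def pick_weighted_least_conn(servers: dict[str, tuple[int, int]]) -> str:
    """servers maps address -> (weight, active_connections).

    Returns the address with the lowest connections/weight ratio,
    skipping weight-0 servers (which should never be selected).
    """
    return min(
        (addr for addr, (w, _) in servers.items() if w > 0),
        key=lambda addr: servers[addr][1] / servers[addr][0],
    )

servers = {"A": (10, 3), "B": (5, 2), "C": (2, 0)}
print(pick_weighted_least_conn(servers))  # C (ratio 0/2 = 0.0)
```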
```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeightedServer:
    """Server with weight and connection tracking."""
    address: str
    weight: int
    active_connections: int = 0
    healthy: bool = True

    @property
    def load_ratio(self) -> float:
        """
        Calculate load relative to capacity.

        Lower ratio = server has more available capacity.
        Weight of 0 returns infinity to avoid selection.
        """
        if self.weight == 0:
            return float('inf')
        return self.active_connections / self.weight


class WeightedLeastConnectionsLB:
    """
    Weighted Least Connections load balancer.

    Selects the server with the lowest connections/weight ratio,
    effectively choosing the server with most available capacity.
    """

    def __init__(self, servers: dict[str, int]):
        """
        Initialize with servers and their weights.

        Args:
            servers: Dict mapping server address to weight
        """
        self._servers = {
            addr: WeightedServer(address=addr, weight=w)
            for addr, w in servers.items()
        }
        self._lock = threading.Lock()

    def get_next_server(self) -> Optional[str]:
        """
        Select server with lowest load ratio.

        Time Complexity: O(n)

        Returns:
            Server address, or None if no healthy servers
        """
        with self._lock:
            healthy_servers = [
                s for s in self._servers.values()
                if s.healthy and s.weight > 0
            ]
            if not healthy_servers:
                return None

            # Select by minimum load ratio
            # Use address as tiebreaker for determinism
            best = min(
                healthy_servers,
                key=lambda s: (s.load_ratio, s.address)
            )
            best.active_connections += 1
            return best.address

    def release_connection(self, address: str) -> None:
        """Release a connection when request completes."""
        with self._lock:
            if address in self._servers:
                server = self._servers[address]
                server.active_connections = max(0, server.active_connections - 1)

    def update_weight(self, address: str, new_weight: int) -> None:
        """
        Update server weight dynamically.

        Useful for:
        - Responding to detected capacity changes
        - Gradual scaling up/down
        - Manual traffic shifts
        """
        with self._lock:
            if address in self._servers:
                self._servers[address].weight = max(0, new_weight)

    def get_states(self) -> list[dict]:
        """Get current state for monitoring/debugging."""
        with self._lock:
            return [
                {
                    "address": s.address,
                    "weight": s.weight,
                    "connections": s.active_connections,
                    "load_ratio": round(s.load_ratio, 3),
                    "healthy": s.healthy,
                }
                for s in self._servers.values()
            ]


# Demonstration
if __name__ == "__main__":
    # Heterogeneous cluster: different capacity servers
    servers = {
        "large-instance": 10,   # High capacity
        "medium-instance": 5,   # Medium capacity
        "small-instance": 2,    # Lower capacity
    }

    lb = WeightedLeastConnectionsLB(servers)

    print("Initial state:")
    for state in lb.get_states():
        print(f"  {state}")

    print("\nDistributing 17 requests (proportional to weights 10:5:2):")
    selections = []
    for i in range(17):
        server = lb.get_next_server()
        selections.append(server)

    # Count selections
    from collections import Counter
    counts = Counter(selections)
    print(f"\nDistribution: {dict(counts)}")
    print("Expected ratio: 10:5:2")

    # Show final state (all connections still active)
    print("\nFinal state (connections active):")
    for state in lb.get_states():
        print(f"  {state}")
```

Use Weighted Least Connections when you have heterogeneous backends (different instance sizes, hardware generations). Use unweighted Least Connections when all servers are identical; the weight tracking adds complexity without benefit.
Let's examine how production load balancers implement Least Connections.
```nginx
# NGINX Least Connections Configuration

upstream backend_pool {
    # Enable least_conn algorithm
    least_conn;

    # Server list - NGINX tracks connections to each server
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;

    # With health checking (NGINX Plus)
    # health_check interval=5s fails=3 passes=2;
}

upstream weighted_least_conn {
    # Combine least_conn with weights
    # NGINX uses weighted least connections when weights differ
    least_conn;

    # High-capacity servers
    server 10.0.1.1:8080 weight=10;
    server 10.0.1.2:8080 weight=10;

    # Medium capacity
    server 10.0.2.1:8080 weight=5;

    # Lower capacity
    server 10.0.3.1:8080 weight=2;

    # Connection limits per server (optional)
    # Prevents any single server from being overwhelmed
    # server 10.0.1.1:8080 weight=10 max_conns=1000;
}

upstream with_keepalive {
    least_conn;
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;

    # Keep idle connections to backends
    # Reduces connection overhead for HTTP/1.1
    keepalive 32;

    # For HTTP/2 backends
    # keepalive_requests 1000;
    # keepalive_timeout 60s;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_pool;

        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Timeouts affect connection counts
        # Long timeouts = connections held longer
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;
    }

    # Status page showing connection distribution (NGINX Plus)
    location /api/status {
        # api write=off;
        # allow 127.0.0.1;
        # deny all;
    }
}
```

```haproxy
# HAProxy Least Connections Configuration

global
    maxconn 50000

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    option httplog

backend app_servers_leastconn
    # Least connections balancing
    balance leastconn

    server app1 10.0.1.1:8080 check maxconn 2000
    server app2 10.0.1.2:8080 check maxconn 2000
    server app3 10.0.1.3:8080 check maxconn 2000

    # maxconn: Maximum connections per server
    # Queues requests when limit reached

backend weighted_leastconn
    # Weighted least connections
    balance leastconn

    # Weight affects the connection/weight ratio calculation
    server large1 10.0.1.1:8080 weight 100 check
    server large2 10.0.1.2:8080 weight 100 check
    server medium1 10.0.2.1:8080 weight 50 check
    server small1 10.0.3.1:8080 weight 25 check

backend with_queue_management
    balance leastconn

    # fullconn: Expected max connections for queue calculation
    fullconn 5000

    server app1 10.0.1.1:8080 weight 100 check maxconn 1000
    server app2 10.0.1.2:8080 weight 100 check maxconn 1000

    # Queue management
    # When all servers at maxconn, requests queue here
    timeout queue 30s

    # Retries on connection failure
    retries 3
    option redispatch   # Allow retry on different server

frontend http_front
    bind *:80
    default_backend app_servers_leastconn

# Very useful: stats page shows live connection counts
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s

    # Shows per-server:
    # - Current connections
    # - Connection rate
    # - Bytes in/out
    # - Response times
```

The maxconn setting in HAProxy and max_conns in NGINX Plus provide crucial backpressure. Without limits, a slow server accumulates unbounded connections. With limits, excess requests queue at the load balancer, and health checks can detect the overload condition.
Least Connections introduces complexity that Round Robin avoids. Understanding these challenges is essential for correct implementation.
The Cold Start Problem: a newly added or newly recovered server starts with zero connections, so Least Connections immediately routes every new request to it. This thundering herd can overwhelm the server before its caches, connection pools, and runtime optimizations have warmed up. The standard mitigation is slow start: ramp the server's effective weight from a small fraction to its full value over a configured period.

Solving the Cold Start Problem:
```python
import time
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerWithSlowStart:
    """Server with slow-start support to prevent thundering herd."""
    address: str
    weight: int = 1
    active_connections: int = 0
    healthy: bool = True
    # Slow start tracking
    _startup_time: Optional[float] = None
    _slow_start_duration: float = 60.0  # seconds

    def mark_healthy(self):
        """Called when server becomes healthy."""
        self.healthy = True
        self._startup_time = time.time()

    @property
    def effective_weight(self) -> float:
        """
        Weight adjusted for slow-start period.

        Gradually increases from 10% to 100% over slow_start_duration.
        """
        if not self._startup_time:
            return float(self.weight)

        elapsed = time.time() - self._startup_time
        if elapsed >= self._slow_start_duration:
            # Slow start complete
            self._startup_time = None
            return float(self.weight)

        # Linear ramp from 0.1 to 1.0
        progress = elapsed / self._slow_start_duration
        multiplier = 0.1 + (0.9 * progress)
        return self.weight * multiplier

    @property
    def load_ratio(self) -> float:
        """Load ratio using effective weight."""
        ew = self.effective_weight
        if ew == 0:
            return float('inf')
        return self.active_connections / ew


class SlowStartLeastConnections:
    """
    Least Connections with slow-start support.

    New/recovered servers receive gradually increasing traffic
    over the slow-start period, preventing thundering herd.
    """

    def __init__(self, servers: dict[str, int], slow_start_seconds: float = 60.0):
        self._slow_start_duration = slow_start_seconds
        self._servers = {}
        self._lock = threading.Lock()

        for addr, weight in servers.items():
            server = ServerWithSlowStart(
                address=addr,
                weight=weight,
                healthy=True,
            )
            server._slow_start_duration = slow_start_seconds
            # Existing servers don't need slow start
            server._startup_time = None
            self._servers[addr] = server

    def add_server(self, address: str, weight: int = 1) -> None:
        """Add a new server with slow-start."""
        with self._lock:
            server = ServerWithSlowStart(
                address=address,
                weight=weight,
                healthy=False,  # Will be marked healthy with slow-start
            )
            server._slow_start_duration = self._slow_start_duration
            self._servers[address] = server
            # Trigger slow-start
            server.mark_healthy()

    def get_next_server(self) -> Optional[str]:
        """Select server with lowest load ratio (with slow-start adjustment)."""
        with self._lock:
            healthy = [s for s in self._servers.values() if s.healthy]
            if not healthy:
                return None

            best = min(healthy, key=lambda s: s.load_ratio)
            best.active_connections += 1
            return best.address

    def release_connection(self, address: str) -> None:
        """Release connection when complete."""
        with self._lock:
            if address in self._servers:
                s = self._servers[address]
                s.active_connections = max(0, s.active_connections - 1)
```

Both HAProxy (slowstart parameter) and NGINX Plus have built-in slow-start support. HAProxy linearly increases the effective weight from 0 to configured weight over the slowstart period.
Several variants of Least Connections exist, each optimizing for different scenarios.
| Variant | Selection Criteria | Use Case | Trade-off |
|---|---|---|---|
| Basic Least Conn | Min connections | Homogeneous servers | Simple, widely supported |
| Weighted Least Conn | Min connections/weight | Heterogeneous servers | Accounts for capacity differences |
| Least Outstanding Requests | Min pending requests (L7) | HTTP with keep-alive | More accurate than connections for HTTP/2 |
| Least Response Time | Min (connections × response_time) | Backend performance varies | Requires response time tracking |
| Peak EWMA | Exponentially weighted response time | Twitter Finagle approach | Smooth response to latency changes |
| P2C + Least Conn | Pick two random, choose lowest | Very large server pools | O(1) with good distribution |
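As a rough sketch of the latency-aware variants in the table (class name, alpha value, and scoring formula are mine, simplified from the Peak EWMA idea rather than Finagle's actual API), a server score can combine a smoothed latency average with the in-flight count:

```python
class LatencyAwareServer:
    """Scores a backend by EWMA latency x (in-flight + 1).

    A simplified illustration of latency-aware selection: lower score
    means the server is both fast lately and lightly loaded.
    """

    def __init__(self, address: str, alpha: float = 0.3):
        self.address = address
        self.alpha = alpha           # smoothing factor; higher = reacts faster
        self.ewma_latency = 0.0      # smoothed response time, seconds
        self.in_flight = 0

    def observe_latency(self, seconds: float) -> None:
        # Standard EWMA update: recent samples dominate older ones
        self.ewma_latency = self.alpha * seconds + (1 - self.alpha) * self.ewma_latency

    def score(self) -> float:
        # Lower is better; +1 keeps idle servers comparable to each other
        return self.ewma_latency * (self.in_flight + 1)


servers = [LatencyAwareServer("a"), LatencyAwareServer("b")]
servers[0].observe_latency(0.500)   # "a" has been slow lately
servers[1].observe_latency(0.050)   # "b" has been fast
best = min(servers, key=LatencyAwareServer.score)
print(best.address)  # picks "b"
```

A server that slows down (garbage collection pause, noisy neighbor) sees its EWMA rise and its score worsen, so it sheds traffic even when its raw connection count looks normal.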
Power of Two Random Choices (P2C):
For very large server pools, scanning all servers becomes expensive. The P2C algorithm provides O(1) selection with provably good load distribution:

1. Pick two servers uniformly at random.
2. Route the request to whichever of the two has fewer active connections.
Mathematically, this achieves near-optimal distribution. The intuition: even random selection between just two options dramatically reduces the chance of selecting an overloaded server.
```python
import random
import threading
from typing import Optional


class P2CLeastConnections:
    """
    Power of Two Random Choices + Least Connections.

    Provides O(1) selection with near-optimal load distribution.
    Especially valuable for large server pools (100+ servers).
    """

    def __init__(self, servers: list[str]):
        self._servers = list(servers)
        self._connections = {s: 0 for s in servers}
        self._lock = threading.Lock()

    def get_next_server(self) -> Optional[str]:
        """
        Select server using P2C algorithm.

        Time Complexity: O(1)
        Distribution: Provably near-optimal for large pools
        """
        with self._lock:
            if len(self._servers) < 2:
                if self._servers:
                    server = self._servers[0]
                    self._connections[server] += 1
                    return server
                return None

            # Pick two random servers
            a, b = random.sample(self._servers, 2)

            # Choose the one with fewer connections
            if self._connections[a] <= self._connections[b]:
                self._connections[a] += 1
                return a
            else:
                self._connections[b] += 1
                return b

    def release_connection(self, address: str) -> None:
        """Release connection."""
        with self._lock:
            if address in self._connections:
                self._connections[address] = max(0, self._connections[address] - 1)


# Benchmark comparison
if __name__ == "__main__":
    import time

    # Large server pool
    servers = [f"server-{i}" for i in range(1000)]

    # P2C is O(1)
    p2c_lb = P2CLeastConnections(servers)

    # Time 100,000 selections
    start = time.perf_counter()
    for _ in range(100_000):
        server = p2c_lb.get_next_server()
        p2c_lb.release_connection(server)
    duration = time.perf_counter() - start

    print(f"P2C: 100,000 selections in {duration:.3f}s")
    print(f"  {100_000/duration:.0f} selections/second")
```

Least Connections is more complex than Round Robin. Use it when the benefits justify the complexity.
For most web applications with API backends, Least Connections is a reasonable default. It handles workload variance gracefully without significant downsides. Start with Least Connections unless you have specific reasons to prefer Round Robin's simplicity or need algorithms with different properties (like consistent hashing for session affinity).
What's Next:
We've seen unweighted and weighted Least Connections. The next page combines these insights into Weighted Least Connections, diving deeper into the algorithm that handles both heterogeneous hardware AND variable workloads simultaneously.
You now understand Least Connections deeply—why it matters for variable workloads, how to implement it correctly, common pitfalls to avoid, and when to choose it over simpler algorithms. You can make informed decisions about load-aware distribution in your systems.