We've explored algorithms that solve pieces of the load balancing puzzle: Weighted Round Robin distributes traffic in proportion to static capacity, while Least Connections adapts to the load each server is actually carrying.
Weighted Least Connections unifies these approaches, providing a distribution algorithm that accounts for both server capacity and current load when selecting where each request goes.
This makes Weighted Least Connections the algorithm of choice for heterogeneous production environments where both capacity and current load matter.
By the end of this page, you will understand the mathematical model behind WLC, advanced implementation techniques, edge cases and their solutions, and how to configure and tune WLC in production environments for optimal performance.
Weighted Least Connections selects servers based on their load ratio—the ratio of current connections to capacity.
Load Ratio Definition:
`load_ratio(server) = active_connections / weight`
The server with the minimum load ratio is selected. This ensures that each new request goes to the server with the most spare capacity relative to its weight, and that higher-capacity servers carry proportionally more connections.
Example Selection:
| Server | Weight | Connections | Load Ratio | Selected? |
|---|---|---|---|---|
| A | 10 | 8 | 0.80 | No |
| B | 5 | 3 | 0.60 | No |
| C | 2 | 1 | 0.50 | ✓ Yes |
Server C has the lowest load ratio (0.50), so it receives the next request.
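The selection rule is small enough to sketch directly. The snippet below (illustrative only, with the table's values hard-coded) confirms that Server C wins with the lowest ratio:

```python
# Minimal sketch of the WLC selection rule using the table above.
servers = {"A": (10, 8), "B": (5, 3), "C": (2, 1)}  # name -> (weight, connections)

def load_ratio(weight: int, connections: int) -> float:
    return connections / weight

selected = min(servers, key=lambda name: load_ratio(*servers[name]))
print(selected)  # C  (ratio 1/2 = 0.50, the minimum)
```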
The Equilibrium Property:
Over time, Weighted Least Connections drives the system toward load ratio equilibrium—all servers tend toward the same ratio.
Why? If Server A has a lower ratio than others, it receives more traffic, increasing its connections and ratio. Eventually, all servers converge to similar ratios.
At equilibrium with load ratio r for all servers, each server holds `active_connections = r × weight`, so connection counts are directly proportional to weights.
This is proportional distribution—exactly what weighted algorithms aim to achieve, but achieved dynamically rather than statically.
Unlike Weighted Round Robin which enforces proportions statically, WLC achieves proportional distribution dynamically. If a high-weight server suddenly slows down (GC pause, disk I/O), its connections accumulate, its ratio increases, and traffic automatically shifts away—without any health check or manual intervention.
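To see this self-correction numerically, here is a small simulation (a sketch, not production code) where one request arrives per step and, every other step, a randomly chosen active connection completes. Connection counts settle near the 10:5:2 weight proportions:

```python
import random

weights = {"A": 10, "B": 5, "C": 2}
conns = {name: 0 for name in weights}
random.seed(1)

for step in range(10_000):
    # Arrival: WLC sends the request to the lowest load ratio.
    target = min(weights, key=lambda n: conns[n] / weights[n])
    conns[target] += 1
    # Departure: every other step, one connection finishes; each
    # open connection is equally likely to be the one that ends.
    if step % 2 == 1:
        active = [n for n in weights if conns[n] > 0]
        done = random.choices(active, weights=[conns[n] for n in active])[0]
        conns[done] -= 1

total = sum(conns.values())
for name in weights:
    print(name, conns[name], f"ratio={conns[name] / weights[name]:.1f}",
          f"share={conns[name] / total:.2f}")
# Shares land close to 10/17, 5/17, and 2/17 -- proportional to weight.
```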
A robust implementation of Weighted Least Connections must handle numerous edge cases and operational concerns.
```python
import threading
import time
from dataclasses import dataclass
from typing import Optional, Dict, List
from enum import Enum


class ServerStatus(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DRAINING = "draining"  # No new connections, existing ones complete
    WARMING = "warming"    # Slow-start period


@dataclass
class ProductionServer:
    """
    Full-featured server state for production WLC.

    Tracks not just connections but also health status,
    effective weight for slow-start, and operational metadata.
    """
    address: str
    weight: int

    # Connection tracking
    active_connections: int = 0

    # Health and status
    status: ServerStatus = ServerStatus.HEALTHY
    consecutive_failures: int = 0
    last_failure_time: Optional[float] = None

    # Slow-start support
    warmup_start_time: Optional[float] = None
    warmup_duration_seconds: float = 60.0

    # Connection limits
    max_connections: Optional[int] = None

    @property
    def effective_weight(self) -> float:
        """
        Weight adjusted for slow-start period.
        During warmup, weight gradually increases from 10% to 100%.
        """
        if self.status == ServerStatus.UNHEALTHY:
            return 0.0
        if self.status == ServerStatus.DRAINING:
            return 0.0  # No new connections
        if self.status == ServerStatus.WARMING and self.warmup_start_time:
            elapsed = time.time() - self.warmup_start_time
            if elapsed >= self.warmup_duration_seconds:
                self.status = ServerStatus.HEALTHY
                return float(self.weight)
            # Linear ramp from 10% to 100%
            progress = elapsed / self.warmup_duration_seconds
            return self.weight * (0.1 + 0.9 * progress)
        return float(self.weight)

    @property
    def load_ratio(self) -> float:
        """
        Calculate load relative to effective capacity.
        Returns infinity for unavailable servers to exclude them.
        """
        ew = self.effective_weight
        if ew <= 0:
            return float('inf')
        return self.active_connections / ew

    @property
    def can_accept_connection(self) -> bool:
        """Check if server can accept a new connection."""
        if self.status in (ServerStatus.UNHEALTHY, ServerStatus.DRAINING):
            return False
        if self.max_connections and self.active_connections >= self.max_connections:
            return False
        return True


class ProductionWLC:
    """
    Production-grade Weighted Least Connections implementation.

    Features:
    - Slow-start for new/recovered servers
    - Connection limits per server
    - Graceful draining for maintenance
    - Health tracking with hysteresis
    - Effective weight adjustment based on failures
    - Thread-safe operation
    """

    def __init__(
        self,
        servers: Dict[str, int],  # address -> weight
        warmup_duration: float = 60.0,
        max_failures: int = 3,
        failure_recovery_time: float = 30.0,
    ):
        self._servers: Dict[str, ProductionServer] = {}
        self._warmup_duration = warmup_duration
        self._max_failures = max_failures
        self._failure_recovery_time = failure_recovery_time
        self._lock = threading.Lock()

        for addr, weight in servers.items():
            self._servers[addr] = ProductionServer(
                address=addr,
                weight=weight,
                warmup_duration_seconds=warmup_duration,
            )

    def get_next_server(self) -> Optional[str]:
        """
        Select server with lowest load ratio among available servers.
        Returns None if no servers can accept connections.

        Time Complexity: O(n) where n is number of servers
        """
        with self._lock:
            available = [
                s for s in self._servers.values()
                if s.can_accept_connection
            ]

            if not available:
                # Attempt recovery of failed servers
                return self._attempt_recovery()

            # Select by minimum load ratio
            # Tiebreaker: server with more remaining capacity
            best = min(
                available,
                key=lambda s: (s.load_ratio, -s.effective_weight)
            )

            best.active_connections += 1
            return best.address

    def release_connection(self, address: str, success: bool = True) -> None:
        """
        Release a connection and optionally report success/failure.

        Args:
            address: Server address
            success: Whether the request completed successfully
        """
        with self._lock:
            if address not in self._servers:
                return

            server = self._servers[address]
            server.active_connections = max(0, server.active_connections - 1)

            if success:
                # Reset failure counter on success
                server.consecutive_failures = 0
            else:
                server.consecutive_failures += 1
                server.last_failure_time = time.time()

                if server.consecutive_failures >= self._max_failures:
                    self._mark_unhealthy(server)

    def _mark_unhealthy(self, server: ProductionServer) -> None:
        """Mark server as unhealthy."""
        server.status = ServerStatus.UNHEALTHY
        server.active_connections = 0  # Existing requests will complete
        print(f"Server {server.address} marked UNHEALTHY "
              f"after {server.consecutive_failures} failures")

    def _attempt_recovery(self) -> Optional[str]:
        """
        Attempt to recover a server that's been down long enough.
        Uses slow-start to prevent thundering herd on recovery.
        """
        now = time.time()

        for server in self._servers.values():
            if server.status == ServerStatus.UNHEALTHY:
                if (server.last_failure_time and
                        now - server.last_failure_time > self._failure_recovery_time):
                    # Attempt recovery with slow-start
                    server.status = ServerStatus.WARMING
                    server.warmup_start_time = now
                    server.consecutive_failures = 0
                    server.active_connections = 1
                    print(f"Server {server.address} attempting recovery (slow-start)")
                    return server.address

        return None

    # ===== Operational Methods =====

    def start_draining(self, address: str) -> None:
        """Start graceful draining - no new connections, existing ones complete."""
        with self._lock:
            if address in self._servers:
                self._servers[address].status = ServerStatus.DRAINING

    def end_draining(self, address: str) -> None:
        """End draining and restore server with slow-start."""
        with self._lock:
            if address in self._servers:
                server = self._servers[address]
                server.status = ServerStatus.WARMING
                server.warmup_start_time = time.time()

    def add_server(self, address: str, weight: int) -> None:
        """Add a new server with slow-start."""
        with self._lock:
            server = ProductionServer(
                address=address,
                weight=weight,
                status=ServerStatus.WARMING,
                warmup_start_time=time.time(),
                warmup_duration_seconds=self._warmup_duration,
            )
            self._servers[address] = server

    def remove_server(self, address: str) -> None:
        """Remove a server immediately."""
        with self._lock:
            if address in self._servers:
                del self._servers[address]

    def update_weight(self, address: str, new_weight: int) -> None:
        """Update server weight."""
        with self._lock:
            if address in self._servers:
                self._servers[address].weight = new_weight

    def set_max_connections(self, address: str, max_conn: Optional[int]) -> None:
        """Set per-server connection limit."""
        with self._lock:
            if address in self._servers:
                self._servers[address].max_connections = max_conn

    def get_status(self) -> List[Dict]:
        """Get comprehensive status for monitoring."""
        with self._lock:
            return [
                {
                    "address": s.address,
                    "weight": s.weight,
                    "effective_weight": round(s.effective_weight, 2),
                    "connections": s.active_connections,
                    "max_connections": s.max_connections,
                    "load_ratio": round(s.load_ratio, 3)
                                  if s.load_ratio != float('inf') else "N/A",
                    "status": s.status.value,
                    "can_accept": s.can_accept_connection,
                }
                for s in self._servers.values()
            ]


# Demonstration
if __name__ == "__main__":
    # Heterogeneous cluster
    servers = {
        "large-1": 10,
        "large-2": 10,
        "medium-1": 5,
        "small-1": 2,
    }

    lb = ProductionWLC(servers, warmup_duration=5.0)

    print("Initial status:")
    for s in lb.get_status():
        print(f"  {s}")

    print("\nSimulating 27 requests (proportional to 10+10+5+2=27):")
    connections = []
    for i in range(27):
        server = lb.get_next_server()
        if server:
            connections.append(server)

    from collections import Counter
    print(f"Distribution: {dict(Counter(connections))}")
    print("Expected ratio: 10:10:5:2")
```

This implementation maintains significant state per server. In distributed load balancing scenarios, this state must be either shared (adding latency and complexity) or accepted as local (each LB makes independent decisions). Most production systems choose local state, accepting that global optimality isn't achieved.
Production WLC implementations must handle numerous edge cases gracefully.
| Edge Case | Problem | Solution |
|---|---|---|
| All servers at 0 connections | All load ratios are 0—which to choose? | Use weight as tiebreaker (prefer higher capacity) |
| Server with weight 0 | Division by zero in ratio calculation | Return infinity, effectively disabling the server |
| Single server available | No comparison possible | Return the only server (degenerate case) |
| All servers unhealthy | No servers to select | Return None or attempt recovery of least-recently-failed |
| Connection count goes negative | More releases than acquisitions (bug) | Clamp to 0, log warning for debugging |
| Very high connection counts | Potential integer overflow | Use bounded counters, connection limits |
| Rapid server addition | New servers get all traffic (0 connections) | Slow-start prevents thundering herd |
| Weight updates mid-flight | Ratio calculations may be inconsistent | Lock during selection, accept brief inconsistency |
Handling Tiebreakers:
When multiple servers have the same load ratio, the tiebreaking strategy matters: preferring the higher-weight server keeps distribution proportional, random selection spreads ties evenly, and always taking the first match silently biases traffic toward servers that appear early in the list.
At high load, ties are rare—servers have different connection counts. At low load (e.g., 3 requests across 10 servers), many servers have 0 connections. Your tiebreaker strategy determines distribution in these scenarios. Weight-based tiebreaking maintains proportional distribution even at low load.
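A quick sketch of that low-load scenario (a hypothetical pool of ten idle servers receiving three requests) shows the weight-based tiebreaker steering the few requests to the highest-capacity machines:

```python
# All load ratios are 0 at the start, so the tiebreaker alone decides.
weights = {f"s{i}": w for i, w in enumerate([10, 10, 8, 5, 5, 4, 2, 2, 1, 1])}
conns = {n: 0 for n in weights}

for _ in range(3):
    # Same tiebreak as the implementation above: among equal
    # ratios, prefer the server with the higher weight.
    pick = min(weights, key=lambda n: (conns[n] / weights[n], -weights[n]))
    conns[pick] += 1

print({n: c for n, c in conns.items() if c})
# {'s0': 1, 's1': 1, 's2': 1} -- the three highest-weight servers
```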
All major load balancers support Weighted Least Connections. Configuration is straightforward but tuning requires understanding the interaction between weights, connection limits, and health checking.
```nginx
# NGINX Weighted Least Connections Configuration

upstream backend_wlc {
    # Enable least connections
    least_conn;

    # Different weights create Weighted Least Connections
    # NGINX automatically uses WLC when weights differ

    # High-capacity instances (c5.2xlarge equivalent)
    server 10.0.1.1:8080 weight=8 max_conns=2000;
    server 10.0.1.2:8080 weight=8 max_conns=2000;

    # Medium-capacity instances
    server 10.0.2.1:8080 weight=4 max_conns=1000;
    server 10.0.2.2:8080 weight=4 max_conns=1000;

    # Lower-capacity or older instances
    server 10.0.3.1:8080 weight=2 max_conns=500;

    # Canary deployment (minimal traffic for testing)
    server 10.0.4.1:8080 weight=1 max_conns=100;

    # Backup server - only if all others fail
    server 10.0.5.1:8080 backup;

    # Connection queue settings
    # queue 100 timeout=30s;  # NGINX Plus only

    # Keepalive connections to backends
    keepalive 64;
}

# Slow start configuration (NGINX Plus only)
upstream backend_wlc_slowstart {
    least_conn;
    server 10.0.1.1:8080 weight=10 slow_start=30s;
    server 10.0.1.2:8080 weight=10 slow_start=30s;
    # slow_start: ramp up traffic over 30 seconds
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_wlc;

        # Essential for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Timeouts affect connection counts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;

        # Retry on specific errors
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 2;
        proxy_next_upstream_timeout 10s;
    }

    # Health check endpoint (NGINX Plus)
    location @health_check {
        # health_check interval=5s fails=3 passes=2;
    }
}
```
```haproxy
# HAProxy Weighted Least Connections Configuration

global
    maxconn 50000
    tune.ssl.default-dh-param 2048

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 60s
    timeout queue 30s
    option httplog
    option redispatch
    retries 3

backend app_wlc
    # Least connections with weights
    balance leastconn

    # Weight and connection management
    server large-1 10.0.1.1:8080 weight 100 maxconn 2000 check inter 2s fall 3 rise 2
    server large-2 10.0.1.2:8080 weight 100 maxconn 2000 check inter 2s fall 3 rise 2
    server medium-1 10.0.2.1:8080 weight 50 maxconn 1000 check inter 2s fall 3 rise 2
    server small-1 10.0.3.1:8080 weight 25 maxconn 500 check inter 2s fall 3 rise 2

    # Slow start for recovered servers
    # Gradually increases traffic over 30s after recovery
    server canary 10.0.4.1:8080 weight 10 maxconn 100 check slowstart 30s

    # Active health checking
    option httpchk GET /health
    http-check expect status 200

backend app_wlc_advanced
    balance leastconn

    # Full connection for queue calculations
    fullconn 4000

    # Server with agent-check for dynamic weight
    # External agent can adjust weight based on metrics
    server dynamic-1 10.0.1.1:8080 weight 100 check agent-check agent-inter 5s agent-port 8888
    # Agent response: "up ready weight 75" - adjusts weight dynamically!

    # Cookie-based session persistence with leastconn
    # Note: This partially defeats load balancing for return visitors
    cookie SERVERID insert indirect nocache
    server sticky-1 10.0.2.1:8080 weight 100 check cookie s1
    server sticky-2 10.0.2.2:8080 weight 100 check cookie s2

frontend http_front
    bind *:80

    # ACL-based routing
    acl is_api path_beg /api
    use_backend app_wlc if is_api
    default_backend app_wlc

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats admin if TRUE  # Allow runtime weight changes
```
```yaml
# Envoy Weighted Least Connections Configuration

static_resources:
  clusters:
    - name: backend_wlc
      type: STRICT_DNS
      connect_timeout: 5s

      # Least request policy (Envoy's equivalent of least connections)
      lb_policy: LEAST_REQUEST

      # Least request configuration
      least_request_lb_config:
        # Power of two choices for O(1) selection
        choice_count: 2
        # Slow start configuration
        slow_start_config:
          slow_start_window: 30s
          aggression:
            default_value: 1.0

      # Circuit breaker settings
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 1000
            max_pending_requests: 100
            max_requests: 1000
            max_retries: 3

      # Endpoints with weights
      load_assignment:
        cluster_name: backend_wlc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.1.1
                      port_value: 8080
                load_balancing_weight: 100
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.1.2
                      port_value: 8080
                load_balancing_weight: 100
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.2.1
                      port_value: 8080
                load_balancing_weight: 50
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.3.1
                      port_value: 8080
                load_balancing_weight: 25

      # Health checking
      health_checks:
        - timeout: 3s
          interval: 5s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: /health

      # Connection settings
      common_http_protocol_options:
        idle_timeout: 60s
```

Envoy calls this algorithm 'Least Request' rather than 'Least Connections' because it tracks outstanding HTTP requests, not TCP connections. This is more accurate for HTTP/2 and gRPC where many requests multiplex over a single connection.
WLC effectiveness depends on proper tuning. Here are the key parameters and their impacts.
| Parameter | Effect | Tuning Guidance |
|---|---|---|
| Weights | Proportional traffic distribution | Set proportional to measured capacity. Monitor actual distribution and adjust. |
| max_connections | Per-server connection limit | Set based on server capacity. Too low = requests queue. Too high = server overload. |
| slow_start duration | Recovery ramp-up time | Longer = safer but slower recovery. 30-60s typical. Adjust based on warmup needs. |
| Health check interval | Failure detection speed | Faster = quicker detection but more overhead. 2-5s typical. |
| Failure threshold | Failures before marking unhealthy | Higher = more tolerant of transient errors. 3-5 typical. |
| Queue timeout | How long queued requests wait | Set based on user tolerance. 30s reasonable for web, 5s for APIs. |
Weight Calibration Process: benchmark each server class under realistic traffic to measure sustainable throughput, set initial weights proportional to those measurements, then compare the observed connection distribution against the intended ratios and adjust weights until they match.
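As a sketch of the second step (the helper name and benchmark numbers here are hypothetical), integer weights can be derived by normalizing measured throughput against the slowest server class:

```python
def weights_from_benchmarks(rps: dict[str, float]) -> dict[str, int]:
    """Scale measured requests/sec so the slowest class gets weight 1."""
    floor = min(rps.values())
    return {name: max(1, round(value / floor)) for name, value in rps.items()}

measured = {"large": 2000.0, "medium": 1000.0, "small": 400.0}
print(weights_from_benchmarks(measured))
# {'large': 5, 'medium': 2, 'small': 1}
# Note: 1000/400 = 2.5 rounds to 2 under Python's banker's rounding;
# multiply all weights by a common factor for finer granularity.
```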
Common Tuning Mistakes: setting weights from spec sheets rather than measured capacity, choosing max_connections limits so low that requests queue unnecessarily (or so high that servers overload), making slow-start windows shorter than the application's real warmup time, and running health checks so aggressively that the checks themselves add meaningful load.
Monitor actual connection distribution in production. If one server consistently has higher connections than its weight suggests, investigate—it may be slower, have capacity issues, or weights may need adjustment.
| Scenario | Best Algorithm | Reasoning |
|---|---|---|
| Homogeneous servers, uniform requests | Round Robin | Simplest, no overhead, optimal for this case |
| Heterogeneous servers, uniform requests | Weighted Round Robin | Capacity-aware without connection tracking overhead |
| Homogeneous servers, variable requests | Least Connections | Load-aware without weight complexity |
| Heterogeneous servers, variable requests | Weighted Least Connections | Full capacity + load awareness |
| Session affinity required | IP Hash or Consistent Hash | WLC doesn't provide session stickiness |
| Very large server pools (1000+) | P2C with WLC | O(1) selection while maintaining WLC properties |
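The "P2C with WLC" row in the table deserves a sketch: instead of scanning every server, sample two at random and keep the one with the lower load ratio. Selection becomes O(1) while retaining most of WLC's behavior; this is essentially what Envoy's `choice_count: 2` configures. The names below are illustrative:

```python
import random

def p2c_wlc(servers: list[tuple[str, int, int]]) -> str:
    """Power-of-two-choices WLC.

    servers: list of (name, weight, active_connections) tuples.
    """
    # Sample two distinct servers, keep the lower load ratio.
    a, b = random.sample(servers, 2)
    return min(a, b, key=lambda s: s[2] / s[1])[0]

pool = [("s1", 10, 4), ("s2", 10, 9), ("s3", 5, 1), ("s4", 2, 1)]
print(p2c_wlc(pool))  # the lower-ratio server of the two sampled
```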
Decision Flowchart:

```text
Are all servers identical capacity?
├── Yes → Are request durations uniform?
│         ├── Yes → Round Robin
│         └── No  → Least Connections
└── No  → Are request durations uniform?
          ├── Yes → Weighted Round Robin
          └── No  → Weighted Least Connections
```
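The same decision reads naturally as code; here is a toy translation (not from any library), useful as a checklist when evaluating a new cluster:

```python
def pick_algorithm(identical_capacity: bool, uniform_durations: bool) -> str:
    """Direct translation of the decision flowchart above."""
    if identical_capacity:
        return "Round Robin" if uniform_durations else "Least Connections"
    return "Weighted Round Robin" if uniform_durations else "Weighted Least Connections"
```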
For most production web applications with heterogeneous cloud infrastructure, Weighted Least Connections is the best default choice. It handles capacity differences, adapts to variable workloads, and degrades gracefully when servers have issues. The overhead is minimal and the benefits are significant.
What's Next:
We've covered connection-aware algorithms extensively. The next page explores IP Hash—an algorithm that provides session affinity by consistently routing requests from the same client to the same server, enabling stateful applications in load-balanced environments.
You now have complete mastery of Weighted Least Connections—from its mathematical foundations to production configuration and tuning. You can design and operate WLC-based load balancing for heterogeneous environments with confidence.