We've explored algorithms that solve pieces of the load balancing puzzle: Weighted Round Robin distributes traffic in proportion to static capacity, while Least Connections adapts to the load each server is actually carrying.
Weighted Least Connections unifies these approaches, providing a distribution algorithm that accounts for both server capacity and current load when selecting where each request goes.
This makes Weighted Least Connections the algorithm of choice for heterogeneous production environments where both capacity and current load matter.
By the end of this page, you will understand the mathematical model behind WLC, advanced implementation techniques, edge cases and their solutions, and how to configure and tune WLC in production environments for optimal performance.
Weighted Least Connections selects servers based on their load ratio—the ratio of current connections to capacity.
Load Ratio Definition:
`load_ratio(server) = active_connections / weight`
The server with the minimum load ratio is selected. This ensures that each new request goes to the server with the most spare capacity relative to its weight, and that higher-capacity servers carry proportionally more connections.
Example Selection:
| Server | Weight | Connections | Load Ratio | Selected? |
|---|---|---|---|---|
| A | 10 | 8 | 0.80 | No |
| B | 5 | 3 | 0.60 | No |
| C | 2 | 1 | 0.50 | ✓ Yes |
Server C has the lowest load ratio (0.50), so it receives the next request.
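The selection rule is small enough to sketch directly. The snippet below (illustrative only, with the table's values hard-coded) confirms that Server C wins with the lowest ratio:

```python
# Minimal sketch of the WLC selection rule using the table above.
servers = {"A": (10, 8), "B": (5, 3), "C": (2, 1)}  # name -> (weight, connections)

def load_ratio(weight: int, connections: int) -> float:
    return connections / weight

selected = min(servers, key=lambda name: load_ratio(*servers[name]))
print(selected)  # C  (ratio 1/2 = 0.50, the minimum)
```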
The Equilibrium Property:
Over time, Weighted Least Connections drives the system toward load ratio equilibrium—all servers tend toward the same ratio.
Why? If Server A has a lower ratio than others, it receives more traffic, increasing its connections and ratio. Eventually, all servers converge to similar ratios.
At equilibrium with load ratio r for all servers, each server holds `active_connections = r × weight`, so connection counts are directly proportional to weights.
This is proportional distribution—exactly what weighted algorithms aim to achieve, but achieved dynamically rather than statically.
Unlike Weighted Round Robin which enforces proportions statically, WLC achieves proportional distribution dynamically. If a high-weight server suddenly slows down (GC pause, disk I/O), its connections accumulate, its ratio increases, and traffic automatically shifts away—without any health check or manual intervention.
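To see this self-correction numerically, here is a small simulation (a sketch, not production code) where one request arrives per step and, every other step, a randomly chosen active connection completes. Connection counts settle near the 10:5:2 weight proportions:

```python
import random

weights = {"A": 10, "B": 5, "C": 2}
conns = {name: 0 for name in weights}
random.seed(1)

for step in range(10_000):
    # Arrival: WLC sends the request to the lowest load ratio.
    target = min(weights, key=lambda n: conns[n] / weights[n])
    conns[target] += 1
    # Departure: every other step, one connection finishes; each
    # open connection is equally likely to be the one that ends.
    if step % 2 == 1:
        active = [n for n in weights if conns[n] > 0]
        done = random.choices(active, weights=[conns[n] for n in active])[0]
        conns[done] -= 1

total = sum(conns.values())
for name in weights:
    print(name, conns[name], f"ratio={conns[name] / weights[name]:.1f}",
          f"share={conns[name] / total:.2f}")
# Shares land close to 10/17, 5/17, and 2/17 -- proportional to weight.
```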
A robust implementation of Weighted Least Connections must handle numerous edge cases and operational concerns.
```python
import threading
import time
from dataclasses import dataclass
from typing import Optional, Dict, List
from enum import Enum


class ServerStatus(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DRAINING = "draining"  # No new connections, existing ones complete
    WARMING = "warming"    # Slow-start period


@dataclass
class ProductionServer:
    """
    Full-featured server state for production WLC.

    Tracks not just connections but also health status,
    effective weight for slow-start, and operational metadata.
    """
    address: str
    weight: int

    # Connection tracking
    active_connections: int = 0

    # Health and status
    status: ServerStatus = ServerStatus.HEALTHY
    consecutive_failures: int = 0
    last_failure_time: Optional[float] = None

    # Slow-start support
    warmup_start_time: Optional[float] = None
    warmup_duration_seconds: float = 60.0

    # Connection limits
    max_connections: Optional[int] = None

    @property
    def effective_weight(self) -> float:
        """
        Weight adjusted for slow-start period.
        During warmup, weight gradually increases from 10% to 100%.
        """
        if self.status == ServerStatus.UNHEALTHY:
            return 0.0
        if self.status == ServerStatus.DRAINING:
            return 0.0  # No new connections
        if self.status == ServerStatus.WARMING and self.warmup_start_time:
            elapsed = time.time() - self.warmup_start_time
            if elapsed >= self.warmup_duration_seconds:
                self.status = ServerStatus.HEALTHY
                return float(self.weight)
            # Linear ramp from 10% to 100%
            progress = elapsed / self.warmup_duration_seconds
            return self.weight * (0.1 + 0.9 * progress)
        return float(self.weight)

    @property
    def load_ratio(self) -> float:
        """
        Calculate load relative to effective capacity.
        Returns infinity for unavailable servers to exclude them.
        """
        ew = self.effective_weight
        if ew <= 0:
            return float('inf')
        return self.active_connections / ew

    @property
    def can_accept_connection(self) -> bool:
        """Check if server can accept a new connection."""
        if self.status in (ServerStatus.UNHEALTHY, ServerStatus.DRAINING):
            return False
        if self.max_connections and self.active_connections >= self.max_connections:
            return False
        return True


class ProductionWLC:
    """
    Production-grade Weighted Least Connections implementation.

    Features:
    - Slow-start for new/recovered servers
    - Connection limits per server
    - Graceful draining for maintenance
    - Health tracking with hysteresis
    - Effective weight adjustment based on failures
    - Thread-safe operation
    """

    def __init__(
        self,
        servers: Dict[str, int],  # address -> weight
        warmup_duration: float = 60.0,
        max_failures: int = 3,
        failure_recovery_time: float = 30.0,
    ):
        self._servers: Dict[str, ProductionServer] = {}
        self._warmup_duration = warmup_duration
        self._max_failures = max_failures
        self._failure_recovery_time = failure_recovery_time
        self._lock = threading.Lock()

        for addr, weight in servers.items():
            self._servers[addr] = ProductionServer(
                address=addr,
                weight=weight,
                warmup_duration_seconds=warmup_duration,
            )

    def get_next_server(self) -> Optional[str]:
        """
        Select server with lowest load ratio among available servers.
        Returns None if no servers can accept connections.

        Time Complexity: O(n) where n is number of servers
        """
        with self._lock:
            available = [
                s for s in self._servers.values()
                if s.can_accept_connection
            ]

            if not available:
                # Attempt recovery of failed servers
                return self._attempt_recovery()

            # Select by minimum load ratio
            # Tiebreaker: server with more remaining capacity
            best = min(
                available,
                key=lambda s: (s.load_ratio, -s.effective_weight)
            )

            best.active_connections += 1
            return best.address

    def release_connection(self, address: str, success: bool = True) -> None:
        """
        Release a connection and optionally report success/failure.

        Args:
            address: Server address
            success: Whether the request completed successfully
        """
        with self._lock:
            if address not in self._servers:
                return

            server = self._servers[address]
            server.active_connections = max(0, server.active_connections - 1)

            if success:
                # Reset failure counter on success
                server.consecutive_failures = 0
            else:
                server.consecutive_failures += 1
                server.last_failure_time = time.time()

                if server.consecutive_failures >= self._max_failures:
                    self._mark_unhealthy(server)

    def _mark_unhealthy(self, server: ProductionServer) -> None:
        """Mark server as unhealthy."""
        server.status = ServerStatus.UNHEALTHY
        server.active_connections = 0  # Existing requests will complete
        print(f"Server {server.address} marked UNHEALTHY "
              f"after {server.consecutive_failures} failures")

    def _attempt_recovery(self) -> Optional[str]:
        """
        Attempt to recover a server that's been down long enough.
        Uses slow-start to prevent thundering herd on recovery.
        """
        now = time.time()

        for server in self._servers.values():
            if server.status == ServerStatus.UNHEALTHY:
                if (server.last_failure_time and
                        now - server.last_failure_time > self._failure_recovery_time):
                    # Attempt recovery with slow-start
                    server.status = ServerStatus.WARMING
                    server.warmup_start_time = now
                    server.consecutive_failures = 0
                    server.active_connections = 1
                    print(f"Server {server.address} attempting recovery (slow-start)")
                    return server.address

        return None

    # ===== Operational Methods =====

    def start_draining(self, address: str) -> None:
        """Start graceful draining - no new connections, existing ones complete."""
        with self._lock:
            if address in self._servers:
                self._servers[address].status = ServerStatus.DRAINING

    def end_draining(self, address: str) -> None:
        """End draining and restore server with slow-start."""
        with self._lock:
            if address in self._servers:
                server = self._servers[address]
                server.status = ServerStatus.WARMING
                server.warmup_start_time = time.time()

    def add_server(self, address: str, weight: int) -> None:
        """Add a new server with slow-start."""
        with self._lock:
            server = ProductionServer(
                address=address,
                weight=weight,
                status=ServerStatus.WARMING,
                warmup_start_time=time.time(),
                warmup_duration_seconds=self._warmup_duration,
            )
            self._servers[address] = server

    def remove_server(self, address: str) -> None:
        """Remove a server immediately."""
        with self._lock:
            if address in self._servers:
                del self._servers[address]

    def update_weight(self, address: str, new_weight: int) -> None:
        """Update server weight."""
        with self._lock:
            if address in self._servers:
                self._servers[address].weight = new_weight

    def set_max_connections(self, address: str, max_conn: Optional[int]) -> None:
        """Set per-server connection limit."""
        with self._lock:
            if address in self._servers:
                self._servers[address].max_connections = max_conn

    def get_status(self) -> List[Dict]:
        """Get comprehensive status for monitoring."""
        with self._lock:
            return [
                {
                    "address": s.address,
                    "weight": s.weight,
                    "effective_weight": round(s.effective_weight, 2),
                    "connections": s.active_connections,
                    "max_connections": s.max_connections,
                    "load_ratio": round(s.load_ratio, 3)
                                  if s.load_ratio != float('inf') else "N/A",
                    "status": s.status.value,
                    "can_accept": s.can_accept_connection,
                }
                for s in self._servers.values()
            ]


# Demonstration
if __name__ == "__main__":
    # Heterogeneous cluster
    servers = {
        "large-1": 10,
        "large-2": 10,
        "medium-1": 5,
        "small-1": 2,
    }

    lb = ProductionWLC(servers, warmup_duration=5.0)

    print("Initial status:")
    for s in lb.get_status():
        print(f"  {s}")

    print("\nSimulating 27 requests (proportional to 10+10+5+2=27):")
    connections = []
    for i in range(27):
        server = lb.get_next_server()
        if server:
            connections.append(server)

    from collections import Counter
    print(f"Distribution: {dict(Counter(connections))}")
    print("Expected ratio: 10:10:5:2")
```

This implementation maintains significant state per server. In distributed load balancing scenarios, this state must be either shared (adding latency and complexity) or accepted as local (each LB makes independent decisions). Most production systems choose local state, accepting that global optimality isn't achieved.
Production WLC implementations must handle numerous edge cases gracefully.
| Edge Case | Problem | Solution |
|---|---|---|
| All servers at 0 connections | All load ratios are 0—which to choose? | Use weight as tiebreaker (prefer higher capacity) |
| Server with weight 0 | Division by zero in ratio calculation | Return infinity, effectively disabling the server |
| Single server available | No comparison possible | Return the only server (degenerate case) |
| All servers unhealthy | No servers to select | Return None or attempt recovery of least-recently-failed |
| Connection count goes negative | More releases than acquisitions (bug) | Clamp to 0, log warning for debugging |
| Very high connection counts | Potential integer overflow | Use bounded counters, connection limits |
| Rapid server addition | New servers get all traffic (0 connections) | Slow-start prevents thundering herd |
| Weight updates mid-flight | Ratio calculations may be inconsistent | Lock during selection, accept brief inconsistency |
Handling Tiebreakers:
When multiple servers have the same load ratio, the tiebreaking strategy matters: preferring the higher-weight server keeps distribution proportional, random selection spreads ties evenly, and always taking the first match silently biases traffic toward servers that appear early in the list.
At high load, ties are rare—servers have different connection counts. At low load (e.g., 3 requests across 10 servers), many servers have 0 connections. Your tiebreaker strategy determines distribution in these scenarios. Weight-based tiebreaking maintains proportional distribution even at low load.
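A quick sketch of that low-load scenario (a hypothetical pool of ten idle servers receiving three requests) shows the weight-based tiebreaker steering the few requests to the highest-capacity machines:

```python
# All load ratios are 0 at the start, so the tiebreaker alone decides.
weights = {f"s{i}": w for i, w in enumerate([10, 10, 8, 5, 5, 4, 2, 2, 1, 1])}
conns = {n: 0 for n in weights}

for _ in range(3):
    # Same tiebreak as the implementation above: among equal
    # ratios, prefer the server with the higher weight.
    pick = min(weights, key=lambda n: (conns[n] / weights[n], -weights[n]))
    conns[pick] += 1

print({n: c for n, c in conns.items() if c})
# {'s0': 1, 's1': 1, 's2': 1} -- the three highest-weight servers
```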
All major load balancers support Weighted Least Connections. Configuration is straightforward but tuning requires understanding the interaction between weights, connection limits, and health checking.
```nginx
# NGINX Weighted Least Connections Configuration

upstream backend_wlc {
    # Enable least connections
    least_conn;

    # Different weights create Weighted Least Connections
    # NGINX automatically uses WLC when weights differ

    # High-capacity instances (c5.2xlarge equivalent)
    server 10.0.1.1:8080 weight=8 max_conns=2000;
    server 10.0.1.2:8080 weight=8 max_conns=2000;

    # Medium-capacity instances
    server 10.0.2.1:8080 weight=4 max_conns=1000;
    server 10.0.2.2:8080 weight=4 max_conns=1000;

    # Lower-capacity or older instances
    server 10.0.3.1:8080 weight=2 max_conns=500;

    # Canary deployment (minimal traffic for testing)
    server 10.0.4.1:8080 weight=1 max_conns=100;

    # Backup server - only if all others fail
    server 10.0.5.1:8080 backup;

    # Connection queue settings
    # queue 100 timeout=30s;  # NGINX Plus only

    # Keepalive connections to backends
    keepalive 64;
}

# Slow start configuration (NGINX Plus only)
upstream backend_wlc_slowstart {
    least_conn;
    server 10.0.1.1:8080 weight=10 slow_start=30s;
    server 10.0.1.2:8080 weight=10 slow_start=30s;
    # slow_start: ramp up traffic over 30 seconds
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_wlc;

        # Essential for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Timeouts affect connection counts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;

        # Retry on specific errors
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 2;
        proxy_next_upstream_timeout 10s;
    }

    # Health check endpoint (NGINX Plus)
    location @health_check {
        # health_check interval=5s fails=3 passes=2;
    }
}
```
```haproxy
# HAProxy Weighted Least Connections Configuration

global
    maxconn 50000
    tune.ssl.default-dh-param 2048

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 60s
    timeout queue 30s
    option httplog
    option redispatch
    retries 3

backend app_wlc
    # Least connections with weights
    balance leastconn

    # Weight and connection management
    server large-1 10.0.1.1:8080 weight 100 maxconn 2000 check inter 2s fall 3 rise 2
    server large-2 10.0.1.2:8080 weight 100 maxconn 2000 check inter 2s fall 3 rise 2
    server medium-1 10.0.2.1:8080 weight 50 maxconn 1000 check inter 2s fall 3 rise 2
    server small-1 10.0.3.1:8080 weight 25 maxconn 500 check inter 2s fall 3 rise 2

    # Slow start for recovered servers
    # Gradually increases traffic over 30s after recovery
    server canary 10.0.4.1:8080 weight 10 maxconn 100 check slowstart 30s

    # Active health checking
    option httpchk GET /health
    http-check expect status 200

backend app_wlc_advanced
    balance leastconn

    # Full connection for queue calculations
    fullconn 4000

    # Server with agent-check for dynamic weight
    # External agent can adjust weight based on metrics
    server dynamic-1 10.0.1.1:8080 weight 100 check agent-check agent-inter 5s agent-port 8888
    # Agent response: "up ready weight 75" - adjusts weight dynamically!

    # Cookie-based session persistence with leastconn
    # Note: This partially defeats load balancing for return visitors
    cookie SERVERID insert indirect nocache
    server sticky-1 10.0.2.1:8080 weight 100 check cookie s1
    server sticky-2 10.0.2.2:8080 weight 100 check cookie s2

frontend http_front
    bind *:80

    # ACL-based routing
    acl is_api path_beg /api
    use_backend app_wlc if is_api
    default_backend app_wlc

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats admin if TRUE  # Allow runtime weight changes
```
```yaml
# Envoy Weighted Least Connections Configuration

static_resources:
  clusters:
    - name: backend_wlc
      type: STRICT_DNS
      connect_timeout: 5s

      # Least request policy (Envoy's equivalent of least connections)
      lb_policy: LEAST_REQUEST

      # Least request configuration
      least_request_lb_config:
        # Power of two choices for O(1) selection
        choice_count: 2
        # Slow start configuration
        slow_start_config:
          slow_start_window: 30s
          aggression:
            default_value: 1.0

      # Circuit breaker settings
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 1000
            max_pending_requests: 100
            max_requests: 1000
            max_retries: 3

      # Endpoints with weights
      load_assignment:
        cluster_name: backend_wlc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.1.1
                      port_value: 8080
                load_balancing_weight: 100
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.1.2
                      port_value: 8080
                load_balancing_weight: 100
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.2.1
                      port_value: 8080
                load_balancing_weight: 50
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.3.1
                      port_value: 8080
                load_balancing_weight: 25

      # Health checking
      health_checks:
        - timeout: 3s
          interval: 5s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: /health

      # Connection settings
      common_http_protocol_options:
        idle_timeout: 60s
```

Envoy calls this algorithm 'Least Request' rather than 'Least Connections' because it tracks outstanding HTTP requests, not TCP connections. This is more accurate for HTTP/2 and gRPC where many requests multiplex over a single connection.
WLC effectiveness depends on proper tuning. Here are the key parameters and their impacts.
| Parameter | Effect | Tuning Guidance |
|---|---|---|
| Weights | Proportional traffic distribution | Set proportional to measured capacity. Monitor actual distribution and adjust. |
| max_connections | Per-server connection limit | Set based on server capacity. Too low = requests queue. Too high = server overload. |
| slow_start duration | Recovery ramp-up time | Longer = safer but slower recovery. 30-60s typical. Adjust based on warmup needs. |
| Health check interval | Failure detection speed | Faster = quicker detection but more overhead. 2-5s typical. |
| Failure threshold | Failures before marking unhealthy | Higher = more tolerant of transient errors. 3-5 typical. |
| Queue timeout | How long queued requests wait | Set based on user tolerance. 30s reasonable for web, 5s for APIs. |
Weight Calibration Process: benchmark each server class under realistic traffic to measure sustainable throughput, set initial weights proportional to those measurements, then compare the observed connection distribution against the intended ratios and adjust weights until they match.
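As a sketch of the second step (the helper name and benchmark numbers here are hypothetical), integer weights can be derived by normalizing measured throughput against the slowest server class:

```python
def weights_from_benchmarks(rps: dict[str, float]) -> dict[str, int]:
    """Scale measured requests/sec so the slowest class gets weight 1."""
    floor = min(rps.values())
    return {name: max(1, round(value / floor)) for name, value in rps.items()}

measured = {"large": 2000.0, "medium": 1000.0, "small": 400.0}
print(weights_from_benchmarks(measured))
# {'large': 5, 'medium': 2, 'small': 1}
# Note: 1000/400 = 2.5 rounds to 2 under Python's banker's rounding;
# multiply all weights by a common factor for finer granularity.
```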
Common Tuning Mistakes: setting weights from spec sheets rather than measured capacity, choosing max_connections limits so low that requests queue unnecessarily (or so high that servers overload), making slow-start windows shorter than the application's real warmup time, and running health checks so aggressively that the checks themselves add meaningful load.
Monitor actual connection distribution in production. If one server consistently has higher connections than its weight suggests, investigate—it may be slower, have capacity issues, or weights may need adjustment.
| Scenario | Best Algorithm | Reasoning |
|---|---|---|
| Homogeneous servers, uniform requests | Round Robin | Simplest, no overhead, optimal for this case |
| Heterogeneous servers, uniform requests | Weighted Round Robin | Capacity-aware without connection tracking overhead |
| Homogeneous servers, variable requests | Least Connections | Load-aware without weight complexity |
| Heterogeneous servers, variable requests | Weighted Least Connections | Full capacity + load awareness |
| Session affinity required | IP Hash or Consistent Hash | WLC doesn't provide session stickiness |
| Very large server pools (1000+) | P2C with WLC | O(1) selection while maintaining WLC properties |
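The "P2C with WLC" row in the table deserves a sketch: instead of scanning every server, sample two at random and keep the one with the lower load ratio. Selection becomes O(1) while retaining most of WLC's behavior; this is essentially what Envoy's `choice_count: 2` configures. The names below are illustrative:

```python
import random

def p2c_wlc(servers: list[tuple[str, int, int]]) -> str:
    """Power-of-two-choices WLC.

    servers: list of (name, weight, active_connections) tuples.
    """
    # Sample two distinct servers, keep the lower load ratio.
    a, b = random.sample(servers, 2)
    return min(a, b, key=lambda s: s[2] / s[1])[0]

pool = [("s1", 10, 4), ("s2", 10, 9), ("s3", 5, 1), ("s4", 2, 1)]
print(p2c_wlc(pool))  # the lower-ratio server of the two sampled
```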
Decision Flowchart:

```text
Are all servers identical capacity?
├── Yes → Are request durations uniform?
│         ├── Yes → Round Robin
│         └── No  → Least Connections
└── No  → Are request durations uniform?
          ├── Yes → Weighted Round Robin
          └── No  → Weighted Least Connections
```
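The same decision reads naturally as code; here is a toy translation (not from any library), useful as a checklist when evaluating a new cluster:

```python
def pick_algorithm(identical_capacity: bool, uniform_durations: bool) -> str:
    """Direct translation of the decision flowchart above."""
    if identical_capacity:
        return "Round Robin" if uniform_durations else "Least Connections"
    return "Weighted Round Robin" if uniform_durations else "Weighted Least Connections"
```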
For most production web applications with heterogeneous cloud infrastructure, Weighted Least Connections is the best default choice. It handles capacity differences, adapts to variable workloads, and degrades gracefully when servers have issues. The overhead is minimal and the benefits are significant.
What's Next:
We've covered connection-aware algorithms extensively. The next page explores IP Hash—an algorithm that provides session affinity by consistently routing requests from the same client to the same server, enabling stateful applications in load-balanced environments.
You now have complete mastery of Weighted Least Connections—from its mathematical foundations to production configuration and tuning. You can design and operate WLC-based load balancing for heterogeneous environments with confidence.