In an ideal world, all servers would be identical—same CPU, same memory, same network capacity. Standard Round Robin would distribute load perfectly. But reality is messier.
Consider these real-world scenarios:

- A mixed cloud fleet, where a c5.2xlarge has 4x the capacity of a c5.large
- A gradual hardware refresh, where new machines serve alongside older, slower ones
- A canary deployment, where a new version should receive only a small share of traffic

In all these cases, equal request distribution creates unequal load distribution. The solution is Weighted Round Robin—an elegant extension that distributes requests proportionally to server capacity.
Weighted Round Robin maintains the simplicity and predictability of standard Round Robin while accommodating heterogeneous infrastructure. It's the algorithm that acknowledges not all servers are created equal.
By the end of this page, you will understand weight assignment strategies, multiple implementation algorithms with different tradeoffs, mathematical properties of weighted distribution, and precisely when weighted round robin is superior to its unweighted counterpart.
A weight is a positive integer that represents a server's relative capacity compared to other servers. Weights are not absolute—they only matter in relation to each other.
The Weight Interpretation:
Given servers with weights:

- Server A: weight 5
- Server B: weight 3
- Server C: weight 2

Total weight = 5 + 3 + 2 = 10

The distribution target becomes:

- Server A: 5/10 → 50% of requests
- Server B: 3/10 → 30% of requests
- Server C: 2/10 → 20% of requests
This is proportional distribution—each server receives traffic proportional to its declared capacity.
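To make the arithmetic concrete, here is a minimal sketch (the `traffic_targets` helper is illustrative, not from any library) that turns a weight map into fractional traffic targets:

```python
def traffic_targets(weights: dict[str, int]) -> dict[str, float]:
    """Convert relative weights into fractional traffic targets."""
    total = sum(weights.values())
    return {server: weight / total for server, weight in weights.items()}

# Weights 5, 3, 2 yield targets of 50%, 30%, and 20%
print(traffic_targets({"A": 5, "B": 3, "C": 2}))
# {'A': 0.5, 'B': 0.3, 'C': 0.2}
```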
Weight Assignment Strategies:
Determining appropriate weights requires understanding your infrastructure:
| Strategy | Description | When to Use |
|---|---|---|
| CPU-Based | Weight proportional to CPU cores or compute capacity | CPU-bound workloads (computation, rendering) |
| Memory-Based | Weight proportional to available RAM | Memory-intensive workloads (caching, large datasets) |
| Instance-Type Based | Cloud instance type determines weight (e.g., xlarge=4, large=2, medium=1) | Cloud deployments with mixed instance sizes |
| Benchmark-Based | Run load tests to measure actual throughput, set weights proportionally | Heterogeneous hardware, complex workloads |
| Manual/Operational | Explicitly set weights for traffic control (canary deployments, testing) | Gradual rollouts, A/B testing, capacity management |
Begin with simple weight assignment (e.g., based on CPU count) and refine based on production metrics. Over-engineering weights upfront rarely pays off—observe actual load distribution and adjust.
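As one concrete starting point, CPU-based weights can be derived mechanically. This sketch assumes you already know each server's vCPU count; the `weights_from_vcpus` helper and its inputs are hypothetical:

```python
import math
from functools import reduce

def weights_from_vcpus(vcpus: dict[str, int]) -> dict[str, int]:
    """Assign weights proportional to vCPU count, reduced by their GCD
    so the weights stay small (e.g., 8/4/2 vCPUs -> weights 4/2/1)."""
    common = reduce(math.gcd, vcpus.values())
    return {server: count // common for server, count in vcpus.items()}

print(weights_from_vcpus({"c5-2xlarge": 8, "c5-xlarge": 4, "c5-large": 2}))
# {'c5-2xlarge': 4, 'c5-xlarge': 2, 'c5-large': 1}
```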
The simplest way to implement Weighted Round Robin is to expand the server list according to weights and then apply standard Round Robin.
The Expansion Algorithm:
Given:

- Server A: weight 3
- Server B: weight 2
- Server C: weight 1

Create expanded list: [A, A, A, B, B, C]

Apply standard Round Robin to this list.

Over 6 requests: A receives 3, B receives 2, and C receives 1, exactly proportional to the weights.
```python
class NaiveWeightedRoundRobin:
    """
    Weighted Round Robin using list expansion.

    This approach is simple to understand but has significant
    drawbacks for large weights or many servers.
    """

    def __init__(self, servers: dict[str, int]):
        """
        Initialize with servers and their weights.

        Args:
            servers: Dictionary mapping server address to weight
                     e.g., {"10.0.1.1": 5, "10.0.1.2": 3, "10.0.1.3": 2}
        """
        # Expand server list according to weights
        self._expanded_list = []
        for server, weight in servers.items():
            self._expanded_list.extend([server] * weight)
        self._current_index = 0

    def get_next_server(self) -> str:
        """
        Select next server from expanded list.

        Time Complexity: O(1)
        Space Complexity: O(sum of all weights) - THIS IS THE PROBLEM!
        """
        server = self._expanded_list[self._current_index]
        self._current_index = (self._current_index + 1) % len(self._expanded_list)
        return server


# Example usage
servers = {
    "10.0.1.1:8080": 3,
    "10.0.1.2:8080": 2,
    "10.0.1.3:8080": 1,
}

lb = NaiveWeightedRoundRobin(servers)

# Demonstrate distribution
print("Request distribution over 12 requests:")
for i in range(12):
    server = lb.get_next_server()
    print(f"  Request {i+1:2d} → {server}")

# Output shows pattern: A, A, A, B, B, C, A, A, A, B, B, C
```

Problems with the Naive Approach:

- Memory grows with the sum of weights: weights like 100/50/25 expand into a 175-entry list, and larger weights make it worse
- Burst behavior: each server receives all of its slots consecutively, so a server with weight 100 absorbs 100 back-to-back requests
- Any weight change requires rebuilding the entire expanded list
The naive expansion approach is useful for understanding the concept but should not be used in production. The burst behavior alone causes significant problems—a server receiving 100 consecutive requests experiences a traffic spike that doesn't reflect its intended load share.
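To quantify both problems at once, here is a quick illustrative sketch using the same expansion logic as above, measuring memory footprint and the longest burst for weights of 100/50/25:

```python
from itertools import groupby

servers = {"A": 100, "B": 50, "C": 25}

# Same expansion as NaiveWeightedRoundRobin: one list entry per weight unit
expanded = [server for server, weight in servers.items() for _ in range(weight)]

longest_burst = max(len(list(group)) for _, group in groupby(expanded))
print(f"Expanded list length: {len(expanded)}")        # 175 entries held in memory
print(f"Longest burst to one server: {longest_burst}")  # 100 consecutive requests
```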
The Smooth Weighted Round Robin (SWRR) algorithm, made famous by its implementation in NGINX, solves the burst problem while maintaining O(n) space complexity (where n is the number of servers, not the sum of weights).
The Key Insight:
Instead of expanding the server list, SWRR tracks a current weight for each server that changes dynamically with each selection. The algorithm interleaves selections across servers, distributing requests smoothly while respecting weight proportions.
Algorithm Steps:
1. Each server has a static weight and a dynamic current_weight (initially 0)
2. On every selection, add each server's weight to its current_weight
3. Select the server with the highest current_weight
4. Subtract total_weight from the selected server's current_weight

This creates a smooth interleaving that distributes requests proportionally without bursts.
```python
from dataclasses import dataclass
from typing import Optional
import threading


@dataclass
class WeightedServer:
    """Server with static weight and dynamic current weight."""
    address: str
    weight: int
    current_weight: int = 0
    effective_weight: int = 0  # Used for health-aware weighting

    def __post_init__(self):
        self.effective_weight = self.weight


class SmoothWeightedRoundRobin:
    """
    Smooth Weighted Round Robin - The NGINX Algorithm.

    This implementation provides:
    - Smooth distribution without bursts
    - O(n) space complexity
    - O(n) time per selection (scanning all servers)
    - Thread-safe operation
    - Dynamic weight adjustment for health
    """

    def __init__(self, servers: dict[str, int]):
        """
        Initialize with servers and their weights.

        Args:
            servers: Dictionary mapping server address to weight
        """
        self._servers = [
            WeightedServer(address=addr, weight=w)
            for addr, w in servers.items()
        ]
        self._total_weight = sum(s.weight for s in self._servers)
        self._lock = threading.Lock()

    def get_next_server(self) -> Optional[str]:
        """
        Select next server using SWRR algorithm.

        Time Complexity: O(n) where n is number of servers
        Space Complexity: O(1) additional space

        Returns:
            Selected server address, or None if no servers available
        """
        with self._lock:
            if not self._servers:
                return None

            # Step 1: Add effective_weight to current_weight for all servers
            for server in self._servers:
                server.current_weight += server.effective_weight

            # Step 2: Select server with highest current_weight
            best_server = max(self._servers, key=lambda s: s.current_weight)

            # Step 3: Subtract total effective weight from the selected server
            total_effective = sum(s.effective_weight for s in self._servers)
            best_server.current_weight -= total_effective

            return best_server.address

    def report_failure(self, address: str) -> None:
        """
        Decrease effective weight on failure (graceful degradation).

        This allows the algorithm to automatically route less traffic
        to struggling servers without completely removing them.
        """
        with self._lock:
            for server in self._servers:
                if server.address == address:
                    # Decrease effective weight, but keep minimum of 1
                    server.effective_weight = max(1, server.effective_weight - 1)
                    break

    def report_success(self, address: str) -> None:
        """
        Increase effective weight on success (gradual recovery).

        Allows previously-degraded servers to recover their full
        traffic share as they prove reliable.
        """
        with self._lock:
            for server in self._servers:
                if server.address == address:
                    # Increase effective weight, up to original weight
                    if server.effective_weight < server.weight:
                        server.effective_weight += 1
                    break

    def get_distribution_state(self) -> list[dict]:
        """Debug method: Return current state of all servers."""
        with self._lock:
            return [
                {
                    "address": s.address,
                    "weight": s.weight,
                    "effective_weight": s.effective_weight,
                    "current_weight": s.current_weight,
                }
                for s in self._servers
            ]


# Demonstration of smooth distribution
if __name__ == "__main__":
    servers = {
        "A": 5,
        "B": 3,
        "C": 1,
    }

    lb = SmoothWeightedRoundRobin(servers)

    print("SWRR Distribution over 9 requests:")
    print("-" * 50)

    selections = []
    for i in range(9):
        server = lb.get_next_server()
        selections.append(server)
        print(f"Request {i+1}: Selected {server}")

    print("-" * 50)
    print(f"\nDistribution: {dict((s, selections.count(s)) for s in set(selections))}")
    print("Expected: A=5, B=3, C=1")

    # Output demonstrates smooth interleaving:
    # A, B, A, C, A, B, A, B, A
    # NOT the burst pattern: A, A, A, A, A, B, B, B, C
```

Visualizing the Smooth Distribution:
Let's trace through SWRR with weights A=5, B=1:
| Request | Before Selection | Selected | After Selection |
|---|---|---|---|
| 1 | A:5, B:1 | A (5>1) | A:-1, B:1 |
| 2 | A:4, B:2 | A (4>2) | A:-2, B:2 |
| 3 | A:3, B:3 | A (tie, A first) | A:-3, B:3 |
| 4 | A:2, B:4 | B (4>2) | A:2, B:-2 |
| 5 | A:7, B:-1 | A (7>-1) | A:1, B:-1 |
| 6 | A:6, B:0 | A (6>0) | A:0, B:0 |
Over 6 requests: A=5, B=1 — exactly proportional to weights!
Notice the interleaving: A, A, A, B, A, A rather than A, A, A, A, A, B
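The trace is easy to verify in code. This standalone sketch reimplements only the bare SWRR update rule described above, without locking or health handling:

```python
def swrr_trace(weights: dict[str, int], requests: int) -> list[str]:
    """Run the bare SWRR update rule and return the selection order."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    selections = []
    for _ in range(requests):
        for server in current:
            current[server] += weights[server]    # step 1: add weights
        best = max(current, key=current.get)      # step 2: pick highest (ties: first wins)
        current[best] -= total                    # step 3: subtract total weight
        selections.append(best)
    return selections

print(swrr_trace({"A": 5, "B": 1}, 6))
# ['A', 'A', 'A', 'B', 'A', 'A'] - matches the table above
print(swrr_trace({"A": 5, "B": 3, "C": 1}, 9))
# ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'B', 'A'] - matches the SWRR demo output
```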
Smooth Weighted Round Robin is used in NGINX, Envoy, and many other production load balancers. It provides optimal distribution smoothness with acceptable O(n) selection overhead. For most systems with fewer than 1000 backends, this is the recommended weighted algorithm.
HAProxy uses a different approach that pre-computes an interleaved sequence achieving smooth distribution while maintaining O(1) selection time. This trades memory for selection performance.
The Algorithm:

1. Normalize all weights by their greatest common divisor to keep the sequence short
2. Allocate a sequence whose length is the GCD-reduced sum of weights
3. Spread each server's slots as evenly as possible across that sequence

Example:
Weights: A=2, B=1, C=1 (sum=4)
Interleaved sequence: [A, B, A, C]
This spreads A's two slots apart, avoiding the burst pattern [A, A, B, C].
```python
import math
from typing import Dict, List, Optional


def gcd_of_list(numbers: List[int]) -> int:
    """Compute GCD of a list of numbers."""
    result = numbers[0]
    for num in numbers[1:]:
        result = math.gcd(result, num)
    return result


def build_interleaved_sequence(servers: Dict[str, int]) -> List[str]:
    """
    Build an interleaved sequence for smooth weighted distribution.

    Uses the "virtual server" approach: spread each server's slots
    as evenly as possible across the sequence.

    Time Complexity: O(total_weight)
    Space Complexity: O(total_weight)
    """
    if not servers:
        return []

    # Normalize weights by GCD to minimize sequence length
    weights = list(servers.values())
    common_gcd = gcd_of_list(weights)
    normalized = {
        server: weight // common_gcd
        for server, weight in servers.items()
    }

    total_weight = sum(normalized.values())
    sequence = [None] * total_weight

    # For each server, distribute its slots evenly
    for server, weight in normalized.items():
        if weight == 0:
            continue

        # Calculate ideal spacing between this server's slots
        spacing = total_weight / weight

        # Place slots at optimal positions
        for i in range(weight):
            ideal_position = int(i * spacing)

            # Find nearest empty slot (linear probe)
            pos = ideal_position
            while sequence[pos] is not None:
                pos = (pos + 1) % total_weight

            sequence[pos] = server

    return sequence


class InterleavedWeightedRoundRobin:
    """
    Weighted Round Robin with pre-computed interleaved sequence.

    Advantages:
    - O(1) selection time
    - Smooth distribution

    Disadvantages:
    - O(total_weight) memory
    - Rebuilding required on weight changes
    """

    def __init__(self, servers: Dict[str, int]):
        self._original_weights = dict(servers)
        self._sequence = build_interleaved_sequence(servers)
        self._index = 0

    def get_next_server(self) -> Optional[str]:
        """O(1) selection from pre-computed sequence."""
        if not self._sequence:
            return None
        server = self._sequence[self._index]
        self._index = (self._index + 1) % len(self._sequence)
        return server

    def rebuild_sequence(self):
        """Rebuild after weight changes. O(total_weight)."""
        self._sequence = build_interleaved_sequence(self._original_weights)
        self._index = 0


# Demonstration
if __name__ == "__main__":
    servers = {"A": 5, "B": 3, "C": 2}

    sequence = build_interleaved_sequence(servers)
    print(f"Interleaved sequence for weights A=5, B=3, C=2:")
    print(f"  {sequence}")
    print(f"  Length: {len(sequence)}")

    # Verify distribution
    from collections import Counter
    counts = Counter(sequence)
    print(f"\nDistribution: {dict(counts)}")
    print(f"Expected: A=5, B=3, C=2")

    # Show smooth distribution with load balancer
    lb = InterleavedWeightedRoundRobin(servers)
    print(f"\nFirst 20 selections:")
    selections = [lb.get_next_server() for _ in range(20)]
    print(f"  {selections}")
```

| Aspect | Smooth WRR (NGINX) | Interleaved WRR (HAProxy) |
|---|---|---|
| Selection Time | O(n) - must scan all servers | O(1) - array lookup |
| Memory Usage | O(n) - one entry per server | O(Σweights) - can be large |
| Dynamic Weight Changes | Immediate - just update weight | Requires sequence rebuild |
| Distribution Smoothness | Excellent | Excellent |
| Implementation Complexity | Simple | Moderate |
| Best For | Dynamic pools, moderate server count | Static pools, very high throughput |
If you have few servers that change frequently → use SWRR. If you have a stable pool and need maximum throughput → consider interleaved WRR. In practice, SWRR's O(n) overhead is negligible for < 1000 servers, making it the more common choice.
Let's examine how to configure weighted round robin in production load balancers.
```nginx
# NGINX Weighted Round Robin Configuration
# Uses Smooth Weighted Round Robin algorithm internally

upstream backend_pool {
    # Weight parameter specifies relative capacity
    # Higher weight = more traffic

    # High-capacity servers (new hardware)
    server 10.0.1.1:8080 weight=10;
    server 10.0.1.2:8080 weight=10;

    # Medium-capacity servers
    server 10.0.2.1:8080 weight=5;
    server 10.0.2.2:8080 weight=5;

    # Lower-capacity servers (older hardware)
    server 10.0.3.1:8080 weight=2;

    # Canary server (testing new version)
    # Low weight limits blast radius
    server 10.0.4.1:8080 weight=1;

    # Health check settings
    # max_fails: failures before marking unhealthy
    # fail_timeout: unhealthy period duration
}

upstream mixed_instance_types {
    # AWS instance type-based weighting example
    # c5.2xlarge has ~4x capacity of c5.large
    server c5-2xlarge-1.internal:8080 weight=8;
    server c5-2xlarge-2.internal:8080 weight=8;
    server c5-xlarge-1.internal:8080 weight=4;
    server c5-xlarge-2.internal:8080 weight=4;
    server c5-large-1.internal:8080 weight=2;
    server c5-large-2.internal:8080 weight=2;

    # Optional: Backup server with zero weight for normal traffic
    # Only receives traffic when all primary servers are down
    server backup.internal:8080 backup;
}

# Weight=0 is special: server doesn't receive traffic
# Useful for maintenance or graceful shutdown
upstream with_maintenance {
    server app1.internal:8080 weight=5;
    server app2.internal:8080 weight=5;
    server app3.internal:8080 weight=0;  # In maintenance
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_pool;
    }

    # Health endpoint for monitoring weight distribution
    location /upstream_status {
        # Requires NGINX Plus or ngx_http_upstream_module
        # Shows current weight and connection counts
    }
}
```
```haproxy
# HAProxy Weighted Round Robin Configuration

global
    maxconn 50000

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    # Enable detailed logging for debugging distribution
    option httplog
    log stdout format raw local0

backend app_servers
    # roundrobin with weights - HAProxy's default
    balance roundrobin

    # Weight specified after 'weight' keyword
    # Range: 0-256 (0 means server is disabled)
    server app1 10.0.1.1:8080 weight 100 check
    server app2 10.0.1.2:8080 weight 100 check
    server app3 10.0.2.1:8080 weight 50 check
    server app4 10.0.2.2:8080 weight 50 check
    server app5 10.0.3.1:8080 weight 25 check

    # Dynamic weight adjustment via runtime API
    # echo "set server app_servers/app5 weight 10" | socat stdio /var/run/haproxy.sock

    # Slow start: gradually increase weight for new/recovered servers
    # Prevents thundering herd on recovery
    server app-new 10.0.4.1:8080 weight 100 check slowstart 60s

backend canary_deployment
    # Canary pattern: new version gets small traffic percentage
    balance roundrobin

    # Stable version: 95% of traffic
    server stable-1 10.0.1.1:8080 weight 95 check

    # Canary version: 5% of traffic
    server canary-1 10.0.2.1:8080 weight 5 check

frontend http_front
    bind *:80

    # Route all traffic to weighted backend
    default_backend app_servers

    # ACL for canary (could be based on header, cookie, etc.)
    acl is_canary hdr(X-Canary) -i true
    use_backend canary_deployment if is_canary

# Runtime API for dynamic weight adjustment
# Allows changing weights without reload
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats admin if TRUE  # Enable runtime modifications
```

Both NGINX and HAProxy support runtime weight adjustment without restarts. This is essential for operations like graceful server removal (set weight to 0, wait for connections to drain, then remove) or responding to detected capacity changes. Integrate this with your monitoring and automation systems.
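As a sketch of driving that runtime API programmatically, the snippet below sends one command over HAProxy's admin socket. It assumes a stats socket configured at /var/run/haproxy.sock with admin-level access, matching the socat example in the config comments:

```python
import socket

def haproxy_runtime_command(cmd: str,
                            socket_path: str = "/var/run/haproxy.sock") -> str:
    """Send one command to the HAProxy runtime API over its UNIX admin socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(f"{cmd}\n".encode())
        # HAProxy answers the command and then closes the connection
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode()

# Graceful drain: stop routing new traffic to app5, remove it once idle
print(haproxy_runtime_command("set server app_servers/app5 weight 0"))
```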
Weighted Round Robin shines in specific scenarios:

- Heterogeneous hardware or mixed cloud instance types with known capacity ratios
- Canary deployments and gradual rollouts, where a low weight limits the blast radius
- Operational capacity management, where traffic proportions are steered deliberately

Understanding these helps you choose appropriately.
When Weighted Round Robin is Insufficient:

- Request costs vary widely, so equal request counts do not mean equal load
- Server performance fluctuates at runtime, and static weights cannot react to real-time conditions
- Long-lived or slow requests accumulate unevenly, which connection-aware algorithms handle better
Weighted Round Robin introduces operational complexity: you must maintain accurate weights as infrastructure changes. If you're not actively managing weights based on real capacity differences, the complexity may not be justified. Regular audits of weight configurations are essential.
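One lightweight audit is to compare each server's configured weight share against its observed request share. The sketch below is illustrative; the request counts would come from your load balancer's metrics:

```python
def audit_weights(weights: dict[str, int],
                  observed_requests: dict[str, int],
                  tolerance: float = 0.05) -> list[str]:
    """Flag servers whose observed traffic share drifts from the
    share implied by their configured weight by more than `tolerance`."""
    total_weight = sum(weights.values())
    total_requests = sum(observed_requests.values())
    drifted = []
    for server, weight in weights.items():
        expected = weight / total_weight
        actual = observed_requests.get(server, 0) / total_requests
        if abs(actual - expected) > tolerance:
            drifted.append(f"{server}: expected {expected:.0%}, got {actual:.0%}")
    return drifted

# Hypothetical counts pulled from load balancer metrics
print(audit_weights({"A": 5, "B": 3, "C": 2},
                    {"A": 5200, "B": 2100, "C": 2700}))
# ['B: expected 30%, got 21%', 'C: expected 20%, got 27%']
```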
We've explored Weighted Round Robin in depth—from naive implementations to production algorithms. Here are the essential takeaways:

- Weights are relative, not absolute: a 5/3/2 pool and a 50/30/20 pool behave identically
- Naive list expansion is simple but bursty and memory-hungry; avoid it in production
- Smooth Weighted Round Robin (NGINX) gives burst-free distribution at O(n) per selection
- Interleaved WRR (HAProxy) trades O(Σweights) memory for O(1) selection
- Start with simple weight assignments and refine them from production metrics
- Weighted Round Robin is still static: it ignores real-time server load
What's Next:
Weighted Round Robin distributes requests proportionally but still ignores actual server load. The next page explores Least Connections—an algorithm that considers the real-time number of active connections to each server, adapting dynamically to variable workloads.
You now have a thorough understanding of Weighted Round Robin—its algorithms, implementations, configuration patterns, and decision criteria. You can confidently implement proportional traffic distribution for heterogeneous backend pools.