In an ideal world, all servers would be identical—same CPU, same memory, same network capacity. Standard Round Robin would distribute load perfectly. But reality is messier.
Consider these real-world scenarios:

- A mixed cloud fleet, where a c5.2xlarge has 4x the capacity of a c5.large
- A gradual hardware refresh, where new machines serve alongside older, slower ones
- A canary deployment, where a new version should receive only a small share of traffic

In all these cases, equal request distribution creates unequal load distribution. The solution is Weighted Round Robin—an elegant extension that distributes requests proportionally to server capacity.
Weighted Round Robin maintains the simplicity and predictability of standard Round Robin while accommodating heterogeneous infrastructure. It's the algorithm that acknowledges not all servers are created equal.
By the end of this page, you will understand weight assignment strategies, multiple implementation algorithms with different tradeoffs, mathematical properties of weighted distribution, and precisely when weighted round robin is superior to its unweighted counterpart.
A weight is a positive integer that represents a server's relative capacity compared to other servers. Weights are not absolute—they only matter in relation to each other.
The Weight Interpretation:
Given servers with weights:

- Server A: weight 5
- Server B: weight 3
- Server C: weight 2

Total weight = 5 + 3 + 2 = 10

The distribution target becomes:

- Server A: 5/10 → 50% of requests
- Server B: 3/10 → 30% of requests
- Server C: 2/10 → 20% of requests
This is proportional distribution—each server receives traffic proportional to its declared capacity.
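To make the arithmetic concrete, here is a minimal sketch (the `traffic_targets` helper is illustrative, not from any library) that turns a weight map into fractional traffic targets:

```python
def traffic_targets(weights: dict[str, int]) -> dict[str, float]:
    """Convert relative weights into fractional traffic targets."""
    total = sum(weights.values())
    return {server: weight / total for server, weight in weights.items()}

# Weights 5, 3, 2 yield targets of 50%, 30%, and 20%
print(traffic_targets({"A": 5, "B": 3, "C": 2}))
# {'A': 0.5, 'B': 0.3, 'C': 0.2}
```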
Weight Assignment Strategies:
Determining appropriate weights requires understanding your infrastructure:
| Strategy | Description | When to Use |
|---|---|---|
| CPU-Based | Weight proportional to CPU cores or compute capacity | CPU-bound workloads (computation, rendering) |
| Memory-Based | Weight proportional to available RAM | Memory-intensive workloads (caching, large datasets) |
| Instance-Type Based | Cloud instance type determines weight (e.g., xlarge=4, large=2, medium=1) | Cloud deployments with mixed instance sizes |
| Benchmark-Based | Run load tests to measure actual throughput, set weights proportionally | Heterogeneous hardware, complex workloads |
| Manual/Operational | Explicitly set weights for traffic control (canary deployments, testing) | Gradual rollouts, A/B testing, capacity management |
Begin with simple weight assignment (e.g., based on CPU count) and refine based on production metrics. Over-engineering weights upfront rarely pays off—observe actual load distribution and adjust.
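As one concrete starting point, CPU-based weights can be derived mechanically. This sketch assumes you already know each server's vCPU count; the `weights_from_vcpus` helper and its inputs are hypothetical:

```python
import math
from functools import reduce

def weights_from_vcpus(vcpus: dict[str, int]) -> dict[str, int]:
    """Assign weights proportional to vCPU count, reduced by their GCD
    so the weights stay small (e.g., 8/4/2 vCPUs -> weights 4/2/1)."""
    common = reduce(math.gcd, vcpus.values())
    return {server: count // common for server, count in vcpus.items()}

print(weights_from_vcpus({"c5-2xlarge": 8, "c5-xlarge": 4, "c5-large": 2}))
# {'c5-2xlarge': 4, 'c5-xlarge': 2, 'c5-large': 1}
```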
The simplest way to implement Weighted Round Robin is to expand the server list according to weights and then apply standard Round Robin.
The Expansion Algorithm:
Given:

- Server A: weight 3
- Server B: weight 2
- Server C: weight 1

Create expanded list: [A, A, A, B, B, C]

Apply standard Round Robin to this list.

Over 6 requests: A receives 3, B receives 2, and C receives 1, exactly proportional to the weights.
```python
class NaiveWeightedRoundRobin:
    """
    Weighted Round Robin using list expansion.

    This approach is simple to understand but has significant
    drawbacks for large weights or many servers.
    """

    def __init__(self, servers: dict[str, int]):
        """
        Initialize with servers and their weights.

        Args:
            servers: Dictionary mapping server address to weight
                     e.g., {"10.0.1.1": 5, "10.0.1.2": 3, "10.0.1.3": 2}
        """
        # Expand server list according to weights
        self._expanded_list = []
        for server, weight in servers.items():
            self._expanded_list.extend([server] * weight)
        self._current_index = 0

    def get_next_server(self) -> str:
        """
        Select next server from expanded list.

        Time Complexity: O(1)
        Space Complexity: O(sum of all weights) - THIS IS THE PROBLEM!
        """
        server = self._expanded_list[self._current_index]
        self._current_index = (self._current_index + 1) % len(self._expanded_list)
        return server


# Example usage
servers = {
    "10.0.1.1:8080": 3,
    "10.0.1.2:8080": 2,
    "10.0.1.3:8080": 1,
}

lb = NaiveWeightedRoundRobin(servers)

# Demonstrate distribution
print("Request distribution over 12 requests:")
for i in range(12):
    server = lb.get_next_server()
    print(f"  Request {i+1:2d} → {server}")

# Output shows pattern: A, A, A, B, B, C, A, A, A, B, B, C
```

Problems with the Naive Approach:

- Memory grows with the sum of weights: weights like 100/50/25 expand into a 175-entry list, and larger weights make it worse
- Burst behavior: each server receives all of its slots consecutively, so a server with weight 100 absorbs 100 back-to-back requests
- Any weight change requires rebuilding the entire expanded list
The naive expansion approach is useful for understanding the concept but should not be used in production. The burst behavior alone causes significant problems—a server receiving 100 consecutive requests experiences a traffic spike that doesn't reflect its intended load share.
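To quantify both problems at once, here is a quick illustrative sketch using the same expansion logic as above, measuring memory footprint and the longest burst for weights of 100/50/25:

```python
from itertools import groupby

servers = {"A": 100, "B": 50, "C": 25}

# Same expansion as NaiveWeightedRoundRobin: one list entry per weight unit
expanded = [server for server, weight in servers.items() for _ in range(weight)]

longest_burst = max(len(list(group)) for _, group in groupby(expanded))
print(f"Expanded list length: {len(expanded)}")        # 175 entries held in memory
print(f"Longest burst to one server: {longest_burst}")  # 100 consecutive requests
```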
The Smooth Weighted Round Robin (SWRR) algorithm, made famous by its implementation in NGINX, solves the burst problem while maintaining O(n) space complexity (where n is the number of servers, not the sum of weights).
The Key Insight:
Instead of expanding the server list, SWRR tracks a current weight for each server that changes dynamically with each selection. The algorithm interleaves selections across servers, distributing requests smoothly while respecting weight proportions.
Algorithm Steps:
1. Each server has a static weight and a dynamic current_weight (initially 0)
2. On every selection, add each server's weight to its current_weight
3. Select the server with the highest current_weight
4. Subtract total_weight from the selected server's current_weight

This creates a smooth interleaving that distributes requests proportionally without bursts.
```python
from dataclasses import dataclass
from typing import Optional
import threading


@dataclass
class WeightedServer:
    """Server with static weight and dynamic current weight."""
    address: str
    weight: int
    current_weight: int = 0
    effective_weight: int = 0  # Used for health-aware weighting

    def __post_init__(self):
        self.effective_weight = self.weight


class SmoothWeightedRoundRobin:
    """
    Smooth Weighted Round Robin - The NGINX Algorithm.

    This implementation provides:
    - Smooth distribution without bursts
    - O(n) space complexity
    - O(n) time per selection (scanning all servers)
    - Thread-safe operation
    - Dynamic weight adjustment for health
    """

    def __init__(self, servers: dict[str, int]):
        """
        Initialize with servers and their weights.

        Args:
            servers: Dictionary mapping server address to weight
        """
        self._servers = [
            WeightedServer(address=addr, weight=w)
            for addr, w in servers.items()
        ]
        self._total_weight = sum(s.weight for s in self._servers)
        self._lock = threading.Lock()

    def get_next_server(self) -> Optional[str]:
        """
        Select next server using SWRR algorithm.

        Time Complexity: O(n) where n is number of servers
        Space Complexity: O(1) additional space

        Returns:
            Selected server address, or None if no servers available
        """
        with self._lock:
            if not self._servers:
                return None

            # Step 1: Add effective_weight to current_weight for all servers
            for server in self._servers:
                server.current_weight += server.effective_weight

            # Step 2: Select server with highest current_weight
            best_server = max(self._servers, key=lambda s: s.current_weight)

            # Step 3: Subtract total effective weight from the selected server
            total_effective = sum(s.effective_weight for s in self._servers)
            best_server.current_weight -= total_effective

            return best_server.address

    def report_failure(self, address: str) -> None:
        """
        Decrease effective weight on failure (graceful degradation).

        This allows the algorithm to automatically route less traffic
        to struggling servers without completely removing them.
        """
        with self._lock:
            for server in self._servers:
                if server.address == address:
                    # Decrease effective weight, but keep minimum of 1
                    server.effective_weight = max(1, server.effective_weight - 1)
                    break

    def report_success(self, address: str) -> None:
        """
        Increase effective weight on success (gradual recovery).

        Allows previously-degraded servers to recover their full
        traffic share as they prove reliable.
        """
        with self._lock:
            for server in self._servers:
                if server.address == address:
                    # Increase effective weight, up to original weight
                    if server.effective_weight < server.weight:
                        server.effective_weight += 1
                    break

    def get_distribution_state(self) -> list[dict]:
        """Debug method: Return current state of all servers."""
        with self._lock:
            return [
                {
                    "address": s.address,
                    "weight": s.weight,
                    "effective_weight": s.effective_weight,
                    "current_weight": s.current_weight,
                }
                for s in self._servers
            ]


# Demonstration of smooth distribution
if __name__ == "__main__":
    servers = {
        "A": 5,
        "B": 3,
        "C": 1,
    }

    lb = SmoothWeightedRoundRobin(servers)

    print("SWRR Distribution over 9 requests:")
    print("-" * 50)

    selections = []
    for i in range(9):
        server = lb.get_next_server()
        selections.append(server)
        print(f"Request {i+1}: Selected {server}")

    print("-" * 50)
    print(f"\nDistribution: {dict((s, selections.count(s)) for s in set(selections))}")
    print("Expected: A=5, B=3, C=1")

    # Output demonstrates smooth interleaving:
    # A, B, A, C, A, B, A, B, A
    # NOT the burst pattern: A, A, A, A, A, B, B, B, C
```

Visualizing the Smooth Distribution:
Let's trace through SWRR with weights A=5, B=1:
| Request | Before Selection | Selected | After Selection |
|---|---|---|---|
| 1 | A:5, B:1 | A (5>1) | A:-1, B:1 |
| 2 | A:4, B:2 | A (4>2) | A:-2, B:2 |
| 3 | A:3, B:3 | A (tie, A first) | A:-3, B:3 |
| 4 | A:2, B:4 | B (4>2) | A:2, B:-2 |
| 5 | A:7, B:-1 | A (7>-1) | A:1, B:-1 |
| 6 | A:6, B:0 | A (6>0) | A:0, B:0 |
Over 6 requests: A=5, B=1 — exactly proportional to weights!
Notice the interleaving: A, A, A, B, A, A rather than A, A, A, A, A, B
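The trace is easy to verify in code. This standalone sketch reimplements only the bare SWRR update rule described above, without locking or health handling:

```python
def swrr_trace(weights: dict[str, int], requests: int) -> list[str]:
    """Run the bare SWRR update rule and return the selection order."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    selections = []
    for _ in range(requests):
        for server in current:
            current[server] += weights[server]    # step 1: add weights
        best = max(current, key=current.get)      # step 2: pick highest (ties: first wins)
        current[best] -= total                    # step 3: subtract total weight
        selections.append(best)
    return selections

print(swrr_trace({"A": 5, "B": 1}, 6))
# ['A', 'A', 'A', 'B', 'A', 'A'] - matches the table above
print(swrr_trace({"A": 5, "B": 3, "C": 1}, 9))
# ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'B', 'A'] - matches the SWRR demo output
```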
Smooth Weighted Round Robin is used in NGINX, Envoy, and many other production load balancers. It provides optimal distribution smoothness with acceptable O(n) selection overhead. For most systems with fewer than 1000 backends, this is the recommended weighted algorithm.
HAProxy uses a different approach that pre-computes an interleaved sequence achieving smooth distribution while maintaining O(1) selection time. This trades memory for selection performance.
The Algorithm:

1. Normalize all weights by their greatest common divisor to keep the sequence short
2. Allocate a sequence whose length is the GCD-reduced sum of weights
3. Spread each server's slots as evenly as possible across that sequence

Example:
Weights: A=2, B=1, C=1 (sum=4)
Interleaved sequence: [A, B, A, C]
This spreads A's two slots apart, avoiding the burst pattern [A, A, B, C].
```python
import math
from typing import Dict, List, Optional


def gcd_of_list(numbers: List[int]) -> int:
    """Compute GCD of a list of numbers."""
    result = numbers[0]
    for num in numbers[1:]:
        result = math.gcd(result, num)
    return result


def build_interleaved_sequence(servers: Dict[str, int]) -> List[str]:
    """
    Build an interleaved sequence for smooth weighted distribution.

    Uses the "virtual server" approach: spread each server's slots
    as evenly as possible across the sequence.

    Time Complexity: O(total_weight)
    Space Complexity: O(total_weight)
    """
    if not servers:
        return []

    # Normalize weights by GCD to minimize sequence length
    weights = list(servers.values())
    common_gcd = gcd_of_list(weights)
    normalized = {
        server: weight // common_gcd
        for server, weight in servers.items()
    }

    total_weight = sum(normalized.values())
    sequence = [None] * total_weight

    # For each server, distribute its slots evenly
    for server, weight in normalized.items():
        if weight == 0:
            continue

        # Calculate ideal spacing between this server's slots
        spacing = total_weight / weight

        # Place slots at optimal positions
        for i in range(weight):
            ideal_position = int(i * spacing)

            # Find nearest empty slot (linear probe)
            pos = ideal_position
            while sequence[pos] is not None:
                pos = (pos + 1) % total_weight

            sequence[pos] = server

    return sequence


class InterleavedWeightedRoundRobin:
    """
    Weighted Round Robin with pre-computed interleaved sequence.

    Advantages:
    - O(1) selection time
    - Smooth distribution

    Disadvantages:
    - O(total_weight) memory
    - Rebuilding required on weight changes
    """

    def __init__(self, servers: Dict[str, int]):
        self._original_weights = dict(servers)
        self._sequence = build_interleaved_sequence(servers)
        self._index = 0

    def get_next_server(self) -> Optional[str]:
        """O(1) selection from pre-computed sequence."""
        if not self._sequence:
            return None
        server = self._sequence[self._index]
        self._index = (self._index + 1) % len(self._sequence)
        return server

    def rebuild_sequence(self):
        """Rebuild after weight changes. O(total_weight)."""
        self._sequence = build_interleaved_sequence(self._original_weights)
        self._index = 0


# Demonstration
if __name__ == "__main__":
    servers = {"A": 5, "B": 3, "C": 2}

    sequence = build_interleaved_sequence(servers)
    print(f"Interleaved sequence for weights A=5, B=3, C=2:")
    print(f"  {sequence}")
    print(f"  Length: {len(sequence)}")

    # Verify distribution
    from collections import Counter
    counts = Counter(sequence)
    print(f"\nDistribution: {dict(counts)}")
    print(f"Expected: A=5, B=3, C=2")

    # Show smooth distribution with load balancer
    lb = InterleavedWeightedRoundRobin(servers)
    print(f"\nFirst 20 selections:")
    selections = [lb.get_next_server() for _ in range(20)]
    print(f"  {selections}")
```

| Aspect | Smooth WRR (NGINX) | Interleaved WRR (HAProxy) |
|---|---|---|
| Selection Time | O(n) - must scan all servers | O(1) - array lookup |
| Memory Usage | O(n) - one entry per server | O(Σweights) - can be large |
| Dynamic Weight Changes | Immediate - just update weight | Requires sequence rebuild |
| Distribution Smoothness | Excellent | Excellent |
| Implementation Complexity | Simple | Moderate |
| Best For | Dynamic pools, moderate server count | Static pools, very high throughput |
If you have few servers that change frequently → use SWRR. If you have a stable pool and need maximum throughput → consider interleaved WRR. In practice, SWRR's O(n) overhead is negligible for < 1000 servers, making it the more common choice.
Let's examine how to configure weighted round robin in production load balancers.
```nginx
# NGINX Weighted Round Robin Configuration
# Uses Smooth Weighted Round Robin algorithm internally

upstream backend_pool {
    # Weight parameter specifies relative capacity
    # Higher weight = more traffic

    # High-capacity servers (new hardware)
    server 10.0.1.1:8080 weight=10;
    server 10.0.1.2:8080 weight=10;

    # Medium-capacity servers
    server 10.0.2.1:8080 weight=5;
    server 10.0.2.2:8080 weight=5;

    # Lower-capacity servers (older hardware)
    server 10.0.3.1:8080 weight=2;

    # Canary server (testing new version)
    # Low weight limits blast radius
    server 10.0.4.1:8080 weight=1;

    # Health check settings
    # max_fails: failures before marking unhealthy
    # fail_timeout: unhealthy period duration
}

upstream mixed_instance_types {
    # AWS instance type-based weighting example
    # c5.2xlarge has ~4x capacity of c5.large
    server c5-2xlarge-1.internal:8080 weight=8;
    server c5-2xlarge-2.internal:8080 weight=8;
    server c5-xlarge-1.internal:8080 weight=4;
    server c5-xlarge-2.internal:8080 weight=4;
    server c5-large-1.internal:8080 weight=2;
    server c5-large-2.internal:8080 weight=2;

    # Optional: Backup server with zero weight for normal traffic
    # Only receives traffic when all primary servers are down
    server backup.internal:8080 backup;
}

# Weight=0 is special: server doesn't receive traffic
# Useful for maintenance or graceful shutdown
upstream with_maintenance {
    server app1.internal:8080 weight=5;
    server app2.internal:8080 weight=5;
    server app3.internal:8080 weight=0;  # In maintenance
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_pool;
    }

    # Health endpoint for monitoring weight distribution
    location /upstream_status {
        # Requires NGINX Plus or ngx_http_upstream_module
        # Shows current weight and connection counts
    }
}
```
```haproxy
# HAProxy Weighted Round Robin Configuration

global
    maxconn 50000

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    # Enable detailed logging for debugging distribution
    option httplog
    log stdout format raw local0

backend app_servers
    # roundrobin with weights - HAProxy's default
    balance roundrobin

    # Weight specified after 'weight' keyword
    # Range: 0-256 (0 means server is disabled)
    server app1 10.0.1.1:8080 weight 100 check
    server app2 10.0.1.2:8080 weight 100 check
    server app3 10.0.2.1:8080 weight 50 check
    server app4 10.0.2.2:8080 weight 50 check
    server app5 10.0.3.1:8080 weight 25 check

    # Dynamic weight adjustment via runtime API
    # echo "set server app_servers/app5 weight 10" | socat stdio /var/run/haproxy.sock

    # Slow start: gradually increase weight for new/recovered servers
    # Prevents thundering herd on recovery
    server app-new 10.0.4.1:8080 weight 100 check slowstart 60s

backend canary_deployment
    # Canary pattern: new version gets small traffic percentage
    balance roundrobin

    # Stable version: 95% of traffic
    server stable-1 10.0.1.1:8080 weight 95 check

    # Canary version: 5% of traffic
    server canary-1 10.0.2.1:8080 weight 5 check

frontend http_front
    bind *:80

    # Route all traffic to weighted backend
    default_backend app_servers

    # ACL for canary (could be based on header, cookie, etc.)
    acl is_canary hdr(X-Canary) -i true
    use_backend canary_deployment if is_canary

# Runtime API for dynamic weight adjustment
# Allows changing weights without reload
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats admin if TRUE  # Enable runtime modifications
```

Both NGINX and HAProxy support runtime weight adjustment without restarts. This is essential for operations like graceful server removal (set weight to 0, wait for connections to drain, then remove) or responding to detected capacity changes. Integrate this with your monitoring and automation systems.
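As a sketch of driving that runtime API programmatically, the snippet below sends one command over HAProxy's admin socket. It assumes a stats socket configured at /var/run/haproxy.sock with admin-level access, matching the socat example in the config comments:

```python
import socket

def haproxy_runtime_command(cmd: str,
                            socket_path: str = "/var/run/haproxy.sock") -> str:
    """Send one command to the HAProxy runtime API over its UNIX admin socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(f"{cmd}\n".encode())
        # HAProxy answers the command and then closes the connection
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode()

# Graceful drain: stop routing new traffic to app5, remove it once idle
print(haproxy_runtime_command("set server app_servers/app5 weight 0"))
```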
Weighted Round Robin shines in specific scenarios:

- Heterogeneous hardware or mixed cloud instance types with known capacity ratios
- Canary deployments and gradual rollouts, where a low weight limits the blast radius
- Operational capacity management, where traffic proportions are steered deliberately

Understanding these helps you choose appropriately.
When Weighted Round Robin is Insufficient:

- Request costs vary widely, so equal request counts do not mean equal load
- Server performance fluctuates at runtime, and static weights cannot react to real-time conditions
- Long-lived or slow requests accumulate unevenly, which connection-aware algorithms handle better
Weighted Round Robin introduces operational complexity: you must maintain accurate weights as infrastructure changes. If you're not actively managing weights based on real capacity differences, the complexity may not be justified. Regular audits of weight configurations are essential.
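One lightweight audit is to compare each server's configured weight share against its observed request share. The sketch below is illustrative; the request counts would come from your load balancer's metrics:

```python
def audit_weights(weights: dict[str, int],
                  observed_requests: dict[str, int],
                  tolerance: float = 0.05) -> list[str]:
    """Flag servers whose observed traffic share drifts from the
    share implied by their configured weight by more than `tolerance`."""
    total_weight = sum(weights.values())
    total_requests = sum(observed_requests.values())
    drifted = []
    for server, weight in weights.items():
        expected = weight / total_weight
        actual = observed_requests.get(server, 0) / total_requests
        if abs(actual - expected) > tolerance:
            drifted.append(f"{server}: expected {expected:.0%}, got {actual:.0%}")
    return drifted

# Hypothetical counts pulled from load balancer metrics
print(audit_weights({"A": 5, "B": 3, "C": 2},
                    {"A": 5200, "B": 2100, "C": 2700}))
# ['B: expected 30%, got 21%', 'C: expected 20%, got 27%']
```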
We've explored Weighted Round Robin in depth—from naive implementations to production algorithms. Here are the essential takeaways:

- Weights are relative, not absolute: a 5/3/2 pool and a 50/30/20 pool behave identically
- Naive list expansion is simple but bursty and memory-hungry; avoid it in production
- Smooth Weighted Round Robin (NGINX) gives burst-free distribution at O(n) per selection
- Interleaved WRR (HAProxy) trades O(Σweights) memory for O(1) selection
- Start with simple weight assignments and refine them from production metrics
- Weighted Round Robin is still static: it ignores real-time server load
What's Next:
Weighted Round Robin distributes requests proportionally but still ignores actual server load. The next page explores Least Connections—an algorithm that considers the real-time number of active connections to each server, adapting dynamically to variable workloads.
You now have a thorough understanding of Weighted Round Robin—its algorithms, implementations, configuration patterns, and decision criteria. You can confidently implement proportional traffic distribution for heterogeneous backend pools.