All the algorithms we've examined—Round Robin, Weighted Round Robin, Least Connections—share a fundamental property: each request is routed independently of previous requests. The load balancer has no memory.
This works perfectly for stateless applications. But many applications keep state on the server: login sessions, shopping carts, cached user data.
If a user's first request creates session data on Server A, subsequent requests routed to Server B won't find that data. The session breaks.
IP Hash solves this by using the client's IP address to deterministically select a server. The same IP always routes to the same server (as long as the server pool is stable), providing session affinity without shared session storage.
By the end of this page, you will understand IP Hash mechanics and hash function selection, its strengths and critical limitations, implementation in production load balancers, and alternative approaches to session affinity.
IP Hash uses a deterministic hash function to map client IP addresses to backend servers.
The Basic Algorithm:
1. Compute `hash(client_ip)`
2. Compute `server_index = hash(client_ip) % number_of_servers`
3. Route the request to `servers[server_index]`

The Determinism Property:
Given the same inputs (IP address, server list), the same output (server selection) always results. This is what provides session affinity—a client with IP 192.168.1.50 will always reach the same server.
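To make the arithmetic concrete (the hash value here is illustrative, not from any particular function): if `hash("192.168.1.50")` = 2,847,195,481 and there are 3 servers, then 2,847,195,481 % 3 = 1, so the client is routed to `servers[1]`—and will be again on every subsequent request.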
```python
import hashlib
import socket
import struct
import threading
from typing import List


class IPHashLoadBalancer:
    """
    IP Hash load balancer implementation.

    Routes requests based on client IP address, ensuring the
    same client always reaches the same server.
    """

    def __init__(self, servers: List[str]):
        """
        Initialize with list of backend server addresses.

        Args:
            servers: Ordered list of server addresses
        """
        if not servers:
            raise ValueError("At least one server required")
        self._servers = list(servers)
        self._lock = threading.Lock()

    def _ip_to_int(self, ip: str) -> int:
        """
        Convert IP address string to integer for hashing.
        Handles both IPv4 and IPv6 addresses.
        """
        try:
            # Try IPv4 first
            packed = socket.inet_aton(ip)
            return struct.unpack("!I", packed)[0]
        except socket.error:
            try:
                # Try IPv6
                packed = socket.inet_pton(socket.AF_INET6, ip)
                # Use first 64 bits for hashing
                return struct.unpack("!Q", packed[:8])[0]
            except socket.error:
                # Fallback: derive a deterministic integer from the string.
                # (Python's built-in hash() is salted per process and would
                # break determinism across restarts.)
                digest = hashlib.md5(ip.encode()).digest()
                return struct.unpack("!Q", digest[:8])[0]

    def _compute_hash(self, ip: str) -> int:
        """
        Compute hash of IP address.

        Uses MD5 for consistent, well-distributed hashing.
        (Not for cryptographic purposes - just distribution)
        """
        ip_int = self._ip_to_int(ip)
        # Use MD5 for good distribution
        h = hashlib.md5(str(ip_int).encode())
        return int(h.hexdigest(), 16)

    def get_server(self, client_ip: str) -> str:
        """
        Select server for the given client IP.

        Deterministic: same IP always returns same server
        (as long as server list is unchanged).

        Time Complexity: O(1)
        """
        with self._lock:
            if not self._servers:
                raise RuntimeError("No servers available")
            hash_value = self._compute_hash(client_ip)
            index = hash_value % len(self._servers)
            return self._servers[index]

    def add_server(self, server: str) -> None:
        """
        Add a server to the pool.

        WARNING: This changes the hash distribution!
        Many clients will be remapped to different servers.
        """
        with self._lock:
            if server not in self._servers:
                self._servers.append(server)

    def remove_server(self, server: str) -> None:
        """
        Remove a server from the pool.

        WARNING: This changes the hash distribution!
        Clients of the removed server AND some other clients
        will be remapped.
        """
        with self._lock:
            if server in self._servers:
                self._servers.remove(server)


# Demonstration
if __name__ == "__main__":
    servers = ["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"]
    lb = IPHashLoadBalancer(servers)

    # Simulate clients
    clients = [
        "192.168.1.50",
        "192.168.1.51",
        "192.168.1.52",
        "10.0.0.1",
        "172.16.0.100",
    ]

    print("IP Hash Distribution:")
    print("-" * 50)
    for client_ip in clients:
        server = lb.get_server(client_ip)
        print(f"Client {client_ip:15} → {server}")

    print("\nSame client, multiple requests (should be same server):")
    for i in range(3):
        server = lb.get_server("192.168.1.50")
        print(f"  Request {i+1}: 192.168.1.50 → {server}")

    print("\nAfter removing a server:")
    lb.remove_server("10.0.1.2:8080")
    for client_ip in clients:
        server = lb.get_server(client_ip)
        print(f"Client {client_ip:15} → {server}")
```

Use a well-distributed hash function like MD5, SHA-1, or xxHash. The built-in `hash()` function in many languages is not suitable—it may differ between runs or machines, breaking the determinism property. Cryptographic strength isn't needed; distribution quality is what matters.
IP Hash has a significant problem: server pool changes cause massive remapping.
Example: Adding a Server
Original pool: [A, B, C] (3 servers)
After adding server D: [A, B, C, D] (4 servers)
Result: roughly 75% of clients get remapped to different servers—their sessions break!
The Math:
When changing from n to n+1 servers:
Fraction of clients remapped ≈ n / (n + 1)

This gets worse as you scale. Adding one server to a 100-server pool remaps roughly 99% of clients!
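You can check this empirically. Below is a standalone sketch (using MD5-modulo selection, mirroring the implementation above) that measures how many simulated clients change servers when one server is added:

```python
import hashlib

def pick(ip: str, n_servers: int) -> int:
    """Modulo-hash server index for a client IP."""
    digest = hashlib.md5(ip.encode()).hexdigest()
    return int(digest, 16) % n_servers

# 10,000 synthetic client IPs
clients = [f"10.0.{i >> 8 & 255}.{i & 255}" for i in range(10_000)]

for n in (3, 100):
    moved = sum(pick(ip, n) != pick(ip, n + 1) for ip in clients)
    print(f"{n} -> {n + 1} servers: {moved / len(clients):.0%} remapped")
# Prints roughly 75% for 3 -> 4 and 99% for 100 -> 101
```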
Simple IP Hash is only suitable for stable server pools. If you're frequently adding/removing servers (autoscaling, deployments, failures), the session disruption is unacceptable. Use Consistent Hashing instead (covered in the next page).
Choosing the right IP address to hash is more nuanced than it appears.
| Source | Description | Considerations |
|---|---|---|
| Direct Connection IP | IP of the TCP connection | Only correct if no proxies/CDN between client and LB |
| X-Forwarded-For header | Client IP as reported by upstream proxies | Can be spoofed! Must trust upstream proxy |
| X-Real-IP header | Alternative client IP header | Same spoofing concerns as X-Forwarded-For |
| CF-Connecting-IP | Cloudflare's client IP header | Only valid behind Cloudflare |
| True-Client-IP | Akamai's client IP header | Only valid behind Akamai |
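When forwarded headers are in play, the safe pattern is to walk X-Forwarded-For from right to left, skip every proxy you trust, and hash the first untrusted address. Here's a minimal sketch of that logic (the trusted CIDR ranges are placeholders for your own infrastructure):

```python
import ipaddress
from typing import Optional

# Placeholder trusted-proxy ranges - substitute your own infrastructure
TRUSTED_PROXIES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]

def _is_trusted(ip: str) -> bool:
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # Malformed entries are never trusted
    return any(addr in net for net in TRUSTED_PROXIES)

def client_ip_for_hashing(peer_ip: str, xff: Optional[str]) -> str:
    """
    Pick the IP to feed into the hash. Walks X-Forwarded-For right
    to left past trusted proxies; never believes a header sent by
    an untrusted peer.
    """
    if not _is_trusted(peer_ip) or not xff:
        return peer_ip  # Direct connection, or untrusted peer's header
    for hop in reversed([h.strip() for h in xff.split(",")]):
        if not _is_trusted(hop):
            return hop  # First untrusted hop = real client, as best we know
    return peer_ip  # Entire chain is trusted proxies; fall back
```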
NAT and Shared IP Problems:

Many clients share IP addresses:

- Corporate networks: an entire office exits through one NAT gateway
- Mobile carriers: carrier-grade NAT can put thousands of subscribers behind a single address
- Universities and public Wi-Fi: large user populations share egress IPs

Impact: all users behind a shared address hash to the same backend server, creating hot spots and skewing the load distribution.

Mitigations: hash on the subnet instead (see the sketch below), combine the IP with another request attribute, or prefer cookie-based affinity, which identifies users individually.
```python
import hashlib
import socket
import struct
from typing import List


class SubnetHashLoadBalancer:
    """
    IP Hash using subnet portion of address.

    Hashes the first 24 bits of IPv4 addresses (the /24 subnet),
    which can help with some NAT scenarios while maintaining
    geographic locality.
    """

    def __init__(self, servers: List[str], subnet_bits: int = 24):
        """
        Args:
            servers: Backend server list
            subnet_bits: Number of bits to use for hashing (1-32 for IPv4)
        """
        self._servers = list(servers)
        self._subnet_bits = subnet_bits
        self._mask = (0xFFFFFFFF << (32 - subnet_bits)) & 0xFFFFFFFF

    def _get_subnet(self, ip: str) -> int:
        """Extract subnet portion of IP address."""
        try:
            packed = socket.inet_aton(ip)
            ip_int = struct.unpack("!I", packed)[0]
            return ip_int & self._mask
        except socket.error:
            # Deterministic fallback for non-IPv4 input.
            # (Python's built-in hash() is salted per process.)
            digest = hashlib.md5(ip.encode()).digest()
            return struct.unpack("!I", digest[:4])[0]

    def get_server(self, client_ip: str) -> str:
        """Select server based on subnet hash."""
        subnet = self._get_subnet(client_ip)
        h = hashlib.md5(str(subnet).encode())
        hash_value = int(h.hexdigest(), 16)
        index = hash_value % len(self._servers)
        return self._servers[index]


# Demonstration
if __name__ == "__main__":
    servers = ["Server-A", "Server-B", "Server-C"]
    lb = SubnetHashLoadBalancer(servers, subnet_bits=24)

    # Clients in same /24 subnet
    clients = [
        "192.168.1.1",
        "192.168.1.50",
        "192.168.1.200",
        "192.168.2.1",   # Different subnet
        "192.168.2.50",
    ]

    print("Subnet-based IP Hash (/24):")
    print("-" * 50)
    for client_ip in clients:
        server = lb.get_server(client_ip)
        subnet = client_ip.rsplit('.', 1)[0] + ".0/24"
        print(f"Client {client_ip:15} (subnet {subnet}) → {server}")
```

IPv6 addresses are 128 bits. The first 64 bits typically identify the network, while the last 64 identify the host. For IP hash, using the first 64 bits (or even 48) is usually appropriate. Full address hashing may be unnecessarily granular.
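To illustrate that note, here's a small helper (a sketch—adjust `prefix_bits` to taste) that extracts the leading bits of an IPv6 address for use as a hash key:

```python
import socket
import struct

def ipv6_hash_key(ip: str, prefix_bits: int = 64) -> int:
    """
    Return the leading prefix_bits (up to 64) of an IPv6 address
    as an integer, suitable as a key for subnet-style affinity.
    """
    if not 1 <= prefix_bits <= 64:
        raise ValueError("prefix_bits must be between 1 and 64")
    packed = socket.inet_pton(socket.AF_INET6, ip)
    high = struct.unpack("!Q", packed[:8])[0]  # first 64 bits
    return high >> (64 - prefix_bits) if prefix_bits < 64 else high

# Two hosts in the same /64 network share a key; /48 groups whole sites
print(ipv6_hash_key("2001:db8:85a3::1") == ipv6_hash_key("2001:db8:85a3::2"))  # True
```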
Let's examine IP Hash configuration in production load balancers.
```nginx
# NGINX IP Hash Configuration

upstream backend_ip_hash {
    # Enable IP hash
    ip_hash;

    # Server list - order matters for consistency!
    # Servers should always be listed in the same order
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;

    # Graceful removal: mark as down, not removed
    # This minimizes remapping - only clients of this server affected
    # server 10.0.1.4:8080 down;

    # Note: the "backup" parameter is NOT allowed with ip_hash;
    # a hot-spare must be handled at another layer:
    # server 10.0.1.5:8080 backup;  # INVALID with ip_hash
}

upstream ip_hash_with_weights {
    ip_hash;

    # Weights work differently with ip_hash:
    # Higher weight = more hash slots, more clients routed here
    server 10.0.1.1:8080 weight=3;
    server 10.0.1.2:8080 weight=2;
    server 10.0.1.3:8080 weight=1;
    # Approximate distribution: 3:2:1
}

# For X-Forwarded-For handling
upstream ip_hash_xff {
    ip_hash;
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}

server {
    listen 80;

    # Trust proxy headers from internal sources only
    set_real_ip_from 10.0.0.0/8;
    set_real_ip_from 172.16.0.0/12;
    real_ip_header X-Forwarded-For;
    real_ip_recursive on;  # Skip trusted proxies; use rightmost untrusted IP

    location / {
        proxy_pass http://backend_ip_hash;

        # Forward client IP for logging/app use
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
```haproxy
# HAProxy IP Hash (Source) Configuration

global
    maxconn 50000

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 60s

backend app_source_hash
    # Source IP hash balancing
    balance source

    # Hash options
    hash-type consistent    # Use consistent hashing (recommended!)
    # hash-type map-based   # Simple modulo (not recommended)

    server app1 10.0.1.1:8080 check
    server app2 10.0.1.2:8080 check
    server app3 10.0.1.3:8080 check

backend app_source_with_xff
    balance source
    hash-type consistent

    # Use X-Forwarded-For for client IP
    # Only if you trust the upstream proxy!
    http-request set-header X-Client-IP %[req.hdr(X-Forwarded-For),word(1,',')] if { req.hdr(X-Forwarded-For) -m found }

    server app1 10.0.1.1:8080 check
    server app2 10.0.1.2:8080 check
    server app3 10.0.1.3:8080 check

backend app_header_hash
    # Hash on arbitrary header (alternative to IP)
    balance hdr(X-User-ID)
    hash-type consistent

    server app1 10.0.1.1:8080 check
    server app2 10.0.1.2:8080 check

backend app_url_hash
    # Hash on URL parameter
    balance url_param userid
    hash-type consistent

    server app1 10.0.1.1:8080 check
    server app2 10.0.1.2:8080 check

frontend http_front
    bind *:80
    # Source IP is determined before routing
    default_backend app_source_hash

listen stats
    bind *:8404
    stats enable
    stats uri /stats
```

Note the `hash-type consistent` directive in HAProxy. It uses consistent hashing instead of simple modulo, dramatically reducing remapping when servers change. Always use consistent hashing for production IP hash deployments!
IP Hash isn't the only way to achieve session affinity. Each approach has different tradeoffs.
| Method | Mechanism | Pros | Cons |
|---|---|---|---|
| IP Hash | Hash client IP to select server | Simple, no client cooperation needed | NAT problems, remap on server changes |
| Cookie-based | Set cookie with server ID, route by cookie | Per-user granularity, survives IP changes | Requires cookie support; LB must inspect HTTP |
| URL Parameter | Encode server ID in URL | Works without cookies | Ugly URLs, requires app cooperation |
| Header-based | Custom header identifies user/session | Flexible, app-controlled | Requires app to set header |
| Consistent Hash | Hash with virtual nodes for smooth remapping | Minimal disruption on changes | More complex to implement |
| Centralized Session Store | Store sessions in Redis/DB, any server works | True stateless backends | Additional infrastructure, latency |
Cookie-Based Affinity in Detail:
Cookie-based affinity is often superior to IP hash. Here's how it works:
```nginx
# NGINX Cookie-Based Session Affinity

# Method 1: Sticky cookie (NGINX Plus only)
upstream backend_sticky {
    sticky cookie srv_id expires=1h domain=.example.com path=/;
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
    # NGINX Plus sets a cookie (srv_id) on first request
    # Subsequent requests route based on that cookie
}

# Method 2: Route based on existing session cookie
upstream backend_route {
    # Use hash of session ID cookie
    hash $cookie_JSESSIONID consistent;
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}

# Method 3: App sets server hint in cookie
map $cookie_server_hint $backend_server {
    default backend_round_robin;
    "app1"  backend_app1;
    "app2"  backend_app2;
    "app3"  backend_app3;
}

server {
    listen 80;
    location / {
        proxy_pass http://$backend_server;
    }
}
```

The cleanest solution is often to externalize session state to Redis, Memcached, or a database. With shared session storage, any server can handle any request—session affinity becomes unnecessary. This enables true horizontal scaling without load balancer complexity.
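To make the externalized-state option concrete, here's a minimal sketch using the redis-py client (the host name and key format are illustrative):

```python
import json
import uuid
from typing import Optional

import redis  # pip install redis

# Illustrative host - point at your own session store
r = redis.Redis(host="sessions.internal", port=6379, decode_responses=True)
SESSION_TTL = 3600  # seconds

def create_session(user_id: str) -> str:
    """Any backend can create the session; state lives in Redis."""
    session_id = str(uuid.uuid4())
    r.setex(f"session:{session_id}", SESSION_TTL, json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id: str) -> Optional[dict]:
    """Any backend can load it - no affinity needed at the load balancer."""
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```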
For most modern web applications: 1) Externalize session state to Redis/database, eliminating the need for affinity, OR 2) Use cookie-based affinity which is more reliable than IP hash. Reserve IP hash for Layer 4 LB scenarios or non-HTTP protocols.
What's Next:
IP Hash's remap problem is severe. The next page explores Consistent Hashing—an elegant algorithm that minimizes remapping to only K/n clients (where K is total clients and n is servers) when the server pool changes. This makes hash-based load balancing practical for dynamic environments.
You now understand IP Hash load balancing—its mechanics, critical limitations, and alternatives. You can make informed decisions about when IP Hash is appropriate versus when cookie-based affinity or consistent hashing would serve better.