Session Persistence (Sticky Sessions)

Level: Intermediate

Duration: 55 mins

Topic: Session Persistence (Sticky Sessions)

IP-Based Session Persistence

When Addresses Tell the Story

Before cookies dominated the web, before sophisticated HTTP parsing became commonplace, there was a simpler approach to session persistence: use the client's IP address as the identifier.

The logic seems elegant: every TCP connection has a source IP address. If we consistently route all requests from the same IP to the same backend server, we achieve session affinity without any application-layer involvement. No cookies to manage, no HTTP headers to parse, no client cooperation required.

This simplicity made IP-based persistence the go-to solution for early load balancers and Layer 4 routing. It remains widely available today (often called 'source IP affinity,' 'IP hash,' or 'client IP persistence').

But here's the uncomfortable truth: IP-based persistence is fundamentally broken in the modern internet.

Not 'sometimes problematic.' Not 'edge-case limited.' Fundamentally broken for most real-world scenarios. This page explains why—while also identifying the narrow situations where IP-based persistence still makes sense.

What You Will Learn

By the end of this page, you will understand the mechanics of IP-based persistence, why it fails in modern network environments (NAT, CGN, mobile networks, proxies), when it remains appropriate, and how to configure it across major load balancers. Most importantly, you'll know when NOT to use it.

How IP-Based Persistence Works

IP-based persistence makes routing decisions from the source IP address of incoming connections, a network-layer field that a Layer 4 load balancer can read without inspecting the payload. Here's the fundamental algorithm:

Basic Hash Algorithm:

server_index = hash(source_ip_address) % number_of_servers

The load balancer:

Extracts source IP from incoming packet
Applies a hash function to the IP
Maps the hash result to a server index
Routes all future requests from that IP to the same server
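The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular load balancer's implementation: the function name `pick_server`, the choice of SHA-256, and the server addresses are assumptions made for the example.

```python
import hashlib
import ipaddress


def pick_server(source_ip: str, servers: list[str]) -> str:
    """Map a source IP to a backend with a stateless modulo hash."""
    # Parse the address so equivalent spellings (e.g. compressed IPv6) hash identically.
    packed = ipaddress.ip_address(source_ip).packed
    # Use a stable hash: Python's built-in hash() is salted per process for bytes/str.
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical backend pool for illustration.
servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
print(pick_server("203.0.113.50", servers))
```

Because nothing is stored, the same IP yields the same server on every call as long as the `servers` list is unchanged.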

Two Implementation Approaches:

Stateless Hash

•Hash computed on every request
•No state stored in load balancer
•Deterministic: same IP → same server
•Pool changes remap most clients (unless consistent hashing is used)
•Very fast and scalable
•Common: ip_hash in NGINX

Stateful Mapping

•First request creates mapping entry
•Mapping stored in LB memory/table
•Subsequent lookups use table
•Pool changes can preserve sessions
•Memory overhead grows with clients
•Common: Persistence tables in F5 BIG-IP
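For contrast with the stateless hash, here is a rough sketch of the stateful approach. The class name `StickyTable` and the round-robin choice for new clients are assumptions for illustration; real load balancers also expire idle entries, which is omitted here.

```python
import itertools


class StickyTable:
    """Stateful source-IP affinity: remember the first server chosen per IP."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = itertools.cycle(self.servers)  # round-robin for new clients
        self.table = {}  # source IP -> server

    def route(self, source_ip):
        server = self.table.get(source_ip)
        # Create (or replace) the mapping if the IP is new or its server is gone.
        if server is None or server not in self.servers:
            server = next(self._next)
            self.table[source_ip] = server
        return server

    def add_server(self, server):
        self.servers.append(server)
        self._next = itertools.cycle(self.servers)

    def remove_server(self, server):
        self.servers.remove(server)
        self._next = itertools.cycle(self.servers)
```

Adding a server leaves existing entries untouched, which is why pool changes can preserve sessions. The cost is the `table` dictionary, which grows with the number of distinct client IPs.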

Consistent Hashing Variant:

Modern implementations often use consistent hashing to minimize disruption when servers are added or removed:

Traditional (modulo) hash: Adding a server to a pool of n changes ~n/(n+1) of mappings, which approaches 100% as n grows
Consistent hash: Adding a server changes only ~1/(n+1) of mappings (removing one changes ~1/n)

With 10 servers, removing one server with traditional hashing shuffles approximately 90% of clients. With consistent hashing, it's approximately 10%—a dramatic improvement for operational stability.
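The difference can be checked empirically. The sketch below removes one of ten servers and measures how many synthetic client IPs change backend under each scheme. The ring implementation, the 100 virtual nodes per server, the MD5-based hash, and all addresses are assumptions chosen for illustration, not a specific product's algorithm.

```python
import bisect
import hashlib
import ipaddress


def stable_hash(value: str) -> int:
    """Process-independent 64-bit hash (MD5 used for spreading, not security)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_pick(client_ip, servers):
    return servers[stable_hash(client_ip) % len(servers)]


class HashRing:
    """Consistent-hash ring with virtual nodes (the vnode count is arbitrary)."""

    def __init__(self, servers, vnodes=100):
        points = sorted((stable_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._owners = [s for _, s in points]

    def pick(self, client_ip):
        # First ring point clockwise from the client's hash, wrapping at the end.
        i = bisect.bisect_right(self._hashes, stable_hash(client_ip)) % len(self._hashes)
        return self._owners[i]


def moved_fraction(pick_before, pick_after, clients):
    return sum(pick_before(c) != pick_after(c) for c in clients) / len(clients)


servers = [f"10.0.0.{i}" for i in range(1, 11)]  # ten backends
survivors = servers[:-1]                          # one backend removed
clients = [str(ipaddress.ip_address("10.1.0.0") + i) for i in range(10_000)]

modulo_moved = moved_fraction(
    lambda c: modulo_pick(c, servers), lambda c: modulo_pick(c, survivors), clients
)
ring_moved = moved_fraction(HashRing(servers).pick, HashRing(survivors).pick, clients)
print(f"modulo: {modulo_moved:.0%} remapped, ring: {ring_moved:.0%} remapped")
```

On the ring, only clients owned by the removed server move, so the remapped share stays near that server's slice of the ring; with modulo hashing almost every client gets a new index.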

Layer 4 Advantage

IP-based persistence works at Layer 4 (transport layer), meaning it can persist any TCP or UDP traffic—not just HTTP. This makes it valuable for non-HTTP protocols like database connections, SMTP, or custom TCP services where cookie-based persistence isn't possible.

The NAT Problem: Why IP Persistence Fails

The fundamental assumption of IP-based persistence is that each user has a unique, stable IP address. This assumption is catastrophically wrong in the modern internet.

Network Address Translation (NAT):

NAT allows multiple devices to share a single public IP address. Originally a workaround for IPv4 address exhaustion, NAT is now ubiquitous:

Home networks: Your household's 5-10 devices share one public IP
Corporate networks: Thousands of employees share a handful of IPs
Universities: Entire campuses behind NAT gateways
Public WiFi: Coffee shop, airport, hotel—all users share IPs

The Scale of the Problem:

Users Behind Single IP Address
Environment	Typical Users per IP	Impact on IP Persistence
Residential (Home NAT)	5-15 devices	All household devices land on one server; impact is negligible
Small Office	20-100 employees	All staff routed to single server
Corporate Enterprise	1,000-10,000 employees	Massive load imbalance
University Campus	10,000-50,000 students	Extreme load concentration on a few servers
ISP CGN (Carrier-Grade NAT)	1,000-100,000 subscribers	Catastrophic load concentration
Mobile Carrier	Varies wildly	IPs change mid-session

Carrier-Grade NAT (CGN):

IPv4 exhaustion has driven ISPs to deploy Carrier-Grade NAT (CGN), also called Large Scale NAT (LSN). Under CGN:

Entire neighborhoods share a single public IP
Thousands of subscribers appear as one client
Load balancing becomes meaningless for that traffic: one server receives every request from those subscribers

Major ISPs worldwide deploy CGN, and in some regions with scarce IPv4 space it is close to universal. Your assumption of 'one IP = one user' might be off by a factor of 10,000.

[Diagram omitted: visualization of the NAT problem]

Real-World Disasters

Production incidents caused by IP persistence combined with NAT are common. A single corporate client sends 10,000 users through one IP, so one server handles all of them while the others sit idle. Auto-scaling can't help, because new servers receive none of that traffic: the distribution itself is broken. Server 1 crashes under load while Servers 2-10 run at 5% utilization.
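A small simulation makes the skew concrete. The traffic mix below (10,000 users behind one NAT address plus 1,000 users with their own address), the ten-server pool, and every IP address are hypothetical values picked for illustration.

```python
import hashlib
from collections import Counter


def pick_index(source_ip: str, pool_size: int) -> int:
    """Stateless source-IP hash, as a stand-in for the load balancer's algorithm."""
    digest = hashlib.sha256(source_ip.encode()).digest()
    return int.from_bytes(digest[:8], "big") % pool_size


def requests_per_server(source_ips, pool_size):
    counts = Counter(pick_index(ip, pool_size) for ip in source_ips)
    return [counts.get(i, 0) for i in range(pool_size)]


# One request per user: 10,000 users share a single corporate NAT address...
corporate = ["192.0.2.1"] * 10_000
# ...and 1,000 users each have a distinct address (private range used as a stand-in).
individuals = [f"10.20.{i // 250}.{i % 250 + 1}" for i in range(1_000)]

load = requests_per_server(corporate + individuals, pool_size=10)
print(load)
```

Whatever server the NAT address hashes to receives at least 10,000 of the 11,000 requests, and adding servers does not change that, because a single IP can only ever map to a single backend.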

Mobile Networks and IP Instability

NAT causes load imbalance, but mobile networks create a different problem: IP addresses that change mid-session.

Why Mobile IPs Change:

Cell Tower Handoffs: Moving between towers can trigger IP changes
WiFi-to-Cellular Transitions: Switching networks changes IP completely
Carrier NAT Rotation: Some carriers rotate CGN IPs periodically
Network Timeouts: Idle connections may get new IPs upon resumption
IPv6 Privacy Extensions: IPv6 clients often use temporary addresses that rotate periodically

[Diagram omitted: the session destruction pattern]

The Scale of Mobile Traffic:

As of 2024, mobile devices account for approximately 60% of global web traffic. In many regions, mobile-first users dominate. An architecture that breaks for mobile users is broken for the majority of your users.

WiFi Offload Complexity:

Modern smartphones aggressively offload traffic to WiFi when available:

User starts on WiFi at home (IP: home router)
User leaves, switches to cellular (IP: carrier NAT)
User arrives at work, switches to corporate WiFi (IP: corporate NAT)
User goes to lunch at café, switches to café WiFi (IP: café router)

That's potentially four different IP addresses in a single morning, each potentially routing to a different server. Any session state stored on Server 1 is lost when the user hits Server 2.
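As a back-of-the-envelope check, assume the hash spreads addresses uniformly and that each new IP is unrelated to the previous one. The session then survives an IP change only if the new address happens to hash to the same server, which has probability 1/n for n servers. The function name and the ten-server pool are assumptions for illustration.

```python
def session_survival_probability(pool_size: int, ip_changes: int) -> float:
    """Chance that every new IP hashes to the original server.

    Assumes a uniform hash and independent addresses, so each change
    keeps the session with probability 1 / pool_size.
    """
    return (1 / pool_size) ** ip_changes


# Four IP addresses in a day means three changes; ten servers is a hypothetical pool.
print(f"{session_survival_probability(10, 3):.3%}")
```

With ten servers and three network changes, the session survives roughly one time in a thousand under these assumptions.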

Mobile IP Persistence Failures

•Session loss during commutes — User adds to cart at home, cart is empty upon arriving at work
•Authentication drops — User logs in on WiFi, appears logged out when switching to cellular
•Form submission failures — Multi-step forms break when IP changes between steps
•Real-time feature disruption — Live updates/chat break on network transition
•Inconsistent user experience — Same user sees different server states throughout day

The UX Impact

Mobile users don't understand (or care) that their IP changed. They just know your app 'lost their stuff.' These invisible failures destroy user trust. In a world of app store ratings and social media complaints, invisible infrastructure failures become very visible business problems.

Proxy and CDN Complications

Beyond NAT and mobile networks, various proxy technologies further complicate IP-based persistence:

Forward Proxies:

Organizations route traffic through forward proxies for security, filtering, or caching:

Corporate proxy servers aggregate many users behind few IPs
School content filters do the same
Privacy-conscious users employ proxy services

Reverse Proxies and CDNs:

If you're behind a CDN (Cloudflare, Akamai, CloudFront), the 'source IP' your origin load balancer sees is the CDN edge server—not the actual client:

Client (IP: 203.0.113.50) → CDN Edge (IP: 198.51.100.1) → Your Load Balancer

Your load balancer sees 198.51.100.1 for millions of different users. All CDN traffic from that edge server gets routed to the same backend.

VPN Services:

VPN usage has exploded for privacy, remote work, and geo-circumvention:

All users of a popular VPN exit node share that IP
Users may switch between VPN servers
Corporate VPNs concentrate enterprise traffic

Proxy Types and Their Impact
Proxy Type	How It Affects Source IP	Impact on IP Persistence
Corporate Forward Proxy	Thousands of employees → few IPs	Severe load imbalance
CDN Edge (Cloudflare, etc.)	Millions of users → hundreds of edge IPs	Near-complete persistence failure
Consumer VPN (NordVPN, etc.)	Thousands of subscribers → shared exit IPs	Random session loss, load imbalance
Tor Network	Millions of users → few exit nodes	Complete persistence failure
Corporate VPN (Split Tunnel)	Remote workers → corporate IP	Partial, unpredictable issues
ISP Transparent Proxy	Subscribers → proxy IP	Invisible but significant

The X-Forwarded-For 'Solution' (and Why It's Not):

Some suggest using the X-Forwarded-For header instead of the TCP source IP. This header (when properly set by proxies) contains the original client IP:

X-Forwarded-For: client_ip, proxy1_ip, proxy2_ip

Problems with X-Forwarded-For:

Trust Issues: Anyone can forge this header. You can only trust it from known, trusted proxies.
Layer 7 Only: Requires HTTP parsing; can't use for non-HTTP protocols.
Still Subject to NAT: The 'original' client IP is still behind NAT.
Configuration Complexity: Different proxies handle XFF differently.
IPv6 Complications: Mixed IPv4/IPv6 environments create inconsistencies.
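The trust issue is usually handled by reading the header from right to left and stopping at the first address that is not one of your own proxies. The sketch below shows that idea; the function name, its parameters, and the example networks are assumptions, and a production setup would normally rely on the load balancer's built-in real-IP handling instead.

```python
import ipaddress


def client_ip_from_xff(peer_ip: str, xff_header: str, trusted_networks: list[str]) -> str:
    """Return the right-most untrusted address, or the TCP peer if XFF can't be trusted."""
    trusted = [ipaddress.ip_network(n) for n in trusted_networks]

    def is_trusted(ip) -> bool:
        return any(ip in net for net in trusted)

    # Only honour the header when the direct peer is a known proxy.
    if not xff_header or not is_trusted(ipaddress.ip_address(peer_ip)):
        return peer_ip

    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            ip = ipaddress.ip_address(hop)
        except ValueError:
            return peer_ip  # malformed entry: fall back to the TCP peer
        if not is_trusted(ip):
            return str(ip)  # first address not appended by our own proxies
    return peer_ip


# Hypothetical CDN edge range; a forged left-most entry is ignored.
print(client_ip_from_xff("198.51.100.1", "10.9.9.9, 203.0.113.50", ["198.51.100.0/24"]))
```

Entries to the left of the returned address were supplied by the client side and can be forged, so they are never consulted. Note that the address recovered this way may still be a shared NAT address.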

When X-Forwarded-For Helps

X-Forwarded-For is valuable for logging, rate limiting (at the right layer), and geographic routing. But it doesn't fundamentally solve IP persistence problems—it just moves them from 'CDN IP' to 'client IP,' which is still usually behind NAT.

When IP Persistence Actually Makes Sense

After all these warnings, are there scenarios where IP-based persistence works? Yes—narrow but legitimate use cases exist:

Scenario 1: Internal Services / Service-to-Service Communication

Within a datacenter or cloud VPC, service-to-service calls have stable, unique IPs:

Microservice A (IP: 10.0.1.50) calls Microservice B cluster
IP persistence routes A's traffic consistently
No NAT, no mobile networks, no proxies

Scenario 2: Known, Controlled Client Population

When you control clients and know their network topology:

Internal enterprise applications with dedicated client IPs
Embedded devices with static configurations
B2B interfaces with whitelisted partner IPs

Scenario 3: Non-HTTP Protocols

For protocols without cookie/header support, IP persistence may be the only option:

Database connection pooling
Raw TCP streaming services
SMTP relay servers
Gaming servers (UDP-based)

Scenario 4: Short-Duration Affinity

When persistence only needs to last seconds, not sessions:

Multi-packet transactions (file uploads over UDP)
Connection establishment phases
Burst-mode operations

IP Persistence Viability Checklist

•✓ Is this internal/service-to-service traffic with stable IPs?
•✓ Are clients in a controlled environment without NAT/proxies?
•✓ Is this a non-HTTP protocol where cookies aren't possible?
•✓ Is session loss tolerable (graceful degradation)?
•✓ Can you accept potential load imbalance from shared IPs?
•✓ Is traffic duration short (seconds, not minutes/hours)?

If you answered 'yes' to most of these, IP persistence might work.

But if any of your traffic comes from public internet users, mobile clients, or web browsers, reconsider. The edge cases will bite you eventually.

Hybrid Approaches

Some architectures use IP persistence as a fallback. Attempt cookie-based persistence first; if no cookie, fall back to IP hash. This provides graceful degradation while optimizing for the common case. HAProxy and others support this pattern.
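The fallback logic can be sketched as follows. This is an illustration of the pattern rather than HAProxy's implementation: the function `route`, the plain server ID used as the cookie value, and the pool contents are all assumptions (real deployments typically use an opaque or signed cookie value).

```python
import hashlib


def route(cookie_value, source_ip, servers):
    """Prefer cookie affinity; fall back to a source-IP hash.

    Returns (server_id, cookie_to_set). cookie_to_set is None when the
    client already presented a valid affinity cookie.
    """
    if cookie_value in servers:
        return cookie_value, None
    # No usable cookie: choose by IP hash and ask the client to remember the result.
    ids = sorted(servers)
    digest = hashlib.sha256(source_ip.encode()).digest()
    server_id = ids[int.from_bytes(digest[:8], "big") % len(ids)]
    return server_id, server_id


# Hypothetical pool: server ID -> address.
pool = {"s1": "10.0.0.1:8080", "s2": "10.0.0.2:8080", "s3": "10.0.0.3:8080"}
server_id, cookie = route(None, "203.0.113.50", pool)
# Later request from a different IP, now carrying the cookie: same server.
print(route(cookie, "198.51.100.9", pool)[0] == server_id)
```

Once the cookie is set, the client's IP no longer matters, so a network change does not move the session. The IP hash only decides the very first request, or requests from clients that refuse cookies.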

Configuration Across Platforms

Despite its limitations, you may need to configure IP persistence for specific scenarios. Here's how to do it in NGINX:

nginx.conf
# Basic IP hash
upstream backend {
    ip_hash;  # Enable IP-based persistence
    
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
 
# With weights (note: the backup parameter cannot be combined with ip_hash)
upstream backend_weighted {
    ip_hash;
    
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=2;
    server 10.0.0.3:8080 down;  # Temporarily out of rotation; preserves the hashing of other clients
}
 
# Using consistent hash on specific variable
upstream backend_hash {
    hash $remote_addr consistent;  # Consistent hashing
    
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
 
# Hash on X-Forwarded-For (trusted proxies only; this hashes the whole header value, proxy chain included)
upstream backend_xff {
    hash $http_x_forwarded_for consistent;
    
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}
 
server {
    listen 80;
    
    location / {
        proxy_pass http://backend;
    }
}

NGINX Notes:

ip_hash uses the first three octets of an IPv4 address as the hash key, so all clients in the same /24 network map to the same server
hash $var consistent is more flexible and uses ketama consistent hashing
For IPv6, ip_hash uses the full address
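The practical effect of the three-octet key is easiest to see by deriving the key itself. The sketch below only illustrates the key selection described in the notes; it does not reproduce NGINX's internal hash function, and the name `ip_hash_key` is made up for the example.

```python
import ipaddress


def ip_hash_key(client_ip: str) -> bytes:
    """Key material as described above: first three octets for IPv4, full address for IPv6."""
    ip = ipaddress.ip_address(client_ip)
    if ip.version == 4:
        return ip.packed[:3]
    return ip.packed


# Two hosts in the same /24 produce the same key, so they share a backend.
print(ip_hash_key("192.0.2.10") == ip_hash_key("192.0.2.200"))
```

This grouping means even clients with distinct public IPs can be pinned together when they sit in the same /24, which adds to the concentration effects described earlier.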

Monitoring and Troubleshooting IP Persistence

If you're using IP-based persistence, monitoring is critical to detect the inevitable problems:

Key Metrics to Watch:

Essential Monitoring

•Request Distribution Variance: Compare requests per server. A server deviating more than 20% from the mean may indicate NAT-driven imbalance.
•Load Per Server: Monitor CPU, memory, latency per server. One server spiking while others idle suggests sticky traffic concentration.
•Session Loss Rate: Track session creation vs. 'session not found' errors. Spikes indicate persistence failures.
•Connection Origins: Count unique source IPs vs. total requests. Low ratio means high NAT concentration.
•Response Time Distribution: Compare p95/p99 latency across servers. Overloaded servers show higher tail latency.
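Two of these metrics are simple to compute from access logs or per-server counters. The helper names and sample numbers below are assumptions for illustration; the inputs would come from whatever metrics pipeline you already run.

```python
from statistics import mean


def max_deviation_from_mean(requests_per_server: list[int]) -> float:
    """Largest relative deviation of any server from the mean request count."""
    avg = mean(requests_per_server)
    if avg == 0:
        return 0.0
    return max(abs(count - avg) for count in requests_per_server) / avg


def unique_ip_ratio(source_ips: list[str]) -> float:
    """Distinct source IPs divided by total requests; low values suggest shared IPs."""
    return len(set(source_ips)) / len(source_ips)


# Hypothetical sample: one server carries far more than its share.
print(max_deviation_from_mean([150, 100, 50]))
print(unique_ip_ratio(["192.0.2.1", "192.0.2.1", "192.0.2.1", "198.51.100.7"]))
```

A deviation above 0.2 corresponds to the 20% guideline in the list. Note that the unique-IP ratio also drops for busy individual clients, so it is best read as a trend rather than as an absolute value.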

Troubleshooting Common Issues:

Issue 1: Severe Load Imbalance

Symptoms: One server at 90% CPU, others at 10%.

Diagnosis:

Check source IP distribution (are many requests from few IPs?)
Look for large NAT (corporate, CGN)
Check if CDN is configured (all traffic from CDN IPs)

Mitigation:

Switch to cookie-based persistence
For CGN, consider hybrid approach with fallback load balancing
Implement connection limiting per IP

Issue 2: Sessions Breaking for Specific Users

Symptoms: Certain users report constant session loss, others unaffected.

Diagnosis:

Check if affected users are mobile (IP changing)
Check if affected users are behind proxy/VPN
Verify consistent hashing is enabled (server pool changes)

Mitigation:

Long-term: Migrate to cookie-based persistence
Short-term: Increase session timeout, externalize session store

Issue 3: All Traffic Going to One Server After Deployment

Symptoms: After adding/removing servers, traffic distribution changes radically: clients are remapped to different servers, or traffic piles onto a single server.

Diagnosis:

Check if consistent hashing is enabled
Verify server pool configuration matches actual servers
Check for health check failures affecting routing

Mitigation:

Enable consistent hashing (hash-type consistent in HAProxy)
Use gradual server additions (blue-green deployment)
Implement connection draining before removal

Proactive Detection

Set up alerts for load imbalance ratios. If any server receives more than 2x the average requests, trigger investigation. This catches NAT concentration problems before they become outages.
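The alert rule can be expressed as a small check over per-server request counts. The function name, the dictionary shape, and the sample counts are assumptions for illustration; in practice this logic would live in your monitoring system's alert expression.

```python
def overloaded_servers(requests_by_server: dict[str, int], threshold: float = 2.0) -> list[str]:
    """Return servers whose request count exceeds threshold x the pool average."""
    average = sum(requests_by_server.values()) / len(requests_by_server)
    return sorted(
        name for name, count in requests_by_server.items() if count > threshold * average
    )


# Hypothetical counters over one evaluation window.
print(overloaded_servers({"s1": 900, "s2": 50, "s3": 50}))
```

One server can hold at most n times the average in an n-server pool, so a 2x threshold can never fire with only two servers. Scale the threshold to the pool size if you run small pools.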

Summary: IP Persistence in Perspective

We've thoroughly examined IP-based persistence—its mechanics, failures, and limited appropriate uses. Let's synthesize:

IP vs. Cookie Persistence Comparison
Aspect	IP-Based Persistence	Cookie-Based Persistence
Works with NAT	❌ Fails badly	✅ Works perfectly
Works with Mobile	❌ Breaks on IP change	✅ Works across networks
Works with CDN/Proxy	❌ All clients = one IP	✅ Each client distinct
Non-HTTP Protocols	✅ Works (Layer 4)	❌ HTTP only (Layer 7)
Client Cooperation	✅ None required	⚠️ Needs cookie support
Load Balancer State	⚠️ Depends on impl.	✅ Minimal (cookie stored on client)
Security Exposure	✅ Minimal	⚠️ Needs careful cookie config
Server Pool Changes	⚠️ Remaps many clients unless consistent hashing is used	⚠️ Affects only sessions on a removed server

Key Takeaways

•IP persistence assumes stable, unique IPs — This assumption is false for most public internet traffic due to NAT, CGN, and mobile networks.
•NAT causes severe load imbalance — Thousands of users behind one IP all route to one overloaded server while others sit idle.
•Mobile networks break IP persistence — IP addresses change during network transitions, breaking sessions mid-flow.
•CDNs and proxies create similar problems — All traffic from a CDN edge appears as one 'client,' defeating persistence.
•IP persistence works for internal services — Service-to-service communication with stable IPs remains a valid use case.
•Non-HTTP protocols may require IP persistence — When cookies aren't available, IP-based routing may be the only option.
•Cookie-based persistence is superior for web traffic — For any public-facing HTTP/HTTPS service, prefer cookies.

What's Next:

We've now covered both major persistence mechanisms—cookies and IP-based. But sticky sessions, regardless of implementation, have fundamental drawbacks. Next, we'll examine the drawbacks of sticky sessions as an architectural pattern—load imbalance, failover complexity, and scalability constraints—setting up the case for stateless alternatives.

IP Persistence Understood

You now understand IP-based persistence comprehensively: how it works, why it fails in modern networks, the narrow scenarios where it's appropriate, and how to configure it when needed. More importantly, you know when NOT to use it—which is most of the time for public-facing services.
