Long polling exists in a world of timeouts. Every component between the client and your application server has opinions about how long connections should live: browsers impose limits, proxies enforce policies, load balancers have configurations, and servers must balance resource usage against responsiveness.
The most common failure mode in long polling implementations isn't the code—it's timeout misconfiguration. A single timeout set too short anywhere in the chain causes mysterious connection drops. A timeout set too long wastes resources and delays reconnection. Getting timeouts right requires understanding every layer of the stack and orchestrating them into a harmonious whole.
By the end of this page, you'll understand all the timeout layers in a long polling system, how to configure them to work together, and strategies for graceful timeout recovery. You'll be able to diagnose timeout-related connection issues and configure robust, reliable long polling infrastructure.
A typical long poll request traverses multiple components, each with its own timeout configuration. Understanding this stack is essential for correct configuration.
The Timeout Stack:
```
Client Request Flow Through Timeout Layers
══════════════════════════════════════════

┌──────────────────────────────────────────────────────────────┐
│ CLIENT BROWSER                                               │
│   fetch() / XMLHttpRequest timeout                           │
│   Default: none (browser decides)                            │
│   Practical client timeout: 35-65 seconds                    │
└──────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│ CDN / EDGE                                                   │
│   Connection timeout: Cloudflare 15s, CloudFront 30s         │
│   Read timeout:       Cloudflare 100s, CloudFront 60s (max)  │
└──────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│ LOAD BALANCER                                                │
│   AWS ALB: 60 seconds (configurable 1-4000s)                 │
│   nginx: proxy_read_timeout 60s (default)                    │
│   HAProxy: timeout client 50s (default)                      │
└──────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│ APPLICATION SERVER                                           │
│   Request handler timeout: Express none by default,          │
│     Fastify configurable, custom typically 30s               │
│   Keep-alive timeout: Node http 5s, Go varies by server;     │
│     should be > long poll timeout                            │
└──────────────────────────────────────────────────────────────┘
```

The Golden Rule:
Timeouts must be configured in ascending order from application to client:
Application Timeout < Load Balancer Timeout < CDN Timeout < Client Timeout
If any outer layer times out before an inner layer, the connection is severed unexpectedly, and the application receives no indication that the client is gone. This leads to resource leaks and orphaned waiters.
Example Configuration Chain:
| Layer | Timeout | Rationale |
|---|---|---|
| Application | 30 seconds | Base timeout, controlled by business logic |
| Server keep-alive | 35 seconds | Slightly longer to allow response transmission |
| Load balancer | 45 seconds | Buffer for network latency |
| CDN | 55 seconds | Additional buffer |
| Client | 60 seconds | Longest, ensures client detects issues last |
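A chain like the one above can be sanity-checked in code. A minimal sketch (the `TimeoutLayer` type and `validateTimeoutChain` function are illustrative, not from any library):

```typescript
// Each layer in the path, ordered from innermost (application) to
// outermost (client). Validation enforces the golden rule: every
// layer's timeout must strictly exceed the layer beneath it.
interface TimeoutLayer {
  name: string;
  timeoutMs: number;
}

function validateTimeoutChain(layers: TimeoutLayer[]): string[] {
  const violations: string[] = [];
  for (let i = 1; i < layers.length; i++) {
    if (layers[i].timeoutMs <= layers[i - 1].timeoutMs) {
      violations.push(
        `${layers[i].name} (${layers[i].timeoutMs}ms) must exceed ` +
        `${layers[i - 1].name} (${layers[i - 1].timeoutMs}ms)`
      );
    }
  }
  return violations;
}

// The example chain from the table above passes with no violations.
const exampleChain: TimeoutLayer[] = [
  { name: "application", timeoutMs: 30000 },
  { name: "server keep-alive", timeoutMs: 35000 },
  { name: "load balancer", timeoutMs: 45000 },
  { name: "cdn", timeoutMs: 55000 },
  { name: "client", timeoutMs: 60000 },
];
```

Running a check like this in CI against your actual infrastructure configs catches the misordered-timeout bug before it reaches production.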
The most insidious timeout issues occur at proxy layers you don't control. Corporate proxies often have aggressive 30-second timeouts. Mobile carriers may terminate idle connections after 60 seconds. Cloud provider defaults rarely match long polling needs. Always verify the entire path with real traffic.
The application server controls the primary long poll timeout. This timer determines how long to wait for events before sending an empty response and prompting the client to reconnect.
Implementing Application-Level Timeouts:
```typescript
interface LongPollConfig {
  // Base timeout for holding connections
  baseTimeout: number; // default: 30000 (30 seconds)
  // Variance to prevent synchronized reconnects
  timeoutJitter: number; // default: 5000 (5 seconds)
  // Minimum timeout (for backpressure)
  minTimeout: number; // default: 10000 (10 seconds)
  // Maximum timeout (infrastructure constraint)
  maxTimeout: number; // default: 55000 (55 seconds)
  // Grace period after timeout before forceful close
  gracePeriod: number; // default: 1000 (1 second)
}

class LongPollTimeoutManager {
  private readonly config: LongPollConfig;

  constructor(config: Partial<LongPollConfig> = {}) {
    this.config = {
      baseTimeout: 30000,
      timeoutJitter: 5000,
      minTimeout: 10000,
      maxTimeout: 55000,
      gracePeriod: 1000,
      ...config,
    };
  }

  /**
   * Calculate timeout for a specific request
   * May vary based on system load or client hints
   */
  calculateTimeout(request: Request, systemLoad: number): number {
    // Start with base timeout
    let timeout = this.config.baseTimeout;

    // Apply jitter to prevent synchronized reconnects
    const jitter = (Math.random() - 0.5) * 2 * this.config.timeoutJitter;
    timeout += jitter;

    // Reduce timeout under high load (backpressure)
    if (systemLoad > 0.8) {
      timeout = Math.max(
        this.config.minTimeout,
        timeout * (1 - (systemLoad - 0.8) * 2) // Linear reduction
      );
    }

    // Respect client's requested timeout if within bounds
    const clientTimeout = parseInt(
      request.headers['x-long-poll-timeout'] || '0'
    ) * 1000;
    if (clientTimeout > 0) {
      timeout = Math.min(timeout, clientTimeout);
    }

    // Clamp to bounds
    return Math.max(
      this.config.minTimeout,
      Math.min(this.config.maxTimeout, timeout)
    );
  }

  /**
   * Create a timeout handler for a long poll request
   */
  createTimeoutHandler(
    request: Request,
    response: Response,
    onTimeout: () => void
  ): { timer: NodeJS.Timeout; cancel: () => void } {
    const timeout = this.calculateTimeout(request, this.getSystemLoad());

    // Track timeout for metrics
    const startTime = Date.now();

    const timer = setTimeout(() => {
      const elapsed = Date.now() - startTime;

      // Record metric
      metrics.histogram('longpoll.timeout_duration', elapsed);
      metrics.increment('longpoll.timeout_count');

      // Execute timeout callback
      onTimeout();

      // Start grace timer for forced cleanup
      setTimeout(() => {
        if (!response.writableEnded) {
          console.warn('Long poll response not ended after grace period');
          response.destroy();
        }
      }, this.config.gracePeriod);
    }, timeout);

    // Communicate timeout to client via header
    response.setHeader('X-Long-Poll-Timeout-Ms', timeout.toString());

    return {
      timer,
      cancel: () => {
        clearTimeout(timer);
        const elapsed = Date.now() - startTime;
        metrics.histogram('longpoll.response_time', elapsed);
      },
    };
  }

  private getSystemLoad(): number {
    // In production: CPU usage, memory pressure, pending request count
    const os = require('os');
    const cpus = os.cpus();
    const load = os.loadavg()[0];
    return Math.min(1, load / cpus.length);
  }
}

// Usage in request handler
app.get('/events/poll', async (req, res) => {
  const timeoutManager = new LongPollTimeoutManager();

  // Check for immediate data first
  const data = await checkForEvents(req.user.id, req.query.since);
  if (data.length > 0) {
    return res.json({ events: data });
  }

  // Set up timeout handler
  const { timer, cancel } = timeoutManager.createTimeoutHandler(
    req,
    res,
    () => {
      // Timeout reached - send empty response
      res.status(204).end();
    }
  );

  // Set up event listener
  const handler = (events) => {
    cancel(); // Cancel timeout
    res.json({ events });
  };
  eventBus.once(`user:${req.user.id}`, handler);

  // Cleanup on disconnect
  req.on('close', () => {
    cancel();
    eventBus.off(`user:${req.user.id}`, handler);
  });
});
```

Timeout Jitter:
Without jitter, if all clients set 30-second timeouts, they'll all reconnect at approximately the same time, creating periodic load spikes. Jitter distributes reconnections over a time window:
```
Without jitter:  ████████████████████░░░░░░░░░░░░░░░░░░░░  (spike at T+30s)
With 5s jitter:  ░░░░░░░░████████████████████████░░░░░░░░  (spread T+25s to T+35s)
```
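The calculation itself is tiny. A sketch (function name is illustrative):

```typescript
// Spread a base timeout uniformly over +/- jitterMs so clients that
// connected at the same moment do not all time out and reconnect together.
function jitteredTimeout(baseMs: number, jitterMs: number): number {
  return baseMs + (Math.random() - 0.5) * 2 * jitterMs;
}
```

With `baseMs = 30000` and `jitterMs = 5000`, every result lands in the 25-35 second window shown above.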
Adaptive Timeout Under Load:
When system load increases, holding connections longer consumes more memory. Reducing timeout under load (backpressure) frees resources faster:
| System Load | Timeout Adjustment | Rationale |
|---|---|---|
| 0-80% | Full timeout (30s) | Normal operation |
| 80-90% | Reduce to 20s | Early pressure release |
| 90-100% | Reduce to 10s | Aggressive resource recovery |
| 100% (overload) | Reject new polls | Protect existing connections |
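The table maps directly onto a piecewise function. A sketch, with the thresholds above hard-coded for illustration:

```typescript
// Timeout schedule from the load table: full timeout under normal load,
// stepped reductions as load rises, outright rejection at overload.
function timeoutForLoad(load: number): number | "reject" {
  if (load >= 1.0) return "reject"; // overload: protect existing connections
  if (load >= 0.9) return 10000;    // aggressive resource recovery
  if (load >= 0.8) return 20000;    // early pressure release
  return 30000;                     // normal operation
}
```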
Every infrastructure component between client and server needs explicit timeout configuration for long polling. Here are the key components and their settings:
NGINX Configuration:
```nginx
# /etc/nginx/conf.d/longpoll.conf

upstream longpoll_backend {
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;

    # Keep connections alive to backend
    keepalive 64;
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    # Specific location for long polling endpoints
    location /api/events/poll {
        proxy_pass http://longpoll_backend;

        # Connection timeout (establishing connection to backend)
        proxy_connect_timeout 10s;

        # Read timeout (waiting for backend response)
        # Must be > application timeout (30s) + buffer
        proxy_read_timeout 45s;

        # Send timeout (sending request to backend)
        proxy_send_timeout 10s;

        # Disable buffering - responses sent immediately
        proxy_buffering off;

        # HTTP/1.1 for keep-alive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Pass client info
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Disable caching entirely
        proxy_cache off;
        add_header Cache-Control "no-cache, no-store, must-revalidate";
    }

    # Other API endpoints with standard timeouts
    location /api/ {
        proxy_pass http://longpoll_backend;
        proxy_read_timeout 30s; # Standard API timeout
    }
}
```
AWS ALB Configuration (Terraform):

```hcl
# AWS Application Load Balancer for long polling

resource "aws_lb_target_group" "longpoll" {
  name     = "longpoll-targets"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  # Health check configuration
  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 3
  }

  # Deregistration delay - allow long polls to complete
  deregistration_delay = 60

  # Enable sticky sessions if needed
  stickiness {
    enabled         = false
    type            = "lb_cookie"
    cookie_duration = 86400
  }
}

resource "aws_lb_listener_rule" "longpoll" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.longpoll.arn
  }

  condition {
    path_pattern {
      values = ["/api/events/poll*"]
    }
  }
}

# ALB idle timeout - CRITICAL for long polling
resource "aws_lb" "main" {
  name               = "main-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  # Must exceed application timeout
  idle_timeout = 65 # seconds (max 4000, but keep reasonable)

  # Enable HTTP/2 for multiplexing benefits
  enable_http2 = true
}
```

Cloud Platform Limits:
| Platform | Default Timeout | Maximum | Configuration |
|---|---|---|---|
| AWS ALB | 60 seconds | 4000 seconds | idle_timeout |
| AWS API Gateway | 30 seconds | 30 seconds | Fixed (consider ALB) |
| GCP Cloud Load Balancing | 30 seconds | 86400 seconds | timeoutSec |
| Azure Application Gateway | 30 seconds | 3600 seconds | Request timeout |
| Cloudflare | 100 seconds | Enterprise only | Proxy read timeout |
| nginx | 60 seconds | Unlimited | proxy_read_timeout |
| HAProxy | 50 seconds (client) | Unlimited | timeout client |
AWS API Gateway has a HARD 30-second timeout that cannot be increased. It is NOT suitable for long polling. Use ALB + EC2/ECS, or consider API Gateway's WebSocket support instead. This is a common source of mysterious connection drops.
The client must implement timeouts that work harmoniously with server infrastructure. Too short, and you spam reconnects. Too long, and users wait forever when connections silently die.
Client Timeout Strategy:
```typescript
interface ClientTimeoutConfig {
  // Expected server timeout (from header or default)
  expectedServerTimeout: number;
  // Buffer to add beyond server timeout
  timeoutBuffer: number;
  // Maximum client timeout (absolute limit)
  maxClientTimeout: number;
  // Heartbeat interval to detect silent disconnects
  heartbeatInterval: number;
}

class TimeoutAwareLongPollClient {
  private config: ClientTimeoutConfig = {
    expectedServerTimeout: 30000,
    timeoutBuffer: 5000,
    maxClientTimeout: 90000,
    heartbeatInterval: 10000,
  };

  private abortController: AbortController | null = null;
  private heartbeatTimer: number | null = null;
  private lastActivity: number = 0;

  async poll(): Promise<PollResult> {
    this.abortController = new AbortController();
    this.lastActivity = Date.now();

    // Calculate appropriate timeout
    const clientTimeout = Math.min(
      this.config.expectedServerTimeout + this.config.timeoutBuffer,
      this.config.maxClientTimeout
    );

    // Start heartbeat monitoring
    this.startHeartbeatMonitor();

    try {
      const response = await fetch('/api/events/poll', {
        signal: AbortSignal.any([
          this.abortController.signal,
          AbortSignal.timeout(clientTimeout),
        ]),
        headers: {
          'X-Long-Poll-Timeout': String(
            Math.floor(this.config.expectedServerTimeout / 1000)
          ),
        },
      });

      // Update expected timeout from server response
      const serverTimeout = response.headers.get('X-Long-Poll-Timeout-Ms');
      if (serverTimeout) {
        this.config.expectedServerTimeout = parseInt(serverTimeout);
      }

      // Record activity
      this.lastActivity = Date.now();

      if (response.status === 200) {
        return { status: 'data', data: await response.json() };
      } else if (response.status === 204) {
        return { status: 'timeout' };
      } else {
        return { status: 'error', error: new Error(`HTTP ${response.status}`) };
      }
    } catch (error) {
      if ((error as Error).name === 'TimeoutError') {
        // Client-side timeout indicates possible connection issue
        console.warn('Client timeout before server response');
        // Next poll, use shorter expected timeout
        this.config.expectedServerTimeout *= 0.9;
        return { status: 'client_timeout' };
      }
      throw error;
    } finally {
      this.stopHeartbeatMonitor();
    }
  }

  /**
   * Network heartbeat detection
   * Detects silent disconnections that don't trigger errors
   */
  private startHeartbeatMonitor(): void {
    this.heartbeatTimer = window.setInterval(() => {
      const timeSinceActivity = Date.now() - this.lastActivity;

      // If no activity for much longer than expected, something's wrong
      const activityThreshold =
        this.config.expectedServerTimeout + this.config.timeoutBuffer * 2;

      if (timeSinceActivity > activityThreshold) {
        console.warn('No activity detected, forcing reconnect');
        this.abortController?.abort();
      }

      // Optional: Send lightweight keep-alive ping
      // Some proxies close idle connections
      this.sendKeepalive();
    }, this.config.heartbeatInterval);
  }

  private stopHeartbeatMonitor(): void {
    if (this.heartbeatTimer) {
      clearInterval(this.heartbeatTimer);
      this.heartbeatTimer = null;
    }
  }

  private sendKeepalive(): void {
    // Tiny request to keep connection "active" through proxies
    // Note: Only needed if proxy tracks connection activity
    try {
      navigator.sendBeacon('/api/keepalive', '');
    } catch {
      // Best effort, ignore failures
    }
  }
}
```

The Silent Disconnect Problem:
A particularly insidious issue occurs when a connection dies silently—the TCP connection is terminated by a proxy or network issue, but neither client nor server receives notification. The client waits for its timeout, the server holds resources indefinitely.
Detection strategies:
- **Client-side activity monitoring** — If no data arrives within expected time + buffer, assume the connection is dead
- **Server-side TCP keep-alive** — The operating system sends probe packets to detect dead connections
- **Application-level heartbeats** — Periodic tiny messages confirm liveness
- **Timeout always wins** — Never wait indefinitely; always enforce a maximum wait time
Mobile networks are especially prone to silent disconnections. Carriers often NAT connections and may drop idle connections within 30-60 seconds. For mobile clients, use shorter long poll timeouts (20-25 seconds) and expect more frequent reconnections. The battery cost of reconnecting more often is smaller than the cost of keeping the mobile radio continuously active.
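That advice can be folded into the client's configuration. A hypothetical sketch (the network-type detection and exact values are assumptions):

```typescript
// Pick a long poll timeout from a network-type hint. Cellular paths get a
// shorter hold so a carrier's NAT idle cutoff is never the layer that
// kills the connection.
function timeoutForClient(networkType: "wifi" | "cellular" | "unknown"): number {
  switch (networkType) {
    case "cellular":
      return 22000; // stay well under common 30s carrier idle cutoffs
    case "wifi":
      return 30000; // full base timeout
    default:
      return 25000; // conservative middle ground
  }
}
```

In a browser, the `networkType` hint might come from `navigator.connection` where available; on native mobile, from the platform's reachability APIs.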
Timeouts are not failures—they're expected behavior. A well-designed system treats timeout as just another clean termination mode, ensuring seamless recovery.
The Timeout Response Protocol:
```typescript
// Server: Clean timeout response
app.get('/events/poll', async (req, res) => {
  const since = req.query.since as string | undefined;
  const timeout = 30000;

  // ... wait for events or timeout ...

  // Timeout reached - send informative response
  res.status(200).json({
    events: [],             // No events (empty array, not null)
    timeout: true,          // Explicit timeout indicator
    serverTime: Date.now(), // Server timestamp for sync
    nextPollDelay: 0,       // Immediate reconnect suggested
    retryAfter: null,       // No backoff needed
    cursors: {
      // Return current cursor positions
      [channelId]: since || 'beginning',
    },
  });
});

// Client: Handle timeout as normal flow
class LongPollClient {
  async handlePollResult(response: PollResponse): Promise<void> {
    if (response.timeout) {
      // Normal timeout - not an error
      this.metrics.recordTimeout();

      // Sync time with server (detect clock drift)
      const serverTime = response.serverTime;
      const localTime = Date.now();
      const drift = Math.abs(serverTime - localTime);
      if (drift > 5000) {
        console.warn(`Clock drift detected: ${drift}ms`);
        // May need to adjust timeout calculations
      }

      // Reconnect per server suggestion
      await this.scheduleReconnect(response.nextPollDelay);
      return;
    }

    // Process events...
  }

  async scheduleReconnect(delay: number): Promise<void> {
    // Track reconnection metrics
    this.metrics.recordReconnect();

    if (delay > 0) {
      await sleep(delay);
    }

    // Reconnect immediately (or after suggested delay)
    this.poll();
  }
}
```

What NOT to do on Timeout:
- ❌ Return 408 Request Timeout (implies client error)
- ❌ Return 504 Gateway Timeout (implies server error)
- ❌ Close connection without response
- ❌ Return error-like response body
- ❌ Require exponential backoff
What TO do on Timeout:
- ✅ Return 200 OK with empty events array, or
- ✅ Return 204 No Content
- ✅ Include metadata for client intelligence
- ✅ Suggest immediate reconnection
- ✅ Clean up all server-side resources
- ✅ Log for metrics (not as errors)
Some implementations use 200 OK with empty body; others use 204 No Content. Both are valid. 204 is semantically cleaner (truly nothing to return), but 200 with body allows including metadata like server time, cursor position, and retry hints. For rich clients, 200 with body is often more practical.
When systems are overloaded, timeout behavior becomes critical for stability. The right timeout strategy can prevent cascading failures; the wrong strategy amplifies them.
Load Shedding via Timeout:
```typescript
class AdaptiveTimeoutManager {
  private readonly baseTimeout = 30000;
  private readonly minTimeout = 5000;

  // Metrics for load detection
  private pendingRequests = 0;
  private recentResponseTimes: number[] = [];
  private recentErrors = 0;

  private readonly maxPending = 10000;
  private readonly p99Target = 25000; // Target P99 < 25s

  /**
   * Calculate timeout based on system state
   */
  getTimeout(): number {
    // Check capacity
    const capacityUsage = this.pendingRequests / this.maxPending;

    if (capacityUsage > 0.95) {
      // Critical load - minimum timeout to free resources immediately
      return this.minTimeout;
    }

    if (capacityUsage > 0.8) {
      // High load - reduce timeout linearly
      const reduction = (capacityUsage - 0.8) / 0.15; // 0 at 80%, 1 at 95%
      return Math.max(
        this.minTimeout,
        this.baseTimeout * (1 - reduction * 0.7) // Up to 70% reduction
      );
    }

    // Check latency
    const p99 = this.getP99ResponseTime();
    if (p99 > this.p99Target) {
      // Latency too high - reduce timeout to shed load
      const latencyRatio = p99 / this.p99Target;
      return Math.max(
        this.minTimeout,
        this.baseTimeout / latencyRatio
      );
    }

    // Normal operation
    return this.baseTimeout;
  }

  /**
   * Should we reject new long poll requests entirely?
   */
  shouldReject(): { reject: boolean; retryAfter?: number } {
    const capacityUsage = this.pendingRequests / this.maxPending;

    if (capacityUsage > 0.98) {
      // Critical - reject with retry hint
      return {
        reject: true,
        retryAfter: 5 + Math.random() * 10, // 5-15 seconds
      };
    }

    if (this.recentErrors > 100) { // More than 100 errors in window
      // Error storm - circuit break
      return { reject: true, retryAfter: 30 };
    }

    return { reject: false };
  }

  /**
   * Record request lifecycle
   */
  recordRequest(): () => void {
    this.pendingRequests++;
    const startTime = Date.now();

    return () => {
      this.pendingRequests--;
      const duration = Date.now() - startTime;
      this.recentResponseTimes.push(duration);

      // Keep only recent samples
      if (this.recentResponseTimes.length > 1000) {
        this.recentResponseTimes = this.recentResponseTimes.slice(-500);
      }
    };
  }

  private getP99ResponseTime(): number {
    if (this.recentResponseTimes.length < 10) {
      return 0; // Not enough data
    }
    const sorted = [...this.recentResponseTimes].sort((a, b) => a - b);
    const p99Index = Math.floor(sorted.length * 0.99);
    return sorted[p99Index];
  }
}

// Usage in handler
const timeoutManager = new AdaptiveTimeoutManager();

app.get('/events/poll', async (req, res) => {
  // Check if we should reject
  const rejection = timeoutManager.shouldReject();
  if (rejection.reject) {
    res.status(503);
    res.setHeader('Retry-After', String(rejection.retryAfter));
    return res.json({
      error: 'SERVICE_OVERLOADED',
      retryAfter: rejection.retryAfter,
    });
  }

  // Record this request
  const complete = timeoutManager.recordRequest();

  // Get current timeout
  const timeout = timeoutManager.getTimeout();

  try {
    // ... normal long poll logic with timeout ...
  } finally {
    complete();
  }
});
```

Timeout as Backpressure Mechanism:
Shorter timeouts under load create a natural backpressure mechanism: held connections are released sooner, freeing memory and sockets faster, and reconnecting clients arrive at a rate the server can absorb.
This is more graceful than rejecting requests outright, as clients still receive valid responses and can serve cached data during reconnection.
Aggressive timeout reduction can trigger retry storms: clients disconnect and immediately reconnect, increasing load further. Mitigate by including a 'nextPollDelay' in timeout responses during high load, spreading reconnection attempts over several seconds. Combined with client-side jitter, this prevents synchronized reconnection waves.
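One way to produce such a delay on the server, as a sketch (the `nextPollDelayMs` helper and its thresholds are assumptions):

```typescript
// Suggest a reconnect delay for timeout responses. Below 80% load,
// clients reconnect immediately; above it, delays are drawn at random
// from a window that widens with load, spreading the reconnect wave.
function nextPollDelayMs(systemLoad: number, maxSpreadMs: number = 5000): number {
  if (systemLoad < 0.8) return 0; // normal load: immediate reconnect

  // 0 at 80% load, 1 at 100% load
  const pressure = Math.min(1, (systemLoad - 0.8) / 0.2);
  return Math.floor(Math.random() * maxSpreadMs * pressure);
}
```

The server would include this value as `nextPollDelay` in its timeout response body, and clients honor it in their reconnect scheduling.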
Timeout issues are among the most frustrating to debug because they manifest as silent failures with no clear error messages. Here's a systematic approach to diagnosing timeout problems.
Diagnostic Checklist:
| Symptom | Likely Cause | Diagnostic Step | Solution |
|---|---|---|---|
| Consistent ~30s disconnects | Proxy or LB timeout | Check infrastructure configs | Increase intermediate timeouts |
| Random disconnects at varying times | Network instability | Track disconnect timestamps | Add heartbeat/keepalive |
| Disconnects during low activity | Idle connection killed | Monitor connection activity | Increase keep-alive frequency |
| Client timeout before server acknowledges | Client timeout too short | Compare client/server timeouts | Increase client timeout buffer |
| High timeout rate under load only | Server-side backpressure | Check server load metrics | Scale horizontally or reduce timeout |
| Timeouts on specific routes only | Route-specific config | Compare route configurations | Unify timeout settings |
```typescript
// Comprehensive timeout metrics
class TimeoutDiagnostics {
  // Track every connection lifecycle
  instrumentConnection(req: Request, res: Response): void {
    const connectionId = crypto.randomUUID();
    const startTime = Date.now();

    const metadata = {
      connectionId,
      clientIp: req.ip,
      userAgent: req.headers['user-agent'],
      route: req.path,
      expectedTimeout:
        parseInt(req.headers['x-long-poll-timeout'] || '30') * 1000,
    };

    // Log connection start
    logger.info('long_poll_start', metadata);

    // Track response
    res.on('finish', () => {
      const duration = Date.now() - startTime;
      const status = res.statusCode;

      logger.info('long_poll_end', {
        ...metadata,
        duration,
        status,
        terminationReason: this.classifyTermination(duration, status, metadata),
      });
    });

    // Track unexpected close
    req.on('close', () => {
      if (!res.writableEnded) {
        const duration = Date.now() - startTime;
        logger.warn('long_poll_unexpected_close', {
          ...metadata,
          duration,
          closedBy: duration < metadata.expectedTimeout * 0.9
            ? 'likely_intermediate_proxy'
            : 'likely_client',
        });
      }
    });
  }

  private classifyTermination(
    duration: number,
    status: number,
    metadata: any
  ): string {
    if (status === 200 && duration < 1000) {
      return 'immediate_data';
    }
    if (status === 200) {
      return 'data_after_wait';
    }
    if (status === 204) {
      const timeoutRatio = duration / metadata.expectedTimeout;
      if (timeoutRatio > 0.95 && timeoutRatio < 1.05) {
        return 'server_timeout_expected';
      }
      if (timeoutRatio < 0.8) {
        return 'server_timeout_early_suspicious';
      }
      return 'server_timeout';
    }
    if (status === 503) {
      return 'load_shedding';
    }
    return 'unknown';
  }
}

// Dashboard query to identify timeout patterns
/*
SELECT
  terminationReason,
  COUNT(*) as count,
  AVG(duration) as avg_duration,
  PERCENTILE(duration, 0.5) as p50,
  PERCENTILE(duration, 0.99) as p99
FROM long_poll_events
WHERE timestamp > NOW() - INTERVAL 1 HOUR
GROUP BY terminationReason
ORDER BY count DESC
*/
```

Calculate the ratio of actual duration to expected timeout.
If most timeouts cluster around a specific ratio (e.g., 0.5 = half expected), a component in the middle of your stack is overriding your timeout. Work backward through infrastructure to find the misconfigured component.
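That ratio analysis can be captured in a small classifier, sketched here with illustrative thresholds:

```typescript
// Compare observed disconnect duration against the expected server
// timeout. A cluster near ratio 1.0 is healthy; a cluster well below it
// points at an intermediate layer cutting connections early; well above
// it, at the client or an outer layer.
function classifyTimeoutRatio(actualMs: number, expectedMs: number): string {
  const ratio = actualMs / expectedMs;
  if (ratio >= 0.95 && ratio <= 1.05) return "expected_server_timeout";
  if (ratio < 0.95) return "intermediate_layer_cutting_early";
  return "client_or_outer_layer_timeout";
}
```

Feeding logged durations through this and histogramming the labels quickly shows whether one layer dominates the disconnects.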
We've explored the intricate world of timeout handling in long polling systems. The essential principles: configure timeouts in ascending order from application to client, add jitter to prevent synchronized reconnects, treat timeouts as normal flow rather than errors, adapt timeouts under load, and instrument every connection so misconfigured layers reveal themselves.
What's Next:
Now that we understand timeout handling, we'll compare long polling with alternative real-time approaches: WebSockets, Server-Sent Events, and short polling. This comparison will help you choose the right technology for your specific use case.
You now understand the complete timeout landscape for long polling: the timeout stack, server and client strategies, infrastructure configuration, adaptive behavior under load, and diagnostic approaches. Next, we'll compare long polling with other real-time technologies to inform your architecture decisions.