A single server handling a few hundred WebSocket connections is straightforward. But what happens when your application grows to serve ten thousand users? A hundred thousand? A million? The stateful, long-lived nature of WebSocket connections introduces scaling challenges that simply don't exist with stateless HTTP.
Consider the contrast:
HTTP at scale: Add more servers behind a load balancer. Any server can handle any request. User requests are distributed evenly. Done.
WebSocket at scale: Each connection is bound to a specific server. That server must receive all messages for that connection. When one user sends a message to another, and they're connected to different servers, how does the message route correctly?
This page addresses the fundamental question: How do we architect WebSocket systems that scale horizontally while maintaining the real-time, stateful communication that WebSockets enable?
By the end of this page, you will understand WebSocket scaling challenges, resource management at scale, horizontal scaling strategies using pub/sub, sticky sessions and their trade-offs, connection management patterns, and real-world architectures used by high-scale WebSocket systems.
Before discussing scaling strategies, we must understand what limits a single server's WebSocket capacity. Each connection consumes resources, and understanding these constraints helps you plan capacity and optimize configurations.
Each connection consumes a file descriptor, so start by checking your limits with `ulimit -n` (per process) and `/proc/sys/fs/file-max` (system-wide).

| Resource | Constraint | Typical Limit | Optimization Strategy |
|---|---|---|---|
| File Descriptors | OS limit | 1M+ (with tuning) | Raise ulimit, tune kernel parameters |
| Memory | RAM available | 10K-100K connections per GB | Optimize buffers, reduce per-connection state |
| CPU | Processing capacity | Varies with message rate | Efficient serialization, batch processing |
| Network Bandwidth | NIC capacity | 10 Gbps+ | Compression, binary protocols |
| Event Loop | Concurrency model | Platform-dependent | Use async I/O (Node.js, Go, Rust actors) |
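Before tuning anything, it helps to see where you currently stand. Here is a minimal sketch (assuming Linux and Node.js; the limit-reading approach is an illustration, not part of any framework) that reads the two file descriptor limits mentioned above:

```typescript
import { execSync } from 'child_process';
import { readFileSync } from 'fs';

// Per-process fd limit: `ulimit -n` is a shell builtin, so run it via /bin/sh
const perProcessFds = execSync('ulimit -n', { shell: '/bin/sh' }).toString().trim();

// System-wide fd limit, read directly from procfs (Linux only)
const systemWideFds = readFileSync('/proc/sys/fs/file-max', 'utf8').trim();

console.log(`Per-process fd limit: ${perProcessFds}`); // raise via /etc/security/limits.conf
console.log(`System-wide fd limit: ${systemWideFds}`); // raise via sysctl fs.file-max

// Back-of-envelope capacity from the table above: at roughly 10K-100K
// connections per GB, a 16 GB server might sustain on the order of
// 160K-1.6M connections, assuming lean per-connection buffers.
```

The kernel parameters below raise these and related limits for a production WebSocket host.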
```
# /etc/sysctl.conf optimizations for WebSocket servers

# Increase maximum file descriptors
fs.file-max = 2097152
fs.nr_open = 2097152

# Increase socket buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576

# TCP tuning
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Increase connection backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Allow more local ports for outbound connections
net.ipv4.ip_local_port_range = 1024 65535

# Reduce TIME_WAIT connections
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15

# Increase connection tracking table
net.netfilter.nf_conntrack_max = 2097152

# Application process limits (/etc/security/limits.conf)
# *  soft  nofile  1048576
# *  hard  nofile  1048576
```

The 'C10K problem' (handling 10,000 concurrent connections) was solved decades ago with event-driven I/O. Modern systems target C100K, C1M, or even C10M. Languages like Go, Node.js, and Rust with async runtimes are designed for this scale. Choose your stack wisely—thread-per-connection models won't scale.
WebSocket's stateful nature is simultaneously its greatest strength and its most significant scaling challenge. Unlike HTTP, where any request can be routed to any server, WebSocket messages must be routed to the specific server holding that connection.
The fundamental problem:
Imagine a chat application with two users, Alice and Bob:

1. Alice connects, and the load balancer routes her to Server A.
2. Bob connects and is routed to Server B.
3. Alice sends a message addressed to Bob. It arrives at Server A, which holds no connection to Bob.
How does the message reach Bob?
With HTTP, this isn't a problem—Bob's next request could hit any server, and the response would be the same. But with WebSockets, there is no "next request" from Bob. He's connected to Server B, waiting for messages to arrive on that specific connection.
```typescript
import WebSocket from 'ws';

// Minimal message shape for this example
interface ChatMessage {
  toUser: string;
  content: string;
}

// Naive single-server implementation (works but doesn't scale)
const connections = new Map<string, WebSocket>(); // userId -> socket

function handleMessage(fromUser: string, message: ChatMessage) {
  const recipientSocket = connections.get(message.toUser);
  if (recipientSocket) {
    recipientSocket.send(JSON.stringify(message));
  }
}

// Problem: In a multi-server setup, 'connections' only contains
// users connected to THIS server. If message.toUser is connected
// to a different server, connections.get() returns undefined.
//
//                  ┌─────────────────┐
//                  │  Load Balancer  │
//                  └────────┬────────┘
//                           │
//         ┌─────────────────┼─────────────────┐
//         ▼                 ▼                 ▼
//  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
//  │  Server A   │   │  Server B   │   │  Server C   │
//  │             │   │             │   │             │
//  │ connections:│   │ connections:│   │ connections:│
//  │  - Alice    │   │  - Bob      │   │  - Charlie  │
//  │  - Dave     │   │  - Eve      │   │  - Frank    │
//  └─────────────┘   └─────────────┘   └─────────────┘
//
// Alice on Server A wants to message Bob on Server B.
// Server A has no way to deliver directly—it doesn't know
// Bob exists, let alone which server he's connected to.
```

The standard solution to WebSocket's statefulness problem is pub/sub messaging between servers. Instead of servers communicating directly with each other, they publish messages to a shared message bus, and servers subscribe to messages relevant to their connected users.
How pub/sub solves the routing problem:
1. Each server subscribes to channels for its locally connected users and rooms (e.g., `user:bob`, `room:general`)
2. To deliver a message, a server publishes it to the target channel rather than to a specific server
3. Every server subscribed to that channel receives the message and forwards it to its local connections

This decouples senders from receivers. Server A doesn't need to know Bob is on Server B—it just publishes to `user:bob`, and whoever's subscribed receives it.
```typescript
import Redis from 'ioredis';
import WebSocket, { WebSocketServer } from 'ws';

// Placeholder helpers assumed by this example:
declare function authenticateUser(request: import('http').IncomingMessage): string;
declare function getUserRooms(userId: string): Promise<string[]>;
declare function broadcastToRoom(room: string, message: unknown): void;

// Two Redis clients: one for pub, one for sub (required by Redis)
const publisher = new Redis();
const subscriber = new Redis();

const wss = new WebSocketServer({ port: 8080 });

// Map: userId -> local WebSocket connection
const localConnections = new Map<string, WebSocket>();

// ═══════════════════════════════════════════════════════════════
// SUBSCRIBE TO MESSAGES FROM OTHER SERVERS
// ═══════════════════════════════════════════════════════════════

subscriber.on('message', (channel, messageStr) => {
  const message = JSON.parse(messageStr);

  if (channel.startsWith('user:')) {
    // Direct message to a user
    const userId = channel.replace('user:', '');
    const socket = localConnections.get(userId);
    if (socket && socket.readyState === WebSocket.OPEN) {
      socket.send(messageStr);
    }
  } else if (channel.startsWith('room:')) {
    // Broadcast to all local users in this room
    // (In practice, track room memberships)
    broadcastToRoom(channel.replace('room:', ''), message);
  }
});

// ═══════════════════════════════════════════════════════════════
// HANDLE NEW CONNECTIONS
// ═══════════════════════════════════════════════════════════════

wss.on('connection', async (ws, request) => {
  const userId = authenticateUser(request);

  // Track locally
  localConnections.set(userId, ws);

  // Subscribe to this user's channel
  await subscriber.subscribe(`user:${userId}`);

  // Subscribe to user's rooms
  const userRooms = await getUserRooms(userId);
  for (const room of userRooms) {
    await subscriber.subscribe(`room:${room}`);
  }

  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString());

    switch (message.type) {
      case 'direct_message':
        // Publish to target user's channel
        // Any server with that user subscribed will receive it
        await publisher.publish(
          `user:${message.toUser}`,
          JSON.stringify({
            type: 'chat',
            from: userId,
            content: message.content,
            timestamp: Date.now()
          })
        );
        break;

      case 'room_message':
        // Publish to room channel
        // All servers with users in that room receive it
        await publisher.publish(
          `room:${message.room}`,
          JSON.stringify({
            type: 'chat',
            from: userId,
            room: message.room,
            content: message.content,
            timestamp: Date.now()
          })
        );
        break;

      case 'join_room':
        await subscriber.subscribe(`room:${message.room}`);
        break;
    }
  });

  ws.on('close', async () => {
    localConnections.delete(userId);

    // Unsubscribe from user channel
    await subscriber.unsubscribe(`user:${userId}`);

    // Unsubscribe from rooms if no other local users
    // (implementation depends on reference counting)
  });
});
```

| Technology | Throughput | Persistence | Best For |
|---|---|---|---|
| Redis Pub/Sub | ~100K msg/s per node | None (fire-and-forget) | Simple setups, low-latency needs |
| Redis Streams | ~50K msg/s per node | Yes | Reliable delivery, message history |
| Apache Kafka | 1M+ msg/s (cluster) | Yes (durable) | High throughput, event sourcing |
| NATS | 10M+ msg/s | Optional | Ultra-low latency, cloud-native |
| RabbitMQ | ~50K msg/s | Optional | Complex routing, existing AMQP infra |
Redis Pub/Sub is fire-and-forget: if a server isn't subscribed when a message is published, it misses the message. There's no persistence or replay. For guaranteed delivery, consider Redis Streams, Kafka, or implement your own acknowledgment layer.
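If you do need delivery guarantees, Redis Streams is the lightest-weight upgrade path. Here's a hedged sketch (the stream naming `stream:user:<id>` and the `deliverToLocalSocket` helper are assumptions for illustration): XADD persists each message, so a server that restarts can resume reading from the last ID it processed.

```typescript
import Redis from 'ioredis';

const redis = new Redis();
const reader = new Redis(); // blocking XREAD needs its own connection

// Hypothetical local-delivery helper; a real server would look up the socket
function deliverToLocalSocket(userId: string, payload: string): void {
  console.log(`deliver to ${userId}:`, payload);
}

// XADD appends to the stream and returns the entry ID. Unlike pub/sub,
// entries persist (until trimmed), so a server that was down misses nothing.
async function publishDurable(userId: string, payload: object) {
  return redis.xadd(`stream:user:${userId}`, '*', 'data', JSON.stringify(payload));
}

// Resume from `lastId` ('0' replays the full stream; '$' means only new entries)
async function consumeFor(userId: string, lastId: string) {
  let cursor = lastId;
  for (;;) {
    const results = await reader.xread(
      'BLOCK', 5000, 'STREAMS', `stream:user:${userId}`, cursor
    );
    if (!results) continue; // block timed out; poll again
    for (const [, entries] of results) {
      for (const [id, fields] of entries) {
        deliverToLocalSocket(userId, fields[1]); // fields = ['data', '<json>']
        cursor = id; // persist this ID durably to survive restarts
      }
    }
  }
}
```

If several servers must share one stream, Redis consumer groups (XGROUP/XREADGROUP/XACK) add per-consumer acknowledgment on top of this pattern.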
While pub/sub solves cross-server messaging, there's still the question of how WebSocket connections are initially distributed across servers and how reconnections are handled. Sticky sessions (also called session affinity) ensure that all requests from a user are routed to the same server.
Why sticky sessions for WebSockets:
Connection continuity — The WebSocket upgrade starts as HTTP. The initial request and the subsequent persistent connection must hit the same server.
Reconnection to same server — If a connection drops and the client reconnects quickly, resuming on the same server allows state recovery without pub/sub round-trips.
Reduced pub/sub traffic — Multiple connections from the same user to the same server avoid cross-server routing for self-directed messages.
Sticky session strategies:
```nginx
# NGINX configuration for WebSocket sticky sessions

upstream websocket_servers {
    # IP hash for sticky sessions (simple approach)
    ip_hash;

    server ws1.example.com:8080;
    server ws2.example.com:8080;
    server ws3.example.com:8080;
}

# Alternative: Cookie-based sticky sessions (NGINX Plus or with module)
upstream websocket_servers_cookie {
    sticky cookie srv_id expires=1h domain=.example.com path=/;

    server ws1.example.com:8080;
    server ws2.example.com:8080;
    server ws3.example.com:8080;
}

server {
    listen 443 ssl;
    server_name ws.example.com;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location /ws {
        proxy_pass http://websocket_servers;

        # Required for WebSocket upgrade
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Pass client info
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts for long-lived connections
        proxy_read_timeout 86400s;  # 24 hours
        proxy_send_timeout 86400s;

        # Buffer settings
        proxy_buffering off;
    }
}
```

Consistent hashing is often the best approach for WebSocket routing. Hash the user ID (not IP) to determine server assignment. When servers are added or removed, only 1/N connections need to move. Implement with a library like hashring or use load balancers that support it (HAProxy, Envoy).
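To make the consistent-hashing idea concrete, here's a hand-rolled sketch (this `HashRing` class is illustrative, not the API of the hashring library mentioned above):

```typescript
import { createHash } from 'crypto';

// Maps a userId to one of N servers so that adding or removing a server
// only remaps ~1/N of users.
class HashRing {
  private ring: { point: number; server: string }[] = [];

  constructor(servers: string[], private vnodes = 100) {
    for (const server of servers) this.add(server);
  }

  private hash(key: string): number {
    // First 8 hex chars of md5 -> a 32-bit point on the ring
    return parseInt(createHash('md5').update(key).digest('hex').slice(0, 8), 16);
  }

  add(server: string): void {
    // Virtual nodes smooth the distribution across the ring
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: this.hash(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(server: string): void {
    this.ring = this.ring.filter((n) => n.server !== server);
  }

  // Walk clockwise to the first virtual node at or after the key's point
  getServer(userId: string): string {
    const point = this.hash(userId);
    const node = this.ring.find((n) => n.point >= point) ?? this.ring[0];
    return node.server;
  }
}

const ring = new HashRing(['ws1.example.com', 'ws2.example.com', 'ws3.example.com']);
console.log(ring.getServer('alice')); // deterministic assignment
```

Hashing the user ID rather than the client IP keeps users behind a shared NAT from piling onto one server, and lets a client that reconnects from a new network still land on the same server.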
Managing millions of concurrent connections requires sophisticated patterns for tracking, organizing, and efficiently operating on connection groups. The naive approach of iterating through all connections for broadcasts quickly becomes a bottleneck.
Connection organization patterns:
```typescript
import WebSocket from 'ws';

// ═══════════════════════════════════════════════════════════════
// MULTI-INDEX CONNECTION MANAGER
// ═══════════════════════════════════════════════════════════════

interface Connection {
  socket: WebSocket;
  userId: string;
  deviceId: string;
  rooms: Set<string>;
  lastActivity: number;
  metadata: Record<string, any>;
}

class ConnectionManager {
  // Primary index: connectionId -> Connection
  private connections = new Map<string, Connection>();

  // Secondary indexes for fast lookups
  private byUserId = new Map<string, Set<string>>();     // userId -> connectionIds
  private byRoom = new Map<string, Set<string>>();       // room -> connectionIds
  private byDeviceType = new Map<string, Set<string>>(); // deviceType -> connectionIds

  // Activity tracking for cleanup
  private activityQueue: string[] = [];

  addConnection(connId: string, conn: Connection): void {
    this.connections.set(connId, conn);

    // Index by user (one user may have multiple connections)
    if (!this.byUserId.has(conn.userId)) {
      this.byUserId.set(conn.userId, new Set());
    }
    this.byUserId.get(conn.userId)!.add(connId);

    // Index by device type
    const deviceType = conn.metadata.deviceType || 'unknown';
    if (!this.byDeviceType.has(deviceType)) {
      this.byDeviceType.set(deviceType, new Set());
    }
    this.byDeviceType.get(deviceType)!.add(connId);
  }

  removeConnection(connId: string): void {
    const conn = this.connections.get(connId);
    if (!conn) return;

    // Remove from all indexes
    this.byUserId.get(conn.userId)?.delete(connId);
    this.byDeviceType.get(conn.metadata.deviceType || 'unknown')?.delete(connId);
    for (const room of conn.rooms) {
      this.byRoom.get(room)?.delete(connId);
    }

    this.connections.delete(connId);
  }

  joinRoom(connId: string, room: string): void {
    const conn = this.connections.get(connId);
    if (!conn) return;

    conn.rooms.add(room);

    if (!this.byRoom.has(room)) {
      this.byRoom.set(room, new Set());
    }
    this.byRoom.get(room)!.add(connId);
  }

  // Efficient broadcast to room (O(room size), not O(total connections))
  broadcastToRoom(room: string, message: string): void {
    const connIds = this.byRoom.get(room);
    if (!connIds) return;

    for (const connId of connIds) {
      const conn = this.connections.get(connId);
      if (conn && conn.socket.readyState === WebSocket.OPEN) {
        conn.socket.send(message);
      }
    }
  }

  // Send to specific user (all their devices)
  sendToUser(userId: string, message: string): void {
    const connIds = this.byUserId.get(userId);
    if (!connIds) return;

    for (const connId of connIds) {
      const conn = this.connections.get(connId);
      if (conn && conn.socket.readyState === WebSocket.OPEN) {
        conn.socket.send(message);
      }
    }
  }

  // Periodic cleanup of idle connections
  cleanupIdleConnections(maxIdleMs: number): number {
    const now = Date.now();
    let cleaned = 0;

    for (const [connId, conn] of this.connections) {
      if (now - conn.lastActivity > maxIdleMs) {
        conn.socket.close(1000, 'Idle timeout');
        this.removeConnection(connId);
        cleaned++;
      }
    }

    return cleaned;
  }

  getStats(): { total: number; byRoom: Record<string, number> } {
    const byRoom: Record<string, number> = {};
    for (const [room, connIds] of this.byRoom) {
      byRoom[room] = connIds.size;
    }
    return { total: this.connections.size, byRoom };
  }
}
```

Batching and buffering:
At high message rates, sending individual messages creates overhead. Batching multiple messages into single network operations improves throughput:
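A sketch of one way to do this (the `BatchingSender` class and its thresholds are illustrative assumptions, not a library API): messages are coalesced per connection and flushed on a size threshold or a short timer, trading a few milliseconds of latency for far fewer frames and syscalls.

```typescript
import WebSocket from 'ws';

class BatchingSender {
  private queue: string[] = [];
  private timer: NodeJS.Timeout | null = null;

  constructor(
    private socket: WebSocket,
    private maxBatch = 50,   // flush when this many messages are queued
    private maxDelayMs = 20  // or when the oldest queued message is this old
  ) {}

  send(message: string): void {
    this.queue.push(message);
    if (this.queue.length >= this.maxBatch) {
      this.flush();
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
    }
  }

  private flush(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0 || this.socket.readyState !== WebSocket.OPEN) return;

    // One frame carrying a JSON array instead of N separate frames;
    // the client must know to unwrap batched payloads.
    this.socket.send(JSON.stringify(this.queue.map((m) => JSON.parse(m))));
    this.queue = [];
  }
}
```

The latency cost is bounded: no message waits longer than `maxDelayMs` before it's flushed.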
Priority queuing:
Not all messages are equal. Implement per-connection priority queues to ensure high-priority messages (auth challenges, error notifications) aren't stuck behind bulk updates.
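A minimal sketch of the idea (the `PriorityOutbox` shape is an assumption for illustration): each connection drains its highest-priority non-empty queue first, so urgent messages never sit behind bulk traffic.

```typescript
type Priority = 0 | 1 | 2; // 0 = urgent (auth, errors), 1 = normal, 2 = bulk

class PriorityOutbox {
  // One FIFO queue per priority level
  private queues: string[][] = [[], [], []];

  enqueue(message: string, priority: Priority): void {
    this.queues[priority].push(message);
  }

  // Always serve the highest-priority non-empty queue first
  dequeue(): string | undefined {
    for (const q of this.queues) {
      if (q.length > 0) return q.shift();
    }
    return undefined;
  }
}

// Usage: the send loop pulls from dequeue() whenever the socket can accept
// more data, so an auth challenge enqueued at priority 0 jumps ahead of
// thousands of pending bulk updates.
```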
Let's assemble the patterns we've discussed into complete horizontal scaling architectures. These designs are proven in production at scale by companies handling millions of concurrent connections.
```text
═══════════════════════════════════════════════════════════════
ARCHITECTURE DIAGRAM
═══════════════════════════════════════════════════════════════

                 ┌──────────────────────────────────────┐
                 │        Global Load Balancer          │
                 │   (GeoDNS / Anycast / Cloudflare)    │
                 └────────────────┬─────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
 ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
 │ Region: US-East │     │ Region: EU-West │     │ Region: APAC    │
 └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
          │                       │                       │
 ┌────────┴────────┐     ┌────────┴────────┐     ┌────────┴────────┐
 │    L7 Load      │     │    L7 Load      │     │    L7 Load      │
 │    Balancer     │     │    Balancer     │     │    Balancer     │
 │  (Consistent    │     │  (Consistent    │     │  (Consistent    │
 │    Hashing)     │     │    Hashing)     │     │    Hashing)     │
 └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
          │                       │                       │
 ┌────────┼────────┐     ┌────────┼────────┐     ┌────────┼────────┐
 ▼        ▼        ▼     ▼        ▼        ▼     ▼        ▼        ▼
┌───┐   ┌───┐   ┌───┐  ┌───┐   ┌───┐   ┌───┐  ┌───┐   ┌───┐   ┌───┐
│WS1│   │WS2│   │WSn│  │WS1│   │WS2│   │WSn│  │WS1│   │WS2│   │WSn│
└─┬─┘   └─┬─┘   └─┬─┘  └─┬─┘   └─┬─┘   └─┬─┘  └─┬─┘   └─┬─┘   └─┬─┘
  │       │       │      │       │       │      │       │       │
  └───────┴───────┼──────┴───────┴───────┼──────┴───────┴───────┘
                  │                      │
         ┌────────┴────────┐    ┌────────┴────────┐
         │ Message Broker  │◄──►│ Message Broker  │
         │ (Redis Cluster) │    │    (Kafka)      │
         └────────┬────────┘    └────────┬────────┘
                  │                      │
                  ▼                      ▼
        ┌──────────────────┐   ┌──────────────────┐
        │ Presence Service │   │ Message Service  │
        │  (who's online)  │   │ (history, search)│
        └──────────────────┘   └──────────────────┘

═══════════════════════════════════════════════════════════════
KEY COMPONENTS
═══════════════════════════════════════════════════════════════

1. GLOBAL LOAD BALANCER
   - Routes users to nearest region (latency optimization)
   - GeoDNS or Anycast for global distribution
   - Health checking per region

2. L7 (APPLICATION) LOAD BALANCER
   - Understands WebSocket upgrade headers
   - Consistent hashing by user ID
   - Drains connections gracefully on server removal
   - Examples: HAProxy, Envoy, NGINX Plus

3. WEBSOCKET SERVERS (STATELESS-ISH)
   - Hold connections and local state
   - Subscribe to pub/sub for incoming messages
   - Publish outgoing messages to pub/sub
   - Can be scaled horizontally with autoscaling

4. MESSAGE BROKER
   - Enables cross-server communication
   - Redis for simple pub/sub
   - Kafka for durability and replay
   - Can be clustered for HA and scale

5. SUPPORTING SERVICES
   - Presence: Track who's online across all servers
   - Message History: Persistent storage for catch-up
   - User Sessions: Authentication state
```

| Scale | Connections | Key Challenges | Strategy |
|---|---|---|---|
| Small | < 10K | Getting it working | Single server, embedded pub/sub |
| Medium | 10K - 100K | Horizontal scaling | Multiple servers, Redis pub/sub, sticky sessions |
| Large | 100K - 1M | Pub/sub bottlenecks | Redis Cluster, connection sharding, batching |
| Massive | 1M - 10M | Global distribution | Multi-region, Kafka, edge presence, protocol optimization |
| Extreme | > 10M | Custom everything | Custom protocols, kernel bypass, specialized hardware |
Let's examine how major companies have solved WebSocket scaling challenges. These examples provide patterns and numbers that inform architectural decisions.
Slack: Enterprise Messaging at Scale
Slack handles millions of concurrent WebSocket connections for real-time messaging.
Key insight: Slack shards by workspace, so most messages stay within a workspace's server cluster.
Discord: Gaming Communications
Discord serves hundreds of millions of users with extreme real-time requirements.
Key insight: Discord's guild sharding means most messages don't cross server boundaries.
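Discord's public gateway documentation describes how the shard for a guild is computed; a minimal sketch of that formula (the example IDs below are made up):

```typescript
// Shard routing per Discord's gateway docs: shard_id = (guild_id >> 22) % num_shards.
// Shifting off the low 22 bits of the snowflake ID leaves its timestamp portion,
// which spreads guilds evenly across shards.
function shardForGuild(guildId: bigint, numShards: bigint): bigint {
  return (guildId >> 22n) % numShards;
}

// Every message for a guild deterministically lands on one shard,
// so most traffic never crosses server boundaries.
const shard = shardForGuild(613425648685547541n, 16n); // hypothetical guild ID
console.log(`guild maps to shard ${shard}`);
```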
Pusher: WebSocket as a Service
Pusher, along with similar services like Ably and PubNub, operates WebSocket infrastructure for other companies.
Key insight: Channel-based routing simplifies architecture—no user-to-server mapping needed.
Building and operating WebSocket infrastructure at scale is complex. Services like Pusher, Ably, AWS API Gateway WebSocket, and Azure SignalR provide managed solutions. Evaluate build vs. buy based on your scale, latency requirements, and engineering capacity.
Scaling WebSockets presents unique challenges stemming from their stateful, long-lived nature. Let's consolidate the key strategies:

- Tune OS limits (file descriptors, socket buffers, connection backlogs) to maximize single-server capacity
- Route messages between servers with pub/sub (Redis, Kafka, NATS) so no server needs to know where every user is connected
- Use sticky sessions with consistent hashing so the upgrade request and quick reconnections land on the same server
- Maintain multi-index connection registries so broadcasts cost O(room size), not O(total connections)
- Batch and prioritize messages at high rates to cut per-message overhead
- Shard by natural boundaries (workspace, guild, channel) so most traffic stays local
- Weigh managed services (Pusher, Ably, AWS API Gateway WebSocket, Azure SignalR) against operating your own infrastructure
What's next:
We've covered how WebSockets achieve real-time communication and how to scale them. The next page examines WebSocket vs HTTP—a detailed comparison that helps you decide when WebSockets are the right choice and when simpler approaches suffice.
You now understand the challenges and strategies for scaling WebSocket connections. You can design horizontally scalable WebSocket architectures using pub/sub messaging, implement sticky sessions with consistent hashing, and manage connections efficiently at scale.