A single server handling a few hundred WebSocket connections is straightforward. But what happens when your application grows to serve ten thousand users? A hundred thousand? A million? The stateful, long-lived nature of WebSocket connections introduces scaling challenges that simply don't exist with stateless HTTP.
Consider the contrast:
HTTP at scale: Add more servers behind a load balancer. Any server can handle any request. User requests are distributed evenly. Done.
WebSocket at scale: Each connection is bound to a specific server. That server must receive all messages for that connection. When one user sends a message to another, and they're connected to different servers, how does the message route correctly?
This page addresses the fundamental question: How do we architect WebSocket systems that scale horizontally while maintaining the real-time, stateful communication that WebSockets enable?
By the end of this page, you will understand WebSocket scaling challenges, resource management at scale, horizontal scaling strategies using pub/sub, sticky sessions and their trade-offs, connection management patterns, and real-world architectures used by high-scale WebSocket systems.
Before discussing scaling strategies, we must understand what limits a single server's WebSocket capacity. Each connection consumes resources, and understanding these constraints helps you plan capacity and optimize configurations.
Each connection consumes a file descriptor, so start by checking your limits with `ulimit -n` (per process) and `/proc/sys/fs/file-max` (system-wide).

| Resource | Constraint | Typical Limit | Optimization Strategy |
|---|---|---|---|
| File Descriptors | OS limit | 1M+ (with tuning) | Raise ulimit, tune kernel parameters |
| Memory | RAM available | 10K-100K connections per GB | Optimize buffers, reduce per-connection state |
| CPU | Processing capacity | Varies with message rate | Efficient serialization, batch processing |
| Network Bandwidth | NIC capacity | 10 Gbps+ | Compression, binary protocols |
| Event Loop | Concurrency model | Platform-dependent | Use async I/O (Node.js, Go, Rust actors) |
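Before tuning anything, it helps to see where you currently stand. Here is a minimal sketch (assuming Linux and Node.js; the limit-reading approach is an illustration, not part of any framework) that reads the two file descriptor limits mentioned above:

```typescript
import { execSync } from 'child_process';
import { readFileSync } from 'fs';

// Per-process fd limit: `ulimit -n` is a shell builtin, so run it via /bin/sh
const perProcessFds = execSync('ulimit -n', { shell: '/bin/sh' }).toString().trim();

// System-wide fd limit, read directly from procfs (Linux only)
const systemWideFds = readFileSync('/proc/sys/fs/file-max', 'utf8').trim();

console.log(`Per-process fd limit: ${perProcessFds}`); // raise via /etc/security/limits.conf
console.log(`System-wide fd limit: ${systemWideFds}`); // raise via sysctl fs.file-max

// Back-of-envelope capacity from the table above: at roughly 10K-100K
// connections per GB, a 16 GB server might sustain on the order of
// 160K-1.6M connections, assuming lean per-connection buffers.
```

The kernel parameters below raise these and related limits for a production WebSocket host.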
```
# /etc/sysctl.conf optimizations for WebSocket servers

# Increase maximum file descriptors
fs.file-max = 2097152
fs.nr_open = 2097152

# Increase socket buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576

# TCP tuning
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Increase connection backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Allow more local ports for outbound connections
net.ipv4.ip_local_port_range = 1024 65535

# Reduce TIME_WAIT connections
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15

# Increase connection tracking table
net.netfilter.nf_conntrack_max = 2097152

# Application process limits (/etc/security/limits.conf)
# *  soft  nofile  1048576
# *  hard  nofile  1048576
```

The 'C10K problem' (handling 10,000 concurrent connections) was solved decades ago with event-driven I/O. Modern systems target C100K, C1M, or even C10M. Languages like Go, Node.js, and Rust with async runtimes are designed for this scale. Choose your stack wisely—thread-per-connection models won't scale.
WebSocket's stateful nature is simultaneously its greatest strength and its most significant scaling challenge. Unlike HTTP, where any request can be routed to any server, WebSocket messages must be routed to the specific server holding that connection.
The fundamental problem:
Imagine a chat application with two users, Alice and Bob:

1. Alice connects, and the load balancer routes her to Server A.
2. Bob connects and is routed to Server B.
3. Alice sends a message addressed to Bob. It arrives at Server A, which holds no connection to Bob.
How does the message reach Bob?
With HTTP, this isn't a problem—Bob's next request could hit any server, and the response would be the same. But with WebSockets, there is no "next request" from Bob. He's connected to Server B, waiting for messages to arrive on that specific connection.
```typescript
import WebSocket from 'ws';

// Minimal message shape for this example
interface ChatMessage {
  toUser: string;
  content: string;
}

// Naive single-server implementation (works but doesn't scale)
const connections = new Map<string, WebSocket>(); // userId -> socket

function handleMessage(fromUser: string, message: ChatMessage) {
  const recipientSocket = connections.get(message.toUser);
  if (recipientSocket) {
    recipientSocket.send(JSON.stringify(message));
  }
}

// Problem: In a multi-server setup, 'connections' only contains
// users connected to THIS server. If message.toUser is connected
// to a different server, connections.get() returns undefined.
//
//                  ┌─────────────────┐
//                  │  Load Balancer  │
//                  └────────┬────────┘
//                           │
//         ┌─────────────────┼─────────────────┐
//         ▼                 ▼                 ▼
//  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
//  │  Server A   │   │  Server B   │   │  Server C   │
//  │             │   │             │   │             │
//  │ connections:│   │ connections:│   │ connections:│
//  │  - Alice    │   │  - Bob      │   │  - Charlie  │
//  │  - Dave     │   │  - Eve      │   │  - Frank    │
//  └─────────────┘   └─────────────┘   └─────────────┘
//
// Alice on Server A wants to message Bob on Server B.
// Server A has no way to deliver directly—it doesn't know
// Bob exists, let alone which server he's connected to.
```

The standard solution to WebSocket's statefulness problem is pub/sub messaging between servers. Instead of servers communicating directly with each other, they publish messages to a shared message bus, and servers subscribe to messages relevant to their connected users.
How pub/sub solves the routing problem:
1. Each server subscribes to channels for its locally connected users and rooms (e.g., `user:bob`, `room:general`)
2. To deliver a message, a server publishes it to the target channel rather than to a specific server
3. Every server subscribed to that channel receives the message and forwards it to its local connections

This decouples senders from receivers. Server A doesn't need to know Bob is on Server B—it just publishes to `user:bob`, and whoever's subscribed receives it.
```typescript
import Redis from 'ioredis';
import WebSocket, { WebSocketServer } from 'ws';

// Placeholder helpers assumed by this example:
declare function authenticateUser(request: import('http').IncomingMessage): string;
declare function getUserRooms(userId: string): Promise<string[]>;
declare function broadcastToRoom(room: string, message: unknown): void;

// Two Redis clients: one for pub, one for sub (required by Redis)
const publisher = new Redis();
const subscriber = new Redis();

const wss = new WebSocketServer({ port: 8080 });

// Map: userId -> local WebSocket connection
const localConnections = new Map<string, WebSocket>();

// ═══════════════════════════════════════════════════════════════
// SUBSCRIBE TO MESSAGES FROM OTHER SERVERS
// ═══════════════════════════════════════════════════════════════

subscriber.on('message', (channel, messageStr) => {
  const message = JSON.parse(messageStr);

  if (channel.startsWith('user:')) {
    // Direct message to a user
    const userId = channel.replace('user:', '');
    const socket = localConnections.get(userId);
    if (socket && socket.readyState === WebSocket.OPEN) {
      socket.send(messageStr);
    }
  } else if (channel.startsWith('room:')) {
    // Broadcast to all local users in this room
    // (In practice, track room memberships)
    broadcastToRoom(channel.replace('room:', ''), message);
  }
});

// ═══════════════════════════════════════════════════════════════
// HANDLE NEW CONNECTIONS
// ═══════════════════════════════════════════════════════════════

wss.on('connection', async (ws, request) => {
  const userId = authenticateUser(request);

  // Track locally
  localConnections.set(userId, ws);

  // Subscribe to this user's channel
  await subscriber.subscribe(`user:${userId}`);

  // Subscribe to user's rooms
  const userRooms = await getUserRooms(userId);
  for (const room of userRooms) {
    await subscriber.subscribe(`room:${room}`);
  }

  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString());

    switch (message.type) {
      case 'direct_message':
        // Publish to target user's channel
        // Any server with that user subscribed will receive it
        await publisher.publish(
          `user:${message.toUser}`,
          JSON.stringify({
            type: 'chat',
            from: userId,
            content: message.content,
            timestamp: Date.now()
          })
        );
        break;

      case 'room_message':
        // Publish to room channel
        // All servers with users in that room receive it
        await publisher.publish(
          `room:${message.room}`,
          JSON.stringify({
            type: 'chat',
            from: userId,
            room: message.room,
            content: message.content,
            timestamp: Date.now()
          })
        );
        break;

      case 'join_room':
        await subscriber.subscribe(`room:${message.room}`);
        break;
    }
  });

  ws.on('close', async () => {
    localConnections.delete(userId);

    // Unsubscribe from user channel
    await subscriber.unsubscribe(`user:${userId}`);

    // Unsubscribe from rooms if no other local users
    // (implementation depends on reference counting)
  });
});
```

| Technology | Throughput | Persistence | Best For |
|---|---|---|---|
| Redis Pub/Sub | ~100K msg/s per node | None (fire-and-forget) | Simple setups, low-latency needs |
| Redis Streams | ~50K msg/s per node | Yes | Reliable delivery, message history |
| Apache Kafka | 1M+ msg/s (cluster) | Yes (durable) | High throughput, event sourcing |
| NATS | 10M+ msg/s | Optional | Ultra-low latency, cloud-native |
| RabbitMQ | ~50K msg/s | Optional | Complex routing, existing AMQP infra |
Redis Pub/Sub is fire-and-forget: if a server isn't subscribed when a message is published, it misses the message. There's no persistence or replay. For guaranteed delivery, consider Redis Streams, Kafka, or implement your own acknowledgment layer.
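If you do need delivery guarantees, Redis Streams is the lightest-weight upgrade path. Here's a hedged sketch (the stream naming `stream:user:<id>` and the `deliverToLocalSocket` helper are assumptions for illustration): XADD persists each message, so a server that restarts can resume reading from the last ID it processed.

```typescript
import Redis from 'ioredis';

const redis = new Redis();
const reader = new Redis(); // blocking XREAD needs its own connection

// Hypothetical local-delivery helper; a real server would look up the socket
function deliverToLocalSocket(userId: string, payload: string): void {
  console.log(`deliver to ${userId}:`, payload);
}

// XADD appends to the stream and returns the entry ID. Unlike pub/sub,
// entries persist (until trimmed), so a server that was down misses nothing.
async function publishDurable(userId: string, payload: object) {
  return redis.xadd(`stream:user:${userId}`, '*', 'data', JSON.stringify(payload));
}

// Resume from `lastId` ('0' replays the full stream; '$' means only new entries)
async function consumeFor(userId: string, lastId: string) {
  let cursor = lastId;
  for (;;) {
    const results = await reader.xread(
      'BLOCK', 5000, 'STREAMS', `stream:user:${userId}`, cursor
    );
    if (!results) continue; // block timed out; poll again
    for (const [, entries] of results) {
      for (const [id, fields] of entries) {
        deliverToLocalSocket(userId, fields[1]); // fields = ['data', '<json>']
        cursor = id; // persist this ID durably to survive restarts
      }
    }
  }
}
```

If several servers must share one stream, Redis consumer groups (XGROUP/XREADGROUP/XACK) add per-consumer acknowledgment on top of this pattern.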
While pub/sub solves cross-server messaging, there's still the question of how WebSocket connections are initially distributed across servers and how reconnections are handled. Sticky sessions (also called session affinity) ensure that all requests from a user are routed to the same server.
Why sticky sessions for WebSockets:
Connection continuity — The WebSocket upgrade starts as HTTP. The initial request and the subsequent persistent connection must hit the same server.
Reconnection to same server — If a connection drops and the client reconnects quickly, resuming on the same server allows state recovery without pub/sub round-trips.
Reduced pub/sub traffic — Multiple connections from the same user to the same server avoid cross-server routing for self-directed messages.
Sticky session strategies:
```nginx
# NGINX configuration for WebSocket sticky sessions

upstream websocket_servers {
    # IP hash for sticky sessions (simple approach)
    ip_hash;

    server ws1.example.com:8080;
    server ws2.example.com:8080;
    server ws3.example.com:8080;
}

# Alternative: Cookie-based sticky sessions (NGINX Plus or with module)
upstream websocket_servers_cookie {
    sticky cookie srv_id expires=1h domain=.example.com path=/;

    server ws1.example.com:8080;
    server ws2.example.com:8080;
    server ws3.example.com:8080;
}

server {
    listen 443 ssl;
    server_name ws.example.com;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location /ws {
        proxy_pass http://websocket_servers;

        # Required for WebSocket upgrade
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Pass client info
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts for long-lived connections
        proxy_read_timeout 86400s;  # 24 hours
        proxy_send_timeout 86400s;

        # Buffer settings
        proxy_buffering off;
    }
}
```

Consistent hashing is often the best approach for WebSocket routing. Hash the user ID (not IP) to determine server assignment. When servers are added or removed, only 1/N connections need to move. Implement with a library like hashring or use load balancers that support it (HAProxy, Envoy).
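To make the consistent-hashing idea concrete, here's a hand-rolled sketch (this `HashRing` class is illustrative, not the API of the hashring library mentioned above):

```typescript
import { createHash } from 'crypto';

// Maps a userId to one of N servers so that adding or removing a server
// only remaps ~1/N of users.
class HashRing {
  private ring: { point: number; server: string }[] = [];

  constructor(servers: string[], private vnodes = 100) {
    for (const server of servers) this.add(server);
  }

  private hash(key: string): number {
    // First 8 hex chars of md5 -> a 32-bit point on the ring
    return parseInt(createHash('md5').update(key).digest('hex').slice(0, 8), 16);
  }

  add(server: string): void {
    // Virtual nodes smooth the distribution across the ring
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: this.hash(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(server: string): void {
    this.ring = this.ring.filter((n) => n.server !== server);
  }

  // Walk clockwise to the first virtual node at or after the key's point
  getServer(userId: string): string {
    const point = this.hash(userId);
    const node = this.ring.find((n) => n.point >= point) ?? this.ring[0];
    return node.server;
  }
}

const ring = new HashRing(['ws1.example.com', 'ws2.example.com', 'ws3.example.com']);
console.log(ring.getServer('alice')); // deterministic assignment
```

Hashing the user ID rather than the client IP keeps users behind a shared NAT from piling onto one server, and lets a client that reconnects from a new network still land on the same server.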
Managing millions of concurrent connections requires sophisticated patterns for tracking, organizing, and efficiently operating on connection groups. The naive approach of iterating through all connections for broadcasts quickly becomes a bottleneck.
Connection organization patterns:
```typescript
import WebSocket from 'ws';

// ═══════════════════════════════════════════════════════════════
// MULTI-INDEX CONNECTION MANAGER
// ═══════════════════════════════════════════════════════════════

interface Connection {
  socket: WebSocket;
  userId: string;
  deviceId: string;
  rooms: Set<string>;
  lastActivity: number;
  metadata: Record<string, any>;
}

class ConnectionManager {
  // Primary index: connectionId -> Connection
  private connections = new Map<string, Connection>();

  // Secondary indexes for fast lookups
  private byUserId = new Map<string, Set<string>>();     // userId -> connectionIds
  private byRoom = new Map<string, Set<string>>();       // room -> connectionIds
  private byDeviceType = new Map<string, Set<string>>(); // deviceType -> connectionIds

  // Activity tracking for cleanup
  private activityQueue: string[] = [];

  addConnection(connId: string, conn: Connection): void {
    this.connections.set(connId, conn);

    // Index by user (one user may have multiple connections)
    if (!this.byUserId.has(conn.userId)) {
      this.byUserId.set(conn.userId, new Set());
    }
    this.byUserId.get(conn.userId)!.add(connId);

    // Index by device type
    const deviceType = conn.metadata.deviceType || 'unknown';
    if (!this.byDeviceType.has(deviceType)) {
      this.byDeviceType.set(deviceType, new Set());
    }
    this.byDeviceType.get(deviceType)!.add(connId);
  }

  removeConnection(connId: string): void {
    const conn = this.connections.get(connId);
    if (!conn) return;

    // Remove from all indexes
    this.byUserId.get(conn.userId)?.delete(connId);
    this.byDeviceType.get(conn.metadata.deviceType || 'unknown')?.delete(connId);
    for (const room of conn.rooms) {
      this.byRoom.get(room)?.delete(connId);
    }

    this.connections.delete(connId);
  }

  joinRoom(connId: string, room: string): void {
    const conn = this.connections.get(connId);
    if (!conn) return;

    conn.rooms.add(room);

    if (!this.byRoom.has(room)) {
      this.byRoom.set(room, new Set());
    }
    this.byRoom.get(room)!.add(connId);
  }

  // Efficient broadcast to room (O(room size), not O(total connections))
  broadcastToRoom(room: string, message: string): void {
    const connIds = this.byRoom.get(room);
    if (!connIds) return;

    for (const connId of connIds) {
      const conn = this.connections.get(connId);
      if (conn && conn.socket.readyState === WebSocket.OPEN) {
        conn.socket.send(message);
      }
    }
  }

  // Send to specific user (all their devices)
  sendToUser(userId: string, message: string): void {
    const connIds = this.byUserId.get(userId);
    if (!connIds) return;

    for (const connId of connIds) {
      const conn = this.connections.get(connId);
      if (conn && conn.socket.readyState === WebSocket.OPEN) {
        conn.socket.send(message);
      }
    }
  }

  // Periodic cleanup of idle connections
  cleanupIdleConnections(maxIdleMs: number): number {
    const now = Date.now();
    let cleaned = 0;

    for (const [connId, conn] of this.connections) {
      if (now - conn.lastActivity > maxIdleMs) {
        conn.socket.close(1000, 'Idle timeout');
        this.removeConnection(connId);
        cleaned++;
      }
    }

    return cleaned;
  }

  getStats(): { total: number; byRoom: Record<string, number> } {
    const byRoom: Record<string, number> = {};
    for (const [room, connIds] of this.byRoom) {
      byRoom[room] = connIds.size;
    }
    return { total: this.connections.size, byRoom };
  }
}
```

Batching and buffering:
At high message rates, sending individual messages creates overhead. Batching multiple messages into single network operations improves throughput:
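A sketch of one way to do this (the `BatchingSender` class and its thresholds are illustrative assumptions, not a library API): messages are coalesced per connection and flushed on a size threshold or a short timer, trading a few milliseconds of latency for far fewer frames and syscalls.

```typescript
import WebSocket from 'ws';

class BatchingSender {
  private queue: string[] = [];
  private timer: NodeJS.Timeout | null = null;

  constructor(
    private socket: WebSocket,
    private maxBatch = 50,   // flush when this many messages are queued
    private maxDelayMs = 20  // or when the oldest queued message is this old
  ) {}

  send(message: string): void {
    this.queue.push(message);
    if (this.queue.length >= this.maxBatch) {
      this.flush();
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
    }
  }

  private flush(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0 || this.socket.readyState !== WebSocket.OPEN) return;

    // One frame carrying a JSON array instead of N separate frames;
    // the client must know to unwrap batched payloads.
    this.socket.send(JSON.stringify(this.queue.map((m) => JSON.parse(m))));
    this.queue = [];
  }
}
```

The latency cost is bounded: no message waits longer than `maxDelayMs` before it's flushed.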
Priority queuing:
Not all messages are equal. Implement per-connection priority queues to ensure high-priority messages (auth challenges, error notifications) aren't stuck behind bulk updates.
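A minimal sketch of the idea (the `PriorityOutbox` shape is an assumption for illustration): each connection drains its highest-priority non-empty queue first, so urgent messages never sit behind bulk traffic.

```typescript
type Priority = 0 | 1 | 2; // 0 = urgent (auth, errors), 1 = normal, 2 = bulk

class PriorityOutbox {
  // One FIFO queue per priority level
  private queues: string[][] = [[], [], []];

  enqueue(message: string, priority: Priority): void {
    this.queues[priority].push(message);
  }

  // Always serve the highest-priority non-empty queue first
  dequeue(): string | undefined {
    for (const q of this.queues) {
      if (q.length > 0) return q.shift();
    }
    return undefined;
  }
}

// Usage: the send loop pulls from dequeue() whenever the socket can accept
// more data, so an auth challenge enqueued at priority 0 jumps ahead of
// thousands of pending bulk updates.
```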
Let's assemble the patterns we've discussed into complete horizontal scaling architectures. These designs are proven in production at scale by companies handling millions of concurrent connections.
```text
═══════════════════════════════════════════════════════════════
ARCHITECTURE DIAGRAM
═══════════════════════════════════════════════════════════════

                 ┌──────────────────────────────────────┐
                 │        Global Load Balancer          │
                 │   (GeoDNS / Anycast / Cloudflare)    │
                 └────────────────┬─────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
 ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
 │ Region: US-East │     │ Region: EU-West │     │ Region: APAC    │
 └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
          │                       │                       │
 ┌────────┴────────┐     ┌────────┴────────┐     ┌────────┴────────┐
 │    L7 Load      │     │    L7 Load      │     │    L7 Load      │
 │    Balancer     │     │    Balancer     │     │    Balancer     │
 │  (Consistent    │     │  (Consistent    │     │  (Consistent    │
 │    Hashing)     │     │    Hashing)     │     │    Hashing)     │
 └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
          │                       │                       │
 ┌────────┼────────┐     ┌────────┼────────┐     ┌────────┼────────┐
 ▼        ▼        ▼     ▼        ▼        ▼     ▼        ▼        ▼
┌───┐   ┌───┐   ┌───┐  ┌───┐   ┌───┐   ┌───┐  ┌───┐   ┌───┐   ┌───┐
│WS1│   │WS2│   │WSn│  │WS1│   │WS2│   │WSn│  │WS1│   │WS2│   │WSn│
└─┬─┘   └─┬─┘   └─┬─┘  └─┬─┘   └─┬─┘   └─┬─┘  └─┬─┘   └─┬─┘   └─┬─┘
  │       │       │      │       │       │      │       │       │
  └───────┴───────┼──────┴───────┴───────┼──────┴───────┴───────┘
                  │                      │
         ┌────────┴────────┐    ┌────────┴────────┐
         │ Message Broker  │◄──►│ Message Broker  │
         │ (Redis Cluster) │    │    (Kafka)      │
         └────────┬────────┘    └────────┬────────┘
                  │                      │
                  ▼                      ▼
        ┌──────────────────┐   ┌──────────────────┐
        │ Presence Service │   │ Message Service  │
        │  (who's online)  │   │ (history, search)│
        └──────────────────┘   └──────────────────┘

═══════════════════════════════════════════════════════════════
KEY COMPONENTS
═══════════════════════════════════════════════════════════════

1. GLOBAL LOAD BALANCER
   - Routes users to nearest region (latency optimization)
   - GeoDNS or Anycast for global distribution
   - Health checking per region

2. L7 (APPLICATION) LOAD BALANCER
   - Understands WebSocket upgrade headers
   - Consistent hashing by user ID
   - Drains connections gracefully on server removal
   - Examples: HAProxy, Envoy, NGINX Plus

3. WEBSOCKET SERVERS (STATELESS-ISH)
   - Hold connections and local state
   - Subscribe to pub/sub for incoming messages
   - Publish outgoing messages to pub/sub
   - Can be scaled horizontally with autoscaling

4. MESSAGE BROKER
   - Enables cross-server communication
   - Redis for simple pub/sub
   - Kafka for durability and replay
   - Can be clustered for HA and scale

5. SUPPORTING SERVICES
   - Presence: Track who's online across all servers
   - Message History: Persistent storage for catch-up
   - User Sessions: Authentication state
```

| Scale | Connections | Key Challenges | Strategy |
|---|---|---|---|
| Small | < 10K | Getting it working | Single server, embedded pub/sub |
| Medium | 10K - 100K | Horizontal scaling | Multiple servers, Redis pub/sub, sticky sessions |
| Large | 100K - 1M | Pub/sub bottlenecks | Redis Cluster, connection sharding, batching |
| Massive | 1M - 10M | Global distribution | Multi-region, Kafka, edge presence, protocol optimization |
| Extreme | > 10M | Custom everything | Custom protocols, kernel bypass, specialized hardware |
Let's examine how major companies have solved WebSocket scaling challenges. These examples provide patterns and numbers that inform architectural decisions.
Slack: Enterprise Messaging at Scale
Slack handles millions of concurrent WebSocket connections for real-time messaging.
Key insight: Slack shards by workspace, so most messages stay within a workspace's server cluster.
Discord: Gaming Communications
Discord serves hundreds of millions of users with extreme real-time requirements.
Key insight: Discord's guild sharding means most messages don't cross server boundaries.
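Discord's public gateway documentation describes how the shard for a guild is computed; a minimal sketch of that formula (the example IDs below are made up):

```typescript
// Shard routing per Discord's gateway docs: shard_id = (guild_id >> 22) % num_shards.
// Shifting off the low 22 bits of the snowflake ID leaves its timestamp portion,
// which spreads guilds evenly across shards.
function shardForGuild(guildId: bigint, numShards: bigint): bigint {
  return (guildId >> 22n) % numShards;
}

// Every message for a guild deterministically lands on one shard,
// so most traffic never crosses server boundaries.
const shard = shardForGuild(613425648685547541n, 16n); // hypothetical guild ID
console.log(`guild maps to shard ${shard}`);
```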
Pusher: WebSocket as a Service
Pusher, along with similar services like Ably and PubNub, operates WebSocket infrastructure for other companies.
Key insight: Channel-based routing simplifies architecture—no user-to-server mapping needed.
Building and operating WebSocket infrastructure at scale is complex. Services like Pusher, Ably, AWS API Gateway WebSocket, and Azure SignalR provide managed solutions. Evaluate build vs. buy based on your scale, latency requirements, and engineering capacity.
Scaling WebSockets presents unique challenges stemming from their stateful, long-lived nature. Let's consolidate the key strategies:

- Tune OS limits (file descriptors, socket buffers, connection backlogs) to maximize single-server capacity
- Route messages between servers with pub/sub (Redis, Kafka, NATS) so no server needs to know where every user is connected
- Use sticky sessions with consistent hashing so the upgrade request and quick reconnections land on the same server
- Maintain multi-index connection registries so broadcasts cost O(room size), not O(total connections)
- Batch and prioritize messages at high rates to cut per-message overhead
- Shard by natural boundaries (workspace, guild, channel) so most traffic stays local
- Weigh managed services (Pusher, Ably, AWS API Gateway WebSocket, Azure SignalR) against operating your own infrastructure
What's next:
We've covered how WebSockets achieve real-time communication and how to scale them. The next page examines WebSocket vs HTTP—a detailed comparison that helps you decide when WebSockets are the right choice and when simpler approaches suffice.
You now understand the challenges and strategies for scaling WebSocket connections. You can design horizontally scalable WebSocket architectures using pub/sub messaging, implement sticky sessions with consistent hashing, and manage connections efficiently at scale.