When Discord reports handling 5 million concurrent WebSocket connections per server cluster, or when Slack maintains persistent connections for millions of simultaneously active users, they're not using magic—they're applying rigorous connection management patterns that have evolved through decades of distributed systems research.
Managing real-time connections at scale is fundamentally different from handling HTTP request-response traffic. In traditional HTTP, connections are ephemeral: a request comes in, gets processed, response goes out, connection closes. But real-time systems maintain persistent, stateful connections that can last hours or days. Each connection consumes memory, file descriptors, and CPU cycles—resources that must be carefully managed to prevent system collapse.
This page dives deep into the architectural patterns, resource optimization strategies, and battle-tested practices that enable systems to maintain millions of concurrent real-time connections while providing sub-second latency guarantees.
By the end of this page, you will understand:

- The fundamental resource constraints in connection management
- How to architect connection servers for horizontal scalability
- Memory optimization techniques for connection state
- Connection pooling and multiplexing patterns
- How industry leaders like Discord, Slack, and WhatsApp manage connections at massive scale
Before designing connection management systems, we must understand what each connection actually consumes. Every persistent connection—whether WebSocket, SSE, or long-polling—creates resource overhead across multiple system layers.
System Resources Consumed Per Connection:
| Resource | Typical Consumption | Impact at 1M Connections | Optimization Potential |
|---|---|---|---|
| File Descriptors | 1 per connection | 1,000,000 FDs needed | Kernel tuning (ulimit, sysctl) |
| TCP Buffer Memory | ~87KB (send + receive) | ~87GB total | Buffer tuning, lazy allocation |
| Application State | 0.5KB - 10KB | 500MB - 10GB | State externalization, compression |
| Thread/Goroutine | 1 per connection (naive) | 1M threads = crash | Event-driven I/O, async runtimes |
| Kernel Socket Structures | ~400 bytes | ~400MB | Connection pooling |
| TLS Session State | ~50KB if TLS | ~50GB if TLS | TLS session resumption, offloading |
The File Descriptor Limit:
Operating systems impose limits on file descriptor usage. On Linux, the default per-process limit is often 1024, with a system-wide limit around 100,000. Handling a million connections requires:
- Raising the per-process limit: `ulimit -n 1000000` (or the equivalent `LimitNOFILE` setting in a systemd unit)
- Raising system-wide limits in `/etc/sysctl.conf` (`fs.file-max`, `fs.nr_open`)
- Tuning socket buffer sizes via `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem` so buffer memory doesn't exhaust RAM first
- Expanding `net.ipv4.ip_local_port_range` for the outbound connections the server makes to backend services
```
# Linux kernel parameters for high-connection workloads
# /etc/sysctl.conf or /etc/sysctl.d/99-realtime.conf

# Increase system-wide file descriptor limits
fs.file-max = 2000000
fs.nr_open = 2000000

# Increase socket buffer sizes (tune based on workload)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576

# TCP buffer tuning (min, default, max)
# Smaller buffers = more connections, higher latency
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216

# Enable TCP memory management
net.ipv4.tcp_mem = 786432 1048576 1572864

# Increase connection tracking table
net.netfilter.nf_conntrack_max = 2000000

# Expand ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535

# Enable TCP keepalive (essential for detecting dead connections)
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5

# Increase backlog queue for incoming connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Enable TCP Fast Open (reduces handshake latency)
net.ipv4.tcp_fastopen = 3
```

While file descriptors can be tuned, memory cannot be arbitrarily increased. A naive implementation storing 10KB of state per connection requires 10GB of RAM for 1 million connections—before accounting for application code, garbage collection overhead, or TCP buffers. Memory optimization is paramount.
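The kernel settings above are only half of the story: the server process must also raise its own soft `RLIMIT_NOFILE` at startup (or inherit a high `LimitNOFILE` from its systemd unit). A minimal Go sketch, assuming a Linux host; `raiseFDLimit` is an illustrative helper, not part of any library:

```go
package main

import (
	"log"
	"syscall"
)

// raiseFDLimit lifts the process's soft file-descriptor limit toward the hard
// limit so the server can actually hold the connections the kernel allows.
// The hard limit itself must still be raised via ulimit/systemd/sysctl.
func raiseFDLimit(target uint64) error {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return err
	}
	if target > rl.Max {
		target = rl.Max // cannot exceed the hard limit without privileges
	}
	rl.Cur = target
	return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl)
}

func main() {
	if err := raiseFDLimit(1_000_000); err != nil {
		log.Fatalf("failed to raise RLIMIT_NOFILE: %v", err)
	}
	log.Println("file descriptor limit raised")
}
```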
The traditional "thread-per-connection" model collapses at scale. Creating a thread for each of 100,000 connections would require:

- Tens to hundreds of gigabytes of address space for thread stacks (default stacks are commonly 1–8 MB each)
- Constant context switching among 100,000 threads, burning CPU on scheduler overhead instead of useful work
- Per-thread kernel structures that add further memory pressure
Event-driven architectures solve this by using a small number of threads to handle many connections through non-blocking I/O and event loops. Instead of blocking a thread while waiting for data, the system registers interest in events and processes them asynchronously.
Key I/O Multiplexing Primitives:
| Platform | Mechanism | Scalability | Notes |
|---|---|---|---|
| Linux | epoll | O(1) for ready events | Edge-triggered mode preferred for performance |
| BSD/macOS | kqueue | O(1) for ready events | Unified interface for sockets, files, signals |
| Windows | IOCP | O(1) for completed I/O | True async I/O, not just readiness notification |
| Cross-platform | libuv | Abstracts platform differences | Used by Node.js, provides unified API |
The Reactor Pattern:
Modern connection servers implement the Reactor pattern, where:

- A demultiplexer (epoll, kqueue, or IOCP) waits on thousands of sockets at once for readiness or completion events
- A dispatch loop hands each ready event to the handler registered for that connection
- Handlers perform short, non-blocking reads and writes and return control to the loop immediately, delegating heavy work to worker pools
```go
package main

import (
	"log"
	"net"
	"sync"

	"golang.org/x/sys/unix"
)

// ConnectionManager handles millions of concurrent connections
// using epoll for event notification
type ConnectionManager struct {
	epollFD   int
	listener  *net.TCPListener
	clients   map[int]*ClientConnection
	clientsMu sync.RWMutex
	eventPool []unix.EpollEvent
}

// ClientConnection represents minimal state per connection
// Keeping this small is critical for memory efficiency
type ClientConnection struct {
	fd       int
	addr     string
	userID   uint64   // User identifier
	channels []uint32 // Subscribed channel IDs (compact representation)
	lastPing int64    // Unix timestamp of last ping
	sendBuf  []byte   // Outgoing message buffer (pooled)
	recvBuf  []byte   // Incoming message buffer (pooled)
}

func NewConnectionManager(addr string) (*ConnectionManager, error) {
	// Create epoll instance
	epollFD, err := unix.EpollCreate1(0)
	if err != nil {
		return nil, err
	}

	// Create TCP listener
	listener, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}

	cm := &ConnectionManager{
		epollFD:   epollFD,
		listener:  listener.(*net.TCPListener),
		clients:   make(map[int]*ClientConnection),
		eventPool: make([]unix.EpollEvent, 10000), // Process 10k events per iteration
	}

	// Register listener for accept events
	listenerFD := getListenerFD(listener)
	event := &unix.EpollEvent{
		Events: unix.EPOLLIN | unix.EPOLLET, // Edge-triggered
		Fd:     int32(listenerFD),
	}
	unix.EpollCtl(epollFD, unix.EPOLL_CTL_ADD, listenerFD, event)
	// NOTE: accepting new connections when the listener fd becomes readable
	// (and registering each accepted fd with epoll) is omitted for brevity.

	return cm, nil
}

// Run is the main event loop - single-threaded, non-blocking
func (cm *ConnectionManager) Run() {
	log.Println("Starting event loop...")

	for {
		// Wait for events (blocks until at least one event is ready)
		numEvents, err := unix.EpollWait(cm.epollFD, cm.eventPool, -1)
		if err != nil {
			if err == unix.EINTR {
				continue // Interrupted by signal, retry
			}
			log.Printf("epoll_wait error: %v", err)
			continue
		}

		// Process all ready events
		for i := 0; i < numEvents; i++ {
			event := cm.eventPool[i]
			fd := int(event.Fd)

			if event.Events&(unix.EPOLLERR|unix.EPOLLHUP) != 0 {
				// Connection error or hangup
				cm.closeConnection(fd)
				continue
			}

			if event.Events&unix.EPOLLIN != 0 {
				// Data available to read
				cm.handleRead(fd)
			}

			if event.Events&unix.EPOLLOUT != 0 {
				// Socket ready for writing
				cm.handleWrite(fd)
			}
		}
	}
}

func (cm *ConnectionManager) handleRead(fd int) {
	cm.clientsMu.RLock()
	client, exists := cm.clients[fd]
	cm.clientsMu.RUnlock()
	if !exists {
		return
	}

	// Use pooled buffer for reading
	buf := client.recvBuf
	if buf == nil {
		buf = bufferPool.Get().([]byte)
		client.recvBuf = buf
	}

	// Non-blocking read - edge-triggered means we must drain the buffer
	for {
		n, err := unix.Read(fd, buf)
		if err != nil {
			if err == unix.EAGAIN || err == unix.EWOULDBLOCK {
				break // No more data available
			}
			cm.closeConnection(fd)
			return
		}
		if n == 0 {
			cm.closeConnection(fd)
			return
		}

		// Process the message (dispatch to worker pool for heavy processing)
		cm.processMessage(client, buf[:n])
	}
}

func (cm *ConnectionManager) handleWrite(fd int) {
	// Flush any buffered outgoing data for this connection.
	// Omitted for brevity - a real implementation drains sendBuf with unix.Write
	// until EAGAIN, then waits for the next EPOLLOUT notification.
}

// Buffer pool for zero-allocation message handling
var bufferPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, 4096) // 4KB buffers
	},
}

func (cm *ConnectionManager) closeConnection(fd int) {
	cm.clientsMu.Lock()
	client, exists := cm.clients[fd]
	if exists {
		// Return buffers to pool
		if client.recvBuf != nil {
			bufferPool.Put(client.recvBuf)
		}
		if client.sendBuf != nil {
			bufferPool.Put(client.sendBuf)
		}
		delete(cm.clients, fd)
	}
	cm.clientsMu.Unlock()

	// Deregister from epoll and close socket
	unix.EpollCtl(cm.epollFD, unix.EPOLL_CTL_DEL, fd, nil)
	unix.Close(fd)
}

func (cm *ConnectionManager) processMessage(client *ClientConnection, data []byte) {
	// Dispatch to worker pool for CPU-intensive processing
	// This keeps the event loop fast and non-blocking
}

func getListenerFD(listener net.Listener) int {
	// Extract the underlying descriptor so it can be registered with epoll.
	// Note: File() returns a duplicate descriptor; a production server would
	// manage the socket lifecycle more carefully (or create it via unix.Socket).
	file, err := listener.(*net.TCPListener).File()
	if err != nil {
		return -1
	}
	return int(file.Fd())
}

func main() {
	cm, err := NewConnectionManager(":8080")
	if err != nil {
		log.Fatal(err)
	}
	cm.Run()
}
```

Edge-triggered mode (EPOLLET) notifies you only when state changes—when data arrives, not while data is available. This reduces system calls but requires draining buffers completely per notification. Level-triggered mode notifies you continuously while data is available—simpler but with more syscall overhead. High-performance servers typically use edge-triggered mode with careful buffer management.
Real-time systems at scale separate concerns into specialized server tiers. The connection server (also called gateway server or edge server) is dedicated solely to managing client connections, while backend services handle business logic, state management, and message routing.
Why Separate Connection Servers?
Resource Isolation: Connection handling and business logic have different resource profiles. Mixing them leads to unpredictable resource contention.
Independent Scaling: You can scale connection servers based on concurrent users and backend servers based on message throughput.
Graceful Degradation: Backend issues don't immediately disconnect users; connection servers can buffer or queue messages temporarily (a bounded-queue sketch follows this list).
Deployment Flexibility: Update business logic without disrupting live connections.
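For example, a bounded per-connection send queue lets a connection server absorb a short backend or client stall without blocking its event loop or growing memory without bound. A minimal sketch — the type and names are illustrative, not from a specific library:

```go
package main

import "errors"

// ErrQueueFull signals that the per-connection buffer overflowed and the
// caller should drop the message or disconnect the slow consumer.
var ErrQueueFull = errors.New("outbound queue full")

// outboundQueue is a bounded, non-blocking buffer a connection server can use
// to ride out short stalls without unbounded memory growth.
type outboundQueue struct {
	ch chan []byte
}

func newOutboundQueue(capacity int) *outboundQueue {
	return &outboundQueue{ch: make(chan []byte, capacity)}
}

// Enqueue never blocks the event loop: it fails fast when the buffer is full.
func (q *outboundQueue) Enqueue(msg []byte) error {
	select {
	case q.ch <- msg:
		return nil
	default:
		return ErrQueueFull
	}
}

// Dequeue is called by the connection's writer when the socket is writable.
func (q *outboundQueue) Dequeue() ([]byte, bool) {
	select {
	case msg := <-q.ch:
		return msg, true
	default:
		return nil, false
	}
}
```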
Connection Server Responsibilities:

- Accepting connections and performing the WebSocket/TLS handshake and authentication
- Maintaining heartbeats and detecting dead connections
- Holding minimal per-connection state (identity, subscriptions)
- Forwarding messages between clients and the backend services that own business logic
Connection Server Sizing:
A modern connection server on a well-tuned Linux box with 32GB RAM can typically handle:
| Workload Type | Connections per Server | Notes |
|---|---|---|
| Chat (low message rate) | 100,000 - 500,000 | State dominated by connection overhead |
| Collaborative editing | 50,000 - 200,000 | More state per connection, higher message rates |
| Gaming (real-time) | 10,000 - 50,000 | High message frequency, low latency requirements |
| Streaming (server push) | 200,000 - 1,000,000 | Minimal per-connection state, one-way traffic |
Determining Optimal Connections Per Server:
The magic number depends on:

- Message frequency and fan-out per connection
- Per-connection state size and TCP/TLS buffer configuration
- Latency requirements and the CPU headroom needed for encryption and serialization
- The blast radius you can tolerate when a single server fails and all of its connections reconnect at once

A rough sizing calculation follows.
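As a sanity check on these ranges, divide the memory you are willing to dedicate to connections by the per-connection footprint from the resource table earlier. A small worked sketch — the 24 GB budget and the buffer/state sizes are illustrative assumptions, not measurements:

```go
package main

import "fmt"

func main() {
	// Illustrative assumptions - measure these on your own workload.
	const (
		memoryBudgetBytes  = 24 << 30 // 24 GB of the 32 GB box reserved for connections
		tcpBuffersPerConn  = 80 << 10 // ~80 KB send+receive buffers after tuning
		appStatePerConn    = 8 << 10  // ~8 KB application state
		perConnectionBytes = tcpBuffersPerConn + appStatePerConn
	)

	maxConnections := memoryBudgetBytes / perConnectionBytes
	fmt.Printf("rough memory ceiling: ~%d connections per server\n", maxConnections)
	// Prints a ceiling of roughly 286,000; CPU for message fan-out and TLS
	// usually caps the practical number lower than the memory ceiling.
}
```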
Discord runs dedicated 'Gateway' servers that handle WebSocket connections. Each gateway manages connections for a subset of guilds (servers), with consistent hashing determining which gateway handles which guild. When a gateway fails, clients reconnect to another gateway, which reloads state from their central session store. This architecture allows them to handle millions of concurrent connections across their fleet.
Load balancing persistent connections differs fundamentally from HTTP load balancing. HTTP connections are short-lived, so round-robin or least-connections algorithms work well. But WebSocket connections last minutes to hours, creating challenges:
Challenge 1: Connection Imbalance Over Time
If Server A receives 1000 connections at 9 AM and Server B receives 1000 at 10 AM, but Server A's users disconnect over the day while Server B's users stay connected, you end up with Server A mostly idle and Server B overloaded—even though the load balancer distributed connections "fairly."
Challenge 2: Session Affinity Requirements
Once a WebSocket connection is established, all messages on that connection must go to the same backend server. The load balancer must maintain session affinity (stickiness) for the connection duration.
Challenge 3: Graceful Draining
When taking a server out of rotation for maintenance, you can't simply stop routing new connections. Existing connections must be drained gracefully, which can take hours if users stay connected.
Recommended Approach: L4 with Intelligent Backend Selection
For most real-time systems, use L4 load balancing (TCP level) combined with intelligent initial routing:
- Clients first call a `/connect` endpoint (L7 routed) that returns the URL of the connection server they should use
- The client then opens its WebSocket directly to that server (or through an L4 load balancer that simply forwards the TCP stream)
```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/json"
	"log"
	"math/rand"
	"net/http"
	"sync/atomic"
)

// ConnectionRouter determines which connection server a client should use
type ConnectionRouter struct {
	servers       []ServerInfo
	healthChecker *HealthChecker
	algorithm     RoutingAlgorithm
}

// HealthChecker performs periodic health probes against connection servers
// and updates their HealthScore (details omitted for brevity)
type HealthChecker struct{}

type ServerInfo struct {
	URL         string
	CurrentLoad int64 // Current connection count
	MaxCapacity int64 // Maximum connections this server should accept
	HealthScore int   // 0-100, higher is better
	DrainMode   bool  // True if server is being drained for maintenance
}

type RoutingAlgorithm int

const (
	LeastConnections RoutingAlgorithm = iota
	ConsistentHashing
	WeightedRandom
)

// GetConnectionServer returns the best server for a new connection
func (r *ConnectionRouter) GetConnectionServer(userID string, guildID string) *ServerInfo {
	healthyServers := r.filterHealthyServers()
	if len(healthyServers) == 0 {
		return nil
	}

	switch r.algorithm {
	case ConsistentHashing:
		// Use consistent hashing for guild-based routing
		// All users in same guild connect to same server for efficiency
		return r.consistentHashSelect(guildID, healthyServers)

	case LeastConnections:
		// Find server with lowest load relative to capacity
		return r.leastConnectionsSelect(healthyServers)

	case WeightedRandom:
		// Random selection weighted by available capacity
		return r.weightedRandomSelect(healthyServers)
	}

	return healthyServers[0]
}

func (r *ConnectionRouter) filterHealthyServers() []*ServerInfo {
	var healthy []*ServerInfo
	for i := range r.servers {
		server := &r.servers[i]
		if server.HealthScore >= 50 &&
			!server.DrainMode &&
			atomic.LoadInt64(&server.CurrentLoad) < server.MaxCapacity {
			healthy = append(healthy, server)
		}
	}
	return healthy
}

func (r *ConnectionRouter) consistentHashSelect(key string, servers []*ServerInfo) *ServerInfo {
	// Simple consistent hashing - in production, use a ring with virtual nodes
	hash := sha256.Sum256([]byte(key))
	hashValue := binary.BigEndian.Uint64(hash[:8])
	index := hashValue % uint64(len(servers))
	return servers[index]
}

func (r *ConnectionRouter) leastConnectionsSelect(servers []*ServerInfo) *ServerInfo {
	var best *ServerInfo
	var bestRatio float64 = 2.0 // Higher than possible load ratio

	for _, server := range servers {
		currentLoad := float64(atomic.LoadInt64(&server.CurrentLoad))
		ratio := currentLoad / float64(server.MaxCapacity)
		if ratio < bestRatio {
			bestRatio = ratio
			best = server
		}
	}
	return best
}

func (r *ConnectionRouter) weightedRandomSelect(servers []*ServerInfo) *ServerInfo {
	// Weight each server by its spare capacity
	total := int64(0)
	for _, s := range servers {
		total += s.MaxCapacity - atomic.LoadInt64(&s.CurrentLoad)
	}
	if total <= 0 {
		return servers[0]
	}
	pick := rand.Int63n(total)
	for _, s := range servers {
		spare := s.MaxCapacity - atomic.LoadInt64(&s.CurrentLoad)
		if pick < spare {
			return s
		}
		pick -= spare
	}
	return servers[len(servers)-1]
}

// HTTP handler for connection routing
func (r *ConnectionRouter) HandleConnectRequest(w http.ResponseWriter, req *http.Request) {
	userID := req.Header.Get("X-User-ID")
	guildID := req.URL.Query().Get("guild_id")

	server := r.GetConnectionServer(userID, guildID)
	if server == nil {
		http.Error(w, "No healthy servers available", http.StatusServiceUnavailable)
		return
	}

	// Return WebSocket URL for client to connect directly
	response := map[string]string{
		"websocket_url": server.URL,
	}
	json.NewEncoder(w).Encode(response)
}

// Minimal wiring so the example runs; a real deployment would populate
// the server list from service discovery.
func main() {
	router := &ConnectionRouter{algorithm: LeastConnections}
	http.HandleFunc("/connect", router.HandleConnectRequest)
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```

When users need to interact frequently (same chat room, same game lobby, same document), routing them to the same connection server eliminates cross-server communication. Consistent hashing on room/guild/document ID achieves this automatically, with graceful rebalancing when servers are added or removed.
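The `consistentHashSelect` above uses a simple modulo, which remaps most guilds whenever the server list changes. A hash ring with virtual nodes keeps that churn proportional to the capacity added or removed; here is a minimal sketch (the gateway names and the 128 virtual-node count are illustrative):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// hashRing is a minimal consistent-hash ring with virtual nodes: adding or
// removing a server only remaps the keys that fell on its ring segments.
type hashRing struct {
	points []uint64          // sorted virtual-node positions on the ring
	owner  map[uint64]string // ring position -> server URL
}

func hashKey(s string) uint64 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

func newHashRing(servers []string, virtualNodes int) *hashRing {
	r := &hashRing{owner: make(map[uint64]string)}
	for _, srv := range servers {
		for v := 0; v < virtualNodes; v++ {
			p := hashKey(fmt.Sprintf("%s#%d", srv, v))
			r.points = append(r.points, p)
			r.owner[p] = srv
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the server owning the first ring point at or after the key's hash.
func (r *hashRing) Lookup(key string) string {
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := newHashRing([]string{"gw-1", "gw-2", "gw-3"}, 128)
	fmt.Println(ring.Lookup("guild:42")) // the same guild always maps to the same gateway
}
```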
Each connection requires associated state: user identity, subscriptions, permissions, and session data. How you store and access this state dramatically impacts scalability.
State Storage Strategies:
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| In-Memory (Per-Server) | State lives in process memory alongside connection | Fastest access, no network calls | Lost on crash, limits horizontal scaling |
| Externalized (Redis) | State stored in Redis, fetched on demand | Survives crashes, enables seamless failover | Network latency for every state access |
| Hybrid (Hot/Warm) | Frequently accessed state in memory, full state in Redis | Balance of speed and resilience | Complexity of cache invalidation |
| Session Tokens (Stateless) | All state encoded in signed token, client holds state | True statelessness, infinite horizontal scaling | Token size limits, cannot revoke mid-session |
The Hybrid Approach in Practice:
Most production systems use a hybrid model:
Minimal hot state in memory: User ID, connection ID, list of subscription IDs (not full subscription objects)
Full state in Redis: User profile, permissions, subscription details, rate limiting counters
Lazy loading with caching: Fetch from Redis on first access, cache locally with TTL
Event-driven invalidation: Pub/sub notifications to invalidate local cache when state changes
```go
package state

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"sync"
	"time"

	"github.com/go-redis/redis/v8"
)

// ErrUserNotFound is returned when no state exists for the requested user
var ErrUserNotFound = errors.New("user state not found")

// ConnectionState represents minimal in-memory state per connection
// This struct should be as small as possible - every byte is multiplied by connection count
type ConnectionState struct {
	ConnectionID  uint64   // 8 bytes
	UserID        uint64   // 8 bytes
	DeviceID      uint32   // 4 bytes
	Permissions   uint32   // 4 bytes - bitfield of permissions
	Subscriptions []uint32 // Variable - channel IDs, not full objects
	ConnectedAt   int64    // 8 bytes - unix timestamp
	LastActivity  int64    // 8 bytes - for idle detection
}

// Full state stored in Redis - loaded on demand
type FullUserState struct {
	UserID          uint64
	Username        string
	AvatarURL       string
	Permissions     map[string][]string // Channel -> Permissions
	Preferences     UserPreferences
	RateLimitTokens int
	PresenceStatus  string
	CustomStatus    string
}

type UserPreferences struct {
	NotificationsEnabled bool
	Theme                string
	Language             string
}

// StateManager handles hybrid local/remote state
type StateManager struct {
	redis      *redis.Client
	localCache sync.Map // map[uint64]*cachedState
	ttl        time.Duration
}

type cachedState struct {
	state     *FullUserState
	expiresAt time.Time
}

func NewStateManager(redisClient *redis.Client) *StateManager {
	return &StateManager{
		redis: redisClient,
		ttl:   time.Minute * 5,
	}
}

// GetFullState retrieves full user state with local caching
func (sm *StateManager) GetFullState(ctx context.Context, userID uint64) (*FullUserState, error) {
	// Check local cache first
	if cached, ok := sm.localCache.Load(userID); ok {
		cs := cached.(*cachedState)
		if time.Now().Before(cs.expiresAt) {
			return cs.state, nil
		}
		// Expired, delete and fetch fresh
		sm.localCache.Delete(userID)
	}

	// Fetch from Redis
	key := fmt.Sprintf("user:state:%d", userID)
	data, err := sm.redis.Get(ctx, key).Bytes()
	if err != nil {
		if err == redis.Nil {
			return nil, ErrUserNotFound
		}
		return nil, err
	}

	var state FullUserState
	if err := json.Unmarshal(data, &state); err != nil {
		return nil, err
	}

	// Cache locally
	sm.localCache.Store(userID, &cachedState{
		state:     &state,
		expiresAt: time.Now().Add(sm.ttl),
	})

	return &state, nil
}

// InvalidateLocalCache removes user from local cache
// Called when receiving cache invalidation event from Redis pub/sub
func (sm *StateManager) InvalidateLocalCache(userID uint64) {
	sm.localCache.Delete(userID)
}

// SubscribeToCacheInvalidation listens for cache invalidation events
func (sm *StateManager) SubscribeToCacheInvalidation(ctx context.Context) {
	pubsub := sm.redis.Subscribe(ctx, "cache:invalidate:user")
	defer pubsub.Close()

	for {
		select {
		case <-ctx.Done():
			return
		case msg := <-pubsub.Channel():
			// Message payload is user ID
			var userID uint64
			if _, err := fmt.Sscanf(msg.Payload, "%d", &userID); err == nil {
				sm.InvalidateLocalCache(userID)
			}
		}
	}
}

// PublishCacheInvalidation notifies all servers to invalidate cached state
func (sm *StateManager) PublishCacheInvalidation(ctx context.Context, userID uint64) error {
	return sm.redis.Publish(ctx, "cache:invalidate:user", userID).Err()
}
```

Connection state must be explicitly cleaned up when connections close. Memory leaks from orphaned state are a leading cause of connection server instability. Use finalizers, defer statements, or explicit cleanup routines with connection close events. Monitor with heap profiles comparing state count to active connection count.
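One way to act on that monitoring recommendation is to track state entries and live connections as two counters and export the gap as a gauge. A minimal sketch of such a gauge (the `LeakMonitor` type is illustrative):

```go
package state

import "sync/atomic"

// LeakMonitor compares tracked state entries with live connections. The two
// counts should track each other; a growing gap usually means close handlers
// are not cleaning up per-connection state.
type LeakMonitor struct {
	activeConnections int64 // incremented on accept, decremented on close
	trackedStates     int64 // incremented on state create, decremented on cleanup
}

func (m *LeakMonitor) ConnOpened()   { atomic.AddInt64(&m.activeConnections, 1) }
func (m *LeakMonitor) ConnClosed()   { atomic.AddInt64(&m.activeConnections, -1) }
func (m *LeakMonitor) StateCreated() { atomic.AddInt64(&m.trackedStates, 1) }
func (m *LeakMonitor) StateFreed()   { atomic.AddInt64(&m.trackedStates, -1) }

// OrphanedStates is the gauge to export to your metrics system; alert when it
// stays well above zero for a sustained period.
func (m *LeakMonitor) OrphanedStates() int64 {
	return atomic.LoadInt64(&m.trackedStates) - atomic.LoadInt64(&m.activeConnections)
}
```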
Connections can die silently. A client's network may drop, a mobile device may enter airplane mode, or a laptop may close without proper disconnect. Without active detection, the server holds resources for connections that no longer exist—eventually exhausting file descriptors or memory.
Detection Mechanisms:
| Mechanism | Layer | Latency to Detect | Overhead |
|---|---|---|---|
| TCP Keepalive | OS/TCP | Minutes (configurable) | Very low - OS handles it |
| WebSocket Ping/Pong | Application | Seconds to minutes | Low - small frames |
| Application Heartbeat | Application | Seconds | Medium - custom logic |
| Activity Timeout | Application | Variable | Minimal - timer check |
Best Practice: Layered Detection
Production systems use multiple layers:
TCP Keepalive (net.ipv4.tcp_keepalive_time = 600): OS-level detection, catches most dead connections
WebSocket Ping/Pong (every 30-60 seconds): Application-level heartbeat, confirms client is responsive
Activity Timeout (5-10 minutes of no messages): Client may be connected but idle; consider disconnecting to free resources
Client-Initiated Heartbeat: Require clients to send periodic pings; track last-ping timestamp
```go
package keepalive

import (
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

type KeepaliveManager struct {
	connections  sync.Map // map[connectionID]*trackedConnection
	pingInterval time.Duration
	pongTimeout  time.Duration
	idleTimeout  time.Duration
	ticker       *time.Ticker
	stopCh       chan struct{}
}

type trackedConnection struct {
	conn         *websocket.Conn
	connectionID string
	lastPong     time.Time
	lastActivity time.Time
	pingPending  bool
	mu           sync.Mutex
}

func NewKeepaliveManager(pingInterval, pongTimeout, idleTimeout time.Duration) *KeepaliveManager {
	return &KeepaliveManager{
		pingInterval: pingInterval, // e.g., 30 seconds
		pongTimeout:  pongTimeout,  // e.g., 10 seconds
		idleTimeout:  idleTimeout,  // e.g., 5 minutes
		ticker:       time.NewTicker(pingInterval),
		stopCh:       make(chan struct{}),
	}
}

func (km *KeepaliveManager) Track(connID string, conn *websocket.Conn) {
	tc := &trackedConnection{
		conn:         conn,
		connectionID: connID,
		lastPong:     time.Now(),
		lastActivity: time.Now(),
	}

	// Set up pong handler
	conn.SetPongHandler(func(appData string) error {
		tc.mu.Lock()
		tc.lastPong = time.Now()
		tc.pingPending = false
		tc.mu.Unlock()
		return nil
	})

	km.connections.Store(connID, tc)
}

func (km *KeepaliveManager) RecordActivity(connID string) {
	if val, ok := km.connections.Load(connID); ok {
		tc := val.(*trackedConnection)
		tc.mu.Lock()
		tc.lastActivity = time.Now()
		tc.mu.Unlock()
	}
}

func (km *KeepaliveManager) Untrack(connID string) {
	km.connections.Delete(connID)
}

// Run starts the keepalive check loop
func (km *KeepaliveManager) Run(onTimeout func(connID string)) {
	for {
		select {
		case <-km.stopCh:
			return
		case <-km.ticker.C:
			km.checkConnections(onTimeout)
		}
	}
}

func (km *KeepaliveManager) checkConnections(onTimeout func(connID string)) {
	now := time.Now()
	var toClose []string

	km.connections.Range(func(key, value interface{}) bool {
		connID := key.(string)
		tc := value.(*trackedConnection)

		tc.mu.Lock()
		defer tc.mu.Unlock()

		// Check for pong timeout (ping was sent but no pong received)
		if tc.pingPending && now.Sub(tc.lastPong) > km.pongTimeout {
			toClose = append(toClose, connID)
			return true
		}

		// Check for idle timeout
		if now.Sub(tc.lastActivity) > km.idleTimeout {
			toClose = append(toClose, connID)
			return true
		}

		// Send ping if interval has passed
		if !tc.pingPending && now.Sub(tc.lastPong) > km.pingInterval {
			if err := tc.conn.WriteControl(
				websocket.PingMessage,
				[]byte{},
				now.Add(time.Second),
			); err != nil {
				toClose = append(toClose, connID)
			} else {
				tc.pingPending = true
			}
		}

		return true
	})

	// Close dead connections
	for _, connID := range toClose {
		if val, ok := km.connections.Load(connID); ok {
			tc := val.(*trackedConnection)
			tc.conn.Close()
			km.connections.Delete(connID)
			onTimeout(connID)
		}
	}
}

func (km *KeepaliveManager) Stop() {
	close(km.stopCh)
	km.ticker.Stop()
}

// Metrics returns current connection health statistics
func (km *KeepaliveManager) Metrics() KeepaliveMetrics {
	var total, healthy, pendingPong, nearIdle int

	km.connections.Range(func(key, value interface{}) bool {
		total++
		tc := value.(*trackedConnection)
		tc.mu.Lock()
		defer tc.mu.Unlock()

		if time.Since(tc.lastPong) < km.pingInterval*2 {
			healthy++
		}
		if tc.pingPending {
			pendingPong++
		}
		if time.Since(tc.lastActivity) > km.idleTimeout/2 {
			nearIdle++
		}
		return true
	})

	return KeepaliveMetrics{
		TotalConnections:   total,
		HealthyConnections: healthy,
		PendingPong:        pendingPong,
		NearIdleTimeout:    nearIdle,
	}
}

type KeepaliveMetrics struct {
	TotalConnections   int
	HealthyConnections int
	PendingPong        int
	NearIdleTimeout    int
}
```

Mobile clients are especially prone to silent disconnections. Aggressive keepalive (every 15-30 seconds) helps detect dead connections faster but consumes battery. Many mobile apps use longer intervals (60+ seconds) and accept delayed detection as a tradeoff. Some implement push notification fallback: if the persistent connection dies, critical messages route through APNs/FCM.
Deploying updates to connection servers requires care. Unlike HTTP servers where you can simply stop accepting new requests, connection servers have long-lived connections that shouldn't be abruptly terminated.
Graceful Shutdown Sequence:
Mark server as draining: Remove from load balancer rotation, stop accepting new connections
Notify clients: Send a "please reconnect elsewhere" message with the new server URL
Wait for voluntary disconnect: Give clients time to reconnect (30-60 seconds)
Force remaining disconnections: For clients that didn't voluntarily reconnect, close connections
Shutdown: Process can now terminate safely
```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

type GracefulShutdownManager struct {
	server           *ConnectionServer
	draining         bool
	shutdownComplete chan struct{}
	mu               sync.RWMutex
}

type ReconnectMessage struct {
	Type         string `json:"type"`
	Reason       string `json:"reason"`
	ReconnectURL string `json:"reconnect_url"`
	ReconnectIn  int    `json:"reconnect_in_seconds"`
}

func (gsm *GracefulShutdownManager) HandleSignals() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT, syscall.SIGUSR1)

	for sig := range sigCh {
		switch sig {
		case syscall.SIGTERM, syscall.SIGINT:
			log.Println("Received shutdown signal, initiating graceful drain...")
			gsm.initiateGracefulShutdown()
		case syscall.SIGUSR1:
			log.Println("Received SIGUSR1, initiating drain without shutdown...")
			gsm.initiateDrain()
		}
	}
}

func (gsm *GracefulShutdownManager) initiateGracefulShutdown() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Phase 1: Enter drain mode (stop accepting new connections)
	gsm.mu.Lock()
	gsm.draining = true
	gsm.mu.Unlock()

	log.Printf("Phase 1: Entered drain mode, stopping new connections")
	gsm.server.StopAcceptingNewConnections()

	// Phase 2: Notify all connected clients to reconnect elsewhere
	reconnectURL := gsm.server.GetAlternateServerURL()
	message := ReconnectMessage{
		Type:         "reconnect",
		Reason:       "server_maintenance",
		ReconnectURL: reconnectURL,
		ReconnectIn:  30, // Client should reconnect within 30 seconds
	}
	messageBytes, _ := json.Marshal(message)

	log.Printf("Phase 2: Notifying %d connections to reconnect", gsm.server.ConnectionCount())
	gsm.server.BroadcastToAll(messageBytes)

	// Phase 3: Wait for voluntary disconnections
	waitDuration := 45 * time.Second
	checkInterval := time.Second
	deadline := time.Now().Add(waitDuration)

	log.Printf("Phase 3: Waiting %v for voluntary disconnections", waitDuration)
	for time.Now().Before(deadline) {
		remaining := gsm.server.ConnectionCount()
		if remaining == 0 {
			log.Println("All connections closed voluntarily")
			break
		}
		log.Printf("Waiting... %d connections remaining", remaining)
		time.Sleep(checkInterval)
	}

	// Phase 4: Force close remaining connections
	remaining := gsm.server.ConnectionCount()
	if remaining > 0 {
		log.Printf("Phase 4: Force closing %d remaining connections", remaining)
		gsm.server.ForceCloseAllConnections("server_shutdown")
	}

	// Phase 5: Wait for all goroutines to finish
	log.Println("Phase 5: Waiting for cleanup...")
	gsm.server.WaitForShutdown(ctx)

	log.Println("Graceful shutdown complete")
	close(gsm.shutdownComplete)
}

func (gsm *GracefulShutdownManager) initiateDrain() {
	gsm.mu.Lock()
	gsm.draining = true
	gsm.mu.Unlock()
	gsm.server.StopAcceptingNewConnections()
	log.Println("Server is now in drain mode - no new connections accepted")
}

func (gsm *GracefulShutdownManager) IsDraining() bool {
	gsm.mu.RLock()
	defer gsm.mu.RUnlock()
	return gsm.draining
}

func (gsm *GracefulShutdownManager) WaitForShutdown() {
	<-gsm.shutdownComplete
}

// ConnectionServer stub for illustration
type ConnectionServer struct {
	// ... connection management
}

func (s *ConnectionServer) StopAcceptingNewConnections()            { /* ... */ }
func (s *ConnectionServer) GetAlternateServerURL() string           { return "wss://alt-server.example.com" }
func (s *ConnectionServer) ConnectionCount() int                    { return 0 }
func (s *ConnectionServer) BroadcastToAll(msg []byte)               { /* ... */ }
func (s *ConnectionServer) ForceCloseAllConnections(reason string)  { /* ... */ }
func (s *ConnectionServer) WaitForShutdown(ctx context.Context)     { /* ... */ }

// Minimal wiring so the example runs: listen for signals and block until
// the graceful shutdown sequence completes.
func main() {
	gsm := &GracefulShutdownManager{
		server:           &ConnectionServer{},
		shutdownComplete: make(chan struct{}),
	}
	go gsm.HandleSignals()
	gsm.WaitForShutdown()
}
```

Well-designed clients should handle reconnection messages gracefully by maintaining local state while reconnecting. The client should connect to the suggested URL, re-authenticate, and resume operations without user-visible disruption. Implementing exponential backoff with jitter prevents thundering herd when many clients reconnect simultaneously.
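A minimal client-side sketch of that reconnect backoff — the base delay and cap are illustrative values:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// reconnectDelay computes a capped exponential backoff with "full jitter":
// each retry waits a random duration between 0 and min(cap, base*2^attempt),
// which spreads a mass reconnect out instead of producing synchronized waves.
func reconnectDelay(attempt int, base, max time.Duration) time.Duration {
	backoff := base << uint(attempt) // base * 2^attempt
	if backoff > max || backoff <= 0 {
		backoff = max // cap the window (and guard against overflow)
	}
	return time.Duration(rand.Int63n(int64(backoff)))
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(reconnectDelay(attempt, 500*time.Millisecond, 30*time.Second))
	}
}
```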
Managing millions of concurrent real-time connections requires deliberate architectural decisions at every layer—from kernel tuning to application-level state management.
You now understand the core principles of connection management at scale. Next, we'll explore Presence Systems—how to track and broadcast user online status in real-time across millions of users, a fundamental building block for chat, collaboration, and social applications.