Every real-time system designer eventually faces the same question: "What happens when we have 10x, 100x, or 1000x more users?"
Scaling real-time systems is fundamentally harder than scaling request-response systems. When a web server gets overloaded, you can add more servers and distribute requests. But when a WebSocket server reaches capacity, you can't simply move existing connections—they're persistent, stateful, and users will notice if they're disrupted.
This page synthesizes the scaling challenges we've touched on throughout this module and provides a comprehensive framework for capacity planning, horizontal scaling, graceful degradation, and operational excellence in real-time systems.
By the end of this page, you will understand:

- Capacity planning methodologies for real-time systems
- Horizontal scaling patterns and their tradeoffs
- Graceful degradation strategies when systems are overwhelmed
- Monitoring, alerting, and observability best practices
- Operational runbooks for common real-time system issues
- Cost optimization strategies for persistent connections
Capacity planning for real-time systems differs from traditional web applications because you're not just handling request throughput—you're managing concurrent connections that consume resources continuously.
Key Metrics to Model:
| Metric | What It Measures | Typical Bottleneck | How to Measure |
|---|---|---|---|
| Concurrent connections | Active persistent connections | File descriptors, memory | Connection counter + peaks |
| Message rate (in) | Messages received per second | CPU, event loop | Network ingress + message parsing |
| Message rate (out) | Messages sent per second | CPU, network egress | Fan-out × message rate |
| Fan-out ratio | Recipients per message | Cross-server communication | Avg subscribers per broadcast |
| Message size | Bytes per message | Memory, network | Sample and histogram |
| Connection duration | How long connections persist | State management | Distribution analysis |
| Reconnection rate | How often clients reconnect | Auth service, connection handling | Reconnect events / time |
Capacity Model Example:
Let's model a chat application:
Target: 1,000,000 concurrent users
Typical user behavior:
- Average 2 rooms per user
- Average 50 users per room
- Average 1 message sent per user per minute
- Average message size: 200 bytes
Calculations:
- Total rooms: 1,000,000 users × 2 rooms / 50 users per room = 40,000 active rooms
- Messages/second: 1,000,000 / 60 = ~16,700 messages/second incoming
- Fan-out: Each message reaches 50 users = 835,000 messages/second outgoing
- Bandwidth out: 835,000 × 200 bytes = 167 MB/second = 1.3 Gbps
Server Sizing:
If each server handles:
- 100,000 connections max
- 50,000 messages/second throughput
Then you need:
- Connections: 1,000,000 / 100,000 = 10 servers
- Throughput: 835,000 / 50,000 = 17 servers
- Answer: 17 servers minimum (throughput limited)
- Recommended: 26 servers (50% headroom for spikes)
```python
"""Capacity Planning Model for Real-Time Chat System"""

from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    concurrent_users: int
    avg_rooms_per_user: float
    avg_users_per_room: int
    msgs_per_user_per_minute: float
    avg_message_bytes: int
    # Optional: spiky traffic modeling
    peak_multiplier: float = 2.0  # Peak = 2x average
    daily_peak_duration_hours: float = 4.0


@dataclass
class ServerCapacity:
    max_connections: int
    max_msgs_per_second_in: int
    max_msgs_per_second_out: int
    max_bandwidth_mbps: int
    cost_per_hour: float


@dataclass
class CapacityPlan:
    servers_needed: int
    limiting_factor: str
    headroom_percent: float
    estimated_monthly_cost: float
    # Breakdown
    connection_limited: int
    ingest_limited: int
    fanout_limited: int
    bandwidth_limited: int


def calculate_capacity(
    workload: WorkloadProfile,
    server: ServerCapacity,
    headroom: float = 0.5,  # 50% headroom
) -> CapacityPlan:
    """Calculate required server count for a real-time workload."""
    # Derived metric: total room memberships / room size = active room count
    total_rooms = (
        workload.concurrent_users * workload.avg_rooms_per_user
        / workload.avg_users_per_room
    )

    # Incoming message rate
    msgs_per_second_in = (
        workload.concurrent_users * workload.msgs_per_user_per_minute / 60
    )

    # Fan-out: each message goes to room members
    avg_fanout = workload.avg_users_per_room
    msgs_per_second_out = msgs_per_second_in * avg_fanout

    # Bandwidth requirements
    bandwidth_in_mbps = (msgs_per_second_in * workload.avg_message_bytes * 8) / 1_000_000
    bandwidth_out_mbps = (msgs_per_second_out * workload.avg_message_bytes * 8) / 1_000_000

    # Server count required along each dimension
    connection_limited = workload.concurrent_users / server.max_connections
    ingest_limited = msgs_per_second_in / server.max_msgs_per_second_in
    fanout_limited = msgs_per_second_out / server.max_msgs_per_second_out
    bandwidth_limited = max(bandwidth_in_mbps, bandwidth_out_mbps) / server.max_bandwidth_mbps

    # Apply peak multiplier
    peak_connection = connection_limited * workload.peak_multiplier
    peak_ingest = ingest_limited * workload.peak_multiplier
    peak_fanout = fanout_limited * workload.peak_multiplier
    peak_bandwidth = bandwidth_limited * workload.peak_multiplier

    # Find the limiting factor
    limits = {
        'connections': peak_connection,
        'message_ingest': peak_ingest,
        'message_fanout': peak_fanout,
        'bandwidth': peak_bandwidth,
    }
    limiting_factor = max(limits, key=limits.get)
    min_servers = max(limits.values())

    # Add headroom
    servers_needed = int(min_servers * (1 + headroom)) + 1

    # Cost estimation (assuming 24x7 operation)
    monthly_cost = servers_needed * server.cost_per_hour * 24 * 30

    return CapacityPlan(
        servers_needed=servers_needed,
        limiting_factor=limiting_factor,
        headroom_percent=headroom * 100,
        estimated_monthly_cost=monthly_cost,
        connection_limited=int(connection_limited),
        ingest_limited=int(ingest_limited),
        fanout_limited=int(fanout_limited),
        bandwidth_limited=int(bandwidth_limited),
    )


# Example usage
if __name__ == "__main__":
    workload = WorkloadProfile(
        concurrent_users=1_000_000,
        avg_rooms_per_user=2.0,
        avg_users_per_room=50,
        msgs_per_user_per_minute=1.0,
        avg_message_bytes=200,
    )
    server = ServerCapacity(
        max_connections=100_000,
        max_msgs_per_second_in=10_000,
        max_msgs_per_second_out=50_000,
        max_bandwidth_mbps=1000,
        cost_per_hour=0.50,
    )
    plan = calculate_capacity(workload, server)
    print(f"""Capacity Plan for {workload.concurrent_users:,} concurrent users:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Servers needed:  {plan.servers_needed}
Limiting factor: {plan.limiting_factor}
Headroom:        {plan.headroom_percent}%

Breakdown (without headroom):
  - Connection-limited: {plan.connection_limited} servers
  - Ingest-limited:     {plan.ingest_limited} servers
  - Fanout-limited:     {plan.fanout_limited} servers
  - Bandwidth-limited:  {plan.bandwidth_limited} servers

Estimated monthly cost: ${plan.estimated_monthly_cost:,.0f}
""")
```

Real-time systems often have pronounced peaks (evening hours, virtual events, breaking news). A system sized for average load will fail at peak.
Always model peak traffic separately—typically 2-5x average—and ensure capacity handles peaks with headroom.
Horizontal scaling adds servers to handle more load. For real-time systems, this requires careful design to maintain consistency and enable cross-server communication.
Pattern 1: Shared-Nothing Partitioning
Each server owns a partition of data/users. No cross-server communication needed for within-partition operations.
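One way to implement shared-nothing routing is a consistent-hash ring: every node computes the same room-to-server mapping without coordination, and adding or removing a server only remaps a small fraction of rooms. A minimal illustrative sketch (the `ws-1`/`ws-2`/`ws-3` server names are hypothetical, not from the text):

```python
import hashlib
from bisect import bisect


class ConsistentHashRing:
    """Maps room IDs to servers deterministically (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        # Place each server at many virtual points for even distribution
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        # Stable across processes, unlike Python's randomized hash()
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def server_for(self, room_id: str) -> str:
        # First virtual node clockwise from the room's hash, wrapping around
        idx = bisect(self._ring, (self._hash(room_id),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["ws-1", "ws-2", "ws-3"])
server = ring.server_for("room:general")  # every node computes the same owner
```

Because routing is deterministic, a load balancer or gateway can send a user straight to the server that owns their room, and no cross-server traffic is needed.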
Pattern 2: Pub/Sub Backbone
Servers don't talk directly. All cross-server messages go through a message broker (Redis, NATS, Kafka).
```go
package scaling

import (
	"context"
	"encoding/json"

	"github.com/go-redis/redis/v8"
)

// PubSubBackbone enables cross-server message routing.
// (A production version would guard subscriptions with a mutex.)
type PubSubBackbone struct {
	redis         *redis.Client
	serverID      string
	localDelivery LocalDeliveryFunc
	subscriptions map[string]*redis.PubSub
}

type LocalDeliveryFunc func(channelID string, message []byte)

type CrossServerMessage struct {
	SourceServer string `json:"source"`
	ChannelID    string `json:"channel"`
	Payload      []byte `json:"payload"`
}

func NewPubSubBackbone(
	redisClient *redis.Client,
	serverID string,
	deliveryFunc LocalDeliveryFunc,
) *PubSubBackbone {
	return &PubSubBackbone{
		redis:         redisClient,
		serverID:      serverID,
		localDelivery: deliveryFunc,
		subscriptions: make(map[string]*redis.PubSub),
	}
}

// Publish sends a message to all servers with subscribers in this channel
func (p *PubSubBackbone) Publish(ctx context.Context, channelID string, message []byte) error {
	msg := CrossServerMessage{
		SourceServer: p.serverID,
		ChannelID:    channelID,
		Payload:      message,
	}
	data, err := json.Marshal(msg)
	if err != nil {
		return err
	}
	// Redis Pub/Sub topic per logical channel
	topic := "realtime:" + channelID
	return p.redis.Publish(ctx, topic, data).Err()
}

// Subscribe starts listening for messages on a channel
func (p *PubSubBackbone) Subscribe(ctx context.Context, channelID string) error {
	if _, ok := p.subscriptions[channelID]; ok {
		return nil // Already subscribed
	}
	topic := "realtime:" + channelID
	pubsub := p.redis.Subscribe(ctx, topic)
	p.subscriptions[channelID] = pubsub

	// Handle incoming messages in a goroutine
	go func() {
		defer pubsub.Close()
		for {
			select {
			case <-ctx.Done():
				return
			case msg, ok := <-pubsub.Channel():
				if !ok {
					return
				}
				var crossMsg CrossServerMessage
				if err := json.Unmarshal([]byte(msg.Payload), &crossMsg); err != nil {
					continue
				}
				// Don't deliver our own messages back to us
				if crossMsg.SourceServer == p.serverID {
					continue
				}
				// Deliver to local connections
				p.localDelivery(crossMsg.ChannelID, crossMsg.Payload)
			}
		}
	}()
	return nil
}

// Unsubscribe stops listening for a channel (when no local subscribers remain)
func (p *PubSubBackbone) Unsubscribe(ctx context.Context, channelID string) error {
	pubsub, ok := p.subscriptions[channelID]
	if !ok {
		return nil
	}
	delete(p.subscriptions, channelID)
	// Closing the subscription also unsubscribes from the Redis topic
	return pubsub.Close()
}

// Using the backbone in a connection server:

type ConnectionServer struct {
	backbone   *PubSubBackbone
	localRooms map[string]map[string]*Connection // roomID -> connectionID -> Connection
}

func (s *ConnectionServer) BroadcastToRoom(roomID string, message []byte) {
	// 1. Deliver to local connections
	if room, ok := s.localRooms[roomID]; ok {
		for _, conn := range room {
			conn.Send(message)
		}
	}

	// 2. Publish to other servers
	ctx := context.Background()
	s.backbone.Publish(ctx, roomID, message)
}

func (s *ConnectionServer) handleCrossServerMessage(roomID string, message []byte) {
	// Deliver to local connections only
	if room, ok := s.localRooms[roomID]; ok {
		for _, conn := range room {
			conn.Send(message)
		}
	}
}
```

Pattern 3: Sticky Sessions with Reliable Handoff
Users stick to one server, but can be moved during scale-down or server maintenance.
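One hedged sketch of the handoff half of this pattern: before taking a server down, tell each client to reconnect elsewhere, jittered across a grace window so they don't stampede the remaining servers, then close whatever is left. The connection's async `send()`/`close()` API here is an assumption, not a specific framework:

```python
import asyncio
import json
import random


async def _ask_to_move(conn, grace_s: float) -> None:
    # Jitter each client's reconnect to avoid a thundering herd
    await asyncio.sleep(random.uniform(0, grace_s))
    await conn.send(json.dumps({"type": "reconnect", "reason": "server_draining"}))
    await conn.close(code=1001)  # WebSocket close code 1001 = "going away"


async def drain_server(connections, grace_s: float = 60.0) -> None:
    """Drain all connections over a grace window before shutdown."""
    await asyncio.gather(*(_ask_to_move(c, grace_s) for c in connections))
```

Clients that receive the `reconnect` message should resume through the load balancer, which routes them to a healthy server; the grace window bounds how long a rolling deploy takes.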
Comparing Patterns:
| Pattern | Pros | Cons | Best For |
|---|---|---|---|
| Shared-Nothing | Simple, no cross-server traffic | Users in same room must be on same server | Naturally partitionable data (chat rooms, games) |
| Pub/Sub Backbone | Flexible, users can be anywhere | Message broker becomes SPOF | General-purpose, user-to-user messaging |
| Sticky + Handoff | Simple mental model | Handoff complexity, temporary disconnects | Session-based applications |
| Hybrid | Best of both worlds | Implementation complexity | Large-scale production systems |
Standard Redis Pub/Sub has limits (~100K messages/second). For higher scale, use Redis Cluster (sharded by channel) or purpose-built systems like NATS, Kafka, or AWS EventBridge. NATS is particularly designed for real-time pub/sub and can handle millions of messages per second.
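Sharding by channel only works if every server computes the same channel-to-shard mapping. A minimal sketch, assuming a pool of Redis clients indexed by shard number (the pool itself is a hypothetical variable):

```python
import zlib


def shard_for_channel(channel_id: str, num_shards: int) -> int:
    """Pick the Redis shard that owns a channel. Uses a stable hash
    (crc32) so all servers agree; Python's built-in hash() is
    randomized per process and must not be used here."""
    return zlib.crc32(channel_id.encode("utf-8")) % num_shards


# Usage sketch: publish and subscribe via the owning shard's client
# shard = shard_for_channel("room:42", len(redis_clients))
# redis_clients[shard].publish("realtime:room:42", payload)
```

The same function routes both publishes and subscriptions, so a channel's traffic never crosses shards; resharding requires a coordinated remap, which is one reason systems outgrow this toward NATS or Kafka.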
When systems are overwhelmed, they shouldn't crash—they should degrade gracefully, preserving core functionality while shedding non-essential load.
Degradation Hierarchy:
| Level | Response | User Impact | Trigger |
|---|---|---|---|
| Normal | Full functionality | None | Load < 70% capacity |
| Warning | Reduce nice-to-have features | Minor (slower typing indicators) | Load > 70% |
| Degraded | Core functionality only | Noticeable (no presence updates) | Load > 85% |
| Critical | Shed load, reject new connections | Severe (some users disconnected) | Load > 95% |
| Emergency | System protection mode | Major (service unavailable for new users) | Load > 100% |
```go
package degradation

import (
	"sync/atomic"
	"time"
)

type DegradationLevel int

const (
	LevelNormal DegradationLevel = iota
	LevelWarning
	LevelDegraded
	LevelCritical
	LevelEmergency
)

// DegradationController manages system degradation based on load
type DegradationController struct {
	currentLevel  atomic.Int32
	loadMetrics   *LoadMetrics
	featureFlags  *FeatureFlags
	checkInterval time.Duration
}

type LoadMetrics struct {
	CPUPercent        float64
	MemoryPercent     float64
	ConnectionCount   int64
	ConnectionLimit   int64
	MessageQueueDepth int64
	QueueLimit        int64
}

type FeatureFlags struct {
	EnableTypingIndicators bool
	EnablePresenceUpdates  bool
	EnableReadReceipts     bool
	EnableRichPreviews     bool
	EnableHistoryLoad      bool
	AcceptNewConnections   bool
	EnableNonCriticalAPIs  bool
}

func NewDegradationController() *DegradationController {
	dc := &DegradationController{
		checkInterval: time.Second * 5,
		featureFlags:  defaultFeatureFlags(),
	}
	go dc.monitorLoop()
	return dc
}

func defaultFeatureFlags() *FeatureFlags {
	return &FeatureFlags{
		EnableTypingIndicators: true,
		EnablePresenceUpdates:  true,
		EnableReadReceipts:     true,
		EnableRichPreviews:     true,
		EnableHistoryLoad:      true,
		AcceptNewConnections:   true,
		EnableNonCriticalAPIs:  true,
	}
}

// UpdateMetrics publishes the latest load snapshot to the controller.
// (A production version would use atomic.Pointer or a mutex.)
func (dc *DegradationController) UpdateMetrics(m *LoadMetrics) {
	dc.loadMetrics = m
}

func (dc *DegradationController) monitorLoop() {
	ticker := time.NewTicker(dc.checkInterval)
	defer ticker.Stop()

	for range ticker.C {
		metrics := dc.loadMetrics
		if metrics == nil {
			continue
		}
		level := dc.calculateLevel(metrics)
		oldLevel := DegradationLevel(dc.currentLevel.Load())
		if level != oldLevel {
			dc.currentLevel.Store(int32(level))
			dc.applyDegradation(level)
			dc.alertLevelChange(oldLevel, level)
		}
	}
}

func (dc *DegradationController) calculateLevel(m *LoadMetrics) DegradationLevel {
	// Calculate overall load score (0-1)
	cpuLoad := m.CPUPercent / 100
	memLoad := m.MemoryPercent / 100
	connLoad := float64(m.ConnectionCount) / float64(m.ConnectionLimit)
	queueLoad := float64(m.MessageQueueDepth) / float64(m.QueueLimit)

	// Weighted average, prioritizing connection and queue load
	loadScore := cpuLoad*0.2 + memLoad*0.2 + connLoad*0.3 + queueLoad*0.3

	switch {
	case loadScore >= 1.0:
		return LevelEmergency
	case loadScore >= 0.95:
		return LevelCritical
	case loadScore >= 0.85:
		return LevelDegraded
	case loadScore >= 0.70:
		return LevelWarning
	default:
		return LevelNormal
	}
}

func (dc *DegradationController) applyDegradation(level DegradationLevel) {
	flags := dc.featureFlags

	switch level {
	case LevelNormal:
		// Full functionality
		flags.EnableTypingIndicators = true
		flags.EnablePresenceUpdates = true
		flags.EnableReadReceipts = true
		flags.EnableRichPreviews = true
		flags.EnableHistoryLoad = true
		flags.AcceptNewConnections = true
		flags.EnableNonCriticalAPIs = true

	case LevelWarning:
		// Reduce frequency of non-critical updates
		flags.EnableTypingIndicators = true // But at reduced frequency
		flags.EnablePresenceUpdates = true  // But batched more aggressively
		flags.EnableReadReceipts = true
		flags.EnableRichPreviews = false // Disable link previews
		flags.EnableHistoryLoad = true
		flags.AcceptNewConnections = true
		flags.EnableNonCriticalAPIs = true

	case LevelDegraded:
		// Core functionality only
		flags.EnableTypingIndicators = false
		flags.EnablePresenceUpdates = false
		flags.EnableReadReceipts = false
		flags.EnableRichPreviews = false
		flags.EnableHistoryLoad = true // Keep but rate limit
		flags.AcceptNewConnections = true
		flags.EnableNonCriticalAPIs = false

	case LevelCritical:
		// Shed non-essential load
		flags.EnableTypingIndicators = false
		flags.EnablePresenceUpdates = false
		flags.EnableReadReceipts = false
		flags.EnableRichPreviews = false
		flags.EnableHistoryLoad = false  // Disable history loading
		flags.AcceptNewConnections = false // Stop accepting new connections
		flags.EnableNonCriticalAPIs = false

	case LevelEmergency:
		// System protection mode
		// Consider disconnecting low-value or idle connections
		flags.EnableTypingIndicators = false
		flags.EnablePresenceUpdates = false
		flags.EnableReadReceipts = false
		flags.EnableRichPreviews = false
		flags.EnableHistoryLoad = false
		flags.AcceptNewConnections = false
		flags.EnableNonCriticalAPIs = false

		// Trigger aggressive measures
		dc.triggerEmergencyMeasures()
	}
}

func (dc *DegradationController) triggerEmergencyMeasures() {
	// Consider:
	// 1. Disconnect idle connections (no activity for 5+ minutes)
	// 2. Reduce message queue retention
	// 3. Drop non-delivery-critical messages
	// 4. Page on-call for manual intervention
}

func (dc *DegradationController) alertLevelChange(oldLevel, newLevel DegradationLevel) {
	// Log and alert on degradation level changes
	// This should page on-call for Critical/Emergency
}

// ShouldProcess is middleware to check degradation before processing requests
func (dc *DegradationController) ShouldProcess(featureType string) bool {
	flags := dc.featureFlags
	switch featureType {
	case "typing_indicator":
		return flags.EnableTypingIndicators
	case "presence":
		return flags.EnablePresenceUpdates
	case "read_receipt":
		return flags.EnableReadReceipts
	case "rich_preview":
		return flags.EnableRichPreviews
	case "history":
		return flags.EnableHistoryLoad
	case "new_connection":
		return flags.AcceptNewConnections
	default:
		return flags.EnableNonCriticalAPIs
	}
}
```

Load Shedding Strategies:
| Strategy | Description | When to Use |
|---|---|---|
| Reject new connections | Stop accepting new connections | Approaching capacity |
| Drop non-critical messages | Skip typing indicators, read receipts | High load |
| Reduce broadcast scope | Limit presence to close contacts only | Very high load |
| Disconnect idle users | Close connections with no recent activity | Critical load |
| Priority queuing | Process paid/premium users first | Critical load |
| Circuit break downstream | Stop calling non-essential services | Cascade risk |
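The last strategy in the table, circuit breaking downstream calls, can be sketched with a minimal breaker. The thresholds and the half-open behavior here are illustrative, not from a specific library:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker for non-essential downstream calls.
    Opens after max_failures consecutive failures; after reset_s it
    lets a single probe through (half-open)."""

    def __init__(self, max_failures: int = 5, reset_s: float = 30.0):
        self.max_failures = max_failures
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_s:
            # Half-open: permit one probe; a failure re-opens immediately
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
```

Wrap calls to non-essential services (link previews, analytics) with `allow()`/`record()`; when the breaker is open, skip the call entirely so a slow dependency cannot cascade into the message path.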
When features are disabled, inform users. A subtle banner saying 'Some features temporarily unavailable' is better than users thinking the app is broken. This builds trust and reduces support tickets.
Real-time systems require real-time observability. Traditional request-response metrics (requests/second, latency percentiles) need augmentation with connection and message metrics.
Essential Metrics Dashboard:
| Category | Metric | Alert Threshold (example) |
|---|---|---|
| Connections | Total active connections | 90% of capacity |
| Connections | Connection rate (new/second) | 1000 (potential attack) |
| Connections | Disconnection rate | 10% in 5 minutes (issue!) |
| Messages | Messages in per second | 80% of capacity |
| Messages | Messages out per second | 80% of capacity |
| Messages | Message queue depth | 10,000 (backing up) |
| Latency | Message delivery p99 | 500ms |
| Latency | Connection establishment p99 | 2s |
| Errors | Failed message deliveries/min | 100 |
| Errors | Connection errors/min | 50 |
| Resources | CPU usage | 80% |
| Resources | Memory usage | 85% |
| Resources | File descriptors used | 90% |
| Cross-server | Pub/sub latency p99 | 100ms |
| Cross-server | Pub/sub message rate | 80% of Redis capacity |
```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Real-time system metrics
var (
	// Connection metrics
	ActiveConnections = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "realtime_active_connections",
		Help: "Current number of active WebSocket connections",
	})

	ConnectionsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "realtime_connections_total",
		Help: "Total connections by status",
	}, []string{"status"}) // "established", "closed", "rejected", "error"

	ConnectionDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_connection_duration_seconds",
		Help:    "Duration of WebSocket connections",
		Buckets: []float64{1, 10, 60, 300, 600, 1800, 3600, 7200},
	})

	// Message metrics
	MessagesReceived = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_messages_received_total",
		Help: "Total messages received from clients",
	})

	MessagesSent = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_messages_sent_total",
		Help: "Total messages sent to clients",
	})

	MessageDeliveryLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_message_delivery_latency_ms",
		Help:    "Time from message receipt to delivery to all recipients",
		Buckets: []float64{1, 5, 10, 25, 50, 100, 250, 500, 1000},
	})

	MessageQueueDepth = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "realtime_message_queue_depth",
		Help: "Current depth of the outbound message queue",
	})

	// Fanout metrics
	FanoutSize = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_fanout_recipients",
		Help:    "Number of recipients per broadcast message",
		Buckets: []float64{1, 5, 10, 25, 50, 100, 500, 1000, 5000},
	})

	// Error metrics
	DeliveryErrors = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "realtime_delivery_errors_total",
		Help: "Failed message deliveries by error type",
	}, []string{"error_type"})

	// Cross-server metrics
	PubSubPublishLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_pubsub_publish_latency_ms",
		Help:    "Latency of publishing to cross-server pub/sub",
		Buckets: []float64{0.5, 1, 2, 5, 10, 25, 50, 100},
	})

	PubSubMessagesReceived = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_pubsub_messages_received_total",
		Help: "Messages received from other servers via pub/sub",
	})

	// Presence metrics
	PresenceUpdates = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_presence_updates_total",
		Help: "Presence status changes broadcasted",
	})

	OnlineUsers = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "realtime_online_users",
		Help: "Unique users currently online",
	})
)

// TrackMessageDelivery measures end-to-end message delivery
func TrackMessageDelivery(startTime time.Time, recipientCount int) {
	latencyMs := float64(time.Since(startTime).Milliseconds())
	MessageDeliveryLatency.Observe(latencyMs)
	FanoutSize.Observe(float64(recipientCount))
	MessagesSent.Add(float64(recipientCount))
}

// Example Grafana dashboard queries (PromQL):
//
// # Message delivery p99 latency
// histogram_quantile(0.99, rate(realtime_message_delivery_latency_ms_bucket[5m]))
//
// # Connection churn rate
// rate(realtime_connections_total{status="established"}[5m]) +
// rate(realtime_connections_total{status="closed"}[5m])
//
// # Capacity usage
// realtime_active_connections / realtime_connection_capacity
//
// # Error rate
// rate(realtime_delivery_errors_total[5m]) / rate(realtime_messages_sent_total[5m])
```

Distributed Tracing for Real-Time:
Tracing a message through a real-time system:
Unlike request-response, a single message can spawn many traces across servers. Use trace propagation headers in your pub/sub messages.
At millions of messages/second, tracing every message is prohibitively expensive. Sample strategically: 100% of errors, 100% of slow messages (p99+), 1% of normal traffic. This gives visibility without drowning in data.
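The sampling policy above can be expressed as a small decision function applied when a message finishes delivery. The thresholds come from the text; the function name and p99 parameter are illustrative:

```python
import random


def should_trace(is_error: bool, latency_ms: float,
                 p99_ms: float = 500.0, base_rate: float = 0.01) -> bool:
    """Keep 100% of errors, 100% of slow (p99+) messages,
    and base_rate (1%) of normal traffic."""
    if is_error or latency_ms >= p99_ms:
        return True
    return random.random() < base_rate
```

In practice the p99 threshold would be fed from live metrics rather than hardcoded, and the decision should be propagated with the trace context so downstream servers sample consistently.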
Real-time systems have unique operational challenges. Here are runbooks for common scenarios:
Runbook 1: Sudden Connection Spike
Symptoms: Connection count rising rapidly, approaching capacity
Investigation:
Response:
Runbook 2: High Message Latency
Symptoms: Message delivery p99 > 500ms
Investigation:
Response:
Runbook 3: Mass Disconnection Event
Symptoms: Disconnection rate > 10% in 5 minutes
Investigation:
Response:
````markdown
# Real-Time System Incident Response Checklist

## Severity Classification

| Severity | Definition | Response Time | Example |
|----------|------------|---------------|---------|
| SEV1 | Complete outage | Immediate | All connections dropped |
| SEV2 | Major degradation | 15 min | >50% latency increase |
| SEV3 | Partial degradation | 1 hour | One region affected |
| SEV4 | Minor issue | 24 hours | Non-critical feature broken |

## Initial Response (First 5 Minutes)

- [ ] Acknowledge alert and page
- [ ] Open incident channel (#incident-YYYYMMDD-N)
- [ ] Initial assessment: scope, user impact, trend direction
- [ ] Post initial update to status page
- [ ] Determine if this needs escalation

## Investigation Checklist

### Connections
- [ ] Current connection count vs capacity
- [ ] New connections per second
- [ ] Disconnection rate and reasons
- [ ] Geographic distribution of issues

### Messages
- [ ] Message rate (in and out)
- [ ] Delivery latency percentiles
- [ ] Queue depths
- [ ] Failed delivery count

### Infrastructure
- [ ] CPU/memory on connection servers
- [ ] Redis/pub-sub health
- [ ] Network connectivity and latency
- [ ] Recent deployments or changes

### Downstream Dependencies
- [ ] Auth service health
- [ ] Database connectivity
- [ ] Storage service health

## Common Mitigations

### Emergency Traffic Reduction

```bash
# Enable degradation mode
kubectl set env deployment/connection-server DEGRADATION_LEVEL=3

# Stop accepting new connections
kubectl set env deployment/connection-server ACCEPT_NEW_CONNECTIONS=false

# Increase pod count
kubectl scale deployment/connection-server --replicas=50
```

### Connection Server Replacement

```bash
# Drain specific server
curl -X POST http://server-1:9000/admin/drain

# Wait for connections to migrate
# Monitor: realtime_active_connections{server="server-1"}

# Terminate server
kubectl delete pod connection-server-xxxxx
```

## Post-Incident

- [ ] All-clear posted to status page
- [ ] Incident channel summary written
- [ ] Timeline documented
- [ ] Action items captured
- [ ] Post-mortem scheduled (within 48 hours for SEV1/SEV2)
````

Run regular game days where you intentionally cause failures (kill servers, inject latency, simulate attacks) to practice incident response. Real incidents are smoother when the team has rehearsed the response.
Persistent connections are expensive. Unlike HTTP where you pay per request, real-time systems pay for every second a connection is open—even if the user is idle.
Cost Breakdown Example:
1,000,000 concurrent connections
Server cost: $0.50/hour per server handling 100K connections
Servers needed: 15 (with headroom)
Monthly server cost: 15 × $0.50 × 24 × 30 = $5,400
Plus:
- Redis for pub/sub: $1,500/month
- Load balancers: $500/month
- Data transfer: $2,000/month (at 1.3 Gbps egress)
- Monitoring/logging: $500/month
Total: ~$10,000/month for 1M connections
Cost per connection: $0.01/month
Cost per MAU (if 5% of MAUs are concurrent on average, i.e., 20M MAU): ~$0.0005/month
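The per-user economics above can be sanity-checked with a small helper; the function name is hypothetical, and the concurrency ratio is the fraction of MAUs connected at any given moment:

```python
def monthly_cost_per_mau(total_monthly_cost: float,
                         concurrent_users: int,
                         concurrency_ratio: float) -> float:
    """Convert total monthly cost into cost per monthly active user,
    given what fraction of MAUs are concurrently connected on average."""
    mau = concurrent_users / concurrency_ratio
    return total_monthly_cost / mau


# $10,000/month, 1M concurrent, 5% concurrency ratio -> 20M MAU
cost = monthly_cost_per_mau(10_000, 1_000_000, 0.05)  # ~$0.0005 per MAU
```

The concurrency ratio is the most uncertain input; measure it from your own presence data rather than assuming an industry figure.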
Cost Optimization Strategies:
| Strategy | Savings | Tradeoff |
|---|---|---|
| Disconnect idle connections | 30-50% | Reconnection latency, user experience |
| Use spot/preemptible instances | 50-70% | Instance termination, need graceful handling |
| Right-size instances | 20-40% | Less headroom for spikes |
| Compress messages | 20-40% data transfer | CPU overhead |
| Reduce update frequency | Variable | Less real-time experience |
| Use cheaper regions | 10-30% | Higher latency for users |
| Reserved instances | 30-60% | Commitment, less flexibility |
Idle Connection Management:
Many connections are idle most of the time. Consider:
- Aggressive idle timeout: Disconnect after 5 minutes of no activity (not just no messages—no mouse movement/app focus)
- Push-to-pull fallback: When a user goes idle, switch to polling or push notifications instead of a persistent connection
- Background/foreground differentiation: Mobile apps in the background should disconnect; deliver via push notifications
- Tiered service levels: Free users get more aggressive timeouts; paid users get persistent connections
```go
package costaware

import (
	"time"
)

type ConnectionTier int

const (
	TierFree ConnectionTier = iota
	TierPro
	TierEnterprise
)

type CostAwareConnectionPolicy struct {
	tier ConnectionTier
}

func (p *CostAwareConnectionPolicy) GetIdleTimeout() time.Duration {
	switch p.tier {
	case TierFree:
		return 5 * time.Minute // Disconnect free users after 5 min idle
	case TierPro:
		return 30 * time.Minute // Pro users get 30 minutes
	case TierEnterprise:
		return 24 * time.Hour // Enterprise: essentially no timeout
	default:
		return 5 * time.Minute
	}
}

func (p *CostAwareConnectionPolicy) GetHeartbeatInterval() time.Duration {
	switch p.tier {
	case TierFree:
		return 60 * time.Second // Less frequent heartbeats
	case TierPro:
		return 30 * time.Second
	case TierEnterprise:
		return 15 * time.Second // Faster detection
	default:
		return 60 * time.Second
	}
}

func (p *CostAwareConnectionPolicy) ShouldUseCompression() bool {
	// Compression uses CPU but saves bandwidth;
	// worth it for high-traffic scenarios
	return true
}

func (p *CostAwareConnectionPolicy) GetMessageBatchWindow() time.Duration {
	switch p.tier {
	case TierFree:
		return 500 * time.Millisecond // Batch more for efficiency
	case TierPro:
		return 100 * time.Millisecond
	case TierEnterprise:
		return 0 // No batching, immediate delivery
	default:
		return 500 * time.Millisecond
	}
}

// BackgroundConnectionHandler handles mobile apps going to the background.
// (Connection is the server's connection type, defined elsewhere.)
type BackgroundConnectionHandler struct {
	pushNotificationService PushService
}

func (h *BackgroundConnectionHandler) OnAppBackground(conn *Connection) {
	// Schedule disconnection after a delay
	// (the user might quickly return to the app)
	time.AfterFunc(30*time.Second, func() {
		if conn.IsStillBackground() {
			// Disconnect WebSocket, use push for notifications
			conn.Close("app_background")
			h.pushNotificationService.RegisterForPush(conn.UserID)
		}
	})
}

func (h *BackgroundConnectionHandler) OnAppForeground(userID string) {
	// User returned - they'll reconnect via WebSocket.
	// Messages received via push while away will show in history load.
	h.pushNotificationService.UnregisterForPush(userID)
}

type PushService interface {
	RegisterForPush(userID string)
	UnregisterForPush(userID string)
}
```

Know your cost per user. If you charge $5/month and real-time costs $0.50/user, that's 10% of revenue. For free products, every optimization directly improves unit economics. Track cost per connection and cost per message as key metrics alongside performance.
The real-time landscape continues to evolve. Here are trends to watch:
1. Edge Computing for Real-Time:
Push computation closer to users:
2. WebTransport:
Successor to WebSocket with:
3. CRDTs Going Mainstream:
Conflict-free data structures enabling:
4. AI-Powered Real-Time:
5. WebRTC for More Than Video:
Peer-to-peer data channels for:
| Technology | What It Enables | Maturity | When to Consider |
|---|---|---|---|
| WebTransport | Unreliable streams, HTTP/3 | Early adoption | New gaming/video projects |
| Durable Objects | Edge stateful computing | Production | Global low-latency apps |
| NATS JetStream | Persistent real-time streams | Production | Event-driven architectures |
| Y-CRDT | Collaborative editing | Production | Document/canvas collaboration |
| WebRTC DataChannel | P2P data transfer | Mature | Video + data, gaming |
While it's tempting to adopt every new technology, most production systems should use proven patterns (WebSocket + Redis pub/sub, good observability). Adopt new technologies incrementally, validating in production with limited scope before full rollout.
Scaling real-time systems requires thinking differently: you're managing persistent connections, not transient requests; state, not just computation; continuous flow, not discrete events.
Congratulations! You've completed the Real-Time Architecture Patterns module. You now understand connection management, presence systems, real-time collaboration, gaming architectures, and the scaling considerations that tie them all together. These patterns form the foundation of modern interactive applications—from chat and collaboration to gaming and beyond.