Every real-time system designer eventually faces the same question: "What happens when we have 10x, 100x, or 1000x more users?"
Scaling real-time systems is fundamentally harder than scaling request-response systems. When a web server gets overloaded, you can add more servers and distribute requests. But when a WebSocket server reaches capacity, you can't simply move existing connections—they're persistent, stateful, and users will notice if they're disrupted.
This page synthesizes the scaling challenges we've touched on throughout this module and provides a comprehensive framework for capacity planning, horizontal scaling, graceful degradation, and operational excellence in real-time systems.
By the end of this page, you will understand:

- Capacity planning methodologies for real-time systems
- Horizontal scaling patterns and their tradeoffs
- Graceful degradation strategies when systems are overwhelmed
- Monitoring, alerting, and observability best practices
- Operational runbooks for common real-time system issues
- Cost optimization strategies for persistent connections
Capacity planning for real-time systems differs from traditional web applications because you're not just handling request throughput—you're managing concurrent connections that consume resources continuously.
Key Metrics to Model:
| Metric | What It Measures | Typical Bottleneck | How to Measure |
|---|---|---|---|
| Concurrent connections | Active persistent connections | File descriptors, memory | Connection counter + peaks |
| Message rate (in) | Messages received per second | CPU, event loop | Network ingress + message parsing |
| Message rate (out) | Messages sent per second | CPU, network egress | Fan-out × message rate |
| Fan-out ratio | Recipients per message | Cross-server communication | Avg subscribers per broadcast |
| Message size | Bytes per message | Memory, network | Sample and histogram |
| Connection duration | How long connections persist | State management | Distribution analysis |
| Reconnection rate | How often clients reconnect | Auth service, connection handling | Reconnect events / time |
Capacity Model Example:
Let's model a chat application:
Target: 1,000,000 concurrent users
Typical user behavior:
- Average 2 rooms per user
- Average 50 users per room
- Average 1 message sent per user per minute
- Average message size: 200 bytes
Calculations:
- Total rooms: 1,000,000 users × 2 rooms / 50 users per room = 40,000 active rooms
- Messages/second: 1,000,000 / 60 = ~16,700 messages/second incoming
- Fan-out: Each message reaches 50 users = 835,000 messages/second outgoing
- Bandwidth out: 835,000 × 200 bytes = 167 MB/second = 1.3 Gbps
Server Sizing:
If each server handles:
- 100,000 connections max
- 50,000 messages/second throughput
Then you need:
- Connections: 1,000,000 / 100,000 = 10 servers
- Throughput: 835,000 / 50,000 = 17 servers
- Answer: 17 servers minimum (throughput limited)
- Recommended: 26 servers (50% headroom for spikes)
```python
"""Capacity Planning Model for Real-Time Chat System"""

from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    concurrent_users: int
    avg_rooms_per_user: float
    avg_users_per_room: int
    msgs_per_user_per_minute: float
    avg_message_bytes: int
    # Optional: spiky traffic modeling
    peak_multiplier: float = 2.0  # Peak = 2x average
    daily_peak_duration_hours: float = 4.0


@dataclass
class ServerCapacity:
    max_connections: int
    max_msgs_per_second_in: int
    max_msgs_per_second_out: int
    max_bandwidth_mbps: int
    cost_per_hour: float


@dataclass
class CapacityPlan:
    servers_needed: int
    limiting_factor: str
    headroom_percent: float
    estimated_monthly_cost: float
    # Breakdown
    connection_limited: int
    ingest_limited: int
    fanout_limited: int
    bandwidth_limited: int


def calculate_capacity(
    workload: WorkloadProfile,
    server: ServerCapacity,
    headroom: float = 0.5,  # 50% headroom
) -> CapacityPlan:
    """Calculate required server count for a real-time workload."""
    # Derived metric: total room memberships / room size = active room count
    total_rooms = (
        workload.concurrent_users * workload.avg_rooms_per_user
        / workload.avg_users_per_room
    )

    # Incoming message rate
    msgs_per_second_in = (
        workload.concurrent_users * workload.msgs_per_user_per_minute / 60
    )

    # Fan-out: each message goes to room members
    avg_fanout = workload.avg_users_per_room
    msgs_per_second_out = msgs_per_second_in * avg_fanout

    # Bandwidth requirements
    bandwidth_in_mbps = (msgs_per_second_in * workload.avg_message_bytes * 8) / 1_000_000
    bandwidth_out_mbps = (msgs_per_second_out * workload.avg_message_bytes * 8) / 1_000_000

    # Server count required along each dimension
    connection_limited = workload.concurrent_users / server.max_connections
    ingest_limited = msgs_per_second_in / server.max_msgs_per_second_in
    fanout_limited = msgs_per_second_out / server.max_msgs_per_second_out
    bandwidth_limited = max(bandwidth_in_mbps, bandwidth_out_mbps) / server.max_bandwidth_mbps

    # Apply peak multiplier
    peak_connection = connection_limited * workload.peak_multiplier
    peak_ingest = ingest_limited * workload.peak_multiplier
    peak_fanout = fanout_limited * workload.peak_multiplier
    peak_bandwidth = bandwidth_limited * workload.peak_multiplier

    # Find the limiting factor
    limits = {
        'connections': peak_connection,
        'message_ingest': peak_ingest,
        'message_fanout': peak_fanout,
        'bandwidth': peak_bandwidth,
    }
    limiting_factor = max(limits, key=limits.get)
    min_servers = max(limits.values())

    # Add headroom
    servers_needed = int(min_servers * (1 + headroom)) + 1

    # Cost estimation (assuming 24x7 operation)
    monthly_cost = servers_needed * server.cost_per_hour * 24 * 30

    return CapacityPlan(
        servers_needed=servers_needed,
        limiting_factor=limiting_factor,
        headroom_percent=headroom * 100,
        estimated_monthly_cost=monthly_cost,
        connection_limited=int(connection_limited),
        ingest_limited=int(ingest_limited),
        fanout_limited=int(fanout_limited),
        bandwidth_limited=int(bandwidth_limited),
    )


# Example usage
if __name__ == "__main__":
    workload = WorkloadProfile(
        concurrent_users=1_000_000,
        avg_rooms_per_user=2.0,
        avg_users_per_room=50,
        msgs_per_user_per_minute=1.0,
        avg_message_bytes=200,
    )
    server = ServerCapacity(
        max_connections=100_000,
        max_msgs_per_second_in=10_000,
        max_msgs_per_second_out=50_000,
        max_bandwidth_mbps=1000,
        cost_per_hour=0.50,
    )
    plan = calculate_capacity(workload, server)
    print(f"""Capacity Plan for {workload.concurrent_users:,} concurrent users:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Servers needed:  {plan.servers_needed}
Limiting factor: {plan.limiting_factor}
Headroom:        {plan.headroom_percent}%

Breakdown (without headroom):
  - Connection-limited: {plan.connection_limited} servers
  - Ingest-limited:     {plan.ingest_limited} servers
  - Fanout-limited:     {plan.fanout_limited} servers
  - Bandwidth-limited:  {plan.bandwidth_limited} servers

Estimated monthly cost: ${plan.estimated_monthly_cost:,.0f}
""")
```

Real-time systems often have pronounced peaks (evening hours, virtual events, breaking news). A system sized for average load will fail at peak.
Always model peak traffic separately—typically 2-5x average—and ensure capacity handles peaks with headroom.
Horizontal scaling adds servers to handle more load. For real-time systems, this requires careful design to maintain consistency and enable cross-server communication.
Pattern 1: Shared-Nothing Partitioning
Each server owns a partition of data/users. No cross-server communication needed for within-partition operations.
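One way to implement shared-nothing routing is a consistent-hash ring: every node computes the same room-to-server mapping without coordination, and adding or removing a server only remaps a small fraction of rooms. A minimal illustrative sketch (the `ws-1`/`ws-2`/`ws-3` server names are hypothetical, not from the text):

```python
import hashlib
from bisect import bisect


class ConsistentHashRing:
    """Maps room IDs to servers deterministically (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        # Place each server at many virtual points for even distribution
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        # Stable across processes, unlike Python's randomized hash()
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def server_for(self, room_id: str) -> str:
        # First virtual node clockwise from the room's hash, wrapping around
        idx = bisect(self._ring, (self._hash(room_id),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["ws-1", "ws-2", "ws-3"])
server = ring.server_for("room:general")  # every node computes the same owner
```

Because routing is deterministic, a load balancer or gateway can send a user straight to the server that owns their room, and no cross-server traffic is needed.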
Pattern 2: Pub/Sub Backbone
Servers don't talk directly. All cross-server messages go through a message broker (Redis, NATS, Kafka).
```go
package scaling

import (
	"context"
	"encoding/json"

	"github.com/go-redis/redis/v8"
)

// PubSubBackbone enables cross-server message routing.
// (A production version would guard subscriptions with a mutex.)
type PubSubBackbone struct {
	redis         *redis.Client
	serverID      string
	localDelivery LocalDeliveryFunc
	subscriptions map[string]*redis.PubSub
}

type LocalDeliveryFunc func(channelID string, message []byte)

type CrossServerMessage struct {
	SourceServer string `json:"source"`
	ChannelID    string `json:"channel"`
	Payload      []byte `json:"payload"`
}

func NewPubSubBackbone(
	redisClient *redis.Client,
	serverID string,
	deliveryFunc LocalDeliveryFunc,
) *PubSubBackbone {
	return &PubSubBackbone{
		redis:         redisClient,
		serverID:      serverID,
		localDelivery: deliveryFunc,
		subscriptions: make(map[string]*redis.PubSub),
	}
}

// Publish sends a message to all servers with subscribers in this channel
func (p *PubSubBackbone) Publish(ctx context.Context, channelID string, message []byte) error {
	msg := CrossServerMessage{
		SourceServer: p.serverID,
		ChannelID:    channelID,
		Payload:      message,
	}
	data, err := json.Marshal(msg)
	if err != nil {
		return err
	}
	// Redis Pub/Sub topic per logical channel
	topic := "realtime:" + channelID
	return p.redis.Publish(ctx, topic, data).Err()
}

// Subscribe starts listening for messages on a channel
func (p *PubSubBackbone) Subscribe(ctx context.Context, channelID string) error {
	if _, ok := p.subscriptions[channelID]; ok {
		return nil // Already subscribed
	}
	topic := "realtime:" + channelID
	pubsub := p.redis.Subscribe(ctx, topic)
	p.subscriptions[channelID] = pubsub

	// Handle incoming messages in a goroutine
	go func() {
		defer pubsub.Close()
		for {
			select {
			case <-ctx.Done():
				return
			case msg, ok := <-pubsub.Channel():
				if !ok {
					return
				}
				var crossMsg CrossServerMessage
				if err := json.Unmarshal([]byte(msg.Payload), &crossMsg); err != nil {
					continue
				}
				// Don't deliver our own messages back to us
				if crossMsg.SourceServer == p.serverID {
					continue
				}
				// Deliver to local connections
				p.localDelivery(crossMsg.ChannelID, crossMsg.Payload)
			}
		}
	}()
	return nil
}

// Unsubscribe stops listening for a channel (when no local subscribers remain)
func (p *PubSubBackbone) Unsubscribe(ctx context.Context, channelID string) error {
	pubsub, ok := p.subscriptions[channelID]
	if !ok {
		return nil
	}
	delete(p.subscriptions, channelID)
	// Closing the subscription also unsubscribes from the Redis topic
	return pubsub.Close()
}

// Using the backbone in a connection server:

type ConnectionServer struct {
	backbone   *PubSubBackbone
	localRooms map[string]map[string]*Connection // roomID -> connectionID -> Connection
}

func (s *ConnectionServer) BroadcastToRoom(roomID string, message []byte) {
	// 1. Deliver to local connections
	if room, ok := s.localRooms[roomID]; ok {
		for _, conn := range room {
			conn.Send(message)
		}
	}

	// 2. Publish to other servers
	ctx := context.Background()
	s.backbone.Publish(ctx, roomID, message)
}

func (s *ConnectionServer) handleCrossServerMessage(roomID string, message []byte) {
	// Deliver to local connections only
	if room, ok := s.localRooms[roomID]; ok {
		for _, conn := range room {
			conn.Send(message)
		}
	}
}
```

Pattern 3: Sticky Sessions with Reliable Handoff
Users stick to one server, but can be moved during scale-down or server maintenance.
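One hedged sketch of the handoff half of this pattern: before taking a server down, tell each client to reconnect elsewhere, jittered across a grace window so they don't stampede the remaining servers, then close whatever is left. The connection's async `send()`/`close()` API here is an assumption, not a specific framework:

```python
import asyncio
import json
import random


async def _ask_to_move(conn, grace_s: float) -> None:
    # Jitter each client's reconnect to avoid a thundering herd
    await asyncio.sleep(random.uniform(0, grace_s))
    await conn.send(json.dumps({"type": "reconnect", "reason": "server_draining"}))
    await conn.close(code=1001)  # WebSocket close code 1001 = "going away"


async def drain_server(connections, grace_s: float = 60.0) -> None:
    """Drain all connections over a grace window before shutdown."""
    await asyncio.gather(*(_ask_to_move(c, grace_s) for c in connections))
```

Clients that receive the `reconnect` message should resume through the load balancer, which routes them to a healthy server; the grace window bounds how long a rolling deploy takes.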
Comparing Patterns:
| Pattern | Pros | Cons | Best For |
|---|---|---|---|
| Shared-Nothing | Simple, no cross-server traffic | Users in same room must be on same server | Naturally partitionable data (chat rooms, games) |
| Pub/Sub Backbone | Flexible, users can be anywhere | Message broker becomes SPOF | General-purpose, user-to-user messaging |
| Sticky + Handoff | Simple mental model | Handoff complexity, temporary disconnects | Session-based applications |
| Hybrid | Best of both worlds | Implementation complexity | Large-scale production systems |
Standard Redis Pub/Sub has limits (~100K messages/second). For higher scale, use Redis Cluster (sharded by channel) or purpose-built systems like NATS, Kafka, or AWS EventBridge. NATS is particularly designed for real-time pub/sub and can handle millions of messages per second.
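Sharding by channel only works if every server computes the same channel-to-shard mapping. A minimal sketch, assuming a pool of Redis clients indexed by shard number (the pool itself is a hypothetical variable):

```python
import zlib


def shard_for_channel(channel_id: str, num_shards: int) -> int:
    """Pick the Redis shard that owns a channel. Uses a stable hash
    (crc32) so all servers agree; Python's built-in hash() is
    randomized per process and must not be used here."""
    return zlib.crc32(channel_id.encode("utf-8")) % num_shards


# Usage sketch: publish and subscribe via the owning shard's client
# shard = shard_for_channel("room:42", len(redis_clients))
# redis_clients[shard].publish("realtime:room:42", payload)
```

The same function routes both publishes and subscriptions, so a channel's traffic never crosses shards; resharding requires a coordinated remap, which is one reason systems outgrow this toward NATS or Kafka.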
When systems are overwhelmed, they shouldn't crash—they should degrade gracefully, preserving core functionality while shedding non-essential load.
Degradation Hierarchy:
| Level | Response | User Impact | Trigger |
|---|---|---|---|
| Normal | Full functionality | None | Load < 70% capacity |
| Warning | Reduce nice-to-have features | Minor (slower typing indicators) | Load > 70% |
| Degraded | Core functionality only | Noticeable (no presence updates) | Load > 85% |
| Critical | Shed load, reject new connections | Severe (some users disconnected) | Load > 95% |
| Emergency | System protection mode | Major (service unavailable for new users) | Load > 100% |
```go
package degradation

import (
	"sync/atomic"
	"time"
)

type DegradationLevel int

const (
	LevelNormal DegradationLevel = iota
	LevelWarning
	LevelDegraded
	LevelCritical
	LevelEmergency
)

// DegradationController manages system degradation based on load
type DegradationController struct {
	currentLevel  atomic.Int32
	loadMetrics   *LoadMetrics
	featureFlags  *FeatureFlags
	checkInterval time.Duration
}

type LoadMetrics struct {
	CPUPercent        float64
	MemoryPercent     float64
	ConnectionCount   int64
	ConnectionLimit   int64
	MessageQueueDepth int64
	QueueLimit        int64
}

type FeatureFlags struct {
	EnableTypingIndicators bool
	EnablePresenceUpdates  bool
	EnableReadReceipts     bool
	EnableRichPreviews     bool
	EnableHistoryLoad      bool
	AcceptNewConnections   bool
	EnableNonCriticalAPIs  bool
}

func NewDegradationController() *DegradationController {
	dc := &DegradationController{
		checkInterval: time.Second * 5,
		featureFlags:  defaultFeatureFlags(),
	}
	go dc.monitorLoop()
	return dc
}

func defaultFeatureFlags() *FeatureFlags {
	return &FeatureFlags{
		EnableTypingIndicators: true,
		EnablePresenceUpdates:  true,
		EnableReadReceipts:     true,
		EnableRichPreviews:     true,
		EnableHistoryLoad:      true,
		AcceptNewConnections:   true,
		EnableNonCriticalAPIs:  true,
	}
}

// UpdateMetrics publishes the latest load snapshot to the controller.
// (A production version would use atomic.Pointer or a mutex.)
func (dc *DegradationController) UpdateMetrics(m *LoadMetrics) {
	dc.loadMetrics = m
}

func (dc *DegradationController) monitorLoop() {
	ticker := time.NewTicker(dc.checkInterval)
	defer ticker.Stop()

	for range ticker.C {
		metrics := dc.loadMetrics
		if metrics == nil {
			continue
		}
		level := dc.calculateLevel(metrics)
		oldLevel := DegradationLevel(dc.currentLevel.Load())
		if level != oldLevel {
			dc.currentLevel.Store(int32(level))
			dc.applyDegradation(level)
			dc.alertLevelChange(oldLevel, level)
		}
	}
}

func (dc *DegradationController) calculateLevel(m *LoadMetrics) DegradationLevel {
	// Calculate overall load score (0-1)
	cpuLoad := m.CPUPercent / 100
	memLoad := m.MemoryPercent / 100
	connLoad := float64(m.ConnectionCount) / float64(m.ConnectionLimit)
	queueLoad := float64(m.MessageQueueDepth) / float64(m.QueueLimit)

	// Weighted average, prioritizing connection and queue load
	loadScore := cpuLoad*0.2 + memLoad*0.2 + connLoad*0.3 + queueLoad*0.3

	switch {
	case loadScore >= 1.0:
		return LevelEmergency
	case loadScore >= 0.95:
		return LevelCritical
	case loadScore >= 0.85:
		return LevelDegraded
	case loadScore >= 0.70:
		return LevelWarning
	default:
		return LevelNormal
	}
}

func (dc *DegradationController) applyDegradation(level DegradationLevel) {
	flags := dc.featureFlags

	switch level {
	case LevelNormal:
		// Full functionality
		flags.EnableTypingIndicators = true
		flags.EnablePresenceUpdates = true
		flags.EnableReadReceipts = true
		flags.EnableRichPreviews = true
		flags.EnableHistoryLoad = true
		flags.AcceptNewConnections = true
		flags.EnableNonCriticalAPIs = true

	case LevelWarning:
		// Reduce frequency of non-critical updates
		flags.EnableTypingIndicators = true // But at reduced frequency
		flags.EnablePresenceUpdates = true  // But batched more aggressively
		flags.EnableReadReceipts = true
		flags.EnableRichPreviews = false // Disable link previews
		flags.EnableHistoryLoad = true
		flags.AcceptNewConnections = true
		flags.EnableNonCriticalAPIs = true

	case LevelDegraded:
		// Core functionality only
		flags.EnableTypingIndicators = false
		flags.EnablePresenceUpdates = false
		flags.EnableReadReceipts = false
		flags.EnableRichPreviews = false
		flags.EnableHistoryLoad = true // Keep but rate limit
		flags.AcceptNewConnections = true
		flags.EnableNonCriticalAPIs = false

	case LevelCritical:
		// Shed non-essential load
		flags.EnableTypingIndicators = false
		flags.EnablePresenceUpdates = false
		flags.EnableReadReceipts = false
		flags.EnableRichPreviews = false
		flags.EnableHistoryLoad = false  // Disable history loading
		flags.AcceptNewConnections = false // Stop accepting new connections
		flags.EnableNonCriticalAPIs = false

	case LevelEmergency:
		// System protection mode
		// Consider disconnecting low-value or idle connections
		flags.EnableTypingIndicators = false
		flags.EnablePresenceUpdates = false
		flags.EnableReadReceipts = false
		flags.EnableRichPreviews = false
		flags.EnableHistoryLoad = false
		flags.AcceptNewConnections = false
		flags.EnableNonCriticalAPIs = false

		// Trigger aggressive measures
		dc.triggerEmergencyMeasures()
	}
}

func (dc *DegradationController) triggerEmergencyMeasures() {
	// Consider:
	// 1. Disconnect idle connections (no activity for 5+ minutes)
	// 2. Reduce message queue retention
	// 3. Drop non-delivery-critical messages
	// 4. Page on-call for manual intervention
}

func (dc *DegradationController) alertLevelChange(oldLevel, newLevel DegradationLevel) {
	// Log and alert on degradation level changes
	// This should page on-call for Critical/Emergency
}

// ShouldProcess is middleware to check degradation before processing requests
func (dc *DegradationController) ShouldProcess(featureType string) bool {
	flags := dc.featureFlags
	switch featureType {
	case "typing_indicator":
		return flags.EnableTypingIndicators
	case "presence":
		return flags.EnablePresenceUpdates
	case "read_receipt":
		return flags.EnableReadReceipts
	case "rich_preview":
		return flags.EnableRichPreviews
	case "history":
		return flags.EnableHistoryLoad
	case "new_connection":
		return flags.AcceptNewConnections
	default:
		return flags.EnableNonCriticalAPIs
	}
}
```

Load Shedding Strategies:
| Strategy | Description | When to Use |
|---|---|---|
| Reject new connections | Stop accepting new connections | Approaching capacity |
| Drop non-critical messages | Skip typing indicators, read receipts | High load |
| Reduce broadcast scope | Limit presence to close contacts only | Very high load |
| Disconnect idle users | Close connections with no recent activity | Critical load |
| Priority queuing | Process paid/premium users first | Critical load |
| Circuit break downstream | Stop calling non-essential services | Cascade risk |
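The last strategy in the table, circuit breaking downstream calls, can be sketched with a minimal breaker. The thresholds and the half-open behavior here are illustrative, not from a specific library:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker for non-essential downstream calls.
    Opens after max_failures consecutive failures; after reset_s it
    lets a single probe through (half-open)."""

    def __init__(self, max_failures: int = 5, reset_s: float = 30.0):
        self.max_failures = max_failures
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_s:
            # Half-open: permit one probe; a failure re-opens immediately
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
```

Wrap calls to non-essential services (link previews, analytics) with `allow()`/`record()`; when the breaker is open, skip the call entirely so a slow dependency cannot cascade into the message path.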
When features are disabled, inform users. A subtle banner saying 'Some features temporarily unavailable' is better than users thinking the app is broken. This builds trust and reduces support tickets.
Real-time systems require real-time observability. Traditional request-response metrics (requests/second, latency percentiles) need augmentation with connection and message metrics.
Essential Metrics Dashboard:
| Category | Metric | Alert Threshold (example) |
|---|---|---|
| Connections | Total active connections | 90% of capacity |
| Connections | Connection rate (new/second) | 1000 (potential attack) |
| Connections | Disconnection rate | 10% in 5 minutes (issue!) |
| Messages | Messages in per second | 80% of capacity |
| Messages | Messages out per second | 80% of capacity |
| Messages | Message queue depth | 10,000 (backing up) |
| Latency | Message delivery p99 | 500ms |
| Latency | Connection establishment p99 | 2s |
| Errors | Failed message deliveries/min | 100 |
| Errors | Connection errors/min | 50 |
| Resources | CPU usage | 80% |
| Resources | Memory usage | 85% |
| Resources | File descriptors used | 90% |
| Cross-server | Pub/sub latency p99 | 100ms |
| Cross-server | Pub/sub message rate | 80% of Redis capacity |
```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Real-time system metrics
var (
	// Connection metrics
	ActiveConnections = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "realtime_active_connections",
		Help: "Current number of active WebSocket connections",
	})

	ConnectionsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "realtime_connections_total",
		Help: "Total connections by status",
	}, []string{"status"}) // "established", "closed", "rejected", "error"

	ConnectionDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_connection_duration_seconds",
		Help:    "Duration of WebSocket connections",
		Buckets: []float64{1, 10, 60, 300, 600, 1800, 3600, 7200},
	})

	// Message metrics
	MessagesReceived = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_messages_received_total",
		Help: "Total messages received from clients",
	})

	MessagesSent = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_messages_sent_total",
		Help: "Total messages sent to clients",
	})

	MessageDeliveryLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_message_delivery_latency_ms",
		Help:    "Time from message receipt to delivery to all recipients",
		Buckets: []float64{1, 5, 10, 25, 50, 100, 250, 500, 1000},
	})

	MessageQueueDepth = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "realtime_message_queue_depth",
		Help: "Current depth of the outbound message queue",
	})

	// Fanout metrics
	FanoutSize = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_fanout_recipients",
		Help:    "Number of recipients per broadcast message",
		Buckets: []float64{1, 5, 10, 25, 50, 100, 500, 1000, 5000},
	})

	// Error metrics
	DeliveryErrors = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "realtime_delivery_errors_total",
		Help: "Failed message deliveries by error type",
	}, []string{"error_type"})

	// Cross-server metrics
	PubSubPublishLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "realtime_pubsub_publish_latency_ms",
		Help:    "Latency of publishing to cross-server pub/sub",
		Buckets: []float64{0.5, 1, 2, 5, 10, 25, 50, 100},
	})

	PubSubMessagesReceived = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_pubsub_messages_received_total",
		Help: "Messages received from other servers via pub/sub",
	})

	// Presence metrics
	PresenceUpdates = promauto.NewCounter(prometheus.CounterOpts{
		Name: "realtime_presence_updates_total",
		Help: "Presence status changes broadcasted",
	})

	OnlineUsers = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "realtime_online_users",
		Help: "Unique users currently online",
	})
)

// TrackMessageDelivery measures end-to-end message delivery
func TrackMessageDelivery(startTime time.Time, recipientCount int) {
	latencyMs := float64(time.Since(startTime).Milliseconds())
	MessageDeliveryLatency.Observe(latencyMs)
	FanoutSize.Observe(float64(recipientCount))
	MessagesSent.Add(float64(recipientCount))
}

// Example Grafana dashboard queries (PromQL):
//
// # Message delivery p99 latency
// histogram_quantile(0.99, rate(realtime_message_delivery_latency_ms_bucket[5m]))
//
// # Connection churn rate
// rate(realtime_connections_total{status="established"}[5m]) +
// rate(realtime_connections_total{status="closed"}[5m])
//
// # Capacity usage
// realtime_active_connections / realtime_connection_capacity
//
// # Error rate
// rate(realtime_delivery_errors_total[5m]) / rate(realtime_messages_sent_total[5m])
```

Distributed Tracing for Real-Time:
Tracing a message through a real-time system:
Unlike request-response, a single message can spawn many traces across servers. Use trace propagation headers in your pub/sub messages.
At millions of messages/second, tracing every message is prohibitively expensive. Sample strategically: 100% of errors, 100% of slow messages (p99+), 1% of normal traffic. This gives visibility without drowning in data.
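The sampling policy above can be expressed as a small decision function applied when a message finishes delivery. The thresholds come from the text; the function name and p99 parameter are illustrative:

```python
import random


def should_trace(is_error: bool, latency_ms: float,
                 p99_ms: float = 500.0, base_rate: float = 0.01) -> bool:
    """Keep 100% of errors, 100% of slow (p99+) messages,
    and base_rate (1%) of normal traffic."""
    if is_error or latency_ms >= p99_ms:
        return True
    return random.random() < base_rate
```

In practice the p99 threshold would be fed from live metrics rather than hardcoded, and the decision should be propagated with the trace context so downstream servers sample consistently.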
Real-time systems have unique operational challenges. Here are runbooks for common scenarios:
Runbook 1: Sudden Connection Spike
Symptoms: Connection count rising rapidly, approaching capacity
Investigation:
Response:
Runbook 2: High Message Latency
Symptoms: Message delivery p99 > 500ms
Investigation:
Response:
Runbook 3: Mass Disconnection Event
Symptoms: Disconnection rate > 10% in 5 minutes
Investigation:
Response:
````markdown
# Real-Time System Incident Response Checklist

## Severity Classification

| Severity | Definition | Response Time | Example |
|----------|------------|---------------|---------|
| SEV1 | Complete outage | Immediate | All connections dropped |
| SEV2 | Major degradation | 15 min | >50% latency increase |
| SEV3 | Partial degradation | 1 hour | One region affected |
| SEV4 | Minor issue | 24 hours | Non-critical feature broken |

## Initial Response (First 5 Minutes)

- [ ] Acknowledge alert and page
- [ ] Open incident channel (#incident-YYYYMMDD-N)
- [ ] Initial assessment: scope, user impact, trend direction
- [ ] Post initial update to status page
- [ ] Determine if this needs escalation

## Investigation Checklist

### Connections
- [ ] Current connection count vs capacity
- [ ] New connections per second
- [ ] Disconnection rate and reasons
- [ ] Geographic distribution of issues

### Messages
- [ ] Message rate (in and out)
- [ ] Delivery latency percentiles
- [ ] Queue depths
- [ ] Failed delivery count

### Infrastructure
- [ ] CPU/memory on connection servers
- [ ] Redis/pub-sub health
- [ ] Network connectivity and latency
- [ ] Recent deployments or changes

### Downstream Dependencies
- [ ] Auth service health
- [ ] Database connectivity
- [ ] Storage service health

## Common Mitigations

### Emergency Traffic Reduction

```bash
# Enable degradation mode
kubectl set env deployment/connection-server DEGRADATION_LEVEL=3

# Stop accepting new connections
kubectl set env deployment/connection-server ACCEPT_NEW_CONNECTIONS=false

# Increase pod count
kubectl scale deployment/connection-server --replicas=50
```

### Connection Server Replacement

```bash
# Drain specific server
curl -X POST http://server-1:9000/admin/drain

# Wait for connections to migrate
# Monitor: realtime_active_connections{server="server-1"}

# Terminate server
kubectl delete pod connection-server-xxxxx
```

## Post-Incident

- [ ] All-clear posted to status page
- [ ] Incident channel summary written
- [ ] Timeline documented
- [ ] Action items captured
- [ ] Post-mortem scheduled (within 48 hours for SEV1/SEV2)
````

Run regular game days where you intentionally cause failures (kill servers, inject latency, simulate attacks) to practice incident response. Real incidents are smoother when the team has rehearsed the response.
Persistent connections are expensive. Unlike HTTP where you pay per request, real-time systems pay for every second a connection is open—even if the user is idle.
Cost Breakdown Example:
1,000,000 concurrent connections
Server cost: $0.50/hour per server handling 100K connections
Servers needed: 15 (with headroom)
Monthly server cost: 15 × $0.50 × 24 × 30 = $5,400
Plus:
- Redis for pub/sub: $1,500/month
- Load balancers: $500/month
- Data transfer: $2,000/month (at 1.3 Gbps egress)
- Monitoring/logging: $500/month
Total: ~$10,000/month for 1M connections
Cost per connection: $0.01/month
Cost per MAU (if 5% of MAUs are concurrent on average, i.e., 20M MAU): ~$0.0005/month
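The per-user economics above can be sanity-checked with a small helper; the function name is hypothetical, and the concurrency ratio is the fraction of MAUs connected at any given moment:

```python
def monthly_cost_per_mau(total_monthly_cost: float,
                         concurrent_users: int,
                         concurrency_ratio: float) -> float:
    """Convert total monthly cost into cost per monthly active user,
    given what fraction of MAUs are concurrently connected on average."""
    mau = concurrent_users / concurrency_ratio
    return total_monthly_cost / mau


# $10,000/month, 1M concurrent, 5% concurrency ratio -> 20M MAU
cost = monthly_cost_per_mau(10_000, 1_000_000, 0.05)  # ~$0.0005 per MAU
```

The concurrency ratio is the most uncertain input; measure it from your own presence data rather than assuming an industry figure.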
Cost Optimization Strategies:
| Strategy | Savings | Tradeoff |
|---|---|---|
| Disconnect idle connections | 30-50% | Reconnection latency, user experience |
| Use spot/preemptible instances | 50-70% | Instance termination, need graceful handling |
| Right-size instances | 20-40% | Less headroom for spikes |
| Compress messages | 20-40% data transfer | CPU overhead |
| Reduce update frequency | Variable | Less real-time experience |
| Use cheaper regions | 10-30% | Higher latency for users |
| Reserved instances | 30-60% | Commitment, less flexibility |
Idle Connection Management:
Many connections are idle most of the time. Consider:
- Aggressive idle timeout: Disconnect after 5 minutes of no activity (not just no messages—no mouse movement/app focus)
- Push-to-pull fallback: When a user goes idle, switch to polling or push notifications instead of a persistent connection
- Background/foreground differentiation: Mobile apps in the background should disconnect; deliver via push notifications
- Tiered service levels: Free users get more aggressive timeouts; paid users get persistent connections
```go
package costaware

import (
	"time"
)

type ConnectionTier int

const (
	TierFree ConnectionTier = iota
	TierPro
	TierEnterprise
)

type CostAwareConnectionPolicy struct {
	tier ConnectionTier
}

func (p *CostAwareConnectionPolicy) GetIdleTimeout() time.Duration {
	switch p.tier {
	case TierFree:
		return 5 * time.Minute // Disconnect free users after 5 min idle
	case TierPro:
		return 30 * time.Minute // Pro users get 30 minutes
	case TierEnterprise:
		return 24 * time.Hour // Enterprise: essentially no timeout
	default:
		return 5 * time.Minute
	}
}

func (p *CostAwareConnectionPolicy) GetHeartbeatInterval() time.Duration {
	switch p.tier {
	case TierFree:
		return 60 * time.Second // Less frequent heartbeats
	case TierPro:
		return 30 * time.Second
	case TierEnterprise:
		return 15 * time.Second // Faster detection
	default:
		return 60 * time.Second
	}
}

func (p *CostAwareConnectionPolicy) ShouldUseCompression() bool {
	// Compression uses CPU but saves bandwidth;
	// worth it for high-traffic scenarios
	return true
}

func (p *CostAwareConnectionPolicy) GetMessageBatchWindow() time.Duration {
	switch p.tier {
	case TierFree:
		return 500 * time.Millisecond // Batch more for efficiency
	case TierPro:
		return 100 * time.Millisecond
	case TierEnterprise:
		return 0 // No batching, immediate delivery
	default:
		return 500 * time.Millisecond
	}
}

// BackgroundConnectionHandler handles mobile apps going to the background.
// (Connection is the server's connection type, defined elsewhere.)
type BackgroundConnectionHandler struct {
	pushNotificationService PushService
}

func (h *BackgroundConnectionHandler) OnAppBackground(conn *Connection) {
	// Schedule disconnection after a delay
	// (the user might quickly return to the app)
	time.AfterFunc(30*time.Second, func() {
		if conn.IsStillBackground() {
			// Disconnect WebSocket, use push for notifications
			conn.Close("app_background")
			h.pushNotificationService.RegisterForPush(conn.UserID)
		}
	})
}

func (h *BackgroundConnectionHandler) OnAppForeground(userID string) {
	// User returned - they'll reconnect via WebSocket.
	// Messages received via push while away will show in history load.
	h.pushNotificationService.UnregisterForPush(userID)
}

type PushService interface {
	RegisterForPush(userID string)
	UnregisterForPush(userID string)
}
```

Know your cost per user. If you charge $5/month and real-time costs $0.50/user, that's 10% of revenue. For free products, every optimization directly improves unit economics. Track cost per connection and cost per message as key metrics alongside performance.
The real-time landscape continues to evolve. Here are trends to watch:
1. Edge Computing for Real-Time:
Push computation closer to users:
2. WebTransport:
Successor to WebSocket with:
3. CRDTs Going Mainstream:
Conflict-free data structures enabling:
4. AI-Powered Real-Time:
5. WebRTC for More Than Video:
Peer-to-peer data channels for:
| Technology | What It Enables | Maturity | When to Consider |
|---|---|---|---|
| WebTransport | Unreliable streams, HTTP/3 | Early adoption | New gaming/video projects |
| Durable Objects | Edge stateful computing | Production | Global low-latency apps |
| NATS JetStream | Persistent real-time streams | Production | Event-driven architectures |
| Y-CRDT | Collaborative editing | Production | Document/canvas collaboration |
| WebRTC DataChannel | P2P data transfer | Mature | Video + data, gaming |
While it's tempting to adopt every new technology, most production systems should use proven patterns (WebSocket + Redis pub/sub, good observability). Adopt new technologies incrementally, validating in production with limited scope before full rollout.
Scaling real-time systems requires thinking differently: you're managing persistent connections, not transient requests; state, not just computation; continuous flow, not discrete events.
Congratulations! You've completed the Real-Time Architecture Patterns module. You now understand connection management, presence systems, real-time collaboration, gaming architectures, and the scaling considerations that tie them all together. These patterns form the foundation of modern interactive applications—from chat and collaboration to gaming and beyond.