When you send a message on Discord, it appears for all channel members within 200 milliseconds—often less than 100ms. For perspective, that's faster than a human blink (300-400ms). This seemingly magical instant delivery requires one of the most sophisticated real-time infrastructures ever built.
The challenge isn't sending one message quickly—that's trivial. The challenge is maintaining 10+ million persistent connections simultaneously, any of which might need to receive a message at any moment, while handling 140,000 messages per second at peak, with global geographic distribution, and with zero message loss.
This page takes you deep into Discord's real-time messaging architecture. You'll understand WebSocket connection management at scale, the Gateway service design, message routing and fanout strategies, presence propagation, and the critical role of connection state in distributed systems.
Real-time communication requires bidirectional, low-latency, persistent connections. HTTP's request-response model fails here—polling introduces latency and waste; long-polling has connection limits. WebSockets solve this problem.
What WebSocket provides:

- A single persistent TCP connection per client
- Full-duplex communication: either side can send at any time
- Low per-message overhead (small frames instead of repeated HTTP headers)
- Server-initiated push without client polling
```http
GET /gateway HTTP/1.1
Host: gateway.discord.gg
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <user_token>

# Server responds with 101 Switching Protocols
# Connection is now a WebSocket
# All subsequent communication uses WebSocket frames
```

The heartbeat protocol:
To detect dead connections (user closed laptop, network went down without TCP FIN), Discord implements a heartbeat mechanism:
1. The server sends HELLO with a heartbeat_interval (typically ~41 seconds)
2. The client must send HEARTBEAT (opcode 1) within that interval
3. The server replies with HEARTBEAT_ACK (opcode 11)

This bidirectional heartbeat ensures both parties detect connection death within ~45 seconds, crucial for accurate presence tracking.
The heartbeat interval is tuned to balance connection health detection against network overhead. Too short: excessive traffic for millions of connections. Too long: stale presence data. 41.25 seconds with jitter prevents thundering herd problems where all clients heartbeat simultaneously.
The Gateway is Discord's most critical service—the edge layer that maintains all WebSocket connections. Every connected client talks to a Gateway server, which then routes messages to/from backend services.
Gateway responsibilities:

- Accept and authenticate WebSocket connections
- Enforce the heartbeat protocol and detect dead connections
- Subscribe to pub/sub topics on behalf of connected users
- Fan events out to the correct local connections
- Buffer recent events per session to support resume
Connection-to-session mapping:
Each Gateway server maintains an in-memory map of:

- Connection → user ID and session ID
- User ID → that user's local connections
- Topic (guild or channel) → set of subscribed local connections
This enables efficient routing: when a message is sent to channel X, the Gateway knows immediately which local connections need to receive it.
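A sketch of that routing state, with hypothetical type and method names (this is not Discord's actual code), might look like:

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative types for the Gateway's in-memory routing state.
type Conn struct {
	UserID    string
	SessionID string
}

type Registry struct {
	mu      sync.RWMutex
	byUser  map[string][]*Conn        // user ID -> that user's local connections
	byTopic map[string]map[*Conn]bool // "channel:{id}" -> local subscribers
}

func NewRegistry() *Registry {
	return &Registry{
		byUser:  make(map[string][]*Conn),
		byTopic: make(map[string]map[*Conn]bool),
	}
}

// Register indexes a new connection under its user so events addressed to
// a specific user can be delivered without scanning all connections.
func (r *Registry) Register(c *Conn) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byUser[c.UserID] = append(r.byUser[c.UserID], c)
}

// Subscribe records that a connection wants events published to a topic.
func (r *Registry) Subscribe(c *Conn, topic string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.byTopic[topic] == nil {
		r.byTopic[topic] = make(map[*Conn]bool)
	}
	r.byTopic[topic][c] = true
}

// Subscribers answers "which local connections need this event?" with a
// single map lookup, no backend round trip required.
func (r *Registry) Subscribers(topic string) []*Conn {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]*Conn, 0, len(r.byTopic[topic]))
	for c := range r.byTopic[topic] {
		out = append(out, c)
	}
	return out
}

func main() {
	r := NewRegistry()
	c := &Conn{UserID: "42", SessionID: "abc"}
	r.Register(c)
	r.Subscribe(c, "channel:123")
	fmt.Println(len(r.Subscribers("channel:123")))
}
```

The read-heavy access pattern (every delivered event does a lookup, subscriptions change only on connect/disconnect) is why a RWMutex-guarded map is a reasonable fit here.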
| Opcode | Name | Direction | Description |
|---|---|---|---|
| 0 | Dispatch | Server → Client | Event dispatched to client (MESSAGE_CREATE, etc.) |
| 1 | Heartbeat | Both | Keepalive ping |
| 2 | Identify | Client → Server | Initial authentication |
| 3 | Presence Update | Client → Server | Update online status |
| 4 | Voice State Update | Client → Server | Join/leave voice channel |
| 6 | Resume | Client → Server | Reconnect and replay missed events |
| 7 | Reconnect | Server → Client | Server requests client reconnect |
| 9 | Invalid Session | Server → Client | Session is invalid, re-identify |
| 10 | Hello | Server → Client | Initial handshake with heartbeat interval |
| 11 | Heartbeat ACK | Server → Client | Heartbeat acknowledged |
Each Gateway server can handle approximately 100,000-150,000 concurrent WebSocket connections—limited by memory, file descriptors, and CPU for serialization. At 10 million concurrent users, you need 70-100 Gateway servers.
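As a quick back-of-envelope check on that fleet size, using the per-server capacity range quoted above (a rough sanity check, not Discord's actual capacity planning):

```go
package main

import "fmt"

func main() {
	const concurrentUsers = 10_000_000
	// Assumed per-server connection capacity range from the text.
	const low, high = 100_000, 150_000

	// Dividing total load by per-server capacity bounds the fleet size.
	fmt.Printf("need %d to %d Gateway servers\n",
		concurrentUsers/high, concurrentUsers/low)
	// prints "need 66 to 100 Gateway servers"
}
```

Real deployments add headroom on top of this bound for failover and uneven load, which is how "66 to 100" becomes "70-100" in practice.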
But here's the challenge: when a message is sent to channel X, how do you know which Gateway servers have users watching that channel?
The subscription model:
When a user's WebSocket connects and identifies:
- The Gateway subscribes to the guild:{guild_id} topic for each of the user's guilds
- It also subscribes to the channel:{channel_id} topics for the channels that user can see

Now when a message is sent:
- The Message Service publishes the event to the channel:{channel_id} topic
- Only Gateway servers with at least one subscriber for that topic receive it, and each fans the event out to its local connections
```go
// When user identifies successfully
func (g *Gateway) onIdentify(conn *Connection, user *User) {
	// Get all guilds this user belongs to
	guilds, _ := g.guildService.GetUserGuilds(user.ID)

	for _, guild := range guilds {
		// Subscribe to guild-wide events (member updates, etc.)
		g.pubsub.Subscribe(fmt.Sprintf("guild:%s", guild.ID))

		// Get channels visible to this user in this guild
		channels := g.getVisibleChannels(guild.ID, user.ID)
		for _, channel := range channels {
			// Subscribe to channel-specific events (messages, typing)
			g.pubsub.Subscribe(fmt.Sprintf("channel:%s", channel.ID))
		}
	}

	// Track locally which user is on which connection
	g.userConnections[user.ID] = conn
}

// When receiving pub/sub message
func (g *Gateway) onPubSubMessage(topic string, payload []byte) {
	// Find all local connections interested in this topic
	connections := g.subscriptionMap.GetConnections(topic)

	for _, conn := range connections {
		// Apply per-user filtering (permissions might differ)
		if g.canUserSeeEvent(conn.UserID, payload) {
			conn.Send(payload)
		}
	}
}
```

What about guilds with 500K members? If a message is sent, and 100K of those members are online across 70 Gateways, every Gateway receives it. This is the 'hot channel' problem—solved by special handling for large guilds, explored in the scaling section.
Let's trace exactly what happens when you send a message in Discord, from keypress to delivery on recipients' screens.
```text
1. CLIENT: User presses Enter
   └─ Sends HTTP POST to /api/channels/{id}/messages
   └─ Body: { "content": "Hello world!" }

2. API GATEWAY: Receives request
   └─ Validates authentication token
   └─ Rate limit check (5 msgs/5 sec per channel)
   └─ Routes to Message Service

3. MESSAGE SERVICE: Processes message
   └─ Validate user has SEND_MESSAGES permission in channel
   └─ Apply content filtering (banned words, spam detection)
   └─ Generate unique snowflake ID (timestamp-embedded)
   └─ Transaction: Write to primary database
   └─ Update channel's last_message_id

4. MESSAGE SERVICE: Trigger fanout
   └─ Publish MESSAGE_CREATE event to pub/sub
   └─ Topic: channel:{channel_id}
   └─ Payload includes full message object

5. PUB/SUB: Distributes to subscribed Gateways
   └─ ~10-20 Gateway servers typically have subscribers
   └─ Parallel delivery to all

6. GATEWAY SERVERS (each): Process event
   └─ Look up local connections subscribed to this channel
   └─ For each connection:
      └─ Check user can still see channel (permissions)
      └─ Serialize event to wire format (JSON, ETF)
      └─ Write to WebSocket connection

7. CLIENT: Receives MESSAGE_CREATE
   └─ Deserialize payload
   └─ Insert into local message cache
   └─ Render in UI
   └─ Update unread indicators

TOTAL LATENCY: 50-150ms typical
- Network Client→API: 10-40ms
- API processing: 5-15ms
- Database write: 5-20ms
- Pub/sub fanout: 5-15ms
- Gateway→Client: 10-40ms
```

Messages are SENT via HTTP (reliable, can return errors) but RECEIVED via WebSocket (lowest latency). This hybrid approach gives the best of both worlds: reliable writes with real-time reads.
Typing indicators—optimized for latency:
Typing indicators follow a different path optimized purely for latency:
- The client sends a TYPING_START op via WebSocket (not HTTP)

Typing indicators are ephemeral—they're never persisted, so they bypass the database entirely, achieving 30-50ms delivery.
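Assuming typing events skip persistence entirely, the fanout path reduces to "marshal and publish." A self-contained sketch with an in-process stand-in for the pub/sub bus (the PubSub type and function names are hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PubSub is an in-process stand-in for the real pub/sub layer.
type PubSub struct{ subs map[string][]func([]byte) }

func (p *PubSub) Subscribe(topic string, fn func([]byte)) {
	p.subs[topic] = append(p.subs[topic], fn)
}

func (p *PubSub) Publish(topic string, payload []byte) {
	for _, fn := range p.subs[topic] {
		fn(payload)
	}
}

// onTypingStart fans the event out immediately with no database write,
// which is why typing indicators can beat message delivery latency.
func onTypingStart(p *PubSub, userID, channelID string) {
	payload, _ := json.Marshal(map[string]string{
		"t": "TYPING_START", "user_id": userID, "channel_id": channelID,
	})
	p.Publish("channel:"+channelID, payload)
}

func main() {
	p := &PubSub{subs: map[string][]func([]byte){}}
	p.Subscribe("channel:99", func(b []byte) { fmt.Println(string(b)) })
	onTypingStart(p, "42", "99")
}
```

Contrast this with the message path above: no permission transaction, no snowflake, no primary-database write sits between send and delivery.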
Presence—knowing who's online, idle, DND, or offline—seems simple but becomes incredibly complex at scale. Discord must track and propagate presence for millions of users, updating in real-time as users come online, go idle, or disconnect.
The presence challenge:
| Status | Meaning | Determination |
|---|---|---|
| Online | User is active | Heartbeat received, recent activity |
| Idle | User is away | 5+ minutes since last activity |
| Do Not Disturb | User set DND | Explicit user action |
| Invisible | Appears offline to others | User preference (stored) |
| Offline | Not connected | No active gateway connection |
| Streaming | User is streaming | Detected stream activity |
Presence propagation strategy:
Discord doesn't broadcast every presence change to everyone. Instead:
```jsonc
// PRESENCE_UPDATE event
{
  "op": 0,
  "t": "PRESENCE_UPDATE",
  "d": {
    "user": { "id": "123456789012345678" },
    "status": "online",      // online, idle, dnd, offline
    "activities": [
      {
        "name": "Visual Studio Code",
        "type": 0,           // 0=Game, 1=Streaming, 2=Listening, etc.
        "state": "Editing page-2.ts",
        "timestamps": { "start": 1704729600000 }
      }
    ],
    "client_status": {
      "desktop": "online",
      "mobile": "idle",
      "web": null
    },
    "guild_id": "987654321098765432"  // Context for this update
  }
}
```

Presence is designed for eventual consistency. If it takes 10 seconds for someone's status to update, that's acceptable. This relaxed consistency requirement allows significant optimization—presence updates can be batched, delayed, and even dropped during overload.
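One concrete way the relaxed consistency pays off is coalescing: between flushes, only the newest status per user survives, and intermediate transitions are simply dropped. A hypothetical batcher (names are illustrative):

```go
package main

import "fmt"

// Coalescer keeps only the newest unflushed status per user. Under load,
// a user who went online -> idle -> online before a flush generates one
// outgoing update, not three.
type Coalescer struct{ pending map[string]string }

func NewCoalescer() *Coalescer {
	return &Coalescer{pending: make(map[string]string)}
}

// Update overwrites any unflushed older status for the same user.
func (c *Coalescer) Update(userID, status string) {
	c.pending[userID] = status
}

// Flush hands the batch to the fanout layer and resets the buffer;
// in a real system a timer would call this every few seconds.
func (c *Coalescer) Flush() map[string]string {
	out := c.pending
	c.pending = make(map[string]string)
	return out
}

func main() {
	c := NewCoalescer()
	c.Update("42", "online")
	c.Update("42", "idle") // supersedes "online" before any flush
	fmt.Println(c.Flush()) // one update per user reaches subscribers
}
```

This trade is only safe because presence is ephemeral state: the latest value is all anyone needs, unlike messages, where every event must be delivered.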
Network connections fail constantly—WiFi handoffs, cellular dead zones, laptop sleep/wake cycles. Discord must handle disconnections gracefully without losing messages or requiring full state resync.
The Resume protocol:
When a client connects, it receives a session_id and tracks a sequence number for each event received. If the connection drops:
- The client reconnects and sends RESUME with its session_id and the last sequence number it received
```jsonc
// Client sends RESUME
{
  "op": 6,
  "d": {
    "token": "user_auth_token",
    "session_id": "abc123def456",
    "seq": 42            // Last event sequence received
  }
}

// If session still valid, server sends:
{
  "op": 0,
  "t": "RESUMED",
  "s": 42,               // Confirms sequence
  "d": {}
}

// Then replays missed events:
{ "op": 0, "t": "MESSAGE_CREATE", "s": 43, "d": {...} }
{ "op": 0, "t": "MESSAGE_CREATE", "s": 44, "d": {...} }
{ "op": 0, "t": "TYPING_START", "s": 45, "d": {...} }

// If session expired (>15-30 seconds):
{
  "op": 9,               // Invalid Session
  "d": false             // Cannot resume, must re-identify
}
```

How does the Gateway remember events to replay?
Each Gateway maintains a per-session event buffer:

- The most recent events dispatched on that session (on the order of 1,000 entries), keyed by sequence number
- A short expiry after disconnect, in line with the 15-30 second resume window
If the buffer has been exhausted (client was disconnected too long), the session is invalid and client must re-identify, receiving full state (which is expensive but necessary).
With 100K connections per Gateway and 1000 events per buffer, that's potentially 100M buffered events per Gateway. At ~500 bytes average, that's ~50GB per Gateway just for resume buffers. This is why buffers have strict limits and expire quickly.
The client maintains a local cache of Discord state—messages, channels, members, settings. This cache must stay synchronized with the server through the WebSocket connection.
Initial state load (READY event):
When a client identifies, the server sends a massive READY event containing:

- The user object (ID, username, avatar)
- The user's guild list
- Private channels (DMs and group DMs)
- A session_id and resume_gateway_url for later reconnection
For guilds, this is intentionally partial—full member lists for large guilds would be megabytes.
```jsonc
{
  "op": 0,
  "t": "READY",
  "s": 1,
  "d": {
    "v": 10,             // Gateway protocol version
    "user": {
      "id": "123456789012345678",
      "username": "user",
      "discriminator": "0",
      "avatar": "a_abc123"
    },
    "guilds": [
      // "Unavailable" guilds - just IDs, details sent via GUILD_CREATE
      { "id": "111", "unavailable": true },
      { "id": "222", "unavailable": true }
    ],
    "private_channels": [
      { "id": "333", "type": 1, "recipients": [...] }
    ],
    "session_id": "abc123def456",
    "resume_gateway_url": "wss://gateway-us-east1-b.discord.gg",
    "shard": [0, 1],     // For bots: shard ID, total shards
    "application": { "id": "...", "flags": ... }
  }
}
```

Lazy guild loading:
To keep READY fast, guilds are marked unavailable and detailed data arrives via separate GUILD_CREATE events. The client shows a loading state until each guild's data arrives.
Request Guild Members (lazy load):
When you click on a guild's member list:
1. The client sends a REQUEST_GUILD_MEMBERS op over the WebSocket
2. The Gateway streams the member list back in GUILD_MEMBERS_CHUNK events

This lazy loading is essential—a user in 100 guilds with 1000 members each would otherwise need to load 100K member objects on startup.
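The chunking side of this is straightforward to sketch. Assuming a 1000-member chunk size (the ceiling Discord's gateway docs give for GUILD_MEMBERS_CHUNK), a minimal illustrative helper:

```go
package main

import "fmt"

// chunkMembers splits a member list into fixed-size batches so no single
// WebSocket frame carries an unbounded payload.
func chunkMembers(members []string, size int) [][]string {
	var chunks [][]string
	for start := 0; start < len(members); start += size {
		end := start + size
		if end > len(members) {
			end = len(members)
		}
		chunks = append(chunks, members[start:end])
	}
	return chunks
}

func main() {
	members := make([]string, 2500)
	for i := range members {
		members[i] = fmt.Sprintf("user-%d", i)
	}
	chunks := chunkMembers(members, 1000)
	fmt.Println(len(chunks), "chunks") // 1000 + 1000 + 500
}
```

Streaming chunks also lets the client render the first page of the member list before the rest arrives.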
When you send a message, the client immediately displays it locally (optimistic update) before server confirmation. If the send fails, the message shows a 'failed to send' indicator. This makes the UI feel instant even with network latency.
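A minimal sketch of the optimistic-send bookkeeping, assuming the client tags each message with a nonce that the server echoes back in MESSAGE_CREATE (Discord's message API does support a nonce field; the types and states here are illustrative):

```go
package main

import "fmt"

// SendState tracks an optimistic message through its lifecycle.
type SendState int

const (
	Pending SendState = iota // rendered immediately, awaiting server ack
	Sent                     // server echoed the nonce; message is durable
	Failed                   // POST failed; UI shows "failed to send"
)

type LocalMessage struct {
	Nonce   string // client-generated ID used to match the ack
	Content string
	State   SendState
}

// reconcile matches a server response (or failure) to the optimistic
// local copy by nonce and finalizes its state.
func reconcile(pending map[string]*LocalMessage, nonce string, ok bool) {
	m, found := pending[nonce]
	if !found {
		return
	}
	if ok {
		m.State = Sent
	} else {
		m.State = Failed
	}
	delete(pending, nonce)
}

func main() {
	pending := map[string]*LocalMessage{}
	msg := &LocalMessage{Nonce: "n1", Content: "Hello world!", State: Pending}
	pending[msg.Nonce] = msg // the UI renders the message right here
	reconcile(pending, "n1", true)
	fmt.Println(msg.State == Sent)
}
```

The nonce also deduplicates: when the sender's own MESSAGE_CREATE arrives over the WebSocket, the client can match it to the pending copy instead of rendering the message twice.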
We've explored the sophisticated infrastructure that powers Discord's real-time communication. Let's consolidate the key insights:

- Persistent WebSockets with a jittered heartbeat protocol keep millions of clients reachable and detect dead connections within ~45 seconds
- The Gateway edge layer maps connections to sessions and uses pub/sub subscriptions to route events only to servers that need them
- Messages are sent over HTTP (reliable, explicit errors) but delivered over WebSocket (lowest latency)
- Presence tolerates eventual consistency, so updates can be batched, delayed, or dropped under load
- The Resume protocol replays missed events from bounded per-session buffers, avoiding expensive full state resyncs
What's next:
With the real-time text messaging foundation understood, we'll next explore Discord's server architecture—how backend services are organized, how data is stored and sharded, and how the API layer handles 100,000+ requests per second.
You now understand how Discord achieves real-time message delivery to millions of concurrent users. You've learned WebSocket lifecycle management, Gateway architecture, pub/sub event routing, presence propagation, and session resume handling. These patterns apply to any real-time communication system.