At 10 million concurrent users, every assumption you've made about system design gets tested. Databases that worked perfectly at 100,000 users start timing out. Network switches that never dropped packets start flooding. Caching strategies that seemed bulletproof show cracks.
Discord operates at this scale—and beyond. During major gaming events (new game releases, esports tournaments, server outages of popular games), traffic spikes 3-5x above normal peaks. The system must not only handle steady-state load but absorb these tsunamis without falling over.
This page explores the strategies Discord uses to scale reliably to millions of concurrent users while maintaining the sub-200ms latency promises that make real-time communication possible.
This page covers Discord's scaling strategies in depth. You'll learn about geographic distribution and edge deployment, solutions to the 'hot partition' problem of large servers, sharding strategies that enable horizontal scaling, graceful degradation patterns, and the operational practices that keep the system running reliably.
Before discussing solutions, let's fully appreciate the scale challenges Discord faces.
| Metric | Value | Challenge |
|---|---|---|
| Concurrent connections | 10-15 million | Connection state, memory, file descriptors |
| Messages/second (peak) | 140,000+ | Database writes, fanout, search indexing |
| Voice channels active | 150,000+ | Real-time media processing, bandwidth |
| Users in voice | 1.5 million+ | Per-user audio streams, mixing |
| Largest servers | 1 million+ members | Permission computation, presence fanout |
| Geographic regions | Global (6+ regions) | Latency optimization, data replication |
| Daily message volume | 4+ billion | Storage growth, retention, compliance |
Three categories of scaling challenges:
1. Horizontal Scale (more users). More users mean more connections, more messages, more storage. Solution: add more servers; shard data.
2. Hotspot Scale (concentration). One server with 500K members creates concentrated load that can't be distributed. Solution: special handling for hot entities.
3. Spike Scale (temporal). A new game launch causes 5x traffic in 10 minutes. Solution: auto-scaling, capacity buffers, graceful degradation.
Each requires different strategies—and sometimes they conflict. Sharding helps horizontal scale but can worsen hotspots if the hot entity lands on one shard.
Discord didn't design for 10M users from day one. Many scaling solutions emerged from painful production incidents. The lesson: design for the next order of magnitude, but accept that true hyperscale requires iteration through real-world problems.
Network latency is bounded by physics—the speed of light. A packet from Tokyo to New York takes ~70ms minimum one way, ~140ms round-trip. For a real-time service with a sub-200ms end-to-end budget, spending 140ms on propagation alone leaves almost nothing for processing.
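The ~70ms figure can be sanity-checked with a back-of-envelope calculation. Light in fiber travels at roughly two-thirds of c (~200,000 km/s); the ~10,850 km Tokyo–New York great-circle distance used below is an approximation, not a figure from the text:

```go
package main

import "fmt"

func main() {
	const (
		distanceKm    = 10850.0  // approximate Tokyo–New York great-circle distance
		fiberSpeedKms = 200000.0 // light in fiber: roughly 2/3 of c (~300,000 km/s)
	)
	oneWayMs := distanceKm / fiberSpeedKms * 1000
	fmt.Printf("one-way ~%.1f ms, round-trip ~%.1f ms\n", oneWayMs, 2*oneWayMs)
	// Routing, queueing, and non-great-circle fiber paths push the practical
	// one-way floor to ~70 ms, which is where the figure above comes from.
}
```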
Discord's regional architecture deploys infrastructure globally, with users connecting to the nearest regional cluster.
What's regional vs. global:
Regional (low latency critical): Gateway connections and voice/video media servers, placed close to users.
Global (consistency critical): user accounts, guild configuration, and message storage, replicated across regions with a single source of truth.
Voice server selection:
When you create a server, Discord assigns a default voice region. Voice servers for that region handle all the server's voice channels. Users in distant regions experience higher latency, but all participants are on the same voice server (required for audio mixing).
Discord's peak hours follow daylight. US peak is EU's night and Asia's morning. This allows shifting capacity between regions—VMs released in EU at midnight can be allocated to US for evening peak. Cloud providers charge hourly, so this saves significant cost.
Discord's largest servers have over 1 million members. When a popular streamer types in #general, that message must reach 100,000+ concurrent viewers—in under 200ms. This creates the hot partition problem.
Why it's hard:
With consistent hashing by guild_id, all data for that guild lands on one database shard. But one shard can't handle the write amplification of 100K message deliveries.
With pub/sub by channel, every Gateway with subscribers receives the event. For a million-member server with 10% online (100K users), spread across 100 Gateways—every Gateway gets every message.
| Scenario | Hot Resource | Impact |
|---|---|---|
| New game release announcement | Message fanout | 100K+ deliveries per message |
| Streamer goes live | Presence updates | Millions of friends see status change |
| Server boost event | Guild metadata | Repeated reads of same config |
| Channel spam | Message writes | Rate limits may be insufficient |
| Bot joins large server | Member list | 500K member records to sync |
Solutions for hot partitions:
1. Large Guild Handling
Servers exceeding a member threshold (typically 1,000-10,000) receive special treatment: full online member lists are disabled, members load lazily, and presence updates are batched.
2. Tiered Fanout
Instead of direct Gateway fanout, the message is published once to a small set of intermediate fanout workers, each responsible for a subset of Gateway nodes; the workers relay to their Gateways, turning one O(N) publish into a shallow two-level tree.
3. Read Replicas for Hot Guilds
Guild data for very large servers is replicated to dedicated read replicas, so heavy read traffic (guild config, member lists) doesn't compete with writes on the primary shard.
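The tiered fanout described in solution 2 can be sketched as follows. All type and function names here are hypothetical, not Discord's actual code; the point is that the publisher does one send per fanout worker instead of one per Gateway:

```go
package main

import "fmt"

// Gateway is a stand-in for a node terminating WebSocket connections.
type Gateway struct{ id int }

func (g Gateway) Deliver(msg string) { /* push to local WebSocket sessions */ }

// FanoutWorker owns a subset of Gateways and relays events to them.
type FanoutWorker struct {
	gateways []Gateway
}

func (w *FanoutWorker) Relay(msg string) int {
	for _, g := range w.gateways {
		g.Deliver(msg)
	}
	return len(w.gateways)
}

// Publish sends once per worker (K sends) instead of once per Gateway (N sends),
// so the hot publisher's work no longer grows with fleet size.
func Publish(workers []*FanoutWorker, msg string) int {
	delivered := 0
	for _, w := range workers {
		delivered += w.Relay(msg)
	}
	return delivered
}

func main() {
	// 100 Gateways split across 10 workers: the publisher does 10 sends, not 100.
	workers := make([]*FanoutWorker, 10)
	for i := range workers {
		w := &FanoutWorker{}
		for j := 0; j < 10; j++ {
			w.gateways = append(w.gateways, Gateway{id: i*10 + j})
		}
		workers[i] = w
	}
	fmt.Println(Publish(workers, "new message in #general")) // delivers to all 100
}
```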
Discord explicitly documents that servers above certain thresholds have degraded features. No online member list above 1000 members. Limited presence updates. Lazy member loading. Users accept these trade-offs for the benefits of large communities.
Sharding—distributing data across multiple databases—is essential at Discord's scale. Different data types require different sharding approaches.
Guild-based sharding is Discord's primary strategy for relational data.
How it works:
`shard_id = hash(guild_id) % num_shards`
Sharded by guild: all of a guild's channels, messages, and members live on the same shard, so the common request path ("load this server") touches exactly one database.
Advantages: single-shard queries for almost all guild operations; no cross-shard joins on the hot path; shards can be added as the guild count grows.
Disadvantages: cross-guild queries (such as a user's guild list) require scatter-gather across shards, and a very large guild concentrates all of its load on a single shard—the hot partition problem described above.
```go
// Shard router for guild-based sharding (excerpt; requires database/sql)
type Channel struct {
	ID   uint64
	Name string
}

type Guild struct{ ID uint64 }

type ShardRouter struct {
	numShards  int
	shardConns []*sql.DB
}

func (r *ShardRouter) GetShardForGuild(guildID uint64) *sql.DB {
	shardID := guildID % uint64(r.numShards)
	return r.shardConns[shardID]
}

// All guild queries go through the router
func (r *ShardRouter) GetChannels(guildID uint64) ([]Channel, error) {
	db := r.GetShardForGuild(guildID)
	rows, err := db.Query("SELECT id, name FROM channels WHERE guild_id = $1", guildID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var channels []Channel
	for rows.Next() {
		var c Channel
		if err := rows.Scan(&c.ID, &c.Name); err != nil {
			return nil, err
		}
		channels = append(channels, c)
	}
	return channels, rows.Err()
}

// Cross-guild query requires scatter-gather:
// the user→guild mapping is stored separately (not guild-sharded),
// then guild details are fetched from the appropriate shards.
func (r *ShardRouter) GetUserGuilds(userID uint64) ([]Guild, error) {
	// Elided: look up the user's guild IDs, group them by shard, query each shard.
	return nil, nil
}
```

Discord's traffic is highly variable—weekends are 40% higher than weekdays, evenings peak, and gaming events cause sudden spikes. Static capacity would either waste money (over-provisioned) or drop users (under-provisioned).
Stateful scaling challenges:
Gateways are stateful—each holds 100K WebSocket connections with in-memory state. You can't just kill a Gateway; you must drain connections first.
Graceful Gateway scaling:
Scale up: New Gateways added; load balancer routes new connections to them. Existing connections unchanged.
Scale down: mark the Gateway as draining so the load balancer stops routing new connections to it; ask connected clients to reconnect in small batches so the rest of the fleet absorbs them gradually; wait for sessions to drain; then terminate the instance.
Rolling updates: Same as scale-down, but one-by-one across fleet. Clients experience brief reconnection.
For known events, Discord pre-scales capacity. A major game launch is announced for 9 PM PST? Spin up extra Gateways at 8:30 PM. This avoids cold-start delays during the traffic surge.
At hyperscale, failures are inevitable. The question isn't whether systems will fail, but how the system behaves when they do. Graceful degradation means maintaining core functionality even when components fail.
| Priority | Feature | Degradation Strategy |
|---|---|---|
| 1 (Critical) | Message delivery | Never drop; queue if overloaded |
| 2 (Critical) | Voice audio | Reduce quality before dropping users |
| 3 (Important) | Presence updates | Batch and delay; eventual consistency |
| 4 (Important) | Typing indicators | Drop entirely under load |
| 5 (Nice-to-have) | Rich embeds | Show placeholder; load async |
| 6 (Nice-to-have) | Animated emoji | Fall back to static |
| 7 (Deferrable) | Search | Disable and show message |
| 8 (Deferrable) | Server discovery | Return cached results |
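The priority table above lends itself to a simple admission check: under load, raise a cutoff and shed everything below it before anything critical is touched. The types and thresholds here are illustrative sketches, not Discord's code:

```go
package main

import "fmt"

// Event carries a priority matching the table above (1 = most critical).
type Event struct {
	kind     string
	priority int
}

// Shedder admits only events at or above the current cutoff. A healthy system
// sets a high cutoff (admit everything); under pressure it lowers the bar.
type Shedder struct {
	cutoff int // drop events with priority > cutoff
}

func (s *Shedder) Admit(e Event) bool {
	return e.priority <= s.cutoff
}

func main() {
	// Overloaded: keep message delivery, voice, and presence; shed the rest.
	overloaded := &Shedder{cutoff: 3}
	events := []Event{
		{"message_create", 1},
		{"presence_update", 3},
		{"typing_start", 4},
		{"embed_resolve", 5},
	}
	for _, e := range events {
		fmt.Printf("%s admitted=%v\n", e.kind, overloaded.Admit(e))
	}
}
```

The key design point is that the decision is made per event class, cheaply, at the edge, so shedding typing indicators costs almost nothing while protecting message delivery.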
Circuit breakers:
When a downstream service is failing, continuously retrying makes things worse (additional load on struggling service). Circuit breakers prevent this:
Transition logic: Closed → Open once consecutive failures cross a threshold; Open → Half-Open after a cooldown elapses; Half-Open → Closed after a few consecutive successes (or back to Open on a failed probe):
```go
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

var ErrCircuitOpen = errors.New("circuit breaker open")

type CircuitBreaker struct {
	state       State // Closed, Open, HalfOpen
	failures    int
	successes   int
	threshold   int
	timeout     time.Duration
	lastFailure time.Time
	mu          sync.Mutex
}

func (cb *CircuitBreaker) Execute(fn func() error) error {
	cb.mu.Lock()
	switch cb.state {
	case Open:
		// Check if the cooldown has elapsed
		if time.Since(cb.lastFailure) > cb.timeout {
			cb.state = HalfOpen
			cb.successes = 0 // fresh probe window
			cb.mu.Unlock()
			return cb.tryRequest(fn)
		}
		cb.mu.Unlock()
		return ErrCircuitOpen // Fail fast
	case HalfOpen, Closed:
		cb.mu.Unlock()
		return cb.tryRequest(fn)
	}
	cb.mu.Unlock()
	return nil
}

func (cb *CircuitBreaker) tryRequest(fn func() error) error {
	err := fn()
	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.failures++
		cb.lastFailure = time.Now()
		// A failed half-open probe reopens immediately; in Closed we wait
		// for the failure threshold.
		if cb.state == HalfOpen || cb.failures >= cb.threshold {
			cb.state = Open
		}
		return err
	}
	cb.successes++
	if cb.state == HalfOpen && cb.successes >= 3 {
		cb.state = Closed
		cb.failures = 0
	}
	return nil
}
```

Without circuit breakers, one failing service can cascade to take down everything. Service A calls Service B. B is slow, so A's threads block waiting. A runs out of threads and starts failing. Service C calls A; now C is affected. Circuit breakers isolate failures.
Running systems at Discord's scale requires more than good architecture—it requires excellent operations. The best system design is useless if you can't deploy, monitor, and debug it.
Deployment practices: canary releases to a small fraction of traffic first, rolling updates across stateful fleets (as described for Gateways above), and feature flags so risky code paths can be disabled without a deploy.
Incident response:
Discord publishes a public status page and maintains documented incident response procedures: on-call rotations, severity classification, and post-incident reviews that feed fixes back into the system.
Discord runs regular 'game days'—controlled chaos engineering. Inject failures (kill pods, add latency, simulate region outages) and observe. Does the system degrade gracefully? Are alerts firing correctly? Is the runbook accurate? Fix issues before real incidents.
Let's consolidate everything into a coherent design that you could present in a system design interview.
Trade-offs to acknowledge: presence and typing indicators are eventually consistent in exchange for scalability; very large servers run with degraded features (no full online member lists); voice channels pin all participants to one regional server, so distant users accept higher latency; and spare capacity costs money but is what absorbs traffic spikes.
Congratulations! You've completed an exhaustive exploration of Discord's architecture—from requirements through real-time messaging, server architecture, voice channels, and scaling to millions of users. You now have the knowledge to design and discuss sophisticated real-time communication systems at the Principal Engineer level.