At 3:47 PM on an ordinary Tuesday, a video of a skateboarding dog is uploaded by a user with 47 followers. By 4:15 PM, it has 10,000 views. By 5:00 PM, 500,000 views. By midnight, 50 million views. The video—and the infrastructure serving it—has gone from nothing to handling 1,000+ requests per second for a single asset.

This scenario happens multiple times daily on TikTok. Unlike traditional platforms where established creators generate predictable traffic, TikTok's meritocratic algorithm means any video from any user can go viral at any time. This fundamental unpredictability is one of the hardest system design challenges in consumer media.

The Viral Content Problem:

- Traffic is concentrated on a single video ID
- Standard caching strategies assume uniform access patterns
- Database hot spots emerge on single rows
- CDN cache hierarchies may not have warmed the content
- Counter infrastructure faces write amplification
- Moderation burden spikes (viral mistakes are amplified)

This page explores the techniques TikTok uses to survive—and thrive—under viral load.
By the end of this page, you will understand: (1) Viral detection algorithms and early warning systems, (2) Dynamic CDN warming and edge cache strategies, (3) Hot spot mitigation in databases and counters, (4) Graceful degradation patterns for overload scenarios, (5) Moderation escalation for viral content, and (6) Post-viral content management.
Not all popular content is viral. Understanding the difference helps design appropriate responses.
| Pattern | Growth Rate | Peak Traffic | Duration | Example |
|---|---|---|---|---|
| Normal | Linear, steady | 100 views/hour | Days | Regular creator's new video |
| Trending | 2-5x/hour growth | 10K views/hour | 24-48 hours | Timely topic, moderate engagement |
| Viral | 10-100x/hour growth | 1M+ views/hour | 12-24 hours | Unexpected hit, massive shares |
| Super Viral | 1000x+ spike | 10M+ views/hour | 6-12 hours | Global cultural moment |
| Celebrity Post | Immediate high baseline | High but predictable | Varies | Known creator, expected audience |
Viral Velocity Metrics

To detect and respond to viral content, we need metrics that capture growth rate rather than absolute numbers:

- View Velocity: Views in the last 10 minutes vs. the previous 10 minutes
- Engagement Velocity: Acceleration of the engagement rate (likes/views)
- Share Velocity: External share rate (the highest signal for viral potential)
- Geographic Spread: How quickly the content is crossing regions
- Audience Novelty: Whether the video is reaching users outside the creator's typical audience

A video with 1,000 views in its first hour that suddenly gets 10,000 views in minutes 61-70 is a far stronger viral signal than a video steady at 1,000 views/hour all day.
Viral content often exhibits 'thundering herd' behavior: A share on Twitter or a mention by a celebrity causes thousands of users to request the same content simultaneously. Unlike gradual growth, this creates an instant traffic spike that can overwhelm unprepared systems.
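A common mitigation for this thundering-herd pattern is request coalescing: when many requests for the same uncached key arrive at once, only one fetch goes to origin and the rest wait for its result. The sketch below is illustrative rather than a description of TikTok's implementation; `fetch_from_origin` is a hypothetical placeholder, and a production cache would add TTLs and eviction.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class CoalescingCache:
    """Collapse concurrent cache misses for the same key into one origin fetch."""

    def __init__(self):
        self._cache: Dict[str, bytes] = {}
        self._in_flight: Dict[str, asyncio.Task] = {}

    async def get(self, key: str,
                  fetch_from_origin: Callable[[str], Awaitable[bytes]]) -> bytes:
        if key in self._cache:
            return self._cache[key]

        # If another coroutine is already fetching this key, await that task
        # instead of issuing a duplicate origin request.
        if key not in self._in_flight:
            self._in_flight[key] = asyncio.create_task(fetch_from_origin(key))
        try:
            value = await self._in_flight[key]
        finally:
            self._in_flight.pop(key, None)

        self._cache[key] = value
        return value
```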
Early detection of viral content is critical. The sooner we identify a video is going viral, the more time we have to prepare infrastructure. The goal: detect viral trajectory within 5 minutes of acceleration.
```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class ContentMetrics:
    video_id: str
    views_1min: int
    views_5min: int
    views_15min: int
    views_1hr: int
    engagement_rate: float
    share_rate: float
    geographic_spread: int  # Number of distinct regions


@dataclass
class ViralClassification:
    level: Literal['normal', 'trending', 'viral', 'super_viral']
    confidence: float
    predicted_peak_views: int
    time_to_peak_hours: float


class ViralDetector:
    def __init__(self):
        # Historical baselines by content type
        self.baseline_stats = self._load_baseline_stats()
        self.ml_model = self._load_virality_predictor()

    def classify_content(self, metrics: ContentMetrics) -> ViralClassification:
        # Calculate velocity metrics
        velocity_1_to_5 = self._calculate_velocity(
            metrics.views_1min, metrics.views_5min
        )
        velocity_5_to_15 = self._calculate_velocity(
            metrics.views_5min, metrics.views_15min
        )

        # Z-score against baseline (how many std devs above normal)
        z_score = self._calculate_z_score(metrics)

        # Rule-based classification
        if z_score > 10 and velocity_1_to_5 > 50:
            level = 'super_viral'
        elif z_score > 5 and velocity_1_to_5 > 10:
            level = 'viral'
        elif z_score > 2 and velocity_1_to_5 > 3:
            level = 'trending'
        else:
            level = 'normal'

        # ML model for prediction
        features = self._extract_features(metrics, velocity_1_to_5)
        prediction = self.ml_model.predict(features)

        return ViralClassification(
            level=level,
            confidence=prediction['confidence'],
            predicted_peak_views=prediction['peak_views'],
            time_to_peak_hours=prediction['time_to_peak']
        )

    def _calculate_velocity(self, recent: int, older: int) -> float:
        """Velocity = growth rate between time windows."""
        if older == 0:
            return float('inf') if recent > 0 else 0
        return (recent - older) / older * 100  # Percentage growth

    def _calculate_z_score(self, metrics: ContentMetrics) -> float:
        """How many std devs above baseline is this content?"""
        baseline_mean = self.baseline_stats['views_5min_mean']
        baseline_std = self.baseline_stats['views_5min_std']
        return (metrics.views_5min - baseline_mean) / baseline_std
```

Over-triggering viral responses wastes resources. Under-triggering causes outages. The system should err toward early detection (false positives) because the cost of CDN warming is much lower than the cost of viral traffic hitting an unprepared origin. Aim for 95% recall with 50% precision—it's better to warm 100 videos when only 50 go viral than to miss 5 truly viral videos.
Under normal operation, CDN caches populate lazily—the first request for content in a region fetches from origin, then subsequent requests hit cache. For viral content, this creates a cache stampede where thousands of simultaneous requests hit origin before the cache warms.

Proactive CDN warming pushes content to edge caches before users request it, eliminating origin load for viral content.
| Viral Level | Action | Latency to Warm | POPs Targeted |
|---|---|---|---|
| Trending | Passive (let cache fill naturally) | N/A | N/A |
| Viral (regional) | Warm regional shield + 10 top POPs | 30 seconds | ~15 POPs |
| Viral (multi-region) | Warm all shields + 50 top POPs | 60 seconds | ~55 POPs |
| Super Viral | Emergency warm all POPs globally | 2 minutes | All 1000+ POPs |
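A minimal sketch of how the tiers above could translate into warming calls. The policy names, the `cdn.push_to_pop` client, and the POP-selection helpers are assumptions made for illustration; a real CDN would expose its own prefetch or pre-positioning API.

```python
import asyncio
from typing import List

# Hypothetical tier -> target-POP policy, mirroring the table above.
WARMING_POLICY = {
    'trending': {'pops': 0},             # passive: let caches fill naturally
    'viral_regional': {'pops': 15},      # regional shield + top regional POPs
    'viral_multi_region': {'pops': 55},  # all shields + top global POPs
    'super_viral': {'pops': None},       # None = every POP globally
}


async def warm_video(cdn, video_id: str, renditions: List[str], level: str) -> None:
    """Push every rendition of a video to the POPs required by its viral tier."""
    policy = WARMING_POLICY[level]
    if policy['pops'] == 0:
        return  # passive tier: no proactive warming

    pops = cdn.all_pops() if policy['pops'] is None else cdn.top_pops(policy['pops'])

    # Fan out pushes concurrently; each push copies the object from the
    # origin/shield into the POP's cache before any user requests it.
    await asyncio.gather(*[
        cdn.push_to_pop(pop, f"videos/{video_id}/{rendition}")
        for pop in pops
        for rendition in renditions
    ])
```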
Warming a 10MB video (all renditions) to 1,000 POPs means 10GB of transfer before any user watches. At $0.02/GB, that's $0.20 per viral video warmed. With 100+ viral videos warmed per day, that works out to roughly $600+/month in transfer costs. Compare that to the cost of origin overload and user-facing latency spikes—warming is cheap insurance.
Even with CDN warming, viral content creates hot spots in databases and counter systems. A single video ID becomes the target of massive read and write load. Standard database sharding (by video ID) places all load on one shard.
```python
import random
import time


class AdaptiveCounter:
    """
    Counter that automatically increases sharding for hot keys.

    Normal keys: 1 shard (direct counter)
    Trending keys: 10 shards
    Viral keys: 100 shards
    Super viral: 1000 shards
    """

    def __init__(self, redis_cluster):
        self.redis = redis_cluster
        self.hot_key_detector = HotKeyDetector()

    async def increment(self, key: str, delta: int = 1) -> None:
        # Get current shard count for this key
        shard_count = await self._get_shard_count(key)

        # Select random shard for write
        shard_id = random.randint(0, shard_count - 1)
        shard_key = f"{key}:shard:{shard_id}"
        await self.redis.incrby(shard_key, delta)

        # Track write frequency for hot key detection
        self.hot_key_detector.record_write(key)

        # Check if we need to scale up shards
        if self.hot_key_detector.is_hot(key):
            await self._scale_up_shards(key)

    async def get_count(self, key: str) -> int:
        shard_count = await self._get_shard_count(key)
        if shard_count == 1:
            return int(await self.redis.get(key) or 0)

        # Sum all shards
        shard_keys = [f"{key}:shard:{i}" for i in range(shard_count)]
        values = await self.redis.mget(shard_keys)
        return sum(int(v or 0) for v in values)

    async def _scale_up_shards(self, key: str) -> None:
        """Double shard count for a hot key."""
        current_count = await self._get_shard_count(key)
        new_count = min(current_count * 2, 1000)  # Max 1000 shards
        await self.redis.set(f"{key}:shard_count", new_count)

        # Notify other nodes about new shard count
        await self.publish_shard_update(key, new_count)

    async def publish_shard_update(self, key: str, new_count: int) -> None:
        # Broadcast the new shard count so other nodes stop using a stale value
        await self.redis.publish("counter_shard_updates", f"{key}:{new_count}")

    async def _get_shard_count(self, key: str) -> int:
        count = await self.redis.get(f"{key}:shard_count")
        return int(count) if count else 1


class HotKeyDetector:
    """Detect hot keys using a sliding window and threshold."""

    def __init__(self, window_seconds: int = 60, threshold: int = 10000):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.write_counts = {}  # key -> [(timestamp, count)]

    def record_write(self, key: str) -> None:
        now = time.time()
        if key not in self.write_counts:
            self.write_counts[key] = []
        self.write_counts[key].append((now, 1))
        self._cleanup_old_entries(key, now)

    def is_hot(self, key: str) -> bool:
        if key not in self.write_counts:
            return False
        now = time.time()
        self._cleanup_old_entries(key, now)
        total = sum(c for _, c in self.write_counts[key])
        return total > self.threshold

    def _cleanup_old_entries(self, key: str, now: float) -> None:
        # Drop writes that fell out of the sliding window
        cutoff = now - self.window_seconds
        self.write_counts[key] = [
            (ts, c) for ts, c in self.write_counts[key] if ts >= cutoff
        ]
```

After viral traffic subsides, the 1000-shard counter should consolidate back to fewer shards for efficiency. A background process aggregates shards after 24 hours of reduced activity. This prevents permanent fragmentation from temporary viral spikes.
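That consolidation step could look roughly like the sketch below, assuming the same key layout as `AdaptiveCounter`: sum the shards, fold the total back into a single key, and reset the shard count. A scheduler would invoke it after ~24 hours of reduced activity; locking and races with in-flight writes are glossed over here.

```python
async def consolidate_counter(redis, key: str) -> None:
    """Collapse a sharded counter back to one key after viral traffic subsides."""
    shard_count = int(await redis.get(f"{key}:shard_count") or 1)
    if shard_count == 1:
        return  # already consolidated

    # Sum every shard, merge the total into the base key, then drop the shards.
    shard_keys = [f"{key}:shard:{i}" for i in range(shard_count)]
    values = await redis.mget(shard_keys)
    total = sum(int(v or 0) for v in values)

    await redis.incrby(key, total)            # fold shard totals into the single counter
    await redis.delete(*shard_keys)           # remove the now-empty shards
    await redis.set(f"{key}:shard_count", 1)  # future reads/writes use one key again
```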
Despite best preparations, viral traffic can exceed capacity. The system must degrade gracefully—maintain core functionality while shedding optional features—rather than fail completely.
| Load Level | Actions | User Experience Impact |
|---|---|---|
| Normal (100%) | All features enabled | Full experience |
| Elevated (150%) | Disable real-time comment streaming | Comments load on refresh, not live |
| High (200%) | Show cached engagement counts (may be stale) | Counts lag by up to 60 seconds |
| Critical (300%) | Disable comments entirely on viral videos | Read-only engagement; no new comments |
| Emergency (500%+) | Serve lower quality video; queue non-critical writes | Lower video quality; engagement delayed |
| Overload | Rate limit requests to viral content; show alternative videos | Some users see different content |
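One way to wire the tiers above into code is a central policy that maps current load to feature flags which individual services consult. The sketch below is an assumption for illustration; the load metric, flag names, and thresholds mirror the table rather than any specific production system.

```python
from dataclasses import dataclass


@dataclass
class DegradationFlags:
    live_comments: bool = True        # real-time comment streaming
    fresh_counts: bool = True         # read-through engagement counters
    new_comments: bool = True         # accept new comment writes
    full_quality_video: bool = True   # serve the highest renditions
    rate_limit_viral: bool = False    # throttle requests to viral assets


def flags_for_load(load_pct: float) -> DegradationFlags:
    """Map current load (100 = normal capacity) to the degradation tiers above."""
    flags = DegradationFlags()
    if load_pct >= 150:
        flags.live_comments = False       # comments load on refresh, not live
    if load_pct >= 200:
        flags.fresh_counts = False        # serve cached, possibly stale counts
    if load_pct >= 300:
        flags.new_comments = False        # read-only engagement on viral videos
    if load_pct >= 500:
        flags.full_quality_video = False  # lower renditions, queue non-critical writes
        flags.rate_limit_viral = True
    return flags
```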
```typescript
enum CircuitState {
  CLOSED,    // Normal operation
  OPEN,      // Failing fast, not calling service
  HALF_OPEN  // Testing if service recovered
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount: number = 0;
  private lastFailureTime: number = 0;
  private successCount: number = 0;

  constructor(
    private readonly failureThreshold: number = 5,
    private readonly recoveryTimeout: number = 30000, // 30 seconds
    private readonly halfOpenRequests: number = 3
  ) {}

  async call<T>(
    serviceCall: () => Promise<T>,
    fallback: () => Promise<T>
  ): Promise<T> {
    if (this.state === CircuitState.OPEN) {
      // Check if recovery timeout passed
      if (Date.now() - this.lastFailureTime > this.recoveryTimeout) {
        this.state = CircuitState.HALF_OPEN;
        this.successCount = 0;
      } else {
        // Circuit is open, use fallback
        return fallback();
      }
    }

    try {
      const result = await serviceCall();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      return fallback();
    }
  }

  private onSuccess(): void {
    this.failureCount = 0;
    if (this.state === CircuitState.HALF_OPEN) {
      this.successCount++;
      if (this.successCount >= this.halfOpenRequests) {
        // Service recovered, close circuit
        this.state = CircuitState.CLOSED;
      }
    }
  }

  private onFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      // Too many failures, open circuit
      this.state = CircuitState.OPEN;
    }
  }
}

// Usage example
const commentCircuit = new CircuitBreaker(5, 30000);

async function getCommentsWithFallback(videoId: string) {
  return commentCircuit.call(
    () => commentService.getComments(videoId),
    () => Promise.resolve({
      comments: [],
      message: 'Comments temporarily unavailable'
    })
  );
}
```

When degrading, communicate clearly. 'Comments are loading slowly due to high traffic' is better than a loading spinner that never resolves. Users are forgiving of stated limitations but frustrated by silent failures.
Viral content presents a unique moderation challenge: a policy violation that reaches 1 million viewers causes far more harm than the same violation seen by 100 people. This requires accelerated review processes for content showing viral trajectory.
| Reach Level | Review SLO | Decision Authority | Action Options |
|---|---|---|---|
| < 10K views | 24 hours | Standard moderator | Remove, warn, no action |
| 10K - 100K views | 4 hours | Senior moderator | Remove, age-gate, reduce reach |
| 100K - 1M views | 1 hour | Team lead approval | Same + public statement |
| 1M+ views | 30 minutes | Policy team escalation | Same + legal review if needed |
| 10M+ views | 15 minutes | Executive notification | All options + coordinated response |
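A sketch of how these escalation tiers might be applied when a viral video is flagged. The queue names, ticket fields, and thresholds are assumptions that simply encode the table above.

```python
from dataclasses import dataclass


@dataclass
class ReviewTicket:
    video_id: str
    queue: str             # which review queue handles it
    slo_minutes: int       # time allowed before a decision
    notify_executives: bool = False


def escalate_flagged_video(video_id: str, view_count: int) -> ReviewTicket:
    """Route a flagged video to the right review tier based on its current reach."""
    if view_count >= 10_000_000:
        return ReviewTicket(video_id, 'policy_team', 15, notify_executives=True)
    if view_count >= 1_000_000:
        return ReviewTicket(video_id, 'policy_team', 30)
    if view_count >= 100_000:
        return ReviewTicket(video_id, 'team_lead', 60)
    if view_count >= 10_000:
        return ReviewTicket(video_id, 'senior_moderators', 4 * 60)
    return ReviewTicket(video_id, 'standard_moderators', 24 * 60)
```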
Faster decisions mean higher error rates. Wrongly removing a viral video with 10M views causes massive creator backlash and potential press coverage. Wrongly leaving up policy-violating content causes harm and regulatory risk. For very high-reach content, the preference is usually to reduce reach while conducting thorough review rather than irreversible removal.
Some viral events are predictable: Major cultural events (Super Bowl, New Year's Eve), scheduled celebrity activities, or trending challenges. Pre-scaling for predicted traffic spikes prevents reactive scrambling.
```python
from datetime import datetime, timedelta
from typing import Dict, List


class CapacityPredictor:
    def __init__(self):
        self.calendar_events = self._load_calendar_events()
        self.historical_patterns = self._load_historical_patterns()
        self.ml_model = self._load_demand_model()

    def predict_next_24h(self) -> List[Dict]:
        """
        Predict hourly capacity needs for the next 24 hours.
        Returns a list of {hour, expected_rps, confidence, scaling_factor}.
        """
        now = datetime.utcnow()
        predictions = []

        for hour_offset in range(24):
            target_hour = now + timedelta(hours=hour_offset)

            # Base prediction from historical patterns
            base_rps = self._get_historical_baseline(target_hour)

            # Adjust for known events
            event_multiplier = self._get_event_multiplier(target_hour)

            # Adjust for trends (e.g., growing user base)
            trend_multiplier = self._get_trend_multiplier(target_hour)

            # ML model for residual prediction
            ml_adjustment = self.ml_model.predict(
                self._extract_features(target_hour)
            )

            predicted_rps = base_rps * event_multiplier * trend_multiplier + ml_adjustment

            predictions.append({
                'hour': target_hour,
                'expected_rps': int(predicted_rps),
                'confidence': self._calculate_confidence(target_hour),
                'scaling_factor': event_multiplier
            })

        return predictions

    def _get_event_multiplier(self, dt: datetime) -> float:
        """Check for known events affecting capacity."""
        for event in self.calendar_events:
            if event['start'] <= dt <= event['end']:
                return event['multiplier']

        # Day-of-week patterns
        day_multipliers = {
            0: 1.0,   # Monday
            1: 1.0,   # Tuesday
            2: 1.05,  # Wednesday
            3: 1.1,   # Thursday
            4: 1.2,   # Friday
            5: 1.3,   # Saturday
            6: 1.25   # Sunday
        }
        return day_multipliers[dt.weekday()]

    def trigger_prescaling(self, predictions: List[Dict]) -> None:
        """Pre-scale infrastructure based on predictions."""
        for pred in predictions:
            if pred['scaling_factor'] > 1.5:
                # Significant event - scale up 2 hours before
                pre_scale_time = pred['hour'] - timedelta(hours=2)
                self.schedule_scale_up(
                    when=pre_scale_time,
                    target_capacity=int(pred['expected_rps'] * 1.3)  # 30% buffer
                )
```

Cloud auto-scaling takes 5-10 minutes to provision new instances. Kubernetes pod scaling takes 1-2 minutes. Pre-scaling must account for this latency—trigger 15+ minutes before the predicted spike. Over-provisioning is cheaper than under-provisioning for critical events.
After viral traffic subsides, the system should normalize without wasting resources on no-longer-hot content.
Viral Lifecycle Timeline

- Hour 0-1: Video uploaded, passes automated moderation
- Hour 1-3: Initial distribution, moderate engagement
- Hour 3-6: Viral detection fires, CDN warming begins
- Hour 6-12: Peak viral traffic, all scaling engaged
- Hour 12-24: Traffic plateau, holding pattern
- Hour 24-48: Decline begins, scale-down initiates
- Hour 48-72: Return to baseline
- Day 3-7: Counter consolidation, storage tiering
- Day 7+: Content enters normal lifecycle

The peak typically lasts 6-12 hours, followed by a long tail. Most viral content is ephemeral—within a week, traffic is indistinguishable from normal content.
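The Day 3-7 storage tiering step could be sketched as below: once traffic is back near baseline, rarely watched renditions are demoted to cheaper storage. The `storage` and `analytics` clients, tier name, and thresholds are assumptions for illustration.

```python
from datetime import timedelta


async def tier_post_viral_video(storage, analytics, video_id: str) -> None:
    """Move a formerly viral video's extra renditions to cold storage once traffic normalizes."""
    views_last_24h = await analytics.views(video_id, window=timedelta(hours=24))
    baseline = await analytics.baseline_views(video_id)

    # Only tier down once traffic is indistinguishable from normal content.
    if views_last_24h > 2 * baseline:
        return

    # Keep a couple of commonly served renditions hot; demote the rest.
    for rendition in await storage.list_renditions(video_id):
        if rendition not in ('720p', '480p'):
            await storage.set_storage_class(video_id, rendition, 'COLD')
```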
Coming Up Next

With content flowing and engagement thriving, creators need tools to understand and monetize their success. The final page explores Creator Tools—analytics dashboards, monetization systems, and the ecosystem infrastructure that keeps creators creating.
You now understand how to design systems that survive—and thrive—under unpredictable viral traffic. The key insight: assume any content can explode, detect early, prepare proactively, and degrade gracefully when necessary. This resilience is what separates production-grade systems from those that fail under their own success.