Every day, over 10 million videos are uploaded to TikTok. Each video must be processed, validated, transcoded into multiple formats, checked for policy violations, and prepared for global distribution, all while giving the creator near-instantaneous confirmation that their content is live.

This pipeline is the factory floor of TikTok's content flywheel. Its efficiency directly impacts:

- Creator experience: Slow processing frustrates creators, reducing upload volume
- Content freshness: Faster processing means newer content in the recommendation pool
- Platform safety: Moderation delays can mean policy-violating content reaches millions
- Infrastructure costs: Inefficient transcoding wastes CPU cycles at massive scale

In this page, we'll design a video creation pipeline capable of handling TikTok's scale while maintaining sub-minute creator confirmation times.
By the end of this page, you will understand: (1) On-device processing strategies for diverse device capabilities, (2) Resumable upload protocols for unreliable networks, (3) Scalable transcoding architecture with priority queues, (4) Multi-stage content moderation (automated + human), and (5) Cost-optimized storage strategies for videos at petabyte scale.
Before diving into individual components, let's trace the complete journey of a video from creation to distribution. Understanding this end-to-end flow reveals why each component exists and how they interact.
| Stage | Description | Target Latency | Failure Handling |
|---|---|---|---|
| On-Device Processing | Recording, effects, local encoding | Real-time | Client-side retry; offline support |
| Upload | Chunked, resumable transfer | Varies by network | Auto-retry; resume from last chunk |
| Validation | Format check, virus scan, size limits | <1 second | Reject with error message |
| Transcoding | Multi-resolution encoding | <60 seconds | Retry queue; alert on backlog |
| Thumbnail Gen | Extract key frames | <10 seconds | Use first frame as fallback |
| Automated Moderation | ML policy check | <5 seconds | Fail-open with human escalation |
| CDN Push | Distribute to edge nodes | <30 seconds | Lazy push on first request |
| Index Update | Make searchable | <5 minutes | Async retry; search lag acceptable |
A key insight in TikTok's architecture is pushing as much processing as possible to the device. This serves multiple purposes:

- Reduces server load: Billions of filter applications happen on phones, not servers
- Instant feedback: Users see effects in real time without network round-trips
- Device diversity: The app adapts to device capabilities (high-end phones get better effects)
- Offline support: Users can create content without network connectivity

But on-device processing introduces significant engineering challenges.
The app performs capability detection on launch: GPU benchmarks, available RAM, CPU cores, ML accelerator presence. This creates a device profile that determines available features. Feature flags are downloaded server-side to enable/disable effects per device class. This allows gradual rollout of new features to capable devices first.
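A sketch of this capability-detection step (the class thresholds, profile fields, and effect names here are illustrative assumptions, not TikTok's actual values):

```typescript
// Bucket a device into a class that gates which effects the server enables.
type DeviceClass = 'low' | 'mid' | 'high';

interface DeviceProfile {
  ramMb: number;            // available RAM
  cpuCores: number;
  gpuScore: number;         // result of the on-launch GPU benchmark (assumed 0-100 scale)
  hasMlAccelerator: boolean;
}

function classifyDevice(p: DeviceProfile): DeviceClass {
  if (p.ramMb >= 6144 && p.gpuScore >= 80 && p.hasMlAccelerator) return 'high';
  if (p.ramMb >= 3072 && p.cpuCores >= 4) return 'mid';
  return 'low';
}

// Server-delivered feature flags keyed by device class (hypothetical effect names).
const effectsByClass: Record<DeviceClass, string[]> = {
  low: ['basic_filters'],
  mid: ['basic_filters', 'ar_stickers'],
  high: ['basic_filters', 'ar_stickers', 'realtime_segmentation'],
};
```

Because the flag table is downloaded rather than hard-coded, the server can promote an effect from `high` to `mid` devices without shipping a new app build.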
Pre-Upload Encoding

Before upload, the device performs local encoding to optimize file size and prepare for server processing:

```
Recorded Video (raw)   → ~500 MB for 60s
  ↓ On-device H.264/H.265 encoding
Compressed Video       → ~10-30 MB for 60s
  ↓ Chunked for resumable upload
Upload Chunks          → 2 MB each
```

The client uses adaptive encoding settings based on network conditions. On slow networks, it may encode at a lower bitrate to reduce upload time. On fast networks, it preserves quality for better server-side transcoding.
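The network-adaptive choice above can be sketched as a small policy function. The bitrate cutoffs and the `chooseEncodeSettings` name are illustrative assumptions:

```typescript
interface EncodeSettings {
  codec: 'h264' | 'h265';
  videoBitrateKbps: number;
}

// Trade upload time against quality based on the measured uplink bandwidth.
function chooseEncodeSettings(uplinkKbps: number, deviceSupportsH265: boolean): EncodeSettings {
  const codec = deviceSupportsH265 ? 'h265' : 'h264';
  if (uplinkKbps < 1000) return { codec, videoBitrateKbps: 1500 }; // slow network: minimize upload time
  if (uplinkKbps < 5000) return { codec, videoBitrateKbps: 3000 };
  return { codec, videoBitrateKbps: 6000 };                        // fast network: preserve quality for transcoding
}
```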
TikTok users span the globe, including regions with unreliable mobile networks. A simple HTTP POST for video upload would fail constantly on flaky 3G connections. Instead, TikTok uses a sophisticated resumable upload protocol similar to the tus protocol or Google's resumable upload API.

Why Resumable Uploads Matter

- Average video size: 10-30 MB
- Upload on 3G (1 Mbps): 80-240 seconds
- Connection interruption probability over 3 minutes: >50% in some regions
- Without resumable uploads: every interruption means starting from scratch
- With resumable uploads: the client continues from the last acknowledged chunk
```
## Phase 1: Initiate Upload Session

POST /upload/initiate
Headers:
  X-Upload-Size: 15728640        # 15MB total
  X-Upload-Checksum: sha256:abc123...
  X-Device-ID: device-xyz
  X-Session-Token: jwt...

Response:
{
  "upload_id": "up_123456",
  "chunk_size": 2097152,         # 2MB recommended chunks
  "upload_url": "https://upload-na.tiktok.com/v1/up_123456",
  "expires_at": "2024-01-15T12:00:00Z"
}

## Phase 2: Upload Chunks

PATCH /v1/up_123456
Headers:
  Content-Range: bytes 0-2097151/15728640
  Content-Length: 2097152
  X-Chunk-Checksum: sha256:chunk1hash
Body: [binary chunk data]

Response:
{ "bytes_received": 2097152, "next_offset": 2097152 }

## Phase 3: Resume After Interruption

HEAD /v1/up_123456
Response Headers:
  X-Bytes-Received: 8388608      # Server has 8MB

# Client resumes from byte 8388608
PATCH /v1/up_123456
Headers:
  Content-Range: bytes 8388608-10485759/15728640

## Phase 4: Finalize Upload

POST /v1/up_123456/finalize
Headers:
  X-Total-Checksum: sha256:abc123...

Response:
{
  "video_id": "vid_789",
  "status": "processing",
  "processing_eta_seconds": 45
}
```

Network failures mean clients may retry the same chunk multiple times. The server must be idempotent: uploading the same chunk twice should not corrupt the file. Implement this by (1) tracking received byte ranges, (2) rejecting overlapping ranges, and (3) using checksums to validate chunk identity.
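The resume logic in Phase 3 reduces to one pure computation on the client: given the server's acknowledged byte count, what `Content-Range` comes next? A minimal sketch (the `nextChunkRange` helper name is an assumption):

```typescript
// Returns the next Content-Range header value, or null when the upload is complete.
function nextChunkRange(bytesReceived: number, totalSize: number, chunkSize: number): string | null {
  if (bytesReceived >= totalSize) return null;
  const end = Math.min(bytesReceived + chunkSize, totalSize) - 1; // inclusive range end
  return `bytes ${bytesReceived}-${end}/${totalSize}`;
}
```

With the Phase 3 numbers above (server has 8388608 of 15728640 bytes, 2 MB chunks), this yields `bytes 8388608-10485759/15728640`, exactly the header the client sends on resume.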
Once upload completes, the video enters the validation stage. This is a fast, synchronous check that either accepts the video for processing or rejects it immediately with clear error messaging.

Why Validation Matters

- Security: Reject malicious files before they enter processing pipelines
- Cost control: Don't waste transcoding resources on invalid content
- User experience: Fast rejection with clear errors helps creators fix issues
- Compliance: Early check for obvious policy violations (file size, duration limits)
| Check Type | What's Validated | Rejection Reason | SLO |
|---|---|---|---|
| File Integrity | Checksum matches; file not corrupted | Upload failed, please retry | <100ms |
| Format Detection | Container: MP4, MOV, WebM | Unsupported format | <100ms |
| Codec Check | Video: H.264, H.265; Audio: AAC, MP3 | Unsupported codec | <200ms |
| Size Limits | Min 1KB; max 500MB | File too large/small | <50ms |
| Duration Limits | 15s to 10min (varies by account) | Duration exceeds limit | <200ms |
| Resolution Check | Min 540p; Max 4K | Resolution too low/high | <200ms |
| Virus Scan | ClamAV or similar | Upload blocked | <1s |
| Device Fingerprint | Known spammer device check | Temporary upload block | <100ms |
| Rate Limit | Max 20 uploads/hour per user | Upload limit reached | <50ms |
```typescript
interface ValidationResult {
  valid: boolean;
  video_id?: string;
  errors?: ValidationError[];
  warnings?: ValidationWarning[];
}

interface ValidationError {
  code: string;
  message: string;
  user_message: string; // Localized, user-friendly message
}

interface ValidationWarning {
  code: string;
  message: string;
}

async function validateUpload(uploadId: string): Promise<ValidationResult> {
  const checks = await Promise.all([
    checkFileIntegrity(uploadId),
    checkFormatAndCodec(uploadId),
    checkSizeLimits(uploadId),
    checkDuration(uploadId),
    scanForViruses(uploadId),
    checkDeviceFingerprint(uploadId),
    checkRateLimit(uploadId),
  ]);

  const errors = checks
    .filter(c => !c.passed)
    .map(c => c.error);

  if (errors.length > 0) {
    await cleanupUpload(uploadId);        // Remove chunks
    await logRejection(uploadId, errors); // For analytics
    return { valid: false, errors };
  }

  // Generate video ID and enqueue for processing
  const videoId = await createVideoRecord(uploadId);
  await enqueueForTranscoding(videoId, getPriority(uploadId));

  return {
    valid: true,
    video_id: videoId,
    warnings: checks
      .filter(c => c.warning)
      .map(c => c.warning)
  };
}
```

Validation errors should be returned in under 1 second with actionable messages. 'Video too short' isn't helpful; 'Video must be at least 15 seconds (yours: 12s)' helps creators fix and retry immediately.
Transcoding is the most computationally expensive part of the pipeline. Each uploaded video must be converted into multiple resolutions and bitrates to support:

- Adaptive bitrate streaming (ABR): Switch quality based on network conditions
- Device compatibility: Different codecs for different devices
- Bandwidth optimization: Lower quality for data-constrained users
- Preview optimization: Quick-loading previews for feed browsing

At 10 million uploads/day, this represents an enormous computing challenge.
| Rendition | Resolution | Bitrate | Use Case |
|---|---|---|---|
| 1080p High | 1920x1080 | 5 Mbps | WiFi playback, downloads |
| 1080p Medium | 1920x1080 | 2.5 Mbps | 4G playback |
| 720p | 1280x720 | 1.5 Mbps | 3G/4G playback |
| 540p | 960x540 | 800 Kbps | Slow connections |
| 360p | 640x360 | 400 Kbps | Very slow connections |
| Preview | 480x270 | 200 Kbps | Feed thumbnail previews |
| Audio Only | N/A | 128 Kbps | Background playback |
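At playback time, the client's ABR logic picks from this ladder. A minimal selection sketch (the 0.8 safety factor and `pickRendition` helper are illustrative assumptions):

```typescript
// Rendition ladder from the table above (video bitrates in Kbps).
const ladder = [
  { name: '1080p-high', kbps: 5000 },
  { name: '1080p-med', kbps: 2500 },
  { name: '720p', kbps: 1500 },
  { name: '540p', kbps: 800 },
  { name: '360p', kbps: 400 },
] as const;

// Pick the highest-bitrate rendition that fits within a safety fraction of measured throughput.
function pickRendition(throughputKbps: number, safety = 0.8): string {
  const budget = throughputKbps * safety;
  const fit = ladder.find(r => r.kbps <= budget);
  return fit ? fit.name : '360p'; // fall back to the lowest rung on very slow links
}
```

The safety margin absorbs throughput jitter: on a measured 2 Mbps link the player streams 720p rather than risk rebuffering at 1080p.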
10M videos/day ≈ 115 videos/second. An average 60s video with 7 renditions, at roughly one GPU-minute per rendition, needs ~7 GPU-minutes (420 GPU-seconds) of encoding work. By Little's law, steady-state demand is 115 videos/s × 420 GPU-s ≈ 48,600 GPUs busy at all times. Add 50% headroom for peaks ≈ 73,000 GPU workers globally. At $0.50/hr per GPU instance, that's ~$36,500/hour ≈ $320M/year in transcoding compute alone, which is why faster-than-realtime hardware encoders and per-rendition parallelism matter so much.
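The sizing arithmetic above as a reusable back-of-envelope function (inputs mirror the assumed numbers: 10M videos/day, 420 GPU-seconds of encoding per video, 50% headroom):

```typescript
// Steady-state GPU fleet size: arrival rate x work per item (Little's law), plus headroom.
function gpuWorkersNeeded(videosPerDay: number, gpuSecondsPerVideo: number, headroom = 1.5): number {
  const videosPerSecond = videosPerDay / 86_400;
  return Math.ceil(videosPerSecond * gpuSecondsPerVideo * headroom);
}
```

Halving the encode time per rendition halves the fleet, which makes encoder efficiency one of the highest-leverage cost optimizations in the whole pipeline.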
Content moderation is a critical, and often underestimated, component of any user-generated content platform. TikTok must balance:

- Speed: Content should be available quickly for creator satisfaction
- Safety: Policy-violating content should never reach vulnerable audiences
- Scale: 10M+ videos/day cannot be manually reviewed pre-publication
- Nuance: Context matters; a surgery video is acceptable, graphic violence is not

The solution is a multi-tier system combining automated ML classification with human review for edge cases.
| Category | Detection Method | Action on Detection | False Positive Impact |
|---|---|---|---|
| Nudity/Sexual | Visual CNN, skin detection | Block or age-restrict | High (art, medical blocked) |
| Violence/Gore | Visual CNN, motion analysis | Block or limit reach | Medium (news, games) |
| Hate Speech | Audio transcription + NLP | Block with human review | High (context-dependent) |
| Dangerous Acts | Action recognition models | Limit reach, add warning | Medium (stunts vs sports) |
| Misinformation | Claim detection + fact-check DB | Add label, reduce reach | Very high (complex) |
| Copyright Music | Audio fingerprinting (like Shazam) | Mute audio or ask for license | Low (fingerprints accurate) |
| Known Bad Content | PDQ/PhotoDNA hash matching | Immediate block | Very low (hash matches) |
| Minor Safety | Age estimation, context | Block or restrict interactions | High (sensitive) |
Every moderation decision has tradeoffs. Aggressive automated moderation removes more bad content but also removes legitimate content (false positives), frustrating creators. Lenient moderation allows bad content through, harming users and creating regulatory risk. TikTok tuned toward aggressive removal, accepting ~10% false positive rate, with an appeals process to correct mistakes.
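The tuning knob in that tradeoff is the confidence level at which automation acts alone. A minimal routing sketch (the thresholds and `routeByConfidence` helper are illustrative assumptions, not TikTok's actual values):

```typescript
type ModerationAction = 'auto_block' | 'human_review' | 'approve';

// Route by classifier confidence: act automatically only when the model is very sure.
function routeByConfidence(
  violationScore: number, // 0-1 score from the ML classifier
  blockThreshold = 0.95,
  reviewThreshold = 0.6,
): ModerationAction {
  if (violationScore >= blockThreshold) return 'auto_block';
  if (violationScore >= reviewThreshold) return 'human_review';
  return 'approve';
}
```

Lowering `blockThreshold` removes more bad content automatically but raises the false-positive rate; raising `reviewThreshold` shrinks the human-review queue but lets more borderline content through. Tuning these two numbers is effectively tuning the aggressive-versus-lenient tradeoff.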
Human Review at Scale

Despite sophisticated ML, human judgment is required for nuanced decisions. TikTok employs 10,000+ content moderators globally. Key operational considerations:

- Specialization: Moderators specialize by content type (violence, hate speech, etc.) and language
- Calibration: Regular calibration sessions maintain consistent policy application
- Wellbeing: Moderating disturbing content causes psychological harm; rotation, breaks, and counseling support are required
- Follow-the-Sun: 24/7 coverage with shifts following time zones to handle content from any region
- Escalation Path: Complex cases escalate to senior moderators, then policy teams, then legal

Average throughput: 300-500 video reviews per moderator per 8-hour shift, or roughly 60-95 seconds per video.
Storing 10M+ videos per day, each with 7 renditions, generates approximately 250TB of new video data daily. Managing this at cost-effective scale requires sophisticated storage tiering, lifecycle policies, and CDN integration.
| Tier | Storage Size | Cost per TB/month | Monthly Cost |
|---|---|---|---|
| Hot (0-7d) | ~1.75 PB | $100 | $175,000 |
| Warm (7-30d) | ~5.75 PB | $40 | $230,000 |
| Cold (30-90d) | ~15 PB | $15 | $225,000 |
| Archive (90d+) | ~100 PB | $4 | $400,000 |
| Total | ~122 PB | — | ~$1.03M/month |
Smart Lifecycle Policies

Not all content should age at the same rate. Machine learning predicts content longevity based on:

- Engagement velocity: Rapidly decaying views mean moving to cold storage faster
- Creator tier: Top creators' back catalogs remain accessible longer
- Trending potential: Content matching current trends may resurge
- Seasonality: Holiday content is archived but restored seasonally

This predictive tiering can reduce storage costs by 20-30% compared to simple age-based policies.
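A sketch of how such a predictive policy might combine those signals (the field names, thresholds, and `chooseTier` logic are illustrative assumptions):

```typescript
type StorageTier = 'hot' | 'warm' | 'cold' | 'archive';

interface VideoSignals {
  ageDays: number;
  dailyViewDecayRate: number; // 0-1; e.g. 0.7 means views drop ~70% day-over-day
  isTopCreator: boolean;
  matchesActiveTrend: boolean;
}

function chooseTier(v: VideoSignals): StorageTier {
  if (v.matchesActiveTrend) return v.ageDays < 30 ? 'hot' : 'warm';      // may resurge: keep fast
  if (v.isTopCreator && v.ageDays < 90) return 'warm';                   // back catalog stays accessible
  if (v.ageDays < 7) return v.dailyViewDecayRate > 0.7 ? 'warm' : 'hot'; // fast decay: demote early
  if (v.ageDays < 30) return 'warm';
  if (v.ageDays < 90) return 'cold';
  return 'archive';
}
```

The savings come from the early demotions: a video whose views collapse on day two no longer pays hot-tier rates for the full seven days.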
CDN edge nodes cache popular content, but storage-of-record remains in origin. CDN cache hit rates of 95%+ mean only 5% of requests hit origin storage. Design origin for throughput, not latency. Cold storage with minutes of latency is acceptable if CDN handles the hot path.
The final stage of the creation pipeline is confirming to the creator that their video is live and beginning its distribution journey. This handoff must feel instant while actually representing the completion of a complex async pipeline.
```typescript
// Client-side polling for processing status
interface ProcessingStatus {
  video_id: string;
  status: 'processing' | 'ready' | 'failed';
  progress_pct: number;
  preview_url?: string;
  full_url?: string;
  error_message?: string;
  estimated_complete_seconds?: number;
}

async function pollProcessingStatus(videoId: string): Promise<void> {
  const pollInterval = 2000; // 2 seconds
  const maxAttempts = 60;    // 2 minutes max
  let previewShown = false;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(videoId);
    updateUI(status);

    if (status.status === 'ready') {
      showSuccessToast('Your video is now live!');
      navigateToProfile();
      return;
    }

    if (status.status === 'failed') {
      showErrorModal(status.error_message);
      return;
    }

    // Show preview as soon as available
    if (status.preview_url && !previewShown) {
      showPreview(status.preview_url);
      previewShown = true;
    }

    await sleep(pollInterval);
  }

  // Timeout: processing taking too long
  showWarning("Still processing. We'll notify you when ready.");
}
```

The creator sees their video on their profile within 30 seconds (preview rendition). From their perspective, the video is 'live'. Full-quality playback and recommendation distribution happen in the background. This optimistic UI dramatically improves creator satisfaction while complex processing continues asynchronously.
Coming Up Next

With videos successfully ingested into the platform, the next page explores the most complex and differentiating component of TikTok: the For You page recommendation algorithm. We'll examine how TikTok personalizes content for 1 billion users, handles cold start for new users and videos, and maintains real-time adaptation to user preferences.
You now understand the complete video creation pipeline from tap to global CDN. The key architectural insight: decompose the problem into async stages, optimize each for its specific constraints, and use optimistic UI to decouple perceived latency from actual processing time.