The upload pipeline is the first touchpoint between creators and your platform. Its reliability directly impacts creator satisfaction, content velocity, and ultimately platform growth. A failed upload doesn't just frustrate a single creator; it can mean losing unique content that may never be re-uploaded.
At YouTube's scale, the upload pipeline must handle an enormous volume of concurrent uploads, individual files of up to 256 GB, and clients on slow or unreliable networks.
This page explores the architecture and design patterns that enable reliable uploads at this scale, focusing on the journey from a creator clicking 'Upload' to the video being safely stored and queued for processing.
By the end of this page, you will understand chunked resumable upload protocols, multipart upload orchestration, validation strategies, and queue-based processing triggers. You'll be able to design an upload system that achieves a 99.9% success rate even under adverse network conditions.
A robust upload pipeline consists of multiple coordinated stages, each designed to handle specific failure modes and optimize for different aspects of the upload experience.
```
                          VIDEO UPLOAD PIPELINE
                          =====================

 [Creator Device]
       │  1. Upload Request (metadata + size)
       ▼
 [API Gateway] ────────▶ Rate Limiting │ Authentication │ Request Validation
       │  2. Initiate Upload Session
       ▼
 [Upload Service] ─────▶ Create Upload Session │ Generate Signed URLs │ Quota
       │  3. Return Upload Configuration
       ▼
 [Creator Device]
       │  4. Direct Chunk Upload (bypassing API Gateway)
       ▼
 [Storage Service] ────▶ S3/GCS with Signed URLs │ Parallel Chunk Reception │
   (Edge PoP)            Automatic Retry │ Chunk Verification
       │  5. Chunk Completion Callbacks
       ▼
 [Upload Service] ─────▶ Track Chunks │ Merge on Completion │ Update Progress
       │  6. All Chunks Received → Finalize
       ▼
 [Validation Svc] ─────▶ Container Validation │ Codec Detection │ Virus Scan
       │  7. Validation Passed → Queue for Processing
       ▼
 [Message Queue] ──────▶ Kafka/SQS │ Priority Routing │ Deduplication
       │  8. Trigger Transcoding Pipeline
       ▼
 [TRANSCODING] ────────▶ (Next Page)
```

Resumable uploads are essential for reliability. Network interruptions, browser crashes, and mobile app backgrounding are common—and users should never need to restart a 2-hour upload from the beginning.
The protocol is modeled after Google's Resumable Upload Protocol and consists of three phases: Initiation, Chunk Upload, and Finalization.
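To make the resume path concrete before walking through each phase, here is a minimal client-side sketch. It is illustrative only: it assumes the status endpoint returns the indexes of chunks the server has already received (that response shape is an assumption, not part of the documented API) and reuses the `UploadInitResponse` type defined in the Phase 1 code below.

```typescript
// Minimal resume sketch: ask the server which chunks it already has,
// then upload only the missing byte ranges.
async function resumeUpload(file: File, init: UploadInitResponse): Promise<void> {
  const res = await fetch(init.resumeEndpoint);
  // Assumed response shape for illustration purposes.
  const { completedChunkIndexes } = (await res.json()) as { completedChunkIndexes: number[] };
  const completed = new Set(completedChunkIndexes);

  for (const chunk of init.uploadUrls) {
    if (completed.has(chunk.chunkIndex)) continue; // already stored server-side

    // Re-slice exactly the byte range this chunk covers.
    const start = chunk.chunkIndex * init.chunkSize;
    const end = Math.min(start + init.chunkSize, file.size);
    const body = await file.slice(start, end).arrayBuffer();

    // Expired pre-signed URLs would be refreshed via the upload service before this PUT.
    const put = await fetch(chunk.url, { method: 'PUT', body });
    if (!put.ok) throw new Error(`Chunk ${chunk.chunkIndex} failed: ${put.status}`);
  }
}
```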
```typescript
// ================================================================
// PHASE 1: UPLOAD INITIATION
// ================================================================

interface UploadInitRequest {
  fileName: string;
  fileSize: number;        // Total bytes
  mimeType: string;        // e.g., "video/mp4"
  contentHash?: string;    // Optional: SHA-256 of entire file for dedup
  metadata: VideoMetadata; // Title, description, tags, etc.
  resumeToken?: string;    // If resuming previous upload
}

interface UploadInitResponse {
  uploadId: string;             // Unique identifier for this upload session
  uploadUrls: ChunkUploadUrl[]; // Pre-signed URLs for each chunk
  chunkSize: number;            // Recommended chunk size (adaptive)
  expiresAt: Date;              // Session expiration (7 days)
  resumeEndpoint: string;       // URL to check upload status
}

interface ChunkUploadUrl {
  chunkIndex: number;
  url: string;     // Pre-signed PUT URL
  expiresAt: Date; // URL expiration (typically 1 hour)
}

// Server-side initiation handler
async function initiateUpload(request: UploadInitRequest): Promise<UploadInitResponse> {
  // 1. Validate request
  validateFileSizeLimit(request.fileSize); // Max 256GB
  validateMimeType(request.mimeType);
  validateQuota(request.userId);

  // 2. Check for duplicate/resume
  if (request.contentHash) {
    const existing = await findByContentHash(request.contentHash);
    if (existing) return handleDuplicate(existing);
  }
  if (request.resumeToken) {
    return resumeExistingUpload(request.resumeToken);
  }

  // 3. Calculate optimal chunk size based on file size
  const chunkSize = calculateOptimalChunkSize(request.fileSize);
  const chunkCount = Math.ceil(request.fileSize / chunkSize);

  // 4. Create upload session in database
  const session = await createUploadSession({
    userId: request.userId,
    fileName: request.fileName,
    fileSize: request.fileSize,
    chunkSize,
    chunkCount,
    metadata: request.metadata,
    status: 'INITIATED',
    expiresAt: addDays(now(), 7),
  });

  // 5. Generate pre-signed URLs for each chunk
  const uploadUrls = await generateChunkUrls(session.id, chunkCount);

  return {
    uploadId: session.id,
    uploadUrls,
    chunkSize,
    expiresAt: session.expiresAt,
    resumeEndpoint: `/uploads/${session.id}/status`,
  };
}

// Adaptive chunk size based on file size
function calculateOptimalChunkSize(fileSize: number): number {
  if (fileSize < 10 * MB) return 1 * MB;   // Small files: 1MB chunks
  if (fileSize < 100 * MB) return 5 * MB;  // Medium files: 5MB chunks
  if (fileSize < 1 * GB) return 10 * MB;   // Large files: 10MB chunks
  if (fileSize < 10 * GB) return 25 * MB;  // Very large files: 25MB chunks
  return 50 * MB;                          // Huge files: 50MB chunks
}
```
```typescript
// ================================================================
// PHASE 2: CHUNK UPLOAD
// ================================================================

// Client-side chunk upload with retry logic
class ChunkUploader {
  private maxRetries = 5;
  private baseDelay = 1000; // 1 second

  async uploadChunk(
    chunk: ArrayBuffer,
    url: string,
    chunkIndex: number,
    onProgress: (loaded: number, total: number) => void
  ): Promise<ChunkUploadResult> {
    let lastError: Error | null = null;

    for (let attempt = 0; attempt < this.maxRetries; attempt++) {
      try {
        // Calculate chunk hash for integrity verification
        const chunkHash = await this.sha256(chunk);

        const response = await fetch(url, {
          method: 'PUT',
          body: chunk,
          headers: {
            'Content-Type': 'application/octet-stream',
            'Content-Length': chunk.byteLength.toString(),
            'X-Chunk-Hash': chunkHash,
          },
          // AbortController for timeout
          signal: AbortSignal.timeout(300000), // 5 minute timeout per chunk
        });

        if (!response.ok) {
          throw new UploadError(response.status, await response.text());
        }

        // Verify ETag matches our hash (S3 returns MD5 as ETag)
        const etag = response.headers.get('ETag');

        return {
          chunkIndex,
          success: true,
          etag,
          bytesUploaded: chunk.byteLength,
        };
      } catch (error) {
        lastError = error as Error;

        // Don't retry on client errors (4xx)
        if (error instanceof UploadError && error.status >= 400 && error.status < 500) {
          throw error;
        }

        // Exponential backoff with jitter
        const delay = this.baseDelay * Math.pow(2, attempt) + Math.random() * 1000;
        await this.sleep(delay);
      }
    }

    throw new MaxRetriesExceededError(chunkIndex, lastError);
  }

  // Parallel upload with concurrency control
  async uploadAllChunks(
    file: File,
    urls: ChunkUploadUrl[],
    chunkSize: number,
    concurrency: number = 4
  ): Promise<void> {
    const chunks: ChunkTask[] = [];

    // Create chunk tasks
    for (let i = 0; i < urls.length; i++) {
      chunks.push({
        index: i,
        start: i * chunkSize,
        end: Math.min((i + 1) * chunkSize, file.size),
        url: urls[i].url,
        status: 'pending',
      });
    }

    // Process with limited concurrency (using a semaphore pattern)
    const semaphore = new Semaphore(concurrency);
    const tasks = chunks.map(async (chunk) => {
      await semaphore.acquire();
      try {
        const data = await file.slice(chunk.start, chunk.end).arrayBuffer();
        const result = await this.uploadChunk(
          data,
          chunk.url,
          chunk.index,
          (loaded, total) => this.updateProgress(chunk.index, loaded, total)
        );
        chunk.status = 'completed';
        chunk.etag = result.etag;
      } finally {
        semaphore.release();
      }
    });

    await Promise.all(tasks);
  }
}
```

Parallel chunk uploads dramatically improve throughput, but too many concurrent uploads can saturate the client's connection. Start with 4-6 concurrent chunks and adaptively adjust based on observed throughput and failure rates.
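The `Semaphore` used by `uploadAllChunks` is referenced but not shown above. A minimal promise-based implementation might look like the following sketch (not a library API, just one way to cap concurrency):

```typescript
// Minimal promise-based semaphore to cap concurrent chunk uploads.
class Semaphore {
  private available: number;
  private waiters: (() => void)[] = [];

  constructor(count: number) {
    this.available = count;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No slot free: queue up until release() hands us one.
    await new Promise<void>(resolve => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // transfer the slot directly to the next waiter
    } else {
      this.available++;
    }
  }
}
```

An adaptive variant could lower the `concurrency` argument when `uploadChunk` retries spike and raise it while measured throughput keeps improving.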
Once all chunks are uploaded, the system must assemble them into a single coherent file, verify integrity, and prepare for processing. This phase is where many upload systems fail at scale—proper handling of partial uploads, orphaned chunks, and race conditions is critical.
```typescript
// ================================================================
// PHASE 3: UPLOAD FINALIZATION
// ================================================================

interface FinalizeRequest {
  uploadId: string;
  chunkETags: ChunkETag[]; // Client-provided ETags for verification
  finalHash?: string;      // Optional SHA-256 of complete file
}

interface ChunkETag {
  chunkIndex: number;
  etag: string;
}

async function finalizeUpload(request: FinalizeRequest): Promise<FinalizeResponse> {
  const session = await getUploadSession(request.uploadId);

  // 1. Verify session is valid and not expired
  if (session.status !== 'UPLOADING') {
    throw new InvalidSessionStateError(session.status);
  }
  if (session.expiresAt < new Date()) {
    throw new SessionExpiredError(request.uploadId);
  }

  // 2. Verify all chunks are present
  const uploadedChunks = await listUploadedChunks(session.id);
  if (uploadedChunks.length !== session.chunkCount) {
    const missing = findMissingChunks(uploadedChunks, session.chunkCount);
    throw new MissingChunksError(missing);
  }

  // 3. Verify ETags match (integrity check)
  for (const provided of request.chunkETags) {
    const stored = uploadedChunks.find(c => c.index === provided.chunkIndex);
    if (stored?.etag !== provided.etag) {
      throw new ChunkMismatchError(provided.chunkIndex);
    }
  }

  // 4. Initiate multipart completion (cloud storage operation)
  // This tells S3/GCS to assemble chunks into final object
  const assemblyResult = await completeMultipartUpload({
    bucket: session.bucket,
    key: session.objectKey,
    uploadId: session.cloudUploadId,
    parts: uploadedChunks.map(c => ({
      partNumber: c.index + 1, // S3 uses 1-based indexing
      etag: c.etag,
    })),
  });

  // 5. Verify final file size matches expected
  const finalObject = await headObject(session.bucket, session.objectKey);
  if (finalObject.contentLength !== session.fileSize) {
    throw new SizeMismatchError(session.fileSize, finalObject.contentLength);
  }

  // 6. Calculate content hash if not provided (for dedup)
  const contentHash = request.finalHash ?? await calculateFileHash(
    session.bucket,
    session.objectKey
  );

  // 7. Update session status
  await updateSession(session.id, {
    status: 'UPLOADED',
    contentHash,
    uploadedAt: new Date(),
    storageLocation: {
      bucket: session.bucket,
      key: session.objectKey,
      region: session.region,
    },
  });

  // 8. Queue for validation and processing
  await publishToQueue('video-validation', {
    uploadId: session.id,
    videoId: session.videoId,
    storageLocation: session.storageLocation,
    metadata: session.metadata,
    priority: calculatePriority(session),
  });

  // 9. Cleanup: delete individual chunk objects (keep only assembled file)
  await scheduleChunkCleanup(session.id);

  return {
    success: true,
    videoId: session.videoId,
    status: 'PROCESSING',
    estimatedProcessingTime: estimateProcessingTime(session.fileSize),
  };
}

// Handle orphaned uploads (cleanup job)
async function cleanupOrphanedUploads(): Promise<void> {
  // Find sessions that haven't completed in 7 days
  const orphaned = await findOrphanedSessions({
    maxAge: Duration.days(7),
    status: ['INITIATED', 'UPLOADING'],
  });

  for (const session of orphaned) {
    // Delete all uploaded chunks
    await deleteChunks(session.bucket, session.objectKey);

    // Mark session as expired
    await updateSession(session.id, {
      status: 'EXPIRED',
      expiredAt: new Date(),
    });

    // Notify user if they have email notifications enabled
    await notifyUploadExpired(session.userId, session.fileName);
  }
}
```

| State | Description | Valid Transitions | Retention |
|---|---|---|---|
| INITIATED | Session created, no chunks uploaded | UPLOADING, EXPIRED | 7 days |
| UPLOADING | At least one chunk received | UPLOADED, EXPIRED | 7 days |
| UPLOADED | All chunks received, file assembled | VALIDATING | Permanent |
| VALIDATING | Content validation in progress | PROCESSING, REJECTED | Permanent |
| PROCESSING | Transcoding in progress | READY, FAILED | Permanent |
| READY | Video available for playback | DELETED | Until deleted |
| REJECTED | Failed validation (malformed, unsafe) | DELETED | 30 days for review |
| EXPIRED | Session timed out before completion | — | 30 days then purged |
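One way to keep these transitions honest is to encode the table directly and reject anything else. The sketch below is illustrative, not the service's actual code; the table lists FAILED and DELETED only as transition targets, so they are treated here as terminal states.

```typescript
// Encode the state machine from the table above and enforce it on every update.
type UploadState =
  | 'INITIATED' | 'UPLOADING' | 'UPLOADED' | 'VALIDATING'
  | 'PROCESSING' | 'READY' | 'REJECTED' | 'EXPIRED' | 'FAILED' | 'DELETED';

const VALID_TRANSITIONS: Record<UploadState, UploadState[]> = {
  INITIATED:  ['UPLOADING', 'EXPIRED'],
  UPLOADING:  ['UPLOADED', 'EXPIRED'],
  UPLOADED:   ['VALIDATING'],
  VALIDATING: ['PROCESSING', 'REJECTED'],
  PROCESSING: ['READY', 'FAILED'],
  READY:      ['DELETED'],
  REJECTED:   ['DELETED'],
  EXPIRED:    [],
  FAILED:     [],
  DELETED:    [],
};

function assertTransition(from: UploadState, to: UploadState): void {
  if (!VALID_TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal upload state transition: ${from} -> ${to}`);
  }
}

// Example: assertTransition('UPLOADING', 'UPLOADED') passes;
// assertTransition('READY', 'UPLOADING') throws.
```

Calling such a guard inside `updateSession` prevents race conditions (for example, a late finalize request after a session has already expired) from corrupting session state.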
Before investing compute resources in transcoding, uploaded files must pass validation. This catches corrupt files, unsupported formats, and potentially malicious content before they propagate through the system.
```typescript
interface ValidationResult {
  valid: boolean;
  errors: ValidationError[];
  warnings: ValidationWarning[];
  mediaInfo: MediaInfo;
  contentFlags: ContentFlag[];
}

interface MediaInfo {
  container: string; // e.g., "mp4"
  duration: number;  // seconds
  fileSize: number;  // bytes
  videoStreams: VideoStream[];
  audioStreams: AudioStream[];
  subtitleStreams: SubtitleStream[];
}

interface VideoStream {
  index: number;
  codec: string;     // e.g., "h264"
  profile: string;   // e.g., "high"
  level: string;     // e.g., "4.1"
  width: number;
  height: number;
  frameRate: number; // fps
  bitRate: number;   // bps
  colorSpace: string;
  hdr: boolean;
  rotation: number;  // degrees
}

async function validateVideo(location: StorageLocation): Promise<ValidationResult> {
  const result: ValidationResult = {
    valid: true,
    errors: [],
    warnings: [],
    mediaInfo: {} as MediaInfo,
    contentFlags: [],
  };

  try {
    // 1. Container validation using FFprobe
    const probeResult = await ffprobe(location);
    if (!probeResult.format) {
      result.valid = false;
      result.errors.push({ code: 'INVALID_CONTAINER', message: 'Not a valid media container' });
      return result;
    }

    // 2. Extract media info
    result.mediaInfo = extractMediaInfo(probeResult);

    // 3. Validate video streams exist
    if (result.mediaInfo.videoStreams.length === 0) {
      result.valid = false;
      result.errors.push({ code: 'NO_VIDEO_STREAM', message: 'No video track found' });
      return result;
    }

    // 4. Check codec support
    const primaryVideo = result.mediaInfo.videoStreams[0];
    if (!SUPPORTED_VIDEO_CODECS.includes(primaryVideo.codec)) {
      result.valid = false;
      result.errors.push({
        code: 'UNSUPPORTED_CODEC',
        message: `Video codec '${primaryVideo.codec}' is not supported`,
      });
      return result;
    }

    // 5. Check resolution limits
    if (primaryVideo.width > 8192 || primaryVideo.height > 4320) {
      result.valid = false;
      result.errors.push({
        code: 'RESOLUTION_TOO_HIGH',
        message: `Resolution ${primaryVideo.width}x${primaryVideo.height} exceeds maximum`,
      });
      return result;
    }

    // 6. Check duration limits
    if (result.mediaInfo.duration > 43200) { // 12 hours
      result.valid = false;
      result.errors.push({ code: 'DURATION_TOO_LONG', message: 'Video exceeds 12 hour limit' });
      return result;
    }

    // 7. Verify decodability by extracting sample frames
    const decodable = await verifyDecodability(location, [0, 30, 60]);
    if (!decodable.success) {
      result.valid = false;
      result.errors.push({ code: 'DECODE_ERROR', message: decodable.error });
      return result;
    }

    // 8. Add warnings for suboptimal content
    if (primaryVideo.bitRate > 100_000_000) { // 100 Mbps
      result.warnings.push({
        code: 'HIGH_BITRATE',
        message: 'Very high bitrate may result in slow processing',
      });
    }
    if (result.mediaInfo.audioStreams.length === 0) {
      result.warnings.push({ code: 'NO_AUDIO', message: 'No audio track detected' });
    }

    // 9. Run lightweight content moderation
    const contentCheck = await quickContentScan(location);
    result.contentFlags = contentCheck.flags;
    if (contentCheck.requiresReview) {
      result.warnings.push({ code: 'CONTENT_REVIEW', message: 'Content flagged for manual review' });
    }

    return result;
  } catch (error) {
    result.valid = false;
    result.errors.push({
      code: 'VALIDATION_FAILED',
      message: `Validation error: ${error.message}`,
    });
    return result;
  }
}
```

Validation must be fast—ideally under 30 seconds for any video length. Use sampling strategies: probe headers without reading the entire file, decode only sample frames, and defer full content analysis to the processing pipeline.
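The `verifyDecodability` call above is a natural place for that sampling approach: rather than decoding the whole file, seek to a few timestamps and decode a single frame at each. A rough sketch using the ffmpeg CLI is shown below; the helper itself is an assumption of this design, and `localPath` is assumed to be a locally staged copy (or a URL ffmpeg can read), with offsets beyond the video's duration filtered out by the caller.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

// Sampled decodability check: seek to a few offsets and decode one frame each.
async function verifyDecodability(
  localPath: string,
  sampleOffsetsSec: number[]
): Promise<{ success: boolean; error?: string }> {
  for (const offset of sampleOffsetsSec) {
    try {
      // -ss before -i seeks without decoding everything up to the offset;
      // -frames:v 1 decodes a single frame; -f null - discards the output,
      // so this only exercises the decoder.
      await run('ffmpeg', [
        '-v', 'error',
        '-ss', String(offset),
        '-i', localPath,
        '-frames:v', '1',
        '-f', 'null', '-',
      ]);
    } catch (err) {
      return { success: false, error: `Decode failed at ${offset}s: ${(err as Error).message}` };
    }
  }
  return { success: true };
}
```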
Routing multi-gigabyte uploads through your application servers creates bottlenecks and unnecessary cost. The industry-standard pattern is direct-to-storage uploads where clients upload directly to cloud storage (S3, GCS, Azure Blob) using pre-signed URLs generated by your backend.
```typescript
// AWS S3 Presigned URL Generation
import {
  S3Client,
  PutObjectCommand,
  CreateMultipartUploadCommand,
  UploadPartCommand,
} from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: process.env.AWS_REGION });

async function generatePresignedUrls(
  uploadSession: UploadSession
): Promise<ChunkUploadUrl[]> {
  // For files > 5GB, use S3 multipart upload
  if (uploadSession.fileSize > 5 * GB) {
    return await generateMultipartUrls(uploadSession);
  }

  // For smaller files, single PUT operation
  const command = new PutObjectCommand({
    Bucket: uploadSession.bucket,
    Key: uploadSession.objectKey,
    ContentType: uploadSession.mimeType,
    ContentLength: uploadSession.fileSize,
  });

  const url = await getSignedUrl(s3, command, {
    expiresIn: 3600, // 1 hour
  });

  return [{
    chunkIndex: 0,
    url,
    expiresAt: new Date(Date.now() + 3600 * 1000),
  }];
}

async function generateMultipartUrls(
  uploadSession: UploadSession
): Promise<ChunkUploadUrl[]> {
  // 1. Initiate multipart upload
  const createCommand = new CreateMultipartUploadCommand({
    Bucket: uploadSession.bucket,
    Key: uploadSession.objectKey,
    ContentType: uploadSession.mimeType,
  });
  const multipartUpload = await s3.send(createCommand);
  const uploadId = multipartUpload.UploadId!;

  // Store uploadId in session for later completion
  await updateSession(uploadSession.id, { cloudUploadId: uploadId });

  // 2. Generate presigned URL for each part
  const urls: ChunkUploadUrl[] = [];
  for (let i = 0; i < uploadSession.chunkCount; i++) {
    const command = new UploadPartCommand({
      Bucket: uploadSession.bucket,
      Key: uploadSession.objectKey,
      UploadId: uploadId,
      PartNumber: i + 1, // S3 uses 1-based indexing
    });

    const url = await getSignedUrl(s3, command, {
      expiresIn: 3600,
    });

    urls.push({
      chunkIndex: i,
      url,
      expiresAt: new Date(Date.now() + 3600 * 1000),
    });
  }

  return urls;
}

// URL refresh for long uploads
async function refreshExpiredUrls(
  uploadId: string,
  expiredChunks: number[]
): Promise<ChunkUploadUrl[]> {
  const session = await getUploadSession(uploadId);

  // Verify session is still valid
  if (session.status !== 'UPLOADING' || session.expiresAt < new Date()) {
    throw new SessionExpiredError(uploadId);
  }

  const refreshed: ChunkUploadUrl[] = [];
  for (const chunkIndex of expiredChunks) {
    const command = new UploadPartCommand({
      Bucket: session.bucket,
      Key: session.objectKey,
      UploadId: session.cloudUploadId,
      PartNumber: chunkIndex + 1,
    });

    const url = await getSignedUrl(s3, command, {
      expiresIn: 3600,
    });

    refreshed.push({
      chunkIndex,
      url,
      expiresAt: new Date(Date.now() + 3600 * 1000),
    });
  }

  return refreshed;
}
```

Users expect real-time feedback during uploads, especially for large files that may take hours. A robust status tracking system must handle multiple upload sources, provide accurate ETAs, and enable seamless resumption.
```typescript
interface UploadStatus {
  uploadId: string;
  videoId: string;
  status: UploadState;

  // Progress metrics
  progress: {
    bytesUploaded: number;
    bytesTotal: number;
    percentage: number;
    chunksCompleted: number;
    chunksTotal: number;
    uploadStartedAt: Date;
    estimatedCompletion: Date | null;
    currentSpeed: number; // bytes/second (rolling 30s average)
    averageSpeed: number; // bytes/second (session average)
  };

  // Processing status (post-upload)
  processing: {
    stage: ProcessingStage | null;   // 'validating' | 'transcoding' | 'analyzing'
    progress: number;                // 0-100
    estimatedCompletion: Date | null;
    availableFormats: VideoFormat[]; // Already processed formats
  };

  // Error information
  error: {
    code: string | null;
    message: string | null;
    retryable: boolean;
    retryAfter: Date | null;
  };
}

// Client-side progress aggregation
class ProgressTracker {
  private chunkProgress: Map<number, number> = new Map();
  private speedSamples: { timestamp: number; bytes: number }[] = [];
  private readonly SPEED_WINDOW = 30_000; // 30 seconds

  updateChunkProgress(chunkIndex: number, bytesUploaded: number): void {
    this.chunkProgress.set(chunkIndex, bytesUploaded);
    this.recordSpeedSample(bytesUploaded);
  }

  private recordSpeedSample(bytes: number): void {
    const now = Date.now();
    this.speedSamples.push({ timestamp: now, bytes });

    // Prune old samples
    const cutoff = now - this.SPEED_WINDOW;
    this.speedSamples = this.speedSamples.filter(s => s.timestamp > cutoff);
  }

  getCurrentSpeed(): number {
    if (this.speedSamples.length < 2) return 0;

    const first = this.speedSamples[0];
    const last = this.speedSamples[this.speedSamples.length - 1];
    const bytesDelta = last.bytes - first.bytes;
    const timeDelta = (last.timestamp - first.timestamp) / 1000; // seconds

    return timeDelta > 0 ? bytesDelta / timeDelta : 0;
  }

  getTotalProgress(): number {
    let total = 0;
    for (const bytes of this.chunkProgress.values()) {
      total += bytes;
    }
    return total;
  }

  getEstimatedCompletion(totalBytes: number): Date | null {
    const speed = this.getCurrentSpeed();
    if (speed <= 0) return null;

    const bytesRemaining = totalBytes - this.getTotalProgress();
    const secondsRemaining = bytesRemaining / speed;
    return new Date(Date.now() + secondsRemaining * 1000);
  }
}

// Server-side status API
async function getUploadStatus(uploadId: string): Promise<UploadStatus> {
  const session = await getUploadSession(uploadId);
  const chunks = await getUploadedChunks(uploadId);

  const bytesUploaded = chunks.reduce((sum, c) => sum + c.size, 0);

  // Get processing status if upload complete
  let processing = null;
  if (session.status === 'PROCESSING') {
    processing = await getProcessingStatus(session.videoId);
  }

  return {
    uploadId,
    videoId: session.videoId,
    status: session.status,
    progress: {
      bytesUploaded,
      bytesTotal: session.fileSize,
      percentage: (bytesUploaded / session.fileSize) * 100,
      chunksCompleted: chunks.length,
      chunksTotal: session.chunkCount,
      uploadStartedAt: session.uploadStartedAt,
      estimatedCompletion: null, // Client-side calculation
      currentSpeed: 0,           // Client-side calculation
      averageSpeed: bytesUploaded / ((Date.now() - session.uploadStartedAt.getTime()) / 1000),
    },
    processing,
    error: session.error || { code: null, message: null, retryable: false, retryAfter: null },
  };
}
```

For a smoother UX, consider WebSocket connections for real-time status updates. The client establishes a WebSocket on upload start, and the server pushes status changes (chunk completions, validation results, processing progress) as they occur. This eliminates polling overhead and provides instant feedback.
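As a sketch of that push model, the client could subscribe once and merge pushed updates into its local progress state. The endpoint path, host, and payload shape below are assumptions for illustration, not a documented API:

```typescript
// Client-side subscription to pushed upload status updates.
// The wss://api.example.com/uploads/{id}/events path and payload shape
// are illustrative assumptions.
function subscribeToUploadEvents(
  uploadId: string,
  onStatus: (status: UploadStatus) => void
): () => void {
  const ws = new WebSocket(`wss://api.example.com/uploads/${uploadId}/events`);

  ws.onmessage = (event) => {
    // Assume the server pushes the same UploadStatus shape the polling API returns.
    const status = JSON.parse(event.data) as UploadStatus;
    onStatus(status);
  };

  ws.onclose = () => {
    // In production: reconnect with backoff, or fall back to polling the status API.
  };

  // Return an unsubscribe function.
  return () => ws.close();
}
```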
The transition from upload to processing is orchestrated through message queues. This decoupling provides essential capabilities: backpressure handling when processing capacity is exceeded, priority scheduling for premium users or trending content, and retry semantics for transient failures.
```typescript
// Message structure for processing queue
interface VideoProcessingMessage {
  messageId: string;
  uploadId: string;
  videoId: string;

  // Source location
  source: {
    bucket: string;
    key: string;
    region: string;
    sizeBytes: number;
  };

  // Media information from validation
  mediaInfo: MediaInfo;

  // Processing configuration
  config: {
    priority: Priority;            // 'critical' | 'high' | 'normal' | 'low'
    targetFormats: VideoFormat[];  // Renditions to generate
    features: ProcessingFeature[]; // 'captions' | 'thumbnails' | 'content_analysis'
    deadline?: Date;               // Optional: must complete by this time
  };

  // Metadata for asset creation
  metadata: VideoMetadata;

  // Tracking
  enqueuedAt: Date;
  retryCount: number;
  maxRetries: number;
}

// Priority calculation
function calculatePriority(session: UploadSession): Priority {
  // Premium subscribers get higher priority
  if (session.user.subscriptionTier === 'premium') {
    return 'high';
  }

  // Channels with high subscriber count get priority
  if (session.channel.subscriberCount > 1_000_000) {
    return 'high';
  }

  // Scheduled premieres need to meet deadline
  if (session.metadata.scheduledAt) {
    const hoursUntil = (session.metadata.scheduledAt.getTime() - Date.now()) / (1000 * 60 * 60);
    if (hoursUntil < 2) return 'critical';
    if (hoursUntil < 6) return 'high';
  }

  // Short videos process faster anyway; deprioritize very long ones
  if (session.mediaInfo.duration > 3600) {
    return 'low';
  }

  return 'normal';
}

// Queue publishing
async function enqueueForProcessing(session: UploadSession): Promise<void> {
  const message: VideoProcessingMessage = {
    messageId: generateUUID(),
    uploadId: session.id,
    videoId: session.videoId,
    source: {
      bucket: session.bucket,
      key: session.objectKey,
      region: session.region,
      sizeBytes: session.fileSize,
    },
    mediaInfo: session.validationResult.mediaInfo,
    config: {
      priority: calculatePriority(session),
      targetFormats: determineTargetFormats(session),
      features: determineFeatures(session),
    },
    metadata: session.metadata,
    enqueuedAt: new Date(),
    retryCount: 0,
    maxRetries: 3,
  };

  // Publish to appropriate priority queue
  const queueName = `video-processing-${message.config.priority}`;
  await kafka.send({
    topic: queueName,
    messages: [{
      key: session.videoId,
      value: JSON.stringify(message),
    }],
  });

  // Update session status
  await updateSession(session.id, {
    status: 'PROCESSING',
    processingEnqueuedAt: new Date(),
  });

  // Emit metric
  metrics.increment('processing.enqueued', {
    priority: message.config.priority,
    region: session.region,
  });
}

// Determine which formats to generate based on source
function determineTargetFormats(session: UploadSession): VideoFormat[] {
  const sourceHeight = session.mediaInfo.videoStreams[0].height;
  const formats: VideoFormat[] = [];

  // Always generate lower resolutions
  formats.push({ height: 144, codec: 'h264', container: 'mp4' });
  formats.push({ height: 240, codec: 'h264', container: 'mp4' });
  formats.push({ height: 360, codec: 'h264', container: 'mp4' });
  formats.push({ height: 480, codec: 'h264', container: 'mp4' });

  if (sourceHeight >= 720) {
    formats.push({ height: 720, codec: 'h264', container: 'mp4' });
    formats.push({ height: 720, codec: 'vp9', container: 'webm' });
  }
  if (sourceHeight >= 1080) {
    formats.push({ height: 1080, codec: 'h264', container: 'mp4' });
    formats.push({ height: 1080, codec: 'vp9', container: 'webm' });
  }
  if (sourceHeight >= 1440) {
    formats.push({ height: 1440, codec: 'vp9', container: 'webm' });
    formats.push({ height: 1440, codec: 'av1', container: 'mp4' });
  }
  if (sourceHeight >= 2160) {
    formats.push({ height: 2160, codec: 'vp9', container: 'webm' });
    formats.push({ height: 2160, codec: 'av1', container: 'mp4' });
  }

  return formats;
}
```

| Priority | Queue | Consumer Count | Max Wait Time | Use Case |
|---|---|---|---|---|
| Critical | video-processing-critical | 50 | 5 minutes | Scheduled premieres, breaking news |
| High | video-processing-high | 100 | 15 minutes | Premium users, popular channels |
| Normal | video-processing-normal | 200 | 60 minutes | Standard uploads |
| Low | video-processing-low | 50 | 4 hours | Very long videos, off-peak processing |
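On the consuming side, each priority queue is drained by its own worker pool. The sketch below uses kafkajs-style APIs and honors the message's `retryCount`/`maxRetries` fields; the broker address, dead-letter topic name, and `transcodeVideo` handler are assumptions, not part of the design above.

```typescript
import { Kafka } from 'kafkajs';

// Worker sketch for one priority queue. Failed jobs are re-enqueued with an
// incremented retryCount and routed to a dead-letter topic once maxRetries is hit.
const kafka = new Kafka({ clientId: 'transcode-worker', brokers: ['kafka:9092'] }); // assumed broker
const consumer = kafka.consumer({ groupId: 'video-processing-normal-workers' });
const producer = kafka.producer();

async function runWorker(): Promise<void> {
  await Promise.all([consumer.connect(), producer.connect()]);
  await consumer.subscribe({ topic: 'video-processing-normal' });

  await consumer.run({
    eachMessage: async ({ topic, message }) => {
      const job = JSON.parse(message.value!.toString()) as VideoProcessingMessage;
      try {
        await transcodeVideo(job); // hand off to the transcoding pipeline (next page)
      } catch (err) {
        if (job.retryCount + 1 >= job.maxRetries) {
          // Exhausted: park the job for manual inspection.
          await producer.send({
            topic: 'video-processing-dead-letter',
            messages: [{ key: job.videoId, value: JSON.stringify({ ...job, error: String(err) }) }],
          });
        } else {
          // Transient failure: requeue with incremented retry count.
          await producer.send({
            topic,
            messages: [{ key: job.videoId, value: JSON.stringify({ ...job, retryCount: job.retryCount + 1 }) }],
          });
        }
      }
    },
  });
}
```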
We've designed a robust, scalable upload pipeline that handles the unique challenges of video ingestion at planetary scale. The key design decisions to consolidate: resumable, chunked uploads with adaptive chunk sizing; direct-to-storage transfers via pre-signed URLs that bypass application servers; fast, sampled validation before any transcoding work; and queue-based processing triggers with priority routing and retry semantics.
What's next:
With videos successfully uploaded and queued, we move to the heart of video processing: the Transcoding Architecture. The next page explores how to transform raw uploads into optimized formats suitable for playback across every device and network condition.
You now understand the architecture of a production-grade video upload pipeline. From resumable protocols to direct-to-storage patterns to queue-based processing triggers, these patterns form the foundation for reliable video ingestion at scale.