Video transcoding is the most compute-intensive operation in any video platform. It transforms raw uploads—captured in countless formats, codecs, and quality levels—into optimized versions suitable for every playback scenario: from 144p on a 2G mobile connection to 4K HDR on a fiber-connected smart TV.
At YouTube's scale, this means processing hundreds of hours of newly uploaded video every minute—on the order of 21 million CPU-hours of encoding compute per day.
This page explores the architecture that makes this possible: distributed job orchestration, encoding pipeline design, quality optimization strategies, and the economics of operating at planetary scale.
By the end of this page, you will understand how to design a distributed transcoding system that balances processing speed, output quality, and compute cost. You'll learn encoding ladder design, parallel job orchestration, quality metrics, and strategies for operating at scale.
Before diving into architecture, let's establish the fundamentals of video encoding and why it's computationally expensive.
```
// ================================================================
// WHY TRANSCODING IS COMPUTATIONALLY EXPENSIVE
// ================================================================

// Video encoding is essentially a massive optimization problem:
// "Find the minimum bits to represent this video with acceptable quality"

// For each frame, the encoder must:
// 1. Analyze motion between frames (motion estimation) - O(n²) per block
// 2. Decide block partitioning (quad-tree in HEVC) - exponential choices
// 3. Apply transform coding (DCT/wavelet) - O(n log n) per block
// 4. Optimize quantization decisions - rate-distortion optimization
// 5. Apply entropy coding (CABAC/CAVLC) - sequential, hard to parallelize

// Approximate CPU cycles per encoded frame at 1080p:
//
// Codec       | CPU Cycles/Frame | Encode Speed (software) | Compression
// ------------|------------------|-------------------------|--------------
// H.264       | ~50 billion      | 30-60 fps               | Baseline
// H.265/HEVC  | ~150 billion     | 5-15 fps                | 30-50% better
// VP9         | ~120 billion     | 8-20 fps                | 30-40% better
// AV1         | ~500 billion     | 0.5-2 fps               | 50-60% better

// At 30 fps, 1080p, that's:
// H.264: 50B × 30 = 1.5 trillion cycles/second
// AV1:   500B × 30 = 15 trillion cycles/second

// A high-end CPU might deliver 200 billion cycles/second
// → H.264 software encoding is roughly 7-10x slower than realtime
// → AV1 software encoding is roughly 75-100x slower than realtime

// This is why:
// 1. Hardware encoders (NVENC, QuickSync) are common for fast encoding
// 2. Distributed encoding is mandatory at scale
// 3. Newer codecs (AV1) require careful cost-benefit analysis
```

Every encoding decision is a tradeoff: slower encoding produces better quality at the same bitrate. At scale, this becomes an economic decision: spending more compute to achieve a 10% bitrate saving might save millions in CDN costs annually.
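The cycle arithmetic above can be sanity-checked in a few lines. The cycle counts and the 200-billion-cycle CPU figure are the rough numbers quoted in the text, not measurements:

```typescript
// Back-of-envelope: how much slower than realtime is software encoding?
// cyclesPerFrame values are the illustrative figures from the table above.
function realtimeSlowdown(
  cyclesPerFrame: number,
  sourceFps: number,
  cpuCyclesPerSec: number
): number {
  // cycles needed per second of video, divided by cycles available per second
  return (cyclesPerFrame * sourceFps) / cpuCyclesPerSec;
}

const CPU = 200e9; // ~200 billion cycles/second

const h264Slowdown = realtimeSlowdown(50e9, 30, CPU); // 7.5× slower than realtime
const av1Slowdown = realtimeSlowdown(500e9, 30, CPU); // 75× slower than realtime
```

The 10x gap between H.264 and AV1 here is exactly why AV1 renditions are usually reserved for content where CDN savings repay the encode cost.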
An encoding ladder defines the set of quality variants (resolution + bitrate combinations) generated for each video. The ladder determines what quality levels are available to viewers and directly impacts storage costs, CDN egress, and playback quality.
| Resolution | Bitrate (video) | Total Bitrate | Frame Rate | Profile | Use Case |
|---|---|---|---|---|---|
| 2160p (4K) | 20-40 Mbps | 22-42 Mbps | 30/60 fps | High | Smart TVs, desktop |
| 1440p (2K) | 10-16 Mbps | 11-17 Mbps | 30/60 fps | High | Large monitors |
| 1080p | 4-8 Mbps | 4.5-9 Mbps | 30/60 fps | High | Desktop, tablets |
| 720p | 2.5-5 Mbps | 2.8-5.5 Mbps | 30/60 fps | Main | Tablets, mobile (WiFi) |
| 480p | 1-2.5 Mbps | 1.2-2.8 Mbps | 30 fps | Main | Mobile (LTE) |
| 360p | 0.5-1 Mbps | 0.6-1.2 Mbps | 30 fps | Baseline | Mobile (3G) |
| 240p | 0.2-0.5 Mbps | 0.3-0.6 Mbps | 30 fps | Baseline | Slow connections |
| 144p | 0.08-0.2 Mbps | 0.1-0.3 Mbps | 30 fps | Baseline | 2G/poor connectivity |
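To get a feel for what this ladder implies for storage, here is a rough calculation using total-bitrate midpoints taken from the table above (all figures illustrative):

```typescript
// Total-bitrate midpoints from the ladder table, 2160p down to 144p, in kbps.
const totalKbps = [32000, 14000, 6750, 4150, 2000, 900, 450, 200];

// Convert a sustained bitrate into storage for one hour of video.
function gigabytesPerHour(kbps: number): number {
  return (kbps * 1000 * 3600) / 8 / 1e9; // bits/s × seconds → bytes → GB
}

const perRendition = totalKbps.map(gigabytesPerHour);
const totalGb = perRendition.reduce((a, b) => a + b, 0);
// One hour of source fans out to roughly 27 GB across all renditions,
// dominated by the 4K variant alone (~14 GB).
```

This is why "don't upscale" and per-title ladder trimming matter: the top one or two rungs account for most of the storage and egress bill.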
Modern Multi-Codec Strategy:
A mature platform serves multiple codecs to balance compatibility, quality, and bandwidth:
| Codec | Target Use Case | Browser/Device Support | Pros | Cons |
|---|---|---|---|---|
| H.264 | Universal fallback | All browsers, all devices | Maximum compatibility | Lowest compression |
| VP9 | Web primary | Chrome, Firefox, Edge, Android | 30-40% smaller than H.264, royalty-free | Slower encoding |
| AV1 | Premium quality | Chrome, Firefox, Android 10+ | 50%+ smaller than H.264, royalty-free | Very slow encoding |
| HEVC | Apple ecosystem | Safari, iOS, Apple TV | 30-50% smaller than H.264 | Licensing costs |
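A minimal sketch of how a backend might turn the support matrix above into a codec preference order. The capability flags and function name are hypothetical, not a real client API:

```typescript
type Codec = 'av1' | 'vp9' | 'hevc' | 'h264';

interface ClientCapabilities {
  supportsAv1: boolean;
  supportsVp9: boolean;
  supportsHevc: boolean;
}

// Best compression first; H.264 always last as the universal fallback.
function codecPreference(caps: ClientCapabilities): Codec[] {
  const order: Codec[] = [];
  if (caps.supportsAv1) order.push('av1');
  if (caps.supportsVp9) order.push('vp9');
  if (caps.supportsHevc) order.push('hevc');
  order.push('h264'); // every device in the table can decode H.264
  return order;
}

// A Safari-like client gets ['hevc', 'h264'];
// a modern Chrome-like client gets ['av1', 'vp9', 'h264'].
```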
```typescript
interface EncodingProfile {
  resolution: { width: number; height: number };
  videoBitrate: { min: number; max: number; target: number };
  audioBitrate: number;
  frameRate: { source: boolean; max: number };
  keyframeInterval: number; // seconds
  codec: 'h264' | 'h265' | 'vp9' | 'av1';
  profile: string;
  level?: string;
}

interface EncodingLadder {
  name: string;
  profiles: EncodingProfile[];
}

// Standard ladder for most content
const STANDARD_LADDER: EncodingLadder = {
  name: 'standard',
  profiles: [
    // 4K variants (if source supports)
    {
      resolution: { width: 3840, height: 2160 },
      videoBitrate: { min: 15000, max: 45000, target: 25000 },
      audioBitrate: 192,
      frameRate: { source: true, max: 60 },
      keyframeInterval: 2,
      codec: 'vp9',
      profile: 'profile0',
    },
    // 1080p variants
    {
      resolution: { width: 1920, height: 1080 },
      videoBitrate: { min: 3000, max: 8000, target: 5000 },
      audioBitrate: 128,
      frameRate: { source: true, max: 60 },
      keyframeInterval: 2,
      codec: 'h264',
      profile: 'high',
      level: '4.1',
    },
    {
      resolution: { width: 1920, height: 1080 },
      videoBitrate: { min: 2500, max: 6000, target: 4000 },
      audioBitrate: 128,
      frameRate: { source: true, max: 60 },
      keyframeInterval: 2,
      codec: 'vp9',
      profile: 'profile0',
    },
    // 720p variants
    {
      resolution: { width: 1280, height: 720 },
      videoBitrate: { min: 1500, max: 5000, target: 2500 },
      audioBitrate: 128,
      frameRate: { source: true, max: 60 },
      keyframeInterval: 2,
      codec: 'h264',
      profile: 'main',
    },
    // ... lower resolutions
  ],
};

// Per-title encoding: optimize ladder per video
function generateOptimizedLadder(
  sourceInfo: MediaInfo,
  contentAnalysis: ContentAnalysis
): EncodingLadder {
  const ladder: EncodingProfile[] = [];

  // Don't upscale: max resolution is source resolution
  const maxResolution = {
    width: sourceInfo.width,
    height: sourceInfo.height,
  };

  // Analyze content complexity for bitrate optimization
  const complexity = contentAnalysis.motionComplexity; // 0-1 scale
  const isAnimation = contentAnalysis.isAnimated;
  const hasFilmGrain = contentAnalysis.hasFilmGrain;

  for (const template of STANDARD_LADDER.profiles) {
    // Skip resolutions higher than source
    if (template.resolution.height > maxResolution.height) continue;

    // Adjust bitrate based on content characteristics
    let bitrateMultiplier = 1.0;

    if (complexity < 0.3) {
      // Low motion content (talking head, slides) needs less bitrate
      bitrateMultiplier = 0.7;
    } else if (complexity > 0.8) {
      // High motion content (sports, action) needs more bitrate
      bitrateMultiplier = 1.3;
    }

    if (isAnimation) {
      // Animation typically compresses better
      bitrateMultiplier *= 0.8;
    }

    if (hasFilmGrain) {
      // Film grain is hard to compress, increase bitrate
      bitrateMultiplier *= 1.2;
    }

    ladder.push({
      ...template,
      videoBitrate: {
        min: template.videoBitrate.min * bitrateMultiplier,
        max: template.videoBitrate.max * bitrateMultiplier,
        target: template.videoBitrate.target * bitrateMultiplier,
      },
    });
  }

  return { name: 'per-title', profiles: ladder };
}
```

Netflix pioneered "per-title encoding", where each video gets a custom encoding ladder based on its content complexity. A simple animated video needs lower bitrates than a grain-heavy action film at the same perceptual quality. This optimization can reduce bandwidth by 20-50% with no perceptible quality loss.
Processing millions of videos daily requires a distributed architecture that can parallelize work, handle failures gracefully, and scale elastically with demand.
```
┌─────────────────────────────────────────────────────────────────────┐
│                  DISTRIBUTED TRANSCODING PIPELINE                   │
└─────────────────────────────────────────────────────────────────────┘

╔═════════════════╗      ┌──────────────────────────────────────┐
║   Processing    ║      │        ORCHESTRATOR SERVICE          │
║  Request Queue  ╠═════▶│  • Job planning & splitting          │
╚═════════════════╝      │  • Segment scheduling                │
                         │  • Progress aggregation              │
                         │  • Failure handling                  │
                         └──────────────────┬───────────────────┘
                                            │ Create encoding jobs
                                            ▼
                         ┌──────────────────────────────────────┐
                         │            JOB DATABASE              │
                         │  • Video-level jobs                  │
                         │  • Segment-level tasks               │
                         │  • Rendition tracking                │
                         └──────────────────┬───────────────────┘
                                            │
                   ┌────────────────────────┼────────────────────────┐
                   ▼                        ▼                        ▼
           ┌─────────────┐          ┌─────────────┐          ┌─────────────┐
           │   Segment   │          │   Segment   │          │   Segment   │
           │  Queue #1   │          │  Queue #2   │          │  Queue #N   │
           │ (Priority)  │          │  (Normal)   │          │   (Batch)   │
           └──────┬──────┘          └──────┬──────┘          └──────┬──────┘
                  │                        │                        │
     ┌────────────┴───────┬────────────────┴─────┬─────────────────┴───────┐
     ▼            ▼       ▼                      ▼                         ▼
┌─────────┐  ┌─────────┐  ┌─────────┐       ┌─────────┐           ┌─────────┐
│ Worker  │  │ Worker  │  │ Worker  │       │ Worker  │           │ Worker  │
│ Node 1  │  │ Node 2  │..│ Node N  │  ...  │  GPU 1  │    ...    │  GPU M  │
│  (CPU)  │  │  (CPU)  │  │  (CPU)  │       │ (NVENC) │           │ (NVENC) │
└────┬────┘  └────┬────┘  └────┬────┘       └────┬────┘           └────┬────┘
     └────────────┴────────────┴─────────────────┴─────────────────────┘
                                            │
                                            ▼
                         ┌──────────────────────────────────────┐
                         │     SEGMENT STORAGE (Temporary)      │
                         │  • Encoded segments per rendition    │
                         │  • Intermediate files                │
                         └──────────────────┬───────────────────┘
                                            │ All segments complete
                                            ▼
                         ┌──────────────────────────────────────┐
                         │          PACKAGER SERVICE            │
                         │  • Concatenate segments              │
                         │  • Generate manifests (HLS, DASH)    │
                         │  • DRM encryption                    │
                         └──────────────────┬───────────────────┘
                                            │
                                            ▼
                         ┌──────────────────────────────────────┐
                         │       FINAL STORAGE (Permanent)      │
                         │  • Origin storage for CDN            │
                         │  • Multi-region replication          │
                         └──────────────────────────────────────┘
```

The key to fast transcoding is segment-based parallelism. Instead of encoding a 1-hour video sequentially (taking ~2 hours of CPU time), we split it into 100 segments of ~36 seconds each and encode them simultaneously across 100 workers—completing in ~2 minutes wall-clock time.
```typescript
interface SegmentTask {
  videoId: string;
  renditionId: string;
  segmentIndex: number;

  // Source specification
  source: {
    location: StorageLocation;
    startTime: number; // seconds
    duration: number; // seconds
    keyframeAligned: boolean;
  };

  // Encoding specification
  encoding: EncodingProfile;

  // Output specification
  output: {
    location: StorageLocation;
    format: 'ts' | 'fmp4' | 'webm';
  };
}

// Orchestrator: split video into segments
async function planSegmentedEncoding(
  videoId: string,
  sourceInfo: MediaInfo,
  ladder: EncodingLadder
): Promise<SegmentTask[]> {
  const tasks: SegmentTask[] = [];

  // Target segment duration (in seconds)
  const TARGET_SEGMENT_DURATION = 4; // 4-second segments common for HLS/DASH

  // Find keyframe positions for segment boundaries
  const keyframes = await extractKeyframePositions(sourceInfo.location);

  // Align segments to keyframes for seamless concatenation
  const segmentBoundaries = alignToKeyframes(
    keyframes,
    sourceInfo.duration,
    TARGET_SEGMENT_DURATION
  );

  // Create tasks for each segment × rendition combination
  for (const profile of ladder.profiles) {
    const renditionId = generateRenditionId(videoId, profile);

    for (let i = 0; i < segmentBoundaries.length - 1; i++) {
      const startTime = segmentBoundaries[i];
      const endTime = segmentBoundaries[i + 1];

      tasks.push({
        videoId,
        renditionId,
        segmentIndex: i,
        source: {
          location: sourceInfo.location,
          startTime,
          duration: endTime - startTime,
          keyframeAligned: true,
        },
        encoding: profile,
        output: {
          location: generateSegmentOutputPath(videoId, renditionId, i),
          format: 'fmp4', // Fragmented MP4 for DASH compatibility
        },
      });
    }
  }

  return tasks;
}

// Worker: encode a single segment
async function encodeSegment(task: SegmentTask): Promise<SegmentResult> {
  const startTime = Date.now();

  try {
    // 1. Download source segment (only the needed portion)
    const sourceSegment = await downloadRange(
      task.source.location,
      task.source.startTime,
      task.source.duration
    );

    // 2. Build FFmpeg command
    const ffmpegArgs = buildEncodingCommand(task, sourceSegment);

    // 3. Execute encoding
    const encodedData = await executeFFmpeg(ffmpegArgs);

    // 4. Upload encoded segment
    const uploadResult = await uploadSegment(task.output.location, encodedData);

    // 5. Calculate quality metrics
    const qualityMetrics = await calculateQualityMetrics(
      sourceSegment,
      encodedData,
      task.encoding
    );

    return {
      success: true,
      videoId: task.videoId,
      renditionId: task.renditionId,
      segmentIndex: task.segmentIndex,
      outputLocation: task.output.location,
      sizeBytes: encodedData.byteLength,
      encodingDuration: Date.now() - startTime,
      qualityMetrics,
    };
  } catch (error) {
    return {
      success: false,
      videoId: task.videoId,
      renditionId: task.renditionId,
      segmentIndex: task.segmentIndex,
      error: error.message,
    };
  }
}

function buildEncodingCommand(
  task: SegmentTask,
  inputPath: string
): string[] {
  const args: string[] = ['-y']; // Overwrite output

  // Input
  args.push('-ss', task.source.startTime.toString());
  args.push('-t', task.source.duration.toString());
  args.push('-i', inputPath);

  // Video encoding
  if (task.encoding.codec === 'h264') {
    args.push('-c:v', 'libx264');
    args.push('-preset', 'medium'); // Balance speed/quality
    args.push('-profile:v', task.encoding.profile);
    if (task.encoding.level) args.push('-level', task.encoding.level);
    args.push('-b:v', `${task.encoding.videoBitrate.target}k`);
    args.push('-maxrate', `${task.encoding.videoBitrate.max}k`);
    args.push('-bufsize', `${task.encoding.videoBitrate.max * 2}k`);
  } else if (task.encoding.codec === 'vp9') {
    args.push('-c:v', 'libvpx-vp9');
    args.push('-b:v', `${task.encoding.videoBitrate.target}k`);
    args.push('-minrate', `${task.encoding.videoBitrate.min}k`);
    args.push('-maxrate', `${task.encoding.videoBitrate.max}k`);
    args.push('-quality', 'good');
    args.push('-speed', '2'); // 0-8, lower is slower/better
  } else if (task.encoding.codec === 'av1') {
    args.push('-c:v', 'libaom-av1');
    args.push('-b:v', `${task.encoding.videoBitrate.target}k`);
    args.push('-cpu-used', '4'); // 0-8, higher is faster
    args.push('-row-mt', '1'); // Enable row-based multithreading
  }

  // Resolution scaling
  args.push('-vf', `scale=${task.encoding.resolution.width}:${task.encoding.resolution.height}`);

  // Keyframe interval (assumes 30 fps when following the source frame rate)
  const gop = task.encoding.keyframeInterval *
    (task.encoding.frameRate.source ? 30 : task.encoding.frameRate.max);
  args.push('-g', gop.toString());
  args.push('-keyint_min', gop.toString());

  // Audio encoding
  args.push('-c:a', 'aac');
  args.push('-b:a', `${task.encoding.audioBitrate}k`);

  // Output format
  if (task.output.format === 'fmp4') {
    args.push('-f', 'mp4');
    args.push('-movflags', 'frag_keyframe+empty_moov+default_base_moof');
  }

  return args;
}
```

| Video Duration | Sequential Time | 100 Workers | 1000 Workers |
|---|---|---|---|
| 10 minutes | ~20 minutes | ~15 seconds | ~5 seconds |
| 1 hour | ~2 hours | ~1.5 minutes | ~15 seconds |
| 4 hours | ~8 hours | ~6 minutes | ~1 minute |
| 12 hours | ~24 hours | ~18 minutes | ~3 minutes |
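The table's numbers follow from a simple model: segments are processed in "waves" of up to one segment per worker, plus a fixed overhead for download, upload, and stitching. This is a rough sketch of that model; the 2× encode slowdown and 10-second overhead are illustrative constants, not measured values:

```typescript
// Rough wall-clock model for segment-parallel encoding.
// encodeSlowdown: CPU-time-to-video-time ratio (~2× per the discussion above).
// overheadSec: fixed cost for download/upload/stitching (illustrative).
function wallClockSeconds(
  videoSeconds: number,
  segmentSeconds: number,
  workers: number,
  encodeSlowdown = 2,
  overheadSec = 10
): number {
  const segments = Math.ceil(videoSeconds / segmentSeconds);
  const waves = Math.ceil(segments / workers); // how many rounds of work
  return waves * segmentSeconds * encodeSlowdown + overheadSec;
}

// 1-hour video, 36 s segments, 100 workers: one wave → 36 × 2 + 10 = 82 s.
// With 1000 workers you also shrink segments (e.g. 4 s) to use them:
// 900 segments in one wave → 4 × 2 + 10 = 18 s.
```

Note the floor: wall-clock time can never drop below one segment's encode time plus overhead, which is why more workers only help together with finer segmentation.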
Segments must start and end on keyframes for seamless concatenation. If the source video has sparse keyframes (e.g., one every 10 seconds), segment granularity is limited. Some pipelines first re-encode the source with a regular keyframe interval as a normalization pass.
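The planning code above calls an `alignToKeyframes` helper. A minimal sketch of the idea—snap each ideal boundary to the nearest actual keyframe—might look like this (this is the core concept, not a production algorithm):

```typescript
// Snap ideal segment boundaries to actual keyframe timestamps.
// keyframes: sorted keyframe positions in seconds.
function alignToKeyframes(
  keyframes: number[],
  duration: number,
  targetSegmentSec: number
): number[] {
  const boundaries = [0];
  let t = targetSegmentSec;
  while (t < duration) {
    // Find the keyframe closest to the ideal boundary t
    let best = keyframes[0];
    for (const k of keyframes) {
      if (Math.abs(k - t) < Math.abs(best - t)) best = k;
    }
    // Only advance if we found a new, later boundary (sparse keyframes
    // may force two ideal boundaries onto the same keyframe)
    if (best > boundaries[boundaries.length - 1]) boundaries.push(best);
    t += targetSegmentSec;
  }
  boundaries.push(duration);
  return boundaries;
}

// Keyframes at 0, 4.1, 8.0, 12.2 s in a 14 s clip, 4 s target:
// boundaries become [0, 4.1, 8.0, 12.2, 14] — slightly uneven segments,
// but every cut lands on a keyframe.
```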
Worker nodes are the encoding workhorses. Their design significantly impacts processing throughput, cost efficiency, and reliability.
```typescript
// Worker process lifecycle
class EncodingWorker {
  private running = false;
  private currentTask: SegmentTask | null = null;

  async start(): Promise<void> {
    this.running = true;

    // Register with orchestrator
    await this.registerWorker();

    // Main work loop
    while (this.running) {
      try {
        // 1. Pull task from queue (with long-polling)
        const task = await this.pullTask({
          timeout: 30_000,
          visibilityTimeout: 300_000, // 5 minutes to complete
        });

        if (!task) continue;
        this.currentTask = task;

        // 2. Send heartbeats during processing
        const heartbeatInterval = setInterval(() => {
          this.sendHeartbeat(task.id);
        }, 30_000);

        try {
          // 3. Execute encoding
          const result = await encodeSegment(task);

          // 4. Report completion
          await this.reportCompletion(task.id, result);
        } finally {
          clearInterval(heartbeatInterval);
          this.currentTask = null;
        }
      } catch (error) {
        // Log and continue - individual task failures
        // shouldn't crash the worker
        this.reportError(error);
        await sleep(1000);
      }
    }
  }

  async pullTask(options: PullOptions): Promise<SegmentTask | null> {
    // Determine which queues this worker can process
    const queues = this.getEligibleQueues();

    // Fair-weighted polling across queues
    // (priority queues polled more frequently)
    for (const queue of queues) {
      const task = await queue.receive({
        maxMessages: 1,
        waitTime: options.timeout / queues.length,
        visibilityTimeout: options.visibilityTimeout,
      });

      if (task) {
        return task;
      }
    }

    return null;
  }

  private getEligibleQueues(): Queue[] {
    const queues: Queue[] = [];

    // All workers can handle normal priority
    queues.push(normalPriorityQueue);

    // Check if this worker has GPU
    if (this.hasGPU) {
      queues.push(gpuQueue);
    }

    // High-capability workers handle complex codecs
    if (this.cpuCores >= 64) {
      queues.push(av1Queue);
    }

    return queues;
  }

  // Graceful shutdown
  async stop(): Promise<void> {
    this.running = false;

    // Wait for current task to complete
    if (this.currentTask) {
      await this.waitForTaskCompletion(60_000); // 60s grace period
    }

    await this.deregisterWorker();
  }
}

// Auto-scaling configuration
interface ScalingConfig {
  minWorkers: number;
  maxWorkers: number;
  targetQueueDepth: number; // Target messages in queue
  scaleUpThreshold: number; // Queue depth to trigger scale-up
  scaleDownThreshold: number; // Queue depth to trigger scale-down
  scaleUpCooldown: number; // Seconds between scale-up events
  scaleDownCooldown: number; // Seconds between scale-down events
  scaleUpStep: number; // Workers to add per scale event
  scaleDownStep: number; // Workers to remove per scale event
}

const CPU_SCALING: ScalingConfig = {
  minWorkers: 100,
  maxWorkers: 5000,
  targetQueueDepth: 1000,
  scaleUpThreshold: 2000,
  scaleDownThreshold: 200,
  scaleUpCooldown: 60,
  scaleDownCooldown: 300,
  scaleUpStep: 50,
  scaleDownStep: 10,
};
```

Encoding workloads are perfect for spot/preemptible instances: stateless, interruptible, and idempotent. Using spot instances can reduce compute costs by 60-80%. Design workers for graceful interruption—checkpoint progress and release tasks back to the queue if preempted.
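One way the scaling configuration above could drive an evaluation tick is sketched below. Cooldown bookkeeping is omitted for brevity, and the decision rule (step up/down past thresholds, clamped to the min/max) is an assumption about how such a controller would work:

```typescript
interface ScalingConfig {
  minWorkers: number;
  maxWorkers: number;
  scaleUpThreshold: number;
  scaleDownThreshold: number;
  scaleUpStep: number;
  scaleDownStep: number;
}

// One evaluation tick: return the desired worker count for the
// observed queue depth, clamped to [minWorkers, maxWorkers].
function desiredWorkers(
  cfg: ScalingConfig,
  current: number,
  queueDepth: number
): number {
  if (queueDepth > cfg.scaleUpThreshold) {
    return Math.min(cfg.maxWorkers, current + cfg.scaleUpStep);
  }
  if (queueDepth < cfg.scaleDownThreshold) {
    return Math.max(cfg.minWorkers, current - cfg.scaleDownStep);
  }
  return current; // within the dead band: do nothing
}
```

The asymmetric steps (add 50, remove 10) and the dead band between thresholds are what prevent the fleet from oscillating as queue depth fluctuates.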
At scale, visual inspection of every encoded video is impossible. Automated quality metrics provide objective measurement of encoding quality and enable continuous optimization.
```typescript
interface QualityMetrics {
  vmaf: {
    mean: number; // Average VMAF score (0-100)
    min: number; // Minimum frame VMAF
    percentile5: number; // 5th percentile (catches brief quality dips)
    harmonic: number; // Harmonic mean (penalizes low outliers)
  };
  psnr: {
    mean: number;
    min: number;
  };
  ssim: {
    mean: number;
    min: number;
  };
  bitrate: {
    average: number; // kbps
    max: number;
    variance: number;
  };
  encoding: {
    speed: number; // Realtime multiplier (1.0 = realtime)
    cpuTime: number; // Total CPU seconds consumed
    peakMemory: number; // Peak memory usage in MB
  };
}

async function calculateQualityMetrics(
  source: Buffer,
  encoded: Buffer,
  profile: EncodingProfile
): Promise<QualityMetrics> {
  // Extract frames from both videos
  const sourceFrames = await extractFrames(source, { every: 1 }); // Every second
  const encodedFrames = await extractFrames(encoded, { every: 1 });

  // Ensure frame counts match
  if (sourceFrames.length !== encodedFrames.length) {
    throw new Error('Frame count mismatch');
  }

  // Calculate per-frame VMAF
  const vmafScores: number[] = [];
  for (let i = 0; i < sourceFrames.length; i++) {
    const score = await calculateVMAF(sourceFrames[i], encodedFrames[i]);
    vmafScores.push(score);
  }

  // Calculate per-frame PSNR and SSIM
  const psnrScores = await calculatePSNRBatch(sourceFrames, encodedFrames);
  const ssimScores = await calculateSSIMBatch(sourceFrames, encodedFrames);

  // Get bitrate information
  const bitrateInfo = await analyzeBitrate(encoded);

  return {
    vmaf: {
      mean: average(vmafScores),
      min: Math.min(...vmafScores),
      percentile5: percentile(vmafScores, 5),
      harmonic: harmonicMean(vmafScores),
    },
    psnr: {
      mean: average(psnrScores),
      min: Math.min(...psnrScores),
    },
    ssim: {
      mean: average(ssimScores),
      min: Math.min(...ssimScores),
    },
    bitrate: bitrateInfo,
    // encodingTime, encodingCpuTime, and peakMemoryMB are reported by the
    // encoder run (collection omitted here for brevity)
    encoding: {
      speed: encoded.duration / encodingTime,
      cpuTime: encodingCpuTime,
      peakMemory: peakMemoryMB,
    },
  };
}

// Quality thresholds for automated rejection
interface QualityThresholds {
  vmafMin: number; // Reject if mean VMAF below this
  vmafP5Min: number; // Reject if 5th percentile below this
  bitrateOvershoot: number; // Reject if bitrate exceeds target by this %
}

const QUALITY_THRESHOLDS: Record<string, QualityThresholds> = {
  'h264-1080p': { vmafMin: 80, vmafP5Min: 70, bitrateOvershoot: 50 },
  'vp9-1080p': { vmafMin: 82, vmafP5Min: 72, bitrateOvershoot: 40 },
  'av1-1080p': { vmafMin: 85, vmafP5Min: 75, bitrateOvershoot: 30 },
};

function validateQuality(
  metrics: QualityMetrics,
  profile: EncodingProfile
): QualityValidationResult {
  const key = `${profile.codec}-${profile.resolution.height}p`;
  const thresholds = QUALITY_THRESHOLDS[key] || QUALITY_THRESHOLDS['h264-1080p'];

  const issues: QualityIssue[] = [];

  if (metrics.vmaf.mean < thresholds.vmafMin) {
    issues.push({
      type: 'LOW_VMAF_MEAN',
      actual: metrics.vmaf.mean,
      threshold: thresholds.vmafMin,
      severity: 'error',
    });
  }

  if (metrics.vmaf.percentile5 < thresholds.vmafP5Min) {
    issues.push({
      type: 'LOW_VMAF_P5',
      actual: metrics.vmaf.percentile5,
      threshold: thresholds.vmafP5Min,
      severity: 'error',
    });
  }

  const targetBitrate = profile.videoBitrate.target;
  const overshootPct = ((metrics.bitrate.average - targetBitrate) / targetBitrate) * 100;
  if (overshootPct > thresholds.bitrateOvershoot) {
    issues.push({
      type: 'BITRATE_OVERSHOOT',
      actual: overshootPct,
      threshold: thresholds.bitrateOvershoot,
      severity: 'warning',
    });
  }

  return {
    passed: issues.filter(i => i.severity === 'error').length === 0,
    issues,
    metrics,
  };
}
```

Track quality metrics over time to detect encoder regressions, identify content types that encode poorly, and optimize encoding parameters. Dashboard quality by codec, resolution, and content category to surface optimization opportunities.
At 21 million CPU-hours per day, transcoding compute is a major cost center. Strategic optimizations can save millions annually while maintaining or improving quality.
| Strategy | Description | Potential Savings | Tradeoff |
|---|---|---|---|
| Spot instances | Use preemptible/spot instances for encoding | 60-80% | Requires interruption handling |
| Progressive encoding | Generate low-res first, high-res in background | Defers peak compute demand | Delayed high-quality availability |
| Per-title optimization | Custom bitrate ladder per video | 20-40% storage/CDN | Increased analysis complexity |
| Hardware encoding | Use GPU encoders for high-volume | 50% compute cost | 10-15% quality reduction |
| Smart scheduling | Batch non-urgent encoding to off-peak | 20-30% | Longer processing times |
| Content-aware skips | Skip unneeded renditions | Variable | Reduced coverage |
```typescript
// Progressive encoding: make video available faster, optimize later
interface EncodingPhase {
  name: string;
  priority: number;
  renditions: EncodingProfile[];
}

const PROGRESSIVE_PHASES: EncodingPhase[] = [
  {
    name: 'immediate',
    priority: 1,
    renditions: [
      // SD quality available within minutes
      { resolution: { width: 640, height: 360 }, codec: 'h264', ... },
      { resolution: { width: 854, height: 480 }, codec: 'h264', ... },
    ],
  },
  {
    name: 'standard',
    priority: 2,
    renditions: [
      // HD quality available within 15 minutes
      { resolution: { width: 1280, height: 720 }, codec: 'h264', ... },
      { resolution: { width: 1920, height: 1080 }, codec: 'h264', ... },
    ],
  },
  {
    name: 'premium',
    priority: 3,
    renditions: [
      // 4K and VP9/AV1 variants background processed
      { resolution: { width: 3840, height: 2160 }, codec: 'vp9', ... },
      { resolution: { width: 1920, height: 1080 }, codec: 'av1', ... },
    ],
  },
];

// Content-aware rendition selection
function selectRenditions(
  sourceInfo: MediaInfo,
  channelMetrics: ChannelMetrics,
  contentAnalysis: ContentAnalysis
): EncodingProfile[] {
  let renditions: EncodingProfile[] = [];
  const sourceHeight = sourceInfo.height;

  // Always include fallback renditions
  renditions.push(...FALLBACK_RENDITIONS);

  // Don't upscale: cap at source resolution
  const maxHeight = sourceHeight;

  // For small channels, skip 4K initially (can generate on-demand)
  if (channelMetrics.averageViews < 1000 && sourceHeight >= 2160) {
    // Skip 4K for now, mark for on-demand generation
  } else {
    renditions.push(...getRenditionsUpTo(maxHeight));
  }

  // For static/low-motion content, fewer bitrate variants needed
  if (contentAnalysis.motionComplexity < 0.3) {
    // Use single bitrate per resolution instead of CRF/VBR range
  }

  // Skip AV1 for short-lived content (news, livestream archives)
  // The encoding cost doesn't pay back in CDN savings
  if (contentAnalysis.expectedViewWindow < Duration.days(7)) {
    renditions = renditions.filter(r => r.codec !== 'av1');
  }

  return renditions;
}

// On-demand transcoding for long-tail content
async function transcodeOnDemand(
  videoId: string,
  requestedFormat: VideoFormat
): Promise<void> {
  // Check if format already exists
  const existing = await getRendition(videoId, requestedFormat);
  if (existing) return;

  // Check if on-demand transcoding is allowed for this content
  const video = await getVideo(videoId);
  if (!video.allowOnDemandTranscode) {
    throw new NotAvailableError('Format not available');
  }

  // Queue high-priority transcoding
  await enqueueTranscoding({
    videoId,
    renditions: [requestedFormat],
    priority: 'high',
    reason: 'on-demand',
  });

  // Signal 202 Accepted - format will be available soon
  throw new TranscodingInProgressError(estimateCompletionTime(video, requestedFormat));
}
```

Run a regular cost-benefit analysis: if encoding 4K AV1 costs $X but saves $Y per view in CDN costs, only generate AV1 when (Y × expected views) > X. For long-tail content with few views, simpler formats are more economical.
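The cost-benefit rule of thumb can be written down directly. All dollar figures and parameters below are illustrative assumptions, not real pricing:

```typescript
// Should we spend the AV1 encode? Only if expected CDN savings
// over the video's lifetime exceed the one-time encoding cost.
function shouldEncodeAv1(
  encodeCostUsd: number,
  gbPerView: number, // delivered GB per average view
  bitrateSavingFraction: number, // e.g. 0.3 = AV1 ships 30% fewer bytes
  cdnUsdPerGb: number,
  expectedViews: number
): boolean {
  const savingsPerView = gbPerView * bitrateSavingFraction * cdnUsdPerGb;
  return savingsPerView * expectedViews > encodeCostUsd;
}

// A long-tail video with 200 expected views doesn't repay a $5 encode,
// while a popular video with 100k expected views easily does.
```

Plugging in per-view savings of fractions of a cent makes the asymmetry obvious: the break-even point sits in the thousands of views, which is exactly why AV1 is gated on popularity in the rendition-selection logic above.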
We've designed a comprehensive transcoding architecture capable of processing millions of videos daily, built on a handful of key decisions: segment-parallel encoding aligned to keyframes, per-title multi-codec ladders, queue-based orchestration with autoscaled spot workers, automated VMAF quality gates, and cost-aware rendition selection.
What's next:
With videos transcoded into multiple formats, we need to deliver them efficiently to viewers. The next page covers Adaptive Bitrate Streaming—the protocols and techniques that enable seamless playback across varying network conditions.
You now understand the architecture of a production-grade transcoding pipeline. From encoding ladder design to distributed job orchestration to quality assurance, these patterns enable video processing at planetary scale.