While text messages form the foundation of messaging, media now dominates bandwidth. Users share billions of photos, videos, voice notes, and documents daily. WhatsApp processes an estimated 7 billion photos and 1 billion videos every day, roughly 90,000 media files per second.
Media handling introduces challenges that dwarf text messaging: a single 1-minute video can be 50MB, representing 500,000 times the data of a typical text message. Storing, transmitting, and delivering media efficiently while maintaining end-to-end encryption requires sophisticated architecture spanning upload pipelines, transcoding systems, CDN infrastructure, and intelligent caching.
This page explores the complete media handling architecture, from the moment a user selects a photo until it appears on the recipient's screen.
You will understand media upload pipelines with resumable uploads, storage strategies for petabytes of media, thumbnail and preview generation, CDN integration for global delivery, and how end-to-end encryption is maintained for all media types. These patterns apply to any media-heavy application.
Different media types have vastly different requirements for storage, processing, and delivery. Understanding these characteristics drives architectural decisions.
| Media Type | Typical Size | Processing Needed | Delivery Pattern |
|---|---|---|---|
| Photo (Original) | 2-5 MB | Compression, thumbnail generation | Full image on tap |
| Photo (Thumbnail) | 5-20 KB | Pre-generated | Immediate inline display |
| Video | 10-100 MB | Transcoding, multiple bitrates, thumbnails | Progressive/adaptive streaming |
| Voice Note | 0.1-2 MB | Compression (Opus codec) | Full download before play |
| Document (PDF) | 1-100 MB | Preview generation | Full download for viewing |
| GIF/Sticker | 0.1-2 MB | Palette optimization | Cached, immediate display |
| Location | < 1 KB | Map tile URL generation | Map API integration |
| Contact vCard | < 10 KB | None | Direct delivery in message |
Let's derive the storage and bandwidth requirements:
```
DAILY VOLUME ESTIMATES
══════════════════════
Photos:     7 billion/day   × 3 MB avg   = 21 PB/day
Videos:     1 billion/day   × 30 MB avg  = 30 PB/day
Voice:      3 billion/day   × 0.5 MB avg = 1.5 PB/day
Documents:  0.5 billion/day × 10 MB avg  = 5 PB/day

Total daily ingestion: ~57.5 PB/day
Annual growth: ~21 EB/year (exabytes!)

RETENTION ASSUMPTIONS
═════════════════════
• Media stored until explicitly deleted (or account deletion)
• Average media lifetime: ~2 years
• Total storage: ~40+ EB (with compression and deduplication)

BANDWIDTH REQUIREMENTS
══════════════════════
Upload:   57.5 PB/day = ~5.3 Tbps sustained
Download: Assuming each media viewed 2x on average
          115 PB/day  = ~10.6 Tbps sustained
Peak:     3x average  = ~48 Tbps

For comparison:
• Total global internet traffic: ~400 Tbps
• WhatsApp media alone: ~12% of global traffic (order of magnitude)
```

At $0.02/GB/month for cloud object storage, 40 EB costs ~$800 million/month. Efficient storage tiers (hot/warm/cold), compression, and dedicated infrastructure reduce this dramatically, but media storage remains a major cost center for messaging platforms.
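The arithmetic above is easy to reproduce. A minimal sketch of the same estimate, where the per-type averages and the $0.02/GB/month price are the assumptions stated above:

```typescript
// Back-of-envelope media volume estimate (all inputs are the assumptions above).
const GB = 1e9;   // decimal units, to match the PB/EB figures
const PB = 1e15;
const EB = 1e18;

const dailyIngestBytes =
  7e9 * 3e6 +     // photos: 7 B/day × 3 MB
  1e9 * 30e6 +    // videos: 1 B/day × 30 MB
  3e9 * 0.5e6 +   // voice:  3 B/day × 0.5 MB
  0.5e9 * 10e6;   // docs:   0.5 B/day × 10 MB

const uploadTbps = (dailyIngestBytes * 8) / 86_400 / 1e12;    // sustained upload bandwidth
const totalStorageBytes = 40 * EB;                             // assumed steady-state footprint
const monthlyCostUsd = (totalStorageBytes / GB) * 0.02;        // at $0.02/GB/month

console.log((dailyIngestBytes / PB).toFixed(1), 'PB/day');     // ≈ 57.5
console.log(uploadTbps.toFixed(1), 'Tbps upload');             // ≈ 5.3
console.log((monthlyCostUsd / 1e6).toFixed(0), 'M USD/month'); // ≈ 800
```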
Uploading a 50MB video over a mobile network is fraught with failure risks. The upload architecture must handle unreliable networks gracefully.
Rather than uploading files as a single payload, resumable uploads break files into chunks:
```
PROTOCOL FLOW
═════════════

1. INITIATE UPLOAD
   ────────────────
   Client → Server:
   {
     filename: "video.mp4",
     filesize: 52_428_800,          // 50 MB
     mime_type: "video/mp4",
     checksum: "sha256:abc123...",
     chunk_size: 1_048_576          // 1 MB chunks
   }

   Server → Client:
   {
     upload_id: "upload_xyz123",
     upload_url: "https://upload.example.com/upload_xyz123",
     expires_at: 1704672000         // Upload session expires in 1 hour
   }

2. UPLOAD CHUNKS
   ──────────────
   For each 1 MB chunk:

   Client → Server:
   PUT /upload_xyz123
   Content-Range: bytes 0-1048575/52428800
   Body: [chunk 1 data]

   Server → Client:
   HTTP 308 Resume Incomplete
   Range: bytes=0-1048575           // Confirming received

   [Continue for all 50 chunks...]

3. FINALIZE UPLOAD
   ────────────────
   After last chunk:

   Server → Client:
   HTTP 200 OK
   {
     media_id: "media_abc123",
     url: "https://cdn.example.com/media/abc123",
     size: 52_428_800,
     checksum_verified: true
   }

4. RESUME AFTER FAILURE
   ─────────────────────
   If connection lost after chunk 25:

   Client → Server:
   PUT /upload_xyz123
   Content-Range: bytes */52428800  // Asking what's been received

   Server → Client:
   HTTP 308 Resume Incomplete
   Range: bytes=0-26214399          // Got chunks 0-24 (25 MB)

   Client resumes from chunk 25, not from beginning!
```
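A minimal client-side sketch of this flow. The Content-Range / 308 convention mirrors the protocol above; the function names and error handling are illustrative, not a specific library's API:

```typescript
const CHUNK_SIZE = 1_048_576; // 1 MB, matching the protocol above

// Ask the server how many bytes it already has (step 4: resume after failure).
async function getCommittedOffset(uploadUrl: string, totalSize: number): Promise<number> {
  const res = await fetch(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Range': `bytes */${totalSize}` },
  });
  if (res.status === 308) {
    const range = res.headers.get('Range');       // e.g. "bytes=0-26214399"
    return range ? Number(range.split('-')[1]) + 1 : 0;
  }
  return res.ok ? totalSize : 0;                  // 200 → upload already complete
}

// Upload (or resume) a file chunk by chunk.
async function resumableUpload(uploadUrl: string, file: Blob): Promise<void> {
  let offset = await getCommittedOffset(uploadUrl, file.size);

  while (offset < file.size) {
    const end = Math.min(offset + CHUNK_SIZE, file.size);
    const res = await fetch(uploadUrl, {
      method: 'PUT',
      headers: { 'Content-Range': `bytes ${offset}-${end - 1}/${file.size}` },
      body: file.slice(offset, end),
    });
    if (res.status !== 308 && !res.ok) {
      throw new Error(`chunk upload failed: ${res.status}`); // caller retries later, resuming at offset
    }
    offset = end;
  }
}
```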
```
┌─────────────────────────────────────────────────────────────────────────┐
│                          MEDIA UPLOAD PIPELINE                           │
└─────────────────────────────────────────────────────────────────────────┘

            ┌──────────────────┐
 Client ───►│  Upload Gateway  │  • Handles chunked uploads
            │  (Edge Server)   │  • Stores chunks temporarily in local SSD
            └────────┬─────────┘  • Validates checksums per-chunk
                     │
                     │ On complete upload:
                     ▼
            ┌──────────────────┐
            │ Chunk Assembler  │  • Assembles chunks into complete file
            │                  │  • Verifies final checksum
            └────────┬─────────┘  • Uploads to object storage
                     │
        ┌────────────┼────────────┐
        │            │            │
        ▼            ▼            ▼
 ┌────────────┐ ┌────────────┐ ┌────────────┐
 │ Thumbnail  │ │ Transcoding│ │ Encryption │
 │ Generator  │ │  Pipeline  │ │  Service   │
 └──────┬─────┘ └──────┬─────┘ └──────┬─────┘
        │              │              │
        └──────────────┴──────────────┘
                       │
                       ▼
            ┌──────────────────┐
            │  Object Storage  │  • S3-compatible storage
            │   (Encrypted)    │  • Multiple regions
            └────────┬─────────┘  • Lifecycle policies
                     │
                     ▼
            ┌──────────────────┐
            │    CDN Origin    │  • Edge caching
            │                  │  • Global distribution
            └──────────────────┘
```

Modern apps compress media client-side before upload. A 5MB photo can compress to 200KB with acceptable quality loss. Video is transcoded to H.264/H.265 at lower bitrates. This reduces upload time and storage costs by 10-20x, making the user experience much better on slow networks.
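A browser-side sketch of that pre-upload compression step, using standard canvas APIs. The 1600px cap and 0.7 JPEG quality are illustrative choices, not values from the source:

```typescript
// Downscale and re-encode an image before upload (browser only).
async function compressImage(
  file: Blob,
  maxDimension = 1600,
  quality = 0.7,
): Promise<Blob> {
  const bitmap = await createImageBitmap(file);
  const scale = Math.min(1, maxDimension / Math.max(bitmap.width, bitmap.height));

  const canvas = new OffscreenCanvas(
    Math.round(bitmap.width * scale),
    Math.round(bitmap.height * scale),
  );
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('2D context unavailable');
  ctx.drawImage(bitmap, 0, 0, canvas.width, canvas.height);

  // Re-encode as JPEG; quality trades size for fidelity.
  const compressed = await canvas.convertToBlob({ type: 'image/jpeg', quality });

  // Fall back to the original if compression did not help (e.g., small PNGs).
  return compressed.size < file.size ? compressed : file;
}
```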
Storing exabytes of media requires a carefully designed storage architecture with multiple tiers, geographic distribution, and efficient deduplication.
Not all media is accessed equally. A tiered storage approach optimizes cost and performance:
| Tier | Access Pattern | Storage Type | Cost/GB/Mo |
|---|---|---|---|
| Hot (0-7 days) | Frequent access (viewing, forwarding) | SSD/NVMe, CDN edge cache | ~$0.10 |
| Warm (7-90 days) | Occasional access | Standard object storage (S3, GCS) | ~$0.02 |
| Cold (90-365 days) | Rare access (search, export) | Infrequent access storage (S3-IA) | ~$0.01 |
| Archive (1+ year) | Very rare (legal holds, recovery) | Glacier, Archive storage | ~$0.004 |
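A sketch of the tiering decision this table implies, driven by age and last access. The thresholds mirror the table; the type and function names are illustrative:

```typescript
type StorageTier = 'HOT' | 'WARM' | 'COLD' | 'ARCHIVE';

// Decide which tier a media object belongs in, per the table above.
function chooseTier(uploadedAt: Date, lastAccessedAt: Date, now = new Date()): StorageTier {
  const DAY = 86_400_000;
  const ageDays = (now.getTime() - uploadedAt.getTime()) / DAY;
  const idleDays = (now.getTime() - lastAccessedAt.getTime()) / DAY;

  if (ageDays <= 7) return 'HOT';
  // Recently re-accessed media stays hot even if old (forwards, re-shares).
  if (idleDays <= 7) return 'HOT';
  if (ageDays <= 90) return 'WARM';
  if (ageDays <= 365) return 'COLD';
  return 'ARCHIVE';
}
```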
Each media item requires metadata for retrieval, processing status, and access control:
```typescript
interface MediaObject {
  // Identification
  id: string;                    // Globally unique media ID
  uploaderId: string;            // Who uploaded this media
  uploadedAt: Date;              // When it was uploaded

  // Content properties
  mimeType: string;              // "image/jpeg", "video/mp4", etc.
  originalFilename: string;      // User's original filename (encrypted)
  sizeBytes: number;             // Original size
  duration?: number;             // For video/audio, in seconds
  dimensions?: {                 // For images/videos
    width: number;
    height: number;
  };

  // Encryption (E2EE media)
  encryptionKey: string;         // Media key, encrypted with message key
  encryptionIv: string;          // Initialization vector
  keyHash: string;               // Hash for verification

  // Storage references
  storageKey: string;            // Object storage key (e.g., S3 key)
  storageBucket: string;         // Which bucket
  storageRegion: string;         // Primary region
  replicatedRegions: string[];   // Where replicated
  storageTier: StorageTier;      // HOT | WARM | COLD | ARCHIVE

  // Derived content
  thumbnails: {
    small: ThumbnailRef;         // 150px max dimension
    medium: ThumbnailRef;        // 300px
    large: ThumbnailRef;         // 800px (for preview)
  };
  transcodes?: {                 // For video
    quality_240p: TranscodeRef;
    quality_480p: TranscodeRef;
    quality_720p: TranscodeRef;
    quality_1080p: TranscodeRef;
  };

  // Lifecycle
  lastAccessedAt: Date;          // For tiering decisions
  expiresAt?: Date;              // For disappearing messages
  deletedAt?: Date;              // Soft delete
}

interface ThumbnailRef {
  storageKey: string;
  sizeBytes: number;
  dimensions: { width: number; height: number };
  encryptionKey: string;         // Thumbnails are also E2EE
}
```

Many users share the same memes, news clips, and viral content. Deduplication can save significant storage:
Content-based hashing: Before encryption, compute a content hash. If identical content exists, reference the existing blob.
Challenge with E2EE: Each sender encrypts with different keys, so the ciphertext differs even for identical plaintext. Deduplication must happen before encryption.
Privacy consideration: Deduplication leaks information ("this content has been shared before"). Most E2EE systems skip deduplication to avoid this metadata leak, accepting the storage cost.
Practical approach: Deduplicate only for media sent by the same user (forwarding their own content uses same encrypted blob).
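A sketch of that same-sender dedup check, performed before encryption. Node-style, using `node:crypto`; the `DedupIndex` interface is a hypothetical store keyed by uploader plus content hash:

```typescript
import { createHash } from 'node:crypto';

interface DedupIndex {
  get(key: string): Promise<string | undefined>;   // existing media_id, if any
  put(key: string, mediaId: string): Promise<void>;
}

// Returns an existing media_id when the same user re-sends identical content,
// so the already-encrypted blob can be referenced instead of re-uploaded.
async function findOrRegister(
  uploaderId: string,
  plaintext: Buffer,
  newMediaId: string,
  index: DedupIndex,
): Promise<{ mediaId: string; deduplicated: boolean }> {
  // Hash the *plaintext* — after encryption, identical content diverges.
  const contentHash = createHash('sha256').update(plaintext).digest('hex');
  const key = `${uploaderId}:${contentHash}`;      // scoped per user to avoid cross-user metadata leaks

  const existing = await index.get(key);
  if (existing) return { mediaId: existing, deduplicated: true };

  await index.put(key, newMediaId);
  return { mediaId: newMediaId, deduplicated: false };
}
```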
Media is typically replicated to 2-3 regions for durability and latency. A user in Brazil receives media faster from São Paulo than from US-East. But full replication of 40 EB is expensive. Strategies: replicate only hot tier globally, store cold/archive in primary region only.
Users shouldn't wait for a 50MB video to download just to see what it contains. Thumbnails and previews provide instant visual feedback while the full content loads.
```
IMAGE THUMBNAILS
════════════════

┌─────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  Original   │─────►│  Image Decoder  │─────►│    Resize +     │
│   (3 MB)    │      │    (libvips)    │      │    Compress     │
└─────────────┘      └─────────────────┘      └────────┬────────┘
                                                       │
                      ┌────────────────────────────────┴──────┐
                      │                                        │
                      ▼                                        ▼
               ┌─────────────┐                          ┌─────────────┐
               │   150x150   │                          │   800x800   │
               │  (5-10 KB)  │                          │ (50-100 KB) │
               │  Tiny thumb │                          │   Preview   │
               └─────────────┘                          └─────────────┘

VIDEO THUMBNAILS
════════════════

┌─────────────┐      ┌─────────────────┐
│    Video    │─────►│ FFmpeg extract  │──────► 3 key frames at 10%, 50%, 90%
│   (50 MB)   │      │   key frames    │                      │
└─────────────┘      └─────────────────┘                      ▼
                                         ┌───────────────────────────────┐
                                         │    Animated Preview (GIF)     │
                                         │    or first frame as JPEG     │
                                         │    Size: 50-200 KB            │
                                         └───────────────────────────────┘

VOICE NOTES
═══════════

┌─────────────┐      ┌─────────────────┐      ┌─────────────────┐
│    Audio    │─────►│    Waveform     │─────►│   Serialized    │
│  (500 KB)   │      │   Extraction    │      │  Visualization  │
└─────────────┘      └─────────────────┘      │  Data (1-2 KB)  │
                                              └─────────────────┘

Client renders waveform from data, avoiding image file overhead.
```

Even thumbnails take time to download. BlurHash provides instant visual placeholders that render from ~30 bytes of data:
```typescript
// BlurHash encodes an image into a short string (~30 characters)
// that can be decoded into a blurry placeholder image on the client.

// Server-side: Generate BlurHash during upload
import { encode } from 'blurhash';

// Assumed helper: decodes an image buffer into raw RGBA pixels (e.g., via sharp/libvips).
declare function decodeImage(
  buffer: Buffer,
): Promise<{ width: number; height: number; data: Uint8ClampedArray }>;

async function generateBlurHash(imageBuffer: Buffer): Promise<string> {
  const { width, height, data } = await decodeImage(imageBuffer);

  // Components determine detail level
  // 4x3 = 12 components, good balance of size vs detail
  const hash = encode(data, width, height, 4, 3);
  return hash; // Example: "LEHV6nWB2yk8pyo0adR*.7kCMdnj"
}

// Client-side: Render BlurHash while real image loads
import { decode } from 'blurhash';

function renderPlaceholder(hash: string, width: number, height: number): ImageData {
  const pixels = decode(hash, width, height);
  // Returns Uint8ClampedArray of RGBA pixel data
  // Can be drawn directly to canvas
  return new ImageData(pixels, width, height);
}

// Message payload includes BlurHash:
interface MediaMessage {
  messageId: string;
  mediaId: string;
  blurHash: string;      // ~30 bytes, in message payload
  thumbnailUrl: string;  // ~10 KB, fetched separately
  fullUrl: string;       // ~3 MB, loaded on tap
  dimensions: { width: number; height: number };
}

// Rendering sequence:
// 1. Immediately:  Render BlurHash (instant, from message payload)
// 2. ~100-500ms:   Load thumbnail, replace BlurHash
// 3. On tap:       Load full image, replace thumbnail
```

The pattern of BlurHash → Thumbnail → Full image provides excellent perceived performance. Users see 'something' instantly, and details emerge progressively. This three-stage loading is now standard in image-heavy applications: Instagram, Pinterest, and WhatsApp all use variants of this approach.
Videos come in countless formats, codecs, and resolutions. A transcoding pipeline normalizes these into consistent, optimized formats for delivery.
Format normalization: iPhones produce HEVC (H.265), some cameras produce ProRes, screen recordings may be VP9. Recipients may not support all codecs.
Adaptive bitrate: Different network conditions require different quality levels. Transcoding creates multiple quality versions.
Size reduction: Uploaded 4K 100Mbps video transcodes to 720p H.264 at 2Mbps—a 50x size reduction with acceptable quality.
Fast start: Reorder video atoms (moov atom at start) for progressive playback without full download.
```
VIDEO TRANSCODING PIPELINE
═══════════════════════════

┌─────────────┐
│  Uploaded   │
│   Video     │───────────────────────────────────────────────────┐
│  (100 MB)   │                                                   │
└──────┬──────┘                                                   │
       │                                                          │
       ▼                                                          │
┌─────────────────┐                                               │
│ Input Analysis  │  FFprobe: codec, resolution, duration, bitrate│
└────────┬────────┘                                               │
         │                                                        │
         ▼                                                        │
┌─────────────────┐                                               │
│ Transcode Jobs  │  Create jobs for each output quality          │
│   Scheduler     │                                               │
└────────┬────────┘                                               │
         │                                                        │
    ┌────┴────┬─────────┬─────────┐                               │
    │         │         │         │                               │
    ▼         ▼         ▼         ▼                               ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐               ┌───────────┐
│ 240p  │ │ 480p  │ │ 720p  │ │ 1080p │  Parallel     │ Thumbnail │
│150kbps│ │600kbps│ │1.5Mbps│ │ 3Mbps │  Workers      │ Extractor │
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘  (FFmpeg)     └─────┬─────┘
    │         │         │         │                         │
    └─────────┴─────────┴─────────┴─────────────────────────┘
                        │
                        ▼
               ┌─────────────────┐
               │ Output Storage  │  Each quality level
               │  (Encrypted)    │  stored separately
               └─────────────────┘

FFMPEG EXAMPLE COMMAND:
═══════════════════════
ffmpeg -i input.mp4 \
  -c:v libx264 -preset fast \
  -vf scale=-2:720 \
  -b:v 1500k -maxrate 2000k \
  -bufsize 3000k \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  output_720p.mp4

Output sizes for 1 minute video:
• 240p:  ~1 MB
• 480p:  ~4 MB
• 720p:  ~10 MB
• 1080p: ~20 MB
• Original stored: 100 MB
```

Transcoding is CPU-intensive. At 1 billion videos/day, brute-force transcoding is infeasible:
Parallelization: Split video into segments (e.g., 10-second chunks), transcode in parallel across machines, concatenate.
Priority queuing: Recent uploads get priority. Older media in the queue can wait. Short videos (<30s) often get synchronous transcoding for immediate availability.
Hardware acceleration: GPU-based encoding (NVENC, Quick Sync) is 5-10x faster than CPU for equivalent quality.
Tiered transcoding: Create 240p/480p first (fast, small), delay 1080p for later. Most mobile views use lower quality anyway.
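A sketch of how priority queuing and tiered transcoding might combine when a new upload is fanned out into jobs. The queue fields and thresholds are illustrative:

```typescript
type Quality = '240p' | '480p' | '720p' | '1080p';

interface TranscodeJob {
  mediaId: string;
  quality: Quality;
  priority: number;      // lower value = scheduled sooner
  synchronous: boolean;  // block availability until this rendition exists?
}

// Fan a new upload out into transcode jobs: cheap renditions first, HD deferred.
function planTranscodes(mediaId: string, durationSec: number): TranscodeJob[] {
  const shortClip = durationSec < 30;
  return [
    // Low qualities: fast to produce, cover most mobile playback.
    { mediaId, quality: '240p', priority: 0, synchronous: shortClip },
    { mediaId, quality: '480p', priority: 0, synchronous: shortClip },
    // HD renditions: produced later by background workers.
    { mediaId, quality: '720p', priority: 5, synchronous: false },
    { mediaId, quality: '1080p', priority: 9, synchronous: false },
  ];
}
```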
WhatsApp encourages client-side compression before upload. The client transcodes video to H.264, caps resolution, and uses efficient bitrates. This shifts transcoding work to billions of devices, drastically reducing server-side transcoding needs. Most videos are uploaded already optimized.
In an E2EE system, media must be encrypted on the sender's device before upload. The server stores only encrypted blobs it cannot decrypt.
```
SENDER'S DEVICE: UPLOAD
═══════════════════════

1. Generate random media encryption key:
   media_key = random(32 bytes)
   media_iv  = random(16 bytes)

2. Encrypt media locally:
   encrypted_media     = AES-256-CBC(media_key, media_iv, plaintext_media)
   encrypted_thumbnail = AES-256-CBC(media_key, thumbnail_iv, plaintext_thumbnail)

3. Compute integrity hash:
   file_hash = SHA256(encrypted_media)

4. Upload encrypted blobs:
   POST /upload/media
   Body: encrypted_media
   Response: { media_url: "https://cdn.example.com/encrypted/abc123" }

   POST /upload/thumbnail
   Body: encrypted_thumbnail
   Response: { thumb_url: "https://cdn.example.com/encrypted/thumb456" }

5. Include key in message (encrypted with recipient's chat key):
   message_payload = {
     type: "image",
     media_url: "https://cdn.example.com/encrypted/abc123",
     thumb_url: "https://cdn.example.com/encrypted/thumb456",
     media_key: base64(media_key),        // Encrypted in message
     media_iv: base64(media_iv),
     thumbnail_iv: base64(thumbnail_iv),
     file_size: 3145728,
     file_hash: "sha256:abc123...",
     mime_type: "image/jpeg"
   }

   E2EE_message = encrypt_with_signal_protocol(message_payload)

RECIPIENT'S DEVICE: DOWNLOAD
════════════════════════════

1. Receive and decrypt message using Signal Protocol:
   message_payload = decrypt_with_signal_protocol(E2EE_message)

2. Extract media key and URLs from decrypted payload

3. Download encrypted media from CDN:
   GET https://cdn.example.com/encrypted/abc123
   Response: encrypted_media (server cannot decrypt)

4. Decrypt locally:
   plaintext_media = AES_decrypt(media_key, media_iv, encrypted_media)

5. Verify integrity:
   assert SHA256(encrypted_media) == file_hash

6. Display decrypted media to user
```

The encrypted blob is stored on servers, but the decryption key travels only through the E2EE message channel:
Even if an attacker hacks the media storage and steals all encrypted blobs, they're useless without the keys—which only exist on sender and recipient devices.
| Aspect | What Server Sees | Actual Content |
|---|---|---|
| Photo | 3MB encrypted blob | Birthday party photo |
| Video | 50MB encrypted blob | Child's piano recital |
| Document | 10MB encrypted blob | Tax returns PDF |
| Filename | Nothing (encrypted) | vacation_2024.jpg |
| Thumbnail | Encrypted blob | Blurry preview image |
While content is encrypted, the server sees: upload timestamp, file size, MIME type (if included), IP address of uploader, who the message was sent to. This metadata can be revealing. Some protocols encrypt even metadata, but WhatsApp's implementation protects content while exposing metadata.
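A minimal Node-style sketch of sender-side steps 1–3 above: random per-media key, AES-256-CBC encryption, then SHA-256 over the ciphertext. Production schemes (including WhatsApp's attachment encryption) also derive separate cipher/MAC keys and append an HMAC, which this sketch omits:

```typescript
import { createCipheriv, createHash, randomBytes } from 'node:crypto';

interface EncryptedMedia {
  ciphertext: Buffer;  // what gets uploaded and cached by the CDN
  mediaKey: Buffer;    // travels only inside the E2EE message payload
  iv: Buffer;
  fileHash: string;    // SHA-256 of the ciphertext, for integrity checks
}

function encryptMediaForUpload(plaintext: Buffer): EncryptedMedia {
  const mediaKey = randomBytes(32);  // step 1: random 256-bit key
  const iv = randomBytes(16);

  // Step 2: encrypt locally; the server only ever sees this ciphertext.
  const cipher = createCipheriv('aes-256-cbc', mediaKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);

  // Step 3: integrity hash that the recipient re-checks after download.
  const fileHash = createHash('sha256').update(ciphertext).digest('hex');

  return { ciphertext, mediaKey, iv, fileHash };
}
```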
Delivering petabytes of media daily requires a global Content Delivery Network (CDN). CDNs cache content at edge locations close to users, reducing latency and origin server load.
```
CDN ARCHITECTURE
════════════════

        ┌─────────────────┐
        │  User in Tokyo  │
        └────────┬────────┘
                 │
                 │ 1. Request media
                 ▼
        ┌─────────────────┐
        │    Tokyo PoP    │◄───── Cache HIT? Return immediately
        │  (Edge Server)  │       (~10ms latency)
        └────────┬────────┘
                 │
                 │ Cache MISS
                 ▼
        ┌─────────────────┐
        │ Regional Cache  │◄───── Regional cache HIT?
        │   (Singapore)   │       (~50ms latency)
        └────────┬────────┘
                 │
                 │ Still MISS (rare for popular content)
                 ▼
        ┌─────────────────┐
        │  Origin Shield  │◄───── Coalesce requests to origin
        │                 │       (prevent thundering herd)
        └────────┬────────┘
                 │
                 ▼
        ┌─────────────────┐
        │   Origin (S3)   │  Media storage
        │     US-East     │  (~200ms from Tokyo)
        └─────────────────┘

CDN CONFIGURATION FOR MESSAGING:
════════════════════════════════

1. CACHE POLICY
   ─────────────
   • Media is immutable (content never changes at same URL)
   • Cache-Control: public, max-age=31536000 (1 year)
   • E2EE media: content is encrypted, safe to cache anywhere
   • Unique URLs per media item prevent stale content

2. ACCESS CONTROL
   ───────────────
   • Signed URLs: Token-based access expiring after N hours
   • Example: https://cdn.example.com/media/abc?token=xyz&expires=1704672000
   • Prevents unauthorized access even though content is encrypted
   • Rate limiting at edge to prevent abuse

3. EDGE FEATURES
   ──────────────
   • TLS termination at edge (reduces latency)
   • Brotli/gzip compression for compressible formats
   • HTTP/2 for multiplexed downloads
   • Range request support for video seeking
```

Hit rate is everything: CDN costs scale with origin requests. High cache hit rates (>95%) dramatically reduce costs.
Challenge with E2EE: Each message to different recipients uses different encryption keys, creating different ciphertext. The same photo sent to 10 people = 10 different encrypted blobs = 10 cache misses.
WhatsApp's approach: For group messages and forwards, the same encrypted blob can be reused if the message references the same media. But 1:1 messages to different people use different keys.
Long-tail problem: Rarely accessed media (old photos from years ago) may never be in cache. Accept higher origin load for cold content.
At petabyte scale, CDN egress costs dominate. Strategies: negotiate volume discounts, use multiple CDNs and route by price/performance, implement tiered caching, use efficient codecs (AVIF for images, HEVC for video) to reduce file sizes, and leverage CDN's free tier for thumbnail delivery.
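A sketch of the signed-URL access control described in the CDN configuration above, using an HMAC over the storage key and expiry. The parameter names mirror the example URL; the secret handling and hostname are illustrative:

```typescript
import { createHmac } from 'node:crypto';

// Issue a time-limited URL for an encrypted media object.
function signMediaUrl(storageKey: string, secret: string, ttlSeconds = 3600): string {
  const expires = Math.floor(Date.now() / 1000) + ttlSeconds;
  const token = createHmac('sha256', secret)
    .update(`${storageKey}:${expires}`)
    .digest('hex');
  return `https://cdn.example.com/media/${storageKey}?expires=${expires}&token=${token}`;
}

// Edge-side check: recompute the HMAC and reject expired or forged tokens.
function verifyMediaUrl(storageKey: string, expires: number, token: string, secret: string): boolean {
  if (expires < Math.floor(Date.now() / 1000)) return false;
  const expected = createHmac('sha256', secret)
    .update(`${storageKey}:${expires}`)
    .digest('hex');
  return token === expected; // production code would use a constant-time comparison
}
```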
Media doesn't live forever. Managing the lifecycle—from upload through access patterns to eventual deletion—is essential for cost control and compliance.
```
MEDIA LIFECYCLE STAGES
══════════════════════

Day 0: UPLOAD
├── Store in HOT tier
├── Generate thumbnails
├── CDN edge caching active
└── High access probability

Day 1-7: ACTIVE
├── Remain in HOT tier
├── Frequent views (recipients, forwards)
├── CDN cache hits likely
└── No action needed

Day 7-30: COOLING
├── Monitor access frequency
├── If access_count < threshold: transition to WARM
├── Remove from CDN edge cache
└── Still quick retrieval (~100ms)

Day 30-90: WARM
├── Standard object storage
├── Moved from premium SSD
├── On-demand CDN caching only
└── ~200ms retrieval

Day 90-365: COLD
├── Infrequent access tier (S3-IA)
├── Higher retrieval cost, lower storage cost
├── Minimum storage duration charges apply
└── ~500ms retrieval

Day 365+: ARCHIVE (Optional)
├── Glacier or equivalent
├── Minutes to hours for retrieval
├── Extremely low storage cost
└── Used for legal holds, backups

DELETION TRIGGERS:
══════════════════
• User deletes message/media
• Disappearing messages timer expires
• Account deleted
• Legal hold expired
• Storage quota enforcement (if any)
```

Disappearing messages create special challenges for media:
Timer starts on view: The 7-day timer typically starts when recipient views the message, not when sent. Server must track first-view timestamp.
Multi-device sync: If recipient has 3 devices, disappearing message must disappear from all once any device views it + timer expires.
Deletion must be thorough: the encrypted blob, its thumbnails, and any transcoded variants must be removed from object storage, purged from CDN caches, and deleted from recipient devices.
Recovery concern: E2EE means once media is deleted, it's unrecoverable. Users who set 24-hour disappearing messages may accidentally lose important content.
True deletion across all replicas, caches, and backups is challenging. CDNs may serve cached content for seconds or minutes after deletion. Database replicas may apply the delete with some lag. Best effort: mark as deleted immediately, let background jobs enforce actual deletion, and accept a brief inconsistency window.
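A sketch of that best-effort approach: mark immediately, purge asynchronously. The storage and CDN interfaces here are hypothetical placeholders for whatever backends are in use:

```typescript
interface MediaStore {
  markDeleted(mediaId: string): Promise<void>;        // soft delete: hides the media instantly
  listPendingPurge(limit: number): Promise<string[]>; // soft-deleted items awaiting hard delete
  hardDelete(mediaId: string): Promise<void>;         // remove blob, thumbnails, transcodes
}

interface CdnClient {
  purge(mediaId: string): Promise<void>;              // invalidate cached copies at the edge
}

// Called when a user deletes a message or a disappearing-message timer expires.
async function deleteMedia(mediaId: string, store: MediaStore): Promise<void> {
  await store.markDeleted(mediaId);                   // fast path: immediate, user-visible
}

// Background job: enforce actual deletion, accepting a brief inconsistency window.
async function purgeWorker(store: MediaStore, cdn: CdnClient): Promise<void> {
  const pending = await store.listPendingPurge(1000);
  for (const mediaId of pending) {
    await cdn.purge(mediaId);                         // CDN may still serve briefly
    await store.hardDelete(mediaId);                  // object storage + derived assets
  }
}
```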
How media is delivered to the client affects user experience dramatically. Different strategies suit different media types and network conditions.
| Media Type | Strategy | Rationale |
|---|---|---|
| Photos | Download thumbnail inline, full on tap | Quick preview, full quality on demand |
| Short videos (<30s) | Progressive download | Download while playing; simple |
| Long videos (>30s) | Adaptive bitrate streaming (HLS/DASH) | Adjust quality to network; seek support |
| Voice notes | Full download before play | Small files; need complete for scrubbing |
| Documents | Download on tap | No preview needed during chat scroll |
| GIFs/Stickers | Preload next messages' GIFs | Ensure instant animation on scroll |
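A sketch of how a client might map the table above onto a delivery decision. The union types and the 30-second threshold are illustrative:

```typescript
type MediaKind = 'photo' | 'video' | 'voice' | 'document' | 'gif';
type DeliveryStrategy =
  | 'thumbnail-then-full-on-tap'
  | 'progressive-download'
  | 'adaptive-streaming'
  | 'full-download'
  | 'preload';

function chooseDelivery(kind: MediaKind, durationSec = 0): DeliveryStrategy {
  switch (kind) {
    case 'photo':    return 'thumbnail-then-full-on-tap';
    case 'video':    return durationSec < 30 ? 'progressive-download' : 'adaptive-streaming';
    case 'voice':    return 'full-download'; // small; needed in full for scrubbing
    case 'document': return 'full-download'; // fetched on tap, no inline preview
    case 'gif':      return 'preload';       // must animate instantly on scroll
  }
}
```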
For longer videos, adaptive streaming adjusts quality based on network conditions:
```
ADAPTIVE BITRATE STREAMING WITH E2EE
═════════════════════════════════════

CHALLENGE:
• Standard HLS/DASH expects server to segment video
• With E2EE, server cannot decode video to segment it

SOLUTION: Client-side segment decryption

1. Upload:
   Client encrypts full video with media_key
   Server stores as single encrypted blob

2. Manifest creation:
   Client/server creates HLS manifest pointing to byte ranges
   of the encrypted file:

   #EXTM3U
   #EXT-X-KEY:METHOD=NONE          // No server-side key needed
   #EXTINF:10.0,
   https://cdn.example.com/media/abc?range=0-1048576
   #EXTINF:10.0,
   https://cdn.example.com/media/abc?range=1048577-2097152
   ...

3. Playback:
   Client downloads byte ranges, decrypts each segment locally,
   feeds to video player

ALTERNATIVE: Pre-segment before encryption
• Client segments video into 10-second chunks
• Encrypts each chunk separately (same key, different IVs)
• Uploads chunks individually
• Server can serve chunks without full-file access
• More complex upload, simpler playback

QUALITY SWITCHING:
• Client monitors download speed
• If network degrades: request lower-quality segments
• Encoder ladder stored: 240p/480p/720p/1080p versions
• Seamless switching between qualities
```

Not all users have unlimited data. Smart downloading respects network conditions:
WiFi-only mode: Option to download media only on WiFi, showing placeholders on cellular.
Data saver mode: Download only thumbnails; full media on explicit tap. Reduces data by 90%+.
Quality preferences: Let users choose: 'Always HD,' 'Automatic,' or 'Data Saver.' Store per-user preference.
Network type detection: Detect cellular vs WiFi, 4G vs 3G. Adjust behavior automatically.
Background sync: When on WiFi at night, pre-download media from recent conversations for offline access.
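A sketch of a client-side auto-download policy combining these signals. The preference names and the 10 MB cellular cap are illustrative choices:

```typescript
type NetworkType = 'wifi' | 'cellular';
type QualityPreference = 'always-hd' | 'automatic' | 'data-saver';

interface DownloadContext {
  network: NetworkType;
  preference: QualityPreference;
  wifiOnlyMedia: boolean;   // user setting: download media over WiFi only
  mediaSizeBytes: number;
}

// Decide whether to auto-download full media or show a thumbnail placeholder.
function shouldAutoDownload(ctx: DownloadContext): boolean {
  if (ctx.preference === 'data-saver') return false;             // thumbnails only, tap for full
  if (ctx.wifiOnlyMedia && ctx.network !== 'wifi') return false; // placeholder on cellular

  // On cellular in 'automatic' mode, skip large files.
  if (ctx.network === 'cellular' && ctx.preference === 'automatic') {
    return ctx.mediaSizeBytes <= 10 * 1024 * 1024;
  }
  return true;                                                   // WiFi or 'always-hd'
}
```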
Smart pre-fetching improves perceived performance: when user scrolls to a chat, pre-fetch thumbnails for next ~20 messages. When user opens an image, pre-fetch next/previous images in the conversation. This makes browsing feel instant. But be careful: pre-fetched content that's never viewed wastes bandwidth.
Media handling transforms messaging from a text system into a rich multimedia platform. The architecture must handle enormous scale while maintaining the privacy guarantees of E2EE.
What's next:
With media handling covered, we'll explore presence and delivery receipts—how the system tracks who's online, when messages are delivered, and when they're read. We'll examine the real-time presence infrastructure and the privacy trade-offs of visibility features.
You now understand the complete media handling pipeline for messaging systems—from upload through storage to delivery. These patterns of chunked uploads, progressive loading, E2EE media, and CDN integration apply to any application dealing with user-generated media at scale.