Every photo and video uploaded to Instagram must be stored durably, replicated for availability, and served quickly to users worldwide. This is not a typical storage problem—it's one of the largest media storage systems ever built.
Instagram's Storage Scale:
| Metric | Scale | Implication |
|---|---|---|
| Daily new uploads | 2 billion | ~6 petabytes of new data daily |
| Total media objects | 100+ billion | Requires distributed, sharded storage |
| Total storage | 20+ exabytes | At this scale, even 1% waste is massive |
| Read volume | 10+ billion reads/day | Heavy CDN caching required |
| Durability requirement | 99.999999999% | Never lose a user's photo |
| Availability requirement | 99.99% | Always accessible |
At this scale, everything is a trade-off. Storage cost, access latency, durability, and operational complexity must all be balanced. This page explores how Instagram manages media storage at exabyte scale.
By the end of this page, you will understand: (1) Object storage architecture and why it's chosen for media, (2) Tiered storage strategies that balance cost and access latency, (3) Replication and erasure coding for durability, (4) Sharding strategies for massive object counts, (5) Cross-region distribution and disaster recovery, (6) Data lifecycle management and deletion, and (7) Cost optimization techniques at exabyte scale.
Instagram uses object storage (similar to Amazon S3, Google Cloud Storage, or Azure Blob Storage) for media rather than traditional file systems or databases. Understanding why requires examining the properties of media workloads.
Media Workload Characteristics:
Meta's Internal Object Storage:
Meta (Instagram's parent company) operates internal object storage systems specifically designed for social media workloads:
These systems are conceptually similar to S3/GCS but optimized for Meta's specific access patterns, hardware, and scale.
Object Key Design:
# Object key structure
/{shard}/media/{year}/{month}/{media_id}/{variant}.{format}
# Examples:
/shard-0042/media/2024/01/abc123def456/large_1080.jpg
/shard-0042/media/2024/01/abc123def456/thumb_150.webp
Key components:
Object storage systems often partition by key prefix. Naive sequential keys (upload_0001, upload_0002...) create hot partitions. Using random prefixes (hash of media_id) or including the shard number ensures even distribution across storage nodes.
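A key layout like the one above can be generated with a short helper. This is a sketch of the idea; the shard count and function names are illustrative, not Instagram's actual values:

```python
import hashlib

NUM_SHARDS = 4096  # illustrative shard count


def shard_for(media_id: str) -> str:
    """Derive a stable shard prefix from a hash of the media_id."""
    digest = hashlib.sha256(media_id.encode()).hexdigest()
    shard_num = int(digest[:8], 16) % NUM_SHARDS
    return f"shard-{shard_num:04d}"


def object_key(media_id: str, year: int, month: int, variant: str, fmt: str) -> str:
    """Build a full object key following the layout above."""
    return f"/{shard_for(media_id)}/media/{year}/{month:02d}/{media_id}/{variant}.{fmt}"
```

Because the shard is a function of `media_id`, all variants of one upload land on the same shard while unrelated uploads spread evenly across the keyspace.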
Not all photos are accessed equally. Recent photos from active users are accessed constantly; old photos from inactive accounts are rarely viewed. Tiered storage aligns storage cost with access patterns.
The Access Pattern Reality:
Instagram's access follows an extreme power law:
Storing all content on high-performance (expensive) storage wastes money. Tiered storage places data on the appropriate storage class based on access likelihood.
| Tier | Age | Storage Type | Cost | Access Time | % of Total Data |
|---|---|---|---|---|---|
| Hot | <7 days | SSD-backed object store | $$$$ | <10ms | ~5% |
| Warm | 7-90 days | HDD-backed object store | $$ | <50ms | ~15% |
| Cold | 90 days - 1 year | Archival storage | $ | Minutes | ~30% |
| Glacier/Archive | >1 year | Deep archive (tape) | ¢ | Hours | ~50% |
Tier Migration Logic:
class TierMigrationService:
"""Manages automatic migration of objects between storage tiers."""
# Tier thresholds
HOT_TO_WARM_DAYS = 7
WARM_TO_COLD_DAYS = 90
COLD_TO_ARCHIVE_DAYS = 365
async def evaluate_for_migration(self, object_key: str, metadata: ObjectMetadata):
"""
Determine if object should be migrated to a different tier.
Considers age and recent access patterns.
"""
age_days = (time.now() - metadata.created_at).days
last_access_days = (time.now() - metadata.last_accessed_at).days
current_tier = metadata.storage_tier
# Hot → Warm migration
if current_tier == 'hot' and age_days > self.HOT_TO_WARM_DAYS:
# Check if still being accessed frequently
if metadata.access_count_7d < 10: # Low access
await self.migrate_object(object_key, 'warm')
# Warm → Cold migration
elif current_tier == 'warm' and age_days > self.WARM_TO_COLD_DAYS:
if metadata.access_count_30d < 5: # Very low access
await self.migrate_object(object_key, 'cold')
# Cold → Archive migration
elif current_tier == 'cold' and age_days > self.COLD_TO_ARCHIVE_DAYS:
if last_access_days > 180: # No access in 6 months
await self.migrate_object(object_key, 'archive')
async def migrate_object(self, object_key: str, target_tier: str):
"""
Migrate object to target tier.
Uses copy + verify + delete to avoid data loss.
"""
metadata = await metadata_service.get(object_key)
source_tier = metadata.storage_tier
# Copy to target tier storage
await target_storage[target_tier].copy_from(
source_storage[source_tier],
object_key
)
# Verify copy before deleting anything
await self.verify_copy(object_key, target_tier)
# Update metadata to point to new location
await metadata_service.update_tier(object_key, target_tier)
# Delete from source tier (only after verification)
await source_storage[source_tier].delete(object_key)
# Log migration event
await log_migration(object_key, source_tier, target_tier)
Tier Promotion (Cold → Hot):
When cold content is accessed, it may be promoted back to warmer tiers:
async def handle_cold_access(object_key: str):
"""
Handle access to cold-tier content.
May trigger promotion to warmer tier.
"""
# Fetch from cold storage (slow)
data = await cold_storage.get(object_key)
# Track access
await metadata_service.record_access(object_key)
access_count = await metadata_service.get_recent_access_count(object_key)
# Consider promotion if suddenly popular
if access_count > PROMOTION_THRESHOLD:
# Async promotion (don't block the request)
asyncio.create_task(promote_to_warm(object_key, data))
return data
Cost Impact:
Proper tiering dramatically reduces costs:
| Scenario | Monthly Cost Estimate |
|---|---|
| All data in hot tier | $100M+ |
| Tiered storage | $20-30M |
| Savings | 70-80% |
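The savings follow directly from the tier table above. With illustrative relative prices (assumptions for the sketch, not Meta's actual rates), weighting each tier's share of data by its unit cost:

```python
# Relative monthly cost per unit of data, hot tier normalized to 1.0
# (illustrative ratios, roughly tracking public cloud tier pricing)
TIER_COST = {"hot": 1.0, "warm": 0.4, "cold": 0.15, "archive": 0.04}

# Share of total data in each tier, from the tier table above
TIER_SHARE = {"hot": 0.05, "warm": 0.15, "cold": 0.30, "archive": 0.50}

all_hot_cost = 1.0  # everything priced at the hot rate
tiered_cost = sum(TIER_SHARE[t] * TIER_COST[t] for t in TIER_COST)
savings = 1.0 - tiered_cost  # fraction saved vs. keeping everything hot
```

With these assumed ratios the blended cost is about 17.5% of the all-hot cost, consistent with the 70-80% savings range in the table.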
Migration decisions aren't purely age-based. Hot content from celebrities might stay in hot tier longer. Content that suddenly goes viral (reshared, appears in news) triggers automatic promotion. ML models can predict access likelihood to optimize tier placement.
Instagram promises users their photos will never be lost. Achieving 11-nines durability (99.999999999%) requires sophisticated replication strategies.
Durability Math:
What does 11-nines durability mean?
11-nines = 99.999999999% durability
= 0.000000001% chance of data loss
= 1 in 100 billion objects might be lost per year
With 100 billion objects stored:
→ Expect ~1 object lost per year
This is achieved through replication and other protections.
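The arithmetic above is easy to check directly:

```python
def expected_annual_loss(objects: int, durability: float) -> float:
    """Expected number of objects lost per year, given per-object annual durability."""
    return objects * (1.0 - durability)


# 100 billion objects at 11 nines of durability
loss = expected_annual_loss(100_000_000_000, 0.99999999999)
```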
Replication Strategies:
| Strategy | Copies | Storage Overhead | Durability | Use Case |
|---|---|---|---|---|
| 3-way replication | 3 full copies | 200% overhead | ~8 nines | Hot tier, highest performance |
| Cross-AZ replication | 3 copies in 3 AZs | 200% overhead | ~9 nines | Standard durability |
| Cross-region replication | 3+ copies across regions | 200%+ overhead | ~11 nines | Disaster recovery |
| Erasure coding (RS 6,3) | 9 fragments, any 6 recover | 50% overhead | ~9 nines | Cold storage cost efficiency |
| Erasure coding (RS 10,4) | 14 fragments, any 10 recover | 40% overhead | ~10 nines | Archive tier |
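The overhead column follows from the scheme parameters: n full copies cost n - 1 extra units of storage, while RS(k, m) costs (k + m) / k - 1. A quick check against the table:

```python
def replication_overhead(copies: int) -> float:
    """Extra storage beyond the data itself: 3 copies -> 2.0 (200%)."""
    return copies - 1.0


def erasure_overhead(k: int, m: int) -> float:
    """Extra storage for Reed-Solomon RS(k, m): (k + m) / k - 1."""
    return (k + m) / k - 1.0
```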
Erasure Coding Deep Dive:
For cold/archive data, full replication is too expensive. Erasure coding provides durability with less storage overhead:
import galois # Finite field math for Reed-Solomon
class ErasureCodingService:
"""
Implements Reed-Solomon erasure coding.
Splits data into k data fragments, generates m parity fragments.
Any k fragments can reconstruct the original data.
"""
def __init__(self, k=6, m=3):
"""
k=6, m=3 means:
- Split into 6 data fragments
- Generate 3 parity fragments
- Total 9 fragments stored across 9 nodes
- Any 6 fragments can reconstruct data
- Can tolerate loss of any 3 fragments
- Storage overhead: (6+3)/6 = 1.5x (50% overhead vs 200% for 3-copy)
"""
self.k = k
self.m = m
self.n = k + m
def encode(self, data: bytes) -> List[bytes]:
"""Split data into n fragments using Reed-Solomon encoding."""
# Pad data to be divisible by k
padded_data = self._pad_data(data)
# Split into k data fragments
fragment_size = len(padded_data) // self.k
data_fragments = [
padded_data[i*fragment_size:(i+1)*fragment_size]
for i in range(self.k)
]
# Generate m parity fragments using GF(2^8) math
parity_fragments = self._generate_parity(data_fragments)
return data_fragments + parity_fragments
def decode(self, fragments: List[bytes], available_indices: List[int]) -> bytes:
"""Reconstruct original data from any k fragments."""
if len(available_indices) < self.k:
raise InsufficientFragmentsError(
f"Need {self.k} fragments, only have {len(available_indices)}"
)
# Use first k available fragments
selected = list(zip(available_indices, fragments))[:self.k]
# Reconstruct using Galois field math
return self._reconstruct(selected)
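Production Reed-Solomon needs finite-field arithmetic (hence the `galois` dependency above), but the core idea of trading parity for copies can be shown with a single XOR parity fragment, which tolerates the loss of any one fragment. A runnable toy sketch:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_encode(data: bytes, k: int = 3) -> list:
    """Split data into k equal fragments plus one XOR parity fragment."""
    data += b"\x00" * ((-len(data)) % k)  # pad to a multiple of k
    size = len(data) // k
    frags = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = frags[0]
    for f in frags[1:]:
        parity = xor_bytes(parity, f)
    return frags + [parity]


def xor_recover(frags: list, lost_index: int) -> bytes:
    """Rebuild the fragment at lost_index by XOR-ing all surviving fragments."""
    survivors = [f for i, f in enumerate(frags) if i != lost_index and f is not None]
    out = survivors[0]
    for f in survivors[1:]:
        out = xor_bytes(out, f)
    return out
```

This is RS(k, 1) in spirit: k data fragments, one parity fragment, any single loss recoverable. Reed-Solomon generalizes the same property to m parity fragments over GF(2^8).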
Fragment Placement:
Fragments must be placed to survive correlated failures:
| Failure Type | Mitigation |
|---|---|
| Disk failure | Fragments on different disks |
| Server failure | Fragments on different servers |
| Rack failure | Fragments in different racks |
| Data center failure | Fragments in different data centers |
| Region failure | Fragments in different geographic regions |
Durability Monitoring:
class DurabilityMonitor:
"""Continuously monitors and maintains data durability."""
async def scan_for_degraded_objects(self):
"""Find objects with fewer than required copies/fragments."""
for shard in all_shards():
async for object_key, fragment_status in shard.scan_objects():
available_count = sum(1 for f in fragment_status if f.is_available)
if available_count < self.min_fragments_for_recovery:
# CRITICAL: Object at risk
await self.alert_critical(object_key)
elif available_count < self.target_fragments:
# WARN: Need to re-replicate
await self.queue_repair(object_key)
async def repair_object(self, object_key: str):
"""Reconstruct missing fragments and restore target durability."""
metadata = await get_object_metadata(object_key)
# Read available fragments
available = await gather_available_fragments(object_key)
# Reconstruct original data
data = erasure_codec.decode(available)
# Generate all fragments fresh
fragments = erasure_codec.encode(data)
# Re-distribute to healthy nodes
await distribute_fragments(object_key, fragments)
Durability (data not lost) and availability (data accessible now) are different. An object in archive tier has high durability but low availability (access takes hours). Instagram optimizes each tier for its requirements—hot tier prioritizes availability; archive tier prioritizes durability per dollar.
No single storage node can hold 100 billion objects. Data must be sharded (partitioned) across thousands of nodes. The sharding strategy determines performance, scalability, and operational complexity.
Sharding Dimensions:
Consistent Hashing for Sharding:
Instagram uses consistent hashing to distribute objects across shards:
import hashlib
class ConsistentHashRing:
"""
Consistent hash ring for object sharding.
Minimizes data movement when adding/removing shards.
"""
def __init__(self, shards: List[str], virtual_nodes: int = 150):
"""
Initialize ring with shards.
Virtual nodes ensure even distribution.
"""
self.ring = {}
self.sorted_keys = []
for shard in shards:
for i in range(virtual_nodes):
# Create virtual node key
key = self._hash(f"{shard}:{i}")
self.ring[key] = shard
self.sorted_keys = sorted(self.ring.keys())
def get_shard(self, object_key: str) -> str:
"""Determine which shard stores this object."""
if not self.sorted_keys:
raise NoShardsError()
key_hash = self._hash(object_key)
# Find first shard with hash >= object hash
for shard_hash in self.sorted_keys:
if shard_hash >= key_hash:
return self.ring[shard_hash]
# Wrap around to first shard
return self.ring[self.sorted_keys[0]]
def _hash(self, key: str) -> int:
"""Hash function returning consistent integer."""
return int(hashlib.md5(key.encode()).hexdigest(), 16)
Shard Topology:
Instagram Storage Cluster
├── Region: US-West
│ ├── Zone: us-west-1a
│ │ ├── Shard-0001 ... Shard-0500
│ ├── Zone: us-west-1b
│ │ ├── Shard-0501 ... Shard-1000
│ └── Zone: us-west-1c
│ ├── Shard-1001 ... Shard-1500
├── Region: EU-West
│ └── [Similar structure]
└── Region: Asia-Pacific
└── [Similar structure]
Total shards: ~10,000+
Objects per shard: ~10 million
Avoiding Hot Shards:
Certain patterns create hot shards:
| Problem | Cause | Solution |
|---|---|---|
| Temporal clustering | Sequential IDs (upload_0001, upload_0002) | Random media_id generation |
| Celebrity uploads | All celebrity content on one shard | Hash by media_id, not user_id |
| Viral content | Single object gets millions of reads | CDN caching before storage |
| Large user | User with millions of photos | Spread across shards via hash |
Resharding Strategy:
When capacity is reached, more shards must be added:
async def expand_shard_capacity():
"""Add new shards to the cluster."""
# 1. Add new shards to the hash ring
new_shards = ['shard-1501', 'shard-1502', ...]
for shard in new_shards:
hash_ring.add_shard(shard)
# 2. In consistent hashing, only ~1/N data moves
# For 1500→1510 shards, ~0.6% of data migrates
# 3. Background migration
for old_shard in existing_shards:
async for object_key in old_shard.scan_objects():
new_shard = hash_ring.get_shard(object_key)
if new_shard != old_shard:
await migrate_object(object_key, old_shard, new_shard)
# 4. Update routing table atomically
await update_routing_table(hash_ring)
Without virtual nodes, consistent hashing can create uneven distribution (some shards get more data). Using 100-200 virtual nodes per physical shard creates near-uniform distribution. The trade-off is slightly more memory for the hash ring structure.
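The "only ~1/N of data moves" property is easy to verify empirically. The sketch below is a compact, bisect-based version of the ring above (same md5 hashing and virtual-node idea; shard names and counts are illustrative):

```python
import bisect
import hashlib


def _h(key: str) -> int:
    """Same md5-based hash as the ring class above."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def build_ring(shards, vnodes=150):
    """Map each shard to `vnodes` points on the ring."""
    ring = {_h(f"{s}:{i}"): s for s in shards for i in range(vnodes)}
    return sorted(ring), ring


def lookup(sorted_keys, ring, key):
    """First virtual node at or after the key's hash, wrapping around."""
    idx = bisect.bisect_left(sorted_keys, _h(key)) % len(sorted_keys)
    return ring[sorted_keys[idx]]


shards = [f"shard-{i:04d}" for i in range(50)]
keys = [f"media-{i}" for i in range(10_000)]

sk_old, ring_old = build_ring(shards)
before = {k: lookup(sk_old, ring_old, k) for k in keys}

# Add one shard; only keys claimed by the new shard's vnodes should move
sk_new, ring_new = build_ring(shards + ["shard-0050"])
after = {k: lookup(sk_new, ring_new, k) for k in keys}

moved = [k for k in keys if after[k] != before[k]]
fraction = len(moved) / len(keys)  # typically close to 1/51, about 2%
```

Every key that changes owner moves onto the newly added shard; the rest stay put, which is exactly why resharding at this scale is tractable.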
Instagram serves users worldwide and must survive regional disasters (data center fires, network outages, natural disasters). Cross-region distribution ensures both low latency and disaster recovery.
Multi-Region Architecture:
Replication Modes:
| Mode | Consistency | Latency | Use Case |
|---|---|---|---|
| Synchronous | Strong | High (cross-region RTT) | Critical data, metadata |
| Asynchronous | Eventual | Low (local ack) | Media objects (standard) |
| Semi-synchronous | Bounded staleness | Medium | Balanced approach |
Media uses asynchronous replication — writes are acknowledged after local durability, then replicated to other regions in the background:
async def write_object_with_replication(object_key: str, data: bytes, regions: List[str]):
"""Write object with cross-region replication."""
primary_region = get_primary_region(object_key)
secondary_regions = [r for r in regions if r != primary_region]
# Phase 1: Write to primary (synchronous)
await primary_storage[primary_region].put(object_key, data)
# Object is now durable in primary region
# Can acknowledge to client here
ack_to_client(object_key, status='written')
# Phase 2: Replicate to secondaries (asynchronous)
for region in secondary_regions:
asyncio.create_task(
replicate_to_region(object_key, data, region)
)
async def replicate_to_region(object_key: str, data: bytes, region: str):
"""Async replication to secondary region."""
try:
await secondary_storage[region].put(object_key, data)
await update_replication_status(object_key, region, 'replicated')
except ReplicationError as e:
# Queue for retry
await replication_queue.enqueue(object_key, region)
await alert_replication_lag(object_key, region)
Disaster Recovery Scenarios:
| Scenario | Impact | Recovery |
|---|---|---|
| Single disk failure | No impact | Automatic repair from redundant copies |
| Server failure | Brief latency increase | Requests route to other servers |
| Rack failure | Minor impact | Data spread across racks |
| AZ failure | ~33% capacity loss | Traffic shifts to other AZs |
| Region failure | Major event | Failover to secondary region |
| Multi-region failure | Catastrophic | Restore from globally distributed backups |
Regional Failover:
class RegionalFailover:
"""Manages failover between regions."""
async def detect_region_health(self, region: str) -> HealthStatus:
"""Check if region is healthy."""
checks = await asyncio.gather(
self.check_storage_health(region),
self.check_network_health(region),
self.check_compute_health(region),
)
return HealthStatus.aggregate(checks)
async def initiate_failover(self, failed_region: str, backup_region: str):
"""Failover traffic from failed region to backup."""
# 1. Update DNS to route to backup region
await dns_service.update_region_routing(
failed_region,
target=backup_region
)
# 2. Ensure backup region has capacity
await auto_scaler.scale_up(backup_region, factor=2.0)
# 3. Warm caches with likely-needed data
await cache_warmer.warm_region(backup_region, failed_region)
# 4. Monitor and alert
await alerting.send_failover_notification(failed_region, backup_region)
RTO and RPO:
RTO (Recovery Time Objective) is how long recovery takes; RPO (Recovery Point Objective) is how much recent data may be lost. Async replication means data written just before a failure might not have reached secondary regions. This 'replication lag' (typically seconds to minutes) is the effective RPO for media. Critical data (user accounts, financial) uses synchronous replication despite higher latency.
Data doesn't live forever. Users delete photos, accounts are terminated, and regulations require data removal. Managing the lifecycle of 100+ billion objects is an operational challenge.
Data Deletion Scenarios:
| Scenario | Trigger | Timeline | Complexity |
|---|---|---|---|
| User deletes post | User action | Immediate (soft) / 30 days (hard) | Low |
| Story expiration | 24-hour TTL | Automatic | Low |
| Account deletion | User request | 30-day grace, then permanent | High |
| Policy takedown | Policy violation | Immediate blocking, async deletion | Medium |
| GDPR/CCPA request | User right request | 30 days mandated | High |
| Copyright takedown | DMCA request | Prompt removal | Medium |
| Court order | Legal requirement | As specified | Variable |
class DeletionService:
"""Manages data deletion across all storage systems."""
# Grace period before permanent deletion
SOFT_DELETE_GRACE_DAYS = 30
async def delete_post(self, post_id: str, reason: DeletionReason):
"""
Delete a post and all associated media.
Uses soft delete with grace period for user deletions.
"""
# 1. Soft delete - hide from users immediately
await self.soft_delete(post_id)
# 2. Queue for hard deletion after grace period
if reason == DeletionReason.USER_INITIATED:
await deletion_queue.schedule(
post_id,
execution_time=time.now() + timedelta(days=self.SOFT_DELETE_GRACE_DAYS)
)
else:
# Policy violations: shorter grace or immediate
await deletion_queue.schedule(
post_id,
execution_time=time.now() + timedelta(hours=24)
)
async def soft_delete(self, post_id: str):
"""Mark as deleted but don't remove data yet."""
await db.update('posts', post_id, {
'is_deleted': True,
'deleted_at': time.now(),
'visible': False
})
# Remove from all caches
await cache.invalidate(f"post:{post_id}")
await cache.invalidate(f"media:{post_id}:*")
# Remove from CDN
await cdn.purge_urls(get_media_urls(post_id))
async def hard_delete(self, post_id: str):
"""Permanently remove all data."""
# 1. Delete all media variants from storage
media_keys = await get_media_keys(post_id)
for key in media_keys:
# Delete from all regions
for region in ALL_REGIONS:
await storage[region].delete(key)
# 2. Delete metadata
await db.delete('posts', post_id)
await db.delete('post_metadata', post_id)
# 3. Delete from search index
await search_index.delete(post_id)
# 4. Delete analytics data (may have separate retention)
await analytics.queue_deletion(post_id)
# 5. Log deletion for compliance
await audit_log.record_deletion(post_id)
Account Deletion Complexity:
Deleting an account requires removing data from dozens of systems:
Account deletion checklist:
□ Profile data (name, bio, settings)
□ All posts and stories
□ All media files (photos, videos)
□ Direct messages (both sides?)
□ Comments on others' posts
□ Likes and saves
□ Follower/following relationships
□ Notification history
□ Search history
□ Login sessions
□ Third-party app connections
□ Analytics and insights data
□ Ad targeting data
□ ML feature vectors
□ Recommendation embeddings
□ Payment information
□ Business account data (if applicable)
GDPR Right to Erasure:
European users can request complete data deletion under GDPR:
async def process_gdpr_erasure_request(user_id: str, request_id: str):
"""Process GDPR Article 17 erasure request."""
# Validate request authenticity
await verify_user_identity(user_id, request_id)
# Create deletion manifest
manifest = await generate_deletion_manifest(user_id)
# Execute deletion across all systems
deletion_tasks = []
for system, data_categories in manifest.items():
task = delete_from_system(system, user_id, data_categories)
deletion_tasks.append(task)
results = await asyncio.gather(*deletion_tasks, return_exceptions=True)
# Verify completion
failed = [r for r in results if isinstance(r, Exception)]
if failed:
await escalate_deletion_failures(user_id, failed)
# Generate compliance report
report = await generate_deletion_report(user_id, manifest, results)
# Retain deletion record for compliance (proving we deleted)
await compliance_log.record(request_id, report)
# Notify user
await notify_erasure_complete(user_id, request_id)
GDPR requires both deleting user data AND proving you deleted it. This creates a paradox: you must keep some record (the deletion audit log) to prove deletion occurred. The audit log contains minimal identifying information and is kept separately with strict access controls.
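One common way to resolve this paradox (a sketch of the general pattern, not Meta's actual scheme) is to log only a keyed hash of the user identifier, so the record proves a deletion occurred without retaining the identity in the clear. The key name and record fields below are illustrative:

```python
import hashlib
import hmac
import time

AUDIT_KEY = b"audit-log-secret"  # illustrative; real systems use managed key storage


def deletion_audit_record(request_id: str, user_id: str) -> dict:
    """Audit entry that proves a deletion happened without storing the raw user_id."""
    user_token = hmac.new(AUDIT_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return {
        "request_id": request_id,
        "user_token": user_token,  # keyed hash, not the identifier itself
        "deleted_at": int(time.time()),
        "action": "gdpr_erasure",
    }
```

Anyone holding the key can later verify that a specific user's deletion was logged, but the log itself never contains the identifier.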
At exabyte scale, every percentage of optimization saves millions of dollars annually. Instagram employs numerous techniques to minimize storage costs while maintaining performance and durability.
Cost Breakdown:
| Cost Component | Approximate % | Optimization Lever |
|---|---|---|
| Storage capacity | 60% | Tiering, compression, deduplication |
| Data transfer (egress) | 25% | CDN caching, regional routing |
| Operations (IOPS) | 10% | Caching, batch operations |
| Compute (processing) | 5% | Efficient codecs, batching |
Optimization Techniques:
Image Compression Optimization:
import cv2  # OpenCV, for edge detection
import numpy as np
class AdaptiveCompressor:
"""Optimize image compression based on content."""
def compress_image(self, image: Image) -> bytes:
"""
Compress image with content-aware quality selection.
"""
# Analyze image complexity
complexity = self.analyze_complexity(image)
# Simple images (flat colors, graphics) can be heavily compressed
# Complex images (textures, nature) need higher quality
if complexity < 0.3:
quality = 70 # Low complexity, high compression
elif complexity < 0.6:
quality = 80 # Medium complexity
else:
quality = 90 # High complexity, preserve detail
# Choose format based on content
if image.has_transparency:
return self.encode_webp(image, quality)
elif complexity < 0.5:
return self.encode_webp(image, quality) # WebP better for simple
else:
return self.encode_jpeg(image, quality) # JPEG competitive for complex
def analyze_complexity(self, image: Image) -> float:
"""Estimate image complexity 0-1 using edge detection."""
edges = cv2.Canny(image, 100, 200)
edge_density = np.mean(edges) / 255
return edge_density
Deduplication Strategy:
class PhotoDeduplicator:
"""Detect and deduplicate identical uploads."""
def __init__(self):
# Store perceptual hashes of all photos
self.hash_index = PerceptualHashIndex()
async def check_duplicate(self, image: bytes) -> Optional[str]:
"""
Check if this image is a duplicate of existing content.
Returns existing media_id if duplicate found.
"""
# Compute perceptual hash
phash = self.compute_perceptual_hash(image)
# Check for exact match
existing_id = await self.hash_index.lookup_exact(phash)
if existing_id:
return existing_id
# Check for near-duplicate (slight edits, different resolution)
near_matches = await self.hash_index.lookup_similar(phash, threshold=0.95)
if near_matches:
# Verify with content comparison
for match_id in near_matches:
if await self.verify_visual_match(image, match_id):
return match_id
return None
async def handle_duplicate_upload(self, new_upload_id: str, existing_id: str):
"""
Handle duplicate by referencing existing media.
"""
# Don't store new media - reference existing
await db.update('posts', new_upload_id, {
'media_id': existing_id,
'is_reference': True
})
# Increment reference count on original
await db.increment('media', existing_id, 'reference_count', 1)
# Log storage saved
await metrics.record('storage_saved_bytes', await get_media_size(existing_id))
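The perceptual hash itself can be sketched with an "average hash": downsample to a small grayscale grid, set one bit per pixel depending on whether it is above the grid's mean, and compare hashes by Hamming distance; visually similar images share most bits. The toy below takes a plain 2D list of pixel values to stay dependency-free (production systems use real image libraries and larger grids):

```python
def average_hash(pixels) -> int:
    """aHash: one bit per pixel, set if the pixel is >= the grid's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Small pixel-level edits (recompression, minor brightness shifts) rarely flip a bit, which is what makes near-duplicate lookup with a threshold possible.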
| Optimization | Storage Savings | Estimated Annual Savings |
|---|---|---|
| Modern codecs (WebP/AVIF) | 30-40% | $20-30M |
| Tiered storage | 60-70% | $50-60M |
| Deduplication | 5-10% | $5-10M |
| Erasure coding vs replication | 50% (on cold) | $10-15M |
| Variant optimization | 10-15% | $5-10M |
At Meta's scale, building custom storage systems (Haystack, f4) is cost-effective. But for most companies, using cloud object storage (S3, GCS) is cheaper considering engineering cost. The break-even point is typically around 100PB+ where custom systems become economical.
We've explored how Instagram stores, replicates, and manages exabytes of media—perhaps the largest photo storage system in the world. Let's consolidate the key principles:
Module Complete: Instagram Photo Platform
You've now completed a comprehensive journey through Instagram's architecture:
These patterns—media processing pipelines, hybrid fanout, ML-powered ranking, tiered storage—are foundational to any large-scale content platform. The specific implementations vary, but the principles remain constant.
You now have a Principal Engineer-level understanding of Instagram's architecture. You can reason about trade-offs in media platforms, design content distribution systems, and architect storage for planetary scale. These patterns will serve you in any system design interview or real-world media platform challenge.