Having understood why object storage emerged and how it differs from block and file paradigms, we now turn inward to examine how object storage actually works. This isn't abstract theory—these internal mechanics directly influence how you design systems, optimize performance, and avoid costly mistakes.
Every time you call PUT to upload a photo or GET to retrieve a document, a sophisticated distributed system orchestrates the storage, replication, indexing, and retrieval of your data. Understanding this machinery transforms object storage from a "magic cloud bucket" into a predictable, optimizable engineering component.
By the end of this page, you will understand the complete internal model of object storage: the anatomy of an object, how objects are organized into buckets with namespaced keys, the HTTP-based access model, how metadata enables rich functionality, and the version control mechanisms that enable point-in-time recovery. This knowledge is essential for designing efficient, cost-effective object storage architectures.
An object in object storage is not simply a file with a different name. It's a carefully structured data entity designed for distributed systems. Every object consists of three distinct components, each serving a critical purpose:
1. Object Data (The Payload)
The actual bytes of content—your image, video, backup file, or log. This can range from zero bytes to multiple terabytes (AWS S3 supports objects up to 5TB). The storage system treats this as an opaque blob; it doesn't parse or interpret the content.
2. Object Metadata (The Description)
Key-value pairs that describe the object. Metadata comes in two forms: system metadata, set and managed by the storage service (Content-Type, ETag, Last-Modified), and user metadata, arbitrary pairs you attach at upload time:

x-amz-meta-author: "John Doe"
x-amz-meta-version: "2.4.1"

3. Object Key (The Identity)
A unique identifier within the bucket's namespace. The key is the object's "name" and determines how you access it. Keys can be up to 1024 bytes and may contain any UTF-8 characters, including slashes that create the illusion of directory structure.
```
┌──────────────────────────────────────────────────────┐
│ OBJECT: "photos/2024/vacation/beach_sunset.jpg"      │
├──────────────────────────────────────────────────────┤
│ KEY                                                  │
│  └─ "photos/2024/vacation/beach_sunset.jpg"          │
├──────────────────────────────────────────────────────┤
│ SYSTEM METADATA                                      │
│  ├─ Content-Type: "image/jpeg"                       │
│  ├─ Content-Length: 2847592                          │
│  ├─ ETag: "a7f3d9b2c1e4..."                          │
│  ├─ Last-Modified: "2024-07-15T14:32:18Z"            │
│  ├─ Storage-Class: "STANDARD"                        │
│  └─ x-amz-server-side-encryption: "AES256"           │
├──────────────────────────────────────────────────────┤
│ USER METADATA                                        │
│  ├─ x-amz-meta-photographer: "jane.smith"            │
│  ├─ x-amz-meta-camera: "Canon EOS R5"                │
│  └─ x-amz-meta-location: "Maldives"                  │
├──────────────────────────────────────────────────────┤
│ DATA                                                 │
│  └─ [2,847,592 bytes of JPEG image data]             │
└──────────────────────────────────────────────────────┘
```

The ETag (entity tag) is typically an MD5 hash of the object's content, enabling integrity verification in transit. For multipart uploads, the ETag format differs—it becomes a hash of the part hashes plus a part count (e.g., "a7f3d9b2-5" indicates 5 parts). This distinction matters when implementing client-side verification.
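To see why the multipart ETag differs from a plain content hash, here is a small simulation of the documented scheme. The helper names are ours, and this reproduces the hashing behavior only, not an AWS API call:

```typescript
import { createHash } from "node:crypto";

function md5(data: Buffer): Buffer {
  return createHash("md5").update(data).digest();
}

// Single-part upload: the ETag is simply the MD5 of the content, hex-encoded.
function singlePartETag(content: Buffer): string {
  return md5(content).toString("hex");
}

// Multipart upload: MD5 over the concatenated binary MD5 digests of each
// part, suffixed with "-<partCount>" (e.g., "...-5" for 5 parts).
function multipartETag(parts: Buffer[]): string {
  const concatenatedDigests = Buffer.concat(parts.map(md5));
  return `${md5(concatenatedDigests).toString("hex")}-${parts.length}`;
}
```

Note that the same bytes uploaded as one part versus two parts yield different ETags, which is why client-side verification must know how the object was uploaded.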
| Metadata Field | Description | Mutable |
|---|---|---|
| Content-Type | MIME type of the object (e.g., image/png, application/json) | Yes (on upload) |
| Content-Length | Size in bytes | No (determined by content) |
| Content-Encoding | Encoding transformations applied (e.g., gzip) | Yes (on upload) |
| ETag | Hash for integrity verification | No (computed automatically) |
| Last-Modified | Timestamp of last modification | No (set automatically) |
| Cache-Control | Caching directives for CDN/browsers | Yes (on upload) |
| Storage-Class | Performance/cost tier (STANDARD, GLACIER, etc.) | Yes (via lifecycle or copy) |
| x-amz-server-side-encryption | Encryption algorithm used | Yes (policy or per-object) |
Object Immutability and Atomicity
A crucial characteristic of object storage is that objects are immutable at the content level. You cannot append to an object or modify bytes 1000-2000 while leaving the rest unchanged. Any modification requires uploading an entirely new version of the object.
This immutability provides atomic writes: either the entire new object is written successfully, or the old object remains unchanged. There's no partial state where half the new content is visible. This atomicity simplifies concurrency reasoning significantly compared to filesystems where partial writes are possible.
However, immutability has implications:

- There is no append operation. Log-style workloads must buffer writes and upload new objects, or assemble content via multipart upload at creation time.
- A small edit to a large object requires re-uploading the entire object.
- Frequent updates multiply PUT request costs and, with versioning enabled, storage costs.
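Because there is no append or partial write, an "update" is always a whole-object read-modify-write. A minimal in-memory sketch (a stand-in Map, not a real S3 client; the helper names are illustrative) shows the pattern:

```typescript
// Simulated object store: keys map to complete, immutable payloads.
const store = new Map<string, Buffer>();

// PUT atomically replaces the whole object; there is no partial state.
function putObject(key: string, body: Buffer): void {
  store.set(key, body);
}

function getObject(key: string): Buffer | undefined {
  return store.get(key);
}

// "Appending" a line requires downloading the full object, modifying it
// locally, and uploading a complete replacement.
function appendLine(key: string, line: string): void {
  const existing = getObject(key) ?? Buffer.alloc(0);
  putObject(key, Buffer.concat([existing, Buffer.from(line + "\n")]));
}
```

In a real system this pattern also implies a race between concurrent writers, which is one reason high-frequency append workloads belong in a log service rather than object storage.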
Objects don't exist in isolation—they're organized into buckets (AWS terminology) or containers (Azure terminology). A bucket is a namespace that groups related objects and provides the boundary for access control, logging, and billing.
Bucket Naming and Global Uniqueness
In most cloud object storage systems, bucket names are globally unique across all customers. When you create a bucket named "my-app-assets" in AWS S3, no other AWS account can create a bucket with that name—anywhere in the world. This global uniqueness enables predictable, consistent URL addressing.
The Flat Namespace Illusion
Object storage uses a flat namespace—there are no actual directories or folders. The object key photos/2024/vacation/beach.jpg is a single string, not a nested path. The slashes are just characters in the key, with no special meaning to the storage system.
However, object storage APIs create a folder illusion by supporting:

- Prefix filtering: list only keys that begin with a given string (prefix=photos/)
- Delimiter grouping: keys that share the next delimiter-separated segment are collapsed into CommonPrefixes, which render as "subdirectories"
```
// Request: List objects in "photos/" "directory"
GET /?prefix=photos/&delimiter=/

// Response: Shows "subdirectories" and files
{
  "Name": "my-bucket",
  "Prefix": "photos/",
  "Delimiter": "/",
  "CommonPrefixes": [
    // These look like subdirectories
    { "Prefix": "photos/2023/" },
    { "Prefix": "photos/2024/" }
  ],
  "Contents": [
    // These are objects directly in "photos/"
    { "Key": "photos/index.html", "Size": 1024 },
    { "Key": "photos/readme.txt", "Size": 512 }
  ]
}
```

Because the namespace is flat, listing operations can be expensive. Enumerating a "directory" that matches 10 million keys means paging through all 10 million keys, at most 1,000 per request, so cost scales linearly with the number of matching objects. Design your key structure to avoid hot prefixes that require frequent, large listing operations.
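The prefix/delimiter grouping that produces this kind of response can be simulated in a few lines. In this sketch (the `listObjects` helper and sample keys are illustrative), "subdirectories" fall out of nothing but string manipulation on a flat key list:

```typescript
interface ListResult {
  commonPrefixes: string[]; // rendered as "subdirectories"
  contents: string[];       // objects directly under the prefix
}

function listObjects(keys: string[], prefix: string, delimiter: string): ListResult {
  const commonPrefixes = new Set<string>();
  const contents: string[] = [];
  for (const key of keys) {
    if (!key.startsWith(prefix)) continue;
    const rest = key.slice(prefix.length);
    const idx = rest.indexOf(delimiter);
    if (idx >= 0) {
      // Key goes "deeper": roll it up into a common prefix.
      commonPrefixes.add(prefix + rest.slice(0, idx + 1));
    } else {
      contents.push(key); // key sits directly "in" the prefix
    }
  }
  return { commonPrefixes: [...commonPrefixes].sort(), contents };
}
```

Notice that no directory objects exist anywhere: the "folders" are computed on the fly from key strings, which is exactly why they cannot be empty, renamed, or given permissions of their own.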
Key Design Strategies
The way you structure your object keys has significant implications for performance, organization, and cost:
Anti-Pattern: Timestamp-Based Prefixes
logs/2024-01-15-12-00-00-request-123.json
logs/2024-01-15-12-00-01-request-456.json
This creates a "hot partition" problem—all objects share a common prefix in time order, concentrating load on storage nodes responsible for that key range.
Better: Random Prefix Distribution
logs/a1b2c3-2024-01-15-12-00-00-request-123.json
logs/f4e5d6-2024-01-15-12-00-01-request-456.json
The random hash prefix distributes objects across storage partitions, eliminating bottlenecks.
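The transformation can be sketched as follows; the `hashPrefixedKey` helper and the 6-character prefix length are illustrative choices, not an AWS convention:

```typescript
import { createHash } from "node:crypto";

// Splice a short, deterministic hash of the key into the key itself so
// that writes spread across the key space instead of clustering in
// time order. Deterministic hashing keeps the key reproducible from
// the original name.
function hashPrefixedKey(key: string, prefixLength = 6): string {
  const hash = createHash("sha256").update(key).digest("hex");
  const [first, ...rest] = key.split("/");
  // "logs/2024-01-15-...json" -> "logs/a1b2c3-2024-01-15-...json"
  return `${first}/${hash.slice(0, prefixLength)}-${rest.join("/")}`;
}
```

Because the hash is derived from the key, readers who know the original name can recompute the stored key; if you instead use a random value, you must record the generated key somewhere.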
Better: Purpose-Organized Hierarchies
users/{user-id}/profile/avatar.png
users/{user-id}/documents/{document-id}.pdf
products/{product-id}/images/main.jpg
This structure supports intuitive organization while distributing load by user or product ID.
Object storage's HTTP/REST interface is one of its defining characteristics. Unlike block storage's SCSI commands or file storage's NFS RPC calls, object storage uses standard HTTP verbs that any programming language and any network library can speak. This universality is intentional—it makes object storage accessible from anywhere, by anything.
Core Operations Map to HTTP Verbs
| Operation | HTTP Verb | Path Pattern | Purpose |
|---|---|---|---|
| Put Object | PUT | /bucket/object-key | Upload object content |
| Get Object | GET | /bucket/object-key | Download object content |
| Head Object | HEAD | /bucket/object-key | Get metadata only (no body) |
| Delete Object | DELETE | /bucket/object-key | Remove object |
| List Objects | GET | /bucket?prefix=xxx | Enumerate objects in bucket |
| Copy Object | PUT + header | /dest-bucket/dest-key | Server-side copy |
| Multipart Init | POST | /bucket/key?uploads | Start multipart upload |
| Upload Part | PUT | /bucket/key?partNumber=N&uploadId=X | Upload part of multipart |
| Complete Multipart | POST | /bucket/key?uploadId=X | Finalize multipart upload |
Request Authentication: The Signature Process
HTTP requests to object storage must be authenticated. AWS S3 uses Signature Version 4 (SigV4), a cryptographic signing process that:

- Builds a canonical representation of the request (method, path, query string, signed headers, payload hash)
- Constructs a "string to sign" containing a timestamp and a credential scope (date/region/service)
- Derives a signing key from your secret access key through a chain of HMAC-SHA256 operations
- Computes the final signature and attaches it via the Authorization header (or query string)
This signature proves you hold valid credentials without transmitting your secret key. The signature includes a timestamp, preventing replay attacks (requests expire after ~15 minutes).
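The key-derivation chain at the heart of SigV4 can be sketched with standard HMAC primitives (this follows AWS's published algorithm; the credential values used below are placeholders, and a real signer must also build the canonical request and string to sign):

```typescript
import { createHmac } from "node:crypto";

function hmac(key: Buffer | string, data: string): Buffer {
  return createHmac("sha256", key).update(data).digest();
}

// secretKey -> kDate -> kRegion -> kService -> kSigning
// Scoping the key to date/region/service means a leaked signing key is
// only useful for that one scope, never the raw secret.
function deriveSigningKey(secretKey: string, date: string, region: string, service: string): Buffer {
  const kDate = hmac("AWS4" + secretKey, date); // date as YYYYMMDD
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  return hmac(kService, "aws4_request");
}

// The final signature is HMAC-SHA256(signingKey, stringToSign), hex-encoded.
function sign(signingKey: Buffer, stringToSign: string): string {
  return hmac(signingKey, stringToSign).toString("hex");
}
```

In practice you never implement this yourself; SDKs do it transparently, but knowing the shape of the chain explains why signatures are scoped to a single day, region, and service.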
Pre-Signed URLs
A powerful feature is the ability to generate pre-signed URLs—temporary, shareable links that grant time-limited access to private objects. The signature is embedded in the URL:
```
https://my-bucket.s3.region.amazonaws.com/my-object
  ?X-Amz-Algorithm=AWS4-HMAC-SHA256
  &X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20240115/region/s3/aws4_request
  &X-Amz-Date=20240115T120000Z
  &X-Amz-Expires=3600
  &X-Amz-SignedHeaders=host
  &X-Amz-Signature=a7f3d9b2c1e4...
```
Anyone with this URL can access the object for the specified duration (3600 seconds here), without needing AWS credentials.
Pre-signed URLs enable direct browser-to-S3 uploads, bypassing your server entirely. Your backend generates a signed PUT URL; the frontend uploads directly to object storage. This reduces server bandwidth and latency while offloading the heavy lifting to the cloud provider's infrastructure.
```typescript
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3Client = new S3Client({ region: "us-west-2" });

// Generate a pre-signed URL for downloading (GET)
async function getDownloadUrl(bucket: string, key: string): Promise<string> {
  const command = new GetObjectCommand({ Bucket: bucket, Key: key });
  const signedUrl = await getSignedUrl(s3Client, command, { expiresIn: 3600 });
  return signedUrl; // Valid for 1 hour
}

// Generate a pre-signed URL for uploading (PUT)
async function getUploadUrl(bucket: string, key: string, contentType: string): Promise<string> {
  const command = new PutObjectCommand({ Bucket: bucket, Key: key, ContentType: contentType });
  const signedUrl = await getSignedUrl(s3Client, command, { expiresIn: 900 });
  return signedUrl; // Valid for 15 minutes
}

// Frontend can now PUT directly to S3:
// fetch(uploadUrl, { method: "PUT", body: file, headers: { "Content-Type": contentType } });
```

Multipart Uploads for Large Objects
For objects larger than ~100MB (or when network reliability is a concern), multipart upload is essential. Instead of uploading a 5GB file in one request (which fails completely if the connection drops), you:

1. Initiate the upload (CreateMultipartUpload) and receive an UploadId
2. Upload parts independently (each 5MB-5GB, up to 10,000 parts), in parallel and with per-part retries
3. Complete the upload by sending the ordered list of part numbers and ETags; the service assembles the final object
Benefits of Multipart:

- A failed part can be retried individually without restarting the whole transfer
- Parts upload in parallel, multiplying effective throughput
- You can begin uploading before the total size is known
- Uploads can be paused and resumed
Multipart Part Size Strategy: For a 10GB file, finding the optimal part size involves balancing:

- Part count limits: at most 10,000 parts, and every part except the last must be at least 5MB
- Request overhead: more parts mean more HTTP requests (and request charges)
- Retry cost: larger parts waste more transferred bytes when a part fails mid-upload
- Parallelism: more parts allow more concurrent streams, up to a point of diminishing returns
Typically, 8-64MB parts work well for most scenarios.
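A part-size chooser that respects S3's documented limits (5MB minimum part size, 10,000-part maximum) might look like the sketch below; the 16MB preferred size is our assumption, not an AWS recommendation:

```typescript
const MIN_PART_SIZE = 5 * 1024 * 1024; // 5MB documented minimum (except last part)
const MAX_PARTS = 10_000;              // documented maximum part count

// Pick a part size: use the preferred size unless the object is so large
// that it would exceed 10,000 parts, in which case grow the part size
// just enough to fit.
function choosePartSize(objectSize: number, preferred = 16 * 1024 * 1024): number {
  const floor = Math.ceil(objectSize / MAX_PARTS);
  return Math.max(MIN_PART_SIZE, preferred, floor);
}

function partCount(objectSize: number, partSize: number): number {
  return Math.ceil(objectSize / partSize);
}
```

For a 10GB object this yields the 16MB preferred size (640 parts); only past roughly 156GB does the 10,000-part cap force parts to grow.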
Metadata is often overlooked, but it's the key to building intelligent, efficient systems on top of object storage. While the object body is an opaque blob, metadata makes objects searchable, organizable, and self-describing.
System Metadata vs User Metadata
System metadata is managed by the storage service and includes properties you would expect: content-type, content-length, last-modified timestamp, ETag hash, storage class, and encryption status.
User metadata (sometimes called custom metadata) is entirely user-defined. In AWS S3, user metadata keys must be prefixed with x-amz-meta-. You can store any key-value pairs that help your application:
x-amz-meta-project: "marketing-campaign-q1"
x-amz-meta-uploader: "user-12345"
x-amz-meta-source-system: "image-processor-v2"
x-amz-meta-processing-state: "thumbnailed"
| Provider | Max Metadata Size | Max Key Length | Max Value Length |
|---|---|---|---|
| AWS S3 | 2 KB total | No explicit limit | No explicit limit |
| Google Cloud Storage | 8 KB total | 1024 bytes | No explicit limit within total |
| Azure Blob Storage | 8 KB total per blob | No explicit limit | No explicit limit |
Metadata is returned with HEAD and GET requests, but you cannot query objects by metadata in object storage. If you need to find "all objects where x-amz-meta-status = processed", you must maintain a separate index (DynamoDB, PostgreSQL, Elasticsearch). Object storage is not a database; it won't efficiently query by arbitrary metadata fields.
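The external-index pattern can be sketched with an in-memory map standing in for DynamoDB, PostgreSQL, or Elasticsearch (all names here are illustrative):

```typescript
type Metadata = Record<string, string>;

// Inverted index: "field=value" -> set of object keys. In production this
// would be a database table or search index updated on every PUT.
const metadataIndex = new Map<string, Set<string>>();

function indexObject(key: string, metadata: Metadata): void {
  for (const [field, value] of Object.entries(metadata)) {
    const entry = `${field}=${value}`;
    if (!metadataIndex.has(entry)) metadataIndex.set(entry, new Set());
    metadataIndex.get(entry)!.add(key);
  }
}

// Answers "all objects where x-amz-meta-status = processed" from the
// index, instead of listing and HEAD-ing every object in the bucket.
function findByMetadata(field: string, value: string): string[] {
  return [...(metadataIndex.get(`${field}=${value}`) ?? [])].sort();
}
```

The operational burden of this pattern is keeping the index consistent with the bucket, which is commonly done by updating the index from object-created/object-deleted event notifications.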
Practical Metadata Patterns
Pattern 1: Processing Pipeline State Track an object's processing status without a separate database:
x-amz-meta-state: "uploaded" → "processing" → "ready" → "archived"
x-amz-meta-processor-version: "2.1.4"
x-amz-meta-processed-at: "2024-01-15T14:30:00Z"
Workers read metadata with HEAD, process if needed, then copy the object with updated metadata.
Pattern 2: Application Context Attach business context for debugging and auditing:
x-amz-meta-request-id: "req-abc123"
x-amz-meta-user-agent: "iOS-App/3.2.1"
x-amz-meta-feature-flag: "new-upload-flow"
Pattern 3: Content Management Organize content with descriptive metadata:
x-amz-meta-title: "Q4 Sales Report"
x-amz-meta-author: "finance-team"
x-amz-meta-department: "sales"
x-amz-meta-confidentiality: "internal"
Tags vs Metadata: Know the Difference
AWS S3 offers both metadata and tags. They serve different purposes:
| Aspect | Metadata | Tags |
|---|---|---|
| Set when | Object creation/copy only | Anytime via separate API |
| Mutable | No (requires object copy) | Yes (via PUT tagging) |
| Returned with | GET/HEAD requests | Separate GET tagging API |
| Limit | 2 KB total | Up to 10 tags per object |
| Use case | Static object description | Dynamic classification, lifecycle policies, billing allocation |
Tags are ideal for categorization that changes (project assignments, lifecycle stage) and for triggering lifecycle policies. Metadata is better for immutable descriptions set at upload time.
Object storage supports versioning—maintaining a complete history of every version of every object. When versioning is enabled on a bucket, each PUT operation creates a new version rather than replacing the existing object. This capability is fundamental for data protection, compliance, and implementing rollback mechanisms.
How Versioning Works
When versioning is enabled:

- Every PUT creates a new version with a unique, system-generated version ID
- GET without a version ID returns the most recent version
- DELETE does not remove data; it inserts a delete marker that becomes the "current" version
- Any historical version remains retrievable (and individually deletable) by its version ID
```
Bucket: my-bucket (versioning enabled)
Key:    config/app-settings.json

Version History:
┌─────────────────┬─────────────────────┬───────┬─────────────────┐
│ Version ID      │ Timestamp           │ Size  │ Status          │
├─────────────────┼─────────────────────┼───────┼─────────────────┤
│ vrsn_001aBcDeF  │ 2024-01-10 09:00:00 │ 1.2KB │ Initial upload  │
│ vrsn_002GhIjKl  │ 2024-01-12 14:30:00 │ 1.3KB │                 │
│ vrsn_003MnOpQr  │ 2024-01-15 11:15:00 │ 1.4KB │ Current version │
│ (delete marker) │ 2024-01-16 08:00:00 │ -     │ Deleted         │
└─────────────────┴─────────────────────┴───────┴─────────────────┘

GET request (no version):     → Returns 404 Not Found (object appears deleted)
GET request (vrsn_003MnOpQr): → Returns the 1.4KB version
GET request (vrsn_001aBcDeF): → Returns the 1.2KB original version
```

AWS S3 supports MFA Delete, which requires multi-factor authentication to permanently delete object versions or change versioning state. This prevents accidental or malicious permanent data loss, even if credentials are compromised. MFA Delete is a critical protection for compliance-sensitive data.
Versioning States
A bucket can be in three versioning states:

- Unversioned (the default): objects have a null version ID; each PUT overwrites and each DELETE removes
- Enabled: every PUT creates a new version; DELETE adds a delete marker
- Suspended: new PUTs receive the null version ID (and overwrite each other), but all previously created versions are preserved
Important: You cannot disable versioning once enabled—only suspend it. Historical versions persist even when suspended. This is intentional: it prevents data loss through configuration changes.
Cost Implications of Versioning
Versioning isn't free—you pay for storage of every version:

- Each noncurrent version is billed at its full size, in whatever storage class it occupies
- Delete markers themselves are negligible, but the versions hidden behind them keep accruing charges
- Frequently updated objects multiply storage costs silently
Example: A 10MB file updated daily for a year:

- 365 versions × 10MB ≈ 3.6GB of stored data
- You pay roughly 365× the single-version cost for what appears to be one 10MB file
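The growth is easy to check with back-of-envelope arithmetic; this helper is illustrative, and real bills also depend on storage class, per-GB pricing, and lifecycle transitions:

```typescript
// Total GB stored when every overwrite is kept as a version and none
// are expired: versions accumulate linearly with update frequency.
function versionedStorageGB(objectSizeMB: number, updatesPerDay: number, days: number): number {
  const versions = updatesPerDay * days;
  return (versions * objectSizeMB) / 1024;
}

// One 10MB file, one update per day, for a year.
const gbStored = versionedStorageGB(10, 1, 365); // ≈ 3.56GB
```

The same object updated hourly instead of daily would retain roughly 85GB after a year, which is why lifecycle rules for noncurrent versions are not optional at scale.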
To manage costs while preserving history, use lifecycle rules that transition noncurrent versions to cheaper storage classes and eventually expire them:
```json
{
  "Rules": [
    {
      "ID": "ManageVersionHistory",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" },
        { "NoncurrentDays": 90, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 365 }
    }
  ]
}
```

The result of this policy:

- Current version: always in STANDARD storage
- Versions 30-90 days old: move to STANDARD_IA (cheaper)
- Versions 90-365 days old: move to GLACIER (cheapest)
- Versions older than 365 days: permanently deleted

For regulatory compliance (SEC, FINRA, healthcare) or critical data protection, Object Lock provides WORM (Write-Once-Read-Many) capability. Once an object is locked, it cannot be deleted or modified until the lock expires—not even by the root account owner.
Object Lock Modes
Governance Mode: Objects are locked, but users with specific IAM permissions can override the lock. This provides protection against accidental deletion while allowing authorized exceptional actions.
Compliance Mode: Objects are locked and cannot be deleted by anyone, including the root account, until the retention period expires. This satisfies regulatory requirements for immutable records.
Legal Hold: A separate flag that prevents deletion regardless of retention settings. Used when objects are subject to legal discovery or investigation. Legal holds must be explicitly removed.
| Aspect | Governance Mode | Compliance Mode |
|---|---|---|
| Delete prevention | Yes | Yes |
| Override possible | Yes (with permissions) | No (not even root) |
| Shorten retention | Yes (with permissions) | No |
| Extend retention | Yes | Yes |
| Use case | Accidental protection | Regulatory compliance |
| Recovery from misconfiguration | Possible | Must wait for expiration |
Compliance mode locks are truly immutable. If you accidentally set a 10-year retention on a petabyte of data, you will pay for 10 years of storage with no override possible. Test thoroughly in governance mode before deploying compliance mode. There is no undo.
Implementation Considerations
- Versioning is required: Object Lock only works with versioning enabled (locks apply to specific versions)
- Default retention: you can set a bucket-level default retention so every uploaded object inherits the lock
- Retention can extend but not shorten: in compliance mode, you can extend retention periods but never reduce them
- Legal hold independence: legal holds are separate from retention; an object can have both, and both must be removed/expired before deletion
- Cost awareness: locked versions cannot be deleted until retention expires, so plan storage costs for the full retention period
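The "extend but never shorten" rule can be expressed as a small policy check; the types and the `hasBypassPermission` flag (modeling the s3:BypassGovernanceRetention permission) are illustrative:

```typescript
type LockMode = "GOVERNANCE" | "COMPLIANCE";

// Decide whether a proposed retention change is permitted under the
// semantics described above: extensions are always allowed; shortening
// is allowed only in governance mode with the bypass permission.
function canChangeRetention(
  mode: LockMode,
  currentRetainUntil: Date,
  proposedRetainUntil: Date,
  hasBypassPermission: boolean
): boolean {
  const isExtension = proposedRetainUntil > currentRetainUntil;
  if (isExtension) return true;
  if (mode === "GOVERNANCE") return hasBypassPermission;
  return false; // COMPLIANCE: no shortening, not even for root
}
```

Encoding the rule this way makes the asymmetry obvious: a misconfigured retention date can always be made stricter, but in compliance mode it can never be relaxed.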
Object lock is essential for industries with regulatory data retention requirements. Financial services, healthcare, and government applications often require provable, tamper-proof audit trails that only WORM storage can provide.
We've conducted a comprehensive exploration of the object storage model. Let's consolidate the key insights:

- An object is three things: an opaque data payload, descriptive metadata, and a unique key within a bucket's flat namespace
- "Folders" are an illusion produced by prefix and delimiter queries over keys; key design directly affects load distribution and listing cost
- All access flows through HTTP verbs, authenticated by request signing; pre-signed URLs delegate time-limited access without sharing credentials
- Metadata makes objects self-describing but is not queryable; external indexes fill that gap
- Versioning protects against loss and enables rollback, but multiplies storage costs without lifecycle rules
- Object Lock provides WORM guarantees, and compliance mode is truly irreversible
What's next:
Now that we understand the object storage model in depth, we must confront one of its most challenging aspects: eventual consistency. The next page explores why object storage exhibits consistency behaviors that differ from traditional storage, how to design around eventual consistency, and how modern cloud providers have evolved to offer stronger consistency guarantees. This understanding is critical for building correct, reliable systems on object storage.
You now possess a deep understanding of how object storage systems model and organize data. From the anatomy of individual objects to bucket namespaces, HTTP access patterns, metadata architecture, versioning, and WORM compliance—you can now reason about object storage as a well-understood engineering component rather than an opaque cloud service. Next, we tackle the consistency challenge.