Object storage's seemingly magical ability to store unlimited data with extreme durability comes at a cost that isn't immediately visible: consistency guarantees. Unlike a local filesystem where a write followed by a read always returns what you just wrote, object storage operates in a distributed environment where this simple expectation becomes surprisingly complex.
For years, understanding eventual consistency was mandatory knowledge for any engineer working with object storage. A write could appear to "disappear," reads could return stale data, and list operations could miss recently uploaded objects. While modern object storage services like AWS S3 have evolved to provide strong consistency, understanding the consistency landscape remains critical—for historical systems, alternative providers, and the fundamental distributed systems reasoning that underlies all of this.
By the end of this page, you will understand why distributed object storage historically exhibited eventual consistency, the specific consistency behaviors you may encounter, how to design systems that remain correct despite consistency delays, and how modern providers have achieved strong consistency without sacrificing scalability. This knowledge is essential for building reliable distributed systems.
Before diving into object storage specifics, we must establish a precise vocabulary for consistency. These terms have specific meanings in distributed systems that differ from everyday usage.
What is Consistency?
Consistency, in the distributed systems sense, refers to the agreement between replicas about the current state of data. When data is replicated across multiple nodes (for durability and availability), the question is: when one node receives a write, how quickly must other nodes reflect that write, and what guarantees exist about what reads will return?
Strong Consistency (Linearizability)
Strong consistency provides the simplest mental model: once a write completes, all subsequent reads from any node return that write or a later value. The system behaves as if there's a single, global timeline of operations. This is what you'd expect from a local filesystem or a traditional database.
Characteristics:
- Reads always reflect the most recent completed write, regardless of which node serves them
- Operations appear to occur on a single, global timeline
- Typically implemented with synchronous replication or consensus protocols, which adds latency
- May reject requests during network partitions to preserve correctness
Eventual Consistency
Eventual consistency relaxes the timing guarantees: given sufficient time without new writes, all replicas will eventually converge to the same value. But in the interim, different readers may see different values, reads may return stale data, and the ordering of operations may appear inconsistent across observers.
Characteristics:
- Reads may return stale data until replication catches up
- Different clients may briefly observe different values for the same key
- Asynchronous replication keeps write latency low
- The system remains available during network partitions
- All replicas converge to the same value once writes stop
| Property | Strong Consistency | Eventual Consistency |
|---|---|---|
| Read-after-write | Guaranteed immediately | May see stale data |
| Cross-client visibility | Immediate | Delayed propagation |
| Latency | Higher (synchronous replication) | Lower (asynchronous replication) |
| Availability during partitions | May reject requests | Continues operating |
| Implementation complexity | High (consensus protocols) | Lower |
| Mental model | Single timeline | Multiple eventual-converging timelines |
Object storage's historical eventual consistency is a direct consequence of the CAP theorem. In a system that must remain Available during network Partitions (AP system), you cannot simultaneously have strong Consistency. Object storage prioritized durability and availability over immediate consistency—a reasonable trade-off for its design goals.
Variations Between Strong and Eventual
Real-world systems often offer intermediate consistency levels:
Read-Your-Writes Consistency: A client always sees their own writes immediately, even if other clients experience delays. This makes single-client workflows predictable.
Causal Consistency: Operations that are causally related are seen in the same order by all observers. If write A caused write B, no observer sees B before A.
Session Consistency: Within a client session, reads are consistent with that session's writes. Different sessions may see different views.
Monotonic Reads: Once a client reads a value, subsequent reads will never return an older value. No "going backwards" in time.
Understanding these variations helps you reason about what guarantees your application actually needs.
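To make one of these intermediate levels concrete, here is a minimal sketch of a client-side wrapper that enforces monotonic reads. This is an illustration, not any particular SDK's API: the versioned store interface and the `MonotonicReader` class are assumptions introduced here, and real systems often achieve the same effect with session tokens.

```typescript
interface Versioned<T> {
  version: number; // monotonically increasing write counter
  value: T;
}

// Wraps any read function and enforces monotonic reads: once a version
// has been observed, an older version is never returned to the caller.
class MonotonicReader<T> {
  private highestSeen = -1;
  private lastValue: Versioned<T> | null = null;

  constructor(private read: () => Promise<Versioned<T>>) {}

  async get(): Promise<Versioned<T>> {
    const result = await this.read();
    if (result.version >= this.highestSeen) {
      this.highestSeen = result.version;
      this.lastValue = result;
      return result;
    }
    // A lagging replica answered; fall back to the newest value already seen.
    return this.lastValue!;
  }
}
```

If a stale replica answers after a fresher one, this client simply re-serves the newest value it has observed, so its view of the data never moves backwards in time.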
To understand why consistency in object storage is nuanced, let's trace the evolution of AWS S3's consistency model as a representative example.
Pre-2020: Eventually Consistent Reads
For most of S3's history (2006-2020), the consistency model was:
- Read-after-write consistency for PUTs of new objects (with a caveat: a GET issued for the key before the PUT could negate this guarantee)
- Eventual consistency for overwrite PUTs
- Eventual consistency for DELETEs
- Eventual consistency for LIST operations
This meant developers had to build applications that tolerated these behaviors, often adding delays, retries, or out-of-band synchronization.
```
Scenario 1: New Object (Read-After-Write Consistent)
─────────────────────────────────────────────────────
Timeline:
  T1: PUT /bucket/new-key → 200 OK
  T2: GET /bucket/new-key → Returns new content ✓

Scenario 2: Overwrite (Eventually Consistent)
─────────────────────────────────────────────────────
Timeline:
  T0: Object exists with content "A"
  T1: PUT /bucket/key (content "B") → 200 OK
  T2: GET /bucket/key → May return "A" or "B" ⚠️
  T3: (some time passes)
  T4: GET /bucket/key → Returns "B" ✓

Scenario 3: Delete (Eventually Consistent)
─────────────────────────────────────────────────────
Timeline:
  T0: Object exists
  T1: DELETE /bucket/key → 204 No Content
  T2: GET /bucket/key → May still return object ⚠️
  T3: (some time passes)
  T4: GET /bucket/key → 404 Not Found ✓

Scenario 4: List After PUT (Eventually Consistent)
─────────────────────────────────────────────────────
Timeline:
  T1: PUT /bucket/new-object → 200 OK
  T2: LIST /bucket → May not include new-object ⚠️
  T3: (some time passes)
  T4: LIST /bucket → Includes new-object ✓
```
December 2020: S3 Achieves Strong Consistency
In a landmark announcement, AWS declared that S3 now provides strong read-after-write consistency for all operations:
- GETs immediately reflect successful PUTs, for both new objects and overwrites
- GETs immediately reflect successful DELETEs
- LIST operations immediately include new objects and exclude deleted ones
This was achieved without any price increase or performance degradation—a significant engineering accomplishment that simplified application development considerably.
How Did S3 Achieve This?
The technical details aren't fully public, but the general approach is understood to involve:
- Integrating the metadata layer's caches with the replication protocol, so a cache never serves an entry that an acknowledged write has invalidated
- Ensuring a write has fully propagated (or stale caches have been invalidated) before the PUT is acknowledged to the client
- Coordinating per object rather than running global consensus, keeping the overhead small at S3's scale
The key insight is that read-after-write consistency doesn't require traditional consensus for every operation. By carefully tracking which reads depend on which writes and ensuring propagation completes before acknowledgment, strong consistency is achievable without sacrificing S3's massive scale.
While S3 (since December 2020), Google Cloud Storage, and Azure Blob Storage offer strong consistency, many alternatives don't. Self-hosted solutions (MinIO, Ceph, Swift), older cloud regions, or S3-compatible services may still exhibit eventual consistency. Always verify your specific provider's guarantees.
Even with strong consistency now available in major cloud providers, understanding eventual consistency implications remains valuable. You may work with systems that still exhibit it, and the design patterns developed for eventual consistency are applicable to many distributed systems scenarios.
Real-World Failures Caused by Eventual Consistency
Here are concrete failure modes that engineers encountered before strong consistency: workers fetching objects that "didn't exist yet," readers receiving stale content after an overwrite, deleted objects briefly reappearing, and list operations silently missing recent uploads.
Example: Image Processing Pipeline Failure
Consider an image upload service with eventual consistency:
1. User uploads image → PUT /images/photo.jpg → 200 OK
2. Server queues processing job with message: "Process /images/photo.jpg"
3. Worker receives job, does GET /images/photo.jpg → 404 Not Found (!)
4. Worker marks job as failed (or image as missing)
5. User never gets their processed image
This wasn't a bug in the application—it was correct code that made reasonable assumptions. But those assumptions were invalidated by eventual consistency. The object existed (the PUT succeeded), but the worker's GET happened before consistency propagated.
Even with strong consistency guarantees, designing for eventual consistency makes your system more resilient. Network issues, cross-region replication, and caching layers can all introduce consistency delays. Treating consistency as a spectrum rather than a binary prepares you for real-world conditions.
The Read-After-HEAD Pitfall
A subtle issue: even with strong consistency, there's a time-of-check-to-time-of-use (TOCTOU) problem:
1. Client A: HEAD /object → ETag: "abc123"
2. Client B: PUT /object (new content) → 200 OK
3. Client A: GET /object (expecting ETag "abc123") → Gets new content!
The HEAD and GET are individually consistent, but between them, the world changed. This isn't eventual consistency—it's concurrent modification. Solutions include:
- Conditional requests: GET /object with If-Match: "abc123" fails (412 Precondition Failed) if the object changed
- Object versioning: read a specific, immutable version ID rather than "the latest"

When working with systems that exhibit eventual consistency (or when building defensively), several design patterns help ensure correctness:
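Before moving to the broader patterns, here is a quick sketch of the conditional-request defense. The `getIfUnchanged` helper and the injectable fetch-like function are assumptions introduced for illustration and testability; the `If-Match` header and the 412 Precondition Failed status are standard HTTP conditional-request semantics.

```typescript
type FetchLike = (url: string, init?: { headers?: Record<string, string> }) =>
  Promise<{ status: number; text(): Promise<string> }>;

// Fetch an object only if it still carries the ETag we observed earlier.
// A 412 Precondition Failed means the object changed between our check
// and this GET: the TOCTOU race is made visible instead of hidden.
async function getIfUnchanged(
  url: string,
  expectedEtag: string,
  fetchFn: FetchLike
): Promise<{ modified: boolean; body: string | null }> {
  const response = await fetchFn(url, { headers: { "If-Match": expectedEtag } });
  if (response.status === 412) {
    return { modified: true, body: null }; // object changed; caller decides what to do next
  }
  return { modified: false, body: await response.text() };
}
```

The caller can then re-HEAD the object, retry, or surface a conflict, but it never silently processes content it didn't expect.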
Pattern 1: Retry with Backoff
The simplest approach: if a read fails or returns unexpected results, wait and retry. This gives consistency time to propagate.
```typescript
import { GetObjectCommand, GetObjectCommandOutput, S3Client } from "@aws-sdk/client-s3";

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the object's state matches expectations, with exponential backoff.
async function getWithRetry(
  s3: S3Client, bucket: string, key: string,
  expectedExists: boolean, maxRetries = 5
): Promise<GetObjectCommandOutput | null> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const result = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      if (expectedExists) return result; // object visible, as expected
    } catch (err) {
      if ((err as Error).name !== "NoSuchKey") throw err; // real failure, not consistency lag
      if (!expectedExists) return null; // object gone, as expected (e.g. after DELETE)
    }
    await sleep(100 * 2 ** attempt); // exponential backoff: 100ms, 200ms, 400ms, ...
  }
  throw new Error("Consistency timeout: object state unexpected");
}
```
This approach is simple but increases latency and may mask other bugs (you're retrying assuming eventual consistency when the real issue might be a logic error).
Pattern 2: Pass Data Instead of References
Rather than passing object keys in messages and having workers fetch objects, include the actual data in the message (if small enough):
Eventually Consistent (Risky):
{"type": "process_image", "key": "/images/photo.jpg"}
Worker fetches the object—but it might not exist yet.
Fully Consistent (Safe for small data):
{"type": "process_metadata", "data": {"width": 1920, "height": 1080, "format": "jpeg"}}
No fetch required; data is self-contained.
For larger data, you can include a hash and retry until the fetched object matches the expected hash.
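That hash-check loop can be sketched as follows. The `fetchObject` callback and the retry limits are assumptions introduced here so the logic is storage-agnostic and testable; the hashing uses Node's built-in crypto module.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): string =>
  createHash("sha256").update(data).digest("hex");

// Re-fetch until the object's content matches the hash carried in the message,
// so a stale read (old version, or not-yet-propagated write) is detected and retried.
async function fetchUntilHashMatches(
  expectedHash: string,
  fetchObject: () => Promise<Buffer>,
  maxRetries = 5,
  baseDelayMs = 100
): Promise<Buffer> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const content = await fetchObject();
    if (sha256(content) === expectedHash) return content; // fresh copy confirmed
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
  throw new Error("Object content never matched expected hash");
}
```

The message stays small (key plus hash), yet the worker gets the same guarantee as embedding the data: it never processes the wrong bytes.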
Pattern 3: Use an External Consistent Store
Maintain a separate database (DynamoDB, PostgreSQL) as the source of truth for object existence and metadata. The flow becomes:
1. Record the upload in the database with status "pending"
2. Upload the object to S3
3. Update the database record to status "ready" (or "failed" on error)
4. Workers consult the database, not S3 listings, to decide what to process
This pattern decouples "object existence" (database says it exists) from "object availability" (S3 has propagated it). The database provides strong consistency while S3 provides the bulk storage.
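The worker side of this pattern can be sketched as a poll against the registry before touching storage at all. This is a sketch under stated assumptions: the injectable `getStatus` lookup stands in for a DynamoDB query, and the status values mirror the pending/ready/failed states described above.

```typescript
type UploadStatus = "pending" | "ready" | "failed" | "missing";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Wait until the consistent store confirms the object is ready.
// Only then is it safe for a worker to GET the object from storage.
async function waitUntilReady(
  getStatus: () => Promise<UploadStatus>,
  maxAttempts = 10,
  delayMs = 200
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "ready") return; // upload confirmed by the source of truth
    if (status === "failed") throw new Error("Upload failed; do not process");
    await sleep(delayMs); // "pending" or "missing": record may still be in flight
  }
  throw new Error("Timed out waiting for object to become ready");
}
```

Because the database answer is strongly consistent, a "ready" status means the upload completed, regardless of how far S3's internal propagation has gotten.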
Pattern 4: Immutable or Content-Addressed Keys
Never overwrite objects; write each new version to a new key:
- Versioned keys: /images/photo-v2.jpg. Eliminates overwrite consistency issues.
- Content-addressed keys: /blobs/sha256-a1b2c3.... If you have the key, the content is guaranteed.
The upload side of Pattern 3 in TypeScript:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

interface UploadRecord {
  objectKey: string;
  uploadedAt: string;
  status: 'pending' | 'ready' | 'failed';
  contentHash: string;
}

/**
 * Uploads to S3 with DynamoDB coordination for consistency.
 * The database record confirms the upload completed successfully.
 */
async function uploadWithConsistencyTracking(
  bucket: string,
  key: string,
  content: Buffer,
  contentHash: string
): Promise<void> {
  const s3 = new S3Client({});
  const ddb = new DynamoDBClient({});

  // Step 1: Record intent in DynamoDB (status: pending)
  await ddb.send(new PutItemCommand({
    TableName: "ObjectRegistry",
    Item: {
      pk: { S: `OBJECT#${bucket}#${key}` },
      status: { S: "pending" },
      contentHash: { S: contentHash },
      createdAt: { S: new Date().toISOString() }
    }
  }));

  try {
    // Step 2: Upload to S3
    await s3.send(new PutObjectCommand({
      Bucket: bucket,
      Key: key,
      Body: content,
      Metadata: { "content-hash": contentHash }
    }));

    // Step 3: Update DynamoDB to confirm (status: ready)
    await ddb.send(new PutItemCommand({
      TableName: "ObjectRegistry",
      Item: {
        pk: { S: `OBJECT#${bucket}#${key}` },
        status: { S: "ready" },
        contentHash: { S: contentHash },
        uploadedAt: { S: new Date().toISOString() }
      }
    }));
  } catch (error) {
    // Step 3 (failure): Mark as failed in DynamoDB
    await ddb.send(new PutItemCommand({
      TableName: "ObjectRegistry",
      Item: {
        pk: { S: `OBJECT#${bucket}#${key}` },
        status: { S: "failed" },
        error: { S: String(error) }
      }
    }));
    throw error;
  }
}

// Workers check DynamoDB for "ready" status before processing
```

Even with strongly consistent object storage, introducing caching layers reintroduces eventual consistency. CDNs, read replicas, and application caches all create windows where clients may receive stale data.
CDN Caching and Object Storage
When you put a CDN (CloudFront, Cloudflare, Akamai) in front of object storage, the CDN caches objects at edge locations worldwide. Now you have a new consistency challenge: after you update or delete an object at the origin, edge caches may keep serving the old copy until their cached entries expire or are purged.
This isn't object storage eventual consistency—it's caching semantics. But the effect is similar: reads return stale data.
| Strategy | How It Works | Trade-offs |
|---|---|---|
| TTL-based expiration | Cache entries expire after fixed time | Simple; stale data window equals TTL |
| Cache invalidation | Explicitly purge cache on update | Complex; invalidation is hard; propagation delays |
| Versioned URLs | Change URL on update: asset-v2.js | Always fresh; requires URL management; old URLs may still be cached |
| Cache-Control headers | Set max-age, no-cache directives | Client-controlled; varies by resource type |
| ETag/If-None-Match | Client checks if cached version is current | Reduces bandwidth; still requires origin round-trip |
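The ETag/If-None-Match row can be sketched as a tiny revalidating cache. The `RevalidatingCache` class and the injectable fetch-like function are assumptions introduced for illustration; the behavior they model (send the stored ETag, reuse the cached body on 304 Not Modified, replace it on 200) is standard HTTP caching.

```typescript
type CacheResponse = { status: number; etag?: string; body?: string };
type Fetcher = (headers: Record<string, string>) => Promise<CacheResponse>;

// A one-entry revalidating cache: sends If-None-Match with the stored ETag;
// on 304 Not Modified it reuses the cached body, on 200 it replaces it.
class RevalidatingCache {
  private etag: string | null = null;
  private body: string | null = null;

  constructor(private fetchFn: Fetcher) {}

  async get(): Promise<string> {
    const headers: Record<string, string> = {};
    if (this.etag) headers["If-None-Match"] = this.etag;
    const res = await this.fetchFn(headers);
    if (res.status === 304 && this.body !== null) return this.body; // cached copy still current
    this.etag = res.etag ?? null;
    this.body = res.body ?? "";
    return this.body;
  }
}
```

This saves bandwidth on unchanged objects, but as the table notes, every read still pays an origin round-trip to learn whether the cached copy is current.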
Versioned URL Pattern
The most reliable cache-consistency pattern is versioned or content-addressed URLs:
/assets/app-v1.2.3.js (version in filename)
/assets/app.a1b2c3d4.js (content hash in filename)
/assets/app.js?v=1705329842 (version in query string)
When content changes, the URL changes. Caches serve old URLs until TTL expires (fine—no one requests them). New URLs fetch fresh content. This pattern is universal in modern frontend deployment.
The trade-off: you need a mechanism to update references (HTML, manifests) to the new URLs when assets change. Build tools like Webpack, Vite, and Next.js handle this automatically.
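A sketch of how a build step might derive such names, using Node's crypto module. The `hashedAssetName` helper and the eight-character hash prefix are arbitrary choices for illustration, not any particular bundler's convention.

```typescript
import { createHash } from "node:crypto";

// Derive a content-addressed asset name: "app.js" + contents -> "app.<hash8>.js".
// Any change to the contents changes the URL, so a cache can never serve
// a stale body for a "new" URL.
function hashedAssetName(filename: string, contents: Buffer): string {
  const hash = createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const dot = filename.lastIndexOf(".");
  if (dot === -1) return `${filename}.${hash}`;
  return `${filename.slice(0, dot)}.${hash}${filename.slice(dot)}`;
}
```

Because the name is a pure function of the content, rebuilding unchanged files yields unchanged URLs, and caches keep serving them indefinitely with long TTLs.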
There are only two hard things in Computer Science: cache invalidation and naming things (and off-by-one errors). CDN cache invalidation is eventually consistent itself—purge requests propagate asynchronously to edge nodes. A purge might take seconds to minutes to reach all edges. Don't rely on invalidation for time-critical updates.
Application-Level Caching
Your application may add its own caching layers: in-process memory caches, distributed caches like Redis or Memcached, and HTTP caches controlled by response headers.
Each layer multiplies consistency complexity. A request might traverse: client browser cache → CDN edge → load balancer → application in-memory cache → Redis → S3. Any layer can return stale data.
Best practices for multi-layer caching:
- Use versioned or content-addressed URLs for immutable assets, so staleness is impossible by construction
- Keep TTLs short for mutable data; accept that the staleness window equals the longest TTL in the chain
- Don't rely on cache invalidation for time-critical updates; purges propagate asynchronously
- Design consumers to tolerate stale reads rather than assuming freshness
Object storage supports cross-region replication (CRR), copying objects automatically to buckets in other geographic regions. This provides disaster recovery and reduces latency for global users. However, CRR introduces its own consistency characteristics.
Replication Lag Is Inherent
Cross-region replication is always asynchronous. When you PUT an object to a source bucket, the replication to destination buckets happens in the background:
1. PUT to the source bucket (e.g., us-east-1) returns 200 OK
2. The replication engine queues the object for copying
3. Object data is transferred to the destination region (e.g., eu-west-1)
4. The object becomes available in the destination bucket
During steps 2-3, the object exists in us-east-1 but not in eu-west-1. A user in Europe might not see an object that an American user just uploaded.
Designing for Replication Lag
If your application has users in multiple regions reading from region-local buckets:
Option 1: Accept lag for non-critical data For static assets like images, a few seconds of lag is often acceptable. Users rarely notice if an image takes a moment longer to become globally available.
Option 2: Route writes to a single primary region All writes go to one region; reads go to local replicas. Ensures writes are immediately consistent; reads may be slightly behind.
Option 3: Use S3 Replication Time Control (RTC) AWS offers S3 RTC with SLA guaranteeing 99.99% of objects replicate within 15 minutes, with metrics and notifications for tracking. Costs more but provides predictability.
Option 4: Synchronous multi-region writes Use two-phase commit or other coordination to write to all regions synchronously. Increases latency significantly; rarely worth it for object storage.
AWS provides replication metrics: ReplicationLatency (time for object to replicate), BytesPendingReplication, and OperationsPendingReplication. Monitor these to detect replication backlogs before they impact users. Set CloudWatch alarms for replication latency exceeding acceptable thresholds.
```
User uploads to US-East-1:

Timeline:
  T+0ms:   PUT /bucket/image.jpg to us-east-1 → 200 OK
  T+10ms:  Object available in us-east-1 (local read succeeds)
  T+10ms:  Replication event queued
  T+100ms: Object data being transferred to eu-west-1
  T+500ms: PUT completes in eu-west-1
  T+500ms: Object available in eu-west-1 (remote read succeeds)

During T+10ms to T+500ms:
  - US user reads from us-east-1: Sees object ✓
  - EU user reads from eu-west-1: 404 Not Found ✗
  - EU user reads from us-east-1: Sees object ✓ (cross-region latency)

For large objects or network congestion, T+500ms could be T+60000ms (1 minute).
```

Consistency bugs are notoriously hard to reproduce because they depend on timing, load, and distributed system state. Explicit testing strategies are essential.
Chaos Engineering for Consistency
```typescript
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";

/**
 * A wrapper that simulates eventual consistency for testing.
 * In production, use the real client. In tests, use this.
 */
class EventualConsistencySimulator {
  private realClient: S3Client;
  private pendingWrites: Map<string, { content: Buffer, visibleAfter: number }> = new Map();
  private consistencyDelayMs: number;

  constructor(realClient: S3Client, consistencyDelayMs = 1000) {
    this.realClient = realClient;
    this.consistencyDelayMs = consistencyDelayMs;
  }

  async putObject(bucket: string, key: string, content: Buffer): Promise<void> {
    // Write to real storage
    await this.realClient.send(new PutObjectCommand({
      Bucket: bucket,
      Key: key,
      Body: content
    }));

    // But mark as "not yet visible" for simulated delay
    this.pendingWrites.set(`${bucket}/${key}`, {
      content,
      visibleAfter: Date.now() + this.consistencyDelayMs
    });
  }

  async getObject(bucket: string, key: string): Promise<Buffer | null> {
    const pending = this.pendingWrites.get(`${bucket}/${key}`);
    if (pending && Date.now() < pending.visibleAfter) {
      // Simulate reading before consistency propagates
      // Return null (404) or old version depending on simulation mode
      throw new Error("NoSuchKey"); // Simulates eventual consistency
    }

    // Consistency has propagated; clean up and return real data
    this.pendingWrites.delete(`${bucket}/${key}`);
    const response = await this.realClient.send(new GetObjectCommand({
      Bucket: bucket,
      Key: key
    }));
    return Buffer.from(await response.Body!.transformToByteArray());
  }
}

// Test that your application handles eventual consistency:
test("handles delayed object visibility", async () => {
  const simulator = new EventualConsistencySimulator(realS3Client, 2000);
  await simulator.putObject("bucket", "key", Buffer.from("data"));

  // Immediate read should handle the "missing" object gracefully
  await expect(processObjectWithRetry("bucket", "key"))
    .resolves.not.toThrow();
});
```

We've comprehensively explored consistency in object storage—from foundational theory to practical engineering patterns.
Here are the key insights:
- Strong consistency means every read reflects the latest completed write; eventual consistency only guarantees convergence over time
- S3 was eventually consistent for overwrites, deletes, and lists until December 2020, when it gained strong read-after-write consistency for all operations
- Other S3-compatible and self-hosted systems may still be eventually consistent—verify your provider's guarantees
- Defensive patterns (retries with backoff, passing data instead of references, an external consistent store, immutable or content-addressed keys) keep applications correct despite consistency delays
- CDNs, application caches, and cross-region replication reintroduce staleness even on strongly consistent storage
What's next:
Having understood the storage paradigms, the object model, and consistency considerations, we're now ready to explore the practical use cases where object storage excels. The next page examines real-world applications: static asset serving, data lakes and analytics, backup and archival, user-generated content, and more. You'll learn to recognize when object storage is the right solution and how to architect for each use case.
You now possess a deep understanding of consistency in object storage—what it means, why it matters, how it's evolved, and how to design correct systems regardless of consistency guarantees. This knowledge applies not just to object storage but to any distributed system where data is replicated. Next, we'll apply this foundation to real-world object storage use cases.