CQRS at Scale

Level: Advanced

Duration: 90 mins

Topic: CQRS at Scale

3 / 5

Eventual Consistency

The Consistency Trade-off

When you separate the write path from the read path in CQRS, a fundamental reality emerges: data written on one side takes time to appear on the other. This temporal gap—however brief—means your system exhibits eventual consistency rather than immediate consistency.

This isn't a bug; it's an architectural choice with profound implications. Strong consistency provides simpler mental models but limits scalability. Eventual consistency enables massive scale but requires careful handling of the transition window.

The question isn't whether to accept eventual consistency, but how to design systems that embrace it gracefully—ensuring users experience a coherent, trustworthy application despite the underlying asynchrony.

What You Will Learn

This page covers the theory and practice of eventual consistency: understanding consistency guarantees, measuring and minimizing lag, handling read-after-write scenarios, designing UX for asynchronous systems, and implementing compensating patterns when consistency matters.

Understanding Consistency Models

Before diving into practical solutions, we must understand the theoretical landscape of consistency models. These form a spectrum from strongest guarantees to weakest.

Strong Consistency (Linearizability):

Every read returns the most recent write. Once a write completes, all subsequent reads see that value. This is the intuitive model most developers expect from traditional databases.

•Guarantee: Read always returns latest committed write
•Trade-off: Limited scalability, higher latency
•Implementation: Single primary, synchronous replication, or consensus protocols

Causal Consistency:

Operations that are causally related are seen in the same order by all nodes. If A causes B, everyone sees A before B. But concurrent operations may be seen in different orders.

•Guarantee: Cause precedes effect
•Trade-off: Weaker than strong, requires dependency tracking

Eventual Consistency:

If no new writes occur, all replicas will eventually converge to the same value. The system provides no guarantees about when this convergence occurs.

•Guarantee: Given enough time, all reads return the same value
•Trade-off: Temporary inconsistency is possible
•Variation: 'Read-your-writes' ensures you see your own changes

Consistency Model Comparison
Consistency Model	Guarantee	Latency	Availability	Scalability
Strong (Linearizable)	Always latest value	High (sync required)	Lower (CP: may reject requests during a partition)	Limited
Causal	Cause precedes effect	Medium	Higher than strong	Moderate
Session	Your writes visible to you	Low-medium	High	Good
Eventual	Converges over time	Lowest	Highest (AP)	Maximum

CAP Theorem Reminder

The CAP theorem states that a distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at once. Since network partitions are inevitable, the real choice during a partition is between CP (consistent but possibly unavailable) and AP (available but possibly inconsistent). CQRS typically opts for AP on the read side.

The Consistency Window

In CQRS systems, the consistency window is the time between when a write is committed and when it becomes visible in read models. Understanding, measuring, and minimizing this window is critical.

Components of the Consistency Window:

Consistency Window Timeline
Command Received    Event Published    Event Consumed    Projection Updated    Cache Invalidated
      │                    │                   │                    │                    │
      ├────────────────────┼───────────────────┼────────────────────┼────────────────────┤
      │                    │                   │                    │                    │
      │    T1: Command     │    T2: Event      │   T3: Projection   │  T4: Cache Update  │
      │    Processing      │    Publishing     │   Processing       │                    │
      │    (10-50ms)       │    (1-10ms)       │   (10-100ms)       │    (1-5ms)         │
      │                    │                   │                    │                    │
      └────────────────────┴───────────────────┴────────────────────┴────────────────────┘
                                                                                         │
      Total Consistency Window: T1 + T2 + T3 + T4                                        │
      Typical range: 50-500ms, can extend to seconds under load                          │
                                                                                         ▼
                                                                              Data Visible in Reads
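
As a quick check on the timeline, the per-stage estimates can be summed to bound the window when events never wait between stages. The sketch below only does that arithmetic; the `StageRange` type and `windowBounds` helper are illustrative names, not part of any library.

```typescript
// Per-stage latency estimates (ms), copied from the timeline above.
interface StageRange {
  name: string;
  minMs: number;
  maxMs: number;
}

const stages: StageRange[] = [
  { name: "T1 command processing", minMs: 10, maxMs: 50 },
  { name: "T2 event publishing", minMs: 1, maxMs: 10 },
  { name: "T3 projection processing", minMs: 10, maxMs: 100 },
  { name: "T4 cache update", minMs: 1, maxMs: 5 },
];

// Sum the best-case and worst-case figures across all stages.
function windowBounds(ranges: StageRange[]): { minMs: number; maxMs: number } {
  return ranges.reduce(
    (acc, s) => ({ minMs: acc.minMs + s.minMs, maxMs: acc.maxMs + s.maxMs }),
    { minMs: 0, maxMs: 0 }
  );
}

const bounds = windowBounds(stages);
// bounds.minMs is 22 and bounds.maxMs is 165
```

The worst-case sum of 165 ms is well below the 500 ms end of the typical range, so one reading (an interpretation, not something the timeline states) is that the remainder is time events spend waiting between stages, which is the part that grows under load.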

Measuring the Consistency Window:

You can't manage what you don't measure. Implement instrumentation to track consistency lag at each stage:

Consistency Lag Measurement
TypeScript
interface ConsistencyMetrics {
  eventId: string;
  timestamps: {
    commandReceived: number;
    eventPublished: number;
    eventConsumed: number;
    projectionUpdated: number;
    cacheInvalidated: number;
  };
}
 
class ProjectionConsumer {
  async processEvent(event: DomainEvent): Promise<void> {
    const metrics: ConsistencyMetrics = {
      eventId: event.id,
      timestamps: {
        commandReceived: event.metadata.commandReceivedAt,
        eventPublished: event.metadata.publishedAt,
        eventConsumed: Date.now(),
        projectionUpdated: 0,
        cacheInvalidated: 0
      }
    };
    
    // Process projection
    await this.projection.apply(event);
    metrics.timestamps.projectionUpdated = Date.now();
    
    // Invalidate cache
    await this.cacheInvalidator.invalidate(event);
    metrics.timestamps.cacheInvalidated = Date.now();
    
    // Calculate and record lag components
    const publishLag = metrics.timestamps.eventPublished - metrics.timestamps.commandReceived;
    const queueLag = metrics.timestamps.eventConsumed - metrics.timestamps.eventPublished;
    const projectionLag = metrics.timestamps.projectionUpdated - metrics.timestamps.eventConsumed;
    const totalLag = metrics.timestamps.cacheInvalidated - metrics.timestamps.commandReceived;
    
    // Record metrics
    this.metrics.recordHistogram('consistency.publish_lag_ms', publishLag);
    this.metrics.recordHistogram('consistency.queue_lag_ms', queueLag);
    this.metrics.recordHistogram('consistency.projection_lag_ms', projectionLag);
    this.metrics.recordHistogram('consistency.total_lag_ms', totalLag);
    
    // Alert if lag exceeds threshold
    if (totalLag > 1000) {
      this.alerting.warn(`High consistency lag: ${totalLag}ms for event ${event.id}`);
    }
  }
}

Lag Under Load

Consistency lag behaves non-linearly under load: while projections keep up, lag stays close to the processing time, but once event throughput exceeds projection capacity, lag grows without bound. Monitor queue depth and processing rates. Implement backpressure before the system degrades catastrophically.
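
To make the overload case concrete, here is a minimal sketch of a deliberately simplified model: below capacity the backlog stays at zero, above capacity it grows for as long as the overload lasts. The function names and the rates used are assumptions chosen for illustration.

```typescript
// Events waiting in the queue after `seconds` of sustained load.
function backlogAfter(
  arrivalPerSec: number,
  processedPerSec: number,
  seconds: number,
  initialBacklog: number = 0
): number {
  const netPerSec = arrivalPerSec - processedPerSec;
  // The backlog cannot go negative: spare capacity just drains it to zero.
  return Math.max(0, initialBacklog + netPerSec * seconds);
}

// Approximate lag for an event arriving now: it waits for the backlog ahead of it.
function lagSecondsFor(backlog: number, processedPerSec: number): number {
  return backlog / processedPerSec;
}

// A simple backpressure trigger based on the estimated lag.
function shouldApplyBackpressure(
  backlog: number,
  processedPerSec: number,
  maxLagSeconds: number
): boolean {
  return lagSecondsFor(backlog, processedPerSec) > maxLagSeconds;
}

// 1,200 events/s arriving against 1,000 events/s of projection capacity:
const backlog = backlogAfter(1200, 1000, 60); // 12,000 events after one minute
const lagSeconds = lagSecondsFor(backlog, 1000); // 12 seconds of lag
```

In this model a 20% overload sustained for one minute already produces 12 seconds of lag, and the lag keeps climbing until arrivals drop or capacity is added.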

Read-After-Write Consistency

The most common user complaint in eventually consistent systems is the read-after-write problem: "I just saved my changes, but when I refresh, they're gone!"

This happens when a user writes data, then immediately reads before the projection catches up. Several patterns can address this.
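
The problem can be reproduced in a few lines. The sketch below is a toy in-memory model, not a real CQRS stack: `TinyCqrs` and its methods are hypothetical names, and the `project()` call stands in for an asynchronous projector catching up.

```typescript
type Order = { id: string; item: string };

class TinyCqrs {
  private writeStore: Record<string, Order> = {};
  private readModel: Record<string, Order> = {};
  private pending: Order[] = []; // committed events not yet projected

  // Command side: commit to the write store and queue an event.
  placeOrder(order: Order): void {
    this.writeStore[order.id] = order;
    this.pending.push(order);
  }

  // Query side: reads only the projection, never the write store.
  getOrder(id: string): Order | undefined {
    return this.readModel[id];
  }

  // Stand-in for the asynchronous projector catching up.
  project(): void {
    for (const order of this.pending) {
      this.readModel[order.id] = order;
    }
    this.pending = [];
  }
}

const system = new TinyCqrs();
system.placeOrder({ id: "o-1", item: "book" });

// The write is committed, but the read model has not seen it yet.
const beforeProjection = system.getOrder("o-1"); // undefined

system.project();
const afterProjection = system.getOrder("o-1"); // now visible
```

Every pattern below is a different way of hiding or closing the gap between those two reads.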

Pattern 1: Synchronous Projection (Hybrid)

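For critical paths, update the read model synchronously as part of the command transaction, while still publishing events for other consumers. This is only atomic when the critical read model lives in the same database as the write store, and it makes that command's write path slower.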

Synchronous Projection for Critical Paths
TypeScript
class OrderCommandHandler {
  async placeOrder(command: PlaceOrderCommand): Promise<OrderResult> {
    return await this.transactionManager.transaction(async (tx) => {
      // 1. Apply business logic and persist to write store
      const order = Order.create(command);
      await this.writeRepo.save(order, tx);
      
      // 2. Synchronously update critical read model
      // User MUST see their order immediately
      const orderView = this.orderProjection.project(order);
      await this.readRepo.upsert(orderView, tx);
      
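      // 3. Publish event for other consumers (delivered asynchronously)
      // Passing tx assumes the bus records the event in the same transaction
      // (e.g., an outbox table); non-critical projections stay eventually consistent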
      await this.eventBus.publish(new OrderPlacedEvent(order), tx);
      
      return { orderId: order.id, status: 'created' };
    });
  }
}

Pattern 2: Read-Your-Writes via Session Tracking

Track the last write position for each user session. When reading, ensure the read model has processed at least up to that position.

Read-Your-Writes with Session Tokens
TypeScript
interface ConsistencyToken {
  lastWritePosition: number;
  timestamp: number;
}
 
class ConsistentQueryService {
  async executeQuery<T>(
    query: Query<T>,
    consistencyToken?: ConsistencyToken
  ): Promise<QueryResult<T>> {
    
    if (consistencyToken) {
      // Wait until read model has caught up to user's last write
      const currentPosition = await this.projector.getCurrentPosition();
      
      if (currentPosition < consistencyToken.lastWritePosition) {
        // Option 1: Wait with timeout
        const caughtUp = await this.waitForPosition(
          consistencyToken.lastWritePosition,
          { timeoutMs: 2000 }
        );
        
        if (!caughtUp) {
          // Option 2: Fall back to reading from write store
          return this.executeOnWriteStore(query);
        }
      }
    }
    
    // Execute on read model
    return this.readStore.execute(query);
  }
  
  private async waitForPosition(
    position: number,
    options: { timeoutMs: number }
  ): Promise<boolean> {
    const deadline = Date.now() + options.timeoutMs;
    
    while (Date.now() < deadline) {
      const currentPosition = await this.projector.getCurrentPosition();
      if (currentPosition >= position) {
        return true;
      }
      await sleep(50); // Poll interval
    }
    
    return false; // Timeout
  }
}
 
// Client receives token after writes
class OrderController {
  async createOrder(req: Request): Promise<Response> {
    const result = await this.commandHandler.placeOrder(req.body);
    
    // Return consistency token with response
    return {
      data: result,
      consistencyToken: {
        lastWritePosition: result.eventPosition,
        timestamp: Date.now()
      }
    };
  }
  
  async getOrders(req: Request): Promise<Response> {
    // Client sends token received from previous write
    const token = req.headers['x-consistency-token'];
    
    const orders = await this.queryService.executeQuery(
      new GetOrdersQuery(req.user.id),
      token ? JSON.parse(token) : undefined
    );
    
    return { data: orders };
  }
}

Pattern 3: Optimistic UI Updates

Rather than waiting for the read model, update the client-side state optimistically and reconcile later.

Optimistic UI Pattern
TypeScript
// Frontend: React with optimistic updates
function useCreateOrder() {
  const queryClient = useQueryClient();
  
  return useMutation({
    mutationFn: createOrder,
    
    // Optimistic update: immediately show the new order
    onMutate: async (newOrder) => {
      // Cancel outgoing queries to avoid overwriting optimistic update
      await queryClient.cancelQueries({ queryKey: ['orders'] });
      
      // Snapshot current data for rollback
      const previousOrders = queryClient.getQueryData(['orders']);
      
      // Optimistically add new order to cache (the cache may be empty on first use)
      queryClient.setQueryData(['orders'], (old: Order[] = []) => [
        ...old,
        { ...newOrder, id: 'temp-' + Date.now(), status: 'pending' }
      ]);
      
      return { previousOrders };
    },
    
    // If mutation fails, rollback to previous state
    onError: (err, newOrder, context) => {
      queryClient.setQueryData(['orders'], context?.previousOrders);
      toast.error('Failed to create order');
    },
    
    // After the mutation settles (success or error), refetch authoritative data
    onSettled: () => {
      // Slight delay to allow read model to catch up (a heuristic, not a guarantee)
      setTimeout(() => {
        queryClient.invalidateQueries({ queryKey: ['orders'] });
      }, 500);
    },
  });
}

UX Patterns for Eventual Consistency

User experience design must account for eventual consistency. Users don't think in terms of projections and events—they expect their actions to be reflected immediately. Good UX bridges this expectation gap.

Pattern 1: Immediate Acknowledgment

Even if background processing takes time, acknowledge the user's action immediately.

Poor UX

•User clicks 'Submit Order'
•Loading spinner for 2 seconds
•Page refreshes to order list
•New order not visible
•User confused: 'Did it work?'

Good UX

•User clicks 'Submit Order'
•Immediate: 'Order #12345 placed!'
•Order appears locally with 'Processing' badge
•Background refresh updates status
•User confident: 'My order is in progress'

Pattern 2: Progressive Enhancement

Show what you know immediately, then progressively enhance with additional data as it becomes available.

Progressive Enhancement Component
React
function OrderCard({ orderId, optimisticData }) {
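  const { data: serverData, isLoading } = useQuery({
    queryKey: ['order', orderId],
    queryFn: () => fetchOrder(orderId),
    // Poll until server data arrives. The callback form avoids reading
    // serverData before it is initialized. Signature shown is TanStack Query v5;
    // in v4 the callback receives (data, query) instead.
    refetchInterval: (query) =>
      optimisticData && !query.state.data ? 1000 : false,
  });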
  
  // Use server data if available, otherwise optimistic
  const order = serverData || optimisticData;
  const isPending = !serverData;
  
  return (
    <Card className={isPending ? 'opacity-75' : ''}>
      <CardHeader>
        <div className="flex items-center justify-between">
          <span>Order #{order.id}</span>
          {isPending && (
            <Badge variant="outline" className="animate-pulse">
              <Loader2 className="mr-1 h-3 w-3 animate-spin" />
              Confirming...
            </Badge>
          )}
          {!isPending && (
            <Badge variant="success">Confirmed</Badge>
          )}
        </div>
      </CardHeader>
      <CardContent>
        {/* Show known data immediately */}
        <p>Items: {order.items.length}</p>
        <p>Total: ${order.total}</p>
        
        {/* Enhanced data appears when available */}
        {serverData?.estimatedDelivery && (
          <p>Delivery: {serverData.estimatedDelivery}</p>
        )}
        {serverData?.trackingNumber && (
          <p>Tracking: {serverData.trackingNumber}</p>
        )}
      </CardContent>
    </Card>
  );
}

Pattern 3: Transition States

Explicitly model and display intermediate states during async processing.

Transition State Examples

•Payment: 'Submitted' → 'Processing' → 'Confirmed' → 'Completed'
•Upload: 'Uploading' → 'Processing' → 'Ready'
•Order: 'Placed' → 'Confirmed' → 'Preparing' → 'Shipped'
•Subscription: 'Requested' → 'Verifying' → 'Active'
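
One way to model such states explicitly is a small transition table. The sketch below encodes only the payment flow from the list above; the `PaymentState` type and the helper names are illustrative, and a real system would likely add failure states that the list does not cover.

```typescript
type PaymentState = "Submitted" | "Processing" | "Confirmed" | "Completed";

// Allowed next states for each state, following the payment flow listed above.
const paymentTransitions: Record<PaymentState, PaymentState[]> = {
  Submitted: ["Processing"],
  Processing: ["Confirmed"],
  Confirmed: ["Completed"],
  Completed: [],
};

function canTransition(from: PaymentState, to: PaymentState): boolean {
  return paymentTransitions[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the move skips or reverses a step.
function advance(current: PaymentState, next: PaymentState): PaymentState {
  if (!canTransition(current, next)) {
    throw new Error(`Invalid payment transition: ${current} -> ${next}`);
  }
  return next;
}

let state: PaymentState = "Submitted";
state = advance(state, "Processing");
state = advance(state, "Confirmed");
```

Because the UI renders whichever state it last received, each intermediate state gives the user something truthful to look at while later events are still in flight.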

Set User Expectations

When operations take time, tell users explicitly. 'Your changes will appear shortly' is better than silent delays. For longer processes, provide progress indicators or status pages users can refresh.

Handling Conflicts and Anomalies

Eventual consistency can produce anomalies where different users see different states, or where the read model temporarily contradicts business rules. Understanding and handling these cases is critical for system reliability.

Anomaly 1: Stale Read Problems

A user sees outdated information because their request hit a read replica before projection completed.

Stale Read Scenario
Timeline:
T1: User A places order (writes inventory: 100 → 99)
T2: Event published, but not yet projected
T3: User B checks inventory, sees 100 (stale; the true count is 99)
T4: User B places order believing 100 items are available
T5: Projection catches up: inventory now 98 (both orders applied)
T6: Both orders succeed here, but with only one item in stock the same race would have shown User B an item that was already sold
 
Problem: User B made a decision based on stale data

Solutions for Stale Read Problems:

•Critical checks on write path: Validate against current state in the command handler, never against the read model.
•Bounded staleness: Accept some staleness but bound it (e.g., max 30 seconds).
•Version checks: Include a version number in commands; reject if outdated.
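
Bounded staleness can be sketched as a routing decision. This example assumes the projector records a heartbeat, here called `lastCaughtUpAtMs`, each time it has processed every event up to the head of the log; that field, the `ProjectorStatus` type, and `chooseReadSource` are hypothetical names.

```typescript
// Hypothetical status record maintained by the projector.
interface ProjectorStatus {
  // Last time the projector confirmed it had processed all available events.
  lastCaughtUpAtMs: number;
}

type ReadSource = "read-model" | "write-store";

// Serve from the read model only while its staleness is within the bound;
// otherwise fall back to the write store.
function chooseReadSource(
  status: ProjectorStatus,
  nowMs: number,
  maxStalenessMs: number
): ReadSource {
  const stalenessMs = nowMs - status.lastCaughtUpAtMs;
  return stalenessMs <= maxStalenessMs ? "read-model" : "write-store";
}

const MAX_STALENESS_MS = 30000; // the 30-second bound from the list above

const fresh = chooseReadSource({ lastCaughtUpAtMs: 100000 }, 110000, MAX_STALENESS_MS);
const stale = chooseReadSource({ lastCaughtUpAtMs: 100000 }, 140000, MAX_STALENESS_MS);
// fresh is "read-model" (10 s behind); stale is "write-store" (40 s behind)
```

The bound is measured from the projector's last caught-up time rather than from each row's update time, since a row that has not changed for an hour is not stale if the projector is current.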

Anomaly 2: Out-of-Order Processing

Events may be processed in different orders across different projections.

Handling Out-of-Order Events
TypeScript
class OrderProjection {
  async handleEvent(event: DomainEvent): Promise<void> {
    const existingOrder = await this.readStore.findOrder(event.aggregateId);
    
    // Check event sequence number
    if (existingOrder && event.sequenceNumber <= existingOrder.lastEventSequence) {
      // Duplicate or stale delivery: this event was already applied, so skip it (keeps the handler idempotent)
      console.log(`Skipping already-applied event: ${event.id}`);
      return;
    }
    
    // Gap detection: are we missing events?
    if (existingOrder && event.sequenceNumber > existingOrder.lastEventSequence + 1) {
      // Missing events - request replay
      console.warn(`Gap detected: expected ${existingOrder.lastEventSequence + 1}, got ${event.sequenceNumber}`);
      await this.requestReplayFrom(event.aggregateId, existingOrder.lastEventSequence + 1);
      return;
    }
    
    // Process normally
    const updatedView = this.applyEvent(existingOrder, event);
    updatedView.lastEventSequence = event.sequenceNumber;
    await this.readStore.upsert(updatedView);
  }
  
  private async requestReplayFrom(aggregateId: string, fromSequence: number): Promise<void> {
    const missingEvents = await this.eventStore.getEvents(aggregateId, { 
      fromSequence 
    });
    
    for (const event of missingEvents) {
      await this.handleEvent(event);
    }
  }
}

Anomaly 3: Write-Write Conflicts

Two users modify the same entity concurrently, and their changes conflict.

Optimistic Concurrency Control
TypeScript
interface VersionedCommand {
  targetId: string;
  expectedVersion: number;  // Version user was viewing when they made changes
  changes: Patch;
}
 
class EntityCommandHandler {
  async updateEntity(command: VersionedCommand): Promise<Result<Entity, ConflictError>> {
    // Load current version from write store
    const entity = await this.repository.findById(command.targetId);
    
    // Version check
    if (entity.version !== command.expectedVersion) {
      // Someone else modified since user last read
      return Err(new ConflictError({
        expectedVersion: command.expectedVersion,
        actualVersion: entity.version,
        conflictingChanges: this.detectConflicts(entity, command.changes)
      }));
    }
    
    // Apply changes and increment version
    entity.applyChanges(command.changes);
    entity.version += 1;
    
    // Save with optimistic lock check
    const saved = await this.repository.saveIfVersion(
      entity, 
      command.expectedVersion
    );
    
    if (!saved) {
      // Race condition - someone beat us
      return Err(new ConflictError({ /* ... */ }));
    }
    
    return Ok(entity);
  }
}
 
// Client handling conflicts
async function handleConflict(error: ConflictError) {
  // Option 1: Auto-merge if changes are compatible
  if (canAutoMerge(error.conflictingChanges)) {
    return retryWithMerge(error);
  }
  
  // Option 2: Show conflict resolution UI
  const resolution = await showConflictDialog(error);
  if (resolution === 'force') {
    return retryWithLatestVersion();
  } else if (resolution === 'discard') {
    return refetchData();
  }
}
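
The `canAutoMerge` check above is left undefined. One possible rule, shown here as a sketch under the assumption that changes are flat field-level patches, is to auto-merge only when the two patches touch disjoint fields. `FieldPatch`, `canAutoMergePatches`, and `mergePatches` are hypothetical names, not the helpers referenced in the listing.

```typescript
// A flat patch: field name -> new value. (Assumed shape for illustration.)
type FieldPatch = Record<string, unknown>;

// Fields changed by both patches.
function overlappingFields(mine: FieldPatch, theirs: FieldPatch): string[] {
  return Object.keys(mine).filter((key) =>
    Object.prototype.hasOwnProperty.call(theirs, key)
  );
}

// Patches are treated as compatible when they change no field in common.
function canAutoMergePatches(mine: FieldPatch, theirs: FieldPatch): boolean {
  return overlappingFields(mine, theirs).length === 0;
}

// Combine two compatible patches; refuse if they overlap.
function mergePatches(mine: FieldPatch, theirs: FieldPatch): FieldPatch {
  const overlap = overlappingFields(mine, theirs);
  if (overlap.length > 0) {
    throw new Error(`Cannot auto-merge; both patches change: ${overlap.join(", ")}`);
  }
  return { ...theirs, ...mine };
}

const merged = mergePatches({ name: "Ada" }, { email: "ada@example.com" });
// merged contains both the name and the email change
```

Disjoint fields are not always safe to merge: if a business rule spans several fields, two individually valid patches can still combine into an invalid entity, so the merged result should go through the same validation as any other command.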

Minimizing Consistency Lag

While eventual consistency is acceptable, unnecessary lag is not. Every millisecond of delay impacts user experience. Here are engineering techniques to minimize the consistency window.

Lag Reduction Techniques

•In-Process Event Handling — For single-instance deployments, process events in the same process before returning to the client. Zero network latency.
•Parallel Projection Workers — Scale projection consumers horizontally. Partition events by aggregate ID for parallelism without ordering issues.
•Push-Based Updates — Instead of polling, push events to projections via WebSockets or streaming protocols.
•Projection Co-location — Run projections close to (or on) the write database to reduce network round trips.
•Batch Processing with Micro-Batches — Process events in small batches (10-100) to amortize overhead while keeping lag low.

Parallel Projection Scaling
TypeScript
class ParallelProjectionRunner {
  private workers: ProjectionWorker[];
  private partitionCount: number;
  
  constructor(options: { workerCount: number }) {
    this.partitionCount = options.workerCount;
    this.workers = Array.from({ length: this.partitionCount }, 
      (_, i) => new ProjectionWorker(i)
    );
  }
  
  async processEvent(event: DomainEvent): Promise<void> {
    // Deterministic partitioning by aggregate ID
    // Events for same aggregate always go to same worker
    const partition = this.getPartition(event.aggregateId);
    const worker = this.workers[partition];
    
    await worker.enqueue(event);
  }
  
  private getPartition(aggregateId: string): number {
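    // Deterministic hashing (hash mod partition count) ensures same aggregate → same partition.
    // This is not consistent hashing: changing partitionCount remaps most aggregates.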
    const hash = murmurHash(aggregateId);
    return hash % this.partitionCount;
  }
}
 
// With Kafka: Partition by aggregate ID at the broker level
// Consumer group automatically distributes partitions across workers
 
// Consumer configuration
const consumer = kafka.consumer({
  groupId: 'order-projection-consumers',
  // Each partition assigned to one consumer in the group
});
 
await consumer.subscribe({ 
  topic: 'order-events',
  fromBeginning: false 
});
 
await consumer.run({
  // Process up to 3 partitions concurrently; messages within a partition are still handled in order
  partitionsConsumedConcurrently: 3,
  eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
    for (const message of batch.messages) {
      await processEvent(JSON.parse(message.value.toString()));
      resolveOffset(message.offset);
      await heartbeat();
    }
  },
});

Ordering vs Parallelism Trade-off

Maximum parallelism requires relaxed ordering. If strict cross-aggregate ordering matters, you must process sequentially or accept potential anomalies. Partition by aggregate ID is usually the right balance: events for the same entity are ordered, but different entities process in parallel.
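
The per-aggregate ordering guarantee can be checked with a small self-contained sketch. It uses a simple polynomial string hash as a stand-in for `murmurHash`; `hashString`, `partitionFor`, `routeEvents`, and the `SeqEvent` shape are illustrative assumptions.

```typescript
interface SeqEvent {
  aggregateId: string;
  sequenceNumber: number;
}

// Simple deterministic string hash (stand-in for murmurHash), kept unsigned.
function hashString(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

function partitionFor(aggregateId: string, partitionCount: number): number {
  return hashString(aggregateId) % partitionCount;
}

// Route events to partitions, preserving arrival order within each partition.
function routeEvents(events: SeqEvent[], partitionCount: number): SeqEvent[][] {
  const partitions: SeqEvent[][] = [];
  for (let p = 0; p < partitionCount; p++) {
    partitions.push([]);
  }
  for (const event of events) {
    partitions[partitionFor(event.aggregateId, partitionCount)].push(event);
  }
  return partitions;
}

// Interleaved events for two aggregates, routed across four partitions.
const routed = routeEvents(
  [
    { aggregateId: "order-1", sequenceNumber: 1 },
    { aggregateId: "order-2", sequenceNumber: 1 },
    { aggregateId: "order-1", sequenceNumber: 2 },
    { aggregateId: "order-2", sequenceNumber: 2 },
    { aggregateId: "order-1", sequenceNumber: 3 },
  ],
  4
);
```

Each aggregate's events land in exactly one partition and keep their relative order there, while nothing is guaranteed about the order of `order-1` events relative to `order-2` events once partitions are processed in parallel.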

When Strong Consistency Is Required

Not all parts of your system can tolerate eventual consistency. Some domains have business or regulatory requirements for immediate consistency. Identifying these cases and handling them appropriately is crucial.

Signs You Need Strong Consistency:

•Financial transactions where race conditions cause monetary loss
•Inventory systems where overselling is unacceptable
•Identity/access management where stale permissions are a security risk
•Regulatory requirements for immediate audit trail visibility
•User expectations that cannot be educated away (e.g., banking)

Consistency Requirements by Domain
Domain	Consistency Requirement	Reason	Pattern
E-commerce product catalog	Eventual (seconds OK)	User can refresh; no harm in stale data	Standard CQRS
Shopping cart	Read-your-writes	User expects to see their changes	Session consistency
Inventory count	Varies by business	Depends on oversell tolerance	Hybrid or strong
Payment processing	Strong	Financial atomicity required	Sync projection or no CQRS
User authentication	Strong	Security-critical	Direct read from write store
Social media feed	Eventual (minutes OK)	User tolerates delays	Async projection

Hybrid Approaches:

You don't have to choose all-eventual or all-strong. Use different strategies for different operations within the same system.

Hybrid Consistency Strategy
TypeScript
class QueryRouter {
  async execute<T>(query: Query<T>): Promise<T> {
    // Route based on consistency requirement
    const requirement = this.getConsistencyRequirement(query);
    
    switch (requirement) {
      case 'strong':
        // Read directly from write store
        return this.writeStoreQueryExecutor.execute(query);
        
      case 'session': {
        // Ensure user sees their writes; with no prior write there is nothing to wait for
        const userToken = this.sessionManager.getConsistencyToken();
        if (userToken) {
          await this.waitForPosition(userToken.lastWritePosition);
        }
        return this.readStore.execute(query);
      }
        
      case 'eventual':
      default:
        // Standard read from projection
        return this.readStore.execute(query);
    }
  }
  
  private getConsistencyRequirement(query: Query): ConsistencyLevel {
    // Configuration-driven or query-metadata-driven
    const config = this.consistencyConfig[query.type];
    return config?.consistencyLevel || 'eventual';
  }
}
 
// Configuration example
const consistencyConfig = {
  'GetAccountBalance': { consistencyLevel: 'strong' },      // Financial
  'GetUserPermissions': { consistencyLevel: 'strong' },     // Security
  'GetOrderDetails': { consistencyLevel: 'session' },       // User's own data
  'GetProductCatalog': { consistencyLevel: 'eventual' },    // Public data
  'SearchProducts': { consistencyLevel: 'eventual' },       // Public data
};

Cost of Strong Consistency

Every query that requires strong consistency is a query that can't benefit from CQRS's scaling advantages. Minimize the scope of strong consistency to only what's truly necessary. Often, perceived requirements for strong consistency are actually UX problems solvable with optimistic updates.

Summary: Eventual Consistency

We've explored the realities of eventual consistency in CQRS systems and the strategies for handling it gracefully. Let's consolidate the key insights:

Key Takeaways

•Consistency is a spectrum — From strong to eventual, choose the right level for each use case. One size doesn't fit all.
•Measure the consistency window — Instrument your system to track lag at each stage. You can't improve what you don't measure.
•Read-your-writes is often enough — Users mainly care about seeing their own changes. Session-level consistency solves most complaints.
•Design UX for async — Optimistic updates, transition states, and progressive enhancement make eventual consistency invisible to users.
•Handle anomalies explicitly — Out-of-order events, conflicts, and stale reads will happen. Design for them.
•Minimize lag aggressively — Parallel processing, co-location, and micro-batching reduce lag from seconds to milliseconds.
•Use strong consistency surgically — Some domains require it, but each strong-consistency query limits your scalability.

What's Next:

With consistency models understood, we turn to the mechanics of keeping read models synchronized with write models. The next page covers Synchronization Strategies—the patterns, tools, and techniques for reliably propagating changes from commands to queries.

Page Complete

You now understand the theory and practice of eventual consistency in CQRS systems. From read-after-write patterns to UX design to conflict handling, you have the tools to build systems that are both scalable and user-friendly despite the inherent asynchrony.

3 / 5

Loading learning content...

System DesignCQRS at Scale

CQRS at Scale

LevelAdvanced

Duration90 mins

TopicCQRS at Scale

3 / 5

Eventual Consistency

The Consistency Trade-off

What You Will Learn

Understanding Consistency Models

Before diving into practical solutions, we must understand the theoretical landscape of consistency models. These form a spectrum from strongest guarantees to weakest.

Strong Consistency (Linearizability):

Every read returns the most recent write. Once a write completes, all subsequent reads see that value. This is the intuitive model most developers expect from traditional databases.

Guarantee: Read always returns latest committed write
Trade-off: Limited scalability, higher latency
Implementation: Single primary, synchronous replication, or consensus protocols

Causal Consistency:

Operations that are causally related are seen in the same order by all nodes. If A causes B, everyone sees A before B. But concurrent operations may be seen in different orders.

Guarantee: Cause precedes effect
Trade-off: Weaker than strong, requires dependency tracking

Eventual Consistency:

If no new writes occur, all replicas will eventually converge to the same value. The system provides no guarantees about when this convergence occurs.

Guarantee: Given enough time, all reads return the same value
Trade-off: Temporary inconsistency is possible
Variation: 'Read-your-writes' ensures you see your own changes

Consistency Model Comparison
Consistency Model	Guarantee	Latency	Availability	Scalability
Strong (Linearizable)	Always latest value	High (sync required)	CP (partition intolerant)	Limited
Causal	Cause precedes effect	Medium	Higher than strong	Moderate
Session	Your writes visible to you	Low-medium	High	Good
Eventual	Converges over time	Lowest	Highest (AP)	Maximum

CAP Theorem Reminder

The Consistency Window

In CQRS systems, the consistency window is the time between when a write is committed and when it becomes visible in read models. Understanding, measuring, and minimizing this window is critical.

Components of the Consistency Window:

Consistency Window Timeline
Command Received    Event Published    Event Consumed    Projection Updated    Cache Invalidated
      │                    │                   │                    │                    │
      ├────────────────────┼───────────────────┼────────────────────┼────────────────────┤
      │                    │                   │                    │                    │
      │    T1: Command     │    T2: Event      │   T3: Projection   │  T4: Cache Update  │
      │    Processing      │    Publishing     │   Processing       │                    │
      │    (10-50ms)       │    (1-10ms)       │   (10-100ms)       │    (1-5ms)         │
      │                    │                   │                    │                    │
      └────────────────────┴───────────────────┴────────────────────┴────────────────────┘
                                                                                         │
      Total Consistency Window: T1 + T2 + T3 + T4                                        │
      Typical range: 50-500ms, can extend to seconds under load                          │
                                                                                         ▼
                                                                              Data Visible in Reads

Measuring the Consistency Window:

You can't manage what you don't measure. Implement instrumentation to track consistency lag at each stage:

Consistency Lag Measurement
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
interface ConsistencyMetrics {
  eventId: string;
  timestamps: {
    commandReceived: number;
    eventPublished: number;
    eventConsumed: number;
    projectionUpdated: number;
    cacheInvalidated: number;
  };
}
 
class ProjectionConsumer {
  async processEvent(event: DomainEvent): Promise<void> {
    const metrics: ConsistencyMetrics = {
      eventId: event.id,
      timestamps: {
        commandReceived: event.metadata.commandReceivedAt,
        eventPublished: event.metadata.publishedAt,
        eventConsumed: Date.now(),
        projectionUpdated: 0,
        cacheInvalidated: 0
      }
    };
    
    // Process projection
    await this.projection.apply(event);
    metrics.timestamps.projectionUpdated = Date.now();
    
    // Invalidate cache
    await this.cacheInvalidator.invalidate(event);
    metrics.timestamps.cacheInvalidated = Date.now();
    
    // Calculate and record lag components.
    // Note: commandReceivedAt and publishedAt may be stamped on other hosts, so
    // cross-host differences are only as accurate as clock synchronization.
    const publishLag = metrics.timestamps.eventPublished - metrics.timestamps.commandReceived;
    const queueLag = metrics.timestamps.eventConsumed - metrics.timestamps.eventPublished;
    const projectionLag = metrics.timestamps.projectionUpdated - metrics.timestamps.eventConsumed;
    const cacheLag = metrics.timestamps.cacheInvalidated - metrics.timestamps.projectionUpdated;
    const totalLag = metrics.timestamps.cacheInvalidated - metrics.timestamps.commandReceived;
    
    // Record metrics
    this.metrics.recordHistogram('consistency.publish_lag_ms', publishLag);
    this.metrics.recordHistogram('consistency.queue_lag_ms', queueLag);
    this.metrics.recordHistogram('consistency.projection_lag_ms', projectionLag);
    this.metrics.recordHistogram('consistency.cache_lag_ms', cacheLag);
    this.metrics.recordHistogram('consistency.total_lag_ms', totalLag);
    
    // Alert if lag exceeds threshold
    if (totalLag > 1000) {
      this.alerting.warn(`High consistency lag: ${totalLag}ms for event ${event.id}`);
    }
  }
}
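
Alerts on individual events tend to be noisy, so lag histograms are usually read as percentiles. The following sketch shows one way to compute them from raw samples with the nearest-rank method. `percentile` is a hypothetical helper written for illustration; a real metrics backend would normally compute this for you.

```typescript
// Nearest-rank percentile over raw lag samples (in milliseconds).
// Hypothetical helper for illustration; not part of any metrics library.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  if (p <= 0 || p > 100) {
    throw new Error('p must be in the range (0, 100]');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

const lagSamplesMs = [40, 55, 60, 72, 80, 95, 110, 130, 400, 1200];
const p50 = percentile(lagSamplesMs, 50); // 80
const p90 = percentile(lagSamplesMs, 90); // 400
const p100 = percentile(lagSamplesMs, 100); // 1200
```

In this sample the median looks healthy while the tail does not, which is the pattern a single average would hide.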

Lag Under Load

Read-After-Write Consistency

The most common user complaint in eventually consistent systems is the read-after-write problem: "I just saved my changes, but when I refresh, they're gone!"

This happens when a user writes data, then immediately reads before the projection catches up. Several patterns can address this.

Pattern 1: Synchronous Projection (Hybrid)

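For critical paths, update the read model synchronously as part of the command transaction, while still publishing events for other consumers. This requires the critical read model to share a transactional store with the write model, and it adds the projection's cost to every write.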

Synchronous Projection for Critical Paths
TypeScript
class OrderCommandHandler {
  async placeOrder(command: PlaceOrderCommand): Promise<OrderResult> {
    return await this.transactionManager.transaction(async (tx) => {
      // 1. Apply business logic and persist to write store
      const order = Order.create(command);
      await this.writeRepo.save(order, tx);
      
      // 2. Synchronously update critical read model
      // User MUST see their order immediately
      const orderView = this.orderProjection.project(order);
      await this.readRepo.upsert(orderView, tx);
      
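      // 3. Publish event for other consumers (async)
      // Passing tx assumes the bus records the event transactionally (e.g. an
      // outbox) and delivers it after commit; non-critical projections can be
      // eventually consistent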
      await this.eventBus.publish(new OrderPlacedEvent(order), tx);
      
      return { orderId: order.id, status: 'created' };
    });
  }
}

Pattern 2: Read-Your-Writes via Session Tracking

Track the last write position for each user session. When reading, ensure the read model has processed at least up to that position.

Read-Your-Writes with Session Tokens
TypeScript
interface ConsistencyToken {
  lastWritePosition: number;
  timestamp: number;
}
 
class ConsistentQueryService {
  async executeQuery<T>(
    query: Query<T>,
    consistencyToken?: ConsistencyToken
  ): Promise<QueryResult<T>> {
    
    if (consistencyToken) {
      // Wait until read model has caught up to user's last write
      const currentPosition = await this.projector.getCurrentPosition();
      
      if (currentPosition < consistencyToken.lastWritePosition) {
        // Option 1: Wait with timeout
        const caughtUp = await this.waitForPosition(
          consistencyToken.lastWritePosition,
          { timeoutMs: 2000 }
        );
        
        if (!caughtUp) {
          // Option 2: Fall back to reading from write store
          return this.executeOnWriteStore(query);
        }
      }
    }
    
    // Execute on read model
    return this.readStore.execute(query);
  }
  
  private async waitForPosition(
    position: number,
    options: { timeoutMs: number }
  ): Promise<boolean> {
    const deadline = Date.now() + options.timeoutMs;
    
    while (Date.now() < deadline) {
      const currentPosition = await this.projector.getCurrentPosition();
      if (currentPosition >= position) {
        return true;
      }
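      // sleep is assumed to be a small helper:
      // (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))
      await sleep(50); // Poll interval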
    }
    
    return false; // Timeout
  }
}
 
// Client receives token after writes
class OrderController {
  async createOrder(req: Request): Promise<Response> {
    const result = await this.commandHandler.placeOrder(req.body);
    
    // Return consistency token with response
    return {
      data: result,
      consistencyToken: {
        lastWritePosition: result.eventPosition,
        timestamp: Date.now()
      }
    };
  }
  
  async getOrders(req: Request): Promise<Response> {
    // Client sends token received from previous write
    const token = req.headers['x-consistency-token'];
    
    const orders = await this.queryService.executeQuery(
      new GetOrdersQuery(req.user.id),
      token ? JSON.parse(token) : undefined
    );
    
    return { data: orders };
  }
}
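
On the client side, the token has to be remembered across requests and only ever moved forward. A minimal sketch of that bookkeeping follows. `ConsistencyTokenStore` and its method names are assumptions made for illustration, while the `x-consistency-token` header name matches the controller above.

```typescript
interface ConsistencyToken {
  lastWritePosition: number;
  timestamp: number;
}

// Hypothetical client-side helper: keeps the most advanced token seen so far.
class ConsistencyTokenStore {
  private token: ConsistencyToken | undefined;

  // Call with the token returned by each write response.
  update(next: ConsistencyToken): void {
    // Responses can arrive out of order, so never move the position backwards.
    if (!this.token || next.lastWritePosition > this.token.lastWritePosition) {
      this.token = next;
    }
  }

  // Headers to attach to subsequent read requests.
  headers(): Record<string, string> {
    const result: Record<string, string> = {};
    if (this.token) {
      result['x-consistency-token'] = JSON.stringify(this.token);
    }
    return result;
  }
}

const tokenStore = new ConsistencyTokenStore();
tokenStore.update({ lastWritePosition: 42, timestamp: 1000 });
tokenStore.update({ lastWritePosition: 41, timestamp: 1001 }); // ignored: older position
const readHeaders = tokenStore.headers();
```

Keeping only the highest position means a slow response to an earlier write cannot make later reads accept an older read model.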

Pattern 3: Optimistic UI Updates

Rather than waiting for the read model, update the client-side state optimistically and reconcile later.

Optimistic UI Pattern
TypeScript
// Frontend: React with optimistic updates
function useCreateOrder() {
  const queryClient = useQueryClient();
  
  return useMutation({
    mutationFn: createOrder,
    
    // Optimistic update: immediately show the new order
    onMutate: async (newOrder) => {
      // Cancel outgoing queries to avoid overwriting optimistic update
      await queryClient.cancelQueries({ queryKey: ['orders'] });
      
      // Snapshot current data for rollback
      const previousOrders = queryClient.getQueryData(['orders']);
      
      // Optimistically add new order to cache
      // Default to an empty list: the cache may not hold any orders yet
      queryClient.setQueryData(['orders'], (old: Order[] = []) => [
        ...old,
        { ...newOrder, id: 'temp-' + Date.now(), status: 'pending' }
      ]);
      
      return { previousOrders };
    },
    
    // If mutation fails, rollback to previous state
    onError: (err, newOrder, context) => {
      // context is undefined if onMutate itself threw
      queryClient.setQueryData(['orders'], context?.previousOrders);
      toast.error('Failed to create order');
    },
    
    // After success or failure, refetch to get authoritative data
    onSettled: () => {
      // Slight delay to allow read model to catch up
      setTimeout(() => {
        queryClient.invalidateQueries({ queryKey: ['orders'] });
      }, 500);
    },
  });
}
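
The "reconcile later" step can also be done without waiting for a refetch, by swapping the temporary entry for the record the server returns. The sketch below is a pure function under assumed names (`OrderView`, `reconcileOrder`); the `temp-` prefix mirrors the optimistic ID used above.

```typescript
interface OrderView {
  id: string;
  status: string;
  total: number;
}

// Drop the optimistic placeholder and add the authoritative server record.
// If the record is already present (e.g. a refetch ran first), it is updated
// in place, so the order never shows up twice.
function reconcileOrder(
  orders: OrderView[],
  tempId: string,
  confirmed: OrderView,
): OrderView[] {
  const withoutTemp = orders.filter((o) => o.id !== tempId);
  const alreadyPresent = withoutTemp.some((o) => o.id === confirmed.id);
  if (alreadyPresent) {
    return withoutTemp.map((o) => (o.id === confirmed.id ? confirmed : o));
  }
  return [...withoutTemp, confirmed];
}

const optimisticList: OrderView[] = [
  { id: 'order-1', status: 'shipped', total: 20 },
  { id: 'temp-1700000000000', status: 'pending', total: 35 },
];
const confirmedOrder: OrderView = { id: 'order-2', status: 'created', total: 35 };
const reconciled = reconcileOrder(optimisticList, 'temp-1700000000000', confirmedOrder);
```

Because the function is idempotent, it is safe to call from both the mutation's success handler and a later refetch.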

UX Patterns for Eventual Consistency

Pattern 1: Immediate Acknowledgment

Even if background processing takes time, acknowledge the user's action immediately.

Poor UX

•User clicks 'Submit Order'
•Loading spinner for 2 seconds
•Page refreshes to order list
•New order not visible
•User confused: 'Did it work?'

Good UX

•User clicks 'Submit Order'
•Immediate: 'Order #12345 placed!'
•Order appears locally with 'Processing' badge
•Background refresh updates status
•User confident: 'My order is in progress'

Pattern 2: Progressive Enhancement

Show what you know immediately, then progressively enhance with additional data as it becomes available.

Progressive Enhancement Component
React
function OrderCard({ orderId, optimisticData }) {
  const { data: serverData, isLoading } = useQuery({
    queryKey: ['order', orderId],
    queryFn: () => fetchOrder(orderId),
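    // Poll until server data arrives. A callback is used because serverData is
    // not yet initialized while this options object is being built.
    // (Assumes the TanStack Query v5 signature; v4 passes (data, query) instead.)
    refetchInterval: (query) =>
      optimisticData && !query.state.data ? 1000 : false,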
  });
  
  // Use server data if available, otherwise optimistic
  const order = serverData || optimisticData;
  const isPending = !serverData;
  
  return (
    <Card className={isPending ? 'opacity-75' : ''}>
      <CardHeader>
        <div className="flex items-center justify-between">
          <span>Order #{order.id}</span>
          {isPending && (
            <Badge variant="outline" className="animate-pulse">
              <Loader2 className="mr-1 h-3 w-3 animate-spin" />
              Confirming...
            </Badge>
          )}
          {!isPending && (
            <Badge variant="success">Confirmed</Badge>
          )}
        </div>
      </CardHeader>
      <CardContent>
        {/* Show known data immediately */}
        <p>Items: {order.items.length}</p>
        <p>Total: ${order.total}</p>
        
        {/* Enhanced data appears when available */}
        {serverData?.estimatedDelivery && (
          <p>Delivery: {serverData.estimatedDelivery}</p>
        )}
        {serverData?.trackingNumber && (
          <p>Tracking: {serverData.trackingNumber}</p>
        )}
      </CardContent>
    </Card>
  );
}

Pattern 3: Transition States

Explicitly model and display intermediate states during async processing.

Transition State Examples

•Payment: 'Submitted' → 'Processing' → 'Confirmed' → 'Completed'
•Upload: 'Uploading' → 'Processing' → 'Ready'
•Order: 'Placed' → 'Confirmed' → 'Preparing' → 'Shipped'
•Subscription: 'Requested' → 'Verifying' → 'Active'
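
One way to model these states explicitly is a small transition table that rejects anything not listed. The sketch below encodes the order example from the list above; the names `OrderStatus`, `orderTransitions`, `canTransition`, and `transition` are illustrative.

```typescript
type OrderStatus = 'Placed' | 'Confirmed' | 'Preparing' | 'Shipped';

// Allowed forward transitions, taken from the order example above.
const orderTransitions: Record<OrderStatus, OrderStatus[]> = {
  Placed: ['Confirmed'],
  Confirmed: ['Preparing'],
  Preparing: ['Shipped'],
  Shipped: [],
};

function canTransition(from: OrderStatus, to: OrderStatus): boolean {
  return orderTransitions[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the transition is not allowed.
function transition(from: OrderStatus, to: OrderStatus): OrderStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid order transition: ${from} -> ${to}`);
  }
  return to;
}

const nextStatus = transition('Placed', 'Confirmed'); // 'Confirmed'
```

Keeping the table in one place lets the UI and the projection agree on which badges and intermediate states can ever appear.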

Set User Expectations

When operations take time, tell users explicitly. 'Your changes will appear shortly' is better than silent delays. For longer processes, provide progress indicators or status pages users can refresh.

Handling Conflicts and Anomalies

Anomaly 1: Stale Read Problems

A user sees outdated information because their request hit a read replica before projection completed.

Stale Read Scenario
Timeline:
T1: User A places order (writes inventory: 100 → 99)
T2: Event published, but not yet projected 
T3: User B checks inventory, sees 100 (stale)
T4: User B places order expecting 100 items available
T5: Projection catches up: inventory now 98
T6: Both orders succeed, but User B saw 100 when the true count was 99 and was never warned
 
Problem: User B made a decision based on stale data

Solutions for Stale Read Problems:

Critical checks on write path: Validate current state in the command handler, not based on read model.
Bounded staleness: Accept some staleness but bound it (e.g., max 30 seconds).
Version checks: Include a version number in commands; reject if outdated.
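
Bounded staleness can be sketched as a freshness check in front of the read model. The example assumes the projector records `asOf`, the time at which it last confirmed it had applied everything available (for instance via a heartbeat). That field, along with `isFreshEnough` and `readWithBound`, is hypothetical and not taken from the original design.

```typescript
interface ProjectedView<T> {
  data: T;
  asOf: number; // epoch ms at which the projector was last known to be caught up
}

// True when the view is within the allowed staleness bound.
function isFreshEnough<T>(
  view: ProjectedView<T>,
  maxStalenessMs: number,
  now: number = Date.now(),
): boolean {
  return now - view.asOf <= maxStalenessMs;
}

// Serve the projection when it is fresh enough; otherwise fall back to an
// authoritative loader (for example a read against the write store).
function readWithBound<T>(
  view: ProjectedView<T>,
  maxStalenessMs: number,
  loadAuthoritative: () => T,
  now: number = Date.now(),
): T {
  return isFreshEnough(view, maxStalenessMs, now) ? view.data : loadAuthoritative();
}

const inventoryView: ProjectedView<number> = { data: 100, asOf: 1000 };
const withinBound = readWithBound(inventoryView, 30000, () => 99, 20000); // 100 (projection)
const beyondBound = readWithBound(inventoryView, 30000, () => 99, 40000); // 99 (fallback)
```

The `now` parameter is injectable only to keep the example deterministic; callers would normally omit it.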

Anomaly 2: Out-of-Order Processing

Events may be processed in different orders across different projections.

Handling Out-of-Order Events
TypeScript
class OrderProjection {
  async handleEvent(event: DomainEvent): Promise<void> {
    const existingOrder = await this.readStore.findOrder(event.aggregateId);
    
    // Check event sequence number
    if (existingOrder && event.sequenceNumber <= existingOrder.lastEventSequence) {
      // Already processed or out of order - skip
      console.log(`Skipping out-of-order event: ${event.id}`);
      return;
    }
    
    // Gap detection: are we missing events?
    if (existingOrder && event.sequenceNumber > existingOrder.lastEventSequence + 1) {
      // Missing events - request replay
      console.warn(`Gap detected: expected ${existingOrder.lastEventSequence + 1}, got ${event.sequenceNumber}`);
      await this.requestReplayFrom(event.aggregateId, existingOrder.lastEventSequence + 1);
      return;
    }
    
    // Process normally
    const updatedView = this.applyEvent(existingOrder, event);
    updatedView.lastEventSequence = event.sequenceNumber;
    await this.readStore.upsert(updatedView);
  }
  
  private async requestReplayFrom(aggregateId: string, fromSequence: number): Promise<void> {
    const missingEvents = await this.eventStore.getEvents(aggregateId, { 
      fromSequence 
    });
    
    for (const event of missingEvents) {
      await this.handleEvent(event);
    }
  }
}
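
The handler above closes gaps by replaying from the event store. When replay is expensive, a different technique is to hold early arrivals in memory until the gap fills. The sketch below shows that alternative; it is not the approach used above, and `ResequencingBuffer` is a hypothetical name. A production version would also need a timeout that falls back to replay, since a missing event might never arrive.

```typescript
interface SequencedEvent {
  sequenceNumber: number;
  payload: string;
}

// Hypothetical per-aggregate buffer that releases events strictly in order.
class ResequencingBuffer {
  private pending: { [seq: number]: SequencedEvent } = {};

  constructor(private nextExpected: number = 1) {}

  // Accepts an event in any order; returns the events now ready to apply.
  accept(event: SequencedEvent): SequencedEvent[] {
    if (event.sequenceNumber < this.nextExpected) {
      return []; // duplicate or already applied
    }
    this.pending[event.sequenceNumber] = event;

    // Drain every consecutive event starting at the expected sequence number.
    const ready: SequencedEvent[] = [];
    while (this.pending[this.nextExpected] !== undefined) {
      ready.push(this.pending[this.nextExpected]);
      delete this.pending[this.nextExpected];
      this.nextExpected += 1;
    }
    return ready;
  }
}

const buffer = new ResequencingBuffer();
const afterSecond = buffer.accept({ sequenceNumber: 2, payload: 'b' }); // [] (waiting for 1)
const afterFirst = buffer.accept({ sequenceNumber: 1, payload: 'a' }); // events 1 and 2
```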

Anomaly 3: Write-Write Conflicts

Two users modify the same entity concurrently, and their changes conflict.

Optimistic Concurrency Control
TypeScript
interface VersionedCommand {
  targetId: string;
  expectedVersion: number;  // Version user was viewing when they made changes
  changes: Patch;
}
 
class EntityCommandHandler {
  async updateEntity(command: VersionedCommand): Promise<Result<Entity, ConflictError>> {
    // Load current version from write store
    const entity = await this.repository.findById(command.targetId);
    
    // Version check
    if (entity.version !== command.expectedVersion) {
      // Someone else modified since user last read
      return Err(new ConflictError({
        expectedVersion: command.expectedVersion,
        actualVersion: entity.version,
        conflictingChanges: this.detectConflicts(entity, command.changes)
      }));
    }
    
    // Apply changes and increment version
    entity.applyChanges(command.changes);
    entity.version += 1;
    
    // Save with optimistic lock check
    const saved = await this.repository.saveIfVersion(
      entity, 
      command.expectedVersion
    );
    
    if (!saved) {
      // Race condition - someone beat us
      return Err(new ConflictError({ /* ... */ }));
    }
    
    return Ok(entity);
  }
}
 
// Client handling conflicts
async function handleConflict(error: ConflictError) {
  // Option 1: Auto-merge if changes are compatible
  if (canAutoMerge(error.conflictingChanges)) {
    return retryWithMerge(error);
  }
  
  // Option 2: Show conflict resolution UI
  const resolution = await showConflictDialog(error);
  if (resolution === 'force') {
    return retryWithLatestVersion();
  } else if (resolution === 'discard') {
    return refetchData();
  }
}
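
The handler above relies on `saveIfVersion` without showing it. As a sketch of what such a conditional save might look like, the in-memory repository below writes only when the stored version still matches. `InMemoryRepository` and `VersionedEntity` are hypothetical; a database-backed version would typically express the same check as a conditional update and inspect the affected-row count.

```typescript
interface VersionedEntity {
  id: string;
  version: number;
  name: string;
}

// Hypothetical in-memory repository illustrating a conditional save.
class InMemoryRepository {
  private rows: { [id: string]: VersionedEntity } = {};

  insert(entity: VersionedEntity): void {
    this.rows[entity.id] = { ...entity };
  }

  findById(id: string): VersionedEntity | undefined {
    const row = this.rows[id];
    return row ? { ...row } : undefined;
  }

  // Writes only if the stored version still equals expectedVersion.
  saveIfVersion(entity: VersionedEntity, expectedVersion: number): boolean {
    const current = this.rows[entity.id];
    if (!current || current.version !== expectedVersion) {
      return false; // someone else saved first, or the entity does not exist
    }
    this.rows[entity.id] = { ...entity };
    return true;
  }
}

const repo = new InMemoryRepository();
repo.insert({ id: 'e1', version: 1, name: 'original' });
// Two editors both loaded version 1; only the first save wins.
const firstSave = repo.saveIfVersion({ id: 'e1', version: 2, name: 'edit A' }, 1); // true
const secondSave = repo.saveIfVersion({ id: 'e1', version: 2, name: 'edit B' }, 1); // false
```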

Minimizing Consistency Lag

While eventual consistency is acceptable, unnecessary lag is not. Every millisecond of delay impacts user experience. Here are engineering techniques to minimize the consistency window.

Lag Reduction Techniques

•In-Process Event Handling — For single-instance deployments, process events in the same process before returning to the client. Zero network latency.
•Parallel Projection Workers — Scale projection consumers horizontally. Partition events by aggregate ID for parallelism without ordering issues.
•Push-Based Updates — Instead of polling, push events to projections via WebSockets or streaming protocols.
•Projection Co-location — Run projections close to (or on) the write database to reduce network round trips.
•Batch Processing with Micro-Batches — Process events in small batches (10-100) to amortize overhead while keeping lag low.
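
To make the micro-batching technique concrete, the sketch below splits a backlog of events into small batches and performs one write per batch instead of one per event. The helper names `toMicroBatches` and `projectInBatches` are illustrative, and the write callback stands in for whatever bulk upsert the read store offers.

```typescript
// Split a list of events into batches of at most batchSize.
function toMicroBatches<T>(events: T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += batchSize) {
    batches.push(events.slice(i, i + batchSize));
  }
  return batches;
}

// Apply each batch with a single write and return the number of writes made.
function projectInBatches<T>(
  events: T[],
  batchSize: number,
  writeBatch: (batch: T[]) => void,
): number {
  const batches = toMicroBatches(events, batchSize);
  batches.forEach((batch) => writeBatch(batch));
  return batches.length;
}

const sampleEvents: number[] = [];
for (let i = 0; i < 250; i += 1) {
  sampleEvents.push(i);
}
let eventsWritten = 0;
const writeCount = projectInBatches(sampleEvents, 100, (batch) => {
  eventsWritten += batch.length;
});
// writeCount is 3 (batches of 100, 100, and 50); eventsWritten is 250
```

Larger batches amortize more overhead, but the first event in a batch waits for the rest, so batch size trades throughput against lag.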

Parallel Projection Scaling
TypeScript
class ParallelProjectionRunner {
  private workers: ProjectionWorker[];
  private partitionCount: number;
  
  constructor(options: { workerCount: number }) {
    this.partitionCount = options.workerCount;
    this.workers = Array.from({ length: this.partitionCount }, 
      (_, i) => new ProjectionWorker(i)
    );
  }
  
  async processEvent(event: DomainEvent): Promise<void> {
    // Deterministic partitioning by aggregate ID
    // Events for same aggregate always go to same worker
    const partition = this.getPartition(event.aggregateId);
    const worker = this.workers[partition];
    
    await worker.enqueue(event);
  }
  
  private getPartition(aggregateId: string): number {
    // Deterministic hash + modulo: same aggregate → same partition
    // (not consistent hashing: changing partitionCount remaps aggregates)
    const hash = murmurHash(aggregateId);
    return hash % this.partitionCount;
  }
}
 
// With Kafka: Partition by aggregate ID at the broker level
// Consumer group automatically distributes partitions across workers
 
// Consumer configuration
const consumer = kafka.consumer({
  groupId: 'order-projection-consumers',
  // Each partition assigned to one consumer in the group
});
 
await consumer.subscribe({ 
  topic: 'order-events',
  fromBeginning: false 
});
 
await consumer.run({
  // Process up to 3 partitions concurrently; messages within each partition
  // are still handled sequentially, preserving per-aggregate order
  partitionsConsumedConcurrently: 3,
  eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
    for (const message of batch.messages) {
      await processEvent(JSON.parse(message.value.toString()));
      resolveOffset(message.offset);
      await heartbeat();
    }
  },
});
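
The runner above calls `murmurHash` without defining it. As a self-contained sketch of the same hash-then-modulo idea, the block below substitutes a 32-bit FNV-1a hash. The substitution and the helper names `fnv1a` and `partitionFor` are assumptions made for illustration, and the hash works on UTF-16 code units rather than bytes, so it matches the reference algorithm only for ASCII input.

```typescript
// FNV-1a 32-bit hash, used here as a stand-in for murmurHash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i += 1) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 (2^24 + 2^8 + 0x93) via shifts,
    // then wrap to an unsigned 32-bit value.
    hash =
      (hash +
        (hash << 1) +
        (hash << 4) +
        (hash << 7) +
        (hash << 8) +
        (hash << 24)) >>>
      0;
  }
  return hash >>> 0; // unsigned, so the modulo below is never negative
}

// Same aggregate ID always maps to the same partition for a fixed count.
function partitionFor(aggregateId: string, partitionCount: number): number {
  return fnv1a(aggregateId) % partitionCount;
}

const firstLookup = partitionFor('order-1001', 4);
const secondLookup = partitionFor('order-1001', 4); // always equals firstLookup
```

This is plain modulo partitioning: it preserves per-aggregate ordering for a fixed partition count, but changing the count reassigns most aggregates, so resizing needs to be coordinated with in-flight events.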

Ordering vs Parallelism Trade-off

When Strong Consistency Is Required

Signs You Need Strong Consistency:

Financial transactions where race conditions cause monetary loss
Inventory systems where overselling is unacceptable
Identity/access management where stale permissions are a security risk
Regulatory requirements for immediate audit trail visibility
User expectations that cannot be educated away (e.g., banking)

Consistency Requirements by Domain
Domain	Consistency Requirement	Reason	Pattern
E-commerce product catalog	Eventual (seconds OK)	User can refresh; no harm in stale data	Standard CQRS
Shopping cart	Read-your-writes	User expects to see their changes	Session consistency
Inventory count	Varies by business	Depends on oversell tolerance	Hybrid or strong
Payment processing	Strong	Financial atomicity required	Sync projection or no CQRS
User authentication	Strong	Security-critical	Direct read from write store
Social media feed	Eventual (minutes OK)	User tolerates delays	Async projection

Hybrid Approaches:

You don't have to choose all-eventual or all-strong. Use different strategies for different operations within the same system.

Hybrid Consistency Strategy
TypeScript
class QueryRouter {
  async execute<T>(query: Query<T>): Promise<T> {
    // Route based on consistency requirement
    const requirement = this.getConsistencyRequirement(query);
    
    switch (requirement) {
      case 'strong':
        // Read directly from write store
        return this.writeStoreQueryExecutor.execute(query);
        
      case 'session': {
        // Ensure user sees their writes
        const userToken = this.sessionManager.getConsistencyToken();
        await this.waitForPosition(userToken.lastWritePosition);
        return this.readStore.execute(query);
      }
        
      case 'eventual':
      default:
        // Standard read from projection
        return this.readStore.execute(query);
    }
  }
  
  private getConsistencyRequirement(query: Query): ConsistencyLevel {
    // Configuration-driven or query-metadata-driven
    const config = this.consistencyConfig[query.type];
    return config?.consistencyLevel || 'eventual';
  }
}
 
// Configuration example
const consistencyConfig = {
  'GetAccountBalance': { consistencyLevel: 'strong' },      // Financial
  'GetUserPermissions': { consistencyLevel: 'strong' },     // Security
  'GetOrderDetails': { consistencyLevel: 'session' },       // User's own data
  'GetProductCatalog': { consistencyLevel: 'eventual' },    // Public data
  'SearchProducts': { consistencyLevel: 'eventual' },       // Public data
};

Cost of Strong Consistency

Summary: Eventual Consistency

We've explored the realities of eventual consistency in CQRS systems and the strategies for handling it gracefully. Let's consolidate the key insights:

Key Takeaways

•Consistency is a spectrum — From strong to eventual, choose the right level for each use case. One size doesn't fit all.
•Measure the consistency window — Instrument your system to track lag at each stage. You can't improve what you don't measure.
•Read-your-writes is often enough — Users mainly care about seeing their own changes. Session-level consistency solves most complaints.
•Design UX for async — Optimistic updates, transition states, and progressive enhancement make eventual consistency invisible to users.
•Handle anomalies explicitly — Out-of-order events, conflicts, and stale reads will happen. Design for them.
•Minimize lag aggressively — Parallel processing, co-location, and micro-batching reduce lag from seconds to milliseconds.
•Use strong consistency surgically — Some domains require it, but each strong-consistency query limits your scalability.

What's Next:
