In CQRS, the write side is the source of truth. The read side is a projection—a derived view optimized for queries. Between them lies the synchronization layer: the infrastructure that ensures changes on the write side eventually appear on the read side.
This synchronization is deceptively complex. It must be reliable (no lost updates), preserve event ordering, tolerate failures and duplicates, and keep replication lag low enough that users trust what they read.
This page explores the patterns and technologies that make robust synchronization possible at scale.
This page covers the major synchronization approaches: change data capture, event streaming, transaction outbox pattern, and dual-write strategies. You'll learn when to use each, their trade-offs, and implementation patterns for production systems.
There are several fundamental approaches to keeping read models synchronized with write models. Each has distinct characteristics, trade-offs, and appropriate use cases.
The core question: How do changes in the write store get propagated to the read store(s)?
Approach 1: Application-Level Event Publishing
The application explicitly publishes events to a message broker after writing to the database. Projections consume from the broker.
Approach 2: Change Data Capture (CDC)
A separate service monitors the write database's transaction log and publishes changes as events. No application code changes needed.
Approach 3: Transactional Outbox Pattern
Events are written to an 'outbox' table in the same transaction as the write. A separate process reads the outbox and publishes to the broker.
Approach 4: Dual Writes (Anti-pattern)
Writing to both the write store and read store directly. Generally problematic and should be avoided.
| Approach | Reliability | Complexity | Coupling | Use Case |
|---|---|---|---|---|
| Application Events | Medium (at-least-once) | Low | Tight to broker | Event-sourced systems |
| CDC | High (from DB log) | Medium | Decoupled from app | Legacy system integration |
| Transactional Outbox | High (transactional) | Medium-High | Requires outbox table | Financial/critical systems |
| Dual Writes | Low (no atomicity) | Low | Tight to both stores | AVOID in production |
Never write directly to both write and read stores in the same operation. Without distributed transactions, failures after the first write but before the second leave your stores permanently inconsistent. This is a common anti-pattern that causes subtle, hard-to-debug data corruption.
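To make the failure window concrete, here is a minimal sketch of the anti-pattern, using the same `PlaceOrderCommand`/`Order` types as the later examples; the two store clients are illustrative assumptions, not a specific library:

```typescript
// ANTI-PATTERN: dual write. If the process crashes or the second call throws
// after the first write succeeds, the two stores diverge with no record of it.
async function placeOrderDualWrite(
  command: PlaceOrderCommand,
  writeDb: { saveOrder(order: Order): Promise<void> },        // write store (assumed client)
  readDb: { upsertOrderView(order: Order): Promise<void> }    // read store (assumed client)
): Promise<Order> {
  const order = Order.create(command);

  await writeDb.saveOrder(order);       // write 1 commits...
  // A crash, timeout, or deploy here leaves the read store permanently behind.
  await readDb.upsertOrderView(order);  // ...write 2 may never happen

  return order;
}
```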
In this approach, your application code explicitly publishes events after successful writes. This is common in event-sourced systems where events are first-class citizens.
Architecture:
```
┌────────────────────────────────────────────────────────────────────────┐
│ COMMAND HANDLER                                                        │
│                                                                        │
│  1. Validate command                                                   │
│  2. Execute business logic                                             │
│  3. Persist to write store (transaction)                               │
│  4. Publish events to message broker  ←─── Can fail after step 3!      │
│                                                                        │
└─────────────────────────────┬──────────────────────────────────────────┘
                              │
                              ▼  Events
               ┌─────────────────────────────┐
               │       MESSAGE BROKER        │
               │   (Kafka, RabbitMQ, etc.)   │
               └──────────────┬──────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐     ┌───────────────┐
│ Projection 1  │    │ Projection 2  │     │     Other     │
│  (Product     │    │  (Analytics   │     │   Consumers   │
│  Read Model)  │    │   Service)    │     │               │
└───────────────┘    └───────────────┘     └───────────────┘
```
```typescript
class OrderCommandHandler {
  constructor(
    private orderRepository: OrderRepository,
    private eventPublisher: EventPublisher,
    private transactionManager: TransactionManager
  ) {}

  async placeOrder(command: PlaceOrderCommand): Promise<Order> {
    // Steps 1-3 execute within a database transaction
    const order = await this.transactionManager.transaction(async (tx) => {
      const order = Order.create(command);         // 1. Create domain entity
      await this.orderRepository.save(order, tx);  // 2. Persist to write store
      return order;                                // 3. Transaction commits here
    });

    // 4. Publish events AFTER the transaction commits.
    // PROBLEM: if the process crashes or the broker is unavailable at this
    // point, the order exists in the write store but its events are lost.
    const events = order.getUncommittedEvents();
    await this.eventPublisher.publishAll(events);
    order.markEventsAsCommitted();

    return order;
  }
}
```

The Reliability Problem:
With this approach, there's a window between committing to the database and publishing events where failures can cause lost events. If the process crashes after the database commit but before event publishing, the events are never sent.
Mitigations: retry the publish with backoff, run periodic reconciliation jobs that compare the write store against published events, or eliminate the gap entirely with the transactional outbox pattern described below.
In pure event sourcing, events are stored as the source of truth. A separate process can continuously scan the event store and publish to message brokers. This eliminates the dual-write problem because there's only one write—to the event store.
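As a rough sketch of that idea (the event store, broker, and checkpoint interfaces are assumed, not from a specific library), a publisher can poll the event store from the last published position:

```typescript
interface StoredEvent { type: string; globalPosition: number; payload: unknown; }

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

// Single source of truth (the event store) plus a forwarder to the broker.
// Checkpointing the position makes restarts safe (at-least-once delivery).
class EventStorePublisher {
  constructor(
    private eventStore: { readFrom(position: number, limit: number): Promise<StoredEvent[]> },
    private broker: { publish(topic: string, event: StoredEvent): Promise<void> },
    private checkpoints: { load(): Promise<number>; save(position: number): Promise<void> }
  ) {}

  async run(): Promise<void> {
    let position = await this.checkpoints.load();
    while (true) {
      const events = await this.eventStore.readFrom(position, 100);
      if (events.length === 0) { await sleep(100); continue; }

      for (const event of events) {
        await this.broker.publish(event.type, event); // may duplicate on crash; consumers deduplicate
        position = event.globalPosition;
        await this.checkpoints.save(position);
      }
    }
  }
}
```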
Change Data Capture (CDC) monitors the database's transaction log (WAL in PostgreSQL, binlog in MySQL) and streams changes as events. This approach is non-invasive—no application code changes required.
How CDC Works: the CDC connector tails the database's transaction log, turns each committed row change into an event (insert, update, or delete), and publishes it to a message broker; downstream consumers apply those events to their read models.
Popular CDC Tools:
| Tool | Databases Supported | Output Format | Key Features |
|---|---|---|---|
| Debezium | PostgreSQL, MySQL, MongoDB, SQL Server, Oracle | Kafka | Open source, mature, widely adopted |
| AWS DMS | Most major databases | Kinesis, Kafka | Managed service, migration + streaming |
| Fivetran / Airbyte | 200+ sources | Various destinations | SaaS, ELT focus |
| Logical decoding (pgoutput) | PostgreSQL only | Custom | Built-in PostgreSQL feature |
| Maxwell | MySQL only | Kafka, Kinesis, etc. | Lightweight, MySQL specific |
```
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│   PostgreSQL     │ WAL  │    Debezium      │      │      Kafka       │
│                  │─────▶│    Connector     │─────▶│                  │
│  orders          │      │                  │      │ dbserver.public. │
│  customers       │      │  • Reads WAL     │      │   orders         │
│  products        │      │  • Transforms    │      │                  │
│                  │      │  • Publishes     │      │ dbserver.public. │
└──────────────────┘      └──────────────────┘      │   customers      │
                                                    └────────┬─────────┘
                                                             │
                                      ┌──────────────────────┴────────────┐
                                      │                                   │
                                      ▼                                   ▼
                            ┌──────────────────┐              ┌──────────────────┐
                            │ Order Projection │              │  Search Indexer  │
                            │                  │              │                  │
                            │ Consumes events  │              │ Updates          │
                            │ Updates read DB  │              │ Elasticsearch    │
                            └──────────────────┘              └──────────────────┘
```
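A connector like the one in the diagram is registered through the Kafka Connect REST API. The sketch below assumes Debezium 2.x property names; the hostnames, credentials, and connector name are illustrative:

```typescript
// Sketch: register a Debezium PostgreSQL connector via Kafka Connect's REST API.
const connectorConfig = {
  name: 'orders-cdc',
  config: {
    'connector.class': 'io.debezium.connector.postgresql.PostgresConnector',
    'plugin.name': 'pgoutput',            // PostgreSQL's built-in logical decoding plugin
    'database.hostname': 'postgres',
    'database.port': '5432',
    'database.user': 'debezium',
    'database.password': 'secret',
    'database.dbname': 'ecommerce',
    'topic.prefix': 'dbserver',           // topics become dbserver.public.orders, etc.
    'table.include.list': 'public.orders,public.customers,public.products',
  },
};

await fetch('http://connect:8083/connectors', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(connectorConfig),
});
```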
{ "schema": { /* Kafka Connect schema */ }, "payload": { "before": null, // Previous state (null for INSERT) "after": { // New state "id": 12345, "customer_id": 789, "total_amount": 99.99, "status": "placed", "created_at": 1704897600000 }, "source": { "version": "2.4.0", "connector": "postgresql", "name": "dbserver", "ts_ms": 1704897600123, // Transaction timestamp "snapshot": "false", "db": "ecommerce", "schema": "public", "table": "orders", "txId": 987654, // Transaction ID "lsn": 23456789, // Log sequence number "xmin": null }, "op": "c", // Operation: c=create, u=update, d=delete, r=read (snapshot) "ts_ms": 1704897600200, "transaction": { "id": "987654:23456789", "total_order": 1, "data_collection_order": 1 } }}Advantages of CDC:
Challenges with CDC: events mirror table rows rather than business intent, schema changes in the write database leak into every consumer, initial snapshots of existing data must be managed, and the connector itself is additional infrastructure to operate and monitor.
CDC captures data changes: 'Order table row inserted.' Domain events capture intent: 'Customer placed an order.' If your projections need rich business context (e.g., who initiated, why, related operations), domain events are preferable. CDC excels for technical synchronization without semantic enrichment.
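To make the distinction concrete, the sketch below contrasts the two shapes; the field names are illustrative, not from a specific schema:

```typescript
// A CDC event describes the row that changed...
const cdcEvent = {
  op: 'c',
  table: 'orders',
  after: { id: 12345, customer_id: 789, total_amount: 99.99, status: 'placed' },
};

// ...while a domain event describes what happened, by whom, and in what context.
const domainEvent = {
  type: 'OrderPlaced',
  orderId: '12345',
  placedBy: 'customer-789',                  // who initiated
  channel: 'web-checkout',                   // context / why
  items: [{ sku: 'WIDGET-1', quantity: 2 }], // related operations
  occurredAt: '2024-01-10T14:00:00Z',
};
```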
The transactional outbox pattern solves the dual-write problem by storing events in the same database transaction as the entity change. An outbox table holds pending events; a separate process publishes them to the message broker.
How It Works: the command handler writes the entity change and one row per domain event to an outbox table inside a single database transaction; a separate relay process reads unpublished outbox rows, publishes them to the message broker, and marks them as published.
This guarantees: if the entity is saved, the event is also saved. No events can be lost due to crashes between steps.
```sql
CREATE TABLE outbox (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Event metadata
    aggregate_type  VARCHAR(255) NOT NULL,   -- e.g., 'Order'
    aggregate_id    VARCHAR(255) NOT NULL,   -- e.g., order ID
    event_type      VARCHAR(255) NOT NULL,   -- e.g., 'OrderPlaced'

    -- Event payload
    payload         JSONB NOT NULL,

    -- Tracking
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    published_at    TIMESTAMPTZ,             -- NULL until published

    -- For ordering and deduplication
    sequence_number BIGSERIAL
);

-- Indexes for relay processing (PostgreSQL requires separate CREATE INDEX statements)
CREATE INDEX idx_unpublished ON outbox (published_at) WHERE published_at IS NULL;
CREATE INDEX idx_aggregate   ON outbox (aggregate_type, aggregate_id);
```
```typescript
class OrderCommandHandler {
  async placeOrder(command: PlaceOrderCommand): Promise<Order> {
    return await this.transactionManager.transaction(async (tx) => {
      // 1. Create and save order
      const order = Order.create(command);
      await this.orderRepository.save(order, tx);

      // 2. Write events to outbox in SAME transaction
      const events = order.getUncommittedEvents();
      await this.outboxRepository.insertAll(events.map(event => ({
        aggregateType: 'Order',
        aggregateId: order.id,
        eventType: event.type,
        payload: event.toJSON()
      })), tx);

      // 3. Transaction commits both atomically
      // Events are now safely in the outbox; the relay will publish them
      return order;
    });
  }
}

// Outbox relay process (runs separately)
class OutboxRelay {
  async processOutbox(): Promise<void> {
    while (true) {
      const events = await this.outboxRepository.getUnpublished({ limit: 100 });

      if (events.length === 0) {
        await sleep(100); // Polling interval
        continue;
      }

      for (const event of events) {
        try {
          // Publish to message broker
          await this.messageBroker.publish(
            `${event.aggregateType}.${event.eventType}`,
            event.payload,
            {
              messageId: event.id, // For deduplication
              headers: {
                'aggregate-id': event.aggregateId,
                'sequence': event.sequenceNumber.toString()
              }
            }
          );

          // Mark as published
          await this.outboxRepository.markPublished(event.id);
        } catch (error) {
          // Log error; will retry on next poll
          console.error(`Failed to publish event ${event.id}: ${error}`);
        }
      }
    }
  }
}
```

Optimizations for Outbox Relay:
Use CDC on the outbox table: Instead of polling, use Debezium to capture outbox inserts and stream to Kafka. Eliminates polling latency and load.
Batched publishing: Publish multiple events in a single broker request.
Partitioned processing: Multiple relay instances process different aggregate types or ID ranges.
Cleanup job: Periodically delete old published events to prevent table bloat.
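For the cleanup job, a minimal sketch might look like the following, assuming a node-postgres style `db.query` helper and a 7-day retention window (both are assumptions for illustration):

```typescript
// Delete outbox rows that were published more than 7 days ago.
async function cleanupOutbox(
  db: { query(sql: string, params: unknown[]): Promise<unknown> }
): Promise<void> {
  await db.query(
    `DELETE FROM outbox
     WHERE published_at IS NOT NULL
       AND published_at < NOW() - $1::interval`,
    ['7 days']
  );
}
```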
For strong read-your-writes consistency, the writing service can be a consumer of its own events. After writing to outbox, it consumes from the broker and updates its local read cache. This ensures the same instance that wrote data can immediately read it, while other instances get eventual consistency.
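A rough sketch of that idea, assuming a kafkajs-style consumer and an in-process cache (the topic and event fields are illustrative):

```typescript
interface OrderView { orderId: string; status: string; }

// The writing service also consumes its own events and refreshes a local
// read cache, so the instance that handled the write can serve the result
// immediately; other instances converge via normal eventual consistency.
class LocalReadCacheUpdater {
  private cache = new Map<string, OrderView>();

  constructor(private consumer: import('kafkajs').Consumer) {}

  async start(): Promise<void> {
    await this.consumer.subscribe({ topic: 'order-events', fromBeginning: false });
    await this.consumer.run({
      eachMessage: async ({ message }) => {
        const event = JSON.parse(message.value!.toString());
        this.cache.set(event.orderId, { orderId: event.orderId, status: event.status });
      },
    });
  }

  getOrder(orderId: string): OrderView | undefined {
    return this.cache.get(orderId);
  }
}
```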
The message broker is the backbone of CQRS synchronization. Choosing the right broker significantly impacts reliability, performance, and operational complexity.
Key Selection Criteria: ordering guarantees, retention (can you replay history to rebuild a projection?), consumer-group support for multiple independent projections, throughput and latency, and operational overhead (managed service vs. self-hosted).
| Broker | Ordering | Retention | Consumer Groups | Best For |
|---|---|---|---|---|
| Apache Kafka | Per-partition | Configurable (days-forever) | Yes, with rebalancing | High-throughput, event sourcing |
| Amazon Kinesis | Per-shard | 24h - 365 days | Yes, with KCL | AWS-native, managed |
| Amazon SQS | FIFO mode only | Up to 14 days | No native groups | Simple queuing, low throughput |
| RabbitMQ | Per-queue | Until consumed (persistent) | Competing consumers | Traditional messaging, routing |
| Google Pub/Sub | Per-subscription | 7 days default | Yes | GCP-native, global |
| Azure Event Hubs | Per-partition | 1-7 days (standard) | Yes | Azure-native, Kafka-compatible |
| Pulsar | Per-topic/partition | Tiered (hot to cold) | Yes | Multi-tenancy, tiered storage |
Kafka Deep Dive (Most Common Choice):
Kafka is the de-facto standard for event streaming in CQRS systems. Key concepts: topics are split into partitions, and ordering is guaranteed only within a partition; consumer groups let each projection track its own offsets and scale horizontally; offsets can be rewound to replay history and rebuild projections; retention and compaction policies control how long events remain replayable.
```typescript
// Producer configuration for reliable publishing
const producer = kafka.producer({
  idempotent: true,          // Prevent duplicate messages on retry
  maxInFlightRequests: 5,    // Balance throughput and ordering
  transactionTimeout: 60000, // For exactly-once semantics
});

// Topic configuration for CQRS events (passed to admin.createTopics)
const topicConfig = {
  topic: 'order-events',
  numPartitions: 12,         // Parallelism for consumers
  replicationFactor: 3,      // Durability across broker failures
  configEntries: [
    // Compaction for event sourcing (keep latest per key)
    // OR deletion for pure event streaming
    { name: 'cleanup.policy', value: 'delete' },  // 'compact' for event store pattern
    { name: 'retention.ms', value: String(7 * 24 * 60 * 60 * 1000) }, // 7 days retention
    { name: 'min.insync.replicas', value: '2' },  // Require 2 replicas for durability
  ],
};

// Consumer configuration for projections
const consumer = kafka.consumer({
  groupId: 'order-projection-v1', // Change the group ID to rebuild a projection
  // Tune for latency vs throughput
  sessionTimeout: 30000,
  heartbeatInterval: 3000,
  maxWaitTimeInMs: 500,
});

// Start from earliest to rebuild, or latest for new projections
await consumer.subscribe({ topic: 'order-events', fromBeginning: true });

await consumer.run({
  autoCommit: false, // Manual commits for at-least-once
  eachMessage: async ({ topic, partition, message }) => {
    const event = JSON.parse(message.value.toString());

    // Process the event
    await projection.apply(event);

    // Commit offset after successful processing
    await consumer.commitOffsets([{
      topic,
      partition,
      offset: (Number(message.offset) + 1).toString(),
    }]);
  },
});
```

Events for the same aggregate MUST go to the same Kafka partition to guarantee ordering. Use the aggregate ID as the message key. If OrderPlaced and OrderShipped for the same order go to different partitions, they might be processed out of order.
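A small sketch of that rule, using the producer and topic configured above (the event object is illustrative): key every message by the aggregate ID when producing.

```typescript
// All events for one order share a key, so they land on the same partition
// and are consumed in order.
const orderEvent = { type: 'OrderShipped', orderId: 'order-123', shippedAt: Date.now() };

await producer.send({
  topic: 'order-events',
  messages: [{
    key: orderEvent.orderId,           // partition key = aggregate ID
    value: JSON.stringify(orderEvent),
  }],
});
```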
With at-least-once delivery, the same event may be delivered multiple times. Projections must be idempotent—processing the same event twice should produce the same result as processing it once.
Why Duplicates Happen: the producer retries after a timed-out acknowledgement that actually succeeded, a consumer crashes after applying an event but before committing its offset, or a consumer-group rebalance replays messages past the last committed offset.
```typescript
// Pattern 1: Event ID tracking
class IdempotentProjection {
  async apply(event: DomainEvent): Promise<void> {
    // Check if already processed
    const processed = await this.eventTracker.isProcessed(event.id);
    if (processed) {
      console.log(`Skipping duplicate event: ${event.id}`);
      return;
    }

    // Process event
    await this.doApply(event);

    // Mark as processed (atomically with the update if possible)
    await this.eventTracker.markProcessed(event.id);
  }
}

// Pattern 2: Version/sequence-based idempotency
class VersionedProjection {
  async apply(event: OrderEvent): Promise<void> {
    const current = await this.readStore.getOrder(event.orderId);

    // Version check: only apply if event is newer
    if (current && event.version <= current.lastAppliedVersion) {
      console.log(`Skipping outdated event: v${event.version} <= v${current.lastAppliedVersion}`);
      return;
    }

    // Apply and update version
    const updated = this.applyEvent(current, event);
    updated.lastAppliedVersion = event.version;
    await this.readStore.save(updated);
  }
}

// Pattern 3: Conditional/upsert operations
class ConditionalProjection {
  async handleOrderPlaced(event: OrderPlacedEvent): Promise<void> {
    // Use database upsert - naturally idempotent
    await this.readStore.upsertOrder({
      id: event.orderId,
      customerId: event.customerId,
      items: event.items,
      status: 'placed',
      placedAt: event.timestamp,
    });
  }

  async handleInventoryDecremented(event: InventoryDecrementedEvent): Promise<void> {
    // Use conditional update - only apply once
    const result = await this.readStore.updateInventoryIfEventNotApplied(
      event.productId,
      event.eventId,
      -event.quantity
    );

    if (result.rowsAffected === 0) {
      console.log(`Event already applied: ${event.eventId}`);
    }
  }
}
```

SQL implementation of the conditional update:

```sql
-- Table has an 'applied_events' JSONB column tracking processed event IDs
UPDATE inventory_read_model
SET quantity = quantity - $quantity,
    applied_events = applied_events || jsonb_build_array($eventId)
WHERE product_id = $productId
  AND NOT applied_events ? $eventId; -- Only if not already applied
```

Projections will fail. Database timeouts, malformed events, bugs in projection logic—all cause processing failures. Robust error handling prevents these from blocking the entire projection pipeline.
The Dead Letter Queue (DLQ) Pattern:
Events that repeatedly fail processing are moved to a 'dead letter queue' for manual investigation. This prevents one bad event from blocking all subsequent events.
```typescript
interface RetryPolicy {
  maxRetries: number;
  backoffMs: number;
  backoffMultiplier: number;
  maxBackoffMs: number;
}

class ResilientProjectionConsumer {
  private retryPolicy: RetryPolicy = {
    maxRetries: 5,
    backoffMs: 100,
    backoffMultiplier: 2,
    maxBackoffMs: 30000
  };

  async processEvent(event: DomainEvent): Promise<void> {
    let attempt = 0;
    let lastError: Error;

    while (attempt < this.retryPolicy.maxRetries) {
      try {
        await this.projection.apply(event);
        return; // Success
      } catch (error) {
        lastError = error;
        attempt++;

        // Classify error
        if (this.isPermanentFailure(error)) {
          // Don't retry permanent failures (e.g., validation errors)
          break;
        }

        if (attempt < this.retryPolicy.maxRetries) {
          // Exponential backoff
          const backoffMs = Math.min(
            this.retryPolicy.backoffMs * Math.pow(this.retryPolicy.backoffMultiplier, attempt - 1),
            this.retryPolicy.maxBackoffMs
          );
          console.log(`Retry ${attempt}/${this.retryPolicy.maxRetries} in ${backoffMs}ms for event ${event.id}`);
          await sleep(backoffMs);
        }
      }
    }

    // All retries exhausted - send to DLQ
    await this.sendToDeadLetterQueue(event, lastError);
  }

  private isPermanentFailure(error: Error): boolean {
    // Errors that won't succeed on retry
    return (
      error instanceof ValidationError ||
      error instanceof MalformedEventError ||
      error instanceof BusinessRuleViolation
    );
  }

  private async sendToDeadLetterQueue(event: DomainEvent, error: Error): Promise<void> {
    const dlqMessage = {
      originalEvent: event,
      error: {
        message: error.message,
        stack: error.stack,
        name: error.name
      },
      failedAt: new Date().toISOString(),
      projectionName: this.projection.name,
      attempts: this.retryPolicy.maxRetries
    };

    // Publish to DLQ topic
    await this.dlqProducer.publish('projection-dlq', dlqMessage);

    // Increment DLQ metric for alerting
    this.metrics.increment('projection.dlq.count', {
      projection: this.projection.name,
      eventType: event.type
    });

    console.error(`Event sent to DLQ: ${event.id}`, error);
  }
}

// DLQ processing service (manual or automated)
class DLQProcessor {
  async replayEvent(dlqMessageId: string): Promise<void> {
    const dlqMessage = await this.dlqStore.getMessage(dlqMessageId);

    // Attempt to reprocess (after fixing the underlying issue)
    await this.projectionConsumer.processEvent(dlqMessage.originalEvent);

    // If successful, remove from DLQ
    await this.dlqStore.markResolved(dlqMessageId);
  }

  async skipEvent(dlqMessageId: string, reason: string): Promise<void> {
    // Mark as intentionally skipped (e.g., corrupted event)
    await this.dlqStore.markSkipped(dlqMessageId, reason);
  }
}
```

Every event in the DLQ represents a gap in your read model. Set up alerts for DLQ depth and age. Old messages in the DLQ mean users are seeing stale data. Treat DLQ events as incidents requiring investigation and resolution.
We've covered the critical infrastructure that keeps read models synchronized with write models. The key insights: dual writes without atomicity are an anti-pattern; the transactional outbox and CDC both provide reliable, at-least-once propagation; ordering requires careful partition keying by aggregate ID; projections must be idempotent because duplicates are inevitable; and retries with a dead letter queue keep one bad event from stalling the pipeline.
What's Next:
With the mechanics of CQRS understood—command/query separation, read model optimization, eventual consistency, and synchronization—we can now step back and ask: When should you actually use CQRS? The final page explores decision criteria, anti-patterns, and real-world case studies to help you determine if CQRS is the right choice for your system.
You now understand the synchronization infrastructure that powers CQRS systems. From transactional outbox to CDC to message broker selection, you have the knowledge to build reliable pipelines that keep read models consistently synchronized with write models.