Every event in an event-driven architecture has two fundamental participants: something that produces the event and something that consumes it. These two roles—producer and consumer—form the core interaction pattern of EDA.
Understanding producers and consumers goes far beyond knowing that one sends and one receives. Each role has distinct responsibilities, design considerations, scaling characteristics, and failure modes. A well-designed event-driven system requires careful attention to both sides of the event flow.
This page provides a comprehensive examination of producers and consumers: what they do, how to design them, how to scale them, and how to handle the inevitable failures that occur in distributed systems.
By the end of this page, you will understand the responsibilities and design patterns for both producers and consumers, how to handle failures and ensure reliability, strategies for scaling each independently, and common pitfalls to avoid when building event-driven components.
An event producer is any component that detects state changes or significant occurrences and publishes events describing what happened. Producers are the origin point for all information flowing through an event-driven system.
Core Responsibilities of Producers:
What Producers Don't Do:
Equally important is understanding what producers should not do:
This deliberate ignorance is what enables loose coupling. The producer's job ends when the event is durably stored by the broker.
Think of the producer as a journalist. Their job is to accurately report what happened with all relevant details. They don't control who reads the newspaper, what readers do with the information, or whether readers agree. They report facts; reactions are someone else's concern.
Several design patterns help structure how producers create and publish events reliably.
Pattern 1: Transactional Outbox
The Transactional Outbox pattern solves a critical problem: how do you ensure that a database change and an event publication either both happen or neither happens?
The Problem: If you first update your database and then publish an event, the event might fail after the database commit. If you first publish and then update, the database might fail after the event is sent. Either way, the system becomes inconsistent.
The Solution: Write the event to an 'outbox' table in the same database transaction as your business data. A separate process reads the outbox and publishes events, marking them as sent. If publishing fails, it retries. The database transaction guarantees atomicity.
```typescript
// Transactional Outbox Implementation
async function placeOrder(orderData: OrderData): Promise<Order> {
  return await prisma.$transaction(async (tx) => {
    // 1. Create the order (business logic)
    const order = await tx.order.create({
      data: {
        customerId: orderData.customerId,
        items: orderData.items,
        totalAmount: orderData.totalAmount,
        status: 'PLACED'
      }
    });

    // 2. Write event to outbox table (same transaction!)
    await tx.outboxEvent.create({
      data: {
        eventType: 'OrderPlaced',
        aggregateId: order.id,
        aggregateType: 'Order',
        payload: JSON.stringify({
          orderId: order.id,
          customerId: order.customerId,
          items: order.items,
          totalAmount: order.totalAmount,
          placedAt: new Date().toISOString()
        }),
        createdAt: new Date(),
        published: false // Not yet published
      }
    });

    return order;
  });
  // Transaction commits both OR rolls back both
  // Separate publisher process reads outbox and publishes
}
```

Pattern 2: Change Data Capture (CDC)
CDC uses database transaction logs to detect changes and automatically produce events without modifying application code.
How it works:
Advantages: No application code changes; guaranteed to capture all changes; captures changes from any source (including direct SQL).
Disadvantages: Events reflect raw database changes, not business semantics; requires infrastructure investment.
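Because CDC emits raw row-level changes rather than business events, teams often add a thin translation layer between the capture tool and the event stream. Below is a minimal sketch, assuming a Debezium-style change shape; the `ChangeRecord` fields, table names, and `toDomainEvent` helper are illustrative assumptions, not the API of any specific CDC tool.

```typescript
// Hypothetical CDC change record (shape is an assumption, loosely
// modeled on Debezium-style envelopes, not any tool's exact schema)
interface ChangeRecord {
  op: 'c' | 'u' | 'd';                      // create, update, delete
  before: Record<string, unknown> | null;   // row state before the change
  after: Record<string, unknown> | null;    // row state after the change
  source: { table: string; ts_ms: number };
}

interface DomainEvent {
  type: string;
  payload: Record<string, unknown>;
  occurredAt: string;
}

// Translate a raw row mutation into a business-meaningful event.
// Without this layer, consumers see table changes, not domain semantics.
function toDomainEvent(change: ChangeRecord): DomainEvent | null {
  if (change.source.table === 'orders' && change.op === 'c') {
    return {
      type: 'OrderPlaced',
      payload: change.after ?? {},
      occurredAt: new Date(change.source.ts_ms).toISOString()
    };
  }
  if (
    change.source.table === 'orders' &&
    change.op === 'u' &&
    change.before?.status !== 'CANCELLED' &&
    change.after?.status === 'CANCELLED'
  ) {
    return {
      type: 'OrderCancelled',
      payload: { orderId: change.after?.id },
      occurredAt: new Date(change.source.ts_ms).toISOString()
    };
  }
  return null; // Changes with no business mapping stay internal
}
```

The mapping is deliberately explicit: only transitions you name become public events, which keeps internal schema churn from leaking into consumers.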
Pattern 3: Domain Event Publisher
In Domain-Driven Design, domain events are first-class citizens raised by aggregates when business-significant changes occur.
How it works:
```typescript
// Domain Event Publisher Pattern
class Order {
  private domainEvents: DomainEvent[] = [];

  place(customerId: string, items: OrderItem[]): void {
    // Business logic
    this.status = OrderStatus.PLACED;
    this.customerId = customerId;
    this.items = items;

    // Record domain event (not yet published)
    this.domainEvents.push(new OrderPlacedEvent({
      orderId: this.id,
      customerId: this.customerId,
      items: this.items,
      totalAmount: this.calculateTotal(),
      occurredAt: new Date()
    }));
  }

  pullDomainEvents(): DomainEvent[] {
    const events = [...this.domainEvents];
    this.domainEvents = [];
    return events;
  }
}

// Application service publishes after commit
async function handlePlaceOrder(command: PlaceOrderCommand): Promise<void> {
  const order = new Order();
  order.place(command.customerId, command.items);

  await orderRepository.save(order);

  // Publish collected domain events
  const events = order.pullDomainEvents();
  await eventPublisher.publishAll(events);
}
```

Producers must handle various failure scenarios gracefully. The broker may be unavailable, networks may fail, and events may be malformed.
| Failure Scenario | Impact | Mitigation Strategy |
|---|---|---|
| Broker unavailable | Events cannot be published | Retry with exponential backoff; buffer locally; circuit breaker |
| Network partition | Requests hang or timeout | Configure appropriate timeouts; failure detection; fallback queues |
| Message too large | Rejected by broker | Validate size; compress; split into chunks; store payload externally |
| Serialization failure | Event cannot be created | Schema validation; catch exceptions; log and alert |
| Partition assignment failure | Events go to wrong partition | Validate partition keys; consistent hashing; monitoring |
Delivery Guarantees:
Producers can configure different levels of delivery guarantee:
Fire-and-Forget (at-most-once): Publish and don't wait for acknowledgment. Fastest, but events may be lost.
Wait for Leader Ack (at-least-once): Wait for the broker leader to acknowledge receipt. Good balance of speed and reliability.
Wait for All Replicas (exactly-once capable): Wait for all in-sync replicas to acknowledge. Most reliable, slowest. With idempotent producers, can achieve exactly-once semantics.
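These three levels map naturally onto Kafka-style producer acknowledgment settings, where an `acks` value of 0, 1, or -1 (all in-sync replicas) selects the guarantee. The sketch below is illustrative: the `acks` values follow Kafka's convention, but `settingsFor` and the `ProducerSettings` shape are assumed helper names, not a real client API.

```typescript
type DeliveryGuarantee = 'at-most-once' | 'at-least-once' | 'exactly-once';

interface ProducerSettings {
  acks: 0 | 1 | -1;     // -1 means "all in-sync replicas" in Kafka's convention
  idempotent: boolean;  // broker de-duplicates retried sends
}

// Map each delivery guarantee to Kafka-style producer settings.
function settingsFor(guarantee: DeliveryGuarantee): ProducerSettings {
  switch (guarantee) {
    case 'at-most-once':
      return { acks: 0, idempotent: false };  // fire-and-forget
    case 'at-least-once':
      return { acks: 1, idempotent: false };  // wait for leader ack
    case 'exactly-once':
      return { acks: -1, idempotent: true };  // all replicas + idempotent producer
  }
}
```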
```typescript
// Robust producer with retry and circuit breaker
class ResilientEventPublisher {
  private circuitBreaker: CircuitBreaker;
  private retryPolicy: RetryPolicy;

  constructor(private kafka: Kafka) {
    this.circuitBreaker = new CircuitBreaker({
      failureThreshold: 5,
      resetTimeout: 30000 // 30 seconds
    });
    this.retryPolicy = new RetryPolicy({
      maxRetries: 3,
      baseDelay: 100,  // 100ms
      maxDelay: 5000,  // 5 seconds
      exponentialBackoff: true
    });
  }

  async publish(event: DomainEvent): Promise<void> {
    // Check circuit breaker first
    if (this.circuitBreaker.isOpen()) {
      await this.bufferForLater(event);
      throw new CircuitOpenError('Broker circuit is open');
    }

    try {
      await this.retryPolicy.execute(async () => {
        await this.kafka.producer.send({
          topic: event.topic,
          messages: [{
            key: event.aggregateId,
            value: JSON.stringify(event.payload),
            headers: {
              'event-type': event.type,
              'correlation-id': event.correlationId,
              'timestamp': event.timestamp.toISOString()
            }
          }]
        });
      });
      this.circuitBreaker.recordSuccess();
    } catch (error) {
      this.circuitBreaker.recordFailure();
      await this.bufferForLater(event);
      throw error;
    }
  }

  private async bufferForLater(event: DomainEvent): Promise<void> {
    // Store in local database or file for retry by background process
    await this.localBuffer.store(event);
  }
}
```

For critical business events, the cost of losing an event often exceeds the complexity of ensuring delivery. Use the transactional outbox pattern to guarantee that committed transactions always produce their events, even if the broker is temporarily unavailable.
An event consumer is any component that subscribes to event streams, receives events, and takes action based on what happened. Consumers are where events create value—where business reactions occur.
Core Responsibilities of Consumers:
| Aspect | Push Model | Pull Model |
|---|---|---|
| Who initiates | Broker pushes to consumer | Consumer pulls from broker |
| Flow control | Broker controls rate | Consumer controls rate |
| Backpressure | Requires explicit signaling | Natural—consumer pulls when ready |
| Latency | Lower—immediate delivery | Higher—polling interval |
| Complexity | Consumer must handle burst | Consumer manages its own pace |
| Examples | RabbitMQ push, webhooks | Kafka polling, SQS polling |
Most modern event streaming platforms (Kafka, Pulsar) use a pull model specifically because it gives consumers control over backpressure. The consumer asks for events when it is ready, avoiding the overload that occurs when a broker pushes faster than the consumer can process.
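The backpressure property of the pull model can be seen in a toy simulation: a fast producer only makes the log longer; it never forces work onto the consumer. `InMemoryPartition` and `drain` below are illustrative stand-ins for a broker partition and a poll loop, not a real client library.

```typescript
// Minimal in-memory stand-in for a broker partition (illustrative only)
class InMemoryPartition<T> {
  private log: T[] = [];
  append(event: T): void { this.log.push(event); }
  // Pull model: the CONSUMER decides when to fetch and how much
  fetch(offset: number, maxBatch: number): T[] {
    return this.log.slice(offset, offset + maxBatch);
  }
}

// A pull-based consumer processes at its own pace. Returns the
// final "committed" offset once it has caught up with the log.
function drain<T>(
  partition: InMemoryPartition<T>,
  handle: (event: T) => void,
  maxBatch = 2
): number {
  let offset = 0;
  while (true) {
    const batch = partition.fetch(offset, maxBatch);
    if (batch.length === 0) break; // caught up; a real consumer would poll again
    batch.forEach(handle);
    offset += batch.length;        // advance the committed offset
  }
  return offset;
}
```

A real consumer would loop forever with a poll timeout, but the flow-control idea is the same: nothing arrives until the consumer asks for it.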
Several patterns help structure reliable, scalable event consumers.
Pattern 1: Idempotent Consumer
In distributed systems with at-least-once delivery, the same event may be delivered multiple times. Consumers must be idempotent—processing the same event twice has the same effect as processing once.
Implementation strategies:
```typescript
// Idempotent consumer using event ID tracking
class IdempotentOrderConsumer {
  async handle(event: OrderPlacedEvent): Promise<void> {
    // Check if already processed
    const existing = await this.processedEvents.find(event.eventId);
    if (existing) {
      console.log(`Skipping duplicate event: ${event.eventId}`);
      return; // Already processed—idempotent skip
    }

    // Process within transaction
    await this.db.$transaction(async (tx) => {
      // Business logic
      await tx.inventory.update({
        where: { productId: event.productId },
        data: { reserved: { increment: event.quantity } }
      });

      // Record as processed (prevents duplicate processing)
      await tx.processedEvent.create({
        data: {
          eventId: event.eventId,
          eventType: 'OrderPlaced',
          processedAt: new Date()
        }
      });
    });
  }
}

// Alternative: Natural idempotency via upsert
async function handlePaymentConfirmed(event: PaymentConfirmedEvent): Promise<void> {
  // Upsert is naturally idempotent—same result regardless of attempts
  await db.orderPayment.upsert({
    where: { orderId: event.orderId },
    create: {
      orderId: event.orderId,
      paymentId: event.paymentId,
      amount: event.amount,
      confirmedAt: event.timestamp
    },
    update: {
      // Same data—no adverse effect on duplicate
      paymentId: event.paymentId,
      amount: event.amount,
      confirmedAt: event.timestamp
    }
  });
}
```

Pattern 2: Competing Consumers
For high-volume topics, multiple consumer instances can share the workload. The broker ensures each event is delivered to only one consumer in the group.
How it works:
Scaling considerations:
Pattern 3: Consumer with State Machine
Complex consumers often need to track state across multiple events. A state machine helps manage these transitions cleanly.
```typescript
// State machine for order lifecycle
enum OrderState {
  PENDING = 'PENDING',
  PAID = 'PAID',
  SHIPPED = 'SHIPPED',
  DELIVERED = 'DELIVERED',
  CANCELLED = 'CANCELLED'
}

type OrderTransition = {
  from: OrderState[];
  to: OrderState;
  event: string;
};

const orderTransitions: OrderTransition[] = [
  { from: [OrderState.PENDING], to: OrderState.PAID, event: 'PaymentConfirmed' },
  { from: [OrderState.PAID], to: OrderState.SHIPPED, event: 'OrderShipped' },
  { from: [OrderState.SHIPPED], to: OrderState.DELIVERED, event: 'OrderDelivered' },
  { from: [OrderState.PENDING, OrderState.PAID], to: OrderState.CANCELLED, event: 'OrderCancelled' },
];

class OrderStateMachineConsumer {
  async handle(event: DomainEvent): Promise<void> {
    const order = await this.orderRepo.findById(event.orderId);
    if (!order) return; // Order not found

    const transition = orderTransitions.find(
      t => t.event === event.type && t.from.includes(order.state)
    );

    if (!transition) {
      console.warn(
        `Invalid transition: ${event.type} from ${order.state}`
      );
      return; // Invalid transition—log and skip
    }

    await this.orderRepo.update({
      id: event.orderId,
      state: transition.to,
      lastEventId: event.eventId,
      updatedAt: new Date()
    });
  }
}
```

Consumers face challenging failure scenarios. Processing may fail, events may be malformed, or downstream services may be unavailable.
```typescript
// Consumer with retry and DLQ support
class ResilientConsumer {
  private readonly maxRetries = 3;
  private readonly dlqTopic: string;

  async processMessage(message: KafkaMessage): Promise<void> {
    const retryCount = this.getRetryCount(message);

    try {
      await this.handleEvent(message);
    } catch (error) {
      if (this.isRetryable(error) && retryCount < this.maxRetries) {
        // Retry with backoff
        await this.scheduleRetry(message, retryCount + 1);
      } else {
        // Max retries exceeded or non-retryable error
        await this.sendToDLQ(message, error);
      }
    }
  }

  private isRetryable(error: Error): boolean {
    // Database connection errors, network timeouts = retry
    // Validation errors, business rule violations = don't retry
    return error instanceof TransientError ||
      error instanceof ConnectionError ||
      error instanceof TimeoutError;
  }

  private async sendToDLQ(message: KafkaMessage, error: Error): Promise<void> {
    await this.kafka.producer.send({
      topic: this.dlqTopic,
      messages: [{
        key: message.key,
        value: message.value,
        headers: {
          ...message.headers,
          'dlq-reason': error.message,
          'dlq-timestamp': new Date().toISOString(),
          'original-topic': message.topic,
          'retry-count': String(this.getRetryCount(message))
        }
      }]
    });

    console.error(`Sent to DLQ: ${message.key}`, { error: error.message });
  }
}
```

A DLQ that fills up silently is worse than no DLQ at all. Set up alerts for DLQ message counts. Regularly review DLQ contents to identify systemic issues. Automate replay of fixed messages when appropriate.
One of EDA's key benefits is independent scaling of producers and consumers. Each has different scaling characteristics.
Consumer Lag and Backpressure:
When consumers can't keep up with producers, consumer lag grows—the offset distance between what's been produced and what's been consumed.
Managing consumer lag:
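Lag itself is simple arithmetic over offsets: for each partition, subtract the group's committed offset from the log end offset, then sum across partitions to get a headline number worth alerting on. A small sketch; the `PartitionOffsets` shape and `consumerLag` name are illustrative.

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // latest offset produced to the partition
  committedOffset: number; // last offset the consumer group committed
}

// Consumer lag per partition and in total: the distance between
// what has been produced and what the group has actually consumed.
function consumerLag(
  offsets: PartitionOffsets[]
): { perPartition: number[]; total: number } {
  const perPartition = offsets.map(o =>
    Math.max(0, o.logEndOffset - o.committedOffset)
  );
  const total = perPartition.reduce((sum, lag) => sum + lag, 0);
  return { perPartition, total };
}
```

Watching the trend matters more than the absolute value: steadily growing lag means consumers are falling behind; a stable nonzero lag is often acceptable.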
Partition count planning:
Partitions = maximum consumer parallelism. Plan ahead—adding partitions is possible but has implications for ordering.
While more partitions enable more parallelism, they have costs: increased broker memory, longer rebalancing, more file handles. Start with a reasonable number based on expected throughput and scale up if needed. A common starting point is 3-12 partitions per topic, scaled based on actual throughput requirements.
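One widely cited rule of thumb from the Kafka community sizes the partition count from throughput: divide the target throughput by the per-partition rate each side can sustain, and take the larger result. The sketch below encodes that heuristic; the function name and MB/s units are illustrative assumptions.

```typescript
// Heuristic: partitions ≈ max(target / producer rate per partition,
//                             target / consumer rate per partition)
// so that neither side becomes the bottleneck at target throughput.
function suggestPartitionCount(
  targetMBps: number,
  producerMBpsPerPartition: number,
  consumerMBpsPerPartition: number
): number {
  return Math.max(
    Math.ceil(targetMBps / producerMBpsPerPartition),
    Math.ceil(targetMBps / consumerMBpsPerPartition)
  );
}
```

For example, a 100 MB/s target with producers sustaining 25 MB/s per partition but consumers only 10 MB/s per partition is consumer-bound, so the consumer side dictates the count.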
Consumer groups are central to how modern event streaming platforms enable horizontal scaling. Understanding their dynamics is essential.
How Consumer Groups Work:
Multiple consumer groups:
Different consumer groups can read the same topic independently. Each group maintains its own offsets. This enables multiple different systems to react to the same events.
| Scenario | Partitions | Consumers | Result |
|---|---|---|---|
| Under-provisioned | 6 | 2 | 3 partitions per consumer |
| Balanced | 6 | 6 | 1 partition per consumer (optimal) |
| Over-provisioned | 6 | 9 | 6 active, 3 idle (wasted) |
| Single consumer | 6 | 1 | All partitions on one consumer (bottleneck) |
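The arithmetic behind these scenarios can be sketched as a simple distribution function. This is an illustration of the partitions-per-consumer math only (shown here as round-robin over consumers); real brokers use configurable assignment strategies.

```typescript
// Distribute P partitions over C consumers, round-robin style.
// Returns how many partitions each consumer instance receives;
// consumers beyond the partition count end up with zero (idle).
function assignPartitions(partitions: number, consumers: number): number[] {
  const counts = new Array<number>(consumers).fill(0);
  for (let p = 0; p < partitions; p++) {
    counts[p % consumers]++;
  }
  return counts;
}
```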
Rebalancing:
Rebalancing reassigns partitions when:
Rebalancing costs:
Minimizing rebalancing impact:
Modern Kafka supports cooperative (incremental) rebalancing, which only reassigns partitions that need to move rather than stopping all consumers. This significantly reduces rebalancing impact. Enable it with the 'cooperative-sticky' assignor.
Based on production experience across many organizations, here are essential best practices:
Producers and consumers are the fundamental building blocks of event-driven systems. Let's consolidate the key learnings:
What's Next:
With a solid understanding of producers and consumers, we now need to explore how events are structured. The next page examines Event Schemas—how to design event payloads, manage schema evolution, and ensure compatibility as your system evolves.
You now have a comprehensive understanding of event producers and consumers—their roles, responsibilities, design patterns, scaling strategies, and failure handling. This foundation will guide you in building robust event-driven components.