Every event in an event-driven architecture has two fundamental participants: something that produces the event and something that consumes it. These two roles—producer and consumer—form the core interaction pattern of EDA.
Understanding producers and consumers goes far beyond knowing that one sends and one receives. Each role has distinct responsibilities, design considerations, scaling characteristics, and failure modes. A well-designed event-driven system requires careful attention to both sides of the event flow.
This page provides a comprehensive examination of producers and consumers: what they do, how to design them, how to scale them, and how to handle the inevitable failures that occur in distributed systems.
By the end of this page, you will understand the responsibilities and design patterns for both producers and consumers, how to handle failures and ensure reliability, strategies for scaling each independently, and common pitfalls to avoid when building event-driven components.
An event producer is any component that detects state changes or significant occurrences and publishes events describing what happened. Producers are the origin point for all information flowing through an event-driven system.
Core Responsibilities of Producers:
What Producers Don't Do:
Equally important is understanding what producers should not do:
This deliberate ignorance is what enables loose coupling. The producer's job ends when the event is durably stored by the broker.
Think of the producer as a journalist. Their job is to accurately report what happened with all relevant details. They don't control who reads the newspaper, what readers do with the information, or whether readers agree. They report facts; reactions are someone else's concern.
Several design patterns help structure how producers create and publish events reliably.
Pattern 1: Transactional Outbox
The Transactional Outbox pattern solves a critical problem: how do you ensure that a database change and an event publication either both happen or neither happens?
The Problem: If you first update your database and then publish an event, the event might fail after the database commit. If you first publish and then update, the database might fail after the event is sent. Either way, the system becomes inconsistent.
The Solution: Write the event to an 'outbox' table in the same database transaction as your business data. A separate process reads the outbox and publishes events, marking them as sent. If publishing fails, it retries. The database transaction guarantees atomicity.
```typescript
// Transactional Outbox Implementation
async function placeOrder(orderData: OrderData): Promise<Order> {
  return await prisma.$transaction(async (tx) => {
    // 1. Create the order (business logic)
    const order = await tx.order.create({
      data: {
        customerId: orderData.customerId,
        items: orderData.items,
        totalAmount: orderData.totalAmount,
        status: 'PLACED'
      }
    });

    // 2. Write event to outbox table (same transaction!)
    await tx.outboxEvent.create({
      data: {
        eventType: 'OrderPlaced',
        aggregateId: order.id,
        aggregateType: 'Order',
        payload: JSON.stringify({
          orderId: order.id,
          customerId: order.customerId,
          items: order.items,
          totalAmount: order.totalAmount,
          placedAt: new Date().toISOString()
        }),
        createdAt: new Date(),
        published: false // Not yet published
      }
    });

    return order;
  });
  // Transaction commits both OR rolls back both
  // Separate publisher process reads outbox and publishes
}
```

Pattern 2: Change Data Capture (CDC)
CDC uses database transaction logs to detect changes and automatically produce events without modifying application code.
How it works:
Advantages: No application code changes; guaranteed to capture all changes; captures changes from any source (including direct SQL).
Disadvantages: Events reflect raw database changes, not business semantics; requires infrastructure investment.
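Because CDC emits raw row-level changes rather than business events, teams often add a thin translation layer between the capture tool and the event stream. Below is a minimal sketch, assuming a Debezium-style change shape; the `ChangeRecord` fields, table names, and `toDomainEvent` helper are illustrative assumptions, not the API of any specific CDC tool.

```typescript
// Hypothetical CDC change record (shape is an assumption, loosely
// modeled on Debezium-style envelopes, not any tool's exact schema)
interface ChangeRecord {
  op: 'c' | 'u' | 'd';                      // create, update, delete
  before: Record<string, unknown> | null;   // row state before the change
  after: Record<string, unknown> | null;    // row state after the change
  source: { table: string; ts_ms: number };
}

interface DomainEvent {
  type: string;
  payload: Record<string, unknown>;
  occurredAt: string;
}

// Translate a raw row mutation into a business-meaningful event.
// Without this layer, consumers see table changes, not domain semantics.
function toDomainEvent(change: ChangeRecord): DomainEvent | null {
  if (change.source.table === 'orders' && change.op === 'c') {
    return {
      type: 'OrderPlaced',
      payload: change.after ?? {},
      occurredAt: new Date(change.source.ts_ms).toISOString()
    };
  }
  if (
    change.source.table === 'orders' &&
    change.op === 'u' &&
    change.before?.status !== 'CANCELLED' &&
    change.after?.status === 'CANCELLED'
  ) {
    return {
      type: 'OrderCancelled',
      payload: { orderId: change.after?.id },
      occurredAt: new Date(change.source.ts_ms).toISOString()
    };
  }
  return null; // Changes with no business mapping stay internal
}
```

The mapping is deliberately explicit: only transitions you name become public events, which keeps internal schema churn from leaking into consumers.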
Pattern 3: Domain Event Publisher
In Domain-Driven Design, domain events are first-class citizens raised by aggregates when business-significant changes occur.
How it works:
```typescript
// Domain Event Publisher Pattern
class Order {
  private domainEvents: DomainEvent[] = [];

  place(customerId: string, items: OrderItem[]): void {
    // Business logic
    this.status = OrderStatus.PLACED;
    this.customerId = customerId;
    this.items = items;

    // Record domain event (not yet published)
    this.domainEvents.push(new OrderPlacedEvent({
      orderId: this.id,
      customerId: this.customerId,
      items: this.items,
      totalAmount: this.calculateTotal(),
      occurredAt: new Date()
    }));
  }

  pullDomainEvents(): DomainEvent[] {
    const events = [...this.domainEvents];
    this.domainEvents = [];
    return events;
  }
}

// Application service publishes after commit
async function handlePlaceOrder(command: PlaceOrderCommand): Promise<void> {
  const order = new Order();
  order.place(command.customerId, command.items);

  await orderRepository.save(order);

  // Publish collected domain events
  const events = order.pullDomainEvents();
  await eventPublisher.publishAll(events);
}
```

Producers must handle various failure scenarios gracefully. The broker may be unavailable, networks may fail, and events may be malformed.
| Failure Scenario | Impact | Mitigation Strategy |
|---|---|---|
| Broker unavailable | Events cannot be published | Retry with exponential backoff; buffer locally; circuit breaker |
| Network partition | Requests hang or timeout | Configure appropriate timeouts; failure detection; fallback queues |
| Message too large | Rejected by broker | Validate size; compress; split into chunks; store payload externally |
| Serialization failure | Event cannot be created | Schema validation; catch exceptions; log and alert |
| Partition assignment failure | Events go to wrong partition | Validate partition keys; consistent hashing; monitoring |
Delivery Guarantees:
Producers can configure different levels of delivery guarantee:
Fire-and-Forget (at-most-once): Publish and don't wait for acknowledgment. Fastest, but events may be lost.
Wait for Leader Ack (at-least-once): Wait for the broker leader to acknowledge receipt. Good balance of speed and reliability.
Wait for All Replicas (exactly-once capable): Wait for all in-sync replicas to acknowledge. Most reliable, slowest. With idempotent producers, can achieve exactly-once semantics.
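These three levels map naturally onto Kafka-style producer acknowledgment settings, where an `acks` value of 0, 1, or -1 (all in-sync replicas) selects the guarantee. The sketch below is illustrative: the `acks` values follow Kafka's convention, but `settingsFor` and the `ProducerSettings` shape are assumed helper names, not a real client API.

```typescript
type DeliveryGuarantee = 'at-most-once' | 'at-least-once' | 'exactly-once';

interface ProducerSettings {
  acks: 0 | 1 | -1;     // -1 means "all in-sync replicas" in Kafka's convention
  idempotent: boolean;  // broker de-duplicates retried sends
}

// Map each delivery guarantee to Kafka-style producer settings.
function settingsFor(guarantee: DeliveryGuarantee): ProducerSettings {
  switch (guarantee) {
    case 'at-most-once':
      return { acks: 0, idempotent: false };  // fire-and-forget
    case 'at-least-once':
      return { acks: 1, idempotent: false };  // wait for leader ack
    case 'exactly-once':
      return { acks: -1, idempotent: true };  // all replicas + idempotent producer
  }
}
```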
```typescript
// Robust producer with retry and circuit breaker
class ResilientEventPublisher {
  private circuitBreaker: CircuitBreaker;
  private retryPolicy: RetryPolicy;

  constructor(private kafka: Kafka) {
    this.circuitBreaker = new CircuitBreaker({
      failureThreshold: 5,
      resetTimeout: 30000 // 30 seconds
    });
    this.retryPolicy = new RetryPolicy({
      maxRetries: 3,
      baseDelay: 100,  // 100ms
      maxDelay: 5000,  // 5 seconds
      exponentialBackoff: true
    });
  }

  async publish(event: DomainEvent): Promise<void> {
    // Check circuit breaker first
    if (this.circuitBreaker.isOpen()) {
      await this.bufferForLater(event);
      throw new CircuitOpenError('Broker circuit is open');
    }

    try {
      await this.retryPolicy.execute(async () => {
        await this.kafka.producer.send({
          topic: event.topic,
          messages: [{
            key: event.aggregateId,
            value: JSON.stringify(event.payload),
            headers: {
              'event-type': event.type,
              'correlation-id': event.correlationId,
              'timestamp': event.timestamp.toISOString()
            }
          }]
        });
      });
      this.circuitBreaker.recordSuccess();
    } catch (error) {
      this.circuitBreaker.recordFailure();
      await this.bufferForLater(event);
      throw error;
    }
  }

  private async bufferForLater(event: DomainEvent): Promise<void> {
    // Store in local database or file for retry by background process
    await this.localBuffer.store(event);
  }
}
```

For critical business events, the cost of losing an event often exceeds the complexity of ensuring delivery. Use the transactional outbox pattern to guarantee that committed transactions always produce their events, even if the broker is temporarily unavailable.
An event consumer is any component that subscribes to event streams, receives events, and takes action based on what happened. Consumers are where events create value—where business reactions occur.
Core Responsibilities of Consumers:
| Aspect | Push Model | Pull Model |
|---|---|---|
| Who initiates | Broker pushes to consumer | Consumer pulls from broker |
| Flow control | Broker controls rate | Consumer controls rate |
| Backpressure | Requires explicit signaling | Natural—consumer pulls when ready |
| Latency | Lower—immediate delivery | Higher—polling interval |
| Complexity | Consumer must handle burst | Consumer manages its own pace |
| Examples | RabbitMQ push, webhooks | Kafka polling, SQS polling |
Most modern event streaming platforms (Kafka, Pulsar) use a pull model specifically because it gives consumers control over backpressure. The consumer asks for events when it is ready, avoiding the overload that occurs when a broker pushes faster than the consumer can process.
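The backpressure property of the pull model can be seen in a toy simulation: a fast producer only makes the log longer; it never forces work onto the consumer. `InMemoryPartition` and `drain` below are illustrative stand-ins for a broker partition and a poll loop, not a real client library.

```typescript
// Minimal in-memory stand-in for a broker partition (illustrative only)
class InMemoryPartition<T> {
  private log: T[] = [];
  append(event: T): void { this.log.push(event); }
  // Pull model: the CONSUMER decides when to fetch and how much
  fetch(offset: number, maxBatch: number): T[] {
    return this.log.slice(offset, offset + maxBatch);
  }
}

// A pull-based consumer processes at its own pace. Returns the
// final "committed" offset once it has caught up with the log.
function drain<T>(
  partition: InMemoryPartition<T>,
  handle: (event: T) => void,
  maxBatch = 2
): number {
  let offset = 0;
  while (true) {
    const batch = partition.fetch(offset, maxBatch);
    if (batch.length === 0) break; // caught up; a real consumer would poll again
    batch.forEach(handle);
    offset += batch.length;        // advance the committed offset
  }
  return offset;
}
```

A real consumer would loop forever with a poll timeout, but the flow-control idea is the same: nothing arrives until the consumer asks for it.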
Several patterns help structure reliable, scalable event consumers.
Pattern 1: Idempotent Consumer
In distributed systems with at-least-once delivery, the same event may be delivered multiple times. Consumers must be idempotent—processing the same event twice has the same effect as processing once.
Implementation strategies:
```typescript
// Idempotent consumer using event ID tracking
class IdempotentOrderConsumer {
  async handle(event: OrderPlacedEvent): Promise<void> {
    // Check if already processed
    const existing = await this.processedEvents.find(event.eventId);
    if (existing) {
      console.log(`Skipping duplicate event: ${event.eventId}`);
      return; // Already processed—idempotent skip
    }

    // Process within transaction
    await this.db.$transaction(async (tx) => {
      // Business logic
      await tx.inventory.update({
        where: { productId: event.productId },
        data: { reserved: { increment: event.quantity } }
      });

      // Record as processed (prevents duplicate processing)
      await tx.processedEvent.create({
        data: {
          eventId: event.eventId,
          eventType: 'OrderPlaced',
          processedAt: new Date()
        }
      });
    });
  }
}

// Alternative: Natural idempotency via upsert
async function handlePaymentConfirmed(event: PaymentConfirmedEvent): Promise<void> {
  // Upsert is naturally idempotent—same result regardless of attempts
  await db.orderPayment.upsert({
    where: { orderId: event.orderId },
    create: {
      orderId: event.orderId,
      paymentId: event.paymentId,
      amount: event.amount,
      confirmedAt: event.timestamp
    },
    update: {
      // Same data—no adverse effect on duplicate
      paymentId: event.paymentId,
      amount: event.amount,
      confirmedAt: event.timestamp
    }
  });
}
```

Pattern 2: Competing Consumers
For high-volume topics, multiple consumer instances can share the workload. The broker ensures each event is delivered to only one consumer in the group.
How it works:
Scaling considerations:
Pattern 3: Consumer with State Machine
Complex consumers often need to track state across multiple events. A state machine helps manage these transitions cleanly.
```typescript
// State machine for order lifecycle
enum OrderState {
  PENDING = 'PENDING',
  PAID = 'PAID',
  SHIPPED = 'SHIPPED',
  DELIVERED = 'DELIVERED',
  CANCELLED = 'CANCELLED'
}

type OrderTransition = {
  from: OrderState[];
  to: OrderState;
  event: string;
};

const orderTransitions: OrderTransition[] = [
  { from: [OrderState.PENDING], to: OrderState.PAID, event: 'PaymentConfirmed' },
  { from: [OrderState.PAID], to: OrderState.SHIPPED, event: 'OrderShipped' },
  { from: [OrderState.SHIPPED], to: OrderState.DELIVERED, event: 'OrderDelivered' },
  { from: [OrderState.PENDING, OrderState.PAID], to: OrderState.CANCELLED, event: 'OrderCancelled' },
];

class OrderStateMachineConsumer {
  async handle(event: DomainEvent): Promise<void> {
    const order = await this.orderRepo.findById(event.orderId);
    if (!order) return; // Order not found

    const transition = orderTransitions.find(
      t => t.event === event.type && t.from.includes(order.state)
    );

    if (!transition) {
      console.warn(
        `Invalid transition: ${event.type} from ${order.state}`
      );
      return; // Invalid transition—log and skip
    }

    await this.orderRepo.update({
      id: event.orderId,
      state: transition.to,
      lastEventId: event.eventId,
      updatedAt: new Date()
    });
  }
}
```

Consumers face challenging failure scenarios. Processing may fail, events may be malformed, or downstream services may be unavailable.
```typescript
// Consumer with retry and DLQ support
class ResilientConsumer {
  private readonly maxRetries = 3;
  private readonly dlqTopic: string;

  async processMessage(message: KafkaMessage): Promise<void> {
    const retryCount = this.getRetryCount(message);

    try {
      await this.handleEvent(message);
    } catch (error) {
      if (this.isRetryable(error) && retryCount < this.maxRetries) {
        // Retry with backoff
        await this.scheduleRetry(message, retryCount + 1);
      } else {
        // Max retries exceeded or non-retryable error
        await this.sendToDLQ(message, error);
      }
    }
  }

  private isRetryable(error: Error): boolean {
    // Database connection errors, network timeouts = retry
    // Validation errors, business rule violations = don't retry
    return error instanceof TransientError ||
      error instanceof ConnectionError ||
      error instanceof TimeoutError;
  }

  private async sendToDLQ(message: KafkaMessage, error: Error): Promise<void> {
    await this.kafka.producer.send({
      topic: this.dlqTopic,
      messages: [{
        key: message.key,
        value: message.value,
        headers: {
          ...message.headers,
          'dlq-reason': error.message,
          'dlq-timestamp': new Date().toISOString(),
          'original-topic': message.topic,
          'retry-count': String(this.getRetryCount(message))
        }
      }]
    });

    console.error(`Sent to DLQ: ${message.key}`, { error: error.message });
  }
}
```

A DLQ that fills up silently is worse than no DLQ at all. Set up alerts for DLQ message counts. Regularly review DLQ contents to identify systemic issues. Automate replay of fixed messages when appropriate.
One of EDA's key benefits is independent scaling of producers and consumers. Each has different scaling characteristics.
Consumer Lag and Backpressure:
When consumers can't keep up with producers, consumer lag grows—the offset distance between what's been produced and what's been consumed.
Managing consumer lag:
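Lag itself is simple arithmetic over offsets: for each partition, subtract the group's committed offset from the log end offset, then sum across partitions to get a headline number worth alerting on. A small sketch; the `PartitionOffsets` shape and `consumerLag` name are illustrative.

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // latest offset produced to the partition
  committedOffset: number; // last offset the consumer group committed
}

// Consumer lag per partition and in total: the distance between
// what has been produced and what the group has actually consumed.
function consumerLag(
  offsets: PartitionOffsets[]
): { perPartition: number[]; total: number } {
  const perPartition = offsets.map(o =>
    Math.max(0, o.logEndOffset - o.committedOffset)
  );
  const total = perPartition.reduce((sum, lag) => sum + lag, 0);
  return { perPartition, total };
}
```

Watching the trend matters more than the absolute value: steadily growing lag means consumers are falling behind; a stable nonzero lag is often acceptable.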
Partition count planning:
Partitions = maximum consumer parallelism. Plan ahead—adding partitions is possible but has implications for ordering.
While more partitions enable more parallelism, they have costs: increased broker memory, longer rebalancing, more file handles. Start with a reasonable number based on expected throughput and scale up if needed. A common starting point is 3-12 partitions per topic, scaled based on actual throughput requirements.
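One widely cited rule of thumb from the Kafka community sizes the partition count from throughput: divide the target throughput by the per-partition rate each side can sustain, and take the larger result. The sketch below encodes that heuristic; the function name and MB/s units are illustrative assumptions.

```typescript
// Heuristic: partitions ≈ max(target / producer rate per partition,
//                             target / consumer rate per partition)
// so that neither side becomes the bottleneck at target throughput.
function suggestPartitionCount(
  targetMBps: number,
  producerMBpsPerPartition: number,
  consumerMBpsPerPartition: number
): number {
  return Math.max(
    Math.ceil(targetMBps / producerMBpsPerPartition),
    Math.ceil(targetMBps / consumerMBpsPerPartition)
  );
}
```

For example, a 100 MB/s target with producers sustaining 25 MB/s per partition but consumers only 10 MB/s per partition is consumer-bound, so the consumer side dictates the count.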
Consumer groups are central to how modern event streaming platforms enable horizontal scaling. Understanding their dynamics is essential.
How Consumer Groups Work:
Multiple consumer groups:
Different consumer groups can read the same topic independently. Each group maintains its own offsets. This enables multiple different systems to react to the same events.
| Scenario | Partitions | Consumers | Result |
|---|---|---|---|
| Under-provisioned | 6 | 2 | 3 partitions per consumer |
| Balanced | 6 | 6 | 1 partition per consumer (optimal) |
| Over-provisioned | 6 | 9 | 6 active, 3 idle (wasted) |
| Single consumer | 6 | 1 | All partitions on one consumer (bottleneck) |
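The arithmetic behind these scenarios can be sketched as a simple distribution function. This is an illustration of the partitions-per-consumer math only (shown here as round-robin over consumers); real brokers use configurable assignment strategies.

```typescript
// Distribute P partitions over C consumers, round-robin style.
// Returns how many partitions each consumer instance receives;
// consumers beyond the partition count end up with zero (idle).
function assignPartitions(partitions: number, consumers: number): number[] {
  const counts = new Array<number>(consumers).fill(0);
  for (let p = 0; p < partitions; p++) {
    counts[p % consumers]++;
  }
  return counts;
}
```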
Rebalancing:
Rebalancing reassigns partitions when:
Rebalancing costs:
Minimizing rebalancing impact:
Modern Kafka supports cooperative (incremental) rebalancing, which only reassigns partitions that need to move rather than stopping all consumers. This significantly reduces rebalancing impact. Enable it with the 'cooperative-sticky' assignor.
Based on production experience across many organizations, here are essential best practices:
Producers and consumers are the fundamental building blocks of event-driven systems. Let's consolidate the key learnings:
What's Next:
With a solid understanding of producers and consumers, we now need to explore how events are structured. The next page examines Event Schemas—how to design event payloads, manage schema evolution, and ensure compatibility as your system evolves.
You now have a comprehensive understanding of event producers and consumers—their roles, responsibilities, design patterns, scaling strategies, and failure handling. This foundation will guide you in building robust event-driven components.