It's 3 AM. Customer support reports that orders are being charged but not shipped. The payment system shows successful captures. The shipping system shows empty queues. A customer placed an order 4 hours ago—what happened to the OrderPaidEvent?
This is the nightmare scenario of event-driven systems. Events are ephemeral. They flow through message brokers, trigger handlers, and vanish. When something goes wrong, there's no stack trace that spans the entire flow. No single log file contains the complete picture. The causal chain is distributed across services, queues, and time.
Debugging event-driven systems requires specialized techniques. This page equips you with those techniques—the tools and patterns that Principal Engineers use to diagnose issues in minutes that would otherwise take days.
By the end of this page, you will master: (1) Distributed tracing with correlation IDs and causation chains, (2) Dead letter queue analysis and recovery, (3) Event replay techniques for reproducing issues, (4) Structured logging patterns for event-driven systems, and (5) Observability dashboards and alerting strategies.
Event-driven systems break traditional debugging assumptions. In synchronous systems, a stack trace shows exactly what happened—function A called function B, which called function C. The cause-effect chain is explicit.
In event-driven systems, this chain is implicit and distributed: a service publishes an event, a broker delivers it to handlers in other services, and those handlers publish further events of their own. No single stack trace connects these steps, and no single process observes them all.
The solution is proactive observability. You can't add debugging capabilities after a problem occurs—you must build them into the system from the start. The following sections detail the essential observability patterns for event-driven systems.
If you're reading this after a production incident, you'll be limited to whatever observability was already in place. The lesson: invest in observability during development, not during outages.
The cornerstone of event debugging is correlation IDs—unique identifiers that link all events and operations stemming from a single originating action. When a user places an order, every event, log entry, and database operation generated as a result shares the same correlation ID.
```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

/**
 * Correlation context that flows through the entire event chain
 */
export interface CorrelationContext {
  // Links all events from the same user action
  correlationId: string;
  // ID of the event/command that directly caused this event
  causationId: string | null;
  // The original request that started the chain
  originatingRequestId: string | null;
  // User who initiated the action
  userId: string | null;
  // When the chain started
  initiatedAt: Date;
  // Span for distributed tracing (OpenTelemetry compatible)
  traceId?: string;
  spanId?: string;
}

// Holds the correlation context for the duration of a handler invocation
const correlationStorage = new AsyncLocalStorage<CorrelationContext>();

export function getCurrentCorrelationContext(): CorrelationContext {
  const context = correlationStorage.getStore();
  if (!context) {
    throw new Error('No correlation context active outside a handler scope');
  }
  return context;
}

/**
 * Base event class with built-in correlation
 */
export abstract class DomainEvent {
  public readonly eventId: string;
  public readonly occurredAt: Date;
  public readonly correlationId: string;
  public readonly causationId: string | null;

  protected constructor(
    eventId: string,
    correlationId: string,
    causationId: string | null = null
  ) {
    this.eventId = eventId;
    this.occurredAt = new Date();
    this.correlationId = correlationId;
    this.causationId = causationId;
  }
}

/**
 * Event handler base that propagates correlation
 */
export abstract class CorrelatedEventHandler<TEvent extends DomainEvent> {
  protected abstract handleCore(
    event: TEvent,
    context: CorrelationContext
  ): Promise<void>;

  public async handle(event: TEvent): Promise<void> {
    // Extract or create correlation context
    const context: CorrelationContext = {
      correlationId: event.correlationId,
      causationId: event.eventId, // This event causes what we do next
      originatingRequestId: null,
      userId: null,
      initiatedAt: event.occurredAt,
    };

    // Set up logging context so nested code (and the logger) can read it
    await correlationStorage.run(context, () => this.handleCore(event, context));
  }

  // Utility for handlers to raise follow-up events with propagated correlation
  protected createFollowUpEvent<T extends DomainEvent>(
    EventClass: new (data: any) => T,
    eventData: Omit<T, 'eventId' | 'correlationId' | 'causationId' | 'occurredAt'>
  ): T {
    const context = getCurrentCorrelationContext();
    return new EventClass({
      ...eventData,
      eventId: generateEventId(),
      correlationId: context.correlationId,
      causationId: context.causationId, // Links to parent event
    });
  }
}

// Usage in a handler
class InventoryReservationHandler extends CorrelatedEventHandler<OrderPlacedEvent> {
  protected async handleCore(
    event: OrderPlacedEvent,
    context: CorrelationContext
  ): Promise<void> {
    // All logging automatically includes correlation ID
    this.logger.info('Processing order for reservation', {
      orderId: event.orderId,
      itemCount: event.items.length,
    });

    // Perform reservation
    const reservation = await this.inventoryService.reserve(
      event.orderId,
      event.items
    );

    // Publish follow-up event with correlation chain
    const reservedEvent = this.createFollowUpEvent(InventoryReservedEvent, {
      orderId: event.orderId,
      reservationId: reservation.id,
      items: reservation.items,
    });

    await this.eventBus.publish(reservedEvent);
  }
}
```

With correlation IDs in place, debugging becomes a search operation:
```typescript
/**
 * Debugging utility: trace an entire event chain from correlation ID
 */
export async function traceEventChain(correlationId: string): Promise<EventChainView> {
  // Query event store for all events with this correlation ID
  const events = await eventStore.findByCorrelationId(correlationId);

  // Query logs for all log entries with this correlation ID
  const logs = await logService.query({
    filter: { correlationId },
    sort: { timestamp: 'asc' },
  });

  // Build causation tree
  const tree = buildCausationTree(events);

  return {
    correlationId,
    events: events.sort((a, b) => a.occurredAt.getTime() - b.occurredAt.getTime()),
    eventTree: tree,
    logs,
    // Computed insights
    totalDuration: calculateDuration(events),
    failedHandlers: identifyFailures(logs),
    timeline: buildTimeline(events, logs),
  };
}

/**
 * Build tree showing which events caused which
 */
function buildCausationTree(events: DomainEvent[]): CausationNode {
  const eventMap = new Map(events.map(e => [e.eventId, e]));
  const children = new Map<string, DomainEvent[]>();

  // Group by causation ID
  for (const event of events) {
    if (event.causationId) {
      const existing = children.get(event.causationId) || [];
      existing.push(event);
      children.set(event.causationId, existing);
    }
  }

  // Find root (null causation ID or external)
  const root = events.find(e => !e.causationId || !eventMap.has(e.causationId));

  function buildNode(event: DomainEvent): CausationNode {
    return {
      event,
      causedEvents: (children.get(event.eventId) || []).map(buildNode),
    };
  }

  return root ? buildNode(root) : { event: null, causedEvents: [] };
}

// Example output for debugging:
/*
Correlation ID: corr-abc123

Event Chain:
├── OrderPlacedEvent (evt-001) @ 10:00:00.000
│   ├── InventoryReservedEvent (evt-002) @ 10:00:00.123
│   │   └── PaymentCapturedEvent (evt-003) @ 10:00:00.456
│   │       └── OrderConfirmedEvent (evt-004) @ 10:00:00.789
│   └── NotificationSentEvent (evt-005) @ 10:00:00.234

Duration: 789ms
Failed Handlers: None
*/
```

For production systems, integrate with OpenTelemetry: use traceId as your correlation ID and have each handler create a child span. Tools like Jaeger, Zipkin, or cloud provider tracing (AWS X-Ray, Google Cloud Trace) then visualize the complete distributed trace automatically.
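As a rough sketch of that integration, a handler wrapper can open a child span per handler using the @opentelemetry/api package. The tracer name and attribute keys here are illustrative assumptions, as is the premise that the parent trace context was already propagated with the incoming message:

```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

// Illustrative tracer name; use your service's name in practice
const tracer = trace.getTracer('event-handlers');

/**
 * Wraps a handler invocation in a child span so the whole event chain
 * shows up as one distributed trace in Jaeger, Zipkin, or X-Ray.
 */
export async function handleWithSpan<TEvent extends DomainEvent>(
  handlerName: string,
  event: TEvent,
  handle: (event: TEvent) => Promise<void>
): Promise<void> {
  await tracer.startActiveSpan(`${handlerName}.handle`, async span => {
    // Attribute keys are assumptions for this sketch; pick one convention and stick to it
    span.setAttribute('event.id', event.eventId);
    span.setAttribute('event.type', event.constructor.name);
    span.setAttribute('correlation.id', event.correlationId);
    try {
      await handle(event);
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      span.end();
    }
  });
}
```

With a wrapper like this in every consumer, each event chain appears as a single trace, and the trace ID can double as the correlation ID described above.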
When event handlers fail permanently, events end up in a Dead Letter Queue (DLQ). The DLQ is your crime scene—it contains evidence of what went wrong and the original events that caused failures.
```typescript
/**
 * Dead Letter Message with debugging context
 */
export interface DeadLetterMessage {
  // The original event that failed
  originalEvent: DomainEvent;
  // Serialized form as received (for exact reproduction)
  rawPayload: string;
  // Which queue/topic the event came from
  sourceQueue: string;
  // Which handler(s) failed
  failedHandler: string;
  // Error details
  error: {
    type: string;
    message: string;
    stackTrace: string;
    code?: string;
  };
  // Retry history
  attempts: {
    attemptNumber: number;
    timestamp: Date;
    error: string;
  }[];
  // When it landed in DLQ
  deadLetteredAt: Date;
  // For prioritization
  metadata: {
    correlationId: string;
    originatingUserId?: string;
    customerId?: string;
    orderId?: string;
    monetaryValue?: number;
  };
}

/**
 * DLQ analysis and recovery tooling
 */
export class DeadLetterQueueManager {
  constructor(
    private readonly dlqRepository: DeadLetterRepository,
    private readonly eventBus: EventBus,
    private readonly logger: Logger
  ) {}

  /**
   * Get overview of DLQ for dashboard
   */
  async getDashboard(): Promise<DLQDashboard> {
    const messages = await this.dlqRepository.getAll();

    return {
      totalCount: messages.length,
      // Group by error type
      byErrorType: this.groupBy(messages, m => m.error.type),
      // Group by handler
      byHandler: this.groupBy(messages, m => m.failedHandler),
      // Group by event type
      byEventType: this.groupBy(messages, m => m.originalEvent.constructor.name),
      // Oldest entries (highest priority)
      oldest: messages.slice().sort(
        (a, b) => a.deadLetteredAt.getTime() - b.deadLetteredAt.getTime()
      ).slice(0, 10),
      // Highest value (prioritize by business impact)
      highestValue: messages.slice().sort(
        (a, b) => (b.metadata.monetaryValue || 0) - (a.metadata.monetaryValue || 0)
      ).slice(0, 10),
      // Recent entries (potential ongoing issue)
      recent: messages.slice().sort(
        (a, b) => b.deadLetteredAt.getTime() - a.deadLetteredAt.getTime()
      ).slice(0, 10),
    };
  }

  /**
   * Retry a specific dead letter message
   */
  async retrySingle(messageId: string): Promise<RetryResult> {
    const message = await this.dlqRepository.findById(messageId);
    if (!message) {
      return { success: false, error: 'Message not found' };
    }

    this.logger.info('Retrying dead letter message', {
      messageId,
      eventType: message.originalEvent.constructor.name,
      handler: message.failedHandler,
      correlationId: message.metadata.correlationId,
    });

    try {
      await this.eventBus.publish(message.originalEvent);
      await this.dlqRepository.markAsRetried(messageId);
      return { success: true };
    } catch (error) {
      this.logger.error('Retry failed', { messageId, error });
      return {
        success: false,
        error: error instanceof Error ? error.message : 'Unknown error',
      };
    }
  }

  /**
   * Batch retry all messages matching criteria
   */
  async retryBatch(criteria: RetryCriteria): Promise<BatchRetryResult> {
    const messages = await this.dlqRepository.findMatching(criteria);

    const results = await Promise.allSettled(
      messages.map(m => this.retrySingle(m.id))
    );

    const succeeded = results.filter(
      r => r.status === 'fulfilled' && r.value.success
    ).length;

    return {
      total: messages.length,
      succeeded,
      failed: messages.length - succeeded,
    };
  }

  /**
   * Analyze common failure patterns
   */
  async analyzePatterns(): Promise<FailurePattern[]> {
    const messages = await this.dlqRepository.getAll();
    const patterns: Map<string, FailurePattern> = new Map();

    for (const message of messages) {
      const patternKey = `${message.error.type}:${message.failedHandler}`;
      const existing = patterns.get(patternKey) || {
        errorType: message.error.type,
        handler: message.failedHandler,
        count: 0,
        firstSeen: message.deadLetteredAt,
        lastSeen: message.deadLetteredAt,
        sampleErrors: [],
      };

      existing.count++;
      existing.lastSeen = new Date(Math.max(
        existing.lastSeen.getTime(),
        message.deadLetteredAt.getTime()
      ));
      if (existing.sampleErrors.length < 3) {
        existing.sampleErrors.push(message.error.message);
      }

      patterns.set(patternKey, existing);
    }

    return Array.from(patterns.values())
      .sort((a, b) => b.count - a.count);
  }

  private groupBy<T>(
    items: T[],
    keyFn: (item: T) => string
  ): Record<string, number> {
    const result: Record<string, number> = {};
    for (const item of items) {
      const key = keyFn(item);
      result[key] = (result[key] || 0) + 1;
    }
    return result;
  }
}
```

| Error Pattern | Likely Cause | Investigation Steps | Resolution |
|---|---|---|---|
| Validation errors spike | Schema change in producer | Compare event schema to handler expectations | Update handler or roll back producer |
| Connection refused | Database/service down | Check dependent service health | Fix service, then batch retry |
| Timeout errors | Slow downstream service | Check service metrics and latency | Increase timeout or fix bottleneck |
| Not found errors | Race condition or deleted data | Check if entity exists, timing of operations | Add handler resilience or reorder operations |
| Duplicate key | Handler not idempotent | Check idempotency implementation | Fix handler, manually dedupe DLQ |
Set up alerts for: (1) DLQ depth exceeding threshold, (2) New error types appearing, (3) High-value events in DLQ, (4) Age of oldest DLQ message. These alerts catch issues before they compound.
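A minimal sketch of how those checks could run on a schedule against the DeadLetterQueueManager above; the threshold values, the shape of the alerts parameter, and the knownErrorTypes set are assumptions for illustration, not prescribed defaults:

```typescript
/**
 * Periodic DLQ alert check. Thresholds and the alert sink are illustrative.
 */
export async function checkDlqAlerts(
  dlq: DeadLetterQueueManager,
  alerts: { fire(name: string, details: Record<string, unknown>): Promise<void> },
  knownErrorTypes: Set<string>
): Promise<void> {
  const dashboard = await dlq.getDashboard();

  // 1. DLQ depth exceeding threshold
  if (dashboard.totalCount > 100) {
    await alerts.fire('dlq.depth_exceeded', { depth: dashboard.totalCount });
  }

  // 2. New error types appearing
  for (const errorType of Object.keys(dashboard.byErrorType)) {
    if (!knownErrorTypes.has(errorType)) {
      await alerts.fire('dlq.new_error_type', { errorType });
    }
  }

  // 3. High-value events in DLQ (threshold is an example value)
  const highValue = dashboard.highestValue.filter(
    m => (m.metadata.monetaryValue ?? 0) > 1_000
  );
  if (highValue.length > 0) {
    await alerts.fire('dlq.high_value_events', { count: highValue.length });
  }

  // 4. Age of oldest DLQ message (example: older than one hour)
  const oldest = dashboard.oldest[0];
  if (oldest && Date.now() - oldest.deadLetteredAt.getTime() > 60 * 60 * 1000) {
    await alerts.fire('dlq.message_too_old', {
      correlationId: oldest.metadata.correlationId,
      deadLetteredAt: oldest.deadLetteredAt,
    });
  }
}
```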
One of the most powerful debugging techniques in event-driven systems is event replay—taking production events and replaying them in a controlled environment to reproduce issues exactly.
```typescript
// Small helper used by the replay loop to simulate time gaps between events
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

/**
 * Event Replay System for debugging and testing
 */
export class EventReplayService {
  constructor(
    private readonly eventStore: EventStore,
    private readonly eventBus: EventBus,
    private readonly stateSnapshot: StateSnapshotService,
    private readonly logger: Logger
  ) {}

  /**
   * Replay events for a specific correlation ID
   */
  async replayCorrelation(
    correlationId: string,
    options: ReplayOptions = {}
  ): Promise<ReplayResult> {
    const {
      startFrom = null,
      endAt = null,
      handlers = null,      // null = all handlers, or list of specific handlers
      dryRun = true,        // Default to dry run for safety
      speedMultiplier = 1.0, // 1.0 = real-time, 0 = instant
    } = options;

    this.logger.info('Starting event replay', { correlationId, options });

    // Fetch events
    let events = await this.eventStore.findByCorrelationId(correlationId);
    events = events.sort((a, b) => a.occurredAt.getTime() - b.occurredAt.getTime());

    // Apply filters
    if (startFrom) {
      events = events.filter(e => e.occurredAt >= startFrom);
    }
    if (endAt) {
      events = events.filter(e => e.occurredAt <= endAt);
    }

    this.logger.info(`Replaying ${events.length} events`);

    const results: ReplayEventResult[] = [];
    let previousTime = events[0]?.occurredAt.getTime();

    for (const event of events) {
      // Simulate time gaps if not instant replay
      if (speedMultiplier > 0 && previousTime) {
        const gap = event.occurredAt.getTime() - previousTime;
        if (gap > 0) {
          await sleep(gap / speedMultiplier);
        }
      }
      previousTime = event.occurredAt.getTime();

      // Replay event
      const result = await this.replayEvent(event, { handlers, dryRun });
      results.push(result);

      this.logger.info('Replayed event', {
        eventId: event.eventId,
        eventType: event.constructor.name,
        success: result.success,
        handlerResults: result.handlerResults,
      });
    }

    return {
      correlationId,
      eventsReplayed: events.length,
      results,
      succeeded: results.filter(r => r.success).length,
      failed: results.filter(r => !r.success).length,
    };
  }

  /**
   * Replay events from a specific time window (for incident investigation)
   */
  async replayTimeWindow(
    startTime: Date,
    endTime: Date,
    eventTypes: string[],
    options: ReplayOptions = {}
  ): Promise<ReplayResult> {
    const events = await this.eventStore.findInTimeWindow(
      startTime,
      endTime,
      { eventTypes }
    );

    this.logger.info(`Found ${events.length} events in time window`);

    // Group by correlation ID for ordered replay
    const byCorrelation = new Map<string, DomainEvent[]>();
    for (const event of events) {
      const existing = byCorrelation.get(event.correlationId) || [];
      existing.push(event);
      byCorrelation.set(event.correlationId, existing);
    }

    const allResults: ReplayEventResult[] = [];
    for (const corrId of byCorrelation.keys()) {
      const result = await this.replayCorrelation(corrId, {
        ...options,
        startFrom: startTime,
        endAt: endTime,
      });
      allResults.push(...result.results);
    }

    return {
      correlationId: 'multiple',
      eventsReplayed: allResults.length,
      results: allResults,
      succeeded: allResults.filter(r => r.success).length,
      failed: allResults.filter(r => !r.success).length,
    };
  }

  /**
   * Replay single event with detailed tracking
   */
  private async replayEvent(
    event: DomainEvent,
    options: { handlers: string[] | null; dryRun: boolean }
  ): Promise<ReplayEventResult> {
    if (options.dryRun) {
      // Dry run: log what would happen without side effects
      const registeredHandlers = this.eventBus.getHandlersFor(event);

      return {
        eventId: event.eventId,
        eventType: event.constructor.name,
        success: true,
        handlerResults: registeredHandlers.map(h => ({
          handler: h.name,
          wouldExecute: true,
          dryRun: true,
        })),
      };
    }

    // Real replay
    try {
      await this.eventBus.publish(event);
      return {
        eventId: event.eventId,
        eventType: event.constructor.name,
        success: true,
        handlerResults: [{ handler: 'all', success: true }],
      };
    } catch (error) {
      return {
        eventId: event.eventId,
        eventType: event.constructor.name,
        success: false,
        error: error instanceof Error ? error.message : 'Unknown',
        handlerResults: [],
      };
    }
  }
}

// Usage example
async function debugOrderIssue(orderId: string) {
  const correlationId = await lookupCorrelationId(orderId);

  // Step 1: Dry run to see what would happen
  const dryRunResult = await replayService.replayCorrelation(correlationId, {
    dryRun: true,
  });
  console.log('Dry run complete:', dryRunResult);

  // Step 2: If dry run looks good, replay in isolated environment
  await setupIsolatedEnvironment();
  const replayResult = await replayService.replayCorrelation(correlationId, {
    dryRun: false,
    handlers: ['InventoryReservationHandler'], // Test specific handler
  });
  console.log('Replay complete:', replayResult);
}
```

Logs are your primary debugging tool when correlation queries and event replay aren't sufficient. But unstructured logs ('Processing order...') are nearly useless for debugging. Structured logging with consistent fields enables powerful queries.
```typescript
/**
 * Structured logging for event-driven systems
 */
export class EventLogger {
  constructor(
    private readonly baseLogger: Logger,
    private readonly contextProvider: () => CorrelationContext | null
  ) {}

  /**
   * Log event publication
   */
  eventPublished(event: DomainEvent, metadata: Record<string, any> = {}): void {
    this.log('info', 'event.published', {
      eventId: event.eventId,
      eventType: event.constructor.name,
      correlationId: event.correlationId,
      causationId: event.causationId,
      ...this.extractEventContext(event),
      ...metadata,
    });
  }

  /**
   * Log handler invocation start
   */
  handlerStarted(
    handlerName: string,
    event: DomainEvent,
    metadata: Record<string, any> = {}
  ): void {
    this.log('info', 'handler.started', {
      handler: handlerName,
      eventId: event.eventId,
      eventType: event.constructor.name,
      correlationId: event.correlationId,
      ...metadata,
    });
  }

  /**
   * Log handler completion
   */
  handlerCompleted(
    handlerName: string,
    event: DomainEvent,
    durationMs: number,
    metadata: Record<string, any> = {}
  ): void {
    this.log('info', 'handler.completed', {
      handler: handlerName,
      eventId: event.eventId,
      eventType: event.constructor.name,
      correlationId: event.correlationId,
      durationMs,
      ...metadata,
    });
  }

  /**
   * Log handler failure
   */
  handlerFailed(
    handlerName: string,
    event: DomainEvent,
    error: Error,
    durationMs: number,
    metadata: Record<string, any> = {}
  ): void {
    this.log('error', 'handler.failed', {
      handler: handlerName,
      eventId: event.eventId,
      eventType: event.constructor.name,
      correlationId: event.correlationId,
      durationMs,
      error: {
        type: error.constructor.name,
        message: error.message,
        stack: error.stack,
      },
      ...metadata,
    });
  }

  /**
   * Log business-level action
   */
  action(
    action: string,
    details: Record<string, any>
  ): void {
    const context = this.contextProvider();
    this.log('info', `action.${action}`, {
      correlationId: context?.correlationId,
      ...details,
    });
  }

  private log(
    level: 'debug' | 'info' | 'warn' | 'error',
    event: string,
    data: Record<string, any>
  ): void {
    const context = this.contextProvider();

    this.baseLogger[level]({
      // Standard fields (always present)
      timestamp: new Date().toISOString(),
      level,
      event,
      // Correlation (automatically added)
      correlationId: context?.correlationId ?? data.correlationId,
      traceId: context?.traceId,
      spanId: context?.spanId,
      // Event data
      ...data,
      // Environment context
      service: process.env.SERVICE_NAME,
      version: process.env.APP_VERSION,
      environment: process.env.NODE_ENV,
    });
  }

  private extractEventContext(event: DomainEvent): Record<string, any> {
    // Extract common identifiers for querying
    const context: Record<string, any> = {};
    if ('orderId' in event) context.orderId = (event as any).orderId?.value ?? (event as any).orderId;
    if ('customerId' in event) context.customerId = (event as any).customerId?.value ?? (event as any).customerId;
    if ('productId' in event) context.productId = (event as any).productId?.value ?? (event as any).productId;
    if ('amount' in event) context.amount = (event as any).amount?.value ?? (event as any).amount;
    return context;
  }
}

// Usage in handlers
class InventoryReservationHandler {
  async handle(event: OrderPlacedEvent): Promise<void> {
    const startTime = Date.now();
    this.logger.handlerStarted('InventoryReservationHandler', event);

    try {
      // Do work
      this.logger.action('inventory.check', {
        productIds: event.items.map(i => i.productId),
      });

      const reservation = await this.inventoryService.reserve(event.items);

      this.logger.action('inventory.reserved', {
        reservationId: reservation.id,
        itemCount: event.items.length,
      });

      this.logger.handlerCompleted(
        'InventoryReservationHandler',
        event,
        Date.now() - startTime,
        { reservationId: reservation.id }
      );
    } catch (error) {
      this.logger.handlerFailed(
        'InventoryReservationHandler',
        event,
        error as Error,
        Date.now() - startTime
      );
      throw error;
    }
  }
}
```

With structured logging, you can query efficiently:
```
# Find all events for a specific order
correlationId="corr-abc123" AND orderId="order-456"

# Find all handler failures in the last hour
event="handler.failed" AND timestamp > now() - 1h

# Find slow handlers (over 5 seconds)
event="handler.completed" AND durationMs > 5000

# Find specific error type
event="handler.failed" AND error.type="InsufficientInventoryError"

# Trace inventory issues for a customer
action="inventory.*" AND customerId="cust-789"

# Find all events in a time window for an order
orderId="order-456" AND timestamp > "2024-01-15T10:00:00Z" AND timestamp < "2024-01-15T11:00:00Z"
```

Structured logs are only valuable if they're queryable. Use Elasticsearch/Kibana, Datadog, Splunk, or cloud-native solutions (CloudWatch Logs Insights, Google Cloud Logging). Avoid unstructured file logs in production event-driven systems.
Dashboards provide real-time visibility into event flow health. Well-designed dashboards help you spot problems before they become incidents.
| Metric | What It Measures | Alert Threshold |
|---|---|---|
| Event publish rate | Events/second by type | Sudden drop (50% decrease) |
| Handler success rate | % of events successfully processed | Below 99.9% |
| Handler latency p50/p95/p99 | Processing time distribution | p99 > 5s |
| Queue depth | Unprocessed messages per queue | Above 1000 (or 10x normal) |
| DLQ depth | Failed messages awaiting resolution | Any non-zero (new failures) |
| Event lag | Time between publish and process | Above 30 seconds |
| Duplicate events | Events processed more than once | Rising trend |
| Handler retry rate | Retries per event | Above 5% |
Google's SRE book recommends monitoring the four golden signals: latency, traffic, errors, and saturation. In event-driven systems, handler latency, event publish rate, handler failure rate, and queue depth cover these signals respectively.
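One way to emit these metrics, sketched with the prom-client library; the metric names, label sets, and bucket boundaries are assumptions, and any metrics backend with counters, gauges, and histograms works the same way:

```typescript
import { Counter, Gauge, Histogram } from 'prom-client';

// Traffic: events published, by type
const eventsPublished = new Counter({
  name: 'events_published_total',
  help: 'Events published, by event type',
  labelNames: ['event_type'],
});

// Errors: handler failures, by handler and error type
const handlerFailures = new Counter({
  name: 'handler_failures_total',
  help: 'Handler failures, by handler and error type',
  labelNames: ['handler', 'error_type'],
});

// Latency: handler processing time distribution (p50/p95/p99 come from the histogram)
const handlerDuration = new Histogram({
  name: 'handler_duration_seconds',
  help: 'Handler processing time',
  labelNames: ['handler', 'event_type'],
  buckets: [0.05, 0.1, 0.5, 1, 5, 10],
});

// Saturation: unprocessed messages per queue
const queueDepth = new Gauge({
  name: 'queue_depth',
  help: 'Unprocessed messages per queue',
  labelNames: ['queue'],
});

export function recordEventPublished(eventType: string): void {
  eventsPublished.labels(eventType).inc();
}

export function recordHandlerRun(
  handler: string,
  eventType: string,
  durationMs: number,
  error?: Error
): void {
  handlerDuration.labels(handler, eventType).observe(durationMs / 1000);
  if (error) {
    handlerFailures.labels(handler, error.constructor.name).inc();
  }
}

export function recordQueueDepth(queue: string, depth: number): void {
  queueDepth.labels(queue).set(depth);
}
```

Scraping these metrics feeds the dashboard panels and alert thresholds listed in the table above.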
Beyond the infrastructure and patterns above, here's a practical toolkit for debugging event-driven systems:
```typescript
/**
 * CLI utilities for event debugging
 */

// 1. Event Inspector: Detailed view of a single event
async function inspectEvent(eventId: string): Promise<void> {
  const event = await eventStore.findById(eventId);
  if (!event) {
    console.error(`Event ${eventId} not found`);
    return;
  }

  console.log('='.repeat(60));
  console.log('EVENT DETAILS');
  console.log('='.repeat(60));
  console.log(`Event ID:     ${event.eventId}`);
  console.log(`Type:         ${event.constructor.name}`);
  console.log(`Occurred At:  ${event.occurredAt.toISOString()}`);
  console.log(`Correlation:  ${event.correlationId}`);
  console.log(`Causation:    ${event.causationId || 'none'}`);
  console.log('');
  console.log('Payload:');
  console.log(JSON.stringify(event, null, 2));
  console.log('');

  // Show handlers that processed this event
  const handlerLogs = await logService.query({
    filter: { eventId: event.eventId, event: 'handler.*' },
    sort: { timestamp: 'asc' },
  });

  console.log('Handler Processing:');
  for (const log of handlerLogs) {
    const status = log.event === 'handler.completed' ? '✓' : '✗';
    const duration = log.durationMs ? `(${log.durationMs}ms)` : '';
    console.log(`  ${status} ${log.handler} ${duration}`);
    if (log.error) {
      console.log(`    Error: ${log.error.message}`);
    }
  }
}

// 2. Trace Viewer: Complete chain for a correlation ID
async function traceChain(correlationId: string): Promise<void> {
  const events = await eventStore.findByCorrelationId(correlationId);
  events.sort((a, b) => a.occurredAt.getTime() - b.occurredAt.getTime());

  console.log('='.repeat(60));
  console.log(`EVENT CHAIN: ${correlationId}`);
  console.log('='.repeat(60));

  const root = events.find(e => !e.causationId);

  function printEvent(event: DomainEvent, indent: number): void {
    const prefix = '  '.repeat(indent) + '├── ';
    const time = event.occurredAt.toISOString().split('T')[1].slice(0, 12);
    console.log(`${prefix}${time} ${event.constructor.name} [${event.eventId.slice(0, 8)}]`);

    // Find children
    const children = events.filter(e => e.causationId === event.eventId);
    children.forEach(child => printEvent(child, indent + 1));
  }

  if (root) {
    printEvent(root, 0);
  }

  // Summary
  const totalDuration = events.length > 1
    ? events[events.length - 1].occurredAt.getTime() - events[0].occurredAt.getTime()
    : 0;
  console.log('');
  console.log(`Total Events:   ${events.length}`);
  console.log(`Total Duration: ${totalDuration}ms`);
}

// 3. Handler History: Recent invocations for a specific handler
async function handlerHistory(
  handlerName: string,
  limit: number = 20
): Promise<void> {
  const logs = await logService.query({
    filter: { handler: handlerName, event: /handler\..*/ },
    sort: { timestamp: 'desc' },
    limit,
  });

  console.log('='.repeat(60));
  console.log(`HANDLER HISTORY: ${handlerName}`);
  console.log('='.repeat(60));
  console.log('');

  for (const log of logs) {
    const status = log.event === 'handler.completed' ? '✓' : '✗';
    const duration = log.durationMs ? `${log.durationMs}ms` : '--';
    console.log(`${status} ${log.timestamp} | ${duration.padStart(6)} | ${log.eventType}`);
    if (log.error) {
      console.log(`  └─ ${log.error.type}: ${log.error.message}`);
    }
  }

  // Stats
  const completed = logs.filter(l => l.event === 'handler.completed').length;
  const failed = logs.filter(l => l.event === 'handler.failed').length;
  const avgDuration = logs
    .filter(l => l.durationMs)
    .reduce((sum, l) => sum + l.durationMs, 0) / completed || 0;

  console.log('');
  console.log(`Success Rate: ${(completed / logs.length * 100).toFixed(1)}%`);
  console.log(`Avg Duration: ${avgDuration.toFixed(0)}ms`);
}

// 4. Compare Events: Side-by-side comparison (useful for debugging schema issues)
async function compareEvents(eventId1: string, eventId2: string): Promise<void> {
  const [event1, event2] = await Promise.all([
    eventStore.findById(eventId1),
    eventStore.findById(eventId2),
  ]);

  console.log('='.repeat(60));
  console.log('EVENT COMPARISON');
  console.log('='.repeat(60));

  const json1 = JSON.stringify(event1, null, 2).split('\n');
  const json2 = JSON.stringify(event2, null, 2).split('\n');
  const maxLines = Math.max(json1.length, json2.length);

  for (let i = 0; i < maxLines; i++) {
    const line1 = json1[i] || '';
    const line2 = json2[i] || '';
    const marker = line1 !== line2 ? '≠' : ' ';
    console.log(`${marker} ${line1.padEnd(40)} | ${line2}`);
  }
}
```

These utilities should be available before production incidents occur. During an outage is not the time to write debugging scripts. Invest in tooling proactively.
Debugging event-driven systems requires preparation, the right tools, and systematic approaches: correlation IDs that link every event and log entry back to the originating action, DLQ tooling for analyzing and retrying failures, event replay for reproducing issues exactly, structured logging that makes event flows queryable, and dashboards with alerting that surface problems before they become incidents.
Module Complete:
You've now completed the comprehensive module on Testing Event-Driven Designs. You've learned how to verify event raising, test event handlers in isolation, perform integration testing across event flows, and debug production issues when they occur.
These skills are the mark of an engineer who can design, build, AND maintain event-driven systems confidently. When events inevitably misbehave in production, you'll have the tools and techniques to diagnose and resolve issues rapidly.
Congratulations! You've mastered testing and debugging for event-driven designs. From unit testing individual handlers to integration testing complete flows to debugging production incidents—you now have the comprehensive toolkit that separates competent developers from elite engineers who can confidently operate event-driven systems at scale.