Event-driven processing represents the foundational paradigm of serverless computing. Unlike traditional request-response models where servers wait idly for incoming requests, event-driven serverless architectures activate compute resources only when events occur. This fundamental shift transforms how we design, build, and operate distributed systems at scale.
In a traditional architecture, you provision servers that continuously poll for work, consume resources while waiting, and require careful capacity planning to handle peak loads. Event-driven serverless inverts this model entirely: events flow through your system, triggering functions that process data, transform state, and propagate results—all without you managing a single server.
This page provides a comprehensive exploration of event-driven processing in serverless environments. We'll examine the anatomy of events, understand trigger mechanisms, explore common event sources, and develop architectural patterns that maximize the power of reactive computing while avoiding its pitfalls.
By the end of this page, you will understand: (1) The fundamental principles of event-driven processing and how they differ from traditional request-response models, (2) The anatomy of events including structure, metadata, and delivery guarantees, (3) Major event sources in cloud environments and their characteristics, (4) Patterns for building resilient, scalable event-driven serverless systems, and (5) Best practices for error handling, retries, and dead letter queues in event-driven architectures.
At its core, event-driven processing is a paradigm where the flow of the program is determined by events—significant occurrences or changes in state that the system needs to react to. In serverless computing, this paradigm is elevated to a first-class architectural principle: functions exist dormant until an event awakens them, they process the event, and then return to dormancy.
The Event-Driven Model:
The event-driven model consists of three primary components:
Event Producers: Systems, services, or users that generate events. These can range from user actions in a web application to system-level changes like file uploads or database modifications.
Event Routers: Infrastructure components that receive events from producers and route them to appropriate consumers. In serverless environments, this is typically handled by the cloud provider's event infrastructure (EventBridge, Pub/Sub, Event Grid).
Event Consumers: Functions or services that receive events and execute business logic in response. In serverless, these are your Lambda functions, Cloud Functions, or Azure Functions.
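To make these roles concrete, here is a minimal sketch of an event consumer: a hypothetical AWS Lambda handler in TypeScript that the platform invokes only when an event arrives. The OrderCreatedEvent shape and the notifyWarehouse helper are illustrative assumptions, not part of any provider SDK.

```typescript
// Minimal event consumer: dormant until the platform delivers an event.
// The event shape and downstream helper are hypothetical, for illustration.
interface OrderCreatedEvent {
  eventId: string;
  orderId: string;
  totalAmount: number;
}

export async function handler(event: OrderCreatedEvent): Promise<void> {
  // React to the event, then return to dormancy; no server polls for work.
  console.log(`Received order ${event.orderId} (event ${event.eventId})`);
  await notifyWarehouse(event.orderId);
}

// Stand-in for real business logic: in practice this might publish another
// event, write to a database, or call a downstream service.
async function notifyWarehouse(orderId: string): Promise<void> {
  console.log(`Reserving stock for order ${orderId}`);
}
```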
Why Event-Driven for Serverless?
The event-driven paradigm is particularly well-suited to serverless computing for several fundamental reasons:
Natural Alignment with Pay-Per-Use: Serverless billing models charge for actual compute consumption, typically measured in GB-seconds or invocations. Event-driven processing ensures you only consume resources when there's actual work to do, maximizing cost efficiency.
Automatic Scaling: When events arrive in bursts, the serverless platform automatically scales out to handle the load. When events slow down, it scales back to zero. This elasticity is native to event-driven systems.
Loose Coupling: Event producers don't need to know about consumers, and vice versa. This decoupling enables teams to work independently, deploy separately, and evolve their components without breaking others.
Asynchronous by Default: Event-driven systems naturally handle asynchronous workflows, making them ideal for tasks that don't require immediate responses—batch processing, notifications, data synchronization, and more.
Event-driven processing represents an inversion of control: instead of your code actively polling for work, the infrastructure pushes events to your code when work arrives. This inversion eliminates polling overhead and enables true scale-to-zero behavior, where you pay literally nothing when nothing is happening.
Understanding the structure and semantics of events is crucial for building robust event-driven systems. While specific event formats vary across platforms and use cases, most events share common structural elements that carry essential information for processing.
Core Event Components:
A well-designed event typically contains several key elements:
Event Type/Name: A semantic identifier describing what occurred (e.g., order.created, user.registered, file.uploaded). This enables consumers to filter and route events appropriately.
Event Source: The origin of the event, identifying which service or system generated it. This provides context and enables troubleshooting.
Event ID: A unique identifier for the specific event instance, enabling idempotency checks and deduplication.
Timestamp: When the event occurred, enabling temporal ordering and time-based processing logic.
Payload/Data: The actual content of the event—the business data that consumers need to process.
Metadata: Additional contextual information such as correlation IDs, trace IDs, or version numbers.
```json
{
  "specversion": "1.0",
  "type": "com.example.order.created",
  "source": "/orders/service",
  "id": "A234-1234-5678-9ABC",
  "time": "2024-01-15T14:32:16.512Z",
  "datacontenttype": "application/json",
  "subject": "order-12345",
  "data": {
    "orderId": "order-12345",
    "customerId": "cust-67890",
    "items": [
      { "productId": "prod-111", "quantity": 2, "price": 29.99 }
    ],
    "totalAmount": 59.98,
    "currency": "USD",
    "status": "created"
  },
  "correlationid": "corr-ABCD-1234",
  "traceid": "trace-WXYZ-5678"
}
```

CloudEvents Specification:
The CloudEvents specification is an industry-standard attempt to provide a common format for describing events. Developed under the Cloud Native Computing Foundation (CNCF), CloudEvents defines a set of required and optional attributes that enable interoperability across different cloud providers and event systems.
The specification addresses several critical concerns:
| Attribute | Type | Required | Description |
|---|---|---|---|
| specversion | String | Yes | Version of CloudEvents spec (e.g., '1.0') |
| type | String | Yes | Type of event, typically reverse-DNS notation |
| source | URI-reference | Yes | Identifies the context where event occurred |
| id | String | Yes | Unique identifier for this event |
| time | Timestamp | No | When the event occurred (RFC 3339) |
| datacontenttype | String | No | Content type of data (e.g., 'application/json') |
| dataschema | URI | No | URI of schema for data attribute |
| subject | String | No | Subject of event in producer context |
| data | Any | No | Event payload (domain-specific data) |
Event Size Considerations:
When designing events, payload size is a critical consideration. Serverless platforms typically impose limits on event payload size; messaging services such as SQS, SNS, and EventBridge commonly cap payloads at around 256 KB, while direct synchronous function invocation allows at most a few megabytes.
For events that would exceed these limits, a common pattern is to store the actual data in object storage (S3, GCS, Azure Blob) and include only a reference (URL or key) in the event payload. This "claim check" pattern keeps events lightweight while enabling processing of arbitrarily large data.
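A minimal sketch of the claim check pattern under these assumptions: payloads are stored in an S3 bucket and the event is published to EventBridge via the AWS SDK v3. The bucket name, event source, and detail type are illustrative, not prescribed.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const s3 = new S3Client({});
const eventBridge = new EventBridgeClient({});
const BUCKET = "my-claim-check-bucket"; // hypothetical bucket name

// Claim check: store the large payload in object storage and publish
// only a lightweight reference in the event itself.
export async function publishLargeResult(reportId: string, largePayload: string): Promise<void> {
  const key = `reports/${reportId}.json`;

  // 1. Store the full payload in S3.
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: key,
    Body: largePayload,
    ContentType: "application/json"
  }));

  // 2. Publish a small event carrying only the reference (the "claim check").
  await eventBridge.send(new PutEventsCommand({
    Entries: [{
      Source: "com.example.reports",
      DetailType: "report.generated",
      Detail: JSON.stringify({ reportId, bucket: BUCKET, key })
    }]
  }));
}
```

Consumers receive the reference and fetch the payload from object storage only if and when they need it.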
Resist the temptation to include everything in an event. Events should contain enough data for consumers to process them independently, but not more. Including unnecessary data increases costs, reduces performance, and creates tight coupling between producers and consumers. If a consumer needs additional data, they can fetch it from a source of truth using identifiers in the event.
Cloud providers offer a rich ecosystem of event sources that can trigger serverless functions. Understanding these sources and their characteristics is essential for designing effective event-driven architectures.
Storage Events:
Object storage services (S3, GCS, Azure Blob) emit events when files are created, modified, or deleted. These events are fundamental to many serverless workflows, such as generating thumbnails, transcoding media, indexing documents, or scanning uploads.
Storage events typically include metadata about the affected object (key, size, content type) but not the object content itself—your function retrieves the actual content as needed.
```json
{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "awsRegion": "us-east-1",
      "eventTime": "2024-01-15T14:32:16.512Z",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {
          "name": "my-upload-bucket",
          "arn": "arn:aws:s3:::my-upload-bucket"
        },
        "object": {
          "key": "uploads/images/photo.jpg",
          "size": 1024567,
          "eTag": "d41d8cd98f00b204e9800998ecf8427e",
          "versionId": "096fKKXTRTtl3on89fVO.nfljtsv6qko"
        }
      }
    }
  ]
}
```
Database Events (Change Data Capture):

Database change streams enable functions to react to data modifications in real time. Different databases expose different mechanisms for this, such as DynamoDB Streams, the Cosmos DB Change Feed, and Firestore triggers (see the table below).

Database events unlock powerful patterns, including keeping search indexes in sync, invalidating caches, maintaining materialized views, and building audit trails.
| Category | AWS | Azure | GCP |
|---|---|---|---|
| Object Storage | S3 Events | Blob Storage Events | Cloud Storage Events |
| Database CDC | DynamoDB Streams | Cosmos DB Change Feed | Firestore Triggers |
| Message Queues | SQS, SNS | Service Bus, Event Grid | Pub/Sub, Tasks |
| HTTP/API | API Gateway | API Management | Cloud Endpoints |
| Schedules | EventBridge Scheduler | Timer Trigger | Cloud Scheduler |
| IoT | IoT Core Rules | IoT Hub Routes | IoT Core Events |
| Authentication | Cognito Triggers | AAD B2C Events | Firebase Auth Triggers |
| Custom Events | EventBridge | Event Grid | Eventarc |
Message Queue Events:
Message queues (SQS, Service Bus, Pub/Sub) serve as buffers between producers and consumers, enabling reliable asynchronous communication. Queues absorb traffic spikes, let producers and consumers scale independently, and hold messages until a consumer successfully processes them; a typical batch consumer is sketched below.
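As a sketch of queue-driven consumption, the handler below processes an SQS batch and reports partial batch failures so that only the failed messages are redelivered. It assumes the Lambda event source mapping has ReportBatchItemFailures enabled and uses types from @types/aws-lambda; handleMessage stands in for real business logic.

```typescript
import { SQSEvent, SQSBatchResponse, SQSBatchItemFailure } from "aws-lambda";

// Queue-driven consumer: the platform delivers a batch of messages and
// redelivers only the ones reported back as failures.
export async function handler(event: SQSEvent): Promise<SQSBatchResponse> {
  const batchItemFailures: SQSBatchItemFailure[] = [];

  for (const record of event.Records) {
    try {
      const message = JSON.parse(record.body);
      await handleMessage(message);
    } catch (error) {
      console.error(`Failed to process message ${record.messageId}`, error);
      // Report the failure; unreported messages are removed from the queue.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures };
}

// Hypothetical business logic for a single message.
async function handleMessage(message: unknown): Promise<void> {
  console.log("Handling message", message);
}
```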
HTTP/API Events:
API gateways translate HTTP requests into function invocations, enabling serverless architectures to serve web and mobile applications. This is covered in detail in the API Backends page, but represents one of the most common event sources.
Schedule Events:
Cron-like schedulers trigger functions at defined intervals or times, enabling batch processing, maintenance tasks, and periodic workflows. This is explored further in the Scheduled Tasks page.
Custom Application Events:
Event buses (EventBridge, Event Grid, Eventarc) allow applications to publish custom domain events that trigger downstream functions, enabling sophisticated choreography and event-driven architectures.
Not all event sources are equal. Consider delivery guarantees (at-least-once vs exactly-once), ordering guarantees, latency characteristics, retry behavior, and cost models when selecting event sources. The 'right' choice depends on your specific reliability, performance, and cost requirements.
Understanding delivery semantics is critical for building reliable event-driven systems. Different event sources provide different guarantees, and your processing logic must be designed accordingly.
At-Most-Once Delivery:
With at-most-once semantics, events are delivered at most once: if delivery fails, the event is lost. This is rarely acceptable for business-critical events but may be suitable for metrics or logging where occasional loss is tolerable.
At-Least-Once Delivery:
Most serverless event sources provide at-least-once delivery: every event is guaranteed to be delivered at least once, but may be delivered multiple times. This is the default for S3 events, DynamoDB Streams, SQS, and most other sources.
The implication is profound: your functions must be idempotent. Processing the same event twice must produce the same result as processing it once. Without idempotency, duplicate deliveries can cause double charges, repeated notifications, double-counted metrics, or inconsistent state, as the claim-and-process example below illustrates.
```typescript
import { DynamoDBClient, PutItemCommand, ConditionalCheckFailedException } from "@aws-sdk/client-dynamodb";

interface OrderEvent {
  eventId: string;
  orderId: string;
  action: string;
  timestamp: string;
}

const dynamodb = new DynamoDBClient({});

export async function processOrderEvent(event: OrderEvent): Promise<void> {
  // 1. Attempt to claim the event by recording it
  const claimed = await claimEvent(event.eventId);
  if (!claimed) {
    console.log(`Event ${event.eventId} already processed, skipping`);
    return;
  }

  try {
    // 2. Process the business logic
    await processOrder(event);

    // 3. Mark event as successfully processed
    await markEventComplete(event.eventId);
  } catch (error) {
    // 4. Mark event as failed for retry/investigation
    await markEventFailed(event.eventId, error);
    throw error;
  }
}

async function claimEvent(eventId: string): Promise<boolean> {
  try {
    await dynamodb.send(new PutItemCommand({
      TableName: "ProcessedEvents",
      Item: {
        eventId: { S: eventId },
        status: { S: "processing" },
        claimedAt: { S: new Date().toISOString() },
        // TTL for automatic cleanup after 7 days
        expiresAt: { N: String(Math.floor(Date.now() / 1000) + 7 * 24 * 60 * 60) }
      },
      // Only succeed if eventId doesn't exist
      ConditionExpression: "attribute_not_exists(eventId)"
    }));
    return true;
  } catch (error) {
    if (error instanceof ConditionalCheckFailedException) {
      return false; // Already processed
    }
    throw error;
  }
}
```

Exactly-Once Semantics:
True exactly-once delivery is theoretically impossible in distributed systems (a consequence of the Two Generals Problem), but exactly-once processing can be achieved through careful coordination, such as deduplication stores, transactional outboxes, or idempotent consumers.
In practice, most systems opt for at-least-once delivery with idempotent processing, as this provides strong enough guarantees for most use cases while being simpler to implement and reason about.
Idempotency tracking consumes storage and must eventually be cleaned up. Define an appropriate retention window based on your system's characteristics. For most systems, retaining processed event IDs for 24-72 hours is sufficient, as duplicate deliveries typically occur within minutes. Use TTL features in your storage system to automatically expire old records.
Robust error handling is essential in event-driven systems. When a function fails to process an event, the system must determine whether to retry, how many times, and what to do with events that consistently fail.
Transient vs Permanent Failures:
Distinguishing between transient and permanent failures is crucial for effective retry strategies.

Transient Failures are temporary issues that may resolve themselves, such as network timeouts, throttled downstream APIs, or brief service outages. Retrying these usually succeeds.

Permanent Failures will not succeed no matter how many times you retry, such as malformed payloads, failed validation, or bugs in the processing code. Retrying these only wastes resources and delays the event's arrival in a dead letter queue.
Retry Strategies:
Immediate Retry: Retry immediately after failure. Suitable only for very brief transient issues. Risk: can overwhelm struggling services.
Fixed Delay: Wait a constant duration between retries (e.g., 5 seconds). Simple but may not be optimal.
Exponential Backoff: Double the delay after each retry (1s, 2s, 4s, 8s, ...). Gives failing systems time to recover. The standard approach for most scenarios.
Exponential Backoff with Jitter: Add randomization to backoff delays. Prevents thundering herd when many functions retry simultaneously against a recovered service.
Circuit Breaker: After repeated failures, stop attempting for a cool-off period. Protects downstream services and avoids wasting resources on known-bad destinations.
```typescript
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterFactor: number; // 0-1, percentage of delay to randomize
}

async function withRetry<T>(
  operation: () => Promise<T>,
  config: RetryConfig,
  isRetryable: (error: Error) => boolean
): Promise<T> {
  let lastError: Error;

  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error as Error;

      // Don't retry if error is not retryable or we've exhausted retries
      if (!isRetryable(lastError) || attempt === config.maxRetries) {
        throw lastError;
      }

      // Calculate delay with exponential backoff
      const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
      const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);

      // Add jitter: randomize within jitterFactor percentage
      const jitter = cappedDelay * config.jitterFactor * Math.random();
      const finalDelay = cappedDelay + jitter;

      console.log(`Attempt ${attempt + 1} failed, retrying in ${finalDelay}ms`);
      await sleep(finalDelay);
    }
  }

  throw lastError!;
}

function isRetryableError(error: Error): boolean {
  // Check for specific retryable error types
  const retryableStatusCodes = [429, 500, 502, 503, 504];
  const statusCode = (error as any).statusCode;

  if (statusCode && retryableStatusCodes.includes(statusCode)) {
    return true;
  }

  // Check for network errors
  const networkErrors = ['ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED'];
  if (networkErrors.some(code => error.message.includes(code))) {
    return true;
  }

  return false;
}

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
```

Dead Letter Queues (DLQ):
When events fail all retry attempts, they must go somewhere. Dead Letter Queues capture failed events for later investigation and reprocessing, giving operators a place to inspect payloads, diagnose the underlying failure, and replay events once it is fixed.
Every production event-driven system should have DLQs configured. Without them, failed events disappear silently, and you lose both the data and the ability to understand what went wrong.
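As one illustration, here is a sketch of a redrive utility that moves messages from a DLQ back to the main queue once the underlying problem is fixed. The queue URLs are hypothetical, and the AWS SDK v3 SQS client is assumed (some queue services also offer built-in redrive).

```typescript
import {
  SQSClient,
  ReceiveMessageCommand,
  SendMessageCommand,
  DeleteMessageCommand
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

// Hypothetical queue URLs for illustration.
const DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-dlq";
const MAIN_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders";

// Drain the DLQ in small batches, re-sending each message to the main
// queue and deleting it from the DLQ only after the send succeeds.
export async function redriveDeadLetters(maxBatches = 10): Promise<void> {
  for (let batch = 0; batch < maxBatches; batch++) {
    const { Messages } = await sqs.send(new ReceiveMessageCommand({
      QueueUrl: DLQ_URL,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 2
    }));

    if (!Messages || Messages.length === 0) return; // DLQ is empty

    for (const message of Messages) {
      await sqs.send(new SendMessageCommand({
        QueueUrl: MAIN_QUEUE_URL,
        MessageBody: message.Body!
      }));
      await sqs.send(new DeleteMessageCommand({
        QueueUrl: DLQ_URL,
        ReceiptHandle: message.ReceiptHandle!
      }));
    }
  }
}
```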
A production event-driven system without DLQs is flying blind. When something goes wrong (and it will), you need the failed events to understand what happened and to recover. Configure DLQ monitoring and alerting so that your team knows immediately when events are failing—don't discover problems when customers complain.
Several proven patterns help structure event-driven processing in serverless architectures. These patterns address common challenges like coordination, aggregation, and distribution.
Fan-Out Pattern:
When a single event needs to trigger multiple independent processes, the fan-out pattern distributes the event to multiple consumers in parallel. This is commonly implemented using SNS (which fans out to multiple SQS queues or Lambda functions) or EventBridge (which routes events to multiple targets based on rules).
Example: When an order is placed, a single order.created event can fan out to inventory reservation, customer notification, analytics, and fraud screening, each handled by its own independent consumer (see the publishing sketch below).
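A minimal publishing sketch for the fan-out side, assuming an SNS topic whose SQS and Lambda subscriptions are already configured; the topic ARN and event shape are illustrative.

```typescript
import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

const sns = new SNSClient({});

// Publish one event to a topic; SNS delivers a copy to every subscriber
// (SQS queues, Lambda functions) in parallel. The topic ARN is hypothetical.
export async function publishOrderCreated(order: { orderId: string; totalAmount: number }): Promise<void> {
  await sns.send(new PublishCommand({
    TopicArn: "arn:aws:sns:us-east-1:123456789012:order-events",
    Message: JSON.stringify({ eventType: "order.created", ...order }),
    // Message attributes let subscribers filter without parsing the body.
    MessageAttributes: {
      eventType: { DataType: "String", StringValue: "order.created" }
    }
  }));
}
```

Each subscriber receives its own copy of the message and processes it independently of the others.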
Fan-In (Aggregation) Pattern:
The inverse of fan-out, this pattern aggregates events from multiple sources into a single processing point. This is useful for combining data from disparate systems or waiting for multiple conditions before proceeding.
Example: Before shipping an order, aggregate confirmation events such as payment confirmed, inventory allocated, and shipping address validated, proceeding only once all of them have arrived (see the aggregator sketch below).
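A sketch of a simple aggregator under these assumptions: a DynamoDB table keyed by orderId tracks which prerequisite events have arrived, and shipping starts only when the set is complete. Table, attribute, and event-type names are illustrative.

```typescript
import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";

const dynamodb = new DynamoDBClient({});
const REQUIRED = ["payment.confirmed", "inventory.allocated", "address.validated"];

// Record each incoming event against its order and check whether all
// prerequisites have now arrived. Table name and schema are hypothetical.
export async function aggregate(orderId: string, eventType: string): Promise<void> {
  const result = await dynamodb.send(new UpdateItemCommand({
    TableName: "OrderAggregation",
    Key: { orderId: { S: orderId } },
    // Atomically add this event type to a string set of received events.
    UpdateExpression: "ADD receivedEvents :e",
    ExpressionAttributeValues: { ":e": { SS: [eventType] } },
    ReturnValues: "ALL_NEW"
  }));

  const received = result.Attributes?.receivedEvents?.SS ?? [];
  if (REQUIRED.every(type => received.includes(type))) {
    // A production version would also guard against firing twice
    // when the final two events arrive concurrently.
    await shipOrder(orderId);
  }
}

async function shipOrder(orderId: string): Promise<void> {
  console.log(`All prerequisites met, shipping order ${orderId}`);
}
```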
Event Filtering and Routing:
Not all consumers need all events. Event filtering routes events to appropriate handlers based on event attributes, reducing processing overhead and improving system organization.
EventBridge rules enable sophisticated filtering. For example, the pattern below routes order.created events with a total over $1,000, placed by premium or enterprise customers in US regions, to a fraud-detection target:
{ "source": ["com.company.orders"], "detail-type": ["Order Created"], "detail": { "totalAmount": [{ "numeric": [">", 1000] }], "customerType": ["premium", "enterprise"], "region": [{ "prefix": "us-" }] }}Event-Carried State Transfer:
Instead of consumers querying for additional data, events carry all necessary state with them. This reduces coupling and improves performance but increases event size and requires careful versioning.
Example: Instead of just {"orderId": "123"}, include {"orderId": "123", "customerId": "456", "customerEmail": "user@example.com", "items": [...]}. Consumers have everything they need without additional queries.
Event Sourcing Light:
For simpler cases, maintain a lightweight event log that captures key state changes without full event sourcing complexity. This enables audit logging, debugging, and state reconstruction without the overhead of a complete event sourcing implementation.
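A sketch of such a lightweight event log, assuming a DynamoDB table with the entity ID as partition key and a timestamp-based sort key; the table name and attributes are illustrative.

```typescript
import { randomUUID } from "node:crypto";
import { DynamoDBClient, PutItemCommand, QueryCommand } from "@aws-sdk/client-dynamodb";

const dynamodb = new DynamoDBClient({});

// Append a state change to the log. Partition key = entity id,
// sort key = timestamp plus a UUID so entries stay ordered and unique.
export async function appendToLog(entityId: string, eventType: string, data: object): Promise<void> {
  await dynamodb.send(new PutItemCommand({
    TableName: "EventLog",
    Item: {
      entityId: { S: entityId },
      sortKey: { S: `${new Date().toISOString()}#${randomUUID()}` },
      eventType: { S: eventType },
      data: { S: JSON.stringify(data) }
    }
  }));
}

// Read back an entity's history for auditing or debugging.
export async function readHistory(entityId: string) {
  const result = await dynamodb.send(new QueryCommand({
    TableName: "EventLog",
    KeyConditionExpression: "entityId = :id",
    ExpressionAttributeValues: { ":id": { S: entityId } }
  }));
  return result.Items ?? [];
}
```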
Saga Pattern:
For distributed transactions spanning multiple services, the saga pattern coordinates through a sequence of local transactions, each publishing events that trigger the next step. Compensation events handle failures, rolling back previous steps as needed. This pattern is explored in depth in the Event-Driven Architecture chapter.
| Pattern | Use Case | Implementation | Considerations |
|---|---|---|---|
| Fan-Out | Parallel processing of single event | SNS → Multiple SQS/Lambda | Ensure idempotency in all consumers |
| Fan-In | Aggregate multiple events | Step Functions, aggregator function | Handle timeout and missing events |
| Filtering | Route events to specific handlers | EventBridge rules | Start permissive, refine over time |
| State Transfer | Self-contained event processing | Include all data in event | Event versioning becomes critical |
| Saga | Distributed transactions | Choreography or orchestration | Complex error handling required |
These patterns often combine in real systems. An order might trigger a fan-out to multiple services, each of which may use filtering to route sub-events, with sagas coordinating multi-step workflows. Start simple and add complexity only when needed—over-engineering event-driven systems is a common trap.
Building production-grade event-driven serverless systems requires attention to many details. These best practices, learned from real-world implementations, help avoid common pitfalls and build systems that are reliable, maintainable, and scalable.
Design Events as First-Class Citizens:
Name events semantically: order.created, payment.processed, and user.registered are clear; event1, update, and data are not. Clear names let consumers filter, route, and reason about events without inspecting payloads.

Observability and Debugging:
Event-driven systems are notoriously difficult to debug because there is no single request thread to follow. Invest heavily in observability: propagate correlation IDs on every event, emit structured logs, and use distributed tracing to reconstruct flows across functions, as in the correlation-aware logger below:
```typescript
interface LogContext {
  correlationId: string;
  eventId: string;
  eventType: string;
  sourceFunction: string;
  timestamp: string;
}

function createLogger(context: LogContext) {
  return {
    info: (message: string, data?: Record<string, unknown>) => {
      console.log(JSON.stringify({
        level: "INFO",
        message,
        ...context,
        ...data,
        logTimestamp: new Date().toISOString()
      }));
    },
    error: (message: string, error: Error, data?: Record<string, unknown>) => {
      console.error(JSON.stringify({
        level: "ERROR",
        message,
        errorName: error.name,
        errorMessage: error.message,
        errorStack: error.stack,
        ...context,
        ...data,
        logTimestamp: new Date().toISOString()
      }));
    }
  };
}

// Usage in event handler
export async function handler(event: OrderEvent) {
  const logger = createLogger({
    correlationId: event.correlationId || event.eventId,
    eventId: event.eventId,
    eventType: "order.created",
    sourceFunction: "processOrderFunction",
    timestamp: event.timestamp
  });

  logger.info("Processing order event", { orderId: event.orderId });

  try {
    await processOrder(event);
    logger.info("Order processed successfully", { orderId: event.orderId });
  } catch (error) {
    logger.error("Failed to process order", error as Error, { orderId: event.orderId });
    throw error;
  }
}
```

Testing Event-Driven Systems:
Testing event-driven architectures requires special consideration: unit-test handlers with synthetic events, run integration tests against real or emulated event sources, and assert on side effects rather than return values.
Use local emulators (LocalStack, Firebase Emulator) for development and testing without incurring cloud costs or affecting production systems.
Testing async event-driven flows requires different approaches than synchronous request-response. Instead of asserting on responses, you verify that expected side effects occurred (records created, events published, notifications sent). This may require polling or waiting strategies in tests.
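One common approach is a small polling helper that waits for the expected side effect to appear before asserting on it. The sketch below is framework-agnostic; publishTestEvent and findOrderProjection in the usage comment are hypothetical test helpers.

```typescript
// Poll until a condition holds or a timeout expires. Useful when tests
// must wait for asynchronous side effects (a record written, an event
// published) rather than asserting on a synchronous response.
async function waitFor<T>(
  check: () => Promise<T | undefined>,
  { timeoutMs = 10_000, intervalMs = 500 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await check();
    if (result !== undefined) return result;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Example usage in a test (test framework and lookup are hypothetical):
// publish an order.created event, then wait for the projection to appear.
//
//   await publishTestEvent({ eventType: "order.created", orderId: "test-123" });
//   const record = await waitFor(() => findOrderProjection("test-123"));
//   expect(record.status).toBe("created");
```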
Event-driven processing is the heart of serverless computing. By understanding events, their sources, delivery guarantees, and processing patterns, you can build systems that are naturally scalable, resilient, and cost-effective.
Let's consolidate the key concepts from this page: events trigger functions only when there is work to do, which enables pay-per-use and scale-to-zero; most event sources deliver at least once, so handlers must be idempotent; retries with backoff, jitter, and dead letter queues make failure handling explicit; and patterns such as fan-out, fan-in, filtering, and sagas structure larger event-driven workflows.
What's Next:
With a solid understanding of event-driven processing, we'll explore one of the most common serverless patterns: API Backends. You'll learn how to build scalable, secure HTTP APIs using serverless functions, including request routing, authentication, validation, and response formatting.
You now have a comprehensive understanding of event-driven processing in serverless architectures. This paradigm underlies almost every serverless use case—from simple file processing triggers to complex distributed workflows. Master these concepts, and you'll be well-equipped to design and implement robust serverless systems.