In event-driven architecture, events are the primary means by which services communicate. But for this communication to work, both producers and consumers must agree on the structure of events—what fields exist, what types they have, and what they mean.
This agreement is captured in an event schema. Schemas serve as contracts between services, enabling independent teams to build producers and consumers that work together correctly. But schemas also introduce challenges: how do you evolve schemas without breaking consumers? How do you handle backward and forward compatibility? How do you manage schemas across hundreds of event types?
This page provides a comprehensive exploration of event schemas—from design principles to evolution strategies to operational tooling.
By the end of this page, you will understand how to design effective event schemas, strategies for evolving schemas without breaking consumers, schema registries and versioning approaches, and common pitfalls to avoid in schema management.
An event schema is a formal definition of an event's structure—the fields it contains, their data types, which are required or optional, and any constraints on values.
Why Schemas Matter:
Schema vs Schemaless:
Some systems operate without explicit schemas—producers send JSON, consumers parse what they receive. This seems simpler initially but creates problems at scale: silent breakage when payloads change, no single source of truth for event structure, and no way to validate data before it reaches consumers.
Explicit schemas have upfront cost but prevent far more expensive problems down the road. For any non-trivial event-driven system, schemas are essential infrastructure.
The most successful teams adopt schema-first development: define the event schema before writing producer or consumer code. This forces clear thinking about what data is needed, enables parallel development (teams can work against the schema contract), and produces better-designed events.
Several widely-used formats exist for defining event schemas, each with different tradeoffs.
| Format | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Apache Avro | Compact binary, schema evolution, Kafka native | Requires schema registry, more complex | High-volume Kafka deployments |
| Protocol Buffers | Fast, compact, wide language support | Less flexibility, version in schema | gRPC systems, Google ecosystem |
| JSON Schema | Human readable, JSON native, no special tooling | Larger payloads, weaker evolution story | REST APIs, simpler systems |
| Apache Thrift | Efficient, cross-language RPC | More RPC-focused, less event-focused | Facebook ecosystem, RPC-heavy |
| CloudEvents | Standardized envelope, vendor-neutral | Envelope only, payload schema separate | Multi-cloud, standard compliance |
Apache Avro Example:
Avro is particularly well-suited for Kafka-based event systems because of its compact binary format, excellent schema evolution support, and tight integration with Confluent Schema Registry.
```json
{
  "type": "record",
  "name": "OrderPlaced",
  "namespace": "com.example.orders.events",
  "doc": "Emitted when a customer successfully places an order",
  "fields": [
    {
      "name": "eventId",
      "type": "string",
      "doc": "Unique identifier for this event instance"
    },
    {
      "name": "eventTimestamp",
      "type": { "type": "long", "logicalType": "timestamp-millis" },
      "doc": "When the order was placed (epoch milliseconds)"
    },
    {
      "name": "orderId",
      "type": "string",
      "doc": "Unique identifier for the order"
    },
    {
      "name": "customerId",
      "type": "string",
      "doc": "Identifier of the customer who placed the order"
    },
    {
      "name": "items",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderItem",
          "fields": [
            { "name": "productId", "type": "string" },
            { "name": "productName", "type": "string" },
            { "name": "quantity", "type": "int" },
            { "name": "unitPriceCents", "type": "long" }
          ]
        }
      },
      "doc": "Items included in the order"
    },
    {
      "name": "totalAmountCents",
      "type": "long",
      "doc": "Total order amount in cents"
    },
    {
      "name": "currency",
      "type": "string",
      "default": "USD",
      "doc": "ISO currency code"
    },
    {
      "name": "shippingAddress",
      "type": [
        "null",
        {
          "type": "record",
          "name": "Address",
          "fields": [
            { "name": "street", "type": "string" },
            { "name": "city", "type": "string" },
            { "name": "state", "type": ["null", "string"], "default": null },
            { "name": "postalCode", "type": "string" },
            { "name": "country", "type": "string" }
          ]
        }
      ],
      "default": null,
      "doc": "Shipping address (null for digital-only orders)"
    }
  ]
}
```

JSON Schema Example:
JSON Schema is more human-readable and doesn't require special tooling, making it suitable for simpler systems or when JSON is already the wire format.
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://example.com/schemas/order-placed.json",
  "title": "OrderPlaced",
  "description": "Emitted when a customer successfully places an order",
  "type": "object",
  "required": ["eventId", "eventTimestamp", "orderId", "customerId", "items", "totalAmountCents"],
  "properties": {
    "eventId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique identifier for this event instance"
    },
    "eventTimestamp": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 timestamp when the order was placed"
    },
    "orderId": {
      "type": "string",
      "description": "Unique identifier for the order"
    },
    "customerId": {
      "type": "string",
      "description": "Identifier of the customer who placed the order"
    },
    "items": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["productId", "quantity", "unitPriceCents"],
        "properties": {
          "productId": { "type": "string" },
          "productName": { "type": "string" },
          "quantity": { "type": "integer", "minimum": 1 },
          "unitPriceCents": { "type": "integer", "minimum": 0 }
        }
      }
    },
    "totalAmountCents": {
      "type": "integer",
      "minimum": 0,
      "description": "Total order amount in cents"
    },
    "currency": {
      "type": "string",
      "default": "USD",
      "pattern": "^[A-Z]{3}$"
    }
  }
}
```

Well-designed event schemas share certain characteristics that make them useful, maintainable, and evolvable.
The Self-Describing Event:
Consider two approaches to an OrderPlaced event:
```json
{
  "eventType": "OrderPlaced",
  "orderId": "ord_123"
}
// Consumer must query the order service
// to get any useful information.
// Creates a runtime dependency.
// Fails if the order service is down.
```

```json
{
  "eventType": "OrderPlaced",
  "orderId": "ord_123",
  "customerId": "cust_456",
  "customerEmail": "user@example.com",
  "items": [
    { "productId": "prod_1", "name": "Widget", "qty": 2 }
  ],
  "totalAmount": 99.99,
  "placedAt": "2025-01-08T10:30:00Z"
}
// Consumer has everything needed.
// No runtime dependencies required.
```

Unlike database design, where normalization is king, event design benefits from strategic denormalization. Including the customer email directly in the order event (rather than just customerId) allows email services to send confirmations without querying the customer service. The tradeoff is larger event payloads, but the benefit of reduced runtime coupling usually wins.
A common pattern separates event metadata (common across all events) from the event payload (specific to each event type). This creates a standard envelope that wraps event-specific content.
```typescript
// Event envelope - same structure for all events
interface EventEnvelope<T> {
  // ─────────────────────────────────────────────
  // METADATA (common to all events)
  // ─────────────────────────────────────────────
  eventId: string;        // Unique event identifier (UUID)
  eventType: string;      // Event name (e.g., "OrderPlaced")
  eventVersion: string;   // Schema version (e.g., "1.2")
  timestamp: string;      // When event occurred (ISO 8601)
  source: string;         // Producing service/component
  correlationId: string;  // Request correlation for tracing
  causationId?: string;   // ID of event that caused this one

  // ─────────────────────────────────────────────
  // PARTITIONING & ROUTING
  // ─────────────────────────────────────────────
  aggregateType: string;  // Entity type (e.g., "Order")
  aggregateId: string;    // Entity ID (partition key)

  // ─────────────────────────────────────────────
  // PAYLOAD (event-specific data)
  // ─────────────────────────────────────────────
  data: T;                // Event-specific content
}

// Specific event types define their data payload
interface OrderPlacedData {
  orderId: string;
  customerId: string;
  items: OrderItem[];
  totalAmountCents: number;
  currency: string;
}

// Complete OrderPlaced event
type OrderPlacedEvent = EventEnvelope<OrderPlacedData>;

// Example event instance
const orderPlaced: OrderPlacedEvent = {
  eventId: "evt_a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  eventType: "OrderPlaced",
  eventVersion: "1.2",
  timestamp: "2025-01-08T10:30:00.000Z",
  source: "order-service",
  correlationId: "req_xyz789",
  aggregateType: "Order",
  aggregateId: "ord_123456",
  data: {
    orderId: "ord_123456",
    customerId: "cust_789",
    items: [
      { productId: "prod_001", quantity: 2, unitPriceCents: 2999 }
    ],
    totalAmountCents: 5998,
    currency: "USD"
  }
};
```

Benefits of the Envelope Pattern:
Consistent Metadata Handling: All events have the same metadata structure, enabling generic logging, tracing, and routing logic.
Separation of Concerns: The envelope handles cross-cutting concerns (identity, tracing, versioning) while the payload focuses on business data.
Type Safety: Generic types allow strong typing of both envelope and payload in typed languages.
CloudEvents Compatibility: This pattern aligns with the CloudEvents specification, enabling interoperability.
CloudEvents is an industry standard for event envelope structure. It defines required attributes (id, source, type, specversion) and optional extensions. Adopting CloudEvents enables tooling compatibility and easier integration with cloud services. See cloudevents.io for the full specification.
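For concreteness, here is a minimal CloudEvents 1.0 envelope sketched in TypeScript. The interface covers the required attributes (id, source, type, specversion) plus a few common optional ones; the payload fields are illustrative, not part of the specification.

```typescript
// Minimal CloudEvents 1.0 envelope. Required attributes per the spec:
// id, source, type, specversion. Everything else here is optional.
interface CloudEvent<T> {
  specversion: "1.0";
  id: string;               // unique event instance ID
  source: string;           // URI reference identifying the producer
  type: string;             // event type, typically reverse-DNS style
  time?: string;            // optional RFC 3339 timestamp
  datacontenttype?: string; // media type of the data attribute
  data?: T;                 // event payload (schema defined separately)
}

// Illustrative instance reusing the OrderPlaced payload shape
const cloudEvent: CloudEvent<{ orderId: string; totalAmountCents: number }> = {
  specversion: "1.0",
  id: "evt_a1b2c3d4",
  source: "/services/order-service",
  type: "com.example.orders.OrderPlaced",
  time: "2025-01-08T10:30:00Z",
  datacontenttype: "application/json",
  data: { orderId: "ord_123456", totalAmountCents: 5998 },
};
```

Note that CloudEvents standardizes only the envelope; the schema of `data` is still yours to define and version.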
Schemas will change. Business requirements evolve, bugs are found, new data is needed. The challenge is evolving schemas without breaking producers or consumers.
Compatibility Types:
| Type | Definition | When Needed |
|---|---|---|
| Backward Compatible | New schema can read data written with old schema | Consumers upgrade before producers |
| Forward Compatible | Old schema can read data written with new schema | Producers upgrade before consumers |
| Full Compatible | Both backward and forward compatible | Independent producer/consumer upgrades |
| Breaking (Incompatible) | Old and new schemas cannot interoperate | Requires coordinated migration |
For event-driven systems, full compatibility is ideal — it allows producers and consumers to upgrade independently without coordination.
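As a sketch of what full compatibility looks like in practice, the hypothetical v1/v2 types below show the canonical fully compatible change: adding an optional field with a default, so old consumers ignore it and new consumers tolerate its absence.

```typescript
// Fully compatible evolution sketch: v2 adds an optional field.
interface OrderPlacedV1 { orderId: string; totalCents: number }
interface OrderPlacedV2 extends OrderPlacedV1 { currency?: string } // new in v2

// Forward compatibility: an old consumer receives a v2 event and
// simply ignores the field it doesn't know about.
function oldConsumer(event: OrderPlacedV1): number {
  return event.totalCents;
}

// Backward compatibility: a new consumer receives a v1 event and
// applies the documented default for the missing field.
function newConsumer(event: OrderPlacedV2): string {
  return event.currency ?? "USD"; // default preserves v1 semantics
}

const v1Event = { orderId: "ord_1", totalCents: 5998 };
const v2Event = { orderId: "ord_2", totalCents: 1250, currency: "EUR" };
```

Because both directions work, producers and consumers can deploy this change in any order.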
The most dangerous breaking changes are semantic — when a field's meaning changes but its name and type don't. For example, changing 'amount' from dollars to cents, or 'timestamp' from UTC to local time. These won't fail schema validation but will cause silent data corruption. Document semantics clearly and version-bump for any semantic change.
When breaking changes are necessary, several strategies help manage the transition.
Strategy 1: Version in Event Type
Create a new event type for breaking changes: OrderPlaced becomes OrderPlacedV2. Both can coexist.
```typescript
// Parallel event types
// Old producers → OrderPlaced (v1)
// New producers → OrderPlacedV2

// Strategy:
// 1. Deploy consumers that handle both
// 2. Migrate producers to V2
// 3. Eventually deprecate V1

class OrderConsumer {
  async handleOrderPlaced(event: OrderPlacedV1): Promise<void> {
    // Handle v1 format (for old events)
    await this.processOrder(this.convertV1ToInternal(event));
  }

  async handleOrderPlacedV2(event: OrderPlacedV2): Promise<void> {
    // Handle v2 format (new events)
    await this.processOrder(this.convertV2ToInternal(event));
  }

  private convertV1ToInternal(v1: OrderPlacedV1): InternalOrder {
    return {
      orderId: v1.orderId,
      // V1 had amount in dollars, convert to cents
      totalCents: Math.round(v1.totalAmount * 100),
      // V1 didn't have currency, default to USD
      currency: 'USD',
      // ... other mappings
    };
  }
}
```

Strategy 2: Schema Version in Envelope
The event envelope carries a version number. Consumers dispatch based on version.
```typescript
// Envelope includes version
interface EventEnvelope<T> {
  eventType: string;     // e.g., "OrderPlaced"
  eventVersion: string;  // e.g., "2.0"
  data: T;
}

// Consumer dispatches by version
class VersionedConsumer {
  async handleEvent(envelope: EventEnvelope<unknown>): Promise<void> {
    const handler = this.getHandler(
      envelope.eventType,
      envelope.eventVersion
    );

    if (!handler) {
      console.warn(
        `No handler for ${envelope.eventType} v${envelope.eventVersion}`
      );
      return; // Log and skip unknown versions
    }

    await handler.handle(envelope.data);
  }

  private getHandler(type: string, version: string): EventHandler | null {
    const handlers: Record<string, EventHandler> = {
      'OrderPlaced:1.0': new OrderPlacedV1Handler(),
      'OrderPlaced:1.1': new OrderPlacedV1Handler(), // Minor compatible
      'OrderPlaced:2.0': new OrderPlacedV2Handler(), // Breaking change
    };

    // Try exact match first, then major version match
    return handlers[`${type}:${version}`]
      ?? handlers[`${type}:${this.majorVersion(version)}.0`]
      ?? null;
  }

  // Extract the major component, e.g. "1.2" → "1"
  private majorVersion(version: string): string {
    return version.split('.')[0];
  }
}
```

Strategy 3: Dual-Write During Migration
Produce both old and new versions simultaneously during migration. Consumers can transition at their own pace.
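A dual-write producer might look like the following sketch; the publish function and topic names are hypothetical, and the v1/v2 payload shapes reuse the dollars-versus-cents example from above.

```typescript
// Dual-write sketch: during migration the producer emits both versions
// so v1 and v2 consumers can transition independently.
type Publish = (topic: string, payload: unknown) => Promise<void>;

async function publishOrderPlaced(
  publish: Publish,
  order: { orderId: string; totalCents: number; currency: string }
): Promise<void> {
  // v1 consumers still expect an amount in dollars and no currency field
  await publish("order-placed", {
    orderId: order.orderId,
    totalAmount: order.totalCents / 100,
  });
  // v2 consumers get the new shape
  await publish("order-placed-v2", {
    orderId: order.orderId,
    totalAmountCents: order.totalCents,
    currency: order.currency,
  });
}
```

Dual-writing doubles publish volume and requires keeping the two payloads consistent, so retire the old version as soon as consumption metrics show no remaining v1 consumers.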
The best evolution strategy is to avoid breaking changes entirely. Design schemas with room to grow: use optional fields with defaults, make enums extensible (reserve an 'UNKNOWN' or 'OTHER' value), and never remove or rename fields—deprecate them instead (document as deprecated, stop producing, eventually stop reading).
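One way to keep an enum extensible on the consumer side is to collapse unrecognized values into the reserved fallback, so a producer adding a new value later is not a breaking change. A sketch with a hypothetical payment-method enum:

```typescript
// Extensible enum sketch: unknown values collapse to "OTHER" so that a
// producer adding a new payment method later doesn't break this consumer.
const KNOWN_METHODS = ["CARD", "PAYPAL", "BANK_TRANSFER", "OTHER"] as const;
type PaymentMethod = (typeof KNOWN_METHODS)[number];

function parsePaymentMethod(raw: string): PaymentMethod {
  return (KNOWN_METHODS as readonly string[]).includes(raw)
    ? (raw as PaymentMethod)
    : "OTHER"; // tolerant reader: new producer values map to the fallback
}
```

The same tolerant-reader stance applies to unknown fields: ignore them rather than rejecting the event.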
A schema registry is a centralized service that stores and manages event schemas. It serves as the source of truth for what schemas exist and validates compatibility during evolution.
How It Works (Confluent Schema Registry Example):
Producer publishes first event: the producer's serializer checks the registry for the event's schema, registers it if new, and receives a schema ID that is embedded in every serialized message.
Consumer receives event: the consumer's deserializer reads the schema ID from the message, fetches the corresponding schema from the registry (caching it locally), and uses it to deserialize the payload.
Producer updates schema: the registry validates the new version against the configured compatibility mode (backward, forward, or full) and rejects incompatible changes at registration time, before any bad events reach the stream.
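Under the hood, Confluent's serializers frame each message as a magic byte, a 4-byte big-endian schema ID, and then the serialized payload. A simplified sketch of that framing (registry calls and Avro encoding omitted):

```typescript
// Simplified sketch of the Confluent wire format:
// [magic byte 0][4-byte big-endian schema ID][serialized payload].
// Real serializers also fetch/register schemas and encode the payload.
function frameMessage(schemaId: number, payload: Uint8Array): Uint8Array {
  const framed = new Uint8Array(5 + payload.length);
  const view = new DataView(framed.buffer);
  framed[0] = 0;                      // magic byte
  view.setUint32(1, schemaId, false); // schema ID, big-endian
  framed.set(payload, 5);             // payload follows the 5-byte header
  return framed;
}

function readSchemaId(message: Uint8Array): number {
  if (message[0] !== 0) throw new Error("Unknown magic byte");
  return new DataView(message.buffer, message.byteOffset).getUint32(1, false);
}
```

The 5-byte header is why a consumer can deserialize any message it receives: the schema ID is always right there, and the schema itself is one (cached) registry lookup away.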
| Product | Schema Formats | Notable Features |
|---|---|---|
| Confluent Schema Registry | Avro, Protobuf, JSON Schema | Kafka-native, compatibility modes, reference resolution |
| AWS Glue Schema Registry | Avro, JSON Schema | AWS-integrated, compression, works with MSK |
| Apicurio Registry | Avro, Protobuf, JSON Schema, more | Open source, CNCF, multiple storage backends |
| Azure Schema Registry | Avro | Azure Event Hubs integration |
If the schema registry goes down, producers cannot serialize and consumers cannot deserialize (until cache expires). Treat schema registry with the same high-availability requirements as your message broker: replicated, backed up, and monitored.
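Client-side caching is what makes a brief registry outage survivable at all. A sketch of a caching lookup, with fetchSchema standing in for a hypothetical registry call:

```typescript
// Caching schema lookup sketch: once a schema ID has been resolved, the
// cached copy is reused even if the registry is temporarily unreachable.
type SchemaFetcher = (id: number) => Promise<string>;

class CachingSchemaClient {
  private cache = new Map<number, string>();

  constructor(private fetchSchema: SchemaFetcher) {}

  async getSchema(id: number): Promise<string> {
    const cached = this.cache.get(id);
    if (cached !== undefined) return cached; // no registry round-trip
    const schema = await this.fetchSchema(id); // fails if registry is down
    this.cache.set(id, schema);
    return schema;
  }
}
```

The flip side is the failure mode described above: the first message carrying an uncached schema ID cannot be processed until the registry is reachable again.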
Learning from others' mistakes helps you avoid common schema design problems.
- Blob payloads: `data: any` or `payload: string` (containing JSON) defeats schema validation and evolution tracking. Be explicit about structure.
- Missing versioning: include an `eventVersion` from day one.
- Overloaded generic events: don't route many distinct meanings through a single `type` field. Create separate event types instead.
```typescript
// ❌ ANTI-PATTERN: Generic event with blob payload
interface GenericEvent {
  eventType: string;
  timestamp: string;
  payload: string; // JSON-encoded, no schema!
}

// Problems:
// - No compile-time safety
// - No schema validation
// - payload can be literally anything
// - Consumers must defensive-parse everything
// - Evolution is invisible and dangerous

// ✅ CORRECT: Typed events with explicit schemas
interface OrderPlacedEvent {
  eventType: 'OrderPlaced';
  eventVersion: '1.0';
  timestamp: string;
  data: {
    orderId: string;
    customerId: string;
    items: OrderItem[];
    totalCents: number;
  };
}

// Type-safe, validatable, evolvable
```

Treat schema changes like API changes: they deserve review. Establish a process where schema modifications are reviewed for compatibility, naming conventions, and design quality before registration. Automated compatibility checks help, but human review catches semantic and design issues.
Let's design a complete, production-quality schema for a user registration event, demonstrating best practices.
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://schemas.example.com/events/user/registered/v1.json",
  "title": "UserRegistered",
  "description": "Emitted when a new user account is successfully created",
  "type": "object",
  "required": [
    "eventId", "eventType", "eventVersion", "timestamp",
    "source", "correlationId", "data"
  ],
  "properties": {
    "eventId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique identifier for this event instance"
    },
    "eventType": {
      "const": "UserRegistered",
      "description": "Event type discriminator"
    },
    "eventVersion": {
      "type": "string",
      "pattern": "^\\d+\\.\\d+$",
      "description": "Schema version (semver without patch)",
      "examples": ["1.0", "1.1", "2.0"]
    },
    "timestamp": {
      "type": "string",
      "format": "date-time",
      "description": "When registration occurred (ISO 8601 UTC)"
    },
    "source": {
      "type": "string",
      "description": "Service that produced this event",
      "examples": ["user-service", "registration-api"]
    },
    "correlationId": {
      "type": "string",
      "description": "Request correlation ID for distributed tracing"
    },
    "causationId": {
      "type": "string",
      "description": "ID of the event that triggered this one (optional)"
    },
    "data": {
      "type": "object",
      "description": "Event-specific payload",
      "required": ["userId", "email", "registrationMethod"],
      "properties": {
        "userId": {
          "type": "string",
          "description": "Unique user identifier",
          "examples": ["usr_a1b2c3d4"]
        },
        "email": {
          "type": "string",
          "format": "email",
          "description": "User's email address"
        },
        "emailVerified": {
          "type": "boolean",
          "default": false,
          "description": "Whether email was verified at registration"
        },
        "displayName": {
          "type": "string",
          "description": "User's chosen display name (optional)"
        },
        "registrationMethod": {
          "type": "string",
          "enum": ["EMAIL_PASSWORD", "GOOGLE_OAUTH", "GITHUB_OAUTH", "SSO", "OTHER"],
          "description": "How the user registered"
        },
        "referralCode": {
          "type": "string",
          "description": "Referral code used during registration (optional)"
        },
        "marketingOptIn": {
          "type": "boolean",
          "default": false,
          "description": "User consented to marketing communications"
        },
        "locale": {
          "type": "string",
          "pattern": "^[a-z]{2}(-[A-Z]{2})?$",
          "default": "en-US",
          "description": "User's preferred locale"
        },
        "timezone": {
          "type": "string",
          "description": "User's timezone (IANA format)",
          "examples": ["America/New_York", "Europe/London"]
        }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
}
```

Design Decisions Explained:
Event schemas are fundamental infrastructure for event-driven systems: they are the contracts that let independently built producers and consumers interoperate, and disciplined evolution is what keeps those contracts trustworthy over time.
Module Complete:
Congratulations! You have completed the Event-Driven Fundamentals module. You now understand what event-driven architecture is, the distinction between events and commands, how producers and consumers work, and how to design and evolve event schemas.
This foundational knowledge prepares you for the advanced topics ahead: event sourcing, CQRS, choreography vs orchestration, and the practical challenges of event-driven systems.