In the previous page, we explored the challenges of events crossing system boundaries. At the heart of every cross-boundary event is a fundamental transformation: converting in-memory objects into bytes that can travel across networks, be stored in message queues, and be reconstructed by receivers potentially written in different programming languages.
This process—serialization (objects to bytes) and deserialization (bytes to objects)—is deceptively complex. The choice of serialization format affects performance, payload size, cross-language compatibility, debuggability, and how easily your schemas can evolve over time.
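Stripped to its essence, the round trip is just two conversions; a minimal JSON sketch:

```typescript
// Serialize: in-memory object → bytes that can cross a network boundary
const event = { eventType: "order.placed", orderId: "ORD-1" };
const bytes = new TextEncoder().encode(JSON.stringify(event));

// Deserialize: bytes → reconstructed object, possibly in another language/runtime
const restored = JSON.parse(new TextDecoder().decode(bytes));
console.log(restored.orderId); // "ORD-1"
```

Everything that follows on this page is about the consequences of how exactly those two conversions are performed.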
By the end of this page, you will understand: the major serialization formats and their trade-offs (JSON, Avro, Protobuf, MessagePack), how to design events for schema evolution, the role of schema registries, and practical strategies for serialization in real systems.
Serialization is often treated as an afterthought—just use JSON, right? But in high-throughput event-driven systems, serialization can become a performance bottleneck, a dominant storage and bandwidth cost, and a source of hard-to-diagnose compatibility failures.
Let's examine why serialization deserves careful thought:
| Area | Poor Serialization Choice | Good Serialization Choice |
|---|---|---|
| Performance | 100μs to serialize one event; system can handle 10K events/sec | 5μs per event; same hardware handles 200K events/sec |
| Size | Average 2KB per event; 1B events = 2TB storage/day | Average 200 bytes per event; 1B events = 200GB storage/day |
| Compatibility | New field breaks all consumers; urgent hotfix required | New field ignored by old consumers; gradual migration possible |
| Debuggability | Binary blob; need special tools to inspect | Human-readable; can debug with curl and grep |
| Interoperability | Works only with Java; Python team can't consume events | Works with all languages; any team can consume |
Serialization costs multiply by volume. A 10% reduction in serialized event size that saves 100 bytes might seem trivial. At 1 million events per hour, that's 100MB/hour saved—2.4GB/day, 73GB/month. Over a year, you save 876GB of storage, bandwidth, and processing. Small optimizations compound.
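The arithmetic above, as a back-of-envelope check (constants taken from the example):

```typescript
// 100 bytes saved per event, 1 million events per hour
const bytesSavedPerEvent = 100;
const eventsPerHour = 1_000_000;

const savedPerHourMB = (bytesSavedPerEvent * eventsPerHour) / 1_000_000;
const bytesPerYear = bytesSavedPerEvent * eventsPerHour * 24 * 365;
const savedPerYearGB = bytesPerYear / 1_000_000_000;

console.log(savedPerHourMB); // 100 (MB/hour)
console.log(savedPerYearGB); // 876 (GB/year)
```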
Several serialization formats dominate the event-driven ecosystem. Each makes different trade-offs between human readability, size efficiency, schema enforcement, and performance.
The Major Contenders:
| Format | Type | Schema | Size | Speed | Readability | Best For |
|---|---|---|---|---|---|---|
| JSON | Text | Optional (JSON Schema) | Large | Slow | Excellent | HTTP APIs, debugging, low volume |
| MessagePack | Binary | Optional | Medium | Fast | Poor | JSON replacement needing speed |
| Avro | Binary | Required | Small | Fast | Poor | Kafka, schema evolution |
| Protocol Buffers | Binary | Required | Smallest | Fastest | Poor | gRPC, high-performance systems |
| Thrift | Binary | Required | Small | Fast | Poor | Legacy Facebook ecosystem |
| XML | Text | Optional (XSD) | Very Large | Very Slow | Good | Enterprise integration, SOAP |
Understanding the Trade-offs:
No format is universally best. The choice depends on your constraints: required throughput, payload size budget, the languages your teams use, how often schemas change, and how much you value human-readable debugging.
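One trade-off worth internalizing is why binary formats are smaller: text formats repeat field names in every message, while binary layouts carry only values. A hand-rolled illustration (not a real wire format, purely for size comparison):

```typescript
const record = { orderId: "ORD-12345", quantity: 2, totalCents: 15999 };

// JSON carries every field name in every message
const jsonBytes = new TextEncoder().encode(JSON.stringify(record));

// Hypothetical binary layout: 1-byte id length + id bytes,
// 4-byte quantity, 8-byte total — no field names at all
const idBytes = new TextEncoder().encode(record.orderId);
const view = new DataView(new ArrayBuffer(1 + idBytes.length + 4 + 8));
view.setUint8(0, idBytes.length);
new Uint8Array(view.buffer).set(idBytes, 1);
view.setUint32(1 + idBytes.length, record.quantity);
view.setFloat64(1 + idBytes.length + 4, record.totalCents);

console.log(jsonBytes.length, view.byteLength); // binary is far smaller
```

Real formats like Protobuf and Avro apply the same idea systematically, with varints and schemas doing the bookkeeping.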
JSON (JavaScript Object Notation) is the most widely-used serialization format for events, especially in web-centric systems. Its ubiquity stems from excellent tooling, native browser support, and human readability.
JSON Advantages:
- Human-readable: events can be inspected with nothing more than curl, grep, or a text editor
- Ubiquitous: every major language has a mature JSON library; browsers parse it natively
- Schema-optional: quick to prototype with; JSON Schema can be layered on when validation is needed
- Rich tooling: formatters, validators, and query tools like jq are everywhere

JSON Disadvantages:
- Verbose: field names are repeated in every message, inflating payload size
- Slow: text parsing and string allocation cost far more CPU than binary decoding
- No enforced schema: typos and type mismatches surface at runtime in consumers, not at build time
- Weak number semantics: no integer/float distinction, and integers above 2^53 lose precision in JavaScript runtimes
- No native binary type: byte arrays must be Base64-encoded, adding roughly 33% overhead
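The number-precision pitfall is worth seeing concretely (a minimal sketch; `accountId` is a hypothetical field):

```typescript
// JSON numbers become IEEE-754 doubles in JavaScript runtimes:
// integers above 2^53 silently lose precision.
const raw = '{"accountId": 9007199254740993}'; // 2^53 + 1
const parsed = JSON.parse(raw);
console.log(parsed.accountId); // 9007199254740992 (precision silently lost)

// Safe alternative: transmit large IDs as strings
const safe = JSON.parse('{"accountId": "9007199254740993"}');
console.log(safe.accountId); // "9007199254740993"
```

This is why the event definitions below use string IDs and integer cents rather than floats.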
```typescript
// Minimal error type used by the serializer below
class SerializationError extends Error {}

// Event definition with clear serialization in mind
interface OrderPlacedEvent {
  eventType: "order.placed";
  version: 1;
  eventId: string;
  occurredAt: string; // ISO 8601 string, not Date
  payload: {
    orderId: string;
    customerId: string;
    totalInCents: number; // Integers, not floats
    currency: string;
    items: Array<{
      productId: string;
      quantity: number;
    }>;
  };
}

// Type-safe serializer
class JsonEventSerializer {
  serialize(event: OrderPlacedEvent): string {
    return JSON.stringify(event);
  }

  deserialize(json: string): OrderPlacedEvent {
    const parsed = JSON.parse(json);
    // Validate at runtime since JSON has no schema
    this.validate(parsed);
    return parsed as OrderPlacedEvent;
  }

  private validate(data: unknown): void {
    if (typeof data !== "object" || data === null) {
      throw new SerializationError("Event must be an object");
    }
    const event = data as Record<string, unknown>;
    if (event.eventType !== "order.placed") {
      throw new SerializationError(
        `Unknown event type: ${event.eventType}`
      );
    }
    if (typeof event.version !== "number") {
      throw new SerializationError("Missing or invalid version");
    }
    // Additional validation...
  }
}

// Serialized output (formatted for readability)
/*
{
  "eventType": "order.placed",
  "version": 1,
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "occurredAt": "2024-01-15T14:30:00.000Z",
  "payload": {
    "orderId": "ORD-12345",
    "customerId": "CUST-67890",
    "totalInCents": 15999,
    "currency": "USD",
    "items": [
      { "productId": "PROD-111", "quantity": 2 },
      { "productId": "PROD-222", "quantity": 1 }
    ]
  }
}
*/
```

If you're stuck with JSON but need smaller sizes: 1) Use short field names (but document thoroughly), 2) Omit null/empty fields, 3) Use integers for timestamps (Unix epoch) instead of ISO strings, 4) Consider GZIP compression for large events. These can reduce JSON size by 40-60%.
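Tips 3 and 4 above can be sketched as follows, assuming a Node.js runtime (the payload is hypothetical):

```typescript
import { gzipSync } from "node:zlib";

// Epoch timestamp instead of an ISO string, plus GZIP for a
// repetitive payload where compression pays off
const event = {
  eventType: "order.placed",
  occurredAt: 1705329000000, // Unix epoch ms, shorter than "2024-01-15T14:30:00.000Z"
  items: Array.from({ length: 50 }, (_, i) => ({
    productId: `PROD-${1000 + i}`,
    quantity: 1,
  })),
};

const json = JSON.stringify(event);
const gzipped = gzipSync(Buffer.from(json));

console.log(`raw JSON: ${json.length} bytes, gzipped: ${gzipped.length} bytes`);
```

Note that GZIP adds header overhead, so it only helps for payloads large or repetitive enough to compress well; measure before enabling it everywhere.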
Protocol Buffers (Protobuf), created by Google, is a binary serialization format designed for maximum performance and minimal size. It requires a schema definition (.proto file) that's compiled into language-specific code.
Protobuf Advantages:
- Compact: numeric field tags and varint encoding produce very small payloads
- Fast: generated code serializes and deserializes with minimal overhead
- Strongly typed: the schema is enforced at compile time in every supported language
- Wide language support: official code generators for C++, Java, Python, Go, C#, and more

Protobuf Disadvantages:
- Not human-readable: inspecting messages requires the schema and tooling
- Build-step dependency: .proto files must be compiled before use
- Field numbers are forever: once assigned, they can never be changed or reused
- Not self-describing: a consumer must already have the schema to decode a message
```protobuf
// order_events.proto
syntax = "proto3";

package events;

option java_package = "com.example.events";
option go_package = "example.com/events";

// Metadata included in all events
message EventMetadata {
  string event_id = 1;
  string correlation_id = 2;
  string source = 3;
  int64 timestamp_ms = 4; // Unix timestamp in milliseconds
}

// Order placed event
message OrderPlacedEvent {
  EventMetadata metadata = 1;

  // Payload fields
  string order_id = 2;
  string customer_id = 3;
  int64 total_cents = 4; // Use int64 for money, not float
  string currency = 5;
  repeated OrderItem items = 6;
  ShippingAddress shipping = 7;
}

message OrderItem {
  string product_id = 1;
  int32 quantity = 2;
  int64 unit_price_cents = 3;
  string sku = 4;
}

message ShippingAddress {
  string line1 = 1;
  string line2 = 2; // Optional - empty string if not provided
  string city = 3;
  string region = 4;
  string postal_code = 5;
  string country = 6;
}

// Enum for event types (useful for routing)
enum EventType {
  EVENT_TYPE_UNSPECIFIED = 0;
  ORDER_PLACED = 1;
  ORDER_CONFIRMED = 2;
  ORDER_SHIPPED = 3;
  ORDER_DELIVERED = 4;
  ORDER_CANCELLED = 5;
}
```
```typescript
// Generated code from protoc compiler
import { OrderPlacedEvent, EventMetadata, OrderItem } from "./generated/order_events_pb";

class ProtobufEventSerializer {
  serialize(event: OrderPlacedEvent): Uint8Array {
    // Protobuf serializes to binary directly
    return event.serializeBinary();
  }

  deserialize(bytes: Uint8Array): OrderPlacedEvent {
    // Type-safe deserialization with validation built-in
    return OrderPlacedEvent.deserializeBinary(bytes);
  }
}

// Creating an event with Protobuf
function createOrderPlacedEvent(order: Order): OrderPlacedEvent {
  const metadata = new EventMetadata();
  metadata.setEventId(generateUuid());
  metadata.setCorrelationId(getCurrentCorrelationId());
  metadata.setSource("order-service");
  metadata.setTimestampMs(Date.now());

  const event = new OrderPlacedEvent();
  event.setMetadata(metadata);
  event.setOrderId(order.id);
  event.setCustomerId(order.customerId);
  event.setTotalCents(order.totalCents);
  event.setCurrency(order.currency);

  for (const item of order.items) {
    const protoItem = new OrderItem();
    protoItem.setProductId(item.productId);
    protoItem.setQuantity(item.quantity);
    protoItem.setUnitPriceCents(item.unitPriceCents);
    event.addItems(protoItem);
  }

  return event;
}

// Size comparison
function compareSerializationSizes(order: Order): void {
  const protoEvent = createOrderPlacedEvent(order);
  const jsonEvent = createJsonOrderEvent(order);

  const protoBytes = protoEvent.serializeBinary();
  const jsonBytes = new TextEncoder().encode(JSON.stringify(jsonEvent));

  console.log(`Protobuf size: ${protoBytes.length} bytes`);
  console.log(`JSON size: ${jsonBytes.length} bytes`);
  console.log(`Protobuf is ${(1 - protoBytes.length / jsonBytes.length) * 100}% smaller`);

  // Typical output:
  // Protobuf size: 142 bytes
  // JSON size: 385 bytes
  // Protobuf is 63% smaller
}
```

In Protobuf, each field has a number (e.g., `string order_id = 2`). These numbers are encoded into the binary format. Once assigned, a field number can never be changed or reused. Plan your field numbers carefully and reserve ranges for future use.
Apache Avro is a binary serialization format that shines in data-intensive systems, particularly with Apache Kafka. Unlike Protobuf, Avro stores schema information separately from data, enabling powerful schema evolution capabilities.
Avro Advantages:
- Powerful schema evolution: writer and reader schemas are resolved at read time, with defaults filling gaps
- Very compact: payloads carry no field names or tags; the schema supplies the structure
- First-class Kafka support through the Confluent Schema Registry ecosystem
- Schemas are plain JSON: easy to generate, version, and store

Avro Disadvantages:
- Not human-readable: the bytes are meaningless without the writer's schema
- Requires schema infrastructure: a schema registry becomes critical-path for producers and consumers
- Runtime schema resolution adds operational and debugging complexity
- Code generation support is uneven across languages compared to Protobuf
```json
{
  "namespace": "com.example.events",
  "type": "record",
  "name": "OrderPlacedEvent",
  "doc": "Event published when a customer places an order",
  "fields": [
    {
      "name": "eventId",
      "type": "string",
      "doc": "Unique event identifier (UUID)"
    },
    {
      "name": "version",
      "type": "int",
      "default": 1,
      "doc": "Schema version"
    },
    {
      "name": "occurredAt",
      "type": { "type": "long", "logicalType": "timestamp-millis" },
      "doc": "When the event occurred (Unix timestamp ms)"
    },
    { "name": "orderId", "type": "string" },
    { "name": "customerId", "type": "string" },
    {
      "name": "totalCents",
      "type": "long",
      "doc": "Total order amount in cents"
    },
    {
      "name": "currency",
      "type": {
        "type": "enum",
        "name": "Currency",
        "symbols": ["USD", "EUR", "GBP", "CAD", "AUD"]
      }
    },
    {
      "name": "items",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderItem",
          "fields": [
            { "name": "productId", "type": "string" },
            { "name": "quantity", "type": "int" },
            { "name": "unitPriceCents", "type": "long" }
          ]
        }
      }
    },
    {
      "name": "notes",
      "type": ["null", "string"],
      "default": null,
      "doc": "Optional order notes"
    }
  ]
}
```

Note that the `logicalType` annotation belongs inside the type object (`{"type": "long", "logicalType": "timestamp-millis"}`); Avro ignores it when placed at the field level.
```typescript
import { SchemaRegistry, SchemaType } from "@kafkajs/confluent-schema-registry";
import { Kafka } from "kafkajs";

// Schema Registry client
const registry = new SchemaRegistry({
  host: "http://schema-registry:8081",
});

// Kafka client
const kafka = new Kafka({
  brokers: ["kafka:9092"],
});

const producer = kafka.producer();

// Schema ID obtained from registerEventSchema() at startup
let schemaId: number;

// Register schema and get schema ID
async function registerEventSchema(): Promise<number> {
  const schema = await loadSchema("./schemas/order-placed-v1.avsc");

  const { id } = await registry.register({
    type: SchemaType.AVRO,
    schema: JSON.stringify(schema),
  }, {
    subject: "order.placed-value", // Kafka convention
  });

  console.log(`Registered schema with ID: ${id}`);
  schemaId = id;
  return id;
}

// Publish event with automatic Avro serialization
async function publishOrderPlaced(order: Order): Promise<void> {
  const event = {
    eventId: generateUuid(),
    version: 1,
    occurredAt: Date.now(),
    orderId: order.id,
    customerId: order.customerId,
    totalCents: order.totalCents,
    currency: order.currency,
    items: order.items.map(item => ({
      productId: item.productId,
      quantity: item.quantity,
      unitPriceCents: item.unitPriceCents,
    })),
    notes: order.notes ?? null, // Avro union type
  };

  // Schema Registry encodes with schema ID prefix
  const encodedValue = await registry.encode(schemaId, event);

  await producer.send({
    topic: "order-events",
    messages: [{
      key: order.id,
      value: encodedValue,
      headers: {
        "event-type": "order.placed",
      },
    }],
  });
}

// Consumer with automatic Avro deserialization
async function consumeEvents(): Promise<void> {
  const consumer = kafka.consumer({ groupId: "inventory-service" });
  await consumer.connect();
  await consumer.subscribe({ topic: "order-events" });

  await consumer.run({
    eachMessage: async ({ message }) => {
      // Schema Registry decodes using embedded schema ID
      const event = await registry.decode(message.value);

      // event is now a typed JavaScript object
      console.log(`Received order: ${event.orderId}`);

      // Handle based on event type
      const eventType = message.headers?.["event-type"]?.toString();
      await routeToHandler(eventType, event);
    },
  });
}
```

In Avro systems, the Schema Registry is critical infrastructure. It stores all schema versions, enforces compatibility rules, and provides the lookup mechanism for decoding events. Treat it as you would a database—with backups, monitoring, and high availability.
One of the most critical aspects of event serialization is schema evolution—the ability to change event structures over time without breaking existing consumers. Systems evolve, requirements change, and events must change with them.
The Problem:

Suppose a producer adds, renames, or removes a field. Every consumer that deserializes the old structure is now at risk: strict deserializers reject the unfamiliar shape, and code that assumes a field exists fails at runtime. Deploying every producer and consumer simultaneously is rarely feasible, so events with old and new structures inevitably coexist.
The Solution: Compatibility Modes
Schema registries enforce compatibility rules that ensure safe evolution:
| Mode | Description | Safe Changes | Deployment Order |
|---|---|---|---|
| BACKWARD | New schema can read old data | Add optional fields, remove fields | Upgrade consumers first, then producers |
| FORWARD | Old schema can read new data | Remove optional fields, add fields | Upgrade producers first, then consumers |
| FULL | Both backward and forward compatible | Add/remove optional fields only | Any order (safest but most restrictive) |
| NONE | No compatibility enforced | Any change allowed | Risky; requires careful coordination |
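The BACKWARD rule can be illustrated with a toy checker (a deliberate simplification for intuition, not the registry's actual resolution algorithm): a new reader schema can decode old data only if every field it requires either existed before or has a default.

```typescript
interface FieldSpec {
  name: string;
  hasDefault: boolean;
}

// Simplified BACKWARD check: can a reader using newFields decode
// data written with oldFields?
function isBackwardCompatible(oldFields: FieldSpec[], newFields: FieldSpec[]): boolean {
  const oldNames = new Set(oldFields.map(f => f.name));
  return newFields.every(f => oldNames.has(f.name) || f.hasDefault);
}

const v1 = [{ name: "orderId", hasDefault: false }];
const v2 = [...v1, { name: "shippingMethod", hasDefault: true }];  // added with default
const v3 = [...v1, { name: "shippingMethod", hasDefault: false }]; // added, required

console.log(isBackwardCompatible(v1, v2)); // true
console.log(isBackwardCompatible(v1, v3)); // false
```

Real registries additionally check type promotions, aliases, and union branches, but the core intuition is the same.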
```typescript
// Version 1: Original schema
interface OrderPlacedV1 {
  eventId: string;
  orderId: string;
  customerId: string;
  totalCents: number;
  items: OrderItem[];
}

// Version 2: Adding an optional field (BACKWARD compatible)
interface OrderPlacedV2 {
  eventId: string;
  orderId: string;
  customerId: string;
  totalCents: number;
  items: OrderItem[];
  // New optional field with default
  shippingMethod?: "STANDARD" | "EXPRESS" | "OVERNIGHT";
}

// Version 3: Adding a required field (BREAKING CHANGE!)
// This is NOT backward compatible - old data doesn't have this field
interface OrderPlacedV3Bad {
  eventId: string;
  orderId: string;
  customerId: string;
  totalCents: number;
  items: OrderItem[];
  shippingMethod: "STANDARD" | "EXPRESS" | "OVERNIGHT"; // REQUIRED!
}

// Backward-compatible consumer: handles both V1 and V2
class RobustOrderHandler {
  handleOrderPlaced(event: Record<string, unknown>): void {
    // Required fields that must exist
    const orderId = event.orderId as string;
    const customerId = event.customerId as string;
    const totalCents = event.totalCents as number;

    // Optional field that may not exist (V1 events)
    const shippingMethod = (event.shippingMethod as string) ?? "STANDARD";

    // Process with default fallback
    this.processOrder({
      orderId,
      customerId,
      totalCents,
      shippingMethod,
    });
  }
}

// Forward-compatible producer: includes all fields
class RobustOrderPublisher {
  publishOrderPlaced(order: Order): OrderPlacedV2 {
    return {
      eventId: generateUuid(),
      orderId: order.id,
      customerId: order.customerId,
      totalCents: order.totalCents,
      items: order.items,
      // Always include optional fields when available
      shippingMethod: order.shippingMethod ?? undefined,
    };
  }
}
```

Sometimes breaking changes are unavoidable. When they are: 1) Create a new event type (OrderPlacedV3) instead of modifying the existing one, 2) Run both event types in parallel during migration, 3) Migrate all consumers to the new type, 4) Deprecate and eventually remove the old type. This takes weeks, not hours.
In high-throughput systems, serialization performance directly impacts system capacity. Understanding the performance characteristics of different formats helps you make informed decisions.
Key Performance Metrics:
| Format | Serialize Speed | Deserialize Speed | Size | Memory Overhead |
|---|---|---|---|---|
| JSON | 1x (baseline) | 1x (baseline) | 1x (largest) | High (string allocations) |
| MessagePack | 3x faster | 2x faster | 0.7x | Medium |
| Avro | 5x faster | 5x faster | 0.4x | Low (reuses buffers) |
| Protobuf | 8x faster | 10x faster | 0.35x | Low (generated code is optimized) |
| FlatBuffers | 50x faster* | 100x faster* | 0.5x | Zero-copy possible |
*FlatBuffers achieves extreme speed by enabling zero-copy access—reading data directly from the serialized buffer without full deserialization. However, it's more complex to use correctly.
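Rather than trusting published ratios, measure your own events. A quick micro-benchmark harness (a hypothetical sketch; absolute numbers vary by runtime and payload shape):

```typescript
// Naive throughput measurement: run a serializer in a tight loop.
// Real benchmarks should add warm-up runs and multiple samples.
function benchmark(label: string, fn: () => void, iterations = 100_000): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ${Math.round((iterations / elapsedMs) * 1000)} ops/sec`);
  return elapsedMs;
}

const event = {
  eventType: "order.placed",
  orderId: "ORD-12345",
  totalCents: 15999,
  items: [{ productId: "PROD-111", quantity: 2 }],
};

const serializeMs = benchmark("JSON.stringify", () => { JSON.stringify(event); });
const json = JSON.stringify(event);
const deserializeMs = benchmark("JSON.parse", () => { JSON.parse(json); });
```

Swap the loop body for your Protobuf or Avro serializer to compare formats on your actual event shapes.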
Optimization Strategies:
```typescript
// TECHNIQUE 1: Object pooling to reduce allocations
class EventSerializerPool {
  private readonly pool: OrderPlacedEvent[] = [];

  acquire(): OrderPlacedEvent {
    return this.pool.pop() ?? new OrderPlacedEvent();
  }

  release(event: OrderPlacedEvent): void {
    event.clear(); // Reset all fields
    this.pool.push(event);
  }

  serialize(order: Order): Uint8Array {
    const event = this.acquire();
    try {
      this.populateEvent(event, order);
      return event.serializeBinary();
    } finally {
      this.release(event);
    }
  }
}

// TECHNIQUE 2: Buffer reuse for output
class BufferedSerializer {
  private buffer: Uint8Array = new Uint8Array(4096);

  serialize(event: OrderPlacedEvent): Uint8Array {
    const size = event.getSerializedSize();

    // Grow buffer if needed
    if (size > this.buffer.length) {
      this.buffer = new Uint8Array(size * 2);
    }

    // Serialize into existing buffer
    event.serializeBinaryToWriter(
      new BinaryWriter(this.buffer)
    );

    // Return view of used portion
    return this.buffer.subarray(0, size);
  }
}

// TECHNIQUE 3: Lazy deserialization
class LazyEventReader {
  constructor(private readonly bytes: Uint8Array) {}

  // Only parse what you need
  getOrderId(): string {
    // Parse just the orderId field without full deserialization
    return this.readFieldAt(/* orderId offset */);
  }

  // Full deserialization only when needed
  getFullEvent(): OrderPlacedEvent {
    return OrderPlacedEvent.deserializeBinary(this.bytes);
  }
}

// TECHNIQUE 4: Batching for throughput
class BatchingSerializer {
  private batch: Event[] = [];
  private readonly batchSize = 100;

  add(event: Event): void {
    this.batch.push(event);
    if (this.batch.length >= this.batchSize) {
      this.flush();
    }
  }

  private flush(): void {
    // Serialize entire batch at once
    // More efficient than serializing individually
    const serialized = this.serializeBatch(this.batch);
    this.publish(serialized);
    this.batch = [];
  }
}
```

Don't optimize serialization prematurely. For most systems processing fewer than 10,000 events per second, JSON is fast enough. Profile your actual system, identify real bottlenecks, and optimize there. Premature optimization adds complexity without meaningful benefit.
Event serialization is a foundational concern for any cross-boundary event-driven system. The choice of format and approach affects performance, compatibility, and operational complexity.
What's Next:
Serializing events is only half the battle. As systems evolve, event schemas must evolve too—but in ways that don't break existing consumers. The next page dives deep into event versioning, exploring strategies for managing schema changes over time, version negotiation between producers and consumers, and maintaining backward and forward compatibility as your events mature.
You now understand the fundamentals of event serialization: format trade-offs, JSON vs binary, Protobuf and Avro details, and schema evolution basics. You can make informed decisions about serialization in your event-driven systems. Next, we'll explore event versioning in depth.