In a monolithic application with a single database, consistency is built-in. ACID transactions ensure that either all changes commit together or none do. If an order creation fails after inventory is reserved, the transaction rolls back—no partial state exists.
Microservices shatter this guarantee. When Order Service commits an order and then Inventory Service's reservation fails, you don't have a transaction to roll back. The order exists; the inventory is unreserved. Your system is inconsistent.
This is the fundamental consistency challenge of distributed systems: how do you maintain correctness across autonomous services that don't share a database?
The answer involves accepting weaker consistency guarantees, designing compensating mechanisms, and carefully modeling which operations truly require coordination. This page equips you with the patterns that production systems use to maintain consistency at scale.
By the end of this page, you will understand the consistency spectrum (strong to eventual), distributed transaction patterns (two-phase commit, saga), compensation mechanisms, idempotency requirements, and how to design systems that are correct despite not having traditional transactions.
Consistency isn't binary—it's a spectrum. Understanding where you can accept weaker guarantees is key to designing practical distributed systems.
| Level | Guarantee | Trade-off | Example Use Case |
|---|---|---|---|
| Strong | All readers see the latest write immediately | Slower, less available | Financial transactions |
| Linearizable | Each operation appears to take effect atomically at a single point between its start and its completion | Coordination overhead | Distributed locks |
| Sequential | All processes see operations in same order | Order maintenance cost | Event logs |
| Causal | Related operations maintain order; unrelated may vary | Moderate complexity | Social media feeds |
| Eventual | All replicas converge eventually; no time bound | Temporary inconsistency | DNS, caching |
The CAP theorem reality:
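The CAP theorem tells us that during a network partition, a distributed system must choose between consistency (every read reflects the latest write, or the request fails) and availability (every request gets a non-error response, possibly based on stale data).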
Most microservices architectures choose availability and accept eventual consistency. The practical question becomes: how eventual is acceptable, and what do we do in the meantime?
Consistency requirements vary by use case:
| Operation | Required Consistency | Reasoning |
|---|---|---|
| Money transfer between accounts | Strong | Losing money is unacceptable |
| Inventory reservation | Strong-ish | Overselling has cost, but bounded |
| Shopping cart update | Eventual | Minor delay is acceptable |
| Profile name change | Eventual | Users won't notice brief delay |
| Analytics increment | Eventual | Approximate is fine |
| Audit logging | Sequential | Order matters for compliance |
The key insight: don't apply strong consistency everywhere. It's expensive and often unnecessary. Identify where strong consistency is truly required and optimize those paths.
When discussing consistency requirements, engage business stakeholders. 'What happens if a customer sees stale inventory for 5 seconds?' is a business question, not just technical. Often, eventual consistency is acceptable—the business already handles similar delays in non-digital processes.
The traditional approach to distributed consistency is the Two-Phase Commit (2PC) protocol. A coordinator orchestrates multiple participants to either all commit or all abort.
How 2PC works:
Phase 1: PREPARE
────────────────
1. Coordinator sends PREPARE to all participants
2. Each participant:
- Acquires locks on affected data
- Writes to durable log
- Responds VOTE_COMMIT or VOTE_ABORT
Phase 2: COMMIT or ABORT
────────────────────────
3. If all votes are COMMIT:
- Coordinator sends COMMIT to all
- Participants apply changes and release locks
4. If any vote is ABORT:
- Coordinator sends ABORT to all
- Participants roll back and release locks
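The two phases above can be sketched as a small coordinator. This is a simplified, synchronous illustration only: the `Participant` interface, `runTwoPhaseCommit`, and `makeParticipant` are hypothetical names, and a real coordinator would also need timeouts, a durable decision log, and recovery after crashes.

```typescript
type Vote = "VOTE_COMMIT" | "VOTE_ABORT";
type Outcome = "COMMITTED" | "ABORTED";

// Hypothetical participant contract; real resource managers expose richer APIs.
interface Participant {
  name: string;
  prepare(): Vote; // acquire locks, write to a durable log, then vote
  commit(): void;  // apply changes and release locks
  abort(): void;   // roll back and release locks
}

function runTwoPhaseCommit(participants: Participant[]): Outcome {
  // Phase 1: collect a vote from every participant.
  const votes = participants.map((p) => p.prepare());
  const allCommit = votes.every((v) => v === "VOTE_COMMIT");

  // Phase 2: broadcast the single global decision to everyone.
  for (const p of participants) {
    if (allCommit) {
      p.commit();
    } else {
      p.abort();
    }
  }
  return allCommit ? "COMMITTED" : "ABORTED";
}

// In-memory participant used only to exercise the coordinator.
function makeParticipant(name: string, vote: Vote, log: string[]): Participant {
  return {
    name,
    prepare: () => {
      log.push(`${name}:prepare`);
      return vote;
    },
    commit: () => {
      log.push(`${name}:commit`);
    },
    abort: () => {
      log.push(`${name}:abort`);
    },
  };
}
```

Note that a single `VOTE_ABORT` forces every participant to abort, and every participant holds its locks until the decision arrives.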
Why 2PC is rarely used in microservices:
When 2PC might apply:
Despite its theoretical guarantees, 2PC is generally avoided in microservices. The coordination overhead, failure modes, and tight coupling contradict microservices principles. Use saga patterns instead.
Sagas are the microservices answer to distributed transactions. Instead of a single atomic transaction, a saga is a sequence of local transactions, each publishing events that trigger the next step. If a step fails, compensating transactions undo the prior steps.
The key insight: Replace one big ACID transaction with many small ones, connected by events and compensations.
Saga execution:
Order Saga: Place Order
─────────────────────────
1. Order Service: Create order (PENDING)
→ Publish: OrderCreated
2. Inventory Service: Reserve items
→ Publish: InventoryReserved
→ Compensate: Release items if later step fails
3. Payment Service: Charge customer
→ Publish: PaymentProcessed
→ Compensate: Refund if later step fails
4. Shipping Service: Create shipment
→ Publish: ShipmentCreated
→ Compensate: Cancel shipment if later step fails
5. Order Service: Complete order (CONFIRMED)
→ Saga complete
IF any step fails:
→ Execute compensating transactions in reverse order
→ Order moves to CANCELLED state
Saga coordination styles:
Choreography: no central coordinator. Each service listens for events and knows what to do next. The flow is decentralized but harder to follow.
Orchestration: a central saga orchestrator directs each step. Services are coupled to the orchestrator, but the flow is clearly visible.
```typescript
// ===================================================
// SAGA ORCHESTRATION PATTERN
// ===================================================
// A central orchestrator manages saga execution,
// calling services in sequence and handling failures.
// The *Client types are assumed to be defined elsewhere.
// ===================================================

interface SagaStep<T> {
  name: string;
  execute: (context: T) => Promise<void>;
  compensate: (context: T) => Promise<void>;
}

interface OrderSagaContext {
  orderId: string;
  customerId: string;
  items: Array<{ productId: string; quantity: number }>;
  totalAmount: number;

  // Results from steps
  reservationId?: string;
  paymentId?: string;
  shipmentId?: string;

  // Tracking
  completedSteps: string[];
  failedStep?: string;
  error?: Error;
}

// Shape of the value returned by PlaceOrderSaga.execute()
interface OrderSagaResult {
  success: boolean;
  orderId: string;
  failedStep?: string;
  error?: string;
}

class PlaceOrderSaga {
  private steps: SagaStep<OrderSagaContext>[];

  constructor(
    private orderService: OrderServiceClient,
    private inventoryService: InventoryServiceClient,
    private paymentService: PaymentServiceClient,
    private shippingService: ShippingServiceClient,
    private alerting: AlertingClient, // used by alertOperations()
  ) {
    this.steps = [
      {
        name: 'createOrder',
        execute: this.createOrder.bind(this),
        compensate: this.cancelOrder.bind(this),
      },
      {
        name: 'reserveInventory',
        execute: this.reserveInventory.bind(this),
        compensate: this.releaseInventory.bind(this),
      },
      {
        name: 'processPayment',
        execute: this.processPayment.bind(this),
        compensate: this.refundPayment.bind(this),
      },
      {
        name: 'createShipment',
        execute: this.createShipment.bind(this),
        compensate: this.cancelShipment.bind(this),
      },
      {
        name: 'confirmOrder',
        execute: this.confirmOrder.bind(this),
        compensate: async () => {}, // Last step: no compensation
      },
    ];
  }

  async execute(context: OrderSagaContext): Promise<OrderSagaResult> {
    context.completedSteps = [];

    for (const step of this.steps) {
      try {
        console.log(`Executing step: ${step.name}`);
        await step.execute(context);
        context.completedSteps.push(step.name);
      } catch (error) {
        console.error(`Step ${step.name} failed:`, error);
        context.failedStep = step.name;
        context.error = error as Error;

        // Execute compensations in reverse order
        await this.compensate(context);

        return {
          success: false,
          orderId: context.orderId,
          failedStep: step.name,
          error: (error as Error).message,
        };
      }
    }

    return {
      success: true,
      orderId: context.orderId,
    };
  }

  private async compensate(context: OrderSagaContext): Promise<void> {
    console.log('Starting compensations...');

    // Get completed steps in reverse order
    const stepsToCompensate = [...context.completedSteps].reverse();

    for (const stepName of stepsToCompensate) {
      const step = this.steps.find(s => s.name === stepName);
      if (!step) continue;

      try {
        console.log(`Compensating step: ${step.name}`);
        await step.compensate(context);
      } catch (compError) {
        // Compensation failure is serious - needs manual intervention
        console.error(
          `CRITICAL: Compensation for ${step.name} failed!`,
          compError
        );
        await this.alertOperations(context, step.name, compError);
      }
    }
  }

  // Step implementations
  private async createOrder(ctx: OrderSagaContext): Promise<void> {
    await this.orderService.createOrder({
      orderId: ctx.orderId,
      customerId: ctx.customerId,
      items: ctx.items,
      status: 'PENDING',
    });
  }

  private async cancelOrder(ctx: OrderSagaContext): Promise<void> {
    await this.orderService.updateOrderStatus(ctx.orderId, 'CANCELLED');
  }

  private async reserveInventory(ctx: OrderSagaContext): Promise<void> {
    const result = await this.inventoryService.reserveItems({
      orderId: ctx.orderId,
      items: ctx.items,
    });
    ctx.reservationId = result.reservationId;
  }

  private async releaseInventory(ctx: OrderSagaContext): Promise<void> {
    if (ctx.reservationId) {
      await this.inventoryService.releaseReservation(ctx.reservationId);
    }
  }

  private async processPayment(ctx: OrderSagaContext): Promise<void> {
    const result = await this.paymentService.charge({
      customerId: ctx.customerId,
      orderId: ctx.orderId,
      amount: ctx.totalAmount,
    });
    ctx.paymentId = result.paymentId;
  }

  private async refundPayment(ctx: OrderSagaContext): Promise<void> {
    if (ctx.paymentId) {
      await this.paymentService.refund(ctx.paymentId);
    }
  }

  private async createShipment(ctx: OrderSagaContext): Promise<void> {
    const result = await this.shippingService.createShipment({
      orderId: ctx.orderId,
      items: ctx.items,
    });
    ctx.shipmentId = result.shipmentId;
  }

  private async cancelShipment(ctx: OrderSagaContext): Promise<void> {
    if (ctx.shipmentId) {
      await this.shippingService.cancelShipment(ctx.shipmentId);
    }
  }

  private async confirmOrder(ctx: OrderSagaContext): Promise<void> {
    await this.orderService.updateOrderStatus(ctx.orderId, 'CONFIRMED');
  }

  private async alertOperations(
    ctx: OrderSagaContext,
    stepName: string,
    error: unknown
  ): Promise<void> {
    // This is a serious problem requiring manual intervention
    await this.alerting.critical({
      type: 'SAGA_COMPENSATION_FAILURE',
      sagaType: 'PlaceOrder',
      orderId: ctx.orderId,
      failedStep: stepName,
      error: String(error),
      context: ctx,
    });
  }
}
```

Compensation might need to run multiple times (retries, network issues). Design compensations to be safe when executed repeatedly. Canceling an already-canceled shipment should succeed silently.
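As a rough illustration of a compensation that is safe to repeat, the sketch below uses a hypothetical `InMemoryShippingService`. The `carrierCancellations` counter stands in for an external side effect that must happen at most once. Treating an unknown shipment as a no-op is an assumption made here, on the grounds that a compensation can arrive for a step that never completed.

```typescript
type ShipmentStatus = "CREATED" | "CANCELLED";

// Hypothetical in-memory stand-in for a shipping service.
class InMemoryShippingService {
  private shipments = new Map<string, ShipmentStatus>();
  // Counts the external side effect (e.g. notifying a carrier).
  carrierCancellations = 0;

  createShipment(id: string): void {
    this.shipments.set(id, "CREATED");
  }

  statusOf(id: string): ShipmentStatus | undefined {
    return this.shipments.get(id);
  }

  // Safe to call any number of times for the same shipment.
  cancelShipment(id: string): void {
    const status = this.shipments.get(id);
    if (status === undefined) {
      return; // Assumption: nothing was created, so there is nothing to undo.
    }
    if (status === "CANCELLED") {
      return; // Already cancelled: succeed silently.
    }
    this.shipments.set(id, "CANCELLED");
    this.carrierCancellations += 1;
  }
}
```

Calling `cancelShipment` twice for the same id leaves the shipment `CANCELLED` and triggers the side effect only once.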
The choice between choreography and orchestration significantly impacts how your saga behaves, how visible the flow is, and how easy failures are to handle.
In choreography, each service publishes events and listens for events from others. There's no central controller—the saga "emerges" from event reactions.
1. Order Service publishes OrderCreated
↓
2. Inventory Service listens, reserves inventory, publishes InventoryReserved
↓
3. Payment Service listens, charges payment, publishes PaymentProcessed
↓
4. Shipping Service listens, creates shipment, publishes ShipmentCreated
↓
5. Order Service listens, marks order confirmed
If Payment fails:
- Payment publishes PaymentFailed
- Inventory Service listens, releases reservation
- Order Service listens, marks order failed
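The event flow above can be approximated with an in-process event bus. This is only a sketch: `EventBus`, `wireServices`, and `placeOrder` are hypothetical names, delivery is synchronous, and the `paymentSucceeds` flag simulates the payment outcome. A real deployment would use a message broker with at-least-once delivery.

```typescript
type EventName =
  | "OrderCreated"
  | "InventoryReserved"
  | "PaymentProcessed"
  | "PaymentFailed"
  | "ShipmentCreated";

interface SagaEvent {
  name: EventName;
  orderId: string;
}

type Handler = (event: SagaEvent) => void;

class EventBus {
  private handlers = new Map<EventName, Handler[]>();
  readonly published: EventName[] = []; // record of every event, in order

  subscribe(name: EventName, handler: Handler): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  publish(event: SagaEvent): void {
    this.published.push(event.name);
    for (const handler of this.handlers.get(event.name) ?? []) {
      handler(event);
    }
  }
}

// Each subscription stands in for one service reacting to another's event.
function wireServices(
  bus: EventBus,
  paymentSucceeds: boolean,
  orderStatus: Map<string, string>,
  reserved: Set<string>,
): void {
  // Inventory Service: reserve on OrderCreated
  bus.subscribe("OrderCreated", (e) => {
    reserved.add(e.orderId);
    bus.publish({ name: "InventoryReserved", orderId: e.orderId });
  });
  // Payment Service: charge on InventoryReserved
  bus.subscribe("InventoryReserved", (e) => {
    bus.publish({
      name: paymentSucceeds ? "PaymentProcessed" : "PaymentFailed",
      orderId: e.orderId,
    });
  });
  // Shipping Service: create shipment on PaymentProcessed
  bus.subscribe("PaymentProcessed", (e) => {
    bus.publish({ name: "ShipmentCreated", orderId: e.orderId });
  });
  // Order Service: confirm on ShipmentCreated
  bus.subscribe("ShipmentCreated", (e) => {
    orderStatus.set(e.orderId, "CONFIRMED");
  });
  // Compensations triggered by PaymentFailed
  bus.subscribe("PaymentFailed", (e) => {
    reserved.delete(e.orderId); // Inventory Service releases the reservation
  });
  bus.subscribe("PaymentFailed", (e) => {
    orderStatus.set(e.orderId, "FAILED"); // Order Service marks the order failed
  });
}

function placeOrder(bus: EventBus, orderStatus: Map<string, string>, orderId: string): void {
  orderStatus.set(orderId, "PENDING");
  bus.publish({ name: "OrderCreated", orderId });
}
```

No component in this sketch knows the whole flow. The saga exists only as the chain of subscriptions, which is why the end-to-end path is harder to see than in an orchestrator.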
In orchestration, a saga orchestrator calls each service in sequence, handling responses and triggering compensations.
Saga Orchestrator:
1. Call Order.create() → store orderId
2. Call Inventory.reserve(orderId) → store reservationId
3. Call Payment.charge(orderId) → if fails, compensate 2,1
4. Call Shipping.create(orderId) → if fails, compensate 3,2,1
5. Call Order.confirm(orderId) → saga complete
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose — services don't know about saga | Moderate — orchestrator knows all services |
| Visibility | Hard — flow spans many services | Easy — flow visible in orchestrator |
| Complexity distribution | Spread across services | Concentrated in orchestrator |
| Adding steps | Change multiple services | Change orchestrator only |
| Circular dependencies | Risk of event cycles | Orchestrator prevents cycles |
| Testing | Integration tests essential | Can unit test orchestrator |
| Failure handling | Each service handles its failures | Orchestrator handles all failures |
When to choose choreography:
When to choose orchestration:
Real systems often use both. An orchestrator coordinates the main flow, but individual services handle local subprocesses via choreography. Choose based on the complexity and visibility needs of each business process.
Sagas provide a mechanism for coordinating changes, but they don't automatically ensure your system makes business sense. This leads to the distinction between syntactic and semantic consistency.
Syntactic consistency: The mechanism works correctly. Steps execute in order, compensations run on failure, and each event takes effect exactly once (in practice, at-least-once delivery combined with idempotent handling).
Semantic consistency: The business invariants are maintained. An order shouldn't exist without payment. Inventory shouldn't go negative. A customer's total orders shouldn't exceed their credit limit.
Syntactic consistency is about the saga machinery. Semantic consistency is about your domain rules.
The semantic challenge in sagas:
Between saga steps, your system is in an intermediate state that may violate invariants temporarily:
Step 1: Order created (order exists)
Step 2: Inventory reserved (order + reservation)
--- Intermediate: Payment pending ---
--- Invariant violated: order exists without payment ---
Step 3: Payment processed (order + reservation + payment)
For a brief window, the order exists without payment. Is this okay?
Solutions:
Pending states — The order is in PENDING state until payment clears. The invariant is 'CONFIRMED orders have payment', not 'all orders have payment'.
Reservation vs. commitment — Distinguish 'intending to do X' from 'X is done'. The inventory is 'reserved', not 'sold', until saga completes.
Timeout and expiry — Pending states expire. An order in PENDING for 30 minutes auto-cancels, triggering compensations.
```typescript
// ===================================================
// SEMANTIC CONSISTENCY WITH STATE MACHINES
// ===================================================
// Model explicit states to maintain invariants
// even during saga execution.
// OrderItem, OrderRepository, SagaOrchestrator and the
// error classes are assumed to be defined elsewhere.
// ===================================================

// Order state machine - makes valid transitions explicit
enum OrderState {
  DRAFT = 'DRAFT',           // Cart items, not submitted
  PENDING = 'PENDING',       // Saga in progress
  CONFIRMED = 'CONFIRMED',   // Saga completed successfully
  CANCELLED = 'CANCELLED',   // Saga failed or user cancelled
  SHIPPED = 'SHIPPED',       // Physical shipment started
  DELIVERED = 'DELIVERED',   // Customer received
}

const ORDER_TRANSITIONS: Record<OrderState, OrderState[]> = {
  [OrderState.DRAFT]: [OrderState.PENDING, OrderState.CANCELLED],
  [OrderState.PENDING]: [OrderState.CONFIRMED, OrderState.CANCELLED],
  [OrderState.CONFIRMED]: [OrderState.SHIPPED, OrderState.CANCELLED],
  [OrderState.SHIPPED]: [OrderState.DELIVERED],
  [OrderState.DELIVERED]: [],  // Terminal state
  [OrderState.CANCELLED]: [],  // Terminal state
};

// Invariants by state
const ORDER_INVARIANTS: Record<OrderState, (order: Order) => boolean> = {
  [OrderState.DRAFT]: (o) => o.items.length > 0 && !o.paymentId && !o.reservationId,
  [OrderState.PENDING]: (o) => o.items.length > 0,  // Saga in progress, partial state OK
  [OrderState.CONFIRMED]: (o) => !!o.paymentId && !!o.reservationId,  // Must have both
  [OrderState.SHIPPED]: (o) => !!o.paymentId && !!o.reservationId && !!o.shipmentId,
  [OrderState.DELIVERED]: (o) => !!o.shipmentId && !!o.deliveryConfirmation,
  [OrderState.CANCELLED]: () => true,  // Cancellation always valid (compensations done)
};

class Order {
  id: string;
  state: OrderState;
  items: OrderItem[];
  paymentId?: string;
  reservationId?: string;
  shipmentId?: string;
  deliveryConfirmation?: string;
  pendingUntil?: Date;  // Auto-cancel if not confirmed by this time

  constructor(id: string, items: OrderItem[]) {
    this.id = id;
    this.items = items;
    this.state = OrderState.DRAFT;  // New orders start as drafts
  }

  transitionTo(newState: OrderState): void {
    // Validate transition is allowed
    const allowedTransitions = ORDER_TRANSITIONS[this.state];
    if (!allowedTransitions.includes(newState)) {
      throw new InvalidStateTransitionError(
        `Cannot transition from ${this.state} to ${newState}`
      );
    }

    // Validate invariants for new state
    const invariantCheck = ORDER_INVARIANTS[newState];
    if (!invariantCheck(this)) {
      throw new InvariantViolationError(
        `Order does not satisfy invariants for state ${newState}`
      );
    }

    this.state = newState;
  }
}

// Background job to handle stuck pending orders
class PendingOrderCleanup {
  constructor(
    private orderRepo: OrderRepository,
    private sagaOrchestrator: SagaOrchestrator,
  ) {}

  async run(): Promise<void> {
    const stuckOrders = await this.orderRepo.findByState(
      OrderState.PENDING,
      { pendingUntilBefore: new Date() }
    );

    for (const order of stuckOrders) {
      console.log(`Auto-cancelling stuck order ${order.id}`);

      // Trigger compensations
      await this.sagaOrchestrator.compensateOrder(order.id);

      // Transition to cancelled
      order.transitionTo(OrderState.CANCELLED);
      await this.orderRepo.save(order);
    }
  }
}
```

Every consistency pattern relies on idempotency: the ability to safely retry operations. In distributed systems, you can never be certain whether an operation succeeded, failed, or is still running. The only safe approach is to make operations idempotent and retry until success is confirmed.
Why idempotency is essential:
1. Client sends payment request to Payment Service
2. Payment Service processes payment
3. Payment Service sends response
4. Network failure — response never arrives
5. Client doesn't know: Did payment succeed?
Without idempotency:
- Retry might charge customer twice!
With idempotency:
- Retry is safe — same result as single execution
Pattern 1: Idempotency Keys
Clients include a unique key with each request. The server stores processed keys and returns cached results for duplicates.
interface PaymentRequest {
idempotencyKey: string; // Client-generated, unique per operation
customerId: string;
amount: number;
}
async function processPayment(req: PaymentRequest) {
// Check if we've seen this key before
const cached = await idempotencyStore.get(req.idempotencyKey);
if (cached) {
return cached.result; // Return same result
}
// Process payment
const result = await actuallyProcessPayment(req);
// Store result for future duplicate requests
await idempotencyStore.set(req.idempotencyKey, {
result,
processedAt: new Date(),
});
return result;
}
Pattern 2: Conditional Writes
Include expected state in write requests. Database rejects if state has changed.
-- Only succeeds if version hasn't changed
UPDATE inventory
SET quantity = quantity - 1, version = 6
WHERE product_id = 'ABC' AND version = 5;
-- 0 rows updated means the row is no longer at version 5:
-- either this write was already applied (a retry) or another writer got there first.
-- The caller must re-read the row to tell the two cases apart.
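The same version check can be mirrored in application code. The sketch below uses a hypothetical in-memory `InventoryTable` whose `decrementIfVersion` method returns the number of affected rows, the way a database driver would for the `UPDATE` above. It illustrates the pattern and does not replace the database-level check.

```typescript
interface InventoryRow {
  productId: string;
  quantity: number;
  version: number;
}

// Hypothetical in-memory table, for illustration only.
class InventoryTable {
  private rows = new Map<string, InventoryRow>();

  insert(row: InventoryRow): void {
    this.rows.set(row.productId, { ...row });
  }

  get(productId: string): InventoryRow | undefined {
    const row = this.rows.get(productId);
    return row ? { ...row } : undefined;
  }

  // Mirrors: UPDATE ... SET quantity = quantity - 1, version = version + 1
  //          WHERE product_id = ? AND version = ?
  // Returns the number of rows updated (0 or 1).
  decrementIfVersion(productId: string, expectedVersion: number): number {
    const row = this.rows.get(productId);
    if (!row || row.version !== expectedVersion) {
      return 0; // Version moved on: a retry or a concurrent writer.
    }
    row.quantity -= 1;
    row.version += 1;
    return 1;
  }
}
```

A second call with the same expected version changes nothing, so a retried request cannot decrement the stock twice.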
Pattern 3: Natural Idempotency
Design operations to be naturally idempotent through their semantics:
NOT idempotent: addToBalance(+100) // Multiple calls add multiple times
Idempotent: setBalance(500) // Multiple calls same result
NOT idempotent: sendEmail() // Multiple calls send multiple emails
Idempotent: ensureEmailSent() // Checks before sending, stores sent state
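One possible shape for `ensureEmailSent` is sketched below. The `EmailOutbox` class and the `messageId` parameter are hypothetical. The `messageId` is assumed to be a stable identifier for the logical email (for example, derived from the order id and template name), and in a real system the check and the record would need to happen atomically.

```typescript
// Hypothetical outbox that records which logical emails were already sent.
class EmailOutbox {
  private sent = new Set<string>();
  // Stand-in for the real delivery side effect.
  readonly deliveries: string[] = [];

  // Returns true if this call sent the email, false if it was already sent.
  ensureEmailSent(messageId: string, to: string): boolean {
    if (this.sent.has(messageId)) {
      return false; // Already sent: repeating the call has no further effect.
    }
    this.deliveries.push(to);
    this.sent.add(messageId);
    return true;
  }
}
```

The caller can invoke `ensureEmailSent` on every retry without worrying about duplicates, because the stored sent-state makes repeated calls no-ops.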
Don't store idempotency keys forever. Use a TTL (e.g., 24-72 hours). After that, the same key could legitimately represent a new operation. Balance storage costs against your retry window.
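A key store with expiry might look like the following sketch. `TtlIdempotencyStore` is a hypothetical in-memory class, and the injectable `now` function is an assumption added to make the behavior testable. A production system would more likely rely on the expiry feature of its cache or database.

```typescript
interface StoredResult<T> {
  result: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store; entries are dropped lazily on read.
class TtlIdempotencyStore<T> {
  private entries = new Map<string, StoredResult<T>>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) {
      return undefined;
    }
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // Expired: the key may be reused.
      return undefined;
    }
    return entry.result;
  }

  set(key: string, result: T): void {
    this.entries.set(key, { result, expiresAt: this.now() + this.ttlMs });
  }
}
```

The TTL should be at least as long as the longest window in which a client might still retry; otherwise a late retry is treated as a new operation.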
Despite best efforts, conflicts occur in distributed systems. Two processes update the same data concurrently, or event ordering leads to inconsistent states. You need strategies for detecting and resolving conflicts.
Last-write-wins (LWW): the most recent write (by timestamp) overwrites previous values. Simple, but it may lose updates.
Process A writes X=1 at T1
Process B writes X=2 at T2
Result: X=2 (T2 > T1)
Use when: Data is independent; losing an update is acceptable; simplicity is priority.
Avoid when: All updates must be preserved; updates are semantically mergeable.
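A last-write-wins merge can be sketched as a pure function. The `Timestamped` shape and the `writerId` tie-breaker are assumptions added here, so that two replicas comparing the same pair always pick the same winner even when timestamps are equal.

```typescript
interface Timestamped<T> {
  value: T;
  timestamp: number; // e.g. epoch milliseconds from the writer's clock
  writerId: string;  // hypothetical tie-breaker for equal timestamps
}

function lastWriteWins<T>(a: Timestamped<T>, b: Timestamped<T>): Timestamped<T> {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  // Equal timestamps: break the tie deterministically.
  return a.writerId > b.writerId ? a : b;
}
```

The losing write is discarded entirely, and the result depends on how well the writers' clocks agree. Both are reasons to reserve this strategy for data where a lost update is tolerable.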
Merge-based resolution: design data structures that can be merged rather than overwritten. CRDTs (Conflict-free Replicated Data Types) are the formal version of this.
Example: Shopping Cart as a set of {productId, quantity}
Cart A: {(ProductX, 2)}
Cart B: {(ProductY, 1)}
Merged: {(ProductX, 2), (ProductY, 1)}
No conflict — union of items
Use when: Data has merge-friendly structure; you can't lose any updates.
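The cart example can be written as a merge function. The sketch below represents a cart as a `Map` from product id to quantity. Taking the larger quantity when both carts contain the same product is an assumption chosen to keep the merge order-independent; it is one possible policy, and it cannot express removals.

```typescript
type Cart = Map<string, number>; // productId -> quantity

function mergeCarts(a: Cart, b: Cart): Cart {
  const merged = new Map<string, number>(a);
  b.forEach((quantity, productId) => {
    const existing = merged.get(productId) ?? 0;
    // Assumed policy: keep the larger quantity for a shared product.
    merged.set(productId, Math.max(existing, quantity));
  });
  return merged;
}
```

Because the union with a per-key maximum is commutative, associative, and idempotent, replicas can merge in any order, and any number of times, and still converge on the same cart.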
Application-level resolution: present conflicts to the application (or user) for manual resolution.
Conflict detected:
Version A: { name: 'John' }
Version B: { name: 'Jonathan' }
Options:
- Keep A
- Keep B
- Manually merge: 'John Jonathan'
- Mark for admin review
Use when: Automated resolution isn't reliable; human judgment needed; conflicts are rare enough to handle manually.
Operational transformation (OT): transform conflicting operations so they can both apply. Used in collaborative editing (Google Docs).
Document: 'Hello'
User A inserts 'World' at position 5 → 'HelloWorld'
User B inserts '!' at position 5 → 'Hello!'
Without transformation:
Conflict — both want position 5
With transformation:
A's 'World' at 5 → 'HelloWorld'
Transform B's position: 5 + len('World') = 10
B's '!' at 10 → 'HelloWorld!'
Use when: Collaborative editing; operations can be composed; order-independent results matter.
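The position shift in the example above can be sketched for the insert-only case. `InsertOp`, `applyInsert`, and `transformInsert` are hypothetical names, and the `siteId` tie-breaker is an assumption used to order two inserts at the same position. Production OT systems also handle deletes and longer operation histories.

```typescript
interface InsertOp {
  position: number;
  text: string;
  siteId: string; // hypothetical tie-breaker for inserts at the same position
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

// Rewrites `op` so it can be applied after `applied` has already been applied.
function transformInsert(op: InsertOp, applied: InsertOp): InsertOp {
  const appliedGoesFirst =
    applied.position < op.position ||
    (applied.position === op.position && applied.siteId < op.siteId);
  return appliedGoesFirst
    ? { ...op, position: op.position + applied.text.length }
    : op;
}

const opA: InsertOp = { position: 5, text: "World", siteId: "A" };
const opB: InsertOp = { position: 5, text: "!", siteId: "B" };

// Replica 1 applies A first, then B transformed against A.
const replica1 = applyInsert(applyInsert("Hello", opA), transformInsert(opB, opA));
// Replica 2 applies B first, then A transformed against B.
const replica2 = applyInsert(applyInsert("Hello", opB), transformInsert(opA, opB));
```

Both replicas end with `HelloWorld!` even though they applied the operations in different orders, which is the convergence property the transformation is meant to provide.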
The best conflict resolution is conflict prevention. Design data ownership to minimize concurrent writes to the same data. Use optimistic locking to detect conflicts early. Partition data so different processes write to different partitions.
Data consistency in microservices requires accepting that traditional transactions don't exist, then building appropriate mechanisms for your consistency requirements. Different data and operations need different guarantees—the art is matching patterns to requirements.
Module Complete:
You've now completed the Data Ownership module. You understand:
These principles form the foundation for building reliable, scalable, and maintainable microservices architectures.
Congratulations! You've mastered data ownership in microservices—from establishing single sources of truth to implementing saga patterns for distributed consistency. These patterns are the backbone of how companies like Netflix, Amazon, and Google manage data at planetary scale. Apply these principles to build systems that are correct, resilient, and scalable.