
Choreography vs Orchestration: Coordination Patterns in Event-Driven Systems

Level: Advanced · Duration: 90 mins · Topic: Choreography vs Orchestration · Page 3 of 5

Saga Patterns — Distributed Transaction Management

Beyond ACID: Managing Distributed Transactions

In a traditional monolithic application with a single database, transactions are straightforward. You start a transaction, perform operations, and either commit (every change is persisted) or roll back (no change is persisted). The database guarantees the ACID properties: Atomicity, Consistency, Isolation, and Durability.

But what happens when your operations span multiple services, each with its own database?

You can't wrap a single database transaction around calls to Payment Service, Inventory Service, and Shipping Service. Each service manages its own data, and in a typical microservice architecture there is no global transaction coordinator that can atomically commit across all of them.

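This is where Sagas come in. A saga is a sequence of local transactions where each transaction updates a single service's database. If one transaction fails, the saga executes compensating transactions to undo the changes made by preceding transactions. Sagas are often described as providing ACD rather than ACID: they keep atomicity (every step either completes or is compensated), consistency, and durability, but give up isolation, so other transactions can observe intermediate states. The result is eventual consistency achieved through careful design.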

What You Will Learn

By the end of this page, you will understand saga patterns deeply: their theoretical foundation, the difference between choreographed and orchestrated sagas, how to design compensating transactions, handle partial failures, and maintain data consistency across distributed services. You'll be equipped to implement sagas in production systems.

The Distributed Transaction Problem

Before diving into sagas, let's understand why distributed transactions are fundamentally hard—and why traditional solutions don't work at scale.

The Traditional Solution: Two-Phase Commit (2PC)

In 2PC, a coordinator orchestrates a distributed transaction:

Phase 1 (Prepare): Coordinator asks all participants: "Can you commit?" Each participant locks resources and votes yes/no.
Phase 2 (Commit/Abort): If all voted yes, coordinator says "Commit." Otherwise, "Abort."
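The two phases can be modeled as a toy, single-process coordinator to make the message flow concrete. This is a sketch under stated assumptions: the `Participant` interface and `InMemoryParticipant` class are hypothetical, and there is no network, timeout, or durable coordinator log, which is where real 2PC implementations spend most of their complexity.

```typescript
// Toy two-phase commit with hypothetical in-memory participants (no network, no crash recovery).
type Vote = 'YES' | 'NO';

interface Participant {
  name: string;
  prepare(): Vote; // Phase 1: lock resources and vote
  commit(): void;  // Phase 2: make the prepared change durable, release locks
  abort(): void;   // Phase 2: discard the prepared change, release locks
}

class InMemoryParticipant implements Participant {
  state: 'IDLE' | 'PREPARED' | 'COMMITTED' | 'ABORTED' = 'IDLE';

  constructor(public name: string, private canCommit: boolean) {}

  prepare(): Vote {
    if (!this.canCommit) return 'NO';
    this.state = 'PREPARED'; // from here on, locks are held until a decision arrives
    return 'YES';
  }

  commit(): void {
    this.state = 'COMMITTED';
  }

  abort(): void {
    this.state = 'ABORTED';
  }
}

function twoPhaseCommit(participants: Participant[]): 'COMMITTED' | 'ABORTED' {
  // Phase 1 (Prepare): collect every vote before deciding
  const votes = participants.map((p) => p.prepare());
  const decision = votes.every((v) => v === 'YES') ? 'COMMITTED' : 'ABORTED';

  // Phase 2 (Commit/Abort): broadcast the single decision to every participant
  for (const p of participants) {
    if (decision === 'COMMITTED') p.commit();
    else p.abort();
  }
  return decision;
}
```

If the coordinator crashed between the two loops, every participant that had voted YES would remain PREPARED, holding its locks until a decision arrived. That is the blocking behavior behind the problems listed below.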

Why 2PC Fails at Scale:

Problems with Two-Phase Commit

•Synchronous Blocking — All participants must be available simultaneously. One slow or unavailable participant blocks the entire transaction.
•Lock Contention — Participants hold locks during the entire protocol. Long transactions cause severe performance degradation.
•Single Point of Failure — If the coordinator fails between phases, participants are stuck holding locks, waiting for a decision.
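•Not Designed for the Web — Every transaction needs two rounds of messages to every participant while locks are held, so 2PC performs well only on reliable, low-latency networks. The internet is neither.
•Doesn't Work Across Services — Message brokers, many NoSQL stores, and third-party APIs such as payment gateways don't support XA-style distributed transactions, and services rarely expose a prepare/commit interface even when their databases do.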

The CAP Theorem Reality:

The CAP theorem tells us that in the presence of a network partition (and in any real network, partitions eventually happen), a system must choose between consistency and availability. For most distributed applications, availability is essential: users expect the system to work even if some components are temporarily unreachable.

Sagas embrace this reality. Instead of trying to achieve strong consistency across services synchronously, sagas provide eventual consistency through sequential local transactions and compensations.

The Saga Approach:

Execute each step as a local transaction (one service, one database, ACID guaranteed)
If a step fails, execute compensating transactions for all previously successful steps
Accept that between steps, the system is in an intermediate state
Design for eventual convergence to consistent state

Sagas Are Not New

The saga concept was introduced by Hector Garcia-Molina and Kenneth Salem in their 1987 paper 'Sagas.' They designed sagas for long-running transactions that might span hours or days—far too long to hold database locks. Distributed systems have repurposed the pattern for a different but related challenge: transactions spanning multiple services.

Anatomy of a Saga

A saga consists of a sequence of transactions T₁, T₂, ..., Tₙ and their corresponding compensating transactions C₁, C₂, ..., Cₙ₋₁ (the last transaction doesn't need compensation—if it fails, we compensate everything before it; if it succeeds, we're done).

Formal Definition:

Each Tᵢ is a local transaction that updates a single service's data
Each Cᵢ is a compensating transaction that semantically undoes Tᵢ
If all transactions succeed: T₁ → T₂ → ... → Tₙ (saga completes)
If Tₖ fails: T₁ → T₂ → ... → Tₖ₋₁ → Cₖ₋₁ → Cₖ₋₂ → ... → C₁ (saga compensates)
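Both execution paths in the definition can be computed mechanically. The sketch below uses a hypothetical helper, `sagaTrace`, that returns the steps a saga with n transactions runs when transaction k fails (or when nothing fails); it models only the ordering, not the transactions themselves.

```typescript
// Execution order for a saga with transactions T1..Tn.
// failedStep = k means Tk failed (its local transaction did not commit);
// null means every transaction succeeded.
function sagaTrace(n: number, failedStep: number | null): string[] {
  if (failedStep === null) {
    // Happy path: T1 -> T2 -> ... -> Tn
    return Array.from({ length: n }, (_, i) => `T${i + 1}`);
  }
  if (failedStep < 1 || failedStep > n) {
    throw new RangeError(`failedStep must be between 1 and ${n}`);
  }
  const trace: string[] = [];
  // T1 .. T(k-1) committed successfully
  for (let i = 1; i < failedStep; i++) trace.push(`T${i}`);
  // Compensate in reverse: C(k-1) -> ... -> C1. Tk never committed, so it needs no compensation.
  for (let i = failedStep - 1; i >= 1; i--) trace.push(`C${i}`);
  return trace;
}
```

For the five-step order saga in the table that follows, `sagaTrace(5, 3)` (payment fails) yields `T1, T2, C2, C1`, and `sagaTrace(5, 4)` (shipping fails) yields `T1, T2, T3, C3, C2, C1`. Even `sagaTrace(5, 5)` stops at C4, which is why Cₙ is never needed.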

Order Processing Saga Example
Step	Transaction (Tᵢ)	Compensating Transaction (Cᵢ)	Service
T₁	Create order in PENDING state	Cancel order, set state to CANCELLED	Order Service
T₂	Reserve inventory for items	Release reserved inventory	Inventory Service
T₃	Charge customer's payment method	Refund the payment	Payment Service
T₄	Schedule shipment	Cancel shipment	Shipping Service
T₅	Update order to CONFIRMED	(No compensation needed)	Order Service

Failure Scenarios:

Scenario 1: Payment Failed (T₃ fails)

T₁ ✓ (Order created)
T₂ ✓ (Inventory reserved)
T₃ ✗ (Payment declined)
Execute: C₂ (Release inventory) → C₁ (Cancel order)
End state: No order, no reservation, no payment

Scenario 2: Shipment Failed (T₄ fails)

T₁ ✓ (Order created)
T₂ ✓ (Inventory reserved)
T₃ ✓ (Payment charged)
T₄ ✗ (Shipping API down)
Execute: C₃ (Refund payment) → C₂ (Release inventory) → C₁ (Cancel order)
End state: Payment refunded, inventory released, order cancelled

Key Insight: Compensation must happen in reverse order. We undo the most recent change first because later transactions may depend on earlier ones.

Compensation Is Not Rollback

A compensating transaction doesn't 'undo' in the database sense—the original transaction committed and cannot be rolled back. Compensation creates a new transaction that semantically reverses the effect. A 'refund' isn't removing the original charge; it's a new transaction that credits the amount. This distinction is critical for understanding saga semantics.
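An append-only ledger makes the distinction concrete. In this minimal sketch (the `Ledger` class and its entry shape are hypothetical), a refund never deletes the charge. It appends a new credit entry, so the history keeps both facts while the net effect returns to zero.

```typescript
interface LedgerEntry {
  id: number;
  kind: 'CHARGE' | 'REFUND';
  amountCents: number; // positive for charges, negative for refunds
  refersTo?: number;   // for refunds: the charge being compensated
}

class Ledger {
  private entries: LedgerEntry[] = [];

  charge(amountCents: number): number {
    const id = this.entries.length + 1;
    this.entries.push({ id, kind: 'CHARGE', amountCents });
    return id;
  }

  // Compensation: a new entry that semantically reverses the charge.
  // The original CHARGE entry stays committed and untouched.
  refund(chargeId: number): number {
    const original = this.entries.find((e) => e.id === chargeId && e.kind === 'CHARGE');
    if (!original) throw new Error(`Unknown charge ${chargeId}`);
    const id = this.entries.length + 1;
    this.entries.push({ id, kind: 'REFUND', amountCents: -original.amountCents, refersTo: chargeId });
    return id;
  }

  balanceCents(): number {
    return this.entries.reduce((sum, e) => sum + e.amountCents, 0);
  }

  history(): readonly LedgerEntry[] {
    return this.entries;
  }
}
```

As written, `refund` is not idempotent: calling it twice appends two credits. The idempotency property discussed later on this page addresses exactly that.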

Choreographed Sagas

In a choreographed saga, there's no central controller. Each service participates in the saga by listening to events and emitting events. The saga emerges from the distributed event flow.

How It Works:

Service A performs its local transaction and emits an event
Service B hears the event, performs its transaction, emits another event
If Service B fails, it emits a failure event
Service A hears the failure event and performs its compensation

This is pure event-driven choreography applied to the saga pattern.
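The four steps above can be simulated in a single process to show that no component holds the whole flow. This is a minimal sketch under stated assumptions: a hypothetical synchronous `EventBus` stands in for a real message broker, shipping is left out, and each "service" is just a set of subscriptions that updates its own state and publishes the next event.

```typescript
type SagaEvent = { type: string; orderId: string };
type Handler = (event: SagaEvent) => void;

// Hypothetical synchronous in-memory bus; a real broker delivers asynchronously, at least once.
class EventBus {
  private handlers = new Map<string, Handler[]>();
  readonly log: string[] = [];

  subscribe(type: string, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  publish(event: SagaEvent): void {
    this.log.push(event.type);
    for (const handler of this.handlers.get(event.type) ?? []) handler(event);
  }
}

function wireOrderSaga(bus: EventBus, paymentSucceeds: boolean) {
  const orders = new Map<string, string>();       // Order Service state
  const reservations = new Map<string, string>(); // Inventory Service state

  // Inventory Service: reserves on OrderCreated, compensates on PaymentFailed
  bus.subscribe('OrderCreated', (e) => {
    reservations.set(e.orderId, 'RESERVED');
    bus.publish({ type: 'InventoryReserved', orderId: e.orderId });
  });
  bus.subscribe('PaymentFailed', (e) => {
    reservations.set(e.orderId, 'RELEASED');
  });

  // Payment Service: reacts to InventoryReserved with success or failure
  bus.subscribe('InventoryReserved', (e) => {
    bus.publish({ type: paymentSucceeds ? 'PaymentCompleted' : 'PaymentFailed', orderId: e.orderId });
  });

  // Order Service: confirms on success, compensates (cancels) on failure
  bus.subscribe('PaymentCompleted', (e) => orders.set(e.orderId, 'CONFIRMED'));
  bus.subscribe('PaymentFailed', (e) => orders.set(e.orderId, 'CANCELLED'));

  return {
    createOrder(orderId: string): void {
      orders.set(orderId, 'PENDING');
      bus.publish({ type: 'OrderCreated', orderId });
    },
    orders,
    reservations,
  };
}
```

In the failure run, both the Inventory Service and the Order Service react to the same PaymentFailed event, so the compensations run in subscription order rather than a strictly enforced reverse order. That is typical of choreography: no component owns the overall sequence.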

Choreographed Saga Implementation
TypeScript
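// Choreographed Saga: Order Processing
//
// Assumed for illustration (not shown): Database/tx is a Prisma-like data-access
// layer where db.transaction() commits when the callback returns and rolls back
// if it throws; rows inserted into tx.outbox are published to the broker by a
// separate relay after commit (transactional outbox); the event and command
// types, InsufficientInventoryError, the hypothetical PaymentDeclinedError,
// PaymentGateway and Charge are defined elsewhere. Brokers usually deliver at
// least once, so every handler below tolerates duplicate events.
import { v4 as uuid } from 'uuid';
 
// ORDER SERVICE
class OrderService {
  constructor(private readonly db: Database) {}

  async createOrder(command: CreateOrderCommand): Promise<Order> {
    return this.db.transaction(async (tx) => {
      // T₁: Create order in pending state
      const order = await tx.orders.create({
        id: uuid(),
        customerId: command.customerId,
        items: command.items,
        status: 'PENDING',
        createdAt: new Date(),
      });
      
      // Emit event for next step (written atomically with the order row)
      await tx.outbox.insert({
        aggregateId: order.id,
        eventType: 'OrderCreated',
        payload: {
          orderId: order.id,
          customerId: command.customerId,
          items: command.items,
          totalAmount: command.totalAmount,
        },
      });
      
      return order;
    });
  }
  
  // Handle failures from downstream services
  async handleInventoryReservationFailed(event: InventoryReservationFailedEvent) {
    return this.cancelOrder(event.orderId, 'Inventory not available');
  }
  
  async handlePaymentFailed(event: PaymentFailedEvent) {
    return this.cancelOrder(event.orderId, 'Payment failed');
  }
  
  // C₁: Compensate by cancelling the order
  private async cancelOrder(orderId: string, reason: string) {
    return this.db.transaction(async (tx) => {
      const order = await tx.orders.findUnique({ where: { id: orderId } });
      // Unknown, or already cancelled by an earlier delivery: nothing to do
      if (!order || order.status === 'CANCELLED') return;
      
      await tx.orders.update({
        where: { id: orderId },
        data: { 
          status: 'CANCELLED',
          cancelReason: reason,
          cancelledAt: new Date(),
        },
      });
      
      await tx.outbox.insert({
        aggregateId: orderId,
        eventType: 'OrderCancelled',
        payload: { orderId, reason },
      });
    });
  }
}
 
// INVENTORY SERVICE
class InventoryService {
  constructor(private readonly db: Database) {}

  async handleOrderCreated(event: OrderCreatedEvent) {
    try {
      await this.db.transaction(async (tx) => {
        // Duplicate delivery: this order already has a reservation
        const existing = await tx.reservations.findUnique({
          where: { orderId: event.orderId },
        });
        if (existing) return;
        
        // T₂: Reserve inventory
        const reservations = [];
        for (const item of event.items) {
          // Production code should lock the row or use a conditional decrement
          // so two concurrent orders cannot both pass this check (overselling).
          const available = await tx.inventory.findFirst({
            where: { productId: item.productId, quantity: { gte: item.quantity } },
          });
          
          if (!available) {
            // Throwing rolls back the whole transaction, including the
            // decrements already applied for earlier items.
            throw new InsufficientInventoryError(item.productId);
          }
          
          await tx.inventory.update({
            where: { id: available.id },
            data: { quantity: { decrement: item.quantity } },
          });
          
          reservations.push({
            productId: item.productId,
            quantity: item.quantity,
            warehouseId: available.warehouseId,
          });
        }
        
        const reservation = await tx.reservations.create({
          id: uuid(),
          orderId: event.orderId,
          items: reservations,
          status: 'RESERVED',
          expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000),
        });
        
        // Emit success event. customerId and totalAmount are passed along
        // because the Payment Service needs them to charge the customer.
        await tx.outbox.insert({
          aggregateId: reservation.id,
          eventType: 'InventoryReserved',
          payload: {
            orderId: event.orderId,
            customerId: event.customerId,
            totalAmount: event.totalAmount,
            reservationId: reservation.id,
            items: reservations,
          },
        });
      });
    } catch (error) {
      // Unexpected errors propagate so the broker redelivers the event
      if (!(error instanceof InsufficientInventoryError)) throw error;
      const productId = error.productId;
      
      // Business failure: the reservation transaction has rolled back, so record
      // the failure in a fresh transaction. Order Service will compensate T₁.
      await this.db.transaction(async (tx) => {
        await tx.outbox.insert({
          aggregateId: event.orderId,
          eventType: 'InventoryReservationFailed',
          payload: {
            orderId: event.orderId,
            reason: 'Insufficient inventory',
            productId,
          },
        });
      });
    }
  }
  
  // C₂: Handle payment failure - release inventory
  async handlePaymentFailed(event: PaymentFailedEvent) {
    return this.db.transaction(async (tx) => {
      const reservation = await tx.reservations.findUnique({
        where: { orderId: event.orderId },
      });
      
      // Nothing reserved, or already released by an earlier delivery of this
      // event: returning here keeps the compensation idempotent
      if (!reservation || reservation.status === 'RELEASED') return;
      
      // Release each reserved item
      for (const item of reservation.items) {
        await tx.inventory.update({
          where: {
            productId_warehouseId: { productId: item.productId, warehouseId: item.warehouseId },
          },
          data: { quantity: { increment: item.quantity } },
        });
      }
      
      await tx.reservations.update({
        where: { id: reservation.id },
        data: { status: 'RELEASED' },
      });
      
      await tx.outbox.insert({
        aggregateId: reservation.id,
        eventType: 'InventoryReleased',
        payload: {
          orderId: event.orderId,
          reservationId: reservation.id,
        },
      });
    });
  }
}
 
// PAYMENT SERVICE
class PaymentService {
  constructor(
    private readonly db: Database,
    private readonly paymentGateway: PaymentGateway,
  ) {}

  async handleInventoryReserved(event: InventoryReservedEvent) {
    // T₃: Charge payment. The gateway is called outside any DB transaction so no
    // database locks are held during a network call. Using orderId as the
    // idempotency key means a redelivered event returns the original charge
    // instead of charging twice (assuming the gateway honors idempotency keys).
    let charge: Charge;
    try {
      charge = await this.paymentGateway.charge({
        customerId: event.customerId,
        amount: event.totalAmount,
        idempotencyKey: event.orderId,
      });
    } catch (error) {
      // Timeouts and other transient errors propagate so the event is
      // redelivered; only a definitive decline fails the saga.
      if (!(error instanceof PaymentDeclinedError)) throw error;
      const reason = error.message;
      
      // Emit failure - triggers C₂ (Inventory Service) and C₁ (Order Service)
      await this.db.transaction(async (tx) => {
        await tx.outbox.insert({
          aggregateId: event.orderId,
          eventType: 'PaymentFailed',
          payload: {
            orderId: event.orderId,
            reason,
          },
        });
      });
      return;
    }
    
    await this.db.transaction(async (tx) => {
      // Duplicate delivery: this charge is already recorded
      const existing = await tx.payments.findUnique({ where: { id: charge.id } });
      if (existing) return;
      
      await tx.payments.create({
        id: charge.id,
        orderId: event.orderId,
        amount: event.totalAmount,
        status: 'COMPLETED',
      });
      
      await tx.outbox.insert({
        aggregateId: charge.id,
        eventType: 'PaymentCompleted',
        payload: {
          orderId: event.orderId,
          paymentId: charge.id,
          amount: event.totalAmount,
        },
      });
    });
  }
}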

Choreographed Saga Trade-offs

•Pro: Loose Coupling — Services only know about events, not each other. New services can join without changing existing ones.
•Pro: No Single Point of Failure — No central orchestrator to fail. Each service operates independently.
•Pro: Team Autonomy — Each team manages their service's saga participation independently.
•Con: Hard to Understand — The complete saga is distributed across services. Understanding the flow requires examining multiple codebases.
•Con: Cyclic Dependencies Risk — Events can create complex, hard-to-trace dependency graphs.
•Con: Difficult to Add Steps — Adding a new step requires modifying multiple services' event subscriptions.

Orchestrated Sagas

In an orchestrated saga, a central saga orchestrator explicitly coordinates the sequence of local transactions. It knows the complete saga definition and explicitly invokes each step.

How It Works:

Orchestrator invokes Service A
On success, orchestrator invokes Service B
If Service B fails, orchestrator explicitly calls Service A's compensation
Orchestrator tracks saga state throughout

This provides visibility and control at the cost of centralizing saga logic.
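Orchestrators are often implemented as an explicit state machine: after each reply, the orchestrator inspects the saga's recorded state and decides the next command. The pure `decide` function below sketches that idea for the order saga. The `SagaState` and `Command` shapes are hypothetical, and `runSaga` only simulates participants so the command sequence can be inspected.

```typescript
type Step = 'CREATE_ORDER' | 'RESERVE_INVENTORY' | 'PROCESS_PAYMENT' | 'SCHEDULE_SHIPPING' | 'CONFIRM_ORDER';

const STEPS: Step[] = ['CREATE_ORDER', 'RESERVE_INVENTORY', 'PROCESS_PAYMENT', 'SCHEDULE_SHIPPING', 'CONFIRM_ORDER'];

interface SagaState {
  completed: Step[];   // forward steps that succeeded, in execution order
  failed: boolean;     // a forward step failed, so the saga is compensating
  compensated: Step[]; // steps whose compensation has already run
}

type Command =
  | { kind: 'EXECUTE'; step: Step }
  | { kind: 'COMPENSATE'; step: Step }
  | { kind: 'DONE'; outcome: 'COMPLETED' | 'COMPENSATED' };

// Next command as a pure function of the recorded state.
function decide(state: SagaState): Command {
  if (!state.failed) {
    const next = STEPS[state.completed.length];
    return next ? { kind: 'EXECUTE', step: next } : { kind: 'DONE', outcome: 'COMPLETED' };
  }
  // Compensate completed steps in reverse order, skipping ones already compensated
  const pending = [...state.completed].reverse().find((s) => !state.compensated.includes(s));
  return pending ? { kind: 'COMPENSATE', step: pending } : { kind: 'DONE', outcome: 'COMPENSATED' };
}

// Simulated driver: failAt makes one forward step fail; returns the command log.
function runSaga(failAt: Step | null): string[] {
  const state: SagaState = { completed: [], failed: false, compensated: [] };
  const log: string[] = [];
  while (true) {
    const cmd = decide(state);
    if (cmd.kind === 'DONE') {
      log.push(cmd.outcome);
      return log;
    }
    log.push(`${cmd.kind} ${cmd.step}`);
    if (cmd.kind === 'EXECUTE') {
      if (cmd.step === failAt) state.failed = true;
      else state.completed.push(cmd.step);
    } else {
      state.compensated.push(cmd.step);
    }
  }
}
```

Because `decide` depends only on the recorded state, an orchestrator that persists `SagaState` after every reply can restart and call `decide` again to continue where it left off.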

Orchestrated Saga Implementation
TypeScript
// Orchestrated Saga: Order Processing with Saga Orchestrator
//
// Assumed for illustration (not shown): the *ServiceClient interfaces,
// SagaStateStore, CreateOrderInput, OrderResult, and OrderSagaData, whose
// order/reservation/payment/shipment fields are optional and are filled in
// as steps complete.
import { v4 as uuid } from 'uuid';
 
interface SagaStep<TData> {
  name: string;
  execute: (data: TData) => Promise<TData>;
  compensate: (data: TData) => Promise<void>;
}

interface SagaResult<TData> {
  success: boolean;
  data: TData;
  error?: Error;
}
 
class SagaOrchestrator<TData> {
  private steps: SagaStep<TData>[] = [];
  
  addStep(step: SagaStep<TData>): this {
    this.steps.push(step);
    return this;
  }
  
  async execute(initialData: TData): Promise<SagaResult<TData>> {
    let data = initialData;
    // Tracked per run, so one orchestrator instance can execute many sagas
    const completedSteps: SagaStep<TData>[] = [];
    
    for (const step of this.steps) {
      try {
        console.log(`Executing step: ${step.name}`);
        data = await step.execute(data);
        completedSteps.push(step);
      } catch (err) {
        const error = err instanceof Error ? err : new Error(String(err));
        console.log(`Step ${step.name} failed: ${error.message}`);
        // The failed step did not complete, so only earlier steps are compensated
        await this.compensate(completedSteps, data);
        return { success: false, error, data };
      }
    }
    
    return { success: true, data };
  }
  
  private async compensate(completedSteps: SagaStep<TData>[], data: TData): Promise<void> {
    // Compensate in reverse order; copy first because reverse() mutates in place
    for (const step of [...completedSteps].reverse()) {
      try {
        console.log(`Compensating step: ${step.name}`);
        await step.compensate(data);
      } catch (err) {
        const error = err instanceof Error ? err : new Error(String(err));
        console.error(`Compensation failed for ${step.name}: ${error.message}`);
        // Log and continue: one failed compensation must not block the rest
        await this.alertCompensationFailure(step, error);
      }
    }
  }
  
  // Hypothetical hook: wire this to paging, a retry queue, or a dead letter queue
  private async alertCompensationFailure(step: SagaStep<TData>, error: Error): Promise<void> {
    console.error(`Manual intervention may be required for step ${step.name}`, error);
  }
}
 
// Order Processing Saga Definition
class OrderProcessingSaga {
  constructor(
    private readonly orderService: OrderServiceClient,
    private readonly inventoryService: InventoryServiceClient,
    private readonly paymentService: PaymentServiceClient,
    private readonly shippingService: ShippingServiceClient,
    private readonly stateStore: SagaStateStore,
  ) {}
  
  async execute(input: CreateOrderInput): Promise<OrderResult> {
    const sagaId = uuid();
    
    // Initialize saga state
    const sagaData: OrderSagaData = {
      sagaId,
      orderId: uuid(),
      customerId: input.customerId,
      items: input.items,
      totalAmount: input.totalAmount,
      shippingAddress: input.shippingAddress,
    };
    
    await this.stateStore.create({
      sagaId,
      type: 'OrderProcessing',
      status: 'RUNNING',
      data: sagaData,
      currentStep: 'CREATE_ORDER',
      startedAt: new Date(),
    });
    
    const saga = new SagaOrchestrator<OrderSagaData>()
      // Step 1: Create Order
      .addStep({
        name: 'CREATE_ORDER',
        execute: async (data) => {
          const order = await this.orderService.create({
            orderId: data.orderId,
            customerId: data.customerId,
            items: data.items,
            status: 'PENDING',
          });
          
          await this.updateSagaState(data.sagaId, 'CREATE_ORDER', 'COMPLETED');
          return { ...data, order };
        },
        compensate: async (data) => {
          await this.orderService.cancel({
            orderId: data.orderId,
            reason: 'Saga compensation',
          });
          await this.updateSagaState(data.sagaId, 'CREATE_ORDER', 'COMPENSATED');
        },
      })
      // Step 2: Reserve Inventory
      .addStep({
        name: 'RESERVE_INVENTORY',
        execute: async (data) => {
          const reservation = await this.inventoryService.reserve({
            orderId: data.orderId,
            items: data.items,
          });
          
          await this.updateSagaState(data.sagaId, 'RESERVE_INVENTORY', 'COMPLETED');
          return { ...data, reservation };
        },
        compensate: async (data) => {
          if (data.reservation) {
            await this.inventoryService.release({
              reservationId: data.reservation.id,
            });
          }
          await this.updateSagaState(data.sagaId, 'RESERVE_INVENTORY', 'COMPENSATED');
        },
      })
      // Step 3: Process Payment
      .addStep({
        name: 'PROCESS_PAYMENT',
        execute: async (data) => {
          const payment = await this.paymentService.charge({
            orderId: data.orderId,
            customerId: data.customerId,
            amount: data.totalAmount,
            idempotencyKey: `${data.sagaId}-payment`,
          });
          
          await this.updateSagaState(data.sagaId, 'PROCESS_PAYMENT', 'COMPLETED');
          return { ...data, payment };
        },
        compensate: async (data) => {
          if (data.payment) {
            await this.paymentService.refund({
              paymentId: data.payment.id,
              amount: data.totalAmount,
              reason: 'Order cancelled - saga compensation',
            });
          }
          await this.updateSagaState(data.sagaId, 'PROCESS_PAYMENT', 'COMPENSATED');
        },
      })
      // Step 4: Schedule Shipping
      .addStep({
        name: 'SCHEDULE_SHIPPING',
        execute: async (data) => {
          const shipment = await this.shippingService.schedule({
            orderId: data.orderId,
            reservationId: data.reservation!.id, // set by RESERVE_INVENTORY
            destination: data.shippingAddress,
          });
          
          await this.updateSagaState(data.sagaId, 'SCHEDULE_SHIPPING', 'COMPLETED');
          return { ...data, shipment };
        },
        compensate: async (data) => {
          if (data.shipment) {
            await this.shippingService.cancel({
              shipmentId: data.shipment.id,
            });
          }
          await this.updateSagaState(data.sagaId, 'SCHEDULE_SHIPPING', 'COMPENSATED');
        },
      })
      // Step 5: Confirm Order (no compensation needed - final step)
      .addStep({
        name: 'CONFIRM_ORDER',
        execute: async (data) => {
          await this.orderService.confirm({
            orderId: data.orderId,
            paymentId: data.payment!.id,   // set by PROCESS_PAYMENT
            shipmentId: data.shipment!.id, // set by SCHEDULE_SHIPPING
          });
          
          await this.updateSagaState(data.sagaId, 'CONFIRM_ORDER', 'COMPLETED');
          await this.stateStore.update(data.sagaId, { status: 'COMPLETED' });
          return data;
        },
        compensate: async () => {
          // No compensation for final step
        },
      });
    
    // Execute the saga
    const result = await saga.execute(sagaData);
    
    if (!result.success) {
      await this.stateStore.update(sagaId, {
        status: 'COMPENSATED',
        error: result.error?.message,
      });
    }
    
    return {
      success: result.success,
      orderId: result.data?.orderId,
      trackingNumber: result.data?.shipment?.trackingNumber,
    };
  }
  
  // Records step progress so a restarted orchestrator can see how far the saga got
  private async updateSagaState(
    sagaId: string,
    step: string,
    stepStatus: 'COMPLETED' | 'COMPENSATED',
  ): Promise<void> {
    await this.stateStore.update(sagaId, {
      currentStep: step,
      lastStepStatus: stepStatus,
      updatedAt: new Date(),
    });
  }
}

Orchestrated Saga Trade-offs

•Pro: Clear Saga Definition — The complete saga is visible in one place. Easy to understand, debug, and modify.
•Pro: Centralized Error Handling — All compensation logic is in the orchestrator. No distributed error event choreography.
•Pro: Easy to Add Steps — Adding a new step requires changing only the orchestrator, not multiple services.
•Pro: Simpler Services — Services don't need to know about the saga; they just execute commands and return results.
•Con: Coupling to Orchestrator — The orchestrator knows all services. Changes to any service might require orchestrator updates.
•Con: Potential Bottleneck — All sagas flow through the orchestrator, though this is usually not a performance issue.
•Con: Single Point of Knowledge — If the orchestrator dies, saga state and logic must be recoverable.
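Recoverability usually comes down to persisting saga state and letting a surviving orchestrator instance take over sagas whose owner has died. One common approach is a lease: each in-flight saga records which instance owns it and until when. The sketch below assumes that model. `SagaRecord`, `findOrphanedSagas`, and `claim` are hypothetical names, and a real state store must perform the claim as an atomic compare-and-set.

```typescript
type SagaStatus = 'RUNNING' | 'COMPENSATING' | 'COMPLETED' | 'COMPENSATED' | 'REQUIRES_INTERVENTION';

interface SagaRecord {
  sagaId: string;
  status: SagaStatus;
  leaseOwner?: string;     // orchestrator instance currently driving the saga
  leaseExpiresAt?: number; // epoch ms; the owner keeps renewing this while alive
}

// Only in-flight sagas need an orchestrator; finished or parked ones do not.
const IN_FLIGHT: SagaStatus[] = ['RUNNING', 'COMPENSATING'];

// In-flight sagas whose lease has lapsed most likely belong to a crashed instance.
function findOrphanedSagas(records: SagaRecord[], nowMs: number): SagaRecord[] {
  return records.filter((r) => IN_FLIGHT.includes(r.status) && (r.leaseExpiresAt ?? 0) <= nowMs);
}

// Take ownership by writing a new lease. Against a real store this must be a
// conditional update ("only if the lease is still expired") so two instances
// cannot both claim the same saga.
function claim(record: SagaRecord, owner: string, nowMs: number, leaseMs: number): SagaRecord {
  return { ...record, leaseOwner: owner, leaseExpiresAt: nowMs + leaseMs };
}
```

After claiming, the new owner reloads the saga's persisted data and continues from the last recorded step. That is only safe if steps and compensations are idempotent, because the crashed instance may have sent a command whose result was never recorded.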

Choose Based on Complexity

As a rule of thumb, for simple sagas (3-4 steps) with stable participants, choreography keeps things decoupled. For complex sagas (5+ steps) with conditional logic, branching, or frequent changes, orchestration provides the visibility and control they need.

Designing Compensating Transactions

Compensating transactions are the heart of saga correctness. A poorly designed compensation can leave the system in an inconsistent state—worse than having no saga at all.

Properties of Good Compensating Transactions:

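1. Semantic Reversal, Not Logical Undo: Compensation creates new operations that reverse the business effect. A refund isn't removing a charge; it's a new credit transaction.

2. Idempotent: Compensation might be attempted multiple times (retries, redelivered messages). It must produce the same result whether executed once or many times.

3. Commutative with the Original: The outcome should not depend on the order in which the original transaction and its compensation are processed. In a message-driven system, a compensation can arrive before the transaction it undoes (for example, a cancellation overtaking a delayed reservation request); a common approach is to record that the operation was cancelled so the late original becomes a no-op. Visible but consistent intermediate effects, such as a customer seeing a charge and a matching refund on the same statement, are acceptable.

4. Resilient: Compensation must succeed even if the system state has changed. If an order is already cancelled, cancelling again should succeed (idempotent) or gracefully recognize the state.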
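The idempotency and commutativity properties can be illustrated together with a tombstone. In this in-memory sketch (the `ReservationStore` class is hypothetical and tracks a single product), a cancel that arrives before its reservation records a CANCELLED marker, so the late reserve becomes a no-op, and repeated cancels change nothing.

```typescript
type ReservationStatus = 'RESERVED' | 'CANCELLED';

interface ReservationEntry {
  status: ReservationStatus;
  quantity: number;
}

// Hypothetical in-memory inventory for one product.
class ReservationStore {
  private byOrder = new Map<string, ReservationEntry>();

  constructor(public stock: number) {}

  // Forward transaction (Tᵢ). A duplicate request, or one arriving after its
  // cancellation (tombstone present), is ignored.
  reserve(orderId: string, quantity: number): void {
    if (this.byOrder.has(orderId)) return;
    if (this.stock < quantity) throw new Error('Insufficient stock');
    this.stock -= quantity;
    this.byOrder.set(orderId, { status: 'RESERVED', quantity });
  }

  // Compensation (Cᵢ). Stock is returned only for a live reservation, so repeats
  // are harmless; if nothing was reserved yet, a CANCELLED tombstone is left behind.
  cancel(orderId: string): void {
    const entry = this.byOrder.get(orderId);
    if (entry?.status === 'RESERVED') this.stock += entry.quantity;
    this.byOrder.set(orderId, { status: 'CANCELLED', quantity: 0 });
  }

  status(orderId: string): ReservationStatus | undefined {
    return this.byOrder.get(orderId)?.status;
  }
}
```

Tombstones accumulate, so a real store would typically expire them after a retention window longer than the maximum expected message delay.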

Well-Designed Compensating Transactions
TypeScript
// Examples of well-designed compensating transactions
 
// INVENTORY COMPENSATION
class InventoryService {
  /**
   * Release reserved inventory
   * 
   * Designed for: Idempotency, Resilience, Auditability
   */
  async releaseReservation(reservationId: string): Promise<void> {
    return this.db.transaction(async (tx) => {
      // Check current state - idempotent handling
      const reservation = await tx.reservations.findUnique({
        where: { id: reservationId }
      });
      
      // Already released or never existed - succeed idempotently
      if (!reservation || reservation.status === 'RELEASED') {
        console.log(`Reservation ${reservationId} already released or not found`);
        return;
      }
      
      // Already expired - no action needed
      if (reservation.status === 'EXPIRED') {
        console.log(`Reservation ${reservationId} already expired`);
        return;
      }
      
      // Release each item back to inventory
      for (const item of reservation.items) {
        await tx.inventory.update({
          where: { 
            productId_warehouseId: {
              productId: item.productId, 
              warehouseId: item.warehouseId 
            }
          },
          data: { 
            availableQuantity: { increment: item.quantity },
            reservedQuantity: { decrement: item.quantity },
          },
        });
      }
      
      // Mark reservation as released
      await tx.reservations.update({
        where: { id: reservationId },
        data: { 
          status: 'RELEASED',
          releasedAt: new Date(),
          releaseReason: 'saga_compensation',
        },
      });
      
      // Audit trail
      await tx.inventoryAudit.create({
        action: 'RESERVATION_RELEASED',
        reservationId,
        items: reservation.items,
        reason: 'saga_compensation',
        timestamp: new Date(),
      });
    });
  }
}
 
// PAYMENT COMPENSATION
class PaymentService {
  /**
   * Refund a payment
   * 
   * Handles: Partial refunds, already-refunded, external gateway idempotency
   */
  async refundPayment(input: RefundInput): Promise<RefundResult> {
    return this.db.transaction(async (tx) => {
      const payment = await tx.payments.findUnique({
        where: { id: input.paymentId }
      });
      
      // Payment doesn't exist - succeed idempotently
      if (!payment) {
        return { status: 'NOT_FOUND', refundId: null };
      }
      
      // Already fully refunded - succeed idempotently
      if (payment.status === 'REFUNDED') {
        return { 
          status: 'ALREADY_REFUNDED', 
          refundId: payment.lastRefundId 
        };
      }
      
      // Calculate refundable amount. `??` (not `||`) so an explicit amount of 0
      // is not mistaken for "refund everything".
      const refundableAmount = payment.amount - (payment.refundedAmount ?? 0);
      const refundAmount = Math.min(input.amount ?? refundableAmount, refundableAmount);
      
      if (refundAmount <= 0) {
        return { status: 'NOTHING_TO_REFUND', refundId: null };
      }
      
      // Call payment gateway with idempotency key
      const refund = await this.gateway.refund({
        originalTransactionId: payment.transactionId,
        amount: refundAmount,
        idempotencyKey: `refund-${payment.id}-${input.sagaId || 'manual'}`,
        reason: input.reason,
      });
      
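      // A retry with the same idempotency key returns the original refund;
      // if it is already recorded, don't count it twice.
      const alreadyRecorded = await tx.refunds.findUnique({ where: { id: refund.id } });
      if (alreadyRecorded) {
        return { status: 'ALREADY_REFUNDED', refundId: refund.id };
      }
      
      // Update payment record
      const isFullyRefunded = (payment.refundedAmount || 0) + refundAmount >= payment.amount;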
      
      await tx.payments.update({
        where: { id: payment.id },
        data: {
          status: isFullyRefunded ? 'REFUNDED' : 'PARTIALLY_REFUNDED',
          refundedAmount: { increment: refundAmount },
          lastRefundId: refund.id,
          lastRefundAt: new Date(),
        },
      });
      
      // Create refund record
      await tx.refunds.create({
        id: refund.id,
        paymentId: payment.id,
        amount: refundAmount,
        reason: input.reason,
        gatewayRefundId: refund.gatewayId,
        status: 'COMPLETED',
      });
      
      return { status: 'REFUNDED', refundId: refund.id };
    });
  }
}
 
// ORDER COMPENSATION
class OrderService {
  /**
   * Cancel an order as compensation
   * 
   * Handles: Already cancelled, shipped orders, customer communications
   */
  async cancelOrder(input: CancelOrderInput): Promise<CancelResult> {
    return this.db.transaction(async (tx) => {
      const order = await tx.orders.findUnique({
        where: { id: input.orderId }
      });
      
      if (!order) {
        return { status: 'NOT_FOUND' };
      }
      
      // Already cancelled - succeed idempotently
      if (order.status === 'CANCELLED') {
        return { status: 'ALREADY_CANCELLED' };
      }
      
      // Order already shipped - cannot cancel
      if (order.status === 'SHIPPED' || order.status === 'DELIVERED') {
        return { 
          status: 'CANNOT_CANCEL',
          reason: 'Order already shipped/delivered',
        };
      }
      
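      // CONFIRMED orders: the customer was already told the order succeeded.
      // Publish the notification through the outbox so it goes out only if this
      // transaction commits; calling a notification service directly here could
      // announce a cancellation that then rolls back.
      if (order.status === 'CONFIRMED') {
        await tx.outbox.insert({
          aggregateId: order.id,
          eventType: 'OrderCancelled',
          payload: {
            orderId: order.id,
            customerId: order.customerId,
            reason: input.reason,
            notifyCustomer: true,
          },
        });
      }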
      
      await tx.orders.update({
        where: { id: order.id },
        data: {
          status: 'CANCELLED',
          cancelledAt: new Date(),
          cancelReason: input.reason,
          cancelledBy: 'saga_compensation',
        },
      });
      
      await tx.orderHistory.create({
        orderId: order.id,
        previousStatus: order.status,
        newStatus: 'CANCELLED',
        reason: input.reason,
        source: 'saga_compensation',
        timestamp: new Date(),
      });
      
      return { status: 'CANCELLED' };
    });
  }
}

The Compensation Cannot Fail Problem

What happens if compensation fails? This is the hardest problem in saga design. Options: (1) Retry indefinitely with exponential backoff, (2) Move to a dead letter queue for manual intervention, (3) Design compensations to be so simple they effectively cannot fail. In practice, use a combination—retry transient failures, alert humans for persistent ones.
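The combined approach can be sketched as a small helper that retries a compensation with exponential backoff and hands it to a dead letter handler once its attempt budget is spent. `runCompensation`, `RetryPolicy`, and the `deadLetter` and `sleep` callbacks are hypothetical names; in production `sleep` could be `(ms) => new Promise((r) => setTimeout(r, ms))`, although a durable scheduler survives restarts better than an in-process loop. For brevity the sketch retries every error, whereas a real implementation would send clearly permanent failures straight to the dead letter handler.

```typescript
interface RetryPolicy {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the 2nd attempt; doubles after each failure
}

type CompensationOutcome =
  | { status: 'SUCCEEDED'; attempts: number }
  | { status: 'DEAD_LETTERED'; attempts: number; lastError: string };

// Retries a compensation with exponential backoff, then escalates it.
// `sleep` is injected so tests (or a scheduler) control the waiting.
async function runCompensation(
  compensate: () => Promise<void>,
  policy: RetryPolicy,
  deadLetter: (lastError: string) => Promise<void>,
  sleep: (ms: number) => Promise<void>,
): Promise<CompensationOutcome> {
  let lastError = 'no attempts made';
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      await compensate();
      return { status: 'SUCCEEDED', attempts: attempt };
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
      if (attempt < policy.maxAttempts) {
        // Backoff schedule: base, 2 x base, 4 x base, ...
        await sleep(policy.baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Persistent failure: park it for a human instead of retrying forever
  await deadLetter(lastError);
  return { status: 'DEAD_LETTERED', attempts: policy.maxAttempts, lastError };
}
```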

Handling Partial Failures

The most complex saga scenarios involve partial failures—situations where you can't simply compensate and walk away. Real-world systems face nuanced cases that require careful design.

Scenario 1: Compensation Partially Succeeds

Your saga is compensating, but one of the compensation steps fails:

Order cancellation succeeded
Inventory release succeeded
Payment refund failed (gateway timeout)

The system is now inconsistent: the customer has been charged, but there is no order and no inventory reservation.

Handling Compensation Failures
TypeScript
// Robust compensation with failure handling
 
class SagaCompensationManager {
  async compensate(
    sagaId: string,
    completedSteps: CompletedStep[],
    failureReason: string,
  ): Promise<CompensationResult> {
    const compensationAttempts: CompensationAttempt[] = [];
    const failedCompensations: FailedCompensation[] = [];
    
    // Compensate in reverse order (copy first: reverse() would mutate the caller's array)
    for (const step of [...completedSteps].reverse()) {
      const attempt = await this.compensateStep(step, sagaId);
      compensationAttempts.push(attempt);
      
      if (!attempt.success) {
        failedCompensations.push({
          step: step.name,
          error: attempt.error,
          willRetry: attempt.retryable,
        });
        
        if (attempt.retryable) {
          // Schedule async retry
          await this.scheduleCompensationRetry(sagaId, step, attempt.retryCount);
        } else {
          // Non-retryable: alert for manual intervention
          await this.alertManualIntervention(sagaId, step, attempt.error);
        }
      }
    }
    
    // Determine overall compensation status
    if (failedCompensations.length === 0) {
      return { status: 'FULLY_COMPENSATED', attempts: compensationAttempts };
    } else if (failedCompensations.every(f => f.willRetry)) {
      return { status: 'COMPENSATING', attempts: compensationAttempts, pending: failedCompensations };
    } else {
      return { status: 'REQUIRES_INTERVENTION', attempts: compensationAttempts, failed: failedCompensations };
    }
  }
  
  private async compensateStep(
    step: CompletedStep,
    sagaId: string,
    retryCount: number = 0,
  ): Promise<CompensationAttempt> {
    const maxRetries = 5;
    const baseDelay = 1000; // 1 second
    
    try {
      await step.compensation.execute(step.data);
      
      return {
        step: step.name,
        success: true,
        timestamp: new Date(),
      };
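    } catch (err) {
      // TypeScript types catch variables as `unknown`; normalize to Error first
      const error = err instanceof Error ? err : new Error(String(err));
      const retryable = this.isRetryableError(error) && retryCount < maxRetries;
      
      return {
        step: step.name,
        success: false,
        error: error.message,
        retryable,
        retryCount: retryCount + 1,
        // Exponential backoff: 1s, 2s, 4s, ... (scheduled only if another retry is allowed)
        nextRetryAt: retryable
          ? new Date(Date.now() + baseDelay * Math.pow(2, retryCount))
          : undefined,
        timestamp: new Date(),
      };
    }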
  }
  
  private isRetryableError(error: Error): boolean {
    // Network errors, timeouts, rate limits are retryable
    return (
      error instanceof NetworkError ||
      error instanceof TimeoutError ||
      error instanceof RateLimitError ||
      (error instanceof HttpError && error.status >= 500)
    );
  }
  
  // Background job to retry failed compensations
  async processCompensationRetries(): Promise<void> {
    const pending = await this.stateStore.findPendingCompensations();
    
    for (const compensation of pending) {
      if (compensation.nextRetryAt <= new Date()) {
        const result = await this.compensateStep(
          compensation.step,
          compensation.sagaId,
          compensation.retryCount,
        );
        
        if (result.success) {
          await this.stateStore.markCompensationComplete(compensation.id);
        } else if (!result.retryable) {
          await this.alertManualIntervention(compensation.sagaId, compensation.step, result.error);
        } else {
          await this.stateStore.updateNextRetry(
            compensation.id,
            result.nextRetryAt,
            result.retryCount,
          );
        }
      }
    }
  }
}

Scenario 2: Forward Recovery

Sometimes, instead of compensating backward, it's better to push forward. If payment and inventory succeeded but shipping failed temporarily, should you refund everything or retry shipping?

Forward Recovery retries the failing step instead of compensating. Use it when:

The failure is likely transient
Compensation has high cost (payment processing fees)
Business rules prefer completion over cancellation

Forward Recovery Decision
TypeScript
class SagaRecoveryDecider {
  decideRecoveryStrategy(
    saga: Saga,
    failedStep: SagaStep,
    error: Error,
  ): RecoveryStrategy {
    // Check if forward recovery is possible
    if (this.canRetryForward(failedStep, error)) {
      const retryCount = saga.getRetryCount(failedStep.name);
      
      if (retryCount < failedStep.maxRetries) {
        return {
          type: 'RETRY_FORWARD',
          delay: this.calculateRetryDelay(retryCount, failedStep),
        };
      }
    }
    
    // Check business rules for direction
    const completedSteps = saga.getCompletedSteps();
    const compensationCost = this.estimateCompensationCost(completedSteps);
    const retryProbability = this.estimateRetrySuccessProbability(failedStep, error);
    
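    // If compensation is expensive and retry likely to succeed, retry more.
    // Hypothetical cutoff in currency units; tune to your business.
    const compensationCostThreshold = 10;
    if (compensationCost > compensationCostThreshold && retryProbability > 0.5) {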
      return {
        type: 'RETRY_FORWARD',
        delay: this.calculateExtendedRetryDelay(saga),
        maxAdditionalRetries: 3,
      };
    }
    
    // Otherwise, compensate
    return {
      type: 'COMPENSATE_BACKWARD',
      steps: [...completedSteps].reverse(), // copy: reverse() mutates in place
    };
  }
  
  private canRetryForward(step: SagaStep, error: Error): boolean {
    // Some errors should never be retried
    if (error instanceof ValidationError) return false;
    if (error instanceof BusinessRuleViolation) return false;
    if (error instanceof InsufficientFundsError) return false;
    
    // Anything not classified as permanent above is treated as transient
    return true;
  }
  
  private estimateCompensationCost(steps: CompletedStep[]): number {
    return steps.reduce((cost, step) => {
      // Illustrative assumption: the ~3% processing fee is not returned on refund
      if (step.name === 'PROCESS_PAYMENT') {
        cost += step.data.amount * 0.03;
      }
      // Inventory release is cheap
      // Shipping cancellation might have fees
      if (step.name === 'SCHEDULE_SHIPPING' && step.data.shipment.status === 'LABEL_PRINTED') {
        cost += 5; // Label printing cost
      }
      return cost;
    }, 0);
  }
}

Semantic Lock Anti-Pattern Caution

Some resources require exclusive access during a saga. A 'semantic lock' holds a resource by marking its state (e.g., 'ORDER_PROCESSING'). Be careful: if the saga fails without releasing the lock, the resource stays stuck. Always give semantic locks a timeout, and run background jobs that detect and release stuck locks.
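A stale-lock sweep can be quite small. This sketch assumes locked records carry a `lockedBySagaId` and a `lockedAt` timestamp (mirroring the fields in the inventory locking example in this section) and uses a hypothetical five-minute TTL; `releaseStaleLocks` is an illustrative name. Locks older than the TTL are cleared and the record moves to a caller-chosen state such as 'NEEDS_REVIEW'.

```typescript
// Sketch: periodically release semantic locks whose owning saga has gone quiet.
// Field names and the 5-minute TTL are assumptions for illustration.

interface LockableRecord {
  id: string;
  state: string; // e.g. 'ORDER_PROCESSING' while a saga holds the lock
  lockedBySagaId: string | null;
  lockedAt: Date | null;
}

const LOCK_TTL_MS = 5 * 60 * 1000; // hypothetical timeout

// Returns the ids of records whose stale lock was released.
function releaseStaleLocks(
  records: LockableRecord[],
  now: Date,
  releasedState: string, // e.g. 'NEEDS_REVIEW'
): string[] {
  const released: string[] = [];
  for (const record of records) {
    if (record.lockedBySagaId === null || record.lockedAt === null) continue; // not locked
    const ageMs = now.getTime() - record.lockedAt.getTime();
    if (ageMs < LOCK_TTL_MS) continue; // lock is still fresh
    // Owner presumed dead: clear the lock and move the record to a reviewable state
    record.lockedBySagaId = null;
    record.lockedAt = null;
    record.state = releasedState;
    released.push(record.id);
  }
  return released;
}
```

Releasing a lock does not undo anything the dead saga already did, so a real sweep would also flag that saga for recovery or manual review.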

Saga Execution Guarantees

Understanding exactly what guarantees sagas provide (and don't provide) is essential for correct system design.

What Sagas Guarantee:

1. Eventual Consistency: The system eventually reaches a consistent state, in which either every local transaction has completed or every completed one has been compensated (provided the compensations themselves eventually succeed).

2. Atomicity of Local Transactions: Each step is atomic within its service's database. ACID properties apply per step.

3. Ordered Execution: Steps execute in a defined order. Compensation follows the reverse order.

What Sagas Do NOT Guarantee:

Saga Limitations

•No Isolation — Other transactions may see intermediate states. A user might query their account during the saga and see an inconsistent balance.
•No Global Atomicity — During execution, some services show changes while others don't yet. It's possible to read 'payment succeeded' while inventory reservation is still pending.
•No Rollback — Compensation creates new transactions; original transactions remain in history. Audit logs will show both the original and compensation.
•No Guaranteed Timing — A saga might take seconds or hours depending on service availability and retry policies.

Handling Lack of Isolation:

The 'dirty reads' problem in sagas requires explicit design:

Countermeasure 1: Semantic Locking. Mark resources as 'in progress' to prevent conflicting operations. Other sagas wait or fail when they encounter a locked resource.

Countermeasure 2: Commutative Operations. Design operations so that their order doesn't matter. 'Add $10' followed by 'Subtract $5' leaves the same balance as 'Subtract $5' followed by 'Add $10'.

Countermeasure 3: Pessimistic Views. Present the most conservative reading of intermediate state. Don't show 'order confirmed' until the saga completes; show 'order processing' during intermediate states.

Countermeasure 4: Reread Values. Before compensating, re-read the current state instead of trusting data captured earlier in the saga. Another process might already have corrected it.
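Countermeasures 2 and 4 can each be sketched in a few lines against in-memory stand-ins. The types and function names (`applyDelta`, `refundIfStillCharged`) are hypothetical; a real service would perform the reread and the write inside one local transaction.

```typescript
// Sketch of Countermeasure 2 (commutative updates) and Countermeasure 4
// (reread before compensating). In-memory stand-ins; names are hypothetical.

interface Account {
  balance: number;
}

// Countermeasure 2: record changes as deltas instead of "set balance to X".
// Provided each delta is applied atomically, deltas commute, so two sagas
// touching the same account can interleave in any order.
function applyDelta(account: Account, delta: number): void {
  account.balance += delta;
}

type PaymentStatus = "CHARGED" | "REFUNDED";

interface Payment {
  id: string;
  status: PaymentStatus;
}

// Countermeasure 4: reread the payment right before compensating instead of
// trusting the status captured earlier in the saga.
function refundIfStillCharged(
  payments: Map<string, Payment>,
  paymentId: string,
  issueRefund: (paymentId: string) => void,
): "REFUNDED" | "ALREADY_REFUNDED" | "NOT_FOUND" {
  const current = payments.get(paymentId); // fresh read of current state
  if (current === undefined) return "NOT_FOUND";
  if (current.status === "REFUNDED") return "ALREADY_REFUNDED"; // corrected elsewhere
  issueRefund(paymentId);
  current.status = "REFUNDED";
  return "REFUNDED";
}
```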

Handling Saga Isolation Issues
TypeScript
// Semantic Locking for Saga Isolation
 
class InventoryWithSagaLocking {
  async reserveWithLock(input: ReserveInput): Promise<Reservation> {
    return this.db.transaction(async (tx) => {
      // Acquire semantic lock on items
      for (const item of input.items) {
        const product = await tx.products.findUnique({
          where: { id: item.productId },
        });
        if (!product) {
          throw new Error(`Product ${item.productId} not found`);
        }
        
        // Check for existing saga lock
        if (product.lockedBySagaId && product.lockedBySagaId !== input.sagaId) {
          // Another saga is working on this item
          const lockAge = Date.now() - product.lockedAt.getTime();
          
          if (lockAge < 5 * 60 * 1000) { // Lock is fresh (< 5 min)
            throw new ResourceLockedError(
              `Product ${item.productId} is locked by saga ${product.lockedBySagaId}`
            );
          } else {
            // Lock is stale - previous saga likely failed
            // Take over the lock
            console.warn(`Taking over stale lock on ${item.productId}`);
          }
        }
        
        // Acquire lock for this saga
        await tx.products.update({
          where: { id: item.productId },
          data: {
            lockedBySagaId: input.sagaId,
            lockedAt: new Date(),
          },
        });
      }
      
      // Now safe to reserve
      const reservation = await this.createReservation(tx, input);
      
      return reservation;
    });
  }
  
  async releaseWithLock(reservationId: string, sagaId: string): Promise<void> {
    return this.db.transaction(async (tx) => {
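      const reservation = await tx.reservations.findUnique({
        where: { id: reservationId },
      });
      
      // Idempotent compensation: a missing reservation means there is nothing to release
      if (!reservation) return;
      
      // Release inventory
      await this.doRelease(tx, reservation);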
      
      // Release semantic locks
      for (const item of reservation.items) {
        await tx.products.update({
          where: { 
            id: item.productId,
            lockedBySagaId: sagaId, // Only release our own lock
          },
          data: {
            lockedBySagaId: null,
            lockedAt: null,
          },
        });
      }
    });
  }
}
 
// Pessimistic View - Show uncertain state to users
class OrderStatusService {
  async getOrderStatus(orderId: string): Promise<OrderStatusView> {
    const saga = await this.sagaStore.findByOrderId(orderId);
    
    if (!saga) {
      // No saga running - show actual order status
      return this.getActualStatus(orderId);
    }
    
    // Saga is in progress - show conservative status
    switch (saga.status) {
      case 'RUNNING':
        return {
          status: 'PROCESSING',
          message: 'Your order is being processed',
          approximateWait: '2-5 minutes',
        };
      case 'COMPENSATING':
        return {
          status: 'PROCESSING',
          message: 'Your order is being updated',
          approximateWait: '1-3 minutes',
        };
      case 'REQUIRES_INTERVENTION':
        return {
          status: 'UNDER_REVIEW',
          message: 'Your order requires manual review',
          supportContact: true,
        };
      case 'COMPLETED':
        return this.getActualStatus(orderId);
      case 'COMPENSATED':
        return {
          status: 'CANCELLED',
          message: 'Your order could not be completed',
          reason: saga.compensationReason,
        };
    }
  }
}

Design for Intermediate Visibility

Users will observe intermediate saga states. Design your UI to communicate this clearly: 'Order processing...', 'Payment confirmed, preparing shipment...'. Don't show success until the saga completes. Don't show failure until compensation completes. The status 'working on it' is honest and builds trust.
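The two rules above (no success before the saga completes, no failure before compensation completes) can be encoded in a small mapping function. This is a sketch: the step names reuse those from the cost-estimation example, while the messages and `customerMessage` itself are hypothetical.

```typescript
// Sketch: derive a customer-facing message from saga progress.
// Step names follow the examples above; messages are illustrative.

type SagaStatus = "RUNNING" | "COMPENSATING" | "COMPLETED" | "COMPENSATED";

interface SagaProgress {
  status: SagaStatus;
  completedSteps: string[]; // in execution order
}

const PROGRESS_MESSAGES: Record<string, string> = {
  CREATE_ORDER: "Order received, checking availability...",
  RESERVE_INVENTORY: "Items reserved, confirming payment...",
  PROCESS_PAYMENT: "Payment confirmed, preparing shipment...",
  SCHEDULE_SHIPPING: "Shipment scheduled, finalizing your order...",
};

function customerMessage(progress: SagaProgress): string {
  switch (progress.status) {
    case "COMPLETED":
      return "Order confirmed!"; // success only after the whole saga finishes
    case "COMPENSATED":
      return "We couldn't complete your order."; // failure only after compensation finishes
    case "COMPENSATING":
      return "Your order is being updated..."; // still "working on it", not yet a failure
    case "RUNNING": {
      const last = progress.completedSteps[progress.completedSteps.length - 1];
      const message = last === undefined ? undefined : PROGRESS_MESSAGES[last];
      return message ?? "Order processing...";
    }
  }
}
```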

Summary: Mastering Sagas

We've explored saga patterns for managing distributed transactions. Let's consolidate the key insights:

Key Takeaways

•Sagas replace distributed transactions — Instead of global ACID, sagas use local transactions with compensating transactions for rollback.
•Choreographed sagas are decentralized — Services react to events and emit success/failure events. Great for loose coupling, harder to understand.
•Orchestrated sagas are centralized — An orchestrator explicitly controls the flow. Easier to understand and modify, but creates coupling.
•Compensation is semantic, not physical — Compensating transactions create new effects that reverse business outcomes. They don't undo database operations.
•Compensation must be idempotent — The same compensation might execute multiple times. Design compensations to handle this gracefully.
•Sagas lack isolation — Intermediate states are visible. Use semantic locking and pessimistic views to manage concurrent access.
•Forward recovery is sometimes better — When compensation is costly and retry likely succeeds, push forward rather than backward.
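Idempotency matters most for compensations that are not naturally idempotent, such as returning stock with an increment: a redelivered "release inventory" message would add the stock back twice. A common remedy is to record which compensations have already run and skip duplicates. A minimal sketch, assuming a hypothetical in-memory `InventoryStore`; in a real service the stock update and the processed-key insert would share one local transaction.

```typescript
// Sketch: make a non-idempotent compensation (returning stock with an
// increment) safe to replay by recording which compensations already ran.
// Names are hypothetical.

interface InventoryStore {
  stock: Map<string, number>;
  processedCompensations: Set<string>;
}

// Returns true if stock was returned, false if this was a duplicate delivery.
function releaseReservation(
  store: InventoryStore,
  sagaId: string,
  items: { productId: string; quantity: number }[],
): boolean {
  const key = `${sagaId}:RELEASE_INVENTORY`;
  if (store.processedCompensations.has(key)) {
    return false; // already compensated: do nothing
  }
  for (const item of items) {
    const current = store.stock.get(item.productId) ?? 0;
    store.stock.set(item.productId, current + item.quantity);
  }
  store.processedCompensations.add(key);
  return true;
}
```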

On the next page, we'll explore the trade-offs between choreography and orchestration in depth, providing decision frameworks for choosing between them and for understanding when each approach is optimal.

You now understand saga patterns for distributed transactions: their structure, choreographed vs orchestrated implementation, compensation design, partial failure handling, and execution guarantees. You can implement sagas that maintain data consistency across services without distributed transactions. Next, we'll examine the trade-offs between coordination approaches.

3 / 5

Loading learning content...

System Design (HLD)Choreography vs Orchestration

Choreography vs Orchestration: Coordination Patterns in Event-Driven Systems

LevelAdvanced

Duration90 mins

TopicChoreography vs Orchestration

3 / 5

Saga Patterns — Distributed Transaction Management

Beyond ACID: Managing Distributed Transactions

But what happens when your operations span multiple services, each with its own database?

What You Will Learn

The Distributed Transaction Problem

Before diving into sagas, let's understand why distributed transactions are fundamentally hard—and why traditional solutions don't work at scale.

The Traditional Solution: Two-Phase Commit (2PC)

In 2PC, a coordinator orchestrates a distributed transaction:

Phase 1 (Prepare): Coordinator asks all participants: "Can you commit?" Each participant locks resources and votes yes/no.
Phase 2 (Commit/Abort): If all voted yes, coordinator says "Commit." Otherwise, "Abort."

Why 2PC Fails at Scale:

Problems with Two-Phase Commit

•Synchronous Blocking — All participants must be available simultaneously. One slow or unavailable participant blocks the entire transaction.
•Lock Contention — Participants hold locks during the entire protocol. Long transactions cause severe performance degradation.
•Single Point of Failure — If the coordinator fails between phases, participants are stuck holding locks, waiting for a decision.
•Not Designed for the Web — 2PC assumes reliable, low-latency networks. The internet is neither.
•Doesn't Work Across Services — Most databases don't support distributed transactions across different database systems or services.

The CAP Theorem Reality:

The Saga Approach:

Execute each step as a local transaction (one service, one database, ACID guaranteed)
If a step fails, execute compensating transactions for all previously successful steps
Accept that between steps, the system is in an intermediate state
Design for eventual convergence to consistent state

Sagas Are Not New

Anatomy of a Saga

Formal Definition:

Each Tᵢ is a local transaction that updates a single service's data
Each Cᵢ is a compensating transaction that semantically undoes Tᵢ
If all transactions succeed: T₁ → T₂ → ... → Tₙ (saga completes)
If Tₖ fails: T₁ → T₂ → ... → Tₖ₋₁ → Cₖ₋₁ → Cₖ₋₂ → ... → C₁ (saga compensates)

Order Processing Saga Example
Step	Transaction (Tᵢ)	Compensating Transaction (Cᵢ)	Service
T₁	Create order in PENDING state	Cancel order, set state to CANCELLED	Order Service
T₂	Reserve inventory for items	Release reserved inventory	Inventory Service
T₃	Charge customer's payment method	Refund the payment	Payment Service
T₄	Schedule shipment	Cancel shipment	Shipping Service
T₅	Update order to CONFIRMED	(No compensation needed)	Order Service

Failure Scenarios:

Scenario 1: Payment Failed (T₃ fails)

T₁ ✓ (Order created)
T₂ ✓ (Inventory reserved)
T₃ ✗ (Payment declined)
Execute: C₂ (Release inventory) → C₁ (Cancel order)
End state: No order, no reservation, no payment

Scenario 2: Shipment Failed (T₄ fails)

T₁ ✓ (Order created)
T₂ ✓ (Inventory reserved)
T₃ ✓ (Payment charged)
T₄ ✗ (Shipping API down)
Execute: C₃ (Refund payment) → C₂ (Release inventory) → C₁ (Cancel order)
End state: Payment refunded, inventory released, order cancelled

Key Insight: Compensation must happen in reverse order. We undo the most recent change first because later transactions may depend on earlier ones.

Compensation Is Not Rollback

Choreographed Sagas

In a choreographed saga, there's no central controller. Each service participates in the saga by listening to events and emitting events. The saga emerges from the distributed event flow.

How It Works:

Service A performs its local transaction and emits an event
Service B hears the event, performs its transaction, emits another event
If Service B fails, it emits a failure event
Service A hears the failure event and performs its compensation

This is pure event-driven choreography applied to the saga pattern.

Choreographed Saga Implementation
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
// Choreographed Saga: Order Processing
 
// ORDER SERVICE
class OrderService {
  async createOrder(command: CreateOrderCommand): Promise<Order> {
    return this.db.transaction(async (tx) => {
      // T₁: Create order in pending state
      const order = await tx.orders.create({
        id: uuid(),
        customerId: command.customerId,
        items: command.items,
        status: 'PENDING',
        createdAt: new Date(),
      });
      
      // Emit event for next step
      await tx.outbox.insert({
        aggregateId: order.id,
        eventType: 'OrderCreated',
        payload: {
          orderId: order.id,
          customerId: command.customerId,
          items: command.items,
          totalAmount: command.totalAmount,
        },
      });
      
      return order;
    });
  }
  
  // Handle failure from downstream services
  async handleInventoryReservationFailed(event: InventoryReservationFailedEvent) {
    return this.db.transaction(async (tx) => {
      // C₁: Compensate by cancelling order
      await tx.orders.update({
        where: { id: event.orderId },
        data: { 
          status: 'CANCELLED',
          cancelReason: 'Inventory not available',
          cancelledAt: new Date(),
        },
      });
      
      await tx.outbox.insert({
        aggregateId: event.orderId,
        eventType: 'OrderCancelled',
        payload: {
          orderId: event.orderId,
          reason: 'Inventory not available',
        },
      });
    });
  }
  
  async handlePaymentFailed(event: PaymentFailedEvent) {
    // Similar compensation logic
  }
}
 
// INVENTORY SERVICE
class InventoryService {
  async handleOrderCreated(event: OrderCreatedEvent) {
    return this.db.transaction(async (tx) => {
      try {
        // T₂: Reserve inventory
        const reservations = [];
        for (const item of event.items) {
          const available = await tx.inventory.findFirst({
            where: { productId: item.productId, quantity: { gte: item.quantity } },
          });
          
          if (!available) {
            throw new InsufficientInventoryError(item.productId);
          }
          
          await tx.inventory.update({
            where: { id: available.id },
            data: { quantity: { decrement: item.quantity } },
          });
          
          reservations.push({
            productId: item.productId,
            quantity: item.quantity,
            warehouseId: available.warehouseId,
          });
        }
        
        const reservation = await tx.reservations.create({
          id: uuid(),
          orderId: event.orderId,
          items: reservations,
          status: 'RESERVED',
          expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000),
        });
        
        // Emit success event
        await tx.outbox.insert({
          aggregateId: reservation.id,
          eventType: 'InventoryReserved',
          payload: {
            orderId: event.orderId,
            reservationId: reservation.id,
            items: reservations,
          },
        });
        
      } catch (error) {
        if (error instanceof InsufficientInventoryError) {
          // Emit failure event - Order Service will compensate T₁
          await tx.outbox.insert({
            aggregateId: event.orderId,
            eventType: 'InventoryReservationFailed',
            payload: {
              orderId: event.orderId,
              reason: 'Insufficient inventory',
              productId: error.productId,
            },
          });
        } else {
          throw error;
        }
      }
    });
  }
  
  // C₂: Handle payment failure - release inventory
  async handlePaymentFailed(event: PaymentFailedEvent) {
    return this.db.transaction(async (tx) => {
      const reservation = await tx.reservations.findUnique({
        where: { orderId: event.orderId },
      });
      
      if (!reservation) return;
      
      // Release each reserved item
      for (const item of reservation.items) {
        await tx.inventory.update({
          where: { productId: item.productId, warehouseId: item.warehouseId },
          data: { quantity: { increment: item.quantity } },
        });
      }
      
      await tx.reservations.update({
        where: { id: reservation.id },
        data: { status: 'RELEASED' },
      });
      
      await tx.outbox.insert({
        aggregateId: reservation.id,
        eventType: 'InventoryReleased',
        payload: {
          orderId: event.orderId,
          reservationId: reservation.id,
        },
      });
    });
  }
}
 
// PAYMENT SERVICE
class PaymentService {
  async handleInventoryReserved(event: InventoryReservedEvent) {
    return this.db.transaction(async (tx) => {
      try {
        // T₃: Charge payment
        const charge = await this.paymentGateway.charge({
          customerId: event.customerId,
          amount: event.totalAmount,
          idempotencyKey: event.orderId,
        });
        
        await tx.payments.create({
          id: charge.id,
          orderId: event.orderId,
          amount: event.totalAmount,
          status: 'COMPLETED',
        });
        
        await tx.outbox.insert({
          aggregateId: charge.id,
          eventType: 'PaymentCompleted',
          payload: {
            orderId: event.orderId,
            paymentId: charge.id,
            amount: event.totalAmount,
          },
        });
        
      } catch (error) {
        // Emit failure - triggers C₂ and C₁
        await tx.outbox.insert({
          aggregateId: event.orderId,
          eventType: 'PaymentFailed',
          payload: {
            orderId: event.orderId,
            reason: error.message,
          },
        });
      }
    });
  }
}

Choreographed Saga Trade-offs

•Pro: Loose Coupling — Services only know about events, not each other. New services can join without changing existing ones.
•Pro: No Single Point of Failure — No central orchestrator to fail. Each service operates independently.
•Pro: Team Autonomy — Each team manages their service's saga participation independently.
•Con: Hard to Understand — The complete saga is distributed across services. Understanding the flow requires examining multiple codebases.
•Con: Cyclic Dependencies Risk — Events can create complex, hard-to-trace dependency graphs.
•Con: Difficult to Add Steps — Adding a new step requires modifying multiple services' event subscriptions.

Orchestrated Sagas

In an orchestrated saga, a central saga orchestrator explicitly coordinates the sequence of local transactions. It knows the complete saga definition and explicitly invokes each step.

How It Works:

Orchestrator invokes Service A
On success, orchestrator invokes Service B
If Service B fails, orchestrator explicitly calls Service A's compensation
Orchestrator tracks saga state throughout

This provides visibility and control at the cost of centralizing saga logic.

Orchestrated Saga Implementation
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
// Orchestrated Saga: Order Processing with Saga Orchestrator
 
interface SagaStep<TData> {
  name: string;
  execute: (data: TData) => Promise<TData>;
  compensate: (data: TData) => Promise<void>;
}
 
class SagaOrchestrator<TData> {
  private steps: SagaStep<TData>[] = [];
  private completedSteps: SagaStep<TData>[] = [];
  
  addStep(step: SagaStep<TData>): this {
    this.steps.push(step);
    return this;
  }
  
  async execute(initialData: TData): Promise<SagaResult<TData>> {
    let data = initialData;
    
    for (const step of this.steps) {
      try {
        console.log(`Executing step: ${step.name}`);
        data = await step.execute(data);
        this.completedSteps.push(step);
      } catch (error) {
        console.log(`Step ${step.name} failed: ${error.message}`);
        await this.compensate(data);
        return { success: false, error, data };
      }
    }
    
    return { success: true, data };
  }
  
  private async compensate(data: TData): Promise<void> {
    // Compensate in reverse order
    for (const step of this.completedSteps.reverse()) {
      try {
        console.log(`Compensating step: ${step.name}`);
        await step.compensate(data);
      } catch (error) {
        console.error(`Compensation failed for ${step.name}: ${error.message}`);
        // Log and continue - compensation must be best-effort
        await this.alertCompensationFailure(step, error);
      }
    }
  }
}
 
// Order Processing Saga Definition
class OrderProcessingSaga {
  constructor(
    private readonly orderService: OrderServiceClient,
    private readonly inventoryService: InventoryServiceClient,
    private readonly paymentService: PaymentServiceClient,
    private readonly shippingService: ShippingServiceClient,
    private readonly stateStore: SagaStateStore,
  ) {}
  
  async execute(input: CreateOrderInput): Promise<OrderResult> {
    const sagaId = uuid();
    
    // Initialize saga state
    let sagaData: OrderSagaData = {
      sagaId,
      orderId: uuid(),
      customerId: input.customerId,
      items: input.items,
      totalAmount: input.totalAmount,
      shippingAddress: input.shippingAddress,
    };
    
    await this.stateStore.create({
      sagaId,
      type: 'OrderProcessing',
      status: 'RUNNING',
      data: sagaData,
      currentStep: 'CREATE_ORDER',
      startedAt: new Date(),
    });
    
    const saga = new SagaOrchestrator<OrderSagaData>()
      // Step 1: Create Order
      .addStep({
        name: 'CREATE_ORDER',
        execute: async (data) => {
          const order = await this.orderService.create({
            orderId: data.orderId,
            customerId: data.customerId,
            items: data.items,
            status: 'PENDING',
          });
          
          await this.updateSagaState(data.sagaId, 'CREATE_ORDER', 'COMPLETED');
          return { ...data, order };
        },
        compensate: async (data) => {
          await this.orderService.cancel({
            orderId: data.orderId,
            reason: 'Saga compensation',
          });
          await this.updateSagaState(data.sagaId, 'CREATE_ORDER', 'COMPENSATED');
        },
      })
      // Step 2: Reserve Inventory
      .addStep({
        name: 'RESERVE_INVENTORY',
        execute: async (data) => {
          const reservation = await this.inventoryService.reserve({
            orderId: data.orderId,
            items: data.items,
          });
          
          await this.updateSagaState(data.sagaId, 'RESERVE_INVENTORY', 'COMPLETED');
          return { ...data, reservation };
        },
        compensate: async (data) => {
          if (data.reservation) {
            await this.inventoryService.release({
              reservationId: data.reservation.id,
            });
          }
          await this.updateSagaState(data.sagaId, 'RESERVE_INVENTORY', 'COMPENSATED');
        },
      })
      // Step 3: Process Payment
      .addStep({
        name: 'PROCESS_PAYMENT',
        execute: async (data) => {
          const payment = await this.paymentService.charge({
            orderId: data.orderId,
            customerId: data.customerId,
            amount: data.totalAmount,
            idempotencyKey: `${data.sagaId}-payment`,
          });
          
          await this.updateSagaState(data.sagaId, 'PROCESS_PAYMENT', 'COMPLETED');
          return { ...data, payment };
        },
        compensate: async (data) => {
          if (data.payment) {
            await this.paymentService.refund({
              paymentId: data.payment.id,
              amount: data.totalAmount,
              reason: 'Order cancelled - saga compensation',
            });
          }
          await this.updateSagaState(data.sagaId, 'PROCESS_PAYMENT', 'COMPENSATED');
        },
      })
      // Step 4: Schedule Shipping
      .addStep({
        name: 'SCHEDULE_SHIPPING',
        execute: async (data) => {
          const shipment = await this.shippingService.schedule({
            orderId: data.orderId,
            reservationId: data.reservation.id,
            destination: data.shippingAddress,
          });
          
          await this.updateSagaState(data.sagaId, 'SCHEDULE_SHIPPING', 'COMPLETED');
          return { ...data, shipment };
        },
        compensate: async (data) => {
          if (data.shipment) {
            await this.shippingService.cancel({
              shipmentId: data.shipment.id,
            });
          }
          await this.updateSagaState(data.sagaId, 'SCHEDULE_SHIPPING', 'COMPENSATED');
        },
      })
      // Step 5: Confirm Order (no compensation needed - final step)
      .addStep({
        name: 'CONFIRM_ORDER',
        execute: async (data) => {
          await this.orderService.confirm({
            orderId: data.orderId,
            paymentId: data.payment.id,
            shipmentId: data.shipment.id,
          });
          
          await this.updateSagaState(data.sagaId, 'CONFIRM_ORDER', 'COMPLETED');
          await this.stateStore.update(data.sagaId, { status: 'COMPLETED' });
          return data;
        },
        compensate: async () => {
          // No compensation for final step
        },
      });
    
    // Execute the saga
    const result = await saga.execute(sagaData);
    
    if (!result.success) {
      await this.stateStore.update(sagaId, {
        status: 'COMPENSATED',
        error: result.error?.message,
      });
    }
    
    return {
      success: result.success,
      orderId: result.data?.orderId,
      trackingNumber: result.data?.shipment?.trackingNumber,
    };
  }
}

Orchestrated Saga Trade-offs

•Pro: Clear Saga Definition — The complete saga is visible in one place. Easy to understand, debug, and modify.
•Pro: Centralized Error Handling — All compensation logic is in the orchestrator. No distributed error event choreography.
•Pro: Easy to Add Steps — Adding a new step requires changing only the orchestrator, not multiple services.
•Pro: Simpler Services — Services don't need to know about the saga; they just execute commands and return results.
•Con: Coupling to Orchestrator — The orchestrator knows all services. Changes to any service might require orchestrator updates.
•Con: Potential Bottleneck — All sagas flow through the orchestrator, though this is usually not a performance issue.
•Con: Single Point of Knowledge — If the orchestrator dies, saga state and logic must be recoverable.
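To make the centralized error handling concrete, here is a minimal sketch of the loop an orchestrator runs: execute steps in order and, on the first failure, run the compensations of the already-completed steps in reverse. `OrchestratedStep` and `runSaga` are hypothetical names, not the builder API used in the order saga above, and a production orchestrator would also persist state after each step so it can resume after a crash.

```typescript
// Hypothetical minimal orchestrator: in-memory, no persistence or retries.
interface OrchestratedStep<T> {
  name: string;
  execute: (data: T) => Promise<T>;
  compensate: (data: T) => Promise<void>;
}

interface SagaResult<T> {
  success: boolean;
  data: T;
  failedStep?: string;
  compensated: string[]; // steps whose compensation ran, in the order it ran
}

async function runSaga<T>(steps: OrchestratedStep<T>[], initial: T): Promise<SagaResult<T>> {
  // Each completed step is stored with the data it produced, for its compensation.
  const completed: { step: OrchestratedStep<T>; data: T }[] = [];
  let data = initial;

  for (const step of steps) {
    try {
      data = await step.execute(data);
      completed.push({ step, data });
    } catch {
      // Centralized error handling: undo completed steps in reverse order.
      // If a compensation itself throws, the error propagates to the caller.
      const compensated: string[] = [];
      for (const { step: done, data: doneData } of [...completed].reverse()) {
        await done.compensate(doneData);
        compensated.push(done.name);
      }
      return { success: false, data, failedStep: step.name, compensated };
    }
  }
  return { success: true, data, compensated: [] };
}

// Usage: the third step fails, so payment and inventory are compensated.
const log: string[] = [];
const steps: OrchestratedStep<{ orderId: string }>[] = [
  {
    name: "RESERVE_INVENTORY",
    execute: async (d) => { log.push("reserve"); return d; },
    compensate: async () => { log.push("release"); },
  },
  {
    name: "PROCESS_PAYMENT",
    execute: async (d) => { log.push("charge"); return d; },
    compensate: async () => { log.push("refund"); },
  },
  {
    name: "SCHEDULE_SHIPPING",
    execute: async () => { throw new Error("carrier unavailable"); },
    compensate: async () => { log.push("cancel shipment"); },
  },
];

runSaga(steps, { orderId: "o-1" }).then((result) => console.log(result, log));
```

With these sample steps, the shipping failure triggers the refund and then the inventory release; the shipping step's own compensation never runs because that step never completed.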

Choose Based on Complexity

Designing Compensating Transactions

Compensating transactions are the heart of saga correctness. A poorly designed compensation can leave the system in an inconsistent state—worse than having no saga at all.

Properties of Good Compensating Transactions:

1. Semantic Reversal, Not Logical Undo: Compensation creates new operations that reverse the business effect. A refund doesn't remove a charge; it's a new credit transaction.

2. Idempotent: Compensation might be attempted multiple times. It must produce the same result whether executed once or many times.

3. Commutative with the Original: The outcome shouldn't depend on whether the compensation is applied before or after other effects of the original step have been observed (for example, notifications already sent). A customer seeing both a charge and a refund on the same statement is acceptable.

4. Resilient: Compensation must succeed even if the system state has changed. If an order is already cancelled, cancelling it again should succeed (idempotently) or gracefully recognize the state.

Well-Designed Compensating Transactions
TypeScript
// Examples of well-designed compensating transactions
 
// INVENTORY COMPENSATION
class InventoryService {
  /**
   * Release reserved inventory
   * 
   * Designed for: Idempotency, Resilience, Auditability
   */
  async releaseReservation(reservationId: string): Promise<void> {
    return this.db.transaction(async (tx) => {
      // Check current state - idempotent handling
      const reservation = await tx.reservations.findUnique({
        where: { id: reservationId }
      });
      
      // Already released or never existed - succeed idempotently
      if (!reservation || reservation.status === 'RELEASED') {
        console.log(`Reservation ${reservationId} already released or not found`);
        return;
      }
      
      // Already expired - no action needed
      if (reservation.status === 'EXPIRED') {
        console.log(`Reservation ${reservationId} already expired`);
        return;
      }
      
      // Release each item back to inventory
      for (const item of reservation.items) {
        await tx.inventory.update({
          where: { 
            productId_warehouseId: {
              productId: item.productId, 
              warehouseId: item.warehouseId 
            }
          },
          data: { 
            availableQuantity: { increment: item.quantity },
            reservedQuantity: { decrement: item.quantity },
          },
        });
      }
      
      // Mark reservation as released
      await tx.reservations.update({
        where: { id: reservationId },
        data: { 
          status: 'RELEASED',
          releasedAt: new Date(),
          releaseReason: 'saga_compensation',
        },
      });
      
      // Audit trail
      await tx.inventoryAudit.create({
        data: {
          action: 'RESERVATION_RELEASED',
          reservationId,
          items: reservation.items,
          reason: 'saga_compensation',
          timestamp: new Date(),
        },
      });
    });
  }
}
 
// PAYMENT COMPENSATION
class PaymentService {
  /**
   * Refund a payment
   * 
   * Handles: Partial refunds, already-refunded, external gateway idempotency
   */
  async refundPayment(input: RefundInput): Promise<RefundResult> {
    return this.db.transaction(async (tx) => {
      const payment = await tx.payments.findUnique({
        where: { id: input.paymentId }
      });
      
      // Payment doesn't exist - succeed idempotently
      if (!payment) {
        return { status: 'NOT_FOUND', refundId: null };
      }
      
      // Already fully refunded - succeed idempotently
      if (payment.status === 'REFUNDED') {
        return { 
          status: 'ALREADY_REFUNDED', 
          refundId: payment.lastRefundId 
        };
      }
      
      // Calculate refundable amount (`??`, not `||`, so an explicit amount of 0
      // is not silently turned into a full refund)
      const refundableAmount = payment.amount - (payment.refundedAmount || 0);
      const refundAmount = Math.min(input.amount ?? refundableAmount, refundableAmount);
      
      if (refundAmount <= 0) {
        return { status: 'NOTHING_TO_REFUND', refundId: null };
      }
      
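      // Call payment gateway with idempotency key. Note: this external call runs inside
      // the DB transaction. If the transaction later rolls back, a retry reuses the same
      // key, so a gateway that honors idempotency keys returns the original refund
      // instead of issuing a second one.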
      const refund = await this.gateway.refund({
        originalTransactionId: payment.transactionId,
        amount: refundAmount,
        idempotencyKey: `refund-${payment.id}-${input.sagaId || 'manual'}`,
        reason: input.reason,
      });
      
      // Update payment record
      const isFullyRefunded = (payment.refundedAmount || 0) + refundAmount >= payment.amount;
      
      await tx.payments.update({
        where: { id: payment.id },
        data: {
          status: isFullyRefunded ? 'REFUNDED' : 'PARTIALLY_REFUNDED',
          refundedAmount: { increment: refundAmount },
          lastRefundId: refund.id,
          lastRefundAt: new Date(),
        },
      });
      
      // Create refund record
      await tx.refunds.create({
        data: {
          id: refund.id,
          paymentId: payment.id,
          amount: refundAmount,
          reason: input.reason,
          gatewayRefundId: refund.gatewayId,
          status: 'COMPLETED',
        },
      });
      
      return { status: 'REFUNDED', refundId: refund.id };
    });
  }
}
 
// ORDER COMPENSATION
class OrderService {
  /**
   * Cancel an order as compensation
   * 
   * Handles: Already cancelled, shipped orders, customer communications
   */
  async cancelOrder(input: CancelOrderInput): Promise<CancelResult> {
    return this.db.transaction(async (tx) => {
      const order = await tx.orders.findUnique({
        where: { id: input.orderId }
      });
      
      if (!order) {
        return { status: 'NOT_FOUND' };
      }
      
      // Already cancelled - succeed idempotently
      if (order.status === 'CANCELLED') {
        return { status: 'ALREADY_CANCELLED' };
      }
      
      // Order already shipped - cannot cancel
      if (order.status === 'SHIPPED' || order.status === 'DELIVERED') {
        return { 
          status: 'CANNOT_CANCEL',
          reason: 'Order already shipped/delivered',
        };
      }
      
      await tx.orders.update({
        where: { id: order.id },
        data: {
          status: 'CANCELLED',
          cancelledAt: new Date(),
          cancelReason: input.reason,
          cancelledBy: 'saga_compensation',
        },
      });
      
      await tx.orderHistory.create({
        data: {
          orderId: order.id,
          previousStatus: order.status,
          newStatus: 'CANCELLED',
          reason: input.reason,
          source: 'saga_compensation',
          timestamp: new Date(),
        },
      });
      
      // CONFIRMED orders: the customer was already notified, so tell them about the
      // cancellation. Sent after the writes so a failed update never notifies; an
      // outbox table would also avoid notifying if the commit itself fails.
      if (order.status === 'CONFIRMED') {
        await this.notificationService.sendOrderCancellation({
          orderId: order.id,
          customerId: order.customerId,
          reason: input.reason,
        });
      }
      
      return { status: 'CANCELLED' };
    });
  }
}
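Semantic reversal and idempotency can also be seen in isolation with a toy in-memory ledger. This is a hypothetical sketch, not the `PaymentService` above: a refund is appended as a new credit entry (the original charge stays in the history), and repeating a refund with the same idempotency key returns the first refund instead of crediting twice.

```typescript
// Hypothetical in-memory ledger illustrating semantic reversal + idempotency.
type LedgerEntry = { id: string; kind: "CHARGE" | "REFUND"; amountCents: number };

class Ledger {
  private entries: LedgerEntry[] = [];
  private refundsByKey = new Map<string, LedgerEntry>();
  private nextId = 1;

  charge(amountCents: number): LedgerEntry {
    const entry: LedgerEntry = { id: `e${this.nextId++}`, kind: "CHARGE", amountCents };
    this.entries.push(entry);
    return entry;
  }

  // Compensation: append a credit instead of deleting the original charge.
  refund(idempotencyKey: string, amountCents: number): LedgerEntry {
    const previous = this.refundsByKey.get(idempotencyKey);
    if (previous) return previous; // retry of the same compensation: no new credit
    const entry: LedgerEntry = { id: `e${this.nextId++}`, kind: "REFUND", amountCents: -amountCents };
    this.entries.push(entry);
    this.refundsByKey.set(idempotencyKey, entry);
    return entry;
  }

  balanceCents(): number {
    return this.entries.reduce((sum, e) => sum + e.amountCents, 0);
  }

  history(): readonly LedgerEntry[] {
    return this.entries;
  }
}

const ledger = new Ledger();
ledger.charge(5000);
ledger.refund("saga-42-refund", 5000);
ledger.refund("saga-42-refund", 5000); // e.g. the orchestrator retried after a timeout
console.log(ledger.balanceCents(), ledger.history().length);
```

After the duplicate refund, the balance is back to 0 and the history holds exactly two entries: the charge and one refund.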

The Compensation Cannot Fail Problem

Handling Partial Failures

The most complex saga scenarios involve partial failures—situations where you can't simply compensate and walk away. Real-world systems face nuanced cases that require careful design.

Scenario 1: Compensation Partially Succeeds

Your saga is compensating, but one of the compensation steps fails:

Order cancellation succeeded
Inventory release succeeded
Payment refund failed (gateway timeout)

The system is now inconsistent: the customer has been charged, but the order is cancelled and the reserved inventory released.

Handling Compensation Failures
TypeScript
// Robust compensation with failure handling
 
class SagaCompensationManager {
  async compensate(
    sagaId: string,
    completedSteps: CompletedStep[],
    failureReason: string,
  ): Promise<CompensationResult> {
    const compensationAttempts: CompensationAttempt[] = [];
    const failedCompensations: FailedCompensation[] = [];
    
    // Compensate in reverse order (copy first: reverse() mutates the caller's array)
    for (const step of [...completedSteps].reverse()) {
      const attempt = await this.compensateStep(step, sagaId);
      compensationAttempts.push(attempt);
      
      if (!attempt.success) {
        failedCompensations.push({
          step: step.name,
          error: attempt.error,
          willRetry: attempt.retryable,
        });
        
        if (attempt.retryable) {
          // Schedule async retry
          await this.scheduleCompensationRetry(sagaId, step, attempt.retryCount);
        } else {
          // Non-retryable: alert for manual intervention
          await this.alertManualIntervention(sagaId, step, attempt.error);
        }
      }
    }
    
    // Determine overall compensation status
    if (failedCompensations.length === 0) {
      return { status: 'FULLY_COMPENSATED', attempts: compensationAttempts };
    } else if (failedCompensations.every(f => f.willRetry)) {
      return { status: 'COMPENSATING', attempts: compensationAttempts, pending: failedCompensations };
    } else {
      return { status: 'REQUIRES_INTERVENTION', attempts: compensationAttempts, failed: failedCompensations };
    }
  }
  
  private async compensateStep(
    step: CompletedStep,
    sagaId: string,
    retryCount: number = 0,
  ): Promise<CompensationAttempt> {
    const maxRetries = 5;
    const baseDelay = 1000; // 1 second
    
    try {
      await step.compensation.execute(step.data);
      
      return {
        step: step.name,
        success: true,
        timestamp: new Date(),
      };
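    } catch (error) {
      // catch variables are `unknown` under strict TypeScript; normalize to Error
      const err = error instanceof Error ? error : new Error(String(error));
      const retryable = this.isRetryableError(err) && retryCount < maxRetries;
      
      return {
        step: step.name,
        success: false,
        error: err.message,
        retryable,
        retryCount: retryCount + 1,
        // Only schedule a retry when one will actually happen
        nextRetryAt: retryable
          ? new Date(Date.now() + baseDelay * Math.pow(2, retryCount))
          : undefined,
        timestamp: new Date(),
      };
    }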
  }
  
  private isRetryableError(error: Error): boolean {
    // Network errors, timeouts, rate limits are retryable
    return (
      error instanceof NetworkError ||
      error instanceof TimeoutError ||
      error instanceof RateLimitError ||
      (error instanceof HttpError && error.status >= 500)
    );
  }
  
  // Background job to retry failed compensations
  async processCompensationRetries(): Promise<void> {
    const pending = await this.stateStore.findPendingCompensations();
    
    for (const compensation of pending) {
      if (compensation.nextRetryAt <= new Date()) {
        const result = await this.compensateStep(
          compensation.step,
          compensation.sagaId,
          compensation.retryCount,
        );
        
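        if (result.success) {
          await this.stateStore.markCompensationComplete(compensation.id);
        } else if (!result.retryable) {
          // Stop retrying this record before escalating, so it isn't re-alerted on every run
          await this.stateStore.markCompensationFailed(compensation.id);
          await this.alertManualIntervention(compensation.sagaId, compensation.step, result.error);
        } else {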
          await this.stateStore.updateNextRetry(
            compensation.id,
            result.nextRetryAt,
            result.retryCount,
          );
        }
      }
    }
  }
}
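The retry timing in `compensateStep` is plain exponential backoff: with `baseDelay = 1000` and `maxRetries = 5`, successive retries are scheduled 1, 2, 4, 8, and 16 seconds after each failure. A standalone sketch of that schedule follows. The cap and the "full jitter" variant are additions not present in the code above, included because many systems randomize delays so that compensations that failed at the same moment do not all retry in lockstep.

```typescript
// Sketch of the backoff schedule used by compensateStep, plus an assumed cap.
function backoffDelayMs(retryCount: number, baseDelayMs = 1000, maxDelayMs = 60_000): number {
  // Exponential growth: base * 2^retryCount, capped so delays cannot grow without bound.
  return Math.min(baseDelayMs * 2 ** retryCount, maxDelayMs);
}

// "Full jitter" (an assumption, not in the original): uniform random delay in [0, backoff).
function jitteredDelayMs(retryCount: number, random: () => number = Math.random): number {
  return Math.floor(random() * backoffDelayMs(retryCount));
}

const schedule = [0, 1, 2, 3, 4].map((n) => backoffDelayMs(n));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 16000 ]
```

The `random` parameter is injectable so the jittered variant can be tested deterministically.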

Scenario 2: Forward Recovery

Sometimes, instead of compensating backward, it's better to push forward. If payment and inventory succeeded but shipping failed temporarily, should you refund everything or retry shipping?

Forward Recovery retries the failing step instead of compensating. Use it when:

The failure is likely transient
Compensation has high cost (payment processing fees)
Business rules prefer completion over cancellation
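A minimal sketch of forward recovery: retry the failing step a bounded number of times while the error looks transient, and report failure only when retries are exhausted or the error is permanent, at which point the caller falls back to compensation. `TransientError`, `retryForward`, and the zero-delay default are assumptions for illustration; in practice the delay would follow a backoff schedule.

```typescript
// Hypothetical marker for errors worth retrying (timeouts, 5xx, rate limits).
class TransientError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TransientError";
    Object.setPrototypeOf(this, TransientError.prototype); // keeps instanceof working on ES5 targets
  }
}

type ForwardOutcome<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; error: unknown; attempts: number };

async function retryForward<T>(
  step: () => Promise<T>,
  maxAttempts: number,
  delayMs: (attempt: number) => number = () => 0,
): Promise<ForwardOutcome<T>> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: await step(), attempts: attempt };
    } catch (error) {
      lastError = error;
      // Permanent errors (validation, business rules) are not retried: compensate instead.
      if (!(error instanceof TransientError)) {
        return { ok: false, error, attempts: attempt };
      }
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs(attempt)));
      }
    }
  }
  return { ok: false, error: lastError, attempts: maxAttempts };
}

// Usage: a simulated carrier call that times out twice, then succeeds.
let calls = 0;
const flakyShipping = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new TransientError("carrier timeout");
  return "TRACK-123";
};
retryForward(flakyShipping, 5).then((outcome) => console.log(outcome));
```

Here the simulated carrier call succeeds on its third attempt, so no compensation is needed; a permanent error would instead return right after the first attempt.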

Forward Recovery Decision
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
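class SagaRecoveryDecider {
  // Illustrative threshold (currency units) above which extra forward retries are preferred
  private readonly compensationCostThreshold = 10;
  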
  decideRecoveryStrategy(
    saga: Saga,
    failedStep: SagaStep,
    error: Error,
  ): RecoveryStrategy {
    // Check if forward recovery is possible
    if (this.canRetryForward(failedStep, error)) {
      const retryCount = saga.getRetryCount(failedStep.name);
      
      if (retryCount < failedStep.maxRetries) {
        return {
          type: 'RETRY_FORWARD',
          delay: this.calculateRetryDelay(retryCount, failedStep),
        };
      }
    }
    
    // Check business rules for direction
    const completedSteps = saga.getCompletedSteps();
    const compensationCost = this.estimateCompensationCost(completedSteps);
    const retryProbability = this.estimateRetrySuccessProbability(failedStep, error);
    
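    // If compensation is expensive and a retry is likely to succeed, retry more
    // (but never for errors classified as non-retryable above)
    if (
      this.canRetryForward(failedStep, error) &&
      compensationCost > this.compensationCostThreshold &&
      retryProbability > 0.5
    ) {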
      return {
        type: 'RETRY_FORWARD',
        delay: this.calculateExtendedRetryDelay(saga),
        maxAdditionalRetries: 3,
      };
    }
    
    // Otherwise, compensate completed steps in reverse order (copy: reverse() mutates)
    return {
      type: 'COMPENSATE_BACKWARD',
      steps: [...completedSteps].reverse(),
    };
  }
  
  private canRetryForward(step: SagaStep, error: Error): boolean {
    // Some errors should never be retried
    if (error instanceof ValidationError) return false;
    if (error instanceof BusinessRuleViolation) return false;
    if (error instanceof InsufficientFundsError) return false;
    
    // Everything else is treated as potentially transient and retryable
    return true;
  }
  
  private estimateCompensationCost(steps: CompletedStep[]): number {
    return steps.reduce((cost, step) => {
      // Payment refunds have fees
      if (step.name === 'PROCESS_PAYMENT') {
        cost += step.data.amount * 0.03; // ~3% payment processing fee
      }
      // Inventory release is cheap
      // Shipping cancellation might have fees
      if (step.name === 'SCHEDULE_SHIPPING' && step.data.shipment?.status === 'LABEL_PRINTED') {
        cost += 5; // Label printing cost
      }
      return cost;
    }, 0);
  }
}
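As a worked example of `estimateCompensationCost`, take a $200 order that fails after its shipping label has been printed: the payment term contributes 3% of $200 = $6 and the label $5, so compensating backward costs about $11. The sketch below redoes that arithmetic in integer cents to avoid floating-point rounding. The 3% fee and $5 label cost are the illustrative figures from the code above, not real processor pricing, and the `CompletedStepInfo` shape is an assumption.

```typescript
// Assumed, simplified view of a completed step (not the CompletedStep type above).
interface CompletedStepInfo {
  name: string;
  amountCents?: number;    // set for PROCESS_PAYMENT
  shipmentStatus?: string; // set for SCHEDULE_SHIPPING
}

const PAYMENT_FEE_BASIS_POINTS = 300; // 3%, illustrative
const LABEL_COST_CENTS = 500;         // $5, illustrative

function estimateCompensationCostCents(steps: CompletedStepInfo[]): number {
  return steps.reduce((cost, step) => {
    if (step.name === "PROCESS_PAYMENT" && step.amountCents !== undefined) {
      // Fee in basis points, rounded to whole cents.
      cost += Math.round((step.amountCents * PAYMENT_FEE_BASIS_POINTS) / 10_000);
    }
    if (step.name === "SCHEDULE_SHIPPING" && step.shipmentStatus === "LABEL_PRINTED") {
      cost += LABEL_COST_CENTS;
    }
    return cost; // inventory release contributes nothing
  }, 0);
}

const cost = estimateCompensationCostCents([
  { name: "RESERVE_INVENTORY" },
  { name: "PROCESS_PAYMENT", amountCents: 20_000 },
  { name: "SCHEDULE_SHIPPING", shipmentStatus: "LABEL_PRINTED" },
]);
console.log(cost); // 1100
```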

Semantic Lock Anti-Pattern Caution

Saga Execution Guarantees

Understanding exactly what guarantees sagas provide (and don't provide) is essential for correct system design.

What Sagas Guarantee:

1. Eventual Consistency: The system eventually reaches a consistent state in which either all steps have completed or every completed step has been compensated. This assumes failed compensations are eventually retried to success or resolved manually.

2. Atomicity of Local Transactions: Each step is atomic within its service's database. ACID properties apply per step.

3. Ordered Execution: Steps execute in a defined order. Compensation follows the reverse order.

What Sagas Do NOT Guarantee:

Saga Limitations

•No Isolation — Other transactions may see intermediate states. A user might query their account during the saga and see an inconsistent balance.
•No Global Atomicity — During execution, some services show changes while others don't yet. It's possible to read 'payment succeeded' while inventory reservation is still pending.
•No Rollback — Compensation creates new transactions; original transactions remain in history. Audit logs will show both the original and compensation.
•No Guaranteed Timing — A saga might take seconds or hours depending on service availability and retry policies.

Handling Lack of Isolation:

The 'dirty reads' problem in sagas requires explicit design:

Countermeasure 1: Semantic Locking. Mark resources as 'in progress' to prevent conflicting operations. Other sagas wait or fail when they encounter a locked resource.

Countermeasure 2: Commutative Operations. Design operations so that order doesn't matter. 'Add $10' followed by 'Subtract $5' equals 'Subtract $5' followed by 'Add $10'.

Countermeasure 3: Pessimistic Views. Expose a conservative view of in-flight data: don't show 'order confirmed' until the saga completes; show 'order processing' during intermediate states.

Countermeasure 4: Reread Values. Before compensating (or updating), re-verify the current state. It might already have been corrected by another process.
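Countermeasures 2 and 4 can be illustrated with a toy in-memory account (a hypothetical sketch, not a real API). Applying deltas is commutative, so a saga's debit and a concurrent credit give the same balance in either order, whereas "set balance to X" writes are not. The version check shows rereading before writing: an update computed from a stale read is rejected rather than silently overwriting newer data.

```typescript
// Toy account contrasting commutative deltas with version-checked absolute writes.
class Account {
  balance: number;
  version = 0;

  constructor(balance: number) {
    this.balance = balance;
  }

  // Commutative: the final balance does not depend on the order of deltas.
  applyDelta(delta: number): void {
    this.balance += delta;
    this.version++;
  }

  // Reread-value check: only write if nobody changed the account since we read it.
  setIfVersion(expectedVersion: number, newBalance: number): boolean {
    if (this.version !== expectedVersion) return false; // stale read: caller must reread
    this.balance = newBalance;
    this.version++;
    return true;
  }
}

const a = new Account(100);
a.applyDelta(+10);
a.applyDelta(-5);

const b = new Account(100);
b.applyDelta(-5);
b.applyDelta(+10);
console.log(a.balance === b.balance); // true (both 105)

// Two processes read version 0, then both try to write an absolute value.
const c = new Account(100);
const readVersion = c.version;
const first = c.setIfVersion(readVersion, 110); // succeeds
const second = c.setIfVersion(readVersion, 95); // rejected: version moved on
console.log(first, second, c.balance); // true false 110
```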

Handling Saga Isolation Issues
TypeScript
// Semantic Locking for Saga Isolation
 
class InventoryWithSagaLocking {
  async reserveWithLock(input: ReserveInput): Promise<Reservation> {
    return this.db.transaction(async (tx) => {
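      // Acquire semantic lock on items.
      // Caution: read-then-write is racy under READ COMMITTED; lock the row
      // (SELECT ... FOR UPDATE) or use a conditional update so two sagas
      // can't both see the product as unlocked.
      for (const item of input.items) {
        const product = await tx.products.findUnique({
          where: { id: item.productId },
        });
        
        if (!product) {
          throw new Error(`Product ${item.productId} not found`);
        }
        
        // Check for existing saga lock
        if (product.lockedBySagaId && product.lockedBySagaId !== input.sagaId) {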
          // Another saga is working on this item
          const lockAge = Date.now() - product.lockedAt.getTime();
          
          if (lockAge < 5 * 60 * 1000) { // Lock is fresh (< 5 min)
            throw new ResourceLockedError(
              `Product ${item.productId} is locked by saga ${product.lockedBySagaId}`
            );
          } else {
            // Lock is stale - previous saga likely failed
            // Take over the lock
            console.warn(`Taking over stale lock on ${item.productId}`);
          }
        }
        
        // Acquire lock for this saga
        await tx.products.update({
          where: { id: item.productId },
          data: {
            lockedBySagaId: input.sagaId,
            lockedAt: new Date(),
          },
        });
      }
      
      // Now safe to reserve
      const reservation = await this.createReservation(tx, input);
      
      return reservation;
    });
  }
  
  async releaseWithLock(reservationId: string, sagaId: string): Promise<void> {
    return this.db.transaction(async (tx) => {
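      const reservation = await tx.reservations.findUnique({
        where: { id: reservationId },
      });
      
      // Nothing to release (already cleaned up) - succeed idempotently
      if (!reservation) return;
      
      // Release inventory
      await this.doRelease(tx, reservation);
      
      // Release semantic locks
      for (const item of reservation.items) {
        // updateMany is a no-op (rather than an error) if another saga now holds the lock
        await tx.products.updateMany({
          where: { 
            id: item.productId,
            lockedBySagaId: sagaId, // Only release our own lock
          },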
          data: {
            lockedBySagaId: null,
            lockedAt: null,
          },
        });
      }
    });
  }
}
 
// Pessimistic View - Show uncertain state to users
class OrderStatusService {
  async getOrderStatus(orderId: string): Promise<OrderStatusView> {
    const saga = await this.sagaStore.findByOrderId(orderId);
    
    if (!saga) {
      // No saga running - show actual order status
      return this.getActualStatus(orderId);
    }
    
    // A saga exists for this order - map its state to a conservative, user-facing view
    switch (saga.status) {
      case 'RUNNING':
        return {
          status: 'PROCESSING',
          message: 'Your order is being processed',
          approximateWait: '2-5 minutes',
        };
      case 'COMPENSATING':
        return {
          status: 'PROCESSING',
          message: 'Your order is being updated',
          approximateWait: '1-3 minutes',
        };
      case 'REQUIRES_INTERVENTION':
        return {
          status: 'UNDER_REVIEW',
          message: 'Your order requires manual review',
          supportContact: true,
        };
      case 'COMPLETED':
        return this.getActualStatus(orderId);
      case 'COMPENSATED':
        return {
          status: 'CANCELLED',
          message: 'Your order could not be completed',
          reason: saga.compensationReason,
        };
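      default:
        // Unrecognized saga state - fall back to the stored order status
        return this.getActualStatus(orderId);
    }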
  }
}
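The lock rules in `reserveWithLock` (reject a fresh lock held by another saga, take over a stale one, release only your own) can be isolated in a small in-memory sketch with an injectable clock. `SemanticLockTable` and the five-minute TTL default are assumptions mirroring the example above. In a real database, acquisition must be a single conditional update (or a row locked with `SELECT ... FOR UPDATE`) so two sagas cannot both read "unlocked" and both write; the map below avoids that race only because each call runs to completion on a single thread.

```typescript
type LockResult = { acquired: true; tookOverFrom?: string } | { acquired: false; heldBy: string };

// Hypothetical in-memory semantic lock table with TTL-based stale-lock takeover.
class SemanticLockTable {
  private locks = new Map<string, { sagaId: string; lockedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  acquire(resourceId: string, sagaId: string): LockResult {
    const existing = this.locks.get(resourceId);
    let tookOverFrom: string | undefined;
    if (existing && existing.sagaId !== sagaId) {
      const age = this.now() - existing.lockedAt;
      if (age < this.ttlMs) {
        return { acquired: false, heldBy: existing.sagaId }; // fresh lock: back off
      }
      tookOverFrom = existing.sagaId; // stale lock: previous saga presumed dead
    }
    this.locks.set(resourceId, { sagaId, lockedAt: this.now() });
    return tookOverFrom ? { acquired: true, tookOverFrom } : { acquired: true };
  }

  // Release only if we still own the lock (it may have been taken over).
  release(resourceId: string, sagaId: string): boolean {
    const existing = this.locks.get(resourceId);
    if (!existing || existing.sagaId !== sagaId) return false;
    this.locks.delete(resourceId);
    return true;
  }
}

// Usage with a fake clock:
let clock = 0;
const table = new SemanticLockTable(5 * 60 * 1000, () => clock);
table.acquire("sku-1", "saga-A"); // acquired
table.acquire("sku-1", "saga-B"); // rejected, held by saga-A
clock += 6 * 60 * 1000;
table.acquire("sku-1", "saga-B"); // acquired, took over from saga-A
table.release("sku-1", "saga-A"); // false: saga-A no longer owns it
```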

Design for Intermediate Visibility

Summary: Mastering Sagas

We've explored saga patterns for managing distributed transactions. Let's consolidate the key insights:

Key Takeaways

•Sagas replace distributed transactions: Instead of global ACID, sagas use local transactions plus compensating transactions that semantically undo completed steps.
•Choreographed sagas are decentralized — Services react to events and emit success/failure events. Great for loose coupling, harder to understand.
•Orchestrated sagas are centralized — An orchestrator explicitly controls the flow. Easier to understand and modify, but creates coupling.
•Compensation is semantic, not logical — Compensating transactions create new effects that reverse business outcomes. They don't undo database operations.
•Compensation must be idempotent — The same compensation might execute multiple times. Design compensations to handle this gracefully.
•Sagas lack isolation — Intermediate states are visible. Use semantic locking and pessimistic views to manage concurrent access.
•Forward recovery is sometimes better — When compensation is costly and retry likely succeeds, push forward rather than backward.
