System Design (HLD)Saga Pattern

The Saga Pattern: Managing Distributed Transactions

LevelAdvanced

Duration90 mins

TopicSaga Pattern

5 / 5

Saga Implementation

From Theory to Production

We've covered the theory, coordination mechanisms, compensating transactions, and failure handling. Now it's time to bridge the gap between understanding sagas and actually building them in production systems.\n\nThis page synthesizes everything into actionable implementation guidance: choosing between build vs. buy, leveraging saga frameworks, designing for testability, deploying safely, and learning from real-world implementations at scale.\n\nThe Implementation Reality:\n\nBuilding a robust saga system from scratch is a multi-month engineering effort. Before writing any code, understand the full scope of what 'saga infrastructure' entails:\n\n- State persistence and recovery\n- Retry logic with backoff and jitter\n- Timeout management at multiple levels\n- Circuit breaker integration\n- Distributed tracing\n- Metrics and alerting\n- Dead letter queue management\n- Admin tooling for operators\n\nMost organizations should leverage existing frameworks rather than building from scratch.

What You Will Learn

By the end of this page, you will be equipped to implement sagas in your organization: evaluating frameworks, designing saga workflows, implementing step definitions, testing effectively, deploying safely, and operating in production.

Build vs. Buy: The Framework Decision

The first implementation decision is whether to build saga infrastructure yourself or adopt an existing framework. This is rarely a close call.

Build Custom When

•Very simple sagas (2-3 steps, single failure mode)
•Extreme latency requirements (<10ms)
•Unique runtime environment (edge, embedded)
•Team has deep distributed systems expertise
•Existing homegrown orchestration to extend
•Learning/research project, not production

Use Framework When

•Complex/evolving business logic (most cases)
•Need reliable failure handling and recovery
•Require visibility into saga state
•Cannot afford months of infrastructure work
•Team should focus on business logic, not plumbing
•Standard runtime environment (cloud, Kubernetes)

Saga Framework Comparison
Framework	Type	Languages	Best For	Trade-offs
Temporal	Orchestration	Go, Java, TS, Python, PHP	Complex workflows, long-running	Learning curve, infrastructure
Camunda	Orchestration/BPMN	Java, REST API	Enterprise, visual modeling	Heavyweight, enterprise pricing
Axon Framework	Orchestration + ES	Java/Kotlin	DDD + Event Sourcing	Java-only, framework coupling
Eventuate Tram	Both	Java/Spring	Microservices patterns book style	Java-only, simpler features
MassTransit	Orchestration	C#/.NET	.NET ecosystem	.NET only
Conductor (Netflix)	Orchestration	Java, any via REST	Microservices, Netflix stack	Complex setup, Netflix-specific

The 80/20 Rule

For 80% of teams, Temporal is the right choice. It's production-proven at scale (Uber, Netflix, Snap), has excellent multi-language support, provides durable execution out of the box, and has a growing community. Start there unless you have specific requirements that push you elsewhere.

Implementing Sagas with Temporal

Temporal is the most widely adopted saga/workflow framework. Let's examine a production-ready saga implementation using Temporal's TypeScript SDK.

order-saga-temporal.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
// Complete Order Saga Implementation with Temporal
 
// ============================================
// WORKFLOW DEFINITION (saga-workflows.ts)
// ============================================
import {
  proxyActivities,
  sleep,
  ApplicationFailure,
  defineSignal,
  setHandler,
  condition
} from '@temporalio/workflow';
 
import type * as activities from './activities';
 
// Activity proxies with timeout configurations
const { 
  createOrder, 
  reserveInventory, 
  processPayment, 
  scheduleShipment,
  sendConfirmation 
} = proxyActivities<typeof activities>({
  startToCloseTimeout: '30 seconds',
  retry: {
    maximumAttempts: 3,
    initialInterval: '1 second',
    maximumInterval: '30 seconds',
    backoffCoefficient: 2
  }
});
 
const { 
  cancelOrder, 
  releaseInventory, 
  refundPayment 
} = proxyActivities<typeof activities>({
  // Compensations get more aggressive retry
  startToCloseTimeout: '60 seconds',
  retry: {
    maximumAttempts: 10,
    initialInterval: '2 seconds',
    maximumInterval: '5 minutes',
    backoffCoefficient: 2
  }
});
 
// Signals for external interaction
export const cancelOrderSignal = defineSignal<[string]>('cancelOrder');
 
// The saga workflow
export async function orderSaga(input: OrderInput): Promise<OrderResult> {
  let cancelled = false;
  let cancellationReason = '';
  
  // Handle external cancellation requests
  setHandler(cancelOrderSignal, (reason: string) => {
    cancelled = true;
    cancellationReason = reason;
  });
 
  // Compensation tracking
  const compensations: Array<() => Promise<void>> = [];
 
  try {
    // Check for cancellation before each step
    if (cancelled) {
      throw ApplicationFailure.nonRetryable('Order cancelled before start', 'CANCELLED');
    }
 
    // Step 1: Create Order
    console.log('Step 1: Creating order');
    const order = await createOrder(input);
    compensations.push(async () => {
      console.log('Compensating: Cancelling order');
      await cancelOrder(order.id);
    });
 
    if (cancelled) {
      throw ApplicationFailure.nonRetryable(cancellationReason, 'CANCELLED');
    }
 
    // Step 2: Reserve Inventory
    console.log('Step 2: Reserving inventory');
    const reservation = await reserveInventory({
      orderId: order.id,
      items: input.items
    });
    compensations.push(async () => {
      console.log('Compensating: Releasing inventory');
      await releaseInventory(reservation.id);
    });
 
    if (cancelled) {
      throw ApplicationFailure.nonRetryable(cancellationReason, 'CANCELLED');
    }
 
    // Step 3: Process Payment
    console.log('Step 3: Processing payment');
    const payment = await processPayment({
      orderId: order.id,
      amount: order.totalAmount,
      customerId: input.customerId
    });
    compensations.push(async () => {
      console.log('Compensating: Refunding payment');
      await refundPayment(payment.id);
    });
 
    // Step 4: Schedule Shipment (non-compensatable - last)
    console.log('Step 4: Scheduling shipment');
    const shipment = await scheduleShipment({
      orderId: order.id,
      address: input.shippingAddress,
      items: input.items
    });
 
    // Step 5: Send Confirmation
    console.log('Step 5: Sending confirmation');
    await sendConfirmation({
      orderId: order.id,
      email: input.customerEmail,
      trackingNumber: shipment.trackingNumber
    });
 
    return {
      success: true,
      orderId: order.id,
      trackingNumber: shipment.trackingNumber
    };
 
  } catch (error) {
    console.error('Saga failed, executing compensations:', error);
 
    // Execute compensations in reverse order
    for (const compensate of compensations.reverse()) {
      try {
        await compensate();
      } catch (compensationError) {
        // Log but continue with other compensations
        console.error('Compensation failed:', compensationError);
        // In production: alert, add to manual queue
      }
    }
 
    throw error;
  }
}

activities.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
// Activity Implementations (external service calls)
 
import { OrderService, InventoryService, PaymentService, ShippingService } from './services';
 
// Activities are where actual external calls happen
// They should be idempotent and retriable
 
export async function createOrder(input: OrderInput): Promise<Order> {
  const orderService = new OrderService();
  
  // Use idempotency key to prevent duplicates on retry
  return await orderService.create(input, {
    idempotencyKey: `order-${input.customerId}-${input.requestId}`
  });
}
 
export async function cancelOrder(orderId: string): Promise<void> {
  const orderService = new OrderService();
  
  // Idempotent: safe to call multiple times
  const order = await orderService.findById(orderId);
  if (order.status === 'CANCELLED') {
    return; // Already cancelled
  }
  
  await orderService.cancel(orderId);
}
 
export async function reserveInventory(
  input: ReserveInventoryInput
): Promise<Reservation> {
  const inventoryService = new InventoryService();
  
  return await inventoryService.reserve(input.items, {
    idempotencyKey: `inventory-${input.orderId}`
  });
}
 
export async function releaseInventory(reservationId: string): Promise<void> {
  const inventoryService = new InventoryService();
  
  const reservation = await inventoryService.findReservation(reservationId);
  if (!reservation || reservation.status === 'RELEASED') {
    return; // Already released or doesn't exist
  }
  
  await inventoryService.release(reservationId);
}
 
export async function processPayment(input: PaymentInput): Promise<Payment> {
  const paymentService = new PaymentService();
  
  // Use authorization + capture pattern
  const auth = await paymentService.authorize({
    amount: input.amount,
    customerId: input.customerId,
    idempotencyKey: `payment-${input.orderId}`
  });
  
  // Capture immediately (or could delay until shipment)
  await paymentService.capture(auth.id);
  
  return auth;
}
 
export async function refundPayment(paymentId: string): Promise<void> {
  const paymentService = new PaymentService();
  
  const payment = await paymentService.findById(paymentId);
  if (!payment || payment.status === 'REFUNDED') {
    return; // Already refunded
  }
  
  await paymentService.refund(paymentId);
}
 
export async function scheduleShipment(
  input: ShipmentInput
): Promise<Shipment> {
  const shippingService = new ShippingService();
  
  return await shippingService.schedule(input, {
    idempotencyKey: `shipment-${input.orderId}`
  });
}
 
export async function sendConfirmation(
  input: ConfirmationInput
): Promise<void> {
  // Notifications are fire-and-forget
  // Failures shouldn't compensate the saga
  try {
    await emailService.send({
      to: input.email,
      template: 'order-confirmation',
      data: {
        orderId: input.orderId,
        trackingNumber: input.trackingNumber
      }
    });
  } catch (error) {
    console.warn('Failed to send confirmation email:', error);
    // Don't throw - notification failure shouldn't fail the saga
  }
}

Essential Saga Design Patterns

Beyond the basic saga structure, several patterns address common production requirements:

Parallel Step Execution\n\nSome saga steps can execute concurrently to reduce latency. For example, reserving inventory and pre-authorizing payment can happen simultaneously.

TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Parallel saga steps in Temporal
async function orderSagaWithParallelSteps(input: OrderInput) {
  const compensations: Array<() => Promise<void>> = [];
 
  // Step 1: Create order
  const order = await createOrder(input);
  compensations.push(() => cancelOrder(order.id));
 
  // Steps 2a and 2b: Execute in parallel
  const [reservation, paymentAuth] = await Promise.all([
    reserveInventory({ orderId: order.id, items: input.items }),
    authorizePayment({ orderId: order.id, amount: order.totalAmount })
  ]);
 
  // Add compensations (in reverse order they should execute)
  compensations.push(() => releaseInventory(reservation.id));
  compensations.push(() => voidPaymentAuth(paymentAuth.id));
 
  // Step 3: Capture payment (depends on both parallel steps)
  const payment = await capturePayment(paymentAuth.id);
  compensations.push(() => refundPayment(payment.id));
 
  // Continue with remaining sequential steps...
}

Testing Saga Implementations

Sagas have multiple execution paths (success, failure at each step, compensation) that all need testing. A comprehensive test strategy covers multiple levels:

Saga Testing Pyramid

•Activity Unit Tests — Test each activity in isolation with mocked dependencies
•Workflow Unit Tests — Test saga logic with mocked activities (Temporal's test framework)
•Integration Tests — Test saga with real service interactions in test environment
•Failure Injection Tests — Test compensation paths by injecting failures at each step
•End-to-End Tests — Full saga execution in staging environment

saga-tests.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
// Comprehensive Saga Testing
 
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker } from '@temporalio/worker';
 
describe('Order Saga', () => {
  let testEnv: TestWorkflowEnvironment;
 
  beforeAll(async () => {
    testEnv = await TestWorkflowEnvironment.createLocal();
  });
 
  afterAll(async () => {
    await testEnv.teardown();
  });
 
  // ============================================
  // HAPPY PATH TESTS
  // ============================================
  describe('Happy Path', () => {
    it('should complete successfully with all steps', async () => {
      // Mock activities
      const activities = {
        createOrder: jest.fn().mockResolvedValue({ id: 'order-123' }),
        reserveInventory: jest.fn().mockResolvedValue({ id: 'res-123' }),
        processPayment: jest.fn().mockResolvedValue({ id: 'pay-123' }),
        scheduleShipment: jest.fn().mockResolvedValue({ trackingNumber: 'TRACK-123' }),
        sendConfirmation: jest.fn().mockResolvedValue(undefined),
        // Compensations should NOT be called
        cancelOrder: jest.fn(),
        releaseInventory: jest.fn(),
        refundPayment: jest.fn()
      };
 
      const worker = await Worker.create({
        connection: testEnv.nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('./workflows'),
        activities
      });
 
      const result = await worker.runUntil(
        testEnv.client.workflow.execute('orderSaga', {
          taskQueue: 'test',
          workflowId: 'test-order-saga-1',
          args: [{ customerId: 'cust-1', items: [{ id: 'item-1', qty: 2 }] }]
        })
      );
 
      expect(result.success).toBe(true);
      expect(result.orderId).toBe('order-123');
      expect(activities.cancelOrder).not.toHaveBeenCalled();
      expect(activities.releaseInventory).not.toHaveBeenCalled();
      expect(activities.refundPayment).not.toHaveBeenCalled();
    });
  });
 
  // ============================================
  // COMPENSATION PATH TESTS
  // ============================================
  describe('Compensation Paths', () => {
    it('should compensate when payment fails', async () => {
      const activities = {
        createOrder: jest.fn().mockResolvedValue({ id: 'order-123' }),
        reserveInventory: jest.fn().mockResolvedValue({ id: 'res-123' }),
        processPayment: jest.fn().mockRejectedValue(new Error('Insufficient funds')),
        // Compensations
        cancelOrder: jest.fn().mockResolvedValue(undefined),
        releaseInventory: jest.fn().mockResolvedValue(undefined),
        refundPayment: jest.fn().mockResolvedValue(undefined)
      };
 
      const worker = await Worker.create({
        connection: testEnv.nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('./workflows'),
        activities
      });
 
      await expect(
        worker.runUntil(
          testEnv.client.workflow.execute('orderSaga', {
            taskQueue: 'test',
            workflowId: 'test-order-saga-2',
            args: [{ customerId: 'cust-1', items: [] }]
          })
        )
      ).rejects.toThrow('Insufficient funds');
 
      // Verify compensations were called in reverse order
      expect(activities.releaseInventory).toHaveBeenCalledWith('res-123');
      expect(activities.cancelOrder).toHaveBeenCalledWith('order-123');
      
      // refundPayment should NOT be called (payment never succeeded)
      expect(activities.refundPayment).not.toHaveBeenCalled();
    });
 
    // Test compensation at each step
    const failureScenarios = [
      { 
        step: 'reserveInventory', 
        expectedCompensations: ['cancelOrder'] 
      },
      { 
        step: 'processPayment', 
        expectedCompensations: ['releaseInventory', 'cancelOrder'] 
      },
      { 
        step: 'scheduleShipment', 
        expectedCompensations: ['refundPayment', 'releaseInventory', 'cancelOrder'] 
      },
    ];
 
    test.each(failureScenarios)(
      'should compensate correctly when %s fails',
      async ({ step, expectedCompensations }) => {
        // Generate activities with failure at specified step
        const activities = generateActivitiesWithFailureAt(step);
        
        const worker = await createWorker(activities);
        
        await expect(executeOrderSaga(worker)).rejects.toThrow();
 
        // Verify expected compensations were called
        for (const compensation of expectedCompensations) {
          expect(activities[compensation]).toHaveBeenCalled();
        }
      }
    );
  });
 
  // ============================================
  // IDEMPOTENCY TESTS
  // ============================================
  describe('Idempotency', () => {
    it('should handle activity retry without duplication', async () => {
      let callCount = 0;
      const activities = {
        createOrder: jest.fn().mockImplementation(async (input) => {
          callCount++;
          if (callCount === 1) {
            throw new Error('Transient failure');
          }
          return { id: 'order-123' };
        }),
        // ... other activities
      };
 
      const worker = await Worker.create({
        connection: testEnv.nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('./workflows'),
        activities
      });
 
      const result = await worker.runUntil(
        testEnv.client.workflow.execute('orderSaga', {
          taskQueue: 'test',
          workflowId: 'test-order-saga-idempotent',
          args: [testInput]
        })
      );
 
      expect(result.success).toBe(true);
      expect(callCount).toBe(2); // Failed once, succeeded once
    });
  });
 
  // ============================================
  // SIGNAL HANDLING TESTS
  // ============================================
  describe('Signal Handling', () => {
    it('should handle cancellation signal mid-saga', async () => {
      const activities = {
        createOrder: jest.fn().mockResolvedValue({ id: 'order-123' }),
        reserveInventory: jest.fn().mockImplementation(async () => {
          await new Promise(resolve => setTimeout(resolve, 100));
          return { id: 'res-123' };
        }),
        cancelOrder: jest.fn().mockResolvedValue(undefined),
        releaseInventory: jest.fn().mockResolvedValue(undefined)
      };
 
      const worker = await Worker.create({ /* ... */ });
 
      // Start workflow
      const handle = await testEnv.client.workflow.start('orderSaga', {
        taskQueue: 'test',
        workflowId: 'test-order-saga-cancel',
        args: [testInput]
      });
 
      // Wait a bit, then send cancellation signal
      await new Promise(resolve => setTimeout(resolve, 50));
      await handle.signal('cancelOrder', 'Customer requested cancellation');
 
      // Workflow should fail with cancellation
      await expect(handle.result()).rejects.toThrow('Customer requested cancellation');
    });
  });
});

Production Deployment Considerations

Deploying saga infrastructure requires careful planning around versioning, migration, rollback, and scaling.

Critical Deployment Concerns

•Workflow Versioning — Changing saga logic while sagas are in flight is dangerous. Use versioning to run old and new logic concurrently.
•Database Migrations — Schema changes must be backward compatible. Old saga instances need old schemas to continue.
•Rolling Deployments — Ensure new workers can process old saga events and vice versa during deployment.
•Saga Draining — Before major changes, consider draining in-flight sagas before deploying.
•Compensation Stability — Compensation logic must remain available even if forward logic changes.

saga-versioning.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// Saga Versioning with Temporal
 
import { patched, deprecatePatch } from '@temporalio/workflow';
 
async function orderSagaVersioned(input: OrderInput) {
  const order = await createOrder(input);
 
  // Version 1: Original inventory reservation
  // Version 2: Added availability check before reservation
  if (patched('inventory-check-v2')) {
    // New behavior: check availability first
    const available = await checkInventoryAvailability(input.items);
    if (!available) {
      await cancelOrder(order.id);
      throw ApplicationFailure.nonRetryable('Items not available');
    }
  }
 
  const reservation = await reserveInventory({ orderId: order.id });
 
  // Version 1: Single payment processor
  // Version 2: Regional payment processors
  let payment;
  if (patched('regional-payments-v2')) {
    // New behavior: use regional processor
    payment = await processRegionalPayment(order, input.region);
  } else {
    // Old behavior: single processor
    payment = await processPayment(order);
  }
 
  // Continue with saga...
}
 
// Deployment strategy:
// 1. Deploy with patched() returning false (old behavior)
// 2. Monitor in-flight sagas
// 3. Enable patch for new sagas
// 4. Wait for old sagas to complete
// 5. deprecatePatch() to remove old code path

The Versioning Challenge

Long-running sagas (hours/days) make versioning especially challenging. A saga started last week might still be running when you deploy breaking changes. Plan for this: maintain compensation logic for old versions, use feature flags, and consider saga completion before major deployments.

Infrastructure and Scaling

Saga infrastructure must scale with your system load. Key considerations:

Saga Infrastructure Scaling Dimensions
Component	Scaling Approach	Typical Limits	Bottleneck Indicators
Saga Store (DB)	Vertical + Sharding	Varies by DB	High latency, connection exhaustion
Saga Workers	Horizontal	Memory per worker	Task queue depth growing
Event Bus	Partitioning	Partitions × throughput	Consumer lag increasing
Service Endpoints	Load balancing	Service capacity	Step timeouts, retries

temporal-worker-scaling.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
// Temporal Worker Configuration for Production
 
import { Worker, NativeConnection } from '@temporalio/worker';
 
async function createProductionWorker() {
  // Establish connection to Temporal cluster
  const connection = await NativeConnection.connect({
    address: process.env.TEMPORAL_ADDRESS!,
    tls: {
      serverRootCACertificate: await fs.readFile(process.env.TEMPORAL_TLS_CERT!),
    }
  });
 
  // Create worker with production settings
  const worker = await Worker.create({
    connection,
    namespace: process.env.TEMPORAL_NAMESPACE!,
    taskQueue: 'order-sagas',
    workflowsPath: require.resolve('./workflows'),
    activities: require('./activities'),
    
    // Concurrency tuning
    maxConcurrentActivityTaskExecutions: 100,
    maxConcurrentWorkflowTaskExecutions: 100,
    maxConcurrentLocalActivityExecutions: 100,
    
    // Memory management
    maxCachedWorkflows: 500,
    
    // Sticky execution (performance optimization)
    stickyQueueScheduleToStartTimeout: '10s',
    
    // Graceful shutdown
    shutdownGraceTime: '30s'
  });
 
  // Metrics integration
  worker.registerPayload('metrics', (payload) => {
    metrics.increment('temporal.saga.task_completed');
  });
 
  return worker;
}
 
// Kubernetes HPA configuration for workers
const hpaConfig = {
  apiVersion: 'autoscaling/v2',
  kind: 'HorizontalPodAutoscaler',
  spec: {
    scaleTargetRef: {
      apiVersion: 'apps/v1',
      kind: 'Deployment',
      name: 'saga-workers'
    },
    minReplicas: 3,
    maxReplicas: 50,
    metrics: [
      {
        type: 'External',
        external: {
          metric: {
            name: 'temporal_task_queue_depth',
            selector: { matchLabels: { task_queue: 'order-sagas' } }
          },
          target: {
            type: 'AverageValue',
            averageValue: '10'  // Scale up when >10 pending tasks per worker
          }
        }
      }
    ]
  }
};

Real-World Case Studies

Learning from how industry leaders implement sagas provides invaluable practical insight:

Uber's Workflow Platform\n\nUber built Cadence (open-sourced as Temporal) to handle their complex workflow needs across 5,000+ microservices.\n\nScale:\n- 100+ million workflow executions per day\n- Thousands of unique workflow types\n- Workflows spanning hours to months\n\nKey Learnings:\n\n1. Deterministic replay is essential — Workflows must be deterministic for recovery after failures\n2. Visibility is critical — Built extensive tooling to inspect, debug, and manage workflows\n3. Testing at scale matters — Developed replay testing to verify workflow changes against historical executions\n4. Start simple, iterate — Many teams start with simple orchestration before adding saga compensation\n\nUse Cases at Uber:\n- Trip lifecycle management\n- Driver onboarding (multi-day process with background checks)\n- Marketing campaign orchestration\n- Financial reconciliation workflows

Summary: Implementing Production Sagas

We've covered the complete journey from saga theory to production implementation. Let's consolidate the essential guidance:

Key Takeaways

•Use a framework — For most teams, Temporal is the right choice. Building saga infrastructure from scratch is rarely justified.
•Design activities for idempotency — Activities will be retried. Use idempotency keys, check existing state, and handle duplicates.
•Leverage advanced patterns — Parallel steps, conditional execution, nested sagas, and signals address real-world complexity.
•Test all execution paths — Success, failure at each step, compensation, signals, timeouts—all need tests.
•Plan for versioning — Sagas may run for hours/days. Use workflow versioning to evolve logic safely.
•Invest in observability — Metrics, logging, and admin tooling are essential for operating sagas in production.
•Learn from industry leaders — Uber, Netflix, DoorDash have proven these patterns at massive scale.

Module Complete:\n\nYou've now mastered the Saga pattern from first principles through production implementation. You understand why distributed transactions fail, how choreography and orchestration work, the art of compensating transactions, resilient failure handling, and practical implementation patterns.\n\nThis knowledge equips you to design and build distributed systems that maintain data consistency across service boundaries—a fundamental capability for any microservices architecture.

Module Complete: Saga Pattern Mastery

You've completed a comprehensive exploration of the Saga pattern. From understanding why 2PC fails at scale, through choreography vs orchestration trade-offs, compensating transaction design, failure handling strategies, and production implementation—you now have the knowledge to implement reliable distributed transactions in your systems.

5 / 5

Loading learning content...

System Design (HLD)Saga Pattern

The Saga Pattern: Managing Distributed Transactions

LevelAdvanced

Duration90 mins

TopicSaga Pattern

5 / 5

Saga Implementation

From Theory to Production

What You Will Learn

Build vs. Buy: The Framework Decision

The first implementation decision is whether to build saga infrastructure yourself or adopt an existing framework. This is rarely a close call.

Build Custom When

•Very simple sagas (2-3 steps, single failure mode)
•Extreme latency requirements (<10ms)
•Unique runtime environment (edge, embedded)
•Team has deep distributed systems expertise
•Existing homegrown orchestration to extend
•Learning/research project, not production

Use Framework When

•Complex/evolving business logic (most cases)
•Need reliable failure handling and recovery
•Require visibility into saga state
•Cannot afford months of infrastructure work
•Team should focus on business logic, not plumbing
•Standard runtime environment (cloud, Kubernetes)

Saga Framework Comparison
Framework	Type	Languages	Best For	Trade-offs
Temporal	Orchestration	Go, Java, TS, Python, PHP	Complex workflows, long-running	Learning curve, infrastructure
Camunda	Orchestration/BPMN	Java, REST API	Enterprise, visual modeling	Heavyweight, enterprise pricing
Axon Framework	Orchestration + ES	Java/Kotlin	DDD + Event Sourcing	Java-only, framework coupling
Eventuate Tram	Both	Java/Spring	Microservices patterns book style	Java-only, simpler features
MassTransit	Orchestration	C#/.NET	.NET ecosystem	.NET only
Conductor (Netflix)	Orchestration	Java, any via REST	Microservices, Netflix stack	Complex setup, Netflix-specific

The 80/20 Rule

Implementing Sagas with Temporal

Temporal is the most widely adopted saga/workflow framework. Let's examine a production-ready saga implementation using Temporal's TypeScript SDK.

order-saga-temporal.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
// Complete Order Saga Implementation with Temporal
 
// ============================================
// WORKFLOW DEFINITION (saga-workflows.ts)
// ============================================
import {
  proxyActivities,
  sleep,
  ApplicationFailure,
  defineSignal,
  setHandler,
  condition
} from '@temporalio/workflow';
 
import type * as activities from './activities';
 
// Activity proxies with timeout configurations
const { 
  createOrder, 
  reserveInventory, 
  processPayment, 
  scheduleShipment,
  sendConfirmation 
} = proxyActivities<typeof activities>({
  startToCloseTimeout: '30 seconds',
  retry: {
    maximumAttempts: 3,
    initialInterval: '1 second',
    maximumInterval: '30 seconds',
    backoffCoefficient: 2
  }
});
 
const { 
  cancelOrder, 
  releaseInventory, 
  refundPayment 
} = proxyActivities<typeof activities>({
  // Compensations get more aggressive retry
  startToCloseTimeout: '60 seconds',
  retry: {
    maximumAttempts: 10,
    initialInterval: '2 seconds',
    maximumInterval: '5 minutes',
    backoffCoefficient: 2
  }
});
 
// Signals for external interaction
export const cancelOrderSignal = defineSignal<[string]>('cancelOrder');
 
// The saga workflow
export async function orderSaga(input: OrderInput): Promise<OrderResult> {
  let cancelled = false;
  let cancellationReason = '';
  
  // Handle external cancellation requests
  setHandler(cancelOrderSignal, (reason: string) => {
    cancelled = true;
    cancellationReason = reason;
  });
 
  // Compensation tracking
  const compensations: Array<() => Promise<void>> = [];
 
  try {
    // Check for cancellation before each step
    if (cancelled) {
      throw ApplicationFailure.nonRetryable('Order cancelled before start', 'CANCELLED');
    }
 
    // Step 1: Create Order
    console.log('Step 1: Creating order');
    const order = await createOrder(input);
    compensations.push(async () => {
      console.log('Compensating: Cancelling order');
      await cancelOrder(order.id);
    });
 
    if (cancelled) {
      throw ApplicationFailure.nonRetryable(cancellationReason, 'CANCELLED');
    }
 
    // Step 2: Reserve Inventory
    console.log('Step 2: Reserving inventory');
    const reservation = await reserveInventory({
      orderId: order.id,
      items: input.items
    });
    compensations.push(async () => {
      console.log('Compensating: Releasing inventory');
      await releaseInventory(reservation.id);
    });
 
    if (cancelled) {
      throw ApplicationFailure.nonRetryable(cancellationReason, 'CANCELLED');
    }
 
    // Step 3: Process Payment
    console.log('Step 3: Processing payment');
    const payment = await processPayment({
      orderId: order.id,
      amount: order.totalAmount,
      customerId: input.customerId
    });
    compensations.push(async () => {
      console.log('Compensating: Refunding payment');
      await refundPayment(payment.id);
    });
 
    // Step 4: Schedule Shipment (non-compensatable - last)
    console.log('Step 4: Scheduling shipment');
    const shipment = await scheduleShipment({
      orderId: order.id,
      address: input.shippingAddress,
      items: input.items
    });
 
    // Step 5: Send Confirmation
    console.log('Step 5: Sending confirmation');
    await sendConfirmation({
      orderId: order.id,
      email: input.customerEmail,
      trackingNumber: shipment.trackingNumber
    });
 
    return {
      success: true,
      orderId: order.id,
      trackingNumber: shipment.trackingNumber
    };
 
  } catch (error) {
    console.error('Saga failed, executing compensations:', error);
 
    // Execute compensations in reverse order
    for (const compensate of compensations.reverse()) {
      try {
        await compensate();
      } catch (compensationError) {
        // Log but continue with other compensations
        console.error('Compensation failed:', compensationError);
        // In production: alert, add to manual queue
      }
    }
 
    throw error;
  }
}

activities.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
// Activity Implementations (external service calls)
 
import { OrderService, InventoryService, PaymentService, ShippingService } from './services';
 
// Activities are where actual external calls happen
// They should be idempotent and retriable
 
export async function createOrder(input: OrderInput): Promise<Order> {
  const orderService = new OrderService();
  
  // Use idempotency key to prevent duplicates on retry
  return await orderService.create(input, {
    idempotencyKey: `order-${input.customerId}-${input.requestId}`
  });
}
 
export async function cancelOrder(orderId: string): Promise<void> {
  const orderService = new OrderService();
  
  // Idempotent: safe to call multiple times
  const order = await orderService.findById(orderId);
  if (order.status === 'CANCELLED') {
    return; // Already cancelled
  }
  
  await orderService.cancel(orderId);
}
 
export async function reserveInventory(
  input: ReserveInventoryInput
): Promise<Reservation> {
  const inventoryService = new InventoryService();
  
  return await inventoryService.reserve(input.items, {
    idempotencyKey: `inventory-${input.orderId}`
  });
}
 
export async function releaseInventory(reservationId: string): Promise<void> {
  const inventoryService = new InventoryService();
  
  const reservation = await inventoryService.findReservation(reservationId);
  if (!reservation || reservation.status === 'RELEASED') {
    return; // Already released or doesn't exist
  }
  
  await inventoryService.release(reservationId);
}
 
export async function processPayment(input: PaymentInput): Promise<Payment> {
  const paymentService = new PaymentService();
  
  // Use authorization + capture pattern
  const auth = await paymentService.authorize({
    amount: input.amount,
    customerId: input.customerId,
    idempotencyKey: `payment-${input.orderId}`
  });
  
  // Capture immediately (or could delay until shipment)
  await paymentService.capture(auth.id);
  
  return auth;
}
 
export async function refundPayment(paymentId: string): Promise<void> {
  const paymentService = new PaymentService();
  
  const payment = await paymentService.findById(paymentId);
  if (!payment || payment.status === 'REFUNDED') {
    return; // Already refunded
  }
  
  await paymentService.refund(paymentId);
}
 
export async function scheduleShipment(
  input: ShipmentInput
): Promise<Shipment> {
  const shippingService = new ShippingService();
  
  return await shippingService.schedule(input, {
    idempotencyKey: `shipment-${input.orderId}`
  });
}
 
export async function sendConfirmation(
  input: ConfirmationInput
): Promise<void> {
  // Notifications are fire-and-forget
  // Failures shouldn't compensate the saga
  try {
    await emailService.send({
      to: input.email,
      template: 'order-confirmation',
      data: {
        orderId: input.orderId,
        trackingNumber: input.trackingNumber
      }
    });
  } catch (error) {
    console.warn('Failed to send confirmation email:', error);
    // Don't throw - notification failure shouldn't fail the saga
  }
}

Essential Saga Design Patterns

Beyond the basic saga structure, several patterns address common production requirements:

Parallel Step Execution\n\nSome saga steps can execute concurrently to reduce latency. For example, reserving inventory and pre-authorizing payment can happen simultaneously.

TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Parallel saga steps in Temporal
async function orderSagaWithParallelSteps(input: OrderInput) {
  const compensations: Array<() => Promise<void>> = [];
 
  // Step 1: Create order
  const order = await createOrder(input);
  compensations.push(() => cancelOrder(order.id));
 
  // Steps 2a and 2b: Execute in parallel
  const [reservation, paymentAuth] = await Promise.all([
    reserveInventory({ orderId: order.id, items: input.items }),
    authorizePayment({ orderId: order.id, amount: order.totalAmount })
  ]);
 
  // Add compensations (in reverse order they should execute)
  compensations.push(() => releaseInventory(reservation.id));
  compensations.push(() => voidPaymentAuth(paymentAuth.id));
 
  // Step 3: Capture payment (depends on both parallel steps)
  const payment = await capturePayment(paymentAuth.id);
  compensations.push(() => refundPayment(payment.id));
 
  // Continue with remaining sequential steps...
}

Testing Saga Implementations

Sagas have multiple execution paths (success, failure at each step, compensation) that all need testing. A comprehensive test strategy covers multiple levels:

Saga Testing Pyramid

•Activity Unit Tests — Test each activity in isolation with mocked dependencies
•Workflow Unit Tests — Test saga logic with mocked activities (Temporal's test framework)
•Integration Tests — Test saga with real service interactions in test environment
•Failure Injection Tests — Test compensation paths by injecting failures at each step
•End-to-End Tests — Full saga execution in staging environment

saga-tests.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
// Comprehensive Saga Testing
 
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker } from '@temporalio/worker';
 
describe('Order Saga', () => {
  let testEnv: TestWorkflowEnvironment;
 
  beforeAll(async () => {
    testEnv = await TestWorkflowEnvironment.createLocal();
  });
 
  afterAll(async () => {
    await testEnv.teardown();
  });
 
  // ============================================
  // HAPPY PATH TESTS
  // ============================================
  describe('Happy Path', () => {
    it('should complete successfully with all steps', async () => {
      // Mock activities
      const activities = {
        createOrder: jest.fn().mockResolvedValue({ id: 'order-123' }),
        reserveInventory: jest.fn().mockResolvedValue({ id: 'res-123' }),
        processPayment: jest.fn().mockResolvedValue({ id: 'pay-123' }),
        scheduleShipment: jest.fn().mockResolvedValue({ trackingNumber: 'TRACK-123' }),
        sendConfirmation: jest.fn().mockResolvedValue(undefined),
        // Compensations should NOT be called
        cancelOrder: jest.fn(),
        releaseInventory: jest.fn(),
        refundPayment: jest.fn()
      };
 
      const worker = await Worker.create({
        connection: testEnv.nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('./workflows'),
        activities
      });
 
      const result = await worker.runUntil(
        testEnv.client.workflow.execute('orderSaga', {
          taskQueue: 'test',
          workflowId: 'test-order-saga-1',
          args: [{ customerId: 'cust-1', items: [{ id: 'item-1', qty: 2 }] }]
        })
      );
 
      expect(result.success).toBe(true);
      expect(result.orderId).toBe('order-123');
      expect(activities.cancelOrder).not.toHaveBeenCalled();
      expect(activities.releaseInventory).not.toHaveBeenCalled();
      expect(activities.refundPayment).not.toHaveBeenCalled();
    });
  });
 
  // ============================================
  // COMPENSATION PATH TESTS
  // ============================================
  describe('Compensation Paths', () => {
    it('should compensate when payment fails', async () => {
      const activities = {
        createOrder: jest.fn().mockResolvedValue({ id: 'order-123' }),
        reserveInventory: jest.fn().mockResolvedValue({ id: 'res-123' }),
        processPayment: jest.fn().mockRejectedValue(new Error('Insufficient funds')),
        // Compensations
        cancelOrder: jest.fn().mockResolvedValue(undefined),
        releaseInventory: jest.fn().mockResolvedValue(undefined),
        refundPayment: jest.fn().mockResolvedValue(undefined)
      };
 
      const worker = await Worker.create({
        connection: testEnv.nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('./workflows'),
        activities
      });
 
      await expect(
        worker.runUntil(
          testEnv.client.workflow.execute('orderSaga', {
            taskQueue: 'test',
            workflowId: 'test-order-saga-2',
            args: [{ customerId: 'cust-1', items: [] }]
          })
        )
      ).rejects.toThrow('Insufficient funds');
 
      // Verify compensations were called in reverse order
      expect(activities.releaseInventory).toHaveBeenCalledWith('res-123');
      expect(activities.cancelOrder).toHaveBeenCalledWith('order-123');
      
      // refundPayment should NOT be called (payment never succeeded)
      expect(activities.refundPayment).not.toHaveBeenCalled();
    });
 
    // Test compensation at each step
    const failureScenarios = [
      { 
        step: 'reserveInventory', 
        expectedCompensations: ['cancelOrder'] 
      },
      { 
        step: 'processPayment', 
        expectedCompensations: ['releaseInventory', 'cancelOrder'] 
      },
      { 
        step: 'scheduleShipment', 
        expectedCompensations: ['refundPayment', 'releaseInventory', 'cancelOrder'] 
      },
    ];
 
    test.each(failureScenarios)(
      'should compensate correctly when %s fails',
      async ({ step, expectedCompensations }) => {
        // Generate activities with failure at specified step
        const activities = generateActivitiesWithFailureAt(step);
        
        const worker = await createWorker(activities);
        
        await expect(executeOrderSaga(worker)).rejects.toThrow();
 
        // Verify expected compensations were called
        for (const compensation of expectedCompensations) {
          expect(activities[compensation]).toHaveBeenCalled();
        }
      }
    );
  });
 
  // ============================================
  // IDEMPOTENCY TESTS
  // ============================================
  describe('Idempotency', () => {
    it('should handle activity retry without duplication', async () => {
      let callCount = 0;
      const activities = {
        createOrder: jest.fn().mockImplementation(async (input) => {
          callCount++;
          if (callCount === 1) {
            throw new Error('Transient failure');
          }
          return { id: 'order-123' };
        }),
        // ... other activities
      };
 
      const worker = await Worker.create({
        connection: testEnv.nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('./workflows'),
        activities
      });
 
      const result = await worker.runUntil(
        testEnv.client.workflow.execute('orderSaga', {
          taskQueue: 'test',
          workflowId: 'test-order-saga-idempotent',
          args: [testInput]
        })
      );
 
      expect(result.success).toBe(true);
      expect(callCount).toBe(2); // Failed once, succeeded once
    });
  });
 
  // ============================================
  // SIGNAL HANDLING TESTS
  // ============================================
  describe('Signal Handling', () => {
    it('should handle cancellation signal mid-saga', async () => {
      const activities = {
        createOrder: jest.fn().mockResolvedValue({ id: 'order-123' }),
        reserveInventory: jest.fn().mockImplementation(async () => {
          await new Promise(resolve => setTimeout(resolve, 100));
          return { id: 'res-123' };
        }),
        cancelOrder: jest.fn().mockResolvedValue(undefined),
        releaseInventory: jest.fn().mockResolvedValue(undefined)
      };
 
      const worker = await Worker.create({ /* ... */ });
 
      // Start workflow
      const handle = await testEnv.client.workflow.start('orderSaga', {
        taskQueue: 'test',
        workflowId: 'test-order-saga-cancel',
        args: [testInput]
      });
 
      // Wait a bit, then send cancellation signal
      await new Promise(resolve => setTimeout(resolve, 50));
      await handle.signal('cancelOrder', 'Customer requested cancellation');
 
      // Workflow should fail with cancellation
      await expect(handle.result()).rejects.toThrow('Customer requested cancellation');
    });
  });
});

Production Deployment Considerations

Deploying saga infrastructure requires careful planning around versioning, migration, rollback, and scaling.

Critical Deployment Concerns

•Workflow Versioning — Changing saga logic while sagas are in flight is dangerous. Use versioning to run old and new logic concurrently.
•Database Migrations — Schema changes must be backward compatible. Old saga instances need old schemas to continue.
•Rolling Deployments — Ensure new workers can process old saga events and vice versa during deployment.
•Saga Draining — Before major changes, consider draining in-flight sagas before deploying.
•Compensation Stability — Compensation logic must remain available even if forward logic changes.

saga-versioning.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// Saga Versioning with Temporal
 
import { patched, deprecatePatch } from '@temporalio/workflow';
 
async function orderSagaVersioned(input: OrderInput) {
  const order = await createOrder(input);
 
  // Version 1: Original inventory reservation
  // Version 2: Added availability check before reservation
  if (patched('inventory-check-v2')) {
    // New behavior: check availability first
    const available = await checkInventoryAvailability(input.items);
    if (!available) {
      await cancelOrder(order.id);
      throw ApplicationFailure.nonRetryable('Items not available');
    }
  }
 
  const reservation = await reserveInventory({ orderId: order.id });
 
  // Version 1: Single payment processor
  // Version 2: Regional payment processors
  let payment;
  if (patched('regional-payments-v2')) {
    // New behavior: use regional processor
    payment = await processRegionalPayment(order, input.region);
  } else {
    // Old behavior: single processor
    payment = await processPayment(order);
  }
 
  // Continue with saga...
}
 
// Deployment strategy:
// 1. Deploy with patched() returning false (old behavior)
// 2. Monitor in-flight sagas
// 3. Enable patch for new sagas
// 4. Wait for old sagas to complete
// 5. deprecatePatch() to remove old code path

The Versioning Challenge

Infrastructure and Scaling

Saga infrastructure must scale with your system load. Key considerations:

Saga Infrastructure Scaling Dimensions
Component	Scaling Approach	Typical Limits	Bottleneck Indicators
Saga Store (DB)	Vertical + Sharding	Varies by DB	High latency, connection exhaustion
Saga Workers	Horizontal	Memory per worker	Task queue depth growing
Event Bus	Partitioning	Partitions × throughput	Consumer lag increasing
Service Endpoints	Load balancing	Service capacity	Step timeouts, retries

temporal-worker-scaling.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
// Temporal Worker Configuration for Production
 
import { Worker, NativeConnection } from '@temporalio/worker';
 
async function createProductionWorker() {
  // Establish connection to Temporal cluster
  const connection = await NativeConnection.connect({
    address: process.env.TEMPORAL_ADDRESS!,
    tls: {
      serverRootCACertificate: await fs.readFile(process.env.TEMPORAL_TLS_CERT!),
    }
  });
 
  // Create worker with production settings
  const worker = await Worker.create({
    connection,
    namespace: process.env.TEMPORAL_NAMESPACE!,
    taskQueue: 'order-sagas',
    workflowsPath: require.resolve('./workflows'),
    activities: require('./activities'),
    
    // Concurrency tuning
    maxConcurrentActivityTaskExecutions: 100,
    maxConcurrentWorkflowTaskExecutions: 100,
    maxConcurrentLocalActivityExecutions: 100,
    
    // Memory management
    maxCachedWorkflows: 500,
    
    // Sticky execution (performance optimization)
    stickyQueueScheduleToStartTimeout: '10s',
    
    // Graceful shutdown
    shutdownGraceTime: '30s'
  });
 
  // Metrics integration
  worker.registerPayload('metrics', (payload) => {
    metrics.increment('temporal.saga.task_completed');
  });
 
  return worker;
}
 
// Kubernetes HPA configuration for workers
const hpaConfig = {
  apiVersion: 'autoscaling/v2',
  kind: 'HorizontalPodAutoscaler',
  spec: {
    scaleTargetRef: {
      apiVersion: 'apps/v1',
      kind: 'Deployment',
      name: 'saga-workers'
    },
    minReplicas: 3,
    maxReplicas: 50,
    metrics: [
      {
        type: 'External',
        external: {
          metric: {
            name: 'temporal_task_queue_depth',
            selector: { matchLabels: { task_queue: 'order-sagas' } }
          },
          target: {
            type: 'AverageValue',
            averageValue: '10'  // Scale up when >10 pending tasks per worker
          }
        }
      }
    ]
  }
};

Real-World Case Studies

Learning from how industry leaders implement sagas provides invaluable practical insight:

Summary: Implementing Production Sagas

We've covered the complete journey from saga theory to production implementation. Let's consolidate the essential guidance:

Key Takeaways

•Use a framework — For most teams, Temporal is the right choice. Building saga infrastructure from scratch is rarely justified.
•Design activities for idempotency — Activities will be retried. Use idempotency keys, check existing state, and handle duplicates.
•Leverage advanced patterns — Parallel steps, conditional execution, nested sagas, and signals address real-world complexity.
•Test all execution paths — Success, failure at each step, compensation, signals, timeouts—all need tests.
•Plan for versioning — Sagas may run for hours/days. Use workflow versioning to evolve logic safely.
•Invest in observability — Metrics, logging, and admin tooling are essential for operating sagas in production.
•Learn from industry leaders — Uber, Netflix, DoorDash have proven these patterns at massive scale.

Module Complete: Saga Pattern Mastery

5 / 5