Loading content...
Now imagine an orchestra. Unlike the jazz ensemble improvising together, an orchestra has a conductor—a central figure who sees the entire score, coordinates the timing, and ensures every section plays its part at the right moment. The musicians are skilled, but they follow the conductor's direction. The complete piece exists in one place: the conductor's mind and the score before them.
This is orchestration in distributed systems. A central service—the orchestrator—holds the complete workflow definition, explicitly invokes other services, tracks progress, and makes decisions about what happens next. The workflow isn't emergent; it's explicit, visible, and controlled from a single point.
Orchestration trades the autonomy of choreography for visibility and control. When you need to understand exactly what's happening in a complex multi-step process, when business rules change frequently, or when the workflow requires sophisticated branching logic, orchestration often provides the clarity you need.
By the end of this page, you will understand orchestration as a coordination pattern: its architecture, implementation approaches, state management strategies, and the scenarios where centralized control is the right choice. You'll be able to design orchestrated workflows that provide clear visibility into distributed processes.
Orchestration is a coordination pattern where a central service—the orchestrator—explicitly controls the workflow by invoking services in sequence, handling responses, and making routing decisions. The orchestrator knows the complete process and directs other services like a conductor directing musicians.
The fundamental principle: In orchestration, the workflow is explicit and centralized. One service contains the complete business process logic. Other services are invoked rather than reacting. The orchestrator maintains state, handles errors, and decides what happens next based on responses.
Key characteristics of orchestration:
| Aspect | Choreography | Orchestration |
|---|---|---|
| Workflow Location | Distributed across services | Centralized in orchestrator |
| Control Flow | Implicit (event subscriptions) | Explicit (orchestrator code) |
| Coupling | Services coupled to events | Orchestrator coupled to all services |
| Visibility | Requires tracing across services | Visible in one place |
| Autonomy | Services fully autonomous | Services follow orchestrator commands |
| Change Impact | Add services without changing others | Workflow changes centralized |
| Debugging | Requires distributed tracing | Linear execution trace |
| Scalability | Each service scales independently | Orchestrator can be bottleneck |
The same order process, orchestrated:
Consider our e-commerce example from the previous page, now with orchestration:
The complete workflow exists in the orchestrator's code. Each service provides capabilities; the orchestrator composes them into a process.
Orchestration trades autonomy for visibility. You can see the entire workflow in one place, debug it with a single trace, and change it without coordinating multiple teams. But the orchestrator becomes a coordination point—logically, if not necessarily a performance bottleneck.
A well-designed orchestrator has distinct architectural components that separate concerns and enable reliability. Let's examine the key components and how they interact.
Core Components:
1. Workflow Definition The declarative or imperative specification of the process steps, transitions, and decision logic. This could be code, BPMN, a state machine, or a DSL.
2. Execution Engine The runtime that interprets the workflow definition, manages execution state, schedules steps, and handles transitions.
3. State Store Persistent storage for workflow instances, their current state, accumulated data, and history. Essential for durability across restarts.
4. Service Connectors Adapters that invoke external services, whether through synchronous HTTP/gRPC calls or asynchronous message queues.
5. Error Handler Logic for retries, compensation, timeout handling, and escalation when steps fail.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117
// Core interfaces of an orchestrator architecture /** * Workflow Definition - The declarative description of the process */interface WorkflowDefinition { name: string; version: string; steps: WorkflowStep[]; errorHandlers: ErrorHandler[]; timeout: Duration;} interface WorkflowStep { id: string; name: string; type: 'task' | 'decision' | 'parallel' | 'wait' | 'compensate'; action: string; // Reference to service action input: InputMapping; // How to map workflow data to action input output: OutputMapping; // How to map action output to workflow data next: string | Transition[]; // Next step or conditional transitions retryPolicy?: RetryPolicy; timeout?: Duration; compensateWith?: string; // Step to run on rollback} interface Transition { condition: string; // Expression language: "$.payment.status == 'declined'" next: string; // Target step ID} /** * Execution Engine - Runs workflow instances */class WorkflowEngine { constructor( private readonly stateStore: WorkflowStateStore, private readonly serviceRegistry: ServiceRegistry, private readonly scheduler: StepScheduler, ) {} async startWorkflow( definition: WorkflowDefinition, input: Record<string, unknown>, correlationId: string, ): Promise<WorkflowInstance> { // Create initial state const instance = await this.stateStore.create({ id: uuid(), definitionName: definition.name, definitionVersion: definition.version, correlationId, status: 'running', currentStep: definition.steps[0].id, data: input, history: [], createdAt: new Date(), }); // Schedule first step await this.scheduler.scheduleStep(instance.id, definition.steps[0].id); return instance; } async executeStep(instanceId: string, stepId: string): Promise<void> { const instance = await this.stateStore.get(instanceId); const definition = await this.getDefinition(instance); const step = definition.steps.find(s => s.id === stepId); try { // Execute the step const input = this.mapInput(step.input, instance.data); const service = this.serviceRegistry.get(step.action); const output = await service.execute(input, { timeout: step.timeout, retryPolicy: step.retryPolicy, }); // Update state const newData = this.mapOutput(step.output, instance.data, output); const nextStepId = this.determineNextStep(step, newData); await this.stateStore.update(instanceId, { currentStep: nextStepId, data: newData, history: [...instance.history, { stepId, status: 'completed', output, timestamp: new Date(), }], }); // Schedule next step or complete if (nextStepId) { await this.scheduler.scheduleStep(instanceId, nextStepId); } else { await this.stateStore.update(instanceId, { status: 'completed' }); } } catch (error) { await this.handleStepError(instance, step, error); } }} /** * State Store - Durable workflow state persistence */interface WorkflowStateStore { create(instance: WorkflowInstance): Promise<WorkflowInstance>; get(id: string): Promise<WorkflowInstance>; update(id: string, changes: Partial<WorkflowInstance>): Promise<void>; findByCorrelationId(correlationId: string): Promise<WorkflowInstance[]>; findPending(): Promise<WorkflowInstance[]>;}Synchronous vs Asynchronous Orchestration:
Orchestrators can invoke services either synchronously or asynchronously:
Synchronous Orchestration:
Asynchronous Orchestration:
Most production orchestrators use asynchronous orchestration with durable state, even for operations that complete quickly. This provides resilience against orchestrator restarts and enables horizontal scaling.
You can build custom orchestrators or use dedicated workflow engines like Temporal, Camunda, AWS Step Functions, or Apache Airflow. Workflow engines provide the execution infrastructure, letting you focus on workflow logic. For complex workflows, engines usually provide better reliability and observability than custom code.
The orchestrator's superpower is centralized state management. Unlike choreography where state is distributed across services, the orchestrator maintains a complete picture of each workflow instance.
What state must be tracked:
1. Execution Position — What step is the workflow currently at? What steps have completed? What's pending?
2. Accumulated Data — Results from previous steps that will be used in future steps. Payment confirmation needed for shipping, etc.
3. Error State — Which steps failed? How many retries? What compensation has been attempted?
4. Timing Information — When did each step start/complete? Are any steps timing out?
5. Business Context — Correlation IDs, customer information, order details—anything needed throughout the workflow.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137
// Comprehensive workflow instance stateinterface WorkflowInstance { // Identity id: string; definitionName: string; definitionVersion: string; correlationId: string; // Execution status status: 'pending' | 'running' | 'waiting' | 'completed' | 'failed' | 'compensating'; currentStep: string | null; // Accumulated workflow data data: { // Initial input input: { orderId: string; customerId: string; items: OrderItem[]; shippingAddress: Address; totalAmount: Money; }; // Results from completed steps payment?: { paymentId: string; status: 'completed' | 'failed'; transactionRef: string; processedAt: string; }; inventory?: { reservationId: string; reservedItems: ReservedItem[]; warehouseId: string; }; shipping?: { shipmentId: string; carrier: string; trackingNumber: string; estimatedDelivery: string; }; // Error information lastError?: { step: string; message: string; code: string; timestamp: string; }; }; // Execution history history: StepExecution[]; // Timing createdAt: Date; startedAt?: Date; completedAt?: Date; // Timeout tracking timeoutAt?: Date; // For compensation/rollback completedCompensableSteps: string[];} interface StepExecution { stepId: string; stepName: string; status: 'started' | 'completed' | 'failed' | 'compensated'; input?: Record<string, unknown>; output?: Record<string, unknown>; error?: { message: string; code: string; retryCount: number; }; startedAt: Date; completedAt?: Date; duration?: number;} // State transitions are explicit and auditedclass WorkflowStateManager { async transitionTo( instanceId: string, newStatus: WorkflowInstance['status'], changes: Partial<WorkflowInstance>, reason: string, ): Promise<void> { const instance = await this.stateStore.get(instanceId); // Validate transition is legal this.validateTransition(instance.status, newStatus); // Apply changes atomically await this.stateStore.update(instanceId, { ...changes, status: newStatus, history: [ ...instance.history, { type: 'transition', from: instance.status, to: newStatus, reason, timestamp: new Date(), } ], }); // Emit state change event for monitoring await this.eventEmitter.emit('workflow.state.changed', { instanceId, previousStatus: instance.status, newStatus, reason, }); } private validateTransition(from: string, to: string): void { const validTransitions: Record<string, string[]> = { 'pending': ['running', 'failed'], 'running': ['waiting', 'completed', 'failed', 'compensating'], 'waiting': ['running', 'failed', 'compensating'], 'compensating': ['completed', 'failed'], 'completed': [], // Terminal 'failed': [], // Terminal }; if (!validTransitions[from]?.includes(to)) { throw new InvalidTransitionError(from, to); } }}State Store Requirements:
The state store is critical infrastructure for orchestration. Key requirements:
Durability: State must survive orchestrator restarts. Use a database, not in-memory storage.
Consistency: State updates must be atomic. A step can't complete without its results being persisted.
Query Support: You need to find workflows by correlation ID, status, age, etc.
Performance: High-throughput orchestration needs a fast state store. Consider write-ahead logs for efficiency.
Visibility: Expose state for debugging and monitoring. Every transition should be queryable.
The state store is the orchestrator's durability guarantee. If it fails, workflow execution stops. Use replicated databases with failover. Never use single-instance stores in production. The state store's availability directly determines the orchestrator's availability.
Let's implement a complete orchestrated order workflow. We'll use Temporal as an example workflow engine, but the patterns apply to any orchestration platform.
Workflow Definition:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230
import { proxyActivities, defineSignal, defineQuery, setHandler, condition, sleep, ApplicationFailure,} from '@temporalio/workflow'; import type { OrderActivities } from './activities'; // Create proxies for activities (service calls)const activities = proxyActivities<OrderActivities>({ startToCloseTimeout: '30s', retry: { initialInterval: '1s', backoffCoefficient: 2, maximumAttempts: 3, nonRetryableErrorTypes: ['PaymentDeclinedError', 'CustomerBlockedError'], },}); // Workflow signals (external events)const cancelOrderSignal = defineSignal('cancelOrder');const updateShippingSignal = defineSignal<[Address]>('updateShipping'); // Workflow queries (read state)const getStatusQuery = defineQuery<OrderStatus>('getStatus');const getHistoryQuery = defineQuery<StepHistory[]>('getHistory'); // Workflow stateinterface OrderWorkflowState { orderId: string; status: OrderStatus; paymentResult?: PaymentResult; inventoryResult?: InventoryResult; shippingResult?: ShippingResult; history: StepHistory[]; cancelled: boolean;} /** * Order Processing Workflow * * Complete order fulfillment from payment through shipping */export async function orderWorkflow(input: OrderInput): Promise<OrderResult> { // Initialize workflow state const state: OrderWorkflowState = { orderId: input.orderId, status: 'processing_payment', history: [], cancelled: false, }; // Set up signal and query handlers setHandler(cancelOrderSignal, () => { state.cancelled = true; }); setHandler(getStatusQuery, () => state.status); setHandler(getHistoryQuery, () => state.history); try { // Step 1: Process Payment state.status = 'processing_payment'; state.history.push({ step: 'payment', status: 'started', at: new Date() }); state.paymentResult = await activities.processPayment({ orderId: input.orderId, customerId: input.customerId, amount: input.totalAmount, paymentMethod: input.paymentMethod, }); state.history.push({ step: 'payment', status: 'completed', at: new Date(), result: { paymentId: state.paymentResult.paymentId }, }); // Check for cancellation between steps if (state.cancelled) { return await compensateOrder(state, 'User cancelled'); } // Step 2: Reserve Inventory state.status = 'reserving_inventory'; state.history.push({ step: 'inventory', status: 'started', at: new Date() }); state.inventoryResult = await activities.reserveInventory({ orderId: input.orderId, items: input.items, warehousePreference: input.shippingAddress.region, }); state.history.push({ step: 'inventory', status: 'completed', at: new Date(), result: { reservationId: state.inventoryResult.reservationId }, }); if (state.cancelled) { return await compensateOrder(state, 'User cancelled'); } // Step 3: Schedule Shipping state.status = 'scheduling_shipment'; state.history.push({ step: 'shipping', status: 'started', at: new Date() }); state.shippingResult = await activities.scheduleShipment({ orderId: input.orderId, reservationId: state.inventoryResult.reservationId, shippingAddress: input.shippingAddress, customerTier: input.customerTier, }); state.history.push({ step: 'shipping', status: 'completed', at: new Date(), result: { shipmentId: state.shippingResult.shipmentId, trackingNumber: state.shippingResult.trackingNumber, }, }); // Step 4: Send Confirmation state.status = 'sending_confirmation'; await activities.sendOrderConfirmation({ orderId: input.orderId, customerId: input.customerId, email: input.customerEmail, orderDetails: { items: input.items, total: input.totalAmount, trackingNumber: state.shippingResult.trackingNumber, estimatedDelivery: state.shippingResult.estimatedDelivery, }, }); state.status = 'completed'; state.history.push({ step: 'complete', status: 'completed', at: new Date() }); return { orderId: input.orderId, status: 'completed', paymentId: state.paymentResult.paymentId, shipmentId: state.shippingResult.shipmentId, trackingNumber: state.shippingResult.trackingNumber, }; } catch (error) { state.status = 'failed'; state.history.push({ step: 'error', status: 'failed', at: new Date(), error: error instanceof Error ? error.message : 'Unknown error', }); // Attempt compensation return await compensateOrder(state, error.message); }} /** * Compensation logic - undo completed steps */async function compensateOrder( state: OrderWorkflowState, reason: string,): Promise<OrderResult> { state.status = 'compensating'; // Compensate in reverse order if (state.shippingResult) { try { await activities.cancelShipment({ shipmentId: state.shippingResult.shipmentId, }); state.history.push({ step: 'shipping_cancel', status: 'completed', at: new Date() }); } catch (e) { // Log but continue compensation } } if (state.inventoryResult) { try { await activities.releaseInventory({ reservationId: state.inventoryResult.reservationId, }); state.history.push({ step: 'inventory_release', status: 'completed', at: new Date(), }); } catch (e) { /* continue */ } } if (state.paymentResult) { try { await activities.refundPayment({ paymentId: state.paymentResult.paymentId, amount: state.paymentResult.amount, reason, }); state.history.push({ step: 'payment_refund', status: 'completed', at: new Date(), }); } catch (e) { /* continue */ } } state.status = 'cancelled'; return { orderId: state.orderId, status: 'cancelled', reason, };}Activities (Service Invocations):
Activities are the actual service calls. They're separate from workflow logic, allowing different retry policies and timeouts for each service.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122
import { Context } from '@temporalio/activity'; // Activities interface - defines what services the workflow can callexport interface OrderActivities { processPayment(input: PaymentInput): Promise<PaymentResult>; reserveInventory(input: InventoryInput): Promise<InventoryResult>; scheduleShipment(input: ShipmentInput): Promise<ShipmentResult>; sendOrderConfirmation(input: ConfirmationInput): Promise<void>; refundPayment(input: RefundInput): Promise<void>; releaseInventory(input: ReleaseInput): Promise<void>; cancelShipment(input: CancelInput): Promise<void>;} // Implementation of activitiesexport function createOrderActivities( paymentService: PaymentServiceClient, inventoryService: InventoryServiceClient, shippingService: ShippingServiceClient, notificationService: NotificationServiceClient,): OrderActivities { return { async processPayment(input) { const ctx = Context.current(); ctx.heartbeat('Processing payment'); try { const result = await paymentService.charge({ orderId: input.orderId, customerId: input.customerId, amount: input.amount, method: input.paymentMethod, idempotencyKey: ctx.info.activityId, // Temporal provides unique ID }); return { paymentId: result.id, status: 'completed', amount: input.amount, transactionRef: result.transactionReference, }; } catch (error) { if (error instanceof PaymentDeclinedError) { // Non-retryable error - workflow will handle throw new ApplicationFailure( 'Payment declined', 'PaymentDeclinedError', true, // nonRetryable [{ reason: error.declineReason }], ); } throw error; // Retryable error } }, async reserveInventory(input) { const ctx = Context.current(); ctx.heartbeat('Reserving inventory'); const result = await inventoryService.reserve({ orderId: input.orderId, items: input.items, warehousePreference: input.warehousePreference, holdDuration: '24h', }); return { reservationId: result.reservationId, reservedItems: result.items, warehouseId: result.warehouseId, expiresAt: result.expiresAt, }; }, async scheduleShipment(input) { const ctx = Context.current(); ctx.heartbeat('Scheduling shipment'); const result = await shippingService.schedule({ orderId: input.orderId, reservationId: input.reservationId, destination: input.shippingAddress, serviceLevel: input.customerTier === 'premium' ? 'express' : 'standard', }); return { shipmentId: result.shipmentId, carrier: result.carrier, trackingNumber: result.trackingNumber, estimatedDelivery: result.estimatedDelivery, }; }, async sendOrderConfirmation(input) { await notificationService.sendEmail({ to: input.email, template: 'order_confirmation', data: input.orderDetails, }); }, // Compensation activities async refundPayment(input) { await paymentService.refund({ paymentId: input.paymentId, amount: input.amount, reason: input.reason, }); }, async releaseInventory(input) { await inventoryService.release({ reservationId: input.reservationId, }); }, async cancelShipment(input) { await shippingService.cancel({ shipmentId: input.shipmentId, }); }, };}Notice the heartbeat calls in activities. For long-running operations, heartbeats tell the workflow engine the activity is still alive. If heartbeats stop, the engine can cancel and retry the activity elsewhere. This is how orchestrators detect and recover from worker failures.
Unlike choreography, where error handling is distributed, orchestration centralizes error handling in the orchestrator. This provides powerful capabilities but requires careful design.
Retry Strategies:
The orchestrator controls retry behavior for each step:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151
// Error handling patterns in orchestration interface RetryPolicy { maxAttempts: number; initialInterval: Duration; backoffCoefficient: number; maxInterval: Duration; nonRetryableErrors: string[];} // Different retry policies for different types of failuresconst retryPolicies = { // Payment service - careful retries, idempotent payment: { maxAttempts: 3, initialInterval: '1s', backoffCoefficient: 2, maxInterval: '10s', nonRetryableErrors: [ 'PaymentDeclinedError', // Customer's card declined 'FraudDetectedError', // Fraud prevention triggered 'InsufficientFundsError', // Not enough balance 'CardExpiredError', // Card is expired ], }, // Inventory service - quick retries for lock contention inventory: { maxAttempts: 5, initialInterval: '100ms', backoffCoefficient: 1.5, maxInterval: '2s', nonRetryableErrors: [ 'ItemDiscontinuedError', // Item no longer sold 'RegionNotServicedError', // Can't ship to that region ], }, // Shipping service - longer intervals for external carrier APIs shipping: { maxAttempts: 3, initialInterval: '5s', backoffCoefficient: 2, maxInterval: '30s', nonRetryableErrors: [ 'AddressUndeliverableError', // Invalid shipping address ], }, // Notification - aggressive retries, not critical notification: { maxAttempts: 10, initialInterval: '10s', backoffCoefficient: 2, maxInterval: '5m', nonRetryableErrors: [], // Always retry notifications },}; // Orchestrator error handling logicasync function handleStepError( instance: WorkflowInstance, step: WorkflowStep, error: Error,): Promise<void> { const policy = retryPolicies[step.serviceType]; // Check if error is non-retryable if (policy.nonRetryableErrors.includes(error.constructor.name)) { await this.handleNonRetryableError(instance, step, error); return; } // Check retry count const retryCount = instance.history.filter( h => h.stepId === step.id && h.status === 'failed' ).length; if (retryCount >= policy.maxAttempts) { await this.handleMaxRetriesExceeded(instance, step, error); return; } // Calculate backoff const interval = Math.min( policy.initialInterval * Math.pow(policy.backoffCoefficient, retryCount), policy.maxInterval, ); // Schedule retry await Promise.all([ this.stateStore.update(instance.id, { history: [...instance.history, { stepId: step.id, status: 'failed', error: { message: error.message, retryCount }, timestamp: new Date(), nextRetry: new Date(Date.now() + interval), }], }), this.scheduler.scheduleStep(instance.id, step.id, { delay: interval }), ]);} async function handleNonRetryableError( instance: WorkflowInstance, step: WorkflowStep, error: Error,): Promise<void> { // For non-retryable errors, we need to make a decision // based on the error type and step if (step.compensateWith) { // This step can be compensated - start compensation flow await this.startCompensation(instance, step, error); } else if (step.fallback) { // There's a fallback step defined await this.executeFallback(instance, step); } else { // No recovery possible - fail the workflow await this.failWorkflow(instance, step, error); }} async function handleMaxRetriesExceeded( instance: WorkflowInstance, step: WorkflowStep, error: Error,): Promise<void> { // Max retries exceeded - could escalate, compensate, or fail // Emit alert for human intervention await this.alerting.send({ severity: 'high', workflow: instance.definitionName, instanceId: instance.id, step: step.id, error: error.message, retryCount: await this.getRetryCount(instance, step), action: 'Max retries exceeded - manual intervention may be required', }); // Start compensation or wait for manual resolution if (step.onMaxRetriesExceeded === 'compensate') { await this.startCompensation(instance, step, error); } else if (step.onMaxRetriesExceeded === 'pause') { await this.pauseForManualResolution(instance, step); } else { await this.failWorkflow(instance, step, error); }}Compensation must happen in the reverse order of execution. If you reserved inventory then charged payment, compensation must refund payment then release inventory. Getting this order wrong can leave the system in inconsistent states. Track completed steps and compensate them in reverse.
Time management is crucial in orchestration. Workflows can't run forever, steps can't hang indefinitely, and some business processes have real deadlines (24-hour sale, reservation expiration, etc.).
Types of Timeouts:
1. Step Timeout (Start-to-Close) Maximum time a single step can take from invocation to completion. Prevents individual service calls from hanging.
2. Workflow Timeout (Execution Timeout) Maximum time for the entire workflow from start to completion. Prevents zombie workflows.
3. Schedule-to-Start Timeout Maximum time a step can wait in queue before a worker picks it up. Detects capacity issues.
4. Heartbeat Timeout Maximum time between heartbeats for long-running steps. Detects worker failures mid-activity.
5. Business Deadline Time by which the workflow must reach a certain state (e.g., order must ship within 24 hours of payment).
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152
import { proxyActivities, sleep, CancellationScope, isCancellation, condition,} from '@temporalio/workflow'; /** * Workflow with comprehensive timeout handling */export async function orderWithTimeouts(input: OrderInput): Promise<OrderResult> { const state = initializeState(input); // Track business deadline const orderMustShipBy = new Date(Date.now() + 24 * 60 * 60 * 1000); // 24 hours try { // Step 1: Payment with 30s timeout const paymentActivities = proxyActivities<PaymentActivities>({ startToCloseTimeout: '30s', scheduleToCloseTimeout: '1m', // Including queue time heartbeatTimeout: '10s', }); state.payment = await paymentActivities.processPayment(input); // Check if we still have time for shipping deadline const timeRemaining = orderMustShipBy.getTime() - Date.now(); if (timeRemaining < 60 * 60 * 1000) { // Less than 1 hour // Expedite the process state.expedited = true; } // Step 2: Inventory with business-aware timeout const inventoryActivities = proxyActivities<InventoryActivities>({ startToCloseTimeout: state.expedited ? '10s' : '30s', }); state.inventory = await inventoryActivities.reserveInventory({ ...input, priority: state.expedited ? 'high' : 'normal', }); // Step 3: Shipping - must complete before deadline const shippingDeadline = Math.max( timeRemaining - 30 * 60 * 1000, // Leave 30 min buffer 5 * 60 * 1000, // Minimum 5 minutes ); const shippingActivities = proxyActivities<ShippingActivities>({ startToCloseTimeout: `${Math.floor(shippingDeadline / 1000)}s`, }); state.shipping = await shippingActivities.scheduleShipment({ ...input, reservationId: state.inventory.reservationId, mustShipBy: orderMustShipBy, }); return { status: 'completed', ...state }; } catch (error) { if (isCancellation(error)) { // Workflow was cancelled - clean up return await compensate(state, 'Workflow cancelled'); } if (error.cause?.name === 'TimeoutError') { // A step timed out return await handleTimeout(state, error); } throw error; }} /** * Long-running workflow with wait states */export async function orderWithApproval(input: OrderInput): Promise<OrderResult> { const state = initializeState(input); // Process payment state.payment = await paymentActivities.processPayment(input); // For high-value orders, wait for manual approval if (input.totalAmount.amount > 100000) { // > $1000 state.status = 'awaiting_approval'; // Wait for approval signal OR timeout after 24 hours const approved = await condition( () => state.approvalDecision !== undefined, '24h', // Timeout ); if (!approved) { // Approval timed out - cancel order return await compensate(state, 'Approval timeout'); } if (state.approvalDecision === 'rejected') { return await compensate(state, 'Order rejected by approver'); } } // Continue with approved order return await continueOrder(state, input);} /** * Periodic timeout checks */export async function orderWithPeriodicChecks(input: OrderInput): Promise<OrderResult> { const state = initializeState(input); // Start timeout monitor in parallel const timeoutScope = CancellationScope.cancellable(async () => { // Check every minute if we should abandon while (true) { await sleep('1m'); const elapsed = Date.now() - state.startTime.getTime(); const currentStep = state.currentStep; // Alert if taking too long if (elapsed > 5 * 60 * 1000 && currentStep === 'payment') { await alertActivities.sendSlowWorkflowAlert({ workflowId: input.orderId, step: currentStep, elapsed, }); } // Hard limit: 1 hour if (elapsed > 60 * 60 * 1000) { throw new ApplicationFailure('Workflow timeout exceeded'); } } }); try { // Main workflow logic const result = await processOrder(state, input); // Cancel timeout monitor on success timeoutScope.cancel(); return result; } catch (error) { timeoutScope.cancel(); throw error; }}Timeout values should reflect business requirements, not just technical constraints. A payment timeout of 30 seconds might be technically feasible but terrible for customers on slow connections. A shipping deadline of 24 hours might be a contractual SLA. Work with business stakeholders to set appropriate timeouts.
A common concern with orchestration is that the orchestrator becomes a scalability bottleneck. While valid, this concern is often overstated. Properly designed orchestrators scale effectively.
Scaling Strategies:
1. Partition Workflows Run multiple orchestrator instances, each handling a subset of workflows. Partition by customer ID, order region, workflow type, etc.
2. Separate Workers Separate workflow logic (orchestration) from activity execution (service calls). Scale workers independently based on load.
3. Optimize State Store The state store is often the bottleneck. Use partitioned databases, caching, and efficient serialization. Consider event sourcing for append-only writes.
4. Async Communication Don't wait synchronously for slow activities. Use message queues so orchestrator threads aren't blocked.
| Pattern | How It Works | When to Use |
|---|---|---|
| Horizontal Orchestrator Scaling | Multiple orchestrator instances sharing a partitioned state store | High workflow volume, stateless orchestrator logic |
| Worker Pool Scaling | Activities executed by auto-scaling worker pools | Variable activity load, expensive operations |
| Event-Driven Activities | Activities triggered by events, not direct calls | Very high throughput, eventual consistency acceptable |
| Workflow Sharding | Workflows partitioned across clusters by key | Multi-region, tenant isolation requirements |
| Tiered Execution | Urgent workflows on dedicated fast path | Mixed latency requirements |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
// Scalable orchestrator deployment pattern /** * Multiple orchestrator instances with partitioned state */class PartitionedOrchestrator { constructor( private readonly partitionId: string, private readonly totalPartitions: number, private readonly stateStore: PartitionedStateStore, ) {} // Each instance only processes workflows in its partition shouldProcess(workflowId: string): boolean { const hash = this.hashWorkflowId(workflowId); return (hash % this.totalPartitions) === parseInt(this.partitionId); } async pollAndExecute(): Promise<void> { // Only poll for workflows in this partition const pending = await this.stateStore.findPendingInPartition( this.partitionId ); for (const instance of pending) { await this.executeStep(instance); } }} /** * Separate worker pools for different activity types */interface WorkerPoolConfig { paymentWorkers: { minInstances: 5, maxInstances: 50, scaleMetric: 'queue_depth', scaleThreshold: 100, // Scale up when queue > 100 }, inventoryWorkers: { minInstances: 10, maxInstances: 100, scaleMetric: 'queue_depth', scaleThreshold: 200, }, shippingWorkers: { minInstances: 3, maxInstances: 20, scaleMetric: 'queue_latency_p99', scaleThreshold: 5000, // Scale up when p99 > 5s }, notificationWorkers: { minInstances: 2, maxInstances: 20, scaleMetric: 'queue_depth', scaleThreshold: 1000, // Notification can batch more },} /** * Event-driven activity for high throughput */class EventDrivenActivityHandler { async handleActivityRequest(event: ActivityRequestEvent): Promise<void> { // Perform the activity const result = await this.executeActivity(event.activityType, event.input); // Send result back via event (not blocking the orchestrator) await this.eventBus.publish('activity.completed', { workflowId: event.workflowId, stepId: event.stepId, result, }); }} // Orchestrator consumes completion eventsclass AsyncOrchestrator { async handleActivityCompleted(event: ActivityCompletedEvent): Promise<void> { const instance = await this.stateStore.get(event.workflowId); // Update state with activity result await this.stateStore.update(event.workflowId, { data: { ...instance.data, [event.stepId]: event.result }, }); // Determine and schedule next step const nextStep = this.determineNextStep(instance, event.stepId); if (nextStep) { await this.eventBus.publish('activity.request', { workflowId: event.workflowId, stepId: nextStep.id, activityType: nextStep.activityType, input: this.mapInput(nextStep, instance.data), }); } }}In many systems, the orchestrator's work is minimal—it's just deciding what to do next and updating state. The actual bottleneck is the services being called (payment processing, inventory checks). Profile before optimizing; you may find the orchestrator is never the limiting factor.
We've explored orchestration as a coordination pattern for event-driven architectures. Let's consolidate the key insights:
Orchestration excels when:
In the next page, we'll explore Saga patterns—the critical mechanism for handling distributed transactions in both choreography and orchestration. You'll learn how to maintain consistency across services without distributed locks.
You now understand orchestration as a coordination pattern: its architecture, state management requirements, error handling capabilities, and scaling strategies. You can design orchestrated workflows where a central service controls complex multi-step processes with full visibility and control. Next, we'll examine saga patterns for distributed transaction management.