"Should we use choreography or orchestration?"
This question comes up in nearly every distributed system design, and the answer is almost never simple. Both patterns solve the same fundamental problem—coordinating work across multiple services—but they do so with radically different philosophies. Choosing between them isn't just a technical decision; it affects team autonomy, debugging experience, system evolution, and operational complexity.
There's no universally "better" approach. The right choice depends on your specific context: team structure, workflow complexity, consistency requirements, and operational capabilities. This page provides a rigorous framework for making this decision, examining trade-offs across multiple dimensions and offering concrete guidance for real-world scenarios.
By the end of this page, you will have a systematic framework for choosing between choreography and orchestration. You'll understand the trade-offs across coupling, visibility, reliability, team dynamics, and operational complexity. You'll be equipped to make—and defend—architecture decisions based on your specific requirements.
At its core, the choice between choreography and orchestration is a trade-off between autonomy and visibility.
Choreography maximizes autonomy:
Orchestration maximizes visibility:
Neither is inherently superior. The right choice depends on which trade-offs align with your constraints and priorities.
| Dimension | Choreography | Orchestration | Implication |
|---|---|---|---|
| Coupling | Services coupled to events | Orchestrator coupled to all services | Choreography enables independent deployment; orchestration creates deployment dependencies |
| Visibility | Workflow distributed across services | Workflow centralized in orchestrator | Orchestration easier to understand; choreography requires distributed tracing |
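| Resilience | No central coordinator to fail (the message broker remains a shared dependency) | Orchestrator is a potential single point of failure and bottleneck | Choreography tends to degrade gracefully; an orchestrator outage stalls every workflow it manages unless the orchestrator is made highly available |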
| Team Autonomy | Teams fully independent | Changes coordinated through orchestrator team | Choreography suits autonomous teams; orchestration suits centralized platform teams |
| Evolution | Add consumers without changing producers | Add steps in one place | Choreography easier to extend; orchestration easier to modify |
| Debugging | Trace through multiple services | Follow single execution path | Orchestration faster to debug; choreography requires tooling |
A startup with 10 engineers and 5 services has different needs than an enterprise with 500 engineers and 200 services. A workflow that changes weekly has different requirements than one that's stable for years. Always evaluate trade-offs in your specific context.
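The "Evolution" row (add consumers without changing producers) can be illustrated with a minimal in-memory sketch. The `EventBus` class and its `subscribe`/`publish` methods are hypothetical names used only for illustration; a real system would sit on a message broker rather than an in-process map.

```typescript
// Hypothetical in-memory event bus standing in for a message broker.
interface OrderCreated {
  orderId: string;
}

type Handler = (event: OrderCreated) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();

  subscribe(eventType: string, handler: Handler): void {
    const existing = this.handlers.get(eventType) ?? [];
    this.handlers.set(eventType, [...existing, handler]);
  }

  publish(eventType: string, event: OrderCreated): void {
    for (const handler of this.handlers.get(eventType) ?? []) {
      handler(event);
    }
  }
}

const bus = new EventBus();
const reactions: string[] = [];

// Producer: knows the event type it publishes, not who consumes it.
function createOrder(orderId: string): void {
  bus.publish('OrderCreated', { orderId });
}

// Original consumer.
bus.subscribe('OrderCreated', (e) => {
  reactions.push(`inventory:${e.orderId}`);
});

// A consumer added later. createOrder is untouched.
bus.subscribe('OrderCreated', (e) => {
  reactions.push(`analytics:${e.orderId}`);
});

createOrder('order-1');
```

After `createOrder('order-1')` runs, both consumers have reacted even though the producer was never modified, which is the extension property the table attributes to choreography.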
Both patterns create coupling, but of different types. Understanding these differences is crucial for your decision.
Choreography Coupling:
In choreography, services are coupled to event schemas, not to each other:
This is data coupling—services share data contracts.
Orchestration Coupling:
In orchestration, the orchestrator is coupled to service interfaces:
This is control coupling—the orchestrator controls service invocations.
Coupling Impact on Deployments:
Choreography: Services deploy independently. When you deploy a new version of Payment Service, no other service needs to change. The event schema is the contract; as long as you satisfy it, you're free to deploy.
Orchestration: Deploying service changes may require orchestrator updates. If Payment Service adds a required field, the orchestrator must send that field. This creates deployment coordination needs.
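A small sketch of that control coupling, under the assumption that the orchestrator calls typed service interfaces. All interface, function, and field names here (`PaymentRequest`, `placeOrder`, `currency`, and so on) are hypothetical, and the in-memory stubs stand in for remote services.

```typescript
// Hypothetical service interfaces the orchestrator depends on directly.
interface InventoryService {
  reserve(orderId: string): string;
}

interface PaymentRequest {
  orderId: string;
  amountCents: number;
  currency: string; // newly required field: every caller must now supply it
}

interface PaymentService {
  charge(request: PaymentRequest): string;
}

interface Order {
  orderId: string;
  amountCents: number;
  currency: string;
}

// The orchestrator decides the call order and builds each request itself,
// so a change to PaymentRequest forces a change (and a deployment) here.
function placeOrder(
  order: Order,
  inventory: InventoryService,
  payment: PaymentService,
): string[] {
  const reservationId = inventory.reserve(order.orderId);
  const paymentId = payment.charge({
    orderId: order.orderId,
    amountCents: order.amountCents,
    currency: order.currency, // omitting this line is a compile error
  });
  return [reservationId, paymentId];
}

// In-memory stubs standing in for remote services.
const inventoryStub: InventoryService = {
  reserve: (orderId) => `res-${orderId}`,
};
const paymentStub: PaymentService = {
  charge: (request) => `pay-${request.orderId}-${request.currency}`,
};

const ids = placeOrder(
  { orderId: 'o1', amountCents: 500, currency: 'EUR' },
  inventoryStub,
  paymentStub,
);
```

The upside of this coupling is that it is explicit: the dependency on Payment Service is visible in one function and, with typed clients, checked at build time.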
The Key Question: Do you prioritize independent team velocity (choreography) or explicit dependency management (orchestration)?
Choreography's coupling is implicit, not absent. If Inventory Service assumes OrderCreated events always contain shippingAddress, and Order Service changes to make shippingAddress optional, Inventory Service breaks silently. Explicit contracts and consumer-driven contract testing are essential in choreography.
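One way to make that implicit coupling visible is a consumer-side contract check run against sample events from the producer. The sketch below is hand-rolled and deliberately simplistic; `Contract`, `checkContract`, and the sample events are hypothetical, and a real project would more likely use a dedicated contract-testing tool or a schema registry.

```typescript
// Hypothetical contract: the fields Inventory Service relies on in OrderCreated.
type FieldType = 'string' | 'number' | 'object';
type Contract = Record<string, FieldType>;

const inventoryContract: Contract = {
  orderId: 'string',
  shippingAddress: 'object',
};

// Returns the list of contract violations for one sample event.
function checkContract(
  sample: Record<string, unknown>,
  contract: Contract,
): string[] {
  const violations: string[] = [];
  for (const [field, expected] of Object.entries(contract)) {
    const value = sample[field];
    if (value === undefined || value === null) {
      violations.push(`missing required field: ${field}`);
    } else if (typeof value !== expected) {
      violations.push(`field ${field}: expected ${expected}, got ${typeof value}`);
    }
  }
  return violations;
}

// Sample event as the producer emits it today.
const eventV1 = { orderId: 'abc', shippingAddress: { city: 'Berlin' } };
// Sample after the producer makes shippingAddress optional and omits it.
const eventV2 = { orderId: 'abc' };

const violationsV1 = checkContract(eventV1, inventoryContract);
const violationsV2 = checkContract(eventV2, inventoryContract);
```

Run in the producer's build pipeline, a check like this turns the silent runtime break into a failed build: `violationsV1` is empty, while `violationsV2` reports the missing `shippingAddress`.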
When something goes wrong at 3 AM, how quickly can you understand what happened? This operational reality significantly influences the choice between patterns.
Orchestration Debugging:
With orchestration, debugging follows a predictable path:
The orchestrator is the single source of truth for workflow state. You know exactly where execution stopped and why.
Choreography Debugging:
With choreography, debugging requires correlation across services:
This requires robust distributed tracing infrastructure and consistent correlation ID propagation.
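Correlation ID propagation can be sketched as an event envelope that every service copies forward. The envelope shape and the helpers `startFlow`, `follow`, and `traceFlow` are assumptions made for illustration, not a standard format.

```typescript
// Hypothetical event envelope carrying tracing metadata alongside the payload.
interface EventEnvelope<T> {
  eventId: string;
  type: string;
  correlationId: string; // same value for every event in one business flow
  causationId: string | null; // eventId of the event that triggered this one
  payload: T;
}

let eventCounter = 0;
function newEventId(): string {
  eventCounter += 1;
  return `evt-${eventCounter}`;
}

// First event in a flow: the correlation ID is chosen here (e.g. the order ID).
function startFlow<T>(type: string, correlationId: string, payload: T): EventEnvelope<T> {
  return { eventId: newEventId(), type, correlationId, causationId: null, payload };
}

// Every downstream event copies the correlation ID from the event it reacts to.
function follow<T>(parent: EventEnvelope<unknown>, type: string, payload: T): EventEnvelope<T> {
  return {
    eventId: newEventId(),
    type,
    correlationId: parent.correlationId,
    causationId: parent.eventId,
    payload,
  };
}

// Reconstruct one flow from a mixed log of events.
function traceFlow(events: EventEnvelope<unknown>[], correlationId: string): string[] {
  return events.filter((e) => e.correlationId === correlationId).map((e) => e.type);
}

const created = startFlow('OrderCreated', 'order-42', { total: 100 });
const reserved = follow(created, 'InventoryReserved', { reservationId: 'xyz' });
const unrelated = startFlow('OrderCreated', 'order-43', { total: 5 });

const eventLog: EventEnvelope<unknown>[] = [created, unrelated, reserved];
const trace = traceFlow(eventLog, 'order-42');
```

If any service forgets to call the equivalent of `follow` and starts a fresh correlation ID instead, the chain breaks at that point, which is why propagation has to be a shared convention rather than a per-team choice.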
```typescript
// ORCHESTRATION: Single query shows workflow state
async function debugOrchestration(orderId: string): Promise<DebugInfo> {
  const workflow = await workflowStore.findByCorrelationId(orderId);

  return {
    status: workflow.status,           // 'FAILED'
    currentStep: workflow.currentStep, // 'SCHEDULE_SHIPPING'
    history: workflow.history,         // Complete step-by-step history
    error: workflow.lastError,         // 'ShippingService: 503 Service Unavailable'
    startedAt: workflow.startedAt,
    duration: workflow.duration,
    stepsCompleted: [
      { step: 'CREATE_ORDER', duration: '150ms', output: { orderId: 'abc' } },
      { step: 'RESERVE_INVENTORY', duration: '320ms', output: { reservationId: 'xyz' } },
      { step: 'PROCESS_PAYMENT', duration: '1.2s', output: { paymentId: 'def' } },
      { step: 'SCHEDULE_SHIPPING', duration: '5s', error: '503 Service Unavailable' },
    ],
  };
}

// CHOREOGRAPHY: Must correlate across services and message broker
async function debugChoreography(orderId: string): Promise<DebugInfo> {
  // Query multiple systems
  const [
    orderEvents,
    inventoryEvents,
    paymentEvents,
    shippingEvents,
    deadLetterEvents,
  ] = await Promise.all([
    eventStore.findByCorrelationId(orderId, 'order-service'),
    eventStore.findByCorrelationId(orderId, 'inventory-service'),
    eventStore.findByCorrelationId(orderId, 'payment-service'),
    eventStore.findByCorrelationId(orderId, 'shipping-service'),
    deadLetterQueue.findByCorrelationId(orderId),
  ]);

  // Reconstruct the event chain
  const allEvents = [
    ...orderEvents,
    ...inventoryEvents,
    ...paymentEvents,
    ...shippingEvents,
  ].sort((a, b) => a.timestamp - b.timestamp);

  // Identify gaps
  const expectedFlow = ['OrderCreated', 'InventoryReserved', 'PaymentCompleted', 'ShipmentScheduled'];
  const actualFlow = allEvents.map(e => e.type);
  const missingStep = expectedFlow.find(e => !actualFlow.includes(e));

  // Check dead letter queue
  const failedEvent = deadLetterEvents.find(e => e.eventType === 'PaymentCompleted');

  return {
    reconstructedFlow: allEvents,
    missingStep,                                 // 'ShipmentScheduled'
    lastEvent: allEvents[allEvents.length - 1],  // PaymentCompleted
    deadLetterEvent: failedEvent,                // Event that failed processing
    hypothesis: failedEvent
      ? 'ShippingService failed to process PaymentCompleted event'
      : 'Event might be stuck in message broker or never published',
    nextSteps: [
      'Check shipping-service logs for PaymentCompleted handling',
      'Verify message broker connectivity to shipping-service',
      'Check for consumer lag on shipping topic',
    ],
  };
}
```

Choreography requires significant investment in observability infrastructure. If you don't have distributed tracing, centralized logging, and event replay capabilities, choreography debugging will be painful. Factor this infrastructure cost into your decision.
Conway's Law observes that organizations tend to design systems that mirror their own communication structures. Your choice between choreography and orchestration should align with how your teams are organized and how they want to work.
Choreography Suits:
Orchestration Suits:
| Organizational Pattern | Better Fit | Why |
|---|---|---|
| Spotify Model (Squads, Tribes) | Choreography | Autonomous squads own services; event contracts are inter-squad interfaces |
| Platform + Product Teams | Orchestration | Platform team owns orchestrators; product teams own domain services |
| Single Full-Stack Team | Either | Small teams can manage either; choose based on workflow complexity |
| Outsourced Development | Orchestration | Clear contracts via orchestrator; less reliance on implicit event understanding |
| Regulated Industry | Orchestration | Audit requirements favor explicit workflow definition |
| Startup (< 20 engineers) | Either / Simpler | Avoid over-engineering; often direct calls are sufficient |
Communication Overhead:
Choreography Communication:
Orchestration Communication:
The Key Question: Does your organization communicate better through explicit coordination (orchestration) or through documented contracts and autonomous adoption (choreography)?
Whichever pattern you choose, ensure clear ownership. In choreography, who owns event schemas? In orchestration, who owns the orchestrator? Unclear ownership leads to unmaintained code, schema drift, and operational incidents. Define ownership before implementing either pattern.
The nature of your workflow significantly influences which pattern fits better. Analyze your workflow along these dimensions:
1. Workflow Complexity
Simple linear flow: Order → Inventory → Payment → Shipping
Complex branching logic: If payment fails, try backup method. If premium customer, expedite inventory. If international, check customs.
2. Workflow Change Frequency
Rarely changes (once per quarter):
Frequently changes (weekly):
3. Workflow Duration
Seconds to minutes:
Hours to days (long-running):
| Characteristic | Recommendation | Rationale |
|---|---|---|
| Linear, 3-5 steps | Either / Prefer Choreography | Simple enough that choreography's decoupling is beneficial without complexity cost |
| Branching logic (> 3 conditions) | Orchestration | Complex conditionals are clearer in orchestrator code than distributed event logic |
| Parallel execution paths | Orchestration | Coordinating parallel branches and joins is orchestrator's strength |
| Human approval steps | Orchestration | Durable orchestrator handles wait states naturally |
| Timeouts and SLAs | Orchestration | Centralized timeout management is more reliable |
| Multi-week processes | Orchestration | Long-running state management is orchestrator's core competency |
| Many independent consumers | Choreography | Multiple teams needing to react to events independently |
| Stable, well-understood flow | Either / Prefer Choreography | Loose coupling is valuable when flow rarely changes |
| Experimental, evolving flow | Orchestration | Central location for rapid iteration |
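The "Timeouts and SLAs" and "Human approval steps" rows both rest on the orchestrator tracking every pending step in one place. A minimal sketch of that idea follows, assuming the orchestrator persists a record per in-flight step; `PendingStep` and `findTimedOut` are hypothetical names, and a durable workflow engine would provide this through its own timer mechanism.

```typescript
// Hypothetical record the orchestrator keeps for each in-flight step.
interface PendingStep {
  workflowId: string;
  step: string;
  startedAtMs: number;
  timeoutMs: number;
}

// One central check covers every workflow, whatever the step is waiting on.
function findTimedOut(pending: PendingStep[], nowMs: number): PendingStep[] {
  return pending.filter((p) => nowMs - p.startedAtMs > p.timeoutMs);
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

const pendingSteps: PendingStep[] = [
  // Automated step with a short timeout.
  { workflowId: 'wf-1', step: 'PROCESS_PAYMENT', startedAtMs: 0, timeoutMs: 30000 },
  // Human approval with a long wait state.
  { workflowId: 'wf-2', step: 'MANAGER_APPROVAL', startedAtMs: 0, timeoutMs: ONE_DAY_MS },
];

// One minute after both steps started, only the payment step is overdue.
const overdue = findTimedOut(pendingSteps, 60000);
```

In choreography the equivalent logic has to live in each consuming service, or in a separate watchdog that infers progress from events, which is why the table leans toward orchestration for these cases.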
4. Participant Ownership
Same team owns all participants:
Different teams own different participants:
5. Error Recovery Requirements
Simple retry logic:
Complex compensation logic:
Before deciding, map your actual workflows. Draw the steps, decisions, parallel paths, and error handlers. Count the branches. Identify who owns each participant. This concrete analysis often makes the right choice obvious.
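For the complex compensation case under error recovery, it helps to see how little code central coordination needs. The sketch below runs steps in order and, on failure, undoes the completed ones in reverse. `SagaStep` and `runSaga` are hypothetical names, the steps are synchronous stubs, and a production version would also need persistence, retries, and idempotent compensations.

```typescript
// Hypothetical saga step: a forward action plus the action that undoes it.
interface SagaStep {
  name: string;
  run(): void;
  compensate(): void;
}

interface SagaResult {
  completed: string[];
  compensated: string[];
  error: string | null;
}

// Runs steps in order; on failure, compensates completed steps in reverse order.
function runSaga(steps: SagaStep[]): SagaResult {
  const done: SagaStep[] = [];
  try {
    for (const step of steps) {
      step.run();
      done.push(step);
    }
    return { completed: done.map((s) => s.name), compensated: [], error: null };
  } catch (err) {
    const compensated: string[] = [];
    for (const step of [...done].reverse()) {
      step.compensate();
      compensated.push(step.name);
    }
    return {
      completed: done.map((s) => s.name),
      compensated,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}

const sagaResult = runSaga([
  { name: 'RESERVE_INVENTORY', run: () => {}, compensate: () => {} },
  { name: 'PROCESS_PAYMENT', run: () => {}, compensate: () => {} },
  {
    name: 'SCHEDULE_SHIPPING',
    run: () => {
      throw new Error('503 Service Unavailable');
    },
    compensate: () => {},
  },
]);
```

Here `sagaResult.compensated` lists the payment step before the inventory step, the reverse of the order in which they ran. In choreography the same ordering has to emerge from each service reacting correctly to failure events, which is harder to verify by reading any single codebase.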
Running systems in production surfaces differences that aren't obvious during design. Consider how each pattern affects day-to-day operations.
Monitoring and Alerting:
Choreography Monitoring:
Orchestration Monitoring:
```yaml
# CHOREOGRAPHY MONITORING
# Requires aggregation from multiple sources

choreography_alerts:
  - name: "Saga Completion Rate Drop"
    metric: |
      rate(saga_completed_total[5m]) / rate(saga_started_total[5m]) < 0.95
    severity: warning

  - name: "Consumer Lag High"
    metric: |
      kafka_consumer_lag{consumer_group=~".*-saga-.*"} > 10000
    severity: warning

  - name: "Dead Letter Queue Growing"
    metric: |
      rate(dlq_messages_total[10m]) > 0.1
    severity: critical

  - name: "Saga Duration P99 High"
    # Requires custom aggregation across services
    metric: |
      histogram_quantile(0.99,
        sum by (le) (rate(saga_duration_seconds_bucket{saga_type="order"}[5m]))) > 300
    severity: warning

# ORCHESTRATION MONITORING
# Standard workflow engine metrics

orchestration_alerts:
  - name: "Workflow Failure Rate"
    metric: |
      rate(workflow_executions_failed_total[5m]) / rate(workflow_executions_total[5m]) > 0.05
    severity: critical

  - name: "Step Retry Rate High"
    metric: |
      rate(activity_retries_total[5m]) > 10
    severity: warning

  - name: "Pending Workflows Backlog"
    metric: |
      temporal_pending_workflows > 1000
    severity: warning

  - name: "Workflow Duration P99"
    metric: |
      histogram_quantile(0.99,
        sum by (le) (rate(workflow_execution_duration_bucket{workflow="order"}[5m]))) > 300
    severity: warning
```

Incident Response:
Choreography Incidents:
Orchestration Incidents:
Recovery Procedures:
Choreography's operational overhead is often underestimated. Before choosing choreography, ensure you have: distributed tracing, correlation ID standards, log aggregation, DLQ monitoring, event replay tooling, and runbooks for common failure patterns. Without this foundation, on-call will be painful.
Let's synthesize everything into a practical decision framework. Answer these questions about your specific situation:
Scoring Guide:
Count how many answers point to each pattern:
Red Flags — Avoid Choreography When:
Red Flags — Avoid Orchestration When:
```typescript
// Programmatic decision framework
interface WorkflowContext {
  stepCount: number;
  branchCount: number;
  teamsInvolved: number;
  changeFrequency: 'weekly' | 'monthly' | 'quarterly' | 'rarely';
  hasHumanInLoop: boolean;
  hasStrongCompliance: boolean;
  observabilityMaturity: 'low' | 'medium' | 'high';
  teamAutonomyPriority: 'low' | 'medium' | 'high';
  workflowDuration: 'seconds' | 'minutes' | 'hours' | 'days';
}

interface Recommendation {
  recommendation: 'ORCHESTRATION' | 'CHOREOGRAPHY' | 'HYBRID_OR_EITHER';
  orchestrationScore: number;
  choreographyScore: number;
  confidence: 'high' | 'medium';
  keyFactors: string[];
}

// Placeholder: the original listing leaves this helper undefined.
// This minimal version only names a few inputs and the final scores.
function identifyKeyFactors(
  ctx: WorkflowContext,
  orchestrationScore: number,
  choreographyScore: number,
): string[] {
  const factors: string[] = [];
  if (ctx.branchCount > 3) factors.push('complex branching');
  if (ctx.teamsInvolved >= 4) factors.push('many teams involved');
  if (ctx.hasHumanInLoop) factors.push('human approval steps');
  if (ctx.hasStrongCompliance) factors.push('compliance requirements');
  if (ctx.teamAutonomyPriority === 'high') factors.push('high team autonomy priority');
  factors.push(`scores: orchestration ${orchestrationScore}, choreography ${choreographyScore}`);
  return factors;
}

function recommendCoordinationPattern(ctx: WorkflowContext): Recommendation {
  let orchestrationScore = 0;
  let choreographyScore = 0;

  // Workflow complexity
  if (ctx.branchCount > 3) orchestrationScore += 2;
  if (ctx.stepCount > 6) orchestrationScore += 1;
  if (ctx.stepCount <= 4 && ctx.branchCount <= 1) choreographyScore += 2;

  // Change frequency
  if (ctx.changeFrequency === 'weekly') orchestrationScore += 2;
  if (ctx.changeFrequency === 'rarely') choreographyScore += 1;

  // Team structure
  if (ctx.teamsInvolved === 1) orchestrationScore += 2;
  if (ctx.teamsInvolved >= 4) choreographyScore += 2;

  // Operational readiness
  if (ctx.observabilityMaturity === 'low') orchestrationScore += 2;
  if (ctx.observabilityMaturity === 'high') choreographyScore += 1;

  // Compliance
  if (ctx.hasStrongCompliance) orchestrationScore += 2;

  // Team dynamics
  if (ctx.teamAutonomyPriority === 'high') choreographyScore += 2;

  // Workflow characteristics
  if (ctx.hasHumanInLoop) orchestrationScore += 2;
  if (ctx.workflowDuration === 'hours' || ctx.workflowDuration === 'days') {
    orchestrationScore += 2;
  }

  const recommendation =
    orchestrationScore > choreographyScore ? 'ORCHESTRATION' :
    orchestrationScore < choreographyScore ? 'CHOREOGRAPHY' :
    'HYBRID_OR_EITHER';

  return {
    recommendation,
    orchestrationScore,
    choreographyScore,
    confidence: Math.abs(orchestrationScore - choreographyScore) >= 3 ? 'high' : 'medium',
    keyFactors: identifyKeyFactors(ctx, orchestrationScore, choreographyScore),
  };
}

// Example usage
const orderWorkflow: WorkflowContext = {
  stepCount: 5,
  branchCount: 2,
  teamsInvolved: 4,
  changeFrequency: 'monthly',
  hasHumanInLoop: false,
  hasStrongCompliance: false,
  observabilityMaturity: 'high',
  teamAutonomyPriority: 'high',
  workflowDuration: 'minutes',
};

const result = recommendCoordinationPattern(orderWorkflow);
// recommendation: 'CHOREOGRAPHY', orchestrationScore: 0, choreographyScore: 5, confidence: 'high'
```

If your analysis is inconclusive, consider starting with orchestration. It's generally easier to understand, debug, and modify. You can always evolve to choreography later by having the orchestrator emit events that other services consume. Moving from choreography to orchestration is harder, because you're adding central control to a decentralized system.
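The evolution path of having the orchestrator emit events can be sketched as follows. `OrderOrchestrator`, `Step`, and the event names are hypothetical, and the listener list stands in for publishing to a message broker.

```typescript
// Hypothetical event published by the orchestrator after each step.
interface WorkflowEvent {
  type: string;
  orderId: string;
}

type Listener = (event: WorkflowEvent) => void;

interface Step {
  name: string;
  emits: string; // event type announced once the step succeeds
  execute(orderId: string): void;
}

class OrderOrchestrator {
  private listeners: Listener[] = [];
  private steps: Step[];

  constructor(steps: Step[]) {
    this.steps = steps;
  }

  // Stand-in for a broker subscription.
  onEvent(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Central control is unchanged; events are a side channel for new consumers.
  run(orderId: string): void {
    for (const step of this.steps) {
      step.execute(orderId);
      const event: WorkflowEvent = { type: step.emits, orderId };
      this.listeners.forEach((listener) => listener(event));
    }
  }
}

const executedSteps: string[] = [];
const observedEvents: string[] = [];

const orchestrator = new OrderOrchestrator([
  { name: 'RESERVE_INVENTORY', emits: 'InventoryReserved', execute: () => { executedSteps.push('RESERVE_INVENTORY'); } },
  { name: 'PROCESS_PAYMENT', emits: 'PaymentCompleted', execute: () => { executedSteps.push('PROCESS_PAYMENT'); } },
]);

// A new consumer (say, analytics) reacts to events without becoming a workflow step.
orchestrator.onEvent((event) => {
  observedEvents.push(`${event.type}:${event.orderId}`);
});

orchestrator.run('order-7');
```

Steps that do not need central control can later move out of the `steps` list and become plain event consumers, which shifts the system toward choreography one step at a time.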
We've analyzed the trade-offs between choreography and orchestration across multiple dimensions. Let's consolidate the key insights:
In the next page, we'll explore hybrid approaches—patterns that combine choreography and orchestration to get benefits of both. You'll see how to use orchestration within domains but choreography across domains, and learn architectural patterns that blend the paradigms effectively.
You now have a comprehensive framework for choosing between choreography and orchestration. You understand the trade-offs across coupling, visibility, team dynamics, workflow characteristics, and operations. You can make informed architecture decisions and defend them with concrete reasoning. Next, we'll examine hybrid approaches that combine both patterns.