Loading content...
In traditional ACID transactions, failure triggers automatic rollback—the database undoes all changes as if they never happened. Sagas don't have this luxury. Each step is a committed local transaction; there's no technical mechanism to 'undo' committed data.\n\nThis is where compensating transactions become essential. A compensating transaction (often abbreviated CT) is a business operation that semantically reverses the effect of a previous operation. It doesn't erase history—it adds a new entry that counteracts the original effect.\n\nThe Critical Distinction:\n\n- Technical Rollback: DELETE the row that was INSERTed → row never existed\n- Semantic Compensation: INSERT a cancellation record → order existed, then was cancelled\n\nThis distinction has profound implications for system design, auditability, and user experience.
By the end of this page, you will master the art of designing compensating transactions: understanding their properties, categorizing operation types, handling edge cases, dealing with non-compensatable operations, and implementing robust compensation logic that maintains data consistency in failure scenarios.
Before diving into implementation, let's establish the formal properties that compensating transactions must satisfy. Understanding these properties prevents subtle bugs in saga design.
The Commutativity Challenge:\n\nIn practice, compensations often aren't fully commutative. Consider:\n- T1: Reserve 10 units of product A\n- T2: Charge $100 to customer\n\nIf we compensate in order C(T1), C(T2):\n- Release reservation\n- Refund $100\n\nIf we compensate in order C(T2), C(T1):\n- Refund $100\n- Release reservation\n\nThe final state is the same. But consider:\n- T1: Create order record\n- T2: Send confirmation email\n\nCompensating C(T2) first (send 'order cancelled' email) before C(T1) (cancel order) might confuse monitoring systems that expect the order to be cancelled before notifications are sent.\n\nBest Practice: Execute compensations in reverse order of the original transactions. This maintains logical consistency and simplifies reasoning about state.
Not all operations are equally easy to compensate. Understanding the categories helps design sagas that can actually be rolled back.
| Category | Original Operation | Compensation | Difficulty |
|---|---|---|---|
| Reservable | Reserve inventory | Release reservation | Easy |
| Cancellable | Create pending order | Cancel order | Easy |
| Reversible | Credit account | Debit account | Medium |
| Retriable | Send notification | Send correction notification | Medium |
| Non-Compensatable | Ship physical goods | Cannot undo shipping | Hard/Impossible |
| External Side Effect | Charge credit card | Refund (fees may apply) | Hard |
Let's examine each category in depth:
Reservable operations use a two-phase approach: first reserve resources, then confirm or release.\n\nExamples:\n- Inventory reservation → release or confirm\n- Seat reservation → release or book\n- Payment authorization → void or capture\n- Hotel room hold → release or confirm\n\nImplementation Pattern:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273
interface InventoryReservation { id: string; productId: string; quantity: number; orderId: string; status: 'RESERVED' | 'CONFIRMED' | 'RELEASED'; reservedAt: Date; expiresAt: Date; // Auto-release if not confirmed} class InventoryService { // T: Reserve inventory async reserve(productId: string, quantity: number, orderId: string): Promise<string> { return await this.db.transaction(async (tx) => { // Check available quantity const product = await tx.products.findById(productId); const reservedQty = await tx.reservations.sumQuantityByProductId(productId); const available = product.stockQuantity - reservedQty; if (available < quantity) { throw new InsufficientInventoryError(productId, quantity, available); } // Create reservation const reservation = await tx.reservations.create({ productId, quantity, orderId, status: 'RESERVED', reservedAt: new Date(), expiresAt: new Date(Date.now() + 15 * 60 * 1000) // 15 min TTL }); return reservation.id; }); } // C(T): Release reservation (compensation) async releaseReservation(reservationId: string): Promise<void> { await this.db.transaction(async (tx) => { const reservation = await tx.reservations.findById(reservationId); // Idempotency: already released or confirmed if (!reservation || reservation.status !== 'RESERVED') { return; // Safe to skip } await tx.reservations.update(reservationId, { status: 'RELEASED', releasedAt: new Date() }); }); } // Confirm reservation (for saga success) async confirmReservation(reservationId: string): Promise<void> { await this.db.transaction(async (tx) => { const reservation = await tx.reservations.findById(reservationId); if (!reservation || reservation.status !== 'RESERVED') { throw new ReservationNotFoundError(reservationId); } // Decrement actual stock await tx.products.decrementStock(reservation.productId, reservation.quantity); await tx.reservations.update(reservationId, { status: 'CONFIRMED', confirmedAt: new Date() }); }); }}Designing effective compensations requires careful analysis of each saga step. Here's a systematic approach:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142
// Comprehensive Compensation Implementation interface CompensationContext { sagaId: string; stepId: string; originalInput: unknown; originalOutput: unknown; executedAt: Date; compensationAttempt: number;} interface CompensationResult { success: boolean; requiresRetry: boolean; requiresManualIntervention: boolean; error?: Error;} // Base class for saga steps with built-in compensation supportabstract class CompensatableStep<TInput, TOutput> { abstract name: string; // Execute the forward action abstract execute(input: TInput): Promise<TOutput>; // Execute the compensation abstract compensate(context: CompensationContext): Promise<CompensationResult>; // Validate that compensation is possible abstract canCompensate(context: CompensationContext): Promise<boolean>;} // Example: Order Creation Stepclass CreateOrderStep extends CompensatableStep<CreateOrderInput, Order> { name = 'CreateOrder'; async execute(input: CreateOrderInput): Promise<Order> { const order = await this.orderRepository.create({ customerId: input.customerId, items: input.items, status: 'PENDING', totalAmount: this.calculateTotal(input.items), createdAt: new Date() }); // Emit event for audit trail await this.eventStore.append({ type: 'OrderCreated', aggregateId: order.id, data: order }); return order; } async canCompensate(context: CompensationContext): Promise<boolean> { const order = context.originalOutput as Order; const currentOrder = await this.orderRepository.findById(order.id); // Can only compensate if order hasn't progressed past PENDING if (!currentOrder) return false; if (currentOrder.status === 'SHIPPED') return false; if (currentOrder.status === 'DELIVERED') return false; if (currentOrder.status === 'CANCELLED') return true; // Already compensated return true; } async compensate(context: CompensationContext): Promise<CompensationResult> { const order = context.originalOutput as Order; try { // Idempotency check const currentOrder = await this.orderRepository.findById(order.id); if (!currentOrder) { // Order doesn't exist - maybe already deleted in a previous attempt return { success: true, requiresRetry: false, requiresManualIntervention: false }; } if (currentOrder.status === 'CANCELLED') { // Already compensated return { success: true, requiresRetry: false, requiresManualIntervention: false }; } // Check if compensation is still possible if (!await this.canCompensate(context)) { return { success: false, requiresRetry: false, requiresManualIntervention: true, error: new Error(`Order ${order.id} has progressed to ${currentOrder.status} and cannot be cancelled`) }; } // Execute compensation await this.orderRepository.update(order.id, { status: 'CANCELLED', cancelledAt: new Date(), cancellationReason: 'Saga compensation', compensationSagaId: context.sagaId }); // Emit compensation event await this.eventStore.append({ type: 'OrderCancelled', aggregateId: order.id, data: { orderId: order.id, reason: 'Saga compensation', sagaId: context.sagaId } }); return { success: true, requiresRetry: false, requiresManualIntervention: false }; } catch (error) { // Determine if this is a transient or permanent failure if (this.isTransientError(error)) { return { success: false, requiresRetry: true, requiresManualIntervention: false, error }; } return { success: false, requiresRetry: false, requiresManualIntervention: true, error }; } } private isTransientError(error: Error): boolean { return error.message.includes('connection timeout') || error.message.includes('deadlock') || error.message.includes('temporarily unavailable'); }}Since sagas lack ACID isolation, concurrent sagas can interfere with each other. The Semantic Lock Pattern prevents anomalies by marking data as 'part of an in-progress saga'.\n\nProblem Scenario:\n\nSaga A starts processing Order #123:\n1. Creates order with status PENDING\n2. Reserves inventory\n3. Payment processing begins...\n\nMeanwhile, a customer support system queries Order #123 and sees it as a valid order. Or worse, Saga B tries to modify the same inventory.\n\nSolution:\n\nAdd a sagaState or lockedBySagaId field to entities that indicates whether the entity is currently being modified by a saga.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135
// Semantic Lock Implementation interface SemanticLockable { sagaState: SagaState; sagaId: string | null; sagaLockedAt: Date | null;} type SagaState = | 'NONE' // Not participating in any saga | 'PENDING' // Saga in progress - data may change | 'COMPENSATING' // Saga is rolling back | 'COMMITTED' // Saga completed successfully | 'ABORTED'; // Saga failed and compensation complete interface Order extends SemanticLockable { id: string; status: OrderStatus; customerId: string; // ... other fields} class OrderService { async createOrderWithLock( sagaId: string, orderData: CreateOrderData ): Promise<Order> { return await this.db.transaction(async (tx) => { const order = await tx.orders.create({ ...orderData, status: 'PENDING', sagaState: 'PENDING', sagaId: sagaId, sagaLockedAt: new Date() }); return order; }); } async commitOrderLock(orderId: string, sagaId: string): Promise<void> { await this.db.transaction(async (tx) => { const order = await tx.orders.findById(orderId); // Verify lock ownership if (order.sagaId !== sagaId) { throw new LockOwnershipError( `Order ${orderId} is locked by saga ${order.sagaId}, not ${sagaId}` ); } await tx.orders.update(orderId, { status: 'CONFIRMED', sagaState: 'COMMITTED', sagaId: null, sagaLockedAt: null }); }); } async abortOrderLock(orderId: string, sagaId: string): Promise<void> { await this.db.transaction(async (tx) => { const order = await tx.orders.findById(orderId); if (order.sagaId !== sagaId) { throw new LockOwnershipError( `Order ${orderId} is locked by saga ${order.sagaId}, not ${sagaId}` ); } await tx.orders.update(orderId, { status: 'CANCELLED', sagaState: 'ABORTED', sagaId: null, sagaLockedAt: null }); }); } // Query that respects semantic locks async findActiveOrdersForCustomer(customerId: string): Promise<Order[]> { return await this.db.orders.findMany({ where: { customerId, // Exclude orders being modified by sagas OR: [ { sagaState: 'NONE' }, { sagaState: 'COMMITTED' } ], status: { not: 'CANCELLED' } } }); }} // Saga coordinator that manages locksclass SagaCoordinator { async executeSaga<T>( sagaId: string, steps: SagaStep<T>[] ): Promise<SagaResult<T>> { const lockedResources: LockedResource[] = []; try { for (const step of steps) { const result = await step.execute(); // Track locked resources for cleanup if (result.lockedResource) { lockedResources.push(result.lockedResource); } } // Success: commit all locks for (const resource of lockedResources) { await this.commitLock(resource, sagaId); } return { success: true }; } catch (error) { // Failure: abort all locks (compensation) for (const resource of lockedResources.reverse()) { try { await this.abortLock(resource, sagaId); } catch (abortError) { // Log and continue - best effort cleanup console.error(`Failed to abort lock for ${resource.id}`, abortError); } } return { success: false, error }; } }}Semantic locks must have timeouts! If a saga crashes without cleaning up, resources remain locked indefinitely. Implement a background process that detects stale locks (e.g., sagaLockedAt > 30 minutes ago) and either forces compensation or escalates for manual review.
When a saga fails, compensations must execute in a specific order. Several strategies exist, each with trade-offs:
| Strategy | Description | When to Use |
|---|---|---|
| Reverse Order | Compensate in reverse order of forward execution: C(Tn-1) → C(Tn-2) → ... → C(T1) | Default strategy; maintains logical consistency |
| Parallel Compensation | Execute all compensations concurrently | When compensations are independent and speed is critical |
| Priority-Based | Compensate critical resources first regardless of order | When some compensations are time-sensitive (e.g., payment refunds) |
| Dependency-Ordered | Compensate based on data dependencies | When compensations have actual dependencies on each other |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118
// Various Compensation Ordering Implementations // 1. REVERSE ORDER (Most Common)async function compensateReverseOrder( completedSteps: CompensatableStep[], sagaContext: CompensationContext): Promise<void> { // Reverse the array and compensate sequentially const reversed = [...completedSteps].reverse(); for (const step of reversed) { await step.compensate(sagaContext); }} // 2. PARALLEL COMPENSATION (For Independent Steps)async function compensateParallel( completedSteps: CompensatableStep[], sagaContext: CompensationContext): Promise<CompensationResult[]> { // Execute all compensations concurrently const compensationPromises = completedSteps.map(step => step.compensate(sagaContext) ); const results = await Promise.allSettled(compensationPromises); // Check for any failures const failures = results .filter((r): r is PromiseRejectedResult => r.status === 'rejected'); if (failures.length > 0) { throw new PartialCompensationError(failures); } return results .filter((r): r is PromiseFulfilledResult<CompensationResult> => r.status === 'fulfilled' ) .map(r => r.value);} // 3. DEPENDENCY-BASED COMPENSATIONinterface StepWithDependencies extends CompensatableStep<unknown, unknown> { compensationDependsOn: string[]; // Step names that must compensate first} async function compensateDependencyOrdered( completedSteps: StepWithDependencies[], sagaContext: CompensationContext): Promise<void> { const compensated = new Set<string>(); const remaining = [...completedSteps]; while (remaining.length > 0) { // Find steps with all dependencies satisfied const ready = remaining.filter(step => step.compensationDependsOn.every(dep => compensated.has(dep)) ); if (ready.length === 0 && remaining.length > 0) { throw new CircularDependencyError('Compensation dependency cycle detected'); } // Compensate all ready steps in parallel await Promise.all(ready.map(step => step.compensate(sagaContext))); // Mark as compensated and remove from remaining for (const step of ready) { compensated.add(step.name); remaining.splice(remaining.indexOf(step), 1); } }} // 4. PRIORITY-BASED COMPENSATIONinterface PrioritizedStep extends CompensatableStep<unknown, unknown> { compensationPriority: number; // Lower = higher priority} async function compensatePriorityOrdered( completedSteps: PrioritizedStep[], sagaContext: CompensationContext): Promise<void> { // Sort by priority (lower numbers first) const sorted = [...completedSteps].sort( (a, b) => a.compensationPriority - b.compensationPriority ); // Compensate in priority order for (const step of sorted) { await step.compensate(sagaContext); }} // Example usage with priorityconst orderSagaSteps: PrioritizedStep[] = [ { name: 'CreateOrder', compensationPriority: 100, // Compensate last // ... }, { name: 'ReserveInventory', compensationPriority: 50, // Medium priority // ... }, { name: 'ChargePayment', compensationPriority: 1, // Compensate FIRST - refund ASAP // ... }, { name: 'NotifyCustomer', compensationPriority: 75, // ... }];What happens when a compensation itself fails? This is the nightmare scenario in saga design—you've already failed the forward path, and now the rollback is failing too.\n\nCauses of Compensation Failures:\n\n- Transient failures: Network timeout, database deadlock, service temporarily down\n- Permanent failures: Record no longer exists, business rule violation, external API rejection\n- Inconsistent state: Data changed since original transaction executed\n- Bugs: Compensation logic has errors that weren't caught in testing
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184
// Comprehensive Compensation Failure Handling interface CompensationFailurePolicy { maxRetries: number; retryDelayMs: number; exponentialBackoff: boolean; maxRetryDelayMs: number; onExhausted: 'halt' | 'continue' | 'escalate';} class ResilientCompensationExecutor { private readonly defaultPolicy: CompensationFailurePolicy = { maxRetries: 5, retryDelayMs: 1000, exponentialBackoff: true, maxRetryDelayMs: 60000, onExhausted: 'escalate' }; async executeCompensation( step: CompensatableStep<unknown, unknown>, context: CompensationContext, policy: CompensationFailurePolicy = this.defaultPolicy ): Promise<CompensationResult> { let attempt = 0; let lastError: Error | undefined; let delay = policy.retryDelayMs; while (attempt <= policy.maxRetries) { context.compensationAttempt = attempt; try { const result = await step.compensate(context); if (result.success) { // Log successful compensation await this.auditLog.record({ sagaId: context.sagaId, stepName: step.name, action: 'COMPENSATION_SUCCESS', attempt, timestamp: new Date() }); return result; } if (result.requiresManualIntervention) { // Don't retry - escalate immediately return this.handlePermanentFailure(step, context, result.error); } if (result.requiresRetry) { lastError = result.error; // Fall through to retry logic } } catch (error) { lastError = error as Error; if (this.isPermanentFailure(error)) { return this.handlePermanentFailure(step, context, error as Error); } } // Retry with delay attempt++; if (attempt <= policy.maxRetries) { await this.sleep(delay); if (policy.exponentialBackoff) { delay = Math.min(delay * 2, policy.maxRetryDelayMs); } } } // Retries exhausted return this.handleRetryExhaustion(step, context, policy, lastError!); } private async handlePermanentFailure( step: CompensatableStep<unknown, unknown>, context: CompensationContext, error: Error ): Promise<CompensationResult> { // Log the permanent failure await this.auditLog.record({ sagaId: context.sagaId, stepName: step.name, action: 'COMPENSATION_PERMANENT_FAILURE', error: error.message, timestamp: new Date() }); // Create manual intervention ticket await this.ticketService.createTicket({ type: 'SAGA_COMPENSATION_FAILURE', priority: 'HIGH', sagaId: context.sagaId, stepName: step.name, error: error.message, context: { originalInput: context.originalInput, originalOutput: context.originalOutput, executedAt: context.executedAt } }); // Alert on-call engineer await this.alerting.page({ severity: 'high', message: `Saga compensation failed permanently: ${context.sagaId}`, runbookUrl: 'https://runbooks/saga-compensation-failure' }); return { success: false, requiresRetry: false, requiresManualIntervention: true, error }; } private async handleRetryExhaustion( step: CompensatableStep<unknown, unknown>, context: CompensationContext, policy: CompensationFailurePolicy, lastError: Error ): Promise<CompensationResult> { await this.auditLog.record({ sagaId: context.sagaId, stepName: step.name, action: 'COMPENSATION_RETRIES_EXHAUSTED', maxRetries: policy.maxRetries, error: lastError.message, timestamp: new Date() }); switch (policy.onExhausted) { case 'halt': // Stop all compensation - saga stuck in COMPENSATING state throw new CompensationHaltError( `Compensation for ${step.name} failed after ${policy.maxRetries} retries`, lastError ); case 'continue': // Log and continue with remaining compensations // This may leave partial state! console.warn( `Compensation for ${step.name} failed, continuing with remaining steps` ); return { success: false, requiresRetry: false, requiresManualIntervention: true, error: lastError }; case 'escalate': default: return this.handlePermanentFailure(step, context, lastError); } } private isPermanentFailure(error: unknown): boolean { if (!(error instanceof Error)) return true; const permanentErrorPatterns = [ 'record not found', 'constraint violation', 'insufficient permissions', 'invalid state transition', 'business rule violation' ]; return permanentErrorPatterns.some(pattern => error.message.toLowerCase().includes(pattern) ); } private sleep(ms: number): Promise<void> { return new Promise(resolve => setTimeout(resolve, ms)); }}The scariest compensation failures are those that sometimes succeed and sometimes fail for the same inputs. These 'Heisencompensations' are often caused by race conditions, non-deterministic external services, or time-dependent state. Combat them with comprehensive logging, idempotency guarantees, and deterministic compensation design.
Compensation code is some of the least-tested code in most systems—it's rarely executed in production, so bugs hide there. Rigorous testing is essential.
execute(x); compensate(x) leaves the system in a consistent state.123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123
// Comprehensive Compensation Testing describe('CreateOrderStep Compensation', () => { describe('Happy Path', () => { it('should cancel a pending order', async () => { // Arrange const order = await createTestOrder({ status: 'PENDING' }); const context = createCompensationContext(order); // Act const result = await createOrderStep.compensate(context); // Assert expect(result.success).toBe(true); const updatedOrder = await orderRepo.findById(order.id); expect(updatedOrder.status).toBe('CANCELLED'); }); }); describe('Idempotency', () => { it('should be safe to execute compensation multiple times', async () => { const order = await createTestOrder({ status: 'PENDING' }); const context = createCompensationContext(order); // Execute compensation twice await createOrderStep.compensate(context); await createOrderStep.compensate(context); // Verify consistent state const updatedOrder = await orderRepo.findById(order.id); expect(updatedOrder.status).toBe('CANCELLED'); // Verify no duplicate events const events = await eventStore.getEventsForAggregate(order.id); const cancellationEvents = events.filter(e => e.type === 'OrderCancelled'); expect(cancellationEvents.length).toBe(1); }); it('should handle concurrent compensation attempts', async () => { const order = await createTestOrder({ status: 'PENDING' }); const context = createCompensationContext(order); // Execute compensation concurrently const results = await Promise.all([ createOrderStep.compensate(context), createOrderStep.compensate(context), createOrderStep.compensate(context) ]); // All should succeed (idempotently) expect(results.every(r => r.success)).toBe(true); }); }); describe('Non-Compensatable States', () => { it('should fail gracefully when order is shipped', async () => { const order = await createTestOrder({ status: 'SHIPPED' }); const context = createCompensationContext(order); const result = await createOrderStep.compensate(context); expect(result.success).toBe(false); expect(result.requiresManualIntervention).toBe(true); expect(result.error.message).toContain('cannot be cancelled'); }); }); describe('Transient Failure Recovery', () => { it('should succeed after transient database failure', async () => { const order = await createTestOrder({ status: 'PENDING' }); const context = createCompensationContext(order); // Simulate transient failure on first attempt let attempts = 0; jest.spyOn(orderRepo, 'update').mockImplementation(async (...args) => { attempts++; if (attempts === 1) { throw new Error('connection timeout'); } return originalUpdate(...args); }); const result = await compensationExecutor.executeCompensation( createOrderStep, context, { maxRetries: 3, retryDelayMs: 10, exponentialBackoff: false } ); expect(result.success).toBe(true); expect(attempts).toBe(2); }); });}); // Property-based testing for compensation correctnessdescribe('Compensation Consistency Properties', () => { it('execute then compensate should return to consistent state', async () => { await fc.assert( fc.asyncProperty( orderDataArbitrary, async (orderData) => { // Get initial state const initialState = await captureSystemState(); // Execute forward transaction const order = await createOrderStep.execute(orderData); // Execute compensation await createOrderStep.compensate(createCompensationContext(order)); // Verify consistent state const finalState = await captureSystemState(); // Orders should be balanced (one created, one cancelled) expect(finalState.orderCount).toBe(initialState.orderCount + 1); expect(finalState.cancelledOrderCount).toBe(initialState.cancelledOrderCount + 1); expect(finalState.activeOrderCount).toBe(initialState.activeOrderCount); } ) ); });});Compensating transactions are the mechanism that makes sagas viable. Without well-designed compensations, sagas would leave systems in inconsistent states when failures occur.
What's Next:\n\nEven with well-designed compensations, failures happen. The next page explores failure handling in depth—understanding failure modes, implementing circuit breakers for saga steps, managing partial failures, and building truly resilient saga architectures.
You now have a comprehensive understanding of compensating transactions—from formal properties to production testing strategies. This knowledge is essential for building sagas that maintain consistency even in failure scenarios.