Saga Pattern - Learning Module

Compensating Transactions

The Heart of the Saga Pattern

In traditional ACID transactions, failure triggers automatic rollback: the database undoes all changes as if they never happened. Sagas don't have this luxury. Each step is a committed local transaction, so there is no technical mechanism to 'undo' committed data.

This is where compensating transactions become essential. A compensating transaction (often abbreviated CT) is a business operation that semantically reverses the effect of a previous operation. It doesn't erase history; it adds a new entry that counteracts the original effect.

The Critical Distinction:

- Technical Rollback: DELETE the row that was INSERTed → the row never existed
- Semantic Compensation: INSERT a cancellation record → the order existed, then was cancelled

This distinction has profound implications for system design, auditability, and user experience.
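The difference between erasing and counteracting can be sketched with a tiny in-memory event log. The `OrderEvent` type and the three functions are hypothetical names chosen for illustration; they are not part of any library or of the services shown later on this page.

```typescript
// Hypothetical in-memory model; not tied to any database or library.
type OrderEvent =
  | { kind: 'OrderPlaced'; orderId: string; amount: number }
  | { kind: 'OrderCancelled'; orderId: string; reason: string };

// Technical rollback: remove the record, leaving no trace that it ever existed.
function technicalRollback(log: OrderEvent[], orderId: string): OrderEvent[] {
  return log.filter((e) => e.orderId !== orderId);
}

// Semantic compensation: append a record that counteracts the original.
function semanticCompensation(log: OrderEvent[], orderId: string, reason: string): OrderEvent[] {
  return [...log, { kind: 'OrderCancelled', orderId, reason }];
}

// An order is active if it was placed and never cancelled.
function isActive(log: OrderEvent[], orderId: string): boolean {
  const placed = log.some((e) => e.kind === 'OrderPlaced' && e.orderId === orderId);
  const cancelled = log.some((e) => e.kind === 'OrderCancelled' && e.orderId === orderId);
  return placed && !cancelled;
}

const log: OrderEvent[] = [{ kind: 'OrderPlaced', orderId: 'o-1', amount: 100 }];
const rolledBack = technicalRollback(log, 'o-1');                       // history erased
const compensated = semanticCompensation(log, 'o-1', 'payment failed'); // history kept
```

Both results describe an order that is no longer active, but only `compensated` can still answer the audit question "was this order ever placed?".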

What You Will Learn

By the end of this page, you will be able to design compensating transactions: understand their properties, categorize operation types, handle edge cases, deal with non-compensatable operations, and implement robust compensation logic that maintains data consistency in failure scenarios.

Formal Properties of Compensating Transactions

Before diving into implementation, let's establish the formal properties that compensating transactions must satisfy. Understanding these properties prevents subtle bugs in saga design.

Essential Properties of Compensating Transactions

• Semantic Inverse: C(T) must reverse the business effect of T, not necessarily the technical effect. If T reserves inventory, C(T) must release that reservation, not delete the reservation record.
• Idempotency: Executing C(T) multiple times must have the same effect as executing it once. Compensation may be retried due to failures; duplicate execution must be safe.
• Commutativity with Other Compensations: When compensating multiple steps, the order of compensations should not affect the final state (when possible). C(T1);C(T2) ≈ C(T2);C(T1).
• Atomicity: Each compensating transaction must be atomic: it either fully succeeds or fully fails. Partially compensated state is dangerous.
• Determinism: Given the same input state, C(T) must always produce the same output state. Non-deterministic compensations break recovery guarantees.
• Eventual Success: Compensations should be designed to eventually succeed. If C(T) can permanently fail, the saga may leave data in an inconsistent state forever.
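Of these properties, idempotency is the one most easily shown in code. The following is a minimal sketch, assuming a hypothetical in-memory `Ledger` and an idempotency key built from the saga and step identifiers; a real system would enforce the same rule with a unique constraint in its own datastore.

```typescript
// Hypothetical ledger entry; `key` is an assumed idempotency key.
interface LedgerEntry {
  key: string;
  accountId: string;
  delta: number;
}

class Ledger {
  private readonly entries: LedgerEntry[] = [];

  // Applies the entry once; a replay with the same key is ignored.
  apply(entry: LedgerEntry): boolean {
    if (this.entries.some((e) => e.key === entry.key)) return false; // duplicate delivery
    this.entries.push(entry);
    return true;
  }

  balance(accountId: string): number {
    return this.entries
      .filter((e) => e.accountId === accountId)
      .reduce((sum, e) => sum + e.delta, 0);
  }
}

// C(T) for a charge: the key is derived from saga and step, so retries collide on it.
function refund(ledger: Ledger, sagaId: string, stepId: string, accountId: string, amount: number): boolean {
  return ledger.apply({ key: `${sagaId}:${stepId}:refund`, accountId, delta: amount });
}

const ledger = new Ledger();
ledger.apply({ key: 'saga-1:charge', accountId: 'acct-1', delta: -100 }); // T
const firstDelivery = refund(ledger, 'saga-1', 'charge', 'acct-1', 100);  // C(T)
const retriedDelivery = refund(ledger, 'saga-1', 'charge', 'acct-1', 100); // C(T) again
```

The retried refund is recognised by its key and skipped, so the customer is refunded exactly once no matter how many times the compensation is delivered.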

The Commutativity Challenge:

In practice, compensations often aren't fully commutative. Consider:
- T1: Reserve 10 units of product A
- T2: Charge $100 to customer

If we compensate in order C(T1), C(T2):
- Release reservation
- Refund $100

If we compensate in order C(T2), C(T1):
- Refund $100
- Release reservation

The final state is the same. But consider:
- T1: Create order record
- T2: Send confirmation email

Both orders again reach the same final state, but the intermediate states differ. Running C(T2) first (send 'order cancelled' email) before C(T1) (cancel order) might confuse monitoring systems that expect the order to be cancelled before notifications are sent. Note that this is the sequence strict reverse order produces, so a dependency like this one has to be handled explicitly.

Best Practice: Execute compensations in reverse order of the original transactions by default. This maintains logical consistency and simplifies reasoning about state. Where one compensation depends on the outcome of another, as the cancellation email does here, order those two by their dependency instead.
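Whether two compensations commute can be checked mechanically by applying them in both orders and comparing the results. The sketch below models the reserve/charge example as pure functions over a small state object; `SagaState`, `commute`, and the restocking-fee variant are hypothetical and exist only to illustrate the check.

```typescript
interface SagaState {
  reservedUnits: number;
  customerBalance: number;
}
type Compensation = (s: SagaState) => SagaState;

const releaseReservation: Compensation = (s) => ({ ...s, reservedUnits: s.reservedUnits - 10 });
const refundCharge: Compensation = (s) => ({ ...s, customerBalance: s.customerBalance + 100 });

function applyInOrder(initial: SagaState, steps: Compensation[]): SagaState {
  return steps.reduce((state, step) => step(state), initial);
}

// True when a;b and b;a reach the same final state from `initial`.
function commute(a: Compensation, b: Compensation, initial: SagaState): boolean {
  const ab = applyInOrder(initial, [a, b]);
  const ba = applyInOrder(initial, [b, a]);
  return ab.reservedUnits === ba.reservedUnits && ab.customerBalance === ba.customerBalance;
}

// State after T1 (reserve 10 units) and T2 (charge $100) have both committed.
const afterForwardSteps: SagaState = { reservedUnits: 10, customerBalance: -100 };
const independent = commute(releaseReservation, refundCharge, afterForwardSteps);

// A compensation that READS state another compensation WRITES does not commute.
// Assumption for illustration: a fee is withheld while units are still reserved.
const refundMinusFee: Compensation = (s) => ({
  ...s,
  customerBalance: s.customerBalance + (s.reservedUnits > 0 ? 95 : 100),
});
const dependent = commute(releaseReservation, refundMinusFee, afterForwardSteps);
```

Here `independent` is true and `dependent` is false. The general rule the sketch suggests is that compensations touching disjoint state commute, while a compensation that reads what another writes needs an explicit order.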

Categorizing Operations by Compensatability

Not all operations are equally easy to compensate. Understanding the categories helps design sagas that can actually be rolled back.

Operation Categories and Compensation Strategies
Category	Original Operation	Compensation	Difficulty
Reservable	Reserve inventory	Release reservation	Easy
Cancellable	Create pending order	Cancel order	Easy
Reversible	Credit account	Debit account	Medium
Retriable	Send notification	Send correction notification	Medium
Non-Compensatable	Ship physical goods	Cannot undo shipping	Hard/Impossible
External Side Effect	Charge credit card	Refund (fees may apply)	Hard

Let's examine each category in depth:

Reservable operations use a two-phase approach: first reserve resources, then confirm or release.

Examples:
- Inventory reservation → release or confirm
- Seat reservation → release or book
- Payment authorization → void or capture
- Hotel room hold → release or confirm

Implementation Pattern:

TypeScript
interface InventoryReservation {
  id: string;
  productId: string;
  quantity: number;
  orderId: string;
  status: 'RESERVED' | 'CONFIRMED' | 'RELEASED';
  reservedAt: Date;
  expiresAt: Date;  // Auto-release if not confirmed
}
 
class InventoryService {
  // T: Reserve inventory
  async reserve(productId: string, quantity: number, orderId: string): Promise<string> {
    return await this.db.transaction(async (tx) => {
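      // Check available quantity. Only active (RESERVED) holds count against stock:
      // RELEASED holds are free again, and CONFIRMED holds were already deducted
      // from stockQuantity by confirmReservation. The status filter argument is an
      // assumed extension of this illustrative repository API.
      const product = await tx.products.findById(productId);
      const reservedQty = await tx.reservations.sumQuantityByProductId(productId, { status: 'RESERVED' });
      const available = product.stockQuantity - reservedQty;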
      
      if (available < quantity) {
        throw new InsufficientInventoryError(productId, quantity, available);
      }
 
      // Create reservation
      const reservation = await tx.reservations.create({
        productId,
        quantity,
        orderId,
        status: 'RESERVED',
        reservedAt: new Date(),
        expiresAt: new Date(Date.now() + 15 * 60 * 1000) // 15 min TTL
      });
 
      return reservation.id;
    });
  }
 
  // C(T): Release reservation (compensation)
  async releaseReservation(reservationId: string): Promise<void> {
    await this.db.transaction(async (tx) => {
      const reservation = await tx.reservations.findById(reservationId);
      
      // Idempotency: already released or confirmed
      if (!reservation || reservation.status !== 'RESERVED') {
        return; // Safe to skip
      }
 
      await tx.reservations.update(reservationId, {
        status: 'RELEASED',
        releasedAt: new Date()
      });
    });
  }
 
  // Confirm reservation (for saga success)
  async confirmReservation(reservationId: string): Promise<void> {
    await this.db.transaction(async (tx) => {
      const reservation = await tx.reservations.findById(reservationId);
      
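      if (!reservation) {
        throw new ReservationNotFoundError(reservationId);
      }

      // Idempotency: a retried confirmation must not decrement stock twice
      if (reservation.status === 'CONFIRMED') {
        return;
      }

      // A RELEASED (e.g. expired) reservation can no longer be confirmed
      if (reservation.status !== 'RESERVED') {
        throw new ReservationNotFoundError(reservationId);
      }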
 
      // Decrement actual stock
      await tx.products.decrementStock(reservation.productId, reservation.quantity);
      
      await tx.reservations.update(reservationId, {
        status: 'CONFIRMED',
        confirmedAt: new Date()
      });
    });
  }
}
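The `expiresAt` field above implies a background job that releases holds nobody confirmed. The original listing does not show one, so the following is only a sketch of what such a sweep might look like over an in-memory array; `releaseExpired` and the sample data are hypothetical.

```typescript
type ReservationStatus = 'RESERVED' | 'CONFIRMED' | 'RELEASED';

interface Reservation {
  id: string;
  quantity: number;
  status: ReservationStatus;
  expiresAt: Date;
}

// Releases every reservation that is still RESERVED and past its expiry.
// Returns the ids released on this pass; CONFIRMED and RELEASED rows are untouched,
// so running the sweep again is harmless.
function releaseExpired(reservations: Reservation[], now: Date): string[] {
  const released: string[] = [];
  for (const r of reservations) {
    if (r.status === 'RESERVED' && r.expiresAt.getTime() <= now.getTime()) {
      r.status = 'RELEASED';
      released.push(r.id);
    }
  }
  return released;
}

const now = new Date('2024-01-01T12:00:00Z');
const minutesFromNow = (n: number): Date => new Date(now.getTime() + n * 60 * 1000);

const reservations: Reservation[] = [
  { id: 'r1', quantity: 2, status: 'RESERVED', expiresAt: minutesFromNow(-5) },   // expired hold
  { id: 'r2', quantity: 1, status: 'RESERVED', expiresAt: minutesFromNow(10) },   // still within TTL
  { id: 'r3', quantity: 4, status: 'CONFIRMED', expiresAt: minutesFromNow(-30) }, // confirmed, never released
];

const firstPass = releaseExpired(reservations, now);
const secondPass = releaseExpired(reservations, now);
```

The first pass releases only `r1`; the second pass finds nothing left to do. Passing `now` in as a parameter rather than reading the clock inside the function keeps the sweep easy to test.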

Designing Robust Compensating Transactions

Designing effective compensations requires careful analysis of each saga step. Here's a systematic approach:

Compensation Design Checklist

• Capture Sufficient Context: The compensation must have all information needed to reverse the operation. Store IDs, amounts, timestamps, and any state needed for rollback.
• Handle Partial Execution: If a step partially completed before failing, compensation must handle the partial state. Reserved 3 of 5 items? Release exactly those 3.
• Design for Idempotency: Use idempotency keys, version checks, or status guards. Never assume the compensation is running for the first time.
• Consider Timing: Compensation might run minutes, hours, or days after the original operation. Consider state changes in between (e.g., item price changed).
• Plan for Compensation Failure: What if the compensation itself fails? Have retry strategies and escalation paths.
• Test Compensation Paths: Compensation code is rarely executed in production. Test it explicitly; don't discover bugs during actual failures.
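The first two checklist items work together: the forward step records exactly what it did, and the compensation reverses only that. A minimal sketch follows, assuming a hypothetical in-memory stock table and unique SKUs per order; `reserveAll`, `releaseReserved`, and `ReserveOutcome` are illustrative names.

```typescript
interface LineItem {
  sku: string;
  quantity: number;
}

// What the forward step actually did; stored so the compensation can read it later.
interface ReserveOutcome {
  reservedSkus: string[];
  failedSku?: string;
}

type Stock = Record<string, number>;

// Forward step: reserves items one by one and stops at the first shortage.
function reserveAll(stock: Stock, items: LineItem[]): ReserveOutcome {
  const reservedSkus: string[] = [];
  for (const item of items) {
    const available = stock[item.sku] || 0;
    if (available < item.quantity) {
      return { reservedSkus, failedSku: item.sku };
    }
    stock[item.sku] = available - item.quantity;
    reservedSkus.push(item.sku);
  }
  return { reservedSkus };
}

// Compensation: releases only what the recorded outcome says was reserved.
function releaseReserved(stock: Stock, items: LineItem[], outcome: ReserveOutcome): void {
  for (const item of items) {
    if (outcome.reservedSkus.indexOf(item.sku) !== -1) {
      stock[item.sku] = (stock[item.sku] || 0) + item.quantity;
    }
  }
}

const stock: Stock = { a: 5, b: 5, c: 0 };
const items: LineItem[] = [
  { sku: 'a', quantity: 2 },
  { sku: 'b', quantity: 1 },
  { sku: 'c', quantity: 1 }, // out of stock, so the step fails here
];

const outcome = reserveAll(stock, items);   // reserves a and b, fails on c
releaseReserved(stock, items, outcome);     // restores a and b, leaves c alone
```

Had the compensation blindly released every item in the order, it would have credited one unit of `c` that was never taken.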

comprehensive-compensation.ts
TypeScript
// Comprehensive Compensation Implementation
 
interface CompensationContext {
  sagaId: string;
  stepId: string;
  originalInput: unknown;
  originalOutput: unknown;
  executedAt: Date;
  compensationAttempt: number;
}
 
interface CompensationResult {
  success: boolean;
  requiresRetry: boolean;
  requiresManualIntervention: boolean;
  error?: Error;
}
 
// Base class for saga steps with built-in compensation support
abstract class CompensatableStep<TInput, TOutput> {
  abstract name: string;
  
  // Execute the forward action
  abstract execute(input: TInput): Promise<TOutput>;
  
  // Execute the compensation
  abstract compensate(context: CompensationContext): Promise<CompensationResult>;
  
  // Validate that compensation is possible
  abstract canCompensate(context: CompensationContext): Promise<boolean>;
}
 
// Example: Order Creation Step
class CreateOrderStep extends CompensatableStep<CreateOrderInput, Order> {
  name = 'CreateOrder';
  
  async execute(input: CreateOrderInput): Promise<Order> {
    const order = await this.orderRepository.create({
      customerId: input.customerId,
      items: input.items,
      status: 'PENDING',
      totalAmount: this.calculateTotal(input.items),
      createdAt: new Date()
    });
    
    // Emit event for audit trail
    await this.eventStore.append({
      type: 'OrderCreated',
      aggregateId: order.id,
      data: order
    });
    
    return order;
  }
  
  async canCompensate(context: CompensationContext): Promise<boolean> {
    const order = context.originalOutput as Order;
    const currentOrder = await this.orderRepository.findById(order.id);
    
    // Compensation is impossible once the order has shipped or been delivered
    if (!currentOrder) return false;
    if (currentOrder.status === 'SHIPPED') return false;
    if (currentOrder.status === 'DELIVERED') return false;
    if (currentOrder.status === 'CANCELLED') return true; // Already compensated
    
    return true;
  }
  
  async compensate(context: CompensationContext): Promise<CompensationResult> {
    const order = context.originalOutput as Order;
    
    try {
      // Idempotency check
      const currentOrder = await this.orderRepository.findById(order.id);
      
      if (!currentOrder) {
        // Order doesn't exist - maybe already deleted in a previous attempt
        return { success: true, requiresRetry: false, requiresManualIntervention: false };
      }
      
      if (currentOrder.status === 'CANCELLED') {
        // Already compensated
        return { success: true, requiresRetry: false, requiresManualIntervention: false };
      }
      
      // Check if compensation is still possible
      if (!await this.canCompensate(context)) {
        return {
          success: false,
          requiresRetry: false,
          requiresManualIntervention: true,
          error: new Error(`Order ${order.id} has progressed to ${currentOrder.status} and cannot be cancelled`)
        };
      }
      
      // Execute compensation
      await this.orderRepository.update(order.id, {
        status: 'CANCELLED',
        cancelledAt: new Date(),
        cancellationReason: 'Saga compensation',
        compensationSagaId: context.sagaId
      });
      
      // Emit compensation event
      await this.eventStore.append({
        type: 'OrderCancelled',
        aggregateId: order.id,
        data: {
          orderId: order.id,
          reason: 'Saga compensation',
          sagaId: context.sagaId
        }
      });
      
      return { success: true, requiresRetry: false, requiresManualIntervention: false };
      
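    } catch (error) {
      // `error` is `unknown` in a catch clause; normalize it before inspecting
      const err = error instanceof Error ? error : new Error(String(error));

      // Determine if this is a transient or permanent failure
      if (this.isTransientError(err)) {
        return {
          success: false,
          requiresRetry: true,
          requiresManualIntervention: false,
          error: err
        };
      }
      
      return {
        success: false,
        requiresRetry: false,
        requiresManualIntervention: true,
        error: err
      };
    }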
  }
  
  private isTransientError(error: Error): boolean {
    return error.message.includes('connection timeout') ||
           error.message.includes('deadlock') ||
           error.message.includes('temporarily unavailable');
  }
}
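The read-then-update sequence in `compensate` leaves a window in which two concurrent attempts can both see PENDING and both emit a cancellation event. One common way to close that window is to fold the status guard into the write itself. The sketch below assumes a hypothetical `OrderStore` whose `updateWhere` stands in for a conditional `UPDATE ... WHERE id = ? AND status = ?` that reports how many rows it changed.

```typescript
type OrderStatus = 'PENDING' | 'CANCELLED' | 'SHIPPED';

interface OrderRow {
  id: string;
  status: OrderStatus;
}

// Hypothetical store; a real one would run the guard inside the database.
class OrderStore {
  private readonly rows: OrderRow[];

  constructor(rows: OrderRow[]) {
    this.rows = rows;
  }

  // Changes the status only if the row is still in the expected state.
  // Returns the number of rows changed (0 or 1 for a unique id).
  updateWhere(id: string, expected: OrderStatus, next: OrderStatus): number {
    let changed = 0;
    for (const row of this.rows) {
      if (row.id === id && row.status === expected) {
        row.status = next;
        changed++;
      }
    }
    return changed;
  }
}

const emittedEvents: string[] = [];

function cancelOnce(store: OrderStore, orderId: string): 'cancelled' | 'no-op' {
  const changed = store.updateWhere(orderId, 'PENDING', 'CANCELLED');
  if (changed === 1) {
    emittedEvents.push(`OrderCancelled:${orderId}`); // only the winning attempt gets here
    return 'cancelled';
  }
  return 'no-op'; // already cancelled, or no longer PENDING
}

const rows: OrderRow[] = [{ id: 'o-1', status: 'PENDING' }];
const store = new OrderStore(rows);
const firstAttempt = cancelOnce(store, 'o-1');
const secondAttempt = cancelOnce(store, 'o-1');
```

Only the attempt that actually changed the row emits the event, so duplicates are ruled out by the write rather than by a prior read. A caller that needs to tell "already cancelled" apart from "shipped" would still read the status after a `no-op`.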

The Semantic Lock Pattern

Since sagas lack ACID isolation, concurrent sagas can interfere with each other. The Semantic Lock Pattern prevents anomalies by marking data as 'part of an in-progress saga'.

Problem Scenario:

Saga A starts processing Order #123:
1. Creates order with status PENDING
2. Reserves inventory
3. Payment processing begins...

Meanwhile, a customer support system queries Order #123 and sees it as a valid order. Or worse, Saga B tries to modify the same inventory.

Solution:

Add a sagaState or lockedBySagaId field to entities that indicates whether the entity is currently being modified by a saga.

semantic-lock.ts
TypeScript
// Semantic Lock Implementation
 
interface SemanticLockable {
  sagaState: SagaState;
  sagaId: string | null;
  sagaLockedAt: Date | null;
}
 
type SagaState = 
  | 'NONE'           // Not participating in any saga
  | 'PENDING'        // Saga in progress - data may change
  | 'COMPENSATING'   // Saga is rolling back
  | 'COMMITTED'      // Saga completed successfully
  | 'ABORTED';       // Saga failed and compensation complete
 
interface Order extends SemanticLockable {
  id: string;
  status: OrderStatus;
  customerId: string;
  // ... other fields
}
 
class OrderService {
  async createOrderWithLock(
    sagaId: string, 
    orderData: CreateOrderData
  ): Promise<Order> {
    return await this.db.transaction(async (tx) => {
      const order = await tx.orders.create({
        ...orderData,
        status: 'PENDING',
        sagaState: 'PENDING',
        sagaId: sagaId,
        sagaLockedAt: new Date()
      });
      
      return order;
    });
  }
  
  async commitOrderLock(orderId: string, sagaId: string): Promise<void> {
    await this.db.transaction(async (tx) => {
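      const order = await tx.orders.findById(orderId);

      // Guard against a missing row before dereferencing it
      if (!order) {
        throw new Error(`Order ${orderId} not found`);
      }
      
      // Verify lock ownership
      if (order.sagaId !== sagaId) {
        throw new LockOwnershipError(
          `Order ${orderId} is locked by saga ${order.sagaId}, not ${sagaId}`
        );
      }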
      
      await tx.orders.update(orderId, {
        status: 'CONFIRMED',
        sagaState: 'COMMITTED',
        sagaId: null,
        sagaLockedAt: null
      });
    });
  }
  
  async abortOrderLock(orderId: string, sagaId: string): Promise<void> {
    await this.db.transaction(async (tx) => {
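      const order = await tx.orders.findById(orderId);

      // Guard against a missing row before dereferencing it
      if (!order) {
        throw new Error(`Order ${orderId} not found`);
      }
      
      // Verify lock ownership
      if (order.sagaId !== sagaId) {
        throw new LockOwnershipError(
          `Order ${orderId} is locked by saga ${order.sagaId}, not ${sagaId}`
        );
      }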
      
      await tx.orders.update(orderId, {
        status: 'CANCELLED',
        sagaState: 'ABORTED',
        sagaId: null,
        sagaLockedAt: null
      });
    });
  }
  
  // Query that respects semantic locks
  async findActiveOrdersForCustomer(customerId: string): Promise<Order[]> {
    return await this.db.orders.findMany({
      where: {
        customerId,
        // Exclude orders being modified by sagas
        OR: [
          { sagaState: 'NONE' },
          { sagaState: 'COMMITTED' }
        ],
        status: { not: 'CANCELLED' }
      }
    });
  }
}
 
// Saga coordinator that manages locks
class SagaCoordinator {
  async executeSaga<T>(
    sagaId: string,
    steps: SagaStep<T>[]
  ): Promise<SagaResult<T>> {
    const lockedResources: LockedResource[] = [];
    
    try {
      for (const step of steps) {
        const result = await step.execute();
        
        // Track locked resources for cleanup
        if (result.lockedResource) {
          lockedResources.push(result.lockedResource);
        }
      }
      
      // Success: commit all locks
      for (const resource of lockedResources) {
        await this.commitLock(resource, sagaId);
      }
      
      return { success: true };
      
    } catch (error) {
      // Failure: abort all locks (compensation)
      for (const resource of lockedResources.reverse()) {
        try {
          await this.abortLock(resource, sagaId);
        } catch (abortError) {
          // Log and continue - best effort cleanup
          console.error(`Failed to abort lock for ${resource.id}`, abortError);
        }
      }
      
      return { success: false, error };
    }
  }
}

Lock Timeout Handling

Semantic locks must have timeouts! If a saga crashes without cleaning up, resources remain locked indefinitely. Implement a background process that detects stale locks (e.g., entities whose sagaLockedAt is more than 30 minutes old) and either forces compensation or escalates for manual review.
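The detection half of such a background process can be sketched as a pure function over the lock fields introduced earlier. The 30-minute threshold comes from the text; `LockedEntity` and `findStaleLocks` are hypothetical names, and what to do with each stale lock (force compensation or escalate) is left to the caller.

```typescript
type LockState = 'NONE' | 'PENDING' | 'COMPENSATING' | 'COMMITTED' | 'ABORTED';

interface LockedEntity {
  id: string;
  sagaState: LockState;
  sagaId: string | null;
  sagaLockedAt: Date | null;
}

const STALE_AFTER_MS = 30 * 60 * 1000; // 30 minutes, as suggested in the text

// Returns entities still held by a saga (PENDING or COMPENSATING)
// whose lock is older than the threshold.
function findStaleLocks(entities: LockedEntity[], now: Date): LockedEntity[] {
  return entities.filter(
    (e) =>
      (e.sagaState === 'PENDING' || e.sagaState === 'COMPENSATING') &&
      e.sagaLockedAt !== null &&
      now.getTime() - e.sagaLockedAt.getTime() > STALE_AFTER_MS
  );
}

const sweepTime = new Date('2024-01-01T12:00:00Z');
const minutesAgo = (n: number): Date => new Date(sweepTime.getTime() - n * 60 * 1000);

const lockedEntities: LockedEntity[] = [
  { id: 'order-1', sagaState: 'PENDING', sagaId: 'saga-a', sagaLockedAt: minutesAgo(45) },      // stale
  { id: 'order-2', sagaState: 'PENDING', sagaId: 'saga-b', sagaLockedAt: minutesAgo(5) },       // in progress
  { id: 'order-3', sagaState: 'COMMITTED', sagaId: null, sagaLockedAt: null },                  // no lock held
  { id: 'order-4', sagaState: 'COMPENSATING', sagaId: 'saga-c', sagaLockedAt: minutesAgo(90) }, // stale rollback
];

const staleIds = findStaleLocks(lockedEntities, sweepTime).map((e) => e.id);
```

Including COMPENSATING in the filter matters: a saga that crashed halfway through its rollback is just as stuck as one that crashed on the forward path.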

Compensation Ordering Strategies

When a saga fails, compensations must execute in a specific order. Several strategies exist, each with trade-offs:

Compensation Ordering Strategies
Strategy	Description	When to Use
Reverse Order	Compensate completed steps in reverse order of forward execution; if step Tn fails after T1 ... Tn-1 have committed: C(Tn-1) → C(Tn-2) → ... → C(T1)	Default strategy; maintains logical consistency
Parallel Compensation	Execute all compensations concurrently	When compensations are independent and speed is critical
Priority-Based	Compensate critical resources first regardless of order	When some compensations are time-sensitive (e.g., payment refunds)
Dependency-Ordered	Compensate based on data dependencies	When compensations have actual dependencies on each other

compensation-ordering.ts
TypeScript
// Various Compensation Ordering Implementations
 
// 1. REVERSE ORDER (Most Common)
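async function compensateReverseOrder(
  completedSteps: CompensatableStep<unknown, unknown>[],
  sagaContext: CompensationContext
): Promise<void> {
  // Copy before reversing so the caller's array is not mutated
  const reversed = [...completedSteps].reverse();
  
  for (const step of reversed) {
    const result = await step.compensate(sagaContext);
    // compensate() reports failure through its result, not only by throwing
    if (!result.success) {
      throw result.error ?? new Error(`Compensation failed for ${step.name}`);
    }
  }
}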
 
// 2. PARALLEL COMPENSATION (For Independent Steps)
async function compensateParallel(
  completedSteps: CompensatableStep<unknown, unknown>[],
  sagaContext: CompensationContext
): Promise<CompensationResult[]> {
  // Execute all compensations concurrently
  const compensationPromises = completedSteps.map(step => 
    step.compensate(sagaContext)
  );
  
  const results = await Promise.allSettled(compensationPromises);
  
  // A compensation failed if its promise rejected OR it resolved with success: false
  const failures = results.filter(
    r => r.status === 'rejected' || !r.value.success
  );
  
  if (failures.length > 0) {
    throw new PartialCompensationError(failures);
  }
  
  return results
    .filter((r): r is PromiseFulfilledResult<CompensationResult> => 
      r.status === 'fulfilled'
    )
    .map(r => r.value);
}
 
// 3. DEPENDENCY-BASED COMPENSATION
interface StepWithDependencies extends CompensatableStep<unknown, unknown> {
  compensationDependsOn: string[];  // Step names that must compensate first
}
 
async function compensateDependencyOrdered(
  completedSteps: StepWithDependencies[],
  sagaContext: CompensationContext
): Promise<void> {
  const compensated = new Set<string>();
  const remaining = [...completedSteps];
  
  while (remaining.length > 0) {
    // Find steps with all dependencies satisfied
    const ready = remaining.filter(step => 
      step.compensationDependsOn.every(dep => compensated.has(dep))
    );
    
    if (ready.length === 0 && remaining.length > 0) {
      throw new CircularDependencyError('Compensation dependency cycle detected');
    }
    
    // Compensate all ready steps in parallel
    await Promise.all(ready.map(step => step.compensate(sagaContext)));
    
    // Mark as compensated and remove from remaining
    for (const step of ready) {
      compensated.add(step.name);
      remaining.splice(remaining.indexOf(step), 1);
    }
  }
}
 
// 4. PRIORITY-BASED COMPENSATION
interface PrioritizedStep extends CompensatableStep<unknown, unknown> {
  compensationPriority: number;  // Lower = higher priority
}
 
async function compensatePriorityOrdered(
  completedSteps: PrioritizedStep[],
  sagaContext: CompensationContext
): Promise<void> {
  // Sort by priority (lower numbers first)
  const sorted = [...completedSteps].sort(
    (a, b) => a.compensationPriority - b.compensationPriority
  );
  
  // Compensate in priority order
  for (const step of sorted) {
    await step.compensate(sagaContext);
  }
}
 
// Example usage with priority
const orderSagaSteps: PrioritizedStep[] = [
  { 
    name: 'CreateOrder', 
    compensationPriority: 100,  // Compensate last
    // ...
  },
  { 
    name: 'ReserveInventory', 
    compensationPriority: 50,   // Medium priority
    // ...
  },
  { 
    name: 'ChargePayment', 
    compensationPriority: 1,    // Compensate FIRST - refund ASAP
    // ...
  },
  { 
    name: 'NotifyCustomer', 
    compensationPriority: 75,
    // ...
  }
];
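The functions above pass one shared context to every step, yet each compensation needs its own step's output (the order id, the reserved quantity, and so on). One way to keep the two together is a saga log that records an undo action alongside each completed step. The sketch below is a simplification under stated assumptions: `SagaLog` is a hypothetical name, the steps are synchronous, and the undo actions are closures, whereas a durable log would persist data so that compensation can resume after a crash.

```typescript
interface LoggedStep {
  name: string;
  undo: () => void; // closure that captures this step's own output
}

class SagaLog {
  private readonly completed: LoggedStep[] = [];

  // Called after a forward step commits.
  record(name: string, undo: () => void): void {
    this.completed.push({ name, undo });
  }

  // Runs undo actions newest-first and returns the order in which they ran.
  compensateAll(): string[] {
    const order: string[] = [];
    for (const step of this.completed.slice().reverse()) {
      step.undo();
      order.push(step.name);
    }
    return order;
  }
}

const state = { orderStatus: 'NONE', reservedUnits: 0 };
const sagaLog = new SagaLog();

// T1: create order
state.orderStatus = 'PENDING';
sagaLog.record('CreateOrder', () => {
  state.orderStatus = 'CANCELLED';
});

// T2: reserve inventory; the quantity is captured for the compensation
const units = 10;
state.reservedUnits += units;
sagaLog.record('ReserveInventory', () => {
  state.reservedUnits -= units;
});

// T3 (charge payment) fails here, so only the two recorded steps are compensated.
const compensationOrder = sagaLog.compensateAll();
```

Because only committed steps are ever recorded, the log doubles as the list of completed steps, and reverse order falls out of iterating it backwards.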

Handling Compensation Failures

What happens when a compensation itself fails? This is the nightmare scenario in saga design: you've already failed the forward path, and now the rollback is failing too.

Causes of Compensation Failures:

- Transient failures: Network timeout, database deadlock, service temporarily down
- Permanent failures: Record no longer exists, business rule violation, external API rejection
- Inconsistent state: Data changed since original transaction executed
- Bugs: Compensation logic has errors that weren't caught in testing

compensation-failure-handling.ts
TypeScript
// Comprehensive Compensation Failure Handling
 
interface CompensationFailurePolicy {
  maxRetries: number;
  retryDelayMs: number;
  exponentialBackoff: boolean;
  maxRetryDelayMs: number;
  onExhausted: 'halt' | 'continue' | 'escalate';
}
 
class ResilientCompensationExecutor {
  private readonly defaultPolicy: CompensationFailurePolicy = {
    maxRetries: 5,
    retryDelayMs: 1000,
    exponentialBackoff: true,
    maxRetryDelayMs: 60000,
    onExhausted: 'escalate'
  };
 
  async executeCompensation(
    step: CompensatableStep<unknown, unknown>,
    context: CompensationContext,
    policy: CompensationFailurePolicy = this.defaultPolicy
  ): Promise<CompensationResult> {
    let attempt = 0;
    let lastError: Error | undefined;
    let delay = policy.retryDelayMs;
 
    while (attempt <= policy.maxRetries) {
      context.compensationAttempt = attempt;
      
      try {
        const result = await step.compensate(context);
        
        if (result.success) {
          // Log successful compensation
          await this.auditLog.record({
            sagaId: context.sagaId,
            stepName: step.name,
            action: 'COMPENSATION_SUCCESS',
            attempt,
            timestamp: new Date()
          });
          return result;
        }
        
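        if (result.requiresManualIntervention) {
          // Don't retry - escalate immediately
          return this.handlePermanentFailure(
            step,
            context,
            result.error ?? new Error(`Compensation for ${step.name} requires manual intervention`)
          );
        }
        
        // Any other unsuccessful result is retried; always record a reason so
        // handleRetryExhaustion never receives an undefined error
        lastError = result.error ?? new Error(`Compensation for ${step.name} reported failure`);
        
      } catch (error) {
        // Normalize non-Error throwables so later code can rely on `.message`
        const err = error instanceof Error ? error : new Error(String(error));
        lastError = err;
        
        // isPermanentFailure treats non-Error throwables as permanent
        if (this.isPermanentFailure(error)) {
          return this.handlePermanentFailure(step, context, err);
        }
      }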
 
      // Retry with delay
      attempt++;
      if (attempt <= policy.maxRetries) {
        await this.sleep(delay);
        
        if (policy.exponentialBackoff) {
          delay = Math.min(delay * 2, policy.maxRetryDelayMs);
        }
      }
    }
 
    // Retries exhausted
    return this.handleRetryExhaustion(step, context, policy, lastError!);
  }
 
  private async handlePermanentFailure(
    step: CompensatableStep<unknown, unknown>,
    context: CompensationContext,
    error: Error
  ): Promise<CompensationResult> {
    // Log the permanent failure
    await this.auditLog.record({
      sagaId: context.sagaId,
      stepName: step.name,
      action: 'COMPENSATION_PERMANENT_FAILURE',
      error: error.message,
      timestamp: new Date()
    });
 
    // Create manual intervention ticket
    await this.ticketService.createTicket({
      type: 'SAGA_COMPENSATION_FAILURE',
      priority: 'HIGH',
      sagaId: context.sagaId,
      stepName: step.name,
      error: error.message,
      context: {
        originalInput: context.originalInput,
        originalOutput: context.originalOutput,
        executedAt: context.executedAt
      }
    });
 
    // Alert on-call engineer
    await this.alerting.page({
      severity: 'high',
      message: `Saga compensation failed permanently: ${context.sagaId}`,
      runbookUrl: 'https://runbooks/saga-compensation-failure'
    });
 
    return {
      success: false,
      requiresRetry: false,
      requiresManualIntervention: true,
      error
    };
  }
 
  private async handleRetryExhaustion(
    step: CompensatableStep<unknown, unknown>,
    context: CompensationContext,
    policy: CompensationFailurePolicy,
    lastError: Error
  ): Promise<CompensationResult> {
    await this.auditLog.record({
      sagaId: context.sagaId,
      stepName: step.name,
      action: 'COMPENSATION_RETRIES_EXHAUSTED',
      maxRetries: policy.maxRetries,
      error: lastError.message,
      timestamp: new Date()
    });
 
    switch (policy.onExhausted) {
      case 'halt':
        // Stop all compensation - saga stuck in COMPENSATING state
        throw new CompensationHaltError(
          `Compensation for ${step.name} failed after ${policy.maxRetries} retries`,
          lastError
        );
 
      case 'continue':
        // Log and continue with remaining compensations
        // This may leave partial state!
        console.warn(
          `Compensation for ${step.name} failed, continuing with remaining steps`
        );
        return {
          success: false,
          requiresRetry: false,
          requiresManualIntervention: true,
          error: lastError
        };
 
      case 'escalate':
      default:
        return this.handlePermanentFailure(step, context, lastError);
    }
  }
 
  private isPermanentFailure(error: unknown): boolean {
    if (!(error instanceof Error)) return true;
    
    const permanentErrorPatterns = [
      'record not found',
      'constraint violation',
      'insufficient permissions',
      'invalid state transition',
      'business rule violation'
    ];
    
    return permanentErrorPatterns.some(pattern => 
      error.message.toLowerCase().includes(pattern)
    );
  }
 
  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
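The retry delays produced by the default policy can be worked out in isolation. The sketch below reproduces the doubling-with-cap schedule from the executor as a pure function and, as an addition that is not part of the executor above, shows one common way to add random jitter so that many failing sagas do not retry in lockstep. `BackoffPolicy`, `baseDelay`, `jitteredDelay`, and `schedule` are hypothetical names.

```typescript
interface BackoffPolicy {
  retryDelayMs: number;
  maxRetryDelayMs: number;
  maxRetries: number;
}

// Delay before retry number `attempt` (1-based): doubles each time, then is capped.
function baseDelay(policy: BackoffPolicy, attempt: number): number {
  return Math.min(policy.retryDelayMs * Math.pow(2, attempt - 1), policy.maxRetryDelayMs);
}

// Jittered variant: a random delay between 0 and the base delay.
// `random` is injectable so the function stays deterministic under test.
function jitteredDelay(
  policy: BackoffPolicy,
  attempt: number,
  random: () => number = Math.random
): number {
  return Math.floor(random() * baseDelay(policy, attempt));
}

// The full list of un-jittered delays for a policy.
function schedule(policy: BackoffPolicy): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= policy.maxRetries; attempt++) {
    delays.push(baseDelay(policy, attempt));
  }
  return delays;
}

const policy: BackoffPolicy = { retryDelayMs: 1000, maxRetryDelayMs: 60000, maxRetries: 5 };
const delays = schedule(policy); // 1000, 2000, 4000, 8000, 16000
const cappedDelay = baseDelay(policy, 10); // 60000: the cap applies from the 7th retry on
```

With the default policy the five retries wait 31 seconds in total, so the cap of 60 seconds is never reached; it only starts to matter when `maxRetries` is raised to seven or more.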

The Heisencompensation

The scariest compensation failures are those that sometimes succeed and sometimes fail for the same inputs. These 'Heisencompensations' are often caused by race conditions, non-deterministic external services, or time-dependent state. Combat them with comprehensive logging, idempotency guarantees, and deterministic compensation design.
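Deterministic compensation design mostly means reading every input from the recorded context instead of from live state or the wall clock. A minimal sketch of the contrast, using hypothetical `ChargeRecord` and `Refund` types:

```typescript
interface ChargeRecord {
  orderId: string;
  chargedAmount: number;
  chargedAt: Date;
}

interface Refund {
  orderId: string;
  amount: number;
  issuedAt: Date;
}

// Non-deterministic: depends on the current catalogue price and the wall clock,
// so two retries of the same compensation can disagree.
function refundFromCurrentPrice(orderId: string, currentPrice: () => number): Refund {
  return { orderId, amount: currentPrice(), issuedAt: new Date() };
}

// Deterministic: every input is passed in, so the same record and clock value
// always produce the same refund.
function refundFromRecord(charge: ChargeRecord, now: Date): Refund {
  return { orderId: charge.orderId, amount: charge.chargedAmount, issuedAt: now };
}

let cataloguePrice = 100;
const beforePriceChange = refundFromCurrentPrice('o-1', () => cataloguePrice);
cataloguePrice = 120; // the price changes between two compensation attempts
const afterPriceChange = refundFromCurrentPrice('o-1', () => cataloguePrice);

const charge: ChargeRecord = {
  orderId: 'o-1',
  chargedAmount: 100,
  chargedAt: new Date('2024-01-01T00:00:00Z'),
};
const compensationTime = new Date('2024-01-02T00:00:00Z');
const firstTry = refundFromRecord(charge, compensationTime);
const retry = refundFromRecord(charge, compensationTime);
```

The first pair of refunds differ (100, then 120) for the same order, which is exactly the Heisencompensation symptom. The second pair is identical because nothing outside the arguments is consulted.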

Testing Compensating Transactions

Compensation code is some of the least-tested code in most systems—it's rarely executed in production, so bugs hide there. Rigorous testing is essential.

Compensation Testing Strategies

• Unit Test Every Compensation: Each compensation function should have unit tests covering success, idempotency, and failure cases.
• Integration Test Full Saga Rollbacks: Test complete saga execution with failures injected at each step, verifying compensation runs correctly.
• Property-Based Testing: Use property-based testing to verify that execute(x); compensate(x) leaves the system in a consistent state.
• Chaos Testing: Inject failures during compensation itself (network partitions, service restarts) and verify recovery.
• Production Shadow Testing: Periodically execute compensation logic against production data copies to verify it still works with real data shapes.
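The second strategy, injecting a failure at each step in turn, can be reduced to a small harness. The sketch below is framework-free and uses synchronous steps over a counter so that it stays self-contained; `TestStep`, `runSaga`, and `sagaFailingAt` are hypothetical names, and a real suite would drive the actual saga steps instead.

```typescript
interface TestStep {
  name: string;
  run: () => void;
  undo: () => void;
}

// Runs steps in order; if one throws, undoes the completed ones newest-first.
function runSaga(steps: TestStep[]): { ok: boolean; undone: string[] } {
  const done: TestStep[] = [];
  try {
    for (const step of steps) {
      step.run();
      done.push(step);
    }
    return { ok: true, undone: [] };
  } catch (_error) {
    const undone: string[] = [];
    for (const step of done.slice().reverse()) {
      step.undo();
      undone.push(step.name);
    }
    return { ok: false, undone };
  }
}

// Builds a three-step saga over a counter and makes the step at `failAt` throw.
function sagaFailingAt(failAt: number, counter: { value: number }): TestStep[] {
  return ['A', 'B', 'C'].map((name, index) => ({
    name,
    run: () => {
      if (index === failAt) throw new Error(`injected failure in ${name}`);
      counter.value += 1;
    },
    undo: () => {
      counter.value -= 1;
    },
  }));
}

// Inject a failure at every position and check the invariant each time:
// the saga reports failure, the counter is back at zero, and exactly the
// steps that completed were undone.
const violations: number[] = [];
for (let failAt = 0; failAt < 3; failAt++) {
  const counter = { value: 0 };
  const result = runSaga(sagaFailingAt(failAt, counter));
  if (result.ok || counter.value !== 0 || result.undone.length !== failAt) {
    violations.push(failAt);
  }
}
```

An empty `violations` array means every failure position left the system as it started. The same loop shape carries over to a real test suite: one test case per failure position, each asserting the post-compensation invariant.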

compensation-tests.ts
TypeScript
// Comprehensive Compensation Testing
 
describe('CreateOrderStep Compensation', () => {
  
  describe('Happy Path', () => {
    it('should cancel a pending order', async () => {
      // Arrange
      const order = await createTestOrder({ status: 'PENDING' });
      const context = createCompensationContext(order);
      
      // Act
      const result = await createOrderStep.compensate(context);
      
      // Assert
      expect(result.success).toBe(true);
      const updatedOrder = await orderRepo.findById(order.id);
      expect(updatedOrder.status).toBe('CANCELLED');
    });
  });
  
  describe('Idempotency', () => {
    it('should be safe to execute compensation multiple times', async () => {
      const order = await createTestOrder({ status: 'PENDING' });
      const context = createCompensationContext(order);
      
      // Execute compensation twice
      await createOrderStep.compensate(context);
      await createOrderStep.compensate(context);
      
      // Verify consistent state
      const updatedOrder = await orderRepo.findById(order.id);
      expect(updatedOrder.status).toBe('CANCELLED');
      
      // Verify no duplicate events
      const events = await eventStore.getEventsForAggregate(order.id);
      const cancellationEvents = events.filter(e => e.type === 'OrderCancelled');
      expect(cancellationEvents.length).toBe(1);
    });
    
    it('should handle concurrent compensation attempts', async () => {
      const order = await createTestOrder({ status: 'PENDING' });
      const context = createCompensationContext(order);
      
      // Execute compensation concurrently
      const results = await Promise.all([
        createOrderStep.compensate(context),
        createOrderStep.compensate(context),
        createOrderStep.compensate(context)
      ]);
      
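      // All should succeed (idempotently)
      expect(results.every(r => r.success)).toBe(true);
      
      // The order ends up cancelled regardless of how the attempts interleave
      const updatedOrder = await orderRepo.findById(order.id);
      expect(updatedOrder.status).toBe('CANCELLED');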
    });
  });
  
  describe('Non-Compensatable States', () => {
    it('should fail gracefully when order is shipped', async () => {
      const order = await createTestOrder({ status: 'SHIPPED' });
      const context = createCompensationContext(order);
      
      const result = await createOrderStep.compensate(context);
      
      expect(result.success).toBe(false);
      expect(result.requiresManualIntervention).toBe(true);
      expect(result.error.message).toContain('cannot be cancelled');
    });
  });
  
  describe('Transient Failure Recovery', () => {
    it('should succeed after transient database failure', async () => {
      const order = await createTestOrder({ status: 'PENDING' });
      const context = createCompensationContext(order);
      
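      // Simulate transient failure on first attempt, then delegate to the real update
      let attempts = 0;
      const originalUpdate = orderRepo.update.bind(orderRepo); // capture before spying
      jest.spyOn(orderRepo, 'update').mockImplementation(async (...args) => {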
        attempts++;
        if (attempts === 1) {
          throw new Error('connection timeout');
        }
        return originalUpdate(...args);
      });
      
      const result = await compensationExecutor.executeCompensation(
        createOrderStep,
        context,
        { maxRetries: 3, retryDelayMs: 10, exponentialBackoff: false }
      );
      
      expect(result.success).toBe(true);
      expect(attempts).toBe(2);
    });
  });
});
 
// Property-based testing for compensation correctness
describe('Compensation Consistency Properties', () => {
  it('execute then compensate should return to consistent state', async () => {
    await fc.assert(
      fc.asyncProperty(
        orderDataArbitrary,
        async (orderData) => {
          // Get initial state
          const initialState = await captureSystemState();
          
          // Execute forward transaction
          const order = await createOrderStep.execute(orderData);
          
          // Execute compensation
          await createOrderStep.compensate(createCompensationContext(order));
          
          // Verify consistent state
          const finalState = await captureSystemState();
          
          // Orders should be balanced (one created, one cancelled)
          expect(finalState.orderCount).toBe(initialState.orderCount + 1);
          expect(finalState.cancelledOrderCount).toBe(initialState.cancelledOrderCount + 1);
          expect(finalState.activeOrderCount).toBe(initialState.activeOrderCount);
        }
      )
    );
  });
});
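
The tests above assume a compensate implementation that is idempotent and that refuses to cancel shipped orders. The following is a minimal in-memory sketch of that behavior, not the module's actual implementation. The names InMemoryOrderStore and compensateCreateOrder are hypothetical, and treating a missing order as a successful no-op is an assumption of this sketch.

```typescript
// Hypothetical in-memory model for illustration; not a real repository API.
type OrderStatus = 'PENDING' | 'SHIPPED' | 'CANCELLED';

interface Order {
  id: string;
  status: OrderStatus;
}

interface CompensationResult {
  success: boolean;
  requiresManualIntervention?: boolean;
  error?: Error;
}

class InMemoryOrderStore {
  readonly orders = new Map<string, Order>();
  readonly events: { type: string; orderId: string }[] = [];
}

function compensateCreateOrder(
  store: InMemoryOrderStore,
  orderId: string,
): CompensationResult {
  const order = store.orders.get(orderId);

  // Assumption: if the forward step never committed, there is nothing to undo.
  if (!order) {
    return { success: true };
  }

  // Idempotency: a repeated compensation is a no-op and emits no second event.
  if (order.status === 'CANCELLED') {
    return { success: true };
  }

  // Non-compensatable state: report it instead of forcing a cancellation.
  if (order.status === 'SHIPPED') {
    return {
      success: false,
      requiresManualIntervention: true,
      error: new Error(`Order ${orderId} cannot be cancelled: already shipped`),
    };
  }

  // Semantic compensation: keep the order, record that it was cancelled.
  order.status = 'CANCELLED';
  store.events.push({ type: 'OrderCancelled', orderId });
  return { success: true };
}
```

Checking the current status before writing is what makes the second call safe. In a real database the check and the update would need to happen atomically (for example with a conditional update) to hold up under the concurrent test case.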

Summary: Mastering Compensating Transactions

Compensating transactions are the mechanism that makes sagas viable. Without well-designed compensations, sagas would leave systems in inconsistent states when failures occur.

Key Takeaways

•Semantic, not technical rollback — Compensating transactions don't erase history; they append corrective actions that achieve business-level reversal.
•Properties matter — Compensations must be idempotent, atomic, and deterministic. Design for these properties explicitly.
•Categorize operations — Understand which operations are reservable, cancellable, reversible, or non-compensatable. Design sagas accordingly.
•Semantic locks prevent anomalies — Mark data participating in active sagas to prevent concurrent saga interference.
•Compensation ordering — Execute compensations in reverse order by default; consider priority or dependency-based ordering for specific needs.
•Plan for compensation failures — Implement retry policies, exponential backoff, and escalation paths. Some compensations will fail.
•Test rigorously — Compensation code is rarely executed in production. Unit test, integration test, and chaos test your compensations.
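
The ordering and failure-planning takeaways above can be sketched together as follows. This is an illustrative sketch under stated assumptions, not a definitive executor: CompletedStep, RetryPolicy, and compensateInReverse are hypothetical names, and the choice to continue with the remaining steps after escalating a failed one is an assumption (some systems halt instead).

```typescript
// Hypothetical types for illustration only.
interface CompletedStep {
  name: string;
  compensate: () => Promise<void>;
}

interface RetryPolicy {
  maxRetries: number;   // retries after the first attempt
  baseDelayMs: number;  // delay before the first retry; doubles on each retry
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function compensateInReverse(
  completed: CompletedStep[],
  policy: RetryPolicy,
): Promise<{ compensated: string[]; escalated: string[] }> {
  const compensated: string[] = [];
  const escalated: string[] = [];

  // Undo the most recently completed step first.
  for (const step of [...completed].reverse()) {
    let done = false;

    for (let attempt = 0; attempt <= policy.maxRetries && !done; attempt++) {
      try {
        await step.compensate();
        done = true;
      } catch {
        // Exponential backoff before the next attempt, if any remain.
        if (attempt < policy.maxRetries) {
          await sleep(policy.baseDelayMs * 2 ** attempt);
        }
      }
    }

    // Steps that exhaust their retries go to an escalation list
    // (for example, a queue reviewed by an operator).
    (done ? compensated : escalated).push(step.name);
  }

  return { compensated, escalated };
}
```

Because a retry re-runs the same compensation, this loop is only safe when each compensate function is idempotent.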

What's Next:

Even with well-designed compensations, failures happen. The next page explores failure handling in depth: understanding failure modes, implementing circuit breakers for saga steps, managing partial failures, and building truly resilient saga architectures.

Page Complete

You now have a comprehensive understanding of compensating transactions—from formal properties to production testing strategies. This knowledge is essential for building sagas that maintain consistency even in failure scenarios.