Imagine this scenario: A database becomes briefly unavailable for 100 milliseconds. 10,000 clients experience simultaneous failures. Each client, following textbook exponential backoff, waits exactly 100ms before retrying.
At t=100ms, all 10,000 clients retry at the exact same instant.
The database, having just recovered, is immediately overwhelmed by 10,000 simultaneous requests. It fails again. All clients wait 200ms. At t=300ms, 10,000 simultaneous requests. Failure. Wait 400ms. 10,000 requests. Failure.
This is the thundering herd problem.
Even with perfect exponential backoff, synchronized retries create periodic load spikes that can be worse than the original failure. The solution is beautifully simple: jitter — adding randomness to break synchronization.
By the end of this page, you will deeply understand the thundering herd problem, why it occurs even with exponential backoff, the mathematics of jitter strategies, how to implement different jitter algorithms, and how to choose the right jitter type for different scenarios.
The term "thundering herd" originates from operating systems, describing what happens when many processes wait on the same resource (like a mutex or network event) and all wake up simultaneously when the resource becomes available, only to compete and mostly fail again.
In distributed systems, the thundering herd manifests whenever many clients synchronize their behavior—intentionally or accidentally.
Why exponential backoff alone doesn't solve this:
Exponential backoff spaces out sequential retries from a single client. It does nothing to desynchronize parallel retries across multiple clients. If 10,000 clients all fail at t=0, every one of them retries at t=100ms, again at t=300ms, then t=700ms, and so on: the retries remain perfectly synchronized because every client applies the same formula from the same starting point.
```typescript
/**
 * Simulating the Thundering Herd Problem
 *
 * This demonstrates why exponential backoff without jitter
 * leads to synchronized retry storms.
 */

interface RetryEvent {
  clientId: number;
  timestampMs: number;
  attempt: number;
}

function simulateWithoutJitter(
  numClients: number,
  baseDelayMs: number,
  multiplier: number,
  maxAttempts: number
): Map<number, number> {
  // Track load at each millisecond
  const loadByTime = new Map<number, number>();

  for (let clientId = 0; clientId < numClients; clientId++) {
    let currentTime = 0; // All clients fail at t=0

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      // Exponential delay without jitter
      const delay = baseDelayMs * Math.pow(multiplier, attempt - 1);
      currentTime += delay;

      // Round to nearest ms for bucketing
      const timeSlot = Math.round(currentTime);
      loadByTime.set(timeSlot, (loadByTime.get(timeSlot) || 0) + 1);
    }
  }

  return loadByTime;
}

// Simulate 10,000 clients with base=100ms, multiplier=2, 5 attempts
const loadWithoutJitter = simulateWithoutJitter(10000, 100, 2, 5);

// Find peak loads
const sortedLoads = Array.from(loadWithoutJitter.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 10);

console.log("Peak loads WITHOUT jitter:");
sortedLoads.forEach(([time, load]) => {
  console.log(`  t=${time}ms: ${load} requests`);
});

// Output:
// Peak loads WITHOUT jitter:
//   t=100ms: 10000 requests   <- ALL clients retry together!
//   t=300ms: 10000 requests   <- Same with second retry
//   t=700ms: 10000 requests   <- And third
//   t=1500ms: 10000 requests  <- And fourth
//   t=3100ms: 10000 requests  <- And fifth

// This is the thundering herd: perfect synchronization = maximum damage
```

If a service can handle 1,000 req/s but 10,000 clients retry at the same instant, that instant experiences 10× normal load. Even brief overloads cause cascading effects: queues fill, timeouts fire, and more retries are triggered. Without jitter, each retry wave re-creates the very overload that caused it, so recovery may never happen.
Jitter adds controlled randomness to retry delays, desynchronizing clients so they don't all retry at the same instant. Instead of a load spike at t=100ms, retries spread across a time window.
There are several jitter strategies, each with different properties:
1. No Jitter (baseline): delay = base × 2^attempt
2. Full Jitter: delay = random(0, base × 2^attempt)
3. Equal Jitter: delay = (base × 2^attempt)/2 + random(0, (base × 2^attempt)/2)
4. Decorrelated Jitter: delay = random(base, previous_delay × 3)
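To make the four formulas concrete, here is a small sketch (names are illustrative, and the max-delay cap is omitted for brevity) that draws one sample delay per strategy for the first three attempts, with base = 100ms:

```typescript
const base = 100;
let prev = base; // previous delay, used only by decorrelated jitter

function sampleDelay(strategy: string, attempt: number): number {
  const cap = base * 2 ** attempt; // the exponential ceiling for this attempt
  switch (strategy) {
    case "none":  return cap;                                 // deterministic
    case "full":  return Math.random() * cap;                 // random(0, cap)
    case "equal": return cap / 2 + Math.random() * (cap / 2); // half fixed, half random
    case "decorrelated": {
      prev = base + Math.random() * (prev * 3 - base);        // random(base, prev × 3)
      return prev;
    }
    default: throw new Error(`unknown strategy: ${strategy}`);
  }
}

for (const s of ["none", "full", "equal", "decorrelated"]) {
  prev = base; // reset the decorrelated walk for each strategy
  const row = [1, 2, 3].map(a => `${Math.round(sampleDelay(s, a))}ms`);
  console.log(`${s.padEnd(13)} ${row.join("  ")}`);
}
```

Running this a few times shows the key difference at a glance: "none" prints the same row every time, while the other three rows change on every run.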
| Strategy | Formula | Min Delay | Max Delay | Best For |
|---|---|---|---|---|
| No Jitter | base × 2^n | base × 2^n | base × 2^n | Debugging only (never in production) |
| Full Jitter | random(0, base × 2^n) | 0 | base × 2^n | Maximum spread; high contention systems |
| Equal Jitter | (base × 2^n)/2 + random(0, (base × 2^n)/2) | (base × 2^n)/2 | base × 2^n | Balanced; guaranteed minimum delay |
| Decorrelated Jitter | random(base, prev × 3) | base | ~3× previous | Heavily loaded systems; best overall |
The AWS recommendation:
Amazon Web Services performed extensive analysis and found that decorrelated jitter produces the best results under high contention, reducing total completion time compared to other strategies. Their 2015 blog post "Exponential Backoff and Jitter" remains the definitive reference.
The goal of jitter is not just randomness—it's desynchronization. Even small amounts of randomness break the lock-step behavior that causes thundering herds. The specific jitter algorithm matters less than simply having some jitter.
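That claim is easy to check with a minimal sketch (all numbers illustrative): bucket the first retry of 10,000 clients into 1ms slots, once with no jitter and once with just ±10% jitter around a 100ms delay.

```typescript
// Peak number of retries landing in the same 1ms slot.
function peakLoad(numClients: number, jitterFraction: number): number {
  const load = new Map<number, number>();
  for (let c = 0; c < numClients; c++) {
    // 100ms nominal delay, perturbed by up to ±jitterFraction
    const delay = 100 * (1 + jitterFraction * (2 * Math.random() - 1));
    const slot = Math.round(delay);
    load.set(slot, (load.get(slot) ?? 0) + 1);
  }
  return Math.max(...load.values());
}

console.log(peakLoad(10000, 0));   // 10000: every client lands in one slot
console.log(peakLoad(10000, 0.1)); // a few hundred: spread across ~20 slots
```

Even this tiny ±10% perturbation cuts the instantaneous peak by roughly 20×, which is exactly the desynchronization effect the full strategies below amplify.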
Full jitter provides maximum randomization by selecting a random delay between 0 and the exponential cap.
Formula: delay = random(0, base × 2^attempt)
Properties:
- Minimum delay: 0, so a client may retry almost immediately
- Maximum delay: min(cap, base × 2^attempt)
- Average delay: half of the capped exponential delay
```typescript
/**
 * Full Jitter Implementation
 *
 * Provides maximum desynchronization by randomizing across
 * the entire delay range. Used by AWS SDKs.
 */
class FullJitterBackoff {
  private readonly baseMs: number;
  private readonly capMs: number;

  constructor(baseMs: number = 100, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
  }

  /**
   * Calculate delay with full jitter.
   * delay = random(0, min(cap, base × 2^attempt))
   */
  getDelay(attempt: number): number {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    // Full jitter: random between 0 and capped delay
    return Math.random() * cappedDelay;
  }

  /**
   * Get statistical properties for analysis.
   */
  getStatistics(attempt: number): { min: number; max: number; average: number } {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    return {
      min: 0,
      max: cappedDelay,
      average: cappedDelay / 2,
    };
  }
}

// Demonstrate the spread
const fullJitter = new FullJitterBackoff(100, 30000);

console.log("Full Jitter Delay Statistics:");
for (let attempt = 1; attempt <= 5; attempt++) {
  const stats = fullJitter.getStatistics(attempt);
  const sample = fullJitter.getDelay(attempt);
  console.log(
    `  Attempt ${attempt}: range=[0, ${stats.max}ms], avg=${stats.average}ms, sample=${Math.round(sample)}ms`
  );
}

// Output (samples will vary):
// Attempt 1: range=[0, 200ms], avg=100ms, sample=73ms
// Attempt 2: range=[0, 400ms], avg=200ms, sample=312ms
// Attempt 3: range=[0, 800ms], avg=400ms, sample=156ms
// Attempt 4: range=[0, 1600ms], avg=800ms, sample=1203ms
// Attempt 5: range=[0, 3200ms], avg=1600ms, sample=2847ms
```

Pros and Cons of Full Jitter:
Full jitter is ideal when you want maximum spread and can tolerate some clients retrying almost immediately. It's particularly effective in high-scale scenarios where the statistical distribution matters more than individual client behavior.
Equal jitter guarantees a minimum delay while still providing randomization. It splits the exponential delay in half: the first half is fixed, the second half is randomized.
Formula: delay = (base × 2^attempt)/2 + random(0, (base × 2^attempt)/2)
Properties:
- Minimum delay: half of the capped exponential delay (guaranteed)
- Maximum delay: the capped exponential delay
- Average delay: 3/4 of the capped exponential delay
```typescript
/**
 * Equal Jitter Implementation
 *
 * Guarantees a minimum delay while still providing spread.
 * Also known as "half-jitter" or "50% jitter".
 */
class EqualJitterBackoff {
  private readonly baseMs: number;
  private readonly capMs: number;

  constructor(baseMs: number = 100, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
  }

  /**
   * Calculate delay with equal jitter.
   * delay = cap/2 + random(0, cap/2)
   * where cap = min(maxCap, base × 2^attempt)
   */
  getDelay(attempt: number): number {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    const fixedPart = cappedDelay / 2;
    const randomPart = Math.random() * (cappedDelay / 2);
    return fixedPart + randomPart;
  }

  getStatistics(attempt: number): { min: number; max: number; average: number } {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    return {
      min: cappedDelay / 2,
      max: cappedDelay,
      average: cappedDelay * 0.75, // 3/4 of cap
    };
  }
}

// Compare Full vs Equal jitter ranges
const equalJitter = new EqualJitterBackoff(100, 30000);

console.log("Comparison: Full Jitter vs Equal Jitter");
console.log("---------------------------------------");

for (let attempt = 1; attempt <= 4; attempt++) {
  const fullStats = {
    min: 0,
    max: Math.min(100 * Math.pow(2, attempt), 30000),
  };
  const equalStats = equalJitter.getStatistics(attempt);
  console.log(`Attempt ${attempt}:`);
  console.log(`  Full:  [${fullStats.min}ms - ${fullStats.max}ms]`);
  console.log(`  Equal: [${equalStats.min}ms - ${equalStats.max}ms]`);
}

// Output:
// Attempt 1:
//   Full:  [0ms - 200ms]
//   Equal: [100ms - 200ms]  <- Guaranteed minimum!
// Attempt 2:
//   Full:  [0ms - 400ms]
//   Equal: [200ms - 400ms]
// Attempt 3:
//   Full:  [0ms - 800ms]
//   Equal: [400ms - 800ms]
// Attempt 4:
//   Full:  [0ms - 1600ms]
//   Equal: [800ms - 1600ms]
```

When Equal Jitter is preferred:
Equal jitter is the right choice when you need a predictable minimum backoff: for example, when a recovering service needs guaranteed breathing room between attempts, or when retrying sooner than a known rate-limit window would be wasted work.
Equal jitter is often the best default choice because it balances spread (preventing thundering herd) with predictability (guaranteed minimum delay). It's not as aggressive as full jitter but also not as risky.
Decorrelated jitter takes a fundamentally different approach: instead of computing delay from the attempt number, it computes delay based on the previous delay. This creates a random walk that naturally spreads clients over time.
Formula: delay = random(base, min(cap, previous_delay × 3))
Properties:
- Minimum delay: base (every retry waits at least the base delay)
- Maximum delay: min(cap, 3× the previous delay), so the range widens over time
- Each delay depends on the previous one, giving every client its own random walk
```typescript
/**
 * Decorrelated Jitter Implementation
 *
 * This is the AWS-recommended algorithm from their 2015 analysis.
 * It produces the best results under high contention.
 */
class DecorrelatedJitterBackoff {
  private readonly baseMs: number;
  private readonly capMs: number;
  private currentDelayMs: number;

  constructor(baseMs: number = 100, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
    this.currentDelayMs = baseMs; // Start at base
  }

  /**
   * Get next delay using decorrelated jitter.
   * delay = random(base, min(cap, previous × 3))
   */
  getDelay(): number {
    // Calculate the upper bound for this iteration
    const upperBound = Math.min(this.capMs, this.currentDelayMs * 3);

    // Random delay between base and upper bound
    const newDelay = this.baseMs + Math.random() * (upperBound - this.baseMs);

    // Store for next iteration
    this.currentDelayMs = newDelay;
    return newDelay;
  }

  /**
   * Reset state for a new retry sequence.
   */
  reset(): void {
    this.currentDelayMs = this.baseMs;
  }

  /**
   * Simulate a complete sequence of retries.
   */
  simulateSequence(numRetries: number): number[] {
    this.reset();
    const delays: number[] = [];
    for (let i = 0; i < numRetries; i++) {
      delays.push(Math.round(this.getDelay()));
    }
    return delays;
  }
}

// Demonstrate that each sequence is different
const decorrelatedBackoff = new DecorrelatedJitterBackoff(100, 30000);

console.log("Decorrelated Jitter: Multiple 5-Retry Sequences");
console.log("(Notice how each sequence is completely different)");
console.log("-----------------------------------------------");

for (let seq = 1; seq <= 3; seq++) {
  const delays = decorrelatedBackoff.simulateSequence(5);
  console.log(`Sequence ${seq}: ${delays.map(d => d + "ms").join(" → ")}`);
}

// Output (example - will vary):
// Sequence 1: 142ms → 287ms → 651ms → 1823ms → 4201ms
// Sequence 2: 167ms → 398ms → 1041ms → 2871ms → 7432ms
// Sequence 3: 123ms → 312ms → 789ms → 2156ms → 5123ms

// Key insight: Unlike fixed exponential (100 → 200 → 400 → 800...),
// decorrelated jitter produces unique paths for each client!
```

Why decorrelated jitter is optimal:
AWS performed extensive simulations comparing all jitter strategies under high contention (many clients competing for limited resources) and found that decorrelated jitter produced the lowest peak load and the shortest total completion time of the strategies tested.
Decorrelated jitter requires maintaining state (the previous delay) between retries. This means the backoff object must persist across retry attempts. Don't create a new instance for each retry — you'll lose the decorrelation benefit!
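A sketch of that pitfall (class and names are illustrative, following the formula above): the first sequence reuses one instance so the random walk can widen, while recreating the instance per attempt pins every delay to the [base, 3 × base] range.

```typescript
class DecorrelatedJitter {
  private prevMs: number;
  constructor(private baseMs = 100, private capMs = 30000) {
    this.prevMs = baseMs;
  }
  next(): number {
    // delay = random(base, min(cap, previous × 3))
    const upper = Math.min(this.capMs, this.prevMs * 3);
    this.prevMs = this.baseMs + Math.random() * (upper - this.baseMs);
    return this.prevMs;
  }
}

// Correct: one instance per retry sequence; the upper bound grows each call.
const backoff = new DecorrelatedJitter();
const good = [backoff.next(), backoff.next(), backoff.next()];

// Wrong: a fresh instance per attempt; every delay stays in [100, 300].
const bad = [1, 2, 3].map(() => new DecorrelatedJitter().next());
```

With the shared instance the third delay can range up to 100 × 3³ = 2700ms; with fresh instances it can never exceed 300ms, so late retries stay clustered and the decorrelation benefit is lost.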
Let's put all strategies side-by-side with a simulation to understand their real-world behavior.
```typescript
/**
 * Comprehensive Jitter Strategy Comparison
 *
 * Simulates 1000 clients using each strategy and measures:
 * - Load distribution over time (peak vs average)
 * - Total completion time
 * - Fairness (variance in completion times)
 */

interface SimulationResult {
  strategy: string;
  peakLoad: number;
  averageLoad: number;
  peakToAverageRatio: number;
  totalCompletionTimeMs: number;
  fairnessScore: number; // Lower is fairer (std dev of completion times)
}

function simulateStrategy(
  strategy: "none" | "full" | "equal" | "decorrelated",
  numClients: number,
  baseMs: number,
  maxAttempts: number
): SimulationResult {
  const loadByTimeBucket = new Map<number, number>();
  const completionTimes: number[] = [];
  const bucketSizeMs = 50; // Group into 50ms buckets

  for (let client = 0; client < numClients; client++) {
    let currentTime = 0;
    let previousDelay = baseMs;

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      let delay: number;

      switch (strategy) {
        case "none":
          delay = baseMs * Math.pow(2, attempt - 1);
          break;
        case "full":
          delay = Math.random() * baseMs * Math.pow(2, attempt - 1);
          break;
        case "equal": {
          const cap = baseMs * Math.pow(2, attempt - 1);
          delay = cap / 2 + Math.random() * (cap / 2);
          break;
        }
        case "decorrelated": {
          const upperBound = Math.min(30000, previousDelay * 3);
          delay = baseMs + Math.random() * (upperBound - baseMs);
          previousDelay = delay;
          break;
        }
      }

      currentTime += delay;
      const bucket = Math.floor(currentTime / bucketSizeMs);
      loadByTimeBucket.set(bucket, (loadByTimeBucket.get(bucket) || 0) + 1);
    }

    completionTimes.push(currentTime);
  }

  // Calculate metrics
  const loads = Array.from(loadByTimeBucket.values());
  const peakLoad = Math.max(...loads);
  const averageLoad = loads.reduce((a, b) => a + b, 0) / loads.length;
  const avgCompletion = completionTimes.reduce((a, b) => a + b) / completionTimes.length;
  const variance =
    completionTimes.reduce((sum, t) => sum + Math.pow(t - avgCompletion, 2), 0) / numClients;
  const fairnessScore = Math.sqrt(variance);

  return {
    strategy,
    peakLoad,
    averageLoad,
    peakToAverageRatio: peakLoad / averageLoad,
    totalCompletionTimeMs: Math.max(...completionTimes),
    fairnessScore,
  };
}

// Run simulations
const strategies: Array<"none" | "full" | "equal" | "decorrelated"> = [
  "none", "full", "equal", "decorrelated"
];

console.log("Jitter Strategy Comparison (1000 clients, 5 retries)");
console.log("====================================================");

strategies.forEach(strategy => {
  const result = simulateStrategy(strategy, 1000, 100, 5);
  console.log(`Strategy: ${result.strategy.toUpperCase()}`);
  console.log(`  Peak Load:  ${result.peakLoad} requests/50ms`);
  console.log(`  Avg Load:   ${result.averageLoad.toFixed(1)} requests/50ms`);
  console.log(`  Peak/Avg:   ${result.peakToAverageRatio.toFixed(2)}x`);
  console.log(`  Total Time: ${result.totalCompletionTimeMs.toFixed(0)}ms`);
  console.log(`  Fairness:   ${result.fairnessScore.toFixed(0)} (lower=better)`);
});

/* Typical output pattern (exact numbers vary run to run):
   - NONE: load arrives as five synchronized spikes; every active bucket
     receives all 1000 clients at once.
   - FULL: the spikes dissolve into a broad spread with far lower peaks.
   - EQUAL: good spread with a guaranteed minimum, though each attempt
     concentrates in the upper half of its window.
   - DECORRELATED: the widest spread, and the most uniform per-client
     completion times under sustained contention. */
```

| Scenario | Recommended Strategy | Reasoning |
|---|---|---|
| Default choice for most systems | Equal Jitter | Balances spread with guaranteed minimum backoff |
| Very high contention (thousands of clients) | Decorrelated Jitter | Proven mathematically optimal for high contention |
| Maximum spread needed | Full Jitter | Zero minimum delay means retries spread across the entire window |
| Simple implementation priority | Equal Jitter | Easy to understand and implement correctly |
| Need minimum backoff guarantee | Equal Jitter | Only strategy with hard minimum |
| Retry budget is tight | Full Jitter | Lower average delay means more attempts in budget |
A production-ready jitter implementation should support all strategies, be easily configurable, and integrate cleanly with retry loops.
```typescript
/**
 * Production-Ready Jitter System
 *
 * Supports all major jitter strategies with clean configuration
 * and integration with retry logic.
 */

type JitterStrategy = "none" | "full" | "equal" | "decorrelated";

interface JitterConfig {
  strategy: JitterStrategy;
  baseDelayMs: number;
  maxDelayMs: number;
  multiplier: number;
}

interface BackoffResult {
  delayMs: number;
  attempt: number;
  strategy: JitterStrategy;
}

class ProductionJitterBackoff {
  private readonly config: JitterConfig;
  private previousDelayMs: number;
  private currentAttempt: number = 0;

  constructor(config: Partial<JitterConfig> = {}) {
    this.config = {
      strategy: "equal", // Sensible default
      baseDelayMs: 100,
      maxDelayMs: 30000,
      multiplier: 2,
      ...config,
    };
    this.previousDelayMs = this.config.baseDelayMs;
  }

  /**
   * Get the next backoff delay.
   * Call this before each retry attempt.
   */
  nextBackoff(): BackoffResult {
    this.currentAttempt++;
    const delayMs = this.calculateDelay();

    // Update state for decorrelated jitter
    if (this.config.strategy === "decorrelated") {
      this.previousDelayMs = delayMs;
    }

    return {
      delayMs: Math.round(delayMs),
      attempt: this.currentAttempt,
      strategy: this.config.strategy,
    };
  }

  private calculateDelay(): number {
    switch (this.config.strategy) {
      case "none":
        return this.calculateNoJitter();
      case "full":
        return this.calculateFullJitter();
      case "equal":
        return this.calculateEqualJitter();
      case "decorrelated":
        return this.calculateDecorrelatedJitter();
      default:
        return this.calculateEqualJitter();
    }
  }

  private calculateNoJitter(): number {
    const exponential =
      this.config.baseDelayMs * Math.pow(this.config.multiplier, this.currentAttempt - 1);
    return Math.min(exponential, this.config.maxDelayMs);
  }

  private calculateFullJitter(): number {
    const cap = Math.min(
      this.config.baseDelayMs * Math.pow(this.config.multiplier, this.currentAttempt - 1),
      this.config.maxDelayMs
    );
    return Math.random() * cap;
  }

  private calculateEqualJitter(): number {
    const cap = Math.min(
      this.config.baseDelayMs * Math.pow(this.config.multiplier, this.currentAttempt - 1),
      this.config.maxDelayMs
    );
    return cap / 2 + Math.random() * (cap / 2);
  }

  private calculateDecorrelatedJitter(): number {
    const upperBound = Math.min(this.config.maxDelayMs, this.previousDelayMs * 3);
    return this.config.baseDelayMs + Math.random() * (upperBound - this.config.baseDelayMs);
  }

  /**
   * Reset state for a new retry sequence.
   */
  reset(): void {
    this.currentAttempt = 0;
    this.previousDelayMs = this.config.baseDelayMs;
  }

  /**
   * Convenience method: sleep for the calculated delay.
   */
  async sleep(): Promise<BackoffResult> {
    const backoff = this.nextBackoff();
    await new Promise(resolve => setTimeout(resolve, backoff.delayMs));
    return backoff;
  }
}

/**
 * High-level retry function with jitter.
 */
async function retryWithJitter<T>(
  operation: () => Promise<T>,
  options: {
    maxAttempts?: number;
    jitterStrategy?: JitterStrategy;
    baseDelayMs?: number;
    maxDelayMs?: number;
    shouldRetry?: (error: Error, attempt: number) => boolean;
    onRetry?: (backoff: BackoffResult, error: Error) => void;
  } = {}
): Promise<T> {
  const {
    maxAttempts = 3,
    jitterStrategy = "equal",
    baseDelayMs = 100,
    maxDelayMs = 30000,
    shouldRetry = () => true,
    onRetry = () => {},
  } = options;

  const backoff = new ProductionJitterBackoff({
    strategy: jitterStrategy,
    baseDelayMs,
    maxDelayMs,
  });

  let lastError: Error | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error as Error;

      if (!shouldRetry(lastError, attempt) || attempt === maxAttempts) {
        break;
      }

      const backoffResult = await backoff.sleep();
      onRetry(backoffResult, lastError);
    }
  }

  throw lastError || new Error("Retry failed");
}

// =========================================
// Usage Examples
// =========================================

// Example 1: Basic retry with equal jitter (default)
async function fetchWithRetry(url: string): Promise<Response> {
  return retryWithJitter(
    () =>
      fetch(url).then(r => {
        if (!r.ok) throw new Error(`HTTP ${r.status}`);
        return r;
      }),
    {
      maxAttempts: 3,
      onRetry: (backoff, error) => {
        console.log(`Retry ${backoff.attempt} after ${backoff.delayMs}ms: ${error.message}`);
      },
    }
  );
}

// Example 2: High-contention scenario with decorrelated jitter
async function acquireDistributedLock(lockId: string): Promise<boolean> {
  return retryWithJitter(
    async () => {
      const response = await fetch(`/locks/${lockId}`, { method: "POST" });
      if (response.status === 409) throw new Error("Lock contention");
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return true;
    },
    {
      maxAttempts: 10,
      jitterStrategy: "decorrelated", // Best for contention
      baseDelayMs: 50,
      maxDelayMs: 5000,
      onRetry: (backoff) => {
        console.log(`Lock contention, retry ${backoff.attempt} in ${backoff.delayMs}ms`);
      },
    }
  );
}

// Example 3: Real-time UX with full jitter for speed
async function fetchUserProfile(userId: string): Promise<User> {
  return retryWithJitter(
    async () => {
      const response = await fetch(`/users/${userId}`);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return response.json();
    },
    {
      maxAttempts: 2,
      jitterStrategy: "full", // Low minimum delay keeps retries fast
      baseDelayMs: 25,
      maxDelayMs: 250,
    }
  );
}

interface User {
  id: string;
  name: string;
}
```

Jitter is the critical ingredient that transforms exponential backoff from a single-client strategy into a distributed-systems-safe one. Without jitter, synchronized retries cause thundering herds. With jitter, retries spread naturally across time.
What's next:
With backoff and jitter, we can space out individual client retries effectively. But what happens when the entire system is overloaded and allowing more retries would make things worse? The next page covers Retry Budgets — a critical concept for preventing retry amplification at the system level.
You now understand the thundering herd problem and how jitter strategies solve it. Combined with exponential backoff, jitter forms the foundation of resilient retry behavior. Next, we'll learn how to limit retries at the system level with retry budgets.