Imagine this scenario: A database becomes briefly unavailable for 100 milliseconds. 10,000 clients experience simultaneous failures. Each client, following textbook exponential backoff, waits exactly 100ms before retrying.
At t=100ms, all 10,000 clients retry at the exact same instant.
The database, having just recovered, is immediately overwhelmed by 10,000 simultaneous requests. It fails again. All clients wait 200ms. At t=300ms, 10,000 simultaneous requests. Failure. Wait 400ms. 10,000 requests. Failure.
This is the thundering herd problem.
Even with perfect exponential backoff, synchronized retries create periodic load spikes that can be worse than the original failure. The solution is beautifully simple: jitter — adding randomness to break synchronization.
By the end of this page, you will deeply understand the thundering herd problem, why it occurs even with exponential backoff, the mathematics of jitter strategies, how to implement different jitter algorithms, and how to choose the right jitter type for different scenarios.
The term "thundering herd" originates from operating systems, describing what happens when many processes wait on the same resource (like a mutex or network event) and all wake up simultaneously when the resource becomes available, only to compete and mostly fail again.
In distributed systems, the thundering herd manifests whenever many clients synchronize their behavior—intentionally or accidentally.
Why exponential backoff alone doesn't solve this:
Exponential backoff spaces out sequential retries from a single client. It does nothing to desynchronize parallel retries across multiple clients. If 10,000 clients all fail at t=0, every one of them retries at t=100ms, again at t=300ms, then t=700ms, and so on: the retries remain perfectly synchronized because every client applies the same formula from the same starting point.
```typescript
/**
 * Simulating the Thundering Herd Problem
 *
 * This demonstrates why exponential backoff without jitter
 * leads to synchronized retry storms.
 */

interface RetryEvent {
  clientId: number;
  timestampMs: number;
  attempt: number;
}

function simulateWithoutJitter(
  numClients: number,
  baseDelayMs: number,
  multiplier: number,
  maxAttempts: number
): Map<number, number> {
  // Track load at each millisecond
  const loadByTime = new Map<number, number>();

  for (let clientId = 0; clientId < numClients; clientId++) {
    let currentTime = 0; // All clients fail at t=0

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      // Exponential delay without jitter
      const delay = baseDelayMs * Math.pow(multiplier, attempt - 1);
      currentTime += delay;

      // Round to nearest ms for bucketing
      const timeSlot = Math.round(currentTime);
      loadByTime.set(timeSlot, (loadByTime.get(timeSlot) || 0) + 1);
    }
  }

  return loadByTime;
}

// Simulate 10,000 clients with base=100ms, multiplier=2, 5 attempts
const loadWithoutJitter = simulateWithoutJitter(10000, 100, 2, 5);

// Find peak loads
const sortedLoads = Array.from(loadWithoutJitter.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 10);

console.log("Peak loads WITHOUT jitter:");
sortedLoads.forEach(([time, load]) => {
  console.log(`  t=${time}ms: ${load} requests`);
});

// Output:
// Peak loads WITHOUT jitter:
//   t=100ms: 10000 requests   <- ALL clients retry together!
//   t=300ms: 10000 requests   <- Same with second retry
//   t=700ms: 10000 requests   <- And third
//   t=1500ms: 10000 requests  <- And fourth
//   t=3100ms: 10000 requests  <- And fifth

// This is the thundering herd: perfect synchronization = maximum damage
```

If a service can handle 1,000 req/s but 10,000 clients retry at the same instant, that instant experiences 10× normal load. Even brief overloads cause cascading effects: queues fill, timeouts fire, and more retries are triggered. Without jitter, each retry wave re-creates the very overload that caused it, so recovery may never happen.
Jitter adds controlled randomness to retry delays, desynchronizing clients so they don't all retry at the same instant. Instead of a load spike at t=100ms, retries spread across a time window.
There are several jitter strategies, each with different properties:
1. No Jitter (baseline): delay = base × 2^attempt
2. Full Jitter: delay = random(0, base × 2^attempt)
3. Equal Jitter: delay = (base × 2^attempt)/2 + random(0, (base × 2^attempt)/2)
4. Decorrelated Jitter: delay = random(base, previous_delay × 3)
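To make the four formulas concrete, here is a small sketch (names are illustrative, and the max-delay cap is omitted for brevity) that draws one sample delay per strategy for the first three attempts, with base = 100ms:

```typescript
const base = 100;
let prev = base; // previous delay, used only by decorrelated jitter

function sampleDelay(strategy: string, attempt: number): number {
  const cap = base * 2 ** attempt; // the exponential ceiling for this attempt
  switch (strategy) {
    case "none":  return cap;                                 // deterministic
    case "full":  return Math.random() * cap;                 // random(0, cap)
    case "equal": return cap / 2 + Math.random() * (cap / 2); // half fixed, half random
    case "decorrelated": {
      prev = base + Math.random() * (prev * 3 - base);        // random(base, prev × 3)
      return prev;
    }
    default: throw new Error(`unknown strategy: ${strategy}`);
  }
}

for (const s of ["none", "full", "equal", "decorrelated"]) {
  prev = base; // reset the decorrelated walk for each strategy
  const row = [1, 2, 3].map(a => `${Math.round(sampleDelay(s, a))}ms`);
  console.log(`${s.padEnd(13)} ${row.join("  ")}`);
}
```

Running this a few times shows the key difference at a glance: "none" prints the same row every time, while the other three rows change on every run.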
| Strategy | Formula | Min Delay | Max Delay | Best For |
|---|---|---|---|---|
| No Jitter | base × 2^n | base × 2^n | base × 2^n | Debugging only (never in production) |
| Full Jitter | random(0, base × 2^n) | 0 | base × 2^n | Maximum spread; high contention systems |
| Equal Jitter | (base × 2^n)/2 + random(0, (base × 2^n)/2) | (base × 2^n)/2 | base × 2^n | Balanced; guaranteed minimum delay |
| Decorrelated Jitter | random(base, prev × 3) | base | ~3× previous | Heavily loaded systems; best overall |
The AWS recommendation:
Amazon Web Services performed extensive analysis and found that decorrelated jitter produces the best results under high contention, reducing total completion time compared to other strategies. Their 2015 blog post "Exponential Backoff and Jitter" remains the definitive reference.
The goal of jitter is not just randomness—it's desynchronization. Even small amounts of randomness break the lock-step behavior that causes thundering herds. The specific jitter algorithm matters less than simply having some jitter.
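That claim is easy to check with a minimal sketch (all numbers illustrative): bucket the first retry of 10,000 clients into 1ms slots, once with no jitter and once with just ±10% jitter around a 100ms delay.

```typescript
// Peak number of retries landing in the same 1ms slot.
function peakLoad(numClients: number, jitterFraction: number): number {
  const load = new Map<number, number>();
  for (let c = 0; c < numClients; c++) {
    // 100ms nominal delay, perturbed by up to ±jitterFraction
    const delay = 100 * (1 + jitterFraction * (2 * Math.random() - 1));
    const slot = Math.round(delay);
    load.set(slot, (load.get(slot) ?? 0) + 1);
  }
  return Math.max(...load.values());
}

console.log(peakLoad(10000, 0));   // 10000: every client lands in one slot
console.log(peakLoad(10000, 0.1)); // a few hundred: spread across ~20 slots
```

Even this tiny ±10% perturbation cuts the instantaneous peak by roughly 20×, which is exactly the desynchronization effect the full strategies below amplify.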
Full jitter provides maximum randomization by selecting a random delay between 0 and the exponential cap.
Formula: delay = random(0, base × 2^attempt)
Properties:
- Minimum delay: 0, so a client may retry almost immediately
- Maximum delay: min(cap, base × 2^attempt)
- Average delay: half of the capped exponential delay
```typescript
/**
 * Full Jitter Implementation
 *
 * Provides maximum desynchronization by randomizing across
 * the entire delay range. Used by AWS SDKs.
 */
class FullJitterBackoff {
  private readonly baseMs: number;
  private readonly capMs: number;

  constructor(baseMs: number = 100, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
  }

  /**
   * Calculate delay with full jitter.
   * delay = random(0, min(cap, base × 2^attempt))
   */
  getDelay(attempt: number): number {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    // Full jitter: random between 0 and capped delay
    return Math.random() * cappedDelay;
  }

  /**
   * Get statistical properties for analysis.
   */
  getStatistics(attempt: number): { min: number; max: number; average: number } {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    return {
      min: 0,
      max: cappedDelay,
      average: cappedDelay / 2,
    };
  }
}

// Demonstrate the spread
const fullJitter = new FullJitterBackoff(100, 30000);

console.log("Full Jitter Delay Statistics:");
for (let attempt = 1; attempt <= 5; attempt++) {
  const stats = fullJitter.getStatistics(attempt);
  const sample = fullJitter.getDelay(attempt);
  console.log(
    `  Attempt ${attempt}: range=[0, ${stats.max}ms], avg=${stats.average}ms, sample=${Math.round(sample)}ms`
  );
}

// Output (samples will vary):
// Attempt 1: range=[0, 200ms], avg=100ms, sample=73ms
// Attempt 2: range=[0, 400ms], avg=200ms, sample=312ms
// Attempt 3: range=[0, 800ms], avg=400ms, sample=156ms
// Attempt 4: range=[0, 1600ms], avg=800ms, sample=1203ms
// Attempt 5: range=[0, 3200ms], avg=1600ms, sample=2847ms
```

Pros and Cons of Full Jitter:
Full jitter is ideal when you want maximum spread and can tolerate some clients retrying almost immediately. It's particularly effective in high-scale scenarios where the statistical distribution matters more than individual client behavior.
Equal jitter guarantees a minimum delay while still providing randomization. It splits the exponential delay in half: the first half is fixed, the second half is randomized.
Formula: delay = (base × 2^attempt)/2 + random(0, (base × 2^attempt)/2)
Properties:
- Minimum delay: half of the capped exponential delay (guaranteed)
- Maximum delay: the capped exponential delay
- Average delay: 3/4 of the capped exponential delay
```typescript
/**
 * Equal Jitter Implementation
 *
 * Guarantees a minimum delay while still providing spread.
 * Also known as "half-jitter" or "50% jitter".
 */
class EqualJitterBackoff {
  private readonly baseMs: number;
  private readonly capMs: number;

  constructor(baseMs: number = 100, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
  }

  /**
   * Calculate delay with equal jitter.
   * delay = cap/2 + random(0, cap/2)
   * where cap = min(maxCap, base × 2^attempt)
   */
  getDelay(attempt: number): number {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    const fixedPart = cappedDelay / 2;
    const randomPart = Math.random() * (cappedDelay / 2);
    return fixedPart + randomPart;
  }

  getStatistics(attempt: number): { min: number; max: number; average: number } {
    const exponentialDelay = this.baseMs * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.capMs);
    return {
      min: cappedDelay / 2,
      max: cappedDelay,
      average: cappedDelay * 0.75, // 3/4 of cap
    };
  }
}

// Compare Full vs Equal jitter ranges
const equalJitter = new EqualJitterBackoff(100, 30000);

console.log("Comparison: Full Jitter vs Equal Jitter");
console.log("---------------------------------------");

for (let attempt = 1; attempt <= 4; attempt++) {
  const fullStats = {
    min: 0,
    max: Math.min(100 * Math.pow(2, attempt), 30000),
  };
  const equalStats = equalJitter.getStatistics(attempt);
  console.log(`Attempt ${attempt}:`);
  console.log(`  Full:  [${fullStats.min}ms - ${fullStats.max}ms]`);
  console.log(`  Equal: [${equalStats.min}ms - ${equalStats.max}ms]`);
}

// Output:
// Attempt 1:
//   Full:  [0ms - 200ms]
//   Equal: [100ms - 200ms]  <- Guaranteed minimum!
// Attempt 2:
//   Full:  [0ms - 400ms]
//   Equal: [200ms - 400ms]
// Attempt 3:
//   Full:  [0ms - 800ms]
//   Equal: [400ms - 800ms]
// Attempt 4:
//   Full:  [0ms - 1600ms]
//   Equal: [800ms - 1600ms]
```

When Equal Jitter is preferred:
Equal jitter is the right choice when you need a predictable minimum backoff: for example, when a recovering service needs guaranteed breathing room between attempts, or when retrying sooner than a known rate-limit window would be wasted work.
Equal jitter is often the best default choice because it balances spread (preventing thundering herd) with predictability (guaranteed minimum delay). It's not as aggressive as full jitter but also not as risky.
Decorrelated jitter takes a fundamentally different approach: instead of computing delay from the attempt number, it computes delay based on the previous delay. This creates a random walk that naturally spreads clients over time.
Formula: delay = random(base, min(cap, previous_delay × 3))
Properties:
- Minimum delay: base (every retry waits at least the base delay)
- Maximum delay: min(cap, 3× the previous delay), so the range widens over time
- Each delay depends on the previous one, giving every client its own random walk
```typescript
/**
 * Decorrelated Jitter Implementation
 *
 * This is the AWS-recommended algorithm from their 2015 analysis.
 * It produces the best results under high contention.
 */
class DecorrelatedJitterBackoff {
  private readonly baseMs: number;
  private readonly capMs: number;
  private currentDelayMs: number;

  constructor(baseMs: number = 100, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
    this.currentDelayMs = baseMs; // Start at base
  }

  /**
   * Get next delay using decorrelated jitter.
   * delay = random(base, min(cap, previous × 3))
   */
  getDelay(): number {
    // Calculate the upper bound for this iteration
    const upperBound = Math.min(this.capMs, this.currentDelayMs * 3);

    // Random delay between base and upper bound
    const newDelay = this.baseMs + Math.random() * (upperBound - this.baseMs);

    // Store for next iteration
    this.currentDelayMs = newDelay;
    return newDelay;
  }

  /**
   * Reset state for a new retry sequence.
   */
  reset(): void {
    this.currentDelayMs = this.baseMs;
  }

  /**
   * Simulate a complete sequence of retries.
   */
  simulateSequence(numRetries: number): number[] {
    this.reset();
    const delays: number[] = [];
    for (let i = 0; i < numRetries; i++) {
      delays.push(Math.round(this.getDelay()));
    }
    return delays;
  }
}

// Demonstrate that each sequence is different
const decorrelatedBackoff = new DecorrelatedJitterBackoff(100, 30000);

console.log("Decorrelated Jitter: Multiple 5-Retry Sequences");
console.log("(Notice how each sequence is completely different)");
console.log("-----------------------------------------------");

for (let seq = 1; seq <= 3; seq++) {
  const delays = decorrelatedBackoff.simulateSequence(5);
  console.log(`Sequence ${seq}: ${delays.map(d => d + "ms").join(" → ")}`);
}

// Output (example - will vary):
// Sequence 1: 142ms → 287ms → 651ms → 1823ms → 4201ms
// Sequence 2: 167ms → 398ms → 1041ms → 2871ms → 7432ms
// Sequence 3: 123ms → 312ms → 789ms → 2156ms → 5123ms

// Key insight: Unlike fixed exponential (100 → 200 → 400 → 800...),
// decorrelated jitter produces unique paths for each client!
```

Why decorrelated jitter is optimal:
AWS performed extensive simulations comparing all jitter strategies under high contention (many clients competing for limited resources) and found that decorrelated jitter produced the lowest peak load and the shortest total completion time of the strategies tested.
Decorrelated jitter requires maintaining state (the previous delay) between retries. This means the backoff object must persist across retry attempts. Don't create a new instance for each retry — you'll lose the decorrelation benefit!
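A sketch of that pitfall (class and names are illustrative, following the formula above): the first sequence reuses one instance so the random walk can widen, while recreating the instance per attempt pins every delay to the [base, 3 × base] range.

```typescript
class DecorrelatedJitter {
  private prevMs: number;
  constructor(private baseMs = 100, private capMs = 30000) {
    this.prevMs = baseMs;
  }
  next(): number {
    // delay = random(base, min(cap, previous × 3))
    const upper = Math.min(this.capMs, this.prevMs * 3);
    this.prevMs = this.baseMs + Math.random() * (upper - this.baseMs);
    return this.prevMs;
  }
}

// Correct: one instance per retry sequence; the upper bound grows each call.
const backoff = new DecorrelatedJitter();
const good = [backoff.next(), backoff.next(), backoff.next()];

// Wrong: a fresh instance per attempt; every delay stays in [100, 300].
const bad = [1, 2, 3].map(() => new DecorrelatedJitter().next());
```

With the shared instance the third delay can range up to 100 × 3³ = 2700ms; with fresh instances it can never exceed 300ms, so late retries stay clustered and the decorrelation benefit is lost.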
Let's put all strategies side-by-side with a simulation to understand their real-world behavior.
```typescript
/**
 * Comprehensive Jitter Strategy Comparison
 *
 * Simulates 1000 clients using each strategy and measures:
 * - Load distribution over time (peak vs average)
 * - Total completion time
 * - Fairness (variance in completion times)
 */

interface SimulationResult {
  strategy: string;
  peakLoad: number;
  averageLoad: number;
  peakToAverageRatio: number;
  totalCompletionTimeMs: number;
  fairnessScore: number; // Lower is fairer (std dev of completion times)
}

function simulateStrategy(
  strategy: "none" | "full" | "equal" | "decorrelated",
  numClients: number,
  baseMs: number,
  maxAttempts: number
): SimulationResult {
  const loadByTimeBucket = new Map<number, number>();
  const completionTimes: number[] = [];
  const bucketSizeMs = 50; // Group into 50ms buckets

  for (let client = 0; client < numClients; client++) {
    let currentTime = 0;
    let previousDelay = baseMs;

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      let delay: number;

      switch (strategy) {
        case "none":
          delay = baseMs * Math.pow(2, attempt - 1);
          break;
        case "full":
          delay = Math.random() * baseMs * Math.pow(2, attempt - 1);
          break;
        case "equal": {
          const cap = baseMs * Math.pow(2, attempt - 1);
          delay = cap / 2 + Math.random() * (cap / 2);
          break;
        }
        case "decorrelated": {
          const upperBound = Math.min(30000, previousDelay * 3);
          delay = baseMs + Math.random() * (upperBound - baseMs);
          previousDelay = delay;
          break;
        }
      }

      currentTime += delay;
      const bucket = Math.floor(currentTime / bucketSizeMs);
      loadByTimeBucket.set(bucket, (loadByTimeBucket.get(bucket) || 0) + 1);
    }

    completionTimes.push(currentTime);
  }

  // Calculate metrics
  const loads = Array.from(loadByTimeBucket.values());
  const peakLoad = Math.max(...loads);
  const averageLoad = loads.reduce((a, b) => a + b, 0) / loads.length;
  const avgCompletion = completionTimes.reduce((a, b) => a + b) / completionTimes.length;
  const variance =
    completionTimes.reduce((sum, t) => sum + Math.pow(t - avgCompletion, 2), 0) / numClients;
  const fairnessScore = Math.sqrt(variance);

  return {
    strategy,
    peakLoad,
    averageLoad,
    peakToAverageRatio: peakLoad / averageLoad,
    totalCompletionTimeMs: Math.max(...completionTimes),
    fairnessScore,
  };
}

// Run simulations
const strategies: Array<"none" | "full" | "equal" | "decorrelated"> = [
  "none", "full", "equal", "decorrelated"
];

console.log("Jitter Strategy Comparison (1000 clients, 5 retries)");
console.log("====================================================");

strategies.forEach(strategy => {
  const result = simulateStrategy(strategy, 1000, 100, 5);
  console.log(`Strategy: ${result.strategy.toUpperCase()}`);
  console.log(`  Peak Load:  ${result.peakLoad} requests/50ms`);
  console.log(`  Avg Load:   ${result.averageLoad.toFixed(1)} requests/50ms`);
  console.log(`  Peak/Avg:   ${result.peakToAverageRatio.toFixed(2)}x`);
  console.log(`  Total Time: ${result.totalCompletionTimeMs.toFixed(0)}ms`);
  console.log(`  Fairness:   ${result.fairnessScore.toFixed(0)} (lower=better)`);
});

/* Typical output pattern (exact numbers vary run to run):
   - NONE: load arrives as five synchronized spikes; every active bucket
     receives all 1000 clients at once.
   - FULL: the spikes dissolve into a broad spread with far lower peaks.
   - EQUAL: good spread with a guaranteed minimum, though each attempt
     concentrates in the upper half of its window.
   - DECORRELATED: the widest spread, and the most uniform per-client
     completion times under sustained contention. */
```

| Scenario | Recommended Strategy | Reasoning |
|---|---|---|
| Default choice for most systems | Equal Jitter | Balances spread with guaranteed minimum backoff |
| Very high contention (thousands of clients) | Decorrelated Jitter | Proven mathematically optimal for high contention |
| Maximum spread needed | Full Jitter | Zero minimum delay means retries spread across the entire window |
| Simple implementation priority | Equal Jitter | Easy to understand and implement correctly |
| Need minimum backoff guarantee | Equal Jitter | Only strategy with hard minimum |
| Retry budget is tight | Full Jitter | Lower average delay means more attempts in budget |
A production-ready jitter implementation should support all strategies, be easily configurable, and integrate cleanly with retry loops.
```typescript
/**
 * Production-Ready Jitter System
 *
 * Supports all major jitter strategies with clean configuration
 * and integration with retry logic.
 */

type JitterStrategy = "none" | "full" | "equal" | "decorrelated";

interface JitterConfig {
  strategy: JitterStrategy;
  baseDelayMs: number;
  maxDelayMs: number;
  multiplier: number;
}

interface BackoffResult {
  delayMs: number;
  attempt: number;
  strategy: JitterStrategy;
}

class ProductionJitterBackoff {
  private readonly config: JitterConfig;
  private previousDelayMs: number;
  private currentAttempt: number = 0;

  constructor(config: Partial<JitterConfig> = {}) {
    this.config = {
      strategy: "equal", // Sensible default
      baseDelayMs: 100,
      maxDelayMs: 30000,
      multiplier: 2,
      ...config,
    };
    this.previousDelayMs = this.config.baseDelayMs;
  }

  /**
   * Get the next backoff delay.
   * Call this before each retry attempt.
   */
  nextBackoff(): BackoffResult {
    this.currentAttempt++;
    const delayMs = this.calculateDelay();

    // Update state for decorrelated jitter
    if (this.config.strategy === "decorrelated") {
      this.previousDelayMs = delayMs;
    }

    return {
      delayMs: Math.round(delayMs),
      attempt: this.currentAttempt,
      strategy: this.config.strategy,
    };
  }

  private calculateDelay(): number {
    switch (this.config.strategy) {
      case "none":
        return this.calculateNoJitter();
      case "full":
        return this.calculateFullJitter();
      case "equal":
        return this.calculateEqualJitter();
      case "decorrelated":
        return this.calculateDecorrelatedJitter();
      default:
        return this.calculateEqualJitter();
    }
  }

  private calculateNoJitter(): number {
    const exponential =
      this.config.baseDelayMs * Math.pow(this.config.multiplier, this.currentAttempt - 1);
    return Math.min(exponential, this.config.maxDelayMs);
  }

  private calculateFullJitter(): number {
    const cap = Math.min(
      this.config.baseDelayMs * Math.pow(this.config.multiplier, this.currentAttempt - 1),
      this.config.maxDelayMs
    );
    return Math.random() * cap;
  }

  private calculateEqualJitter(): number {
    const cap = Math.min(
      this.config.baseDelayMs * Math.pow(this.config.multiplier, this.currentAttempt - 1),
      this.config.maxDelayMs
    );
    return cap / 2 + Math.random() * (cap / 2);
  }

  private calculateDecorrelatedJitter(): number {
    const upperBound = Math.min(this.config.maxDelayMs, this.previousDelayMs * 3);
    return this.config.baseDelayMs + Math.random() * (upperBound - this.config.baseDelayMs);
  }

  /**
   * Reset state for a new retry sequence.
   */
  reset(): void {
    this.currentAttempt = 0;
    this.previousDelayMs = this.config.baseDelayMs;
  }

  /**
   * Convenience method: sleep for the calculated delay.
   */
  async sleep(): Promise<BackoffResult> {
    const backoff = this.nextBackoff();
    await new Promise(resolve => setTimeout(resolve, backoff.delayMs));
    return backoff;
  }
}

/**
 * High-level retry function with jitter.
 */
async function retryWithJitter<T>(
  operation: () => Promise<T>,
  options: {
    maxAttempts?: number;
    jitterStrategy?: JitterStrategy;
    baseDelayMs?: number;
    maxDelayMs?: number;
    shouldRetry?: (error: Error, attempt: number) => boolean;
    onRetry?: (backoff: BackoffResult, error: Error) => void;
  } = {}
): Promise<T> {
  const {
    maxAttempts = 3,
    jitterStrategy = "equal",
    baseDelayMs = 100,
    maxDelayMs = 30000,
    shouldRetry = () => true,
    onRetry = () => {},
  } = options;

  const backoff = new ProductionJitterBackoff({
    strategy: jitterStrategy,
    baseDelayMs,
    maxDelayMs,
  });

  let lastError: Error | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error as Error;

      if (!shouldRetry(lastError, attempt) || attempt === maxAttempts) {
        break;
      }

      const backoffResult = await backoff.sleep();
      onRetry(backoffResult, lastError);
    }
  }

  throw lastError || new Error("Retry failed");
}

// =========================================
// Usage Examples
// =========================================

// Example 1: Basic retry with equal jitter (default)
async function fetchWithRetry(url: string): Promise<Response> {
  return retryWithJitter(
    () =>
      fetch(url).then(r => {
        if (!r.ok) throw new Error(`HTTP ${r.status}`);
        return r;
      }),
    {
      maxAttempts: 3,
      onRetry: (backoff, error) => {
        console.log(`Retry ${backoff.attempt} after ${backoff.delayMs}ms: ${error.message}`);
      },
    }
  );
}

// Example 2: High-contention scenario with decorrelated jitter
async function acquireDistributedLock(lockId: string): Promise<boolean> {
  return retryWithJitter(
    async () => {
      const response = await fetch(`/locks/${lockId}`, { method: "POST" });
      if (response.status === 409) throw new Error("Lock contention");
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return true;
    },
    {
      maxAttempts: 10,
      jitterStrategy: "decorrelated", // Best for contention
      baseDelayMs: 50,
      maxDelayMs: 5000,
      onRetry: (backoff) => {
        console.log(`Lock contention, retry ${backoff.attempt} in ${backoff.delayMs}ms`);
      },
    }
  );
}

// Example 3: Real-time UX with full jitter for speed
async function fetchUserProfile(userId: string): Promise<User> {
  return retryWithJitter(
    async () => {
      const response = await fetch(`/users/${userId}`);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return response.json();
    },
    {
      maxAttempts: 2,
      jitterStrategy: "full", // Low minimum delay keeps retries fast
      baseDelayMs: 25,
      maxDelayMs: 250,
    }
  );
}

interface User {
  id: string;
  name: string;
}
```

Jitter is the critical ingredient that transforms exponential backoff from a single-client strategy into a distributed-systems-safe one. Without jitter, synchronized retries cause thundering herds. With jitter, retries spread naturally across time.
What's next:
With backoff and jitter, we can space out individual client retries effectively. But what happens when the entire system is overloaded and allowing more retries would make things worse? The next page covers Retry Budgets — a critical concept for preventing retry amplification at the system level.
You now understand the thundering herd problem and how jitter strategies solve it. Combined with exponential backoff, jitter forms the foundation of resilient retry behavior. Next, we'll learn how to limit retries at the system level with retry budgets.