Imagine your mobile app's home screen goes viral. Suddenly, 10,000 users open the app simultaneously. Without request coalescing, your BFF receives 10,000 identical requests for trending content, which it dutifully forwards as 10,000 identical calls to the Trending Service. The service collapses under load, taking your entire platform with it.
Now imagine an alternative: those 10,000 requests arrive at the BFF within a 100ms window. The BFF recognizes they're all asking for the same data, makes a single call to the Trending Service, and fans out the response to all 10,000 waiting requests. The Trending Service barely notices the load spike.
This is request coalescing—the art of combining multiple in-flight requests for the same data into a single backend call. It's a critical technique for protecting backend services and improving overall system resilience.
By the end of this page, you will master request coalescing techniques including in-flight request deduplication, time-window batching, collapse-forwarding patterns, cache-miss coalescing, and the subtle trade-offs between latency, consistency, and system efficiency that these techniques introduce.
Request coalescing is the process of identifying multiple requests that would result in identical backend operations and executing the operation once, sharing the result among all requesters.
In any system with multiple clients requesting the same data, there is an opportunity window in which duplicate work can be eliminated.

Coalescing is effective when requests are read-only and idempotent, would produce identical results, and overlap in time often enough to share work.

Coalescing is not appropriate when operations have side effects (writes), when results differ per caller, or when every request must see the freshest possible data.
The simplest and most common form of coalescing is in-flight deduplication: when a request arrives for data that's already being fetched, attach the new request to the pending operation rather than starting a new one.
The singleflight pattern (popularized by Go's golang.org/x/sync/singleflight package) ensures that for any given key, only one operation executes at a time. Additional callers receive the result of the in-flight operation.
```typescript
// Singleflight implementation in TypeScript

// Minimal metrics interface so the example is self-contained; swap in your
// real metrics client.
interface MetricsClient {
  increment(name: string, tags?: Record<string, string>): void;
  histogram(name: string, value: number, tags?: Record<string, string>): void;
}

const noopMetrics: MetricsClient = { increment() {}, histogram() {} };

type PendingCall<T> = {
  promise: Promise<T>;
  callerCount: number;
  startTime: number;
};

class Singleflight {
  private pending = new Map<string, PendingCall<any>>();

  constructor(private metrics: MetricsClient = noopMetrics) {}

  /**
   * Execute a function, deduplicating concurrent calls for the same key.
   * All concurrent callers receive the same result.
   */
  async do<T>(key: string, fn: () => Promise<T>): Promise<T> {
    // Check if there's already an in-flight request for this key
    const existing = this.pending.get(key);
    if (existing) {
      existing.callerCount++;
      this.metrics.increment('singleflight.deduped', { key });
      return existing.promise;
    }

    // Create new pending request
    const pendingCall: PendingCall<T> = {
      promise: fn(),
      callerCount: 1,
      startTime: Date.now(),
    };
    this.pending.set(key, pendingCall);
    this.metrics.increment('singleflight.new', { key });

    try {
      const result = await pendingCall.promise;

      // Record metrics
      this.metrics.histogram(
        'singleflight.duration_ms',
        Date.now() - pendingCall.startTime,
        { key }
      );
      this.metrics.histogram('singleflight.callers', pendingCall.callerCount, { key });

      return result;
    } finally {
      // Always clean up, even on failure
      this.pending.delete(key);
    }
  }

  /**
   * Get the number of callers waiting for a specific key
   */
  getWaiterCount(key: string): number {
    return this.pending.get(key)?.callerCount ?? 0;
  }
}

// Usage
const singleflight = new Singleflight();

async function getTrendingContent(): Promise<Content[]> {
  // All concurrent calls will be coalesced into one backend request
  return singleflight.do('trending', async () => {
    const response = await contentService.getTrending();
    return response.items;
  });
}

// Example: 100 concurrent requests
const results = await Promise.all(
  Array(100).fill(null).map(() => getTrendingContent())
);
// All 100 receive the same data; only 1 backend call was made
```

The effectiveness of singleflight depends on correctly identifying equivalent requests.
The key must capture all semantically significant request parameters:
```typescript
// Key generation strategies for different scenarios

// Simple key for public data
function publicDataKey(resource: string): string {
  return resource; // e.g., 'trending', 'categories'
}

// Key with pagination
function paginatedKey(resource: string, page: number, limit: number): string {
  return `${resource}:page=${page}:limit=${limit}`;
}

// Key with user context (but shared within user)
function userScopedKey(userId: string, resource: string): string {
  return `user:${userId}:${resource}`;
}

// Key with complex query parameters
function queryKey(path: string, params: Record<string, string | number>): string {
  const sortedParams = Object.entries(params)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${k}=${v}`)
    .join('&');
  return `${path}?${sortedParams}`;
}

// Key with request body (for POST endpoints that behave like GET)
function bodyKey(path: string, body: object): string {
  // Use stable JSON serialization for consistent keys
  return `${path}:${stableStringify(body)}`;
}

// Minimal stable serializer: sorts object keys (recursively) so that
// semantically equal bodies always produce identical key strings
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  return `{${Object.keys(value).sort()
    .map(k => `${JSON.stringify(k)}:${stableStringify((value as any)[k])}`)
    .join(',')}}`;
}

// WRONG: Including non-semantic fields
interface IncomingRequest { path: string; timestamp: number; id: string }

function badKey(request: IncomingRequest): string {
  // ❌ Don't include timestamps, request IDs, or other unique fields
  return `${request.path}:${request.timestamp}:${request.id}`;
  // This key will never match another request!
}
```

When one request fails in a singleflight group, ALL waiters receive the error. This means one unlucky timeout can fail 100 requests. Consider whether this amplification is acceptable, or if you need retry-per-caller semantics.
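If that amplification is unacceptable, a retry-per-caller wrapper is one mitigation. The sketch below uses a minimal in-flight map standing in for the fuller `Singleflight` class; `shared` and `sharedWithRetry` are illustrative names. Each waiter gets one independent retry after a shared failure, and the retries themselves coalesce:

```typescript
// Sketch: retry-per-caller on top of in-flight deduplication.
const inflight = new Map<string, Promise<unknown>>();

function shared<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>;
  // Remove the entry as soon as the call settles, success or failure
  const call = fn().finally(() => inflight.delete(key));
  inflight.set(key, call);
  return call;
}

async function sharedWithRetry<T>(key: string, fn: () => Promise<T>): Promise<T> {
  try {
    return await shared(key, fn);
  } catch {
    // The shared call failed for every waiter; retry once on this caller's
    // behalf. Concurrent retries coalesce again, so the backend sees at most
    // one retry wave instead of one retry per waiter.
    return shared(key, fn);
  }
}
```

With five concurrent callers and a fetcher that fails once, the backend sees two calls in total: the failed shared call and one shared retry.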
While singleflight coalesces concurrent requests for the same key, time-window batching coalesces requests for different keys that arrive within a short window, making a single batch request.
Batching introduces intentional latency: you wait for a window to accumulate requests before executing, trading a small per-request delay for fewer, larger backend calls.
This is worthwhile when the batch efficiency gains exceed the window latency cost.
```typescript
// Time-window batching implementation

interface BatchConfig {
  maxBatchSize: number;   // Maximum items per batch
  maxWaitMs: number;      // Maximum time to wait for batch
  minBatchSize?: number;  // Minimum items before early flush
}

class BatchingQueue<TKey, TResult> {
  private pending: Map<TKey, {
    resolve: (result: TResult) => void;
    reject: (error: Error) => void;
  }[]> = new Map();

  private timer: NodeJS.Timeout | null = null;
  private batchStartTime: number | null = null;

  constructor(
    private batchFetcher: (keys: TKey[]) => Promise<Map<TKey, TResult>>,
    private config: BatchConfig
  ) {}

  async get(key: TKey): Promise<TResult> {
    return new Promise((resolve, reject) => {
      // Add to pending batch
      if (!this.pending.has(key)) {
        this.pending.set(key, []);
      }
      this.pending.get(key)!.push({ resolve, reject });

      // Start timer on first request
      if (this.timer === null) {
        this.batchStartTime = Date.now();
        this.timer = setTimeout(() => this.flush(), this.config.maxWaitMs);
      }

      // Check if we should flush early
      if (this.pending.size >= this.config.maxBatchSize) {
        this.flush();
      }
    });
  }

  private async flush(): Promise<void> {
    // Clear timer
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }

    // Capture current batch
    const batch = new Map(this.pending);
    this.pending.clear();

    if (batch.size === 0) return;

    // Log batch metrics
    const waitTime = this.batchStartTime ? Date.now() - this.batchStartTime : 0;
    console.log(`Flushing batch: ${batch.size} keys after ${waitTime}ms`);
    this.batchStartTime = null;

    try {
      // Execute batch fetch
      const results = await this.batchFetcher([...batch.keys()]);

      // Resolve all waiters
      for (const [key, waiters] of batch) {
        const result = results.get(key);
        if (result !== undefined) {
          waiters.forEach(w => w.resolve(result));
        } else {
          waiters.forEach(w => w.reject(new Error(`Key not found: ${key}`)));
        }
      }
    } catch (error) {
      // Reject all waiters on batch failure
      for (const waiters of batch.values()) {
        waiters.forEach(w => w.reject(error as Error));
      }
    }
  }
}

// Usage
const productBatcher = new BatchingQueue<string, Product>(
  async (productIds) => {
    // Single batched call instead of N individual calls
    const products = await productService.batchGet(productIds);
    return new Map(products.map(p => [p.id, p]));
  },
  {
    maxBatchSize: 50,
    maxWaitMs: 10, // 10ms maximum wait
  }
);

// These calls arriving within 10ms are batched together
const [product1, product2, product3] = await Promise.all([
  productBatcher.get('prod-1'),
  productBatcher.get('prod-2'),
  productBatcher.get('prod-3'),
]);
// One backend call for all three products
```

The optimal batch window depends on several factors:
| Window Size | Batch Efficiency | Added Latency | Best For |
|---|---|---|---|
| 1-5ms | Low (small batches) | Minimal | Latency-critical paths |
| 10-20ms | Moderate | Acceptable | Most BFF use cases |
| 50-100ms | High | Noticeable | Background processing, analytics |
| 100ms+ | Very high | Significant | Only when latency is irrelevant |
Consider adaptive windows that shrink under low load (prioritize latency) and expand under high load (prioritize efficiency). Monitor batch sizes—consistently small batches suggest your window is too long; consistently hitting max size suggests backend can handle more load.
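One way to sketch that adaptation (the thresholds and the doubling/halving policy are illustrative assumptions, not tuned recommendations) is to derive the next window from the size of the last completed batch:

```typescript
// Sketch: adaptive batch window derived from observed batch sizes.
interface AdaptiveWindowConfig {
  minWaitMs: number;      // floor: protects latency under low load
  maxWaitMs: number;      // ceiling: caps added latency under high load
  targetBatchSize: number;
}

function nextWindowMs(
  currentWaitMs: number,
  lastBatchSize: number,
  cfg: AdaptiveWindowConfig
): number {
  // Small batches: the window is buying latency without efficiency. Shrink.
  if (lastBatchSize < cfg.targetBatchSize / 2) {
    return Math.max(cfg.minWaitMs, currentWaitMs / 2);
  }
  // Batches at or above target: there is demand. Grow, up to the ceiling.
  if (lastBatchSize >= cfg.targetBatchSize) {
    return Math.min(cfg.maxWaitMs, currentWaitMs * 2);
  }
  return currentWaitMs;
}
```

Feed `nextWindowMs` the size of each completed batch and use the result as the next `maxWaitMs`; the window then tracks load automatically.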
One of the most impactful applications of coalescing is preventing the "thundering herd" on cache expiration. When a popular cached item expires, thousands of requests may simultaneously miss the cache and attempt to regenerate it.
```typescript
// Cache with miss coalescing (anti-stampede)

interface CacheEntry<T> {
  value: T;
  expiresAt: number;
  originalTtl: number;
}

class CoalescingCache<T> {
  protected cache: Map<string, CacheEntry<T>> = new Map();
  protected singleflight = new Singleflight();

  constructor(
    protected fetcher: (key: string) => Promise<T>,
    protected defaultTtlMs: number
  ) {}

  async get(key: string): Promise<T> {
    // Check cache
    const cached = this.cache.get(key);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.value;
    }

    // Cache miss - use singleflight to coalesce concurrent fetches
    const value = await this.singleflight.do(key, async () => {
      // Double-check cache (another request might have populated it)
      const recheck = this.cache.get(key);
      if (recheck && recheck.expiresAt > Date.now()) {
        return recheck.value;
      }

      // Fetch from source
      const fresh = await this.fetcher(key);

      // Populate cache
      this.cache.set(key, {
        value: fresh,
        expiresAt: Date.now() + this.defaultTtlMs,
        originalTtl: this.defaultTtlMs,
      });

      return fresh;
    });

    return value;
  }
}

// Advanced: Probabilistic early expiration (prevents synchronized expiry)
class StaggeredCache<T> extends CoalescingCache<T> {
  async get(key: string): Promise<T> {
    const cached = this.cache.get(key);

    if (cached) {
      const timeToExpiry = cached.expiresAt - Date.now();
      const ttl = cached.originalTtl;

      // Probabilistic early refresh
      // As we approach expiry, increase probability of refresh
      // This spreads out refresh load instead of a spike at expiry
      const refreshProbability = this.calculateRefreshProbability(timeToExpiry, ttl);

      if (timeToExpiry > 0 && Math.random() > refreshProbability) {
        return cached.value;
      }

      // Either expired or randomly chosen for early refresh
      // Use background refresh if not expired yet
      if (timeToExpiry > 0) {
        this.refreshInBackground(key);
        return cached.value;
      }
    }

    // Expired or not cached - fetch synchronously with coalescing
    return super.get(key);
  }

  private refreshInBackground(key: string): void {
    // Coalesced like any other fetch; a failure only loses this refresh
    // attempt, and we keep serving the current value
    this.singleflight
      .do(key, async () => {
        const fresh = await this.fetcher(key);
        this.cache.set(key, {
          value: fresh,
          expiresAt: Date.now() + this.defaultTtlMs,
          originalTtl: this.defaultTtlMs,
        });
        return fresh;
      })
      .catch(() => { /* keep serving the stale value */ });
  }

  private calculateRefreshProbability(timeToExpiry: number, ttl: number): number {
    // Probability rises exponentially once less than 10% of the TTL remains,
    // reaching ~63% at the moment of expiry
    const remainingFraction = timeToExpiry / ttl;
    if (remainingFraction > 0.1) return 0; // No early refresh until 90% through TTL
    return 1 - Math.exp(-10 * (0.1 - remainingFraction));
  }
}
```

Two approaches exist for preventing stampedes:
Singleflight (Lockless) — All concurrent requests wait for the one in-flight fetch. Simple, but every waiter experiences the same latency.

Mutex Locking — One request acquires a lock and refreshes; others wait briefly or serve stale data. More complex, but gives a fairer latency distribution.
Singleflight is generally preferred in BFF contexts for its simplicity.
When using external caches (Redis, Memcached), coalescing must happen at the application level—the cache itself won't prevent stampedes. Consider using Redis-based distributed locks (SETNX with TTL) for cross-instance coordination of cache refreshes.
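That cross-instance lock can be sketched as follows. The `LockStore` interface is a stand-in for a Redis client: in real Redis, acquisition is `SET key value NX PX ttl`, and release should be a compare-and-delete Lua script so an instance never deletes a lock it no longer owns:

```typescript
// Sketch: cross-instance cache-refresh lock with SET NX PX semantics.
interface LockStore {
  // Set the key only if absent, with a TTL; true means the lock was acquired
  setNxPx(key: string, value: string, ttlMs: number): Promise<boolean>;
  // Delete the lock only if we still own it
  releaseIfOwner(key: string, value: string): Promise<void>;
}

async function refreshWithLock<T>(
  store: LockStore,
  key: string,
  instanceId: string,
  ttlMs: number,
  refresh: () => Promise<T>
): Promise<T | null> {
  const lockKey = `refresh-lock:${key}`;

  // NX: only one instance wins; PX: the TTL ensures a crashed holder
  // cannot block refreshes forever.
  const acquired = await store.setNxPx(lockKey, instanceId, ttlMs);
  if (!acquired) {
    return null; // another instance is refreshing; caller serves stale data
  }

  try {
    return await refresh();
  } finally {
    await store.releaseIfOwner(lockKey, instanceId);
  }
}
```

Callers that fail to acquire the lock serve the stale cached value, so exactly one instance per TTL window pays the refresh cost.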
Collapse-forwarding is a CDN/proxy technique where multiple requests for the same resource are collapsed into a single origin request. BFFs can implement similar functionality for backend service calls.
This differs from simple caching in that it works for uncacheable requests and cache misses—situations where pure caching doesn't help.
```typescript
// Collapse-forwarding with request queuing

interface PendingRequest<T> {
  inflightPromise: Promise<T>;
  originTime: number;
  queuedRequests: number;
}

interface CollapseStats {
  pendingKeys: number;
  totalQueued: number;
  oldestPending: number | null;
}

class CollapseForwarder {
  private pending = new Map<string, PendingRequest<any>>();

  // Configuration
  private maxCollapseTime = 5000; // Don't collapse over 5 seconds
  private maxQueueSize = 1000;    // Don't queue more than 1000 requests

  // Same MetricsClient interface as in the singleflight example
  constructor(private metrics: MetricsClient) {}

  async forward<T>(key: string, forwarder: () => Promise<T>): Promise<T> {
    const existing = this.pending.get(key);

    // Check if we can collapse onto existing request
    if (existing) {
      const age = Date.now() - existing.originTime;

      // Safety limits
      if (age > this.maxCollapseTime) {
        this.metrics.increment('collapse.rejected.timeout', { key });
        // Request too old, don't collapse (origin might be stuck)
        return forwarder();
      }

      if (existing.queuedRequests >= this.maxQueueSize) {
        this.metrics.increment('collapse.rejected.queue_full', { key });
        // Queue full, don't risk memory exhaustion
        return forwarder();
      }

      // Collapse onto existing request
      existing.queuedRequests++;
      this.metrics.increment('collapse.joined', { key });
      return existing.inflightPromise;
    }

    // Create new pending request
    const newPending: PendingRequest<T> = {
      inflightPromise: this.executeWithCleanup(key, forwarder),
      originTime: Date.now(),
      queuedRequests: 1,
    };
    this.pending.set(key, newPending);
    this.metrics.increment('collapse.originated', { key });

    return newPending.inflightPromise;
  }

  private async executeWithCleanup<T>(
    key: string,
    forwarder: () => Promise<T>
  ): Promise<T> {
    try {
      return await forwarder();
    } finally {
      const pending = this.pending.get(key);
      if (pending) {
        this.metrics.histogram('collapse.queue_size', pending.queuedRequests, { key });
        this.metrics.histogram('collapse.duration_ms', Date.now() - pending.originTime, { key });
      }
      this.pending.delete(key);
    }
  }

  // Observability
  getStats(): CollapseStats {
    const pendingList = [...this.pending.values()];
    return {
      pendingKeys: this.pending.size,
      totalQueued: pendingList.reduce((sum, p) => sum + p.queuedRequests, 0),
      oldestPending: pendingList.length > 0
        ? Math.min(...pendingList.map(p => p.originTime))
        : null,
    };
  }
}
```

When a BFF sits behind a CDN, both layers may implement collapse-forwarding. This provides defense in depth:
Each layer reduces load for the next, providing multiplicative protection.
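As a rough worked example (all traffic numbers assumed for illustration), 10,000 concurrent clients might collapse to one origin-bound request per CDN POP (say 20 POPs), and the BFF then collapses those 20 into a single backend call:

```typescript
// Illustrative arithmetic: layered collapsing multiplies reductions.
// For one hot key, each layer forwards at most one request per collapsing
// unit (one per CDN POP, then one per BFF fleet).
function layeredCollapse(clients: number, maxForwardPerLayer: number[]): number[] {
  const forwarded: number[] = [];
  let inFlight = clients;
  for (const cap of maxForwardPerLayer) {
    inFlight = Math.min(inFlight, cap);
    forwarded.push(inFlight);
  }
  return forwarded;
}
```

With the assumed numbers, `layeredCollapse(10000, [20, 1])` shows the CDN layer forwarding 20 requests and the BFF layer forwarding 1, so the backend sees a 10,000-to-1 reduction overall.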
Track the age of pending requests carefully. If requests are pending for seconds, something is wrong—either the backend is unhealthy or you have a configuration issue. Alert on p99 collapse duration exceeding your expected backend response time.
In multi-instance BFF deployments, per-instance coalescing has limits. If a load balancer distributes requests evenly across 10 instances, each instance makes its own backend call for the same data, so the backend still sees ten calls instead of one.
For maximum deduplication, instances must coordinate:
```typescript
// Distributed coalescing using Redis for coordination

class DistributedSingleflight {
  constructor(
    private redis: Redis,
    private pubsub: RedisPubSub,
    private instanceId: string
  ) {}

  async do<T>(key: string, fetcher: () => Promise<T>, ttlMs: number = 5000): Promise<T> {
    const lockKey = `singleflight:lock:${key}`;
    const resultKey = `singleflight:result:${key}`;
    const channel = `singleflight:channel:${key}`;

    // Try to acquire leader lock
    const acquired = await this.redis.set(lockKey, this.instanceId, 'NX', 'PX', ttlMs);

    if (acquired) {
      // We are the leader - execute the fetch
      try {
        const result = await fetcher();

        // Store result for other instances
        await this.redis.setex(resultKey, Math.ceil(ttlMs / 1000), JSON.stringify(result));

        // Notify waiting instances
        await this.pubsub.publish(channel, JSON.stringify({
          status: 'success',
          instanceId: this.instanceId,
        }));

        return result;
      } catch (error) {
        // Notify failure
        await this.pubsub.publish(channel, JSON.stringify({
          status: 'error',
          error: (error as Error).message,
          instanceId: this.instanceId,
        }));
        throw error;
      } finally {
        // Release lock
        await this.redis.del(lockKey);
      }
    }

    // We are a follower - wait for leader's result
    return new Promise((resolve, reject) => {
      const timeout = setTimeout(() => {
        cleanup();
        // Timeout - try fetching ourselves
        fetcher().then(resolve).catch(reject);
      }, ttlMs);

      const cleanup = () => {
        clearTimeout(timeout);
        this.pubsub.unsubscribe(channel);
      };

      this.pubsub.subscribe(channel, async (message) => {
        const notification = JSON.parse(message);
        cleanup();

        if (notification.status === 'success') {
          // Fetch result from Redis
          const resultJson = await this.redis.get(resultKey);
          resolve(JSON.parse(resultJson!));
        } else {
          reject(new Error(`Leader fetch failed: ${notification.error}`));
        }
      });

      // Also check if result already exists (in case we missed the pub)
      this.redis.get(resultKey).then(existing => {
        if (existing) {
          cleanup();
          resolve(JSON.parse(existing));
        }
      });
    });
  }
}
```

While powerful, distributed coalescing adds complexity:
| Aspect | Benefit | Cost |
|---|---|---|
| Deduplication | Near-perfect across all instances | Requires external coordination system |
| Latency | Reduced backend load | +1-5ms for coordination overhead |
| Reliability | Better protected backends | New failure mode (coordinator failure) |
| Complexity | — | Significantly more complex implementation |
| Consistency | All instances see same result | Single point of truth can be bottleneck |
Start with per-instance coalescing (simple, no external dependencies). Add distributed coalescing only if you observe excessive duplication AND your backends are struggling. The coordination overhead may not be worth it if your backends can handle the load.
Coalescing necessarily means multiple requests receive the same response. This has consistency implications that must be understood and managed.
When requests are coalesced over a time window, all requests receive data as of the first request's fetch time. This creates a consistency window:
```
Time:  0ms     10ms     20ms     30ms     40ms     50ms  (data changes in backend)
        ↓                 ↓                 ↓
      Req A             Req B             Req C
        ←───────────── coalesced ─────────────→
            all three get data from time 0ms
```
Requests B and C receive data that may be stale by up to their delay from the original request.
```typescript
// Consistency-aware coalescing

interface ConsistencyOptions {
  maxStalenessMs?: number; // Reject coalesced results older than this
}

interface CoalescedResult<T> {
  data: T;
  consistency: {
    dataAsOf: number;     // When the data was actually fetched
    requestTime: number;  // When this specific request arrived
    staleBy: number;      // Milliseconds of potential staleness
    coalesced: boolean;   // Whether this request joined an in-flight fetch
  };
}

class ConsistentCoalescer {
  private singleflight = new Singleflight();
  private originTimes = new Map<string, number>();

  async doWithConsistency<T>(
    key: string,
    fetcher: () => Promise<T>,
    options: ConsistencyOptions = {}
  ): Promise<CoalescedResult<T>> {
    // If waiters already exist, this request will be coalesced
    const coalesced = this.singleflight.getWaiterCount(key) > 0;
    const coalescedResult = await this.singleflight.do(key, async () => {
      this.originTimes.set(key, Date.now());
      return fetcher();
    });
    const originTime = this.originTimes.get(key) ?? Date.now();

    return {
      data: coalescedResult,
      consistency: {
        dataAsOf: originTime,
        requestTime: Date.now(),
        staleBy: Date.now() - originTime,
        coalesced,
      },
    };
  }
}

// Client can use consistency metadata
async function handleProductRequest(productId: string) {
  const result = await coalescer.doWithConsistency(
    `product:${productId}`,
    () => productService.get(productId)
  );

  // Include data age in response headers
  return {
    data: result.data,
    headers: {
      'X-Data-Age-Ms': result.consistency.staleBy.toString(),
      'X-Data-As-Of': new Date(result.consistency.dataAsOf).toISOString(),
    },
  };
}

// Opt-out for consistency-critical requests
function shouldBypassCoalescing(request: Request): boolean {
  // User explicitly requests fresh data
  if (request.headers.get('Cache-Control') === 'no-cache') {
    return true;
  }

  // Request immediately follows a write
  if (request.headers.get('X-After-Write') === 'true') {
    return true;
  }

  // Specific endpoints that need consistency
  const consistentPaths = ['/checkout', '/payment', '/withdrawal'];
  if (consistentPaths.some(p => request.path.startsWith(p))) {
    return true;
  }

  return false;
}
```

The most noticeable consistency issue is read-after-write: a user updates their profile, immediately views it, and sees stale data from a coalesced request. Prevent this by including a last-write token in write responses that bypasses coalescing when presented in subsequent reads.
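That handshake can be sketched as follows (the token format and the bypass rule are illustrative assumptions): writes return a token encoding the commit time, and a read presenting a token newer than the coalesced data's fetch time skips the coalescer:

```typescript
// Sketch: last-write tokens that let read-after-write bypass coalescing.
// A write response carries a token encoding when the write committed.
function makeWriteToken(resource: string, committedAt: number): string {
  return `${resource}:${committedAt}`;
}

// On a read, bypass the shared/coalesced result if the caller proves they
// performed a write after that result was fetched.
function mustBypassCoalescing(
  writeToken: string | null,
  resource: string,
  coalescedDataAsOf: number
): boolean {
  if (!writeToken) return false;
  const [tokenResource, committedAtStr] = writeToken.split(':');
  if (tokenResource !== resource) return false; // token is for another resource
  return Number(committedAtStr) > coalescedDataAsOf;
}
```

A write response would carry the token (for example in a header), and the client echoes it on the next read; only reads that actually follow a newer write pay the cost of a fresh backend call.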
Request coalescing is a powerful technique for improving system efficiency and protecting backend services. When implemented correctly, it can reduce backend load by orders of magnitude during traffic spikes without significantly impacting user experience.
What's Next:
With coalescing patterns mastered, the final page explores BFF Trade-offs—the architectural complexities, operational challenges, and organizational considerations that determine whether the BFF pattern is the right choice for your system.
You now understand request coalescing patterns in depth. You can implement singleflight deduplication, time-window batching, cache-miss protection, and distributed coordination—giving your BFFs the ability to protect backend services from traffic spikes while maintaining acceptable consistency guarantees.