Imagine your mobile app's home screen goes viral. Suddenly, 10,000 users open the app simultaneously. Without request coalescing, your BFF receives 10,000 identical requests for trending content, which it dutifully forwards as 10,000 identical calls to the Trending Service. The service collapses under load, taking your entire platform with it.
Now imagine an alternative: those 10,000 requests arrive at the BFF within a 100ms window. The BFF recognizes they're all asking for the same data, makes a single call to the Trending Service, and fans out the response to all 10,000 waiting requests. The Trending Service barely notices the load spike.
This is request coalescing—the art of combining multiple in-flight requests for the same data into a single backend call. It's a critical technique for protecting backend services and improving overall system resilience.
By the end of this page, you will master request coalescing techniques including in-flight request deduplication, time-window batching, collapse-forwarding patterns, cache-miss coalescing, and the subtle trade-offs between latency, consistency, and system efficiency that these techniques introduce.
Request coalescing is the process of identifying multiple requests that would result in identical backend operations and executing the operation once, sharing the result among all requesters.
In any system with multiple clients requesting the same data, there is an opportunity window in which duplicate work can be eliminated.

Coalescing is effective when requests are read-only and idempotent, would produce identical results, and overlap in time often enough to share work.

Coalescing is not appropriate when operations have side effects (writes), when results differ per caller, or when every request must see the freshest possible data.
The simplest and most common form of coalescing is in-flight deduplication: when a request arrives for data that's already being fetched, attach the new request to the pending operation rather than starting a new one.
The singleflight pattern (popularized by Go's golang.org/x/sync/singleflight package) ensures that for any given key, only one operation executes at a time. Additional callers receive the result of the in-flight operation.
```typescript
// Singleflight implementation in TypeScript

// Minimal metrics interface so the example is self-contained; swap in your
// real metrics client.
interface MetricsClient {
  increment(name: string, tags?: Record<string, string>): void;
  histogram(name: string, value: number, tags?: Record<string, string>): void;
}

const noopMetrics: MetricsClient = { increment() {}, histogram() {} };

type PendingCall<T> = {
  promise: Promise<T>;
  callerCount: number;
  startTime: number;
};

class Singleflight {
  private pending = new Map<string, PendingCall<any>>();

  constructor(private metrics: MetricsClient = noopMetrics) {}

  /**
   * Execute a function, deduplicating concurrent calls for the same key.
   * All concurrent callers receive the same result.
   */
  async do<T>(key: string, fn: () => Promise<T>): Promise<T> {
    // Check if there's already an in-flight request for this key
    const existing = this.pending.get(key);
    if (existing) {
      existing.callerCount++;
      this.metrics.increment('singleflight.deduped', { key });
      return existing.promise;
    }

    // Create new pending request
    const pendingCall: PendingCall<T> = {
      promise: fn(),
      callerCount: 1,
      startTime: Date.now(),
    };
    this.pending.set(key, pendingCall);
    this.metrics.increment('singleflight.new', { key });

    try {
      const result = await pendingCall.promise;

      // Record metrics
      this.metrics.histogram(
        'singleflight.duration_ms',
        Date.now() - pendingCall.startTime,
        { key }
      );
      this.metrics.histogram('singleflight.callers', pendingCall.callerCount, { key });

      return result;
    } finally {
      // Always clean up, even on failure
      this.pending.delete(key);
    }
  }

  /**
   * Get the number of callers waiting for a specific key
   */
  getWaiterCount(key: string): number {
    return this.pending.get(key)?.callerCount ?? 0;
  }
}

// Usage
const singleflight = new Singleflight();

async function getTrendingContent(): Promise<Content[]> {
  // All concurrent calls will be coalesced into one backend request
  return singleflight.do('trending', async () => {
    const response = await contentService.getTrending();
    return response.items;
  });
}

// Example: 100 concurrent requests
const results = await Promise.all(
  Array(100).fill(null).map(() => getTrendingContent())
);
// All 100 receive the same data; only 1 backend call was made
```

The effectiveness of singleflight depends on correctly identifying equivalent requests.
The key must capture all semantically significant request parameters:
```typescript
// Key generation strategies for different scenarios

// Simple key for public data
function publicDataKey(resource: string): string {
  return resource; // e.g., 'trending', 'categories'
}

// Key with pagination
function paginatedKey(resource: string, page: number, limit: number): string {
  return `${resource}:page=${page}:limit=${limit}`;
}

// Key with user context (but shared within user)
function userScopedKey(userId: string, resource: string): string {
  return `user:${userId}:${resource}`;
}

// Key with complex query parameters
function queryKey(path: string, params: Record<string, string | number>): string {
  const sortedParams = Object.entries(params)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${k}=${v}`)
    .join('&');
  return `${path}?${sortedParams}`;
}

// Key with request body (for POST endpoints that behave like GET)
function bodyKey(path: string, body: object): string {
  // Use stable JSON serialization for consistent keys
  return `${path}:${stableStringify(body)}`;
}

// Minimal stable serializer: sorts object keys (recursively) so that
// semantically equal bodies always produce identical key strings
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  return `{${Object.keys(value).sort()
    .map(k => `${JSON.stringify(k)}:${stableStringify((value as any)[k])}`)
    .join(',')}}`;
}

// WRONG: Including non-semantic fields
interface IncomingRequest { path: string; timestamp: number; id: string }

function badKey(request: IncomingRequest): string {
  // ❌ Don't include timestamps, request IDs, or other unique fields
  return `${request.path}:${request.timestamp}:${request.id}`;
  // This key will never match another request!
}
```

When one request fails in a singleflight group, ALL waiters receive the error. This means one unlucky timeout can fail 100 requests. Consider whether this amplification is acceptable, or if you need retry-per-caller semantics.
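If that amplification is unacceptable, a retry-per-caller wrapper is one mitigation. The sketch below uses a minimal in-flight map standing in for the fuller `Singleflight` class; `shared` and `sharedWithRetry` are illustrative names. Each waiter gets one independent retry after a shared failure, and the retries themselves coalesce:

```typescript
// Sketch: retry-per-caller on top of in-flight deduplication.
const inflight = new Map<string, Promise<unknown>>();

function shared<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>;
  // Remove the entry as soon as the call settles, success or failure
  const call = fn().finally(() => inflight.delete(key));
  inflight.set(key, call);
  return call;
}

async function sharedWithRetry<T>(key: string, fn: () => Promise<T>): Promise<T> {
  try {
    return await shared(key, fn);
  } catch {
    // The shared call failed for every waiter; retry once on this caller's
    // behalf. Concurrent retries coalesce again, so the backend sees at most
    // one retry wave instead of one retry per waiter.
    return shared(key, fn);
  }
}
```

With five concurrent callers and a fetcher that fails once, the backend sees two calls in total: the failed shared call and one shared retry.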
While singleflight coalesces concurrent requests for the same key, time-window batching coalesces requests for different keys that arrive within a short window, making a single batch request.
Batching introduces intentional latency: you wait for a window to accumulate requests before executing, trading a small per-request delay for fewer, larger backend calls.
This is worthwhile when the batch efficiency gains exceed the window latency cost.
```typescript
// Time-window batching implementation

interface BatchConfig {
  maxBatchSize: number;   // Maximum items per batch
  maxWaitMs: number;      // Maximum time to wait for batch
  minBatchSize?: number;  // Minimum items before early flush
}

class BatchingQueue<TKey, TResult> {
  private pending: Map<TKey, {
    resolve: (result: TResult) => void;
    reject: (error: Error) => void;
  }[]> = new Map();

  private timer: NodeJS.Timeout | null = null;
  private batchStartTime: number | null = null;

  constructor(
    private batchFetcher: (keys: TKey[]) => Promise<Map<TKey, TResult>>,
    private config: BatchConfig
  ) {}

  async get(key: TKey): Promise<TResult> {
    return new Promise((resolve, reject) => {
      // Add to pending batch
      if (!this.pending.has(key)) {
        this.pending.set(key, []);
      }
      this.pending.get(key)!.push({ resolve, reject });

      // Start timer on first request
      if (this.timer === null) {
        this.batchStartTime = Date.now();
        this.timer = setTimeout(() => this.flush(), this.config.maxWaitMs);
      }

      // Check if we should flush early
      if (this.pending.size >= this.config.maxBatchSize) {
        this.flush();
      }
    });
  }

  private async flush(): Promise<void> {
    // Clear timer
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }

    // Capture current batch
    const batch = new Map(this.pending);
    this.pending.clear();

    if (batch.size === 0) return;

    // Log batch metrics
    const waitTime = this.batchStartTime ? Date.now() - this.batchStartTime : 0;
    console.log(`Flushing batch: ${batch.size} keys after ${waitTime}ms`);
    this.batchStartTime = null;

    try {
      // Execute batch fetch
      const results = await this.batchFetcher([...batch.keys()]);

      // Resolve all waiters
      for (const [key, waiters] of batch) {
        const result = results.get(key);
        if (result !== undefined) {
          waiters.forEach(w => w.resolve(result));
        } else {
          waiters.forEach(w => w.reject(new Error(`Key not found: ${key}`)));
        }
      }
    } catch (error) {
      // Reject all waiters on batch failure
      for (const waiters of batch.values()) {
        waiters.forEach(w => w.reject(error as Error));
      }
    }
  }
}

// Usage
const productBatcher = new BatchingQueue<string, Product>(
  async (productIds) => {
    // Single batched call instead of N individual calls
    const products = await productService.batchGet(productIds);
    return new Map(products.map(p => [p.id, p]));
  },
  {
    maxBatchSize: 50,
    maxWaitMs: 10, // 10ms maximum wait
  }
);

// These calls arriving within 10ms are batched together
const [product1, product2, product3] = await Promise.all([
  productBatcher.get('prod-1'),
  productBatcher.get('prod-2'),
  productBatcher.get('prod-3'),
]);
// One backend call for all three products
```

The optimal batch window depends on several factors:
| Window Size | Batch Efficiency | Added Latency | Best For |
|---|---|---|---|
| 1-5ms | Low (small batches) | Minimal | Latency-critical paths |
| 10-20ms | Moderate | Acceptable | Most BFF use cases |
| 50-100ms | High | Noticeable | Background processing, analytics |
| 100ms+ | Very high | Significant | Only when latency is irrelevant |
Consider adaptive windows that shrink under low load (prioritize latency) and expand under high load (prioritize efficiency). Monitor batch sizes—consistently small batches suggest your window is too long; consistently hitting max size suggests backend can handle more load.
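One way to sketch that adaptation (the thresholds and the doubling/halving policy are illustrative assumptions, not tuned recommendations) is to derive the next window from the size of the last completed batch:

```typescript
// Sketch: adaptive batch window derived from observed batch sizes.
interface AdaptiveWindowConfig {
  minWaitMs: number;      // floor: protects latency under low load
  maxWaitMs: number;      // ceiling: caps added latency under high load
  targetBatchSize: number;
}

function nextWindowMs(
  currentWaitMs: number,
  lastBatchSize: number,
  cfg: AdaptiveWindowConfig
): number {
  // Small batches: the window is buying latency without efficiency. Shrink.
  if (lastBatchSize < cfg.targetBatchSize / 2) {
    return Math.max(cfg.minWaitMs, currentWaitMs / 2);
  }
  // Batches at or above target: there is demand. Grow, up to the ceiling.
  if (lastBatchSize >= cfg.targetBatchSize) {
    return Math.min(cfg.maxWaitMs, currentWaitMs * 2);
  }
  return currentWaitMs;
}
```

Feed `nextWindowMs` the size of each completed batch and use the result as the next `maxWaitMs`; the window then tracks load automatically.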
One of the most impactful applications of coalescing is preventing the "thundering herd" on cache expiration. When a popular cached item expires, thousands of requests may simultaneously miss the cache and attempt to regenerate it.
```typescript
// Cache with miss coalescing (anti-stampede)

interface CacheEntry<T> {
  value: T;
  expiresAt: number;
  originalTtl: number;
}

class CoalescingCache<T> {
  protected cache: Map<string, CacheEntry<T>> = new Map();
  protected singleflight = new Singleflight();

  constructor(
    protected fetcher: (key: string) => Promise<T>,
    protected defaultTtlMs: number
  ) {}

  async get(key: string): Promise<T> {
    // Check cache
    const cached = this.cache.get(key);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.value;
    }

    // Cache miss - use singleflight to coalesce concurrent fetches
    const value = await this.singleflight.do(key, async () => {
      // Double-check cache (another request might have populated it)
      const recheck = this.cache.get(key);
      if (recheck && recheck.expiresAt > Date.now()) {
        return recheck.value;
      }

      // Fetch from source
      const fresh = await this.fetcher(key);

      // Populate cache
      this.cache.set(key, {
        value: fresh,
        expiresAt: Date.now() + this.defaultTtlMs,
        originalTtl: this.defaultTtlMs,
      });

      return fresh;
    });

    return value;
  }
}

// Advanced: Probabilistic early expiration (prevents synchronized expiry)
class StaggeredCache<T> extends CoalescingCache<T> {
  async get(key: string): Promise<T> {
    const cached = this.cache.get(key);

    if (cached) {
      const timeToExpiry = cached.expiresAt - Date.now();
      const ttl = cached.originalTtl;

      // Probabilistic early refresh
      // As we approach expiry, increase probability of refresh
      // This spreads out refresh load instead of a spike at expiry
      const refreshProbability = this.calculateRefreshProbability(timeToExpiry, ttl);

      if (timeToExpiry > 0 && Math.random() > refreshProbability) {
        return cached.value;
      }

      // Either expired or randomly chosen for early refresh
      // Use background refresh if not expired yet
      if (timeToExpiry > 0) {
        this.refreshInBackground(key);
        return cached.value;
      }
    }

    // Expired or not cached - fetch synchronously with coalescing
    return super.get(key);
  }

  private refreshInBackground(key: string): void {
    // Coalesced like any other fetch; a failure only loses this refresh
    // attempt, and we keep serving the current value
    this.singleflight
      .do(key, async () => {
        const fresh = await this.fetcher(key);
        this.cache.set(key, {
          value: fresh,
          expiresAt: Date.now() + this.defaultTtlMs,
          originalTtl: this.defaultTtlMs,
        });
        return fresh;
      })
      .catch(() => { /* keep serving the stale value */ });
  }

  private calculateRefreshProbability(timeToExpiry: number, ttl: number): number {
    // Probability rises exponentially once less than 10% of the TTL remains,
    // reaching ~63% at the moment of expiry
    const remainingFraction = timeToExpiry / ttl;
    if (remainingFraction > 0.1) return 0; // No early refresh until 90% through TTL
    return 1 - Math.exp(-10 * (0.1 - remainingFraction));
  }
}
```

Two approaches exist for preventing stampedes:
Singleflight (Lockless) — All concurrent requests wait for the one in-flight fetch. Simple, but every waiter experiences the same latency.

Mutex Locking — One request acquires a lock and refreshes; others wait briefly or serve stale data. More complex, but gives a fairer latency distribution.
Singleflight is generally preferred in BFF contexts for its simplicity.
When using external caches (Redis, Memcached), coalescing must happen at the application level—the cache itself won't prevent stampedes. Consider using Redis-based distributed locks (SETNX with TTL) for cross-instance coordination of cache refreshes.
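That cross-instance lock can be sketched as follows. The `LockStore` interface is a stand-in for a Redis client: in real Redis, acquisition is `SET key value NX PX ttl`, and release should be a compare-and-delete Lua script so an instance never deletes a lock it no longer owns:

```typescript
// Sketch: cross-instance cache-refresh lock with SET NX PX semantics.
interface LockStore {
  // Set the key only if absent, with a TTL; true means the lock was acquired
  setNxPx(key: string, value: string, ttlMs: number): Promise<boolean>;
  // Delete the lock only if we still own it
  releaseIfOwner(key: string, value: string): Promise<void>;
}

async function refreshWithLock<T>(
  store: LockStore,
  key: string,
  instanceId: string,
  ttlMs: number,
  refresh: () => Promise<T>
): Promise<T | null> {
  const lockKey = `refresh-lock:${key}`;

  // NX: only one instance wins; PX: the TTL ensures a crashed holder
  // cannot block refreshes forever.
  const acquired = await store.setNxPx(lockKey, instanceId, ttlMs);
  if (!acquired) {
    return null; // another instance is refreshing; caller serves stale data
  }

  try {
    return await refresh();
  } finally {
    await store.releaseIfOwner(lockKey, instanceId);
  }
}
```

Callers that fail to acquire the lock serve the stale cached value, so exactly one instance per TTL window pays the refresh cost.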
Collapse-forwarding is a CDN/proxy technique where multiple requests for the same resource are collapsed into a single origin request. BFFs can implement similar functionality for backend service calls.
This differs from simple caching in that it works for uncacheable requests and cache misses—situations where pure caching doesn't help.
```typescript
// Collapse-forwarding with request queuing

interface PendingRequest<T> {
  inflightPromise: Promise<T>;
  originTime: number;
  queuedRequests: number;
}

interface CollapseStats {
  pendingKeys: number;
  totalQueued: number;
  oldestPending: number | null;
}

class CollapseForwarder {
  private pending = new Map<string, PendingRequest<any>>();

  // Configuration
  private maxCollapseTime = 5000; // Don't collapse over 5 seconds
  private maxQueueSize = 1000;    // Don't queue more than 1000 requests

  // Same MetricsClient interface as in the singleflight example
  constructor(private metrics: MetricsClient) {}

  async forward<T>(key: string, forwarder: () => Promise<T>): Promise<T> {
    const existing = this.pending.get(key);

    // Check if we can collapse onto existing request
    if (existing) {
      const age = Date.now() - existing.originTime;

      // Safety limits
      if (age > this.maxCollapseTime) {
        this.metrics.increment('collapse.rejected.timeout', { key });
        // Request too old, don't collapse (origin might be stuck)
        return forwarder();
      }

      if (existing.queuedRequests >= this.maxQueueSize) {
        this.metrics.increment('collapse.rejected.queue_full', { key });
        // Queue full, don't risk memory exhaustion
        return forwarder();
      }

      // Collapse onto existing request
      existing.queuedRequests++;
      this.metrics.increment('collapse.joined', { key });
      return existing.inflightPromise;
    }

    // Create new pending request
    const newPending: PendingRequest<T> = {
      inflightPromise: this.executeWithCleanup(key, forwarder),
      originTime: Date.now(),
      queuedRequests: 1,
    };
    this.pending.set(key, newPending);
    this.metrics.increment('collapse.originated', { key });

    return newPending.inflightPromise;
  }

  private async executeWithCleanup<T>(
    key: string,
    forwarder: () => Promise<T>
  ): Promise<T> {
    try {
      return await forwarder();
    } finally {
      const pending = this.pending.get(key);
      if (pending) {
        this.metrics.histogram('collapse.queue_size', pending.queuedRequests, { key });
        this.metrics.histogram('collapse.duration_ms', Date.now() - pending.originTime, { key });
      }
      this.pending.delete(key);
    }
  }

  // Observability
  getStats(): CollapseStats {
    const pendingList = [...this.pending.values()];
    return {
      pendingKeys: this.pending.size,
      totalQueued: pendingList.reduce((sum, p) => sum + p.queuedRequests, 0),
      oldestPending: pendingList.length > 0
        ? Math.min(...pendingList.map(p => p.originTime))
        : null,
    };
  }
}
```

When a BFF sits behind a CDN, both layers may implement collapse-forwarding. This provides defense in depth:
Each layer reduces load for the next, providing multiplicative protection.
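As a rough worked example (all traffic numbers assumed for illustration), 10,000 concurrent clients might collapse to one origin-bound request per CDN POP (say 20 POPs), and the BFF then collapses those 20 into a single backend call:

```typescript
// Illustrative arithmetic: layered collapsing multiplies reductions.
// For one hot key, each layer forwards at most one request per collapsing
// unit (one per CDN POP, then one per BFF fleet).
function layeredCollapse(clients: number, maxForwardPerLayer: number[]): number[] {
  const forwarded: number[] = [];
  let inFlight = clients;
  for (const cap of maxForwardPerLayer) {
    inFlight = Math.min(inFlight, cap);
    forwarded.push(inFlight);
  }
  return forwarded;
}
```

With the assumed numbers, `layeredCollapse(10000, [20, 1])` shows the CDN layer forwarding 20 requests and the BFF layer forwarding 1, so the backend sees a 10,000-to-1 reduction overall.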
Track the age of pending requests carefully. If requests are pending for seconds, something is wrong—either the backend is unhealthy or you have a configuration issue. Alert on p99 collapse duration exceeding your expected backend response time.
In multi-instance BFF deployments, per-instance coalescing has limits. If a load balancer distributes requests evenly across 10 instances, each instance makes its own backend call for the same data, so the backend still sees ten calls instead of one.
For maximum deduplication, instances must coordinate:
```typescript
// Distributed coalescing using Redis for coordination

class DistributedSingleflight {
  constructor(
    private redis: Redis,
    private pubsub: RedisPubSub,
    private instanceId: string
  ) {}

  async do<T>(key: string, fetcher: () => Promise<T>, ttlMs: number = 5000): Promise<T> {
    const lockKey = `singleflight:lock:${key}`;
    const resultKey = `singleflight:result:${key}`;
    const channel = `singleflight:channel:${key}`;

    // Try to acquire leader lock
    const acquired = await this.redis.set(lockKey, this.instanceId, 'NX', 'PX', ttlMs);

    if (acquired) {
      // We are the leader - execute the fetch
      try {
        const result = await fetcher();

        // Store result for other instances
        await this.redis.setex(resultKey, Math.ceil(ttlMs / 1000), JSON.stringify(result));

        // Notify waiting instances
        await this.pubsub.publish(channel, JSON.stringify({
          status: 'success',
          instanceId: this.instanceId,
        }));

        return result;
      } catch (error) {
        // Notify failure
        await this.pubsub.publish(channel, JSON.stringify({
          status: 'error',
          error: (error as Error).message,
          instanceId: this.instanceId,
        }));
        throw error;
      } finally {
        // Release lock
        await this.redis.del(lockKey);
      }
    }

    // We are a follower - wait for leader's result
    return new Promise((resolve, reject) => {
      const timeout = setTimeout(() => {
        cleanup();
        // Timeout - try fetching ourselves
        fetcher().then(resolve).catch(reject);
      }, ttlMs);

      const cleanup = () => {
        clearTimeout(timeout);
        this.pubsub.unsubscribe(channel);
      };

      this.pubsub.subscribe(channel, async (message) => {
        const notification = JSON.parse(message);
        cleanup();

        if (notification.status === 'success') {
          // Fetch result from Redis
          const resultJson = await this.redis.get(resultKey);
          resolve(JSON.parse(resultJson!));
        } else {
          reject(new Error(`Leader fetch failed: ${notification.error}`));
        }
      });

      // Also check if result already exists (in case we missed the pub)
      this.redis.get(resultKey).then(existing => {
        if (existing) {
          cleanup();
          resolve(JSON.parse(existing));
        }
      });
    });
  }
}
```

While powerful, distributed coalescing adds complexity:
| Aspect | Benefit | Cost |
|---|---|---|
| Deduplication | Near-perfect across all instances | Requires external coordination system |
| Latency | Reduced backend load | +1-5ms for coordination overhead |
| Reliability | Better protected backends | New failure mode (coordinator failure) |
| Complexity | — | Significantly more complex implementation |
| Consistency | All instances see same result | Single point of truth can be bottleneck |
Start with per-instance coalescing (simple, no external dependencies). Add distributed coalescing only if you observe excessive duplication AND your backends are struggling. The coordination overhead may not be worth it if your backends can handle the load.
Coalescing necessarily means multiple requests receive the same response. This has consistency implications that must be understood and managed.
When requests are coalesced over a time window, all requests receive data as of the first request's fetch time. This creates a consistency window:
```
Time:  0ms     10ms     20ms     30ms     40ms     50ms  (data changes in backend)
        ↓                 ↓                 ↓
      Req A             Req B             Req C
        ←───────────── coalesced ─────────────→
            all three get data from time 0ms
```
Requests B and C receive data that may be stale by up to their delay from the original request.
```typescript
// Consistency-aware coalescing

interface ConsistencyOptions {
  maxStalenessMs?: number; // Reject coalesced results older than this
}

interface CoalescedResult<T> {
  data: T;
  consistency: {
    dataAsOf: number;     // When the data was actually fetched
    requestTime: number;  // When this specific request arrived
    staleBy: number;      // Milliseconds of potential staleness
    coalesced: boolean;   // Whether this request joined an in-flight fetch
  };
}

class ConsistentCoalescer {
  private singleflight = new Singleflight();
  private originTimes = new Map<string, number>();

  async doWithConsistency<T>(
    key: string,
    fetcher: () => Promise<T>,
    options: ConsistencyOptions = {}
  ): Promise<CoalescedResult<T>> {
    // If waiters already exist, this request will be coalesced
    const coalesced = this.singleflight.getWaiterCount(key) > 0;
    const coalescedResult = await this.singleflight.do(key, async () => {
      this.originTimes.set(key, Date.now());
      return fetcher();
    });
    const originTime = this.originTimes.get(key) ?? Date.now();

    return {
      data: coalescedResult,
      consistency: {
        dataAsOf: originTime,
        requestTime: Date.now(),
        staleBy: Date.now() - originTime,
        coalesced,
      },
    };
  }
}

// Client can use consistency metadata
async function handleProductRequest(productId: string) {
  const result = await coalescer.doWithConsistency(
    `product:${productId}`,
    () => productService.get(productId)
  );

  // Include data age in response headers
  return {
    data: result.data,
    headers: {
      'X-Data-Age-Ms': result.consistency.staleBy.toString(),
      'X-Data-As-Of': new Date(result.consistency.dataAsOf).toISOString(),
    },
  };
}

// Opt-out for consistency-critical requests
function shouldBypassCoalescing(request: Request): boolean {
  // User explicitly requests fresh data
  if (request.headers.get('Cache-Control') === 'no-cache') {
    return true;
  }

  // Request immediately follows a write
  if (request.headers.get('X-After-Write') === 'true') {
    return true;
  }

  // Specific endpoints that need consistency
  const consistentPaths = ['/checkout', '/payment', '/withdrawal'];
  if (consistentPaths.some(p => request.path.startsWith(p))) {
    return true;
  }

  return false;
}
```

The most noticeable consistency issue is read-after-write: a user updates their profile, immediately views it, and sees stale data from a coalesced request. Prevent this by including a last-write token in write responses that bypasses coalescing when presented in subsequent reads.
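That handshake can be sketched as follows (the token format and the bypass rule are illustrative assumptions): writes return a token encoding the commit time, and a read presenting a token newer than the coalesced data's fetch time skips the coalescer:

```typescript
// Sketch: last-write tokens that let read-after-write bypass coalescing.
// A write response carries a token encoding when the write committed.
function makeWriteToken(resource: string, committedAt: number): string {
  return `${resource}:${committedAt}`;
}

// On a read, bypass the shared/coalesced result if the caller proves they
// performed a write after that result was fetched.
function mustBypassCoalescing(
  writeToken: string | null,
  resource: string,
  coalescedDataAsOf: number
): boolean {
  if (!writeToken) return false;
  const [tokenResource, committedAtStr] = writeToken.split(':');
  if (tokenResource !== resource) return false; // token is for another resource
  return Number(committedAtStr) > coalescedDataAsOf;
}
```

A write response would carry the token (for example in a header), and the client echoes it on the next read; only reads that actually follow a newer write pay the cost of a fresh backend call.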
Request coalescing is a powerful technique for improving system efficiency and protecting backend services. When implemented correctly, it can reduce backend load by orders of magnitude during traffic spikes without significantly impacting user experience.
What's Next:
With coalescing patterns mastered, the final page explores BFF Trade-offs—the architectural complexities, operational challenges, and organizational considerations that determine whether the BFF pattern is the right choice for your system.
You now understand request coalescing patterns in depth. You can implement singleflight deduplication, time-window batching, cache-miss protection, and distributed coordination—giving your BFFs the ability to protect backend services from traffic spikes while maintaining acceptable consistency guarantees.