Imagine you need to mail 100 letters. You could walk to the mailbox 100 times, once for each letter—or you could bundle them together and make a single trip. The first approach wastes enormous effort on the walk itself. The second approach amortizes the fixed cost of walking across all 100 letters.
This insight—that fixed per-operation costs can be reduced by grouping operations together—is the essence of batching. In distributed systems, batching is one of the most powerful throughput optimization techniques, often yielding 10-100x improvements with minimal code changes.
The fundamental economics:
$$ \text{Effective Cost per Item} = \frac{\text{Fixed Overhead} + (\text{Variable Cost} \times N)}{N} $$
As N (batch size) increases, fixed overhead per item approaches zero.
This page covers batching strategies across the full stack—from database queries to API calls to message queue production. You'll understand when batching helps, optimal batch sizes, latency trade-offs, and implementation patterns for different scenarios.
To understand why batching is so effective, we must examine the cost structure of common operations. Every remote operation—database query, API call, message send—involves multiple stages, each with fixed and variable components.
Anatomy of a Database Query:
Single INSERT:
┌─────────────────┬────────────────┬──────────────────────────┐
│ Client          │ Network        │ Database Server          │
├─────────────────┼────────────────┼──────────────────────────┤
│ Serialize query │→ TCP round trip│→ Parse SQL (0.2ms)       │
│ (0.1ms)         │  (0.5ms)       │→ Query planning (0.1ms)  │
│                 │                │→ Lock acquisition (0.1ms)│
│                 │                │→ Write row (0.05ms)      │
│                 │                │→ WAL sync (1ms)          │
│                 │                │→ Ack generation          │
│ Deserialize     │← TCP return    │                          │
│ (0.05ms)        │  (0.5ms)       │                          │
└─────────────────┴────────────────┴──────────────────────────┘
Total: ~2.5ms for ONE row
Fixed overhead: ~2.45ms | Variable per-row: ~0.05ms
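Plugging the single-row numbers above into the cost formula shows how fast the fixed overhead amortizes:

$$ N = 100: \frac{2.45 + (0.05 \times 100)}{100} \approx 0.075\text{ms per row} \qquad N = 1000: \frac{2.45 + (0.05 \times 1000)}{1000} \approx 0.052\text{ms per row} $$

Per-row cost drops from ~2.5ms to well under 0.1ms, a roughly 30-50x improvement, before any other optimization.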
The math becomes compelling:
| Operation | Fixed Overhead | Variable Cost/Item | Batch Sweet Spot |
|---|---|---|---|
| PostgreSQL INSERT | ~2-5ms (parse, plan, WAL) | ~0.05-0.1ms/row | 100-1000 rows |
| Redis SET | ~0.1-0.5ms (RTT) | ~0.01ms/key | 100-500 keys (MSET) |
| HTTP API call | ~10-50ms (TLS, headers) | ~0.1-1ms/item | 50-500 items |
| Kafka produce | ~1-5ms (leader ack) | ~0.01ms/message | 100-10000 messages |
| S3 PUT | ~50-200ms (connection) | ~0.01ms/KB | 1-100 MB per object |
| Elasticsearch index | ~5-20ms (refresh) | ~0.1ms/document | 500-5000 docs |
Batching provides the greatest benefit when fixed overhead dominates variable cost—i.e., when the ratio (Fixed / Variable) is high. Network round-trips, connection setup, and transactional overhead are prime targets. Batching CPU-bound in-memory operations provides minimal benefit.
Batching manifests in different architectural patterns depending on where and how operations are grouped.
Request Aggregation with DataLoader Pattern:
The DataLoader pattern (popularized by Facebook/Meta for GraphQL) is elegant: within a single tick of the event loop, all data requests are collected, then executed as a single batch query.
Without DataLoader: With DataLoader:
Resolver 1 → fetch(id=1) → DB Resolver 1 → loader.load(1) ─┐
Resolver 2 → fetch(id=2) → DB Resolver 2 → loader.load(2) ─┼─→ fetch([1,2,3]) → DB
Resolver 3 → fetch(id=3) → DB Resolver 3 → loader.load(3) ─┘
3 queries: 3×2ms = 6ms 1 query: 2.5ms
import DataLoader from 'dataloader';

// DataLoader batches all load() calls within a single event loop tick
const userLoader = new DataLoader<string, User>(async (userIds) => {
  // This function receives ALL requested IDs in a single call
  console.log(`Batched fetch for ${userIds.length} users`);

  // Single database query for all users
  const users = await db.query(
    'SELECT * FROM users WHERE id = ANY($1)',
    [userIds]
  );

  // Return in same order as requested IDs
  const userMap = new Map(users.map(u => [u.id, u]));
  return userIds.map(id => userMap.get(id) || null);
});

// GraphQL resolver - each call seems independent
async function resolveUser(parent) {
  // These are automatically batched!
  return userLoader.load(parent.userId);
}

// Usage in GraphQL query:
// query {
//   posts {            # Returns 50 posts
//     author { name }  # Would be 50 DB queries... becomes 1!
//   }
// }

Databases offer multiple batching mechanisms, each with specific use cases and trade-offs.
Multi-Row INSERT:
Instead of executing N separate INSERT statements, use a single INSERT with multiple value tuples:
-- Slow: N separate INSERT statements
INSERT INTO events (user_id, type, data) VALUES (1, 'click', '{...}');
INSERT INTO events (user_id, type, data) VALUES (2, 'view', '{...}');
-- ... 998 more ...
-- Fast: Single INSERT with N rows
INSERT INTO events (user_id, type, data) VALUES
(1, 'click', '{...}'),
(2, 'view', '{...}'),
-- ... 998 more ...
(1000, 'click', '{...}');
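As a sketch of how application code can assemble that multi-row statement, here is one way to do it with the node-postgres (pg) client; the events table and the shape of each event row are carried over from the example above:

import { Pool } from 'pg';

const pool = new Pool();

// Builds one parameterized INSERT with a (user_id, type, data) tuple per event
async function insertEventsBatch(
  events: { userId: number; type: string; data: object }[]
): Promise<void> {
  if (events.length === 0) return;

  const values: unknown[] = [];
  const placeholders = events.map((event, i) => {
    values.push(event.userId, event.type, JSON.stringify(event.data));
    const base = i * 3;
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });

  // One round trip, one parse/plan, one WAL sync for the whole batch.
  // PostgreSQL allows at most 65535 bind parameters, so with 3 columns
  // keep batches comfortably below ~20000 rows.
  await pool.query(
    `INSERT INTO events (user_id, type, data) VALUES ${placeholders.join(', ')}`,
    values
  );
}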
Performance characteristics: the multi-row form pays the parse, plan, and WAL-sync overhead once for the whole statement, so per-row cost collapses toward the ~0.05-0.1ms variable cost listed in the table above, and the 1000-row example typically runs one to two orders of magnitude faster than 1000 separate INSERTs.
The COPY command (PostgreSQL):
For very large loads, COPY is faster than INSERT:
COPY events (user_id, type, data) FROM STDIN WITH (FORMAT csv);
1,click,"{...}"
2,view,"{...}"
\.
COPY avoids per-statement SQL parsing and planning and streams rows directly over the wire (in text, CSV, or binary format), typically 10-100x faster than individual INSERTs when loading millions of rows.
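For reference, a minimal sketch of streaming rows through COPY from Node.js, assuming the pg and pg-copy-streams packages; the table and CSV lines mirror the example above:

import { Pool } from 'pg';
import { from as copyFrom } from 'pg-copy-streams';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

const pool = new Pool();

async function copyEvents(csvLines: string[]): Promise<void> {
  const client = await pool.connect();
  try {
    // client.query() with a COPY ... FROM STDIN statement returns a writable stream
    const ingest = client.query(
      copyFrom('COPY events (user_id, type, data) FROM STDIN WITH (FORMAT csv)')
    );
    await pipeline(Readable.from(csvLines.map(line => line + '\n')), ingest);
  } finally {
    client.release();
  }
}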
Network round-trips are often the dominant cost in distributed systems. Batching API calls dramatically reduces this overhead.
HTTP API Design for Batching:
Option 1: Bulk Endpoints
Design APIs that accept arrays:
// Instead of:
POST /api/users/1/notifications {"message": "Hello"}
POST /api/users/2/notifications {"message": "Hi"}
POST /api/users/3/notifications {"message": "Hey"}
// Provide:
POST /api/notifications/batch
{
"notifications": [
{"userId": 1, "message": "Hello"},
{"userId": 2, "message": "Hi"},
{"userId": 3, "message": "Hey"}
]
}
// Response:
{
"results": [
{"userId": 1, "success": true, "id": "notif-123"},
{"userId": 2, "success": true, "id": "notif-124"},
{"userId": 3, "success": false, "error": "User not found"}
]
}
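On the server side, a bulk endpoint of this shape can process items independently and report per-item results. The sketch below assumes Express and a hypothetical createNotification helper; in practice the per-item work could itself be collapsed into one multi-row INSERT:

import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical single-item operation; assumed to throw on failure
declare function createNotification(userId: number, message: string): Promise<{ id: string }>;

app.post('/api/notifications/batch', async (req, res) => {
  const { notifications } = req.body as {
    notifications: { userId: number; message: string }[];
  };

  // Process items independently so one bad item doesn't fail the whole batch
  const results = await Promise.all(
    notifications.map(async ({ userId, message }) => {
      try {
        const { id } = await createNotification(userId, message);
        return { userId, success: true, id };
      } catch (err) {
        return { userId, success: false, error: (err as Error).message };
      }
    })
  );

  res.json({ results });
});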
Option 2: GraphQL (Inherent Batching)
GraphQL naturally batches by combining multiple queries in one request:
query {
user1: user(id: 1) { name email }
user2: user(id: 2) { name email }
user3: user(id: 3) { name email }
recentPosts(limit: 10) { title }
notifications { message read }
}
# Single HTTP request, multiple data fetches
/**
 * Automatic request batching for API calls
 * Collects requests over a time window, sends as single batch
 */
class BatchedApiClient {
  private pendingRequests: Map<string, {
    resolve: (value: any) => void;
    reject: (error: Error) => void;
    request: any;
  }[]> = new Map();

  private batchTimeoutMs: number;
  private maxBatchSize: number;
  private scheduledFlush: NodeJS.Timeout | null = null;

  constructor(options: { batchTimeoutMs?: number; maxBatchSize?: number } = {}) {
    this.batchTimeoutMs = options.batchTimeoutMs ?? 10; // 10ms window
    this.maxBatchSize = options.maxBatchSize ?? 100;
  }

  async request<T>(endpoint: string, body: any): Promise<T> {
    return new Promise((resolve, reject) => {
      // Add to pending batch for this endpoint
      if (!this.pendingRequests.has(endpoint)) {
        this.pendingRequests.set(endpoint, []);
      }
      const batch = this.pendingRequests.get(endpoint)!;
      batch.push({ resolve, reject, request: body });

      // Flush if batch is full
      if (batch.length >= this.maxBatchSize) {
        this.flushEndpoint(endpoint);
        return;
      }

      // Schedule flush after timeout
      if (!this.scheduledFlush) {
        this.scheduledFlush = setTimeout(() => {
          this.flushAll();
          this.scheduledFlush = null;
        }, this.batchTimeoutMs);
      }
    });
  }

  private async flushEndpoint(endpoint: string) {
    const batch = this.pendingRequests.get(endpoint);
    if (!batch || batch.length === 0) return;
    this.pendingRequests.delete(endpoint);

    try {
      // Send batch request
      const response = await fetch(`${endpoint}/batch`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ requests: batch.map(b => b.request) }),
      });
      const results = await response.json();

      // Resolve individual promises
      batch.forEach((item, index) => {
        const result = results.responses[index];
        if (result.error) {
          item.reject(new Error(result.error));
        } else {
          item.resolve(result.data);
        }
      });
    } catch (error) {
      // Reject all on network error
      batch.forEach(item => item.reject(error as Error));
    }
  }

  private flushAll() {
    for (const endpoint of this.pendingRequests.keys()) {
      this.flushEndpoint(endpoint);
    }
  }
}

// Usage: Individual calls are automatically batched
const client = new BatchedApiClient({ batchTimeoutMs: 5 });

// These 3 calls within 5ms become 1 HTTP request:
const [user1, user2, user3] = await Promise.all([
  client.request('/api/users', { id: 1 }),
  client.request('/api/users', { id: 2 }),
  client.request('/api/users', { id: 3 }),
]);

Message queues—Kafka, SQS, RabbitMQ—benefit enormously from batching due to network round-trip and acknowledgment overhead.
Kafka Producer Batching:
Kafka's producer API batches messages automatically based on two thresholds: batch.size (flush once a partition's batch reaches this many bytes) and linger.ms (flush after waiting this long for more messages), whichever is reached first:
Without batching (linger.ms=0): With batching (linger.ms=5):
Message 1 → send → wait ack → done Message 1 ──┐
Message 2 → send → wait ack → done Message 2 ──┼── batch ──→ send ─→ ack
Message 3 → send → wait ack → done Message 3 ──┘
Latency: low per-message Latency: +5ms per batch
Throughput: ~1000 msg/sec Throughput: ~100,000+ msg/sec
Configuration trade-offs:
| Setting | Low Value | High Value |
|---|---|---|
| batch.size | Smaller batches, more network calls | Larger batches, higher memory |
| linger.ms | Lower latency | Higher throughput |
| compression.type | Less CPU, more network | More CPU, less network |
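For the Java producer (and librdkafka-based clients), these knobs are plain configuration; the values below are illustrative throughput-oriented settings, not tuned recommendations:

# Flush a partition's batch once it reaches 64 KB...
batch.size=65536
# ...or after waiting up to 5 ms for more messages, whichever comes first
linger.ms=5
# Compress whole batches: more CPU, less network
compression.type=lz4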
import { Kafka, CompressionTypes } from 'kafkajs';

const kafka = new Kafka({ brokers: ['localhost:9092'] });

// Producer with batching optimizations.
// Note: unlike the Java client, kafkajs does not expose batch.size/linger.ms;
// batching happens when you hand it many messages in one send()/sendBatch() call.
const producer = kafka.producer({
  allowAutoTopicCreation: false,
  // Transaction timeout
  transactionTimeout: 30000,
});

await producer.connect();

// Sending with batching - all these go out together per partition leader
const messages = Array.from({ length: 1000 }, (_, i) => ({
  key: `user-${i % 100}`,
  value: JSON.stringify({ event: 'click', userId: i, timestamp: Date.now() }),
}));

// sendBatch is explicit batching; compressing the batch trades CPU for network
await producer.sendBatch({
  compression: CompressionTypes.GZIP,
  topicMessages: [{
    topic: 'user-events',
    messages,
  }],
});

// Anti-pattern in kafkajs: one message per send() call means one request each.
// The Java client/librdkafka would buffer these and flush when batch.size is
// reached or linger.ms elapses; kafkajs does not, so prefer the call above.
for (const msg of messages) {
  producer.send({
    topic: 'user-events',
    messages: [msg],
  }).catch(console.error);
}

// Consumer-side batching with Kafka
const consumer = kafka.consumer({ groupId: 'analytics-group' });
await consumer.connect();
await consumer.subscribe({ topics: ['user-events'] });

await consumer.run({
  // Process in batches for efficiency
  eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
    console.log(`Processing batch of ${batch.messages.length} messages`);

    // Batch insert to database
    const events = batch.messages.map(m => JSON.parse(m.value!.toString()));
    await db.events.insertMany(events);

    // Commit offset after batch processed
    resolveOffset(batch.messages[batch.messages.length - 1].offset);
    await heartbeat();
  },
});

Batching complicates failure handling. If a batch of 100 messages partially fails, should you retry all 100? Only the failed ones? Idempotency keys, transactional outbox patterns, and exactly-once semantics become critical when batching writes.
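When a batch partially fails against a bulk HTTP endpoint shaped like the /api/notifications/batch example earlier, one common approach is to retry only the failed items. The sketch below is hypothetical and assumes each item carries an idempotency key so an accidental re-send is harmless:

interface BatchItem {
  idempotencyKey: string;
  userId: number;
  message: string;
}

interface BatchItemResult {
  success: boolean;
  error?: string;
}

async function sendWithPartialRetry(items: BatchItem[], maxAttempts = 3): Promise<BatchItem[]> {
  let remaining = items;
  for (let attempt = 1; attempt <= maxAttempts && remaining.length > 0; attempt++) {
    const response = await fetch('/api/notifications/batch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ notifications: remaining }),
    });
    const { results } = (await response.json()) as { results: BatchItemResult[] };

    // Results are positional: keep only the items whose result reported a failure.
    // Idempotency keys make a re-send of an already-succeeded item harmless.
    remaining = remaining.filter((_, i) => !results[i].success);
  }
  return remaining; // items still failing after maxAttempts
}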
Batch size is not "bigger is better." There's an optimal range where throughput is maximized without excessive latency or resource consumption.
The Batching Curve:
Throughput
  ▲
  │              ┌──────────────┐   ← Optimal range
  │            ╱─┘              └─╲
  │          ╱                     ╲
  │        ╱                        ╲   Memory limits,
  │      ╱                           ╲  timeout issues
  │    ╱
  │  ╱
  │╱
  └───────────────────────────────────────────▶ Batch Size
    1      10      100     1000    10000   100000
Phase 1 (1-100): Rapid throughput increase as overhead amortized
Phase 2 (100-1000): Diminishing returns, still improving
Phase 3 (1000+): Flat or declining - memory pressure, timeouts
Factors Limiting Batch Size: memory needed to buffer the whole batch, request and transaction timeouts, lock hold time on the receiving system, hard API limits (e.g., SQS caps SendMessageBatch at 10 messages), and the blast radius of a partial failure or retry.
Finding Your Optimal Batch Size:
The optimal batch size is workload-specific. A practical process: start small, increase the batch size geometrically (1, 2, 4, ... or 10, 100, 1000) under realistic load, measure throughput and tail latency at each point, and stop at the knee of the curve where throughput stops improving but latency and memory keep growing. A minimal sweep sketch follows.
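The sketch below assumes a hypothetical insertBatch(rows) function that writes one batch to your target system; it doubles the batch size, measures throughput and worst-case per-batch latency, and prints one data point per size so the knee of the curve is visible:

import { performance } from 'node:perf_hooks';

type Row = Record<string, unknown>;

async function sweepBatchSizes(
  rows: Row[],
  insertBatch: (batch: Row[]) => Promise<void>
): Promise<void> {
  for (let size = 1; size <= 10_000; size *= 2) {
    const started = performance.now();
    let worstBatchMs = 0;

    for (let i = 0; i < rows.length; i += size) {
      const batchStart = performance.now();
      await insertBatch(rows.slice(i, i + size));
      worstBatchMs = Math.max(worstBatchMs, performance.now() - batchStart);
    }

    const seconds = (performance.now() - started) / 1000;
    console.log(
      `batchSize=${size} throughput=${Math.round(rows.length / seconds)} rows/s ` +
      `worstBatchLatency=${worstBatchMs.toFixed(1)}ms`
    );
  }
}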
Typical optimal ranges by system:
| System | Optimal Batch Size | Reasoning |
|---|---|---|
| PostgreSQL INSERT | 100-1000 rows | Beyond ~5000, transaction overhead grows |
| Redis MSET | 100-500 keys | Memory efficiency, pipeline limits |
| Elasticsearch bulk | 500-5000 docs | Refresh interval, memory buffers |
| Kafka producer | 16KB-1MB | Network efficiency vs latency |
| AWS SQS | 10 messages | API limit (SendMessageBatch max 10) |
| HTTP API batch | 50-500 items | Request body size, timeout constraints |
Batching inherently trades latency for throughput. Understanding and managing this trade-off is crucial for system design.
Hybrid Strategies:
Smart systems adapt batching dynamically:
1. Adaptive Batching:
Load-based batch sizing:
if (queueDepth > 1000) {
batchSize = 500; // Under load: prioritize throughput
lingerMs = 50;
} else if (queueDepth > 100) {
batchSize = 100; // Medium load: balance
lingerMs = 10;
} else {
batchSize = 10; // Low load: prioritize latency
lingerMs = 1;
}
2. Dual-Path Architecture:
┌──────────────────┐
│ Incoming │
│ Requests │
└────────┬─────────┘
│
┌──────────────┴──────────────┐
│ Router │
│ (priority classification) │
└──────┬───────────┬──────────┘
│ │
┌───────────▼───┐ ┌───▼───────────┐
│ Fast Path │ │ Batch Path │
│ (immediate) │ │ (buffered) │
│ Premium SLA │ │ Best-effort │
└───────────────┘ └───────────────┘
3. Fire-and-Acknowledge Pattern:
Acknowledge immediately but process in batches:
Client sends request
│
▼
Server writes to durable queue → Immediate ACK to client
│ ("Request accepted")
│
▼ (async)
Background worker batches from queue → Processes batch
│
▼
Webhook/notification on completion
This gives clients low latency (fast ack) while backend achieves high throughput (batched processing).
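A minimal sketch of the pattern, assuming an Express-style app and hypothetical durableQueue, db, and notifyCompletion helpers:

import express from 'express';

interface DurableQueue<T> {
  enqueue(item: T): Promise<void>;
  dequeueUpTo(max: number): Promise<T[]>;
}

declare const durableQueue: DurableQueue<unknown>;
declare const db: { events: { insertMany(rows: unknown[]): Promise<void> } };
declare function notifyCompletion(rows: unknown[]): Promise<void>;

const app = express();
app.use(express.json());

// Fast path: persist the request and acknowledge immediately
app.post('/api/events', async (req, res) => {
  await durableQueue.enqueue(req.body);       // durable write (queue, log, or table)
  res.status(202).json({ accepted: true });   // "Request accepted"
});

// Slow path: a background worker drains the queue in batches
async function drainLoop(): Promise<void> {
  while (true) {
    const batch = await durableQueue.dequeueUpTo(500);
    if (batch.length === 0) {
      await new Promise(resolve => setTimeout(resolve, 100)); // idle backoff
      continue;
    }
    await db.events.insertMany(batch);  // batched processing
    await notifyCompletion(batch);      // webhook/notification on completion
  }
}

drainLoop().catch(console.error);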
We've explored batching as a fundamental throughput optimization technique. Here are the key insights:
- Batching amortizes fixed per-operation overhead (round trips, parsing, WAL syncs, acknowledgments) across many items; it pays off most when fixed overhead dominates variable cost.
- It applies at every layer: multi-row INSERT and COPY in databases, bulk endpoints and DataLoader-style request aggregation for APIs, and producer/consumer batching in message queues.
- Batch size has an optimal range; beyond it, memory pressure, timeouts, and the blast radius of partial failures erase the gains.
- Batching trades latency for throughput; adaptive batching, dual-path routing, and fire-and-acknowledge designs keep that trade-off under control.
- Partial failures need explicit handling: per-item results, idempotency keys, and retrying only the failed items.
What's next:
Batching reduces per-operation overhead, but we still pay connection costs for each batch. The next page explores connection reuse—techniques like connection pooling, keep-alive, and multiplexing that further reduce the overhead of communicating with remote services.
You now understand batching as a throughput optimization technique—from the physics of fixed vs variable costs, through database and API batching patterns, to optimal batch size determination and latency trade-offs. Next, we'll examine connection reuse strategies.