Imagine you need to mail 100 letters. You could walk to the mailbox 100 times, once for each letter—or you could bundle them together and make a single trip. The first approach wastes enormous effort on the walk itself. The second approach amortizes the fixed cost of walking across all 100 letters.
This insight—that fixed per-operation costs can be reduced by grouping operations together—is the essence of batching. In distributed systems, batching is one of the most powerful throughput optimization techniques, often yielding 10-100x improvements with minimal code changes.
The fundamental economics:
$$ \text{Effective Cost per Item} = \frac{\text{Fixed Overhead} + (\text{Variable Cost} \times N)}{N} $$
As N (batch size) increases, fixed overhead per item approaches zero.
This page covers batching strategies across the full stack—from database queries to API calls to message queue production. You'll understand when batching helps, optimal batch sizes, latency trade-offs, and implementation patterns for different scenarios.
To understand why batching is so effective, we must examine the cost structure of common operations. Every remote operation—database query, API call, message send—involves multiple stages, each with fixed and variable components.
Anatomy of a Database Query:
Single INSERT:
┌─────────────────┬────────────────┬──────────────────────────┐
│ Client          │ Network        │ Database Server          │
├─────────────────┼────────────────┼──────────────────────────┤
│ Serialize query │→ TCP round trip│→ Parse SQL (0.2ms)       │
│ (0.1ms)         │  (0.5ms)       │→ Query planning (0.1ms)  │
│                 │                │→ Lock acquisition (0.1ms)│
│                 │                │→ Write row (0.05ms)      │
│                 │                │→ WAL sync (1ms)          │
│                 │                │→ Ack generation          │
│ Deserialize     │← TCP return    │                          │
│ (0.05ms)        │  (0.5ms)       │                          │
└─────────────────┴────────────────┴──────────────────────────┘
Total: ~2.5ms for ONE row
Fixed overhead: ~2.45ms | Variable per-row: ~0.05ms
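Plugging the single-row numbers above into the cost formula shows how fast the fixed overhead amortizes:

$$ N = 100: \frac{2.45 + (0.05 \times 100)}{100} \approx 0.075\text{ms per row} \qquad N = 1000: \frac{2.45 + (0.05 \times 1000)}{1000} \approx 0.052\text{ms per row} $$

Per-row cost drops from ~2.5ms to well under 0.1ms, a roughly 30-50x improvement, before any other optimization.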
The math becomes compelling:
| Operation | Fixed Overhead | Variable Cost/Item | Batch Sweet Spot |
|---|---|---|---|
| PostgreSQL INSERT | ~2-5ms (parse, plan, WAL) | ~0.05-0.1ms/row | 100-1000 rows |
| Redis SET | ~0.1-0.5ms (RTT) | ~0.01ms/key | 100-500 keys (MSET) |
| HTTP API call | ~10-50ms (TLS, headers) | ~0.1-1ms/item | 50-500 items |
| Kafka produce | ~1-5ms (leader ack) | ~0.01ms/message | 100-10000 messages |
| S3 PUT | ~50-200ms (connection) | ~0.01ms/KB | 1-100 MB per object |
| Elasticsearch index | ~5-20ms (refresh) | ~0.1ms/document | 500-5000 docs |
Batching provides the greatest benefit when fixed overhead dominates variable cost—i.e., when the ratio (Fixed / Variable) is high. Network round-trips, connection setup, and transactional overhead are prime targets. Batching CPU-bound in-memory operations provides minimal benefit.
Batching manifests in different architectural patterns depending on where and how operations are grouped.
Request Aggregation with DataLoader Pattern:
The DataLoader pattern (popularized by Facebook/Meta for GraphQL) is elegant: within a single tick of the event loop, all data requests are collected, then executed as a single batch query.
Without DataLoader: With DataLoader:
Resolver 1 → fetch(id=1) → DB Resolver 1 → loader.load(1) ─┐
Resolver 2 → fetch(id=2) → DB Resolver 2 → loader.load(2) ─┼─→ fetch([1,2,3]) → DB
Resolver 3 → fetch(id=3) → DB Resolver 3 → loader.load(3) ─┘
3 queries: 3×2ms = 6ms 1 query: 2.5ms
import DataLoader from 'dataloader';

// DataLoader batches all load() calls within a single event loop tick
const userLoader = new DataLoader<string, User>(async (userIds) => {
  // This function receives ALL requested IDs in a single call
  console.log(`Batched fetch for ${userIds.length} users`);

  // Single database query for all users
  const users = await db.query(
    'SELECT * FROM users WHERE id = ANY($1)',
    [userIds]
  );

  // Return in same order as requested IDs
  const userMap = new Map(users.map(u => [u.id, u]));
  return userIds.map(id => userMap.get(id) || null);
});

// GraphQL resolver - each call seems independent
async function resolveUser(parent) {
  // These are automatically batched!
  return userLoader.load(parent.userId);
}

// Usage in GraphQL query:
// query {
//   posts {            # Returns 50 posts
//     author { name }  # Would be 50 DB queries... becomes 1!
//   }
// }

Databases offer multiple batching mechanisms, each with specific use cases and trade-offs.
Multi-Row INSERT:
Instead of executing N separate INSERT statements, use a single INSERT with multiple value tuples:
-- Slow: N separate INSERT statements
INSERT INTO events (user_id, type, data) VALUES (1, 'click', '{...}');
INSERT INTO events (user_id, type, data) VALUES (2, 'view', '{...}');
-- ... 998 more ...
-- Fast: Single INSERT with N rows
INSERT INTO events (user_id, type, data) VALUES
(1, 'click', '{...}'),
(2, 'view', '{...}'),
-- ... 998 more ...
(1000, 'click', '{...}');
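As a sketch of how application code can assemble that multi-row statement, here is one way to do it with the node-postgres (pg) client; the events table and the shape of each event row are carried over from the example above:

import { Pool } from 'pg';

const pool = new Pool();

// Builds one parameterized INSERT with a (user_id, type, data) tuple per event
async function insertEventsBatch(
  events: { userId: number; type: string; data: object }[]
): Promise<void> {
  if (events.length === 0) return;

  const values: unknown[] = [];
  const placeholders = events.map((event, i) => {
    values.push(event.userId, event.type, JSON.stringify(event.data));
    const base = i * 3;
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });

  // One round trip, one parse/plan, one WAL sync for the whole batch.
  // PostgreSQL allows at most 65535 bind parameters, so with 3 columns
  // keep batches comfortably below ~20000 rows.
  await pool.query(
    `INSERT INTO events (user_id, type, data) VALUES ${placeholders.join(', ')}`,
    values
  );
}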
Performance characteristics: the multi-row form pays the parse, plan, and WAL-sync overhead once for the whole statement, so per-row cost collapses toward the ~0.05-0.1ms variable cost listed in the table above, and the 1000-row example typically runs one to two orders of magnitude faster than 1000 separate INSERTs.
The COPY command (PostgreSQL):
For very large loads, COPY is faster than INSERT:
COPY events (user_id, type, data) FROM STDIN WITH (FORMAT csv);
1,click,"{...}"
2,view,"{...}"
\.
COPY avoids per-statement SQL parsing and planning and streams rows directly over the wire (in text, CSV, or binary format), typically 10-100x faster than individual INSERTs when loading millions of rows.
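For reference, a minimal sketch of streaming rows through COPY from Node.js, assuming the pg and pg-copy-streams packages; the table and CSV lines mirror the example above:

import { Pool } from 'pg';
import { from as copyFrom } from 'pg-copy-streams';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

const pool = new Pool();

async function copyEvents(csvLines: string[]): Promise<void> {
  const client = await pool.connect();
  try {
    // client.query() with a COPY ... FROM STDIN statement returns a writable stream
    const ingest = client.query(
      copyFrom('COPY events (user_id, type, data) FROM STDIN WITH (FORMAT csv)')
    );
    await pipeline(Readable.from(csvLines.map(line => line + '\n')), ingest);
  } finally {
    client.release();
  }
}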
Network round-trips are often the dominant cost in distributed systems. Batching API calls dramatically reduces this overhead.
HTTP API Design for Batching:
Option 1: Bulk Endpoints
Design APIs that accept arrays:
// Instead of:
POST /api/users/1/notifications {"message": "Hello"}
POST /api/users/2/notifications {"message": "Hi"}
POST /api/users/3/notifications {"message": "Hey"}
// Provide:
POST /api/notifications/batch
{
"notifications": [
{"userId": 1, "message": "Hello"},
{"userId": 2, "message": "Hi"},
{"userId": 3, "message": "Hey"}
]
}
// Response:
{
"results": [
{"userId": 1, "success": true, "id": "notif-123"},
{"userId": 2, "success": true, "id": "notif-124"},
{"userId": 3, "success": false, "error": "User not found"}
]
}
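On the server side, a bulk endpoint of this shape can process items independently and report per-item results. The sketch below assumes Express and a hypothetical createNotification helper; in practice the per-item work could itself be collapsed into one multi-row INSERT:

import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical single-item operation; assumed to throw on failure
declare function createNotification(userId: number, message: string): Promise<{ id: string }>;

app.post('/api/notifications/batch', async (req, res) => {
  const { notifications } = req.body as {
    notifications: { userId: number; message: string }[];
  };

  // Process items independently so one bad item doesn't fail the whole batch
  const results = await Promise.all(
    notifications.map(async ({ userId, message }) => {
      try {
        const { id } = await createNotification(userId, message);
        return { userId, success: true, id };
      } catch (err) {
        return { userId, success: false, error: (err as Error).message };
      }
    })
  );

  res.json({ results });
});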
Option 2: GraphQL (Inherent Batching)
GraphQL naturally batches by combining multiple queries in one request:
query {
user1: user(id: 1) { name email }
user2: user(id: 2) { name email }
user3: user(id: 3) { name email }
recentPosts(limit: 10) { title }
notifications { message read }
}
# Single HTTP request, multiple data fetches
/**
 * Automatic request batching for API calls
 * Collects requests over a time window, sends as single batch
 */
class BatchedApiClient {
  private pendingRequests: Map<string, {
    resolve: (value: any) => void;
    reject: (error: Error) => void;
    request: any;
  }[]> = new Map();

  private batchTimeoutMs: number;
  private maxBatchSize: number;
  private scheduledFlush: NodeJS.Timeout | null = null;

  constructor(options: { batchTimeoutMs?: number; maxBatchSize?: number } = {}) {
    this.batchTimeoutMs = options.batchTimeoutMs ?? 10; // 10ms window
    this.maxBatchSize = options.maxBatchSize ?? 100;
  }

  async request<T>(endpoint: string, body: any): Promise<T> {
    return new Promise((resolve, reject) => {
      // Add to pending batch for this endpoint
      if (!this.pendingRequests.has(endpoint)) {
        this.pendingRequests.set(endpoint, []);
      }
      const batch = this.pendingRequests.get(endpoint)!;
      batch.push({ resolve, reject, request: body });

      // Flush if batch is full
      if (batch.length >= this.maxBatchSize) {
        this.flushEndpoint(endpoint);
        return;
      }

      // Schedule flush after timeout
      if (!this.scheduledFlush) {
        this.scheduledFlush = setTimeout(() => {
          this.flushAll();
          this.scheduledFlush = null;
        }, this.batchTimeoutMs);
      }
    });
  }

  private async flushEndpoint(endpoint: string) {
    const batch = this.pendingRequests.get(endpoint);
    if (!batch || batch.length === 0) return;
    this.pendingRequests.delete(endpoint);

    try {
      // Send batch request
      const response = await fetch(`${endpoint}/batch`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ requests: batch.map(b => b.request) }),
      });
      const results = await response.json();

      // Resolve individual promises
      batch.forEach((item, index) => {
        const result = results.responses[index];
        if (result.error) {
          item.reject(new Error(result.error));
        } else {
          item.resolve(result.data);
        }
      });
    } catch (error) {
      // Reject all on network error
      batch.forEach(item => item.reject(error as Error));
    }
  }

  private flushAll() {
    for (const endpoint of this.pendingRequests.keys()) {
      this.flushEndpoint(endpoint);
    }
  }
}

// Usage: Individual calls are automatically batched
const client = new BatchedApiClient({ batchTimeoutMs: 5 });

// These 3 calls within 5ms become 1 HTTP request:
const [user1, user2, user3] = await Promise.all([
  client.request('/api/users', { id: 1 }),
  client.request('/api/users', { id: 2 }),
  client.request('/api/users', { id: 3 }),
]);

Message queues—Kafka, SQS, RabbitMQ—benefit enormously from batching due to network round-trip and acknowledgment overhead.
Kafka Producer Batching:
Kafka's producer API batches messages automatically based on two thresholds: batch.size (flush once a partition's batch reaches this many bytes) and linger.ms (flush after waiting this long for more messages), whichever is reached first:
Without batching (linger.ms=0): With batching (linger.ms=5):
Message 1 → send → wait ack → done Message 1 ──┐
Message 2 → send → wait ack → done Message 2 ──┼── batch ──→ send ─→ ack
Message 3 → send → wait ack → done Message 3 ──┘
Latency: low per-message Latency: +5ms per batch
Throughput: ~1000 msg/sec Throughput: ~100,000+ msg/sec
Configuration trade-offs:
| Setting | Low Value | High Value |
|---|---|---|
| batch.size | Smaller batches, more network calls | Larger batches, higher memory |
| linger.ms | Lower latency | Higher throughput |
| compression.type | Less CPU, more network | More CPU, less network |
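For the Java producer (and librdkafka-based clients), these knobs are plain configuration; the values below are illustrative throughput-oriented settings, not tuned recommendations:

# Flush a partition's batch once it reaches 64 KB...
batch.size=65536
# ...or after waiting up to 5 ms for more messages, whichever comes first
linger.ms=5
# Compress whole batches: more CPU, less network
compression.type=lz4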
import { Kafka, CompressionTypes } from 'kafkajs';

const kafka = new Kafka({ brokers: ['localhost:9092'] });

// Producer with batching optimizations.
// Note: unlike the Java client, kafkajs does not expose batch.size/linger.ms;
// batching happens when you hand it many messages in one send()/sendBatch() call.
const producer = kafka.producer({
  allowAutoTopicCreation: false,
  // Transaction timeout
  transactionTimeout: 30000,
});

await producer.connect();

// Sending with batching - all these go out together per partition leader
const messages = Array.from({ length: 1000 }, (_, i) => ({
  key: `user-${i % 100}`,
  value: JSON.stringify({ event: 'click', userId: i, timestamp: Date.now() }),
}));

// sendBatch is explicit batching; compressing the batch trades CPU for network
await producer.sendBatch({
  compression: CompressionTypes.GZIP,
  topicMessages: [{
    topic: 'user-events',
    messages,
  }],
});

// Anti-pattern in kafkajs: one message per send() call means one request each.
// The Java client/librdkafka would buffer these and flush when batch.size is
// reached or linger.ms elapses; kafkajs does not, so prefer the call above.
for (const msg of messages) {
  producer.send({
    topic: 'user-events',
    messages: [msg],
  }).catch(console.error);
}

// Consumer-side batching with Kafka
const consumer = kafka.consumer({ groupId: 'analytics-group' });
await consumer.connect();
await consumer.subscribe({ topics: ['user-events'] });

await consumer.run({
  // Process in batches for efficiency
  eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
    console.log(`Processing batch of ${batch.messages.length} messages`);

    // Batch insert to database
    const events = batch.messages.map(m => JSON.parse(m.value!.toString()));
    await db.events.insertMany(events);

    // Commit offset after batch processed
    resolveOffset(batch.messages[batch.messages.length - 1].offset);
    await heartbeat();
  },
});

Batching complicates failure handling. If a batch of 100 messages partially fails, should you retry all 100? Only the failed ones? Idempotency keys, transactional outbox patterns, and exactly-once semantics become critical when batching writes.
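When a batch partially fails against a bulk HTTP endpoint shaped like the /api/notifications/batch example earlier, one common approach is to retry only the failed items. The sketch below is hypothetical and assumes each item carries an idempotency key so an accidental re-send is harmless:

interface BatchItem {
  idempotencyKey: string;
  userId: number;
  message: string;
}

interface BatchItemResult {
  success: boolean;
  error?: string;
}

async function sendWithPartialRetry(items: BatchItem[], maxAttempts = 3): Promise<BatchItem[]> {
  let remaining = items;
  for (let attempt = 1; attempt <= maxAttempts && remaining.length > 0; attempt++) {
    const response = await fetch('/api/notifications/batch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ notifications: remaining }),
    });
    const { results } = (await response.json()) as { results: BatchItemResult[] };

    // Results are positional: keep only the items whose result reported a failure.
    // Idempotency keys make a re-send of an already-succeeded item harmless.
    remaining = remaining.filter((_, i) => !results[i].success);
  }
  return remaining; // items still failing after maxAttempts
}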
Batch size is not "bigger is better." There's an optimal range where throughput is maximized without excessive latency or resource consumption.
The Batching Curve:
Throughput
  ▲
  │              ┌──────────────┐   ← Optimal range
  │            ╱─┘              └─╲
  │          ╱                     ╲
  │        ╱                        ╲   Memory limits,
  │      ╱                           ╲  timeout issues
  │    ╱
  │  ╱
  │╱
  └───────────────────────────────────────────▶ Batch Size
    1      10      100     1000    10000   100000
Phase 1 (1-100): Rapid throughput increase as overhead amortized
Phase 2 (100-1000): Diminishing returns, still improving
Phase 3 (1000+): Flat or declining - memory pressure, timeouts
Factors Limiting Batch Size: memory needed to buffer the whole batch, request and transaction timeouts, lock hold time on the receiving system, hard API limits (e.g., SQS caps SendMessageBatch at 10 messages), and the blast radius of a partial failure or retry.
Finding Your Optimal Batch Size:
The optimal batch size is workload-specific. A practical process: start small, increase the batch size geometrically (1, 2, 4, ... or 10, 100, 1000) under realistic load, measure throughput and tail latency at each point, and stop at the knee of the curve where throughput stops improving but latency and memory keep growing. A minimal sweep sketch follows.
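The sketch below assumes a hypothetical insertBatch(rows) function that writes one batch to your target system; it doubles the batch size, measures throughput and worst-case per-batch latency, and prints one data point per size so the knee of the curve is visible:

import { performance } from 'node:perf_hooks';

type Row = Record<string, unknown>;

async function sweepBatchSizes(
  rows: Row[],
  insertBatch: (batch: Row[]) => Promise<void>
): Promise<void> {
  for (let size = 1; size <= 10_000; size *= 2) {
    const started = performance.now();
    let worstBatchMs = 0;

    for (let i = 0; i < rows.length; i += size) {
      const batchStart = performance.now();
      await insertBatch(rows.slice(i, i + size));
      worstBatchMs = Math.max(worstBatchMs, performance.now() - batchStart);
    }

    const seconds = (performance.now() - started) / 1000;
    console.log(
      `batchSize=${size} throughput=${Math.round(rows.length / seconds)} rows/s ` +
      `worstBatchLatency=${worstBatchMs.toFixed(1)}ms`
    );
  }
}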
Typical optimal ranges by system:
| System | Optimal Batch Size | Reasoning |
|---|---|---|
| PostgreSQL INSERT | 100-1000 rows | Beyond ~5000, transaction overhead grows |
| Redis MSET | 100-500 keys | Memory efficiency, pipeline limits |
| Elasticsearch bulk | 500-5000 docs | Refresh interval, memory buffers |
| Kafka producer | 16KB-1MB | Network efficiency vs latency |
| AWS SQS | 10 messages | API limit (SendMessageBatch max 10) |
| HTTP API batch | 50-500 items | Request body size, timeout constraints |
Batching inherently trades latency for throughput. Understanding and managing this trade-off is crucial for system design.
Hybrid Strategies:
Smart systems adapt batching dynamically:
1. Adaptive Batching:
Load-based batch sizing:
if (queueDepth > 1000) {
batchSize = 500; // Under load: prioritize throughput
lingerMs = 50;
} else if (queueDepth > 100) {
batchSize = 100; // Medium load: balance
lingerMs = 10;
} else {
batchSize = 10; // Low load: prioritize latency
lingerMs = 1;
}
2. Dual-Path Architecture:
┌──────────────────┐
│ Incoming │
│ Requests │
└────────┬─────────┘
│
┌──────────────┴──────────────┐
│ Router │
│ (priority classification) │
└──────┬───────────┬──────────┘
│ │
┌───────────▼───┐ ┌───▼───────────┐
│ Fast Path │ │ Batch Path │
│ (immediate) │ │ (buffered) │
│ Premium SLA │ │ Best-effort │
└───────────────┘ └───────────────┘
3. Fire-and-Acknowledge Pattern:
Acknowledge immediately but process in batches:
Client sends request
│
▼
Server writes to durable queue → Immediate ACK to client
│ ("Request accepted")
│
▼ (async)
Background worker batches from queue → Processes batch
│
▼
Webhook/notification on completion
This gives clients low latency (fast ack) while backend achieves high throughput (batched processing).
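A minimal sketch of the pattern, assuming an Express-style app and hypothetical durableQueue, db, and notifyCompletion helpers:

import express from 'express';

interface DurableQueue<T> {
  enqueue(item: T): Promise<void>;
  dequeueUpTo(max: number): Promise<T[]>;
}

declare const durableQueue: DurableQueue<unknown>;
declare const db: { events: { insertMany(rows: unknown[]): Promise<void> } };
declare function notifyCompletion(rows: unknown[]): Promise<void>;

const app = express();
app.use(express.json());

// Fast path: persist the request and acknowledge immediately
app.post('/api/events', async (req, res) => {
  await durableQueue.enqueue(req.body);       // durable write (queue, log, or table)
  res.status(202).json({ accepted: true });   // "Request accepted"
});

// Slow path: a background worker drains the queue in batches
async function drainLoop(): Promise<void> {
  while (true) {
    const batch = await durableQueue.dequeueUpTo(500);
    if (batch.length === 0) {
      await new Promise(resolve => setTimeout(resolve, 100)); // idle backoff
      continue;
    }
    await db.events.insertMany(batch);  // batched processing
    await notifyCompletion(batch);      // webhook/notification on completion
  }
}

drainLoop().catch(console.error);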
We've explored batching as a fundamental throughput optimization technique. Here are the key insights:
- Batching amortizes fixed per-operation overhead (round trips, parsing, WAL syncs, acknowledgments) across many items; it pays off most when fixed overhead dominates variable cost.
- It applies at every layer: multi-row INSERT and COPY in databases, bulk endpoints and DataLoader-style request aggregation for APIs, and producer/consumer batching in message queues.
- Batch size has an optimal range; beyond it, memory pressure, timeouts, and the blast radius of partial failures erase the gains.
- Batching trades latency for throughput; adaptive batching, dual-path routing, and fire-and-acknowledge designs keep that trade-off under control.
- Partial failures need explicit handling: per-item results, idempotency keys, and retrying only the failed items.
What's next:
Batching reduces per-operation overhead, but we still pay connection costs for each batch. The next page explores connection reuse—techniques like connection pooling, keep-alive, and multiplexing that further reduce the overhead of communicating with remote services.
You now understand batching as a throughput optimization technique—from the physics of fixed vs variable costs, through database and API batching patterns, to optimal batch size determination and latency trade-offs. Next, we'll examine connection reuse strategies.