You've built a thread pool. You understand worker threads, task queues, and rejection policies. Now comes the question that has launched a thousand debates in engineering teams:
How many threads should the pool have?
This seemingly simple question has a frustratingly unsatisfying answer: it depends. But "it depends" isn't where the conversation ends—it's where it begins. Understanding what it depends on, and how those factors interact, is what separates principled engineering from cargo-cult configuration.
Pool sizing is not guesswork. It's applied mathematics combined with workload analysis and empirical validation. Get it right, and your system extracts maximum performance from your hardware. Get it wrong, and you either waste resources (too few threads) or create contention and overhead (too many threads).
By the end of this page, you will be able to analyze workloads, apply pool sizing formulas, understand the difference between CPU-bound and I/O-bound work, and use empirical methods to tune pool sizes for production systems. You'll move from guessing to reasoning about pool configuration.
Pool sizing is fundamentally about balancing two opposing forces:

Too few threads: CPU cores sit idle while work queues up. Throughput suffers and latency grows even though the hardware has spare capacity.

Too many threads: context-switch overhead, scheduler churn, and per-thread stack memory eat into capacity, and contention for shared resources rises.

The goal is to find the sweet spot where every core stays busy with useful work, overhead stays minimal, and throughput is maximized.
There is no universal "right" pool size. A pool optimized for CPU-intensive image processing will be terrible for I/O-heavy database queries, and vice versa. The key insight is that pool size should be derived from workload characteristics, not chosen from defaults or rules of thumb.
The single most important factor in pool sizing is the nature of the work: is it CPU-bound or I/O-bound? These two categories require fundamentally different pool configurations.
CPU-Bound Work: tasks that spend essentially all of their running time computing—image processing, compression, encryption, parsing. The CPU itself is the bottleneck; threads rarely block.

I/O-Bound Work: tasks that spend most of their running time blocked on external resources—database queries, HTTP calls, file I/O. The network, disk, or a remote service is the bottleneck.

The key difference: a CPU-bound thread actively uses the CPU 100% of its running time. An I/O-bound thread might use the CPU only 5-10% of its running time, waiting for I/O the other 90-95%.
| Characteristic | CPU-Bound | I/O-Bound |
|---|---|---|
| Thread activity | Always computing | Often waiting |
| CPU utilization per thread | ~100% | 5-50% |
| Optimal pool size | ~Number of CPU cores | Many times number of cores |
| Bottleneck when too few threads | Idle CPUs | Idle CPUs (threads blocked) |
| Problem when too many threads | Context switch overhead | Memory/connection exhaustion |
| Typical ratio to cores | 1:1 to 2:1 | 5:1 to 100:1 or higher |
Pure CPU-bound or pure I/O-bound workloads are rare. Most tasks do some computation and some I/O. The key is identifying the DOMINANT characteristic. If tasks spend 80% of time on I/O, treat it as I/O-bound. If 80% on computation, treat it as CPU-bound.
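One way to estimate which side dominates is to compare a task's wall-clock time with its actual CPU time. Below is a minimal sketch using the JDK's `ThreadMXBean`; the probe class and its name are illustrative, and note that `getCurrentThreadCpuTime` can return -1 on JVMs that don't support CPU-time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/**
 * Illustrative sketch: estimate a task's wait ratio by comparing
 * wall-clock time to the CPU time consumed by the current thread.
 */
public class WaitComputeProbe {

    /** Runs the task once and returns the fraction of time spent waiting. */
    public static double measureWaitRatio(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime();  // -1 if unsupported

        task.run();

        long cpuTime = bean.getCurrentThreadCpuTime() - cpuStart;
        long wallTime = System.nanoTime() - wallStart;
        // Wait time is whatever wall-clock time was not spent on the CPU
        return Math.max(0.0, (double) (wallTime - cpuTime) / wallTime);
    }

    public static void main(String[] args) {
        // A sleep is almost pure waiting, so the ratio should be near 1.0
        double ratio = measureWaitRatio(() -> {
            try {
                Thread.sleep(100);  // Simulated I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("Estimated wait ratio: %.2f%n", ratio);
    }
}
```

Run representative tasks through a probe like this under realistic conditions; the measured ratio feeds directly into the I/O-bound sizing formula discussed below.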
For CPU-bound workloads, the sizing logic is relatively straightforward: you want one thread per available CPU core. More threads won't help because there are no idle cycles to exploit—you'll just add context switching overhead.
The formula:
```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   CPU-Bound Pool Size = Number of CPU Cores (N)             │
│                                                             │
│   Or slightly higher:                                       │
│                                                             │
│   CPU-Bound Pool Size = N + 1                               │
│                                                             │
│   Why +1? To compensate for occasional page faults,         │
│   rare I/O, or other minor blocking events.                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

Example:
- 8-core machine → 8 to 9 threads
- 32-core machine → 32 to 33 threads
- 64-core machine → 64 to 65 threads

Why not more threads?
Imagine an 8-core CPU running 16 CPU-bound threads: each core must time-slice between two always-runnable threads, so every thread runs at roughly half speed while the scheduler burns cycles on context switches and CPU caches are repeatedly evicted and refilled.

With CPU-bound work, adding threads beyond the core count always decreases total throughput. You're dividing the same compute capacity among more threads while adding overhead.
```java
import java.util.concurrent.*;

/**
 * CPU-bound thread pool configuration.
 * Use for: image processing, compression, encryption, parsing, etc.
 */
public class CpuBoundPoolConfiguration {

    // Get number of available processors
    private static final int CPU_CORES = Runtime.getRuntime().availableProcessors();

    /**
     * Create a pool sized for CPU-bound work.
     *
     * Uses N or N+1 threads where N = number of cores.
     * Fixed-size pool: no dynamic scaling (not needed for CPU-bound).
     * Unbounded queue: tasks are lightweight, pool won't be overwhelmed.
     */
    public static ExecutorService createCpuBoundPool() {
        return Executors.newFixedThreadPool(CPU_CORES);
    }

    /**
     * More explicit configuration with N+1 threads.
     */
    public static ExecutorService createCpuBoundPoolExplicit() {
        return new ThreadPoolExecutor(
            CPU_CORES + 1,                 // corePoolSize
            CPU_CORES + 1,                 // maximumPoolSize (same - fixed)
            0L,                            // keepAliveTime (threads never idle)
            TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(),   // Unbounded for CPU-bound
            new ThreadPoolExecutor.AbortPolicy()
        );
    }

    /**
     * Configuration with bounded queue for backpressure.
     * Use when task submission rate could exceed processing rate.
     */
    public static ExecutorService createCpuBoundPoolWithBackpressure(int queueSize) {
        return new ThreadPoolExecutor(
            CPU_CORES,
            CPU_CORES,
            0L,
            TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueSize),
            new ThreadPoolExecutor.CallerRunsPolicy()  // Natural backpressure
        );
    }

    public static void main(String[] args) {
        System.out.println("Available CPU cores: " + CPU_CORES);

        ExecutorService pool = createCpuBoundPool();

        // Submit CPU-intensive tasks
        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(() -> {
                // Simulate CPU-bound work
                long sum = 0;
                for (long j = 0; j < 100_000_000; j++) {
                    sum += j % 13;
                }
                System.out.println("Task " + taskId + " complete: " + sum);
            });
        }

        pool.shutdown();
    }
}
```

Python's Global Interpreter Lock prevents true thread-level parallelism for CPU-bound work. Use ProcessPoolExecutor (separate processes) instead of ThreadPoolExecutor (threads in one process) for CPU-intensive tasks in Python. ThreadPoolExecutor is still useful for I/O-bound work where threads spend time waiting.
I/O-bound workloads are fundamentally different. Since threads spend most of their time waiting (blocked on network, disk, database), you need many more threads than CPU cores to keep the CPUs busy.
The key insight: Wait time creates opportunity for parallelism
If a thread spends 90% of its time waiting and 10% computing, you can have 10 threads per core before reaching CPU saturation. The blocked threads don't consume CPU—they're parked by the OS scheduler.
Brian Goetz's Formula (from Java Concurrency in Practice):
```
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│   Number of Threads = N × U × (1 + W/C)                             │
│                                                                     │
│   Where:                                                            │
│     N = Number of CPU cores                                         │
│     U = Target CPU utilization (0 < U ≤ 1)                          │
│     W = Wait time (time thread is blocked)                          │
│     C = Compute time (time thread is running on CPU)                │
│                                                                     │
│   W/C is the "wait-to-compute ratio"                                │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

Examples (assuming 8 cores, 100% target utilization):

1. Database queries (90% wait, 10% compute):
   W/C = 0.9 / 0.1 = 9
   Threads = 8 × 1 × (1 + 9) = 80 threads

2. Network API calls (95% wait, 5% compute):
   W/C = 0.95 / 0.05 = 19
   Threads = 8 × 1 × (1 + 19) = 160 threads

3. Mixed workload (50% wait, 50% compute):
   W/C = 0.5 / 0.5 = 1
   Threads = 8 × 1 × (1 + 1) = 16 threads

4. CPU-bound (0% wait, 100% compute):
   W/C = 0 / 1 = 0
   Threads = 8 × 1 × (1 + 0) = 8 threads ← Matches the CPU-bound formula!

Practical considerations:
The formula gives a theoretical optimum, but real-world factors complicate matters:
W/C ratio varies: Not all tasks have the same wait/compute ratio. You're targeting an average.
External bottlenecks: If your database can only handle 100 concurrent connections, a 200-thread pool won't help—it'll just queue at the database.
Memory constraints: Each thread consumes memory. 500 threads × 1MB stack = 500MB just for stacks.
Connection pool limits: Database connections, HTTP connections, and other resources have their own pools with limited capacity.
Contention: More threads mean more lock contention, which can dominate at extreme thread counts.
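These real-world limits can be treated as hard caps on the formula's output. Here is a minimal sketch of that idea; the class name and all limit values (connection count, memory budget, stack size) are illustrative assumptions, not recommendations.

```java
/**
 * Illustrative sketch: cap the Goetz-formula estimate with real-world
 * constraints (downstream connection limits, stack memory budget).
 */
public class ConstrainedPoolSizer {

    /** Goetz estimate: N × U × (1 + W/C), from the wait-time fraction. */
    public static int goetzEstimate(int cores, double utilization, double waitRatio) {
        double wcRatio = waitRatio / (1.0 - waitRatio);
        return (int) Math.ceil(cores * utilization * (1 + wcRatio));
    }

    /**
     * Final size = min(formula estimate, connection limit, memory budget).
     *
     * @param maxConnections downstream limit, e.g. a database connection pool
     * @param memoryBudgetMb memory reserved for thread stacks
     * @param stackSizeMb    per-thread stack size (often around 1 MB by default)
     */
    public static int constrainedSize(int cores, double utilization, double waitRatio,
                                      int maxConnections, int memoryBudgetMb, int stackSizeMb) {
        int byFormula = goetzEstimate(cores, utilization, waitRatio);
        int byMemory = memoryBudgetMb / stackSizeMb;
        return Math.min(byFormula, Math.min(maxConnections, byMemory));
    }

    public static void main(String[] args) {
        // 8 cores at 90% wait: the formula suggests roughly 80 threads,
        // but a 100-connection database and 64 MB of stack budget cap it at 64.
        System.out.println(constrainedSize(8, 1.0, 0.9, 100, 64, 1));
    }
}
```

The point of the sketch is the `min(...)`: the formula gives an upper bound on useful concurrency, and every external constraint can only pull the number down.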
```java
import java.util.concurrent.*;

/**
 * I/O-bound thread pool configuration.
 * Use for: database queries, HTTP calls, file I/O, etc.
 */
public class IoBoundPoolConfiguration {

    private static final int CPU_CORES = Runtime.getRuntime().availableProcessors();

    /**
     * Calculate pool size for I/O-bound work using Goetz formula.
     *
     * @param waitTimeRatio     Fraction of time tasks spend waiting (0.0 to 1.0)
     * @param targetUtilization Target CPU utilization (0.0 to 1.0)
     */
    public static int calculatePoolSize(double waitTimeRatio, double targetUtilization) {
        // Convert wait time ratio to W/C ratio
        // If waitTimeRatio = 0.9, then computeRatio = 0.1, W/C = 9
        double computeTimeRatio = 1.0 - waitTimeRatio;
        double wcRatio = waitTimeRatio / computeTimeRatio;

        // Brian Goetz formula: N × U × (1 + W/C)
        return (int) Math.ceil(CPU_CORES * targetUtilization * (1 + wcRatio));
    }

    /**
     * Create pool for database queries (typically 80-95% I/O).
     * Conservative sizing to avoid overwhelming the database.
     */
    public static ExecutorService createDatabaseQueryPool() {
        int poolSize = calculatePoolSize(0.85, 0.9);  // 85% wait, 90% CPU target
        poolSize = Math.min(poolSize, 100);           // Cap at reasonable max

        return new ThreadPoolExecutor(
            poolSize, poolSize,
            60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(1000),
            new ThreadPoolExecutor.CallerRunsPolicy()
        );
    }

    /**
     * Create pool for HTTP API calls (typically 90-98% I/O).
     * Higher thread count because network I/O is very wait-heavy.
     */
    public static ExecutorService createHttpClientPool() {
        int poolSize = calculatePoolSize(0.95, 0.9);  // 95% wait
        poolSize = Math.min(poolSize, 200);           // Cap for safety

        return new ThreadPoolExecutor(
            CPU_CORES * 2,             // Core pool: start smaller
            poolSize,                  // Max pool: scale up on demand
            60L, TimeUnit.SECONDS,     // Idle threads expire after 60s
            new SynchronousQueue<>(),  // Direct handoff, no queuing
            new ThreadPoolExecutor.CallerRunsPolicy()
        );
    }

    /**
     * Dynamic I/O pool that scales based on demand.
     * Good for variable workloads.
     */
    public static ExecutorService createElasticIoPool(int maxThreads) {
        return new ThreadPoolExecutor(
            CPU_CORES,                 // Core pool: minimum threads
            maxThreads,                // Maximum threads under peak load
            30L, TimeUnit.SECONDS,     // Idle threads reclaimed quickly
            new SynchronousQueue<>(),
            new ThreadPoolExecutor.CallerRunsPolicy()
        );
    }

    public static void main(String[] args) throws Exception {
        System.out.println("CPU Cores: " + CPU_CORES);
        System.out.println("Database pool size: " + calculatePoolSize(0.85, 0.9));
        System.out.println("HTTP pool size: " + calculatePoolSize(0.95, 0.9));

        // Demonstrate I/O-bound pool
        ExecutorService pool = createHttpClientPool();

        // Simulate HTTP API calls
        for (int i = 0; i < 50; i++) {
            final int requestId = i;
            pool.submit(() -> {
                try {
                    // Simulate network I/O
                    Thread.sleep(200);  // 200ms network latency
                    System.out.println("Request " + requestId + " complete");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```

Pool sizing can also be approached through queueing theory, specifically using Little's Law—a fundamental theorem that relates arrival rate, processing time, and concurrency.
Little's Law:
```
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│   L = λ × W                                                         │
│                                                                     │
│   Where:                                                            │
│     L = Average number of items in the system (concurrency)         │
│     λ = Arrival rate (items per unit time)                          │
│     W = Average time each item spends in the system                 │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

Rearranging for concurrency (pool size):

Threads Needed = Arrival Rate × Processing Time

Examples:

1. 1000 requests/second, 100ms average processing:
   Threads = 1000 × 0.1 = 100 threads

2. 500 requests/second, 200ms average processing:
   Threads = 500 × 0.2 = 100 threads

3. 100 requests/second, 2 seconds average processing:
   Threads = 100 × 2 = 200 threads

Note: This gives the MINIMUM to avoid queuing. Add buffer for bursts.

Why Little's Law matters:
Little's Law is remarkable because it holds regardless of the arrival distribution, processing time distribution, or scheduling discipline. It's a universal truth about stable systems.
Applying to pool sizing:
If you know your expected request rate and average processing time, Little's Law tells you the minimum concurrency to handle that load without queue growth. If you have fewer threads, the queue will grow indefinitely, eventually causing task timeouts or rejection.
Little's Law is incredibly useful for capacity planning. If you expect 5000 requests/second at peak, and your profiling shows 150ms average processing time, you need at least 750 concurrent threads (or processes, or async handlers) to keep up: 5000 × 0.15 = 750.
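That arithmetic is simple enough to capture in a small helper. Below is a minimal sketch; the class name is illustrative, and the burst-factor variant is an added convenience for headroom, not part of Little's Law itself.

```java
/**
 * Illustrative sketch: Little's Law capacity estimate (L = λ × W),
 * plus a variant that adds headroom for bursts.
 */
public class LittleLawSizer {

    /** Minimum concurrency to keep up with the average load. */
    public static int minThreads(double arrivalsPerSecond, double processingSeconds) {
        // Round to the nearest integer to avoid floating-point artifacts
        return (int) Math.round(arrivalsPerSecond * processingSeconds);
    }

    /** Same estimate scaled by a burst factor (> 1.0) for headroom. */
    public static int withHeadroom(double arrivalsPerSecond, double processingSeconds,
                                   double burstFactor) {
        return (int) Math.round(arrivalsPerSecond * processingSeconds * burstFactor);
    }

    public static void main(String[] args) {
        // Peak-load example from the text: 5000 req/s × 150ms = 750
        System.out.println(minThreads(5000, 0.15));
        // With 1.5× headroom for bursts
        System.out.println(withHeadroom(5000, 0.15, 1.5));
    }
}
```

A helper like this is handy during capacity planning: plug in the measured request rate and profiled processing time, and sanity-check the result against your actual pool configuration.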
Accounting for variability:
Little's Law gives the average, but real systems have variance. During bursts, the arrival rate temporarily exceeds the average, the number of in-flight tasks climbs above λ × W, and the queue starts to grow.

To handle bursts without queue explosion, size the pool with headroom above the Little's Law minimum, use a bounded queue to absorb short spikes, and apply backpressure (such as CallerRunsPolicy) when that buffer fills.
Theory informs practice, but real-world pool sizing usually involves empirical tuning. Here are practical strategies used in production systems:
| Workload Type | Formula | Typical Range | Scaling Approach |
|---|---|---|---|
| Pure CPU-bound | N | N to N+1 | Fixed (no scaling) |
| CPU + light I/O | N × 2 | N to 2N | Fixed or slight elasticity |
| Mixed 50/50 | N × 2 to N × 4 | 2N to 4N | Elastic (up to max) |
| I/O-heavy (80%) | N × 5 to N × 10 | 5N to 10N | Elastic with higher max |
| Very I/O-heavy (95%+) | N × 10 to N × 50 | 10N to 50N+ | Elastic, limited by downstream |
The benchmark-driven approach:
Ultimately, the best pool size is determined by benchmarking your specific workload: start from the formula's estimate, run a realistic load test, measure throughput and latency as you vary the pool size, and settle on the smallest pool that meets your targets.
Never guess at pool sizes for production systems. Always measure wait/compute ratios, run load tests, and observe actual behavior. A scientifically-sized pool outperforms both over-provisioning (wasted resources) and under-provisioning (poor performance).
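One way to run such a measurement is a tiny harness that times the same batch of tasks at several candidate pool sizes. The sketch below uses sleep-based simulated I/O tasks; the task, batch size, and candidate sizes are all illustrative stand-ins for your real workload.

```java
import java.util.concurrent.*;

/**
 * Illustrative benchmark sketch: time a fixed batch of simulated
 * I/O-bound tasks at several candidate pool sizes.
 */
public class PoolSizeBenchmark {

    /** Runs taskCount tasks on a fixed pool and returns elapsed milliseconds. */
    public static long timeBatch(int poolSize, int taskCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(20);  // Simulated I/O wait per task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // For wait-heavy tasks, larger pools should finish the batch faster,
        // up to the point where downstream limits or overhead dominate
        for (int size : new int[] {2, 8, 32}) {
            System.out.printf("pool=%d -> %d ms for 64 tasks%n", size, timeBatch(size, 64));
        }
    }
}
```

For a real benchmark, replace the sleep with representative tasks against a staging copy of your downstream systems, and record latency percentiles as well as elapsed time, since tail latency often turns over before throughput does.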
Pool sizing mistakes are common even among experienced engineers. Here are pitfalls to avoid:
- `Executors.newCachedThreadPool()` creates unlimited threads. Under heavy load, this can spawn thousands of threads, exhausting memory and causing context switch storms.
- `new ThreadPoolExecutor(10, 10, ...)` without considering actual hardware or workload. Pool sizes should be derived from system properties.
```java
// ❌ ANTI-PATTERN: Unbounded cached pool
// Under load, this can create THOUSANDS of threads
ExecutorService bad1 = Executors.newCachedThreadPool();

// ❌ ANTI-PATTERN: Hardcoded magic numbers
// Doesn't adapt to different machines
ExecutorService bad2 = Executors.newFixedThreadPool(10);

// ❌ ANTI-PATTERN: Single pool for everything
// CPU-bound and I/O-bound tasks fight for threads
ExecutorService bad3 = Executors.newFixedThreadPool(50);
bad3.submit(cpuBoundTask);  // Takes 100% of thread's CPU time
bad3.submit(ioBoundTask);   // Waits for network 95% of time

// ✅ CORRECT: Separate pools for different workload types
int cpuCores = Runtime.getRuntime().availableProcessors();

// CPU-bound pool: sized to core count
ExecutorService cpuPool = Executors.newFixedThreadPool(cpuCores);

// I/O-bound pool: sized for high concurrency
ExecutorService ioPool = new ThreadPoolExecutor(
    cpuCores * 2,   // Core: moderate
    cpuCores * 20,  // Max: high for I/O
    60L, TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(1000)
);

// ✅ CORRECT: Use appropriate pool for each workload
cpuPool.submit(cpuBoundTask);
ioPool.submit(ioBoundTask);
```

What's next:
We've covered the theory and considerations for pool sizing. The final page brings it all together with practical, production-ready implementations and real-world usage patterns from major frameworks and systems.
You now have the theoretical foundation for pool sizing. Next, we'll see how these principles manifest in real production systems—Java's ExecutorService, Python's ThreadPoolExecutor, and patterns used at scale by companies like Netflix, Amazon, and Google.