Growth estimation tells you how much your database will expand. Resource planning answers the critical follow-up question: what specific resources are needed to support that growth?
Resource planning is the bridge between abstract growth projections and concrete infrastructure decisions. It translates user counts, data volumes, and query loads into CPU cores, memory gigabytes, storage terabytes, and IOPS requirements. Done well, resource planning ensures systems have sufficient capacity for projected workloads while avoiding wasteful over-provisioning.
This page explores the methodology for sizing each critical database resource—CPU, memory, storage, network, and I/O—based on workload characteristics and growth projections.
By the end of this page, you will understand how to size database resources based on workload analysis. You'll learn CPU capacity estimation, memory allocation strategies, storage sizing formulas, IOPS calculations, and network bandwidth requirements. You'll also understand resource interdependencies and how to validate sizing decisions before deployment.
Database resource planning must account for several key principles that distinguish database workloads from other application types:
Non-linear resource consumption:
Unlike stateless application servers that scale linearly, databases exhibit complex resource consumption patterns. Doubling the data volume might quadruple query execution time due to deeper indexes and cache saturation. Doubling concurrent connections can increase memory requirements tenfold due to per-connection overhead.
Resource interdependencies:
Database resources are tightly coupled. Insufficient memory forces more disk I/O, which consumes additional CPU for I/O processing and adds context-switching overhead on top. Optimizing one dimension without considering the others often just moves bottlenecks rather than eliminating them.
Peak vs. average requirements:
Databases must handle peak loads without significant degradation. Sizing for average utilization guarantees problems during traffic spikes. The ratio of peak to average varies dramatically by application type—e-commerce may see 10x spikes during sales events, while internal applications may be relatively steady.
Plan for 70% peak utilization across all resource dimensions. This provides headroom for unexpected spikes, background maintenance, and graceful degradation. At 90%+ utilization, systems become brittle—small variations cause cascading failures.
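As a quick illustration of the 70% rule, here is a minimal Python sketch (the utilization and workload figures are hypothetical) that converts a measured peak into required capacity and remaining growth headroom:

```python
def capacity_for_peak(peak_load_units: float, target_utilization: float = 0.70) -> float:
    """Capacity needed so the measured peak lands at the target utilization."""
    return peak_load_units / target_utilization


def growth_headroom_pct(current_peak_pct: float, target_pct: float = 70.0) -> float:
    """How much the current peak can grow before crossing the target threshold."""
    return (target_pct / current_peak_pct - 1) * 100


# Example: peak CPU currently reaches 45% on the existing server,
# and peak work is equivalent to about 9 fully busy cores.
print(f"Growth headroom before 70%: {growth_headroom_pct(45.0):.0f}%")  # ~56%
print(f"Cores needed at the 70% target: {capacity_for_peak(9):.1f}")    # ~12.9
```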
CPU capacity determines how much work a database can perform concurrently. For OLTP workloads, CPU primarily affects transaction throughput and query latency. For OLAP workloads, CPU determines analytical query completion time.
What consumes CPU in databases:

- Query execution: parsing, planning, and running statements, including expression evaluation and function calls
- Sorting, aggregation, and joins, especially hash joins and large ORDER BY operations
- Index maintenance on inserts, updates, and deletes
- Lock, latch, and connection management under high concurrency
- Background work: checkpoints, vacuum/defragmentation, replication, compression, and encryption
CPU sizing methodology:
The most reliable approach combines baseline measurement with workload-based extrapolation:
```sql
-- CPU sizing analysis for capacity planning

-- Step 1: Calculate current CPU utilization pattern
WITH hourly_cpu_usage AS (
    SELECT
        DATE_TRUNC('hour', collection_timestamp) AS hour_bucket,
        AVG(cpu_utilization_percent) AS avg_cpu_pct,
        MAX(cpu_utilization_percent) AS max_cpu_pct,
        AVG(queries_per_second) AS avg_qps
    FROM system_metrics
    WHERE collection_timestamp >= NOW() - INTERVAL '30 days'
    GROUP BY DATE_TRUNC('hour', collection_timestamp)
),

-- Step 2: Determine peak-to-average ratio
peak_patterns AS (
    SELECT
        MAX(max_cpu_pct) AS peak_cpu_utilization,
        AVG(avg_cpu_pct) AS average_cpu_utilization,
        MAX(max_cpu_pct) / NULLIF(AVG(avg_cpu_pct), 0) AS peak_to_average_ratio,
        MAX(avg_qps) AS peak_qps,
        AVG(avg_qps) AS average_qps
    FROM hourly_cpu_usage
),

-- Step 3: Project future requirements
current_capacity AS (
    SELECT
        16 AS current_cpu_cores,  -- Current server configuration
        p.peak_cpu_utilization,
        p.average_cpu_utilization,
        p.peak_to_average_ratio,
        p.peak_qps
    FROM peak_patterns p
)
SELECT
    -- Current state
    current_cpu_cores,
    ROUND(peak_cpu_utilization, 1) AS current_peak_cpu_pct,
    ROUND(peak_qps, 0) AS current_peak_qps,

    -- CPU per QPS (efficiency metric)
    ROUND(peak_cpu_utilization / NULLIF(peak_qps, 0), 4) AS cpu_pct_per_qps,

    -- Future projections (assuming 2x query load growth)
    ROUND((peak_cpu_utilization * 2), 1) AS projected_cpu_at_2x_load,

    -- Required cores for 2x load at 70% peak target
    CEILING(current_cpu_cores * (peak_cpu_utilization * 2) / 70.0) AS required_cores_2x_load,

    -- Headroom at current capacity
    ROUND(70.0 / NULLIF(peak_cpu_utilization, 0) * 100 - 100, 0) AS percent_growth_headroom,

    -- Time to 70% threshold (months) at 10% monthly growth
    CASE
        WHEN peak_cpu_utilization >= 70 THEN 0
        ELSE ROUND(LN(70.0 / peak_cpu_utilization) / LN(1.10), 1)
    END AS months_to_threshold
FROM current_capacity;

-- Step 4: Baseline CPU consumption per transaction type
-- (detailed query type analysis for optimization targeting)
WITH transaction_cpu_baseline AS (
    SELECT
        query_type,
        COUNT(*) AS query_count,
        AVG(cpu_time_ms) AS avg_cpu_ms,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY cpu_time_ms) AS p95_cpu_ms,
        SUM(cpu_time_ms) AS total_cpu_ms
    FROM query_performance_log
    WHERE log_timestamp >= NOW() - INTERVAL '7 days'
    GROUP BY query_type
)
SELECT
    query_type,
    query_count,
    ROUND(avg_cpu_ms, 2) AS avg_cpu_ms,
    ROUND(p95_cpu_ms, 2) AS p95_cpu_ms,
    ROUND(100.0 * total_cpu_ms / SUM(total_cpu_ms) OVER (), 1) AS pct_total_cpu,
    CASE
        WHEN avg_cpu_ms > 500 THEN 'OPTIMIZE'
        WHEN avg_cpu_ms > 100 THEN 'REVIEW'
        ELSE 'OK'
    END AS recommendation
FROM transaction_cpu_baseline
ORDER BY total_cpu_ms DESC
LIMIT 20;
```

| Workload Type | Typical QPS/Core | Utilization Target | Key Considerations |
|---|---|---|---|
| OLTP - Simple | 500-2000 | ≤60% | Per-query overhead dominates; more cores help parallelism |
| OLTP - Complex | 50-200 | ≤50% | Query complexity varies widely; size for peak complexity |
| OLAP / BI | 1-10 | ≤80% | Large queries; parallelism essential; size for concurrent reports |
| Mixed Workload | 100-500 | ≤55% | Contention between workloads; consider workload separation |
| Batch Processing | N/A | ≤90% | Throughput-focused; can tolerate higher sustained utilization |
Database workloads often benefit more from higher clock frequency than from additional cores, because many operations are inherently serial (single-query execution, lock management). When choosing hardware, prioritize frequency for OLTP and core count for parallel analytics.
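To make the table concrete, here is a minimal sketch (with hypothetical numbers, using the rule-of-thumb QPS-per-core ranges above) that estimates a core count for a projected peak query rate:

```python
import math


def estimate_cores(projected_peak_qps: float,
                   qps_per_core: float,
                   utilization_target: float) -> int:
    """Cores needed so the projected peak stays at or below the utilization target."""
    cores_at_full_utilization = projected_peak_qps / qps_per_core
    return math.ceil(cores_at_full_utilization / utilization_target)


# Example: complex OLTP workload, 12,000 QPS projected peak,
# ~100 QPS per core (conservative end of the range), 50% target utilization.
print(estimate_cores(12_000, qps_per_core=100, utilization_target=0.50))  # 240
```

A result this large is itself useful: it signals that the workload needs query optimization or horizontal scaling rather than a single bigger server.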
Memory is typically the highest-impact resource for database performance. Adequate memory enables data caching, reduces I/O operations, and provides workspace for complex query execution. Memory undersizing causes dramatic performance degradation as the database constantly reads from disk.
Memory consumers in database systems:

- Buffer pool / page cache holding recently accessed data and index pages
- Per-connection memory for session state, network buffers, and prepared statements
- Query work areas for sorts, hash joins, and aggregations
- Plan/statement caches, the data dictionary, and lock tables
- Operating system file cache and kernel overhead
Memory sizing formulas:
Memory requirements can be estimated using component-based modeling:
"""Database Memory Sizing Model Calculates required memory based on workload characteristicsand target performance metrics.""" from dataclasses import dataclassfrom typing import Dict @dataclassclass MemorySizingInputs: """Input parameters for memory sizing""" # Data characteristics total_data_size_gb: float # Total database size hot_data_percentage: float # Percentage actively accessed (0-100) index_size_gb: float # Total index size # Connection characteristics max_concurrent_connections: int # Peak connection count memory_per_connection_mb: int # Per-connection overhead (5-50MB typical) # Query characteristics max_concurrent_queries: int # Peak parallel queries avg_sort_memory_mb: int # Average sort/hash memory per query max_sort_memory_mb: int # Maximum single query memory # Performance targets target_cache_hit_ratio: float # Desired buffer hit ratio (0.95-0.99) target_response_time_ms: float # Desired P95 response time @dataclass class MemorySizingResult: """Memory sizing recommendations""" minimum_ram_gb: float recommended_ram_gb: float optimal_ram_gb: float buffer_pool_gb: float connection_memory_gb: float query_memory_gb: float os_reserved_gb: float rationale: str def calculate_memory_requirements(inputs: MemorySizingInputs) -> MemorySizingResult: """ Calculate memory requirements based on workload inputs. Uses working set model: enough RAM to cache actively-accessed data plus operational overhead. """ # 1. Buffer Pool Sizing # Goal: cache all hot data + all indexes for cache hit targets hot_data_gb = inputs.total_data_size_gb * (inputs.hot_data_percentage / 100) # To achieve 99% hit ratio, cache 100% of hot data # To achieve 95% hit ratio, cache ~80% of hot data (80/20 rule assumed) cache_coverage = { 0.99: 1.0, 0.98: 0.95, 0.97: 0.90, 0.95: 0.80, 0.90: 0.60 } coverage_factor = cache_coverage.get(inputs.target_cache_hit_ratio, 0.80) # Buffer pool should cache hot data + indexes buffer_pool_gb = (hot_data_gb * coverage_factor) + inputs.index_size_gb # 2. Connection Memory connection_memory_gb = ( inputs.max_concurrent_connections * inputs.memory_per_connection_mb ) / 1024 # 3. Query Execution Memory # Reserve for concurrent sorts + some safety factor for large queries avg_query_memory_gb = ( inputs.max_concurrent_queries * inputs.avg_sort_memory_mb ) / 1024 max_query_buffer_gb = inputs.max_sort_memory_mb / 1024 query_memory_gb = avg_query_memory_gb + max_query_buffer_gb # 4. Operating System Reserve (15%) base_ram = buffer_pool_gb + connection_memory_gb + query_memory_gb os_reserved_gb = base_ram * 0.15 # 5. 
Calculate final requirements minimum_ram_gb = buffer_pool_gb * 0.5 + connection_memory_gb + query_memory_gb + 4 # Bare minimum recommended_ram_gb = buffer_pool_gb + connection_memory_gb + query_memory_gb + os_reserved_gb optimal_ram_gb = recommended_ram_gb * 1.3 # 30% headroom for growth rationale = f"""Memory Sizing Analysis:-----------------------Working Set: {hot_data_gb:.1f} GB hot data + {inputs.index_size_gb:.1f} GB indexesTarget Hit Ratio: {inputs.target_cache_hit_ratio*100:.0f}% → Cache {coverage_factor*100:.0f}% of hot data Component Breakdown: Buffer Pool: {buffer_pool_gb:.1f} GB (cache {coverage_factor*100:.0f}% hot + all indexes) Connection Memory: {connection_memory_gb:.1f} GB ({inputs.max_concurrent_connections} × {inputs.memory_per_connection_mb} MB) Query Memory: {query_memory_gb:.1f} GB ({inputs.max_concurrent_queries} concurrent queries) OS Reserve: {os_reserved_gb:.1f} GB (15% overhead) Recommendations: Minimum (degraded): {minimum_ram_gb:.0f} GB - Expect cache pressure, increased I/O Recommended (target): {recommended_ram_gb:.0f} GB - Meets performance objectives Optimal (headroom): {optimal_ram_gb:.0f} GB - Room for growth and spikes""" return MemorySizingResult( minimum_ram_gb=round(minimum_ram_gb, 1), recommended_ram_gb=round(recommended_ram_gb, 1), optimal_ram_gb=round(optimal_ram_gb, 1), buffer_pool_gb=round(buffer_pool_gb, 1), connection_memory_gb=round(connection_memory_gb, 1), query_memory_gb=round(query_memory_gb, 1), os_reserved_gb=round(os_reserved_gb, 1), rationale=rationale ) # Example: E-commerce database sizingdef example_ecommerce_sizing(): inputs = MemorySizingInputs( total_data_size_gb=500, hot_data_percentage=20, # 20% actively accessed (recent orders, active products) index_size_gb=80, max_concurrent_connections=500, memory_per_connection_mb=15, # Medium connection overhead max_concurrent_queries=100, avg_sort_memory_mb=128, max_sort_memory_mb=2048, # Large report queries target_cache_hit_ratio=0.97, target_response_time_ms=50 ) result = calculate_memory_requirements(inputs) print(result.rationale) print(f"Final Recommendation: {result.recommended_ram_gb:.0f} GB RAM") print(f"Buffer Pool Setting: {result.buffer_pool_gb:.0f} GB") if __name__ == "__main__": example_ecommerce_sizing()For OLTP workloads, target 99%+ buffer cache hit ratio. Each percentage point matters exponentially—a drop from 99% to 98% hit ratio doubles disk I/O. For OLAP workloads scanning large datasets, lower hit ratios (90-95%) may be acceptable as sequential scans are efficient.
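The sensitivity to hit ratio is easy to see numerically. A small sketch with a hypothetical logical read rate:

```python
def physical_reads_per_sec(logical_reads_per_sec: float, cache_hit_ratio: float) -> float:
    """Disk reads are the logical reads that miss the buffer cache."""
    return logical_reads_per_sec * (1.0 - cache_hit_ratio)


logical = 200_000  # logical (buffer) reads per second, hypothetical
for ratio in (0.99, 0.98, 0.95):
    print(f"hit ratio {ratio:.2f} -> {physical_reads_per_sec(logical, ratio):,.0f} disk reads/sec")
# 0.99 -> 2,000   0.98 -> 4,000 (double)   0.95 -> 10,000
```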
Storage planning encompasses capacity (space), performance (IOPS and throughput), and reliability (redundancy and durability). Unlike CPU and memory which can sometimes be added on-the-fly, storage often requires careful upfront planning as expansion may involve data migration.
Storage components to plan:
| Component | Typical Size Ratio | I/O Characteristics | Planning Notes |
|---|---|---|---|
| Data Files | Base (1x) | Mixed read/write, random | Primary capacity driver |
| Index Files | 0.2x - 0.5x of data | Read-heavy, highly random | Grows with indexes; benefits from fast storage |
| Transaction Logs | 10-50% of daily change | Sequential write, append-only | High write IOPS; benefits from separate disk |
| Temp Space | 2-5x max concurrent query memory | High churn, sequential | Size for sort spills; can use fast ephemeral storage |
| Backup Storage | 2-5x database size | Sequential write/read | Consider compression; tiered storage viable |
| Archive Storage | Application-dependent | Write once, rare read | Cheapest tier; data lifecycle policy driven |
Capacity sizing calculation:
Storage capacity must account for current data, projected growth, operational overhead, and safety margins.
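In compact form, the arithmetic behind the capacity query below looks like this (a sketch with hypothetical inputs; the growth rate should come from your growth estimation work):

```python
def projected_storage_gb(current_gb: float,
                         monthly_growth_rate: float,
                         months: int,
                         safety_margin: float = 0.30) -> float:
    """Compound-growth projection plus a safety margin for overhead and spikes."""
    projected = current_gb * (1 + monthly_growth_rate) ** months
    return projected * (1 + safety_margin)


# Example: 1.2 TB today, 8% monthly growth, planning 12 months out with a 30% margin.
print(f"{projected_storage_gb(1200, 0.08, 12):,.0f} GB")  # ≈ 3,928 GB
```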
```sql
-- Comprehensive storage capacity planning query
WITH current_storage AS (
    SELECT
        -- Data file usage
        (SELECT SUM(data_length + index_length) / (1024*1024*1024)
         FROM information_schema.tables) AS data_index_current_gb,
        -- Current total allocated
        (SELECT SUM(total_allocated_gb)
         FROM storage_volumes
         WHERE purpose = 'database') AS total_allocated_gb,
        -- Free space
        (SELECT SUM(free_space_gb)
         FROM storage_volumes
         WHERE purpose = 'database') AS free_space_gb
),
growth_rates AS (
    -- From growth estimation analysis
    SELECT
        0.08 AS monthly_data_growth_rate,   -- 8% monthly
        0.10 AS monthly_index_growth_rate,  -- 10% monthly (faster than data)
        0.12 AS monthly_log_growth_rate     -- 12% monthly
),
storage_projections AS (
    SELECT
        c.data_index_current_gb,
        c.total_allocated_gb,
        c.free_space_gb,
        g.monthly_data_growth_rate,

        -- 12-month projection
        c.data_index_current_gb * POWER(1 + g.monthly_data_growth_rate, 12) AS data_12m_projection_gb,

        -- Storage components breakdown (current estimate)
        c.data_index_current_gb * 0.75 AS estimated_data_gb,
        c.data_index_current_gb * 0.25 AS estimated_index_gb,
        c.data_index_current_gb * 0.10 AS estimated_log_space_gb,
        c.data_index_current_gb * 0.15 AS estimated_temp_space_gb
    FROM current_storage c
    CROSS JOIN growth_rates g
)
SELECT
    -- Current state
    ROUND(data_index_current_gb, 1) AS current_data_index_gb,
    ROUND(total_allocated_gb, 1) AS total_allocated_gb,
    ROUND(free_space_gb, 1) AS free_space_gb,
    ROUND(100 - (free_space_gb / total_allocated_gb * 100), 1) AS utilization_pct,

    -- Growth trajectory
    ROUND(data_12m_projection_gb, 1) AS projected_12m_gb,
    ROUND((data_12m_projection_gb - data_index_current_gb), 1) AS growth_12m_gb,

    -- Required total storage with safety margins
    ROUND(data_12m_projection_gb * 1.20, 0) AS required_storage_12m_gb_20pct_margin,
    ROUND(data_12m_projection_gb * 1.50, 0) AS required_storage_12m_gb_50pct_margin,

    -- Time until current storage exhausted (at 85% as threshold)
    CASE
        WHEN monthly_data_growth_rate > 0 THEN
            ROUND(
                LN((total_allocated_gb * 0.85) / NULLIF(data_index_current_gb, 0))
                / LN(1 + monthly_data_growth_rate), 1)
        ELSE NULL
    END AS months_until_85pct_full,

    -- Component recommendations
    ROUND(estimated_data_gb * POWER(1 + monthly_data_growth_rate, 12), 0) AS recommended_data_storage_gb,
    ROUND(estimated_index_gb * POWER(1 + monthly_data_growth_rate * 1.2, 12), 0) AS recommended_index_storage_gb,
    ROUND(estimated_temp_space_gb * 2, 0) AS recommended_temp_storage_gb,
    ROUND(data_12m_projection_gb * 3, 0) AS recommended_backup_storage_gb
FROM storage_projections;

-- IOPS requirements estimation
WITH query_io_profile AS (
    SELECT
        -- Analyze actual I/O patterns from query log
        AVG(physical_reads) AS avg_reads_per_query,
        AVG(physical_writes) AS avg_writes_per_query,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY physical_reads) AS p95_reads,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY physical_writes) AS p95_writes,
        COUNT(*) / (7 * 24 * 3600.0) AS avg_queries_per_second
    FROM query_performance_log
    WHERE log_timestamp >= NOW() - INTERVAL '7 days'
),
iops_calculation AS (
    SELECT
        avg_reads_per_query,
        avg_writes_per_query,
        avg_queries_per_second,
        -- Average IOPS = QPS × (avg reads + avg writes)
        avg_queries_per_second * (avg_reads_per_query + avg_writes_per_query) AS avg_iops,
        -- Peak IOPS (using P95 and 3x QPS spike factor)
        (avg_queries_per_second * 3) * (p95_reads + p95_writes) AS estimated_peak_iops
    FROM query_io_profile
)
SELECT
    ROUND(avg_iops, 0) AS average_iops_required,
    ROUND(estimated_peak_iops, 0) AS peak_iops_required,
    ROUND(estimated_peak_iops * 0.70, 0) AS read_iops_estimate,   -- 70% reads typical
    ROUND(estimated_peak_iops * 0.30, 0) AS write_iops_estimate,
    -- Storage tier recommendation
    CASE
        WHEN estimated_peak_iops > 50000 THEN 'NVMe SSD (100K+ IOPS)'
        WHEN estimated_peak_iops > 10000 THEN 'Enterprise SSD (50-100K IOPS)'
        WHEN estimated_peak_iops > 2000 THEN 'SSD / Fast HDD RAID (10-50K IOPS)'
        ELSE 'Standard HDD RAID sufficient (2-10K IOPS)'
    END AS recommended_storage_tier
FROM iops_calculation;
```

Not all database storage needs the fastest tier. Place transaction logs and frequently-accessed indexes on NVMe. Put main data files on enterprise SSD. Archive old data to cheaper HDD or object storage. This tiered approach optimizes cost while maintaining performance where it matters.
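One way to write that tiering guidance down is a simple placement map; the tiers and reasons below are illustrative, not prescriptive:

```python
# Illustrative storage placement map following the tiering guidance above.
STORAGE_PLACEMENT = {
    "transaction_logs": {"tier": "NVMe SSD",              "reason": "sequential, latency-sensitive writes"},
    "hot_indexes":      {"tier": "NVMe SSD",              "reason": "highly random reads"},
    "data_files":       {"tier": "Enterprise SSD",        "reason": "mixed random read/write"},
    "temp_space":       {"tier": "Fast ephemeral SSD",    "reason": "high churn, recreatable"},
    "backups":          {"tier": "HDD / object storage",  "reason": "sequential, compressible"},
    "archive":          {"tier": "Object storage (cold)", "reason": "write once, rare reads"},
}

for component, placement in STORAGE_PLACEMENT.items():
    print(f"{component:17s} -> {placement['tier']:22s} ({placement['reason']})")
```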
Network capacity is often overlooked in database planning but becomes critical at scale. Network constraints can limit replication speed, backup windows, and client throughput regardless of how powerful the database server is.
Network capacity consumers:

- Client traffic: inbound queries and outbound result sets
- Replication streams to standbys and read replicas
- Backup and restore transfers
- Cluster coordination, health checks, and monitoring/metrics collection
Network sizing calculations:
```sql
-- Network capacity estimation for database workloads
WITH traffic_components AS (
    -- Client traffic estimation
    SELECT
        'client_inbound' AS traffic_type,
        SUM(query_bytes) / (7.0 * 86400) AS bytes_per_sec  -- 7-day average
    FROM query_traffic_log
    WHERE direction = 'inbound'
      AND log_timestamp >= NOW() - INTERVAL '7 days'

    UNION ALL

    SELECT
        'client_outbound',
        SUM(result_bytes) / (7.0 * 86400) AS bytes_per_sec
    FROM query_traffic_log
    WHERE direction = 'outbound'
      AND log_timestamp >= NOW() - INTERVAL '7 days'

    UNION ALL

    -- Replication traffic (equals change volume)
    SELECT
        'replication',
        (SELECT SUM(wal_bytes_generated) / (7.0 * 86400)
         FROM replication_stats
         WHERE stat_timestamp >= NOW() - INTERVAL '7 days')

    UNION ALL

    -- Backup traffic (daily full, assuming 8-hour window)
    SELECT
        'backup',
        (SELECT pg_database_size(current_database()) / (8.0 * 3600))
),
bandwidth_requirements AS (
    SELECT
        traffic_type,
        bytes_per_sec,
        bytes_per_sec * 8 / 1e9 AS gbps_average,   -- Convert to Gbps
        bytes_per_sec * 8 * 3 / 1e9 AS gbps_peak,  -- 3x for peak estimate
        bytes_per_sec * 8 / 1e6 AS mbps_average
    FROM traffic_components
)
SELECT
    traffic_type,
    ROUND(mbps_average, 1) AS avg_mbps,
    ROUND(gbps_peak * 1000, 1) AS peak_mbps,
    ROUND(gbps_peak, 3) AS peak_gbps
FROM bandwidth_requirements

UNION ALL

-- Total aggregation
SELECT
    'TOTAL REQUIRED' AS traffic_type,
    ROUND(SUM(mbps_average), 0) AS avg_mbps,
    ROUND(SUM(gbps_peak * 1000), 0) AS peak_mbps,
    ROUND(SUM(gbps_peak), 2) AS peak_gbps
FROM bandwidth_requirements;

-- Network interface sizing recommendation
-- NOTE: CTEs are scoped to a single statement, so repeat the traffic_components
-- definition above when running this query on its own.
WITH total_bandwidth AS (
    SELECT SUM(bytes_per_sec * 8 / 1e9) * 3 AS peak_gbps  -- 3x for peak
    FROM traffic_components
)
SELECT
    ROUND(peak_gbps, 2) AS peak_bandwidth_gbps,
    CASE
        WHEN peak_gbps > 20 THEN '100 Gbps bonded (25Gbps × 4)'
        WHEN peak_gbps > 8 THEN '25 Gbps (or 10 Gbps bonded)'
        WHEN peak_gbps > 3 THEN '10 Gbps'
        ELSE '1 Gbps sufficient'
    END AS recommended_network,
    ROUND(peak_gbps / 25.0 * 100, 0) AS pct_25gbps_utilization
FROM total_bandwidth;
```

For OLTP workloads, network latency often matters more than bandwidth. A 1ms round-trip time adds 1ms to every query—at 1,000 queries per second, that's a full second of cumulative added latency every second of wall-clock time. Locate databases close to application servers and use private networks.
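As a back-of-the-envelope complement to the bandwidth query above, here is a small sketch (with hypothetical sizes) for the backup-window term, which is often the largest single bandwidth consumer:

```python
def backup_bandwidth_gbps(database_size_gb: float, window_hours: float) -> float:
    """Sustained throughput needed to copy a full backup within the window."""
    bits = database_size_gb * 1024**3 * 8      # database size in bits
    return bits / (window_hours * 3600) / 1e9  # bits per second -> Gbps


# Example: 4 TB database, 8-hour nightly backup window.
print(f"{backup_bandwidth_gbps(4096, 8):.2f} Gbps sustained")  # ≈ 1.22 Gbps
```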
Database resources don't operate in isolation—they interact in complex ways. Understanding these interdependencies is essential for balanced resource allocation and avoiding hidden bottlenecks.
Balanced sizing principle:
Resource allocation should be balanced such that no single resource is dramatically over- or under-provisioned relative to others. An unbalanced configuration wastes resources—excess capacity in one dimension provides no benefit if another dimension is the bottleneck.
| If You Have | You Should Also Have | Rationale |
|---|---|---|
| 64 GB RAM | ≥8 CPU cores | 1 core per 8GB RAM processes buffered data efficiently |
| 500 GB hot data | ≥100 GB RAM | 20% cache ratio minimum for acceptable hit rates |
| 10,000 IOPS storage | ≥64 GB RAM | Memory reduces IOPS needs; balance investment |
| 1000 concurrent connections | ≥32 GB RAM + 16 cores | Connection overhead requires per-connection resources |
| 10 Gbps network | NVMe storage + ample CPU | Fast network can only deliver what other resources can produce |
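A minimal sketch of an automated balance check built from the ratios in the table (the thresholds are rules of thumb, not hard limits):

```python
def balance_warnings(ram_gb: float, cpu_cores: int, hot_data_gb: float,
                     max_connections: int) -> list[str]:
    """Flag configurations that are lopsided relative to the rule-of-thumb ratios."""
    warnings = []
    if cpu_cores < ram_gb / 8:
        warnings.append("Fewer than 1 core per 8 GB RAM: CPU may bottleneck cached workloads")
    if ram_gb < hot_data_gb * 0.20:
        warnings.append("RAM below ~20% of hot data: expect poor cache hit rates")
    if max_connections >= 1000 and (ram_gb < 32 or cpu_cores < 16):
        warnings.append("1000+ connections call for roughly 32 GB RAM and 16 cores")
    return warnings


# Example: plenty of RAM relative to cores, but too little for the hot data set.
print(balance_warnings(ram_gb=64, cpu_cores=4, hot_data_gb=500, max_connections=1200))
```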
When one resource becomes saturated, it often triggers apparent shortages in others. A memory shortage causing excessive I/O looks like a storage problem. Slow storage causing CPU waits looks like a CPU problem. Always look for the root bottleneck, not the symptoms.
Theoretical resource sizing provides a starting point, but validation through testing is essential before committing to production infrastructure. Testing reveals unexpected bottlenecks, validates assumptions, and builds confidence in sizing decisions.
Key validation metrics:

- Peak and average utilization for CPU, memory, storage, and network under test load
- I/O latency, swap activity, and buffer cache hit ratio
- Query latency percentiles (P50/P95/P99) measured against SLA targets
- Error and timeout rates as load approaches the projected peak
```sql
-- Comprehensive resource validation during load testing

-- CPU validation
SELECT
    'CPU' AS resource,
    MAX(cpu_utilization_pct) AS peak_utilization,
    AVG(cpu_utilization_pct) AS avg_utilization,
    MAX(cpu_wait_pct) AS max_io_wait,
    CASE
        WHEN MAX(cpu_utilization_pct) > 85 THEN 'UNDERSIZED'
        WHEN MAX(cpu_utilization_pct) > 70 THEN 'ADEQUATE'
        ELSE 'OVERSIZED'
    END AS assessment
FROM load_test_metrics
WHERE test_run_id = 'latest_load_test'

UNION ALL

-- Memory validation
SELECT
    'Memory' AS resource,
    MAX(100 - (free_memory_pct)) AS peak_utilization,
    AVG(100 - (free_memory_pct)) AS avg_utilization,
    MAX(swap_usage_pct) AS max_swap,
    CASE
        WHEN MAX(swap_usage_pct) > 5 THEN 'UNDERSIZED - Swapping'
        WHEN MAX(buffer_hit_ratio) < 0.95 THEN 'UNDERSIZED - Low cache hit'
        WHEN AVG(free_memory_pct) > 40 THEN 'OVERSIZED'
        ELSE 'ADEQUATE'
    END AS assessment
FROM load_test_metrics
WHERE test_run_id = 'latest_load_test'

UNION ALL

-- Storage I/O validation
SELECT
    'Storage IOPS' AS resource,
    MAX(total_iops) AS peak_iops,
    AVG(total_iops) AS avg_iops,
    MAX(avg_io_latency_ms) AS max_latency_ms,
    CASE
        WHEN MAX(avg_io_latency_ms) > 20 THEN 'UNDERSIZED - High latency'
        WHEN MAX(iops_utilization_pct) > 80 THEN 'UNDERSIZED - Near IOPS limit'
        WHEN MAX(iops_utilization_pct) < 30 THEN 'OVERSIZED'
        ELSE 'ADEQUATE'
    END AS assessment
FROM load_test_metrics
WHERE test_run_id = 'latest_load_test'

UNION ALL

-- Network validation
SELECT
    'Network' AS resource,
    MAX(network_throughput_mbps) AS peak_mbps,
    AVG(network_throughput_mbps) AS avg_mbps,
    MAX(network_latency_ms) AS max_latency_ms,
    CASE
        WHEN MAX(network_utilization_pct) > 70 THEN 'UNDERSIZED'
        WHEN MAX(network_utilization_pct) < 20 THEN 'OVERSIZED'
        ELSE 'ADEQUATE'
    END AS assessment
FROM load_test_metrics
WHERE test_run_id = 'latest_load_test';

-- Latency validation at target load levels
SELECT
    query_type,
    COUNT(*) AS query_count,
    ROUND(AVG(latency_ms), 1) AS avg_latency,
    ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY latency_ms), 1) AS p50,
    ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms), 1) AS p95,
    ROUND(PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms), 1) AS p99,
    CASE
        WHEN PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) > 500 THEN 'FAILED - P95 > 500ms'
        WHEN PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) > 200 THEN 'WARNING - P95 > 200ms'
        ELSE 'PASSED'
    END AS sla_assessment
FROM load_test_query_log
WHERE test_run_id = 'latest_load_test'
GROUP BY query_type;
```

Validate configurations at 120% of expected peak load, not at the exact projected peak. This confirms headroom exists for traffic spikes and forecast errors. If the system fails at 120%, it's undersized even if it passes at 100%.
Resource planning translates growth estimates into actionable infrastructure requirements. By systematically sizing CPU, memory, storage, network, and I/O based on workload analysis, DBAs can provision systems that meet performance objectives while avoiding wasteful over-spending.
What's next:
With resource requirements defined, the next challenge is determining how to add capacity when needed. The next page covers Scaling Strategies—vertical vs. horizontal scaling, read replicas, sharding, and the trade-offs of each approach.
You now understand how to translate growth projections into concrete resource requirements. This foundation enables informed infrastructure decisions that balance performance, cost, and operational complexity. Next, we'll explore the strategies for adding capacity as demand grows.