Latency tells you how fast. Throughput tells you how much. Availability tells you how reliable. But resource utilization tells you how efficiently your system uses the hardware it runs on—and whether you're about to hit a wall.
Resource utilization metrics—CPU, memory, disk I/O, network bandwidth—are the vital signs of your infrastructure. They reveal whether your system is starving for resources, wasting money on over-provisioning, or approaching dangerous saturation. They're the bridge between abstract performance metrics and concrete capacity planning decisions.
Understanding resource utilization enables you to answer critical questions: How many more users can this server handle? When will we need to scale? Is our database CPU-bound or I/O-bound? Are we paying for resources we don't use? Are we about to run out of memory?
This page explores the four primary resources—CPU, memory, disk, and network—covering how to measure utilization, interpret the numbers, understand when high utilization is concerning vs. efficient, and use utilization data for capacity planning. You'll develop the operational intuition that distinguishes engineers who understand their systems at the hardware level.
Every computing system ultimately consumes four fundamental resources. Understanding their characteristics, measurement, and constraints is foundational to performance engineering.
| Resource | What It Does | Units | Typical Bottleneck Pattern |
|---|---|---|---|
| CPU | Executes instructions, performs computation | Cores, %, cycles | Compute-intensive: encryption, compression, parsing |
| Memory (RAM) | Stores active data for fast access | GB, % utilization | Large datasets, caching, connection metadata |
| Disk (Storage) | Persists data, provides I/O | IOPS, MB/s, latency | Database operations, logging, file serving |
| Network | Transfers data between systems | Mbps, packets/s | API calls, data replication, user traffic |
Resource Interactions
Resources don't exist in isolation; they interact in complex ways. Memory pressure forces the OS to evict cache and swap, which drives disk I/O; slow disks leave CPUs idle in iowait; packet loss triggers retransmissions that consume extra CPU and bandwidth.
The Bottleneck Principle
Performance is limited by the scarcest resource, the bottleneck. Adding more of a non-bottleneck resource does nothing: a disk-bound database gains nothing from extra CPU cores.
Identifying the bottleneck is the first step in any performance investigation.
CPU utilization measures what percentage of available processing capacity is being used. It's the most commonly monitored resource metric but also one of the most misunderstood.
How CPU Utilization Is Measured
CPU time is categorized into states (on Linux/Unix systems): user (application code), system (kernel code), idle, iowait (idle while waiting on I/O), and steal (time a virtual CPU spent waiting while the hypervisor ran other guests).
CPU Utilization = (user + system) / (user + system + iowait + idle + steal) × 100%
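To make the formula concrete, here is a minimal sketch that samples the aggregate counters in /proc/stat twice and computes utilization and iowait over the interval (Linux only; the field layout and one-second sampling window are assumptions, and a real agent would also track per-core values):

```python
import time

def read_cpu_counters():
    """Read aggregate CPU time counters (jiffies) from the first line of /proc/stat."""
    with open("/proc/stat") as f:
        parts = f.readline().split()
    # cpu  user nice system idle iowait irq softirq steal ...
    user, nice, system, idle, iowait, irq, softirq, steal = map(int, parts[1:9])
    busy = user + nice + system + irq + softirq
    total = busy + idle + iowait + steal
    return busy, iowait, total

def cpu_utilization(interval_s=1.0):
    """CPU utilization (%) and iowait (%) over a sampling interval."""
    busy1, iowait1, total1 = read_cpu_counters()
    time.sleep(interval_s)
    busy2, iowait2, total2 = read_cpu_counters()
    delta_total = (total2 - total1) or 1
    return (100.0 * (busy2 - busy1) / delta_total,
            100.0 * (iowait2 - iowait1) / delta_total)

if __name__ == "__main__":
    util, iowait = cpu_utilization()
    print(f"CPU: {util:.1f}%  iowait: {iowait:.1f}%")
```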
Multi-Core Considerations
Modern servers have multiple cores, so "100% CPU" means different things: 100% of one core (a single saturated thread) is very different from 100% aggregate across all cores (the whole machine saturated).
Understanding per-core vs. aggregate utilization matters: a single-threaded process can be completely CPU-bound while an 8-core machine reports only ~12% aggregate utilization.
High iowait looks like CPU utilization in some tools but actually means CPUs are waiting for slow I/O. If you see high iowait, your bottleneck is disk, not CPU. Adding CPU won't help—you need faster storage or better I/O patterns.
What CPU Utilization Levels Mean:
| Utilization | Interpretation | Typical Action |
|---|---|---|
| 0-30% | Underutilized; overpaying for capacity | Consider downsizing or consolidating |
| 30-60% | Healthy headroom for traffic spikes | Normal operating range |
| 60-80% | Well-utilized; limited spike headroom | Monitor closely, plan scaling |
| 80-90% | High utilization; risk of saturation | Active scaling needed |
| 90-100% | Saturated; latency degrading rapidly | Immediate action required |
CPU-Intensive Operations
Recognize what drives high CPU: encryption and hashing, compression, serialization and parsing, regular-expression matching, image and video processing, and garbage collection in managed runtimes.
Memory (RAM) provides fast storage for active data. Unlike CPU, which is a "renewable" resource (freed immediately when computation completes), memory is a "depletable" resource that must be explicitly managed.
Memory Metrics
Total Memory = Used + Free + Buffers + Cached
A Common Misconception:
Low free memory doesn't mean memory problems! Linux aggressively uses free memory for disk caching. A system with 1% free memory but 40% cached is healthy—the cache can be evicted when applications need memory.
What Actually Matters: Available Memory
Available = Free + Buffers + Cached (reclaimable)
This is the memory applications can actually use. Monitor available memory, not free memory.
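A minimal sketch of the same distinction, assuming a Linux kernel recent enough to expose MemAvailable in /proc/meminfo:

```python
def meminfo():
    """Parse /proc/meminfo into a dict of values in kB."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])
    return values

def memory_report():
    m = meminfo()
    total = m["MemTotal"]
    reclaimable = m.get("Buffers", 0) + m.get("Cached", 0)
    print(f"free:      {100 * m['MemFree'] / total:5.1f}%  (misleading on its own)")
    print(f"cached:    {100 * reclaimable / total:5.1f}%  (reclaimable when applications need it)")
    print(f"available: {100 * m['MemAvailable'] / total:5.1f}%  (what applications can actually use)")

if __name__ == "__main__":
    memory_report()
```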
When memory is exhausted and no more can be reclaimed, the Linux OOM (Out-Of-Memory) Killer activates, terminating processes to free memory. This is a catastrophic failure—your application dies. Memory exhaustion should trigger earlier warnings, not surprise termination.
| Available Memory | Interpretation | Action |
|---|---|---|
| > 50% | Abundant headroom | Consider rightsizing down |
| 30-50% | Healthy operating range | Normal, monitor trends |
| 15-30% | Getting tight, cache pressure | Plan capacity increase |
| 5-15% | Critical, frequent cache eviction | Scale immediately |
| < 5% | Dangerous, OOM risk imminent | Emergency response |
Memory-Intensive Operations
Recognize what consumes memory: in-process caches, large result sets loaded all at once, per-connection buffers and metadata, large heaps in managed runtimes, and in-memory indexes.
Memory Leaks
Memory leaks cause utilization to grow over time even when load is flat: memory is allocated but never released, so usage climbs steadily until the OOM killer intervenes or the process is restarted. Watch for used memory that rises monotonically across days, as sketched below.
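One way to surface this early is to track a process's resident set size over time and alert on sustained growth; a rough sketch follows (the psutil dependency, the process id, and the alert threshold are all illustrative assumptions):

```python
import time
import psutil  # third-party: pip install psutil

def watch_rss(pid, samples=60, interval_s=60, growth_alert_mb=100):
    """Sample a process's RSS and flag sustained, monotonic growth.

    A real leak detector would observe hours or days of data and account
    for load; this only illustrates the trend check.
    """
    proc = psutil.Process(pid)
    history = []
    for _ in range(samples):
        history.append(proc.memory_info().rss / (1024 * 1024))  # MB
        time.sleep(interval_s)
    growth = history[-1] - history[0]
    monotonic = all(later >= earlier for earlier, later in zip(history, history[1:]))
    if monotonic and growth > growth_alert_mb:
        print(f"possible leak: RSS grew {growth:.0f} MB and never dropped")
    return history
```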
Disk I/O is often the most constrained resource in database-heavy systems. Unlike CPU and network, which are fast, disk operations involve physical or flash storage with inherent latency.
Key Disk Metrics
| Metric | What It Measures | Typical HDD Value | Typical SSD/NVMe Value |
|---|---|---|---|
| IOPS | Operations per second | 100-200 (random) | 50,000-500,000+ |
| Throughput (MB/s) | Data transfer rate | 100-200 MB/s | 500-7,000 MB/s |
| Latency | Time per operation | 5-15ms | 0.01-0.5ms |
| Queue Depth | Pending I/O operations | < 2 healthy | < 32 healthy |
| Utilization % | Time disk is busy | < 60% healthy | < 80% healthy |
Understanding Disk Utilization
Disk utilization (reported by iostat as %util) shows what percentage of time the disk is handling requests. But interpretation differs by disk type: an HDD services roughly one request at a time, so 100% util means it is saturated, while SSDs and NVMe devices handle many requests in parallel and can report near-100% util with capacity to spare.
Queue Depth: A Better Saturation Indicator
Queue depth (average number of pending I/O requests) often reveals saturation better than utilization: a growing queue means requests arrive faster than the device can service them, and every queued request adds latency.
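As a rough sketch of where these numbers come from, %util and average queue depth can be derived from the per-device counters in /proc/diskstats (Linux only; the device name is an assumption, and tools like iostat do the same bookkeeping more carefully):

```python
import time

def disk_counters(device):
    """Return (ms doing I/O, weighted ms doing I/O) for a block device from /proc/diskstats."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                # fields[12] = milliseconds spent doing I/O
                # fields[13] = weighted milliseconds spent doing I/O (time x queue length)
                return int(fields[12]), int(fields[13])
    raise ValueError(f"device {device!r} not found")

def disk_pressure(device="sda", interval_s=1.0):
    """Approximate %util and average queue depth over a sampling interval."""
    io_ms1, weighted_ms1 = disk_counters(device)
    time.sleep(interval_s)
    io_ms2, weighted_ms2 = disk_counters(device)
    interval_ms = interval_s * 1000
    util_pct = 100.0 * (io_ms2 - io_ms1) / interval_ms
    avg_queue_depth = (weighted_ms2 - weighted_ms1) / interval_ms
    return util_pct, avg_queue_depth

if __name__ == "__main__":
    print(disk_pressure())
```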
HDDs are 100× slower at random I/O than sequential I/O (seek time). SSDs have similar performance for both. If you're stuck with HDDs, optimizing for sequential access patterns (append-only logs, sequential scans) dramatically improves performance. SSDs are more forgiving of random access patterns.
Disk-Intensive Operations
Common drivers of disk I/O: database reads and writes, transaction and application logging, and file serving.
Flash Storage Considerations
SSDs and NVMe have different failure patterns: flash cells wear out after a finite number of write cycles, sustained writes can trigger garbage-collection pauses and write amplification, and performance degrades as the drive fills up.
Network resources connect distributed systems. Network constraints manifest differently from compute resources—physical bandwidth limits, latency costs, and packet processing overhead all matter.
Key Network Metrics
| Metric | What It Measures | Unit | Concern Threshold |
|---|---|---|---|
| Bandwidth Utilization | Data throughput vs. capacity | % | > 70% sustained |
| Packets Per Second | Packet volume | pps | Depends on NIC (millions on modern NICs) |
| Latency (RTT) | Round-trip time | ms | Increases under congestion |
| Packet Loss | Lost/dropped packets | % | > 0.1% problematic |
| TCP Retransmissions | Packets needing resend | count/s | Increases under loss |
| Connection Count | Active connections | count | OS limit: ~65K per IP pair |
Bandwidth vs. Packet Rate
Network interfaces can be limited by either bandwidth (bits per second) or packet rate (packets per second); whichever ceiling is reached first becomes the bottleneck.
Small packets can exhaust packet rate before bandwidth. A 10 Gbps link at 1M pps handles ~10,000 bits per packet = 1,250 bytes/packet. Smaller packets (like ACK-heavy workloads) hit packet limits first.
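To reason about which ceiling binds first for a given workload, a quick sketch (the link speed, NIC packet-rate limit, and packet sizes below are illustrative assumptions):

```python
def binding_limit(link_gbps, nic_max_pps, avg_packet_bytes):
    """Report whether bandwidth or packet rate saturates first, and the achievable Gbps."""
    pps_needed_for_full_bandwidth = link_gbps * 1e9 / (avg_packet_bytes * 8)
    if pps_needed_for_full_bandwidth > nic_max_pps:
        achievable_gbps = nic_max_pps * avg_packet_bytes * 8 / 1e9
        return "packet rate", achievable_gbps
    return "bandwidth", link_gbps

# 10 Gbps link, 1M pps NIC: small packets hit the packet-rate ceiling long before 10 Gbps
print(binding_limit(10, 1_000_000, 100))    # ('packet rate', 0.8)
print(binding_limit(10, 1_000_000, 1500))   # ('bandwidth', 10)
```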
Network Latency Sources
- Intra-datacenter latency: ~0.1-0.5 ms
- Cross-region latency: 20-100 ms
- Intercontinental: 100-300 ms
Cloud providers charge for cross-region and internet-bound traffic, often $0.02-0.12 per GB. High-volume services can have network egress as their largest infrastructure cost. Monitor egress carefully and design to minimize cross-region data transfer.
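As a back-of-envelope illustration (the per-GB rate and monthly volume are hypothetical, not quoted from any provider):

```python
def monthly_egress_cost(gb_per_month, rate_per_gb=0.09):
    """Estimate monthly egress spend at a flat per-GB rate (ignores tiered pricing)."""
    return gb_per_month * rate_per_gb

# 100 TB/month of internet-bound traffic at $0.09/GB
print(f"${monthly_egress_cost(100_000):,.0f}/month")  # $9,000/month
```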
Network-Intensive Operations
Common drivers of network load: API calls between services, data replication, and user-facing traffic.
Resources become constrained not just from high utilization, but from contention—multiple consumers competing for the same resource simultaneously.
Types of Contention
Noisy Neighbors
In shared environments (cloud, Kubernetes, multi-tenant), other workloads compete for the same physical resources: CPU time on oversubscribed hosts (visible as steal time), shared disk I/O, shared network bandwidth, and shared memory bandwidth and caches.
Symptoms of noisy neighbors: latency that varies even though your own load hasn't changed, elevated CPU steal time, and I/O or network latency spikes that don't correlate with your traffic.
Use dedicated instances for latency-sensitive workloads. Kubernetes node affinity/anti-affinity rules can separate noisy workloads. Provisioned IOPS storage eliminates storage noisy neighbors. The premium cost is often worth the performance predictability.
Resource utilization data is the foundation of capacity planning—projecting future needs based on current usage and growth patterns.
The Capacity Planning Process
The process builds directly on utilization data: measure current utilization, calculate headroom against a safe threshold, project runway from your growth rate, and trigger scaling early enough to cover lead time.
Calculating Headroom
Headroom = (Safe Threshold - Current Utilization) / Current Utilization × 100%
Example: CPU at 50% utilization, 70% safe threshold:
Headroom = (70% - 50%) / 50% = 40% growth capacity
You can grow 40% before hitting the safe threshold.
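The same calculation as a small helper, using the example above:

```python
def headroom_pct(current_util_pct, safe_threshold_pct):
    """Growth capacity (%) before current utilization reaches the safe threshold."""
    return (safe_threshold_pct - current_util_pct) / current_util_pct * 100

print(headroom_pct(50, 70))  # 40.0 -> traffic can grow 40% before hitting the 70% threshold
```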
| Resource | Safe Sustained Threshold | Reason |
|---|---|---|
| CPU | 70-80% | Leaves room for traffic spikes |
| Memory (Available) | > 20% | Buffer for temporary allocations |
| Disk I/O | 60-70% | Queue depth grows rapidly beyond this |
| Network | 60-70% | Congestion and retransmissions increase |
Runway Calculation
Runway (months) = Headroom / Monthly Growth Rate
Example: 40% headroom, 10% monthly growth:
Runway = 40% / 10% = 4 months
You have 4 months before hitting the safe threshold.
Scaling Lead Time
Account for how long scaling takes: adding stateless application instances may take minutes, but resizing a database, warming caches, or rebalancing data can take days.
Start scaling before you hit the threshold:
Scale Trigger = Threshold - (Growth Rate × Lead Time)
With 7-day database scaling lead time and 10% monthly (~2.5% weekly) growth:
Trigger at 70% - (2.5% × 1 week) = 67.5%
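Putting runway and the scale trigger together as a sketch that mirrors the formulas above (growth is treated as utilization percentage points per month, matching the simplified examples):

```python
def runway_months(headroom_pct, monthly_growth_pct):
    """Months until the safe threshold is reached at the current growth rate."""
    return headroom_pct / monthly_growth_pct

def scale_trigger_pct(threshold_pct, monthly_growth_pct, lead_time_weeks):
    """Utilization at which scaling must start so new capacity lands before the threshold."""
    weekly_growth_pct = monthly_growth_pct / 4
    return threshold_pct - weekly_growth_pct * lead_time_weeks

print(runway_months(40, 10))          # 4.0 months of runway
print(scale_trigger_pct(70, 10, 1))   # 67.5 -> start scaling the database at 67.5%
```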
Monitor cost per unit of work ($/request, $/user, etc.). If efficiency is declining (cost per request increasing), you have scaling inefficiencies—perhaps coordination overhead, underutilized nodes, or architectural bottlenecks that worsen at scale.
Utilization metrics reveal not just when to scale up, but when to scale down. Over-provisioning wastes money; right-sizing matches resources to actual needs.
Identifying Over-Provisioning
Signs you're over-provisioned: CPU sustained below 30%, available memory consistently above 50%, disk and network well under their concern thresholds, and instances that never approach their limits even at peak traffic.
Each under-utilized resource is money paid for capacity never used.
| Resource | If Under-Utilized | If Over-Utilized |
|---|---|---|
| CPU | Smaller instance, fewer cores | Larger instance, horizontal scale, optimize code |
| Memory | Smaller instance, reduce heap | Larger instance, optimize allocations, reduce caching |
| Disk | Smaller disk, lower IOPS tier | Faster disk (HDD→SSD→NVMe), add caching |
| Network | Lower bandwidth tier, smaller instance | Upgrade instance, CDN offload, compression |
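The table can also be read as a simple decision helper; the thresholds below are the illustrative values used earlier on this page, and the suggestions are starting points rather than prescriptions:

```python
def rightsizing_hint(resource, sustained_utilization_pct):
    """Map sustained utilization of a resource to a rough rightsizing action."""
    actions = {
        "cpu":     ("smaller instance, fewer cores", "larger instance, horizontal scale, or optimize code"),
        "memory":  ("smaller instance, reduce heap", "larger instance, optimize allocations, reduce caching"),
        "disk":    ("smaller disk, lower IOPS tier",  "faster disk (HDD -> SSD -> NVMe), add caching"),
        "network": ("lower bandwidth tier",           "upgrade instance, CDN offload, compression"),
    }
    under, over = actions[resource]
    if sustained_utilization_pct < 30:
        return f"under-utilized: {under}"
    if sustained_utilization_pct > 80:
        return f"over-utilized: {over}"
    return "within normal range: no change needed"

print(rightsizing_hint("cpu", 12))   # under-utilized: smaller instance, fewer cores
print(rightsizing_hint("disk", 90))  # over-utilized: faster disk (HDD -> SSD -> NVMe), add caching
```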
Right-Sizing Methodology
Measure peak utilization over a representative period, add headroom for spikes, choose the smallest configuration that satisfies it, and re-measure after the change.
Resource Matching
Cloud providers offer specialized instance types: compute-optimized (high CPU-to-memory ratio), memory-optimized, storage-optimized (high IOPS and local NVMe), and network-optimized variants.
Match instance type to your bottleneck resource for best cost efficiency.
For stable baseline utilization, reserved capacity (1-3 year commitments) costs 30-70% less than on-demand. Use on-demand for variable/burst capacity. Right-size before reserving—locking in an oversized instance for 3 years is expensive.
Resource utilization metrics bridge abstract performance goals with concrete infrastructure decisions: they reveal where the bottleneck is, how much headroom remains, when to scale up, and when it's safe to scale down.
Module Complete:
You've now completed the Performance Metrics module. You understand the five essential metrics for system performance: latency, throughput, percentiles, availability, and resource utilization.
These metrics form the vocabulary for all performance discussions and the foundation for every optimization decision.
You now understand resource utilization as the bridge between application performance and infrastructure capacity. You can read utilization metrics, identify bottlenecks, plan capacity, and right-size resources. Combined with latency, throughput, percentiles, and availability, you have a complete toolkit for performance engineering.