Latency tells you how fast. Throughput tells you how much. Availability tells you how reliable. But resource utilization tells you how efficiently your system uses the hardware it runs on—and whether you're about to hit a wall.
Resource utilization metrics—CPU, memory, disk I/O, network bandwidth—are the vital signs of your infrastructure. They reveal whether your system is starving for resources, wasting money on over-provisioning, or approaching dangerous saturation. They're the bridge between abstract performance metrics and concrete capacity planning decisions.
Understanding resource utilization enables you to answer critical questions: How many more users can this server handle? When will we need to scale? Is our database CPU-bound or I/O-bound? Are we paying for resources we don't use? Are we about to run out of memory?
This page explores the four primary resources—CPU, memory, disk, and network—covering how to measure utilization, interpret the numbers, understand when high utilization is concerning vs. efficient, and use utilization data for capacity planning. You'll develop the operational intuition that distinguishes engineers who understand their systems at the hardware level.
Every computing system ultimately consumes four fundamental resources. Understanding their characteristics, measurement, and constraints is foundational to performance engineering.
| Resource | What It Does | Units | Typical Bottleneck Pattern |
|---|---|---|---|
| CPU | Executes instructions, performs computation | Cores, %, cycles | Compute-intensive: encryption, compression, parsing |
| Memory (RAM) | Stores active data for fast access | GB, % utilization | Large datasets, caching, connection metadata |
| Disk (Storage) | Persists data, provides I/O | IOPS, MB/s, latency | Database operations, logging, file serving |
| Network | Transfers data between systems | Mbps, packets/s | API calls, data replication, user traffic |
Resource Interactions
Resources don't exist in isolation; they interact in complex ways. Memory pressure forces the OS to evict cache and swap, which drives disk I/O; slow disks leave CPUs idle in iowait; packet loss triggers retransmissions that consume extra CPU and bandwidth.
The Bottleneck Principle
Performance is limited by the scarcest resource, the bottleneck. Adding more of a non-bottleneck resource does nothing: a disk-bound database gains nothing from extra CPU cores.
Identifying the bottleneck is the first step in any performance investigation.
CPU utilization measures what percentage of available processing capacity is being used. It's the most commonly monitored resource metric but also one of the most misunderstood.
How CPU Utilization Is Measured
CPU time is categorized into states (on Linux/Unix systems): user (application code), system (kernel code), idle, iowait (idle while waiting on I/O), and steal (time a virtual CPU spent waiting while the hypervisor ran other guests).
CPU Utilization = (user + system) / (user + system + iowait + idle + steal) × 100%
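To make the formula concrete, here is a minimal sketch that samples the aggregate counters in /proc/stat twice and computes utilization and iowait over the interval (Linux only; the field layout and one-second sampling window are assumptions, and a real agent would also track per-core values):

```python
import time

def read_cpu_counters():
    """Read aggregate CPU time counters (jiffies) from the first line of /proc/stat."""
    with open("/proc/stat") as f:
        parts = f.readline().split()
    # cpu  user nice system idle iowait irq softirq steal ...
    user, nice, system, idle, iowait, irq, softirq, steal = map(int, parts[1:9])
    busy = user + nice + system + irq + softirq
    total = busy + idle + iowait + steal
    return busy, iowait, total

def cpu_utilization(interval_s=1.0):
    """CPU utilization (%) and iowait (%) over a sampling interval."""
    busy1, iowait1, total1 = read_cpu_counters()
    time.sleep(interval_s)
    busy2, iowait2, total2 = read_cpu_counters()
    delta_total = (total2 - total1) or 1
    return (100.0 * (busy2 - busy1) / delta_total,
            100.0 * (iowait2 - iowait1) / delta_total)

if __name__ == "__main__":
    util, iowait = cpu_utilization()
    print(f"CPU: {util:.1f}%  iowait: {iowait:.1f}%")
```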
Multi-Core Considerations
Modern servers have multiple cores, so "100% CPU" means different things: 100% of one core (a single saturated thread) is very different from 100% aggregate across all cores (the whole machine saturated).
Understanding per-core vs. aggregate utilization matters: a single-threaded process can be completely CPU-bound while an 8-core machine reports only ~12% aggregate utilization.
High iowait looks like CPU utilization in some tools but actually means CPUs are waiting for slow I/O. If you see high iowait, your bottleneck is disk, not CPU. Adding CPU won't help—you need faster storage or better I/O patterns.
What CPU Utilization Levels Mean:
| Utilization | Interpretation | Typical Action |
|---|---|---|
| 0-30% | Underutilized; overpaying for capacity | Consider downsizing or consolidating |
| 30-60% | Healthy headroom for traffic spikes | Normal operating range |
| 60-80% | Well-utilized; limited spike headroom | Monitor closely, plan scaling |
| 80-90% | High utilization; risk of saturation | Active scaling needed |
| 90-100% | Saturated; latency degrading rapidly | Immediate action required |
CPU-Intensive Operations
Recognize what drives high CPU: encryption and hashing, compression, serialization and parsing, regular-expression matching, image and video processing, and garbage collection in managed runtimes.
Memory (RAM) provides fast storage for active data. Unlike CPU, which is a "renewable" resource (freed immediately when computation completes), memory is a "depletable" resource that must be explicitly managed.
Memory Metrics
Total Memory = Used + Free + Buffers + Cached
A Common Misconception:
Low free memory doesn't mean memory problems! Linux aggressively uses free memory for disk caching. A system with 1% free memory but 40% cached is healthy—the cache can be evicted when applications need memory.
What Actually Matters: Available Memory
Available = Free + Buffers + Cached (reclaimable)
This is the memory applications can actually use. Monitor available memory, not free memory.
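A minimal sketch of the same distinction, assuming a Linux kernel recent enough to expose MemAvailable in /proc/meminfo:

```python
def meminfo():
    """Parse /proc/meminfo into a dict of values in kB."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])
    return values

def memory_report():
    m = meminfo()
    total = m["MemTotal"]
    reclaimable = m.get("Buffers", 0) + m.get("Cached", 0)
    print(f"free:      {100 * m['MemFree'] / total:5.1f}%  (misleading on its own)")
    print(f"cached:    {100 * reclaimable / total:5.1f}%  (reclaimable when applications need it)")
    print(f"available: {100 * m['MemAvailable'] / total:5.1f}%  (what applications can actually use)")

if __name__ == "__main__":
    memory_report()
```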
When memory is exhausted and no more can be reclaimed, the Linux OOM (Out-Of-Memory) Killer activates, terminating processes to free memory. This is a catastrophic failure—your application dies. Memory exhaustion should trigger earlier warnings, not surprise termination.
| Available Memory | Interpretation | Action |
|---|---|---|
| > 50% | Abundant headroom | Consider rightsizing down |
| 30-50% | Healthy operating range | Normal, monitor trends |
| 15-30% | Getting tight, cache pressure | Plan capacity increase |
| 5-15% | Critical, frequent cache eviction | Scale immediately |
| < 5% | Dangerous, OOM risk imminent | Emergency response |
Memory-Intensive Operations
Recognize what consumes memory: in-process caches, large result sets loaded all at once, per-connection buffers and metadata, large heaps in managed runtimes, and in-memory indexes.
Memory Leaks
Memory leaks cause utilization to grow over time even when load is flat: memory is allocated but never released, so usage climbs steadily until the OOM killer intervenes or the process is restarted. Watch for used memory that rises monotonically across days, as sketched below.
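One way to surface this early is to track a process's resident set size over time and alert on sustained growth; a rough sketch follows (the psutil dependency, the process id, and the alert threshold are all illustrative assumptions):

```python
import time
import psutil  # third-party: pip install psutil

def watch_rss(pid, samples=60, interval_s=60, growth_alert_mb=100):
    """Sample a process's RSS and flag sustained, monotonic growth.

    A real leak detector would observe hours or days of data and account
    for load; this only illustrates the trend check.
    """
    proc = psutil.Process(pid)
    history = []
    for _ in range(samples):
        history.append(proc.memory_info().rss / (1024 * 1024))  # MB
        time.sleep(interval_s)
    growth = history[-1] - history[0]
    monotonic = all(later >= earlier for earlier, later in zip(history, history[1:]))
    if monotonic and growth > growth_alert_mb:
        print(f"possible leak: RSS grew {growth:.0f} MB and never dropped")
    return history
```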
Disk I/O is often the most constrained resource in database-heavy systems. Unlike CPU and network, which are fast, disk operations involve physical or flash storage with inherent latency.
Key Disk Metrics
| Metric | What It Measures | Typical HDD Value | Typical SSD/NVMe Value |
|---|---|---|---|
| IOPS | Operations per second | 100-200 (random) | 50,000-500,000+ |
| Throughput (MB/s) | Data transfer rate | 100-200 MB/s | 500-7,000 MB/s |
| Latency | Time per operation | 5-15ms | 0.01-0.5ms |
| Queue Depth | Pending I/O operations | < 2 healthy | < 32 healthy |
| Utilization % | Time disk is busy | < 60% healthy | < 80% healthy |
Understanding Disk Utilization
Disk utilization (reported by iostat as %util) shows what percentage of time the disk is handling requests. But interpretation differs by disk type: an HDD services roughly one request at a time, so 100% util means it is saturated, while SSDs and NVMe devices handle many requests in parallel and can report near-100% util with capacity to spare.
Queue Depth: A Better Saturation Indicator
Queue depth (average number of pending I/O requests) often reveals saturation better than utilization: a growing queue means requests arrive faster than the device can service them, and every queued request adds latency.
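As a rough sketch of where these numbers come from, %util and average queue depth can be derived from the per-device counters in /proc/diskstats (Linux only; the device name is an assumption, and tools like iostat do the same bookkeeping more carefully):

```python
import time

def disk_counters(device):
    """Return (ms doing I/O, weighted ms doing I/O) for a block device from /proc/diskstats."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                # fields[12] = milliseconds spent doing I/O
                # fields[13] = weighted milliseconds spent doing I/O (time x queue length)
                return int(fields[12]), int(fields[13])
    raise ValueError(f"device {device!r} not found")

def disk_pressure(device="sda", interval_s=1.0):
    """Approximate %util and average queue depth over a sampling interval."""
    io_ms1, weighted_ms1 = disk_counters(device)
    time.sleep(interval_s)
    io_ms2, weighted_ms2 = disk_counters(device)
    interval_ms = interval_s * 1000
    util_pct = 100.0 * (io_ms2 - io_ms1) / interval_ms
    avg_queue_depth = (weighted_ms2 - weighted_ms1) / interval_ms
    return util_pct, avg_queue_depth

if __name__ == "__main__":
    print(disk_pressure())
```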
HDDs are 100× slower at random I/O than sequential I/O (seek time). SSDs have similar performance for both. If you're stuck with HDDs, optimizing for sequential access patterns (append-only logs, sequential scans) dramatically improves performance. SSDs are more forgiving of random access patterns.
Disk-Intensive Operations
Common drivers of disk I/O: database reads and writes, transaction and application logging, and file serving.
Flash Storage Considerations
SSDs and NVMe have different failure patterns: flash cells wear out after a finite number of write cycles, sustained writes can trigger garbage-collection pauses and write amplification, and performance degrades as the drive fills up.
Network resources connect distributed systems. Network constraints manifest differently from compute resources—physical bandwidth limits, latency costs, and packet processing overhead all matter.
Key Network Metrics
| Metric | What It Measures | Unit | Concern Threshold |
|---|---|---|---|
| Bandwidth Utilization | Data throughput vs. capacity | % | > 70% sustained |
| Packets Per Second | Packet volume | pps | Depends on NIC (millions on modern NICs) |
| Latency (RTT) | Round-trip time | ms | Increases under congestion |
| Packet Loss | Lost/dropped packets | % | > 0.1% problematic |
| TCP Retransmissions | Packets needing resend | count/s | Increases under loss |
| Connection Count | Active connections | count | OS limit: ~65K per IP pair |
Bandwidth vs. Packet Rate
Network interfaces can be limited by either bandwidth (bits per second) or packet rate (packets per second); whichever ceiling is reached first becomes the bottleneck.
Small packets can exhaust packet rate before bandwidth. A 10 Gbps link at 1M pps handles ~10,000 bits per packet = 1,250 bytes/packet. Smaller packets (like ACK-heavy workloads) hit packet limits first.
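To reason about which ceiling binds first for a given workload, a quick sketch (the link speed, NIC packet-rate limit, and packet sizes below are illustrative assumptions):

```python
def binding_limit(link_gbps, nic_max_pps, avg_packet_bytes):
    """Report whether bandwidth or packet rate saturates first, and the achievable Gbps."""
    pps_needed_for_full_bandwidth = link_gbps * 1e9 / (avg_packet_bytes * 8)
    if pps_needed_for_full_bandwidth > nic_max_pps:
        achievable_gbps = nic_max_pps * avg_packet_bytes * 8 / 1e9
        return "packet rate", achievable_gbps
    return "bandwidth", link_gbps

# 10 Gbps link, 1M pps NIC: small packets hit the packet-rate ceiling long before 10 Gbps
print(binding_limit(10, 1_000_000, 100))    # ('packet rate', 0.8)
print(binding_limit(10, 1_000_000, 1500))   # ('bandwidth', 10)
```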
Network Latency Sources
- Intra-datacenter latency: ~0.1-0.5 ms
- Cross-region latency: 20-100 ms
- Intercontinental: 100-300 ms
Cloud providers charge for cross-region and internet-bound traffic, often $0.02-0.12 per GB. High-volume services can have network egress as their largest infrastructure cost. Monitor egress carefully and design to minimize cross-region data transfer.
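As a back-of-envelope illustration (the per-GB rate and monthly volume are hypothetical, not quoted from any provider):

```python
def monthly_egress_cost(gb_per_month, rate_per_gb=0.09):
    """Estimate monthly egress spend at a flat per-GB rate (ignores tiered pricing)."""
    return gb_per_month * rate_per_gb

# 100 TB/month of internet-bound traffic at $0.09/GB
print(f"${monthly_egress_cost(100_000):,.0f}/month")  # $9,000/month
```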
Network-Intensive Operations
Common drivers of network load: API calls between services, data replication, and user-facing traffic.
Resources become constrained not just from high utilization, but from contention—multiple consumers competing for the same resource simultaneously.
Types of Contention
Noisy Neighbors
In shared environments (cloud, Kubernetes, multi-tenant), other workloads compete for the same physical resources: CPU time on oversubscribed hosts (visible as steal time), shared disk I/O, shared network bandwidth, and shared memory bandwidth and caches.
Symptoms of noisy neighbors: latency that varies even though your own load hasn't changed, elevated CPU steal time, and I/O or network latency spikes that don't correlate with your traffic.
Use dedicated instances for latency-sensitive workloads. Kubernetes node affinity/anti-affinity rules can separate noisy workloads. Provisioned IOPS storage eliminates storage noisy neighbors. The premium cost is often worth the performance predictability.
Resource utilization data is the foundation of capacity planning—projecting future needs based on current usage and growth patterns.
The Capacity Planning Process
The process builds directly on utilization data: measure current utilization, calculate headroom against a safe threshold, project runway from your growth rate, and trigger scaling early enough to cover lead time.
Calculating Headroom
Headroom = (Safe Threshold - Current Utilization) / Current Utilization × 100%
Example: CPU at 50% utilization, 70% safe threshold:
Headroom = (70% - 50%) / 50% = 40% growth capacity
You can grow 40% before hitting the safe threshold.
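The same calculation as a small helper, using the example above:

```python
def headroom_pct(current_util_pct, safe_threshold_pct):
    """Growth capacity (%) before current utilization reaches the safe threshold."""
    return (safe_threshold_pct - current_util_pct) / current_util_pct * 100

print(headroom_pct(50, 70))  # 40.0 -> traffic can grow 40% before hitting the 70% threshold
```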
| Resource | Safe Sustained Threshold | Reason |
|---|---|---|
| CPU | 70-80% | Leaves room for traffic spikes |
| Memory (Available) | > 20% | Buffer for temporary allocations |
| Disk I/O | 60-70% | Queue depth grows rapidly beyond this |
| Network | 60-70% | Congestion and retransmissions increase |
Runway Calculation
Runway (months) = Headroom / Monthly Growth Rate
Example: 40% headroom, 10% monthly growth:
Runway = 40% / 10% = 4 months
You have 4 months before hitting the safe threshold.
Scaling Lead Time
Account for how long scaling takes: adding stateless application instances may take minutes, but resizing a database, warming caches, or rebalancing data can take days.
Start scaling before you hit the threshold:
Scale Trigger = Threshold - (Growth Rate × Lead Time)
With 7-day database scaling lead time and 10% monthly (~2.5% weekly) growth:
Trigger at 70% - (2.5% × 1 week) = 67.5%
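Putting runway and the scale trigger together as a sketch that mirrors the formulas above (growth is treated as utilization percentage points per month, matching the simplified examples):

```python
def runway_months(headroom_pct, monthly_growth_pct):
    """Months until the safe threshold is reached at the current growth rate."""
    return headroom_pct / monthly_growth_pct

def scale_trigger_pct(threshold_pct, monthly_growth_pct, lead_time_weeks):
    """Utilization at which scaling must start so new capacity lands before the threshold."""
    weekly_growth_pct = monthly_growth_pct / 4
    return threshold_pct - weekly_growth_pct * lead_time_weeks

print(runway_months(40, 10))          # 4.0 months of runway
print(scale_trigger_pct(70, 10, 1))   # 67.5 -> start scaling the database at 67.5%
```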
Monitor cost per unit of work ($/request, $/user, etc.). If efficiency is declining (cost per request increasing), you have scaling inefficiencies—perhaps coordination overhead, underutilized nodes, or architectural bottlenecks that worsen at scale.
Utilization metrics reveal not just when to scale up, but when to scale down. Over-provisioning wastes money; right-sizing matches resources to actual needs.
Identifying Over-Provisioning
Signs you're over-provisioned: CPU sustained below 30%, available memory consistently above 50%, disk and network well under their concern thresholds, and instances that never approach their limits even at peak traffic.
Each under-utilized resource is money paid for capacity never used.
| Resource | If Under-Utilized | If Over-Utilized |
|---|---|---|
| CPU | Smaller instance, fewer cores | Larger instance, horizontal scale, optimize code |
| Memory | Smaller instance, reduce heap | Larger instance, optimize allocations, reduce caching |
| Disk | Smaller disk, lower IOPS tier | Faster disk (HDD→SSD→NVMe), add caching |
| Network | Lower bandwidth tier, smaller instance | Upgrade instance, CDN offload, compression |
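The table can also be read as a simple decision helper; the thresholds below are the illustrative values used earlier on this page, and the suggestions are starting points rather than prescriptions:

```python
def rightsizing_hint(resource, sustained_utilization_pct):
    """Map sustained utilization of a resource to a rough rightsizing action."""
    actions = {
        "cpu":     ("smaller instance, fewer cores", "larger instance, horizontal scale, or optimize code"),
        "memory":  ("smaller instance, reduce heap", "larger instance, optimize allocations, reduce caching"),
        "disk":    ("smaller disk, lower IOPS tier",  "faster disk (HDD -> SSD -> NVMe), add caching"),
        "network": ("lower bandwidth tier",           "upgrade instance, CDN offload, compression"),
    }
    under, over = actions[resource]
    if sustained_utilization_pct < 30:
        return f"under-utilized: {under}"
    if sustained_utilization_pct > 80:
        return f"over-utilized: {over}"
    return "within normal range: no change needed"

print(rightsizing_hint("cpu", 12))   # under-utilized: smaller instance, fewer cores
print(rightsizing_hint("disk", 90))  # over-utilized: faster disk (HDD -> SSD -> NVMe), add caching
```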
Right-Sizing Methodology
Measure peak utilization over a representative period, add headroom for spikes, choose the smallest configuration that satisfies it, and re-measure after the change.
Resource Matching
Cloud providers offer specialized instance types: compute-optimized (high CPU-to-memory ratio), memory-optimized, storage-optimized (high IOPS and local NVMe), and network-optimized variants.
Match instance type to your bottleneck resource for best cost efficiency.
For stable baseline utilization, reserved capacity (1-3 year commitments) costs 30-70% less than on-demand. Use on-demand for variable/burst capacity. Right-size before reserving—locking in an oversized instance for 3 years is expensive.
Resource utilization metrics bridge abstract performance goals with concrete infrastructure decisions: they reveal where the bottleneck is, how much headroom remains, when to scale up, and when it's safe to scale down.
Module Complete:
You've now completed the Performance Metrics module. You understand the five essential metrics for system performance: latency, throughput, percentiles, availability, and resource utilization.
These metrics form the vocabulary for all performance discussions and the foundation for every optimization decision.
You now understand resource utilization as the bridge between application performance and infrastructure capacity. You can read utilization metrics, identify bottlenecks, plan capacity, and right-size resources. Combined with latency, throughput, percentiles, and availability, you have a complete toolkit for performance engineering.