Understanding how cgroups work is only half the battle. The harder question is: What limits should I actually set? Too low, and your application is constantly throttled or OOM-killed. Too high, and you waste resources or risk one container starving others.
Effective resource limiting requires understanding your workload's characteristics, the difference between requests and limits, overcommitment tradeoffs, and monitoring feedback loops. This page bridges cgroup mechanics with practical container sizing decisions.
By the end of this page, you will understand how to analyze workload resource requirements, the requests vs limits model used by Kubernetes, CPU and memory limit calculation strategies, overcommitment ratios, QoS classes, and how to use monitoring data to iteratively tune resource configurations.
Before setting limits, you must understand your workload's resource consumption patterns. Different workload types have dramatically different profiles.
CPU-Bound vs I/O-Bound Workloads
Steady-State vs Bursty Workloads
| Workload Type | CPU Pattern | Memory Pattern | Sizing Strategy |
|---|---|---|---|
| Web API Server | Bursty, request-driven | Stable (connection pools) | Size for P99 latency targets |
| Background Worker | Steady, queue-driven | Varies with batch size | Size for throughput; limit to prevent starvation |
| Database | Bursty, query-driven | Stable (buffer pool) | High memory; CPU for concurrent queries |
| ML Inference | High, steady per request | High (model in memory) | Fixed high CPU/memory; horizontal scale |
| Cache (Redis) | Low (network-bound) | Very high (dataset) | Minimal CPU; memory is primary constraint |
| Build/Compile | Very high, sustained | Moderate (compiler state) | Maximum CPU; memory for parallel compiles |
Measuring Workload Requirements
Before setting production limits, measure actual usage under realistic load:
# Monitor container resource usage with docker stats
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
# For Kubernetes, use kubectl top
kubectl top pods --containers
# For detailed cgroup metrics
cat /sys/fs/cgroup/<path>/cpu.stat
cat /sys/fs/cgroup/<path>/memory.current
cat /sys/fs/cgroup/<path>/memory.stat
# Use Prometheus with cAdvisor for time-series data
# Key metrics:
# - container_cpu_usage_seconds_total
# - container_memory_working_set_bytes
# - container_cpu_cfs_throttled_seconds_total
Collect data across:
Startup is often the most resource-intensive phase—applications loading data, warming caches, or compiling JIT code may need temporarily higher limits.
Running without limits initially (in a safe environment) while monitoring provides the baseline data needed for informed limit setting. Many organizations run new workloads with monitoring but no hard limits for a burn-in period, then set limits based on observed P99 usage plus headroom.
Kubernetes popularized the requests/limits model, which maps directly to cgroup configurations but provides a higher-level abstraction for resource management.
Requests: Guaranteed Resources
A pod's requests specify the minimum resources it needs to run properly:
Limits: Maximum Resources
A pod's limits specify the maximum resources it can consume:
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        cpu: "500m"      # 0.5 CPU cores guaranteed
        memory: "256Mi"  # 256 MB guaranteed
      limits:
        cpu: "2"         # Can burst to 2 cores when available
        memory: "512Mi"  # Hard limit; OOM if exceeded

How Requests/Limits Map to cgroups
| Kubernetes | cgroup v2 File | Effect |
|---|---|---|
| requests.cpu | cpu.weight | Proportional share scaled from the request |
| limits.cpu | cpu.max | "(limit × 100000) 100000" (quota and period in µs) |
| requests.memory | (scheduling only) | Not enforced by cgroup |
| limits.memory | memory.max | Hard limit in bytes |
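To make this mapping concrete, here is a small Python sketch of the millicore-to-cgroup conversion. The helper names are hypothetical; the shares-to-weight formula follows the common cgroup v1-to-v2 conversion used by recent runtimes, and the 100 ms CFS period is the kernel default.

```python
def millicores_to_cpu_max(limit_millicores: int, period_us: int = 100_000) -> str:
    # cpu.max holds "quota period": quota µs of CPU time per period µs
    quota = limit_millicores * period_us // 1000
    return f"{quota} {period_us}"

def millicores_to_cpu_weight(request_millicores: int) -> int:
    # v1 shares = millicores * 1024 / 1000, then mapped onto the v2
    # weight range [1, 10000] (2..262144 shares -> 1..10000 weight)
    shares = max(2, request_millicores * 1024 // 1000)
    return int(1 + ((shares - 2) * 9999) // 262142)

print(millicores_to_cpu_max(2000))    # limits.cpu: "2"   -> "200000 100000"
print(millicores_to_cpu_weight(500))  # requests.cpu: "500m" -> 20
```

Running the helpers against the manifest above shows why a 2-core limit appears as `200000 100000` in `cpu.max`: the container may consume at most 200 ms of CPU time in every 100 ms period (i.e., two full cores).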
The Gap Between Requests and Limits
The ratio of limits to requests determines overcommitment potential:
Example:
requests:
cpu: "100m"
limits:
cpu: "1000m" # 10x overcommit ratio
This pod needs 100m to function but can use up to 1000m if available. On a 4-CPU node, you could schedule 40 such pods by requests, but they could collectively demand 40 CPUs—10x the node's capacity if all burst simultaneously.
CPU overcommit is relatively safe—excess demand causes throttling but not crashes. Memory overcommit is dangerous—if all pods simultaneously need their limits, there's not enough physical memory, triggering OOM kills. Conservative memory limits are essential for stability.
CPU limits require balancing predictability against utilization. Several strategies exist, each with tradeoffs.
Strategy 1: No CPU Limits (Requests Only)
Some organizations (notably Google) recommend setting CPU requests but not limits:
resources:
requests:
cpu: "500m"
# No limits.cpu set
Advantages:
Disadvantages:
Strategy 2: Limits Equal to Requests (Guaranteed)
resources:
requests:
cpu: "1"
limits:
cpu: "1" # Same as request
Advantages:
Disadvantages:
Best for: Latency-sensitive applications where predictability matters more than efficiency.
Strategy 3: Limits Higher Than Requests (Burstable)
resources:
requests:
cpu: "500m"
limits:
cpu: "2" # 4x burst capacity
Advantages:
Disadvantages:
Best for: Batch workloads, bursty applications, cost-optimized environments.
Calculating CPU Limits
Use monitoring data to calculate appropriate limits:
# Collect CPU usage over time (e.g., from Prometheus)
cpu_usage_samples = get_cpu_usage_over_7_days()
# Calculate percentiles
p50 = percentile(cpu_usage_samples, 50)
p99 = percentile(cpu_usage_samples, 99)
max_spike = max(cpu_usage_samples)
# Set request at P50-P75 (typical usage)
request = p50 * 1.2 # 20% headroom
# Set limit at P99 + headroom
limit = p99 * 1.5 # 50% headroom for P99
# Sanity check: limit should handle max observed + margin
if limit < max_spike * 1.1:
limit = max_spike * 1.1
Debugging CPU Throttling
Throttling indicates the limit is too low:
# Check throttling in cgroup stats
cat /sys/fs/cgroup/.../cpu.stat
# nr_throttled 12345 # Number of throttle events
# throttled_usec 78901 # Total time throttled
# In Kubernetes, check container_cpu_cfs_throttled_seconds_total
# If throttle ratio (throttled/usage) > 5-10%, increase limit
Throttling causes latency spikes. A request taking 10ms of CPU spread across a throttled period might actually take 50ms wall-clock time.
CFS bandwidth enforcement uses a 100ms period by default. A pod with limit=100m (10% of one CPU) gets 10ms of CPU time every 100ms. A burst that needs 15ms of CPU runs for 10ms, exhausts its quota, sits throttled for the rest of the period (up to 90ms), and only finishes in the next one. For latency-sensitive workloads, consider requests=limits or no limits.
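That arithmetic can be sketched as a simple model. It assumes a single-threaded burst and ignores scheduler noise; it only illustrates how quota exhaustion stretches wall-clock latency.

```python
def wall_clock_ms(cpu_needed_ms: float, quota_ms: float, period_ms: float = 100) -> float:
    """Estimated wall-clock time for a CPU burst under CFS bandwidth control."""
    elapsed = 0.0
    remaining = cpu_needed_ms
    # Each period we may run quota_ms, then sit throttled until the next period
    while remaining > quota_ms:
        remaining -= quota_ms
        elapsed += period_ms
    return elapsed + remaining

print(wall_clock_ms(15, 10))  # 15 ms of CPU at limit=100m -> 105.0 ms wall clock
```

The same 15ms of work completes in 15ms wall clock with no limit, but takes roughly 105ms at limit=100m: that 7x latency inflation is what the throttling metrics surface.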
Memory limits are less forgiving than CPU limits. Exceeding a CPU limit causes throttling (recoverable). Exceeding a memory limit causes OOM (often fatal). Memory sizing requires careful analysis.
Understanding Memory Types
What counts toward the memory limit?
In cgroup v2's memory.current:
memory.current = RSS + page cache + kernel memory used by cgroup
Kubernetes bases its metrics on memory.current minus inactive file cache (the "working set"), since that cache can be reclaimed under pressure.
Calculating Memory Limits
Memory limits should accommodate:
# Example calculation for a Java web service
MB = 1024 ** 2  # bytes per MiB
jvm_heap_max = 384 * MB # -Xmx setting
jvm_metaspace = 64 * MB # Typical metaspace
jvm_overhead = 50 * MB # JVM internal structures
app_static = 20 * MB # Static data
per_request = 0.5 * MB # Memory per concurrent request
max_concurrent = 50 # Max concurrent requests
fragmentation = 1.15 # 15% fragmentation overhead
working_memory = (
jvm_heap_max +
jvm_metaspace +
jvm_overhead +
app_static +
(per_request * max_concurrent)
)
memory_limit = int(working_memory * fragmentation)
# ≈ 625 MB (543 MB working memory × 1.15)
For non-JVM applications, profile memory under load:
# Track memory over time
while true; do
cat /sys/fs/cgroup/.../memory.current >> memory_log.txt
sleep 5
done
# Analyze for max and P99
sort -n memory_log.txt | tail -1                                       # Max
sort -n memory_log.txt | awk '{v[NR]=$1} END {print v[int(NR*0.99)]}'  # P99
JVMs manage their own heap. A 512MB container with a JVM using -Xmx512m will OOM—the JVM heap is only part of JVM memory. Use -Xmx at 70-80% of container limit, or use -XX:MaxRAMPercentage to auto-size. Always set -XX:+UseContainerSupport (default in recent JDKs) so JVM respects cgroup limits.
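As a rough sizing sketch of the 70-80% guidance above (the 75% fraction is an assumption, not a universal rule), the heap setting can be derived from the container limit:

```python
def jvm_heap_mb(container_limit_mb: int, heap_fraction: float = 0.75) -> int:
    # Reserve the remainder for metaspace, thread stacks, code cache,
    # direct buffers, and other non-heap JVM memory
    return int(container_limit_mb * heap_fraction)

print(f"-Xmx{jvm_heap_mb(512)}m")  # -Xmx384m for a 512Mi limit
```

Equivalently, `-XX:MaxRAMPercentage=75.0` lets the JVM derive the same heap size from the cgroup limit at startup.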
Memory Requests: The Scheduling Guarantee
Memory requests affect both scheduling and runtime behavior: under node memory pressure, pods using more than their memory request are prime candidates for eviction:
# Conservative: request = limit (guaranteed, no eviction risk)
resources:
requests:
memory: "512Mi"
limits:
memory: "512Mi"
# Overcommitted: request < limit (risk of eviction under pressure)
resources:
requests:
memory: "256Mi" # Scheduling guarantee
limits:
memory: "512Mi" # Hard limit
Memory Limit Tuning Signals
Monitor these to tune limits:
| Signal | Indication | Action |
|---|---|---|
| OOM kills (memory.events oom_kill) | Limit too low | Increase limit or optimize app |
| High count of memory.high events | Approaching limit frequently | Increase limit or tune memory.high |
| Memory stable at limit | May be throttling reclaim | Check io.pressure, add headroom |
| Memory well below limit | Over-provisioned | Reduce limit to free cluster capacity |
Kubernetes classifies pods into QoS classes based on their resource configuration. QoS affects eviction priority during node resource pressure.
Guaranteed QoS
A pod is Guaranteed when every container has CPU and memory requests set, and each request equals its corresponding limit:
containers:
- name: app
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "512Mi" # Same as request
cpu: "500m" # Same as request
Characteristics:
Burstable QoS
A pod is Burstable when at least one container has a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria:
containers:
- name: app
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi" # Different from request
cpu: "1" # Different from request
Characteristics:
BestEffort QoS
A pod is BestEffort when no container specifies any CPU or memory requests or limits:
containers:
- name: app
# No resources specified
image: myapp:latest
Characteristics:
| QoS Class | Eviction Priority | Resource Efficiency | Predictability | Use Case |
|---|---|---|---|---|
| Guaranteed | Last (most protected) | Lowest (full reservation) | Highest | Critical production |
| Burstable | Middle | Good (overcommit allowed) | Medium | General workloads |
| BestEffort | First (least protected) | Highest (no reservation) | Lowest | Batch, interruptible |
Eviction Order Under Pressure
When a node experiences memory pressure, the kubelet evicts pods in this order: BestEffort pods first, then Burstable pods exceeding their memory requests, and finally Guaranteed pods (and Burstable pods within their requests) as a last resort.
Within each class, pods using more memory relative to their requests are evicted first.
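That ranking can be sketched as a sort key (simplified; the real kubelet eviction logic also weighs pod priority, and the pod dicts below are hypothetical):

```python
QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}

def eviction_sort_key(pod: dict) -> tuple:
    # Lower key = evicted sooner: BestEffort class first, then pods
    # whose usage most exceeds their request within each class
    overage = pod["memory_usage"] - pod["memory_request"]
    return (QOS_RANK[pod["qos"]], -overage)

pods = [
    {"name": "db",     "qos": "Guaranteed", "memory_usage": 500, "memory_request": 500},
    {"name": "web",    "qos": "Burstable",  "memory_usage": 400, "memory_request": 200},
    {"name": "batch",  "qos": "BestEffort", "memory_usage": 100, "memory_request": 0},
]
print([p["name"] for p in sorted(pods, key=eviction_sort_key)])  # ['batch', 'web', 'db']
```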
OOM Score Calculation
The Linux OOM killer uses oom_score_adj to prioritize kills. Kubernetes sets this based on QoS:
| QoS Class | oom_score_adj | Meaning |
|---|---|---|
| Guaranteed | -997 | Almost never killed |
| BestEffort | 1000 | Maximum kill priority |
| Burstable | 2 to 999 | Varies by memory usage vs request ratio |
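For Burstable pods, the adjustment scales with the memory request relative to node capacity. The sketch below approximates the kubelet's formula, clamped to the 2-999 range shown above; treat the exact clamping behavior as an assumption.

```python
def burstable_oom_score_adj(mem_request_bytes: int, node_mem_bytes: int) -> int:
    # Larger requests relative to the node -> lower score -> less likely killed
    adj = 1000 - (1000 * mem_request_bytes) // node_mem_bytes
    return min(max(adj, 2), 999)

# A pod requesting 256Mi on an 8GB node
print(burstable_oom_score_adj(256 * 2**20, 8 * 2**30))  # 969
```

A pod requesting nearly the whole node approaches the minimum of 2, so under OOM it is nearly as protected as a Guaranteed pod.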
Choose QoS class intentionally. Use Guaranteed for databases, caches, and critical APIs. Use Burstable for web servers and workers that can tolerate occasional eviction. Use BestEffort only for jobs that can be restarted without consequence.
Overcommitment allows scheduling more workloads than physical resources by betting that not all workloads peak simultaneously. It's a powerful cost optimization tool but requires careful management.
CPU Overcommitment
CPU is "compressible"—excess demand causes throttling, not crashes. CPU overcommitment is common and relatively safe:
Node: 4 CPUs
Pod A: request=1, limit=2
Pod B: request=1, limit=2
Pod C: request=1, limit=2
Pod D: request=1, limit=2
Total requests: 4 CPUs (100% scheduled)
Total limits: 8 CPUs (200% overcommit ratio)
If all pods burst simultaneously:
- Each gets proportional share based on requests
- Pod with request=1 gets 1 CPU even if limit=2
- Throttling occurs but no crashes
Memory Overcommitment
Memory is "incompressible"—excess demand causes OOM. Memory overcommitment is risky:
Node: 8 GB
Pod A: request=2GB, limit=4GB
Pod B: request=2GB, limit=4GB
Pod C: request=2GB, limit=4GB
Pod D: request=2GB, limit=4GB
Total requests: 8 GB (100% scheduled)
Total limits: 16 GB (200% overcommit ratio)
If all pods use their limits:
- Node has only 8 GB physical memory
- OOM killer invoked
- Pods crash, potential cascade failure
Memory overcommitment is the primary cause of production outages in containerized environments. A best practice is requests = limits for memory, eliminating memory overcommit risk. CPU can be overcommitted more aggressively.
Calculating Overcommit Ratios
Determine safe overcommit ratios based on workload diversity:
# If pods have independent load patterns (low correlation)
# Higher overcommit is safer
correlation = calculate_usage_correlation(pod_metrics)
if correlation < 0.3: # Low correlation
safe_cpu_overcommit = 3.0 # 300%
safe_memory_overcommit = 1.2 # 120%
elif correlation < 0.6: # Medium correlation
safe_cpu_overcommit = 2.0 # 200%
safe_memory_overcommit = 1.0 # None
else: # High correlation (e.g., all peak together)
safe_cpu_overcommit = 1.2 # 120%
safe_memory_overcommit = 1.0 # None (too risky)
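The calculate_usage_correlation helper above is left undefined; a minimal sketch using NumPy (assuming pod_metrics is a list of equally-sampled per-pod usage series) could be:

```python
import numpy as np

def calculate_usage_correlation(pod_metrics) -> float:
    """Mean pairwise Pearson correlation across pod usage time series."""
    corr = np.corrcoef(np.asarray(pod_metrics, dtype=float))  # rows = pods
    n = corr.shape[0]
    off_diag = corr[~np.eye(n, dtype=bool)]  # drop the self-correlation diagonal
    return float(off_diag.mean())

# Two pods whose load moves in lockstep (highly correlated -> overcommit is risky)
print(calculate_usage_correlation([[1, 2, 3, 4], [2, 4, 6, 8]]))
```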
Bin Packing Strategies
Bin packing is the scheduler's job—fitting pods onto nodes efficiently:
Kubernetes supports custom scheduling policies:
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
plugins:
score:
enabled:
- name: NodeResourcesFit
weight: 1
pluginConfig:
- name: NodeResourcesFit
args:
scoringStrategy:
type: MostAllocated # Tight packing
resources:
- name: cpu
weight: 1
- name: memory
weight: 1
Vertical Pod Autoscaler (VPA) for Right-Sizing
VPA automatically adjusts resource requests/limits based on observed usage:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: my-app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
updatePolicy:
updateMode: "Auto" # Automatically adjust resources
resourcePolicy:
containerPolicies:
- containerName: "*"
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 4
memory: 4Gi
VPA collects usage metrics, computes recommendations, and (in Auto mode) evicts pods to apply new resource values. Use "Off" mode initially to get recommendations without changes.
Resource limits should not be set and forgotten. Continuous monitoring and iterative tuning ensures optimal performance and cost efficiency.
Key Metrics to Monitor
# CPU throttling (indicates limit too low)
rate(container_cpu_cfs_throttled_seconds_total[5m])
/ rate(container_cpu_usage_seconds_total[5m]) > 0.1
# Alert if throttling > 10% of usage
# Memory approaching limit (OOM risk)
(container_memory_working_set_bytes
/ container_spec_memory_limit_bytes) > 0.9
# Alert if using > 90% of limit
# OOM kills
increase(kube_pod_container_status_restarts_total[1h]) > 3
# Combined with OOM termination reason
# Resource waste (over-provisioning)
1 - (avg_over_time(container_memory_working_set_bytes[7d])
/ container_spec_memory_limit_bytes) > 0.5
# Alert if using < 50% of limit consistently
Interpreting Pressure Stall Information (PSI)
Cgroup v2's PSI provides direct pressure metrics without needing to correlate multiple signals:
cat /sys/fs/cgroup/.../cpu.pressure
# some avg10=0.00 avg60=0.00 avg300=0.00 total=0
# full avg10=0.00 avg60=0.00 avg300=0.00 total=0
cat /sys/fs/cgroup/.../memory.pressure
# some avg10=5.23 avg60=2.11 avg300=0.89 total=123456
# full avg10=0.50 avg60=0.20 avg300=0.08 total=12345
PSI thresholds for alerts:
some > 10%: Resource becoming constrained; investigate
some > 25%: Resource is a bottleneck; increase limits
full > 1%: Severe constraint; immediate attention needed
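A small parser for these pressure files makes such thresholds easy to automate. The sample text mirrors the output shown above; the alerting check is a sketch.

```python
def parse_psi(text: str) -> dict:
    """Parse a cgroup v2 *.pressure file into nested dicts."""
    out = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # "some" or "full", then key=value pairs
        out[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return out

sample = (
    "some avg10=5.23 avg60=2.11 avg300=0.89 total=123456\n"
    "full avg10=0.50 avg60=0.20 avg300=0.08 total=12345\n"
)
psi = parse_psi(sample)
if psi["some"]["avg10"] > 25 or psi["full"]["avg10"] > 1:
    print("memory pressure: limit is a bottleneck")
```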
#!/usr/bin/env python3
"""Analyze container resource usage and generate limit recommendations.

Fetches data from Prometheus and calculates suggested requests/limits."""

from datetime import datetime, timedelta

import numpy as np
from prometheus_api_client import PrometheusConnect

def analyze_container_resources(prom: PrometheusConnect, container: str, days: int = 7):
    end = datetime.now()
    start = end - timedelta(days=days)

    # Query CPU usage (cores) over the period
    cpu_query = f'rate(container_cpu_usage_seconds_total{{container="{container}"}}[5m])'
    cpu_data = prom.custom_query_range(
        cpu_query, start_time=start, end_time=end, step="5m"
    )
    cpu_values = [float(v[1]) for v in cpu_data[0]['values']]

    # Query memory working set (bytes) over the period
    memory_query = f'container_memory_working_set_bytes{{container="{container}"}}'
    memory_data = prom.custom_query_range(
        memory_query, start_time=start, end_time=end, step="5m"
    )
    memory_values = [float(v[1]) for v in memory_data[0]['values']]

    # Request at typical usage; limit at P99 plus headroom
    return {
        'cpu': {
            'request': f"{int(np.percentile(cpu_values, 75) * 1000)}m",
            'limit': f"{int(np.percentile(cpu_values, 99) * 1500)}m",  # P99 * 1.5, in millicores
            'p50': np.percentile(cpu_values, 50),
            'p99': np.percentile(cpu_values, 99),
            'max': max(cpu_values),
        },
        'memory': {
            'request': f"{int(np.percentile(memory_values, 90) / 1024**2)}Mi",
            'limit': f"{int(np.percentile(memory_values, 99) * 1.25 / 1024**2)}Mi",
            'p50': np.percentile(memory_values, 50) / 1024**2,
            'p99': np.percentile(memory_values, 99) / 1024**2,
            'max': max(memory_values) / 1024**2,
        },
    }

# Example usage
prom = PrometheusConnect(url="http://prometheus:9090")
recs = analyze_container_resources(prom, "my-app", days=14)
print(f"CPU: request={recs['cpu']['request']}, limit={recs['cpu']['limit']}")
print(f"Memory: request={recs['memory']['request']}, limit={recs['memory']['limit']}")

We've covered the practical aspects of container resource limiting. Let's consolidate the key takeaways:
What's next:
We've explored namespaces for isolation, cgroups for resource control, and practical resource limiting strategies. The final page of this module covers container isolation from a holistic security perspective—how namespaces, cgroups, capabilities, seccomp, and LSMs work together to create secure container boundaries, and the limitations that require additional isolation layers like gVisor or Kata Containers.
You now understand practical resource limiting—how to analyze workloads, calculate appropriate requests and limits, choose QoS classes, and iteratively tune resource configurations. Next, we'll explore container isolation security in depth.