In the world of Kubernetes, resource management is the invisible foundation upon which all stability, performance, and cost efficiency rest. A cluster without proper resource management is like a city without traffic laws—everything might work fine when traffic is light, but chaos erupts under load.
Every container running in Kubernetes competes for the same finite pool of compute resources: CPU cycles, memory, storage bandwidth, and network capacity. Without explicit resource management, this competition becomes a free-for-all where a single misbehaving application can starve its neighbors, trigger cascading failures, or cause the scheduler to make poor placement decisions.
Resource management in Kubernetes encompasses several interlocking mechanisms: per-container requests and limits, Quality of Service (QoS) classes, namespace-level ResourceQuotas and LimitRanges, and node-level capacity reservation.
By the end of this page, you'll understand how to configure CPU and memory resources, predict scheduling behavior, implement QoS strategies, enforce organizational resource policies, and avoid the common pitfalls that lead to cluster instability and cost overruns.
Before diving into configuration, we must understand how Kubernetes models compute resources and how these models differ from traditional virtualization.
CPU Resources: Compressible and Shareable
CPU is a compressible resource in Kubernetes terminology. When CPU demand exceeds supply, the kernel's CFS (Completely Fair Scheduler) throttles containers proportionally. No container crashes—they simply run slower. This compressibility makes CPU more forgiving but also means CPU limits can introduce unexpected latency through throttling.
CPU Units:
- `250m` means 0.25 CPU cores
- `0.1` CPU (`100m`) for lightweight sidecars

Memory Resources: Incompressible and Fatal
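As a quick sanity check, millicore values convert to cores by dividing by 1000. A minimal shell sketch:

```shell
# Convert a millicore value (e.g. a "250m" request) to fractional cores
millicores=250
awk -v m="$millicores" 'BEGIN { printf "%.2f cores\n", m / 1000 }'
```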
Memory is incompressible. When a container tries to exceed its memory limit, the kernel's OOM (Out of Memory) killer terminates it—there's no graceful degradation. This makes memory limits more dangerous than CPU limits; incorrect memory limits cause container restarts and potential data loss.
Memory Units:
- Binary suffixes `Ki`, `Mi`, `Gi` (e.g., `1Mi` = 2^20 bytes), the convention used throughout this page
- Decimal suffixes `K`, `M`, `G` (e.g., `1M` = 10^6 bytes)
| Characteristic | CPU | Memory |
|---|---|---|
| Compressibility | Compressible (throttled) | Incompressible (OOM killed) |
| Failure mode | Increased latency | Container restart |
| Overcommit risk | Performance degradation | Cascading failures |
| Monitoring signal | Throttle metrics | OOM events, restarts |
| Limit enforcement | CFS bandwidth control | Kernel OOM killer |
| Burstable | Yes (can exceed request) | Yes, but dangerous |
Setting memory limits too close to actual usage is a recipe for instability. Applications often have transient memory spikes (GC pauses, request bursts, background tasks) that exceed steady-state consumption. Leave headroom—typically 20-30% above observed peak usage—to absorb these spikes without OOM kills.
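To make the headroom rule concrete, here is a small shell sketch that pads an observed peak by 25%; the 800Mi figure is hypothetical:

```shell
# Size a memory limit from observed peak usage plus ~25% headroom
peak_mib=800        # hypothetical observed peak
headroom_pct=25
limit_mib=$(( peak_mib * (100 + headroom_pct) / 100 ))
echo "suggested memory limit: ${limit_mib}Mi"
```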
The distinction between requests and limits is the most critical concept in Kubernetes resource management. Misunderstanding this distinction leads to scheduling failures, cluster instability, and resource waste.
Requests: What the Scheduler Sees
Resource requests tell the Kubernetes scheduler how much CPU and memory a container needs to run. The scheduler uses requests—not limits—to make placement decisions. A pod is only scheduled to a node if the node has sufficient allocatable resources to satisfy the pod's total requests.
Key implications:
- The scheduler bases placement on requests, not actual usage: a pod requesting 4Gi while using 500Mi still reserves the full 4Gi of schedulable capacity
- Under-requesting lets nodes become overloaded once pods burst; over-requesting strands capacity and inflates cost
- A pod whose requests no node can satisfy remains Pending
Limits: What the Kernel Enforces
Resource limits define the maximum resources a container can consume. Unlike requests, limits are enforced by Linux kernel mechanisms (cgroups), not the Kubernetes scheduler.
Limit enforcement:
- CPU limits are enforced by CFS bandwidth control: the container is throttled, not killed
- Memory limits are enforced by the kernel OOM killer: exceeding the limit terminates the container
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
  namespace: production
spec:
  containers:
  - name: application
    image: myapp:v1.2.3
    resources:
      # Requests: Scheduler guarantees, must always be available
      requests:
        memory: "512Mi"   # Container needs at least 512Mi to start
        cpu: "250m"       # Container needs 0.25 CPU cores minimum
      # Limits: Hard ceiling, kernel-enforced
      limits:
        memory: "1Gi"     # Container killed if it exceeds 1Gi
        cpu: "1000m"      # Container throttled if it exceeds 1 CPU
  - name: sidecar-logger
    image: fluentbit:latest
    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "100m"

# Total pod requests: 576Mi memory, 300m CPU
# Total pod limits:   1152Mi memory, 1100m CPU
# Scheduler reserves: 576Mi memory, 300m CPU
```

Kubernetes assigns every pod a Quality of Service (QoS) class based on its resource specification. QoS classes determine eviction priority when nodes experience memory pressure—pods with lower QoS are evicted before pods with higher QoS.
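The scheduler sums requests and limits across all containers in the pod; the totals noted in the manifest's comments check out with simple arithmetic:

```shell
# Cross-check the pod totals: application container + sidecar
echo "requests: $(( 512 + 64 ))Mi memory, $(( 250 + 50 ))m CPU"
echo "limits:   $(( 1024 + 128 ))Mi memory, $(( 1000 + 100 ))m CPU"
```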
Understanding QoS is essential because:
- It determines which pods survive node memory pressure and which are evicted first
- You cannot set it directly; it is derived from the requests and limits you specify
- A critical workload accidentally left as BestEffort (no resources set) becomes the first casualty under pressure
The Three QoS Classes:
1. Guaranteed (Highest Priority)
2. Burstable (Medium Priority)
3. BestEffort (Lowest Priority)
| QoS Class | Criteria | Eviction Priority | Use Case |
|---|---|---|---|
| Guaranteed | All containers: requests = limits for CPU & memory | Last (most protected) | Databases, critical services, latency-sensitive apps |
| Burstable | At least one request or limit set; requests ≠ limits | Middle | Web servers, API services, most production workloads |
| BestEffort | No requests or limits on any container | First (least protected) | Batch jobs, CI/CD runners, dev environments |
```yaml
# Guaranteed QoS - Highest protection
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: database
    image: postgres:15
    resources:
      requests:
        memory: "4Gi"
        cpu: "2000m"
      limits:
        memory: "4Gi"    # Same as request
        cpu: "2000m"     # Same as request
---
# Burstable QoS - Can burst above requests
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: webserver
    image: nginx:1.24
    resources:
      requests:
        memory: "256Mi"
        cpu: "100m"
      limits:
        memory: "512Mi"  # Higher than request
        cpu: "500m"      # Higher than request
---
# BestEffort QoS - No resource guarantees
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-pod
spec:
  containers:
  - name: batch-job
    image: batch-processor:latest
    # No resources section = BestEffort
```

For production clusters, aim for most pods to be Burstable with appropriate requests, and reserve Guaranteed for truly critical services (databases, message queues, coordination services). BestEffort should be restricted to development namespaces or preemptible batch workloads.
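The classification rules can be sketched as a toy shell function. This is deliberately simplified to a single request/limit pair: real classification requires the requests-equal-limits condition to hold for both CPU and memory across every container, and it runs after API defaulting (a container that sets only limits has its requests defaulted to match, making it Guaranteed).

```shell
# Toy QoS classifier for one container's request/limit pair
# (pass empty strings for unset values)
qos_class() {
  local request=$1 limit=$2
  if [ -z "$request" ] && [ -z "$limit" ]; then
    echo "BestEffort"     # nothing set
  elif [ -n "$request" ] && [ "$request" = "$limit" ]; then
    echo "Guaranteed"     # request == limit
  else
    echo "Burstable"      # something set, but not equal
  fi
}

qos_class "4Gi" "4Gi"       # Guaranteed
qos_class "256Mi" "512Mi"   # Burstable
qos_class "" ""             # BestEffort
```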
Resource Quotas enable cluster administrators to constrain aggregate resource consumption per namespace. In multi-tenant clusters, quotas prevent any single team from monopolizing cluster resources, ensure fair resource distribution, and provide cost accountability.
What Resource Quotas Control:
Compute Resources
- `requests.cpu`, `limits.cpu`: Total CPU across all pods
- `requests.memory`, `limits.memory`: Total memory across all pods
- `requests.nvidia.com/gpu`: Extended resources (GPUs, etc.)

Storage Resources
- `requests.storage`: Total PersistentVolumeClaim size
- `persistentvolumeclaims`: Number of PVCs
- `<storage-class>.storageclass.storage.k8s.io/requests.storage`: Per-storage-class limits

Object Counts
- `pods`, `services`, `secrets`, `configmaps`: Limit number of objects
- `count/<resource>.<group>`: Generic object count (e.g., `count/deployments.apps`)

Enforcement Behavior:
- Creation requests that would push usage past a quota are rejected at admission time
- Quotas constrain new objects only; tightening a quota does not delete or evict existing resources
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: team-payments
spec:
  hard:
    # Compute quotas
    requests.cpu: "100"          # 100 CPU cores total
    requests.memory: "200Gi"     # 200 GiB memory requests
    limits.cpu: "200"            # 200 CPU cores limit total
    limits.memory: "400Gi"       # 400 GiB memory limits
    # Pod count limits
    pods: "500"                  # Max 500 pods in namespace
    # Storage limits
    requests.storage: "1Ti"      # 1 TiB total PVC storage
    persistentvolumeclaims: "50" # Max 50 PVCs
    # Object count limits
    services: "100"
    secrets: "200"
    configmaps: "200"
---
# Scoped quota - only applies to pods with a specific priority class
apiVersion: v1
kind: ResourceQuota
metadata:
  name: guaranteed-quota
  namespace: team-payments
spec:
  hard:
    pods: "20"                   # Only 20 high-priority pods allowed
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high-priority"]
---
# Quota for specific priority classes
apiVersion: v1
kind: ResourceQuota
metadata:
  name: low-priority-quota
  namespace: team-analytics
spec:
  hard:
    cpu: "50"
    memory: "100Gi"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["batch", "preemptible"]
```

If a ResourceQuota exists that tracks compute resources (CPU/memory requests or limits), all pods in that namespace MUST specify those resources; otherwise, pod creation fails. Use LimitRange to set defaults so users don't need to specify resources on every container.
LimitRange resources provide three critical functions:
1. Defaults: inject `default` and `defaultRequest` values into containers that omit them
2. Bounds: reject containers outside configured `min`/`max` ranges
3. Ratios: cap the limit-to-request ratio to prevent extreme overcommit

LimitRanges are particularly valuable in multi-tenant environments where you want to:
- Guarantee every container has requests set, so ResourceQuotas remain enforceable
- Block accidental extremes (a `64Gi` typo, or a container with no memory limit at all)
- Standardize sensible defaults without requiring every team to specify resources

LimitRange Types: `Container` (per-container constraints), `Pod` (aggregate across a pod's containers), and `PersistentVolumeClaim` (storage size bounds).
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-checkout
spec:
  limits:
  # Container-level limits (most common)
  - type: Container
    # Default values applied when container doesn't specify limits
    default:
      cpu: "500m"
      memory: "512Mi"
    # Default requests when not specified
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    # Minimum allowed (rejects pods below this)
    min:
      cpu: "50m"
      memory: "64Mi"
    # Maximum allowed (rejects pods above this)
    max:
      cpu: "4"
      memory: "8Gi"
    # Maximum limit/request ratio (prevents extreme overcommit)
    maxLimitRequestRatio:
      cpu: "10"      # Limit can be at most 10x request
      memory: "4"    # Limit can be at most 4x request
  # Pod-level limits (aggregate across all containers)
  - type: Pod
    max:
      cpu: "8"       # Total pod CPU cannot exceed 8 cores
      memory: "16Gi" # Total pod memory cannot exceed 16Gi
  # PVC storage limits
  - type: PersistentVolumeClaim
    min:
      storage: "1Gi"   # No tiny PVCs
    max:
      storage: "100Gi" # Cap individual PVC size
```

How Defaults Are Applied:
When a pod is created, the admission controller checks each container:
1. If the container doesn't specify `resources.limits`, apply the `default` values
2. If the container doesn't specify `resources.requests`, apply the `defaultRequest` values
3. If `defaultRequest` is not set but `default` is, request = limit (Guaranteed QoS)
4. Validate the final values against `min`, `max`, and `maxLimitRequestRatio`
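Step 4's ratio check is plain arithmetic. A sketch using the memory ratio of 4 from the example LimitRange, with hypothetical request/limit values:

```shell
# Validate limit/request ratio for memory (maxLimitRequestRatio: 4)
request_mib=128
limit_mib=512
max_ratio=4
if [ "$limit_mib" -le $(( request_mib * max_ratio )) ]; then
  echo "accepted"
else
  echo "rejected: memory limit exceeds ${max_ratio}x request"
fi
```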
A common source of confusion in Kubernetes resource management is the difference between node capacity and node allocatable. Understanding this distinction is critical for capacity planning.
Node Capacity: Total hardware resources on the node
Node Allocatable: Resources available for scheduling pods
The difference accounts for:
- `system-reserved`: resources set aside for OS daemons (sshd, journald, etc.)
- `kube-reserved`: resources set aside for Kubernetes components (kubelet, container runtime)
- Eviction thresholds: a buffer the kubelet keeps free so it can evict pods before the node itself runs out of memory
Formula:
Allocatable = Capacity - SystemReserved - KubeReserved - EvictionThreshold
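Plugging in the reservation values used in the example below (1Gi system, 1Gi kube, 500Mi eviction on a 16Gi node):

```shell
# Allocatable = Capacity - SystemReserved - KubeReserved - EvictionThreshold
capacity_mib=16384          # 16Gi node
system_reserved_mib=1024    # 1Gi
kube_reserved_mib=1024      # 1Gi
eviction_threshold_mib=500  # 500Mi
allocatable_mib=$(( capacity_mib - system_reserved_mib - kube_reserved_mib - eviction_threshold_mib ))
echo "allocatable: ${allocatable_mib}Mi"   # ~13.5Gi
```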
```yaml
# Kubelet configuration for resource reservation
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration

# Reserve resources for system processes
systemReserved:
  cpu: "500m"
  memory: "1Gi"
  ephemeral-storage: "1Gi"

# Reserve resources for Kubernetes components
kubeReserved:
  cpu: "500m"
  memory: "1Gi"
  ephemeral-storage: "1Gi"

# Eviction thresholds (when to start evicting pods)
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionSoft:
  memory.available: "1Gi"
  nodefs.available: "15%"
evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "1m30s"

# Example calculation for a 16Gi memory node:
# Capacity:          16Gi
# - SystemReserved:   1Gi
# - KubeReserved:     1Gi
# - EvictionHard:   500Mi
# = Allocatable:   ~13.5Gi
```

View a node's allocatable resources with `kubectl describe node <node-name>` and look for the 'Allocatable' section. The difference between 'Capacity' and 'Allocatable' shows exactly what's reserved for system and Kubernetes components.
| Node Size | Capacity | System + Kube Reserved | Eviction Buffer | Allocatable | % Available |
|---|---|---|---|---|---|
| Small (4 CPU, 16Gi) | 16Gi | 2Gi | 500Mi | ~13.5Gi | 84% |
| Medium (8 CPU, 32Gi) | 32Gi | 2.5Gi | 1Gi | ~28.5Gi | 89% |
| Large (16 CPU, 64Gi) | 64Gi | 3Gi | 1.5Gi | ~59.5Gi | 93% |
| XLarge (32 CPU, 128Gi) | 128Gi | 4Gi | 2Gi | ~122Gi | 95% |
CPU limits in Kubernetes are enforced via the Linux CFS (Completely Fair Scheduler) bandwidth control mechanism. While this seems straightforward, the implementation details cause significant performance issues that many teams overlook.
How CFS Bandwidth Control Works:
The CFS scheduler enforces CPU limits using a quota/period system:
- Time is divided into fixed accounting periods (100ms by default, configurable via the kubelet's `--cpu-cfs-quota-period`)
- Each period, the container receives a runtime quota equal to its CPU limit times the period; a `500m` limit grants 50ms of CPU time per 100ms period

The throttling mechanism:
- When the container's threads consume the full quota before the period ends, they are descheduled until the next period begins
- Throttled time surfaces as request latency, even on nodes with plenty of idle CPU
The Burst Problem:
Even if your container's average CPU usage is well below limits, it can be throttled if usage is bursty. Consider a web server that processes requests in bursts—each request handler might spike CPU for 10-20ms. If multiple requests arrive in the same 100ms period, the quota exhausts and requests queue, adding latency.
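The arithmetic behind the burst problem, for a hypothetical `500m` limit:

```shell
# CFS quota for a 500m limit over the default 100ms period
limit_millicores=500
period_ms=100
quota_ms=$(( limit_millicores * period_ms / 1000 ))
echo "quota: ${quota_ms}ms of CPU per ${period_ms}ms period"

# Three concurrent ~20ms request handlers need 60ms of CPU in one period
demand_ms=60
[ "$demand_ms" -gt "$quota_ms" ] && echo "throttled: work queues until the next period"
```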
```bash
#!/bin/bash
# Monitor CFS throttling for a specific container

# Find the container ID for your pod
CONTAINER_ID=$(kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].containerID}' | sed 's/containerd:\/\///')

# For the containerd runtime, the cgroup v1 path is typically:
CGROUP_PATH="/sys/fs/cgroup/cpu,cpuacct/kubepods/pod<pod-uid>/<container-id>"

# Key metrics to monitor:
# nr_periods: Total number of CFS periods elapsed
cat $CGROUP_PATH/cpu.stat | grep nr_periods

# nr_throttled: Number of periods where the container was throttled
cat $CGROUP_PATH/cpu.stat | grep nr_throttled

# throttled_time: Total time the container was throttled (nanoseconds)
cat $CGROUP_PATH/cpu.stat | grep throttled_time

# Calculate throttle percentage:
# throttle_pct = (nr_throttled / nr_periods) * 100

# Using cgroup v2 (modern systems):
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod*.slice/cpu.stat

# Prometheus metrics (if using node_exporter or cAdvisor):
# container_cpu_cfs_throttled_periods_total
# container_cpu_cfs_periods_total
# Throttle ratio = throttled_periods / total_periods
```

Many teams at Google, Netflix, and other high-performance organizations do NOT set CPU limits, only requests. Their reasoning: CPU throttling causes unpredictable latency spikes, and CPU overcommit on well-monitored clusters is manageable. Request-only allocation ensures scheduling fairness without throttling penalties. This is controversial—running without limits requires mature monitoring and smaller node pools to limit the blast radius.
Unlike CPU, memory is incompressible—you either have it or you don't. Memory overcommit (where total memory limits exceed node capacity) is inherently risky but often necessary for cost efficiency. Understanding how Kubernetes handles memory pressure is critical.
Memory Pressure Handling:
When a node experiences memory pressure, several mechanisms activate:
1. Eviction (Kubelet-driven): the kubelet proactively evicts pods, BestEffort first and then Burstable pods exceeding their requests, before node memory is fully exhausted
2. OOM Kill (Kernel-driven): if a container breaches its memory limit, or eviction cannot keep pace, the kernel OOM killer terminates processes according to their `oom_score_adj`
3. System OOM: if node memory runs out entirely, the kernel may kill any process, including system daemons; this is the scenario eviction thresholds exist to prevent
| QoS Class | OOM Score Adj | Kill Priority | Rationale |
|---|---|---|---|
| Guaranteed | -997 | Lowest | Resources were reserved; killing these pods would break that guarantee |
| Burstable (high request) | Small positive value | Medium-Low | Closer to request = more protected |
| Burstable (low request) | Large positive value (up to ~999) | Medium-High | Far below request = less protection |
| BestEffort | 1000 | Highest | No guarantees made; first to sacrifice |
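For Burstable pods, the kubelet derives the adjustment from the memory request relative to node capacity, approximately 1000 minus (1000 × request ÷ capacity), clamped to stay between the Guaranteed and BestEffort extremes. A shell sketch with hypothetical numbers:

```shell
# Approximate Burstable oom_score_adj: larger request => lower score => safer
request_mib=512
node_capacity_mib=16384
adj=$(( 1000 - 1000 * request_mib / node_capacity_mib ))
# Clamp into the Burstable band (above Guaranteed's -997, below BestEffort's 1000)
[ "$adj" -lt 2 ] && adj=2
[ "$adj" -gt 999 ] && adj=999
echo "oom_score_adj: $adj"
```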
```yaml
# Pattern 1: Prevent memory overcommit per namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: prevent-overcommit
  namespace: production
spec:
  limits:
  - type: Container
    # Force requests = limits for memory (Guaranteed QoS)
    maxLimitRequestRatio:
      memory: "1"   # Limit can only be 1x request
---
# Pattern 2: Reserve memory buffer on nodes (kubelet configuration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "1Gi"   # Start hard eviction at 1Gi free
evictionSoft:
  memory.available: "2Gi"   # Start soft eviction at 2Gi free
evictionSoftGracePeriod:
  memory.available: "2m"
---
# Pattern 3: Pod Disruption Budget for critical workloads
# Ensure eviction doesn't take down too many replicas
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-service
```

Memory leaks are the most common cause of OOM kills. Before setting tight memory limits, ensure your application has no memory leaks. Use heap profilers, monitor memory growth over time, and test with realistic workloads. A limit set against a leaking application is just scheduled failure.
Beyond CPU and memory, Kubernetes supports extended resources for specialized hardware like GPUs, TPUs, FPGAs, or any custom resource you define.
How Extended Resources Work:
1. A device plugin (or a manual node status patch) advertises the resource on a node, for example `nvidia.com/gpu: 4`
2. Pods request the resource in container specs; the scheduler places them only on nodes with enough unallocated units
3. The kubelet, via the device plugin, attaches the specific devices to the container at startup
Common Extended Resources:
- `nvidia.com/gpu`: NVIDIA GPUs (requires the NVIDIA device plugin)
- `amd.com/gpu`: AMD GPUs
- `intel.com/fpga`: Intel FPGAs
- `google.com/tpu`: Google TPUs (GKE only)

Extended Resource Characteristics:
- Requested in whole integers only; no fractional GPUs without MIG or time-slicing
- Specified under `limits` (requests, if omitted, default to the same value)
- Never overcommitted; each unit is allocated exclusively to one container
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        # Request 2 NVIDIA GPUs
        nvidia.com/gpu: "2"
        # Also need regular resources
        memory: "32Gi"
        cpu: "8"
      requests:
        memory: "32Gi"
        cpu: "8"
    volumeMounts:
    - mountPath: /data
      name: training-data
  # Tolerate GPU node taints (common pattern)
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
  # Prefer GPU node pool
  nodeSelector:
    cloud.google.com/gke-accelerator: "nvidia-tesla-t4"
  volumes:
  - name: training-data
    persistentVolumeClaim:
      claimName: training-dataset
```

Modern GPUs (NVIDIA A100, H100) support Multi-Instance GPU (MIG) to partition a single GPU into multiple isolated instances. This enables fractional GPU sharing with hardware isolation. Time-sharing solutions like NVIDIA GPU Operator's time-slicing also allow oversubscription without hardware partitioning, useful for development workloads.
Effective resource management is the cornerstone of stable, cost-efficient Kubernetes operations. The key principles from this guide:
- Set requests on every container; they drive scheduling decisions and QoS classification
- Set memory limits with 20-30% headroom above observed peaks, and treat CPU limits with caution because CFS throttling adds latency
- Use ResourceQuotas and LimitRanges together to keep multi-tenant namespaces fair and enforceable
- Plan capacity against node allocatable, not raw node capacity
- Treat throttle metrics and OOM events as first-class health signals
What's Next:
With resource management foundations in place, the next page explores Auto-Scaling with Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA). You'll learn how to automatically adjust resource allocation and replica counts based on real-time demand, eliminating manual capacity management while maintaining performance SLAs.
You now have a comprehensive understanding of Kubernetes resource management—from basic CPU/memory configuration through QoS classes, quotas, limit ranges, and advanced topics like CPU throttling and extended resources. Apply these principles to build stable, efficient, and cost-effective Kubernetes deployments.