While namespaces answer the question "What can a process see?", cgroups (control groups) answer an equally critical question: "How much can a process consume?"
Without cgroups, a containerized process could monopolize CPU time, exhaust system memory, or saturate disk I/O—bringing down not just itself, but the entire host and all other containers. Cgroups provide the resource accounting and limiting mechanisms that make multi-tenant container hosting safe and predictable.
Cgroups are not a container-specific feature—they're a fundamental kernel mechanism for resource management. But containers are their killer application, and understanding cgroups is essential for anyone operating containerized infrastructure.
By the end of this page, you will understand what cgroups are, how they evolved from v1 to v2, the hierarchical structure they form, and the controllers that govern different resource types. You'll learn how to create and configure cgroups, set resource limits, and understand how container runtimes use cgroups to enforce resource constraints.
A control group (cgroup) is a kernel mechanism that organizes processes into hierarchical groups and applies resource management policies to those groups. Unlike namespaces (which provide isolation), cgroups provide:

- Limiting: hard caps on how much of a resource a group may consume
- Prioritization: relative weights that decide allocation under contention
- Accounting: per-group measurement of resource usage
- Control: operations on the whole group, such as freezing and resuming
Every process in the system belongs to exactly one cgroup for each controller (resource type). The cgroups form a hierarchy—a tree structure where child cgroups inherit properties from their parents and can have additional constraints applied.
Think of cgroups as organizational units in a company's budget system. Departments (cgroups) are organized hierarchically. Each department has a budget allocation (resource limits). Employees (processes) belong to departments. The sum of department budgets cannot exceed the parent organization's budget (hierarchical limits).
Controllers: The Resource Managers
Cgroups themselves are just organizational structures—the actual resource management is performed by controllers (also called subsystems). Each controller manages a specific resource type: cpu for processor time, cpuset for core and NUMA placement, memory for RAM, io for block devices, pids for process counts, and several more covered in the controller summary table later on this page.
A cgroup can have multiple controllers attached, providing comprehensive resource management from a single hierarchy.
The Hierarchy Structure
Cgroups form a tree where:

- The root cgroup contains every process by default
- Child cgroups are created as subdirectories and inherit their parents' constraints
- A process is placed in a cgroup by writing its PID to that cgroup's cgroup.procs file

cgroup v1 vs cgroup v2

Linux has two cgroup implementations: the original cgroup v1 (introduced in kernel 2.6.24, 2008) and the redesigned cgroup v2 (introduced in kernel 4.5, 2016, maturing through the 5.x series). Understanding both is necessary because many production systems still use v1, while v2 is the future.
cgroup v1: Multiple Hierarchies
In v1, each controller has its own independent hierarchy. A process belongs to one cgroup in the CPU hierarchy, a potentially different cgroup in the memory hierarchy, and another in the I/O hierarchy.
```
/sys/fs/cgroup/
├── cpu/                      # CPU controller hierarchy
│   ├── docker/
│   │   ├── container-a/
│   │   │   ├── cpu.cfs_period_us
│   │   │   ├── cpu.cfs_quota_us
│   │   │   └── tasks
│   │   └── container-b/
│   └── system.slice/
├── memory/                   # Separate memory hierarchy
│   ├── docker/
│   │   ├── container-a/
│   │   │   ├── memory.limit_in_bytes
│   │   │   └── tasks
│   │   └── container-b/
│   └── system.slice/
├── blkio/                    # Separate block I/O hierarchy
│   └── docker/
└── pids/                     # Separate PID limit hierarchy
    └── docker/
```

This multi-hierarchy approach has significant problems:
- A process can sit at /cpu/docker/container-a in one hierarchy but at /memory/system.slice/sshd in another—confusing and error-prone

cgroup v2: Unified Hierarchy
Cgroup v2 mandates a single, unified hierarchy. All controllers share the same tree structure, and a process's position in the hierarchy determines its constraints across all controllers.
```
/sys/fs/cgroup/                # Unified hierarchy root
├── cgroup.controllers         # Available controllers
├── cgroup.subtree_control     # Controllers enabled for subtree
├── docker/
│   ├── container-a/
│   │   ├── cgroup.controllers
│   │   ├── cpu.max            # CPU limit (replaces quota/period)
│   │   ├── memory.max         # Memory limit
│   │   ├── io.max             # I/O limit
│   │   ├── pids.max           # PID limit
│   │   └── cgroup.procs       # Member processes
│   └── container-b/
│       ├── cpu.max
│       ├── memory.max
│       └── ...
└── system.slice/
    └── sshd.service/
```

Key cgroup v2 Improvements
- Pressure Stall Information (PSI): cpu.pressure, memory.pressure, and io.pressure provide standardized pressure metrics
- Thread granularity: cgroup.type can be set to threaded for per-thread control

| Aspect | cgroup v1 | cgroup v2 |
|---|---|---|
| Hierarchy | Multiple (per-controller) | Single (unified) |
| Mount point | /sys/fs/cgroup/<controller> | /sys/fs/cgroup |
| Process placement | Can differ per controller | Same for all controllers |
| CPU limit file | cpu.cfs_quota_us + cpu.cfs_period_us | cpu.max (quota period) |
| Memory limit file | memory.limit_in_bytes | memory.max |
| I/O limit file | blkio.throttle.* | io.max |
| Pressure metrics | Not standardized | PSI (cpu/memory/io.pressure) |
| Delegation | Complex, error-prone | Well-defined rules |
Some systems run in 'hybrid' mode with v2 for some controllers and v1 for others. This is a transitional configuration that should be avoided for new deployments. A controller can only be attached to one hierarchy version at a time.
Cgroups are managed through a pseudo-filesystem mounted at /sys/fs/cgroup. Creating a cgroup is as simple as creating a directory; configuring it involves writing to files in that directory.
Creating a cgroup (v2)
```bash
#!/bin/bash
# Create a cgroup v2 with CPU and memory limits

# 1. Create the cgroup by making a directory
mkdir -p /sys/fs/cgroup/my_container

# 2. Enable controllers for this cgroup's children
# (Controllers must be enabled at each level of the hierarchy)
echo "+cpu +memory +io +pids" > /sys/fs/cgroup/cgroup.subtree_control

# 3. Configure CPU limit: 50% of one CPU (50000 microseconds per 100000)
echo "50000 100000" > /sys/fs/cgroup/my_container/cpu.max

# 4. Configure memory limit: 512MB hard limit
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/my_container/memory.max

# 5. Configure memory soft limit (for reclaim under pressure)
echo $((256 * 1024 * 1024)) > /sys/fs/cgroup/my_container/memory.high

# 6. Configure PID limit: max 100 processes
echo 100 > /sys/fs/cgroup/my_container/pids.max

# 7. Move a process into the cgroup
echo $$ > /sys/fs/cgroup/my_container/cgroup.procs

# 8. Verify membership
cat /proc/$$/cgroup
# Output: 0::/my_container
```

The subtree_control Mechanism
In cgroup v2, controllers must be explicitly enabled for a cgroup's subtree before child cgroups can use them. This is done via the cgroup.subtree_control file:
# At the root, enable controllers
echo "+cpu +memory" > /sys/fs/cgroup/cgroup.subtree_control
# Now children can use cpu and memory controllers
mkdir /sys/fs/cgroup/child
echo "max 100000" > /sys/fs/cgroup/child/cpu.max # Works!
This explicit enablement keeps controller overhead out of subtrees that don't need it and makes delegation boundaries explicit: child cgroups can only use controllers their parent has enabled.
Moving Processes Between cgroups
Processes are moved by writing their PID to the target cgroup's cgroup.procs file:
# Move process 1234 to my_container cgroup
echo 1234 > /sys/fs/cgroup/my_container/cgroup.procs
# Move all threads of a process (cgroup v2)
echo 1234 > /sys/fs/cgroup/my_container/cgroup.procs
# (All threads move together by default in v2)
# Check which cgroup a process is in
cat /proc/1234/cgroup
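On a pure cgroup v2 system, /proc/PID/cgroup contains a single line of the form 0::&lt;path&gt;. A small helper can pull out just the path; this is a sketch, and the function name is ours, not a standard tool:

```shell
#!/bin/bash
# Parse the cgroup v2 entry ("0::<path>") out of /proc/<pid>/cgroup content.
# The function name and structure are illustrative.
cgroup_v2_path() {
  # $1: the contents of /proc/<pid>/cgroup
  local line
  line=$(printf '%s\n' "$1" | grep '^0::')   # v2 entries use hierarchy ID 0
  printf '%s\n' "${line#0::}"                # strip the "0::" prefix
}

# Example: where does the current shell live?
cgroup_v2_path "$(cat /proc/$$/cgroup)"
```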
The No Internal Processes Rule (v2)
Cgroup v2 enforces the "no internal processes" rule: a cgroup with configured controllers cannot have both processes AND child cgroups. Processes must be in leaf cgroups.
/sys/fs/cgroup/
└── parent/ # Has subtree_control configured
├── cgroup.procs # Must be empty if controllers enabled for subtree
├── child-a/
│ └── cgroup.procs # Processes go here (leaf)
└── child-b/
└── cgroup.procs # Or here (leaf)
This rule simplifies resource distribution calculations and prevents ambiguous accounting scenarios.
Removing cgroups
Cgroups are removed by removing their directory, but only if empty:
# This fails if cgroup has processes or children
rmdir /sys/fs/cgroup/my_container
# First, move all processes out
for pid in $(cat /sys/fs/cgroup/my_container/cgroup.procs); do
echo $pid > /sys/fs/cgroup/cgroup.procs
done
# Then remove
rmdir /sys/fs/cgroup/my_container
The CPU controller manages how much CPU time processes in a cgroup receive. It supports both limiting (hard caps) and weighting (relative shares).
CPU Limiting (CFS Bandwidth)
The Completely Fair Scheduler (CFS) implements bandwidth limiting via quota and period:

- period: the length of the scheduling window, in microseconds (100000, i.e. 100ms, by default)
- quota: how much CPU time the cgroup may consume within each period
In cgroup v2, these are combined in cpu.max:
```bash
# cpu.max format: "$quota $period" (microseconds)

# Limit to 50% of one CPU (50ms every 100ms)
echo "50000 100000" > /sys/fs/cgroup/container/cpu.max

# Limit to 200% (2 full CPUs worth)
echo "200000 100000" > /sys/fs/cgroup/container/cpu.max

# Limit to 25% of one CPU
echo "25000 100000" > /sys/fs/cgroup/container/cpu.max

# No limit (default)
echo "max 100000" > /sys/fs/cgroup/container/cpu.max

# Check current CPU limit
cat /sys/fs/cgroup/container/cpu.max
# Output: 50000 100000
```

CPU Weighting (Shares)
When multiple cgroups compete for CPU, the cpu.weight (v2) or cpu.shares (v1) determines relative allocation:
# Default weight is 100
# Double the default weight = 2x relative CPU access
echo 200 > /sys/fs/cgroup/container-a/cpu.weight
echo 100 > /sys/fs/cgroup/container-b/cpu.weight
# container-a gets 2/3 of contested CPU time
# container-b gets 1/3 of contested CPU time
Important: weights only matter when CPU is contested. If container-b is idle, container-a gets all available CPU regardless of weights.
Limits (cpu.max) are hard caps—the kernel throttles processes that exceed their quota. Weights (cpu.weight) only affect distribution when CPU is contested—an idle-weighted container costs nothing. Use limits for billing and isolation guarantees; use weights for fair sharing.
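The quota/period arithmetic behind cpu.max is easy to script. The sketch below (helper names are ours) converts a CPU count such as 1.5, or a Kubernetes-style millicore quantity such as 500m, into the string you would write to cpu.max:

```shell
#!/bin/bash
# Convert a desired CPU share into a cgroup v2 "cpu.max" value.
# quota = cpus * period; period defaults to 100000 microseconds (100ms).
# Helper names are illustrative, not part of any standard tool.
PERIOD_US=100000

cpu_max_from_cpus() {
  # $1: number of CPUs, possibly fractional, e.g. "1.5"
  local quota
  quota=$(awk -v c="$1" -v p="$PERIOD_US" 'BEGIN { printf "%d", c * p }')
  echo "$quota $PERIOD_US"
}

cpu_max_from_millicores() {
  # $1: Kubernetes-style quantity, e.g. "500m" (0.5 CPU) or "2" (2 CPUs)
  local q=$1
  case $q in
    *m) cpu_max_from_cpus "$(awk -v m="${q%m}" 'BEGIN { print m / 1000 }')" ;;
    *)  cpu_max_from_cpus "$q" ;;
  esac
}

cpu_max_from_cpus 1.5          # -> 150000 100000
cpu_max_from_millicores 500m   # -> 50000 100000
```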
CPU Throttling Mechanics
When a cgroup exhausts its quota before the period ends, the kernel throttles all runnable processes in that cgroup. Throttled processes are paused until the next period begins and quota is replenished.
Throttling is visible in cgroup statistics:
cat /sys/fs/cgroup/container/cpu.stat
# usage_usec 1234567890 # Total CPU time used
# user_usec 1000000000 # User-space CPU time
# system_usec 234567890 # Kernel CPU time
# nr_periods 12345 # Number of periods elapsed
# nr_throttled 234 # Number of times throttled
# throttled_usec 5678900 # Total time spent throttled
High nr_throttled or throttled_usec values indicate the cgroup's CPU limit is too low for its workload—it's regularly being paused mid-computation.
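One way to put a number on that is the fraction of elapsed periods that ended in throttling. A minimal sketch (the function name is ours) that computes it from a cpu.stat dump:

```shell
#!/bin/bash
# Compute the percentage of CFS periods in which a cgroup was throttled,
# from the text of its cpu.stat file. Function name is illustrative.
throttle_pct() {
  # $1: contents of cpu.stat
  printf '%s\n' "$1" | awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f%%\n", 100 * throttled / periods
      else             print "0.0%"
    }'
}

# Example with the sample numbers above:
stat="nr_periods 12345
nr_throttled 234"
throttle_pct "$stat"   # -> 1.9%
```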
The cpuset Controller
The cpuset controller constrains which CPUs (cores) and which memory nodes (NUMA) a cgroup can use:
# Pin cgroup to CPUs 0-1 (cores 0 and 1 only)
echo "0-1" > /sys/fs/cgroup/container/cpuset.cpus
# Pin to NUMA node 0 for memory allocation
echo "0" > /sys/fs/cgroup/container/cpuset.mems
Cpuset is critical for latency-sensitive workloads that must avoid core migrations, for NUMA-aware applications that need local memory, and for dedicating cores to specific containers.
Kubernetes CPU Requests and Limits
Kubernetes' CPU configuration maps directly to cgroups:
resources:
requests:
cpu: "500m" # Converts to cpu.weight proportional share
limits:
cpu: "2" # Converts to cpu.max: 200000 100000 (2 cores)
- requests.cpu affects scheduling decisions and sets cpu.shares (v1) / cpu.weight (v2)
- limits.cpu sets the cpu.max quota for hard throttling

The memory controller is arguably the most critical for container stability. It tracks memory usage, enforces limits, and handles out-of-memory conditions for cgroups.
Memory Accounting
The memory controller accounts for:

- Anonymous memory (heap and stack allocations)
- Page cache (file-backed pages)
- Kernel memory (slab allocations, kernel stacks)
- Socket buffers and shared memory
In cgroup v2, comprehensive accounting is available via memory.stat:
```bash
$ cat /sys/fs/cgroup/container/memory.stat
anon 52428800              # Anonymous memory (50 MB)
file 26214400              # Page cache (25 MB)
kernel 4194304             # Kernel memory (4 MB)
sock 8192                  # Socket buffers
shmem 0                    # Shared memory
file_mapped 10485760       # Mapped file pages
file_dirty 4096            # Dirty file pages
file_writeback 0           # Pages being written back
slab 2097152               # Slab allocator
pgfault 1234567            # Page faults
pgmajfault 42              # Major page faults (disk reads)
workingset_refault 12345   # Refaults from working set
workingset_activate 6789   # Working set activations
oom_kill 0                 # OOM kills in this cgroup
```

Memory Limits
Cgroup v2 provides four limit and protection levels:
```bash
# Hard limit: 1GB - OOM if exceeded
echo $((1024 * 1024 * 1024)) > /sys/fs/cgroup/container/memory.max

# Soft limit: 768MB - Aggressive reclaim above this
echo $((768 * 1024 * 1024)) > /sys/fs/cgroup/container/memory.high

# Protected minimum: 256MB - Won't be reclaimed below this
echo $((256 * 1024 * 1024)) > /sys/fs/cgroup/container/memory.min

# Soft minimum: 512MB - Protected unless system under severe pressure
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/container/memory.low

# Check current memory usage
cat /sys/fs/cgroup/container/memory.current
# Output: 52428800 (50 MB in bytes)
```

The OOM Killer
When a cgroup reaches its memory.max and cannot reclaim enough memory, the kernel invokes the OOM killer within that cgroup. The OOM killer selects and terminates processes to free memory, considering each candidate's memory footprint (its oom_score) and any oom_score_adj adjustment set by the administrator.
In containerized environments, OOM kills are confined to the offending cgroup—they don't kill processes in other containers or the host. This is essential for isolation.
OOM events are logged and visible:
# Watch for OOM events
cat /sys/fs/cgroup/container/memory.events
# low 0 # Times memory dropped below memory.low
# high 42 # Times memory exceeded memory.high
# max 3 # Times memory hit memory.max
# oom 1 # OOM events
# oom_kill 1 # Processes killed by OOM
# Check cgroup OOM kill count
cat /sys/fs/cgroup/container/memory.stat | grep oom_kill
If the container's main process (PID 1 inside the container) is OOM-killed, the entire container dies. Memory limits should allow headroom, and critical applications should monitor memory.events for early warning of memory pressure.
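A monitoring script can poll memory.events and raise the alarm before the max and oom_kill counters move. A minimal sketch (the function name is ours; field meanings follow the cgroup v2 documentation):

```shell
#!/bin/bash
# Summarize a cgroup's memory.events contents, flagging signs of pressure.
# The function is an illustrative sketch, not a standard tool.
memory_events_summary() {
  # $1: contents of memory.events
  printf '%s\n' "$1" | awk '
    $1 == "high"     { high = $2 }
    $1 == "max"      { max = $2 }
    $1 == "oom_kill" { kills = $2 }
    END {
      if (kills > 0)     print "CRITICAL: " kills " OOM kill(s)"
      else if (max > 0)  print "WARNING: hit memory.max " max " time(s)"
      else if (high > 0) print "NOTICE: exceeded memory.high " high " time(s)"
      else               print "OK"
    }'
}

# Example with the sample counters above:
events="low 0
high 42
max 3
oom 1
oom_kill 1"
memory_events_summary "$events"   # -> CRITICAL: 1 OOM kill(s)
```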
Memory Reclaim Under Pressure
Before invoking OOM, the kernel attempts reclaim:

- Dropping clean page cache (cheap—pages can be re-read from disk)
- Writing back dirty pages so they too can be reclaimed
- Swapping out anonymous memory, if swap is available to the cgroup
When approaching memory.high, the kernel aggressively reclaims, which throttles the cgroup. Applications see increased latency as the kernel works to free memory. Monitoring memory.high events helps identify workloads that need more memory or optimization.
Swap Behavior
Cgroup v2's memory.swap.max controls swap usage per-cgroup:
# Allow up to 512MB swap
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/container/memory.swap.max
# Disable swap for this cgroup (common for containers)
echo 0 > /sys/fs/cgroup/container/memory.swap.max
# Check current swap usage
cat /sys/fs/cgroup/container/memory.swap.current
Containers typically disable swap to keep latency predictable (no transparent slowdown from swapping), to make OOM behavior deterministic, and to keep memory accounting honest against the configured limit.
Beyond CPU and memory, cgroups provide controllers for I/O, process count, devices, and more.
I/O Controller
The I/O controller (cgroup v2's io or v1's blkio) manages block device I/O bandwidth and IOPS:
# Limit I/O to 10 MB/s read, 5 MB/s write for device 8:0
echo "8:0 rbps=10485760 wbps=5242880" > /sys/fs/cgroup/container/io.max
# Limit IOPS: 100 read IOPS, 50 write IOPS
echo "8:0 riops=100 wiops=50" > /sys/fs/cgroup/container/io.max
# Combine bandwidth and IOPS limits
echo "8:0 rbps=10485760 wbps=5242880 riops=1000 wiops=500" > /sys/fs/cgroup/container/io.max
# Check I/O statistics
cat /sys/fs/cgroup/container/io.stat
# 8:0 rbytes=1234567 wbytes=7654321 rios=100 wios=50 dbytes=0 dios=0
Device identifiers (8:0) are major:minor numbers. Find them with:
ls -la /dev/sda # Shows major:minor
cat /sys/block/sda/dev # Shows major:minor
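Building the io.max line can be scripted as well. This sketch (the helper name is ours) takes a device's major:minor plus read/write limits in MB/s and emits the string you would write:

```shell
#!/bin/bash
# Build a cgroup v2 io.max configuration line from human-friendly MB/s values.
# Helper name is illustrative; io.max expects bytes per second.
io_max_line() {
  # $1: device major:minor (e.g. "8:0")
  # $2: read limit in MB/s, $3: write limit in MB/s
  local rbps=$(( $2 * 1024 * 1024 ))
  local wbps=$(( $3 * 1024 * 1024 ))
  echo "$1 rbps=$rbps wbps=$wbps"
}

io_max_line 8:0 10 5   # -> 8:0 rbps=10485760 wbps=5242880
# Would be applied with something like:
#   io_max_line 8:0 10 5 > /sys/fs/cgroup/container/io.max
```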
I/O Weight (Proportional Sharing)
Like CPU weights, I/O weights determine relative bandwidth when devices are contested:
# Default weight is 100, range is 1-10000
echo "8:0 200" > /sys/fs/cgroup/container-a/io.weight
echo "8:0 100" > /sys/fs/cgroup/container-b/io.weight
# container-a gets 2x the I/O bandwidth of container-b when contested
PIDs Controller
The PIDs controller limits the number of processes (and threads) in a cgroup, preventing fork bombs:
# Limit to 100 processes
echo 100 > /sys/fs/cgroup/container/pids.max
# Check current process count
cat /sys/fs/cgroup/container/pids.current
# Output: 42
# Check if limit was ever hit
cat /sys/fs/cgroup/container/pids.events
# max 0 (number of times fork failed due to limit)
The PIDs controller is essential for multi-tenant systems. Without it, a malicious or buggy container could fork-bomb the entire host. Kubernetes and Docker set pids.max by default.
Devices Controller
The devices controller (v1, being replaced by eBPF in v2) whitelists which devices a cgroup can access:
# In cgroup v1:
# Deny all devices by default
echo "a" > /sys/fs/cgroup/devices/container/devices.deny
# Allow /dev/null (char 1:3)
echo "c 1:3 rwm" > /sys/fs/cgroup/devices/container/devices.allow
# Allow /dev/urandom (char 1:9)
echo "c 1:9 r" > /sys/fs/cgroup/devices/container/devices.allow
Format: [type] [major:minor] [permissions]
- Type: c (char), b (block), or a (all)
- Major:minor: device numbers (* acts as a wildcard)
- Permissions: any of r (read), w (write), m (mknod)

Freezer Controller
The freezer controller suspends and resumes all processes in a cgroup:
# Freeze (suspend) all processes
echo 1 > /sys/fs/cgroup/container/cgroup.freeze
# Resume all processes
echo 0 > /sys/fs/cgroup/container/cgroup.freeze
# Check frozen state
cat /sys/fs/cgroup/container/cgroup.freeze
Used for container pause/unpause (docker pause uses the freezer), taking consistent snapshots, and checkpoint/restore workflows such as CRIU.
| Controller | Controls | Key Files (v2) | Primary Use |
|---|---|---|---|
| cpu | CPU time | cpu.max, cpu.weight | Limit/share CPU |
| cpuset | CPU/NUMA affinity | cpuset.cpus, cpuset.mems | Pin to cores |
| memory | Memory usage | memory.max, memory.high | Limit RAM |
| io | Block I/O | io.max, io.weight | Limit disk I/O |
| pids | Process count | pids.max | Prevent fork bomb |
| devices | Device access | (v1: devices.allow) | Device whitelist |
| freezer | Process execution | cgroup.freeze | Suspend/resume |
| hugetlb | Huge pages | hugetlb.*.max | Limit huge pages |
Container runtimes (Docker, containerd, CRI-O, Podman) abstract cgroup management, but understanding the mapping helps with troubleshooting and optimization.
Docker cgroup Management
When you run a Docker container with resource limits:
docker run -d \
  --name myapp \
  --cpus="1.5" \
  --memory="512m" \
  --memory-swap="512m" \
  --pids-limit=100 \
  nginx
Docker creates a cgroup (the path depends on cgroup driver) and sets:
| Docker Flag | cgroup v2 File | Value |
|---|---|---|
--cpus=1.5 | cpu.max | 150000 100000 |
--memory=512m | memory.max | 536870912 |
--memory-swap=512m | memory.swap.max | 0 (same as mem = no swap) |
--pids-limit=100 | pids.max | 100 |
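The size conversion Docker performs for --memory can be reproduced in a few lines. A sketch (the function name is ours) mapping k/m/g suffixes to the byte values that end up in memory.max:

```shell
#!/bin/bash
# Convert Docker-style size suffixes (k/m/g) into the byte value
# written to memory.max. Function name is illustrative.
to_bytes() {
  local v=$1
  case $v in
    *k|*K) echo $(( ${v%?} * 1024 )) ;;
    *m|*M) echo $(( ${v%?} * 1024 * 1024 )) ;;
    *g|*G) echo $(( ${v%?} * 1024 * 1024 * 1024 )) ;;
    *)     echo "$v" ;;   # bare number: already bytes
  esac
}

to_bytes 512m   # -> 536870912  (matches the memory.max value in the table)
to_bytes 1g     # -> 1073741824
```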
Finding a Container's cgroup
# Get container's cgroup path
docker inspect myapp --format '{{.HostConfig.CgroupParent}}'
# Or check via the container's main PID
PID=$(docker inspect myapp --format '{{.State.Pid}}')
cat /proc/$PID/cgroup
# For cgroup v2, typically:
# 0::/system.slice/docker-<container-id>.scope
# Navigate to cgroup
cd /sys/fs/cgroup/system.slice/docker-$(docker inspect myapp -f '{{.Id}}').scope
ls
# cpu.max cpu.stat memory.current memory.max pids.current ...
Kubernetes cgroup Structure
Kubernetes organizes cgroups hierarchically:
/sys/fs/cgroup/
└── kubepods.slice/ # All pod cgroups
├── kubepods-burstable.slice/ # Burstable QoS pods
│ └── kubepods-burstable-pod<uid>.slice/
│ └── cri-containerd-<cid>.scope/ # Container cgroup
├── kubepods-besteffort.slice/ # BestEffort QoS pods
└── kubepods-guaranteed.slice/ # Guaranteed QoS pods
This structure enables per-QoS-class resource distribution, per-pod accounting, and reserving node resources for system daemons outside kubepods.slice.
systemd cgroup Integration
Modern Linux systems use systemd as init, and systemd manages cgroups via slices, scopes, and services:

- Slices (*.slice) are hierarchy nodes that group other units and carry resource settings
- Scopes (*.scope) wrap processes started outside systemd (containers, user sessions)
- Services (*.service) are processes that systemd itself starts and supervises
Docker can use the systemd-cgroup driver (recommended for systemd-based hosts):
// /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
This delegates cgroup management to systemd, ensuring consistency between systemd's view and Docker's view of resource usage.
Debugging Resource Issues
When containers misbehave, check cgroup stats:
# Check if CPU throttled
cat /sys/fs/cgroup/.../cpu.stat | grep throttled
# Check memory pressure
cat /sys/fs/cgroup/.../memory.pressure
# some avg10=0.00 avg60=0.00 avg300=0.00 total=0
# full avg10=0.00 avg60=0.00 avg300=0.00 total=0
# Check for OOM events
cat /sys/fs/cgroup/.../memory.events | grep oom
# Real-time monitoring with bpftrace or systemd-cgtop
systemd-cgtop
Pressure Stall Information (PSI) in cgroup v2 provides standardized metrics for resource contention. Monitor cpu.pressure, memory.pressure, and io.pressure for early warning of resource exhaustion before OOM or throttling becomes severe.
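PSI lines share a fixed format, so extracting a single average is straightforward. A sketch (the function name is ours) that pulls avg10 from the some line of any *.pressure file's contents:

```shell
#!/bin/bash
# Extract the 10-second "some" pressure average from PSI file contents
# (cpu.pressure / memory.pressure / io.pressure share this format).
# Function name is illustrative.
psi_some_avg10() {
  # $1: contents of a *.pressure file
  printf '%s\n' "$1" | awk '
    $1 == "some" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^avg10=/) { sub(/^avg10=/, "", $i); print $i }
    }'
}

pressure="some avg10=1.25 avg60=0.80 avg300=0.40 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=0"
psi_some_avg10 "$pressure"   # -> 1.25
```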
We've explored Linux control groups comprehensively. Let's consolidate the key takeaways:

- Namespaces control what a process can see; cgroups control how much it can consume
- cgroup v2 replaces v1's per-controller hierarchies with a single unified tree and standardized files (cpu.max, memory.max, io.max, pids.max)
- Controllers enforce limits, weights, and accounting; PSI metrics give early warning of contention
- Container runtimes translate flags like --cpus and --memory directly into writes to cgroup files
What's next:
We now understand the two foundational container primitives: namespaces for isolation and cgroups for resource control. The next page focuses on resource limiting in practice—how to calculate appropriate limits, common patterns (requests vs limits), overcommitment strategies, and real-world tuning based on workload characteristics. We'll connect the theoretical cgroup knowledge to practical container sizing decisions.
You now understand Linux control groups—the resource management primitive that complements namespaces for containerization. You know how to create cgroups, configure controllers, set limits, and interpret statistics. Next, we'll apply this knowledge to practical resource limiting scenarios.