Of the eight Linux namespace types, three form the essential foundation for practical containerization: PID namespaces, network namespaces, and mount namespaces. These three isolate the most critical system resources—processes, networking, and filesystems—that determine what a container can see, access, and modify.
Understanding these namespaces in depth reveals how container runtimes like Docker, containerd, and CRI-O construct the isolated environments that power modern cloud infrastructure. Each namespace type has unique characteristics, hierarchy rules, and interaction patterns that directly impact container behavior.
By the end of this page, you will understand the internal mechanics of PID, network, and mount namespaces in precise detail. You'll learn about PID namespace hierarchies, network namespace connectivity via virtual interfaces, and mount propagation semantics. This knowledge enables you to debug container issues, implement custom runtime features, and understand why containers behave the way they do.
PID namespaces provide the foundational illusion that each container has its own complete process tree, starting from PID 1. This isolation prevents containers from seeing, signaling, or interfering with processes outside their namespace.
The PID 1 Problem
In UNIX tradition, PID 1 is special—it's the init process, responsible for reaping orphaned processes, handling system-wide signals, and starting and supervising other processes.
When Docker runs a container, the container's main process becomes PID 1 inside the PID namespace. This has profound implications that catch many developers off guard.
PID 1 has special signal handling semantics in the kernel. Unlike other processes, PID 1 only receives signals for which it has explicitly installed handlers. A SIGTERM sent to a PID 1 with no handler is silently ignored (even SIGKILL is ignored when sent from within the same PID namespace; only an ancestor namespace can forcibly kill it). This is why docker stop sometimes times out—the container process ignores SIGTERM because it never expected to be PID 1.
PID Namespace Hierarchy
PID namespaces form a strict parent-child hierarchy. This is unique among namespace types—most other namespaces are flat collections.
When process A in namespace N creates a child PID namespace N', then:

- Processes in N' are visible from N (under different, N-local PIDs)
- Processes in N are invisible from inside N'
- The first process created in N' becomes that namespace's PID 1
This hierarchy is enforced: you cannot join a PID namespace that is an ancestor of your current one—that would break the isolation model.
Dual PID Visibility
A key insight is that processes in nested PID namespaces have multiple PIDs simultaneously—one in each ancestor namespace including their own. The kernel maintains this mapping internally:
```bash
# From host namespace
$ ps aux | grep nginx
root  2001  nginx: master process   # Host PID
root  2002  nginx: worker process   # Host PID

# From inside container
$ ps aux
PID  CMD
1    nginx: master process          # Container PID
2    nginx: worker process          # Container PID
```
The process has PID 2001 in the host namespace and PID 1 in the container namespace. Both are valid; which one you see depends on which namespace you're observing from.
The /proc Interface
Each PID namespace maintains its own view of /proc. When a container mounts its own /proc, it only shows processes within its PID namespace. This is why ps inside a container only shows container processes—it's reading a namespace-scoped /proc.
Zombie Reaping
When a process's parent dies, it becomes orphaned and is reparented to PID 1 of its PID namespace (not the host's PID 1). If the container's PID 1 doesn't properly implement wait() for children, zombies accumulate within the container. This is why init systems like tini or dumb-init exist—they handle zombie reaping for containers whose main process wasn't designed to be PID 1.
```c
// Simple init process that properly reaps zombies.
// Used as PID 1 in containers to wrap the real application.
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void sigchld_handler(int sig) {
    (void)sig;   // No-op: we only need SIGCHLD to be deliverable
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    // Install a SIGCHLD handler so the signal is actually delivered
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);

    // Block SIGCHLD; sigsuspend() below unblocks it atomically,
    // avoiding the lost-wakeup race of a plain pause()
    sigset_t block, orig;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &orig);

    // Fork and exec the real application
    pid_t child = fork();
    if (child == 0) {
        sigprocmask(SIG_SETMASK, &orig, NULL);
        execvp(argv[1], &argv[1]);
        _exit(127);
    }

    // Parent: reap every terminated child; when the main child exits,
    // propagate its exit status
    for (;;) {
        int status;
        pid_t pid;
        // Reap all available zombie children
        while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
            if (pid == child) {
                if (WIFSIGNALED(status))
                    return 128 + WTERMSIG(status);
                return WEXITSTATUS(status);
            }
        }
        sigsuspend(&orig);   // sleep until the next SIGCHLD
    }
}
```

Network namespaces provide complete network stack isolation. Each network namespace has its own:

- Network interfaces (physical or virtual)
- IP addresses and routing tables
- Netfilter (iptables/nftables) rules
- Port number space and sockets
- /proc/net and /sys/class/net views
A new network namespace starts nearly empty—it contains only an unconfigured loopback interface (lo). To be useful, network devices must be created or moved into it, and routing must be configured.
Network namespaces isolate at layer 3 and above. A container has its own IP addresses, routes, and firewall—it's as if it had a completely separate network stack. This is more thorough than just IP aliasing or port translation; it's true stack virtualization.
Virtual Ethernet Pairs (veth)
The primary mechanism for connecting network namespaces is the veth pair—two virtual Ethernet interfaces that act as a bidirectional pipe. Packets transmitted on one interface appear as received on the other.
Veth pairs are created as a linked pair, then one end is moved to a different namespace:
```bash
# Create veth pair: veth0 and veth1
ip link add veth0 type veth peer name veth1

# Move veth1 to container's network namespace (by PID)
ip link set veth1 netns $CONTAINER_PID

# Configure host end
ip addr add 10.0.0.1/24 dev veth0
ip link set veth0 up

# Configure container end (run inside the namespace, e.g. via nsenter)
nsenter -t $CONTAINER_PID -n ip addr add 10.0.0.2/24 dev veth1
nsenter -t $CONTAINER_PID -n ip link set veth1 up
nsenter -t $CONTAINER_PID -n ip route add default via 10.0.0.1
```
Now traffic from the container (10.0.0.2) reaches the host (10.0.0.1) through the veth pair. The host can NAT this traffic to the external network, providing internet access.
Bridge Networking
Docker's default networking (the bridge mode) uses a software bridge (docker0) in the host namespace. All container veth endpoints connect to this bridge, enabling:

- Container-to-container communication on the same host
- Outbound internet access via NAT (iptables masquerading)
- Inbound access through published ports (DNAT rules)
The bridge acts as a virtual switch at layer 2. Containers on the same bridge can communicate using their bridge-assigned IPs without NAT.
Host Networking Mode
When a container uses --network=host, it shares the host's network namespace entirely. No isolation exists—the container sees all host interfaces and can bind to any port. This is faster (no veth overhead) but sacrifices isolation.
None Networking Mode
With --network=none, the container gets only an unconfigured loopback interface. It has no network connectivity. This is useful for:

- Batch jobs that need no network access
- Security-sensitive workloads where network access must be impossible
- Containers whose networking is configured by an external tool (such as a CNI plugin)
| Mode | Network Namespace | Interfaces | Performance | Isolation |
|---|---|---|---|---|
| bridge (default) | Separate per container | veth pair + bridge | Good (small overhead) | Full network isolation |
| host | Shared with host | All host interfaces | Native (no overhead) | None (shared stack) |
| none | Separate, empty | Only lo (unconfigured) | N/A (no networking) | Complete (no network) |
| container:<id> | Shared with another container | Shared with target | Good (shared namespace) | Shared with target container |
| macvlan | Separate per container | Virtual MAC on host NIC | Near native | Layer 2 separation |
CNI (Container Network Interface)
Kubernetes and other orchestrators use CNI plugins to configure container networking. The CNI specification defines a simple interface: the runtime invokes a plugin binary with a command (ADD, DEL, CHECK, VERSION) in the CNI_COMMAND environment variable, passes the network configuration as JSON on stdin, and reads the result (assigned IPs, routes, DNS) as JSON from stdout.
Popular CNI plugins (Calico, Cilium, Flannel) implement various networking models—overlay networks, BGP peering, eBPF-based routing—all using the same fundamental namespace primitives.
Mount namespaces isolate the filesystem mount table—the kernel data structure that maps directory paths to mounted filesystems. Each mount namespace has its own independent mount table, enabling containers to have entirely different filesystem views than the host.
The Mount Table
Every mount namespace maintains a complete, independent mount table. When a process mounts a filesystem, it only affects processes in the same mount namespace. This enables:
- Container-specific views of /proc, /sys, and /dev with appropriate visibility
- Private temporary filesystems (tmpfs) for container scratch space
```bash
#!/bin/bash
# Demonstrate mount namespace isolation

# Create a new mount namespace
unshare --mount /bin/bash << 'INNER_SHELL'
echo "Inside new mount namespace"

# This mount is only visible in this namespace
mkdir -p /tmp/isolated-demo
mount -t tmpfs tmpfs /tmp/isolated-demo
echo "secret-data" > /tmp/isolated-demo/secret.txt

echo "File exists here:"
cat /tmp/isolated-demo/secret.txt

# Check mount from inside
mount | grep isolated-demo
# Output: tmpfs on /tmp/isolated-demo type tmpfs (rw,relatime)
INNER_SHELL

# Back in original namespace
echo "Outside the mount namespace:"
cat /tmp/isolated-demo/secret.txt 2>/dev/null || echo "File not accessible!"
mount | grep isolated-demo || echo "Mount not visible here!"
```

Mount Propagation
Mount namespaces have a critical feature called mount propagation that controls how mount/unmount events propagate between namespace instances. This is essential for scenarios where the host's mounts should (or should not) be visible inside containers.
There are four propagation types:

- shared: mount and unmount events propagate to and from peer mounts
- slave: events propagate from the master to the slave, but not back
- private: no propagation in either direction (the typical container default)
- unbindable: private, and additionally cannot be used as the source of a bind mount
These propagation semantics enable sophisticated mount configurations:
| Propagation | Host Mounts Visible in Container? | Container Mounts Visible on Host? | Use Case |
|---|---|---|---|
| shared | Yes (after container start) | Yes | Shared filesystem pools |
| slave | Yes (after container start) | No | Dynamic host mounts (USB, NFS) |
| private | No (only initial mounts) | No | Full isolation (typical container) |
| unbindable | No | No | Security-sensitive mounts |
Container Root Filesystem
Container images provide a root filesystem that becomes the container's /. The container runtime uses mount namespaces and overlay filesystems to achieve this:

1. Mount the image's layers as an overlay filesystem
2. Use pivot_root() or chroot() to change root to the overlay mount
3. Mount container-specific /proc, /sys, /dev

OverlayFS: The Container Filesystem
OverlayFS is a union filesystem that overlays multiple directory trees, presenting a unified view. For containers:

- lowerdir: the read-only image layers (possibly several, stacked)
- upperdir: a writable, per-container layer
- workdir: an empty scratch directory OverlayFS uses internally
- merged: the unified mount point that becomes the container's root

When a container reads a file, OverlayFS searches from top to bottom: the upperdir first, then each lowerdir in order, returning the first match.

When a container writes: a file that exists only in a lowerdir is first copied up to the upperdir (copy-on-write), and the write then modifies the upper copy; deletions are recorded as whiteout entries in the upperdir.

This architecture enables:

- Image layers shared read-only between all containers using the same image
- Near-instant container start (no copying of the image)
- A small per-container footprint (only the upperdir diverges)
The first write to a large file triggers a full copy from lowerdir to upperdir. For write-heavy workloads on large files (databases, log files), use volumes that bypass OverlayFS entirely to avoid copy-on-write overhead.
To solidify our understanding, let's walk through creating a minimal container using the namespace primitives directly. This is essentially what container runtimes do, stripped to essentials.
```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/sysmacros.h>   /* makedev() */
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE];

// Container rootfs path (e.g., extracted alpine rootfs)
const char *rootfs = "./rootfs";

void setup_mounts() {
    // Make all mounts private to prevent propagation
    mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);

    // Bind mount the rootfs to itself (prepare for pivot_root)
    mount(rootfs, rootfs, NULL, MS_BIND | MS_REC, NULL);

    // Create directory for pivot_root's old root
    char old_root[256];
    snprintf(old_root, sizeof(old_root), "%s/.old_root", rootfs);
    mkdir(old_root, 0755);

    // Change root to container rootfs
    if (syscall(SYS_pivot_root, rootfs, old_root) == -1) {
        perror("pivot_root");
        exit(1);
    }

    // Change working directory to new root
    chdir("/");

    // Unmount old root
    umount2("/.old_root", MNT_DETACH);
    rmdir("/.old_root");

    // Mount essential filesystems
    mount("proc", "/proc", "proc", 0, NULL);
    mount("sysfs", "/sys", "sysfs", 0, NULL);
    mount("tmpfs", "/tmp", "tmpfs", 0, NULL);

    // Mount minimal /dev
    mount("tmpfs", "/dev", "tmpfs", MS_NOSUID | MS_STRICTATIME, "mode=755");
    mknod("/dev/null", S_IFCHR | 0666, makedev(1, 3));
    mknod("/dev/zero", S_IFCHR | 0666, makedev(1, 5));
    mknod("/dev/random", S_IFCHR | 0666, makedev(1, 8));
    mknod("/dev/urandom", S_IFCHR | 0666, makedev(1, 9));
}

int container_main(void *arg) {
    char **argv = (char **)arg;

    // Set container hostname (in the new UTS namespace)
    sethostname("container", 9);

    // Setup container filesystem
    setup_mounts();

    printf("Container started (PID %d inside namespace)\n", getpid());
    printf("Hostname: ");
    fflush(stdout);
    system("hostname");

    // Execute the specified command
    execvp(argv[0], argv);
    perror("execvp");
    return 1;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <command> [args...]\n", argv[0]);
        return 1;
    }

    printf("Starting container with namespaces...\n");

    // Create child process in new namespaces
    int flags = CLONE_NEWPID |  // New PID namespace
                CLONE_NEWNS  |  // New mount namespace
                CLONE_NEWUTS |  // New UTS namespace (hostname)
                CLONE_NEWNET |  // New network namespace
                CLONE_NEWIPC |  // New IPC namespace
                SIGCHLD;

    pid_t child_pid = clone(container_main, child_stack + STACK_SIZE,
                            flags, &argv[1]);
    if (child_pid == -1) {
        perror("clone");
        return 1;
    }

    printf("Container PID from host perspective: %d\n", child_pid);

    // Wait for container to exit
    int status;
    waitpid(child_pid, &status, 0);
    printf("Container exited with status %d\n", WEXITSTATUS(status));
    return WEXITSTATUS(status);
}
```

What This Code Does
Creates namespaces: Uses clone() with namespace flags to create PID, mount, UTS, network, and IPC namespaces
Sets up mount namespace:

- Makes existing mounts private and bind-mounts the container rootfs
- Uses pivot_root() to change the root to the container rootfs
- Mounts /proc, /sys, /tmp with appropriate filesystem types
- Creates a minimal /dev with essential device nodes

Configures hostname: Sets a container-specific hostname in the UTS namespace
Executes command: Runs the specified command inside the isolated environment
This is a simplified version of what Docker or runc does. A production container runtime adds:

- User namespaces with UID/GID mapping
- cgroups for resource limits
- seccomp filters and capability dropping for security hardening
- Network setup (veth pairs, bridges, port mapping)
- Image management (pulling, layer extraction, OverlayFS assembly)
- Error handling, lifecycle management, and an API
The three namespaces we've studied interact in subtle ways. Understanding these interactions is crucial for debugging container issues.
PID and Mount Namespace Interaction
The /proc filesystem is PID-namespace aware. When you mount proc inside a mount namespace that's also in a different PID namespace, the mounted /proc shows only processes in that PID namespace.
```bash
# This won't work as expected:
unshare --mount /bin/bash
mount -t proc proc /proc
ps aux   # Still shows host processes!

# You need both namespaces:
unshare --mount --pid --fork /bin/bash
mount -t proc proc /proc
ps aux   # Now shows only container processes
```
The --fork is required with --pid because the calling process is already assigned a PID. Only new processes can become PID 1 in the new namespace.
If ps inside a container shows host processes, check: (1) Is /proc mounted inside the container? (2) Was /proc mounted from within the container's PID namespace? A /proc mounted before PID namespace creation shows the old PID namespace's processes.
Network and Mount Namespace Interaction
/proc/net and /sys/class/net reflect the network namespace of the viewing process, not the mount namespace. This means:
```bash
# Inside a container that shares the host mount namespace but has a separate netns
ls /sys/class/net   # Shows container interfaces, not host
cat /proc/net/tcp   # Shows container's TCP connections
```
The kernel dynamically generates these entries based on the network namespace, regardless of mount configuration.
Signals Across PID Namespaces
Processes can only signal processes in the same PID namespace or descendant namespaces. A container cannot kill host processes—not just because of permissions, but because from the container's perspective, those PIDs don't exist.
However, the host can signal container processes using their host-visible PIDs:
```bash
# From host, kill a process inside the container
kill -SIGTERM 2001   # Using the host PID, not container PID 1
```
Network Namespace and Loopback
Each network namespace has its own loopback interface (lo). A common issue: the loopback isn't automatically brought up in new namespaces.
```bash
unshare --net /bin/bash
ip link show      # lo exists but is DOWN
ping 127.0.0.1    # Network unreachable!

# Must manually bring up loopback:
ip link set lo up
ping 127.0.0.1    # Now works
```
Container runtimes handle this automatically, but custom namespace usage requires explicit loopback configuration.
Namespaces are designed to be lightweight, but they're not free. Understanding their performance characteristics helps in designing efficient container architectures.
PID Namespace Overhead
PID namespaces add minimal runtime overhead:

- PID lookups translate through the namespace hierarchy (a few extra pointer dereferences)
- fork()/clone() allocates one PID entry per namespace level
- Signal delivery checks namespace membership
In practice, PID namespace overhead is negligible—a few nanoseconds per operation. The hierarchy is typically shallow (1-3 levels), and the kernel optimizes lookups.
Network Namespace Overhead
Network namespaces introduce measurable overhead through veth pairs:

- Each packet crosses the veth pair, adding an extra traversal of the network stack
- Bridge forwarding adds layer 2 lookup work
- NAT (iptables/conntrack) adds per-connection tracking cost
For network-intensive workloads, options to reduce overhead:

- Host networking (--network=host)
- macvlan interfaces attached directly to the host NIC

| Metric | Host Native | Bridge (veth) | macvlan | Host Mode |
|---|---|---|---|---|
| Latency (μs) | Baseline | +10-20 | +5-10 | Baseline |
| Throughput (%) | 100% | 90-95% | 95-99% | 100% |
| CPU Overhead | Baseline | 5-15% | 2-5% | Baseline |
| Isolation | None | Full | L2 only | None |
Mount Namespace Overhead
Mount namespaces themselves add negligible overhead—mount lookups traverse the same VFS structures. The overhead comes from what you mount:

- OverlayFS adds a lookup cost per layer and a one-time copy-up cost on first write
- Deep layer stacks increase metadata overhead; volumes bypass these costs entirely
Memory Overhead
Namespaces consume kernel memory:

- A network namespace is the heaviest: it allocates its own interface tables, routing tables, and netfilter state
- A mount namespace copies its parent's entire mount table
- PID, UTS, and IPC namespaces are comparatively tiny
For thousands of containers, this adds up. Shared namespaces (e.g., multiple containers sharing a network namespace) reduce memory usage.
PID 1 Considerations
The containerized PID 1 receives special treatment:

- Unhandled signals are ignored rather than triggering default actions
- It inherits orphaned descendants and must reap them
- If it exits, the kernel kills every other process in the namespace
Opting for a proper init process (tini, dumb-init) adds minimal overhead but ensures correct signal handling and zombie reaping.
For most workloads, namespace overhead is negligible. Focus optimization on: (1) Minimize OverlayFS layer count for I/O-intensive workloads, (2) Consider host networking for network-intensive workloads requiring lowest latency, (3) Share namespaces between related containers (pods) to reduce overhead and enable efficient communication.
We've explored the three most critical namespace types for containerization in depth. Let's consolidate the key takeaways:

- PID namespaces form a strict hierarchy; a process has one PID per ancestor namespace, and the namespace's PID 1 has special signal and zombie-reaping semantics
- Network namespaces virtualize the entire stack; veth pairs and bridges connect them, and the networking mode trades isolation against performance
- Mount namespaces give each container its own mount table; propagation modes and OverlayFS complete the container filesystem picture
What's next:
Namespaces provide isolation—they limit what containers can see. But seeing resources and consuming them are different concerns. The next page introduces cgroups (control groups), the Linux kernel feature that limits how much of shared resources (CPU, memory, I/O) each container can consume. Together, namespaces and cgroups form the complete container resource model.
You now understand PID, network, and mount namespaces in depth—how they work internally, how they interact, and how container runtimes use them to create isolated environments. Next, we'll explore cgroups for resource control and limiting.