In the previous page, we explored the concepts and strategies behind system call filtering. Now we dive deep into seccomp (secure computing mode)—Linux's kernel facility for implementing these filters.
Seccomp represents one of the most important security innovations in the Linux kernel. Introduced progressively between 2005 and 2012, it has become the foundation of sandboxing in Chrome, Docker, Android, systemd, and countless other security-critical systems. Understanding seccomp is essential for anyone building or analyzing sandboxed systems on Linux.
Seccomp operates at the system call boundary, making decisions about whether to allow, deny, or otherwise handle each syscall before it executes. Because it runs in kernel context, seccomp provides guarantees that user-space monitoring cannot: there is no race condition between the check and the syscall execution.
By the end of this page, you will understand seccomp's architecture, the BPF programming model used for filters, how to write and install seccomp filters, the seccomp user notification mechanism for complex policies, and practical patterns used in production systems.
Seccomp operates at a critical point in the kernel: the system call entry path. When a user-space process invokes a system call, the kernel's entry code checks whether the calling thread has seccomp filters installed and, if so, runs them before dispatching to the syscall's implementation.
Execution Flow:
User Space                    Kernel Space
─────────────────────────────────────────────────────────────
process ──► syscall ────────► syscall entry
                                    │
                                    ▼
                           seccomp filter check
                                    │
                     ┌──────────────┴──────────────┐
                     ▼                             ▼
                   ALLOW                     KILL ───► terminate
                     │                       ERRNO ──► return error
                     ▼                       TRAP ───► SIGSYS signal
               syscall handler               NOTIFY ─► supervisor
                     │
                     ▼
process ◄──────── return
Key Architectural Properties:
Filter Runs in Kernel Context: The BPF filter executes in kernel space during syscall entry. This is critical—user space cannot race with or manipulate the filter execution.
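Filters Are Inherited: Child processes created with fork() or clone() inherit the parent's seccomp filters, and filters are preserved across execve(). A filter applies to the thread that installs it and to everything that thread creates afterwards; threads that already exist are covered only if the filter is installed with SECCOMP_FILTER_FLAG_TSYNC.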
Filters Are Append-Only: You can add more restrictive filters but cannot remove filters. This prevents an attacker who gains code execution from disabling the sandbox.
Filters Stack: Multiple filters can be installed. The kernel runs every installed filter on each syscall, and the most restrictive result wins, following the precedence order in which the actions are listed in the table below.
| Action | Value | Behavior | Used For |
|---|---|---|---|
| SECCOMP_RET_KILL_PROCESS | 0x80000000 | Kill entire process | Highly dangerous syscalls |
| SECCOMP_RET_KILL_THREAD | 0x00000000 | Kill calling thread | Less disruptive than process kill |
| SECCOMP_RET_TRAP | 0x00030000 | Send SIGSYS signal | User-space emulation |
| SECCOMP_RET_ERRNO | 0x00050000 | Return errno value | Graceful denial |
| SECCOMP_RET_USER_NOTIF | 0x7fc00000 | Notify supervisor | Broker/policy server |
| SECCOMP_RET_TRACE | 0x7ff00000 | Ptrace notification | Debugging |
| SECCOMP_RET_LOG | 0x7ffc0000 | Log and allow | Policy development |
| SECCOMP_RET_ALLOW | 0x7fff0000 | Allow syscall | Permitted operations |
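When multiple filters are stacked, the kernel compares the action part of each return value as a signed 32-bit integer and takes the lowest. SECCOMP_RET_KILL_PROCESS (0x80000000) is negative under that comparison, so it beats KILL_THREAD (0x00000000), which beats ERRNO (0x00050000), which beats ALLOW (0x7fff0000). This ensures that once any filter denies a syscall, no other filter can allow it.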
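The precedence rule can be sketched in a few lines of C. This is a simplified userspace model of the comparison, not kernel code: `combine_results` and `action_only` are names invented for the illustration, and the constants are taken from `<linux/seccomp.h>`.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

/* Keep only the action bits and view them as a signed 32-bit value,
 * which is what makes KILL_PROCESS (0x80000000) rank lowest. */
static int32_t action_only(uint32_t ret) {
    return (int32_t)(ret & SECCOMP_RET_ACTION_FULL);
}

/* Given the return value of each stacked filter, pick the one that wins.
 * With no filters at all the result is ALLOW. */
uint32_t combine_results(const uint32_t *results, size_t n) {
    uint32_t winner = SECCOMP_RET_ALLOW;
    for (size_t i = 0; i < n; i++) {
        if (action_only(results[i]) < action_only(winner))
            winner = results[i];
    }
    return winner;
}
```

Because the comparison is strictly "less than", a later filter result only replaces the current winner when it is more restrictive.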
Seccomp has two distinct modes of operation, reflecting its evolution over time:
Mode 1: Strict Mode (Original seccomp)
The original seccomp (2005) provides an extremely simple, fixed policy that allows only four syscalls: read, write, _exit, and sigreturn. Any other syscall immediately kills the caller with SIGKILL.
#include <linux/seccomp.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
void enable_strict_seccomp(void) {
// Enable strict seccomp mode
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
perror("prctl");
exit(1);
}
// From here, only read/write/_exit/sigreturn work
// Any other syscall = instant death. That includes exit_group, which
// the C library's exit() uses, so finish with the raw _exit syscall.
}
Use Cases for Strict Mode:
Limitations:
Mode 2: Filter Mode (seccomp-bpf)
Filter mode (2012) extends seccomp with BPF (Berkeley Packet Filter) programs, allowing flexible, programmable policies. The filter can inspect the syscall number, arguments, and architecture, returning any of the action codes.
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
void enable_seccomp_filter(struct sock_fprog *prog) {
// Required: prevent privilege escalation through exec of setuid binaries
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
perror("prctl NO_NEW_PRIVS");
exit(1);
}
// Install seccomp filter
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog) != 0) {
perror("prctl SECCOMP_MODE_FILTER");
exit(1);
}
}
// Alternative: use the seccomp() syscall directly.
// glibc provides no wrapper for it, so define one with syscall(2).
static int seccomp(unsigned int operation, unsigned int flags, void *args) {
return (int)syscall(SYS_seccomp, operation, flags, args);
}
void enable_seccomp_syscall(struct sock_fprog *prog) {
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
perror("prctl NO_NEW_PRIVS");
exit(1);
}
if (seccomp(SECCOMP_SET_MODE_FILTER, 0, prog) != 0) {
perror("seccomp");
exit(1);
}
}
Installing seccomp filters requires either CAP_SYS_ADMIN or having PR_SET_NO_NEW_PRIVS set. Without this, an attacker could install a filter, then exec a setuid binary—the setuid binary would run with elevated privileges but be constrained by the attacker's filter, potentially enabling attacks.
Seccomp uses classic BPF (cBPF) for filter programs—a simple bytecode virtual machine originally designed for packet filtering. A BPF program consists of instructions that load values, perform arithmetic, and make conditional jumps.
The seccomp_data Structure:
The BPF filter can access syscall information through the seccomp_data structure:
struct seccomp_data {
int nr; // Syscall number
__u32 arch; // AUDIT_ARCH_ value (architecture)
__u64 instruction_pointer; // Return address
__u64 args[6]; // Syscall arguments
};
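Classic BPF loads are 32 bits wide, so a filter addresses `seccomp_data` by byte offset and needs two loads to see a full 64-bit argument. The sketch below computes those offsets; `arg_lo` and `arg_hi` are helper names made up for this example, and the low/high split shown assumes a little-endian machine such as x86_64.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

/* Byte offset of the low 32 bits of args[i] (little-endian assumption). */
uint32_t arg_lo(unsigned int i) {
    return (uint32_t)(offsetof(struct seccomp_data, args) + 8u * i);
}

/* Byte offset of the high 32 bits of args[i] (little-endian assumption). */
uint32_t arg_hi(unsigned int i) {
    return arg_lo(i) + 4u;
}
```

A filter that compares only the low word of an argument ignores the upper 32 bits, which is worth keeping in mind when the argument is a 64-bit value such as a size or a pointer.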
BPF Instructions:
BPF uses a small instruction set:
| Instruction Type | Purpose |
|---|---|
| BPF_LD | Load value into accumulator |
| BPF_LDX | Load value into index register |
| BPF_ST | Store accumulator to memory |
| BPF_ALU | Arithmetic/logic operations |
| BPF_JMP | Conditional/unconditional jump |
| BPF_RET | Return value (action) |
| BPF_MISC | Miscellaneous (register copy) |
Example: Minimal Allowlist Filter:
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h> // offsetof
#include <sys/syscall.h>
struct sock_filter filter[] = {
// Load architecture
BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
offsetof(struct seccomp_data, arch)),
// Verify architecture is x86_64
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
// Kill if wrong architecture
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
// Load syscall number
BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
offsetof(struct seccomp_data, nr)),
// Allow read (syscall 0)
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 0, 1),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
// Allow write (syscall 1)
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
// Allow exit_group (syscall 231)
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 0, 1),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
// Default: kill
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
};
struct sock_fprog prog = {
.len = sizeof(filter) / sizeof(filter[0]),
.filter = filter,
};
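To see a filter act on a live process without risking the caller, one option is to install it in a forked child and report what the child observed. The sketch below is an illustration under several assumptions: it uses a deny-one-syscall filter with SECCOMP_RET_ERRNO rather than the allowlist above (so the child can still exit normally), `probe_errno_action` and `NATIVE_AUDIT_ARCH` are invented names, and only x86_64 and aarch64 are handled.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Fork a child, make getppid() fail with EPERM inside it, and return the
 * errno the child saw (0 if the call succeeded), or -1 if setup failed. */
int probe_errno_action(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: install the filter, then try the denied syscall. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            _exit(200);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(201);
        errno = 0;
        long rc = syscall(SYS_getppid);
        _exit(rc == -1 ? errno : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code >= 200 ? -1 : code;  /* 200+ are the setup-failure codes */
}
```

The parent never installs the filter itself, so a mistake in the program only affects the short-lived child. A default-allow filter like this one is useful for experiments; for real sandboxes the allowlist shape shown earlier is the safer starting point.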
Understanding the BPF_STMT and BPF_JUMP Macros:
// BPF_STMT: unconditional instruction
// BPF_STMT(code, k)
// code = operation type | size | mode
// k = immediate value
BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr))
// BPF_LD: load operation
// BPF_W: word (32-bit) size
// BPF_ABS: absolute addressing from filter data (seccomp_data)
// Result: load seccomp_data.nr into accumulator
// BPF_JUMP: conditional jump
// BPF_JUMP(code, k, jt, jf)
// code = operation type | comparison
// k = comparison value
// jt = instructions to skip if true
// jf = instructions to skip if false
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 0, 1)
// BPF_JEQ: jump if equal
// BPF_K: compare with immediate value k
// If accumulator == __NR_read: skip 0 instructions (execute next)
// If accumulator != __NR_read: skip 1 instruction
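One way to check jump offsets before loading a filter is to step through it in user space. The sketch below is a toy interpreter, not the kernel's BPF engine: it understands only the three instruction forms used in the example filter (32-bit absolute load, JEQ against a constant, and RET), and `run_filter` and `simulate` are names invented here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Interpret the small cBPF subset used above: LD|W|ABS, JMP|JEQ|K, RET|K. */
uint32_t run_filter(const struct sock_filter *f, size_t len,
                    const struct seccomp_data *data) {
    uint32_t acc = 0;
    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *ins = &f[pc];
        switch (BPF_CLASS(ins->code)) {
        case BPF_LD:   /* load the 32-bit word at byte offset k */
            if (ins->k + sizeof(acc) > sizeof(*data))
                return SECCOMP_RET_KILL_PROCESS;
            memcpy(&acc, (const char *)data + ins->k, sizeof(acc));
            break;
        case BPF_JMP:  /* skip jt instructions on a match, jf otherwise */
            pc += (acc == ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_RET:
            return ins->k;
        default:       /* anything else is outside this sketch */
            return SECCOMP_RET_KILL_PROCESS;
        }
    }
    return SECCOMP_RET_KILL_PROCESS;  /* ran off the end */
}

/* Run a read/write allowlist against a made-up syscall record. */
uint32_t simulate(uint32_t arch, int nr) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    };
    struct seccomp_data data = { .nr = nr, .arch = arch };
    return run_filter(filter, sizeof(filter) / sizeof(filter[0]), &data);
}
```

Calling `simulate` with an allowed syscall, a disallowed one, and a foreign architecture value exercises each branch, which makes off-by-one errors in `jt`/`jf` easy to spot.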
Writing raw BPF is error-prone and architecture-specific. The libseccomp library provides a high-level API that generates correct BPF for any architecture. Production code should use libseccomp unless you have specific reasons for raw BPF.
libseccomp provides a high-level, architecture-independent API for creating seccomp filters. It handles the complexities of BPF generation, architecture differences, and syscall number translation.
Basic Usage:
#include <seccomp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void setup_seccomp(void) {
// Create filter context with default action KILL_PROCESS
scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
if (ctx == NULL) {
fprintf(stderr, "seccomp_init failed\n");
exit(1);
}
// Allow specific syscalls (each call returns 0 or a negative errno;
// the checks are omitted here for brevity)
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mmap), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mprotect), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(munmap), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(brk), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
// Load filter into kernel; libseccomp reports failure as a negative errno
int rc = seccomp_load(ctx);
if (rc != 0) {
fprintf(stderr, "seccomp_load: %s\n", strerror(-rc));
seccomp_release(ctx);
exit(1);
}
// Release context (filter is now in kernel)
seccomp_release(ctx);
}
Argument Filtering with libseccomp:
libseccomp makes argument filtering straightforward:
#include <errno.h>
#include <fcntl.h>
#include <seccomp.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/socket.h>
void setup_with_arg_filtering(void) {
scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
// Allow socket() only for AF_UNIX (argument 0 == 1)
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(socket), 1,
SCMP_A0(SCMP_CMP_EQ, AF_UNIX));
// Allow mprotect() but not with PROT_EXEC | PROT_WRITE.
// Every mprotect rule carries an argument condition: an unconditional
// ALLOW for the same syscall can replace the conditional deny rule.
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(mprotect), 1,
SCMP_A2(SCMP_CMP_MASKED_EQ,
PROT_EXEC | PROT_WRITE,
PROT_EXEC | PROT_WRITE));
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mprotect), 1,
SCMP_A2(SCMP_CMP_MASKED_EQ, PROT_EXEC, 0));
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mprotect), 1,
SCMP_A2(SCMP_CMP_MASKED_EQ, PROT_WRITE, 0));
// Allow ioctl() only for specific commands
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(ioctl), 1,
SCMP_A1(SCMP_CMP_EQ, TCGETS));
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(ioctl), 1,
SCMP_A1(SCMP_CMP_EQ, FIONREAD));
// Block openat with O_CREAT (prevent file creation); allow it otherwise
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(openat), 1,
SCMP_A2(SCMP_CMP_MASKED_EQ, O_CREAT, O_CREAT));
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(openat), 1,
SCMP_A2(SCMP_CMP_MASKED_EQ, O_CREAT, 0));
seccomp_load(ctx);
seccomp_release(ctx);
}
Comparison Operators:
| Operator | C Macro | Description |
|---|---|---|
| Not equal | SCMP_CMP_NE | arg != value |
| Less than | SCMP_CMP_LT | arg < value |
| Less or equal | SCMP_CMP_LE | arg <= value |
| Equal | SCMP_CMP_EQ | arg == value |
| Greater or equal | SCMP_CMP_GE | arg >= value |
| Greater than | SCMP_CMP_GT | arg > value |
| Masked equal | SCMP_CMP_MASKED_EQ | (arg & mask) == value |
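The masked comparison is the least obvious row in the table, so here is the same test written as plain C. This is only a model of the condition that SCMP_CMP_MASKED_EQ expresses, not libseccomp code; `masked_eq` and `requests_wx` are names chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>

/* (arg & mask) == value */
static bool masked_eq(uint64_t arg, uint64_t mask, uint64_t value) {
    return (arg & mask) == value;
}

/* True when an mprotect() prot argument asks for a mapping that is
 * writable and executable at the same time. */
bool requests_wx(uint64_t prot) {
    const uint64_t wx = PROT_WRITE | PROT_EXEC;
    return masked_eq(prot, wx, wx);
}
```

Using the same bits for mask and value means "all of these bits are set"; using a value of 0 with the same mask means "none of these bits are set".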
Architecture Handling:
libseccomp automatically handles architecture differences:
// Enable support for additional architectures
seccomp_arch_add(ctx, SCMP_ARCH_X86); // 32-bit x86
seccomp_arch_add(ctx, SCMP_ARCH_X32); // x32 ABI
// Or remove architectures (stricter)
seccomp_arch_remove(ctx, SCMP_ARCH_NATIVE); // Remove native first
seccomp_arch_add(ctx, SCMP_ARCH_X86_64); // Add only x86_64
// Now 32-bit and x32 syscalls will be blocked
libseccomp can export generated BPF for debugging: seccomp_export_bpf(ctx, fd) writes raw BPF, and seccomp_export_pfc(ctx, fd) writes a human-readable pseudo-filter code format.
SECCOMP_RET_USER_NOTIF (Linux 5.0+) enables a powerful broker pattern where a supervisor process handles blocked syscalls on behalf of the sandboxed process. The supervisor can inspect the syscall, validate it against policy, and either perform the operation or deny it.
Architecture:
┌─────────────────┐ ┌─────────────────┐
│ Sandboxed │ │ Supervisor │
│ Process │ │ Process │
├─────────────────┤ ├─────────────────┤
│ │ │ │
│ syscall(open) │──► blocked by ─────│ notif_recv() │
│ ▼ │ seccomp │ ▼ │
│ [blocked] │ │ validate path │
│ │ │ ▼ │
│ │ │ open() if OK │
│ │ │ ▼ │
│ [resumes] ◄───│◄─ notif_send() ────│ send fd back │
│ with fd │ │ │
└─────────────────┘ └─────────────────┘
Setting Up User Notification:
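#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
int setup_notify_supervisor(void) {
struct sock_filter filter[] = {
// ... architecture check ...
// Load syscall number
BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
offsetof(struct seccomp_data, nr)),
// Send openat to supervisor
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 1),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
// Allow other syscalls
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
struct sock_fprog prog = {
.len = sizeof(filter) / sizeof(filter[0]),
.filter = filter,
};
// Unprivileged callers must set NO_NEW_PRIVS before installing a filter
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
perror("prctl NO_NEW_PRIVS");
exit(1);
}
// Install filter and get notification fd
// (seccomp() is a thin wrapper around syscall(SYS_seccomp, ...))
int notify_fd = seccomp(SECCOMP_SET_MODE_FILTER,
SECCOMP_FILTER_FLAG_NEW_LISTENER,
&prog);
if (notify_fd < 0) {
perror("seccomp");
exit(1);
}
// The fd is returned to the thread that installed the filter. Hand it to
// the supervisor (inherited across fork() or sent over a Unix socket)
// before running untrusted code.
return notify_fd;
}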
Supervisor Event Loop:
void supervisor_loop(int notify_fd) {
struct seccomp_notif *req = NULL;
struct seccomp_notif_resp *resp = NULL;
struct seccomp_notif_sizes sizes;
// Get sizes for allocation
seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes);
req = malloc(sizes.seccomp_notif);
resp = malloc(sizes.seccomp_notif_resp);
while (1) {
memset(req, 0, sizes.seccomp_notif);
memset(resp, 0, sizes.seccomp_notif_resp);
// Receive notification (blocks until syscall intercepted)
if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_RECV, req) != 0) {
if (errno == ENOENT) // Target died
continue;
perror("NOTIF_RECV");
break;
}
// req->id: unique notification ID
// req->pid: pid of sandboxed process
// req->data.nr: syscall number
// req->data.args[]: syscall arguments
resp->id = req->id;
resp->flags = 0;
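if (req->data.nr == __NR_openat) {
// Read pathname from sandboxed process memory
char path[PATH_MAX];
if (read_process_memory(req->pid, req->data.args[1],
path, sizeof(path)) < 0) {
resp->error = -EACCES;
} else if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ID_VALID, &req->id) != 0) {
// Target exited while we were reading; req->pid may have been reused
continue;
} else if (is_path_allowed(path)) {
// Perform open on behalf of sandbox. args[0] is a dirfd in the
// target's fd table, not ours, so this sketch assumes
// is_path_allowed() only accepts absolute paths.
int fd = openat(AT_FDCWD, path,
req->data.args[2], req->data.args[3]);
if (fd >= 0) {
// Install a copy of fd in the target (SECCOMP_IOCTL_NOTIF_ADDFD,
// Linux 5.9+); the ioctl returns the fd number the target sees
struct seccomp_notif_addfd addfd = {
.id = req->id,
.srcfd = fd,
};
int target_fd = ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
if (target_fd >= 0)
resp->val = target_fd; // becomes openat()'s return value
else
resp->error = -errno;
close(fd); // the supervisor's copy is no longer needed
} else {
resp->error = -errno;
}
} else {
resp->error = -EACCES;
}
}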
// Send response (unblocks sandboxed process)
if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_SEND, resp) != 0) {
perror("NOTIF_SEND");
}
}
}
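User notification has an inherent TOCTOU race: between the supervisor reading the sandboxed process's memory and the syscall being performed, another thread in the sandbox can modify that memory. For that reason the supervisor should perform the operation itself using its own copy of the arguments; replying with SECCOMP_USER_NOTIF_FLAG_CONTINUE makes the kernel re-read the (possibly changed) memory and must not be relied on to enforce policy. Separately, call SECCOMP_IOCTL_NOTIF_ID_VALID after reading the target's memory to confirm that the notification is still live and the pid has not been recycled, and consider the security implications carefully.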
The seccomp() system call accepts various flags that modify filter behavior:
SECCOMP_FILTER_FLAG_TSYNC:
Synchronize the filter across all threads in the thread group. Without this, each thread needs to install the filter individually:
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &prog);
// Now all existing threads have the filter
SECCOMP_FILTER_FLAG_LOG:
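Log every filter return action except SECCOMP_RET_ALLOW. An administrator can still suppress logging of specific actions through /proc/sys/kernel/seccomp/actions_logged:
seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_LOG, &prog);
// Logged actions appear in the audit log, or in the kernel log (dmesg)
// when no audit daemon is running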
SECCOMP_FILTER_FLAG_SPEC_ALLOW:
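Disable the Speculative Store Bypass mitigation that the kernel otherwise enables for seccomp-filtered tasks (Linux 4.17+). This can improve performance but removes a protection against that class of Spectre attack: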
// Only use if performance is critical and you understand Spectre risks
seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_SPEC_ALLOW, &prog);
| Flag | Value | Purpose |
|---|---|---|
| SECCOMP_FILTER_FLAG_TSYNC | 1 | Sync filter to all threads |
| SECCOMP_FILTER_FLAG_LOG | 2 | Enable logging |
| SECCOMP_FILTER_FLAG_SPEC_ALLOW | 4 | Disable Spectre mitigations |
| SECCOMP_FILTER_FLAG_NEW_LISTENER | 8 | Return notification fd |
| SECCOMP_FILTER_FLAG_TSYNC_ESRCH | 16 | ESRCH if sync fails |
| SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV | 32 | Killable wait in NOTIF_RECV |
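Not every combination of these flags is accepted. The sketch below models two combination rules the kernel applies when a filter is installed (TSYNC together with NEW_LISTENER needs TSYNC_ESRCH, and WAIT_KILLABLE_RECV needs NEW_LISTENER); `filter_flags_consistent` is an invented helper, and the fallback definitions use the values from the table for systems with older headers.

```c
#include <stdbool.h>
#include <linux/seccomp.h>

/* Fallbacks for older kernel headers; values as listed in the table. */
#ifndef SECCOMP_FILTER_FLAG_LOG
#define SECCOMP_FILTER_FLAG_LOG (1UL << 1)
#endif
#ifndef SECCOMP_FILTER_FLAG_SPEC_ALLOW
#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
#endif
#ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER
#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)
#endif
#ifndef SECCOMP_FILTER_FLAG_TSYNC_ESRCH
#define SECCOMP_FILTER_FLAG_TSYNC_ESRCH (1UL << 4)
#endif
#ifndef SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV (1UL << 5)
#endif

/* Returns true if the flag set passes the checks modelled here. */
bool filter_flags_consistent(unsigned long flags) {
    const unsigned long known =
        SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_LOG |
        SECCOMP_FILTER_FLAG_SPEC_ALLOW | SECCOMP_FILTER_FLAG_NEW_LISTENER |
        SECCOMP_FILTER_FLAG_TSYNC_ESRCH | SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV;

    if (flags & ~known)
        return false;  /* unknown bits */

    /* TSYNC failure returns a thread id and NEW_LISTENER success returns an
     * fd; the two are only distinguishable when TSYNC_ESRCH is also set. */
    if ((flags & SECCOMP_FILTER_FLAG_TSYNC) &&
        (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) &&
        !(flags & SECCOMP_FILTER_FLAG_TSYNC_ESRCH))
        return false;

    /* Killable waits only make sense when a listener exists. */
    if ((flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) &&
        !(flags & SECCOMP_FILTER_FLAG_NEW_LISTENER))
        return false;

    return true;
}
```

A check like this is a convenience for catching mistakes early; the kernel remains the authority and may reject flags that an older kernel version does not know.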
seccomp Attribute Operations:
Recent kernels also support query operations through the seccomp() syscall:
struct seccomp_notif_sizes sizes;
// Get required allocation sizes for notification structures
seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes);
printf("notif: %hu, notif_resp: %hu, data: %hu\n",
sizes.seccomp_notif, sizes.seccomp_notif_resp,
sizes.seccomp_data);
// Check whether the running kernel supports a given action
// (returns 0 if it does, -1 with errno EOPNOTSUPP if not)
__u32 action = SECCOMP_RET_USER_NOTIF;
if (seccomp(SECCOMP_GET_ACTION_AVAIL, 0, &action) == 0)
printf("SECCOMP_RET_USER_NOTIF is available\n");
For multi-threaded applications, always use SECCOMP_FILTER_FLAG_TSYNC when installing the filter. Otherwise, there's a race window where some threads might execute syscalls before the filter is installed. TSYNC atomically applies the filter to all threads.
Production seccomp implementations follow established patterns that balance security with reliability:
Pattern: Privileged Setup, Then Sandbox:
Perform all privileged operations before installing the filter:
int main() {
// Phase 1: Privileged setup
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
bind(listen_fd, ...); // Bind to port 80
listen(listen_fd, 100);
// Open config files, allocate resources, etc.
config_t *cfg = load_config("/etc/myservice.conf");
// Phase 2: Drop privileges
drop_root_privileges();
// Phase 3: Install seccomp filter
install_seccomp_filter();
// Phase 4: Run main loop (now sandboxed)
while (1) {
int conn = accept(listen_fd, ...);
handle_connection(conn); // Cannot open files, bind ports, etc.
}
}
Pattern: Fail-Open Development, Fail-Closed Production:
void install_filter(int strict_mode) {
scmp_filter_ctx ctx = seccomp_init(
strict_mode ? SCMP_ACT_KILL : SCMP_ACT_LOG
);
// ... rules ...
seccomp_load(ctx);
}
// Development: blocked syscalls are logged, not killed
install_filter(0);
// Production: blocked syscalls terminate the offending thread
install_filter(1);
Pattern: Sandbox Entry Function:
int enter_sandbox(void (*sandboxed_main)(void *), void *arg) {
// 1. Validate we're in a clean state
if (getuid() == 0) {
fprintf(stderr, "Must not be root\n");
return -1;
}
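// 2. Set up additional isolation. An unprivileged caller needs
// CLONE_NEWUSER here; without it unshare() fails with EPERM.
if (unshare(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWIPC) != 0) {
perror("unshare");
// Continue anyway: seccomp is the primary protection
}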
// 3. Set NO_NEW_PRIVS
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
perror("PR_SET_NO_NEW_PRIVS");
return -1;
}
// 4. Drop capabilities
drop_all_capabilities();
// 5. Install seccomp filter
if (install_seccomp_filter() != 0) {
return -1;
}
// 6. Close unnecessary file descriptors
close_fds_above(2);
// 7. Call sandboxed code
sandboxed_main(arg);
return 0;
}
Seccomp filter performance is generally excellent, but understanding the performance characteristics helps in designing efficient sandboxes.
Filter Execution Overhead:
Seccomp filters add overhead to every system call. The overhead depends on:
Typical Overhead:
| Scenario | Overhead per Syscall |
|---|---|
| No seccomp | Baseline |
| Small filter (20 instructions) | ~50-100 ns |
| Medium filter (100 instructions) | ~200-400 ns |
| Large filter (500+ instructions) | ~500-1000 ns |
For perspective, a typical syscall takes 100-500 ns, so overhead ranges from negligible to doubling syscall time.
Optimization Techniques:
1. Put Common Syscalls First:
libseccomp and manual filters should check frequently-called syscalls early:
// Bad: check rare syscalls first
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(reboot), 0); // Never called!
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0); // Called constantly
// Better: common syscalls checked first (exits filter faster)
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);
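libseccomp orders the generated filter by syscall priority, so rather than relying on call order you can give it a hint (0-255, higher is checked earlier):
seccomp_syscall_priority(ctx, SCMP_SYS(read), 255); // Check read() early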
2. Use Binary Search Tree Structure:
libseccomp can generate binary search trees instead of linear lists:
seccomp_attr_set(ctx, SCMP_FLTATR_CTL_OPTIMIZE, 2);
// Filter structure will be optimized for O(log n) lookup
3. Minimize Argument Checks:
Argument filtering is more expensive than syscall number checking. Only filter arguments when necessary for security.
For most applications, seccomp overhead is negligible compared to actual syscall work and application logic. Profile before optimizing. Syscall-heavy micro-benchmarks show worst-case overhead; real applications see much less impact.
We have explored Linux's seccomp facility in depth—from architecture to practical implementation patterns. Let's consolidate the key insights:
What's Next:
We've covered process sandboxing and system call filtering in detail. The final page in this module explores container isolation—how container technologies like Docker and Kubernetes combine the mechanisms we've studied (namespaces, cgroups, seccomp, capabilities) to create practical, scalable isolation for modern workloads.
You now understand seccomp's architecture, the BPF programming model for filters, how to use libseccomp for practical filter development, user notification for complex policies, and the patterns used in production sandboxes. You can design, implement, and analyze seccomp-based system call filtering.