Imagine if you could safely inject custom code directly into the Linux kernel without recompiling, without loading kernel modules, and without risking system stability. Imagine observing every system call, every network packet, every file operation—all with near-zero overhead. Imagine implementing custom networking logic, security policies, or performance monitoring tools that run at kernel speeds.
This is eBPF.
eBPF (extended Berkeley Packet Filter) represents the most significant advancement in Linux kernel programmability since loadable kernel modules. It has fundamentally transformed how we approach observability, networking, and security in modern infrastructure. Companies like Netflix, Facebook, Google, and Cloudflare have built critical infrastructure components using eBPF, and it has become the foundation for tools like Cilium, Falco, and bpftrace.
By the end of this page, you will understand eBPF's architecture, the virtual machine that executes eBPF programs, the verification process that ensures safety, and why eBPF represents a paradigm shift in kernel programmability. You'll gain the foundational knowledge needed to understand how modern observability and networking tools operate at the kernel level.
To understand eBPF, we must first understand its predecessor: the Berkeley Packet Filter (BPF). The evolution from classic BPF to eBPF represents a transformation from a specialized packet filtering mechanism to a general-purpose in-kernel virtual machine.
Classic BPF: The Origin (1992)
BPF was introduced in 1992 by Steven McCanne and Van Jacobson in their seminal paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture." Its purpose was simple but revolutionary: provide an efficient way to filter network packets in-kernel, avoiding the expensive copying of unwanted packets to user space.
The classic BPF architecture consisted of a small, register-based virtual machine: two 32-bit registers (an accumulator A and an index register X), 16 scratch memory slots, and a deliberately restricted instruction set built for one job—deciding whether to accept or reject a packet. That engine still powers the filter expressions used by tcpdump and libpcap today. The table below contrasts classic BPF with its extended successor:

| Feature | Classic BPF (cBPF) | Extended BPF (eBPF) |
|---|---|---|
| Register count | 2 registers (A, X), 32-bit | 11 registers (R0-R10), 64-bit |
| Instruction width | 32 bits | 64 bits |
| Stack | 16 memory slots | 512-byte stack |
| Maps (data structures) | None | Hash maps, arrays, ring buffers, etc. |
| Helper functions | None | 200+ kernel helper functions |
| Tail calls | Not supported | Supported (program chaining) |
| Verification | Basic safety checks | Comprehensive static analysis |
| Use cases | Packet filtering only | Tracing, networking, security, etc. |
| Attachment points | Socket filters only | 100+ attachment points (kprobes, tracepoints, XDP, etc.) |
The eBPF Revolution (2014)
In 2014, Alexei Starovoitov and Daniel Borkmann introduced eBPF as a major extension of classic BPF. The key insight was that the concepts behind BPF—verified, safe, sandboxed execution within the kernel—could be generalized far beyond packet filtering.
The eBPF transformation included:
Extended register set: From 2 registers to 11 64-bit registers (R0-R10), enabling more complex programs and matching modern CPU architectures.
Richer instruction set: Added modern instructions, including 64-bit arithmetic, function calls, and memory operations.
BPF maps: Introduced kernel-side data structures (hash tables, arrays, etc.) that can be shared between eBPF programs and user space (a minimal map sketch follows this list).
Helper functions: Provided access to kernel functionality through a growing set of helper functions.
Multiple attachment points: Extended beyond sockets to kprobes, tracepoints, XDP, cgroups, and more.
Advanced verification: Implemented a sophisticated static analyzer to ensure program safety.
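To make maps concrete, the sketch below declares a hash map and updates it from an eBPF program using the standard libbpf map-definition style and helper functions. The map name (counts), the per-PID counting logic, and the choice of tracepoint are illustrative assumptions, not taken from any specific tool.

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

// Hypothetical example: count syscalls per process in a hash map.
// User space can read this map via libbpf or the bpf() syscall.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);    // PID (tgid)
    __type(value, u64);  // number of syscalls observed
} counts SEC(".maps");

SEC("tracepoint/raw_syscalls/sys_enter")
int count_syscalls(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 one = 1, *val;

    val = bpf_map_lookup_elem(&counts, &pid);
    if (val)
        __sync_fetch_and_add(val, 1);                      // bump existing entry
    else
        bpf_map_update_elem(&counts, &pid, &one, BPF_ANY); // create entry

    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```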
The "e" in eBPF stands for "extended," but the technology has evolved so far beyond its origins that many now simply refer to it as "BPF." The Linux kernel internally uses the term "BPF" for the modern implementation, with classic BPF often referred to as "cBPF." When you see references to BPF in modern contexts, it almost always means eBPF.
At the heart of eBPF is a sophisticated virtual machine that executes eBPF bytecode within the kernel. This VM provides the foundation for safe, efficient in-kernel programmability.
Register Architecture
The eBPF VM is a register-based machine with 11 64-bit registers:
| Register | Purpose | Calling Convention |
|---|---|---|
| R0 | Return value from functions/helpers | Return value |
| R1-R5 | Function arguments | Arguments to helpers/functions |
| R6-R9 | Callee-saved registers | Preserved across calls |
| R10 | Read-only frame pointer | Points to 512-byte stack |
This architecture closely mirrors the x86-64 and ARM64 calling conventions, enabling efficient JIT compilation to native code.
```
// Example: eBPF program structure showing register usage
// R1 contains the context (e.g., struct pt_regs* for kprobes)
// R10 is the frame pointer

// Function prologue - allocate stack space
r1 = *(u64 *)(r1 + 0)        // Load first argument from context
r6 = r1                      // Save to callee-saved register

// Call a helper function
// Arguments go in R1-R5, result comes back in R0
r1 = r6                      // First argument
r2 = 16                      // Second argument (size)
call bpf_probe_read_kernel   // Helper function call

// R0 now contains the return value
if r0 != 0 goto error        // Check for errors

// Access stack using frame pointer (R10)
*(u64 *)(r10 - 8) = r0       // Store result on stack
r0 = *(u64 *)(r10 - 8)       // Load from stack

exit:
  r0 = 0                     // Return success
  exit

error:
  r0 = 1                     // Return error
  exit
```

Instruction Set
The eBPF instruction set uses a fixed 64-bit instruction format:
┌──────────┬─────────┬─────────┬────────────┬───────────────┐
│ opcode │ dst_reg │ src_reg │ offset │ immediate │
│ 8 bits │ 4 bits │ 4 bits │ 16 bits │ 32 bits │
└──────────┴─────────┴─────────┴────────────┴───────────────┘
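In C terms, this layout corresponds to struct bpf_insn from the kernel UAPI header <linux/bpf.h>. The snippet below shows that structure and, purely for illustration, hand-encodes the two-instruction program "r0 = 0; exit"; real programs are emitted by Clang rather than written this way.

```c
#include <linux/bpf.h>   // defines struct bpf_insn and the opcode constants

/* The UAPI definition mirrors the diagram above:
 *
 *   struct bpf_insn {
 *       __u8  code;        // opcode
 *       __u8  dst_reg:4;   // destination register
 *       __u8  src_reg:4;   // source register
 *       __s16 off;         // signed offset
 *       __s32 imm;         // signed immediate constant
 *   };
 */

// Hand-encoded "r0 = 0; exit" (illustrative only):
struct bpf_insn prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = 0, .imm = 0 }, // r0 = 0
    { .code = BPF_JMP | BPF_EXIT },                                  // return r0
};
```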
The instruction classes include:
| Class | Encoding | Description |
|---|---|---|
| BPF_LD | 0x00 | Load operations (legacy) |
| BPF_LDX | 0x01 | Load from memory |
| BPF_ST | 0x02 | Store immediate |
| BPF_STX | 0x03 | Store from register |
| BPF_ALU | 0x04 | 32-bit arithmetic |
| BPF_JMP | 0x05 | Jump operations |
| BPF_JMP32 | 0x06 | 32-bit jump operations |
| BPF_ALU64 | 0x07 | 64-bit arithmetic |
The eBPF instruction set was deliberately designed to be simple enough for verification while expressive enough for complex programs. The choice of a register-based architecture (vs. stack-based) enables efficient JIT compilation and makes static analysis tractable. The instruction set avoids complex addressing modes found in x86, trading some flexibility for verifiability.
The eBPF verifier is the critical component that makes in-kernel programmability safe. Before any eBPF program can execute, it must pass the verifier's comprehensive static analysis. This is not a simple syntax check—it's a deep simulation of all possible execution paths.
Why Verification is Essential
Kernel code runs with full system privileges. A bug in kernel code can:

- Crash the entire machine (kernel panic)
- Corrupt kernel memory or on-disk data
- Leak sensitive kernel or user data
- Hang the system in an infinite loop
- Open exploitable security holes
The verifier ensures that eBPF programs cannot cause these problems, even when written by untrusted users.
How the Verifier Works
The verifier performs abstract interpretation, simulating program execution with abstract register states rather than concrete values.
Step 1: Control Flow Graph Construction
The verifier first builds a DAG (Directed Acyclic Graph) representation of the program, identifying all basic blocks and control flow edges.
Step 2: State Tracking
For each register, the verifier tracks:

- Its type (e.g., SCALAR_VALUE, PTR_TO_CTX, PTR_TO_MAP_VALUE, PTR_TO_STACK)
- Its possible value range (signed and unsigned minimum/maximum bounds)
- Pointer offset and alignment information
- Whether it has been initialized (reads of uninitialized registers or stack slots are rejected)
Step 3: Path Exploration
The verifier explores every possible execution path using depth-first search. At conditional branches, it forks the state and continues with both branches. States are merged at join points, taking the union of possible values.
```c
// Consider this eBPF program
SEC("kprobe/sys_open")
int trace_open(struct pt_regs *ctx) {
    void *ptr = ctx;   // source pointer for the read (example)
    u64 len;

    // Verifier tracks: R1 = PTR_TO_CTX

    len = bpf_get_current_pid_tgid() >> 32;
    // Verifier tracks: len is SCALAR, range [0, 2^32 - 1]

    if (len > 100) {
        // In this branch: len range is [101, 2^32 - 1]
        len = 100;
        // After assignment: len range is [100, 100]
    }
    // After join: len range is [0, 100]

    char buf[256];

    // This is SAFE because len <= 100 < 256
    bpf_probe_read_kernel(buf, len, ptr);

    return 0;
}

// But this would be REJECTED:
SEC("kprobe/sys_open")
int unsafe_trace(struct pt_regs *ctx) {
    u64 len = bpf_get_current_pid_tgid() >> 32;
    // Verifier tracks: len is SCALAR, range [0, 2^32 - 1]

    char buf[256];

    // REJECTED! len could be > 256, causing buffer overflow
    bpf_probe_read_kernel(buf, len, ctx);
    // Error: "R2 unbounded memory access, use 'var &= const'"

    return 0;
}
```

Verifier Complexity Limits
To prevent denial-of-service attacks where malicious programs cause the verifier to consume excessive resources, strict limits are enforced:
| Limit | Value | Purpose |
|---|---|---|
| Max instructions | 1 million | Limits program size |
| Max verified instructions | 1 million | Limits verification time |
| Max stack depth | 512 bytes | Limits stack usage |
| Max tail calls | 33 | Prevents infinite recursion via tail calls |
| BPF-to-BPF call depth | 8 | Limits function nesting |
| Loops (bounded) | Kernel 5.3+ | Back-edges must have provable bounds |
The verifier is sophisticated but not omniscient. Complex programs can hit verification limits even when semantically safe. Developers often need to refactor code, add explicit bounds checks, or restructure control flow to satisfy the verifier. This is the price of running untrusted code in kernel space safely.
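As a concrete illustration of such a refactor, one common fix for the rejected example above is to bound the scalar explicitly before using it as a size, for instance by masking it, which is exactly what the verifier's error message hints at. A minimal sketch, assuming the same includes as the earlier examples:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("kprobe/sys_open")
int bounded_trace(struct pt_regs *ctx) {
    u64 len = bpf_get_current_pid_tgid() >> 32;
    char buf[256];

    // Bound the scalar so the verifier can prove the access is safe:
    // after the mask, len is in [0, 255], which always fits in buf[256].
    len &= 0xff;

    bpf_probe_read_kernel(buf, len, ctx);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```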
Once an eBPF program passes verification, it's compiled to native machine code using a Just-In-Time (JIT) compiler. This transforms the eBPF bytecode into x86-64, ARM64, or other architecture-specific instructions, eliminating interpretation overhead.
JIT Compilation Process
```
// eBPF bytecode (simplified)
r0 = 0                      // BPF_MOV64_IMM R0, 0
r1 = *(u64 *)(r10 - 8)      // BPF_LDX_MEM DW R1, R10, -8
r0 = r0 + r1                // BPF_ALU64 ADD R0, R1
exit                        // BPF_EXIT

// Corresponding x86-64 JIT output
// (Generated by the Linux kernel's x86 BPF JIT)

push %rbp                   // Function prologue
mov  %rsp, %rbp
sub  $0x200, %rsp           // Allocate 512-byte BPF stack

xor  %eax, %eax             // r0 = 0 (x86: eax is low 32 bits of rax)
mov  -0x8(%rbp), %rdi       // r1 = *(u64 *)(r10 - 8)
                            // rbp maps to R10 (frame pointer)
add  %rdi, %rax             // r0 += r1

add  $0x200, %rsp           // Function epilogue
pop  %rbp
retq                        // exit (return with value in rax)

// Register mapping on x86-64:
// R0  -> rax (return value)
// R1  -> rdi (1st arg, also temp)
// R2  -> rsi (2nd arg)
// R3  -> rdx (3rd arg)
// R4  -> rcx (4th arg)
// R5  -> r8  (5th arg)
// R6  -> rbx (callee-saved)
// R7  -> r13 (callee-saved)
// R8  -> r14 (callee-saved)
// R9  -> r15 (callee-saved)
// R10 -> rbp (frame pointer)
```

JIT Hardening
BPF JIT includes several security hardening features:
Constant blinding: Immediate values are XORed with random values and unblinded at runtime, preventing JIT spray attacks.
Image randomization: JIT'd code is placed at randomized addresses within kernel memory.
Retpoline support: Indirect jumps use retpolines on CPUs vulnerable to Spectre variant 2.
Read-only executable memory: JIT images are marked as read-only and executable, preventing code modification.
You can check if BPF JIT is enabled with: sysctl net.core.bpf_jit_enable. Value 0 = disabled (interpreted), 1 = enabled, 2 = enabled with debug output. For production systems handling high-throughput eBPF programs (like XDP), JIT should always be enabled for acceptable performance.
eBPF's versatility comes from its diverse program types and attachment points. Each program type has a specific purpose, context, and set of available helper functions. The attachment point determines when and where the program executes.
Program Type Categories
eBPF programs can be broadly categorized by their domain:
| Category | Program Type | Attachment Point | Primary Use Case |
|---|---|---|---|
| Networking | BPF_PROG_TYPE_XDP | Network driver (ingress) | High-performance packet processing, DDoS mitigation |
| Networking | BPF_PROG_TYPE_SCHED_CLS | TC (Traffic Control) | Packet classification, container networking |
| Networking | BPF_PROG_TYPE_SOCKET_FILTER | Socket | Packet filtering (classic use case) |
| Networking | BPF_PROG_TYPE_SK_SKB | Sockmap | Socket-level proxy, load balancing |
| Tracing | BPF_PROG_TYPE_KPROBE | Kernel function entry/exit | Dynamic function tracing |
| Tracing | BPF_PROG_TYPE_TRACEPOINT | Static tracepoints | Stable kernel event tracing |
| Tracing | BPF_PROG_TYPE_RAW_TRACEPOINT | Raw tracepoints | Low-overhead tracepoint access |
| Tracing | BPF_PROG_TYPE_PERF_EVENT | Perf events | Performance monitoring, sampling |
| Security | BPF_PROG_TYPE_LSM | LSM hooks | Security policy enforcement |
| Security | BPF_PROG_TYPE_CGROUP_* | cgroup events | Per-container resource control |
| Extensibility | BPF_PROG_TYPE_STRUCT_OPS | Kernel struct_ops | Custom kernel subsystem implementations |
Context Structures
Each program type receives a specific context structure as its input (passed in R1). The context provides access to relevant data for that hook point:
| Program Type | Context Structure | Key Fields |
|---|---|---|
| XDP | struct xdp_md | data, data_end, data_meta, ingress_ifindex |
| Socket Filter | struct __sk_buff | data, data_end, protocol, len, ifindex |
| kprobe | struct pt_regs | CPU registers at probe point |
| Tracepoint | Tracepoint-specific | Varies per tracepoint |
| LSM | Hook-specific | Security-relevant data for the hook |
```c
// Example 1: XDP program (networking)
SEC("xdp")
int xdp_drop_all(struct xdp_md *ctx) {
    // Context gives us packet data pointers
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    struct ethhdr *eth = data;

    // Bounds check required by verifier
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    // Drop all UDP packets (example)
    if (eth->h_proto == htons(ETH_P_IP)) {
        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_DROP;

        if (ip->protocol == IPPROTO_UDP)
            return XDP_DROP;   // Drop UDP
    }

    return XDP_PASS;           // Allow other packets
}

// Example 2: Kprobe program (tracing)
SEC("kprobe/do_sys_openat2")
int trace_openat(struct pt_regs *ctx) {
    // Context gives us CPU registers at function entry
    // On x86-64, function arguments are in rdi, rsi, rdx, rcx, r8, r9
    int dirfd = PT_REGS_PARM1(ctx);
    const char *pathname = (const char *)PT_REGS_PARM2(ctx);

    u32 pid = bpf_get_current_pid_tgid() >> 32;

    // Log to trace output
    bpf_printk("PID %d opening file at dirfd %d\n", pid, dirfd);

    return 0;
}

// Example 3: Tracepoint program (stable tracing)
SEC("tracepoint/syscalls/sys_enter_openat")
int trace_sys_enter_openat(struct trace_event_raw_sys_enter *ctx) {
    // Tracepoint context has structured access to syscall args
    int dirfd = ctx->args[0];
    const char *pathname = (const char *)ctx->args[1];
    int flags = ctx->args[2];

    bpf_printk("openat: dirfd=%d, flags=0x%x\n", dirfd, flags);

    return 0;
}
```

The choice of program type depends on your goal. For observability, prefer tracepoints (stable ABI) over kprobes (unstable). For networking, XDP provides the best performance for early packet drops, while TC offers more flexibility. For security, LSM programs integrate with the Linux Security Module framework. Understanding these tradeoffs is essential for effective eBPF development.
The eBPF ecosystem has matured significantly, offering multiple layers of tooling from low-level to high-level abstractions. Understanding this ecosystem is crucial for choosing the right tool for your use case.
The eBPF Toolchain Stack
┌─────────────────────────────────────────────────────────────────┐
│ High-Level Tools: bpftrace, BCC, kubectl-trace │
├─────────────────────────────────────────────────────────────────┤
│ eBPF Frameworks: libbpf, ebpf-go, aya (Rust), libbpf-rs │
├─────────────────────────────────────────────────────────────────┤
│ Compiler: Clang/LLVM (C → eBPF bytecode) │
├─────────────────────────────────────────────────────────────────┤
│ BTF (BPF Type Format): CO-RE (Compile Once, Run Everywhere) │
├─────────────────────────────────────────────────────────────────┤
│ Kernel: bpf() syscall, verifier, JIT, maps, helpers │
└─────────────────────────────────────────────────────────────────┘
eBPF programs are typically written in a restricted subset of C and compiled with Clang/LLVM using its dedicated bpf target.

CO-RE: Solving the Portability Problem
Historically, eBPF programs that accessed kernel structures were tied to specific kernel versions—the offsets of struct fields could change between releases. CO-RE (Compile Once, Run Everywhere) solves this by:

- Describing kernel types with BTF (BPF Type Format), which the kernel exposes for its own structures
- Having Clang record a relocation for every struct field access at compile time
- Having libbpf rewrite those accesses at load time to match the layouts of the running kernel
This enables distributing pre-compiled eBPF programs that work across different kernel versions without recompilation.
```c
// vmlinux.h is generated from kernel BTF
// Contains all kernel type definitions
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>   // CO-RE helpers

SEC("kprobe/do_sys_openat2")
int trace_open(struct pt_regs *ctx) {
    struct task_struct *task;
    struct file *file;
    char comm[16];
    pid_t pid;

    // Get current task_struct
    task = (struct task_struct *)bpf_get_current_task();

    // CO-RE: Read pid field - works across kernel versions
    // BPF_CORE_READ handles offset relocation automatically
    pid = BPF_CORE_READ(task, pid);

    // Read task->comm (process name)
    BPF_CORE_READ_STR_INTO(&comm, task, comm);

    // CO-RE field existence check
    // Some fields may not exist in all kernel versions
    if (bpf_core_field_exists(task->loginuid)) {
        // Access loginuid only if it exists
        kuid_t loginuid = BPF_CORE_READ(task, loginuid);
    }

    bpf_printk("PID %d (%s) opening file\n", pid, comm);
    return 0;
}

// libbpf skeleton usage in user space:
// 1. Generate skeleton: bpftool gen skeleton program.bpf.o > program.skel.h
// 2. Load and attach:
//      struct program_bpf *skel = program_bpf__open_and_load();
//      program_bpf__attach(skel);
// 3. Cleanup:
//      program_bpf__destroy(skel);
```

For new eBPF projects, the recommended stack is libbpf + CO-RE + skeleton generation. This provides the best combination of performance, portability, and maintainability. BCC is useful for prototyping but carries significant runtime dependencies. bpftrace excels at interactive exploration and one-off queries.
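To round out the workflow, here is a minimal sketch of the user-space side driving the skeleton shown in the comments above. The skeleton name program_bpf follows from an object file named program.bpf.o and is an assumption for illustration; error handling is kept deliberately simple.

```c
// user-space loader (minimal sketch, assuming a bpftool-generated program.skel.h)
#include <stdio.h>
#include <unistd.h>
#include "program.skel.h"

int main(void) {
    struct program_bpf *skel;
    int err;

    // Open the embedded BPF object, run the verifier, and load maps/programs
    skel = program_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "failed to open and load BPF skeleton\n");
        return 1;
    }

    // Attach to the hook points declared via SEC(...) in the BPF source
    err = program_bpf__attach(skel);
    if (err) {
        fprintf(stderr, "failed to attach BPF programs: %d\n", err);
        program_bpf__destroy(skel);
        return 1;
    }

    // bpf_printk() output is visible via /sys/kernel/debug/tracing/trace_pipe
    printf("eBPF program attached; collecting events\n");
    for (int i = 0; i < 60; i++)
        sleep(1);   // keep the program attached while events are collected

    program_bpf__destroy(skel);
    return 0;
}
```

Build the loader by compiling it alongside the generated skeleton and linking against libbpf (e.g. -lbpf).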
eBPF is not just a Linux feature—it represents a fundamental shift in how we think about kernel extensibility. Let's examine the paradigm shift and why major technology companies have bet heavily on eBPF.
The Traditional Problem
Before eBPF, extending kernel functionality required:
Kernel modification: Changing kernel source code, recompiling, and rebooting. Impractical for most organizations.
Loadable kernel modules (LKMs): More flexible but still dangerous—a buggy module can crash the system, and modules have full kernel privileges.
User-space solutions: Safe but slow—data must cross the kernel/user boundary, causing context switches and memory copies.
None of these options provided safe, efficient, dynamic kernel extensibility.
| Approach | Safety | Performance | Dynamic | Ease of Use |
|---|---|---|---|---|
| Kernel modification | Low (full privileges) | Best | No (requires reboot) | Hard (kernel dev skills) |
| Kernel modules | Low (full privileges) | Best | Yes (loadable) | Medium (kernel dev skills) |
| User-space | High (isolated) | Poor (context switches) | Yes | Easy |
| eBPF | High (verified) | Near-native | Yes (loadable) | Medium-Hard |
The eBPF Paradigm
eBPF provides a fourth option: sandboxed, verified, JIT-compiled code that runs in kernel space with near-native performance. This unlocks capabilities that were previously impractical:
1. Ubiquitous Observability
With eBPF, you can observe anything happening in the kernel without modifying applications, restarting services, or incurring significant overhead. This has revolutionized debugging and monitoring in production environments.
2. High-Performance Networking
XDP (eXpress Data Path) enables packet processing at millions of packets per second per core, directly in the network driver. This powers DDoS mitigation, load balancing, and packet filtering at line rate.
3. Dynamic Security Policies
LSM programs enable runtime security policy enforcement without rebuilding kernels. This powers runtime security tools like Falco and Tetragon.
4. Custom Kernel Behavior
struct_ops allows implementing custom TCP congestion control, schedulers, and other kernel subsystems as eBPF programs—without kernel modifications.
eBPF is evolving rapidly. Recent developments include: eBPF for Windows, user-space eBPF runtimes (for testing and portability), Rust-based eBPF development (Aya), and the Linux kernel's sched_ext for eBPF-based CPU schedulers. Understanding eBPF now positions you for the future of systems programming.
We've covered the foundations of eBPF. Let's consolidate the key concepts:

- eBPF grew out of classic BPF (1992) and was generalized in 2014 into a safe, general-purpose in-kernel virtual machine.
- The VM is register-based: 11 64-bit registers, a 512-byte stack, and a fixed 64-bit instruction format designed for verifiability and efficient JIT compilation.
- The verifier statically analyzes every possible execution path, tracking register types and value ranges, before a program is allowed to run.
- Verified programs are JIT-compiled to native machine code, with hardening such as constant blinding and read-only images.
- Program types and attachment points (XDP, TC, socket filters, kprobes, tracepoints, LSM, struct_ops) determine where a program runs and what context it receives.
- The modern toolchain builds on Clang/LLVM, BTF, CO-RE, and libbpf, enabling portable, pre-compiled programs.
What's Next:
Now that you understand eBPF's architecture, its safety guarantees, and its place in the Linux ecosystem, the next page dives into eBPF programs themselves: how they're structured, how they interact with the kernel through helper functions, how they communicate via maps, and the development workflow from C source to code running in the kernel.