eBPF programs are unlike any other software you'll write. They execute directly within the Linux kernel, triggered by specific events—a network packet arriving, a system call being invoked, a function being entered. They run with kernel privileges yet are constrained by the verifier to guarantee safety. They can observe everything happening in the system while maintaining near-zero overhead.
Understanding how to write, structure, and deploy eBPF programs is essential for leveraging this powerful technology. This page takes you through the complete lifecycle of an eBPF program: from C source code to verified bytecode running in kernel space.
By the end of this page, you will understand how eBPF programs are structured, the role of sections and attributes, how helper functions provide kernel access, how maps enable data persistence and communication, and the complete compilation-to-execution workflow. You'll be equipped to read, understand, and begin writing eBPF programs.
An eBPF program is written in a restricted subset of C (or Rust with Aya), compiled to eBPF bytecode, and loaded into the kernel. Let's examine the essential components that make up an eBPF program.
Core Components
Every eBPF program contains these fundamental elements:
```c
// ============================================
// INCLUDES
// ============================================
// vmlinux.h: Generated type definitions from kernel BTF
// Contains all kernel structures (task_struct, sk_buff, etc.)
#include "vmlinux.h"

// BPF helper definitions and macros
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>

// ============================================
// LICENSE DECLARATION (Required)
// ============================================
// Must be GPL-compatible for many helper functions
// Options: "GPL", "GPL v2", "GPL and additional rights",
// "Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL"
char LICENSE[] SEC("license") = "GPL";

// ============================================
// MAP DEFINITIONS
// ============================================
// BPF maps persist data across program invocations
// and allow communication with user space

// Hash map: key-value storage
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);    // PID as key
    __type(value, u64);  // Count as value
} pid_count SEC(".maps");

// Per-CPU array for statistics
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} total_count SEC(".maps");

// ============================================
// HELPER STRUCTURES
// ============================================
// Custom structures for ring buffer events, etc.
struct event {
    u32 pid;
    u32 tid;
    u64 ts;
    char comm[16];
};

// ============================================
// BPF PROGRAM (Main Entry Point)
// ============================================
// SEC("...") defines program type and attachment point
// The section name determines:
// - Which program type (kprobe, tracepoint, xdp, etc.)
// - Where it attaches (function name, tracepoint path, etc.)

SEC("kprobe/do_sys_openat2")
int trace_openat(struct pt_regs *ctx) {
    // Get current PID/TID
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid >> 32;
    u32 tid = (u32)pid_tgid;

    // Look up existing count for this PID
    u64 *count = bpf_map_lookup_elem(&pid_count, &pid);
    u64 new_count = count ? *count + 1 : 1;

    // Update the map
    bpf_map_update_elem(&pid_count, &pid, &new_count, BPF_ANY);

    // Update total count (per-CPU, no locking needed)
    u32 zero = 0;
    u64 *total = bpf_map_lookup_elem(&total_count, &zero);
    if (total)
        __sync_fetch_and_add(total, 1);

    return 0;
}

// Multiple programs can exist in one file
SEC("kprobe/do_sys_close")
int trace_close(struct pt_regs *ctx) {
    bpf_printk("close() called by PID %d\n",
               bpf_get_current_pid_tgid() >> 32);
    return 0;
}
```

Section Names and Program Types
The SEC() macro places the function in a specific ELF section, which tells the loader the program type and attachment point. The section name follows conventions understood by libbpf:
| Section Pattern | Program Type | Example |
|---|---|---|
| kprobe/<func> | BPF_PROG_TYPE_KPROBE | SEC("kprobe/vfs_read") |
| kretprobe/<func> | BPF_PROG_TYPE_KPROBE (return) | SEC("kretprobe/vfs_read") |
| tracepoint/<cat>/<name> | BPF_PROG_TYPE_TRACEPOINT | SEC("tracepoint/syscalls/sys_enter_open") |
| raw_tracepoint/<name> | BPF_PROG_TYPE_RAW_TRACEPOINT | SEC("raw_tracepoint/sys_enter") |
| xdp | BPF_PROG_TYPE_XDP | SEC("xdp") |
| tc | BPF_PROG_TYPE_SCHED_CLS | SEC("tc") |
| lsm/<hook> | BPF_PROG_TYPE_LSM | SEC("lsm/bprm_check_security") |
| fentry/<func> | BPF_PROG_TYPE_TRACING | SEC("fentry/do_sys_open") |
| fexit/<func> | BPF_PROG_TYPE_TRACING | SEC("fexit/do_sys_open") |
kprobe/kretprobe work on any kernel function but are unstable (function signatures can change). Tracepoints are stable but limited to predefined points. fentry/fexit are the modern alternative—they're faster than kprobes (no int3 trap) and include BTF information for type-safe access. For new programs, prefer tracepoints for stability or fentry/fexit for performance with BTF-enabled kernels.
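For comparison, a minimal fentry program might look like the sketch below. Unlike a kprobe, the arguments arrive with BTF type information and are unpacked by the BPF_PROG macro from bpf_tracing.h; the exact signature of do_sys_openat2 shown here is an assumption and should be checked against your kernel's BTF.

```c
// Hedged sketch: fentry with BTF-typed arguments (no PT_REGS_PARM* needed).
// Assumes do_sys_openat2(int dfd, const char *filename, ...) per kernel BTF.
SEC("fentry/do_sys_openat2")
int BPF_PROG(trace_openat_fentry, int dfd, const char *filename)
{
    bpf_printk("openat entered: dfd=%d\n", dfd);
    return 0;
}
```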
eBPF programs execute in a sandboxed environment and cannot directly call arbitrary kernel functions. Instead, they interact with the kernel through helper functions—a well-defined, stable API that the kernel exposes to eBPF programs.
Helper functions are essential because they give programs a stable, kernel-maintained interface to functionality they cannot reach directly, let the verifier reason about exactly what a program can touch, and keep programs portable across kernel versions. There are over 200 helper functions in modern kernels, categorized by purpose:
| Category | Helper Functions | Purpose |
|---|---|---|
| Map Operations | bpf_map_lookup_elem, bpf_map_update_elem, bpf_map_delete_elem | Read/write BPF map data |
| Current Task Info | bpf_get_current_pid_tgid, bpf_get_current_uid_gid, bpf_get_current_comm | Get info about current process |
| Memory Access | bpf_probe_read_kernel, bpf_probe_read_user, bpf_copy_from_user | Safely read kernel/user memory |
| Time | bpf_ktime_get_ns, bpf_ktime_get_boot_ns, bpf_ktime_get_coarse_ns | Get timestamps |
| Output/Logging | bpf_trace_printk, bpf_printk, bpf_perf_event_output | Debug output, send events |
| Networking | bpf_skb_load_bytes, bpf_redirect, bpf_clone_redirect | Packet manipulation |
| Random | bpf_get_prandom_u32 | Pseudo-random number generation |
| Tail Calls | bpf_tail_call | Chain to another BPF program |
| Ring Buffer | bpf_ringbuf_reserve, bpf_ringbuf_submit, bpf_ringbuf_output | Efficient event streaming |
| Spinlocks | bpf_spin_lock, bpf_spin_unlock | Synchronize map access |
```c
// ============================================
// TASK/PROCESS INFORMATION
// ============================================
SEC("kprobe/do_sys_openat2")
int get_process_info(struct pt_regs *ctx) {
    // Get PID (high 32 bits) and TID (low 32 bits)
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid >> 32;
    u32 tid = (u32)pid_tgid;

    // Get UID (low 32 bits) and GID (high 32 bits)
    u64 uid_gid = bpf_get_current_uid_gid();
    u32 uid = (u32)uid_gid;
    u32 gid = uid_gid >> 32;

    // Get process name (comm)
    char comm[16];
    bpf_get_current_comm(&comm, sizeof(comm));

    // Get current cgroup ID
    u64 cgroup_id = bpf_get_current_cgroup_id();

    return 0;
}

// ============================================
// MEMORY ACCESS
// ============================================
SEC("kprobe/vfs_read")
int trace_read(struct pt_regs *ctx) {
    // Get the file* argument (first argument on x86-64)
    struct file *f = (struct file *)PT_REGS_PARM1(ctx);

    // Read the filename from kernel memory
    // MUST use bpf_probe_read_* for kernel pointers
    char filename[256];
    struct dentry *dentry;

    // CO-RE safe read of nested structures
    bpf_probe_read_kernel(&dentry, sizeof(dentry), &f->f_path.dentry);

    // Read the name from dentry: first fetch the name pointer,
    // then copy the string it points to
    if (dentry) {
        const unsigned char *name;
        bpf_probe_read_kernel(&name, sizeof(name), &dentry->d_name.name);
        bpf_probe_read_kernel_str(filename, sizeof(filename), name);
    }

    // For user-space memory (e.g., syscall arguments)
    char user_buf[64];
    void *user_ptr = (void *)PT_REGS_PARM2(ctx);
    // This can fail if the user page is not resident (paged out);
    // bpf_probe_read_user never faults pages in
    // long ret = bpf_probe_read_user(user_buf, sizeof(user_buf), user_ptr);

    return 0;
}

// ============================================
// TIME-BASED OPERATIONS
// ============================================
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);
    __type(value, u64);
} start_times SEC(".maps");

// Track function latency
SEC("kprobe/vfs_read")
int trace_read_entry(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 ts = bpf_ktime_get_ns();  // Monotonic nanoseconds
    bpf_map_update_elem(&start_times, &pid, &ts, BPF_ANY);
    return 0;
}

SEC("kretprobe/vfs_read")
int trace_read_exit(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 *start_ts = bpf_map_lookup_elem(&start_times, &pid);
    if (start_ts) {
        u64 duration = bpf_ktime_get_ns() - *start_ts;
        bpf_printk("vfs_read latency: %llu ns\n", duration);
        bpf_map_delete_elem(&start_times, &pid);
    }
    return 0;
}

// ============================================
// SENDING EVENTS TO USER SPACE (Ring Buffer)
// ============================================
struct event {
    u32 pid;
    u64 timestamp;
    char comm[16];
    char filename[256];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);  // 256 KB buffer
} events SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat_rb(struct trace_event_raw_sys_enter *ctx) {
    // Reserve space in ring buffer
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;  // Buffer full, drop event

    // Fill in the event
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->timestamp = bpf_ktime_get_ns();
    bpf_get_current_comm(&e->comm, sizeof(e->comm));

    // Read filename from syscall args
    const char *pathname = (const char *)ctx->args[1];
    bpf_probe_read_user_str(&e->filename, sizeof(e->filename), pathname);

    // Submit to ring buffer (makes it visible to user space)
    bpf_ringbuf_submit(e, 0);

    return 0;
}
```

Some helper functions are only available to GPL-licensed eBPF programs. These include bpf_probe_read_kernel, bpf_probe_write_user, and most tracing helpers.
If your program declares a non-GPL license, the verifier will reject calls to these helpers. When in doubt, use GPL for the license declaration.
BPF maps are the primary mechanism for persisting data across program invocations, sharing state between eBPF programs, and communicating with user space. Maps are kernel-side data structures with well-defined semantics and performance characteristics. The kernel provides numerous map types, each optimized for specific use cases.
| Map Type | Description | Use Case |
|---|---|---|
| BPF_MAP_TYPE_HASH | Hash table with arbitrary keys | Keyed lookups (PID -> data) |
| BPF_MAP_TYPE_ARRAY | Array with integer indices | Direct indexed access, counters |
| BPF_MAP_TYPE_PERCPU_HASH | Per-CPU hash table | High-contention hash without locking |
| BPF_MAP_TYPE_PERCPU_ARRAY | Per-CPU array | Per-CPU statistics |
| BPF_MAP_TYPE_LRU_HASH | LRU evicting hash table | Bounded caches |
| BPF_MAP_TYPE_LRU_PERCPU_HASH | Per-CPU LRU hash | Per-CPU bounded caches |
| BPF_MAP_TYPE_RINGBUF | Single producer ring buffer | Efficient event streaming to user space |
| BPF_MAP_TYPE_PERF_EVENT_ARRAY | Per-CPU ring buffers | Legacy event streaming (prefer ringbuf) |
| BPF_MAP_TYPE_PROG_ARRAY | Array of BPF program FDs | Tail calls (program chaining) |
| BPF_MAP_TYPE_STACK_TRACE | Stack trace storage | Profiling, stack unwinding |
| BPF_MAP_TYPE_CGROUP_ARRAY | Array of cgroup FDs | cgroup-based filtering |
| BPF_MAP_TYPE_SOCKMAP | Socket storage | Socket-level proxying |
| BPF_MAP_TYPE_BLOOM_FILTER | Probabilistic set membership | Efficient existence checks |
```c
// ============================================
// MODERN MAP DEFINITIONS (BTF-defined maps)
// ============================================
// This is the recommended syntax for modern libbpf

// Hash map: arbitrary key-value storage
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);    // Key type: PID
    __type(value, u64);  // Value type (can also be a custom struct)
} my_hash SEC(".maps");

// Array: indexed by u32, O(1) access
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 256);
    __type(key, u32);
    __type(value, u64);
} my_array SEC(".maps");

// Per-CPU array: no locking, each CPU has its own copy
// Total values = max_entries * num_cpus
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} cpu_stats SEC(".maps");

// LRU hash: automatically evicts least-recently-used entries
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 1024);
    __type(key, struct flow_key);
    __type(value, struct flow_stats);
} flow_cache SEC(".maps");

// Ring buffer: efficient event streaming
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);  // 256 KB
} events SEC(".maps");

// ============================================
// MAP OPERATIONS IN BPF PROGRAMS
// ============================================
SEC("tracepoint/syscalls/sys_enter_read")
int trace_read(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;

    // === LOOKUP ===
    // Returns pointer to value or NULL if not found
    u64 *count = bpf_map_lookup_elem(&my_hash, &pid);

    // === UPDATE ===
    // Flags: BPF_ANY     (insert or update)
    //        BPF_NOEXIST (insert only, fail if exists)
    //        BPF_EXIST   (update only, fail if not exists)
    u64 new_count = count ? *count + 1 : 1;
    bpf_map_update_elem(&my_hash, &pid, &new_count, BPF_ANY);

    // === DELETE ===
    // bpf_map_delete_elem(&my_hash, &pid);

    // === PER-CPU ACCESS ===
    // No locking needed - each CPU sees its own value
    u32 zero = 0;
    u64 *cpu_count = bpf_map_lookup_elem(&cpu_stats, &zero);
    if (cpu_count)
        __sync_fetch_and_add(cpu_count, 1);

    // === RING BUFFER ===
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (e) {
        e->pid = pid;
        e->timestamp = bpf_ktime_get_ns();
        bpf_ringbuf_submit(e, 0);
    }

    return 0;
}

// ============================================
// INNER MAP (Map-in-Map) for Dynamic Structures
// ============================================
// Outer map holds file descriptors to inner maps
struct inner_map {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, u32);
    __type(value, u64);
} inner_map SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
    __uint(max_entries, 4);
    __uint(key_size, sizeof(u32));
    __array(values, struct inner_map);
} outer_map SEC(".maps");

SEC("kprobe/some_func")
int use_inner_map(struct pt_regs *ctx) {
    u32 outer_key = 0;
    void *inner = bpf_map_lookup_elem(&outer_map, &outer_key);
    if (!inner)
        return 0;

    u32 inner_key = 42;
    u64 *val = bpf_map_lookup_elem(inner, &inner_key);
    if (val)
        bpf_printk("Found: %llu\n", *val);

    return 0;
}
```

By default, BPF maps are destroyed when the loading program exits. To persist maps beyond the loader's lifetime, pin them to the BPF filesystem (bpffs). This creates a file at /sys/fs/bpf/<name> that holds a reference to the map. Other programs can then access the map by opening this path. This enables map sharing across processes and program hot-reloading.
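Pinning can also be requested declaratively in the map definition. A minimal sketch, assuming a libbpf version that supports LIBBPF_PIN_BY_NAME: libbpf will create (or reuse) a pin under /sys/fs/bpf with the map's name when the object is loaded.

```c
// Hedged sketch: a map pinned by name. libbpf creates /sys/fs/bpf/shared_counts
// at load time, or reuses an existing pin with a compatible definition.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, u32);
    __type(value, u64);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} shared_counts SEC(".maps");
```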
Transforming C source code into running eBPF programs involves multiple stages. Understanding this pipeline is essential for debugging compilation issues and optimizing program size.
The Complete Pipeline
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ C Source │────▶│ Clang/LLVM │────▶│ ELF Object │────▶│ Loader │
│ (.bpf.c) │ │ -target bpf │ │ (.bpf.o) │ │ (libbpf) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Native Code │◀────│ JIT │◀────│ Verifier │◀────│ bpf() │
│ (in kernel) │ │ Compiler │ │ (safety) │ │ syscall │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
Stage 1: Clang Compilation
Clang with LLVM is the only production-ready compiler for eBPF. The compilation produces an ELF object file containing the eBPF bytecode (one section per program), map definitions, BTF type information, and relocation records.
```bash
# Basic compilation
clang -O2 -g -target bpf -c program.bpf.c -o program.bpf.o

# Full production compilation:
#   -O2                         Optimization level (always use -O2)
#   -g                          Generate debug info (needed for BTF)
#   -target bpf                 Target: BPF bytecode
#   -D__TARGET_ARCH_x86         Target arch for vmlinux.h macros
#   -I/path/to/libbpf/include   libbpf headers (bpf_helpers.h, etc.)
#   -I.                         Local headers (vmlinux.h)
clang -O2 -g -target bpf -D__TARGET_ARCH_x86 \
    -I/path/to/libbpf/include -I. \
    -c program.bpf.c -o program.bpf.o

# Generate vmlinux.h from kernel BTF
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

# Inspect the compiled program
llvm-objdump -d program.bpf.o                     # Disassemble bytecode
bpftool prog dump xlated pinned /sys/fs/bpf/prog  # See translated bytecode
bpftool prog dump jited pinned /sys/fs/bpf/prog   # See JIT'd machine code

# Check BTF information
bpftool btf dump file program.bpf.o
```

Stage 2: The ELF Object File
The compiled .bpf.o file is a standard ELF object with special sections:
| Section | Purpose |
|---|---|
| .text | Default code section |
| kprobe/..., tracepoint/... | Named program sections |
| .maps | Map definitions |
| .rodata | Read-only data (const globals) |
| .data | Read-write global data |
| .bss | Zero-initialized data |
| .BTF | Type information |
| .BTF.ext | Extended BTF (line info, CO-RE relocations) |
| .rel* | Relocations for the corresponding section |
Stage 3: Loading with libbpf
libbpf is the canonical library for loading eBPF programs. It handles parsing the ELF object, applying CO-RE relocations, creating maps, loading and verifying programs through the bpf() syscall, and attaching programs to their hooks. Modern libbpf development uses skeleton generation for type-safe access:
```bash
# 1. Compile BPF program
clang -O2 -g -target bpf -c program.bpf.c -o program.bpf.o

# 2. Generate skeleton header
bpftool gen skeleton program.bpf.o > program.skel.h

# The skeleton provides:
# - struct program_bpf:       holds all BPF objects
# - program_bpf__open():      parse ELF, prepare for loading
# - program_bpf__load():      create maps, load programs
# - program_bpf__attach():    attach to hooks
# - program_bpf__destroy():   cleanup
```

When the verifier rejects your program, set LIBBPF_LOG_LEVEL=debug or use libbpf_set_print() to see the full verifier output. Common issues include: unbounded loops (add explicit bounds), uninitialized register access (check all paths initialize variables), and out-of-bounds memory access (add explicit bounds checks before access).
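To make the open/load/attach flow concrete, a minimal user-space loader built on the generated skeleton might look like the sketch below. The program_bpf names come from the skeleton generated above; error handling is trimmed to the essentials.

```c
// Hedged sketch of a skeleton-based loader (assumes program.skel.h from above)
#include <stdio.h>
#include <unistd.h>
#include "program.skel.h"

int main(void)
{
    struct program_bpf *skel;
    int err;

    skel = program_bpf__open();        // Parse ELF, prepare objects in memory
    if (!skel)
        return 1;

    err = program_bpf__load(skel);     // Create maps, verify and load programs
    if (err)
        goto cleanup;

    err = program_bpf__attach(skel);   // Attach every program to its hook
    if (err)
        goto cleanup;

    printf("Programs attached; press Ctrl-C to exit\n");
    pause();                           // Keep references alive while tracing

cleanup:
    program_bpf__destroy(skel);        // Detach, unload, free everything
    return err != 0;
}
```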
After loading and verification, eBPF programs must be attached to their execution points. The attachment mechanism differs by program type, and understanding the lifecycle is crucial for reliable operation.
Attachment Methods by Program Type
| Program Type | Attachment Method | Kernel Interface |
|---|---|---|
| kprobe/kretprobe | perf_event or link | kprobe_register / perf_event_open |
| tracepoint | perf_event or link | perf_event_open with tracepoint ID |
| raw_tracepoint | bpf(BPF_RAW_TRACEPOINT_OPEN) | Direct syscall |
| fentry/fexit | bpf_link | bpf(BPF_LINK_CREATE) |
| XDP | netlink or bpf_link | IFLA_XDP or bpf(BPF_LINK_CREATE) |
| TC (sched_cls) | netlink (tc) | RTM_NEWTFILTER |
| cgroup/* | bpf(BPF_PROG_ATTACH) or link | File descriptor to cgroup dir |
| LSM | bpf_link | bpf(BPF_LINK_CREATE) |
| socket filter | setsockopt(SO_ATTACH_BPF) | Socket file descriptor |
BPF Links: The Modern Attachment API
Traditional attachment methods had issues: ownership of the attachment was unclear, some attachments silently outlived (or died with) the loading process, and replacing a program meant a detach/attach window with nothing running.
BPF links (introduced in kernel 5.7) solve these problems: a link is a kernel object, represented by a file descriptor, that owns the attachment. It is detached automatically when its last reference goes away, it can be pinned to bpffs to outlive the loader, and the attached program can be swapped atomically with bpf_link_update().
```c
// ============================================
// KPROBE ATTACHMENT
// ============================================
// Using libbpf skeleton (handles attachment automatically)
struct program_bpf *skel = program_bpf__open_and_load();
struct bpf_link *link = bpf_program__attach(skel->progs.trace_openat);
if (!link) {
    fprintf(stderr, "Failed to attach kprobe\n");
}

// Manual kprobe attachment with explicit function
struct bpf_link *kprobe_link = bpf_program__attach_kprobe(
    skel->progs.my_kprobe,
    false,        // retprobe? false = entry, true = return
    "vfs_read");  // function name

// ============================================
// XDP ATTACHMENT
// ============================================
int ifindex = if_nametoindex("eth0");

// Attach with flags
// XDP_FLAGS_DRV_MODE:   Native driver mode (best performance)
// XDP_FLAGS_SKB_MODE:   Generic mode (works everywhere)
// XDP_FLAGS_HW_OFFLOAD: Hardware offload (NIC executes BPF)
int err = bpf_xdp_attach(
    ifindex,
    bpf_program__fd(skel->progs.xdp_prog),
    XDP_FLAGS_DRV_MODE,
    NULL);

// Or using bpf_link for better lifecycle management
struct bpf_link *xdp_link = bpf_program__attach_xdp(
    skel->progs.xdp_prog, ifindex);

// ============================================
// CGROUP ATTACHMENT
// ============================================
int cgroup_fd = open("/sys/fs/cgroup/my_cgroup", O_RDONLY);

struct bpf_link *cgroup_link = bpf_program__attach_cgroup(
    skel->progs.cgroup_skb_prog, cgroup_fd);

// ============================================
// LINK PINNING FOR PERSISTENCE
// ============================================
// Pin link to bpffs - survives process exit
err = bpf_link__pin(link, "/sys/fs/bpf/my_link");

// Later, from another process, reopen the link
struct bpf_link *reopened = bpf_link__open("/sys/fs/bpf/my_link");

// Atomic program replacement
err = bpf_link__update_program(reopened, new_skel->progs.updated_prog);

// Cleanup
bpf_link__unpin(link);
bpf_link__destroy(link);
```

Program Lifecycle States
┌──────────┐ ┌────────┐ ┌──────────┐ ┌──────────────┐
│ Compiled │────▶│ Loaded │────▶│ Attached │────▶│ Detached/ │
│ (.bpf.o) │ │ │ │ │ │ Destroyed │
└──────────┘ └────────┘ └──────────┘ └──────────────┘
│ │ ▲
│ │ │
│ └───────────────────┘
│ (unpin/destroy link)
│
└─────────────────────────────────▶
(close FD without attachment = destroyed)
Key Lifecycle Points:
BPF programs and maps are reference-counted. They're destroyed when the last reference is closed. References include: open file descriptors, pinned paths on bpffs, active attachments (links), and maps referencing programs (prog_array for tail calls). To keep programs alive after your process exits, pin them to bpffs or use systemd to manage the loader process.
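For example, a program can be loaded and pinned with bpftool so that it survives the shell that loaded it; the paths and program name below are illustrative and assume the trace_openat example from earlier.

```bash
# Load all programs from the object and pin them under a bpffs directory
# (loading pins the programs but does not attach them)
bpftool prog loadall program.bpf.o /sys/fs/bpf/myapp

# The programs stay loaded even though bpftool has exited
bpftool prog show pinned /sys/fs/bpf/myapp/trace_openat

# Removing the pin drops that reference; with no other references,
# the program is unloaded
rm /sys/fs/bpf/myapp/trace_openat
```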
eBPF programs have strict size limits (1 million verified instructions), and the verifier limits complexity to prevent infinite loops. Tail calls provide a mechanism to work around these limits by chaining multiple programs together.
What is a Tail Call?
A tail call transfers execution from one eBPF program to another, replacing the current program entirely (like execve() for processes). The key properties are that execution never returns to the caller, the target program must have the same program type as the caller, the target is selected at runtime from a BPF_MAP_TYPE_PROG_ARRAY map, and chains are limited to 33 calls.
```c
// ============================================
// TAIL CALL MAP (prog_array)
// ============================================
// Holds file descriptors to BPF programs, indexed by u32
struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 8);
    __type(key, u32);
    __type(value, u32);  // Actually holds prog_fd, managed by loader
} jump_table SEC(".maps");

// Program indices (used as keys in the jump table)
#define PROG_PARSER 0
#define PROG_TCP    1
#define PROG_UDP    2
#define PROG_ICMP   3

// ============================================
// DISPATCHER PROGRAM (Entry Point)
// ============================================
SEC("xdp")
int xdp_dispatcher(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    // Tail call to parser program
    bpf_tail_call(ctx, &jump_table, PROG_PARSER);

    // If tail call fails (program not in map), continue here
    return XDP_PASS;
}

// ============================================
// PARSER PROGRAM
// ============================================
SEC("xdp")
int xdp_parser(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    struct ethhdr *eth = data;
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_DROP;

    // Dispatch to protocol-specific handler
    switch (ip->protocol) {
    case IPPROTO_TCP:
        bpf_tail_call(ctx, &jump_table, PROG_TCP);
        break;
    case IPPROTO_UDP:
        bpf_tail_call(ctx, &jump_table, PROG_UDP);
        break;
    case IPPROTO_ICMP:
        bpf_tail_call(ctx, &jump_table, PROG_ICMP);
        break;
    }

    // Unknown protocol or tail call failed
    return XDP_PASS;
}

// ============================================
// PROTOCOL-SPECIFIC HANDLERS
// ============================================
SEC("xdp")
int xdp_tcp_handler(struct xdp_md *ctx) {
    // TCP-specific processing
    bpf_printk("Processing TCP packet\n");
    return XDP_PASS;
}

SEC("xdp")
int xdp_udp_handler(struct xdp_md *ctx) {
    // UDP-specific processing
    bpf_printk("Processing UDP packet\n");
    return XDP_PASS;
}

// ============================================
// USER-SPACE: Loading programs into jump table
// ============================================
/*
// After loading all programs:
int parser_fd = bpf_program__fd(skel->progs.xdp_parser);
int tcp_fd = bpf_program__fd(skel->progs.xdp_tcp_handler);
int udp_fd = bpf_program__fd(skel->progs.xdp_udp_handler);

int jump_table_fd = bpf_map__fd(skel->maps.jump_table);

// Populate the jump table
u32 key;
key = PROG_PARSER;
bpf_map_update_elem(jump_table_fd, &key, &parser_fd, BPF_ANY);

key = PROG_TCP;
bpf_map_update_elem(jump_table_fd, &key, &tcp_fd, BPF_ANY);

key = PROG_UDP;
bpf_map_update_elem(jump_table_fd, &key, &udp_fd, BPF_ANY);
*/
```

Tail calls have limitations: max 33 in a chain (prevents infinite loops), all programs must have the same type, and tail-called programs don't inherit verifier state (they may need to re-validate pointers). For function reuse within a single program, prefer BPF-to-BPF function calls (static functions with __always_inline, or real BPF function calls on kernels 4.16+), which share verifier state.
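For contrast, here is a hedged sketch of a BPF-to-BPF call, assuming a 4.16+ kernel and the same includes as the earlier examples (plus bpf_endian.h): parse_port is a real function the verifier analyzes at each call site, and execution returns to the caller, unlike a tail call.

```c
// Assumes #include <bpf/bpf_endian.h> for bpf_ntohs()

// __noinline forces an actual BPF-to-BPF call instead of inlining
static __noinline int parse_port(void *data, void *data_end, u16 *dport)
{
    struct ethhdr *eth = data;
    struct iphdr *ip = (void *)(eth + 1);
    struct tcphdr *tcp;

    // Bounds check before reading the IP header
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP)
        return -1;

    // Bounds check again after the variable-length IP header
    tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > data_end)
        return -1;

    *dport = bpf_ntohs(tcp->dest);
    return 0;
}

SEC("xdp")
int xdp_port_logger(struct xdp_md *ctx)
{
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    u16 dport = 0;

    // Unlike bpf_tail_call(), execution returns here after the call
    if (parse_port(data, data_end, &dport) == 0)
        bpf_printk("TCP dport: %d\n", dport);

    return XDP_PASS;
}
```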
eBPF programs can use global variables for configuration and shared state. The kernel and libbpf provide mechanisms to initialize these variables from user space and even modify them at runtime.
Types of Global Variables
| Type | Section | Modifiable at Runtime | Use Case |
|---|---|---|---|
| const volatile | .rodata | No (before load only) | Configuration, feature flags |
| Regular global | .data | Yes (via map) | Runtime-modifiable state |
| Static | .bss | Yes (via map) | Zero-initialized state |
Key Insight: Global variables are implemented as implicit BPF maps. libbpf automatically creates array maps for .rodata, .data, and .bss sections, allowing user space to read and (for .data/.bss) modify them.
```c
// ============================================
// CONSTANT CONFIGURATION (set before load)
// ============================================
// 'const volatile' tells the compiler:
// - const: Value doesn't change during program execution
// - volatile: Don't optimize away reads (value set externally)

const volatile u32 filter_pid = 0;       // Target PID to trace
const volatile bool debug_mode = false;  // Enable debug output
const volatile u64 sample_rate = 100;    // Sample 1 in N events

// ============================================
// MUTABLE GLOBAL STATE
// ============================================
// Regular globals can be modified at runtime via the .data map
u64 event_count = 0;
u32 last_pid = 0;

// ============================================
// USING GLOBALS IN BPF PROGRAMS
// ============================================
SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;

    // Read-only config: filter by PID if set
    if (filter_pid && pid != filter_pid)
        return 0;

    // Increment global counter
    __sync_fetch_and_add(&event_count, 1);
    last_pid = pid;

    // Conditional debug output
    if (debug_mode)
        bpf_printk("openat from PID %d, total events: %llu\n",
                   pid, event_count);

    // Sampling
    if (event_count % sample_rate != 0)
        return 0;

    // Process sampled event...

    return 0;
}
```

Use const volatile for configuration that doesn't change after loading. The verifier can sometimes use known values of const volatile variables to eliminate dead code paths. For example, if filter_pid is set to 0, the verifier knows the condition if (filter_pid && ...) is always false, potentially eliminating that code path entirely.
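On the user-space side, the skeleton exposes these sections directly. A sketch, assuming the program_bpf skeleton type used earlier (note that event_count is zero-initialized, so libbpf places it in the .bss map):

```c
// Hedged sketch: configuring globals from user space via the skeleton
#include <stdio.h>
#include "program.skel.h"   // hypothetical skeleton from "bpftool gen skeleton"

static int configure_globals(void)
{
    struct program_bpf *skel = program_bpf__open();
    if (!skel)
        return -1;

    // .rodata (const volatile): writable only between open() and load();
    // frozen once the object is loaded
    skel->rodata->filter_pid = 1234;
    skel->rodata->sample_rate = 10;

    if (program_bpf__load(skel) || program_bpf__attach(skel)) {
        program_bpf__destroy(skel);
        return -1;
    }

    // .data/.bss globals stay readable and writable while programs run,
    // backed by memory-mapped array maps
    printf("events so far: %llu\n",
           (unsigned long long)skel->bss->event_count);

    program_bpf__destroy(skel);
    return 0;
}
```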
We've covered the complete anatomy of eBPF programs: program structure and SEC() annotations, helper functions, maps, the compile-load-verify-attach pipeline, program lifecycle and pinning, tail calls, and global variables.
What's Next:
Now that you understand how eBPF programs are structured, loaded, and attached, the next page explores tracing and observability—one of the most powerful applications of eBPF. You'll learn how to instrument the kernel, capture performance data, trace system calls, and build the foundation for tools like bpftrace, Falco, and production observability platforms.