What if you could answer questions like "which process is calling this syscall?" or "what's the read latency distribution?" without modifying applications, restarting services, or adding significant overhead?
eBPF makes this possible. By attaching programs to kernel functions, tracepoints, and performance counters, eBPF provides unprecedented visibility into system behavior. This capability has revolutionized debugging, performance analysis, and security monitoring in production environments.
Netflix, Facebook, and Google use eBPF-based observability to debug issues in real-time across millions of servers. Tools like bpftrace, perf, and commercial solutions like Datadog's agent leverage eBPF to provide insights that were previously impossible or impractical to obtain.
By the end of this page, you will understand the eBPF tracing capabilities (kprobes, tracepoints, USDT), learn to use bpftrace for ad-hoc analysis, understand profiling and flamegraph generation, and appreciate how production observability tools leverage these primitives.
Tracing is the process of recording events as they occur in a system. Unlike logging (where applications explicitly emit messages), tracing instruments the system to capture events automatically. eBPF enables tracing at multiple levels:
Tracing Sources in Linux
| Source | Description | Stability | Performance |
|---|---|---|---|
| kprobes | Dynamic instrumentation of any kernel function | Unstable (functions can change) | Good |
| kretprobes | Function return tracing (captures return value) | Unstable | Good |
| tracepoints | Static, pre-defined kernel instrumentation points | Stable API | Best |
| raw_tracepoints | Lower-overhead tracepoint access | Stable | Better than regular tracepoints |
| fentry/fexit | BTF-enabled function tracing (kernel 5.5+) | Unstable (but typed) | Best |
| USDT | User-space statically defined tracing | Application-defined | Minimal |
| uprobes | Dynamic user-space function instrumentation | Unstable | Good |
| perf events | Hardware/software performance counters | Stable | Counter-dependent |
Understanding the Tracing Landscape
┌───────────────────────────────────────────────────────────────────┐
│ USER SPACE │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Application │ │ Library │ │ Runtime (JVM) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────────┘ │
│ │ uprobes │ uprobes │ USDT probes │
└─────────┼──────────────────┼───────────────────┼────────────────┘
│ │ │
═══════════════════════════════════════════════════════════════════
│ │ │
┌─────────┼──────────────────┼───────────────────┼────────────────┐
│ ▼ ▼ ▼ KERNEL │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ System Calls (syscall tracepoints) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ VFS, Scheduler, Memory (kprobes/tracepoints) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Drivers (kprobes, device tracepoints) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Hardware (perf events, PMU counters) │ │
│ └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
In production, prefer stable interfaces. A tracepoint-based tool will work across kernel upgrades, while a kprobe-based tool might break. For example, tracing sys_enter_openat via tracepoint is stable; tracing do_sys_openat2 via kprobe might fail if the kernel refactors that function.
Kprobes (kernel probes) enable dynamic instrumentation of almost any kernel function. With kprobes, you can place breakpoints at function entry points, specific instructions, or function returns (kretprobes) without modifying the kernel.
How Kprobes Work
1. You register a probe on a target kernel function (e.g., vfs_read)
2. The kernel saves the original instruction at the probe address and replaces it with a breakpoint instruction (int3 on x86)
3. When execution hits the breakpoint, the kprobe handler runs your eBPF program
4. The saved instruction is then executed and normal control flow resumes

This mechanism enables tracing almost any of the roughly 50,000 kernel functions.
// ============================================
// KPROBE: Trace Function Entry
// ============================================
// SEC name format: kprobe/<function_name>
SEC("kprobe/vfs_read")
int BPF_KPROBE(trace_vfs_read, struct file *file, char __user *buf,
               size_t count, loff_t *pos)
{
    // BPF_KPROBE macro handles architecture-specific argument extraction
    // Arguments match the kernel function signature
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("vfs_read: pid=%d, count=%lu", pid, count);
    return 0;
}

// ============================================
// KRETPROBE: Trace Function Return
// ============================================
// SEC name format: kretprobe/<function_name>
SEC("kretprobe/vfs_read")
int BPF_KRETPROBE(trace_vfs_read_ret, ssize_t ret)
{
    // BPF_KRETPROBE provides the return value
    // 'ret' contains the value returned by vfs_read
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    if (ret < 0) {
        bpf_printk("vfs_read failed: pid=%d, err=%ld", pid, ret);
    } else {
        bpf_printk("vfs_read success: pid=%d, bytes=%ld", pid, ret);
    }
    return 0;
}

// ============================================
// LATENCY TRACING PATTERN
// ============================================
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u64);    // PID + TID
    __type(value, u64);  // Start timestamp
} start_times SEC(".maps");

SEC("kprobe/vfs_read")
int trace_read_entry(struct pt_regs *ctx)
{
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&start_times, &pid_tgid, &ts, BPF_ANY);
    return 0;
}

SEC("kretprobe/vfs_read")
int trace_read_return(struct pt_regs *ctx)
{
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u64 *start_ts = bpf_map_lookup_elem(&start_times, &pid_tgid);
    if (start_ts) {
        u64 duration_ns = bpf_ktime_get_ns() - *start_ts;
        u64 duration_us = duration_ns / 1000;
        // Only log slow reads (> 1ms)
        if (duration_us > 1000) {
            bpf_printk("slow vfs_read: %llu us", duration_us);
        }
        bpf_map_delete_elem(&start_times, &pid_tgid);
    }
    return 0;
}

Finding Kprobeable Functions
# List all available kprobe points
cat /sys/kernel/debug/tracing/available_filter_functions | head -20
# Search for specific functions
cat /sys/kernel/debug/tracing/available_filter_functions | grep vfs_
# Check if a function exists in current kernel
grep -w "do_sys_openat2" /proc/kallsyms
Kprobe Limitations:
- Inlined functions cannot be probed (there is no call site to instrument)
- Some functions are blacklisted from probing (e.g., the kprobe machinery itself)
- Function names and signatures can change between kernel versions
- Per-invocation overhead is higher than tracepoints
Avoid kprobing hot paths in production unless necessary. Kprobes introduce overhead per invocation. Tracing vfs_read on a busy file server generates millions of probe hits per second. Use filtering (by PID, cgroup, etc.) to reduce overhead, and prefer tracepoints when available.
Tracepoints are static instrumentation points compiled into the kernel. Unlike kprobes, which are dynamic, tracepoints are:
- Stable: their names and field layouts form a maintained interface across kernel versions
- Explicit: placed by kernel developers at semantically meaningful events
- Cheap: a disabled tracepoint costs little more than a no-op
The kernel contains over 1,000 tracepoints covering syscalls, scheduler, memory, networking, block I/O, and more.
# List all available tracepoints
cat /sys/kernel/debug/tracing/available_events

# View tracepoint format (shows available fields)
cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format

# Example output:
# name: sys_enter_openat
# ID: 614
# format:
#   field:long __syscall_nr;     offset:8;  size:8; signed:0;
#   field:int dfd;               offset:16; size:8; signed:0;
#   field:const char * filename; offset:24; size:8; signed:0;
#   field:int flags;             offset:32; size:8; signed:0;
#   field:umode_t mode;          offset:40; size:8; signed:0;

# Key tracepoint categories:
# syscalls/sys_enter_*  - System call entry
# syscalls/sys_exit_*   - System call exit
# sched/*               - Scheduler events
# block/*               - Block I/O
# net/*                 - Networking
# irq/*                 - Interrupts
# timer/*               - Timer events
# kmem/*                - Memory allocation

Starting with kernel 5.5+, fentry/fexit probes offer the best of both worlds: they attach to specific kernel functions like kprobes, but use BTF for type-safe argument access and have lower overhead (no int3 trap). Use fentry when available and the target function doesn't have a tracepoint.
bpftrace is a high-level tracing language for Linux, inspired by DTrace and AWK. It compiles one-liners and scripts into eBPF programs, making eBPF accessible for ad-hoc analysis without writing C code.
bpftrace Architecture
┌───────────────────────────────────────────────────────────┐
│ bpftrace script │
│ kprobe:vfs_read { @bytes = hist(arg2); } │
└─────────────────────────┬─────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ bpftrace compiler │
│ (parser → AST → LLVM IR) │
└─────────────────────────┬─────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ eBPF bytecode │
│ (loaded via libbpf/bpf() syscall) │
└─────────────────────────┬─────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Kernel │
│ verifier → JIT → attach to kprobe/tracepoint/etc. │
└───────────────────────────────────────────────────────────┘
# ============================================
# PROCESS TRACING
# ============================================

# Trace new process execution
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s called execve", comm); }'

# Count syscalls by process
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Trace process creation with args (only first 64 bytes)
bpftrace -e 'tracepoint:syscalls:sys_enter_execve {
  printf("%-6d %-16s ", pid, comm);
  join(args->argv);
}'

# ============================================
# FILE SYSTEM TRACING
# ============================================

# Trace file opens
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opened %s", comm, str(args->filename)); }'

# Count reads by file
bpftrace -e 'kprobe:vfs_read { @[str(((struct file *)arg0)->f_path.dentry->d_name.name)] = count(); }'

# Histogram of read sizes
bpftrace -e 'kprobe:vfs_read { @bytes = hist(arg2); }'

# Read latency histogram
bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; }
  kretprobe:vfs_read /@start[tid]/ {
    @us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
  }'

# ============================================
# SCHEDULER TRACING
# ============================================

# Trace context switches for a specific PID
bpftrace -e 'tracepoint:sched:sched_switch
  /args->prev_pid == 1234 || args->next_pid == 1234/ {
    printf("%-8d %-16s -> %-16s", nsecs, args->prev_comm, args->next_comm);
  }'

# Measure runqueue latency (time task waits to run)
bpftrace -e 'tracepoint:sched:sched_wakeup { @queuetime[args->pid] = nsecs; }
  tracepoint:sched:sched_switch /args->prev_pid == 0 && @queuetime[args->next_pid]/ {
    @us = hist((nsecs - @queuetime[args->next_pid]) / 1000);
    delete(@queuetime[args->next_pid]);
  }'

# Off-CPU time by stack
bpftrace -e 'tracepoint:sched:sched_switch {
  @blocked[args->prev_pid] = nsecs;
}
tracepoint:sched:sched_switch /args->prev_pid == 0 && @blocked[args->next_pid]/ {
  @us[kstack] = sum((nsecs - @blocked[args->next_pid]) / 1000);
  delete(@blocked[args->next_pid]);
}'

# ============================================
# NETWORK TRACING
# ============================================

# TCP retransmits
bpftrace -e 'tracepoint:tcp:tcp_retransmit_skb { printf("retransmit: %s:%d -> %s:%d", ntop(args->saddr), args->sport, ntop(args->daddr), args->dport); }'

# Count TCP connections by destination
bpftrace -e 'kprobe:tcp_v4_connect { @connects[ntop(((struct sockaddr_in *)arg1)->sin_addr.s_addr)] = count(); }'

# Socket accept latency
bpftrace -e 'kprobe:inet_csk_accept { @start[tid] = nsecs; }
  kretprobe:inet_csk_accept /@start[tid]/ {
    @accept_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
  }'

bpftrace Syntax Quick Reference
| Element | Description | Example |
|---|---|---|
| probe | Attachment point | `kprobe:vfs_read`, `tracepoint:syscalls:sys_enter_open` |
| filter | Conditional execution | `/pid == 1234/`, `/comm == "nginx"/` |
| action | Code to execute | `{ printf("%d\n", pid); }` |
| @map | Aggregation map | `@counts[comm] = count();` |
| $var | Scalar variable | `$ts = nsecs;` |
| arg0-argN | kprobe arguments | `arg0` (first function argument) |
| args->field | Tracepoint args | `args->filename` |
| tid, pid | Thread/Process ID | Built-in variables |
| comm | Process name | Built-in variable |
| nsecs | Nanosecond timestamp | Built-in variable |
| kstack, ustack | Stack traces | For flamegraphs |
bpftrace excels at ad-hoc debugging. When a production issue occurs, you can write a one-liner in seconds to answer questions like 'which process is calling this syscall?' or 'what's the read latency distribution?' It's the kernel equivalent of adding print statements—but without modifying or restarting anything.
CPU Profiling identifies where a program spends its CPU time. eBPF enables efficient profiling by:
- Sampling stack traces in-kernel at a fixed frequency (e.g., 99 Hz) via perf events
- Aggregating sample counts in BPF maps instead of copying every sample to user space
- Exporting only the summarized stack counts when the session ends
Flamegraphs are the standard visualization for profiling data. They show:
- Width: the proportion of samples containing a function (wider = more time)
- Height: stack depth, with callers below and callees above
- Merging: identical stacks are combined, and the x-axis is sorted alphabetically, not chronologically
# ============================================
# ON-CPU PROFILING (CPU sampling)
# ============================================

# Profile all processes at 99 Hz for 10 seconds
# Using bpftrace:
bpftrace -e 'profile:hz:99 { @[kstack, ustack, comm] = count(); }' > stacks.txt

# Using perf with eBPF-enabled stacks:
perf record -F 99 -a -g -- sleep 10
perf script > stacks.txt

# Generate flamegraph from stacks
# Requires: https://github.com/brendangregg/FlameGraph
./stackcollapse-bpftrace.pl stacks.txt | ./flamegraph.pl > profile.svg

# ============================================
# OFF-CPU ANALYSIS (Blocking time)
# ============================================

# Track time spent blocked/sleeping
# This shows WHERE processes are waiting (I/O, locks, etc.)

bpftrace -e 'tracepoint:sched:sched_switch {
  if (args->prev_state == 1 || args->prev_state == 2) {
    @blocked[args->prev_pid, kstack] = nsecs;
  }
}

tracepoint:sched:sched_switch /args->prev_pid == 0 && @blocked[args->next_pid, kstack]/ {
  @offcpu_us[@blocked[args->next_pid, kstack]] =
    sum((nsecs - @blocked[args->next_pid, kstack]) / 1000);
  delete(@blocked[args->next_pid, kstack]);
}' > offcpu_stacks.txt

# ============================================
# FUNCTION DURATION PROFILING
# ============================================

# Profile time spent in specific functions
bpftrace -e 'kprobe:ext4_file_write_iter { @start[tid] = nsecs; }
kretprobe:ext4_file_write_iter /@start[tid]/ {
  @duration_us[kstack] = sum((nsecs - @start[tid]) / 1000);
  delete(@start[tid]);
}
END { print(@duration_us); }'

# ============================================
# DIFFERENTIAL PROFILING
# ============================================

# Compare before/after a change:
# 1. Profile baseline
bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > before.txt

# 2. Make change, profile again
bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > after.txt

# 3. Generate differential flamegraph
./difffolded.pl before.txt after.txt | ./flamegraph.pl > diff.svg

Reading Flamegraphs
┌────────────────────────────────────────────────────────────────────┐
│ do_sys_open │ ← Wide = lots of time
├───────────────────────────────────┬────────────────────────────────┤
│ do_filp_open │ security_file_open │
├──────────────────┬────────────────┼────────────────────────────────┤
│ path_openat │ alloc_fd │ selinux_* │
├────────┬─────────┼────────────────┤ │
│ lookup │ create │ │ │
└────────┴─────────┴────────────────┴────────────────────────────────┘
△ Narrow = less time
Flamegraph Analysis Tips:
Look for wide plateaus — These are functions where significant time is spent without calling other functions (CPU-bound work or leaf functions)
Compare widths — If function_a is twice as wide as function_b, it consumes twice the CPU time
Follow the hot path — Start from the widest root and follow the widest child at each level
Ignore narrow towers — Deeply nested but narrow stacks contribute little to overall time
On-CPU flamegraphs show where CPU time is spent—useful for CPU-bound workloads. Off-CPU flamegraphs show where time is spent blocked (waiting for I/O, locks, etc.)—essential for I/O-bound or latency-sensitive workloads. For complete analysis, generate both and compare.
eBPF has enabled a new generation of observability tools that provide deep system visibility with minimal overhead. Let's examine the patterns and tools used in production environments.
The eBPF Observability Stack
┌──────────────────────────────────────────────────────────────────┐
│ Visualization │
│ Grafana, Jaeger, custom dashboards │
└────────────────────────────────┬─────────────────────────────────┘
│
┌────────────────────────────────┼─────────────────────────────────┐
│ Backends │
│ Prometheus, Elasticsearch, ClickHouse, Parca │
└────────────────────────────────┬─────────────────────────────────┘
│
┌────────────────────────────────┼─────────────────────────────────┐
│ eBPF Agents │
│ Pixie, Parca, Tetragon, Datadog Agent, Cilium Hubble │
└────────────────────────────────┬─────────────────────────────────┘
│
┌────────────────────────────────┼─────────────────────────────────┐
│ eBPF Programs │
│ Ring buffers → user space → export │
└────────────────────────────────┬─────────────────────────────────┘
│
└────────────────────────────────┴─────────────────────────────────┘
Linux Kernel
| Tool | Focus Area | Key Capabilities |
|---|---|---|
| Cilium Hubble | Network observability | L3/L4/L7 flow visibility, service maps, DNS visibility |
| Pixie | Application performance | Auto-instrumented traces, flamegraphs, service topology |
| Parca | Continuous profiling | Always-on profiling, differential analysis |
| Tetragon | Security observability | Process execution, file access, network tracing |
| Falco | Runtime security | Syscall-based threat detection, rule engine |
| bcc tools | Ad-hoc analysis | 50+ readymade tools (execsnoop, opensnoop, etc.) |
| Datadog Agent | Full-stack observability | eBPF-enhanced APM, network monitoring, security |
// ============================================
// PATTERN 1: Efficient Event Streaming with Ring Buffer
// ============================================
// Ring buffers are the modern way to stream events to user space
// - Single buffer shared across CPUs
// - Lock-free for single producer (BPF program)
// - Notification coalescing reduces wakeups

struct event {
    u64 timestamp;
    u32 pid;
    u32 tid;
    char comm[16];
    char filename[256];
    s64 retval;
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1024 * 1024);  // 1 MB
} events SEC(".maps");

SEC("tracepoint/syscalls/sys_exit_openat")
int trace_openat_exit(struct trace_event_raw_sys_exit *ctx)
{
    struct event *e;

    // Reserve space atomically
    e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;  // Buffer full, drop (handle gracefully)

    // Fill event
    e->timestamp = bpf_ktime_get_ns();
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->tid = bpf_get_current_pid_tgid();
    bpf_get_current_comm(&e->comm, sizeof(e->comm));
    e->retval = ctx->ret;

    // Submit (makes visible to user space)
    bpf_ringbuf_submit(e, 0);
    return 0;
}

// ============================================
// PATTERN 2: In-Kernel Aggregation
// ============================================
// Aggregate in BPF maps to reduce user-space load
// Only send summaries, not individual events

struct latency_key {
    char comm[16];
    u8 bucket;  // Latency bucket (log2)
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, struct latency_key);
    __type(value, u64);  // Count
} latency_histogram SEC(".maps");

static __always_inline u8 log2_bucket(u64 value)
{
    // Returns 0-63, representing 2^N to 2^(N+1) range
    u8 bucket = 0;
    while (value > 1 && bucket < 63) {
        value >>= 1;
        bucket++;
    }
    return bucket;
}

SEC("kretprobe/vfs_read")
int trace_read_latency(struct pt_regs *ctx)
{
    u64 *start_ts, latency_ns;
    struct latency_key key = {};
    u64 *count;

    // Get start timestamp from map (set in kprobe)
    // ... (lookup and calculate latency)
    latency_ns = /* calculated */;

    bpf_get_current_comm(&key.comm, sizeof(key.comm));
    key.bucket = log2_bucket(latency_ns / 1000);  // µs buckets

    // Increment counter
    count = bpf_map_lookup_elem(&latency_histogram, &key);
    if (count) {
        __sync_fetch_and_add(count, 1);
    } else {
        u64 one = 1;
        bpf_map_update_elem(&latency_histogram, &key, &one, BPF_ANY);
    }
    return 0;
}

// ============================================
// PATTERN 3: Cgroup Filtering for Containers
// ============================================
// In containerized environments, filter events by cgroup

const volatile u64 target_cgroupid = 0;  // Set by user space

SEC("tracepoint/syscalls/sys_enter_openat")
int trace_container_opens(struct trace_event_raw_sys_enter *ctx)
{
    u64 cgid = bpf_get_current_cgroup_id();

    // Skip if not target container
    if (target_cgroupid && cgid != target_cgroupid)
        return 0;

    // Process event for this container
    // ...
    return 0;
}

Use bpftool prog to check program run counts and run times. Aim for <1% CPU overhead.

Facebook runs eBPF programs on millions of servers with negligible overhead. The key is efficient program design: aggregate in-kernel, filter early, sample when necessary, and use modern APIs like ring buffers. Well-designed eBPF observability adds <0.5% CPU overhead even on busy systems.
Debugging eBPF programs presents unique challenges: you can't use GDB, printf debugging has limitations, and verifier errors can be cryptic. Here are the essential debugging techniques.
Debugging Toolkit
| Tool/Technique | Purpose | When to Use |
|---|---|---|
| `bpf_printk()` | Kernel log output | Quick debugging |
| `bpftool prog show` | List loaded programs | Verify loading |
| `bpftool prog dump` | Disassemble programs | Understand JIT output |
| `bpftool map dump` | Inspect map contents | Verify data flow |
| Verifier output | Understand rejections | Fix verification errors |
| BTF (CO-RE) | Type-safe access | Portable programs |
# ============================================
# INSPECT LOADED PROGRAMS
# ============================================

# List all loaded BPF programs
bpftool prog show

# Example output:
# 42: kprobe  name trace_openat  tag 7a8e3f04b9b3d57a  gpl
#     loaded_at 2024-01-15T10:30:00+0000  uid 0
#     bytes_xlated 392  jited 224  memlock 4096B
#     map_ids 5,6

# Show detailed program info
bpftool prog show id 42 --pretty

# Disassemble BPF bytecode
bpftool prog dump xlated id 42

# Show JIT'd native code
bpftool prog dump jited id 42

# ============================================
# INSPECT MAPS
# ============================================

# List all BPF maps
bpftool map show

# Dump map contents
bpftool map dump id 5

# Dump in JSON format for parsing
bpftool map dump id 5 --json | jq

# Look up specific key
bpftool map lookup id 5 key 0x00 0x00 0x00 0x01

# ============================================
# READ DEBUG OUTPUT
# ============================================

# bpf_printk() writes to trace_pipe
# Run in separate terminal:
cat /sys/kernel/debug/tracing/trace_pipe

# Or use bpftool:
# (requires kernel 5.9+)
bpftool prog tracelog

# ============================================
# VERIFIER DEBUGGING
# ============================================

# Get verbose verifier output
# In code, use:
# LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_level = 1);

# Or set environment variable:
LIBBPF_LOG_LEVEL=debug ./my_loader

# The verifier will print something like:
# func#0 @0
# 0: (b7) r1 = 0
# 1: (63) *(u32 *)(r10 -4) = r1
# ...
# 12: (bf) r2 = r1
# R1 !read_ok   <-- Error: R1 might be NULL
# processed 12 insns (limit 1000000)

# ============================================
# CHECK ATTACHMENT STATUS
# ============================================

# List kprobes
cat /sys/kernel/debug/kprobes/list

# List tracepoints in use
cat /sys/kernel/debug/tracing/enabled_events

# BPF links
bpftool link show

Common Verifier Errors and Solutions
| Error Message | Cause | Solution |
|---|---|---|
| `R1 !read_ok` | Reading potentially NULL pointer | Add NULL check before access |
| `unbounded access` | Array index not bounded | Add bounds check `if (idx < MAX)` |
| `invalid mem access` | Wrong offset/type for context | Check context structure fields |
| `back-edge from insn X` | Unbounded loop detected | Add bounded loop or refactor |
| `bpf_xxx: unknown func` | Helper not available | Check kernel version, license |
| `variable stack access` | Stack access with non-const offset | Use constant array indices |
// ============================================
// ERROR: R1 !read_ok (potentially NULL pointer)
// ============================================

// BAD: Verifier doesn't know lookup can succeed
u64 *value = bpf_map_lookup_elem(&my_map, &key);
*value += 1;  // ERROR: value might be NULL

// GOOD: Always check map lookup results
u64 *value = bpf_map_lookup_elem(&my_map, &key);
if (value) {
    *value += 1;
}

// ============================================
// ERROR: unbounded memory access
// ============================================

// BAD: len could be any value
SEC("kprobe/vfs_read")
int bad_trace(struct pt_regs *ctx)
{
    size_t len = PT_REGS_PARM3(ctx);
    char buf[256];
    bpf_probe_read_kernel(buf, len, some_ptr);  // ERROR
}

// GOOD: Bound the length
SEC("kprobe/vfs_read")
int good_trace(struct pt_regs *ctx)
{
    size_t len = PT_REGS_PARM3(ctx);
    char buf[256];
    if (len > sizeof(buf))
        len = sizeof(buf);
    bpf_probe_read_kernel(buf, len, some_ptr);  // OK
}

// ALTERNATIVE: Use bitwise AND for power-of-2 sizes
size_t len = PT_REGS_PARM3(ctx) & (sizeof(buf) - 1);

// ============================================
// ERROR: back-edge from insn (loops)
// ============================================

// BAD: Unbounded loop
for (int i = 0; i < count; i++) {  // ERROR: count is variable
    // ...
}

// GOOD: Bounded loop (kernel 5.3+)
#pragma unroll
for (int i = 0; i < 16; i++) {  // OK: constant bound
    if (i >= count)
        break;
    // ...
}

// ALTERNATIVE: Use bpf_loop() helper (kernel 5.17+)
static int loop_callback(u32 index, void *ctx)
{
    // ... process item
    return 0;  // Return 0 to continue, 1 to break
}
bpf_loop(count, loop_callback, &my_ctx, 0);

Start simple: write a minimal program that attaches and prints one thing. Gradually add complexity. When the verifier rejects, read the full output carefully: it tells you exactly which instruction failed and why. Use bpf_printk() liberally during development, then remove for production.
We've explored eBPF's transformative tracing and observability capabilities. Let's consolidate the key concepts:
- kprobes and uprobes provide dynamic instrumentation but can break across kernel versions; tracepoints and USDT are stable interfaces
- bpftrace compiles one-liners into eBPF programs, making ad-hoc kernel analysis fast
- On-CPU flamegraphs show where CPU time goes; off-CPU flamegraphs show where time is spent blocked
- Production tools keep overhead low by aggregating in-kernel, filtering early, and streaming via ring buffers
- Debug with bpf_printk(), bpftool, and careful reading of verifier output
What's Next:
You now have the foundational knowledge to use eBPF for system observability, from ad-hoc debugging with bpftrace to production monitoring with efficient in-kernel aggregation. The next page explores networking use cases: how eBPF is equally transformative for packet processing, load balancing, and network security through technologies like XDP, TC, and socket-level eBPF.