eBPF for Observability: The Linux Kernel's Superpower

January 25, 2021

eBPF (extended Berkeley Packet Filter) is transforming how we observe and secure systems. It allows running sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. The implications for observability are profound.

Here’s how eBPF is changing the game.

What Is eBPF?

The Concept

┌─────────────────────────────────────────────────────────────────┐
│                         User Space                               │
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │   App 1     │    │   App 2     │    │  eBPF Tool  │         │
│  └─────────────┘    └─────────────┘    └──────┬──────┘         │
│                                               │                  │
├───────────────────────────────────────────────┼──────────────────┤
│                         Kernel                 │                  │
│                                               │                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    eBPF Programs                         │   │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐   │   │
│  │  │ Tracing │  │ Network │  │Security │  │   XDP   │   │   │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘   │   │
│  └─────────────────────────────────────────────────────────┘   │
│                              │                                   │
│  ┌───────────────────────────┼───────────────────────────────┐  │
│  │                    Kernel Functions                        │  │
│  │  syscalls, network stack, scheduler, filesystems, etc.    │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Why It Matters

Traditional observability:

- Instrument application code or deploy language-specific agents
- Each service exposes only the metrics someone thought to add in advance
- New questions mean code changes, rebuilds, and redeploys
- Kernel-level behavior (syscalls, scheduling, the network stack) stays largely invisible

eBPF observability:

- Attach programs to kernel hooks on a live system, with no application changes
- Sees every process, container, and connection on the host
- New questions can be answered at runtime, in minutes
- The verifier guarantees attached programs cannot crash or hang the kernel

Use Cases

System Tracing

# BCC tool: trace file opens
from bcc import BPF

program = """
#include <uapi/linux/ptrace.h>

// kernel signature: do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
int trace_open(struct pt_regs *ctx, int dfd, const char __user *filename, int flags) {
    char fname[64] = {};
    bpf_probe_read_str(&fname, sizeof(fname), filename);
    bpf_trace_printk("open called: %s\\n", fname);
    return 0;
}
"""

b = BPF(text=program)
b.attach_kprobe(event="do_sys_open", fn_name="trace_open")
b.trace_print()
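
The bpf_trace_printk output above streams through the kernel's shared trace pipe (/sys/kernel/debug/tracing/trace_pipe), which is handy for experiments; production tools send events to user space through per-CPU perf buffers instead.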

Network Observability

// XDP program for packet counting (libbpf style)
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// One-slot array holding the packet counter, readable from user space
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} packet_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&packet_count, &key);
    if (count)
        __sync_fetch_and_add(count, 1);  // atomic: runs once per packet
    return XDP_PASS;                     // let the packet continue up the stack
}
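
To run this, a common workflow (file and interface names here are placeholders) is to compile with clang -O2 -target bpf -c xdp_count.c -o xdp_count.o, attach with ip link set dev eth0 xdp obj xdp_count.o sec xdp, and read the counter with bpftool map dump name packet_count.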

Continuous Profiling

CPU profiling without application changes:

# Profile all CPUs at 99 Hz for 30 seconds, folded output for flame graphs
profile -F 99 -f 30 > out.folded
flamegraph.pl out.folded > flamegraph.svg   # flamegraph.pl from Brendan Gregg's FlameGraph repo

# Profile a specific process for 30 seconds
profile -p $(pgrep myapp) -F 99 30 > myapp_profile.txt

Container Observability

// Track container network connections (libbpf CO-RE style)
#include "vmlinux.h"   // generated with: bpftool btf dump file /sys/kernel/btf/vmlinux format c
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>

struct connection_t {
    __u32 pid;
    __u64 cgroup_id;   // identifies the container's cgroup
    __u32 daddr;       // destination IPv4 address, network byte order
    __u16 dport;       // destination port, network byte order
};

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");

SEC("kprobe/tcp_connect")
int BPF_KPROBE(trace_connect, struct sock *sk) {
    struct connection_t conn = {};

    conn.pid = bpf_get_current_pid_tgid() >> 32;  // upper 32 bits: process (tgid) id
    conn.cgroup_id = bpf_get_current_cgroup_id();
    BPF_CORE_READ_INTO(&conn.daddr, sk, __sk_common.skc_daddr);
    BPF_CORE_READ_INTO(&conn.dport, sk, __sk_common.skc_dport);

    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &conn, sizeof(conn));
    return 0;
}
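
A user-space agent (not shown) would poll the events perf buffer and resolve each cgroup_id to a container name via the container runtime; keeping that enrichment out of the kernel keeps the probe itself cheap.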

Tools and Frameworks

BCC (BPF Compiler Collection)

Python-friendly eBPF toolkit:

# Histogram of read() latencies
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);
BPF_HISTOGRAM(dist);

int trace_read_entry(struct pt_regs *ctx) {
    u64 ts = bpf_ktime_get_ns();
    u32 tid = bpf_get_current_pid_tgid();   // lower 32 bits: thread id
    start.update(&tid, &ts);
    return 0;
}

int trace_read_return(struct pt_regs *ctx) {
    u64 *tsp, delta;
    u32 tid = bpf_get_current_pid_tgid();
    tsp = start.lookup(&tid);
    if (tsp != 0) {
        delta = bpf_ktime_get_ns() - *tsp;
        dist.increment(bpf_log2l(delta / 1000));  // ns -> us
        start.delete(&tid);
    }
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_read", fn_name="trace_read_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_read_return")

print("Tracing vfs_read latency... hit Ctrl-C to print the histogram.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")

bpftrace

High-level tracing language:

# One-liner: syscall counts
bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'

# File I/O latency by process
bpftrace -e '
kprobe:vfs_read { @start[tid] = nsecs; }
kretprobe:vfs_read /@start[tid]/ {
    @us[comm] = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
'

# TCP connections by process
bpftrace -e '
kprobe:tcp_connect {
    @[comm] = count();
}
'

Cilium

eBPF-based Kubernetes networking:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
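
Cilium resolves these label selectors to workload identities and enforces the policy in its eBPF datapath rather than in per-pod iptables chains; the same datapath feeds the Hubble flow data described below.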

Pixie

Automatic application observability:

# Install Pixie
px deploy

# Query application metrics without instrumentation
px run px/service_stats
px run px/http_data
px run px/dns_data

Production Observability

Continuous Profiling

# Parca / Pyroscope / Polar Signals
setup:
  - Deploy eBPF agent to nodes
  - Automatic profiling of all processes
  - No code changes required
  - Minimal overhead (<1% CPU)

benefits:
  - Always-on profiling
  - Historical analysis
  - Compare before/after deployments
  - Find performance regressions

Network Flow Monitoring

# Hubble (Cilium's observability layer)
observability:
  network_flows:
    - Source and destination pods
    - Protocol and port
    - DNS queries
    - HTTP requests (L7)

  security_events:
    - Dropped packets
    - Policy violations
    - Connection attempts

  performance:
    - Latency histograms
    - Throughput metrics
    - Retransmission rates
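
On the command line, hubble observe streams these flows live and can filter by pod, namespace, or verdict.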

Security Monitoring

# Falco / Tetragon for runtime security
events_captured:
  - Process execution
  - File access
  - Network connections
  - System calls
  - Container escapes

response_options:
  - Alert
  - Log
  - Kill process
  - Block network

Performance Considerations

Overhead

ebpf_overhead:
  well_designed:
    - Sub-microsecond per event
    - Near-zero when not triggered
    - Scales with event rate

  poorly_designed:
    - Excessive map lookups
    - Complex computations in kernel
    - Too many attached probes

best_practices:
  - Filter early, in the eBPF program, not in user space (see the sketch below)
  - Use appropriate map types
  - Batch data transfer to user space
  - Monitor eBPF program overhead
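
To make "filter early" concrete, here is a minimal BCC sketch; the target PID and the choice of the execve probe are illustrative assumptions. The PID check runs inside the kernel program, so events from every other process are discarded before they ever cross into user space.

from bcc import BPF

TARGET_PID = 1234  # assumption: the one process we care about

program = """
int trace_exec(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    if (pid != %d)          // filtered in-kernel: no event is emitted at all
        return 0;
    bpf_trace_printk("execve in target process\\n");
    return 0;
}
""" % TARGET_PID

b = BPF(text=program)
# attach to the execve syscall entry point (symbol name resolved per kernel)
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_exec")
b.trace_print()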

Safety

verifier_guarantees:
  - No infinite loops
  - Bounded execution time
  - Memory safety
  - No kernel crashes

limitations:
  - Stack size limited (512 bytes)
  - Program size limited
  - Some kernel functions not callable
  - Verifier can reject valid programs
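
In practice a rejection surfaces as a load-time error with the verifier's reasoning in the log; restructuring the program (bounded loops, smaller stack buffers, simpler control flow) usually gets it accepted.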

Getting Started

Prerequisites

# Check kernel version (4.15+ for most features)
uname -r

# Install BCC tools
apt-get install bpfcc-tools linux-headers-$(uname -r)

# Or bpftrace
apt-get install bpftrace

Simple Examples

# List available tracepoints
bpftrace -l 'tracepoint:*'

# Count syscalls
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# BCC tools (Ubuntu packages install these as biolatency-bpfcc, etc.)
# Block I/O latency histogram
biolatency

# TCP connection tracing
tcpconnect

# File open tracing
opensnoop

Key Takeaways

eBPF changes what’s possible for observability: deep, whole-system visibility that once required kernel patches, custom modules, or invasive instrumentation is now safe, low-overhead, and routine.