A light introduction to eBPF


"super powers have finally come to Linux" -- Brendan Gregg

In this blog post, we are going to cover eBPF: what it is, why we need it, when to use it, and how to get started. Before that, however, we must cover some more fundamental concepts around the Linux Kernel and Linux-based operating systems.

Figure 1: Quote: eBPF is to the kernel what JavaScript is to the browser -- Source

Some Background Information

Most of you have probably heard of Linux. Linux is an operating system, and one of the most widely used ones: it runs on personal computers, in data centres and at the edge, and Android is based on it as well.

Generally, an Operating System manages the communication between the software and the hardware on your machine. We can logically divide a machine into the Kernel space and the user space. The Kernel space encompasses all the vital functions required to run processes on the hardware. These include memory management, process management, system calls, security and device drivers. The user space, on the other hand, is everything that the user can normally interact with, such as the files and applications on the machine. Usually, the Kernel and the work it does underneath those applications are invisible to the user. This is a good thing, since most users should not be able to change vital components of the Operating System.

Figure 2: User Space vs. Kernel Space

In some cases, one might want to interact directly with the processes in the Kernel space. Interacting with processes on a lower level, right as they happen, can provide more accurate information since there is no layer filtering or modifying events. Historically, to interact with the Kernel space, one would have to propose changes to the Linux Kernel itself or extend its functionality with Kernel Modules. Kernel Modules are code that can be dynamically loaded into the Kernel to extend its functionality without having to recompile the Kernel. However, Kernel Modules are difficult to use and maintain because they depend on Kernel internals and tend to break when the Kernel is updated. This makes implementing changes to the Kernel a difficult and lengthy process. A newer, more efficient and more secure way is to use eBPF.

Historically, the operating system has always been an ideal place to implement observability, security, and networking functionality due to the kernel’s privileged ability to oversee and control the entire system. Source

What is eBPF?

eBPF is a technology that makes it possible to run sandboxed programs inside the Linux Kernel. The acronym eBPF used to stand for extended Berkeley Packet Filter, but the technology has long outgrown packet filtering, so the acronym no longer describes it well. By now, eBPF is simply the name of the technology.

Think of a sandbox environment as a restricted test environment. When you run a program inside the Linux Kernel, you want to make sure that it cannot do any damage to anything outside of your sandbox environment. This is exactly what eBPF enables engineers to do: to interact with the Kernel in a secure, efficient and minimally invasive way.

Figure 3: eBPF overview diagram from the main website.

The eBPF site showcases the diagram above to highlight where eBPF is implemented and used. The diagram can be a bit misleading, so here is a simpler way to look at it. Most people never interact with the Kernel, only with the applications on top of the operating system. Similarly, most programmers work in higher-level languages. This is usually good enough for developing applications, but it makes interacting with application processes on a lower level difficult. When an engineer wants to gain more insight into networking and security, or monitor the behaviour of applications and their underlying processes, they have to interact with the Kernel. This can be done through eBPF programs, which are loaded into the Kernel as bytecode at runtime. This removes the need to make permanent changes to the Kernel or to recompile it.

The above diagram shows some of the main components of eBPF:

- Verifier: Checks the code before it is loaded into the Kernel to make sure that it is safe to run

- JIT (Just In Time compiler): Translates the verified eBPF bytecode into native machine code so that it runs efficiently inside the Kernel

- Maps: Used to exchange data between the user space and the kernel space

- Kernel Helper API: Functions provided by the Kernel that eBPF programs can call to interact with the system, for example to read the current time or to access maps

To ensure that eBPF programs cannot cause any damage within the Kernel, the verifier checks the code before it is loaded. The verifier follows every potential path that the program can take and simulates its behaviour to ensure it passes a set of requirements. For instance, the program must always run to completion, so loops without a provable upper bound are rejected. (Older Kernels did not allow loops at all; bounded loops have been accepted since Linux 5.3.)
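To make the idea of "following every path" more concrete, here is a minimal sketch in plain Python. It is a toy model and not how the real verifier is implemented: the instruction set (`nop`, `jmp`, `jcond`, `exit`), the `Insn` class and the `verify` function are all made up for illustration. The toy also rejects every loop outright, which is stricter than current Kernels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    """One instruction of a made-up, four-opcode instruction set."""
    op: str                       # "nop", "jmp", "jcond" or "exit"
    target: Optional[int] = None  # jump destination for "jmp" / "jcond"


def verify(program: List[Insn], max_insns: int = 64) -> Tuple[bool, str]:
    """Walk every possible path; reject loops and paths that never exit."""
    if not program:
        return False, "empty program"
    if len(program) > max_insns:
        return False, "program too large"

    def walk(pc: Optional[int], on_path: frozenset) -> Optional[str]:
        # Returns None if every path from pc reaches "exit", else an error.
        if pc is None or pc < 0 or pc >= len(program):
            return f"path leaves the program at {pc}"
        if pc in on_path:
            return f"loop detected at instruction {pc}"
        insn = program[pc]
        on_path = on_path | {pc}
        if insn.op == "exit":
            return None
        if insn.op == "nop":
            return walk(pc + 1, on_path)
        if insn.op == "jmp":
            return walk(insn.target, on_path)
        if insn.op == "jcond":
            # The jump may or may not be taken, so both branches are checked.
            return walk(insn.target, on_path) or walk(pc + 1, on_path)
        return f"unknown instruction {insn.op!r}"

    error = walk(0, frozenset())
    return error is None, error or "ok"


accepted = [Insn("jcond", 2), Insn("nop"), Insn("exit")]
rejected = [Insn("nop"), Insn("jmp", 0)]  # jumps back to the start forever

print(verify(accepted))  # (True, 'ok')
print(verify(rejected))  # (False, 'loop detected at instruction 0')
```

The real verifier does far more than this, such as tracking register and memory state, but the accept-or-reject decision before loading follows the same spirit.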

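Generally, eBPF programs are event-driven: each program is attached to a hook in the Kernel, such as a system call, a function entry or exit, a tracepoint or a network event. Once that event happens, the eBPF code gets triggered. The diagram below provides a simplified overview: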

Figure 4: Diagram that showcases how eBPF programs are accepted or rejected by the verifier -- Source
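The flow of "event happens, program runs, data ends up in a map that user space can read" can be sketched in plain Python. This is only a simulation to show the shape of the interaction: `ToyKernel` and its methods are hypothetical names and not a real eBPF API, and the event names are used purely as labels.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Program = Callable[[dict], None]


class ToyKernel:
    """Stand-in for the Kernel: holds hooks and maps (not a real API)."""

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Program]] = defaultdict(list)
        self.maps: Dict[str, Dict[str, int]] = {}

    def create_map(self, name: str) -> Dict[str, int]:
        # A map is shared state: programs write to it, user space reads it.
        self.maps[name] = defaultdict(int)
        return self.maps[name]

    def attach(self, event: str, program: Program) -> None:
        self.hooks[event].append(program)

    def fire(self, event: str, ctx: dict) -> None:
        # Every program attached to this event runs when the event happens.
        for program in self.hooks[event]:
            program(ctx)


kernel = ToyKernel()
exec_counts = kernel.create_map("exec_counts")


def count_exec(ctx: dict) -> None:
    """Plays the role of the eBPF program: count executions per command."""
    exec_counts[ctx["comm"]] += 1


kernel.attach("sys_enter_execve", count_exec)

# Simulated events; only the execve ones trigger the attached program.
for comm in ["bash", "ls", "bash"]:
    kernel.fire("sys_enter_execve", {"comm": comm})
kernel.fire("sys_enter_openat", {"comm": "cat"})

# "User space" reads the result out of the map.
print(dict(exec_counts))  # {'bash': 2, 'ls': 1}
```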

There are several toolchains that assist the development of eBPF programs. These toolchains usually allow us to work in higher-level languages such as C, Python and Go, and then do the hard work of compiling the code down to eBPF bytecode and loading it into the Kernel. Below is a list of toolchains:

  1. bpftrace, a high-level tracing language
  2. The eBPF Go library
  3. libbpf, a C/C++ library
  4. BCC, which enables users to write Python programs with eBPF code embedded in them
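As a first taste of what this looks like in practice, below is a minimal sketch using BCC. It assumes that BCC and its Python bindings are installed, that matching Kernel headers are available, and that the script is run as root. The names `PROGRAM`, `hello` and `main` are chosen for this example. The embedded C function is compiled, verified and loaded by BCC, and attached to the `execve` system call, so a line is printed whenever a new program is executed on the machine.

```python
# The eBPF program itself, written in restricted C and embedded as a string.
PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello from eBPF!\n");
    return 0;
}
"""


def main() -> None:
    # Imported here so the file can be read and imported without BCC installed.
    from bcc import BPF

    # Compiles the C code to eBPF bytecode and loads it into the Kernel.
    b = BPF(text=PROGRAM)

    # Attach the program to the Kernel function behind the execve system call.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="hello")

    # Print the Kernel trace output until interrupted with Ctrl-C.
    b.trace_print()
```

To try it, call `main()` from a script run as root and start a few commands in another terminal. If the program violated the verifier's rules, loading would fail at the `BPF(text=PROGRAM)` step rather than at runtime.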

Use Cases and Applications

eBPF has a growing landscape of applications; most of them are listed directly on the eBPF site. These applications fit into the following categories:

  1. Networking
  2. Security
  3. Observability
  4. And tools to aid the development of eBPF programs

Additionally, eBPF can be used for debugging processes.

eBPF has become very popular in the cloud native space in recent years.

The graphic below shows cloud native tools that use eBPF. It is taken from the eBPF Day presentation by Thomas Graf from Isovalent.

Figure 5: List of cloud native projects that are using eBPF -- Source

Containers and Container Orchestration Systems, such as Kubernetes, are very complex systems. Many of their processes are not visible to users but happen in the background on the underlying machine. eBPF makes it possible to observe parts of the system that most cluster admins and security professionals would otherwise have no visibility into. This is particularly important for security and observability related use cases.

Furthermore, eBPF can enable more efficient networking, whereby networking rules are implemented in eBPF programs rather than by modifying iptables rules. In addition, some of the logic in Kubernetes sidecars can be replaced with eBPF programs. Sidecars are containers that run alongside your application and usually have a specific responsibility.

This is just a very high-level overview of the use cases and benefits that eBPF provides in the cloud native space.

What’s next?

In this blog post, we provided an overview of eBPF, including some background information and an explanation of how eBPF works and why it is so powerful.

Thank you for reading. If you enjoy this content, consider subscribing to my weekly DevOps newsletter 🥰

Shout out to Shubham and Jose for reviewing the blog post.