In this blog post, we are going to cover eBPF: what it is, why we need it, when to use it, and how to get started. Before that, however, we must cover some more fundamental concepts around the Linux Kernel and Linux-based operating systems.
Figure 1: Quote: eBPF is to the kernel what JavaScript is to the browser -- Source
Some Background Information
Most of you have probably heard of Linux. Linux is an operating system and one of the most widely used: it runs on personal computers, in data centres and at the edge, and Android is based on it as well.
Generally, an Operating System manages the communication between the software and the hardware on your machine. We can logically divide a machine into the Kernel Space and the User Space. The Kernel space encompasses all the vital functions required to run processes on the hardware. These include memory management, process management, system calls and security, and device drivers. The user space, on the other hand, is everything that the user can normally interact with, such as the files and applications on the machine. Usually, the Kernel and its internal processes are invisible to the user. This is a good thing since most users should not be able to change vital components of the Operating System.
Figure 2: User Space vs. Kernel Space
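To make the boundary a little more tangible, here is a minimal sketch in Python. It is an ordinary user space program, and the function name `roundtrip_through_kernel` is made up for this post. Every call into the `os` module below is a thin wrapper around a system call, so each one hands control to the Kernel and waits for the result.

```python
import os


def roundtrip_through_kernel(payload: bytes) -> bytes:
    """Send bytes through a Kernel-managed pipe and read them back."""
    # os.pipe() asks the Kernel to create a pipe and returns two file descriptors.
    read_fd, write_fd = os.pipe()
    try:
        # os.write() and os.read() wrap the write and read system calls.
        # The data is buffered by the Kernel in between.
        os.write(write_fd, payload)
        return os.read(read_fd, len(payload))
    finally:
        # Closing the descriptors is again a request to the Kernel.
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(roundtrip_through_kernel(b"hello from user space"))
    print("process id assigned by the Kernel:", os.getpid())
```

The program never touches the hardware or the Kernel's memory itself. It can only ask for services through system calls, which is exactly the separation shown in the figure.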
In some cases, one might want to interact directly with the processes in the Kernel space.
Interacting with processes on a lower level, right as they happen, can provide more accurate information since there is no layer filtering or modifying events. To interact with the Kernel space, historically, one would have to propose changes to the Linux Kernel itself or extend its functionality with Kernel Modules. Kernel Modules are code that can be dynamically loaded into the Kernel to extend its functionality without having to recompile the Kernel. However, Kernel modules are difficult to use and maintain as they tend to break with every update to the Kernel. This makes implementing changes to the Kernel a difficult and lengthy process. A newer, more efficient and more secure way is to use eBPF.
What is eBPF?
eBPF is a technology that makes it possible to run programs in a sandbox environment inside the Linux Kernel. The acronym eBPF used to stand for extended Berkeley Packet Filter, but that name no longer describes the technology, which now goes well beyond packet filtering. Thus, by now, eBPF is simply the name for the technology.
Think of a sandbox environment as a restricted test environment. When you run a program inside the Linux Kernel, you want to make sure that the processes you run cannot do any damage to other processes outside of your sandbox environment. This is exactly what eBPF enables engineers to do – to interact with the Kernel in a secure, efficient and least invasive way.
Figure 3: eBPF overview diagram from the main website.
The eBPF site itself showcases this diagram to highlight where eBPF would be implemented and used. However, the diagram itself can be a bit misleading. To put it more simply, most people never interact with the Kernel but rather with the applications on top of the operating system. Similarly, most programmers use higher-level languages. This is usually good enough for developing applications, but it makes interacting with application processes on a lower level difficult. When an engineer wants to gain more insight into networking and security, or to monitor the behaviour of applications and their underlying processes, they will have to interact with the Kernel. This can be done through eBPF programs. These allow you to load bytecode into the Kernel during runtime. This removes the need to make permanent changes to the Kernel directly or to recompile the Kernel to load the eBPF program.
The above diagram shows some of the main components of eBPF:
- Verifier: Checks the code before it is loaded into the Kernel to make sure that it is safe to run
- JIT (Just-In-Time compiler): Translates the verified eBPF bytecode into native machine code so that it runs efficiently inside the Kernel
- Maps: Used to exchange data between the user space and the kernel space
- Kernel Helper API: Functions provided by the Kernel that eBPF programs can call to interact with the rest of the system, for example to read the current time or to access maps
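To get a feeling for how maps fit in, here is a small toy model in Python. It is not real eBPF: the plain dictionary stands in for a map, and the hypothetical functions `count_event` and `top_events` play the roles of the program running in the Kernel and the tool running in user space. In a real setup the map lives in the Kernel and both sides reach it through dedicated APIs.

```python
def count_event(shared_map, event_name):
    """Kernel side of the toy model: called once per event, updates the map."""
    shared_map[event_name] = shared_map.get(event_name, 0) + 1


def top_events(shared_map, limit=3):
    """User space side of the toy model: reads the map and ranks the entries."""
    # Sort by count (highest first), then by name to keep the order stable.
    ranked = sorted(shared_map.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    # Assumption for this sketch: the events are system call names.
    syscall_counts = {}
    for name in ["openat", "read", "read", "write"]:
        count_event(syscall_counts, name)
    print(top_events(syscall_counts))
```

The important part is the division of labour: the side that sees the events only writes small updates into the map, and the side that presents results only reads from it.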
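To ensure that an eBPF program cannot cause any permanent damage within the Kernel, the verifier checks the code before it is loaded. The verifier follows every potential path that the program could take and simulates its behaviour to ensure it passes a set of requirements. For instance, an eBPF program must be guaranteed to run to completion, so loops that might never terminate are rejected. Early versions of the verifier rejected loops altogether, while newer Kernels accept loops that have a provable upper bound.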
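The idea of following every path before anything runs can be sketched with a toy verifier in Python. Everything here is an assumption made for illustration: the instruction names, the tuple format, the `verify` function and the size limit are invented for this post, and the real verifier checks far more, such as memory accesses and register contents. The sketch applies the strictest form of the loop rule and rejects every backward jump.

```python
KNOWN_OPS = {"mov", "add", "jmp", "jeq", "exit"}


def verify(program, max_insns=4096):
    """Walk every path of a toy program and return (ok, reason)."""
    if not program:
        return False, "empty program"
    if len(program) > max_insns:  # arbitrary size limit for this sketch
        return False, "program too large"

    pending = [0]   # instruction indexes that still need to be explored
    seen = set()    # indexes that were already checked
    while pending:
        pc = pending.pop()
        if pc in seen:
            continue
        seen.add(pc)
        if pc >= len(program):
            return False, "path runs past the last instruction without exit"
        kind = program[pc][0]
        if kind not in KNOWN_OPS:
            return False, f"unknown instruction at {pc}"
        if kind == "exit":
            continue  # this path ends cleanly
        if kind in ("jmp", "jeq"):
            offset = program[pc][1]  # relative to the next instruction
            if offset < 0:
                return False, f"backward jump at {pc} (possible loop)"
            target = pc + 1 + offset
            if target >= len(program):
                return False, f"jump out of range at {pc}"
            pending.append(target)
            if kind == "jeq":
                pending.append(pc + 1)  # conditional jump may also fall through
        else:
            pending.append(pc + 1)  # plain instruction, continue with the next
    return True, "ok"


if __name__ == "__main__":
    safe = [("mov",), ("jeq", 1), ("add",), ("exit",)]
    looping = [("add",), ("jmp", -2), ("exit",)]
    print(verify(safe))
    print(verify(looping))
```

Because only forward jumps are allowed, every path in an accepted program has to reach `exit` after a finite number of steps. That is the property the verifier is after: the program is known to terminate before it is ever loaded.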
Generally, eBPF programs will listen for specific events in the Kernel. Once that event happens, the eBPF code gets triggered. The diagram below provides a simplified overview: