The eXpress Data Path: Fast Programmable Packet Processing in the
Operating System Kernel
Høiland-Jørgensen, Brouer, Borkmann, Fastabend, Herbert, Ahern, Miller (2018)
What kind of paper is this?
- That's a tough question.
- It's not really a research paper.
- It's kind of a description and evaluation of a deployed system; in
some ways, it is perhaps similar to some of the Google papers (e.g.,
MapReduce, Bigtable, etc.).
- In other ways, it's not, because there seems to be little novelty:
it feels like a nice application of eBPF.
The Story
- Once upon a time, networks were slow. Operating systems had no difficulty
processing packets quickly enough to keep up with the rate at which they
were being transmitted. However, in an era of 40 Gbps and 100 Gbps networks,
this is no longer the case. Many intrepid researchers developed different
approaches to allow people to write tiny programs that execute on each
incoming packet. Many of these bypass the kernel, and while that saves
time, it also introduces additional engineering costs and possible
security risks. The Linux folks decided that they could instead write
small programs, load them into the kernel, and do packet processing
in-kernel using these little programs. They wrote these programs in a
language (eBPF) that could be checked to make sure that they didn't do
anything bad. These programs got the benefit of running in the kernel (no
context switch per incoming packet) as well as the benefit of not being
able to crash the kernel, so everyone lived happily ever after.
Best of Both Worlds
- Kernel bypass (e.g., DPDK) moves control of the network HW out of
the kernel into the application (what does this remind you of?).
- Downside of kernel bypass: we are processing packets in unprotected code.
- XDP instead moves application logic into the kernel.
- Upside: kernel is in control of the HW.
- Downside of downloading code: could corrupt the kernel.
- Solution: introduce a safe "virtual execution environment" in the kernel.
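To make "application logic in the kernel" concrete, here is a toy userspace model of a per-packet program: it takes the raw packet and returns a verdict. This is only a sketch of the shape. Real XDP programs are compiled to eBPF and run in the driver, not in Python; the names `Verdict` and `drop_udp` are made up for illustration, and the verdicts loosely mirror the real XDP_DROP and XDP_PASS actions.

```python
import struct
from enum import Enum


class Verdict(Enum):
    # Hypothetical names; they loosely mirror the XDP_DROP / XDP_PASS actions.
    DROP = "drop"
    PASS = "pass"


ETH_HEADER_LEN = 14       # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def drop_udp(packet: bytes) -> Verdict:
    """Toy per-packet program: drop IPv4/UDP, pass everything else."""
    # Check the length explicitly before reading the Ethernet header.
    if len(packet) < ETH_HEADER_LEN:
        return Verdict.PASS
    (ethertype,) = struct.unpack_from("!H", packet, 12)
    if ethertype != ETHERTYPE_IPV4:
        return Verdict.PASS
    # The IPv4 protocol field sits at byte 9 of the IP header.
    proto_offset = ETH_HEADER_LEN + 9
    if len(packet) <= proto_offset:
        return Verdict.PASS
    if packet[proto_offset] == IPPROTO_UDP:
        return Verdict.DROP
    return Verdict.PASS
```

Every read is preceded by an explicit length check. In this model that is just defensive style; in the real system it is what lets the in-kernel checker accept the program.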
Four Components
- XDP driver hook: called on every packet (main entry point)
  - Hook in the network device driver (runs in-kernel)
  - Uses tail calls to get modularity without growing the stack
  - The XDP program is given a context object when it is invoked
  - Can add metadata to a special area referenced by the context object
  - Can also modify the packet data
- eBPF virtual machine
  - Register-based VM
  - Eleven 64-bit registers
  - Code is JITed
- BPF maps
  - Hold all persistent data needed by XDP
  - Maps can be global or per-CPU
- eBPF verifier
  - Static analysis checks for: no unsafe actions, no loops, program
    size limited
  - Builds a control-flow graph, which must be a DAG
  - Walks all paths checking for safe memory accesses and that helper
    functions are called with proper parameters
  - Ensures that the program does its own bounds checking
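The "no loops, size limited" part of the verifier can be sketched as a graph check. The toy below is not the kernel's verifier: it assumes a program is handed over as a dict mapping each instruction index to its possible successors, and the function names and the `max_insns` default are illustrative choices. It accepts a program only if it is small enough, every jump lands on a real instruction, and the control-flow graph has no back edge (i.e., it is a DAG).

```python
from typing import Dict, List, Tuple

Cfg = Dict[int, List[int]]  # instruction index -> successor indices


def has_back_edge(cfg: Cfg, entry: int = 0) -> bool:
    """Iterative DFS; reaching a node that is still on the stack means a loop."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in cfg}
    color[entry] = GREY
    stack = [(entry, iter(cfg[entry]))]
    while stack:
        node, succs = stack[-1]
        for nxt in succs:
            if color[nxt] == GREY:
                return True                  # back edge: loop
            if color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(cfg[nxt])))
                break                        # descend into nxt first
        else:
            color[node] = BLACK              # all successors explored
            stack.pop()
    return False


def verify(cfg: Cfg, max_insns: int = 4096) -> Tuple[bool, str]:
    """Toy acceptance check; max_insns is an illustrative limit."""
    if 0 not in cfg:
        return False, "no entry instruction"
    if len(cfg) > max_insns:
        return False, "program too large"
    for succs in cfg.values():
        if any(s not in cfg for s in succs):
            return False, "jump out of range"
    if has_back_edge(cfg):
        return False, "loop detected"
    return True, "ok"
```

A branch that rejoins (a diamond) is accepted, while a jump back to an earlier instruction is rejected. The second pass in the notes, walking every path to check memory accesses and helper arguments, is deliberately left out of this sketch.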
Performance Evaluation
- Baseline is Linux; best case is DPDK (user-mode direct access)
- Evaluate three metrics:
  - Packet drop
  - CPU utilization
  - Raw packet forwarding performance
- Packet Drop: XDP comes in closer to DPDK than to Linux
- CPU Usage: DPDK busy-polls, so XDP is far less CPU intensive than DPDK
(and much less CPU intensive than Linux as well).
- Packet forwarding: XDP does better than both Linux and DPDK by a fair
amount at core counts above three.
- Also demonstrate some use cases: load balancing, software routing,
DoS mitigation
- Nice, straightforward eval.