LXDs: Towards Isolation of Kernel Subsystems
Narayanan, Balasubramanian, Jacobsen, Spall, Bauer, Quigley, Hussain, Younis, Shen, Bhattacharyya, Burtsev (2019)
What kind of paper is this?
- We saw a problem; we designed a solution; we evaluated that solution.
- I felt betrayed when (at the top of page 2) they admit that their technique
really doesn't work, for exactly the reasons they described above -- everything
is tightly coupled. So, this paper is really, "Yet another technique for
isolating device drivers." I'd better see it compared to all the other approaches
for protecting device drivers. And they do exactly two device drivers.
The Story
- Today's operating systems are monolithic (for better or worse).
- Operating systems have bugs, one bug can affect the entire kernel,
and bugs cause security problems.
- We have a new isolation mechanism to isolate parts of the kernel from one another. It's easier to do than use a microkernel (implied).
- We have fewer security vulnerabilities.
Comparisons to prior work
- Sawmill: Linux on L4 (microkernel). Main criticism: synchronous IPC.
- Nooks: isolate device drivers. Main criticism: sync IPC, static analysis
for interfaces (and some claim that IDL is better?!).
- OSKit: large kernel components that could be assembled (did not focus
on isolation at all). Main criticism: Too hard to maintain (true).
- Rump kernel: different problem -- building libOS for NetBSD. Claim that LXD
automates the decomposition, but I don't see that.
- User level device drivers. Criticisms: Too hard to build the user level
environment (but you only do that once), or virtualized kernel environment
is too expensive.
Architecture
- LXD is a loadable kernel module: glue code (from IDL) and libLXD (kernel
callbacks from device drivers).
- Uses VT-x for isolation (provides direct assignment of PCIe devices and
direct interrupt delivery).
- Use asynchronous cross-core communication when possible.
- And we have a small microkernel embedded in the monolithic OS! (L4-esque)
- Their IDL is for backward compatibility, but I don't see how it differs
from any other IDL.
- Use async threads for communication.
Their IDL
- Organized around modules, which have an interface.
- Generated stubs for the interface functions (just like any IDL ...).
- Generates a dispatch loop.
- Shadow all data structures in the driver and in the glue code (to
maintain the isolation barrier).
- Synchronize shadows on every(?) invocation.
- Generate cross domain calls (a hidden argument is just like the 'this'
parameter in other languages).
- This may be a somewhat more involved generation problem but I see absolutely
nothing novel in this IDL or its implementation.
Use Cases
- Network Device Drivers: dummy and the Intel 82599 10 Gbps Ethernet Driver
- IDL LoC: 64/153
- Develop IDL Specification for the PCI bus interface
- Single copy from user to LXD; then zero copy in the LXD.
- Multi-Queue Null Block Driver
- IDL LoC: 68
- Nothing else about this driver seems particularly noteworthy.
Eval
- Async Runtime (What should I take away from this???)
- create and teardown minimal async block: 36 cycles
- Switch between pair of async threads: 29 cycles (20 CPU instructions
of which 16 touch memory).
- Single blocking ASYNC: 124 cycles
- Four blocking ASYNC: 374 (93.5 per block)
- Same-core v cross-core IPC: LXD v seL4 -- I have no idea what to make of
these numbers?! The numbers in the prose do not seem to match those in the
tables!
- Message Batching: batching helps.
- Dummy Device Driver: Example of a fast driver (iperf2)
- one kernel/driver crossing per packet: non-isolated: 956K IOPS; isolated: 730K (76%)
- two crossings per packet: adds 1794 cycles (to the 3009 from last experiment) -- that makes crossings look really expensive!
- Async communication induced by 4KB packets: async non-isolated 534 K IOPS;
sync non-isolated: 236 K (44%); async isolated: 341 K (64%)
- IXGBE driver (only real driver in the study): within 13% of the native
drivers on the send path; within 18% on the receive path.
- Multi-Queue Block Driver (fio) -- gets about 79% of the non-isolated performance.