Firecracker: Lightweight Virtualization for Serverless Applications
Agache, Brooker, Florescu, Iordache, Liguori, Neugebauer, Piwonka, Popa (2020)
What kind of paper is this?
- We saw a problem; we built a thing; let us tell you about our thing.
The Story (Fairytale)
- Serverless is a thing.
- There is a tension between strong isolation (between different clients
running on the same host) and performance (overhead of launching the
environment for the service). This is a real problem for providers.
- Firecracker, a new VMM, is specifically designed to resolve this
tension.
- Using Firecracker, you can have good performance and strong isolation.
Context Setting
- Serverless is attractive from a systems management perspective
- Containers are popular for similar reasons
- When you support multiple workloads, you have to worry about isolation,
both for security and performance reasons. (This is what public clouds do.)
- You can pack more containers than VMs onto a HW platform, though, so at
serverless scale (lots of users) isolating every user with a conventional
VM is not practical.
- If you are willing to go all Linux all the time, you might be able to
cobble something together using Linux: cgroups, namespaces, and seccomp-bpf
with containers, but Amazon didn't want to do that.
- So, they chose to keep KVM, toss QEMU, and build their own VMM.
- Advertisement for Firecracker:
  - Memory: Under 5 MB/container (VM?).
  - Boot time: Under 125 ms.
  - Creation: Up to 150 MicroVMs per second per host.
  - In production for two years.
- And the anti-advertisement (what it doesn't do):
  - No BIOS
  - Cannot boot 'arbitrary kernels' (I want to know what it can boot -- seems
    not Windows)
  - Does not emulate legacy devices or PCI
  - No VM migration
  - No orchestration (this replaces QEMU, not Docker)
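A quick back-of-the-envelope check on the advertised numbers above. This is only a sketch: it treats the quoted figures as upper bounds and applies Little's law (average concurrency = arrival rate x time in system); the function names and the fleet size of 1000 microVMs are made up for illustration.

```python
# Figures quoted in the notes, treated as upper bounds.
BOOT_TIME_S = 0.125        # "under 125 ms"
CREATION_RATE_PER_S = 150  # "up to 150 MicroVMs per second per host"
OVERHEAD_MB = 5            # "under 5 MB" of VMM overhead per microVM


def boots_in_flight(rate_per_s, boot_time_s):
    """Little's law: average number of microVMs that are mid-boot at once."""
    return rate_per_s * boot_time_s


def vmm_overhead_mb(n_microvms, overhead_mb=OVERHEAD_MB):
    """Upper bound on total VMM memory overhead, excluding guest memory."""
    return n_microvms * overhead_mb


print(boots_in_flight(CREATION_RATE_PER_S, BOOT_TIME_S))
print(vmm_overhead_mb(1000))  # hypothetical fleet of 1000 microVMs
```

So even at the peak creation rate only a couple dozen boots overlap on a host, and the VMM overhead for a thousand microVMs stays at a few GB.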
Lambda Focused
- Initial deployment: VMs separate customers; within a customer containers
separate functions.
- Dissatisfaction: required security/compatibility tradeoff and made
resource allocation more difficult.
- Requirements for a replacement:
  - Isolation
  - Compatibility (run Linux binaries)
  - Minimal overhead/maximal density
  - Bare metal performance
  - Fast switching (start and clean up functions quickly)
  - Soft allocation (overcommit resources)
A Digression on Linux containers
- cgroups: limit use of memory, CPU, etc
- namespaces: partition user IDs, pids, network interfaces
- seccomp-bpf: limit which system calls can be used
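To make the seccomp-bpf idea concrete, here is a toy model of default-deny allowlist filtering. It is not real seccomp: an actual filter is a BPF program that the kernel runs against syscall numbers and arguments, and the allowlist below is hypothetical, not any list Firecracker uses.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    KILL = "kill"


def make_filter(allowed_syscalls):
    """Return a checker that allows only the named syscalls (default deny)."""
    allowed = frozenset(allowed_syscalls)

    def check(syscall_name):
        # Anything not explicitly listed is rejected.
        return Verdict.ALLOW if syscall_name in allowed else Verdict.KILL

    return check


# Hypothetical allowlist, for illustration only.
check = make_filter(["read", "write", "exit"])
print(check("read"), check("execve"))
```

The point of the default-deny shape is that the attack surface is the allowlist itself: a syscall nobody thought about is blocked rather than exposed.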
Firecracker VMM
- Use Linux features when they work for Firecracker (huge operational
advantage in familiarity and maturity).
  - Block IO
  - Process scheduling
  - Memory management (between VMs)
  - Virtual network interfaces
- Start with Google's crosvm
  - Remove all the extraneous device drivers
  - Remove 9p
- So: firecracker = crosvm - half-of-crosvm + 20K loc + update(30K loc)
- Relies on a boatload of Rust crates shared by crosvm and Firecracker
- Interesting fast-start approach: start the firecracker process; configure
the microVM, but don't start/boot the MicroVM until you need it. Thus the
function startup time is short.
- Note that a firecracker VM acts more like a server (uses a REST API) than
a simple user process.
- Rate limiters seem to be a general mechanism used to control resource
utilization.
- Sandbox Firecracker with chroot, isolated pid and network namespaces,
dropping privileges, and using seccomp-bpf (24 syscalls; 30 ioctls)
to restrict the interface.
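The configure-now, boot-later flow and the per-device rate limiter can be sketched as a list of REST requests. This only builds the request bodies and sends nothing. The endpoint and field names follow my recollection of Firecracker's public API and should be checked against the API specification; the file paths, boot arguments, and the 50 MB/s limit are hypothetical.

```python
import json


def configure_requests(kernel_path, rootfs_path, vcpus=1, mem_mib=128,
                       disk_bytes_per_sec=None):
    """Requests that set up a microVM without booting it."""
    drive = {
        "drive_id": "rootfs",
        "path_on_host": rootfs_path,
        "is_root_device": True,
        "is_read_only": False,
    }
    if disk_bytes_per_sec is not None:
        # Token bucket: `size` bytes, refilled over `refill_time` milliseconds.
        drive["rate_limiter"] = {
            "bandwidth": {"size": disk_bytes_per_sec, "refill_time": 1000}
        }
    return [
        ("PUT", "/machine-config",
         {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel_path,
          "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("PUT", "/drives/rootfs", drive),
    ]


def start_request():
    """The single request issued later, when the function is actually needed."""
    return ("PUT", "/actions", {"action_type": "InstanceStart"})


# Hypothetical paths; everything before the last request can be done ahead of time.
plan = configure_requests("/images/vmlinux", "/images/rootfs.ext4",
                          disk_bytes_per_sec=50_000_000)
for method, path, body in plan + [start_request()]:
    print(method, path, json.dumps(body))
```

Splitting the plan this way mirrors the fast-start trick: the expensive setup is paid before a request arrives, and only the final start action sits on the critical path.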
A Digression on side-channel Mitigation
- Disable SMT (hyper threading)
- Require kernel to enable mitigations: Kernel page-table isolation,
indirect branch prediction barriers, indirect branch restricted
speculation and cache flush mitigations against L1 Terminal Faults.
- Other settings: enable speculative store bypass mitigation, disable
swap and samepage merging, avoid sharing files, and use HW with
mitigations against rowhammer.
Eval
- Use the six requirements as the evaluation criteria.
- Baselines:
  - QEMU
  - Intel Cloud Hypervisor (took data)
- Boot Time:
  - Measurement interval: from the fork of the VMM process to the guest
    kernel's fork of init.
  - Interesting result: while CloudHV outperforms Firecracker in serial
    startups, in parallel, Firecracker does better. (Why??)
- Memory Overhead:
  - 32X smaller than QEMU
  - About 4X smaller than CloudHV
  - This is the huge economic win for Amazon
- IO:
  - These results are (IMO) pretty surprising (particularly Figure 8).
  - (BW) First, QEMU small write performance is way worse than bare metal,
    while the other tests are much closer - it's something about the per IO
    overhead; small reads are also significantly different.
  - (BW) Focusing on Firecracker -- it is comparable to CloudHV, but its
    standing relative to QEMU is inconsistent.
  - The latency results are far less interesting.
  - This seems like an area for improvement, unless they convince me that
    persistent IO is not important (which it might not be; it might be the
    case that when you care about persisting data, you just use some other
    service).
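One way to see why a fixed per-IO overhead would hurt small requests most is a simple cost model: time per IO = fixed overhead + bytes / bandwidth. This is my guess at the mechanism, not the paper's analysis, and every number below (bandwidth, overheads, IO sizes) is made up for illustration rather than taken from the paper's measurements.

```python
def throughput(io_bytes, per_io_overhead_s, bandwidth_bytes_per_s):
    """Bytes per second when each IO pays a fixed overhead plus transfer time."""
    transfer_s = io_bytes / bandwidth_bytes_per_s
    return io_bytes / (per_io_overhead_s + transfer_s)


# Hypothetical parameters, not measurements.
BANDWIDTH = 1e9            # device bandwidth in bytes/s
BARE_OVERHEAD_S = 10e-6    # per-IO cost on bare metal
VIRT_OVERHEAD_S = 100e-6   # per-IO cost through an emulated device


def relative_throughput(io_bytes):
    """Virtualized throughput as a fraction of bare-metal throughput."""
    virt = throughput(io_bytes, VIRT_OVERHEAD_S, BANDWIDTH)
    bare = throughput(io_bytes, BARE_OVERHEAD_S, BANDWIDTH)
    return virt / bare


for size in (4 * 1024, 128 * 1024):
    print(size, round(relative_throughput(size), 2))
```

Under this model the virtualized path loses far more on the small request than on the large one, because the transfer time of a large IO dilutes the fixed cost.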