Porter: Rethinking the Library OS from the Top Down
Porter, Boyd-Wickizer, Howell, Olinsky, Hunt (2011)
What kind of paper is this?
- This feels like another "best of both worlds" paper: we want the benefits of
VMs (strong isolation) with the flexibility and small footprint of a library OS.
- The idea itself is not really new, but perhaps this is a massive step forward
in its evolution?
- They actually run real Windows applications!
- They claim they redesigned a commercial OS as a library OS, which is amazing,
but did they really do the whole thing?
The Fairy Tale: Once upon a time, the folks from MIT proposed implementing
most of an operating system in a library that could be linked directly with applications.
Their emphasis was on providing direct access to hardware to improve performance.
This approach fell out of favor as virtual machine (VM) computing took over the world.
However, VMs are large and consume a lot of resources. Drawbridge takes the idea of
combining applications with OS APIs (but not the direct management of hardware) to
produce unikernel-like application packages that are much smaller than VMs but provide similar advantages:
isolation, migration, and packaging. They demonstrated this approach using a large
commercial OS (Windows 7) and convinced the world that if they adopted this approach,
everyone would live happily ever after.
The Story
- Claim: Library OSes went out of style with the rise of VMMs and VMs.
- Approach: Prioritize providing OS APIs in the library, reusing existing OS code.
- Avoid: Low-level management of the hardware in the library.
Architecture
- Hardware services: device drivers (Host OS)
- User services: GUI, shell, desktop, clipboard (Host OS)
- Application services: API implementation (Library OS)
- Interface from libOS to host/HW: an ABI implemented by the platform adaptation layer
and the security monitor (VMM-like: virtualizes host resources)
- Interface from libOS to user services: Remote Desktop (tunneled through the ABI)
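To make the layering concrete, here is a toy Python model, a sketch under loud assumptions: the class and method names (`HostOS`, `SecurityMonitor`, `PlatformAdaptationLayer`, `LibOS`, `stream_read`, `launch`) are hypothetical, not the real dkmon/dkpal interfaces, and the monitor's "policy" is just a path allowlist. It shows the shape of the design: rich APIs live in a per-application libOS, every host access funnels through a narrow ABI that the monitor can check, and state such as the registry is private to each instance.

```python
# Toy model of the Drawbridge layering. Every class and method name here is
# hypothetical; none of them is the real dkmon/dkpal interface.


class HostOS:
    """Stands in for the host kernel, which owns the real resources."""

    def __init__(self, files):
        self.files = dict(files)


class SecurityMonitor:
    """Mediates every ABI call; here it only enforces a path allowlist."""

    def __init__(self, host, allowed_prefixes):
        self.host = host
        self.allowed = tuple(allowed_prefixes)

    def stream_read(self, path):
        if not path.startswith(self.allowed):
            raise PermissionError(f"monitor denied access to {path}")
        return self.host.files[path]


class PlatformAdaptationLayer:
    """Lives in the application's address space and forwards ABI calls."""

    def __init__(self, monitor):
        self.monitor = monitor

    def stream_read(self, path):
        return self.monitor.stream_read(path)


class LibOS:
    """Per-application library OS with its own private registry."""

    def __init__(self, pal):
        self.pal = pal
        self.registry = {}  # not shared with any other LibOS instance

    def read_file(self, path):
        # A rich "application service" API implemented on the narrow ABI.
        return self.pal.stream_read(path)


def launch(host, prefix):
    """Build the full stack for one application confined to `prefix`."""
    return LibOS(PlatformAdaptationLayer(SecurityMonitor(host, [prefix])))


host = HostOS({"/apps/excel/book.xlsx": b"cells", "/apps/evil/payload": b"x"})
excel = launch(host, "/apps/excel/")
evil = launch(host, "/apps/evil/")

excel.registry["Software/Excel/Theme"] = "dark"
evil.registry.clear()  # "delete registry keys": only evil's own copy changes

print(excel.read_file("/apps/excel/book.xlsx"))  # b'cells'
print(excel.registry)  # {'Software/Excel/Theme': 'dark'}
try:
    evil.read_file("/apps/excel/book.xlsx")
except PermissionError as err:
    print(err)  # monitor denied access to /apps/excel/book.xlsx
```

Because each application gets its own stack, clearing the registry from the evil instance only touches its own copy, and its attempt to read Excel's file is refused by its own monitor.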
Windows 7
- Architecture
  - DLLs
  - services (daemons)
  - NT kernel
    - resource management
    - scheduling
    - IO services
    - hierarchical namespace
    - registry (KV store for configuration data)
  - Drivers (replaceable components)
- Win32: Used by pretty much all applications
  - Implemented as a collection of DLLs
  - Over 100,000 API functions!
  - Accesses the kernel via ntdll (think libc)
  - Tons of different pieces: kernel32, win32k, user32, ...
  - Also depends on some services: smss (init), etc.
Approach
- Cull the 100,000 APIs down to a "manageable" 14,000 most commonly used.
- Implement a thin NT kernel emulation layer (doesn't need to deal with
multiple users)
- For dependencies on service daemons and the Windows subsystem, try to remove the
dependency; if that's not possible, pull the code you need into the libOS.
(Basically, they could omit code designed to coordinate multiple applications.)
- Use simple devices in the libOS and tunnel their I/O through RDP to the real devices.
- Security monitor implements the Drawbridge ABI (connects the libOS with the host OS)
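The notes only say the 14,000 APIs kept were the "most commonly used"; how usage was measured isn't stated here. One plausible approach, sketched below purely as an assumption, is to count how many applications import each API and keep the top N. The import lists are made up; the API names are real Win32 functions used only as labels, and `cull_apis` is a hypothetical helper.

```python
from collections import Counter


def cull_apis(app_imports, keep):
    """Return the `keep` APIs imported by the most applications.

    Ties are broken alphabetically so the result is deterministic.
    """
    usage = Counter()
    for apis in app_imports.values():
        usage.update(set(apis))  # count each API at most once per application
    ranked = sorted(usage.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:keep]]


# Made-up import lists for three applications.
imports = {
    "excel": ["CreateFileW", "ReadFile", "RegOpenKeyExW", "CreateWindowExW"],
    "iexplore": ["CreateFileW", "ReadFile", "InternetOpenW", "CreateWindowExW"],
    "iis": ["CreateFileW", "ReadFile", "HttpReceiveHttpRequest"],
}

print(cull_apis(imports, keep=3))  # ['CreateFileW', 'ReadFile', 'CreateWindowExW']
```

Counting per application rather than per call site keeps one chatty program from dominating the ranking; a real culling effort would presumably also weigh dependencies between APIs, which this sketch ignores.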
Drawbridge ABI
- VM (3 functions)
- Threading (5 for threads; 7 for synchronization)
- IO Streams (9 calls for data; 3 for metadata)
- Processes (2)
- Misc (7: time, random, cache flush, reference counting)
- Implemented in two components: dkmon (security monitor) and dkpal (platform
adaptation layer)
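Tallying the per-category counts listed above gives the size of the whole ABI (the total is derived from these notes, not quoted from the paper). Putting it next to the other numbers in these notes shows how sharply each layer narrows the interface. A small Python sketch:

```python
# Call counts per category, as listed in these notes.
ABI_CALLS = {
    "virtual memory": 3,
    "threads": 5,
    "synchronization": 7,
    "I/O stream data": 9,
    "I/O stream metadata": 3,
    "processes": 2,
    "misc (time, random, cache flush, ref counting)": 7,
}

abi_total = sum(ABI_CALLS.values())

# Each layer of the port exposes a much narrower interface than the one above it.
funnel = [
    ("Win32 API functions (full)", 100_000),
    ("Win32 API functions kept", 14_000),
    ("NT kernel calls emulated", 150),
    ("Drawbridge ABI calls", abi_total),
]

for (name, count), (_, next_count) in zip(funnel, funnel[1:]):
    print(f"{name}: {count:,} (next layer is {count / next_count:.1f}x narrower)")
print(f"Drawbridge ABI calls: {abi_total}")  # Drawbridge ABI calls: 36
```

The last step is the interesting one: roughly four emulated NT calls per ABI call, so most of the NT emulation layer is glue rather than new mechanism.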
Process of Porting Windows 7 to libOS
- Bootstrap (Ugh!)
- Emulate the NT Kernel Interfaces
- 150 calls
- Either wrap a Drawbridge ABI call or return an error
- Deal with Shared System Services
- Process Serialization
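The "wrap Drawbridge or return an error" strategy amounts to a dispatch table. The sketch below assumes a Python stand-in for the ABI (`FakeAbi` and `virtual_memory_alloc`, both hypothetical) and a drastically simplified one-argument `NtAllocateVirtualMemory`; the real NT call takes many more parameters. `NtLoadDriver` is a real NT system call, picked as an example of something a libOS might plausibly refuse. The two `NTSTATUS` values are the standard Windows constants.

```python
# Hypothetical sketch of the "wrap the ABI or return an error" strategy.
# NTSTATUS values are the real Windows constants; everything else is made up,
# and the real NtAllocateVirtualMemory takes far more parameters than this.

STATUS_SUCCESS = 0x00000000
STATUS_NOT_IMPLEMENTED = 0xC0000002


class FakeAbi:
    """Stand-in for the Drawbridge ABI's virtual-memory calls."""

    def __init__(self):
        self.allocations = []

    def virtual_memory_alloc(self, size):
        self.allocations.append(size)
        return 0x10000 * len(self.allocations)  # fake 64 KiB-aligned base address


class NtEmulation:
    """Thin NT kernel emulation: a dispatch table from NT calls to ABI wrappers."""

    def __init__(self, abi):
        self.abi = abi
        self.table = {"NtAllocateVirtualMemory": self._allocate_virtual_memory}

    def _allocate_virtual_memory(self, size):
        return STATUS_SUCCESS, self.abi.virtual_memory_alloc(size)

    def syscall(self, name, *args):
        handler = self.table.get(name)
        if handler is None:
            # Calls with no sensible meaning inside a libOS simply fail.
            return STATUS_NOT_IMPLEMENTED, None
        return handler(*args)


nt = NtEmulation(FakeAbi())
status, base = nt.syscall("NtAllocateVirtualMemory", 4096)
print(hex(status), hex(base))  # 0x0 0x10000
status, _ = nt.syscall("NtLoadDriver", r"\Registry\Machine\System\CurrentControlSet\Services\demo")
print(hex(status))  # 0xc0000002
```

Returning a well-formed error code, rather than crashing, lets the unmodified Win32 DLLs above take their existing error paths.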
Evaluation
- Oh look how wonderful -- they outline precisely what they need to evaluate:
- libOS can run rich desktop applications
- refactoring a real OS is feasible
- libOS is suitable for isolating a bunch of applications
- libOS protects host
- libOS provides greater mobility
- libOS allows independent evolution
- Note: One tiny line piqued my curiosity and really made me think:
they had to run each application's setup on a desktop OS. My first note asked
why not run it on Drawbridge, and then I realized the problem: Drawbridge
is designed to run an application, but what do you do before you have an
application to run?
Could you have a "setup" Drawbridge application? I'm quite curious.
- Cost of refactoring: they give stats on the "cost" of bringing code into the
Drawbridge libOS
- 93 total binaries
- 62 with no change
- 12 are new implementations (claim most are tiny stubs)
- 19 modified
- But here is the "proof": of 5.6 MLoC, the port required fewer than 16 KLoC of
changes and 36 KLoC of new code.
- But two person-years still sounds like kind of a lot?
- Overheads
- Memory: Looks excellent
- Startup: Way better than a VM; a tad worse than native
- Repeated copies of Excel: Again, way better than Hyper-V; modest increase relative
to native (almost no increase for IE, IIS).
- Security: this is a tough thing to show -- they did a reasonable job
- Delete registry key: does not affect any other applications (because the malware
runs in its own Drawbridge process)
- Key logger: captures only keystrokes sent to the malware itself
- Keetch study (checks IE exploits): mitigates all five attack vectors
- Migration: compare snapshot size with Hyper-V's. For the base executable, the libOS
snapshot is about 2% of the size; as application data grows, the relative advantage
shrinks, because both approaches have to capture that data.
- Kernel evolution: Another super hard thing to evaluate, so they tried using
a different kernel with their libOS and it basically worked.
- Security patches: unsurprisingly (and perhaps unfairly), the libOS requires fewer
security patches. (Unfairly, because it does not implement everything in the base system.)
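Two of the evaluation claims can be checked with a little arithmetic. The sketch first redoes the refactoring numbers (the binary counts do add up to 93, and changed plus new code is under 1% of the code base), then models the migration claim under an assumed formula: if the libOS base snapshot is a fraction f (about 2%) of the VM's base snapshot B, and both must carry the same application data D, the size ratio is (fB + D)/(B + D), which approaches 1 as D grows. `snapshot_ratio` is a hypothetical helper, and the units and sample values are arbitrary.

```python
# Worked arithmetic for two evaluation claims in these notes.

# 1) Refactoring cost: binary counts and lines of code, as listed above.
binaries = {"unchanged": 62, "new": 12, "modified": 19}
total_binaries = sum(binaries.values())

total_loc = 5_600_000
changed_loc = 16_000  # "fewer than 16 KLoC", so this is an upper bound
new_loc = 36_000

changed_frac = changed_loc / total_loc
touched_frac = (changed_loc + new_loc) / total_loc
print(f"{total_binaries} binaries")  # 93 binaries
print(f"changed: {changed_frac:.2%}, changed + new: {touched_frac:.2%}")
# changed: 0.29%, changed + new: 0.93%


# 2) Migration: why the libOS advantage shrinks as application data grows.
def snapshot_ratio(vm_base, data, libos_fraction=0.02):
    """libOS snapshot size divided by VM snapshot size.

    Assumes the libOS base is `libos_fraction` of the VM base (about 2% per
    these notes) and that both snapshots must also carry the same `data`.
    """
    return (libos_fraction * vm_base + data) / (vm_base + data)


for data in (0, 100, 1_000, 10_000):  # arbitrary units, with vm_base = 1,000
    print(data, round(snapshot_ratio(1_000, data), 3))
```

With a VM base of 1,000 units, the ratio climbs from 0.02 with no data to about 0.91 when the data is ten times the VM base, which is the "larger data makes comparisons smaller" effect.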