Twizzler: a Data-Centric OS for Non-Volatile Memory
Bittman, Alvar, Mehra, Long, Miller (2020)
What kind of paper is this?
- Big idea: build a new OS around NVM.
- Also a "new HW drives new SW" paper.
Bold Vision
- Get kernel out of IO path (i.e., kernel bypass, a la Arrakis)
- Memory-style access to persistent data
- Long term sharing between applications and between runs of applications
(vaguely reminiscent of Faasm).
- UNIX vision in the context of NVM.
- "And, oh by the way, we're fast."
The Story
- NVM offers the promise of persistence at DRAM performance.
- Fully leveraging this technology requires redesigning the whole stack.
- Make objects the centerpiece of the system.
- Pointers are relative to an object (not an address space).
- Such a system is easier to program, more flexible, and allows for better performance.
What is a Data-centric OS?
- No kernel involvement for persistence (even more extreme than kernel bypass).
- Pointers last forever -- they cannot be virtual addresses, tied to an address space.
- Architectural implications
- Use MMU for isolation and translation, but data structures are not referenced
by virtual addresses.
- Replace processes with security contexts.
Twizzler Design
- Abstractions
- Threads
- Address spaces
- Persistent Objects
- Security Contexts
- Execution of programs sounds a lot like a process: a number of threads
executing in an address space.
- Persistent objects are mapped, on demand, into the address space.
- Views are how programs map objects (but are not a main abstraction?)
- Security contexts define a thread's rights on objects
- Twizzler is like an exokernel; expects a library OS (libtwz). Libtwz:
- Manages a program's mappings to objects
- "deals with" persistent pointers
- twix is a POSIX layer
- musl (a small libc) maps libc calls to twix (all in userspace)
Objects
- 128-bit UID
- Contiguous in both physical(?) and virtual memory
- 4 KB - 1 GB in size
- Unit of access control (via MMU)
- Support references between objects
- Kernel object services
- Creation/deletion
- Object copy (does COW)
- Naming
- Reference counting
Virtual Address Management
- Map objects into an address space (via libtwz).
- Loading/mapping executables does not require kernel involvement.
- A view is an object that defines the layout of an address space.
- Standard access controls apply to views (as they are just objects).
- The kernel maps objects -- the view is like a page table.
- Threads can change views (via set_view system call) or invalidate views.
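To make the "view is like a page table" point concrete, here is a minimal sketch, not Twizzler's actual data layout. The names View, map_object, translate and SLOT_SIZE are hypothetical, and the fixed 1 GB slot size is an assumption taken from the maximum object size in these notes. Userspace only edits the view; the lookup in translate stands in for what the kernel would do when it consults the view.

```python
from dataclasses import dataclass, field

# Assumption: the address space is carved into fixed slots, one per
# maximum-size (1 GB) object. This is illustrative, not Twizzler's layout.
SLOT_SIZE = 1 << 30


@dataclass
class View:
    """Hypothetical view: maps address-space slots to (object ID, perms)."""
    slots: dict = field(default_factory=dict)

    def map_object(self, slot: int, obj_id: int, perms: str) -> None:
        # Userspace (libtwz in the notes) just records the mapping in the view.
        self.slots[slot] = (obj_id, perms)

    def translate(self, vaddr: int):
        # Stand-in for the kernel consulting the view like a page table.
        slot, offset = divmod(vaddr, SLOT_SIZE)
        if slot not in self.slots:
            raise LookupError(f"no object mapped at slot {slot}")
        obj_id, perms = self.slots[slot]
        return obj_id, offset, perms


view = View()
view.map_object(3, obj_id=0xABCD, perms="rw")
obj_id, offset, perms = view.translate(3 * SLOT_SIZE + 0x40)
print(hex(obj_id), hex(offset), perms)
```

Because the view is ordinary data, switching address-space layouts amounts to pointing a thread at a different View, which is the role set_view plays in the notes.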
Persistent Pointers
- object UID + offset
- Indirect through a (per-object) foreign object table
(FOT) (this is a lot like linkage sections in Multics); FOT is an
array containing an object ID and some flags (RWX).
- FOT is at a known offset in the object.
- The actual cross-object pointer is simply a 64-bit value with an FOT
index and an offset.
- Multiple FOT entries can access the same object, but with different
permissions, and there is an atomic update for all FOT entries for a
specific object.
- FOT entries can use object IDs or names (which get bound in a lookup
table in the object) -- facilitates late binding.
- Persistent pointers are translated to virtual addresses (and back) using
ptr_lea and ptr_store.
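The pointer encoding above can be sketched as follows. This is an illustration under stated assumptions, not the libtwz implementation: the 30-bit offset field is inferred from the 1 GB maximum object size, treating FOT index 0 as "pointer into the same object" is an assumption, and encode_ptr, decode_ptr and resolve are hypothetical names (resolve only mimics the direction of ptr_lea, stopping at an object ID rather than a virtual address).

```python
from dataclasses import dataclass

OFFSET_BITS = 30                       # assumption: enough for a 1 GB object
OFFSET_MASK = (1 << OFFSET_BITS) - 1
MAX_INDEX = (1 << (64 - OFFSET_BITS)) - 1


@dataclass(frozen=True)
class FotEntry:
    target_id: int   # ID of the referenced object
    flags: str       # requested permissions, e.g. "r" or "rw"


def encode_ptr(fot_index: int, offset: int) -> int:
    """Pack an FOT index and an offset into one 64-bit value."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in the offset field")
    if not 0 <= fot_index <= MAX_INDEX:
        raise ValueError("FOT index does not fit in the index field")
    return (fot_index << OFFSET_BITS) | offset


def decode_ptr(ptr: int):
    """Split a 64-bit persistent pointer into (FOT index, offset)."""
    return ptr >> OFFSET_BITS, ptr & OFFSET_MASK


def resolve(ptr: int, self_id: int, fot: list):
    """Return (object ID, offset, flags) for a persistent pointer."""
    index, offset = decode_ptr(ptr)
    if index == 0:
        # Assumption: index 0 means the pointer stays inside this object.
        return self_id, offset, None
    if index >= len(fot):
        raise LookupError(f"no FOT entry {index}")
    entry = fot[index]
    return entry.target_id, offset, entry.flags


fot = [None, FotEntry(target_id=0x1234, flags="r")]   # slot 0 is unused
ptr = encode_ptr(1, 0x100)
print(resolve(ptr, self_id=0x99, fot=fot))
```

The indirection is what keeps pointers small and relocatable: the 64-bit value never contains the 128-bit ID, and retargeting a reference means editing one FOT entry rather than every pointer.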
Security and Access Control
- Permissions are checked on access, not mapping.
- Threads run in a security context, which specifies access rights for objects.
- Use virtualization extensions to map VM to object space (which has
access rights) which is then mapped to physical memory.
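A small software model of "checked on access, not mapping" may help. It is only a sketch: Twizzler enforces this in hardware through the MMU and virtualization extensions, whereas the SecurityContext and Thread classes here are hypothetical and do the check in ordinary code.

```python
class SecurityContext:
    """Hypothetical context: object ID -> permission string such as "rw"."""

    def __init__(self, rights):
        self.rights = dict(rights)

    def allows(self, obj_id, op):
        return op in self.rights.get(obj_id, "")


class Thread:
    def __init__(self, ctx):
        self.ctx = ctx
        self.mapped = set()

    def map(self, obj_id):
        # Mapping never consults the security context.
        self.mapped.add(obj_id)

    def access(self, obj_id, op):
        if obj_id not in self.mapped:
            raise LookupError("object is not mapped")
        # The rights of the *current* context are checked at access time.
        if not self.ctx.allows(obj_id, op):
            raise PermissionError(f"{op!r} denied on object {obj_id:#x}")
        return True


reader = SecurityContext({0xA: "r"})
writer = SecurityContext({0xA: "rw"})

t = Thread(reader)
t.map(0xA)                  # succeeds regardless of rights
print(t.access(0xA, "r"))   # allowed in the reader context
t.ctx = writer              # switch context, mapping unchanged
print(t.access(0xA, "w"))   # now allowed without remapping
```

The design consequence is that a mapping can outlive a change of rights: switching contexts changes what the same mapping permits.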
Eval
- Note: two of the three questions are not performance related.
- KV-store Case Study
- Multi-threaded, hash-table.
- Supports insert, lookup and delete.
- 250 lines of code.
- Data and indexes are stored in separate objects.
- Added per-key access control: values with the same access rights get
stored into the same object, so index now points to different objects.
(This means changing access is hard?)
- Red-Black Tree
- In-memory RB tree
- Initial (normal pointer version) was 100 lines of code.
- Manually wrote base+offset version: requires manually converting
pointers to/from persistent. [Why is this application specific?]
- Ported to Twizzler
- Growing the data structure is brittle in Unix: no cross-object pointers
and growing mappings had to be done explicitly.
- Porting SQLite:
Used an existing memory-mapped backend; port required minimal modifications.
- Performance: compared against NOVA, which has problems over time.
- YCSB (on SQLite): ~25%-2x better throughput.
- YCSB (on SQLite): Latency is comparable or better (PMDK is awful).
- twzkv vs. unixkv: comparable latency for lookup and insert.
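For the manual base+offset version mentioned in the Red-Black Tree case study, the following sketch shows what "manually converting pointers to/from persistent" looks like in practice. It uses a linked list rather than an RB tree to stay short, and a bytearray as a stand-in for a memory-mapped persistent region; Region, NODE and HEADER are hypothetical names and the record layout is an assumption.

```python
import struct

HEADER = struct.Struct("<QQ")   # (head offset, next free offset)
NODE = struct.Struct("<qQ")     # (value, offset of next node)
NULL = 0                        # offset 0 is the header, so it can mean "null"


class Region:
    """Hypothetical stand-in for a memory-mapped persistent region."""

    def __init__(self, buf=None, size=4096):
        if buf is None:
            buf = bytearray(size)
            HEADER.pack_into(buf, 0, NULL, HEADER.size)
        self.buf = buf

    def push(self, value):
        head, top = HEADER.unpack_from(self.buf, 0)
        if top + NODE.size > len(self.buf):
            raise MemoryError("region is full; growing must be done by hand")
        # Links are stored as offsets from the region base, never addresses.
        NODE.pack_into(self.buf, top, value, head)
        HEADER.pack_into(self.buf, 0, top, top + NODE.size)

    def values(self):
        out = []
        off, _ = HEADER.unpack_from(self.buf, 0)
        while off != NULL:
            # Every dereference converts offset -> location explicitly.
            value, off = NODE.unpack_from(self.buf, off)
            out.append(value)
        return out


r = Region()
for v in (1, 2, 3):
    r.push(v)

# Copying the bytes models remapping at a different base: offsets still work.
clone = Region(buf=bytearray(r.buf))
print(clone.values())
```

Every link traversal goes through an explicit offset conversion, and running out of space surfaces as an error the application must handle, which matches the brittleness the notes attribute to the Unix version.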