Xen and the Art of Virtualization
Barham, Dragovic, Fraser, Hand, Harris, Ho, Neugebauer, Pratt, Warfield
(2003)
What kind of paper is this?
- Maybe a big idea? Or does it just describe a system?
- Different approach to virtualization: require "minimal"
porting effort to run Guest OSs.
- Why?
- Is it worth it?
Requirements
- Isolation
- Support lots of operating systems
- Low overhead
Design Points
- Support 100 VMs (what is a modern server?)
- OK to require porting OS
- Not OK to require porting applications (paravirtualization OK)
- VMs must provide performance isolation
- May need to expose some aspects of the virtualization
Why?
- Fundamentally the x86 is not very virtualizable! Requires trapping
to hypervisor for some instructions and maintaining shadow data
structures.
- Sometimes it's nice to have a view of both virtual resources and
real resources.
- Why do you suppose there is a point by point comparison?
- After reading this section, do you have a reasonable picture of the Denali environment?
- Do you like Denali? Do you want to read papers about it?
- As we read papers over the next couple of weeks, can you come up with
a name for the style of system represented by Denali?
Terminology
- Guest OS: an OS that Xen can host
- Domain: a running VM in which an OS runs
- Hypervisor: Xen
Paravirtualization
- The exported machine is a machine, but not a machine identical
to the actual hardware.
- Requires some porting effort for the guest OSs.
- Applications run unchanged
- There are advantages to being able to expose both the real and
virtual hardware (e.g., real time).
The Virtual Machine
- Memory Management
- Hardware managed TLB
- No tagged TLB
- Xen resides in the top 64MB of every address space (avoids flushing
the TLB on every entrance/exit of the hypervisor)
- Guest OSs responsible for managing hardware page tables
- Once allocated, page table updates are under the control of Xen (not the Guest OS)
- CPU
- Use Ring 1 for guest OSs to grant OS privilege over applications.
- Some privileged instructions fail silently.
- Most of the time, Guest OS handlers are the same as they are for
the real HW.
- Page faults all have to go to Xen so that Xen can leave the faulting
address somewhere other than the (protected) CR2 register.
- Device I/O
- Xen provides a set of clean, simple interfaces
- Lightweight events implement "interrupts" for the VMs
- This is perhaps the best part about paravirtualization! Device
drivers become really simple (because we can design really simple
virtual devices).
- Porting to the VM
- Fairly straightforward
- Linux easy
- XP trickier, but possible. More changes to architecture-independent
code because it ``uses a variety of structures and unions for accessing page
table entries.'' So there were a lot of manual (and scripted) changes.
- NetBSD still in progress.
- Sure wish they told us how much time it took and whether the work was
done by a developer experienced with the particular system.
Xen Architecture
- Communication between Xen and guests
- Hypercall: synchronous call (e.g., a trap) from Guest to Xen
- Asynchronous events (lightweight notification, interrupts) from Xen
to Guest OS.
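The asynchronous half of this interface can be sketched as a toy Python model. This is an illustration only, not Xen's actual interface: the class `DomainEvents` and all of its method names are hypothetical. It assumes pending events are kept in a per-domain bitmask, that Xen invokes a guest-supplied callback, and that the guest can hold events off with a mask flag.

```python
class DomainEvents:
    """Toy model of lightweight event delivery (all names are hypothetical)."""

    def __init__(self, handler):
        self.pending = 0        # bitmask of events not yet delivered
        self.masked = False     # guest-controlled "hold off" flag
        self.handler = handler  # guest callback, receives the pending bitmask

    def xen_notify(self, event_bit):
        # Xen side: record the event, then upcall unless the guest masked events.
        self.pending |= 1 << event_bit
        if not self.masked:
            self._deliver()

    def guest_mask(self):
        self.masked = True

    def guest_unmask(self):
        # Guest side: re-enable delivery and drain anything that accumulated.
        self.masked = False
        if self.pending:
            self._deliver()

    def _deliver(self):
        pending, self.pending = self.pending, 0
        self.handler(pending)
```

Events raised while masked are coalesced into one callback on unmask, which is roughly how a guest would defer "interrupts" during a critical section.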
- Data Transfer
- Goal: make efficient and avoid interference between different
virtual machines.
- Xen and the Guest OS share an I/O descriptor ring
- The guest allocates the buffers and refers to them from descriptors
in the ring.
- Xen and Guest act as producer/consumer around the ring: Guest
produces into request portion; Xen produces into response portion.
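The producer/consumer arrangement around the ring can be sketched as a small Python model. This is a toy under stated assumptions, not Xen's real data structure: the class `IORing`, its method names, the ring size, and in-order responses are all made up for the example. One circular buffer carries both requests (guest produces, Xen consumes) and responses (Xen produces, guest consumes), tracked by four counters.

```python
class IORing:
    """Toy shared descriptor ring (names, size, and ordering are hypothetical)."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        # Counters only ever increase; the slot index is counter % size.
        self.req_prod = 0   # advanced by the guest
        self.req_cons = 0   # advanced by Xen
        self.rsp_prod = 0   # advanced by Xen
        self.rsp_cons = 0   # advanced by the guest

    def guest_post_request(self, desc):
        # Full when every slot holds a request or an unconsumed response.
        if self.req_prod - self.rsp_cons >= self.size:
            return False
        self.slots[self.req_prod % self.size] = desc
        self.req_prod += 1
        return True

    def xen_take_request(self):
        if self.req_cons == self.req_prod:
            return None
        desc = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return desc

    def xen_post_response(self, desc):
        # A response reuses the slot of a request Xen has already consumed.
        if self.rsp_prod >= self.req_cons:
            raise RuntimeError("no consumed request to respond to")
        self.slots[self.rsp_prod % self.size] = desc
        self.rsp_prod += 1

    def guest_take_response(self):
        if self.rsp_cons == self.rsp_prod:
            return None
        desc = self.slots[self.rsp_cons % self.size]
        self.rsp_cons += 1
        return desc
```

Descriptors here are opaque Python objects; in the notes' terms they would refer to buffers the guest allocated, so no data is copied through the ring itself.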
- Virtualization Techniques
- CPU Scheduling
- Uses Borrowed Virtual Time
- Schedule in terms of virtual time
- Threads that need low-latency behavior can "borrow" against future
CPU allocations, by changing their effective virtual time.
- Schedule thread with earliest effective virtual time.
- Work conserving
- Low latency dispatch
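A heavily simplified sketch of the Borrowed Virtual Time idea, in Python. This is not Xen's scheduler: the `Domain` fields, `pick_next`, and `run` are hypothetical names, and the sketch omits details of the real algorithm such as limits on how long a domain may warp. It only shows the core rule: virtual time advances inversely to weight, warping subtracts from it, and the earliest effective virtual time runs next.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    weight: float           # larger weight: virtual time advances more slowly
    warp: float = 0.0       # how far the domain may "borrow" from its future
    warping: bool = False   # set when the domain needs low-latency dispatch
    avt: float = 0.0        # actual virtual time

    @property
    def evt(self):
        # Effective virtual time: warping makes the domain look "earlier".
        return self.avt - (self.warp if self.warping else 0.0)


def pick_next(domains):
    # Schedule the domain with the earliest effective virtual time.
    return min(domains, key=lambda d: d.evt)


def run(domain, real_ticks):
    # Charge the domain for CPU it used, scaled by its share.
    domain.avt += real_ticks / domain.weight
```

Because warping changes only the effective time, a domain that borrows still pays for the CPU it uses in `avt`, so its long-run share is unchanged.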
- Time
- realtime: cycle counter
- virtual time: ticks while a domain runs
- wallclock: offset added to real time
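The relationship between the three clocks can be written down as a tiny Python model. The class `DomainTime` and its field names are assumptions for illustration; nanosecond units are also an assumption.

```python
class DomainTime:
    """Toy model of the three notions of time (all names are hypothetical)."""

    def __init__(self, wallclock_offset_ns):
        self.real_ns = 0        # always advances
        self.virtual_ns = 0     # advances only while the domain is running
        self.wallclock_offset_ns = wallclock_offset_ns

    def advance(self, elapsed_ns, domain_running):
        self.real_ns += elapsed_ns
        if domain_running:
            self.virtual_ns += elapsed_ns

    @property
    def wallclock_ns(self):
        # Wall-clock time is real time plus an offset.
        return self.real_ns + self.wallclock_offset_ns
```

Keeping wall-clock time as an offset means it can be corrected without disturbing real time.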
- Virtual Memory
- Guest OSs have read-only access to page tables
- Guest OS page tables are used directly by the hardware (MMU)
- Xen is needed to update the tables
- Updates from guests can be delayed by Xen; usually
transparent, but sometimes guest OS needs to take action.
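The "Xen is needed to update the tables" rule can be sketched as a validation step in Python. This is a toy, not Xen's hypercall interface: `ToyXen`, `apply_updates`, and the two checks shown (frame ownership, and no writable mapping of a page-table frame) are simplifying assumptions chosen for the example.

```python
class ToyXen:
    """Toy validator for guest page-table updates (names are hypothetical)."""

    def __init__(self, frame_owner, page_table_frames):
        self.frame_owner = dict(frame_owner)            # frame -> domain id
        self.page_table_frames = set(page_table_frames)
        self.mappings = {}  # (domain, virtual page) -> (frame, writable)

    def apply_updates(self, domain, updates):
        """Apply a batch of (virtual page, frame, writable); return rejects."""
        rejected = []
        for vpage, frame, writable in updates:
            if self.frame_owner.get(frame) != domain:
                rejected.append((vpage, frame, writable))   # not our memory
            elif writable and frame in self.page_table_frames:
                rejected.append((vpage, frame, writable))   # tables stay read-only
            else:
                self.mappings[(domain, vpage)] = (frame, writable)
        return rejected
```

Taking a batch of updates per call reflects the point that entering the hypervisor has a cost worth amortizing.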
- Physical Memory
- Statically partitioned among domains
- Guest OSs can release some and ask for more
- Xen may give a domain discontiguous chunks; guest is
responsible for providing illusion of contiguity
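One way a guest could provide the illusion of contiguity is a translation table from its own contiguous frame numbers to the scattered machine frames it was given. The Python sketch below assumes that approach; `PhysToMachine` and its method names are hypothetical.

```python
class PhysToMachine:
    """Toy guest-side map from contiguous frame numbers to machine frames."""

    def __init__(self, machine_frames):
        # Index = guest "physical" frame number, value = real machine frame.
        self.p2m = list(machine_frames)
        self.m2p = {mfn: pfn for pfn, mfn in enumerate(self.p2m)}

    def to_machine(self, pfn):
        return self.p2m[pfn]

    def to_physical(self, mfn):
        return self.m2p[mfn]

    def grow(self, mfn):
        # A newly granted machine frame becomes the next contiguous frame.
        self.m2p[mfn] = len(self.p2m)
        self.p2m.append(mfn)
        return self.m2p[mfn]
```

The rest of the guest kernel can then work with frame numbers 0..N-1 and translate only at the point where a real machine address is needed.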
- Network
- Domains have their own virtual interfaces (VIF)
- Use I/O rings to interface to VIF
- Disk
- Guests use Virtual Block Drivers (VBD)
- Domain0 accesses actual disk.
- Also accessed via I/O rings
- VBD is a list of extents with ownership and ACL information
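Treating a VBD as an ordered list of extents suggests a simple address translation, sketched here in Python. The `Extent` fields and the `translate` function are hypothetical names, sector-granularity addressing is an assumption, and ownership/ACL checks are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    disk: str     # which real disk the extent lives on
    start: int    # first sector of the extent on that disk
    length: int   # number of sectors in the extent


def translate(extents, vbd_sector):
    """Map a sector of the virtual device to (disk, sector on that disk)."""
    if vbd_sector < 0:
        raise ValueError("negative sector")
    offset = vbd_sector
    for ext in extents:
        if offset < ext.length:
            return ext.disk, ext.start + offset
        offset -= ext.length
    raise ValueError("sector beyond end of virtual block device")
```

For example, with extents `[Extent("sda", 100, 10), Extent("sdb", 0, 5)]`, virtual sector 10 is the first sector of the second extent.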
Evaluation
- Structure
- Other virtualization techniques
- Virtualized versus native
- Performance isolation
- What are your thoughts about the systems against which they couldn't
compare?
- Comparison systems
- Xen + XenoLinux
- VMware Workstation on Linux, running Linux
- User mode Linux (UML)
- Native Linux
- Benchmarks
- SpecINT: sanity check -- should be pretty comparable
- Linux build: standard OS benchmark
- DB benchmarks: ones that challenge other virtualization approaches
- dbench: exercise file system
- SpecWeb99: exercise network
- Summary: Xen does quite well (typically around 95% of native Linux);
not all virtualization techniques do.
- Performance relative to Linux
- SpecINT: High-order bit is that Xen looks pretty great.
- Linux build: Good: better than 95%.
- DB benchmarks: above 90%; others are terrible
- dbench: above 95%; others are terrible
- SpecWeb99: multiple apps or OSs. Numbers look great (other techniques
look awful)
- lmbench: Xen looks excellent
- Postgres numbers cool: multiple domains take advantage of second
processor; not too much overhead going to additional domains.
- Isolation: nice experiments; great results
- Scalability: again, results are very encouraging -- minimal
degradation
Note
- I particularly liked, "...presumably due to a fortuitous cache alignment
in XenoLinux, hence underlining the dangers of taking microbenchmarks
too seriously."