Adding Generic Process Containers to the Linux Kernel
Menage (2007)
What kind of paper is this?
- It's really a proposal (and a bit of background)
The Story
- Once upon a time, Linux had cpusets, which allowed memory and
CPUs to be allocated to a set of processes.
However, this mechanism was insufficient to provide the full
resource management for a set of processes that many users
wanted.
Several groups proposed approaches to providing such resource
management, but the approaches were all slightly different,
yet overlapping.
Then Paul Menage outlined the requirements for a common
resource management framework and proposed a way to meet
them, so that if Linux incorporated it, all its users would
live happily ever after.
Background
- Definitions
- Resource control: track and/or limit resource consumption by
a set of processes. (Visible to processes)
- Namespace isolation: a layer of naming indirection that
constrains what is visible to a set of processes. (Invisible
to the processes)
- Container: set of processes being managed in this fashion.
- At the time of this writing, several different solutions had been proposed.
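The two definitions above can be contrasted with a small user-space C sketch. It is only an illustration under stated assumptions: the names pid_namespace_sketch, ns_lookup, res_counter_sketch, res_charge, and NS_MAX are hypothetical and do not come from the paper or the kernel. Namespace isolation is modeled as a lookup table that decides which names resolve at all; resource control is modeled as a counter that refuses charges beyond a limit.

```c
#include <assert.h>

#define NS_MAX 4   /* assumed capacity of one namespace table */

/* Namespace isolation: a per-container table that maps local names
 * (here, local pids) to global ones. Anything not in the table is
 * simply not visible from inside the container. */
struct pid_namespace_sketch {
    int count;
    int local[NS_MAX];
    int global[NS_MAX];
};

/* Returns the global pid for a local pid, or -1 if it is not visible. */
static int ns_lookup(const struct pid_namespace_sketch *ns, int local_pid)
{
    for (int i = 0; i < ns->count; i++)
        if (ns->local[i] == local_pid)
            return ns->global[i];
    return -1;
}

/* Resource control: a usage counter with a limit. */
struct res_counter_sketch {
    long usage;
    long limit;
};

/* Returns 0 if the charge fits under the limit, -1 if it is refused. */
static int res_charge(struct res_counter_sketch *rc, long amount)
{
    assert(amount >= 0);
    if (rc->usage + amount > rc->limit)
        return -1;          /* the process sees the failure */
    rc->usage += amount;
    return 0;
}
```

In this sketch a refused charge is something the process observes directly, whereas a failed lookup just looks like a name that does not exist, which matches the visible/invisible distinction in the definitions.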
Requirements
- Support for multiple independent containers
- Processes should be able to move into and out of containers
- A process cannot move itself; only a privileged process can move it
- UI Extensibility -- customized interfaces for different
subsystems.
- Some UI consistency:
All containers should support at least some of the following
(this is a suggestion):
- A file system (specify what files can be read/written)
- A property API (name/value pairs)
- Nesting
- Allow processes to be in multiple containers (of different
types) so they can be part of different hierarchies -- that is,
a process can belong to one CPU hierarchy and a different memory
hierarchy
- Allocate things other than processes to containers, e.g.,
pages, sockets, etc.
- Low overhead
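Several of these requirements (nesting, membership in multiple hierarchies, and moves performed only by a privileged process) can be sketched together in user-space C. This is a minimal illustration, not the proposed kernel interface: the types cont and proc, the functions move_task and is_descendant, and the two-hierarchy enum are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { HIER_CPU, HIER_MEM, NUM_HIER };  /* assumed: two independent hierarchies */

/* Hypothetical container: belongs to one hierarchy and may be nested. */
struct cont {
    const char *name;
    int hierarchy;          /* HIER_CPU or HIER_MEM */
    struct cont *parent;    /* NULL for the root of a hierarchy */
};

/* Hypothetical process: exactly one container per hierarchy. */
struct proc {
    int pid;
    struct cont *member[NUM_HIER];
};

/* Move target into dest. Only a privileged requester may do this.
 * Returns 0 on success, -1 if the request is refused. */
static int move_task(bool requester_privileged, struct proc *target,
                     struct cont *dest)
{
    assert(dest->hierarchy >= 0 && dest->hierarchy < NUM_HIER);
    if (!requester_privileged)
        return -1;
    /* Membership in the other hierarchy is left untouched. */
    target->member[dest->hierarchy] = dest;
    return 0;
}

/* True if c is anc itself or is nested somewhere below it. */
static bool is_descendant(const struct cont *c, const struct cont *anc)
{
    for (; c != NULL; c = c->parent)
        if (c == anc)
            return true;
    return false;
}
```

Keeping one membership slot per hierarchy is what lets a process sit in, say, a "batch" CPU container and a "small" memory container at the same time without the two trees having to share a shape.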
Proposal
- Extension of cpusets
- Multiple hierarchy approach
- New structures
- container: abstraction for a container -- no resource-specific
state; no list of tasks; just meta-data about the container.
- container_subsys: single resource controller -- contains the
callbacks invoked to do the actual resource accounting.
- container_subsys_state: base type from which subsystem state
objects are derived; interface between a specific subsystem and the
generic container system
- css_group: holds one container_subsys_state pointer for each
registered subsystem
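How the four structures fit together can be sketched in user-space C. This is a simplified illustration rather than the kernel's actual definitions: the field names, the MAX_SUBSYS constant, the charge callback, the task struct, the mem_state subsystem, and the task_css helper are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBSYS 2    /* assumed number of registered subsystems */

/* Generic container: metadata only, no resource state, no task list. */
struct container {
    const char *name;
    struct container *parent;   /* NULL for the root of a hierarchy */
};

/* Base type that each subsystem embeds in its per-container state. */
struct container_subsys_state {
    struct container *container;    /* container this state belongs to */
};

/* One resource controller: an id plus accounting callbacks.
 * The single "charge" callback here is hypothetical. */
struct container_subsys {
    int subsys_id;
    const char *name;
    long (*charge)(struct container_subsys_state *css, long amount);
};

/* One state pointer per registered subsystem. */
struct css_group {
    struct container_subsys_state *subsys[MAX_SUBSYS];
};

/* Assumed for the sketch: a task reaches its state via a css_group. */
struct task {
    struct css_group *group;
};

/* Hypothetical memory subsystem state, derived by embedding the base
 * type as the first member so the cast in mem_charge is valid. */
struct mem_state {
    struct container_subsys_state css;
    long usage;
};

/* Adds amount to the container's usage and returns the new total. */
static long mem_charge(struct container_subsys_state *css, long amount)
{
    struct mem_state *m = (struct mem_state *)css;
    m->usage += amount;
    return m->usage;
}

/* Look up a task's state object for one subsystem. */
static struct container_subsys_state *task_css(const struct task *t,
                                               int subsys_id)
{
    assert(subsys_id >= 0 && subsys_id < MAX_SUBSYS);
    return t->group->subsys[subsys_id];
}
```

In this sketch the generic code only ever handles container_subsys_state pointers; the subsystem casts back to its own type inside its callbacks, which is one way to read "interface between a specific subsystem and the generic container system".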