seL4: Formal Verification of an OS Kernel

Klein, Elphinstone, Heiser, Andronick, Cock, Derrin, Elkaduwe, Engelhardt, Kolanski, Norrish, Sewell, Tuch, Winwood (2009)

Big deal: First real system that was verified.
"First formal proof of functional correctness of a complete general-purpose operating system kernel."

Once upon a time, people built operating systems without any way to know they were correct other than to run tests. However, operating systems are kind of critical, and people thought it might be nice to be able to prove they were correct. Unfortunately, everyone thought that formal verification methods simply wouldn't work on something the size and complexity of an operating system. The brave knights of Australia decided to give this a try. They developed a two-stage refinement approach: 1) from abstract specification to executable specification and 2) from executable specification to implementation (in C). After two long person-decades they emerged victorious from the quagmire -- with such a verified system, people could build secure operating systems and sleep well, living happily ever after (as long as the specs were correct ...).

Assumed to be correct: compiler, assembly code, boot code, cache management, and hardware
Definition of "kernel" = microkernel. (Sometimes we call the entire operating system the kernel, so it's important to get this clear.) -- the entire kernel is 8700 lines of C and 600 lines of assembler.
Definition of correctness: Implementation strictly follows the high-level abstract specification

3rd-generation microkernel
based on L4
Contains: virtual address spaces, threads, IPC, and capabilities (not from L4)
However, the kernel's VM has no kernel-defined structure -- that is implemented by user-level pagers.
Exceptions and non-native IPC also go to user-level servers to support virtualization.
Capabilities stored in capability container objects, CNodes, in capability address spaces.
Device drivers run at user level.
Kernel memory allocation is typed and explicit.
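
To make the "capabilities stored in CNodes" line concrete, here is a heavily simplified sketch of resolving a capability address through nested CNodes. It is an illustration only: the names (Cap, CNode, lookup) are hypothetical, real seL4 lookup also involves guard bits and refers to nested CNodes through capabilities, and none of this is taken from the kernel source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

@dataclass
class Cap:
    target: str        # name of the kernel object this capability refers to
    rights: frozenset  # e.g. frozenset({"send"})

@dataclass
class CNode:
    radix_bits: int    # the table has 2**radix_bits slots
    slots: Dict[int, Union["Cap", "CNode"]] = field(default_factory=dict)

def lookup(root: CNode, addr: int, depth: int) -> Optional[Cap]:
    """Resolve `depth` bits of `addr`, most significant bits first."""
    node = root
    remaining = depth
    while True:
        # Each level must consume at least one bit, so the loop terminates.
        if node.radix_bits <= 0 or node.radix_bits > remaining:
            return None
        remaining -= node.radix_bits
        index = (addr >> remaining) & ((1 << node.radix_bits) - 1)
        entry = node.slots.get(index)
        if entry is None:
            return None                      # empty slot
        if isinstance(entry, Cap):
            # Simplification: only accept a cap when all bits are consumed.
            return entry if remaining == 0 else None
        node = entry                         # descend into the nested CNode

# Two-level capability space: 4 bits select a slot at each level.
leaf = CNode(4, {3: Cap("endpoint_A", frozenset({"send"}))})
root = CNode(4, {1: leaf})
found = lookup(root, 0x13, 8)    # slot 1 in root, then slot 3 in leaf
missing = lookup(root, 0x14, 8)  # slot 4 in leaf is empty
```

The point of the structure is that a thread can only name objects reachable from its own root CNode, so access control falls out of the lookup itself.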

Select Haskell (subset) as an intermediate target that:
- is readily accessible to both OS developers and formal methods practitioners
- provides an artefact that can be automatically translated into the theorem proving tool and reasoned about
Write prototype kernel in Haskell (run it on simulated hardware). This can be automatically translated into an "Executable Specification."
Manually rewrite the kernel in C (allow for optimization).
Verification technique is machine-assisted and machine-checked proof, using Isabelle/HOL.
They are showing that the implementation is a refinement of an abstract specification. More precisely: the executable specification (translated automatically from the Haskell prototype) is a refinement of the abstract specification. The C implementation is a refinement of the Executable Specification.
So, the three parts are:
1. Abstract specification: specifies the external kernel interface, how system call arguments are encoded in binary, what each system call does (in abstract terms), and what happens on an interrupt or fault. Describes WHAT happens but not how it happens.
2. Executable specification: Contains all the data structures and implementation details we expect to have in the final C kernel.
3. C implementation. (That means there is a formal semantics for a large subset of C.)
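
A toy illustration of the gap between levels 1 and 2, using scheduling as the example. The abstract level only says which results are allowed (any runnable thread); the executable level commits to a data structure and a policy. This is a sketch in Python rather than Isabelle/HOL or Haskell, all names are hypothetical, and the policy shown (highest priority first, FIFO within a priority) is an assumption for illustration.

```python
from typing import Dict, List, Optional, Set

def abstract_schedule(runnable: Set[str]) -> Set[Optional[str]]:
    """WHAT: the set of acceptable choices. None stands for the idle thread."""
    return set(runnable) if runnable else {None}

def executable_schedule(queues: Dict[int, List[str]]) -> Optional[str]:
    """HOW: per-priority FIFO queues, scanned from highest priority down."""
    for prio in sorted(queues, reverse=True):
        if queues[prio]:
            return queues[prio][0]
    return None

def runnable_threads(queues: Dict[int, List[str]]) -> Set[str]:
    """Abstraction function: forget the queue structure, keep the thread set."""
    return {t for q in queues.values() for t in q}

queues = {0: ["logger"], 5: ["driver", "shell"], 9: []}
choice = executable_schedule(queues)
allowed = abstract_schedule(runnable_threads(queues))
choice_is_allowed = choice in allowed
```

The executable level makes one specific choice, and that choice is always among those the abstract level permits. That containment is the shape of the refinement claim, here checked on one example rather than proved.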
Basic proof structure
1. Preconditions
2. Statement(s) that modify state
3. Postconditions
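
The three-part structure above is a Hoare triple, {P} stmt {Q}. As a rough sketch, the snippet below tests a triple by brute force over a handful of states; the real proofs are done symbolically in Isabelle/HOL, and the statement and conditions here (allocate, pre, post) are made up for illustration.

```python
def check_triple(pre, stmt, post, states):
    """Test {pre} stmt {post} over the given states (testing, not proof)."""
    for s in states:
        if pre(s):
            s2 = stmt(dict(s))       # run the statement on a copy of the state
            if not post(s, s2):
                return False
    return True

def allocate(s):
    """Hypothetical statement: move one unit from 'free' to 'used'."""
    s["free"] -= 1
    s["used"] += 1
    return s

def pre(s):
    return s["free"] > 0

def post(old, new):
    # Nothing goes negative, and the total amount of memory is conserved.
    return (new["free"] >= 0
            and new["free"] + new["used"] == old["free"] + old["used"])

states = [{"free": f, "used": u} for f in range(4) for u in range(4)]
triple_holds = check_triple(pre, allocate, post, states)

# Dropping the precondition breaks the triple: free == 0 goes negative.
weak_holds = check_triple(lambda s: True, allocate, post, states)
```

Here triple_holds is True and weak_holds is False, which shows why the precondition is part of the statement being proved.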

Implicit state updates are bad (surprisingly, global state is OK).
Data structures with lots of different uses and invariants are also hard.
Memory Management POLICY outside kernel; only need to prove that the mechanism works in the kernel.
Control parallelism (concurrency) by limiting to a uniprocessor.
Limit concurrency as much as possible -- event driven model and mostly atomic APIs.
Run as much as possible with interrupts disabled, and check for pending interrupts by polling at well-defined points (which gives you tight control over where they happen).
Typically return out of the kernel on an interrupt and then retry later.
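
A minimal sketch of the poll-and-retry pattern described above, assuming a long-running operation that keeps its progress in kernel state so that it can be restarted. The class, the method names, and the way interrupts are simulated (irq_at) are all hypothetical.

```python
class Kernel:
    def __init__(self, objects, irq_at=()):
        self.objects = list(objects)  # work left to do; progress lives in kernel state
        self.irq_at = set(irq_at)     # simulated: step counts at which an IRQ is pending
        self.steps = 0
        self.handled = 0

    def irq_pending(self):
        return self.steps in self.irq_at

    def delete_all(self):
        """One kernel entry. Returns 'done' or 'restart'."""
        while self.objects:
            self.objects.pop()        # one small step, run with interrupts disabled
            self.steps += 1
            # Preemption point: the only place an interrupt can be noticed.
            if self.objects and self.irq_pending():
                self.handled += 1     # stand-in for handling the interrupt
                return "restart"      # leave the kernel; the call is retried later
        return "done"

def syscall_with_retry(kernel):
    """User-level view: re-issue the call until it completes."""
    entries = 1
    while kernel.delete_all() == "restart":
        entries += 1
    return entries

k = Kernel(range(10), irq_at={3, 7})
entries = syscall_with_retry(k)
```

Because each entry either finishes or leaves the state consistent and strictly closer to done, every individual kernel entry stays short and atomic, which is what keeps the proof free of concurrency reasoning.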
Design to eliminate exceptions.

Describes HOW the kernel works.
Deterministic.
Data structures get explicit types.
Question: Figure 4 is Haskell -- that's the Haskell prototype, not the executable specification automatically translated from the Haskell prototype, yes?

Do not model hardware instructions (rely on testing instead for things like cache/TLB flushes).
machine_state encapsulates the formal machine model.

Functional correctness via refinement.
Formalised for general state machines.
Main technique in refinement is forward simulation: show that every transition between states in the refined (more concrete) representation corresponds to a transition allowed by the abstract specification, under a relation between the two state spaces.
Transition types
1. kernel: the things described by each layer (in increasing detail)
2. user: non-deterministically changing arbitrary user-accessible parts of the state.
3. user events: kernel entry
4. idle: behavior of the idle thread
5. idle events: interrupts that occur during idle time
Let:
- M_A: The Abstract machine
- M_E: The Executable specification
- M_C: The C implementation
Therefore, we just need to show:
- M_E refines M_A
- M_C refines M_E
- Therefore, transitively, M_C refines M_A
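
The chain above can be sketched on toy finite state machines. The code checks refinement by brute-force forward simulation and composes the two state relations to get the transitive result. The machines are invented counters that only reuse the names M_A, M_E, and M_C from these notes; the real proof is over the actual kernel state in Isabelle/HOL, not by enumeration.

```python
def simulates(concrete, abstract, rel):
    """Forward simulation: every concrete step is matched by some abstract step.

    A machine is (initial_state, next_states_function); rel is a set of
    (concrete_state, abstract_state) pairs.
    """
    c0, c_next = concrete
    a0, a_next = abstract
    if (c0, a0) not in rel:
        return False
    for (c, a) in rel:
        for c2 in c_next(c):
            if not any((c2, a2) in rel for a2 in a_next(a)):
                return False
    return True

def compose(r1, r2):
    """Relational composition: c ~ e and e ~ a gives c ~ a."""
    return {(c, a) for (c, e) in r1 for (e2, a) in r2 if e == e2}

CAP = 4
# Abstract: the counter may advance by 1 or 2 (nondeterministic).
M_A = (0, lambda n: {min(n + 1, CAP), min(n + 2, CAP)})
# Executable: deterministic, always advances by 1.
M_E = (0, lambda n: {min(n + 1, CAP)})
# "C": same behaviour plus an implementation-only flag.
M_C = ((0, False), lambda s: {(min(s[0] + 1, CAP), not s[1])})

R_EA = {(n, n) for n in range(CAP + 1)}
R_CE = {((n, f), n) for n in range(CAP + 1) for f in (False, True)}
R_CA = compose(R_CE, R_EA)

e_refines_a = simulates(M_E, M_A, R_EA)
c_refines_e = simulates(M_C, M_E, R_CE)
c_refines_a = simulates(M_C, M_A, R_CA)
a_refines_e = simulates(M_A, M_E, R_EA)  # the abstract machine can step by 2
```

The first three checks succeed and the last one fails, which matches the intuition that refinement only goes one way: the concrete machine may do less than the abstract one allows, never more.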

The behavior of the C implementation is fully captured by the abstract specification.
Coverage is complete.
M_C never fails and always has defined behavior.
The kernel can never crash (all assertions are true).
All kernel API calls terminate (and return to user level).
No infinite loops.
All parameter checking is correct.
Four types of invariants
1. Low-level memory invariants: no object at 0, everything is properly aligned, objects have well-defined types, references refer to objects of the correct types.
2. Typing invariants: stronger than typical PL typing invariants. Context dependent and include value ranges and exclude values (e.g., NULL).
3. Data structure invariants: Links are correct in linked lists; no loops in data structures.
4. Algorithmic invariants: Prove things about how seL4 works (hardest part).
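
To give a feel for the first and third kinds of invariant, here is a small sketch that checks them on a toy heap. The heap layout, the alignment value, and the function names are assumptions for illustration. The actual work in the paper is proving that every kernel operation preserves such properties, not checking them at run time.

```python
ALIGN = 16  # hypothetical object alignment in bytes

def low_level_ok(heap):
    """heap maps address -> {'type': ..., 'next': address or None}."""
    for addr, obj in heap.items():
        if addr == 0 or addr % ALIGN != 0:
            return False              # object at 0, or misaligned
        nxt = obj.get("next")
        if nxt is not None:
            if nxt not in heap:
                return False          # dangling reference
            if heap[nxt]["type"] != obj["type"]:
                return False          # reference to the wrong type of object
    return True

def list_is_acyclic(heap, head):
    """Data structure invariant: following 'next' from head never loops."""
    seen = set()
    addr = head
    while addr is not None:
        if addr in seen:
            return False
        seen.add(addr)
        addr = heap[addr].get("next")
    return True

good = {16: {"type": "tcb", "next": 32}, 32: {"type": "tcb", "next": None}}
cyclic = {16: {"type": "tcb", "next": 32}, 32: {"type": "tcb", "next": 16}}
misaligned = {8: {"type": "tcb", "next": None}}
mistyped = {16: {"type": "tcb", "next": 32}, 32: {"type": "endpoint", "next": None}}
```

Note that the cyclic heap passes the low-level check (every reference is valid and well typed) but fails the list check, so the invariant kinds really are separate layers.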

Yet again, we have a paper whose evaluation is kind of tricky.
I would call it a dancing bear, but they do claim performance is good.
IPC performance looks good, but it's not actually part of the verified C code base. That seems kind of unfair.
Code size: 32,900 lines of Isabelle; 14,400 lines of Haskell/C; and 165,000 lines of proof!!!
The proof took TWENTY PERSON-YEARS.
I found the section about effort fascinating.