Hance: Storage Systems are Distributed Systems (So Verify Them That Way!)

Hance, Lattuada, Hawblitzel, Howell, Johnson, Parno (2020)

What kind of paper is this?

The Story

Once upon a time, storage systems used simple data structures that were easy to reason about and verify. However, they suffered performance overheads for random-insertion workloads. Storage systems then switched to LSM-trees and Bε-trees. This improved performance but made the code drastically more complex, and verifying it became extraordinarily difficult and slow, reducing programmer productivity. The authors thought: wait a minute, aren't storage systems distributed systems in disguise? Using that insight, they generalized IronFleet's proof methodology for verifying distributed systems and applied it to a storage system. They specified and built VeriBetrKV, a key-value store using Bε-trees and journaling, and verified it with Dafny. Along the way, the authors developed modularization techniques for quickly checking the correctness of code and proofs, so that 99% of the proofs finish in 20 seconds or less.

Assumptions

Overall Approach

The IOSystem State Machine

A Different Performance Eval

Reading Dafny

VeriBetrKV

Eval

Margo's Pet Peeve