μSuite: A Benchmark Suite for Microservices
Sriraman, Wenisch (2018)
What kind of paper is this? A benchmark suite plus workload-characterization (measurement) paper.
The Story
- Online services are now composed of microservices.
- Prior work analyzed entire, monolithic applications; the constraints on
OS functionality are completely different in the microservice (and
μs-latency) regime.
- μSuite is a benchmark suite that targets these microservices,
demonstrating just how important certain key OS activities are.
- With this knowledge, we should be able to build systems more
conducive to modern architectures.
The Benchmarks
- The Problem: While individual microservice implementations exist, there
are no publicly available end-to-end services composed from them.
- HDSearch: content-based image similarity search.
- 500K images
- 2K dimension feature vector
- Dataset size ~10 GB
- Front-end: Web app and feature extraction (not studied)
- Mid-tier: returns the IDs of the k-nearest neighbors to a query image.
Computes a locality-sensitive hash (LSH) of the query and looks it up in an
in-memory table to obtain a set of nearest-neighbor candidates, then sends
requests to the leaf servers that might hold those candidate IDs.
Each leaf computes the distance from the query vector to the candidates it
stores and returns a distance-sorted list.
The mid-tier merges the responses from the leaves and picks the top-k
(see the sketch after this benchmark).
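A minimal Python sketch of this lookup-and-merge flow, assuming a
random-hyperplane LSH and Euclidean distance; the class names, table layout,
and parameters are illustrative, not μSuite's actual code:

```python
import heapq
import numpy as np

DIM, NUM_PLANES = 2048, 16
rng = np.random.default_rng(0)
planes = rng.normal(size=(NUM_PLANES, DIM))  # random hyperplanes (assumed scheme)

def lsh_bucket(vec):
    """Locality-sensitive hash: the query's sign pattern against each
    hyperplane, packed into an integer bucket ID."""
    bits = (planes @ vec) > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

class Leaf:
    """Toy leaf server holding one shard of the feature vectors."""
    def __init__(self, shard):                 # shard: {image_id: vector}
        self.shard = shard
    def score(self, query, candidate_ids):
        # Distance-sorted (distance, id) list for candidates stored here.
        return sorted((float(np.linalg.norm(self.shard[i] - query)), i)
                      for i in candidate_ids if i in self.shard)

def knn_query(lsh_table, query, leaves, k=10):
    """Mid-tier: LSH lookup, fan-out to leaves, merge to global top-k."""
    candidates = lsh_table.get(lsh_bucket(query), [])
    per_leaf = [leaf.score(query, candidates) for leaf in leaves]
    # Each per-leaf list is already sorted, so a k-way merge suffices.
    return heapq.nsmallest(k, heapq.merge(*per_leaf))
```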
- Router: routes requests efficiently to a memcached-like distributed key-value store.
- Simplified subset of McRouter features.
- Front-end: client library that transports memcached get/set requests
over RPC (not studied).
- Mid-tier: hash the key using SpookyHash, map the hash to leaf servers,
and send the request to a replica set of three servers (see the sketch
after this benchmark).
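A hedged sketch of the key-to-leaf mapping; `blake2b` stands in for
SpookyHash, and the consecutive-leaf replica placement and
read-one/write-all policy are assumptions for illustration:

```python
import hashlib

NUM_LEAVES = 16
REPLICAS = 3   # each key lives on three leaf servers

def key_hash(key: str) -> int:
    # Stand-in for SpookyHash (μSuite's choice): any uniform 64-bit
    # hash illustrates the mapping.
    return int.from_bytes(
        hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")

def leaf_set(key: str) -> list[int]:
    """Map a key to its replica set: here, REPLICAS consecutive leaves
    starting at hash(key) mod NUM_LEAVES (placement policy assumed)."""
    first = key_hash(key) % NUM_LEAVES
    return [(first + i) % NUM_LEAVES for i in range(REPLICAS)]

def route_set(key, value, leaves):
    for idx in leaf_set(key):        # writes fan out to all replicas
        leaves[idx].set(key, value)

def route_get(key, leaves):
    return leaves[leaf_set(key)[0]].get(key)   # reads hit one replica
```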
- Set Algebra: document retrieval
- 4.3 Million WikiText documents from Wikipedia
- Total size about 10 GB
- Sharded uniformly across leaves
- A posting list is a sorted list of document IDs (stored as a skip list).
- Leaves index posting lists of each term in their part of the corpus
- Front end: selects search queries from a query set
- Mid-tier: forwards queries to the leaf set and merges the posting lists
returned from each leaf server (set union).
- Leaf-tier: set intersection to identify doc IDs that appear on all the
relevant posting lists (see the sketch after this benchmark).
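A small sketch of the leaf-side intersection and mid-tier union, using
plain sorted lists where μSuite stores posting lists as skip lists:

```python
import heapq

def intersect_two(a, b):
    """Two-pointer intersection of two sorted posting lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def leaf_intersect(posting_lists):
    """Leaf-tier: doc IDs appearing on every query term's posting list
    within this leaf's shard of the corpus."""
    result = posting_lists[0]
    for plist in posting_lists[1:]:
        result = intersect_two(result, plist)
    return result

def midtier_union(leaf_results):
    """Mid-tier: merge sorted per-leaf results into one deduplicated,
    sorted list (set union)."""
    merged = []
    for doc_id in heapq.merge(*leaf_results):
        if not merged or merged[-1] != doc_id:
            merged.append(doc_id)
    return merged
```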
- Recommend: Use aggregated user preferences to predict user ratings
for an item.
- 10K (user, item, rating) triples (from MovieLens)
- Represented as a sparse matrix
- Front-end: the Recommend client is a load generator that picks 1K user/item
pairs and asks for a prediction of how much each user will like each item.
- Mid-Tier: receives pairs from the front end and forwards them to all the
leaves; the leaf predictions are then averaged and returned to the client
(see the sketch after this benchmark).
- Leaves: collaborative filtering.
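A toy sketch of the fan-out-and-average flow; the item-mean `predict` is a
deliberately trivial stand-in for the collaborative filtering μSuite's
leaves actually run:

```python
from statistics import mean

class RecommendLeaf:
    """Toy leaf holding one shard of (user, item, rating) triples."""
    def __init__(self, triples):
        self.by_item = {}
        for _user, item, rating in triples:
            self.by_item.setdefault(item, []).append(rating)

    def predict(self, user, item):
        # Item-mean model: a placeholder for real collaborative filtering.
        ratings = self.by_item.get(item)
        return mean(ratings) if ratings else None

def midtier_predict(user, item, leaves):
    """Mid-tier: fan the pair out to every leaf and average whatever
    predictions come back, returning that average to the client."""
    preds = [p for leaf in leaves
             if (p := leaf.predict(user, item)) is not None]
    return mean(preds) if preds else None
```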
Benchmark Framework Design
- All the mid-tier services use the same design.
- Maintains a thread pool to avoid per-request thread-creation overhead.
- Network poller threads block waiting for requests from the front end.
- Communication with leaves is fully asynchronous.
- Network pollers dispatch threads to workers via producer/consumer queues.
- A separate thread pool waits for leaf responses.
- This is where responses from the different leaves are merged (see the
sketch below).
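A self-contained toy of this threading model; queues stand in for the RPC
transport μSuite actually uses (gRPC), and all names here are illustrative:

```python
import queue
import threading
import time

NUM_LEAVES = 4
frontend_q = queue.Queue()   # stands in for the front-end connection
request_q  = queue.Ueue() if False else queue.Queue()  # poller -> worker handoff
response_q = queue.Queue()   # leaf reply -> response-thread handoff

def leaf_submit(leaf_id, req_id, payload):
    """Toy 'leaf server': computes a partial result on its own thread."""
    def run():
        response_q.put((req_id, [f"leaf{leaf_id}:{payload}"]))
    threading.Thread(target=run, daemon=True).start()

def network_poller():
    while True:
        req_id, payload = frontend_q.get()   # block awaiting a request
        request_q.put((req_id, payload))     # producer/consumer handoff

def worker():
    while True:
        req_id, payload = request_q.get()
        for leaf_id in range(NUM_LEAVES):    # fully async fan-out to leaves
            leaf_submit(leaf_id, req_id, payload)

def response_handler():
    pending = {}                             # req_id -> replies so far
    while True:
        req_id, partial = response_q.get()
        pending.setdefault(req_id, []).append(partial)
        if len(pending[req_id]) == NUM_LEAVES:   # all leaves answered
            merged = sorted(sum(pending.pop(req_id), []))  # the "merge" step
            print(req_id, merged)

for fn in (network_poller, worker, response_handler):
    threading.Thread(target=fn, daemon=True).start()

frontend_q.put((1, "query"))
time.sleep(0.5)  # let the toy pipeline drain before exiting
```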
Measurement Methodology
- Closed-loop measurements establish peak sustainable throughput.
- Open-loop measurements explore tail latency.
- Load generators run on separate hardware and draw interarrival times from
an exponential distribution (Poisson arrivals); see the sketch after this
list.
- Average measurements over five trials.
- 4-way sharded leaves for HDSearch, Set Algebra, Recommend
- 16-way sharding for Router
- Use eBPF tooling (e.g., syscount) for syscall counts and timing.
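A sketch of an open-loop generator with Poisson arrivals; `send_request` is
a hypothetical non-blocking stub for the real RPC client:

```python
import random
import time

def open_loop(target_qps, duration_s, send_request):
    """Open-loop generator: draw interarrival gaps from an exponential
    distribution (a Poisson arrival process) and fire on schedule,
    never waiting for responses, so queueing delay shows up in the
    measured tail. `send_request` must be non-blocking (e.g., async RPC)."""
    next_send = time.monotonic()
    deadline = next_send + duration_s
    while next_send < deadline:
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)                       # wait for the schedule
        send_request()
        next_send += random.expovariate(target_qps) # mean gap = 1 / qps

# Toy usage with a no-op stub in place of the real async RPC client.
open_loop(target_qps=1000, duration_s=0.1, send_request=lambda: None)
```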
Results
- Goal: do not saturate until tens of thousands of requests per second.
- Achieved: peak throughput of 11.5K-16.5K requests per second!
- Latency:
- Online services are bursty and have daily patterns
- Evaluate latency versus load: higher load induces more temporal
locality and lower median latency (but higher tail latency)
- OS and Network overheads
- Syscall invocations: futex dominates in all workloads; messaging and
polling calls are less frequent but noticeable.
- Syscall overhead: almost all in Active-Exe (the delay from runnable to running).
- Context switches and thread contention grow with load and become
significant at loads on the order of 10,000 requests per second.