μSuite: A Benchmark Suite for Microservices
Sriraman, Wenisch (2018)
What kind of paper is this? A benchmark suite plus workload-characterization (measurement) paper.
The Story
- Online services are now composed of microservices.
- Prior work analyzed entire, monolithic applications; the constraints on
OS functionality are completely different in the microservice (and
μs-latency) regime.
- μSuite is a benchmark suite that targets these microservices,
demonstrating just how important certain key OS activities are.
- With this knowledge, we should be able to build systems more
conducive to modern architectures.
The Benchmarks
- The Problem: While individual microservice implementations exist, there
are no publicly available end-to-end services composed from them.
- HDSearch: content-based image similarity search.
- 500K images
- 2K dimension feature vector
- Dataset size ~10 GB
- Front-end: Web app and feature extraction (not studied)
- Mid-tier: returns the IDs of the k-nearest neighbors to a query image.
Computes a locality-sensitive hash (LSH) of the query and looks it up in an
in-memory table to obtain a set of nearest-neighbor candidates, then sends
requests to the leaf servers that might hold those candidate IDs.
Each leaf computes the distance from the query vector to the candidates it
stores and returns a distance-sorted list.
The mid-tier merges the responses from the leaves and picks the top-k
(see the sketch after this benchmark).
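A minimal Python sketch of this lookup-and-merge flow, assuming a
random-hyperplane LSH and Euclidean distance; the class names, table layout,
and parameters are illustrative, not μSuite's actual code:

```python
import heapq
import numpy as np

DIM, NUM_PLANES = 2048, 16
rng = np.random.default_rng(0)
planes = rng.normal(size=(NUM_PLANES, DIM))  # random hyperplanes (assumed scheme)

def lsh_bucket(vec):
    """Locality-sensitive hash: the query's sign pattern against each
    hyperplane, packed into an integer bucket ID."""
    bits = (planes @ vec) > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

class Leaf:
    """Toy leaf server holding one shard of the feature vectors."""
    def __init__(self, shard):                 # shard: {image_id: vector}
        self.shard = shard
    def score(self, query, candidate_ids):
        # Distance-sorted (distance, id) list for candidates stored here.
        return sorted((float(np.linalg.norm(self.shard[i] - query)), i)
                      for i in candidate_ids if i in self.shard)

def knn_query(lsh_table, query, leaves, k=10):
    """Mid-tier: LSH lookup, fan-out to leaves, merge to global top-k."""
    candidates = lsh_table.get(lsh_bucket(query), [])
    per_leaf = [leaf.score(query, candidates) for leaf in leaves]
    # Each per-leaf list is already sorted, so a k-way merge suffices.
    return heapq.nsmallest(k, heapq.merge(*per_leaf))
```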
- Router: routes requests efficiently to a memcached-like distributed key-value store.
- Simplified subset of McRouter features.
- Front-end: client library that transports memcached get/set requests
over RPC (not studied).
- Mid-tier: hash the key using SpookyHash, map the hash to leaf servers,
and send the request to a replica set of three servers (see the sketch
after this benchmark).
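A hedged sketch of the key-to-leaf mapping; `blake2b` stands in for
SpookyHash, and the consecutive-leaf replica placement and
read-one/write-all policy are assumptions for illustration:

```python
import hashlib

NUM_LEAVES = 16
REPLICAS = 3   # each key lives on three leaf servers

def key_hash(key: str) -> int:
    # Stand-in for SpookyHash (μSuite's choice): any uniform 64-bit
    # hash illustrates the mapping.
    return int.from_bytes(
        hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")

def leaf_set(key: str) -> list[int]:
    """Map a key to its replica set: here, REPLICAS consecutive leaves
    starting at hash(key) mod NUM_LEAVES (placement policy assumed)."""
    first = key_hash(key) % NUM_LEAVES
    return [(first + i) % NUM_LEAVES for i in range(REPLICAS)]

def route_set(key, value, leaves):
    for idx in leaf_set(key):        # writes fan out to all replicas
        leaves[idx].set(key, value)

def route_get(key, leaves):
    return leaves[leaf_set(key)[0]].get(key)   # reads hit one replica
```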
- Set Algebra: document retrieval
- 4.3 Million WikiText documents from Wikipedia
- Total size about 10 GB
- Sharded uniformly across leaves
- A posting list is a sorted list of document IDs (stored as a skip list).
- Leaves index posting lists of each term in their part of the corpus
- Front end: selects search queries from a query set
- Mid-tier: forwards queries to the leaf set and merges the posting lists
returned from each leaf server (set union).
- Leaf-tier: set intersection to identify doc IDs that appear on all the
relevant posting lists (see the sketch after this benchmark).
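A small sketch of the leaf-side intersection and mid-tier union, using
plain sorted lists where μSuite stores posting lists as skip lists:

```python
import heapq

def intersect_two(a, b):
    """Two-pointer intersection of two sorted posting lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def leaf_intersect(posting_lists):
    """Leaf-tier: doc IDs appearing on every query term's posting list
    within this leaf's shard of the corpus."""
    result = posting_lists[0]
    for plist in posting_lists[1:]:
        result = intersect_two(result, plist)
    return result

def midtier_union(leaf_results):
    """Mid-tier: merge sorted per-leaf results into one deduplicated,
    sorted list (set union)."""
    merged = []
    for doc_id in heapq.merge(*leaf_results):
        if not merged or merged[-1] != doc_id:
            merged.append(doc_id)
    return merged
```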
- Recommend: Use aggregated user preferences to predict user ratings
for an item.
- 10K (user, item, rating) triples (from MovieLens)
- Represented as a sparse matrix
- Front-end: the Recommend client is a load generator that picks 1K user/item
pairs and asks for a prediction of how much each user will like each item.
- Mid-Tier: receives pairs from the front end and forwards them to all the
leaves; the leaf predictions are then averaged and returned to the client
(see the sketch after this benchmark).
- Leaves: collaborative filtering.
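A toy sketch of the fan-out-and-average flow; the item-mean `predict` is a
deliberately trivial stand-in for the collaborative filtering μSuite's
leaves actually run:

```python
from statistics import mean

class RecommendLeaf:
    """Toy leaf holding one shard of (user, item, rating) triples."""
    def __init__(self, triples):
        self.by_item = {}
        for _user, item, rating in triples:
            self.by_item.setdefault(item, []).append(rating)

    def predict(self, user, item):
        # Item-mean model: a placeholder for real collaborative filtering.
        ratings = self.by_item.get(item)
        return mean(ratings) if ratings else None

def midtier_predict(user, item, leaves):
    """Mid-tier: fan the pair out to every leaf and average whatever
    predictions come back, returning that average to the client."""
    preds = [p for leaf in leaves
             if (p := leaf.predict(user, item)) is not None]
    return mean(preds) if preds else None
```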
Benchmark Framework Design
- All the mid-tier services use the same design.
- Maintains a thread pool to avoid per-request thread-creation overhead.
- Network poller threads block waiting for requests from the front end.
- Communication with leaves is fully asynchronous.
- Network pollers dispatch threads to workers via producer/consumer queues.
- A separate thread pool waits for leaf responses.
- This is where responses from the different leaves are merged (see the
sketch below).
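A self-contained toy of this threading model; queues stand in for the RPC
transport μSuite actually uses (gRPC), and all names here are illustrative:

```python
import queue
import threading
import time

NUM_LEAVES = 4
frontend_q = queue.Queue()   # stands in for the front-end connection
request_q  = queue.Ueue() if False else queue.Queue()  # poller -> worker handoff
response_q = queue.Queue()   # leaf reply -> response-thread handoff

def leaf_submit(leaf_id, req_id, payload):
    """Toy 'leaf server': computes a partial result on its own thread."""
    def run():
        response_q.put((req_id, [f"leaf{leaf_id}:{payload}"]))
    threading.Thread(target=run, daemon=True).start()

def network_poller():
    while True:
        req_id, payload = frontend_q.get()   # block awaiting a request
        request_q.put((req_id, payload))     # producer/consumer handoff

def worker():
    while True:
        req_id, payload = request_q.get()
        for leaf_id in range(NUM_LEAVES):    # fully async fan-out to leaves
            leaf_submit(leaf_id, req_id, payload)

def response_handler():
    pending = {}                             # req_id -> replies so far
    while True:
        req_id, partial = response_q.get()
        pending.setdefault(req_id, []).append(partial)
        if len(pending[req_id]) == NUM_LEAVES:   # all leaves answered
            merged = sorted(sum(pending.pop(req_id), []))  # the "merge" step
            print(req_id, merged)

for fn in (network_poller, worker, response_handler):
    threading.Thread(target=fn, daemon=True).start()

frontend_q.put((1, "query"))
time.sleep(0.5)  # let the toy pipeline drain before exiting
```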
Measurement Methodology
- Closed-loop measurements establish peak sustainable throughput.
- Open-loop measurements explore tail latency.
- Load generators run on separate hardware and draw interarrival times from
an exponential distribution (Poisson arrivals); see the sketch after this
list.
- Average measurements over five trials.
- 4-way sharded leaves for HDSearch, Set Algebra, Recommend
- 16-way sharding for Router
- Use eBPF tooling (e.g., syscount) for syscall counts and timing.
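A sketch of an open-loop generator with Poisson arrivals; `send_request` is
a hypothetical non-blocking stub for the real RPC client:

```python
import random
import time

def open_loop(target_qps, duration_s, send_request):
    """Open-loop generator: draw interarrival gaps from an exponential
    distribution (a Poisson arrival process) and fire on schedule,
    never waiting for responses, so queueing delay shows up in the
    measured tail. `send_request` must be non-blocking (e.g., async RPC)."""
    next_send = time.monotonic()
    deadline = next_send + duration_s
    while next_send < deadline:
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)                       # wait for the schedule
        send_request()
        next_send += random.expovariate(target_qps) # mean gap = 1 / qps

# Toy usage with a no-op stub in place of the real async RPC client.
open_loop(target_qps=1000, duration_s=0.1, send_request=lambda: None)
```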
Results
- Goal: do not saturate until tens of thousands of requests per second.
- Achieved: peak throughput of 11.5K-16.5K requests per second!
- Latency:
- Online services are bursty and have daily patterns
- Evaluate latency versus load: higher load induces more temporal
locality and lower median latency (but higher tail latency)
- OS and Network overheads
- Syscall invocations: futex dominates in all workloads; messaging and
polling calls are less frequent but noticeable.
- Syscall overhead: almost all in Active-Exe (the delay from runnable to running).
- Context switches and thread contention grow with load and become
significant at loads on the order of 10,000 requests per second.