Assignment 1: Paper Critique and Reproducibility (2019)
Due September 23 -- electronically (email to email@example.com) by 5:00 PM
You may complete this assignment either alone or with a partner.
If, however, you have a partner, I will have higher expectations in
terms of what you actually reproduce. If you plan on working with a
partner, please let me know ASAP.
Select one paper that meets all of the following criteria:
- It is an evaluation paper.
- You are not an author.
- It includes data (graphs, tables, numbers).
- It has something to do with operating systems.
The paper selected may be one listed here or one from the reading list,
but it does not have to be.
As soon as you've selected your paper, please send me
email and tell me what paper you are doing (this gives me time to read
all the papers before I have your assignments to read).
Your job is to:
- Critique the paper (details below). Do this part before you
attempt part 2.
- Reproduce one or more of the experiments in the paper.
See below for what to turn in.
- Write an addendum to your critique commenting on the reproducibility
of the research presented
1. Writing your Critique
Focus your attention on the research methodology more than the
research idea. Your critique will probably be on the order of two
to three pages although there will be exceptions. In your critique
you must at least answer the following questions (depending on the
paper, there will be other things to discuss as well).
- What is the purpose of the paper?
- What is the hypothesis that the authors are testing?
- What is the experimental setup?
- What is good/bad about the experimental setup?
- How well was the research carried out? What results are presented?
- Do you believe the results? Why/Why not?
- What things might you have done differently?
- What lessons did you learn from reading this paper critically?
2. Reproducing Results
Pick one or two experiments from the paper and try to reproduce the
tests and measurements. Undoubtedly you will have difficulty
actually reproducing the results. This is OK. You will be graded
on how you approach the reproduction, how carefully and fairly you
compare your experience with that of the authors,
and how completely you can state the assumptions that you had to make.
The goal of this exercise is to understand systems research, writing,
Be careful to
articulate any hidden assumptions that you make. Think hard about
how to interpret your results given different hardware and
You may take advantage of data and/or tools that have been made
available by the authors, but you may not do so to the extent that
there is no work left to the assignment.
What to turn in:
- A write-up of what experiment you are trying to reproduce (identify the
corresponding tables/graphs from the original paper).
- A description of your experimental setup.
- A discussion of any assumptions you made and important information that
the authors did not provide in their paper.
- A list of any tools and/or traces that you used.
Discuss your results and how and why they differ from those published.
Then add a few
paragraphs to your critique discussing the reproducibility of the results.
whether or not your assessment of the paper changed after trying to reproduce the
Here are some suggested papers.
If you chose something not on this list, check with me
before beginning the assignment to make sure that the task you are
- Agrawal 2009,
Generating Realistic Impressions for File-System Benchmarking.
Take measurements of a real syste, generate an impressions file system
and them compare (as is done in Figure 2).
- Baker 1991, Measurements of a Distributed File System.
Process the traces to reproduce some/all of the graphs or write a
simulator to reproduce some of the results from the second half of
- Blackwell 1995, Heuristic Cleaning
Algorithms in Log-Structured File Systems
(appeared in the 1995 Usenix Technical Conference).
Use the trace-data that is available and reproduce some of the graphs.
- Blake 2003,
High Availability, Scalable Storage,
Dynamic Peer Networks: Pick
Two (appeared in the 2003 Hot Topics in Operating Systems).
Reproduce the graph in Section 4.1.
- Brown 1995,
Benchmarking in the Wake of
Lmbench: A Case Study of the Performance of NetBSD on the Intel x86 Architecture.
Find a range of Intel processors
and try to reproduce some of the hierarchical decompositions.
- Cadar 2008, Klee: Unassisted and
Automatic Generation of High-Coverage Tests for Complex
Systems Programs. Download their tool (it's not on a Stanford site, it's
at llvm.org) and try it on some of the workloads they used.
- Chen 1995, The Measured Performance of
Personal Computer Operating Systems. Try rerunning
some of the microbenchmarks on modern machines and modern versions of
the operating systems in the paper.
- Cutler 2018,
The benefits and costs of writing a
POSIX kernel in a high-level language.
The code for this project
is available here.
See if you can reproduce some of their measurements about kernel functionality.
- Dahlin 1994, A Quantitative Analysis of Cache
Policies for Scalable Network File Systems
(appeared in the 1994 Sigmetrics). Use the Sprite traces and reproduce some of
- Ellard 2003,
Passive NFS tracing of Email Workloads (appeared in the 2003
USENIX FAST conference). Process the traces with your
own tools and produce some of the graphs.
- Harnik 2013
To Zip or not to Zip: Effective Resource Usage for Real-Time
See if you can get the same kinds of compression timings that
the authors got.
- Holland 2013
Flash Caching on the Storage Client.
Build a simple simulator that tackles a small subset of the design
space and see if you can get results similar to those of the authors.
- Howard 1988, Scale and Performance in a
Distributed File System (on the reading list). Recreate
the Andrew benchmark results.
- Koller 2013:
Write Policies for Host-side Flash Caches.
Start with the analytical results from Figure 1.
Then see if you can put together a system that looks something like
what the authors did and see if you can run any of their benchmarks.
- Kyrola 2012
GraphChi: Large-Scale Graph Computation on Just a PC.
Most of the graphs from this paper are available from the SNAP repository
and many of the systems against which to compare are open source.
- Lozi 2016
The Linux Scheduler: a Decade of Wasted Cores
This paper has a collection of different graphs illustrating several
interesting behaviors of the Linux scheduler, see if the behavior
described still exists.
- Mao 2012
Cache Craftiness for Fast Multicore Key-Value Storage.
The software described here is available
See if you can reproduce any of figures 9 - 11.
- McKusick 1984, A Fast File
System for UNIX. Regenerate (and explain how) the numbers in
Table 1; generate results similar to those in tables IIa and IIb
for different block/fragment size combinations on an FFS.
- McSherry 2013
Scalability! But at what COST?
These are all graph processing problems on a PC; how hard can that be?
- McVoy 1996,
lmbench: Portable Tools for Performance Analysis
(appeared in the 1996 USENIX technical conference). Run lmbench
on some platforms similar/identical to the ones described.
- Megiddo 2003,
RC: A Self-tuning Low Overhead Replacement Cache
(appeared in the 2003 USENIX FAST conference). Implement or simulate ARC and
- Oppenheimer 2006,
Service Placement in a Shared Wide-Area Platform. Obtain
their traces and see if you can reproduce some graphs.
- Qiao 2006,
Structured and unstructured overlays under the microscope. Pick some
existing systems in deployment today and conduct analyses like those done in
- Roghanchi 2017,
ffwd: delegation is (much) faster than
you think. This paper explores different ways to consistently handle
access to shared memory. See if you can reproduce any of the benchmarks
in the first three or four figures.
Code is available here
- Rosenblum 1992, The Design and
Implementation of a Log-Structured File System.
Recreate the micro-benchmarks section.
- Roy 2013,
X-Stream: Edge-centric Graph Processing using Streaming Partitions.
This paper has a lot of different data - not just run time. Trying to
reproduce it should be, um, fun.
- Sumbaly 2012,
Serving Large-scale Batch Computed Data with Project Voldemort.
Using the publicly available Voldemort and MySQL releases, see if you
can reproduce any of the graphs in the evaluation.
- Vangood 2017, To FUSE or Not to
FUSE: Performance of User-Space File Systems. This is largely a benchmark
paper, so pick a benchmark or two and go wild!
- Volos 2014, Aerie: flexible file-system
interfaces to storage-class memory. See if you can reproduce Figure 1.
- Wu 2018, Anna: A KVS For Any Scale.
The code for this system is available
here. Can you
reproduce any of the comparisons with Redis or Cassandra or any other
widely used KV store?