Assignment 1: Paper Critique and Reproducibility (2020w2)

Due January 29 -- electronically (email to, by 5:00 PM

You may complete this assignment either alone or with a partner. If, however, you have a partner, we will have higher expectations in terms of what you actually reproduce. If you plan on working with a partner, please let us know ASAP.

Select one paper that meets all of the following criteria:

  1. It is an evaluation paper.
  2. You are not an author.
  3. It includes data (graphs, tables, numbers).
  4. It has something to do with systems.

The paper selected may be one listed here or one from the reading list, but it does not have to be. As soon as you've selected your paper, please send us email and tell us what paper you are doing (this gives us time to read all the papers before we have your assignments to read).

Your job is to:

  1. Critique the paper (details below). Do this part before you attempt part 2.
  2. Reproduce one or more of the experiments in the paper. See below for what to turn in.
  3. Write an addendum to your critique commenting on the reproducibility of the research presented

1. Writing your Critique

Focus your attention on the research methodology more than the research idea. Your critique will probably be on the order of two to three pages although there will be exceptions. In your critique you must at least answer the following questions (depending on the paper, there will be other things to discuss as well).

2. Reproducing Results

Pick one or two experiments from the paper and try to reproduce the tests and measurements. Undoubtedly you will have difficulty actually reproducing the results. This is OK. You will be graded on how you approach the reproduction, how carefully and fairly you compare your experience with that of the authors, and how completely you can state the assumptions that you had to make. The goal of this exercise is to understand systems research, writing, and reproducibility.

Be careful to articulate any hidden assumptions that you make. Think hard about how to interpret your results given different hardware and software configurations. You may take advantage of data and/or tools that have been made available by the authors, but you may not do so to the extent that there is no work left to the assignment.

What to turn in:

Critique Addendum

Discuss your results and how and why they differ from those published. Then add a few paragraphs to your critique discussing the reproducibility of the results. Comment on whether or not your assessment of the paper changed after trying to reproduce the results.

Suggested Papers

Here are some suggested papers. If you chose something not on this list, check with at least one of us before beginning the assignment to make sure that the task you are undertaking is reasonable.

  1. Agrawal 2009, Generating Realistic Impressions for File-System Benchmarking. Take measurements of a real syste, generate an impressions file system and them compare (as is done in Figure 2).
  2. Amit 2017, Optimizing the TLB Shootdown Algorithm with Page Access Tracking
  3. Amit 2019, JumpSwitches: Restoring the Performance of Indirect Branches In the Era of Spectre
  4. Baker 1991, Measurements of a Distributed File System. Process the traces to reproduce some/all of the graphs or write a simulator to reproduce some of the results from the second half of the paper.
  5. Blackwell 1995, Heuristic Cleaning Algorithms in Log-Structured File Systems (appeared in the 1995 Usenix Technical Conference). Use the trace-data that is available and reproduce some of the graphs.
  6. Blake 2003, High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two (appeared in the 2003 Hot Topics in Operating Systems). Reproduce the graph in Section 4.1.
  7. Brown 1995, Benchmarking in the Wake of Lmbench: A Case Study of the Performance of NetBSD on the Intel x86 Architecture. Find a range of Intel processors and try to reproduce some of the hierarchical decompositions.
  8. Cadar 2008, Klee: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. Download their tool (it's not on a Stanford site, it's at and try it on some of the workloads they used.
  9. Chen 1995, The Measured Performance of Personal Computer Operating Systems. Try rerunning some of the microbenchmarks on modern machines and modern versions of the operating systems in the paper.
  10. Cutler 2018, The benefits and costs of writing a POSIX kernel in a high-level language. The code for this project is available here. See if you can reproduce some of their measurements about kernel functionality.
  11. Dahlin 1994, A Quantitative Analysis of Cache Policies for Scalable Network File Systems (appeared in the 1994 Sigmetrics). Use the Sprite traces and reproduce some of their graphs.
  12. Dice 2017, Malthusian Locks
  13. Ellard 2003, Passive NFS tracing of Email Workloads (appeared in the 2003 USENIX FAST conference). Process the traces with your own tools and produce some of the graphs.
  14. Harnik 2013 To Zip or not to Zip: Effective Resource Usage for Real-Time Compression. See if you can get the same kinds of compression timings that the authors got.
  15. Holland 2013 Flash Caching on the Storage Client. Build a simple simulator that tackles a small subset of the design space and see if you can get results similar to those of the authors.
  16. Howard 1988, Scale and Performance in a Distributed File System (on the reading list). Recreate the Andrew benchmark results.
  17. Koller 2013: Write Policies for Host-side Flash Caches. Start with the analytical results from Figure 1. Then see if you can put together a system that looks something like what the authors did and see if you can run any of their benchmarks.
  18. Kyrola 2012 GraphChi: Large-Scale Graph Computation on Just a PC. Most of the graphs from this paper are available from the SNAP repository and many of the systems against which to compare are open source.
  19. Lozi 2016 The Linux Scheduler: a Decade of Wasted Cores This paper has a collection of different graphs illustrating several interesting behaviors of the Linux scheduler, see if the behavior described still exists.
  20. Mao 2012 Cache Craftiness for Fast Multicore Key-Value Storage. The software described here is available here. See if you can reproduce any of figures 9 - 11.
  21. McKusick 1984, A Fast File System for UNIX. Regenerate (and explain how) the numbers in Table 1; generate results similar to those in tables IIa and IIb for different block/fragment size combinations on an FFS.
  22. McSherry 2013 Scalability! But at what COST? These are all graph processing problems on a PC; how hard can that be?
  23. McVoy 1996, lmbench: Portable Tools for Performance Analysis (appeared in the 1996 USENIX technical conference). Run lmbench on some platforms similar/identical to the ones described.
  24. Megiddo 2003, RC: A Self-tuning Low Overhead Replacement Cache (appeared in the 2003 USENIX FAST conference). Implement or simulate ARC and produce results.
  25. Oppenheimer 2006, Service Placement in a Shared Wide-Area Platform. Obtain their traces and see if you can reproduce some graphs.
  26. Qiao 2006, Structured and unstructured overlays under the microscope. Pick some existing systems in deployment today and conduct analyses like those done in the paper.
  27. Ren 2019, An Analysis of Performance Evolution of Linux’s Core Operations
  28. Roghanchi 2017, ffwd: delegation is (much) faster than you think. This paper explores different ways to consistently handle access to shared memory. See if you can reproduce any of the benchmarks in the first three or four figures. Code is available here
  29. Rosenblum 1992, The Design and Implementation of a Log-Structured File System. Recreate the micro-benchmarks section.
  30. Roy 2013, X-Stream: Edge-centric Graph Processing using Streaming Partitions. This paper has a lot of different data - not just run time. Trying to reproduce it should be, um, fun.
  31. Sumbaly 2012, Serving Large-scale Batch Computed Data with Project Voldemort. Using the publicly available Voldemort and MySQL releases, see if you can reproduce any of the graphs in the evaluation.
  32. Vangood 2017, To FUSE or Not to FUSE: Performance of User-Space File Systems. This is largely a benchmark paper, so pick a benchmark or two and go wild!
  33. Volos 2014, Aerie: flexible file-system interfaces to storage-class memory. See if you can reproduce Figure 1.
  34. Wu 2018, Anna: A KVS For Any Scale. The code for this system is available here. Can you reproduce any of the comparisons with Redis or Cassandra or any other widely used KV store?
  35. Zhao 2016, Non-Intrusive Performance Profiling for Entire Software Stacks Based on the Flow Reconstruction Principle. Pick one workload used in the paper and see if you can reproduce it. Can you run workloads not in the paper?