Assignment 1: Paper Critique and Reproducibility (2022w1)
Due September 30 -- electronically (via handin repository; see
instructions) by 5:00 PM
You may complete this assignment either alone or with a partner.
If, however, you have a partner, we will have higher expectations in
terms of what you actually reproduce. If you plan on working with a
partner, please let me know ASAP.
Select one paper that meets all of the following criteria:
- It is an evaluation paper.
- You are not an author.
- It includes data (graphs, tables, numbers).
- It has something to do with systems.
The paper you select may be one listed here or one from the course reading
list, but it does not have to be.
As soon as you've selected your paper, please email us with your choice;
include a link to the paper if it is not on this list or in the course
readings.
Your job is to:
- Critique the paper (details below). Do this part before you
attempt part 2.
- Reproduce one or more of the experiments in the paper.
See below for details on what this might look like depending on the state
of the paper's artifact evaluation.
- Write an addendum to your critique commenting on the reproducibility
of the research presented.
1. Writing your Critique
Focus your attention on the research methodology more than the
research idea. Your critique will probably be on the order of one
to two pages, although there will be exceptions. In your critique
you must at least answer the following questions (depending on the
paper, there will be other things to discuss as well).
- What is the purpose of the paper?
- What is the hypothesis that the authors are testing?
- What is the experimental setup?
- What is good/bad about the experimental setup?
- How well was the research carried out? What results are presented?
- Do you believe the results? Why/Why not?
- What things might you have done differently?
- What lessons did you learn from reading this paper critically?
2. Reproducing Results
The goal of this exercise is to understand systems research, writing,
and reproducibility.
In an age of artifact evaluations, this part can take many different forms.
Please read this entire section before getting started.
1. Your paper has no published artifact
Pick one or two experiments from the paper and try to reproduce the
tests and measurements. Undoubtedly you will have difficulty
actually reproducing the results. This is OK. You will be graded
on how you approach the reproduction, how carefully and fairly you
compare your experience with that of the authors,
and how completely you can state the assumptions that you had to make.
2. Your paper has a published artifact that has NOT been through an
artifact evaluation.
In this case, you may have difficulty using the published artifact.
That would not be surprising.
Report on the difficulty and do your best to get something similar
to the artifact running!
If you get it running easily, pretend that you are in case 3 below.
3. Your paper has a published artifact that has been through an
artifact evaluation.
In this case, make sure you run the experiment on a platform that is quite
different from that used in the paper.
Ideally, the artifact runs easily and your task will require that you
understand the benchmarks and platform differences sufficiently well
that you can explain/justify your results.
In all cases
Be careful to articulate any hidden assumptions that you make.
Think hard about how to interpret your results given different
hardware and software configurations.
You may take advantage of data and/or tools that have been made
available by the authors, but not to the extent that there is no
work left in the assignment.
3. Critique Addendum
Discuss your results and how and why they differ from those published.
Then add a few
paragraphs to your critique discussing the reproducibility of the results.
Comment on
whether or not your assessment of the paper changed after trying to reproduce the
results.
What to turn in
- A write-up of what experiment you are trying to reproduce (identify the
corresponding tables/graphs from the original paper).
- A description of your experimental setup.
- A discussion of any assumptions you made and important information that
the authors did not provide in their paper.
- A list of any tools and/or traces that you used.
How to turn in
- Place all the parts of your assignment into a single PDF document.
- Name your document a1_[cwl].pdf where cwl is your CWL. If you worked with
a partner, name your document a1_[cwl1]_[cwl2].pdf where the two CWLs are
listed alphabetically.
- Add and commit this document into your handin_[CWL] repository (if you
worked with a partner, both of you should do this).
- Push your repository.
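The naming rule and handin steps above can be sketched as a short shell
snippet. The CWLs below (alice, bob) are hypothetical placeholders --
substitute your own, and run the git commands from inside your actual
handin repository:

```shell
# Hypothetical CWLs -- replace with your own.
cwl1="alice"
cwl2="bob"

# Partner submissions list the two CWLs alphabetically in the filename.
if [ "$cwl1" \< "$cwl2" ]; then
  doc="a1_${cwl1}_${cwl2}.pdf"
else
  doc="a1_${cwl2}_${cwl1}.pdf"
fi
echo "$doc"   # a1_alice_bob.pdf

# Then, from inside your handin_[CWL] repository:
# git add "$doc"
# git commit -m "Assignment 1: critique, reproduction, addendum"
# git push
```

Solo submissions skip the comparison and just use a1_[cwl].pdf.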
Suggested Papers
Here are some suggested papers.
If you choose something not on this list, check with either Professor
Seltzer or one of the TAs to make sure that the task you are
undertaking is reasonable.
- Amit 2017: Optimizing the TLB Shootdown Algorithm with
Page Access Tracking
- Amit 2019: JumpSwitches: Restoring the Performance
of Indirect Branches In the Era of Spectre
- Balmau 2017:
TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured
Key-Value Stores.
Reproduce any figure numbered 9 or greater.
- Blake 2003:
High Availability, Scalable Storage,
Dynamic Peer Networks: Pick
Two (appeared in the 2003 Hot Topics in Operating Systems).
Reproduce the graph in Section 4.1.
- Cadar 2008: KLEE: Unassisted and
Automatic Generation of High-Coverage Tests for Complex
Systems Programs. Download their tool (it's not on a Stanford site, it's
at llvm.org) and try it on some of the workloads they used.
- Curtsinger 2015:
COZ: Finding Code that Counts with Causal Profiling.
Reproduce any of the examples of COZ profiling. If that works seamlessly,
use COZ to evaluate something that they did not evaluate in the paper and
report on what you learned about it.
- Cutler 2018:
The benefits and costs of writing a
POSIX kernel in a high-level language.
The code for this project
is available here.
See if you can reproduce some of their measurements about kernel functionality.
- Harnik 2013:
To Zip or not to Zip: Effective Resource Usage for Real-Time
Compression.
See if you can get the same kinds of compression timings that
the authors got.
- Jamet 2020:
Characterizing the impact of last-level cache replacement policies on big-data workloads.
In theory, it should be easy to use the simulator and tracer used in this study
to reproduce their results exactly. See if theory meets practice. If so,
analyze a different benchmark using their tools!
- Kadekodi 2018:
Geriatrix: Aging what you see and what you don’t see.
A file system aging approach for modern storage systems.
There are so many graphs from which to choose -- see if you can reproduce
some runtime results on an aged file system.
- Koller 2013:
Write Policies for Host-side Flash Caches.
Start with the analytical results from Figure 1.
Then see if you can put together a system that looks something like
what the authors did and see if you can run any of their benchmarks.
- Kyrola 2012:
GraphChi: Large-Scale Graph Computation on Just a PC.
Most of the graphs from this paper are available from the SNAP repository
and many of the systems against which to compare are open source.
- Lawall 2022:
OS scheduling with nest: keeping tasks close together on warm cores.
This has undergone artifact evaluation, so this is a type 3 project. You
need to make sure you are running on a very different platform.
Then you need to explain your results relative to those in the paper.
- Lozi 2016:
The Linux Scheduler: a Decade of Wasted Cores
This paper has a collection of different graphs illustrating several
interesting behaviors of the Linux scheduler; see if the behavior
described still exists.
- Mao 2012:
Cache Craftiness for Fast Multicore Key-Value Storage.
The software described in the paper is available here.
See if you can reproduce any of Figures 9-11.
- Min 2016:
Understanding Manycore Scalability of File Systems.
You can pretty much try to reproduce anything in these figures!
- Ren 2019: An Analysis of Performance Evolution of Linux’s Core Operations
- Roghanchi 2017:
ffwd: delegation is (much) faster than
you think. This paper explores different ways to consistently handle
access to shared memory. See if you can reproduce any of the benchmarks
in the first three or four figures.
Code is available here.
- Roy 2013:
X-Stream: Edge-centric Graph Processing using Streaming Partitions.
This paper has a lot of different data - not just run time. Trying to
reproduce it should be, um, fun.
- Sumbaly 2012:
Serving Large-scale Batch Computed Data with Project Voldemort.
Using the publicly available Voldemort and MySQL releases, see if you
can reproduce any of the graphs in the evaluation.
- Vangoor 2017:
To FUSE or Not to FUSE: Performance of User-Space File Systems.
See if you can reproduce a few of the results from Table 3 on any system
to which you have access.
- Volos 2014: Aerie: flexible file-system
interfaces to storage-class memory. See if you can reproduce Figure 1.
- Wu 2018: Anna: A KVS For Any Scale.
The code for this system is available
here. Can you
reproduce any of the comparisons with Redis or Cassandra or any other
widely used KV store?
- Zhao 2016:
Non-Intrusive Performance Profiling for
Entire Software Stacks Based on the Flow
Reconstruction Principle. Pick one workload used in the paper and
see if you can reproduce it. Can you run workloads not in the paper?