CPSC 508 Final Project (2021)

Project Proposal & Research Plan Due: 5:00 PM February 9, 2021
1st Status Meeting: Early Week of March 1, 2021

2nd Status Meeting: Week of March 16, 2021

In-class Presentations: April 6/8, 2021
Depending on how many different projects we have, we may need only one of these

First Draft Due: 9:00 PM April 4, 2021

Final Project Due 5:00 PM April 30, 2021

The goal of the final project is to provide the opportunity for you to conduct systems research. The size of the project can vary, but thinking of it as a conference paper is probably a good model. This History of a Paper outlines a paper from initial submission to final paper. It includes the original (rejected) submission (extended abstract) and reviews, another (accepted) submission and reviews, and the the final paper. This collection should give you an idea of how to give and respond to constructive criticism. It will also give you a sense of what we mean by "conference paper."

Teams / Project

Teams: You may chose to complete your final project alone or as a team. However, we strongly encourage the final project to be done in teams of two graduate students or up to four undergraduate students. If you feel that you have a project sufficiently large to warrant more people, come talk to us.

Projects overlapping other courses/research/... Projects may also be undertaken in cooperation with other graduate courses, but any such project must be approved by the professors of both courses. Not surprisingly, we expect more depth and work for a project that is satisfying two class requirements. Similarly, if you wish to undertake a project related to your own research, we will permit it, but you must demonstrate how what we've learned in CPSC 508 influences your work and/or ways in which your research would have been different had you not been also conducting a project in CPSC 508. In other words, your project in CPSC 508 must extend work you would normally have done in some new and/or different way. In either way: please come and talk to us if your project will be overlapping with other courses/research.

Deliverables

For this project, you need to pose a question, design a framework in which to answer the question, conduct the research, and write up your experience and results. There will be five deliverables for this project.

Project Proposal and Research Plan (20%)

Hint: if you hand it in early, we will try to give you feedback within 24 hours. That will maximize the time you have to work on the project. We therefore strongly encourage you to try to hand it in early (unheard of, we know).

The proposal part should begin with a single paragraph fairytale: what is the story you would like to tell?

Then, more formally or seriously describe your project. You should clearly motivate and state the research question you are investigating. Provide a few sentences of explanation about why you find this to be an interesting question, why it is important, and how it qualifies as research.

estimate

Related Work (1): Related work falls in two categories. Background readings are those that are required for you to understand the area and problem you are attacking. By the time you complete your proposal, we expect that you have both identified and completed (some of) the background reading. You should be able to explain the background in sufficient detail that any of your classmates would understand what you are doing and why.
The other category of related work we call contextual work. This includes prior work upon which you are building (if it wasn't needed in the background), the work of others who are solving the same problem you are, but doing so in different ways, work in adjacent areas that influences what you've done. The purpose of this part of the related work section is to help a reader place your work in the research landscape and truly understand your results. You need not have conducted an exhaustive literature search by the time that you submit the proposal, but you should know what work is out there and at a minimum, in what areas you need to be looking. Consider this: You need to know that you are undertaking new research and not merely repeating assignment 1 -- doing something someone else has already done. If you are not familiar with the related work, you have no way of knowing this. Understand that this means that by the time you submit your proposal, you have a concrete idea of both the problem and some ideas about how yo are tackling the problem.
If you think earlier related work is flawed and want to correct those flaws, this would be the section where you describe that. Even if you have not yet read all the papers you intend to read, this section should include a list of papers that you plan to read, an indication of other areas in which you think you might need to find some papers, and at least a couple of specific comparisons between prior work and what you're doing (you just need a few sentences for each of these).
Conclusion Format (1 paragraph): What sorts of conclusions do you expect to state? Obviously, you do not know what the results are in advance, but you should know the format of those results. For example, if you are comparing two systems, you would expect to be able to say things like, "System A outperforms System B in these instances and System B outperforms System A in these other instances." Naturally, after you have done the research, you should also be able to explain why. You might provide some template sentences that you expect to have in your conclusion (and/or abstract).
Experimental Setup (1): What experiments will you conduct? Why? What question is each experiment designed to answer? What do you hope to learn from each experiment? What measurement tools will you use? How will you determine if your measurements are accurate? What tests will you conduct conditionally? (For example, if we learn X from experiment 1 then we must do A else we'll do B.) What problems do you expect? The more detail you include in this section, the more able we will be to give useful feedback and the more easily you will be able to conduct your research. Think hard about what the important questions are and how you can answer them, and then map those questions and answers back to explicit tasks.
Resources Needed (<1): What equipment, software, tools will you need? This section is particularly important so that the we can make sure we have the right stuff available. If you have any doubts about the availability of the resources you'll need, please speak to one of us as soon as possible! You do not want to find out in the middle of November that the hardware and/or software you need is not available!
Schedule (< 1): (be specific; include dates and milestones).

Status Meeting

syllabus

First Draft (30%)

The draft should contain all the parts of the paper, although it may have preliminary results and may be missing some results.

we want you to write the results section

please, please, please don't write this all the night before it is due

In-class Presentation (10%)

Final Report (40%)

Part of your final report grade will be based upon how well you address comments raised by the program committee. Do not ignore my and the reviewer comments!

Project Suggestions

We suggest some topics below (we may add to this list after it is up on the web site, so it makes sense to check there if you are stuck for project ideas). You need not pick your final project from this list, but if you decide on a project not on this list, please check with us before fully committing to the project. The key characteristics of a project should be:

The work can reasonably be completed in two months.
We have access to the required hardware and software.
The research question has something to do with systems (I'm willing to give a fair bit of freedom here, but if there are any questions, please check with me).
The project is structured in such a way that you can have tangible results. (No big idea papers probably.)
You will learn something from undertaking this project.

Securing Containerized Computation

Leveraging recent development of Kernel Runtime Security Instrumentation by Google and looking at security namespacing, can you propose a solution to allow containers to load Linux Security Monitors and Berkeley Packet Filters on a per container (a.k.a namespace basis)? You may want to look at existing cgroup-ed BPF programs and understand the necessary requirements to enable such a feature in the eBPF-LSM context. Can you reproduce the use cases presented here? Can you present new interesting use cases?

Contact Thomas Pasquier (tfjmp@cs.ubc.ca).

Execution Grammars

Expanding on prior work, Thomas Pasquier's group has started to build a tool that creates a “graph grammar” describing the provenance subgraph corresponding to a given system call. Given code snippets or malware binaries, it should be possible to build a grammar representing their execution and therefore to detect the subgraph they generate. How could one build such grammar? Can this be done at scale (e.g. given a malware library)? Can you automatically extract a generalization that matches a given malware family to keep detection high in the case of a new malware variant? Could this solution be used to remove the need for human expertise when compared to POIROT?

Contact: Thomas Pasquier

Building an automated network traffic shape generator tool

The sizes and timing of an application's network packets can reveal the content of the (encrypted) traffic, such as web pages, video streams or voip chats. One way to address this problem is to shape the traffic, i.e., modify the sizes and timing of the application's packets to make them independent of application secrets. What is an optimal shaping strategy for an application's traffic that is sufficient to hide its secrets and yet does not impose significant bandwidth or latency overheads on the traffic?

In this project, the goal is to build a shape generator that automatically computes optimal traffic shapes for a vulnerable application. One way to approach this problem would be to apply machine learning techniques on network traces to determine optimal traffic shapes for different classes of application secrets.

Contact: Aastha Mehta

Building a bluetooth beacon-based epidemic risk mitigation system

The goal is to build a prototype of PanCast system. There are two broad goals: (a) Implement the message protocols for encounter broadcasts, encounter history upload and risk dissemination.

(b) Develop a benchmark suite for empirical measurements on BLE 5.2 protocol. Specifically, develop benchmarks and measure the achievable throughput, transmission and receive error rates, and battery consumption.

Contact: Aastha Mehta

Specification of Translation Hardware

Translation Hardware Formalization
Translation Hardware Spec
Generation of Capability Types in Barrelfish

uKernel generation

Synthesizing uKernels
uKernel Tuning

Companion Kernel Threads

Process overhead in operating systems

Secure Tenant Isolation in a single Process

Performance Model for Near-Memory / In Memory Computation / Accelerators

In-Memory Database Operator Offload

One Queue to rule them all.

Cloud Computing in the Wild

Unikernels [1] have been as an alternative to containers (e.g., Docker) as a cloud application deployment solution. In parallel, the concept of "fog" computing is emerging, where services are being deployed directly into the Internet of Things (IoT) infrastructure to improve latency, privacy, and partition resilience (among other things). This suggests the questions, "Could we allow the dynamic migration of services from the cloud, to edge-devices and directly into end-devices?" and "Could this be done while maintaining the use of programming languages and skills backed by a relatively cheap and abundant workforce?"

Transpilation techniques [2] combined with a unikernel designed for extremely low resource consumption [3] could be a step in this direction. A preliminary proof of concept demonstrated the possibility to transform a PHP application into a few MB self-contained virtual machine image. We want to go beyond that proof of concept and build a prototype to demonstrate effective service migration in such a manner.

1 - Madhavapeddy, Anil, et al. "Unikernels: Library operating systems for the cloud." ACM SIGPLAN Notices 48.4 (2013): 461-472.

2 - Zhao, Haiping, et al. "The HipHop compiler for PHP." ACM SIGPLAN Notices. Vol. 47. No. 10. ACM, 2012.

3 - Bratterud, Alfred, et al. "IncludeOS: A minimal, resource efficient unikernel for cloud services." 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2015.

A General Purpose Isolation Mechanism

One could view a hypervisor as a mechanism for providing units of isolation whose interface is a machine ISA. Similarly, a conventional operating system provides units of isolation whose interface is the system call API. This trend continues: a JVM is a user-level process that provides units of isolation whose API is Java bytecodes. Web servers adopted a pile of complexity to support virtual domains, thereby introducing yet another way to provide isolation." Some browsers also provide units of isolation between each browser tab. Given this stack of software, each providing an isolation mechanism, one might wonder why we have N different mechanisms instead of a single coherent mechanism. Putting it another way, could all these systems use a single isolation model/implementation? If so, what would it look like?

The goal of this project would be to design an isolation mechanism that could be used in this way and evaluate it. (It's possible that you could build something like this on L4.) You could imagine evaluating this using a set of examples like the web server and running it in different architectural configurations: a single web server running virtual domains; a web server per virtual machine; a web server using the isolation mechanism you provide.

Using Provenance to Solve OS Problems

There are many systems papers of the form, "We wanted to solve some problem, so we modified the kernel to produce a bunch of data, and then we used that data to do something." I'd like to see how many of these projects could be done via a single provenance capture system. CamFlow is a selective whole-system provenance capture system. It also have a lovely front-end display engine. I would love to see how many special-purpose systems could be replaced by scripts running over CamFlow data. I could imagine doing this dynamically over streaming data (using CamQuery) or statically over collected data.

For example, prefetching files requires that you know what files are likely to be accessed, before programs actually access them -- PASS captures much of that data. So, see if you can replicate the work in "An Analytical Approach to File Prefetching (1997 USENIX)" using PASS. Here are other papers on file prefetching to examine:
- Marginal Cost-Benefit Analysis for Predictive File Prefetching (ACSME 2003)
- Design and Implementation of Predictive File Prefetch ing (USENIX 2002)
Another area where provenance might be useful is in cache replacement algorithms -- if you knew what you might need again soon, you would keep it in your cache. Look for papers on caching, such as:
- Informed prefetching and caching (SOSP 1995)
- Application controlled prefetching and caching (USENIX 2002)
The Coda file system was designed to help users work in a disconnected mode. One component of that system was a hoarding mechanism where the system would try to figure out what files you were going to need to function while disconnected. It seems that one could exploit provenance to perform better hoarding. Do it!

Warning: I have a strong vested interest in this project. The upside is that you are likely to get lots of attention; the downside is that you are likely to get lots of attention.

Storing whole system provenance on blockchain

Camflow

very recent paper

Prove that LSM-based provenance capture is guaranteed to detect a security breach.

this paper

must

Real End-to-end Provenance

Python

VisTrails

CamFlow

glue

Deriving Boot sequences from Machine Independent specifications

not

Generating a HAL for ARMv8

machine reabdable specification of the ARMv8 architecture

Tiny OS Components for Tiny Processors

Access Pattern from JIT

OpenMP

GCC

Snap for Storage

NVM Write cache

device mapper