Project Proposal & Research Plan Due: 5:00 PM October 4, 2019
1st Status Meeting: Early Week of October 14, 2019 or Late Week of October 21
2nd Status Meeting: Week of November 4, 2019
In-class Presentations: November 19/21, 2019 (depending on how many different projects we have, we may need only one of these dates)
First Draft Due: 9:00 PM November 24, 2019
Final Project Due: 5:00 PM December 13, 2019
Final projects may be undertaken in teams of two graduate students or up to four undergraduate students. If you feel that you have a project sufficiently large to warrant more people, come talk to me. Projects may also be undertaken in cooperation with other graduate courses, but any such project must be approved by the professors of both courses. Not surprisingly, we expect more depth and work from a project that is satisfying two class requirements. Similarly, if you wish to undertake a project related to your own research, I will permit it, but you must demonstrate how what we've learned in CPSC 508 influences your work and/or how your research would have been different had you not also been conducting a project in CPSC 508. In other words, your project in CPSC 508 must extend work you would normally have done in some new and/or different way.
For this project, you need to pose a question, design a framework in which to answer the question, conduct the research, and write up your experience and results. There will be five deliverables for this project.

The proposal portion should be a single page that describes your project. You should clearly motivate and state the research question you are investigating. Provide a few sentences of explanation about why you think this is an interesting question, why it is important, and how it qualifies as research.
The research plan is a more comprehensive document. It should include the following components (the numbers in parentheses are estimates of the number of pages you might need for each section).

The other category of related work I call contextual work. This includes prior work upon which you are building (if it wasn't covered in the background), the work of others who are solving the same problem you are, but doing so in different ways, and work in adjacent areas that influences what you're doing. The purpose of this part of the related work section is to help a reader place your work in the research landscape and truly understand your results. You need not have conducted an exhaustive literature search by the time you submit the proposal, but you should know what work is out there and, at a minimum, in what areas you need to be looking. Consider this: you need to know that you are undertaking new research and not merely repeating assignment 1 -- doing something someone else has already done. If you are not familiar with the related work, you have no way of knowing this. Understand that this means that by the time you submit your proposal, you have a concrete idea of both the problem and some ideas about how you are tackling the problem.
If you think earlier related work is flawed and want to correct those flaws, this would be the section where you describe that. Even if you have not yet read all the papers you intend to read, this section should include a list of papers that you plan to read, an indication of other areas in which you think you might need to find some papers, and at least a couple of specific comparisons between prior work and what you're doing (you just need a few sentences for each of these).
Part of your final report grade will be based upon how well you address comments raised by the program committee. Do not ignore my comments or the reviewers' comments!
I am assuming that after completing homework 1, you have concluded that it's quite difficult to make systems research reproducible. There are many different provenance capture systems in the world, and our hypothesis is that if you have complete provenance for an experiment, then you can automatically construct a virtual machine that can be used to reproduce the research in a paper. We think this would be cool!
So, pick a provenance capture system (we have suggestions -- we have a simple R-based provenance capture system that would be the obvious one to use, but you are free to use others if you prefer), and develop tools that can reconstruct an experiment based solely on the provenance.
You will undoubtedly want to check out CamFlow and Philip Guo's CDE.
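To make the idea concrete, here is a rough sketch (in Python) of the kind of tool I have in mind: it walks a set of provenance records and emits a script that re-establishes the environment and replays the computation. The record format in the sketch is entirely made up for illustration; a real tool would have to parse whatever your chosen capture system (CamFlow, CDE, or the R-based system) actually produces.

import json

def provenance_to_script(prov_path, out_path="reproduce.sh"):
    """Turn a list of provenance records into a shell script that replays
    an experiment.  The record format assumed here (a JSON list of dicts
    with 'type', 'name', 'version', and 'cmd' fields) is invented for
    illustration; a real tool would parse whatever CamFlow, CDE, or the
    R-based capture system actually emits."""
    with open(prov_path) as f:
        records = json.load(f)

    packages = set()   # libraries/packages the experiment loaded
    commands = []      # process executions, in the order they were captured
    for rec in records:
        if rec["type"] == "package":
            packages.add(f'{rec["name"]}=={rec["version"]}')
        elif rec["type"] == "exec":
            commands.append(rec["cmd"])

    with open(out_path, "w") as out:
        out.write("#!/bin/sh\n# Auto-generated from provenance; review before running.\n")
        for pkg in sorted(packages):
            out.write(f"# requires: {pkg}\n")
        for cmd in commands:
            out.write(cmd + "\n")

if __name__ == "__main__":
    provenance_to_script("experiment_prov.json")

Generating a shell script is only the simplest possible target; a more ambitious version would emit a Dockerfile or a complete virtual machine image, which is what the hypothesis above really calls for.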
Unikernels [1] have been proposed as an alternative to containers (e.g., Docker) as a cloud application deployment solution. In parallel, the concept of "fog" computing is emerging, where services are deployed directly into the Internet of Things (IoT) infrastructure to improve latency, privacy, and partition resilience (among other things). This suggests the questions, "Could we allow the dynamic migration of services from the cloud, to edge devices, and directly into end devices?" and "Could this be done while maintaining the use of programming languages and skills backed by a relatively cheap and abundant workforce?"
Transpilation techniques [2] combined with a unikernel designed for extremely low resource consumption [3] could be a step in this direction. A preliminary proof of concept demonstrated that a PHP application can be transformed into a self-contained virtual machine image of a few megabytes. We want to go beyond that proof of concept and build a prototype that demonstrates effective service migration in this manner.
1 - Madhavapeddy, Anil, et al. "Unikernels: Library operating systems for the cloud." ACM SIGPLAN Notices 48.4 (2013): 461-472.
2 - Zhao, Haiping, et al. "The HipHop compiler for PHP." ACM SIGPLAN Notices 47.10 (2012).
3 - Bratterud, Alfred, et al. "IncludeOS: A minimal, resource efficient unikernel for cloud services." 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2015.
One could view a hypervisor as a mechanism for providing units of isolation whose interface is a machine ISA. Similarly, a conventional operating system provides units of isolation whose interface is the system call API. This trend continues: a JVM is a user-level process that provides units of isolation whose interface is Java bytecode. Web servers adopted a pile of complexity to support virtual domains, thereby introducing yet another way to provide isolation. Some browsers also provide units of isolation between browser tabs. Given this stack of software, each layer providing its own isolation mechanism, one might wonder why we have N different mechanisms instead of a single coherent one. Putting it another way, could all these systems use a single isolation model/implementation? If so, what would it look like?
The goal of this project would be to design an isolation mechanism that could be used in this way and evaluate it. (It's possible that you could build something like this on L4.) You could imagine evaluating this using a set of examples like the web server, running it in different architectural configurations: a single web server running virtual domains; a web server per virtual machine; and a web server using the isolation mechanism you provide.
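To give a flavor of what "a single isolation model" might mean, here is one hypothetical sketch of such an interface, written in Python purely for readability; none of these names come from L4 or any existing system, and deciding what the right interface actually is would be part of the project.

from abc import ABC, abstractmethod

class IsolationDomain(ABC):
    """One hypothetical shape for a unified isolation interface; all names
    here are invented for illustration.  A hypervisor guest, an OS process,
    a JVM, a virtual web domain, or a browser tab would each be a domain
    created by some parent domain."""

    @abstractmethod
    def spawn(self, image, limits):
        """Create a child domain running `image` under the given resource limits."""

    @abstractmethod
    def send(self, child, message):
        """The only sanctioned channel for crossing the isolation boundary."""

    @abstractmethod
    def revoke(self, child):
        """Tear the child down and reclaim its resources."""

An evaluation could then back one interface like this with each of the configurations above -- virtual domains inside a single server, one server per virtual machine, and your new mechanism -- and compare the isolation guarantees and overheads you observe.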
There are many systems papers of the form, "We wanted to solve some problem, so we modified the kernel to produce a bunch of data, and then we used that data to do something." I'd like to see how many of these projects could be done via a single provenance capture system. CamFlow is a selective whole-system provenance capture system. It also has a lovely front-end display engine. I would love to see how many special-purpose systems could be replaced by scripts running over CamFlow data. I could imagine doing this dynamically over streaming data (using CamQuery) or statically over collected data.
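As a sense of scale for what such a script might look like, here is a small Python sketch that answers one classic special-purpose question (which processes touched a particular file?) over a collected provenance dump. I am assuming a PROV-JSON-like layout and field names here; check CamFlow's actual output format (or use CamQuery for the streaming version) rather than trusting the names in the sketch.

import json

def processes_touching(prov_path, target_path):
    """Report which activities (processes) read or wrote a given file in a
    whole-system provenance dump.  The structure assumed here -- a PROV-JSON-
    style document with 'entity', 'used', and 'wasGeneratedBy' sections and
    'prov:label' holding a pathname -- is a guess at what a CamFlow export
    looks like; adjust the field names to the real output."""
    with open(prov_path) as f:
        doc = json.load(f)

    # Map entity id -> recorded label (pathname, where one was captured).
    labels = {eid: attrs.get("prov:label", "")
              for eid, attrs in doc.get("entity", {}).items()}

    readers, writers = set(), set()
    for rel in doc.get("used", {}).values():
        if target_path in labels.get(rel["prov:entity"], ""):
            readers.add(rel["prov:activity"])
    for rel in doc.get("wasGeneratedBy", {}).values():
        if target_path in labels.get(rel["prov:entity"], ""):
            writers.add(rel["prov:activity"])
    return readers, writers

if __name__ == "__main__":
    readers, writers = processes_touching("camflow_dump.json", "/etc/passwd")
    print("read by:", readers)
    print("written by:", writers)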
Warning: I have a strong vested interest in this project. The upside is that you are likely to get lots of attention; the downside is that you are likely to get lots of attention.