Project Proposal & Research Plan Due: 5:00 PM October 7, 2022
1st Status Meeting: Early in the week of October 18, 2022
2nd Status Meeting: Week of November 1, 2022
First Draft Due: 9:00 PM November 27, 2022
In-class Presentations: November 29/December 1, 2022
Final Project Due: 5:00 PM December 19, 2022
Teams: You may choose to complete your final project alone or as a team. However, we strongly encourage you to do the final project in teams of two. If you feel that you have a project sufficiently large to warrant more people, come talk to us.
Projects overlapping other courses/research/... Projects may also be undertaken in cooperation with other graduate courses, but any such project must be approved by the professors of both courses. Not surprisingly, we expect more depth and work from a project that is satisfying two class requirements. Similarly, if you wish to undertake a project related to your own research, we will permit it, but you must demonstrate how what you've learned in CPSC 508 influences your work and/or how your research would have been different had you not also been conducting a project in CPSC 508. In other words, your project in CPSC 508 must extend work you would normally have done in some new and/or different way. In either case, please come and talk to us if your project will overlap with other courses or your research.
The proposal part should begin with a single-paragraph fairy tale: what is the story you would like to tell?
Then describe your project more formally and seriously. You should clearly motivate and state the research question you are investigating. Provide a few sentences of explanation about why you find this to be an interesting question, why it is important, and how it qualifies as research.
The research plan is a more comprehensive document. It should include the following components (the numbers in parentheses are an estimate of the number of pages you might need for each section; it is just an estimate; in practice you should write exactly as much as you need to convey answers to the questions we pose).

The other category of related work we call contextual work. This includes prior work upon which you are building (if it wasn't needed in the background), the work of others who are solving the same problem you are but doing so in different ways, and work in adjacent areas that influences what you've done. The purpose of this part of the related work section is to help a reader place your work in the research landscape and truly understand your results. You need not have conducted an exhaustive literature search by the time you submit the proposal, but you should know what work is out there and, at a minimum, in what areas you need to be looking. Consider this: you need to know that you are undertaking new research and not merely repeating assignment 1 -- doing something someone else has already done. If you are not familiar with the related work, you have no way of knowing this. Understand that this means that by the time you submit your proposal, you should have a concrete idea of the problem and some ideas about how you will tackle it.
If you think earlier related work is flawed and want to correct those flaws, this would be the section where you describe that. Even if you have not yet read all the papers you intend to read, this section should include a list of papers that you plan to read, an indication of other areas in which you think you might need to find some papers, and at least a couple of specific comparisons between prior work and what you're doing (you just need a few sentences for each of these).
Part of your final report grade will be based upon how well you address the comments raised by the program committee. Do not ignore my comments or those of the reviewers!
Leveraging Google's recent development of Kernel Runtime Security Instrumentation (KRSI) and looking at security namespacing, can you propose a solution that allows containers to load Linux Security Module (LSM) and Berkeley Packet Filter (BPF) programs on a per-container (i.e., per-namespace) basis? You may want to look at existing cgroup-attached BPF programs and understand the requirements for enabling such a feature in the eBPF-LSM context. Can you reproduce the use cases presented here? Can you present new, interesting use cases?
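As a small, hedged starting point for the suggestion above about studying cgroup-attached BPF programs, the sketch below (Python; it assumes bpftool is installed and the script runs with sufficient privilege, and the helper name and output handling are ours) simply enumerates what is currently attached under a cgroup hierarchy. It is a survey aid, not part of any proposed solution.

    # Hypothetical survey helper: list BPF programs attached under a cgroup
    # hierarchy, as a way to study how cgroup-scoped BPF attachment works today.
    # Assumes bpftool is installed and the script runs with enough privilege;
    # the exact JSON layout of bpftool's output should be checked on your system.
    import json
    import subprocess

    def cgroup_bpf_tree(cgroup_root="/sys/fs/cgroup"):
        out = subprocess.run(["bpftool", "-j", "cgroup", "tree", cgroup_root],
                             capture_output=True, text=True, check=True).stdout
        return json.loads(out)

    if __name__ == "__main__":
        for entry in cgroup_bpf_tree():
            print(entry)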
Contact: Thomas Pasquier (tfjmp@cs.ubc.ca).
Expanding on prior work, Thomas Pasquier's group has started to build a tool that creates a "graph grammar" describing the provenance subgraph corresponding to a given system call. Given code snippets or malware binaries, it should be possible to build a grammar representing their execution and therefore to detect the subgraphs they generate. How could one build such a grammar? Can this be done at scale (e.g., given a malware library)? Can you automatically extract a generalization that matches a given malware family, so that detection stays high for new malware variants? Could this solution remove the need for the human expertise that POIROT requires?
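To make the detection step concrete, one naive reading of "detect the subgraph they generate" is labelled subgraph isomorphism. The sketch below uses networkx with invented node and edge labels purely for illustration; it is not the group's tool, and a real grammar would generalize over many such patterns rather than matching one fixed subgraph.

    # Illustrative only: phrasing "does this provenance graph contain the
    # subgraph a given snippet generates?" as labelled subgraph isomorphism.
    # Node/edge labels are invented; real provenance graphs are far richer.
    import networkx as nx
    from networkx.algorithms import isomorphism

    # Tiny hand-built "pattern": a process reads a file, then writes a socket.
    pattern = nx.DiGraph()
    pattern.add_node("p", type="process")
    pattern.add_node("f", type="file")
    pattern.add_node("s", type="socket")
    pattern.add_edge("f", "p", rel="used")
    pattern.add_edge("p", "s", rel="wasGeneratedBy")

    # A larger "observed" provenance graph in which the pattern is embedded.
    observed = nx.DiGraph()
    observed.add_node("proc1", type="process")
    observed.add_node("etc_passwd", type="file")
    observed.add_node("sock1", type="socket")
    observed.add_node("log", type="file")
    observed.add_edge("etc_passwd", "proc1", rel="used")
    observed.add_edge("proc1", "sock1", rel="wasGeneratedBy")
    observed.add_edge("proc1", "log", rel="wasGeneratedBy")

    matcher = isomorphism.DiGraphMatcher(
        observed, pattern,
        node_match=isomorphism.categorical_node_match("type", None),
        edge_match=isomorphism.categorical_edge_match("rel", None),
    )
    print("pattern found:", matcher.subgraph_is_isomorphic())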
Contact: Thomas Pasquier (tfjmp@cs.ubc.ca).
Contact: Reto Achermann (achreto@cs.ubc.ca).
Unikernels [1] offer an alternative to containers (e.g., Docker) as a cloud application deployment solution. In parallel, the concept of "fog" computing is emerging, where services are deployed directly into the Internet of Things (IoT) infrastructure to improve latency, privacy, and partition resilience (among other things). This suggests the following questions: "Could we allow the dynamic migration of services from the cloud to edge devices and directly into end devices?" and "Could this be done while maintaining the use of programming languages and skills backed by a relatively cheap and abundant workforce?"
Transpilation techniques [2] combined with a unikernel designed for extremely low resource consumption [3] could be a step in this direction. A preliminary proof of concept demonstrated the possibility of transforming a PHP application into a self-contained virtual machine image of a few MB. We want to go beyond that proof of concept and build a prototype that demonstrates effective service migration in this manner.
1 - Madhavapeddy, Anil, et al. "Unikernels: Library operating systems for the cloud." ACM SIGPLAN Notices 48.4 (2013): 461-472.
2 - Zhao, Haiping, et al. "The HipHop compiler for PHP." ACM SIGPLAN Notices 47.10 (2012).
3 - Bratterud, Alfred, et al. "IncludeOS: A minimal, resource efficient unikernel for cloud services." 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2015.
There exist many systems papers of the form, "We wanted to solve some problem, so we modified the kernel to produce a bunch of data, and then we used that data to do something." I'd like to see how many of these projects could be done via a single provenance capture system. CamFlow is a selective whole-system provenance capture system. It also has a lovely front-end display engine. How many special-purpose systems could be replaced by scripts running over CamFlow data? I could imagine doing this dynamically over streaming data (using CamQuery) or statically over collected data.
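To give a flavour of what such a script might look like, here is a minimal sketch that summarizes the record types in a W3C PROV-JSON dump. CamFlow can emit PROV-JSON, but the exact shape of its records should be checked against real output; nothing below depends on CamFlow-specific fields, only on the generic PROV-JSON layout in which top-level keys are record types mapping ids to attribute dictionaries.

    # Minimal "script over provenance data": count record types in a
    # W3C PROV-JSON file (e.g., a CamFlow capture exported as PROV-JSON).
    import json
    import sys
    from collections import Counter

    def summarize(prov_path):
        with open(prov_path) as f:
            prov = json.load(f)
        # Top-level keys are record types (entity, activity, used, ...).
        return Counter({rec_type: len(records) for rec_type, records in prov.items()})

    if __name__ == "__main__":
        for rec_type, n in summarize(sys.argv[1]).most_common():
            print(f"{rec_type:20s} {n}")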
Warning: I have a strong vested interest in this project. The upside is that you are likely to get lots of attention; the downside is that you are likely to get lots of attention.
At a high level, the goal of this project is to extract high-level specifications from device drivers. There is a large body of work identifying and leveraging copy-paste errors in operating systems and device drivers to find and fix bugs (e.g., CP-Miner, "An empirical study of operating system errors"). We are particularly interested in such errors as they appear in device drivers. While we are actively engaged in developing a system that synthesizes device drivers from high-level specifications, writing device driver specifications is still tedious. The question we ask here is whether we can leverage copy-pasted code (or other commonalities) in existing device drivers to automatically extract high-level specifications of a driver, so that we could then synthesize similar drivers or drivers for other operating systems.
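Purely to make "leverage copy-pasted code" concrete, here is a toy sketch of the very first step: measuring token-level commonality between two driver snippets using k-token shingles and Jaccard similarity. Real clone detectors such as CP-Miner are far more sophisticated, and extracting specifications from the detected commonality is the actual research problem.

    # Toy illustration of finding copy-paste-style commonality between two
    # driver source fragments via token shingles and Jaccard similarity.
    import re

    def shingles(source: str, k: int = 5) -> set:
        """Return the set of k-token windows ("shingles") in a source string."""
        tokens = re.findall(r"[A-Za-z_]\w*|\S", source)
        return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

    def similarity(src_a: str, src_b: str) -> float:
        """Jaccard similarity between the shingle sets of two sources."""
        a, b = shingles(src_a), shingles(src_b)
        return len(a & b) / len(a | b) if (a | b) else 0.0

    if __name__ == "__main__":
        drv_a = "ret = pci_enable_device(pdev); if (ret) goto err_out;"
        drv_b = "ret = pci_enable_device(pdev); if (ret) goto fail;"
        print(f"similarity: {similarity(drv_a, drv_b):.2f}")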
Nearly every machine on which a programmer debugs code has multiple cores. However, most of those cores remain idle while the programmer decides what steps to take in tracking down a bug. In this project, we ask the question, "Can we use techniques from program synthesis, static analysis, dynamic analysis, etc. to automatically use extra cores to provide useful information to the human who is debugging the code?"
We have moved from a world of servers to microservices and, more recently, function-as-a-service offerings. As such, much research attention focuses on how to make these platforms perform better. Assessing improvements inside an Amazon or Google is relatively straightforward, but evaluating improvements is nearly impossible for academic research groups. Imagine for a moment that you could spin up a set of microservices (or functions), each of which exhibited exactly the behavior you want to model. If you could make this a reality, then anyone could run good evaluations.
The following is written in the context of microservices, but you could do the exact same thing for function-as-a-service platforms. You might approach this any way you like, but one approach might be to:
A) Identify the key parameters that differentiate different microservices (this is mostly a literature review).
B) Similarly, identify the key characteristics of workloads that people run on collections of microservices.
C) Develop a microserver parameterized according to what was uncovered in A (a rough sketch appears below).
D) Develop a workload generator using what was determined in B.
E) Write an engine that lets you specify a workload and a collection of parameterized microservices on which to run it, and that generates the experimental deployment.
An alternative way to implement this might be to write even higher-level specifications for the workload and microservice and to use modern program synthesis approaches to generate an implementation of a workload generator (microserver) that meets the specification.
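As a hedged sketch of step C above (and, implicitly, of what step D would drive), the following stands up a single "microserver" whose behavior is governed entirely by parameters rather than by application logic. The particular parameters chosen here (service time, payload size, downstream fan-out) are placeholders; the real set should come out of the literature survey in steps A and B.

    # Sketch of a parameterized microserver: behavior is set by PARAMS, not code.
    import json
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PARAMS = {
        "service_time_ms": 5,    # simulated per-request compute time
        "payload_bytes": 256,    # size of the response body
        "downstream": [],        # URLs of microservers this one would call (not exercised here)
    }

    class MicroserverHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(PARAMS["service_time_ms"] / 1000.0)  # emulate work
            body = b"x" * PARAMS["payload_bytes"]
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep benchmark runs quiet

    if __name__ == "__main__":
        print("microserver parameters:", json.dumps(PARAMS))
        HTTPServer(("127.0.0.1", 8080), MicroserverHandler).serve_forever()

A workload generator (step D) would then be a small client issuing requests against a collection of these, parameterized by arrival process and call graph.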
CHERI (Capability Hardware Enhanced RISC Instructions) is a capability-based hardware extension that provides fine-grained memory protection within a single address space.
Tinkertoy is a set of operating system components from which one can assemble an operating system designed for 4th generation IoT devices. Tinkertoy does not have a virtual memory system.
Design a system to provide protected address spaces in Tinkertoy using CHERI.
Device drivers make up a significant portion of operating systems code, and they are disproportionately responsible for bugs in those systems. One approach to protecting the rest of the kernel from bugs in drivers is kernel compartmentalization, using techniques such as KSplit. Such approaches highlight just how intertwined device drivers are with the rest of the kernel. The goal of this project is to precisely identify and quantify these dependencies.
The immediate use case for this analysis is to assist an ongoing project in device driver synthesis. One of the key challenges in moving drivers between different systems is translating calls from a driver into the surrounding operating system. We want to understand the scope of these interfaces so that we can design an approach that will be feasible across a large number of systems.
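As one crude but concrete way to start quantifying these dependencies, the sketch below counts the undefined symbols a compiled module imports. It misses inline functions, macros, and callback-style coupling through function pointers, which is exactly the gap a more serious analysis would need to close.

    # Rough first-order measure of a driver's dependence on the kernel: the
    # undefined symbols its compiled module (.ko) imports via "nm".
    import subprocess
    import sys

    def undefined_symbols(module_path: str) -> list:
        out = subprocess.run(["nm", "--undefined-only", module_path],
                             capture_output=True, text=True, check=True).stdout
        # nm prints lines like "                 U kmalloc"; keep the symbol name.
        return sorted(line.split()[-1] for line in out.splitlines() if line.strip())

    if __name__ == "__main__":
        syms = undefined_symbols(sys.argv[1])   # e.g., a path to some driver .ko
        print(f"{len(syms)} imported symbols")
        for name in syms[:20]:
            print(" ", name)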
As mentioned in previous project descriptions, we have an ongoing project to synthesize device drivers from specifications. This project is its dual: synthesizing emulated devices. The question we would like to ask is, "To what extent can we generate an emulated device for the QEMU virtual machine and/or the Arm Fast Models simulator from the specification we use to synthesize device drivers?"
The beauty of program synthesis is that verification is a key part of the synthesis pipeline. By using verification, developers can prove that their programs adhere to their specification. While applying this technique to user code is reasonably straightforward (though nontrivial), doing the same for system software requires both a specification of the software and a detailed enough specification of the underlying hardware, potentially including the caches, buffers, and architectural state that can influence the result of software execution. To expand the reach of program synthesis, the goal of this project is to develop a useful, but simplified, specification of the memory subsystem of modern processors and to demonstrate its utility.
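To give a sense of the level of abstraction such a specification might aim for, here is a deliberately tiny, executable sketch of one candidate model: per-core FIFO store buffers over a shared memory, ignoring caches, coherence protocols, and speculation entirely. It illustrates the flavour of a "simplified memory subsystem specification," not a faithful description of any real processor, and a real artifact would more likely live in a verification-friendly language.

    # Tiny executable model: per-core store buffers over shared memory
    # (TSO-flavoured), with explicit drain and fence steps.
    from collections import deque

    class SimpleTSOMemory:
        def __init__(self, num_cores: int):
            self.mem = {}                                        # shared memory
            self.buffers = [deque() for _ in range(num_cores)]   # per-core store buffers

        def store(self, core: int, addr, value):
            self.buffers[core].append((addr, value))   # buffered, not yet globally visible

        def load(self, core: int, addr):
            # Forward from this core's own buffer first (youngest matching store).
            for a, v in reversed(self.buffers[core]):
                if a == addr:
                    return v
            return self.mem.get(addr, 0)

        def flush_one(self, core: int):
            # Drain step: the oldest buffered store becomes globally visible.
            if self.buffers[core]:
                addr, value = self.buffers[core].popleft()
                self.mem[addr] = value

        def fence(self, core: int):
            while self.buffers[core]:
                self.flush_one(core)

    if __name__ == "__main__":
        m = SimpleTSOMemory(num_cores=2)
        m.store(0, "x", 1)
        print(m.load(0, "x"))   # 1: forwarded from core 0's buffer
        print(m.load(1, "x"))   # 0: core 1 cannot yet see the store
        m.fence(0)
        print(m.load(1, "x"))   # 1: visible after the drain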