CPSC 508 Final Project (2022)

Project Proposal & Research Plan Due: 5:00 PM October 7, 2022

1st Status Meeting: early in the week of October 18, 2022

2nd Status Meeting: Week of November 1, 2022

First Draft Due: 9:00 PM November 27, 2022

In-class Presentations: November 29/December 1, 2022

Final Project Due: 5:00 PM December 19, 2022

The goal of the final project is to provide the opportunity for you to conduct systems research. The size of the project can vary, but thinking of it as a conference paper is probably a good model. This History of a Paper traces a paper from initial submission to final version: it includes the original (rejected) submission (extended abstract) and its reviews, a second (accepted) submission and its reviews, and the final paper. This collection should give you an idea of how to give and respond to constructive criticism. It will also give you a sense of what we mean by "conference paper."

Teams / Project

Teams: You may choose to complete your final project alone or as a team. However, we strongly encourage you to do the final project in teams of two. If you feel that you have a project sufficiently large to warrant more people, come talk to us.

Projects overlapping other courses/research/...: Projects may also be undertaken in cooperation with other graduate courses, but any such project must be approved by the professors of both courses. Not surprisingly, we expect more depth and work from a project that satisfies two class requirements. Similarly, if you wish to undertake a project related to your own research, we will permit it, but you must demonstrate how what you've learned in CPSC 508 influences your work and/or ways in which your research would have been different had you not also been conducting a project in CPSC 508. In other words, your project in CPSC 508 must extend the work you would normally have done in some new and/or different way. Either way, please come and talk to us if your project will overlap with other courses or research.

Deliverables

For this project, you need to pose a question, design a framework in which to answer the question, conduct the research, and write up your experience and results. There will be five deliverables for this project.

  1. Project Proposal and Research Plan (20%)

    Hint: if you hand it in early, we will try to give you feedback within 24 hours. That will maximize the time you have to work on the project. We therefore strongly encourage you to try to hand it in early (unheard of, we know).

    The proposal part should begin with a single-paragraph fairy tale: what is the story you would like to tell?

    Then, more formally or seriously, describe your project. You should clearly motivate and state the research question you are investigating. Provide a few sentences of explanation about why you find this an interesting question, why it is important, and how it qualifies as research.

    The research plan is a more comprehensive document. It should include the following components (the numbers in parentheses are rough estimates of the number of pages you might need for each section; in practice, you should write exactly as much as you need to answer the questions we pose).

  2. Status Meeting

    We encourage you to come talk to us about your project or schedule other meetings as the need arises. At a minimum, I want to meet with you twice before the extended abstract is due. See the syllabus. These meetings are for your benefit. We expect to answer questions you may have, ask you questions about what you've done, brainstorm about what to do next, etc. If you haven't done anything, you will get little value out of these meetings. Come with questions I can help you answer.

  3. First Draft (30%)

    This is the version of the paper that we will review at the "Mock Program Committee." You should have some of your research completed by this point. The draft should contain all the parts of the paper, although it may have preliminary results and may be missing some results. For any results you do not have, we want you to clearly state that you do not have them, but we still want you to write the results section. That is, we want you to think through what results you hope to have and how you'll present them. This can be super fun, because you can MAKE UP WHATEVER RESULTS YOU WANT. The reason we do this is that when you get actual results, you then have something against which to compare them. When your actual results do not match your predictions, one of two things is true: either your initial intuition was wrong, or your system/tests are wrong. If the former is true, that's really interesting -- your final paper will then explain what you originally thought and why it was not right. If the latter is true, you get to do some debugging.

    This first submission should contain a complete introduction, background, description of your research, related work, and (preliminary) conclusions. You should be able to write significant parts of this immediately after your project proposal is turned in, so please, please, please don't write it all the night before it is due. The better and more complete the first draft, the more valuable the input that we and your classmates can give you.

  4. In-class Presentation (10%)

    Each group will present a short talk on their research during class. You should plan on approximately a 10-minute presentation and 2-5 minutes for questions and answers. You can think of the in-class presentation as a short conference presentation. You will not have time to present all the details and subtleties of your work, but you should be able to motivate the audience and explain the important results of your work. After your presentation, your classmates should want to read your final report. The presentation is a great way to make sure that they understand what you're trying to do, so that there is no confusion when they read/review your project.

  5. Final Report (40%)

    The final report is a research paper. I expect that most reports will be approximately 10-15 "conference pages," including graphs, tables, diagrams, and references. You should complete the writing early enough that you have time to reread your work and critique it with the rigor that you applied to Assignment 1. Be honest. State the shortcomings in your work. Discuss follow-on projects. I expect that several of these reports will be suitable for submission to a conference, and I will be happy to work with you to turn them into submissions.

    Part of your final report grade will be based upon how well you address comments raised by the program committee. Do not ignore my comments or the reviewers' comments!

Project Suggestions

We suggest some topics below (we may add to this list after it is up on the web site, so it makes sense to check there if you are stuck for project ideas). You need not pick your final project from this list, but if you decide on a project not on this list, please check with us before fully committing to the project. The key characteristics of a project should be:

  1. The work can reasonably be completed in two months.
  2. We have access to the required hardware and software.
  3. The research question has something to do with systems (I'm willing to give a fair bit of freedom here, but if there are any questions, please check with me).
  4. The project is structured in such a way that you can have tangible results. (No big idea papers probably.)
  5. You will learn something from undertaking this project.

  1. Securing Containerized Computation

    Leveraging the recent development of Kernel Runtime Security Instrumentation by Google and looking at security namespacing, can you propose a solution that allows containers to load Linux Security Module (LSM) programs and Berkeley Packet Filter (BPF) programs on a per-container (i.e., per-namespace) basis? You may want to look at existing cgroup-attached BPF programs and understand the requirements necessary to enable such a feature in the eBPF-LSM context. Can you reproduce the use cases presented here? Can you present new interesting use cases?
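
    To make this concrete, here is a minimal sketch of an eBPF-LSM program, assuming a kernel built with CONFIG_BPF_LSM and the libbpf toolchain; the hook and macros are real, but the policy is a placeholder, and the per-namespace lookup is exactly the open research question:

      /* lsm_sketch.bpf.c -- minimal eBPF-LSM sketch (illustrative) */
      #include "vmlinux.h"
      #include <bpf/bpf_helpers.h>
      #include <bpf/bpf_tracing.h>

      SEC("lsm/file_open")
      int BPF_PROG(inspect_open, struct file *file)
      {
          /* Placeholder policy: a per-container version would look up
           * the calling task's namespace and consult a per-namespace
           * policy map here, returning -EPERM to deny the open. */
          return 0; /* 0 allows the operation */
      }

      char LICENSE[] SEC("license") = "GPL";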

    Contact: Thomas Pasquier (tfjmp@cs.ubc.ca).

  2. Execution Grammars

    Expanding on prior work, Thomas Pasquier's group has started to build a tool that creates a "graph grammar" describing the provenance subgraph corresponding to a given system call. Given code snippets or malware binaries, it should be possible to build a grammar representing their execution and therefore to detect the subgraphs they generate. How could one build such a grammar? Can this be done at scale (e.g., given a malware library)? Can you automatically extract a generalization that matches a given malware family, to keep detection rates high in the case of a new malware variant? Could this solution remove the need for the human expertise that POIROT requires?

    Contact: Thomas Pasquier (tfjmp@cs.ubc.ca).

  3. Specification of Translation Hardware

    Hardware vendors have come up with various kinds of memory protection units and translation hardware. For example, segments provide variable-sized, contiguous regions to be translated; MMUs provide page-based translations for the CPU or I/O devices. System software is required to correctly program these translation units.

    At a high level, each of these units implements a map from addresses in one address space to addresses in a different address space, or reports an error (note: we do not consider broadcast/multicast memory addresses at the moment). For example, the processor's MMU translates virtual addresses used by applications to physical addresses used by the processor hardware. The MMU does not operate on arbitrary addresses, but on pre-defined ranges (page sizes), and its configuration is defined by a multi-level page table.
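
    The following toy model illustrates that "map or fault" contract (a sketch only; the flat table and 4 KiB page size are illustrative, not any particular architecture):

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      #define PAGE_SHIFT 12 /* 4 KiB pages, for illustration */

      typedef struct { uint64_t vpn; uint64_t pfn; bool valid; } pte_t;

      /* Either produce an output address or report a fault: the
       * "map or error" behavior described above. */
      static bool translate(const pte_t *table, size_t n,
                            uint64_t va, uint64_t *pa)
      {
          uint64_t vpn = va >> PAGE_SHIFT;
          uint64_t off = va & ((1ULL << PAGE_SHIFT) - 1);
          for (size_t i = 0; i < n; i++) {
              if (table[i].valid && table[i].vpn == vpn) {
                  *pa = (table[i].pfn << PAGE_SHIFT) | off;
                  return true;
              }
          }
          return false; /* translation fault */
      }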
    In this space, there are a couple of possible projects:

  4. uKernel generation

    In Liedtke's paper on microkernel construction, you learned that uKernels are inherently non-portable (for performance reasons). There are a few directions one might want to explore here.

  5. Companion Kernel Threads

    Processor mode switches have become inherently more costly due to the isolation of page tables in response to Spectre/Meltdown-style attacks. Performing a syscall now requires both a processor mode switch and an additional context switch. How bad is it?

    Modern processors often come with hyperthreads: two execution contexts sharing a core. The question is: can we use a hyperthread as a companion kernel thread and use shared-memory, queue-based communication between application and kernel threads to asynchronously execute syscalls? Could this be an event-based mechanism in combination with user-level threads? To what extent can we relax the companion association and use an N-M architecture, where N user-level threads issue system calls to M handlers in the kernel? Note: this could lead to an application of Snap for storage.
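
    As a starting point, the shared-memory channel might look like the following single-producer/single-consumer ring (a sketch; the request layout and ring size are made up, and a real design must also decide how completions flow back):

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      #define RING_SLOTS 256 /* power of two, illustrative */

      struct req { uint64_t sysno; uint64_t args[6]; };

      struct ring {
          _Atomic uint32_t head; /* advanced by the application thread */
          _Atomic uint32_t tail; /* advanced by the companion thread */
          struct req slots[RING_SLOTS];
      };

      /* Application side: post a syscall request without a mode switch. */
      static bool ring_push(struct ring *r, const struct req *q)
      {
          uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
          uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
          if (h - t == RING_SLOTS)
              return false; /* full; caller may spin or fall back to a trap */
          r->slots[h % RING_SLOTS] = *q;
          atomic_store_explicit(&r->head, h + 1, memory_order_release);
          return true;
      }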

  6. Process overhead in operating systems

    Multi-tenant databases, such as MongoDB, often host multiple database tenants on a single machine: this provides better resource utilization. At the same time, they face the following dilemma: hosting multiple tenants within a single process has obvious security and safety problems, while hosting each tenant in a separate process raises performance concerns. Your goal is to evaluate the overhead of running many processes on a modern operating system. Design an experimental framework that lets you vary how many processes are active (or idle), their total virtual memory size, their working-set size, and other parameters. Think about key performance metrics users might care about: e.g., throughput, response time, etc. Run experiments to determine how many processes can co-exist on a system before performance begins to degrade. Trace the sources of the overhead down to kernel data structures and resource-management policies. Other aspects that might be worth exploring: using containers (e.g., Docker) to run your database, or using virtual machines, which provide stronger isolation but may introduce additional cost.
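
    A minimal harness for such an experiment might look like this (a sketch: the parameters and the foreground measurement are placeholders you would flesh out):

      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
          int nprocs = atoi(argv[1]);          /* background processes */
          size_t wss = (size_t)atoll(argv[2]); /* working-set size (bytes) */

          for (int i = 0; i < nprocs; i++) {
              if (fork() == 0) {
                  char *buf = malloc(wss);
                  memset(buf, 1, wss); /* fault the working set in */
                  pause();             /* then sit idle */
                  _exit(0);
              }
          }
          /* ...run and time the foreground workload here, then vary
           * nprocs/wss and watch where throughput starts to degrade... */
          return 0;
      }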

  7. Secure Tenant Isolation in a Single Process

    Assume that, in the project above, someone discovered that running multiple tenants in a single process is beneficial for performance, but comes at the cost of weaker isolation between tenants. The goal of this project is to try to isolate tenants within a single process using modern hardware technologies, such as Intel Memory Protection Extensions (MPX), which allow accesses only within a certain range of memory.
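
    The project names MPX; as a flavor of what intra-process isolation looks like on Linux today, here is a sketch using a related mechanism, memory protection keys (the pkey_* calls). The region size and tenant roles are illustrative:

      #define _GNU_SOURCE
      #include <stddef.h>
      #include <sys/mman.h>

      int main(void)
      {
          size_t len = 1 << 20; /* one tenant's heap region, illustrative */
          void *heap = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          int key = pkey_alloc(0, 0); /* allocate a protection key */
          pkey_mprotect(heap, len, PROT_READ | PROT_WRITE, key);

          pkey_set(key, PKEY_DISABLE_ACCESS); /* fence off tenant A's heap */
          /* ...run tenant B here: any access to `heap` now faults... */
          pkey_set(key, 0);                   /* restore access for tenant A */
          return 0;
      }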

  8. Performance Model for Near-Memory / In-Memory Computation / Accelerators

    Running algorithms on a general-purpose processor may be suboptimal due to lack of parallelism, limited memory bandwidth, or other inefficiencies. Accelerators provide a more efficient way to perform some computations. Yet to use an accelerator, the application must decide beforehand which part of the algorithm to offload. Modern compilers already have certain auto-vectorization capabilities that allow the use of vector instructions as one form of acceleration. Whether or not there is a performance gain depends on various factors, such as the compute capabilities of the accelerator, potential memory-copy overhead, etc. Can we come up with a runtime system that automatically decides which parts of an application are executed on the CPU and which parts are executed by the accelerator? For this we need two things: 1) a model of the machine, and 2) a way to express the algorithm (simulated, via annotations, or in a higher-level language).
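
    The machine model might start as simple as the following back-of-the-envelope offload rule (a sketch; the parameters are illustrative and would in practice be measured or simulated):

      #include <stdbool.h>

      struct machine {
          double cpu_flops;    /* CPU throughput, FLOP/s */
          double accel_flops;  /* accelerator throughput, FLOP/s */
          double copy_bytes_s; /* host<->accelerator copy bandwidth, B/s */
      };

      /* Offload only if accelerator time plus transfer time beats the CPU. */
      static bool should_offload(const struct machine *m,
                                 double flops, double bytes_moved)
      {
          double t_cpu   = flops / m->cpu_flops;
          double t_accel = flops / m->accel_flops
                         + bytes_moved / m->copy_bytes_s;
          return t_accel < t_cpu;
      }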

  9. One Queue to rule them all

    There exist many descriptor queues in an operating system. For example, sending a network packet requires that the driver format an entry in a descriptor queue and tell the network card where to find the newly formatted packet. Often there is a pre-allocated descriptor ring, and the driver communicates that there is a new queue element by moving the head pointer of the queue. Conceptually, this transfers ownership of the buffer from the driver to the device. CleanQ presents a way to formalize the semantics of ownership transfer in the context of descriptor queues, with proofs down to the C code for certain queues. For this project, design and develop a way to specify a device's descriptor queue, so that you can generate code that performs the queue operations, including formatting of the descriptors.
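
    For intuition, the ownership handoff being formalized typically looks like this on the driver side (a sketch with made-up field names; real devices define their own descriptor layouts and doorbell registers):

      #include <stdint.h>

      struct desc {
          uint64_t buf_addr; /* physical address of the packet buffer */
          uint16_t len;      /* bytes to send */
          uint16_t flags;    /* device-specific bits, e.g. "ready" */
      };

      struct txq {
          struct desc *ring;           /* pre-allocated descriptor ring */
          uint32_t size;               /* number of slots */
          uint32_t head;               /* next slot the driver fills */
          volatile uint32_t *doorbell; /* device register holding the head */
      };

      /* After the doorbell write, the buffer belongs to the device: the
       * driver must not touch it until the device signals completion. */
      static void tx_enqueue(struct txq *q, uint64_t buf, uint16_t len)
      {
          struct desc *d = &q->ring[q->head % q->size];
          d->buf_addr = buf;
          d->len      = len;
          d->flags    = 1; /* hypothetical "ready" bit */
          q->head++;
          *q->doorbell = q->head; /* tell the device where to look */
      }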

  10. Cloud Computing in the Wild

    Unikernels [1] offer an alternative to containers (e.g., Docker) as a cloud application deployment solution. In parallel, the concept of "fog" computing is emerging, where services are deployed directly into the Internet of Things (IoT) infrastructure to improve latency, privacy, and partition resilience (among other things). This suggests the following questions: "Could we allow the dynamic migration of services from the cloud, to edge devices, and directly into end devices?" and "Could this be done while maintaining the use of programming languages and skills backed by a relatively cheap and abundant workforce?"

    Transpilation techniques [2] combined with a unikernel designed for extremely low resource consumption [3] could be a step in this direction. A preliminary proof of concept demonstrated the possibility of transforming a PHP application into a self-contained virtual machine image of a few MB. We want to go beyond that proof of concept and build a prototype that demonstrates effective service migration in this manner.

    1 - Madhavapeddy, Anil, et al. "Unikernels: Library operating systems for the cloud." ACM SIGPLAN Notices 48.4 (2013): 461-472.

    2 - Zhao, Haiping, et al. "The HipHop compiler for PHP." ACM SIGPLAN Notices 47.10 (2012).

    3 - Bratterud, Alfred, et al. "IncludeOS: A minimal, resource efficient unikernel for cloud services." 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2015.

  11. Real End-to-end Provenance

    Data provenance is metadata that describes how a digital artifact came to be in its present state. One problem with existing provenance capture systems is that they capture only local provenance: provenance from a particular language (e.g., R, Python) or from a particular workflow system (e.g., VisTrails). However, once you copy files, use multiple languages, or connect different programs together in a script, you run the risk of breaking the provenance chain. We believe that whole-system provenance (e.g., CamFlow) could provide the glue that connects different provenance systems. Your goal is to demonstrate some application that uses provenance from multiple different collection sources to do something interesting. For example, given a shell script that calls both R and Python programs, can you automatically build a container or VM that precisely and exactly reproduces the experiment? Alternately, could you use provenance to build a debugging tool? If you're interested in this project, come talk to me.

  12. Using Provenance to Solve OS Problems

    There exist many systems papers of the form, "We wanted to solve some problem, so we modified the kernel to produce a bunch of data, and then we used that data to do something." I'd like to see how many of these projects could be done via a single provenance capture system. CamFlow is a selective whole-system provenance capture system. It also has a lovely front-end display engine. How many special-purpose systems could be replaced by scripts running over CamFlow data? I could imagine doing this dynamically over streaming data (using CamQuery) or statically over collected data.

    1. For example, prefetching files requires that you know what files are likely to be accessed before programs actually access them -- CamFlow captures much of that data. So, see if you can replicate the work in "An Analytical Approach to File Prefetching" (USENIX 1997) using CamFlow. Here are other papers on file prefetching to examine:
      • Marginal Cost-Benefit Analysis for Predictive File Prefetching (ACSME 2003)
      • Design and Implementation of Predictive File Prefetching (USENIX 2002)
    2. Another area where provenance might be useful is in cache replacement algorithms -- if you knew what you might need again soon, you would keep it in your cache. Look for papers on caching, such as:
      • A study of integrated prefetching and caching strategies (Sigmetrics PER 1995)
      • Informed prefetching and caching (SOSP 1995)
      • Application controlled prefetching and caching (USENIX 2002)
    3. The Coda file system was designed to help users work in a disconnected mode. One component of that system was a hoarding mechanism where the system would try to figure out what files you were going to need to function while disconnected. It seems that one could exploit provenance to perform better hoarding. Do it!

    Warning: I have a strong vested interest in this project. The upside is that you are likely to get lots of attention; the downside is that you are likely to get lots of attention.

  13. Storing whole system provenance on blockchain

    We are frequently asked how we maintain the integrity of provenance. Since CamFlow provides a mechanism for shipping provenance out to a remote site, you could conceivably store provenance on a blockchain. However, it's also possible that the performance of blockchain storage will be too slow. Read the CamFlow paper and as much blockchain background as necessary. Can you make this work? (Consider alternatives to proof-of-work; otherwise this is a non-starter.) (There is a recent paper that does something like this; I was not impressed with it, so I'm pretty confident there is something more interesting to be done.)

  14. Prove that LSM-based provenance capture is guaranteed to detect a security breach.

    This is a two-step process. First, using the methodology from this paper, show that the current LSM interface captures all security-related flows in the kernel. Next, given provenance captured at these points, prove (or disprove) that a security violation must show up as an anomaly in the provenance graph.

  15. Generating a HAL for ARMv8

    There exists a machine-readable specification of the ARMv8 architecture. From this specification, we can build an architectural simulator of an ARMv8 processor. Can we use this information to generate a hardware abstraction layer and/or certain boot or configuration sequences that, for example, switch the execution level, configure architectural timers, etc.?

  16. Tiny OS Components for Tiny Processors

    Pick some special-purpose processor. Design a tiny operating system that either A) makes it easier to develop applications for the processor, or B) allows it to seamlessly communicate with a general-purpose processor and OS. Once you've established the functionality you need for your tiny OS, design a set of even tinier components that can be assembled to provide that functionality.

  17. Access Patterns from JIT

    Interpreted languages such as Python, and languages with an intermediate bytecode representation such as Java, are not compiled ahead of time to the target machine's architecture. Instead, the runtime executes the statements or bytecode by emulating them on an abstract machine (e.g., the JVM). This process introduces overhead. Just-in-time (JIT) compilation generates native machine code, which is then executed directly rather than being interpreted. Develop a way to extract access patterns, e.g., looping over an array. Can we identify patterns that are good candidates for accelerators such as GPUs or near-memory processing devices? The idea here is to identify parts of algorithms that can be run more efficiently on some form of accelerator.

  18. OpenMP

    OpenMP is a standard for parallel programming. Users add pragmas to code, and those pragmas direct the compiler to emit code to parallelize, e.g., a loop. Instead of directly executing the loop, the loop body is wrapped in a function. The compiler then uses the OpenMP support library (e.g., libgomp for GCC) and passes the wrapper function as a function to be run by the threads managed by the runtime. As in the previous project, identify techniques to extract access patterns. Then develop ways to offload specific patterns to one or more accelerators, e.g., a GPU. You may find it handy to include a survey of existing offloading techniques in compilers such as GCC.
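
    Here is the idea sketched in code (the pragma is standard OpenMP; the outlined function below it is conceptual and does not reproduce the exact GCC/libgomp calling convention):

      void scale(double *a, int n, double k)
      {
          #pragma omp parallel for
          for (int i = 0; i < n; i++)
              a[i] *= k;
      }

      /* Conceptually, the compiler outlines the loop body into a wrapper
       * like this and hands it, plus a chunk of the iteration space, to
       * each thread managed by the runtime. */
      struct loop_args { double *a; double k; };

      static void scale_body(struct loop_args *p, int lo, int hi)
      {
          for (int i = lo; i < hi; i++)
              p->a[i] *= p->k;
      }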

  19. Snap for Storage

    The Snap paper advocates a uKernel-style approach for networking. Can you develop a similar system for storage? How does its performance compare to the traditional Linux storage stack? What new functionality can we achieve with this design? Where are its drawbacks?

  20. NVM Write Cache

    Non-volatile memory (NVM) is an emerging memory technology resembling both storage and memory. Like storage, it does not lose data upon a crash. Like memory (e.g., DRAM), it sits on a memory bus and is byte-addressable. While NVM has been in research for more than a decade, in April 2019 Intel (in partnership with Micron) released the first commercial product: Optane NVM (also known as 3D XPoint). Despite tremendous excitement among researchers and practitioners alike, the jury is still out on the best use for NVM and how it fits into the storage hierarchy. This project will answer part of this big question in the context of a specific application: a file system cache.

    Intel implemented a write cache, using NVRAM as a layer that sits on top of a block storage device, such as an SSD or HDD. The write cache was implemented as part of the Linux device mapper. It sits next to the kernel buffer cache (which is in DRAM) and caches blocks that are being written to the block storage device (flushing them to the device in the background). The write cache does not cache blocks that are being read. Your goal is to evaluate the write cache to determine when it improves performance and when it does not. How could you get the improvement without the penalty? Does it make sense to cache reads as well, and if so, when? Choose storage-intensive workloads, such as YCSB or others.
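
    For reference, the write cache in question is the dm-writecache device-mapper target; one illustrative way to stack it from the command line (device names are hypothetical) is:

      # Stack dm-writecache over /dev/sdb, using persistent memory
      # /dev/pmem0 as the cache, with a 4096-byte block size.
      dmsetup create wc --table "0 $(blockdev --getsz /dev/sdb) \
          writecache p /dev/sdb /dev/pmem0 4096 0"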

  21. Leveraging Copy-Paste Errors in Device Drivers

    At a high level, the goal of this project is to extract high-level specifications from device drivers. There is a large body of work identifying and leveraging copy-paste errors in operating systems and device drivers to find/fix bugs, e.g., CP-Miner and "An empirical study of operating system errors." We are particularly interested in such errors as they appear in device drivers. While we are actively engaged in developing a system to synthesize device drivers from high-level specifications, writing device driver specifications is still tedious. The question we ask here is whether we can leverage copy-pasted code (or other commonalities) in existing device drivers to automatically extract high-level specifications of a driver, so that we could then synthesize similar drivers or drivers for other operating systems.

  22. Using multiple cores to improve debuggers

    Nearly every machine on which a programmer debugs code has multiple cores. However, most of those cores remain idle while the programmer decides what steps to take in tracking down a bug. In this project, we ask the question, "Can we use techniques from program synthesis, static analysis, dynamic analysis, etc. to automatically use extra cores to provide useful information to the human who is debugging the code?"

  23. Synthesizing a Microservices or FaaS Testbed

    We have moved from a world of servers to microservices and, more recently, function-as-a-service offerings. As such, much research attention focuses on how to make these platforms perform better. Assessing improvements inside an Amazon or a Google is relatively straightforward, but evaluating improvements is nearly impossible for academic research groups. Imagine for a moment that you could spin up a set of microservices (or functions), each of which exhibited exactly the behavior you want to model. If you could make this a reality, then anyone could run good evaluations.

    The following is written in the context of microservices, but you could do the exact same thing for a function-as-a-service platform. You might approach this any way you like, but one approach might be to: A) identify the key parameters that differentiate microservices (this is mostly a literature review); B) similarly, identify the key characteristics of the workloads that people run on collections of microservices; C) develop a microserver parameterized according to what was uncovered in A; D) develop a workload generator using what was determined in B; and E) write an engine that lets you specify a workload and a collection of parameterized microservices on which to run it, and that generates the experimental deployment.

    An alternative way to implement this might be to write even higher-level specifications for the workload and microservice and to use modern program-synthesis approaches to generate an implementation of a workload generator (microserver) that meets the specification.

  24. CHERI Toy

    CHERI (Capability Hardware Enhanced RISC Instructions) is a capability-based hardware extension. CHERI provides fine-grained memory protection within a single address space.

    Tinkertoy is a set of operating system components from which one can assemble an operating system designed for 4th generation IoT devices. Tinkertoy does not have a virtual memory system.

    Design a system to provide protected address spaces in Tinkertoy using CHERI.

  25. Device Driver Analysis

    Device drivers make up a significant portion of operating systems code, and they are disproportionately responsible for bugs in those systems. One approach to protecting the rest of the kernel from bugs in drivers is kernel compartmentalization, using techniques such as KSplit. Such approaches highlight just how intertwined device drivers are with the rest of the kernel. The goal of this project is to precisely identify and quantify these dependencies.

    The immediate use case for this analysis is to assist an ongoing project in device driver synthesis. One of the key challenges in moving drivers between different systems is translating calls from a driver into the surrounding operating system. We want to understand the scope of these interfaces in order to design an approach that will be feasible across a large number of systems.

  26. Device Generation

    As mentioned in previous project descriptions, we have an ongoing project to synthesize device drivers from specifications. This project is its dual: synthesizing emulated devices. The question we would like to ask is, "To what extent can we generate an emulated device for the QEMU virtual machine and/or the Arm FastModels simulator from the specification we use to synthesize device drivers?"

  27. Memory System Specification

    The beauty of program synthesis is that verification is a key part of the synthesis pipeline. By using verification, developers can prove that their programs adhere to their specifications. While applying this technique to user code is reasonably straightforward (though nontrivial), doing the same for system software requires both a spec of the software and a detailed enough specification of the underlying hardware, potentially including the various caches, buffers, and architectural state that can influence the result of software execution. To expand the reach of program synthesis, the goal of this project is to develop a useful, but simplified, specification of the memory subsystem of modern processors and demonstrate its utility.