Research Projects for Summer 2015

This link is to a page of opportunities outside the CS department...

Please apply by Feb. 15, 2015 via the link at left...

Project titles/advisors

A title/advisor overview of the projects described on this page and in our application:

  • Computational Biology: from Darwin to DP   [Hadas/Wu]
  • OCM: Observationally Cooperative Multithreading   [O'Neill/Stone]
  • Proof Checking for Mathematical English   [Stone]
  • Computer Science Teaching Tips   [Lewis]
  • Trace Repository   [Kuenning]
  • Big Data - Small Energy   [Kuenning]
  • Automatic Text Simplification   [Medero]
  • Computing for Active Transportation   [Medero]
  • Particle Simulations for Dreamworks   [Amelang]
  • Make Stuff at Sandia Go Faster   [Amelang]
  • What Makes You Different from Me, and What Does It Mean about Evolution?   [Wu]
  • Query Processing with the Crowd   [Trushkowsky]
  • Project IMMERSION: Mathematical Modeling for Elementary Schools   [Levy]
  • PaWPal: Productivity and Wellness Pal   [Boerkoel]
  • Robot Brunch   [Boerkoel]
  • Summer Staff   [Wiedermann]
  • Intelligent Music Software   [Keller]
  • Robot Mapping   [Dodds]
  • MyCS: Middle-years Computer Science   [Dodds]
  • PLs: What good are they?   [Wiedermann]
  • Vice-chancellor of Fun positions   [Wiedermann/Lewis/Dodds]

Project Descriptions

Computational Biology: From Darwin to Dynamic Programming

Darwin posited in the "Origin of Species" that two species - such as flowers and the bees that pollinate them - might evolve in tandem for the mutual benefit of both species. Darwin's speculations on "coevolution" of species have been validated by recent advances in computational biology. In particular, our group and others have developed algorithmic methods - many of them using dynamic programming - for understanding the most likely scenarios by which a pair of species coevolved.

These same methods can be used for the related problem of understanding the relationships between species and their genes. In some cases, species acquire genes by "transfers" from other species. Genes can also duplicate, after which one copy can independently evolve a new function. In short, genes and species have relationships that look - in many ways - like the relationships between pairs of species such as flowers and bees.

This research seeks to develop new techniques and software tools to help biologists better understand the relationships between groups, such as in the coevolution of species or the evolution of genes and species. The work involves several different activities, including designing and analyzing algorithms, implementing new software tools, and visualizing large datasets, among others. Students will work on the activities that best match their interests and background.

There are at least five positions available this summer. A few positions will be available to first-year students who have taken CS 60. Other positions will be available to students who have taken "Algorithms" (CS 140/Math 168). MathBio 118 is also good background for this project. Directed by Prof. Libeskind-Hadas in collaboration with Prof. Wu.

Observationally Cooperative Multithreading

Programmers face a multi-core future but no one knows how to take advantage of it. Current techniques for concurrency tend to be primitive, complicated, and error-prone. This project will investigate OCM, a new approach to writing concurrent code without the hassle and bugs of locks. Programmers write code as if each thread has the machine to itself until it explicitly yields control; under the hood, the system detects noninteracting threads and runs them simultaneously.
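As a rough illustration of the programming model described above (not of OCM itself), here is a minimal sketch in Python. It uses generators as cooperative tasks: each task runs uninterrupted until it explicitly yields. The names `transfer` and `run_round_robin` are hypothetical, and this toy scheduler runs tasks strictly one at a time; it makes no attempt to detect noninteracting threads and run them simultaneously, which is the part OCM adds under the hood.

```python
from collections import deque

def transfer(accounts, src, dst, amount, log):
    # Everything up to the next `yield` runs without interference from
    # other tasks, so this check-then-update sequence needs no lock.
    if accounts[src] >= amount:
        accounts[src] -= amount
        accounts[dst] += amount
        log.append((src, dst, amount))
    yield  # explicit yield point: other tasks may run here

def run_round_robin(tasks):
    """Run generator-based tasks one at a time until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task up to its next yield
            queue.append(task)  # not finished yet; schedule it again
        except StopIteration:
            pass                # task finished; drop it

accounts = {"a": 100, "b": 0, "c": 0}
log = []
run_round_robin([
    transfer(accounts, "a", "b", 70, log),
    transfer(accounts, "a", "c", 70, log),
])
print(accounts, log)
```

Because the balance check and the update sit between yield points, the second transfer sees the already-reduced balance and is skipped, so money is never double-spent even though no locks appear in the code.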

Students are needed both to continue past work on the design and implementation of OCM, and to apply OCM to real problems (to measure efficiency and ease-of-use). Work on the system would benefit from CS 131 (Programming Languages) and/or CS 105 (Computer Systems) as background, but the application and measurement work should be accessible to students finishing their first year at HMC. Directed by Profs. O'Neill and Stone.

Proof Checking for Mathematical English

Errors in software become more expensive and dangerous each year. Testing is helpful, but proofs are the only way to guarantee that an algorithm is correct, or that a cryptographic protocol is secure, or that a programming language is safe. Unfortunately, it's hard to get the details right in a big proof; even careful reasoners make mistakes, and even careful readers miss them. Automation is a natural solution, and "proof assistant" systems do exist, but they produce proofs intelligible only to other computers.

I want a system to verify proofs written for humans, proofs in textbooks and research papers. Essentially, I want to go beyond spell-checking and grammar-checking to logic-checking.

This summer we'll build on preliminary infrastructure developed last year, extending the natural language processing in the front end and improving the reasoning (theorem proving) in the back end. (Directed by Prof. Stone)

Computer Science Teaching Tips

Computer Science Teaching Tips is an NSF-sponsored project for disseminating effective computer science teaching practices. Summer research activities will include some subset of the following tasks: (1) interviewing CS educators about their CS teaching, (2) writing tips for the project website, (3) transferring our body of tips from the website to other resource libraries for CS educators, (4) reading CS education research papers, (5) improving the appearance and infrastructure of the existing website, (6) developing interactive visualizations for standardized test data, (7) integrating tips from the website into an existing CS curriculum, (8) thinking about the teaching and learning of CS, (9) working with CS teachers, (10) developing the social media presence of the project, and (11) some other stuff.

As you can see, there isn't a finalized set of tasks we'll do. Summer researchers will help steer the project to achieve the greatest impact for the international CS education community! We might not know exactly what we'll do - but we'll have a LOT of fun!!! Only CS5 (or equivalent) experience is required.

In your essay, please indicate whether you'd be interested in working 10 weeks (~May 18 - July 17) or 7 weeks (after summer math ~ June 8 - July 17). If you have other scheduling constraints this summer please mention this in your essay, as well. (Directed by Prof. Lewis.)

Trace Repository

The SNIA Trace Repository contains several terabytes of data collected by observing the behavior of real file systems. Harvey Mudd is responsible for the management and enhancement of this repository. Students will develop tools related to traces; help write standards; locate, convert, and post new traces; and integrate tools contributed by researchers. (Directed by Prof. Kuenning.)

Big Data - Small Energy

Data centers are the cathedrals of our age, yet they consume far more energy than required under everyday conditions. This new NSF-funded project will investigate how zettabyte- (and larger-) scale systems can be designed and implemented in order to strike different balances between performance and energy consumption. (Directed by Prof. Kuenning.)

Automatic Text Simplification

We have massive quantities of text information available to us, including Wikipedia articles, news articles, government and policy documents, and social media. The vast majority of this information is written at a late-middle-school or early-high-school reading level. But 15% of U.S. adults don't read well enough to be able to make use of all that information. In this project, we will explore ways to use natural language processing to automatically rewrite text to make it easier to read. Possible student contributions will address questions like:

  • How do we know when a reader has trouble with a text?
  • What features of a word or sentence are useful in predicting if a reader will struggle with it?
  • How do humans re-write texts to make them easier to read?

(Directed by Prof. Medero)

Computing for Active Transportation

There was a time when almost half of K-8 students walked or rode their bike to school, but today only 13% of students do. Research shows, though, that students who participate in one of these forms of active transportation do better in school. At the same time, increasing the number of students who walk or bike to school would decrease the car traffic in front of schools, resulting in improved traffic safety, better air quality, and lower transportation costs for parents. In this project, we will learn more about projects like Walking School Buses, which aim to address parents' safety concerns and make it easier for more students to walk to school. Then we will explore how we as computer scientists can support those efforts. For example, what algorithms exist for determining optimal school bus routes? How well do those same algorithms work for identifying routes for groups of children to gather and walk to school together? How would the algorithms need to be modified to identify ideal routes for traditional school buses alongside walking routes? How can we model the expected use of each mode of transportation and the resulting impact on students, parents, and schools? (Directed by Prof. Medero.)

Particle Simulations for Dreamworks

Particle-based simulations are increasingly used to simulate many effects in the film and entertainment industries, including smoke, water, and landslides. Dreamworks is looking to investigate particle-based simulations and data structures in the context of their OpenVDB software for use in movie effects. Summer work will include extensive programming in C++ and at least one presentation to the Dreamworks group in Burbank. Prior experience in shared-memory high-performance computing is highly desired for those looking to participate. (Directed by Prof. Jeff Amelang)

Using GPUs to Make Stuff at Sandia Go Faster, or "Migration of Compute-Intensive Scientific Kernels to Manycore Architectures"

A group from Sandia is looking to redesign some compute-intensive core calculations so that they work well on modern coprocessors (Nvidia Tesla GPU, Xeon Phi). Summer work will include reading and understanding current CPU implementations; designing, implementing, and analyzing various thread-scalable implementations; and reporting performance. Possible calculations to redesign include graph algorithms for analyzing social networks and molecular dynamics for the study of nanoscale materials. All work will be done in C++, and a presentation to Sandia in Albuquerque is possible. Prior experience in manycore high-performance computing (GPU programming through CUDA, OpenCL, or similar) is strongly desired for those looking to participate. An interest in pursuing graduate school is also highly desired, and participation in this project opens paths for future internships at Sandia. (Directed by Prof. Jeff Amelang)

What Makes You Different from Me, and What Does It Mean about Evolution?

Evolution is responsible for the immense biological diversity of our planet; however, despite its central role as the most fundamental property of life, the process of evolution remains poorly understood, and current models have typically been unable to span the diversity of scales at which evolution can act.

The goal of this project is to develop computational models and tools to enable better insight into this evolutionary process. In particular, we will be leveraging massive collections of genomic data to look at how differences among individuals within the same species can be used to improve our inferences.

This project incorporates knowledge from a variety of fields, including machine learning and algorithms, mathematical modeling and statistics, and evolutionary biology -- students interested in learning more about any of these fields are encouraged to apply. There are projects available for both intro and advanced students, and we will work together to find one tailored to your background and interest. (Advised by Prof. Wu)

Query Processing with the Crowd

Crowd-processing is a compelling means for augmenting the capabilities of modern (non-human!) machines using the insights of people. This project will investigate novel approaches to using "the crowd" for query processing. (Advised by Prof. Trushkowsky)

Project IMMERSION: Mathematical Modeling for Elementary School

IMMERSION stands for Integrating Mathematical Modeling, Experiential learning and Research through a Sustainable Infrastructure and an Online Network for teachers in the elementary grades. Project IMMERSION is a three-year NSF-funded effort focused on mathematical modeling for the elementary grades. This summer the team will help collect and design modeling activities for a new online repository. The team will also assist with coordination and research during a one-week professional development course, follow-up lesson study, and a local conference. Participants will be elementary school teachers from the Pomona Unified School District. We will coordinate our efforts with collaborators at Montana State University/Bozeman School District and George Mason University/Fairfax County School District. (Directed by Prof. Levy.)

PaWPal: Productivity and Wellness Pal

College students often struggle to balance their work with personal wellness. In part, this occurs because students work when they are unable to focus. One approach to increasing both productivity and wellness is to help users achieve flow. Flow is a state in which people feel focused, motivated, and fully immersed in their activity, resulting in feelings of satisfaction and even joy.

The Productivity and Wellness Pal (PaWPal) is a smart-phone-based application that seeks to make users aware of their efficacy at various tasks as well as which courses of action are likely to lead to immersive experiences. This summer, we'll be asking the following questions:

  • Can PaWPal build a model of users' efficacy and predict when they will be most likely to experience flow?
  • Can PaWPal present this information in a way that motivates users to understand and act when they are most likely to achieve flow?

(Directed by Prof. Boerkoel.)

Robot Brunch

Temporal plans exist to provide robust directives that robots can follow to accomplish their goals, while also coordinating when these activities should occur. In general, we want temporal plans that are adaptable to events that are beyond the direct control of agents; e.g., a robot may experience slippage or sensor failures. To do this, we must answer two questions: (1) how and when do new or unexpected events arise in practice, and (2) how "good" is the temporal plan at adapting to unexpected events that might otherwise invalidate the plan?

Last summer, we introduced a new metric called robustness, which assesses the likelihood that a multi-robot plan succeeds. We also showed that robustness is a better measure of multi-robot plan quality. However, we left one big question unanswered: how do we generate robust plans in the first place? The goals for this summer include:

  • Play with robots
  • Create a new multi-robot coordination application (robot Pac-man anyone?)
  • Design and evaluate new algorithms for generating robust multi-robot temporal plans

(Directed by Prof. Boerkoel.)

Summer Staff

Summer staff is a small group of students who help maintain and improve our CS department's computational infrastructure: software, hardware, and usage patterns/policies. No previous systems-administration experience is required: you will learn about these systems. It's fun, vital for the department, and an ideal chance to expand your systems knowledge -- join us! (Directed by Prof. Wiedermann.)

Intelligent Music Software

The Intelligent Music Software project has been developing educational software tools to help students learn to improvise music, particularly jazz. Our approach is to aid the student in constructing melodies similar to ones that could be improvised, in order to get a better understanding of harmony and its relationship to melody construction. One tool produced by this work is Impro-Visor, which is free open-source software. It gives two types of advice: empirical advice, based on a database of stored melodies that match certain chord changes, and grammatical advice, based on a grammar that generates melodies on the fly. This free software tool has been used in classroom settings for six years and has over 8400 registered users at present. In addition to its primary function, it provides a microcosm of examples for software development, including knowledge representation and real-time execution of music accompaniment.

An anticipated area of focus for summer 2015 is audio input and enhancement of the real-time aspects of Impro-Visor: we would like Impro-Visor to become a better companion for accompanying and trading melodies with the user. Ideally, the real-time improvisor would emulate the thought processes of a human improvisor at a macro scale. Many of the features of the tool are capable of working in real time, but there are ergonomic interface and knowledge-representation issues to be researched, including the use of "harmonic bricks," which are elements of Impro-Visor-generated roadmaps. The input can be MIDI or an audio-in interface, on which there is some initial development, although there is also research to be done on improving the interface.

Please see this link for more information, including papers and related work.

(Directed by Prof. Keller.)

Robot Mapping

Mapping with a single camera is a compelling problem because we humans have no trouble wandering a new environment or assessing (un)familiar objects - even with one eye closed. What's more, we can subsequently use our visual experiences to move around and perform tasks. This project will include several 2d and 3d mapping teams:

  • 2d object mapping for authentication and retrieval   This team will use computer vision and machine learning to map and navigate objects in image space. Its goals include determining if an object is authentic, e.g., a coin, stamp, or collectible card, and mapping its idiosyncrasies (small "imperfections") to distinguish ostensibly similar objects from one another. The robotic component will investigate custom object-handling and lighting-control systems, as well.

  • 2d ecological mapping   In collaboration with Bio's Prof. Matina Donaldson-Matasci, this team will use a drone to collect aerial terrain images and will use field data and ground images to composite them into ecological maps, specifically to support the study of bees' habitat and food resources in the area.

  • 3d indoor mapping   This team will leverage the object models created by our Matterport "3d camera" in order to reason about a robot's or a person's (indoor) spatial environment. Ideally, from a novel image the system would be able to pinpoint the location from which the robot (or person) took that image.

We certainly do not seek to re-derive the amazing progress in this field! Thus, 2015's projects will depend heavily on the software scaffolding of ROS (the Robot Operating System), PCL (the Point Cloud Library), OpenCV (the Open Computer Vision library), and other helpful systems we encounter. These projects include spots for 8 or so students. (Directed by Prof. Dodds.)

MyCS: Middle-years Computer Science

This project seeks to increase the skillset and mindset of computing in precollege-age students -- in particular, at the middle-school level. This summer we will be expanding our middle-school target audience to include both an elementary version and a high-school version in which CS5 is used as a foundation for the new AP CS Principles course. Join us to help develop K-12 CS curriculum and present two week-long summer workshops to small groups of middle-, elementary-, and high-school teachers! This project can support 8 or so students. (Directed by Prof. Dodds.)

Programming Languages: What good are they?

Even though we've had decades of research and development on programming languages, it's *still* difficult or impossible for most people to write a program that does what they want.

In this project, we'll read and write about the design of programming languages: For whom have they been designed? Whose digital voices are silenced because they cannot program? What counts as a programming language? What can we do to make it easier for more people to write programs?

Funding is available for one student. To work on the project, no prior experience with programming language design is necessary. You should be interested in reading and writing papers on language design; there won't be much programming for this project. A strong interest in an area outside of CS (e.g., the arts, public policy, science, math, engineering, etc.) is a plus. (Directed by Prof. Wiedermann)

Vice-chancellor of fun position

If you are applying to an HMC summer position and would enjoy working as the social coordinator for the program and for all of HMC CS summer research, then this vice-chancellor of fun position will be of interest. It supports your on-campus housing for the summer, adds a $1000 stipend to your existing summer stipend, and puts you in charge of procuring summer foodstuff and organizing activities (or encouraging others to do so!). You need to have a driver's license, a willingness and ability to drive the large HMC van, and enthusiasm for making things happen! (2 spots available) (Directed by Profs. Wiedermann, Lewis, and Dodds.)