This Year's Projects


HMC's 2021 Summer REU projects span theoretical, practical, and experimental elements of computer science research. We are excited to offer projects in machine learning, natural language processing, and program analysis!

Improving Topic Modeling Tools for Novice Text Miners

Advisor: Professor Xanda Schofield

Probabilistic topic models are widely used outside of computer science to find patterns of meaning in large text collections. However, domain experts outside machine learning often face a steep learning curve in understanding and adapting topic modeling tools for their work. Last summer, a team of students interviewed 16 researchers who study text with topic models and found that topic model research is iterative, alternating between training topic models and diagnosing issues in preprocessing and configuration from symptoms seen in the outputs of those models. While the final model for a project may take less than a week to train, diagram, and analyze, it may take months to reach the point of training that model. Our research question is this: how can we design a tool that supports iterative refinement of topic models for a user base with limited programming experience but deep textual questions?

This summer, we will build on work from last summer to develop jsLDA 2.0, a revision of a small web-based topic modeling interface that streamlines common “loops” and workflows described by these users. We will then perform user studies with both novices and experts to further develop this tool, alongside an accompanying tutorial suitable for digital humanities and computational social science classrooms. Students working on this project will practice skills in text data processing, visualization, web development, and user study design.

For more information about our project from last summer, check out our short video here.

Using Imperfect Predictions to Make Good Decisions

Advisor: Professor Erin Talvitie

Imagine learning to play a new video game. A natural approach would be to learn to make predictions about how the video game will respond to your actions (model learning) and to use those predictions to make decisions (planning). Model-based reinforcement learning (MBRL) is an analogous approach for artificial learning agents, where agents use their experience to create a predictive model of their environment and then use that model for planning purposes. Unfortunately, model-based learning has not been as successful in artificial agents as it is in natural ones! One major reason for this is that even tiny errors in the model can lead to catastrophic planning errors. In this project we will study this problem and potential remedies that may make MBRL more robust. We will work toward applying these ideas in the challenging domain of Atari 2600 games. Specific projects will be informed by the mutual interests and experience of student and mentor, but will likely center on questions about how to measure and represent model error/uncertainty, how to robustly make decisions using a flawed, uncertain model, and/or how to scale up MBRL techniques to complex, high-dimensional problems.
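To make the "tiny errors" point concrete, here is a minimal sketch in Python. It is not taken from the project: the one-dimensional environment, the function names `true_step` and `learned_step`, and the chosen constants are all hypothetical. It rolls a slightly wrong one-step model forward and tracks how far its predictions drift from the true dynamics.

```python
def rollout(step, state, horizon):
    """Apply a one-step transition function `horizon` times and record every state."""
    states = [state]
    for _ in range(horizon):
        state = step(state)
        states.append(state)
    return states


def true_step(x):
    # Hypothetical "real" environment dynamics (an assumption for illustration).
    return 1.10 * x


def learned_step(x):
    # Hypothetical learned model: the one-step prediction is off by under 1%.
    return 1.11 * x


horizon = 50
true_traj = rollout(true_step, 1.0, horizon)
model_traj = rollout(learned_step, 1.0, horizon)

# Relative prediction error at each step of the imagined rollout.
relative_error = [abs(m - t) / abs(t) for m, t in zip(model_traj, true_traj)]

for k in (1, 10, 25, 50):
    print(f"step {k:2d}: relative error = {relative_error[k]:.3f}")
```

In this toy setting the one-step error stays below 1%, yet after 50 steps the model's prediction is off by more than 50%, because each prediction is fed back in as the input to the next one. A planner that trusts long rollouts from such a model is reasoning about states the real environment would never reach.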

Discovering the Limits of Machine Learning

Advisor: Professor George Montañez

What powers machine learning? The AMISTAD Lab opens the black box to understand how machine learning works as a form of search, governed by information-theoretic and statistical constraints. We prove formal results in learning and search. We will explore how search provides a unifying concept for machine learning, investigate how information resources and dependence structures can be leveraged to move beyond memorization to true generalization, and probe the formal limits of learning processes. More information about our lab can be found here, as well as a brief news item about some of our recent publications.

Looping in the Human Mind: Time, Path, and Cognitive Complexity

Advisor: Professor Lucas Bang

How complex is a piece of code?

To answer that question, we first need to decide what the word "complexity" means! One option is time complexity, a measure of the number of steps executed by the algorithm that the code implements. Another measure is path complexity, which gives the number of different execution paths through a given piece of code. A third measure is cognitive complexity: the amount of mental effort required to understand some source code. In this project, we will examine how the time complexity, path complexity, and cognitive complexity of code are related to one another.
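As a rough illustration of path complexity, the sketch below counts execution paths through a small control flow graph. It is an assumption-laden toy rather than the project's actual tooling: the graphs, the node names, and the helper `count_paths` are hypothetical, and because loops allow unboundedly many paths, the count is taken over paths that use at most `max_len` edges.

```python
from collections import defaultdict


def count_paths(cfg, entry, exit_node, max_len):
    """Count entry-to-exit paths that use at most `max_len` edges.

    `cfg` maps each node to a list of its successor nodes.
    """
    walks = {entry: 1}  # walks[node] = number of walks of the current length ending at node
    total = 0
    for _ in range(max_len + 1):
        total += walks.get(exit_node, 0)
        next_walks = defaultdict(int)
        for node, count in walks.items():
            for succ in cfg.get(node, []):
                next_walks[succ] += count
        walks = next_walks
    return total


# Two if/else statements in sequence: a fixed number of paths.
two_ifs = {
    "entry": ["a", "b"], "a": ["join"], "b": ["join"],
    "join": ["c", "d"], "c": ["exit"], "d": ["exit"],
}

# A while loop with a straight-line body: one new path per extra iteration.
simple_loop = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"]}

# A while loop whose body contains an if/else: paths double with each iteration.
branching_loop = {
    "entry": ["head"], "head": ["then", "else", "exit"],
    "then": ["head"], "else": ["head"],
}

for name, cfg in [("two_ifs", two_ifs), ("simple_loop", simple_loop),
                  ("branching_loop", branching_loop)]:
    print(name, [count_paths(cfg, "entry", "exit", n) for n in (4, 6, 8, 10)])
```

With a bound of 10 edges, the three graphs have 4, 5, and 31 paths respectively. The counts stay constant, grow linearly, and grow exponentially as the bound increases, even though both loops would usually be described as having the same linear time complexity.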

If you are interested in any or all of human learning, programming, algorithms, and combinatorics, please apply!

See here for a short video on our path complexity work from last summer.

One or two REU students will be supported for this project, depending on interest.

Code Switching: Semantic Density and Programming Language Learnability

Advisor: Professor Lucas Bang

Programming languages vary in their verbosity. Some languages require writing a lot of code (think Java) and some languages require writing very little code (APL and J are extreme examples of terse languages). The amount of meaning that is encoded into the symbols of a program determines the semantic density of that program. In this project we will borrow techniques from natural language processing and information theory to quantify the semantic density of a programming language's syntax. We will design an experiment to compare semantic density with human performance on programming tasks in a custom domain-specific language.
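The project description does not say which measure of semantic density will be used. As one possible starting point, the sketch below computes two crude proxies for a code snippet: the number of tokens and the Shannon entropy of its token distribution, in bits per token. The tokenizer, the function names, and the two example snippets are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Crude tokenizer: identifiers, integer literals, or single non-space symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def entropy_bits_per_token(tokens):
    """Shannon entropy of the empirical token distribution, in bits per token."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two hypothetical snippets that both sum the elements of a list `xs`.
java_like = "int total = 0; for (int x : xs) { total += x; }"
j_like = "+/ xs"

for name, src in [("Java-like", java_like), ("J-like", j_like)]:
    tokens = tokenize(src)
    print(f"{name}: {len(tokens)} tokens, "
          f"{entropy_bits_per_token(tokens):.2f} bits per token")
```

Both snippets express the same computation, so the terse version packs that meaning into far fewer tokens. Estimates from snippets this small are not statistically meaningful; a real study would need large corpora and a proper lexer for each language.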

Those interested primarily in human-centered programming language design with additional interests in natural language processing or information theory are encouraged to apply.

One or two REU students will be supported for this project, depending on interest.