Clinic Projects

Please click on a link below to view the Harvey Mudd College Computer Science Clinic projects for the corresponding time period.

Clinic Projects for 2016-2017

Hotel Recommendation System

Client
American Express

Faculty Advisor
Zachary Dodds

Student Team
Christine Chen (PM), Yacht Kitimoon, Alyssa Kubota, Jon Ueki
American Express and its partners seek to make travel planning easier, quicker, and more personalized. To that end, the objective of our clinic team was to explore and extend American Express's current machine-learning-based hotel recommendation algorithms with the goal of improving their performance.

Measuring User Engagement in Fairway Solitaire

Client
Big Fish Games

Faculty Advisor
Colleen Lewis

Student Team
Justis Allen, Michael Diamond (PM), Adam Dunlap, Aaron Stringer-Usdan
Big Fish Games, Inc. develops the game Fairway Solitaire and cares deeply about the player experience. The goal of our project was to use machine learning on existing user data to better understand how player interaction with the game's features affects player engagement. Better understanding what makes users engage with the game would allow Big Fish Games, Inc. to make more informed design decisions.

Using Latent Topic Models to Detect Rare Behaviors

Client
FICO

Faculty Advisor
Robert Keller

Student Team
Savannah Baron, Sneha Deo (PM-S), Emily First (PM-F), Hope Yu
Our project's goal was to investigate the detection of rare customer behaviors in transactional data using latent topic models, a form of unsupervised machine learning typically used to detect topics in examples of natural language. Our team has developed techniques to apply these models to time series data and has assessed their viability in the detection of anomalous behaviors.

Dynamic Website Updates

Client
GoDaddy

Faculty Advisor
Ben Wiedermann

Student Team
Keighley Overbay (PM-S), Maureen Naval (PM-F), Terrence Diaz, Linnea Nelson, Connie Wang
GoDaddy provides small business owners with the tools they need to easily host and create their own personalized websites. The goal of our project is to extend GoDaddy's website builder to allow small business owners to automatically display announcements on their websites at a predetermined time, as well as directly post these announcements to their social media accounts.

Serializing Chromium Tab State

Client
Google, Inc.

Faculty Advisor
Beth Trushkowsky

Student Team
Julien Chien (PM), Zoab Kapoor, Thomas Le, Yi Yang
Google's Chrome browser, based on the open-source project Chromium, is the most widely used browser in the world. The goal of our project was to research and implement strategies for serializing the full state of a tab in Chromium so that it can be suspended and restored with minimal user-visible disruption. With these strategies, Chrome users would not lose information when tabs are suspended in memory-constrained situations.

Cognitive Note Taking

Client
International Business Machines Corporation (IBM)

Faculty Advisor
Lisa Kaczmarczyk

Student Team
Scott Chow, Harry Cooke (PM), Wyatt Cooper, Julia Cosma, Emilia Reed
This project aims to develop a note-taking mobile application and service that acts as a personal cognitive assistant to help IBM employees extract company-specific information from their notes. The application accepts collections of documents (such as PDF files or images) and direct input (such as text or handwriting). It uses IBM Watson cognitive services to analyze the documents and extract specific information about companies, which is then verified by the user.

Fast Detection of Problems in Scanned Documents

Client
Laserfiche

Faculty Advisor
Yekaterina Kharitonova, Melissa O’Neill

Student Team
Tiffany Sun (PM), Kharisma Calderon, Carmen Mejia, Andrew Scott
Laserfiche builds software that helps organizations digitize content and automate processes. To ensure that data from scanned paper documents can be accurately extracted, Laserfiche has tools to fix image quality problems such as skew and speckles. The goal of our project was to automatically and quickly detect problems in scanned documents. By detecting these problems, the software can reduce the time and processing required for image correction. Our team extracted features from a collection of scanned images and used machine learning classifiers to predict whether a newly scanned document has problems.

High Performance Portability

Client
Lawrence Livermore National Laboratory

Faculty Advisor
Chris Stone

Student Team
Nick Gonzalez (PM-F), Aaron Lobb (PM-S), Dan Obermiller
Lawrence Livermore National Laboratory (LLNL) uses supercomputers to perform complex physics simulations. Maintaining parallel code is difficult when faster computers with different architectures are installed every few years. Portability layers can simplify this code by hiding details of computer architecture and parallelism. The goal of the clinic team was to improve RAJA, a portability layer created and used by LLNL. The team also compared RAJA to other portability layers with respect to usability and performance.

New Relic Churn Prediction & Prevention -- Micro-Segmentation and Predictive Analytics

Client
New Relic, Inc.

Faculty Advisor
Ran Libeskind-Hadas

Student Team
Felis Perez, Rose Choi, William Chen (PM-S), Yiqing Cai (PM-F)
The goal of the New Relic Clinic project is to develop a data-driven approach to predicting potential churn. New Relic has collected substantial data, but the types of data collected have varied over time, leading to a heterogeneous dataset that is difficult to analyze. To address this issue, the New Relic Clinic project aims to first restructure the data to be consistent and then apply machine learning techniques to identify features capable of predicting churn.

Predicting Malicious URLs

Client
Proofpoint, Inc.

Faculty Advisor
Elizabeth Sweedyk

Student Team
Vidushi Ojha (PM), Aidan Cheng, Kevin Herrera, Carli Lessard
As part of its security solutions, Proofpoint provides a service that scans URLs embedded in clients' emails and determines whether they lead to sites containing malware. Suspicious URLs are redirected to a virtual environment, or sandbox, where they are tested for maliciousness. The goal of our project is to create a machine learning classifier that can better detect malicious URLs, so that fewer URLs need to be sandboxed. We investigated various models and features to create a number of options for such a classifier.

Detecting Evil Through Machine Learning

Client
Reddit

Faculty Advisor
Yi-Chieh Wu

Student Team
Jonathan Chang (PM-S), Rachel Lee, Anna Ma, Kent Shikama (PM-F), Lisa Yin
Reddit is an online discussion platform where users can form communities centered around a variety of topics. Like any discussion website, Reddit has experienced instances of spam, trolling, cyberbullying, and generally aggressive behavior from users, which can alienate other users or discourage people from joining Reddit. To help Reddit combat such behavior, we created an extensible machine learning pipeline for predicting whether a comment will be perceived as aggressive.

Image De-identification

Client
The MITRE Corporation

Faculty Advisor
Lisa Kaczmarczyk

Student Team
Madi Pignetti (PM), Nava Dallal, Michael Sheely, Veronica Rivera
The MITRE Corporation is a not-for-profit research company that applies new technologies to problems in an array of areas. The goal of the MITRE clinic team's project is to produce an algorithm that transforms an image so that it still appears similar to a human viewer but reduces the accuracy of a number of specified recognition algorithms. This algorithm makes it significantly more difficult for certain facial recognition algorithms to detect individuals, leading to increased security and privacy.

Real-Time Visualization and Machine Learning On Network Streams

Client
Webroot, Inc.

Faculty Advisor
Geoff Kuenning

Student Team
Nick Bailey (PM), Rohin Lohe, Jeff Milling, Norwood Square
Webroot specializes in cloud-based Internet security for consumers and businesses. The goal of our project was to perform real-time machine learning on local network streams and provide insight about anomalous and malicious behavior. Our team developed an infrastructure that captures network traffic, analyzes it with machine learning, and displays information about it, offering clients a better understanding of malicious activity on their networks and a new way to protect against malware.