Algorithms for Computational Biology

This project will investigate algorithm design and analysis for a fascinating and important problem in computational biology. The research addresses something called the “Cophylogeny Reconstruction Problem”—the problem of determining how two species may have coevolved.

This research involves designing and analyzing algorithms, implementing these algorithms as software tools that can be used by practicing biologists, and working closely with biologists to analyze real biological data.

The Cophylogeny Reconstruction Problem

We are given the phylogenetic tree (also known as an evolutionary tree) for some group of "host" species (e.g. gophers) and a second phylogenetic tree for a group of "parasite" species (e.g. lice); we believe that the two species coevolved.

The two trees have the current species as their leaves and the internal nodes are hypothesized ancestral species which no longer exist. In addition to these two trees, we are given a mapping between the leaves of the parasite tree to the leaves of the host tree, indicating the parasite species that are currently specialized to the the existing host species.

These two trees are almost never completely identical in their structure. (That is, they are not isomorphic.) However, there are certain biologically plausible events that can explain the deviation between these two trees. The fundamental question is "what is the most plausible set of events that would explain the coevolution of parasite and host species?" Our research group has made some progress on this problem (including a recent paper to appear in the Journal of Computational Biology and a new software tool called Jane) but there are many interesting issues that have not yet been resolved. Ultimately, we would like to develop general algorithms and software tools that would allow evolutionary biologists to investigate any host/parasite pair of phylogenetic trees.

Mentor: Professor Ran Libeskind-Hadas

Ran has been a professor at Harvey Mudd since 1993. His general research interests are in algorithm design and analysis. He has recently begun working on algorithmic problems in computational biology. He received is B.A. in applied mathematics from Harvard University and his M.S. and Ph.D. in computer science from the University of Illinois at Urbana-Champaign. In addition to research and teaching, he enjoys hiking, biking, and playing with his two children.


Required Background

Students working on this project should have completed an undergraduate course in algorithms. In particular, familiarity with algorithm design paradigms such as dynamic programming are important as are knowledge of basic graph algorithms (e.g. shortest path algorithms, network flow, etc.). Some exposure to the theory of NP-completeness is desirable but not absolutely required. Applicants should clearly indicate all courses and experiences related to discrete mathematics, algorithms, and complexity theory.