TreeFix

Paper

TreeFix: Statistically Informed Gene Tree Error Correction Using Species Trees
Yi-Chieh Wu, Matthew D. Rasmussen, Mukul S. Bansal, and Manolis Kellis.
Systematic Biology. 2013. doi: 10.1093/sysbio/sys076

Address correspondence to: Yi-Chieh Wu (yjw at mit.edu) and Manolis Kellis (manoli at mit.edu)

Additionally, if you use the default module for computing the test statistic, please cite:
Alexandros Stamatakis. RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics 22(21):2688-2690, 2006

Download

TreeFix is a phylogenetic method for improving gene tree reconstructions using a test statistic for likelihood equivalence and a species-tree-aware (reconciliation) cost function.

The default TreeFix package is meant for use in eukaryotic genomes. For prokaryotes, use TreeFix-DTL.

The TreeFix package includes the Python source code, modified RAxML source code, and several library interfaces for Python. A detailed README and sample dataset are also included.

Requirements

Likelihood models

By default, TreeFix computes p-values based on the Shimodaira-Hasegawa (SH) test statistic with RAxML site-wise likelihoods. This is included in the main TreeFix package. Modules based on other phylogenetic programs or using other test statistics may be added in the future. For more test statistics, see CONSEL.
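To illustrate how an SH-style p-value can be derived from site-wise likelihoods, here is a minimal sketch using RELL bootstrap resampling with centering. It is not the TreeFix or RAxML implementation; the function name `sh_test_pvalue`, its parameters, and the input layout (a dict of per-site log-likelihood lists keyed by tree name) are assumptions made for this example.

```python
import random

def sh_test_pvalue(site_lnl, candidate, replicates=1000, seed=0):
    """Approximate SH-test p-value for one candidate tree (illustrative only).

    site_lnl:  dict mapping tree name -> list of per-site log-likelihoods
    candidate: name of the tree being compared against the best-scoring tree
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[candidate])

    # Observed statistic: how far the candidate falls below the best tree.
    totals = {name: sum(site_lnl[name]) for name in names}
    observed = max(totals.values()) - totals[candidate]

    # RELL bootstrap: resample site indices and reuse the stored site
    # likelihoods instead of re-optimizing each replicate.
    boot = {name: [] for name in names}
    for _ in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        for name in names:
            boot[name].append(sum(site_lnl[name][i] for i in sample))

    # Center each tree's bootstrap totals on its own mean.
    for name in names:
        mean = sum(boot[name]) / replicates
        boot[name] = [value - mean for value in boot[name]]

    # p-value: fraction of replicates whose centered statistic is at least
    # as large as the observed one.
    hits = 0
    for r in range(replicates):
        best = max(boot[name][r] for name in names)
        if best - boot[candidate][r] >= observed:
            hits += 1
    return hits / replicates
```

Under this sketch, a candidate tree whose p-value falls below a chosen significance threshold would be treated as significantly worse than the best tree, while a candidate above the threshold would be considered statistically equivalent.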

Reconciliation models

By default, TreeFix uses maximum parsimony reconciliation (MPR) and computes the duplication-loss cost. This is included in the main TreeFix package. Modules based on other reconciliation models may be added in the future.
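As a rough illustration of what a duplication-loss count under maximum parsimony reconciliation involves, the sketch below maps each gene-tree node to the lowest species-tree clade containing its species (LCA mapping) and counts duplications and losses. It is not the TreeFix code: the nested-tuple tree encoding, the function names, and the `species_index` gene naming convention are assumptions made for this example, both trees are assumed to be binary, and losses above the gene-tree root are not counted.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def index_species_tree(tree, depth=0, table=None):
    """Map each species-tree clade (as a leaf set) to its depth from the root."""
    if table is None:
        table = {}
    table[leaves(tree)] = depth
    if not isinstance(tree, str):
        for child in tree:
            index_species_tree(child, depth + 1, table)
    return table

def dup_loss_cost(gene_tree, species_tree,
                  gene_to_species=lambda gene: gene.split("_")[0]):
    """Return (duplications, losses) under LCA reconciliation.

    gene_to_species is a hypothetical naming rule: "A_1" belongs to species "A".
    """
    depth = index_species_tree(species_tree)
    counts = {"dup": 0, "loss": 0}

    def lca(species):
        # The clades containing `species` form a chain; the deepest is the LCA.
        return max((clade for clade in depth if species <= clade), key=depth.get)

    def visit(node):
        if isinstance(node, str):
            return lca(frozenset([gene_to_species(node)]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # A node is a duplication if it maps to the same clade as a child.
        is_dup = here in child_maps
        counts["dup"] += int(is_dup)
        for child_map in child_maps:
            gap = depth[child_map] - depth[here]
            # Each species-tree node skipped along the way implies one loss.
            counts["loss"] += gap if is_dup else gap - 1
        return here

    visit(gene_tree)
    return counts["dup"], counts["loss"]

species = (("A", "B"), "C")
genes = (("A_1", "C_1"), ("B_1", "C_2"))
print(dup_loss_cost(genes, species))  # (1, 2)
```

In the example, the gene-tree root is inferred as a duplication, with one copy subsequently lost in B and the other lost in A. A reconciliation cost could then be formed as a weighted sum of the two counts, with the weights chosen by the user.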

Tutorial

A fairly thorough tutorial with detailed installation instructions, descriptions of command line options, and step-by-step instructions for using TreeFix is available.

Supplemental data

In our paper, we evaluated TreeFix on two clades of species, the 12 Drosophila and the 16 fungi, using the same datasets that were used to evaluate SPIMAP. These included 5351 real gene families across the 16 fungal genomes, as well as 1000 simulated gene families (generated under the SPIMAP model) for each clade. We also evaluated TreeFix on simulated gene families with simulated species trees generated using a range of speciation rates and tree sizes. Note that TreeFix uses many of the same conventions as SPIMAP, so please refer to its website for more detail on any of these files. Additional simulated datasets are available upon request.


Last updated 06/19/14.