Address correspondence to: Yi-Chieh Wu (yjw at mit.edu) and Manolis Kellis (manoli at mit.edu)
STAR-MP is a phylogenetic method for reconstructing architecture evolution based on a known species tree, extant architectures, and (reconstructed) module (domain) phylogenies.
In our paper, we considered domain architecture rearrangements in
9 fully sequenced 
STAR-MP requires a species tree and species map. We use the species tree estimated by Tamura2004. Additionally, we provide the species map that specifies which genes belong to which species, and the species name abbreviations used in *.stree and *.smap. See SPIDIR and SPIMAP for more detail on these files.
Our files use FlyBase peptide ids (e.g. dmel_FBpp0079164) as unique gene ids. Users who wish to use alternative identifiers can use this tab-delimited file to map the peptide id to a (1) CG protein id (e.g. CG7562-PA), (2) common protein name (e.g. Trf-PA), (3) FlyBase gene id (e.g. dmel_FBgn0010287), (4) CG gene id (e.g. CG7562), (5) short gene name (e.g. Trf), or (6) long gene name (e.g. TBP-related factor).
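As a rough illustration of how the id mapping file might be used, the sketch below builds a lookup from peptide id to one alternative identifier. It assumes the file has no header line and that its columns appear in the order listed above, with the peptide id first; the column names, the function name load_id_map, and the example row (assembled from the example values above) are illustrative and should be checked against the actual file.

```python
import csv
import io

# Assumed column order (not confirmed by the file itself):
# peptide id first, then the six alternative ids in the order listed above.
COLUMNS = [
    "peptide_id", "cg_protein_id", "protein_name",
    "fb_gene_id", "cg_gene_id", "short_name", "long_name",
]

def load_id_map(handle, target="fb_gene_id"):
    """Return {peptide_id: target_id} read from a tab-delimited handle."""
    if target not in COLUMNS[1:]:
        raise ValueError("unknown target column: %s" % target)
    idx = COLUMNS.index(target)
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(COLUMNS):
            continue  # skip blank or incomplete lines
        mapping[row[0]] = row[idx]
    return mapping

# Illustrative row built from the example values in the text.
example = (
    "dmel_FBpp0079164\tCG7562-PA\tTrf-PA\tdmel_FBgn0010287\t"
    "CG7562\tTrf\tTBP-related factor\n"
)
id_map = load_id_map(io.StringIO(example), target="short_name")
print(id_map["dmel_FBpp0079164"])
```

To read from disk instead, pass an open file handle in place of the io.StringIO object.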
Each line provides the gene, the start and end position (1-indexed) of the module, and the module family.
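The module file format described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the fields are taken to be tab-separated in the order gene, start, end, family, and the end position is taken to be inclusive. The names Module, parse_modules, architectures, and module_sequence, as well as the gene and family ids in the example, are hypothetical.

```python
from collections import defaultdict, namedtuple

# One module occurrence: gene id, 1-indexed start and end, module family.
Module = namedtuple("Module", "gene start end family")

def parse_modules(lines):
    """Parse lines of the form: gene <tab> start <tab> end <tab> family."""
    modules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gene, start, end, family = line.split("\t")
        modules.append(Module(gene, int(start), int(end), family))
    return modules

def architectures(modules):
    """Map each gene to its module families, ordered by start position."""
    by_gene = defaultdict(list)
    for m in modules:
        by_gene[m.gene].append(m)
    return {
        gene: [m.family for m in sorted(ms, key=lambda m: m.start)]
        for gene, ms in by_gene.items()
    }

def module_sequence(protein_seq, module):
    """Extract a module from a sequence (1-indexed, end assumed inclusive)."""
    return protein_seq[module.start - 1:module.end]

# Hypothetical example lines; real files use FlyBase peptide ids.
lines = ["geneA\t120\t200\tfam2", "geneA\t5\t90\tfam1", "geneB\t1\t80\tfam2"]
mods = parse_modules(lines)
print(architectures(mods))
```

Because positions are 1-indexed, the start is shifted by one when slicing a Python string, while the (assumed inclusive) end can be used directly as the slice bound.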
Each line in the text files lists the genes belonging to a single architecture family.

To focus on gene fusions and fissions, the architecture families were filtered to a set of "merge/split" families, in which one species has a gene with two connected modules and another species has a gene with at least one of these modules unconnected. STAR-MP was used to reconstruct the evolutionary histories of these families. These families are indexed by their line number in "fams.ms.txt", and for each family, we have provided the architecture family (*.fam), the (100 bootstrapped) gene trees as reconstructed by SPIMAP (*.nt.uniq.trees), the architecture scenario as reconstructed by STAR-MP (*.mp), and a figure of this reconstructed architecture scenario (*.mp.svg).

Finally, to limit the effect of genome annotation errors, we also considered a conservative set of "merge/split" families, in which no genes within the family are adjacent, no genes are at the ends of scaffolds, and no genes have transitive BLAST hits through alternatively spliced forms.
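The basic "merge/split" criterion can be made concrete with a small sketch. This is one possible reading of the definition, not the filter used to produce the files: two modules are treated as "connected" when they occur in the same gene, and a module is treated as "unconnected" when a gene contains one module of the pair but not the other. The function name is_merge_split, the input layout, and the species and module labels are hypothetical, and the additional conservative filters are not modeled.

```python
from itertools import combinations

def is_merge_split(family):
    """family: dict mapping species -> list of genes, where each gene is
    the collection of module families it contains."""
    # Record, for every module pair found together in a gene,
    # the species in which that pair is connected.
    connected = {}
    for species, genes in family.items():
        for gene in genes:
            for pair in combinations(sorted(set(gene)), 2):
                connected.setdefault(pair, set()).add(species)

    # Look for a different species with a gene holding exactly one of the pair.
    for (a, b), joined_species in connected.items():
        for species, genes in family.items():
            has_unconnected = any((a in g) != (b in g) for g in genes)
            if has_unconnected and (joined_species - {species}):
                return True
    return False

# Hypothetical families: sp1 has modules A and B in one gene.
split_family = {"sp1": [{"A", "B"}], "sp2": [{"A"}, {"B"}]}
intact_family = {"sp1": [{"A", "B"}], "sp2": [{"A", "B"}]}
print(is_merge_split(split_family), is_merge_split(intact_family))
```

A stricter reading of "connected" (for example, requiring the two modules to be adjacent within the gene) would need module order as input rather than unordered sets.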
    
In addition, we considered three possible mechanisms for module
rearrangement and catalogued ~9000 
Two adjacent genes merge into a single gene, or a single gene splits into two genes.

Large-loop mismatch repair or replication slippage results in a merged gene located between the ancestral split (but not necessarily adjacent) genes.

A retrotransposed copy of a gene combines with exons from another gene.

A chromosomal segment duplicates, and alternative portions of the duplicates are lost.

Last updated 03/08/13.