To use Jane, begin by downloading the .zip file available from the main page.
Once you have unzipped it, there will be a jar file in the Jane folder. To
run Jane, use the command
java -jar Jane.jar. On some machines, you can
simply click or double-click on the Jane.jar icon.
There is also a command-line version of Jane included in the download. This version of Jane is particularly useful for conducting large-scale experiments controlled by scripts. More details on this option are given at the end of this tutorial in the CLI section.
If Jane crashes on your computer, see the F.A.Q. for help.
When the graphical version of Jane is opened, the Main Window is displayed.
The Main Window consists of the parts listed below.
The minimum amount of data that Jane needs is a pair of phylogenetic trees (henceforth called the "host" and "parasite" trees, but in practice these can be any pair of trees such as a species tree and gene tree, or otherwise) and a mapping between the tips of these trees. The mapping need not be one-to-one. Jane permits multiple parasites to be mapped to a single host tip and for a single parasite to be on multiple host tips.
Such data can either come from a text file in .tree or .nex format, or you can draw a pair of trees and their tip mappings using the "tree builder" feature of Jane's graphical user interface (the result is then saved in one of these two formats). For information on the two file formats, please refer to the file formats page.
To load an existing text file, open the File menu and select Open Trees. Now you can browse for files and load them into Jane. We'll come back to what to do once a file is loaded, but first we'll look at the tree builder tool.
You can either build a tree from scratch or load in a tree in .tree or .nex format. The tree builder has the ability to load in both completely specified and partially specified trees (e.g., only the host tree, only the parasite tree, both trees but not the tip alignments).
To load a tree you can either select "Open..." from the File menu or click the open button in the top left corner and then navigate to the desired tree. When saving the tree, you have the option of saving in either .tree or .nex formats. However, the .nex format does not support multi-host parasites or region information.
To add nodes to the tree you can select the "Add Child" button to switch to Add Child Mode. Once in Add Child Mode you can click on any node to add a child. If the node is currently a tip this will add two children to the node. If the node is an internal node, it will add one additional child with each click to create a polytomy.
To assign names to the nodes you can switch to Label Mode by selecting the "Labels" button. Once in Label Mode simply click the node you would like to name and type the name in the box that appears. Jane does not support the naming of internal nodes. Additionally, integer names are not permitted.
To create a tip mapping between the host and parasite trees, switch to Link Mode by selecting the "Add Link" button. To add a link, either click one of the nodes and drag to the node in the other tree, or click one node and then the other. When you click a tip node in Link Mode, only that node's links appear in blue; all other links are greyed out. Jane supports both failure to diverge events and multi-host parasites, so parasite and host nodes can have multiple links.
Note: Multi-host parasites are only supported in .tree files.
Move Mode can be used to rearrange the order of tree branches that contain a lot of crossing or cluttered links. Once in Move mode you can rearrange horizontal edges by selecting an edge and dragging up or down to uncross links. If the host and parasite tips are too close together to clearly distinguish the links the tips can be moved apart by placing the cursor in line with the tips and dragging left or right until the desired spacing is reached.
Jane supports an option called "time zones." If you have included time zone information, this information provides constraints that indicate that certain nodes (speciation events) in a tree happened before other nodes. Time zones can be represented as a collection of vertical lines across a tree. Nodes that appear on different sides of a vertical line cannot have occurred at the same time, and thus host switch events between them are not permitted.
To add time zones, select the "Time Zones" button and enter the desired number of time zones. If, after adding time zones, you wish to change the number of time zones, click the "Change Number of Time Zones" button in the upper left corner and enter the new number of time zones. After you enter the number of time zones, the time zones will be separated by dashed red lines. To set the time zone for a node, drag the node into the desired time zone. Tips must occur in the last time zone. Each node will have a red arrow on either side; drag these arrows into another time zone to set a range of time zones for the node.
Note: If any time zones are specified then all the specified time zones must be used in the host tree.
Jane also supports the option of specifying that certain nodes are in a common "region" (e.g. a geographical region). Once regions are specified, the cost of a host switching event between two different regions can be set to any value. In this way, differential "penalties" can be imposed on duplication with host switch events from one region to another. Specifically, the duplication with host switching cost is computed as the general duplication with host switching cost plus the region-specific host-switching cost.
To add regions, switch to Regions Mode by clicking the "Regions" button. Once in Regions Mode, drag to select a set of nodes and enter a number for the region. When a node is assigned a region, the region will appear in a key above the trees and the node will change to the corresponding color. Once regions have been assigned, region switching costs can be set by selecting the "Edit Region Switching Costs" button and entering the desired cost. The cost of a switch from one region to another will be the sum of the baseline host switch cost (user-definable in Jane as described later in this tutorial) and the additional inter-region host switch cost. A cost of "i", indicating "infinity," can be used to prevent all host switching between two regions. The region switching cost will be displayed next to the region key.
Note: All nodes default to Region 0, and when assigning regions they must all be in Region 0 or a region greater than 0.
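The cost rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Jane's own code: the function name `host_switch_cost`, the dictionary layout, and the choice to add nothing for same-region switches or for region pairs with no entry are all assumptions.

```python
import math

def host_switch_cost(base_cost, region_costs, src_region, dst_region):
    """Return the baseline host switch cost plus the inter-region cost.

    region_costs maps (source region, destination region) to an integer,
    or to the string "i" for infinity. Assumption: switches within one
    region, and region pairs with no entry, add nothing to the baseline.
    """
    if src_region == dst_region:
        return base_cost
    extra = region_costs.get((src_region, dst_region), 0)
    if extra == "i":
        return math.inf  # the switch is effectively prohibited
    return base_cost + extra

# Hypothetical costs: switching from region 1 to 2 adds 3, and 2 to 1 is forbidden.
region_costs = {(1, 2): 3, (2, 1): "i"}
allowed = host_switch_cost(2, region_costs, 1, 2)
forbidden = host_switch_cost(2, region_costs, 2, 1)
same_region = host_switch_cost(2, region_costs, 1, 1)
```

Representing "i" as `math.inf` means any reconstruction that uses a forbidden switch gets an infinite total cost, so it can never be chosen over a finite one.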
Erase Mode can be used to remove a node or link. Simply switch to Erase Mode using the "Erase" button and click the node or link you would like to remove. Removing an internal node will cause its descendant subtree to be removed. If a node has two children and one of those children is removed, then both children will be removed.
Clear Mode can be used to remove an entire tree. Simply click either the parasite or host tree to clear that tree. If both trees are cleared, timezone and region information will be cleared as well.
Once the trees have been completed, the "Load to Solver" button can be used to save the file and open it in the main Jane interface. If you have not yet saved the file you will be prompted to do so and then the file will open in the main interface where it can be run in Solve Mode or Stats Mode.
Now we are back at the main Jane page. At the top of that page, you will see three pull-down menus labeled File, Settings, and Solve Mode Options. We have already seen the File menu; this is how you import files or launch the tree builder. We now turn to the Settings menu. This menu is used to set parameters relating to how Jane computes host-parasite reconstructions.
You can set the costs of each type of event: Cospeciation, duplication, duplication with host switch, loss, and failure to diverge.
To change the costs of events, select the "Set Costs" Option in the Jane Settings menu. As seen in the image below, the menu has the events listed with a box next to each. To change the cost of an event, simply enter the desired cost into the box next to the event name, or use the arrows next to the box. Jane requires event costs to be integers. Note that if you wish to have higher precision, you can simulate this by simply scaling up the costs (e.g., costs of "100" and "101" are analogous to "1.00" and "1.01").
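The scaling trick can be illustrated with a short Python sketch. It is not part of Jane; the function name `scale_costs` and the default of two decimal places are assumptions made for the example.

```python
from decimal import Decimal

def scale_costs(costs, decimals=2):
    """Turn decimal event costs into integers by multiplying by 10**decimals.

    costs maps event names to decimal strings such as "1.01". A ValueError
    is raised if a cost has more decimal places than `decimals` can hold.
    """
    factor = Decimal(10) ** decimals
    scaled = {}
    for event, cost in costs.items():
        value = Decimal(cost) * factor
        if value != value.to_integral_value():
            raise ValueError(f"{event}: {cost} needs more than {decimals} decimals")
        scaled[event] = int(value)
    return scaled

# Costs of 1.00 and 1.01 become the integers 100 and 101.
example = scale_costs({"duplication": "1.00", "loss": "1.01"})
```

Every cost must be scaled by the same factor, since only the relative sizes of the costs matter when solutions are compared.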
Because it can often be difficult to accurately estimate the relative costs of events, Jane also supports solving for ranges of costs. For example, we might specify that cospeciations cost 0, duplications cost between 1 and 3, duplications with host switch cost between 3 and 5, etc. Jane will then compute solutions for every combination of costs in the given ranges and you can later browse through these combinations of costs to see the best solutions found for each.
To use this feature, select the "Range Costs" tab as seen in the image below. This tab allows you to select the range of costs Jane should find solutions for. Note that Jane's running time will increase based on the number of possible combinations of costs that can be made from the ranges given.
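To get a feel for how quickly the number of combinations grows, the following Python sketch enumerates every cost vector for a set of ranges. It is an illustration only; the function name and the event labels are assumptions, and only three of the five event types are shown to keep it short.

```python
from itertools import product

def cost_combinations(ranges):
    """Yield every cost assignment from inclusive (low, high) integer ranges."""
    names = list(ranges)
    axes = [range(low, high + 1) for low, high in ranges.values()]
    for combo in product(*axes):
        yield dict(zip(names, combo))

# Ranges taken from the example above; a fixed cost is a range of width one.
ranges = {
    "cospeciation": (0, 0),
    "duplication": (1, 3),
    "host_switch": (3, 5),
}
combos = list(cost_combinations(ranges))
print(len(combos))  # 1 * 3 * 3 combinations
```

The count is the product of the range widths, so widening several ranges at once multiplies the amount of work.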
The "Region Costs" section at the bottom of the "Set Costs" dialog allows you to change the cost of duplication with host switch events between regions if the tree being solved has region information.
In addition to regions, which allow for customized host switching costs between different groups of nodes, Jane also allows you to completely prohibit any host switches between parts of the tree that are "too far apart." Specifically, the "Set Host Switch Parameters" option of the Settings menu allows you to set a maximum distance on host switches. Jane will not allow a host switch to occur if the graph theoretic distance between two nodes along the edges of the tree is greater than the given value.
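The graph theoretic distance between two nodes is the number of edges on the path that connects them. The Python sketch below computes it for a tree stored as a child-to-parent map. It is an illustration, not Jane's implementation: the function names and the tree encoding are assumptions, and treating -1 as "unlimited" follows the convention of the -S command-line option described in the CLI section.

```python
def tree_distance(parent, a, b):
    """Count the edges on the path between nodes a and b.

    parent maps each node to its parent; the root maps to None.
    """
    # Record how many steps each ancestor of a (including a) is from a.
    steps_from_a = {}
    node, steps = a, 0
    while node is not None:
        steps_from_a[node] = steps
        node = parent[node]
        steps += 1
    # Climb from b until reaching the first ancestor shared with a.
    node, steps = b, 0
    while node not in steps_from_a:
        node = parent[node]
        steps += 1
    return steps + steps_from_a[node]

def switch_allowed(parent, a, b, max_distance):
    """Return True if a switch between a and b respects the distance limit."""
    return max_distance == -1 or tree_distance(parent, a, b) <= max_distance

# Hypothetical host tree: root has children A and B; A has children C and D.
parent = {"root": None, "A": "root", "B": "root", "C": "A", "D": "A"}
```

Here C and D are two edges apart, while C and B are three edges apart, so a maximum distance of 2 would permit a switch between the first pair but not the second.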
Jane interprets polytomies (multifurcations) as soft polytomies, that is, polytomies that should ultimately be resolved into a sequence of bifurcations. Jane attempts to resolve polytomies in both the host and parasite trees in a way that minimizes the total cost of the resulting cophylogeny reconstruction.
You have control of two options in the "Set Polytomy Parameters" menu. The "Ensure Sequential Resolutions" option (default) ensures that the resolution of each polytomy results in bifurcations that occur in "rapid succession" so that no other speciation events in any part of the tree occur in time between two bifurcations of that resolved polytomy. In contrast, by turning off this option, a soft-polytomy can be resolved into a sequence of bifurcations that occur "slowly" in the sense that other events in the tree can be temporally interleaved with these bifurcations. The two options can result in slightly different solutions.
Similarly, you can toggle the "Prevent Mid-Polytomy Events" option. If this option is on (default) then there cannot be duplication or host switch events involving the edges that are created to resolve the polytomy into a sequence of bifurcations.
Most users will never touch this menu option. But, if you are curious or want to adjust the way that Jane searches for solutions, read on.
The Settings > Set Advanced Genetic Parameters can be used to set the mutation rate and the selection strength used in the genetic algorithm. The mutation rate is a value between 0 and 1, with 0 being never mutate and 1 being always mutate. The selection strength has minimum value 0, which corresponds to a randomly selected population, and there is no upper bound for this parameter. Most users find that the default parameters work well and choose not to adjust the values of these parameters.
In Solve Mode, Jane finds good reconstructions of the parasite tree onto the host tree and permits you to view those reconstructions. When Jane is run, Solve Mode is selected by default. Solve Mode may be reached from Stats Mode by simply clicking the Solve Mode tab in the main window. Click the Go button in the Actions box to run the genetic algorithm and search for a good embedding.
Jane finds low-cost associations between the host tree and the parasite tree by generating many random relative timings of the internal nodes of host tree and solving for the optimal association, then applying a genetic algorithm to modify timings and generating new host timing candidates.
In a genetic algorithm, there are two main control parameters called "population size" and "number of generations". These are parameters related to the inner workings of the algorithm and have nothing to do with the populations being studied in the host/parasite system. In a nutshell, the "population size" is the number of different solutions being considered at each iteration of the algorithm and the "number of generations" is the number of iterations performed by the algorithm as it seeks a good reconstruction of the parasite tree onto the host tree.
Choosing larger values for these two parameters generally leads to better solutions, up to some point where increasing the values further makes no difference. However, increasing these parameters will cause computation to take more time. Some experimentation with these parameters will be required; see the parameters page for more information and advice about selecting appropriate values.
The population size and number of generations can be set by using the slide bars or inputting the number manually in the Genetic Algorithm Parameters box under the Solve Mode or Stats Mode tab. Note that these parameters are separate for each mode, so changing the parameter in one mode will not affect the parameters in the other mode.
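To make the roles of the two parameters concrete, here is a toy search loop in Python. It is deliberately not Jane's algorithm: the candidates are simple orderings, the cost function is made up, and the loop uses only mutation and selection (no crossover). All names are hypothetical. What it shows is where "population size" and "number of generations" enter such a loop.

```python
import random

def evolve(cost, random_candidate, mutate, population_size, generations, seed=0):
    """Run a small mutation-and-selection loop and return the best candidate."""
    rng = random.Random(seed)
    # Population size: how many candidates are considered per iteration.
    population = [random_candidate(rng) for _ in range(population_size)]
    # Number of generations: how many iterations are performed.
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: max(1, population_size // 2)]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    return min(population, key=cost)

def random_order(rng):
    """Toy candidate: a random ordering of six items."""
    order = list(range(6))
    rng.shuffle(order)
    return order

def swap_two(order, rng):
    """Toy mutation: swap two positions in a copy of the ordering."""
    i, j = rng.sample(range(len(order)), 2)
    child = list(order)
    child[i], child[j] = child[j], child[i]
    return child

def displacement(order):
    """Toy cost: how far each item sits from its sorted position."""
    return sum(abs(pos - value) for pos, value in enumerate(order))

best = evolve(displacement, random_order, swap_two,
              population_size=30, generations=30)
```

Because the best candidates always survive into the next generation, running more generations can never make the best cost worse, but each extra generation and each extra candidate adds to the running time.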
The Solution Table displays the list of information about each solution found in Solve Mode, including the number of occurrences of each event and the total cost.
As you will see in a moment, many of the solutions that Jane finds will appear to be identical. In fact, while the solutions may look identical, they are based on different "timings" or relative orderings of the speciation events in the host tree. To compress these apparently identical solutions so that only truly distinct solutions are presented, check the Compress Isomorphic Solutions button.
After clicking Go in Solve Mode, Jane finds new solutions. The best solutions found will be added to the Solution Table.
As noted earlier, you can set a range of costs for each event type in the Settings > Set Costs menu. You might choose, for example, to have cospeciations to cost 0, duplications range in cost from 1 to 3, etc. In this way, you can explore the impact of costs on the solutions that Jane finds.
If you choose this option, Jane will display the following type of pull-down menus above the solution view. You can then select any combination of costs. Note that Jane completely recomputes the solutions when you click on a new cost, so this may take a bit of time to re-solve. Be patient.
Because a large number of different best solutions may be found in every run, you can specify the number of solutions saved by going to Solve Mode > Adjust Number of Solutions and entering the maximum number of host timings to be kept for each run.
To view a solution, simply double-click a solution in the table. A new Solution Viewer window will pop up. The usage of Solution Viewer is described below. To clear the table, go to Solve Mode Options > Clear Table to erase all information in the table.
In the Solution Viewer, the host tree is drawn in black and the parasite tree is drawn in blue. There are five types of events: cospeciation, duplication, host switch, loss, and failure to diverge. For more information about Jane's cost models, see the Cost Model and Event page.
The Solution Viewer will render each event as follows:
|Cospeciation:||A Cospeciation is marked by a hollow colored circle.|
|Duplication:||A Duplication is marked by a solid colored circle.|
|Duplication with Host Switch:||A Host Switch is marked by a duplication, with an arrow following the trajectory of the switching species.|
|Loss:||A Loss is marked by a dashed line.|
|Failure To Diverge:||A Failure to Diverge is marked by a jagged line.|
|Additionally, if the mouse cursor is hovered over a parasite node, the event type will appear.|
Notice that each node in the parasite tree is marked with a colored circle. The colors indicate the existence of other possible locations for the association. A green node means there is a location of lower cost where the parasite node and its descendants may be mapped. A yellow node indicates that there is another location of equal cost, and a red node means that all other locations it may be mapped to are of higher cost. To view this information from within Jane, go to Options > Show Key while in the Solution Viewer.
Each parasite node can be dragged to a new position, as long as that position leads to a possible embedding of the parasite tree on the host tree without requiring changes outside of the sub-tree rooted at the dragged node. When dragging a parasite node, segments of the host tree will highlight in the colors corresponding to the change in cost when the parasite node is moved to that location on the host tree (see the image below). Grey segments indicate that the parasite node cannot currently be moved to that position. Dragging that node to its current location will also move its descendants to their respective optimal locations.
In some situations there may be many lower cost locations on the host tree to move a specific parasite node and it may be useful to know the location that will result in the overall lowest cost. Simply double-clicking on a parasite node will simulate a "drag and drop" of that node to the location of lowest cost.
As a node is dropped on a location, nodes that are descendants of that parasite node will be placed in their optimal positions automatically. If you wish to extensively modify a particular embedding, it is advisable to work your way "down" the tree, starting at the parasite root node. Otherwise, modifications done near the bottom of the parasite tree will be undone when an ancestor of that node is moved. The cost at the top of the window will change according to the modifications.
Jane 4 is capable of providing Support Values in the solution viewer. Support values give the percentage of solutions in which a specific event of the parasite tree is mapped to a given location on the host tree. Below is a small example of this functionality. In this example, 8 percent of solutions have a host switch from the bottom host edge to the top host edge, and 22 percent have a host switch from the top host edge to the middle one.
To view support values, select the menu option "Show Support Values" as seen in the image below. Note that the time it takes to calculate support values will vary based on the size of the trees. Additionally, support values may vary slightly from run to run due to the method of calculation.
Jane is currently limited to providing support values for unmodified solutions.
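The idea of a support value as a percentage can be sketched as follows. This Python example is an illustration only and does not reproduce how Jane calculates support values; the data layout (one dictionary per solution, mapping a parasite node to an event and host location) and all names are assumptions.

```python
from collections import Counter

def support_values(solutions, parasite_node):
    """Return the percentage of solutions that use each placement of a node.

    Each solution maps a parasite node to an (event, host location) pair.
    """
    counts = Counter(solution[parasite_node] for solution in solutions)
    total = len(solutions)
    return {placement: 100.0 * n / total for placement, n in counts.items()}

# Hypothetical set of four solutions for a parasite node named p1.
solutions = [
    {"p1": ("cospeciation", "h2")},
    {"p1": ("cospeciation", "h2")},
    {"p1": ("host switch", "h1")},
    {"p1": ("cospeciation", "h2")},
]
support = support_values(solutions, "p1")
```

In this made-up sample, the cospeciation at h2 appears in three of four solutions and so has a support value of 75 percent.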
Jane 4 has the ability to find solutions for polytomic trees by automatically resolving them. Jane's algorithm attempts to find the best solution from among all possible resolutions.
Host polytomies in Jane are displayed with edges in purple as shown below:
Parasite polytomies are displayed with edges in pink, as below:
There are a few options that govern Jane's handling of polytomies that are covered in detail on the page Polytomy Parameters.
|We now examine a step-by-step example of interacting with the Solution Viewer. We start with the optimal embedding of the two trees (i.e. the initial state upon opening the viewer).|
|Step 1: The parasite root node is being moved to a new location. The entire subtree of the moved node will now be optimally embedded onto the host tree. Since the moving node happens to be the root, the entire parasite tree will be adjusted.|
|Step 2: A different parasite node is now dragged to a new location. Notice that only the subtree of the moved node is rearranged, and all of the other parasite nodes stay at the same location.|
|Step 3: The root of the parasite tree is moved once again. Note that this step undoes the changes made in "Step 2".|
|Step 4: If a node is double-clicked, it will move to its optimal position (by only changing the location of itself and its subtree). In this step, the root of the parasite node is double-clicked, which returns the tree to its optimal embedding onto the host tree (see first picture).|
Jane allows you to save solutions to a file for later use in a few different ways. When you are viewing a solution, click on the File menu to see the options. One option is to Save Timing. This option allows you to save a representation of the solution that can later be reloaded into Jane. The two other options allow you to save the actual image in either zoomed or regular scale. These latter options are for saving the image for use in publications.
We now examine how to save a solution using the Save Timing option. Jane works by considering "timings" of the nodes in the host tree, where a timing is simply an ordering of the speciation events (nodes) of the tree. From a given timing, Jane can very quickly find an optimal solution for that timing. Each solution that Jane presents in the Solution Viewer window is a different timing among the best solutions that Jane can find.
If a solution is of interest, its corresponding timing can be saved (without the manual changes made to the parasite tree within that timing) by going to File > Save Timing inside the Solution Viewer window. When the corresponding problem instance is loaded in the main window, the timing can be reloaded and Jane can reconstruct the corresponding solution. To do this, go to Solve Mode Options > Add Host Timing to Table, open the timing, and it will be added back into the list of solutions at the bottom of the Solve Mode tab. Command-line users can save the best timing found by using the -o switch with the name of a file in which to store the timing. It should be noted that the timing files are written in a human-readable format so that they can be modified manually if desired.
In Stats Mode, Jane generates samples of random parasite trees or tip mappings and then solves these samples to obtain their cost. These costs are used to perform statistical analyses. To use Stats Mode, simply select the Stats Mode tab in the main window.
The problem instance and genetic algorithm parameters can be configured in the same fashion as in Solve Mode. Note that the number of generations and population size for the genetic algorithm are independent between Solve Mode and Stats Mode; the values used in Solve Mode are not carried over to Stats mode and vice versa.
In the Statistical Parameters box, the Sample Size (the number of random problem instances to be generated and solved) can be set. To include the original problem instance in the sample, check the Include Original Problem Instance box.
Since the computation required to perform the randomization tests can be substantial for large trees, it may be desirable to distribute the work over multiple computers by running a small number of samples on each machine. Jane is multithreaded and will use all of the cores available on each machine.
Checking the "include original problem instance" box will solve the original problem instance in addition to the random ones and generate comparison data. This option is included in the event that you wish to split the randomization tests over multiple computers. On one computer, the original problem instance will be evaluated in addition to some randomized instances. On the other computers, only randomized instances need to be solved. Note that if you choose to save the sample costs, the original problem instance cost will not be included in these. Thus, make sure you write it down if you need it later!
Random Tip Mapping: In this method, the tip mappings are permuted randomly. Each host tip will have the same degree (number of associated parasite tips) as the original problem instance. For example, if in the original problem instance there are two host tips each with degree two, the randomized problem instance will maintain this characteristic after randomization. If all parasite tips have a degree of one (that is, each parasite is associated to exactly one host) then the randomized tip mapping will be selected uniformly from all possible mappings that maintain the host tip degrees.
If any parasites have degrees greater than one then the randomized mapping will preserve the host degrees and will ensure that each parasite has a degree of at least one. However, the mapping will no longer be selected uniformly from all such mappings but instead the probability of an edge appearing between a host and a parasite is equal to the degree of that host divided by the total number of parasites.
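For the simple case in which every parasite tip is associated with exactly one host tip, a degree-preserving randomization can be sketched by shuffling the host assignments among the parasites. The Python below is an illustration under that assumption only; it does not cover multi-host parasites and is not Jane's implementation. The function names and the sample mapping are hypothetical.

```python
import random
from collections import Counter

def randomize_tip_mapping(mapping, rng):
    """Shuffle which parasite tip sits on which host tip.

    mapping maps each parasite tip to its single host tip. Shuffling the
    list of hosts keeps every host's degree unchanged.
    """
    parasites = list(mapping)
    hosts = list(mapping.values())
    rng.shuffle(hosts)
    return dict(zip(parasites, hosts))

def host_degrees(mapping):
    """Count how many parasite tips are associated with each host tip."""
    return Counter(mapping.values())

# Hypothetical instance: host hA carries two parasites, hB and hC one each.
original = {"p1": "hA", "p2": "hA", "p3": "hB", "p4": "hC"}
shuffled = randomize_tip_mapping(original, random.Random(1))
```

After the shuffle, hA still has degree two and hB and hC still have degree one; only the identity of the parasites on each host changes.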
After clicking Go to perform statistical analyses, the resulting cost distribution is displayed in the histogram on the bottom left of the Stats Mode tab, and basic statistical values are shown on the bottom right of the Stats Mode tab.
Cost Histogram:
The Cost Histogram section will show the result of the run in histogram format, where the horizontal axis represents the cost of the sample and the vertical axis represents the number of samples with the corresponding cost. If included, the cost of the solution to the original problem instance will appear as a red dashed line. Click Save as Picture to save the histogram as a Portable Network Graphics (.png) file, or click Show Histogram in New Window to see the histogram in a larger, separate window.
Robustness, range costs, and the p-Value Histogram:
You might wonder about the robustness of the statistical results. After all, they might be quite different if you had chosen different event costs. Recall that in the Settings > Set Costs menu, you can set ranges of costs for the different events. (These event costs must be integers.) If you choose a range of costs, you will see a view like this at the bottom of the Jane window after pressing Go.
You can now view the Cost Histogram for each of the value combinations by simply clicking on the particular event button and selecting the particular cost.
In addition, you can view a summary of the p-values over all of those different cost combinations by clicking on p-value Histogram. Hovering with your mouse over each histogram bar will provide additional information.
The color coding of the bars in the p-value histogram is as follows: The red bar is the p-value for the currently selected cost values (the values selected from the pull-down menus at the bottom of the window). The gray and black bars indicate all other p-values. The bars with p-values less than 0.1 are shades of gray and those greater than 0.1 are black.
Statistics Window:
The Statistics window contains statistics obtained from the randomization tests, including the mean, standard deviation, and, if the original problem instance is included, the percentile rank of the original compared to the random. The results of the runs can be saved as a comma separated values file by clicking Save Sample Costs. Excel and most other popular spreadsheet applications can open this type of file. Note that even if you check the include original problem instance check box, the cost for the solve of the original problem instance will NOT be saved! You need to write it down yourself.
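If you have saved sample costs and noted the original cost by hand, the same kind of summary can be recomputed outside of Jane. The Python sketch below is a rough illustration; the tutorial does not state exactly how Jane defines its percentile rank or which standard deviation it reports, so the fraction of samples at or below the original cost and the sample standard deviation used here are assumptions, as are the function name and the numbers.

```python
import statistics

def summarize(sample_costs, original_cost=None):
    """Summarize random-sample costs and compare them to the original cost."""
    summary = {
        "mean": statistics.mean(sample_costs),
        "stdev": statistics.stdev(sample_costs),  # sample standard deviation
    }
    if original_cost is not None:
        # Fraction of random instances that cost no more than the original.
        at_or_below = sum(1 for cost in sample_costs if cost <= original_cost)
        summary["fraction_at_or_below"] = at_or_below / len(sample_costs)
    return summary

# Hypothetical sample costs and a hand-recorded original cost.
summary = summarize([10, 12, 14, 16], original_cost=11)
```

A small fraction means that few random instances were as cheap as the original, which is the pattern expected when the original host and parasite trees are more congruent than chance would suggest.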
Jane can be invoked from the command line, rather than by launching the graphical
user interface, as follows. Note that treefile refers
to the name of an input file in either .tree or .nex format.
java -cp Jane.jar edu.hmc.jane.CLI treefile
The -help flag may be used to get more information about the command line options. These options are also summarized below:
By default, Jane will solve for associations of trees described in treefile, then output the best host timing found and the association information for each parasite node. Stats Mode can be invoked by using -stats option; Jane will output the statistics obtained from the generated samples. There are options available for configuring Jane as listed below:
|-help||Prints help message|
|-V||Turns on verbose output - Jane will report the minimum cost and number of host timings found at each generation, rather than just at the end.|
|-C||Causes Jane to evaluate costs using a "node-based" cost model rather than the default edge-based cost model. (Note though that node-based cost model is the default in the GUI version of Jane.)|
|-c <cosp dup switch loss ftd>||This defines the cost vector to use; e.g., -c 0 1 2 3 4 would cause cospeciations to cost 0, duplications to cost 1, host switches to cost 2, losses/sorting to cost 3, and failures to diverge to cost 4 (default: 0, 1, 1, 2, 1).|
|-m <value>||Sets the mutation rate. Appropriate values fall on the interval [0, 1] with 0 being never mutate and 1 being always mutate (default: 0.6).|
|-p <value>||Sets the initial population size (default: 30).|
|-i <value>||Sets the number of generations that the genetic algorithm should run (default: 30).|
|-s <value>||Sets the selection strength. 0 is completely random and there is no upper bound (default: 0.8).|
|-S <value>||Sets the maximum host switch distance allowed. -1 causes the distance to be unlimited (default: -1).|
|-stats <samplesize>||Switches Jane to Stats Mode and sets the number of samples to <samplesize>. By default, the tip mapping is randomized.|
|-I||When in stats mode, also does a solve of the original tip mapping/trees and prints data comparing it to the random sample. Note that the sample distribution will NOT include the cost of this solution. -stats must be invoked first.|
|-B <value>||Switches to parasite tree randomization using the Yule model where the value is Yule parameter. -stats must be invoked first.|
|-o <filename>||For Solve Mode, outputs the best host timing to a file called <filename>. This file can then be viewed in the graphical version of Jane. For Stats Mode, outputs the sample costs as a comma-separated values file (generally .csv or .xls) to a file called <filename>.|
|-silent||This causes Jane to produce absolutely no output to the console (though it will still write any files that are specified). Any operation that would normally print something out will still run, but it won't print anything out.|
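For scripted experiments, it can be convenient to assemble the command from a few settings. The Python sketch below builds an argument list using only flags from the table above. It is a hypothetical helper, not something shipped with Jane: the function name and the file names are made up, and placing the options after the tree file is an assumption, so check the output of -help before relying on it.

```python
def jane_cli_command(treefile, costs=None, population=None, generations=None,
                     stats_samples=None, include_original=False, output=None):
    """Build an argument list for the command-line version of Jane.

    Assumption: options are placed after the tree file name.
    """
    cmd = ["java", "-cp", "Jane.jar", "edu.hmc.jane.CLI", treefile]
    if costs is not None:
        if len(costs) != 5:
            raise ValueError("costs must be (cosp, dup, switch, loss, ftd)")
        cmd += ["-c"] + [str(cost) for cost in costs]
    if population is not None:
        cmd += ["-p", str(population)]
    if generations is not None:
        cmd += ["-i", str(generations)]
    if stats_samples is not None:
        cmd += ["-stats", str(stats_samples)]
        if include_original:
            cmd.append("-I")  # -stats must be given before -I
    elif include_original:
        raise ValueError("-I requires Stats Mode (-stats)")
    if output is not None:
        cmd += ["-o", output]
    return cmd

# Hypothetical Stats Mode run: 20 samples, original instance included.
command = jane_cli_command("example.tree", costs=(0, 1, 2, 1, 1),
                           population=100, generations=50,
                           stats_samples=20, include_original=True,
                           output="costs.csv")
print(" ".join(command))
```

The returned list can be passed directly to `subprocess.run` from the Python standard library, which avoids quoting problems when file names contain spaces.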