Friday, May 21, 2010

Transferring Support Values Between Trees

Papers reporting phylogenetic trees often label some point estimate of the phylogeny (e.g., an ML tree) with support values derived from several methods (e.g., posterior probabilities, ML bootstraps, and MP bootstraps). Perhaps I'm merely ignorant of the available software, but I have never been able to find a way to easily transfer multiple support values from various consensus trees to a point estimate without having to recalculate the consensus values from the original collections of trees. This problem is particularly acute when the consensus trees differ in topology.

As general Python programming practice, as well as a way to learn to use Jeet and Mark's great DendroPy library, I decided to take a crack at fixing this problem. I wrote a Python script, creatively titled transferBranchLabels.py, that takes a single point estimate of the phylogeny (with or without branch lengths) and an arbitrary number of trees with support values as command-line arguments. It then returns the point estimate (retaining branch lengths, if originally present) with all the support values labeling the internal nodes. If a node is present in the point estimate (target tree) but not in one of the support trees, the node gets a '-' in place of that tree's support value. The output tree simply includes all the support values separated by some delimiter (I've used '/' for now). TreeView X doesn't like that format, but FigTree handles it just fine.
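To make the idea concrete, here is a minimal sketch of the label-transfer logic in plain Python. It is not the actual transferBranchLabels.py and does not use DendroPy. The function names (parse_newick, clades, to_newick, transfer_labels) are made up for illustration, the tiny Newick parser ignores quoted labels and comments, and clades are matched by their leaf sets, which assumes all trees are rooted the same way (a real tool would more likely compare bipartitions).

```python
import re

SPECIAL = set("(),;:")

def parse_newick(text):
    """Parse a ';'-terminated Newick string into nested dicts (hypothetical helper)."""
    tokens = re.findall(r"[(),;:]|[^(),;:\s]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"label": None, "length": None, "children": []}
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            while True:
                node["children"].append(parse_node())
                if tokens[pos] != ",":
                    break
                pos += 1                  # consume ","
            pos += 1                      # consume ")"
        if tokens[pos] not in SPECIAL:    # taxon name or support value
            node["label"] = tokens[pos]
            pos += 1
        if tokens[pos] == ":":            # branch length, kept as text
            node["length"] = tokens[pos + 1]
            pos += 2
        return node

    return parse_node()

def clades(node, found):
    """Return the leaf set below node; record each internal node in found."""
    if not node["children"]:
        return frozenset([node["label"]])
    leaves = frozenset().union(*(clades(c, found) for c in node["children"]))
    found[leaves] = node
    return leaves

def to_newick(node):
    """Write a node (and everything below it) back out as Newick text."""
    text = ""
    if node["children"]:
        text = "(" + ",".join(to_newick(c) for c in node["children"]) + ")"
    text += node["label"] or ""
    if node["length"] is not None:
        text += ":" + node["length"]
    return text

def transfer_labels(target_newick, support_newicks, delimiter="/", missing="-"):
    """Label each internal node of the target with one value per support tree."""
    target = parse_newick(target_newick)
    target_clades = {}
    all_leaves = clades(target, target_clades)
    support_maps = []
    for newick in support_newicks:
        found = {}
        clades(parse_newick(newick), found)
        support_maps.append(found)
    for leaves, node in target_clades.items():
        if leaves == all_leaves:
            continue                      # the root clade carries no support
        values = []
        for found in support_maps:
            match = found.get(leaves)
            values.append(match["label"] if match and match["label"] else missing)
        node["label"] = delimiter.join(values)
    return to_newick(target) + ";"

# Toy example: the second support tree does not contain the (C,D) clade.
print(transfer_labels(
    "((A:1,B:1):0.5,(C:1,D:1):0.5);",
    ["((A,B)0.99,(C,D)0.80);", "((A,B)75,C,D);"],
))
# prints ((A:1,B:1)0.99/75:0.5,(C:1,D:1)0.80/-:0.5);
```

Keying each internal node by the frozen set of taxa below it means topological differences between the support trees and the target simply show up as missing dictionary entries, which is where the '-' placeholder comes from.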

Here's an example run of the program:


Note that the program reports how many of the clades in each support tree were (not) found in the target tree. Here are the corresponding trees from FigTree:


Upper: Original ML tree. Lower: Labeled ML tree.


Upper: Consensus tree from a posterior distribution. Lower: Consensus tree from ML bootstrapping.

If this would be a useful utility for you, feel free to download it and give it a try. Let me know how it goes.