Friday, May 21, 2010

Transferring Support Values Between Trees

Papers reporting phylogenetic trees often label some point estimate of the phylogeny (e.g., an ML tree) with support values derived from several methods (e.g., posterior probabilities, ML bootstraps, and MP bootstraps). Perhaps I'm merely ignorant of the available software, but I have never been able to find a way to easily transfer multiple support values from various consensus trees to a point estimate without having to recalculate the consensus values from the original collections of trees. This problem is particularly acute when the consensus trees differ in topology.

As general Python programming practice, as well as a way to learn to use Jeet and Mark's great DendroPy library, I decided to take a crack at fixing this problem. I wrote a Python script, creatively titled transferBranchLabels.py, that takes a single point estimate of the phylogeny (with or without branch lengths) and an arbitrary number of trees with support values as command-line arguments. It then returns the point estimate (retaining branch lengths, if originally present) with all the support values labeling the internal nodes. If a node is present in the point estimate (target tree) but not in one of the support trees, the node is labeled with '-' for that tree. The output tree simply includes all the support values separated by some delimiter (I've used '/' for now). TreeView X doesn't like that format, but FigTree handles it just fine.
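For readers who want to see the idea in miniature, here is a rough sketch of the label-transfer logic. It is not the code in transferBranchLabels.py and it does not use DendroPy. It assumes a toy tree representation (a leaf is a string, an internal node is a `(children, support)` tuple), assumes all trees are rooted the same way so that clades can be matched by their leaf sets, and leaves the root unlabeled. The function names are made up for illustration.

```python
def leaf_set(node):
    """Return the frozenset of leaf names at or below a node."""
    if isinstance(node, str):
        return frozenset([node])
    children, _support = node
    return frozenset().union(*(leaf_set(child) for child in children))


def clade_supports(node, table=None):
    """Map every internal clade (a frozenset of leaf names) to its support value."""
    if table is None:
        table = {}
    if not isinstance(node, str):
        children, support = node
        table[leaf_set(node)] = support
        for child in children:
            clade_supports(child, table)
    return table


def to_labeled_newick(node, tables, delim="/", missing="-", is_root=True):
    """Write the target tree as Newick, labeling each internal node with one
    value per support tree, or `missing` when that tree lacks the clade."""
    if isinstance(node, str):
        return node
    children, _support = node
    inner = ",".join(
        to_labeled_newick(child, tables, delim, missing, False)
        for child in children
    )
    if is_root:
        # The root clade contains every leaf, so a support value means nothing here.
        return "(" + inner + ");"
    clade = leaf_set(node)
    label = delim.join(str(table.get(clade, missing)) for table in tables)
    return "(" + inner + ")" + label


# Target (point estimate) without support values: ((A,B),(C,(D,E)));
target = ([(["A", "B"], None), (["C", (["D", "E"], None)], None)], None)
# Posterior consensus, different topology: ((A,B)0.98,((C,D)0.61,E)0.95);
posterior = ([(["A", "B"], 0.98), ([(["C", "D"], 0.61), "E"], 0.95)], None)
# Bootstrap consensus, same topology as the target: ((A,B)88,(C,(D,E)72)90);
bootstrap = ([(["A", "B"], 88), (["C", (["D", "E"], 72)], 90)], None)

tables = [clade_supports(posterior), clade_supports(bootstrap)]
labeled = to_labeled_newick(target, tables)
print(labeled)  # ((A,B)0.98/88,(C,(D,E)-/72)0.95/90);
```

The (D,E) clade is absent from the posterior consensus, so it gets '-' in the first slot. Matching on leaf sets is what lets the transfer work even when the consensus trees differ in topology from the target.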

Here's an example run of the program:


Note that the program reports how many of the clades in each support tree were (not) found in the target tree. Here are the corresponding trees from FigTree:


Upper: Original ML tree. Lower: Labeled ML tree.


Upper: Consensus tree from a posterior distribution. Lower: Consensus tree from ML bootstrapping.

If this would be a useful utility for you, feel free to download it and give it a try. Let me know how it goes.

4 comments:

  1. Nice, I was just thinking about how to do this, you've saved me the trouble!!

  2. Hi Jeremy,

    Nice work!

    I wonder if you might find things easier simply to recompose the internal labels as the desired string, and then rely on native writing methods.

    That is (with due caveats about Blogger messing up the formatting):

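    ##############################################
    # Collapse each node's list of support values into one delimited string
    for nd in tree:
        nd.label = delim.join([str(s) for s in nd.label])
    # Let DendroPy's own writer serialize the relabeled tree
    tree.write_to_path(path, "nexus")
    ##############################################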

    This would avoid the need to reimplement your own writer/formatter, and gives you easy access to write the tree out in any format that DendroPy supports through its "write_to_*()" methods.

  3. P.S. I think Blogger has got the most code(r)-unfriendly interface I have ever dealt with! As far as I can tell, there is *no* way to get anything even close to formatted code in comments!

    Why doesn't the whole world run on good old plain text?

  4. Nick,

    Let me know how it goes.

    Jeet,

    That's a great suggestion. I think I did it the way I did partly because I wasn't sure how DendroPy handled the output of node labels and partly because I wanted some practice writing a recursive function. Regardless, your way is much more efficient!

    I agree about Blogger's interface. Putting images in blog posts is also a pain, as it always places them at the top of the post and leaves you to move them into place.
