Folks,
As I discuss in my
programming language comparison post, for performance, C++ is a natural choice. Calculating the simplest set of summary statistics on a set of data may take 30 seconds when using my beloved Python, but just a few seconds using C++. When needing to crunch through thousands or even tens of thousands of datasets, this makes a HUGE difference!
Of course, if writing the script to do so in Python takes less than 20 minutes, writing the equivalent C++ program can easily take several hours or even the whole day depending on your memory gremlins. A lot of the development time can be saved by making use of existing libraries. One of these that is particularly relevant to phylogeography is Kevin Thornton's
libsequence. This library implements a broad range of summary statistics calculations in C++, and also features a fairly robust FASTA-format parser.
Using this library, I wrote a simple program to calculate Fst statistics from a FASTA file in less than an hour, and a lot of this time was spent in just dealing with parsing/processing the user options and arguments. As I note in the comments in the post, however, for a production-grade application, I would rather use the much more robust and flexible
command-line parser that I wrote for my
Ginkgo application, which would cut down development time on this aspect of the program by a dramatic amount.
In addition, it took me a little while longer than I anticipated to figure out how to get the whole thing to compile and link using
gcc. As such, I also thought that I would share a simple general recipe for compiling and linking a
libsequence application using
gcc.
Because blogger sucks at code layout (or because I suck at getting blogger to layout code correctly), the actual sample code and build instructions are actually presented
in a post on my personal site instead of here.
I have also written up a "
autoconf" and "
automake" project skeleton that automates most of the build process. This can be found in an attachment to
the post mentioned above (direct link
here).