Wednesday, July 6, 2011
Scientific Reputation
A recent editorial in PLoS Computational Biology on building and maintaining a good scientific reputation just came to my attention. Much of this is probably already known and practiced by most, but it never hurts to see it again and consider it closely.
Tuesday, June 28, 2011
Ascertainment bias data sets?
Hi folks,
I'm working on a simulation study looking at the effects of using ascertained SNP data for phylogenetic and phylogeographic reconstruction. I'm basing my simulations on real data sets. I have a few but am looking for more!
Have any of you seen recent studies that use single nucleotide loci selected because they were polymorphic in some ascertainment panel? I was mostly thinking of SNP-chip-based data sets, but there are others too.
For example, Decker et al. 2009 used marker loci known to be polymorphic in cattle to reconstruct divergences up to 29 mya across Bovidae.
SNP chip data are cheap and easy to get, and I want to figure out how far you can push them!
Send me an email! ejmctavish [at] utexas.edu
Thanks,
ejm
Fig. 1 from Decker et al. 2009. Strict consensus cladogram (no branch lengths) of 17 most parsimonious trees based on 40,843 SNP genotypes. * denotes a paraphyletic group.
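To make the concern concrete, here is a minimal Python sketch of how ascertainment can skew a data set. It is a toy model, not the simulation design described above: allele frequencies are drawn from an assumed 1/x spectrum, the panel and sample sizes are made-up values, and every function name is hypothetical. Sites are kept only if they are polymorphic in a small panel, and the mean minor allele frequency in a larger sample is compared before and after that filter.

```python
import random


def simulate_site(rng, n_panel, n_sample):
    """Return derived-allele counts in the panel and in the study sample."""
    # Assumption: population frequency has density proportional to 1/x
    # on [0.01, 0.99], a rough stand-in for a neutral frequency spectrum.
    lo, hi = 0.01, 0.99
    freq = lo * (hi / lo) ** rng.random()
    panel = sum(rng.random() < freq for _ in range(n_panel))
    sample = sum(rng.random() < freq for _ in range(n_sample))
    return panel, sample


def mean_maf(counts, n):
    """Mean minor allele frequency over sites, given allele counts out of n."""
    mafs = [min(c, n - c) / n for c in counts]
    return sum(mafs) / len(mafs)


def run(n_sites=20000, n_panel=4, n_sample=50, seed=1):
    """Compare mean MAF for all sites versus panel-ascertained sites."""
    rng = random.Random(seed)
    sites = [simulate_site(rng, n_panel, n_sample) for _ in range(n_sites)]
    all_counts = [s for _, s in sites]
    # Ascertainment: keep a site only if it is polymorphic in the panel.
    ascertained = [s for p, s in sites if 0 < p < n_panel]
    return (mean_maf(all_counts, n_sample),
            mean_maf(ascertained, n_sample),
            len(ascertained))


if __name__ == "__main__":
    all_maf, asc_maf, kept = run()
    print(f"all sites:         mean MAF = {all_maf:.3f}")
    print(f"ascertained sites: mean MAF = {asc_maf:.3f} ({kept} kept)")
```

In this toy setup the ascertained sites should show a higher mean minor allele frequency than the full set, because rare variants are seldom polymorphic in a small panel.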
Tuesday, May 24, 2011
2011 and admixture!
Hi all!
It has been a long hiatus, but I just heard a talk that I thought you all would find interesting, and Evolution is coming up, so I thought I would post!
I am in Okinawa at OIST for a workshop on genomics, with a focus on linkage and recombination. Today we had talks from Gil McVean and Simon Myers, both of which were exciting, but it was Simon's work that I found particularly applicable to questions that I (we, perhaps?) am interested in. He was involved in the development of HAPMIX, described in Price et al. 2008, a program for inferring the ancestry of chromosome segments in admixed populations. I am planning to apply it to my longhorn data set, and I'll let you know how it goes, but it appears to have some advantages over using site likelihoods in STRUCTURE to estimate ancestry of alleles. The downside might be that it requires denser genomic data than I have.
He also presented as yet unpublished work (I think in collaboration with Daniel Falush and Garrett Hellenthal) on using inferred patterns of introgression between populations to estimate levels and times of admixture, without ever actually estimating the blocks. I'm not clear on exactly how it works... Because he is doing this in human populations, the inferred events line up with fascinating historical ones. Using a 28-year estimate for generation time, he was able to place times of admixture events fairly precisely, and these were corroborated by known historical migrations. Reconstructing Spanish introgression into native North American populations in the 1700s isn't terribly exciting, but inferring European ancestry from 400 BC in the Kalash people of northern Pakistan, who have an oral history of being descended from the armies of Alexander the Great, is pretty cool. I'm hoping I can apply these methods to my cattle SNP data. I have already been claiming that Moors brought African cattle to Spain, but having dates to back it up would make it a lot more plausible...
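The date arithmetic behind such estimates is simple enough to sketch. The Python snippet below assumes the 28-year generation time mentioned in the talk and a 2011 sampling year; the function names are hypothetical, and the generation counts are illustrative values, not results from the talk.

```python
def admixture_year(generations_ago, sample_year=2011, years_per_generation=28):
    """Approximate calendar year of an admixture event.

    Negative values mean years BCE (the missing year zero is ignored,
    which is well within the uncertainty of such estimates).
    """
    return sample_year - generations_ago * years_per_generation


def generations_since(event_year, sample_year=2011, years_per_generation=28):
    """Number of generations separating an event year from the sample year."""
    return (sample_year - event_year) / years_per_generation


if __name__ == "__main__":
    # Illustrative only: an event 10 generations back.
    print(admixture_year(10))
    # Illustrative only: how many generations separate 400 BC from 2011.
    print(round(generations_since(-400), 1))
```

Because the conversion is linear, any error in the assumed generation time scales the inferred date directly, so older events carry proportionally more uncertainty in calendar years.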
Anyhow, I think HAPMIX might be a really useful resource whose existence had slipped past me, and I look forward to this new method being published.
Hope to see some of you folks in Norman next month. I am really looking forward to the next-gen data in phylogeography symposium!
Monday, October 18, 2010
Bayesian NCPA?
Hi all,
I just read "Three roads diverged? Routes to phylogeographic inference" by Erik W. Bloomquist, Philippe Lemey, and Marc A. Suchard, out next month in TREE, and I'm confused! They review some interesting work in phylogeography, but also talk up a "Bayesian NCPA" method. However, my understanding of the method they discuss is that although it takes into account uncertainty in the reconstruction of the haplotype network, it doesn't modify the inference key at all, and the key seemed to me to be the most problematically inscrutable part of NCPA. Have any of you folks looked at this? Is there anything fresh there?
Also, what are your impressions of the method they describe as "spatial diffusion"? My understanding from the Lemey et al. 2009 paper is that it is akin to reconstructing location as a continuous character on the phylogeny under a variety of models. Have any of you tried this method? What are your thoughts?
Thanks!
Emily
Thursday, October 14, 2010
Sandwalk: Philosophers, Science, and Creationism
I don't entirely disagree with this, but... in what sense is "the supernatural did it" an explanation at all? If that is a tough question, then so is "proving naturalism couldn't do it". Criticizing some particular scientific theory is not the same as showing that no natural explanation is possible. It is easy to see how reasonable people might say this stuff gets beyond mere science.
Sunday, July 4, 2010
Building a Simple Application Using 'libsequence'
Folks,
As I discuss in my programming language comparison post, for performance, C++ is a natural choice. Calculating the simplest set of summary statistics on a set of data may take 30 seconds when using my beloved Python, but just a few seconds using C++. When you need to crunch through thousands or even tens of thousands of datasets, this makes a HUGE difference!
Of course, while writing the script to do so in Python takes less than 20 minutes, writing the equivalent C++ program can easily take several hours or even the whole day, depending on your memory gremlins. A lot of the development time can be saved by making use of existing libraries. One of these that is particularly relevant to phylogeography is Kevin Thornton's libsequence. This library implements a broad range of summary statistics calculations in C++, and also features a fairly robust FASTA-format parser.
Using this library, I wrote a simple program to calculate Fst statistics from a FASTA file in less than an hour, and a lot of this time was spent in just dealing with parsing/processing the user options and arguments. As I note in the comments in the post, however, for a production-grade application, I would rather use the much more robust and flexible command-line parser that I wrote for my Ginkgo application, which would cut down development time on this aspect of the program by a dramatic amount.
In addition, it took me a little while longer than I anticipated to figure out how to get the whole thing to compile and link using gcc. As such, I also thought that I would share a simple general recipe for compiling and linking a libsequence application using gcc.
Because Blogger sucks at code layout (or because I suck at getting Blogger to lay out code correctly), the sample code and build instructions are presented in a post on my personal site instead of here.
I have also written up an "autoconf" and "automake" project skeleton that automates most of the build process. It can be found in an attachment to the post mentioned above (direct link here).
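For a feel of the calculation itself, here is a minimal Python sketch of one common sequence-based Fst estimator, Fst = 1 - Hw/Hb (Hudson, Slatkin and Maddison 1992), where Hw is the mean number of pairwise differences within populations and Hb the mean between them. This is not the C++ program described above and does not use libsequence; the estimator choice, the function names, and the convention of reading the population from the sequence-name prefix (e.g. "A_1") are all assumptions made for illustration.

```python
from itertools import combinations, product


def read_fasta(lines):
    """Parse FASTA text lines into a {name: sequence} dict."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def group_by_prefix(seqs):
    """Assumed convention: population label is the name up to the first '_'."""
    pops = {}
    for name, seq in seqs.items():
        pops.setdefault(name.split("_")[0], []).append(seq)
    return pops


def n_diffs(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def hudson_fst(pop1, pop2):
    """Fst = 1 - Hw/Hb for two populations of aligned sequences.

    Needs at least two sequences per population and at least one
    difference between populations (otherwise Hb is zero).
    """
    within1 = mean(n_diffs(a, b) for a, b in combinations(pop1, 2))
    within2 = mean(n_diffs(a, b) for a, b in combinations(pop2, 2))
    h_w = (within1 + within2) / 2
    h_b = mean(n_diffs(a, b) for a, b in product(pop1, pop2))
    return 1 - h_w / h_b


if __name__ == "__main__":
    example = ">A_1\nAAAA\n>A_2\nAAAT\n>B_1\nTTTT\n>B_2\nTTTA\n".splitlines()
    pops = group_by_prefix(read_fasta(example))
    print(round(hudson_fst(pops["A"], pops["B"]), 3))
```

The all-pairs loops are quadratic in the number of sequences, which is exactly the kind of inner loop where a compiled implementation pays off on large batches of data sets.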
Sunday, June 13, 2010
Evolution 2010 Fast Approaching
It's hard to believe, but Evolution 2010 in Portland, OR is right around the corner. According to the meeting website, it will be the largest Evolution meeting ever, with > 1,800 registrants and > 1,050 talks. Judging from the recently posted program, the meeting will have quite a lot to offer to those interested in all things phylogenetic. I count 6 sessions on "phylogenetic theory", a whopping 13 sessions on "phylogeography", 12 sessions on "phylogenetics & diversification", and lots of other related topics. Unfortunately, many of these sessions overlap on Saturday and Sunday. I hope everyone arrives rested and with a large coffee mug! (I'm serious about the mugs. The organizers are pushing to reduce waste at the meeting and are asking everyone to bring their own reusable beverage containers.)
On a side note, I will be acting as a mentor on Saturday for the Undergraduate Diversity program. Part of the program's purpose is to help participating undergraduates network by meeting researchers in the field. So, if you see me on Saturday, please stop for a quick hello and introductions.
Looking forward to seeing everyone who can make it!