Monday, September 22, 2014

2014 Frontiers in Phylogenetics Symposium

Last week I presented a talk entitled "Deep metazoan phylogeny and the utility of taxon-specific ortholog sets" at the Smithsonian Frontiers in Phylogenetics symposium. This year's theme was Genome Scale Phylogenetics: Analyzing the Data. My talk covered my research using custom, taxon-specific core ortholog sets in the program HaMStR as a tractable alternative to more computationally intensive approaches, such as methods based on all-versus-all BLAST (e.g., OrthoMCL) and similar methods (e.g., OMA).

Here's a link to a video of the talk for anyone interested in checking it out (Guillermo's introduction to my talk starts around 1:28):
http://www.ustream.tv/recorded/52713111

Here's the abstract:
Orthology determination based on a set of pre-defined orthologs is a powerful and computationally tractable approach to identify molecular data suitable for genome-scale phylogenetics. However, genes that are single copy among the taxa used to prepare such a set may have undergone lineage-specific duplications in other clades, suggesting taxon-specific core ortholog sets are advantageous. I will present studies of deep metazoan phylogeny based on custom sets of core orthologs that are specific to the taxon sampling of a given project. By selecting a small but representative subset of around 5-10 taxa with high-quality transcriptomes (and ideally at least one taxon with a high-quality genome) and using all-versus-all BLAST, OrthoMCL, and our PhyloTreePruner software, HaMStR core ortholog sets that are specific to a group of interest can easily be generated. Importantly, these core ortholog sets tend to contain many more genes than sets based on broader taxon sampling. Starting from a larger pool of orthologous sequences allows for stricter gene selection criteria, enabling the investigator to exclude genes with, e.g., large amounts of missing data, few unambiguously aligned positions, a high rate of evolution, or different rates of evolution among taxa.
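
To give a rough idea of what "stricter gene selection criteria" can look like in practice, here is a minimal Python sketch of one such filter: dropping genes whose alignments have too few taxa or too much missing data. This is only an illustration, not the pipeline from the talk. The function names, the thresholds (min_taxa, max_missing), and the choice to count "-", "?", and "X" as missing are all assumptions made for this example.

```python
# Illustrative sketch only: names and thresholds below are hypothetical,
# not taken from HaMStR, OrthoMCL, or PhyloTreePruner.

# Characters treated as missing data in an alignment (an assumption).
MISSING = set("-?Xx")


def parse_fasta(text):
    """Parse FASTA-formatted text into a {taxon: sequence} dict."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the taxon name.
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def missing_fraction(seqs):
    """Return the fraction of alignment cells that are missing data."""
    total = sum(len(s) for s in seqs.values())
    if total == 0:
        return 1.0
    missing = sum(1 for s in seqs.values() for c in s if c in MISSING)
    return missing / total


def keep_gene(seqs, min_taxa=4, max_missing=0.25):
    """Keep a gene only if enough taxa are sampled and gaps are limited."""
    return len(seqs) >= min_taxa and missing_fraction(seqs) <= max_missing


# Two toy alignments: one well sampled, one dominated by missing data.
well_sampled = parse_fasta(">A\nMKTL\n>B\nMKTL\n>C\nMK-L\n>D\nMKTL\n")
gappy = parse_fasta(">A\nM---\n>B\nMKTL\n>C\n----\n>D\nMK??\n")

print(keep_gene(well_sampled))
print(keep_gene(gappy))
```

With a larger starting pool of orthologs, thresholds like these can be tightened without leaving too few genes for the analysis. Filters for the other criteria mentioned (alignment quality, evolutionary rate) would need their own tools and are not shown here.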