January 22nd, 2009
AGBT 2009 will be starting in about two weeks in Marco Island, FL. The 10th annual AGBT meeting looks like it is going to be very impressive. With all of the advances in sequencing and genomics over the last year and the announcement of future platform updates, there should be a lot to take in. I will be speaking at this year's meeting so, if you are attending, please consider coming to give a listen. My talk will be Thursday evening, 5 Feb 2009, in the Bioinformatics session chaired by Deanna Church. Here is the abstract of my talk.
Massively parallel sequencing platforms are generating sequence data at a rate that is rapidly overwhelming the informatics community’s analysis throughput. From large projects like 1000 Genomes to individual investigator labs with a single next-generation sequencing instrument, bioinformaticians are being bombarded with unprecedented amounts of data. To begin to address this issue, much recent bioinformatics research has focused on developing faster algorithms for alignment, assembly, annotation, etc., resulting in a proliferation of analysis tools. However, faster algorithms for traditional analyses are only one aspect of the recent bioinformatics explosion. The richness of data now available to bioinformaticians allows investigators to ask new types of questions requiring new algorithms and new approaches, e.g., in the fields of medical and population genomics. The data and tools explosion has created a dichotomy for the bioinformatician: perform high-throughput analysis on large volumes of samples while evaluating many different tools and analysis workflows. High-throughput pipelines are linear and static and require detailed tracking. Evaluating tools and comparing results from different workflows are non-linear and ever-changing (especially in the current environment of rapidly evolving sequencing technologies and analysis tools) and typically involve many false starts that are not tracked. To reconcile these contrasting requirements, we have developed a software framework for genome analysis that combines ease-of-use, detailed tracking and reporting, flexible chaining of tools to define workflows, results comparison, and scalability. The design of this framework heavily leverages the lessons we have learned in developing a laboratory information management system for high-throughput genome sequencing in an environment implementing continual process improvements. 
The result is a next-generation informatics framework that will allow bioinformatics tools and workflows to be part of the experimental design of sequencing projects.
Sounds exciting, eh? Looking at the other talks in the session, it seems I should not delve too much into the coding details and focus more on the motivation and high-level concepts used in our system.
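To make the "flexible chaining of tools" idea from the abstract a bit more concrete, here is a minimal sketch of what composing and tracking analysis steps might look like. This is purely illustrative, not code from our actual framework; the `Step` and `Workflow` names and the toy stand-in tools are all hypothetical.

```python
class Step:
    """A named analysis step wrapping a callable (e.g. an aligner)."""
    def __init__(self, name, func):
        self.name = name
        self.func = func

class Workflow:
    """Chains steps in order and records what ran, giving the
    detailed tracking the abstract mentions."""
    def __init__(self, steps):
        self.steps = steps
        self.log = []  # provenance: (step name, input, output)

    def run(self, data):
        for step in self.steps:
            result = step.func(data)
            self.log.append((step.name, data, result))
            data = result
        return data

# Toy stand-ins for real tools (alignment, variant filtering, etc.)
align = Step("align", lambda reads: sorted(reads))
call = Step("call", lambda aln: [r for r in aln if r.startswith("chr1")])

wf = Workflow([align, call])
out = wf.run(["chr2:5", "chr1:9", "chr1:3"])
print(out)  # ['chr1:3', 'chr1:9']
```

Because each run leaves a log behind, swapping one step for another (say, a different aligner) and comparing the resulting workflows becomes a matter of diffing tracked results rather than reconstructing untracked false starts.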
Several other people from The Genome Center will also be attending and presenting. Our Director, Rick Wilson, will be giving a keynote address on Saturday. Co-Director Elaine Mardis will chair the always interesting New Genomic Frontiers plenary session (don't make the mistake of booking that early flight; you'll miss the best stuff). Todd Wylie and Jon Armstrong from our Technology Development group will be giving talks on miRNA characterization and capture techniques, respectively. Also, be sure to check out Dan Koboldt's blog post giving a sneak preview of the poster he will be presenting on the performance of various short-read aligners.