PolITiGenomics

Politics, Information Technology, and Genomics

We made Parade

December 28th, 2008 dd Posted in genomics

Our sequencing of the first cancer genome made Parade Magazine’s Breakthroughs Of The Year. I guess we have really made it.


CC bioinformatics

December 24th, 2008 dd Posted in IT, genomics

Thanks to the beauty of Creative Commons, the folks over at CLC bio have posted a version of my next-generation sequencing informatics table that adds a new metric, Yield per hour: NGS Platform Overview. Their updated table is also available under a CC-BY-SA license.


Elaine Mardis on Cancer in Technology Review

December 24th, 2008 dd Posted in genomics

Elaine Mardis, Co-Director of The Genome Center, has a brief article in the January/February 2009 issue of Technology Review on the application of next-generation sequencing technologies to cancer research, entitled Cancer Genomics: DNA sequencing will transform our understanding of cancer. The same issue has an article on the oncoming age of genomics, including its effect on personalized medicine, “third-generation” sequencing technology player Pacific Biosciences, and the Personal Genome Project.


1000 Genome SNPs released

December 24th, 2008 dd Posted in genomics

The 1000 Genomes Project has announced its initial release of SNP data from four of the individuals sequenced to high depth-of-coverage as part of the second pilot project (trios). Here is the announcement from Paul Flicek of EBI and the 1000 Genomes Data Coordination Center (and formerly of Washington University).

Dear All,

I’m pleased to provide everyone a stocking stuffer in the form of the first release of data from the 1000 Genomes project.

The preliminary list of SNPs for 4 of the high coverage individuals are now available on the EBI and NCBI 1000 Genomes FTP sites. Instructions on how to access the data can be found at http://www.1000genomes.org.

In addition, we have created a project specific genome browser to allow the data to be visualised in the context of genome annotations and data from other projects including the Venter and Watson genomes. The browser is based on the Ensembl platform and is available at http://browser.1000genomes.org. We will be making updates to the browser throughout January to ensure the 1000 Genomes data is visible by default and is easy to find (SNP tracks can now be found on the “Features” menu). I welcome any comments, questions or suggestions that that you have about the workings of the browser.

A long list of people worked very hard to get this done and any attempt to mention people will certainly miss some. However, I would like to specifically acknowledge Tom Blackwell, Goncalo Abecasis, Fiona Hyland, Zam Iqbal, Laura Clarke, Eugene Kulesha, Yuan Chen, Stephen Keenan, Fiona Cunningham, Justin Paschall, Martin Shumway, Hoda Kouri and Steve Sherry.

All the very best for the holiday season.

Paul Flicek

Obviously, the three 1000 Genomes pilot projects have been a massive undertaking that has strained not only the production centers, but also the IT and informatics infrastructures of the production and analysis centers. To date, over 3.8 terabases (3.8×10^12, or 3.8 trillion, bases, which is equivalent to about 1270 haploid human genomes) have been submitted as part of these pilot projects. The average SRF file submitted to the NCBI SRA stored 50 bytes of information per base, so the amount of data submitted so far is nearly 200 TB! At current broadband rates in the United States, it would take nearly 10 years to download all of this data (those still using 1600 baud modems may want to request they ship you the data on hard drives). Did I mention these are just the pilot projects?
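For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The 3.8 terabases and 50 bytes per base come from the paragraph above; the haploid genome size of 3.0×10^9 bases and the sustained download rate of 5 Mbit/s are assumptions chosen for illustration, not measured values, and the function names are made up for this example.

```python
# Back-of-the-envelope check of the 1000 Genomes pilot data volume.
BASES_SUBMITTED = 3.8e12        # from the post: 3.8 terabases submitted
BYTES_PER_BASE = 50             # from the post: average SRF storage per base
HAPLOID_GENOME_BASES = 3.0e9    # assumption: approximate haploid human genome size
BROADBAND_BITS_PER_SEC = 5e6    # assumption: ~5 Mbit/s sustained download rate

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def genome_equivalents(bases, genome_size=HAPLOID_GENOME_BASES):
    """Number of haploid genomes' worth of sequence in `bases`."""
    return bases / genome_size


def total_bytes(bases, bytes_per_base=BYTES_PER_BASE):
    """Storage needed for `bases` at a fixed number of bytes per base."""
    return bases * bytes_per_base


def download_years(n_bytes, bits_per_sec=BROADBAND_BITS_PER_SEC):
    """Years needed to transfer `n_bytes` at a constant line rate."""
    return n_bytes * 8 / bits_per_sec / SECONDS_PER_YEAR


if __name__ == "__main__":
    size = total_bytes(BASES_SUBMITTED)
    print(f"{genome_equivalents(BASES_SUBMITTED):.0f} haploid genome equivalents")
    print(f"{size / 1e12:.0f} TB of SRF data")
    print(f"{download_years(size):.1f} years to download at the assumed rate")
```

Under these assumptions the numbers land close to the ones quoted above; a different assumed line rate scales the download time proportionally.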


AML at ASH

December 10th, 2008 dd Posted in genomics

It seems the publication of the first whole cancer genome sequence is getting a good reception in hematology circles. This past Saturday at the 50th Annual Meeting of the American Society of Hematology, Dr. Tim Ley presented our AML sequencing work to a packed room of over 2000 people. In fact, there was such demand to see the talk that they scheduled a second session for Dr. Ley so that everyone who was turned away the first time could get a chance to see it. There is a lot of excitement in these circles because the initiating events for many of these tumors are not known, and an unbiased, whole-genome sequencing approach is currently the best chance to discover them.


Using the next-generation sequencing statistics table

December 10th, 2008 dd Posted in IT, genomics

Since posting the next-generation sequencing informatics statistics table, I have received several requests to reproduce the table in one format or another. The answer is yes, as long as you adhere to the license it is published under: the Creative Commons Attribution-Share Alike 3.0 License. Basically, you are free to republish the content, with proper credit and possibly with alterations, as long as you allow other people to do the same with the table you present. If you have any questions, feel free to post a comment and ask. Enjoy.


Ensembl on Amazon

December 5th, 2008 dd Posted in IT, genomics

The Amazon Web Services (AWS) blog has an entry on using Amazon’s Elastic Compute Cloud (EC2) to host and access public data sets, including Ensembl release 51. The data are stored as Amazon Elastic Block Store (Amazon EBS) snapshots. Anyone using EC2 can then create their own EBS volume using the public data snapshot as a starting point. The data are then available to the user to modify, update, and use in calculations in the cloud. You can find more information on how to use the available public data sets and even upload your own data sets at Public Data Sets on AWS.


Next-Generation Sequencing Informatics

December 4th, 2008 dd Posted in IT, genomics

I have put together a table with a bunch of important metrics for the major next-generation sequencing platforms: Next-Generation Sequencing Informatics (there is also a link on the left-hand side of the page). It includes number of reads, read length, data sizes, computational time, etc. I will try to keep it as up to date as I can and add new platforms and revisions as they become available. Consider it an early Christmas present.


Junk DNA no more

November 13th, 2008 dd Posted in genomics

The New York Times has a good article, Now - The Rest of the Genome, on the latest research on genes and the non-protein-encoding parts of the genome. As reported in our recent AML paper, while we found many somatic variants (i.e., variants specific to the cancer genome) throughout the genome, at present we are really only able to interpret those variants that fall in genes. Research such as that described in the New York Times article, e.g., ENCODE, will help us to be able to interpret all the other variations that may play a role in cancer and other diseases.


AML on NPR

November 7th, 2008 dd Posted in genomics

Dr. Tim Ley of The Genome Center is going to be a guest on NPR’s Talk of the Nation: Science Friday today. He will be talking about the AML paper published in Nature yesterday. The show starts at 2 p.m. EST but will be interrupted on our local NPR station, KWMU 90.7 FM, for President-elect Obama’s press conference at 2:20 p.m. EST. So if you want to listen, it might be better to listen online (although other stations may be interrupted as well) or get the podcast.
