The 1000 Genomes Project has announced its initial release of SNP data from four of the individuals sequenced to high depth-of-coverage as part of the second pilot project (trios). Here is the announcement from Paul Flicek of EBI and the 1000 Genomes Data Coordination Center (and formerly of Washington University).
I'm pleased to provide everyone a stocking stuffer in the form of the first release of data from the 1000 Genomes project.
The preliminary list of SNPs for 4 of the high coverage individuals is now available on the EBI and NCBI 1000 Genomes FTP sites. Instructions on how to access the data can be found at http://www.1000genomes.org.
In addition, we have created a project specific genome browser to allow the data to be visualised in the context of genome annotations and data from other projects including the Venter and Watson genomes. The browser is based on the Ensembl platform and is available at http://browser.1000genomes.org. We will be making updates to the browser throughout January to ensure the 1000 Genomes data is visible by default and is easy to find (SNP tracks can now be found on the "Features" menu). I welcome any comments, questions or suggestions that you have about the workings of the browser.
A long list of people worked very hard to get this done and any attempt to mention people will certainly miss some. However, I would like to specifically acknowledge Tom Blackwell, Goncalo Abecasis, Fiona Hyland, Zam Iqbal, Laura Clarke, Eugene Kulesha, Yuan Chen, Stephen Keenan, Fiona Cunningham, Justin Paschall, Martin Shumway, Hoda Kouri and Steve Sherry.
All the very best for the holiday season.
Obviously, the three 1000 Genomes pilot projects have been a massive undertaking that has strained not only the production centers, but also the IT and informatics infrastructures of the production and analysis centers. To date, over 3.8 terabases (3.8×10¹², or 3.8 trillion bases, which is equivalent to about 1270 haploid human genomes) have been submitted as part of these pilot projects. The average SRF file submitted to the NCBI SRA stored 50 bytes of information per base, so the amount of data submitted so far is nearly 200 TB! At current broadband rates in the United States, it would take nearly 10 years to download all of this data (those still using 1600 baud modems may want to request they ship you the data on hard drives). Did I mention these are just the pilot projects?
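For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The base count, the 50 bytes per base, and the 10-year figure come from the numbers above; the haploid genome size of 3.0 billion bases and the use of decimal terabytes (10^12 bytes) are my assumptions, and the function name is just for illustration. Rather than guess a broadband speed, the sketch works backwards to the sustained download rate that the 10-year estimate implies.

```python
# Back-of-the-envelope check of the pilot project data volume.
BASES_SUBMITTED = 3.8e12        # 3.8 terabases, from the text
BYTES_PER_BASE = 50             # average SRF storage per base, from the text
DOWNLOAD_YEARS = 10             # download time quoted in the text
HAPLOID_GENOME_BASES = 3.0e9    # ASSUMPTION: approximate haploid human genome size
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def pilot_data_estimates(bases=BASES_SUBMITTED,
                         bytes_per_base=BYTES_PER_BASE,
                         genome_bases=HAPLOID_GENOME_BASES,
                         download_years=DOWNLOAD_YEARS):
    """Return genome equivalents, total size in TB, and the implied rate."""
    total_bytes = bases * bytes_per_base
    seconds = download_years * SECONDS_PER_YEAR
    return {
        "genome_equivalents": bases / genome_bases,
        "total_terabytes": total_bytes / 1e12,          # decimal TB (assumption)
        "implied_mbit_per_s": total_bytes * 8 / seconds / 1e6,
    }


if __name__ == "__main__":
    est = pilot_data_estimates()
    print(f"Haploid genome equivalents: {est['genome_equivalents']:.0f}")
    print(f"Total data submitted:       {est['total_terabytes']:.0f} TB")
    print(f"Implied download rate:      {est['implied_mbit_per_s']:.1f} Mbit/s")
```

Under those assumptions the numbers hang together: roughly 1270 genome equivalents, 190 TB of data, and a sustained rate of a little under 5 Mbit/s to finish in 10 years.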