Next-Generation Sequencing Informatics Update
February 19th, 2010
I updated the Next-Generation Sequencing Informatics table a few weeks ago but forgot to mention it on the blog. The main update was the 50G configuration of the Illumina GA IIx. Also, the Sides & Associates blog linked to my table and referred to it as a “somewhat dated comparison of next-generation sequencing platforms.” Just to clarify, this table represents average throughput for production systems, not vendor claims about throughput and not future vaporware (Alejandro Gutierrez corrected his description in the post once I pointed this out). As new systems come online and further improvements are made to existing platforms, the table will be updated.
Posted in IT, genomics | 8 Comments »
Tagged with: compute, genomics, Illumina, informatics, IT, storage
February 19th, 2010 at 8:55 pm
Thanks for the update. That table is a great resource whenever we start doing back-of-the-envelope calculations for potential new projects.
February 19th, 2010 at 9:36 pm
I’m curious what you use as your source of statistics for platforms you don’t have direct access to: publications, or do you have a good network of other labs?
This is very useful information!
February 20th, 2010 at 12:51 am
Thanks for the update and clarification. I’ve corrected “somewhat dated” on our blog with a more faithful explanation.
February 20th, 2010 at 4:16 am
Mate-pair and paired-end should not be confused, as they are fundamentally different chemistries. Also, the SRA does not require the submission of images, so most submitters are not uploading images. Has this been removed from the SRA file size? (It seems quite high given our experience.)
Finally, some statistics for the SOLiD 3+ would be great as they are regularly producing 800M-1B 50+50 reads per run (2 slides).
February 20th, 2010 at 5:06 pm
Keith, I get production numbers for the platform we do not have (SOLiD) from drd at Baylor.
Alejandro, I saw that you updated your post, and I appreciate it. I did not mention it in this post to impugn you, but rather to help make a point.
Dirk, it’s true that mate-pair and paired-end are different, but for the purposes of this table the distinction is not important (indeed, it can easily be inferred from the size of the insert). None of the submission data is for images. It is for SRF (now ~17 B/b but ~50 B/b at one time) or gzipped FASTQ (~0.1 B/b). I also have numbers for BAM (~1 B/b), ready for when the SRA has a fully functioning BAM pipeline. Note that I leave the SRF numbers in place for the older platforms for historical reasons. As for SOLiD 3+, I am happy to post the numbers if someone can provide them (see above).
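For anyone who wants to turn the ratios in the reply above into a rough storage figure, here is a minimal Python sketch. It assumes that "B/b" means bytes of submission file per sequenced base, and it uses a hypothetical run of 50 gigabases (one reading of the "50G" configuration mentioned in the post). The function and variable names are made up for illustration, and the ratios are the approximate values quoted in the comment, not measurements.

```python
# Approximate bytes-per-base ratios quoted in the comment above.
# Assumption: "B/b" is read as bytes of file per sequenced base.
BYTES_PER_BASE = {
    "SRF (current)": 17.0,
    "SRF (earlier)": 50.0,
    "gzipped FASTQ": 0.1,
    "BAM": 1.0,
}


def estimated_size_bytes(bases, bytes_per_base):
    """Return the estimated file size in bytes for a run of `bases` bases."""
    return bases * bytes_per_base


def estimate_all(bases):
    """Estimate the submission size for every format in BYTES_PER_BASE."""
    return {
        fmt: estimated_size_bytes(bases, ratio)
        for fmt, ratio in BYTES_PER_BASE.items()
    }


if __name__ == "__main__":
    # Hypothetical run size: 50 gigabases (an assumption, not a table value).
    run_bases = 50e9
    for fmt, size in estimate_all(run_bases).items():
        print(f"{fmt:>15}: {size / 1e9:,.0f} GB")
```

Under those assumptions the spread between formats is the interesting part: the same run differs by more than two orders of magnitude between gzipped FASTQ and SRF.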
March 1st, 2010 at 12:01 am
Yeah. The SOLiD v3 Plus spec sheet will have all the numbers you need.
http://www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_072050.pdf
April 15th, 2010 at 12:47 pm
You should probably just report what you know, as the rest of these numbers look like typical GSC bias.
April 15th, 2010 at 12:57 pm
Classy stuff, Seth Peterson, Field Application Scientist at Applied Biosystems. Perhaps you are showing your AB bias. As I said above, the numbers for all platforms are real. The SOLiD numbers come from Baylor, a very pro-SOLiD shop.