PolITiGenomics

Politics, Information Technology, and Genomics

What the Crisis Nursery does

March 11th, 2010

Below is a nice interview with DiAnne Mueller, CEO of the St. Louis Crisis Nursery, talking about what the Crisis Nursery does and how you can help.


Gathering cloud at XGen

March 10th, 2010

If you are going to be at XGen next week and you are interested in cloud computing and its application to bioinformatics, be sure to stop and participate in the Cloud Computing in Bioinformatics discussion I will be “facilitating” on Wednesday morning (March 17). My talk is at 3:05 p.m. PT on Tuesday and I will be chairing the first session on Monday (if my plane is on time and the taxi is fast enough).


New data center approved

March 10th, 2010

The Genome Center recently received word that its grant proposal for a data center was approved (St. Louis Business Journal). The $14.3 million grant is funded by the National Center for Research Resources and the money comes from ARRA. The grant, along with about $8 million from Washington University, will allow us to essentially duplicate our current data center capacity. We took possession of our current data center in May 2008 and it is already 80-90% full, so this new data center will greatly help us to keep pace with all of the exciting, new projects we are undertaking.


Me, in podcast form

February 24th, 2010

I recently did an interview in advance of my talk at the XGen Congress next month in San Diego. The interview is about 14 minutes and discusses our work at The Genome Center in general and more specifically the software and IT infrastructure we have created to enable the analysis of the massive amounts of sequence data we generate. The interview is available to download as part of the XGen Congress podcast series.


The Pac’s out of the bag

February 23rd, 2010

Most of you have probably already seen this, but Pacific Biosciences announced the institutions that will be getting their first ten prototype instruments (Bio-IT World, GenomeWeb, MarketWatch). The Genome Center is among the institutions that will be getting one. It looks like PacBio will indeed be the first third generation sequencing company with instruments out in the wild. Don’t get too excited, though. These third generation instruments will probably be a lot like the first batch of second generation instruments: it will take a while before they are ready for production sequencing, reliably producing good quality data. We’ll find out more from all the sequencing instrument companies in the coming days at AGBT.


Next-Generation Sequencing Informatics Update

February 19th, 2010

I updated the Next-Generation Sequencing Informatics table a few weeks ago but forgot to mention it on the blog. The main update was the 50G configuration of the Illumina GA IIx. Also, the Sides & Associates blog linked to my table and referred to it as a “somewhat dated comparison of next-generation sequencing platforms.” Just to clarify, this table represents average throughput for production systems, not vendor claims about throughput and not future vaporware (and Alejandro Gutierrez corrected his description in the post once I pointed this out). As new systems come online and further improvements are made to existing platforms, the table will be updated.


Puff piece

February 16th, 2010

Why should one be skeptical of all the information touting the wonders of cloud computing? This older, in-depth piece by Gartner, Hype Cycle for Cloud Computing, 2009, lays out the reasons pretty well. But one need not spend that much time reading about it. You can simply read this much shorter piece by Jason Stowe: Is the Future Of High-Performance Computing For Life Sciences Cloudy? Reading that story, one can only get the impression that the cloud is some panacea where all computational problems are solved. In fact, the picture is so rosy that one may become suspicious. So suspicious that one may read the About the Author section at the bottom of the piece and see that Mr. Stowe happens to be CEO of a company selling cloud computing services.

Jason Stowe is the founder and CEO of Cycle Computing, a provider of high-performance computing (HPC) and open source technology in the cloud. A seasoned entrepreneur and experienced technologist, Jason attended Carnegie Mellon and Cornell Universities.

No wonder he makes cloud computing sound so attractive. No mention of the IT expertise needed to get up and running on the cloud. No mention of the software engineering needed to ensure your programs run efficiently on the cloud. It may not be apparent from his article, but a program that runs well on one or ten computers does not necessarily run well on hundreds of computers. In fact, he implies the exact opposite.

For compute clusters as a service, the math is different: Having 40 processors work for 100 hours costs the same as having 1,000 processors run for 4 hours.

It may cost the same under that scenario, but not everything scales linearly. In fact, most things don’t, and that less-than-linear scaling actually ends up making it cost more to get a shorter turnaround. This fact was clearly evident in the Crossbow paper, where it cost $52 to complete the analysis in 6.5 hours but $84 to finish it in under 3 hours (Table 4). The article fails to mention this; a marvel given the fact that the lack of good, scalable bioinformatics tools that can run well in highly parallel environments is perhaps the largest impediment to the adoption of cloud computing in bioinformatics. Of course, I am sure he will gladly sell you consulting services that will get you up and running on the cloud. In short, this looks like a shill.
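To make the scaling point concrete, here is a rough sketch using Amdahl's law, which is one common way to model less-than-linear scaling. It is only an illustration and is not taken from the Crossbow paper: the 4,000 processor-hours of work matches the 40 x 100 example quoted above, but the 99.9% parallel fraction and the $0.10 per processor-hour price are made-up assumptions, as are the function names.

```python
def speedup(n_procs, parallel_fraction):
    """Amdahl's law: speedup on n_procs when only part of the job parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


def cloud_run(n_procs, single_proc_hours, parallel_fraction, price_per_proc_hour):
    """Return (wall-clock hours, total cost) for a job billed per processor-hour."""
    wall_hours = single_proc_hours / speedup(n_procs, parallel_fraction)
    cost = n_procs * wall_hours * price_per_proc_hour
    return wall_hours, cost


# Hypothetical inputs: 4,000 processor-hours of work at an assumed $0.10/proc-hour.
WORK_HOURS = 4000.0
PRICE = 0.10

for fraction in (1.0, 0.999):  # perfectly parallel vs. 0.1% serial (assumed)
    for n in (40, 1000):
        hours, cost = cloud_run(n, WORK_HOURS, fraction, PRICE)
        print(f"parallel={fraction:.3f} procs={n:5d} "
              f"hours={hours:7.2f} cost=${cost:8.2f}")
```

Under the perfectly parallel assumption, the 40-processor and 1,000-processor runs cost the same, which is the scenario the article describes. With even a tiny serial fraction in this model, the 1,000-processor run takes longer than 4 hours and costs noticeably more than the 40-processor run.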

Unfortunately, omitting information is not the only problem with many of the stories about cloud computing; many also contain misinformation. For example, the story Gathering clouds and a sequencing storm in Nature Biotechnology mentions the software engineering challenges but erroneously states

…bioinformaticians might not be willing to spend the time to familiarize themselves with hadoop, the open source program needed to process large data sets on a cloud

What?!? You do not have to develop tools using Hadoop. Sure, it is a nice platform that provides fault-tolerant parallelism, but it is by no means required by any cloud provider that I know of (not even Google, whose MapReduce framework provided the model for Hadoop!) nor is it the only way to achieve parallel processing (far from it). Amazon EC2 just provides you with a virtual machine with a basic operating system installed on it and remote access. You can do whatever you want with it after that. Google and Microsoft do require that you develop your code in their cloud framework, but you do not have to use Hadoop. For information on what you do have to do to run jobs on the major cloud providers, check out this article by Udayan Banerjee, Cloud Economics — Amazon, Microsoft, Google Compared, and each provider’s web site: Amazon AWS, Google App Engine, and Microsoft Windows Azure.

(How many bad cloud puns can I work into post titles? Stay tuned.)


In case you missed second grade

February 15th, 2010

Speaking of global climate change and snowstorms, NPR has a story this morning about how a lot of snow in Washington, DC does not contradict the theory of global climate change. For those who missed second grade, the piece contains this information.

A storm is part of what scientists classify as weather. Weather is largely influenced by local conditions and changes week to week. It’s fickle — fraught with wild ups and downs.

Climate is the long-term trend of atmospheric conditions across large regions, even the whole planet. Changes in climate are slow and measured in decades, not weeks.

Judging from the comments on the story, it seems some are not swayed by facts and logic. I am sure their objections are based on sound scientific inquiry and not politically motivated.


Seeing double

February 12th, 2010

It seems there is a shortage of news satire ideas. Two days ago, The Daily Show and The Colbert Report each had similar pieces on global climate change.

[Video: The Colbert Report, “We’re Off to See the Blizzard”]

Rachel Maddow also had similar sentiments.

(For the slow learners out there, climate and weather are not the same thing. You were supposed to have learned this in second grade.)

Then, again, last night there were similar similarities (yes, that’s intentional) between The Daily Show and The Colbert Report’s reports on the response of Republicans (and Admiral Ackbar) to President Obama’s invitation to participate in a televised bipartisan summit on health care reform.

[Video: The Colbert Report, “The Word – Political Suicide”]

Well, it’s funny anyway.


Seq-o-matic ‘76

February 3rd, 2010

Bass-o-matic

Soon after Illumina announced its HiSeq 2000, it also announced the GA IIx’s little brother, the GA IIe. The IIe will produce about half as much data as the IIx, but no one seems to know exactly how this is done. The unit is cheaper than the IIx, $250,000 for the IIe compared to $400,000 (I think) for the IIx, but is upgradeable to the IIx. So perhaps the optics system is cheaper. But the run time is the same, so it seems like the optics would need to be about the same (the older optics system was slower). The IIe seems to use the same kits as the GA IIx. That seems odd to me because the consumables cost is typically the largest part of the per run cost. So while you will save on instrument depreciation costs per run, those savings disappear when considering cost per Gb. Another way to look at it is that if reagent costs are indeed the same, it makes no sense to buy two GA IIe instruments. You would be much better off buying one GA IIx. It is only if your lab has a sequencing workload that cannot utilize a GA IIx full time that a GA IIe makes economic sense.
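The arithmetic behind that argument can be sketched in a few lines. The instrument prices are the ones quoted above (the IIx figure being the uncertain one); everything else here is a placeholder assumption chosen only to show the shape of the calculation: the 100 runs over the instrument's lifetime, the $10,000 reagent cost per run, and the 50 Gb versus 25 Gb per run. The function name is hypothetical too.

```python
def cost_per_gb(instrument_price, runs_over_lifetime, reagent_cost_per_run, gb_per_run):
    """Per-Gb cost: straight-line instrument depreciation plus reagents, per run."""
    depreciation_per_run = instrument_price / runs_over_lifetime
    return (depreciation_per_run + reagent_cost_per_run) / gb_per_run


# Placeholder assumptions (not vendor figures): same kits, same number of runs.
RUNS = 100
REAGENTS = 10_000.0

iix = cost_per_gb(400_000.0, RUNS, REAGENTS, gb_per_run=50.0)
iie = cost_per_gb(250_000.0, RUNS, REAGENTS, gb_per_run=25.0)

print(f"GA IIx: ${iix:.2f} per Gb")
print(f"GA IIe: ${iie:.2f} per Gb")
```

With these placeholder numbers the IIe saves on depreciation per run but pays the same reagent bill for half the data, so its cost per Gb comes out higher. Two IIe instruments would match one IIx in output while costing more up front and consuming twice the reagents.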