454 XLR-HD
The next upgrade of the 454 FLX platform is called Titanium. The previous name gave a better indication of what the upgrade entails: XLR-HD, which is short for eXtra Long Reads-High Density. The XLR comes from the run having twice the number of cycles, so the average read length will increase from 250 to 400 bases (the average read length does not exactly double because of nucleotide flow order, mononucleotide runs, degraded signal as the number of cycles increases, etc.). The HD comes from smaller, more densely packed wells on the picotiter plate, which increase the number of DNA fragments sequenced per run. Putting these together, 454 FLX Titanium runs will quintuple their data output from 100 Mb to about 500 Mb (or more).
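As a rough back-of-the-envelope check, the figures above can be split into the part of the gain that comes from longer reads (XLR) and the part that comes from more wells (HD). This is only a sketch: it assumes "Mb" means megabases and that total output is simply reads times average read length. The function and variable names are made up for illustration.

```python
# Figures quoted in the post; "Mb" is assumed to mean megabases.
FLX_OUTPUT_BASES = 100e6
TITANIUM_OUTPUT_BASES = 500e6
FLX_READ_LENGTH = 250       # average bases per read
TITANIUM_READ_LENGTH = 400  # average bases per read


def reads_per_run(output_bases, mean_read_length):
    """Estimate reads per run as total bases divided by mean read length."""
    return output_bases / mean_read_length


flx_reads = reads_per_run(FLX_OUTPUT_BASES, FLX_READ_LENGTH)
titanium_reads = reads_per_run(TITANIUM_OUTPUT_BASES, TITANIUM_READ_LENGTH)

length_gain = TITANIUM_READ_LENGTH / FLX_READ_LENGTH      # contribution of XLR
read_count_gain = titanium_reads / flx_reads              # contribution of HD
output_gain = TITANIUM_OUTPUT_BASES / FLX_OUTPUT_BASES    # overall increase

print(f"FLX reads per run:      {flx_reads:,.0f}")
print(f"Titanium reads per run: {titanium_reads:,.0f}")
print(f"Read length gain: {length_gain:.2f}x")
print(f"Read count gain:  {read_count_gain:.3f}x")
print(f"Output gain:      {output_gain:.1f}x")
```

Under those assumptions, the longer reads account for a 1.6x gain, so the denser plate would have to deliver roughly three times as many reads per run to reach the quoted fivefold increase.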
This increase in data does not come without a price. Up until now, the primary analysis (image processing and base calling) of 454 data could be performed in a few hours on a moderately powerful computer. With the increased data output, primary analysis requires a small cluster: 20 cores with 1 GiB RAM per core and shared access to 1-2 TB of disk space. While those are the minimum requirements, 10 cores per run region seems to be the sweet spot for best performance. The initial production release will support Red Hat-compatible GNU/Linux distributions (RHEL, CentOS, and Fedora). Previous releases also officially supported only Red Hat-like operating systems, but we have not had a problem running them on Debian GNU/Linux (454 also indicated they are pushing toward LSB3 compliance). Fortunately, 454 is eliminating the hard-coded requirement that the software be installed in /usr/local/rig and that the analysis processes have write access to it. This will make installation across a cluster much easier. They are also abandoning their custom IPC implementation in favor of the “standard” MPI, specifically OpenMPI or MPICH2. While it is good that they are using a standard IPC implementation, it is unfortunate that MPI implementations are so fragmented and often incompatible, i.e., if one vendor uses MPICH2 and another uses LAM, you need to set up different systems to support each because they cannot coexist on the same system without problems.
I know this is unrelated to informatics, but if you will allow me to journey back to my transport phenomena days as a chemical engineer, the new picotiter plate requires much smaller beads, about 1 micron in diameter. At these length scales, transport phenomena, specifically boundary effects and polymer diffusion, may become important during the emulsion PCR and sequencing. Someone needs to calculate a Reynolds number.
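For anyone who wants to take up that challenge, here is a minimal sketch of the calculation, using the standard definition Re = ρvL/μ with the bead diameter as the length scale. Only the 1 micron diameter comes from the text above. The fluid is assumed to behave like water at room temperature, and the flow velocity of 1 mm/s is a purely hypothetical placeholder, not a measured value for the instrument.

```python
def reynolds_number(density, velocity, length, viscosity):
    """Return Re = rho * v * L / mu (dimensionless); all inputs in SI units."""
    return density * velocity * length / viscosity


WATER_DENSITY = 1000.0     # kg/m^3, approximate value for water (assumption)
WATER_VISCOSITY = 1.0e-3   # Pa*s, approximate value for water (assumption)
BEAD_DIAMETER = 1.0e-6     # m, the ~1 micron bead from the post
ASSUMED_VELOCITY = 1.0e-3  # m/s, hypothetical flow speed for illustration

bead_reynolds = reynolds_number(
    WATER_DENSITY, ASSUMED_VELOCITY, BEAD_DIAMETER, WATER_VISCOSITY
)
print(f"Re = {bead_reynolds:.1e}")
```

With these placeholder inputs Re comes out on the order of 0.001, far below 1, which is the regime where viscous forces dominate over inertia. A real estimate would need the actual reagent properties and flow rates.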
Oh, one more thing, there is talk of paired-end reads with 20 kb inserts.
Tagged with: 454, compute, FLOSS, genomics, informatics, IT
June 12th, 2008 at 4:07 pm
A clarification on your entry…
“…it is unfortunate that MPI implementations are so fragmented and often incompatible, i.e., if one vendor uses MPICH2 and another uses LAM, you need to set up different systems to support each because they cannot coexist on the same system without problems.”
Can you cite a specific problem? All MPI implementations that I am aware of can co-exist just fine on a single host without problems. Most of the time, all you need to do is set your PATH and possibly LD_LIBRARY_PATH and MANPATH to point to the MPI implementation that you want, and you’re good to go (sometimes you need to do this in shell setup files, especially if you’re using rsh/ssh to login to remote nodes to start MPI processes). As such, it is perfectly possible to run multiple MPI apps, each using a different MPI implementation on the back-end, and hide all that glue from the end-user.
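A minimal sketch of that kind of glue, written as a small Python wrapper, might look like the following. The install prefixes /opt/openmpi and /opt/mpich2 are hypothetical, as is the assumption that each installation keeps its files under bin, lib, and share/man. The helper name mpi_environment is made up for illustration and is not part of any MPI distribution.

```python
import os


def mpi_environment(prefix, base_env=None):
    """Return a copy of base_env with one MPI install's directories prepended.

    The bin/lib/share/man layout under `prefix` is an assumption; adjust it
    to match the actual installation.
    """
    env = dict(os.environ if base_env is None else base_env)
    for variable, subdir in (
        ("PATH", "bin"),
        ("LD_LIBRARY_PATH", "lib"),
        ("MANPATH", os.path.join("share", "man")),
    ):
        new_dir = os.path.join(prefix, subdir)
        existing = env.get(variable)
        # Prepend so this implementation is found before any other one.
        env[variable] = new_dir + os.pathsep + existing if existing else new_dir
    return env


# Hypothetical install prefixes, one per MPI implementation.
OPENMPI_PREFIX = "/opt/openmpi"
MPICH2_PREFIX = "/opt/mpich2"

base = {"PATH": "/usr/bin:/bin"}
openmpi_env = mpi_environment(OPENMPI_PREFIX, base)
mpich2_env = mpi_environment(MPICH2_PREFIX, base)

print(openmpi_env["PATH"])
print(mpich2_env["PATH"])
```

Each application can then be started with the environment for the implementation it was built against, for example by passing the dictionary as the env argument of subprocess.run, so end users never touch these variables themselves. Processes started on remote nodes over rsh/ssh would still need the same settings in their shell setup files, as noted above.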
I do agree, however, that it is a bummer that a) ISVs are forced to choose to support one or multiple MPI implementations, and b) that sysadmins have to deal with this in the first place (e.g., making glue to hide the differences and/or seamlessly allow users to swap between different MPI implementations — BTW, one package that many have found useful for this is Environment Modules: http://modules.sf.net/).
MPI apps are source code compatible between MPI implementations, of course, but that doesn’t necessarily mean that they will run with the same characteristics with different MPI implementations. This can be a real headache to handle properly, especially for ISVs. Having an application binary interface (ABI) is something that the MPI 3.0 Forum is looking at, but to be honest, I’m a little skeptical as to whether it will actually happen and/or be useful. One obvious example is that even if you have an app that will run with MPI implementations A and B simply by switching your LD_LIBRARY_PATH, a) how many users will screw this up, and b) will the app run the same way between A and B?
There are many more ABI issues than this, of course (both pro and con)… (please see http://meetings.mpi-forum.org/ if you’d like to express your opinions about an MPI ABI!)
Additionally, the official site for MPI is http://www.mpi-forum.org/, not the Argonne site.
June 13th, 2008 at 9:47 am
Do you perhaps have a reference regarding the 454 XLR-HD numbers for the storage and processing requirements?
June 13th, 2008 at 2:57 pm
The easy ones first.
Fixed.
I don’t think 454 has made that information widely available yet. Until they do, you can always cite this entry.
As someone who has developed in MPI a bit, I agree that from a developer standpoint, the spec works. As someone who has dealt with MPI as a system administrator, I agree that as long as you set up your environment properly, multiple implementations can coexist on the same system. It was the difficulty in getting all that sorted out, especially integrating it with cluster management/scheduling software, e.g., Platform LSF, that I was referring to. It is certainly possible to make it work; it is just more difficult than it probably should be. As for an MPI-ABI, I think that is a lower priority (I don’t mind recompiling) than getting all the helper tools to have similar calling semantics and capabilities (in fact, being better able to specify which implementation you would like would likely allow you to avoid many such complications). And thanks for the pointer to Environment Modules.