PolITiGenomics

Politics, Information Technology, and Genomics

454 XLR-HD


The next upgrade of the 454 FLX platform is called Titanium. The previous name gave a better indication of what the upgrade entails: XLR-HD, short for eXtra Long Reads-High Density. The XLR comes from the run having twice the number of cycles, so the average read length will increase from 250 to 400 bases (the average read length does not exactly double because of nucleotide flow order, mononucleotide runs, degraded signal as the number of cycles increases, etc.). The HD comes from smaller, more densely packed wells on the picotiter plate, which increase the number of DNA fragments sequenced per run. Putting these together, 454 FLX Titanium runs will quintuple their data output from 100 Mb to about 500 Mb (or more).
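To see how the two halves of the upgrade combine, here is a back-of-the-envelope sketch using only the figures quoted above. The read counts it derives are my own arithmetic (output divided by average read length), not numbers from 454, and the variable names are just for illustration.

```python
# Figures quoted in the post; everything derived from them is an estimate.
FLX = {"read_length_bases": 250, "output_bases": 100e6}
TITANIUM = {"read_length_bases": 400, "output_bases": 500e6}


def reads_per_run(platform):
    """Approximate read count: total bases divided by average read length."""
    return platform["output_bases"] / platform["read_length_bases"]


# Gain from the extra cycles (the XLR part).
length_gain = TITANIUM["read_length_bases"] / FLX["read_length_bases"]
# Overall gain in bases per run.
output_gain = TITANIUM["output_bases"] / FLX["output_bases"]
# Whatever is left over must come from more reads per run (the HD part).
read_count_gain = reads_per_run(TITANIUM) / reads_per_run(FLX)

print(f"read length gain: {length_gain:.2f}x")
print(f"output gain: {output_gain:.2f}x")
print(f"implied read-count gain: {read_count_gain:.2f}x")
```

In other words, if the quoted averages hold, longer reads account for a 1.6-fold gain and the denser plate has to supply roughly a threefold increase in reads per run.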

This increase in data does not come without a price. Up until now, the primary analysis (image processing and base calling) of 454 data could be performed in a few hours on a moderately powerful computer. With the increased data output, primary analysis requires a small cluster: 20 cores with 1 GiB RAM per core and shared access to 1-2 TB of disk space. While those are the minimal requirements, 10 cores per run region seems to be the sweet spot for best performance.

The initial production release will support Red Hat-compatible GNU/Linux distributions (RHEL, CentOS, and Fedora). Previous releases also officially supported only Red Hat-like operating systems, but we have not had a problem running them on Debian GNU/Linux (454 also indicated they are pushing toward LSB3 compliance). Fortunately, 454 is eliminating the hard-coded requirement that the software be installed under /usr/local/rig and that the analysis processes have write access to it. This will make installation across a cluster much easier.

They are also abandoning their custom IPC implementation in favor of the “standard” MPI, specifically OpenMPI or MPICH2. While it is good that they are using a standard IPC implementation, it is unfortunate that MPI implementations are so fragmented and often incompatible. For example, if one vendor uses MPICH2 and another uses LAM, you need to set up different systems to support each because they cannot coexist on the same system without problems.
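To give a flavor of the glue involved, here is a minimal sketch of launching a job against exactly one MPI installation by adjusting PATH, LD_LIBRARY_PATH, and MANPATH. The install prefixes and the helper name `mpi_environment` are hypothetical, and the `bin`, `lib`, and `share/man` layout is an assumption about how the implementation was installed.

```python
import os


def mpi_environment(prefix, base_env=None):
    """Return a copy of the environment with one MPI install placed first."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed layout under the prefix: bin/, lib/, share/man/.
    for var, subdir in (
        ("PATH", "bin"),
        ("LD_LIBRARY_PATH", "lib"),
        ("MANPATH", "share/man"),
    ):
        new_entry = os.path.join(prefix, subdir)
        existing = env.get(var, "")
        # Prepend so this implementation wins over any other on the system.
        env[var] = new_entry + (os.pathsep + existing if existing else "")
    return env


# Hypothetical install locations; adjust to wherever your site puts them.
OPENMPI_PREFIX = "/opt/openmpi"
MPICH2_PREFIX = "/opt/mpich2"

env = mpi_environment(MPICH2_PREFIX, {"PATH": "/usr/bin"})
print(env["PATH"])
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)` when starting the launcher. The same settings have to reach every node and survive the scheduler's own wrappers, which this sketch does not attempt to handle.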

I know this is unrelated to informatics, but if you will allow me to journey back to my transport phenomena days as a chemical engineer, the new picotiter plate requires much smaller beads, about 1 micron in diameter. At these length scales, transport phenomena, specifically boundary effects and polymer diffusion, may become important during the emulsion PCR and sequencing. Someone needs to calculate a Reynolds number.
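For anyone who wants to take up the challenge, a rough sketch of the calculation follows, using Re = density × velocity × length / viscosity. Only the 1 micron bead diameter comes from the post. The fluid properties are those of water near room temperature, and the 1 mm/s velocity is a placeholder I picked for illustration, so treat the result as an order-of-magnitude guess at best.

```python
def reynolds_number(density, velocity, length, viscosity):
    """Re = density * velocity * length / dynamic viscosity (dimensionless)."""
    return density * velocity * length / viscosity


BEAD_DIAMETER_M = 1e-6   # about 1 micron, the figure quoted in the post
DENSITY_KG_M3 = 1000.0   # assumption: water-like fluid
VISCOSITY_PA_S = 1e-3    # assumption: water near room temperature
VELOCITY_M_S = 1e-3      # assumption: placeholder flow speed of 1 mm/s

bead_re = reynolds_number(
    DENSITY_KG_M3, VELOCITY_M_S, BEAD_DIAMETER_M, VISCOSITY_PA_S
)
print(f"Re ~ {bead_re:.1e}")
```

Under these assumptions Re comes out on the order of 10^-3, well inside the regime where viscous forces dominate inertia. The real fluids (the emulsion oil phase in particular) and flow speeds would need to be plugged in before drawing any conclusions.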

Oh, one more thing, there is talk of paired-end reads with 20 kb inserts.

Posted in IT, genomics


8 Responses to “454 XLR-HD”

  1. A clarification on your entry…

    “…it is unfortunate that MPI implementations are so fragmented and often incompatible, i.e., if one vendor uses MPICH2 and another uses LAM, you need to set up different systems to support each because they cannot coexist on the same system without problems.”

    Can you cite a specific problem? All MPI implementations that I am aware of can co-exist just fine on a single host without problems. Most of the time, all you need to do is set your PATH and possibly LD_LIBRARY_PATH and MANPATH to point to the MPI implementation that you want, and you’re good to go (sometimes you need to do this in shell setup files, especially if you’re using rsh/ssh to log in to remote nodes to start MPI processes). As such, it is perfectly possible to run multiple MPI apps, each using a different MPI implementation on the back-end, and hide all that glue from the end-user.

    I do agree, however, that it is a bummer that a) ISVs are forced to choose to support one or multiple MPI implementations, and b) that sysadmins have to deal with this in the first place (e.g., making glue to hide the differences and/or seamlessly allow users to swap between different MPI implementations — BTW, one package that many have found useful for this is Environment Modules: http://modules.sf.net/).

    MPI apps are source code compatible between MPI implementations, of course, but that doesn’t necessarily mean that they will run with the same characteristics with different MPI implementations. This can be a real headache to handle properly, especially for ISVs. Having an application binary interface (ABI) is something that the MPI 3.0 Forum is looking at, but to be honest, I’m a little skeptical as to whether it will actually happen and/or be useful. One obvious example is that even if you have an app that will run with MPI implementations A and B simply by switching your LD_LIBRARY_PATH, a) how many users will screw this up, and b) will the app run the same way between A and B?

    There are many more ABI issues than this, of course (both pro and con)… (please see http://meetings.mpi-forum.org/ if you’d like to express your opinions about an MPI ABI!)

    Additionally, the official site for MPI is http://www.mpi-forum.org/, not the Argonne site. :-)

  2. Do you perhaps have a reference regarding the 454 XLR-HD numbers for the storage and processing requirements?

  3. The easy ones first.

    Additionally, the official site for MPI is http://www.mpi-forum.org/, not the Argonne site.

    Fixed.

    Do you perhaps have a reference regarding the 454 XLR-HD numbers for the storage and processing requirements?

    I don’t think 454 has made that information widely available yet. Until they do, you can always cite this entry.

    Can you cite a specific problem [with multiple MPI implementations]?

    As someone who has developed in MPI a bit, I agree that from a developer standpoint, the spec works. As someone who has dealt with MPI as a system administrator, I agree that as long as you set up your environment properly, multiple implementations can coexist on the same system. It was the difficulty in getting all that sorted out, especially integrating it with cluster management/scheduling software, e.g., Platform LSF, to which I was referring. It is certainly possible to make it work; it is just more difficult than it probably should be. As for an MPI-ABI, I think that is a lower priority (I don’t mind recompiling) than getting all the helper tools to have similar calling semantics and capabilities (in fact, being better able to specify which implementation you would like would likely allow you to avoid many such complications). And thanks for the pointer to Environment Modules.

  4. [...] have posted some details about the upcoming update to the 454 [...]

  5. Does anyone know the exact specs of the ‘GS FLX Titanium cluster’ that Roche is offering with the XLR-HD upgrade? I know it’s being built by PSSC Labs and comes in a neat little rack. We’re thinking of building one ourselves, but I’d like it to be at least as good as the one they offer. So the exact specs of what is actually in the rack would be nice. Roche sent out specs of what you might need in terms of space, cores and network, but those specifications date from early this year.

  6. To the best of my knowledge, the specs have not changed much, if at all. We are running primary analysis on 20 cores and it takes about eight hours. Nowadays, you can get 20 cores in one box.

  7. Are you running the standard Titanium cluster Roche is offering? And if not, I’m assuming a standard Rocks cluster with MPI should do the trick, no? Can’t imagine the software installation being all that difficult.

  8. Jef, we are not running their standard cluster. In fact, our cluster does not really meet any of their specifications (we use Debian instead of Red Hat, MPICH2 instead of OpenMPI, Platform LSF HPC instead of SGE). I think a standard Rocks cluster with an MPI 2 implementation would do just fine.
