<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PolITiGenomics &#187; 454</title>
	<atom:link href="http://www.politigenomics.com/tag/454/feed" rel="self" type="application/rss+xml" />
	<link>http://www.politigenomics.com</link>
	<description>Politics, Information Technology, and Genomics</description>
	<lastBuildDate>Thu, 21 Apr 2011 17:49:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Next-Generation Sequencing Informatics table update</title>
		<link>http://www.politigenomics.com/2009/10/next-generation-sequencing-informatics-table-update.html</link>
		<comments>http://www.politigenomics.com/2009/10/next-generation-sequencing-informatics-table-update.html#comments</comments>
		<pubDate>Mon, 05 Oct 2009 14:45:02 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[compute]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[informatics]]></category>
		<category><![CDATA[SOLiD]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=1606</guid>
		<description><![CDATA[I have made some updates to the Next-Generation Sequencing Informatics table. Specifically, I have updated the numbers for 454 Ti, including paired-end information, and added information on the Illumina GA IIx. If anyone that is not employed by AB has real-world numbers for SOLiD 3, I&#8217;d appreciate you passing them along to me (I&#8217;m looking [...]]]></description>
			<content:encoded><![CDATA[<p>I have made some updates to the <a href="http://www.politigenomics.com/next-generation-sequencing-informatics">Next-Generation Sequencing Informatics table</a>. Specifically, I have updated the numbers for 454 Ti, including paired-end information, and added information on the Illumina GA IIx. If anyone who is not employed by AB has real-world numbers for SOLiD 3, I&#8217;d appreciate you passing them along to me (I&#8217;m looking at you, drd).</p>
<p><strong>Update:</strong> I received some SOLiD 3 numbers from Nicholas Socci (thanks Nicholas!).</p>
<p><strong>Update2:</strong> I received a fuller set of numbers from drd and the SOLiD 3 column is complete (thanks drd!).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2009/10/next-generation-sequencing-informatics-table-update.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>VarScan published</title>
		<link>http://www.politigenomics.com/2009/06/varscan-published.html</link>
		<comments>http://www.politigenomics.com/2009/06/varscan-published.html#comments</comments>
		<pubDate>Tue, 23 Jun 2009 13:04:19 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[informatics]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[wustl]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=1294</guid>
		<description><![CDATA[VarScan, a tool developed at The Genome Center to detect variants in massively parallel sequence data has been published in Bioinformatics. VarScan can process both 454 and Solexa data of individuals or pools. You can find more information about VarScan in a post by Dan Koboldt, one of the paper&#8217;s and VarScan&#8217;s authors.]]></description>
			<content:encoded><![CDATA[<p><a href="http://genome.wustl.edu/tools/cancer-genomics#varscan">VarScan</a>, a tool developed at <a href="http://genome.wustl.edu/">The Genome Center</a> to detect variants in massively parallel sequence data, has been published in <a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp373">Bioinformatics</a>. VarScan can process both 454 and Solexa data of individuals or pools. You can find more information about <a href="http://www.massgenomics.org/2009/06/variant-detection-in-massively-parallel-sequencing.html">VarScan in a post by Dan Koboldt</a>, one of the paper&#8217;s and VarScan&#8217;s authors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2009/06/varscan-published.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sequencing — the past, present, and future</title>
		<link>http://www.politigenomics.com/2009/04/sequencing-%e2%80%94-the-past-present-and-future.html</link>
		<comments>http://www.politigenomics.com/2009/04/sequencing-%e2%80%94-the-past-present-and-future.html#comments</comments>
		<pubDate>Tue, 21 Apr 2009 19:49:21 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[PacBio]]></category>
		<category><![CDATA[SOLiD]]></category>
		<category><![CDATA[wustl]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=1063</guid>
		<description><![CDATA[Science Magazine has a nice article, Sanger Who? Sequencing the Next Generation, describing past sequencing technology, the current &#8220;next-generation&#8221; sequencing instruments and their capabilities, and several of the companies working to become the next big thing in sequencing. If you are interested in learning, at a high level, how each of the technologies work and [...]]]></description>
			<content:encoded><![CDATA[<p>Science Magazine has a nice article, <a href="http://www.sciencemag.org/products/lst_20090410.dtl">Sanger Who? Sequencing the Next Generation</a>, describing past sequencing technology, the current &#8220;next-generation&#8221; sequencing instruments and their capabilities, and several of the companies working to become the next big thing in sequencing. If you are interested in learning, at a high level, how each of the technologies works and how they compare to each other, it is worth a read.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2009/04/sequencing-%e2%80%94-the-past-present-and-future.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My talk at AGBT</title>
		<link>http://www.politigenomics.com/2009/02/my-talk-at-agbt.html</link>
		<comments>http://www.politigenomics.com/2009/02/my-talk-at-agbt.html#comments</comments>
		<pubDate>Wed, 18 Feb 2009 15:40:33 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[AGBT]]></category>
		<category><![CDATA[Creative Commons]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[wustl]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=870</guid>
		<description><![CDATA[Several people have asked me to post my slides from AGBT. Given the type of slides I prepare, I thought that might be less than useful, so I recorded my talk and posted it to blip.tv. The narrative is a bit rough because I did it in one take a few weeks after I gave [...]]]></description>
			<content:encoded><![CDATA[<p>Several people have asked me to post my slides from AGBT. Given the type of slides I prepare, I thought that might be less than useful, so I recorded my talk and posted it to <a href="http://blip.tv/file/1787219">blip.tv</a>. The narrative is a bit rough because I did it in one take a few weeks after I gave the talk, but all the basics are there (actually, it is a bit longer, 20 minutes, than the talk I gave at AGBT).</p>
<div class="widevideo"><embed src="http://blip.tv/play/Ae3Uf5SFQQ" type="application/x-shockwave-flash" width="510" height="404" allowscriptaccess="always" allowfullscreen="true"></embed></div>
<p>Here are the links that appear on the last slide.</p>
<ul>
<li><a href="http://genome.wustl.edu/">The Genome Center</a></li>
<li><a href="http://www.politigenomics.com/">PolITiGenomics (this) blog</a></li>
<li><a href="http://www.biomedcentral.com/1471-2105/8/362">LIMS Paper</a></li>
<li><a href="http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/">YAPC UR presentation</a></li>
</ul>
<h3>The Odyssey</h3>
<p>So you may be asking yourself, how did he generate a movie of his talk? Even if you are not asking yourself that, I am going to tell you so that if you need to do it (or if I need to do it again), you can avoid a lot of hassle. This was all done on a MacBook using a slide deck created in MS PowerPoint 2008. First I tested screen capture using <a href="http://www.videolan.org/vlc/">VLC</a>. It took a few tries with the video settings to make it not look terrible (use H.264 at 1024 kb/s bitrate, 25 or more fps, MPEG 4 encapsulation), but the audio capture did not work. To compensate for the lack of audio, I did audio capture using GarageBand (podcast project type) at the same time I did the video capture. I then loaded the audio and the video into iMovie HD, synchronized the audio to the video, clipped the beginning and the end, and exported. This resulted in a fair quality product. Being stupid, I thought I could do better. I next tested recording narration in MS PowerPoint, but there didn&#8217;t seem to be a good way to export both a video of the slides with the recorded timings and the audio into a single format (not to mention it stupidly saves the audio in seemingly uncompressed files without an extension, one file per slide). So I imported the slides into Keynote 08, cleaned up the messes created by the import, recorded the slide show, and exported to QuickTime at high quality. This looked and played great in QuickTime so I went to upload it to <a href="http://www.youtube.com/user/ddgenome">YouTube</a>. Sorry, YouTube has a ridiculous and arbitrary limit of 10 minutes for a video. Moving on to <a href="http://ddgenome.blip.tv/">blip.tv</a> (which, nicely, also supports <a href="http://creativecommons.org/">Creative Commons</a> licensing), I uploaded the video. After waiting for it to convert, I played it and noticed that although the audio was fine, the slides did not advance. 
Thinking something was wrong with the blip.tv Flash converter, I moved on to <a href="http://www.vimeo.com/user1315280/videos">Vimeo</a>. Slides didn&#8217;t advance there either. So next I checked the video by watching it using VLC and <a href="http://www.mplayerhq.hu/">mplayer</a>. No slide advancement. Trying to fix it, I went into the custom video settings when exporting in Keynote, trying all sorts of combinations to generate a video that played well in VLC. Using various settings, I was able to get the slides to advance for a while, but eventually they would stop. Audio was always fine. I tried upgrading to Mac OS X 10.5. Still no luck (although in 10.5, VLC can capture from the iSight camera, so maybe audio capture will work in VLC now). I then recalled that MPEG-2 on the Mac was a little quirky (QuickTime won&#8217;t play decoded TiVo Series 2 videos without first converting them to MPEG-4 using <a href="http://ffmpeg.org/">ffmpeg</a>). As a last-ditch effort, I imported the Keynote-exported movie into iMovie HD as an MPEG-4 project. I checked to make sure iMovie HD played it correctly, then exported at full quality as an MPEG-4. This ballooned the size of the video from around 20 MB to over 110 MB, lessened the quality, and introduced a strange pulsing phenomenon in the slides (probably due to compression degradation being corrected by key frames every second or so), but it seemed to play correctly in VLC and was higher quality than the VLC capture video. This video, uploaded to blip.tv, is what you see above.</p>
<p><strong>Update:</strong> Here is how <a href="http://www.lessig.org/blog/2008/07/one_step_until_brilliant_scree.html">Lawrence Lessig screencasts</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2009/02/my-talk-at-agbt.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Next-Generation Sequencing Informatics</title>
		<link>http://www.politigenomics.com/2008/12/next-generation-sequencing-informatics.html</link>
		<comments>http://www.politigenomics.com/2008/12/next-generation-sequencing-informatics.html#comments</comments>
		<pubDate>Thu, 04 Dec 2008 22:15:59 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[compute]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[informatics]]></category>
		<category><![CDATA[SOLiD]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=660</guid>
		<description><![CDATA[I have put together a table with a bunch of important metrics for the major next-generation sequencing platforms: Next-Generation Sequencing Informatics (there is also a link on the left-hand side of the page). It includes number of reads, read length, data sizes, computational time, etc. I will try to keep it as up to date [...]]]></description>
			<content:encoded><![CDATA[<p>I have put together a table with a bunch of important metrics for the major next-generation sequencing platforms: <a href="http://www.politigenomics.com/next-generation-sequencing-informatics">Next-Generation Sequencing Informatics</a> (there is also a link on the left-hand side of the page). It includes number of reads, read length, data sizes, computational time, etc. I will try to keep it as up to date as I can and add new platforms and revisions as they become available. Consider it an early Christmas present.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2008/12/next-generation-sequencing-informatics.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Personal DNA testing</title>
		<link>http://www.politigenomics.com/2008/07/personal-dna-testing.html</link>
		<comments>http://www.politigenomics.com/2008/07/personal-dna-testing.html#comments</comments>
		<pubDate>Wed, 09 Jul 2008 13:38:08 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[Colbert]]></category>
		<category><![CDATA[evolution]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[informatics]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=104</guid>
		<description><![CDATA[Last night on NOVA scienceNOW there was a segment on the personal DNA tests currently being marketed to consumers (you can watch the segment on the website, unfortunately no ability to embed video on other sites). The host, Neil deGrasse Tyson, had his DNA tested by Navigenics and learned his &#8220;probability&#8221; as compared to the [...]]]></description>
			<content:encoded><![CDATA[<p>Last night on <a href="http://www.pbs.org/wgbh/nova/sciencenow/">NOVA scienceNOW</a> there was a segment on the <a href="http://www.pbs.org/wgbh/nova/sciencenow/0302/01.html">personal DNA tests currently being marketed to consumers</a> (you can watch the segment on the website; unfortunately, there is no ability to embed video on other sites). The host, <a href="http://research.amnh.org/~tyson/">Neil deGrasse Tyson</a>, had his DNA tested by <a href="http://www.navigenics.com/">Navigenics</a> and learned his &#8220;probability&#8221; as compared to the rest of the population for getting certain diseases. He even decided to learn the genotype associated with his APOE4 gene, the so-called Alzheimer&#8217;s gene (something <a href="http://nobelprize.org/nobel_prizes/medicine/laureates/1962/watson-bio.html">James Watson</a> decided <em>not</em> to do when his <a href="http://www.technologyreview.com/Biotech/18809/?a=f">genome was sequenced</a>). As all the scientists who do not work for one of these personal genomics companies said when interviewed, while these tests may provide some information about a person&#8217;s genome, we really don&#8217;t know what they are telling us about the person&#8217;s health, how the SNPs detected affect phenotype, or how to use them to guide lifestyle, treatment, diet, etc.</p>
<p>Later in the program there was also an interesting segment on geneticist and rocker, <a href="http://www.pbs.org/wgbh/nova/sciencenow/0302/04.html">Pardis Sabeti</a>, who pioneered a statistical approach to determine if mutations in a population were random or enriched due to natural selection.</p>
<p>Speaking of Neil deGrasse Tyson, check out the tour of the Hayden Planetarium he gives to Stephen Colbert so Stephen can become an astrophysicist if the Colbert Report doesn&#8217;t pan out.</p>
<div style="text-align: center;"><embed FlashVars='videoId=156552' src='http://www.comedycentral.com/sitewide/video_player/view/default/swf.jhtml' quality='high' bgcolor='#cccccc' width='332' height='316' name='comedy_central_player' align='middle' allowScriptAccess='always' allownetworking='external' type='application/x-shockwave-flash' pluginspage='http://www.macromedia.com/go/getflashplayer'></embed></div>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2008/07/personal-dna-testing.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s in an SRF?</title>
		<link>http://www.politigenomics.com/2008/06/whats-in-an-srf.html</link>
		<comments>http://www.politigenomics.com/2008/06/whats-in-an-srf.html#comments</comments>
		<pubDate>Mon, 30 Jun 2008 21:21:20 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[1000 Genomes]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[informatics]]></category>
		<category><![CDATA[SOLiD]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=100</guid>
		<description><![CDATA[I have written a bit about the NCBI Short Read Archive (SRA), its internals, and data transfer rates. Here is some information about the data format people are using to submit data from the massively parallel sequencers to the SRA. I apologize in advance for all the acronyms. The SRA is currently accepting 454 data [...]]]></description>
			<content:encoded><![CDATA[<p>I have written a bit about the <a href="http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi">NCBI Short Read Archive (SRA)</a>, <a href="http://www.politigenomics.com/2008/05/short-read-archive.html">its internals</a>, and <a href="http://www.politigenomics.com/2008/06/how-fast.html">data transfer rates</a>. Here is some information about the data format people are using to submit data from the massively parallel sequencers to the SRA. I apologize in advance for all the acronyms.</p>
<p>The SRA is currently accepting 454 data in <a href="http://www.454.com/news-events/press-releases.asp?display=detail&#038;id=48">standard flowgram format (SFF)</a> and Solexa in <a href="http://srf.sourceforge.net/">SRF</a> format.  Soon 454 and AB SOLiD will support the SRF format and submissions will commence in that format for those platforms.  The SFF format contains the flowgrams (intensity per cycle at each spot), base calls, and base quality values.  In other words, the SFF is very similar to the SCF format used for capillary sequencing data (except flowgrams are discrete whereas chromatograms are continuous).  Also, NCBI (as <a href="http://www.politigenomics.com/2008/05/short-read-archive.html">recently discussed</a>) has developed their own storage format for massively parallel sequencing data that they will also be accepting as a submission format within the next few months.</p>
<p>So what is an SRF? Well, it is basically just a container format, i.e., what you store in it is up to the implementation.  Thus far, SRF has only been implemented for Illumina/Solexa data; so the rest of this post is specific to that platform and the data types that its implementation of the SRF format contains. The Solexa SRF implementation was done largely by James Bonfield at <a href="http://www.sanger.ac.uk/">Sanger</a> and is distributed as part of the <a href="https://sourceforge.net/project/showfiles.php?group_id=100316&#038;package_id=108243">io_lib</a> package (now distributed separately from the <a href="http://staden.sourceforge.net/">Staden package</a>).  I would imagine that the SOLiD implementation will be very similar to the Solexa implementation.  The 454 implementation will likely be very similar to the SFF already in wide use.</p>
<p>For the <a href="http://www.1000genomes.org/">1000 Genomes</a> <a href="http://www.politigenomics.com/2008/03/1000-genomes.html">pilot projects</a>, the 1000 Genomes Data Collection Center (DCC) is asking that we submit the &#8220;raw&#8221;, &#8220;processed&#8221;, and &#8220;base&#8221; data for each spot.  Raw data are the intensity (int) and noise (nse) values.  Processed data are the processed intensity values (sig2) and four-channel quality values (prb).  Base data are the base calls (the quality value is taken from the prb for the called base).  This results in about 50 bytes per base for the SRF. Compared to 2 bits per base, the minimum possible for DNA&#8217;s four-letter alphabet, this is a 200-fold increase.  So not only do these instruments generate a lot more data, we are storing more information per base now too.  The average submission for a Solexa run is about 100 GB.</p>
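<p>As a rough sketch of that arithmetic in Python (the function names are mine and purely illustrative, and I am assuming 1 GB means 10<sup>9</sup> bytes):</p>
<pre><code>BITS_PER_BYTE = 8

def storage_fold_increase(bytes_per_base, minimum_bits_per_base=2):
    """Return how many times larger the stored size is than the bit-packed minimum."""
    return bytes_per_base * BITS_PER_BYTE / minimum_bits_per_base

def bases_in_submission(submission_bytes, bytes_per_base):
    """Estimate how many bases a submission of the given size holds."""
    return submission_bytes / bytes_per_base

# 50 bytes per base versus 2 bits per base
print(storage_fold_increase(50))       # 200.0

# Assumption: 100 GB taken as 100e9 bytes
print(bases_in_submission(100e9, 50))  # 2000000000.0
</code></pre>
<p>By that estimate, a 100 GB submission at 50 bytes per base holds on the order of 2 billion bases.</p>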
<p>Why store all this extra information?  Essentially, people do not trust/believe the data at this point.  The quality values provided by these pipelines are not as reliable as those generated for capillary sequence data.  Some people want the raw data so that they can develop and improve base calling/quality algorithms. Clearly you would not need <em>all</em> the 1000 Genomes data to develop such algorithms (although the technology changes at such a rate that you would likely want some rolling subset of the latest runs). Others want the raw data because they think they may want to go back and re-analyze data when better algorithms become available. For a wide variety of reasons (disk space, computational cost, network bandwidth, keeping pace with newly generated data), I doubt any such massive re-analysis will ever take place.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2008/06/whats-in-an-srf.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How much?</title>
		<link>http://www.politigenomics.com/2008/06/how-much.html</link>
		<comments>http://www.politigenomics.com/2008/06/how-much.html#comments</comments>
		<pubDate>Fri, 20 Jun 2008 18:11:06 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[informatics]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[SOLiD]]></category>
		<category><![CDATA[wustl]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=97</guid>
		<description><![CDATA[The Genome Center recently published a paper entitled Aspects of coverage in medical DNA sequencing that develops a model for diploid sequence coverage using data from massively parallel sequencing platforms (454, Solexa, SOLiD). It uses a known yardstick, 8&#215; BAC or WGS coverage with capillary sequencing, to establish the equivalent coverage for the new sequencing [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://genome.wustl.edu/">The Genome Center</a> recently published a paper entitled <a href="http://www.biomedcentral.com/1471-2105/9/239">Aspects of coverage in medical DNA sequencing</a> that develops a model for diploid sequence coverage using data from massively parallel sequencing platforms (454, Solexa, SOLiD). It uses a known yardstick, 8&times; BAC or WGS coverage with capillary sequencing, to establish the equivalent coverage for the new sequencing platforms. It turns out you need about 20&times; to 30&times; redundancy using these new platforms to obtain the equivalent amount of information as 10&times; coverage with capillary sequencing. The paper is published in an open access journal, <a href="http://www.biomedcentral.com/bmcbioinformatics">BMC Bioinformatics</a>, so enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2008/06/how-much.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>454 XLR-HD</title>
		<link>http://www.politigenomics.com/2008/06/454-xlr-hd.html</link>
		<comments>http://www.politigenomics.com/2008/06/454-xlr-hd.html#comments</comments>
		<pubDate>Thu, 12 Jun 2008 15:58:01 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[compute]]></category>
		<category><![CDATA[FLOSS]]></category>
		<category><![CDATA[informatics]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=92</guid>
		<description><![CDATA[The next upgrade of the 454 FLX platform is called Titanium. The previous name gave a better indication of what the upgrade entails: XLR-HD which is short for eXtra Long Reads-High Density. The XLR is due to the run having twice the number of cycles so the average read length will increase from 250 to [...]]]></description>
			<content:encoded><![CDATA[<p>The next upgrade of the <a href="http://www.genome-sequencing.com/">454 FLX</a> platform is called Titanium.  The previous name gave a better indication of what the upgrade entails: XLR-HD, which is short for eXtra Long Reads-High Density.  The XLR is due to the run having twice the number of cycles, so the average read length will increase from 250 to 400 bases (the average read length is not exactly double due to nucleotide flow order, mononucleotide runs, degraded signal as the number of cycles increases, etc.).  The HD is due to smaller, more densely packed wells on the picotiter plate, which increases the number of DNA fragments sequenced per run.  Putting these together, 454 FLX Titanium runs will quintuple their data output from 100 Mb to about 500 Mb (or more).</p>
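<p>The numbers above also imply roughly how much the read count has to grow. Here is a small Python sketch of that back-of-the-envelope calculation; the helper name is hypothetical, and the result is only what the quoted averages imply, not a figure from 454:</p>
<pre><code>def fold_change(before, after):
    """Return the multiplicative change from before to after."""
    return after / before

read_length_gain = fold_change(250, 400)  # bases per read
output_gain = fold_change(100, 500)       # Mb per run

# Output is reads times read length, so the implied gain in read count is:
implied_read_count_gain = output_gain / read_length_gain

print(round(read_length_gain, 3))         # 1.6
print(round(output_gain, 3))              # 5.0
print(round(implied_read_count_gain, 3))  # 3.125
</code></pre>
<p>In other words, if read length grows 1.6-fold and output grows 5-fold, the denser plate has to deliver about three times as many reads per run.</p>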
<p>This increase in data does not come without a price.  Up until now, the primary analysis (image processing and base calling) of 454 data could be performed in a few hours on a moderately powerful computer.  With the increased data output, primary analysis requires a small cluster: 20 cores with 1 GiB RAM per core and shared access to 1-2 TB of disk space.  While those are the minimal requirements, 10 cores per run region seems to be the sweet spot for best performance.  The initial production release will support <a href="http://www.redhat.com/">Red Hat</a>-compatible GNU/Linux distributions (<a href="http://www.redhat.com/rhel/">RHEL</a>, <a href="http://www.centos.org/">CentOS</a>, and <a href="http://fedoraproject.org/">Fedora</a>).  Previous releases also only officially supported Red Hat-like operating systems, but we have not had a problem running them on <a href="http://www.debian.org/">Debian GNU/Linux</a> (454 also indicated they are pushing toward <a href="http://www.linuxfoundation.org/en/LSB">LSB3</a> compliance).  Fortunately, 454 is eliminating the hard-coded requirement that the software be installed in, and the analysis processes have write access to, <code>/usr/local/rig</code>.  This will make installation across a cluster much easier.  They are also abandoning their custom <a href="http://en.wikipedia.org/wiki/Inter-process_communication">IPC</a> implementation in favor of the &#8220;standard&#8221; <a href="http://www.mpi-forum.org/">MPI</a>, specifically <a href="http://www.open-mpi.org/">OpenMPI</a> or <a href="http://www.mcs.anl.gov/research/projects/mpich2/">MPICH2</a>.  While it is good that they are using a standard IPC implementation, it is unfortunate that MPI implementations are so fragmented and often incompatible, i.e., if one vendor uses MPICH2 and another uses <a href="http://www.lam-mpi.org/">LAM</a>, you need to set up different systems to support each because they cannot coexist on the same system without problems.</p>
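<p>Going back to the cluster sizing for a moment, here is a minimal Python sketch of how one might turn those numbers into a resource request. Combining the 20-core minimum with the 10-cores-per-region sweet spot by taking the larger of the two is my own assumption, and the function name and the example region counts are hypothetical:</p>
<pre><code>MIN_CORES = 20                    # minimal requirement quoted above
RAM_GIB_PER_CORE = 1              # 1 GiB RAM per core
SWEET_SPOT_CORES_PER_REGION = 10  # reported sweet spot per run region

def cluster_request(run_regions):
    """Return an illustrative core and RAM request for one run.

    Assumption: never go below the minimum, otherwise use the sweet spot.
    """
    cores = max(MIN_CORES, SWEET_SPOT_CORES_PER_REGION * run_regions)
    return {"cores": cores, "ram_gib": cores * RAM_GIB_PER_CORE}

print(cluster_request(2))  # {'cores': 20, 'ram_gib': 20}
print(cluster_request(4))  # {'cores': 40, 'ram_gib': 40}
</code></pre>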
<p>I know this is unrelated to informatics, but if you will allow me to journey back to my transport phenomena days as a chemical engineer, the new picotiter plate requires much smaller beads, about 1 micron in diameter.  At these length scales, <a href="http://www.engr.wisc.edu/che/newsletter/2001-02_fallwinter/transport.html">transport phenomena</a>, specifically boundary effects and polymer diffusion, may become important during the emulsion PCR and sequencing.  Someone needs to calculate a <a href="http://www.efunda.com/formulae/fluids/calc_reynolds.cfm">Reynolds number</a>.</p>
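<p>For anyone who wants to take a first stab, here is a minimal Python sketch. Only the 1 micron bead diameter comes from the text above; the fluid properties are those of water at room temperature, and the 1 mm/s flow velocity is a pure placeholder, not a measured value from the instrument:</p>
<pre><code>def reynolds_number(density, velocity, length, viscosity):
    """Re = density * velocity * length / dynamic viscosity (SI units)."""
    return density * velocity * length / viscosity

re_bead = reynolds_number(
    density=1000.0,   # kg/m^3, water (assumption)
    velocity=1e-3,    # m/s, hypothetical flow velocity
    length=1e-6,      # m, bead diameter of about 1 micron
    viscosity=1e-3,   # Pa*s, water (assumption)
)
print(f"Re is about {re_bead:.0e}")
</code></pre>
<p>With those assumed inputs, Re comes out around 0.001. A value that far below 1 would put the flow around a bead in the creeping (Stokes) regime, where viscous forces dominate inertia.</p>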
<p>Oh, one more thing, there is talk of paired-end reads with 20 kb inserts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2008/06/454-xlr-hd.html/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>N Genomes</title>
		<link>http://www.politigenomics.com/2008/05/n-genomes.html</link>
		<comments>http://www.politigenomics.com/2008/05/n-genomes.html#comments</comments>
		<pubDate>Fri, 09 May 2008 20:21:23 +0000</pubDate>
		<dc:creator>dd</dc:creator>
				<category><![CDATA[genomics]]></category>
		<category><![CDATA[1000 Genomes]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[CSHL]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[informatics]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[SOLiD]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://www.politigenomics.com/?p=74</guid>
		<description><![CDATA[Earlier this week there were several meetings about the 1000 Genomes Project at Cold Spring Harbor Labs. The first meeting Monday morning was about data flow and data repositories. NCBI&#8217;s Short Read Archive (SRA) and the equivalent at EBI (which should be ready in a month or two) will house all the data. The pilot [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this week there were several meetings about the <a href="http://www.1000genomes.org/">1000 Genomes Project</a> at <a href="http://www.cshl.edu/">Cold Spring Harbor Labs</a>.  The first meeting Monday morning was about data flow and data repositories.  NCBI&#8217;s <a href="http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi">Short Read Archive (SRA)</a> and the equivalent at <a href="http://www.ebi.ac.uk/">EBI</a> (which should be ready in a month or two) will house all the data.  The <a href="http://www.politigenomics.com/2008/03/1000-genomes.html">pilot projects</a> for the 1000 Genomes Project started less than two months ago and have already generated half as much sequence data as the entire <a href="http://www.ncbi.nlm.nih.gov/Traces/trace.cgi">trace archive</a> (which contains the sequence data for all publicly funded genome projects over the last 10 years).  In other words, this project is going to generate a <span style="font-weight:bold;">lot</span> of sequence data (not to mention all the data generated by analysis of the sequence).  Paul Flicek from EBI estimates the pilot projects alone will generate about 1 PB (1,000,000 GB) of sequence data.  Moving that much data from site to site will be a challenge.  Normal solutions, e.g., FTP, rsync, and shipping hard drives, can&#8217;t seem to keep up with the data generation rates.  NCBI, EBI, and the sequencing centers are testing a high-speed data transfer solution called <a href="http://www.asperasoft.com/products/scp/index.html">Aspera scp</a>.  It has impressive transfer rates, but seems to stall after a while for no discernible reason.  We&#8217;ll see if we can get it to work reliably over the coming weeks.</p>
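<p>To get a feel for the scale of the transfer problem, here is a quick Python sketch of how long 1 PB takes to move at a given sustained rate. The link speeds are illustrative assumptions, not the actual connections between the centers, and the function name is mine:</p>
<pre><code>SECONDS_PER_DAY = 86400
PETABYTE = 1e15  # bytes, i.e. 1,000,000 GB

def transfer_days(size_bytes, link_bits_per_second, efficiency=1.0):
    """Days needed to move size_bytes over a link at the given sustained rate.

    efficiency is the fraction of the nominal rate actually achieved.
    """
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY

# Hypothetical sustained link speeds
for label, rate in [("100 Mb/s", 100e6), ("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
    print(label, round(transfer_days(PETABYTE, rate), 1), "days")
</code></pre>
<p>Even at a perfectly sustained 1 Gb/s, which real transfers rarely achieve, 1 PB takes about three months to move.</p>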
<p>After the data flow meeting was a meeting of the 1000 Genomes Steering Committee.  The day and a half that ensued was filled with a lot of lively discussion.  When all was said and done, one thing was clear: there are a lot of questions that need to be answered.  The analysis group presented convincing results from simulations that indicated 2× coverage in a large number of individual genomes (Pilot 1) is probably not sufficient to detect the rare variants the project is going after (present in 1-2% of the population).  The simulations indicated that the power of the study to detect such variants (at a constant cost, i.e., constant total amount of sequence generated) would be greatly enhanced by sequencing half as many people at 4× coverage.  There was no firm decision on how to change the pilot (if at all), but going forward it is likely that some of the individuals in Pilot 1 will be sequenced up to 4× or even 8×.  Thus, while the project may be named 1000 Genomes, exactly how many genomes we are going to sequence is yet to be determined.</p>
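<p>To illustrate why depth matters so much, here is a minimal Python sketch using the standard Poisson idealization of per-site read depth. To be clear, this is not the analysis group&#8217;s simulation, which modeled variant detection across many individuals; it only shows how often a given site in one genome is covered by at least some number of reads. The threshold of 3 reads is a hypothetical example:</p>
<pre><code>import math

def prob_at_least(k, coverage):
    """P(depth is at least k) when depth follows Poisson(coverage)."""
    below = sum(
        math.exp(-coverage) * coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below

for depth in (2, 4, 8):
    print(
        depth,
        round(prob_at_least(1, depth), 3),  # site seen at all
        round(prob_at_least(3, depth), 3),  # site seen by 3 or more reads
    )
</code></pre>
<p>Under this idealization, doubling the coverage from 2× to 4× more than doubles the fraction of sites covered by at least 3 reads, which gives some intuition for why fewer genomes at higher coverage can be the better trade at constant cost.</p>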
<p>Another issue that arose was the rapid development of the massively parallel sequencing technologies.  These platforms (454 FLX, Illumina Genome Analyzer, and AB SOLiD) increase their throughput, improve their data quality, improve analysis software, etc. several times each year.  Such dynamic platforms make the development of tools to analyze their data, e.g., align the data to a reference genome and detect variants, very difficult.  The right platforms and tools today may not be the best next month or next year when the main project gets underway.  This brings two major needs to the fore.  First, experimental design will not end when the project starts.  The experiment will need to be adjusted as capabilities and capacities change.  Second, we will not only have to continually develop and refine tools throughout the project, we will also need to develop frameworks to continually test and compare the tools that are available.  It&#8217;s always fun to hit a moving target.</p>
<p>The meeting also discussed the ethical, legal, and social implications (ELSI) of the project.  This discussion largely focused on which populations to sample for the project.  Should we deepen our knowledge of individuals of Central European, African, and East Asian ancestry to aid in methodology development?  Or should we broaden our knowledge of overall human variation by including fewer individuals from a larger number of populations?  To be determined&hellip;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.politigenomics.com/2008/05/n-genomes.html/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

