Recently, The Genome Center's Dr. Elaine Mardis gave a talk at Cold Spring Harbor's Personal Genomes meeting. The topic of the talk was the ever-increasing efforts of the genomics community to understand the molecular nature of cancer. In this blog, I have discussed projects like The Cancer Genome Atlas (TCGA) and the Tumor Sequencing Project (TSP), but what are we actually doing in these projects to try to better understand and, ultimately, cure cancer? As with most things in genomics, the answer to that question changes day to day, with ever more powerful techniques being applied to study cancer. It is the stated goal of the National Cancer Institute (NCI) to end suffering and death from cancer by the middle of the next decade. This post is about how we are going to do that.

Cancer is a disease of the genome. Mutations occur over time in the DNA of every cell. Most of these mutations are benign; they have no effect on the normal operation of the cell. Some mutations or combinations of mutations can be deleterious. If the mutation causes the cell to die, the result is that a single cell dies and there is likely little effect on the organism as a whole. Some mutations, however, may affect the cell cycle (cell replication) in such a way that the cell replicates uncontrollably. Since all of the uncontrollably replicating cell's progeny have the same mutations, they and their progeny also replicate uncontrollably. Since these cells grow and divide much more quickly than the surrounding cells, these tumor cells quickly come to dominate the tissue in which they originated, starving other cells in the tissue of nutrients. The rapid replication of cells often leads to more mutations being accumulated until the tumor metastasizes and the cancer spreads throughout the body. Over the past few decades, cancer care has improved for many types of cancer as early detection has increased and various new chemotherapeutics have been developed. More recently, drugs that target specific cell processes, or pathways, whose increase or decrease in function due to mutations plays a role in cancer have been developed and used effectively in the clinic. Unfortunately, the many diseases that fall under the name cancer are widely varied and extremely complex. Breakthroughs in understanding and treating one subtype of cancer may not be applicable to other cancers or even other subtypes of the same cancer. Indeed, even within a tumor there can be heterogeneity; all cells will have the mutations that led to tumorigenesis, but different subpopulations within the tumor may appear, each accumulating its own specific mutations. Thus we need a better molecular, DNA-level understanding of the many diseases collectively called cancer.
In other words, for each tumor we need to understand which mutations led to tumorigenesis, how those mutations affected cell replication, and how to reverse that change of function.

To begin to discover more about the molecular nature of cancer, several years ago we began the Tumor Sequencing Project. In TSP, we use both array-based technologies and DNA sequencing to study tumor and normal tissues from a cohort of about 200 patients with lung adenocarcinoma, a form of lung cancer that affects a disproportionate number of never-smokers. SNP arrays were used to determine copy-number variation (CNV). The CNV work was published last year in Nature as "Characterizing the cancer genome in lung adenocarcinoma." While the CNV work provided valuable information about large-scale differences between the tumor and normal tissues over the entire genome, it and other array-based approaches cannot provide detailed information about specific mutations. To get single-base resolution of the differences between an individual's tumor and normal genomes, you have to sequence the DNA. At the time TSP started, the cost of sequencing an entire genome was much greater than running a SNP array. Therefore, the low-resolution information from our SNP array investigations and other, similar investigations in the literature was used to guide the selection of specific genes thought to play a role in cancer in general and lung adenocarcinoma in particular. For each patient, we sequenced these genes in both their tumor and normal tissues and looked for differences between the two sequences. Once you have those differences, gene annotations (definitions) can be used to determine if the mutation in the tumor tissue would lead to a change in the amino acid sequence of the protein that gene encodes. If the protein would be changed, other programs can be used to predict whether the change would lead to a change in protein conformation and a possible loss or gain of function. The cellular pathways in which the mutated protein participates can also be researched to see if any of the pathways control cell division or other important signaling paths in the cell cycle.
Once the gene and its pathways are known, they can be correlated with known oncogenes and oncogene families. Once that is established, further experiments, e.g., induction of found mutations in disease model organisms, can be done to determine if the mutations and predicted changes do contribute to carcinogenesis. It is also interesting to see how changes that span several patients correlate with phenotypic/patient information, e.g., cancer subtype, patient outcome, age, sex, and smoking status. The initial sequencing results of the TSP will be in tomorrow's issue of Nature: "Somatic mutations affect key pathways in lung adenocarcinoma."
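To make the tumor-versus-normal comparison described above a bit more concrete, here is a minimal Python sketch. It is not the TSP pipeline (which this post does not detail); it only assumes that you already have the same coding sequence from a patient's normal and tumor tissue, aligned base for base with no insertions or deletions. The function names and the toy 12-base sequences are made up for illustration. It finds the single-base differences and uses the standard genetic code to decide whether each one would change the encoded amino acid.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence (DNA, in frame) into one-letter amino acids."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def somatic_substitutions(normal_cds, tumor_cds):
    """Return (position, normal_base, tumor_base) for every mismatching base.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(normal_cds) != len(tumor_cds):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, n, t)
        for i, (n, t) in enumerate(zip(normal_cds, tumor_cds))
        if n != t
    ]


def classify(normal_cds, position, tumor_base):
    """Classify one substitution as synonymous, missense, or nonsense."""
    offset = position % 3
    start = position - offset
    ref_codon = normal_cds[start:start + 3]
    alt_codon = ref_codon[:offset] + tumor_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Toy, made-up sequences (not real patient data).
normal = "ATGGGTGCATAC"
tumor = "ATGTGTGCGTAA"
for pos, n_base, t_base in somatic_substitutions(normal, tumor):
    print(pos, n_base, t_base, classify(normal, pos, t_base))
```

Only the missense and nonsense changes would go on to the protein-level and pathway-level analysis described above; real data would also need to handle insertions, deletions, strand, and splicing, which this sketch ignores.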

The Cancer Genome Atlas pilot project, which is studying brain, ovarian, and lung cancer, has followed much the same path as TSP, albeit on a larger scale. Compared to TSP, a wider variety of array platforms, more patients, and DNA sequencing of more genes have been employed. Initial gene lists for sequencing were obtained through a review of the literature, with subsequent lists obtained through the results of the array-based studies. The initial glioblastoma paper from TCGA was published online by Nature last month and, like the TSP paper, will be in tomorrow's issue: "Comprehensive genomic characterization defines human glioblastoma genes and core pathways."

With both TSP and TCGA, we are able to look at the whole genome in a coarse-grained way and specific parts of the genome (genes) in a fine-grained way. So the question becomes, what are we missing by not studying the entire genome at single-base resolution? The advent of next-generation/massively parallel sequencing has allowed us to begin to answer this question. About a year and a half ago, we began whole-genome sequencing of a single acute myeloid leukemia (AML) patient's tumor and normal genomes using the Illumina/Solexa sequencing platform. Generating such a complete picture of a single human genome raised significant privacy issues that needed to be addressed before publishing and data release. The massive amount of data generated for each of these genomes presented significant challenges to our informatics infrastructure. All of the data had to be combed through to find the small variants (single nucleotide variations and small insertions and deletions of sequence) between the tumor and normal genomes (at the time, it was not possible to detect larger variations, e.g., structural rearrangements, with the Illumina/Solexa technology). With this approach, you are not only able to find mutations in genes that have not previously been implicated in cancer, but you are also able to find mutations in sequences conserved across species, microRNAs, regulatory regions, etc.; basically anything annotated in Ensembl or UCSC. Of course, the effects of mutations in non-genic regions are more difficult to interpret than those in genic regions that alter proteins, but the efforts of the ENCODE project, disease model organism studies, and sequencing more cancer genomes will greatly aid in the interpretation of these mutations.
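As a rough illustration of the "combing through" step, the Python sketch below assumes that small variants have already been called separately for the tumor and normal genomes, and that annotated regions are available as simple intervals. The `Variant` and `Feature` records, the function names, and all coordinates are hypothetical; this is not our production pipeline, and real annotations would come from a resource such as Ensembl or UCSC rather than a hand-written list. The idea is simply to keep the variants seen only in the tumor and then report which kinds of annotated region each one falls in.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str   # reference allele
    alt: str   # observed allele


class Feature(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    kind: str   # e.g. "gene", "microRNA", "conserved_element"


def somatic_candidates(tumor_calls, normal_calls):
    """Variants present in the tumor calls but absent from the normal calls."""
    normal = set(normal_calls)
    return sorted(v for v in set(tumor_calls) if v not in normal)


def annotate(variant, features):
    """Kinds of annotated regions overlapping the variant position."""
    kinds = sorted({
        f.kind
        for f in features
        if f.chrom == variant.chrom and f.start <= variant.pos <= f.end
    })
    return kinds or ["unannotated"]


# Made-up example data (hypothetical coordinates, not real calls).
tumor_calls = [
    Variant("chr1", 100, "A", "G"),   # also in normal, so inherited
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 50, "G", "A"),
]
normal_calls = [Variant("chr1", 100, "A", "G")]
features = [
    Feature("chr1", 200, 300, "gene"),
    Feature("chr1", 240, 260, "conserved_element"),
]

for v in somatic_candidates(tumor_calls, normal_calls):
    print(v, annotate(v, features))
```

A linear scan over the features is fine for a toy example; at whole-genome scale an interval index would be needed, and a variant missing from the normal calls may only reflect low coverage there, so candidates like these still need validation.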

In summary, these cancer sequencing efforts, especially whole-genome cancer sequencing, are increasing our understanding of cancer at a very rapid pace, laying the groundwork for more individualized approaches to treatment. As our knowledge of tumorigenesis increases, our ability to detect cancers early and treat cancers effectively will also increase. Ultimately, these sorts of studies will make the goal of ending suffering and death from cancer achievable.