Second Generation Sequencing of the Mesothelioma Tumor Genome

Raphael Bueno1,2, Assunta De Rienzo1,2, Lingsheng Dong1,2, Gavin J. Gordon1,2, Colin F. Hercus6, William G. Richards1,2, Roderick V. Jensen7, Arif Anwar6, Gautam Maulik1,2, Lucian R. Chirieac1,3, Kim-Fong Ho6, Bruce E. Taillon5, Cynthia L. Turcotte5, Robert G. Hercus6, Steven R. Gullans8, David J. Sugarbaker1,2*

1 The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America, 2 Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America, 3 Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts, United States of America, 4 Harvard Medical School, Boston, Massachusetts, United States of America, 5 454 Life Sciences, Inc., Branford, Connecticut, United States of America, 6 Synamatix, Kuala Lumpur, Malaysia, 7 Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America, 8 Excel Medical Ventures, Boston, Massachusetts, United States of America

Abstract

The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types.
Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription, and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.

Citation: Bueno R, De Rienzo A, Dong L, Gordon GJ, Hercus CF, et al. (2010) Second Generation Sequencing of the Mesothelioma Tumor Genome. PLoS ONE 5(5): e10612. doi:10.1371/journal.pone.0010612

Editor: Anita Brandstaetter, Innsbruck Medical University, Austria

Received October 30, 2009; Accepted April 1, 2010; Published May 13, 2010

Copyright: © 2010 Bueno et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by funds from the International Mesothelioma Program (IMP) at Brigham and Women's Hospital (www.impmeso.org).
DJS is also supported by P20CA9057801-A1, U01 CA65170-07 and U24CA114725-01 from the NCI, WGR is also supported by P20CA9057801-A1 and U24CA114725-01 from the NCI, RB is also supported by RO1 CA 120528 and R21/R33 CA 100315 from the NCI.

Competing Interests: There are authors on this manuscript that work for three companies: Roche 454 Life Sciences, Inc., Synamatix Sdn Bhd and Excel Medical Ventures and according to PLOS One rules must be called Funder. Each of these companies provided work for this project as a subcontractor and was paid for it. None of these companies owns any IP, has any control or exerted any influence on this article other than scientific editorializing. Specifically, the authors contracted 454 (and then Roche-454) to perform some of the sequencing since at the time there was no 454 machine in Boston. The authors met with their staff and they helped in the experimental design, but the sequencing was paid for by the authors' institution and the company holds no IP or any other controls. Synamatix was contracted to help the authors perform bioinformatic analysis using their equipment as the authors did not at the time have the full computer capability. They were used as a service company and do not control any IP or data. One of the authors' long term collaborators, Steve Gullans, moved from academia to the venture company he founded, Excel. He participated as a paid scientific collaborator and consultant in a manner unrelated to his company. Like the case for the others, his company places no restrictions nor owns any IP. Given that as described above, all these companies provided paid for, subcontracted resources, not unlike a Core function, the authors included some of their employees as authors for some scientific contributions. Furthermore, all the final data analysis, processing and validation was done in the authors' academic institution. However, there is no competing interest nor are there any data sharing limitations.
Thus the authors continue to adhere to PLOS one rules and regulations in full.

* E-mail: dsugarbaker@partners.org

PLoS ONE | www.plosone.org | May 2010 | Volume 5 | Issue 5 | e10612

Introduction

Malignant pleural mesothelioma (MPM) is a highly aggressive pleural tumor associated with asbestos exposure. Therapeutic options are limited and median patient survival is about 7 months. The standard chemotherapy regimen consisting of pemetrexed and platinum extends median survival by approximately 2 months [1]; however, select patients who undergo complete surgical resection followed by chemotherapy and radiation derive a longer survival benefit, with some surviving over 5 years [2,3]. To date, most clinical trials have focused on cytotoxic agents rather than targeted therapies [4–6]. While it has been shown that chromosomal abnormalities are abundant within MPM [7–11], whole genome analysis of this tumor
has not yet been described. Massively parallel second-generation sequencing methodologies using pyrosequencing or derivations of sequencing by synthesis have paved the way for large-scale analyses of tumor biology by targeted gene re-sequencing, mutation detection, copy number variation (CNV), single nucleotide polymorphism (SNP), changes in chromatin architecture, and epigenetic changes such as methylation pattern alteration [12–24]. The ability to analyze entire genomes opens the door to global mapping of normal variation and mutations of all types for correlation with disease propensity, diagnosis, treatment, prognosis, as well as identification of new targets for interventional therapy discovery and development [15,25,26].

We report a definition at the genomic level of a primary human MPM tumor using a combination of approaches, namely, Illumina sequencing by synthesis and Roche/454 pyrosequencing [18,19,27]. Building upon previous work in which we sequenced the transcriptomes of four MPM patient samples with Roche/454 technology [26], we selected the genomic DNA (gDNA) from a tumor and normal control from one of those patients for more in-depth analysis. In this single tumor, we found hundreds of previously unreported single nucleotide variants (SNVs), single nucleotide insertions/deletions (indels), inter- and intra-chromosomal large-scale DNA rearrangements (including many that occur within genes), and translocations. Many of the aberrations are tumor-specific mutations occurring at loci that code for genes involved in distinct pathways known to play key roles in cancer initiation and progression. We also found substantial variability in this individual's normal genome, confirming previous reports [21,28–31].
The data herein suggest that the numbers and types of variations both in the normal and the MPM tumor genomes are considerable, and further, that efforts to use these data (and similar data from other solid tumors) to elucidate tumor biology and identify novel candidate therapeutic targets may be more challenging than previously thought.

Results

Sequencing and mapping

We sequenced MPM tumor gDNA and matched normal lung gDNA using Illumina paired-end (PE) technology to generate 17.8 and 15.67 gigabase pairs (Gbp), or 5.6X and 5.2X coverage of the respective genomes (Table 1). Greater than 97% of the individual sequenced reads aligned to NCBI's RefSeq database. The tumor's read density, when compared to the RefSeq database, revealed numerous chromosomal CNVs (Fig. 1), a known hallmark of many tumor types including MPM [9,21,28,29,32–35]. Several chromosomes (2, 3, 6, 7, 10, 12, 13 and 20) appeared to be mostly euploid, whereas others were less abundant than expected (including Y, 4, 14, 18, 19 and 10) or appeared to have extra copies (e.g., chromosome 5). Furthermore, a number of chromosomes (1, 8, 9, 11, 15, 16, 17, 21, and 22) appeared to have complex structures that included both haploid and polyploid segments. The read density and CNV results were independently verified using deep exploration of the same samples with comparative genomic hybridization (CGH) arrays (Fig. 2) and were shown to match in a highly statistically significant manner (Pearson correlation coefficient r = 0.7918; 99% confidence interval, 0.788 ≤ r ≤ 0.796). Thus, whole genome sequence analysis revealed, as expected, that this MPM tumor displayed large-scale aneuploidy.

Global validation using Roche/454 sequencing

Given the degree of aneuploidy and the large number of variants observed with Illumina sequencing in both the tumor and normal gDNAs, it became clear that independent validation or at least prioritization of candidate tumor mutations would be required to identify true positive mutations.
Furthermore, second-generation sequencing methodologies possess inherent limitations related to artifacts and biases representing false-positive mutations [18,19,27,36]. Thus, a pyrosequencing approach (Roche/454) was undertaken to confirm the read densities observed and the presence of any mutations detected. Roche/454 sequencing of the tumor gDNA generated 10.8 Gbp or ~4X coverage, extending the total coverage of the tumor genome to nearly 10X. Ninety-six percent of the Roche/454 reads mapped to known human DNA and the read density analysis was highly statistically correlated with that obtained by Illumina sequencing and the CGH arrays (Figs. 1 and 2 and File S1).

Genomic rearrangement

The dramatic changes in read density observed in the MPM tumor (Figs. 1 and 2) led us to focus on the identification of

Table 1. Mapping of Illumina Paired-End Sequence Data.

                        Tumor             Normal
Total reads (PE)        219,359,022       175,965,242
Total base pairs        17,808,799,058    15,673,204,592
Avg. length/PE read     81                89

Mapping attributes of paired-end reads (Read 1:Read 2)

                        Tumor                        Normal
                        # of mapped reads   %        # of mapped reads   %
1:1                     153,601,517         70.02%   131,604,101         74.79%
0:1                     1,972,379           0.90%    1,435,061           0.82%
1:0                     12,270,045          5.59%    4,109,024           2.34%
N:1                     12,108,970          5.52%    9,250,544           5.26%
1:N                     11,930,579          5.44%    8,867,264           5.04%
N:0                     2,201,869           1.00%    705,230             0.40%
0:N                     508,749             0.23%    357,149             0.20%
N:N                     18,242,362          8.32%    14,769,141          8.39%
Total mapped reads      212,836,470         97.03%   171,097,514         97.23%

1, read maps to a unique region; 0, read does not map to any region; N, read maps to multiple regions.
doi:10.1371/journal.pone.0010612.t001
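
As a rough consistency check on Table 1, the following sketch recomputes the mapped-read percentages, the average paired-end read length, and an approximate fold coverage from the tabulated counts. The haploid genome size of 3.1 Gbp is an assumption, not a value stated in the text, so the coverage it yields is only approximate and need not match the reported 5.6X and 5.2X exactly. The names `TABLE1`, `ASSUMED_GENOME_BP` and `summarize` are illustrative only and are not part of the authors' pipeline.

```python
# Counts copied from Table 1 (Illumina paired-end data).
TABLE1 = {
    "tumor": {
        "total_reads": 219_359_022,
        "total_bp": 17_808_799_058,
        "classes": {
            "1:1": 153_601_517, "0:1": 1_972_379, "1:0": 12_270_045,
            "N:1": 12_108_970, "1:N": 11_930_579, "N:0": 2_201_869,
            "0:N": 508_749, "N:N": 18_242_362,
        },
    },
    "normal": {
        "total_reads": 175_965_242,
        "total_bp": 15_673_204_592,
        "classes": {
            "1:1": 131_604_101, "0:1": 1_435_061, "1:0": 4_109_024,
            "N:1": 9_250_544, "1:N": 8_867_264, "N:0": 705_230,
            "0:N": 357_149, "N:N": 14_769_141,
        },
    },
}

# ASSUMPTION: approximate haploid human genome size; not given in the paper.
ASSUMED_GENOME_BP = 3.1e9


def summarize(sample, genome_bp=ASSUMED_GENOME_BP):
    """Derive summary statistics for one sample from the Table 1 counts."""
    data = TABLE1[sample]
    mapped = sum(data["classes"].values())  # reads in any mapped class
    return {
        "mapped_reads": mapped,
        "pct_mapped": round(100 * mapped / data["total_reads"], 2),
        "pct_unique_pairs": round(
            100 * data["classes"]["1:1"] / data["total_reads"], 2
        ),
        "avg_pe_length": round(data["total_bp"] / data["total_reads"]),
        # Fold coverage = sequenced bases / assumed genome size.
        "approx_coverage": data["total_bp"] / genome_bp,
    }


if __name__ == "__main__":
    for name in TABLE1:
        print(name, summarize(name))
```

Summing the eight mapping classes gives the "Total mapped reads" row, which is a quick way to confirm that the table was transcribed without error.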