The doppelganger effect: Hidden duplicates in databases of transcriptome profiles

Levi Waldron; Markus Riester; Marcel Ramos; Giovanni Parmigiani; Michael Birrer

Journal Article, Open Access

Journal of the National Cancer Institute (2016) 108(11)

DOI: 10.1093/jnci/djw146

Abstract

Whole-genome analysis of cancer specimens is commonplace, and investigators frequently share or re-use specimens in later studies. Duplicate expression profiles in public databases will impact re-analysis if left undetected, a so-called "doppelgänger" effect. We propose a method that should be routine practice to accurately match duplicate cancer transcriptomes when nucleotide-level sequence data are unavailable, even for samples profiled by different microarray technologies or by both microarray and RNA sequencing. We demonstrate the effectiveness of the method in databases containing dozens of datasets and thousands of ovarian, breast, bladder, and colorectal cancer microarray profiles and of matching microarray and RNA sequencing expression profiles from The Cancer Genome Atlas (TCGA). We identified probable duplicates among more than 50% of studies, originating in different continents, using different technologies, published years apart, and even within the TCGA itself. Finally, we provide the doppelgangR Bioconductor package for screening transcriptome databases for duplicates. Given the potential for unrecognized duplication to falsely inflate prediction accuracy and confidence in differential expression, doppelganger-checking should be a part of standard procedure for combining multiple genomic datasets.
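
The abstract does not spell out the matching procedure, so the following is only a rough illustration of the general idea of screening two expression datasets for suspiciously similar profiles. It is not the doppelgangR implementation. It assumes two genes-by-samples matrices with identical gene order, uses per-gene standardization within each dataset as a crude stand-in for batch correction, and flags sample pairs whose Fisher-transformed Pearson correlation is an upper outlier. The function name `find_candidate_duplicates` and the `z_cutoff` parameter are hypothetical, chosen for this sketch.

```python
import numpy as np


def find_candidate_duplicates(a, b, z_cutoff=4.0):
    """Flag sample pairs (column i of `a`, column j of `b`) that look like duplicates.

    `a` and `b` are genes x samples arrays sharing the same gene order.
    Returns a list of (i, j, pearson_r) tuples. Illustrative sketch only.
    """
    def standardize_genes(x):
        # Center and scale each gene within its own dataset
        # (a crude stand-in for a proper batch-correction step).
        x = x - x.mean(axis=1, keepdims=True)
        sd = x.std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0
        return x / sd

    def unit_columns(x):
        # Center each sample and scale it to unit length, so that
        # a dot product of two columns equals their Pearson correlation.
        x = x - x.mean(axis=0, keepdims=True)
        norm = np.linalg.norm(x, axis=0, keepdims=True)
        norm[norm == 0] = 1.0
        return x / norm

    a_std = unit_columns(standardize_genes(np.asarray(a, dtype=float)))
    b_std = unit_columns(standardize_genes(np.asarray(b, dtype=float)))

    # Pearson correlation for every cross-dataset sample pair.
    r = a_std.T @ b_std

    # Fisher z-transform; clip to keep arctanh finite for identical profiles.
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))

    # Robust upper-outlier threshold from the median and scaled MAD.
    med = np.median(z)
    mad = 1.4826 * np.median(np.abs(z - med))
    hits = np.argwhere(z > med + z_cutoff * mad)
    return [(int(i), int(j), float(r[i, j])) for i, j in hits]
```

In this sketch, a sample copied from one simulated dataset into another with a little added noise should stand far above the background of unrelated pairs. Real data would need a carefully chosen threshold and manual review of every flagged pair.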

Citation (APA)

Waldron, L., Riester, M., Ramos, M., Parmigiani, G., & Birrer, M. (2016). The doppelganger effect: Hidden duplicates in databases of transcriptome profiles. Journal of the National Cancer Institute, 108(11). https://doi.org/10.1093/jnci/djw146
