Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data

6Citations
Citations of this article
21Readers
Mendeley users who have this article in their library.
Get full text

Abstract

The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.

Cite

CITATION STYLE

APA

Li, J., Singh, U., Arendsee, Z., & Wurtele, E. S. (2021). Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Frontiers in Genetics, 12. https://doi.org/10.3389/fgene.2021.722981

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free