Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs?

32Citations
Citations of this article
82Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real, genomic artifacts, or represent non-coding RNAs. Results: Here, we use a simple set of assumptions to test the nature of orphan genes. First, a sequence that is transcribed is considered a real biological entity. Second, every sequence that is supported by proteome data or shows a depletion of non-synonymous substitutions is a protein-coding gene. Using genomic, transcriptomic and proteomic data for the nematode Pristionchus pacificus, we show that between 4129-7997 (42-81 %) of predicted orphan genes are expressed and 3818-7545 (39-76 %) of orphan genes are under negative selection. In three cases that exhibited strong evolutionary constraint but lacked expression evidence in 14 RNA-seq samples, we could experimentally validate the predicted gene structures. Comparing different data sets to infer selection on orphan gene clusters, we find that the presence of a closely related genome provides the most powerful resource to robustly identify evidence of negative selection. However, even in the absence of other genomic data, the availability of paralogous sequences was enough to show negative selection in 8-10 % of orphan genes. Conclusions: Our study shows that the great majority of previously identified orphan genes in P. pacificus are indeed protein-coding genes. Even though this work represents a case study on a single species, our approach can be transferred to genomic data of other non-model organisms in order to ascertain the protein-coding nature of orphan genes.

Cite

CITATION STYLE

APA

Prabh, N., & Rödelsperger, C. (2016). Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? BMC Bioinformatics, 17(1). https://doi.org/10.1186/s12859-016-1102-x

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free