Remote homology and the functions of metagenomic dark matter

Abstract

Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space.
All remote homology predictions are available at http://doxey.uwaterloo.ca/ORFans.
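
The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the authors' pipeline: it recomputes the reported 15.3% from the two counts given above and compares it with the 1.4% background reported for artificial protein families. The variable and function names are hypothetical, and the fold difference is a derived quantity that the abstract itself does not state.

```python
# Counts reported in the abstract.
orfans_with_remote_homology = 73_896
total_orfans = 484_121

# Background rate reported in the abstract for artificial protein families.
artificial_family_rate = 0.014


def hit_fraction(hits: int, total: int) -> float:
    """Return hits / total, rejecting inputs that cannot form a fraction."""
    if total <= 0 or hits < 0 or hits > total:
        raise ValueError("expected 0 <= hits <= total and total > 0")
    return hits / total


observed_rate = hit_fraction(orfans_with_remote_homology, total_orfans)

# Derived here for illustration only; this ratio is not quoted in the abstract.
fold_over_background = observed_rate / artificial_family_rate

print(f"ORFans with remote homology: {observed_rate:.1%}")  # prints 15.3%
print(f"Fold over artificial families: {fold_over_background:.1f}x")
```

Under these numbers the observed rate is roughly an order of magnitude above the artificial-family background, which is consistent with the abstract's statement that the detected remote homology "far exceeds" that baseline.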

Cite

Lobb, B., Kurtz, D. A., Moreno-Hagelsieb, G., & Doxey, A. C. (2015). Remote homology and the functions of metagenomic dark matter. Frontiers in Genetics, 6(JUL). https://doi.org/10.3389/fgene.2015.00234
