A strategy for predicting gene functions from genome and metagenome sequences on the basis of oligopeptide frequency distance

Abstract

As a result of the extensive decoding of massive amounts of genomic and metagenomic sequence data, genes whose functions cannot be predicted by sequence similarity searches are accumulating in large numbers, and such genes are of little use to science or industry. Current genome and metagenome sequencing largely depends on high-throughput, low-cost methods. In genome sequencing of a single species, high-density sequencing can reduce sequencing errors. For metagenome sequences, however, high-density sequencing does not necessarily increase sequence quality, because multiple unknown genomes, including those of closely related species, are likely to exist in the sample. Therefore, a function prediction method that is robust against sequence errors is increasingly needed. Here, we present a method for predicting protein gene function that does not depend on sequence similarity searches. Using an unsupervised machine learning method called BLSOM (batch-learning self-organizing map) applied to short oligopeptide frequencies, we previously developed a sequence alignment-free method for clustering bacterial protein genes according to clusters of orthologous groups of proteins (COGs), without using information from COGs during machine learning. This method allows function-unknown proteins to cluster with function-known proteins based solely on similarity in oligopeptide frequency, although it required high-performance supercomputers (HPCs). Based on the wide range of knowledge obtained with HPCs, we have now developed a strategy to correlate function-unknown proteins with COG categories using only oligopeptide frequency distances (OPDs), which can be carried out on PC-level computers. The OPD strategy is suitable for predicting the functions of proteins with low sequence similarity and is applied here to predict the functions of a large number of gene candidates discovered by metagenome sequencing.
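The general idea of an alignment-free comparison based on oligopeptide frequencies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the oligopeptide length, the distance measure, or the assignment rule, so the dipeptide setting (`k=2`), the Euclidean distance, the nearest-reference rule, and the function names `oligopeptide_frequencies`, `opd`, and `nearest_category` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def oligopeptide_frequencies(seq, k=2):
    """Return the relative frequency of every k-residue oligopeptide.

    k=2 (dipeptides) is an assumption; the abstract only says "short".
    Windows containing non-standard residues (e.g. X) are ignored.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[key] for key in keys)
    if total == 0:
        return [0.0] * len(keys)
    return [counts[key] / total for key in keys]


def opd(seq_a, seq_b, k=2):
    """Distance between two frequency vectors.

    Euclidean distance is an assumed stand-in for the paper's OPD.
    """
    fa = oligopeptide_frequencies(seq_a, k)
    fb = oligopeptide_frequencies(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def nearest_category(query, references, k=2):
    """Assign the category whose closest reference protein is nearest.

    `references` maps a category label to a list of protein sequences.
    The nearest-reference rule is an illustrative assumption.
    """
    return min(
        references,
        key=lambda cat: min(opd(query, ref, k) for ref in references[cat]),
    )
```

With a dictionary that maps COG category letters to function-known protein sequences, `nearest_category(query, references)` returns the label of the category containing the closest reference. Because only k-mer counts are compared, no alignment is computed, which is why a calculation of this kind fits on a PC-level computer.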

Abe, T., Ikarashi, R., Mizoguchi, M., Otake, M., & Ikemura, T. (2020). A strategy for predicting gene functions from genome and metagenome sequences on the basis of oligopeptide frequency distance. Genes and Genetic Systems, 95(1), 11–19. https://doi.org/10.1266/ggs.19-00041
