Clustering of metagenomic data by combining different distance functions

Abstract

Metagenomics allows researchers to sequence the genomes of many microorganisms directly from a natural environment, without first isolating them. This type of sequencing yields a huge set of DNA fragments from different organisms, which poses a new computational challenge: identifying the groups of DNA sequences that belong to the same organism. Although large databases of known species genomes and similarity-based supervised algorithms exist, the databases represent only a very small fraction of existing microorganisms, and identifying a set of short fragments is very time consuming. For these reasons, the reconstruction and identification of metagenomic fragments include binning as a preprocessing step, which groups fragments that belong to the same taxonomic level. In this paper, we propose a clustering algorithm based on iterative k-means and a consensus of clusters obtained with different distance functions. Results are reported separately for different sequence lengths and different combinations of distances. The proposed method outperforms both simple and iterative k-means.
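
The general idea of combining clusterings from several distance functions can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify any of the details below, so all of them are assumptions made for illustration: fragments are represented by normalized k-mer frequency profiles, plain k-means (not the paper's iterative variant) is run once per distance (Euclidean, Manhattan, cosine), and the consensus links two fragments when a majority of runs place them in the same cluster. All function names are hypothetical.

```python
import itertools
import math
import random


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet (assumed feature)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def kmeans(points, n_clusters, dist, iters=50, seed=0):
    """Lloyd-style k-means with a pluggable distance.

    Simplification: centroids are always updated as the arithmetic mean,
    which is only the exact minimizer for the Euclidean distance.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(n_clusters), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(n_clusters):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus(labelings, threshold=0.5):
    """Link items that co-cluster in more than `threshold` of the runs,
    then return the connected components as consensus cluster labels."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(n), 2):
        votes = sum(lab[i] == lab[j] for lab in labelings)
        if votes / len(labelings) > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

In use, one would compute `kmer_profile` for each fragment, call `kmeans` once per distance function, and pass the resulting label lists to `consensus`. Because the consensus works on pairwise co-membership, it does not depend on how each run happens to number its clusters, and the number of consensus clusters can differ from `n_clusters`.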

Cite

Bonet, I., Escobar, A., Mesa-Múnera, A., & Alzate, J. F. (2017). Clustering of metagenomic data by combining different distance functions. Acta Polytechnica Hungarica, 14(3), 223–236. https://doi.org/10.12700/APH.14.3.2017.3.13
