Clustering of metagenomic data by combining different distance functions

Isis Bonet; Adriana Escobar; Andrea Mesa-Múnera; Juan Fernando Alzate

Journal article, open access

Acta Polytechnica Hungarica (2017) 14(3) 223-236

DOI: 10.12700/APH.14.3.2017.3.13

Abstract

Metagenomics allows researchers to sequence the genomes of many microorganisms directly from a natural environment, without the need to isolate them. The result of this type of sequencing is a huge set of DNA fragments from different organisms, which poses a new computational challenge: identifying the groups of DNA sequences that belong to the same organism. Although large databases of known species genomes and some similarity-based supervised algorithms exist, the databases represent only a very small fraction of existing microorganisms, and identifying a set of short fragments is very time consuming. For these reasons, the reconstruction and identification of a set of metagenomic fragments includes binning as a preprocessing step, in order to group fragments at the same taxonomic level. In this paper, we propose a clustering algorithm based on iterative k-means and a consensus of clusters obtained using different distance functions. The results of the proposed method are reported for different sequence lengths and different combinations of distances. The proposed method outperforms both simple and iterative k-means.
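
The general idea of combining clusterings produced under different distance functions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sequence features, the distance functions, the iterative k-means variant, or the consensus rule. The sketch assumes each fragment is already represented as a numeric vector, uses plain k-means with a deterministic farthest-first initialisation, picks Euclidean, Manhattan, and cosine distances as examples, and merges two fragments when more than half of the runs place them together. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # Cosine distance; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(points, k, dist, max_iter=100):
    """Plain k-means with a pluggable distance (assumed stand-in, not the paper's variant)."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                # The arithmetic mean is only a heuristic centre for
                # non-Euclidean distances.
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def consensus(labelings, threshold=0.5):
    """Merge items that share a cluster in more than `threshold` of the runs."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        agree = sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
        if agree > threshold:
            parent[find(i)] = find(j)

    # Relabel the connected components as 0, 1, 2, ... in order of appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy vectors standing in for fragment features (hypothetical data).
points = [(1.0, 0.1), (0.9, 0.2), (1.1, 0.15),
          (0.1, 1.0), (0.2, 0.9), (0.15, 1.1)]

labelings = [kmeans(points, 2, d) for d in (euclidean, manhattan, cosine)]
final = consensus(labelings)
print(final)
```

Because the consensus step only compares whether two items share a label within each run, it does not matter that cluster labels are numbered differently from one distance function to the next.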

Citation (APA)

Bonet, I., Escobar, A., Mesa-Múnera, A., & Alzate, J. F. (2017). Clustering of metagenomic data by combining different distance functions. Acta Polytechnica Hungarica, 14(3), 223–236. https://doi.org/10.12700/APH.14.3.2017.3.13
