A novel hierarchical clustering algorithm for gene sequences

Dan Wei; Qingshan Jiang; Yanjie Wei; Shengrui Wang

Journal Article (Open Access)

BMC Bioinformatics (2012) 13(1)

DOI: 10.1186/1471-2105-13-174

74 Citations

135 Readers

Abstract

Background: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. The method transforms DNA sequences into feature vectors that capture the occurrence, location and order relation of k-tuples in each sequence. A hierarchical procedure is then applied to cluster the sequences based on these feature vectors.

Results: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. The method is also compared with BlastClust, CD-HIT-EST and other methods. The experimental results show that our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationships among the sequences.

Conclusions: We introduced a novel clustering algorithm based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationships among the sequences.

© 2012 Wei et al.; licensee BioMed Central Ltd.
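To make the idea of an alignment-free k-tuple feature vector concrete, here is a minimal Python sketch. It is not the DMk measure or the mBKM procedure, neither of which is specified in this abstract. It records only two of the three kinds of information mentioned (occurrence and location) and compares vectors with a plain Euclidean distance. The function names `ktuple_features` and `euclidean` and the normalisation choices are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def ktuple_features(seq, k=2):
    """Illustrative feature vector: for each k-tuple over ACGT (in
    lexicographic order), its relative frequency followed by its mean
    1-based start position divided by the number of windows.
    This is an assumed stand-in, not the paper's DMk features."""
    seq = seq.upper()
    n = len(seq) - k + 1  # number of sliding windows (may be <= 0)
    counts = Counter()
    pos_sum = Counter()
    for i in range(n):
        t = seq[i:i + k]
        counts[t] += 1
        pos_sum[t] += i + 1
    features = []
    # Only tuples made of A, C, G, T are emitted; windows containing
    # other symbols (e.g. N) are counted in n but otherwise ignored.
    for t in map("".join, product("ACGT", repeat=k)):
        c = counts[t]
        features.append(c / n if n > 0 else 0.0)
        features.append(pos_sum[t] / (c * n) if c else 0.0)
    return features


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = ktuple_features("ACGTACGTAC", k=2)
    b = ktuple_features("ACGTTTGTAC", k=2)
    print(len(a), euclidean(a, b))
```

A pairwise distance matrix built this way could be passed to any hierarchical clustering routine. The clustering step described in the abstract operates on the authors' own feature vectors and distance.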

Cite

Wei, D., Jiang, Q., Wei, Y., & Wang, S. (2012). A novel hierarchical clustering algorithm for gene sequences. BMC Bioinformatics, 13(1). https://doi.org/10.1186/1471-2105-13-174
