Similarity evaluation of DNA sequences based on frequent patterns and entropy

Xiaojing Xie; Jihong Guan; Shuigeng Zhou

Journal Article, Open Access

BMC Genomics (2015) 16(3)

DOI: 10.1186/1471-2164-16-S3-S5

Abstract

Background: DNA sequence analysis is an important research topic in bioinformatics. Evaluating the similarity between sequences, which is crucial for sequence analysis, has attracted much research effort over the last two decades, and a dozen algorithms and tools have been developed. These methods are based on alignment, word frequency, or geometric representation, each of which has its own advantages and disadvantages.

Results: In this paper, to compute the similarity between DNA sequences effectively, we introduce a novel method that uses frequent patterns and entropy to construct representative vectors of DNA sequences. We evaluate the proposed method experimentally, comparing it with two recently developed alignment-free methods and the BLASTN tool. When tested on the β-globin genes of 11 species, with the results from MEGA as the baseline, our method achieves higher correlation coefficients than the two alignment-free methods and the BLASTN tool.

Conclusions: Our method not only captures fine-granularity information (location and ordering) of DNA sequences via sequence blocking, but is also insensitive to noise and sequence rearrangement because it considers only the maximal frequent patterns. It outperforms major existing methods and tools.
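The abstract does not spell out the algorithm, so the following is only a loose illustration of the general idea of blocking a sequence and summarizing each block with an entropy value. It is not the authors' method: it substitutes fixed-length k-mers for maximal frequent patterns, and the function names and the parameters `num_blocks`, `k`, and `min_count` are hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def block_entropy_vector(seq, num_blocks=4, k=3, min_count=2):
    """Return one Shannon entropy value per block of the sequence.

    Assumption for illustration: a "frequent pattern" is any k-mer that
    occurs at least min_count times inside a block. The paper uses
    maximal frequent patterns, which this sketch does not compute.
    """
    seq = seq.upper()
    block_len = math.ceil(len(seq) / num_blocks)
    vector = []
    for b in range(num_blocks):
        block = seq[b * block_len:(b + 1) * block_len]
        # Count every overlapping k-mer in this block.
        counts = Counter(block[i:i + k] for i in range(len(block) - k + 1))
        frequent = [c for c in counts.values() if c >= min_count]
        total = sum(frequent)
        if total == 0:
            # No frequent k-mer in this block: use zero entropy.
            vector.append(0.0)
            continue
        # Shannon entropy (in bits) of the frequent k-mer distribution.
        entropy = sum((c / total) * math.log2(total / c) for c in frequent)
        vector.append(entropy)
    return vector


def euclidean_distance(u, v):
    """Distance between two equal-length representative vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Under these assumptions, two sequences would be compared by building one vector per sequence with the same parameters and taking `euclidean_distance` between them; a smaller distance suggests greater similarity. Because each vector entry belongs to one block, the representation keeps coarse positional information, which is the role the abstract attributes to sequence blocking.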

Citation (APA)

Xie, X., Guan, J., & Zhou, S. (2015). Similarity evaluation of DNA sequences based on frequent patterns and entropy. BMC Genomics, 16(3). https://doi.org/10.1186/1471-2164-16-S3-S5
