Intelligent data recognition of DNA sequences using statistical models

Abstract

Intelligent data acquisition from biological sequences is a hard and challenging problem, since most biological sequences contain large volumes of diverse and poorly understood data. Once achieved, however, it reduces the need for computationally expensive methods, because the acquired data are more compact and more precise. We propose a novel approach for discovering sequence signatures, that is, pieces of information distinctive enough to identify the sequences. The signatures are derived from the best combination of n-grams and statistical scoring models. In experiments applying them to identify the Influenza virus, we found that identifiers built from overly short n-gram signatures and inappropriate scoring models perform poorly, because inappropriate combinations of n-gram signatures and scoring models produce unbalanced class and pattern score distributions. Identifiers that use an appropriate combination, by contrast, achieve accuracy above 80% and up to 100%. Besides succeeding at signature recognition, the proposed approach also requires little computation time for biological sequence identification. © Springer-Verlag Berlin Heidelberg 2005.
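
To make the idea of n-gram signatures combined with a statistical scoring model more concrete, the following is a minimal Python sketch. The abstract does not name the scoring models the authors used, so the score here (the difference in document frequency between target and non-target sequences) is only an illustrative assumption. The function names, the `k` and `threshold` parameters, and the sum-of-scores decision rule are likewise hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(seq, n):
    """Return the set of distinct n-grams (length-n substrings) in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def signature_scores(target_seqs, other_seqs, n):
    """Score each n-gram by how much more often it appears in target sequences.

    score = (fraction of target sequences containing the n-gram)
          - (fraction of other sequences containing the n-gram)

    This document-frequency difference is an assumed stand-in for the
    statistical scoring models mentioned in the abstract.
    """
    target_df = Counter(g for s in target_seqs for g in ngrams(s, n))
    other_df = Counter(g for s in other_seqs for g in ngrams(s, n))
    return {
        g: target_df[g] / len(target_seqs) - other_df[g] / len(other_seqs)
        for g in set(target_df) | set(other_df)
    }


def select_signatures(scores, k):
    """Keep the k highest-scoring n-grams (ties broken alphabetically)."""
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


def identify(seq, signatures, n, threshold):
    """Label seq as the target class if its matched signature scores sum to at least threshold."""
    total = sum(signatures.get(g, 0.0) for g in ngrams(seq, n))
    return total >= threshold
```

With toy inputs, `select_signatures(signature_scores(targets, others, 3), 2)` would pick the two 3-grams that best separate the target sequences from the others, and `identify` would then test an unseen sequence against them. Varying `n` and swapping the scoring function is how one would explore the combinations the abstract refers to.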

Cite

Keinduangjun, J., Piamsa-nga, P., & Poovorawan, Y. (2005). Intelligent data recognition of DNA sequences using statistical models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3776 LNCS, pp. 630–635). Springer Verlag. https://doi.org/10.1007/11590316_100
