String mining in bioinformatics

Abstract

Sequence analysis is a major area of bioinformatics, encompassing the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of their linear structure. The focus of this area is generally on identifying intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting segments common to two or more sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in data mining nor in sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term data mining is almost absent from the sequence analysis literature, its basic concepts have been applied implicitly. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam e-mail [50], detecting plagiarism [22], and spotting duplications in software systems [14]. © 2010 Springer-Verlag Berlin Heidelberg.
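The two tasks named in the abstract, finding repeated segments within one sequence and finding segments shared between sequences, can be illustrated with a naive sketch. It is not taken from the chapter: it assumes fixed-length segments (k-mers) and exact matching, and the function names `kmer_positions`, `repeated_segments`, and `common_segments` are hypothetical. Efficient approaches typically rely on index structures such as suffix trees or suffix arrays instead of a hash table of k-mers.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each length-k substring of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def repeated_segments(seq, k):
    """Intra-sequence similarity: k-mers that occur at least twice in seq."""
    return {
        kmer: pos
        for kmer, pos in kmer_positions(seq, k).items()
        if len(pos) > 1
    }


def common_segments(seq_a, seq_b, k):
    """Inter-sequence similarity: k-mers that occur in both sequences."""
    pos_a = kmer_positions(seq_a, k)
    pos_b = kmer_positions(seq_b, k)
    # Membership tests on a defaultdict do not create new entries.
    return {kmer: (pos_a[kmer], pos_b[kmer]) for kmer in pos_a if kmer in pos_b}
```

For example, `repeated_segments("ACGTACGT", 4)` reports that `ACGT` starts at positions 0 and 4, and `common_segments("ACGTT", "TTACG", 3)` reports `ACG` at position 0 in the first sequence and position 2 in the second.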

Cite

Abouelhoda, M., & Ghanem, M. (2010). String mining in bioinformatics. In Scientific Data Mining and Knowledge Discovery: Principles and Foundations (pp. 207–247). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-02788-8_9
