String mining in bioinformatics

Mohamed Abouelhoda; Moustafa Ghanem

Book Chapter

Springer Berlin Heidelberg, (2010), 207-247

DOI: 10.1007/978-3-642-02788-8_9

Abstract

Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of their linear structure. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or more sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view was not explicitly embraced in either the data mining or the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term data mining is almost missing from the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14]. © 2010 Springer-Verlag Berlin Heidelberg.
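
The two tasks named in the abstract, detecting repeated segments within one sequence and spotting common segments across several, can be illustrated with a minimal sketch. This is not the chapter's method: it assumes exact matching of fixed-length substrings (k-mers) and uses a plain dictionary index. The function names `kmer_positions`, `repeated_segments`, and `common_segments`, as well as the parameter `k`, are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each length-k substring of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return dict(positions)


def repeated_segments(seq, k):
    """Intra-molecular similarity: k-mers that occur at least twice in seq."""
    return {
        kmer: pos
        for kmer, pos in kmer_positions(seq, k).items()
        if len(pos) > 1
    }


def common_segments(seqs, k):
    """Inter-molecular similarity: k-mers that occur in every sequence."""
    kmer_sets = [set(kmer_positions(s, k)) for s in seqs]
    return set.intersection(*kmer_sets) if kmer_sets else set()


if __name__ == "__main__":
    print(repeated_segments("ACGTACGA", 3))
    print(common_segments(["ACGTAC", "TTACGG"], 3))
```

A fixed `k` and exact matching keep the sketch short; real sequence analysis typically has to handle variable-length and approximate matches, which this example does not attempt.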

Cite

Abouelhoda, M., & Ghanem, M. (2010). String mining in bioinformatics. In Scientific Data Mining and Knowledge Discovery: Principles and Foundations (pp. 207–247). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-02788-8_9
