Sequence analysis is a major area of bioinformatics that encompasses the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of their linear structure. The focus of this area is generally on identifying intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments between two or among multiple sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced in neither the data mining nor the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term "data mining" is almost absent from the sequence analysis literature, its basic concepts have been applied implicitly. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam email [50], detecting plagiarism [22], and spotting duplications in software systems [14]. © 2010 Springer-Verlag Berlin Heidelberg.
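The two notions of similarity described in the abstract can be illustrated with a minimal sketch. It assumes a naive fixed-length (k-mer) index and is not the chapter's own method; the function names `repeated_segments` and `common_segments` and the parameter `k` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (start, substring) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def repeated_segments(seq, k):
    """Intra-molecular view: length-k segments occurring more than once in seq.

    Returns a dict mapping each repeated segment to its start positions.
    """
    positions = defaultdict(list)
    for i, segment in kmers(seq, k):
        positions[segment].append(i)
    return {seg: pos for seg, pos in positions.items() if len(pos) > 1}


def common_segments(seq_a, seq_b, k):
    """Inter-molecular view: length-k segments present in both sequences."""
    in_a = {segment for _, segment in kmers(seq_a, k)}
    in_b = {segment for _, segment in kmers(seq_b, k)}
    return in_a & in_b


if __name__ == "__main__":
    print(repeated_segments("ACGTACGA", 3))
    print(sorted(common_segments("ACGTAC", "TTACGG", 3)))
```

This sketch only finds exact matches of a single fixed length and stores every window explicitly, so it is meant to make the definitions concrete rather than to scale to genome-sized inputs.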
Abouelhoda, M., & Ghanem, M. (2010). String mining in bioinformatics. In Scientific Data Mining and Knowledge Discovery: Principles and Foundations (pp. 207–247). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-02788-8_9