Lossless filter for finding long multiple approximate repetitions using a new data structure, the Bi-factor array

Pierre Peterlongo; Nadia Pisanti; Frédéric Boyer; Marie-France Sagot

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3772 LNCS, pp. 179-190 (2005)

DOI: 10.1007/11575832_20

Abstract

Similarity search in texts, notably in biological sequences, has received substantial attention in recent years. Numerous filtration and indexing techniques have been developed to speed up the resolution of the problem. However, previous filters were designed either to speed up pattern matching or to find repetitions between two sequences or repetitions occurring twice in the same sequence. In this paper, we present an algorithm called NIMBUS for filtering sequences prior to finding repetitions that occur more than twice in a sequence or in more than two sequences. NIMBUS uses gapped seeds that are indexed with a new data structure, called a bi-factor array, which is also presented in this paper. Experimental results show that the filter can be very efficient: when NIMBUS is used to preprocess a data set in which one wants to find functional elements with a multiple local alignment tool such as GLAM ([7]), the overall execution time can be reduced from 10 hours to 6 minutes while obtaining exactly the same results. © Springer-Verlag Berlin Heidelberg 2005.
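
The idea of indexing gapped seeds can be illustrated with a small sketch. It is not the NIMBUS implementation or the bi-factor array itself: it assumes that a gapped seed consists of two substrings of length k separated by d unconstrained positions, and it swaps in a plain hash map in place of the sorted array described in the paper. The names `bifactor_index`, `repeated_seeds`, `k`, `d` and `min_occurrences` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def bifactor_index(text, k, d):
    """Index every gapped seed of `text` by its two flanking substrings.

    Assumption for this sketch: a gapped seed is a substring of length k,
    followed by d positions that are ignored, followed by another
    substring of length k. Returns a dict mapping (left, right) to the
    increasing list of start positions of the seed.
    """
    index = defaultdict(list)
    span = 2 * k + d  # total length covered by one gapped seed
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + d:i + span]
        index[(left, right)].append(i)
    return dict(index)


def repeated_seeds(index, min_occurrences):
    """Keep only the seeds that occur at least `min_occurrences` times."""
    return {seed: positions
            for seed, positions in index.items()
            if len(positions) >= min_occurrences}


# Three copies of "AC?GA" that differ only in the gap character.
text = "ACTGAACGGAACAGA"
index = bifactor_index(text, k=2, d=1)
print(repeated_seeds(index, min_occurrences=3))
# prints {('AC', 'GA'): [0, 5, 10]}
```

In this toy input the three regions differ at one position, yet they share a seed because the differing character falls in the gap. A filter built on this idea would discard regions that do not share enough seeds with enough other regions before running the expensive alignment step. The actual filtering conditions used by NIMBUS are not given in the abstract and are not reproduced here.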

Cite

Peterlongo, P., Pisanti, N., Boyer, F., & Sagot, M. F. (2005). Lossless filter for finding long multiple approximate repetitions using a new data structure, the Bi-factor array. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3772 LNCS, pp. 179–190). https://doi.org/10.1007/11575832_20
