Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of r strings of total length n and an error rate ε, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance k = ⌈εl⌉, where l is the length of the overlap. We propose new solutions for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). The techniques use nH_k + o(n log σ) + r log r bits of space, where H_k is the k-th order entropy and σ is the alphabet size. In practice, the methods are easy to parallelize and scale up to millions of DNA reads. © Springer-Verlag Berlin Heidelberg 2010.
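To make the problem statement concrete, here is a minimal brute-force sketch of approximate suffix/prefix overlap detection. It is not the backward-backtracking or suffix-filter method of the paper. It runs a quadratic semi-global edit-distance DP for every ordered pair of strings and keeps the longest overlap satisfying the threshold k = ⌈εl⌉. Several choices here are assumptions: the function names, the `min_len` cutoff (needed because short overlaps pass trivially, e.g. l = 1 gives k = 1 for any ε > 0), and the convention that l is measured as the length of the prefix of the second string.

```python
import math
from fractions import Fraction


def suffix_prefix_distances(a: str, b: str) -> list[int]:
    """Return d with d[l] = min edit distance between b[:l] and any suffix of a.

    Semi-global DP: the alignment may start anywhere in `a` but must end at
    the last character of `a` (so it is a suffix of `a`).
    """
    m = len(b)
    prev = list(range(m + 1))  # row 0: only the empty suffix of a exists, so cost is j
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)  # cur[0] = 0: empty prefix of b vs. empty suffix of a[:i]
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # a[i-1] left unmatched
                         cur[j - 1] + 1,      # b[j-1] left unmatched
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev  # last row: alignments that end at the end of a


def approximate_overlaps(reads, eps, min_len=1):
    """Hypothetical helper: longest overlap per ordered pair (x, y) such that a
    suffix of reads[x] matches the length-l prefix of reads[y] within
    edit distance ceil(eps * l). Returns {(x, y): (l, distance)}."""
    eps = Fraction(str(eps))  # exact arithmetic so ceil(0.2 * 5) is 1, not 2
    result = {}
    for x, a in enumerate(reads):
        for y, b in enumerate(reads):
            if x == y:
                continue
            d = suffix_prefix_distances(a, b)
            # Scan prefix lengths of b from longest to shortest.
            for l in range(len(b), min_len - 1, -1):
                if d[l] <= math.ceil(eps * l):
                    result[(x, y)] = (l, d[l])
                    break
    return result


if __name__ == "__main__":
    reads = ["AAACGTAC", "CGTTCTTT"]
    print(approximate_overlaps(reads, 0.2, min_len=3))
```

Running the example prints `{(0, 1): (6, 2)}`: the suffix `CGTAC` of the first read aligns to the prefix `CGTTCT` of the second read with one substitution and one insertion, within k = ⌈0.2 · 6⌉ = 2. This baseline costs O(|a|·|b|) time per pair and O(r²) pairs overall. The compressed-index techniques described above are designed to avoid exactly that cost.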
Välimäki, N., Ladra, S., & Mäkinen, V. (2010). Approximate all-pairs suffix/prefix overlaps. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6129 LNCS, pp. 76–87). https://doi.org/10.1007/978-3-642-13509-5_8