Longest common prefix with mismatches

Abstract

The Longest Common Prefix (LCP) array is a data structure commonly used in combination with the Suffix Array. However, in some settings we are interested in the LCP values per se, since they provide useful information on the repetitiveness of the underlying sequence. Since sequences can contain alterations, which can be either malicious (plagiarism attempts) or pseudo-random (as in sequencing experiments), when using LCP values to measure repetitiveness it makes sense to allow for a small number of errors. In this paper we formalize this notion by considering the longest common prefix in the presence of mismatches. In particular, we propose an algorithm that computes, for each text suffix, the length of its longest prefix that occurs elsewhere in the text with at most one mismatch. For a sequence of length n, our algorithm uses Θ(n log n) bits and runs in O(n L_ave log n / log log n) time, where L_ave is the average LCP of the input sequence. Although L_ave is Θ(n) in the worst case, recent analyses of real-world data show that it usually grows logarithmically with the input size. We then describe and analyse a second algorithm that uses a greedy strategy to reduce the amount of computation and that can be turned into an even faster algorithm if we allow an additive one-sided error. Finally, we consider the related problem of computing the 1-mappability of a sequence. In this problem we are asked to compute, for each length-m substring of the input sequence, the number of other substrings that are at Hamming distance one. For this problem we propose an algorithm that takes O(mn log n / log log n) time using Θ(n log n) bits of space.
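
To make the two problem statements concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm from the paper: it compares every pair of positions directly and takes at least quadratic time, so it is useful only as a reference on small inputs. The function names lcp_one_mismatch and one_mappability are hypothetical. Two readings of the abstract are assumed here: "elsewhere" means any different starting position, and "at Hamming distance one" means exactly one mismatch.

```python
from typing import List


def lcp_one_mismatch(text: str) -> List[int]:
    """For each suffix text[i:], return the length of its longest prefix
    that also occurs at some other start position j != i with at most
    one mismatching character (naive reference, assumed definition)."""
    n = len(text)
    result = [0] * n
    for i in range(n):
        best = 0
        for j in range(n):
            if j == i:
                continue
            length = 0
            mismatches = 0
            # Extend the comparison until the text ends or a second mismatch appears.
            while i + length < n and j + length < n:
                if text[i + length] != text[j + length]:
                    if mismatches == 1:
                        break
                    mismatches += 1
                length += 1
            best = max(best, length)
        result[i] = best
    return result


def one_mappability(text: str, m: int) -> List[int]:
    """For each length-m substring, count the other length-m substrings
    at Hamming distance exactly one (naive reference, assumed definition)."""
    starts = range(len(text) - m + 1)
    counts = []
    for i in starts:
        count = 0
        for j in starts:
            if j == i:
                continue
            distance = sum(a != b for a, b in zip(text[i:i + m], text[j:j + m]))
            if distance == 1:
                count += 1
        counts.append(count)
    return counts


print(lcp_one_mismatch("banana"))    # [4, 3, 4, 3, 2, 1]
print(one_mappability("banana", 3))  # [1, 0, 1, 0]
```

For "banana", the suffix starting at position 0 gets the value 4 because its prefix "bana" matches "nana" at position 2 with a single mismatch in the first character. With m = 3, the substrings "ban" and "nan" are at Hamming distance one from each other, while the two occurrences of "ana" are identical and so are not counted under the exact-distance reading.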

Cite

Manzini, G. (2015). Longest common prefix with mismatches. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9309, pp. 299–310). Springer Verlag. https://doi.org/10.1007/978-3-319-23826-5_29
