Alignment-free approaches for sequence similarity based on substring composition are attracting increasing interest from the scientific community. In several contexts, alignment-free techniques are faster than alignment-based approaches, but less accurate. Recently, several studies (e.g. [4,8,9]) have attempted to bridge this accuracy gap by introducing approximate matches into the definition of composition-based distance measures. In this work we present MissMax, an exact algorithm that computes the longest common substring with mismatches between each suffix of a sequence x and a sequence y. This collection of statistics is useful for computing two similarity measures that have recently been extended to incorporate approximate matching, namely the longest and the average common substring with k mismatches. Our approach is based on a filtering technique that, in a set of preliminary experiments, substantially reduced the size of the set of potential sites of a longest match.
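To make the statistic concrete, the following is a minimal brute-force sketch of what is being computed: for every suffix of x, the length of the longest prefix of that suffix that matches some substring of y with at most k mismatches. It is not the MissMax algorithm and uses no filtering; it is only a quadratic-time reference under assumed definitions. The function names `longest_match_with_mismatches` and `mean_match_length` are hypothetical, and the second function returns the plain mean of the per-suffix lengths, not the normalized average common substring distance used in the literature.

```python
from typing import List


def longest_match_with_mismatches(x: str, y: str, k: int) -> List[int]:
    """Brute-force reference (NOT MissMax): for each suffix x[i:], return the
    largest L such that x[i:i+L] equals some y[j:j+L] up to k mismatches."""
    result = []
    for i in range(len(x)):
        best = 0
        for j in range(len(y)):
            mismatches = 0
            length = 0
            # Extend the match until a sequence ends or mismatch k+1 is hit.
            while i + length < len(x) and j + length < len(y):
                if x[i + length] != y[j + length]:
                    mismatches += 1
                    if mismatches > k:
                        break
                length += 1
            best = max(best, length)
        result.append(best)
    return result


def mean_match_length(x: str, y: str, k: int) -> float:
    """Plain mean of the per-suffix lengths (an unnormalized illustration,
    assumed here; not the published distance formula)."""
    lengths = longest_match_with_mismatches(x, y, k)
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    print(longest_match_with_mismatches("ACGTA", "ACCTA", 1))
    print(mean_match_length("ACGT", "ACGT", 0))
```

Every pair of start positions (i, j) is examined here, which is the set of candidate sites that a filtering step would aim to shrink before extending matches.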
CITATION STYLE
Pizzi, C. (2015). A filtering approach for Alignment-Free Biosequences comparison with Mismatches. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9289, pp. 231–242). Springer Verlag. https://doi.org/10.1007/978-3-662-48221-6_17