Approximate substring matching is a common problem in many applications. In this paper, we study approximate substring matching with edit distance constraints. Existing methods are very sensitive to query strings or query parameters like query length and edit distance. To address the problem, we propose a new approach using partition scheme. It first partitions a query into several segments, and finds matching substrings of these segments as candidates, then performs a bidirectional verification on these candidates to get final results. We devise an even partition scheme to efficiently find candidates, and a best partition scheme to find high quality candidates. Furthermore, through theoretical analysis, we find that the best partition scheme cannot always outperform the even partition scheme. Thus we propose an adaptive approach for selectively choosing scheme using statistic knowledge. We conduct comprehensive experiments to demonstrate the efficiency and quality of our proposed method.
CITATION STYLE
Wang, J., Yang, X., Wang, B., & Liu, C. (2016). An adaptive approach of approximate substring matching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9642, pp. 501–516). Springer Verlag. https://doi.org/10.1007/978-3-319-32025-0_31
Mendeley helps you to discover research relevant for your work.