Approximate Pattern Matching Using Search Schemes and In-Text Verification

Luca Renders; Lore Depuydt; Jan Fostier

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13347 LNBI 419-435

DOI: 10.1007/978-3-031-07802-6_36

Abstract

Search schemes enable the efficient identification of all approximate occurrences of a search pattern in a text. Using a bidirectional FM-index, search schemes describe how to explore the search space in such a way that runtime is minimized. Even though in-index matching has an optimal time complexity, relatively expensive random memory access is required for elementary operations on the FM-index. We analyze to what extent in-index matching can be complemented with in-text verification where a candidate occurrence is directly validated in the text using a bit-parallel, pairwise alignment procedure. We find that hybrid in-index/in-text matching can reduce the running time by more than a factor of two, compared to pure in-index matching. We present Columba 1.1, an open-source (AGPL-3.0 license) software tool written in C++ that efficiently implements these ideas. Using a single CPU core, Columba 1.1 can identify, within a maximum edit distance of four, all occurrences of 100 000 Illumina reads (150 bp) in the human reference genome in roughly half a minute. This significantly outperforms existing, state-of-the-art tools.
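To make the idea of in-text verification concrete, the following is a minimal sketch of a bit-parallel check in C++. It uses Myers' bit-vector recurrence for edit distance and is not taken from Columba; the names `minPrefixEditDistance` and `verifyCandidate` are hypothetical, and the sketch assumes patterns of at most 64 characters so that one column of the alignment matrix fits in a single machine word. A real read aligner would need several words per column or a banded variant for 150 bp reads.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Smallest edit distance between `pattern` and any prefix of `window`,
// computed with Myers' bit-vector recurrence (one 64-bit word per column).
// Hypothetical helper for illustration; assumes pattern.size() <= 64.
int minPrefixEditDistance(const std::string& pattern, const std::string& window) {
    const std::size_t m = pattern.size();
    if (m == 0) return 0;
    if (m > 64) throw std::invalid_argument("pattern longer than 64 characters");

    // peq[c] has bit i set if and only if pattern[i] == c.
    std::array<std::uint64_t, 256> peq{};
    for (std::size_t i = 0; i < m; ++i)
        peq[static_cast<unsigned char>(pattern[i])] |= std::uint64_t{1} << i;

    const std::uint64_t top = std::uint64_t{1} << (m - 1);
    std::uint64_t pv = ~std::uint64_t{0};  // vertical deltas equal to +1
    std::uint64_t mv = 0;                  // vertical deltas equal to -1
    int score = static_cast<int>(m);       // distance to the empty prefix
    int best = score;

    for (unsigned char c : window) {
        const std::uint64_t eq = peq[c];
        const std::uint64_t xv = eq | mv;
        const std::uint64_t xh = (((eq & pv) + pv) ^ pv) | eq;
        std::uint64_t ph = mv | ~(xh | pv);  // horizontal deltas equal to +1
        std::uint64_t mh = pv & xh;          // horizontal deltas equal to -1

        // The bottom row of the matrix changes by the horizontal delta at bit m-1.
        if (ph & top) ++score;
        else if (mh & top) --score;
        if (score < best) best = score;

        // Shifting in a 1 anchors the alignment at the start of the window:
        // the top row of the matrix grows by one with every text character.
        ph = (ph << 1) | 1;
        mh <<= 1;
        pv = mh | ~(xv | ph);
        mv = ph & xv;
    }
    return best;
}

// Returns true if `pattern` occurs in `text` starting exactly at `begin`
// with at most `k` edits. Only a window of pattern.size() + k characters
// is inspected, since a longer match would need more than k insertions.
bool verifyCandidate(const std::string& text, const std::string& pattern,
                     std::size_t begin, std::size_t k) {
    if (begin > text.size()) return false;
    const std::string window = text.substr(begin, pattern.size() + k);
    return minPrefixEditDistance(pattern, window) <= static_cast<int>(k);
}
```

In a hybrid scheme of the kind the abstract describes, `begin` would come from a candidate position reported by the index, and the check above would then read the text sequentially rather than performing further random accesses in the index. When to switch from in-index matching to in-text verification is a tuning decision that this sketch does not address.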

Citation (APA)

Renders, L., Depuydt, L., & Fostier, J. (2022). Approximate Pattern Matching Using Search Schemes and In-Text Verification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13347 LNBI, pp. 419–435). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-07802-6_36
