Super-linear indices for approximate dictionary searching

Leonid Boytsov

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7404 LNCS, pp. 162-176 (2012)

DOI: 10.1007/978-3-642-32153-5_12

Abstract

We present an experimental analysis of approximate search algorithms that index deletion neighborhoods. These methods require huge indices whose sizes grow exponentially with the maximum allowed number of errors k. Despite their extraordinary space requirements, super-linear indices are of great interest because they provide some of the shortest retrieval times. A straightforward implementation that builds a hash index directly over residual strings (obtained by deleting characters from dictionary words) is not space efficient. Rather than storing complete residual strings, we record only the deleted characters and their positions. These data are indexed using a perfect hash function computed for the set of residual dictionary strings [2]. We evaluate this approach experimentally against several well-known benchmarks (including FastSS, which stores residual strings directly [3]). Experiments show that our implementation performs comparably to or better than the fastest benchmarks, while requiring 4 to 8 times less space than FastSS. © 2012 Springer-Verlag.
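The deletion-neighborhood search described above can be sketched in Python as follows. This is a minimal, hypothetical illustration and not the author's implementation: the names `deletions`, `levenshtein` and `DeletionIndex` are invented, and Python's built-in `hash()` over an ordinary `dict` stands in for the perfect hash function of [2]. Each dictionary word is indexed under every residual obtained by deleting at most k characters, and each bucket stores a word id plus the deleted positions rather than the residual string (a loose analogue of recording deleted characters and positions). Any word within edit distance k of the query shares at least one residual with it, but sharing a residual is not sufficient ("ab" and "ba" share "a" at k = 1, yet their distance is 2), so candidates are verified with an edit-distance check.

```python
from itertools import combinations


def deletions(word: str, k: int):
    """Yield (residual, deleted_positions) for every way to delete at most k characters."""
    n = len(word)
    for d in range(min(k, n) + 1):
        for pos in combinations(range(n), d):
            yield delete_at(word, pos), pos


def delete_at(word: str, positions) -> str:
    """Return `word` with the characters at `positions` removed."""
    skip = set(positions)
    return "".join(c for i, c in enumerate(word) if i not in skip)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca from a
                           cur[j - 1] + 1,            # insert cb into a
                           prev[j - 1] + (ca != cb))) # substitute or match
        prev = cur
    return prev[-1]


class DeletionIndex:
    """Hypothetical index: hash(residual) -> list of (word_id, deleted_positions)."""

    def __init__(self, words, k: int):
        self.k = k
        self.words = list(words)
        self.buckets: dict[int, list[tuple[int, tuple[int, ...]]]] = {}
        for wid, w in enumerate(self.words):
            seen = set()  # different position sets may yield the same residual
            for residual, pos in deletions(w, k):
                if residual in seen:
                    continue
                seen.add(residual)
                # Key by hash value only; the residual string itself is not stored.
                self.buckets.setdefault(hash(residual), []).append((wid, pos))

    def search(self, query: str) -> list[str]:
        found = set()
        for residual, _ in deletions(query, self.k):
            for wid, pos in self.buckets.get(hash(residual), ()):
                # Re-apply the stored deletions to discard hash collisions.
                if delete_at(self.words[wid], pos) == residual:
                    found.add(wid)
        # A shared residual is necessary but not sufficient: verify each candidate.
        return sorted(self.words[i] for i in found
                      if levenshtein(query, self.words[i]) <= self.k)
```

For example, `DeletionIndex(["test", "text", "tent", "best", "toast"], 1).search("tst")` returns `["test"]`. A word of length n has up to the sum over d ≤ k of C(n, d) residuals, which is why index size grows so quickly with k and why avoiding storage of the residual strings themselves pays off.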

Boytsov, L. (2012). Super-linear indices for approximate dictionary searching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7404 LNCS, pp. 162–176). Springer Verlag. https://doi.org/10.1007/978-3-642-32153-5_12