LSH Forest: Practical algorithms made theoretical

Abstract

We analyze LSH Forest [BCG05], a popular heuristic for nearest neighbor search, and show that a careful yet simple modification of it outperforms vanilla LSH algorithms. The end result is the first instance of a simple, practical algorithm that provably leverages data-dependent hashing to improve upon data-oblivious LSH. Here is the entire algorithm for the d-dimensional Hamming space. Given a dataset, LSH Forest applies a random permutation to all d coordinates and builds a trie on the resulting strings. In our modification, we further augment this trie: for each node, we store a constant number of points close to the mean of the corresponding subset of the dataset, and these points are compared to any query point that reaches that node. The overall data structure is simply several such tries sampled independently. While the new algorithm does not quantitatively improve upon the best data-dependent hashing algorithms from [AR15] (which are known to be optimal), it is significantly simpler, being based on a practical heuristic, and it is provably better than the best LSH algorithm for the Hamming space [IM98, HIM12].
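The abstract describes the whole data structure in a few sentences, so a short sketch can make it concrete. The Python below is a minimal illustration of that description, not the authors' implementation. It assumes points are binary tuples, interprets "close to the mean" as the smallest squared Euclidean distance to the coordinate-wise mean, and stops splitting a node once it holds at most `leaf_size` points. The names `AugmentedTrie`, `LSHForestSketch`, `k_centers`, `leaf_size`, and `num_trees` are hypothetical, and the default parameter values are illustrative rather than the ones analyzed in the paper.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, ...]  # a binary vector in {0,1}^d


def hamming(a: Point, b: Point) -> int:
    """Number of coordinates where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_to_mean(points: List[Point], k: int) -> List[Point]:
    """Up to k points with smallest squared Euclidean distance to the subset mean.

    Assumption: this is one plausible reading of "close to the mean".
    """
    d, n = len(points[0]), len(points)
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    return sorted(points, key=lambda p: sum((p[i] - mean[i]) ** 2 for i in range(d)))[:k]


@dataclass
class Node:
    centers: List[Point]  # points stored near the mean of this node's subset
    children: Dict[int, "Node"] = field(default_factory=dict)
    bucket: List[Point] = field(default_factory=list)  # all points, leaves only


class AugmentedTrie:
    """One trie over randomly permuted bit strings, augmented with per-node centers."""

    def __init__(self, data: List[Point], k_centers: int, leaf_size: int, rng: random.Random):
        self.perm = list(range(len(data[0])))
        rng.shuffle(self.perm)  # random permutation of the d coordinates
        self.k = k_centers
        self.leaf_size = leaf_size  # hypothetical stopping rule
        self.root = self._build(list(data), depth=0)

    def _build(self, points: List[Point], depth: int) -> Node:
        node = Node(centers=closest_to_mean(points, self.k))
        if len(points) <= self.leaf_size or depth == len(self.perm):
            node.bucket = points
            return node
        coord = self.perm[depth]  # next coordinate in permuted order
        for bit in (0, 1):
            subset = [p for p in points if p[coord] == bit]
            if subset:
                node.children[bit] = self._build(subset, depth + 1)
        return node

    def query(self, q: Point) -> Tuple[Point, int]:
        """Walk down q's permuted path, comparing q with the points stored at each node."""
        best, best_dist = None, float("inf")
        node, depth = self.root, 0
        while node is not None:
            for p in node.centers + node.bucket:
                dist = hamming(p, q)
                if dist < best_dist:
                    best, best_dist = p, dist
            if depth == len(self.perm):
                break
            node = node.children.get(q[self.perm[depth]])  # None ends the walk
            depth += 1
        return best, best_dist


class LSHForestSketch:
    """Several independently sampled augmented tries; returns the best candidate found."""

    def __init__(self, data: List[Point], num_trees: int = 5, k_centers: int = 2,
                 leaf_size: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.trees = [AugmentedTrie(data, k_centers, leaf_size, rng) for _ in range(num_trees)]

    def query(self, q: Point) -> Tuple[Point, int]:
        return min((t.query(q) for t in self.trees), key=lambda r: r[1])


if __name__ == "__main__":
    gen = random.Random(1)
    pts = [tuple(gen.randint(0, 1) for _ in range(16)) for _ in range(200)]
    forest = LSHForestSketch(pts, num_trees=5, seed=42)
    print(forest.query(pts[17]))
```

As a sanity check, querying with a point that is itself in the dataset always returns distance 0: every prefix of its permuted string exists in each trie, so the walk ends at a leaf whose bucket contains that point. For other queries, the result is only a candidate, and its quality depends on the stored centers and the number of trees.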

Andoni, A., Razenshteyn, I., & Nosatzki, N. S. (2017). LSH Forest: Practical algorithms made theoretical. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 67–78). Association for Computing Machinery. https://doi.org/10.1137/1.9781611974782.5