MapReduce based personalized locality sensitive hashing for similarity joins on large scale data

Jingjing Wang; Chen Lin

Journal Article (Open Access)

Computational Intelligence and Neuroscience, vol. 2015 (2015)

DOI: 10.1155/2015/217216

Abstract

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of false positives and false negatives it generates. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, a balance between false positives and false negatives is preferred. To address these problems, this paper proposes Personalized Locality Sensitive Hashing (PLSH), which embeds a new banding scheme to tailor the number of false positives, the number of false negatives, and the sum of both. PLSH is implemented in parallel on the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of the proposed PLSH technique compared with state-of-the-art methods.
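The abstract does not spell out PLSH's own banding scheme, so the sketch below illustrates only the standard MinHash banding that LSH similarity joins build on, not the authors' method. A signature of n = r * b min-hash values is split into b bands of r rows. Under this scheme, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so the choice of (r, b) trades false positives against false negatives. The map and reduce functions imitate the shape of a MapReduce job, with an in-memory dictionary standing in for the shuffle. Every function name, the weights `w_fp`/`w_fn`, the fixed signature budget n, the universal hash family, and the assumption that pair similarities are uniform on [0, 1] are illustrative choices of ours.

```python
import random
from collections import defaultdict
from itertools import combinations

P = (1 << 61) - 1  # Mersenne prime used as the hash modulus (assumption)


def candidate_probability(s, r, b):
    """Chance that a pair with Jaccard similarity s shares at least one band."""
    return 1.0 - (1.0 - s ** r) ** b


def error_mass(threshold, r, b, steps=1000):
    """False-positive / false-negative mass, ASSUMING similarities are
    uniform on [0, 1]; midpoint rule over `steps` slices."""
    width = 1.0 / steps
    fp = fn = 0.0
    for k in range(steps):
        s = (k + 0.5) * width
        p = candidate_probability(s, r, b)
        if s < threshold:
            fp += p * width          # dissimilar pair reported as candidate
        else:
            fn += (1.0 - p) * width  # similar pair missed by every band
    return fp, fn


def choose_bands(n, threshold, w_fp=0.5, w_fn=0.5):
    """Hypothetical tuner: pick (r, b), r * b == n, minimizing w_fp*FP + w_fn*FN."""
    best = None
    for r in range(1, n + 1):
        if n % r:
            continue
        b = n // r
        fp, fn = error_mass(threshold, r, b)
        cost = w_fp * fp + w_fn * fn
        if best is None or cost < best[0]:
            best = (cost, r, b)
    return best[1], best[2]


def make_hashes(n, seed=0):
    """n random (a, c) pairs for h(x) = (a * x + c) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(n)]


def minhash_signature(items, hashes):
    """MinHash signature of a set of non-negative integers below P."""
    return [min((a * x + c) % P for x in items) for a, c in hashes]


def map_phase(doc_id, signature, r, b):
    """Mapper: emit one ((band index, band values), doc id) record per band."""
    for i in range(b):
        yield (i, tuple(signature[i * r:(i + 1) * r])), doc_id


def reduce_phase(grouped):
    """Reducer: every pair of ids sharing a bucket becomes a candidate pair."""
    pairs = set()
    for ids in grouped.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def lsh_join(docs, r, b, seed=0):
    hashes = make_hashes(r * b, seed)
    grouped = defaultdict(list)  # in-memory stand-in for the shuffle step
    for doc_id, items in docs.items():
        signature = minhash_signature(items, hashes)
        for key, value in map_phase(doc_id, signature, r, b):
            grouped[key].append(value)
    return reduce_phase(grouped)


docs = {
    "a": set(range(0, 100)),
    "b": set(range(5, 105)),       # Jaccard similarity with "a" is 95/105
    "c": set(range(1000, 1100)),   # disjoint from every other set
    "d": set(range(0, 100)),       # identical to "a"
}
r, b = choose_bands(32, threshold=0.8)
pairs = lsh_join(docs, r, b)
```

Raising `w_fp` pushes the tuner toward fewer, longer bands (large r), which suppresses false positives. Raising `w_fn` favors many short bands (large b), which suppresses false negatives. This weighted-sum objective is one generic way to express such a preference, and whether it matches the paper's scheme is not established here. Candidates would normally be verified against their exact similarity before being output.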

Citation (APA):

Wang, J., & Lin, C. (2015). MapReduce based personalized locality sensitive hashing for similarity joins on large scale data. Computational Intelligence and Neuroscience, 2015. https://doi.org/10.1155/2015/217216