Hash -join: Approximate string similarity join with hashing

Peisen Yuan; Chaofeng Sha; Yi Sun

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8505 LNCS 217-229

DOI: 10.1007/978-3-662-43984-5_16


Abstract

The string similarity join, which finds similar string pairs across string sets, has received extensive attention in the database and information retrieval fields. Existing work usually adopts the filter-and-refine framework for this problem, and various filtering methods have been proposed. Recently, tree-based index techniques with an edit distance constraint have been employed effectively for evaluating the string similarity join. However, they do not scale well with large distance thresholds. In this paper, we propose an approach for approximate string similarity join based on Min-Hashing locality sensitive hashing and trie-based index techniques. Our approach offers a flexible trade-off between efficiency and performance. An empirical study on real datasets demonstrates that our framework is more efficient and scales better. © 2014 Springer-Verlag Berlin Heidelberg.
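To make the filter-and-refine idea concrete, the following is a minimal sketch of how Min-Hashing with locality sensitive hashing could generate candidate pairs that are then verified with edit distance. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the details, the trie-based index is omitted, and every name and parameter here (`qgrams`, `minhash_signature`, `approximate_join`, `bands`, `rows`, the q-gram padding, the use of BLAKE2b as the hash family) is hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(s, q=2):
    """Return the set of character q-grams of s, padded at both ends."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value of the token set per hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def approximate_join(strings, tau, q=2, bands=16, rows=2):
    """Self-join: return pairs within edit distance tau (approximate, may miss pairs)."""
    # Filter step: strings that agree on all rows of any one band share a bucket.
    buckets = defaultdict(set)
    for idx, s in enumerate(strings):
        sig = minhash_signature(qgrams(s, q), bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)

    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))

    # Refine step: verify each candidate pair with the exact edit distance.
    return sorted(
        (strings[i], strings[j])
        for i, j in candidates
        if edit_distance(strings[i], strings[j]) <= tau
    )
```

Because every candidate is verified, this sketch returns no false positives, but similar pairs that never share a bucket are missed. Raising `bands` or lowering `rows` recovers more pairs at the cost of more candidates to verify, which is one way to picture the efficiency and performance trade-off the abstract mentions.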

Cite

Yuan, P., Sha, C., & Sun, Y. (2014). Hash -join: Approximate string similarity join with hashing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8505 LNCS, pp. 217–229). Springer Verlag. https://doi.org/10.1007/978-3-662-43984-5_16
