Min-Hash is a well-known hashing technique for set similarity search. It assumes the Jaccard similarity as the similarity measure between two sets A and B. Consequently, Min-Hash is not optimal for applications that want to measure set similarity by the intersection cardinality |A ∩ B|, since the Jaccard similarity decreases as the gap between |A| and |B| becomes larger, even when the intersection cardinality stays the same. This paper shows that a slight modification of Min-Hash effectively resolves this inherent difficulty. The validity of our method is shown both by theoretical analysis and by experiments.
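The difficulty described above can be illustrated with a minimal sketch of standard Min-Hash. This is not the extended method proposed in the paper, whose details are not given in the abstract. The function names (`jaccard`, `minhash_signature`, `estimate_jaccard`), the salted BLAKE2b hash family, the signature length of 512, and the example sets are all assumptions made for illustration only.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def _hash(x, i):
    # Hypothetical hash family: BLAKE2b salted with the function index i.
    digest = hashlib.blake2b(
        str(x).encode(), digest_size=8, salt=i.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(s, k):
    """Standard Min-Hash: keep the minimum hash value under each of k hash functions."""
    return [min(_hash(x, i) for x in s) for i in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


K = 512  # signature length (assumed for this sketch)

A = set(range(0, 100))
B_small = set(range(50, 150))   # |B| = 100,  |A ∩ B| = 50
B_large = set(range(50, 1050))  # |B| = 1000, |A ∩ B| = 50

sig_A = minhash_signature(A, K)
est_small = estimate_jaccard(sig_A, minhash_signature(B_small, K))
est_large = estimate_jaccard(sig_A, minhash_signature(B_large, K))

# Same intersection cardinality, very different Jaccard similarity:
# exact values are 50/150 (about 0.333) and 50/1050 (about 0.048).
print(len(A & B_small), jaccard(A, B_small), est_small)
print(len(A & B_large), jaccard(A, B_large), est_large)
```

Both pairs share exactly 50 elements, yet the Jaccard similarity, and therefore the Min-Hash collision rate, is far lower for the pair with the larger size gap. A search ranked by standard Min-Hash would thus place `B_large` well below `B_small`, even though an application focused on intersection cardinality would treat them as equally similar to `A`.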
CITATION STYLE
Koga, H., Suzuki, S., Itabashi, T., Pineda, G. F., & Toda, T. (2018). Extended Min-Hash Focusing on Intersection Cardinality. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11314 LNCS, pp. 17–26). Springer Verlag. https://doi.org/10.1007/978-3-030-03493-1_3