Unsupervised few-bits semantic hashing with implicit topics modeling

Abstract

Semantic hashing is a powerful paradigm for representing texts as compact binary hash codes. The explosion of short text data has spurred the demand for few-bits hashing. However, the performance of existing semantic hashing methods cannot be guaranteed when applied to few-bits hashing because of severe information loss. In this paper, we present a simple but effective unsupervised neural generative semantic hashing method with a focus on few-bits hashing. Our model is built upon a variational autoencoder and represents each hash bit as a Bernoulli variable, which allows the model to be trained end-to-end. To address the issue of information loss, we introduce a set of auxiliary implicit topic vectors. With the aid of these topic vectors, the generated hash codes are not only low-dimensional representations of the original texts but also capture their implicit topics. We conduct comprehensive experiments on four datasets. The results demonstrate that our approach achieves significant improvements over state-of-the-art semantic hashing methods in few-bits hashing.
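
The sketch below is a minimal illustration of the mechanism the abstract describes, not the authors' released implementation. It assumes PyTorch and a bag-of-words input; the class and layer names (TopicAwareHashVAE, code_to_topic) are hypothetical, and the straight-through Bernoulli sampling plus the hash-code-to-topic-mixture reconstruction are one plausible reading of "Bernoulli hash bits with auxiliary implicit topic vectors."

```python
# A minimal sketch (not the authors' code) of a VAE-style semantic hashing
# model with Bernoulli hash bits and auxiliary implicit topic vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicAwareHashVAE(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, n_bits=8, n_topics=16, hidden=500):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bits),   # logits for the Bernoulli bits
        )
        # Auxiliary implicit topic vectors (one row per topic); the hash
        # code selects a mixture of topics used for reconstruction.
        self.topics = nn.Parameter(torch.randn(n_topics, vocab_size) * 0.01)
        self.code_to_topic = nn.Linear(n_bits, n_topics)

    def forward(self, bow):
        logits = self.encoder(bow)
        probs = torch.sigmoid(logits)
        # Straight-through Bernoulli sampling keeps the model end-to-end
        # trainable despite the discrete bits.
        bits = torch.bernoulli(probs)
        code = bits + probs - probs.detach()
        topic_mix = F.softmax(self.code_to_topic(code), dim=-1)
        recon = F.log_softmax(topic_mix @ self.topics, dim=-1)
        # Reconstruction term of the ELBO (bag-of-words log-likelihood)
        nll = -(bow * recon).sum(-1).mean()
        # KL divergence of each bit to a Bernoulli(0.5) prior
        kl = (probs * torch.log(probs / 0.5 + 1e-8)
              + (1 - probs) * torch.log((1 - probs) / 0.5 + 1e-8)).sum(-1).mean()
        return nll + kl, (probs > 0.5).float()  # loss, binary hash codes
```

At retrieval time only the binary codes are kept, so documents can be compared by Hamming distance; the topic vectors serve purely as a training-time aid against information loss at very short code lengths.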

Cite (APA)

Ye, F., Manotumruksa, J., & Yilmaz, E. (2020). Unsupervised few-bits semantic hashing with implicit topics modeling. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 2566–2575). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.233
