A fast matching method based on semantic similarity for short texts


Abstract

With the emergence of various social media, short texts such as weibos and instant messages have become prevalent on today's websites. To mine semantically similar information from massive data, a fast and efficient matching method for short texts is urgently needed. However, conventional matching methods suffer from data sparsity in short documents. In this paper, we propose a novel matching method, referred to as semantically similar hashing (SSHash). The basic idea of SSHash is to train a topic model directly on the corpus as a whole rather than on individual documents, then project texts into hash codes using the latent features. The major advantages of SSHash are that 1) it alleviates the sparsity problem in short texts, because the latent features are obtained from the whole corpus rather than at the document level; and 2) it can perform similarity matching interactively in real time by introducing a hashing method. We carry out extensive experiments on real-world short texts. The results demonstrate that our method significantly outperforms baseline methods on several evaluation metrics. © Springer-Verlag Berlin Heidelberg 2013.
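The abstract does not spell out how SSHash builds its hash codes, so the following is only a generic illustration of the idea it describes: map each short text to latent topic features, binarize them into a compact code, and match by Hamming distance. Everything in it is an assumption made for illustration: the `WORD_TOPICS` table (made-up numbers standing in for a topic model trained on the whole corpus), the per-dimension median threshold, and the helper names `latent_features`, `fit_thresholds`, `hash_code`, `hamming`, and `nearest`. It is not the authors' algorithm.

```python
from statistics import median

# Hypothetical word -> latent topic weights. In a real system these would come
# from a topic model trained on the whole corpus; the numbers here are made up.
WORD_TOPICS = {
    "football": [0.75, 0.0, 0.25, 0.0],
    "soccer":   [0.75, 0.0, 0.125, 0.125],
    "match":    [0.5, 0.0, 0.25, 0.25],
    "goal":     [0.75, 0.125, 0.125, 0.0],
    "stock":    [0.0, 0.75, 0.0, 0.25],
    "market":   [0.125, 0.5, 0.125, 0.25],
    "shares":   [0.0, 0.75, 0.125, 0.125],
    "price":    [0.125, 0.5, 0.0, 0.375],
}
NUM_TOPICS = 4


def latent_features(text):
    """Average the topic vectors of the known words in a short text."""
    vectors = [WORD_TOPICS[w] for w in text.lower().split() if w in WORD_TOPICS]
    if not vectors:
        return [0.0] * NUM_TOPICS
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(NUM_TOPICS)]


def fit_thresholds(texts):
    """Use the per-dimension median over the corpus as the binarization cut."""
    features = [latent_features(t) for t in texts]
    return [median(f[k] for f in features) for k in range(NUM_TOPICS)]


def hash_code(text, thresholds):
    """Pack one bit per latent dimension into an integer code."""
    code = 0
    for k, value in enumerate(latent_features(text)):
        if value > thresholds[k]:
            code |= 1 << k
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def nearest(query, index, thresholds):
    """Return the indexed text whose code is closest to the query's code."""
    q = hash_code(query, thresholds)
    return min(index, key=lambda item: hamming(q, item[1]))[0]


corpus = ["football match", "soccer goal", "stock market", "shares price"]
thresholds = fit_thresholds(corpus)
index = [(text, hash_code(text, thresholds)) for text in corpus]
```

With this toy table, `nearest("football goal", index, thresholds)` returns one of the two sports texts even though it shares only one word with each of them. That is the point of matching on latent features rather than on raw term overlap, and comparing integer codes by XOR keeps each lookup cheap.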

Cite

Xu, J., Liu, P., Wu, G., Sun, Z., Xu, B., & Hao, H. (2013). A fast matching method based on semantic similarity for short texts. In Communications in Computer and Information Science (Vol. 400, pp. 299–309). Springer Verlag. https://doi.org/10.1007/978-3-642-41644-6_28
