A fast matching method based on semantic similarity for short texts

Jiaming Xu; Pengcheng Liu; Gaowei Wu; Zhengya Sun; Bo Xu; Hongwei Hao

Conference Proceedings

Communications in Computer and Information Science, Vol. 400, pp. 299–309 (2013)

DOI: 10.1007/978-3-642-41644-6_28

Abstract

With the emergence of various social media platforms, short texts such as weibos (Weibo microblog posts) and instant messages have become prevalent on today's websites. Mining semantically similar information from such massive data therefore calls for a fast and efficient matching method for short texts. However, conventional matching methods suffer from data sparsity in short documents. In this paper, we propose a novel matching method referred to as semantically similar hashing (SSHash). The basic idea of SSHash is to train a topic model directly on the whole corpus rather than on individual documents, and then to project texts into hash codes using the learned latent features. SSHash has two major advantages: 1) it alleviates the sparsity problem in short texts, because the latent features are obtained from the whole corpus rather than at the document level; and 2) by introducing hashing, it can perform similarity matching interactively in real time. We carry out extensive experiments on real-world short texts. The results demonstrate that our method significantly outperforms baseline methods on several evaluation metrics. © Springer-Verlag Berlin Heidelberg 2013.
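The abstract does not state the exact rule SSHash uses to turn latent features into hash codes. As a rough illustration only, the sketch below assumes that each text already has a topic-proportion vector inferred from the corpus-level topic model, binarizes that vector by thresholding each topic at its median over the collection (a common heuristic in semantic hashing, assumed here and not taken from the paper), and ranks stored texts by Hamming distance to the query code. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_thresholds(theta: Sequence[Sequence[float]]) -> List[float]:
    """Per-topic median of the topic proportions over the collection (assumed binarization rule)."""
    n_topics = len(theta[0])
    return [statistics.median([row[k] for row in theta]) for k in range(n_topics)]


def to_hash(vec: Sequence[float], thresholds: Sequence[float]) -> int:
    """Pack one text's topic vector into an integer code: bit k is set when vec[k] exceeds the median."""
    code = 0
    for k, (value, threshold) in enumerate(zip(vec, thresholds)):
        if value > threshold:
            code |= 1 << k
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def match(query_code: int, codes: Sequence[int], top_k: int = 3) -> List[Tuple[int, int]]:
    """Return (text index, Hamming distance) pairs for the top_k closest stored codes."""
    ranked = sorted((hamming(query_code, c), i) for i, c in enumerate(codes))
    return [(i, d) for d, i in ranked[:top_k]]


# Toy topic proportions for four stored texts over four topics (illustrative values only).
theta = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
thresholds = fit_thresholds(theta)
codes = [to_hash(row, thresholds) for row in theta]
query_code = to_hash([0.65, 0.15, 0.1, 0.1], thresholds)
print(match(query_code, codes))  # [(1, 0), (0, 1), (2, 3)]
```

Because codes are compact integers, comparing a query against stored texts reduces to XOR and bit counting, which is the property that makes hash-based matching cheap enough for interactive use. A production system would typically add an index over codes (for example, probing buckets within a small Hamming radius) instead of scanning linearly as this sketch does.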

Citation (APA)

Xu, J., Liu, P., Wu, G., Sun, Z., Xu, B., & Hao, H. (2013). A fast matching method based on semantic similarity for short texts. In Communications in Computer and Information Science (Vol. 400, pp. 299–309). Springer Verlag. https://doi.org/10.1007/978-3-642-41644-6_28
