Short Text Similarity Measurement Using Context from Bag of Word Pairs and Word Co-occurrence

Abstract

With the rapid development of social networks, short texts have become a prevalent form of social communication on the Internet. Measuring the similarity between short texts is a fundamental task for many applications, such as social network text querying, short text clustering, and geographical event detection for smart cities. However, short texts in social media carry limited contextual information and are sparse, noisy, and ambiguous. Hence, effectively measuring the distance between short texts is a challenging task. In this paper, we propose a new heuristic word pair distance measurement (WPDM) technique for short texts, which exploits corpus-level word relations and enriches the context of each short text with a bag-of-word-pairs representation. We first adjust Jaccard similarity to measure the distance between words. Then, words are paired up to capture latent semantics in a short text document, thereby transforming the short text into a bag-of-word-pairs representation. Finally, the similarity between short text documents is calculated by averaging the distances of the word pairs. Experimental results on a real-world dataset demonstrate that the proposed WPDM is effective and achieves much better performance than state-of-the-art methods.
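
The three steps named in the abstract (a Jaccard-based word distance, word pairing, and averaging over word pairs) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration rather than the authors' WPDM definition: the word distance is taken as one minus the plain Jaccard overlap of the sets of documents containing each word, a document's bag of word pairs is taken as all unordered pairs of its distinct words, and the distance between two word pairs uses the cheaper of the two possible word alignments. All function names are hypothetical.

```python
from itertools import combinations, product


def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, tokens in enumerate(docs):
        for word in set(tokens):
            index.setdefault(word, set()).add(doc_id)
    return index


def word_distance(a, b, index):
    """Assumed form: 1 - Jaccard overlap of the two words' document sets."""
    if a == b:
        return 0.0
    docs_a, docs_b = index.get(a, set()), index.get(b, set())
    union = docs_a | docs_b
    if not union:
        return 1.0  # neither word was seen in the corpus
    return 1.0 - len(docs_a & docs_b) / len(union)


def word_pairs(tokens):
    """Assumed bag of word pairs: all unordered pairs of distinct words."""
    return list(combinations(sorted(set(tokens)), 2))


def pair_distance(p, q, index):
    """Assumed pair distance: mean word distance under the better alignment."""
    straight = word_distance(p[0], q[0], index) + word_distance(p[1], q[1], index)
    crossed = word_distance(p[0], q[1], index) + word_distance(p[1], q[0], index)
    return min(straight, crossed) / 2.0


def document_distance(tokens_a, tokens_b, index):
    """Average the pair distances over every cross-document pair of word pairs."""
    pairs_a, pairs_b = word_pairs(tokens_a), word_pairs(tokens_b)
    if not pairs_a or not pairs_b:
        return 1.0  # a document with fewer than two distinct words has no pairs
    total = sum(pair_distance(p, q, index) for p, q in product(pairs_a, pairs_b))
    return total / (len(pairs_a) * len(pairs_b))


# Toy corpus (made up for this example), already tokenized.
corpus = [
    ["flood", "river", "city"],
    ["flood", "river", "rain"],
    ["concert", "music", "city"],
]
corpus_index = build_index(corpus)
print(document_distance(corpus[0], corpus[1], corpus_index))
print(document_distance(corpus[0], corpus[2], corpus_index))
```

In this toy corpus the two flood-related texts come out closer to each other than the first text is to the concert text, which is the kind of behaviour the abstract describes. The paper's adjusted Jaccard measure and its exact averaging scheme may differ from this sketch.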

Yang, S., Huang, G., & Ofoghi, B. (2020). Short Text Similarity Measurement Using Context from Bag of Word Pairs and Word Co-occurrence. In Communications in Computer and Information Science (Vol. 1179 CCIS, pp. 221–231). Springer. https://doi.org/10.1007/978-981-15-2810-1_22
