Short Text Similarity Measurement Using Context from Bag of Word Pairs and Word Co-occurrence

Shuiqiao Yang; Guangyan Huang; Bahadorreza Ofoghi

Conference Proceedings

Communications in Computer and Information Science (2020) 1179 CCIS 221-231

DOI: 10.1007/978-981-15-2810-1_22

Abstract

With the rapid development of social networks, short texts have become a prevalent form of social communication on the Internet. Measuring the similarity between short texts is a fundamental task for many applications, such as social network text querying, short text clustering, and geographical event detection for smart cities. However, short texts in social media typically carry limited contextual information and are sparse, noisy, and ambiguous, which makes effectively measuring the distance between them a challenging task. In this paper, we propose a new heuristic word pair distance measurement (WPDM) technique for short texts, which exploits corpus-level word relations and enriches the context of each short text with a bag-of-word-pairs representation. We first adapt the Jaccard similarity to measure the distance between words. Then, words are paired up to capture the latent semantics of a short text document, which transforms each short text into a bag-of-word-pairs representation. Finally, the similarity between short text documents is calculated by averaging the distances of the word pairs. Experimental results on a real-world dataset demonstrate that the proposed WPDM is effective and achieves substantially better performance than state-of-the-art methods.
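The abstract outlines three steps (word-level similarity from corpus co-occurrence, a bag-of-word-pairs representation, and averaging over pairs) but does not give the exact formulas. The following Python sketch illustrates the general idea under stated assumptions, not the authors' method: plain Jaccard similarity over the sets of documents that contain each word stands in for the paper's adjusted Jaccard measure; each text's word pairs are all unordered combinations of its distinct words; and the hypothetical helpers `pair_similarity` and `text_similarity` score two pairs by their better word alignment, then average over every cross-text pair combination. All function names are illustrative. Similarity is used throughout; a distance can be taken as 1 minus it.

```python
from collections import defaultdict
from itertools import combinations


def build_doc_index(corpus):
    """Map each word to the set of document ids (corpus positions) containing it."""
    index = defaultdict(set)
    for doc_id, tokens in enumerate(corpus):
        for word in tokens:
            index[word].add(doc_id)
    return index


def word_similarity(w1, w2, index):
    """Plain Jaccard similarity of the document sets of two words.

    Assumption: a stand-in for the paper's adjusted Jaccard measure.
    """
    if w1 == w2:
        return 1.0
    d1, d2 = index.get(w1, set()), index.get(w2, set())
    union = d1 | d2
    if not union:
        return 0.0
    return len(d1 & d2) / len(union)


def bag_of_word_pairs(tokens):
    """All unordered pairs of distinct words; a one-word text pairs with itself (assumption)."""
    words = sorted(set(tokens))
    if len(words) == 1:
        return [(words[0], words[0])]
    return list(combinations(words, 2))


def pair_similarity(p, q, index):
    """Hypothetical pair score: mean word similarity under the better of two alignments."""
    straight = (word_similarity(p[0], q[0], index) + word_similarity(p[1], q[1], index)) / 2
    crossed = (word_similarity(p[0], q[1], index) + word_similarity(p[1], q[0], index)) / 2
    return max(straight, crossed)


def text_similarity(tokens_a, tokens_b, index):
    """Average pair similarity over all (pair in A, pair in B) combinations."""
    pairs_a, pairs_b = bag_of_word_pairs(tokens_a), bag_of_word_pairs(tokens_b)
    if not pairs_a or not pairs_b:
        return 0.0
    total = sum(pair_similarity(p, q, index) for p in pairs_a for q in pairs_b)
    return total / (len(pairs_a) * len(pairs_b))


if __name__ == "__main__":
    # Toy tokenized corpus used only to collect word co-occurrence statistics.
    corpus = [
        ["flood", "warning", "city"],
        ["flood", "river", "city"],
        ["concert", "tonight", "city"],
    ]
    index = build_doc_index(corpus)
    print(text_similarity(["flood", "warning"], ["flood", "river"], index))    # 0.5
    print(text_similarity(["flood", "warning"], ["concert", "tonight"], index))  # 0.0
```

Because every pair in one text is compared with every pair in the other, a text with three or more distinct words does not score 1.0 against itself under this sketch; the aggregation used in the paper may differ.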

Cite (APA)

Yang, S., Huang, G., & Ofoghi, B. (2020). Short Text Similarity Measurement Using Context from Bag of Word Pairs and Word Co-occurrence. In Communications in Computer and Information Science (Vol. 1179 CCIS, pp. 221–231). Springer. https://doi.org/10.1007/978-981-15-2810-1_22
