With the rapid development of social networks, short texts have become a prevalent form of social communication on the Internet. Measuring the similarity between short texts is a fundamental task in many applications, such as social network text querying, short text clustering, and geographical event detection for smart cities. However, short texts in social media typically carry limited contextual information and are sparse, noisy, and ambiguous. Hence, effectively measuring the distance between short texts is a challenging task. In this paper, we propose a new heuristic word pair distance measurement (WPDM) technique for short texts, which exploits corpus-level word relations and enriches the context of each short text with a bag-of-word-pairs representation. We first adjust Jaccard similarity to measure the distance between words. Then, words are paired up to capture the latent semantics in a short text document, thereby transforming the short text into a bag-of-word-pairs representation. Finally, the similarity between short text documents is calculated by averaging the distances of the word pairs. Experimental results on a real-world dataset demonstrate that the proposed WPDM is effective and achieves much better performance than state-of-the-art methods.
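The three steps described above (a word-level distance from corpus co-occurrence, pairing up words, and averaging the pair distances) can be illustrated with a minimal sketch. It is not the authors' WPDM implementation: the abstract does not specify how Jaccard similarity is adjusted or how word pairs are formed, so the sketch assumes plain Jaccard distance over the sets of documents in which each word occurs, and it assumes pairs are formed across the two texts being compared. The function names `build_occurrences`, `word_distance`, and `text_distance` are hypothetical.

```python
from itertools import product


def build_occurrences(corpus):
    """Map each word to the set of document indices in which it appears."""
    occ = {}
    for idx, doc in enumerate(corpus):
        for word in set(doc.lower().split()):
            occ.setdefault(word, set()).add(idx)
    return occ


def word_distance(w1, w2, occ):
    """Plain Jaccard distance between two words' document-occurrence sets.

    Assumption: the paper uses an adjusted Jaccard measure whose details
    are not given in the abstract; unadjusted Jaccard stands in here.
    """
    if w1 == w2:
        return 0.0
    a, b = occ.get(w1, set()), occ.get(w2, set())
    union = a | b
    if not union:
        # Neither word was seen in the corpus: treat as maximally distant.
        return 1.0
    return 1.0 - len(a & b) / len(union)


def text_distance(t1, t2, occ):
    """Average word distance over all cross-text word pairs.

    Assumption: pairs are formed between the two texts; the paper's
    pairing scheme may differ.
    """
    words1 = set(t1.lower().split())
    words2 = set(t2.lower().split())
    pairs = list(product(words1, words2))
    if not pairs:
        return 1.0
    return sum(word_distance(a, b, occ) for a, b in pairs) / len(pairs)


corpus = [
    "heavy rain flood",
    "rain flood warning",
    "football match tonight",
]
occ = build_occurrences(corpus)
related = text_distance("rain", "flood", occ)
unrelated = text_distance("rain flood", "football match", occ)
```

In this toy corpus, "rain" and "flood" always co-occur, so `related` is smaller than `unrelated` even though the two compared texts share no words. This is the kind of corpus-level context that exact word overlap alone would miss.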
CITATION STYLE
Yang, S., Huang, G., & Ofoghi, B. (2020). Short Text Similarity Measurement Using Context from Bag of Word Pairs and Word Co-occurrence. In Communications in Computer and Information Science (Vol. 1179 CCIS, pp. 221–231). Springer. https://doi.org/10.1007/978-981-15-2810-1_22