Locality-Sensitive Term Weighting for Short Text Clustering

Chu Tao Zheng; Sheng Qian; Wen Ming Cao; Hau San Wong

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10634 LNCS 434-444

DOI: 10.1007/978-3-319-70087-8_46

Abstract

To alleviate sparseness in short text clustering, much existing research draws on external information such as Wikipedia to enrich the feature representation, which requires extra work and resources and may introduce inconsistency. Sparseness leads to weak connections between short texts, so similarity information is difficult to measure. We introduce a special term-specific document set, the potential locality set, to capture weak similarity. Specifically, for any two short documents within the same potential locality, the Jaccard similarity between them is greater than 0. In other words, the adjacency graph based on these weak connections is a complete graph. Further, a locality-sensitive term weighting scheme is proposed based on the potential locality set. Experimental results show that the proposed approach builds a more reliable neighborhood for short text data. Compared with another state-of-the-art algorithm, the proposed approach obtains better clustering performance, which verifies its effectiveness.
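The abstract does not spell out how a potential locality set is built, so the following is only a minimal sketch under one assumption: that the potential locality set of a term is the set of documents containing that term. Under that reading, any two documents in the same set share at least the defining term, so their Jaccard similarity is greater than 0 and the adjacency graph within the set is complete. The function names and toy documents are hypothetical, and the paper's term weighting scheme itself is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def potential_locality_sets(docs):
    """Map each term to the indices of the documents containing it.

    Assumption: this term-to-documents mapping stands in for the
    paper's potential locality set; the paper may define it differently.
    """
    localities = defaultdict(set)
    for idx, tokens in enumerate(docs):
        for term in tokens:
            localities[term].add(idx)
    return dict(localities)


# Toy short texts, already tokenized into sets of terms (illustrative only).
docs = [
    {"cheap", "flights", "paris"},
    {"paris", "hotel", "deals"},
    {"flights", "delayed", "storm"},
    {"storm", "warning", "coast"},
]

localities = potential_locality_sets(docs)

# Every pair of documents inside one locality set shares the defining term,
# so each pair is connected: the induced adjacency graph is complete.
for term, members in localities.items():
    for i, j in combinations(sorted(members), 2):
        assert jaccard(docs[i], docs[j]) > 0
```

In this toy corpus, documents 0 and 3 share no term, so they never appear in the same set and their Jaccard similarity is 0, while documents 0 and 1 are linked through "paris" even though they overlap in only one of five distinct terms.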

Cite

Zheng, C. T., Qian, S., Cao, W. M., & Wong, H. S. (2017). locality-sensitive term weighting for short text clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10634 LNCS, pp. 434–444). Springer Verlag. https://doi.org/10.1007/978-3-319-70087-8_46
