Locality-Sensitive Term Weighting for Short Text Clustering

Abstract

To alleviate sparseness in short text clustering, much prior research has drawn on external information such as Wikipedia to enrich the feature representation, which requires extra work and resources and can introduce inconsistency. Sparseness leads to weak connections between short texts, so similarity is difficult to measure. We introduce a special term-specific document set, the potential locality set, to capture weak similarity. Specifically, for any two short documents within the same potential locality, the Jaccard similarity between them is greater than 0. In other words, the adjacency graph based on these weak connections is a complete graph. Further, we propose a locality-sensitive term weighting scheme based on the potential locality set. Experimental results show that the proposed approach builds a more reliable neighborhood for short text data. Compared with another state-of-the-art algorithm, the proposed approach obtains better clustering performance, which supports its effectiveness.
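
The Jaccard property stated in the abstract can be illustrated with a small sketch. It assumes that a "term-specific document set" is the set of documents containing a given term; that is our reading of the abstract, not a definition confirmed by the paper. The function names and the toy corpus are hypothetical, and the paper's term weighting scheme is not reproduced here because the abstract does not specify it.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def potential_localities(docs):
    """Map each term to the indices of the documents that contain it.

    Assumption: this inverted-index grouping is one reading of a
    "term-specific document set"; the paper's definition may differ.
    """
    localities = defaultdict(set)
    for idx, terms in enumerate(docs):
        for term in terms:
            localities[term].add(idx)
    return dict(localities)


# Hypothetical toy corpus: each short text is already a set of terms.
docs = [
    {"apple", "iphone", "launch"},
    {"apple", "pie", "recipe"},
    {"iphone", "battery", "life"},
    {"pie", "crust", "butter"},
]

localities = potential_localities(docs)

# Every pair of documents in one locality shares the defining term, so each
# pairwise Jaccard similarity is strictly positive and the adjacency graph
# restricted to that locality is complete.
for term, members in localities.items():
    for i, j in combinations(sorted(members), 2):
        assert jaccard(docs[i], docs[j]) > 0
```

Under this reading, two documents that share no term (such as the first and last in the toy corpus) never fall in the same locality, which is consistent with their Jaccard similarity being 0.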

Cite

Zheng, C. T., Qian, S., Cao, W. M., & Wong, H. S. (2017). Locality-sensitive term weighting for short text clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10634 LNCS, pp. 434–444). Springer Verlag. https://doi.org/10.1007/978-3-319-70087-8_46