Enhancement of short text clustering by iterative classification

Md Rashadul Hasan Rakib; Norbert Zeh; Magdalena Jankowska; Evangelos Milios

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020), vol. 12089 LNCS, pp. 105–117

DOI: 10.1007/978-3-030-51310-8_10

Abstract

Short text clustering is a challenging task due to the lack of signal contained in short texts. In this work, we propose iterative classification as a method to boost the clustering quality of short texts. The idea is to repeatedly reassign (classify) outliers to clusters until the cluster assignment stabilizes. The classifier used in each iteration is trained using the current set of cluster labels of the non-outliers; the input of the first iteration is the output of an arbitrary clustering algorithm. Thus, our method does not require any human-annotated labels for training. Our experimental results show that the proposed clustering enhancement method not only improves the clustering quality of different baseline clustering methods (e.g., k-means, k-means--, and hierarchical clustering) but also outperforms the state-of-the-art short text clustering methods on several short text datasets by a statistically significant margin.
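
The loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes texts are already embedded as numeric vectors (e.g. TF-IDF rows), picks outliers with a hypothetical rule (the `outlier_frac` share of points farthest from their own cluster centroid), and substitutes a nearest-centroid classifier for whatever classifier the paper trains. The names `centroids`, `nearest_label`, `iterative_classification`, and `outlier_frac` are all illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def centroids(points: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Mean vector of every cluster label that occurs in `labels`."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for p, lab in zip(points, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(p)
        for d, x in enumerate(p):
            sums[lab][d] += x
        counts[lab] += 1
    return {lab: tuple(s / counts[lab] for s in vec) for lab, vec in sums.items()}


def nearest_label(p: Vector, model: Dict[int, Vector]) -> int:
    """Stand-in classifier (an assumption): label of the closest centroid."""
    return min(model, key=lambda lab: math.dist(p, model[lab]))


def iterative_classification(points: Sequence[Vector], labels: Sequence[int],
                             outlier_frac: float = 0.25, max_iter: int = 50) -> List[int]:
    """Reassign outliers with a classifier trained on the non-outliers until labels stop changing.

    `labels` is the output of any initial clustering (k-means, hierarchical, ...).
    """
    if not 0 <= outlier_frac < 1:
        raise ValueError("outlier_frac must be in [0, 1)")
    labels = list(labels)
    n = len(points)
    for _ in range(max_iter):
        cents = centroids(points, labels)
        # Hypothetical outlier rule: farthest points from their own cluster centroid.
        dist = [math.dist(p, cents[lab]) for p, lab in zip(points, labels)]
        ranked = sorted(range(n), key=lambda i: dist[i], reverse=True)
        outliers = set(ranked[:int(n * outlier_frac)])
        # "Train" only on the non-outliers, using their current cluster labels.
        inliers = [i for i in range(n) if i not in outliers]
        model = centroids([points[i] for i in inliers], [labels[i] for i in inliers])
        # Classify (reassign) each outlier; non-outliers keep their labels.
        new_labels = list(labels)
        for i in outliers:
            new_labels[i] = nearest_label(points[i], model)
        if new_labels == labels:  # cluster assignment has stabilized
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    # The initial clustering wrongly puts (1, 1) in the far cluster.
    print(iterative_classification(pts, [0, 0, 0, 1, 1, 1, 1, 1]))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In the example, the first pass flags (1, 1) as an outlier of cluster 1, the classifier trained on the remaining points moves it to cluster 0, and the second pass changes nothing, so the loop stops. As in the abstract, no human-annotated labels are involved: the only supervision is the current clustering itself.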

Cite (APA)

Rakib, M. R. H., Zeh, N., Jankowska, M., & Milios, E. (2020). Enhancement of short text clustering by iterative classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12089 LNCS, pp. 105–117). Springer. https://doi.org/10.1007/978-3-030-51310-8_10
