Short text clustering via convolutional neural networks

Jiaming Xu; Peng Wang; Guanhua Tian; Bo Xu; Jun Zhao; Fangyuan Wang; Hongwei Hao

Conference ProceedingsOPEN ACCESS

Short text clustering via convolutional neural networks

1st Workshop on Vector Space Modeling for Natural Language Processing, VS 2015 at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2015 (2015) 62-69

DOI: 10.3115/v1/w15-1509

189Citations

251Readers

Abstract

Short text clustering has become an increasing important task with the popularity of social media, and it is a challenging problem due to its sparseness of text representation. In this paper, we propose a Short Text Clustering via Convolutional neural networks (abbr. to STCC), which is more beneficial for clustering by considering one constraint on learned features through a self-taught learning framework without using any external tags/labels. First, we embed the original keyword features into compact binary codes with a locality-preserving constraint. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, with the output units fitting the pre-trained binary code in the training process. After obtaining the learned representations, we use K-means to cluster them. Our extensive experimental study on two public short text datasets shows that the deep feature representation learned by our approach can achieve a significantly better performance than some other existing features, such as term frequency-inverse document frequency, Laplacian eigenvectors and average embedding, for clustering.

Cite

CITATION STYLE

APA

Xu, J., Wang, P., Tian, G., Xu, B., Zhao, J., Wang, F., & Hao, H. (2015). Short text clustering via convolutional neural networks. In 1st Workshop on Vector Space Modeling for Natural Language Processing, VS 2015 at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2015 (pp. 62–69). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w15-1509

Short text clustering via convolutional neural networks

Abstract

Cite

Register to see more suggestions