CREST: Cluster-based representation enrichment for short text classification

Zichao Dai; Aixin Sun; Xu Ying Liu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013), vol. 7819 LNAI (Part 2), pp. 256-267

DOI: 10.1007/978-3-642-37456-2_22

22 Citations

23 Readers

Abstract

Text classification has attracted research interest for decades. Many techniques have been developed and have demonstrated very good classification accuracy in various applications. Recently, the popularity of social platforms has changed the way we access (and contribute) information. In particular, short messages, comments, and status updates now make up a large portion of online text data. The shortness and, more importantly, the sparsity of short text data call for a revisit of text classification techniques developed for well-written documents such as news articles. In this paper, we propose a cluster-based representation enrichment method, namely Crest, to deal with the shortness and sparsity of short text. More specifically, we propose to enrich a short text representation by incorporating a vector of topical relevances in addition to the commonly adopted tf-idf representation. The topics are derived from the knowledge embedded in the short text collection of interest, using a hierarchical clustering algorithm with purity control. Our experiments show that the enriched representation significantly improves the accuracy of short text classification. The experiments were conducted on a benchmark dataset consisting of Web snippets using Support Vector Machines (SVM) as the classifier. © Springer-Verlag 2013.
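
The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how topical relevance is computed or how purity control works, so the sketch makes two assumptions: relevance is taken to be the cosine similarity between a text and a cluster centroid, and a hand-written cluster assignment (`clusters`) stands in for the hierarchical clustering step. All function names and the toy snippets are hypothetical.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def tfidf_vectors(docs):
    """Return (vocabulary, unit-length tf-idf vectors) for tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed idf, so a term found in every document keeps a non-zero weight.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(normalise([tf[tok] * idf[tok] for tok in vocab]))
    return vocab, vectors


def centroid(vectors):
    """Unit-length mean of a group of vectors, used here as a topic summary."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalise(mean)


def cosine(u, v):
    """Cosine similarity; both inputs are assumed to be unit length."""
    return sum(a * b for a, b in zip(u, v))


def enrich(vec, centroids):
    """Append one topical-relevance score per cluster to a tf-idf vector."""
    return vec + [cosine(vec, c) for c in centroids]


# Toy short texts (hypothetical Web-snippet-like examples).
docs = [
    "cheap flights hotel deals".split(),
    "hotel booking cheap rooms".split(),
    "python code tutorial".split(),
    "learn python code online".split(),
]

# ASSUMPTION: cluster membership is given by hand. In the paper the topics
# come from hierarchical clustering with purity control, not shown here.
clusters = [[0, 1], [2, 3]]

vocab, vectors = tfidf_vectors(docs)
centroids = [centroid([vectors[i] for i in members]) for members in clusters]
enriched = [enrich(v, centroids) for v in vectors]
```

Each enriched vector has `len(vocab) + len(clusters)` dimensions. The trailing entries place a text near a topic even when it shares few exact terms with other texts, which is the kind of help a sparse short-text representation needs. The enriched vectors could then be passed to any classifier, such as the SVM mentioned in the abstract.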

Cite

Dai, Z., Sun, A., & Liu, X. Y. (2013). CREST: Cluster-based representation enrichment for short text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7819 LNAI, pp. 256–267). https://doi.org/10.1007/978-3-642-37456-2_22
