Learning latent semantic representations from large corpora of short texts is of profound practical significance in research and engineering. However, standard topic models are difficult to apply in microblogging environments, since microblogs are short, numerous, noisy, and irregular in form, which prevents topic models from exploiting their full information. In this paper, we propose a novel non-probabilistic topic model called sparse topical coding with sparse groups (STCSG), which is capable of discovering sparse latent semantic representations of large short-text corpora. STCSG relaxes the normalization constraint on the inferred representations with sparse group lasso, a sparsity-inducing regularizer that makes it convenient to directly control the sparsity of document, topic, and word codes. Furthermore, the relaxed non-probabilistic STCSG can be learned effectively with the alternating direction method of multipliers (ADMM). Our experimental results on a Twitter dataset demonstrate that STCSG performs well in finding meaningful latent representations of short documents, and can thereby substantially improve the accuracy and efficiency of document classification.
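To illustrate the sparse group lasso regularizer the abstract refers to, the following is a minimal sketch of its proximal operator (elementwise soft-thresholding followed by group-level shrinkage), which is the building block typically used inside ADMM-style solvers. This is an illustrative sketch, not the paper's implementation; the group layout and parameter names are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_sparse_group_lasso(x, groups, lam1, lam2):
    """Proximal operator of the sparse group lasso penalty
    lam1 * ||x||_1 + lam2 * sum_g ||x_g||_2, applied group by group.

    `groups` is a list of index arrays partitioning x
    (a hypothetical layout, not taken from the paper).
    """
    out = np.zeros_like(x, dtype=float)
    for g in groups:
        z = soft_threshold(x[g], lam1)      # elementwise sparsity within the group
        norm = np.linalg.norm(z)
        if norm > lam2:                     # group survives, shrunk toward zero
            out[g] = (1.0 - lam2 / norm) * z
        # otherwise the whole group is set to zero, yielding group sparsity
    return out

# Example: the first group is shrunk but kept; the second is zeroed entirely.
x = np.array([3.0, -2.0, 0.1, 0.2])
groups = [np.array([0, 1]), np.array([2, 3])]
print(prox_sparse_group_lasso(x, groups, lam1=0.5, lam2=1.0))
```

The two-stage structure is what lets a single regularizer control sparsity at both the individual-coefficient level (via `lam1`) and the whole-group level (via `lam2`), matching the abstract's claim of directly controlling document, topic, and word code sparsity.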
Citation
Peng, M., Xie, Q., Huang, J., Zhu, J., Ouyang, S., Huang, J., & Tian, G. (2016). Sparse topical coding with sparse groups. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9658, pp. 415–426). Springer Verlag. https://doi.org/10.1007/978-3-319-39937-9_32