Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior over topic-word distributions to smooth the model. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to the latent variables by LDA's intrinsic inference procedure. In this way, the arbitrariness of choosing priors for the latent variables of the multi-level graphical model is avoided. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing significantly improves performance on both balanced and unbalanced corpora. © 2008 Springer-Verlag Berlin Heidelberg.
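The two smoothing methods named in the abstract can be illustrated on a single topic-word distribution. The sketch below is illustrative only, not the paper's implementation: the counts, vocabulary, and parameter values are invented, and it shows the standard forms of Laplacian (add-alpha) smoothing and Jelinek-Mercer interpolation with a background distribution.

```python
# Illustrative sketch (not the paper's code): smoothing a topic-word
# distribution with the two methods named in the abstract. All counts,
# words, and parameter values are hypothetical.

def laplace_smooth(counts, alpha=1.0):
    """Add-alpha (Laplacian) smoothing of raw topic-word counts."""
    total = sum(counts.values()) + alpha * len(counts)
    return {w: (c + alpha) / total for w, c in counts.items()}

def jelinek_mercer(counts, background, lam=0.7):
    """Interpolate the maximum-likelihood estimate with a background
    (e.g. corpus-level) distribution over the same vocabulary."""
    total = sum(counts.values())
    return {w: lam * (counts[w] / total) + (1 - lam) * background[w]
            for w in counts}

# Hypothetical topic-word counts; "goal" is unseen in this topic.
topic_counts = {"lda": 6, "topic": 3, "prior": 1, "goal": 0}
# Hypothetical background distribution (sums to 1).
background = {"lda": 0.1, "topic": 0.2, "prior": 0.3, "goal": 0.4}

lap = laplace_smooth(topic_counts)
jm = jelinek_mercer(topic_counts, background)

# Both smoothed estimates are proper distributions, and the unseen
# word receives nonzero probability mass under either method.
assert abs(sum(lap.values()) - 1.0) < 1e-9
assert abs(sum(jm.values()) - 1.0) < 1e-9
assert lap["goal"] > 0 and jm["goal"] > 0
```

Both methods move probability mass toward otherwise-unseen words; the paper's contribution is letting LDA's own inference procedure decide how that mass reaches the latent variables, rather than fixing a Dirichlet hyperparameter by hand.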
CITATION STYLE
Li, W., Sun, L., Feng, Y., & Zhang, D. (2008). Smoothing LDA model for text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4993 LNCS, pp. 83–94). https://doi.org/10.1007/978-3-540-68636-1_9