Analyses of Multi-collection Corpora via Compound Topic Modeling

Abstract

Popular probabilistic topic models have typically centered on a single text collection, which makes them poorly suited to comparative text analyses. We consider a setting in which the corpus can be partitioned into subcollections. All subcollections share a single set of topics, but the topic proportions vary from one collection to another. We propose the compound latent Dirichlet allocation (cLDA) model, which encourages generalizability, depends less on user-input parameters, and can incorporate prior knowledge about the organization structure of the corpus. For parameter estimation, we study Markov chain Monte Carlo (MCMC) and variational inference approaches extensively and suggest an efficient MCMC method. We evaluate cLDA on both synthetic and real-world corpora, where it shows superior performance over state-of-the-art models.
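
The setting described in the abstract (one shared set of topics, with topic proportions that differ by collection) can be pictured with a small generative sketch. The abstract does not spell out the generative process, so everything below is an assumption made for illustration: the hierarchical Dirichlet layering, the symbol names (`pi`, `theta`, `alpha`, `gamma`, `eta`), and all default values are hypothetical and are not taken from the paper. The sketch only samples synthetic data; it does not perform the MCMC or variational inference the abstract mentions.

```python
import random


def dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_collections=2, docs_per_collection=3, doc_len=20,
                    n_topics=4, vocab_size=30,
                    alpha=1.0, gamma=50.0, eta=0.5, seed=0):
    """Sample a toy multi-collection corpus (all parameters are hypothetical)."""
    rng = random.Random(seed)

    # One set of topics (distributions over word ids) shared by every collection.
    topics = [dirichlet([eta] * vocab_size, rng) for _ in range(n_topics)]

    corpus = []
    for _ in range(n_collections):
        # Assumed collection-level topic mixture: this is where collections differ.
        pi = dirichlet([alpha] * n_topics, rng)
        docs = []
        for _ in range(docs_per_collection):
            # Assumed document-level mixture, concentrated around the collection's
            # mixture; the floor keeps every Gamma shape parameter strictly positive.
            theta = dirichlet([max(gamma * p, 1e-6) for p in pi], rng)
            words = []
            for _ in range(doc_len):
                z = rng.choices(range(n_topics), weights=theta)[0]       # topic
                w = rng.choices(range(vocab_size), weights=topics[z])[0]  # word id
                words.append(w)
            docs.append(words)
        corpus.append({"pi": pi, "docs": docs})
    return topics, corpus


if __name__ == "__main__":
    topics, corpus = generate_corpus()
    for j, collection in enumerate(corpus):
        print(f"collection {j}: mixture = {[round(p, 2) for p in collection['pi']]}")
```

In this sketch a larger `gamma` keeps each document's topic mixture close to its collection's mixture, while a smaller value lets documents within a collection differ more from one another.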

Cite

George, C. P., Xia, W., & Michailidis, G. (2019). Analyses of Multi-collection Corpora via Compound Topic Modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11943 LNCS, pp. 205–218). Springer. https://doi.org/10.1007/978-3-030-37599-7_18
