Sparse additive generative models of text


Abstract

Generative models of text typically associate a multinomial with every class label or topic. Even in simple models this requires the estimation of thousands of parameters; in multifaceted latent variable models, standard approaches require additional latent "switching" variables for every token, complicating inference. In this paper, we propose an alternative generative model for text. The central idea is that each class label or latent topic is endowed with a model of the deviation in log-frequency from a constant background distribution. This approach has two key advantages: we can enforce sparsity to prevent overfitting, and we can combine generative facets through simple addition in log space, avoiding the need for latent switching variables. We demonstrate the applicability of this idea to a range of scenarios: classification, topic modeling, and more complex multifaceted generative models.
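The central idea in the abstract can be illustrated with a small numerical sketch: a word distribution is obtained by adding sparse facet-specific deviation vectors to a background log-frequency vector and normalizing. All vocabulary sizes, values, and variable names below are invented for illustration; this is not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of the SAGE idea: a word distribution is the
# softmax of a background log-frequency vector plus sparse additive
# deviations, one per generative facet (e.g. a topic and a sentiment).
# The vocabulary and all values here are made up for illustration.
background = np.log(np.array([0.4, 0.3, 0.15, 0.1, 0.05]))

# Sparse deviation vectors: most entries are exactly zero.
topic_dev = np.array([0.0, 0.0, 1.2, 0.0, 0.0])       # topic boosts word 2
sentiment_dev = np.array([0.0, -0.8, 0.0, 0.0, 0.0])  # sentiment dampens word 1

def word_distribution(background, *deviations):
    """Combine facets by simple addition in log space, then normalize."""
    eta = background + sum(deviations)
    expd = np.exp(eta - eta.max())  # subtract max for numerical stability
    return expd / expd.sum()

p = word_distribution(background, topic_dev, sentiment_dev)
print(p)  # a valid probability distribution over the 5-word vocabulary
```

Because facets combine by addition, no per-token latent switching variable is needed to decide which facet "generated" each word; the zero entries in each deviation vector leave the background frequency untouched for most of the vocabulary.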

Citation (APA)
Eisenstein, J., Ahmed, A., & Xing, E. P. (2011). Sparse additive generative models of text. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011 (pp. 1041–1048).
