Latent Dirichlet allocation

by DM Blei, AY Ng, MI Jordan
The Journal of Machine Learning Research
  • ISSN: 1532-4435

Abstract

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
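
The generative view summarized in the abstract can be illustrated with a small sampling sketch. This is not code from the paper: the function name `generate_corpus`, the parameter names `alpha` and `beta`, and the fixed document length are assumptions made for illustration only. Each document draws its own topic proportions from a Dirichlet prior, then each word draws a topic from those proportions and a word from that topic's distribution.

```python
import numpy as np


def generate_corpus(n_docs, doc_len, alpha, beta, seed=0):
    """Sample a toy corpus from an LDA-style generative process.

    alpha: shape (K,), Dirichlet prior over per-document topic proportions.
    beta:  shape (K, V), each row is one topic's distribution over V words.
    doc_len is fixed here for simplicity (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = beta.shape
    docs, thetas = [], []
    for _ in range(n_docs):
        # Per-document topic proportions (the document's representation).
        theta = rng.dirichlet(alpha)
        # One topic assignment per word position.
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # One word id per position, drawn from the assigned topic.
        words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)


# Hypothetical toy setup: 3 topics over an 8-word vocabulary.
alpha = np.full(3, 0.5)
beta = np.random.default_rng(1).dirichlet(np.ones(8), size=3)
docs, thetas = generate_corpus(n_docs=5, doc_len=20, alpha=alpha, beta=beta)
```

Each row of `thetas` is the topic-probability vector for one document, which is the explicit document representation the abstract refers to. Inference runs in the opposite direction, recovering such vectors from observed words, and is not shown here.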

Readership Statistics

22 Readers on Mendeley

by Academic Status
  • 36% Student (Master)
  • 23% Ph.D. Student
  • 9% Student (Bachelor)

by Country
  • 5% Australia
  • 5% China
  • 5% Turkey
