MML-based approach for determining the number of topics in EDCM mixture models

Abstract

This paper proposes an unsupervised algorithm for learning a finite mixture of the exponential-family approximation to the Dirichlet Compound Multinomial (EDCM) distribution. An important part of the mixture modeling problem is determining the number of components that best describes the data. In this work, we extend the Minimum Message Length (MML) principle to determine the number of topics (clusters) when text is modeled with a mixture of EDCMs. Parameter estimation is based on the previously proposed deterministic annealing expectation-maximization approach. The proposed method is validated on several document collections, and its results are compared with those obtained using other model selection criteria.

Citation (APA)

Zamzami, N., & Bouguila, N. (2018). MML-based approach for determining the number of topics in EDCM mixture models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10832 LNAI, pp. 211–217). Springer Verlag. https://doi.org/10.1007/978-3-319-89656-4_17