This contribution deals with a generative approach to the analysis of textual data. Instead of creating heuristic rules for the representation of documents and word counts, we employ a distribution able to model words across a text while accounting for different topics. In this regard, following Minka's proposal [5], we implement a Dirichlet compound multinomial distribution, which is a mixture of random variables over words and topics. On the basis of this model, we evaluate the predictive performance of the distribution using seven different classifiers, taking into account the number of words shared between a text document and a reference class. © Springer-Verlag Berlin Heidelberg 2011.
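As an illustration of the distribution named in the abstract, the following minimal sketch evaluates the log-probability of a word-count vector under the standard Dirichlet compound multinomial (Polya) form. It is not the authors' implementation: the function name `dcm_log_pmf`, the toy counts, and the parameter values are assumptions chosen for the example.

```python
from math import lgamma, exp


def dcm_log_pmf(counts, alpha):
    """Log-probability of a word-count vector under a Dirichlet compound
    multinomial with parameter vector alpha (hypothetical helper)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)   # document length (total number of word tokens)
    a = sum(alpha)    # sum of the Dirichlet parameters
    # Multinomial coefficient: n! / prod(x_w!)
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Normalising ratio: Gamma(a) / Gamma(n + a)
    log_norm = lgamma(a) - lgamma(n + a)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_terms = sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return log_coef + log_norm + log_terms


# Toy vocabulary of three words; the alpha values are illustrative only.
doc_counts = [3, 0, 1]
alpha = [0.5, 1.5, 2.0]
log_p = dcm_log_pmf(doc_counts, alpha)
```

In a classification setting, one could fit a separate `alpha` per class and compare the resulting log-probabilities for a document; the abstract does not specify how the seven classifiers use the model, so that step is left out here.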
CITATION STYLE
Bonafede, C. E., & Cerchiello, P. (2011). A study on text modelling via Dirichlet compound multinomial. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 115–123). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-642-13312-1_11