A solution of the multiaspect text categorization problem by a hybrid HMM and LDA based technique

Abstract

In our previous work we introduced the multiaspect text categorization (MTC) task, a special, extended form of the text categorization (TC) problem that is widely studied in information retrieval. The essence of the MTC problem is the classification of documents on two levels: first, on a more or less standard level of thematic categories, and then on the level of document sequences, which is much less studied in the literature. The latter stage of classification, which is by far the more challenging, is the main focus of this paper. A promising way of attacking it requires modeling the connections between the documents that form sequences. To solve this problem we propose a novel hybrid approach that combines a well-known technique for modeling sequences, the Hidden Markov Model (HMM), with Latent Dirichlet Allocation (LDA) for an advanced document representation. We present the details of the proposed approach as well as the results of computational experiments.
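
The abstract does not spell out how the HMM and the LDA representation are wired together, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each document has already been reduced to a single discrete symbol (for example, the index of its dominant LDA topic) and that each document sequence has its own discrete HMM. The forward algorithm then scores how well a candidate document extends a sequence. The function names and all parameter values are hypothetical and made up for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of a symbol sequence under a discrete HMM (forward algorithm).

    obs   : list of symbol indices (assumed here: dominant LDA topic per document)
    start : start[i]    = P(first hidden state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    if not obs:
        return 1.0  # the empty sequence has probability 1
    n_states = len(start)
    # alpha[i] = P(symbols seen so far, current state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


def best_sequence(prefixes, models, candidate):
    """Index of the sequence whose HMM best explains prefix + [candidate].

    prefixes : list of symbol lists, one per existing document sequence
    models   : list of (start, trans, emit) tuples, one per sequence
    """
    scores = [
        forward_likelihood(prefix + [candidate], *model)
        for prefix, model in zip(prefixes, models)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy, made-up parameters: 2 hidden states, 3 topic symbols (0, 1, 2).
MODEL_A = ([0.6, 0.4],
           [[0.7, 0.3], [0.4, 0.6]],
           [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
# A second hypothetical model that strongly prefers topic 2.
MODEL_B = ([0.5, 0.5],
           [[0.5, 0.5], [0.5, 0.5]],
           [[0.05, 0.05, 0.9], [0.05, 0.05, 0.9]])

if __name__ == "__main__":
    # Which of two existing sequences should a new topic-2 document join?
    print(best_sequence([[0, 1], [2, 2]], [MODEL_A, MODEL_B], candidate=2))
```

In practice the raw probabilities underflow for long sequences, so a real implementation would work in log space or rescale `alpha` at each step. The paper may also use the full LDA topic distribution instead of a single dominant topic.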

Cite

Zadrożny, S., Kacprzyk, J., & Gajewski, M. (2016). A solution of the multiaspect text categorization problem by a hybrid HMM and LDA based technique. In Communications in Computer and Information Science (Vol. 610, pp. 214–225). Springer Verlag. https://doi.org/10.1007/978-3-319-40596-4_19
