In our previous work we introduced the multiaspect text categorization (MTC) task, a special, extended form of the text categorization (TC) problem that is widely studied in information retrieval. The essence of the MTC problem is the classification of documents on two levels: first, on a more or less standard level of thematic categories, and then on the level of document sequences, which is much less studied in the literature. The latter stage of classification, which is by far the more challenging, is the main focus of this paper. A promising way of attacking it requires some kind of modeling of the connections between the documents that form sequences. To solve this problem we propose a novel hybrid approach that combines a well-known technique for modeling sequences, the Hidden Markov Model (HMM), with Latent Dirichlet Allocation (LDA) for advanced document representation. We present the details of the proposed approach as well as the results of computational experiments.
Zadrożny, S., Kacprzyk, J., & Gajewski, M. (2016). A solution of the multiaspect text categorization problem by a hybrid HMM and LDA based technique. In Communications in Computer and Information Science (Vol. 610, pp. 214–225). Springer Verlag. https://doi.org/10.1007/978-3-319-40596-4_19