A solution of the multiaspect text categorization problem by a hybrid HMM and LDA based technique

Sławomir Zadrożny; Janusz Kacprzyk; Marek Gajewski

Conference Proceedings

Communications in Computer and Information Science (2016) 610 214-225

DOI: 10.1007/978-3-319-40596-4_19

3 Citations

4 Readers

Abstract

In our previous work we introduced the novel concept of the multiaspect text categorization (MTC) task, meant as a special, extended form of the text categorization (TC) problem that is widely studied in information retrieval. The essence of the MTC problem is the classification of documents on two levels: first, on the more or less standard level of thematic categories, and then on the level of document sequences, which is much less studied in the literature. The latter stage of classification, which is by far the more challenging, is the main focus of this paper. A promising way of attacking it requires modeling the connections between the documents that form sequences. To solve this problem we propose a novel hybrid approach that combines a well-known technique for modeling sequences, Hidden Markov Models (HMM), with the Latent Dirichlet Allocation (LDA) technique for advanced document representation. We present the details of the proposed approach as well as the results of some computational experiments.
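The abstract does not spell out how the HMM and LDA components are combined, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes that each document has already been mapped to an LDA topic distribution, that each document is reduced to its dominant topic, and that every document sequence has its own discrete HMM over topic symbols. A new document is then assigned to the sequence whose HMM best predicts its topic. All names (`dominant_topic`, `sequence_log_likelihood`, `assign_case`, `CASES`) and all probabilities are hypothetical toy values.

```python
import math


def dominant_topic(theta):
    """Index of the most probable topic in a document's topic distribution."""
    return max(range(len(theta)), key=lambda k: theta[k])


def sequence_log_likelihood(obs, start, trans, emit):
    """Log P(obs) under a discrete HMM, computed with the scaled forward algorithm."""
    if not obs:
        return 0.0
    n = len(start)
    # alpha[s] is proportional to P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    total = sum(alpha)
    return log_lik + math.log(total) if total > 0.0 else float("-inf")


def assign_case(theta, cases):
    """Pick the sequence whose HMM best predicts the new document's topic.

    Assumes every history has non-zero likelihood under its own HMM.
    """
    topic = dominant_topic(theta)

    def gain(name):
        history, hmm = cases[name]
        # log P(new topic | history) = log P(history + new) - log P(history)
        return (sequence_log_likelihood(history + [topic], *hmm)
                - sequence_log_likelihood(history, *hmm))

    return max(cases, key=gain)


# Toy setup (assumed values): 2 hidden states, 3 topics, 2 document sequences.
START = [0.5, 0.5]
TRANS = [[0.8, 0.2], [0.2, 0.8]]
EMIT_A = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1]]  # sequence A favours topics 0 and 1
EMIT_B = [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]  # sequence B favours topic 2
CASES = {
    "A": ([0, 1, 1], (START, TRANS, EMIT_A)),
    "B": ([2, 2], (START, TRANS, EMIT_B)),
}

print(assign_case([0.1, 0.2, 0.7], CASES))
```

In this toy setup the new document's dominant topic is 2, which sequence B's model predicts with much higher probability than sequence A's, so it is assigned to B. Scoring by the conditional log-likelihood of the new document, rather than by the likelihood of the whole extended sequence, keeps sequences of different lengths comparable.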

Cite

Zadrożny, S., Kacprzyk, J., & Gajewski, M. (2016). A solution of the multiaspect text categorization problem by a hybrid HMM and LDA based technique. In Communications in Computer and Information Science (Vol. 610, pp. 214–225). Springer Verlag. https://doi.org/10.1007/978-3-319-40596-4_19
