Many classification problems, such as text classification, must cope with the high dimensionality of structured document representations. The sheer size of the data makes the computations burdensome, so there is a strong need to reduce the amount of information handled in order to improve the classification process. In this paper, we propose a dimensionality reduction technique for text datasets that uses a clustering method to group documents and a simple Hidden Markov Model to represent them. We applied the new method to the OHSUMED benchmark text corpora using k-NN and SVM classifiers. The results are very satisfactory and demonstrate the suitability of the proposed technique for dimensionality reduction and document classification.
CITATION STYLE
Vieira, A. S., Iglesias, E. L., & Borrajo, L. (2015). A new dimensionality reduction technique based on HMM for boosting document classification. In Advances in Intelligent Systems and Computing (Vol. 375, pp. 69–77). Springer Verlag. https://doi.org/10.1007/978-3-319-19776-0_8