Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, a representative document representation model, is simple and effective. However, the traditional BoW model produces sparse vectors and does not capture latent semantic relations. In this paper, to address these problems, we propose two tolerance rough set-based BoW models, called TRBoW1 and TRBoW2, which differ in how they calculate weights. Unlike popular supervised representation methods, they are unsupervised and require no prior knowledge. By extending each document to its upper approximation with TRBoW1 or TRBoW2, the semantic relations among documents are mined and the document vectors become denser. Comparative experiments against various document representation methods for text classification on different datasets show that our methods achieve the best performance.
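The idea of extending a document to its upper approximation can be illustrated with a minimal sketch. It assumes the commonly used tolerance rough set formulation, in which the tolerance class of a term contains the term itself plus every term that co-occurs with it in at least `theta` documents, and the upper approximation of a document is the set of terms whose tolerance class intersects the document. The function names, the threshold `theta`, and the toy corpus are illustrative assumptions, and the sketch works on term sets only: it does not reproduce the TRBoW1 or TRBoW2 weight calculations, which are not given here.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class (assumed definition): the term
    itself plus every term co-occurring with it in at least `theta` docs."""
    cooc = Counter()
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class intersects the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Hypothetical toy corpus, for illustration only.
docs = [
    ["rough", "set", "model"],
    ["rough", "set", "document"],
    ["rough", "vector"],
]
classes = tolerance_classes(docs, theta=2)
expanded = upper_approximation(docs[2], classes)
print(sorted(expanded))
```

In this toy corpus, "rough" and "set" co-occur in two documents, so the third document, which contains "rough" but not "set", gains "set" in its upper approximation. This is the sense in which the expanded representation is denser than the original term set.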
CITATION STYLE
Qiu, D., Jiang, H., & Yan, R. (2020). Tolerance rough set-based bag-of-words model for document representation. International Journal of Computational Intelligence Systems, 13(1), 1218–1226. https://doi.org/10.2991/ijcis.d.200808.001