Tolerance rough set-based bag-of-words model for document representation

Dong Qiu; Haihuan Jiang; Ruiteng Yan

Journal Article, Open Access

International Journal of Computational Intelligence Systems (2020) 13(1) 1218-1226

DOI: 10.2991/ijcis.d.200808.001

2 Citations

11 Readers

Abstract

Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, a representative document representation model, is simple and effective. However, the traditional BoW model suffers from sparsity and lacks latent semantic relations. To solve these problems, we propose two tolerance rough set-based BoW models, called TRBoW1 and TRBoW2, which differ in their weight calculation methods. Unlike the popular supervised representation methods, they are unsupervised and require no prior knowledge. By extending each document to its upper approximation with TRBoW1 or TRBoW2, the semantic relations among documents are mined and the document vectors become denser. Comparative experiments against various document representation methods for text classification on different datasets have verified that our methods achieve the best performance.
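The idea of extending a document to its upper approximation can be illustrated with a minimal sketch. The abstract does not give the TRBoW1 or TRBoW2 weight formulas, so this sketch does not implement them. It assumes the commonly used tolerance rough set construction, in which two terms are tolerant of each other when they co-occur in at least `theta` documents, and it uses plain binary weights. The function names, the threshold `theta`, and the toy corpus are all hypothetical.

```python
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus every term it co-occurs with
    in at least `theta` documents (assumed tolerance relation)."""
    cooc = {}
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] = cooc.get((a, b), 0) + 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class overlaps the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def binary_vector(term_set, vocab):
    """Binary BoW vector over a fixed vocabulary order (not the paper's weights)."""
    return [1 if t in term_set else 0 for t in vocab]


# Toy corpus, invented for illustration only.
docs = [
    ["rough", "set", "tolerance"],
    ["rough", "set", "approximation"],
    ["rough", "document"],
    ["bag", "words", "document"],
    ["bag", "words", "vector"],
]

classes = tolerance_classes(docs, theta=2)
vocab = sorted(classes)

original = binary_vector(set(docs[2]), vocab)
extended = binary_vector(upper_approximation(docs[2], classes), vocab)
print(sum(original), sum(extended))
```

In this toy corpus, "rough" and "set" co-occur in two documents, so the third document, which contains "rough" but not "set", gains "set" in its upper approximation. Its vector therefore has more non-zero entries than the plain BoW vector, which is the densification effect the abstract describes.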

Cite

Qiu, D., Jiang, H., & Yan, R. (2020). Tolerance rough set-based bag-of-words model for document representation. International Journal of Computational Intelligence Systems, 13(1), 1218–1226. https://doi.org/10.2991/ijcis.d.200808.001
