Tolerance rough set-based bag-of-words model for document representation

Abstract

Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, a representative document representation model, is simple and effective. However, the traditional BoW model suffers from sparsity and lacks latent semantic relations. In this paper, to solve these problems, we propose two tolerance rough set-based BoW models, called TRBoW1 and TRBoW2, which differ in their weight calculation methods. Unlike the popular supervised representation methods, they are unsupervised and require no prior knowledge. Extending each document to its upper approximation with TRBoW1 or TRBoW2 mines the semantic relations among documents and makes the document vectors denser. Comparative experiments against various document representation methods for text classification on different datasets verify that our methods achieve the best performance.
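The idea of extending a document to its upper approximation can be illustrated with a minimal sketch of the standard tolerance rough set construction over terms. This is not the paper's TRBoW1 or TRBoW2: the weight calculation methods are not given in the abstract and are left out here. The function names `tolerance_classes` and `upper_approximation`, the co-occurrence threshold `theta`, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents.
    (Hypothetical helper; `theta` is an assumed threshold parameter.)"""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class shares at least one term
    with the document. (Hypothetical helper; unweighted.)"""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Toy corpus, invented for illustration only.
docs = [
    ["rough", "set", "tolerance"],
    ["rough", "set", "approximation"],
    ["rough", "document"],
]
classes = tolerance_classes(docs, theta=2)
expanded = upper_approximation(docs[2], classes)
```

In this toy corpus, "rough" and "set" co-occur in two documents, so with `theta=2` the third document gains the term "set" even though it never contains it. That extra nonzero entry is the sense in which the expanded vectors become denser.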

Cite

Qiu, D., Jiang, H., & Yan, R. (2020). Tolerance rough set-based bag-of-words model for document representation. International Journal of Computational Intelligence Systems, 13(1), 1218–1226. https://doi.org/10.2991/ijcis.d.200808.001
