A significant obstacle to effective text classification for information filtering and retrieval (IF/IR) is the high dimensionality of the data. This paper proposes a technique based on Rough Set Theory to alleviate this problem. Given corpora of documents and a training set of classified example documents, the technique locates a minimal set of coordinate keywords that distinguishes between classes of documents, thereby reducing the dimensionality of the keyword vectors. This simplifies the creation of knowledge-based IF/IR systems, speeds up their operation, and makes the rule bases they employ easy to edit. The paper describes the proposed technique, discusses the integration of a keyword acquisition algorithm with a rough set-based dimensionality reduction algorithm, and presents experimental results from a proof-of-concept implementation.
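To make the idea of rough set-based keyword reduction concrete, the following is a minimal sketch and not the paper's implementation. It assumes documents are already encoded as binary keyword-presence vectors, and it uses a greedy forward selection driven by the rough-set dependency degree (the fraction of training documents whose class is fully determined by the chosen keywords). The function names `dependency` and `greedy_reduct`, the keyword list, and the toy data are all hypothetical. A greedy search of this kind yields a reduced keyword set but is not guaranteed to find the smallest one.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree of the class label on the attributes `attrs`.

    Rows are grouped into equivalence classes by their values on `attrs`;
    a class is "consistent" if all its rows share one label. The result is
    the fraction of rows that fall into consistent classes.
    """
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, labs in seen_labels.items() if len(labs) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Greedily add the attribute that most increases the dependency degree
    until it matches the dependency obtained with all attributes."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        remaining = [a for a in all_attrs if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen


# Hypothetical toy training set: 1 means the keyword occurs in the document.
keywords = ["rough", "set", "football", "goal"]
rows = [
    (1, 1, 0, 0),
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 1),
]
labels = ["ai", "ai", "sport", "sport"]

reduct = greedy_reduct(rows, labels)
print([keywords[i] for i in reduct])  # ['rough']
```

In this toy set the single keyword "rough" already separates the two classes, so the four-dimensional keyword vectors collapse to one dimension, and a classification rule over the reduced set stays short enough to inspect and edit by hand.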
CITATION STYLE
Chouchoulas, A., & Shen, Q. (1999). A rough set-based approach to text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1711, pp. 118–127). Springer Verlag. https://doi.org/10.1007/978-3-540-48061-7_16