Manipulating Large Corpora for Text Classification

Abstract

In this paper, we address the problem of handling a large collection of data and propose a text classification method that manipulates data using two well-known machine learning techniques, Naive Bayes (NB) and Support Vector Machines (SVMs). NB assumes that the words in a text are independent of one another, which makes it far more efficient to compute. SVMs, on the other hand, can handle large feature spaces, which makes better classification performance possible. The training data for the SVMs are extracted using NB classifiers according to the category hierarchies, which reduces the amount of computation needed for classification without sacrificing accuracy.
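
The word-independence assumption can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the toy documents, the add-one smoothing, and the function names train_nb, predict_nb, and select_pool are all assumptions made for illustration. In particular, select_pool shows only one possible reading of "extracting SVM training data with NB according to the category hierarchy": NB routes documents to a top-level category, and only the routed documents form the candidate pool for a classifier lower in the hierarchy.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs.

    Uses add-one (Laplace) smoothing, an assumption of this sketch.
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}

    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_like = {}
    for c in label_counts:
        total = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return log_prior, log_like, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior.

    Word independence means the score is just a sum of per-word terms,
    which is why NB is cheap to compute. Unknown words are skipped.
    """
    log_prior, log_like, vocab = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


def select_pool(model, token_lists, top_category):
    """Hypothetical selection step: keep documents NB routes to top_category."""
    return [t for t in token_lists if predict_nb(model, t) == top_category]


# Toy top-level training data (invented for this example).
TOP_LEVEL_DOCS = [
    ("goal match team win".split(), "sports"),
    ("team coach match season".split(), "sports"),
    ("stock market price fall".split(), "finance"),
    ("bank rate market rise".split(), "finance"),
]
model = train_nb(TOP_LEVEL_DOCS)

UNROUTED = [
    "team match goal".split(),
    "market price rate".split(),
    "coach season win".split(),
]
sports_pool = select_pool(model, UNROUTED, "sports")
print(len(sports_pool), "of", len(UNROUTED), "documents kept for the sports node")
```

Under this reading, a more expensive classifier such as an SVM would be trained only on the reduced pool at each node of the hierarchy rather than on the full collection, which is where the computational saving would come from.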

Fukumoto, F., & Suzuki, Y. (2002). Manipulating Large Corpora for Text Classification. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (pp. 196–203). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1118693.1118719
