Manipulating Large Corpora for Text Classification

Fumiyo Fukumoto; Yoshimi Suzuki

Conference Proceedings

Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (2002) 196-203

DOI: 10.3115/1118693.1118719

Abstract

In this paper, we address the problem of handling a large collection of data and propose a text classification method that manipulates the data using two well-known machine learning techniques, Naive Bayes (NB) and Support Vector Machines (SVMs). NB assumes that the words in a text are independent, which makes its computation far more efficient. SVMs, on the other hand, can handle large feature spaces, which allows them to achieve better performance. The training data for the SVMs are extracted using NB classifiers according to the category hierarchies, which reduces the amount of computation needed for classification without sacrificing accuracy.
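
The abstract does not spell out how the NB classifiers select the SVM training data, so the following is only a minimal sketch under one assumed reading: an NB classifier trained on the top level of a two-level category hierarchy routes each document to a top-level category, and only the documents routed there are kept as training data for that category's subcategory classifier. The toy corpus, the function names (`train_nb`, `predict_nb`, `select_training_data`), the add-one smoothing, and the selection rule itself are all illustrative assumptions, not the authors' procedure. The SVM training step is omitted.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a multinomial NB model from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def select_training_data(model, docs, top_category):
    """Assumed selection rule: keep documents that NB routes to top_category,
    paired with their subcategory label for a downstream classifier."""
    return [(tokens, sub) for tokens, _top, sub in docs
            if predict_nb(model, tokens) == top_category]

# Hypothetical toy corpus: (tokens, top-level category, subcategory)
corpus = [
    (["stock", "market", "shares"], "economy", "markets"),
    (["bank", "interest", "rate"], "economy", "banking"),
    (["stock", "bank", "profit"], "economy", "markets"),
    (["match", "goal", "team"], "sports", "soccer"),
    (["team", "race", "win"], "sports", "racing"),
    (["goal", "team", "win"], "sports", "soccer"),
]

top_model = train_nb([(tokens, top) for tokens, top, _sub in corpus])
economy_subset = select_training_data(top_model, corpus, "economy")
```

Under this reading, `economy_subset` would be the reduced training set handed to an SVM that separates the economy subcategories, so the more expensive learner never sees the sports documents.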

Cite

Fukumoto, F., & Suzuki, Y. (2002). Manipulating Large Corpora for Text Classification. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (pp. 196–203). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1118693.1118719
