Improving generalization by data categorization


Abstract

Most learning algorithms treat all examples in the training set equally. Some examples, however, carry more reliable or critical information about the target than others, and some may carry wrong information. According to their intrinsic margin, examples can be grouped into three categories: typical, critical, and noisy. We propose three methods, namely the selection cost, the SVM confidence margin, and the AdaBoost data weight, to automatically group training examples into these three categories. Experimental results on artificial datasets show that, although the three methods are quite different in nature, they give similar and reasonable categorizations. Results on real-world datasets further demonstrate that treating the three data categories differently in learning can improve generalization. © Springer-Verlag Berlin Heidelberg 2005.
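
As a rough illustration of grouping examples by a margin-like score, the sketch below sorts per-example scores into the three categories named in the abstract. It is not the authors' procedure: the function name `categorize_by_margin`, the `critical_threshold` parameter, the rule that a negative score means "noisy", and the example scores are all assumptions made for illustration. In practice the scores would come from one of the three estimates the abstract lists (selection cost, SVM confidence margin, or AdaBoost data weight), which are not reproduced here.

```python
from typing import List, Sequence


def categorize_by_margin(
    margins: Sequence[float], critical_threshold: float = 0.5
) -> List[str]:
    """Assign each example a category from a margin-like score.

    Assumed rule (hypothetical, not taken from the paper):
      score < 0                       -> "noisy"    (label looks wrong)
      0 <= score < critical_threshold -> "critical" (close to the boundary)
      score >= critical_threshold     -> "typical"  (far from the boundary)
    """
    labels = []
    for m in margins:
        if m < 0:
            labels.append("noisy")
        elif m < critical_threshold:
            labels.append("critical")
        else:
            labels.append("typical")
    return labels


# Made-up scores for five training examples.
example_margins = [1.8, 0.2, -0.7, 0.9, 0.0]
example_labels = categorize_by_margin(example_margins)
print(example_labels)
```

One way to then "treat the categories differently", again only as an assumption, would be to drop the examples labelled noisy and give extra weight to those labelled critical before retraining.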

Citation (APA)

Li, L., Pratap, A., Lin, H. T., & Abu-Mostafa, Y. S. (2005). Improving generalization by data categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3721 LNAI, pp. 157–168). Springer Verlag. https://doi.org/10.1007/11564126_19
