Boosting with data generation: Improving the classification of hard to learn examples

20 Citations · 12 Readers (Mendeley)

Abstract

An ensemble of classifiers consists of a set of individually trained classifiers whose predictions are combined to classify new instances. In particular, boosting is an ensemble method in which the performance of weak classifiers is improved by focusing on "hard examples" that are difficult to classify. Recent studies have indicated that boosting algorithms are applicable to a broad spectrum of problems with great success. However, boosting algorithms frequently suffer from over-emphasizing the hard examples, leading to poor training and test set accuracies. Moreover, the knowledge acquired from such hard examples may be insufficient to improve the overall accuracy of the ensemble. This paper describes a new algorithm that addresses these problems through data generation. In the DataBoost method, hard examples are identified during each iteration of the boosting algorithm. Subsequently, the hard examples are used to generate synthetic training data. These synthetic examples are added to the original training set and used for further training. The paper reports results of this approach on ten data sets, using both decision trees and neural networks as base classifiers. The experiments show promising results in terms of the overall accuracy obtained.
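The abstract only outlines the method, so the following Python sketch illustrates the general idea under stated assumptions: an AdaBoost-style loop in which the highest-weight examples are treated as "hard" and Gaussian-jittered copies of them are appended to the training set. The function names, the decision-stump weak learner, and the jitter-based generator are illustrative assumptions, not the paper's actual DataBoost procedure, which defines its own data-generation scheme.

```python
import numpy as np

def fit_stump(X, y, w):
    # Exhaustive search for the best weighted decision stump (labels in {-1, +1}).
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    return best

def predict_stump(stump, X):
    j, thr, sign = stump
    return sign * np.where(X[:, j] <= thr, 1, -1)

def databoost_sketch(X, y, n_rounds=5, hard_frac=0.2, n_synth=10,
                     noise=0.05, seed=0):
    # AdaBoost-style loop; after each round, the highest-weight ("hard")
    # examples seed noisy synthetic copies appended to the training set.
    rng = np.random.default_rng(seed)
    X, y = X.astype(float).copy(), y.copy()
    w = np.ones(len(X)) / len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        pred = predict_stump(stump, X)
        err = max(w[pred != y].sum(), 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        # Treat the top hard_frac highest-weight examples as "hard",
        # then generate jittered synthetic copies of them (assumption:
        # Gaussian noise stands in for the paper's generation step).
        k = max(1, int(hard_frac * len(X)))
        hard = np.argsort(w)[-k:]
        idx = rng.integers(0, k, size=n_synth)
        X_new = X[hard][idx] + rng.normal(0, noise, (n_synth, X.shape[1]))
        y_new = y[hard][idx]
        X = np.vstack([X, X_new])
        y = np.concatenate([y, y_new])
        w = np.concatenate([w, np.full(n_synth, w[hard].mean())])
        w /= w.sum()
    return ensemble

def predict_ensemble(ensemble, X):
    # Weighted vote over the boosted stumps.
    score = sum(a * predict_stump(s, X) for a, s in ensemble)
    return np.sign(score)
```

On a simple two-cluster problem, the ensemble separates the classes; the point of the sketch is only to show where the data-generation step slots into the boosting loop.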

Citation (APA)

Guo, H., & Viktor, H. L. (2004). Boosting with data generation: Improving the classification of hard to learn examples. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3029, pp. 1082–1091). Springer Verlag. https://doi.org/10.1007/978-3-540-24677-0_111
