Cost-based sampling of individual instances

Abstract

In many practical domains, misclassification costs can differ greatly and may be represented by class ratios; however, most learning algorithms struggle with skewed class distributions. The difficulty is attributed to classifiers being designed to maximize accuracy. Researchers have recommended several techniques to address this problem, including under-sampling the majority class, employing a probabilistic algorithm, and adjusting the classification threshold. In this paper, we propose a general sampling approach that assigns weights to individual instances according to the cost function. This approach helps reveal the relationship between classification performance and class ratios, and allows the identification of an appropriate class distribution for which the learning method achieves reasonable performance on the data. Our results show that combining an ensemble of Naive Bayes classifiers with threshold selection and under-sampling techniques works well for imbalanced data. © 2009 Springer Berlin Heidelberg.
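
The abstract does not spell out the weighting scheme, so what follows is only a minimal sketch of one well-known way to turn misclassification costs into a per-instance sampling rule: cost-proportionate rejection sampling. It is not necessarily the authors' method. The sketch assumes a binary problem with label 1 as the positive (minority) class, and the helper names `instance_weights`, `cost_proportionate_sample` and `cost_threshold` are hypothetical. Each instance is kept with probability equal to its misclassification cost divided by the largest cost. As a result, the expected negative-to-positive ratio in the sample becomes the original ratio multiplied by cost_fp / cost_fn, which is one concrete sense in which a cost function can be expressed as a class ratio. The threshold helper gives the standard cost-minimising decision threshold for a calibrated probabilistic classifier.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def instance_weights(labels: Sequence[int], cost_fn: float, cost_fp: float) -> List[float]:
    """Hypothetical helper: weight each instance by the cost of misclassifying it.

    A positive instance (label 1) costs cost_fn if it is predicted negative;
    a negative instance (label 0) costs cost_fp if it is predicted positive.
    """
    return [cost_fn if y == 1 else cost_fp for y in labels]


def cost_proportionate_sample(
    data: Sequence[X],
    labels: Sequence[int],
    cost_fn: float,
    cost_fp: float,
    seed: int = 0,
) -> List[Tuple[X, int]]:
    """Hypothetical helper: cost-proportionate rejection sampling.

    Each instance is accepted with probability weight / max_weight, so the
    highest-cost class is kept in full and cheaper classes are under-sampled.
    """
    weights = instance_weights(labels, cost_fn, cost_fp)
    z = max(weights)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [
        (x, y)
        for x, y, w in zip(data, labels, weights)
        if rng.random() < w / z  # random() is in [0, 1), so w == z always passes
    ]


def cost_threshold(cost_fn: float, cost_fp: float) -> float:
    """Threshold on P(y=1 | x) that minimises expected cost.

    Predicting positive is cheaper when (1 - p) * cost_fp <= p * cost_fn,
    i.e. when p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    # Toy data: 90 negatives and 10 positives; a missed positive costs 9x more.
    labels = [0] * 90 + [1] * 10
    data = list(range(len(labels)))
    sample = cost_proportionate_sample(data, labels, cost_fn=9.0, cost_fp=1.0)
    n_pos = sum(y for _, y in sample)
    print("positives kept:", n_pos, "negatives kept:", len(sample) - n_pos)
    print("decision threshold:", cost_threshold(cost_fn=9.0, cost_fp=1.0))
```

With these toy costs every positive is kept and each negative survives with probability 1/9, so roughly ten negatives remain on average (the exact count depends on the seed). Re-sampling and threshold shifting are two ways of encoding the same cost asymmetry; if the sample already encodes the costs, applying the analytical threshold as well would normally count that asymmetry twice.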

Citation (APA)

Klement, W., Flach, P., Japkowicz, N., & Matwin, S. (2009). Cost-based sampling of individual instances. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5549 LNAI, pp. 86–97). https://doi.org/10.1007/978-3-642-01818-3_11