Data Mining for Imbalanced Datasets: An Overview

  • Chawla N

Abstract

A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally, the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating the performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this chapter, we discuss some of the sampling techniques used for balancing datasets, and the performance measures more appropriate for mining imbalanced datasets.
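
The abstract's point about predictive accuracy can be illustrated with a minimal sketch. It is not taken from the chapter: the 990-to-10 class split, the always-predict-majority classifier, and the helper names `accuracy` and `recall` are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the data, the trivial classifier, and the helper
# names below are hypothetical and do not come from the chapter.

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class examples that were predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Assumed imbalanced dataset: 990 majority-class (0) and 10 minority-class (1) examples.
y_true = [0] * 990 + [1] * 10

# A trivial classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

majority_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred)

print(f"accuracy: {majority_accuracy:.2f}")
print(f"minority recall: {minority_recall:.2f}")
```

Under these assumptions the classifier scores high accuracy while never identifying a single minority-class example, which is the kind of case where a measure other than accuracy would be more informative.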

Cite

Chawla, N. V. (2006). Data Mining for Imbalanced Datasets: An Overview. In Data Mining and Knowledge Discovery Handbook (pp. 853–867). Springer-Verlag. https://doi.org/10.1007/0-387-25465-x_40
