Applying Support Vector Machines to Imbalanced Datasets

  • Akbani R
  • Kwek S
  • Japkowicz N

Abstract

Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However, the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets in which negative instances heavily outnumber the positive instances (e.g. in gene profiling and detecting credit card fraud). This paper discusses the factors behind this failure and explains why the common strategy of undersampling the training data may not be the best choice for SVM. We then propose an algorithm for overcoming these problems which is based on a variant of the SMOTE algorithm by Chawla et al., combined with Veropoulos et al.'s different error costs algorithm. We compare the performance of our algorithm against these two algorithms, along with undersampling and regular SVM, and show that our algorithm outperforms all of them.
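
As a rough illustration of the two ingredients named in the abstract, the sketch below shows SMOTE-style oversampling (synthetic positives interpolated between a minority point and one of its nearest minority neighbours) and a simple ratio for weighting errors on the two classes differently. It is a minimal sketch and not the authors' implementation: the function names `smote_oversample` and `error_cost_ratio`, the default `k=5`, and the choice of setting the cost ratio to the class imbalance ratio are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    (Hypothetical helper; names and defaults are illustrative assumptions.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def error_cost_ratio(n_negative, n_positive):
    """Ratio C+/C- of misclassification penalties for the two classes.

    Assumption for illustration: weight positive-class errors by the class
    imbalance ratio. The abstract does not state how the costs are chosen.
    """
    return n_negative / n_positive


if __name__ == "__main__":
    positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    extra = smote_oversample(positives, n_new=6, k=2, seed=1)
    print(len(extra), "synthetic positives generated")
    print("cost ratio:", error_cost_ratio(n_negative=90, n_positive=10))
```

The oversampled positives would then be added to the training set, and the cost ratio passed to an SVM that accepts per-class penalties, for example through the `class_weight` parameter of scikit-learn's `SVC`. Which SVM implementation and settings the authors used is not stated in the abstract.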

Authors

  • Rehan Akbani

  • Stephen Kwek

  • Nathalie Japkowicz
