An improved algorithm for SVMs classification of imbalanced data sets

Cristiano Leite Castro; Mateus Araujo Carvalho; Antônio Padua Braga

Conference Proceedings

Communications in Computer and Information Science (2009) 43 CCIS 108-118

DOI: 10.1007/978-3-642-03969-0_11

Abstract

Support Vector Machines (SVMs) have strong theoretical foundations and have shown excellent empirical results in many pattern recognition and data mining applications. However, when trained on imbalanced data sets, where examples of the target (minority) class are outnumbered by examples of the non-target (majority) class, SVM classifiers perform less well. Small and heavily imbalanced data sets are common in, for instance, medical diagnosis and text classification. In this paper, we propose the Boundary Elimination and Domination (BED) algorithm to improve SVM class-prediction accuracy in applications with imbalanced class distributions. BED is an informative resampling strategy that operates in input space. To balance the class distributions, the algorithm uses density information from the training set to remove noisy examples of the majority class and to generate new synthetic examples of the minority class. In our experiments, we compared BED with the original SVM and with the Synthetic Minority Oversampling Technique (SMOTE), a popular resampling strategy in the literature. Our results show that the new approach improves SVM classifier performance on several real-world imbalanced problems. © 2009 Springer-Verlag.
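The two resampling ideas named in the abstract (removing majority examples that sit in minority-dominated neighbourhoods, and interpolating new minority examples) can be illustrated with a rough sketch. This is not the BED algorithm as published: the neighbourhood-fraction rule, the threshold `max_minority_frac`, the function names, and the toy data are all assumptions made for illustration, and the oversampling step is a plain SMOTE-style interpolation. It also assumes at least two minority examples.

```python
import math
import random

def k_nearest(idx, X, k):
    """Indices of the k nearest neighbours of X[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]

def drop_noisy_majority(X, y, k=3, max_minority_frac=0.5, majority=0):
    """Discard majority examples whose k-neighbourhood is mostly the other class.

    The threshold rule is a hypothetical stand-in for a density criterion.
    """
    keep = []
    for i, label in enumerate(y):
        if label == majority:
            nbrs = k_nearest(i, X, k)
            frac = sum(y[j] != majority for j in nbrs) / max(len(nbrs), 1)
            if frac > max_minority_frac:
                continue  # treated as noise and removed
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def oversample_minority(X, y, n_new, k=3, minority=1, seed=0):
    """SMOTE-style step: interpolate between a minority point and a minority neighbour."""
    rng = random.Random(seed)
    P = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(P))
        j = rng.choice(k_nearest(i, P, min(k, len(P) - 1)))
        t = rng.random()  # position along the segment from P[i] to P[j]
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(P[i], P[j])))
    return X + synthetic, y + [minority] * n_new

# Toy data: six majority points near the origin, three minority points
# near (2, 2), and one majority point placed inside the minority cluster.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
     (2.0, 2.0), (2.2, 2.1), (2.1, 2.3), (2.1, 2.1)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

X_clean, y_clean = drop_noisy_majority(X, y)
n_needed = y_clean.count(0) - y_clean.count(1)
X_bal, y_bal = oversample_minority(X_clean, y_clean, n_needed)
print(y_bal.count(0), y_bal.count(1))
```

On this toy set the embedded majority point is removed and three synthetic minority points are added, so both classes end with six examples. The balanced set could then be passed to any SVM trainer; the choice of `k` and of the threshold would need tuning on real data.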

Cite (APA)

Castro, C. L., Carvalho, M. A., & Braga, A. P. (2009). An improved algorithm for SVMs classification of imbalanced data sets. In Communications in Computer and Information Science (Vol. 43 CCIS, pp. 108–118). https://doi.org/10.1007/978-3-642-03969-0_11
