Impact of imbalanced data on the performance of software defect prediction classifiers

Lichao Wang; Wei Wang; Bingyou Liu; Shuqiao Geng

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2019) 1345(2)

DOI: 10.1088/1742-6596/1345/2/022026

Abstract

Software defect prediction plays an important role in analysing software quality and balancing software cost. However, project managers and software engineers have little guidance on which classifier to select. First, a method for constructing data sets with an imbalanced class distribution is proposed. Then, the Matthews correlation coefficient is used to measure the performance of different classifiers, and the coefficient of variation is utilised to evaluate the stability of classifiers on imbalanced data. Finally, an experiment is conducted on 8 common classifiers and 12 publicly available, widely used data sets. The results show that NaiveBayes remains stable when the imbalance rate of the data sets changes significantly. The experimental results provide a basis for project managers and software engineers to select classifiers.
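
As a rough illustration of the two measures named in the abstract, the sketch below computes the Matthews correlation coefficient from confusion-matrix counts and the coefficient of variation over a list of scores. The function names, the use of the population standard deviation, and the convention of returning 0 when the MCC denominator is zero are assumptions made for this sketch, not details taken from the paper.

```python
import math
from statistics import mean, pstdev


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Assumption: return 0.0 when any marginal total is zero, which
    makes the denominator zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def coefficient_of_variation(scores):
    """Standard deviation divided by the absolute mean of the scores.

    Assumption: population standard deviation (pstdev); the paper may
    use the sample form instead. A lower value means the scores vary
    less relative to their mean.
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(scores) / abs(m)
```

Under these assumptions, one could compute a classifier's MCC at each imbalance rate and then pass the resulting list to coefficient_of_variation; a smaller value would indicate that the classifier's performance changes less as the imbalance rate varies.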

Cite

Wang, L., Wang, W., Liu, B., & Geng, S. (2019). Impact of imbalanced data on the performance of software defect prediction classifiers. In Journal of Physics: Conference Series (Vol. 1345). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1345/2/022026
