Software defect prediction plays an important role in analysing software quality and balancing software cost. However, existing work offers project managers and software engineers little guidance on selecting classifiers. First, a method for constructing data sets with imbalanced class distributions is proposed. Then, the Matthews correlation coefficient (MCC) is used to measure the performance of different classifiers, and the coefficient of variation is utilised to evaluate the stability of each classifier as the imbalance rate changes. Finally, an experiment is conducted on 8 common classifiers and 12 publicly available, widely used data sets. Results show that NaiveBayes remains stable when the imbalance rate of the data sets changes significantly. The experimental results provide a basis for project managers and software engineers to select classifiers.
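The two measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the sample standard deviation is an assumption (the abstract does not say whether the sample or population form is used), and returning 0 for an undefined MCC is only a common convention.

```python
import math
import statistics


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumption: treat the undefined case (an empty row or column
        # of the confusion matrix) as 0, a common convention.
        return 0.0
    return (tp * tn - fp * fn) / denom


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean.

    Assumption: the sample (n - 1) form is used here; the abstract
    does not specify which form the authors chose.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / abs(mean)


if __name__ == "__main__":
    # Hypothetical confusion matrices (tp, tn, fp, fn) for one classifier
    # evaluated at three imbalance rates. These are illustrative numbers,
    # not results from the paper.
    runs = [(40, 45, 5, 10), (20, 70, 5, 5), (8, 85, 4, 3)]
    scores = [mcc(*run) for run in runs]
    print("MCC per imbalance rate:", [round(s, 3) for s in scores])
    print("Coefficient of variation:", round(coefficient_of_variation(scores), 3))
```

Under this reading, a lower coefficient of variation across imbalance rates means the classifier's MCC changes less as the class distribution shifts, which is the sense in which the abstract describes a classifier as stable.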
CITATION
Wang, L., Wang, W., Liu, B., & Geng, S. (2019). Impact of imbalanced data on the performance of software defect prediction classifiers. In Journal of Physics: Conference Series (Vol. 1345). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1345/2/022026