Abstract
Software Defect Prediction (SDP) is crucial for ensuring software quality. However, class imbalance (CI) poses a significant challenge in predictive modeling. This study introduces a novel approach that employs the Synthetic Data Vault (SDV) to tackle CI in Cross-Project Defect Prediction (CPDP). Methodologically, the study addresses CI across multiple datasets (ReLink, MDP, and PROMISE) by using SDV to augment the minority classes. Classification is performed with Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF), and model performance is evaluated using AUC and the t-test. The results consistently show that SDV outperforms SMOTE and other techniques across projects, with statistically significant improvements. KNN achieves the highest average AUC, with values of 0.695, 0.704, and 0.750. On ReLink, KNN shows a 16.06% improvement over the imbalanced data and 12.84% over SMOTE. Similarly, on MDP, KNN shows a 20.71% improvement over the imbalanced data and 10.16% over SMOTE. On PROMISE, KNN shows a 13.55% improvement over the imbalanced data and 7.01% over SMOTE. RF displays moderate performance, closely followed by LR and DT, while NB lags behind. Overall, SDV yields an improvement of 10.10% over the imbalanced data and 7.54% over SMOTE. The t-test results confirm the statistical significance of these findings, with all p-values below the 0.05 threshold. In practice, these results indicate that adopting SDV for CI mitigation in defect detection, particularly with KNN as the classifier, is a promising way to enhance software quality by improving predictive modeling outcomes.
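The evaluation described above rests on AUC and on percentage improvements between AUC values. The following is a minimal, dependency-free sketch of both, not the authors' code. It computes ROC AUC through its Mann-Whitney formulation: the probability that a randomly chosen defective module (label 1) receives a higher score than a randomly chosen clean module (label 0), with ties counted as one half. The abstract does not say how its percentages are defined, so `relative_improvement` assumes a relative change, (new - baseline) / baseline * 100. Both function names are hypothetical.

```python
from typing import Sequence


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    Counts, over all (defective, clean) pairs, how often the defective
    module is scored higher; ties contribute 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage change of `new` over `baseline`.

    ASSUMPTION: the paper's "% improvement" is taken to be a relative
    change; it could instead be an absolute difference in AUC points.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical labels and classifier scores, not data from the study.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auc_score(labels, scores))  # 0.75
    print(round(relative_improvement(0.75, 0.60), 2))  # 25.0
```

The pairwise loop is O(n·m) in the number of defective and clean modules, which keeps the definition visible; a rank-based computation (or a library routine) scales better on full defect datasets.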
Cite
Nabella, P., Herteno, R., Saputro, S. W., Faisal, M. R., & Abadi, F. (2024). Impact of a Synthetic Data Vault for Imbalanced Class in Cross-Project Defect Prediction. Journal of Electronics, Electromedical Engineering, and Medical Informatics, 6(2), 219–230. https://doi.org/10.35882/jeeemi.v6i2.409