Abstract
Software defect prediction is difficult when labelled defect samples are scarce and the classes are imbalanced. To address both problems, this paper proposes a semi-supervised software defect prediction model that combines feature normalization, oversampling, and the Tri-training algorithm. First, feature normalization is applied to the feature data to eliminate the influence of excessively large or small feature values on the model's classification performance. Second, oversampling is used to expand the labelled data, which addresses the class imbalance among labelled samples. Finally, the Tri-training algorithm learns from the training samples and establishes a defect prediction model. The novelty of this model is that it effectively combines feature normalization, oversampling, and the Tri-training algorithm to solve both the insufficient-label problem and the class imbalance problem. Simulation experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
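The three stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the normalization formula, the oversampling technique, or the base classifier, so min-max scaling, random duplication of minority samples, and a nearest-centroid learner are all assumptions chosen for brevity. The Tri-training loop is also simplified: it keeps the core rule (a classifier receives an unlabelled sample when the other two classifiers agree on its label) but omits the error-rate checks of the original algorithm. All function names and the toy metric values are hypothetical.

```python
import random

def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until every class has equal size."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        X_out += items + extra
        y_out += [label] * target
    return X_out, y_out

class NearestCentroid:
    """Stand-in base learner (an assumption): predicts the closest class centroid."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
            counts[yi] = counts.get(yi, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)

def tri_train(X_lab, y_lab, X_unlab, rounds=5, seed=0):
    """Simplified Tri-training: no error-rate checks, fixed number of rounds."""
    rng = random.Random(seed)
    n = len(X_lab)
    models = []
    for _ in range(3):  # three learners, each fitted on a bootstrap sample
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(NearestCentroid().fit([X_lab[i] for i in idx],
                                            [y_lab[i] for i in idx]))
    for _ in range(rounds):
        refreshed = []
        for i in range(3):
            a, b = [m for j, m in enumerate(models) if j != i]
            X_new, y_new = list(X_lab), list(y_lab)
            for x in X_unlab:
                guess = a.predict(x)
                if guess == b.predict(x):  # the two peers agree: pseudo-label x
                    X_new.append(x)
                    y_new.append(guess)
            refreshed.append(NearestCentroid().fit(X_new, y_new))
        models = refreshed
    return models

def predict_majority(models, x):
    """Final prediction is the majority vote of the three learners."""
    votes = [m.predict(x) for m in models]
    return max(set(votes), key=votes.count)

# Toy data (hypothetical): [lines of code, cyclomatic complexity] per module.
raw = [[12, 1], [30, 2], [25, 3], [40, 2], [18, 1], [50, 4],  # labelled, clean
       [420, 25], [480, 30],                                   # labelled, defective
       [22, 2], [35, 3], [450, 28], [400, 22]]                 # unlabelled
labels = [0] * 6 + [1] * 2  # 0 = clean, 1 = defective

scaled = min_max_normalize(raw)                  # step 1: normalization
X_lab, X_unlab = scaled[:8], scaled[8:]
X_bal, y_bal = random_oversample(X_lab, labels, random.Random(0))  # step 2
models = tri_train(X_bal, y_bal, X_unlab)        # step 3: Tri-training
print(predict_majority(models, [0.9, 0.9]), predict_majority(models, [0.05, 0.05]))
```

In this sketch the labelled and unlabelled rows are normalized together so that both share one feature scale, and oversampling is applied only to the labelled rows because the unlabelled rows have no class to balance.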
Cite
Meng, F., Cheng, W., & Wang, J. (2021). Semi-supervised Software Defect Prediction Model Based on Tri-training. KSII Transactions on Internet and Information Systems, 15(11), 4028–4042. https://doi.org/10.3837/TIIS.2021.11.009