Semi-supervised Software Defect Prediction Model Based on Tri-training

Fanqi Meng; Wenying Cheng; Jingdong Wang

Journal Article, Open Access

KSII Transactions on Internet and Information Systems (2021) 15(11) 4028-4042

DOI: 10.3837/TIIS.2021.11.009

81 Citations

42 Readers

Abstract

Software defect prediction is difficult when labelled defect samples are scarce and the classes are imbalanced. To address both problems, this paper proposes a semi-supervised software defect prediction model that combines feature normalization, oversampling, and the Tri-training algorithm. First, feature normalization rescales the feature data so that unusually large or small feature values do not distort the model's classification performance. Second, oversampling expands the labelled data to correct the class imbalance among labelled samples. Finally, the Tri-training algorithm learns from the training samples, labelled and unlabelled, and builds the defect prediction model. The novelty of the model is that it combines feature normalization, oversampling, and Tri-training to address the shortage of labelled samples and the class imbalance at the same time. Experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
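
The three stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the normalization formula, the oversampling technique, or the base classifier, so min-max scaling, random duplication of minority samples, and a nearest-centroid learner are all assumptions chosen for brevity. The Tri-training loop is also simplified: it keeps the core idea (each classifier is retrained on the labelled set plus the unlabelled samples on which the other two classifiers agree) and omits the error-rate checks of the full algorithm. All function and class names are hypothetical.

```python
import random

def min_max_normalize(rows):
    """Scale every feature column to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority samples until classes are balanced."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        X_out.extend(samples + extra)
        y_out.extend([label] * target)
    return X_out, y_out

class NearestCentroid:
    """Tiny stand-in base learner: predict the class with the closest mean."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
            counts[yi] = counts.get(yi, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)

def tri_train(X_lab, y_lab, X_unlab, rng, rounds=5):
    """Simplified Tri-training (no error-rate checks)."""
    n = len(X_lab)
    models = []
    for _ in range(3):
        # Each initial classifier sees a different bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(NearestCentroid().fit([X_lab[i] for i in idx],
                                            [y_lab[i] for i in idx]))
    for _ in range(rounds):
        updated = []
        for i in range(3):
            a, b = [m for j, m in enumerate(models) if j != i]
            extra_X, extra_y = [], []
            for x in X_unlab:
                label = a.predict_one(x)
                if label == b.predict_one(x):
                    # The other two classifiers agree: use as a pseudo-label.
                    extra_X.append(x)
                    extra_y.append(label)
            updated.append(NearestCentroid().fit(X_lab + extra_X,
                                                 y_lab + extra_y))
        models = updated
    return models

def predict(models, x):
    """Majority vote of the three classifiers."""
    votes = [m.predict_one(x) for m in models]
    return max(set(votes), key=votes.count)
```

In use, the feature rows would first pass through `min_max_normalize`, the labelled part would then be balanced with `random_oversample`, and the balanced labelled set together with the unlabelled rows would be handed to `tri_train`. Passing a seeded `random.Random` instance keeps the bootstrap samples and the duplicated minority rows reproducible.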

Citation (APA)

Meng, F., Cheng, W., & Wang, J. (2021). Semi-supervised Software Defect Prediction Model Based on Tri-training. KSII Transactions on Internet and Information Systems, 15(11), 4028–4042. https://doi.org/10.3837/TIIS.2021.11.009