An empirical study on the equivalence and stability of feature selection for noisy software defect data

Abstract

Software Defect Data (SDD) are used to build defect prediction models for software quality assurance. Existing work employs feature selection to eliminate irrelevant features from the data and thereby improve prediction performance. Previous studies have shown that different feature selection methods do not always yield similar prediction performance on SDD, which indicates that these methods are not equivalent. Previous studies have also shown that SDD usually contain noise that may interfere with the feature selection process. In this work, we empirically investigate and measure the equivalence of different feature selection methods for SDD. We further analyze the stability of these methods on noisy SDD. We perform statistical analyses on eight projects from the NASA dataset with eight feature selection methods. For the equivalence analysis, we use Principal Component Analysis (PCA) to analyze the equivalence of these methods qualitatively and an overlap index to analyze it quantitatively. For the stability analysis, we apply a consistency index to measure the stability of these methods. Experimental results indicate that the feature selection methods are indeed not equivalent to each other, and that the Correlation and Fisher Score methods achieve better stability than the other methods studied.

Keywords: defect data; feature selection; equivalence analysis; stability analysis.
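The abstract names an overlap index and a consistency index but does not define either. As a rough illustration only, the sketch below assumes the overlap index is the Jaccard overlap of two selected feature subsets and the consistency index is Kuncheva's chance-corrected index for two equal-size subsets; the paper may use different definitions. All function names and feature names here are hypothetical.

```python
from itertools import combinations


def overlap_index(a, b):
    """Jaccard overlap |A & B| / |A | B| (an assumed definition)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consistency_index(a, b, n_features):
    """Kuncheva's consistency index for two subsets of equal size k.

    Assumed form: (r * n - k^2) / (k * (n - k)), where r = |A & B|
    and n is the total number of features. The value is 1 for
    identical subsets and about 0 for chance-level agreement.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_pairwise(index, subsets):
    """Average a pairwise index over all pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(index(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets chosen by one method on three noisy resamples,
# out of 10 candidate features in total.
runs = [{"loc", "cc", "ev"}, {"loc", "cc", "iv"}, {"loc", "cc", "ev"}]
stability = mean_pairwise(lambda a, b: consistency_index(a, b, 10), runs)
equivalence = overlap_index({"loc", "cc", "ev"}, {"loc", "iv", "n"})
```

Under these assumptions, averaging the consistency index over subsets selected from different noisy versions of the same project gives one stability score per method, while the overlap index compares the subsets chosen by two different methods on the same data.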

Cite

Xu, Z., Liu, J., Xia, Z., & Yuan, P. (2017). An empirical study on the equivalence and stability of feature selection for noisy software defect data. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (pp. 191–196). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2017-097
