Combining feature subset selection and data sampling for coping with highly imbalanced software data

Abstract

In software quality modeling, practitioners often ignore problems such as high dimensionality and class imbalance in data repositories. They use the available set of software metrics directly to build classification models, without regard to the condition of the underlying software measurement data, which degrades prediction performance and lengthens training time. In this study, we propose an approach that combines feature selection with data sampling to overcome these problems. Feature selection is the process of choosing a subset of relevant features so that the quality of prediction models is maintained or improved. Data sampling seeks a more balanced dataset through the addition or removal of instances. Combining these two techniques yields three different approaches: (1) sampling performed prior to feature selection, but retaining the unsampled data instances; (2) sampling performed prior to feature selection, retaining the sampled data instances; (3) sampling performed after feature selection. The empirical study was carried out on six datasets from a real-world software system. We employed a filter-based feature subset selection technique (one in which no learning algorithm is involved in the selection process), correlation-based feature selection, combined with random undersampling. The results demonstrate that sampling performed prior to feature selection while retaining the unsampled data instances (Approach 1) performs better than the other two approaches.
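
The three orderings described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, the feature filter is a simple top-k ranking by absolute Pearson correlation with the class label (a stand-in, not correlation-based feature selection proper), and undersampling to a 50:50 class ratio is an assumption, since the abstract does not state the target ratio.

```python
import random
from math import sqrt


def random_undersample(X, y, rng, minority=1):
    """Randomly drop majority-class rows until both classes are the same size.
    The 50:50 target ratio is an assumption made for this sketch."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n_keep = min(len(minority_idx), len(majority_idx))
    kept = sorted(minority_idx + rng.sample(majority_idx, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return 0.0 if var_a == 0 or var_b == 0 else cov / sqrt(var_a * var_b)


def select_features(X, y, k):
    """Stand-in filter: indices of the k features most correlated with the class.
    This is NOT CFS; it only plays the role of 'some filter-based selector'."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, features):
    """Keep only the selected feature columns."""
    return [[row[j] for j in features] for row in X]


def approach_1(X, y, k, rng):
    # Sample first, select features on the sampled data,
    # then keep the original (unsampled) instances for training.
    X_s, y_s = random_undersample(X, y, rng)
    features = select_features(X_s, y_s, k)
    return project(X, features), y, features


def approach_2(X, y, k, rng):
    # Sample first, select features on the sampled data,
    # and keep the sampled instances for training.
    X_s, y_s = random_undersample(X, y, rng)
    features = select_features(X_s, y_s, k)
    return project(X_s, features), y_s, features


def approach_3(X, y, k, rng):
    # Select features on the full data, then sample the reduced dataset.
    features = select_features(X, y, k)
    X_s, y_s = random_undersample(project(X, features), y, rng)
    return X_s, y_s, features
```

Each function returns the training rows, their labels, and the selected feature indices, so the same classifier can be fitted to the output of any of the three. The only difference between Approach 1 and Approach 2 is whether the unsampled or the sampled instances are passed on to training.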

Cite

Gao, K., Khoshgoftaar, T. M., & Napolitano, A. (2015). Combining feature subset selection and data sampling for coping with highly imbalanced software data. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (Vol. 2015-January, pp. 439–444). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2015-182
