This study examines the effect of undersampling on detecting outliers, with respect to the number of errors, in embedded software development projects. Our study aims to estimate the number of errors and the amount of effort in such projects. Because outliers can adversely affect these estimates, many estimation models exclude them. However, outliers can be identified in practice only once the projects have been completed; therefore, they should not be excluded while constructing models and estimating errors or effort. We have also attempted to detect outliers, but the classification accuracy was unacceptable because there were so few outliers. This problem is referred to as data imbalance. To address it, we explore rebalancing methods based on k-means cluster-based undersampling. These methods aim to increase the proportion of outliers that are correctly identified while keeping the other classification performance metrics high. The results of evaluation experiments show that the proposed methods improve the accuracy of detecting outliers; however, they also classify too many samples as outliers.
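To make the rebalancing idea concrete, here is a minimal sketch of one common variant of k-means cluster-based undersampling. It assumes that the majority class (normal projects) is clustered into as many clusters as there are minority samples (outlier projects), and that the real majority sample nearest to each centroid is kept. This is not necessarily the exact procedure used in the study. The names `kmeans` and `cluster_undersample`, the labels 0/1, and the toy data are hypothetical and chosen for illustration. The code uses only the Python standard library.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means with randomly chosen initial centroids; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids


def cluster_undersample(
    majority: List[Vector], minority: List[Vector], seed: int = 0
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: shrink the majority class (label 0) to len(minority)
    samples and return a balanced set; all minority samples (label 1) are kept."""
    k = len(minority)
    if k == 0 or k >= len(majority):
        kept = list(majority)  # nothing to undersample
    else:
        centroids = kmeans(majority, k, seed=seed)
        chosen: List[int] = []
        for c in centroids:
            # Keep the real majority sample closest to each centroid, without repeats.
            candidates = [i for i in range(len(majority)) if i not in chosen]
            chosen.append(min(candidates, key=lambda i: _sq_dist(majority[i], c)))
        kept = [majority[i] for i in chosen]
    X = kept + list(minority)
    y = [0] * len(kept) + [1] * len(minority)
    return X, y


# Toy usage: 40 "normal" projects vs. 3 "outlier" projects in a 2-D feature space.
rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
outliers = [(8.0, 8.0), (9.0, 7.5), (7.5, 9.0)]
X, y = cluster_undersample(normal, outliers)
print(len(X), y.count(0), y.count(1))  # 6 3 3
```

Keeping the sample nearest each centroid, rather than the centroid itself, means the balanced set contains only real projects while still covering the spread of the majority class. Because a classifier trained on this set sees as many outliers as normal projects, it may be more inclined to predict the outlier class. This is one plausible contributor to the over-flagging reported above.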
Iwata, K., Nakashima, T., Anan, Y., & Ishii, N. (2018). Detecting outliers in terms of errors in embedded software development projects using imbalanced data classification. Studies in Computational Intelligence, 726, 65–80. https://doi.org/10.1007/978-3-319-63618-4_6