Detecting outliers in terms of errors in embedded software development projects using imbalanced data classification

Kazunori Iwata; Toyoshiro Nakashima; Yoshiyuki Anan; Naohiro Ishii

Journal Article

Studies in Computational Intelligence (2018), vol. 726, pp. 65–80

DOI: 10.1007/978-3-319-63618-4_6

Abstract

This study examines the effect of undersampling on the detection of outliers, in terms of the number of errors, in embedded software development projects. Our broader aim is to estimate the number of errors and the amount of effort in such projects. Because outliers can adversely affect these estimates, many estimation models exclude them. However, outliers can in practice be identified only after a project has been completed; therefore, they should not be excluded when constructing models and estimating errors or effort. We have also attempted to detect outliers; however, the classification accuracy was unacceptable because outliers are few. This problem is known as data imbalance. To address it, we explore rebalancing methods based on k-means cluster-based undersampling. These methods aim to increase the proportion of outliers that are correctly identified while keeping the other classification performance metrics high. Evaluation experiments show that the proposed methods can improve the accuracy of outlier detection; however, they also classify too many samples as outliers.
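The abstract does not spell out the undersampling procedure. As a minimal sketch, assuming binary labels (0 = ordinary project, 1 = outlier), one common variant of k-means cluster-based undersampling clusters the majority class into as many clusters as there are outliers and keeps the real majority sample nearest to each centroid. The helper names `kmeans` and `cluster_undersample`, the choice of k, and the nearest-sample rule are illustrative assumptions, not necessarily the authors' method.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 50, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random initial centroids
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            idx = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            clusters[idx].append(p)
        # recompute each centroid as its cluster mean; keep the old one if the cluster is empty
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def cluster_undersample(X: Sequence[Point], y: Sequence[int], majority_label: int = 0, seed: int = 0):
    """Hypothetical helper: shrink the majority class to the minority (outlier) class size."""
    majority = [tuple(x) for x, lbl in zip(X, y) if lbl == majority_label]
    minority = [(tuple(x), lbl) for x, lbl in zip(X, y) if lbl != majority_label]
    k = len(minority)  # assumption: one cluster per minority (outlier) sample
    if k == 0 or k >= len(majority):
        return [tuple(x) for x in X], list(y)  # nothing to rebalance
    kept: List[Point] = []
    remaining = list(majority)
    for c in kmeans(majority, k, seed=seed):
        # assumption: keep the real majority sample closest to each centroid
        nearest = min(remaining, key=lambda p: _sq_dist(p, c))
        remaining.remove(nearest)
        kept.append(nearest)
    X_bal = kept + [x for x, _ in minority]
    y_bal = [majority_label] * len(kept) + [lbl for _, lbl in minority]
    return X_bal, y_bal
```

Because the majority class shrinks to the size of the outlier class, a classifier trained on the rebalanced data sees outliers far more often. This tends to raise the share of outliers detected at the cost of more false positives, which matches the trade-off the abstract reports.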

Citation (APA)

Iwata, K., Nakashima, T., Anan, Y., & Ishii, N. (2018). Detecting outliers in terms of errors in embedded software development projects using imbalanced data classification. Studies in Computational Intelligence, 726, 65–80. https://doi.org/10.1007/978-3-319-63618-4_6