Detecting outliers in terms of errors in embedded software development projects using imbalanced data classification

Abstract

This study examines the effect of undersampling on detecting outliers, defined by the number of errors, in embedded software development projects. Our broader aim is to estimate the number of errors and the amount of effort in such projects. Because outliers can degrade these estimates, many estimation models exclude them. However, such outliers can be identified in practice only once the projects have been completed; therefore, they should not be excluded when constructing models and estimating errors or effort. We have also attempted to detect outliers, but the classification accuracy was not acceptable because the number of outliers was small, a problem referred to as data imbalance. To address this problem, we explore rebalancing methods based on k-means cluster-based undersampling. These methods aim to increase the proportion of outliers that are correctly identified while keeping the other classification performance metrics high. Evaluation experiments show that the proposed methods can improve the accuracy of detecting outliers; however, they also classify too many samples as outliers.
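The abstract does not spell out how the k-means cluster-based undersampling is carried out, so the following is only a minimal sketch of one common variant, not the authors' procedure. It assumes the majority class (non-outlier projects) is clustered into as many clusters as there are outliers, and that each cluster is represented by the real sample nearest its centroid. The function names (`sq_dist`, `kmeans`, `undersample_majority`) and the toy feature vectors are hypothetical and chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[i])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


def undersample_majority(majority, minority, seed=0):
    """Reduce the majority class to at most len(minority) representatives."""
    k = min(len(minority), len(majority))
    centroids = kmeans(majority, k, seed=seed)
    chosen = []
    for c in centroids:
        # Keep a real sample (the one nearest the centroid), not the centroid itself.
        nearest = min(majority, key=lambda p: sq_dist(p, c))
        if nearest not in chosen:
            chosen.append(nearest)
    return chosen


# Toy data (made up): 8 non-outlier projects, 3 outlier projects.
normal = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0),
          (5.1, 4.9), (4.8, 5.2), (9.0, 1.0), (9.2, 1.1)]
outliers = [(20.0, 20.0), (22.0, 18.0), (19.0, 23.0)]

kept = undersample_majority(normal, outliers)
print(len(normal), "->", len(kept), "majority samples;", len(outliers), "outliers")
```

A classifier would then be trained on `kept` together with `outliers`, so the two classes are roughly balanced. Removing majority samples tends to move the decision boundary toward the majority class, which is consistent with the trade-off the abstract reports: more outliers are found, but more non-outliers are flagged as well.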

Cite

Iwata, K., Nakashima, T., Anan, Y., & Ishii, N. (2018). Detecting outliers in terms of errors in embedded software development projects using imbalanced data classification. Studies in Computational Intelligence, 726, 65–80. https://doi.org/10.1007/978-3-319-63618-4_6
