Estimation by analogy (EBA) is one of the most attractive software effort development estimation techniques. However, one of the critical issues when using EBA is the occurrence of missing data (MD) in the historical data sets. The absence of values of several relevant software attributes is a frequent phenomenon that may cause inaccurate EBA estimations. The MD can be numerical and/or categorical. This paper evaluates four MD techniques (toleration, deletion, k-nearest neighbors (KNN) imputation and support vector regression (SVR) imputation) over four mixed data sets. A total of 432 experiments were conducted involving four MD techniques, nine MD percentages (from 100% to 90%), three missingness mechanisms (MCAR: Missing Completely at Random, MAR: Missing at Random and NIM: Non-Ignorable Missing) and four data sets. The evaluation process consists of four steps and uses several accuracy measures such as standardized accuracy (SA) and prediction level (Pred). The results suggest that EBA with imputation techniques achieved significantly better SA values over EBA with toleration or deletion regardless of the mechanism of missingness. Moreover, no particular MD imputation technique outperformed the other techniques overall. However, according to Pred and other accuracy criteria, EBA with SVR was the best, followed by KNN imputation; we also found that toleration instead of deletion improves the accuracy of EBA.
CITATION STYLE
Abnane, I., & Idri, A. (2018). Improved analogy-based effort estimation with incomplete mixed data. In Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, FedCSIS 2018 (pp. 1015–1024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.15439/2018F95
Mendeley helps you to discover research relevant for your work.