Evaluating machine learning classification using sorted missing percentage technique based on missing data

Che Yu Hung; Bernard C. Jiang; Chien Chih Wang

Journal Article, Open Access

Applied Sciences (Switzerland) (2020) 10(14)

DOI: 10.3390/app10144920

9 Citations

18 Readers

Abstract

Missing data are common in industrial sensor readings owing to system updates and unequal radio-frequency periods. Existing methods that address missing data through imputation may not always be appropriate. This study presents a sorted missing percentage technique for filtering attributes when building machine learning classification models from sensor readings with missing data. Signal detection theory was employed to evaluate the distinguishing ability of the resulting models. To evaluate its performance, the proposed technique was applied to a publicly available air pressure system dataset, which was then used to build several classifiers. The experimental results indicated that the proposed technique allowed a logistic regression model to achieve the best accuracy score (99.56%) and better distinguishing ability (response bias of 0.0013, adjusted response bias of 0.0044, and decision criterion of -1.8994) than the methods applied to the same dataset and reported in papers on binary classification published between 2016 and March 2019, wherein attributes with more than 20% missing data were filtered out. The proposed technique is suitable for industrial sensor data analysis and can be applied to scenarios in which data are missing owing to unequal radio-frequency periods or a system being updated with new fields.
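The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes that the technique amounts to computing the percentage of missing values per attribute, sorting attributes by that percentage, and dropping those above a cutoff (20% is the figure quoted in the abstract). The function names, the row representation, and the toy sensor readings are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: each reading is a dict mapping attribute name to a value,
# with None marking a missing value.
Row = Dict[str, Optional[float]]


def missing_percentages(rows: List[Row], attributes: List[str]) -> List[Tuple[str, float]]:
    """Return (attribute, percent missing) pairs, sorted from least to most missing."""
    n = len(rows)
    result = []
    for name in attributes:
        missing = sum(1 for row in rows if row.get(name) is None)
        result.append((name, 100.0 * missing / n if n else 0.0))
    return sorted(result, key=lambda item: item[1])


def filter_attributes(rows: List[Row], attributes: List[str], threshold: float = 20.0) -> List[str]:
    """Keep attributes whose missing percentage does not exceed the threshold."""
    return [name for name, pct in missing_percentages(rows, attributes) if pct <= threshold]


# Toy data (hypothetical): s2 is missing in 3 of 5 rows, s3 in 1 of 5.
readings = [
    {"s1": 1.0, "s2": None, "s3": 0.5},
    {"s1": 2.0, "s2": None, "s3": None},
    {"s1": 3.0, "s2": 4.0, "s3": 0.7},
    {"s1": 4.0, "s2": None, "s3": 0.9},
    {"s1": 5.0, "s2": 1.0, "s3": 0.2},
]

ranked = missing_percentages(readings, ["s1", "s2", "s3"])
kept = filter_attributes(readings, ["s1", "s2", "s3"])
print(ranked)
print(kept)
```

In this sketch an attribute at exactly the threshold is kept, since the abstract refers to attributes with more than 20% missing data being filtered out. The surviving attributes would then be passed to whichever classifier is being evaluated.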

Cite

Hung, C. Y., Jiang, B. C., & Wang, C. C. (2020). Evaluating machine learning classification using sorted missing percentage technique based on missing data. Applied Sciences (Switzerland), 10(14). https://doi.org/10.3390/app10144920
