Influence of outliers introduction on predictive models quality

Mateusz Kalisch; Marcin Michalak; Marek Sikora; Łukasz Wróbel; Piotr Przystałka

Journal Article

Communications in Computer and Information Science (2016) 613 79-93

DOI: 10.1007/978-3-319-34099-9_5

Abstract

The paper presents research on how the level of outliers in the data (training and test data considered separately) influences the prediction quality of a model in a classification task. A set of 100 semi-artificial time series was considered, whose independent variables were close to real ones observed in an underground coal mining environment and whose dependent variable was generated with a decision tree. For every considered method (decision trees, naive Bayes, logistic regression and kNN), a reference model was built (no outliers in the data), and its quality was compared with the quality of two models: Out–Out (outliers in training and test data) and Non-out–Out (outliers only in test data). Fifty levels of outliers in the data were considered, from 1 % to 50 %. The models were compared statistically using the sign test.
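As an illustration of the statistical comparison mentioned in the abstract, the following is a minimal sketch of an exact two-sided sign test on paired quality scores (one pair per data set). The function name `sign_test`, the use of accuracy-like scores, and the choice to drop ties are assumptions made for this example; the abstract does not state how the authors implemented the test.

```python
from math import comb


def sign_test(ref_scores, alt_scores):
    """Exact two-sided sign test on paired scores (hypothetical helper).

    Returns (wins, losses, p_value), where a "win" means the reference
    model scored strictly higher on that data set. Ties are dropped,
    which is one common convention and an assumption here.
    """
    pairs = list(zip(ref_scores, alt_scores))
    wins = sum(1 for r, a in pairs if r > a)
    losses = sum(1 for r, a in pairs if r < a)
    n = wins + losses
    if n == 0:
        # Every pair tied: no evidence of a difference.
        return wins, losses, 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Made-up scores for illustration only, not results from the paper.
    reference = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92]
    with_outliers = [0.85, 0.86, 0.90, 0.91, 0.80, 0.89]
    print(sign_test(reference, with_outliers))
```

In a setting like the one described, such a test would be run once per method and outlier level, pairing the reference model's score with the Out–Out or Non-out–Out score on each of the 100 series.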

Cite (APA)

Kalisch, M., Michalak, M., Sikora, M., Wróbel, Ł., & Przystałka, P. (2016). Influence of outliers introduction on predictive models quality. Communications in Computer and Information Science, 613, 79–93. https://doi.org/10.1007/978-3-319-34099-9_5
