Influence of outliers introduction on predictive models quality

Abstract

The paper presents research on how the level of outliers in the data (train and test data considered separately) influences the quality of model predictions in a classification task. A set of 100 semi-artificial time series was considered, whose independent variables were close to real ones observed in an underground coal mining environment and whose dependent variable was generated with a decision tree. For every considered method (decision trees, naive Bayes, logistic regression and kNN), a reference model was built (no outliers in the data), and its quality was compared with the quality of two models: Out-Out (outliers in train and test data) and Non-out-Out (outliers only in test data). Fifty levels of outliers in the data were considered, from 1% to 50%. The models were compared statistically using the sign test.
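
To illustrate the kind of paired comparison described above, here is a minimal sketch of a sign test on per-series quality scores. It is not the authors' code: the function name `sign_test`, the two-sided alternative, the dropping of ties, and the accuracy values are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import comb


def sign_test(reference, compared):
    """Two-sided sign test on paired scores (assumption: ties are dropped).

    Returns (wins, losses, p_value), where a "win" means the reference
    model scored strictly higher than the compared model on that series.
    """
    wins = sum(1 for r, c in zip(reference, compared) if r > c)
    losses = sum(1 for r, c in zip(reference, compared) if r < c)
    n = wins + losses
    if n == 0:
        # Every pair tied: no evidence of a difference.
        return wins, losses, 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    # and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


# Hypothetical accuracies on 8 series (the paper uses 100); not real results.
ref_acc = [0.90, 0.85, 0.88, 0.92, 0.80, 0.87, 0.91, 0.86]  # reference model
out_acc = [0.85, 0.80, 0.88, 0.90, 0.82, 0.81, 0.89, 0.80]  # e.g. Non-out-Out

wins, losses, p_value = sign_test(ref_acc, out_acc)
print(wins, losses, p_value)
```

In a setup like the one in the abstract, such a test would presumably be repeated for each classifier and each outlier level, comparing the reference model against the Out-Out and Non-out-Out models in turn.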

Cite

Kalisch, M., Michalak, M., Sikora, M., Wróbel, Ł., & Przystałka, P. (2016). Influence of outliers introduction on predictive models quality. Communications in Computer and Information Science, 613, 79–93. https://doi.org/10.1007/978-3-319-34099-9_5
