The paper presents the results of research on how the level of outliers in the data (with training and test data considered separately) influences the quality of model predictions in a classification task. A set of 100 semi-artificial time series was considered. Their independent variables were close to real ones observed in an underground coal mining environment, and their dependent variable was generated with a decision tree. For every considered method (decision trees, naive Bayes, logistic regression and kNN), a reference model was built with no outliers in the data. Its quality was compared with the quality of two models: Out–Out (outliers in both training and test data) and Non-out–Out (outliers only in test data). Fifty levels of outliers in the data were considered, from 1% to 50%. The models were compared statistically using the sign test.
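The abstract names the sign test as the basis for comparing models. A minimal sketch of a two-sided exact sign test over paired quality scores follows. A pair could be, for example, the accuracy of the reference model and of an Out–Out model on the same time series. Several details are assumptions for illustration, not taken from the paper: pairing scores per time series, the choice of quality measure, discarding ties, and the hypothetical function name `sign_test`.

```python
from math import comb


def sign_test(reference, compared):
    """Two-sided exact sign test on paired scores (hypothetical helper).

    Assumption: ties (equal scores) are discarded, as in the classic sign test.
    Returns (count reference better, count compared better, p-value).
    """
    if len(reference) != len(compared):
        raise ValueError("paired samples must have equal length")
    # Keep only the non-zero differences; their signs are what the test uses.
    diffs = [r - c for r, c in zip(reference, compared) if r != c]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0
    ref_better = sum(1 for d in diffs if d > 0)
    cmp_better = n - ref_better
    k = min(ref_better, cmp_better)
    # Under H0 the number of positive signs follows Binomial(n, 0.5);
    # P(X <= k), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return ref_better, cmp_better, min(1.0, 2 * tail)


# Hypothetical scores on 10 series: the reference model wins on 9 of them.
reference_acc = [0.9] * 10
out_out_acc = [0.8] * 9 + [0.95]
print(sign_test(reference_acc, out_out_acc))
```

The exact binomial tail is used instead of a normal approximation so the sketch stays valid for small numbers of non-tied pairs. With 100 series per outlier level, either would be reasonable.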
Kalisch, M., Michalak, M., Sikora, M., Wróbel, Ł., & Przystałka, P. (2016). Influence of outliers introduction on predictive models quality. Communications in Computer and Information Science, 613, 79–93. https://doi.org/10.1007/978-3-319-34099-9_5