Real data are often corrupted by noise, which can originate from errors in data collection, storage, and processing. Noise hampers the induction of Machine Learning models from data: it can impair their predictive or descriptive performance and lengthen training time. Moreover, the induced models can become overly complex in order to accommodate such errors. Thus, identifying and reducing noise in a data set may benefit the learning process. In this paper, we investigate the use of data complexity measures to identify the presence of noise in a data set. This identification can support the decision on whether noise reduction techniques need to be applied. © 2013 Springer-Verlag.
García, L. P. F., De Carvalho, A. C. P. L. F., & Lorena, A. C. (2013). Noisy data set identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8073 LNAI, pp. 629–638). https://doi.org/10.1007/978-3-642-40846-5_63