During the last decade, data have played a key role in learning and decision-making models. Unfortunately, the quality of data has often been ignored or only partially investigated as a pre-processing step. Motivated by applications in various fields, we study data quality and its impact on the performance of several learning models. We first study the difficulty of repairing errors by introducing a list of elementary repairing tasks, ordered from easy to complex. Then, we group state-of-the-art cleaning and repairing methods into categories. We also investigate whether it is always efficient to repair data. Because it relies on standard classification models and public datasets, our work can be reused in different contexts and extended to other machine learning applications.
CITATION STYLE
Jouseau, R., Salva, S., & Samir, C. (2022). On Studying the Effect of Data Quality on Classification Performances. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13756 LNCS, pp. 82–93). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21753-1_9