On Studying the Effect of Data Quality on Classification Performances

Roxane Jouseau; Sébastien Salva; Chafik Samir

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13756 LNCS, pp. 82–93 (2022)

DOI: 10.1007/978-3-031-21753-1_9


Abstract

During the last decade, data have played a key role in learning and decision-making models. Unfortunately, data quality has been either ignored or only partially investigated, usually as a pre-processing step. Motivated by applications in various fields, we propose to study data quality and its impact on the performance of several learning models. In this work, we first study the difficulty of repairing errors by introducing a list of elementary repairing tasks of increasing difficulty, ranging from easy to complex. Then, we group state-of-the-art cleaning and repairing methods into categories. We also investigate whether it is always efficient to repair data. By relying on standard classification models and public datasets, our work can be used in different contexts and extended to other machine learning applications.

Citation (APA)

Jouseau, R., Salva, S., & Samir, C. (2022). On Studying the Effect of Data Quality on Classification Performances. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13756 LNCS, pp. 82–93). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21753-1_9
