As large-scale machine learning models become more prevalent in assistive and pervasive technologies, the research community has begun examining limitations and challenges that arise from training data, e.g., fairness, bias, and interpretability issues. To this end, data-centric approaches are increasingly prevalent, showing that high-quality data is a critical component in many applications. Several studies explore methods to define and improve data quality; however, no uniform definition exists. In this work, we present an empirical analysis of the multifaceted problem of evaluating data quality. Our work aims to identify the data quality challenges most commonly observed by data users and practitioners. Motivated by the need for generally applicable methods, we select a representative set of quality indicators that covers a broad spectrum of issues, and we investigate the utility of these indicators on a broad range of datasets through inter-annotator agreement analysis. Our work provides insights and presents open challenges in designing improved data life cycles.
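As a rough illustration of the inter-annotator agreement analysis the abstract mentions, the sketch below computes Cohen's kappa between two annotators who rated the same items with a binary quality indicator. The paper does not specify its agreement statistic or data here; the choice of Cohen's kappa and the labels shown are assumptions for illustration only.

```python
# Illustrative sketch (not the authors' code): chance-corrected agreement
# between two annotators labeling the same items with a binary quality
# indicator (1 = issue present, 0 = absent).
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, given each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten dataset instances for one quality indicator.
ann1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
ann2 = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(f"kappa = {cohen_kappa(ann1, ann2):.2f}")  # ~0.58 for this data
```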
Citation
Pleimling, X., Shah, V., & Lourentzou, I. (2022). Quality Lies In The Eyes Of The Beholder. In ACM International Conference Proceeding Series (pp. 118–124). Association for Computing Machinery. https://doi.org/10.1145/3529190.3529222