How I stopped worrying about training data bugs and started complaining

1Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.
Get full text

Abstract

There is an increasing awareness of the gap between machine learning research and production. The research community has largely focused on developing a model that performs well on a validation set, but the production environment needs to make sure the model also performs well in a downstream application. The latter is more challenging because the test/inference-time data used in the application could be quite different from the training data. To address this challenge, we advocate for "complaint-driven"data debugging, which allows the user to complain about the unexpected behaviors of the model in the downstream application, and proposes interventions for training data errors that likely led to the complaints. This new debugging paradigm helps solve a range of training data quality problems such as labeling error, fairness, and data drift. We present our long-term vision, highlight achieved milestones, and outline a research roadmap including a number of open problems.

Cite

CITATION STYLE

APA

Flokas, L., Wu, W., Wang, J., Verma, N., & Wu, E. (2022). How I stopped worrying about training data bugs and started complaining. In Proceedings of the 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference. Association for Computing Machinery, Inc. https://doi.org/10.1145/3533028.3533305

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free