How I stopped worrying about training data bugs and started complaining

Lampros Flokas; Weiyuan Wu; Jiannan Wang; Nakul Verma; Eugene Wu

Conference Proceedings (Open Access)

Proceedings of the 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference (2022)

DOI: 10.1145/3533028.3533305

Abstract

There is an increasing awareness of the gap between machine learning research and production. The research community has largely focused on developing a model that performs well on a validation set, but the production environment needs to make sure the model also performs well in a downstream application. The latter is more challenging because the test/inference-time data used in the application could be quite different from the training data. To address this challenge, we advocate for "complaint-driven" data debugging, which allows the user to complain about the unexpected behaviors of the model in the downstream application, and proposes interventions for training data errors that likely led to the complaints. This new debugging paradigm helps solve a range of training data quality problems such as labeling error, fairness, and data drift. We present our long-term vision, highlight achieved milestones, and outline a research roadmap including a number of open problems.

Cite

Flokas, L., Wu, W., Wang, J., Verma, N., & Wu, E. (2022). How I stopped worrying about training data bugs and started complaining. In Proceedings of the 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference. Association for Computing Machinery, Inc. https://doi.org/10.1145/3533028.3533305
