Everyone wants to do the model work, not the data work: Data cascades in high-stakes AI

Abstract

AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact, impacting predictions like cancer detection, wildlife poaching, and loan allocations. Paradoxically, data is the most under-valued and de-glamorised aspect of AI. In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and USA. We define, identify, and present empirical evidence on Data Cascades (compounding events causing negative, downstream effects from data issues), triggered by conventional AI/ML practices that undervalue data quality. Data cascades are pervasive (92% prevalence), invisible, delayed, but often avoidable. We discuss HCI opportunities in designing and incentivizing data excellence as a first-class citizen of AI, resulting in safer and more robust systems for all.

Cite

Sambasivan, N., Kapania, S., & Highfill, H. (2021). Everyone wants to do the model work, not the data work: Data cascades in high-stakes AI. In Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery. https://doi.org/10.1145/3411764.3445518
