Abstract
AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact, impacting predictions like cancer detection, wildlife poaching, and loan allocations. Paradoxically, data is the most under-valued and de-glamorised aspect of AI. In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and USA. We define, identify, and present empirical evidence on Data Cascades (compounding events causing negative, downstream effects from data issues) triggered by conventional AI/ML practices that undervalue data quality. Data cascades are pervasive (92% prevalence), invisible, delayed, but often avoidable. We discuss HCI opportunities in designing and incentivizing data excellence as a first-class citizen of AI, resulting in safer and more robust systems for all.
Cite
Sambasivan, N., Kapania, S., & Highfill, H. (2021). Everyone wants to do the model work, not the data work: Data cascades in high-stakes AI. In Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery. https://doi.org/10.1145/3411764.3445518