Computer vision is a "data hungry" field. Researchers and practitioners who work on human-centric computer vision, like facial recognition, emphasize the necessity of vast amounts of data for more robust and accurate models. Humans are seen as a data resource which can be converted into datasets. The necessity of data has led to a proliferation of gathering data from easily available sources, including "public" data from the web. Yet the use of public data has significant ethical implications for the human subjects in datasets. We bridge academic conversations on the ethics of using publicly obtained data with concerns about privacy and agency associated with computer vision applications. Specifically, we examine how practices of dataset construction from public data (not only from websites, but also from public settings and public records) make it extremely difficult for human subjects to trace their images as they are collected, converted into datasets, distributed for use, and, in some cases, retracted. We discuss two interconnected barriers that current data practices pose to providing an ethics of traceability for human subjects: awareness and control. We conclude with key intervention points for enabling traceability for data subjects. We also offer suggestions for an improved ethics of traceability to enable both awareness and control for individual subjects in dataset curation practices.
CITATION STYLE
Scheuerman, M. K., Weathington, K., Mugunthan, T., Denton, E., & Fiesler, C. (2023). From Human to Data to Dataset: Mapping the Traceability of Human Subjects in Computer Vision Datasets. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1). https://doi.org/10.1145/3579488