Responsible Data Integration: Next-generation Challenges

Abstract

Data integration has been extensively studied by the data management community and is a core task in the data pre-processing step of ML pipelines. When the integrated data is used for analysis and model training, responsible data science requires addressing concerns about data quality and bias. We present a tutorial on data integration and responsibility, highlighting the existing efforts in responsible data integration along with research opportunities and challenges. In this tutorial, we encourage the community to audit data integration tasks with responsibility measures and develop integration techniques that optimize the requirements of responsible data science. We focus on three critical aspects: (1) the requirements to be considered for evaluating and auditing data integration tasks for quality and bias; (2) the data integration tasks that elicit attention to data responsibility measures and methods to satisfy these requirements; and (3) techniques, tasks, and open problems in data integration that help achieve data responsibility.

Cite

Nargesian, F., Asudeh, A., & Jagadish, H. V. (2022). Responsible Data Integration: Next-generation Challenges. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 2458–2464). Association for Computing Machinery. https://doi.org/10.1145/3514221.3522567
