Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems

Anna Fariha; Ashish Tiwari; Arjun Radhakrishna; Sumit Gulwani; Alexandra Meliou

Conference Proceedings (Open Access)

Proceedings of the ACM SIGMOD International Conference on Management of Data (2021) 499-512

DOI: 10.1145/3448016.3452795

Abstract

The reliability of inferences made by data-driven systems hinges on the data's continued conformance to the systems' initial settings and assumptions. When serving data (on which we want to apply inference) deviates from the profile of the initial training data, the outcome of inference becomes unreliable. We introduce conformance constraints, a new data profiling primitive tailored towards quantifying a tuple's degree of non-conformance, which can effectively characterize whether inference over that tuple is untrustworthy. Conformance constraints are constraints over certain arithmetic expressions (called projections) involving the numerical attributes of a dataset, which existing data profiling primitives such as functional dependencies and denial constraints cannot model. Our key finding is that projections that incur low variance on a dataset construct effective conformance constraints. This principle yields the surprising result that low-variance components of a principal component analysis, which are usually discarded for dimensionality reduction, generate stronger conformance constraints than the high-variance components. Based on this result, we provide a highly scalable and efficient technique (linear in data size and cubic in the number of attributes) for discovering conformance constraints for a dataset. To measure the degree of a tuple's non-conformance with respect to a dataset, we propose a quantitative semantics that captures how much a tuple violates the conformance constraints of that dataset. We demonstrate the value of conformance constraints on two applications: trusted machine learning and data drift. We empirically show that conformance constraints offer mechanisms to (1) reliably detect tuples on which the inference of a machine-learned model should not be trusted, and (2) quantify data drift more accurately than the state of the art.
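
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`learn_constraints`, `violation`), the bound width `k`, the weighting formula, and the exponential scoring are all assumptions chosen for illustration. The sketch takes the principal directions of the training data as projections, bounds each projection by a multiple of its standard deviation, and scores a new tuple by how far its projections fall outside those bounds, giving low-variance projections more weight.

```python
import numpy as np

def learn_constraints(data, k=4.0):
    """Derive one bounded projection per principal direction of `data`.

    `k` (bound width in standard deviations) is an illustrative assumption.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # The d x d covariance costs time linear in the number of rows;
    # its eigendecomposition is cubic in the number of attributes d.
    cov = centered.T @ centered / len(data)
    _, vecs = np.linalg.eigh(cov)      # columns are unit-length eigenvectors
    proj = centered @ vecs             # value of each projection on each row
    std = proj.std(axis=0)
    # Assumed weighting: projections with lower variance count more.
    weights = 1.0 / np.log(2.0 + std)
    weights /= weights.sum()
    return {"mean": mean, "vecs": vecs, "std": std,
            "bound": k * std, "weights": weights}

def violation(constraints, row):
    """Score in [0, 1]; 0 means every projection lies within its bounds."""
    centered = np.asarray(row, dtype=float) - constraints["mean"]
    p = centered @ constraints["vecs"]
    # Distance by which each projection exceeds its allowed interval.
    excess = np.maximum(np.abs(p) - constraints["bound"], 0.0)
    scaled = excess / (constraints["std"] + 1e-9)
    per_constraint = 1.0 - np.exp(-scaled)   # assumed squashing into [0, 1]
    return float(per_constraint @ constraints["weights"])
```

On training data whose second attribute is roughly twice the first, the low-variance direction captures that linear relationship, so a tuple such as (1, 2) should score 0 while a tuple such as (1, 10) should score well above 0, even though both values are individually within the observed ranges.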

Cite

Fariha, A., Tiwari, A., Radhakrishna, A., Gulwani, S., & Meliou, A. (2021). Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 499–512). Association for Computing Machinery. https://doi.org/10.1145/3448016.3452795
