FILA: Online Auditing of Machine Learning Model Accuracy under Finite Labelling Budget

Naiqing Guan; Nick Koudas

Conference Proceedings, Open Access

Proceedings of the ACM SIGMOD International Conference on Management of Data (2022) 1784-1794

DOI: 10.1145/3514221.3517904

Abstract

Machine learning (ML) is increasingly adopted in industrial applications. Typically, an ML pipeline is instantiated to automate the process of collecting training data, training a model, auditing the model accuracy, and generating predictions. In this paper, we present a sampling-based approach to auditing model accuracy in ML pipelines in an online manner, with general applicability, including entity resolution. We target an online setting in which a deployed model makes predictions that can be selectively assessed for accuracy by humans in the loop. We present a consistent adaptive stratified sampling estimator for model accuracy and propose the Finite Labels (FILA) method to allocate samples under a finite label budget. We demonstrate that, under mild statistical assumptions, FILA is asymptotically optimal. We analyze the variance of the estimator under a finite labelling budget, compare our approach to other applicable techniques, and analytically establish the conditions under which FILA is the method of choice. We also present an algorithm based on Thompson Sampling, named FILA-Thompson, which uses explore-exploit trade-offs to estimate model accuracy. Finally, we present the results of a thorough experimental evaluation on real benchmark data sets, demonstrating the practical utility of our proposals (in terms of estimation accuracy and variance minimization under finite samples) compared to other applicable approaches.
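
To make the idea of a stratified accuracy estimate under a label budget concrete, here is a minimal Python sketch. It is not the FILA allocation rule, which the abstract does not specify. It assumes a generic greedy heuristic instead: after a small pilot sample per stratum, each further label goes to the stratum where it is expected to shrink the estimator's variance the most. The function name `audit_accuracy`, the `pilot` parameter, and the representation of each stratum as a non-empty list of 0/1 correctness flags (standing in for human labels) are all illustrative assumptions.

```python
import random


def audit_accuracy(strata, budget, pilot=2, seed=0):
    """Estimate overall accuracy from at most `budget` labelled items.

    strata: list of non-empty lists of 0/1 flags (1 = prediction correct).
            In a real audit a flag is only revealed when a human labels it.
    Returns (estimate, labels_used_per_stratum).
    """
    rng = random.Random(seed)
    total = sum(len(s) for s in strata)
    weights = [len(s) / total for s in strata]
    # Shuffled copies: reading a prefix is sampling without replacement.
    pools = [rng.sample(s, len(s)) for s in strata]

    # Pilot phase: a few labels per stratum to get initial estimates.
    n = [min(pilot, len(p)) for p in pools]
    hits = [sum(p[:k]) for p, k in zip(pools, n)]

    def gain(k):
        """Approximate variance reduction from one more label in stratum k."""
        if n[k] >= len(pools[k]):
            return -1.0  # stratum exhausted
        # Smoothed accuracy so an all-correct pilot does not give zero variance.
        p = (hits[k] + 1) / (n[k] + 2)
        return weights[k] ** 2 * p * (1 - p) / (n[k] * (n[k] + 1))

    # Adaptive phase: spend the remaining budget one label at a time.
    for _ in range(budget - sum(n)):
        k = max(range(len(pools)), key=gain)
        if gain(k) < 0:
            break  # nothing left to label
        hits[k] += pools[k][n[k]]
        n[k] += 1

    # Stratified estimator: size-weighted average of per-stratum accuracy.
    estimate = sum(w * h / m for w, h, m in zip(weights, hits, n))
    return estimate, n
```

Calling `audit_accuracy(strata, budget=200)` returns the estimate together with the number of labels spent in each stratum, so the allocation can be inspected. How predictions are grouped into strata (for example, by model confidence score) is left open here and is a further assumption.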

Citation (APA)

Guan, N., & Koudas, N. (2022). FILA: Online Auditing of Machine Learning Model Accuracy under Finite Labelling Budget. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1784–1794). Association for Computing Machinery. https://doi.org/10.1145/3514221.3517904
