Machine learning (ML) is increasingly adopted in industrial applications. Typically, an ML pipeline is instantiated to automate the process of collecting training data, training a model, auditing the model's accuracy, and generating predictions. In this paper we present a sampling-based approach for auditing model accuracy in ML pipelines in an online manner. The approach is generally applicable, including to entity resolution. We target an online setting in which a deployed model makes predictions that can be selectively assessed for accuracy by humans in the loop. We present a consistent adaptive stratified sampling estimator for model accuracy and propose the Finite Labels (FILA) method to allocate samples under a finite label budget. We demonstrate that, under mild statistical assumptions, FILA is asymptotically optimal. We analyze the variance of the estimator under a finite labelling budget, compare our approach to other applicable techniques, and analytically establish the conditions under which FILA is the method of choice. We also present an algorithm based on Thompson Sampling, named FILA-Thompson, which uses explore-exploit trade-offs to estimate model accuracy. Finally, we present the results of a thorough experimental evaluation on real benchmark data sets, demonstrating the practical utility of our proposals (in terms of estimation accuracy and variance minimization under finite samples) compared to other applicable approaches.
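To make the auditing setting concrete, the following is a minimal sketch of a plain stratified sampling estimator of model accuracy under a label budget. It is not the FILA allocation rule, which the abstract does not specify. It uses simple proportional allocation as a stand-in, and the names `stratified_accuracy`, `strata`, `budget`, and `label_fn` are hypothetical, chosen only for illustration. The function `label_fn` plays the role of the human in the loop, returning 1 when a prediction is judged correct and 0 otherwise.

```python
import random


def stratified_accuracy(strata, budget, label_fn, rng):
    """Estimate overall accuracy from labels sampled within each stratum.

    strata:   list of lists of prediction ids (a hypothetical partition)
    budget:   approximate total number of labels to request
    label_fn: stand-in for the human auditor; returns 1 (correct) or 0
    rng:      a random.Random instance, passed in for reproducibility
    """
    total = sum(len(items) for items in strata)
    estimate = 0.0
    variance = 0.0
    for items in strata:
        if not items:
            continue
        weight = len(items) / total
        # Proportional allocation (an assumption, not FILA): at least two
        # labels per stratum, never more than the stratum holds. Rounding
        # means the labels used can differ slightly from the budget.
        n = min(len(items), max(2, round(budget * weight)))
        sample = rng.sample(items, n)
        p = sum(label_fn(x) for x in sample) / n
        estimate += weight * p
        if n > 1:
            # Unbiased estimate of the variance of a Bernoulli sample mean,
            # ignoring the finite population correction.
            variance += weight ** 2 * p * (1 - p) / (n - 1)
    return estimate, variance


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical strata: ids below 300 are correct predictions.
    strata = [list(range(300)), list(range(300, 400))]
    est, var = stratified_accuracy(strata, 40, lambda i: int(i < 300), rng)
    print(est, var)
```

An adaptive method in the spirit of the abstract would replace the proportional allocation step with one that shifts labels toward strata whose estimated variance is high; the estimator itself would keep the same weighted-sum form.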
Guan, N., & Koudas, N. (2022). FILA: Online Auditing of Machine Learning Model Accuracy under Finite Labelling Budget. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1784–1794). Association for Computing Machinery. https://doi.org/10.1145/3514221.3517904