Active Learning for ML Enhanced Database Systems

Abstract

Recent research has shown promising results by using machine learning (ML) techniques to improve the performance of database systems, e.g., in query optimization or index recommendation. However, in many production deployments, the ML models' performance degrades significantly when the test data diverges from the data used to train these models. In this paper, we address this performance degradation by using B-instances to collect additional data during deployment. We propose an active data collection platform, ADCP, that employs active learning (AL) to gather relevant data cost-effectively. We develop a novel AL technique, Holistic Active Learner (HAL), that robustly combines multiple noisy signals for data gathering in the context of database applications. HAL applies to various ML tasks, budget sizes, cost types, and budgeting interfaces for database applications. We evaluate ADCP on both industry-standard benchmarks and real customer workloads. Our evaluation shows that, compared with other baselines, our technique improves ML models' prediction performance by up to 2x with the same cost budget. In particular, on production workloads, our technique reduces the prediction error of ML models by 75% using about 100 additionally collected queries.
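The abstract does not spell out how HAL selects data, so the following is only a generic sketch of cost-aware active learning, not the paper's method. It assumes each candidate query has a hypothetical uncertainty score (for example, disagreement among model predictions) and a known positive execution cost, and it greedily picks the candidates with the highest uncertainty per unit cost until the budget is spent. The function name and its parameters are illustrative assumptions.

```python
from typing import List, Sequence


def select_within_budget(
    uncertainty: Sequence[float],
    cost: Sequence[float],
    budget: float,
) -> List[int]:
    """Greedily pick candidate indices by uncertainty per unit cost.

    Generic illustration only; this is NOT the HAL algorithm from the paper.
    """
    if len(uncertainty) != len(cost):
        raise ValueError("uncertainty and cost must have the same length")
    if any(c <= 0 for c in cost):
        raise ValueError("costs must be positive")

    # Rank candidates by informativeness per unit cost, highest first.
    order = sorted(
        range(len(cost)),
        key=lambda i: uncertainty[i] / cost[i],
        reverse=True,
    )

    chosen: List[int] = []
    spent = 0.0
    for i in order:
        # Take a candidate only if it still fits in the remaining budget.
        if spent + cost[i] <= budget:
            chosen.append(i)
            spent += cost[i]
    return chosen


if __name__ == "__main__":
    # Four hypothetical candidate queries and a budget of 4 cost units.
    picked = select_within_budget(
        uncertainty=[0.9, 0.2, 0.6, 0.5],
        cost=[3.0, 1.0, 1.0, 5.0],
        budget=4.0,
    )
    print(picked)
```

In a setting like the one the abstract describes, the selected queries would then be executed (for instance on the B-instances) and their observed results added to the training data. A single ratio-based score is the simplest possible choice; the abstract indicates that HAL instead combines multiple noisy signals.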

Cite

Ma, L., Ding, B., Das, S., & Swaminathan, A. (2020). Active Learning for ML Enhanced Database Systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 175–191). Association for Computing Machinery. https://doi.org/10.1145/3318464.3389768
