Scalable Learning to Troubleshoot Query Performance Problems

Abstract

Query optimization has long been fundamental for database systems. There are cracks in the edifice, however, as the complexity of modern query workloads outpaces what database systems can manage well. Database vendors, such as IBM with Db2, need automatic tools to help customers troubleshoot their performance problems, as manual troubleshooting is painstaking. To manage complex and large workloads, we develop a distributed system called dGALO that learns recurring problem patterns in query plans over workloads. dGALO employs these problem patterns to build an RDF-based, SPARQL-queried knowledge base of plan-rewrite remedies. We illustrate a distributed implementation of dGALO on Apache Spark with efficient partitioning strategies for load balancing. The system employs additional pruning strategies via clustering, which yield a fine-grained trade-off between runtime and accuracy. dGALO uses its knowledge base to re-optimize queries, often to dramatic effect, and is a valuable tool for the development team to refine the optimizer with new techniques. An experimental study over the TPC-DS benchmark demonstrates the efficiency and effectiveness of our techniques.

Cite

Mihaylov, A., Corvinelli, V., Godfrey, P., Mierzejewski, P., Szlichta, J., & Zuzarte, C. (2021). Scalable Learning to Troubleshoot Query Performance Problems. In International Conference on Information and Knowledge Management, Proceedings (pp. 4016–4025). Association for Computing Machinery. https://doi.org/10.1145/3459637.3481947
