Abstract
Query optimization has long been fundamental for database systems. There are cracks in the edifice, however, as the complexity of modern query workloads outpaces what database systems can manage well. Automatic tools are needed for database vendors, such as IBM with Db2, to help customers troubleshoot their performance problems, as manual troubleshooting is painstaking. To manage complex and large workloads, we develop a distributed system called dGALO that learns recurring problem patterns in query plans over workloads. dGALO employs these problem patterns to build an RDF-based, SPARQL-queried knowledge base of plan-rewrite remedies. We illustrate a distributed implementation of dGALO on Apache Spark with efficient partitioning strategies for load balancing. The system employs additional pruning strategies via clustering, yielding a fine-grained trade-off between runtime and accuracy. dGALO uses its knowledge base to re-optimize queries, often to dramatic effect, and is a valuable tool for the development team to refine the optimizer with new techniques. An experimental study over the TPC-DS benchmark demonstrates the efficiency and effectiveness of our techniques.
Cite
Mihaylov, A., Corvinelli, V., Godfrey, P., Mierzejewski, P., Szlichta, J., & Zuzarte, C. (2021). Scalable Learning to Troubleshoot Query Performance Problems. In International Conference on Information and Knowledge Management, Proceedings (pp. 4016–4025). Association for Computing Machinery. https://doi.org/10.1145/3459637.3481947