A Learned Query Optimizer for Spatial Join

Abstract

The importance and complexity of spatial join have resulted in many join algorithms, some of which run on big-data platforms such as Hadoop and Spark. This paper proposes the first machine-learning-based query optimizer for the spatial join operation that can accommodate the skewness of spatial datasets and the complexity of the different algorithms. The main challenge is developing portable cost models that take into account the important input characteristics, such as data distribution, spatial partitioning, the logic of the spatial join algorithms, and the relationship between the two datasets. The proposed system defines a set of features, all of which can be computed efficiently from the data, that capture the intricate aspects of spatial join. It then uses these features to train three machine learning models that capture several metrics to estimate the cost of four spatial join algorithms according to user requirements. The first model estimates the cardinality of the spatial join result. The second model predicts the number of rough comparisons for a specific join algorithm. Finally, the third is a classification model that chooses the best join algorithm to run. Experiments on large-scale synthetic and real data show the efficiency of the proposed models over baseline methods.
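
The abstract does not list the actual features or model types, so the following is only a minimal sketch of the general idea: summarize each input dataset with a few cheap features, then let a classifier pick a join algorithm. The three features (record count, mean box area, and a grid-based skew indicator), the unit-square extent, the 1-nearest-neighbor classifier, and the labels `algo_a` and `algo_b` are all assumptions made for illustration, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def dataset_features(boxes: Sequence[Box], grid: int = 4) -> List[float]:
    """Hypothetical features for one dataset inside the unit square:
    record count, mean box area, and the share of box centers that fall
    in the fullest grid cell (a crude skew indicator)."""
    counts = [0] * (grid * grid)
    total_area = 0.0
    for xmin, ymin, xmax, ymax in boxes:
        total_area += (xmax - xmin) * (ymax - ymin)
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        col = min(int(cx * grid), grid - 1)
        row = min(int(cy * grid), grid - 1)
        counts[row * grid + col] += 1
    n = len(boxes)
    return [float(n), total_area / n, max(counts) / n]


def pair_features(left: Sequence[Box], right: Sequence[Box]) -> List[float]:
    """Concatenate the per-dataset features of both join inputs."""
    return dataset_features(left) + dataset_features(right)


def choose_algorithm(features: List[float],
                     training: Sequence[Tuple[List[float], str]]) -> str:
    """1-nearest-neighbor stand-in for the classification model."""
    _, label = min(training, key=lambda ex: math.dist(ex[0], features))
    return label


# One small box per cell of a 4x4 grid (evenly spread data).
uniform = [(i * 0.25 + 0.05, j * 0.25 + 0.05, i * 0.25 + 0.15, j * 0.25 + 0.15)
           for i in range(4) for j in range(4)]
# Sixteen copies of the same box (fully clustered data).
skewed = [(0.05, 0.05, 0.15, 0.15)] * 16

# Placeholder labels; in practice they would come from measured run times.
training = [
    (pair_features(uniform, uniform), "algo_a"),
    (pair_features(skewed, skewed), "algo_b"),
]

print(choose_algorithm(pair_features(skewed, skewed), training))
```

The two regression models mentioned in the abstract (result cardinality and number of rough comparisons) could consume the same feature vector; they are omitted here because the abstract gives no detail on their form.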

Cite

Vu, T., Belussi, A., Migliorini, S., & Eldawy, A. (2021). A Learned Query Optimizer for Spatial Join. In GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems (pp. 458–467). Association for Computing Machinery. https://doi.org/10.1145/3474717.3484217
