Strark-H: A Strategy for Spatial Data Storage to Improve Query Efficiency Based on Spark

Weitao Zou; Weipeng Jing; Guangsheng Chen; Yang Lu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11944 LNCS, pp. 285–299 (2020)

DOI: 10.1007/978-3-030-38991-8_19

Abstract

In this paper, we propose Strark-H, a Spark-based storage and query strategy for large-scale spatial data that improves the response speed of spatial queries by considering both the spatial location and the category keywords of spatial objects. First, we define a custom InputFormat class so that Spark can natively read Shapefile, a common file format for storing spatial data. Then, we put forward a partitioning and indexing method for spatial storage in which spatial data is partitioned unevenly according to spatial position, which ensures that the size of each partition does not exceed the HDFS block size and preserves the spatial proximity of spatial objects in the cluster. Moreover, a secondary index is generated, comprising a global index over all partitions based on spatial position and a local index based on the category of spatial objects. Finally, we design a new data loading and query scheme based on Strark-H for spatial queries, including range queries, K-NN queries, and spatial join queries. Extensive experiments on OpenStreetMap (OSM) data show that Strark-H can be applied to Spark to natively support spatial query and storage with efficiency and scalability.
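
The abstract does not say which partitioning algorithm or index structures Strark-H uses, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a quadtree-style recursive split (an assumption), uses a hypothetical `capacity` point count as a stand-in for the HDFS block size limit, and models the global index as a list of partition bounding boxes and the local index as a per-partition dictionary keyed by category. All function names are illustrative.

```python
from collections import defaultdict

MAX_DEPTH = 16  # assumed guard so duplicate points cannot recurse forever


def split(points, box, capacity, depth=0):
    """Quarter `box` recursively until each cell holds at most `capacity` points.

    Dense regions are split more often than sparse ones, so the resulting
    partitions are uneven in area but bounded in size.
    """
    if len(points) <= capacity or depth >= MAX_DEPTH:
        return [(box, points)]
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quads = defaultdict(list)
    for p in points:
        quads[(p[0] >= xmid, p[1] >= ymid)].append(p)
    cells = []
    for (right, top), pts in quads.items():
        sub = (xmid if right else xmin, ymid if top else ymin,
               xmax if right else xmid, ymax if top else ymid)
        cells.extend(split(pts, sub, capacity, depth + 1))
    return cells


def build_index(points, box, capacity):
    """Return (global_index, local_index) for points given as (x, y, category)."""
    global_index = []  # [(bounding_box, partition_id)], spatial position only
    local_index = {}   # partition_id -> {category: [points]}
    for pid, (cell, pts) in enumerate(split(points, box, capacity)):
        by_category = defaultdict(list)
        for p in pts:
            by_category[p[2]].append(p)
        global_index.append((cell, pid))
        local_index[pid] = dict(by_category)
    return global_index, local_index


def range_query(global_index, local_index, query, category):
    """Find points of `category` inside the rectangle `query`."""
    qx0, qy0, qx1, qy1 = query
    hits = []
    for (x0, y0, x1, y1), pid in global_index:
        # Global index: skip partitions whose box misses the query window.
        if x1 < qx0 or x0 > qx1 or y1 < qy0 or y0 > qy1:
            continue
        # Local index: read only the requested category in this partition.
        for x, y, cat in local_index[pid].get(category, []):
            if qx0 <= x <= qx1 and qy0 <= y <= qy1:
                hits.append((x, y, cat))
    return hits
```

In this sketch a range query touches only the partitions whose bounding boxes intersect the query window, and within each of those only the objects of the requested category, which is the kind of pruning a combined spatial and category index is meant to provide.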

Cite

Zou, W., Jing, W., Chen, G., & Lu, Y. (2020). Strark-H: A Strategy for Spatial Data Storage to Improve Query Efficiency Based on Spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11944 LNCS, pp. 285–299). Springer. https://doi.org/10.1007/978-3-030-38991-8_19
