Over the past decade, geospatial data has expanded significantly, accompanied by rapid technological advances. However, its adoption across industries has been partly constrained by the lack of an efficient data system that can keep pace with these advances. Among potential solutions, the Lakehouse approach is the most promising, yet existing Lakehouse systems lack integrated support for geospatial data. To address this gap, we present GeoLake, the industry's first universal solution for geospatial data tailored to Lakehouse systems. GeoLake supports storing geometry values natively in Lakehouse tables and enables querying and processing of geospatial tables using Spatial SQL. It also brings spatial partitioning and spatial predicate pushdown into the Lakehouse architecture; these techniques speed up geospatial queries through an optimized data layout and efficient data skipping. In addition, GeoLake includes a Parquet file format designed for rapid scanning of encoded geometry values. Our experiments with real-world, large-scale geospatial datasets demonstrate significant performance improvements. Moreover, companies in several industries, including automotive manufacturing, automobile insurance, and smart city initiatives, some of them Fortune 500 enterprises, are actively adopting or considering GeoLake to manage and analyze geospatial data at scale.
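To make the idea of data skipping under a spatial predicate concrete, the following is a minimal sketch and not GeoLake's actual implementation. It assumes that each data file carries the bounding box of the geometries it contains, and that a query window is itself a bounding box; the names `DataFile`, `intersects`, and `prune_files`, as well as the file paths, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed representation of a bounding box: (min_x, min_y, max_x, max_y).
BBox = Tuple[float, float, float, float]


@dataclass
class DataFile:
    path: str   # hypothetical file name
    bbox: BBox  # assumed per-file statistic covering all geometries in the file


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if two axis-aligned bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune_files(files: List[DataFile], query: BBox) -> List[DataFile]:
    """Keep only files whose bounding box could contain matching geometries.

    Files that fail this test cannot satisfy the spatial predicate and can be
    skipped without being read.
    """
    return [f for f in files if intersects(f.bbox, query)]


files = [
    DataFile("a.parquet", (0.0, 0.0, 10.0, 10.0)),
    DataFile("b.parquet", (20.0, 20.0, 30.0, 30.0)),
    DataFile("c.parquet", (5.0, 5.0, 25.0, 25.0)),
]
query_window = (8.0, 8.0, 12.0, 12.0)
candidates = prune_files(files, query_window)
print([f.path for f in candidates])
```

In this sketch, a layout that groups nearby geometries into the same file (the role spatial partitioning plays) yields tighter, less overlapping bounding boxes, so more files can be skipped for a given query window. The surviving files still need an exact geometry test, because a bounding-box overlap is only a necessary condition for a match.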
Zhang, Y., Peng, B., Du, Y., & Su, J. (2023). GeoLake: Bringing Geospatial Support to Lakehouses. IEEE Access, 11, 143037–143049. https://doi.org/10.1109/ACCESS.2023.3343953