With the rising amount of publicly available data, data-driven modeling is becoming increasingly popular. Geospatial data is one of the most important facets of this data and can be combined with virtually any real-world data science application. However, there is a lack of customized geospatial datasets that can be used in data science applications across different domains, e.g., hydrology, political science, climatology, and agriculture. This paper introduces a Spark-based extractor that can extract rich geospatial datasets from OpenStreetMap (OSM). OSM hosts crowd-sourced geospatial data that represents a variety of natural and human-made features, e.g., lakes, buildings, and roads. This data is extremely large and requires complex processing before it is ready to use in data science. The proposed extractor runs on Apache Spark SQL, which allows it to scale to the Planet.osm file, which covers the entire world. In addition to the extractor, we make the extracted data available in various standard formats, e.g., GeoJSON, CSV, KML, and Shapefile. Furthermore, we host these datasets on UCR-Star, which allows users to visually explore them and download any subset of the data for any geospatial region.
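To make the kind of extraction described above concrete, the following is a minimal single-machine sketch in plain Python. It is not the paper's Spark SQL implementation. It assumes OSM's general data model (nodes carry coordinates; ways reference nodes by id and carry key-value tags) and shows how tagged ways could be filtered and written as GeoJSON. The sample records and the function names `way_to_feature` and `extract` are hypothetical and exist only for illustration.

```python
import json

# Hypothetical hand-written sample following OSM's data model:
# nodes map an id to (longitude, latitude); ways list node ids and tags.
NODES = {
    1: (-117.0, 34.0),
    2: (-116.9, 34.0),
    3: (-116.9, 34.1),
    4: (-117.0, 34.1),
}
WAYS = [
    {"id": 100, "refs": [1, 2, 3, 4, 1], "tags": {"natural": "water"}},
    {"id": 101, "refs": [1, 2], "tags": {"highway": "residential"}},
]


def way_to_feature(way, nodes):
    """Resolve a way's node references and build a GeoJSON Feature."""
    coords = [list(nodes[ref]) for ref in way["refs"]]
    # Simplifying assumption: any closed way with at least four positions
    # is treated as a polygon; real OSM data needs tag-aware rules.
    closed = len(coords) >= 4 and coords[0] == coords[-1]
    if closed:
        geometry = {"type": "Polygon", "coordinates": [coords]}
    else:
        geometry = {"type": "LineString", "coordinates": coords}
    return {
        "type": "Feature",
        "id": way["id"],
        "geometry": geometry,
        "properties": dict(way["tags"]),
    }


def extract(ways, nodes, key, value):
    """Keep only ways whose tag `key` equals `value`, as a FeatureCollection."""
    features = [
        way_to_feature(way, nodes)
        for way in ways
        if way["tags"].get(key) == value
    ]
    return {"type": "FeatureCollection", "features": features}


lakes = extract(WAYS, NODES, "natural", "water")
print(json.dumps(lakes, indent=2))
```

At planet scale, the lookup from node ids to coordinates cannot be held in an in-memory dictionary as it is here, which is one reason a distributed engine such as Spark SQL is a natural fit for this workload.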
Singla, S., Zhang, Y., & Eldawy, A. (2022). OSMX: spark-based geospatial data extractor from OpenStreetMap. In GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems. Association for Computing Machinery. https://doi.org/10.1145/3557915.3560954