Aligning points of interest (POIs) from heterogeneous geographical data sources is an important task that helps extend map data with information from different datasets. This task poses several challenges, including differences in type hierarchies, differences in labels (formats, languages, and levels of detail), and deviations in the coordinates. Scalability is another major issue, as global-scale datasets may contain tens or hundreds of millions of entities. In this paper, we propose the GeographicaL Entities AligNment (GLEAN) system, which efficiently matches large geographical datasets using spatial partitioning with an adaptable margin. We introduce a text similarity measure based on the local-context relevance of tokens, used in combination with sentence embeddings, and we propose a scalable type embedding model. Finally, we demonstrate that the system can efficiently handle the alignment of large datasets, and that the proposed entity similarity measure improves the quality of the alignments.
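The idea of spatial partitioning with a margin can be illustrated with a minimal sketch. This is not the GLEAN implementation: the uniform latitude/longitude grid, the fixed margin in degrees, and the names `cell_of`, `cells_with_margin`, and `candidate_pairs` are all assumptions made for illustration. Each source entity is assigned to exactly one grid cell, while each target entity is replicated into every cell whose margin-expanded bounds contain it, so that entities lying close to a cell border can still be compared.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, size):
    """Index of the grid cell (hypothetical uniform grid) containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def cells_with_margin(lat, lon, size, margin):
    """All cells whose bounds, expanded by `margin` degrees, contain the point."""
    lo_r, lo_c = cell_of(lat - margin, lon - margin, size)
    hi_r, hi_c = cell_of(lat + margin, lon + margin, size)
    return [(r, c)
            for r in range(lo_r, hi_r + 1)
            for c in range(lo_c, hi_c + 1)]


def candidate_pairs(source, target, size, margin):
    """Return (source_id, target_id) pairs that share a partition.

    `source` and `target` are lists of (id, lat, lon) tuples. Source
    entities live in one cell each, so every pair is produced at most once.
    """
    partitions = defaultdict(list)
    for tid, lat, lon in target:
        # Replicate target entities into neighbouring cells within the margin.
        for cell in cells_with_margin(lat, lon, size, margin):
            partitions[cell].append(tid)

    pairs = []
    for sid, lat, lon in source:
        for tid in partitions.get(cell_of(lat, lon, size), []):
            pairs.append((sid, tid))
    return pairs
```

In this sketch only the candidate pairs within a partition would be passed on to a more expensive similarity comparison, and the partitions can be processed independently. A larger margin tolerates larger coordinate deviations at the cost of more replicated entities; how the margin is actually adapted in GLEAN is described in the paper, not here.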
CITATION STYLE
Melo, A., Er-Rahmadi, B., & Pan, J. Z. (2022). A System for Aligning Geographical Entities from Large Heterogeneous Sources. ISPRS International Journal of Geo-Information, 11(2). https://doi.org/10.3390/ijgi11020096