Streaming spatial-textual data, i.e., data that carries both geographic and textual information such as geo-tagged tweets, is growing at an unprecedented rate. Continuous spatial-textual queries, which retrieve real-time results continuously over large-scale spatial-textual streams, are a basic operation on such data and call for efficient distributed processing. However, existing proposals are either spatial-aware only or exploit textual information only superficially for pruning. We propose HASTE, a distributed system for hybrid and adaptive processing of streaming spatial-textual data. Its novelty lies in three aspects: (1) we propose a method that reduces the workload in advance by dividing objects and queries into mutually exclusive types; (2) we develop a load partitioning strategy and a cost model that consider both spatial and textual properties; (3) we design a multi-level load adjustment strategy that adaptively copes with different degrees of load imbalance. We report on extensive experiments with real-world data that offer insight into the performance of the solution and show that it is capable of outperforming state-of-the-art proposals.
Yang, Z., Zheng, B., Tong, C., Weng, L., Li, C., & Li, G. (2021). HASTE: A Distributed System for Hybrid and Adaptive Processing on Streaming Spatial-Textual Data. In International Conference on Information and Knowledge Management, Proceedings (pp. 2363–2372). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482435