The increasing pervasiveness of object-tracking technologies has enabled the collection of huge amounts of spatio-temporal trajectories. Discovering useful movement patterns from such big data is increasingly important and challenging. In this paper, we propose a distributed mining framework on Hadoop for efficiently discovering swarm patterns from big spatio-temporal trajectories in parallel. First, we define the notion of a maximal objectset, which captures swarms by recombining clusters in the timeset domain. Second, we propose a parallel model, based on the timeset-independent property of swarm patterns, to parallelize the mining process. Furthermore, we propose a distributed algorithm using a MapReduce chain architecture based on the proposed parallel model, which features two pruning strategies designed to minimize computation costs. Our empirical study on a real Taxi dataset demonstrates its effectiveness in finding object-closed swarms. Extensive experiments on 5 network-connected workstations also validate that our proposed algorithm achieves a nearly 5-fold speedup over the serial solution.
Yu, Y., Qi, J., Lu, Y., Zhang, Y., & Liu, Z. (2016). MR-Swarm: Mining swarms from big spatio-temporal trajectories using MapReduce. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9937 LNCS, pp. 568–575). Springer Verlag. https://doi.org/10.1007/978-3-319-46257-8_61