Scientific applications are generating an ever-increasing volume of multi-dimensional data that are largely processed inside distributed array databases and frameworks. Similarity join is a fundamental operation across scientific workloads that requires complex processing over an unbounded number of pairs of multi-dimensional points. In this paper, we introduce a novel distributed similarity join operator for multi-dimensional arrays. Unlike immediate extensions of array join and relational similarity join, the proposed operator minimizes overall data transfer and network congestion while providing load balancing, without completely repartitioning and replicating the input arrays. We formally define array similarity join and present the design, optimization strategies, and evaluation of the first array similarity join operator.
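To make the basic operation concrete, here is a minimal, centralized sketch of a generic similarity join. It is not the distributed operator described above, and it may not match the paper's formal definition. It assumes sparse arrays represented as Python dicts that map integer coordinate tuples to cell values. It treats two cells as similar when the L1 distance between their coordinates is at most a threshold `eps`. The names `similarity_join` and `l1_distance` are hypothetical.

```python
from itertools import product


def l1_distance(p, q):
    """Manhattan (L1) distance between two equal-length coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def similarity_join(left, right, eps, dist=l1_distance):
    """Naive all-pairs similarity join over two sparse arrays.

    Assumption (illustrative only): arrays are {coordinate_tuple: value} dicts,
    and a pair of cells matches when dist(i, j) <= eps.
    Returns a list of (i, j, left_value, right_value) tuples in the order
    produced by iterating left cells, then right cells.
    """
    result = []
    # Compare every cell of `left` with every cell of `right` (quadratic cost).
    for (i, u), (j, v) in product(left.items(), right.items()):
        if dist(i, j) <= eps:
            result.append((i, j, u, v))
    return result


# Usage: two small 2-D sparse arrays.
A = {(0, 0): 1.0, (2, 3): 2.0}
B = {(0, 1): 5.0, (5, 5): 7.0}
print(similarity_join(A, B, eps=1))  # [((0, 0), (0, 1), 1.0, 5.0)]
```

This all-pairs baseline scales with the product of the input sizes. That cost is why a practical operator has to prune candidate pairs rather than compare every cell against every other cell.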
Zhao, W., Rusu, F., Dong, B., & Wu, K. (2016). Similarity join over array data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Vol. 26-June-2016, pp. 2007–2022). Association for Computing Machinery. https://doi.org/10.1145/2882903.2915247