We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe 𝕌, we want to compute the set A ⋈ B = {(a, b) ∈ A × B | d(a, b) ≤ r} efficiently, where d : 𝕌 × 𝕌 → ℝ+ is a metric distance function and r ∈ ℝ+ is a user-supplied query radius. In particular, we are interested in the case where no index is available for either A or B, and we cannot afford to build one. In this paper we improve the Quickjoin algorithm (Jacox and Samet, 2008), which is based on the well-known Quicksort algorithm, by (i) replacing the low-level component that handles small subsets, essentially a brute-force nested loop, with a more efficient method; (ii) showing that, in contrast to Quicksort, unbalanced partitioning can improve Quickjoin; and (iii) making the algorithm probabilistic while still obtaining most of the relevant results. We also show how to use Quickjoin to compute k-nearest neighbor joins. The experimental results show that the method works well in practice. © 2013 Springer-Verlag.
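To make the join definition concrete, here is a minimal Python sketch of the brute-force nested-loop baseline, i.e. the kind of component the abstract says Quickjoin uses for small subsets. It is not the paper's improved method. The function name `range_join_bruteforce` and the choice of Euclidean distance (`math.dist`) as the metric are assumptions made only for illustration; any function satisfying the metric axioms could be passed as `d`.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def range_join_bruteforce(
    A: Sequence[T],
    B: Sequence[T],
    d: Callable[[T, T], float],
    r: float,
) -> List[Tuple[T, T]]:
    """Return every pair (a, b) in A x B with d(a, b) <= r.

    Hypothetical helper: evaluates the distance for all |A| * |B| pairs,
    so it uses no index and no partitioning.
    """
    result = []
    for a in A:
        for b in B:
            if d(a, b) <= r:
                result.append((a, b))
    return result


# Usage with 2-D points and Euclidean distance (an assumed metric).
A = [(0.0, 0.0), (3.0, 4.0)]
B = [(0.0, 1.0), (10.0, 10.0)]
pairs = range_join_bruteforce(A, B, math.dist, r=1.5)
print(pairs)
```

This baseline performs |A| · |B| distance computations, which is the cost that partitioning-based approaches such as Quickjoin aim to reduce.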
Fredriksson, K., & Braithwaite, B. (2013). Quicker similarity joins in metric spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8199 LNCS, pp. 127–140). https://doi.org/10.1007/978-3-642-41062-8_13