Proximity measures for rank join

Davide Martinenghi; Marco Tagliasacchi

Journal Article

ACM Transactions on Database Systems (2012) 37(1)

DOI: 10.1145/2109196.2109198

Abstract

We introduce the proximity rank join problem, where we are given a set of relations whose tuples are equipped with a score and a real-valued feature vector. Given a target feature vector, the goal is to return the K combinations of tuples with high scores that are as close as possible to the target and to each other, according to some notion of distance or dissimilarity. The setting closely resembles that of traditional rank join, but the geometry of the vector space plays a distinctive role in the computation of the overall score of a combination. Also, the input relations typically return their results either by distance from the target or by score. Because of these aspects, traditional rank join algorithms, such as the well-known HRJN, have shortcomings in solving the proximity rank join problem, as they may read more input than needed. To overcome this weakness, we define a tight bound (used as a stopping criterion) that guarantees instance optimality, that is, an I/O cost that is always within a constant factor of the optimal one. The tight bound can also be used to drive an adaptive pulling strategy, deciding at each step which relation to access next. For practically relevant classes of problems, we show how to compute the tight bound efficiently. An extensive experimental study validates our results and demonstrates significant gains over existing solutions. © 2012 ACM.
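To make the problem statement concrete, here is a minimal brute-force sketch of proximity rank join. It assumes a hypothetical aggregation function (not necessarily the one used in the paper) that rewards each tuple's score and penalizes both its squared Euclidean distance to the target and its squared distance to the centroid of the combination. The names `aggregate`, `proximity_rank_join_bruteforce`, and the weights `w_s`, `w_q`, `w_mu` are illustrative assumptions. This enumerates every combination, so it is only a correctness reference; it is not the paper's instance-optimal algorithm, which reads inputs incrementally and stops using a tight bound.

```python
import heapq
import itertools
from typing import List, Sequence, Tuple

# Hypothetical tuple layout: (id, score, feature vector)
Tup = Tuple[str, float, Tuple[float, ...]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors: List[Sequence[float]]) -> Tuple[float, ...]:
    """Component-wise mean of the vectors in a combination."""
    n, dim = len(vectors), len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / n for d in range(dim))


def aggregate(combo: Sequence[Tup], target: Sequence[float],
              w_s: float = 1.0, w_q: float = 1.0, w_mu: float = 1.0) -> float:
    """Assumed aggregation: weighted score minus distances to target and centroid."""
    mu = centroid([t[2] for t in combo])
    total = 0.0
    for _, score, x in combo:
        total += w_s * score - w_q * sq_dist(x, target) - w_mu * sq_dist(x, mu)
    return total


def proximity_rank_join_bruteforce(relations: Sequence[Sequence[Tup]],
                                   target: Sequence[float], k: int, **weights):
    """Score every combination (one tuple per relation) and keep the top k."""
    scored = ((aggregate(c, target, **weights), c)
              for c in itertools.product(*relations))
    # nlargest returns the k best (score, combination) pairs, best first
    return heapq.nlargest(k, scored, key=lambda p: p[0])


if __name__ == "__main__":
    R1 = [("a1", 0.9, (0.0, 0.0)), ("a2", 0.5, (1.0, 1.0))]
    R2 = [("b1", 0.8, (0.0, 1.0)), ("b2", 1.0, (3.0, 3.0))]
    for s, combo in proximity_rank_join_bruteforce([R1, R2], (0.0, 0.0), k=2):
        print(round(s, 3), [t[0] for t in combo])
```

In the usage example, `b2` has the highest raw score but lies far from the target and from the other tuples, so under this assumed function it never enters the top 2. This illustrates why the geometry of the vector space, and not just the tuple scores, drives the ranking. An algorithm in the spirit of HRJN would avoid materializing all combinations by pulling tuples in order and stopping once no unseen combination can beat the current K-th result.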

Citation (APA)

Martinenghi, D., & Tagliasacchi, M. (2012). Proximity measures for rank join. ACM Transactions on Database Systems, 37(1). https://doi.org/10.1145/2109196.2109198
