Optimal Dimension Order: A generic technique for the similarity join

Abstract

The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The similarity join combines two point sets of a multidimensional vector space such that the result contains all point pairs whose distance does not exceed a given parameter ε. Although the similarity join is clearly CPU bound, most previous publications propose strategies that primarily improve the I/O performance. Little effort has been made to address CPU aspects. In this paper, we show that most of the computational overhead is dedicated to the final distance computations between the feature vectors. Consequently, we propose a generic technique to reduce the response time of a large number of basic algorithms for the similarity join. It is applicable to index-based join algorithms as well as to most join algorithms based on hashing or sorting. Our technique, called Optimal Dimension Order, is able to avoid and accelerate distance calculations between feature vectors by carefully ordering the dimensions. The order is determined according to a probability model. In the experimental evaluation, we show that our technique yields high performance improvements for various underlying similarity join algorithms such as the R-tree similarity join, the breadth-first R-tree join, the Multipage Index Join, and the ε-Grid-Order. © 2002 Springer-Verlag Berlin Heidelberg.
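
To make the idea concrete, the following is a minimal sketch of a nested-loop similarity join in which the distance computation walks the dimensions in a chosen order and stops as soon as the partial squared distance exceeds ε². It is an illustration only: the abstract does not describe the probability model used in the paper, so the function `variance_order` (largest variance first) is a hypothetical stand-in heuristic, and all function names here are assumptions rather than the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def variance_order(points: Sequence[Point]) -> List[int]:
    """Hypothetical stand-in heuristic (NOT the paper's probability model):
    visit dimensions with the largest variance first, on the assumption
    that they are most likely to push the partial distance past epsilon."""
    dims = len(points[0])
    n = len(points)
    variances = []
    for d in range(dims):
        mean = sum(p[d] for p in points) / n
        variances.append(sum((p[d] - mean) ** 2 for p in points) / n)
    return sorted(range(dims), key=lambda d: variances[d], reverse=True)


def within_eps(p: Point, q: Point, eps_sq: float, order: Sequence[int]) -> bool:
    """Return True if the squared Euclidean distance does not exceed eps_sq.
    Dimensions are accumulated in the given order, with early termination."""
    total = 0.0
    for d in order:
        diff = p[d] - q[d]
        total += diff * diff
        if total > eps_sq:
            return False  # remaining dimensions cannot reduce the distance
    return True


def similarity_join(
    r: Sequence[Point],
    s: Sequence[Point],
    eps: float,
    order: Optional[Sequence[int]] = None,
) -> List[Tuple[int, int]]:
    """Nested-loop join: index pairs (i, j) with dist(r[i], s[j]) <= eps."""
    if order is None:
        order = variance_order(list(r) + list(s))
    eps_sq = eps * eps
    return [
        (i, j)
        for i, p in enumerate(r)
        for j, q in enumerate(s)
        if within_eps(p, q, eps_sq, order)
    ]
```

The dimension order affects only how early a non-qualifying pair is rejected, not which pairs are returned, which is why such an ordering can in principle be combined with index-based, hashing-based or sorting-based join algorithms that end in the same final distance test.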

Citation (APA)

Böhm, C., Krebs, F., & Kriegel, H. P. (2002). Optimal Dimension Order: A generic technique for the similarity join. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2454 LNCS, pp. 135–149). Springer Verlag. https://doi.org/10.1007/3-540-46145-0_14
