Partition-based similarity join in high dimensional data spaces


Abstract

From a performance perspective, it is undesirable for search algorithms to partition a high-dimensional data space by dividing every dimension. The number of cells produced by partitioning explodes as the number of partitioning dimensions increases, which makes any search method based on space partitioning impractical. To address this problem, we propose an algorithm that dynamically selects partitioning dimensions using a data sampling method for efficient similarity join processing. Furthermore, we propose a disk-based plane sweeping method to minimize the cost of joins between the partitioned cells. The experimental results show that the proposed schemes substantially improve the performance of partition-based similarity joins in high-dimensional data spaces.
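To make the idea concrete, here is a minimal in-memory sketch of a partition-based epsilon-similarity join that partitions on only a few dimensions. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the selection criterion, so the sketch assumes a simple stand-in (pick the `k` dimensions with the largest variance in a random sample), and the plane sweep runs in memory rather than on disk. All function names (`select_dims`, `cell_of`, `sweep_join`, `similarity_join`) are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict


def select_dims(points, k, sample_size=100, seed=0):
    """Pick k partitioning dimensions from a random sample.

    Assumption: largest sample variance is used as the criterion; the
    paper's actual sampling-based criterion is not given in the abstract.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))

    def variance(j):
        vals = [p[j] for p in sample]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    return sorted(range(len(points[0])), key=variance, reverse=True)[:k]


def cell_of(p, dims, eps):
    """Grid cell of p, using cells of width eps on the chosen dimensions only."""
    return tuple(math.floor(p[j] / eps) for j in dims)


def sweep_join(a, b, eps, sweep_dim):
    """Plane sweep: only compare pairs within eps of each other on sweep_dim."""
    a = sorted(a, key=lambda ip: ip[1][sweep_dim])
    b = sorted(b, key=lambda ip: ip[1][sweep_dim])
    out, start = [], 0
    for i, p in a:
        # Advance past points of b that are too far behind the sweep line.
        while start < len(b) and b[start][1][sweep_dim] < p[sweep_dim] - eps:
            start += 1
        t = start
        while t < len(b) and b[t][1][sweep_dim] <= p[sweep_dim] + eps:
            j, q = b[t]
            if math.dist(p, q) <= eps:  # full-dimensional distance check
                out.append((i, j))
            t += 1
    return out


def similarity_join(r, s, eps, k=2):
    """Return sorted index pairs (i, j) with dist(r[i], s[j]) <= eps."""
    dims = select_dims(r + s, k)
    grid_r, grid_s = defaultdict(list), defaultdict(list)
    for i, p in enumerate(r):
        grid_r[cell_of(p, dims, eps)].append((i, p))
    for j, q in enumerate(s):
        grid_s[cell_of(q, dims, eps)].append((j, q))
    # With cell width eps, matching points lie in the same or an adjacent cell.
    offsets = list(itertools.product((-1, 0, 1), repeat=len(dims)))
    result = []
    for cell, bucket_r in grid_r.items():
        for off in offsets:
            bucket_s = grid_s.get(tuple(c + o for c, o in zip(cell, off)))
            if bucket_s:
                result.extend(sweep_join(bucket_r, bucket_s, eps, dims[0]))
    return sorted(result)
```

Partitioning on `k` dimensions means each cell has 3^k neighbouring cells to join against instead of 3^d, which is the cell explosion the abstract refers to. For example, `similarity_join([(0, 0, 0)], [(0.1, 0, 0), (5, 5, 5)], eps=0.5)` pairs the first point of each input and skips the distant one.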

Cite

Shin, H., Moon, B., & Lee, S. (2002). Partition-based similarity join in high dimensional data spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2453, pp. 741–750). Springer Verlag. https://doi.org/10.1007/3-540-46146-9_73
