Partition-based similarity join in high dimensional data spaces

Hyoseop Shin; Bongki Moon; Sukho Lee

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 2453 (2002), pp. 741–750

DOI: 10.1007/3-540-46146-9_73


Abstract

From the standpoint of search performance, it is undesirable to partition a high-dimensional data space by dividing every dimension. The number of cells produced by partitioning explodes (it grows exponentially) as the number of partitioning dimensions increases, which makes any search method based on space partitioning impractical. To address this problem, we propose an algorithm that dynamically selects the partitioning dimensions using a data sampling method, for efficient similarity join processing. Furthermore, we propose a disk-based plane-sweeping method that minimizes the cost of joins between the partitioned cells. Experimental results show that the proposed schemes substantially improve the performance of partition-based similarity joins in high-dimensional data spaces.
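The abstract does not state the exact dimension-selection rule or the on-disk layout, so the following is only a minimal in-memory sketch under explicit assumptions: Euclidean distance, a grid of ε-wide cells over the selected dimensions, and a hypothetical selection heuristic (keep the dimensions whose sampled range spans the most ε-wide intervals) standing in for the authors' sampling-based method. The function names `cell_count`, `select_partition_dims`, `sweep_pair` and `partition_join` are illustrative, not taken from the paper. The sketch shows why cutting every dimension is impractical and how a plane sweep along one non-partitioned dimension prunes candidate pairs between neighbouring cells.

```python
import itertools
import math
import random


def cell_count(cells_per_dim: int, num_dims: int) -> int:
    """Cells produced by cutting each of `num_dims` dimensions into `cells_per_dim` intervals."""
    return cells_per_dim ** num_dims


def select_partition_dims(points, eps, max_dims, sample_size=100, seed=0):
    """HYPOTHETICAL heuristic (not the paper's rule): sample the data and keep the
    `max_dims` dimensions whose sampled range spans the most eps-wide intervals."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))

    def span_in_eps_units(j):
        values = [p[j] for p in sample]
        return (max(values) - min(values)) / eps

    ranked = sorted(range(len(points[0])), key=span_in_eps_units, reverse=True)
    return sorted(ranked[:max_dims])


def sweep_pair(points, left, right, eps, sweep_dim, out):
    """In-memory plane sweep over two cells whose members are sorted on `sweep_dim`."""
    start = 0
    for i in left:
        x = points[i][sweep_dim]
        # The lower window edge x - eps only grows, so `start` never moves back.
        while start < len(right) and points[right[start]][sweep_dim] < x - eps:
            start += 1
        k = start
        while k < len(right) and points[right[k]][sweep_dim] <= x + eps:
            j = right[k]
            if i != j and math.dist(points[i], points[j]) <= eps:
                out.add((min(i, j), max(i, j)))
            k += 1


def partition_join(points, eps, dims, sweep_dim):
    """Self-join: all index pairs (i, j), i < j, with Euclidean distance <= eps."""
    cells = {}
    for idx, p in enumerate(points):
        # eps-wide grid over the selected dimensions only.
        key = tuple(math.floor(p[j] / eps) for j in dims)
        cells.setdefault(key, []).append(idx)
    for members in cells.values():
        members.sort(key=lambda m: points[m][sweep_dim])

    result = set()
    # A qualifying pair differs by at most one cell along each selected dimension.
    offsets = list(itertools.product((-1, 0, 1), repeat=len(dims)))
    for key, left in cells.items():
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            # Visit each unordered pair of neighbouring cells only once.
            if other < key or other not in cells:
                continue
            sweep_pair(points, left, cells[other], eps, sweep_dim, result)
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [tuple(rng.random() for _ in range(8)) for _ in range(500)]
    eps = 0.3
    dims = select_partition_dims(pts, eps, max_dims=2)
    sweep = next(j for j in range(8) if j not in dims)
    print("partition dims:", dims, "sweep dim:", sweep)
    print("pairs found:", len(partition_join(pts, eps, dims, sweep)))
```

With ten intervals per dimension, `cell_count(10, 16)` is 10**16 cells, while partitioning only two selected dimensions gives 100. Because cells are ε wide, any qualifying pair falls in the same or an adjacent cell along each partitioning dimension, so only the 3^k neighbouring cells of each cell need to be swept; the disk-based aspects of the authors' method are not modeled here.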

Citation (APA)

Shin, H., Moon, B., & Lee, S. (2002). Partition-based similarity join in high dimensional data spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2453, pp. 741–750). Springer Verlag. https://doi.org/10.1007/3-540-46146-9_73
