Partitioning a high-dimensional data space along all of its dimensions is undesirable for search performance. The number of cells produced by partitioning grows exponentially with the number of partitioning dimensions, which makes any search method based on space partitioning impractical. To address this problem, we propose an algorithm that dynamically selects partitioning dimensions, based on a data sampling method, for efficient similarity join processing. Furthermore, we propose a disk-based plane sweeping method to minimize the cost of joins between the partitioned cells. Experimental results show that the proposed schemes substantially improve the performance of partition-based similarity joins in high-dimensional data spaces.
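To make the idea concrete, the following is a minimal in-memory sketch of a partition-based similarity join that grids only a few selected dimensions. It is not the authors' algorithm: the function names `select_dimensions` and `similarity_join`, the scoring heuristic (preferring dimensions whose sampled values collide least within one grid cell), and the parameters `k` and `sample_size` are all assumptions made for illustration. The disk-based plane sweep is not modeled; neighboring cells are compared with a plain nested loop. With c intervals per dimension, gridding all d dimensions would yield up to c^d cells, whereas gridding only k of them yields at most c^k.

```python
import itertools
import math
import random
from collections import defaultdict


def select_dimensions(points, eps, k, sample_size=200, seed=0):
    """Pick k partitioning dimensions from a random sample (hypothetical heuristic).

    A dimension scores better when the sampled points spread over many
    eps-wide intervals, i.e. when few sampled points share an interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))

    def score(d):
        counts = defaultdict(int)
        for p in sample:
            counts[math.floor(p[d] / eps)] += 1
        # Sum of squared interval counts: lower means a more even spread.
        return sum(c * c for c in counts.values())

    return sorted(range(len(points[0])), key=score)[:k]


def similarity_join(points, eps, dims):
    """Return all index pairs (i, j), i < j, with Euclidean distance <= eps.

    Only the dimensions in `dims` are used to build the eps-grid. Two points
    within eps of each other differ by at most one grid index per dimension,
    so it suffices to compare each cell with its adjacent cells.
    """
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(p[d] / eps) for d in dims)].append(i)

    offsets = list(itertools.product((-1, 0, 1), repeat=len(dims)))
    result = set()
    for key, members in cells.items():
        for off in offsets:
            neighbor = tuple(a + b for a, b in zip(key, off))
            if neighbor not in cells:
                continue
            for i in members:
                for j in cells[neighbor]:
                    # The full-dimensional distance is checked here.
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        result.add((i, j))
    return result
```

For example, `similarity_join(pts, 0.4, select_dimensions(pts, 0.4, 2))` grids a point set on two sampled dimensions and still returns every pair within distance 0.4, because the grid only prunes candidate pairs while the final distance check uses all dimensions.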
CITATION STYLE
Shin, H., Moon, B., & Lee, S. (2002). Partition-based similarity join in high dimensional data spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2453, pp. 741–750). Springer Verlag. https://doi.org/10.1007/3-540-46146-9_73