Partition-based similarity joins using diagonal dimensions in high dimensional data spaces

Hyoseop Shin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4224 LNCS 546-553

DOI: 10.1007/11875581_66

Abstract

Distributions of very high dimensional data are, in most cases, skewed rather than even. As a result, some dimensions are more effective than others for partitioning a high dimensional data set. Effective dimensions can be used to partition the data set in a more balanced way, so that the data are more evenly distributed across partitions. In this paper, we present schemes for selecting the dimensions by which high dimensional data sets are partitioned for efficient similarity joins. In particular, to efficiently reduce the number of partition dimensions, we propose a novel scheme that uses diagonal dimensions, as opposed to perpendicular dimensions. The experimental results show that the proposed schemes substantially improve the performance of partition-based similarity joins in high dimensional data spaces. © Springer-Verlag Berlin Heidelberg 2006.
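The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that a similarity join means finding all pairs of points within Euclidean distance eps, that a "perpendicular dimension" is a coordinate axis, and that a "diagonal dimension" is the unit direction midway between two axes. All function names are hypothetical. The sketch relies on one known fact: the projection of two points onto a unit direction differs by at most their distance, so pairs within eps always land in the same or adjacent cells of width eps.

```python
import math
from collections import defaultdict


def axis_direction(dim, i):
    # Unit vector along coordinate axis i (a "perpendicular" dimension).
    return [1.0 if k == i else 0.0 for k in range(dim)]


def diagonal_direction(dim, i, j):
    # Unit vector midway between axes i and j.
    # Assumption: this is what "diagonal dimension" means here.
    s = 1.0 / math.sqrt(2.0)
    return [s if k in (i, j) else 0.0 for k in range(dim)]


def project(point, direction):
    # Scalar projection of a point onto a direction vector.
    return sum(p * d for p, d in zip(point, direction))


def partition_join(points, eps, direction):
    # Assign each point to a cell of width eps along the chosen direction.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[math.floor(project(p, direction) / eps)].append(idx)

    # Compare each cell only with itself and its right-hand neighbour.
    result = set()
    for c, members in cells.items():
        candidates = members + cells.get(c + 1, [])
        for a in members:
            for b in candidates:
                if a != b and math.dist(points[a], points[b]) <= eps:
                    result.add((min(a, b), max(a, b)))
    return result


def brute_force_join(points, eps):
    # Reference result: compare every pair.
    n = len(points)
    return {(a, b) for a in range(n) for b in range(a + 1, n)
            if math.dist(points[a], points[b]) <= eps}


points = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.0), (1.0, 1.0, 1.0),
          (1.05, 0.95, 1.0), (3.0, 0.0, 0.0)]
pairs_axis = partition_join(points, 0.2, axis_direction(3, 0))
pairs_diag = partition_join(points, 0.2, diagonal_direction(3, 0, 1))
```

Both directions return the same pairs as the brute-force join; what changes with the choice of direction is how evenly the points spread over the cells, and therefore how many candidate comparisons are made.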

Cite

Shin, H. (2006). Partition-based similarity joins using diagonal dimensions in high dimensional data spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4224 LNCS, pp. 546–553). Springer Verlag. https://doi.org/10.1007/11875581_66
