A comprehensive study of iDistance partitioning strategies for kNN queries and high-dimensional data indexing

Michael A. Schuh; Tim Wylie; Juan M. Banda; Rafal A. Angryk

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7968 LNCS, pp. 238–252 (2013)

DOI: 10.1007/978-3-642-39467-6_22

Abstract

Efficient database indexing and information retrieval tasks such as k-nearest neighbor (kNN) search remain difficult challenges for large-scale, high-dimensional data. In this work, we perform the first comprehensive analysis of different partitioning strategies for the state-of-the-art high-dimensional indexing technique iDistance. This work greatly extends the discussion of why certain strategies work better than others over datasets of various distributions, dimensionality, and size. Through the use of novel partitioning strategies and extensive experimentation on real and synthetic datasets, our results establish an up-to-date iDistance benchmark for efficient kNN querying of large-scale and high-dimensional data and highlight the inherent difficulties associated with such tasks. We show that partitioning strategies can greatly affect the performance of iDistance and outline current best practices for using the indexing algorithm in modern applications or comparative evaluations. © 2013 Springer-Verlag.

Schuh, M. A., Wylie, T., Banda, J. M., & Angryk, R. A. (2013). A comprehensive study of iDistance partitioning strategies for kNN queries and high-dimensional data indexing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7968 LNCS, pp. 238–252). https://doi.org/10.1007/978-3-642-39467-6_22
