Efficient bulk loading of large high-dimensional indexes

Christian Böhm; Hans Peter Kriegel

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (1999), vol. 1676, pp. 251-260

DOI: 10.1007/3-540-48298-9_27

Abstract

Efficient index construction in multidimensional data spaces is important for many knowledge discovery algorithms, because construction times typically must be amortized by performance gains in query processing. In this paper, we propose a generic bulk loading method that allows user-defined split strategies to be applied during index construction. This approach allows the index properties to be adapted to the requirements of a specific knowledge discovery algorithm. Because large data sets do not fit in main memory, our algorithm is based on external sorting. Decisions of the split strategy can be made according to a sample of the data set, which is selected automatically. The sort algorithm is a variant of the well-known Quicksort algorithm, enhanced to work on secondary storage. The index construction has a runtime complexity of O(n log n). We show both analytically and experimentally that the algorithm outperforms traditional index construction methods by large factors.
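The general idea of the abstract (top-down bulk loading with a pluggable split strategy that decides from a sample) can be illustrated with a minimal sketch. This is not the authors' algorithm: it runs entirely in main memory, whereas the paper partitions on secondary storage with a Quicksort variant. The names `bulk_load`, `max_spread_split`, `leaves`, the dictionary node layout, and the parameters `capacity` and `sample_size` are all assumptions made for illustration.

```python
import random


def max_spread_split(sample):
    """Hypothetical split strategy (an assumption, not from the paper):
    choose the dimension with the largest spread in the sample and
    split at the sample median of that dimension."""
    dims = len(sample[0])
    dim = max(
        range(dims),
        key=lambda d: max(p[d] for p in sample) - min(p[d] for p in sample),
    )
    values = sorted(p[dim] for p in sample)
    return dim, values[len(values) // 2]


def bulk_load(points, capacity, split_strategy=max_spread_split,
              sample_size=32, rng=None):
    """Build a binary partition tree top-down.

    In-memory simplification: the split decision is made on a random
    sample, then the full point list is partitioned around the pivot,
    as in one Quicksort partitioning step.
    """
    rng = rng or random.Random(0)
    if len(points) <= capacity:
        return {"leaf": True, "points": points}

    sample = rng.sample(points, min(sample_size, len(points)))
    dim, pivot = split_strategy(sample)

    left = [p for p in points if p[dim] < pivot]
    right = [p for p in points if p[dim] >= pivot]

    if not left or not right:
        # Degenerate split (for example, many duplicate values):
        # fall back to an even cut after sorting on the chosen dimension.
        # Values equal to the pivot may then appear on both sides.
        ordered = sorted(points, key=lambda p: p[dim])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        pivot = right[0][dim]

    return {
        "leaf": False,
        "dim": dim,
        "pivot": pivot,
        "left": bulk_load(left, capacity, split_strategy, sample_size, rng),
        "right": bulk_load(right, capacity, split_strategy, sample_size, rng),
    }


def leaves(node):
    """Yield the point list of every leaf, left to right."""
    if node["leaf"]:
        yield node["points"]
    else:
        yield from leaves(node["left"])
        yield from leaves(node["right"])
```

A caller would pass its own `split_strategy` (any function mapping a sample to a `(dimension, pivot)` pair) to adapt the resulting partitioning to a particular workload; for example, `bulk_load(points, capacity=16)` builds a tree whose leaves each hold at most 16 points.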

Citation (APA)

Böhm, C., & Kriegel, H. P. (1999). Efficient bulk loading of large high-dimensional indexes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1676, pp. 251–260). Springer Verlag. https://doi.org/10.1007/3-540-48298-9_27
