Properly designed bulk-loading techniques are more efficient than the conventional tuple-loading method for constructing a multidimensional index tree over a large data set. Although a number of bulk-loading algorithms have been proposed in the literature, most of them were designed for continuous data spaces (CDSs) and cannot be directly applied to non-ordered discrete data spaces (NDDSs). In this paper, we present a new space-partitioning-based bulk-loading algorithm for the NSP-tree, a multidimensional index tree recently developed for NDDSs. The algorithm constructs the target NSP-tree by repeatedly partitioning the underlying NDDS for a given data set until the input vectors in every subspace fit into a leaf node. Strategies to increase the efficiency of the algorithm, such as multi-way splitting, memory buffering and balanced space partitioning, are employed. Histograms that characterize the data distribution in a subspace are used to decide space partitions. Our experiments show that the proposed bulk-loading algorithm is more efficient than both the tuple-loading algorithm and a popular generic bulk-loading algorithm that could be utilized to build the NSP-tree. © 2008 Springer-Verlag Berlin Heidelberg.
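The general idea of histogram-guided, balanced space partitioning described in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes in-memory data, uses two-way splits only (no multi-way splitting or memory buffering), and picks partitions with a simple greedy heuristic. The names `bulk_load`, `balanced_letter_split`, and `iter_leaves`, and the dictionary-based node layout, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def balanced_letter_split(histogram):
    """Greedily divide the letters of one dimension into two groups
    whose total vector counts are as close as this heuristic gets."""
    groups = ([], [])
    sizes = [0, 0]
    # Place frequent letters first; ties are broken by letter for determinism.
    for letter, count in sorted(histogram.items(), key=lambda kv: (-kv[1], kv[0])):
        i = 0 if sizes[0] <= sizes[1] else 1
        groups[i].append(letter)
        sizes[i] += count
    return set(groups[0]), set(groups[1]), abs(sizes[0] - sizes[1])


def bulk_load(vectors, leaf_capacity):
    """Recursively partition the discrete space until each subspace's
    vectors fit into one leaf (illustrative sketch, not the NSP-tree)."""
    if len(vectors) <= leaf_capacity:
        return {"leaf": list(vectors)}

    best = None
    for dim in range(len(vectors[0])):
        # Histogram: how many vectors carry each letter on this dimension.
        hist = Counter(v[dim] for v in vectors)
        if len(hist) < 2:
            continue  # this dimension cannot separate the vectors
        left, right, imbalance = balanced_letter_split(hist)
        if best is None or imbalance < best[0]:
            best = (imbalance, dim, left, right)

    if best is None:
        # All vectors are identical; keep them in one oversized leaf.
        return {"leaf": list(vectors)}

    _, dim, left, right = best
    left_vecs = [v for v in vectors if v[dim] in left]
    right_vecs = [v for v in vectors if v[dim] in right]
    return {
        "dim": dim,
        "children": [
            (left, bulk_load(left_vecs, leaf_capacity)),
            (right, bulk_load(right_vecs, leaf_capacity)),
        ],
    }


def iter_leaves(node):
    """Yield the vector list stored in every leaf of the sketch tree."""
    if "leaf" in node:
        yield node["leaf"]
    else:
        for _, child in node["children"]:
            yield from iter_leaves(child)
```

For example, calling `bulk_load` on all 64 three-letter vectors over the alphabet `ACGT` with `leaf_capacity=4` yields leaves that together hold every input vector exactly once, none exceeding the capacity. Each internal node records the dimension it splits on and the letter set that defines each child subspace.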
CITATION STYLE
Qian, G., Seok, H. J., Zhu, Q., & Pramanik, S. (2008). Space-partitioning-based bulk-loading for the NSP-Tree in non-ordered discrete data spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5181 LNCS, pp. 404–418). https://doi.org/10.1007/978-3-540-85654-2_37