CM-tree: A dynamic clustered index for similarity search in metric databases

Abstract

Repositories of unstructured data types, such as free text, images, audio, and video, have recently emerged in various fields. A general approach to searching such data is similarity search, in which the query asks for objects similar to a given one and similarity is modeled by a metric distance function. In this article we propose a new dynamic, paged, and balanced access method for similarity search in metric data sets, named the CM-tree (Clustered Metric tree). It fully supports dynamic insertion and deletion, both of single objects and in bulk. Unlike other methods, it is designed to achieve a structure of tight, low-overlap clusters through its primary construction algorithms (rather than post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting the representative objects of nodes, a clustering-based node split algorithm together with criteria for triggering a node split, and an improved sub-tree pruning method used during search. To facilitate these methods, the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs. © 2007 Elsevier B.V. All rights reserved.
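
The abstract does not spell out the CM-tree's pruning rule, so the following is only a generic illustration of why storing pairwise distances inside a node can help a metric search. It is a minimal sketch, assuming a single in-memory node, Euclidean distance, and a range query; the names `build_pairwise` and `range_search` are hypothetical and do not come from the article. The idea shown is the standard triangle-inequality filter: once the distance from the query to one stored object (a pivot) is known, any other object whose stored distance to that pivot differs from it by more than the search radius cannot be a match, so its distance to the query never needs to be computed.

```python
from math import dist  # Euclidean distance, used here as an example metric


def build_pairwise(objects, metric=dist):
    """Precompute the full pairwise distance matrix for one node (hypothetical helper)."""
    return [[metric(a, b) for b in objects] for a in objects]


def range_search(objects, pairwise, pivot_idx, query, radius, metric=dist):
    """Return (matches, number_of_distance_computations) for a range query.

    Generic triangle-inequality filtering, not the CM-tree's own algorithm:
    d(q, o) >= |d(q, p) - d(p, o)|, so an object o is skipped whenever that
    lower bound already exceeds the radius.
    """
    d_qp = metric(query, objects[pivot_idx])  # one real distance to the pivot
    computations = 1
    matches = []
    for i, obj in enumerate(objects):
        if i == pivot_idx:
            d = d_qp
        else:
            lower_bound = abs(d_qp - pairwise[pivot_idx][i])
            if lower_bound > radius:
                continue  # pruned without computing d(query, obj)
            d = metric(query, obj)
            computations += 1
        if d <= radius:
            matches.append(obj)
    return matches, computations


node = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
table = build_pairwise(node)
found, cost = range_search(node, table, pivot_idx=0, query=(0.5, 0.0), radius=1.0)
print(found, cost)
```

In this toy node the two far-away points are discarded using only the stored distances, so the query costs two distance computations instead of four. How the CM-tree itself selects representatives and prunes sub-trees is described in the full article, not in this sketch.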

Cite

Aronovich, L., & Spiegler, I. (2007). CM-tree: A dynamic clustered index for similarity search in metric databases. Data and Knowledge Engineering, 63(3), 919–946. https://doi.org/10.1016/j.datak.2007.06.001
