Many data mining algorithms are distance-based and can benefit from a database index that accelerates similarity search. Examples include clustering algorithms such as DBSCAN, nearest-neighbor classification, and the local outlier factor (LOF). However, choosing an appropriate index requires knowledge and experience, so the choice is commonly left to the user, or a default is used that is known to work in many cases. In this article, we discuss a system with a query optimizer for such queries that can automatically choose and create an appropriate index. It can reuse suitable indexes that are already present, and its memory management can automatically drop unused auto-created indexes when memory is scarce. The system is integrated into the ELKI data mining framework version 0.8.0, released along with this paper, and is used automatically by many algorithms in the toolkit.
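The three behaviors described above (choosing an index automatically, reusing an existing one, and dropping auto-created indexes under memory pressure) can be illustrated with a minimal Python sketch. This is not ELKI's implementation or API: the class `IndexManager`, the function `choose_index_kind`, the selection thresholds, the abstract memory units, and the least-recently-used eviction order are all assumptions made for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Index:
    kind: str           # e.g. "kd-tree" or "vp-tree"
    size: int           # estimated memory cost, in arbitrary units
    auto_created: bool  # only auto-created indexes may be dropped


def choose_index_kind(dim, distance):
    """Hypothetical selection rule; not the heuristic used by ELKI."""
    if distance == "euclidean" and dim <= 10:
        return "kd-tree"   # assumed to suit low-dimensional data
    if distance in ("euclidean", "manhattan"):
        return "vp-tree"   # assumed fallback for metric distances
    return None            # no suitable index: use a linear scan


class IndexManager:
    def __init__(self, memory_budget):
        self.memory_budget = memory_budget
        # (dataset, distance) -> Index, ordered from least to most recently used
        self.indexes = OrderedDict()

    def used_memory(self):
        return sum(ix.size for ix in self.indexes.values())

    def add_user_index(self, dataset, distance, kind, size):
        # Indexes added explicitly by the user are never dropped automatically.
        self.indexes[(dataset, distance)] = Index(kind, size, auto_created=False)

    def get_index(self, dataset, dim, distance, size):
        key = (dataset, distance)
        if key in self.indexes:
            # Reuse an index that is already present.
            self.indexes.move_to_end(key)
            return self.indexes[key]
        kind = choose_index_kind(dim, distance)
        if kind is None:
            return None
        self._make_room(size)
        if self.used_memory() + size > self.memory_budget:
            return None  # not enough memory could be freed: linear scan
        index = Index(kind, size, auto_created=True)
        self.indexes[key] = index
        return index

    def _make_room(self, needed):
        # Drop auto-created indexes, least recently used first.
        for key in list(self.indexes):
            if self.used_memory() + needed <= self.memory_budget:
                break
            if self.indexes[key].auto_created:
                del self.indexes[key]
```

In this sketch, a caller such as a clustering routine would request `manager.get_index(...)` before running its queries and fall back to a linear scan when the result is `None`.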
CITATION STYLE
Schubert, E. (2022). Automatic Indexing for Similarity Search in ELKI. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13590 LNCS, pp. 205–213). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-17849-8_16