NN-Descent on high-dimensional data


Abstract

K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n^2), which can be prohibitive for large-scale data sets. To achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar, and Li. Empirical evidence suggests that the practical complexity of this algorithm is in Õ(n^1.14), a significant improvement over brute-force construction. However, NN-Descent has a major drawback: it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. We link the performance of NN-Descent to the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs, that is, points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.
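The abstract references two technical ideas: the NN-Descent refinement loop and hubness as measured by K-NNG in-degrees. The Python sketch below, written for this summary rather than taken from the paper, illustrates both under simplifying assumptions (Euclidean distance, no neighbor sampling, none of the incremental-update bookkeeping of the original algorithm); all function names and parameters are illustrative.

```python
import numpy as np

def nn_descent(X, K, max_iters=10, seed=0):
    """Minimal NN-Descent sketch: start from random neighbor lists and
    repeatedly refine them by examining neighbors-of-neighbors.
    This omits the sampling and early-termination refinements of the
    published algorithm; it is a sketch, not the authors' code."""
    rng = np.random.default_rng(seed)
    n = len(X)
    dist = lambda i, j: float(np.linalg.norm(X[i] - X[j]))
    # Initialize every point with K distinct random neighbors (skipping itself).
    nbrs = []
    for i in range(n):
        cand = rng.choice(n - 1, size=K, replace=False)
        nbrs.append({int(c) if c < i else int(c) + 1 for c in cand})
    for _ in range(max_iters):
        # "General" neighbors: union of forward and reverse K-NN edges.
        general = [set(s) for s in nbrs]
        for i in range(n):
            for j in nbrs[i]:
                general[j].add(i)
        updates = 0
        for i in range(n):
            # Candidate pool: neighbors of i's general neighbors.
            cand = set()
            for j in general[i]:
                cand |= general[j]
            cand.discard(i)
            # Keep the K closest among current neighbors and candidates.
            best = set(sorted(nbrs[i] | cand, key=lambda j: dist(i, j))[:K])
            if best != nbrs[i]:
                nbrs[i] = best
                updates += 1
        if updates == 0:  # converged: no neighbor list changed this pass
            break
    return nbrs

# Hubness check on a synthetic high-dimensional data set: in-degree
# counts of the resulting K-NNG reveal hub points.
X = np.random.default_rng(1).standard_normal((300, 100))
graph = nn_descent(X, K=10)
indeg = np.zeros(len(X), dtype=int)
for i, s in enumerate(graph):
    for j in s:
        indeg[j] += 1
print("mean in-degree:", indeg.mean(), "max in-degree:", indeg.max())
```

Since every point keeps exactly K out-neighbors, the mean in-degree is always K; hubness shows up as a heavy right tail of the in-degree distribution, i.e. a maximum in-degree far above that mean, which grows with the data's intrinsic dimensionality.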

Cite (APA)

Bratić, B., Houle, M. E., Kurbalija, V., Oria, V., & Radovanović, M. (2018). NN-Descent on high-dimensional data. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3227609.3227643
