NN-Descent on high-dimensional data


Abstract

K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n^2), which can be prohibitive for large-scale data sets. To achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar, and Li. Empirical evidence suggests that the practical complexity of this algorithm is in Õ(n^1.14), a significant improvement over brute-force construction. However, NN-Descent has a major drawback: it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. We link the performance of NN-Descent to the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs, that is, points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.
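The abstract references two technical ideas: the NN-Descent refinement loop and hubness as measured by K-NNG in-degrees. The Python sketch below, written for this summary rather than taken from the paper, illustrates both under simplifying assumptions (Euclidean distance, no neighbor sampling, none of the incremental-update bookkeeping of the original algorithm); all function names and parameters are illustrative.

```python
import numpy as np

def nn_descent(X, K, max_iters=10, seed=0):
    """Minimal NN-Descent sketch: start from random neighbor lists and
    repeatedly refine them by examining neighbors-of-neighbors.
    This omits the sampling and early-termination refinements of the
    published algorithm; it is a sketch, not the authors' code."""
    rng = np.random.default_rng(seed)
    n = len(X)
    dist = lambda i, j: float(np.linalg.norm(X[i] - X[j]))
    # Initialize every point with K distinct random neighbors (skipping itself).
    nbrs = []
    for i in range(n):
        cand = rng.choice(n - 1, size=K, replace=False)
        nbrs.append({int(c) if c < i else int(c) + 1 for c in cand})
    for _ in range(max_iters):
        # "General" neighbors: union of forward and reverse K-NN edges.
        general = [set(s) for s in nbrs]
        for i in range(n):
            for j in nbrs[i]:
                general[j].add(i)
        updates = 0
        for i in range(n):
            # Candidate pool: neighbors of i's general neighbors.
            cand = set()
            for j in general[i]:
                cand |= general[j]
            cand.discard(i)
            # Keep the K closest among current neighbors and candidates.
            best = set(sorted(nbrs[i] | cand, key=lambda j: dist(i, j))[:K])
            if best != nbrs[i]:
                nbrs[i] = best
                updates += 1
        if updates == 0:  # converged: no neighbor list changed this pass
            break
    return nbrs

# Hubness check on a synthetic high-dimensional data set: in-degree
# counts of the resulting K-NNG reveal hub points.
X = np.random.default_rng(1).standard_normal((300, 100))
graph = nn_descent(X, K=10)
indeg = np.zeros(len(X), dtype=int)
for i, s in enumerate(graph):
    for j in s:
        indeg[j] += 1
print("mean in-degree:", indeg.mean(), "max in-degree:", indeg.max())
```

Since every point keeps exactly K out-neighbors, the mean in-degree is always K; hubness shows up as a heavy right tail of the in-degree distribution, i.e. a maximum in-degree far above that mean, which grows with the data's intrinsic dimensionality.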

Cite (APA)

Bratić, B., Houle, M. E., Kurbalija, V., Oria, V., & Radovanović, M. (2018). NN-Descent on high-dimensional data. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3227609.3227643
