Fast and scalable outlier detection with approximate nearest neighbor ensembles

Erich Schubert; Arthur Zimek; Hans-Peter Kriegel

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9050 19-36

DOI: 10.1007/978-3-319-18123-3_2

29 Citations

44 Readers

Abstract

Popular outlier detection methods require the pairwise comparison of objects to compute the nearest neighbors. This inherently quadratic problem is not scalable to large data sets, making multidimensional outlier detection for big data still an open challenge. Existing approximate neighbor search methods are designed to preserve distances as well as possible. In this article, we present a highly scalable approach to compute the nearest neighbors of objects that instead focuses on preserving neighborhoods well using an ensemble of space-filling curves. We show that the method has near-linear complexity, can be distributed to clusters for computation, and preserves neighborhoods (but not distances) better than established methods such as locality sensitive hashing and projection indexed nearest neighbors. Furthermore, we demonstrate that, by preserving neighborhoods, the quality of outlier detection based on local density estimates is not only well retained but sometimes even improved, an effect that can be explained by relating our method to outlier detection ensembles. At the same time, the outlier detection process is accelerated by two orders of magnitude.
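The general idea of finding neighbor candidates with an ensemble of space-filling curves can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of a Z-order (Morton) curve, the random shift and axis permutation per curve, and the names `z_order_key`, `approx_knn`, `n_curves`, `window`, and `bits` are all assumptions made for illustration. Points are assumed to have coordinates in [0, 1). Each curve sorts the points by their curve position, a small window around each point supplies candidates, and the candidates pooled across curves are refined with exact distances.

```python
import random


def z_order_key(cell, bits):
    """Interleave the bits of integer grid coordinates into one Morton code."""
    key = 0
    for bit in range(bits - 1, -1, -1):
        for coord in cell:
            key = (key << 1) | ((coord >> bit) & 1)
    return key


def sq_dist(a, b):
    """Squared Euclidean distance, used only to refine the candidates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_knn(data, k, n_curves=8, window=None, bits=10, seed=0):
    """Approximate k nearest neighbors for points with coordinates in [0, 1).

    Hypothetical parameters: n_curves randomized curves, and a window of
    neighbors in curve order on each side (default 2 * k).
    """
    rng = random.Random(seed)
    n, dims = len(data), len(data[0])
    if window is None:
        window = 2 * k
    scale = (1 << bits) - 1
    candidates = [set() for _ in range(n)]
    for _ in range(n_curves):
        # Assumption: vary each curve by a random shift and axis order.
        shift = [rng.random() for _ in range(dims)]
        axes = list(range(dims))
        rng.shuffle(axes)
        keys = []
        for p in data:
            # (p + shift) / 2 stays in [0, 1), so each cell index fits in `bits`.
            cell = [int((p[d] + shift[d]) / 2.0 * scale) for d in axes]
            keys.append(z_order_key(cell, bits))
        order = sorted(range(n), key=keys.__getitem__)
        # Points close in curve order become neighbor candidates.
        for pos, i in enumerate(order):
            lo, hi = max(0, pos - window), min(n, pos + window + 1)
            candidates[i].update(j for j in order[lo:hi] if j != i)
    # Refinement: rank the pooled candidates by exact distance, keep k.
    return [sorted(c, key=lambda j: sq_dist(data[i], data[j]))[:k]
            for i, c in enumerate(candidates)]
```

In this sketch the sorting step costs O(n log n) per curve and the refinement touches only a bounded number of candidates per point, which is where the near-linear behavior described in the abstract would come from. The resulting neighbor lists could then be fed to a density-based outlier score; that step is not shown here.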

Citation (APA)

Schubert, E., Zimek, A., & Kriegel, H. P. (2015). Fast and scalable outlier detection with approximate nearest neighbor ensembles. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9050, pp. 19–36). Springer Verlag. https://doi.org/10.1007/978-3-319-18123-3_2
