Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking

Abstract

Online clustering algorithms play a critical role in data science: they offer advantages in time, memory usage, and complexity while maintaining high performance relative to traditional clustering methods. This tutorial serves, first, as a survey of online machine learning and, in particular, of data stream clustering methods. It presents state-of-the-art algorithms and their associated core research threads, grouped into categories based on distance, density grids and hidden statistical models. It also examines clustering validity indices in depth. These indices are an important part of the clustering process, yet they are usually neglected or replaced with classification metrics, which leads to misleading interpretations of the final results. This introduction is then put into context with River, a go-to Python library created by merging Creme and scikit-multiflow. River is also the first open-source project to include an online clustering module, which facilitates reproducibility and allows direct further improvements. Building on this, we propose methods for clustering configuration, applications, and benchmarking settings, using real-world problems and datasets.
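As a rough illustration of the distance-based category and of the time and memory argument, the sketch below implements a sequential (MacQueen-style) k-means that sees each point once and stores only k centroids and k counters. It is not taken from the tutorial and is not River's implementation: the class `OnlineKMeans`, the seeding rule (first k points become centroids), and the toy `stream` are assumptions made for this example. The method names `learn_one` and `predict_one` simply echo River's one-sample-at-a-time convention.

```python
import math


class OnlineKMeans:
    """Hypothetical sequential k-means: one pass over the stream, O(k) memory."""

    def __init__(self, k):
        self.k = k
        self.centers = []  # one coordinate list per centroid
        self.counts = []   # number of points absorbed by each centroid

    def predict_one(self, x):
        """Return the index of the centroid nearest to x (Euclidean distance)."""
        return min(range(len(self.centers)),
                   key=lambda i: math.dist(x, self.centers[i]))

    def learn_one(self, x):
        """Update the model with a single point; the point is not stored."""
        if len(self.centers) < self.k:
            # Assumption: seed the centroids with the first k points seen.
            self.centers.append(list(x))
            self.counts.append(1)
            return
        i = self.predict_one(x)
        self.counts[i] += 1
        lr = 1.0 / self.counts[i]
        # Incremental mean: the centroid stays the average of its points.
        self.centers[i] = [c + lr * (v - c)
                           for c, v in zip(self.centers[i], x)]


# Toy stream with two well-separated groups (made-up data).
stream = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.0),
          (10.0, 9.8), (0.1, 0.3), (9.9, 10.1)]

model = OnlineKMeans(k=2)
for point in stream:
    model.learn_one(point)

labels = [model.predict_one(p) for p in stream]
```

Because each update touches a single centroid, the cost per point is proportional to k and the dimensionality, independent of how many points have already been seen. The result depends on the seeding and on the order of the stream, which is one reason the choice of validity indices and benchmarking settings matters.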

Montiel, J., Ngo, H. A., Le-Nguyen, M. H., & Bifet, A. (2022). Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 4808–4809). Association for Computing Machinery. https://doi.org/10.1145/3534678.3542600
