Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking

Jacob Montiel; Hoang Anh Ngo; Minh Huong Le-Nguyen; Albert Bifet

Conference Proceedings, Open Access

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2022) 4808-4809

DOI: 10.1145/3534678.3542600

Abstract

Online clustering algorithms play a critical role in data science, offering advantages in time, memory usage and complexity while maintaining high performance compared to traditional clustering methods. This tutorial serves, first, as a survey of online machine learning and, in particular, of data stream clustering methods. State-of-the-art algorithms and the associated core research threads will be presented, grouped into categories based on distance, density grids and hidden statistical models. Clustering validity indices will also be investigated in depth: they are an important part of the clustering process, yet they are usually neglected or replaced with classification metrics, which leads to misleading interpretations of the final results. This introduction will then be put into context with River, a go-to Python library created by merging Creme and scikit-multiflow. River is also the first open-source project to include an online clustering module, which facilitates reproducibility and allows direct further improvements. Building on this, we propose methods for clustering configuration, applications and benchmark settings, using real-world problems and datasets.
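
To make the distance-based category concrete, the following is a minimal sketch of sequential (MacQueen-style) k-means in pure Python. It is an illustration only, not River's implementation and not an algorithm taken from the tutorial: the class name `OnlineKMeans`, the seeding strategy (first k points) and the toy stream are all assumptions. The method names loosely follow the `learn_one` / `predict_one` convention used by River. Each point is seen once and then discarded, so memory stays proportional to the number of centroids rather than the length of the stream.

```python
import math


class OnlineKMeans:
    """Sequential k-means sketch: one pass over the stream, O(k) memory."""

    def __init__(self, k):
        self.k = k
        self.centers = []  # current centroids, each a list of floats
        self.counts = []   # number of points absorbed by each centroid

    def predict_one(self, x):
        """Return the index of the centroid nearest to x (Euclidean distance)."""
        return min(range(len(self.centers)),
                   key=lambda i: math.dist(self.centers[i], x))

    def learn_one(self, x):
        """Update the model with a single point; the point is not stored."""
        if len(self.centers) < self.k:
            # Assumption: seed the centroids with the first k points.
            self.centers.append(list(x))
            self.counts.append(1)
            return
        i = self.predict_one(x)
        self.counts[i] += 1
        step = 1.0 / self.counts[i]  # running-mean step size
        self.centers[i] = [c + step * (v - c)
                           for c, v in zip(self.centers[i], x)]


# Hypothetical toy stream with two well-separated groups.
stream = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.1),
          (9.8, 10.1), (0.1, 0.3), (10.2, 9.9)]

model = OnlineKMeans(k=2)
for point in stream:
    model.learn_one(point)

labels = [model.predict_one(p) for p in stream]
```

Because each centroid is a running mean of the points assigned to it, the update costs O(k) distance computations per point. This is the kind of time and memory trade-off the abstract refers to, at the price of sensitivity to the order of the stream and to the initial seeds.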

Citation (APA)

Montiel, J., Ngo, H. A., Le-Nguyen, M. H., & Bifet, A. (2022). Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 4808–4809). Association for Computing Machinery. https://doi.org/10.1145/3534678.3542600
