Density-Based Clustering

Poornachandra Sarang

Book Chapter

DOI: 10.1007/978-3-031-02363-7_12

Abstract

Flat, hierarchical, and GMM (Gaussian mixture model) clustering algorithms cannot handle outliers. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an algorithm designed for large spatial datasets that contain noise. In particular, it can find clusters that are not linearly separable. OPTICS is another algorithm that improves upon DBSCAN. Both algorithms are resistant to noise, can find nonlinear clusters of varying shapes and sizes, and determine the number of clusters on their own. To use them effectively, you need to understand several terms: core point, reachable point, outlier, core distance, and reachability distance. I have created many visuals to illustrate these terms. Although implementing these algorithms is complex, as a data scientist you can simply use the implementations provided in standard libraries. I have also created several simulations and provided code snippets for your experimentation that show the effect of various parameters on cluster formation.

Toward the end, I introduce Mean Shift clustering, an algorithm that discovers clusters in a smooth density of data points. It is based on the concept of kernel density estimation (KDE) and requires only a single parameter, the bandwidth; however, selecting the bandwidth is non-trivial, so the chapter provides definite guidelines for estimating it. The chapter contains several examples, practical applications, and guidelines on how to use these powerful but complex algorithms.
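To make the terms core point, reachable point, and outlier concrete, here is a minimal from-scratch sketch of DBSCAN in plain Python. It is not the chapter's code and not any library's implementation; the function names `region_query` and `dbscan`, and the toy data, are illustrative assumptions. The parameter names `eps` and `min_samples` are chosen to mirror scikit-learn's `DBSCAN`, where `min_samples` counts the point itself. The neighbor search is brute force (O(n²)), which is fine for a toy example but not for large datasets.

```python
import math
from collections import deque

NOISE = -1  # label used for outliers


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch (hypothetical helper, not a library API).

    Returns (labels, core): labels[i] is a cluster id (0, 1, ...) or NOISE,
    and core[i] is True when point i is a core point.
    """
    n = len(points)
    neighbors = [region_query(points, i, eps) for i in range(n)]
    # A core point has at least min_samples points (itself included) within eps.
    core = [len(nb) >= min_samples for nb in neighbors]
    labels = [None] * n  # None means "not yet assigned"
    cluster_id = 0
    for i in range(n):
        # Clusters are seeded only from unassigned core points.
        if labels[i] is not None or not core[i]:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        # Breadth-first expansion through density-reachable points.
        while queue:
            j = queue.popleft()
            if not core[j]:
                continue  # border point: reachable, but it does not expand the cluster
            for k in neighbors[j]:
                if labels[k] is None:
                    labels[k] = cluster_id
                    queue.append(k)
        cluster_id += 1
    # Points never reached from any core point are noise (outliers).
    labels = [NOISE if lab is None else lab for lab in labels]
    return labels, core


# Toy data: two dense groups, one border point near the first group, one outlier.
points = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),  # dense group A
    (1.2, 0.5),                                      # border point near A
    (5.0, 5.0), (5.0, 5.5), (5.5, 5.0), (5.5, 5.5),  # dense group B
    (10.0, 0.0),                                     # isolated outlier
]
labels, core = dbscan(points, eps=0.8, min_samples=3)
print(labels)  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The point (1.2, 0.5) has only two points within `eps` (itself and (0.5, 0.5)), so it is not a core point, yet it joins cluster 0 because it is reachable from a core point. The point (10.0, 0.0) is reachable from no core point and is labeled as noise. The number of clusters is never passed in; it emerges from `eps` and `min_samples`.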

Sarang, P. (2023). Density-Based Clustering (pp. 209–228). https://doi.org/10.1007/978-3-031-02363-7_12