k-Means-Clustering

  • Ng A
  • Soo K

Abstract

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both rely on an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization algorithm allows clusters to have different shapes. The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters; this is known as the nearest centroid classifier or Rocchio algorithm.
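
The following Python sketch illustrates the iterative refinement heuristic described above (commonly known as Lloyd's algorithm) together with the nearest-centroid rule for assigning new data to the resulting clusters. It is a minimal illustration under stated assumptions, not code from the chapter: the function names, parameters, and toy data are chosen for the example.

```python
# Minimal sketch of the k-means heuristic (Lloyd's algorithm) using NumPy.
# Function names, parameters, and toy data are illustrative assumptions.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Partition the rows of X into k clusters by iterative refinement."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct observations at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned observations
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged to a local optimum
        centroids = new_centroids
    return centroids, labels

def nearest_centroid_predict(centroids, X_new):
    """Classify new data into the existing clusters by the 1-nearest-neighbor
    rule on the cluster centers (nearest centroid / Rocchio classification)."""
    dists = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Toy usage: two well-separated blobs in the plane.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
print(nearest_centroid_predict(centroids, np.array([[0.2, -0.1], [4.8, 5.3]])))
```

Because the heuristic only converges to a local optimum, practical implementations typically restart from several random initializations and keep the partition with the lowest within-cluster sum of squared distances.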

Citation (APA)

Ng, A., & Soo, K. (2018). k-Means-Clustering. In Data Science – was ist das eigentlich?! (pp. 19–28). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-662-56776-0_2
