We consider a generalization of the fundamental k-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in ℝ^d, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most ∆ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most ∆, called a ∆-point. Thus we seek a partition of n input ∆-points into k clusters minimizing the k-means objective. For ∆ = 0, when all coordinates of each point are specified, this is the usual k-means clustering. We give an algorithm that finds a (1 + ε)-approximate solution in time f(k, ε, ∆) · n^2 · d for some function f of k, ε, and ∆ only.
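To make the objective concrete, the following minimal sketch evaluates the k-means cost of a given partition of incomplete points. It is an illustration only and not the approximation algorithm of the paper. It assumes that unspecified entries are encoded as `None`, and it relies on the fact that the squared distance from a center to an axis-parallel affine subspace involves only the specified coordinates, so the best center of a cluster is the per-coordinate mean of the specified values. The function names `sq_dist_to_center`, `cluster_center`, and `kmeans_cost` are hypothetical and chosen for this example.

```python
from typing import List, Optional, Sequence

# Assumed encoding: None marks an unspecified coordinate.
Point = Sequence[Optional[float]]


def sq_dist_to_center(x: Point, c: Sequence[float]) -> float:
    """Squared distance from center c to the axis-parallel affine
    subspace described by x; only specified coordinates contribute."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c) if xi is not None)


def cluster_center(cluster: List[Point], d: int) -> List[float]:
    """Per-coordinate mean over the points that specify that coordinate.
    A coordinate that no point specifies is set to 0.0, since any value
    gives the same cost there."""
    center = []
    for j in range(d):
        vals = [x[j] for x in cluster if x[j] is not None]
        center.append(sum(vals) / len(vals) if vals else 0.0)
    return center


def kmeans_cost(clusters: List[List[Point]], d: int) -> float:
    """k-means objective of a fixed partition into clusters."""
    total = 0.0
    for cluster in clusters:
        c = cluster_center(cluster, d)
        total += sum(sq_dist_to_center(x, c) for x in cluster)
    return total


# Four points in the plane; two of them are 1-points (one missing entry).
points = [(0.0, 0.0), (2.0, None), (None, 4.0), (10.0, 10.0)]
clusters = [points[:3], points[3:]]
print(kmeans_cost(clusters, d=2))  # prints 10.0
```

In this example the first cluster has center (1, 2) and contributes 5 + 1 + 4 = 10, while the singleton cluster contributes 0. The hard part of the problem, which the sketch does not address, is finding a near-optimal partition among all partitions of the n input ∆-points.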
Eiben, E., Fomin, F. V., Golovach, P. A., Lochet, W., Panolan, F., & Simonov, K. (2021). EPTAS for k-means clustering of affine subspaces. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 2649–2659). Association for Computing Machinery. https://doi.org/10.1137/1.9781611976465.157