Incomplete data sets with different data types are difficult to handle, yet they occur regularly in practical clustering tasks. In this paper, two procedures for clustering mixed-type data with missing values are therefore derived and analyzed in a simulation study with respect to the factors of partition, prototypes, imputed values, and cluster assignment. Both approaches are based on the k-prototypes algorithm (an extension of k-means), which is one of the most common clustering methods for mixed-type data (i.e., numerical and categorical variables). For k-means clustering of incomplete data, the k-POD algorithm has recently been proposed, which imputes missing values with the values of the associated cluster center. We derive an adaptation of k-POD to k-prototypes and additionally present a cluster aggregation strategy applied after multiple imputation. It turns out that even a simplified and time-saving variant of the presented method can compete with multiple imputation and subsequent pooling.
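To illustrate the idea of imputing missing values with the associated cluster center, the following is a minimal sketch for the purely numerical k-means case only. It is not the authors' implementation and does not cover the mixed-type k-prototypes adaptation. The function name `kpod_sketch`, its parameters, the initial fill with column means, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np

def kpod_sketch(X, k, n_iter=20, seed=0):
    """Illustrative k-POD-style loop for numerical data (hypothetical helper).

    Missing entries are encoded as NaN. Assumes every column has at
    least one observed value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumption: start by filling missing entries with observed column means.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    # Assumption: initialize centers with k distinct (filled) observations.
    centers = filled[rng.choice(len(filled), size=k, replace=False)]

    for _ in range(n_iter):
        # Assign each observation to its nearest center (squared Euclidean).
        dist = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)

        # Update each non-empty cluster center as the mean of its members.
        for j in range(k):
            members = filled[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)

        # Re-impute: missing entries take the value of the assigned center,
        # observed entries are always kept as they are.
        filled = np.where(missing, centers[labels], X)

    return labels, centers, filled
```

The returned `filled` array keeps all observed values unchanged and replaces each missing entry with the corresponding coordinate of the center the observation was assigned to.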
CITATION STYLE
Aschenbruck, R., Szepannek, G., & Wilhelm, A. F. X. (2023). Imputation Strategies for Clustering Mixed-Type Data with Missing Values. Journal of Classification, 40(1), 2–24. https://doi.org/10.1007/s00357-022-09422-y