Performance evaluation of missing-value imputation clustering based on a multivariate Gaussian mixture model

Abstract

BACKGROUND: Clustering with mixture models is challenging when the dataset contains missing values.

METHODS AND RESULTS: We propose a dynamic clustering algorithm based on a multivariate Gaussian mixture model that efficiently imputes missing values to generate a "pseudo-complete" dataset. The parameters of each cluster and the missing values are estimated by maximum likelihood using an expectation-maximization (EM) algorithm, and multivariate individuals are assigned to clusters by their Bayesian posterior probabilities. A simulation showed that the proposed method converges quickly and estimates missing values accurately. The algorithm was further validated on Fisher's Iris dataset, the Yeast Cell-cycle Gene-expression dataset, and the CIFAR-10 image dataset. On these datasets it clustered with high accuracy, comparable to clustering the complete dataset without missing values. It also yielded a lower misjudgment rate than two alternatives: clustering after deleting records with missing data, and clustering after imputing missing values by mean replacement.

CONCLUSION: Our missing-value imputation clustering algorithm is feasible and, in certain situations, outperforms both of these alternative clustering approaches.
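The EM procedure summarized above can be illustrated with a minimal pure-Python sketch. It assumes data missing at random, marks missing entries as `None`, and follows the standard EM treatment of Gaussian mixtures with incomplete data: in the E-step each missing block is replaced by its conditional expectation given the observed entries of the row, and in the M-step the corresponding conditional covariance is added back when updating each component's covariance. The function names (`inv_logdet`, `conditional`, `em_gmm_missing`), the farthest-point initialization, the small `ridge` term, and the fixed iteration count are hypothetical choices for illustration, not details taken from the paper.

```python
import math
import random


def inv_logdet(A):
    """Invert a small positive-definite matrix by Gauss-Jordan elimination; return (A^-1, log|A|)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        logdet += math.log(abs(piv))
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M], logdet


def conditional(x, mu, S):
    """For one row x (None = missing) and one Gaussian component (mu, S), return
    log N(x_o | mu_o, S_oo), x with missing entries set to E[x_m | x_o], and a d x d
    matrix holding Cov[x_m | x_o] in the missing block (zeros elsewhere)."""
    d = len(x)
    o = [j for j in range(d) if x[j] is not None]
    m = [j for j in range(d) if x[j] is None]
    Soo_inv, logdet = inv_logdet([[S[a][b] for b in o] for a in o])
    diff = [x[a] - mu[a] for a in o]
    w = [sum(Soo_inv[i][j] * diff[j] for j in range(len(o))) for i in range(len(o))]
    logp = -0.5 * (len(o) * math.log(2 * math.pi) + logdet + sum(p * q for p, q in zip(diff, w)))
    xhat = list(x)
    for a in m:  # conditional mean: mu_m + S_mo S_oo^-1 (x_o - mu_o)
        xhat[a] = mu[a] + sum(S[a][b] * wb for b, wb in zip(o, w))
    C = [[0.0] * d for _ in range(d)]
    for a in m:  # conditional covariance: S_mm - S_mo S_oo^-1 S_om
        for b in m:
            C[a][b] = S[a][b] - sum(S[a][o[i]] * Soo_inv[i][j] * S[o[j]][b]
                                    for i in range(len(o)) for j in range(len(o)))
    return logp, xhat, C


def em_gmm_missing(X, K, iters=100, seed=0, ridge=1e-6):
    """EM for a K-component full-covariance Gaussian mixture on rows with None for missing
    values. Returns (hard labels, posterior-weighted imputed rows, component means)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    col_mean = [sum(r[j] for r in X if r[j] is not None) / sum(r[j] is not None for r in X)
                for j in range(d)]
    filled = [[col_mean[j] if v is None else v for j, v in enumerate(r)] for r in X]
    # Farthest-point initialization of the means (a hypothetical choice, not from the paper).
    mus = [list(filled[rng.randrange(n)])]
    while len(mus) < K:
        mus.append(list(max(filled, key=lambda r: min(
            sum((a - b) ** 2 for a, b in zip(r, mu)) for mu in mus))))
    var = [sum((r[j] - col_mean[j]) ** 2 for r in filled) / n + ridge for j in range(d)]
    covs = [[[var[a] if a == b else 0.0 for b in range(d)] for a in range(d)] for _ in range(K)]
    pis = [1.0 / K] * K
    for _ in range(iters):
        # E-step: responsibilities plus per-component conditional expectations.
        R, Xh, Cs = [], [], []
        for x in X:
            res = [conditional(x, mus[k], covs[k]) for k in range(K)]
            logs = [math.log(pis[k]) + res[k][0] for k in range(K)]
            top = max(logs)  # log-sum-exp shift for numerical stability
            e = [math.exp(v - top) for v in logs]
            R.append([v / sum(e) for v in e])
            Xh.append([t[1] for t in res])
            Cs.append([t[2] for t in res])
        # M-step: weighted updates; the covariance adds back the conditional covariance.
        for k in range(K):
            nk = sum(R[i][k] for i in range(n))
            pis[k] = nk / n
            mus[k] = [sum(R[i][k] * Xh[i][k][a] for i in range(n)) / nk for a in range(d)]
            covs[k] = [[sum(R[i][k] * ((Xh[i][k][a] - mus[k][a]) * (Xh[i][k][b] - mus[k][b])
                                       + Cs[i][k][a][b]) for i in range(n)) / nk
                        + (ridge if a == b else 0.0) for b in range(d)] for a in range(d)]
    # Bayesian assignment and imputation use the posteriors from the last E-step.
    labels = [max(range(K), key=lambda k: r[k]) for r in R]
    imputed = [[sum(R[i][k] * Xh[i][k][a] for k in range(K)) for a in range(d)] for i in range(n)]
    return labels, imputed, mus
```

In this sketch each row's imputed value is the posterior-weighted average of the per-component conditional means, which plays the role of the "pseudo-complete" dataset, and each label is the component with the highest posterior probability. Because missing entries are predicted from correlated observed entries rather than from a column mean, the imputation can differ substantially from mean replacement when features are correlated within clusters.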

Citation (APA)

Xiao, J., Xu, Q., Wu, C., Gao, Y., Hua, T., & Xu, C. (2016). Performance evaluation of missing-value imputation clustering based on a multivariate Gaussian mixture model. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0161112