Decorrelated clustering with data selection bias

Abstract

Most existing clustering algorithms are designed without accounting for selection bias in the data. In many real applications, however, one cannot guarantee that the data are unbiased. Selection bias can introduce unexpected correlations between features, and ignoring those correlations hurts the performance of clustering algorithms. How to remove the unexpected correlations induced by selection bias is therefore extremely important, yet largely unexplored, for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer learns global sample weights that balance the sample distribution and thereby remove unexpected correlations among features. The learned weights are then combined with k-means, so that the reweighted k-means clusters on the inherent data distribution, free from the influence of unexpected correlations. Moreover, we derive updating rules to effectively infer the parameters of DCKM. Extensive experimental results on real-world datasets demonstrate that DCKM achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias when clustering.
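
The two ingredients named in the abstract, a decorrelation measure over sample weights and a reweighted k-means, can be illustrated with a minimal Python sketch. This is not the authors' DCKM implementation: the abstract does not give the objective or the updating rules, so the function names `decorrelation_loss` and `weighted_kmeans` are hypothetical, the loss (sum of squared off-diagonal weighted covariances) is an assumed stand-in for the regularizer, and the sample weights are taken as given rather than learned jointly with the clustering.

```python
import numpy as np


def decorrelation_loss(X, w):
    """Assumed stand-in for a decorrelation regularizer: the sum of
    squared off-diagonal entries of the weighted feature covariance."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    Xc = X - w @ X                       # center on the weighted mean
    cov = (Xc * w[:, None]).T @ Xc       # weighted covariance matrix
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum())


def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Lloyd-style k-means in which each centroid is the weighted mean
    of its assigned points. The weights w are inputs here (hypothetical
    simplification); they are not learned by this function."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()

    def assign(c):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            mask = labels == j
            # Skip clusters that are empty or carry zero total weight.
            if w[mask].sum() > 0:
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return assign(centers), centers
```

In this sketch, weights that drive `decorrelation_loss` toward zero would play the role of the learned global sample weights, and passing them to `weighted_kmeans` corresponds to clustering on the reweighted data.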

Cite

Wang, X., Fan, S., Kuang, K., Shi, C., Liu, J., & Wang, B. (2020). Decorrelated clustering with data selection bias. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 2177–2183). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/301
