A Variational EM Acceleration for Efficient Clustering at Very Large Scales

Abstract

How can we efficiently find very large numbers of clusters C in very large datasets N of potentially high dimensionality D? Here we address the question by using a novel variational approach to optimize Gaussian mixture models (GMMs) with diagonal covariance matrices. The variational method approximates expectation maximization (EM) by applying truncated posteriors as variational distributions and partial E-steps in combination with coresets. Run time complexity to optimize the clustering objective then reduces from O(NCD) per conventional EM iteration to O(N′G²D) for a variational EM iteration on coresets (with coreset size N′ ≤ N and truncation parameter G ≪ C). Based on the strongly reduced run time complexity per iteration, which scales sublinearly with NC, we then provide a concrete, practically applicable, parallelized and highly efficient clustering algorithm. In numerical experiments on standard large-scale benchmarks we (A) show that overall clustering times also scale sublinearly with NC, and (B) observe substantial wall-clock speedups compared to already highly efficient recently reported results. The algorithm's sublinear scaling allows for applications at scales where alternative methods cease to be applicable. We demonstrate such very large-scale applicability using the YFCC100M benchmark, for which we optimize a GMM of up to 50,000 clusters, a data density model with up to 150 M parameters.
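As a rough illustration of what a truncated posterior means for a diagonal-covariance GMM, the sketch below computes responsibilities only over a small candidate set of clusters per data point rather than over all C clusters. It is a simplified sketch under stated assumptions, not the authors' algorithm: the function name `truncated_responsibilities` and the candidate index array `K` are hypothetical, the candidate sets are assumed to be given, and the partial E-steps (which would update those sets) and the coreset construction are left out.

```python
import numpy as np

def truncated_responsibilities(X, means, variances, log_weights, K):
    """Responsibilities restricted to per-point candidate clusters.

    X:           (N, D) data points
    means:       (C, D) cluster means
    variances:   (C, D) diagonal covariance entries
    log_weights: (C,)   log mixing proportions
    K:           (N, G) integer indices of candidate clusters per point
                 (hypothetical input; assumed to be given here)
    Returns an (N, G) array whose rows sum to 1.
    """
    mu = means[K]                      # (N, G, D)
    var = variances[K]                 # (N, G, D)
    diff = X[:, None, :] - mu          # (N, G, D)
    # Log joint log p(c) + log N(x | mu_c, diag(var_c)), candidates only.
    log_joint = log_weights[K] - 0.5 * np.sum(
        np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=2
    )
    # Normalize within each candidate set (stable softmax over G entries).
    log_joint -= log_joint.max(axis=1, keepdims=True)
    R = np.exp(log_joint)
    R /= R.sum(axis=1, keepdims=True)
    return R
```

Each call touches N·G·D entries instead of N·C·D, which is where the saving over a full E-step comes from when G ≪ C. When every candidate set contains all C clusters, the result coincides with the exact posterior.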

Cite

Hirschberger, F., Forster, D., & Lücke, J. (2022). A Variational EM Acceleration for Efficient Clustering at Very Large Scales. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12), 9787–9801. https://doi.org/10.1109/TPAMI.2021.3133763
