Abstract
Statement of need

R (R Core Team, 2020) provides a rich collection of packages for building and analyzing finite mixture models, which are widely used in unsupervised learning tasks such as model-based clustering and density estimation. For example, mclust (Scrucca et al., 2016) can be used to build Gaussian mixture models with different covariance structures; mixtools (Benaglia et al., 2010) implements parametric and non-parametric mixture models as well as mixtures of Gaussian regressions; flexmix (Leisch, 2004) provides a general framework for finite mixtures of regression models; and mixdist (Macdonald et al., 2018) fits mixture models for grouped and conditional data (also called binned data).

To our knowledge, almost all R packages for finite mixture models, with the exception of mixdist, are designed to take raw data as the modeling input. However, mixdist does not implement the popular model selection methods based on information criteria or the bootstrap likelihood ratio test (bLRT) (Feng & McCulloch, 1996; McLachlan, 1987; Yu & Harvill, 2019). To bridge this gap and to unify the interface for finite mixture modeling of both raw and binned data, we implemented the mixR package, which provides the following primary features.

• mixfit() performs maximum likelihood estimation (MLE) for finite mixture models with Gaussian, Weibull, Gamma, and Log-normal components via the EM algorithm (Dempster et al., 1977). Model fitting is accelerated using the Rcpp package (Eddelbuettel et al., 2011).
• select() selects the best model from a series of mixture models with different numbers of mixture components by using the Bayesian Information Criterion (BIC).
• bs.test() performs the bLRT for two mixture models from the same distribution family but with different numbers of components.

mixR also contains the following additional features.

• Visualization of the fitted mixture models using ggplot2 (Wickham, 2011).
• Functions to generate random data from mixture models.
• Functions to convert the parameters of Weibull and Gamma mixture models between the shape-scale representation used in the probability density functions and the mean-variance representation, which describes the distribution more intuitively.

Examples

We demonstrate how to use mixR for fitting finite mixture models and for selecting mixture models using BIC and bLRT.
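The fit-then-select workflow described above can be illustrated without the package itself. The following is a minimal sketch in base R, assuming raw (unbinned) data and Gaussian components only: it runs a plain EM loop and compares candidate models by BIC. The helper fit_gmm_em(), its arguments, the quantile-based initialization, and the simulated data are all assumptions made for this illustration; none of them is mixR's API or its Rcpp implementation. BIC is computed here as -2 log L + p log n, so a lower value is better under this sign convention.

```r
# Illustrative sketch in base R only. fit_gmm_em() is a hypothetical helper
# written for this example; it is not a mixR function.
fit_gmm_em <- function(x, k, max_iter = 500, tol = 1e-6) {
  n <- length(x)
  # Crude initialization (an arbitrary choice): k equal-probability groups.
  breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1))
  grp <- cut(x, breaks, include.lowest = TRUE, labels = FALSE)
  prop <- as.numeric(table(grp)) / n
  mu <- as.numeric(tapply(x, grp, mean))
  sigma <- as.numeric(tapply(x, grp, sd))

  # n x k matrix with entries prop[j] * N(x[i] | mu[j], sigma[j]).
  weighted_dens <- function() {
    matrix(sapply(seq_len(k), function(j) prop[j] * dnorm(x, mu[j], sigma[j])),
           nrow = n)
  }

  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each point belongs to each component.
    dens <- weighted_dens()
    resp <- dens / rowSums(dens)
    # M-step: weighted maximum likelihood updates.
    nk <- colSums(resp)
    prop <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
    # Log-likelihood at the updated parameters; stop when it stabilizes.
    loglik <- sum(log(rowSums(weighted_dens())))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }

  n_par <- 3 * k - 1  # k means, k standard deviations, k - 1 free proportions
  list(prop = prop, mu = mu, sigma = sigma, loglik = loglik,
       bic = -2 * loglik + n_par * log(n))
}

# Simulated two-component data (made up for this sketch).
set.seed(101)
x <- c(rnorm(300, mean = -2, sd = 1), rnorm(700, mean = 3, sd = 1))

# Fit 1 to 3 components and pick the candidate with the smallest BIC.
fits <- lapply(1:3, function(k) fit_gmm_em(x, k))
bic <- sapply(fits, function(f) f$bic)
best_k <- which.min(bic)
```

The sketch covers only the raw-data Gaussian case. Fitting binned data replaces the per-observation density with the probability mass of each bin, and the bLRT additionally requires refitting both models on bootstrap samples; neither step is shown here.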
Yu, Y. (2022). mixR: An R package for Finite Mixture Modeling for Both Raw and Binned Data. Journal of Open Source Software, 7(69), 4031. https://doi.org/10.21105/joss.04031