GENERALIZED SMOOTH FINITE MIXTURES

by Mattias Villani, Robert Kohn, David Nott
Statistics (2010)

Available from www.cbe.anu.edu.au
MATTIAS VILLANI, ROBERT KOHN, AND DAVID NOTT
Abstract. We propose a general class of models and a unified Bayesian inference method-
ology for flexibly estimating the density of a response variable conditional on a possibly
high-dimensional set of covariates. Our model is a finite mixture of component models with
covariate-dependent mixing weights. The component densities can belong to any parametric
family, with each model parameter being a deterministic function of covariates through a link
function. Our MCMC methodology allows for Bayesian variable selection among the covari-
ates in the mixture components and in the mixing weights. The model's parametrization and
variable selection prior are chosen to prevent overfitting. We use simulated and several real
data sets to illustrate the methodology.
Keywords: Bayesian inference, Markov Chain Monte Carlo, Mixture of Experts, GLM,
Variable selection.
1. Introduction
We propose a general methodology for flexibly estimating the conditional predictive density
p(y|x) for all x, where y is univariate discrete or continuous, and x is a possibly high-dimensional
set of covariates. Making inferences on the whole density p(y|x) is crucial in many
applications. One obvious example is the Value-at-Risk measure in finance, which is defined
as a tail quantile in a return distribution. Another example is the distribution of the aggregate
firm default rate in the economy as a function of indicators of macroeconomic activity.
Our model is a generalization of the Smooth Adaptive Gaussian Mixture (SAGM) in Villani
et al. (2009). SAGM is a mixture model with covariate-dependent weights modelled as
a multinomial logit. The mixture components in the SAGM model are Gaussian, with the
mean and log variance being functions of covariates, so SAGM is a model for a continuous response
variable. Li et al. (2010) extend the SAGM model by having skewed Student-t components
in the mixture. The scope of our model and inference methodology is much larger as the
components can belong to essentially any distributional family, so the model applies to both
continuous and discrete y, and to densities outside the exponential family. The parameters in
the component densities are linked to different sets of covariates via arbitrary link functions.
As an example, a smooth mixture of beta-binomial components with both the mean and over-
dispersion parameter linked to covariates is a very flexible model for binomial data. Section
5.1 models proportions data using a smooth mixture of Beta densities with the mean and the
over-dispersion parameter in each component both being functions of covariates.
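The SAGM special case described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names, the array shapes, and the convention of fixing the first row of the weight coefficients to zero for identification are all assumptions made for this example.

```python
import numpy as np

def mixing_weights(z, gamma):
    """Multinomial-logit mixing weights.

    z: (n, q) covariates; gamma: (K, q) coefficients (hypothetical layout,
    with the first row set to zero for identification).
    """
    scores = z @ gamma.T                          # (n, K) linear scores
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def gaussian_logpdf(y, mean, log_var):
    """Gaussian log density parametrized by mean and log variance."""
    return -0.5 * (np.log(2 * np.pi) + log_var + (y - mean) ** 2 / np.exp(log_var))

def smooth_mixture_density(y, x, z, beta_mean, beta_logvar, gamma):
    """p(y|x) = sum_k w_k(z) * N(y | x'beta_mean_k, exp(x'beta_logvar_k))."""
    mean = x @ beta_mean.T        # (n, K) component means
    log_var = x @ beta_logvar.T   # (n, K) component log variances
    comp = np.exp(gaussian_logpdf(y[:, None], mean, log_var))
    return (mixing_weights(z, gamma) * comp).sum(axis=1)
```

Replacing gaussian_logpdf with another component density (and its own link functions) gives the corresponding member of the generalized class; the weights are unchanged.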
Villani: Research Division, Sveriges Riksbank, SE-103 37 Stockholm, Sweden and Department of Statistics,
Stockholm University. E-mail: mattias.villani@riksbank.se. Kohn: Australian School of Business, University
of New South Wales, UNSW, Sydney 2052, Australia. Nott: Department of Statistics and Applied Probability,
National University of Singapore. The views expressed in this paper are solely the responsibility of the authors
and should not be interpreted as reflecting the views of the Executive Board of Sveriges Riksbank. Robert
Kohn was partially supported by ARC grant DP0667069.
A key component of our approach is an extension of the highly efficient MCMC method
in Villani et al. (2009). This method generates joint proposals of the model coefficients and
the variable selection indicators from a tailored proposal density obtained by taking a few
Newton-Raphson steps toward the full conditional mode in each iteration of the algorithm.
The model setup, with low-dimensional model parameters linked to possibly high-dimensional
sets of covariates, and the availability of gradients and Hessians in analytically tractable form
make this a fast and very efficient MCMC scheme. Previous literature has developed several
clever MCMC schemes for smooth mixtures with specific component densities (Peng et al.
(1996); Wood et al. (2002); Wood et al. (2008); Geweke and Keane (2007)). An important
advantage of our inferential procedure is that it treats all models in a uniform way, regardless
of the distributional form of the component densities. It is therefore possible to write general
computer code where completely new component models can be implemented very rapidly.
The user only needs to code up the likelihood, gradient and (optionally) the Hessian, but only
with respect to the low-dimensional (typically scalar) parameters of the component densities,
which in most cases is a trivial exercise.
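The division of labour described above can be illustrated with a small sketch. It assumes a single Poisson component with a log link and a N(0, prior_var I) prior on the coefficients; the function names and the prior are hypothetical choices for this example, and the variable selection indicators and the Metropolis-Hastings accept step are omitted. The user-supplied part only differentiates with respect to the scalar linear predictor; the chain rule through the covariates is generic.

```python
import numpy as np

def poisson_component(y, eta):
    """User-supplied part: per-observation log-likelihood, gradient and Hessian
    with respect to the scalar linear predictor eta = log(mean)."""
    mu = np.exp(eta)
    loglik = y * eta - mu      # the -log(y!) constant is dropped
    grad = y - mu
    hess = -mu
    return loglik, grad, hess

def newton_proposal(y, X, beta, prior_var=10.0, n_steps=3):
    """Generic part: a few Newton steps toward the conditional posterior mode.

    Returns a proposal mean and covariance (inverse negative Hessian).
    """
    p = X.shape[1]
    for _ in range(n_steps):
        _, g, h = poisson_component(y, X @ beta)
        grad = X.T @ g - beta / prior_var                     # chain rule through eta = X beta
        hess = (X * h[:, None]).T @ X - np.eye(p) / prior_var
        beta = beta - np.linalg.solve(hess, grad)
    _, _, h = poisson_component(y, X @ beta)
    hess = (X * h[:, None]).T @ X - np.eye(p) / prior_var
    return beta, np.linalg.inv(-hess)

# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
prop_mean, prop_cov = newton_proposal(y, X, np.zeros(2))
```

Swapping in a different component density only requires replacing poisson_component; newton_proposal is untouched.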
Villani et al. (2009) showed how a smooth mixture of homoscedastic Gaussian components is
able to generate some heteroscedasticity, but seldom enough to fit heteroscedastic data well, at
least in situations with more than a couple of covariates. We show here that this result seems
to generalize to other data types. Using mixtures of over-dispersed components improves the
out-of-sample performance of the predictive density compared to smooth mixtures of simpler,
equi-dispersed, components. A simulation study shows that when analyzing over-dispersed
count data, it is better to allow for over-dispersion in the mixture components than to rely
on a smooth mixture of Poissons to generate the over-dispersion, even if the estimated over-
dispersed model is mis-specified. An application to German health reform data points clearly
in the same direction. As a side note, we find that the less well-known Generalized Poisson
regression model in Consul and Jain (1973) and Famoye and Singh (2006) is a strong competitor
to the traditionally used Negative Binomial regression model for count data, both in
terms of predictive performance and computing time.
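To see why a mixture of Poissons produces some over-dispersion at all, the law of total variance is enough. The sketch below is only an arithmetic illustration at a fixed covariate value, with made-up weights and means; it does not reproduce the paper's simulation study.

```python
import numpy as np

def mixture_of_poissons_moments(weights, means):
    """Mean and variance of a finite mixture of Poisson distributions."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    mean = np.sum(weights * means)
    # Var(y) = E[Var(y|k)] + Var(E[y|k]); a Poisson's variance equals its mean.
    var = mean + np.sum(weights * (means - mean) ** 2)
    return mean, var

# Hypothetical two-component example: equal weights, means 1 and 5.
mean, var = mixture_of_poissons_moments([0.5, 0.5], [1.0, 5.0])
dispersion = var / mean   # exceeds 1 whenever the component means differ
```

The excess variance comes entirely from the spread of the component means, which is why the amount of over-dispersion such a mixture can generate is tied to how far apart its components sit.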
Our model class is obviously very flexible and typically over-parametrized. We confront
over-fitting using three main strategies. First, we use a Bayesian approach with
easily specified priors on low-dimensional aspects of the model. Second, we use Bayesian
variable selection among the covariates in all the component parameters and in the mixing
function. This automatically reduces the number of effective parameters, especially in multi-
component models, and also gives insights about the importance of covariates in the different
parts of the model. Finally, we parametrize the components as deviations from a reference
component. This means that the variable selection prior is not setting coefficients to zero, but
rather pushing components toward each other, which can lead to very efficient parsimony in
some situations. All in all, our model seems to be very flexible when needed, but is able to
simplify when data suggests a more parsimonious data generating process.
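The effect of the reference-component parametrization can be sketched as follows. The additive form used here (coefficients of component k equal the reference coefficients plus an indicator times a deviation) is an assumption made for illustration; this section does not state the exact parametrization, and all names are hypothetical.

```python
import numpy as np

def component_coefficients(beta_ref, deviations, indicators):
    """Coefficients for all components under a deviation parametrization.

    beta_ref:   (p,) coefficients of the reference component
    deviations: (K-1, p) deviations of the other components from the reference
    indicators: (K-1, p) 0/1 variable selection indicators on the deviations
    """
    beta_ref = np.asarray(beta_ref, dtype=float)
    active = np.asarray(deviations, dtype=float) * np.asarray(indicators, dtype=float)
    return np.vstack([beta_ref, beta_ref + active])

beta_ref = np.array([1.0, -0.5, 2.0])
deviations = np.array([[0.3, 0.8, -1.0]])

# Excluding every deviation collapses component 2 onto the reference component.
collapsed = component_coefficients(beta_ref, deviations, np.zeros((1, 3)))
# Including only the second deviation lets the components differ in one covariate.
partial = component_coefficients(beta_ref, deviations, np.array([[0, 1, 0]]))
```

Under this form, an excluded indicator removes a difference between components rather than removing a covariate from the model.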
