Ensemble learning and evidence maximization

D. J. C. MacKay

Journal Article

Proc. NIPS (1995) 0-7

Abstract

Ensemble learning by variational free energy minimization is a tool introduced to neural networks by Hinton and van Camp in which learning is described in terms of the optimization of an ensemble of parameter vectors. The optimized ensemble is an approximation to the posterior probability distribution of the parameters. This tool has now been applied to a variety of statistical inference problems. In this paper I study a linear regression model with both parameters and hyper-parameters. I demonstrate that the evidence approximation for the optimization of regularization constants can be derived in detail from a free energy minimization viewpoint.

1 Ensemble Learning by Free Energy Minimization

A new tool has recently been introduced into the field of neural networks and statistical inference. In traditional approaches to neural networks, a single parameter vector $w$ is optimized by maximum likelihood or penalized maximum likelihood. In the Bayesian interpretation, these optimized parameters are viewed as defining the mode of a posterior probability distribution $P(w \mid D, H)$ (given data $D$ and model assumptions $H$), which can be approximated, with a Gaussian distribution $\tilde{P}$ for example (MacKay 1992b), in order to obtain predictive distributions and optimize model control parameters.

The new concept introduced by Hinton and van Camp (1993) is to work in terms of an approximating ensemble $Q(w; \theta)$, that is, a probability distribution over the parameters, and to optimize the ensemble (by varying its own parameters $\theta$) so that it approximates the posterior distribution of the parameters $P(w \mid D, H)$ well. The objective function chosen to measure the quality of the approximation is a variational free energy (see footnote 1),

$$F(\theta) = -\int d^k w \, Q(w; \theta) \log \frac{P(D \mid w, H) \, P(w \mid H)}{Q(w; \theta)} \qquad (1)$$

The free energy $F(\theta)$ is bounded below by $-\log P(D \mid H)$ and only attains this value for $Q(w; \theta) = P(w \mid D, H)$. $F(\theta)$ can be viewed as the sum of $-\log P(D \mid H)$ and the Kullback-Leibler divergence between $Q(w; \theta)$ and $P(w \mid D, H)$.

Footnote 1: Variational free energy minimization is a well-established tool in statistical physics (Feynman 1972); 'mean field theory' is an important special case. The free energy can also be described in terms of description lengths.
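The lower bound on the free energy can be checked numerically on a toy problem. The sketch below is an illustration and is not taken from the paper: it assumes a one-parameter linear model y_i = w x_i + noise with a Gaussian prior of precision `alpha` and Gaussian noise of precision `beta`, and a Gaussian ensemble Q with mean `m` and variance `s2`. The data values, the function names, and the choice of a Gaussian ensemble are all hypothetical. Under these assumptions both the free energy and the log evidence have closed forms, so the gap between them can be evaluated directly.

```python
import math


def posterior(x, y, alpha, beta):
    """Exact Gaussian posterior (mean, variance) over w for y_i = w * x_i + noise."""
    precision = alpha + beta * sum(xi * xi for xi in x)
    mean = beta * sum(xi * yi for xi, yi in zip(x, y)) / precision
    return mean, 1.0 / precision


def log_evidence(x, y, alpha, beta):
    """Exact log P(D | H), obtained by integrating w out analytically."""
    n = len(x)
    mean, var = posterior(x, y, alpha, beta)
    return (0.5 * n * math.log(beta / (2 * math.pi))
            + 0.5 * math.log(alpha)
            + 0.5 * math.log(var)
            + 0.5 * mean * mean / var
            - 0.5 * beta * sum(yi * yi for yi in y))


def free_energy(m, s2, x, y, alpha, beta):
    """Free energy of the Gaussian ensemble Q = Normal(m, s2) (assumed form)."""
    n = len(x)
    # E_Q[log P(D | w, H)]: expected log likelihood under the ensemble
    exp_log_lik = (0.5 * n * math.log(beta / (2 * math.pi))
                   - 0.5 * beta * sum((yi - m * xi) ** 2 + s2 * xi * xi
                                      for xi, yi in zip(x, y)))
    # E_Q[log P(w | H)]: expected log prior under the ensemble
    exp_log_prior = (0.5 * math.log(alpha / (2 * math.pi))
                     - 0.5 * alpha * (m * m + s2))
    # E_Q[log Q]: negative entropy of the Gaussian ensemble
    exp_log_q = -0.5 * math.log(2 * math.pi * math.e * s2)
    return -(exp_log_lik + exp_log_prior - exp_log_q)


# Hypothetical toy data and hyper-parameters, chosen only for illustration.
x = [0.5, 1.0, 1.5, 2.0]
y = [0.4, 1.1, 1.4, 2.2]
alpha, beta = 1.0, 4.0

bound = -log_evidence(x, y, alpha, beta)
m_post, s2_post = posterior(x, y, alpha, beta)

# Gap F - (-log P(D | H)) when Q equals the exact posterior, and when it does not.
gap_at_posterior = free_energy(m_post, s2_post, x, y, alpha, beta) - bound
gap_elsewhere = free_energy(m_post + 0.3, 2.0 * s2_post, x, y, alpha, beta) - bound

print("gap at exact posterior:", gap_at_posterior)
print("gap for a shifted, wider ensemble:", gap_elsewhere)
```

In this conjugate setting the gap should vanish, up to rounding, when the ensemble equals the exact posterior and be strictly positive otherwise, which is the behaviour the bound describes. A closed-form Gaussian case was chosen so that no numerical integration or optimizer is needed; it says nothing about how the paper itself treats the hyper-parameters.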

Cite

MacKay, D. (1995). Ensemble learning and evidence maximization. Proc. NIPS, 0–7. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.4083&rep=rep1&type=pdf
