Bayes' rule specifies how to obtain a posterior from a class of hypotheses endowed with a prior and the observed data. There are three principal ways to use this posterior for predicting the future: marginalization (integration over the hypotheses w.r.t. the posterior), MAP (taking the a posteriori most probable hypothesis), and stochastic model selection (selecting a hypothesis at random according to the posterior distribution). If the hypothesis class is countable and contains the data-generating distribution, strong consistency theorems are known for the first two methods, asserting almost sure convergence of the predictions to the truth as well as loss bounds. We prove the first corresponding results for stochastic model selection. As a main technical tool, we use the concept of a potential: this quantity, which is always positive, measures the total possible amount of future prediction errors. More precisely, in each time step, the expected decrease of the potential is an upper bound on the expected error. We introduce the entropy potential of a hypothesis class as its worst-case entropy with regard to the true distribution. We formulate our results in the online classification framework, but they are equally applicable to the prediction of non-i.i.d. sequences. © Springer-Verlag Berlin Heidelberg 2006.
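The three ways of using a posterior named in the abstract can be illustrated with a minimal sketch. It assumes a toy setting that is not taken from the paper: a finite class of Bernoulli hypotheses (each a fixed probability `theta` of observing a 1), a uniform prior, and a short binary data sequence. All function and variable names are hypothetical.

```python
import random

def posterior(prior, thetas, data):
    """Bayes' rule: weight each hypothesis by prior times likelihood, then normalize."""
    weights = []
    for w, theta in zip(prior, thetas):
        like = 1.0
        for y in data:
            like *= theta if y == 1 else 1.0 - theta
        weights.append(w * like)
    total = sum(weights)
    return [w / total for w in weights]

def predict_marginal(post, thetas):
    """Marginalization: posterior-weighted average of the hypotheses' predictions."""
    return sum(w * t for w, t in zip(post, thetas))

def predict_map(post, thetas):
    """MAP: prediction of the single hypothesis with the highest posterior weight."""
    best = max(range(len(thetas)), key=lambda i: post[i])
    return thetas[best]

def predict_stochastic(post, thetas, rng):
    """Stochastic model selection: draw one hypothesis at random from the posterior."""
    return rng.choices(thetas, weights=post, k=1)[0]

# Toy setup (assumed for illustration only).
thetas = [0.2, 0.5, 0.8]          # hypothesis class: P(y = 1) under each hypothesis
prior = [1 / 3, 1 / 3, 1 / 3]     # uniform prior
data = [1, 1, 0, 1, 1, 1, 0, 1]   # observed binary sequence

post = posterior(prior, thetas, data)
rng = random.Random(0)

p_marginal = predict_marginal(post, thetas)
p_map = predict_map(post, thetas)
p_stochastic = predict_stochastic(post, thetas, rng)
```

Each of the three values is a predicted probability that the next observation is 1. The marginal prediction blends all hypotheses, whereas the MAP and stochastic predictions each commit to a single hypothesis; the stochastic one can differ from run to run because the hypothesis is sampled.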
CITATION STYLE
Poland, J. (2006). The missing consistency theorem for Bayesian learning: Stochastic model selection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4264 LNAI, pp. 259–273). Springer Verlag. https://doi.org/10.1007/11894841_22