In order to apply the Minimum Description Length (MDL) Principle, one must associate each model in the model class under consideration with a corresponding code. For probabilistic model classes there is a principled and generally agreed-upon method for doing this; for non-probabilistic model classes (i.e., classes of functions together with associated error functions) it is not so clear how this should be done. Here we present a new method that applies to probabilistic and non-probabilistic model classes alike. Our method can be re-interpreted as mapping arbitrary model classes to associated classes of probability distributions, so it can also be applied in a Bayesian context. In contrast to earlier proposals by Barron, Yamanishi, and Rissanen, and to the ad hoc solutions found in applications of MDL, our method learns, from the data at hand, the optimal scaling factor in the mapping from models to codes/probability distributions. We show that this method satisfies several optimality properties. We present several theorems suggesting that, with the help of our mapping from models to codes, one can successfully learn using MDL and/or Bayesian methods when (1) almost arbitrary model classes and error functions are allowed, and (2) none of the models in the class under consideration is close to the 'truth' that generates the data.
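For concreteness, one standard way to realize such a mapping is the exponential (Gibbs) construction sketched below. The notation (a predictor f, loss function L, scale beta, normalizer Z) is illustrative rather than taken from the paper itself; it only shows how a learned scaling factor can turn an error function into a code length.

% Minimal sketch (assumed notation, not the paper's own): map a predictor f
% with loss L to a conditional distribution indexed by a scale beta > 0.
\[
  P_\beta(y \mid x, f) \;=\; \frac{\exp\bigl(-\beta\, L(y, f(x))\bigr)}{Z(\beta, x)},
  \qquad
  Z(\beta, x) \;=\; \int \exp\bigl(-\beta\, L(y', f(x))\bigr)\, dy'
\]
% The corresponding code length for a sample (x_1, y_1), ..., (x_n, y_n) is
\[
  -\log P_\beta(y^n \mid x^n, f)
  \;=\; \beta \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr)
  \;+\; \sum_{i=1}^{n} \log Z(\beta, x_i),
\]
% so choosing beta to minimize this code length (equivalently, by maximum
% likelihood) is one way a scaling factor can be learned from the data.

As a familiar special case of this construction, taking squared-error loss L(y, f(x)) = (y - f(x))^2 on the reals yields Gaussian conditional distributions with mean f(x) and variance 1/(2*beta), so learning the scale amounts to learning a noise level.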
CITATION STYLE
Grunwald, P. (1999). Viewing all models as “probabilistic.” In Proceedings of the Annual ACM Conference on Computational Learning Theory (pp. 171–182). ACM. https://doi.org/10.1145/307400.307436