Abstract
We consider the archetypal learning problem in which a finite sample of examples generated by an underlying random process is made available to a learner, who produces a hypothesis in a model class by gradient descent on the empirical loss function. In this context, we derive two criteria for machine-size selection for a class of general nonlinear machines that includes feedforward neural networks as a subclass. The first criterion yields simultaneous estimates of the optimal machine size and the optimal stopping time for the gradient descent learning algorithm, and may be viewed as a formal extension of Akaike's information criterion (AIC) to general models and to the learning process itself. This criterion results in optimal generalization, in the sense of minimizing the loss function, but may not lead to consistent estimation. The second criterion admits a choice of machine size that leads to consistent estimation but is provably nonoptimal; it has the same asymptotic form as Rissanen's minimum description length (MDL) principle. A study of the properties of the two criteria sheds light on how AIC and MDL affect generalization performance, and provides guidelines for choosing between the two types of model-size selection criteria.
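For context, and without reproducing the paper's generalized criteria, the classical penalties referred to in the abstract take the following standard textbook forms. Let ℓ(k) denote the maximized log-likelihood over models with k free parameters fit to n examples; the classical rules select the k minimizing

    AIC(k)     = -2 ℓ(k) + 2k,
    MDL/BIC(k) = -ℓ(k) + (k/2) log n.

The AIC penalty does not grow with the sample size, while the MDL/BIC penalty grows like log n; this is the standard source of the trade-off noted above, with criteria of the first type tending toward better predictive loss and criteria of the second type toward consistent recovery of the true model order.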
Wang, C., & Venkatesh, S. S. (1995). Criteria for specifying machine complexity in learning. In Proceedings of the 8th Annual Conference on Computational Learning Theory, COLT 1995 (Vol. 1995-January, pp. 273–280). Association for Computing Machinery. https://doi.org/10.1145/225298.225331