Criteria for specifying machine complexity in learning


Abstract

We consider the archetypal learning problem where a finite sample of examples generated by an underlying random process is made available to the learner, who generates a hypothesis in a model class by gradient descent over the empirical loss function. In this context, we derive two criteria for machine size selection for a class of general nonlinear machines which includes feedforward neural networks as a subclass. The first criterion yields simultaneous estimates of the optimal machine size and the optimal stopping time for the gradient descent learning algorithm, and may be viewed as a formal extension of Akaike's information criterion (AIC) to include general models and the learning process per se. This criterion results in optimal generalization, in the sense of minimizing the loss function, but may not lead to consistent estimation. The second criterion admits a choice of machine size that leads to consistent estimation but is provably nonoptimal. The latter criterion has the same asymptotic form as Rissanen's minimum description length principle (MDL). A study of the properties of the two criteria sheds light on the effects of AIC and MDL on generalization performance, and provides guidelines for choosing between the two types of model size selection criteria.
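To make the trade-off concrete, the following is a minimal sketch of the classical AIC-style and MDL/BIC-style penalties that the abstract's criteria formally extend; it is not the criteria derived in the paper. The synthetic data, the polynomial model family (a hypothetical stand-in for the paper's general nonlinear machines), and all parameter names are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: classical AIC vs. MDL-style model size selection,
# not the criteria derived in Wang & Venkatesh (1995).
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(3.0 * x) + 0.3 * rng.normal(size=n)   # hypothetical underlying random process

def fit_and_score(k):
    """Least-squares fit of a degree-(k-1) polynomial, i.e. a machine with k parameters."""
    X = np.vander(x, k, increasing=True)          # design matrix with k columns
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = np.mean(resid ** 2)
    nll = 0.5 * n * np.log(sigma2)                # Gaussian negative log-likelihood (up to a constant)
    aic = 2.0 * nll + 2.0 * k                     # AIC-style penalty, linear in k
    mdl = nll + 0.5 * k * np.log(n)               # MDL/BIC-style penalty, (k/2) log n
    return aic, mdl

sizes = range(1, 16)
scores = {k: fit_and_score(k) for k in sizes}
best_aic = min(sizes, key=lambda k: scores[k][0])
best_mdl = min(sizes, key=lambda k: scores[k][1])
print(f"AIC-style choice of machine size: k = {best_aic}")
print(f"MDL-style choice of machine size: k = {best_mdl}")
```

Because the MDL-style penalty grows with the sample size, it typically selects a smaller machine than the AIC-style score; this mirrors the abstract's contrast between a criterion tuned for generalization and one tuned for consistent estimation.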

Citation (APA)

Wang, C., & Venkatesh, S. S. (1995). Criteria for specifying machine complexity in learning. In Proceedings of the 8th Annual Conference on Computational Learning Theory, COLT 1995 (Vol. 1995-January, pp. 273–280). Association for Computing Machinery. https://doi.org/10.1145/225298.225331
