Probabilistic adaptive computation time

3 citations · 10 readers (Mendeley)

Abstract

We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed adaptive computation time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of adaptive computation time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
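The abstract mentions training with the general-purpose concrete (Gumbel-Softmax) relaxation of discrete variables. As a minimal illustration of that standard technique (not the authors' implementation), the sketch below draws a relaxed sample over discrete choices: as the temperature approaches zero the sample approaches a one-hot vector, while higher temperatures yield smoother, differentiable approximations.

```python
import numpy as np

def concrete_sample(logits, temperature, rng):
    """Draw one sample from the Concrete (Gumbel-Softmax) distribution.

    logits: unnormalized log-probabilities of the discrete choices.
    temperature: relaxation temperature; -> 0 recovers a one-hot draw.
    """
    # Gumbel(0, 1) noise via the inverse-CDF trick.
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    # Softmax with max-subtraction for numerical stability.
    y = y - y.max()
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])  # hypothetical example values
sample = concrete_sample(logits, temperature=0.5, rng=rng)
```

The returned vector is a point on the probability simplex, so downstream computation (e.g. a soft choice of how many residual units to evaluate) stays differentiable during training.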

Citation (APA)
Figurnov, M., Sobolev, A., & Vetrov, D. (2018). Probabilistic adaptive computation time. Bulletin of the Polish Academy of Sciences: Technical Sciences, 811–820. https://doi.org/10.24425/bpas.2018.125928
