The Straight-Through Estimator (STE) [Hinton, 2012; Bengio et al., 2013] is widely used to back-propagate gradients through the quantization function, but the STE technique still lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). The AB method avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight w_q and the corresponding full-precision weight w, using non-trainable scalar coefficients α and (1−α). During training, α is gradually increased from 0 to 1; the gradient updates to the weights flow through the full-precision term, (1−α)w, of the affine combination; and the model is converted from full precision to low precision progressively. To evaluate the AB method, a 1-bit BinaryNet [Hubara et al., 2016a] is trained on the CIFAR-10 dataset, and 8-bit and 4-bit MobileNet v1 and ResNet-50 v1/v2 models are trained on ImageNet using the alpha-blending approach; the evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82%, and 2.93%, respectively, compared to the results of STE-based quantization [Hubara et al., 2016a; Krishnamoorthi, 2018].
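As a concrete illustration of the affine combination described above, the following PyTorch-style sketch blends a quantized weight w_q with its full-precision counterpart w so that gradients flow only through the (1−α)w term. The uniform quantizer, bit width, linear α schedule, and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of alpha-blending (AB), assuming a simple symmetric
# uniform quantizer and a linear alpha schedule.
import torch

def quantize(w, num_bits=8):
    # Symmetric uniform quantizer (assumed for illustration).
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def alpha_blend(w, alpha, num_bits=8):
    # Affine combination alpha * w_q + (1 - alpha) * w.
    # detach() blocks gradients through the quantized term, so the backward
    # pass only passes through the full-precision term (1 - alpha) * w,
    # which is what lets AB avoid the STE approximation.
    w_q = quantize(w, num_bits).detach()
    return alpha * w_q + (1.0 - alpha) * w

# Usage: ramp alpha from 0 to 1 over training and use the blended weight
# in the forward pass (e.g., inside a custom conv/linear layer).
w = torch.randn(64, 128, requires_grad=True)
for step in range(1000):
    alpha = min(1.0, step / 900)       # illustrative linear schedule
    w_ab = alpha_blend(w, alpha)
    loss = (w_ab ** 2).mean()          # stand-in for the task loss
    loss.backward()
    with torch.no_grad():
        w -= 1e-2 * w.grad             # plain SGD update on full-precision w
        w.grad.zero_()
```

By the end of the schedule (α = 1) the forward pass uses only the quantized weight, while earlier steps still receive useful gradients scaled by (1−α).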