Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance in computer vision tasks. However, the high computational power that devices need in order to run recent CNNs has hampered many of their applications. Recently, many methods have quantized floating-point weights and activations to fixed-point or binary values, converting fractional arithmetic into integer or bit-wise arithmetic. However, since the distributions of values in CNNs are extremely complex, fixed-point or binary representations lose numerical information and degrade performance. On the other hand, convolution consists of multiplications and accumulations, and multiplications are more costly to implement in hardware than accumulations. By considerably decreasing the number of multiplications, we can preserve the rich information of floating-point values on dedicated low-power devices. In this paper, we quantize the floating-point weights of each kernel separately into multiple bit planes, which remarkably decreases the number of multiplications. We obtain a closed-form solution via an aggressive Lloyd algorithm, and fine-tuning is adopted to optimize the bit planes. Furthermore, we propose dual normalization to solve the pathological curvature problem during fine-tuning. Our quantized networks show negligible performance loss compared to their floating-point counterparts.
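To make the bit-plane idea concrete, the following is a minimal NumPy sketch of decomposing a single kernel into a few scaled {-1, +1} planes, so that convolution with each plane needs only additions and subtractions plus one scaling multiplication per plane. The greedy residual fitting shown here (and the function name `kernelwise_bitplanes`) is an illustrative stand-in, not the paper's closed-form Lloyd-based procedure or its fine-tuning and dual normalization steps.

```python
import numpy as np

def kernelwise_bitplanes(kernel, num_planes=3):
    """Greedily decompose one kernel into scaled {-1, +1} bit planes.

    Approximates kernel ~= sum_b alpha_b * B_b, so convolving with each
    B_b requires only accumulation, and the alpha_b scaling is the only
    multiplication per plane. Illustrative sketch, not the paper's
    exact algorithm.
    """
    residual = kernel.astype(np.float64).copy()
    alphas, planes = [], []
    for _ in range(num_planes):
        plane = np.sign(residual)
        plane[plane == 0] = 1.0          # keep entries strictly in {-1, +1}
        alpha = np.abs(residual).mean()  # least-squares scale for a sign plane
        alphas.append(alpha)
        planes.append(plane)
        residual -= alpha * plane        # fit the next plane to the remainder
    return np.array(alphas), np.stack(planes)

# Toy usage: quantize a 3x3 kernel and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3))
alphas, planes = kernelwise_bitplanes(w, num_planes=3)
w_hat = np.tensordot(alphas, planes, axes=1)
print("max abs error:", np.max(np.abs(w - w_hat)))
```

Because each kernel gets its own scales and planes, the number of multiplications per output grows only with the number of planes rather than with the kernel size, which is the source of the savings described above.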
Zeng, L., Wang, Z., & Tian, X. (2019). KCNN: Kernel-wise quantization to remarkably decrease multiplications in convolutional neural network. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 4234–4242). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/588