Model quantization helps reduce the model size and inference latency of deep neural networks. Mixed precision quantization is attractive on customized hardware that supports arithmetic operations at multiple bit-widths, allowing maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation and model-size constraints. During optimization, the bit-width of each layer / kernel in the model is represented as a fractional value between two consecutive integer bit-widths, which can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during quantization-aware training, resulting in an optimized mixed precision model. Our final models achieve comparable or better performance than previous mixed precision quantization methods on MobileNetV1/V2 and ResNet18 under different resource constraints on the ImageNet dataset.
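The core idea of the abstract, a fractional bit-width that sits between two consecutive integer bit-widths, can be sketched as a linear interpolation between two integer-bit quantizers, with the interpolation weight (and hence the bit-width) learned by gradient descent. The sketch below is illustrative only, assuming a simple uniform quantizer on inputs in [0, 1] and a toy squared penalty toward a target budget; the function names and the penalty form are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a fractional bit-width:
# the effective quantizer is a linear interpolation between uniform
# quantizers at the two integer bit-widths around a learnable,
# continuous per-layer bit-width parameter.
import torch


def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Quantize x (assumed in [0, 1]) uniformly to 2^bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x.clamp(0.0, 1.0) * levels) / levels


def fractional_quantize(x: torch.Tensor, frac_bits: torch.Tensor) -> torch.Tensor:
    """Interpolate between quantizers at floor and ceil of the fractional bit-width."""
    lo = int(frac_bits.detach().floor().item())
    hi = lo + 1
    # Interpolation weight: how far frac_bits sits between lo and hi.
    # Gradients flow to frac_bits through this weight.
    alpha = frac_bits - lo
    return (1 - alpha) * uniform_quantize(x, lo) + alpha * uniform_quantize(x, hi)


# Learnable per-layer bit-width, free to move during training.
frac_bits = torch.tensor(5.3, requires_grad=True)
x = torch.rand(4, 8)
y = fractional_quantize(x, frac_bits)

# Toy differentiable resource penalty pushing the bit-width toward a budget
# (the paper uses a constraint on total cost such as model size or BitOPs).
target_bits = 4.0
loss = y.sum() + 0.1 * (frac_bits - target_bits) ** 2
loss.backward()
print(frac_bits.grad)
```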
Yang, L., & Jin, Q. (2021). FracBits: Mixed Precision Quantization via Fractional Bit-Widths. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 12A, pp. 10612–10620). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i12.17269