Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by the limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores. Specifically, APNN-TC first incorporates a novel emulation algorithm to support arbitrary short bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary precision layer designs to efficiently map our emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary precision NN design to minimize memory access across layers and further improve performance. Extensive evaluations show that APNN-TC achieves significant speedups over CUTLASS kernels and over full NN models such as ResNet and VGG.
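The core idea behind the emulation algorithm can be illustrated with the standard bit-plane decomposition: a p-bit-by-q-bit dot product breaks into p×q one-bit dot products, each computable with bitwise AND plus popcount and then scaled by the appropriate power of two. The following is a minimal CPU sketch of that principle, not the paper's Tensor Core kernel; the function names (pack_bit_plane, apdot) and the use of AND/popcount on 64-bit words are illustrative assumptions standing in for the int1 matrix primitives.

#include <bitset>
#include <cstdint>
#include <cstdio>
#include <vector>

// Pack bit plane `bit` of an n-element unsigned vector into 64-bit words.
// (Illustrative helper; on the GPU these planes would feed int1 Tensor Core ops.)
static std::vector<uint64_t> pack_bit_plane(const std::vector<uint32_t>& v, int bit) {
    std::vector<uint64_t> plane((v.size() + 63) / 64, 0);
    for (size_t k = 0; k < v.size(); ++k)
        if ((v[k] >> bit) & 1u)
            plane[k / 64] |= 1ull << (k % 64);
    return plane;
}

// p-bit weights times q-bit activations: every partial product is a 1-bit
// dot product (AND + popcount), weighted by 2^(i+j) and accumulated.
static uint64_t apdot(const std::vector<uint32_t>& w, int p,
                      const std::vector<uint32_t>& x, int q) {
    uint64_t acc = 0;
    for (int i = 0; i < p; ++i) {
        auto wb = pack_bit_plane(w, i);
        for (int j = 0; j < q; ++j) {
            auto xb = pack_bit_plane(x, j);
            uint64_t ones = 0;
            for (size_t t = 0; t < wb.size(); ++t)
                ones += std::bitset<64>(wb[t] & xb[t]).count();
            acc += ones << (i + j);   // scale the 1-bit result by 2^(i+j)
        }
    }
    return acc;
}

int main() {
    // Example: 2-bit weights, 3-bit activations.
    std::vector<uint32_t> w = {1, 3, 2, 0, 1, 2};
    std::vector<uint32_t> x = {5, 1, 7, 4, 6, 3};
    uint64_t ref = 0;
    for (size_t k = 0; k < w.size(); ++k) ref += (uint64_t)w[k] * x[k];
    printf("emulated=%llu reference=%llu\n",
           (unsigned long long)apdot(w, 2, x, 3), (unsigned long long)ref);
    return 0;
}

Because the only data-dependent operations are 1-bit AND and popcount, the same accumulation pattern can be mapped onto hardware that exposes only int1 matrix primitives, which is the restriction the framework works within.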
Citation: Feng, B., Wang, Y., Geng, T., Li, A., & Ding, Y. (2021). APNN-TC: Accelerating arbitrary precision neural networks on Ampere GPU Tensor Cores. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE Computer Society. https://doi.org/10.1145/3458817.3476157