Distilling bit-level sparsity parallelism for general purpose deep learning acceleration

Hang Lu; Liang Chang; Chenglong Li; Zixuan Zhu; Shengjian Lu; Yanhuan Liu; Mingzhe Zhang

Conference Proceedings (Open Access)

Proceedings of the Annual International Symposium on Microarchitecture, MICRO (2021) 963-976

DOI: 10.1145/3466752.3480123

51 Citations

57 Readers

Abstract

Along with the rapid evolution of deep neural networks, their ever-increasing complexity imposes formidable computational intensity on hardware accelerators. In this paper, we propose a novel computing philosophy called "bit interleaving" and the associated accelerator design called "Bitlet" to maximally exploit bit-level sparsity. Unlike existing bit-serial/parallel accelerators, Bitlet leverages the abundant "sparsity parallelism" in the parameters to accelerate inference. Bitlet is versatile, supporting diverse precisions on a single platform, including 32-bit floating point and fixed point from 1b to 24b. This versatility makes Bitlet feasible for both efficient inference and training. Empirical studies on 12 domain-specific deep learning applications highlight the following results: (1) up to 81×/21× energy efficiency improvement for training/inference over recent high-performance GPUs; (2) up to 15×/8× higher speedup/efficiency over state-of-the-art fixed-point accelerators; (3) 1.5 mm² area and scalable power consumption from 570 mW (float32) to 432 mW (16b) and 365 mW (8b) @ 28 nm TSMC; (4) high configurability, justified by ablation and sensitivity studies.
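As a rough illustration of the term "bit-level sparsity", the sketch below counts the zero bits in a set of fixed-point weights and checks that a dot product can be regrouped by bit position, so that only the 1-bits contribute work. It is a generic arithmetic identity written for this page, not the Bitlet dataflow or the authors' code. The function names `bit_sparsity` and `bitwise_dot`, the sign-magnitude treatment of negative weights, and the sample values are all assumptions made for the example.

```python
def bit_sparsity(weights, bits=8):
    """Fraction of zero bits across the magnitudes of integer weights.

    Assumes a sign-magnitude view: only |w| is inspected, and every
    |w| is expected to fit in `bits` bits.
    """
    mask = (1 << bits) - 1
    ones = sum(bin(abs(w) & mask).count("1") for w in weights)
    return 1 - ones / (len(weights) * bits)


def bitwise_dot(weights, activations, bits=8):
    """Dot product regrouped by weight bit position.

    For each bit position b, sum the (signed) activations whose weight
    has a 1 at position b, then scale that partial sum by 2**b.
    Zero bits are skipped entirely.
    """
    total = 0
    for b in range(bits):
        partial = sum(
            (a if w >= 0 else -a)
            for w, a in zip(weights, activations)
            if (abs(w) >> b) & 1
        )
        total += partial << b
    return total


# Hypothetical 8-bit weights and integer activations.
weights = [3, -5, 0, 16]
activations = [2, 7, 9, 1]

reference = sum(w * a for w, a in zip(weights, activations))
print(bit_sparsity(weights), bitwise_dot(weights, activations), reference)
```

In this toy case only 5 of the 32 weight bits are set, so the regrouped form touches 5 activation terms while producing the same result as the ordinary dot product.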

Cite

Lu, H., Chang, L., Li, C., Zhu, Z., Lu, S., Liu, Y., & Zhang, M. (2021). Distilling bit-level sparsity parallelism for general purpose deep learning acceleration. In Proceedings of the Annual International Symposium on Microarchitecture, MICRO (pp. 963–976). IEEE Computer Society. https://doi.org/10.1145/3466752.3480123
