Distilling bit-level sparsity parallelism for general purpose deep learning acceleration

Abstract

Along with the rapid evolution of deep neural networks, the ever-increasing complexity imposes formidable computation intensity on the hardware accelerator. In this paper, we propose a novel computing philosophy called "bit interleaving" and the associated accelerator design called "Bitlet" to maximally exploit bit-level sparsity. Unlike existing bit-serial/parallel accelerators, Bitlet leverages the abundant "sparsity parallelism" in the parameters to boost inference acceleration. Bitlet is versatile, supporting diverse precisions on a single platform, including 32-bit floating point (float32) and fixed point from 1b to 24b. This versatility makes Bitlet feasible for both efficient inference and training. Empirical studies on 12 domain-specific deep learning applications highlight the following results: (1) up to 81×/21× energy efficiency improvement for training/inference over recent high-performance GPUs; (2) up to 15×/8× higher speedup/efficiency over state-of-the-art fixed-point accelerators; (3) 1.5 mm² area and scalable power consumption from 570 mW (float32) to 432 mW (16b) and 365 mW (8b) at 28 nm (TSMC); (4) high configurability, justified by ablation and sensitivity studies.
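To give a rough sense of the bit-level sparsity the abstract refers to, the following Python sketch computes a fixed-point dot product by shift-and-add, accumulating only for the nonzero weight bits. This is a minimal illustration of why sparse bits reduce work, not the paper's bit-interleaving hardware or dataflow; all names in it are illustrative.

```python
# Minimal sketch (illustrative only, not the Bitlet microarchitecture):
# a fixed-point dot product that skips zero weight bits, so the amount of
# work tracks the number of "essential" (nonzero) bits in the parameters.

def dot_product_bit_serial(weights, activations, bits=8):
    """Accumulate (activation << bit_position) only for nonzero weight bits."""
    acc = 0
    for w, a in zip(weights, activations):
        for b in range(bits):
            if (w >> b) & 1:      # process only the set bits of the weight
                acc += a << b     # shift-and-add replaces the multiplication
    return acc

# Example: weights with few set bits need only a few shift-and-add steps.
weights = [0b00000100, 0b00010001, 0b00000000, 0b10000000]
activations = [3, 5, 7, 2]
print(dot_product_bit_serial(weights, activations))  # 3*4 + 5*17 + 7*0 + 2*128 = 353
```

In this toy loop the savings are sequential; the "sparsity parallelism" described in the paper comes from processing the essential bits of many parameters concurrently in hardware.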

Cite

APA

Lu, H., Chang, L., Li, C., Zhu, Z., Lu, S., Liu, Y., & Zhang, M. (2021). Distilling bit-level sparsity parallelism for general purpose deep learning acceleration. In Proceedings of the Annual International Symposium on Microarchitecture, MICRO (pp. 963–976). IEEE Computer Society. https://doi.org/10.1145/3466752.3480123
