Abstract
Along with the rapid evolution of deep neural networks, their ever-increasing complexity imposes formidable computation intensity on hardware accelerators. In this paper, we propose a novel computing philosophy called "bit interleaving" and the associated accelerator design called "Bitlet" to maximally exploit bit-level sparsity. Unlike existing bit-serial/parallel accelerators, Bitlet leverages the abundant "sparsity parallelism" in the parameters to accelerate inference. Bitlet is versatile, supporting diverse precisions on a single platform, including 32-bit floating point and fixed point from 1b to 24b. This versatility makes Bitlet feasible for both efficient inference and training. Empirical studies on 12 domain-specific deep learning applications highlight the following results: (1) up to 81×/21× energy efficiency improvement for training/inference over recent high-performance GPUs; (2) up to 15×/8× higher speedup/efficiency over state-of-the-art fixed-point accelerators; (3) 1.5 mm² area and scalable power consumption from 570 mW (float32) to 432 mW (16b) and 365 mW (8b) at 28 nm TSMC; (4) high configurability, justified by ablation and sensitivity studies.
Citation
Lu, H., Chang, L., Li, C., Zhu, Z., Lu, S., Liu, Y., & Zhang, M. (2021). Distilling bit-level sparsity parallelism for general purpose deep learning acceleration. In Proceedings of the Annual International Symposium on Microarchitecture, MICRO (pp. 963–976). IEEE Computer Society. https://doi.org/10.1145/3466752.3480123