BitX: Empower Versatile Inference with Hardware Runtime Pruning

3Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

Abstract

Classic DNN pruning mostly leverages software-based methodologies to tackle the accuracy/speed tradeoff, which involves complicated procedures like critical parameter searching, fine-tuning and sparse training to find the best plan. In this paper, we explore the opportunities of hardware runtime pruning and propose a hardware runtime pruning methodology, termed as "BitX"to empower versatile DNN inference. It targets the abundant useless bits in the parameters, pinpoints and prunes these bits on-the-fly in the proposed BitX accelerator. The versatility of BitX lies in: (1) software effortless; (2) orthogonal to the software-based pruning; and (3) multi-precision support (including both floating point and fixed point). Empirical studies on image classification and object detection models highlight the following results: (1) up to 4.82x speedup over the original non-pruned DNN and 14.76x speedup collaborated with the software-pruned DNN; (2) up to 0.07% and 0.9% higher accuracy for the floating-point and fixed-point DNN, respectively; (3) 2.00x and 3.79x performance improvement over the state-of-the-art accelerators, with 0.039 mm2 and 68.62 mW (floating-point 32), 36.41 mW(16-bit fixed point) power consumption under TSMC 28 nm technology library.

Cite

CITATION STYLE

APA

Li, H., Lu, H., Huang, J., Wang, W., Zhang, M., Chen, W., … Li, X. (2021). BitX: Empower Versatile Inference with Hardware Runtime Pruning. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3472456.3472513

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free