Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs

13Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Block floating point (BFP), an efficient numerical system for deep neural networks (DNNs), achieves a good trade-off between dynamic range and hardware costs. Specifically, prior works have demonstrated that BFP format with 3 ∼5-bit mantissa can achieve FP32-comparable accuracy for various DNN workloads. We find that the floating-point adder (FP-Acc), which contains modules for normalization, alignment, addition, and fixed-point-to-floating-point (FXP2FP) conversion, dominates the power and area overheads, hence hindering the hardware efficiency of state-of-the-art low-bit BFP processing engines (BFP-PE). To mitigate the identified issue, we propose Bucket Getter, a novel architecture implemented with the following techniques for improving the energy efficiency and area efficiency: 1) we propose a bucket-based accumulation unit prior to FP-Acc, which uses multiple small accumulators (buckets) that are responsible for a small range of exponent values where intermediate results are distributed accordingly, and b) accumulate in FXP domain. This reduces the activities of power-hungry a) alignment and b) format conversion units. 2) We propose inter-bucket carry propagation, which allows each bucket to transmit overflow to an adjacent bucket and further reduces the activity of FP-Acc. 3) We propose an out-of-bound-aware, adaptive and circular bucket accumulator to significantly reduce the overhead for the bucket-based accumulator. 4) We further propose shared FP-Acc, which exploits the low activity of FP-Acc in the bucket-based architecture and shares an FP-Acc across several MAC engines to reduce the area overhead of FP-Acc. The experimental results based on TSMC 40 nm demonstrate that our proposed Bucket Getter architecture reduces the computational energy by up to 57% and improves the area efficiency by up to 1.4 ×, compared to state-of-the-art BFP engines across seven representative DNN models. Furthermore, our proposed approach helps state-of-the-art floating-point engines reduce up to 32% of the PE area and 81% of the PE power.

Cite

CITATION STYLE

APA

Lo, Y. C., & Liu, R. S. (2023). Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 (pp. 1002–1015). Association for Computing Machinery, Inc. https://doi.org/10.1145/3613424.3614249

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free