Abstract
Emerging non-volatile memory (eNVM) based mixed-signal compute-in-memory (CIM) accelerators have attracted great interest in AI accelerator design owing to their high energy efficiency. Various CIM architectures and circuit-level designs have been proposed, demonstrating superior hardware performance for deep neural network (DNN) acceleration. However, hardware-aware quantization strategies for CIM-based accelerators have not been systematically explored. There are many design options for mapping neural networks onto CIM systems, and improper strategies may narrow the circuit-level design space and limit the achievable hardware performance; a comprehensive early-stage design space exploration is therefore needed to identify quantization/mapping strategies that deliver better hardware performance. In this paper, we provide a joint algorithm-hardware analysis and compare the system-level hardware performance of various design options, including quantization algorithms, data representation methods, and analog-to-digital converter (ADC) configurations. This work proposes guidelines to help chip architects choose more hardware-friendly design options. According to our evaluation results on CIFAR-10/100 and ImageNet classification, a properly chosen quantization approach combined with an optimal mapping strategy (dynamic fixed-point quantization + 2's complement or shifted unsigned INT representation + optimized-precision ADC) achieves ∼2× energy efficiency and 1.2∼1.6× throughput with 5%∼25% reduced area overhead, compared with a naïve strategy (fixed-point quantization + differential-pair number representation + full-precision ADC).
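To make two of the compared design options concrete, the following is a minimal Python sketch of per-tensor dynamic fixed-point quantization and the shifted unsigned INT weight representation mentioned in the abstract. The function names, the 8-bit default, and the NumPy implementation are illustrative assumptions, not code from the paper.

import numpy as np

def dynamic_fixed_point_quantize(x, total_bits=8):
    # Choose the fractional length per tensor from its dynamic range
    # (the "dynamic" part of dynamic fixed-point), instead of a single
    # global fixed-point format shared by all layers.
    max_abs = float(np.max(np.abs(x)))
    # Integer bits needed to cover the range, plus one sign bit.
    int_bits = max(1, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi).astype(np.int32)
    return q, frac_bits  # signed integer codes and their fractional length

def shift_to_unsigned(q, total_bits=8):
    # Shift signed codes to unsigned ones by a constant offset 2^(bits-1),
    # so only non-negative values need to be programmed into the eNVM
    # array; the offset's fixed contribution to each column sum can be
    # subtracted digitally after the ADC.
    return q + 2 ** (total_bits - 1)

# Example: quantize random weights and map them to unsigned codes.
w = np.random.randn(4, 4).astype(np.float32)
q, frac_bits = dynamic_fixed_point_quantize(w)
u = shift_to_unsigned(q)
print(frac_bits, u.min(), u.max())  # unsigned codes lie in [0, 2^8 - 1]

A differential-pair representation, by contrast, would program each signed weight as the difference of two device conductances, doubling the array cells per weight; the shifted unsigned mapping above avoids that overhead, which is consistent with the area and energy savings the abstract reports.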
Citation
Huang, S., Jiang, H., & Yu, S. (2022). Hardware-aware Quantization/Mapping Strategies for Compute-in-Memory Accelerators. ACM Transactions on Design Automation of Electronic Systems, 28(3). https://doi.org/10.1145/3569940