Integrated Imager and 3.22 μs/Kernel-Latency All-Digital In-Imager Global-Parallel Binary Convolutional Neural Network Accelerator for Image Processing

Abstract

This paper presents an innovative approach to ultralow-latency convolutional neural network (CNN) processing, which is critical for real-time image processing applications such as autonomous driving and virtual reality. Traditional CNN accelerators based on in/near-array computing (including in/near-memory computing and in/near-sensor computing) have struggled to meet real-time requirements because conventional column-parallel processing creates a latency bottleneck for image processing. To address this challenge, we propose a novel, all-digital in-imager global-parallel binary convolutional neural network (IIGP-BNN) accelerator. The accelerator adopts a global-parallel processing concept in which multiply-and-accumulate operations (MACs) are executed simultaneously across the imager array in a 2-D manner, eliminating the additional latency of row-by-row processing and of data access from random access memories (RAMs). With this design, convolution and subsampling with a 3 × 3 kernel are completed in only nine global-parallel processing steps, regardless of image size. This yields a theoretical reduction of more than 88.5% in repeated row scans compared with conventional column-parallel processing architectures, significantly lowering computing latency. We designed and prototyped a 30 × 30 integrated imager and IIGP-BNN accelerator IC in a 0.18-μm CMOS process. The prototype achieved a latency of 3.22 μs/kernel on the first-layer convolution at a 1-V supply and a 35.7-MHz clock frequency, a 35.6% latency reduction compared with state-of-the-art in/near-imager-computing works. The proposed global-parallel processing concept opens up the potential for processing high-resolution 4K and 8K images with the same ultralow latency, marking a significant advancement in high-speed image processing.
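To make the nine-step idea concrete, the following is a minimal software sketch, not the authors' hardware, of a global-parallel 3 × 3 binary convolution in the spirit described above. It assumes inputs and weights binarized to ±1 and models each of the nine steps as applying one kernel tap to a globally shifted copy of the entire image, so the step count is independent of image size; the function name, padding choice, and sign activation are illustrative assumptions, not details from the paper.

```python
# Conceptual sketch (assumed, not the authors' circuit) of nine-step
# global-parallel 3x3 binary convolution: one step per kernel tap,
# each step updating every pixel's accumulator at once.
import numpy as np

def global_parallel_bconv3x3(image_pm1, kernel_pm1):
    """image_pm1: (H, W) array of +/-1; kernel_pm1: (3, 3) array of +/-1."""
    H, W = image_pm1.shape
    acc = np.zeros((H, W), dtype=np.int32)            # per-pixel accumulators
    padded = np.pad(image_pm1, 1, constant_values=1)  # assumed border handling
    # Nine global-parallel steps: each touches all H x W pixels simultaneously.
    for dy in range(3):
        for dx in range(3):
            shifted = padded[dy:dy + H, dx:dx + W]    # global 2-D shift
            acc += kernel_pm1[dy, dx] * shifted       # one MAC step, whole array
    return np.sign(acc)                               # binarized output activation

rng = np.random.default_rng(0)
img = rng.choice([-1, 1], size=(30, 30))   # 30 x 30, matching the prototype array size
ker = rng.choice([-1, 1], size=(3, 3))
out = global_parallel_bconv3x3(img, ker)
print(out.shape)                           # (30, 30)
```

In the actual in-imager circuit the per-pixel arithmetic would be realized with binary digital logic rather than integer math; the sketch only mirrors the step count and the all-pixels-at-once parallelism that distinguish global-parallel from column-parallel processing.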

Cite (APA)

Wang, R., Wu, C. H., & Takamiya, M. (2023). Integrated Imager and 3.22 μs/Kernel-Latency All-Digital In-Imager Global-Parallel Binary Convolutional Neural Network Accelerator for Image Processing. IEEE Access, 11, 74364–74378. https://doi.org/10.1109/ACCESS.2023.3296429
