Neural processing units (NPUs) have become an indispensable part of mobile SoCs, and integrating multiple NPU cores into a single chip is a promising way to meet the ever-increasing computing demands of mobile devices. This paper presents techniques that maximize the utilization of NPU cores and reduce the latency of on-device inference. Mobile NPUs typically have a small local memory (scratch-pad memory, SPM) that holds only the input/output tensors and weights of a single layer operation of a deep neural network (DNN). In multicore NPUs, these local memories are distributed across the cores. In such systems, executing layer operations in parallel is the primary vehicle for achieving performance. By partitioning a DNN layer into multiple sub-layers, we can execute them in parallel across the NPU cores; within a core, pipelined execution further reduces the execution time of each sub-layer. In this execution model, synchronizing the parallel execution and loading/storing intermediate tensors in global memory become the main bottlenecks. To alleviate these problems, we propose novel optimization techniques that carefully consider the partitioning direction, execution order, synchronization, and global memory accesses. Using six popular convolutional neural networks (CNNs), we evaluate our techniques on a flagship mobile SoC with three NPU cores. Compared to the best-performing existing partitioning approach, our techniques improve performance by 23%, achieving a 2.1x speedup over a single-core system.
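To make the layer-partitioning idea concrete, the sketch below splits one convolution layer along its output-height dimension into per-core sub-layers and merges the results at a synchronization point. This is only an illustrative assumption of how such a partitioning could look, not the paper's implementation: the function names (`conv2d_single_core`, `conv2d_partitioned`), the choice of NumPy, and the halo handling are all hypothetical, and a real NPU runtime would dispatch each slice to a core's scratch-pad memory rather than loop on the CPU.

```python
# Minimal sketch (not the authors' implementation): partition a convolution
# layer along the output-height dimension so each NPU core computes one
# sub-layer in parallel, then gather the sub-layer outputs.
import numpy as np

def conv2d_single_core(x, w):
    """Naive valid convolution of (H, W, C_in) with (KH, KW, C_in, C_out)."""
    H, W, C_in = x.shape
    KH, KW, _, C_out = w.shape
    out = np.zeros((H - KH + 1, W - KW + 1, C_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + KH, j:j + KW, :]
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def conv2d_partitioned(x, w, num_cores=3):
    """Split output rows across cores; each slice needs KH-1 extra halo rows."""
    KH = w.shape[0]
    out_h = x.shape[0] - KH + 1
    bounds = np.linspace(0, out_h, num_cores + 1, dtype=int)
    parts = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Input rows [start, stop + KH - 1) cover output rows [start, stop).
        x_slice = x[start:stop + KH - 1, :, :]
        parts.append(conv2d_single_core(x_slice, w))  # would run on one core
    # Synchronization point: gather sub-layer outputs from all cores.
    return np.concatenate(parts, axis=0)

if __name__ == "__main__":
    x = np.random.rand(32, 32, 8)
    w = np.random.rand(3, 3, 8, 16)
    assert np.allclose(conv2d_partitioned(x, w), conv2d_single_core(x, w))
```

Note that the halo rows duplicated at each slice boundary and the final gather correspond to the global-memory traffic and synchronization costs that the paper's optimizations target; the choice of partitioning dimension (height here) is exactly the kind of decision the proposed techniques weigh.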
Citation:
Jung, H., Ji, H., Pushchin, A., Ostapenko, M., Niu, W., Palachev, I., … Han, H. (2023). Accelerating Deep Neural Networks on Mobile Multicore NPUs. In CGO 2023 - Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (pp. 236–248). Association for Computing Machinery, Inc. https://doi.org/10.1145/3579990.3580015