A layer-wise frequency scaling for a neural processing unit

Abstract

Dynamic voltage and frequency scaling (DVFS) has been widely adopted for run-time power management of various processing units. For neural processing units (NPUs), power management of neural network applications must adjust the frequency and voltage for every layer, because each layer has its own power behavior and performance. Unfortunately, DVFS is ill-suited to layer-wise run-time power management of NPUs because the latency of voltage scaling is long compared with the execution time of each layer. Frequency scaling, in contrast, is fast enough to keep up with each layer, so we propose a layer-wise dynamic frequency scaling (DFS) technique for an NPU. For each layer, the proposed DFS selects the highest frequency that keeps the NPU within its power limit. To determine the highest allowable frequency, we build a power model that predicts the power consumption of the NPU, based on real measurements of the fabricated NPU. Our evaluation results show that the proposed DFS improves frames per second (FPS) by 33% and saves energy by 14% on average, compared with DVFS.
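
The selection rule described in the abstract (for each layer, pick the highest frequency whose predicted power stays under the limit) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the frequency list, the power limit, and the toy power model (static power plus a dynamic term that grows linearly with frequency at a fixed voltage) are all assumptions made for illustration; the paper's model is fitted to measurements on the fabricated NPU and its form is not given here.

```python
from typing import Callable, Dict, Sequence


def toy_power_model(activity: float, freq_mhz: int) -> float:
    """Hypothetical power model in watts (NOT the paper's fitted model).

    Assumes a fixed voltage, so dynamic power scales linearly with frequency.
    Both coefficients below are made-up illustrative values.
    """
    static_w = 0.5            # assumed static (leakage) power
    dyn_w_per_mhz = 0.004     # assumed dynamic power per MHz at activity 1.0
    return static_w + dyn_w_per_mhz * activity * freq_mhz


def pick_frequency(
    activity: float,
    freqs_mhz: Sequence[int],
    power_limit_w: float,
    predict_power_w: Callable[[float, int], float] = toy_power_model,
) -> int:
    """Return the highest frequency whose predicted power is within the limit.

    Falls back to the lowest available frequency if none is feasible.
    """
    feasible = [f for f in freqs_mhz
                if predict_power_w(activity, f) <= power_limit_w]
    return max(feasible) if feasible else min(freqs_mhz)


def layerwise_schedule(
    layer_activity: Dict[str, float],
    freqs_mhz: Sequence[int],
    power_limit_w: float,
) -> Dict[str, int]:
    """Choose one frequency per layer, independently of the other layers."""
    return {name: pick_frequency(act, freqs_mhz, power_limit_w)
            for name, act in layer_activity.items()}


if __name__ == "__main__":
    # Hypothetical per-layer activity factors for a three-layer network.
    layers = {"conv1": 1.0, "conv2": 0.5, "fc": 0.4}
    print(layerwise_schedule(layers, [200, 400, 600, 800], power_limit_w=2.0))
```

In this sketch a power-hungry layer is held at a low frequency while lighter layers run faster, which is the intuition behind scaling frequency per layer rather than fixing one operating point for the whole network.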

Cite

Chung, J., Kim, H. M., Shin, K., Lyuh, C. G., Cho, Y. C. P., Han, J., … Chung, S. W. (2022). A layer-wise frequency scaling for a neural processing unit. ETRI Journal, 44(5), 849–858. https://doi.org/10.4218/etrij.2022-0094
