On-the-Fly Lowering Engine: Offloading Data Layout Conversion for Convolutional Neural Networks

Abstract

Many deep learning frameworks use GEneral Matrix Multiplication (GEMM)-based convolution to accelerate CNN execution. GEMM-based convolution is fast, but it requires a data layout conversion called lowering (i.e., im2col), which incurs significant memory overhead and diminishes performance. This paper proposes a novel hardware mechanism, the On-the-fly Lowering Engine (OLE), to eliminate the lowering overheads by offloading them from GEMM-based convolution. With OLE, the lowered matrix is neither pre-computed nor stored in main memory. Instead, a hardware engine generates the lowered matrix on the fly from the original input, reducing both the memory footprint and the bandwidth requirement. Furthermore, hardware offloading frees the CPU cycles spent on lowering and overlaps lowering with computation to hide its latency. Our evaluation shows that OLE reduces the memory footprint of convolutional-layer inputs by up to 12.5× (i.e., to 1/12.5 of the original size) and the overall memory footprint by up to 33.5%. Moreover, OLE reduces the execution time of convolutional layers by 57.7% on average, yielding an average speedup of 2.3× for representative CNN models.
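To make the lowering step concrete, the sketch below shows a plain NumPy im2col followed by a single GEMM. The function names and shapes are illustrative assumptions, not the paper's implementation; this is the software step whose duplication overhead OLE is designed to offload into hardware.

    # Minimal sketch of im2col-based (GEMM) convolution in NumPy.
    # Illustrative only: names/shapes are assumptions, not the paper's OLE.
    import numpy as np

    def im2col(x, kh, kw, stride=1):
        """Lower an input (C, H, W) into a (C*kh*kw, out_h*out_w) matrix,
        duplicating overlapping receptive fields column by column."""
        c, h, w = x.shape
        out_h = (h - kh) // stride + 1
        out_w = (w - kw) // stride + 1
        cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
        col = 0
        for i in range(0, h - kh + 1, stride):
            for j in range(0, w - kw + 1, stride):
                cols[:, col] = x[:, i:i + kh, j:j + kw].ravel()
                col += 1
        return cols

    def conv2d_gemm(x, weights, stride=1):
        """Convolution as one GEMM: weights (K, C, kh, kw) are flattened
        to (K, C*kh*kw) and multiplied by the lowered input matrix."""
        k, c, kh, kw = weights.shape
        cols = im2col(x, kh, kw, stride)      # the lowered matrix
        out = weights.reshape(k, -1) @ cols   # the GEMM
        out_h = (x.shape[1] - kh) // stride + 1
        out_w = (x.shape[2] - kw) // stride + 1
        return out.reshape(k, out_h, out_w)

    # A 3x3 kernel with stride 1 duplicates each input element up to 9 times
    # in the lowered matrix -- the memory blow-up that OLE avoids by
    # generating these columns on the fly instead of materializing them.
    x = np.random.rand(3, 8, 8).astype(np.float32)
    w = np.random.rand(16, 3, 3, 3).astype(np.float32)
    y = conv2d_gemm(x, w)
    print(y.shape)  # (16, 6, 6)

In this sketch the lowered matrix `cols` is materialized in memory before the GEMM; OLE's premise is that this matrix never needs to exist in DRAM at all.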


Citation (APA)

Kang, M., Hyun, S., Han, T. H., Kim, J., & Hong, S. (2022). On-the-Fly Lowering Engine: Offloading Data Layout Conversion for Convolutional Neural Networks. IEEE Access, 10, 79730–79746. https://doi.org/10.1109/ACCESS.2022.3192618
