Design space exploration for layer-parallel execution of convolutional neural networks on CGRAs

Abstract

In this work, we systematically explore the design space of throughput, energy, and hardware cost for layer-parallel mappings of Convolutional Neural Networks (CNNs) onto coarse-grained reconfigurable arrays (CGRAs). We derive an analytical model that computes the resources (processing elements) and buffer memory required to sustain a given throughput T, and thus the hardware cost C, as well as the resulting overall energy consumption E for inference. Further, we propose an efficient design space exploration (DSE) to determine the fronts of Pareto-optimal (T, E, C) solutions. This exploration helps to determine the scalability limits of the presented tiled CGRA accelerator architectures in terms of throughput, the number of layers that can be processed in parallel, and memory requirements. Finally, we evaluate the energy savings achievable on our architecture in comparison to implementations that execute a CNN sequentially, layer by layer. Experiments show that, for MobileNet, layer-parallel processing reduces energy consumption E by 3.6×, reduces hardware cost C by 1.2×, and increases the achievable throughput T by 6.2×.
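To make the notion of a Pareto-optimal (T, E, C) front concrete, the following minimal sketch filters a list of candidate design points, treating throughput T as an objective to maximize and energy E and cost C as objectives to minimize. It is a brute-force illustration only and is not the DSE method proposed in the paper; the names `DesignPoint`, `dominates`, and `pareto_front`, and all numeric values, are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    """One candidate mapping (hypothetical container, not from the paper)."""
    throughput: float  # T, higher is better
    energy: float      # E, lower is better
    cost: float        # C, lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """Return True if a is at least as good as b in all objectives and strictly better in one."""
    no_worse = (a.throughput >= b.throughput
                and a.energy <= b.energy
                and a.cost <= b.cost)
    strictly_better = (a.throughput > b.throughput
                       or a.energy < b.energy
                       or a.cost < b.cost)
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep every point that no other point dominates (O(n^2) brute force)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up example values, not measurements from the paper.
candidates = [
    DesignPoint(throughput=10.0, energy=5.0, cost=3.0),
    DesignPoint(throughput=12.0, energy=6.0, cost=3.0),
    DesignPoint(throughput=8.0, energy=6.0, cost=4.0),   # dominated by the first point
    DesignPoint(throughput=12.0, energy=6.0, cost=5.0),  # dominated by the second point
]
front = pareto_front(candidates)
```

Here the first two candidates remain on the front because each trades throughput against energy, while the last two are discarded because another candidate is at least as good in every objective.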

Cite

Heidorn, C., Hannig, F., & Teich, J. (2020). Design space exploration for layer-parallel execution of convolutional neural networks on CGRAs. In Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems, SCOPES 2020 (pp. 26–31). Association for Computing Machinery, Inc. https://doi.org/10.1145/3378678.3391878
