In this work, we systematically explore the design space of throughput, energy, and hardware costs for layer-parallel mappings of Convolutional Neural Networks (CNNs) onto coarse-grained reconfigurable arrays (CGRAs). We derive an analytical model that computes the required resources (processing elements) and buffer memory and thus hardware cost C to sustain a given throughput T as well as the resulting overall energy consumption E for inference. Further, we propose an efficient design space exploration (DSE) to determine the fronts of Pareto-optimal (T,E,C) solutions. This exploration helps to determine the limits of scalability of the presented tiled CGRA accelerator architectures in terms of throughput, the number of parallel layers that can be simultaneously processed, and memory requirements. Finally, we provide an evaluation of energy savings achievable on our architecture in comparison to implementations that execute sequentially a CNN layer-by-layer. In experiments, it is shown that layer-parallel processing is able to reduce energy consumption E by 3.6X, hardware cost C by 1.2X, and increase the achievable throughput T by 6.2X for MobileNet.
CITATION STYLE
Heidorn, C., Hannig, F., & Teich, J. (2020). Design space exploration for layer-parallel execution of convolutional neural networks on CGRAs. In Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems, SCOPES 2020 (pp. 26–31). Association for Computing Machinery, Inc. https://doi.org/10.1145/3378678.3391878
Mendeley helps you to discover research relevant for your work.