Many modern convolutional neural networks (CNNs) rely on bottleneck block structures in which the activation tensor is mapped between higher dimensions through an intermediate low dimension, and convolved with depthwise feature filters rather than multichannel filters. Because most of the computation lies in producing these large-dimensional tensors, however, such networks cannot be scaled up without significant computation costs.

In this paper, we show how fusing the layers inside these blocks can dramatically reduce the multiplication count (by 6–20×) at the cost of extra additions. ReLU nonlinearities are predicted dynamically, and only the activations that survive ReLU directly contribute to computing the output of the block. We also propose FusioNet, a CNN architecture optimized for fusion, as well as Archon, a novel accelerator design with a dataflow optimized for fused networks. When FusioNet is executed on the proposed accelerator, it yields up to 5.8× faster inference compared to compact networks executed on a dense DNN accelerator, and 2.1× faster inference compared to the same networks when pruned and executed on a sparse DNN accelerator.
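To make the block structure and its cost concrete, here is a minimal sketch in PyTorch with a rough per-pixel multiply count. The MobileNetV2-style layer ordering and the channel widths (16 → 96 → 16) are illustrative assumptions, not details taken from the paper; the sketch only shows why the pointwise (1×1) layers that touch the high-dimensional tensor dominate the multiplication count that layer fusion targets.

    # Illustrative sketch only (MobileNetV2-style widths assumed), not the paper's code.
    import torch
    import torch.nn as nn

    class BottleneckBlock(nn.Module):
        """Two pointwise (1x1) convolutions around a depthwise 3x3 convolution."""
        def __init__(self, c_low=16, c_high=96):
            super().__init__()
            self.expand = nn.Conv2d(c_low, c_high, kernel_size=1, bias=False)   # low -> high dim
            self.depthwise = nn.Conv2d(c_high, c_high, kernel_size=3, padding=1,
                                       groups=c_high, bias=False)               # per-channel filters
            self.project = nn.Conv2d(c_high, c_low, kernel_size=1, bias=False)  # high -> low dim
            self.relu = nn.ReLU()

        def forward(self, x):
            return self.project(self.relu(self.depthwise(self.relu(self.expand(x)))))

    x = torch.randn(1, 16, 32, 32)
    y = BottleneckBlock()(x)                        # output shape: (1, 16, 32, 32)

    # Multiplies per spatial position (under the assumed widths): the pointwise
    # layers, which read or write the high-dimensional tensor, dominate.
    c_low, c_high, k = 16, 96, 3
    pointwise = c_low * c_high + c_high * c_low     # 1x1 expand + 1x1 project = 3072
    depthwise = c_high * k * k                      # depthwise 3x3            =  864
    print(pointwise, depthwise)

Fusing the layers inside such a block trades many of these pointwise multiplications for additions, and the dynamic ReLU prediction described above determines which intermediate activations survive and therefore need to be computed at all.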
CITATION
Olyaiy, M. H., Ng, C., & Lis, M. (2021). Accelerating DNNs inference with predictive layer fusion. In Proceedings of the International Conference on Supercomputing (pp. 291–303). Association for Computing Machinery. https://doi.org/10.1145/3447818.3460378