Deep learning applications are compute-intensive and naturally parallel; this has spurred the development of new processor architectures tuned to the workload. In this paper, we consider structural differences between deep learning neural networks and more conventional circuits, highlighting how these differences impact strategies for mapping neural network compute kernels onto available hardware. We present an efficient mapping approach based on dynamic programming, as well as a method to establish performance bounds. We also propose an architectural approach that extends the practical lifetime of hardware accelerators, enabling the integration of a variety of heterogeneous processors into a high-performance system. Experimental results using benchmarks from a recent ISPD contest are also reported.
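To make the dynamic-programming idea concrete, the sketch below shows one common way such a formulation can look: a linear chain of kernels is assigned to processors so that total compute plus inter-processor transfer time is minimized. This is a minimal illustration under assumed cost tables (compute_cost, transfer_cost) and a simplified chain-structured model; it is not the formulation used in the paper.

```python
# Illustrative DP for mapping a chain of kernels onto heterogeneous processors.
# Assumptions (not from the paper): kernels form a linear chain, compute_cost[k][p]
# is the time to run kernel k on processor p, and transfer_cost[p][q] is the time
# to move intermediate data from processor p to processor q.

def map_kernels(compute_cost, transfer_cost):
    """Return (total_cost, assignment) minimizing summed latency."""
    n = len(compute_cost)            # number of kernels in the chain
    num_proc = len(compute_cost[0])  # number of candidate processors
    INF = float("inf")

    # dp[p] = best cost of scheduling kernels 0..k with kernel k placed on processor p
    dp = [compute_cost[0][p] for p in range(num_proc)]
    parent = [[None] * num_proc for _ in range(n)]

    for k in range(1, n):
        new_dp = [INF] * num_proc
        for p in range(num_proc):
            for q in range(num_proc):  # processor that ran the previous kernel
                cand = dp[q] + transfer_cost[q][p] + compute_cost[k][p]
                if cand < new_dp[p]:
                    new_dp[p] = cand
                    parent[k][p] = q
        dp = new_dp

    # Recover the processor assignment by walking parent pointers backwards.
    best_p = min(range(num_proc), key=lambda p: dp[p])
    assignment = [best_p]
    for k in range(n - 1, 0, -1):
        best_p = parent[k][best_p]
        assignment.append(best_p)
    assignment.reverse()
    return min(dp), assignment


if __name__ == "__main__":
    # Two kernels, two processors: each kernel is cheap on a different processor,
    # but moving data between processors costs 2 time units.
    compute = [[1, 5], [5, 1]]
    transfer = [[0, 2], [2, 0]]
    print(map_kernels(compute, transfer))  # -> (4, [0, 1])
```

The table has one entry per (kernel, processor) pair, so the sketch runs in time proportional to the number of kernels times the square of the number of processors; richer models (e.g., memory limits or non-chain dataflow graphs) would need a larger state space.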
Özdemir, S., Khasawneh, M., Rao, S., & Madden, P. H. (2022). Kernel Mapping Techniques for Deep Learning Neural Network Accelerators. In Proceedings of the International Symposium on Physical Design (pp. 21–28). Association for Computing Machinery. https://doi.org/10.1145/3505170.3506730