To meet deep learning's growing demand for computing resources, ASIC-based accelerators are necessary. A wafer-scale engine (WSE) was recently proposed [1], which can simultaneously accelerate multiple layers of a neural network (NN). However, without a high-quality placement that properly maps NN layers onto the WSE, the promised acceleration cannot be achieved. The WSE placement resembles the traditional ASIC floorplan problem of placing blocks onto a chip region, but the two are fundamentally different. Since the slowest layer determines the compute time of the whole NN on the WSE, a layer with a heavier workload needs more computing resources. In addition, the locations of layers and the protocol-adapter cost of internal I/O connections influence inter-layer communication overhead. In this paper, we propose GigaPlacer to handle this new challenge. A binary-search-based framework is developed to obtain the minimum compute time of the NN. Two dynamic-programming-based algorithms with different optimization strategies are integrated to produce a legal placement. The distance and adapter cost between connected layers are further minimized by refinement steps. Compared with the first-place winner of the ISPD 2020 Contest, GigaPlacer reduces the contest metric by up to 6.89% (2.09% on average) while running 7.23x faster.
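The binary-search framework mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: it assumes a toy cost model in which a layer's compute time is its workload divided by the number of tiles assigned to it, and it replaces the paper's dynamic-programming legalizer with a trivial feasibility check (total tile demand must fit on the wafer). All names and parameters here are hypothetical.

```python
import math

def feasible(workloads, total_tiles, T):
    """Check whether every layer can finish within time T.
    Under the toy model time = workload / tiles, a layer with
    workload w needs ceil(w / T) tiles; the placement is feasible
    if the summed demand fits on the wafer. (The real GigaPlacer
    uses dynamic-programming-based legalization instead.)"""
    need = sum(math.ceil(w / T) for w in workloads)
    return need <= total_tiles

def min_compute_time(workloads, total_tiles, iters=60):
    """Binary-search the minimum achievable compute time: the slowest
    layer bounds the whole NN, so we search for the smallest T for
    which a feasible resource assignment exists."""
    lo = max(workloads) / total_tiles  # even the heaviest layer alone can't beat this
    hi = max(workloads)                # one tile per layer bounds the time from above
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(workloads, total_tiles, mid):
            hi = mid  # mid is achievable; try a smaller target
        else:
            lo = mid  # mid is too aggressive; relax the target
    return hi

# Hypothetical example: three layers, four tiles.
# The heavy layer (workload 100) gets 2 tiles, the others 1 each,
# so the best achievable per-layer time is 100 / 2 = 50.
print(min_compute_time([100, 50, 50], total_tiles=4))
```

Monotonicity is what makes the binary search valid: if a target time T is feasible, any T' > T is also feasible, so the feasible set is an interval and the search converges to its lower boundary.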
CITATION STYLE
Li, B., Du, Q., Liu, D., Zhang, J., Chen, G., & You, H. (2021). Placement for Wafer-Scale Deep Learning Accelerator. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 665–670). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3394885.3431563