This paper presents a design strategy for chiplet-based processing-in-memory systems targeting deep neural network applications. Monolithic silicon chips are limited in area and power and cannot keep pace with the rapid growth of deep learning algorithms. The paper first demonstrates a straightforward layer-wise method that partitions the workload of a monolithic accelerator across a multi-chiplet pipeline. A quantitative analysis shows that this straightforward partitioning degrades the overall utilization of computing resources, because the reduced on-chiplet memory size raises the memory wall. A tile interleaving strategy is proposed to overcome this degradation. The strategy can split a single layer across different chiplets, which maximizes computing utilization. The hardware modifications to the chiplet system needed to support the strategy are also discussed. To validate the proposed strategy, a nine-chiplet processing-in-memory system is evaluated with a custom-designed object detection network. Each chiplet achieves a peak performance of 204.8 GOPS at 100 MHz. The peak performance of the overall system is 1.711 TOPS, with no off-chip memory access needed. With the tile interleaving strategy, utilization improves from 53.9% to 92.8%.
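To put the quoted figures in perspective, the following is a minimal back-of-the-envelope sketch in Python. It uses only the numbers stated in the abstract. It assumes, as the abstract does not say so explicitly, that the two utilization figures are percentages of peak throughput, so that sustained throughput is peak throughput times utilization. The function and constant names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumption (not stated in the abstract): utilization is the fraction of
# peak throughput that is actually sustained.

PEAK_CHIPLET_GOPS = 204.8      # per-chiplet peak, from the abstract
CLOCK_MHZ = 100.0              # operating frequency, from the abstract
PEAK_SYSTEM_TOPS = 1.711       # system peak, from the abstract
UTIL_LAYER_WISE = 0.539        # utilization before tile interleaving
UTIL_TILE_INTERLEAVED = 0.928  # utilization with tile interleaving


def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations per clock cycle implied by a peak rate and a clock."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


def sustained_tops(peak_tops: float, utilization: float) -> float:
    """Sustained throughput under the utilization-as-fraction assumption."""
    return peak_tops * utilization


def relative_gain(util_after: float, util_before: float) -> float:
    """Ratio of sustained throughput after versus before."""
    return util_after / util_before


print(f"ops per cycle per chiplet: {ops_per_cycle(PEAK_CHIPLET_GOPS, CLOCK_MHZ):.0f}")
print(f"sustained, layer-wise:     {sustained_tops(PEAK_SYSTEM_TOPS, UTIL_LAYER_WISE):.3f} TOPS")
print(f"sustained, interleaved:    {sustained_tops(PEAK_SYSTEM_TOPS, UTIL_TILE_INTERLEAVED):.3f} TOPS")
print(f"relative gain:             {relative_gain(UTIL_TILE_INTERLEAVED, UTIL_LAYER_WISE):.2f}x")
```

Under these assumptions, each chiplet would issue about 2048 operations per cycle, and the utilization improvement would correspond to roughly 1.7 times the sustained throughput of the layer-wise mapping.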
CITATION STYLE
Jiao, B., Zhu, H., Zhang, J., Wang, S., Kang, X., Zhang, L., … Chen, C. (2021). Computing Utilization Enhancement for Chiplet-based Homogeneous Processing-in-Memory Deep Learning Processors. In Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI (pp. 241–246). Association for Computing Machinery. https://doi.org/10.1145/3453688.3461499