Abstract
Package-level integration using multi-chip modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically contain only a handful of large, coarse-grained chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication. This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep learning inference, an application domain with large compute and on-chip storage requirements. To evaluate the approach, we architected, implemented, fabricated, and tested Simba, a 36-chiplet prototype MCM system for deep learning inference. Each chiplet achieves 4 TOPS peak performance, and the 36-chiplet MCM package achieves up to 128 TOPS and up to 6.1 TOPS/W. The MCM is configurable to support a flexible mapping of DNN layers to the distributed compute and storage units. To mitigate inter-chiplet communication overheads, we introduce three tiling optimizations that improve data locality. These optimizations achieve up to a 16% speedup over the baseline layer mapping. Our evaluation shows that Simba can process 1988 images/s running ResNet-50 with a batch size of one, delivering an inference latency of 0.50 ms.
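As a quick sanity check on the quoted figures, the short Python sketch below reproduces the batch-1 latency arithmetic (1000 / 1988 images/s is roughly 0.50 ms) and illustrates, with a hypothetical even split of a layer's output channels across the 36 chiplets, the kind of spatial partitioning that a chiplet-level mapping implies. The split_channels helper is an illustrative assumption for intuition only, not Simba's actual mapping or tiling algorithm.

    # Back-of-envelope check of the figures quoted in the abstract.
    # The peak-performance and throughput numbers are the published ones;
    # the channel partition below is purely illustrative, NOT Simba's mapping.

    CHIPLETS = 36
    PEAK_TOPS_PER_CHIPLET = 4      # per-chiplet peak, as reported
    PEAK_TOPS_PACKAGE = 128        # package-level peak, as reported

    # Batch-1 ResNet-50 throughput quoted in the abstract.
    images_per_sec = 1988
    latency_ms = 1000.0 / images_per_sec
    print(f"Batch-1 latency: {latency_ms:.2f} ms")  # -> 0.50 ms

    # Hypothetical spatial partition: split a layer's output channels
    # as evenly as possible across chiplets (for intuition only).
    def split_channels(num_output_channels: int, num_chiplets: int = CHIPLETS):
        base, extra = divmod(num_output_channels, num_chiplets)
        return [base + (1 if i < extra else 0) for i in range(num_chiplets)]

    print(split_channels(512)[:6])  # -> [15, 15, 15, 15, 15, 15]; later chiplets get 14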
Citation
Shao, Y. S., Clemons, J., Venkatesan, R., Zimmer, B., Fojtik, M., Jiang, N., … Keckler, S. W. (2021). Simba: Scaling deep-learning inference with chiplet-based architecture. Communications of the ACM, 64, 107–116. https://doi.org/10.1145/3460227