Deep Partitioned Training from Near-Storage Computing to DNN Accelerators

Yongjoo Jang; Sejin Kim; Daehoon Kim; Sungjin Lee; Jaeha Kung

Journal Article, Open Access

IEEE Computer Architecture Letters (2021) 20(1) 70-73

DOI: 10.1109/LCA.2021.3081752

Abstract

In this letter, we present deep partitioned training to accelerate the computations involved in training DNN models. This is the first work that partitions a DNN model across storage devices, an NPU, and a host CPU, which together form a unified compute node for training workloads. To validate the benefit of the proposed system during DNN training, a trace-based simulator or an FPGA prototype is used to estimate overall performance and to find the layer index at which partitioning yields the minimum latency. As a case study, we select two benchmarks: vision-related tasks and a recommendation system. Training time is reduced by 12.2 to 31.0 percent with four near-storage computing devices in the vision-related tasks (mini-batch size of 512) and by 40.6 to 44.7 percent with one near-storage computing device in the selected recommendation system (mini-batch size of 64).
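The search for the minimum-latency partition layer index mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' simulator or prototype. It assumes a simple sequential cost model in which the first `k` layers run on the near-storage devices, the remaining layers run on the NPU, and a transfer cost is paid once at the cut. The function name `best_partition`, the three latency lists, and the numbers in the usage note are hypothetical.

```python
from typing import Sequence, Tuple


def best_partition(
    nsc_latency: Sequence[float],
    npu_latency: Sequence[float],
    transfer_latency: Sequence[float],
) -> Tuple[int, float]:
    """Return (k, latency) for the cut index k with the lowest total latency.

    Assumed (hypothetical) cost model, not taken from the paper:
      - layers [0, k) run on the near-storage computing devices,
      - layers [k, n) run on the NPU,
      - transfer_latency[k] is the cost of moving data across the cut,
      - stages run back to back, with no overlap or pipelining.
    """
    n = len(nsc_latency)
    if len(npu_latency) != n or len(transfer_latency) != n + 1:
        raise ValueError("expected n per-layer latencies and n + 1 transfer costs")

    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        # Total latency if the model is cut just before layer k.
        total = sum(nsc_latency[:k]) + transfer_latency[k] + sum(npu_latency[k:])
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total
```

With the made-up per-layer latencies `[4.0, 6.0, 9.0]` (near-storage), `[2.0, 1.0, 1.0]` (NPU), and transfer costs `[10.0, 3.0, 5.0, 8.0]`, the sketch selects the cut at `k = 1`. In the letter, the per-layer costs come from a trace-based simulator or an FPGA prototype, not from a closed-form model like this one.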

Cite (APA)

Jang, Y., Kim, S., Kim, D., Lee, S., & Kung, J. (2021). Deep Partitioned Training from Near-Storage Computing to DNN Accelerators. IEEE Computer Architecture Letters, 20(1), 70–73. https://doi.org/10.1109/LCA.2021.3081752
