Deep Partitioned Training from Near-Storage Computing to DNN Accelerators


Abstract

In this letter, we present deep partitioned training to accelerate the computations involved in training DNN models. This is the first work to partition a DNN model across storage devices, an NPU, and a host CPU, forming a unified compute node for training workloads. To validate the benefit of the proposed system during DNN training, a trace-based simulator or an FPGA prototype is used to estimate the overall performance and to find the layer index at which partitioning yields the minimum latency. As a case study, we select two benchmarks: vision-related tasks and a recommendation system. As a result, the training time is reduced by 12.2∼31.0 percent with four near-storage computing devices on the vision-related tasks (mini-batch size of 512) and by 40.6∼44.7 percent with one near-storage computing device on the selected recommendation system (mini-batch size of 64).
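
The partitioning decision described above reduces to a simple search: estimate the per-iteration training latency for every candidate layer split between the near-storage device(s) and the NPU/host side, then keep the split with the minimum estimate. The Python sketch below illustrates that search only; the cost model, helper names (estimate_latency, best_partition), and example numbers are assumptions for illustration, not the authors' trace-based simulator or FPGA prototype.

def estimate_latency(partition_idx, layer_costs, transfer_costs,
                     near_storage_speed, accelerator_speed):
    """Estimate one training-iteration latency when layers [0, partition_idx)
    run on the near-storage device(s) and the remaining layers on the NPU/host."""
    near_storage_time = sum(layer_costs[:partition_idx]) / near_storage_speed
    accelerator_time = sum(layer_costs[partition_idx:]) / accelerator_speed
    # Activations crossing the partition boundary must be transferred between devices.
    comm_time = transfer_costs[partition_idx]
    return near_storage_time + accelerator_time + comm_time


def best_partition(layer_costs, transfer_costs, near_storage_speed, accelerator_speed):
    """Return (layer index, latency) of the split with the minimum estimated latency."""
    candidates = range(len(layer_costs) + 1)  # split before layer 0 .. after the last layer
    return min(
        ((idx, estimate_latency(idx, layer_costs, transfer_costs,
                                near_storage_speed, accelerator_speed))
         for idx in candidates),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Made-up per-layer compute costs (arbitrary units) and per-boundary transfer costs.
    layer_costs = [4.0, 6.0, 8.0, 10.0, 3.0]
    transfer_costs = [5.0, 2.0, 1.5, 1.0, 0.8, 0.2]  # one entry per candidate boundary
    idx, latency = best_partition(layer_costs, transfer_costs,
                                  near_storage_speed=1.0, accelerator_speed=4.0)
    print(f"partition at layer {idx}, estimated latency {latency:.2f}")

In practice, the paper obtains the per-layer and transfer estimates from its trace-based simulation or FPGA measurements rather than from a fixed analytical model like the one sketched here.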

Citation (APA)

Jang, Y., Kim, S., Kim, D., Lee, S., & Kung, J. (2021). Deep Partitioned Training from Near-Storage Computing to DNN Accelerators. IEEE Computer Architecture Letters, 20(1), 70–73. https://doi.org/10.1109/LCA.2021.3081752
