A proposal for enhancing training speed in deep learning models based on memory activity survey

Abstract

The Deep Learning (DL) training process involves intensive computations that require a large number of memory accesses. Many surveys have examined memory behavior during DL training, using well-known profiling tools or improving existing tools to monitor the training process. This paper presents a new profiling approach based on a cooperative software and hardware solution. The idea is to use Field-Programmable Gate Array (FPGA) memory as the main memory for DL training processes on a computer, so that memory behavior can be monitored and evaluated from both the software and hardware points of view. The most common DL models are selected for the tests, including ResNet, VGG, AlexNet, and GoogLeNet, and the CIFAR-10 dataset is chosen as the training database. The experimental results show that the ratio between read and write transactions is roughly 3 to 1. The requested allocations vary from 2 B to 64 MB, with the most frequently requested sizes being approximately 16 KB to 64 KB. Based on these statistics, a suggestion is made to improve training speed by using an L4 cache in front of the Double Data Rate (DDR) memory. It is demonstrated that the recommended L4 cache configuration can improve DDR performance by about 15% to 18%.
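The benefit of the suggested L4 cache can be illustrated with a simple average-memory-access-time (AMAT) model. The sketch below is not taken from the paper; the latency and hit-rate values are purely hypothetical assumptions, chosen only to show how an L4 cache between the last-level cache and DDR could produce an improvement in the reported 15% to 18% range.

```python
# Minimal sketch (not from the paper): estimate DDR speedup from adding an
# L4 cache, using a simple average-memory-access-time (AMAT) model.
# All latency and hit-rate values are illustrative assumptions, not
# measurements reported by the authors.

def amat(hit_rate: float, cache_latency_ns: float, ddr_latency_ns: float) -> float:
    """Average memory access time with a single cache level in front of DDR."""
    return hit_rate * cache_latency_ns + (1.0 - hit_rate) * ddr_latency_ns

# Hypothetical parameters:
DDR_LATENCY_NS = 80.0   # assumed raw DDR access latency
L4_LATENCY_NS = 40.0    # assumed L4 cache access latency
L4_HIT_RATE = 0.35      # assumed hit rate for the dominant 16 KB-64 KB accesses

baseline = DDR_LATENCY_NS
with_l4 = amat(L4_HIT_RATE, L4_LATENCY_NS, DDR_LATENCY_NS)
improvement = (baseline - with_l4) / baseline

print(f"AMAT without L4 cache: {baseline:.1f} ns")
print(f"AMAT with L4 cache:    {with_l4:.1f} ns")
print(f"Estimated improvement: {improvement:.1%}")  # ~17.5% under these assumptions
```

Under these assumed numbers the model yields an improvement of roughly 17%, which is consistent in magnitude with the 15% to 18% gain reported in the abstract; the actual figures depend on the cache configuration and workload measured by the authors.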

Citation (APA)

Kiet, D. T., Kieu-Do-Nguyen, B., Hoang, T. T., Nguyen, K. D., Tran, X. T., & Pham, C. K. (2021). A proposal for enhancing training speed in deep learning models based on memory activity survey. IEICE Electronics Express, 18(15), 1–6. https://doi.org/10.1587/elex.18.20210252
