Although GPUs have become the mainstream platform for accelerating convolutional neural network (CNN) training, they usually have limited physical memory, which makes it hard to train large-scale CNN models. Many memory optimization methods have been proposed to reduce the memory consumption of CNNs and accommodate their growing scale; however, this optimization comes at the cost of a noticeable drop in time performance. We propose a new memory optimization strategy named Layup that achieves both better memory efficiency and better time performance. First, a fast layer-type-specific method for memory optimization is presented, based on the new finding that a single memory optimization often shows dramatic differences in time performance across different types of layers. Second, a new memory reuse method is presented that pays greater attention to multi-type intermediate data such as convolutional workspaces and cuDNN handle data. Experiments show that Layup can significantly increase the scale of extra-deep network models trainable on a single GPU with lower performance loss. It can even train a ResNet with 2,504 layers within 12 GB of memory, outperforming the state-of-the-art SuperNeurons, which handles 1,920 layers (batch size = 16).
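The two ideas summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the layer names, cost figures, and selection heuristics below are hypothetical, and they only suggest how a layer-type-specific planner (recompute cheap layers, offload expensive ones) and a shared buffer for multi-type intermediates such as convolution workspaces might look.

```python
# Illustrative sketch only; all names, costs, and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    kind: str                  # e.g. "conv", "relu", "pool"
    recompute_cost_ms: float   # time to recompute the activation in the backward pass
    transfer_cost_ms: float    # time to offload/prefetch the activation over PCIe
    workspace_bytes: int       # scratch space the layer needs (0 if none)

def pick_strategy(layer: Layer) -> str:
    """Layer-type-specific choice: layers that are nearly free to recompute are
    recomputed; otherwise the cheaper of recomputation and offloading is picked."""
    if layer.kind in ("relu", "pool"):
        return "recompute"
    if layer.recompute_cost_ms > layer.transfer_cost_ms:
        return "offload"
    return "recompute"

def shared_workspace_size(layers: list[Layer]) -> int:
    """Memory reuse for multi-type intermediates: since layers execute one at a
    time, a single buffer sized to the maximum requirement can serve all of them."""
    return max((l.workspace_bytes for l in layers), default=0)

if __name__ == "__main__":
    net = [
        Layer("conv1", "conv", recompute_cost_ms=9.0, transfer_cost_ms=4.0,
              workspace_bytes=64 << 20),
        Layer("relu1", "relu", recompute_cost_ms=0.2, transfer_cost_ms=4.0,
              workspace_bytes=0),
        Layer("conv2", "conv", recompute_cost_ms=12.0, transfer_cost_ms=5.0,
              workspace_bytes=96 << 20),
    ]
    for l in net:
        print(l.name, "->", pick_strategy(l))
    print("shared workspace:", shared_workspace_size(net) >> 20, "MiB")
```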
Jiang, W., Ma, Y., Liu, B., Liu, H., Zhou, B. B., Zhu, J., … Jin, H. (2019). Layup. ACM Transactions on Architecture and Code Optimization, 16(4), 1–23. https://doi.org/10.1145/3357238