Convolutional neural networks (CNNs) are among the most successful deep architectures in machine learning. While they achieve superior recognition rates, the intensive computation of CNNs limits their applicability. In this paper, we propose a method based on separable filters to reduce the computational cost. Using the singular value decomposition (SVD), a 2D filter in a CNN can be approximated by the product of two 1D filters, so the 2D convolution can be computed as two consecutive 1D convolutions. We implemented a batched SVD routine on GPUs that computes the SVDs of many small matrices simultaneously, and three convolution methods that use different GPU memory spaces depending on the filter size. Compared with state-of-the-art GPU implementations of CNNs, experimental results show that our methods achieve up to 2.66 times speedup in the forward pass and up to 2.35 times speedup in the backward pass.
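The separable-filter idea the abstract describes can be illustrated with a minimal NumPy sketch (not the paper's GPU implementation; the helper `conv2d` and all names below are hypothetical). A rank-1 2D filter factors via the SVD into a column filter times a row filter, and applying the two 1D filters in sequence reproduces the 2D convolution:

```python
import numpy as np

def conv2d(img, k):
    """Valid-mode 2D correlation, written naively to illustrate separability."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# A separable filter built as an outer product is exactly rank 1.
v = np.array([1.0, 2.0, 1.0])
K = np.outer(v, v)                               # 3x3 rank-1 filter

# SVD: K = U @ diag(S) @ Vt; keep only the leading singular triple.
U, S, Vt = np.linalg.svd(K)
col = (U[:, 0] * np.sqrt(S[0])).reshape(-1, 1)   # 1D column filter
row = (Vt[0, :] * np.sqrt(S[0])).reshape(1, -1)  # 1D row filter

img = np.random.rand(8, 8)
full = conv2d(img, K)                            # one 2D convolution
sep = conv2d(conv2d(img, col), row)              # two consecutive 1D convolutions
assert np.allclose(full, sep)
```

For a k-by-k filter this replaces O(k^2) multiplies per output pixel with O(2k); for filters of rank greater than one, truncating the SVD to the leading singular values gives the approximation the paper exploits.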
Kang, H. P., & Lee, C. R. (2015). Improving performance of convolutional neural networks by separable filters on GPU. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9233, pp. 638–649). Springer Verlag. https://doi.org/10.1007/978-3-662-48096-0_49