Graphics Processing Units (GPUs) can achieve remarkable performance on dataset-oriented applications such as the Back Propagation Network (BPN), given a reasonable task decomposition and memory optimization. However, the advantages of the GPU's memory architecture have not yet been fully exploited for parallelizing BPN. In this paper, we develop and analyze a parallel implementation of a back propagation neural network using CUDA, focusing on kernel optimization through the use of shared memory and suitable block dimensions. The implementation was tested on seven well-known benchmark data sets, and the results show that speedups of 33.8x to 64.3x can be realized compared to a sequential implementation on a CPU.
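The abstract does not reproduce the paper's kernels, but the shared-memory optimization it refers to can be sketched as follows. This is a minimal illustrative CUDA kernel for one fully connected layer's forward pass, y = sigmoid(W·x), in which the input vector is staged in shared memory so every thread in a block reuses it; the names (`TILE`, `forwardLayer`) and the launch assumption `blockDim.x == TILE` are assumptions of this sketch, not details from the paper.

```cuda
// Sketch only: one BPN layer forward pass with the input vector
// tiled through shared memory. Assumes the kernel is launched with
// blockDim.x == TILE, one output neuron per thread.
#define TILE 256

__global__ void forwardLayer(const float *W, const float *x,
                             float *y, int nIn, int nOut)
{
    __shared__ float xs[TILE];                        // tile of the input vector
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // output neuron index
    float acc = 0.0f;

    // Walk the input in TILE-sized chunks, loading each chunk
    // cooperatively into shared memory before all threads use it.
    for (int base = 0; base < nIn; base += TILE) {
        int i = base + threadIdx.x;
        xs[threadIdx.x] = (i < nIn) ? x[i] : 0.0f;
        __syncthreads();                              // tile fully loaded

        if (row < nOut) {
            int lim = min(TILE, nIn - base);
            for (int k = 0; k < lim; ++k)
                acc += W[row * nIn + base + k] * xs[k];
        }
        __syncthreads();                              // safe to overwrite tile
    }

    if (row < nOut)
        y[row] = 1.0f / (1.0f + expf(-acc));          // sigmoid activation
}
```

Choosing the block dimension (here `TILE`) trades shared-memory footprint against occupancy, which is the kind of tuning the paper's kernel analysis addresses.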
Wang, Y., Tang, P., An, H., Liu, Z., Wang, K., & Zhou, Y. (2015). Optimization and analysis of parallel back propagation neural network on GPU using CUDA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9491, pp. 156–163). Springer Verlag. https://doi.org/10.1007/978-3-319-26555-1_18