Optimization and analysis of parallel back propagation neural network on GPU using CUDA

Abstract

A Graphics Processing Unit (GPU) can achieve remarkable performance for dataset-oriented applications such as the Back Propagation Network (BPN), given reasonable task decomposition and memory optimization. However, the advantages of the GPU's memory architecture have not yet been fully exploited in parallel BPN implementations. In this paper, we develop and analyze a parallel implementation of a back propagation neural network using CUDA. It focuses on kernel optimization through the use of shared memory and suitable block dimensions. The implementation was tested on seven well-known benchmark data sets, and the results show that promising speedups of 33.8x to 64.3x can be realized compared to a sequential implementation on a CPU.
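The abstract highlights shared memory and block-dimension tuning as the main kernel optimizations. The sketch below illustrates the general technique for a BPN forward pass: a tiled matrix multiply that stages input activations and weights in shared memory so each global value is read once per thread block. All names, the tile size, and the sigmoid activation are illustrative assumptions, not details taken from the paper.

```cuda
// Hypothetical sketch of a shared-memory-tiled forward-pass kernel for one
// BPN layer. TILE is an assumed block dimension; the paper tunes block
// dimensions per kernel.
#define TILE 16

// Computes out = sigmoid(in * W): in is (batch x nIn), W is (nIn x nOut),
// out is (batch x nOut), all row-major in global memory.
__global__ void forwardLayer(const float *in, const float *W, float *out,
                             int batch, int nIn, int nOut) {
    __shared__ float inTile[TILE][TILE];  // tile of input activations
    __shared__ float wTile[TILE][TILE];   // tile of weights

    int row = blockIdx.y * TILE + threadIdx.y;  // sample index
    int col = blockIdx.x * TILE + threadIdx.x;  // output-neuron index
    float acc = 0.0f;

    for (int t = 0; t < (nIn + TILE - 1) / TILE; ++t) {
        int k = t * TILE;
        // Stage one tile of each operand in shared memory (zero-pad edges).
        inTile[threadIdx.y][threadIdx.x] =
            (row < batch && k + threadIdx.x < nIn)
                ? in[row * nIn + (k + threadIdx.x)] : 0.0f;
        wTile[threadIdx.y][threadIdx.x] =
            (k + threadIdx.y < nIn && col < nOut)
                ? W[(k + threadIdx.y) * nOut + col] : 0.0f;
        __syncthreads();

        // Accumulate the partial dot product from shared memory.
        for (int i = 0; i < TILE; ++i)
            acc += inTile[threadIdx.y][i] * wTile[i][threadIdx.x];
        __syncthreads();
    }

    if (row < batch && col < nOut)
        out[row * nOut + col] = 1.0f / (1.0f + expf(-acc));  // sigmoid
}
```

A launch would pair this kernel with a 2D grid covering the (batch, nOut) output, e.g. `dim3 block(TILE, TILE); dim3 grid((nOut + TILE - 1) / TILE, (batch + TILE - 1) / TILE);`. The shared-memory staging is what converts nIn global reads per output element into roughly nIn / TILE per thread, which is the kind of memory optimization the abstract credits for the reported speedups.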

CITATION STYLE

APA

Wang, Y., Tang, P., An, H., Liu, Z., Wang, K., & Zhou, Y. (2015). Optimization and analysis of parallel back propagation neural network on GPU using CUDA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9491, pp. 156–163). Springer Verlag. https://doi.org/10.1007/978-3-319-26555-1_18
