Tensor-based CUDA optimization for ANN inferencing using parallel acceleration on embedded GPU

Ahmed Khamis Abdullah Al Ghadani; Waleeja Mateen; Rameshkumar G. Ramaswamy

Conference Proceedings, Open Access

IFIP Advances in Information and Communication Technology (2020), Vol. 583 IFIP, pp. 291–302

DOI: 10.1007/978-3-030-49161-1_25

Abstract

With image processing, robots have acquired visual perception skills, enabling them to become autonomous. Since the emergence of Artificial Intelligence (AI), sophisticated tasks such as object identification have become possible through inferencing with Artificial Neural Networks (ANNs). However, Autonomous Mobile Robots (AMRs) are Embedded Systems (ESs) with limited on-board resources, so efficient ANN inferencing techniques are required to achieve real-time performance. This paper presents the process of optimizing ANN inferencing using tensor-based optimization on an embedded Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA) platform for parallel acceleration on ESs. This research evaluates a well-known network, You-Only-Look-Once (YOLO), on the NVIDIA Jetson TX2 System-on-Module (SOM). The findings show a significant improvement in inferencing speed, measured in Frames-Per-Second (FPS), of up to 3.5 times the non-optimized inferencing speed. Furthermore, the current CUDA model and TensorRT optimization techniques are studied, their implementation for inferencing is discussed, and improvements are proposed based on the results acquired. These findings will assist ES developers, and industry will benefit from real-time inferencing performance in AMR automation solutions.
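The abstract reports inferencing speed as FPS and states the gain as a ratio to the non-optimized baseline. The sketch below shows one way such a comparison could be measured, assuming the model is wrapped in a plain Python callable that processes one frame per call. The names `measure_fps` and `speedup` are hypothetical helpers for illustration; this is not the authors' benchmark harness and it makes no TensorRT or CUDA calls.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object],
                frames: Sequence[object],
                warmup: int = 5) -> float:
    """Hypothetical helper: average frames per second of `infer` over `frames`.

    The first `warmup` frames are run once and excluded from timing, since
    early GPU inferences usually include one-off setup cost (assumption).
    """
    if not frames:
        raise ValueError("need at least one frame to time")
    for frame in frames[:warmup]:
        infer(frame)                      # warm-up calls, not timed
    start = time.perf_counter()
    for frame in frames:
        infer(frame)                      # one inference per frame
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def speedup(baseline_fps: float, optimized_fps: float) -> float:
    """Hypothetical helper: optimized throughput relative to the baseline."""
    return optimized_fps / baseline_fps
```

In use, `measure_fps` would be called once with the non-optimized model and once with the optimized one on the same frames; `speedup(10.0, 35.0)` returns `3.5`, the form in which the abstract states its result. Timing whole frames end to end, rather than individual layers, keeps the metric comparable with the FPS figure reported above.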

Citation (APA)
Al Ghadani, A. K. A., Mateen, W., & Ramaswamy, R. G. (2020). Tensor-based CUDA optimization for ANN inferencing using parallel acceleration on embedded GPU. In IFIP Advances in Information and Communication Technology (Vol. 583 IFIP, pp. 291–302). Springer. https://doi.org/10.1007/978-3-030-49161-1_25