BLAS3 optimization for the Godson-3B1500

0Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

This paper proposes a performance model for general matrix multiplication (GEMM) on decoupled access/execute (DAE) architecture platforms, in order to guide improvements of the GEMM performance in the Godson-3B1500. This model focuses on the features of access processors (APs) and execute processors (EPs). To reduce the synchronization overhead between APs and EPs, a synchronization module selection mechanism (SMSM) is presented. Furthermore, two optimized algorithms of GEMM for DAE platforms based on the performance model are proposed for ideal performance. In the proposed algorithms, the kernel functions are optimized with single instruction multiple data (SIMD) vector instructions, and the overhead of AP is almost overlapped with EP by taking full advantage of the features of the architecture. Moreover, the synchronization overhead can be reduced according to the SMSM. In the end, the proposed algorithms are tested on the Godson-3B1500. The experimental results demonstrate that the computing performance of dGEMM reaches 91.9% of the theoretical peak performance and that zGEMM can reach 93% of the theoretical peak performance.

Cite

CITATION STYLE

APA

Zhang, M., Gu, N., & Ren, K. (2016). BLAS3 optimization for the Godson-3B1500. SpringerPlus, 5(1). https://doi.org/10.1186/s40064-016-3690-3

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free