Implementation and optimization of the OpenMP accelerator model for the TI Keystone II architecture

Gaurav Mitra; Eric Stotzer; Ajay Jayaraj; Alistair P. Rendell

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8766, pp. 202–214 (2014)

DOI: 10.1007/978-3-319-11454-5_15

Abstract

The TI Keystone II architecture combines ARM Cortex-A15 processors with high-performance TI C66x floating-point DSPs on a single low-power System-on-Chip (SoC). Commercially available systems such as the HP ProLiant m800 and nCore BrownDwarf are based on this ARM-DSP SoC. The Keystone II architecture promises to deliver high GFLOPS/Watt and is of increasing interest as an alternative building block for future exascale systems. However, the success of this architecture depends closely on how easily existing HPC applications can be migrated to it while still achieving maximum performance. Effective use of all ARM and DSP cores and of the DMA co-processors is crucial for maximizing performance/watt. This paper explores the issues and challenges encountered while migrating the matrix multiplication (GEMM) kernel, originally written only for the C6678 DSP, to the ARM-DSP SoC using an early prototype of the OpenMP 4.0 accelerator model. Single-precision (SGEMM) matrix multiplication performance of 110.11 GFLOPS and double-precision (DGEMM) performance of 29.15 GFLOPS were achieved on the TI Keystone II Evaluation Module Revision 3.0 (EVM). Trade-offs and factors affecting performance are discussed.

Citation (APA)

Mitra, G., Stotzer, E., Jayaraj, A., & Rendell, A. P. (2014). Implementation and optimization of the OpenMP accelerator model for the TI Keystone II architecture. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8766, 202–214. https://doi.org/10.1007/978-3-319-11454-5_15
