Optimizing OpenMP parallelized DGEMM calls on SGI Altix 3700

3Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Using functions of parallelized mathematical libraries is a common way to accelerate numerical applications. Computer architectures with shared memory characteristics support different approaches for the implementation of such libraries, usually OpenMP or MPI. This paper's content is based on the performance comparison of DGEMM calls (floating point matrix multiplication, double precision) with different OpenMP parallelized numerical libraries, namely Intel MKL and SGI SCSL, and how they can be optimized. Additionally, we have a look at the memory placement policy and give hints for initializing data. Our attention has been focused on a SGI Altix 3700 Bx2 system using BenchIT [1] as a very convenient performance measurement suite for the examinations. © Springer-Verlag Berlin Heidelberg 2006.

Cite

CITATION STYLE

APA

Hackenberg, D., Schöne, R., Nagel, W. E., & Pflüger, S. (2006). Optimizing OpenMP parallelized DGEMM calls on SGI Altix 3700. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4128 LNCS, pp. 145–154). Springer Verlag. https://doi.org/10.1007/11823285_15

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free