Fine tuning matrix multiplications on multicore

Abstract

Multicore systems are becoming ubiquitous in scientific computing. As performance libraries are adapted to such systems, extracting the best performance from them remains difficult. Indeed, performance libraries such as Intel's MKL, while performing very well on unicore architectures, see their behavior degrade when used on multicore systems. Moreover, multicore systems differ widely among themselves (presence of shared caches, memory bandwidth, etc.). We propose a systematic method to improve the parallel execution of matrix multiplication, based on a study of the behavior of MKL's unicore DGEMM kernels as well as various other criteria. We show that our fine-tuning can outperform Intel's parallel DGEMM in MKL, with performance gains sometimes up to a factor of two. © 2008 Springer Berlin Heidelberg.

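The general idea described in the abstract, driving the parallel decomposition explicitly and relying on sequential DGEMM kernels for the per-core work, can be illustrated roughly as follows. This is a minimal sketch and not the authors' tuned code: it assumes a CBLAS-style interface (for example <cblas.h>, or <mkl.h> when using Intel MKL) plus OpenMP, and the function name blocked_dgemm, the square row-major matrices, and the 256-row panel size are illustrative choices rather than values taken from the paper.

```c
/* Illustrative sketch only: C = A*B parallelized over block-rows of A and C,
 * with each OpenMP thread calling a sequential DGEMM kernel on its panel.
 * The 256-row panel size is a placeholder, not a tuned value. */
#include <stdlib.h>
#include <cblas.h>   /* with Intel MKL, #include <mkl.h> exposes the same CBLAS calls */

void blocked_dgemm(int n, const double *A, const double *B, double *C)
{
    /* Square, row-major matrices for brevity: C = 1.0 * A * B + 0.0 * C */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i += 256) {
        int rows = (n - i < 256) ? (n - i) : 256;   /* last panel may be shorter */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    rows, n, n,
                    1.0, A + (size_t)i * n, n,      /* block-row panel of A */
                         B, n,                      /* all of B, shared     */
                    0.0, C + (size_t)i * n, n);     /* matching panel of C  */
    }
}
```

When linking against a threaded BLAS, the inner kernel would additionally be restricted to a single thread (with MKL, via mkl_set_num_threads(1)) so that the outer decomposition alone controls parallelism. Choosing the panel size and the mapping of panels to cores per machine (shared caches, memory bandwidth) is exactly the kind of fine-tuning the paper investigates.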

Citation

Zuckerman, S., Pérache, M., & Jalby, W. (2008). Fine tuning matrix multiplications on multicore. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5374 LNCS, pp. 30–41). Springer Verlag. https://doi.org/10.1007/978-3-540-89894-8_7
