Fine tuning matrix multiplications on multicore

Abstract

Multicore systems are becoming ubiquitous in scientific computing. As performance libraries are adapted to such systems, extracting the best performance from them remains difficult. Indeed, performance libraries such as Intel's MKL, while performing very well on unicore architectures, see their behavior degrade when used on multicore systems. Moreover, multicore systems differ widely among themselves (presence of shared caches, memory bandwidth, etc.). We propose a systematic method to improve the parallel execution of matrix multiplication, based on a study of the behavior of MKL's unicore DGEMM kernels as well as various other criteria. We show that our fine-tuning can outperform Intel's parallel DGEMM in MKL, with performance gains sometimes up to a factor of two. © 2008 Springer Berlin Heidelberg.

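The general idea described in the abstract, driving the parallel decomposition explicitly and relying on sequential DGEMM kernels for the per-core work, can be illustrated roughly as follows. This is a minimal sketch and not the authors' tuned code: it assumes a CBLAS-style interface (for example <cblas.h>, or <mkl.h> when using Intel MKL) plus OpenMP, and the function name blocked_dgemm, the square row-major matrices, and the 256-row panel size are illustrative choices rather than values taken from the paper.

```c
/* Illustrative sketch only: C = A*B parallelized over block-rows of A and C,
 * with each OpenMP thread calling a sequential DGEMM kernel on its panel.
 * The 256-row panel size is a placeholder, not a tuned value. */
#include <stdlib.h>
#include <cblas.h>   /* with Intel MKL, #include <mkl.h> exposes the same CBLAS calls */

void blocked_dgemm(int n, const double *A, const double *B, double *C)
{
    /* Square, row-major matrices for brevity: C = 1.0 * A * B + 0.0 * C */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i += 256) {
        int rows = (n - i < 256) ? (n - i) : 256;   /* last panel may be shorter */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    rows, n, n,
                    1.0, A + (size_t)i * n, n,      /* block-row panel of A */
                         B, n,                      /* all of B, shared     */
                    0.0, C + (size_t)i * n, n);     /* matching panel of C  */
    }
}
```

When linking against a threaded BLAS, the inner kernel would additionally be restricted to a single thread (with MKL, via mkl_set_num_threads(1)) so that the outer decomposition alone controls parallelism. Choosing the panel size and the mapping of panels to cores per machine (shared caches, memory bandwidth) is exactly the kind of fine-tuning the paper investigates.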

Citation

Zuckerman, S., Pérache, M., & Jalby, W. (2008). Fine tuning matrix multiplications on multicore. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5374 LNCS, pp. 30–41). Springer Verlag. https://doi.org/10.1007/978-3-540-89894-8_7
