Task parallel implementation of matrix multiplication on multi-socket multi-core architectures

Yizhuo Wang; Weixing Ji; Xu Chen; Sensen Hu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015), vol. 9530, pp. 93-104

DOI: 10.1007/978-3-319-27137-8_8

Abstract

Matrix multiplication is a key computational kernel in many scientific and engineering applications. This paper presents a parallel implementation framework for dense matrix multiplication on multi-socket multi-core architectures. Our framework first partitions the computation among the multi-core processors. Each processor then runs a hybrid matrix multiplication algorithm that combines the Winograd algorithm with the classical algorithm. In addition, a hierarchical work-stealing scheme provides dynamic load balancing and enforces data locality. Performance experiments on two platforms show that our implementation achieves significant performance gains over state-of-the-art implementations.
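
The hybrid scheme described in the abstract can be illustrated with a minimal sequential sketch. It assumes that "the Winograd algorithm" refers to Winograd's variant of Strassen's algorithm (7 block multiplications and 15 block additions per recursion level), which is not confirmed by the abstract. The function names, the `cutoff` parameter, and the restriction to square matrices are hypothetical choices made for illustration; the sketch does not reproduce the paper's partitioning or work-stealing.

```python
def classical(A, B):
    """Classical O(n^3) product of two matrices stored as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def split(M):
    """Split a square matrix of even order into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def hybrid_matmul(A, B, cutoff=64):
    """Winograd-variant recursion that switches to the classical
    algorithm for small or odd-order blocks (cutoff is an assumed value)."""
    n = len(A)
    if n <= cutoff or n % 2:
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Eight pre-additions
    S1 = add(A21, A22); S2 = sub(S1, A11); S3 = sub(A11, A21); S4 = sub(A12, S2)
    T1 = sub(B12, B11); T2 = sub(B22, T1); T3 = sub(B22, B12); T4 = sub(T2, B21)
    # Seven recursive block products
    P1 = hybrid_matmul(A11, B11, cutoff)
    P2 = hybrid_matmul(A12, B21, cutoff)
    P3 = hybrid_matmul(S4, B22, cutoff)
    P4 = hybrid_matmul(A22, T4, cutoff)
    P5 = hybrid_matmul(S1, T1, cutoff)
    P6 = hybrid_matmul(S2, T2, cutoff)
    P7 = hybrid_matmul(S3, T3, cutoff)
    # Seven post-additions
    U2 = add(P1, P6); U3 = add(U2, P7); U4 = add(U2, P5)
    C11 = add(P1, P2); C12 = add(U4, P3)
    C21 = sub(U3, P4); C22 = add(U3, P5)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

In a parallel setting, the seven block products at each level are independent and could be spawned as tasks, which is where a work-stealing scheduler such as the one the abstract mentions would come in. The cutoff exists because the extra additions make the recursive scheme slower than the classical algorithm on small blocks.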

Citation (APA)

Wang, Y., Ji, W., Chen, X., & Hu, S. (2015). Task parallel implementation of matrix multiplication on multi-socket multi-core architectures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9530, pp. 93–104). Springer Verlag. https://doi.org/10.1007/978-3-319-27137-8_8
