Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster

2Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

This paper discusses the performance of a parallel matrix multiplication routine (PDGEMM) that uses the 2.5D algorithm, which is a communication-reducing algorithm, on a cluster based on the Xeon Phi 7200-series (codenamed Knights Landing), Oakforest-PACS. Although the algorithm required a 2.5D matrix distribution instead of the conventional 2D distribution, it performed computations of 2D distributed matrices on a 2D process grid by redistributing the matrices (2D-compatible 2.5D-PDGEMM). Our use of up to 8192 nodes (8192 Xeon Phi processors) demonstrates that in terms of strong scaling, our implementation performs better than conventional 2D implementations.

Cite

CITATION STYLE

APA

Mukunoki, D., & Imamura, T. (2018). Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10862 LNCS, pp. 853–858). Springer Verlag. https://doi.org/10.1007/978-3-319-93713-7_85

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free