Abstract
As supercomputing systems enter the exascale era, power consumption has become a major concern in system design. Among the techniques for reducing power consumption, the ARM architecture is gaining popularity in the HPC community because of its low power footprint and high energy efficiency. As one of China's initiatives for addressing the exascale challenge, the Tianhe-3 supercomputer has adopted a technology roadmap based on the many-core ARM architecture, using the home-built Phytium-2000+ and Matrix-2000+ processors. In this paper, we evaluate several linear algebra kernels, including matrix-matrix multiplication, matrix-vector multiplication, and triangular solve, on both sparse and dense datasets. These kernels are good performance indicators for the prototype Tianhe-3 cluster. We perform a comprehensive analysis using the roofline model to identify directions for performance optimization from both hardware and software perspectives. In addition, we compare the performance of Phytium-2000+ and Matrix-2000+ with that of the widely used KNL processor. We believe this paper provides valuable experience and insights, as work in progress towards exascale, for the HPC community.
You, X., Yang, H., Luan, Z., Liu, Y., & Qian, D. (2019). Performance evaluation and analysis of linear algebra Kernels in the prototype Tianhe-3 cluster. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11416 LNCS, pp. 86–105). Springer Verlag. https://doi.org/10.1007/978-3-030-18645-6_6