Abstract
We have been developing an advanced scientific code called "ARTED" for electron dynamics simulation using the first-order computation of materials, to be ported to various large-scale parallel systems including the "K" Computer, which was previously Japan's fastest supercomputer. In this paper, we describe the implementation and performance evaluation of the ARTED code on a stand-alone cluster of Intel's latest many-core processor, Knights Landing (KNL), building on past research on porting the code to the Knights Corner (KNC) accelerator. Our target system is Oakforest-PACS, which is currently the fastest supercomputer in Japan. For performance tuning on KNL, the largest issue is how to utilize multiple levels of parallelism: the instruction level (512-bit SIMD instructions), hardware threads (4 threads/core), and the large number of cores. We focus on the dominant computation part of the code, a 25-point 3D stencil computation. We successfully optimize this part to achieve 758.4 GFLOPS per node, which corresponds to 24.8% of the theoretical peak of an Oakforest-PACS node using an Intel Xeon Phi 7250 (3046 GFLOPS peak). We also show that the sustained performance of KNL is better than that of two KNC accelerator cards. The entire ARTED code includes other time-step computations and was designed for large-scale parallel execution using MPI, whereas single-node parallelization is achieved using OpenMP. Finally, we evaluate the parallel execution performance of the entire code with up to 128 nodes.
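As an illustration of what a 25-point 3D stencil looks like, the sketch below applies a generic finite-difference Laplacian that reads the centre point plus four neighbours in each of the six axis directions (1 + 6 × 4 = 25 points). It is not taken from ARTED: the function names, the flat array layout, the periodic boundaries, and the choice of standard 8th-order central-difference weights are all assumptions made for this example, and the pure-Python loops show only the access pattern, not the SIMD or threading optimizations the abstract refers to.

```python
# Assumed weights: standard 8th-order central difference for a second
# derivative on a unit-spaced grid (not confirmed as ARTED's coefficients).
CENTER = -205.0 / 72.0
WEIGHTS = (8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0)  # distances 1..4


def stencil_offsets():
    """Return the 25 (dx, dy, dz, weight) entries of the stencil."""
    # The centre point is shared by all three axes, so its weight is tripled.
    entries = [(0, 0, 0, 3.0 * CENTER)]
    for axis in range(3):
        for dist, weight in enumerate(WEIGHTS, start=1):
            for sign in (1, -1):
                offset = [0, 0, 0]
                offset[axis] = sign * dist
                entries.append((offset[0], offset[1], offset[2], weight))
    return entries


def laplacian(field, nx, ny, nz):
    """Apply the stencil to a flat list indexed as field[(x*ny + y)*nz + z].

    Periodic boundaries are assumed, so neighbour indices wrap around.
    """
    entries = stencil_offsets()
    out = [0.0] * (nx * ny * nz)
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                acc = 0.0
                for dx, dy, dz, weight in entries:
                    xi = (x + dx) % nx
                    yi = (y + dy) % ny
                    zi = (z + dz) % nz
                    acc += weight * field[(xi * ny + yi) * nz + zi]
                out[(x * ny + y) * nz + z] = acc
    return out
```

Each output point performs 25 loads and 25 multiply-adds, which is why a kernel of this shape depends heavily on vectorization and memory bandwidth on a many-core processor.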
Cite
Hirokawa, Y., Boku, T., Sato, S. A., & Yabana, K. (2018). Performance evaluation of large scale electron dynamics simulation under many-core cluster based on Knights Landing. In ACM International Conference Proceeding Series (pp. 183–191). Association for Computing Machinery. https://doi.org/10.1145/3149457.3149465