In this work, we discuss porting the latest production version of the Gyrokinetic Toroidal Code (GTC), a petascale fusion simulation code based on the particle-in-cell method, to the GPU platform. New GPU parallel algorithms have been designed for the particle push and shift operations. The GPU version of GTC was benchmarked on up to 3072 nodes of the Tianhe-1A supercomputer and shows an overall speedup of about 2x-3x when comparing NVIDIA M2050 GPUs to Intel Xeon X5670 CPUs. Strong and weak scaling studies have been performed using actual production simulation parameters, providing insights into GTC's scalability and bottlenecks on large GPU supercomputers. © 2013 Springer-Verlag.
CITATION STYLE
Meng, X., Zhu, X., Wang, P., Zhao, Y., Liu, X., Zhang, B., … Lin, Z. (2013). Heterogeneous programming and optimization of gyrokinetic toroidal code and large-scale performance test on TH-1A. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7905 LNCS, pp. 81–96). https://doi.org/10.1007/978-3-642-38750-0_7