In this paper, we present a dual-level MPI+CUDA parallelization of a high-order CFD software for 3D, multi-block structural grids on the TianHe-1A supercomputer. The software uses HDCS, a self-developed compact high-order finite difference scheme. Our GPU parallelization efficiently exploits both fine-grained data-level parallelism within a grid block and coarse-grained task-level parallelism among multiple grid blocks. We further apply systematic optimizations for the high-order CFD scheme at both the CUDA-device level and the cluster level. We present performance results using up to 256 GPUs (with 114K+ processing cores) on TianHe-1A. Our GPU code on a Tesla M2050 achieves a speedup of over 10 relative to the serial code on a Xeon X5670, and our implementation scales well on TianHe-1A. With our method, we successfully simulate a flow over a high-lift airfoil configuration using 400 GPUs. To the best of the authors' knowledge, our work involves the largest-scale simulation on GPU-accelerated systems that solves a realistic CFD problem with complex configurations and high-order schemes. © 2013 Springer-Verlag.
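As a quick check on the quoted core count, the arithmetic can be written out. The abstract does not state the per-GPU figure, so this assumes 448 CUDA cores per Tesla M2050, taken from NVIDIA's published specification for that card:

\[
N_{\text{cores}} = N_{\text{GPU}} \times 448, \qquad
256 \times 448 = 114{,}688 \quad (\text{the ``114K+'' figure}), \qquad
400 \times 448 = 179{,}200 .
\]

Under the same assumption, the 400-GPU airfoil simulation would therefore use roughly 179K CUDA cores.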
CITATION STYLE
Xu, C., Deng, X., Zhang, L., Jiang, Y., Cao, W., Fang, J., … Liu, W. (2013). Parallelizing a high-order CFD software for 3D, multi-block, structural grids on the TianHe-1A supercomputer. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7905 LNCS, pp. 26–39). https://doi.org/10.1007/978-3-642-38750-0_3