The performance of PHASTA, a computation-intensive, multi-purpose CFD code, is analyzed on the NCSA Intel IA-64 Linux cluster. The capabilities of current-generation, open-source performance analysis tools available on this terascale system are demonstrated. Code profiling and hardware performance counting tools are used to measure single-processor performance. The results pinpoint subroutines that dominate run time but execute inefficiently when level-3 optimization is used. These subroutines perform better when compiled with level-2 optimization instead, owing to a reduction in the total number of instructions. Flop rates of individual subroutines are estimated to guide further tuning. Parallel performance is examined by visualizing inter-processor communication. The results reveal sporadic communication overhead in the function MPI_Waitall, which accounts for about 18% of total simulation time. © Springer-Verlag Berlin Heidelberg 2003.
CITATION
Kwok, W. Y. (2003). Performance analysis of PHASTA on NCSA Intel IA-64 Linux cluster. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2660, 43–52. https://doi.org/10.1007/3-540-44864-0_5