Performance anomalies observed when running Gaussian frequency calculations in parallel on SGI Altix computers with a CC-NUMA memory architecture are analyzed using performance tools that access hardware counters. The bottleneck is that all threads involved in the calculation frequently, and nearly simultaneously, load data allocated on the node where the master thread runs. Code changes that make these data loads local improve performance by a factor close to two. The improvements carry over to other molecular models and other types of calculations. An extension of, or an alternative to, OpenMP's FirstPrivate clause could facilitate such code transformations. © 2009 Springer.
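As an illustration of the kind of localization the abstract describes, here is a minimal C sketch that uses OpenMP's firstprivate clause to give each thread its own copy of a small read-only table, instead of having every thread read the copy set up by the master thread. It is not taken from Gaussian or from the paper. The names Table, TABLE_SIZE, table_init, and sum_with_private_copy are hypothetical, and the sketch assumes the usual behaviour that a thread's private copy sits on that thread's stack and is therefore typically placed on its local node under a first-touch policy.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 1024  /* hypothetical size of the read-only table */

/* Hypothetical stand-in for read-only data set up by the master thread. */
typedef struct {
    double values[TABLE_SIZE];
} Table;

/* Fill the table on the calling thread. Under a first-touch policy the
   pages are usually placed on the node where that thread runs. */
void table_init(Table *t) {
    assert(t != NULL);
    for (size_t i = 0; i < TABLE_SIZE; ++i) {
        t->values[i] = (double)i;
    }
}

/* Sum the table `repeats` times in parallel. firstprivate(local) gives
   every thread its own initialized copy of the whole struct, so the
   inner loop reads thread-private memory rather than the master's copy. */
double sum_with_private_copy(const Table *shared, int repeats) {
    assert(shared != NULL && repeats >= 0);
    Table local = *shared;  /* by-value copy that firstprivate replicates */
    double total = 0.0;

    #pragma omp parallel for firstprivate(local) reduction(+:total)
    for (int r = 0; r < repeats; ++r) {
        for (size_t i = 0; i < TABLE_SIZE; ++i) {
            total += local.values[i];
        }
    }
    return total;
}
```

A caller would fill a Table with table_init and then pass its address to sum_with_private_copy. Note that firstprivate copies a fixed-size object like this one by value, whereas applying it to a pointer copies only the pointer and leaves the data where it was allocated. Compiled without OpenMP support, the pragma is ignored and the function runs serially with the same result.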
CITATION STYLE
Gomperts, R., Frisch, M., & Panziera, J. P. (2009). Scalability of Gaussian 03 on SGI Altix: The importance of data locality on CC-NUMA architecture. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5568 LNCS, pp. 93–103). https://doi.org/10.1007/978-3-642-02303-3_8