Optimizing excited-state electronic-structure codes for Intel Knights Landing: A case study on the BerkeleyGW software


Abstract

We profile and optimize calculations performed with the BerkeleyGW [2,3] code on the Xeon Phi architecture. BerkeleyGW depends on both hand-tuned critical kernels and BLAS and FFT libraries. We describe the optimization process and the performance improvements achieved. We discuss a layered parallelization strategy that takes advantage of vector-, thread-, and node-level parallelism. We discuss locality changes (including the consequence of the lack of an L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing, including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well suited to many-core architectures because it exposes a large amount of parallelism over plane-wave components, band pairs, and frequencies.
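The roofline study mentioned in the abstract rests on a simple bound: attainable performance is the smaller of the compute peak and the product of arithmetic intensity and memory bandwidth. The sketch below is only an illustration of that bound, not the authors' analysis. The function names and the machine parameters (`PEAK_GFLOPS`, `DDR_GBS`, `HBM_GBS`) are hypothetical round numbers chosen for the example and are not measurements from the paper.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops: compute ceiling of the machine.
    bandwidth_gbs: memory bandwidth in GB/s.
    """
    if arithmetic_intensity < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being bandwidth-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters (illustrative placeholders only).
PEAK_GFLOPS = 2000.0   # assumed compute peak
DDR_GBS = 100.0        # assumed off-package DRAM bandwidth
HBM_GBS = 400.0        # assumed on-package high-bandwidth memory bandwidth

if __name__ == "__main__":
    for ai in (0.5, 5.0, 50.0):
        ddr = attainable_gflops(ai, PEAK_GFLOPS, DDR_GBS)
        hbm = attainable_gflops(ai, PEAK_GFLOPS, HBM_GBS)
        print(f"AI={ai:5.1f} FLOP/byte  DDR bound={ddr:7.1f}  HBM bound={hbm:7.1f}")
```

Under these assumed numbers, a low-intensity kernel gains directly from the higher-bandwidth memory, while a kernel above the ridge point is limited by the compute peak in either case. This is why a roofline plot is a useful way to compare a code before and after locality and vectorization changes.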

Cite

Deslippe, J., da Jornada, F. H., Vigil-Fowler, D., Barnes, T., Wichmann, N., Raman, K., … Louie, S. G. (2016). Optimizing excited-state electronic-structure codes for Intel Knights Landing: A case study on the BerkeleyGW software. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9945 LNCS, pp. 402–414). Springer Verlag. https://doi.org/10.1007/978-3-319-46079-6_29
