Quicksilver represents key elements of the Mercury Monte Carlo Particle Transport simulation software developed at Lawrence Livermore National Laboratory (LLNL). Mercury is one of the applications used by the Department of Energy (DOE) for nuclear security and nuclear reactor simulations. As a Mercury proxy, Quicksilver therefore influences DOE's hardware procurement and co-design activities. Quicksilver has a complicated implementation and performance profile: its performance is dominated by latency-bound table look-ups and by control-flow divergence that limits SIMD/SIMT parallelization opportunities. As a result, obtaining high performance for Quicksilver is quite challenging. This paper shows how to improve Quicksilver's performance on Intel Xeon CPUs by 1.8× over the original version by selectively replicating conflict-prone data structures. It also shows how to efficiently port Quicksilver to the new Intel Programmable Integrated Unified Memory Architecture (PIUMA). Preliminary analysis shows that a single PIUMA die (8 cores) is about 2× faster than an Intel Xeon 8280 socket (28 cores) and provides better strong-scaling efficiency.
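The idea of replicating conflict-prone data structures can be illustrated with a generic per-thread privatization sketch. This is not the authors' code: we assume, purely for illustration, that the conflict-prone structure is an array of counters that many threads increment, and the names `parallel_chunks`, `tally_shared`, and `tally_replicated` are hypothetical. The baseline funnels every update through atomic adds on one shared array, so frequently hit entries become contention points; the replicated version gives each thread a private copy and merges the copies once after all threads finish.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: split [0, n) into n_threads contiguous chunks and
// run body(tid, begin, end) for each chunk on its own std::thread.
template <typename Body>
void parallel_chunks(std::size_t n, unsigned n_threads, Body body) {
    assert(n_threads > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        std::size_t begin = n * t / n_threads;
        std::size_t end = n * (t + 1) / n_threads;
        workers.emplace_back(body, t, begin, end);
    }
    for (auto& w : workers) w.join();
}

// Baseline (hypothetical): all threads update ONE shared array with atomic
// adds. Threads hitting the same entry contend for the same cache line.
std::vector<long> tally_shared(const std::vector<std::size_t>& events,
                               std::size_t n_bins, unsigned n_threads) {
    std::vector<std::atomic<long>> bins(n_bins);
    for (auto& b : bins) b.store(0, std::memory_order_relaxed);
    parallel_chunks(events.size(), n_threads,
        [&](unsigned, std::size_t begin, std::size_t end) {
            for (std::size_t i = begin; i < end; ++i) {
                assert(events[i] < n_bins);
                bins[events[i]].fetch_add(1, std::memory_order_relaxed);
            }
        });
    std::vector<long> out(n_bins);
    for (std::size_t b = 0; b < n_bins; ++b) out[b] = bins[b].load();
    return out;
}

// Replicated (hypothetical): each thread owns a private copy of the array,
// so updates need no atomics. The copies are reduced once after the join.
std::vector<long> tally_replicated(const std::vector<std::size_t>& events,
                                   std::size_t n_bins, unsigned n_threads) {
    std::vector<std::vector<long>> copies(n_threads,
                                          std::vector<long>(n_bins, 0));
    parallel_chunks(events.size(), n_threads,
        [&](unsigned tid, std::size_t begin, std::size_t end) {
            std::vector<long>& mine = copies[tid];  // this thread's replica
            for (std::size_t i = begin; i < end; ++i) {
                assert(events[i] < n_bins);
                ++mine[events[i]];
            }
        });
    // Reduction: sum the per-thread replicas into one result.
    std::vector<long> out(n_bins, 0);
    for (const auto& c : copies)
        for (std::size_t b = 0; b < n_bins; ++b) out[b] += c[b];
    return out;
}
```

Both functions return the same counts; what differs is how updates reach memory. Replication costs `n_threads` copies of the array plus a final reduction proportional to `n_threads × n_bins`, which suggests why it pays to apply it selectively, to the structures that actually see conflicting updates, rather than to every shared structure.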
Tithi, J. J., Petrini, F., & Richards, D. F. (2021). Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How That’s Different from CPU. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12728 LNCS, pp. 38–56). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-78713-4_3