Performance analysis of the NWChem TCE for different communication patterns

3Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.
Get full text

Abstract

One-sided communication is a model that separates communication from synchronization, and has been in practice for over two decades in libraries such as SHMEM and Global Arrays (GA). GA is used in a number of application codes, especially NWChem, and provides a superset of SHMEM functionality that includes remote accumulate, among other features. Remote accumulate is an active-message operation that applies y+ = a ∗ x at the target rather than just y = x (as in Put) which gives the programmer additional choices with respect to algorithm design. In this paper, we discuss and evaluate communication scenarios for dense block-tensor contractions, one of the mainstays of the NWChem computation chemistry package. We show that apart from the classical approach involving dynamic scheduling of data blocks for load balancing, reordering one-sided Get and Accumulate calls affects the performance of tensor contractions on leadership-class machines substantially. In order to understand why this reordering affects the performance, we develop a proxy application for the NWChem Tensor Contraction Engine (TCE) module. We utilize this proxy application to compare different implementations with a focus on communication.

Cite

CITATION STYLE

APA

Ghosh, P., Hammond, J. R., Ghosh, S., & Chapman, B. (2014). Performance analysis of the NWChem TCE for different communication patterns. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8551, pp. 281–294). Springer Verlag. https://doi.org/10.1007/978-3-319-10214-6_14

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free