UPC collective operations optimization

Abstract

In any parallel programming language, collective communication operations involve more than one thread or process and act on multiple streams of data. The language's API provides both algorithmic and run-time system support to optimize the performance of these operations. Some developers, however, try to be clever: they start from the language's primitive operations and write their own versions of the collective operations. The recurring question is whether these developers are wise to do so. In this paper, we examine the case of UPC (Unified Parallel C) and show that in some circumstances it is wiser for developers to optimize starting from UPC's primitive operations. In our testing, we found that collective operations optimized by developers from primitive UPC operations can outperform the readily available UPC collective operations. We pinpoint specific optimizations at both the algorithmic and the run-time support levels that developers can use to uncover missed optimization opportunities. We also propose a novel approach to implementing UPC collective operations across clusters, in which performance-critical components are moved close to the network. We argue that this approach provides unique advantages for performance improvement. © Springer-Verlag Berlin Heidelberg 2007.

Citation (APA)

Salama, R. A., & Sameh, A. (2007). UPC collective operations optimization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4705 LNCS, pp. 536–549). Springer Verlag. https://doi.org/10.1007/978-3-540-74472-6_44