Optimizing explicit data transfers for data parallel applications on the Cell architecture

Selma Saidi; Pranav Tendulkar; Thierry Lepley; Oded Maler

Journal Article (Open Access)

Transactions on Architecture and Code Optimization (2012) 8(4)

DOI: 10.1145/2086696.2086716

24 citations

18 readers

Abstract

In this paper, we investigate a general approach to automating some deployment decisions for a class of applications on multi-core computers. We consider data-parallelizable programs that use the well-known double-buffering technique to bring data from the slow off-chip memory into the local memory of the cores via a DMA (direct memory access) mechanism. Based on the computation time and size of elementary data items, as well as DMA characteristics, we derive optimal and near-optimal values for the number of blocks that should be clustered in a single DMA command. We then extend the results to the case where the computation for one data item needs some data in its neighborhood. In this setting, we characterize the performance of several alternative mechanisms for data sharing. Our models are validated experimentally using a cycle-accurate simulator of the Cell Broadband Engine architecture. © 2012 ACM.
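To make the trade-off in the abstract concrete, the following is a minimal sketch of how one might estimate the number of blocks to cluster per DMA command under double buffering. It is not the paper's model: the linear DMA cost (a fixed issue overhead plus a per-byte cost), the function names, and all parameter values are assumptions chosen for illustration, and write-back traffic and contention between cores are ignored.

```python
import math


def transfer_time(s, block_bytes, dma_init, cost_per_byte):
    """Assumed DMA cost for one command moving s blocks:
    a fixed issue overhead plus a cost proportional to the bytes moved."""
    return dma_init + cost_per_byte * block_bytes * s


def pipelined_time(n, s, block_bytes, compute_per_block, dma_init, cost_per_byte):
    """Estimated time to process n blocks, s blocks per DMA command,
    with double buffering (the fetch of chunk i+1 overlaps the compute of chunk i).

    Simplification: every chunk is treated as full, so the estimate is
    slightly pessimistic when s does not divide n.
    """
    steps = math.ceil(n / s)
    t = transfer_time(s, block_bytes, dma_init, cost_per_byte)
    c = compute_per_block * s
    # Prologue: the first fetch cannot be overlapped with any computation.
    # Steady state: each step costs the slower of transfer and compute.
    # Epilogue: the last computation has no fetch left to overlap with.
    return t + (steps - 1) * max(t, c) + c


def best_cluster_size(n, max_s, block_bytes, compute_per_block, dma_init, cost_per_byte):
    """Brute-force search for the cluster size s in [1, max_s] with the
    smallest estimated time. max_s stands in for the local-memory limit
    (two buffers of s blocks each must fit in the local store)."""
    return min(
        range(1, max_s + 1),
        key=lambda s: pipelined_time(
            n, s, block_bytes, compute_per_block, dma_init, cost_per_byte
        ),
    )


if __name__ == "__main__":
    # Hypothetical numbers, in cycles and bytes.
    params = dict(block_bytes=16, compute_per_block=10, dma_init=200, cost_per_byte=0.5)
    s_best = best_cluster_size(n=1024, max_s=1024, **params)
    print("best cluster size:", s_best)
    print("estimated cycles:", pipelined_time(1024, s_best, **params))
```

Under these assumptions, very small clusters are dominated by the fixed DMA overhead, while very large clusters lengthen the non-overlapped prologue and epilogue, so the best value lies in between and is capped by local memory capacity.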

Citation style: APA

Saidi, S., Tendulkar, P., Lepley, T., & Maler, O. (2012). Optimizing explicit data transfers for data parallel applications on the Cell architecture. Transactions on Architecture and Code Optimization, 8(4). https://doi.org/10.1145/2086696.2086716
