Simple memory machine models for GPUs

Abstract

The main contribution of this paper is to introduce two parallel memory machines, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). Unlike well-studied theoretical parallel computational models such as PRAMs, these parallel memory machines are practical and capture the essential features of memory access on NVIDIA GPUs. Thus, algorithmic techniques developed on the DMM and the UMM can be used on current GPUs. As a first step in developing algorithmic techniques on the DMM and the UMM, we evaluate the computing time for contiguous access and stride access to the memory on these models. We then present parallel algorithms to transpose a two-dimensional array on these models. Finally, we show that, for any given permutation, data in an array can be moved along that permutation on both the DMM and the UMM. Since the computing time of our permutation algorithms on the DMM and the UMM is equal to the sum of the lower bounds obtained from the memory bandwidth limitation and the latency overhead, they are optimal from the theoretical point of view. © 2012 IEEE.
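The difference between contiguous and stride access can be illustrated with a toy cost count. The sketch below is not taken from the paper: it assumes a warp of w threads, that the DMM places address a in bank a mod w and serializes requests to the same bank, and that the UMM serves one group of w consecutive addresses per pipeline slot. The function names dmm_slots and umm_slots are hypothetical, and latency is ignored.

```python
# Toy cost count, NOT the paper's formal definition: how many memory-pipeline
# slots one warp of w threads occupies for a given set of addresses.
from collections import Counter

def dmm_slots(addresses, w):
    """Assumed DMM rule: address a lives in bank a % w. Distinct addresses
    in the same bank are serialized; requests to the same address are
    assumed to be merged. Cost = largest number of distinct addresses
    falling into any one bank."""
    return max(Counter(a % w for a in set(addresses)).values())

def umm_slots(addresses, w):
    """Assumed UMM rule: addresses are grouped into blocks of w consecutive
    cells (group index a // w). Each distinct group costs one slot."""
    return len({a // w for a in addresses})

w = 4
contiguous = [t for t in range(w)]   # thread t accesses address t
stride = [t * w for t in range(w)]   # thread t accesses address t * w

# Contiguous access: one slot on both models under these assumptions.
print(dmm_slots(contiguous, w), umm_slots(contiguous, w))
# Stride access with stride w: w slots on both models under these assumptions.
print(dmm_slots(stride, w), umm_slots(stride, w))
```

Under these assumptions, a stride equal to w is the worst case for both models: every request lands in the same bank on the DMM and in a different address group on the UMM. A naive row-to-column transpose produces this access pattern on one side of the copy.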

Cite

Nakano, K. (2012). Simple memory machine models for GPUs. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 (pp. 794–803). https://doi.org/10.1109/IPDPSW.2012.98
