An optimal parallel prefix-sums algorithm on the memory machine models for GPUs

Abstract

The main contribution of this paper is to present optimal algorithms for computing the sum and the prefix-sums on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in O(n/w + nl/p + l log n) time units on the DMM and the UMM. We then go on to show that Ω(n/w + nl/p + l log n) time units are necessary to compute the sum. Finally, we show an optimal parallel algorithm that computes the prefix-sums of n numbers in O(n/w + nl/p + l log n) time units on the DMM and the UMM. © 2012 Springer-Verlag.
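
For readers unfamiliar with the problem itself: the prefix-sums of a sequence are its running totals. The sketch below is a generic textbook doubling scan, simulated sequentially in Python; it is not the algorithm from the paper and does not model the parameters p, w, or l. The function name `prefix_sums` is chosen here for illustration only.

```python
def prefix_sums(values):
    """Inclusive prefix-sums via a doubling scan (illustrative sketch).

    Each pass of the while loop stands in for one parallel round: every
    position i with i >= stride adds the value `stride` places to its left.
    """
    result = list(values)
    stride = 1
    while stride < len(result):
        previous = result[:]  # snapshot so every read in a round sees old values
        for i in range(stride, len(result)):
            result[i] = previous[i] + previous[i - stride]
        stride *= 2
    return result


print(prefix_sums([3, 1, 4, 1, 5, 9, 2, 6]))  # [3, 4, 8, 9, 14, 23, 25, 31]
```

The number of rounds grows with log n, which is the kind of dependence the log n term in parallel bounds for this problem reflects; how rounds map onto memory accesses on the DMM and the UMM is the subject of the paper and is not captured here.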

Cite

Nakano, K. (2012). An optimal parallel prefix-sums algorithm on the memory machine models for GPUs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7439 LNCS, pp. 99–113). https://doi.org/10.1007/978-3-642-33078-0_8
