An optimal parallel prefix-sums algorithm on the memory machine models for GPUs

Koji Nakano

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012), vol. 7439 LNCS (Part 1), pp. 99-113

DOI: 10.1007/978-3-642-33078-0_8

Abstract

The main contribution of this paper is to present optimal algorithms for computing the sum and the prefix-sums on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs, respectively. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in O(n/w + nl/p + l log n) time units on the DMM and the UMM. We then go on to show that Ω(n/w + nl/p + l log n) time units are necessary to compute the sum. Finally, we show an optimal parallel algorithm that computes the prefix-sums of n numbers in O(n/w + nl/p + l log n) time units on the DMM and the UMM. © 2012 Springer-Verlag.
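For readers unfamiliar with the problem, the following is a minimal sketch of what computing prefix-sums means, using the common three-phase blocked scan. It is not the algorithm from the paper and does not model the DMM or UMM cost parameters; the function name `prefix_sums_blocked` and the `num_blocks` parameter are hypothetical, and the "parallel" phases are simulated sequentially.

```python
from itertools import accumulate


def prefix_sums_blocked(values, num_blocks):
    """Inclusive prefix sums via a three-phase blocked scan.

    Illustrative only: phases 1 and 3 are the parts that independent
    threads could execute in parallel; here they run sequentially.
    """
    n = len(values)
    if n == 0:
        return []
    num_blocks = max(1, min(num_blocks, n))
    block_size = -(-n // num_blocks)  # ceiling division
    blocks = [values[i:i + block_size] for i in range(0, n, block_size)]

    # Phase 1: each block computes its own local inclusive prefix sums.
    local = [list(accumulate(block)) for block in blocks]

    # Phase 2: an exclusive scan over the block totals yields the
    # starting offset of every block.
    offsets = [0]
    for scanned in local[:-1]:
        offsets.append(offsets[-1] + scanned[-1])

    # Phase 3: each block adds its offset to all of its local results.
    return [x + off for scanned, off in zip(local, offsets) for x in scanned]


if __name__ == "__main__":
    print(prefix_sums_blocked([3, 1, 4, 1, 5, 9, 2, 6], 3))
    # prints [3, 4, 8, 9, 14, 23, 25, 31]
```

The last element of the result is the sum of all inputs, which is why the sum and the prefix-sums problems are usually studied together.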

Citation (APA)

Nakano, K. (2012). An optimal parallel prefix-sums algorithm on the memory machine models for GPUs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7439 LNCS, pp. 99–113). https://doi.org/10.1007/978-3-642-33078-0_8
