CUDA memory optimizations for large data-structures in the gravit simulator

Abstract

Modern GPUs open a completely new field for optimizing embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of challenges for program optimization, especially tuning the program for the GPU memory hierarchy, whose organization and performance implications are radically different from those of general-purpose CPUs, and optimizing programs at the instruction level for the GPU. In this paper we analyze different approaches for optimizing memory usage and access patterns on GPUs and propose a class of memory layout optimizations that can take full advantage of the unique memory hierarchy of NVIDIA CUDA. Furthermore, we analyze some classical optimization techniques and how they affect performance on a GPU. We used the Gravit gravity simulator to demonstrate these optimizations. The final optimized GPU version achieves an 87× speedup compared to the original CPU version. Almost 30% of this speedup is a direct result of the optimizations discussed in this paper.

Cite

Siegel, J., Ributzka, J., & Li, X. (2011). CUDA memory optimizations for large data-structures in the gravit simulator. In Journal of Algorithms and Computational Technology (Vol. 5, pp. 341–362). https://doi.org/10.1260/1748-3018.5.2.341
