Designing parallel sparse matrix transposition algorithm using CSR for GPUs

Abstract

In this chapter, we propose a parallel algorithm for sparse matrix transposition in CSR format for many-core GPUs, exploiting the computational power and memory bandwidth that the GPU makes available through parallel programming in CUDA. Our code runs on a platform with a quad-core Intel Xeon64 CPU E5507 and an NVIDIA GTX 470 GPU card. We measure the performance of our algorithm on inputs ranging from small to large matrices. Our preliminary experimental results show that the algorithm scales well up to 512 threads and is promising for larger matrices. © 2013 Springer Science+Business Media New York.
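
The abstract does not spell out the steps of the algorithm, so the following is only a minimal sequential sketch of what transposing a CSR matrix involves, using the common count, prefix-sum, and scatter scheme. It is not the authors' GPU method. The function name `csr_transpose` and its parameter names are assumptions chosen for illustration.

```python
def csr_transpose(n_rows, n_cols, row_ptr, col_idx, values):
    """Return the CSR arrays (row_ptr, col_idx, values) of the transpose.

    Illustrative sequential sketch; names are hypothetical and this is
    not the algorithm from the chapter.
    """
    nnz = row_ptr[n_rows]

    # Step 1: count the nonzeros in each column of A,
    # which become the row lengths of the transpose.
    counts = [0] * n_cols
    for j in col_idx[:nnz]:
        counts[j] += 1

    # Step 2: exclusive prefix sum over the counts gives the
    # row pointers of the transpose.
    t_row_ptr = [0] * (n_cols + 1)
    for j in range(n_cols):
        t_row_ptr[j + 1] = t_row_ptr[j] + counts[j]

    # Step 3: scatter each entry (i, j, v) of A into row j of the
    # transpose, tracking the next free slot in every output row.
    next_slot = t_row_ptr[:n_cols]  # slicing makes a copy
    t_col_idx = [0] * nnz
    t_values = [0] * nnz
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            dst = next_slot[j]
            t_col_idx[dst] = i
            t_values[dst] = values[k]
            next_slot[j] += 1

    return t_row_ptr, t_col_idx, t_values


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
t_row_ptr, t_col_idx, t_values = csr_transpose(
    3, 3, [0, 2, 3, 5], [0, 2, 2, 0, 1], [1, 2, 3, 4, 5]
)
print(t_row_ptr, t_col_idx, t_values)
```

In a parallel setting, steps 1 and 3 are the parts usually distributed across threads (for example with atomic counters), and step 2 is typically a parallel scan. Whether the chapter's algorithm is organized this way is not stated in the abstract.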

Cite

Weng, T. H., Pham, H., Jiang, H., & Li, K. C. (2013). Designing parallel sparse matrix transposition algorithm using CSR for GPUs. In Lecture Notes in Electrical Engineering (Vol. 234 LNEE, pp. 251–257). https://doi.org/10.1007/978-1-4614-6747-2_31
