Automatically tuning sparse matrix-vector multiplication for GPU architectures

Abstract

Graphics processors are increasingly used in scientific applications due to their high computational power, which stems from hardware offering multiple levels of parallelism and a deep memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently on GPUs due to their irregular memory reference patterns. In this paper we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results. © 2010 Springer-Verlag.
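
The abstract does not spell out the storage format itself. As a rough illustration of the kind of tunable, locality-friendly layout this line of work targets, below is a minimal sketch of a sliced-ELLPACK-style sparse matrix-vector (SpMV) kernel in CUDA. All names, the padding convention, and the SLICE_HEIGHT parameter are illustrative assumptions, not the authors' exact design: rows are grouped into slices, each slice is padded only to the length of its own longest row (keeping the footprint low), and values within a slice are stored column-major so consecutive threads issue coalesced loads.

#include <cuda_runtime.h>

// Tunable parameter: rows per slice. An auto-tuner would vary this
// (and related layout choices) per matrix and per device.
#define SLICE_HEIGHT 32

__global__ void spmv_sliced_ell(int num_rows,
                                const int   *slice_ptr,  // start offset of each slice in cols/vals
                                const int   *cols,       // column indices, padded per slice
                                const float *vals,       // nonzero values, padded per slice
                                const float *x,          // input vector
                                float       *y)          // output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= num_rows) return;

    int slice = row / SLICE_HEIGHT;
    int lane  = row % SLICE_HEIGHT;

    // Each slice stores width * SLICE_HEIGHT entries, where width is the
    // padded length of that slice's longest row.
    int base  = slice_ptr[slice];
    int width = (slice_ptr[slice + 1] - base) / SLICE_HEIGHT;

    float sum = 0.0f;
    for (int j = 0; j < width; ++j) {
        // Column-major layout inside the slice: element j of row `lane`
        // sits at base + j * SLICE_HEIGHT + lane, so threads of a warp
        // read adjacent addresses at each step.
        int idx = base + j * SLICE_HEIGHT + lane;
        int col = cols[idx];
        if (col >= 0)                 // padding entries marked with col = -1
            sum += vals[idx] * x[col];
    }
    y[row] = sum;
}

A launch such as spmv_sliced_ell<<<(num_rows + 127) / 128, 128>>>(num_rows, slice_ptr, cols, vals, x, y) with the block size a multiple of SLICE_HEIGHT keeps whole slices within a block; the slice height is exactly the sort of parameter a tuner would sweep to specialize the kernel for a given matrix and device.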

Citation

Monakov, A., Lokhmotov, A., & Avetisyan, A. (2010). Automatically tuning sparse matrix-vector multiplication for GPU architectures. In Lecture Notes in Computer Science (Vol. 5952 LNCS, pp. 111–125). https://doi.org/10.1007/978-3-642-11515-8_10
