Understanding the performance of concurrent data structures on graphics processors

13Citations
Citations of this article
25Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

In this paper we revisit the design of concurrent data structures - specifically queues - and examine their performance portability with regard to the move from conventional CPUs to graphics processors. We have looked at both lock-based and lock-free algorithms and have, for comparison, implemented and optimized the same algorithms on both graphics processors and multi-core CPUs. Particular interest has been paid to study the difference between the old Tesla and the new Fermi and Kepler architectures in this context. We provide a comprehensive evaluation and analysis of our implementations on all examined platforms. Our results indicate that the queues are in general performance portable, but that platform specific optimizations are possible to increase performance. The Fermi and Kepler GPUs, with optimized atomic operations, are observed to provide excellent scalability for both lock-based and lock-free queues. © 2012 Springer-Verlag.

Cite

CITATION STYLE

APA

Cederman, D., Chatterjee, B., & Tsigas, P. (2012). Understanding the performance of concurrent data structures on graphics processors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7484 LNCS, pp. 883–894). https://doi.org/10.1007/978-3-642-32820-6_87

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free