GPU sample sort


Abstract

In this paper, we present the design of a sample sort algorithm for manycore GPUs. Although sample sort is one of the most efficient comparison-based sorting algorithms for distributed memory architectures, its performance on GPUs was previously unknown. For uniformly distributed keys, our sample sort is at least 25% and on average 68% faster than the best comparison-based sorting algorithm, GPU Thrust merge sort, and on average more than 2 times faster than GPU quicksort. Moreover, for 64-bit integer keys it is at least 63% and on average 2 times faster than the highly optimized GPU Thrust radix sort, which directly manipulates the binary representation of keys. Our implementation is robust to different distributions and entropy levels of keys and scales almost linearly with the input size. These results indicate that multi-way techniques in general, and sample sort in particular, achieve substantially better performance than two-way merge sort and quicksort. © 2010 IEEE.
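For readers unfamiliar with the multi-way idea the abstract refers to, the following is a minimal sequential sketch of generic sample sort in Python. It is not the authors' GPU implementation: the function name `sample_sort` and the parameters `num_buckets`, `oversampling`, and `base_case` are illustrative assumptions, and the sketch ignores everything GPU-specific (thread blocks, shared memory, parallel bucket placement). It only shows the three steps of the technique: draw a random sample, pick splitters from it, and distribute keys into several buckets at once before sorting each bucket.

```python
import bisect
import random


def sample_sort(keys, num_buckets=4, oversampling=8, base_case=16, rng=None):
    """Sequential sketch of sample sort (hypothetical parameters, not from the paper)."""
    rng = rng or random.Random(0)
    # Small inputs are handed to a simple base-case sort.
    if len(keys) <= base_case or num_buckets < 2:
        return sorted(keys)

    # 1. Draw an oversampled random sample and sort it.
    sample_size = min(len(keys), num_buckets * oversampling)
    sample = sorted(rng.sample(keys, sample_size))

    # 2. Pick num_buckets - 1 evenly spaced splitters from the sorted sample.
    step = sample_size / num_buckets
    splitters = [sample[int(i * step)] for i in range(1, num_buckets)]

    # 3. Multi-way distribution: each key goes to the bucket chosen by
    #    a binary search over the splitters.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)

    # Guard against non-shrinking recursion (e.g. all keys equal).
    if any(len(bucket) == len(keys) for bucket in buckets):
        return sorted(keys)

    # Buckets are ordered relative to each other, so sorting each one
    # and concatenating yields the sorted output.
    result = []
    for bucket in buckets:
        result.extend(sample_sort(bucket, num_buckets, oversampling, base_case, rng))
    return result
```

Calling `sample_sort(list_of_keys)` returns a new sorted list. Unlike two-way quicksort, a single pass splits the input into `num_buckets` parts, so fewer passes over the data are needed, which is the property the abstract credits for the performance advantage.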

Leischner, N., Osipov, V., & Sanders, P. (2010). GPU sample sort. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010. https://doi.org/10.1109/IPDPS.2010.5470444
