Engineering a multi-core radix sort



Abstract

We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory and making use of write-combining yields a per-pass throughput corresponding to at least 89% of the system's peak memory bandwidth. Our implementation outperforms Intel's recently published radix sort by a factor of 1.64. It also compares favorably to the reported performance of an algorithm for Fermi GPUs when data-transfer overhead is included. These results indicate that scalar, bandwidth-sensitive sorting algorithms remain competitive on current architectures. Various other memory-intensive applications can benefit from the techniques described herein. © 2011 Springer-Verlag.
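As a point of reference for the abstract, here is a minimal sketch of a least-significant-digit radix sort built from repeated counting-sort passes. It is not the authors' implementation: it is single-threaded and models neither the virtual-memory technique nor write-combining. The names `counting_sort_pass` and `radix_sort`, the 8-bit digit width, and the restriction to unsigned 32-bit keys are assumptions made for illustration only.

```python
from typing import List

RADIX_BITS = 8                   # assumed digit width (illustrative choice)
NUM_BUCKETS = 1 << RADIX_BITS    # 256 buckets per pass
KEY_BITS = 32                    # assumes unsigned 32-bit integer keys


def counting_sort_pass(src: List[int], dst: List[int], shift: int) -> None:
    """One stable counting-sort pass on the digit at bit offset `shift`."""
    mask = NUM_BUCKETS - 1

    # Step 1: histogram of digit values.
    counts = [0] * NUM_BUCKETS
    for key in src:
        counts[(key >> shift) & mask] += 1

    # Step 2: exclusive prefix sum gives each bucket's first output index.
    offsets = [0] * NUM_BUCKETS
    total = 0
    for digit in range(NUM_BUCKETS):
        offsets[digit] = total
        total += counts[digit]

    # Step 3: scatter keys to their buckets, preserving input order.
    # This is the memory-bound step; a tuned implementation would
    # optimize these writes, which this sketch does not attempt.
    for key in src:
        digit = (key >> shift) & mask
        dst[offsets[digit]] = key
        offsets[digit] += 1


def radix_sort(keys: List[int]) -> List[int]:
    """Return a sorted copy of `keys` (unsigned integers below 2**32)."""
    if any(key < 0 or key >= (1 << KEY_BITS) for key in keys):
        raise ValueError("keys must fit in an unsigned 32-bit integer")
    src = list(keys)
    dst = [0] * len(src)
    for shift in range(0, KEY_BITS, RADIX_BITS):
        counting_sort_pass(src, dst, shift)
        src, dst = dst, src      # output of this pass feeds the next one
    return src
```

Each pass reads the input twice (once to count, once to scatter) and writes it once, so the work per pass is dominated by memory traffic. That is why the abstract reports per-pass throughput as a fraction of peak memory bandwidth.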

Citation (APA)

Wassenberg, J., & Sanders, P. (2011). Engineering a multi-core radix sort. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6853 LNCS, pp. 160–169). https://doi.org/10.1007/978-3-642-23397-5_16
