A 64-GB Sort at 28 GB/s on a 4-GPU POWER9 Node for Uniformly-Distributed 16-Byte Records with 8-Byte Keys

0Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Govinderaju et al. [1] have shown that a hybrid CPU-GPU system is cost-performance effective at sorting large datasets on a single node, but thus far large clusters used on sorting benchmarks have been limited by network and storage performance, and such clusters have remained CPU-only. With network and storage bandwidths improving more rapidly than CPU throughput performance, the cost effectiveness of CPU-GPU clusters for large sorts should be re-examined. As a first step, we evaluate sort performance on a single GPU-accelerated node with initial and final data residing in system memory. Access to main memory is limited to two reads and two writes, while executing the partitioning and sort in GPU memory. On a dual-socket IBM POWER9 system with four NVlink-attached NVIDIA V100 GPUs a single-node sort of 64 GB 8-byte key, 8-byte value records completes in under 2.3 s corresponding to a sort rate of over 28 GB/s. On a small (4-node) cluster with the same amount of data per node, the cluster sort completes in under 4.5 s. Sort performance is enabled by high system memory bandwidth, managing system-memory NUMA affinities, high CPU-GPU bandwidth, an efficient GPU-based partitioner, and an optimized GPU sort implementation. A cluster version of the algorithm benefits from minimizing copy operations by using RDMA. Matching the throughput of an optimized partitioner for our system would require a 50-100 GB/s network, which is feasible with a dual-socket POWER9 system.

Author supplied keywords

Cite

CITATION STYLE

APA

Fossum, G. C., Wang, T., & Hofstee, H. P. (2018). A 64-GB Sort at 28 GB/s on a 4-GPU POWER9 Node for Uniformly-Distributed 16-Byte Records with 8-Byte Keys. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11203 LNCS, pp. 373–386). Springer Verlag. https://doi.org/10.1007/978-3-030-02465-9_25

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free