Hybrid CPU-GPU scheduling and execution of tree traversals

14Citations
Citations of this article
15Readers
Mendeley users who have this article in their library.
Get full text

Abstract

GPUs offer the promise of massive, powereficient parallelism. However, exploiting this parallelism requires code to be carefully structured to deal with the limitations of the SIMT execution model. In recent years, there has been much interest in mapping irregular applications to GPUs: applications with unpredictable, datadependent behaviors. While most of the work in this space has focused on ad hoc implementations of specific algorithms, recent work has looked at generic techniques for mapping a large class of tree traversal algorithms to GPUs, through careful restructuring of the tree traversal algorithms to make them behave more regularly. Unfortunately, even this general approach for GPU execution of tree traversal algorithms is reliant on ad hoc, hand-written, algorithm-specific scheduling (i.e., assignment of threads to warps) to achieve high performance. The key challenge of scheduling is that it is a highly irregular process, that requires the inspection of thread behavior and then careful sorting of those threads into warps. In this paper, we present a novel scheduling and execution technique for tree traversal algorithms that is both general and automatic. The key novelty is a hybrid, inspector-executor approach: the GPU partially executes tasks to inspect thread behavior and transmits information back to the CPU, which uses that information to perform the scheduling itself, before executing the remaining, carefully scheduled, portion of the traversals on the GPU. We applied this framework to six tree traversal algorithms, achieving significant speedups over optimized GPU code that does not perform application-specific scheduling. Further, we show that in many cases, our hybrid approach is able to deliver better performance even than GPU code that uses handtuned, application-specific scheduling.

Cite

CITATION STYLE

APA

Liu, J., Hegde, N., & Kulkarni, M. (2016). Hybrid CPU-GPU scheduling and execution of tree traversals. In Proceedings of the International Conference on Supercomputing (Vol. 01-03-June-2016). Association for Computing Machinery. https://doi.org/10.1145/2925426.2926261

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free