PARTANS

  • Lutz T
  • Fensch C
  • Cole M

Abstract

GPGPUs are a powerful and energy-efficient solution for many problems. For higher performance or larger problems, it is necessary to distribute the problem across multiple GPUs, increasing the already high programming complexity. In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation. We show that the best strategy depends not only on the stencil operator, problem size, and GPU, but also on the PCI Express layout. This adds nonuniform characteristics to a seemingly homogeneous setup, causing up to 23% performance loss. We address this issue with an autotuner that optimizes the distribution across multiple GPUs.

Cite

Lutz, T., Fensch, C., & Cole, M. (2013). PARTANS. ACM Transactions on Architecture and Code Optimization, 9(4), 1–24. https://doi.org/10.1145/2400682.2400718
