Data transfer overhead between computing cores and memory hierarchy has been a persistent issue for von Neumann architectures and the problem has only become more challenging with the emergence of manycore systems. A conceptually powerful approach to mitigate this overhead is to bring the computation closer to data, known as Near Data Computing (NDC). Recently, NDC has been investigated in different flavors for CPU-based multicores, while the GPU domain has received little attention. In this paper, we present a novel NDC solution for GPU architectures with the objective of minimizing on-chip data transfer between the computing cores and Last-Level Cache (LLC). To achieve this, we first identify frequently occurring Load-Compute-Store instruction chains in GPU applications. These chains, when offloaded to a compute unit closer to where the data resides, can significantly reduce data movement. We develop two offloading techniques, called LLC-Compute and Omni-Compute. The first technique, LLC-Compute, augments the LLCs with computational hardware for handling the computation offloaded to them. The second technique (Omni-Compute) employs simple bookkeeping hardware to enable GPU cores to compute instructions offloaded by other GPU cores. Our experimental evaluations on nine GPGPU workloads indicate that the LLC-Compute technique provides, on an average, 19% performance improvement (IPC), 11% performance/watt improvement, and 29% reduction in on-chip data movement compared to the baseline GPU design. The Omni-Compute design boosts these benefits to 31%, 16% and 44%, respectively.
CITATION STYLE
Pattnaik, A., Tang, X., Kayiran, O., Jog, A., Mishra, A., Kandemir, M. T., … Das, C. R. (2019). Opportunistic computing in GPU architectures. In Proceedings - International Symposium on Computer Architecture (pp. 210–223). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3307650.3322212
Mendeley helps you to discover research relevant for your work.