Snake: A Variable-length Chain-based Prefetching for GPUs

Abstract

Graphics Processing Units (GPUs) rely on the memory hierarchy and Thread-Level Parallelism (TLP) to tolerate off-chip memory latency, a significant bottleneck for memory-bound applications. However, parallel threads generate a large number of memory requests, which raises average memory latency and degrades cache performance through heavy contention. Prefetching is an effective technique for reducing memory access latency, and prior research shows the positive impact of stride-based prefetching on GPU performance. However, existing prefetching methods rely only on fixed strides. To address this limitation, this paper proposes a new prefetching technique, Snake, built upon chains of variable strides combined with throttling and memory-decoupling strategies. Snake achieves 80% coverage and 75% accuracy in prefetching demand memory requests, yielding a 17% improvement in overall GPU performance and energy consumption for memory-bound General-Purpose Graphics Processing Unit (GPGPU) applications.
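To make the distinction from fixed-stride prefetching concrete, the following is a minimal, illustrative sketch (not the paper's implementation; class name, table sizes, and degree are hypothetical) of a prefetcher that learns a chain of possibly different strides from a demand-address stream and replays that chain to generate prefetch candidates:

```python
# Illustrative sketch only: a toy variable-stride "chain" prefetcher.
# A fixed-stride prefetcher would predict addr + k*stride for one stride;
# here the predictor records the last few observed strides (which may all
# differ) and replays them in order. This does not model Snake's throttling
# or memory-decoupling mechanisms.

class ChainPrefetcher:
    """Learns the chain of strides between consecutive demand addresses
    and predicts future addresses by replaying that chain."""

    def __init__(self, chain_len=2, degree=2):
        self.chain_len = chain_len  # number of recent strides kept as the chain
        self.degree = degree        # number of prefetch candidates per access
        self.last_addr = None
        self.strides = []           # most recently observed strides, in order

    def observe(self, addr):
        """Record a demand address; return a list of prefetch candidates."""
        if self.last_addr is not None:
            self.strides.append(addr - self.last_addr)
            self.strides = self.strides[-self.chain_len:]
        self.last_addr = addr
        if len(self.strides) < self.chain_len:
            return []  # chain not yet warmed up
        # Replay the learned chain of (possibly different) strides.
        candidates, a = [], addr
        for i in range(self.degree):
            a += self.strides[i % self.chain_len]
            candidates.append(a)
        return candidates
```

For an access pattern with alternating strides of 8 and 4 (addresses 0, 8, 12, 20, 24, ...), a fixed-stride prefetcher would mispredict on every other access, whereas replaying the two-stride chain predicts the next addresses correctly.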

Citation (APA)

Mostofi, S., Falahati, H., Mahani, N., Lotfi-Kamran, P., & Sarbazi-Azad, H. (2023). Snake: A Variable-length Chain-based Prefetching for GPUs. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 (pp. 728–741). Association for Computing Machinery, Inc. https://doi.org/10.1145/3613424.3623782
