Abstract
Graphics Processing Units (GPUs) rely on the memory hierarchy and Thread-Level Parallelism (TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound applications. However, parallel threads generate a large number of memory requests, which increases the average memory latency and degrades cache performance due to high contention. Prefetching is an effective technique for reducing memory access latency, and prior research shows the positive impact of stride-based prefetching on GPU performance. However, existing prefetching methods rely only on fixed strides. To address this limitation, this paper proposes a new prefetching technique, Snake, which is built upon chains of variable strides and uses throttling and memory-decoupling strategies. Snake achieves 80% coverage and 75% accuracy in prefetching demand memory requests, improving both overall performance and energy consumption by 17% for memory-bound General-Purpose Graphics Processing Unit (GPGPU) applications.
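To illustrate the core idea of chain-based prefetching with variable strides, here is a minimal sketch. It is not the paper's Snake microarchitecture (it omits the throttling and memory-decoupling mechanisms the abstract mentions); the `ChainPrefetcher` class, its parameters, and the per-stream tracking scheme are all assumptions made for illustration. The sketch records a short chain of recent strides per access stream and predicts upcoming addresses by replaying that chain, which lets it follow patterns a single fixed stride cannot capture.

```python
from collections import defaultdict, deque


class ChainPrefetcher:
    """Illustrative variable-stride chain prefetcher (hypothetical sketch,
    not the paper's Snake design)."""

    def __init__(self, chain_len=4, degree=2):
        self.chain_len = chain_len        # max strides kept per chain
        self.degree = degree              # prefetches issued per access
        self.last_addr = {}               # stream id -> last address seen
        self.chains = defaultdict(deque)  # stream id -> recent strides

    def access(self, stream, addr):
        """Record a demand access; return predicted prefetch addresses."""
        prefetches = []
        if stream in self.last_addr:
            stride = addr - self.last_addr[stream]
            chain = self.chains[stream]
            chain.append(stride)
            if len(chain) > self.chain_len:
                chain.popleft()
            # Replay the stride chain to predict the next `degree`
            # addresses; a repeating stride pattern is followed exactly
            # once the chain has captured one full period.
            next_addr = addr
            for i in range(self.degree):
                next_addr += chain[i % len(chain)]
                prefetches.append(next_addr)
        self.last_addr[stream] = addr
        return prefetches


# Example: an alternating 8/16-byte stride pattern that a fixed-stride
# prefetcher would mispredict half the time.
p = ChainPrefetcher(chain_len=4, degree=2)
p.access(0, 0)                 # first access: no history, no prefetch
print(p.access(0, 8))          # stride 8 learned
print(p.access(0, 24))         # stride 16 learned; chain is now [8, 16]
```

Once the chain holds the full `[8, 16]` period, the replay step predicts the alternating addresses correctly, which is the intuition behind preferring stride chains over a single fixed stride.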
Citation
Mostofi, S., Falahati, H., Mahani, N., Lotfi-Kamran, P., & Sarbazi-Azad, H. (2023). Snake: A Variable-length Chain-based Prefetching for GPUs. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 (pp. 728–741). Association for Computing Machinery, Inc. https://doi.org/10.1145/3613424.3623782