Abstract
Hardware prefetching is a latency-hiding technique that hides the costly off-chip DRAM accesses. However, state-of-the-art prefetchers fail to deliver performance improvement in the case of many-core systems with constrained DRAM bandwidth. For SPEC CPU2017 homogeneous workloads, the state-of-the-art Berti L1 prefetcher, on a 64-core system with four and eight DRAM channels, incurs performance slowdowns of 24% and 16%, respectively. However, Berti improves performance by 35% if we use an unrealistic configuration of 64 DRAM channels for a 64-core system (one DRAM channel per core). Prior approaches such as prefetch throttling and critical load prefetching are not effective in the presence of state-of-the-art prefetchers. Existing load criticality predictors fail to detect loads that are critical in the presence of hardware prefetching and the best predictor provides an average critical load prediction accuracy of 41%. Existing prefetch throttling techniques use prefetch accuracy as one of the primary metrics. However, these techniques offer limited benefits for state-of-the-art prefetchers that deliver high prefetch accuracy and use prefetcher-specific throttling and filtering. We propose CLIP, a novel load criticality predictor for hardware prefetching with constrained DRAM bandwidth. Our load criticality predictor provides an average accuracy of more than 93% and as high as 100%. CLIP also filters out the critical loads that lead to accurate prefetching. For a 64-core system with eight DRAM channels, CLIP improves the effectiveness of state-of-the-art Berti prefetcher by 24% and 9% for 45 and 200 64-core homogeneous and heterogeneous workload mixes, respectively. We show that CLIP is equally effective in the presence of other state-of-the-art L1 and L2 prefetchers. Overall, CLIP incurs a storage overhead of 1.56KB/core.
Author supplied keywords
Cite
CITATION STYLE
Panda, B. (2023). CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core Systems. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 (pp. 714–727). Association for Computing Machinery, Inc. https://doi.org/10.1145/3613424.3614245
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.