Prefetching is a highly effective latency hiding technique that can greatly improve application performance. However, aggressive prefetching can potentially stress the off-chip bandwidth. The resulting bandwidth stalls can potentially negate the performance gain due to prefetching. In this paper, focusing on a multicore environment, we first study the comparative benefits of hardware and software prefetching and analyze if the two are complimentary or redundant. This analysis also evaluates different aggressiveness levels of hardware prefetching. Secondly, we weigh the positive performance benefits of prefetching against the negative performance effects of bandwidth stalls. Thirdly, we propose a hierarchical prefetch management scheme for multicores that controls the prefetch levels such that the overall performance gain is improved. Lastly, we show that our proposed off-chip bandwidth aware prefetch management scheme is very effective in practice, leading to performance gains of upto about 10% in system throughput over a bandwidth agnostic prefetching scheme. © 2011 Springer-Verlag.
CITATION STYLE
Muralidhara, S. P., Kandemir, M., & Zhang, Y. (2011). Bandwidth constrained coordinated HW/SW prefetching for multicores. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6852 LNCS, pp. 310–325). https://doi.org/10.1007/978-3-642-23400-2_29
Mendeley helps you to discover research relevant for your work.