Many modern workloads compute on large amounts of data, often with irregular memory accesses. Current architectures perform poorly for these workloads, as existing prefetching techniques cannot capture the memory access patterns; these applications end up heavily memory-bound as a result. Although a number of techniques exist to explicitly configure a prefetcher with traversal patterns, gaining significant speedups, they do not generalise beyond their target data structures. Instead, we propose an event-triggered programmable prefetcher combining the flexibility of a general-purpose computational unit with an event-based programming model, along with compiler techniques to automatically generate events from the original source code with annotations. This allows more complex fetching decisions to be made, without needing to stall when intermediate results are required. Using our programmable prefetching system, combined with small prefetch kernels extracted from applications, we achieve an average 3.0× speedup in simulation for a variety of graph, database and HPC workloads.
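To make the event-based idea concrete, the following is a minimal toy model, not the paper's hardware or programming interface. It assumes a simple indirect access pattern, `data[index_array[i]]`, and two hypothetical handlers: one that fires when the core loads an index and requests a later index, and one that fires when that prefetched index arrives and uses its value to request the dependent data element. All names (`run_prefetch_events`, `on_demand_load`, `on_index_arrived`, `lookahead`) are illustrative assumptions.

```python
from collections import deque


def run_prefetch_events(index_array, data_len, lookahead):
    """Toy model of event-triggered prefetching for data[index_array[i]].

    Returns the ordered list of prefetch requests as (array_name, offset) pairs.
    This is a software illustration only; it models no timing or caches.
    """
    events = deque()   # pending "prefetched data has arrived" events
    prefetched = []    # prefetch requests, in the order they were issued

    def on_demand_load(i):
        # Event 1 (hypothetical): the core loads index_array[i].
        # React by requesting the index that is `lookahead` iterations ahead.
        j = i + lookahead
        if j < len(index_array):
            prefetched.append(("index", j))
            events.append(("index_arrived", j))

    def on_index_arrived(j):
        # Event 2 (hypothetical): the prefetched index has arrived.
        # Its value is now known, so the dependent element can be requested
        # without the handler ever having stalled while waiting for it.
        target = index_array[j]
        if 0 <= target < data_len:
            prefetched.append(("data", target))

    for i in range(len(index_array)):
        on_demand_load(i)
        # Drain the events produced by this load before the next one.
        while events:
            kind, arg = events.popleft()
            if kind == "index_arrived":
                on_index_arrived(arg)

    return prefetched


requests = run_prefetch_events([3, 0, 2, 1], data_len=4, lookahead=2)
```

The point of the sketch is the split into two short handlers: the second step of the chain runs only once its input is available, which is how an event-based model can express a dependent fetch without a blocking wait.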
Ainsworth, S., & Jones, T. M. (2018). An event-triggered programmable prefetcher for irregular workloads. In ACM SIGPLAN Notices (Vol. 53, pp. 578–592). Association for Computing Machinery. https://doi.org/10.1145/3173162.3173189