Abstract
Many applications employ irregular and sparse memory accesses that cannot take advantage of the cache hierarchies in high-performance processors. To address this problem, Data Layout Transformation (DLT) techniques rearrange sparse data into a dense representation, improving locality and cache utilization. However, prior proposals in this space fail to provide a design that (i) scales with multi-core systems, (ii) hides rearrangement latency, and (iii) provides the interfaces needed to ease programmability. In this work we present Planar, a programmable near-memory accelerator that rearranges sparse data into a dense representation. By placing Planar devices at the memory controller level, we enable a design that scales well with multi-core systems, hides operation latency by performing non-blocking fine-grain data rearrangements, and eases programmability by supporting virtual memory and conventional memory allocation mechanisms. Our evaluation shows that Planar leads to significant reductions in data movement and dynamic energy, providing an average 4.58× speedup.
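As a rough illustration of what a data layout transformation does, the sketch below gathers irregularly indexed elements into a contiguous buffer, operates on the dense copy, and scatters the results back. It is a software-only analogy, not Planar's interface: the function names `gather_dense` and `scatter_back` and the sample data are assumptions made for this example, and the abstract does not describe the accelerator's actual programming model.

```python
from typing import List, Sequence


def gather_dense(data: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Copy the irregularly indexed elements into a contiguous buffer.

    Hypothetical helper: stands in for the sparse-to-dense rearrangement.
    """
    return [data[i] for i in indices]


def scatter_back(data: List[float], indices: Sequence[int],
                 dense: Sequence[float]) -> None:
    """Write the dense buffer back to its original sparse locations.

    Hypothetical helper: the inverse of gather_dense.
    """
    for i, value in zip(indices, dense):
        data[i] = value


# Sample data (assumed for illustration): 16 elements, 4 accessed irregularly.
data = [float(i) for i in range(16)]
indices = [12, 3, 9, 0]

# The compute kernel now walks a contiguous buffer instead of chasing indices.
dense = gather_dense(data, indices)
dense = [2.0 * x for x in dense]
scatter_back(data, indices, dense)
```

In this software version the gather and scatter loops themselves still perform the irregular accesses on the CPU. The abstract's claim is that doing the rearrangement near memory, in a non-blocking way, hides that latency from the cores.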
Citation
Barredo, A., Armejach, A., Beard, J. C., & Moretó, M. (2021). Planar: A programmable accelerator for near-memory data rearrangement. In Proceedings of the International Conference on Supercomputing (pp. 164–176). Association for Computing Machinery. https://doi.org/10.1145/3447818.3460368