As CPU core counts continue to increase, the gap between compute power and available memory bandwidth has widened. A larger and deeper cache hierarchy benefits locality-friendly computation, but offers limited improvement to irregular, data-intensive applications. In this work we explore a novel approach to accelerating these applications through in-memory data restructuring. Unlike other proposed processing-in-memory architectures, the rearrangement hardware performs data reduction, not compute offload. Using a custom FPGA emulator, we quantitatively evaluate the performance and energy benefits of near-memory hardware structures that dynamically restructure in-memory data into a cache-friendly layout, minimizing wasted memory bandwidth. Our results on representative irregular benchmarks using the Micron Hybrid Memory Cube memory model show speedup, bandwidth savings, and energy reduction. We present an API for the near-memory accelerator and describe the interaction between the CPU and the rearrangement hardware with application examples. We also explore the merits of an SRAM versus a DRAM scratchpad buffer for the rearranged data.
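To make the CPU/accelerator interaction concrete, the sketch below models a gather/scatter-style rearrangement API in plain C. The nm_* names, signatures, and software-only copy loops are illustrative assumptions, not the authors' actual interface; in the proposed design, the fill and drain would be performed by hardware inside the memory package, staging data through the SRAM or DRAM scratchpad.

/*
 * Minimal software model of a near-memory gather/scatter API, sketched
 * after the abstract's description. All names and signatures here are
 * hypothetical; a real implementation would perform the copies next to
 * DRAM, so only the gathered elements cross the memory link.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef struct {
    void     *base;       /* home location of the scattered elements  */
    uint64_t *indices;    /* element positions to gather              */
    size_t    count;      /* number of elements                       */
    size_t    elem_size;  /* bytes per element                        */
    uint8_t  *dense;      /* scratchpad: rearranged, cache-friendly   */
} nm_view_t;

/* Gather scattered elements into a dense buffer (stand-in for the
 * rearrangement hardware filling its scratchpad). */
static nm_view_t *nm_gather(void *base, uint64_t *indices,
                            size_t count, size_t elem_size)
{
    nm_view_t *v = malloc(sizeof *v);
    v->base = base;  v->indices = indices;
    v->count = count; v->elem_size = elem_size;
    v->dense = malloc(count * elem_size);
    for (size_t i = 0; i < count; i++)
        memcpy(v->dense + i * elem_size,
               (uint8_t *)base + indices[i] * elem_size, elem_size);
    return v;
}

/* Scatter the (possibly modified) dense elements back to their home
 * locations and release the scratchpad region. */
static void nm_scatter_release(nm_view_t *v)
{
    for (size_t i = 0; i < v->count; i++)
        memcpy((uint8_t *)v->base + v->indices[i] * v->elem_size,
               v->dense + i * v->elem_size, v->elem_size);
    free(v->dense);
    free(v);
}

int main(void)
{
    double   table[8] = {0};
    uint64_t idx[3]   = {6, 1, 4};       /* an irregular access pattern */

    nm_view_t *v = nm_gather(table, idx, 3, sizeof(double));
    double *dense = (double *)v->dense;  /* CPU now streams unit-stride */
    for (size_t i = 0; i < 3; i++)
        dense[i] += 1.0;
    nm_scatter_release(v);               /* results land back in place  */

    printf("%g %g %g\n", table[1], table[4], table[6]);  /* 1 1 1 */
    return 0;
}

In the hardware version of this pattern, the CPU would touch only the dense scratchpad region with unit-stride accesses, which is where the bandwidth and energy savings described in the abstract would come from.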
Gokhale, M., Lloyd, S., & Hajas, C. (2015). Near memory data structure rearrangement. In Proceedings of the 2015 International Symposium on Memory Systems (MEMSYS '15), pp. 283–290. ACM. https://doi.org/10.1145/2818950.2818986