Near memory data structure rearrangement


Abstract

As CPU core counts continue to increase, the gap between compute power and available memory bandwidth has widened. A larger and deeper cache hierarchy benefits locality-friendly computation, but offers limited improvement to irregular, data-intensive applications. In this work we explore a novel approach to accelerating these applications through in-memory data restructuring. Unlike other proposed processing-in-memory architectures, the rearrangement hardware performs data reduction, not compute offload. Using a custom FPGA emulator, we quantitatively evaluate performance and energy benefits of near-memory hardware structures that dynamically restructure in-memory data to a cache-friendly layout, minimizing wasted memory bandwidth. Our results on representative irregular benchmarks using the Micron Hybrid Memory Cube memory model show speedup, bandwidth savings, and energy reduction. We present an API for the near-memory accelerator and describe the interaction between the CPU and the rearrangement hardware with application examples. The merits of an SRAM vs. a DRAM scratchpad buffer for rearranged data are explored.

Citation (APA)
Gokhale, M., Lloyd, S., & Hajas, C. (2015). Near memory data structure rearrangement. In ACM International Conference Proceeding Series (Vol. 05-08-October-2015, pp. 283–290). Association for Computing Machinery. https://doi.org/10.1145/2818950.2818986
