NeuMMU: Architectural support for efficient address translations in neural processing units


Abstract

To satisfy the compute and memory demands of deep neural networks (DNNs), neural processing units (NPUs) are widely used to accelerate DNNs. Similar to how GPUs evolved from a slave device into a mainstream processor architecture, it is likely that NPUs will become first-class citizens in this fast-evolving heterogeneous architecture space. This paper makes a case for enabling address translation in NPUs to decouple the virtual and physical memory address spaces. Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored for NPUs. Compared to an oracular MMU design point, our proposal incurs only a 0.06% performance overhead on average.
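The abstract's core idea, decoupling virtual from physical addresses via an MMU, can be illustrated with a toy translation model. The sketch below is a generic, simplified MMU (a flat page table plus a small TLB cache); it is purely illustrative and does not represent the NPU-tailored design the paper proposes. All names (`SimpleMMU`, `PAGE_SIZE`, the mapping layout) are hypothetical.

```python
PAGE_SIZE = 4096  # 4 KiB pages, a common baseline assumption


class SimpleMMU:
    """Toy MMU: a flat page-table lookup fronted by a tiny TLB.

    Illustrative only; a real (or NPU-tailored) MMU involves multi-level
    page-table walks, permission checks, and fault handling.
    """

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # cached translations
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr):
        # Split the virtual address into a page number and an in-page offset.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.tlb_hits += 1
            pfn = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            pfn = self.page_table[vpn]  # "page-table walk"; KeyError models a page fault
            self.tlb[vpn] = pfn
        return pfn * PAGE_SIZE + offset
```

Because translations are cached in the TLB, repeated accesses to the same page avoid the page-table walk, which is exactly the kind of locality behavior that prior GPU-centric schemes exploit and that the paper re-examines for NPU memory access patterns.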

Citation (APA)

Hyun, B., Kwon, Y., Choi, Y., Kim, J., & Rhu, M. (2020). NeuMMU: Architectural support for efficient address translations in neural processing units. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (pp. 1109–1124). Association for Computing Machinery. https://doi.org/10.1145/3373376.3378494
