Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip

19 citations · 22 Mendeley readers

Abstract

Local memory is a key factor in the performance of accelerators in SoCs. Despite technology scaling, the gap between on-chip storage and the memory footprint of embedded applications keeps widening. We present a solution that preserves the speedup of accelerators when scaling from small to large data sets. Combining specialized DMA and address translation with a software layer in Linux, our design is transparent to user applications and broadly applicable to any class of SoC hosting high-throughput accelerators. We demonstrate the robustness of our design across many heterogeneous workload scenarios and memory-allocation policies with FPGA-based SoC prototypes featuring twelve concurrent accelerators accessing up to 768 MB out of 1 GB of addressable DRAM.

Citation (APA)

Mantovani, P., Cota, E. G., Pilato, C., Di Guglielmo, G., & Carloni, L. P. (2016). Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2016. Association for Computing Machinery, Inc. https://doi.org/10.1145/2968455.2968509
