Local memory is a key factor for the performance of accelerators in SoCs. Despite technology scaling, the gap between on-chip storage capacity and the memory footprint of embedded applications keeps widening. We present a solution to preserve the speedup of accelerators when scaling from small to large data sets. Combining specialized DMA and address translation with a software layer in Linux, our design is transparent to user applications and broadly applicable to any class of SoCs hosting high-throughput accelerators. We demonstrate the robustness of our design across many heterogeneous workload scenarios and memory allocation policies with FPGA-based SoC prototypes featuring twelve concurrent accelerators accessing up to 768 MB out of 1 GB of addressable DRAM.
Mantovani, P., Cota, E. G., Pilato, C., Di Guglielmo, G., & Carloni, L. P. (2016). Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2016. Association for Computing Machinery, Inc. https://doi.org/10.1145/2968455.2968509