Abstract
To increase the performance of data-intensive applications, we present an extension to a CPU architecture that enables arbitrary near-data processing capabilities close to the main memory. This is realized by introducing two components: one attached to the CPU system bus and one at the memory side. Together, they provide hardware-managed coherence and virtual memory support to integrate the near-data processors into a shared-memory environment. We present an implementation of the components, as well as a system simulator that provides detailed performance estimations. With a variety of synthetic workloads, we demonstrate the performance of the memory accesses, the mixed fine- and coarse-grained coherence mechanisms, and the near-data processor communication mechanism. Furthermore, we quantify the inevitable start-up penalty regarding coherence and data writeback, and argue that near-data processing workloads should access data several times to offset this penalty. A case study based on the Graph500 benchmark confirms the small overhead of the proposed coherence mechanisms and shows the ability to outperform a real CPU by a factor of two.
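The argument that near-data processing workloads should access data several times to offset the start-up penalty can be illustrated with a simple amortization model. The sketch below assumes, purely for illustration, that the coherence and writeback penalty is a fixed one-time cost per offload and that each pass over the data costs a constant time on the CPU and on the near-data processor. The function name `break_even_passes` and all numbers are hypothetical and are not taken from the paper's measurements.

```python
import math


def break_even_passes(startup_penalty: float, cpu_pass: float, ndp_pass: float) -> int:
    """Smallest number of passes n such that
    startup_penalty + n * ndp_pass <= n * cpu_pass.

    Hypothetical linear cost model: a fixed start-up penalty (coherence
    and writeback) plus a constant per-pass cost on each processor.
    Returns -1 if the near-data pass is not faster, in which case the
    penalty is never amortized.
    """
    gain_per_pass = cpu_pass - ndp_pass
    if gain_per_pass <= 0:
        return -1
    return math.ceil(startup_penalty / gain_per_pass)


if __name__ == "__main__":
    # Hypothetical values in arbitrary time units.
    print(break_even_passes(startup_penalty=10.0, cpu_pass=4.0, ndp_pass=2.0))
```

With these hypothetical values, each pass saves 2 time units, so the 10-unit penalty is recovered after 5 passes; a workload that touches its data only once or twice would be slower on the near-data processor under this model.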
Citation
Vermij, E., Fiorin, L., Jongerius, R., Hagleitner, C., Van Lunteren, J., & Bertels, K. (2017). An architecture for integrated near-data processors. ACM Transactions on Architecture and Code Optimization, 14(3). https://doi.org/10.1145/3127069