Performance modelling and optimization of memory access on cellular computer architecture Cyclops64

Abstract

This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to optimizing the performance of the SVD (Singular Value Decomposition) benchmark. A performance model for dissecting the total execution cycles is presented. Data preloading, using either "memcpy" or hand-optimized inline assembly code, and the loop unrolling approach are implemented and compared in terms of the total number of memory access cycles. The key idea is to preload data from off-chip to on-chip memory and store the data back after the computation. These approaches can reduce the total memory access cycles and can thus improve the benchmark performance significantly. © IFIP International Federation for Information Processing 2005.
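
The preload, compute, store-back pattern described in the abstract can be sketched in portable C. This is a minimal illustration and not the authors' code: the function name `scale_preloaded`, the buffer `onchip_buf`, the capacity `TILE`, and the choice of a simple scaling kernel are all assumptions made for the example, and an ordinary static array stands in for Cyclops64 on-chip memory. The sketch stages data through the buffer with `memcpy`, runs a loop unrolled by four on the staged copy, and writes the result back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the on-chip staging buffer, in doubles. */
#define TILE 256

/* Stand-in for on-chip memory. On real hardware this buffer would be
 * placed in the fast on-chip region; here it is just a static array. */
static double onchip_buf[TILE];

/* Multiply n doubles that live in "off-chip" memory by alpha,
 * staging each tile through onchip_buf. */
void scale_preloaded(double *offchip, size_t n, double alpha)
{
    assert(offchip != NULL || n == 0);

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        /* 1. Preload: one bulk copy from off-chip to on-chip memory. */
        memcpy(onchip_buf, offchip + base, len * sizeof(double));

        /* 2. Compute on the on-chip copy, unrolled by four so each
         *    iteration does more work per loop-control overhead. */
        size_t i = 0;
        for (; i + 4 <= len; i += 4) {
            onchip_buf[i]     *= alpha;
            onchip_buf[i + 1] *= alpha;
            onchip_buf[i + 2] *= alpha;
            onchip_buf[i + 3] *= alpha;
        }
        /* Remainder loop for tile lengths that are not a multiple of four. */
        for (; i < len; i++) {
            onchip_buf[i] *= alpha;
        }

        /* 3. Store back: one bulk copy from on-chip to off-chip memory. */
        memcpy(offchip + base, onchip_buf, len * sizeof(double));
    }
}
```

In this sketch every element crosses the slow memory boundary twice, once in each bulk copy, regardless of how many times the compute step touches it; any saving would come from kernels that reuse the staged data more than once, which the simple scaling loop here does not.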

Cite

Niu, Y., Hu, Z., Barner, K., & Gao, G. R. (2005). Performance modelling and optimization of memory access on cellular computer architecture Cyclops64. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3779 LNCS, pp. 132–143). https://doi.org/10.1007/11577188_18
