Abstract
The execution of numerically intensive programs presents a challenge to memory system designers. Pipelined arithmetic units can accelerate numerical program execution, but to be effective they must be supported by high-speed memory access. A cache memory is a well-known hardware mechanism for reducing the average memory access latency. Numerical programs, however, often have poor cache performance. Stride directed prefetching has been proposed to improve the cache performance of numerical programs executing on a vector processor. This paper shows how this approach can be extended to a scalar processor by using a simple hardware mechanism, called a stride prediction table (SPT), to calculate the stride distances of array accesses made from within the loop body of a program. Results for selected programs from the PERFECT and SPEC benchmarks show that stride directed prefetching on a scalar processor can significantly reduce the cache miss rate of particular programs, and that an SPT needs only a small number of entries to be effective.
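As an illustration of the idea only, the following Python sketch models how a table of this kind could derive a stride and a prefetch address. The abstract does not specify the table's organization, so everything here is an assumption: the direct-mapped layout, indexing by instruction address (`pc`), the class name `StridePredictionTable`, and the rule of prefetching `addr + stride` whenever the stride is non-zero.

```python
class StridePredictionTable:
    """Toy model of a stride prediction table (hypothetical organization).

    Assumption: direct-mapped, indexed by the address of the load/store
    instruction, with each slot remembering the last data address that
    instruction touched.
    """

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # Each slot holds (instruction_address, last_data_address) or None.
        self.slots = [None] * num_entries

    def access(self, pc, addr):
        """Record one memory access; return a prefetch address or None."""
        index = pc % self.num_entries
        entry = self.slots[index]
        # Always remember the most recent data address for this instruction.
        self.slots[index] = (pc, addr)
        if entry is None or entry[0] != pc:
            return None  # no history for this instruction in this slot
        stride = addr - entry[1]
        if stride == 0:
            return None  # same address again: nothing useful to prefetch
        # Predict that the next iteration will access addr + stride.
        return addr + stride
```

For example, if a single load instruction at a hypothetical address `0x40` accesses data addresses 1000, 1032, and 1064 on successive loop iterations, the first call to `access` returns `None` (no history yet), the second returns 1064, and the third returns 1096. Two instructions that map to the same slot evict each other's history, which is one reason the number of entries matters in a model like this.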
Cite
Fu, J. W. C., Patel, J. H., & Janssens, B. L. (1992). Stride directed prefetching in scalar processors. In Proceedings of the 25th Annual International Symposium on Microarchitecture (pp. 102–110). ACM. https://doi.org/10.1145/144965.145006