Abstract
Traditionally, register files have been the primary agent for inter-operation communication in load/store architectures. As processors start issuing multiple instructions per cycle, a centralized register file can easily become a bottleneck. This paper analyzes the register file traffic in a load/store architecture with a view to motivating the development of alternative inter-operation communication mechanisms that reduce the bandwidth demanded of a centralized register file. We first provide metrics to characterize the register traffic. These metrics deal with the degree of use of the register instances created and the locality of that use. We then present the results of a simulation study that uses the MIPS R2000 architecture and the SPEC benchmark programs. We have two major results. First, a large number of the register instances are used only once, and the average degree of use of register instances is about 2. Second, most of the register instances are used up soon after they are created (within about 30-40 instructions). This suggests that alternative inter-operation communication mechanisms that exploit the temporal locality of use of register instances are likely to be effective in reducing the traffic burden on the centralized register file. The second result was pivotal in the design of the distributed register file for the multiscalar processing paradigm.
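To make the two metrics concrete, the following is a minimal sketch of how degree of use and locality of use might be measured over an instruction trace. It is not the simulator used in the study. The `Instr` record, the `analyze` function, and the exact definitions are assumptions made for illustration: a register instance is taken to begin at a write to a register and end at the next write to the same register, its degree of use is the number of operand reads in between, and its lifetime is the distance in instructions from creation to last read.

```python
from collections import namedtuple

# Hypothetical trace record: one destination register (or None) and a tuple
# of source registers. Not a format taken from the paper.
Instr = namedtuple("Instr", ["dest", "srcs"])


def analyze(trace):
    """Return (degrees, lifetimes) over all register instances in the trace.

    degrees:   number of operand reads of each instance
    lifetimes: instructions from creation to last read, for instances
               that were read at least once
    """
    live = {}  # register -> [created_at, use_count, last_use_at]
    degrees, lifetimes = [], []

    def retire(reg):
        created, uses, last = live.pop(reg)
        degrees.append(uses)
        if uses:
            lifetimes.append(last - created)

    for i, ins in enumerate(trace):
        # Reads are processed before the write, so "r1 = r1 + 1" counts as a
        # use of the old instance of r1, not the new one.
        for reg in ins.srcs:
            if reg in live:
                live[reg][1] += 1
                live[reg][2] = i
        if ins.dest is not None:
            if ins.dest in live:
                retire(ins.dest)  # overwritten: the old instance ends here
            live[ins.dest] = [i, 0, i]

    # Instances still live at the end of the trace are closed out as well.
    for reg in list(live):
        retire(reg)
    return degrees, lifetimes


if __name__ == "__main__":
    trace = [
        Instr("r1", ()),            # r1 created
        Instr("r2", ("r1",)),       # reads r1
        Instr("r3", ("r1", "r2")),  # reads r1 and r2
        Instr("r1", ("r3",)),       # reads r3, overwrites r1
        Instr(None, ("r1",)),       # e.g. a store reading the new r1
    ]
    degrees, lifetimes = analyze(trace)
    print("average degree of use:", sum(degrees) / len(degrees))
    print("longest lifetime:", max(lifetimes))
```

Under these assumed definitions, a histogram of `degrees` corresponds to the first result (many instances used once, average about 2) and a cumulative distribution of `lifetimes` to the second (most instances consumed within about 30-40 instructions).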
Cite
Franklin, M., & Sohi, G. S. (1992). Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors. In Proceedings of the 25th Annual International Symposium on Microarchitecture (pp. 236–245). ACM. https://doi.org/10.1145/144965.145818