Abstract
High-performance computer implementation today is increasingly directed toward parallelism in the hardware. Superscalar machines, where the hardware can issue more than one instruction each cycle, are being adopted by more implementations. As the trend toward wider issue rates continues, so too must the ability to fetch more instructions each cycle. Although compilers can improve the situation by increasing the size of basic blocks, hardware mechanisms to fetch multiple, possibly non-consecutive, basic blocks are also needed. Viable mechanisms for fetching multiple non-consecutive basic blocks have not been previously investigated. We present a mechanism for predicting multiple branches and fetching multiple non-consecutive basic blocks each cycle that is both viable and effective. We measured the effectiveness of the mechanism in terms of IPC_f, the number of instructions fetched per clock for a machine front-end. When fetching one, two, and three basic blocks per cycle, the IPC_f of integer benchmarks was 3.0, 4.2, and 4.9, respectively. For floating point benchmarks, the IPC_f was 6.6, 7.1, and 8.9.
Cite
Yeh, T. Y., Marr, D. T., & Patt, Y. N. (1993). Increasing the instruction fetch rate via multiple branch prediction and a branch address cache. In Proceedings of the International Conference on Supercomputing (Vol. Part F129670, pp. 67–76). Association for Computing Machinery. https://doi.org/10.1145/165939.165956