Increasing the instruction fetch rate via multiple branch prediction and a branch address cache


Abstract

High-performance computer implementation today is increasingly directed toward parallelism in the hardware. Superscalar machines, where the hardware can issue more than one instruction each cycle, are being adopted by more implementations. As the trend toward wider issue rates continues, so too must the ability to fetch more instructions each cycle. Although compilers can improve the situation by increasing the size of basic blocks, hardware mechanisms to fetch multiple, possibly non-consecutive, basic blocks are also needed. Viable mechanisms for fetching multiple non-consecutive basic blocks have not been previously investigated. We present a mechanism for predicting multiple branches and fetching multiple non-consecutive basic blocks each cycle which is both viable and effective. We measured the effectiveness of the mechanism in terms of IPC_f, the number of instructions fetched per clock for a machine front-end. For one, two, and three basic blocks, the IPC_f of integer benchmarks went from 3.0 to 4.2 and 4.9, respectively. For floating-point benchmarks, the IPC_f went from 6.6 to 7.1 and 8.9.
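To put the reported fetch rates in perspective, the following minimal sketch computes the relative gain over single-block fetch from the figures quoted in the abstract. It is plain arithmetic on those numbers, not a model of the authors' mechanism; the names `REPORTED_IPC_F`, `relative_gain`, and `gains_over_single_block` are illustrative and do not come from the paper.

```python
# Fetch rates quoted in the abstract, keyed by the number of basic blocks
# fetched per cycle. The dictionary layout is an assumption made for
# illustration only.
REPORTED_IPC_F = {
    "integer": {1: 3.0, 2: 4.2, 3: 4.9},
    "floating point": {1: 6.6, 2: 7.1, 3: 8.9},
}


def relative_gain(baseline, value):
    """Return the fractional improvement of value over baseline."""
    return (value - baseline) / baseline


def gains_over_single_block(ipc_by_blocks):
    """Map each multi-block configuration to its gain over one block."""
    baseline = ipc_by_blocks[1]
    return {
        blocks: relative_gain(baseline, ipc)
        for blocks, ipc in ipc_by_blocks.items()
        if blocks > 1
    }


if __name__ == "__main__":
    for suite, ipc in REPORTED_IPC_F.items():
        for blocks, gain in gains_over_single_block(ipc).items():
            print(f"{suite}: {blocks} blocks per cycle, {gain:.0%} gain in fetch rate")
```

Read this way, fetching a second basic block helps the integer benchmarks proportionally more than the floating-point ones, which start from a fetch rate more than twice as high.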

Cite

Yeh, T. Y., Marr, D. T., & Patt, Y. N. (1993). Increasing the instruction fetch rate via multiple branch prediction and a branch address cache. In Proceedings of the International Conference on Supercomputing (Vol. Part F129670, pp. 67–76). Association for Computing Machinery. https://doi.org/10.1145/165939.165956