Adding a vector unit to a superscalar processor

Abstract

This paper focuses on adding a vector unit to a superscalar core as a way to scale current state-of-the-art superscalar processors. The proposed architecture has a vector register file that shares functional units with both the integer datapath and the floating-point datapath. A key point in our proposal is the design of a high-performance cache interface that delivers high bandwidth to the vector unit at low cost and low latency. We propose a double-banked cache with alignment circuitry to serve vector accesses, and we study two cache hierarchies: one feeds the vector unit from the L1, the other from the L2. Our results show that large IPU values (higher than 10 in some cases) can be achieved. Moreover, scaling our architecture simply requires adding functional units, without requiring more issue bandwidth. As a consequence, the proposed vector unit achieves high performance for numerical and multimedia codes with minimal impact on the cycle time of the processor or on the performance of integer codes.
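As a rough illustration of why a double-banked cache with alignment circuitry can serve an unaligned vector access, the sketch below models one possible organization. It is not taken from the paper: the line-interleaved bank mapping, the 4-word line size, and the names `LINE_WORDS` and `read_vector` are all assumptions made for illustration only.

```python
LINE_WORDS = 4  # words per cache line (assumed value, not from the paper)


def read_vector(memory, start, length=LINE_WORDS):
    """Return `length` consecutive words starting at word address `start`.

    Hypothetical model: even-numbered lines live in bank 0 and odd-numbered
    lines in bank 1, so any two consecutive lines sit in different banks and
    could be read in parallel. An alignment step then selects the wanted words.
    """
    assert length <= LINE_WORDS
    first_line = start // LINE_WORDS
    offset = start % LINE_WORDS

    # Read the two consecutive lines, one from each bank.
    lines = {}
    for line in (first_line, first_line + 1):
        bank = line % 2
        base = line * LINE_WORDS
        lines[bank] = memory[base:base + LINE_WORDS]

    # Alignment: put the lines in address order, then shift by the offset.
    first_bank = first_line % 2
    combined = lines[first_bank] + lines[1 - first_bank]
    return combined[offset:offset + length]


memory = list(range(32))
print(read_vector(memory, 6))  # access straddles lines 1 and 2
```

Under these assumptions, an access that straddles a line boundary still touches each bank exactly once, which is the property that would let such an interface deliver a full vector's worth of words per access.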

Cite

Quintana, F., Corbal, J., Espasa, R., & Valero, M. (1999). Adding a vector unit to a superscalar processor. Proceedings of the International Conference on Supercomputing, 1–10. https://doi.org/10.1145/305138.305148
