The relationship between the degree of central processor pipelining and performance is examined in the context of modern supercomputers. Limitations due to instruction dependencies are studied via simulations of the CRAY-1S for both scalar and vector code. It is shown that instruction dependencies severely limit scalar code performance as well as overall performance. Latch overhead, which is primarily caused by the difference between maximum and minimum gate propagation delays, is studied analytically in order to obtain a lower bound on the clock period that may be used in a pipelined system. This analysis also touches on other points related to latch clocking, and shows that, for short pipeline segments, the Earle latch and the polarity hold latch give the same clock period bound for both single-phase and multiphase clocks. Overheads due to data skew and unintentional clock skew are each added to the CRAY-1S simulation model. Simulation results with realistic assumptions show that eight to ten gate levels per pipeline segment lead to optimal overall performance. The results also show that, for short pipeline segments, data skew and clock skew contribute about equally to the degradation in performance.
Kunkel, S. R., & Smith, J. E. (1986). Optimal pipelining in supercomputers. In Conference Proceedings - Annual Symposium on Computer Architecture (pp. 404–411). IEEE. https://doi.org/10.1145/17356.17403