Speculative clustered caches for clustered processors

1Citations
Citations of this article
1Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Clustering is a technique for partitioning superscalar processor's execution resources to simultaneously allow for more in-flight instructions, wider issue width, and more aggressive clock speeds. As either the size of individual clusters or the total number of clusters increases, the distance to the first level data cache increases as well. Although clustering may expose more parallelism by allowing a greater number of instructions to be simultaneously analyzed and issued, the gains may be obliterated if the latencies to memory grow too large. We propose to augment each cluster with a small, fast, simple Level Zero (L0) data cache that is accessed in parallel with a traditional L1 data cache. The difference between our solution and other proposed caching techniques for clustered processors is that we do not support versioning or coherence. This may occasionally result in a load instruction that reads a stale value from the L0 cache, but the common case is a low latency hit in the L0 cache. Our simulation studies show that 4KB, 2-way set associative L0 caches provide a 6.5-12.3% IPC improvement over a wide range of processor configurations. © 2002 Springer Berlin Heidelberg.

Cite

CITATION STYLE

APA

Henry, D. S., Loh, G. H., & Sami, R. (2002). Speculative clustered caches for clustered processors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2327 LNCS, pp. 281–290). https://doi.org/10.1007/3-540-47847-7_24

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free