Zero-Content Augmented Caches
- ISBN: 9781605584980
- DOI: 10.1145/1542275.1542288
Abstract
It has been observed that some applications manipulate large amounts of null data. Moreover, these zero data often exhibit high spatial locality. On some applications, more than 20% of the data accesses concern null data blocks. Representing a null block in a cache on a standard cache line appears to be a waste of resources. In this paper, we propose the Zero-Content Augmented cache, the ZCA cache. A ZCA cache consists of a conventional cache augmented with a specialized cache for memorizing null blocks, the Zero-Content cache or ZC cache. In the ZC cache, a data block is represented by its address tag and a validity bit. Moreover, as null blocks generally exhibit high spatial locality, several null blocks can be associated with a single address tag in the ZC cache. For instance, a ZC cache mapping 32MB of zero 64-byte lines uses less than 80KB of storage. Decompression of a null block is very simple; therefore, read access time on the ZCA cache is in the same range as that of a conventional cache. On applications manipulating large amounts of null data blocks, such a ZC cache significantly reduces the miss rate and memory traffic, and therefore increases performance for a small hardware overhead. In particular, the write-back traffic on null blocks is limited. For applications with a low null block rate, no performance loss is observed.
Julien Dusser
julien.dusser@inria.fr
Thomas Piquet
thomas.piquet@inria.fr
André Seznec
andre.seznec@inria.fr
Centre de recherche INRIA Rennes – Bretagne Atlantique
Campus de Beaulieu, 35042 Rennes Cedex, France
Categories and Subject Descriptors
B.3.2 [Hardware]: Memory Structures—Cache memories
General Terms
Design, Performance
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICS’09, June 8–12, 2009, Yorktown Heights, New York, USA.
Copyright 2009 ACM 978-1-60558-498-0/09/06 ...$5.00.
1. INTRODUCTION
It has been observed that some applications manipulate
large amounts of null data. Ekman and Stenstrom [7] showed
that in many applications much of the data in memory is null
and that, in many cases, complete 64-byte blocks are null.
This study was performed through dumping the memory
content. For SPEC2000 benchmarks, they report that 30%
of 64-byte memory blocks are only zeros, with some bench-
marks such as gcc exhibiting up to 75% of null blocks in
memory. While these results hold for static blocks in memory,
our experiments further show that on some applications
more than 20% of dynamic data accesses are accesses to
null 64-byte data blocks. Moreover, these zero data often
exhibit high spatial locality. Representing a null block
on a standard cache line wastes resources.
Null blocks could be represented in an adjunct cache
accessed in parallel with the main cache, as was suggested for
frequently used values with the Frequent Value Cache [23].
A null block would be represented by its address tag and
a single validity bit. In such an adjunct cache, the address
tag would constitute the major storage cost. However, the
spatial locality of null blocks can be leveraged. The Zero
Content Augmented cache, ZCA cache (Fig. 4) presented
in this paper associates a conventional cache with a zero-
content cache, ZC cache. The ZC cache only stores null
blocks. A ZC cache entry consists of an address tag and N
validity bits. Therefore a single ZC cache entry can map up
to N null blocks. The ZC cache is accessed in parallel with
the cache. The ZC cache can represent a large number of
null blocks at a very limited storage cost. For instance, if
block size is 64 bytes, the null blocks in an 8 KBytes page
can be represented with a single address tag and 128 validity
bits: a 4096-entry ZC cache can map up to 32 MBytes of
null blocks and uses only 78 KBytes of storage. While
more general compressed caches have been considered in several
previous studies [2, 23], the ZCA cache features very
simple compression/decompression hardware. Compression
just requires a tree of OR gates for detecting a null block.
Decompression does not induce extra access latency.
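The storage figures quoted above can be checked with a short calculation. The following C sketch is illustrative only: the entry layout and the names zc_entry, zc_mapped_bytes, zc_storage_bytes and zc_entry_hit are hypothetical, and the 28-bit tag width is an assumption chosen so that the total matches the 78 KBytes given in the text (the tag width itself is not stated there).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 64u   /* cache block size used in the text */
#define VALID_BITS  128u  /* one validity bit per 64-byte block of an 8 KByte page */
#define TAG_BITS    28u   /* ASSUMPTION: not given in the text, fits the 78 KByte total */

/* Hypothetical ZC cache entry: one address tag shared by VALID_BITS blocks.
 * For simplicity the full page number is stored; real hardware would keep
 * only the tag bits that are not implied by the set index. */
typedef struct {
    uint64_t page;                    /* page number acting as the address tag */
    uint64_t valid[VALID_BITS / 64];  /* bit i set: block i of the page is null */
} zc_entry;

/* Bytes of null data that `entries` ZC cache entries can map at most. */
uint64_t zc_mapped_bytes(uint64_t entries) {
    return entries * VALID_BITS * BLOCK_BYTES;
}

/* Storage in bytes: each entry holds a tag plus its validity bits. */
uint64_t zc_storage_bytes(uint64_t entries) {
    return entries * (TAG_BITS + VALID_BITS) / 8;
}

/* Returns 1 if the block containing `addr` is recorded as null in entry `e`. */
int zc_entry_hit(const zc_entry *e, uint64_t addr) {
    assert(e != NULL);
    uint64_t page = addr / ((uint64_t)VALID_BITS * BLOCK_BYTES);
    unsigned idx = (unsigned)((addr / BLOCK_BYTES) % VALID_BITS);
    if (page != e->page)
        return 0;                     /* tag mismatch */
    return (int)((e->valid[idx / 64] >> (idx % 64)) & 1u);
}
```

Under these assumptions, 4096 entries map 4096 x 128 x 64 bytes = 32 MBytes and occupy 4096 x (28 + 128) / 8 = 79872 bytes, i.e. 78 KBytes.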
On applications manipulating large amounts of null data
blocks, the ZC cache reduces both the miss rate of the
main cache and the memory traffic. Moreover, as a side
effect, some write-back traffic is suppressed: null blocks are
often overwritten with null data; the ZC cache captures this
situation and avoids writing back these blocks.
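As a software analogue of the OR-gate tree and of the write-back filtering described above, one might write the following C sketch. It is not the hardware design of the paper: the function names, the wb_action type and the was_null_in_memory flag (standing for "the copy in memory is already known to be null") are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 64u
#define BLOCK_WORDS (BLOCK_BYTES / 8u)  /* a block seen as eight 64-bit words */

/* OR all words of the block together: the block is null iff the result is 0.
 * Hardware would do this with a tree of OR gates rather than a loop. */
int is_null_block(const uint64_t *block) {
    assert(block != NULL);
    uint64_t acc = 0;
    for (size_t i = 0; i < BLOCK_WORDS; i++)
        acc |= block[i];
    return acc == 0;
}

/* Hypothetical decision taken when a dirty block leaves the main cache. */
typedef enum { WB_NONE, WB_TO_MEMORY } wb_action;

/* ASSUMPTION: if the block is null and the memory copy was already null,
 * the write-back carries no new information and can be dropped. */
wb_action on_dirty_eviction(const uint64_t *block, int was_null_in_memory) {
    if (is_null_block(block) && was_null_in_memory)
        return WB_NONE;
    return WB_TO_MEMORY;
}
```

A null block overwritten with null data is thus filtered out, whereas any block holding a non-zero word is still written back.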
The remainder of the paper is organized as follows. Section 2
analyzes the occurrences of accesses to null data blocks
through the whole memory hierarchy. Section 3 presents
the architecture of the ZCA cache. In Section 4, we present
our experimental framework. Section 5 presents the
performance evaluation of the ZCA cache. Section 6 discusses
related work. Section 7 shows that ZCA functionalities can
be added to a decoupled sectored cache at a very limited
hardware cost. Section 8 concludes this study.

          L1             L2              L3              Memory
Benchmark  NAPKI  APKI   NAPKI    APKI   NAPKI    APKI   NAPKI    APKI
gzip        2.79   235    0.44   12.27    0.34    1.20    0.32    0.85
wupwise     7.75   274    0.17    4.52    0.16    4.22    0.15    4.11
swim        0.06   429    0.06   50.72    0.06   44.25    0.06   34.04
mgrid       9.82   430    1.05   10.86    1.05   10.21    1.02    7.35
applu       0.39   345    0.07   14.16    0.07   13.68    0.07   13.67
vpr         2.74   268    1.55   21.00    0.54   11.42    0.01    1.13
gcc       107.11   417    9.54   28.63    6.38   20.26    0.52    1.16
mesa       10.19   333    0.58    2.58    0.58    1.23    0.58    1.15
art         0.63   274    0.53  145.13    0.53  145.10    0.53   87.15
mcf         0.06   509    0.03  153.96    0.03  129.19    0.03  102.37
equake      9.55   390    1.93   23.37    1.92   22.22    1.92   22.09
crafty      1.36   286    0.06   11.98    0.00    0.48    0.00    0.05
ammp        0.52   369    0.07   23.18    0.07   16.85    0.07    6.27
parser      4.79   297    0.59   13.09    0.44    5.73    0.14    2.32
sixtrack   25.15   228    0.21    1.34    0.17    0.81    0.07    0.25
bzip2       2.19   294    0.28   10.90    0.25    3.38    0.09    0.32
twolf       0.01   341    0.00   33.92    0.00   22.11    0.00    5.46
apsi       17.02   301    1.11   14.42    0.84    8.60    0.45    3.39
Table 1: Null block Accesses Per Kilo-Instruction (NAPKI) and Accesses Per Kilo-Instruction (APKI) on a
32KB L1 data cache, a 256KB L2 and a 1MB L3 cache with 64B blocks.
2. ACCESSES TO NULL DATA BLOCKS IN
APPLICATIONS
Storing a null memory block in the memory hierarchy can
be seen as a waste of cache space. Our study focuses on stor-
ing these null blocks in a compressed form. Such a mecha-
nism can be justified only if the accesses to null blocks rep-
resent a significant part of cache accesses. In this section,
we first analyze quantitatively the occurrences of accesses
to null data blocks in applications showing that some appli-
cations exhibit a quite significant amount of access to null
data blocks. Then, for two applications, we analyze how null
data blocks are used all along the execution.
2.1 Quantifying accesses to null blocks
Table 1 reports the dynamic occurrences of accesses to
null blocks in the different levels of a memory hierarchy for
the first 50 billion instructions of the SPEC CPU 2000 benchmarks,
thus eliminating initialization phase effects. We report
the frequency of accesses to null blocks, counting both misses
and write-backs, on a three-level memory hierarchy. The accesses
per kilo-instruction (APKI) at a given level consist of the misses
and write-backs from the previous level. Our experimental
framework is further described in Section 4.
We observe that most of the SPEC CPU 2000 applications
manipulate some null data blocks, but in very different
proportions. In particular, for some applications, e.g. mesa, gcc
and mgrid, more than 20% of the accesses flowing out to the
main memory concern null data blocks.
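The proportions discussed above follow directly from the NAPKI and APKI values. The short C sketch below shows the arithmetic; the function names are hypothetical, and we assume that the last NAPKI/APKI pair of each row of Table 1 corresponds to accesses reaching main memory.

```c
#include <assert.h>

/* Events per kilo-instruction, from a raw event count and an instruction count. */
double per_kilo_instruction(double events, double instructions) {
    assert(instructions > 0.0);
    return events * 1000.0 / instructions;
}

/* Fraction of the accesses at one level that concern null blocks. */
double null_fraction(double napki, double apki) {
    assert(apki > 0.0);
    return napki / apki;
}
```

With the last pair of values from Table 1, null_fraction(0.58, 1.15) is about 0.50 for mesa and null_fraction(0.52, 1.16) is about 0.45 for gcc.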
From Table 1, we can infer that avoiding traffic on null
data blocks, particularly on the main memory, may help to
improve performance on many applications. On our bench-
mark set, one can expect some performance gain on gzip,
wupwise, mgrid, gcc, equake, parser, bzip2 and apsi.
Figure 1 illustrates the proportion of accesses to null data
blocks on the main memory over 50 billion instructions,
each point representing an interval of one billion instructions.
This figure shows that while some applications (wupwise
and apsi) essentially manipulate null blocks during their
initialization phases, other applications manipulate significant
numbers of null blocks over the whole execution, e.g. mesa,
gcc, sixtrack and mgrid.
2.2 Null Block Usage Analysis
We analyze the heavy use of null data blocks in
two example applications, mesa and gcc.
static void Render( int frames, [...] )
{
    [...]
    for (i=0; i<frames; i++) {
        [...]
        glClear([...]);
        glPushMatrix();
        glRotatef(-Xrot, 1, 0, 0);
        glRotatef(Yrot, 0, 1, 0);
        glRotatef(-90, 1, 0, 0);
        SPECWriteIntermediateImage([...]);
        DrawMesh();
        glPopMatrix();
        Yrot += 5.0F;
    }
}
Figure 2: A code section manipulating a large number
of null blocks in mesa.
Figure 2 illustrates a code section in mesa. Null blocks are
manipulated all along the execution of the function Render.
Function glClear sets a whole buffer of 5MB (1280*1024*4)
to the default color, which is zero. This generates a lot of writes
of null blocks to the main memory. Then these null blocks
are read back and modified. This sequence is repeated for
each frame.
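To give an idea of the volume involved, the following C sketch counts the null 64-byte blocks in a buffer of the size mentioned above after it has been cleared to zero. It is only an illustration: memset stands in for the effect of glClear with a zero clear color, the function names are hypothetical, and the buffer is assumed to start on a block boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_BYTES 64u

/* Count the 64-byte blocks of `buf` whose bytes are all zero. */
size_t count_null_blocks(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    size_t nulls = 0;
    for (size_t off = 0; off + BLOCK_BYTES <= len; off += BLOCK_BYTES) {
        unsigned char acc = 0;
        for (size_t i = 0; i < BLOCK_BYTES; i++)
            acc |= buf[off + i];      /* stays 0 only if every byte is 0 */
        if (acc == 0)
            nulls++;
    }
    return nulls;
}

/* Allocate a frame buffer, clear it to zero and count its null blocks.
 * ASSUMPTION: memset models a clear to an all-zero color. */
size_t null_blocks_after_clear(size_t width, size_t height, size_t bytes_per_pixel) {
    size_t len = width * height * bytes_per_pixel;
    unsigned char *fb = malloc(len);
    assert(fb != NULL);
    memset(fb, 0, len);
    size_t n = count_null_blocks(fb, len);
    free(fb);
    return n;
}
```

For a 1280 x 1024 buffer with 4 bytes per pixel, this gives 5242880 / 64 = 81920 null blocks written for each frame.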
gcc also manipulates a large number of null blocks. The
use of null data blocks varies during the execution. During
flow analysis and instruction scheduling, some data structures
are initialized to zero. As an example, during instruction
scheduling, function schedule_block() illustrated in Figure 3
is called for each basic block. In this function, large