
Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems

by J Lin, Q Lu, X Ding, Z Zhang, X Zhang, P Sadayappan
2008 IEEE 14th International Symposium on High Performance Computer Architecture (2008)

Abstract

Cache partitioning and sharing are critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation, which often has several limitations, such as excessive simulation time, the absence of OS activity, and proneness to simulation inaccuracy. To address these issues, we have taken an efficient software approach that supports both static and dynamic cache partitioning in the OS through memory address mapping. We have comprehensively evaluated several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS). Our software approach makes it possible to run the SPEC CPU2006 benchmark suite to completion. Besides confirming important conclusions from previous work, we gain several insights from whole-program executions that are infeasible to obtain through simulation. For example, for certain workloads, giving up some cache space in one program to help another may improve the performance of both programs because of reduced contention for memory bandwidth. Our evaluation of previously proposed fairness metrics also differs significantly from that of a simulation-based study.

The contributions of this study are threefold. (1) To the best of our knowledge, this is a highly comprehensive execution- and measurement-based study of multicore cache partitioning. It not only confirms important conclusions from simulation-based studies but also provides new insights into dynamic behaviors and interaction effects. (2) Our approach provides a unique and efficient option for evaluating multicore cache partitioning; the implemented software layer can serve as a tool in multicore performance evaluation and hardware design. (3) The proposed schemes can be further refined for OS kernels to improve performance.


Jiang Lin¹, Qingda Lu², Xiaoning Ding², Zhao Zhang¹, Xiaodong Zhang² and P. Sadayappan²
¹Dept. of Electrical and Computer Engineering
Iowa State University
Ames, IA 50011
{linj,zzhang}@iastate.edu
²Dept. of Computer Science and Engineering
The Ohio State University
Columbus, OH 43210
{luq,dingxn,zhang,saday}@cse.ohio-state.edu
1. Introduction
Cache partitioning and sharing are critical to the effective utilization of multicore processors. Cache partitioning usually refers to the partitioning of shared L2 or L3 caches among a set of threads running simultaneously on different cores. Most commercial multicore processors today still use cache designs inherited from uniprocessors, which do not consider the interference among multiple cores. Meanwhile, a number of cache partitioning methods have been proposed with different optimization objectives, including performance [17, 11, 5, 2], fairness [8, 2, 12], and QoS (Quality of Service) [6, 10, 12].
Most existing studies, including those cited above, were evaluated by simulation. Although simulation is flexible, it has several limitations in evaluating cache partitioning schemes. The most serious is slow simulation speed: it is infeasible to run large, complex, and dynamic real-world programs to completion on a cycle-accurate simulator. A typical simulation-based study may simulate only a few billion instructions per program, which is equivalent to about one second of execution on a real machine. The complex structure and dynamic behavior of concurrently running programs can hardly be represented by such a short execution. Furthermore, the effect of the operating system can hardly be evaluated in simulation-based studies, because its full impact cannot be observed within a short simulation time. This limitation may not be the most serious concern for microprocessor design, but it is becoming increasingly relevant to system architecture design. In addition, careful measurements on real machines are reliable, while evaluations on simulators are prone to inaccuracy and coding errors.
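The "about one second" figure can be checked with simple arithmetic. Assuming, for illustration only, a 3 GHz core that retires one instruction per cycle, real execution time is the instruction count N_inst divided by IPC times the clock rate f. The remaining lines assume a hypothetical cycle-accurate simulator speed R_sim of one million simulated instructions per second and a hypothetical whole-program length of 10^12 instructions; none of these values are measurements from this study.

```latex
\begin{aligned}
t_{\text{real}} &= \frac{N_{\text{inst}}}{\mathrm{IPC}\cdot f}
  = \frac{3\times10^{9}}{1\cdot 3\times10^{9}\,\mathrm{s}^{-1}} = 1\,\mathrm{s},\\
t_{\text{sim}} &= \frac{N_{\text{inst}}}{R_{\text{sim}}}
  = \frac{3\times10^{9}}{10^{6}\,\mathrm{s}^{-1}} = 3\times10^{3}\,\mathrm{s} = 50\,\mathrm{min},\\
t_{\text{sim,whole}} &= \frac{10^{12}}{10^{6}\,\mathrm{s}^{-1}} = 10^{6}\,\mathrm{s} \approx 11.6\ \text{days}.
\end{aligned}
```

Under these assumptions, simulating one second of real execution already costs close to an hour, and a whole-program run stretches to weeks, which is why simulation studies typically sample short instruction windows.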
Our Objectives and Approach. To address these limitations, we present an execution- and measurement-based study that attempts to answer the following questions: (1) Can we confirm the conclusions made by the simulation-based studies on cache partitioning and sharing
