GPU support for batch oriented workloads

by Lauro B Costa, Samer Al-Kiswany, Matei Ripeanu
2009 IEEE 28th International Performance Computing and Communications Conference (2009)

Abstract

This paper explores the ability to use graphics processing units (GPUs) as co-processors to harness the inherent parallelism of batch operations in systems that require high performance. To this end, we have chosen Bloom filters (space-efficient data structures that support the probabilistic representation of set membership), as the queries these data structures support are often performed in batches. Bloom filters exhibit low computational cost per unit of data, providing a baseline for more complex batch operations. We implemented BloomGPU, a library that supports offloading Bloom filter operations to the GPU, and evaluated this library under realistic usage scenarios. By completely offloading Bloom filter operations to the GPU, BloomGPU outperforms an optimized CPU implementation of the Bloom filter as the workload becomes larger.

Available from ieeexplore.ieee.org

GPU Support for Batch Oriented Workloads
Lauro B. Costa Samer Al-Kiswany Matei Ripeanu
NetSysLab
Electrical and Computer Engineering Department
The University of British Columbia
Vancouver, BC, Canada
{lauroc,samera,matei}@ece.ubc.ca

Keywords – GPU; Bloom filter; batch workload;
graphics processing unit
I. INTRODUCTION
High-performance systems that produce, process,
and manage large amounts of data use space-efficient
data structures that provide fast response times when
operating over these large datasets. A common
characteristic of these systems is the use of batch
operations to handle the data elements. That is, the data
structures are built, queried, and processed for a set of
elements at a time and not just for a single element.
Since at the data-element level the operations in the
batch are idempotent, batches have the attractive
inherent potential for parallelization.
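
To illustrate why batches lend themselves to parallel execution, here is a minimal Python sketch. It is not taken from the paper: `element_op` is a hypothetical stand-in for any per-element operation (here, a hash), and a thread pool stands in for the parallel hardware. Because no element's result depends on another's, the batch can be split across workers freely.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def element_op(element: bytes) -> int:
    """Hypothetical per-element operation: a 32-bit hash of the element."""
    return int.from_bytes(hashlib.sha1(element).digest()[:4], "big")


def process_batch_sequential(batch):
    """Apply the operation to each element, one after another."""
    return [element_op(e) for e in batch]


def process_batch_parallel(batch, workers=4):
    """Apply the same operation with the batch spread over several workers.

    Elements are handled independently of each other, so they can be
    processed in any order; map() still returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(element_op, batch))


batch = [f"item-{i}".encode() for i in range(1000)]
# Both strategies produce identical results for the whole batch.
assert process_batch_parallel(batch) == process_batch_sequential(batch)
```

The thread pool only illustrates the structure of the computation; it makes no claim about speedup on any particular hardware.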
Recent technology trends make this potential for
parallelization even more attractive. After decades of
relying on single-core processor performance
improvements (e.g., faster clock rates, larger caches,
instruction-level parallelism) for higher application
performance, developers now have the choice of a wide
array of novel multi/many-core architectures, from
symmetric (massively) multi-core processors (e.g.,
existing quad-core architectures, Intel’s 80-core
Larrabee prototype [18], GPUs) to asymmetric
multicores (e.g., IBM’s Cell Broadband Engine
processor).
This paper explores the ability to exploit the
parallelization opportunity offered by batch
workloads on a specific class of massively multicore
processing units, namely Graphics Processing Units
(GPUs). There are two reasons for this choice. First,
GPUs’ single-instruction-multiple-data (SIMD)
architecture is a good match for batch operations, even
though this architecture does not offer the same
flexibility as chip-level multiprocessing. Second,
today’s GPUs have evolved into highly parallel devices
that deliver enormous computational power at
relatively low cost.
We explore support for one simple yet important and
frequently used operation: membership queries, that is,
queries that ask “Is element x in set S?”. To support
these queries, we use Bloom filters [1], as their
space- and time-efficiency characteristics make them a
popular and practical implementation option for many
existing systems [2].
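
As a rough illustration of batched membership queries, the following Python sketch implements a small Bloom filter. It is not BloomGPU's interface: the class name, the default sizes, and the use of double hashing over a SHA-256 digest to derive the bit positions are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch; all parameters are illustrative."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, element: bytes):
        # Assumption: derive k bit positions from two base hashes
        # (double hashing) instead of k separate hash functions.
        digest = hashlib.sha256(element).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, element: bytes):
        for pos in self._positions(element):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, element: bytes) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(element)
        )

    def add_batch(self, elements):
        for element in elements:
            self.add(element)

    def query_batch(self, elements):
        # Each query only reads the bit vector, so the queries in a
        # batch do not depend on one another.
        return [self.contains(element) for element in elements]


bloom = BloomFilter()
bloom.add_batch([b"alpha", b"beta", b"gamma"])
results = bloom.query_batch([b"alpha", b"gamma"])
```

Inserted elements are always reported as present. Elements that were never inserted are usually, but not always, reported as absent, which is the probabilistic trade-off that buys the structure its space efficiency.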
Recently, Bloom filters have been used in scenarios
that involve batch operations in high-performance
data-intensive systems such as the Globus Replica
Location Service (RLS) [5] and Google’s BigTable [3].
RLS is a mechanism for maintaining information about
file replica locations in large distributed computing
systems. RLS uses Bloom filters to represent the set of
file replicas stored at a node, thus avoiding forwarding
file requests to nodes that do not keep a replica of the
file. Large RLS deployments use Bloom filters to
represent sets of millions of files. BigTable is the
primary storage system for several Google applications
and handles tens of thousands of lookup operations per
second. BigTable uses Bloom filters to represent the set
of data items already loaded in memory, thus helping to
avoid touching the disk for lookups.
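
The replica-location use case can be sketched as follows. This Python example is only an illustration of the idea, not the RLS implementation or its API: the node names, file names, filter size, and helper functions are all hypothetical. Each node's replica set is summarized by a filter, and a request is forwarded only to nodes whose filter does not rule the file out.

```python
import hashlib

NUM_BITS = 4096   # illustrative filter size
NUM_HASHES = 3    # illustrative number of hash functions


def bit_positions(name: str):
    """Map a file name to NUM_HASHES bit positions (hypothetical scheme)."""
    digest = hashlib.sha256(name.encode()).digest()
    # Use consecutive 4-byte slices of the digest as separate hash values.
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % NUM_BITS
        for i in range(NUM_HASHES)
    ]


def build_filter(file_names):
    """Summarize a node's replica set as a bit vector stored in an int."""
    bits = 0
    for name in file_names:
        for pos in bit_positions(name):
            bits |= 1 << pos
    return bits


def may_hold(bits, name):
    """False means the node definitely has no replica of the file."""
    return all((bits >> pos) & 1 for pos in bit_positions(name))


def candidate_nodes(node_filters, name):
    """Return the nodes a request for `name` still needs to be sent to."""
    return [node for node, bits in node_filters.items() if may_hold(bits, name)]


# Hypothetical deployment: two nodes and the files each one stores.
replicas = {"node-a": ["f1.dat", "f2.dat"], "node-b": ["f3.dat"]}
node_filters = {node: build_filter(files) for node, files in replicas.items()}
targets = candidate_nodes(node_filters, "f3.dat")
```

A node that stores the file always appears among the candidates. A node that does not store it is normally skipped, although a false positive can occasionally cause an unnecessary forward.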
Additionally, Bloom filters rely on hash calculations
over the sets’ elements, resulting in a relatively low
ratio of computation to data. This characteristic makes
Bloom filters a good baseline for comparison, useful in
providing a lower bound for the speedups that can be
obtained with more computationally intensive batch
oriented workloads.
The contributions of this work are:
• This paper proposes BloomGPU, a library that
enables the use of GPUs as co-processors to
offload Bloom filter operations, as a vehicle to
evaluate the feasibility of using novel multi-core
architectures to support batch operations in
current data-intensive high-performance systems.
We explore BloomGPU performance under
different workloads and determine in which
situations GPU characteristics can be efficiently
