
Towards Automating the Configuration of a Distributed Storage System

by Lauro B Costa, Matei Ripeanu
Computer Engineering (2010)

Abstract

Versatile storage systems aim to maximize storage resource utilization by supporting the ability to 'morph' the storage system to best match the application's demands. To this end, versatile storage systems significantly extend the deployment-time or run-time configurability of the storage system. This flexibility, however, introduces a new problem: a much larger, and potentially dynamic, configuration space makes manually configuring the storage system an undesirable, if not infeasible, task. This paper presents our initial progress towards answering the question: "How can we configure a distributed storage system (i.e., enable/disable its various optimizations and configure their parameters) with minimal human intervention?" We discuss why manually configuring the storage system is undesirable; present the success criteria for an automated configuration solution; propose a generic architecture that supports automated configuration; and, finally, instantiate this architecture into a first prototype, which controls the configuration of similarity detection optimizations in the MosaStore distributed storage system. Our evaluation results demonstrate that the prototype can provide performance close to that of the optimal configuration at the cost of minimal overhead.

Available from ieeexplore.ieee.org
Towards Automating the Configuration of a Distributed Storage System
Lauro B. Costa, Matei Ripeanu
Department of Electrical and Computer Engineering
The University of British Columbia
Vancouver, BC, Canada
{lauroc,matei}@ece.ubc.ca

Keywords: distributed storage systems; automated configuration;
self-tuning
I. INTRODUCTION
Aggregating available storage space from network-connected
nodes has several appealing properties: low cost, since it is
cheaper than a dedicated storage solution; high efficiency,
since it allows for good resource utilization; and high
performance, since applications benefit from a wider I/O
channel by striping and/or replicating data across several
nodes. One of the many instances [1][2][3][5][6][15][16]
where this technique is used, and one that is particularly
relevant to Grid settings, is storage system ‘glide-ins’
[1][18][19][20]. In this scenario, the components of a
storage system are submitted together with a batch
application and the storage system is instantiated on the fly
to aggregate the storage resources available on the nodes
allocated to the application. Once instantiated, the storage
system will provide a dedicated, high performance ‘scratch’
space co-located with the application. Multiple projects have
validated the practical appeal of this approach [1][3][18][20].
Instantiating the storage system on the fly, however,
raises a challenge: optimally designing and configuring the
storage system becomes significantly more complex. For example,
replication and caching may speed up data access; yet, they
may entail complex consistency protocols. Online data
deduplication, i.e., data compression by detecting repeated
chunks of data and storing them only once, may save storage
space and bandwidth when there is high similarity between
successive write operations, yet at the cost of increased
computational overhead. Typically, to avoid such
complexity, most storage system designs fix these decisions
in order to make the system simpler to manage, providing, in
effect, a “one size fits all” solution.
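To make the deduplication trade-off concrete, the following is a minimal sketch of chunk-level deduplication. It is not MosaStore's implementation: the `DedupStore` class, the fixed-size chunking, the 4-byte chunk size, and the choice of SHA-256 are all assumptions made for illustration. Each write pays the cost of hashing every chunk (the computational overhead mentioned above), but only chunks not seen before consume storage.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to trace


class DedupStore:
    """Toy content-addressed store (hypothetical): each distinct chunk is kept once."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name, data):
        """Store data under name; return the number of new bytes actually stored."""
        new_bytes = 0
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            # Hashing every chunk is the computational price of deduplication.
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        self.files[name] = recipe
        return new_bytes

    def read(self, name):
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])
```

With this sketch, writing `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` stores 12 new bytes for the first write but only 4 for the second, because two of the three chunks repeat. When successive writes share little data, the hashing cost is paid with no saving, which is why the optimization is worth enabling only for some workloads.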
An alternative, rarely explored to date, is the versatile
storage system approach [1][2]. Versatility in this case is the
ability to provide a set of storage-system optimizations that
can be activated and/or configured at deployment time or
even at runtime. Versatility makes it possible to harness
the available storage resources fully by ‘morphing’ the storage
system to match the workload at hand. For example, for a
read-mostly workload with good locality, the administrator can
configure the storage system with appropriate replication and
caching levels to maximize read performance. Similarly, for
a checkpointing workload, depending on the type of
checkpointing used (e.g., application-level, process-level, or
virtual machine–level) and on the frequency of the
checkpointing operation, the administrator may enable
similarity detection to reduce the storage footprint and the
network effort [3].
Although versatility enables better use of the storage
resources and, ultimately, higher application performance,
it also requires the administrator to tune the storage system.
Such manual configuration is undesirable for several reasons:
the administrator might lack the necessary
knowledge about the application and the storage workload it
generates; temporal variations in the workload or new
application versions might make one-time tuning
meaningless; and, equally important, performance tuning is
time-consuming because of the large size of the
configuration space that needs to be considered.
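The size of the configuration space can be illustrated with a short sketch. The tunables and their candidate values below are hypothetical, chosen only to resemble the kinds of optimizations discussed above (replication, caching, similarity detection); they are not MosaStore's actual parameters. The point is that the number of configurations is the product of the number of options per tunable, so it grows multiplicatively with every optimization added.

```python
from itertools import product

# Hypothetical tunables and candidate values (illustrative assumptions only).
CONFIG_SPACE = {
    "replication_level": [1, 2, 3],
    "cache_size_mb": [0, 64, 256, 1024],
    "similarity_detection": ["off", "fixed_chunks", "content_based"],
    "chunk_size_kb": [64, 256, 1024],
}


def space_size(space):
    """Number of distinct configurations: product of the options per tunable."""
    size = 1
    for options in space.values():
        size *= len(options)
    return size


def enumerate_configs(space):
    """Yield every configuration as a dict mapping tunable name to chosen value."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))
```

Even these four small tunables yield 3 × 4 × 3 × 3 = 108 configurations. If evaluating each one required a separate benchmark run, exhaustive manual tuning would quickly become impractical, and it would have to be repeated whenever the workload changes.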
Our long-term research program aims to answer the
following question: How can we configure a distributed
storage system (i.e., enable/disable various optimizations
and configure their parameters) with minimal human
intervention? We aim for a solution that allows automated
configuration of a versatile storage system.
In this paper, we make initial progress towards answering
the above question. We start by providing an in-depth
analysis of why manually configuring the storage system is
undesirable. We then present the success criteria for a
solution that automates configuration tuning. We propose a
generic architecture that supports automated configuration
tuning and, finally, we instantiate this architecture into a first
prototype and integrate it with the MosaStore distributed storage
system [1].
