NodeWiz: Fault-tolerant grid information service

by Sujoy Basu, Lauro Beltrão Costa, Francisco Brasileiro, Sujata Banerjee, Puneet Sharma, Sung-Ju Lee
Peer-to-Peer Networking and Applications (2009)

Abstract

Large scale grid computing systems may provide multitudinous services, from different providers, whose quality of service will vary. Moreover, services are deployed and undeployed in the grid with no central coordination. Thus, to find out the most suitable service to fulfill their needs, or to find the most suitable set of resources on which to deploy their services, grid users must resort to a Grid Information Service (GIS). This service allows users to submit rich queries that are normally composed of multiple attributes and range operations. The ability to efficiently execute complex searches in a scalable and reliable way is a key challenge for current GIS designs. Scalability issues are normally dealt with by using peer-to-peer technologies. However, the more reliable peer-to-peer approaches do not cater for rich queries in a natural way. On the other hand, approaches that can easily support these rich queries are less robust in the presence of failures. In this paper we present the design of NodeWiz, a GIS that allows multi-attribute range queries to be performed efficiently in a distributed manner, while maintaining load balance and resilience to failures.

Available from www.springerlink.com
Peer-to-Peer Netw Appl (2009) 2:348–366
DOI 10.1007/s12083-009-0030-1
Sujoy Basu · Lauro Beltrão Costa ·
Francisco Brasileiro · Sujata Banerjee ·
Puneet Sharma · Sung-Ju Lee
Received: 11 April 2008 / Accepted: 26 January 2009 / Published online: 17 March 2009
© Springer Science + Business Media, LLC 2009
S. Basu · S. Banerjee · P. Sharma · S.-J. Lee
Hewlett-Packard Laboratories, Palo Alto, CA 94304, USA
S. Basu
e-mail: sujoy.basu@hp.com
S. Banerjee
e-mail: sujata.banerjee@hp.com
P. Sharma
e-mail: puneet.sharma@hp.com
S.-J. Lee
e-mail: sungju.lee@hp.com
L. B. Costa · F. Brasileiro (✉)
Universidade Federal de Campina Grande,
58.109-970, Campina Grande, Paraíba, Brazil
e-mail: fubica@dsc.ufcg.edu.br
L. B. Costa
e-mail: lauro@dsc.ufcg.edu.br
Keywords Grid information service · Peer-to-peer ·
K-d-tree · Failure detection · Availability
1 Introduction
Efficient discovery of resources and services is a crucial
problem in the deployment of computational grids, es-
pecially as these evolve to support diverse applications
including interactive applications with real-time QoS
requirements (e.g., multi-player networked games).
Within such an environment multitudinous services
made available by different providers co-exist. Once
services are deployed and properly advertised, users
can search for the available services and select the most
suitable ones to cater for their needs. It is anticipated
that clients will search for raw computing and storage
resources (e.g., machine with Pentium 1.8 GHz CPU
and at least 512 MB memory) as well as services (e.g.,
lightly loaded Everquest game server). Furthermore,
the attributes may be dynamically changing (e.g., avail-
able bandwidth between two nodes) rather than static
(e.g., OS version). Finally, services may appear and
disappear in the grid, and the quality of service deliv-
ered by the more stable services may vary widely over
time. Thus, providers should be constantly renewing
their advertisement, while users should be constantly
querying for the availability of better services. These
trends make the resource or service discovery problem
challenging. The information service must be archi-
tected to support multi-attribute range queries in an
efficient manner in this environment.
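As a rough illustration of what a multi-attribute range query involves, the sketch below matches resource advertisements against one interval per attribute. The class name `RangeQuery`, the attribute keys `cpu_ghz` and `mem_mb`, and the sample advertisements are all hypothetical and are not taken from NodeWiz; the query loosely mirrors the example above (a 1.8 GHz or faster CPU and at least 512 MB of memory).

```python
import math
from dataclasses import dataclass, field


@dataclass
class RangeQuery:
    """Hypothetical query: one closed interval [low, high] per attribute."""
    ranges: dict = field(default_factory=dict)

    def matches(self, advertisement: dict) -> bool:
        # An advertisement matches only if every queried attribute is present
        # and its value falls inside the requested interval.
        for name, (low, high) in self.ranges.items():
            value = advertisement.get(name)
            if value is None or not (low <= value <= high):
                return False
        return True


# Illustrative advertisements; the values are made up for this example.
advertisements = [
    {"host": "a", "cpu_ghz": 1.8, "mem_mb": 1024},
    {"host": "b", "cpu_ghz": 2.4, "mem_mb": 256},
    {"host": "c", "cpu_ghz": 1.2, "mem_mb": 2048},
]

query = RangeQuery({"cpu_ghz": (1.8, math.inf), "mem_mb": (512, math.inf)})
hits = [ad["host"] for ad in advertisements if query.matches(ad)]
print(hits)
```

A centralized scan like this is the simple case; the difficulty discussed in the rest of the paper is answering the same kind of query when the advertisements are spread over many peers.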
The Grid Information Service (GIS) has been proposed
to help users in the task of choosing which service to
use to better fulfil their needs [7]. The GIS can be seen
as a directory in which providers publish the static and
dynamic attributes of their resources and services, and
to which the consumers of these services submit their
queries. Obviously, to be useful for large grids, a GIS
implementation must be scalable. Moreover, in a sys-
tem with potentially many thousands of components,
failures are the norm and not the exception. Therefore,
fault-tolerance of the GIS is another requirement.
Early implementations of the GIS were either centralized
or distributed over a static hierarchy of information
servers. Centralized solutions do not scale well
in large systems or with dynamic attributes that change
rapidly. Many centralized solutions can be augmented
by replication, but then managing consistent replicas
can incur significant overhead. Hierarchical distributed
systems alleviate some of the issues with the central-
ized systems. However, most of these are inefficient
in retrieving the answers to a multi-attribute range
query; the dynamic nature of the attributes queried
implies that the query has to be forwarded inefficiently
to the hierarchy of information servers. Further, there
is limited recourse available if, due to the query load
patterns, some information servers get heavily loaded
while others are essentially unloaded.
More recent approaches rely on some scalable struc-
tured peer-to-peer (P2P) substrate on top of which the
directory service is built [1, 4, 13, 19, 21, 27, 28]. Most of
these systems rely on Distributed Hash Tables (DHTs)
to implement structured P2P directories. DHTs are
scalable and very robust to failures. On the other hand,
the only search operation that is efficiently supported
by DHTs is exact match. These systems do not provide
a natural way to perform complex multi-attribute range
queries while maintaining load balance.
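The following minimal sketch, under stated assumptions, shows why hashing favours exact match over range search. It compares a hash-based key placement with a simple order-preserving one. The overlay size, the helper names `hashed_node` and `ordered_node`, and the `mem_mb` attribute are assumptions made for illustration; no particular DHT implementation is being modelled.

```python
import hashlib

NUM_NODES = 8  # assumed overlay size, chosen only for illustration


def hashed_node(value: int) -> int:
    """Place a key the way a DHT would: by a uniform hash of the key."""
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def ordered_node(value: int, max_value: int = 4096) -> int:
    """Order-preserving placement: contiguous values share a node."""
    return min(value * NUM_NODES // max_value, NUM_NODES - 1)


memory_range = range(512, 1024)  # query: 512 <= mem_mb < 1024

# An exact-match lookup needs a single node under either placement.
print("node for mem_mb == 512:", hashed_node(512))

# A range query must contact every node that owns some value in the range.
hashed_targets = {hashed_node(v) for v in memory_range}
ordered_targets = {ordered_node(v) for v in memory_range}
print("nodes contacted with hashing:", len(hashed_targets))
print("nodes contacted with order-preserving split:", len(ordered_targets))
```

Hashing scatters neighbouring values across the overlay, so the range touches many nodes, while the order-preserving split keeps it on one. The fixed, equal-width split used here would not by itself keep load balanced when values or queries are skewed, which is the other half of the problem stated above.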
Our goal is to design a GIS that allows multi-
attribute range queries to be performed efficiently
in a distributed manner. We emphasize this class of
queries because these are among the more useful and
common types of queries that a client of the GIS
would need to execute to identify services or resources
that meet its requirements. In this paper, we present
NodeWiz, a GIS that is organized as a P2P system.
The multi-attribute search space is distributed among
the NodeWiz peers according to a distributed tree
structure. NodeWiz is self-organizing such that loaded
peers can dynamically offload some of their load onto
other peers. Further, as described later, the information
storage and organization is driven by query workloads,
thereby providing a very natural way, not only to
balance the query workload but also to optimize the
performance for more common multi-attribute range
queries. However, systems based on distributed tree
structures are in general less resilient to failures than
DHT-based ones. In this paper we analyze the im-
pact of failures in such systems and propose mecha-
nisms for dealing with these failures. They have been
implemented in NodeWiz and evaluated in this paper.
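To make the idea of distributing a multi-attribute search space over a tree more concrete, here is a small k-d-tree-style sketch. It is not the NodeWiz algorithm: the names `TreeNode`, `split_leaf`, and `peers_for_query`, the peer labels, and the split points are hypothetical, and the split attribute and value are simply passed in by the caller rather than derived from the query workload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    """Leaf: region owned by `peer`. Interior: splits `attr` at `split`."""
    peer: Optional[str] = None
    attr: Optional[str] = None
    split: Optional[float] = None
    left: Optional["TreeNode"] = None    # values <  split
    right: Optional["TreeNode"] = None   # values >= split


def split_leaf(leaf: TreeNode, attr: str, split: float, new_peer: str) -> None:
    """Offload part of a loaded peer's region onto `new_peer`."""
    leaf.left = TreeNode(peer=leaf.peer)
    leaf.right = TreeNode(peer=new_peer)
    leaf.peer, leaf.attr, leaf.split = None, attr, split


def peers_for_query(node: TreeNode, query: dict) -> list:
    """Return the peers whose regions overlap the query's [low, high] ranges."""
    if node.peer is not None:
        return [node.peer]
    # An attribute the query does not constrain overlaps both subtrees.
    low, high = query.get(node.attr, (float("-inf"), float("inf")))
    peers = []
    if low < node.split:
        peers += peers_for_query(node.left, query)
    if high >= node.split:
        peers += peers_for_query(node.right, query)
    return peers


# Start with one peer owning the whole space, then split twice.
root = TreeNode(peer="P1")
split_leaf(root, "mem_mb", 1024, "P2")         # P1: mem < 1024, P2: mem >= 1024
split_leaf(root.right, "cpu_ghz", 2.0, "P3")   # P2: cpu < 2.0,  P3: cpu >= 2.0

print(peers_for_query(root, {"mem_mb": (512, 900)}))
print(peers_for_query(root, {"mem_mb": (2048, 4096), "cpu_ghz": (2.4, 3.0)}))
print(peers_for_query(root, {"cpu_ghz": (1.0, 1.8)}))
```

In this sketch a query that constrains the split attributes reaches a single peer, while a query that leaves an attribute unconstrained fans out to every subtree it overlaps. The sketch also hints at the fragility noted above: if the peer responsible for an interior region fails, the subtrees below it become unreachable unless some additional mechanism is in place.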
The next section provides the background and re-
lated work. Section 3 describes the NodeWiz architec-
ture in detail and presents the associated algorithms.
Next, in Section 4, we analyze the impact of failures.
Then in Section 5, we describe the fault-tolerance
mechanisms that have been implemented in NodeWiz.
Implementation issues are discussed in Section 6. This
is followed by an evaluation of our NodeWiz implementation
in Section 7. Finally, our conclusions are
presented in Section 8.
2 Related work
A GIS is a key component of any large grid installa-
tion. It addresses the important problem of resource
and service discovery which enables such large-scale,
geographically-distributed, general-purpose resource
sharing environments. Deployed grids based on the first
version of the Globus Toolkit [10] employed the Metacomputing
Directory Service (MDS) [16]. The initial
architecture was centralized. Subsequently, MDS-2 [7]
was implemented with a decentralized architecture.
The X.500 data model used by LDAP [25] is em-
ployed in MDS-2 to organize objects in a hierarchi-
cal namespace. Each entry has one or more object
classes, and must have values assigned to the manda-
tory attributes for these classes. Values for optional
attributes may also be present. The query language,
also borrowed from LDAP, allows search based on
attribute values, as well as on the position of ob-
jects in the hierarchical namespace. The MDS-2 system
architecture consists of directory servers and other in-
formation providers maintained by the different orga-
nizations participating in a grid. They use soft-state
registration to join aggregate directory servers, which
in turn can query them to get details of their content.
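Soft-state registration, in general, means that a registration lapses unless the registrant keeps renewing it. The sketch below illustrates that general idea only; it does not reproduce the MDS-2 protocol or its interfaces. The class `SoftStateRegistry`, its methods, the provider names, and the 30-second lifetime are assumptions, and timestamps are passed in explicitly so that the example is deterministic.

```python
class SoftStateRegistry:
    """Hypothetical directory whose entries expire unless they are renewed."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._expiry = {}  # provider name -> time at which its entry lapses

    def register(self, provider: str, now: float) -> None:
        # Registering and renewing are the same operation: push the expiry out.
        self._expiry[provider] = now + self.ttl

    def live_providers(self, now: float) -> list:
        # Drop entries that were not renewed in time, then report the rest.
        self._expiry = {p: t for p, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


registry = SoftStateRegistry(ttl_seconds=30)
registry.register("site-a", now=0)
registry.register("site-b", now=0)
registry.register("site-a", now=20)   # renewal before expiry

print(registry.live_providers(now=25))  # both entries are still live
print(registry.live_providers(now=40))  # site-b lapsed at t=30
```

The appeal of this design is that a provider that crashes or leaves needs no explicit deregistration: its entry simply ages out of the directory.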
More recently, MDS-4 [23] has been released as part
of Globus Toolkit version 4. The interfaces of MDS-4
have been standardized using web services. The infor-
mation providers can be infrastructure monitoring tools
like Ganglia [9] or any service that is provided by the
