SafeStore: A Durable and Practical Storage System
- ISBN: 9998888776
Abstract
This paper presents SafeStore, a distributed storage system designed to maintain long-term data durability despite conventional hardware and software faults, environmental disruptions, and administrative failures caused by human error or malice. The architecture of SafeStore is based on fault isolation, which SafeStore applies aggressively along administrative, physical, and temporal dimensions by spreading data across autonomous storage service providers (SSPs). However, current storage interfaces provided by SSPs are not designed for high end-to-end durability. In this paper, we propose a new storage system architecture that (1) spreads data efficiently across autonomous SSPs using informed hierarchical erasure coding that, for a given replication cost, provides several additional 9's of durability over what can be achieved with existing black-box SSP interfaces, (2) performs an efficient end-to-end audit of SSPs to detect data loss that, for a 20% cost increase, improves data durability by two 9's by reducing MTTR, and (3) offers durable storage with cost, performance, and availability competitive with traditional storage systems. We instantiate and evaluate these ideas by building a SafeStore-based file system with an NFS-like interface.
Ramakrishna Kotla, Lorenzo Alvisi, and Mike Dahlin
The University of Texas at Austin
1 Introduction
The design of storage systems that provide data durability on the time scale of decades is an increasingly important challenge as more valuable information is stored digitally [10, 31, 57]. For example, data from the National Archives and Records Administration indicate that 93% of companies go bankrupt within a year if they lose their data center in some disaster [5], and a growing number of government laws [8, 22] mandate multi-year periods of data retention for many types of information [12, 50].
Against a backdrop in which over 34% of companies fail to test their tape backups [6] and over 40% of individuals do not back up their data at all [29], multi-decade scale durable storage raises two technical challenges. First, there exists a broad range of threats to data durability including media failures [51, 60, 67], software bugs [52, 68], malware [18, 63], user error [50, 59], administrator error [39, 48], organizational failures [24, 28], malicious insiders [27, 32], and natural disasters on the scale of buildings [7] or geographic regions [11]. Requiring robustness on the scale of decades magnifies them all: threats that could otherwise be considered negligible must now be addressed. Second, such a system has to be practical, with cost, performance, and availability competitive with traditional systems.
Storage outsourcing is emerging as a popular approach to address some of these challenges [41]. By entrusting storage management to a Storage Service Provider (SSP), where “economies of scale” can minimize hardware and administrative costs, individual users and small to medium-sized businesses seek cost-effective professional system management and peace of mind vis-a-vis both conventional media failures and catastrophic events.
Unfortunately, relying on an SSP is no panacea for long-term data integrity. SSPs face the same list of hard problems outlined above and, as a result, even brand-name ones [9, 14] can still lose data. To make matters worse, clients often become aware of such losses only after it is too late. This opaqueness is a symptom of a fundamental problem: SSPs are separate administrative entities and the internal details of their operation may not be known by data owners. While most SSPs may be highly competent and follow best practices punctiliously, some may not. By entrusting their data to black-box SSPs, data owners may free themselves from the daily worries of storage management, but they also relinquish ultimate control over the fate of their data. In short, while SSPs are an economically attractive response to the costs and complexity of long-term data storage, they do not offer their clients any end-to-end guarantees on data durability, which we define as the probability that a specific data object will not be lost or
2007 USENIX Annual Technical Conference, USENIX Association
Aggressive isolation for durability. SafeStore stores data redundantly across multiple SSPs and leverages diversity across SSPs to prevent permanent data loss caused by isolated administrator errors, software bugs, insider attacks, bankruptcy, or natural catastrophes. With respect to data stored at each SSP, SafeStore employs a “trust but verify” approach: it does not interfere with the policies used within each SSP to maintain data integrity, but it provides an audit interface so that data owners retain end-to-end control over data integrity. The audit mechanism can quickly detect data loss and trigger data recovery from redundant storage before additional faults result in unrecoverable loss. Finally, to guard data stored at SSPs against faults at the data owner site (e.g., operator errors, software bugs, and malware attacks), SafeStore restricts the interface to provide temporal isolation between clients and SSPs so that the latter export the abstraction of write-once-read-many storage.
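To make the write-once-read-many abstraction concrete, the following is a minimal in-memory sketch. The class name `WriteOnceStore` and its `put`/`get` methods are hypothetical illustrations, not the interface SafeStore SSPs actually export, which this section does not specify. The point is only that a faulty or compromised client can add new objects but cannot overwrite or delete existing ones.

```python
class WriteOnceStore:
    """Toy store with write-once-read-many semantics (hypothetical interface)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # Reject any attempt to overwrite an object that already exists.
        if key in self._objects:
            raise PermissionError(f"object {key!r} already exists and is immutable")
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    # Deliberately no delete() or update(): old versions stay readable.


store = WriteOnceStore()
store.put("v1/report", b"original")
try:
    # Simulates a buggy or malicious client trying to clobber stored data.
    store.put("v1/report", b"tampered")
    overwritten = True
except PermissionError:
    overwritten = False
```

In this sketch a new version would be written under a new key, so earlier versions remain available for recovery.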
Making aggressive isolation practical. SafeStore introduces an efficient storage interface to reduce network bandwidth and storage cost using an informed hierarchical erasure coding scheme that, when applied across and within SSPs, can achieve near-optimal durability. SafeStore SSPs expose redundant encoding options to allow the system to efficiently divide storage redundancies across and within SSPs. Additionally, SafeStore limits the cost of implementing its “trust but verify” policy through an audit protocol that shifts most of the processing to the audited SSPs and encourages them to proactively measure and report any data loss they experience. Dishonest SSPs are quickly caught with high probability and at little cost to the auditor using probabilistic spot checks. Finally, to reduce the bandwidth, performance, and availability costs of implementing geographic and administrative isolation, SafeStore implements a two-level storage architecture where a local server (possibly running on the client machine) is used as a soft-state cache, and if the local server crashes, SafeStore limits down-time by quickly recovering the critical metadata from the remote SSPs while the actual data is being recovered in the background.
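The claim that spot checks catch dishonest SSPs with high probability at little cost can be illustrated with a short calculation. This is a sketch under simplifying assumptions that are not taken from the paper: the auditor samples blocks uniformly and independently (with replacement), and an SSP that has lost a block cannot answer a challenge for it. The function names and the numbers used are illustrative only.

```python
import random


def detection_probability(lost_fraction: float, samples: int) -> float:
    """Probability that at least one of `samples` independent uniform
    spot checks lands on a lost block."""
    return 1.0 - (1.0 - lost_fraction) ** samples


def simulate(total_blocks: int, lost_blocks: int, samples: int,
             trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    # Which blocks are lost is irrelevant under uniform sampling,
    # so mark the first `lost_blocks` indices as lost.
    lost = set(range(lost_blocks))
    caught = 0
    for _ in range(trials):
        if any(rng.randrange(total_blocks) in lost for _ in range(samples)):
            caught += 1
    return caught / trials


# An SSP silently missing 1% of blocks, audited with 300 random checks.
p_formula = detection_probability(0.01, 300)
p_simulated = simulate(total_blocks=1000, lost_blocks=10, samples=300)
```

Under these assumptions, 300 checks detect a 1% loss with probability of roughly 0.95, and the number of checks needed depends on the lost fraction rather than on the total amount of data stored.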
Contributions. The contribution of this paper is a highly durable storage architecture that uses a new replication interface to distribute data efficiently across a diverse set of SSPs and an effective audit protocol to check data integrity. We demonstrate that this approach can provide high durability in a way that is practical and economically viable with cost, availability, and performance competitive with traditional systems. We demonstrate these ideas by building and evaluating SSFS, an NFS-based SafeStore storage system. Overall, we show that SafeStore provides an economical alternative to realize multi-decade scale durable storage for individuals and small-to-medium sized businesses with limited resources. Note that although we focus our attention on outsourced SSPs, the SafeStore architecture could also be applied internally by large enterprises that maintain multiple isolated data centers.
2 Architecture and Design Principles
The main goal of SafeStore is to provide extremely durable storage over many years or decades.
2.1 Threat model
Over such long time periods, even relatively rare events can affect data durability, so we must consider a broad range of threats along multiple dimensions: physical, administrative, and software.
Physical faults: Physical faults causing data loss include disk media faults [35, 67], theft [23], fire [7], and wider geographical catastrophes [11]. These faults can result in data loss at a single node or spanning multiple nodes at a site or in a region.
Administrative and client-side faults: Accidental misconfiguration by system administrators [39, 48], deliberate insider sabotage [27, 32], or business failures leading to bankruptcy [24] can lead to data corruption or loss. Clients can also delete data accidentally by, for example, executing “rm -r *”. Administrator and client faults can be particularly devastating because they can affect replicas across otherwise isolated subsystems. For instance [27], a system administrator not only deleted data but also stole the only backup tape after he was fired, resulting in financial damages in excess of $10 million and layoff of 80 employees.
Software faults: Software bugs [52, 68] in file systems, viruses [18], worms [63], and Trojan horses can delete or corrupt data. A vivid example of threats due to malware is the recent phenomenon of ransomware [20] where an attacker encrypts a user’s data and withholds the encryption key until a ransom is paid.
Of course, any of the listed faults may occur rarely. But at the scale of decades, it becomes risky to assume that no rare events will occur. It is important to note that some of these failures [7, 51, 60] are often correlated, resulting in simultaneous data loss at multiple nodes, while others [52] are more likely to occur independently.
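The difference between correlated and independent failures can be made concrete with a small calculation. The sketch below assumes a simplified model that is not taken from the paper: data is encoded into n fragments of which any k suffice for recovery, and each fragment fails with probability p over some period. The function names and the value of p are illustrative only.

```python
from math import comb


def loss_probability_independent(n: int, k: int, p: float) -> float:
    """Probability of data loss when the n fragments fail independently.
    Data is lost once more than n - k fragments have failed."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(n - k + 1, n + 1))


def loss_probability_correlated(p: float) -> float:
    """Probability of data loss when all fragments share one fate,
    e.g. every copy sits in the same building or under the same administrator."""
    return p


p = 0.01
three_replicas = loss_probability_independent(n=3, k=1, p=p)   # p**3
two_of_three = loss_probability_independent(n=3, k=2, p=p)     # 3p^2(1-p) + p^3
same_fate = loss_probability_correlated(p)
```

With p = 0.01, three independent replicas are lost together with probability about 1e-6, while three fully correlated replicas are lost with probability 0.01. Redundancy helps only to the extent that failures are isolated from one another.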
Limitations of existing practice. Most existing approaches to data storage face two problems that are particularly acute in our target environments of individuals