Efficient metadata management in large distributed storage systems
Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), 2003
- ISBN: 0769519148
- DOI: 10.1109/MASS.2003.1194865
Available from ieeexplore.ieee.org
Abstract
Efficient metadata management is a critical aspect of overall system performance in large distributed storage systems. Directory subtree partitioning and pure hashing are two common techniques used for managing metadata in such systems, but both suffer from bottlenecks at very high concurrent access rates. We present a new approach called Lazy Hybrid (LH) metadata management that combines the best aspects of these two approaches while avoiding their shortcomings.
Scott A. Brandt (scott@cs.ucsc.edu), Ethan L. Miller (elm@cs.ucsc.edu), Darrell D. E. Long (darrell@cs.ucsc.edu), Lan Xue (lanxue@cs.ucsc.edu)
Storage Systems Research Center, University of California, Santa Cruz
This research is supported by Lawrence Livermore National Laboratory, Los Alamos National Laboratory, and Sandia National Laboratory under contract 520714.

1. Introduction

In large distributed storage systems, avoiding bottlenecks is critical to achieving high performance and scalability. One potential bottleneck is metadata access. Although the size of metadata is generally small compared to the overall storage capacity of such a system, 50% to 80% of all file system accesses are to metadata [12], so the careful management of metadata is critical. We present Lazy Hybrid (LH) metadata management, a new metadata management architecture designed to provide very high-performance, scalable metadata management.

Traditionally, metadata and data are managed by the same file system, on the same machine, and stored on the same device [9]. For efficiency, metadata is often stored physically close to the data it describes [7]. In some modern distributed file systems, data is stored on devices that can be directly accessed through the network, while metadata is managed separately by one or more specialized metadata servers [5].

We are developing LH in the context of a large high-performance object-based storage system [17]. Object-based storage systems separate the data and metadata management as depicted in Figure 1. Semi-independent object-based storage devices (OBSDs) manage low-level data storage tasks such as request scheduling and data layout, and present a simple object-based data access interface to the rest of the system. A separate cluster of metadata servers manage the namespace and directory hierarchy, file and directory permissions, and the mapping from files to objects. The metadata server cluster is otherwise not involved in the storage and retrieval of data, allowing for very efficient concurrent data transfers between large numbers of clients and OBSDs.

[Figure 1. Storage system architecture. Diagram labels: clients; metadata server cluster (metadata access, metadata update); object-based storage devices with disk arrays (direct data access); high-speed networks; storage area network.]

The goal in systems with specialized metadata management is to efficiently manage the metadata so that standard directory and file semantics can be maintained, but without negatively affecting overall system performance. This includes handling large numbers of files ranging from bytes to terabytes in size, supporting very small and very large directories, and serving tens or hundreds of thousands of parallel accesses to different files in different directories, different files in the same directory, and even to the same file. A key question in the design of such a system is how to

Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03), 0-7695-1914-8/03 $17.00 © 2003 IEEE
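The abstract names directory subtree partitioning and pure hashing as the two common ways to assign metadata to servers. The excerpt does not show how either is implemented, so the following Python sketch is only an illustration under stated assumptions: the cluster size, the subtree table, and the function names `hash_server` and `subtree_server` are all hypothetical, and SHA-1 is an arbitrary choice of hash. It is not the Lazy Hybrid scheme.

```python
import hashlib
import posixpath

NUM_SERVERS = 4  # hypothetical metadata server cluster size


def hash_server(path, num_servers=NUM_SERVERS):
    """Pure hashing (illustrative): hash the full pathname to pick a server."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


# Hypothetical assignment of directory subtrees to metadata servers.
SUBTREES = {"/": 0, "/home": 1, "/home/alice": 2, "/proj": 3}


def subtree_server(path, subtrees=SUBTREES):
    """Subtree partitioning (illustrative): the deepest assigned ancestor
    directory determines which server holds the path's metadata."""
    current = posixpath.normpath(path)
    while True:
        if current in subtrees:
            return subtrees[current]
        parent = posixpath.dirname(current)
        if parent == current:  # reached the top without finding an owner
            raise KeyError(f"no subtree covers {path!r}")
        current = parent


# Many files in a single directory, as in the access patterns described above.
files = [f"/home/alice/run{i}.dat" for i in range(100)]
print("subtree partitioning uses servers:", sorted({subtree_server(p) for p in files}))
print("pure hashing uses servers:", sorted({hash_server(p) for p in files}))
```

In this sketch, every file under `/home/alice` resolves to the same server with subtree partitioning, while pure hashing places each pathname independently of its directory. How these behaviors translate into the bottlenecks the abstract mentions depends on details not included in this excerpt.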


