A practical approach to reconciling availability, performance, and capacity in provisioning extreme-scale storage systems

Abstract

The increasing data demands of high-performance computing applications are rapidly raising the capacity, capability, and reliability requirements of storage systems. As systems scale, component failures and repair times increase, significantly impacting data availability. A wide array of decision points must be balanced in designing such systems. We propose a systematic approach that balances and optimizes both initial and continuous spare provisioning, based on a detailed investigation of the anatomy of extreme-scale storage systems and an analysis of their field failure data. We consider component failure characteristics together with their cost and impact at the system level. We build a tool to evaluate different provisioning schemes, and the results demonstrate that our optimized provisioning can reduce the duration of data unavailability by as much as 52% under a fixed budget. We also observe that non-disk components have much higher failure rates than disks and warrant careful consideration in the overall provisioning process.

Cite

Wan, L., Wang, F., Oral, S., Tiwari, D., Vazhkudai, S. S., & Cao, Q. (2015). A practical approach to reconciling availability, performance, and capacity in provisioning extreme-scale storage systems. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC (Vol. 15-20-November-2015). IEEE Computer Society. https://doi.org/10.1145/2807591.2807615
