Monitoring and analyzing I/O behavior is critical to the efficient utilization of parallel storage systems. Unfortunately, as I/O requirements and resource contention increase, I/O performance variability is becoming a significant concern. This paper investigates I/O behavior and performance variability on a large-scale high-performance computing (HPC) system using a novel methodology that first uses an I/O characterization tool to identify similar jobs from the same application, and then detects potential I/O performance variability across those jobs. We demonstrate how this methodology can be used to perform temporal and feature analyses that detect interesting I/O performance variability patterns in production HPC systems, and we discuss the implications of these patterns for operating and managing large-scale systems.
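The two-step idea in the abstract (group similar jobs of the same application, then measure how much their I/O performance differs) can be illustrated with a minimal sketch. This is not the authors' implementation: the job records, the feature choice (bytes read and written), the greedy relative-tolerance grouping, and the use of the coefficient of variation as the variability metric are all assumptions made for illustration, and the names `JOBS`, `similar`, `group_similar_jobs`, and `variability` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical job records (made-up values, not data from the paper):
# (application, GiB read, GiB written, observed throughput in MiB/s).
JOBS = [
    ("appA", 100.0, 50.0, 900.0),
    ("appA", 102.0, 49.0, 450.0),
    ("appA", 101.0, 51.0, 880.0),
    ("appA", 400.0, 10.0, 300.0),
    ("appB", 20.0, 20.0, 120.0),
    ("appB", 20.5, 19.5, 118.0),
]


def similar(a, b, tol=0.05):
    """Return True if every feature of a is within a relative tolerance of b."""
    return all(abs(x - y) <= tol * max(abs(x), abs(y), 1e-12) for x, y in zip(a, b))


def group_similar_jobs(jobs, tol=0.05):
    """Greedily group jobs of the same application that have similar I/O features.

    Assumption: a simple stand-in for whatever similarity method the paper uses.
    """
    groups = defaultdict(list)  # application -> [(representative features, [throughputs])]
    for app, read, written, throughput in jobs:
        features = (read, written)
        for representative, perf in groups[app]:
            if similar(features, representative, tol):
                perf.append(throughput)
                break
        else:
            # No existing group matched, so this job starts a new one.
            groups[app].append((features, [throughput]))
    return groups


def variability(groups, min_jobs=2):
    """Coefficient of variation (percent) of throughput within each group."""
    report = {}
    for app, clusters in groups.items():
        for idx, (_, perf) in enumerate(clusters):
            if len(perf) >= min_jobs:  # a single job has no variability to measure
                report[(app, idx)] = 100.0 * pstdev(perf) / mean(perf)
    return report


report = variability(group_similar_jobs(JOBS))
for key, cov in sorted(report.items()):
    print(key, f"CoV = {cov:.1f}%")
```

In this toy data, three `appA` jobs perform nearly identical I/O yet one runs at roughly half the throughput of the others, so that group shows a high coefficient of variation, while the `appB` group does not. Restricting the comparison to jobs with similar I/O behavior is what lets the spread be read as variability rather than as a difference in workload.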
Costa, E., Patel, T., Schwaller, B., Brandt, J. M., & Tiwari, D. (2021). Systematically inferring I/O performance variability by examining repetitive job behavior. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society. https://doi.org/10.1145/3458817.3476186