Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning


Abstract

Deep learning is a vital technology in our lives today. Both training datasets and neural networks are growing to tackle more challenging problems with deep learning. Distributed deep neural network (DDNN) training is necessary to train models on such large datasets and networks, and HPC clusters are excellent computation environments for large-scale DDNN training. I/O performance is critical in large-scale DDNN training on HPC clusters because it is becoming a bottleneck. Most flagship-class HPC clusters have hierarchical storage systems, and quantifying the performance benefit the hierarchy provides for these workloads is necessary to design future HPC storage systems. This study presents a quantitative performance analysis of the hierarchical storage system of a flagship-class supercomputer under a DDNN workload. Our analysis shows how much performance improvement and how much additional storage volume would be required to achieve a given performance goal.
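The I/O bottleneck discussed in the abstract is typically characterized by measuring how fast a node can stream training data from a given storage tier. The sketch below is only an illustration of that kind of measurement, not the paper's methodology: it times a sequential read of a synthetic "training shard" (a made-up file created on the spot) and reports throughput in MiB/s. Running it against paths on different storage tiers would compare their read bandwidth.

```python
import os
import tempfile
import time


def measure_read_throughput(path, block_size=1 << 20):
    """Sequentially read a file in block_size chunks; return MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero-duration read of a tiny cached file.
    return total / (1 << 20) / elapsed if elapsed > 0 else float("inf")


# Create a small synthetic training shard (8 MiB of random bytes),
# then measure how fast it reads back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 << 20))
    shard = tmp.name

mib_per_s = measure_read_throughput(shard)
print(f"sequential read: {mib_per_s:.1f} MiB/s")
os.remove(shard)
```

On a real hierarchical system one would point `path` at files on each tier (e.g. node-local SSD versus the shared parallel file system) and compare the resulting numbers; the OS page cache should be dropped between runs for a fair cold-read comparison.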

Citation (APA)

Fukai, T., Sato, K., & Hirofuchi, T. (2023). Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13798 LNCS, pp. 81–93). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-29927-8_7
