Prediction of the Resource Consumption of Distributed Deep Learning Systems


Abstract

Predicting resource consumption for the distributed training of deep learning models is of paramount importance, as it can inform users a priori of how long their training will take and enable them to manage the cost of training. Yet, no such prediction is available to users because resource consumption varies significantly with "settings" such as GPU types and with "workloads" such as deep learning models. Previous studies have attempted to derive or model such a prediction, but they fall short of accommodating the various combinations of settings and workloads together. This study presents Driple, which designs graph neural networks to predict the resource consumption of diverse workloads. Driple also designs transfer learning to extend the graph neural networks to adapt to differences in settings. The evaluation results show that Driple effectively predicts a wide range of workloads and settings. In addition, Driple can reduce the time required to tailor the prediction to different settings by up to 7.3×.
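The two ideas in the abstract can be sketched together: a graph neural network pools per-layer features of a workload's computation graph into a scalar resource estimate, and transfer learning adapts that estimate to a new setting by refitting only part of the model on a few measurements. The sketch below is a hypothetical illustration under stated assumptions, not Driple's actual architecture; all names, feature choices, and the least-squares readout refit are invented for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def pooled_hidden(adj, feats, w_msg):
    """One round of mean-neighbor message passing, then mean-pool over nodes.

    adj: (n, n) adjacency matrix of the workload's layer graph.
    feats: (n, d) per-layer features (here: FLOPs and parameter count,
           in arbitrary units -- an assumed, illustrative featurization).
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # avoid division by zero
    h = np.tanh((adj @ feats / deg) @ w_msg)  # aggregate neighbor features
    return h.mean(axis=0)                     # graph-level embedding

def gnn_predict(adj, feats, w_msg, w_out):
    """Scalar resource estimate (e.g. training time) from the graph embedding."""
    return float(pooled_hidden(adj, feats, w_msg) @ w_out)

def finetune_readout(graphs, targets, w_msg):
    """Transfer-learning sketch: freeze message-passing weights, refit only
    the linear readout on a few measurements from the new setting
    (e.g. a different GPU type), via least squares."""
    X = np.stack([pooled_hidden(a, f, w_msg) for a, f in graphs])
    w_out_new, *_ = np.linalg.lstsq(X, np.asarray(targets, float), rcond=None)
    return w_out_new

# Toy "workload": a 4-layer chain (e.g. conv -> conv -> pool -> fc).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], float)
feats = np.array([[4.0, 2.0], [4.0, 2.0], [1.0, 0.0], [2.0, 8.0]])

w_msg = rng.normal(size=(2, 4))   # pretrained message weights (random here)
w_out = rng.normal(size=4)        # readout for the original setting

# Adapt to a "new setting" from a single measured (workload, time) pair.
w_out_new = finetune_readout([(adj, feats)], [12.5], w_msg)
estimate = gnn_predict(adj, feats, w_msg, w_out_new)
```

Refitting only the readout captures the spirit of adapting a pretrained predictor with little new data; with more measured workloads per setting, the same least-squares fit generalizes rather than interpolates.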

Citation (APA)

Yang, G., Shin, C., Lee, J., Yoo, Y., & Yoo, C. (2022). Prediction of the Resource Consumption of Distributed Deep Learning Systems. In Performance Evaluation Review (Vol. 50, pp. 69–70). Association for Computing Machinery. https://doi.org/10.1145/3489048.3530962
