Prior research on resource scheduling for machine learning training workloads has largely focused on minimizing job completion times. Commonly, these training workloads collectively search over a large number of parameter values that control the learning process, in a hyperparameter search. It is preferable to identify and maximally provision the best-performing hyperparameter configuration (trial) to achieve the highest-accuracy result as soon as possible. To optimally trade off evaluating multiple configurations against training the most promising ones by a fixed deadline, we design and build HyperSched, a dynamic application-level resource scheduler that tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline. HyperSched leverages three properties of a hyperparameter search workload overlooked in prior work: trial disposability, progressively identifiable rankings among different configurations, and space-time constraints. Exploiting these, it outperforms standard hyperparameter search algorithms across a variety of benchmarks.
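To make the scheduling idea concrete, here is a minimal, self-contained Python sketch of deadline-aware trial scheduling in the spirit of the abstract. All names (Trial, deadline_scheduler, the toy accuracy curve) are hypothetical illustrations, not HyperSched's actual API or algorithm: the sketch runs several trials, periodically ranks them, discards the current worst (trial disposability), and reallocates the freed resources to the leader until the deadline expires.

import time
from dataclasses import dataclass

@dataclass
class Trial:
    """One hyperparameter configuration under training (hypothetical)."""
    config_id: int
    resources: int = 1      # e.g. number of GPUs assigned
    steps: int = 0
    accuracy: float = 0.0
    alive: bool = True

    def train_step(self):
        # Toy stand-in for one training round: accuracy improves at a
        # rate set by the (deterministic, made-up) quality of this
        # config, scaled by the resources currently assigned to it.
        quality = 0.5 + (self.config_id % 7) / 14
        self.steps += self.resources
        self.accuracy = quality * (1 - 0.9 ** self.steps)

def deadline_scheduler(num_trials=8, deadline_s=2.0):
    """Train trials until the deadline, pruning and reallocating."""
    trials = [Trial(config_id=i) for i in range(num_trials)]
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        for t in trials:
            if t.alive:
                t.train_step()
        # Progressively identifiable rankings: as training advances,
        # the relative ordering of configs stabilizes, so pruning the
        # current worst trial becomes increasingly safe.
        live = sorted((t for t in trials if t.alive),
                      key=lambda t: t.accuracy, reverse=True)
        if len(live) > 1:
            worst = live[-1]
            worst.alive = False          # trial disposability
            # Reallocate the freed resources to the current leader.
            live[0].resources += worst.resources
            worst.resources = 0
        time.sleep(deadline_s / (num_trials + 2))
    return max(trials, key=lambda t: t.accuracy)

if __name__ == "__main__":
    best = deadline_scheduler()
    print(f"best config {best.config_id}: accuracy={best.accuracy:.3f}")

The key design point the sketch illustrates is the shift from minimizing per-job completion time to maximizing the accuracy of the single surviving trial by the deadline; the real system's policy for when and how aggressively to prune is more sophisticated than this one-per-round rule.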
Citation:
Liaw, R., Bhardwaj, R., Dunlap, L., Zou, Y., Gonzalez, J. E., Stoica, I., & Tumanov, A. (2019). HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline. In SoCC 2019 - Proceedings of the ACM Symposium on Cloud Computing (pp. 61–73). Association for Computing Machinery. https://doi.org/10.1145/3357223.3362719