An Advantage Actor-Critic Deep Reinforcement Learning Method for Power Management in HPC Systems

Abstract

A primary concern when deploying a High-Performance Computing (HPC) system is its high energy consumption. Typical HPC systems consist of hundreds to thousands of compute nodes that consume a huge amount of electrical power even when idle. One way to increase energy efficiency is to apply the backfilling method to the First Come First Serve (FCFS) job scheduler (FCFS+Backfilling). Backfilling allows jobs that arrive later than the first job in the queue to be executed earlier, provided that the starting time of the first job is not affected, thereby increasing the throughput and the energy efficiency of the system. Nodes that have been idle for a specific amount of time can also be switched off to further improve energy efficiency. However, switching off nodes based only on their idle time can impair energy efficiency and throughput instead of improving them. For example, new jobs may arrive immediately after nodes are switched off, so the chance to execute those jobs directly via backfilling is missed. This paper proposes a Deep Reinforcement Learning (DRL)-based method to predict the most appropriate timing to switch nodes on and off. A DRL agent is trained with the Advantage Actor-Critic algorithm to decide which nodes must be switched on or off at a specific timestep. Our simulation results on the NASA iPSC/860 HPC historical job dataset show that the proposed method can reduce the total energy consumption compared to most of the conventional timeout policies, which switch off nodes after they have been idle for some period of time.
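To make the trade-off behind timeout policies concrete, the following is a minimal sketch of the energy one node spends across its idle gaps under a fixed timeout. It is not the paper's simulator: the function name `idle_energy`, the power figures, and the switch-on overhead are all illustrative assumptions, and the delay a job suffers while a node boots is ignored.

```python
def idle_energy(gaps, timeout, p_idle=100.0, p_off=10.0, e_switch=5000.0):
    """Energy in joules that one node spends across its idle gaps.

    gaps     -- idle gap lengths in seconds between consecutive jobs
    timeout  -- idle seconds after which the node is switched off
    p_idle   -- assumed idle power draw in watts (hypothetical value)
    p_off    -- assumed power draw while switched off (hypothetical value)
    e_switch -- assumed energy cost of one off/on cycle (hypothetical value)
    """
    total = 0.0
    for gap in gaps:
        if gap <= timeout:
            # The next job arrives before the timeout: the node stays idle.
            total += p_idle * gap
        else:
            # Idle until the timeout, off for the rest, then pay the cycle cost.
            total += p_idle * timeout + p_off * (gap - timeout) + e_switch
    return total


short_gaps = [30, 40, 50]  # jobs arrive soon after the node becomes idle
long_gaps = [30, 600]      # one long quiet period

always_on = float("inf")   # a timeout that never fires
print(idle_energy(short_gaps, timeout=0))          # eager switch-off
print(idle_energy(short_gaps, timeout=always_on))  # never switch off
print(idle_energy(long_gaps, timeout=60))
print(idle_energy(long_gaps, timeout=always_on))
```

Under these assumed numbers, switching off immediately costs more than staying on when the gaps are short, while a 60-second timeout saves energy when a long gap occurs. No single fixed timeout is best for both arrival patterns, which is the kind of decision the abstract proposes to leave to a learned agent.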

Cite

Khasyah, F. R., Santiyuda, K. G., Kaunang, G., Makhrus, F., Amrizal, M. A., & Takizawa, H. (2023). An Advantage Actor-Critic Deep Reinforcement Learning Method for Power Management in HPC Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13798 LNCS, pp. 94–107). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-29927-8_8