Deep Reinforcement Learning (DRL) has shown extraordinary performance on a variety of challenging learning tasks, especially in games. It has been recognized that DRL training is a highly dynamic, non-stationary optimization process even in static environments, and that its performance is notoriously sensitive to the hyperparameter configuration, which includes the learning rate, discount coefficient, and step size, among others. The situation becomes more serious when DRL is conducted in a changing environment. Ideally, the hyperparameters would promptly self-adapt to the best values for the current learning state, rather than staying fixed for the whole course of training as in most previous work. In this paper, an efficient online hyperparameter adaptation method is presented that improves on the Population-based Training (PBT) method in the promptness of adaptation. A recombination operation inspired by genetic algorithms (GA) is introduced into the population adaptation to accelerate the convergence of the population towards better hyperparameter configurations. Experimental results show that, in four test environments, the presented method achieves performance improvements of 92%, 70%, 2%, and 15% over PBT.
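The abstract does not spell out the recombination operator, so the following is only a minimal sketch of the general idea, assuming uniform crossover between two top-performing members followed by a PBT-style multiplicative perturbation. The function names, the `frac` and `factors` parameters, and the hyperparameter keys are hypothetical and are not taken from the paper; copying of network weights and bounds checking on hyperparameter values are omitted.

```python
import random


def recombine(parent_a, parent_b, rng):
    """Uniform crossover (assumed operator): pick each value from one parent at random."""
    return {k: (parent_a[k] if rng.random() < 0.5 else parent_b[k]) for k in parent_a}


def perturb(hparams, rng, factors=(0.8, 1.2)):
    """PBT-style explore step: scale each value by a randomly chosen factor."""
    return {k: v * rng.choice(factors) for k, v in hparams.items()}


def adapt_population(population, scores, rng, frac=0.25):
    """Replace the worst `frac` of members with perturbed recombinations of top members.

    population: list of hyperparameter dicts (same keys in every dict).
    scores: list of floats, higher is better.
    Returns a new list; members outside the bottom fraction are copied unchanged.
    """
    n = len(population)
    k = max(1, int(n * frac))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top, bottom = order[:k], order[-k:]
    new_pop = [dict(h) for h in population]
    for i in bottom:
        a = population[rng.choice(top)]
        b = population[rng.choice(top)]
        new_pop[i] = perturb(recombine(a, b, rng), rng)
    return new_pop


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population of 8 workers with two hyperparameters each.
    pop = [{"lr": 10.0 ** -(i + 1), "step_size": float(i + 1)} for i in range(8)]
    scores = [float(i) for i in range(8)]  # worker 7 is currently the best
    for member in adapt_population(pop, scores, rng):
        print(member)
```

In an online setting, a step like `adapt_population` would be called periodically during training, so that poorly performing workers inherit mixed configurations from better ones instead of copying a single parent.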
CITATION STYLE
Zhou, Y., Liu, W., & Li, B. (2019). Efficient online hyperparameter adaptation for deep reinforcement learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11454 LNCS, pp. 141–155). Springer Verlag. https://doi.org/10.1007/978-3-030-16692-2_10