Reinforcement learning allows learning a desired control policy in different environments without an explicit model of the system dynamics. The model-free deep Q-learning algorithm has proven efficient on a large set of discrete-action tasks. Its extension to continuous control is usually handled by actor-critic methods, which approximate the policy with an additional actor network and use the Q function to speed up the actor's training. Another approach is to discretize the action space, but this yields a non-smooth policy and does not scale to large action spaces. Direct derivation of a continuous policy from the Q network requires optimizing over the action input at every inference and training step, which is inefficient but yields an optimal, continuous action. A time-efficient optimization of the Q-function action input is therefore required to make this method practical. In this work, we implement an efficient action-derivation method that allows Q-learning to be used in real-time continuous control tasks. In addition, we test our algorithm on robotics control tasks from robotics gym environments and compare it with modern continuous RL methods. The results show that in some cases the proposed approach learns a smooth continuous policy while keeping the implementation simplicity of the original discrete-action Q-learning algorithm.
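The core idea of direct policy derivation is to treat the action as a free input of the learned Q function and maximize Q(s, a) over a at decision time. A minimal sketch of this step, assuming a differentiable Q function and a simple projected gradient ascent on the action (the function names and the toy quadratic Q are illustrative, not the authors' implementation):

```python
import numpy as np

def derive_action(q_grad, a_init, lr=0.1, steps=50, bounds=(-1.0, 1.0)):
    """Derive a continuous action by gradient ascent on the action
    input of a fixed Q function.

    q_grad(a) returns dQ/da at the current action; the state s is
    folded into q_grad via a closure.  Actions are projected back
    into the valid bounds after every step.  Hypothetical helper,
    sketching the approach rather than the paper's exact method.
    """
    a = np.clip(np.asarray(a_init, dtype=float), *bounds)
    for _ in range(steps):
        a = np.clip(a + lr * q_grad(a), *bounds)
    return a

# Toy example: Q(s, a) = -(a - 0.5*s)^2 is maximized at a* = 0.5*s.
s = 0.8
q_grad = lambda a: -2.0 * (a - 0.5 * s)   # analytic dQ/da for the toy Q
a_star = derive_action(q_grad, a_init=0.0)  # converges to ~0.4
```

In practice the gradient dQ/da would come from automatic differentiation through the Q network, and the cost of running this inner optimization on every inference and training step is exactly the efficiency bottleneck the paper addresses.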
Akhmetzyanov, A., Yagfarov, R., Gafurov, S., Ostanin, M., & Klimchik, A. (2020). Continuous Control in Deep Reinforcement Learning with Direct Policy Derivation from Q Network. In Advances in Intelligent Systems and Computing (Vol. 1152 AISC, pp. 168–174). Springer. https://doi.org/10.1007/978-3-030-44267-5_25