Continuous Control in Deep Reinforcement Learning with Direct Policy Derivation from Q Network

Abstract

The reinforcement learning approach allows learning a desired control policy in different environments without explicitly providing the system dynamics. The model-free deep Q-learning algorithm has proven to be efficient on a large set of discrete-action tasks. Extending this method to continuous control is usually done with actor-critic methods, which approximate the policy function with an additional actor network and use the Q function to speed up policy network training. Another approach is to discretize the action space, which does not yield a smooth policy and is not applicable to large action spaces. Deriving a continuous policy directly from the Q network requires optimizing the action at each inference and training step, which is inefficient but provides an optimal, continuous action. Time-efficient optimization of the Q-function input is therefore required to apply this method in practice. In this work, we implement an efficient action derivation method that allows using Q-learning in real-time continuous control tasks. In addition, we test our algorithm on robotics control tasks from robotics gym environments and compare this method with modern continuous RL methods. The results show that in some cases the proposed approach learns a smooth continuous policy while keeping the implementation simplicity of the original discrete-action-space Q-learning algorithm.
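The abstract does not include an implementation, but the core idea of deriving the action directly from the Q network can be sketched as optimizing the action input while the network weights stay fixed. The following minimal PyTorch sketch is illustrative only: the framework choice, the names QNetwork and derive_action, and the hyperparameters (steps, lr, network sizes) are assumptions, not the authors' actual method.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q(s, a) approximator over a continuous action space (illustrative)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def derive_action(q_net, state, action_dim, action_low, action_high,
                  steps=10, lr=0.1):
    """Approximate argmax_a Q(s, a) by gradient ascent on the action input."""
    # Start from the midpoint of the action range (an arbitrary choice).
    action = torch.full((state.shape[0], action_dim),
                        (action_low + action_high) / 2.0,
                        requires_grad=True)
    optimizer = torch.optim.Adam([action], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Ascend Q by descending its negation; restrict the backward pass
        # to the action so the Q-network weights receive no gradients.
        loss = -q_net(state, action).sum()
        loss.backward(inputs=[action])
        optimizer.step()
        # Project back into the valid action box after each step.
        with torch.no_grad():
            action.clamp_(action_low, action_high)
    return action.detach()

# Example usage with illustrative dimensions:
q = QNetwork(state_dim=8, action_dim=2)
s = torch.randn(1, 8)  # one observed state
a = derive_action(q, s, action_dim=2, action_low=-1.0, action_high=1.0)
```

Running this inner optimization at every inference and training step is exactly the cost the paper targets; the number of ascent steps trades off action optimality against real-time performance.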

Citation (APA)

Akhmetzyanov, A., Yagfarov, R., Gafurov, S., Ostanin, M., & Klimchik, A. (2020). Continuous Control in Deep Reinforcement Learning with Direct Policy Derivation from Q Network. In Advances in Intelligent Systems and Computing (Vol. 1152 AISC, pp. 168–174). Springer. https://doi.org/10.1007/978-3-030-44267-5_25
