Guided deterministic policy optimization with gradient-free policy parameters information

Abstract

Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) are two classical deterministic policy gradient algorithms. In both, the policy update depends entirely on the gradient of the critics, which makes the policy unstable during learning and prone to converging to a local optimum. Although maximum entropy learning can provide more effective exploration, it applies only to algorithms that use a stochastic policy, not to DDPG and TD3. In this paper, we propose a deterministic policy optimization method that incorporates gradient-free policy parameters information (GFPPI). Specifically, we obtain a new set of policies by injecting Gaussian noise into the policy parameters, and then weight these policy parameters according to the critics to obtain GFPPI. GFPPI is then used as a regularization term in the policy optimization objective to guide the policy update. GFPPI can mitigate premature policy convergence and facilitate exploration based on optimistic principles. We provide a theoretical guarantee of monotonic improvement of the expected cumulative return when using the loss function augmented with GFPPI, experimentally analyze the role of GFPPI in policy optimization, and combine it with deterministic policy gradient information for policy optimization. Experiments on OpenAI Gym demonstrate that GFPPI improves sample efficiency and enables the algorithm to achieve higher performance.
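The three steps named in the abstract (perturb the policy parameters with Gaussian noise, weight the perturbed parameters using the critic, and use the result as a regularizer) can be illustrated with a small NumPy sketch. The abstract does not give the weighting scheme or the form of the regularization term, so everything specific here is an assumption made for illustration: the linear policy, the hand-written `toy_critic`, the softmax weighting with a `temperature`, the squared-distance penalty with coefficient `beta`, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def policy(theta, states):
    """Deterministic linear policy a = s @ theta (toy stand-in for an actor network)."""
    return states @ theta


def toy_critic(states, actions):
    """Hypothetical critic Q(s, a): largest when the action equals -state."""
    return -np.sum((actions + states) ** 2, axis=1)


def gfppi_target(theta, states, critic, n_samples=16, sigma=0.1,
                 temperature=1.0, rng=None):
    """Build a gradient-free parameter target from noisy copies of theta.

    Assumption: candidates are weighted by a softmax over their mean critic
    value on the batch; the paper's actual weighting may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: inject Gaussian noise into the policy parameters.
    noise = rng.standard_normal((n_samples,) + theta.shape)
    candidates = theta + sigma * noise
    # Step 2: score each perturbed policy with the critic (no gradients used).
    scores = np.array([critic(states, policy(c, states)).mean()
                       for c in candidates])
    logits = (scores - scores.max()) / temperature  # shift for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    # Critic-weighted average of the perturbed parameters.
    target = np.tensordot(weights, candidates, axes=1)
    return target, weights


def regularized_actor_loss(theta, target, states, critic, beta=0.5):
    """Step 3: deterministic policy objective plus a pull toward the target.

    Assumption: the regularizer is a squared parameter distance scaled by beta.
    """
    dpg_term = -critic(states, policy(theta, states)).mean()
    guide_term = 0.5 * beta * np.sum((theta - target) ** 2)
    return dpg_term + guide_term
```

In this sketch, a larger `sigma` spreads the candidate policies further from the current parameters, while a smaller `temperature` concentrates the weights on the candidates the critic rates highest. With `beta=0` the loss reduces to the plain critic-based objective.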

Cite

Shen, C., Zhu, S., Han, S., Gong, X., & Lü, S. (2023). Guided deterministic policy optimization with gradient-free policy parameters information. Expert Systems with Applications, 231. https://doi.org/10.1016/j.eswa.2023.120693
