Guided deterministic policy optimization with gradient-free policy parameters information

Chun Shen; Sheng Zhu; Shuai Han; Xiaoyu Gong; Shuai Lü

Journal Article

Expert Systems with Applications (2023) 231

DOI: 10.1016/j.eswa.2023.120693

9 Citations

6 Readers

Abstract

Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) are two classical deterministic policy gradient algorithms. In both, the policy update depends entirely on the gradient of the critics, which makes the policy unstable and prone to converging to a local optimum during learning. Although maximum entropy learning can provide more effective exploration, it applies only to algorithms that use a stochastic policy, not to DDPG and TD3. In this paper, we propose a deterministic policy optimization method that incorporates gradient-free policy parameters information (GFPPI). Specifically, we obtain a new set of policies by injecting Gaussian noise into the policy parameters, and then weight these policy parameters based on the critics to obtain GFPPI. Finally, GFPPI is used as a regularization term in the policy optimization function to guide the policy update. GFPPI can mitigate premature policy convergence and facilitate exploration following optimistic principles. We provide a theoretical guarantee of monotonic improvement of the expected cumulative return when using the augmented loss function with GFPPI, experimentally analyze the role of GFPPI in policy optimization, and combine it with deterministic policy gradient information for policy optimization. Experiments on OpenAI Gym demonstrate that GFPPI improves sample efficiency and enables the algorithm to achieve higher performance.
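The three steps named in the abstract (perturb the policy parameters with Gaussian noise, weight the perturbed parameters using the critics, and use the result as a regularizer) can be illustrated with a minimal NumPy sketch. The abstract does not give the weighting rule or the form of the regularization term, so the softmax weighting, the squared L2 penalty, and every name here (`gfppi_target`, `regularized_loss`, `score_fn`, `sigma`, `temperature`, `beta`) are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def gfppi_target(theta, score_fn, n_samples=8, sigma=0.1, temperature=1.0, rng=None):
    """Critic-weighted average of Gaussian-perturbed policy parameters.

    theta    : flat policy parameter vector, shape (d,)
    score_fn : hypothetical stand-in for a critic-based evaluation of a
               parameter vector (higher is better)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: inject Gaussian noise into the parameters, one candidate per row.
    candidates = theta + sigma * rng.standard_normal((n_samples, theta.size))
    # Step 2: score every candidate policy.
    scores = np.array([score_fn(c) for c in candidates])
    # ASSUMPTION: softmax weights over the scores (the abstract only says
    # the parameters are weighted "based on critics").
    logits = (scores - scores.max()) / temperature
    weights = np.exp(logits)
    weights /= weights.sum()
    # The weighted combination plays the role of the gradient-free information.
    return weights @ candidates, weights


def regularized_loss(theta, dpg_loss, target, beta=0.5):
    """Policy loss augmented with a pull toward the gradient-free target.

    ASSUMPTION: a squared L2 penalty with coefficient beta; the paper's
    actual regularization term may differ.
    """
    return dpg_loss + beta * np.sum((theta - target) ** 2)


# Toy usage: the score prefers parameters close to a vector of ones.
rng = np.random.default_rng(0)
theta = np.zeros(3)
target, weights = gfppi_target(theta, lambda c: -np.sum((c - 1.0) ** 2), rng=rng)
loss = regularized_loss(theta, dpg_loss=2.0, target=target)
```

In this sketch the target needs no gradient of the critic with respect to the policy parameters; only forward evaluations of `score_fn` are used, which is the sense in which the information is gradient-free.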

Cite

Shen, C., Zhu, S., Han, S., Gong, X., & Lü, S. (2023). Guided deterministic policy optimization with gradient-free policy parameters information. Expert Systems with Applications, 231. https://doi.org/10.1016/j.eswa.2023.120693
