POLICY GRADIENT METHODS FOR DISCRETE TIME LINEAR QUADRATIC REGULATOR WITH RANDOM PARAMETERS

Deyue Li

Journal Article, Open Access

ESAIM - Control, Optimisation and Calculus of Variations (2024) 30

DOI: 10.1051/cocv/2024014

Abstract

This paper studies an infinite-horizon optimal control problem for a discrete-time linear system and quadratic criteria, both with random parameters that are independent and identically distributed with respect to time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of statistical information about the parameters. We investigate the sub-Gaussianity of the state process and establish a global linear convergence guarantee for this approach under assumptions that are weaker and easier to verify than those in existing results. Numerical experiments are presented to illustrate our result.
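To make the setting in the abstract concrete, the following is a minimal sketch of a model-free policy gradient iteration on a scalar linear quadratic problem with i.i.d. random parameters. It is an illustration under stated assumptions, not the paper's algorithm: the scalar dynamics, the uniform parameter distributions, the finite truncation horizon, the two-point finite-difference gradient estimate, the step size, and all function names are hypothetical choices made for this example. The gradient is estimated only from sampled costs, so the code never uses the means or variances of the parameters.

```python
import random


def rollout_cost(k, rng, horizon=30):
    """Cost of one simulated trajectory under the linear feedback u_t = -k * x_t."""
    x = rng.gauss(0.0, 1.0)  # random initial state (assumed distribution)
    cost = 0.0
    for _ in range(horizon):  # finite truncation of the infinite-horizon cost
        # i.i.d. random system and cost parameters (assumed distributions)
        a = rng.uniform(0.8, 1.2)
        b = rng.uniform(0.8, 1.2)
        q = rng.uniform(0.5, 1.5)
        r = rng.uniform(0.5, 1.5)
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost


def estimate_cost(k, seed, n_rollouts=100):
    """Monte Carlo estimate of the truncated cost J(k) from sampled rollouts."""
    rng = random.Random(seed)
    return sum(rollout_cost(k, rng) for _ in range(n_rollouts)) / n_rollouts


def policy_gradient(k0=0.2, step=0.01, delta=0.05, iterations=60, seed=0):
    """Gradient descent on the gain k using two-point finite-difference estimates."""
    seeds = random.Random(seed)
    k = k0  # k0 is assumed to be a mean-square stabilizing gain
    for _ in range(iterations):
        s = seeds.randrange(2**31)
        # Both evaluations share one seed (common random numbers). This is a
        # simulator convenience that reduces the variance of the difference.
        grad = (estimate_cost(k + delta, s) - estimate_cost(k - delta, s)) / (2 * delta)
        k -= step * grad
    return k
```

Calling `policy_gradient()` returns the learned feedback gain, and `estimate_cost` can then be used to compare the sampled cost at that gain with the cost at the initial gain `k0`. The sketch starts from a stabilizing gain because the sampled cost grows rapidly once the closed loop is no longer mean-square stable.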

Cite (APA)

Li, D. (2024). POLICY GRADIENT METHODS FOR DISCRETE TIME LINEAR QUADRATIC REGULATOR WITH RANDOM PARAMETERS. ESAIM - Control, Optimisation and Calculus of Variations, 30. https://doi.org/10.1051/cocv/2024014
