Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient Method and Global Convergence

Citations of this article: 11
Mendeley readers (users who have this article in their library): 10

This article is free to access.

Abstract

Recently, policy optimization has received renewed attention from the control community due to various applications in reinforcement learning tasks. In this article, we investigate the global convergence of the gradient method for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS, with static state-feedback controllers and quadratic performance costs. Although the resulting problem is nonconvex, we identify several useful properties: coercivity, gradient dominance, and smoothness. Based on these properties, we prove that the gradient method converges to the optimal state-feedback controller for MJLS at a linear rate, provided it is initialized at a mean-square stabilizing controller. This article brings new insights for understanding the performance of the policy gradient method on the Markovian jump linear quadratic control problem.
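
To make the setting concrete, the following is a minimal sketch of gradient descent on the feedback gains of a toy two-mode scalar MJLS. It is not taken from the article: the system values, the transition matrix, the step size, and all function names are hypothetical, and the gradient expression is the chain-rule derivative of the cost for this scalar case only. The cost is computed from coupled Lyapunov equations, and the run starts from a mean-square stabilizing gain, as the abstract requires.

```python
import numpy as np

# Hypothetical two-mode scalar MJLS (all values are illustrative assumptions):
#   x[t+1] = A[w] * x[t] + B[w] * u[t],   u[t] = -K[w] * x[t],
# where w is the current mode of a Markov chain.
A = np.array([1.2, 0.7])
B = np.array([1.0, 0.5])
Q, R = 1.0, 1.0                      # stage cost Q * x^2 + R * u^2
P_TRANS = np.array([[0.8, 0.2],      # P_TRANS[i, j] = Prob(next mode j | mode i)
                    [0.3, 0.7]])
RHO = np.array([0.5, 0.5])           # initial mode distribution, with E[x0^2] = 1


def closed_loop_sq(K):
    """Squared closed-loop gain (A_i - B_i K_i)^2 for each mode."""
    return (A - B * K) ** 2


def is_ms_stable(K):
    """Mean-square stability: the second-moment recursion must contract."""
    T = np.diag(closed_loop_sq(K)) @ P_TRANS
    return bool(np.max(np.abs(np.linalg.eigvals(T))) < 1.0)


def cost_and_grad(K):
    """Cost C(K) and its gradient for a mean-square stabilizing K."""
    D = np.diag(closed_loop_sq(K))
    I = np.eye(len(K))
    # Coupled Lyapunov equations: P_i = Q + R K_i^2 + a_i^2 * sum_j p_ij P_j
    P = np.linalg.solve(I - D @ P_TRANS, Q + R * K ** 2)
    # Accumulated second moments per mode: X_j = rho_j + sum_i p_ij a_i^2 X_i
    X = np.linalg.solve(I - P_TRANS.T @ D, RHO)
    E = P_TRANS @ P                  # E_i = sum_j p_ij P_j
    grad = 2.0 * (R * K - B * E * (A - B * K)) * X
    return float(RHO @ P), grad


def gradient_descent(K0, step=0.02, iters=2000):
    """Plain gradient descent on the gains, started at a stabilizing K0."""
    K = np.array(K0, dtype=float)
    if not is_ms_stable(K):
        raise ValueError("initial gain must be mean-square stabilizing")
    costs = []
    for _ in range(iters):
        c, g = cost_and_grad(K)
        costs.append(c)
        K = K - step * g
    costs.append(cost_and_grad(K)[0])
    return K, costs


K_final, costs = gradient_descent([0.5, 0.0])
print("final gains:", K_final)
print("initial cost:", costs[0], "final cost:", costs[-1])
```

The fixed step size here is chosen conservatively for this toy example; it is an assumption, not a value prescribed by the article. Plotting the gap between `costs` and the final cost on a log scale is one way to look for the straight-line decay that a linear convergence rate would produce.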

Cite

Jansch-Porto, J. P., Hu, B., & Dullerud, G. E. (2023). Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient Method and Global Convergence. IEEE Transactions on Automatic Control, 68(4), 2475–2482. https://doi.org/10.1109/TAC.2022.3176439
