A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Abstract

Constrained Markov decision processes (CMDPs) formalize sequential decision-making problems in which the objective is to minimize one cost function while satisfying constraints on other cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm that leverages the linear programming formulation of the finite-horizon CMDP for repeated optimistic planning and provides a probably approximately correct (PAC) guarantee on the number of episodes needed to obtain a near-optimal policy, i.e., one whose objective value is close to the optimal value and which satisfies the constraints within low tolerance, with high probability. The number of episodes needed is shown to depend linearly on the sizes of the state and action spaces and quadratically on the time horizon and on an upper bound on the number of possible successor states for a state-action pair. Therefore, if this upper bound is much smaller than the size of the state space, the number of episodes needed is linear in the sizes of the state and action spaces and quadratic in the time horizon.
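The linear programming formulation underlying the algorithm can be illustrated in the simpler known-model planning case: a finite-horizon CMDP is solved exactly as an LP over occupancy measures, with Bellman-flow equalities defining valid occupancies, the constraint cost as a linear inequality, and the objective cost as the linear objective. The sketch below is a minimal illustration under that assumption, not the authors' online algorithm; the instance (`P`, `c`, `d`, `budget`) is made up, and the transition model is taken as known rather than estimated optimistically from episodes.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical small instance (illustrative only): H steps, S states, A actions.
H, S, A = 3, 2, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition kernel
c = rng.uniform(0, 1, size=(S, A))           # objective cost c(s, a)
d = rng.uniform(0, 1, size=(S, A))           # constraint cost d(s, a)
mu0 = np.array([1.0, 0.0])                   # initial state distribution
# Threshold chosen so a feasible policy (argmin_a d(s, a) at every state) exists.
budget = H * float(d.min(axis=1).max())

n = H * S * A                                # one occupancy variable q[h, s, a]
idx = lambda h, s, a: (h * S + s) * A + a

# Bellman-flow equalities defining valid finite-horizon occupancy measures.
A_eq, b_eq = [], []
for s in range(S):                           # layer h = 0: sum_a q[0, s, a] = mu0(s)
    row = np.zeros(n)
    for a in range(A):
        row[idx(0, s, a)] = 1.0
    A_eq.append(row); b_eq.append(mu0[s])
for h in range(1, H):                        # flow in = flow out at each (h, s')
    for s2 in range(S):
        row = np.zeros(n)
        for a in range(A):
            row[idx(h, s2, a)] = 1.0
        for s in range(S):
            for a in range(A):
                row[idx(h - 1, s, a)] -= P[s, a, s2]
        A_eq.append(row); b_eq.append(0.0)

# Single inequality: expected cumulative constraint cost <= budget.
A_ub = np.zeros((1, n))
for h in range(H):
    for s in range(S):
        for a in range(A):
            A_ub[0, idx(h, s, a)] = d[s, a]
obj = np.array([c[s, a] for h in range(H) for s in range(S) for a in range(A)])

res = linprog(obj, A_ub=A_ub, b_ub=[budget], A_eq=np.vstack(A_eq), b_eq=b_eq,
              bounds=(0, None), method="highs")
print("optimal constrained cost:", res.fun)
```

An optimal (possibly stochastic) policy is recovered by normalizing the occupancies, `pi[h, s, a] = q[h, s, a] / sum_a q[h, s, a]`; the paper's algorithm solves LPs of this shape repeatedly with optimistic confidence sets in place of the true `P`.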

Citation (APA)

Kalagarla, K. C., Jain, R., & Nuzzo, P. (2021). A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 9B, pp. 8030–8037). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i9.16979
