Policy iteration based on stochastic factorization

Abstract

When a transition probability matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix that retains some fundamental characteristics of the original. Since the derived matrix can be much smaller than its precursor, this property can be exploited to create a compact version of a Markov decision process (MDP), and hence to reduce the computational cost of dynamic programming. Building on this idea, this paper presents an approximate policy iteration algorithm called policy iteration based on stochastic factorization, or PISF for short. In terms of computational complexity, PISF replaces standard policy iteration's cubic dependence on the size of the MDP with a function that grows only linearly with the number of states in the model. The proposed algorithm also enjoys nice theoretical properties: it always terminates after a finite number of iterations and returns a decision policy whose performance only depends on the quality of the stochastic factorization. In particular, if the approximation error in the factorization is sufficiently small, PISF computes the optimal value function of the MDP. The paper also discusses practical ways of factoring an MDP and illustrates the usefulness of the proposed algorithm with an application involving a large-scale decision problem of real economical interest. © 2014 AI Access Foundation. All rights reserved.
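The factor-swapping property described above can be illustrated with a small NumPy sketch. The matrices here are random stochastic matrices, not a factorization of any particular MDP, and the sizes are arbitrary; the point is only that swapping the factors of an n x m / m x n stochastic factorization yields a smaller m x m matrix that is itself a valid transition matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n states in the original MDP, m << n factor dimensions.
n, m = 6, 2

# Two random stochastic matrices (nonnegative, rows sum to 1).
D = rng.random((n, m))
D /= D.sum(axis=1, keepdims=True)   # n x m stochastic
K = rng.random((m, n))
K /= K.sum(axis=1, keepdims=True)   # m x n stochastic

P = D @ K        # n x n transition matrix represented by the factorization
P_small = K @ D  # m x m transition matrix obtained by swapping the factors

# Both products are stochastic: rows are nonnegative and sum to 1.
assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)
assert np.all(P_small >= 0) and np.allclose(P_small.sum(axis=1), 1.0)
```

Dynamic programming can then be carried out on the m x m matrix `P_small` rather than the n x n matrix `P`, which is the source of the computational savings the abstract refers to.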

Citation (APA)

Barreto, A. M. S., Pineau, J., & Precup, D. (2014). Policy iteration based on stochastic factorization. Journal of Artificial Intelligence Research, 50, 763–803. https://doi.org/10.1613/jair.4301
