In this paper, we study new reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) with an average reward criterion. Starting from the discrete-time form of the Bellman optimality equation, we derive novel RL algorithms in a straightforward way from incremental value iteration (IVI), stochastic shortest path (SSP) value iteration, and a bisection method. These algorithms use IVI, SSP, and bisection, respectively, to directly estimate the optimal average reward, which addresses the instability of average reward RL. Finally, a simulation experiment compares the convergence of these algorithms.
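The bisection idea described in the abstract can be sketched as follows. For a unichain SMDP, the optimal average reward ρ* is the unique zero of the decreasing function f(ρ) = max_π (r̄_π − ρ τ̄_π), where r̄_π and τ̄_π are the stationary per-transition reward and sojourn time under policy π. One can therefore bisect on ρ, estimating the sign of f(ρ) by running value iteration on the MDP with modified rewards r(s, a) − ρ·τ(s, a). The sketch below is not the authors' implementation; the two-state toy SMDP, the function names, and the iteration counts are all illustrative assumptions.

```python
import numpy as np

def gain_estimate(r, tau, P, rho, n_iter=4000):
    """Estimate f(rho): the optimal per-transition average reward of the
    MDP with modified rewards r(s, a) - rho * tau(s, a), via the Cesaro
    limit (V_N(s0) - V_0(s0)) / N of value iteration (toy sketch)."""
    V = np.zeros(r.shape[0])
    for _ in range(n_iter):
        # Bellman backup: Q(s, a) = r(s, a) - rho * tau(s, a) + sum_s' P(s, a, s') V(s')
        Q = r - rho * tau + P @ V
        V = Q.max(axis=1)
    return V[0] / n_iter

def bisect_rho(r, tau, P, lo=0.0, hi=10.0, tol=1e-4):
    """Bisect on rho: f is decreasing, so f(rho) > 0 means rho < rho*."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gain_estimate(r, tau, P, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative two-state SMDP: the system alternates between states 0 and 1.
# State 0 offers action 0 (reward 4, sojourn 2) or action 1 (reward 3, sojourn 1);
# state 1 has a single effective action (reward 2, sojourn 1, duplicated).
r = np.array([[4.0, 3.0], [2.0, 2.0]])      # r(s, a)
tau = np.array([[2.0, 1.0], [1.0, 1.0]])    # tau(s, a)
P = np.zeros((2, 2, 2))                     # P(s, a, s')
P[0, :, 1] = 1.0                            # state 0 -> state 1
P[1, :, 0] = 1.0                            # state 1 -> state 0
# Optimal policy picks action 1 in state 0: rho* = (3 + 2) / (1 + 1) = 2.5
print(bisect_rho(r, tau, P))
```

In this toy instance the two stationary policies yield average rewards (4 + 2)/(2 + 1) = 2 and (3 + 2)/(1 + 1) = 2.5, so the bisection converges to 2.5. The paper's actual algorithms presumably refine this scheme with online, sample-based estimates rather than exact backups.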
CITATION STYLE
Yang, J., Li, Y., Chen, H., & Li, J. (2017). Average reward reinforcement learning for semi-Markov decision processes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10634 LNCS, pp. 768–777). Springer Verlag. https://doi.org/10.1007/978-3-319-70087-8_79