Average reward reinforcement learning for semi-Markov decision processes


Abstract

In this paper, we study new reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) under the average reward criterion. Starting from the discrete-time Bellman optimality equation, we derive novel RL algorithms in a straightforward way using incremental value iteration (IVI), stochastic shortest path (SSP) value iteration, and a bisection method. These algorithms directly estimate the optimal average reward via IVI, SSP value iteration, and dichotomy, respectively, thereby addressing the instability of average reward RL. Finally, a simulation experiment compares the convergence of the three algorithms.
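The dichotomy (bisection) idea can be illustrated, very roughly, on a toy model. The sketch below is a planning-style illustration with made-up numbers, not the paper's sample-based RL algorithms: it bisects on the average reward ρ, using relative value iteration on the embedded MDP with modified rewards r − ρτ, whose optimal gain is strictly decreasing in ρ (since sojourn times τ are positive) and crosses zero exactly at the optimal average reward. All model data and helper names here are hypothetical.

```python
import numpy as np

# Toy two-state, two-action SMDP (hypothetical numbers, not from the paper).
# r[s, a]   : expected reward of a transition from state s under action a
# tau[s, a] : expected sojourn time of that transition (all positive)
# P[s, a, s']: transition probabilities of the embedded Markov chain
r = np.array([[2.0, 1.0],
              [0.5, 3.0]])
tau = np.array([[1.0, 2.0],
                [1.0, 4.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])

def embedded_gain(rho, iters=500):
    """Optimal gain (per transition) of the embedded MDP with modified
    rewards r - rho * tau, computed by relative value iteration."""
    h = np.zeros(2)
    g = 0.0
    for _ in range(iters):
        t = (r - rho * tau + P @ h).max(axis=1)
        g = t[0]   # gain estimate, read off at reference state 0
        h = t - g  # subtract the gain to keep relative values bounded
    return g

def optimal_average_reward(lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection on rho: the gain of the modified problem decreases in
    rho, and the optimal average reward is its unique root."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if embedded_gain(mid) > 0.0:
            lo = mid  # rho too small: modified problem still profitable
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho_star = optimal_average_reward()
```

The paper's algorithms estimate the same quantity from simulated transitions rather than from a known model; this sketch only shows why a monotone root-finding (dichotomy) formulation of the optimal average reward is well posed.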

CITATION STYLE

APA

Yang, J., Li, Y., Chen, H., & Li, J. (2017). Average reward reinforcement learning for semi-Markov decision processes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10634 LNCS, pp. 768–777). Springer Verlag. https://doi.org/10.1007/978-3-319-70087-8_79
