Double Sarsa and Double Expected Sarsa with Shallow and Deep Learning

  • Ganger M
  • Duryea E
  • Hu W
Citations: N/A
Readers: 35 (Mendeley users who have this article in their library)

Abstract

Double Q-learning has been shown to be effective in reinforcement learning scenarios where the reward signal is stochastic. We apply the double-learning idea behind this algorithm to Sarsa and Expected Sarsa, producing two new algorithms, Double Sarsa and Double Expected Sarsa, which are shown to be more robust than their single counterparts when rewards are stochastic. We find that these algorithms add a significant amount of stability to the learning process at only a minor computational cost, which leads to higher returns when using an on-policy algorithm. We then use shallow and deep neural networks to approximate the action-value function and show that Double Sarsa and Double Expected Sarsa are much more stable after convergence and collect larger rewards than their single versions.
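As context for the abstract, the sketch below illustrates how the double-learning idea can be carried over to tabular Sarsa: two action-value tables are maintained, the behavior policy acts on their average, and on each step one randomly chosen table is updated by bootstrapping from the other. This is a minimal, hedged reconstruction of that idea, not the paper's exact pseudocode; the environment interface (`reset`/`step` returning a state, reward, and done flag), the epsilon-greedy policy, and all hyperparameter values are assumptions for illustration.

```python
import numpy as np


def epsilon_greedy(q_avg, state, n_actions, eps, rng):
    """Select an action from the averaged tables with epsilon-greedy exploration."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_avg[state]))


def double_sarsa_episode(env, Q_A, Q_B, alpha=0.1, gamma=0.99, eps=0.1, rng=None):
    """Run one episode of a tabular Double Sarsa sketch.

    Two action-value tables Q_A and Q_B are kept. Actions are chosen from
    their average; on each transition one table (picked at random) is updated
    using the other table's estimate of the next state-action pair.
    """
    rng = rng or np.random.default_rng()
    n_actions = Q_A.shape[1]

    state = env.reset()                      # assumed env API
    action = epsilon_greedy((Q_A + Q_B) / 2.0, state, n_actions, eps, rng)
    done = False

    while not done:
        next_state, reward, done = env.step(action)   # assumed env API
        next_action = epsilon_greedy((Q_A + Q_B) / 2.0,
                                     next_state, n_actions, eps, rng)

        # Double learning: update one table, bootstrap from the other.
        if rng.random() < 0.5:
            target = reward + (0.0 if done else gamma * Q_B[next_state, next_action])
            Q_A[state, action] += alpha * (target - Q_A[state, action])
        else:
            target = reward + (0.0 if done else gamma * Q_A[next_state, next_action])
            Q_B[state, action] += alpha * (target - Q_B[state, action])

        state, action = next_state, next_action

    return Q_A, Q_B
```

Double Expected Sarsa would follow the same pattern, except that the bootstrap term is the expectation of the other table's values under the current policy at the next state rather than the value of the sampled next action.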

Citation (APA)

Ganger, M., Duryea, E., & Hu, W. (2016). Double Sarsa and Double Expected Sarsa with Shallow and Deep Learning. Journal of Data Analysis and Information Processing, 4(4), 159–176. https://doi.org/10.4236/jdaip.2016.44014
